jqueue: Use timeout when acquiring locks
As already noted in the design document, an opcode's priority is increased when the lock(s) can't be acquired within a certain amount of time, except at the highest priority, where in such a case a blocking acquire is used....
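A minimal sketch of the escalation described above, not the actual jqueue code. The function name, the constant and the step size are hypothetical; it only assumes that a numerically lower value means a more urgent priority.

```python
import threading

# Hypothetical constant: numerically lower means more urgent.
PRIO_HIGHEST = -20


def acquire_with_escalation(lock, priority, timeout=0.1, step=1):
    """Acquire lock and return the priority at which it was obtained."""
    while True:
        if priority <= PRIO_HIGHEST:
            # Highest priority reached: fall back to a blocking acquire.
            lock.acquire()
            return priority
        if lock.acquire(timeout=timeout):
            return priority
        # Timed out: make the request more urgent and try again.
        priority = max(PRIO_HIGHEST, priority - step)
```

A free lock is returned at the original priority; a contended one is retried at increasing urgency until the blocking acquire is reached.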
Migrate call from backend._GetVGInfo to bdev.LogicalVolume.GetVGInfo
This patch removes duplicate code found in backend which also needs to get VG infos. To make it simpler we moved to bdev.LogicalVolume.GetVGInfo.
Signed-off-by: René Nussbaumer <rn@google.com>...
jqueue: Introduce per-opcode context object
This is better to group per-opcode data.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
mcpu: Adjust lock acquire strategy
The changes to job queue processing require some changes on this class' interface. LockAttemptTimeoutStrategy might move to another place, but that'll be done in a later patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
mcpu.Processor: Raise exception on lock acquire timeout
Right now the timeout is not passed by any caller, making the code effectively go back to blocking acquires. Since the timeout is always None, no caller needs to be changed in this patch.
This change also means that any LUXI query handled by ganeti-masterd...
jqueue: Rename current_op to better reflect what it actually is
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
jqueue: Separate function for in-memory variables
Merge branch 'devel-2.2'
Merge branch 'stable-2.2' into devel-2.2
jqueue: Add unittest for _QueuedJob.CalcStatus
Use free space in vg instead of biggest free pv space for a snapshot
Even for snapshots we looked at the biggest free pv space; even though the vg might have fit the snapshot, we aborted if one of the pvs was too small. This patch fixes this by looking at the vg size instead of the pv...
Merge branch 'devel-2.1' into devel-2.2
Fix mac checker regex
Currently, the mac checker regex could match a corner case of 11:22:33:44:55:66: (one extra colon at the end). We fix this, and we also move the regex compilation outside of this function, at module level.
Signed-off-by: Iustin Pop <iustin@google.com>...
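For illustration only, a sketch of a MAC check that rejects the trailing-colon case and compiles its pattern once at module level. The pattern and the function name are assumptions, not the regex from the patch.

```python
import re

# Compiled once at module level; exactly six hex pairs separated by colons.
_MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def is_valid_mac(mac):
    """Return True only if the whole string is a well-formed MAC address."""
    # fullmatch() requires the pattern to cover the entire string, so a
    # trailing colon (or any other trailing character) is rejected.
    return _MAC_RE.fullmatch(mac) is not None
```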
Ignore failures while shutting down instances during failover from offline node
Don't abort failover if instance shutdown doesn't work on a node marked offline. The node is offline, so the instances living on it are too. Before you had to use --ignore-consistency to achieve that....
cli: Expose priority option and pass priority to master daemon
jqueue: Change model from per-job to per-opcode processing
In order to support priorities, the processing of jobs needs to be changed. Instead of processing jobs as a whole, the code is changed to process one opcode at a time and then return to the queue. See the...
jqueue: Use priority for worker pool
A small helper function is added to make this easier. Priorities are not yet used in all necessary places.
Fix migration on new KVMs
New KVMs (0.12.1.2-el6 and 0.13.5 tested) exit immediately after unsuccessful network connection when they are in "-incoming" mode. The simple check netutils.TcpPing causes remote kvm to exit so the migration will always fail. This check is also redundant by the way as if the...
cli: Pass options in {Add,Remove}Tags
They'll be used for job priorities. Also add an empty line to gnt-os where it's missing.
jqueue: Add missing docstring to _QueuedJob.Cancel
This was forgotten in commit 099b2870b.
Bail out if daemon gets fired up under wrong uid
This patch bails out at an early stage if the user invoking the daemon doesn't match the user set at configure time.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
opcode summary: improve display for list summaries
Currently, opcodes like NODE_EVAC_STRATEGY look bad:
89684 error NODE_EVAC_STRATEGY([u'node3'])
With this patch, we try to render list arguments a little bit better:
89684 error NODE_EVAC_STRATEGY(node3)...
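A rough sketch of the rendering idea, assuming list arguments are joined with commas (the separator and the function name are guesses, not taken from the patch):

```python
def format_summary(op_id, value):
    """Render an opcode summary, flattening list arguments for display."""
    if isinstance(value, (list, tuple)):
        # Join the items instead of showing the repr() of the list.
        text = ",".join(str(item) for item in value)
    else:
        text = str(value)
    return "%s(%s)" % (op_id, text)
```

With this, format_summary("NODE_EVAC_STRATEGY", ["node3"]) gives NODE_EVAC_STRATEGY(node3).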
workerpool: Fix typo
A call to logging.debug was missing an argument, leading to complaints on stderr at runtime.
jqueue: Move CancelJob logic to separate function
Moving the internals of this function will allow it to be used from unittests in the future. Splitting this into a pure, side-effect free function and an impure one makes the pure function easily testable....
(no conflicts, took LGTM from original commit)
cmdlib: Fix type of “name” parameter for tag operations
The parameter “name” is None for cluster tags.
rlib2: Set tag operation param “name” to None for cluster tags
Otherwise parameter verification in the master daemon fails.
Stop all daemons precautiously before trying to start ganeti-noded again
Please note that if the pid file is broken or missing we'll not catch the process (if any is running) and it's up to the user to fix this state
Check for duplicate nodegroup names
Since the nodegroups dict is indexed by uuid, duplicate names might happen as a result of bugs. Add a check to prevent them.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
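One way such a check could look, assuming a plain dict mapping uuid to group name (the real config objects are richer, and the function name is hypothetical):

```python
from collections import Counter


def find_duplicate_group_names(nodegroups):
    """Return the sorted list of names used by more than one node group.

    nodegroups is assumed to map uuid -> name.
    """
    counts = Counter(nodegroups.values())
    return sorted(name for name, seen in counts.items() if seen > 1)
```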
Split a long docstring line in objects.py
InitConfig: create nodegroups as well
This patch also ensures that the initial configuration has all the needed UUIDs and that they are unique, by using a TemporaryReservationManager inside InitConfig to generate them.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Export nodegroups list (names/uuids) via ssconf
Add nodegroup bash autocompletion
We autocomplete both by nodegroup name and uuid.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add nodegroup option to AddNode
Add node's nodegroup field
If a node doesn't have a node group we'll upgrade the config making it the cluster default. Also the node add and removal operations are changed to set/clear the node group correctly. Finally we populate the "members" list of nodegroups on config load with the value from the...
Check for nodegroup uuid indexing
Since the uuid is immutable the probability of it getting out of sync between the object and the dict key is very low. Still, checking is cheap, so we do it to be more sure nothing is wrong.
config.LookupNodeGroup
This function allows a node group to be looked up by name or uuid. If no nodegroup is specified and only one exists, that one is returned.
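The lookup rules above can be sketched roughly as follows. This assumes a dict mapping uuid to name and returns the uuid; both the data layout and the error type are assumptions, not the actual config.LookupNodeGroup code.

```python
def lookup_node_group(nodegroups, target=None):
    """Resolve target (a name or a uuid) to a node group uuid."""
    if target is None:
        # Nothing specified: only unambiguous if exactly one group exists.
        if len(nodegroups) != 1:
            raise LookupError("Need exactly one node group to pick a default")
        return next(iter(nodegroups))
    if target in nodegroups:
        # The target is already a known uuid.
        return target
    for uuid, name in nodegroups.items():
        if name == target:
            return uuid
    raise LookupError("Node group '%s' not found" % target)
```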
ConfigWriter: create the default node group
If no node groups exist we'll create a new default empty one.
Add a 'nodegroups' slot to ConfigData
Also:
- reformat the slots declaration of ConfigData
- call UpgradeConfig on each node group to handle future upgrades
- add nodegroups special case in {To,From}Dict
- add nodegroups to _AllUUIDObjects
_ContainerFromDicts: handle None source
When _ContainerFromDicts is called on some element which doesn't exist in the config, because it is yet to be upgraded, it will receive its value as None. We take care of this case by using an empty element of the required target type....
Add a new NodeGroup config object
The "members" slot of this object is not serialized, and is discarded when deserializing, initializing it explicitly to [].
cli: Add option definition for priority
jqueue: Ensure only accepted priorities are allowed for submitting jobs
Quoting the design document: “Submitted opcodes can have one of the priorities listed below. Other priorities are reserved for internal use”. Submitting jobs at priority -20 should not be allowed....
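A small sketch of such a check. The constant names and the values other than -20 are assumptions made for illustration; only the rule that -20 is rejected comes from the message above.

```python
# Assumed values for illustration; numerically lower means more urgent.
PRIO_LOW = 10
PRIO_NORMAL = 0
PRIO_HIGH = -10
PRIO_HIGHEST = -20  # treated here as reserved for internal use

# Priorities a client may use when submitting a job.
SUBMIT_PRIORITIES = frozenset([PRIO_LOW, PRIO_NORMAL, PRIO_HIGH])


def check_submit_priority(priority):
    """Return priority unchanged, or raise if clients may not use it."""
    if priority not in SUBMIT_PRIORITIES:
        raise ValueError("Priority %r is not allowed for submitted jobs" %
                         (priority, ))
    return priority
```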
Add support for job priority to opcodes and job queue objects
This allows clients to submit opcodes with a priority. Except for being tracked by the job queue, it is not yet used by any code.
Unittests for jqueue._QueuedOpCode and jqueue._QueuedJob are provided for...
Add job priority constants
Remove mcpu's ReportLocks callback
This is no longer needed with the new lock monitor. One callback is kept to check for cancelled jobs.
Revert "jqueue: Resume jobs from “waitlock” status"
This reverts commit 4008c8edae31a3971fa8c4b200238afc8005d3d4.
While it worked in my initial tests, I've now found cases where this doesn't work properly as it is. More work is needed and will be done as part of the...
Fix OS_VARIANT variable setting
This was introduced in efaa9b06d1e1e6d1678d0edd75b1ba37cf0de3d9.
in OSCoreEnv: inst_os.name is pure operating system name (without variant) as variant is stripped in OSFromDisk(). So we always get variant = inst_os.supported_variants[0] (first...
Fix pylint warning in http/__init__.py
My bad for not seeing this before:
R0201:614:HttpBase.GetSslCiphers: Method could be a function
Allow SSL ciphers to be overridden in HTTP server
Users of this class, such as the RAPI server, might want to override or adjust the default SSL cipher defined in a constant.
jqueue: Resume jobs from “waitlock” status
After an unclean restart of ganeti-masterd, jobs in the “waitlock” status can be safely restarted. They hadn't modified the cluster yet.
jqueue: Move queue inspection into separate function
This makes the init function a lot smaller while not changing functionality.
jqueue: Don't update file in MarkUnfinishedOps
This reduces the number of updates to the job files. It's used in two places while processing a job and the file is updated just afterwards.
locking.SharedLock: Update class docstring
This was already outdated when the initial version of SharedLock was added in commit 162c1c1f1 (February 2008).
locking: Implement priorities in SharedLock and LockSet
For proper support of job priorities, jobs' locks need to respect priorities. Otherwise it could happen that a job with a lower priority could get a lock before a job with a higher priority (depending on...
Remove utils.EnsureDir as this is done by ensure-dirs.in now
The config now should also belong to confd group and readable by it
Move job queue to new ganeti.runtime
Revert "Make it possible to call utils.Daemonize with uid and gid to run as"
This reverts commit 743b53d4eb9f3de46edb5e54738dab287b1979ac.
Conflicts:
lib/daemon.py
Trivial conflict resolved. This patch reverts changes from earlier permissions separation stage. This is not needed anymore as start-stop-daemon takes care...
cli: Use list of options shared between commands
The completion script for bash has to know about these options. Until now the list was in two places--once in cli.py and once in autotools/build-bash-completion. A shared list is used with this patch.
jqueue: Use separate function for encoding errors
Comes with unittest.
Cluster.UpgradeConfig: populate primary_ip_family
Adding a runtime configuration library
This is used to expand the users/group names just once at initial call.
Log warning instead of raising OpExecError for ndisc6
Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix some epydoc warnings
Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix breakage introduced by commit 8044bf655
Note to self: even patches removing one line can break everything.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>
Remove “dry_run” from opcodes.OpCreateInstance
It's declared twice, once in opcodes.OpCode and here, and this one is redundant.
Handle ENOENT case in ssconf.GetPrimaryIPFamily
This patch adds an optional default parameter to SimpleStore._ReadFile. This can be used to default the return value of this method in case the ssconf file is not present.
In this particular case it is used to return AF_INET in case...
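As a rough illustration of the "default on ENOENT" pattern, using standalone functions with hypothetical names rather than the SimpleStore methods themselves:

```python
import errno
import socket


def read_ssconf_value(path, default=None):
    """Read a single-value file, returning default if it does not exist."""
    try:
        with open(path) as fd:
            return fd.read().rstrip("\n")
    except (IOError, OSError) as err:
        # Only a missing file is defaulted; other errors still propagate.
        if default is not None and err.errno == errno.ENOENT:
            return default
        raise


def get_primary_ip_family(path):
    """Return the stored address family, or AF_INET if the file is absent."""
    return int(read_ssconf_value(path, default=socket.AF_INET))
```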
Show list of pending acquires in “gnt-debug locks”
This is accomplished by keeping a list of waiting threads instead of just their number inside the lock-internal condition. A few other tweaks to the output format are also made.
Fix scp command when target is an IPv6 address
Due to the syntax used for the target in scp <target>:<path>, it is necessary when the target is an IPv6 address to enclose it in square brackets.
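A minimal sketch of the bracketing rule, using the standard ipaddress module. The helper name is hypothetical and this is not the code from the patch.

```python
import ipaddress


def scp_target(host, path):
    """Build the <target>:<path> argument for scp."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not a literal IP address (e.g. a hostname): no brackets needed.
        is_ipv6 = False
    if is_ipv6:
        # Brackets keep the colons of the address apart from the separator.
        host = "[%s]" % host
    return "%s:%s" % (host, path)
```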
Change bootstrap.SetupDaemonNode to use scp as we can assume SSH is setup
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
hansmi helped me with merging the conflict. Thanks
Conflicts: lib/workerpool.py
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adding a paramiko fingerprint format helper
And provide unittests for them
workerpool: Add support for task priority
To add job priorities, the worker pool underlying the job queue must support priorities per task. This patch adds them to the worker pool.
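Per-task priorities can be sketched with a heap, as below. This is a simplified, single-threaded illustration with a hypothetical class name; the real worker pool also has to deal with locking and worker threads.

```python
import heapq
import itertools


class PriorityTaskQueue(object):
    """Task queue returning the most urgent task first.

    Numerically lower priority values are treated as more urgent; tasks
    with equal priority are returned in the order they were added.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, task, priority=0):
        # The running counter breaks ties and keeps insertion order stable.
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        (_, _, task) = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)
```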
Add simple lock monitor
This patch adds an initial implementation of a lock monitor, accessible for the user through “gnt-debug locks”. It currently shows all resource locks: BGL, nodes and instances. Config and job queue locks could be shown too, but wouldn't be of much help. The current owner(s) and mode...
workerpool: Allow setting task name
With this patch, the task name is added to the thread name and will show up in logs. Log messages from jobs will look like “pid=578/JobQueue14/Job13 mcpu:289 DEBUG LU locks acquired/cluster/BGL/shared”.
Use one function to parse “--fields” option values
locking.LockSet: Use function to get member lock name
Switch to the RPC call to update /etc/hosts in LUAddNode and LURemoveNode
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add RPC calls to update /etc/hosts
Fix a few epydoc docstrings
Make family argument in FormatAddress optional
By doing this we delegate the task of finding the correct address family to the FormatAddress method.
Support IPv6 for instances
gnt-node add: add error msg when using IPv6
Use family in backend.StartMaster
This patch changes the StartMaster method to consult the cluster primary ip version when deciding whether to use arping or ndisc6 after activating the master ip.
Signed-off-by: Manuel Franceschini <livewire@google.com>...
Make Hostname object always initialize its name to fqdn
This patch restores the behaviour of Hostname (previously known as HostInfo) to always use fqdn. This was broken due to the fact that the now used getaddrinfo does not return an fqdn in contrast to gethostbyname_ex()....
Fix small spelling mistake
Stop adding the dry-run option by default
Currently cli.py unconditionally adds the dry-run option. This patch disables this, and exports dry-run as a normal option.
The other alternative I tried to implement (adding a new fake option for disabling the auto-add per individual command) would require changes in...
jqueue: Remove lock status field
With the job queue changes for Ganeti 2.2, watched and queried jobs are loaded directly from disk, rendering the in-memory “lock_status” field useless. Writing it to disk would be possible, but has a huge cost at runtime (when tested, processing 1'000 opcodes involved 4'000 additional...
utils: Use WriteFile in {Set,Remove}EtcHostsEntry
This avoids duplicate effort and has been a TODO for a long time.
Don't ignore secondary node silently
Currently on non-mirrored disk templates the secondary node is ignored silently. This patch adds a check for this case, and warns the user should this be happening. This solves issue 113.
The patch also moves a prereq check to an argument check. This is ok...