Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to takeinto account offline nodes.
The fact that this was not detected yet is due to the fact that wedon't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>...
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is stillrecorded.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix OS queries for API v20 w/parameters
OS parameters is a list of tuples, so we can't pass it directly toutils.NiceSort, hence we use a sort key.
This was not detected in QA since QA only tests API v10 :(
Add helper for declaring all locks shared
This patch adds a function for abstracting“dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicateassignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper...
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results usinght-based functions. Checks are added for two opcodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher,would lock all nodes and instances. With this patch the opcodeis changed to operate on per nodegroup, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the...
cmdlib: Give instance name in error message on group evacuation
cmdlib: Factorize mapping instance LVs to node/volume
cli.JobExecutor: Feedback function for info output
This will be used in the watcher where we don't want topollute stdout unless in debug mode.
Fix recompilation of htools on regen-vcs-version
Currently, most htools code depends on Constants.hs which is generatedfrom constants.py and also depends on _autoconf.py. Also, _autoconf.pydepends on vcs-version, which all together means that when 'make...
Add another name for the --yes-do-it option
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Most boring patch ever
s/'/"/ in (hopefully) the right places.
Merge branch 'devel-2.4'
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Reopen daemon's stdio on SIGHUP
Before this patch daemons would continue to refer to an old logfile fortheir standard I/O if they had been asked to reopen the log (SIGHUP).
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file isreopened when a daemon is sent a HUP signal. Due to a bug in the code,fixed by this patch, the log file would be reopened for every single logmessage thereafter....
Don't leak file descriptors when setting up daemon output
When a daemon's output is configured using “utils.SetupDaemonFDs”, thefunction must use dup2(2). Unfortunately the code didn't close theoriginal file descriptors, leaking them in the process.
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message“Error checking node $node: Node is marked offline” if the instance'sprimary node is marked offline and the user didn't explicitely request...
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no needto require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge ofmigration and failover are also fixed by this patch....
gnt-instance console: Use query instead of opcode
This means opening the console no longer requires the instance lock,allowing it to be used during long-running operations (e.g. replacing adisk).
Add opcode attribute for comments
This attribute allows programmatic submitters of jobs (e.g. iallocator)to add a comment to each opcode, describing its purpose. Example:
$ gnt-job info 123Job ID: 123 … Opcodes: OP_INSTANCE_REPLACE_DISKS …...
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just returnthe LV name, but to include the volume group name (e.g.“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volumenames in LUNodeQueryvols, stopping instance names from displayed in...
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Implement instance failover via RAPI
No idea why this was missed before.
Make lock monitor more versatile
With this change it'll be possible to register other lock informationproviders. One usecase for this are job dependencies, which can be shownin the output of “gnt-debug locks”, too.
The lock monitor is changed to accept more than one return value from...
locking.GLM: Allow adding locks to monitor
This will be used for exporting job dependencies throughthe lock monitor.
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pendingName Pendingjob/890 job:891,892job/892 job:894
Add error state to LUGroupEvacuate's exceptions
Rename *_STATUS_WAITLOCK to …_WAITING
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs.
gnt-group: Add command to evacuate whole group
Add new opcode for evacuating group
Fix locking issue with job dependencies
When jobs waiting for a dependency are notified, they're re-added to thequeue. This would require owning the queue lock in exclusive mode, butsince the function doing so is called from within the job/opcodeprocessor, it only holds the lock in shared mode....
jqueue: Read-only jobs don't need processor lock
Add support for KVM keymaps
Signed-off-by: Sébastien Bocahu <zecrazytux@zecrazytux.net>Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
gnt-debug: Add tests for job dependencies
jqueue: Implement submitting multiple jobs with dependencies
With this change users of the “SubmitManyJobs” interface can userelative job dependencies. Relative job IDs in dependencies are resolvedbefore handing the job off to the workerpool.
Fix node evacuation
- Adjust for new iallocator result format- Split some code into helper functions
jqueue: Add “writable” flag to memory objects
Basically only one instance of the job, the one being processed,should be serialized to disk and replicated to other nodes. Withthis flag assertions can be added in various places.
Implement chained jobs
An overview is available in the design document for this change,doc/design-chained-jobs.rst.
When a job enters the job processor, the current opcode's dependenciesare evaluated. If a referenced job has not yet reached the desired...
Remove constants for iallocator multi-relocate
They're no longer necessary.
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its“end_timestamp” attribute set. Unfortunately it didn't cover a case whenthe queue is recovering from an unclean master shutdown.
Make SharedLock._is_owned public
This will be useful for assertions. GanetiLockManager._is_owned isexported, too.
Adding a wrapper around "xm console"
The wrapper will connect to the console, and check in the background ifthe instance is paused, unpausing it as necessary.
Signed-off-by: Stephen Shirley <diamond@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adding a wrapper around connecting to kvm console
Add opcode attribute for chained jobs
Set startup_paused to False when restarting
This fixes the lint error:
E1120:1220:InstanceReboot: No value passed for parameter'startup_paused' in function call
gnt-cluster {command|copyfile}: Support per-group operations
This patch allows commands to be run on and files to be copied to allnodes within a specific group.
cli.GetOnlineNodes: Support node group filter, use query2
This patc changes cli.GetOnlineNodes to use query2, which does thefiltering in the master daemon, and adds a new parameter to filter bynode group.
Unittests were added for the old implementation and then adopted to...
ht.WithDesc: Work around pylint warning
Explicitely defining “__call__” silences a pylint warning when wrappedtype check functions are used directly. I had no idea pylint is thisintelligent.
ht: Add new check for numbers
Places which receive floats can usually also deal with integers, e.g.OpTestDelay. Tests are added and the new check function is used for theaforementioned opcode and verifying query results.
Fix off-by-one bug in job serial generation
Commit 009e73d0 (September 2009) changed the job queue to generatemultiple job serials at once. Ever since it would return one more thanrequested.
The “serial” file in the job queue directory is defined to contain the...
Reverts the patch series about console wrappers
This reverts commits 030a9cb8022b83bf43ec14dfbafd943299bc01c4 andae082df0000a785b693b2f4aa434650a81a94bdf.
There are two problems:
- Makefile.am breakage, which is trivial to revert- unittest breakage, which honestly I'm not sure how to fix and how...
Add gnt-instance start --pause
Creates the instance, but pauses execution before booting. This combinedwith 'gnt-instance console' unpausing instances means that the entireboot process can be viewed and monitored.
Signed-off-by: Stephen Shirley <diamond@google.com>...
Fix lint error
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
RAPI: Document all feature strings
- Use constants and an assertion- Update documentation for node migration
Remove old node evacuation opcode
LUNodeEvacStrategy has been replaced with LUNodeEvacuate.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Change RAPI for new node evacuation opcode
The change is not backwards compatible, see the updated NEWS file.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Change “gnt-node evacuate” to use new opcode
By default it'll now evacuate all instances from the node, notjust secondaries.
Add new opcode to evacuate node
This new opcode will replace LUNodeEvacStrategy, which used to return alist of instances and new secondary nodes. With the new opcode theiallocator (if available) is tasked to generate the necessary operationsin the form of opcodes. This moves some logic from the client to the...
Alias gnt-job show to gnt-job info
Am I the only one to make that mistake 10 times a week?
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix cluster verify for empty node groups
There were some implicit assertions in the code that all node groupshave nodes, which is not necessarily true.
Additionally, the patch does a wrapping change.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Fix a lint warning
Patch db8e5f1c removed the use of feedback_fn, hence pylint warnnow.
Fix bug in recreate-disks for DRBD instances
The new functionality in 2.4.2 for recreate-disks to change nodes isbroken for DRBD instances: it simply changes the nodes without caringfor the DRBD minors mapping, which will lead to conflicts in non-empty...
KVM: configure bridged NICs at migration start
Commit 5d9bfd870 moved tap interface handling from KVM to Ganeti, partlyto also solve the problem of routed interfaces getting configured tooearly during live migrations, causing network anomalies. In that...
Fix bug in drbd8 replace disks on current nodes
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) hasa bug in that it does modify the instance disk temporarily beforechanging it back to the same value. However, we don't need to, andshouldn't do that: what this operation do is simply change the LVM...
Conflicts: lib/cmdlib.py - use RequireSharedFileStorage there
Signed-off-by: Guido Trotter <ultrotter@google.com>...
remove bootstrap._InitSharedFileStorage
This function is a copy of bootstrap._InitFileStorage with the followingdifferences: - check constants.ENABLE_SHARED_FILE_STORAGE and not constants.ENABLE_FILE_STORAGE - use different local variable names - one different error string...
LUInstanceCreate: use opcodes.RequireFileStorage
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Don't add ",boot=on" to disks on kvm >= 0.14
Under newer kvm this prevents the vm from starting.Ah, change!
KVM: fix per-instance stored UID value
When using the pool security model, ExecuteKVMRuntime was storing theinstance's UID using str(uid), which would result in storing theLockedUid._repr__() result:
$ cat /var/run/ganeti/kvm-hypervisor/uid/xxxxxxxxxxxxx...
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Add one forgotten element to the file disk path
This was left out during the fix/refactoring
Conflicts: lib/cmdlib.py - use constants.DTS_FILEBASED...
Add DTS_FILEBASED constant
LUInstanceCreate: fix file storage dir calculation
- Move the calculation at the beginning of CheckPrereq, since it doesn't modify any state, but still keeps locks- Only perform the calculation if the actual disk template is filebased- Error out if there is no defined file storage dir...
Check that filestorage is enabled when requested
Remove self.op.file_storage_dir isabs check
As the manpage says, and the code does, self.op.file_storage_dir is anadditional relative path under the cluster file storage dir. As such itshould not be absolute.
Replace iallocator's mreloc w/ change-group and node-evac
This patch removes all occurrences of the “multi-relocate” iallocatormode. Commit 25ee7fd845 updated the design document and introducedseparate modes, “change-group” and “node-evacuate”. The constants aren't...
jqueue: Allow loading of archived jobs
Chained jobs need to look at previous jobs, including archived ones. Anice side-effect of this change is the ability to look at archived jobsusing “gnt-job info <id>” as long as the ID is known.
Adding basic abstraction layer for caching
This includes an own simple cache implementation and aninterface to a memcache instance.
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix locking issues in LUClusterVerifyGroup
- Use functions in ConfigWriter instead of custom loops- Calculate nodes only once instances locks are acquired, removes one potential race condition- Don't retrieve lists of all node/instance information without locks...
cmdlib: Acquire BGL for LUClusterVerifyConfig
LUClusterVerifyConfig verifies a number of configuration settings. Fordoing so, it needs a consistent list of nodes, groups and instances. Sofar no locks were acquired at all (except for the BGL in shared mode)....
Export/import instance tags
Fix issue with tags on instance creation
Commit 720f56c85a added the ability to specify tags when creating aninstance. The “tags” attribute of an instance object needs to be a set,but the patch's code saved it as a list, causing breakage in other parts...
Add tag handling to {Op,LU}InstanceCreate
Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this isintended for OpTagsSet and thus:
a) is not documented b) does not carry a default value, making it mandatory
Also pass the tags to the iallocator during instance creation....
Add tagging option to gnt-instance create
Add TAG_ADD_OPT option to cli.py and use it in gnt-instance. Modifycli.GenericInstanceCreate() accordingly.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>Signed-off-by: Iustin Pop <iustin@google.com>...
Export instance tags to instance hooks
Instance hooks now get an INSTANCE_TAGS environment variable, which contains aspace-delimited list of the affected instance's tags.
Also update the documentation to reflect the change.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
http.client: Make debug log less noisy
The HTTP client code generates quite a lot of debug log messages. Withthis patch they're hidden unless explicitely enabled in the code.
jqueue: Fix potential race condition when cancelling queued jobs
When a job was cancelled, its status would be changed and the filewritten again. Since this was a final status, the job file could bemoved anytime for archival. If the job was still in the queue, however,...
iallocator: fix incomplete refactoring
Commit fdbe29ee changed the iallocator modes from 'r'/'w' to'ro'/'rw', but forgot one check in LUTestAllocator. This patch justcompletes the replacements.
iallocator: export the hypervisor value
In 'allocate' mode, the documentation specifies that we export thehypervisor value (“Allocation needs, in addition: … hypervisor, thehypervisor of this instance”) and we need that on input, however wedon't actually export it....
iallocator: change default for target_groups
Per the design doc, the target_groups request key "if present, it musteither be the empty list, or contain a list of group UUIDs". Currentlyit defaults to None/null, which is not valid.
iallocator: rename mem_size to memory
Currently, the iallocator in 'allocate' requires mem_size on inputbut serialises that as 'memory'. This inconsistency makes it hard toautomatically validate the parameters, hence this patch renamesmem_size.
iallocator: add ht-checking for the request
Currently, we only ht-check the result value from the iallocator, andwe send whatever we happen to check manually in the LUs that call theiallocator.
This is not good, as we have to duplicate checks in many places, and...
gnt-node migrate: Use LU-generated jobs
Until now LUNodeMigrate used multiple tasklets to evacuate all primaryinstances on a node. In some cases it would acquire all node locks,which isn't good on big clusters. With upcoming improvements to the LUsfor instance failover and migration, switching to separate jobs looks...
Fix argument order in ReserveLV and ReserveMAC
ConfigWriter.ReserveLV() and Configwriter.ReserveMAC() calledTemporaryReservationManager.Reserve() with the ec_id and resource argumentsswapped. As a result, two reservation attempts for the same resource type...