Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add error state to LUGroupEvacuate's exceptions
Add new opcode for evacuating group
Fix node evacuation
- Adjust for new iallocator result format
- Split some code into helper functions
Merge branch 'devel-2.4'
Add gnt-instance start --pause
Creates the instance, but pauses execution before booting. This combined with 'gnt-instance console' unpausing instances means that the entire boot process can be viewed and monitored.
Signed-off-by: Stephen Shirley <diamond@google.com>...
Remove old node evacuation opcode
LUNodeEvacStrategy has been replaced with LUNodeEvacuate.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add new opcode to evacuate node
This new opcode will replace LUNodeEvacStrategy, which used to return a list of instances and new secondary nodes. With the new opcode the iallocator (if available) is tasked to generate the necessary operations in the form of opcodes. This moves some logic from the client to the...
Fix cluster verify for empty node groups
There were some implicit assertions in the code that all node groups have nodes, which is not necessarily true.
Additionally, the patch does a wrapping change.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix bug in recreate-disks for DRBD instances
The new functionality in 2.4.2 for recreate-disks to change nodes is broken for DRBD instances: it simply changes the nodes without caring for the DRBD minors mapping, which will lead to conflicts in non-empty...
Fix a lint warning
Patch db8e5f1c removed the use of feedback_fn, hence pylint warns now.
Fix bug in drbd8 replace disks on current nodes
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) has a bug in that it does modify the instance disk temporarily before changing it back to the same value. However, we don't need to, and shouldn't do that: what this operation does is simply change the LVM...
Conflicts: lib/cmdlib.py - use RequireSharedFileStorage there
Signed-off-by: Guido Trotter <ultrotter@google.com>...
LUInstanceCreate: use opcodes.RequireFileStorage
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add one forgotten element to the file disk path
This was left out during the fix/refactoring
Conflicts: lib/cmdlib.py - use constants.DTS_FILEBASED...
Add DTS_FILEBASED constant
LUInstanceCreate: fix file storage dir calculation
- Move the calculation to the beginning of CheckPrereq, since it doesn't modify any state, but still keeps locks
- Only perform the calculation if the actual disk template is filebased
- Error out if there is no defined file storage dir...
Check that filestorage is enabled when requested
Remove self.op.file_storage_dir isabs check
As the manpage says, and the code does, self.op.file_storage_dir is an additional relative path under the cluster file storage dir. As such it should not be absolute.
Replace iallocator's mreloc w/ change-group and node-evac
This patch removes all occurrences of the “multi-relocate” iallocator mode. Commit 25ee7fd845 updated the design document and introduced separate modes, “change-group” and “node-evacuate”. The constants aren't...
Fix locking issues in LUClusterVerifyGroup
- Use functions in ConfigWriter instead of custom loops
- Calculate nodes only once instance locks are acquired, removing one potential race condition
- Don't retrieve lists of all node/instance information without locks...
cmdlib: Acquire BGL for LUClusterVerifyConfig
LUClusterVerifyConfig verifies a number of configuration settings. For doing so, it needs a consistent list of nodes, groups and instances. So far no locks were acquired at all (except for the BGL in shared mode)....
Export/import instance tags
Fix issue with tags on instance creation
Commit 720f56c85a added the ability to specify tags when creating an instance. The “tags” attribute of an instance object needs to be a set, but the patch's code saved it as a list, causing breakage in other parts...
Export instance tags to instance hooks
Instance hooks now get an INSTANCE_TAGS environment variable, which contains a space-delimited list of the affected instance's tags.
Also update the documentation to reflect the change.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
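A hook script could read the space-delimited tag list roughly as follows. This is a minimal sketch, not code from the patch: the helper name instance_tags is hypothetical, and the variable name is taken verbatim from the commit message. The name under which a hook actually sees the variable (for example with a prefix) is not stated here, so it is left as a parameter.

```python
import os


def instance_tags(environ=None, var_name="INSTANCE_TAGS"):
    """Return the instance's tags as a set (hypothetical helper).

    var_name is the name given in the commit message; adjust it if the
    hook environment exposes the variable under a different name.
    """
    if environ is None:
        environ = os.environ
    # str.split() without arguments also copes with an empty or
    # missing value, yielding an empty set of tags.
    return set(environ.get(var_name, "").split())


if __name__ == "__main__":
    for tag in sorted(instance_tags()):
        print(tag)
```

Returning a set mirrors the note above that an instance's tags are stored as a set rather than a list.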
Add tag handling to {Op,LU}InstanceCreate
Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this is intended for OpTagsSet and thus:
a) is not documented
b) does not carry a default value, making it mandatory
Also pass the tags to the iallocator during instance creation....
iallocator: add ht-checking for the request
Currently, we only ht-check the result value from the iallocator, and we send whatever we happen to check manually in the LUs that call the iallocator.
This is not good, as we have to duplicate checks in many places, and...
iallocator: rename mem_size to memory
Currently, the iallocator in 'allocate' requires mem_size on input but serialises that as 'memory'. This inconsistency makes it hard to automatically validate the parameters, hence this patch renames mem_size.
Signed-off-by: Iustin Pop <iustin@google.com>...
iallocator: change default for target_groups
Per the design doc, the target_groups request key "if present, it must either be the empty list, or contain a list of group UUIDs". Currently it defaults to None/null, which is not valid.
iallocator: export the hypervisor value
In 'allocate' mode, the documentation specifies that we export the hypervisor value (“Allocation needs, in addition: … hypervisor, the hypervisor of this instance”) and we need that on input, however we don't actually export it....
iallocator: fix incomplete refactoring
Commit fdbe29ee changed the iallocator modes from 'r'/'w' to 'ro'/'rw', but forgot one check in LUTestAllocator. This patch just completes the replacements.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
gnt-node migrate: Use LU-generated jobs
Until now LUNodeMigrate used multiple tasklets to evacuate all primary instances on a node. In some cases it would acquire all node locks, which isn't good on big clusters. With upcoming improvements to the LUs for instance failover and migration, switching to separate jobs looks...
TLReplaceDisks: Move assertion checking locks
Commit 1bee66f3 added assertions for ensuring only the necessary locks are kept while replacing disks. One of them makes sure locks have been released during the operation. Unfortunately the commit added the check...
Fix bug in LUNodeMigrate
Commit aac4511a added CheckArguments to LUNodeMigrate with a call to _CheckIAllocatorOrNode. When no default iallocator is defined, evacuating a node would always fail:
$ gnt-node migrate node123
Migrate instance(s) '...'?
y/[n]/?: y...
node evac: don't call IAllocator if no instances
Currently we generate an empty list only for the '-n node' invocation, but for iallocator we still call the iallocator (which needs an RPC call, etc.). By moving the computation of instances outside of the if...
Fix a couple of style mistakes
Cluster verify: check for nodes/instances with no group
Previously, all nodes and instances would always be visited/verified. By driving the verification by node group now, we will miss nodes and instances that can't be reached from existing node groups, should that rare...
Cluster verify: fix LV checks for split instances
When sharding by group, if a mirrored instance is split (primary and secondary) between two groups, its volumes will not be properly checked: the group of the primary will warn about a missing volume in the secondary,...
Cluster verify: make NV_NODELIST smaller
To cope with increasing cluster sizes, we now make nodes try to contact all other nodes in their group, and one node from every other group.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
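The selection rule described for NV_NODELIST (all other nodes in the node's own group, plus one node from every other group) can be illustrated with a small sketch. This is not the patch's code: the function name nodes_to_contact and the node-to-group mapping are hypothetical, and picking the alphabetically first node of each foreign group is an assumption, since the commit message does not say which node is chosen.

```python
def nodes_to_contact(node, node_groups):
    """Return the nodes that `node` should try to contact (hypothetical).

    node_groups maps node name -> group name.
    """
    own_group = node_groups[node]

    # All other nodes in the same group.
    same_group = sorted(
        name for name, group in node_groups.items()
        if group == own_group and name != node
    )

    # One representative per foreign group. Assumption: the first node in
    # sorted order; the real selection policy is not given in the log.
    representatives = {}
    for name in sorted(node_groups):
        group = node_groups[name]
        if group != own_group and group not in representatives:
            representatives[group] = name

    return same_group + [representatives[g] for g in sorted(representatives)]
```

With groups of size n spread over g groups, each node contacts roughly (n - 1) + (g - 1) peers instead of every other node in the cluster.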
Cluster verify: verify hypervisor parameters only once
The list of all hypervisor parameters has to be computed in LUClusterVerifyGroup, since it needs to be passed to nodes as NV_HVPARAMS. However, it is better only to verify said parameters once, out of LUClusterVerifyConfig....
Split LUClusterVerify into LUClusterVerify{Config,Group}
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts...
Cluster verify: factor out error codes and functions
We move all error code definitions, plus the _Error and _ErrorIf helpers, to a private _VerifyErrors mix-in class that can later be shared by the two new cluster verify LUs.
(_Error and _ErrorIf code was moved around verbatim, except to disable...
Cluster verify: make "instance runs in wrong node" node-driven
Previously, the "instance should not be running in this node" error was computed by verifying, for each instance, whether any node other than its primary was running it. But this is not a well-suited approach if we were...
Verify an absent vm_capable node for files
If we're not verifying all nodes, adding a node outside the current group for file checksums helps us make sure checksums are the same in all of the cluster.
Cluster verify: master must be present for _VerifyFiles
This commit prepares the call to _VerifyFiles for the case when the master node is not one of the nodes being verified (which will be the case for all node groups but one). We fix it by always passing master info and...
Cluster verify: don't assume we're verifying all nodes/instances
This commit fixes a few initial simple cases in which it was assumed that we're always working over the whole cluster. With this change, we differentiate between "nodes/instances to verify" and "checks that need...
Cluster verify: gather node/instance list in CheckPrereq
This commit introduces no behavior changes, and is only a minor refactoring that aids with a cleaner division of future LUClusterVerify work. The change consists in:
- substitute the {node,instance}{list,info} structures previously created...
Merge remote branch 'origin/devel-2.4'
iallocator: Stricter check for multi-evac result
Check new secondary nodes' group like it's already done for multi-relocation requests.
cmdlib: Use ganeti.ht for checking iallocator result
cmdlib: Remove punctuation from error messages
Add new iallocator mode to LUTestAllocator
cmdlib.IAllocator: Add multi-relocate support
Introduce instance start/stop no_remember attribute
This will allow stopping or starting an instance without changing the remembered state. While this seems counter-intuitive at first (it will create cluster verify errors), it can help in a few corner cases:...
cmdlib.IAllocator: Fewer temporary variables
Reduce the number of temporary variables and generate dictionaries in one go.
TLMigrateInstance: do not migrate to self
Check that the instance is not being migrated to its current primary node during CheckPrereq. Otherwise migration is aborted because the instance is already running and cleaned up, which causes the running instance to be killed....
cmdlib.IAllocator: Use lookup table for mode-specific data
Try to prevent instance memory changes N+1 failures
There are multiple bugs with the code checking for N+1 failures in the instance memory changes which need significant changes; in the meantime we can at least:
- change the warning message into an error (--force will skip checks)...
Remove references to acquired_locks
These sneaked in from 2.4 during the merge, but this attribute is actually gone in the master branch.
Use the new dry-run mode in cmdlib
This will hopefully detect potential LVM (or any other storage, when they implement it) issues before committing changes just on some nodes.
Unfortunately due to the dry_run opcode handling, we can't integrate this into the usual handling (as we need to activate the disks before...
Implement grow dry-run at RPC level
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
cmdlib: Sort nodes for OOB commands
Also reorder the methods to match all other LUs.
cmdlib: Use helper for expanding nodes for OOB commands
cmdlib: Expand instances using helper for repairing disks
Also change the way “share_locks” is filled.
Fix bug introduced in commit 0d5a0b96
When removing “acquired_locks” in commit 0d5a0b96, I didn't remember that it does not contain the Big Ganeti Lock.
Fix lock release in TLMigrateInstance
Commit 52f33103 introduced lock release factorization, replacing manual lock release using utility functions. However, it broke TLMigrateInstance due to a typo (passing the Tasklet to ReleaseLocks instead of the parent LU). We fix this by passing the LU to...
cmdlib: Remove acquired_locks attribute from LUs
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the...
cmdlib: Use local alias for lock manager
Saves some typing and we'll use it more often in the future.
Add --no-wait-for-sync when converting to drbd
Currently, when converting an instance from plain to DRBD, the instance is blocked during the entire resync period. This patch adds the --no-wait-for-sync so that the operation finishes as soon as the DRBD sync has started, without waiting for the entire sync. This makes...
Recreate instance disks: allow changing nodes
This patch introduces the option of changing an instance's nodes when doing the disk recreation. The rationale is that currently if an instance lives on a node that has gone down and is marked offline, it's not possible to re-create the disks and reinstall the instance on...
Fix instance failover/migration w.r.t TLMigrateInstance
Commit 1c6e5787 removed the iallocator and target_node keyword parameters from TLMigrateInstance, but I didn't update their use in LUInstanceFailover and (not fully) in LUInstanceMigrate.
Rename instance: only show new name when different
It makes no sense to show messages like:
Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to 'instance18'
So we'll skip the message if the resolved name is identical to the requested one....
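The rule amounts to a simple equality check before logging. A minimal sketch, with a hypothetical helper name (resolved_name_message) and the message text copied from the example above; it does not reproduce the actual code in LUInstanceRename:

```python
def resolved_name_message(requested, resolved):
    """Return the informational message, or None when there is nothing to say.

    Hypothetical helper: the message is only produced when name
    resolution actually changed the name the user asked for.
    """
    if resolved == requested:
        return None
    return "Resolved given name '%s' to '%s'" % (requested, resolved)
```

A caller would log the returned string only when it is not None.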
Fix race condition in LUGroupAssignNodes
The original code would get all node information and their groups before acquiring the necessary locks. With this patch the node information is only retrieved once all locks have been acquired. Groups are locked optimistically and verified after acquiring the node locks....
Fix DTS_EXT_MIRROR migration
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances, however it broke migration for DTS_EXT_MIRROR instances, by moving iallocator and node checks from LUInstanceMigrate to TLMigrateInstance. This has the side-effect...
Use node group locking for replacing disks
This is one of the first opcodes to make use of node group locking. To get an instance's node groups, the instance's nodes need to be looked at. Due to a previous design decision nodes are locked after the group,...
TLMigrateInstance: Fix live migration breakage
Commit 77fcff4 unintentionally incorporated code from TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator, presumably during a rebase from earlier versions of the patch to the 2.5 codebase. As a...
cmdlib: Update error messages, remove some punctuation
- Clarify some error messages
- Remove unnecessary punctuation
- Merge two if conditions in one place
cmdlib: Fix typo, s/nick/NIC/
A small optimisation in cluster verify
This removes (count of instances + count of nodes) lock acquires/releases.
A few docstring fixes
At least one generates an epydoc error :)
Cluster verify: check for missing bridges
Currently cluster verify doesn't check for bridge information; the only checks are done at instance create and failover/migrate time. This means a cluster that seems healthy will fail creation jobs.
This patch implements a simple verification that all nodes (in the...
cmdlib: Factorize lock releasing
There will be more lock releasing with upcoming changes, so this will centralize the logic behind it (what locks to keep, which variables to update, etc.).
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
cmdlib: Drop SSH runner from LU base class
It is no longer used.
cmdlib.py: fix indentation in _VerifyNode
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
TLMigrateInstance: Fix confusing text
Commit d5cafd31 changed this error message, swapping the text parts in the process.
LUInstanceRename: Amend comment about lock
Also add an assertion.
iallocator: Relocation nodes must be in same group
Quoting from iallocator.rst: “[…] ``relocate`` request is used when an existing instance needs to be moved within its node group […]”.
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...