Merge branch 'devel-2.5'
Fix OS creation's error handling when pausing sync
Commit 41e1e79 introduced a feature in which, when wait_for_sync is not set, DRBD sync is paused during the OS installation.
Doing so, however, broke OS creation's error handling: the result value from the instance_os_add RPC call was overwritten by the one of the...
cmdlib: Support for CPU pinning
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix for auto parameters on import
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
DeprecationWarning fixes for pylint
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to a lot of DeprecationWarning being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst)....
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
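As an illustration of the directive change described in this commit (a sketch, not taken from the patch; the function name and the choice of message ID W0613, unused-argument, are assumptions made only for the example):

```python
def exec_noop(feedback_fn):  # pylint: disable=W0613
  """Hypothetical function that deliberately ignores its argument.

  Before pylint 0.21 the inline comment would have been spelled
  "pylint: disable-msg=W0613"; the unified spelling is "disable".

  """
  return True
```

Only the comment changes; the code it annotates stays the same.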
Two more PEP8 fixes
cmdlib: Avoid wrapping using backslash
gnt_group: Avoid * magic using keyword arguments (the “pep8” tool doesn't like the inline comment in this case and will complain about spaces around the “*” operator)
PEP8 style fixes
Identified using the “pep8” utility.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Allow importing instance with full auto parameters
Disk template is no longer required when importing instance
… provided that disk_template value is set in the config.ini file.
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Documentation fix for importing with --src-dir option
Get rid of {disk,nic}_count variables
This also fixes an issue if "disk_template = diskless" and no "disk_count" was specified, while doing an import of said instance specifications.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Small improvements for cluster verify
- Check if BGL is actually owned
- Show group name as feedback
Allow locking to be used via OpQuery
The original design for query2 specifically excluded locking, but now it's turned out that it would be a good thing to have in watcher. This patch adds a new parameter to OpQuery and enables its use in LUQuery. A missing function is added to LUGroupQuery, a comment clarified in...
Change OpClusterVerifyConfig's result, verify results
This patch removes the list of node groups (not used anymore since commit fcad7225e3fc) from OpClusterVerifyConfig's result and adds result verification to all OpClusterVerify* opcodes.
Use LU-generated jobs for verifying cluster
This patch moves the logic for verifying the various node groups in a cluster into the master daemon. Job dependencies are used to ensure the configuration, which requires the BGL, is verified first.
With this change it will be possible to expose whole-cluster...
Allow fixing of split instances via relocate
Currently, the IAllocator code requests strictly that the (set of) groups of the nodes we're relocating from is equal to the set of groups we're relocating to.
This, however, makes it impossible to fix split instances, since (by...
Further cleanup after multi-evacuate removal
Commit f0edfcf6 removed the parsing of multi-evacuate result, but the code went from:
  if mode in (multi-evac, relocate):
    …
    if mode relocate:
      …
to:
  if mode relocate:
    …
    if mode == relocate...
Fix bug in IAllocator parsing of Evacuate result
Commit 342f9172 added stricter checks for the iallocator result in evacuate mode, but it does this irrespective of the result status. When the result has failed and (according to the design) the list of nodes is empty, this code will trigger the following:...
Remove iallocator's “multi-evacuate” mode
It is no longer used and has been deprecated in 2.5.
Add docstring to cmdlib.TLReplaceDisks._FindFaultyDisks
LUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result
… instead of getting the list of instances once again from the configuration.
cmdlib: Factorize checking node groups' instances
Remove 15-second sleep from LUInstanceCreate
Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate already calls _WaitForSync with oneshot=True, which already performs an internal wait-loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Add a readability alias
lu.glm.list_owned becomes lu.owned_locks, which is clearer for the reader.
Also rename three variables (which were before named owned_locks) to make clearer what they track.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add opcode to change instance's group
This is quite similar to evacuating a group, but the locking is different.
Factorize checking instance's node groups
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculating a group evacuation.
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq
- Clarify command synopsis
cmdlib: Factorize getting iallocator
The same logic will be used for changing an instance's group.
Pause DRBD sync for OS install if not wait_for_sync
When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync in the background while performing the rest of the installation steps, including OS installation.
However, OS installation is a very disk-intensive task that interferes badly...
Optimise use of repeated/looping GetInstanceInfo
Similar to the previous patch, this adds a helper function to eliminate repeated calls into ConfigWriter.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Optimise use of repeated/looping GetNodeInfo
This adds a new ConfigWriter.GetMultiNodeInfo function and replaces multiple/looping calls to GetNodeInfo with it.
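The idea of replacing looped lookups with one batched call can be sketched as follows. This is a minimal illustration, not the code from the patch: the class, its method names and the (name, info) return shape are assumptions, since the real ConfigWriter signatures are not shown in this log.

```python
import threading


class ConfigSketch(object):
  """Hypothetical stand-in for a lock-protected configuration store."""

  def __init__(self, nodes):
    self._lock = threading.Lock()
    self._nodes = dict(nodes)
    # Counts how often the lock was taken, to make the difference visible
    self.lock_acquisitions = 0

  def get_node_info(self, name):
    """Looks up one node; takes the lock on every call."""
    with self._lock:
      self.lock_acquisitions += 1
      return self._nodes.get(name)

  def get_multi_node_info(self, names):
    """Looks up many nodes under a single lock acquisition."""
    with self._lock:
      self.lock_acquisitions += 1
      return [(name, self._nodes.get(name)) for name in names]


cfg = ConfigSketch({"node1": {"group": "g1"}, "node2": {"group": "g2"}})

# Looping variant: one lock acquisition per node
looped = [(name, cfg.get_node_info(name)) for name in ["node1", "node2"]]

# Batched variant: one lock acquisition for the whole list
batched = cfg.get_multi_node_info(["node1", "node2"])
```

Both variants return the same data; the batched call simply avoids re-entering the configuration layer once per node.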
Fix types passed to IAllocator
Iallocator mode reloc, parameter reloc_from takes a list; half of the code already forced this parameter to list, we add the other two cases where it is needed.
Add primary/second nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieve instances by group.
Remove requirement for variants on OS API v15+
This removes:
- the check in backend that such OSes have a variants file or if it exists that is non-empty; in order for this to work, we also rework the logic in backend._TryOSFromDisk to allow for optional OS files...
Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to take into account offline nodes.
The fact that this was not detected yet is due to the fact that we don't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>...
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is still recorded.
Add helper for declaring all locks shared
This patch adds a function for abstracting “dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicate assignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper...
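A helper of the kind described in this commit might look like the sketch below. The function name and the contents of LEVELS are placeholders for illustration; only the “dict.fromkeys(..., 1)” expression comes from the commit message.

```python
# Hypothetical stand-in for locking.LEVELS; the real list is not shown here
LEVELS = ["cluster", "instance", "nodegroup", "node"]


def share_all():
  """Returns a new dict declaring every lock level as shared (value 1)."""
  return dict.fromkeys(LEVELS, 1)


# An LU would then assign the result instead of repeating the expression
share_locks = share_all()
```

Because each call builds a fresh dict, one LU changing an entry afterwards does not affect the others.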
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher, would lock all nodes and instances. With this patch the opcode is changed to operate per node group, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the...
cmdlib: Give instance name in error message on group evacuation
cmdlib: Factorize mapping instance LVs to node/volume
Most boring patch ever
s/'/"/ in (hopefully) the right places.
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message “Error checking node $node: Node is marked offline” if the instance's primary node is marked offline and the user didn't explicitly request...
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no need to require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge of migration and failover are also fixed by this patch....
Merge branch 'devel-2.4'
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return the LV name, but to include the volume group name (e.g. “xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume names in LUNodeQueryvols, stopping instance names from being displayed in...
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Add error state to LUGroupEvacuate's exceptions
Add new opcode for evacuating group
Fix node evacuation
- Adjust for new iallocator result format
- Split some code into helper functions
Add gnt-instance start --pause
Creates the instance, but pauses execution before booting. This combined with 'gnt-instance console' unpausing instances means that the entire boot process can be viewed and monitored.
Signed-off-by: Stephen Shirley <diamond@google.com>...
Remove old node evacuation opcode
LUNodeEvacStrategy has been replaced with LUNodeEvacuate.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add new opcode to evacuate node
This new opcode will replace LUNodeEvacStrategy, which used to return a list of instances and new secondary nodes. With the new opcode the iallocator (if available) is tasked to generate the necessary operations in the form of opcodes. This moves some logic from the client to the...
Fix cluster verify for empty node groups
There were some implicit assertions in the code that all node groups have nodes, which is not necessarily true.
Additionally, the patch does a wrapping change.
Fix bug in recreate-disks for DRBD instances
The new functionality in 2.4.2 for recreate-disks to change nodes is broken for DRBD instances: it simply changes the nodes without caring for the DRBD minors mapping, which will lead to conflicts in non-empty...
Fix a lint warning
Patch db8e5f1c removed the use of feedback_fn, hence pylint warns now.
Fix bug in drbd8 replace disks on current nodes
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) has a bug in that it does modify the instance disk temporarily before changing it back to the same value. However, we don't need to, and shouldn't do that: what this operation does is simply change the LVM...
Conflicts: lib/cmdlib.py - use RequireSharedFileStorage there
Signed-off-by: Guido Trotter <ultrotter@google.com>...
LUInstanceCreate: use opcodes.RequireFileStorage
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add one forgotten element to the file disk path
This was left out during the fix/refactoring
Conflicts: lib/cmdlib.py - use constants.DTS_FILEBASED...
Add DTS_FILEBASED constant
LUInstanceCreate: fix file storage dir calculation
- Move the calculation at the beginning of CheckPrereq, since it doesn't modify any state, but still keeps locks
- Only perform the calculation if the actual disk template is filebased
- Error out if there is no defined file storage dir...
Check that filestorage is enabled when requested
Remove self.op.file_storage_dir isabs check
As the manpage says, and the code does, self.op.file_storage_dir is an additional relative path under the cluster file storage dir. As such it should not be absolute.
Replace iallocator's mreloc w/ change-group and node-evac
This patch removes all occurrences of the “multi-relocate” iallocator mode. Commit 25ee7fd845 updated the design document and introduced separate modes, “change-group” and “node-evacuate”. The constants aren't...
Fix locking issues in LUClusterVerifyGroup
- Use functions in ConfigWriter instead of custom loops
- Calculate nodes only once instances locks are acquired, removes one potential race condition
- Don't retrieve lists of all node/instance information without locks...
cmdlib: Acquire BGL for LUClusterVerifyConfig
LUClusterVerifyConfig verifies a number of configuration settings. For doing so, it needs a consistent list of nodes, groups and instances. So far no locks were acquired at all (except for the BGL in shared mode)....
Export/import instance tags
Fix issue with tags on instance creation
Commit 720f56c85a added the ability to specify tags when creating an instance. The “tags” attribute of an instance object needs to be a set, but the patch's code saved it as a list, causing breakage in other parts...
Export instance tags to instance hooks
Instance hooks now get an INSTANCE_TAGS environment variable, which contains a space-delimited list of the affected instance's tags.
Also update the documentation to reflect the change.
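A hook script could read the variable roughly as in the sketch below. The function name is made up for the example, and the variable name is passed as a parameter because the exact name a hook script sees (for instance with a prefix added by the hooks runner) is an assumption here; the commit message only gives INSTANCE_TAGS.

```python
import os


def instance_tags(environ=None, varname="INSTANCE_TAGS"):
  """Returns the affected instance's tags from a hook environment.

  "environ" defaults to the process environment; "varname" is the name of
  the variable as described in the commit message.

  """
  if environ is None:
    environ = os.environ
  # split() without arguments splits on whitespace and returns an empty
  # list for an empty or missing value
  return environ.get(varname, "").split()
```

For example, an environment of {"INSTANCE_TAGS": "web prod"} yields ["web", "prod"], and an instance without tags yields an empty list.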
Add tag handling to {Op,LU}InstanceCreate
Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this is intended for OpTagsSet and thus:
a) is not documented
b) does not carry a default value, making it mandatory
Also pass the tags to the iallocator during instance creation....
iallocator: add ht-checking for the request
Currently, we only ht-check the result value from the iallocator, and we send whatever we happen to check manually in the LUs that call the iallocator.
This is not good, as we have to duplicate checks in many places, and...
iallocator: rename mem_size to memory
Currently, the iallocator in 'allocate' requires mem_size on input but serialises that as 'memory'. This inconsistency makes it hard to automatically validate the parameters, hence this patch renames mem_size.
iallocator: change default for target_groups
Per the design doc, the target_groups request key "if present, it must either be the empty list, or contain a list of group UUIDs". Currently it defaults to None/null, which is not valid.
iallocator: export the hypervisor value
In 'allocate' mode, the documentation specifies that we export the hypervisor value (“Allocation needs, in addition: … hypervisor, the hypervisor of this instance”) and we need that on input, however we don't actually export it....
iallocator: fix incomplete refactoring
Commit fdbe29ee changed the iallocator modes from 'r'/'w' to 'ro'/'rw', but forgot one check in LUTestAllocator. This patch just completes the replacements.
gnt-node migrate: Use LU-generated jobs
Until now LUNodeMigrate used multiple tasklets to evacuate all primary instances on a node. In some cases it would acquire all node locks, which isn't good on big clusters. With upcoming improvements to the LUs for instance failover and migration, switching to separate jobs looks...
TLReplaceDisks: Move assertion checking locks
Commit 1bee66f3 added assertions for ensuring only the necessary locks are kept while replacing disks. One of them makes sure locks have been released during the operation. Unfortunately the commit added the check...
Fix bug in LUNodeMigrate
Commit aac4511a added CheckArguments to LUNodeMigrate with a call to _CheckIAllocatorOrNode. When no default iallocator is defined, evacuating a node would always fail:
$ gnt-node migrate node123
Migrate instance(s) '...'?
y/[n]/?: y...
node evac: don't call IAllocator if no instances
Currently we generate an empty list only for the '-n node' invocation, but for iallocator we still call the iallocator (which needs an RPC call, etc.). By moving the computation of instances outside of the if...
Fix a couple of style mistakes
Cluster verify: check for nodes/instances with no group
Previously, all nodes and instances would always be visited/verified. By driving the verification by node group now, we will miss nodes and instances that can't be reached from existing node groups, should that rare...
Cluster verify: fix LV checks for split instances
When sharding by group, if a mirrored instance is split (primary and secondary) between two groups, its volumes will not be properly checked: the group of the primary will warn about a missing volume in the secondary,...
Cluster verify: make NV_NODELIST smaller
To cope with increasing cluster sizes, we now make nodes try to contact all other nodes in their group, and one node from every other group.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
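The selection rule described in this commit (all other nodes in the own group, plus one node from every other group) can be sketched as below. The function name and data layout are hypothetical, and picking the first node in sorted order as the representative of a foreign group is an assumption; the commit message only says “one node”.

```python
def nodes_to_contact(node, groups):
  """Returns the nodes that "node" should try to contact.

  "groups" maps a group name to the list of node names in that group.

  """
  targets = []
  for group_name in sorted(groups):
    members = sorted(groups[group_name])
    if node in members:
      # Own group: every other member
      targets.extend(name for name in members if name != node)
    elif members:
      # Other group: a single representative (assumed: first by name)
      targets.append(members[0])
  return targets


groups = {
  "g1": ["node1", "node2", "node3"],
  "g2": ["node4", "node5"],
  "g3": [],
  }
```

With this rule the per-node list grows with the size of the node's own group plus the number of groups, rather than with the total number of nodes in the cluster.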
Cluster verify: verify hypervisor parameters only once
The list of all hypervisor parameters has to be computed in LUClusterVerifyGroup, since it needs to be passed to nodes as NV_HVPARAMS. However, it is better only to verify said parameters once, out of LUClusterVerifyConfig....
Split LUClusterVerify into LUClusterVerify{Config,Group}
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts...
Cluster verify: factor out error codes and functions
We move all error code definitions, plus the _Error and _ErrorIf helpers, to a private _VerifyErrors mix-in class that can be later shared by the two new cluster verify LUs.
(_Error and _ErrorIf code was moved around verbatim, except to disable...
Cluster verify: make "instance runs in wrong node" node-driven
Previously, the "instance should not be running in this node" error was computed by verifying, for each instance, whether any node other than its primary was running it. But this is not a well-suited approach if we were...
Verify an absent vm_capable node for files
If we're not verifying all nodes, adding a node outside the current group for file checksums helps us make sure checksums are the same in all of the cluster.
Cluster verify: master must be present for _VerifyFiles
This commit prepares the call to _VerifyFiles for the case when the master node is not one of the nodes that's being verified (which will be the case for all node groups but one). We fix it by always passing master info and...
Cluster verify: don't assume we're verifying all nodes/instances
This commit fixes a few initial simple cases in which it was assumed that we're always working over the whole cluster. With this change, we differentiate between "nodes/instances to verify" and "checks that need...
Cluster verify: gather node/instance list in CheckPrereq
This commit introduces no behavior changes, and is only a minor refactoring that aids with a cleaner division of future LUClusterVerify work. The change consists in:
- substitute the {node,instance}{list,info} structures previously created...
Merge remote branch 'origin/devel-2.4'