History | View | Annotate | Download (475.7 kB)
Merge branch 'devel-2.4' into stable-2.5
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix extra whitespace
Sorry, didn't catch this before…
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>Reviewed-by: Iustin Pop <iustin@google.com>(cherry picked from commit 54b010cad1ea0a536ed037bf315a04dd1c079964)...
Further fixes concerning drbd port release
Commit 3b3b1bc does not entirely fix the bug introduced in commitf396ad8. It fixes consistency of config data in permanent storage, butdoes not ensure consistency in data held in runtime memory of masterd.
The bug of duplicate ports is still triggered when LUInstanceRemove()...
Fix a bug concerning TCP port release
Commit f396ad8 returns the TCP port used by DRBD disk back to theTCP/UDP port pool using AddTcpUdpPort().
However, AddTcpUdpPort() writes the config on every invocation,using _WriteConfig(). This causes two problems:...
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
LUOobCommand: acquire BGL in shared mode
Fixed a typo so that now LUOobCommand acquires the BLG in shared mode, asintended.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
LUNodeAdd: Verify version in Prereq
There are other ways to leave the cluster in a broken state than justthe version check. However they are not very trivial to fix in 2.5. Soleave it up to 2.6 for a nicer fix.
Signed-off-by: René Nussbaumer <rn@google.com>...
LUClusterVerifyConfig: Share BGL, acquire all locks in shared mode
Instead of acquiring the BGL in exclusive mode (which blocks all otheroperations), we acquire all locks for groups, nodes and instances inshared mode before verifying the configuration....
Fix type error in LUInstanceChangeGroup
If a specific list of groups has been requested, then the code usedthat, without transforming it to a (frozen)set first, which resultsin:
unsupported operand type(s) for &: 'list' and 'frozenset'
Trivial fix is to do that in the 'then' branch....
Fix cluster verification issues on multi-group clusters
This patch attempts to fix a number of issues with “gnt-cluster verify”in presence of multiple node groups and DRBD8 instances split over nodesin more than one group.
- Look up instances in a group only by their primary node (otherwise...
Migrate: don't check for free memory on cleanup
Cleanup just updates the config with the correct location of theinstance, or informs of its down status, but never starts it. As suchthere's no point in checking for enough free memory. Actually this check...
LUGroupAssignNodes: Fix node membership corruption
Note: This bug only manifests itself in Ganeti 2.5, but since theproblematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes” the...
Fix pylint warning on unreachable code
Commit c50452c3186 added an exception when all instances should beevacuated off a node, but did so in a way which made pylint complainabout unreachable code.
LUNodeEvacuate: Disallow migrating all instances at once
There is a design issue in the iallocator interface which prevents usfrom doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
LUNodeEvacuate: Locking fixes
When evacuating a node, only an assertion without informative text wasused to check if the necessary node locks had been acquired. This was ontop of evaluating the list of nodes without having a node group lock, sothis was changed as well....
Fix error when removing node
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list.Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Ensure unused ports return to the free port pool
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() arereturned to the pool of free ports when no longer needed:
Fail if node/group evacuation can't evacuate instances
If an instance can't be evacuated, only a message would be printed. Withthis change the operation always aborts. Newly added unittests check forthis behaviour.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
LUInstanceRename: Compare name with name
… instead of object with name.
LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
Instances are modified if their disk size doesn't match.
OpGroupVerifyDisks: Fix wrong result type declaration
If an instance had actually a missing disk, the type check would fail.
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain allfiles (e.g. config.data). With commit aef59ae764dc (March 31st, 2011)the logic was changed and subsequently verifying a cluster with non-mcnodes would complain....
Fix adding nodes after commit 64c7b3831dc
Commit 64c7b3831dc changed the RPC call for verifying SSH connections.Unfortunately this case in adding nodes was missed.
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes inthe same group, as well as the first node for every other group. On bigclusters this can cause issues since many nodes will try to connect to...
Fix handling of cluster verify hooks
The change to enforce boolean results for cluster verify group opcodemissed the HooksCallBack, which uses a very ugly 1/0logic. Furthermore, the logic is wrong, since it unconditionallyresets the verify result to true....
Redistribute the RAPI certificate
This reverts to the old behaviour in Ganeti 2.4 and before.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
DeprecationWarning fixes for pylint
In version 0.21, pylint unified all the disable-* (and enable-*)directives to disable (resp. enable). This leads to a lot ofDeprecationWarning being emitted even if one uses the recommendedversion of pylint (0.21.1, as stated in devnotes.rst)....
Two more PEP8 fixes
cmdlib: Avoid wrapping using backslash
gnt_group: Avoid * magic using keyword arguments (the “pep8” tooldoesn't like the inline comment in this case and will complain aboutspaces around the “*” operator)
PEP8 style fixes
Identified using the “pep8” utility.
Documentation fix for importing with --src-dir option
Signed-off-by: Agata Murawska <agatamurawska@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>(cherry picked from commit b7d7876bd0e9844fab8be28bfa1fd5d563ec7412)
Conflicts:
lib/cmdlib.py (easily fixed)
Signed-off-by: Agata Murawska <agatamurawska@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Small improvements for cluster verify
- Check if BGL is actually owned- Show group name as feedback
Allow locking to be used via OpQuery
The original design for query2 specifically excluded locking, but nowit's turned out that it would be a good thing to have in watcher. Thispatch adds a new parameter to OpQuery and enables its use in LUQuery. Amissing function is added to LUGroupQuery, a comment clarified in...
Change OpClusterVerifyConfig's result, verify results
This patch removes the list of node groups (not used anymore sincecommit fcad7225e3fc) from OpClusterVerifyConfig's result and adds resultverification to all OpClusterVerify* opcodes.
Use LU-generated jobs for verifying cluster
This patch moves the logic for verifying the various node groups in acluster into the master daemon. Job dependencies are used to ensure theconfiguration, which requires the BGL, is verified first.
With this change it will be possible to expose whole-cluster...
Allow fixing of split instances via relocate
Currently, the IAllocator code requests strictly that the (set of) groups ofthe nodes we're relocating from is equal to the set of groups we'rerelocating to.
This, however, makes is impossible to fix split instances, since (by...
Further cleanup after multi-evacuate removal
Commit f0edfcf6 removed the parsing of multi-evacuate result, but thecode went from:
if mode in (multi-evac, relocate): … if mode relocate: …
to:
if mode relocate: … if mode == relocate...
Fix bug in IAllocator parsing of Evacuate result
Commit 342f9172 added stricter checks for the iallocator result inevacuate mode, but it does this irrespective of the resultstatus. When the result has failed and (according to the design) thelist of nodes is empty, this code will trigger the following:...
Remove iallocator's “multi-evacuate” mode
It is no longer used and has been deprecated in 2.5.
Add docstring to cmdlib.TLReplaceDisks._FindFaultyDisks
LUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result
… instead of getting the list of instances once again from theconfiguration.
cmdlib: Factorize checking node groups' instances
Remove 15-second sleep from LUInstanceCreate
Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate alreadycalls _WaitForSync with oneshot=True, which already performs an internalwait-loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Add a readability alias
lu.glm.list_owned becomes lu.owned_locks, which is clearer for thereader.
Also rename three variables (which were before named owned_locks) tomake clearer what they track.
Add opcode to change instance's group
This is quite similar to evacuating a group, but the lockingis different.
Factorize checking instance's node groups
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculatinga group evacuation.
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq- Clarify command synopsis
cmdlib: Factorize getting iallocator
The same logic will be used for changing an instance's group.
Pause DRBD sync for OS install if not wait_for_sync
When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD syncin the background while performing the rest of the installation steps,including OS installation.
However, OS installation is a very disk-intensive task that intereferes badly...
Optimise use of repeated/looping GetInstanceInfo
Similar to the previous patch, this adds a helper function toeliminate repeated calls info ConfigWriter.
Optimise use of repeated/looping GetNodeInfo
This adds a new ConfigWriter.GetMultiNodeInfo function and replacesmultiple/looping calls to GetNodeInfo with it.
Fix types passed to IAllocator
Iallocator mode reloc, parameter reloc_from takes a list; half of thecode already forced this parameter to list, we add the other two caseswhere it is needed.
Add primary/second nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieveinstances by group.
Remove requirement for variants on OS API v15+
This removes:
- the check in backend that such OSes have a variants file or if it exists that is non-empty; in order for this to work, we also rework the logic in backend._TryOSFromDisk to allow for optional OS files...
Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to takeinto account offline nodes.
The fact that this was not detected yet is due to the fact that wedon't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>...
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is stillrecorded.
Add helper for declaring all locks shared
This patch adds a function for abstracting“dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicateassignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper...
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher,would lock all nodes and instances. With this patch the opcodeis changed to operate on per nodegroup, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the...
cmdlib: Give instance name in error message on group evacuation
cmdlib: Factorize mapping instance LVs to node/volume
Most boring patch ever
s/'/"/ in (hopefully) the right places.
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message“Error checking node $node: Node is marked offline” if the instance'sprimary node is marked offline and the user didn't explicitely request...
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no needto require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge ofmigration and failover are also fixed by this patch....
Merge branch 'devel-2.4'
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just returnthe LV name, but to include the volume group name (e.g.“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volumenames in LUNodeQueryvols, stopping instance names from displayed in...
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Add error state to LUGroupEvacuate's exceptions
Add new opcode for evacuating group
Fix node evacuation
- Adjust for new iallocator result format- Split some code into helper functions
Add gnt-instance start --pause
Creates the instance, but pauses execution before booting. This combinedwith 'gnt-instance console' unpausing instances means that the entireboot process can be viewed and monitored.
Signed-off-by: Stephen Shirley <diamond@google.com>...
Remove old node evacuation opcode
LUNodeEvacStrategy has been replaced with LUNodeEvacuate.
Add new opcode to evacuate node
This new opcode will replace LUNodeEvacStrategy, which used to return alist of instances and new secondary nodes. With the new opcode theiallocator (if available) is tasked to generate the necessary operationsin the form of opcodes. This moves some logic from the client to the...
Fix cluster verify for empty node groups
There were some implicit assertions in the code that all node groupshave nodes, which is not necessarily true.
Additionally, the patch does a wrapping change.
Fix bug in recreate-disks for DRBD instances
The new functionality in 2.4.2 for recreate-disks to change nodes isbroken for DRBD instances: it simply changes the nodes without caringfor the DRBD minors mapping, which will lead to conflicts in non-empty...
Fix a lint warning
Patch db8e5f1c removed the use of feedback_fn, hence pylint warnnow.
Fix bug in drbd8 replace disks on current nodes
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) hasa bug in that it does modify the instance disk temporarily beforechanging it back to the same value. However, we don't need to, andshouldn't do that: what this operation do is simply change the LVM...
Conflicts: lib/cmdlib.py - use RequireSharedFileStorage there
Signed-off-by: Guido Trotter <ultrotter@google.com>...
LUInstanceCreate: use opcodes.RequireFileStorage
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Add one forgotten element to the file disk path
This was left out during the fix/refactoring
Conflicts: lib/cmdlib.py - use constants.DTS_FILEBASED...
Add DTS_FILEBASED constant
LUInstanceCreate: fix file storage dir calculation
- Move the calculation at the beginning of CheckPrereq, since it doesn't modify any state, but still keeps locks- Only perform the calculation if the actual disk template is filebased- Error out if there is no defined file storage dir...
Check that filestorage is enabled when requested
Remove self.op.file_storage_dir isabs check
As the manpage says, and the code does, self.op.file_storage_dir is anadditional relative path under the cluster file storage dir. As such itshould not be absolute.
Replace iallocator's mreloc w/ change-group and node-evac
This patch removes all occurrences of the “multi-relocate” iallocatormode. Commit 25ee7fd845 updated the design document and introducedseparate modes, “change-group” and “node-evacuate”. The constants aren't...
Fix locking issues in LUClusterVerifyGroup
- Use functions in ConfigWriter instead of custom loops- Calculate nodes only once instances locks are acquired, removes one potential race condition- Don't retrieve lists of all node/instance information without locks...
cmdlib: Acquire BGL for LUClusterVerifyConfig
LUClusterVerifyConfig verifies a number of configuration settings. Fordoing so, it needs a consistent list of nodes, groups and instances. Sofar no locks were acquired at all (except for the BGL in shared mode)....
Export/import instance tags
Fix issue with tags on instance creation
Commit 720f56c85a added the ability to specify tags when creating aninstance. The “tags” attribute of an instance object needs to be a set,but the patch's code saved it as a list, causing breakage in other parts...
Export instance tags to instance hooks
Instance hooks now get an INSTANCE_TAGS environment variable, which contains aspace-delimited list of the affected instance's tags.
Also update the documentation to reflect the change.
Add tag handling to {Op,LU}InstanceCreate
Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this isintended for OpTagsSet and thus:
a) is not documented b) does not carry a default value, making it mandatory
Also pass the tags to the iallocator during instance creation....
iallocator: add ht-checking for the request
Currently, we only ht-check the result value from the iallocator, andwe send whatever we happen to check manually in the LUs that call theiallocator.
This is not good, as we have to duplicate checks in many places, and...
iallocator: rename mem_size to memory
Currently, the iallocator in 'allocate' requires mem_size on inputbut serialises that as 'memory'. This inconsistency makes it hard toautomatically validate the parameters, hence this patch renamesmem_size.
iallocator: change default for target_groups
Per the design doc, the target_groups request key "if present, it musteither be the empty list, or contain a list of group UUIDs". Currentlyit defaults to None/null, which is not valid.