Cluster verify: verify hypervisor parameters only once
The list of all hypervisor parameters has to be computed in LUClusterVerifyGroup, since it needs to be passed to nodes as NV_HVPARAMS. However, it is better only to verify said parameters once, out of LUClusterVerifyConfig....
Split LUClusterVerify into LUClusterVerify{Config,Group}
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts...
Cluster verify: factor out error codes and functions
We move all error code definitions, plus the _Error and _ErrorIf helpers, to a private _VerifyErrors mix-in class that can later be shared by the two new cluster verify LUs.
(_Error and _ErrorIf code was moved around verbatim, except to disable...
Cluster verify: make "instance runs in wrong node" node-driven
Previously, the "instance should not be running in this node" error was computed by verifying, for each instance, whether any node other than its primary was running it. But this is not a well-suited approach if we were...
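A minimal sketch of what a node-driven version of this check could look like. The function name and the two dictionaries are hypothetical and only illustrate the idea of iterating over nodes rather than instances; this is not the code from the commit.

```python
from typing import Dict, List, Tuple


def find_misplaced_instances(
    running_on: Dict[str, List[str]],
    primary_of: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (node, instance) pairs where a node runs an instance
    for which it is not the configured primary.

    running_on: hypothetical map of node name -> instances it reports running.
    primary_of: hypothetical map of instance name -> primary node name.
    """
    errors = []
    for node in sorted(running_on):
        for inst in sorted(running_on[node]):
            # Instances missing from the configuration are ignored here;
            # a real verifier would report those separately.
            if inst in primary_of and primary_of[inst] != node:
                errors.append((node, inst))
    return errors
```

Because the loop only looks at the nodes it was given, the same check works when only one node group is being verified.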
Verify an absent vm_capable node for files
If we're not verifying all nodes, adding a node outside the current group for file checksums helps us make sure checksums are the same in all of the cluster.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Cluster verify: master must be present for _VerifyFiles
This commit prepares the call to _VerifyFiles for the case when the master node is not one of the nodes being verified (which will be the case for all node groups but one). We fix it by always passing master info and...
Cluster verify: don't assume we're verifying all nodes/instances
This commit fixes a few initial simple cases in which it was assumed that we're always working over the whole cluster. With this change, we differentiate between "nodes/instances to verify" and "checks that need...
Cluster verify: gather node/instance list in CheckPrereq
This commit introduces no behavior changes, and is only a minor refactoring that aids with a cleaner division of future LUClusterVerify work. The change consists in:
- substitute the {node,instance}{list,info} structures previously created...
Merge remote branch 'origin/devel-2.4'
iallocator: Stricter check for multi-evac result
Check new secondary nodes' group like it's already done for multi-relocation requests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cmdlib: Use ganeti.ht for checking iallocator result
cmdlib: Remove punctuation from error messages
Add new iallocator mode to LUTestAllocator
cmdlib.IAllocator: Add multi-relocate support
Introduce instance start/stop no_remember attribute
This will allow stopping or starting an instance without changing the remembered state. While this seems counter-intuitive at first (it will create cluster verify errors), it can help in a few corner cases:...
cmdlib.IAllocator: Fewer temporary variables
Reduce the number of temporary variables and generate dictionaries in one go.
TLMigrateInstance: do not migrate to self
Check that the instance is not being migrated to its current primary node during CheckPrereq. Otherwise migration is aborted because the instance is already running and cleaned-up, which causes the running instance to be killed....
cmdlib.IAllocator: Use lookup table for mode-specific data
Try to prevent instance memory changes N+1 failures
There are multiple bugs in the code checking for N+1 failures during instance memory changes, and fixing them needs significant changes; in the meantime we can at least:
- change the warning message into an error (--force will skip checks)...
Remove references to acquired_locks
These sneaked in from 2.4 during the merge, but this attribute is actually gone in the master branch.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Merge branch 'devel-2.4'
Use the new dry-run mode in cmdlib
This will hopefully detect potential LVM (or any other storage, when they implement it) issues before committing changes just on some nodes.
Unfortunately due to the dry_run opcode handling, we can't integrate this into the usual handling (as we need to activate the disks before...
Implement grow dry-run at RPC level
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
cmdlib: Sort nodes for OOB commands
Also reorder the methods to match all other LUs.
cmdlib: Use helper for expanding nodes for OOB commands
cmdlib: Expand instances using helper for repairing disks
Also change the way “share_locks” is filled.
Fix bug introduced in commit 0d5a0b96
When removing “acquired_locks” in commit 0d5a0b96, I didn't remember that it does not contain the Big Ganeti Lock.
Fix lock release in TLMigrateInstance
Commit 52f33103 introduced lock release factorization, replacing manual lock release using utility functions. However, it broke TLMigrateInstance due to a typo (passing the Tasklet to ReleaseLocks instead of the parent LU). We fix this by passing the LU to...
cmdlib: Remove acquired_locks attribute from LUs
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the...
cmdlib: Use local alias for lock manager
Saves some typing and we'll use it more often in the future.
Add --no-wait-for-sync when converting to drbd
Currently, when converting an instance from plain to DRBD, the instance is blocked during the entire resync period. This patch adds the --no-wait-for-sync option so that the operation finishes as soon as the DRBD sync has started, without waiting for the entire sync. This makes...
Recreate instance disks: allow changing nodes
This patch introduces the option of changing an instance's nodes when doing the disk recreation. The rationale is that currently if an instance lives on a node that has gone down and is marked offline, it's not possible to re-create the disks and reinstall the instance on...
Fix instance failover/migration w.r.t TLMigrateInstance
Commit 1c6e5787 removed the iallocator and target_node keyword parameters from TLMigrateInstance, but I didn't update their use in LUInstanceFailover and (not fully) in LUInstanceMigrate.
Signed-off-by: Iustin Pop <iustin@google.com>...
Rename instance: only show new name when different
It makes no sense to show messages like:
Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to 'instance18'
So we'll skip the message if the resolved name is identical to the requested one....
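The rule described here can be sketched as follows. The helper name and its `log` parameter are made up for illustration; only the message text is taken from the commit message.

```python
import logging


def report_resolved_name(requested, resolved, log=logging.info):
    """Log the resolved name only when it differs from the requested one.

    Hypothetical helper; returns True if a message was emitted.
    """
    if resolved != requested:
        log("Resolved given name '%s' to '%s'", requested, resolved)
        return True
    return False
```

Passing the logger as a parameter is only a convenience for this sketch, so the behaviour can be exercised without configuring logging.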
Fix race condition in LUGroupAssignNodes
The original code would get all node information and their groups before acquiring the necessary locks. With this patch the node information is only retrieved once all locks have been acquired. Groups are locked optimistically and verified after acquiring the node locks....
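The pattern described above (guess the groups without locks, acquire locks, then re-read and verify) can be sketched as below. This is a simplified illustration using plain `threading.Lock` objects; the function, the `locks` mapping and the error type are assumptions and do not reflect Ganeti's lock manager API.

```python
import threading
from collections import defaultdict
from contextlib import ExitStack


def assign_nodes(node_group, locks, nodes, target_group):
    """Move `nodes` into `target_group`.

    node_group: hypothetical map of node name -> group name (shared state).
    locks: hypothetical map of ("group"|"node", name) -> lock object.
    """
    # Optimistic guess, made without holding any lock: it may be stale.
    guessed = {node_group[n] for n in nodes} | {target_group}
    wanted = [("group", g) for g in sorted(guessed)]
    wanted += [("node", n) for n in sorted(nodes)]
    with ExitStack() as stack:
        for key in wanted:  # groups first, then nodes
            stack.enter_context(locks[key])
        # Authoritative read, done only once every lock is held.
        actual = {node_group[n] for n in nodes} | {target_group}
        if not actual <= guessed:
            raise RuntimeError("node groups changed while waiting for locks")
        for n in nodes:
            node_group[n] = target_group


locks = defaultdict(threading.Lock)
```

The verification step matters because another job may have moved a node between the unlocked guess and the moment the node lock was obtained; in that case the held group locks would not cover the node's real group.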
Fix DTS_EXT_MIRROR migration
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances, however it broke migration for DTS_EXT_MIRROR instances, by moving iallocator and node checks from LUInstanceMigrate to TLMigrateInstance. This has the side-effect...
Use node group locking for replacing disks
This is one of the first opcodes to make use of node group locking. To get an instance's node groups, the instance's nodes need to be looked at. Due to a previous design decision nodes are locked after the group,...
TLMigrateInstance: Fix live migration breakage
Commit 77fcff4 unintentionally incorporated code from TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator, presumably during a rebase from earlier versions of the patch to the 2.5 codebase. As a...
cmdlib: Update error messages, remove some punctuation
- Clarify some error messages
- Remove unnecessary punctuation
- Merge two if conditions in one place
cmdlib: Fix typo, s/nick/NIC/
A small optimisation in cluster verify
This removes (count of instances + count of nodes) lock acquires/releases.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
A few docstring fixes
At least one generates an epydoc error :)
Cluster verify: check for missing bridges
Currently cluster verify doesn't check for bridge information; the only checks are done at instance create and failover/migrate time. This means a cluster that seems healthy will fail creation jobs.
This patch implements a simple verification that all nodes (in the...
cmdlib: Factorize lock releasing
There will be more lock releasing with upcoming changes, so this will centralize the logic behind it (what locks to keep, which variables to update, etc.).
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
cmdlib: Drop SSH runner from LU base class
It is no longer used.
cmdlib.py: fix indentation in _VerifyNode
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
TLMigrateInstance: Fix confusing text
Commit d5cafd31 changed this error message, swapping the text parts in the process.
LUInstanceRename: Amend comment about lock
Also add an assertion.
iallocator: Relocation nodes must be in same group
Quoting from iallocator.rst: “[…] ``relocate`` request is used when an existing instance needs to be moved within its node group […]”.
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Fix for multiple VGs - PlainToDrbd and replace-disks
When converting an instance from 'plain' to 'drbd', the old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
node1# gnt-node add --readd node1
Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
masterd: Add support for tagging node groups
TLMigrateInstance: remove 10s sleeps
TLMigrateInstance._ExecMigration contains two 10-second sleeps between individual migration steps.
Apart from prolonging the migration duration by 20s, the second sleep causes FinalizeMigration to be called 10 seconds after the real...
Fix typo in LUGroupAssignNodes
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:...
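A small sketch of the int-versus-float issue. The constant names and values (a 1024 MiB cap and a 10 percent factor) are assumptions chosen to be consistent with the 10GiB threshold mentioned above, and wrapping the result in `int()` is only one plausible fix, not necessarily the one applied in the commit.

```python
# Assumed values for illustration only.
MAX_WIPE_CHUNK = 1024          # MiB, integer upper bound
MIN_WIPE_CHUNK_PERCENT = 10    # percent of the disk size


def wipe_chunk_size_float(disk_size_mib):
    """min(int, float): returns a float whenever the percentage is smaller."""
    return min(MAX_WIPE_CHUNK, disk_size_mib / 100.0 * MIN_WIPE_CHUNK_PERCENT)


def wipe_chunk_size_int(disk_size_mib):
    """Same formula, forced to an integer number of MiB."""
    return int(min(MAX_WIPE_CHUNK,
                   disk_size_mib / 100.0 * MIN_WIPE_CHUNK_PERCENT))
```

With these assumed constants, any disk smaller than 10240 MiB makes the percentage term the smaller one, so the first function hands a float chunk size to whatever consumes it, while disks at or above that size get the integer cap.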
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel....
Nicer formatting for group query error
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
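The difference comes down to interpolating the list itself versus joining its elements first. The sketch below uses hypothetical function names and runs under Python 3, where the list representation has no `u` prefix, but the effect is the same.

```python
def format_missing_groups_old(names):
    # Interpolating the list uses its repr, brackets and quotes included.
    return "Some groups do not exist: %s" % (names,)


def format_missing_groups(names):
    # Joining first yields a plain comma-separated list of names.
    return "Some groups do not exist: %s" % ", ".join(names)
```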
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
TLMigrateInstance: Merge failover code, allow fallback
As the checking code for failover is almost identical, it's an easy task to switch it over to TLMigrateInstance. This allows us to fall back to failover if migrate fails the prereq check for some reason....
Verify file consistency using centrally computed list
Until now “gnt-cluster verify” (LUClusterVerify) would compute its own list of files to check for consistency. This list was not complete and certain inconsistencies were missed.
With this patch the code is changed to use the list of files used by...
cmdlib: Factorize computation of ancillary files
… and change the logic in _RedistributeAncillaryFiles. Virtually the same list of files will be used to verify the files' consistency.
cmdlib: Fix mistake made in commit 75c7520f0
Commit 75c7520f0 used the wrong constant. I double-checked all other changes made in the commit.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
cmdlib: Replace hardcoded values with constants
Display the actual memory values in N+1 failures
This changes the display from:
Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
Relax instance ERROR on admin_down on offline node
This fixes an issue where a stopped instance is reported as ERROR in cluster verify if it lives on an offline node. As the instance is down this shouldn't happen.
Signed-off-by: René Nussbaumer <rn@google.com>...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Split BuildHooksEnv of LUs
Commit dd7f677623 added another call to BuildHooksEnv to provide post-phase status variables. Since BuildHooksEnv also built the node lists, that meant they have to be built twice. First a rather strict check was used, but it turned out to be more tricky. Commit b423c51336...
Fix hook node list when adding node
This broke QA (and everyone trying to add a node) by complaining about different node lists.
cmdlib: Factorize running post-phase hook
masterd: Simplify code for field queries
Instead of going via cmdlib and using special cases for different resources, the list of fields is used directly.
constants: Rename QR_OP_*, add QR_VIA_RAPI
Commit 28b71a76 added a list of resources which can be queried using LUXI. Unfortunately the variable was named “QR_OP_LUXI”, which can be confusing. This patch renames “QR_OP_QUERY” to “QR_VIA_OP”, “QR_OP_LUXI”...
TLReplaceDisks: Add check if disks are activated
Previously we failed later with a rather useless error message. This patch fixes this and tells the user to activate-disks if replace-disks needs activated disks rather than abort with a cryptic error...
LUOsDiagnose: Move legacy behaviour into filter
The behaviour of LUOsDiagnose needs special treatment. Commit d22dfef7 changed it to not return hidden, blacklisted or invalid OSes if the respective field is not requested. This behaviour needs to be preserved...
Convert OsDiagnose to query
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default from the cmdline, which causes problems for cluster-merge. This patch means that the following now works: gnt-cluster modify --node-parameters oob_program=...
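One way to picture "empty means default" is a parameter update step that maps an empty string back to the default value. The function, the dictionaries and the example default of `None` are all assumptions made for this sketch; they are not taken from the patch.

```python
def apply_node_params(current, updates, defaults):
    """Return a new parameter dict with `updates` applied.

    An empty oob_program value resets the parameter to its default
    instead of storing the empty string (hypothetical helper).
    """
    result = dict(current)
    for key, value in updates.items():
        if key == "oob_program" and value == "":
            result[key] = defaults[key]
        else:
            result[key] = value
    return result
```

For example, applying `{"oob_program": ""}` on top of a custom program path would yield whatever `defaults["oob_program"]` holds.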
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as an index without checking.
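A minimal sketch of the defensive lookup this implies. The function and both dictionaries are hypothetical stand-ins for the node's report and the cluster configuration.

```python
def split_reported_instances(reported, configured):
    """Separate instances a node reports into known and orphan ones.

    reported: hypothetical map of instance name -> live data from a node.
    configured: hypothetical map of instance name -> configuration object.
    """
    known = {}
    orphans = []
    for name, live in reported.items():
        cfg = configured.get(name)  # no KeyError for unknown names
        if cfg is None:
            orphans.append(name)
            continue
        known[name] = (cfg, live)
    return known, sorted(orphans)
```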
Instance failover: fix bug for INT_MIRROR cases
Patches db366d9a and aac4511a added support for EXT_MIRROR instances, but inadvertently introduced a bug: for INT_MIRROR cases, we don't need (actually we can't support) either an iallocator or a target node....
OpOobCommand: Adding power on delay
This delays the invocation of the power on of the next node. So if you power on a bunch of nodes it will not blow the fuse.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
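The staggering idea can be sketched as a loop that waits between consecutive power-on calls. The function name, the `power_on` callback and the injectable `sleep` are assumptions for illustration and not the opcode's actual interface.

```python
import time


def power_on_nodes(nodes, power_on, delay, sleep=time.sleep):
    """Power on nodes one by one, waiting `delay` seconds between them.

    power_on: hypothetical callable taking a node name.
    """
    for idx, node in enumerate(nodes):
        if idx > 0:
            # No wait before the first node, only between nodes.
            sleep(delay)
        power_on(node)
```

Spreading the commands out limits how many machines draw their start-up current at the same moment.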
Shared storage instance failover
Modify LUFailoverInstance to enable shared storage instances to failover. Shared storage instance failover requires either a target node or an iallocator to determine the target node. If none is given, the cluster default...
Rename DTS_NET_MIRROR to DTS_INT_MIRROR
DTS_INT_MIRROR better contrasts DTS_EXT_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: updated patch for changed context]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Shared storage node migration
Modify LUNodeMigrate to provide node migration for nodes with instances using shared storage. gnt-node migrate has to be passed an iallocator for migration of shared storage instances to be performed. When using a shared storage...
Shared block storage support
This patch introduces basic shared block storage support.
It introduces a new storage backend, bdev.PersistentBlockDevice, to use as a backend for shared block storage. The new bdev requires a new BLOCKDEV_DRIVER_MANUAL constant with the value "manual" and uses it as...
IAllocator changes to work with shared storage
Make cmdlib.IAllocator shared-storage-aware. IAllocator requires secondary nodes only on DTS_NET_MIRROR disk templates and requires no secondaries for DTS_EXT_MIRROR templates.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Shared storage instance migration
Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations for instances with DTS_EXT_MIRROR disk templates.
Migrations of shared storage instances require either a target node, or an iallocator to determine the target node. If none is given, the cluster default...
Merge branch 'stable-2.4'
LUInstanceRename: Fail if renamed hostname mismatch
There's a problem if you run gnt-instance rename with a non FQDN and the renamed LU tries to resolve the hostname to make it FQDN. It could be that this resolved hostname was just a CNAME to another name which leads...
Remove deprecated 'bridge' nic parameter
This has been a synonym for "link" since a few major versions. Add a NEWS entry so we won't forget to mention it at release time.
query: Fix bug when names are specified
If the client/caller would specify names through the use of a filter, the result would be sorted. This is a regression over earlier Ganeti versions and verified in QA. This patch adds an optional parameter to control the sorting and provides unittests....
Core shared file storage support
This patch introduces core file storage support, consisting of the following:
A configure-time switch for enabling/disabling shared file storage support and controlling the shared file storage location: --with-shared-file-storage-dir=. Shared file storage configuration is then...