History | View | Annotate | Download (405.8 kB)
TLReplaceDisks: Move assertion checking locks
Commit 1bee66f3 added assertions for ensuring only the necessary locksare kept while replacing disks. One of them makes sure locks have beenreleased during the operation. Unfortunately the commit added the check...
node evac: don't call IAllocator if no instances
Currently we generate an empty list only for the '-n node' invocation,but for iallocator we still call the iallocator (which needs an RPCcall, etc.). By moving the computation of instances outside of the if...
Introduce instance start/stop no_remember attribute
This will allow stopping or starting an instance without changing theremembered state. While this seems counter-intuitive at first (it willcreate cluster verify errors), it can help in a few corner cases:...
Try to prevent instance memory changes N+1 failures
There are multiple bugs with the code checking for N+1 failures in theinstance memory changes which needs significant changes, in themeantime we can at least:
- change the warning message into an error (--force will skip checks)...
Add --no-wait-for-sync when converting to drbd
Currently, when converting an instance from plain to DRBD, theinstance is blocked during the entire resync period. This patch addsthe --no-wait-for-sync so that the operation finishes as soon as theDRBD sync has started, without waiting for the entire sync. This makes...
Recreate instance disks: allow changing nodes
This patch introduces the option of changing an instance's nodes whendoing the disk recreation. The rationale is that currently if aninstance lives on a node that has gone down and is marked offline,it's not possible to re-create the disks and reinstall the instance on...
Rename instance: only show new name when different
It makes not sense to show messages like:Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to'instance18'
So we'll skip the message if the resolved name is identical to therequested one....
Fix race condition in LUGroupAssignNodes
The original code would get all node information and their groupswithout before acquiring the necessary locks. With this patch the nodeinformation is only retrieved once all locks have been acquired. Groupsare locked optimistically and verified after acquiring the node locks....
cmdlib: Fix typo, s/nick/NIC/
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
A small optimisation in cluster verify
This removes (count of instances + count of nodes) lockacquires/releases.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
A few docstring fixes
At least one generates an epydoc error :)
Cluster verify: check for missing bridges
Currently cluster verify doesn't check for bridge information; theonly checks are done at instance create and failover/migratetime. This means a cluster that seems healthy will fail creation jobs.
This patch implements a simple verification that all nodes (in the...
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire thelocks of all nodes (only the allocator will decide which node to use).Unfortunately the unneeded locks were not released during the operation,...
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for themeta device during the creation of instances and addition of disks viagnt-instance modify.
Signed-off-by: Iustin Pop <iustin@google.com>...
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names,instead of a single one.
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code wouldcreate the drbd volumes in the default VG and then the renames wouldfail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keepingthe meta device in the same VG, as opposed to moving it to the datadevice VG (note that we don't have a way to create the meta in adifferent VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check ingnt-node itself, then bootstrap.SetupNodeDaemon will restart themaster daemon, making the operation fail:
node1# gnt-node add --readd node1 Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
Fix typo in LUGroupAssignNodes
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value,float_value). For small disks (below 10GiB), the actual formula willresult into the float value being chosen. This results into veryinteresting behaviour:...
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance'sdisks on creation. Unfortunately the code kept all locks while doing thewipe, slowing down the creation of multiple instances in parallel....
Nicer formatting for group query error
Before this patc the message would look like “Some groups do not exist:[u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s)and nodes involved. In combination with long-running operations thisprevented the use of “gnt-instance info”, even with the “--static”...
Display the actual memory values in N+1 failures
This changes the display from:Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancyMon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory toaccomodate instance failovers should node node1 fail...
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default fromthe cmdline, which causes problems for cluster-merge. This patch meansthat the following now works: gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as anindex without checking.
Fix potential data-loss bug in disk wipe routines
For the 2.4 release, we only add the missing RPC calls. However, thisneeds to be fixed properly, by preventing usage of mis-configureddisks.
Also add a bit more logging so that it's directly clear on which node...
Merge branch 'devel-2.4' into stable-2.4
NodeQuery: don't query non-vm_capable nodes
Because non-vm_capable nodes most likely don't have a hypervisorconfigured and/or storage, so the call will fail anyway.
Fix HV/OS parameter validation on non-vm nodes
Currently, there is at least one LU that does wrong validation of HVparameters (against all nodes, LUClusterSetParams). It's possible tofix this case, but I went and modified the base functions to filterout non-vm_capable nodes so all callers are protected....
Fix LUClusterRepairDiskSizes and rpc result usage
This LU was introduced before the RPC result conversion from .data to.payload, and it has managed to keep the old-style usage (how? it'sthe only LU that does so). Fix by changing to payload, and add some...
Fix RPC mismatch in blockdev_getsize[s]
Commit 92fd2250 added consistency checks in the RPC layer, which brokethe call_blockdev_getsizes RPC call (declared with 's' at the end inrpc.py, without 's' in the node daemon).
The immediate fix is to correct the rpc function name, the long term...
Fix bug in iallocator data structures build
Commit a1cef11c fixed non-vm_capable nodes export, but brokeinadvertently offline nodes. The update of the dict only needs tohappen for online nodes, in the 'if' block.
Without this patch, offline nodes keep the data from the last node...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give thesame message:- ERROR: instance instance14: instance lives on offline node(s) node3- ERROR: instance instance15: instance lives on offline node(s) node3...
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline- ERROR: instance instance14: instance lives on offline node(s) node3- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows warnings N+1 warnings for offlinenodes having any redundant instances since the memory data that wehave for those nodes is zero, so any instance will trigger thewarning....
Re-create instance disk symlinks on activate
This patch implements recreation of instance disk symlinks when theactivate-disks operation is run. Until now, it was not possible tore-create these symlinks without stopping and starting or migrating aninstance as the RPC call where this is done was in instance startup...
Export console information as query field
This makes it possible to get the console information via a LUXI query.
Prevent removal of last node group
- Add check in ConfigWriter to prevent last node group from being removed- Tidy up error message a bit
Signed-off-by: Stephen Shirley <diamond@google.com>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix instance list for instances running multiple times
If for some reason (e.g. failed migration) one instance is runningon multiple nodes the output can become inconsistent. To get that errorand make it consistent between runs we make the call on the secondary...
Merge branch 'devel-2.3' into devel-2.4
cluster verify: add hvparams verification
Currently, the validity of the hypervisor parameters is only checkedat init/modification time, and not in the cluster verify. This is bad,as it can lead to inconsistent state that is only detected when thenext modification (which can be unrelated) is made, leading to...
Verify disks: increase parallelism and other fixes
The recent work on multi-VG support has converted LUClusterVerifyDisksinto doing serialised calls to each node, as each node can havedifferent VGs. This is suboptimal, especially for big clusters, where...
Deactivate disks: allow skipping hypervisor checks
In some cases (e.g. the hypervisor not running at all), we might wantto force disk deactivation, skipping the hypervisor checks. I believethis is not a good thing to do all the time, so this patch adds the...
Show hidden/blacklisted OSes in cluster info
Since we can blacklist/hide non-existing OSes (for preseeding), wecannot query easily the OSes themselves for this status. Hence weexport the entire lists in cluster info (which should be cheaper thangnt-os diagnose)....
Fix LUOSDiagnose and non-vm_capable nodes
This skips non-vm_capable nodes in the OS diagnose search, since suchOSes will not be used anyway on those nodes.
Rephrasing two error messages for auto promotion
Using auto_promote or auto-promote can lead to confusion on using theuser facing interfaces. While auto-promote is fine for CLI it's not forRAPI and vice-versa. This patch should eliminate this confusion....
Fix payload check for out-of-band health
This logic error was not detected before as health has not beenimplemented on the cli and therefore no QA code existed for that.
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix premature abort of LUOobCommand due to result.Raise
This is a bug I recognized while doing tests on gnt-node health. A leftover result.Raise line causes premature abort of LUOobCommand on thefirst node failing the RPC call. This is not expected behaviour for...
Modify LUOobCommand to support multiple nodes
This will change the result of this LU to a query like result. A list oftuples with information about the state of the data.
It also includes the modification to the commands calling this opcode.
Signed-off-by: René Nussbaumer <rn@google.com>...
Another fix for LUClusterVerifyDisks
The LVM queries should only be done for vm_capable nodes. In order todo this, we also add a new ConfigWriter method to abstract that query.
Fix disk adoption breakage
Disk adoption is currently broken by 84d7e26b, which added multiple LVMvolume group support. This patch fixes the calls to rpc.call_vg_list,which are multi-node calls but were handled as single-node calls in84d7e26b.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Fix disk count check in LUSetInstanceParams
LUSetInstanceParams checked instance.nics (and not instance.disks)against constants.MAX_DISKS.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Rename OpGetTags and LUGetTags
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Rename OpSearchTags and LUSearchTags
Rename OpAddTags and LUAddTags
Rename OpTestJobqueue and LUTestJobqueue
Rename OpStartupInstance and LUStartupInstance
Rename OpAddNode and LUAddNode
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Rename OpNodeEvacuationStrategy and LUNodeEvacuationStrategy
Rename OpMigrateNode and LUMigrateNode
Rename OpModifyNodeStorage and LUModifyNodeStorage
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Rename OpPowercycleNode and LUPowercycleNode
Rename OpQueryNodes and LUQueryNodes
Rename OpQueryNodeVolumes and LUQueryNodeVolumes
Rename OpQueryNodeStorage and LUQueryNodeStorage
Rename OpRemoveNode and LURemoveNode
Rename OpSetNodeParams and LUSetNodeParams
Rename OpDiagnoseOS and LUDiagnoseOS
Rename OpDelTags and LUDelTags
Rename OpMigrateInstance and LUMigrateInstance
Rename OpMoveInstance and LUMoveInstance
Rename OpQueryInstances and LUQueryInstances
Rename OpQueryInstanceData and LUQueryInstanceData
Rename OpRebootInstance and LURebootInstance
Rename OpRecreateInstanceDisks and LURecreateInstanceDisks
Rename OpReinstallInstance and LUReinstallInstance
Rename OpRemoveInstance and LURemoveInstance
Rename OpRenameInstance and LURenameInstance
Rename OpReplaceDisks and LUReplaceDisks
Rename OpSetInstanceParams and LUSetInstanceParams
Rename OpShutdownInstance and LUShutdownInstance
Rename OpVerifyDisks and LUVerifyDisks
Rename OpAddGroup and LUAddGroup
Rename OpAssignGroupNodes and LUAssignGroupNodes
Rename OpQueryGroups and LUQueryGroups
Rename OpRemoveGroup and LURemoveGroup
Rename OpRenameGroup and LURenameGroup
Rename OpSetGroupParams and LUSetGroupParams
Rename OpActivateInstanceDisks and LUActivateInstanceDisks
Rename OpConnectConsole and LUConnectConsole
Rename OpCreateInstance and LUCreateInstance
Rename OpDeactivateInstanceDisks and LUDeactivateInstanceDisks
Rename OpFailoverInstance and LUFailoverInstance
Rename OpGrowDisk and LUGrowDisk
Rename OpVerifyCluster and LUVerifyCluster