TLMigrateInstance: Fix live migration breakage
Commit 77fcff4 unintentionally incorporated code from TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator, presumably during a rebase from earlier versions of the patch to the 2.5 codebase. As a...
cmdlib: Update error messages, remove some punctuation
- Clarify some error messages
- Remove unnecessary punctuation
- Merge two if conditions in one place
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cmdlib: Factorize lock releasing
There will be more lock releasing with upcoming changes, so this will centralize the logic behind it (what locks to keep, which variables to update, etc.).
Merge branch 'devel-2.4'
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
cmdlib: Drop SSH runner from LU base class
It is no longer used.
cmdlib.py: fix indentation in _VerifyNode
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
TLMigrateInstance: Fix confusing text
Commit d5cafd31 changed this error message, swapping the text parts in the process.
LUInstanceRename: Amend comment about lock
Also add an assertion.
iallocator: Relocation nodes must be in same group
Quoting from iallocator.rst: “[…] ``relocate`` request is used when an existing instance needs to be moved within its node group […]”.
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Signed-off-by: Iustin Pop <iustin@google.com>...
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
node1# gnt-node add --readd node1
Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
masterd: Add support for tagging node groups
TLMigrateInstance: remove 10s sleeps
TLMigrateInstance._ExecMigration contains two 10-second sleeps between individual migration steps.
Apart from prolonging the migration duration by 20s, the second sleep causes FinalizeMigration to be called 10 seconds after the real...
Fix typo in LUGroupAssignNodes
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:...
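The failure mode described above can be sketched as follows. This is a minimal illustration only: the constant names, their values, and the MiB units are assumptions made for the example and are not confirmed by this log. The point is that Python's min() returns whichever argument is smaller without converting its type, so a size computed as a float stays a float unless it is truncated explicitly.

```python
# Illustrative constants; names and values are assumptions for this sketch.
MAX_WIPE_CHUNK = 1024          # upper bound for one wipe chunk, in MiB
MIN_WIPE_CHUNK_PERCENT = 10    # chunk size as a percentage of the disk size


def buggy_chunk_size(disk_size_mib):
    """Return the chunk size the way the commit message describes it."""
    # min() of an int and a float returns the float when the float is smaller.
    return min(MAX_WIPE_CHUNK, disk_size_mib / 100.0 * MIN_WIPE_CHUNK_PERCENT)


def fixed_chunk_size(disk_size_mib):
    """One possible fix: truncate so the chunk size is always an integer."""
    return int(buggy_chunk_size(disk_size_mib))
```

With these assumed values, any disk smaller than 10240 MiB takes the float branch, which matches the "below 10GiB" remark in the message.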
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel....
Nicer formatting for group query error
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
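The difference between the two messages comes down to formatting the names themselves rather than the list object. A minimal sketch, assuming a hypothetical helper name (the actual patch may use a different function):

```python
def format_missing_groups(names):
    """Build the error text from plain names instead of a list repr."""
    # Joining the names avoids the brackets and quotes of the list repr.
    return "Some groups do not exist: %s" % ", ".join(names)
```

Interpolating the list directly with "%s" would print its repr, which is where the brackets and quoting in the old message came from.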
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
TLMigrateInstance: Merge failover code, allow fallback
As the code for failover for checking is almost identical it's an easy task to switch it over to the TLMigrateInstance. This allows us to fall back to failover if migrate fails prereq check for some reason....
Verify file consistency using centrally computed list
Until now “gnt-cluster verify” (LUClusterVerify) would compute its own list of files to check for consistency. This list was not complete and certain inconsistencies were missed.
With this patch the code is changed to use the list of files used by...
cmdlib: Factorize computation of ancillary files
… and change the logic in _RedistributeAncillaryFiles. The virtually same list of files will be used to verify the files' consistency.
cmdlib: Fix mistake made in commit 75c7520f0
Commit 75c7520f0 used the wrong constant. I double-checked all other changes made in the commit.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
cmdlib: Replace hardcoded values with constants
Display the actual memory values in N+1 failures
This changes the display from:
Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
Relax instance ERROR on admin_down on offline node
This fixes an issue where a stopped instance is reported as ERROR in cluster verify if it lives on an offline node. As the instance is down, this shouldn't happen.
Signed-off-by: René Nussbaumer <rn@google.com>...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Split BuildHooksEnv of LUs
Commit dd7f677623 added another call to BuildHooksEnv to provide post-phase status variables. Since BuildHooksEnv also built the node lists, that meant they have to be built twice. First a rather strict check was used, but it turned out to be more tricky. Commit b423c51336...
Fix hook node list when adding node
This broke QA (and everyone trying to add a node) by complaining about different node lists.
cmdlib: Factorize running post-phase hook
masterd: Simplify code for field queries
Instead of going via cmdlib and using special cases for different resources, the list of fields is used directly.
constants: Rename QR_OP_*, add QR_VIA_RAPI
Commit 28b71a76 added a list of resources which can be queried using LUXI. Unfortunately the variable was named “QR_OP_LUXI”, which can be confusing. This patch renames “QR_OP_QUERY” to “QR_VIA_OP”, “QR_OP_LUXI”...
TLReplaceDisks: Add check if disks are activated
Previously we failed later with a rather useless error message. This patch fixes this and tells the user to activate-disks if replace-disks is in the need of activated disks rather than abort with a cryptic error...
LUOsDiagnose: Move legacy behaviour into filter
The behaviour of LUOsDiagnose needs special treatment. Commit d22dfef7 changed it to not return hidden, blacklisted or invalid OSes if the respective field is not requested. This behaviour needs to be preserved...
Convert OsDiagnose to query
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default from the cmdline, which causes problems for cluster-merge. This patch means that the following now works: gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as an index without checking.
Instance failover: fix bug for INT_MIRROR cases
Patches db366d9a and aac4511a added support for EXT_MIRROR instances, but inadvertently introduced a bug: for INT_MIRROR cases, we don't need (actually we can't support) either an iallocator or a target node....
OpOobCommand: Adding power on delay
This delays the invocation of the power on of the next node. So if you power on a bunch of nodes it will not blow the fuse.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
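The idea of staggering power-on requests can be sketched as below. This is not the code from the commit: the function name, its parameters, and the injectable sleep callable are all hypothetical and exist only to show the pattern of waiting between consecutive nodes.

```python
import time


def power_on_nodes(nodes, power_on, delay=2.0, sleep=time.sleep):
    """Call power_on(node) for each node, pausing between invocations."""
    results = []
    for idx, node in enumerate(nodes):
        if idx > 0:
            # Wait before powering on the next node to spread the load.
            sleep(delay)
        results.append((node, power_on(node)))
    return results
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.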
Rename DTS_NET_MIRROR to DTS_INT_MIRROR
DTS_INT_MIRROR better contrasts DTS_EXT_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: updated patch for changed context]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Shared storage instance failover
Modify LUFailoverInstance to enable shared storage instances to failover. Shared storage instance failover requires either a target node or an iallocator to determine the target node. If none is given, the cluster default...
Shared storage node migration
Modify LUNodeMigrate to provide node migration for nodes with instances using shared storage. gnt-node migrate has to be passed an iallocator for migration of shared storage instances to be performed. When using a shared storage...
Shared storage instance migration
Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations for instances with DTS_EXT_MIRROR disk templates.
Migrations of shared storage instances require either a target node, or an iallocator to determine the target node. If none is given, the cluster default...
IAllocator changes to work with shared storage
Make cmdlib.IAllocator shared-storage-aware. IAllocator requires secondary nodes only on DTS_NET_MIRROR disk templates and requires no secondaries for DTS_EXT_MIRROR templates.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Shared block storage support
This patch introduces basic shared block storage support.
It introduces a new storage backend, bdev.PersistentBlockDevice, to use as a backend for shared block storage. The new bdev requires a new BLOCKDEV_DRIVER_MANUAL constant with the value "manual" and uses it as...
Merge branch 'stable-2.4'
LUInstanceRename: Fail if renamed hostname mismatch
There's a problem if you run gnt-instance rename with a non FQDN and the renamed LU tries to resolve the hostname to make it FQDN. It could be that this resolved hostname was just a CNAME to another name which leads...
Remove deprecated 'bridge' nic parameter
This has been a synonym for "link" since a few major versions. Add a NEWS entry so we won't forget to mention it at release time.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
query: Fix bug when names are specified
If the client/caller would specify names through the use of a filter, the result would be sorted. This is a regression over earlier Ganeti versions and verified in QA. This patch adds an optional parameter to control the sorting and provides unittests....
Core shared file storage support
This patch introduces core file storage support, consisting of the following:
A configure-time switch for enabling/disabling shared file storage support and controlling the shared file storage location: --with-shared-file-storage-dir=. Shared file storage configuration is then...
cmdlib: Allow use of more complex filters
This patch finally enables the use of complex filters through opcodes and LUXI.
Fix potential data-loss bug in disk wipe routines
For the 2.4 release, we only add the missing RPC calls. However, this needs to be fixed properly, by preventing usage of mis-configured disks.
Also add a bit more logging so that it's directly clear on which node...
gnt-instance reboot start instance if not yet started
This patch starts the instance when gnt-instance reboot is invoked on an instance already stopped.
Add constants for instance status
They've been hardcoded for too long.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
cmdlib: Fix pylint error
cmdlib: Use filters internally for queries
This is in preparation for implementing real query filters.
Merge branch 'devel-2.4' into stable-2.4
NodeQuery: don't query non-vm_capable nodes
Because non-vm_capable nodes most likely don't have a hypervisor configured and/or storage, so the call will fail anyway.
Fix HV/OS parameter validation on non-vm nodes
Currently, there is at least one LU that does wrong validation of HV parameters (against all nodes, LUClusterSetParams). It's possible to fix this case, but I went and modified the base functions to filter out non-vm_capable nodes so all callers are protected....
Fix LUClusterRepairDiskSizes and rpc result usage
This LU was introduced before the RPC result conversion from .data to .payload, and it has managed to keep the old-style usage (how? it's the only LU that does so). Fix by changing to payload, and add some...
Fix RPC mismatch in blockdev_getsize[s]
Commit 92fd2250 added consistency checks in the RPC layer, which broke the call_blockdev_getsizes RPC call (declared with 's' at the end in rpc.py, without 's' in the node daemon).
The immediate fix is to correct the rpc function name, the long term...
Remove force_master support from LUOobCommand
As per discussion on the man-page [1] update, this functionality should be removed. Instead, the LU should refuse to operate and just give the command to run if the user insists on power cycling or powering off the master.
[1] http://groups.google.com/group/ganeti-devel/browse_thread/thread/95d4879a747cc295...
Make OpGroupRename consistent with OpInstanceRename
OpInstanceRename uses “instance_name” (like almost all other OpInstance* opcodes), not “old_name”, to specify the original name. OpGroupRename is made consistent by renaming “old_name” to “group_name”....
Fix bug in iallocator data structures build
Commit a1cef11c fixed non-vm_capable nodes export, but broke inadvertently offline nodes. The update of the dict only needs to happen for online nodes, in the 'if' block.
Without this patch, offline nodes keep the data from the last node...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give the same message:
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: instance lives on offline node(s) node3...
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows N+1 warnings for offline nodes having any redundant instances since the memory data that we have for those nodes is zero, so any instance will trigger the warning....
Fix execution order in LUOobCommand causing wrongly setup node list
In commit bfceedbe a check was added to put the master at the end or skip it completely. While this functionality works, it was done at the wrong point because node_names was already processed to a node list...
Do not repeatedly call GetClusterInfo() in inner loop
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix docstring for NodeImage.sbp attribute
This was stating "secondary nodes" were the keys of the dictionary, when they are primary nodes. Also, further clarify only the node's secondary instances are included.
Signed-off-by: Adeodato Simo <dato@google.com>...
Add two new opcode options to LUOobCommand
This patch adds ignore_status to ignore the offline flag of nodes and also adds a force_master option to force operations on master node if they will make the master unavailable (for some time).
Re-create instance disk symlinks on activate
This patch implements recreation of instance disk symlinks when the activate-disks operation is run. Until now, it was not possible to re-create these symlinks without stopping and starting or migrating an instance as the RPC call where this is done was in instance startup...
Export console information as query field
This makes it possible to get the console information via a LUXI query.
Prevent removal of last node group
- Add check in ConfigWriter to prevent last node group from being removed
- Tidy up error message a bit
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix instance list for instances running multiple times
If for some reason (e.g. failed migration) one instance is running on multiple nodes the output can become inconsistent. To get that error and make it consistent between runs we make the call on the secondary...
Merge branch 'devel-2.3' into devel-2.4
cluster verify: add hvparams verification
Currently, the validity of the hypervisor parameters is only checked at init/modification time, and not in the cluster verify. This is bad, as it can lead to inconsistent state that is only detected when the next modification (which can be unrelated) is made, leading to...
Verify disks: increase parallelism and other fixes
The recent work on multi-VG support has converted LUClusterVerifyDisks into doing serialised calls to each node, as each node can have different VGs. This is suboptimal, especially for big clusters, where...
Deactivate disks: allow skipping hypervisor checks
In some cases (e.g. the hypervisor not running at all), we might want to force disk deactivation, skipping the hypervisor checks. I believe this is not a good thing to do all the time, so this patch adds the...
Show hidden/blacklisted OSes in cluster info
Since we can blacklist/hide non-existing OSes (for preseeding), we cannot query easily the OSes themselves for this status. Hence we export the entire lists in cluster info (which should be cheaper than gnt-os diagnose)....
Fix LUOSDiagnose and non-vm_capable nodes
This skips non-vm_capable nodes in the OS diagnose search, since such OSes will not be used anyway on those nodes.
Rephrasing two error messages for auto promotion
Using auto_promote or auto-promote can lead to confusion when using the user-facing interfaces. While auto-promote is fine for CLI it's not for RAPI and vice-versa. This patch should eliminate this confusion....
Fix payload check for out-of-band health
This logic error was not detected before as health has not been implemented on the cli and therefore no QA code existed for that.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix premature abort of LUOobCommand due to result.Raise
This is a bug I recognized while doing tests on gnt-node health. A leftover result.Raise line causes premature abort of LUOobCommand on the first node failing the RPC call. This is not expected behaviour for...
Modify LUOobCommand to support multiple nodes
This will change the result of this LU to a query-like result: a list of tuples with information about the state of the data.
It also includes the modification to the commands calling this opcode.
Another fix for LUClusterVerifyDisks
The LVM queries should only be done for vm_capable nodes. In order to do this, we also add a new ConfigWriter method to abstract that query.
Fix disk adoption breakage
Disk adoption is currently broken by 84d7e26b, which added multiple LVM volume group support. This patch fixes the calls to rpc.call_vg_list, which are multi-node calls but were handled as single-node calls in 84d7e26b.
Fix disk count check in LUSetInstanceParams
LUSetInstanceParams checked instance.nics (and not instance.disks) against constants.MAX_DISKS.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
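The kind of mix-up described in this entry can be sketched as follows. The limit value, the exception class, and the function name are hypothetical and chosen only for illustration; the instance is modelled as any object with disks and nics attributes.

```python
MAX_DISKS = 16  # illustrative limit; the real constant's value is an assumption


class TooManyDisksError(Exception):
    """Raised when an instance already has the maximum number of disks."""


def check_can_add_disk(instance):
    """Verify that one more disk may be added to the given instance."""
    # The bug described above compared len(instance.nics) here instead.
    if len(instance.disks) >= MAX_DISKS:
        raise TooManyDisksError(
            "Instance has too many disks (%d), cannot add more"
            % len(instance.disks))
```

Because both attributes are lists, the wrong comparison raises no error by itself; it only becomes visible when the disk and NIC counts differ.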
Rename OpAddTags and LUAddTags
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Rename OpTestJobqueue and LUTestJobqueue