Handle ghost instances in temp DRBD map
Currently cluster-verify doesn't handle the (admittedly invalid) case where we have reservations for instances that were removed in the meantime.
This patch adds a check for this and prevents code errors in cluster-verify in...
Fix error handling in replace-disks with new node
Currently the _CreateSingleBlockDev function only raises OpExecError and not BlockDeviceError. This means that we don't release the instance's temporary minors properly, and this creates problems later if the instance is removed...
Export tags to cluster verify hooks
This patch exports the cluster and node tags to the cluster verify hook scripts. The tags are exported as a space-separated list, which allows easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do...”) and therefore requires the previous “Don't allow spaces in tag...
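As a rough illustration of the consumer side, a verify hook written in Python could split the exported variable the same way the shell loop does. Only the GANETI_CLUSTER_TAGS variable name comes from the description above; the helper name and its default argument are hypothetical.

```python
import os


def parse_tags(environ, var="GANETI_CLUSTER_TAGS"):
    """Return the tags exported in `var` as a list (hypothetical helper).

    Tags are space-separated and cannot contain spaces themselves,
    so a plain split() is enough; an unset variable yields [].
    """
    return environ.get(var, "").split()


if __name__ == "__main__":
    # Print one tag per line, mirroring the shell "for tag in ..." loop.
    for tag in parse_tags(os.environ):
        print(tag)
```

Passing the environment as an argument keeps the helper easy to exercise outside of a real hook run.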
Update the iallocator documentation
This updates the iallocator documentation to 2.0, bumps up the iallocator version (and moves a constant to lib/constants.py), and fixes a style issue in install.rst.
Reviewed-by: ultrotter
LUVerifyCluster: Handle the "no volume group" case
If we're only file based and our volume group is set to "None" there's no point in asking nodes for their volume groups, logical volumes, and drbd devices, and checking those.
Reviewed-by: iustinp
Fix some epydoc style issues
99% of the epydoc return tags are "@return:", but each of the modified files had one "@returns:" line. We fix this for consistency.
Reviewed-by: imsnah
Update some hooks settings
While reviewing the hooks document, I realised we are not correctly exporting the instance properties.
This patch fixes:
- export the disk and disk template in all LUs, not only (hardcoded) in the instance create
- removes the instance create INSTANCE_ prefix on some non-instance...
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value...
Make gnt-instance info work with offline nodes
This simply makes LUQueryInstanceData return the same information as for a static query when one or both of the nodes are down.
Fix some bugs in reboot
There are two issues fixed in this patch:
- first, the recent RPC changes caused loss of data in hard reboot type; we weren't reporting any results from the stop/start instance calls;
- second, in soft or hard reboots, we didn't initialize the disk...
Convert IOErrors for /proc/drbd into our errors
If /proc/drbd can't be opened, this raises an IOError, but all the error-handling behaviour in backend treats only BlockDeviceErrors. This creates a plain failure in cluster verify and in other RPC calls.
This patch simply converts EnvironmentErrors into BlockDeviceErrors, and...
SetInstanceParams: export nic changes to hooks
Currently we export the old instance "as is" and any nic changes get lost, so hooks won't know of a different ip, bridge, or mac address. This patch fixes it by putting the nics in the override dict, if any changes are done....
LUSetInstanceParams: Fix nic handling
- CheckArguments: Use constants.VALUE_NONE rather than hardcoding the string "none"
- If we're adding a nic fill the nic_dict with default values
- Check if the mac is syntactically valid, if we have one
- Don't allow the mac to be 'auto' when modifying a nic...
Instance Creation: generate nics earlier
We want the real nic to be shown to the hooks and the allocators, so we'll generate them in CheckPrereq. We also write a comment about the race condition we generate. This race condition existed even before, so moving this generation will just lengthen it a bit. A separate patch...
Some small fixes
This patch removes the admin_ram LUQueryInstances field (it is broken anyway) and fixes the VNC address checks in the Xen Hypervisor.
Fix LUQueryInstances fields.
The query fields are now regular expressions. We need to quote the dots, otherwise invalid fields will be accepted but they will lose special formatting in the cli scripts.
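The effect of an unquoted dot can be shown with a small sketch. The field name and the matching helper below are hypothetical and not taken from the patch; only the use of regular expressions for field names comes from the description.

```python
import re

# Hypothetical field name containing a dot; not taken from the patch.
FIELD = "nic.mac"


def field_matches(pattern, name):
    """Return True if `name` matches `pattern` in full."""
    return re.match("^(?:%s)$" % pattern, name) is not None


# Unquoted: "." matches any character, so a bogus name slips through.
unquoted_accepts_bogus = field_matches(FIELD, "nicXmac")

# Quoted: re.escape() turns "." into a literal dot, so only the
# real field name is accepted.
quoted_accepts_bogus = field_matches(re.escape(FIELD), "nicXmac")
quoted_accepts_real = field_matches(re.escape(FIELD), FIELD)
```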
Fix RPC result handling in _AssembleInstanceDisks
For (status, data)-style RPC calls, the result data is in the ‘payload’ attribute. This was missed in the conversion patch, with the only side effect that gnt-instance activate-disks didn't show a nice output...
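The (status, data) convention can be sketched with a toy result object. Everything here is a stand-in written for illustration: the class, the `success` attribute and the helper are assumptions, and only the idea of reading the data from a `payload` attribute comes from the description above.

```python
class FakeRpcResult(object):
    """Toy stand-in for an RPC result (not the real Ganeti class)."""

    def __init__(self, raw):
        # `raw` is the (status, data) pair returned by the remote call.
        status, data = raw
        self.success = bool(status)
        self.payload = data


def disk_info(result):
    """Return the data to show to the user, or raise on failure."""
    if not result.success:
        # On failure the data slot carries the error message.
        raise RuntimeError("remote call failed: %s" % result.payload)
    # Hand back the payload, not the wrapper object itself.
    return result.payload
```

Returning the payload rather than the wrapper keeps the caller's output limited to plain data.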
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
_GenerateDiskTemplate: correct file disk index
Currently when adding disks the base for the index is not taken into account, and disk 0 is added twice.
HTS_USE_VNC, rename and remove KVM
Currently we use the HTS_USE_VNC constant only to copy the vnc password file. While KVM uses vnc, it currently has no password support, nor will we have time to add it for 2.0, so we rename the constant to HTS_COPY_VNC_PASSWORD and only put Xen HVM in it. In the future...
Some fixes to node add and re-add
The patch changes the pre-checks in node-add and re-add:
- if the node is not already in the cluster, refuse to re-add
- when re-adding, reuse the secondary IP from the cluster configuration
- when re-adding, reset the offline and drained flags, so that RPC...
Instance parameters: force typing
We want all the hv/be parameters to have a known type, rather than a random mix of empty string, boolean values, and None, so we declare the type of each variable and we enforce/convert it.
- Add some new constants for enforceable value types...
Implement modification of the drained flag
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes.
Prevent allocations on drained nodes
This patch adds checks for drained nodes in the logical units that allocate or move instances around. We also update an error message (not style-compliant).
cluster verify: show correctly drained nodes
This patch slightly changes the output of gnt-cluster verify for drained nodes, and also adds a note with the total number of drained nodes (similar to the offline nodes note).
Allow query of the drained node attribute
This patch exports the drained attribute:
- LUQueryNodes now accepts the drained field
- RAPI exports it for node objects
- gnt-node info now shows it (along with the newly-added master_candidate and offline flags)...
Add a ‘drained’ attribute to node objects
This attribute will be used to prevent any allocation on the node (any of: replace-disks with this node as the new secondary, failover to the node, migration to the node).
The patch adds the attribute and initializes it correctly in cluster...
Switch the blockdev_remove rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data) implementation of the blockdev_remove rpc call. bdev.py is not yet converted, so we don't actually have error information.
We also fix a bug in _RemoveDisks by not reusing a variable....
Switch the blockdev_shutdown rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data) implementation of the blockdev_shutdown rpc call. bdev.py is not yet converted, so we don't actually have error information.
We also fix a bug in _ShutdownInstanceDisks by not reusing a variable....
Convert blockdev_assemble rpc to (status, data)
This converts the RPC call blockdev_assemble to the new-style result format. Note that we won't usually have error information, but it's the first step toward it.
LUSetInstanceParams: use the correct hvparams
In LUSetInstanceParams we used to save the dict without defaults for the instance params as hv_inst, but to use the populated one for the instance (hv_new). Fixing this leads to instances without all the parameters set....
Add a new instance query flag ‘disk_usage’
This patch adds a new instance query flag called disk_usage that retrieves the overall space used by an instance on each of its nodes. This can be used when balancing the cluster or checking N+1 status.
The flag is also exported in RAPI. Note the flag is currently broken for...
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actually RPC procedures and are called from ganeti-noded do not correspond to the RPC names. This makes it hard to actually see which functions are...
rpc.call_blockdev_find: convert to (status, data)
This patch converts the call_blockdev_find - which searches for block devices and returns their status - to the (status, data) format. We also modify the backend function name to match the rpc call.
Export the cpu nodes and sockets from Xen
This is a hand-picked forward patch of commit 1755 on the 1.2 branch (hand-picked since the trees diverged too much since then):
The patch changed the xen hypervisor to compute the number of cpu sockets/nodes and enables the command line and the RAPI to show this...
cmdlib: simplify some rpc error handling cases
By using the RemoteFailMsg() or the payload field of RpcResult, we can simplify a few functions in cmdlib.
LUCreateInstance: only set running flag at the end
In lockless queries, it's better if we see the instance in ADMIN_down rather than ERROR_down during the time it's installed. As such, we change the LU to only mark the instance 'up' at the time we are ready to...
Enable lockless node queries
Similar to the instance list, this patch enables lockless node queries. “gnt-node list” now accepts the “--sync” flag, which enables locking; the default is lockless.
Implement lockless query operations
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job).
The framework is implemented as follows:...
An attempt at fixing some encoding issues
This patch unifies the hardcoded re-encoding attempts into a single function in utils.py. This function is used to take either a unicode or str object and convert it to an ASCII-only str object which can be safely...
Small patch for handling errors in node add
This small patch hopefully fixes the handling of ssh verify errors in node add (note: untested).
Return error messages in node add ssh handling
When the rpc call node_add fails, we don't have any error message. This patch changes the call to return (status, data) so that the user can see the correct error message.
LUQueryClusterInfo: filter hvparams
We don't need to show hvparams for hypervisors which are not enabled on the cluster.
LUAddNode: copy the vnc password file also for KVM
Before we used to copy the file if xen-hvm was enabled on the cluster; now we'll do that if any enabled hypervisor is in the new HTS_USE_VNC group.
GetShellCommand: get hvparams and beparams
Sometimes the hypervisor will use the instance hv and/or be parameters to determine the best shell command. This is not possible, though, currently, as the instance hv/beparams are not filled, so we have to pass the filled versions separately....
Implement software release version checks too
Currently the LUVerifyCluster only reports the protocol version changes, not software ones. This is useful to know/monitor, so we add this too as a warning.
LUQueryInstances: keep the given order of names
Currently LUQueryInstances keeps the ordering of instances only in some cases, and in others it will reorder the list. This patch fixes this by more clearly separating the various cases (names passed or not and locking or not locking),...
Fix gnt-cluster modify -H and offline nodes
Fix the mode attribute of newly-created disks
Currently, only the LUSetInstanceParams correctly sets up the mode attribute via a manual operation. We remove this and instead do the correct setting in the generic _GenerateDiskTemplate function, so that we set the mode correctly for all disk creations....
Fix batcher for 2.0-style disks and nics
This patch fixes the gnt-instance batch-create command, and in doing so also slightly changes two other functions:
- we change utils.ParseUnit so that it accepts integer values also (both ParseUnit(5) and ParseUnit("5") return the same value)...
Make iallocator work with offline nodes
This patch changes the iallocator framework to work with offline nodes and to export them properly to plugins. It does this by only exporting the static configuration data for those nodes, and not attempting to parse the runtime data....
Remove checking of DRBD metadata for validity
Currently the DRBD code checks that the metadata devices are valid before creation, initial disk attachment and add children.
However, the process for checking validity requires a free DRBD minor, and this conflicts with parallel checking....
A couple of small fixes to iallocator
This removes some constraints:
- only two disks supported, this is no longer true as the underlying functions can now compute size for a variable number of disks
- error when the hypervisor was not being passed...
Automatically release DRBD minors on success
This patch converts the DRBD minors reservation protocol from explicit release to automatic release on the success paths. On the error paths, a manual release is still needed.
The patch doesn't bring much by itself, but is needed for a future patch...
Change the instance status attribute to boolean
Due to historic reasons, the “should run or not” attribute of an instance was denoted by its “status” attribute having a string value of either ‘up’ or ‘down’. Checking this in code was done via hardcoding...
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from the source side. With this patch we also add the option of passing some information to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Fix adding of disks to an instance
The ConfigWriter.AllocateDRBDMinor requires the instance name, not the instance object. The LUSetInstanceParams wrongly passes the instance object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter....
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster verify: minors which belong to running instances and should be online but are not, and minors which do not belong to any instance but are in...
Some small fixes in cmdlib
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully) relevant log files from the OS create scripts.
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the job log (and thus to the user).
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style result, but _CreateBlockDevOnSecondary was forgotten. After the merger of the two functions, _CreateBlockDevOnSecondary was taken as template...
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] + list(instance.secondary_nodes) (or similar usage) with the new instance.all_nodes.
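A minimal sketch of the idea, using a toy class rather than the real instance object. The attribute names follow the description above; implementing all_nodes as a read-only property that returns a list is an assumption made for illustration.

```python
class Instance(object):
    """Toy instance object; only the node attributes are modelled."""

    def __init__(self, primary_node, secondary_nodes=()):
        self.primary_node = primary_node
        self.secondary_nodes = tuple(secondary_nodes)

    @property
    def all_nodes(self):
        """Primary node first, followed by the secondaries."""
        return [self.primary_node] + list(self.secondary_nodes)


inst = Instance("node1.example.com", ["node2.example.com"])
# Callers no longer need to build the list by hand.
nodes = inst.all_nodes
```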
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all. The replace secondary first creates (manually) new LVs to ensure storage is there, and then it creates the new DRBD. At this point, we need a...
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we combine them so as to have less code duplication. This will be needed in the future as we will need to split off the recursive part....
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node daemon logs.
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to the job log, just having True/False return from the disk creation function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)...
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of a DRBD disk. This is confusing as it's hard to associate the two in the output of “lvs” or “gnt-node volumes”.
The patch changes this so that they use the same prefix....
Fix LUExportInstance
Due to deficiencies in our block device implementation, it is a must to call SetDiskID on disks before passing them to remote nodes. Since in export/import, we don't touch the disks themselves, this was not needed before in this function....
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the short one.
Forward port the live migration from 1.2 branch
This is forward port via copy (and not individual patches cherry-pick) of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and therefore call_blockdev_find) is not modifying the devices anymore, we need to switch this LU to the new call_drbd_disconnect_net and call_drbd_attach_net functions....
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviewed-by: imsnah
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches):
- we don't correctly check the result type
- rename a variable to prevent a conflict
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData has been broken since commit 1642.
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file.
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds a utility function to ConfigWriter that returns the names of online nodes and uses it in LUDiagnoseOS to query only the good nodes.
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replace modes, and the new drbd8 modes were forced into this syntax, with some complicated rules of transition from one mode to another (if REPLACE_ALL...
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST and NV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse the failure status of the new RpcResult structure, but it didn't fix the storing of only the result payload. Since RpcResult is not JSON serializable, LUActivateInstanceDisks is failing....
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
Reviewed-by: amishchenko
LUConnectConsole: fix primary_node online check
The primary node is part of the instance, not of the opcode.
cleanup: fix IAllocator hypervisor usage
Two problems: the iallocator.hypervisor wasn't initialized to None in the constructor, so pylint doesn't realize it's initialized later with setattr.
Second, 'hypervisor' is a module, so we shouldn't use it as a variable....
cleanup: LUReplaceDisks unused vars
And a small whitespace fix.
cleanup: do not hide upper-scope name
hypervisor is a module, so we shouldn't use it as an argument.
cleanup: fix use of _CheckNodeOnline
A few cases of wrong variable name.
cleanup: LUAddNode, LUSetNodeParams unused variable
This is a leftover from the abstraction of AdjustCandidatePool, and it also requires the config lock, so it's better to remove it.
cleanup: LURenameCluster wrong variable name
Warn for instances living on offline nodes
The patch also changes the result to error for non-reachable secondary nodes (as for primary nodes).
Fix _AdjustCandidatePool
Currently the ConfigWriter.MaintainCandidatePool returns node names, and _AdjustCandidatePool uses them as such, but then it passes these to context.ReaddNode which in turn passes them to jqueue.JobQueue.AddNode which uses them as objects.Node instances....
gnt-node modify: add the offline attribute
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute.
Setting a node into offline mode automatically demotes it from the master role.
Make cluster verify understand offline nodes
This patch changes cluster verify to not alert on offline nodes, but instead just show a note at the end with the number of such nodes.
It also removes warnings in verify-disks and hooks about failures to make rpc calls to such nodes....
cmdlib: check node stats in prereqs
This patch adds checks for offline nodes in most instance LUs so that we can work with offline secondaries, but not with offline primaries. Some cases (like grow disk, which needs both sides up) do not allow offline nodes at all....
Add two utility functions to cmdlib
These will be used for parameter checking and node status checking.
Add function to compute the master candidates
Since some nodes can be offline, we can't just take the length of the node list as the maximum possible number of master candidates.
The patch adds a utility function to correctly compute this value and replaces hardcoded computations with the use of this function. It then...
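The counting rule described here can be sketched as follows. The Node tuple and the function name are hypothetical stand-ins; the only behaviour taken from the description is that offline nodes must not be counted towards the maximum.

```python
import collections

# Hypothetical, minimal node record; the real node objects carry more state.
Node = collections.namedtuple("Node", ["name", "offline"])


def max_possible_candidates(nodes):
    """Count the nodes that could act as master candidates.

    Offline nodes are skipped, so the result can be smaller than
    len(nodes).
    """
    return sum(1 for node in nodes if not node.offline)


cluster = [
    Node("node1", offline=False),
    Node("node2", offline=True),
    Node("node3", offline=False),
]
possible = max_possible_candidates(cluster)
```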
Cleanup the config file on demotion from candidate
This patch adds a simple rpc which makes a backup of the config file and then removes it. This is done so that cluster verify doesn't complain immediately after demoting a node.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state for offline nodes and also modifies the watcher to understand the offline state in its checks.
node list: add the offline field