History | View | Annotate | Download (252.8 kB)
Fix the confusing ssh/hostname message in node add
Before, it used to say:
ssh/hostname verification failed node1.example.com -> hostname mismatch, got node2
Now it says for wrong hostnames (maybe too verbose):
ssh/hostname verification failed (checking from node1.example.com): hostname...
Merge commit 'origin/next'
Add check for duplicate MACs in instance add
Currently LUAddInstance doesn't check for duplicate MACs, and it failsduring the Exec() phase when trying to add the instance to the config(ConfigWriter checks for this). This patch copies the code fromLUModifyInstance (which already does it)....
Return cluster tags from LUQueryClusterInfo
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
repair-size: ensure child disks have sane sizes
While this patch doesn't do a full match of on-disk size to config-sizefor child devices, it does a sanity check (for DRBD only) that the childsize is not less than the DRBD size. While this would be a strange...
Fix checks in LUSetNodeParms for the master node
There was a check already in the LU for the master node, however iswasn't correct. This patch disallows any role changes on the master nodevia LUSetNodeParms (and as this LU can't change anything else, it...
Ignore results from drained nodes in iallocator
Since drained nodes could be (partially or fully) broken in iallocator,we ignore results from these nodes when building the cluster map inpreparation for sending it to the script.
This is a cheap change for the stable branch; ideally we should not...
Fix yet another bug in LURepairDiskSizes
This is a result of broken copy-paste, and because needed_locks is not adict right here it will error out.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix a bug in LURepairDiskSizes
The same old story: disks need SetDiskID before being sent to remotenodes.
Add cluster-init --no-etc-hosts parameter
If --no-etc-hosts is passed in at cluster init time we set a newparameter in the cluster's object to false, and avoid adding nodes tothe hosts file. The UpgradeConfig function is used to set the value toTrue, when upgrading from an old configuration version....
Merge branch 'master' into next
export: add meaningful exit code
Currently ‘gnt-backup export’ always returns exit code zero, even in theface of complete failure during backup (only failure to stop/start theinstance will cause job failure and thus non-zero exit code). This isbad, since one cannot script the backup....
Implement gnt-cluster check-disk-sizes
This patch adds a new opcode and lu for checking disk sizes. Currentlyit does only top-level disk verification, and also doesn't checkprimary/secondary node size mismatches (these two are added as TODOs inthe Exec() function of the LU)....
Implement --ignore-size in activate-disks
This patch modified OpActivateDisks, LUActivateDisks and gnt-instanceactivate-disks to support and pass this option to_AssembleInstanceDisks.
The patch is quite trivial I think; there should be no issues from it...
Add ignore size support in _AssembleInstanceDisks
This patch adds an optional parameter to _AssembleInstanceDisks thatallows ignoring of size information by making a copy of the diskstructure and setting the size to zero.
Signed-off-by: Iustin Pop <iustin@google.com>...
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,ganeti-masterd will be started with the new --no-voting --yes-do-itoptions.
This new option is set to True only on masterfailover, when no_voting is...
Yet another fallout from the pylint fixes
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Fix another issue with hypervisor_name change
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Make sure enabled_hypervisors list is valid
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
_GenerateDiskTemplate: use base_index in the name
Currently if a disk is added later the base_index is not considered, andall the disks are called disk0. This patch fixes it.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
cmdlib: Fix typo in LUQueryClusterInfo
This was broken by my pylint fixes patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix pylint warnings
Fix some typos
Cleanup config data when draining nodes
Currently, when draining nodes we reset their master candidate flag, butwe don't instruct them to demote themselves. This leads to “ERROR: file'/var/lib/ganeti/config.data' should not exist on non master candidates...
Fix node readd issues
This patch fixes a few node readd issues.
Currently, the node readd consists of two opcodes: - OpSetNodeParms, which resets the offline/drained flags - OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent...
Fix error message for extra files on non MC nodes
Currently the message for extraneous files on non master candidates isconfusing, to say the least. This makes it hopefully more clear.
Fix adjustement of candidates in cluster modify
The code for adjusting the candidate pool size was done after the configupdate, and this means we triggered the save of the config file withoutfixing the candidate pool, which aborts with an error.
The patch just moves it above. The old comment was valid, but we anyway...
Add a new node list field
This patch adds a ‘role’ node list field, which shows a one-characternode status. This is a simpler way to see the node status than selectingall the flags individually.
Fix handling of 'vcpus' in instance list
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message: Unhandled Ganeti error: vcpus
This is due to multiple issues: - in some corner cases cmdlib.py raises an errors.ParameterError but this is not handled by cli.py...
Fix checking for valid OS in instance create
The current check in LUCreateInstance.CheckPrereq() is wrong - it only checksif we got an OS, but not if we got a valid OS. This patch fixes it.
Show disk size in instance info
The size of the instance's disk was not shown in “gnt-instance info”.This patch adds it and formats it nicely if possible.
LUQueryInstances: fix querying for nic data
Currently we support querying for "mac" "ip" or "bridge", meaning "theone of the first nic. We are not checking that there is a first nic,though, and thus could incur in errors. This patch fixes it by returning...
Wait for a while in failed resyncs
This patch is an attempt at fixing some very rare occurrences of messages like: - "There are some degraded disks for this instance", or: - "Cannot resync disks on node node3.example.com: [True, 100]"
What I believe happens is that drbd has finished syncing, but not all...
Fix two issues with exports and snapshot errors
This patch fixes two issues related to failed snapshots during exports: - first, the error messages used disk.logical_id1, which is a node name for DRBD, and it resulted in strange error messages like...
Set the size on new DRBDs in replace secondary
Currently the code in cmdlib doesn't set the device size to new DRBDdevices in replace secondary, but we need to do it otherwise it getsinitialized to None.
Change failover instance when instance is stopped
Currently, if the instance is stopped, we still check for enough memoryon the target node. This is a little bit too strict, since in case toomany nodes have failed and one is out of the memory, this prevents...
Export more instance information in hooks
Currently we miss in hooks the instance's hypervisor, hypervisorparameters and backend parameters. This forces hooks to query back intoganeti, which is dangerous due to possible luxi sockets exhaustion.
This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,...
Signed-off-by: Guido Trotter <ultrotter@google.com>
IAllocator: export total disk size for instances
This patch adds for current instance a ‘disk_space_total’ key, similarto the key for the new instance in case of new allocations.
Add -H/-B startup parameters to gnt-instance
This patch modifies the start instance script, opcode and logical unitto support temporary startup parameters.
Different from 1.2, where only the kernel arguments were supportingchanges (and thus xen-pvm specific), this version supports changing all...
call_instance_start: add optional hv/be parameters
This patch modifies the rpc.call_instance_start - the master side - totake optional hv/be parameters. The noded side is unchanged andoblivious to the change.
This will allow implementation of single-user capability and such on...
Instance reinstall: don't mix up errors
If the remote info rpc call fails we can't assume that the instance isup.
Don't check memory at startup if instance is up
LUSetClusterParams: improve volume group removal
Currently LUSetClusterParams will remove the volume group if the vg_namefield passed in is not true, but not None. Setting the target volumegroup to False or the empty string, though, is a bad idea because it's...
LUQueryClusterInfo: return a few more fields
Some fields can be set at cluster init, and perhaps even modifed withSetClusterParams but there's no way to know them. With this patch weexport them in the cluster info query.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
LUSetInstanceParam: don't assume memory is integer
LUSetInstanceParam currently assumes that the 'memory' value of acall_instance_info result is an integer, while the rest of the codeexplicitely converts it to int(). Converting it to int works around a...
Remove some superfluous imports
This is for Python 2.6 compatibility.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix argument checking in LUSetClusterParams
This patch fixes two issues with LUSetClusterParams and argumentchecking.
First, this LU used the wrong function name (CheckParameters instead ofCheckArguments), which means that no parameter checking was done at all;...
Include node name in hypervisor validation errors
The current validation routine just says "failed", without specifyingthe node name. This is very confusing, and we should log the node nametoo.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Alexander Schreiber <als@google.com>
LUDiagnoseOS: change locking and error handling
Since the “list OSes” call is exported via RAPI, this can be used prettyeasily to DOS the master daemon during long jobs.
The implementation of LUDiagnoseOS makes an RPC call to all nodes; welock nodes here in order to prevent node removal....
Fix verify-disks with broken volume groups
When a remote node returns invalid LVM data, we check it, but we don'tstop and continue with the rest of the checks (which require a validvolume group). This raises an internal error and breaks verify disks.
This seems unchanged for a long while, I don't know why it surfaced just...
Prevent errors when xenvg is broken cluster verify
When vg_name is not returned at all, we currently abort with an internalerror. This is because we don't catch KeyError.
This patch adds a custom message for this case, and also adds KeyErrorto the list of catched exceptions, just for safety....
A bunch of doc and other small fixes
This patch adds a couple of both externally and internally reportedissues: - missing SGML tags (Issue 54), report and patch by superdupont - wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>...
Handle ghost instances in temp DRBD map
Currently cluster-verify doesn't handle the (admitedly invalid) case where wehave reservation for instances that were removed in the meantime.
This patch adds a check for this and prevents code errors in cluster-verify in...
Fix error handling in replace-disks with new node
Currently the _CreateSingleBlockDev function only raises OpExecError and notBlockDeviceError. This means that we don't release the instance's temporaryminors properly, and this creates problems later if the instance is removed...
Export tags to cluster verify hooks
This patch export the cluster and node tags to the cluster verify hookscripts. The tags are exported as a space-separated list, which allowseasy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do...”) and therefore requires the previous “Don't allow spaces in tag...
Update the iallocator documentation
This updates the iallocator documentation to 2.0, bumps up theiallocator version (and moves a constants to lib/constants.py), andfixes a style on install.rst.
Reviewed-by: ultrotter
LUVerifyCluster: Handle the "no volume group" case
If we're only file based and out volume group is set to "None" there'sno point in asking nodes for their volume groups, logical volumes, anddrbd devices, and checking those.
Reviewed-by: iustinp
Fix some epydoc style issues
99% of the epydoc return tags are "@return:", but each of the modified fileshad one "@returns:" line. We fix this for consistency.
Reviewed-by: imsnah
Update some hooks settings
While reviewing the hooks document, I realised we are not correctlyexporting the instance properties.
This patch fixes: - export the disk and disk template in all LUs, not only (hardcoded) in the instance create - removes the instance create INSTANCE_ prefix on some non-instance...
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches theinstance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter onall RPC calls is not generic and we also need to have a persistent value...
Make gnt-instance info work with offline nodes
This simply makes LUQueryInstanceData return the same information as fora static query when one or both of the nodes are down.
Fix some bugs in reboot
There are two issues fixed in this patch: - first, the recent RPC changes caused loss of data in hard reboot type; we weren't reporting any results from the stop/start instance calls; - second, in soft or hard reboots, we didn't initialized the disk...
Convert IOErrors for /proc/drbd into our errors
If /proc/drbd can't be opened, this raises an IOError, but all theerror-handling behaviour in backend treats only BlockDeviceErrors. Thiscreates a plain failure in cluster verify and in other RPC calls.
This patch simply converts EnvironmentErrors into BlockDeviceErrors, and...
SetInstanceParams: export nic changes to hooks
Currently we export the old instance "as is" and any nic changes getlost, so hooks won't know of a different ip, bridge, or mac address.This patch fixes it by putting the nics in the override dict, if anychanges are done....
LUSetInstanceParams: Fix nic handling
CheckArguments: Use constants.VALUE_NONE rather than hardcoding the string "none" If we're adding a nic fill the nic_dict with default values Check if the mac is syntactically valid, if we have one Don't allow the mac to be 'auto' when modifying a nic...
Instance Creation: generate nics earlier
We want the real nic to be shown to the hooks and the allocators, sowe'll generate them in CheckPrereq. We also write a comment about therace condition we generate. This race condition existed even before, somoving this generation will just lenghen it a bit. A separate patch...
Some small fixes
This patch removes the admin_ram LUQueryInstances field (is brokenanyway) and fixes the VNC address checks in the Xen Hypervisor.
Fix LUQueryInstances fields.
The query fields are now regular expressions. We need to quote the dots,otherwise invalid fields will be accepted but they will lose specialformatting in the cli scripts.
Fix RPC result handling in _AssembleInstanceDisks
For (status, data)-style RPC calls, the result data is in the ‘payload’attribute. This was missed in the conversion patch, with the only sideeffect that gnt-instance activate-disks didn't show a nice output...
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include statusinformation and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to includestatus information and renames the backend method to match the RPC callname.
_GenerateDiskTemplate: correct file disk index
Currently when adding disks the base for the index is not taken intoaccount, and disk 0 is added twice.
HTS_USE_VNC, rename and remove KVM
Currently we use the HTS_USE_VNC constant only to copy the vnc passwordfile. While KVM uses vnc it currently has no password support, nor we'llbe on time making one for 2.0, so renaming the constant toHTS_COPY_VNC_PASSWORD and only putting Xen HVM in it. In the future...
Some fixes to node add and re-add
The patch changes the pre-checks in node-add and re-add: - if the node is not already in the cluster, refuse to re-add - when re-adding, reuse the secondary IP from the cluster configuration - when re-adding, reset the offline and drained flags, so that RPC...
Instance parameters: force typing
We want all the hv/be parameters to have a known type, rather than arandom mix of empty string, boolean values, and None, so we declare thetype of each variable and we enforce/convert it.
- Add some new constants for enforceable value types...
Implement modification of the drained flag
This patch adds LU and cli-level support for modification of the nodedrained flag. It is similar to the offline changes.
Prevent allocations on drained nodes
This patch adds checks for drained nodes in the logical units thatallocate or move instances around. We also update an error message (notstyle-compliant).
cluster verify: show correctly drained nodes
This patch changes slightly the output of gnt-cluster verify for drainednodes, and also adds a note with the total number of drained nodes(similar to the offline nodes note).
Allow query of the drained node attribute
This patch exports the drained attribute: - LUQueryNodes accepts now the drained field - RAPI exports it for node objects - gnt-node info shows it now (along newly-added master_candidate and offline flags)...
Add a ‘drained’ attribute to node objects
This attribute will be used to prevent any allocation on the node (anyof replace-disks with new secondary this node, failover to the node,migration to the node).
The patch adds the attribute and initializes it correctly in cluster...
Switch the blockdev_remove rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data)implementation of the blockdev_remove rpc call. bdev.py is not yetconverted, so we don't actually have error information.
We also fix a bug in _RemoveDisks by not reusing a variable....
Switch the blockdev_shutdown rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data)implementation of the blockdev_shutdown rpc call. bdev.py is not yetconverted, so we don't actually have error information.
We also fix a bug in _ShutdownInstanceDisks by not reusing a variable....
Convert blockdev_assemble rpc to (status, data)
This converts the RPC call blockdev_assemble to the new-style resultformat. Note that we won't usually have error information, but it's thefirst step toward it.
LUSetInstanceParams: use the correct hvparams
In LUSetInstanceParam we used to save the dict without defaults for theinstance params as hv_inst, but to use the populated one for theinstance (hv_new). Fixing this leads to instances without all theparameters set....
Add a new instance query flag ‘disk_usage’
This patch adds a new instance query flag called disk_usage thatretrieves the overall space used by an instance on each of its nodes.This can be used when balancing the cluster or checking N+1 status.
The flag is also exported in RAPI. Note the flag is currently broken for...
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actuallyRPC procedures and are called from ganeti-noded are not corresponding tothe RPC names. This makes it hard to actually see which functions are...
rpc.call_blockdev_find: convert to (status, data)
This patch converts the call_blockdev_find - which searches for blockdevices and returns their status - to the (status, data) format. We alsomodify the backend function name to match the rpc call.
Export the cpu nodes and sockets from Xen
This is a hand-picked forward patch of commit 1755 on the 1.2 branch(hand-picked since the trees diverged too much since then):
The patch changed the xen hypervisor to compute the number of cpu sockets/nodes and enables the command line and the RAPI to show this...
cmdlib: simplify some rpc error handling cases
By using the RemoteFailMsg() or the payload field of RpcResult, we cansimplify a few functions in cmdlib.
LUCreateInstance: only set running flag at the end
In lockless queries, it's better if we see the instance in ADMIN_downrather than ERROR_down during the time it's installed. As such, wechange the LU to only mark the instance 'up' at the time we are ready to...
Enable lockless node queries
Similar to the instance list, this patch enables lockless node queris.“gnt-node list” accepts now the “--sync” flag which enables locking, thedefault is lockless.
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. Thismeans that instances will be shown in ERROR_up or ERROR_down state, even thoughthis is not an error (but just an in-progress job).
The framework is implemented as follows:...
An attempt at fixing some encoding issues
This patch unifies the hardcoded re-encoding attempts into a singlefunction in utils.py. This function is used to take either an unicode orstr object and convert it to a ASCII-only str object which can be safely...
Small patch for handling errors in node add
This small path hopefully fixes the handling of ssh verify errors innode add (note: untested).
Return error messages in node add ssh handling
When the rpc call node_add fails, we don't have any error message. Thispatch changes the call to return (status, data) so that the user can seethe correct error message.
LUQueryClusterInfo: filter hvparams
We don't need to show hvparams for hypervisors which are not enabled onthe cluster.
LUAddNode: copy the vnc password file also for KVM
Before we used to copy the file if xen-hvm was enabled on the cluster,no we'll do that if any enabled hypervisor is in the new HTS_USE_VNCgroup.
GetShellCommand: get hvparams and beparams
Sometimes the hypervisor will use the instance hv and/or be parametersto determine the best shell command. This is not possible, though,currently, as the instance hv/beparams are not filled, so we have topass the filled versions separately....