Show more details for failed xen commands
This patch also logs the output of the xm commands in case of failures; some corner cases were forgotten in the last redo.
Reviewed-by: imsnah
Fix some bugs in reboot
There are two issues fixed in this patch: - first, the recent RPC changes caused loss of data in hard reboot type; we weren't reporting any results from the stop/start instance calls; - second, in soft or hard reboots, we didn't initialize the disk...
Convert IOErrors for /proc/drbd into our errors
If /proc/drbd can't be opened, this raises an IOError, but all the error-handling behaviour in backend treats only BlockDeviceErrors. This creates a plain failure in cluster verify and in other RPC calls.
This patch simply converts EnvironmentErrors into BlockDeviceErrors, and...
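A minimal sketch of the conversion described above. The helper name `read_proc_file` and the local `BlockDeviceError` class are stand-ins for illustration, not the project's actual code:

```python
class BlockDeviceError(Exception):
    """Stand-in for the project's block device error class."""


def read_proc_file(path="/proc/drbd"):
    """Return the lines of `path`, converting I/O failures.

    Hypothetical helper: name and signature are illustrative only.
    """
    try:
        with open(path) as handle:
            return handle.read().splitlines()
    except EnvironmentError as err:
        # Re-raise as the error type the callers already know how to handle.
        raise BlockDeviceError("Can't read %s: %s" % (path, err))
```

With this shape, a missing or unreadable file surfaces as the same error class as any other block device problem.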
Convert default root partition to msdos style
As discussed, with 2.0 msdos partition style should be the default in the instance OS, so we're changing the default instance params accordingly. A followup patch will update the debootstrap os.
Reviewed-by: iustinp
RAPI: documentation updates
This patch fixes the version and updates the RAPI resources docs.
RAPI: fixes related to write mode
This patch fixes many small issues related to write functions: - update documentations w.r.t. how to add users - update the instance add function for latest API - add instance delete - fix addition of tags - update some error messages...
Some small improvements to the fake hypervisor
This patch modifies the fake hypervisor to subtract the memory “used” by “running” instances from the free memory, so the actual node information changes based on the running instances.
Also some style changes and fixes are added....
SetInstanceParams: export nic changes to hooks
Currently we export the old instance "as is" and any nic changes get lost, so hooks won't know of a different ip, bridge, or mac address. This patch fixes it by putting the nics in the override dict, if any changes are done....
Remove two fixed FIXME and convert one to TODO
The cli FIXME is not something broken, but rather some better handling feature we'd rather have, and the two backend FIXME are done (disks have their read only parameter set, and the error is raised and thus reaches...
RAPI: format error messages as JSON
This patch changes the format of the HTTP error messages from text/html, which is hard to parse from RAPI clients, to JSON which can be automatically parsed.
The error message is an object, which always contains three keys:...
Make RAPI return 502/504 errors for luxi errors
This changes the RAPI error codes for luxi errors; a timeout error is now reported properly as 504, while any other luxi error is reported as 502.
It would be good to convert even more errors into proper return codes in...
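The mapping described above could look roughly like the following. The exception classes and the function name are hypothetical, and the 500 fallback for non-luxi errors is an assumption not stated in the commit:

```python
class LuxiError(Exception):
    """Hypothetical base class for luxi client errors."""


class LuxiTimeoutError(LuxiError):
    """Hypothetical error raised when the luxi call times out."""


def http_status_for(err):
    """Pick an HTTP status code for an exception raised by a luxi call."""
    if isinstance(err, LuxiTimeoutError):
        return 504  # Gateway Timeout
    if isinstance(err, LuxiError):
        return 502  # Bad Gateway
    return 500  # assumption: anything else stays an internal error
```

The timeout check comes first because the timeout class is a subclass of the generic one.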
job queue: log the opcode error too
Currently we only log "Error in opcode ...", but we don't log the error itself. This is not good for debugging.
Reviewed-by: ultrotter
LUSetInstanceParams: Fix nic handling
- CheckArguments: Use constants.VALUE_NONE rather than hardcoding the string "none"
- If we're adding a nic, fill the nic_dict with default values
- Check if the mac is syntactically valid, if we have one
- Don't allow the mac to be 'auto' when modifying a nic...
ConfigWriter.AddInstance check instance mac
There is a race condition in CreateInstance, since the mac address is generated early and only added to the config (and thus really assured to be unique) only at this point. Since it's possible that another instance...
Instance Creation: generate nics earlier
We want the real nic to be shown to the hooks and the allocators, so we'll generate them in CheckPrereq. We also write a comment about the race condition we generate. This race condition existed even before, so moving this generation will just lengthen it a bit. A separate patch...
Handle better broken disks
While running burnin:
  File "/usr/lib/python2.4/site-packages/ganeti/objects.py", line 497, in __str__
    val += ", size=%dm)>" % self.size
TypeError: int argument required
This happened while handling another error, so we lose the original...
Do not check 'None' disk IDs for duplicates
In case of 'None' logical or physical IDs, we don't need to check them for duplicates. This case can happen for DRBD devices in case of newly added disks, for example.
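As an illustration of the check described above, a duplicate scan that ignores unset IDs might be sketched like this. The function name is hypothetical and the IDs are assumed to be hashable (for example tuples):

```python
def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, ignoring None entries."""
    seen = set()
    duplicates = []
    for item in ids:
        if item is None:
            # Unset IDs (e.g. for newly added disks) are not compared.
            continue
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates
```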
Prevent race condition on MAC addresses
This patch adds a temporary set for MACs that have been requested but are not yet in the configuration (as part of an instance NIC). The MACs of an instance are automatically removed from this set when the instance...
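The idea of a temporary reservation set can be sketched as below. The class and method names are hypothetical, and the sketch assumes the reservation is released when the instance is stored in the configuration, which the truncated message above does not confirm:

```python
class MacReservations:
    """Track MACs that were handed out but are not yet in the config."""

    def __init__(self, configured=()):
        self._configured = set(configured)  # MACs already in the config
        self._temporary = set()             # requested, not yet stored

    def reserve(self, mac):
        """Reserve `mac`, refusing anything already known."""
        if mac in self._configured or mac in self._temporary:
            raise ValueError("MAC %s is already in use" % mac)
        self._temporary.add(mac)

    def commit_instance(self, macs):
        """Assumed release point: the instance's MACs enter the config."""
        for mac in macs:
            self._temporary.discard(mac)
            self._configured.add(mac)

    def pending(self):
        """Return a copy of the MACs that are reserved but not stored."""
        return set(self._temporary)
```

A second request for the same address fails while the first one is still pending, which is the window the race condition lives in.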
Some small fixes
This patch removes the admin_ram LUQueryInstances field (it is broken anyway) and fixes the VNC address checks in the Xen Hypervisor.
Fix LUQueryInstances fields.
The query fields are now regular expressions. We need to quote the dots, otherwise invalid fields will be accepted but they will lose special formatting in the cli scripts.
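A short sketch of why the dots need quoting. The field name `disk.size` and the helper are made up for the example:

```python
import re


def matches_field(pattern, name):
    """Return True if `name` fully matches the field pattern."""
    return re.match("^%s$" % pattern, name) is not None


UNQUOTED = "disk.size"           # '.' matches any single character
QUOTED = re.escape("disk.size")  # '.' only matches a literal dot
```

With the unquoted pattern, a bogus name such as `diskXsize` is accepted; with the quoted one, only `disk.size` matches.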
Apply the right permissions to /etc/hosts
In the current Ganeti version when modifying /etc/hosts we mistakenly give it the permissions of the temporary file we create to define its content, which is by default 0600. This breaks most non-root applications, and thus must be corrected. This patch forces the mode to...
Fix RPC result handling in _AssembleInstanceDisks
For (status, data)-style RPC calls, the result data is in the ‘payload’ attribute. This was missed in the conversion patch, with the only side effect that gnt-instance activate-disks didn't show a nice output...
ConfigWriter: add checks for duplicate disk IDs
This patch adds a safety check for duplicate disk logical/physical IDs, in order to prevent possible software bugs.
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
FileStorage: abort creating over an existing file
In FileStorage there is a
  TODO: decide whether we should check for existing files and abort or not
After Ganeti ate my instance data I decided. Let's abort. In general there is no reason we should overwrite existing files, and...
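One way to get the "abort if the file exists" behaviour is an exclusive create. This is a sketch under that assumption and not necessarily how the patch does it; the function name is hypothetical:

```python
import os


def create_new_file(path, size_bytes):
    """Create `path` with the given size, refusing to touch existing files."""
    # O_EXCL makes the open fail with FileExistsError if the path exists.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    try:
        os.ftruncate(fd, size_bytes)
    finally:
        os.close(fd)
```

Because the existence check and the creation happen in a single system call, there is no window in which another process could create the file in between.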
_GenerateDiskTemplate: correct file disk index
Currently when adding disks the base for the index is not taken into account, and disk 0 is added twice.
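The index problem can be illustrated with a small sketch. The function and the `disk%d` naming are hypothetical and only show the effect of the base offset:

```python
def disk_names(count, base_index=0):
    """Name `count` new disks, starting after the existing ones."""
    return ["disk%d" % (base_index + idx) for idx in range(count)]
```

Without the base, adding one disk to an instance that already has two would again produce index 0; with `base_index=2` it produces index 2.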
HTS_USE_VNC, rename and remove KVM
Currently we use the HTS_USE_VNC constant only to copy the vnc password file. While KVM uses vnc, it currently has no password support, nor will we be on time making one for 2.0, so we are renaming the constant to HTS_COPY_VNC_PASSWORD and only putting Xen HVM in it. In the future...
Some fixes to node add and re-add
The patch changes the pre-checks in node-add and re-add: - if the node is not already in the cluster, refuse to re-add - when re-adding, reuse the secondary IP from the cluster configuration - when re-adding, reset the offline and drained flags, so that RPC...
Instance parameters: force typing
We want all the hv/be parameters to have a known type, rather than a random mix of empty string, boolean values, and None, so we declare the type of each variable and we enforce/convert it.
- Add some new constants for enforceable value types...
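A rough sketch of declaring and enforcing parameter types. The constant names, the accepted boolean spellings and the function name are assumptions for illustration, not the project's definitions:

```python
VTYPE_STRING = "string"
VTYPE_BOOL = "bool"
VTYPE_INT = "int"


def force_types(params, key_types):
    """Convert each value in `params` to its declared type, in place."""
    for key, value in list(params.items()):
        if key not in key_types:
            raise ValueError("Unknown parameter '%s'" % key)
        vtype = key_types[key]
        if vtype == VTYPE_BOOL:
            if isinstance(value, str):
                lowered = value.lower()
                if lowered in ("true", "yes"):
                    value = True
                elif lowered in ("false", "no"):
                    value = False
                else:
                    raise ValueError("Invalid boolean for '%s'" % key)
            else:
                value = bool(value)
        elif vtype == VTYPE_INT:
            try:
                value = int(value)
            except (TypeError, ValueError):
                raise ValueError("Invalid integer for '%s'" % key)
        elif vtype == VTYPE_STRING:
            # Normalise None to the empty string.
            value = "" if value is None else str(value)
        params[key] = value
    return params
```

After the call every value has one known type, so later code does not need to guess between an empty string, False and None.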
Implement modification of the drained flag
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes.
Prevent allocations on drained nodes
This patch adds checks for drained nodes in the logical units that allocate or move instances around. We also update an error message (not style-compliant).
cluster verify: show correctly drained nodes
This patch changes slightly the output of gnt-cluster verify for drained nodes, and also adds a note with the total number of drained nodes (similar to the offline nodes note).
ConfigWriter: handle the drained node flag
This patch changes the master candidate pool computations in ConfigWriter to properly handle drained nodes. They are now excluded from counting towards the reachable number of candidates.
The patch also adds verification of consistency for the node status....
Allow query of the drained node attribute
This patch exports the drained attribute: - LUQueryNodes now accepts the drained field - RAPI exports it for node objects - gnt-node info now shows it (along with the newly-added master_candidate and offline flags)...
Add a ‘drained’ attribute to node objects
This attribute will be used to prevent any allocation on the node (any of replace-disks with this node as the new secondary, failover to the node, migration to the node).
The patch adds the attribute and initializes it correctly in cluster...
Some error message cleanups
Cleanup of DRBD8._CheckMetaSize
This patch converts the _CheckMetaSize method to raise exceptions instead of logging and returning False. This fits now in the new rpc return types, so it's a cheap change.
Change the disk assembly to raise exceptions
This big patch converts the bdev Assemble() methods and the supporting functions to raise exceptions instead of returning False. This is a big patch, since the assembly functions touch other functions: add children,...
Change BlockDev.Remove() failure result
Currently, the Remove() methods of block devices return True/False. This doesn't permit any error detail reporting.
This patch changes the return type to None for success, and raises BlockDeviceError in case of failure. This permits the details to be...
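The change in calling convention can be sketched as follows. `FakeDevice` and `remove_and_report` are invented for the example, and `BlockDeviceError` is a local stand-in for the project's class:

```python
class BlockDeviceError(Exception):
    """Stand-in for the project's block device error class."""


class FakeDevice:
    """Toy device whose removal can be made to fail."""

    def __init__(self, name, busy=False):
        self.name = name
        self.busy = busy

    def Remove(self):
        """Return None on success, raise BlockDeviceError on failure."""
        if self.busy:
            raise BlockDeviceError("Can't remove %s: device is busy" % self.name)
        return None


def remove_and_report(device):
    """Turn the exception into a (status, data) pair for the caller."""
    try:
        device.Remove()
    except BlockDeviceError as err:
        return (False, str(err))
    return (True, None)
```

Compared with a bare True/False, the failure case now carries the reason along with it.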
Switch the blockdev_remove rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data) implementation of the blockdev_remove rpc call. bdev.py is not yet converted, so we don't actually have error information.
We also fix a bug in _RemoveDisks by not reusing a variable....
Change BlockDev.Shutdown() failure result
Currently, the Shutdown() methods of block devices return True/False. This doesn't permit any error detail reporting.
Switch the blockdev_shutdown rpc to (status, data)
This converts the backend and cmdlib modules to a (status, data) implementation of the blockdev_shutdown rpc call. bdev.py is not yet converted, so we don't actually have error information.
We also fix a bug in _ShutdownInstanceDisks by not reusing a variable....
Convert blockdev_assemble rpc to (status, data)
This converts the RPC call blockdev_assemble to the new-style result format. Note that we won't usually have error information, but it's the first step toward it.
RAPI: fix a pylint warning
Child classes of _R_TAGS must define TAG_LEVEL, but for good style let's define it also here to at least ensure we don't get an 'Unknown attribute' exception.
Of course, this also silences a pylint warning.
Reviewed-by: amishchenko
LUSetInstanceParams: use the correct hvparams
In LUSetInstanceParams we used to save the dict without defaults for the instance params as hv_inst, but to use the populated one for the instance (hv_new). Fixing this leads to instances without all the parameters set....
KVM: Correct CheckParameterSyntax docstring
The comment is not really true anymore, as we have a lot of parameters nowadays.
KVM: Fix _CallMonitorCommand error message
1) Only instance_name is available
2) There was a missing string parameter
KVM: Add usb mouse type parameter
In some cases 'mouse' may work better than 'tablet', so we'll handle both by allowing the user to specify a parameter. By default no mouse is used.
KVM: allow netboot
With this patch we allow KVM instances to be booted off the network. The only issue is that this is not compatible with virtio nics, so we disallow them when booting from the net.
KVM: actually support different nic types
When executing the KVM runtime we load the nic type from the runtime hvparams and use it to specify the nic model type. As for the disk we translate the DEV_PARAVIRTUAL type to 'virtio'.
KVM: export hvparams in the runtime
They'll be used to set the nic type when we execute the runtime, since the nics are processed later. We need to save the hvparams because we want to use the same one as when we saved the runtime, rather than use the current instance ones, to avoid applying only some changed...
KVM: actually support different disk types
By passing the relevant if= value to the disk we support different disk types. The only change is that we'll translate "paravirtual" to "virtio" to keep only one "paravirtualized" value around ganeti. The if= value is calculated outside the disks loop, as it's the same for all...
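The translation and the "compute once, outside the loop" point can be sketched like this. The function names are hypothetical and the generated `-drive` arguments are simplified to just the `file=` and `if=` parts:

```python
def disk_if_value(disk_type):
    """Map the cluster-level disk type to the value used in if=."""
    if disk_type == "paravirtual":
        return "virtio"
    return disk_type


def drive_args(disk_paths, disk_type):
    """Build simplified -drive arguments for all disks of an instance."""
    # Same value for every disk, so it is computed once before the loop.
    if_val = ",if=%s" % disk_if_value(disk_type)
    args = []
    for path in disk_paths:
        args.extend(["-drive", "file=%s%s" % (path, if_val)])
    return args
```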
Xen-HVM: Improve the invalid disk/nic type error
Copy the message from the KVM one, adding a missing 'the' and a list ofpossible values, to help the user in his decision.
KVM: parameters for different disk and nic types
- Add a bunch of NIC and disk types
- Specify which ones are valid disks and nics for KVM (the new ones together with some of the old ones)
- Add the default values (paravirtual)
- Allow the disk and nic types as parameters and check their validity...
Rename the device type constants
These are not HVM specific, so have been given an HT generic name.
s/HT_HVM_VNC_BASE_PORT/VNC_BASE_PORT/g
The VNC base port has nothing to do with HVM itself, and is general to VNC itself, so we're removing the HT_HVM prefix from the constant.
Add a new instance query flag ‘disk_usage’
This patch adds a new instance query flag called disk_usage that retrieves the overall space used by an instance on each of its nodes. This can be used when balancing the cluster or checking N+1 status.
The flag is also exported in RAPI. Note the flag is currently broken for...
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actually RPC procedures and are called from ganeti-noded do not correspond to the RPC names. This makes it hard to actually see which functions are...
bdev: add and use two utility functions
This patch adds two utility functions for raising BlockDeviceError exceptions and for running functions while ignoring this error. Most of the manual “raise errors.BlockDeviceError” cases are converted to _ThrowError, as this makes the code clearer....
rpc.call_blockdev_find: convert to (status, data)
This patch converts the call_blockdev_find - which searches for block devices and returns their status - to the (status, data) format. We also modify the backend function name to match the rpc call.
Export the cpu nodes and sockets from Xen
This is a hand-picked forward patch of commit 1755 on the 1.2 branch (hand-picked since the trees diverged too much since then):
The patch changed the xen hypervisor to compute the number of cpu sockets/nodes and enables the command line and the RAPI to show this...
Fix handling OS errors in AddOSToInstance
This patch fixes the error handling in the add OS to instance function with regard to invalid OSes. Previously, we didn't handle any such errors, with the end result that the user would have to look in the node daemon log....
backend.DrbdAttachNet: don't ignore Open() errors
Currently the return value or errors from the block device Open() method are ignored. This patch catches any BlockDeviceErrors and returns a well-formatted result.
cmdlib: simplify some rpc error handling cases
By using the RemoteFailMsg() or the payload field of RpcResult, we can simplify a few functions in cmdlib.
RpcResult: add a new payload field
For results which use the (status, payload) response type, it's easier to define a ‘payload’ field on the result holding the payload than to extract it using “data[1]” in the caller code.
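A minimal sketch of such a field. The class below is a simplified stand-in and not the real RpcResult; it assumes the raw result is stored in a `data` attribute:

```python
class RpcResult:
    """Simplified stand-in for an RPC result object."""

    def __init__(self, data):
        self.data = data
        # For (status, payload) results, expose the second element directly.
        if isinstance(data, (tuple, list)) and len(data) == 2:
            self.payload = data[1]
        else:
            self.payload = None
```

Caller code can then read `result.payload` instead of indexing into the raw data.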
LUCreateInstance: only set running flag at the end
In lockless queries, it's better if we see the instance in ADMIN_down rather than ERROR_down during the time it's installed. As such, we change the LU to only mark the instance 'up' at the time we are ready to...
KVM: don't boot from a virtio cdrom
Apparently it's not supported. Also add -boot command line parameters to kvm, since they seem to help booting from the right place. Everything will still only work when not using a kernel, but well... :)
KVM: don't boot from cdrom with no cdrom
Support cdrom image and boot order for KVM
The cdrom image has the same meaning as in Xen HVM, and so does boot_order, even though it has a slightly different syntax, and uses the value 'disk' to boot from disk and 'cdrom' to boot from cdrom.
Get rid of constants.HT_HVM_DEFAULT_BOOT_ORDER
Confusingly, as a leftover from 1.2, there was a constants.HT_HVM_DEFAULT_BOOT_ORDER constant, with a value opposite to the default HV_BOOT_ORDER hv param that got enabled only if HV_BOOT_ORDER was set to None. Since setting it to None is very...
Fix rapi job listing
This patch fixes a couple of issues with the job listing: - in case of a non-existing job, nicely raise 404 instead of 500 - in the job detail listing, also list the job log, the job timestamps, etc. - the opcode migrate instance was missing its description field...
KVM: add VNC TLS and X509 parameters
With these parameters VNC for KVM is able to be protected by tls, optionally with an x509 certificate, and optionally verifying the client as well. Additionally in this patch we limit the bind address to being a directory, rather than a file or a directory, for simplicity, as...
KVM: allow binding vnc to a file
Before we forced the VNC_BIND_ADDRESS to be an ip. Now we also accept a path, and bind the instance to it, or to a file in it if it's a directory.
Fix some issues for lockless queries
This patch converts some more jobs with only queries into cheaper luxi queries (no job created), and fixes some fallout from the lockless queries changes.
Revive RAPI QA tests for 2.0-style RAPI
This patch fixes the RAPI QA tests to work with today's RAPI code and also does some other minor improvements: - QA: only create the cluster if so configured (‘create-cluster’ key), this allows running parts of the QA suite against existing clusters...
rapi: fix 'bulk' processing and add locking option
This patch fixes the 'bulk' parameter (before any non-empty specification was considered True, in conflict with the documentation, i.e. bulk=0 still did bulk queries).
The patch also adds optional locking on the instance/node listing (does...
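The 'bulk' problem described above is the usual truthiness trap with query strings: any non-empty string, including "0", is true. A sketch of a stricter parse, using a hypothetical helper on top of the standard `urllib.parse.parse_qs`:

```python
from urllib.parse import parse_qs


def parse_bool_flag(query, name, default=False):
    """Interpret a query-string flag numerically instead of by truthiness.

    `query` is a dict as returned by parse_qs; a non-numeric value
    raises ValueError.
    """
    values = query.get(name)
    if not values:
        return default
    # bool("0") would be True; converting to int first gives False.
    return bool(int(values[0]))
```

For example, `parse_bool_flag(parse_qs("bulk=0"), "bulk")` yields False, while the naive `bool("0")` yields True.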
rapi: cleanup and update to latest 2.0 API
This patch cleans up and updates the RAPI interface: - queries are changed to luxi queries instead of jobs, where possible - since we changed the API version, we remove the old-style attributes (sda_size, ip, etc.) and replace them with 2.0 style...
Enable lockless node queries
Similar to the instance list, this patch enables lockless node queries. “gnt-node list” now accepts the “--sync” flag, which enables locking; the default is lockless.
rapi: fix authentication and queries
For queries, we don't want to require authentication. We fix this by adding an override GetAuthRealm in the rapi daemon.
We also fix a method name.
Add one new luxi query: cluster info
This is the last query that RAPI executes via opcodes and is purely static (config values only). As such, we can convert it safely to a query instead of job.
ssconf: add some more keys and some fixes
This patch adds the online node list and instance list to the ssconf keys. In order to distribute the instance list correctly, we need to update the cluster serial number on instance additions and removals.
The patch also changes the permissions on the ssconf files to be 0444:...
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job).
The framework is implemented as follows:...
KVM: Make GetAllInstancesInfo concurrency-safe
Or actually more so. If this function gets called while instances get shut down, it might try to report information on instances which don't exist. Try to fail gracefully if that happens, by just skipping an...
Correct a typo in ReadPidFile's docstring
An attempt at fixing some encoding issues
This patch unifies the hardcoded re-encoding attempts into a single function in utils.py. This function is used to take either a unicode or str object and convert it to an ASCII-only str object which can be safely...
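The original targets Python 2's unicode/str split. As a rough modern analogue only, the same idea in Python 3 (text or bytes in, ASCII-only text out) might be sketched like this; the function name is an assumption and the escaping style is one possible choice:

```python
def safe_encode(text):
    """Return an ASCII-only str for either a str or a bytes object."""
    if isinstance(text, bytes):
        # Non-ASCII bytes become literal \xNN escape sequences.
        return text.decode("ascii", "backslashreplace")
    # Non-ASCII characters become \x, \u or \U escape sequences.
    return text.encode("ascii", "backslashreplace").decode("ascii")
```

Having a single function means every caller escapes unsafe characters the same way instead of repeating ad-hoc re-encoding attempts.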
Small patch for handling errors in node add
This small patch hopefully fixes the handling of ssh verify errors in node add (note: untested).
ssh: more details on failure
In case we fail without output from the ssh command, we should at least add the exit code or any other failure reason to the error message, and log it and the cmdline used to the node daemon log.
Give a sane permission to the known_host file
A couple of small changes to the OS environment
This patch correctly exports the mode of disks (rw/ro) and also exports the instance OS.
Whitespace change: bad indentation in constants.py
This patch only changes some indentation in constants.py.
Return error messages in node add ssh handling
When the rpc call node_add fails, we don't have any error message. This patch changes the call to return (status, data) so that the user can see the correct error message.
LUQueryClusterInfo: filter hvparams
We don't need to show hvparams for hypervisors which are not enabled on the cluster.
KVM: advise about VNC support on GetShellCommand
KVM: enable VNC if a VNC_BIND_ADDRESS is defined
We'll also enable a tablet usb device, as suggested by the kvm man page.
KVM: Allow the HV_VNC_BIND_ADDRESS parameter
LUAddNode: copy the vnc password file also for KVM
Before we used to copy the file if xen-hvm was enabled on the cluster; now we'll do that if any enabled hypervisor is in the new HTS_USE_VNC group.
Add HT_KVM to HTS_REQ_PORT
HT_KVM doesn't technically require a port, but if it has one it can give vnc displays to instances.
KVM: make the kernel and initrd arguments optional
Under KVM we don't strictly need a kernel and initrd. If some are passed we'll use them, otherwise the guest OS will need to behave as fully native, and have its own boot loader and kernel. The root_path hypervisor parameter becomes mandatory only if a kernel is...
KVM: add the HV_SERIAL_CONSOLE parameter
Up until now a KVM instance was forced to have a serial port. With this change this is no longer mandatory: by default we'll use one, but if the HV_SERIAL_CONSOLE parameter is set to False we'll do without.