RAPI: rlib1 removal
The resources we still need moved to rlib2.
Reviewed-by: iustinp
RAPI: Implement /2 resource
RAPI: Deprecate RAPI version 1
It is impossible to keep backward compatibility due to significant changes in the Ganeti core.
Fix gnt-cluster modify -H and offline nodes
Reviewed-by: ultrotter
Actually mark drives as read-only if so configured
This patch correctly marks the drives as read-only for Xen, and raises an exception for KVM since it doesn't support read-only drives.
Fix some issues related to job cancelling
This patch fixes two issues with the cancel mechanism: - cancelled jobs show as such, and not in error state (we mark them as OP_STATUS_CANCELED and not OP_STATUS_ERROR) - queued jobs which are cancelled don't raise errors in the master (we...
Xen: use utils.WriteFile for the instance configs
Also raise HypervisorError rather than OpExecError.
Xen: use utils.ReadFile to read the VNC password
Implement disk verify checks in config verify
This patch adds a simple check that the 'mode' attribute of top-level disks is correct. It does not recurse over children.
The framework could be extended with other checks in the future.
Reviewed-by: imsnah
Fix the mode attribute of newly-created disks
Currently, only the LUSetInstanceParams correctly sets up the mode attribute via a manual operation. We remove this and instead do the correct setting in the generic _GenerateDiskTemplate function, so that we set the mode correctly for all disk creations....
Rework the multi-instance gnt commands
This patch changes the multi-instance gnt-* commands (gnt-instance start/stop, gnt-node evacuate/failover) such that the individual operations are submitted in parallel, ideally improving the speed of the execution....
Fix single-job archiving (gnt-job archive)
This is simply a typo from the conversion to multi-job archiving.
KVM and Xen: add the HV_ROOT_PATH parameter
This parameter allows a different path to be passed to the instance kernel. The new parameter is mandatory, and defaults to the old hardcoded value for both kvm and xen.
Beta1 clusters will need to have this parameter added for their...
KVM: implement GetShellCommandForConsole
This is a class method, because it calls _InstanceSerial, which is another class method. The patch changes it to classmethod for all the hypervisor classes.
KVM: classify _Instance{Monitor,Serial,KVMRuntime}
Those methods need nothing from the instantiated class, and just manipulate strings and fetch some class global variables, so they can be classmethods.
Xen and KVM: correct a typo when checking args
A 'be' was missing from the error string for both xen and kvm, when the kernel or initrd path was not absolute.
Fix batcher for 2.0-style disks and nics
This patch fixes the gnt-instance batch-create command, and in doing so also slightly changes two other functions: - we change utils.ParseUnit so that it accepts integer values also (both ParseUnit(5) and ParseUnit("5") return the same value)...
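The ParseUnit change described above can be illustrated with a minimal sketch. This is not the code from utils.py: the function name parse_unit, the suffix table and the choice of mebibytes as the base unit are assumptions made for the example.

```python
import re

# Hypothetical stand-in for illustration only, not Ganeti's utils.ParseUnit.
# Assumption: sizes without a suffix are mebibytes.
_UNIT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$")
_MULTIPLIERS = {"": 1, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(value):
    """Return a size in mebibytes from an int or a string such as "5" or "4G"."""
    # Converting to str first is what lets integers and strings share one path.
    match = _UNIT_RE.match(str(value))
    if not match:
        raise ValueError("Invalid size: %r" % (value,))
    number, suffix = match.groups()
    if suffix.lower() not in _MULTIPLIERS:
        raise ValueError("Unknown unit: %r" % (suffix,))
    return int(float(number) * _MULTIPLIERS[suffix.lower()])
```

With this sketch, parse_unit(5) and parse_unit("5") take the same code path, which is the property the patch describes.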
Make iallocator work with offline nodes
This patch changes the iallocator framework to work with and properly export to plugins offline nodes. It does this by only exporting the static configuration data for those nodes, and not attempting to parse the runtime data....
Remove checking of DRBD metadata for validity
Currently the DRBD code checks that the metadata devices are valid before creation, initial disk attachment and add children.
However, the process for checking validity requires a free DRBD minor, and this conflicts with parallel checking....
Relax the restrictions on temporary DRBD minors
Currently the restrictions are too harsh: there is a time interval, between when an instance gets a new disk and when it is added to the configuration, in which the restriction is not met. We solve this by allowing temporary DRBD minors to match existing minors (for the same...
Introduce more configuration consistency checks
This patch enhances the duplicate DRBD minors checks (currently just a few) and adds automatic checks of configuration consistency at configuration file writing time.
In order to do so and show meaningful error messages, the...
Fill the 'call' attribute of offline rpc results
When creating ‘fake’ results for offline nodes, we currently don't pass the call attribute. This complicates debugging, so even though this should not matter in practice, it's better to fix it.
A couple of small fixes to iallocator
This removes some constraints: - only two disks supported, this is no longer true as the underlying functions can now compute size for a variable number of disks - error when the hypervisor was not being passed...
luxi: close and reopen the socket on errors
This is less of an actual issue for regular gnt-* clients, but it's easily reproducible with burnin and possible with RAPI (depending on how the program uses luxi.Client(s)).
In case of burnin, if we interrupt the client (^C) while it polls the...
ShutdownInstance: log instance name, not object
When an instance fails to shut down we currently log its whole object, rather than just the instance name.
KVM live migration: handle failure
If the KVM live migration ends up in a 'failed' state it has been aborted at the kvm level, and the machine is still running locally. We also support the 'cancelled' state even though there should be no way of reaching it without manual intervention....
KVM: change a few IOError with EnvironmentError
KVM: instance migration
The tcp port used for migrating KVM instances is selectable at ./configure time. We use a single port as nodes are locked anyway during a migration, so no two migrations can happen at the same time to the same node.
KVM: add the _InstancePidAlive function
Throughout the kvm code we very often look for the instance pid filename, read it, and check if the process is alive. Abstract this into a private function and use that one instead.
This patch also changes RebootInstance to check whether the instance is...
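The "read the pid file, then check the process" pattern that this patch abstracts can be sketched roughly as follows. The function name instance_pid_alive and its (pid, alive) return shape are assumptions for illustration and are not taken from the patch; the liveness probe uses signal 0, which assumes a POSIX system.

```python
import errno
import os


def instance_pid_alive(pidfile):
    """Return (pid, alive) for the process recorded in pidfile.

    Hypothetical helper: a missing or unparsable pid file counts as
    "not alive" and yields (None, False).
    """
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (EnvironmentError, ValueError):
        return (None, False)
    if pid <= 0:
        return (None, False)
    try:
        # Signal 0 performs the existence/permission check without
        # delivering any signal to the process.
        os.kill(pid, 0)
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return (pid, err.errno == errno.EPERM)
    return (pid, True)
```

Callers that previously repeated these steps inline would then only need to unpack the returned pair.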
KVM: fix RebootInstance
RebootInstance was broken, because it just used to call StartInstance with wrong parameters. With this patch we still stop the instance, but use the saved kvm runtime to start it again.
KVM: retry the instance shutdown command
When we ask the instance to shut down, sometimes the command won't work, especially if the instance isn't fully booted up. We'll wait for a bit, and give it a few chances before giving up.
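A minimal sketch of such a bounded retry loop, assuming two caller-supplied callables (send_shutdown and is_alive); these names, the attempt count and the delay are hypothetical and do not come from the patch.

```python
import time


def retry_shutdown(send_shutdown, is_alive, attempts=5, delay=1.0,
                   sleep=time.sleep):
    """Ask an instance to shut down, retrying a bounded number of times.

    Returns True if the instance is no longer alive, False if we gave up.
    """
    for _ in range(attempts):
        if not is_alive():
            return True
        # The command may be ignored, e.g. while the guest is still booting,
        # so send it again on each attempt and wait before re-checking.
        send_shutdown()
        sleep(delay)
    return not is_alive()
```

Passing the sleep function as a parameter is only a convenience for testing the loop without real delays.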
Xen: implement auxiliary migration functions
These are used, for the xen hypervisor, to copy the xen config file to the remote node. This breaks migration for instances which have been migrated, but not restarted, with the old code, for which the config file was just lost....
Automatically release DRBD minors on success
This patch converts the DRBD minors reservation protocol from explicit release to automatic release on the success paths. On the error paths, a manual release is still needed.
The patch doesn't bring much by itself, but is needed for a future patch...
Fix some more pylint errors
Two are real errors (invalid names) and one is a style error (overriding name from outer scope).
Log the rpc call name in the RPC errors message
Currently the rpc module logs the error description and target node in rpc calls logging, as such:
2009-01-21 00:50:01,456: pid=1051/Thread-21 ERROR RPC error from node node1.example.com: Connection failed (111: Connection...
Change the instance status attribute to boolean
Due to historic reasons, the “should run or not” attribute of an instance was denoted by its “status” attribute having a string value of either ‘up’ or ‘down’. Checking this in code was done via hardcoding...
Implement the new live migration backend functions
MigrationInfo, AcceptInstance and AbortMigration are implemented as hypervisor specific functions, and by default they do nothing (as they're not always necessary).
This patch also converts hv_base.MigrateInstance docstring to epydoc,...
KVM: save and remove the KVM runtime
At instance startup time we save the kvm runtime, and at stop time we delete it. This patch also includes a function to load the kvm runtime, which is not used yet.
KVM: split KVM runtime generation and startup
Before, we used to generate the kvm command line and then just run it. With this patch we split the generation from the time it is run, allowing us to save it and replay it at reboot.
We must take special care about instance nics:...
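The idea of separating "generate" from "run", so that the generated data can be saved and replayed, can be sketched as below. Everything here is an assumption for illustration: the function names, the dictionary layout, the JSON serialization and the two command-line options shown are not taken from the patch, and NIC handling is reduced to carrying a list along.

```python
import json


def generate_runtime(name, memory_mb, nics):
    """Build everything needed to start the instance, without starting it."""
    command = ["kvm", "-name", name, "-m", str(memory_mb)]
    return {"command": command, "nics": list(nics)}


def save_runtime(path, runtime):
    """Persist the runtime so a later reboot can replay it unchanged."""
    with open(path, "w") as handle:
        json.dump(runtime, handle)


def load_runtime(path):
    """Read back a runtime previously written by save_runtime."""
    with open(path) as handle:
        return json.load(handle)


def execute_runtime(runtime, run):
    """Start the instance from a generated or reloaded runtime.

    `run` is a caller-supplied callable (for example subprocess.call).
    """
    return run(runtime["command"])
```

Because execute_runtime only consumes the saved data, a reboot can restart the instance with the parameters it was originally started with, even if its configuration has changed since.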
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from the source side. With this patch we also add the option of passing some information to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Update the objects.Disk formatting method
With the addition of minors, this needs to show them too.
KVM: add a _CONF_DIR
Currently we keep pid files and control files. In the conf dir we'll also keep the data to start the instance anew, and the network interface scripts. These will then be copied to a separate area (since _CONF_DIR could be mounted 'noexec') and used to start the instance....
KVM: Remove sockets after shutdown
Abstract the monitor and serial socket naming in two functions, and reuse them to cleanup the files after shutdown.
KVM: fix class docstring
Xen: use epydoc in MigrateInstance docstring
ShutdownInstance: report hypervisor error
When StopInstance raises a HypervisorError, report it in the logged message to ease debugging.
ConfigObject docstring, close an open parenthesis
Fix a typo in luxi's docstring
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) the logical execution thread of jobs. This is due to the fact that we don't log the thread name (so we lose the association of log messages to jobs)...
.gitignore: Don't exclude whole /autotools/ dir, but only files
This way newly added files will not be excluded by default. Also fixes a small whitespace error in utils.py.
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Fix adding of disks to an instance
The ConfigWriter.AllocateDRBDMinor requires the instance name, not the instance object. The LUSetInstanceParams wrongly passes the instance object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter....
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster verify: minors which belong to running instances and should be online but are not, and minors which do not belong to any instance but are in...
Fix a couple of epydoc warnings
DRBD: check for in-use minor during Create
In order to prevent errors with old, in-use DRBD minors, we check and abort at create time if our minor is already in use. For this we need to also modify DRBD8Status to be able to parse cs:Unconfigured devices....
Add a TailFile function
This patch adds a tail file function, to be used for parsing OS installation failures and returning them in the job log.
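A tail helper of this kind might look like the sketch below. The name tail_file, the default of 20 lines and the use of a bounded deque are assumptions for the example, not the implementation added by the patch.

```python
from collections import deque


def tail_file(path, lines=20):
    """Return the last `lines` lines of a text file, without newlines.

    Hypothetical helper: the whole file is streamed once, but only the
    most recent `lines` entries are kept in memory.
    """
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

A caller reporting an OS installation failure could then attach tail_file(logfile) to the error message instead of the full log.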
Some small fixes in cmdlib
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully) relevant log files from the OS create scripts.
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the job log (and thus to the user).
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style result, but _CreateBlockDevOnSecondary was forgotten. After the merger of the two functions, _CreateBlockDevOnSecondary was taken as template...
Move the default MAC prefix to the constants file
Instead of having the default live in the gnt-cluster script, we move it to the constants file. The patch also fixes a typo in constants.py.
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] + list(instance.secondary_nodes) (or similar usage) with the new instance.all_nodes.
Fix non-drbd instance creation
Commit 2294 introduced a new instance.all_nodes property, which unfortunately is working incorrectly for non-drbd instances.
This patch fixes it by making sure the primary node is always added to the set, even before recursing over (any potential) children....
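The behaviour described above (seed the result with the primary node, then recurse over the disks) can be sketched as follows. The Disk and Instance classes here are simplified, hypothetical stand-ins, not the classes from objects.py; in particular the `nodes` attribute on a disk is an assumption made to keep the example short.

```python
class Disk:
    """Hypothetical disk: `nodes` is empty for devices that name no nodes."""

    def __init__(self, nodes=(), children=()):
        self.nodes = tuple(nodes)
        self.children = list(children)


class Instance:
    def __init__(self, primary_node, disks):
        self.primary_node = primary_node
        self.disks = list(disks)

    @property
    def all_nodes(self):
        # Seed with the primary node so that instances whose disks name
        # no nodes at all (the non-drbd case) still report it.
        nodes = [self.primary_node]

        def collect(disk):
            for node in disk.nodes:
                if node not in nodes:
                    nodes.append(node)
            for child in disk.children:
                collect(child)

        for disk in self.disks:
            collect(disk)
        return tuple(nodes)

    @property
    def secondary_nodes(self):
        # Derived from all_nodes, as the all_nodes commit describes.
        return tuple(n for n in self.all_nodes if n != self.primary_node)
```

In this sketch an instance with only plain disks yields a one-element all_nodes and an empty secondary_nodes.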
Small simplification in MapLVsByNode
We don't need to pre-create the node entries in lvmap, since they will be created at recursion time.
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all. The replace secondary first creates (manually) new LVs to ensure storage is there, and then it creates the new DRBD. At this point, we need a...
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we combine them so as to have less code duplication. This will be needed in the future as we will need to split off the recursive part....
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node daemon logs.
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to the job log, just having True/False return from the disk creation function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)...
Block device creation cleanup
Currently when creating LVM-based instances, we always get the extremely-confusing message "ERROR Can't find LV /dev/xenvg/..." which is actually expected. This behaviour was introduced before we had UUID-style LV names, since at that point it was not unexpected to have...
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of a DRBD disk. This is confusing as it's hard to associate the two in the output of “lvs” or “gnt-node volumes”.
The patch changes this so that they use the same prefix....
Fix LUExportInstance
Due to deficiencies in our block device implementation, we must call SetDiskID on disks before passing them to remote nodes. Since in export/import, we don't touch the disks themselves, this was not needed before in this function....
Instance: add a new all_nodes property
Since we often need the list of all nodes of an instance, we add a new "all_nodes" property that returns all nodes of the instance, and we switch secondary_nodes to a simpler implementation based on this new function....
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the short one.
Some docstring updates
This patch rewraps some comments to shorter lengths, and changes double-quotes to single-quotes inside triple-quoted docstrings for better editor handling.
It also fixes some epydoc errors, namely invalid crossreferences (after method rename), documentation for nonexistent (removed) parameters, etc....
ganeti-noded: reduce log noise
The source port/addr is currently logged three times for each connection, and this is unnecessary. We change two log entries to debug, since they are useful for precise timing, and we keep only one at INFO level.
Forward port the live migration from 1.2 branch
This is a forward port via copy (and not individual patches cherry-pick) of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and therefore call_blockdev_find) is not modifying the devices anymore, we need to switch this LU to the new call_drbd_disconnect_net and call_drbd_attach_net functions....
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and its associated RPCs. In Ganeti 2.0, these functions will be used for two things: - live migration (as in 1.2) - and for other network reconfiguration tasks, since DRBD8.Attach()...
backend: rename AttachOrAssemble to Assemble
Since the Assemble function is now different from Attach, we rename this backend function to show that the intent is to fully assemble the device (and it's always allowed to modify the device).
drbd: change the semantics of Attach vs. Assemble
Currently, both the Attach and Assemble methods for DRBD8 devices will use and alter the device state. This is suboptimal, and it has been worked around in 1.2 via a special cache in the node daemon so that we don't...
bdev: Do not call Assemble() on children
The caller of dev.Assemble() (backend._RecursiveAssembleBD) is doing an explicit recursion over all the children of the device, with better error reporting. As such, we don't need this repeated assembly inside the base BlockDev class....
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviewed-by: imsnah
Increase resync speed to 60MB/s
This is a forward-port of commit 2219 on the 1.2 branch.
Skip offline nodes in gnt-cluster commands
This patch makes gnt-cluster copyfile and command skip the offline nodes.
Reviewed-by: ultrotter, imsnah
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches): - we don't correctly check the result type - rename a variable to prevent a conflict
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData has been broken since commit 1642.
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file.
Add a new ssconf file with the ganeti version
The patch adds a new ssconf file containing the ganeti version.
Work around a DRBD sync speed race condition
This is a modified forward-port of commit 1544 on the 1.2 branch:
When DRBD is doing its dance to establish a connection with its peer, it also sends the synchronization speed over the wire. In some cases setting the sync speed only after setting up both...
burnin: use the new replace_disks constants
This patch updates burnin to the latest replace disks constants, and changes the constants' values to be more accurate.
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds a utility function to ConfigWriter that returns the names of online nodes and uses it in LUDiagnoseOS to query only the good nodes.
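A utility of the kind described might be sketched as below. The classes and method names (Node, ConfigSketch, get_online_node_list) are hypothetical and only mirror the description; they are not the ConfigWriter API.

```python
class Node:
    """Hypothetical node record with just the fields the sketch needs."""

    def __init__(self, name, offline=False):
        self.name = name
        self.offline = offline


class ConfigSketch:
    """Hypothetical stand-in for the part of the config that lists nodes."""

    def __init__(self, nodes):
        self._nodes = {node.name: node for node in nodes}

    def get_node_list(self):
        """Names of all configured nodes."""
        return sorted(self._nodes)

    def get_online_node_list(self):
        """Names of the nodes not marked offline, safe to send RPCs to."""
        return sorted(name for name, node in self._nodes.items()
                      if not node.offline)
```

A query such as the OS diagnose one would then iterate over get_online_node_list() rather than get_node_list().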
Silence warning on node list for offline nodes
The warning in node list is meant for nodes that return wrong information, but for offline nodes this case is normal.
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard to work around: - logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replace modes, and the new drbd8 modes were forced into this syntax, with some complicated rules of transition from one mode to another (if REPLACE_ALL...
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST and NV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
rpc: Add a method for easy check of remote results
The patch adds a new method to the rpc.RpcResult class called "RemoteFailMsg" which is useful for the RPC calls which return a (status, payload) style result.
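How a helper like this can simplify callers of (status, payload) style RPCs is sketched below. The class SketchRpcResult, its attributes and the exact messages are assumptions for illustration, not the rpc.RpcResult implementation.

```python
class SketchRpcResult:
    """Hypothetical result wrapper for calls returning (status, payload)."""

    def __init__(self, data=None, failed=False, node=None):
        self.data = data
        self.failed = failed  # True when the node could not be reached
        self.node = node

    def remote_fail_msg(self):
        """Return None on success, otherwise a human-readable error string."""
        if self.failed:
            return "Communication failure with node %s" % self.node
        if not isinstance(self.data, (tuple, list)) or len(self.data) != 2:
            return "Invalid result type from node %s" % self.node
        status, payload = self.data
        if status:
            return None
        return str(payload) if payload else "No error information"
```

A caller can then write `msg = result.remote_fail_msg()` and test `if msg:` once, instead of checking the transport failure, the result shape and the status flag separately.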
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that it has been started with symlinks. We currently have no on-secondary checks, nor any hypervisor specific call....
bdev: forward-port ReAttachNet/DisconnectNet
This is a plain copy of the 1.2 ReAttachNet and DisconnectNet methods on the DRBD8 device, with the logger to logging module changes and the ReAttachNet method renamed to AttachNet.
These methods are not used anywhere right now, but will be used for...
backend: Remove symlinks by disk name
This is a modified forward-port of commit 1184 on the 1.2 branch:
backend: Remove symlinks by disk name, not using a wildcard
The changes to the original patch are related to the docstring style and...
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call for drbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...