Log the rpc call name in the RPC errors message
Currently the rpc module logs the error description and target node in rpc calls logging, as such:
2009-01-21 00:50:01,456: pid=1051/Thread-21 ERROR RPC error from node node1.example.com: Connection failed (111: Connection...
Change the instance status attribute to boolean
Due to historic reasons, the “should run or not” attribute of an instance was denoted by its “status” attribute having a string value of either ‘up’ or ‘down’. Checking this in code was done via hardcoding...
Implement the new live migration backend functions
MigrationInfo, AcceptInstance and AbortMigration are implemented as hypervisor specific functions, and by default they do nothing (as they're not always necessary).
This patch also converts hv_base.MigrateInstance docstring to epydoc,...
KVM: save and remove the KVM runtime
At instance startup time we save the kvm runtime, and at stop time we delete it. This patch also includes a function to load the kvm runtime, which is not used yet.
Reviewed-by: iustinp
KVM: split KVM runtime generation and startup
Previously we generated the kvm command line and then just ran it. With this patch we split the generation from the time it is run, allowing us to save it and replay it at reboot.
We must take special care about instance nics:...
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from the source side. With this patch we also add the option of passing some information to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Update the objects.Disk formatting method
With the addition of minors, this needs to show them too.
Reviewed-by: ultrotter
KVM: add a _CONF_DIR
Currently we keep pid files and control files. In the conf dir we'll also keep the data to start the instance anew, and the network interface scripts. These will then be copied to a separate area (since _CONF_DIR could be mounted 'noexec') and used to start the instance....
KVM: Remove sockets after shutdown
Abstract the monitor and serial socket naming in two functions, and reuse them to cleanup the files after shutdown.
KVM: fix class docstring
Xen: use epydoc in MigrateInstance docstring
ShutdownInstance: report hypervisor error
When StopInstance raises a HypervisorError, report it in the logged message to ease debugging.
ConfigObject docstring, close an open parenthesis
Fix a typo in luxi's docstring
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) the logical execution thread of jobs. This is due to the fact that we don't log the thread name (so we lose the association of log messages to jobs)...
.gitignore: Don't exclude whole /autotools/ dir, but only files
This way newly added files will not be excluded by default. Also fixes a small whitespace error in utils.py.
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Fix adding of disks to an instance
The ConfigWriter.AllocateDRBDMinor requires the instance name, not the instance object. The LUSetInstanceParms wrongly passes the instance object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter....
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster verify: minors which belong to running instances and should be online but are not, and minors which do not belong to any instance but are in...
Fix a couple of epydoc warnings
DRBD: check for in-use minor during Create
In order to prevent errors with old, in-use DRBD minors, we check and abort at create time if our minor is already in use. For this we need to also modify DRBD8Status to be able to parse cs:Unconfigured devices....
Add a TailFile function
This patch adds a tail file function, to be used for parsing and returning in the job log OS installation failures.
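A minimal sketch of what such a helper could look like. The name `tail_file`, its parameters and the 4 KiB read window are assumptions made for illustration; the actual TailFile in the patch may behave differently.

```python
import os


def tail_file(path, lines=20, max_bytes=4096):
    """Return up to the last `lines` lines of `path` as a list of strings.

    Hypothetical helper: only the final `max_bytes` bytes are read, so the
    first returned line may be cut if the file is larger than that window.
    """
    if lines <= 0:
        return []
    with open(path, "rb") as fd:
        # Jump near the end instead of reading a potentially large log file
        fd.seek(0, os.SEEK_END)
        size = fd.tell()
        fd.seek(max(0, size - max_bytes))
        data = fd.read()
    rows = data.decode("utf-8", "replace").splitlines()
    return rows[-lines:]
```

A caller could attach the returned lines to the job log when an OS installation script fails.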
Some small fixes in cmdlib
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully) relevant log files from the OS create scripts.
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the job log (and thus to the user).
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style result, but _CreateBlockDevOnSecondary was forgotten. After the merger of the two functions, _CreateBlockDevOnSecondary was taken as template...
Move the default MAC prefix to the constants file
Instead of having the default live in the gnt-cluster script, we move it to the constants file. The patch also fixes a typo on constants.py.
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] + list(instance.secondary_nodes) (or similar usage) with the new instance.all_nodes.
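The call-site change can be illustrated with a small stand-in class. The `Instance` class below is hypothetical and much simpler than the real object (which computes its node list from the disk tree); it only shows how the hand-built list maps onto the property.

```python
class Instance(object):
    """Hypothetical, simplified stand-in for the real instance object."""

    def __init__(self, primary_node, secondary_nodes=()):
        self.primary_node = primary_node
        self.secondary_nodes = tuple(secondary_nodes)

    @property
    def all_nodes(self):
        # Primary node first, followed by the secondaries
        return (self.primary_node,) + self.secondary_nodes


inst = Instance("node1.example.com", ["node2.example.com"])

# Old style: every caller builds the list by hand
old_style = [inst.primary_node] + list(inst.secondary_nodes)

# New style: callers use the property
new_style = list(inst.all_nodes)
```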
Fix non-drbd instance creation
Commit 2294 introduced a new instance.all_nodes property, which unfortunately is working incorrectly for non-drbd instances.
This patch fixes it by making sure the primary node is always added to the set, even before recursing over (any potential) children....
Small simplification in MapLVsByNode
We don't need to pre-create the node entries in lvmap, since they will be created at recursion time.
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all. The replace secondary first creates (manually) new LVs to ensure storage is there, and then it creates the new DRBD. At this point, we need a...
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we combine them so as to have less code duplication. This will be needed in the future as we will need to split the recursive part off....
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node daemon logs.
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to the job log, just having True/False return from the disk creation function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)...
Block device creation cleanup
Currently when creating LVM-based instances, we always get the extremely-confusing message "ERROR Can't find LV /dev/xenvg/..." which is actually expected. This behaviour was introduced before we had UUID-style LV names, since at that point it was not unexpected to have...
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of a DRBD disk. This is confusing as it's hard to associate the two in the output of “lvs” or “gnt-node volumes”.
The patch changes this so that they use the same prefix....
Fix LUExportInstance
Due to deficiencies in our block device implementation, it is a must to call SetDiskID on disks before passing them to remote nodes. Since in export/import, we don't touch the disks themselves, this was not needed before in this function....
Instance: add a new all_nodes property
Since we often need the list of all nodes of an instance, we add a new "all_nodes" property that returns all nodes of the instance, and we switch secondary_nodes to a simpler implementation based on this new function....
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the short one.
Reviewed-by: imsnah
Some docstring updates
This patch rewraps some comments to shorter lengths, changes double-quotes to single-quotes inside triple-quoted docstrings for better editor handling.
It also fixes some epydoc errors, namely invalid crossreferences (after method rename), documentation for inexistent (removed) parameters, etc....
ganeti-noded: reduce log noise
The source port/addr is currently logged three times for each connection, and this is unnecessary. We change two log entries to debug, since they are useful for precise timing, and we keep only one at INFO level.
Forward port the live migration from 1.2 branch
This is a forward port via copy (and not individual patches cherry-pick) of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and therefore call_blockdev_find) is not modifying the devices anymore, we need to switch this LU to the new call_drbd_disconnect_net and call_drbd_attach_net functions....
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and its associated RPCs. In Ganeti 2.0, these functions will be used for two things:
- live migration (as in 1.2)
- and for other network reconfiguration tasks, since DRBD8.Attach()...
backend: rename AttachOrAssemble to Assemble
Since the Assemble function now differs from Attach, we rename this backend function to show that the intent is to fully assemble the device (and it's always allowed to modify the device).
drbd: change the semantics of Attach vs. Assemble
Currently, both the Attach and Assemble methods for DRBD8 devices will use and alter the device state. This is suboptimal, and it has been worked around in 1.2 via a special cache in the node daemon so that we don't...
bdev: Do not call Assemble() on children
The caller of dev.Assemble() (backend._RecursiveAssembleBD) is doing an explicit recursion over all the children of the device, with better error reporting. As such, we don't need this repeated assembly inside the base BlockDev class....
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviewed-by: imsnah
Increase resync speed to 60MB/s
This is a forward-port of commit 2219 on the 1.2 branch.
Skip offline nodes in gnt-cluster commands
This patch makes gnt-cluster copyfile and command skip the offline nodes.
Reviewed-by: ultrotter, imsnah
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches):
- we don't correctly check the result type
- rename a variable to prevent a conflict
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData has been broken since commit 1642.
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file.
Add a new ssconf file with the ganeti version
The patch adds a new ssconf file containing the ganeti version.
Work around a DRBD sync speed race condition
This is a modified forward-port of commit 1544 on the 1.2 branch:
When DRBD is doing its dance to establish a connection with its peer, it also sends the synchronization speed over the wire. In some cases setting the sync speed only after setting up both...
burnin: use the new replace_disks constants
This patch updates burnin to the latest replace disks constant, and changes the constants' values to be more accurate.
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds a utility function to ConfigWriter that returns the names of online nodes and uses it in LUDiagnoseOS to query only the good nodes.
Silence warning on node list for offline nodes
The warning in node list is meant for nodes that return wrong information, but for offline nodes this case is normal.
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard to work around:
- logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replace modes, and the new drbd8 modes were forced into this syntax, with some complicated rules of transition from one mode to another (if REPLACE_ALL...
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST and NV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
rpc: Add a method for easy check of remote results
The patch adds a new method to the rpc.RpcResult class called "RemoteFailMsg" which is useful for the RPC calls which return a (status, payload) style result.
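A rough sketch of how such a check might work for (status, payload) results. The class below is a hypothetical, stripped-down stand-in for rpc.RpcResult; the attribute names and the exact messages are assumptions, not the real implementation.

```python
class RpcResult(object):
    """Hypothetical stand-in holding the outcome of one RPC call."""

    def __init__(self, data=None, failed=False):
        self.data = data      # assumed to be a (status, payload) pair
        self.failed = failed  # assumed flag for transport-level failures

    def RemoteFailMsg(self):
        """Return an error message if the call failed, otherwise None."""
        if self.failed:
            return "RPC call failed"
        if not isinstance(self.data, (tuple, list)) or len(self.data) != 2:
            return "Invalid result format: %r" % (self.data,)
        status, payload = self.data
        if not status:
            # On failure the payload is expected to carry the error text
            return str(payload) if payload else "No error information"
        return None
```

With a helper like this, a caller only needs one check per call: a non-None return value is the message to report to the user.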
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that it has been started with symlinks. We currently have no on-secondary checks, nor any hypervisor specific call....
bdev: forward-port ReAttachNet/DisconnectNet
This is a plain copy of the 1.2 ReAttachNet and DisconnectNet methods on the DRBD8 device, with the logger to logging module changes and the ReAttachNet method renamed to AttachNet.
These methods are not used anywhere right now, but will be used for...
backend: Remove symlinks by disk name
This is a modified forward-port of commit 1184 on the 1.2 branch:
backend: Remove symlinks by disk name, not using a wildcard
The changes to the original patch are related to the docstring style and...
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call for drbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...
Fix the _RemoveBlockDevLinks() function
This is a forward-port of commit 1163 on the 1.2 branch: This fixes the removal of the instance symlinks (probably breakage from the glob changes).
Remove instance's symlinks
This is a forward-port of commits 1150 and 1151 on the 1.2 branch: Add _RemoveBlockDevLinks auxiliary function, called when an instance fails to start and when it is shut down.
and: Fix cut&paste error when removing symlinks...
Catch BlockDeviceError when starting instance
This is a forward-port of commit 1149 on the 1.2 branch: _GatherAndLinkBlockDevs used to raise the errors.BlockDeviceError exception when it failed to create a block device, and with this patch set it does so also when it fails to create a symlink to it....
Create symlinks to instances' block devices
This is a forward-port of commit 1148 on the 1.2 branch: Change the _GatherBlockDevs private function, called only one time by StartInstance, to _GatherAndLinkBlockDevs, and make it transform the device returned even more by calling the new _SimlinkBlockDev auxiliary...
Simplify hypervisor block_devices structure
This is a partial forward-port of commit 1136 on the 1.2 branch:
The hypervisor doesn't need to be passed the whole block device structure, so we'll just give it the block device name on the local node, and the name as seen by the instance. This will make it easier to...
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse the failure status of the new RpcResult structure, but it didn't fix the storing of only the result payload. Since RpcResult is not JSON serializable, LUActivateInstanceDisks is failing....
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
ganeti.bootstrap: Set permissions on newly uploaded files
Reviewed-by: amishchenko
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
ganeti.bootstrap: Upload remote API certificate to new nodes
ganeti.bootstrap: Prepare for remote API certificate
ganeti.bootstrap: Write SSL key to temporary file and set permissions
Previously, we set the permissions only after writing the key. This gave other users on the system a small window during which they could read the key.
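One way to avoid such a window is sketched below: the file is created with restrictive permissions first and only then filled and moved into place. The function name `write_private_file` and its parameters are hypothetical; the patch itself may do this differently.

```python
import os
import tempfile


def write_private_file(path, data, mode=0o600):
    """Write `data` (bytes) to `path` without a window of loose permissions.

    Hypothetical helper: the temporary file is created owner-only by
    mkstemp, so the content is never readable by other users.
    """
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file readable and writable only by the owner
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            # Set the final mode before any secret data is written
            os.fchmod(handle.fileno(), mode)
            handle.write(data)
        # Same directory, so the rename does not cross filesystems
        os.rename(tmp_path, path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```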
ganeti.bootstrap: Generate SSL certificate for remote API
ganeti.bootstrap: Move SSL certificate generation into separate function
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". User options specify the access permissions of a user (see docstring for ganeti.http.ReadPasswordFile), for which only "write" is supported to grant write access. Every other user has read-only access....
ganeti.http: Function to read password file
Lines in the password file are of the following format:
<username> <password> [options]
Fields are separated by whitespace. Username and password are mandatory, options are optional and separated by comma (",")....
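The line format described above could be parsed roughly as follows. This is only a sketch: `parse_password_line` is a hypothetical name, and the handling of blank and malformed lines is an assumption, not necessarily what ganeti.http.ReadPasswordFile does.

```python
def parse_password_line(line):
    """Parse one '<username> <password> [options]' line.

    Returns (username, password, options), or None for a blank line.
    """
    # Split on whitespace into at most three fields; the third field
    # keeps the whole comma-separated option string
    parts = line.split(None, 2)
    if not parts:
        return None
    if len(parts) < 2:
        raise ValueError("username and password are mandatory")
    username, password = parts[0], parts[1]
    options = []
    if len(parts) == 3:
        options = [opt.strip() for opt in parts[2].split(",") if opt.strip()]
    return username, password, options
```

For example, the line `alice secret write` would yield the user `alice` with the single option `write`, while `bob pw` yields `bob` with an empty option list.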
ganeti.http: Add support for private data in HTTP requests
ganeti.http: Add support for basic HTTP authentication
As per RFC2617.
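For reference, a minimal sketch of decoding a basic authentication header in the RFC 2617 format (`Basic` followed by the base64 encoding of `user:password`). The function name `parse_basic_auth` is hypothetical and not taken from ganeti.http.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None."""
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
        decoded = decoded.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The password may itself contain colons, so split only on the first
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password
```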
ganeti.http: Prepare authentication for HTTP server
The authentication class will override PreHandleRequest.
Job queue: Allow more than one file rename per RPC call
ganeti.jqueue: Group job archivals to reduce number of RPC calls
Reducing the actual number of RPC calls will come in another patch.
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time, causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The user will see how many jobs are left unchecked....
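The time-bounded loop described above might look like the following sketch. The function `auto_archive`, its parameters and the injectable `clock` are hypothetical and only illustrate the idea of stopping at half the RPC timeout.

```python
import time


def auto_archive(job_ids, should_archive, rpc_timeout, clock=time.time):
    """Check jobs until half of `rpc_timeout` has passed.

    Returns (archived_count, unchecked_count) so the caller can report
    how many jobs were left unchecked.
    """
    deadline = clock() + rpc_timeout / 2.0
    archived = 0
    checked = 0
    for job_id in job_ids:
        if clock() > deadline:
            # Out of time: stop here, the remaining jobs stay unchecked
            break
        checked += 1
        if should_archive(job_id):
            archived += 1
    return archived, len(job_ids) - checked
```

The caller can simply invoke the operation again until the unchecked count reaches zero.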
jqueue: When auto-archiving jobs, calculate job status only once
This is done by passing the job object to _ArchiveJobUnlocked instead of only the job ID. Also return whether job was actually archived.
Use subdirectories for job queue archive
As it turned out, having many files in a single directory can be very painful. With this patch, only 10'000 files are stored in a directory for the job queue archive. With 10'000 directories, this allows for up to 100 million jobs to be archived without having large...
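The arithmetic behind this layout can be sketched as below. The function name, the `job-<id>` file naming and the use of the integer quotient as the directory name are assumptions for illustration.

```python
import os

# Limit taken from the commit message: 10'000 files per directory
JOBS_PER_ARCHIVE_DIRECTORY = 10000


def archive_path(archive_root, job_id):
    """Return the archive file path for `job_id` (hypothetical layout)."""
    # Jobs 0..9999 land in directory "0", 10000..19999 in "1", and so on
    subdir = str(job_id // JOBS_PER_ARCHIVE_DIRECTORY)
    return os.path.join(archive_root, subdir, "job-%d" % job_id)
```

With 10'000 such directories of 10'000 files each, the layout covers 10'000 × 10'000 = 100 million jobs.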
Add rename function automatically creating directories if needed
Unfortunately, os.makedirs in Python 2.4 is not safe against multiple processes creating the same directory tree at the same time. This is only fixed in Python 2.5 and up. Adding more checks in our code doesn't...
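A sketch of a rename that creates the target directory on demand, written for a modern Python where `os.makedirs(..., exist_ok=True)` tolerates another process creating the same tree concurrently. The name `rename_file` and its parameters are hypothetical, and this is not the code from the patch, which had to work on Python 2.4.

```python
import errno
import os


def rename_file(old, new, mkdir=False, mkdir_mode=0o750):
    """Rename `old` to `new`, optionally creating missing directories."""
    try:
        return os.rename(old, new)
    except OSError as err:
        # ENOENT here usually means the target directory does not exist
        if mkdir and err.errno == errno.ENOENT:
            os.makedirs(os.path.dirname(new), mode=mkdir_mode, exist_ok=True)
            return os.rename(old, new)
        raise
```

Trying the plain rename first keeps the common case (directory already present) down to a single system call.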
ganeti.http: Don't pass poller object around
They're cheap to instantiate and this change makes the code a bit simpler.
Rename http.HttpInternalError to HttpInternalServerError
All other exceptions are named after the error name in RFC2616 (HTTP/1.1).
ganeti.http: Add more constants and errors
ganeti.http: Ignore ENOTCONN when shutting down the connection
Implement support for additional headers with HTTP errors
Add simple unittests for ganeti.http
More complex unittests will need some refactoring in the HTTP code.
ganeti.bootstrap: Whitespace fix
Add job queue size limit
A job queue with too many jobs can increase memory usage and/or make the master daemon slow. The current limit is just an arbitrary number. A "soft" limit for automatic job archival is prepared.
utils.KillProcess: Use waitpid() to wait for child processes
Sometimes the proc filesystem doesn't reflect the current status of a process. By calling waitpid(), we make sure to get the current information, at least for child processes. The timeout is still...