Add a TailFile function
This patch adds a tail file function, to be used for parsing and returning inthe job log OS installation failures.
Reviewed-by: ultrotter
Some small fixes in cmdlib
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully)relevant log files from the OS create scripts.
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in thejob log (and thus to the user).
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new styleresult, but _CreateBlockDevOnSecondary was forgotten. After the mergerof the two functions, _CreateBlockDevOnSecondary was taken as template...
Move the default MAC prefix to the constants file
Instead of having the default live in the gnt-cluster script, we move itto the constants file. The patch also fixes a typo on constants.py.
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] +list(instance.secondary_nodes) (or similar usage) with the newinstance.all_nodes.
Fix non-drbd instance creation
Commit 2294 introduced a new instance.all_nodes property, whichunfortunately is working incorrectly for non-drbd instances.
This patch fixes it by making sure the primary node is always added tothe set, even before recursing over (any potential) children....
Small simplification in MapLVsByNode
We don't need to pre-create the node entries in lvmap, since they willbe created at recursion time.
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all.The replace secondary first creates (manually) new LVs to ensure storageis there, and then it creates the new DRBD. At this point, we need a...
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, wecombine them as to have less code duplication. This will be needed inthe future as we will need to split off the recursive part off....
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just nodedaemon logs.
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and tothe job log, just having True/False return from the disk creationfunction is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)...
Block device creation cleanup
Currently when creation LVM-based instances, we always get theextremely-confusing message "ERROR Can't find LV /dev/xenvg/..." whichis actually expected. This behaviour was introduced before we hadUUID-style LV names, since at that point it was not a unexpected to have...
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of aDRBD disk. This is confusing as it's hard to associate the two in theoutput of “lvs” or “gnt-node volumes”.
The patch changes so that they use the same prefix....
Fix LUExportInstance
Due to deficiencies in our block device implementation, it is a must tocall SetDiskID on disks before passing them to remote nodes. Since inexport/import, we don't touch the disks themselves, this was not neededbefore in this function....
Instance: add a new all_nodes property
Since we often need the list of all nodes of an instance, we add a new"all_nodes" property that returns all nodes of the instance, and weswitch secondary_nodes to a simpler implementation based on this newfunction....
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the shortone.
Reviewed-by: imsnah
Some docstring updates
This patch rewraps some comments to shorter lengths, changesdouble-quotes to single-quotes inside triple-quoted docstrings forbetter editor handling.
It also fixes some epydoc errors, namely invalid crossreferences (aftermethod rename), documentation for inexistent (removed) parameters, etc....
ganeti-noded: reduce log noise
The source port/addr is currently logged three times for eachconnection, and this is unnecessary. We change two log entries to debug,since they are useful for precise timing, and we keep only one at INFOlevel.
Forward port the live migration from 1.2 branch
This is forward port via copy (and not individual patches cherry-pick)of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need theIdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and thereforecall_blockdev_find) is not modifying the devices anymore, we need toswitch this LU to the new call_drbd_disconnect_net andcall_drbd_attach_net functions....
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and their associatedRPCs. In Ganeti 2.0, these functions will be used for two things: - live migration (as in 1.2) - and for other network reconfiguration tasks, since DRBD8.Attach()...
backend: rename AttachOrAssemble to Assemble
Since now the Assemble function is different than Attach, we rename thisbackend function to show that the intent is to fully assemble the device(and it's always allowed to modify the device).
drbd: change the semantics of Attach vs. Assemble
Currently, both the Attach and Assemble methods for DRBD8 devices will use andalter the device state. This is suboptimal, and it has been workedaround in 1.2 via a special cache in the node daemon so that we don't...
bdev: Do not call Assemble() on children
The caller of dev.Assemble() (backend._RecursiveAssembleBD) is doing anexplicit recursion over all the children of the device, with bettererror reporting. As such, we don't need this repeated assembly insidethe base BlockDev class....
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviwed-by: imsnah
Increase resync speed to 60MB/s
This is a forward-port of commit 2219 on the 1.2 branch.
Skip offline nodes in gnt-cluster commands
This patch makes gnt-cluster copyfile and command skip the offlinenodes.
Reviwed-by: ultrotter, imsnah
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches): - we don't correctly check the result type - rename a variable to prevent a conflict
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData is brokensince commit 1642.
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed,for example after an upgrade to update the ssconf_release_version file.
Add a new ssconf file with the ganeti version
The patch adds a new ssconf file containing the ganeti version.
Work around a DRBD sync speed race condition
This is modified forward-port of commit 1544 on the 1.2 branch:
When DRBD is doing its dance to establish a connection with its peer, it also sends the synchronization speed over the wire. In some cases setting the sync speed only after setting up both...
burnin: use the new replace_disks constants
This patch updates burnin to the latest replace disks constant, andchanges the constants' values to be more accurate.
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds an utilityfunction to ConfigWriter that returns the names of online nodes and usesit in LUDiagnoseOS to query only the good nodes.
Silence warning on node list for offline nodes
The warning in node list is meant for nodes that return wronginformation, but for offline nodes this case is normal.
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard towork around: - logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replacemodes, and the new drbd8 modes where forced into this syntax, with somecomplicated rules of transition from one mode to another (if REPLACE_ALL...
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST andNV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
rpc: Add a method for easy check of remote results
The patch adds a new method to the rpc.RpcResult class called"RemoteFailMsg" which is useful for the RPC calls which return a(status, payload) style result.
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that it has been started with symlinks. We currently have no on-secondary checks, nor any hypervisor specific call....
bdev: forward-port ReAttachNet/DisconnectNet
This is plain copy of the 1.2 ReAttachNet and DisconnectNet methods onthe DRBD8 device, with the logger to logging module changes and theReAttachNet method renamed to AttachNet.
These methods are not used anywhere right now, but will be used for...
backend: Remove symlinks by disk name
This is a modified forward-port of commit 1184 on the 1.2 branch:
backend: Remove symlinks by disk name, not using a wildcard
The changes to the original patch are related to the docstring style and...
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call fordrbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...
Fix the _RemoveBlockDevLinks() function
This is a forward-port of commit 1163 on the 1.2 branch: This fixes the removal of the instance symlinks (probably breakage from the glob changes).
Remove instance's symlinks
This is a forward-port of commits 1150 and 1151 on the 1.2 branch: Add _RemoveBlockDevLinks auxiliary function, called when an instance fails to start and when it is shut down.
Reviewed-by: iustinp
and: Fix cut&paste error when removing symlinks...
Catch BlockDeviceError when starting instance
This is a forward-port of commit 1149 on the 1.2 branch: _GatherAndLinkBlockDevs used to raise the errors.BlockDeviceError exception when it failed to create a block device, and with this patch set it does so also when it fails to create a symlink to it....
Create symlinks to intances' block devices
This is a forward-port of commit 1148 on the 1.2 branch: Change the _GatherBlockDevs private function, called only one time by StartInstance, to _GatherAndLinkBlockDevs, and make it transform the device returned even more by calling the new _SimlinkBlockDev auxiliary...
Simplify hypervisor block_devices structure
This is a partial forward-port of commit 1136 on the 1.2 branch:
The hypervisor doesn't need to be passed the whole block device structure, so we'll just give it the block device name on the local node, and the name as seen by the instance. This will make it easier to...
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse thefailure status of the new RpcResult structure, but it didn't fix thestoring of only the result payload. Since RpcResult is not JSONserializable, LUActivateInstanceDisks is failing....
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
ganeti.bootstrap: Set permissions on newly uploaded files
Reviewed-by: amishchenko
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
ganeti.bootstrap: Upload remote API certificate to new nodes
ganeti.bootstrap: Prepare for remote API certificate
ganeti.bootstrap: Write SSL key to temporary file and set permissions
Previously, we set the permissions only after writing the key. Thisgave other users on the system a small window during which they couldread the key.
ganeti.bootstrap: Generate SSL certificate for remote API
ganeti.bootstrap: Move SSL certificate generation into separate function
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". Useroptions specify the access permissions of a user (see docstring forganeti.http.ReadPasswordFile), for which only "write" is supportedto grant write access. Every other user has read-only access....
ganeti.http: Function to read password file
Lines in the password file are of the following format:
<username> <password> [options]
Fields are separated by whitespace. Username and password aremandatory, options are optional and separated by comma (",")....
ganeti.http: Add support for private data in HTTP requests
ganeti.http: Add support for basic HTTP authentication
As per RFC2617.
ganeti.http: Prepare authentication for HTTP server
The authentication class will override PreHandleRequest.
Job queue: Allow more than one file rename per RPC call
ganeti.jqueue: Group job archivals to reduce number of RPC calls
Reducing the actual number of RPC calls will come in another patch.
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time,causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The userwill see how many jobs are left unchecked....
jqueue: When auto-archiving jobs, calculate job status only once
This is done by passing the job object to _ArchiveJobUnlocked insteadof only the job ID. Also return whether job was actually archived.
Use subdirectories for job queue archive
As it turned out, having many files in a single directory can bevery painful. With this patch, only 10'000 files are stored in adirectory for the job queue archive. With 10'000 directries, thisallows for up to 100 million jobs be archived without having large...
Add rename function automatically creating directories if needed
Unfortunately, os.makedirs in Python 2.4 is not safe against multipleprocesses creating the same directory tree at the same time. This isonly fixed in Python 2.5 and up. Adding more checks in our code doesn't...
ganeti.http: Don't pass poller object around
They're cheap to instantiate and doing this changes makes the codea bit simpler.
Rename http.HttpInternalError to HttpInternalServerError
All other exceptions are named after the error name in RFC2616 (HTTP/1.1).
ganeti.http: Add more constants and errors
ganeti.http: Ignore ENOTCONN when shutting down the connection
Implement support for additional headers with HTTP errors
Add simple unittests for ganeti.http
More complex unittests will need some refactoring in the HTTP code.
ganeti.bootstrap: Whitespace fix
Add job queue size limit
A job queue with too many jobs can increase memory usage and/or makethe master daemon slow. The current limit is just an arbitrary number.A "soft" limit for automatic job archival is prepared.
utils.KillProcess: Use waitpid() to wait for child processes
Sometimes the proc filesystem doesn't reflect the current status ofa process. By calling waitpid(), we make sure to get the currentinformation, at least for child processes. The timeout is still...
LUConnectConsole: fix primary_node online check
The primary node is part of the instance, not of the opcode.
_RunCmdPipe: handle EINTR in poller.poll()
poll() can be interrupted. rather than failing we retry until itreturns.
KVM: improve socat interface
Call socat with a full path specified at configure time, rather thanjust by its name, and check for the binary to exist at hypervisorverify.
KVM: use a different default kernel path
It makes sense for the default kvm kernel not to be called "xenU".
ganeti.http: Add three TODOs for improvements
ganeti.http: Explicitly initiate handshake
Otherwise it would be done on the first read/write operation, makingerror handling more difficult (such as EOF during handshake).
ganeti.http: Implement handshake socket operation
ganeti.http: Handle SSL_ERROR_ZERO_RETURN
Also add a comment next to the place where the SSL connection is shutdown.
cleanup: ConfigWriter, initialize all attributes
We should initialized the _last_cluster_serial in the constructor too (just tobe consistent).
cleanup: rapi v2 instance tags wrong attribute
This was changed in the past, but it seems this class was forgotten.
cleanup: http server, line too long
cleanup: http client, line too long
cleanup: xen hypervisor
Wrong indentation and uniformize one method signature.
cleanup: kvm code likes to redefine names
lib/ssh.py: import the logging module
This only means most of our error paths in this module were not working(and generating exceptions).
SshRunner: add docstring for _BuildSshOptions
cleanup: use _ for unused loop counter
cleanup: WorkerPool, wrong variable name
Quoting Michael: "why is this even working?"
Reviewed-by: imsnah,amishchenko
cleanup: TcpPing, wrong variable name
The default value of 'False' wasn't initialized properly. It doesn'trequire initialization, but it's cleaner this way.
cleanup: SetEtcHostsEntry unused var
cleanup: fix IAllocator hypervisor usage
Two problems: the iallocator.hypervisor wasn't initialized to None inthe constructor, so pylint doesn't realize it's initialized later withsetattr.
Second, 'hypervisor' is a module, so we shouldn't use it as a variable....