Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call for drbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...
Fix the _RemoveBlockDevLinks() function
This is a forward-port of commit 1163 on the 1.2 branch: This fixes the removal of the instance symlinks (probably breakage from the glob changes).
Reviewed-by: imsnah
Remove instance's symlinks
This is a forward-port of commits 1150 and 1151 on the 1.2 branch: Add _RemoveBlockDevLinks auxiliary function, called when an instance fails to start and when it is shut down.
Reviewed-by: iustinp
and: Fix cut&paste error when removing symlinks...
Catch BlockDeviceError when starting instance
This is a forward-port of commit 1149 on the 1.2 branch: _GatherAndLinkBlockDevs used to raise the errors.BlockDeviceError exception when it failed to create a block device, and with this patch set it does so also when it fails to create a symlink to it....
Create symlinks to instances' block devices
This is a forward-port of commit 1148 on the 1.2 branch: Change the _GatherBlockDevs private function, called only one time by StartInstance, to _GatherAndLinkBlockDevs, and make it transform the device returned even more by calling the new _SimlinkBlockDev auxiliary...
Simplify hypervisor block_devices structure
This is a partial forward-port of commit 1136 on the 1.2 branch:
The hypervisor doesn't need to be passed the whole block device structure, so we'll just give it the block device name on the local node, and the name as seen by the instance. This will make it easier to...
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse the failure status of the new RpcResult structure, but it didn't fix the storing of only the result payload. Since RpcResult is not JSON serializable, LUActivateInstanceDisks is failing....
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
ganeti.bootstrap: Set permissions on newly uploaded files
Reviewed-by: amishchenko
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
ganeti.bootstrap: Upload remote API certificate to new nodes
ganeti.bootstrap: Prepare for remote API certificate
ganeti.bootstrap: Write SSL key to temporary file and set permissions
Previously, we set the permissions only after writing the key. This gave other users on the system a small window during which they could read the key.
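
The idea above can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name write_private_file and its parameters are hypothetical. The file is created with owner-only permissions before any key material is written, and only then moved into place.

```python
import os
import tempfile


def write_private_file(path, data, mode=0o600):
    """Write data to path without a window of loose permissions.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(path) or "."
    # mkstemp() creates the file readable and writable only by its owner.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            # Permissions are final before any data reaches the file.
            os.fchmod(handle.fileno(), mode)
            handle.write(data)
        # Move the finished file into place in one step.
        os.rename(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Writing to a temporary file in the same directory also means readers never see a half-written key.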
ganeti.bootstrap: Generate SSL certificate for remote API
ganeti.bootstrap: Move SSL certificate generation into separate function
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". User options specify the access permissions of a user (see docstring for ganeti.http.ReadPasswordFile), for which only "write" is supported to grant write access. Every other user has read-only access....
ganeti.http: Function to read password file
Lines in the password file are of the following format:
<username> <password> [options]
Fields are separated by whitespace. Username and password are mandatory, options are optional and separated by comma (",")....
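
A minimal sketch of parsing one line in the format described above. It is not the actual ReadPasswordFile implementation; the function name parse_password_line is hypothetical, and returning None for a line without both mandatory fields is an assumption.

```python
def parse_password_line(line):
    """Parse '<username> <password> [options]' into a tuple.

    Hypothetical helper; returns None when username or password is
    missing (assumed behaviour, not taken from the patch).
    """
    # Split on any whitespace, keeping the options field in one piece.
    parts = line.split(None, 2)
    if len(parts) < 2:
        return None
    username, password = parts[0], parts[1]
    options = []
    if len(parts) > 2:
        # Options are comma-separated; drop empty entries.
        options = [opt.strip() for opt in parts[2].split(",") if opt.strip()]
    return username, password, options
```

With this sketch, a user gets write access only if "write" appears in the returned options list.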
ganeti.http: Add support for private data in HTTP requests
ganeti.http: Add support for basic HTTP authentication
As per RFC2617.
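
For reference, a minimal sketch of decoding a Basic "Authorization" header value as defined in RFC 2617. The function name parse_basic_auth is hypothetical and this is not the code added to ganeti.http; decoding the credentials as UTF-8 is an assumption.

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic auth header, or None.

    Hypothetical helper for illustration only.
    """
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        # Credentials are base64("username:password").
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    except ValueError:
        # Covers both invalid base64 and invalid UTF-8.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```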
ganeti.http: Prepare authentication for HTTP server
The authentication class will override PreHandleRequest.
Job queue: Allow more than one file rename per RPC call
Reviewed-by: ultrotter
ganeti.jqueue: Group job archivals to reduce number of RPC calls
Reducing the actual number of RPC calls will come in another patch.
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time, causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The user will see how many jobs are left unchecked....
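
The time-bounded loop described above could look roughly like this. It is a sketch under assumptions: archive_until_deadline, archive_fn and the injectable clock are hypothetical names, not the jqueue API.

```python
import time


def archive_until_deadline(jobs, archive_fn, rpc_timeout, clock=time.monotonic):
    """Archive jobs until half of rpc_timeout has elapsed.

    Returns (number archived, number left unchecked).
    Hypothetical helper for illustration only.
    """
    deadline = clock() + rpc_timeout / 2.0
    archived = 0
    for index, job in enumerate(jobs):
        if clock() > deadline:
            # Out of time: report how many jobs were not looked at.
            return archived, len(jobs) - index
        if archive_fn(job):
            archived += 1
    return archived, 0
```

The caller can simply invoke it again to continue with the jobs that were left unchecked.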
jqueue: When auto-archiving jobs, calculate job status only once
This is done by passing the job object to _ArchiveJobUnlocked instead of only the job ID. Also return whether job was actually archived.
Use subdirectories for job queue archive
As it turned out, having many files in a single directory can be very painful. With this patch, only 10'000 files are stored in a directory for the job queue archive. With 10'000 directories, this allows for up to 100 million jobs to be archived without having large...
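
A small sketch of how a job ID could be mapped to an archive subdirectory with 10'000 files each. The names archive_path and JOBS_PER_ARCHIVE_DIRECTORY and the "job-<id>" file name are assumptions made for illustration, not taken from the patch.

```python
import os

# Assumed constant name; the value comes from the description above.
JOBS_PER_ARCHIVE_DIRECTORY = 10000


def archive_path(archive_root, job_id):
    """Return the archive file path for job_id (hypothetical layout)."""
    # Integer division groups 10'000 consecutive job IDs per directory.
    subdir = str(job_id // JOBS_PER_ARCHIVE_DIRECTORY)
    return os.path.join(archive_root, subdir, "job-%d" % job_id)
```

Under this layout, job 9999 lands in directory "0" and job 10000 in directory "1"; 10'000 directories of 10'000 files give the 100 million jobs mentioned above.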
Add rename function automatically creating directories if needed
Unfortunately, os.makedirs in Python 2.4 is not safe against multiple processes creating the same directory tree at the same time. This is only fixed in Python 2.5 and up. Adding more checks in our code doesn't...
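
One way to write such a rename helper is sketched below. The name rename_with_mkdir and the default directory mode are assumptions; the sketch tolerates another process creating the directory first by ignoring EEXIST.

```python
import errno
import os


def rename_with_mkdir(old, new, mode=0o750):
    """Rename old to new, creating new's parent directories if needed.

    Hypothetical helper for illustration only.
    """
    try:
        os.rename(old, new)
        return
    except OSError as err:
        # ENOENT usually means the target directory does not exist yet.
        if err.errno != errno.ENOENT:
            raise
    try:
        os.makedirs(os.path.dirname(new), mode)
    except OSError as err:
        # Another process may have created the directory in the meantime.
        if err.errno != errno.EEXIST:
            raise
    # Retry once; any remaining error is reported to the caller.
    os.rename(old, new)
```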
ganeti.http: Don't pass poller object around
They're cheap to instantiate and making this change makes the code a bit simpler.
Rename http.HttpInternalError to HttpInternalServerError
All other exceptions are named after the error name in RFC2616 (HTTP/1.1).
ganeti.http: Add more constants and errors
ganeti.http: Ignore ENOTCONN when shutting down the connection
Implement support for additional headers with HTTP errors
Add simple unittests for ganeti.http
More complex unittests will need some refactoring in the HTTP code.
ganeti.bootstrap: Whitespace fix
Add job queue size limit
A job queue with too many jobs can increase memory usage and/or make the master daemon slow. The current limit is just an arbitrary number. A "soft" limit for automatic job archival is prepared.
utils.KillProcess: Use waitpid() to wait for child processes
Sometimes the proc filesystem doesn't reflect the current status of a process. By calling waitpid(), we make sure to get the current information, at least for child processes. The timeout is still...
LUConnectConsole: fix primary_node online check
The primary node is part of the instance, not of the opcode.
_RunCmdPipe: handle EINTR in poller.poll()
poll() can be interrupted. Rather than failing, we retry until it returns.
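
The retry pattern can be sketched generically as below. retry_on_eintr is a hypothetical name, not part of the patch. Note that Python 3.5 and later already retry interrupted system calls such as poll() automatically (PEP 475), so an explicit loop like this is mainly relevant for older interpreters.

```python
import errno


def retry_on_eintr(fn, *args, **kwargs):
    """Call fn until it completes without being interrupted (EINTR).

    Hypothetical helper for illustration only.
    """
    while True:
        try:
            return fn(*args, **kwargs)
        except OSError as err:
            # Only swallow interruptions; re-raise every other error.
            if err.errno != errno.EINTR:
                raise
```

A caller would wrap the blocking call, for example retry_on_eintr(poller.poll, timeout).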
KVM: improve socat interface
Call socat with a full path specified at configure time, rather than just by its name, and check for the binary to exist at hypervisor verify.
KVM: use a different default kernel path
It makes sense for the default kvm kernel not to be called "xenU".
ganeti.http: Add three TODOs for improvements
ganeti.http: Explicitly initiate handshake
Otherwise it would be done on the first read/write operation, making error handling more difficult (such as EOF during handshake).
ganeti.http: Implement handshake socket operation
ganeti.http: Handle SSL_ERROR_ZERO_RETURN
Also add a comment next to the place where the SSL connection is shut down.
cleanup: ConfigWriter, initialize all attributes
We should initialize the _last_cluster_serial in the constructor too (just to be consistent).
cleanup: rapi v2 instance tags wrong attribute
This was changed in the past, but it seems this class was forgotten.
cleanup: http server, line too long
cleanup: http client, line too long
cleanup: xen hypervisor
Wrong indentation and uniformize one method signature.
cleanup: kvm code likes to redefine names
lib/ssh.py: import the logging module
This only means most of our error paths in this module were not working (and generating exceptions).
SshRunner: add docstring for _BuildSshOptions
cleanup: use _ for unused loop counter
cleanup: WorkerPool, wrong variable name
Quoting Michael: "why is this even working?"
Reviewed-by: imsnah,amishchenko
cleanup: TcpPing, wrong variable name
The default value of 'False' wasn't initialized properly. It doesn't require initialization, but it's cleaner this way.
cleanup: SetEtcHostsEntry unused var
cleanup: fix IAllocator hypervisor usage
Two problems: the iallocator.hypervisor wasn't initialized to None in the constructor, so pylint doesn't realize it's initialized later with setattr.
Second, 'hypervisor' is a module, so we shouldn't use it as a variable....
cleanup: LUReplaceDisks unused vars
And a small whitespace fix.
cleanup: do not hide upper-scope name
hypervisor is a module, so we shouldn't use it as an argument.
cleanup: fix use of _CheckNodeOnline
A few cases of wrong variable name.
cleanup: LUAddNode, LUSetNodeParams unused variable
This is a leftover from the abstraction of AdjustCandidatePool, and it also requires the config lock, so it's better to remove it.
cleanup: LURenameCluster wrong variable name
cleanup: fix export NIC count the same way as disk
For safety, we use the same algorithm as in disk count.
cleanup: fix backend._RecursiveFindBD
_RecursiveFindBD takes a parameter that isn't used; moreover, nowhere in the SVN history can I find a case that it has been used.
As such, remove this parameter and fix its callers.
cleanup: more unused vars
cleanup: sanitize a default parameter
Instead of relying on the parameter's usage being safe with mutable default parameters, let's just make it safer.
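
For readers unfamiliar with the pitfall, a short sketch of the safer pattern. The function add_tag is made up for illustration and does not appear in the patch.

```python
def add_tag(tag, tags=None):
    """Append tag to tags, creating a fresh list when none is given.

    Hypothetical example. With 'tags=[]' as the default, the same list
    object would be shared between all calls that omit the argument.
    """
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Each call without an explicit list now gets its own list instead of accumulating values from earlier calls.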
cleanup: exceptions should derive from Exception
cleanup: fix GatherMasterVotes
Remove unused vars
cleanup: _InitSSHSetup doesn't need its argument
cleanup: fix 'variable unused' warning
In the iteration we don't care about the node names, so we change the for loop to be over the values (and not itervalues).
ganeti.http: Rename HttpBase._using_ssl to HttpBase.using_ssl
It'll be queried from other classes.
ganeti.http: Rename HttpSocketBase to HttpBase
It's more appropriate.
Fix epydoc format warnings
This patch should fix all outstanding epydoc parsing errors; as such, we switch epydoc into verbose mode so that any new errors will be visible.
ganeti.backend: Improve compression check
ganeti.http: Docstring updates
ganeti.http: Remove _HttpClientError
This is a leftover from old code.
ganeti.http.server: Increase connection backlog to 1024
This solves a problem with many concurrent requests. By default, 1024 is the maximum backlog on Linux kernels. We limit the number of clients through MAX_CHILDREN, too. The idea of just increasing the backlog is...
RPC: Compress file upload data
Adding compression to larger amounts of data is more efficient than transferring it (len(nodes) - 1) times over the network without compression. We were able to compress an 800 KB config file to about 30 KB, which is about 40 KB with Base64 encoding (required due to...
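
A minimal sketch of the compress-then-encode step, assuming zlib as the compressor (the message above does not name one). The function names encode_upload and decode_upload and the compression level are hypothetical.

```python
import base64
import zlib


def encode_upload(data):
    """Compress bytes and Base64-encode them for a text-only transport.

    Hypothetical helper; zlib and level 3 are assumptions.
    """
    return base64.b64encode(zlib.compress(data, 3))


def decode_upload(payload):
    """Reverse encode_upload()."""
    return zlib.decompress(base64.b64decode(payload))
```

Base64 expands its input by a factor of 4/3, which matches the figures above: about 30 KB compressed becomes about 40 KB encoded.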
Warn for instances living on offline nodes
The patch also changes the result to error for non-reachable secondary nodes (as for primary nodes).
Fix _AdjustCandidatePool
Currently the ConfigWriter.MaintainCandidatePool returns node names, and _AdjustCandidatePool uses them as such, but then it passes these to context.ReaddNode which in turn passes them to jqueue.JobQueue.AddNode which uses them as objects.Node instances....
gnt-node modify: add the offline attribute
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute.
Setting a node into offline mode automatically demotes it from the master role.
RPC: do not make calls to offline nodes
This patch changes the _MultNodeCall and _SingleNodeCall helpers to not actually make calls to offline nodes, but instead generate fake responses which have a parameter called 'offline' set so that callers can check for this value if they want (otherwise, it's just a failed RPC...
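
The behaviour described above can be sketched as follows. Everything here is hypothetical: call_nodes, the send callback and the dictionary-shaped results only illustrate the idea and are not the real RpcResult structure.

```python
def call_nodes(nodes, offline_nodes, send):
    """Call send(node) for online nodes; fake a failure for offline ones.

    Hypothetical helper for illustration only.
    """
    results = {}
    for name in nodes:
        if name in offline_nodes:
            # No network traffic: mark the result as failed and offline.
            results[name] = {"failed": True, "offline": True, "data": None}
        else:
            results[name] = {"failed": False, "offline": False,
                             "data": send(name)}
    return results
```

Callers that do not care about the distinction see an ordinary failed call; callers that do can check the "offline" flag.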
Make cluster verify understand offline nodes
This patch changes cluster verify to not alert on offline nodes, but instead just show a note at the end with the number of such nodes.
It also removes warnings in verify-disks and hooks about failures to make rpc calls to such nodes....
cmdlib: check node stats in prereqs
This patch adds checks for offline nodes in most instance LUs so that we can work with offline secondaries, but not with offline primaries. Some cases (like grow disk, which needs both sides up) are not allowing offline nodes at all....
Add two utility functions to cmdlib
These will be used for parameter checking and node status checking.
Add function to compute the master candidates
Since some nodes can be offline, we can't just take the length of the node list as the maximum possible number of master candidates.
The patch adds a utility function to correctly compute this value and replaces hardcoded computations with the use of this function. It then...
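
A sketch of such a computation, assuming node records expose an "offline" flag. The function name max_possible_candidates and the dictionary-based node records are hypothetical, not the ConfigWriter API.

```python
def max_possible_candidates(nodes):
    """Count the nodes that could become master candidates.

    Hypothetical helper: offline nodes are excluded, so the result can
    be smaller than len(nodes).
    """
    return sum(1 for node in nodes if not node["offline"])
```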
http: use slicing instead of string modification
The combination of the current buffer splitting method and (4KB) buffer size is very inefficient when writing big amounts of data. Just walking over a 16 megabyte string using a 4K buffer takes (on a random computer)...
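
The difference can be sketched like this. iter_chunks is a hypothetical name. Repeatedly doing "buf = buf[4096:]" copies the whole remainder on every step, while keeping an offset and slicing out only the next chunk copies each byte once.

```python
def iter_chunks(data, chunk_size=4096):
    """Yield consecutive chunks of data without modifying the buffer.

    Hypothetical helper for illustration only.
    """
    pos = 0
    while pos < len(data):
        # Only chunk_size bytes are copied per iteration.
        yield data[pos:pos + chunk_size]
        pos += chunk_size
```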
Add the offline node list to ssconf
The patch also changes the various node list generation to be more consistent.
Cleanup the config file on demotion from candidate
This patch adds a simple rpc which makes a backup of the config file and then removes it. This is done so that cluster verify doesn't complain immediately after demoting a node.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state for offline nodes and also modifies the watcher to understand the offline state in its checks.
node list: add the offline field
Add a new node parameter 'offline'
This patch adds a new node parameter called offline that will be used to mark nodes which should not be touched by commands.
We also add this flag at cluster init, node add, and export it to iallocator scripts.
ssconf: empty files should not add a newline
Currently we add a newline in the ssconf writeout process, even if the file is empty. We change this case so that lists of values (e.g. offline nodes) are correct (not a list of one empty element).
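
The effect can be illustrated with a small sketch. The helper names format_ssconf_list and parse_ssconf_list are hypothetical and do not mirror the ssconf module.

```python
def format_ssconf_list(values):
    """Join values one per line; an empty list gives an empty file."""
    text = "\n".join(values)
    if text:
        # Only non-empty content gets the trailing newline.
        text += "\n"
    return text


def parse_ssconf_list(text):
    """Split file contents back into a list of values."""
    return text.splitlines()
```

A file containing only a newline parses to a list with one empty element, which is the incorrect result described above; an empty file parses to an empty list.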
ganeti.http: Add constant for DELETE
Remove old HTTP code
ganeti.rpc: Convert to new HTTP server
ganeti.http: Split HTTP server and client into separate files
This includes a large rewrite of the HTTP server code. The handling of OpenSSL errors had some problems that were hard to fix with its structure. When preparing all of this, I realized that actually HTTP...
Rename all HTTP classes to camel case
It should be consistent.
ganeti.http: Remove underline from two classes
This is a preparation step for splitting the HTTP client and server code into two separate modules.
Move HTTP code to subpackage
LURemoveNode, promote nodes to master candidates
If after the node removal there are not enough master candidates, we'll try to promote more nodes.