Parallelize LUQueryInstances
This first version acquires a shared lock on all requested instances and their nodes. In the future it can be improved by acquiring fewer locks if no dynamic fields have been asked for, and/or by locking just primary nodes.
Reviewed-by: imsnah
A few more locking unit tests
A few more tests written while bug-hunting. One of them shows a real issue, at last. :)
Add lock-all-through-GLM unit test
I was hunting for a bug in my code and thought the culprit was in the locking library, so I added a test to check. Unfortunately it turns out it wasn't. :( Committing the test anyway, while still trying to figure out what's wrong......
LockSet: allow lists with duplicate values
If a list with a duplicate value is passed to a lockset, what the code now does is try to acquire the lock twice, generating a double-acquire exception in the SharedLock code. This is definitely an issue. In order to solve it we can either forbid double values in a list...
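As a rough illustration of the failure mode described above (not the actual LockSet or SharedLock code), a non-reentrant lock raises on a second acquire, and collapsing duplicate names before acquiring avoids it. The SimpleLock class and the acquire_all helper are hypothetical names used only for this sketch.

```python
class SimpleLock(object):
    """Hypothetical non-reentrant lock that rejects a double acquire."""

    def __init__(self, name):
        self.name = name
        self.held = False

    def acquire(self):
        if self.held:
            raise AssertionError("double acquire of %s" % self.name)
        self.held = True

    def release(self):
        self.held = False


def acquire_all(locks, names):
    """Acquire each requested lock exactly once, in sorted order.

    Duplicates in 'names' are collapsed first, so no lock is acquired
    twice. Returns the list of names that were actually acquired.
    """
    unique = sorted(set(names))
    for name in unique:
        locks[name].acquire()
    return unique
```

Acquiring in a fixed (sorted) order is a common way to keep concurrent callers from deadlocking on each other; whether the real code relies on that is not stated in the commit message.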
Processor: lock all levels even if one is missing
If a locking level wasn't specified locking used to stop. This means that if one, for example, didn't specify anything at the LEVEL_INSTANCE level, no locks at the LEVEL_NODE level were acquired either. With this...
LURebootInstance: move arg check in ExpandNames
The check for the reboot type can be done without any locks held, so we'll move it to ExpandNames. Plus, we note in a FIXME that if the reboot type is not full, we can probably just lock the primary node, and...
QA: Convert configuration from YAML to JSON
We no longer use YAML in Ganeti at all. This patch converts the QA configuration from YAML to JSON. JSON doesn't support comments, so I had to use a hack with fields starting with '#'.
Reviewed-by: ultrotter
LUVerifyCluster: Return boolean indicating success
Reviewed-by: schreiberal
Use Linux-specific way to name master socket
By using this Linux-specific way we don't have to care about removing the socket file when quitting or starting (after an unclean shutdown). For a more detailed description, see the comment in the patch.
QA: Try to run more scripts with --version
This patch also sorts the list.
QA: Always accept added node's SSH key
QA: Do not upload known_hosts file anymore
The cluster no longer keeps individual hosts' SSH keys, but rather aliases all of them to the cluster name.
Copy qa_utils.AssertIn from 1.2 branch
Apparently it was forgotten when importing the remote API QA tests.
gnt-node: Add option to always accept peer's SSH key
This option will be used to add nodes to the cluster without asking the user to confirm the key. Together with key-based authentication this can be used in the QA tests.
SshRunner: Add parameter to always accept peer's SSH key
This will be used to add nodes without user interaction, specifically in QA tests.
Move SSH option building into a function
I'm going to add another option and it would make maintaining them in constants even more complicated.
SshRunner.Run: Pass all arguments to BuildCmd
This patch changes SshRunner.Run to pass all arguments to SshRunner.BuildCmd. They had the same arguments before and should stay that way. This change makes it easier to add new arguments or change existing ones....
Whitespace fixes for remote API QA checks
Remove QA hook functionality
To my knowledge they're not used anywhere and they're at least slightly confusing to people adding new QA checks.
Pass hypervisor type to the OS scripts
It's handy to make the os scripts know which hypervisor the instance is going to run under. In order not to change the os API we pass this information in the environment, where the os scripts can access it if they're hypervisor-aware....
RunCmd: add optional environment overriding
If the user passes an env dict to RunCmd we'll override the environment passed to the to-be-executed command with the values in the dict. This allows us to pass arbitrary environment values to commands we run....
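The environment override described above could look roughly like the following. This is a minimal sketch built on the standard subprocess module, not the real RunCmd; the run_cmd name, its return value, and the HYPERVISOR variable in the usage note are assumptions made for illustration.

```python
import os
import subprocess


def run_cmd(cmd, env=None):
    """Run 'cmd' (a list of arguments) and return (exit_code, stdout).

    Values in 'env' override the inherited environment, for the child
    process only; the caller's own environment is left untouched.
    """
    cmd_env = os.environ.copy()
    if env is not None:
        cmd_env.update(env)
    proc = subprocess.Popen(cmd, env=cmd_env, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out.decode()
```

A caller could then pass something like env={"HYPERVISOR": "kvm"} (a hypothetical variable name) so that a script which knows about it can read it, while scripts that don't simply ignore it.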
KVM Hypervisor Cleanup
- Remove a few experimental code lines left as comments
- Rework first disks' boot=on addition, which was calculated twice
- Remove an empty line
- Remove reference to hvm_pae which doesn't apply to kvm
Allow kvm hypervisor in gnt-cluster init
Add KVM hypervisor code
ht_kvm.py contains the code for ganeti to work under kvm. This patch also modifies Makefile.am to ship that file, and lib/hypervisor/__init__.py to import it and add kvm to the hypervisors map.
constants: add HT_KVM
Add a new hypervisor type, HT_KVM, to constants, and register it in the HYPER_TYPES set.
Add --with-kvm-path configure option
This allows configuring a different path to the kvm binary. By default /usr/bin/kvm is used, which is the one found in Debian and Ubuntu.
FakeHypervisor: fix a function signature
StartInstance takes 'block_devices', not 'force' as its third argument. Even if this is not used in the fake hypervisor it's better to have the correct argument name to avoid confusion.
Convert RunCmd to an epydoc docstring
Fix adding pristine nodes
If a node hasn't been part of the cluster before being added it won't have the cluster's SSH key. This patch makes sure to accept such nodes by not aliasing the machine name to the cluster name.
Fix race locking issue in noded
Noded didn't release the job queue lock after initialising it. This patch makes sure to unlock once the work is done.
cli: Use new RPC call instead of polling
This means commands will not take at least one second anymore.
Add RPC call to wait for job changes
This way clients can react faster to status or message changes anddon't have to poll anymore.
jqueue: Change log message time format
See the comment in the patch.
Add functions to split time into tuple and merge it back
These will be used for job logs.
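A pair of helpers like the ones this commit describes might look as follows. This is only a sketch: the split_time and merge_time names and the (seconds, microseconds) tuple layout are assumptions, not taken from the patch.

```python
def split_time(value):
    """Split a float timestamp into a (seconds, microseconds) tuple."""
    seconds, microseconds = divmod(int(round(value * 1000000)), 1000000)
    return (seconds, microseconds)


def merge_time(timetuple):
    """Merge a (seconds, microseconds) tuple back into a float timestamp."""
    seconds, microseconds = timetuple
    return float(seconds) + microseconds / 1000000.0
```

Storing the two integers rather than a float keeps the serialized value free of floating-point formatting differences, which is one plausible reason to want such a split for log entries.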
Use new query function for exports in gnt-backup
Reviewed-by: iustinp
Add query function for exports
Don't always remove queue lock when queue is purged
The lock should only be removed if ganeti-noded is going to quit. Otherwise it needs to be kept to prevent another process from creating it again while we're still holding the (removed) lock. This is due to...
backend: Add optional exclusion list to _CleanDirectory
The code cleaning the queue will make use of it.
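As a sketch of what a directory cleaner with an exclusion list could look like (the clean_directory name and its exact semantics are assumptions, not the real _CleanDirectory):

```python
import os


def clean_directory(path, exclude=None):
    """Remove all regular files in 'path' except those listed in 'exclude'.

    'exclude' holds full paths. Subdirectories and symlinks are left alone,
    and a missing directory is silently ignored.
    """
    if not os.path.isdir(path):
        return
    excluded = set(os.path.normpath(name) for name in (exclude or []))
    for rel_name in os.listdir(path):
        full_name = os.path.normpath(os.path.join(path, rel_name))
        if full_name in excluded:
            continue
        if os.path.isfile(full_name) and not os.path.islink(full_name):
            os.remove(full_name)
```

With such a helper, the queue-cleaning code could pass the lock file's path in 'exclude' so that everything but the lock is removed.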
jqueue: Move archived jobs on all nodes
Otherwise one might have archived jobs back in the list after a master failover.
noded: Add RPC function to rename job queue files
This will be used to archive jobs.
backend: Add function to check whether file is in queue dir
Another function will need to check whether its parameters are job queue files.
noded: Add decorator for job queue lock
The lock will also be needed by another function.
Two small style fixes
Implement queue locking in node daemon
jstore: Change to not always require a lock
This way we can do locking when both noded and masterd are running on the same machine, the latter holding an exclusive lock on the queue.
More logging for errors during noded RPC calls
Log only unexpected errors in utils.FileLock
Otherwise users might be confused by errors in log files.
Disallow uploading job queue files through upload_file
The job queue is now updated through its own RPC functions.
jqueue: Use new job queue RPC functions
Add job queue RPC functions
jobqueue_update: Uploads a job queue file's content to a node. The most common operation is to upload something that we already have in a string. Unlike in the upload_file function, the file is not read again when distributing changes, but content has to be passed...
Move function cleaning directory to module level
JobQueuePurge() will be used by an RPC function.
Use API instead of command line utilities in watcher
Fix cli.PollJob
feedback_fn wasn't passed to it.
Notify job queue about added/removed nodes
The job queue maintains its own node list and must be notified when nodes are added/removed.
Implement {Add,Readd,Remove}Node in GanetiContext
By doing this we have a central place which coordinates what needs to be done when adding or removing nodes. Another patch will add calls into the job queue.
Two log messages move to config.py.
When removing a node, node_leave_cluster is now called after it has...
jqueue: Implement {Add,Remove}Node
These functions will be used to notify the queue about newly added or removed nodes.
jqueue: Don't pass the list of nodes to SubmitJob anymore
The job queue now maintains its own list and is updated when nodes are added or removed from the cluster.
Maintain node list in job queue
The code makes sure not to include the master in the list.
masterd: Move job queue into context object
The job queue must be called from cmdlib when adding or removing nodes to the cluster. Moving it to the context object makes this possible.
Clean job queue directories when leaving cluster
Old job files shouldn't be left on nodes removed from a cluster.
Use new RPC call in “gnt-node list”
Implement query for nodes
Use new query RPC call in “gnt-instance list”
Implement query for instances
Queries don't create jobs and are more efficient. Log messages are not yet stored anywhere.
jqueue: Replicate jobs to all nodes
Newly added nodes are not yet taken care of. Queue locking on non-master nodes is not yet correct.
jqueue: Use new jstore module
jstore: Add queue helper functions
This will be used to move common code out of jqueue.
Implement job submission for scripts
This patch adds the infrastructure for executing a job in the background, instead of the foreground, via a new “--submit” option. The behaviour is that the job ID is printed and the script exits immediately.
The patch also converts gnt-node list to this model (yes, this will be a...
Another typo in the install doc
Update the module build section of install doc
jqueue: Move assert into decorator
This reduces code duplication. A later patch will modify the job queue a bit more and will need a change of this assert. The assertion is also removed from all class-internal functions.
Split cli.SubmitOpCode in two parts
The current SubmitOpCode function is not flexible enough to be used for submitters that don't want to wait for the job to finish.
The patch splits this in two, a SendJob function and a PollJob one, and the old SubmitOpCode becomes a wrapper. Note that the new SendJob takes...
Allow job queue files to be uploaded through ganeti-noded
This is needed for job queue replication.
Add FileLock utility class
This class is a wrapper around fcntl.flock and abstracts opening and closing the lockfile. It'll be used for the job queue.
(The patch also removes a duplicate import of tempfile into the unittest)
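A wrapper of the kind described in this commit could be sketched as follows. This is not the class from the patch: the SimpleFileLock name, its method names, and the blocking parameter are assumptions. Only fcntl.flock and its LOCK_* flags come from the standard library.

```python
import fcntl


class SimpleFileLock(object):
    """Hypothetical wrapper that opens a lockfile and applies flock to it."""

    def __init__(self, filename):
        self.filename = filename
        self.fd = open(filename, "w")

    def close(self):
        """Close the lockfile; this also drops any lock held through it."""
        if self.fd is not None:
            self.fd.close()
            self.fd = None

    def _flock(self, flag, blocking):
        if not blocking:
            # Raise an error immediately instead of waiting for the lock.
            flag |= fcntl.LOCK_NB
        fcntl.flock(self.fd, flag)

    def exclusive(self, blocking=True):
        """Take an exclusive lock."""
        self._flock(fcntl.LOCK_EX, blocking)

    def shared(self, blocking=True):
        """Take a shared lock."""
        self._flock(fcntl.LOCK_SH, blocking)

    def unlock(self):
        """Release the lock but keep the file open."""
        self._flock(fcntl.LOCK_UN, True)
```

Because flock locks belong to the open file, two SimpleFileLock objects on the same path conflict with each other even inside one process, which makes the behaviour straightforward to exercise in a unit test.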
jqueue: Store context in job queue instead of worker pool
The job queue will need access to the configuration, which is provided through the context object, to get a list of nodes.
RAPI Implement DELETE for tags
First write operation (add tag) for Ganeti RAPI
Add instance tag handling, improved error logging... oh, yes, adopt instance listing for RAPI2!
Fix cluster destroy
With the recent startup/shutdown changes (and with the master daemon in place), the cluster destroy needs some fixing.
This patch moves the finalization of the destroy out from cmdlib into bootstrap, so we can nicely shutdown the rapi and master daemons....
Xen: remove two end-of-line semicolons
It's python, isn't it?
Fix cluster init
With the recent changes, I forgot the extra parameter to this rpc call. Also the rpc call needs to be done after we set up the config data, for the master daemon to be able to start, so we move it after all other init steps.
Make gnt-* commands fail nicely on non-masters
This patch adds a check that we are on the master after failing to connect to the socket, and nicely logs the master name.
Parallelize LUFailoverInstance
ChainOpCode is still BGL-only
Prevent mistakes with an assert.
Fix a misuse of exc_info in logging.info
This is my fault, sorry.
Fix pylint-detected issues
This is mostly:
 - whitespace fix (space at EOL in some files, not all, broken indentation, etc)
 - variable names overriding others (one is a real bug in there)
 - too-long-lines
 - cleanup of most unused imports (not all)...
Fix some errors detected by pylint
Unify SetupDaemon/SetupLogging
The 'old-style' info, error, debug logs do not make much sense. This patch unifies the SetupLogging and SetupDaemon functions. As a result, all the commands log to a 'commands.log' file.
The patch also changes the log setup to keep going if there's an error...
Simplify the log constants and add another one
The patch changes the log constants by moving the slash to the end of the log dir instead of at the beginning of each log file name.
It also adds a new LOG_COMMANDS constant (to be used in a next patch)....
Fix gnt-cluster getmaster
This is special in the sense that it can run on any node. As such, we just instantiate ssconf and read the data from it.
Parallelize {Startup,Shutdown,Reboot}Instance
Parallelize LUReinstallInstance
self.recalculate_locks[locking.LEVEL_NODE] could have any value and everything would work anyway. We'll use the string 'replace' by convention because in the future we might want an 'append' mode.
LogicalUnit._LockInstancesNodes helper function
This function is used to lock instances' primary and secondary nodes after locking instances themselves.
Make sharing locks possible
LUs can declare which locks they need by populating the self.needed_locks dictionary, but those locks are always acquired as exclusive. Make it possible to acquire shared locks as well, by declaring a particular level as shared in the self.share_locks...
Add LogicalUnit.DeclareLocks
This additional LogicalUnit function is optional to implement, but lets you change your locking needs for one level just before locking it, after the previous levels have already been locked. It is useful for example to calculate what nodes to lock after locking an instance....
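To make the interplay of needed_locks, share_locks and DeclareLocks more concrete, here is a toy model. It is a sketch under stated assumptions, not the real LogicalUnit or Processor code: ToyLogicalUnit, plan_locks, the numeric level constants and the instance-to-nodes mapping are all invented for illustration, and no real locks are taken.

```python
# Hypothetical level constants, processed in this order.
LEVEL_INSTANCE = 0
LEVEL_NODE = 1
LEVELS = [LEVEL_INSTANCE, LEVEL_NODE]


class ToyLogicalUnit(object):
    """Toy LU that locks one instance, then that instance's nodes."""

    def __init__(self, instance_name, instance_nodes):
        self.instance_name = instance_name
        self.instance_nodes = instance_nodes  # instance name -> node names
        self.needed_locks = {}
        self.share_locks = dict.fromkeys(LEVELS, 0)  # exclusive by default

    def ExpandNames(self):
        # Only the instance is known up front; node locks come later.
        self.needed_locks[LEVEL_INSTANCE] = [self.instance_name]
        self.share_locks[LEVEL_NODE] = 1  # node locks may be shared

    def DeclareLocks(self, level):
        # Called just before 'level' is locked, after earlier levels.
        if level == LEVEL_NODE:
            nodes = self.instance_nodes[self.instance_name]
            self.needed_locks[LEVEL_NODE] = list(nodes)


def plan_locks(lu):
    """Return (level, names, shared) for every level, missing ones included."""
    lu.ExpandNames()
    plan = []
    for level in LEVELS:
        lu.DeclareLocks(level)
        names = sorted(lu.needed_locks.get(level, []))
        shared = bool(lu.share_locks.get(level, 0))
        plan.append((level, names, shared))
    return plan
```

The loop visits every level even when a level has no entry in needed_locks, which mirrors the idea of not stopping at the first unspecified level.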
LURenameInstance, add/remove relevant locks
LURenameInstance forgot to remove the old lock name and add the new one, making it impossible for parallel LUs to act on the instance (without a master daemon restart). This also fixes burning+rename with the parallelization of {Start,Stop}Instance....
Rewrite job queue
We found several issues in the old job queue implementation. It had race conditions, deadlocks and other deficiencies.
Short summary:
- _QueuedOpCode and _QueuedJob are now more or less data structures with a few utility functions. __Setup is gone....
workerpool: Log when waiting for a thread
Rework master startup/shutdown/failover
This (big) patch reworks the master startup/shutdown and fixes the master failover.
What does the patch do?
For master start/stop:
 - remove the old ganeti-master script and its associated man page
 - moves the ip start/stop directly into the backend.(Start|Stop)Master...
Expose utils.DaemonPidFileName
Since we need to compute this from outside utils.py, we change this to a public function.
Implement checking for the master role in rapi
This patch moves the CheckMaster function from ganeti-masterd to ssconf (most logical place, it cannot go in utils since we would have recursive imports between ssconf and utils) and changes ganeti-rapi to also call...