Remove auto_balance from burnin/cmdlib
There is no such feature in trunk yet.
Reviewed-by: iustinp
Add utils.ReadFile function
It abstracts exception handling and is like a complement to utils.WriteFile.
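A minimal sketch of what such a helper could look like. The name read_file, its size parameter and the use of a with block to guarantee the file is closed are assumptions for illustration, not Ganeti's actual utils.ReadFile.

```python
def read_file(file_name, size=None):
    """Return the content of file_name as a string.

    Hypothetical helper: if size is given, at most that many characters
    are read. The with block closes the file even when read() raises,
    so callers do not need their own try/finally.
    """
    with open(file_name, "r") as handle:
        if size is None:
            return handle.read()
        return handle.read(size)
```

Callers then only handle the errors they care about (for example a missing file) instead of repeating the open/read/close boilerplate.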
GetAllInstancesInfo, change internal iterator name
GetAllInstancesInfo used "node" as an iterator name. Change it to instance to make it less confusing.
Parallelize Tag operations
For now we lock the instance/node for adding/deleting tags from it, but we could probably in the future do without, with more support from the config for atomic operations.
Parallelize LUSetClusterParams (and add a FIXME)
Reviewed-by: imsnah
Parallelize LURemoveExport
Parallelize LURemoveInstance
Using the new add/remove infrastructure this becomes pretty easy! :)
Parallelize LUCreateInstance
Finally, instance create on different node, without iallocator, can run in parallel. Iallocator usage still needs all nodes to be locked, unfortunately. As a bonus most checks which could have been moved to ExpandNames, before any locking is done....
Implement adding/removal of locks by declaration
With this patch LUs can declare locks to be added when they start and/or removed after they finish. For now locks can only be added in the acquired state, and removed if owned, and added locks default to be...
LockSet: forbid add() on a partially owned set
This patch bans add() on a half-acquired set. This behavior was previously possible, but created a deadlock if someone tried to acquire the set-lock in the meantime, and thus is now forbidden. The testAddRemove unit test is fixed for this new behavior, and includes a...
Fix typo in a locking.py comment
Use is_owned to determine whether to unlock
Now that is_owned is public we don't need to play games at the end of an LU. If we're still owning anything we just release it.
Add GanetiLockManager.is_owned function
This is a public version of the private function we already had. We don't just change the previous version because it had lots of users in the library itself and in the testing code.
Fix LockSet._names() to work with the set-lock
If the set-lock is acquired, currently, the _names function will fail on a double acquire of a non-recursive lock. This patch fixes the behavior, and some lines of code added to the testAcquireSetLock test check that...
jqueue: Add common RPC error handling function
We didn't decide yet what exactly it should do with failed nodes.
Reviewed-by: ultrotter
Remove locking of instances in certain queries
This patch is similar to the node patch (rev 1650). We disable locking of instances (and nodes) if we only query static information.
Add an atomic ConfigWriter.GetAllInstancesInfo()
In order to be able to query instances without locking them, we need the same atomic query of multiple instances as for nodes.
Add ConfigWriter._UnlockedGetInstanceList/Info()
This patch splits the GetInstanceInfo and GetInstanceList methods into two parts, one locked and one _Unlocked, similar to the way nodes are queried.
Rewrite the 'only submit job' handling in scripts
The "sys.exit(0)" was not nice as you couldn't differentiate it from other exit codes. We change this to a specially defined exception, so that multi-opcode commands can handle this nicely.
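The idea can be sketched as follows. The exception name JobSubmittedException and the helper functions are hypothetical stand-ins chosen for this illustration; only the pattern (raise a dedicated exception instead of calling sys.exit(0)) comes from the commit message.

```python
class JobSubmittedException(Exception):
    """Hypothetical marker: the job was queued and nobody waits for it.

    The job ID is carried as the first exception argument.
    """


def submit_or_wait(job_id, submit_only):
    # In submit-only mode we signal the caller instead of exiting the process.
    if submit_only:
        raise JobSubmittedException(job_id)
    return "result of job %s" % job_id


def run_multi(job_ids, submit_only):
    """Run several jobs; collect the IDs of those that were only submitted."""
    submitted = []
    for job_id in job_ids:
        try:
            submit_or_wait(job_id, submit_only)
        except JobSubmittedException as err:
            # A multi-opcode command can keep going with the next job.
            submitted.append(err.args[0])
    return submitted
```

With sys.exit(0) the loop in run_multi would have ended after the first job; with the exception the caller decides what to do.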
Optimize the OpQueryNodes for names only
Currently, OpQueryNodes is locking all nodes (in shared mode), which will also block the special case of querying only for the node names (this is needed for gnt-cluster command, for example). There is no logical requirement to not give the administrator enough power if she/he...
Add a way to export all node information at once
The patch adds a new function to export all node information at once (i.e. atomically with respect to the configuration lock).
Never remove job queue lock in node daemon
Otherwise, corruption could occur in some corner cases. E.g. when LeaveNode is running in a child and is in the process of removing queue files, the main process gets killed, started again and gets a request to update the queue. This is a rather extreme corner case,...
Export backend.GetMasterInfo over the rpc layer
We create a multi-node call so that querying all nodes for agreement will be fast.
Change backend._GetMasterInfo to return more data
The _GetMasterInfo() function needs to export the master name too to be useful in master safety checks. This patch makes it a public (no _) function and adds a third element in the return tuple. Its callers are...
Parallelize LUQueryInstanceData
Parallelize LUVerify{Cluster,Disks}
These are two easy querying LUs which require shared access to all nodes/instances.
Parallelize LUReplaceDisks
This is the most complex parallelization so far. We have to lock one instance (and its nodes) plus one more node if doing a remote replace, or all nodes if doing a remote replace with iallocator.
_LockInstancesNodes: support append mode
This will be used to lock the instance's nodes in addition to some more.
Processor: remove ChainOpCode
This function was incompatible with the new locking system, and its usage has been removed from the code. For now LUs share code by calling common module-private functions in cmdlib.py, in the future they will use tasklets (when those will be implemented)....
Parallelize LU{A,Dea}ctivateInstanceDisks
Now that they are not used in other opcodes by chaining, this can easily be done.
LUReplaceDisks: remove use of ChainOpCode
The calls to OpActivateInstanceDisks and OpDeactivateInstanceDisks have been replaced by _StartInstanceDisks and _SafeShutdownInstanceDisks respectively. This is the last usage of ChainOpCode.
Create new _SafeShutdownInstanceDisks function
This new function checks whether an instance is running, before shutting down its disks. This is what the Exec() of LUDeactivateInstanceDisks did, so that is replaced by a call to this function.
Fix a typo in LogicalUnit.ExpandNames docstring
s/locking.LEVEL_INSTANCES/locking.LEVEL_INSTANCE/
Use constants.LOCKS_REPLACE instead of hardcoding
This constant replaces what we used to write in recalculate_locks, and represents the lock recalculation mode. It lives in constants.py because it's used only in cmdlib, and thus doesn't deal with the locking library...
Fix LUReplaceDisks with iallocator
self._RunAllocator() sets self.op.remote_node, but doesn't return the new remote node. If we set it to the return value of the function we basically reset it to None, and iallocator is never run.
Fix LUGrowDisk
The rpc library returns a list, not a tuple, so we'll accept both.
Fix iallocator run
Parallelize LUExportInstance
Unfortunately for the first version we need to lock all nodes. The patch discusses why this is and discusses ways to improve this in the future.
Parallelize LUGrowDisk
LURebootInstance: lock only primary when possible
When rebooting an instance and we're not changing its disks' status (all the cases except in a "full" reboot) we can lock just its primary node.
Add primary_only flag to _LockInstancesNodes
As the name says, when the flag is on (the default is off) only the primary nodes are locked, as opposed to all of them.
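The effect of such a flag can be sketched like this. The Instance class and the wanted_node_locks function are hypothetical, simplified stand-ins and not the real _LockInstancesNodes code; only the primary_only semantics (default off, lock primaries only when on) are taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical, minimal view of an instance's node placement."""
    primary_node: str
    secondary_nodes: list = field(default_factory=list)


def wanted_node_locks(instances, primary_only=False):
    """Compute which node locks are needed for the given instances."""
    wanted = []
    for inst in instances:
        wanted.append(inst.primary_node)
        if not primary_only:
            wanted.extend(inst.secondary_nodes)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(wanted))
```

For an instance with primary node1 and secondary node2, primary_only=True yields only node1, which leaves node2 free for other jobs.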
utils.FileLock: Implement timeout
The timeout can be used in ganeti-noded to be more robust against deadlocks.
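One common way to add a timeout to a file lock is to poll a non-blocking flock() until a deadline passes. The sketch below shows that technique; it is an assumption about how a timeout could be implemented, not a copy of utils.FileLock, and the function name and poll_interval parameter are made up for the example.

```python
import errno
import fcntl
import time


def lock_with_timeout(fd, timeout, poll_interval=0.05):
    """Try to take an exclusive flock on fd.

    Returns True once the lock is held, or False if it could not be
    obtained within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as err:
            # Only "lock is held by someone else" is retried.
            if err.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A daemon that gets False back can log the problem and refuse the request instead of hanging forever on a lock that is never released.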
Add locking.ALL_SET constant and use it
Rather than specifying None in needed_locks every time, with a nice comment saying to read what we mean rather than what we write, and that None actually means All, in our magic world, we'll hide this secret under the ALL_SET constant in the locking module, which has value, you...
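A small sketch of why a named constant reads better than a bare None. The value None for ALL_SET is inferred from the text above ("None actually means All"); resolve_lock_names is a hypothetical helper used only to show the difference between "all locks" and "no locks".

```python
# Assumed value: the message says None is what "all" used to be spelled as.
ALL_SET = None


def resolve_lock_names(requested, known_names):
    """Expand a lock request into a concrete, sorted list of names."""
    if requested is ALL_SET:
        # "Everything in the set", which is different from an empty list.
        return sorted(known_names)
    return sorted(requested)
```

A declaration such as needed_locks = {"node": ALL_SET} then states the intent directly, while an empty list still means "lock nothing".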
utils.SplitTime: More rounding fixes
SplitTime didn't round the same on different platforms. This patch changes it to use microseconds and not care about rounding.
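A sketch of a microsecond-based split that avoids rounding altogether by truncating to an integer first. The function names and the exact arithmetic are assumptions for illustration; the point is that divmod on an integer can never produce a fractional part equal to a full second.

```python
def split_time(value):
    """Split a float timestamp into (seconds, microseconds).

    The value is truncated, not rounded, to whole microseconds, so the
    second element is always in the range 0..999999.
    """
    seconds, microseconds = divmod(int(value * 1000000), 1000000)
    return (seconds, microseconds)


def merge_time(timetuple):
    """Inverse of split_time: turn (seconds, microseconds) into a float."""
    seconds, microseconds = timetuple
    return float(seconds) + microseconds * 0.000001
```

Because the fractional part comes out of divmod, an out-of-range result such as a fraction equal to one whole second cannot occur.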
Prevent mistakes using _GetWantedNodes
All the users of _GetWantedNodes have been converted to be concurrent LUs, and thus cannot call this function with an empty list of nodes anymore. This patch makes this restriction a part of the function itself. This prevents mistakes in new concurrent LUs, and creates more...
Parallelize LUQueryNodeVolumes and LUQueryExports
Parallelize LUDiagnoseOS
LUQueryExports: make 'node' field mandatory
It turns out this field was already mandatory. If it hadn't been valid, in fact, a value of None would have been passed to _GetWantedNodes which would have thrown an exception.
s/Chain(OpQueryExports)/rpc.call_export_list(...)/
Parallel opcodes are not (yet?) supported for chaining. Turns out though that chaining is used only four times in the code, and twice it's for querying exports. But what's the need to chain the full opcode, when...
Fix wrong indentation in LUQueryNodes
Merge r1607 from branches/ganeti/ganeti-1.2
Use a default vnc_bind_address if None is specified
merge r1568 from branches/ganeti/ganeti-1.2
Add more fields to gnt-instance list
merge r1548 from branches/ganeti/ganeti-1.2
Fix wrong wording of instance rename error message.
merge r1541 from branches/ganeti/ganeti-1.2
more information for VNC console port
merge r1540 from branches/ganeti/ganeti-1.2
Allow access to HVM serial console
merge r1539 from branches/ganeti/ganeti-1.2
Display VNC console port in gnt-instance info.
merge r1538 from branches/ganeti/ganeti-1.2
Check HVM device type on instance modify as well.
Check memory size before setting it
With this change when a user asks for a new memory size for an instance, the number is checked instead of just applied. The operation fails only if the instance would not be able to restart on its primary node, but generates warnings should it be impossible to failover the instance or...
Pass the force param to SetInstanceParms
It was already allowed in gnt-instance modify, but ignored. It will be used to force skipping parameter checks.
This is a forward-port from branches/ganeti-1.2
Original-Reviewed-by: imsnah
Reviewed-by: iustinp
Merge r1536 from branches/ganeti/ganeti-1.2
Add HVM device type flags 2/3
utils.SplitTime: Fix rounding of milliseconds
Reported by Iustin.
It used to return this:
utils.SplitTime(1234.999999999999)
(1234, 1000)
while it should've returned this:
(1235, 0)
merge r1535 from branches/ganeti/ganeti-1.2
Add HVM device type flags 1/4
Make WaitForJobChanges deal with long jobs
This patch alters the WaitForJobChanges luxi-RPC call to have a configurable timeout, so that the call behaves nicely with long jobs that have no update.
We do this by adding a timeout parameter in the RPC call, and returning...
merge r997 from branches/ganeti/ganeti-1.2
Fix gnt-instance modify for HVM parameters
This patch makes gnt-instance modify work again for the advanced HVM parameters after it was broken by other changes.
Fix error message when masterd is not listening
Fix issue when acquiring empty lock sets
By design if an empty list of locks is acquired from a set, no locks are acquired, and thus release() cannot be called on the set. On the other hand if None is passed instead of the list, the whole set is acquired,...
jqueue: Replace normal cache dict with weakref dict
A job should only exist once in memory. After the cache is cleaned, there can still be references to a job somewhere else. If there are multiple instances, one can get updated while a function is waiting for changes on another instance. By using...
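The property described above can be illustrated with weakref.WeakValueDictionary from the standard library. The Job and JobCache classes are hypothetical, simplified stand-ins for the job queue code: as long as anyone holds a reference to a job, the cache hands out that same object; once nobody does, the entry disappears by itself.

```python
import weakref


class Job:
    """Hypothetical stand-in for a queued job object."""

    def __init__(self, job_id):
        self.id = job_id


class JobCache:
    """Cache that never keeps a job alive on its own."""

    def __init__(self):
        self._jobs = weakref.WeakValueDictionary()
        self.loads = 0  # counts how often a job had to be (re)created

    def get(self, job_id):
        job = self._jobs.get(job_id)
        if job is None:
            # Stand-in for loading the job from disk.
            job = Job(job_id)
            self.loads += 1
            self._jobs[job_id] = job
        return job
```

With a normal dict, cleaning the cache while a waiter still held the old object would let a second copy be created on the next lookup. With weak references the cache never has to be cleaned by hand, so that situation does not arise.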
jqueue: Keep timestamp of opcode start and end
jqueue: Reset run_op_idx after job is done
It can be confusing otherwise.
Fix a small typo in a constant
Seems no one ran a burnin lately :)
Reviewed-by: amischenko,ultrotter
Make sure that client programs get all messages
This is a large patch, but I can't figure out how to split it without breaking stuff. The old way of getting messages by always getting the last one didn't bring all messages to the client if they were added...
Add simple lock debug output
Currently it can only be enabled by modifying utils.py, but we can add a command line parameter later if needed.
Reviewed-by: schreiberal
Parallelize LUQueryNodes
As for LUQueryInstances, the first version just acquires a shared lock on all nodes. In the future further optimizations are possible, as outlined by comments in the code.
Parallelize LUQueryInstances
This first version acquires a shared lock on all requested instances and their nodes. In the future it can be improved by acquiring fewer locks if no dynamic fields have been asked, and/or by locking just primary nodes.
LockSet: allow lists with duplicate values
If a list with a duplicate value is passed to a lockset what the code now does is to try to acquire the lock twice, generating a double-acquire exception in the SharedLock code. This is definitely an issue. In order to solve it we can either forbid double values in a list...
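One way to tolerate duplicates is to normalize the requested names before acquiring anything. This is only a sketch of that option: the helper name is hypothetical, and sorting the result (so locks are always taken in the same order) is an assumption about how a lock set would typically avoid deadlocks, not something stated in the message.

```python
def normalize_lock_names(names):
    """Return the requested lock names without duplicates, in sorted order.

    A single name given as a string is treated as a one-element list.
    """
    if isinstance(names, str):
        names = [names]
    # set() removes duplicates, so no lock is acquired twice.
    return sorted(set(names))
```

After this step each lock name appears exactly once, so the per-lock double-acquire check is never triggered by the caller's input.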
Processor: lock all levels even if one is missing
If a locking level wasn't specified locking used to stop. This means that if one, for example, didn't specify anything at the LEVEL_INSTANCE level, no locks at the LEVEL_NODE level were acquired either. With this...
LURebootInstance: move arg check in ExpandNames
The check for the reboot type can be done without any locks held, so we'll move it to ExpandNames. Plus, we note in a FIXME that if the reboot type is not full, we can probably just lock the primary node, and...
LUVerifyCluster: Return boolean indication success
Use Linux-specific way to name master socket
By using this Linux-specific way we don't have to care about removing the socket file when quitting or starting (after an unclean shutdown). For a more detailed description, see the comment in the patch.
gnt-node: Add option to always accept peer's SSH key
This option will be used to add nodes to the cluster without asking the user to confirm the key. Together with key-based authentication this can be used in the QA tests.
SshRunner: Add parameter to always accept peer's SSH key
This will be used to add nodes without user interaction, specifically in QA tests.
Move SSH option building into a function
I'm going to add another option and it would make maintaining them in constants even more complicated.
SshRunner.Run: Pass all arguments to BuildCmd
This patch changes SshRunner.Run to pass all arguments to SshRunner.BuildCmd. They had the same arguments before and should stay that way. This change makes it easier to add new or change existing arguments....
Pass hypervisor type to the OS scripts
It's handy to make the os scripts know which hypervisor the instance is going to run under. In order not to change the os API we pass this information in the environment, where the os scripts can access it if they're hypervisor-aware....
RunCmd: add optional environment overriding
If the user passes an env dict to RunCmd we'll override the environment passed to the to-be-executed command with the values in the dict. This allows us to pass arbitrary environment values to commands we run....
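The override described above can be sketched with the standard subprocess module. The function run_cmd, its return value and the DEMO_HYPERVISOR variable are hypothetical; the real RunCmd may build its base environment and report results differently.

```python
import os
import subprocess
import sys


def run_cmd(cmd, env=None):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr).

    The child starts from a copy of our own environment; entries from the
    optional env dict are laid on top and win on conflicts.
    """
    cmd_env = os.environ.copy()
    if env is not None:
        cmd_env.update(env)
    proc = subprocess.run(
        cmd,
        env=cmd_env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # The child process sees the extra variable without our own
    # os.environ being modified.
    code, out, _ = run_cmd(
        [sys.executable, "-c", "import os; print(os.environ['DEMO_HYPERVISOR'])"],
        env={"DEMO_HYPERVISOR": "kvm"},
    )
    print(code, out.strip())
```

Copying os.environ rather than mutating it keeps the override local to the one command being run.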
KVM Hypervisor Cleanup
- Remove a few experimental code lines left as comments
- Rework first disks' boot=on addition, which was calculated twice
- Remove an empty line
- Remove reference to hvm_pae which doesn't apply to kvm
Add KVM hypervisor code
ht_kvm.py contains the code for ganeti to work under kvm. This patch also modifies Makefile.am to ship that file, and lib/hypervisor/__init__.py to import it, and add kvm to the hypervisors map.
constants: add HT_KVM
Add a new hypervisor type, HT_KVM, to constants, and register it in the HYPER_TYPES set.
Add --with-kvm-path configure option
This allows configuring a different path to the kvm binary. By default /usr/bin/kvm is used, which is the one found in Debian and Ubuntu.
FakeHypervisor: fix a function signature
StartInstance takes 'block_devices', not 'force' as its third argument. Even if this is not used in the fake hypervisor it's better to have the correct argument name to avoid confusion.
Convert RunCmd to an epydoc docstring
Fix adding pristine nodes
If a node hasn't been part of the cluster before being added it'll not have the cluster's SSH key. This patch makes sure to accept those by not aliasing the machine name to the cluster name.
Fix race locking issue in noded
Noded didn't release the job queue lock after initialising it. This patch makes sure to unlock once the work is done.
cli: Use new RPC call instead of polling
This means commands will not take at least one second anymore.
Add RPC call to wait for job changes
This way clients can react faster to status or message changes and don't have to poll anymore.
jqueue: Change log message time format
See the comment in the patch.
Add functions to split time into tuple and merge it back
These will be used for job logs.
Add query function for exports
Don't always remove queue lock when queue is purged
The lock should only be removed if ganeti-noded is going to quit. Otherwise it needs to be kept to prevent another process from creating it again while we're still holding the (removed) lock. This is due to...
backend: Add optional exclusion list to _CleanDirectory
The code cleaning the queue will make use of it.