locking.LockSet: Improve assertions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
locking: Factorize LockSet.acquire
By moving the main code of LockSet.acquire to its own function we reduce the code complexity a bit and clarify the exception handling.
This also fixes a case where a lock acquire timeout wasn't handled correctly, leading to obscure error messages....
mcpu: Make sure added locks are released on errors
opcodes: Add missing shutdown_timeout to OpRemoveInstance
luxi: Pass socket path directly to exception, not in tuple
gnt-* use the correct opcode slot to build opcodes
gnt-* scripts were building wrong opcodes for commands which had the shutdown_timeout slot (due to missing testing after renaming). Fixing.
Also change SHUTDOWN_TIMEOUT_OPT dest field name to "shutdown_timeout":...
rapi: fix tag operations
This patch fixes the tag PUT/DELETE operations, and additionally changes the Tags* functions to take only positional and not keyword arguments (the defaults do not make any sense at all, and they are always called with all arguments)....
cli: add SHUTDOWN_TIMEOUT_OPT
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add timeout options to other LUs
All the LUs that shut down the instance need to be able to pass the timeout parameter as well.
mcpu: Change lock attempt timeout calculation
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Code and docstring style fixes
Found using pylint and epydoc.
mcpu: Improve lock reporting with timeouts
mcpu: Implement lock timeouts
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made.
Lock status reporting will be improved in a separate patch....
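The timeout schedule described above could be sketched roughly as follows. This is only an illustration: the function name, its parameters and the geometric growth between the two bounds are assumptions, since the commit message only states the bounds, the ±5% variation and the final blocking acquire.

```python
import random


def lock_attempt_timeouts(attempts=10, minimum=0.1, maximum=10.0,
                          jitter=0.05, rnd=random.random):
    """Yield one timeout per attempt, then None for a blocking acquire.

    Hypothetical helper; the growth curve between minimum and maximum is
    an assumption made for this sketch.
    """
    for i in range(attempts):
        # Grow geometrically from minimum (first try) to maximum (last try).
        base = minimum * (maximum / minimum) ** (i / float(attempts - 1))
        # Add a variation of up to +/-5% so concurrent jobs drift apart.
        variation = base * jitter * (2.0 * rnd() - 1.0)
        yield base + variation
    # None stands for "no timeout", that is, a blocking acquire.
    yield None
```

A caller would iterate over the generator and pass each value as the timeout of one acquire attempt, treating None as a request to block.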
mcpu: Remove unused exclusive_BGL attribute
locking.LockSet: Implement acquire timeouts
The timeout passed to LockSet.acquire() is measured over all lock acquires. If LockSet.acquire fails to acquire all requested locks within the specified amount of time, all locks are released again and the acquire fails....
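A minimal sketch of the "one time budget for all locks" idea, using plain threading.Lock objects rather than Ganeti's own lock classes. The function name acquire_all is hypothetical and the code does not reproduce LockSet.acquire itself.

```python
import threading
import time


def acquire_all(locks, timeout):
    """Acquire every lock within one shared time budget, or none at all."""
    deadline = time.time() + timeout
    acquired = []
    for lock in locks:
        # Each acquire only gets what is left of the overall budget.
        remaining = deadline - time.time()
        if remaining <= 0 or not lock.acquire(timeout=remaining):
            # Budget exhausted: give back everything acquired so far.
            for held in reversed(acquired):
                held.release()
            return False
        acquired.append(lock)
    return True
```

The important design point is that the deadline is computed once, so a slow first lock leaves less time for the remaining ones instead of restarting the clock.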
Accept shutdown timeout from the user
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than...
ChrootManager: clean StopInstance
Currently it has lots of duplicated code, and internal retries. Clean it up with the following assumptions:
- We'll probably be called more than once.
- It is ok to fail to stop, unless we're called with force=True.
- If we're called only once, and with force=True it's ok not to run the...
cli: add a timeout option
KVMHypervisor: use the StopInstance retry feature
Since we know StopInstance is going to be called more than once (at least twice, once with force and once without, but normally quite a lot more) we don't need our own sleep/loop, and we can just send one monitor...
backend.InstanceShutdown: small cleanup
1) unhardcode the timeout, abstracting it in a constant
2) Use time.time() rather than hiding the timeout in a range()
3) call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but once...
Add default instance shutdown timeout constant
It reflects the "current" two minutes we give to the instance.
Hypervisors: Add retry= to StopInstance
Currently some hypervisors need the stop operations to be retried more than once, while other ones only do it in one pass. With this change we'll handle retries outside the hypervisor code, but telling whether this is the first try or not....
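The division of labour described above (the caller loops against a deadline, the hypervisor is only told whether this is a retry) might look roughly like this. All names here are hypothetical stand-ins; the real code lives in backend.InstanceShutdown and the hypervisors' StopInstance methods and is not reproduced.

```python
import time


def shutdown_instance(stop, is_running, timeout, interval=0.01):
    """Call stop(retry=...) until the instance is down or time runs out.

    stop and is_running are callables supplied by the caller; they stand
    in for the hypervisor operations in this sketch.
    """
    deadline = time.time() + timeout
    tried_once = False
    while is_running():
        if time.time() >= deadline:
            return False
        # The first call gets retry=False, every later call retry=True.
        stop(retry=tried_once)
        tried_once = True
        time.sleep(interval)
    return True
```

Using time.time() for the deadline keeps the total wait bounded by the timeout even when individual stop calls are slow.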
Get rid of utils.CommaJoin
- We never remember to use it (5 uses vs 21 " ,".join())
- It's longer to write than " ,".join()
- The added value of the apostrophe in the string is not very much
Match instance and node names case insensitively
Since DNS cannot contain two names with different cases anyway, this should be ok.
Add case_sensitive keyword to MatchNameComponent
Now featuring unit testing, and more deterministic results on some corner cases.
VNC password: move to hv param and use in kvm
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Check the OS name for variants
If an OS supports variants, unless --force-variant is specified a valid variant must be passed.
Add force_variant slot to Create/ReinstallInstance
These two opcodes need to know whether an unknown variant must be forced through or not.
Allow --force-variant for instance add/reinstall
Passing this option makes an undeclared variant be passed to the os "as is", hoping it'll be able to figure it out (as per the design doc).
Update client os lists to name+variant format
Lists of OSes are displayed by gnt-os list, rapi, and gnt-instance reinstall --select-os, and checked by burnin. In all of these show the list with name+variant, if the os has variants.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Add slot and constant for supported OS variants
The slot will contain a list of variants, and the variants file constant contains the file in the os dir which is supposed to hold the list.
Populate OS variants if an api >= 15 is present
Adding the file name to the os_files dict will fill in the full path and get it checked; if present, we also read it and split it into lines, one per declared variant.
OSEnvironment: populate OS_VARIANT
According to the design on api_version >= 15 the OS variant is the part of the OS name after the "+" sign. If none is found, we just pass in the first variant an OS declares (which is bound to exist, as we check for it in _TryOSFromDisk)....
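The naming rule described above can be sketched as a small helper. The function name split_os_name and the example OS names are hypothetical; the real logic is part of OSEnvironment and is not reproduced here.

```python
def split_os_name(name, declared_variants):
    """Return (base_name, variant) for an OS name such as 'debian+lenny'.

    declared_variants is assumed to be non-empty, as the commit message
    says its presence is checked when the OS is loaded.
    """
    if "+" in name:
        # The variant is everything after the first "+" sign.
        base, variant = name.split("+", 1)
    else:
        # No explicit variant: fall back to the first declared one.
        base = name
        variant = declared_variants[0]
    return base, variant
```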
OSFromDisk: handle variants when loading os
When we load an OS from disk, we need _TryOSFromDisk to get the real name, without any variant. This allows any functionality that uses the instance OS to handle a name with a variant.
Add per-node variants list to OS diagnose output
Add "variants" field to LUDiagnoseOS
If selected this field will contain a list of os variants supported on all nodes.
cli.CalculateOSNames
Given an os and its variants, return a list of "full" os names.
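A sketch of what such a helper could look like, written from the one-line description only. The lowercase name and the behaviour for an OS without variants (return just the plain name) are assumptions, not the actual cli.CalculateOSNames code.

```python
def calculate_os_names(os_name, os_variants):
    """Return the list of "full" OS names in name+variant format."""
    if os_variants:
        return ["%s+%s" % (os_name, variant) for variant in os_variants]
    # Assumption: an OS without variants is listed under its plain name.
    return [os_name]
```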
Fix rpc.call_os_get to actually return the OS
Since nobody ever read the actual OS object, this bug was introduced in the rpc conversion.
Convert os api version file name to a constant
TryOSFromDisk: s/os_scripts/os_files/
We'll be using this dict/loop to check more than just scripts, so we're renaming the variables appropriately.
TryOSFromDisk: only check actual os scripts for +x
Currently all checked files in the loop are os scripts, so nothing will change, but in the future we only want the +x bit on actual os scripts, not necessarily all files.
Add support for using the bootloader in xen-pvm
This patch adds three optional parameters:
- 'use_bootloader', whether to use the bootloader or not
- 'bootloader_path', absolute path to the bootloader
- 'bootloader_args', extra arguments to the bootloader...
Replace all xrange() with range()
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
More locking tests race conditions fixes
There were more race conditions. By adding a notify function to SharedLock.acquire we can prevent them.
LUSetNodeParams: autopromote self when needed
If we're de-offlining or de-draining a node we need to promote it to MC if we do not have enough, or the config will be corrupt.
Abstract self-promotion decision
During node add we decide whether to self promote to an MC. Abstract this decision making to a separate function.
Fix master candidate removal
Currently during a master candidate removal, when it's possible to promote another node, the removal operation fails because of a corrupt config before it's even possible to do the promotion. Fixing this by doing the promotion before, excluding the current node....
LUSetNodeParams: Don't break config on mc demotion.
If --force is used to demote an MC, but then there are not enough MCs in the cluster, the configuration gets corrupted until a node is promoted.
In order to avoid that we only allow demotion with --force if the node...
Master candidate stats, return one more value
Besides returning the current number of candidates, and the number of desired and possible candidates, we also return the maximum possible number, even if greater than our desires. All callers for now ignore...
SingleActionPipeCondition =~ s/Action/Notify/
With this patch we simplify usage of the SingleActionCondition (which wasn't a condition at all) by making it a real condition. This way we can just wait() on it, or notifyAll() as we would on a normal one. The...
Abstract "base" condition code in a separate class
Each condition has an underlying lock, the acquire and release methods, and a few helper methods to check that it's called in the proper way.
Abstract them to a separate class so we can have more than one without...
locking.SharedLock: Fix bug in delete function
SharedLock.__acquire_unlocked uses keyword parameters. Just passing the timeout would set the “shared” parameter.
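The class of bug described above is easy to reproduce in isolation. The function below is a hypothetical stand-in with an assumed signature (shared first, timeout second); it only reports how its arguments were bound and is not the real SharedLock method.

```python
def acquire_unlocked(shared=0, timeout=None):
    """Hypothetical stand-in that reports how its arguments were bound."""
    return {"shared": shared, "timeout": timeout}


# Positional call: the timeout lands in the first parameter, "shared".
wrong = acquire_unlocked(5.0)

# Keyword call: the timeout reaches the parameter it was meant for.
right = acquire_unlocked(timeout=5.0)
```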
Rename LockSet.acquire parameter “blocking” to “timeout”
Also remove the “blocking” parameter from LockSet.remove and GanetiLockManager.remove. There's no point in implementing timeouts on removal unless we need them.
Change SharedLock to new pipe(2)-based condition
Add _PipeCondition class
_PipeCondition is a condition implemented using pipe(2) and poll(2). It allows the implementation of timeouts without using a busy-wait loop with time.sleep.
Unlike Python's built-in threading.Condition class and to save file descriptors and an internal queue, it can only be used to notify...
Add _SingleActionPipeCondition class
This class will be used as a basic block for pipe(2)-based conditions. Upon initialization it creates a pipe and can be notified once (hence the “single action” in the name). A callable helper class is used to wait for notifications....
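The general technique (a pipe that is signalled once, with waiters sleeping in poll(2) instead of a busy-wait loop) can be sketched as below. The class name and its methods are hypothetical, and signalling by writing a single byte is a choice made for this sketch; it is not claimed to match the original class. select.poll is not available on every platform (for example Windows).

```python
import os
import select


class SingleNotifyPipe(object):
    """One-shot notification built on a pipe and poll."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()

    def notify(self):
        # The byte is never read back, so every waiter sees the pipe as
        # readable from now on; this is what makes it single-use.
        os.write(self._write_fd, b"x")

    def wait(self, timeout):
        """Return True if notified within timeout seconds, else False."""
        poller = select.poll()
        poller.register(self._read_fd, select.POLLIN)
        # poll() takes milliseconds and sleeps in the kernel meanwhile.
        return bool(poller.poll(int(timeout * 1000)))

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)
```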
SharedLock: implement timeouts
This patch greatly simplifies the SharedLock code and implements timeouts for the acquire() and delete() functions. A wrapper around Python's threading.Condition class must be used to ensure thread safety when checking whether there are any waiters left....
Extend confd instances ips query
The query now accepts a link parameter.
Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Merge remote branch 'origin/master' into mogu
confd/client: make it possible to update peer list
Until now the peers have to be the same all the time. Adding a new function to update the list, and call it from the constructor to avoid duplicating code.
confd/client: pass self to upcalls
It may be handy for upcalls to know which client called them, and call it back. So we create a new "client" field in the upcall target, containing the current client instance.
ConfdFilterCallback: fix a bug in expire
The HandleExpire function takes the whole "up" structure, and not just the salt.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Encode the actual exception raised by LU execution
Currently, the actual exception raised during an LU execution (one of OpPrereqError, OpExecError, HooksError, etc.) is lost because the jqueue.py code simply sets that to a str(err), and the code in cli.py...
Move the luxi error handling into errors.py
Currently the luxi error handling is hardcoded as special encoding on the masterd-side and special decoding on the client side. This patch moves it to errors.py such that other parts of the code can reuse the same encoding....
Fix the confusing ssh/hostname message in node add
Before, it used to say:
ssh/hostname verification failed node1.example.com -> hostname mismatch, got node2
Now it says for wrong hostnames (maybe too verbose):
ssh/hostname verification failed (checking from node1.example.com): hostname...
Implement ConfdFilterCallback
This callback can be stacked with another one, and will filter duplicate or old results, making handling of results easier.
Remove secrets and kill confd on cluster leave
Confd client: add module level documentation
Populate the docstring with documentation on the client library's usage.
Add uuid on node/instance add and cluster init
This patch does a little bit of cleanup first, since we want to call GenerateUniqueID without reacquiring the lock.
Note that we don't necessarily need to do this for the cluster, since at first startup ConfigWriter will do it anyway. But it's better to...
Automatically cleanup _temporary_ids at save
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 34d657bae4361b9d6fd8c6314dc7cca57b51c773)
Separate the computation of all config IDs
We will need this in another place, so we abstract the 'compute all current IDs' functionality into a separate function. We also change the name of the _ComputeAllLVs to _AllLVs to match the other _All*s functions....
Change config upgrade to be explicit
Currently the config upgrade is done at each object instantiation, which means that ganeti-noded will run UpgradeConfig on all objects received remotely (instances, disks, nics). This is not so good, so this patch changes it so that only the ConfigWriter runs this method at...
Merge commit 'origin/next'
Add missing import sys to lib/daemon.py
It does “print >> sys.stderr, …” but there is no import sys.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
KVMHypervisor: wrap long line
Export the uuid in RAPI
This also simplifies the field declaration in RAPI a little.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Export and show the cluster uuid in cluster info
Implement uuid in gnt-node/instance list and info
The patch modifies LUQueryInstanceData to return the uuid too and also adds support for it in the gnt-* scripts.
Simplify handling of regular fields in LUQuery*
For fields that correspond directly to an object's field, we can simplify the handling. The patch also adds the new 'uuid' fields to objects so that it can be queried.
Signed-off-by: Iustin Pop <iustin@google.com>...
Automatically fill in missing UUIDs
The patch also starts using the current UUIDs (in the new attributes) while computing the _AllIDs list.
Add uuid attributes to configuration entities
Node init: copy hmac key as well
Without this confd will not start when a node is added to the cluster.
Unpack the confd reply as an object, from the dict
KVM nic script: enable interface forwarding
If forwarding is enabled globally this is a no-op. If instead it's enabled only for some special interfaces where instance traffic has to go to/comes from (for example a gre tunnel) then it's useful to explicitly enable it for the instance interfaces as well....
KVM nic script: use routed link as table
In order to be able to maintain the node network standard routing untouched while routing instance traffic through a different dedicated interface (eg: a gre tunnel) we need to specify the instance routing path inside a separate table, which will also contain different default...
Confd client: make SendRequest args optional
By default "None" will be used as an args value
Confd client: Change callback model
We move to one callback in total, rather than one per call, and call it both for server replies and request expiring.
Confd client: make confd port configurable
The port can now be chosen at library init time, with a default of calling GetDaemonPort.
Confd client library: enable optional logging
If a logger is passed in, we log some debugging messages that might help someone who's debugging a confd client to understand what's going on.
Confd: add instances IPs query
Extend confd to answer queries about instances IPs.
Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix utils.MatchNameComponent for full matches
While ‘test1’ matches both ‘test1’ and ‘test1.example’, it has a full, exact match and we should return it if that is the case.
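The matching rule described above, together with the case_sensitive keyword mentioned in an earlier entry, could be sketched like this. It is an illustration written from the commit messages, with a hypothetical lowercase function name; it is not the actual utils.MatchNameComponent code.

```python
import re


def match_name_component(key, names, case_sensitive=True):
    """Return the single name matching key, or None if none or ambiguous."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # key matches a name that is either identical or continues with ".".
    pattern = re.compile(r"^%s(\..*)?$" % re.escape(key), flags)
    matches = [name for name in names if pattern.match(name)]
    if len(matches) == 1:
        return matches[0]

    def norm(value):
        return value if case_sensitive else value.lower()

    # Several candidates: a full, exact match wins over prefix matches.
    exact = [name for name in matches if norm(name) == norm(key)]
    if len(exact) == 1:
        return exact[0]
    return None
```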
Fix _RemoveDisk for file based instances
During 621b7678 two typos were introduced which prevent removal of file based instances from working correctly. Fixing both of them to what they were meant to be.
cmdlib._CreateDisks fix a broken result.Raise
The format string has the ": %s" at the end, but no argument is passed, which of course raises a TypeError. Removing ": %s" as it's added by the RpcResult Raise() method anyway.
Unify the instance creation code
Currently the AddInstance in gnt-instance and ImportInstance in gnt-backup duplicate all of their code except the actual opcode creation (the parameters to it). By moving this to cli.py (not optimal location, but we don't have another one), we can use a single copy of the code,...
Move the “--ignore-secondaries” option to cli.py
Move the “--no-shutdown” option to cli.py
Remove explicit DEBUG_OPT and add it by default
Since >90% of the commands take the “--debug” option, and all should actually take it (the gnt-job command is currently missing it), it makes sense to simply remove this and add it by default in cli.py.