Switch from os.path.join to utils.PathJoin
This passes a full burnin with lots of instances, and should be safe as we mostly join a known root (various constants) to a run-time variable.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
utils: Add a PathJoin function
This will replace os.path.join, which is not safe against directory traversal issues.
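The idea of a traversal-safe join can be sketched as follows. This is a minimal illustration, not the actual utils.PathJoin code: the name `safe_path_join` and the exact checks (an absolute, normalized root, and a result that must stay below it) are assumptions made for this example.

```python
import os.path


def safe_path_join(*args):
    """Join path components, refusing any result that escapes the first one.

    Hypothetical helper: the first argument is treated as a trusted root and
    the remaining ones as untrusted, run-time values.
    """
    if not args:
        raise ValueError("at least one path component is required")
    root = args[0]
    # The root must already be absolute and normalized (no "..", no "//").
    if not os.path.isabs(root) or os.path.normpath(root) != root:
        raise ValueError("root %r must be an absolute, normalized path" % root)
    result = os.path.normpath(os.path.join(*args))
    prefix = root if root.endswith(os.sep) else root + os.sep
    # os.path.join silently discards the root if a later part is absolute,
    # and ".." components can climb out of it; both cases are rejected here.
    if result != root and not result.startswith(prefix):
        raise ValueError("%r escapes the root %r" % (result, root))
    return result
```

With plain os.path.join, a component such as "../etc" or "/etc/passwd" would be accepted and would leave the root; in this sketch both raise ValueError.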
Add an extra safety layer to _CleanDirectory
In order to protect from accidental use of _CleanDirectory on a random directory, we add a list of allowed clean directories, somewhat similar to _ALLOWED_UPLOAD_FILES (but statically computed).
Signed-off-by: Iustin Pop <iustin@google.com>...
Avoid absolute path for privileged commands
Using an absolute path for a privileged command is a bad idea as this path may vary. For example, /usr/sbin/brctl in Debian and /sbin/brctl in ALTLinux. Using $PATH is a better idea.
Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru>...
Make SSH_CONFIG_DIR customizable
This patch adds the ability to customize the ssh config directory with --with-ssh-config-dir (instead of the hardcoded /etc/ssh value). This is useful in Linux distributions with custom ssh config directories (/etc/openssh in ALTLinux, for example)....
Merge branch 'stable-2.1' into devel-2.1
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Add NLD constants to Ganeti
This avoids the need for them to be injected in the nbma repository.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Merge remote branch 'origin/devel-2.0' into devel-2.1
Conflicts: NEWS: Trivial configure.ac: Trivial...
Fix two potentially endless loops in http library
The first can be problematic if poll(2) returns POLLHUP|POLLERR on a socket. Before, it would only be respected for SOCKOP_RECV, but since they can also occur on other socket operations, esp. in combination with...
Move watcher's EnsureDaemon function to utils
This is going to be used from the nbma repository, to ensure that the nld daemon is running.
Add multi-key support to the serializer
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix bug in LUQueryConfigValues
LUQueryConfigValues supports multiple output fields. If the client asked for the watcher pause status, it would not get a list, but simply the value.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add watcher hooks
These hooks are run on all nodes, after the "base" daemons are started.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix typo in LUVerifyCluster when checking node time
The first argument to _ErrorIf should always be True in this case.
Implement utils.RunParts and use it for hooks
This function is a generic pythonic version of runparts. We currently use it in the backend HooksRunner, but we'll use it for running different directories as well.
Change backend hooks runner to use RunCmd
And save lots of lines of code, in the process
Add reset_env option to RunCmd
This allows running a command with only the passed-in environment, rather than just updating the default one with it.
Now with unit testing.
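The difference between updating the default environment and resetting it can be sketched as below. This is an illustration only: `build_cmd_env` and `run_cmd` are hypothetical names and do not reproduce the signature of the real RunCmd.

```python
import os
import subprocess


def build_cmd_env(env=None, reset_env=False):
    """Return the environment dict a child process should receive."""
    if reset_env:
        # Start from nothing: only the passed-in variables are visible.
        cmd_env = {}
    else:
        # Default behaviour: inherit the current environment and update it.
        cmd_env = os.environ.copy()
    if env:
        cmd_env.update(env)
    return cmd_env


def run_cmd(cmd, env=None, reset_env=False):
    """Run cmd (a list of arguments) and return (exit_code, stdout)."""
    proc = subprocess.run(cmd, env=build_cmd_env(env, reset_env),
                          stdout=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout
```

Note that with a reset environment there is no PATH either, so in this sketch the command would need to be given by its full path.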
Make it possible to pass custom private key path to SshRunner.Run
Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Handle EAGAIN in LUXI client
If too many clients try to connect to the master at the same time, some of them might fail if the master doesn't accept the connections fast enough.
Show message when job is waiting in queue or for locks
Jobs submitted via the standard command line utilities didn't give any indication that anything was happening while they were waiting in the job queue (e.g. due to other jobs using all worker threads) or acquiring...
Add LUNodeEvacuationStrategy
Add a new opcode for node evacuation
We add this as a new opcode since we don't want to alter the behaviourof current opcodes/lus.
Implement support for mevac in OpTestAllocator
Implement IAllocator multi-evacuate mode
This is a new mode that requests a solution for the evacuation of multiple nodes. The external script will be fed a list of names, and is expected to return a list of [instance, new_node(s)] lists, detailing the evacuation path of each instance....
Accept both 'nodes' and 'result' from iallocator
This patch switches the default result key from 'nodes' to 'result'. The old name is still accepted for backwards-compatibility, and should be removed in later versions.
Change internal API for the IAllocator class
Currently the 'name' parameter in the constructor is required (as a non-keyword argument). Since the (to follow) node evac IAllocator mode doesn't have 'name' as a valid argument, we're moving this one into the...
Remove redundant code in IAllocator class
This moves the setting of the request member on the in_data, of the request type, and of the branching based on request type outside of individual functions and directly into the constructor.
Since the values we're using externally are identical to the...
bootstrap: Wait for node daemon when adding new node
Until now this was only done for the master node, though the problem originally fixed in 8f215968 also occurs for other node daemons.
Reset tempfile module after fork where useful
Move RunInSeparateProcess to ganeti.utils
This function could be useful in other places and this way we can easily unittest it.
Add function to reset tempfile module after fork
On fork, the tempfile module's pseudo random generator is not reset. If several processes (e.g. two children or parent and child) try to create a temporary file, they'll conflict. This function can be used to reset the name generator which...
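One way such a reset could look is sketched below. It relies on two private attributes of CPython's tempfile module (`_once_lock` and `_name_sequence`), which is an assumption about implementation details that may change between Python versions; the function name is hypothetical and the sketch simply does nothing if the attributes are missing.

```python
import tempfile


def reset_tempfile_module():
    """Drop tempfile's cached name generator so the next use builds a new one.

    Returns True if the reset was performed, False if the private attributes
    this sketch depends on are not available.
    """
    lock = getattr(tempfile, "_once_lock", None)
    if lock is None or not hasattr(tempfile, "_name_sequence"):
        return False
    with lock:
        # tempfile recreates the sequence lazily when it finds None here.
        tempfile._name_sequence = None
    return True
```

A child process would call this right after os.fork(), before creating any temporary file.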
Fix ssh host key checking with no-key-check
In case we add a node with “--no-ssh-key-check”, this should override any default yes/ask values in the system-wide (or user) ssh key check.
Currently this only works in batch mode, whereas in non-batch we only...
Simplify a bit _GetWantedNodes
This should have been done in the _ExpandNodeName patch.
Fix a wrong docstring
There's no such thing as OpProgrammerError (I found this as I wrote it in code in another place, and pylint complained).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Remove boiler-plate code about node/instance names
Currently we have lots of duplication of the error-checking (and proper exception raising) around node/instance name expansion. LUCreateInstance is the only place where we have abstracted this.
This patch creates two functions (ExpandNodeName and ExpandInstanceName)...
Merge remote branch 'origin/stable-2.1' into devel-2.1
Release all node locks during disk replace
This patch extends commit 7ea7bcf by releasing all node locks in disk replace for the early release mode. The rationale behind this is:
- LUCreateInstance already releases all node locks while waiting for disk synchronization, and does an instance startup later...
Unify a few re.compile calls in DRBD
These are both cleanups and, in the case of _MassageProcData, switching from a weaker RE to a stronger one (we now need cs: in the line, previously any line starting with \d+: was accepted).
Auto-enable early release for offline old nodes
In case the old node is offline, we won't be able to talk to it to remove the storage, and in most cases the node is powered off/unreachable.
In this case, it makes no sense to delay the storage release, so we...
Run instance hooks on more nodes
This should fix issue 68: some hooks should be run on more nodes than currently. GrowDisk runs on both nodes, remove runs the post hook on the instance's nodes, and failover and migrate run the post hook on the source node too....
Add {NEW,OLD}_{PRIMARY,SECONDARY} vars to hooks
Per issue 71, the migrate and failover need special variables for keeping the nodes consistent during instance migrations.
Pass debug mode to noded for OS-related calls
Add the options attribute to cli.JobExecutor
Implement generic CLI options->opcode updates
This patch changes SubmitOpCode and SubmitOrSend such that we have a single function that does generic CLI options to opcode attributes function. This will allow, once all scripts pass the opts argument to SubmitOpCode, to pass the debug parameter or the dry-run one to the LUs....
Change the debug CLI option to integer/count
This changes from boolean to integer/count (for a future differentiation based on the actual debug level). All the uses in the code only test its boolean status, so it still works as an integer value.
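A counted option of this kind can be sketched with the standard optparse module. The option spelling (-d/--debug) and the helper name `build_parser` are assumptions for the example, not taken from the patch.

```python
import optparse


def build_parser():
    """Build a parser whose debug option counts how often it was given."""
    parser = optparse.OptionParser()
    # action="count" increments the destination each time the flag appears;
    # default=0 keeps the value falsy when the flag is absent.
    parser.add_option("-d", "--debug", dest="debug", action="count",
                      default=0, help="Increase the debug level")
    return parser


if __name__ == "__main__":
    opts, _ = build_parser().parse_args()
    # Existing boolean-style checks keep working with the integer value.
    if opts.debug:
        print("debug level: %d" % opts.debug)
```

Code that only tests `if opts.debug:` behaves the same as with a boolean flag, while new code can compare against a specific level.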
Add a generic 'debug_level' attribute to opcodes
Also automatically fix opcodes which have this missing in the LU init routine.
Fix bug introduced in commit 413b747
While commit 413b747 fixed the issue of poll(2) returning too soon, it didn't work when the poll(2) call should've been blocking. This is now fixed and verified.
Fix dumpers/loaders after slots cleanup
Commit 154b958 changed (correctly) the slots usage, but this broke dumpers/loaders since we relied directly on the own class slots field.
To compensate, we introduce a simple function for computing the slots...
Fix locking bug causing high CPU usage
Iustin Pop noticed unusually high CPU usage with 2.1's master daemon, even with very simple opcodes like OP_TEST_DELAY. As it turns out, we inadvertently passed seconds as milliseconds to a call to poll(2). Due to the way the loop around the call...
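The unit mismatch is easy to reproduce in Python, where `select.poll().poll()` takes its timeout in milliseconds. The helpers below are a hypothetical sketch of doing the conversion in one place; they are not the code from the locking module.

```python
import select


def seconds_to_poll_timeout(seconds):
    """Convert a timeout in seconds to the milliseconds poll() expects.

    None means "block until an event arrives" and is passed through.
    """
    if seconds is None:
        return None
    return int(seconds * 1000)


def wait_for_event(fd, event_mask, timeout_seconds):
    """Wait for event_mask on fd, taking the timeout in seconds."""
    poller = select.poll()
    poller.register(fd, event_mask)
    # Passing the value unconverted would make a 2 s timeout last only 2 ms.
    return poller.poll(seconds_to_poll_timeout(timeout_seconds))
```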
Add an early release lock/storage for disk replace
This patch adds an early_release parameter in the OpReplaceDisks and OpEvacuateNode opcodes, allowing earlier release of storage and more importantly of internal Ganeti locks.
The behaviour of the early release is that any locks and storage on all...
TLReplaceDisks: Delay iallocator when evacuating node
When evacuating nodes, the iallocator was run for all instances without taking planned changes into consideration. This patch delays part of CheckPrereq and running the iallocator for node evacuation....
Implement debug level across OS-related RPC calls
This doesn't implement the full functionality, we need to add the debug level to the opcodes too, but at least won't require changing the RPC calls during the 2.1 series.
Second try to fix LUVerifyCluster
My previous patch, commit 785d142, fixed the case where a node is marked offline. With this patch it'll also handle other failures correctly.
LUVerifyCluster: Fix bug with offline nodes
[…]
* Other Notes
  - NOTICE: 1 offline node(s) found.
* Hooks Results
Failure: command execution error:
iteration over non-sequence
Commit a0c9776a introduced an error simulation mode to LUVerifyCluster. Due to a small mistake, offline nodes weren't skipped when checking the...
utils: Fix retry delay calculator
Before this patch, it would always sleep for at least the time specified as the upper limit. Now it actually limits the sleep time.
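A delay calculator with a proper upper bound could look roughly like this. The class name, its parameters (`start`, `factor`, `limit`) and the exponential growth are assumptions for illustration; the commit only states that the upper limit is now respected.

```python
class RetryDelayCalculator:
    """Return growing sleep times that never exceed an optional limit."""

    def __init__(self, start, factor, limit=None):
        assert start > 0.0 and factor >= 1.0
        assert limit is None or limit >= 0.0
        self._next = start
        self._factor = factor
        self._limit = limit

    def __call__(self):
        """Return the delay to sleep now and prepare the following one."""
        current = self._next
        grown = self._next * self._factor
        if self._limit is None:
            self._next = grown
        else:
            # min() caps the delay; max() here would turn the limit into a
            # lower bound and always sleep at least that long.
            self._next = min(self._limit, grown)
        return current
```

With start=1.0, factor=2.0 and limit=5.0 the successive delays grow until they reach the limit and then stay there.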
Bump RPC protocol version to 30
Make the snapshot decision based on disk type
… instead of disk size, which is not as reliable. This actually simplifies the code; but it still leaves the possibility of stack overflows if the disk data structure is corrupted.
Fix missing bridge for xen instances
Xen instance NIC definitions are missing the target bridge.
This bug was introduced in commit 503b97a9.
Signed-off-by: Alessandro Cincaglini <alessandro.ciancaglini@gmail.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Fix flipping MC flag bug
Currently unofflining or undraining an already functional master candidate node can cause it to demote itself. In order to avoid that, we only trigger the self-promotion check if the node is not currently a candidate.
Add capability to use syslog for logging
This patch adds a configure-time parameter that will set the defaults used by all programs, and command-line parameters in the daemons that allow overriding it.
Syslog 'yes' enables syslog in addition to file-based logging, 'only'...
utils.FileLock: handle init errors properly
If the open of the lock file fails (due to whatever reason), 'self' won't have the 'fd' attribute, and thus we fail in Close/__del__, which will ruin proper error reporting:
IOError: [Errno 30] Read-only file system: '/var/lib/ganeti/queue/lock'...
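The failure mode and one possible guard can be sketched as follows. This is a simplified, hypothetical class rather than the real utils.FileLock: the actual locking calls are omitted and only the initialisation and cleanup order is shown.

```python
class FileLock:
    """Simplified lock-file holder that cleans up safely after a failed open."""

    def __init__(self, filename):
        # Set the attribute before anything can fail, so close() and __del__
        # never hit a missing 'fd' and hide the original error.
        self.fd = None
        self.filename = filename
        self.fd = open(filename, "w")

    def close(self):
        """Close the lock file if it was opened; safe to call repeatedly."""
        if self.fd is not None:
            self.fd.close()
            self.fd = None

    def __del__(self):
        self.close()
```

If the open fails, the caller now sees the original IOError/OSError instead of a secondary AttributeError raised during cleanup.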
locking: add/fix @type information
This patch adds missing @type information for all public methods, modifies one to conform to the rest, and removes some information from @param when it's been expressed in @type.
Fix slots definitions
According to http://docs.python.org/reference/datamodel.html#slots
Merge branch 'devel-2.0' into devel-2.1
Conflicts: lib/backend.py - trivial merge...
Add a crude disable for DRBD barriers
Ideally we want to/will have per-device DRBD controls of disk/metadata flushes. In the meantime, we want at least a disable of the barrier functionality for cases where one has battery-backed caches.
Background: DRBD has four mechanisms of handling ordered disk-writes....
LURemoveNode safety in face of wrong node list
LURemoveNode runs under the BGL, which means we're guaranteed that the list of nodes as retrieved in CheckPrereq is still valid in BuildHooksEnv. However, we can make Ganeti handle failures in case the locking is broken (or the node list has been modified otherwise) easily,...
Fix an unsafe formatting bug
This might fix issue 84; in any case, the current situation is that we have a potentially unsafe formatting, which should be fixed.
Ensure all int/float conversions are handled right
int()/float() can raise either ValueError (in case of int("a")), or TypeError (in case of int(None)). We had many bugs over time due to this, and a recent one was just diagnosed, so we go over the codebase...
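The two failure cases can be handled together as in the sketch below. The helper `parse_int` and the `ParameterError` exception are hypothetical names used only for this example.

```python
class ParameterError(Exception):
    """Raised when a user-supplied value cannot be converted."""


def parse_int(value, name):
    """Convert value to int, reporting both kinds of conversion failure."""
    try:
        return int(value)
    except (TypeError, ValueError) as err:
        # int("a") raises ValueError, int(None) raises TypeError; catching
        # only one of them lets the other escape as an unexpected traceback.
        raise ParameterError("Invalid value %r for %s: %s" %
                             (value, name, err))
```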
KVM: fix pylint warning
Specify string format arguments as logging function parameters
Signed-off-by: Guido Trotter <ultrotter@google.com>
KVM: be more resilient on broken migration answers
Before, when doing kvm live migrations we used to accept an "unknown status" but to reject anything that didn't match our regexp. Since we've seen "info migrate" return a completely empty answer, we'll be more...
cli: Fix bug when not using headers
Commit 9fe72672 added code to not write spaces at the end of each line. Unfortunately it didn't work properly when not printing headers: there would still be spaces.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Switch the SplitKeyVal function to accept escapes
This tiny patch switches the SplitKeyVal function (and thus the command line options like -H, -B, etc.) to UnescapeAndSplit, thus allowing one to use escaped commas in the values of parameters.
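Splitting on a separator while honouring escapes can be sketched as below. The backslash as escape character and the function name `unescape_and_split` are assumptions for this illustration; the real UnescapeAndSplit may differ in its exact rules.

```python
def unescape_and_split(text, sep=","):
    """Split text on sep, treating a backslash-escaped character as literal."""
    parts = []
    current = []
    chars = iter(text)
    for char in chars:
        if char == "\\":
            # Take the following character literally (including the separator
            # or another backslash); a trailing backslash is kept as-is.
            current.append(next(chars, "\\"))
        elif char == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts
```

Under these assumptions, a value written as `a\,b` stays one element containing a comma, while an unescaped comma still separates elements.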
confd client: copy the peers in UpdatePeerList
Since the peer list is shuffled by the client, we don't keep a reference to the list which was passed in, but copy it internally.
Add an UnescapeAndSplit function
In many cases, where we accept (usually from the command line) a list of parameters, we remove the use of the separator as a component of any of the elements.
This patch adds a new function that can split strings of the form...
Generate hmac file with a newline at the end
This makes it slightly easier to cut&paste its content.
jqueue: Don't return negative number for unchecked jobs when archiving
When the queue was empty, the calculation for unchecked jobs while archiving would return -1. ``last_touched`` is set to 0, the job ID list (``all_job_ids``) is empty. Calculating ``len(all_job_ids) -...
cli.GenerateTable: Don't write EOL spaces
With this change, there won't be unnecessary space characters at the end of lines.
Improve logging for workerpool tasks by providing repr
Before it would log something like “starting task (<ganeti.http.client._HttpClientPendingRequest object at 0x2aaaad176790>,)”, which isn't really useful for debugging. Now it'll log “[…]<ganeti.http.client._HttpClientPendingRequest...
workerpool: Simplify log messages
workerpool: Use worker name as thread name
This way it shows up in debug logs.
workerpool: Make worker ID alphanumeric
Having a proper name instead of just a number makes debugging easier.
locking: Fix race condition in LockSet
This patch fixes a race condition when acquiring all locks in a LockSet instance. The list of lock names needs to be sorted to guarantee a consistent locking order, but the names were not sorted when acquiring all locks in the set....
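The principle of a consistent locking order can be illustrated with plain threading locks. This is a generic sketch, not the LockSet implementation; `acquire_all` and `release_all` are hypothetical helpers operating on a dict that maps names to locks.

```python
import threading


def acquire_all(locks):
    """Acquire every lock in the name -> lock mapping, in sorted name order.

    Two threads that both sort the names first can never hold locks in
    opposite orders, which is what makes deadlocks between them impossible.
    """
    acquired = []
    try:
        for name in sorted(locks):
            locks[name].acquire()
            acquired.append(name)
    except BaseException:
        # Undo partial acquisition before re-raising the original error.
        release_all(locks, acquired)
        raise
    return acquired


def release_all(locks, names):
    """Release the named locks in reverse acquisition order."""
    for name in reversed(names):
        locks[name].release()
```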
mcpu: Log lock status with sorted names
Reading and comparing sorted lists is easier when debugging locking problems.
locking: Append to list outside error handling block
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
locking: Don't fail in error handling if lock isn't owned
In case an exception was thrown while acquiring the lock, not necessarily all owned locks are also really acquired. Before this change, an exception could be masked by another exception thrown here. There is no good clean-up strategy...
Normalize MAC addresses to all lower.
This change will normalize the MAC to all lowercase after validation.
Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
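Validate-then-normalize can be sketched as follows. The regular expression (six colon-separated hex octets) and the function name `normalize_mac` are assumptions for the example and may not match the validation Ganeti actually performs.

```python
import re

# Six pairs of hex digits separated by colons, in either case.
_MAC_RE = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def normalize_mac(mac):
    """Validate a MAC address and return it in lowercase."""
    if not _MAC_RE.fullmatch(mac):
        raise ValueError("Invalid MAC address %r" % mac)
    # Lowercasing only after validation keeps error messages showing the
    # value exactly as the user typed it.
    return mac.lower()
```

Storing a single canonical form means later comparisons of MAC addresses do not need to be case-insensitive.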
Introduce a Luxi call for GetTags
This changes from submitting jobs to get the tags (in cli scripts) to queries, which (since the tags query is a cheap one) should be much faster.
The tags queries are already done without locks (in the generic query paths for instances/nodes/cluster), so this shouldn't break tags query...
LURenameCluster: run post hook on all nodes
Since the cluster name might be used for various purposes on nodes, we should let all nodes "know" about a cluster rename by running the post hook on all nodes. This will make cluster rename slightly slower/costlier, but it is not/shouldn't be an operation that is run...
Fix unused imports or add silences where needed
In some cases pylint doesn't parse the import correctly, so we add silences; but there are also many cases of unused imports, which we simply remove.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
bdev: Add a TODO and a pylint silence
A piece of old code in bdev.py uses a for loop over a single variable because we can 'break' out of the loop or exit on the 'else' path. This is not a nice usage of the for loop, it should be converted to a standard if...elif...else structure....
Further pylint disables, mostly for Unused args
Many of our functions have to follow a given API, and thus we have to keep a given signature, but pylint doesn't understand this. Therefore, we silence this warning.
The patch does a few other cleanups.
LUDiagnoseOS._DiagnoseByOS: remove unused arg
The node_list argument to _DiagnoseByOS is not used, and is obsoleted by the fact that the rlist argument already has the valid nodes as keys (assuming RPC behaviour didn't change). Thus, we remove it and silence...
hv_xen/_GetConfigFileDiskData: remove unused arg
The disk template is not needed, all that's used is the disk data. As such, remove this parameter from the function.
jqueue/_CheckRpcResult: log the whole operation
Currently only the rpc call, but not its description (which also shows the argument), is logged. We change this to log failmsg too, and this also silences a warning.