ConfigWriter: prevent using a foreign config
If the configuration file doesn't denote this node as master, we prevent startup. This would have detected our previous race condition more easily, hence we add it as a permanent check.
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix bootstrap.MasterFailover race with watcher
This fixes a recently diagnosed race condition between master failover and the watcher.
Currently, the master failover first stops the master daemon, checks that the IP is no longer reachable, and then distributes the updated...
ConfigWriter: protect against multiple writers
This should fix the case where there are two masters that both try to distribute the configuration file to the cluster. The first one that does so will "win" the ownership of the config.data.
backend.Upload: switch to utils.SafeWriteFile
This allows serialization of updates to a given file, with respect to other cooperating writers.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add a "safe" file wrapper over WriteFile
Add functions to read and compare file 'ID's
LUSetInstanceParams: Remove unused attribute
“os_new” is not used anywhere, removing it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adding backend method to wipe a block device
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Allow to specify wipe command and flags at configure time
Fix typo introduced in 8d8c4ef
Commit 8d8c4ef broke instance reinstall with different OS, due to an attribute typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix clearing of the default iallocator
And also update the man page.
gnt-instance reinstall: Allow overriding OS parameters
This allows OS installation scripts to make use of special parameters, e.g. to retain some data on reinstallation.
The RAPI resource is not updated as it takes all parameters via the query string and encoding arbitrary data in a query string is tricky....
Add option to ignore offline node on instance start/stop
In some cases it can be useful to mark an instance as started or stopped while its primary node is offline. With this patch, a new option, “--ignore-offline”, is introduced to “gnt-instance start” and “… stop”....
utils: Add function to find items in dictionary using regex
This basically extracts a small piece of code from ganeti-rapi and puts it into a utility function. RAPI resources are found using a dictionary in which the keys can either be static strings or compiled regular...
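A minimal sketch of such a lookup, assuming a dictionary whose keys are either plain strings or compiled regular expressions. The name `find_match`, the return shape and the sample paths are hypothetical and not taken from the patch:

```python
import re


def find_match(data, name):
    """Look up name in data, whose keys are strings or compiled regexes.

    Returns (value, groups) on success and None when nothing matches.
    This is an illustrative helper, not the function added by the commit.
    """
    # Exact string keys take precedence over pattern keys.
    if name in data:
        return (data[name], [])

    for key, value in data.items():
        # Compiled patterns expose a match() method; plain strings do not.
        if hasattr(key, "match"):
            m = key.match(name)
            if m:
                return (value, list(m.groups()))

    return None


resources = {
    "/version": "version-handler",
    re.compile(r"^/2/instances/([\w.-]+)$"): "instance-handler",
}
```

With this table, `find_match(resources, "/2/instances/inst1.example.com")` yields the instance handler together with the captured instance name.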
Let gnt-cluster support prealloc_wipe_disks
This includes a new option for gnt-cluster init and appropriate output in gnt-cluster info. gnt-cluster modify is not yet prepared, though.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
http.client: Disable SSL session ID cache
This patch disables the SSL session ID cache for all cURL operations. This is needed because http.HttpBase's PyOpenSSL implementation does not currently set a context using SSL_set_session_id_context(3SSL), cURL tries to re-use the session ID and, according to...
http.auth: Fix docstring error
This was missing from commit 2287b920.
Merge branch 'stable-2.2'
Merge branch 'stable-2.2' into devel-2.2
Fix compatibility with Pyinotify 0.8
I didn't know why the code previously used “pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from “pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them in “pyinotify”, not “pyinotify.EventsCodes”....
Extract base class from SingleFileEventHandler
The base class can contain code useful to other inotify users. As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi, therefore it'll use its own small inotify handler class based on this base class....
http.auth.ReadPasswordFile: Don't read file directly
Reading the file before this function allows for better error reporting.
Move the parameter types to their own module
This is for cleanup, and for later reuse in other parts of the code (outside of LUs).
"Fix" handling of old software versions on startup
Currently, masterd startup with old software versions is very confusing for users: we present two tracebacks, with a message in the middle about "version mismatch". This can lead to users believing that all that needs...
Export more information via LUQueryInstances/RAPI
Currently, the custom instance parameters (hv, be, nicp) are only queryable via LUQueryInstanceData. LUQueryInstance returns only the filled parameters, thus its users (especially RAPI) have no way to know...
Set list of trusted SSL CAs for client to verify
As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs advertised to SSL clients to include the server's own certificate. This evidently fixes the pycurl/gnutls RPC client.
During the TLS Handshake, when client verification is requested, the...
Show instance state in instance console failures
The current message is not entirely clear, as it doesn't show the reason why the instance is not running.
Fix epydoc errors
And sorry!
jqueue: Fix bug when cancelling jobs
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unittest to check for this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
mcpu: Raise directly in _AcquireLocks
Removes code duplication.
jqueue/gnt-job: Add job priority fields for display
These fields can help with debugging.
jqueue: Resume jobs from “waitlock” status (2nd try)
Commit 5ef699a0e had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible.
Add prealloc_wipe_disks as a cluster-wide configuration variable
This is the first step for the support of wiping block devices prior to creation of the instance.
Merge branch 'devel-2.2'
Conflicts: lib/rpc.py (trivial, copyright header)
RPC: disable curl's Expect header
This patch solves the very slow (~8-9 seconds) gnt-instance modify behaviour. Well, it solves in general the slow RPC behaviour, but it was most visible in that LU.
It seems that curl's behaviour with regard to file uploads (via PUT) and...
jqueue, CancelJob: Check status only once per call
This simplifies the code a bit--the status is only checked once.
Fix a rare bug in StartDaemonChild and GenericMain
I've seen cases where the result from str(sys.exc_info()[1]) is ""; this breaks the error reporting as the parent relies on non-empty error messages to properly detect child status (otherwise it will try to read...
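The failure mode is easy to reproduce: an exception raised without arguments converts to an empty string. A minimal sketch of a guard against that, assuming a fallback text of our own choosing (the helper name `describe_error` and the wording of the fallback are hypothetical and not what the patch uses):

```python
def describe_error(err):
    """Return a message for err that is never empty.

    Illustrative only: the fallback wording is an assumption.
    """
    msg = str(err)
    if not msg:
        # str(Exception()) is "", so fall back to the exception type name.
        msg = "%s (no message)" % type(err).__name__
    return msg
```

A parent process that treats an empty message as "no error" can then rely on always receiving some text when the child failed.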
Enhance the error reporting
Since daemon startup errors will often be related to socket errors, it makes sense to change the original reporting:
Error when starting daemon process: "(98, 'Address already in use')"
Into:
Error when starting daemon process: 'Socket-related error: Address...
Change daemon.GenericMain/utils.Daemonize workflow
This patch copies the pipe-based error reporting functionality from utils.StartDaemon (I gave up for now on trying to merge the two).
This patch will fix two longstanding bugs:
- if we fork, we lose all error reporting from the child to the original...
Change utils.GenericMain protocol
Currently, GenericMain does a two-staged workflow:
- Check, before forking
- then Exec, after forking
This means we don't have any possibility to treat preparation work (before the daemon is ready for work) different from the actual work....
Use only one version of WritePidFile
This patch merges the pid file handling used for ganeti-* daemons and impexp daemons. The latter version is used, since it's more reliable: it uses locked pid files as opposed to checking 'live' processes.
Abstract daemon file descriptor setup
This does some slight changes:
- Daemonize() doesn't explicitly close the file-descriptors anymore, but only implicitly via the usage of dup2
- StartDaemonChild uses separate devnull for stdin (rdonly) and stdout/stderr (wronly), or if using a log file, it uses it in append...
Abstract some daemon functionality
This patch abstracts the chdir/umask/setsid functionality, which is identical in both functions, except that Daemonize did the chdir/umask in the second child; with this change it does it in the first, as StartDaemon....
Export VG name via LUQueryConfigValues
This will be used by LUXI client programs to display the VG name.
LUGetTags: Acquire locks in shared mode
Retrieving tags can be done while the lock is shared. Only writing needs to be exclusive.
Also add a FIXME for cluster tags, where the code currently doesn't use any locks except the config lock.
LUDelTags: Improve formatting of error message
Use utils.CommaJoin to add spaces after comma, clean up code a bit.
Before: Tag(s) 'bar','baz','foo','moo' not found
After: Tag(s) 'bar', 'baz', 'foo', 'moo' not found
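The formatting difference comes down to the separator used when joining the quoted tag names. A small sketch, assuming a stand-in `comma_join` helper with the behaviour described for utils.CommaJoin (both function names here are hypothetical):

```python
def comma_join(names):
    """Join items with a comma followed by a space."""
    return ", ".join(str(name) for name in names)


def format_missing_tags(tags):
    """Build the 'not found' message for a collection of tag names."""
    quoted = ["'%s'" % tag for tag in sorted(tags)]
    return "Tag(s) %s not found" % comma_join(quoted)
```

Calling `format_missing_tags(["foo", "bar", "baz", "moo"])` produces the "After" form of the message.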
cli: Move parsing of --net option to separate function
This function will also be used in tools/move-instance.
kvm: collapse two consecutive extend calls
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
kvm: Introduce support for -mem-path
Using hugepages, KVM instances can get a good performance boost. To activate that, we need to pass the -mem-path argument to KVM along with the mount point of the hugetlbfs file system on the node.
For the sake of memory availability computation, we use the -mem-prealloc...
Conflicts: lib/objects.py (trivial, strange that this one, and only this one, conflicted)
Rename the _oss cluster vars to _os
Per the mailing list discussion, rename _oss to _os, both in cluster parameters and in the rest of the code.
This is just an s/_oss/_os, with the exception of a small bit of cleanup around the helper_os function in cmdlib.py....
KVM: Add function to check the hypervisor version
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix instance rename regression from 3fe11ba3
Commit 3fe11ba3 broke the instance rename as we don't use the FQDN anymore. This fixes it.
Sort OS names and variants in LUDiagnoseOS
The OS list and variants as returned from LUDiagnoseOS are not sorted, and gnt-instance reinstall doesn't sort them either. This means that the menu that users are presented with is inconsistent across clusters, and that is confusing....
Change behaviour of OpDiagnoseOS w.r.t. 'valid'
This patch changes the behaviour of OpDiagnoseOS with regards to the 'valid' field to be similar to the one for the hidden/blacklisted fields: unless this field is requested, invalid OSes are filtered out....
Allow gnt-os modify to change the new OS params
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add two more _T-type tests
These are useful for more in-depth checking of some kinds of arguments.
Add blacklisted/hidden OS support in LUDiagnoseOS
This changes the behaviour of LUDiagnoseOS significantly.
The addition of hidden/blacklisted OSes would mean that each user-facing client would have to intentionally filter such OSes from display, which is not a good choice. Rather, the patch makes LUDiagnoseOS not return...
Restrict blacklisted OSes in instance installation
Add two new cluster settings
The new variables are:
- a list of hidden OSes, that should not be displayed to the users in interactive selection (e.g. reinstall); however, if they are selected, they can be used
- a list of OSes that should be hidden and blocked from install-time selection...
Abstract OS name/variant functions
Currently, the computation of the 'pure' name or the variant is hardcoded and spread around the functions that need it. This is not nice, and in the future we'd spread it even more with more usage of variants/pure os names....
Avoid nodegroup name/uuid conflicts
Forbid nodegroups to be called with a name that matches the UUID regular expression. Uppercase versions are forbidden as well.
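A sketch of such a check, assuming the usual 8-4-4-4-12 hexadecimal UUID layout. The pattern, the function name `check_group_name` and the error text are illustrative and are not taken from the patch:

```python
import re

# Hypothetical pattern for lowercase UUIDs in the 8-4-4-4-12 layout.
UUID_RE = re.compile(
    r"^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$")


def check_group_name(name):
    """Reject group names that could be mistaken for a UUID.

    Lowercasing before matching also rejects uppercase variants.
    """
    if UUID_RE.match(name.lower()):
        raise ValueError("Group name '%s' looks like a UUID" % name)
    return name
```

Rejecting such names keeps a lookup by name and a lookup by UUID from ever resolving to different groups.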
Move the uuid regexp to utils.py
Fix docstring typo in jqueue._JobProcessor._MarkWaitlock
epydoc complained: “File …/ganeti/jqueue.py, line 886, in ganeti.jqueue._JobProcessor._MarkWaitlock Warning: Redefinition of type for job”
jqueue: Use priority for acquiring locks
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
mcpu: Implement priority for lock acquiring
Until now the priority for lock acquires couldn't be passed when running opcodes.
locking: Implement priority in Ganeti lock manager
locking: Don't set default priority as keyword default
This allows users of these classes to simply pass None if they want to use the default value (the actual default is an internal constant), instead of dynamically constructing the keyword arguments.
jqueue: Use timeout when acquiring locks
As already noted in the design document, an opcode's priority is increased when the lock(s) can't be acquired within a certain amount of time, except at the highest priority, where in such a case a blocking acquire is used....
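The escalation rule can be sketched as a pure function that decides the parameters of the next acquire attempt. Everything named here is hypothetical: the constant `PRIO_HIGHEST`, the convention that a lower number means a higher priority, the step size of one, and the use of `None` to stand for a blocking acquire are assumptions made for illustration:

```python
# Assumption: lower numbers mean higher priority, with -20 as the ceiling.
PRIO_HIGHEST = -20


def next_attempt(priority, timeout):
    """Return (priority, timeout) to use after a timed-out lock acquire.

    The priority is raised by one step; once the highest priority is
    reached, the timeout becomes None, meaning a blocking acquire.
    """
    new_priority = max(priority - 1, PRIO_HIGHEST)
    if new_priority == PRIO_HIGHEST:
        return (new_priority, None)
    return (new_priority, timeout)
```

Keeping the decision separate from the actual lock call makes the rule testable without holding any real locks.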
Migrate call from backend._GetVGInfo to bdev.LogicalVolume.GetVGInfo
This patch removes duplicate code found in backend which also needs to get VG infos. To make it simpler we moved to bdev.LogicalVolume.GetVGInfo.
Signed-off-by: René Nussbaumer <rn@google.com>...
jqueue: Introduce per-opcode context object
This is better to group per-opcode data.
mcpu: Adjust lock acquire strategy
The changes to job queue processing require some changes on this class' interface. LockAttemptTimeoutStrategy might move to another place, but that'll be done in a later patch.
mcpu.Processor: Raise exception on lock acquire timeout
Right now the timeout is not passed by any caller, making the code effectively go back to blocking acquires. Since the timeout is always None, no caller needs to be changed in this patch.
This change also means that any LUXI query handled by ganeti-masterd...
jqueue: Rename current_op to better reflect what it actually is
jqueue: Separate function for in-memory variables
jqueue: Add unittest for _QueuedJob.CalcStatus
Use free space in vg instead of biggest free pv space for a snapshot
Even for snapshots we looked at the biggest free pv space: even though the vg might have fit the snapshot, we aborted if one of the pvs was too small. This patch fixes this by looking at the vg size instead of the pv...
Merge branch 'devel-2.1' into devel-2.2
Fix mac checker regex
Currently, the mac checker regex could match a corner case of 11:22:33:44:55:66: (one extra colon at the end). We fix this, and we also move the regex compilation outside of this function, at module level.
Ignore failures while shutting down instances during failover from offline node
Don't abort failover if instance shutdown doesn't work on a node marked offline. The node is offline, so the instances living on it are too. Before, you had to use --ignore-consistency to achieve that....
cli: Expose priority option and pass priority to master daemon
jqueue: Change model from per-job to per-opcode processing
In order to support priorities, the processing of jobs needs to be changed. Instead of processing jobs as a whole, the code is changed to process one opcode at a time and then return to the queue. See the...
jqueue: Use priority for worker pool
A small helper function is added to make this easier. Priorities are not yet used in all necessary places.
Fix migration on new KVMs
New KVMs (0.12.1.2-el6 and 0.13.5 tested) exit immediately after an unsuccessful network connection when they are in "-incoming" mode. The simple check netutils.TcpPing causes the remote kvm to exit, so the migration will always fail. This check is also redundant by the way as if the...
cli: Pass options in {Add,Remove}Tags
They'll be used for job priorities. Also add an empty line to gnt-os where it's missing.
jqueue: Add missing docstring to _QueuedJob.Cancel
This was forgotten in commit 099b2870b.
Bail out if daemon gets fired up under wrong uid
This patch bails out at an early stage if the user invoking the daemon doesn't match the user set at configure time.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
opcode summary: improve display for list summaries
Currently, opcodes like NODE_EVAC_STRATEGY look bad:
89684 error NODE_EVAC_STRATEGY([u'node3'])
With this patch, we try to render list arguments a little bit better:
89684 error NODE_EVAC_STRATEGY(node3)...
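A minimal sketch of this kind of rendering, assuming list and tuple arguments are joined with commas. The function name `format_summary` and the choice of separator are hypothetical:

```python
def format_summary(op_id, value):
    """Render an opcode summary, flattening list-like arguments.

    Illustrative only: the separator for multiple items is an assumption.
    """
    if isinstance(value, (list, tuple)):
        value = ",".join(str(item) for item in value)
    return "%s(%s)" % (op_id, value)
```

For a single-element list this gives `NODE_EVAC_STRATEGY(node3)` rather than the raw list representation.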
workerpool: Fix typo
A call to logging.debug was missing an argument, leading to complaints on stderr at runtime.
jqueue: Move CancelJob logic to separate function
Moving the internals of this function will allow it to be used from unittests in the future. Splitting this into a pure, side-effect free function and an impure one makes the pure function easily testable....
(no conflicts, took LGTM from original commit)
cmdlib: Fix type of “name” parameter for tag operations
The parameter “name” is None for cluster tags.
rlib2: Set tag operation param “name” to None for cluster tags
Otherwise parameter verification in the master daemon fails.
Stop all daemons as a precaution before trying to start ganeti-noded again
Please note that if the pid file is broken or missing we'll not catch the process (if any is running) and it's up to the user to fix this state