Support modify of prealloc_wipe_disks config value
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Export a node's group information in iallocator
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Rename node.nodegroup to node.group
In the context of a node, its group has (at least today) only one meaning, that is the node's node group. As such, we rename node.nodegroup to just node.group.
Note: if we want to keep node in there, it should be at least...
Rename --nodegroup to --node-group
For consistency with other CLI options.
Export node group data in iallocator
Split IAllocator._ComputeClusterData
The node and instance computations were all in this big function; we separate them out for more clarity.
Put the pieces together and invoke the wipe in cmdlib
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adding RPC call for blockdev_wipe
Second iteration over backend.BlockdevWipe
This patch now uses dd entirely to wipe the disk, making it much easier to wipe in blocks so we can give interactive feedback about the status.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>...
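As an illustration of the block-wise wipe described in "Second iteration over backend.BlockdevWipe", here is a minimal sketch of splitting a device into chunks and building one dd invocation per chunk, so progress can be reported between chunks. The function names, the 1 MiB block size and the chunk size are assumptions made for this example; this is not the actual backend.BlockdevWipe code, and the sketch only builds the command lines without running them.

```python
def wipe_chunks(total_mib, chunk_mib):
    """Yield (offset, size) pairs in MiB that together cover the device."""
    offset = 0
    while offset < total_mib:
        size = min(chunk_mib, total_mib - offset)
        yield offset, size
        offset += size


def dd_command(path, offset_mib, size_mib):
    """Build a dd command line that zeroes one chunk of the device.

    With bs=1M, seek= and count= are both expressed in MiB.
    """
    return ["dd", "if=/dev/zero", "of=%s" % path, "bs=1M",
            "seek=%d" % offset_mib, "count=%d" % size_mib, "conv=notrunc"]


if __name__ == "__main__":
    total = 2500  # hypothetical device size in MiB
    for offset, size in wipe_chunks(total, 1024):
        cmd = dd_command("/dev/example-vg/example-lv", offset, size)
        # A real implementation would run the command here and check its
        # exit status before reporting progress.
        done = offset + size
        print("%s -> %d%% done" % (" ".join(cmd), done * 100 // total))
```

Wiping chunk by chunk trades a few extra process launches for the ability to report status after each chunk.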
Simplify and extend the instance OS env
Some parameters were missing (uuid, c/mtime). We simplify the export method; unfortunately we cannot simply iterate over slots since the mapping is not 1:1.
Fix QA mixup of node/instance tests
There are two node tests that are run from RunCommonInstanceTests, which is the wrong place: it causes these node tests to be run three times instead of once.
ConfigWriter: prevent using a foreign config
If the configuration file doesn't denote this node as master, we prevent startup. This would have detected our previous race condition more easily, hence we add it as a permanent check.
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix bootstrap.MasterFailover race with watcher
This fixes a recently diagnosed race condition between master failover and the watcher.
Currently, the master failover first stops the master daemon, checks that the IP is no longer reachable, and then distributes the updated...
ConfigWriter: protect against multiple writers
This should fix the case where there are two masters that both try to distribute the configuration file to the cluster. The first one that does so will "win" the ownership of the config.data.
backend.Upload: switch to utils.SafeWriteFile
This allows serialization of updates to a given file, with respect to other cooperating writers.
Add a "safe" file wrapper over WriteFile
Add functions to read and compare file 'ID's
LUSetInstanceParams: Remove unused attribute
“os_new” is not used anywhere, removing it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adding backend method to wipe a block device
Allow to specify wipe command and flags at configure time
Fix typo introduced in 8d8c4ef
Commit 8d8c4ef broke instance reinstall with different OS, due to an attribute typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Adjust the error message of setup-ssh if join check fails
Fix clearing of the default iallocator
And also update the man page.
gnt-instance reinstall: Allow overriding OS parameters
This allows OS installation scripts to make use of special parameters, e.g. to retain some data on reinstallation.
The RAPI resource is not updated as it takes all parameters via the query string and encoding arbitrary data in a query string is tricky....
Add option to ignore offline node on instance start/stop
In some cases it can be useful to mark an instance as started or stopped while its primary node is offline. With this patch, a new option, “--ignore-offline”, is introduced to “gnt-instance start” and “… stop”....
utils: Add function to find items in dictionary using regex
This basically extracts a small piece of code from ganeti-rapi and puts it into a utility function. RAPI resources are found using a dictionary in which the keys can either be static strings or compiled regular...
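A minimal sketch of such a lookup, assuming a dictionary whose keys are either plain strings or compiled regular expressions. The function name, the return shape and the sample paths are hypothetical and chosen for this example; the actual utility in utils may differ.

```python
import re


def find_match(data, name):
    """Look up name in data, whose keys are strings or compiled regexes.

    Returns (value, groups) for the first matching key, or None.
    """
    # Exact string keys are checked first
    if name in data:
        return (data[name], [])

    for key, value in data.items():
        # Keys offering a match() method are treated as compiled regexes
        if hasattr(key, "match"):
            m = key.match(name)
            if m:
                return (value, list(m.groups()))

    return None


# Hypothetical resource table, for illustration only
resources = {
    "/version": "version-handler",
    re.compile(r"^/2/instances/([\w.-]+)$"): "instance-handler",
}
```

Returning the matched groups alongside the value lets the caller pass path components (here, the instance name) on to the handler.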
QA RAPI: Test HTTP 404 and 501
This tests the HTTP Not Found and Not Implemented errors.
QA: Add test for “gnt-node modify”
Let gnt-cluster support prealloc_wipe_disks
This includes a new option for gnt-cluster init and appropriate output on gnt-cluster info. gnt-cluster modify is not yet prepared, though.
Merge branch 'devel-2.2'
Bump version to 2.2.1, update NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
http.client: Disable SSL session ID cache
This patch disables the SSL session ID cache for all cURL operations. This is needed because http.HttpBase's PyOpenSSL implementation does not currently set a context using SSL_set_session_id_context(3SSL), cURL tries to re-use the session ID and, according to...
Crude workaround for pylint breakage
The way we currently call pylint, the exact order it inspects modules in lib/http/ depends on the filesystem order. This is not good, and if lib/http/server.py is loaded before lib/http/__init__.py, it will throw a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that...
http.auth: Fix docstring error
This was missing from commit 2287b920.
devnotes.rst: Remove hardcoded Python version
Merge branch 'stable-2.2'
Brown-bag fix for leftover comment
I forgot this in the original patch. Sorry!!!!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Rework QA interaction with the watcher
The interaction with cron-launched watcher is a well-known failure mode of QA:
---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
For the following tests it's recommended to turn off the ganeti-watcher cronjob....
Add a new watcher option --ignore-pause
During cluster maintenance, when the watcher is disabled, it's useful to run it just once. This is inconvenient to do currently, as the watcher needs to be unpaused, then run, then paused again.
This patch adds an option “--ignore-pause” that can be used to ignore...
Release 2.2.1~rc1
Merge branch 'stable-2.2' into devel-2.2
Fix compatibility with Pyinotify 0.8
I didn't know why the code previously used “pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from “pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them in “pyinotify”, not “pyinotify.EventsCodes”....
ganeti-rapi: Watch directory, not file for user file changes
We noticed several issues when just watching the file, among them race conditions upon replacing the file using rename(2) (the new watcher would be created too soon). By just watching the directory for events on...
Extract base class from SingleFileEventHandler
The base class can contain code useful to other inotify users. As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi, therefore it'll use its own small inotify handler class based on this base class....
http.auth.ReadPasswordFile: Don't read file directly
Reading the file before this function allows for better error reporting.
Move the parameter types to their own module
This is for cleanup, and for later reuse in other parts of the code (outside of LUs).
"Fix" handling of old software versions on startup
Currently, masterd startup with old software versions is very confusing for users: we present two tracebacks, with a message in the middle about "version mismatch". This can lead to users believing that all that needs...
Require aclocal 1.11.1 or above for devel/release
1.11.1 is the version in squeeze and lucid, and we know it works. We also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch and 1.9.6 in dapper. We haven't tested any other version.
With older versions python.m4 is buggy, and results in the package being...
Revert "Require aclocal 1.11.1 or above for autogen.sh"
The comparison is incorrect, and the check also breaks daily work on autobuilders and older distros.
This reverts commit dbc4dda7f5b66c9905c3cf6e44414536a5b38177.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Export more information via LUQueryInstances/RAPI
Currently, the custom instance parameters (hv, be, nicp) are only queryable via LUQueryInstanceData. LUQueryInstance returns only the filled parameters, thus its users (especially RAPI) have no way to know...
Add missing --units in gnt-instance list man page
Also fixes some wrapping issues, and one typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit f8409165b4e6d24bd160ee6c85ba432ae8afa117)...
Set list of trusted SSL CAs for client to verify
As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs advertised to SSL clients to include the server's own certificate. This evidently fixes the pycurl/gnutls RPC client.
During the TLS Handshake, when client verification is requested, the...
Require aclocal 1.11.1 or above for autogen.sh
Show instance state in instance console failures
The current message is not entirely clear, as it doesn't show the reason why the instance is not running.
Fix epydoc errors
And sorry!
jqueue: Fix bug when cancelling jobs
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unit test to check for this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
mcpu: Raise directly in _AcquireLocks
Removes code duplication.
jqueue/gnt-job: Add job priority fields for display
These fields can help with debugging.
jqueue: Resume jobs from “waitlock” status (2nd try)
Commit 5ef699a0e had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible.
Add prealloc_wipe_disks as a cluster-wide configuration variable
This is the first step for the support of wiping block devices prior to creation of the instance.
Conflicts: lib/rpc.py (trivial, copyright header)
RPC: disable curl's Expect header
This patch solves the very slow (~8-9 seconds) gnt-instance modify behaviour. Well, it solves in general the slow RPC behaviour, but it was most visible in that LU.
It seems that curl's behaviour with regard to file uploads (via PUT) and...
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Merge commit 'v2.2.0.1' into stable-2.2
Conflicts:
  NEWS - merge
  configure.ac - keep 2.2.1~rc0 version
Release Ganeti 2.2.0.1
2.2.0 was built with old autotools, and it's incompatible with Python 2.6. Rebuilding with a newer autotools version fixes this.
Change QA log output
Currently, the logging in QA doesn't show the duration of the various steps, and if it is needed one has to perform log manipulation. This patch changes the output so that the log information is line based (as opposed to block-based), such that it's easy to grep for all log lines:...
gnt-job cancel: Use non-zero exit status if canceling failed
This allows the use of “gnt-job cancel” in scripts.
jqueue, CancelJob: Check status only once per call
This simplifies the code a bit--the status is only checked once.
Bump version to 2.2.1~rc0
Also update NEWS.
Try again to fix the inter-cluster move QA test
This time, we re-establish the old pri/sec nodes correctly. Unfortunately this will now require at least a 3-node cluster for drbd instances, hence it's somewhat suboptimal, but… The other option would be to move it simply from p:s...
Fix a rare bug in StartDaemonChild and GenericMain
I've seen cases where the result from str(sys.exc_info()[1]) is ""; this breaks the error reporting as the parent relies on non-empty error messages to properly detect child status (otherwise it will try to read...
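To illustrate the problem: an exception raised without arguments stringifies to the empty string, so a parent that expects a non-empty message cannot tell failure from success. A minimal sketch of one possible guard follows; the function name and the fallback text are assumptions for this example, not the wording used by the actual fix.

```python
def format_child_error(err):
    """Return a message for err that is guaranteed to be non-empty."""
    msg = str(err)
    if not msg:
        # str() of an exception without arguments is "", so fall back
        # to naming the exception type instead
        msg = "Unknown error of type %s" % type(err).__name__
    return msg
```

For example, format_child_error(RuntimeError()) still yields a usable message, while exceptions that carry their own text are passed through unchanged.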
Enhance the error reporting
Since daemon startup errors will often be related to socket errors, it makes sense to change the original reporting:
Error when starting daemon process: "(98, 'Address already in use')"
Into:
Error when starting daemon process: 'Socket-related error: Address...
Convert ganeti daemons to the three-stage startup
This makes almost all of the daemons show error messages, and not return until they finished listening on the appropriate sockets.
Masterd is the only one "special", as it doesn't do enough initialization in the server creation, only later....
Change daemon.GenericMain/utils.Daemonize workflow
This patch copies the pipe-based error reporting functionality from utils.StartDaemon (I gave up for now on trying to merge the two).
This patch will fix two longstanding bugs:
- if we fork, we lose all error reporting from the child to the original...
Change utils.GenericMain protocol
Currently, GenericMain does a two-staged workflow:
- Check, before forking
- then Exec, after forking
This means we don't have any possibility to treat preparation work (before the daemon is ready for work) differently from the actual work....
Use only one version of WritePidFile
This patch merges the pid file handling used for ganeti-* daemons and impexp daemons. The latter version is used, since it's more reliable: it uses locked pid files as opposed to checking 'live' processes.
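The idea behind a locked pid file can be sketched as follows: the daemon takes an exclusive, non-blocking lock on the file and keeps the descriptor open for its whole lifetime, so a second instance fails to get the lock no matter which PIDs happen to be alive. This is a minimal sketch under those assumptions; the function name and error text are made up for the example and are not Ganeti's actual WritePidFile.

```python
import errno
import fcntl
import os


def write_locked_pid_file(path):
    """Lock path exclusively and write the current PID into it.

    Returns the open file descriptor; the caller must keep it open,
    since closing it releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock; fails if another process holds it
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as err:
        os.close(fd)
        if err.errno in (errno.EACCES, errno.EAGAIN):
            raise RuntimeError("PID file %s is locked by another process" %
                               path)
        raise

    # Only truncate once the lock is held, to avoid clobbering the PID
    # written by a running daemon
    os.ftruncate(fd, 0)
    os.write(fd, ("%d\n" % os.getpid()).encode("ascii"))
    return fd
```

Compared with reading a PID and checking whether such a process exists, the lock cannot be fooled by a stale file or by PID reuse, because the kernel drops it when the owning process exits.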
Abstract daemon file descriptor setup
This does some slight changes:
- Daemonize() doesn't explicitly close the file-descriptors anymore, but only implicitly via the usage of dup2
- StartDaemonChild uses separate devnull for stdin (rdonly) and stdout/stderr (wronly), or if using a log file, it uses it in append...
Abstract some daemon functionality
This patch abstracts the chdir/umask/setsid functionality, which is identical in the code functions, just that Daemonize did the chdir/umask in the second child; with this change it does it in the first, as StartDaemon....
Merge branch 'devel-2.2' into master
QA: Fix instance move tests
The instance move tests were moving the instance from node pair (A,_) to (B, A), and left it there. This patch makes sure that the first step moves the instance to (B,A) but the second one back to (A,B), so that the instance is left on the same primary node....
Add simple unittest for utils.CommaJoin
Export VG name via LUQueryConfigValues
This will be used by LUXI client programs to display the VG name.
gnt-cluster: Replace hardcoded “xenvg” with value retrieved from master
This fixes issue 125 (http://code.google.com/p/ganeti/issues/detail?id=125)
LUGetTags: Acquire locks in shared mode
Retrieving tags can be done while the lock is shared. Only writing needs to be exclusive.
Also add a FIXME for cluster tags, where the code currently doesn't use any locks except the config lock.
LUDelTags: Improve formatting of error message
Use utils.CommaJoin to add spaces after comma, clean up code a bit.
Before: Tag(s) 'bar','baz','foo','moo' not found
After: Tag(s) 'bar', 'baz', 'foo', 'moo' not found
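The difference between the two messages comes down to the separator used when joining the quoted tag names. A minimal sketch that reproduces both strings; comma_join here is a stand-in written for this example, and the real utils.CommaJoin may be implemented differently.

```python
def comma_join(names):
    """Join names with a comma followed by a space."""
    return ", ".join(str(name) for name in names)


tags = ["bar", "baz", "foo", "moo"]
quoted = ["'%s'" % tag for tag in tags]

# Old formatting: bare comma, no space
before = "Tag(s) %s not found" % ",".join(quoted)
# New formatting: comma plus space
after = "Tag(s) %s not found" % comma_join(quoted)
```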
cli: Move parsing of --net option to separate function
This function will also be used in tools/move-instance.
move-instance: Allow overriding instance parameters
When moving a single instance within the same cluster, the NIC is not allowed to re-use an existing MAC address. To avoid this, NIC parameters must be overridden. BE, HV, OS and NIC parameters can be overridden after applying this patch....
RAPI QA: Override MAC address when moving instance
This will make this test work again.
Clean up Ganeti 2.3 design document
- Typos
- Fix capitalization
- Fix quoting in some places
- Rewrite part of privilege separation section to match with subsection titles
QA: Enable all tests by default
This patch enables all tests by default, unless they're explicitly disabled in the config file. This will make sure newly added tests are run even when an old configuration file is used.
A comment is also added to qa-sample.json....
qa_config: Use ganeti.serializer for loading config
kvm: collapse two consecutive extend calls
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
kvm: Introduce support for -mem-path
Using hugepages, KVM instances can get a good performance boost. To activate that, we need to pass the -mem-path argument to KVM along with the mount point of the hugetlbfs file system on the node.
For the sake of memory availability computation, we use the -mem-prealloc...
Conflicts: lib/objects.py (trivial, strange that this one, and only this one, conflicted)
Rename the _oss cluster vars to _os
Per the mailing list discussion, rename _oss to _os, both in cluster parameters and in the rest of the code.
This is just an s/_oss/_os, with the exception of a small bit of cleanup around the helper_os function in cmdlib.py....