Iustin Pop [Wed, 27 Oct 2010 14:12:01 +0000 (16:12 +0200)]
Add support for vm_capable in cluster verify
The method to make vm_capable integrate easily into cluster verify is as follows:
- we add a new NV_VMNODES that represents *non*-vm-capable nodes
- the LU populates this list (it's expected that non-vm_capable nodes
are few compared to vm_capable nodes)
- backend skips the checks that are related to VM hosting
- in the LU, we reorder the VM-related checks so that they occur after
the non-VM (generic) tests, and we only execute them conditionally
Additionally, we add some support to the instance checks to detect
instances living on bad nodes.
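A minimal Python sketch of the scheme described above, not the actual LU or backend code: generic checks run for every node, while VM-related checks are skipped for nodes named in the non-vm-capable list. The names `verify_node`, `generic_checks` and `vm_checks`, and the check strings, are hypothetical and only illustrate the control flow.

```python
def verify_node(node, non_vm_nodes, generic_checks, vm_checks):
    """Run generic checks on every node and VM checks only on vm_capable ones.

    All names here are illustrative assumptions, not Ganeti's API.
    """
    # Generic (non-VM) tests come first and always run.
    results = [check(node) for check in generic_checks]
    # non_vm_nodes plays the role of the NV_VMNODES list: it names the
    # (expected to be few) nodes that are *not* vm_capable.
    if node not in non_vm_nodes:
        results.extend(check(node) for check in vm_checks)
    return results


generic = [lambda n: "%s: generic check done" % n]
vm_related = [lambda n: "%s: VM check done" % n]
non_vm = frozenset(["node3"])

report_vm = verify_node("node1", non_vm, generic, vm_related)
report_non_vm = verify_node("node3", non_vm, generic, vm_related)
```

Passing the exceptions (non-vm-capable nodes) rather than the full list keeps the payload small when, as the message expects, most nodes are vm_capable.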
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Oct 2010 12:43:32 +0000 (14:43 +0200)]
Add vm_capable to gnt-node modify
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Oct 2010 11:53:27 +0000 (13:53 +0200)]
Add vm_capable to LUSetNodeParams
And also do some cleanup: we only run the role changed actions if the
node has actually changed roles.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Oct 2010 11:30:16 +0000 (13:30 +0200)]
ConfigWriter: add some helper functions
This adds a helper to compute a node's instances easily, and a small
function to get all non-vm_capable nodes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 15:50:16 +0000 (17:50 +0200)]
LUClusterVerify: Complain if disk is marked faulty
This will show a warning if, for example, one side of a DRBD
disk becomes unavailable. The data is collected separately
from the other verification data.
Example output:
* Verifying instance status
- ERROR: instance inst1: disk/0 on node2 is faulty
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 12:32:14 +0000 (14:32 +0200)]
A number of Makefile fixes
- run-in-tempdir should depend on what it copies to the temporary dir
- Add PYTHON_BOOTSTRAP to BUILT_SOURCES
- Don't use “mkdir -p” directly
- Create directory if necessary for writing bootstrap script
In summary, this should make “make distcheck” in pristine
checkout work again.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 12:05:34 +0000 (14:05 +0200)]
Rebuild bash completion if client scripts change
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:41:47 +0000 (13:41 +0200)]
Move gnt-backup to ganeti.client.gnt_backup
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:37:48 +0000 (13:37 +0200)]
Move gnt-instance to ganeti.client.gnt_instance
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:33:28 +0000 (13:33 +0200)]
Move gnt-job to ganeti.client.gnt_job
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:30:44 +0000 (13:30 +0200)]
Move gnt-node to ganeti.client.gnt_node
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:27:32 +0000 (13:27 +0200)]
Move gnt-cluster to ganeti.client.gnt_cluster
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:52:14 +0000 (13:52 +0200)]
Python bootstrapper: hardcode /usr/bin/python
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:02:13 +0000 (13:02 +0200)]
Move gnt-os to ganeti.client.gnt_os
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 11:00:49 +0000 (13:00 +0200)]
Move gnt-debug to ganeti.client.gnt_debug
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Oct 2010 10:58:13 +0000 (12:58 +0200)]
Allow programs to be part of the Ganeti library
Eventually this will help ensuring that clients and servers are of the
same version, as long as they're imported from the same path. Currently
it's relatively easy for gnt-* and ganeti-* to be from a different
version.
Scripts will be at ganeti.client.gnt_* and a small bootstrap script
calls a “Main” function from the module.
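As a rough illustration of the bootstrap idea, the sketch below imports a module by name and calls its `Main` function. The helper name `run_main` and the stand-in module `demo_client` are assumptions made so the example runs without Ganeti installed; the real bootstrap script may look quite different.

```python
import importlib
import sys
import types


def run_main(module_name):
    """Import the named module and return the result of its Main() function."""
    module = importlib.import_module(module_name)
    return module.Main()


# Stand-in for a client module such as ganeti.client.gnt_node, registered
# by hand so that the sketch is self-contained (hypothetical module name).
demo = types.ModuleType("demo_client")
demo.Main = lambda: 0
sys.modules["demo_client"] = demo

exit_code = run_main("demo_client")
```

Because the script body lives in an importable module, the client code is resolved through the same import path as the rest of the library, which is what helps keep versions in step.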
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 26 Oct 2010 15:43:37 +0000 (17:43 +0200)]
Add master_capable to gnt-node modify
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Tue, 26 Oct 2010 15:36:35 +0000 (17:36 +0200)]
Implement the master_capable flag in node modify
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 26 Oct 2010 15:59:10 +0000 (17:59 +0200)]
Export the capability flags in query, rapi, ialloc
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 26 Oct 2010 12:05:56 +0000 (14:05 +0200)]
Add the master/vm_capable flags to objects
This adds the flag and some initial handling. The rest of the changes,
for cmdlib, come in a separate patch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 26 Oct 2010 13:33:36 +0000 (15:33 +0200)]
Rework node role changes
There have been many bugs in gnt-node modify. Let's try to introduce
some more.
This patch reworks the node role changes from tracking the flag changes
to completely overwriting the flags based on the new role. This paves
the way for (in 2.4 or later) moving to a single attribute for nodes.
We compute the old role and the new role based on the required changes
and whether we need to auto-promote. Once this is done, the body of the
Exec() function becomes trivial (there's more code related to output
formatting than the node flag changes).
Another advantage of the new version is that the entire flags are
overwritten, and that all are changed at the same time, making it
impossible (harder?) to have partial updates.
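A small sketch of the "overwrite all flags from the role" approach, under the assumption that a node's role is encoded by three boolean flags. The flag names, role names and helper functions below are illustrative, not taken from the patch.

```python
# Assumed flag names and role table; the real code may differ.
FLAG_NAMES = ("master_candidate", "drained", "offline")
ROLE_FLAGS = {
    "candidate": (True, False, False),
    "drained": (False, True, False),
    "offline": (False, False, True),
    "regular": (False, False, False),
}


def role_of(node):
    """Derive the role from the node's current flags."""
    flags = tuple(node[name] for name in FLAG_NAMES)
    for role, role_flags in ROLE_FLAGS.items():
        if role_flags == flags:
            return role
    raise ValueError("inconsistent node flags: %r" % (flags,))


def set_role(node, new_role):
    """Overwrite every flag at once, based only on the new role."""
    node.update(zip(FLAG_NAMES, ROLE_FLAGS[new_role]))
    return node


node = {"name": "node1", "master_candidate": True,
        "drained": False, "offline": False}
old_role = role_of(node)
set_role(node, "offline")
new_role = role_of(node)
```

Since `set_role` never looks at the old flags, a partially updated combination such as "candidate and offline" cannot be produced by it.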
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Balazs Lecz [Tue, 26 Oct 2010 18:17:34 +0000 (19:17 +0100)]
Minor language fixes to the 2.3 design doc.
Signed-off-by: Balazs Lecz <leczb@google.com>
[dato@google.com: extracted language fixes from bigger patch.]
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 26 Oct 2010 11:59:22 +0000 (13:59 +0200)]
Add documentation about the capability flags
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 26 Oct 2010 13:02:35 +0000 (15:02 +0200)]
Enable failure on warnings in epydoc
This causes epydoc to fail on any warning.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Oct 2010 13:02:01 +0000 (15:02 +0200)]
rpc: Work around epydoc warning
Aliasing the “threading” module allows us to avoid the “No information
available for ganeti.rpc._RpcThreadLocal's base threading.local” warning
by epydoc.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Tue, 26 Oct 2010 13:08:03 +0000 (15:08 +0200)]
Revert "Allow to specify wipe command and flags at configure time"
This reverts commit 6e991d0e64e36adf985d0512e4148bcd6a160c6a.
Conflicts:
lib/constants.py (this got already removed, so no changes in here)
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 26 Oct 2010 12:30:35 +0000 (14:30 +0200)]
Merge branch 'devel-2.2'
* devel-2.2:
Allow remote imports without checked names
ConfigWriter: Fix typo in error message parts
Fix remote imports
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Oct 2010 16:37:52 +0000 (18:37 +0200)]
Allow remote imports without checked names
By default all names are checked (LUCreateInstance, name_check). In some
cases it can be useful to disable this check, but doing so was not
allowed for remote imports. One should be aware, however, that using
this feature can lead to rename script failures when importing a remote
instance without the proper name, e.g.:
“Failed to run rename script for inst1 on node node3.example.com: OS
rename script failed (exited with exit code 1), last lines in the log
file:\nCannot rename from inst2.example.com to inst1:\nInstance has a
different hostname (inst2)”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 26 Oct 2010 11:46:36 +0000 (13:46 +0200)]
Update NEWS
This adds my recent changes for support of wiping disks prior to
allocation as a new feature to the NEWS file.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 26 Oct 2010 11:13:33 +0000 (13:13 +0200)]
Support modify of prealloc_wipe_disks config value
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 25 Oct 2010 12:08:40 +0000 (14:08 +0200)]
Export a node's group information in iallocator
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 25 Oct 2010 11:21:00 +0000 (13:21 +0200)]
Rename node.nodegroup to node.group
In the context of a node, its group has (at least today) only one
meaning, that is the node's node group. As such, we rename
node.nodegroup to just node.group.
Note: if we want to keep node in there, it should be at least
node_group, for consistency with the other node attributes.
Similarly, we rename the OpAddNode nodegroup attribute to group.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 25 Oct 2010 11:19:44 +0000 (13:19 +0200)]
Rename --nodegroup to --node-group
For consistency with other CLI options.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 25 Oct 2010 11:11:08 +0000 (13:11 +0200)]
Export node group data in iallocator
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 25 Oct 2010 11:00:00 +0000 (13:00 +0200)]
Split IAllocator._ComputeClusterData
The node and instance computations were all in this big function; we
separate them out for more clarity.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Mon, 25 Oct 2010 14:40:37 +0000 (16:40 +0200)]
Putting the pieces together and invoke the wipe in cmdlib
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Mon, 25 Oct 2010 14:32:31 +0000 (16:32 +0200)]
Adding RPC call for blockdev_wipe
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Mon, 25 Oct 2010 14:30:34 +0000 (16:30 +0200)]
Second iteration over backend.BlockdevWipe
This patch now uses dd entirely to wipe the disk, making it
much easier to wipe in blocks so we can give interactive feedback
about the status.
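To illustrate wiping in blocks, here is a hedged sketch that only builds one dd command line per chunk together with a progress percentage; it does not run anything. The function name, the chunk sizes and the device path are assumptions, and the real backend code may pass different dd options.

```python
def wipe_commands(path, size_mib, chunk_mib):
    """Yield (percent_done, argv) pairs, one dd invocation per chunk.

    Hypothetical helper: sizes are in MiB and dd is given a 1 MiB block size,
    so "seek" and "count" are expressed in MiB as well.
    """
    offset = 0
    while offset < size_mib:
        count = min(chunk_mib, size_mib - offset)
        argv = ["dd", "if=/dev/zero", "of=%s" % path, "bs=1048576",
                "seek=%d" % offset, "count=%d" % count]
        offset += count
        # Progress after this chunk, usable for interactive feedback.
        yield (100 * offset // size_mib, argv)


# Example with a made-up 250 MiB device wiped in 100 MiB chunks.
steps = list(wipe_commands("/dev/example/disk0", 250, 100))
```

Running each command in turn and reporting the percentage between calls is one way to get the status feedback mentioned above.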
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Oct 2010 16:07:39 +0000 (18:07 +0200)]
ConfigWriter: Fix typo in error message parts
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 25 Oct 2010 10:06:20 +0000 (12:06 +0200)]
Simplify and extend the instance OS env
Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 14:53:29 +0000 (16:53 +0200)]
Fix QA mixup of node/instance tests
There are two node tests that are run from RunCommonInstanceTests, which is the
wrong place: it causes these node tests to be run three times instead of once.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 13:54:52 +0000 (15:54 +0200)]
ConfigWriter: prevent using a foreign config
If the configuration file doesn't denote this node as master, we prevent
startup. This would have detected our previous race condition more
easily, hence we add it as a permanent check.
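The check can be pictured roughly as follows. This is a sketch under the assumption that the configuration is a nested mapping with a `master_node` entry; the exception class and function name are hypothetical.

```python
class ForeignConfigError(Exception):
    """Raised when the configuration names another node as master."""


def check_config_owner(config, my_hostname):
    """Refuse to continue if the config does not denote this node as master."""
    # Assumed layout: config["cluster"]["master_node"] holds the master's name.
    master = config["cluster"]["master_node"]
    if master != my_hostname:
        raise ForeignConfigError(
            "config says master is %s, but this node is %s"
            % (master, my_hostname))


config = {"cluster": {"master_node": "node1.example.com"}}
check_config_owner(config, "node1.example.com")  # accepted

try:
    check_config_owner(config, "node2.example.com")
    refused = False
except ForeignConfigError:
    refused = True
```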
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 13:40:29 +0000 (15:40 +0200)]
Fix bootstrap.MasterFailover race with watcher
This fixes a recently diagnosed race condition between master failover
and the watcher.
Currently, the master failover first stops the master daemon, checks
that the IP is no longer reachable, and then distributes the updated
configuration. Between the stop and the distribution, it can happen that
the watcher starts the master daemon on the old node again, since ssconf
still points the master to it (and all nodes vote so).
In even weirder cases, the master daemon starts and before it manages
to open the configuration file, it is updated, which means the master
will respond to QueryClusterInfo with another node as the real master.
This patch reorders the actions during master failover:
- first, we redistribute a fixed config; this means the old master will
refuse to update its own config file and ssconf, and that most jobs
that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be
unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.
I've tested the new version against concurrent launch of the watcher;
while my tests are not very exhaustive, two things can happen: the watcher
sees the daemons as dead and tries to restart them, which also fails; or
it simply gets an error while reading from the master daemon. Both of these
should be OK.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 12:33:14 +0000 (14:33 +0200)]
ConfigWriter: protect against multiple writers
This should fix the case where there are two masters that both try to
distribute the configuration file to the cluster. The first one that does so
will "win" the ownership of config.data.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 12:40:13 +0000 (14:40 +0200)]
backend.Upload: switch to utils.SafeWriteFile
This allows serialization of updates to a given file, with respect to
other cooperating writers.
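A rough sketch of how such serialization between cooperating writers could work, assuming a file "ID" is the (device, inode, mtime) triple from `os.stat`. The function names are illustrative and this is not the actual `utils.SafeWriteFile` code; in particular, the sketch does no locking, so the check and the write are not atomic here.

```python
import os
import tempfile


def get_file_id(path):
    """Return an identifier that changes whenever the file is replaced."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def safe_write(path, expected_id, data):
    """Replace path with data only if its ID still equals expected_id."""
    if get_file_id(path) != expected_id:
        raise RuntimeError("%s was changed by another writer" % path)
    tmp = path + ".new"
    with open(tmp, "w") as handle:
        handle.write(data)
    os.replace(tmp, path)  # atomic rename; the file gets a new inode
    return get_file_id(path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "config.data")
with open(target, "w") as handle:
    handle.write("v1")

first_id = get_file_id(target)
second_id = safe_write(target, first_id, "v2")

# A writer still holding the stale ID is rejected.
try:
    safe_write(target, first_id, "v3")
    stale_rejected = False
except RuntimeError:
    stale_rejected = True
```

A writer that remembers the ID from its last read or write will therefore notice when somebody else has replaced the file in the meantime.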
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 12:29:47 +0000 (14:29 +0200)]
Add a "safe" file wrapper over WriteFile
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 22 Oct 2010 12:00:35 +0000 (14:00 +0200)]
Add functions to read and compare file 'ID's
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 20 Oct 2010 18:10:21 +0000 (20:10 +0200)]
LUSetInstanceParams: Remove unused attribute
“os_new” is not used anywhere, removing it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 21 Oct 2010 12:19:19 +0000 (14:19 +0200)]
Adding backend method to wipe a block device
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 21 Oct 2010 12:51:51 +0000 (14:51 +0200)]
Allow to specify wipe command and flags at configure time
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 22 Oct 2010 07:59:03 +0000 (09:59 +0200)]
Fix remote imports
A simple typo…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 22 Oct 2010 08:12:39 +0000 (10:12 +0200)]
Fix typo introduced in 8d8c4ef
Commit 8d8c4ef broke instance reinstall with different OS, due to an
attribute typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Thu, 21 Oct 2010 13:01:32 +0000 (15:01 +0200)]
Adjust the error message of setup-ssh if join check fails
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 21 Oct 2010 10:28:33 +0000 (12:28 +0200)]
Fix clearing of the default iallocator
And also update the man page.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 20 Oct 2010 18:13:49 +0000 (20:13 +0200)]
gnt-instance reinstall: Allow overriding OS parameters
This allows OS installation scripts to make use of special parameters,
e.g. to retain some data on reinstallation.
The RAPI resource is not updated as it takes all parameters via the
query string and encoding arbitrary data in a query string is tricky.
The resource will need to be changed to use the POST body instead.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 20 Oct 2010 12:51:53 +0000 (14:51 +0200)]
Add option to ignore offline node on instance start/stop
In some cases it can be useful to mark an instance as started
or stopped while its primary node is offline. With this patch,
a new option, “--ignore-offline”, is introduced to “gnt-instance
start” and “… stop”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 19 Oct 2010 14:47:16 +0000 (16:47 +0200)]
utils: Add function to find items in dictionary using regex
This basically extracts a small piece of code from ganeti-rapi and puts
it into a utility function. RAPI resources are found using a dictionary
in which the keys can either be static strings or compiled regular
expressions. This might be handy in other places, hence extracting it
and adding unittests.
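The lookup described above might be sketched like this. The function name `find_match`, its return shape and the example routes are assumptions for illustration, not the signature of the utility function that was added.

```python
import re


def find_match(table, name):
    """Return (value, groups) for the first key matching name, else None.

    Keys may be plain strings (exact match) or compiled regular expressions.
    """
    if name in table:
        return (table[name], [])
    for key, value in table.items():
        # Compiled patterns expose .match(); plain strings were handled above.
        if hasattr(key, "match"):
            match = key.match(name)
            if match:
                return (value, list(match.groups()))
    return None


# Hypothetical resource table mixing a static key and a regex key.
routes = {
    "/2/info": "cluster-info",
    re.compile(r"^/2/instances/([\w.-]+)$"): "instance",
}

static_hit = find_match(routes, "/2/info")
regex_hit = find_match(routes, "/2/instances/inst1.example.com")
miss = find_match(routes, "/nope")
```

Returning the captured groups alongside the value lets the caller pass path components (here the instance name) on to the matched handler.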
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 19 Oct 2010 16:14:00 +0000 (18:14 +0200)]
QA RAPI: Test HTTP 404 and 501
This tests the HTTP Not Found and Not Implemented errors.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 20 Oct 2010 12:04:32 +0000 (14:04 +0200)]
QA: Add test for “gnt-node modify”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 12 Oct 2010 11:39:43 +0000 (13:39 +0200)]
Let gnt-cluster support prealloc_wipe_disks
This includes a new option for gnt-cluster init and appropriate output
in gnt-cluster info. gnt-cluster modify does not support it yet, though.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 19 Oct 2010 11:39:11 +0000 (13:39 +0200)]
Merge branch 'devel-2.2'
* devel-2.2:
Bump version to 2.2.1, update NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 19 Oct 2010 11:26:40 +0000 (13:26 +0200)]
Bump version to 2.2.1, update NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 15 Oct 2010 15:29:12 +0000 (17:29 +0200)]
Merge branch 'devel-2.2'
* devel-2.2:
http.client: Disable SSL session ID cache
Crude workaround for pylint breakage
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Apollon Oikonomopoulos [Fri, 15 Oct 2010 05:55:59 +0000 (08:55 +0300)]
http.client: Disable SSL session ID cache
This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL), cURL
tries to re-use the session ID and, according to
SSL_set_session_id_context(3SSL):
If the session id context is not set on an SSL/TLS server and client
certificates are used, stored sessions will not be reused but a fatal
error will be flagged and the handshake will fail.
Ideally, session caching should be either controlled, or disabled in
HttpBase, however PyOpenSSL does not seem to implement
SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Apollon Oikonomopoulos [Fri, 15 Oct 2010 05:55:59 +0000 (08:55 +0300)]
http.client: Disable SSL session ID cache
This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL), cURL
tries to re-use the session ID and, according to
SSL_set_session_id_context(3SSL):
If the session id context is not set on an SSL/TLS server and client
certificates are used, stored sessions will not be reused but a fatal
error will be flagged and the handshake will fail.
Ideally, session caching should be either controlled, or disabled in
HttpBase, however PyOpenSSL does not seem to implement
SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 15 Oct 2010 14:48:26 +0000 (16:48 +0200)]
Crude workaround for pylint breakage
The way we currently call pylint, the exact order in which it inspects modules in
lib/http/ depends on the filesystem order. This is not good, and if
lib/http/server.py is loaded before lib/http/__init__.py, it will throw
a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that
class is used in server.py).
For the short-term fix, we just add server.py after "ganeti", so that
it gets parsed (again?) and pylint sees the usage of the class.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 15 Oct 2010 14:19:54 +0000 (16:19 +0200)]
http.auth: Fix docstring error
This was missing from commit 2287b920.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 15 Oct 2010 14:19:08 +0000 (16:19 +0200)]
devnotes.rst: Remove hardcoded Python version
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 14 Oct 2010 12:29:24 +0000 (14:29 +0200)]
Merge branch 'stable-2.2'
* stable-2.2:
Release 2.2.1~rc1
Require aclocal 1.11.1 or above for devel/release
Revert "Require aclocal 1.11.1 or above for autogen.sh"
Add missing --units in gnt-instance list man page
Set list of trusted SSL CAs for client to verify
Require aclocal 1.11.1 or above for autogen.sh
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 14 Oct 2010 12:20:46 +0000 (14:20 +0200)]
Brown-bag fix for leftover comment
I forgot this in the original patch. Sorry!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Thu, 14 Oct 2010 09:40:37 +0000 (11:40 +0200)]
Rework QA interaction with the watcher
The interaction with cron-launched watcher is a well-known failure mode of QA:
---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
For the following tests it's recommended to turn off the ganeti-watcher cronjob.
---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
…
Error: Domain 'instance1' does not exist.
Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
-oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
2010-10-13 23:55:04,479: pid=1659 ganeti-watcher:626
ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher
In order to fix this, we disable the watcher during these tests, and
re-enable it afterwards. To protect against watcher being disabled, we
enable it unconditionally at the start of the QA (we do want it enabled,
in order to see the interaction between the watcher and many
creation/disk replace jobs, etc.).
Note: even after this patch, if a cron-watcher was started and is still
running during the test, we'll have locking issues. I think for now this
is OK, we'll have to see how often that happens.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 14 Oct 2010 08:44:22 +0000 (10:44 +0200)]
Add a new watcher option --ignore-pause
During cluster maintenance, when the watcher is disabled, it's useful to
run it just once. This is inconvenient to do currently, as the watcher
needs to be unpaused, then run, then paused again.
This patch adds an option “--ignore-pause” that can be used to ignore
the cluster-level setting. Also the man page is updated as it was
missing the options available.
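A minimal sketch of how such a flag could be wired up with `optparse`. Only the option name “--ignore-pause” comes from the message above; the helper names and the `cluster_paused` argument are assumptions for illustration.

```python
import optparse


def parse_args(argv):
    """Parse the (hypothetical subset of) watcher command line options."""
    parser = optparse.OptionParser()
    parser.add_option("--ignore-pause", dest="ignore_pause", default=False,
                      action="store_true",
                      help="Ignore the cluster-level pause setting")
    options, _ = parser.parse_args(argv)
    return options


def should_run(options, cluster_paused):
    """Run unless the cluster is paused and the pause is not being ignored."""
    return options.ignore_pause or not cluster_paused


paused_default = should_run(parse_args([]), cluster_paused=True)
paused_ignored = should_run(parse_args(["--ignore-pause"]), cluster_paused=True)
unpaused_default = should_run(parse_args([]), cluster_paused=False)
```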
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 14 Oct 2010 10:58:06 +0000 (12:58 +0200)]
Release 2.2.1~rc1
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 14 Oct 2010 10:34:57 +0000 (12:34 +0200)]
Merge branch 'stable-2.2' into devel-2.2
* stable-2.2:
Require aclocal 1.11.1 or above for devel/release
Revert "Require aclocal 1.11.1 or above for autogen.sh"
Set list of trusted SSL CAs for client to verify
Require aclocal 1.11.1 or above for autogen.sh
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 13 Oct 2010 13:13:44 +0000 (15:13 +0200)]
Fix compatibility with Pyinotify 0.8
I didn't know why the code previously used
“pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from
“pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them
in “pyinotify”, not “pyinotify.EventsCodes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 13 Oct 2010 10:55:45 +0000 (12:55 +0200)]
ganeti-rapi: Watch directory, not file for user file changes
We noticed several issues when just watching the file, among them race
conditions upon replacing the file using rename(2) (the new watcher
would be created too soon). By just watching the directory for events on
the rapi_users file, this can be avoided.
A nice side-effect is that now the users file is also reloaded if it
didn't exist upon ganeti-rapi's start (see the documentation update).
Since ganeti-rapi now becomes active for virtually every change in the
configuration directory (…/lib/ganeti), moving the rapi_users file to a
separate directory will be considered. It doesn't have to happen in or
before this patch, though.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 13 Oct 2010 10:43:27 +0000 (12:43 +0200)]
Extract base class from SingleFileEventHandler
The base class can contain code useful to other inotify users.
As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi,
therefore it'll use its own small inotify handler class based
on this base class.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 11 Oct 2010 12:11:19 +0000 (14:11 +0200)]
http.auth.ReadPasswordFile: Don't read file directly
Reading the file before this function allows for better error
reporting.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 12 Oct 2010 09:39:51 +0000 (11:39 +0200)]
Move the parameter types to their own module
This is for cleanup, and for later reuse in other parts of the code
(outside of LUs).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 13 Oct 2010 10:29:26 +0000 (12:29 +0200)]
"Fix" handling of old software versions on startup
Currently, masterd startup with old software versions is very confusing
for users: we present two tracebacks, with a message in the middle about
"version mismatch". This can lead to users believing that all that needs
to be done is to fix the config file.
This patch attempts to improve this by handling this case in masterd
itself (not in the child), and showing a more friendly message for this
case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 13 Oct 2010 10:44:55 +0000 (11:44 +0100)]
Require aclocal 1.11.1 or above for devel/release
1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.
With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).
Version comparison is done component-by-component, over a bash array.
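The patch itself does this in bash; the following Python sketch only illustrates the same component-by-component idea and why a plain string comparison would not do. The function name is hypothetical.

```python
def version_at_least(found, required):
    """Compare dotted versions numerically, component by component."""
    found_parts = [int(part) for part in found.split(".")]
    required_parts = [int(part) for part in required.split(".")]
    # Pad the shorter version with zeros so "1.10" compares like "1.10.0".
    width = max(len(found_parts), len(required_parts))
    found_parts += [0] * (width - len(found_parts))
    required_parts += [0] * (width - len(required_parts))
    return found_parts >= required_parts


checks = dict((version, version_at_least(version, "1.11.1"))
              for version in ("1.11.1", "1.10.1", "1.10", "1.9.6", "1.12"))
```

Note that as strings "1.9.6" sorts after "1.11.1", which is exactly the kind of mistake a numeric, per-component comparison avoids.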
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 13 Oct 2010 10:09:45 +0000 (11:09 +0100)]
Revert "Require aclocal 1.11.1 or above for autogen.sh"
The comparison is incorrect, and the check also breaks daily work on
autobuilders and older distros.
This reverts commit dbc4dda7f5b66c9905c3cf6e44414536a5b38177.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 12 Oct 2010 15:33:53 +0000 (17:33 +0200)]
Export more information via LUQueryInstances/RAPI
Currently, the custom instance parameters (hv, be, nicp) are only
queryable via LUQueryInstanceData. LUQueryInstance returns only the
filled parameters, thus its users (especially RAPI) have no way to know
if a parameter is custom or the default value.
This patch adds three new parameters: custom_hvparams, custom_beparams,
custom_nicparams, that are also exported in RAPI.
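To make the distinction concrete, here is a sketch of "filled" versus "custom" parameters, assuming filling simply means overlaying the instance's own values on the cluster defaults. The helper name and the parameter values are made up for illustration.

```python
def fill_params(defaults, custom):
    """Return the defaults overlaid with the instance's custom values."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


# Hypothetical cluster defaults and per-instance overrides.
cluster_beparams = {"memory": 128, "vcpus": 1}
instance_custom = {"memory": 512}

exported = {
    "beparams": fill_params(cluster_beparams, instance_custom),
    "custom_beparams": instance_custom,
}

# With both dictionaries available, a client can tell the two cases apart.
memory_is_custom = "memory" in exported["custom_beparams"]
vcpus_is_custom = "vcpus" in exported["custom_beparams"]
```

With only the filled dictionary, a client sees `vcpus` as 1 but cannot know whether that is an explicit setting or the inherited default.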
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 12 Oct 2010 15:50:04 +0000 (17:50 +0200)]
Add missing --units in gnt-instance list man page
Also fixes some wrapping issues, and one typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit f8409165b4e6d24bd160ee6c85ba432ae8afa117)
Conflicts:
man/gnt-instance.sgml (re-wrapped)
Apollon Oikonomopoulos [Tue, 12 Oct 2010 15:08:06 +0000 (18:08 +0300)]
Set list of trusted SSL CAs for client to verify
As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs
advertised to SSL clients to include the server's own certificate. This
evidently fixes the pycurl/gnutls RPC client.
During the TLS Handshake, when client verification is requested, the
Server sends a CertificateRequest message which states that the client
should send a valid certificate as a response. The CertificateRequest
message contains a section called "certificate_authorities", which,
according to the standard, is a list of the Distinguished Names (DNs) of
acceptable certification authorities. The client uses this list to send
a certificate signed by one of the acceptable CAs.
Under OpenSSL's server implementation, this list must be set manually
using some appropriate call, otherwise the list is empty. TLS 1.0[1]
does not state whether the list may be left blank, whereas TLS 1.1[2]
and 1.2[3] state that in case the list is blank, then the client *may*
send any certificate of a valid type (valid types are specified
elsewhere in the handshake).
OpenSSL clients seem to obey the behaviour specified in TLS 1.1+,
whereas at least curl+GnuTLS does not send any certificates if the list
is empty (which is not wrong per the spec, but also evidently not
configurable).
[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
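The two client behaviours described above can be modelled with a toy sketch. This is not a TLS implementation and uses no SSL library; the function `choose_certificate`, the flag `send_any_if_empty` and the DN strings are all assumptions made for illustration.

```python
def choose_certificate(acceptable_cas, client_certs, send_any_if_empty):
    """Pick a client certificate for a CertificateRequest (toy model).

    acceptable_cas: DNs from the "certificate_authorities" section.
    client_certs: list of (subject_dn, issuer_dn) tuples held by the client.
    send_any_if_empty: True models the TLS 1.1+ "may send any certificate"
    behaviour; False models a client that sends nothing for an empty list.
    """
    if not acceptable_cas:
        if send_any_if_empty and client_certs:
            return client_certs[0]
        return None
    for cert in client_certs:
        if cert[1] in acceptable_cas:
            return cert
    return None


# Hypothetical self-signed certificate: subject and issuer are the same DN.
cert = ("CN=example-cluster", "CN=example-cluster")

# Empty list: the outcome depends on the client's policy.
print(choose_certificate([], [cert], send_any_if_empty=True))
print(choose_certificate([], [cert], send_any_if_empty=False))

# List populated by the server: both policies send the certificate.
print(choose_certificate(["CN=example-cluster"], [cert], send_any_if_empty=False))
```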
Guido Trotter [Tue, 12 Oct 2010 15:30:50 +0000 (16:30 +0100)]
Require aclocal 1.11.1 or above for autogen.sh
1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.
With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).
The autogen.sh interpreter is changed to bash, as we need to use the [[
builtin to compare versions with "<". [ doesn't have that functionality,
and we can't of course rely on dpkg, which won't be installed on all
distributions.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 12 Oct 2010 14:54:14 +0000 (16:54 +0200)]
Show instance state in instance console failures
The current message is not entirely clear, as it doesn't show the reason
why the instance is not running.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 12 Oct 2010 13:28:58 +0000 (15:28 +0200)]
Fix epydoc errors
And sorry!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 8 Oct 2010 14:03:17 +0000 (16:03 +0200)]
jqueue: Fix bug when cancelling jobs
If a job was cancelled while it was waiting for locks, an assertion
would've failed. This patch fixes the problem and provides a unit
test to check for this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 8 Oct 2010 14:01:32 +0000 (16:01 +0200)]
jqueue: Resume jobs from “waitlock” status (2nd try)
Commit 5ef699a0e had to roll back an earlier attempt at implementing
this. With the improved job queue processor, this is finally possible.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 8 Oct 2010 13:59:30 +0000 (15:59 +0200)]
jqueue/gnt-job: Add job priority fields for display
These fields can help with debugging.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 8 Oct 2010 13:57:36 +0000 (15:57 +0200)]
mcpu: Raise directly in _AcquireLocks
Removes code duplication.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 12 Oct 2010 09:04:23 +0000 (11:04 +0200)]
Add prealloc_wipe_disks as a cluster-wide configuration variable
This is the first step for the support of wiping block devices prior
to creation of the instance.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 11 Oct 2010 13:16:15 +0000 (15:16 +0200)]
Merge branch 'devel-2.2'
* devel-2.2:
RPC: disable curl's Expect header
Conflicts:
lib/rpc.py (trivial, copyright header)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 11 Oct 2010 12:31:09 +0000 (14:31 +0200)]
RPC: disable curl's Expect header
This patch solves the very slow (~8-9 seconds) gnt-instance modify
behaviour. More generally, it fixes slow RPC calls, but the slowness was
most visible in that LU.
It seems that curl's behaviour with regard to file uploads (via PUT) and
the 'Expect' header are interacting badly with our http server.
First, our http server doesn't properly handle this header. According to
RFC 2616:
Requirements for HTTP/1.1 origin servers: Upon receiving a request
which includes an Expect request-header field with the "100-continue"
expectation, an origin server MUST either respond with 100 (Continue)
status and continue to read from the input stream, or respond with a
final status code.
Our server doesn't do this, and hence it triggers this behaviour in curl
(from the curl FAQ):
4.16 My HTTP POST or PUT requests are slow!
libcurl makes all POST and PUT requests (except for POST requests with a
very tiny request body) use the "Expect: 100-continue" header. This header
allows the server to deny the operation early so that libcurl can bail out
already before having to send any data. This is useful in authentication
cases and others.
However, many servers don't implement the Expect: stuff properly and if the
server doesn't respond (positively) within 1 second libcurl will continue
and send off the data anyway.
You can disable libcurl's use of the Expect: header the same way you disable
any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.
This behaviour was detected by watching the captured traffic (in non-SSL
mode), where after the initial HTTP headers (ending with the Expect
one), there was a ~1-2 second pause before curl sent the body.
Properly RTFM-ing would have saved ~1 day of digging around, but hey…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
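The header-based workaround quoted from the curl FAQ can be sketched as follows. libcurl drops one of its internally generated headers when the same header name is supplied with an empty value, so the sketch builds such a header list. The helper `without_expect` and the Content-Type value are hypothetical, and this is not necessarily how lib/rpc.py applies the change.

```python
def without_expect(headers):
    """Return a copy of `headers` that tells libcurl not to send Expect."""
    # Drop any existing Expect entry, then add the empty-valued form that
    # libcurl interprets as "do not send this header".
    cleaned = [h for h in headers if not h.lower().startswith("expect:")]
    cleaned.append("Expect:")
    return cleaned


headers = without_expect(["Content-Type: application/json"])
print(headers)

# With pycurl available, the list would be applied to a handle roughly as:
#   curl.setopt(pycurl.HTTPHEADER, headers)
```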
Guido Trotter [Fri, 8 Oct 2010 17:46:05 +0000 (18:46 +0100)]
Merge branch 'devel-2.2'
* devel-2.2:
Release Ganeti 2.2.0.1
Bump version to 2.2.1~rc0
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Guido Trotter [Fri, 8 Oct 2010 17:31:29 +0000 (18:31 +0100)]
Merge commit 'v2.2.0.1' into stable-2.2
* commit 'v2.2.0.1':
Release Ganeti 2.2.0.1
Conflicts:
NEWS
- merge
configure.ac
- keep 2.2.1~rc0 version
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Guido Trotter [Fri, 8 Oct 2010 16:55:40 +0000 (17:55 +0100)]
Release Ganeti 2.2.0.1
2.2.0 was built with old autotools, and it's incompatible with Python
2.6. Rebuilding with a newer autotools version fixes this.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Iustin Pop [Fri, 8 Oct 2010 12:40:30 +0000 (14:40 +0200)]
Change QA log output
Currently, the logging in QA doesn't show the duration of the various
steps, and if it is needed one has to perform log manipulation. This
patch changes the output so that the log information is line based (as
opposed to block-based), such that it's easy to grep for all log lines:
./qa/ganeti-qa.py --yes-do-it qa.json 2>&1|grep ^----
---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------
or just for the duration of the steps:
./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json 2>&1|grep ^----.*time=
---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info ------
This will help with identifying slow steps or even graphing the QA
duration.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
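As an illustration of how the line-based format helps with finding slow steps, the following sketch parses the "time=" lines shown above. The regular expression and the function `parse_durations` are assumptions derived from the sample output, not part of the QA suite.

```python
import re
from datetime import timedelta

# Matches e.g. "---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection"
LINE_RE = re.compile(
    r"^---- \S+ \S+ time=(\d+):(\d{2}):(\d{2})\.(\d{6}) (.*?)[ -]*$")


def parse_durations(lines):
    """Return (step name, duration) pairs for every "time=" log line."""
    result = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # "start" lines and unrelated output are skipped
        hours, minutes, seconds, micros = (int(g) for g in match.groups()[:4])
        duration = timedelta(hours=hours, minutes=minutes,
                             seconds=seconds, microseconds=micros)
        result.append((match.group(5), duration))
    return result


sample = [
    "---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------",
    "---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection",
    "---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands",
    "---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------",
]

steps = parse_durations(sample)
slowest = max(steps, key=lambda item: item[1])
print(slowest[0], slowest[1].total_seconds())
```

The trailing padding dashes are stripped from the step name, so a step whose name itself ends in a dash would be shortened by this pattern.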
Michael Hanselmann [Thu, 7 Oct 2010 14:58:41 +0000 (16:58 +0200)]
gnt-job cancel: Use non-zero exit status if canceling failed
This allows the use of “gnt-job cancel” in scripts.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
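A script can then branch on the exit status. The sketch below uses a hypothetical helper `run_ok` and stands in for the real command with a Python child process so that it runs anywhere; on a cluster the argument list would be something like ["gnt-job", "cancel", job_id].

```python
import subprocess
import sys


def run_ok(argv):
    """Run a command and report whether it exited with status zero."""
    return subprocess.call(argv) == 0


# Stand-ins for a failing and a succeeding command.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
succeeding = [sys.executable, "-c", "import sys; sys.exit(0)"]

for argv in (failing, succeeding):
    if run_ok(argv):
        print("command succeeded")
    else:
        print("command failed")
```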