ganeti-local
Michael Hanselmann [Wed, 27 Oct 2010 12:05:34 +0000 (14:05 +0200)]
Rebuild bash completion if client scripts change

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:41:47 +0000 (13:41 +0200)]
Move gnt-backup to ganeti.client.gnt_backup

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:37:48 +0000 (13:37 +0200)]
Move gnt-instance to ganeti.client.gnt_instance

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:33:28 +0000 (13:33 +0200)]
Move gnt-job to ganeti.client.gnt_job

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:30:44 +0000 (13:30 +0200)]
Move gnt-node to ganeti.client.gnt_node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:27:32 +0000 (13:27 +0200)]
Move gnt-cluster to ganeti.client.gnt_cluster

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:52:14 +0000 (13:52 +0200)]
Python bootstrapper: hardcode /usr/bin/python

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:02:13 +0000 (13:02 +0200)]
Move gnt-os to ganeti.client.gnt_os

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 11:00:49 +0000 (13:00 +0200)]
Move gnt-debug to ganeti.client.gnt_debug

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Oct 2010 10:58:13 +0000 (12:58 +0200)]
Allow programs to be part of the Ganeti library

Eventually this will help ensure that clients and servers are of the
same version, as long as they're imported from the same path. Currently
it's relatively easy for gnt-* and ganeti-* to be from different
versions.

Scripts will be at ganeti.client.gnt_* and a small bootstrap script
calls a “Main” function from the module.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
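
The bootstrap idea described above can be sketched roughly as follows. This is only an illustration: it assumes the program name maps to a module by replacing the hyphen with an underscore, as the ganeti.client.gnt_* naming suggests. The helper names module_name_for and run_program are hypothetical; only the “Main” entry point is named in the commit message.

```python
import importlib
import os.path


def module_name_for(program_path):
    """Map a script path such as /usr/sbin/gnt-node to a module name.

    Assumption: the module lives under ganeti.client and uses the
    program name with "-" replaced by "_".
    """
    prog = os.path.basename(program_path)
    return "ganeti.client." + prog.replace("-", "_")


def run_program(program_path, importer=importlib.import_module):
    """Import the module for the given program and call its Main function.

    The importer argument exists only so the sketch can be exercised
    without Ganeti being installed.
    """
    module = importer(module_name_for(program_path))
    return module.Main()
```

A bootstrap script would then end with something like sys.exit(run_program(sys.argv[0])), so that client and library code are always loaded from the same import path.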

Iustin Pop [Tue, 26 Oct 2010 15:43:37 +0000 (17:43 +0200)]
Add master_capab to gnt-node modify

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Iustin Pop [Tue, 26 Oct 2010 15:36:35 +0000 (17:36 +0200)]
Implement the master_capable flag in node modify

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 26 Oct 2010 15:59:10 +0000 (17:59 +0200)]
Export the capability flags in query, rapi, ialloc

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 26 Oct 2010 12:05:56 +0000 (14:05 +0200)]
Add the master/vm_capable flags to objects

This adds the flag and some initial handling. The rest of the changes,
for cmdlib, come in a separate patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Tue, 26 Oct 2010 13:33:36 +0000 (15:33 +0200)]
Rework node role changes

There have been many bugs in gnt-node modify. Let's try to introduce
some more.

This patch reworks the node role changes from tracking the flag changes
to completely overwriting the flags based on the new role. This paves
the way for (in 2.4 or later) moving to a single attribute for nodes.

We compute the old role and the new role based on the required changes
and whether we need to auto-promote. Once this is done, the body of the
Exec() function becomes trivial (there's more code related to output
formatting than the node flag changes).

Another advantage of the new version is that all flags are
overwritten at the same time, making partial updates impossible
(or at least harder).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
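
A minimal sketch of the "overwrite all flags from the new role" approach, under stated assumptions: the three flags (master_candidate, drained, offline) are treated as mutually exclusive, and the role constants, dictionaries and function names below are illustrative, not the ones used in cmdlib.

```python
# Hypothetical role constants, for illustration only
ROLE_CANDIDATE, ROLE_DRAINED, ROLE_OFFLINE, ROLE_REGULAR = range(4)

# (master_candidate, drained, offline) -> role
FLAGS_TO_ROLE = {
    (True, False, False): ROLE_CANDIDATE,
    (False, True, False): ROLE_DRAINED,
    (False, False, True): ROLE_OFFLINE,
    (False, False, False): ROLE_REGULAR,
}
ROLE_TO_FLAGS = dict((role, flags) for flags, role in FLAGS_TO_ROLE.items())


def role_of(node):
    """Compute the current role from the node's three flags."""
    key = (node["master_candidate"], node["drained"], node["offline"])
    return FLAGS_TO_ROLE[key]


def apply_role(node, new_role):
    """Overwrite all three flags at once, based only on the new role."""
    candidate, drained, offline = ROLE_TO_FLAGS[new_role]
    node.update(master_candidate=candidate, drained=drained, offline=offline)
    return node
```

Because every flag is rewritten from the role table, a node cannot end up with a combination that no role maps to.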

Balazs Lecz [Tue, 26 Oct 2010 18:17:34 +0000 (19:17 +0100)]
Minor language fixes to the 2.3 design doc.

Signed-off-by: Balazs Lecz <leczb@google.com>
[dato@google.com: extracted language fixes from bigger patch.]
Signed-off-by: Adeodato Simo <dato@google.com>

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 26 Oct 2010 11:59:22 +0000 (13:59 +0200)]
Add documentation about the capability flags

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 26 Oct 2010 13:02:35 +0000 (15:02 +0200)]
Enable failure on warnings in epydoc

This causes epydoc to fail on any warning.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Oct 2010 13:02:01 +0000 (15:02 +0200)]
rpc: Work around epydoc warning

Aliasing the “threading” module allows us to avoid the “No information
available for ganeti.rpc._RpcThreadLocal's base threading.local” warning
by epydoc.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

René Nussbaumer [Tue, 26 Oct 2010 13:08:03 +0000 (15:08 +0200)]
Revert "Allow to specify wipe command and flags at configure time"

This reverts commit 6e991d0e64e36adf985d0512e4148bcd6a160c6a.

Conflicts:

lib/constants.py (this was already removed, so no changes here)

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 26 Oct 2010 12:30:35 +0000 (14:30 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Allow remote imports without checked names
  ConfigWriter: Fix typo in error message parts
  Fix remote imports

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 25 Oct 2010 16:37:52 +0000 (18:37 +0200)]
Allow remote imports without checked names

By default all names are checked (LUCreateInstance, name_check). In some
cases it can be useful to disable this check, but doing so was not
allowed for remote imports. One should be aware, however, that using
this feature can lead to rename script failures when importing a remote
instance without the proper name, e.g.:

“Failed to run rename script for inst1 on node node3.example.com: OS
rename script failed (exited with exit code 1), last lines in the log
file:\nCannot rename from inst2.example.com to inst1:\nInstance has a
different hostname (inst2)”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Tue, 26 Oct 2010 11:46:36 +0000 (13:46 +0200)]
Update NEWS

This adds my recent changes for wiping disks prior to allocation
to the NEWS file as a new feature.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Tue, 26 Oct 2010 11:13:33 +0000 (13:13 +0200)]
Support modify of prealloc_wipe_disks config value

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Mon, 25 Oct 2010 12:08:40 +0000 (14:08 +0200)]
Export a node's group information in iallocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 25 Oct 2010 11:21:00 +0000 (13:21 +0200)]
Rename node.nodegroup to node.group

In the context of a node, its group has (at least today) only one
meaning, that is the node's node group. As such, we rename
node.nodegroup to just node.group.

Note: if we want to keep node in there, it should be at least
node_group, for consistency with the other node attributes.

Similarly, we rename the OpAddNode nodegroup attribute to group.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 25 Oct 2010 11:19:44 +0000 (13:19 +0200)]
Rename --nodegroup to --node-group

For consistency with other CLI options.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 25 Oct 2010 11:11:08 +0000 (13:11 +0200)]
Export node group data in iallocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 25 Oct 2010 11:00:00 +0000 (13:00 +0200)]
Split IAllocator._ComputeClusterData

The node and instance computations were all in this big function; we
separate them out for more clarity.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Mon, 25 Oct 2010 14:40:37 +0000 (16:40 +0200)]
Putting the pieces together and invoke the wipe in cmdlib

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Mon, 25 Oct 2010 14:32:31 +0000 (16:32 +0200)]
Adding RPC call for blockdev_wipe

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Mon, 25 Oct 2010 14:30:34 +0000 (16:30 +0200)]
Second iteration over backend.BlockdevWipe

This patch now uses dd entirely to wipe the disk, making it
much easier to wipe in blocks so we can give interactive feedback
about the status.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
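
As a rough sketch of wiping in blocks with progress feedback, the code below splits a device into chunks and builds one dd command per chunk. It only constructs the commands and does not run them. The function names, the 1 MiB block size and the exact set of dd operands are assumptions for this example, not the contents of backend.BlockdevWipe.

```python
def wipe_chunks(total_mib, chunk_mib):
    """Yield (offset_mib, size_mib) pairs covering the whole device."""
    offset = 0
    while offset < total_mib:
        size = min(chunk_mib, total_mib - offset)
        yield offset, size
        offset += size


def dd_command(path, offset_mib, size_mib):
    """Build a dd invocation that zeroes size_mib MiB starting at offset_mib."""
    return ["dd", "if=/dev/zero", "of=%s" % path, "bs=1M",
            "seek=%d" % offset_mib, "count=%d" % size_mib, "conv=notrunc"]


def wipe_plan(path, total_mib, chunk_mib):
    """Return a list of (command, percent_done_after_command) tuples."""
    plan = []
    for offset, size in wipe_chunks(total_mib, chunk_mib):
        percent = 100.0 * (offset + size) / total_mib
        plan.append((dd_command(path, offset, size), percent))
    return plan
```

A caller could execute each command in turn and report the associated percentage after it completes, which is the kind of interactive feedback the commit message mentions.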

Michael Hanselmann [Fri, 22 Oct 2010 16:07:39 +0000 (18:07 +0200)]
ConfigWriter: Fix typo in error message parts

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Mon, 25 Oct 2010 10:06:20 +0000 (12:06 +0200)]
Simplify and extend the instance OS env

Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 22 Oct 2010 14:53:29 +0000 (16:53 +0200)]
Fix QA mixup of node/instance tests

There are two node tests that are run from RunCommonInstanceTests, which is the
wrong place; it causes these node tests to be run three times instead of once.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 22 Oct 2010 13:54:52 +0000 (15:54 +0200)]
ConfigWriter: prevent using a foreign config

If the configuration file doesn't denote this node as master, we prevent
startup. This would have detected our previous race condition more
easily, hence we add it as a permanent check.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
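
A minimal sketch of such a startup check, assuming the configuration is available as a dictionary with a cluster/master_node entry. The exception class, the function name and the dictionary layout are hypothetical and chosen for illustration only.

```python
class ConfigurationError(Exception):
    """Raised when the configuration does not belong to this node."""


def check_config_ownership(config, my_hostname):
    """Refuse to continue if the config names another node as master."""
    master = config["cluster"]["master_node"]
    if master != my_hostname:
        raise ConfigurationError(
            "configuration says the master is %s, but this node is %s"
            % (master, my_hostname))
    return True
```

With a check like this, a master daemon restarted on the old node after the configuration has been redistributed fails early instead of serving a stale view of the cluster.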

Iustin Pop [Fri, 22 Oct 2010 13:40:29 +0000 (15:40 +0200)]
Fix bootstrap.MasterFailover race with watcher

This fixes a recently diagnosed race condition between master failover
and the watcher.

Currently, the master failover first stops the master daemon, checks
that the IP is no longer reachable, and then distributes the updated
configuration. Between the stop and the distribution, it can happen that
the watcher starts the master daemon on the old node again, since ssconf
still points the master to it (and all nodes vote so).

In even weirder cases, the master daemon starts and, before it manages
to open the configuration file, the file is updated, which means the master
will respond to QueryClusterInfo with another node as the real master.

This patch reorders the actions during master failover:

- first, we redistribute a fixed config; this means the old master will
  refuse to update its own config file and ssconf, and that most jobs
  that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be
  unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.

I've tested the new version against concurrent launch of the watcher;
while my tests are not very exhaustive, two things can happen: the watcher
sees the daemons as dead and tries to restart them, which also fails; or
it simply gets an error while reading from the master daemon. Both of these
should be OK.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 22 Oct 2010 12:33:14 +0000 (14:33 +0200)]
ConfigWriter: protect against multiple writers

This should fix the case where there are two masters that both try to
distribute the configuration file to the cluster. The first one that does so
will "win" ownership of config.data.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 22 Oct 2010 12:40:13 +0000 (14:40 +0200)]
backend.Upload: switch to utils.SafeWriteFile

This allows serialization of updates to a given file, with respect to
other cooperating writers.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
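
The idea of serializing updates among cooperating writers can be sketched as below. This is not the code from utils: the function names, the use of a separate ".lock" file, and the (device, inode, mtime) tuple as the file ID are assumptions made for this example, and fcntl makes it Unix-only.

```python
import fcntl
import os
import tempfile


def get_file_id(path):
    """Return a tuple identifying the current version of a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def safe_write_file(path, expected_id, data):
    """Write data only if path still matches expected_id.

    Pass expected_id=None to skip the check (e.g. for the first write).
    Returns the file ID of the newly written file.
    """
    with open(path + ".lock", "w") as lock_file:
        # Exclusive lock; released when the lock file is closed
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        if expected_id is not None and get_file_id(path) != expected_id:
            raise RuntimeError("%s was changed by another writer" % path)
        # Write to a temporary file and rename it into place atomically
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
        os.rename(tmp_path, path)
        return get_file_id(path)
```

A writer that still holds an ID from before someone else's update gets an error instead of silently overwriting the newer contents.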

Iustin Pop [Fri, 22 Oct 2010 12:29:47 +0000 (14:29 +0200)]
Add a "safe" file wrapper over WriteFile

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 22 Oct 2010 12:00:35 +0000 (14:00 +0200)]
Add functions to read and compare file 'ID's

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 20 Oct 2010 18:10:21 +0000 (20:10 +0200)]
LUSetInstanceParams: Remove unused attribute

“os_new” is not used anywhere, removing it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Thu, 21 Oct 2010 12:19:19 +0000 (14:19 +0200)]
Adding backend method to wipe a block device

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Thu, 21 Oct 2010 12:51:51 +0000 (14:51 +0200)]
Allow to specify wipe command and flags at configure time

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Fri, 22 Oct 2010 07:59:03 +0000 (09:59 +0200)]
Fix remote imports

A simple typo…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Iustin Pop [Fri, 22 Oct 2010 08:12:39 +0000 (10:12 +0200)]
Fix typo introduced in 8d8c4ef

Commit 8d8c4ef broke instance reinstall with different OS, due to an
attribute typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

René Nussbaumer [Thu, 21 Oct 2010 13:01:32 +0000 (15:01 +0200)]
Adjust the error message of setup-ssh if join check fails

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 21 Oct 2010 10:28:33 +0000 (12:28 +0200)]
Fix clearing of the default iallocator

And also update the man page.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 20 Oct 2010 18:13:49 +0000 (20:13 +0200)]
gnt-instance reinstall: Allow overriding OS parameters

This allows OS installation scripts to make use of special parameters,
e.g. to retain some data on reinstallation.

The RAPI resource is not updated as it takes all parameters via the
query string and encoding arbitrary data in a query string is tricky.
The resource will need to be changed to use the POST body instead.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 20 Oct 2010 12:51:53 +0000 (14:51 +0200)]
Add option to ignore offline node on instance start/stop

In some cases it can be useful to mark an instance as started
or stopped while its primary node is offline. With this patch,
a new option, “--ignore-offline”, is introduced to “gnt-instance
start” and “… stop”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 19 Oct 2010 14:47:16 +0000 (16:47 +0200)]
utils: Add function to find items in dictionary using regex

This basically extracts a small piece of code from ganeti-rapi and puts
it into a utility function. RAPI resources are found using a dictionary
in which the keys can either be static strings or compiled regular
expressions. This might be handy in other places, hence extracting it
and adding unittests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
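
A possible shape for such a lookup helper is sketched below. The function name find_match, its return convention and the sample resource table are assumptions for illustration; the only thing taken from the commit message is that keys may be either static strings or compiled regular expressions.

```python
import re


def find_match(data, name):
    """Look up name in a dict whose keys are strings or compiled regexes.

    Returns (value, groups) for the first match, or None if nothing
    matches. Exact string keys take precedence over regex keys.
    """
    if name in data:
        return (data[name], [])
    for key, value in data.items():
        # Compiled patterns have a match() method; plain strings do not
        if hasattr(key, "match"):
            m = key.match(name)
            if m:
                return (value, list(m.groups()))
    return None
```

For a RAPI-style table, a static key such as "/version" would be matched directly, while a pattern key would also return the captured instance name for the handler to use.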

Michael Hanselmann [Tue, 19 Oct 2010 16:14:00 +0000 (18:14 +0200)]
QA RAPI: Test HTTP 404 and 501

This tests the HTTP Not Found and Not Implemented errors.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 20 Oct 2010 12:04:32 +0000 (14:04 +0200)]
QA: Add test for “gnt-node modify”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Tue, 12 Oct 2010 11:39:43 +0000 (13:39 +0200)]
Let gnt-cluster support prealloc_wipe_disks

This includes a new option for gnt-cluster init and appropriate output
in gnt-cluster info, though gnt-cluster modify is not yet supported.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 19 Oct 2010 11:39:11 +0000 (13:39 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Bump version to 2.2.1, update NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 19 Oct 2010 11:26:40 +0000 (13:26 +0200)]
Bump version to 2.2.1, update NEWS (tag: v2.2.1)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 15 Oct 2010 15:29:12 +0000 (17:29 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  http.client: Disable SSL session ID cache
  Crude workaround for pylint breakage

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Apollon Oikonomopoulos [Fri, 15 Oct 2010 05:55:59 +0000 (08:55 +0300)]
http.client: Disable SSL session ID cache

This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL), cURL
tries to re-use the session ID and, according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled, or disabled in
HttpBase, however PyOpenSSL does not seem to implement
SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 15 Oct 2010 14:48:26 +0000 (16:48 +0200)]
Crude workaround for pylint breakage

The way we currently call pylint, the exact order in which it inspects modules in
lib/http/ depends on the filesystem order. This is not good, and if
lib/http/server.py is loaded before lib/http/__init__.py, it will throw
a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that
class is used in server.py).

For the short-term fix, we just add server.py after "ganeti", so that
it gets parsed (again?) and pylint sees the usage of the class.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 15 Oct 2010 14:19:54 +0000 (16:19 +0200)]
http.auth: Fix docstring error

This was missing from commit 2287b920.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 15 Oct 2010 14:19:08 +0000 (16:19 +0200)]
devnotes.rst: Remove hardcoded Python version

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 14 Oct 2010 12:29:24 +0000 (14:29 +0200)]
Merge branch 'stable-2.2'

* stable-2.2:
  Release 2.2.1~rc1
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Add missing --units in gnt-instance list man page
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 12:20:46 +0000 (14:20 +0200)]
Brown-bag fix for leftover comment

I forgot this in the original patch. Sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Thu, 14 Oct 2010 09:40:37 +0000 (11:40 +0200)]
Rework QA interaction with the watcher

The interaction with cron-launched watcher is a well-known failure mode of QA:

---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance

For the following tests it's recommended to turn off the ganeti-watcher cronjob.

---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher

Error: Domain 'instance1' does not exist.
Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
  -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

In order to fix this, we disable the watcher during these tests, and
re-enable it afterwards. To protect against watcher being disabled, we
enable it unconditionally at the start of the QA (we do want it enabled,
in order to see the interaction between the watcher and many
creation/disk replace jobs, etc.).

Note: even after this patch, if a cron-watcher was started and is still
running during the test, we'll have locking issues. I think for now this
is OK, we'll have to see how often that happens.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 08:44:22 +0000 (10:44 +0200)]
Add a new watcher option --ignore-pause

During cluster maintenance, when the watcher is disabled, it's useful to
run it just once. This is inconvenient to do currently, as the watcher
needs to be unpaused, then run, then paused again.

This patch adds an option “--ignore-pause” that can be used to ignore
the cluster-level setting. Also the man page is updated as it was
missing the options available.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 10:58:06 +0000 (12:58 +0200)]
Release 2.2.1~rc1 (tag: v2.2.1rc1)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 10:34:57 +0000 (12:34 +0200)]
Merge branch 'stable-2.2' into devel-2.2

* stable-2.2:
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 13:13:44 +0000 (15:13 +0200)]
Fix compatibility with Pyinotify 0.8

I didn't know why the code previously used
“pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from
“pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them
in “pyinotify”, not “pyinotify.EventsCodes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 10:55:45 +0000 (12:55 +0200)]
ganeti-rapi: Watch directory, not file for user file changes

We noticed several issues when just watching the file, among them race
conditions upon replacing the file using rename(2) (the new watcher
would be created too soon). By just watching the directory for events on
the rapi_users file, this can be avoided.

A nice side-effect is that now the users file is also reloaded if it
didn't exist upon ganeti-rapi's start (see the documentation update).

Since ganeti-rapi now becomes active for virtually every change in the
configuration directory (…/lib/ganeti), moving the rapi_users file to a
separate directory will be considered. It doesn't have to happen in or
before this patch, though.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 10:43:27 +0000 (12:43 +0200)]
Extract base class from SingleFileEventHandler

The base class can contain code useful to other inotify users.
As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi;
therefore ganeti-rapi will use its own small inotify handler class based
on this base class.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 11 Oct 2010 12:11:19 +0000 (14:11 +0200)]
http.auth.ReadPasswordFile: Don't read file directly

Reading the file before this function allows for better error
reporting.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 12 Oct 2010 09:39:51 +0000 (11:39 +0200)]
Move the parameter types to their own module

This is for cleanup, and for later reuse in other parts of the code
(outside of LUs).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 13 Oct 2010 10:29:26 +0000 (12:29 +0200)]
"Fix" handling of old software versions on startup

Currently, masterd startup with old software versions is very confusing
for users: we present two tracebacks, with a message in the middle about
"version mismatch". This can lead to users believing that all that needs
to be done is to fix the config file.

This patch attempts to improve this by handling this case in masterd
itself (not in the child), and showing a more friendly message for this
case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRequire aclocal 1.11.1 or above for devel/release
Guido Trotter [Wed, 13 Oct 2010 10:44:55 +0000 (11:44 +0100)]
Require aclocal 1.11.1 or above for devel/release

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).

Version comparison is done component-by-component, over a bash array.
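
As an illustration only (the actual check is written in bash), a
component-by-component comparison can be sketched in Python as below. The
function names are made up for this example; the version numbers are the
ones listed above.

```python
def parse_version(text):
    """Split a dotted version string such as "1.11.1" into integers."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(found, required):
    """Return True if version `found` is at least `required`.

    Tuples compare element by element, so each component is compared
    numerically in turn, from left to right.
    """
    return parse_version(found) >= parse_version(required)


REQUIRED_ACLOCAL = "1.11.1"  # minimum version named in this commit message

for candidate in ("1.9.6", "1.10", "1.10.1", "1.11.1"):
    print(candidate, version_at_least(candidate, REQUIRED_ACLOCAL))
```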

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRevert "Require aclocal 1.11.1 or above for autogen.sh"
Guido Trotter [Wed, 13 Oct 2010 10:09:45 +0000 (11:09 +0100)]
Revert "Require aclocal 1.11.1 or above for autogen.sh"

The comparison is incorrect, and the check also breaks daily work on
autobuilders and older distros.

This reverts commit dbc4dda7f5b66c9905c3cf6e44414536a5b38177.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoExport more information via LUQueryInstances/RAPI
Iustin Pop [Tue, 12 Oct 2010 15:33:53 +0000 (17:33 +0200)]
Export more information via LUQueryInstances/RAPI

Currently, the custom instance parameters (hv, be, nicp) are only
queryable via LUQueryInstanceData. LUQueryInstances returns only the
filled parameters, thus its users (especially RAPI) have no way to know
whether a parameter is custom or just the default value.

This patch adds three new parameters: custom_hvparams, custom_beparams,
custom_nicparams, that are also exported in RAPI.
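
The difference between "filled" and "custom" parameters can be sketched as
follows. This is not the Ganeti code; fill_params, is_custom and the
parameter values are hypothetical and only show why the filled view alone
cannot tell a custom value from a default.

```python
def fill_params(defaults, custom):
    """Return the defaults overridden by the instance's custom values."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


def is_custom(name, custom):
    """Return True if `name` was set explicitly on the instance."""
    return name in custom


# Hypothetical cluster defaults and per-instance overrides.
cluster_beparams = {"memory": 128, "vcpus": 1, "auto_balance": True}
custom_beparams = {"memory": 512}

filled_beparams = fill_params(cluster_beparams, custom_beparams)
print(filled_beparams)  # the merged view; the origin of each value is lost
print(is_custom("memory", custom_beparams))
print(is_custom("vcpus", custom_beparams))
```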

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd missing --units in gnt-instance list man page
Iustin Pop [Tue, 12 Oct 2010 15:50:04 +0000 (17:50 +0200)]
Add missing --units in gnt-instance list man page

Also fixes some wrapping issues, and one typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit f8409165b4e6d24bd160ee6c85ba432ae8afa117)

Conflicts:

man/gnt-instance.sgml (re-wrapped)

13 years agoSet list of trusted SSL CAs for client to verify
Apollon Oikonomopoulos [Tue, 12 Oct 2010 15:08:06 +0000 (18:08 +0300)]
Set list of trusted SSL CAs for client to verify

As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs
advertised to SSL clients to include the server's own certificate. This
evidently fixes the pycurl/gnutls RPC client.

During the TLS Handshake, when client verification is requested, the
Server sends a CertificateRequest message which states that the client
should send a valid certificate as a response. The CertificateRequest
message contains a section called "certificate_authorities", which,
according to the standard, is a list of the Distinguished Names (DNs) of
acceptable certification authorities. The client uses this list to send
a certificate signed by one of the acceptable CAs.

Under OpenSSL's server implementation, this list must be set manually
using some appropriate call, otherwise the list is empty. TLS 1.0[1]
does not state whether the list may be left blank, whereas TLS 1.1[2]
and 1.2[3] state that in case the list is blank, then the client *may*
send any certificate of a valid type (valid types are specified
elsewhere in the handshake).

OpenSSL clients seem to obey the behaviour specified in TLS 1.1+,
whereas at least curl+GnuTLS does not send any certificates if the list
is empty (which is not wrong per the spec, but also evidently not
configurable).

[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRequire aclocal 1.11.1 or above for autogen.sh
Guido Trotter [Tue, 12 Oct 2010 15:30:50 +0000 (16:30 +0100)]
Require aclocal 1.11.1 or above for autogen.sh

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).

The autogen.sh interpreter is changed to bash, as we need to use the [[
builtin to compare versions with "<". [ doesn't have that functionality,
and we can't of course rely on dpkg, which won't be installed on all
distributions.
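
A word of caution on comparing version strings with "<": a plain string
comparison works character by character, not number by number. The sketch
below uses Python's string comparison as a stand-in for the shell's (an
assumption; the shell compares according to the locale), and the helper
name components is made up for this example.

```python
required = "1.11.1"

# String comparison: "9" sorts after "1", so the older 1.9.6 is not
# reported as being below 1.11.1.
print("1.9.6" < required)


def components(version):
    """Split a dotted version string into a list of integers."""
    return [int(part) for part in version.split(".")]


# Comparing the integer components gives the numeric ordering instead.
print(components("1.9.6") < components(required))
```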

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoShow instance state in instance console failures
Iustin Pop [Tue, 12 Oct 2010 14:54:14 +0000 (16:54 +0200)]
Show instance state in instance console failures

The current message is not entirely clear, as it doesn't show the reason
why the instance is not running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix epydoc errors
Iustin Pop [Tue, 12 Oct 2010 13:28:58 +0000 (15:28 +0200)]
Fix epydoc errors

And sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agojqueue: Fix bug when cancelling jobs
Michael Hanselmann [Fri, 8 Oct 2010 14:03:17 +0000 (16:03 +0200)]
jqueue: Fix bug when cancelling jobs

If a job was cancelled while it was waiting for locks, an assertion
would've failed. This patch fixes the problem and provides a unit
test to check for this situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue: Resume jobs from “waitlock” status (2nd try)
Michael Hanselmann [Fri, 8 Oct 2010 14:01:32 +0000 (16:01 +0200)]
jqueue: Resume jobs from “waitlock” status (2nd try)

Commit 5ef699a0e had to roll back an earlier attempt at implementing
this. With the improved job queue processor, this is finally possible.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue/gnt-job: Add job priority fields for display
Michael Hanselmann [Fri, 8 Oct 2010 13:59:30 +0000 (15:59 +0200)]
jqueue/gnt-job: Add job priority fields for display

These fields can help with debugging.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agomcpu: Raise directly in _AcquireLocks
Michael Hanselmann [Fri, 8 Oct 2010 13:57:36 +0000 (15:57 +0200)]
mcpu: Raise directly in _AcquireLocks

Removes code duplication.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd prealloc_wipe_disks as a cluster-wide configuration variable
René Nussbaumer [Tue, 12 Oct 2010 09:04:23 +0000 (11:04 +0200)]
Add prealloc_wipe_disks as a cluster-wide configuration variable

This is the first step for the support of wiping block devices prior
to creation of the instance.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.2'
Iustin Pop [Mon, 11 Oct 2010 13:16:15 +0000 (15:16 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  RPC: disable curl's Expect header

Conflicts:
lib/rpc.py (trivial, copyright header)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRPC: disable curl's Expect header
Iustin Pop [Mon, 11 Oct 2010 12:31:09 +0000 (14:31 +0200)]
RPC: disable curl's Expect header

This patch solves the very slow (~8-9 seconds) gnt-instance modify
behaviour. More generally, it solves the slow RPC behaviour, which was
simply most visible in that LU.

It seems that curl's behaviour with regard to file uploads (via PUT) and
the 'Expect' header are interacting badly with our http server.

First, our http server doesn't properly handle this header. According to
RFC 2616:

  Requirements for HTTP/1.1 origin servers: Upon receiving a request
  which includes an Expect request-header field with the "100-continue"
  expectation, an origin server MUST either respond with 100 (Continue)
  status and continue to read from the input stream, or respond with a
  final status code.

Our server doesn't do this, and hence it triggers this behaviour in curl
(from the curl FAQ):

  4.16 My HTTP POST or PUT requests are slow!

  libcurl makes all POST and PUT requests (except for POST requests with a
  very tiny request body) use the "Expect: 100-continue" header. This header
  allows the server to deny the operation early so that libcurl can bail out
  already before having to send any data. This is useful in authentication
  cases and others.

  However, many servers don't implement the Expect: stuff properly and if the
  server doesn't respond (positively) within 1 second libcurl will continue
  and send off the data anyway.

  You can disable libcurl's use of the Expect: header the same way you disable
  any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

This behaviour was detected by watching the captured traffic (in non-SSL
mode): after the initial HTTP headers (ending with the Expect one),
there was a ~1-2 second pause before curl sent the body.
Properly RTFM-ing would have saved ~1 day of digging around, but hey…
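
The workaround from the FAQ (passing a header with nothing after the colon
so that libcurl drops its own) can be sketched as below. This is not the
patch itself: disable_expect_header is a name invented for this example,
the curl argument is assumed to behave like a pycurl.Curl object, and
FakeCurl is a stand-in so the sketch runs without pycurl installed.

```python
def disable_expect_header(curl, headers=()):
    """Set the request headers on `curl`, adding an empty "Expect:" entry.

    A header with no value after the colon tells libcurl not to send its
    internally generated header of that name.
    """
    header_list = [h for h in headers
                   if not h.lower().startswith("expect:")]
    header_list.append("Expect:")
    curl.setopt(curl.HTTPHEADER, header_list)
    return header_list


class FakeCurl(object):
    """Stand-in for pycurl.Curl that only records the options set."""
    HTTPHEADER = "HTTPHEADER"

    def __init__(self):
        self.options = {}

    def setopt(self, option, value):
        self.options[option] = value


curl = FakeCurl()
disable_expect_header(curl, ["Content-Type: application/json"])
print(curl.options[FakeCurl.HTTPHEADER])
```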

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'devel-2.2'
Guido Trotter [Fri, 8 Oct 2010 17:46:05 +0000 (18:46 +0100)]
Merge branch 'devel-2.2'

* devel-2.2:
  Release Ganeti 2.2.0.1
  Bump version to 2.2.1~rc0

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

13 years agoMerge commit 'v2.2.0.1' into stable-2.2
Guido Trotter [Fri, 8 Oct 2010 17:31:29 +0000 (18:31 +0100)]
Merge commit 'v2.2.0.1' into stable-2.2

* commit 'v2.2.0.1':
  Release Ganeti 2.2.0.1

Conflicts:
NEWS
  - merge
configure.ac
  - keep 2.2.1~rc0 version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

13 years agoRelease Ganeti 2.2.0.1 v2.2.0.1
Guido Trotter [Fri, 8 Oct 2010 16:55:40 +0000 (17:55 +0100)]
Release Ganeti 2.2.0.1

2.2.0 was built with old autotools, and it's incompatible with Python
2.6. Rebuilding with a newer autotools version fixes this.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

13 years agoChange QA log output
Iustin Pop [Fri, 8 Oct 2010 12:40:30 +0000 (14:40 +0200)]
Change QA log output

Currently, the logging in QA doesn't show the duration of the various
steps; if it is needed, one has to post-process the log. This
patch changes the output so that the log information is line-based (as
opposed to block-based), such that it's easy to grep for all log lines:

./qa/ganeti-qa.py --yes-do-it qa.json  2>&1|grep ^----
---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------

or just for the duration of the steps:
./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json  2>&1|grep ^----.*time=
---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info ------

This will help with identifying slow steps or even graphing the QA
duration.
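
As a rough sketch of such post-processing, the "time=" lines can be parsed
like this. The regular expression is derived only from the sample output
above and may need adjusting; parse_durations and SAMPLE are names made up
for this example.

```python
import re
from datetime import timedelta

# Matches e.g. "---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Name";
# padding dashes after the step name are dropped.
TIME_LINE = re.compile(
    r"^---- \S+ \S+ time=(\d+):(\d+):(\d+(?:\.\d+)?) (.*?)(?: -+)?$")


def parse_durations(lines):
    """Return (step name, duration) pairs for all "time=" log lines."""
    result = []
    for line in lines:
        match = TIME_LINE.match(line.rstrip())
        if not match:
            continue  # "start" lines and ordinary output are skipped
        hours, minutes, seconds, name = match.groups()
        duration = timedelta(hours=int(hours), minutes=int(minutes),
                             seconds=float(seconds))
        result.append((name, duration))
    return result


SAMPLE = [
    "---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------",
    "---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection",
    "---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------",
    "---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------",
]

durations = parse_durations(SAMPLE)
for name, duration in durations:
    print(name, duration.total_seconds())

# The slowest step is the pair with the largest duration.
slowest = max(durations, key=lambda item: item[1])
print("slowest:", slowest[0])
```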

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agognt-job cancel: Use non-zero exit status if canceling failed
Michael Hanselmann [Thu, 7 Oct 2010 14:58:41 +0000 (16:58 +0200)]
gnt-job cancel: Use non-zero exit status if canceling failed

This allows the use of “gnt-job cancel” in scripts.
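
For instance, a script could branch on the exit status roughly as follows.
This is a sketch that assumes gnt-job is on the PATH; cancel_job and its
run parameter are names invented for this example, and the job ID in the
usage comment is hypothetical.

```python
import subprocess


def cancel_job(job_id, run=subprocess.call):
    """Run "gnt-job cancel <job_id>"; return True if it exited with 0.

    `run` defaults to subprocess.call and can be replaced, e.g. in tests.
    """
    status = run(["gnt-job", "cancel", str(job_id)])
    return status == 0


# Usage on a cluster node (hypothetical job ID):
#   if not cancel_job(1234):
#       print("job 1234 could not be cancelled")
```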

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue, CancelJob: Check status only once per call
Michael Hanselmann [Thu, 7 Oct 2010 14:58:00 +0000 (16:58 +0200)]
jqueue, CancelJob: Check status only once per call

This simplifies the code a bit--the status is only checked once.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoBump version to 2.2.1~rc0 v2.2.1rc0
Michael Hanselmann [Thu, 7 Oct 2010 13:08:47 +0000 (15:08 +0200)]
Bump version to 2.2.1~rc0

Also update NEWS.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.2'
Iustin Pop [Thu, 7 Oct 2010 12:17:10 +0000 (14:17 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Try again to fix the inter-cluster move QA test

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoTry again to fix the inter-cluster move QA test
Iustin Pop [Thu, 7 Oct 2010 09:56:12 +0000 (11:56 +0200)]
Try again to fix the inter-cluster move QA test

This time, we re-establish the old pri/sec nodes correctly. Unfortunately this
now requires at least a 3-node cluster for DRBD instances, hence it's
somewhat suboptimal, but… The other option would be to simply move it from p:s
to s:p and then back to p:s, without involving a third node (for the DRBD case),
but I think that moving it to a completely separate node is slightly better for
testing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix a rare bug in StartDaemonChild and GenericMain
Iustin Pop [Wed, 6 Oct 2010 09:15:09 +0000 (11:15 +0200)]
Fix a rare bug in StartDaemonChild and GenericMain

I've seen cases where the result from str(sys.exc_info()[1]) is ""; this
breaks the error reporting as the parent relies on non-empty error
messages to properly detect child status (otherwise it will try to read
the pid and fail, and so on).

While this has only ever happened with asserts, we need to ensure it doesn't
happen. Therefore we abstract this functionality (writing the error
message) and ensure we write a non-empty string in the new function.
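
The idea can be sketched as below. This is not the new function from the
patch; format_error and the fallback text are made up for this example.

```python
def format_error(err):
    """Return a description of exception `err` that is never empty."""
    text = str(err)
    if not text:
        # An exception raised without a message has an empty str();
        # fall back to the class name so the reader gets something.
        text = "<%s with empty message>" % type(err).__name__
    return text


try:
    raise AssertionError()  # same as a failing assert without a message
except AssertionError as err:
    print(repr(str(err)))
    print(format_error(err))
```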

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoEnhance the error reporting
Iustin Pop [Wed, 6 Oct 2010 08:13:23 +0000 (10:13 +0200)]
Enhance the error reporting

Daemon startup errors will often be related to socket errors, so it
makes sense to change the original reporting:

  Error when starting daemon process: "(98, 'Address already in use')"

Into:

  Error when starting daemon process: 'Socket-related error: Address
  already in use (errno=98)'
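
A formatting helper along these lines could produce the new message. It is
only a sketch: describe_startup_error is a name invented for this example,
and in Python 3 socket.error is an alias of OSError, so any OSError takes
the first branch.

```python
import socket


def describe_startup_error(err):
    """Return a readable description of a daemon startup error."""
    if isinstance(err, socket.error):
        # Report the message and the error number separately instead of
        # printing the raw (errno, message) tuple.
        return "Socket-related error: %s (errno=%s)" % (err.strerror,
                                                        err.errno)
    return str(err)


print(describe_startup_error(socket.error(98, "Address already in use")))
print(describe_startup_error(ValueError("bad configuration")))
```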

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>