code.grnet.gr Git - ganeti-local/log

Rebuild bash completion if client scripts change

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-backup to ganeti.client.gnt_backup

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-instance to ganeti.client.gnt_instance

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-job to ganeti.client.gnt_job

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-node to ganeti.client.gnt_node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-cluster to ganeti.client.gnt_cluster

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Python bootstrapper: hardcode /usr/bin/python

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-os to ganeti.client.gnt_os

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move gnt-debug to ganeti.client.gnt_debug

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Allow programs to be part of the Ganeti library

Eventually this will help ensure that clients and servers are of the
same version, as long as they're imported from the same path. Currently
it's relatively easy for gnt-* and ganeti-* to be from a different
version.

Scripts will be at ganeti.client.gnt_* and a small bootstrap script
calls a “Main” function from the module.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
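
As an illustration of the bootstrap arrangement described in the commit above, here is a minimal sketch; it is not the actual Ganeti bootstrapper. It assumes the script name maps to a module name by replacing "-" with "_". The names `run_client` and `package` are hypothetical.

```python
import importlib
import os


def run_client(argv0, package="ganeti.client"):
    """Import <package>.<script name with "-" replaced by "_"> and call Main()."""
    program = os.path.basename(argv0)  # e.g. "gnt-node"
    module_name = "%s.%s" % (package, program.replace("-", "_"))
    module = importlib.import_module(module_name)
    # The client code is now loaded from the same import path as the library.
    return module.Main()
```

A wrapper script would then do little more than `sys.exit(run_client(sys.argv[0]))`.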

Add master_capab to gnt-node modify

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Implement the master_capable flag in node modify

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Export the capability flags in query, rapi, ialloc

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add the master/vm_capable flags to objects

This adds the flag and some initial handling. The rest of the changes,
for cmdlib, come in a separate patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Rework node role changes

There have been many bugs in gnt-node modify. Let's try to introduce
some more.

This patch reworks the node role changes from tracking the flag changes
to completely overwriting the flags based on the new role. This paves
the way for (in 2.4 or later) moving to a single attribute for nodes.

We compute the old role and the new role based on the required changes
and whether we need to auto-promote. Once this is done, the body of the
Exec() function becomes trivial (there's more code related to output
formatting than the node flag changes).

Another advantage of the new version is that all flags are overwritten
together, in a single step, making it impossible (harder?) to have
partial updates.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
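
The idea of deriving a role from the flags and then overwriting all flags from the new role could look roughly like the sketch below. The role constants, the dict-based node and the helper names are hypothetical; the flag names (master_candidate, drained, offline) are an assumption about the node flags this commit refers to.

```python
# Hypothetical role names, for illustration only.
ROLE_CANDIDATE, ROLE_DRAINED, ROLE_OFFLINE, ROLE_REGULAR = "C", "D", "O", "R"

# Key order: (master_candidate, drained, offline). At most one flag is set.
_FLAGS_TO_ROLE = {
    (True, False, False): ROLE_CANDIDATE,
    (False, True, False): ROLE_DRAINED,
    (False, False, True): ROLE_OFFLINE,
    (False, False, False): ROLE_REGULAR,
}
_ROLE_TO_FLAGS = dict((role, flags) for flags, role in _FLAGS_TO_ROLE.items())


def role_of(node):
    """Compute the role from the flags; KeyError means inconsistent flags."""
    key = (node["master_candidate"], node["drained"], node["offline"])
    return _FLAGS_TO_ROLE[key]


def set_role(node, new_role):
    """Overwrite all three flags at once, based only on the new role."""
    mc, drained, offline = _ROLE_TO_FLAGS[new_role]
    node.update(master_candidate=mc, drained=drained, offline=offline)
```

Because `set_role` never looks at the old flags, a node cannot end up with a mix of old and new flag values.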

Minor language fixes to the 2.3 design doc.

Signed-off-by: Balazs Lecz <leczb@google.com>
[dato@google.com: extracted language fixes from bigger patch.]
Signed-off-by: Adeodato Simo <dato@google.com>

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add documentation about the capability flags

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Enable failure on warnings in epydoc

This causes epydoc to fail on any warning.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

rpc: Work around epydoc warning

Aliasing the “threading” module allows us to avoid the “No information
available for ganeti.rpc._RpcThreadLocal's base threading.local” warning
by epydoc.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Revert "Allow to specify wipe command and flags at configure time"

This reverts commit 6e991d0e64e36adf985d0512e4148bcd6a160c6a.

Conflicts:

lib/constants.py (this got already removed, so no changes in here)

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  Allow remote imports without checked names
  ConfigWriter: Fix typo in error message parts
  Fix remote imports

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Allow remote imports without checked names

By default all names are checked (LUCreateInstance, name_check). In some
cases it can be useful to disable this check, but doing so was not
allowed for remote imports. One should be aware, however, that using
this feature can lead to rename script failures when importing a remote
instance without the proper name, e.g.:

“Failed to run rename script for inst1 on node node3.example.com: OS
rename script failed (exited with exit code 1), last lines in the log
file:\nCannot rename from inst2.example.com to inst1:\nInstance has a
different hostname (inst2)”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Update NEWS

This adds my recent changes for support of wiping disks prior to
allocation as a new feature to the NEWS file.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Support modify of prealloc_wipe_disks config value

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Export a node's group information in iallocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Rename node.nodegroup to node.group

In the context of a node, its group has (at least today) only one
meaning, that is the node's node group. As such, we rename
node.nodegroup to just node.group.

Note: if we want to keep node in there, it should be at least
node_group, for consistency with the other node attributes.

Similarly, we rename the OpAddNode nodegroup attribute to group.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Rename --nodegroup to --node-group

For consistency with other CLI options.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Export node group data in iallocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Split IAllocator._ComputeClusterData

The node and instance computations were all in this big function; we
separate them out for more clarity.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Putting the pieces together and invoke the wipe in cmdlib

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adding RPC call for blockdev_wipe

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Second iteration over backend.BlockdevWipe

This patch now uses dd entirely to wipe the disk, making it
much easier to wipe in blocks so we can give interactive feedback
about the status.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
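
Wiping in blocks so that progress can be reported between blocks might be sketched as follows. This is not the code from backend.BlockdevWipe; the helper names, the megabyte units and the exact dd arguments are assumptions.

```python
def wipe_chunks(disk_size_mb, chunk_mb):
    """Yield (offset_mb, length_mb) pairs that together cover the disk."""
    offset = 0
    while offset < disk_size_mb:
        length = min(chunk_mb, disk_size_mb - offset)
        yield offset, length
        offset += length


def dd_command(device, offset_mb, length_mb):
    """Build a dd invocation that zeroes one chunk (1 MiB block size)."""
    return ["dd", "if=/dev/zero", "of=%s" % device, "bs=1048576",
            "seek=%d" % offset_mb, "count=%d" % length_mb]
```

A caller would run one dd command per chunk and report something like `100.0 * (offset + length) / disk_size_mb` after each one.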

ConfigWriter: Fix typo in error message parts

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Simplify and extend the instance OS env

Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix QA mixup of node/instance tests

There are two node tests that are run from RunCommonInstanceTests, which is the
wrong place; it causes these node tests to be run three times instead of once.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ConfigWriter: prevent using a foreign config

If the configuration file doesn't denote this node as master, we prevent
startup. This would have detected our previous race condition more
easily, hence we add it as a permanent check.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
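
A startup check of this kind could be sketched as below. The configuration layout (`config["cluster"]["master_node"]`), the exception class and the function name are hypothetical and only illustrate the comparison being made.

```python
import socket


class ForeignConfigError(Exception):
    """The configuration names a different node as master."""


def check_config_owner(config, my_hostname=None):
    """Refuse to continue if the config does not denote this node as master."""
    if my_hostname is None:
        my_hostname = socket.gethostname()
    master = config["cluster"]["master_node"]  # assumed layout
    if master != my_hostname:
        raise ForeignConfigError(
            "configuration belongs to master %s, not to %s"
            % (master, my_hostname))
```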

Fix bootstrap.MasterFailover race with watcher

This fixes a recently diagnosed race condition between master failover
and the watcher.

Currently, the master failover first stops the master daemon, checks
that the IP is no longer reachable, and then distributes the updated
configuration. Between the stop and the distribution, it can happen that
the watcher starts the master daemon on the old node again, since ssconf
still points the master to it (and all nodes vote so).

In even weirder cases, the master daemon starts and, before it manages
to open the configuration file, the file is updated, which means the master
will respond to QueryClusterInfo with another node as the real master.

This patch reorders the actions during master failover:

- first, we redistribute a fixed config; this means the old master will
  refuse to update its own config file and ssconf, and that most jobs
  that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be
  unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.

I've tested the new version against concurrent launch of the watcher;
while my tests are not very exhaustive, two things can happen: the
watcher sees the daemons as dead and tries to restart them, which also
fails; or it simply gets an error while reading from the master daemon.
Both of these should be OK.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ConfigWriter: protect against multiple writers

This should fix the case where there are two masters that both try to
distribute the configuration file to the cluster. The first one that does so,
will "win" the ownership of the config.data.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

backend.Upload: switch to utils.SafeWriteFile

This allows serialization of updates to a given file, with respect to
other cooperating writers.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
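
The combination of file IDs and a "safe" write used in this group of commits can be illustrated with the sketch below. It is not the utils.SafeWriteFile implementation: the function names, the separate lock file and the choice of (device, inode, mtime) as the ID are assumptions. It relies on fcntl, so it is POSIX-only.

```python
import fcntl
import os
import tempfile


def get_file_id(path):
    """Return a tuple that changes whenever the file is replaced or modified."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def safe_write_file(path, expected_id, data):
    """Replace path with data only if it still matches expected_id.

    Passing expected_id=None skips the check. Returns the new file ID.
    """
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize cooperating writers
        if expected_id is not None and get_file_id(path) != expected_id:
            raise RuntimeError("%s was changed by another writer" % path)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as out:
            out.write(data)
        os.rename(tmp, path)  # atomic replacement on POSIX
        return get_file_id(path)
```

A writer that remembers the ID returned by its last write will notice when another writer has replaced the file in the meantime, which is the "first one wins" behaviour described for config.data.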

Add a "safe" file wrapper over WriteFile

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add functions to read and compare file 'ID's

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

LUSetInstanceParams: Remove unused attribute

“os_new” is not used anywhere, removing it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adding backend method to wipe a block device

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Allow to specify wipe command and flags at configure time

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix remote imports

A simple typo…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Fix typo introduced in 8d8c4ef

Commit 8d8c4ef broke instance reinstall with different OS, due to an
attribute typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Adjust the error message of setup-ssh if join check fails

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix clearing of the default iallocator

And also update the man page.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

gnt-instance reinstall: Allow overriding OS parameters

This allows OS installation scripts to make use of special parameters,
e.g. to retain some data on reinstallation.

The RAPI resource is not updated as it takes all parameters via the
query string and encoding arbitrary data in a query string is tricky.
The resource will need to be changed to use the POST body instead.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add option to ignore offline node on instance start/stop

In some cases it can be useful to mark an instance as started
or stopped while its primary node is offline. With this patch,
a new option, “--ignore-offline”, is introduced to “gnt-instance
start” and “… stop”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

utils: Add function to find items in dictionary using regex

This basically extracts a small piece of code from ganeti-rapi and puts
it into a utility function. RAPI resources are found using a dictionary
in which the keys can either be static strings or compiled regular
expressions. This might be handy in other places, hence extracting it
and adding unittests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
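
A lookup over a dictionary whose keys are either static strings or compiled regular expressions could look like this sketch. The function name, the return shape and the resource names in the usage example are hypothetical.

```python
import re


def find_match(data, name):
    """Return (value, groups) for the first key matching name, else None."""
    if name in data:  # static string keys are checked first
        return data[name], []
    for key, value in data.items():
        if hasattr(key, "match"):  # compiled regular expression
            m = key.match(name)
            if m:
                return value, list(m.groups())
    return None
```

For example, with `{"/version": "R_version", re.compile(r"^/2/instances/([\w.-]+)$"): "R_2_instances_name"}`, the path `/2/instances/inst1.example.com` yields the second value together with the captured instance name.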

QA RAPI: Test HTTP 404 and 501

This tests the HTTP Not Found and Not Implemented errors.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

QA: Add test for “gnt-node modify”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Let gnt-cluster support prealloc_wipe_disks

This includes a new option for gnt-cluster init and appropriate output
in gnt-cluster info. gnt-cluster modify is not yet supported, though.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  Bump version to 2.2.1, update NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Bump version to 2.2.1, update NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  http.client: Disable SSL session ID cache
  Crude workaround for pylint breakage

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

http.client: Disable SSL session ID cache

This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a session ID context using
SSL_set_session_id_context(3SSL), while cURL tries to re-use the session
ID. According to SSL_set_session_id_context(3SSL):

If the session id context is not set on an SSL/TLS server and client
certificates are used, stored sessions will not be reused but a fatal
error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled, or disabled in
HttpBase, however PyOpenSSL does not seem to implement
SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Crude workaround for pylint breakage

The way we currently call pylint, the exact order it inspects modules in
lib/http/ depends on the filesystem order. This is not good, and if
lib/http/server.py is loaded before lib/http/__init__.py, it will throw
a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that
class is used in server.py).

For the short-term fix, we just add server.py after "ganeti", so that
it gets parsed (again?) and pylint sees the usage of the class.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

http.auth: Fix docstring error

This was missing from commit 2287b920.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

devnotes.rst: Remove hardcoded Python version

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'stable-2.2'

* stable-2.2:
  Release 2.2.1~rc1
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Add mising --units in gnt-instance list man page
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Brown-bag fix for leftover comment

I forgot this in the original patch. Sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Rework QA interaction with the watcher

The interaction with cron-launched watcher is a well-known failure mode of QA:

---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance

For the following tests it's recommended to turn off the ganeti-watcher cronjob.

---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
…
Error: Domain 'instance1' does not exist.
Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
-oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
2010-10-13 23:55:04,479: pid=1659 ganeti-watcher:626
ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

In order to fix this, we disable the watcher during these tests, and
re-enable it afterwards. To protect against watcher being disabled, we
enable it unconditionally at the start of the QA (we do want it enabled,
in order to see the interaction between the watcher and many
creation/disk replace jobs, etc.).

Note: even after this patch, if a cron-watcher was started and is still
running during the test, we'll have locking issues. I think for now this
is OK, we'll have to see how often that happens.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add a new watcher option --ignore-pause

During cluster maintenance, when the watcher is disabled, it's useful to
run it just once. This is inconvenient to do currently, as the watcher
needs to be unpaused, then run, then paused again.

This patch adds an option “--ignore-pause” that can be used to ignore
the cluster-level setting. Also the man page is updated as it was
missing the options available.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Release 2.2.1~rc1

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Merge branch 'stable-2.2' into devel-2.2

* stable-2.2:
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix compatibility with Pyinotify 0.8

I didn't know why the code previously used
“pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from
“pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them
in “pyinotify”, not “pyinotify.EventsCodes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ganeti-rapi: Watch directory, not file for user file changes

We noticed several issues when just watching the file, among them race
conditions upon replacing the file using rename(2) (the new watcher
would be created too soon). By just watching the directory for events on
the rapi_users file, this can be avoided.

A nice side-effect is that now the users file is also reloaded if it
didn't exist upon ganeti-rapi's start (see the documentation update).

Since ganeti-rapi now becomes active for virtually every change in the
configuration directory (…/lib/ganeti), moving the rapi_users file to a
separate directory will be considered. It doesn't have to happen in or
before this patch, though.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Extract base class from SingleFileEventHandler

The base class can contain code useful to other inotify users.
As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi,
therefore it'll use its own small inotify handler class based
on this base class.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

http.auth.ReadPasswordFile: Don't read file directly

Reading the file before this function allows for better error
reporting.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Move the parameter types to their own module

This is for cleanup, and for later reuse in other parts of the code
(outside of LUs).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

"Fix" handling of old software versions on startup

Currently, masterd startup with old software versions is very confusing
for users: we present two tracebacks, with a message in the middle about
"version mismatch". This can lead to users believing that all that needs
to be done is to fix the config file.

This patch attempts to improve this by handling this case in masterd
itself (not in the child), and showing a more friendly message for this
case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Require aclocal 1.11.1 or above for devel/release

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).

Version comparison is done component-by-component, over a bash array.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
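
The commit above does the comparison in bash; the same component-by-component idea is shown here in Python purely as an illustration (the function names are hypothetical). Comparing the version strings directly as text would rank 1.9.6 above 1.11.1, which is why the components are compared as integers.

```python
def parse_version(text):
    """Split "1.11.1" into the tuple (1, 11, 1)."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(found, required):
    """Compare two dotted versions numerically, component by component."""
    return parse_version(found) >= parse_version(required)
```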

Revert "Require aclocal 1.11.1 or above for autogen.sh"

The comparison is incorrect, and the check also breaks daily work on
autobuilders and older distros.

This reverts commit dbc4dda7f5b66c9905c3cf6e44414536a5b38177.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Export more information via LUQueryInstances/RAPI

Currently, the custom instance parameters (hv, be, nicp) are only
queryable via LUQueryInstanceData. LUQueryInstance returns only the
filled parameters, thus its users (especially RAPI) have no way to know
if a parameter is custom or the default value.

This patch adds three new parameters: custom_hvparams, custom_beparams,
custom_nicparams, that are also exported in RAPI.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
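
The difference between filled and custom parameters can be shown with a small sketch. The helper name and the parameter values are made up for illustration; only the idea of overlaying per-instance values on cluster defaults comes from the commit message.

```python
def fill_params(defaults, custom):
    """Overlay the instance's custom values on top of the defaults."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


cluster_beparams = {"memory": 128, "vcpus": 1}  # hypothetical defaults
custom_beparams = {"memory": 512}               # set on the instance itself
beparams = fill_params(cluster_beparams, custom_beparams)
```

From `beparams` alone a client cannot tell that `vcpus` is inherited while `memory` is an override; exporting the custom dictionary as well makes that distinction visible.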

Add mising --units in gnt-instance list man page

Also fixes some wrapping issues, and one typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit f8409165b4e6d24bd160ee6c85ba432ae8afa117)

Conflicts:

man/gnt-instance.sgml (re-wrapped)

Set list of trusted SSL CAs for client to verify

As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs
advertised to SSL clients to include the server's own certificate. This
evidently fixes the pycurl/gnutls RPC client.

During the TLS Handshake, when client verification is requested, the
Server sends a CertificateRequest message which states that the client
should send a valid certificate as a response. The CertificateRequest
message contains a section called "certificate_authorities", which,
according to the standard, is a list of the Distinguished Names (DNs) of
acceptable certification authorities. The client uses this list to send
a certificate signed by one of the acceptable CAs.

Under OpenSSL's server implementation, this list must be set manually
using some appropriate call, otherwise the list is empty. TLS 1.0[1]
does not state whether the list may be left blank, whereas TLS 1.1[2]
and 1.2[3] state that in case the list is blank, then the client *may*
send any certificate of a valid type (valid types are specified
elsewhere in the handshake).

OpenSSL clients seem to obey the behaviour specified in TLS 1.1+,
whereas at least curl+GnuTLS does not send any certificates if the list
is empty (which is not wrong per the spec, but also evidently not
configurable).

[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Require aclocal 1.11.1 or above for autogen.sh

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions python.m4 is buggy, and results in the package being
built not working on python 2.6 (which uses dist-packages rather than
site-packages as a module directory).

The autogen.sh interpreter is changed to bash, as we need to use the [[
builtin to compare versions with "<". [ doesn't have that functionality,
and we can't of course rely on dpkg, which won't be installed on all
distributions.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Show instance state in instance console failures

The current message is not entirely clear, as it doesn't show the reason
why the instance is not running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix epydoc errors

And sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

jqueue: Fix bug when cancelling jobs

If a job was cancelled while it was waiting for locks, an assertion
would've failed. This patch fixes the problem and provides a unit
test to check for this situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

jqueue: Resume jobs from “waitlock” status (2nd try)

Commit 5ef699a0e had to roll back an earlier attempt at implementing
this. With the improved job queue processor, this is finally possible.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

jqueue/gnt-job: Add job priority fields for display

These fields can help with debugging.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

mcpu: Raise directly in _AcquireLocks

Removes code duplication.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add prealloc_wipe_disks as a cluster-wide configuration variable

This is the first step for the support of wiping block devices prior
to creation of the instance.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  RPC: disable curl's Expect header

Conflicts:
lib/rpc.py (trivial, copyright header)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

RPC: disable curl's Expect header

This patch solves the very slow (~8-9 seconds) gnt-instance modify
behaviour. More precisely, it fixes slow RPC behaviour in general, but it
was most visible in that LU.

It seems that curl's behaviour with regard to file uploads (via PUT) and
the 'Expect' header are interacting badly with our http server.

First, our http server doesn't properly handle this header. According to
RFC 2616:

  Requirements for HTTP/1.1 origin servers: Upon receiving a request
  which includes an Expect request-header field with the "100-continue"
  expectation, an origin server MUST either respond with 100 (Continue)
  status and continue to read from the input stream, or respond with a
  final status code.

Our server doesn't do this, and hence it triggers this behaviour in curl
(from the curl FAQ):

  4.16 My HTTP POST or PUT requests are slow!

  libcurl makes all POST and PUT requests (except for POST requests with a
  very tiny request body) use the "Expect: 100-continue" header. This header
  allows the server to deny the operation early so that libcurl can bail out
  already before having to send any data. This is useful in authentication
  cases and others.

  However, many servers don't implement the Expect: stuff properly and if the
  server doesn't respond (positively) within 1 second libcurl will continue
  and send off the data anyway.

  You can disable libcurl's use of the Expect: header the same way you disable
  any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

This behaviour was detected by watching the captured traffic (in non-SSL
mode): after the initial HTTP headers (ending with the Expect one),
there was a ~1-2 second pause before curl sent the body.
Properly RTFM-ing would have saved ~1 day of digging around, but hey…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  Release Ganeti 2.2.0.1
  Bump version to 2.2.1~rc0

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Merge commit 'v2.2.0.1' into stable-2.2

* commit 'v2.2.0.1':
  Release Ganeti 2.2.0.1

Conflicts:
NEWS
  - merge
configure.ac
  - keep 2.2.1~rc0 version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Release Ganeti 2.2.0.1

2.2.0 was built with old autotools, and it's incompatible with Python
2.6. Rebuilding with a newer autotools version fixes this.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Change QA log output

Currently, the logging in QA doesn't show the duration of the various
steps, and if it is needed one has to perform log manipulation. This
patch changes the output so that the log information is line-based (as
opposed to block-based), such that it's easy to grep for all log lines:

./qa/ganeti-qa.py --yes-do-it qa.json 2>&1|grep ^----
---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------

or just for the duration of the steps:
./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json 2>&1|grep ^----.*time=
---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info ------

This will help with identifying slow steps or even graphing the QA
duration.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
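
Since the new format is line-based, finding slow steps takes only a few lines of code. The sketch below parses the `time=` lines shown in the commit message; the regular expression and the function name are assumptions derived from that sample output, not part of the QA suite.

```python
import re
from datetime import timedelta

# Matches e.g. "---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection"
_STEP_RE = re.compile(r"^---- \S+ \S+ time=(\d+):(\d\d):(\d\d)\.(\d+) (.*)$")


def step_durations(lines):
    """Return (step name, duration) pairs for all "time=" log lines."""
    result = []
    for line in lines:
        m = _STEP_RE.match(line.rstrip("\n"))
        if not m:
            continue  # "start" lines and ordinary output are skipped
        hours, minutes, seconds, frac, name = m.groups()
        duration = timedelta(hours=int(hours), minutes=int(minutes),
                             seconds=int(seconds),
                             microseconds=int(frac[:6].ljust(6, "0")))
        result.append((name.rstrip(" -"), duration))
    return result
```

Sorting the result by duration, or taking `max(..., key=lambda item: item[1])`, points directly at the slowest step.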

gnt-job cancel: Use non-zero exit status if canceling failed

This allows the use of “gnt-job cancel” in scripts.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

jqueue, CancelJob: Check status only once per call

This simplifies the code a bit--the status is only checked once.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Bump version to 2.2.1~rc0

Also update NEWS.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.2'

* devel-2.2:
  Try again to fix the inter-cluster move QA test

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Try again to fix the inter-cluster move QA test

This time, we re-establish the old pri/sec nodes correctly. Unfortunately this
now requires at least a 3-node cluster for DRBD instances, hence it's
somewhat suboptimal, but… The other option would be to simply move it from p:s
to s:p and then back to p:s, without involving a third node (for the DRBD case),
but I think that moving it to a completely separate node is slightly better for
testing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix a rare bug in StartDaemonChild and GenericMain

I've seen cases where the result from str(sys.exc_info()[1]) is ""; this
breaks the error reporting as the parent relies on non-empty error
messages to properly detect child status (otherwise it will try to read
the pid and fail, and so on).

While this has only ever happened for asserts, we need to ensure it
doesn't happen at all. Therefore we abstract this functionality (writing
the error message) and ensure we write a non-empty string in the new
function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
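
The empty-message case is easy to reproduce: `str(AssertionError())` is the empty string. A helper that always yields a non-empty message could be sketched as follows; the function name and the fallback text are hypothetical, not the ones used in the patch.

```python
def format_error(err):
    """Return a description of err that is guaranteed to be non-empty."""
    text = str(err)
    if not text:
        # Bare exceptions such as AssertionError() stringify to "".
        text = "<%s with no message>" % type(err).__name__
    return text
```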

Enhance the error reporting

Daemon startup errors will often be related to socket errors, so it
makes sense to change the original reporting:

  Error when starting daemon process: "(98, 'Address already in use')"

Into:

  Error when starting daemon process: 'Socket-related error: Address
  already in use (errno=98)'

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
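
The reformatted message can be reproduced with a short sketch. The function name is hypothetical, and the code is written for Python 3, where socket.error is an alias of OSError, so any OSError carrying an errno would be reported this way.

```python
import socket


def describe_startup_error(err):
    """Format socket errors as in the commit message above."""
    if isinstance(err, socket.error) and err.errno is not None:
        return "Socket-related error: %s (errno=%s)" % (err.strerror, err.errno)
    return str(err)
```

For `OSError(98, "Address already in use")` this produces the second form quoted in the commit message.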