ganeti-local
Apollon Oikonomopoulos [Fri, 15 Oct 2010 05:55:59 +0000 (08:55 +0300)]
http.client: Disable SSL session ID cache

This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a session ID context using SSL_set_session_id_context(3SSL),
yet cURL tries to re-use the session ID, and according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in
HttpBase; however, PyOpenSSL does not seem to implement
SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which
are used for these purposes (it seems that only M2Crypto's SSL module
supports them).
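For reference, a minimal sketch of the change with pycurl, assuming pycurl exposes libcurl's CURLOPT_SSL_SESSIONID_CACHE as pycurl.SSL_SESSIONID_CACHE (older builds may not, hence the check). The helper name disable_ssl_session_cache is hypothetical, and the module is passed in as a parameter so the sketch runs without pycurl installed.

```python
def disable_ssl_session_cache(curl, pycurl_mod):
    """Turn off SSL session ID re-use on a pycurl handle (hypothetical helper).

    Returns True if the option was set, or False if this pycurl build does
    not expose SSL_SESSIONID_CACHE (libcurl's CURLOPT_SSL_SESSIONID_CACHE).
    """
    option = getattr(pycurl_mod, "SSL_SESSIONID_CACHE", None)
    if option is None:
        return False
    # False/0 tells libcurl not to cache (and thus not re-use) SSL session IDs
    curl.setopt(option, False)
    return True


# With a real handle (requires pycurl):
#   import pycurl
#   curl = pycurl.Curl()
#   disable_ssl_session_cache(curl, pycurl)
```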

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 15 Oct 2010 14:19:54 +0000 (16:19 +0200)]
http.auth: Fix docstring error

This was missing from commit 2287b920.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 15 Oct 2010 14:19:08 +0000 (16:19 +0200)]
devnotes.rst: Remove hardcoded Python version

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 14 Oct 2010 12:29:24 +0000 (14:29 +0200)]
Merge branch 'stable-2.2'

* stable-2.2:
  Release 2.2.1~rc1
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Add missing --units in gnt-instance list man page
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 12:20:46 +0000 (14:20 +0200)]
Brown-bag fix for leftover comment

I forgot this in the original patch. Sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Thu, 14 Oct 2010 09:40:37 +0000 (11:40 +0200)]
Rework QA interaction with the watcher

The interaction with the cron-launched watcher is a well-known failure mode of QA:

---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance

For the following tests it's recommended to turn off the ganeti-watcher cronjob.

---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher

Error: Domain 'instance1' does not exist.
Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
  -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

In order to fix this, we disable the watcher during these tests and
re-enable it afterwards. To protect against the watcher being left
disabled, we enable it unconditionally at the start of QA (we do want it
enabled, in order to see the interaction between the watcher and the many
instance creation, disk replacement and similar jobs).

Note: even after this patch, if a cron-launched watcher was started and
is still running during the test, we'll have locking issues. I think this
is OK for now; we'll have to see how often that happens.
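The disable/re-enable pattern above can be sketched as a context manager that re-enables the watcher even when a test fails. This is an illustration, not the actual QA code: watcher_disabled, run_qa and the disable/enable hooks are hypothetical names, and how the hooks pause the watcher on the cluster is left to the caller.

```python
import contextlib


@contextlib.contextmanager
def watcher_disabled(disable, enable):
    """Keep the watcher off for the duration of a with-block.

    disable/enable are caller-supplied callables (hypothetical hooks).
    """
    disable()
    try:
        yield
    finally:
        # Re-enable even if the test inside the block raised
        enable()


def run_qa(tests, disable, enable):
    """Run (name, func, needs_watcher_off) tuples in order (hypothetical)."""
    enable()  # unconditionally start QA with the watcher enabled
    for _name, func, needs_watcher_off in tests:
        if needs_watcher_off:
            with watcher_disabled(disable, enable):
                func()
        else:
            func()
```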

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 08:44:22 +0000 (10:44 +0200)]
Add a new watcher option --ignore-pause

During cluster maintenance, when the watcher is disabled, it's useful to
run it just once. This is currently inconvenient, as the watcher needs
to be unpaused, run, and then paused again.

This patch adds an option “--ignore-pause” that can be used to ignore
the cluster-level setting. The man page is also updated, as it did not
list the available options.
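A rough sketch of how such an option could be wired up with optparse. Only the option name “--ignore-pause” comes from this commit; the help text, should_run and the paused input (standing in for the cluster-level pause state) are assumptions.

```python
import optparse


def build_parser():
    parser = optparse.OptionParser(prog="ganeti-watcher")
    parser.add_option("--ignore-pause", dest="ignore_pause", default=False,
                      action="store_true",
                      help="Ignore the cluster-level pause setting")
    return parser


def should_run(options, paused):
    """Decide whether this watcher run proceeds (hypothetical helper).

    paused stands in for the cluster-level pause state.
    """
    return options.ignore_pause or not paused
```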

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 10:58:06 +0000 (12:58 +0200)]
Release 2.2.1~rc1 (tag: v2.2.1rc1)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 14 Oct 2010 10:34:57 +0000 (12:34 +0200)]
Merge branch 'stable-2.2' into devel-2.2

* stable-2.2:
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 13:13:44 +0000 (15:13 +0200)]
Fix compatibility with Pyinotify 0.8

I didn't know why the code previously used
“pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from
“pyinotify.EventsCodes” directly. It turns out that Pyinotify 0.8 has
them in “pyinotify”, not “pyinotify.EventsCodes”.
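A minimal sketch of building an event mask through EventsCodes.ALL_FLAGS, assuming (as the earlier code did) that this dictionary is present in every supported Pyinotify version. inotify_mask is a hypothetical helper; the module is a parameter so the sketch can be tested with a stand-in object.

```python
import functools
import operator


def inotify_mask(pyinotify_mod, *names):
    """OR together inotify flags looked up by name (hypothetical helper).

    Uses EventsCodes.ALL_FLAGS instead of per-flag attributes, whose
    location (pyinotify vs. pyinotify.EventsCodes) differs by version.
    """
    flags = pyinotify_mod.EventsCodes.ALL_FLAGS
    return functools.reduce(operator.or_, (flags[n] for n in names), 0)


# With the real module:
#   import pyinotify
#   mask = inotify_mask(pyinotify, "IN_MODIFY", "IN_CLOSE_WRITE")
```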

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 10:55:45 +0000 (12:55 +0200)]
ganeti-rapi: Watch directory, not file for user file changes

We noticed several issues when watching just the file, among them race
conditions upon replacing the file using rename(2) (the new watcher
would be created too soon). Watching the directory for events on the
rapi_users file instead avoids these.

A nice side-effect is that now the users file is also reloaded if it
didn't exist upon ganeti-rapi's start (see the documentation update).

Since ganeti-rapi now becomes active for virtually every change in the
configuration directory (…/lib/ganeti), moving the rapi_users file to a
separate directory will be considered. It doesn't have to happen in or
before this patch, though.
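The core of the directory-watching approach is filtering directory events down to one file name. A sketch under assumptions: the handler class and callback are hypothetical, and events are represented by the bare file name rather than Pyinotify event objects.

```python
import os


class FileInDirectoryHandler(object):
    """Forward directory-level events for a single file name to a callback.

    Hypothetical class: because the parent directory is watched, a file
    replaced via rename(2), or created after startup, still triggers it.
    """

    def __init__(self, path, callback):
        self._name = os.path.basename(path)
        self._callback = callback

    def process(self, event_name):
        # event_name is the entry name reported for the watched directory
        if event_name != self._name:
            return False  # some other file in the directory changed
        self._callback()
        return True
```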

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 13 Oct 2010 10:43:27 +0000 (12:43 +0200)]
Extract base class from SingleFileEventHandler

The base class can contain code useful to other inotify users.
As it stands, “SingleFileEventHandler” cannot be used in ganeti-rapi,
so ganeti-rapi will use its own small inotify handler class based on
this base class.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 11 Oct 2010 12:11:19 +0000 (14:11 +0200)]
http.auth.ReadPasswordFile: Don't read file directly

Having the caller read the file before calling this function allows for
better error reporting.
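A sketch of the split, assuming a simple “user password [options]” line format with '#' comments; parse_password_data and read_password_file are hypothetical names, not the actual http.auth API. Since parsing only sees a string, I/O errors surface in the caller, where they can be reported together with the file name.

```python
def parse_password_data(data):
    """Parse password file contents that the caller has already read.

    Assumed format: one "user password [options]" entry per line, options
    comma-separated; blank lines and '#' comments are ignored.
    """
    users = {}
    for lineno, line in enumerate(data.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            raise ValueError("line %d: expected a user and a password" % lineno)
        options = []
        if len(parts) == 3:
            options = [opt.strip() for opt in parts[2].split(",")]
        users[parts[0]] = (parts[1], options)
    return users


def read_password_file(path):
    # File I/O lives in the caller, so a missing or unreadable file can be
    # reported separately from a malformed entry
    with open(path) as fh:
        return parse_password_data(fh.read())
```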

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 12 Oct 2010 09:39:51 +0000 (11:39 +0200)]
Move the parameter types to their own module

This is for cleanup, and for later reuse in other parts of the code
(outside of LUs).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 13 Oct 2010 10:29:26 +0000 (12:29 +0200)]
"Fix" handling of old software versions on startup

Currently, masterd startup with old software versions is very confusing
for users: we present two tracebacks, with a message in the middle about
"version mismatch". This can lead to users believing that all that needs
to be done is to fix the config file.

This patch attempts to improve the situation by handling this case in
masterd itself (not in the child) and showing a friendlier message.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 13 Oct 2010 10:44:55 +0000 (11:44 +0100)]
Require aclocal 1.11.1 or above for devel/release

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions, python.m4 is buggy, and the resulting package does
not work on Python 2.6 (which uses dist-packages rather than
site-packages as its module directory).

Version comparison is done component-by-component, over a bash array.
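The same component-by-component idea, expressed in Python for illustration (the commit does it in bash; parse_version and aclocal_ok are hypothetical names, and the input is assumed to be a bare version string such as “1.11.1”):

```python
def parse_version(text):
    """Split "1.11.1" into (1, 11, 1) so each component compares numerically."""
    return tuple(int(part) for part in text.split("."))


def aclocal_ok(found, required="1.11.1"):
    # Tuples compare element by element, like a loop over a bash array
    return parse_version(found) >= parse_version(required)
```

Because a shorter tuple that is a prefix of a longer one sorts first, “1.11” is treated as older than “1.11.1” and rejected.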

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 13 Oct 2010 10:09:45 +0000 (11:09 +0100)]
Revert "Require aclocal 1.11.1 or above for autogen.sh"

The comparison is incorrect, and the check also breaks daily work on
autobuilders and older distros.

This reverts commit dbc4dda7f5b66c9905c3cf6e44414536a5b38177.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 12 Oct 2010 15:33:53 +0000 (17:33 +0200)]
Export more information via LUQueryInstances/RAPI

Currently, the custom instance parameters (hv, be, nicp) are only
queryable via LUQueryInstanceData. LUQueryInstances returns only the
filled parameters, so its users (especially RAPI) have no way to know
whether a parameter is custom or the default value.

This patch adds three new fields, custom_hvparams, custom_beparams and
custom_nicparams, which are also exported via RAPI.
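To illustrate the difference between the filled and custom views, a minimal sketch assuming filled parameters are simply cluster defaults overlaid with the instance's own values; fill_params and instance_fields are hypothetical, and only the custom_* field naming comes from this commit.

```python
def fill_params(defaults, custom):
    """Effective parameters: cluster defaults overlaid by instance values."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


def instance_fields(defaults, custom):
    # The filled view alone cannot tell a default from an override;
    # the custom_* view carries only the values set on the instance
    return {"beparams": fill_params(defaults, custom),
            "custom_beparams": dict(custom)}
```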

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 12 Oct 2010 15:50:04 +0000 (17:50 +0200)]
Add missing --units in gnt-instance list man page

Also fixes some wrapping issues, and one typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit f8409165b4e6d24bd160ee6c85ba432ae8afa117)

Conflicts:

man/gnt-instance.sgml (re-wrapped)

Apollon Oikonomopoulos [Tue, 12 Oct 2010 15:08:06 +0000 (18:08 +0300)]
Set list of trusted SSL CAs for client to verify

As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs
advertised to SSL clients to include the server's own certificate. This
evidently fixes the pycurl/gnutls RPC client.

During the TLS handshake, when client verification is requested, the
server sends a CertificateRequest message which states that the client
should send a valid certificate as a response. The CertificateRequest
message contains a section called "certificate_authorities", which,
according to the standard, is a list of the Distinguished Names (DNs) of
acceptable certification authorities. The client uses this list to send
a certificate signed by one of the acceptable CAs.

Under OpenSSL's server implementation, this list must be set manually
using an appropriate call, otherwise it is empty. TLS 1.0[1] does not
state whether the list may be left empty, whereas TLS 1.1[2] and
1.2[3] state that if the list is empty, the client *may* send any
certificate of a valid type (valid types are specified elsewhere in
the handshake).

OpenSSL clients seem to obey the behaviour specified in TLS 1.1+,
whereas at least curl+GnuTLS does not send any certificates if the list
is empty (which is not wrong per the spec, but also evidently not
configurable).

[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Tue, 12 Oct 2010 15:30:50 +0000 (16:30 +0100)]
Require aclocal 1.11.1 or above for autogen.sh

1.11.1 is the version in squeeze and lucid, and we know it works. We
also know that 1.10.1 in hardy and lenny doesn't, nor do 1.10 in etch
and 1.9.6 in dapper. We haven't tested any other version.

With older versions, python.m4 is buggy, and the resulting package does
not work on Python 2.6 (which uses dist-packages rather than
site-packages as its module directory).

The autogen.sh interpreter is changed to bash, as we need the [[
builtin to compare versions with "<". [ doesn't have that functionality,
and of course we can't rely on dpkg, which won't be installed on all
distributions.
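This check was later reverted because the comparison is incorrect: “<” inside [[ ]] compares strings, not version numbers. A small Python illustration of the pitfall (bash uses the current locale's collation rather than Python's code-point order, but the core problem is the same; the function names are hypothetical):

```python
def string_less(a, b):
    # Character-by-character comparison, similar in spirit to [[ $a < $b ]]
    return a < b


def numeric_less(a, b):
    # Component-wise numeric comparison of dotted versions
    return tuple(map(int, a.split("."))) < tuple(map(int, b.split(".")))


print(string_less("1.9.6", "1.11.1"))   # False, although 1.9.6 is older
print(numeric_less("1.9.6", "1.11.1"))  # True
```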

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 12 Oct 2010 14:54:14 +0000 (16:54 +0200)]
Show instance state in instance console failures

The current message is not entirely clear, as it doesn't show the reason
why the instance is not running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 12 Oct 2010 13:28:58 +0000 (15:28 +0200)]
Fix epydoc errors

And sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 8 Oct 2010 14:03:17 +0000 (16:03 +0200)]
jqueue: Fix bug when cancelling jobs

If a job was cancelled while it was waiting for locks, an assertion
would've failed. This patch fixes the problem and provides a unit
test to check for this situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 8 Oct 2010 14:01:32 +0000 (16:01 +0200)]
jqueue: Resume jobs from “waitlock” status (2nd try)

Commit 5ef699a0e had to roll back an earlier attempt at implementing
this. With the improved job queue processor, this is finally possible.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 8 Oct 2010 13:59:30 +0000 (15:59 +0200)]
jqueue/gnt-job: Add job priority fields for display

These fields can help with debugging.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 8 Oct 2010 13:57:36 +0000 (15:57 +0200)]
mcpu: Raise directly in _AcquireLocks

Removes code duplication.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Tue, 12 Oct 2010 09:04:23 +0000 (11:04 +0200)]
Add prealloc_wipe_disks as a cluster-wide configuration variable

This is the first step towards supporting the wiping of block devices
before an instance is created.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Mon, 11 Oct 2010 13:16:15 +0000 (15:16 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  RPC: disable curl's Expect header

Conflicts:
lib/rpc.py (trivial, copyright header)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 11 Oct 2010 12:31:09 +0000 (14:31 +0200)]
RPC: disable curl's Expect header

This patch fixes the very slow (~8-9 seconds) gnt-instance modify
behaviour. More generally, it fixes slow RPC behaviour, but the problem
was most visible in that LU.

It seems that curl's behaviour with regard to file uploads (via PUT) and
the 'Expect' header interacts badly with our HTTP server.

First, our http server doesn't properly handle this header. According to
RFC 2616:

  Requirements for HTTP/1.1 origin servers: Upon receiving a request
  which includes an Expect request-header field with the "100-continue"
  expectation, an origin server MUST either respond with 100 (Continue)
  status and continue to read from the input stream, or respond with a
  final status code.

Our server doesn't do this, and hence it triggers this behaviour in curl
(from the curl FAQ):

  4.16 My HTTP POST or PUT requests are slow!

  libcurl makes all POST and PUT requests (except for POST requests with a
  very tiny request body) use the "Expect: 100-continue" header. This header
  allows the server to deny the operation early so that libcurl can bail out
  already before having to send any data. This is useful in authentication
  cases and others.

  However, many servers don't implement the Expect: stuff properly and if the
  server doesn't respond (positively) within 1 second libcurl will continue
  and send off the data anyway.

  You can disable libcurl's use of the Expect: header the same way you disable
  any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

This behaviour was detected by watching the captured traffic (in non-SSL
mode): there was a ~1-2 second pause between the initial HTTP headers
(ending with the Expect one) and curl sending the body.
Properly RTFM-ing would have saved ~1 day of digging around, but hey…
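With pycurl, the fix described above boils down to adding an empty “Expect:” header: per libcurl's CURLOPT_HTTPHEADER documentation, a header with nothing after the colon disables the internally generated one. A minimal sketch; disable_expect_header is a hypothetical helper, and the module is passed in so it runs without pycurl.

```python
def disable_expect_header(curl, pycurl_mod, extra_headers=()):
    """Stop libcurl from sending "Expect: 100-continue" (hypothetical helper).

    An empty "Expect:" entry in HTTPHEADER removes that header from the
    request; other custom headers must be passed in the same list.
    """
    headers = list(extra_headers) + ["Expect:"]
    curl.setopt(pycurl_mod.HTTPHEADER, headers)
    return headers


# With a real handle (requires pycurl):
#   import pycurl
#   curl = pycurl.Curl()
#   disable_expect_header(curl, pycurl, ["Content-Type: application/json"])
```

HTTPHEADER replaces the whole custom header list on each call, which is why the sketch merges any extra headers into a single setopt call.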

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Fri, 8 Oct 2010 17:46:05 +0000 (18:46 +0100)]
Merge branch 'devel-2.2'

* devel-2.2:
  Release Ganeti 2.2.0.1
  Bump version to 2.2.1~rc0

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Guido Trotter [Fri, 8 Oct 2010 17:31:29 +0000 (18:31 +0100)]
Merge commit 'v2.2.0.1' into stable-2.2

* commit 'v2.2.0.1':
  Release Ganeti 2.2.0.1

Conflicts:
NEWS
  - merge
configure.ac
  - keep 2.2.1~rc0 version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Guido Trotter [Fri, 8 Oct 2010 16:55:40 +0000 (17:55 +0100)]
Release Ganeti 2.2.0.1 (tag: v2.2.0.1)

2.2.0 was built with old autotools, and it's incompatible with Python
2.6. Rebuilding with a newer autotools version fixes this.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

Iustin Pop [Fri, 8 Oct 2010 12:40:30 +0000 (14:40 +0200)]
Change QA log output

Currently, the logging in QA doesn't show the duration of the various
steps, and obtaining it requires manipulating the logs. This patch
changes the output so that the log information is line-based (as
opposed to block-based), making it easy to grep for all log lines:

./qa/ganeti-qa.py --yes-do-it qa.json  2>&1|grep ^----
---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------

or just for the duration of the steps:
./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json  2>&1|grep ^----.*time=
---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info ------

This will help with identifying slow steps or even graphing the QA
duration.
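A sketch of producing lines in the format shown above with the standard datetime module; format_start and format_end are hypothetical names, and the dash padding of the real output is omitted.

```python
import datetime


def format_start(name, now):
    # str(datetime) gives "YYYY-MM-DD HH:MM:SS.ffffff"
    return "---- %s start %s" % (now, name)


def format_end(name, started, now):
    # str(timedelta) gives "H:MM:SS.ffffff", e.g. "0:00:01.426251"
    return "---- %s time=%s %s" % (now, now - started, name)


t0 = datetime.datetime(2010, 10, 8, 14, 40, 21, 730382)
t1 = datetime.datetime(2010, 10, 8, 14, 40, 23, 156633)
print(format_end("Test SSH connection", t0, t1))
# ---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
```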

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 7 Oct 2010 14:58:41 +0000 (16:58 +0200)]
gnt-job cancel: Use non-zero exit status if canceling failed

This allows “gnt-job cancel” to be used in scripts.
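A minimal sketch of the exit-status logic; cancel_jobs and the cancel_fn(job_id) -> (success, message) callback are hypothetical stand-ins for the real client call.

```python
def cancel_jobs(job_ids, cancel_fn, out=print):
    """Cancel each job and return a process exit status (hypothetical).

    cancel_fn(job_id) -> (success, message) stands in for the real call.
    """
    status = 0
    for job_id in job_ids:
        success, msg = cancel_fn(job_id)
        out(msg)
        if not success:
            status = 1  # any failure makes the whole command fail
    return status
```

A command-line wrapper would then end with sys.exit(cancel_jobs(...)), letting shell scripts check $?.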

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 7 Oct 2010 14:58:00 +0000 (16:58 +0200)]
jqueue, CancelJob: Check status only once per call

This simplifies the code a bit, as the status is now checked only once.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 7 Oct 2010 13:08:47 +0000 (15:08 +0200)]
Bump version to 2.2.1~rc0 (tag: v2.2.1rc0)

Also update NEWS.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 7 Oct 2010 12:17:10 +0000 (14:17 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Try again to fix the inter-cluster move QA test

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 7 Oct 2010 09:56:12 +0000 (11:56 +0200)]
Try again to fix the inter-cluster move QA test

This time, we re-establish the old pri/sec nodes correctly. Unfortunately,
this now requires at least a 3-node cluster for DRBD instances, hence it's
somewhat suboptimal, but… The other option would be to simply move it from
p:s to s:p and then back to p:s, without involving a third node (for the
DRBD case), but I think that moving it to a completely separate node is
slightly better for testing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 09:15:09 +0000 (11:15 +0200)]
Fix a rare bug in StartDaemonChild and GenericMain

I've seen cases where the result of str(sys.exc_info()[1]) is ""; this
breaks the error reporting, as the parent relies on non-empty error
messages to properly detect the child's status (otherwise it will try to
read the pid and fail, and so on).

While this has so far only happened with assertions, we need to ensure it
cannot happen at all. Therefore we abstract this functionality (writing
the error message) and ensure we write a non-empty string in the new
function.
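A sketch of the idea, assuming the error message is written to a pipe file descriptor; write_error is a hypothetical name. A bare AssertionError is one case where str() returns an empty string, so the sketch falls back to the exception's class name.

```python
import os


def write_error(fd, err):
    """Write a never-empty error description to fd (hypothetical helper).

    An empty message would be misread by the parent, so fall back to the
    exception class name when str(err) is "".
    """
    msg = str(err) or "%s (no message)" % err.__class__.__name__
    os.write(fd, msg.encode("utf-8"))
    return msg
```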

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 08:13:23 +0000 (10:13 +0200)]
Enhance the error reporting

Since daemon startup errors will often be socket-related, it makes sense
to change the original reporting:

  Error when starting daemon process: "(98, 'Address already in use')"

Into:

  Error when starting daemon process: 'Socket-related error: Address
  already in use (errno=98)'
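A sketch of such formatting; format_startup_error is a hypothetical name. On Python 3, socket.error is an alias of OSError, so this branch matches any OS-level error with an errno, not only socket ones.

```python
import socket


def format_startup_error(err):
    """Render a daemon startup error (hypothetical helper).

    socket.error (an alias of OSError on Python 3) gets a friendlier form
    that spells out the message and the errno.
    """
    if isinstance(err, socket.error) and err.errno is not None:
        return "Socket-related error: %s (errno=%s)" % (err.strerror, err.errno)
    return str(err)
```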

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 07:57:41 +0000 (09:57 +0200)]
Convert ganeti daemons to the three-stage startup

This makes almost all of the daemons show error messages, and not return
until they are listening on the appropriate sockets.

Masterd is the only special case, as it doesn't do enough initialization
at server creation time, only later.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 07:36:57 +0000 (09:36 +0200)]
Change daemon.GenericMain/utils.Daemonize workflow

This patch copies the pipe-based error reporting functionality from
utils.StartDaemon (I gave up for now on trying to merge the two).

This patch will fix two longstanding bugs:

- if we fork, we lose all error reporting from the child to the original
  parent
- if we fork, the original parent exits before the child is ready to
  "work" (whatever the work might be)

Both of these are fixed once the users of daemon.GenericMain are converted
to the three-stage setup, as we'll get error reporting via the pipe and
also not exit until the PrepFn is done.
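The pipe-based handshake can be sketched as follows. This is a simplified illustration, not the Ganeti code: start_daemon is hypothetical, and the real functions also double-fork, call setsid, write pid files and redirect file descriptors. The child writes an error message if preparation fails, or closes its end of the pipe without writing once it is ready; the parent blocks on the pipe until one of the two happens.

```python
import os


def start_daemon(prep_fn, exec_fn):
    """Fork a worker; return (pid, error) once it is ready or has failed.

    Hypothetical helper: error is "" when prep_fn succeeded. The caller
    should eventually reap the child with os.waitpid(pid, 0).
    """
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        code = 1
        try:
            os.close(rfd)
            try:
                state = prep_fn()
            except Exception as err:
                msg = str(err) or err.__class__.__name__
                os.write(wfd, msg.encode("utf-8"))
            else:
                os.close(wfd)  # EOF without data means "ready"
                exec_fn(state)
                code = 0
        finally:
            # Never fall back into the parent's code path
            os._exit(code)

    # Parent: block until the child writes an error or closes the pipe
    os.close(wfd)
    chunks = []
    while True:
        data = os.read(rfd, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(rfd)
    return pid, b"".join(chunks).decode("utf-8")
```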

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 5 Oct 2010 14:56:56 +0000 (16:56 +0200)]
Change utils.GenericMain protocol

Currently, GenericMain does a two-stage workflow:

- Check, before forking
- then Exec, after forking

This means there is no way to treat preparation work (before the daemon
is ready for work) differently from the actual work.

The patch adds another PreExec function that is run just before Exec,
and which should ensure that the daemon is ready to serve clients
before it returns. Its result is then passed as the third argument to
Exec.
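Ignoring forking, the resulting call order can be sketched like this (generic_main and the *_fn names are hypothetical; in the real code the fork sits between the check and preparation stages):

```python
def generic_main(check_fn, prep_fn, exec_fn, options, args):
    """Run the three stages in order (simplified, no forking).

    check_fn(options, args): validation before anything else happens
    prep_fn(options, args): setup; must not return until the daemon
                            can serve clients
    exec_fn(options, args, prep_result): the main work
    """
    check_fn(options, args)
    prep_result = prep_fn(options, args)
    # The preparation result is passed on as the third argument
    return exec_fn(options, args, prep_result)
```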

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 5 Oct 2010 12:30:40 +0000 (14:30 +0200)]
Use only one version of WritePidFile

This patch merges the pid file handling used for ganeti-* daemons and
impexp daemons. The latter version is used, since it's more reliable:
it uses locked pid files as opposed to checking for 'live' processes.
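A minimal sketch of a locked pid file using fcntl.lockf; write_pid_file is a hypothetical name. The lock, not the file's existence, marks the daemon as alive, and the kernel drops it when the process exits, so a stale file left behind by a crash doesn't block a restart.

```python
import fcntl
import os


def write_pid_file(path):
    """Write our pid into path while holding an exclusive lock on it.

    Hypothetical helper. The returned fd must stay open for the daemon's
    lifetime; note that POSIX record locks are also dropped if the process
    closes any other descriptor it holds for the same file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking: fail at once if another live process holds the lock
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    os.ftruncate(fd, 0)
    os.write(fd, ("%d\n" % os.getpid()).encode("ascii"))
    return fd
```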

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 5 Oct 2010 11:36:45 +0000 (13:36 +0200)]
Abstract daemon file descriptor setup

This makes some slight changes:

- Daemonize() doesn't explicitly close the file descriptors anymore, but
  only implicitly via the usage of dup2
- StartDaemonChild uses separate devnull descriptors for stdin (rdonly)
  and stdout/stderr (wronly), or, if using a log file, opens it in
  append mode

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 5 Oct 2010 11:05:50 +0000 (13:05 +0200)]
Abstract some daemon functionality

This patch abstracts the chdir/umask/setsid functionality, which is
identical in both functions, except that Daemonize did the chdir/umask
in the second child; with this change it does so in the first, as
StartDaemon does.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 12:38:44 +0000 (14:38 +0200)]
Merge branch 'devel-2.2' into master

* devel-2.2:
  QA: Fix instance move tests

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 6 Oct 2010 11:41:49 +0000 (13:41 +0200)]
QA: Fix instance move tests

The instance move tests were moving the instance from node pair (A,_) to
(B,A) and leaving it there. This patch makes sure that the first step
moves the instance to (B,A) but the second one moves it back to (A,B),
so that the instance is left on the same primary node.

The original secondary node is lost though, if I read the code
correctly.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 5 Oct 2010 15:03:40 +0000 (17:03 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Add simple unittest for utils.CommaJoin
  LUDelTags: Improve formatting of error message
  LUGetTags: Acquire locks in shared mode
  gnt-cluster: Replace hardcoded “xenvg” with value retrieved from master
  Export VG name via LUQueryConfigValues
  RAPI QA: Override MAC address when moving instance
  move-instance: Allow overriding instance parameters
  cli: Move parsing of --net option to separate function
  kvm: collapse two consecutive extend calls
  kvm: Introduce support for -mem-path

Conflicts:
test/ganeti.cli_unittest.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 1 Oct 2010 15:09:12 +0000 (17:09 +0200)]
Add simple unittest for utils.CommaJoin

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 1 Oct 2010 14:59:59 +0000 (16:59 +0200)]
LUDelTags: Improve formatting of error message

Use utils.CommaJoin to add spaces after commas, and clean up the code a bit.

Before: Tag(s) 'bar','baz','foo','moo' not found
After: Tag(s) 'bar', 'baz', 'foo', 'moo' not found
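A minimal equivalent of the joining behaviour shown above (comma_join is a hypothetical stand-in, not utils.CommaJoin itself):

```python
def comma_join(names):
    """Join items with ", " (hypothetical stand-in for utils.CommaJoin)."""
    return ", ".join(str(name) for name in names)


missing = ["bar", "baz", "foo", "moo"]
print("Tag(s) %s not found" % comma_join("'%s'" % tag for tag in missing))
# Tag(s) 'bar', 'baz', 'foo', 'moo' not found
```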

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 1 Oct 2010 14:59:41 +0000 (16:59 +0200)]
LUGetTags: Acquire locks in shared mode

Retrieving tags can be done while the lock is shared. Only writing
needs to be exclusive.

Also add a FIXME for cluster tags, where the code currently doesn't
use any locks except the config lock.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 1 Oct 2010 11:56:53 +0000 (13:56 +0200)]
gnt-cluster: Replace hardcoded “xenvg” with value retrieved from master

This fixes issue 125 (http://code.google.com/p/ganeti/issues/detail?id=125)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 1 Oct 2010 11:56:46 +0000 (13:56 +0200)]
Export VG name via LUQueryConfigValues

This will be used by LUXI client programs to display the VG name.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Sep 2010 15:34:51 +0000 (17:34 +0200)]
RAPI QA: Override MAC address when moving instance

This will make this test work again.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Sep 2010 15:31:36 +0000 (17:31 +0200)]
move-instance: Allow overriding instance parameters

When moving a single instance within the same cluster, the NIC
is not allowed to re-use an existing MAC address. To work around this,
the NIC parameters must be overridden. With this patch, BE, HV, OS and
NIC parameters can be overridden.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Sep 2010 15:30:05 +0000 (17:30 +0200)]
cli: Move parsing of --net option to separate function

This function will also be used in tools/move-instance.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoClean up Ganeti 2.3 design document
Michael Hanselmann [Thu, 30 Sep 2010 15:39:22 +0000 (17:39 +0200)]
Clean up Ganeti 2.3 design document

- Typos
- Fix capitalization
- Fix quoting in some places
- Rewrite part of the privilege separation section to
  match the subsection titles

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoQA: Enable all tests by default
Michael Hanselmann [Fri, 1 Oct 2010 13:24:30 +0000 (15:24 +0200)]
QA: Enable all tests by default

This patch enables all tests by default, unless they're explicitly
disabled in the config file. This makes sure newly added tests are run
even when an old configuration file is used.

A comment is also added to qa-sample.json.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoqa_config: Use ganeti.serializer for loading config
Michael Hanselmann [Wed, 29 Sep 2010 15:56:19 +0000 (17:56 +0200)]
qa_config: Use ganeti.serializer for loading config

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agokvm: collapse two consecutive extend calls
Guido Trotter [Tue, 5 Oct 2010 13:47:51 +0000 (14:47 +0100)]
kvm: collapse two consecutive extend calls

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agokvm: Introduce support for -mem-path
Miguel Di Ciurcio Filho [Tue, 5 Oct 2010 13:35:02 +0000 (10:35 -0300)]
kvm: Introduce support for -mem-path

Using hugepages, KVM instances can get a good performance boost. To
activate that, we need to pass the -mem-path argument to KVM along with
the mount point of the hugetlbfs file system on the node.

For the sake of memory availability computation, we use the -mem-prealloc
argument when enabling hugepages, so KVM will reserve all hugepages it
needs when it starts. This avoids allocating an instance on a node that
will not have enough pages in case another instance needs more than what
is available after it boots.
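The argument handling described above could look roughly like the following. `-m`, `-mem-path` and `-mem-prealloc` are real KVM/QEMU options. The `build_memory_args` helper and its `mem_path` parameter are hypothetical and do not reproduce the hypervisor module's actual code.

```python
def build_memory_args(memory_mb, mem_path=None):
    """Return KVM memory-related command-line arguments (hypothetical helper)."""
    args = ["-m", str(memory_mb)]
    if mem_path:
        # Back guest RAM with hugepages from the hugetlbfs mount point and
        # preallocate them at startup, so the pages are reserved up front.
        args.extend(["-mem-path", mem_path, "-mem-prealloc"])
    return args
```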

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoMerge branch 'devel-2.2'
Iustin Pop [Tue, 5 Oct 2010 09:14:17 +0000 (11:14 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  Rename the _oss cluster vars to _os

Conflicts:
lib/objects.py (trivial, strange that this one, and only this one, conflicted)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRename the _oss cluster vars to _os
Iustin Pop [Tue, 5 Oct 2010 09:07:28 +0000 (11:07 +0200)]
Rename the _oss cluster vars to _os

Per the mailing list discussion, rename _oss to _os, both in cluster parameters
and in the rest of the code.

This is just an s/_oss/_os, with the exception of a small bit of cleanup
around the helper_os function in cmdlib.py.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'devel-2.2'
Iustin Pop [Tue, 5 Oct 2010 08:32:14 +0000 (10:32 +0200)]
Merge branch 'devel-2.2'

* devel-2.2:
  gnt-job info: Sort input fields
  KVM: Add function to check the hypervisor version
  Bump version to 2.2.0, update NEWS
  Fix instance rename regression from 3fe11ba3
  Fix instance rename regression from 3fe11ba3
  Update RAPI documentation for /2/nodes/[node_name]/migrate
  Sort OS names and variants in LUDiagnoseOS
  Add some trivial QA tests for the new OS states
  Change behaviour of OpDiagnoseOS w.r.t. 'valid'
  Allow gnt-os modify to change the new OS params
  Add two more _T-type tests
  Add blacklisted/hidden OS support in LUDiagnoseOS
  Restrict blacklisted OSes in instance installation
  Add two new cluster settings
  Abstract OS name/variant functions
  Add OS new states to the design doc
  Remove the RPC changes from the 2.2 design
  Remove 'Detailed Design' from design-2.2.rst

Conflicts:
lib/cli.py
lib/cmdlib.py
lib/objects.py
scripts/gnt-os

All conflicts were trivial.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'stable-2.2' into devel-2.2
Michael Hanselmann [Mon, 4 Oct 2010 16:41:52 +0000 (18:41 +0200)]
Merge branch 'stable-2.2' into devel-2.2

* stable-2.2:
  Bump version to 2.2.0, update NEWS
  Fix instance rename regression from 3fe11ba3

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agognt-job info: Sort input fields
Michael Hanselmann [Mon, 4 Oct 2010 16:40:04 +0000 (18:40 +0200)]
gnt-job info: Sort input fields

This helps to find a value for complex opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoKVM: Add function to check the hypervisor version
Guido Trotter [Mon, 4 Oct 2010 15:22:53 +0000 (16:22 +0100)]
KVM: Add function to check the hypervisor version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoBump version to 2.2.0, update NEWS v2.2.0
Michael Hanselmann [Mon, 4 Oct 2010 14:58:08 +0000 (16:58 +0200)]
Bump version to 2.2.0, update NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix instance rename regression from 3fe11ba3
Iustin Pop [Thu, 30 Sep 2010 00:32:38 +0000 (20:32 -0400)]
Fix instance rename regression from 3fe11ba3

Commit 3fe11ba3 broke the instance rename as we don't use the FQDN
anymore. This fixes it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix instance rename regression from 3fe11ba3
Iustin Pop [Thu, 30 Sep 2010 00:32:38 +0000 (20:32 -0400)]
Fix instance rename regression from 3fe11ba3

Commit 3fe11ba3 broke the instance rename as we don't use the FQDN
anymore. This fixes it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoUpdate RAPI documentation for /2/nodes/[node_name]/migrate
Michael Hanselmann [Thu, 30 Sep 2010 16:16:34 +0000 (18:16 +0200)]
Update RAPI documentation for /2/nodes/[node_name]/migrate

This was forgotten in commit 52194140.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoSort OS names and variants in LUDiagnoseOS
Iustin Pop [Wed, 22 Sep 2010 07:34:12 +0000 (09:34 +0200)]
Sort OS names and variants in LUDiagnoseOS

The OS list and variants as returned from LUDiagnoseOS are not sorted,
and gnt-instance reinstall doesn't sort them either. This means that the
menu users are presented with is inconsistent across clusters, which is
confusing.

To make this consistent across all users of the LU, we sort the names in
the LU itself.
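Sorting inside the LU amounts to sorting both levels of the result. This is a sketch that assumes a hypothetical mapping from OS name to its list of variants, not LUDiagnoseOS's actual result format.

```python
def sorted_os_entries(oses):
    """Sort OS names, and each OS's variants, for a stable display order.

    `oses` maps an OS name to a list of variant names (hypothetical layout).
    """
    return [(name, sorted(oses[name])) for name in sorted(oses)]
```

Because the LU itself returns sorted data, every client (the reinstall menu, gnt-os list, RAPI users) sees the same order without sorting again.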

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd some trivial QA tests for the new OS states
Iustin Pop [Tue, 21 Sep 2010 11:50:07 +0000 (13:50 +0200)]
Add some trivial QA tests for the new OS states

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoChange behaviour of OpDiagnoseOS w.r.t. 'valid'
Iustin Pop [Tue, 21 Sep 2010 08:07:12 +0000 (10:07 +0200)]
Change behaviour of OpDiagnoseOS w.r.t. 'valid'

This patch changes the behaviour of OpDiagnoseOS with regard to the
'valid' field to be similar to the one for the hidden/blacklisted
fields: unless this field is requested, invalid OSes are filtered out.

The rationale is that, except for gnt-os info/diagnose, all other
users of this opcode are requesting the valid field just to filter out
invalid OSes, and not for any other use. Thus, changing this behaviour
makes these callers simpler.
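The "filter out unless the field is requested" rule for the valid, hidden and blacklisted fields can be sketched as below. The per-OS dict layout and the `diagnose_os` name are hypothetical. The real LU gathers this data from the nodes.

```python
def diagnose_os(oses, fields):
    """Return one row per OS for the requested fields.

    Each entry in `oses` is a dict with "name", "valid", "hidden" and
    "blacklisted" keys (hypothetical layout for this sketch).
    """
    rows = []
    for os_info in oses:
        # Invalid, hidden or blacklisted OSes are only returned when the
        # caller explicitly asks for the corresponding field.
        if not os_info["valid"] and "valid" not in fields:
            continue
        if os_info["hidden"] and "hidden" not in fields:
            continue
        if os_info["blacklisted"] and "blacklisted" not in fields:
            continue
        rows.append([os_info[field] for field in fields])
    return rows
```

A caller that only asks for `["name"]` therefore gets a list it can display directly, with no filtering of its own.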

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAllow gnt-os modify to change the new OS params
Iustin Pop [Tue, 21 Sep 2010 07:59:33 +0000 (09:59 +0200)]
Allow gnt-os modify to change the new OS params

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd two more _T-type tests
Iustin Pop [Tue, 21 Sep 2010 07:58:53 +0000 (09:58 +0200)]
Add two more _T-type tests

These are useful for more in-depth checking of some kinds of arguments.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAdd blacklisted/hidden OS support in LUDiagnoseOS
Iustin Pop [Mon, 20 Sep 2010 09:52:31 +0000 (11:52 +0200)]
Add blacklisted/hidden OS support in LUDiagnoseOS

This changes the behaviour of LUDiagnoseOS significantly.

The addition of hidden/blacklisted OSes would mean that each user-facing
client would have to explicitly filter such OSes from display, which is
not a good choice. Instead, the patch makes LUDiagnoseOS not return any
hidden or blacklisted OSes unless the hidden or blacklisted status,
respectively, is requested.

While unconventional, this makes `gnt-instance reinstall --select-os`
work as intended without any changes; the same goes for gnt-os list.
gnt-os diagnose/gnt-os info are changed to query for and display the
new fields.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRestrict blacklisted OSes in instance installation
Iustin Pop [Thu, 16 Sep 2010 15:42:51 +0000 (17:42 +0200)]
Restrict blacklisted OSes in instance installation

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd two new cluster settings
Iustin Pop [Thu, 16 Sep 2010 14:34:15 +0000 (16:34 +0200)]
Add two new cluster settings

The new variables are:

- a list of hidden OSes that should not be displayed to the users in
  interactive selection (e.g. reinstall); however, they can still be
  used if selected
- a list of OSes that should be hidden and blocked from install-time selection

The filtering will apply at pure OS name level, not OS+variant level.
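To make the difference between the two lists concrete, the sketch below applies both at the pure OS name level. It assumes a `name+variant` naming scheme with `+` as the separator. All function names here are hypothetical.

```python
VARIANT_DELIM = "+"  # assumed separator between OS name and variant


def pure_name(os_name):
    """Strip the variant, e.g. "debootstrap+lenny" -> "debootstrap"."""
    return os_name.split(VARIANT_DELIM, 1)[0]


def selectable_oses(all_oses, hidden, blacklisted):
    """OSes to offer in interactive selection (e.g. a reinstall menu)."""
    return [name for name in all_oses
            if pure_name(name) not in hidden
            and pure_name(name) not in blacklisted]


def check_install_allowed(os_name, blacklisted):
    """Hidden OSes may still be installed when named; blacklisted ones may not."""
    if pure_name(os_name) in blacklisted:
        raise ValueError("OS %r is blacklisted" % os_name)
```

Listing a pure name such as `debootstrap` therefore covers all of its variants at once.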

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAbstract OS name/variant functions
Iustin Pop [Thu, 16 Sep 2010 14:00:25 +0000 (16:00 +0200)]
Abstract OS name/variant functions

Currently, the computation of the 'pure' name or the variant is
hardcoded and spread around the functions that need it. This is not
nice, and in the future we'd spread it even more with more usage of
variants/pure os names.

This patch abstracts these functions into the OS class, and then
replaces the hardcoded uses with the new functions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd OS new states to the design doc
Iustin Pop [Tue, 21 Sep 2010 08:38:04 +0000 (10:38 +0200)]
Add OS new states to the design doc

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRemove the RPC changes from the 2.2 design
Iustin Pop [Tue, 21 Sep 2010 08:26:02 +0000 (10:26 +0200)]
Remove the RPC changes from the 2.2 design

These were not implemented.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRemove 'Detailed Design' from design-2.2.rst
Iustin Pop [Tue, 21 Sep 2010 08:24:18 +0000 (10:24 +0200)]
Remove 'Detailed Design' from design-2.2.rst

This also bumps up the rest of the headings.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agognt-debug: Test job submission as part of “test-jobqueue”
Michael Hanselmann [Fri, 24 Sep 2010 16:20:25 +0000 (18:20 +0200)]
gnt-debug: Test job submission as part of “test-jobqueue”

This checks whether jobs with invalid priorities are rejected.
At the same time it tests SubmitJob and SubmitManyJobs.
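The kind of check these tests exercise can be sketched as a simple range validation. The priority range below (lower number meaning higher priority) is an assumption for this example. The real bounds live in Ganeti's constants.

```python
# Assumed priority bounds; lower numbers mean higher priority.
OP_PRIO_HIGHEST = -20
OP_PRIO_LOWEST = 19


def validate_priority(priority):
    """Return `priority` if valid, otherwise raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(priority, bool) or not isinstance(priority, int)
            or not OP_PRIO_HIGHEST <= priority <= OP_PRIO_LOWEST):
        raise ValueError("Invalid priority %r" % (priority,))
    return priority
```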

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAvoid nodegroup name/uuid conflicts
Guido Trotter [Tue, 7 Sep 2010 09:43:35 +0000 (10:43 +0100)]
Avoid nodegroup name/uuid conflicts

Forbid nodegroups from having a name that matches the UUID regular
expression. Uppercase versions are forbidden as well.
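A sketch of such a name check follows. It assumes a lowercase-hex UUID pattern similar in spirit to the one moved to utils.py, and a hypothetical `check_group_name` helper. Lower-casing the name first is what catches the uppercase variants.

```python
import re

# Assumed lowercase UUID pattern (8-4-4-4-12 hex digits).
_UUID_RE = re.compile(
    r"^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$")


def check_group_name(name):
    """Reject nodegroup names that could be confused with a UUID."""
    # Lower-case first so that "5D5F1C3E-..." is rejected as well.
    if _UUID_RE.match(name.lower()):
        raise ValueError("Nodegroup name %r looks like a UUID" % name)
    return name
```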

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMove the uuid regexp to utils.py
Guido Trotter [Wed, 29 Sep 2010 10:19:06 +0000 (11:19 +0100)]
Move the uuid regexp to utils.py

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix docstring typo in jqueue._JobProcessor._MarkWaitlock
Michael Hanselmann [Fri, 24 Sep 2010 15:27:20 +0000 (17:27 +0200)]
Fix docstring typo in jqueue._JobProcessor._MarkWaitlock

epydoc complained:
“File …/ganeti/jqueue.py, line 886, in
ganeti.jqueue._JobProcessor._MarkWaitlock
  Warning: Redefinition of type for job”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue: Use priority for acquiring locks
Michael Hanselmann [Thu, 23 Sep 2010 16:36:22 +0000 (18:36 +0200)]
jqueue: Use priority for acquiring locks

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agomcpu: Implement priority for lock acquiring
Michael Hanselmann [Thu, 23 Sep 2010 16:35:37 +0000 (18:35 +0200)]
mcpu: Implement priority for lock acquiring

Until now the priority for lock acquires couldn't be passed
when running opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agolocking: Implement priority in Ganeti lock manager
Michael Hanselmann [Thu, 23 Sep 2010 13:01:33 +0000 (15:01 +0200)]
locking: Implement priority in Ganeti lock manager

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agolocking: Don't set default priority as keyword default
Michael Hanselmann [Thu, 23 Sep 2010 13:01:07 +0000 (15:01 +0200)]
locking: Don't set default priority as keyword default

This allows users of these classes to simply pass None if they want to use the
default value (the actual default is an internal constant), instead of
dynamically constructing the keyword arguments.
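The pattern described here, using `None` as the keyword default and resolving it to an internal constant inside the function, can be sketched as follows. The `acquire` and `run_opcode` functions and the constant's value are hypothetical.

```python
_DEFAULT_PRIORITY = 0  # internal default (value assumed for this sketch)


def acquire(names, timeout=None, priority=None):
    """Hypothetical acquire(); priority=None selects the internal default."""
    if priority is None:
        priority = _DEFAULT_PRIORITY
    return (tuple(names), timeout, priority)


def run_opcode(names, priority=None):
    # The caller can forward its own, possibly None, value unchanged instead
    # of building **kwargs only when a priority was actually given.
    return acquire(names, priority=priority)
```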

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agojqueue: Use timeout when acquiring locks
Michael Hanselmann [Wed, 22 Sep 2010 10:13:33 +0000 (12:13 +0200)]
jqueue: Use timeout when acquiring locks

As already noted in the design document, an opcode's priority is
increased when the lock(s) can't be acquired within a certain amount of
time, except at the highest priority, where in such a case a blocking
acquire is used.
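The escalation strategy can be sketched with a plain `threading.Lock`, which has no notion of priority. The sketch therefore only tracks the priority value that a real lock manager would use to order waiters. The bounds, the step size and the function name are assumptions.

```python
import threading

OP_PRIO_HIGHEST = -20  # assumed: lower number means higher priority


def acquire_with_escalation(lock, priority, timeout=0.1, step=10):
    """Acquire `lock`, raising the priority after each timed-out attempt.

    Returns the priority at which the lock was finally acquired.
    """
    while priority > OP_PRIO_HIGHEST:
        if lock.acquire(timeout=timeout):
            return priority
        # Timed out: bump the priority (never past the highest) and retry.
        priority = max(OP_PRIO_HIGHEST, priority - step)
    # At the highest priority, fall back to a blocking acquire.
    lock.acquire()
    return priority
```

Falling back to a blocking acquire at the top priority guarantees that a job eventually gets its locks instead of retrying forever.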

A unittest is provided. Priorities are not yet used for acquiring the
lock(s); this will need further changes in mcpu.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAdding design-doc for user separation
René Nussbaumer [Tue, 18 May 2010 12:39:18 +0000 (14:39 +0200)]
Adding design-doc for user separation

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMigrate call from backend._GetVGInfo to bdev.LogicalVolume.GetVGInfo
René Nussbaumer [Thu, 23 Sep 2010 11:23:51 +0000 (13:23 +0200)]
Migrate call from backend._GetVGInfo to bdev.LogicalVolume.GetVGInfo

This patch removes duplicate code from backend, which also needs to
get VG information; to simplify it, backend now uses
bdev.LogicalVolume.GetVGInfo.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agojqueue: Introduce per-opcode context object
Michael Hanselmann [Mon, 20 Sep 2010 17:31:36 +0000 (19:31 +0200)]
jqueue: Introduce per-opcode context object

This is a better way to group per-opcode data.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agomcpu: Adjust lock acquire strategy
Michael Hanselmann [Mon, 20 Sep 2010 16:15:03 +0000 (18:15 +0200)]
mcpu: Adjust lock acquire strategy

The changes to job queue processing require some changes on this class'
interface. LockAttemptTimeoutStrategy might move to another place, but that'll
be done in a later patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agomcpu.Processor: Raise exception on lock acquire timeout
Michael Hanselmann [Mon, 20 Sep 2010 16:13:52 +0000 (18:13 +0200)]
mcpu.Processor: Raise exception on lock acquire timeout

Right now the timeout is not passed by any caller, making the code
effectively go back to blocking acquires. Since the timeout is always
None, no caller needs to be changed in this patch.

This change also means that any LUXI query handled by ganeti-masterd
will use blocking acquires if they need locks (only the case for getting
tags).
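The behaviour described above (a `None` timeout blocks, any other value raises on expiry) can be sketched as follows. The exception and function names are modeled on the commit subject but are hypothetical here, not mcpu's exact code.

```python
import threading


class LockAcquireTimeout(Exception):
    """Raised when a lock could not be acquired within the given timeout."""


def acquire_or_raise(lock, timeout=None):
    """Acquire `lock`; None blocks indefinitely, otherwise raise on timeout."""
    if timeout is None:
        # No caller passes a timeout yet, so this is the path taken today.
        lock.acquire()
        return
    if not lock.acquire(timeout=timeout):
        raise LockAcquireTimeout("Could not acquire lock within %s seconds"
                                 % timeout)
```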

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agojqueue: Rename current_op to better reflect what it actually is
Michael Hanselmann [Mon, 20 Sep 2010 17:30:19 +0000 (19:30 +0200)]
jqueue: Rename current_op to better reflect what it actually is

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>