ganeti-local
14 years ago Introduce a micro type system for opcodes
Iustin Pop [Fri, 18 Jun 2010 15:45:15 +0000 (17:45 +0200)]
Introduce a micro type system for opcodes

Currently, we have one structural validation for opcode attributes: the
_OP_REQP, which checks that a given attribute is not 'None', and the
rest of the checks are done at runtime. This means our type system has
two types: None versus Not-None.

We have been hit many times by small, trivial bugs in this area, and
only a huge amount of unit tests and/or hand-written checks would ensure
that we cover all possibilities. This patch attempts to reduce the
need for manual checks by introducing a micro-type system for the
validation of the opcode attributes. What we lose, from the start, are
the custom error messages (e.g. "Invalid reboot mode, choose one of …",
or "The disk index must be a positive integer"). What we gain is the
ability to easily express things such as:

- this parameter must be None or an int
- this parameter must be a non-empty list
- this parameter must be either None or a list of dictionaries with keys
  from the list of valid hypervisors and the values dictionaries with
  keys strings and values either None or strings; furthermore, the list
  must be non-empty

These examples show that we have a composable (as opposed to just a few
static types) system, and that we can nest it a few times (just for
sanity; we could nest it up to stack depth).

We also gain lots of ))))))), which is not that nice :)

The current patch moves the existing _OP_REQP to the new framework, but
if accepted, a lot more validations should move to it. In the end, we
definitely should declare a type for all the opcode parameters
(eventually moving _OP_REQP directly to opcodes.py and validating in the
load/init case, and build __slots__ from it).
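
As a rough illustration of the composable checks described above (a
sketch only: the helper names TNone, TInt, TOr, TListOf and friends, and
the HYPERVISORS set, are hypothetical and not taken from this patch):

```python
# Hypothetical sketch of a composable "micro type system": every check is a
# callable taking a value and returning True/False, and combinators build
# new checks out of existing ones.

def TNone(val):
    return val is None

def TInt(val):
    # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(val, int) and not isinstance(val, bool)

def TString(val):
    return isinstance(val, str)

def TNonEmpty(val):
    return len(val) > 0

def TElemOf(items):
    return lambda val: val in items

def TOr(*checks):
    return lambda val: any(check(val) for check in checks)

def TAnd(*checks):
    # all() short-circuits, so later checks may rely on earlier ones
    return lambda val: all(check(val) for check in checks)

def TListOf(check):
    return lambda val: isinstance(val, list) and all(check(v) for v in val)

def TDictOf(key_check, val_check):
    return lambda val: (isinstance(val, dict) and
                        all(key_check(k) for k in val) and
                        all(val_check(v) for v in val.values()))

# Illustrative hypervisor names only
HYPERVISORS = frozenset(["xen-pvm", "kvm"])

# "None or an int"
none_or_int = TOr(TNone, TInt)

# The third, deeply nested example from the list above
hvp_list = TOr(TNone,
               TAnd(TListOf(TDictOf(TElemOf(HYPERVISORS),
                                    TDictOf(TString, TOr(TNone, TString)))),
                    TNonEmpty))
```

The nesting depth of the last definition also shows where the many
closing parentheses come from.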

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Some more CheckPrereq/CheckArguments cleanup
Iustin Pop [Fri, 18 Jun 2010 12:40:17 +0000 (14:40 +0200)]
Some more CheckPrereq/CheckArguments cleanup

For a few LUs, some of the tests in CheckPrereq, or even the whole
method, can be moved to CheckArguments, as they don't touch state and
only do a 'type' validation.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago LU.CheckPrereq: do not require implementation
Iustin Pop [Fri, 18 Jun 2010 10:18:26 +0000 (12:18 +0200)]
LU.CheckPrereq: do not require implementation

Currently, the base class LogicalUnit's CheckPrereq will raise
NotImplementedError, which means that the child LUs have to implement
it. However, many LUs don't actually have a need for this function
(hence the many "pass" statements as the only body).

By changing the base class behaviour, we can simplify many LUs.
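
A minimal sketch of the idea, using a heavily simplified stand-in for the
base class (LUExample and RunLU are hypothetical names used only for
illustration):

```python
class LogicalUnit(object):
    """Simplified stand-in for the real base class."""

    def CheckPrereq(self):
        # Default implementation: nothing to check. Subclasses override
        # this only when they actually have prerequisites.
        pass

    def Exec(self):
        raise NotImplementedError


class LUExample(LogicalUnit):
    """Hypothetical LU with no prerequisites; no empty override needed."""

    def Exec(self):
        return "done"


def RunLU(lu):
    # The caller can invoke CheckPrereq unconditionally
    lu.CheckPrereq()
    return lu.Exec()
```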

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Remove the obsolete EvacuateNode OpCode/LU
Iustin Pop [Fri, 18 Jun 2010 09:58:12 +0000 (11:58 +0200)]
Remove the obsolete EvacuateNode OpCode/LU

All code has been switched to the new-style LU… time for cleanup.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago RAPI: switch evacuate node to the new model
Iustin Pop [Fri, 18 Jun 2010 09:55:52 +0000 (11:55 +0200)]
RAPI: switch evacuate node to the new model

This patch removes the last use of the old-style OpEvacuateNode. It also
fixes the dry-run mode for this RAPI resource - the dry-run parameter
was not used at all before.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Abstract export mode validity check
Iustin Pop [Fri, 18 Jun 2010 09:36:14 +0000 (11:36 +0200)]
Abstract export mode validity check

The export mode is checked in two places with the exact same code…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Cleanup LU.ExpandNames versus CheckArguments
Iustin Pop [Fri, 18 Jun 2010 09:22:30 +0000 (11:22 +0200)]
Cleanup LU.ExpandNames versus CheckArguments

When LogicalUnit.CheckArguments was introduced, not all code dealing
with static argument checking was moved to it; many of these checks were
left in ExpandNames. With time, most of them migrated, and this patch
does the final cleanups.

The patch is straightforward, with the exception of LURebootInstance,
where an old style ParameterError exception is converted to the new
OpPrereqError with ECODE_INVAL.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Move opcode attribute defaults to data structures
Iustin Pop [Fri, 18 Jun 2010 06:17:00 +0000 (08:17 +0200)]
Move opcode attribute defaults to data structures

LUExportInstance had two opcode fields set to default via both
_CheckBooleanOpField and getattr(…, False).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add OS verification support to cluster verify
Iustin Pop [Tue, 15 Jun 2010 22:40:44 +0000 (00:40 +0200)]
Add OS verification support to cluster verify

For this, we needed to extend the NodeImage class with a few extra
variables, and we do a trick in the node verification where we pick the
first node that returned valid OS data as the reference node, and then
we compare all other nodes against it.

The checks added are:

- consistency of DiagnoseOS responses
- multiple paths for an OS
- inconsistent OS between a reference node and the current node
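
The reference-node trick can be sketched roughly as follows (the function
name, the input layout and the message texts are assumptions made for this
illustration, not the actual NodeImage code):

```python
def verify_os(node_os):
    """Compare per-node OS data against a reference node.

    node_os maps a node name to a dict of OS name -> list of paths, or to
    None if the node did not return valid OS data (assumed layout).
    """
    errors = []
    ref_node = None
    for node in sorted(node_os):
        os_map = node_os[node]
        if os_map is None:
            continue  # no valid OS data from this node
        # An OS found under more than one path is reported
        for os_name, paths in sorted(os_map.items()):
            if len(paths) > 1:
                errors.append("%s: OS %s has multiple paths" %
                              (node, os_name))
        if ref_node is None:
            # First node with valid data becomes the reference
            ref_node = node
            continue
        # OSes present on only one of (reference node, current node)
        ref_map = node_os[ref_node]
        for os_name in sorted(set(ref_map) ^ set(os_map)):
            errors.append("%s: OS %s differs from reference node %s" %
                          (node, os_name, ref_node))
    return errors
```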

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Simplify gnt-os diagnose output
Iustin Pop [Wed, 16 Jun 2010 02:56:48 +0000 (04:56 +0200)]
Simplify gnt-os diagnose output

Currently, we always list the api/variants, even if these are empty.
This patch changes this so that we make a clear distinction for empty values
("[no variants]" versus "[variants: ]"), and we only list variants and
parameters when the OS API indicates they should be supported.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add a new gnt-os info command
Iustin Pop [Mon, 14 Jun 2010 17:42:15 +0000 (19:42 +0200)]
Add a new gnt-os info command

This can be used to show the actual OS parameters and supported
variants, in a global manner (rather than per-node as gnt-os diagnose).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago LUDiagnoseOS: add more fields, cleanup
Iustin Pop [Mon, 14 Jun 2010 23:45:20 +0000 (01:45 +0200)]
LUDiagnoseOS: add more fields, cleanup

This patch exports, all the way from the backend, a new field ‘api_version’
which holds the list of supported API versions, and exposes the (already
computed) ‘parameters’ field.

The patch also reworks (again) the field calculation in its Exec()
method. All callers of LUDiagnoseOS pass in the 'valid' and 'variants'
parameters, so special-casing whether or not to compute the validity
seems overkill. We move to a model where we always compute these
across-nodes arguments, in order to simplify the code, and we also
change the parameters set to be the intersection of all nodes' values (which
means a change in description will drop the parameter from the list of
parameters).

Additionally, we update scripts/gnt-os, which was broken for multi-dir
OSes since the introduction of variants…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add support for OS parameters during import/export
Iustin Pop [Sun, 13 Jun 2010 06:18:52 +0000 (08:18 +0200)]
Add support for OS parameters during import/export

Nothing special here, just copy/adjust the beparams code.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add support for modifying instance OS parameters
Iustin Pop [Sun, 13 Jun 2010 05:57:15 +0000 (07:57 +0200)]
Add support for modifying instance OS parameters

We move the instance OS rename checks earlier, as we need to run the
validation against the new OS, if it has changed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add support for modifying cluster OS parameters
Iustin Pop [Sun, 13 Jun 2010 00:48:34 +0000 (02:48 +0200)]
Add support for modifying cluster OS parameters

We use _GetUpdatedParams in order to support removal too, and then
validate the OS parameters if the OS exists.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago _GetUpdatedParams: enhance value removal options
Iustin Pop [Sun, 13 Jun 2010 22:01:53 +0000 (00:01 +0200)]
_GetUpdatedParams: enhance value removal options

This patch adds controls for whether we recognize
constants.VALUE_DEFAULT or not as a default value, and also adds
dash-prefixes as another way for parameter removal.
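
A rough sketch of the two removal styles, under the assumption that
"dash-prefixes" means a key written as "-name" (the function and argument
names here are hypothetical, and VALUE_DEFAULT is a stand-in for
constants.VALUE_DEFAULT):

```python
VALUE_DEFAULT = "default"  # stand-in for constants.VALUE_DEFAULT


def get_updated_params(old_params, update, use_default=True, use_dash=False):
    """Return a copy of old_params with the update applied.

    use_default: treat VALUE_DEFAULT as "remove this key" (assumed name)
    use_dash: treat a key of the form "-name" as "remove name" (assumed
      interpretation of the dash-prefix removal)
    """
    new_params = dict(old_params)  # never modify the caller's dict
    for key, value in update.items():
        if use_dash and key.startswith("-"):
            new_params.pop(key[1:], None)
        elif use_default and value == VALUE_DEFAULT:
            new_params.pop(key, None)
        else:
            new_params[key] = value
    return new_params
```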

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add support for OS parameters during instance add
Iustin Pop [Sat, 12 Jun 2010 07:11:50 +0000 (09:11 +0200)]
Add support for OS parameters during instance add

This is not yet complete, as it lacks proper support for instance
import.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Show OS parameters in cluster/instance info
Iustin Pop [Sat, 12 Jun 2010 06:49:29 +0000 (08:49 +0200)]
Show OS parameters in cluster/instance info

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add OS parameters to cluster and instance objects
Iustin Pop [Sat, 12 Jun 2010 02:17:20 +0000 (04:17 +0200)]
Add OS parameters to cluster and instance objects

The patch also modifies the instance RPC calls to fill the osparameters
correctly with the cluster defaults, and exports the OS parameters in
the instance/OS environment.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Introduce an RPC call for OS parameters validation
Iustin Pop [Sat, 12 Jun 2010 02:13:29 +0000 (04:13 +0200)]
Introduce an RPC call for OS parameters validation

While we only support the 'parameters' check today, the RPC call is
generic enough that it will be able to support other checks in the future.
The backend function will both validate the parameters list (so as to
make sure we don't pass in extra parameters that the OS validation
doesn't care about) and the parameter values, via the OS verify script.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add reading of OS parameters from disk
Iustin Pop [Sat, 12 Jun 2010 02:02:06 +0000 (04:02 +0200)]
Add reading of OS parameters from disk

The patch also modifies the internal methods in LUDiagnoseOS and gnt-os
to deal with the format change of call_os_diagnose.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add os api v20 and related fields to the OS object
Iustin Pop [Sat, 12 Jun 2010 01:55:59 +0000 (03:55 +0200)]
Add os api v20 and related fields to the OS object

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Silence a pylint warning
Iustin Pop [Wed, 16 Jun 2010 02:22:51 +0000 (04:22 +0200)]
Silence a pylint warning

The OS parameters code will bump the number of lines over 10K, and thus
we need to silence this (no, we don't want any other module to become
this big…, so we use a targeted silence only).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Update the 2.2 design doc with OS parameters
Iustin Pop [Wed, 23 Jun 2010 04:29:57 +0000 (06:29 +0200)]
Update the 2.2 design doc with OS parameters

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Remove job object condition
Guido Trotter [Tue, 22 Jun 2010 09:46:05 +0000 (11:46 +0200)]
Remove job object condition

We don't need it anymore, since nobody waits on it.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Parallelize WaitForJobChanges
Guido Trotter [Tue, 15 Jun 2010 16:39:30 +0000 (17:39 +0100)]
Parallelize WaitForJobChanges

As with QueryJobs, we rely on file updates rather than condition
notification to acquire job changes. In order to do that we use the
pyinotify module to watch files. This might make the client a bit slower
(pending planned improvements, such as subscription-based
WaitForJobChanges) but detaches it from the job execution.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Update the job file on feedback
Guido Trotter [Tue, 22 Jun 2010 09:02:10 +0000 (11:02 +0200)]
Update the job file on feedback

This is needed to convert waitforjobchanges to use inotify and the
on-disk version and decouple it from the job queue lock. No replication
to remote nodes is done, to keep the operation fast.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Don't lock on QueryJobs, by using the disk version
Guido Trotter [Mon, 14 Jun 2010 10:23:52 +0000 (11:23 +0100)]
Don't lock on QueryJobs, by using the disk version

We move from querying the in-memory version to loading all jobs from the
disk. Since the jobs are written/deleted on disk in an atomic manner, we
don't need to lock at all. Also, since we're just looking at the
contents of a directory, we don't need to check that the job queue is
"open".

If some jobs are removed between when we listed them and us loading
them, we need to be able to cope: if we were asked to load those jobs
specifically, we must report the failure, but if we were just asked to
"load all" we shall just not consider them as part of the "all" set,
since they were deleted.
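
The handling of jobs that vanish between listing and loading can be
sketched like this (a simplified model: the "job-<id>" file naming, the
JSON format and both function names are assumptions for the example, and
a failed load is reported as None):

```python
import json
import os


def safe_load_job(queue_dir, job_id):
    """Load one job file; return None if it is missing or unreadable."""
    path = os.path.join(queue_dir, "job-%s" % job_id)
    try:
        with open(path) as fd:
            return json.load(fd)
    except (IOError, OSError, ValueError):
        return None


def query_jobs(queue_dir, job_ids=None):
    """Query jobs straight from disk, without taking any lock."""
    if job_ids is None:
        # "Load all": jobs deleted after the listing are simply not part
        # of the "all" set.
        prefix = "job-"
        ids = sorted(name[len(prefix):] for name in os.listdir(queue_dir)
                     if name.startswith(prefix))
        jobs = [safe_load_job(queue_dir, job_id) for job_id in ids]
        return [job for job in jobs if job is not None]
    # Specific jobs requested: keep one entry per requested ID, so that a
    # failure (None) is visible to the caller.
    return [safe_load_job(queue_dir, job_id) for job_id in job_ids]
```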

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Add JobQueue.SafeLoadJobFromDisk
Guido Trotter [Tue, 22 Jun 2010 09:35:32 +0000 (11:35 +0200)]
Add JobQueue.SafeLoadJobFromDisk

This will be used to read a job file without having to deal with
exceptions from _LoadJobFromDisk.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago jqueue._LoadJobFromDisk: remove safety archival
Guido Trotter [Tue, 22 Jun 2010 09:19:47 +0000 (11:19 +0200)]
jqueue._LoadJobFromDisk: remove safety archival

Currently _LoadJobFromDisk archives job files it finds corrupted. Since
we want to use it to load files without holding locks, this could cause
a conflict: we just move the feature to _LoadJobUnlocked which is always
called with the lock held.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Add repetition count to the TestDelay opcode
Guido Trotter [Wed, 23 Jun 2010 07:50:26 +0000 (09:50 +0200)]
Add repetition count to the TestDelay opcode

If the repetition count is not passed or is passed as 0 we sleep exactly
one time, otherwise we sleep "repeat" times and log in between.
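
The described behaviour, as a small stand-alone sketch (run_test_delay and
its sleep/log arguments are hypothetical; they are injectable only so the
example can be exercised without actually sleeping):

```python
import time


def run_test_delay(duration, repeat=0, sleep=time.sleep,
                   log=lambda msg: None):
    """Sleep once if repeat is unset or 0, otherwise 'repeat' times.

    Returns the number of sleeps performed.
    """
    if not repeat:
        sleep(duration)
        return 1
    for i in range(repeat):
        log("Test delay iteration %d/%d" % (i, repeat - 1))
        sleep(duration)
    return repeat
```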

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Merge branch 'devel-2.1'
Iustin Pop [Tue, 22 Jun 2010 13:25:55 +0000 (15:25 +0200)]
Merge branch 'devel-2.1'

* devel-2.1:
  Add "adopt" to the allowed disk parameters
  Improve pylintrc for pylint 0.21+
  Fix warnings with Python 2.6
  Fix a small bug introduced in cf26a87a
  Fix the type of 'valid' attribute in LUDiagnoseOS

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add "adopt" to the allowed disk parameters
Apollon Oikonomopoulos [Fri, 18 Jun 2010 14:52:05 +0000 (17:52 +0300)]
Add "adopt" to the allowed disk parameters

"adopt" was missing from bd061c3, thus breaking disk adoption.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Improve pylintrc for pylint 0.21+
Iustin Pop [Tue, 22 Jun 2010 09:48:36 +0000 (11:48 +0200)]
Improve pylintrc for pylint 0.21+

While we'll need to update the source files too, at least this change
makes pylint 0.21 not fail on the current source tree.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Fix warnings with Python 2.6
Iustin Pop [Tue, 22 Jun 2010 09:38:23 +0000 (11:38 +0200)]
Fix warnings with Python 2.6

'format' is a new built-in function, and 'bytes' is a new built-in type.
We rename the variables shadowing them to make pylint happy (and remove
potential bugs).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Fix a small bug introduced in cf26a87a
Iustin Pop [Fri, 18 Jun 2010 12:30:48 +0000 (14:30 +0200)]
Fix a small bug introduced in cf26a87a

Commit cf26a87a added a tiny typo, which would break non-FQDN arguments
to modify node storage.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Fix the type of 'valid' attribute in LUDiagnoseOS
Iustin Pop [Mon, 14 Jun 2010 20:09:23 +0000 (22:09 +0200)]
Fix the type of 'valid' attribute in LUDiagnoseOS

The update of the valid status in LUDiagnoseOS says:

  valid = valid and osl and osl[0][1]

However, in Python, “True and []” ('[]' being what we get for an invalid
OS) results in “[]”, and thus the valid field for an OS will be either
True or an empty list, which is not what we want…
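
The effect is easy to reproduce in isolation. In the sketch below the
first function mirrors the quoted expression, and the second shows one
possible way to force a boolean (an illustration, not necessarily the
change made by this patch):

```python
def update_valid(valid, osl):
    # Mirrors the quoted expression: 'and' returns one of its operands,
    # not necessarily a bool
    return valid and osl and osl[0][1]


def update_valid_bool(valid, osl):
    # One possible fix: coerce the result to a real boolean
    return bool(valid and osl and osl[0][1])
```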

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Merge branch 'stable-2.1'
Guido Trotter [Fri, 18 Jun 2010 10:41:30 +0000 (11:41 +0100)]
Merge branch 'stable-2.1'

* stable-2.1:
  Bump up version for the 2.1.4 release
  Update NEWS about the latest 2.1 change
  Fix handling of errors from socket.gethostbyname
  Update a comment in qa-sample.json
  RAPI client: Add support for Python 2.6
  Update NEWS for Ganeti 2.1.4

Conflicts:
NEWS: keep both
configure.ac: keep the 2.2 version
qa/qa-sample.json: merge nearby changes

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Bump up version for the 2.1.4 release v2.1.4
Guido Trotter [Thu, 17 Jun 2010 15:09:26 +0000 (16:09 +0100)]
Bump up version for the 2.1.4 release

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Update NEWS about the latest 2.1 change
Guido Trotter [Thu, 17 Jun 2010 17:06:22 +0000 (18:06 +0100)]
Update NEWS about the latest 2.1 change

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Fix handling of errors from socket.gethostbyname
Iustin Pop [Wed, 16 Jun 2010 03:16:05 +0000 (05:16 +0200)]
Fix handling of errors from socket.gethostbyname

Socket functions can raise more than just gaierror. Most of the time,
socket.gethostbyname_ex will raise gaierror, but occasionally it will
raise herror instead. For completeness, we catch all socket exceptions
with data of type (code, description).
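
A sketch of such a catch-all handler (ResolveError, resolve and the
injectable resolver argument are hypothetical names for this example;
only the socket module calls and exception classes are real):

```python
import socket


class ResolveError(Exception):
    """Raised with (hostname, code, description) on resolution failure."""


def resolve(hostname, resolver=socket.gethostbyname_ex):
    try:
        return resolver(hostname)
    except (socket.gaierror, socket.herror, socket.error) as err:
        # These exceptions normally carry a (code, description) pair
        if len(err.args) == 2:
            code, description = err.args
        else:
            code, description = None, str(err)
        raise ResolveError(hostname, code, description)
```

Listing gaierror and herror explicitly is redundant where both derive
from socket.error, but it documents the cases the text mentions.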

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Update a comment in qa-sample.json
Guido Trotter [Thu, 17 Jun 2010 16:53:56 +0000 (17:53 +0100)]
Update a comment in qa-sample.json

Fix the sentence to say what it means.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago gnt-debug: remove @todo from GenericOpCodes
Guido Trotter [Thu, 17 Jun 2010 14:46:09 +0000 (15:46 +0100)]
gnt-debug: remove @todo from GenericOpCodes

- the function is not broken, and we're using it nowadays
- we have example json files and all, which show its usage
=> the todo is incorrect

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago jqueue.AddManyJobs: use AddManyTasks
Guido Trotter [Thu, 17 Jun 2010 13:02:20 +0000 (14:02 +0100)]
jqueue.AddManyJobs: use AddManyTasks

Rather than adding the jobs to the worker pool one at a time, we add
them all together, which is slightly faster, and ensures they don't get
started while we loop.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Workerpool.AddManyTasks: check tasks type
Guido Trotter [Thu, 17 Jun 2010 13:32:55 +0000 (14:32 +0100)]
Workerpool.AddManyTasks: check tasks type

Each task has to be a sequence, or the RunTask call will fail.
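
Roughly, the check could look like the following (MiniPool is a
hypothetical, much-reduced stand-in for the worker pool; it only queues
tasks and does not run them):

```python
import collections
import threading


class MiniPool(object):
    """Reduced stand-in for a worker pool: queues tasks, runs nothing."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = collections.deque()

    def AddTask(self, *args):
        with self._lock:
            self._tasks.append(args)

    def AddManyTasks(self, tasks):
        tasks = list(tasks)  # allow generators; we iterate twice
        # Validate everything before queueing anything
        for task in tasks:
            if not isinstance(task, (tuple, list)):
                raise TypeError("Each task must be a sequence, got %r" %
                                (task, ))
        # Add all tasks under a single lock acquisition
        with self._lock:
            self._tasks.extend(tasks)
```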

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago count the number of tasks done in the wp unittest
Guido Trotter [Thu, 17 Jun 2010 13:02:32 +0000 (14:02 +0100)]
count the number of tasks done in the wp unittest

Currently there's no way to know if something actually gets done.
After this check we actually test that the threads do their job.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago RAPI client: Add support for Python 2.6
Michael Hanselmann [Thu, 17 Jun 2010 14:48:43 +0000 (16:48 +0200)]
RAPI client: Add support for Python 2.6

The httplib module used by urllib2 requires its sockets to have a
makefile() method to provide a file-like interface (or rather
file-in-Python-like) to the socket. PyOpenSSL doesn't implement
makefile() as the semantics require files to call dup(2) on the
underlying file descriptors, something not easily done on SSL sockets.

Python versions up to and including 2.5 have a class to simulate makefile(),
httplib.FakeSocket. With the addition of SSL support in Python 2.6, this
class was deprecated and no longer functions.

This patch adds a new, simpler wrapper class which is used in Python 2.6
and above only. It's good enough for this use.

There are general problems in these generic wrapper classes--none of
them handles SSL I/O properly. They break, for example, when the server
requests a renegotiation. This will need more work.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Bump RPC protocol version to 40
Michael Hanselmann [Thu, 17 Jun 2010 12:14:19 +0000 (14:14 +0200)]
Bump RPC protocol version to 40

Many RPC calls have changed in Ganeti 2.2, hence bumping the RPC protocol
version.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Change ganeti-cleaner unittest to not use random values
Michael Hanselmann [Thu, 17 Jun 2010 12:12:16 +0000 (14:12 +0200)]
Change ganeti-cleaner unittest to not use random values

Using random values in unittests isn't good. This one broke exactly
when building the 2.2.0~beta0 release. I suspect there were duplicate
job IDs generated (due to $large being not so large).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Update NEWS for Ganeti 2.1.4
Guido Trotter [Thu, 17 Jun 2010 11:06:36 +0000 (12:06 +0100)]
Update NEWS for Ganeti 2.1.4

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Bump version to 2.2.0~beta0 v2.2.0beta0
Michael Hanselmann [Thu, 17 Jun 2010 09:40:03 +0000 (11:40 +0200)]
Bump version to 2.2.0~beta0

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Fix parameter names in SimpleFillBE/NIC docstrings
Guido Trotter [Thu, 17 Jun 2010 10:08:53 +0000 (11:08 +0100)]
Fix parameter names in SimpleFillBE/NIC docstrings

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago AsyncAwaker: use shutdown on the socketpair
Guido Trotter [Thu, 17 Jun 2010 08:42:36 +0000 (09:42 +0100)]
AsyncAwaker: use shutdown on the socketpair

This makes sure the out_socket can only be used for writing, and the
in_socket for reading.
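
In isolation, the technique looks roughly like this (make_awaker_sockets
is a hypothetical helper; only the socket module calls are real, and the
exact error seen on misuse may vary by platform):

```python
import socket


def make_awaker_sockets():
    """Return (in_socket, out_socket), each restricted to one direction."""
    in_socket, out_socket = socket.socketpair()
    # in_socket is only for reading: disallow further sends on it
    in_socket.shutdown(socket.SHUT_WR)
    # out_socket is only for writing: disallow further receives on it
    out_socket.shutdown(socket.SHUT_RD)
    return in_socket, out_socket
```

Writing on out_socket and reading on in_socket keeps working, while an
accidental send on in_socket fails instead of going unnoticed.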

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago WorkerPool.AddManyTasks
Guido Trotter [Thu, 17 Jun 2010 08:15:17 +0000 (09:15 +0100)]
WorkerPool.AddManyTasks

Useful if we want to add many tasks at once, without the tasks added
first starting (and contending with us) while we add the rest.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago jqueue: make replication on job update optional
Guido Trotter [Thu, 17 Jun 2010 07:44:25 +0000 (08:44 +0100)]
jqueue: make replication on job update optional

Sometimes it's useful to write to the local filesystem, but immediate
replication to all master candidates is not needed.

The _WriteAndReplicateFileUnlocked function gets renamed to
_UpdateJobQueueFile, as calling "write and replicate, but don't
replicate" seemed a bit strange.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago s/queue._GetJobInfoUnlocked/job.GetInfo/
Guido Trotter [Tue, 15 Jun 2010 11:08:41 +0000 (12:08 +0100)]
s/queue._GetJobInfoUnlocked/job.GetInfo/

The job queue currently has a static _GetJobInfoUnlocked method.
Changing it to be a normal method of _QueuedJob, which makes more sense.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Abstract loading job file from disk
Guido Trotter [Tue, 15 Jun 2010 10:17:24 +0000 (11:17 +0100)]
Abstract loading job file from disk

Move the work from _LoadJobUnlocked to _LoadJobFileFromDisk, which can
then be used in other contexts as well. Also, if we fail to deserialize
the job, archive it as well (before, we archived it only if we failed to
create the related object, but kept it there if deserialization failed).

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years ago Makefile: Add support for local Makefile additions
Michael Hanselmann [Thu, 17 Jun 2010 09:25:56 +0000 (11:25 +0200)]
Makefile: Add support for local Makefile additions

With the recent addition of a check for directories listed in Makefile,
local custom directories are always reported as unlisted. This patch
adds support for a “Makefile.local” file, which can adjust settings in
Makefile. Example: “DIRCHECK_EXCLUDE += xyz .mydata doc/manhtml”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago ListVisibleFiles: do not sort output
Guido Trotter [Fri, 11 Jun 2010 20:23:33 +0000 (21:23 +0100)]
ListVisibleFiles: do not sort output

Among all users, turns out just one *may* need the output to be sorted.
All the others can cope without.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago jqueue: simplify removal from _nodes
Guido Trotter [Mon, 14 Jun 2010 12:17:33 +0000 (13:17 +0100)]
jqueue: simplify removal from _nodes

In some places we do try/del/except and in others just pop. Using pop
everywhere saves lines of code.
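
For comparison, the two idioms side by side (the function names and the
plain dict are stand-ins chosen for this example):

```python
def remove_node_try(nodes, name):
    # Longer form: delete and ignore a missing key
    try:
        del nodes[name]
    except KeyError:
        pass


def remove_node_pop(nodes, name):
    # Shorter form: pop with a default never raises KeyError
    nodes.pop(name, None)
```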

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Improve gnt-debug man page
Manuel Franceschini [Mon, 14 Jun 2010 11:59:15 +0000 (13:59 +0200)]
Improve gnt-debug man page

Signed-off-by: Manuel Franceschini <livewire@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years ago Remove a TODO
Iustin Pop [Sun, 13 Jun 2010 06:05:24 +0000 (08:05 +0200)]
Remove a TODO

Since OS objects are not stored in the configuration, we cannot put
os_hvp there, therefore the TODO is obsolete…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Rework LUSetInstanceParams._GetUpdatedParams
Iustin Pop [Sun, 13 Jun 2010 05:45:27 +0000 (07:45 +0200)]
Rework LUSetInstanceParams._GetUpdatedParams

Currently, this function does three things:
- special handling of constants.VALUE_DEFAULT
- type enforcing of the resulting dict
- filling the dictionary with defaults

However, the last two do not belong in this function:
- in the future, not all parameter dictionaries will be able to be
  enforced
- filling the dictionary with defaults cannot be done via a defaults
  dict in all cases, and should be done by the specialized functions
  (ideally we'd pass a partial function instance here, but we don't have
  that yet…)

As such, we remove the last two items, and move them to the callers; this is
overall the same complexity, as we were calling this function in just
three places and constructing the many arguments was also complicated.

Furthermore, we move the function out of LUSetInstanceParams, as in the
future it will be used by LUSetClusterParams too.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Split the core-OS and instance-specific env
Iustin Pop [Fri, 11 Jun 2010 00:30:11 +0000 (02:30 +0200)]
Split the core-OS and instance-specific env

Since we'll need to be able to generate the OS-specific environment
separately from the instance one, we move it to a separate function. We
also add a new OS_NAME env. var which is identical to the INSTANCE_OS
one (which won't exist for OS-only environments).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years ago Add cluster.SimpleFill*() functions
Iustin Pop [Sun, 13 Jun 2010 05:22:07 +0000 (07:22 +0200)]
Add cluster.SimpleFill*() functions

Currently, the existing cluster.Fill* functions take as argument an
instance. This means that in any case where we don't have an actual
instance object, we have to resort to calling the low-level
objects.FillDict function.

This is bad for two reasons:
- we have to know of, and we hardcode, the cluster object internals
  (e.g. that the nicparams are stored in a dict indexed by group)
- which can result in subtle bugs, if the underlying storage mechanisms
  change

This patch adds a lower-level implementation SimpleFillHV for FillHV and
SimpleFillBE for FillBE, and adds a completely new SimpleFillNIC (all
use cases until now hardcoded cluster.nicparams[constant.PP_DEFAULT]
directly); it then uses these new functions in cmdlib.py.

A side effect is that _CheckNicsBridgesExist loses the 'profile'
parameter, which was unused. If it's needed, we should add it later via
a proper profile parameter to SimpleFillNIC.
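
As a rough illustration of the idea (not the actual cmdlib/objects code),
the sketch below hides the nicparams storage layout behind one method. The
names fill_dict, Cluster, simple_fill_nic and the value of PP_DEFAULT are
assumptions made for this example only.

```python
import copy

# Assumed key under which the cluster-wide NIC defaults are stored.
PP_DEFAULT = "default"


def fill_dict(defaults, custom):
    """Return a copy of `defaults` overridden by the entries in `custom`."""
    result = copy.deepcopy(defaults)
    result.update(custom)
    return result


class Cluster:
    """Hypothetical stand-in for the cluster object; only nicparams is modelled."""

    def __init__(self, nicparams):
        self.nicparams = nicparams

    def simple_fill_nic(self, nicparams):
        """Fill NIC parameters without the caller knowing the storage layout."""
        return fill_dict(self.nicparams.get(PP_DEFAULT, {}), nicparams)


cluster = Cluster({PP_DEFAULT: {"mode": "bridged", "link": "br0"}})
filled = cluster.simple_fill_nic({"link": "br1"})
```

Callers only pass the custom values; if the defaults later move to a
different place inside the cluster object, only simple_fill_nic changes.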

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoMerge branch 'devel-2.1' into master
Iustin Pop [Mon, 14 Jun 2010 18:11:38 +0000 (20:11 +0200)]
Merge branch 'devel-2.1' into master

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>

14 years agoFix a bug in instance startup with custom hvparams
Iustin Pop [Sun, 13 Jun 2010 05:19:37 +0000 (07:19 +0200)]
Fix a bug in instance startup with custom hvparams

Since the introduction of OS-specific hvparams, we shouldn't ever use
objects.FillDict directly for instances, but instead go via the cluster
object. Otherwise the os_hvp will be ignored.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoFix unsafe variant initializer in _TryOSFromDisk
Iustin Pop [Mon, 14 Jun 2010 01:34:41 +0000 (03:34 +0200)]
Fix unsafe variant initializer in _TryOSFromDisk

In case an OS has inconsistent declarations, we might get into a case
where one node reports a valid variants list (with OS API >=15), and
another node has OS API < 15, in which case its supported_variants gets
the default value of None. This leads to the same variable having
inconsistent data types, which leads to subtle bugs later: instead of
reporting something like "Inconsistent OS API versions", the LU exits
with a run-time exception. Furthermore, in another datapath, variants is
initialized to '[]' in case of OS diagnose failures.

The patch changes _TryOSFromDisk to initialize variants to '[]' for
OS API levels below 15, and changes the variants calculation in
DiagnoseOS to be more readable.
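
A minimal sketch of the normalization described above, assuming a
hypothetical helper normalize_variants (this is not the real
_TryOSFromDisk code): every node ends up with a list, so later code never
has to deal with a mix of None and lists.

```python
# API level from which an OS can declare variants, as stated above.
VARIANTS_MIN_API = 15


def normalize_variants(api_version, declared_variants):
    """Always return a list, so results from different nodes are comparable."""
    if api_version >= VARIANTS_MIN_API and declared_variants is not None:
        return list(declared_variants)
    # Older API levels have no variants; use an empty list instead of None.
    return []


# Two nodes reporting the same OS with inconsistent API versions.
node_a = normalize_variants(15, ["default", "minimal"])
node_b = normalize_variants(10, None)
consistent = node_a == node_b
```

With both values being lists, the inconsistency can be detected by a plain
comparison and reported, instead of surfacing as a run-time exception.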

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoMakefile: Add check for DIRS consistency
Michael Hanselmann [Mon, 14 Jun 2010 16:52:09 +0000 (18:52 +0200)]
Makefile: Add check for DIRS consistency

It's easy to forget to add a new directory to DIRS. This check should
report such inconsistencies.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoDisallow DES for SSL connections
Michael Hanselmann [Mon, 14 Jun 2010 15:37:47 +0000 (17:37 +0200)]
Disallow DES for SSL connections

Older OpenSSL versions include DES-CBC3-* ciphers when specifying the
HIGH group of ciphers. Removing potentially weak ciphers from the list
of allowed ciphers ensures only strong ciphers are considered for SSL
connections.
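
For illustration only, this is how such a restriction can be expressed
with Python's standard ssl module. The cipher string below is an
assumption for this example; the commit message does not quote the exact
value used in Ganeti.

```python
import ssl

# Assumed cipher string: start from HIGH, then exclude single and triple DES.
CIPHERS = "HIGH:!DES:!3DES"

context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.set_ciphers(CIPHERS)

# Names of the ciphers that remain enabled on this context.
enabled = [cipher["name"] for cipher in context.get_ciphers()]
```

Listing the enabled ciphers this way is a quick check that no DES-based
suite is left, whatever the local OpenSSL version puts into HIGH.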

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoStart instance after creating snapshots for export
Michael Hanselmann [Mon, 14 Jun 2010 14:37:51 +0000 (16:37 +0200)]
Start instance after creating snapshots for export

This restores functionality lost in commit 387794f8. Found during
tests using QA scripts. An instance should be started after it
has been temporarily shut down for an export.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoUse import/export magic for backup/import and inter-cluster moves
Michael Hanselmann [Fri, 11 Jun 2010 17:04:51 +0000 (19:04 +0200)]
Use import/export magic for backup/import and inter-cluster moves

This should prevent bugs in our code from accidentally overwriting
disks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoDisable compression for all intra-cluster imports/exports
Michael Hanselmann [Fri, 11 Jun 2010 17:01:16 +0000 (19:01 +0200)]
Disable compression for all intra-cluster imports/exports

Tests have shown that usually we're CPU-bound for intra-cluster
imports/exports. Disabling compression will help with this.

Some versions of OpenSSL, depending on the build options, also
compress transparently. This will need further work in Ganeti.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoqa_rapi: Test inter-cluster instance move script
Michael Hanselmann [Fri, 11 Jun 2010 16:57:29 +0000 (18:57 +0200)]
qa_rapi: Test inter-cluster instance move script

This test moves an instance on the same cluster and, if successful,
moves it back. While not testing a real move between two clusters,
this is certainly better than nothing.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agobackend: Add support for import/export magic
Michael Hanselmann [Fri, 11 Jun 2010 16:03:42 +0000 (18:03 +0200)]
backend: Add support for import/export magic

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoimport/export daemon: Add support for a magic prefix
Michael Hanselmann [Fri, 11 Jun 2010 15:14:44 +0000 (17:14 +0200)]
import/export daemon: Add support for a magic prefix

This “magic” value will be used to ensure that we don't accidentally
connect to the wrong daemon (e.g. due to a bug), comparable to DRBD's
per-disk secret. Just depending on the SSL certificate isn't enough
as it's always per instance and not per disk.
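
The idea can be sketched as follows. This is only an illustration under
assumed details (a newline-terminated ASCII prefix, hypothetical function
names); the daemon's actual wire format is not described here.

```python
import io


def write_with_magic(stream, magic, payload):
    """Send the per-disk magic on its own line, followed by the data."""
    stream.write(magic.encode("ascii") + b"\n")
    stream.write(payload)


def read_with_magic(stream, expected_magic):
    """Read the data only if the stream starts with the expected magic."""
    line = stream.readline()
    if line != expected_magic.encode("ascii") + b"\n":
        raise ValueError("magic mismatch; refusing to read data")
    return stream.read()


buf = io.BytesIO()
write_with_magic(buf, "disk0-3f9a", b"disk contents")
buf.seek(0)
data = read_with_magic(buf, "disk0-3f9a")
```

A receiver expecting a different disk's magic raises before touching any
payload, which is the protection against talking to the wrong daemon.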

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoimport/export daemon: Simplify command building
Michael Hanselmann [Fri, 11 Jun 2010 14:18:12 +0000 (16:18 +0200)]
import/export daemon: Simplify command building

Instead of appending strings, stage parts in a list. Building the "dd"
command is moved to a separate function.
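
A small sketch of the list-based approach, with hypothetical helper names
and options (build_dd_command and build_pipeline are not the daemon's
real functions):

```python
import shlex


def build_dd_command(input_path=None, output_path=None, block_size=1024 * 1024):
    """Stage the dd arguments in a list instead of appending to a string."""
    parts = ["dd", "bs=%d" % block_size]
    if input_path is not None:
        parts.append("if=%s" % input_path)
    if output_path is not None:
        parts.append("of=%s" % output_path)
    return parts


def build_pipeline(stages):
    """Quote every argument and join the stages into one shell pipeline."""
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in stage) for stage in stages
    )


command = build_pipeline([
    build_dd_command(input_path="/dev/vg/disk 0"),
    ["gzip", "-c"],
])
```

Quoting happens in exactly one place, at join time, so optional parts can
be added or dropped without worrying about separators or escaping.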

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoimport/export: Limit max length of socat options
Michael Hanselmann [Fri, 11 Jun 2010 13:17:45 +0000 (15:17 +0200)]
import/export: Limit max length of socat options

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoimport/export: Validate remote host/port
Michael Hanselmann [Fri, 11 Jun 2010 12:07:23 +0000 (14:07 +0200)]
import/export: Validate remote host/port

The hostname and port received from the remote cluster should
be validated, just in case.
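
A hedged sketch of what such validation could look like. The regular
expressions, limits and function names are assumptions for this example,
not the checks actually implemented in utils:

```python
import re

# Assumed patterns: conservative character sets for host and service names.
_HOSTNAME_RE = re.compile(r"[a-zA-Z0-9](?:[-.a-zA-Z0-9]{0,253}[a-zA-Z0-9])?")
_SERVICE_NAME_RE = re.compile(r"[-_.a-zA-Z0-9]{1,128}")


def validate_host(value):
    """Accept only plain host names or addresses, nothing shell-relevant."""
    if not _HOSTNAME_RE.fullmatch(value):
        raise ValueError("Invalid host name: %r" % value)
    return value


def validate_service(value):
    """Accept a port number (1-65535) or a symbolic service name."""
    text = str(value)
    if re.fullmatch(r"[0-9]+", text):
        if not 0 < int(text) < 65536:
            raise ValueError("Port out of range: %s" % text)
    elif not _SERVICE_NAME_RE.fullmatch(text):
        raise ValueError("Invalid service name: %r" % text)
    return text
```

Since these values end up in command lines for the transfer tools,
rejecting anything outside a narrow character set is the cheap safe choice.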

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoutils: Add function to validate service name
Michael Hanselmann [Fri, 11 Jun 2010 11:52:19 +0000 (13:52 +0200)]
utils: Add function to validate service name

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoHandle ESRCH when sending signals
Michael Hanselmann [Mon, 14 Jun 2010 12:10:56 +0000 (14:10 +0200)]
Handle ESRCH when sending signals

Upon sending signals, ESRCH can be reported when the target no
longer exists.
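
In Python this is typically handled as below; send_signal is a
hypothetical helper written for this example, not the function touched by
the commit:

```python
import errno
import os
import signal


def send_signal(pid, signum=signal.SIGTERM):
    """Send `signum` to `pid`; return False if the process no longer exists."""
    try:
        os.kill(pid, signum)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # The target exited between our check and the kill() call.
            return False
        raise
    return True
```

Other errors, such as EPERM, are still propagated, since they indicate a
real problem rather than a harmless race with process exit.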

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoAdd missing directory from Makefile.am
Guido Trotter [Mon, 14 Jun 2010 16:47:25 +0000 (17:47 +0100)]
Add missing directory from Makefile.am

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd example gnt-debug submit-job json files
Guido Trotter [Mon, 14 Jun 2010 15:16:30 +0000 (16:16 +0100)]
Add example gnt-debug submit-job json files

These files are being used to test the job queue performance with
various changes and conditions. Adding them here for posterity.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoFix RpcResult.Raise error code
Iustin Pop [Sat, 12 Jun 2010 00:29:01 +0000 (02:29 +0200)]
Fix RpcResult.Raise error code

A typo in the Raise() method of rpc.RpcResult means that any remote
errors will lack an appropriate error code; this will confuse e.g. RAPI
users.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoCache a few bits of status in jqueue
Guido Trotter [Fri, 11 Jun 2010 11:25:59 +0000 (12:25 +0100)]
Cache a few bits of status in jqueue

Currently each time we submit a job we check the job queue size, and the
drained file. With this change we keep these pieces of information in
memory and don't read them from the filesystem each time.

Significant changes include:
  - The drained value can only be properly set by calling the
    appropriate cluster command "gnt-cluster queue drain/undrain" and
    not by removing/creating the file in the job queue directory. Not
    that anybody would have done it in this undocumented way before.
  - We get rid of the soft limit for the job queue, which we haven't
    ever used anyway.
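
The caching behaviour, and the first consequence listed above, can be
sketched like this. QueueState and its methods are hypothetical names for
illustration, not the jqueue implementation:

```python
import os
import tempfile


class QueueState:
    """Keeps the drained flag in memory; the file is read only at startup."""

    def __init__(self, drain_file):
        self._drain_file = drain_file
        self._drained = os.path.exists(drain_file)

    def set_drained(self, drained):
        """The only supported way to change the flag: update file and cache."""
        if drained:
            with open(self._drain_file, "w"):
                pass
        elif os.path.exists(self._drain_file):
            os.remove(self._drain_file)
        self._drained = drained

    def accepts_jobs(self):
        """Answer from memory, without touching the filesystem."""
        return not self._drained


state = QueueState(os.path.join(tempfile.mkdtemp(), "drain"))
```

Because accepts_jobs never looks at the disk, removing or creating the
file behind the object's back has no effect until the next restart.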

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agojstore._ReadNumericFile: use utils.ReadFile
Guido Trotter [Wed, 9 Jun 2010 13:27:26 +0000 (14:27 +0100)]
jstore._ReadNumericFile: use utils.ReadFile

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agojqueue: Rename _queue_lock to _queue_filelock
Guido Trotter [Fri, 4 Jun 2010 15:51:33 +0000 (16:51 +0100)]
jqueue: Rename _queue_lock to _queue_filelock

The name clarifies the difference between this and the internal lock.
Also explain a bit better what it is.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoOptimize _GetJobIDsUnlocked
Guido Trotter [Fri, 11 Jun 2010 11:17:52 +0000 (12:17 +0100)]
Optimize _GetJobIDsUnlocked

Currently we sort the list of job queue files twice (once in
utils.ListVisibleFiles with sort and then later with NiceSort). We apply
the _RE_JOB_FILE regular expression twice (once in _ListJobFiles and
once in _ExtractJobID). This simplifies the code a little, and a couple
of functions performing basically the same job are collapsed.
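
A sketch of the single-pass approach, assuming a job file pattern of
"job-<number>" and a hypothetical get_job_ids helper (the real
_RE_JOB_FILE and function signatures may differ):

```python
import re

# Assumed file name pattern for job files.
_RE_JOB_FILE = re.compile(r"job-(\d+)")


def get_job_ids(filenames, sort=True):
    """Match each name once, collect the numeric IDs, and sort at most once."""
    job_ids = []
    for name in filenames:
        match = _RE_JOB_FILE.fullmatch(name)
        if match:
            job_ids.append(int(match.group(1)))
    if sort:
        job_ids.sort()
    return job_ids


ids = get_job_ids(["job-10", "job-2", "serial", "job-2.new"])
```

Sorting the integers directly gives the numeric order that a second,
"nice" sort of the file names was providing before.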

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoRemove unused parameter from function
Guido Trotter [Fri, 11 Jun 2010 11:11:11 +0000 (12:11 +0100)]
Remove unused parameter from function

This also removes the relevant pylint disable.
No point in keeping unused parameters around: if/when we need them it's
easy to add them back.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoFix a TODO in _QueuedJob
Guido Trotter [Fri, 11 Jun 2010 10:34:34 +0000 (11:34 +0100)]
Fix a TODO in _QueuedJob

Rather than raising Exception, use GenericError and explain a bit better
what happened.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoListVisibleFiles: do optional sorting
Guido Trotter [Wed, 9 Jun 2010 17:12:35 +0000 (18:12 +0100)]
ListVisibleFiles: do optional sorting

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoImprove import-export unittest a bit
Michael Hanselmann [Thu, 10 Jun 2010 17:02:15 +0000 (19:02 +0200)]
Improve import-export unittest a bit

- Increase timeouts from 10 to 30 seconds (this still breaks when the
  machine is busy, e.g. using bonnie++)
- Depend on only one timeout per test instead of three
- Reset variables before each test

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoTest client timeout for import-export daemon
Michael Hanselmann [Thu, 10 Jun 2010 14:25:28 +0000 (16:25 +0200)]
Test client timeout for import-export daemon

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoGenerate import-export unittest certs in parallel
Michael Hanselmann [Thu, 10 Jun 2010 14:12:30 +0000 (16:12 +0200)]
Generate import-export unittest certs in parallel

Generating certificates can be slow.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoEnforce consistency in disks and nics input dicts
Guido Trotter [Tue, 8 Jun 2010 16:40:40 +0000 (17:40 +0100)]
Enforce consistency in disks and nics input dicts

With this change unknown disk and nic parameters will be refused, rather
than silently ignored, so that one can't pass them in by mistake and not
realize what went wrong.
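
A minimal sketch of such a check; the key sets and the
check_known_keys helper are hypothetical and only illustrate the
"refuse instead of ignore" behaviour:

```python
# Hypothetical sets of accepted keys, for illustration only.
DISK_KEYS = frozenset(["size", "mode"])
NIC_KEYS = frozenset(["ip", "mac", "mode", "link"])


def check_known_keys(params, allowed, what):
    """Raise if `params` contains any key that is not in `allowed`."""
    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError(
            "Unknown %s parameter(s): %s" % (what, ", ".join(unknown))
        )
    return params


disk = check_known_keys({"size": 1024, "mode": "rw"}, DISK_KEYS, "disk")
```

A misspelled key such as "sise" now fails immediately with a message
naming it, rather than being dropped while the default size is used.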

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoTLMigrateInstance: pass lu to _Check*
Guido Trotter [Thu, 10 Jun 2010 16:43:59 +0000 (17:43 +0100)]
TLMigrateInstance: pass lu to _Check*

The various _Check* helper functions expect an lu to be passed in, but
the TL is passed instead. This works... sometimes! :)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoRemove locking._CountingCondition
Guido Trotter [Wed, 9 Jun 2010 19:01:27 +0000 (20:01 +0100)]
Remove locking._CountingCondition

This class is unused and untested. We must have forgotten to remove it.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoRemove the job queue drain rpc call
Guido Trotter [Wed, 9 Jun 2010 11:07:25 +0000 (12:07 +0100)]
Remove the job queue drain rpc call

This call was introduced but never used in two years.
Since it just creates/removes a file, this can also be done in simpler
ways, without a special rpc call, if/when we need it again. In the
meantime, let's give it to history.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>