Iustin Pop [Fri, 18 Jun 2010 15:45:15 +0000 (17:45 +0200)]
Introduce a micro type system for opcodes
Currently, we have one structural validation for opcode attributes:
_OP_REQP, which checks that a given attribute is not 'None'; the rest of
the checks are done at runtime. This means our type system has two
types: None versus Not-None.
We have been hit many times by small, trivial bugs in this area, and
only a huge amount of unit tests and/or hand-written checks would ensure
that we cover all possibilities. This patch attempts to reduce the need
for manual checks by introducing a micro-type system for the
validation of the opcode attributes. What we lose, from the start, are
the custom error messages (e.g. "Invalid reboot mode, choose one of …",
or "The disk index must be a positive integer"). What we gain is the
ability to easily express things such as:
- this parameter must be None or an int
- this parameter must be a non-empty list
- this parameter must be either None or a list of dictionaries whose keys
come from the list of valid hypervisors and whose values are dictionaries
with string keys and values that are either None or strings; furthermore,
the list must be non-empty
These examples show that the system is composable (as opposed to offering
just a few static types), and that checks can be nested (only a few
levels here, for sanity; in principle, up to the stack depth).
We also gain lots of ))))))), which is not that nice :)
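The composability described above can be illustrated with a minimal sketch, assuming each check is a plain predicate (a callable returning a bool) and that combinators build new predicates from existing ones. All names here (`t_none`, `t_or`, `t_list_of`, `check_opcode`, the `HYPERVISORS` list) are hypothetical and not necessarily those used by the patch; `CHECK_HV_LIST` encodes the third example from the list, closing parentheses included.

```python
"""Sketch of a composable micro type system for opcode attributes."""


def t_none(val):
    return val is None


def t_string(val):
    return isinstance(val, str)


def t_int(val):
    # bool is a subclass of int in Python; reject it explicitly
    return isinstance(val, int) and not isinstance(val, bool)


def t_list(val):
    return isinstance(val, list)


def t_dict(val):
    return isinstance(val, dict)


def t_non_empty(val):
    return bool(val)


def t_or(*checks):
    """Passes if any of the given checks passes."""
    return lambda val: any(check(val) for check in checks)


def t_and(*checks):
    """Passes only if all checks pass; stops at the first failure."""
    return lambda val: all(check(val) for check in checks)


def t_elem_of(allowed):
    return lambda val: val in allowed


def t_list_of(check):
    # The element check only runs once t_list has passed
    return t_and(t_list, lambda val: all(check(item) for item in val))


def t_dict_of(key_check, val_check):
    return t_and(t_dict, lambda val: all(key_check(k) and val_check(v)
                                         for k, v in val.items()))


# Hypothetical hypervisor list, for illustration only
HYPERVISORS = ["xen-pvm", "xen-hvm", "kvm"]

# None, or a non-empty list of dicts keyed by hypervisor name whose values
# are dicts with string keys and None-or-string values
CHECK_HV_LIST = t_or(t_none,
                     t_and(t_non_empty,
                           t_list_of(t_dict_of(t_elem_of(HYPERVISORS),
                                               t_dict_of(t_string,
                                                         t_or(t_none, t_string))))))


def check_opcode(op, params):
    """Validate each (attribute, check) pair; the error message is generic."""
    for name, check in params:
        if not check(getattr(op, name, None)):
            raise ValueError("Parameter '%s' has an invalid value" % name)
```

A failing check can only report which parameter is wrong, which mirrors the loss of the custom error messages mentioned above.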
The current patch moves the existing _OP_REQP to the new framework, but
if accepted, a lot more validations should move to it. In the end, we
definitely should declare a type for all the opcode parameters
(eventually moving _OP_REQP directly to opcodes.py and validating in the
load/init case, and build __slots__ from it).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 12:40:17 +0000 (14:40 +0200)]
Some more CheckPrereq/CheckArguments cleanup
For a few LUs, some of the tests in CheckPrereq, or even the whole
function, can be moved to CheckArguments, as they don't touch state and
only do a 'type' validation.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 10:18:26 +0000 (12:18 +0200)]
LU.CheckPrereq: do not require implementation
Currently, the base class LogicalUnit's CheckPrereq will raise
NotImplementedError, which means that the child LUs have to implement
it. However, many LUs don't actually have a need for this function
(hence the many "pass" statements as the only body).
By changing the base class behaviour, we can simplify many LUs.
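The effect of a no-op default CheckPrereq, combined with moving pure type checks into CheckArguments, can be sketched with a simplified class hierarchy. This `LogicalUnit` is a reduced stand-in (the real base class has many more methods), and `LUSetDelay` is a hypothetical LU used only for illustration.

```python
class LogicalUnit(object):
    """Simplified, hypothetical stand-in for the logical unit base class."""

    def __init__(self, op):
        self.op = op

    def CheckArguments(self):
        """Static checks on opcode values only; must not touch cluster state."""

    def CheckPrereq(self):
        """Checks against cluster state; the default now does nothing instead
        of raising NotImplementedError."""

    def Exec(self, feedback_fn):
        raise NotImplementedError


class LUSetDelay(LogicalUnit):
    """Hypothetical LU: only a type check, so no CheckPrereq override."""

    def CheckArguments(self):
        duration = self.op.duration
        if not isinstance(duration, (int, float)) or duration < 0:
            raise ValueError("Invalid duration: %r" % (duration,))

    def Exec(self, feedback_fn):
        feedback_fn("sleeping %s seconds" % self.op.duration)
        return True
```

Subclasses that have nothing to check against cluster state simply inherit the empty CheckPrereq rather than defining one with a lone "pass".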
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 09:58:12 +0000 (11:58 +0200)]
Remove the obsolete EvacuateNode OpCode/LU
All code has been switched to the new-style LU… time for cleanup.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 09:55:52 +0000 (11:55 +0200)]
RAPI: switch evacuate node to the new model
This patch removes the last use of the old-style OpEvacuateNode. It also
fixes the dry-run mode for this RAPI resource - the dry-run parameter
was not used at all before.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 18 Jun 2010 09:36:14 +0000 (11:36 +0200)]
Abstract export mode validity check
The export mode is checked in two places with the exact same code…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 09:22:30 +0000 (11:22 +0200)]
Cleanup LU.ExpandNames versus CheckArguments
When LogicalUnit.CheckArguments was introduced, not all code dealing
with static argument checking was moved to it; many of these checks were
left in ExpandNames. With time, most of them migrated, and this patch
does the final cleanups.
The patch is straightforward, with the exception of LURebootInstance,
where an old style ParameterError exception is converted to the new
OpPrereqError with ECODE_INVAL.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 18 Jun 2010 06:17:00 +0000 (08:17 +0200)]
Move opcode attribute defaults to data structures
LUExportInstance had two opcode fields set to default via both
_CheckBooleanOpField and getattr(…, False).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 15 Jun 2010 22:40:44 +0000 (00:40 +0200)]
Add OS verification support to cluster verify
For this, we needed to extend the NodeImage class with a few extra
variables, and we do a trick in the node verification where we pick the
first node that returned valid OS data as the reference node, and then
we compare all other nodes against it.
The checks added are:
- consistency of DiagnoseOS responses
- multiple paths for an OS
- inconsistent OS between a reference node and the current node
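The reference-node comparison above can be sketched as a standalone function. The input shape (node name mapped to None for a failed DiagnoseOS call, or to a dict of OS name to a list of (path, api_versions) entries) is an assumption made for illustration; "first" is taken in sorted node order so the result is deterministic.

```python
def verify_os(node_os):
    """Return a list of error strings (hypothetical data shape, see above)."""
    errors = []
    nodes = sorted(node_os)
    # Reference node: the first node (in sorted order) with valid OS data
    ref_node = next((n for n in nodes if node_os[n] is not None), None)
    if ref_node is None:
        return ["no node returned valid OS data"]
    ref = node_os[ref_node]
    for node in nodes:
        data = node_os[node]
        if data is None:
            errors.append("%s: invalid OS diagnose response" % node)
            continue
        for os_name, entries in sorted(data.items()):
            if len(entries) > 1:
                errors.append("%s: OS %s found in multiple paths: %s" %
                              (node, os_name, ", ".join(p for p, _ in entries)))
            if node == ref_node:
                continue
            if os_name not in ref:
                errors.append("%s: extra OS %s not on reference node %s" %
                              (node, os_name, ref_node))
            elif entries[0][1] != ref[os_name][0][1]:
                errors.append("%s: OS %s API versions differ from node %s" %
                              (node, os_name, ref_node))
        if node != ref_node:
            for os_name in sorted(set(ref) - set(data)):
                errors.append("%s: OS %s missing (present on %s)" %
                              (node, os_name, ref_node))
    return errors
```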
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 16 Jun 2010 02:56:48 +0000 (04:56 +0200)]
Simplify gnt-os diagnose output
Currently, we always list the api/variants, even if these are empty.
This patch changes it so that we make a clear distinction for empty values
("[no variants]" versus "[variants: ]"), and we only list variants and
parameters when the OS API indicates they should be supported.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 14 Jun 2010 17:42:15 +0000 (19:42 +0200)]
Add a new gnt-os info command
This can be used to show the actual OS parameters and supported
variants, in a global manner (rather than per-node as gnt-os diagnose).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 14 Jun 2010 23:45:20 +0000 (01:45 +0200)]
LUDiagnoseOS: add more fields, cleanup
This patch exports, all the way from the backend, a new field ‘api_version’
which holds the list of supported API versions, and exposes the (already
computed) ‘parameters’ field.
The patch also reworks (again) the field calculation in its Exec()
method. All callers of LUDiagnoseOS pass in the 'valid' and 'variants'
parameters, so special-casing whether to compute the validity or not
seems overkill. We move to a model where we always compute
these cross-node values, in order to simplify the code, and we also
change the parameters set to be the intersection of all nodes' values (which
means a change in description will drop the parameter from the list of
parameters).
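The always-computed, cross-node aggregation can be sketched as follows, assuming each node contributes a (valid, variants, parameters) tuple where parameters is a collection of (name, description) pairs. The function name and data shape are hypothetical; the point is that intersecting (name, description) pairs drops any parameter whose description differs between nodes.

```python
def aggregate_os(per_node):
    """Combine per-node (valid, variants, parameters) tuples for one OS."""
    valid = True
    variants = None
    params = None
    for node_valid, node_variants, node_params in per_node:
        valid = valid and bool(node_valid)  # always a real bool
        if variants is None:
            variants = set(node_variants)
            params = set(node_params)
        else:
            variants &= set(node_variants)
            # (name, description) pairs: a differing description on any
            # node removes that parameter from the result
            params &= set(node_params)
    return valid, sorted(variants or ()), sorted(params or ())
```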
Additionally, we update scripts/gnt-os, which was broken for multi-dir
OSes since the introduction of variants…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 06:18:52 +0000 (08:18 +0200)]
Add support for OS parameters during import/export
Nothing special here, just copy/adjust the beparams code.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 05:57:15 +0000 (07:57 +0200)]
Add support for modifying instance OS parameters
We move the instance OS rename checks earlier, as we need to run the
validation against the new OS, if it has changed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 00:48:34 +0000 (02:48 +0200)]
Add support for modifying cluster OS parameters
We use _GetUpdatedParams in order to support removal too, and then
validate the OS parameters if the OS exists.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 22:01:53 +0000 (00:01 +0200)]
_GetUpdatedParams: enhance value removal options
This patch adds controls for whether we recognize
constants.VALUE_DEFAULT or not as a default value, and also adds
dash-prefixes as another way for parameter removal.
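A minimal sketch of the two removal mechanisms, assuming a "-name" key means "remove name" and a value equal to the default marker means "reset to default". The function name, the `use_dash` flag and the marker value are assumptions for illustration and may differ from the real `_GetUpdatedParams`.

```python
VALUE_DEFAULT = "default"  # assumed stand-in for constants.VALUE_DEFAULT


def get_updated_params(old_params, update_dict, use_default=True,
                       use_dash=True):
    """Return a new dict: old_params with update_dict applied.

    A key is removed (so the default applies again) if it is given with a
    leading dash, or, when use_default is set, if its value is VALUE_DEFAULT.
    """
    params = dict(old_params)  # never modify the caller's dict
    for key, val in update_dict.items():
        if use_dash and key.startswith("-"):
            params.pop(key[1:], None)
        elif use_default and val == VALUE_DEFAULT:
            params.pop(key, None)
        else:
            params[key] = val
    return params
```

With `use_default=False`, the marker string is stored like any other value, which is useful for parameter sets where "default" could be a legitimate value.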
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 07:11:50 +0000 (09:11 +0200)]
Add support for OS parameters during instance add
This is not yet complete, as it lacks proper support for instance
import.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 06:49:29 +0000 (08:49 +0200)]
Show OS parameters in cluster/instance info
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 02:17:20 +0000 (04:17 +0200)]
Add OS parameters to cluster and instance objects
The patch also modifies the instance RPC calls to fill the osparameters
correctly with the cluster defaults, and exports the OS parameters in
the instance/OS environment.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 02:13:29 +0000 (04:13 +0200)]
Introduce an RPC call for OS parameters validation
While we only support the 'parameters' check today, the RPC call is
generic enough that it will be able to support other checks in the future.
The backend function will both validate the parameters list (so as to
make sure we don't pass in extra parameters that the OS validation
doesn't care about) and the parameter values, via the OS verify script.
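The two-stage validation can be sketched as below. `run_verify` is a hypothetical callback standing in for executing the OS verify script, and exporting each parameter as an `OSP_<NAME>` environment variable is an assumed convention, labeled as such in the code.

```python
def validate_os_params(supported, params, run_verify):
    """Return (ok, message).

    supported: parameter names declared by the OS; params: name -> value;
    run_verify: hypothetical callable standing in for running the OS verify
    script with the given environment, returning (ok, message).
    """
    # Stage 1: reject parameters the OS does not declare
    unknown = sorted(set(params) - set(supported))
    if unknown:
        return False, "Unknown OS parameters: %s" % ", ".join(unknown)
    # Stage 2: let the OS check the values (assumed OSP_<NAME> convention)
    env = dict(("OSP_%s" % name.upper(), value)
               for name, value in params.items())
    return run_verify(env)
```

Checking the names first means the verify script never runs for a request that contains parameters it does not know about.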
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 02:02:06 +0000 (04:02 +0200)]
Add reading of OS parameters from disk
The patch also modifies the internal methods in LUDiagnoseOS and gnt-os
to deal with the format change of call_os_diagnose.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 12 Jun 2010 01:55:59 +0000 (03:55 +0200)]
Add os api v20 and related fields to the OS object
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 16 Jun 2010 02:22:51 +0000 (04:22 +0200)]
Silence a pylint warning
The OS parameters code will bump the number of lines over 10K, and thus
we need to silence this (no, we don't want any other module to become
this big…, so we use a targeted silence only).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 23 Jun 2010 04:29:57 +0000 (06:29 +0200)]
Update the 2.2 design doc with OS parameters
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 22 Jun 2010 09:46:05 +0000 (11:46 +0200)]
Remove job object condition
We don't need it anymore, since nobody waits on it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 15 Jun 2010 16:39:30 +0000 (17:39 +0100)]
Parallelize WaitForJobChanges
As with QueryJobs, we rely on file updates rather than condition
notification to detect job changes. In order to do that we use the
pyinotify module to watch files. This might make the client a bit slower
(pending planned improvements, such as subscription-based
WaitForJobChanges) but detaches it from the job execution.
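The patch uses pyinotify to be woken up when a job file changes. The sketch below swaps in simple polling instead (to stay within the standard library) but keeps the key idea: the waiter only reads the on-disk job file, so it never needs the job queue lock. The JSON layout with a top-level "status" field is an assumption.

```python
import json
import time


def wait_for_job_change(path, prev_status, timeout, interval=0.05):
    """Return the job's new status, or None if it did not change in time.

    Polling stand-in for inotify. Relies on job files being replaced
    atomically, so each read sees a complete old or new file.
    """
    deadline = time.monotonic() + timeout
    while True:
        with open(path) as fh:
            status = json.load(fh).get("status")
        if status != prev_status:
            return status
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```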
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 22 Jun 2010 09:02:10 +0000 (11:02 +0200)]
Update the job file on feedback
This is needed to convert waitforjobchanges to use inotify and the
on-disk version and decouple it from the job queue lock. No replication
to remote nodes is done, to keep the operation fast.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 14 Jun 2010 10:23:52 +0000 (11:23 +0100)]
Don't lock on QueryJobs, by using the disk version
We move from querying the in-memory version to loading all jobs from the
disk. Since the jobs are written/deleted on disk in an atomic manner, we
don't need to lock at all. Also, since we're just looking at the
contents of a directory, we don't need to check that the job queue is
"open".
If some jobs are removed between the time we list them and the time we
load them, we need to be able to cope: if we were asked to load those jobs
specifically, we must report the failure, but if we were just asked to
"load all" we shall just not consider them as part of the "all" set,
since they were deleted.
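The lock-free loading rule above can be sketched as follows. The `job-<id>` file naming and the JSON contents are assumptions for illustration; the important part is the different handling of a vanished file depending on whether the job was requested explicitly.

```python
import json
import os

JOB_PREFIX = "job-"  # assumed file naming: <queue_dir>/job-<id>


def query_jobs(queue_dir, job_ids=None):
    """Load jobs from disk without taking any lock.

    With job_ids=None, load every job currently in the directory, silently
    skipping files deleted after the listing. With explicit job_ids, a job
    that cannot be found is reported as None.
    """
    load_all = job_ids is None
    if load_all:
        job_ids = sorted(name[len(JOB_PREFIX):]
                         for name in os.listdir(queue_dir)
                         if name.startswith(JOB_PREFIX))
    result = []
    for job_id in job_ids:
        path = os.path.join(queue_dir, JOB_PREFIX + job_id)
        try:
            with open(path) as fh:
                job = json.load(fh)
        except FileNotFoundError:
            if load_all:
                continue  # removed since listing: not part of "all"
            job = None    # explicitly requested: report the failure
        result.append(job)
    return result
```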
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 22 Jun 2010 09:35:32 +0000 (11:35 +0200)]
Add JobQueue.SafeLoadJobFromDisk
This will be used to read a job file without having to deal with
exceptions from _LoadJobFromDisk.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 22 Jun 2010 09:19:47 +0000 (11:19 +0200)]
jqueue._LoadJobFromDisk: remove safety archival
Currently _LoadJobFromDisk archives job files it finds corrupted. Since
we want to use it to load files without holding locks, this could cause
a conflict: we just move the feature to _LoadJobUnlocked which is always
called with the lock held.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Wed, 23 Jun 2010 07:50:26 +0000 (09:50 +0200)]
Add repetition count to the TestDelay opcode
If the repetition count is not passed or is passed as 0 we sleep exactly
one time, otherwise we sleep "repeat" times and log in between.
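The repeat semantics can be sketched as a small function; `run_test_delay` and the log message format are hypothetical, only the "0 or absent means sleep once, otherwise sleep N times with logging" rule comes from the description above.

```python
import time


def run_test_delay(duration, repeat=0, log=print):
    """Sleep once if repeat is 0 (or not given), else `repeat` times."""
    if not repeat:
        time.sleep(duration)
        return
    for i in range(repeat):
        log("Test delay iteration %d/%d" % (i + 1, repeat))
        time.sleep(duration)
```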
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 22 Jun 2010 13:25:55 +0000 (15:25 +0200)]
Merge branch 'devel-2.1'
* devel-2.1:
Add "adopt" to the allowed disk parameters
Improve pylintrc for pylint 0.21+
Fix warnings with Python 2.6
Fix a small bug introduced in cf26a87a
Fix the type of 'valid' attribute in LUDiagnoseOS
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Apollon Oikonomopoulos [Fri, 18 Jun 2010 14:52:05 +0000 (17:52 +0300)]
Add "adopt" to the allowed disk parameters
"adopt" was missing from bd061c3, thus breaking disk adoption.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 22 Jun 2010 09:48:36 +0000 (11:48 +0200)]
Improve pylintrc for pylint 0.21+
While we'll need to update the source files too, at least this change
makes pylint 0.21 not fail on the current source tree.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 22 Jun 2010 09:38:23 +0000 (11:38 +0200)]
Fix warnings with Python 2.6
'format' is a new built-in function, and 'bytes' is a new built-in type.
We rename our variables that shadow them to make pylint happy (and
remove potential bugs).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 18 Jun 2010 12:30:48 +0000 (14:30 +0200)]
Fix a small bug introduced in cf26a87a
Commit cf26a87a added a tiny typo, which would break non-FQDN arguments
to modify node storage.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 14 Jun 2010 20:09:23 +0000 (22:09 +0200)]
Fix the type of 'valid' attribute in LUDiagnoseOS
The update of the valid status in LUDiagnoseOS says:
valid = valid and osl and osl[0][1]
However, in Python, “True and []” ('[]' being what we get for an invalid
OS) evaluates to “[]”, and thus the valid field for an OS will be either
True or an empty list, which is not what we want…
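The pitfall is easy to reproduce: Python's `and` returns one of its operands (the first falsy one, or else the last), not a bool. The `osl` shape, a list of (path, status) tuples, is assumed from the expression quoted above; wrapping the expression in `bool()` is one possible fix, not necessarily the one in the patch.

```python
def update_valid(valid, osl):
    """Buggy form: `and` returns one of its operands, not a bool."""
    return valid and osl and osl[0][1]


def update_valid_fixed(valid, osl):
    """Coerce to a real bool so the field is always True or False."""
    return bool(valid and osl and osl[0][1])
```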
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 18 Jun 2010 10:41:30 +0000 (11:41 +0100)]
Merge branch 'stable-2.1'
* stable-2.1:
Bump up version for the 2.1.4 release
Update NEWS about the latest 2.1 change
Fix handling of errors from socket.gethostbyname
Update a comment in qa-sample.json
RAPI client: Add support for Python 2.6
Update NEWS for Ganeti 2.1.4
Conflicts:
NEWS: keep both
configure.ac: keep the 2.2 version
qa/qa-sample.json: merge nearby changes
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 15:09:26 +0000 (16:09 +0100)]
Bump up version for the 2.1.4 release
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 17:06:22 +0000 (18:06 +0100)]
Update NEWS about the latest 2.1 change
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 16 Jun 2010 03:16:05 +0000 (05:16 +0200)]
Fix handling of errors from socket.gethostbyname
Socket functions can raise more than just gaierror. Most of the time,
socket.gethostbyname_ex will raise gaierror, but occasionally it will also
raise herror. For completeness, we catch all socket exceptions with data
of type (code, description).
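A sketch of the broader exception handling, written for Python 3, where `socket.error` is an alias of `OSError` and both `gaierror` and `herror` derive from it. `ResolverError` and the helper names are hypothetical; the (code, description) split falls back gracefully when an exception carries different data.

```python
import socket


class ResolverError(Exception):
    """Hypothetical error type: (hostname, code, description)."""


def split_socket_error(err):
    """Return (code, description) from a socket exception."""
    if len(err.args) == 2:
        return err.args[0], err.args[1]
    return None, str(err)


def resolve(hostname):
    """Return gethostbyname_ex() results or raise ResolverError."""
    try:
        return socket.gethostbyname_ex(hostname)
    except socket.error as err:  # base class of both gaierror and herror
        code, desc = split_socket_error(err)
        raise ResolverError(hostname, code, desc) from err
```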
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 17 Jun 2010 16:53:56 +0000 (17:53 +0100)]
Update a comment in qa-sample.json
Fix the sentence to say what it means.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 14:46:09 +0000 (15:46 +0100)]
gnt-debug: remove @todo from GenericOpCodes
- the function is not broken, and we're using it nowadays
- we have example json files and all, which show its usage
=> the todo is incorrect
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 13:02:20 +0000 (14:02 +0100)]
jqueue.AddManyJobs: use AddManyTasks
Rather than adding the jobs to the worker pool one at a time, we add
them all together, which is slightly faster, and ensures they don't get
started while we loop.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 13:32:55 +0000 (14:32 +0100)]
Workerpool.AddManyTasks: check tasks type
Each task has to be a sequence, or the RunTask call will fail.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 13:02:32 +0000 (14:02 +0100)]
count the number of tasks done in the wp unittest
Currently there's no way to know if something actually gets done.
After this check we actually test that the threads do their job.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 17 Jun 2010 14:48:43 +0000 (16:48 +0200)]
RAPI client: Add support for Python 2.6
The httplib module used by urllib2 requires its sockets to have a
makefile() method to provide a file-like interface (or rather
file-in-Python-like) to the socket. PyOpenSSL doesn't implement
makefile() as the semantics require files to call dup(2) on the
underlying file descriptors, something not easily done on SSL sockets.
Python up to and including 2.5 has a class to simulate makefile(),
httplib.FakeSocket. With the addition of SSL support in Python 2.6, this
class was deprecated and no longer functions.
This patch adds a new, simpler wrapper class which is used in Python 2.6
and above only. It's good enough for this use.
There are general problems in these generic wrapper classes--none of
them handles SSL I/O properly. They break, for example, when the server
requests a renegotiation. This will need more work.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 17 Jun 2010 12:14:19 +0000 (14:14 +0200)]
Bump RPC protocol version to 40
Many RPC calls have changed in Ganeti 2.2, hence bumping the RPC protocol
version.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 17 Jun 2010 12:12:16 +0000 (14:12 +0200)]
Change ganeti-cleaner unittest to not use random values
Using random values in unittests isn't good. This one broke exactly
when building the 2.2.0~beta0 release. I suspect there were duplicate
job IDs generated (due to $large being not so large).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 17 Jun 2010 11:06:36 +0000 (12:06 +0100)]
Update NEWS for Ganeti 2.1.4
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 17 Jun 2010 09:40:03 +0000 (11:40 +0200)]
Bump version to 2.2.0~beta0
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 17 Jun 2010 10:08:53 +0000 (11:08 +0100)]
Fix parameter names in SimpleFillBE/NIC docstrings
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 08:42:36 +0000 (09:42 +0100)]
AsyncAwaker: use shutdown on the socketpair
This makes sure the out_socket can only be used for writing, and the
in_socket for reading.
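The one-directional socket pair can be sketched with the standard `socket` module. The helper names are hypothetical and the real AsyncAwaker integrates with an asyncore loop; this only shows the shutdown calls and a single wake-up byte travelling from the write end to the read end.

```python
import socket


def make_awaker_pair():
    """Return (in_sock, out_sock): out_sock only writes, in_sock only reads."""
    in_sock, out_sock = socket.socketpair()
    out_sock.shutdown(socket.SHUT_RD)  # no reading from the signalling end
    in_sock.shutdown(socket.SHUT_WR)   # no writing to the end we poll on
    return in_sock, out_sock


def wake_up(out_sock):
    out_sock.send(b"\0")  # one byte is enough to wake up a poller


def drain(in_sock):
    return in_sock.recv(4096)
```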
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 08:15:17 +0000 (09:15 +0100)]
WorkerPool.AddManyTasks
Useful if we want to add many tasks at once, without contending with the
first tasks already starting while we are still adding the rest.
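A batched insert can be sketched as a simplified task queue (no worker threads). `MiniTaskQueue` is hypothetical; it also includes the sequence check from the "Workerpool.AddManyTasks: check tasks type" entry, since tasks are later run as `fn(*task)`. The whole batch is validated first, then queued under a single lock acquisition, so waiting workers cannot pick up the first task while the rest are still being added.

```python
import collections
import threading


class MiniTaskQueue(object):
    """Hypothetical, simplified task queue of a worker pool."""

    def __init__(self):
        self._cond = threading.Condition()
        self._tasks = collections.deque()

    def AddTask(self, args):
        self.AddManyTasks([args])

    def AddManyTasks(self, tasks):
        # Each task is later run as fn(*task), so it must be a sequence;
        # validate everything before queueing anything
        for task in tasks:
            if not isinstance(task, (list, tuple)):
                raise TypeError("Each task must be a sequence, got %r" % (task,))
        # One lock acquisition for the whole batch
        with self._cond:
            self._tasks.extend(tasks)
            self._cond.notify_all()

    def GetTask(self):
        """Block until a task is available and return it (worker side)."""
        with self._cond:
            while not self._tasks:
                self._cond.wait()
            return self._tasks.popleft()
```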
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 17 Jun 2010 07:44:25 +0000 (08:44 +0100)]
jqueue: make replication on job update optional
Sometimes it's useful to write to the local filesystem, but immediate
replication to all master candidates is not needed.
The _WriteAndReplicateFileUnlocked function gets renamed to
_UpdateJobQueueFile, as calling "write and replicate, but don't
replicate" seemed a bit strange.
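An update with optional replication can be sketched as an atomic local write (temporary file plus `os.replace`) followed by an optional callback. The function name and `replicate_fn` are hypothetical; `replicate_fn` stands in for pushing the file to the master candidates.

```python
import os
import tempfile


def update_job_queue_file(path, data, replicate=True, replicate_fn=None):
    """Atomically write data to path, then optionally replicate it.

    replicate_fn is a hypothetical callback standing in for pushing the file
    to the master candidates.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        # Readers see either the old or the new file, never a partial one
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if replicate and replicate_fn is not None:
        replicate_fn(path, data)
```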
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 15 Jun 2010 11:08:41 +0000 (12:08 +0100)]
s/queue._GetJobInfoUnlocked/job.GetInfo/
The job queue currently has a static _GetJobInfoUnlocked method.
This patch changes it into a normal method of _QueuedJob, which makes more sense.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 15 Jun 2010 10:17:24 +0000 (11:17 +0100)]
Abstract loading job file from disk
Move the work from _LoadJobUnlocked to _LoadJobFileFromDisk, which can
then be used in other contexts as well. Also, if we fail to deserialize
the job, archive it as well (before, we archived it only if we failed to
create the related object, but kept it there if deserialization failed).
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 17 Jun 2010 09:25:56 +0000 (11:25 +0200)]
Makefile: Add support for local Makefile additions
With the recent addition of a check for directories listed in Makefile,
local custom directories are always reported as unlisted. This patch
adds support for a “Makefile.local” file, which can adjust settings in
Makefile. Example: “DIRCHECK_EXCLUDE += xyz .mydata doc/manhtml”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 11 Jun 2010 20:23:33 +0000 (21:23 +0100)]
ListVisibleFiles: do not sort output
Among all users, it turns out only one *may* need the output to be sorted.
All the others can cope without.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 14 Jun 2010 12:17:33 +0000 (13:17 +0100)]
jqueue: simplify removal from _nodes
In some places we do try/del/except and in others just pop. Using pop
everywhere saves lines of code.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Manuel Franceschini [Mon, 14 Jun 2010 11:59:15 +0000 (13:59 +0200)]
Improve gnt-debug man page
Signed-off-by: Manuel Franceschini <livewire@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Sun, 13 Jun 2010 06:05:24 +0000 (08:05 +0200)]
Remove a TODO
Since OS objects are not stored in the configuration, we cannot put
os_hvp there, therefore the TODO is obsolete…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 05:45:27 +0000 (07:45 +0200)]
Rework LUSetInstanceParams._GetUpdatedParams
Currently, this function does three things:
- special handling of constants.VALUE_DEFAULT
- type enforcing of the resulting dict
- filling the dictionary with defaults
However, only the first one belongs in this function; the other two do
not:
- in the future, not all parameter dictionaries will be able to be
enforced
- filling the dictionary with defaults cannot be done via a defaults
dict in all cases, and should be done by the specialized functions
(ideally we'd pass a partial function instance here, but we don't have
that yet…)
As such, we remove the last two items, and move them to the callers; this is
overall the same complexity, as we were calling this function in just
three places and constructing the many arguments was also complicated.
Furthermore, we move the function out of LUSetInstanceParams, as in the
future it will be used by LUSetClusterParams too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 11 Jun 2010 00:30:11 +0000 (02:30 +0200)]
Split the core-OS and instance-specific env
Since we'll need to be able to generate the OS-specific environment
separately from the instance one, we move it to a separate function. We
also add a new OS_NAME env. var which is identical to the INSTANCE_OS
one (which won't exist for OS-only environments).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 13 Jun 2010 05:22:07 +0000 (07:22 +0200)]
Add cluster.SimpleFill*() functions
Currently, the existing cluster.Fill* functions take as argument an
instance. This means that in any case where we don't have an actual
instance object, we have to resort to calling the low-level
objects.FillDict function.
This is bad for two reasons:
- we have to know of, and we hardcode, the cluster object internals
(e.g. that the nicparams are stored in a dict indexed by group)
- which can result in subtle bugs, if the underlying storage mechanisms
change
This patch adds a lower-level implementation SimpleFillHV for FillHV and
SimpleFillBE for FillBE, and adds a completely new SimpleFillNIC (all
use cases until now hardcoded cluster.nicparams[constants.PP_DEFAULT]
directly); it then uses these new functions in cmdlib.py.
A side effect is that _CheckNicsBridgesExist loses the 'profile'
parameter, which was unused. If it's needed, we should add it later via
a proper profile parameter to SimpleFillNIC.
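The encapsulation can be sketched with a reduced cluster object: callers pass only their own overrides, and the knowledge that defaults live under a `PP_DEFAULT` key stays inside the class. The `Cluster` class and `fill_dict` helper are simplified, hypothetical stand-ins, and the `PP_DEFAULT` value is assumed.

```python
import copy

PP_DEFAULT = "default"  # assumed stand-in for constants.PP_DEFAULT


def fill_dict(defaults, custom):
    """Return a copy of defaults overridden by custom (neither is modified)."""
    result = copy.deepcopy(defaults)
    result.update(custom)
    return result


class Cluster(object):
    """Hypothetical, reduced cluster object."""

    def __init__(self, beparams, nicparams):
        # Both are stored as {group: {param: value}}; callers of the
        # SimpleFill* helpers no longer need to know that
        self.beparams = beparams
        self.nicparams = nicparams

    def SimpleFillBE(self, beparams):
        return fill_dict(self.beparams.get(PP_DEFAULT, {}), beparams)

    def SimpleFillNIC(self, nicparams):
        return fill_dict(self.nicparams.get(PP_DEFAULT, {}), nicparams)
```

If the storage layout changes later, only these methods need updating, which is the class of subtle bug the patch aims to avoid.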
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 14 Jun 2010 18:11:38 +0000 (20:11 +0200)]
Merge branch 'devel-2.1' into master
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Sun, 13 Jun 2010 05:19:37 +0000 (07:19 +0200)]
Fix a bug in instance startup with custom hvparams
Since the introduction of OS-specific hvparams, we shouldn't ever use
objects.FillDict directly for instances, but instead go via the cluster
object. Otherwise the os_hvp will be ignored.
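The difference between filling directly and going through the cluster can be sketched with two hypothetical helpers: the direct version merges only cluster defaults and instance values, while the layered version also applies the OS-specific overrides (`os_hvp`) in between. The data shapes are assumptions for illustration.

```python
def fill_hv_direct(cluster_hvparams, hv_name, inst_hvparams):
    """The buggy pattern: cluster defaults plus instance values only."""
    result = dict(cluster_hvparams.get(hv_name, {}))
    result.update(inst_hvparams)
    return result


def fill_hv(cluster_hvparams, os_hvp, hv_name, os_name, inst_hvparams):
    """Layer cluster defaults, then OS-specific overrides, then the instance."""
    result = dict(cluster_hvparams.get(hv_name, {}))
    result.update(os_hvp.get(os_name, {}).get(hv_name, {}))
    result.update(inst_hvparams)
    return result
```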
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 14 Jun 2010 01:34:41 +0000 (03:34 +0200)]
Fix unsafe variant initializer in _TryOSFromDisk
In case an OS has inconsistent declarations, we might get into a case
where one node reports a valid variants list (with OS API >=15), and
another node has OS API < 15, in which case its supported_variants gets
the default value of None. This leads to the same variable having
inconsistent data types, which leads to subtle bugs later: instead of
reporting something like "Inconsistent OS API versions", the LU exits
with a run-time exception. Furthermore, in another datapath, variants is
initialized to '[]' in case of OS diagnose failures.
The patch changes _TryOSFromDisk to initialize variants to '[]' for
OS api level below 15, and changes the variants calculation in
DiagnoseOS to be more readable.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 14 Jun 2010 16:52:09 +0000 (18:52 +0200)]
Makefile: Add check for DIRS consistency
It's easy to forget to add a new directory to DIRS. This check should
report such inconsistencies.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 14 Jun 2010 15:37:47 +0000 (17:37 +0200)]
Disallow DES for SSL connections
Older OpenSSL versions include DES-CBC3-* ciphers when specifying the
HIGH group of ciphers. Removing potentially weak ciphers from the list
of allowed ciphers ensures only strong ciphers are considered for SSL
connections.
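In OpenSSL cipher-list syntax, a leading "-" removes matching suites from the list built so far. The sketch below applies such a list with Python's `ssl` module; the exact cipher string is an assumption, not necessarily the one Ganeti uses. Note that `set_ciphers` only affects TLS 1.2 and older suites.

```python
import ssl

# Assumed cipher string: OpenSSL's HIGH group minus DES and 3DES suites
CIPHERS = "HIGH:-DES:-3DES"


def make_client_context(ciphers=CIPHERS):
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(ciphers)  # raises ssl.SSLError if nothing is selectable
    return ctx
```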
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 14 Jun 2010 14:37:51 +0000 (16:37 +0200)]
Start instance after creating snapshots for export
This restores functionality lost in commit 387794f8. The problem was
found during tests using QA scripts. An instance should be started again
after it has been temporarily shut down for an export.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 17:04:51 +0000 (19:04 +0200)]
Use import/export magic for backup/import and inter-cluster moves
This should prevent bugs in our code from accidentally overwriting
disks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 17:01:16 +0000 (19:01 +0200)]
Disable compression for all intra-cluster imports/exports
Tests have shown that usually we're CPU-bound for intra-cluster
imports/exports. Disabling compression will help with this.
Some versions of OpenSSL, depending on the build options, also
compress transparently. This will need further work in Ganeti.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 16:57:29 +0000 (18:57 +0200)]
qa_rapi: Test inter-cluster instance move script
This test moves an instance on the same cluster and, if successful,
moves it back. While not testing a real move between two clusters,
this is certainly better than nothing.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 16:03:42 +0000 (18:03 +0200)]
backend: Add support for import/export magic
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 15:14:44 +0000 (17:14 +0200)]
import/export daemon: Add support for a magic prefix
This “magic” value will be used to ensure that we don't accidentally
connect to the wrong daemon (e.g. due to a bug), comparable to DRBD's
per-disk secret. Relying on the SSL certificate alone isn't enough, as
it's always per instance and not per disk.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
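To illustrate the magic-prefix idea, here is a sketch in which the sender writes a per-disk random value as the first line of the stream and the receiver refuses the data if it doesn't match. The line-based framing and all function names are hypothetical. This is not how the import/export daemon actually encodes the magic.

```python
import hmac
import io
import secrets


def new_magic():
    """Generate a random per-disk magic value as a hex string."""
    return secrets.token_hex(16)


def send_with_magic(stream, magic, data):
    """Write the magic value as the first line, then the payload."""
    stream.write(magic.encode("ascii") + b"\n")
    stream.write(data)


def receive_with_magic(stream, expected_magic):
    """Verify the leading magic line before accepting the payload."""
    line = stream.readline().rstrip(b"\n")
    # Constant-time comparison of the received and expected values.
    if not hmac.compare_digest(line, expected_magic.encode("ascii")):
        raise ValueError("magic mismatch: connected to the wrong peer?")
    return stream.read()


buf = io.BytesIO()
magic = new_magic()
send_with_magic(buf, magic, b"disk data")
buf.seek(0)
print(receive_with_magic(buf, magic))  # b'disk data'
```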
Michael Hanselmann [Fri, 11 Jun 2010 14:18:12 +0000 (16:18 +0200)]
import/export daemon: Simplify command building
Instead of appending strings, stage parts in a list. Building the "dd"
command is moved to a separate function.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
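Staging command parts in a list and building the "dd" command in its own function might look roughly like this. `build_dd_command`, `build_pipeline`, the block size and the device path are all hypothetical. Quoting happens once, when the staged lists are joined.

```python
import shlex


def build_dd_command(path, block_size="1M", reading=True):
    """Return a dd invocation as a list of arguments (hypothetical helper)."""
    parts = ["dd"]
    parts.append(("if=%s" if reading else "of=%s") % path)
    parts.append("bs=%s" % block_size)
    return parts


def build_pipeline(*commands):
    """Shell-quote each staged argument list and join the commands with pipes."""
    return " | ".join(" ".join(shlex.quote(arg) for arg in cmd)
                      for cmd in commands)


print(build_pipeline(build_dd_command("/dev/xenvg/disk0"), ["gzip", "-1"]))
# dd if=/dev/xenvg/disk0 bs=1M | gzip -1
```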
Michael Hanselmann [Fri, 11 Jun 2010 13:17:45 +0000 (15:17 +0200)]
import/export: Limit max length of socat options
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 11 Jun 2010 12:07:23 +0000 (14:07 +0200)]
import/export: Validate remote host/port
The hostname and port received from the remote cluster should
be validated, just in case.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
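A minimal sketch of what validating the remote host and port "just in case" could involve. The exact rules (RFC 1123 style labels, ports 1 to 65535) and the function names are assumptions, not Ganeti's actual checks.

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters, with no
# leading or trailing hyphen (assumed RFC 1123 style rules).
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def validate_remote_host(host):
    """Return True if host looks like a syntactically valid hostname."""
    if not host or len(host) > 253:
        return False
    return all(_LABEL_RE.fullmatch(label)
               for label in host.rstrip(".").split("."))


def validate_remote_port(port):
    """Return the port as an int if it is in 1..65535, else raise ValueError."""
    port = int(port)  # int() itself raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError("port %d out of range" % port)
    return port
```

`fullmatch` is used instead of a `$` anchor, since `$` would also accept a trailing newline.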
Michael Hanselmann [Fri, 11 Jun 2010 11:52:19 +0000 (13:52 +0200)]
utils: Add function to validate service name
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 14 Jun 2010 12:10:56 +0000 (14:10 +0200)]
Handle ESRCH when sending signals
Upon sending signals, ESRCH can be reported when the target no
longer exists.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
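Handling ESRCH typically means treating "no such process" as a normal outcome instead of an error. Here is a sketch using `os.kill`. The helper name `signal_process` is hypothetical.

```python
import errno
import os


def signal_process(pid, sig):
    """Send sig to pid; return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
    except OSError as err:
        # ESRCH: the target exited (or never existed), which is not fatal.
        if err.errno == errno.ESRCH:
            return False
        raise
    return True
```

Signal 0 performs only the existence and permission checks, so `signal_process(pid, 0)` can also be used to test whether a process is still alive.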
Guido Trotter [Mon, 14 Jun 2010 16:47:25 +0000 (17:47 +0100)]
Add missing directory from Makefile.am
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Mon, 14 Jun 2010 15:16:30 +0000 (16:16 +0100)]
Add example gnt-debug submit-job json files
These files are being used to test the job queue performance with
various changes and conditions. Adding them here for posterity.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Sat, 12 Jun 2010 00:29:01 +0000 (02:29 +0200)]
Fix RpcResult.Raise error code
A typo in the Raise() method of rpc.RpcResult means that any remote
errors will lack an appropriate error code; this will confuse e.g. RAPI
users.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 11 Jun 2010 11:25:59 +0000 (12:25 +0100)]
Cache a few bits of status in jqueue
Currently, each time we submit a job, we check the job queue size and the
drained file. With this change we keep these pieces of information in
memory and don't read them from the filesystem each time.
Significant changes include:
- The drained value can only be properly set by calling the
appropriate cluster command "gnt-cluster queue drain/undrain" and
not by removing/creating the file in the job queue directory. Not
that anybody would have done it in this undocumented way before.
- We get rid of the soft limit for the job queue, which we haven't
ever used anyway.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
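The caching described above can be sketched as an object that reads the queue size and drain flag once and then updates them in memory. `CachedQueueStatus` and its methods are hypothetical. Persisting the drain file on disk, which the real drain command still has to do, is left out.

```python
class CachedQueueStatus:
    """Hypothetical in-memory cache of job queue size and drain flag."""

    def __init__(self, size, drained, hard_limit):
        # Read once (e.g. at queue start-up) instead of on every submission.
        self._size = size
        self._drained = drained
        self._hard_limit = hard_limit

    def set_drained(self, drained):
        # Only the drain/undrain code path updates the cached flag.
        self._drained = drained

    def submit(self):
        """Account for one new job, refusing it if drained or full."""
        if self._drained:
            raise RuntimeError("job queue is drained")
        if self._size >= self._hard_limit:
            raise RuntimeError("job queue is full")
        self._size += 1
        return self._size

    def job_archived(self):
        """Account for one job leaving the queue."""
        self._size -= 1
```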
Guido Trotter [Wed, 9 Jun 2010 13:27:26 +0000 (14:27 +0100)]
jstore._ReadNumericFile: use utils.ReadFile
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 4 Jun 2010 15:51:33 +0000 (16:51 +0100)]
jqueue: Rename _queue_lock to _queue_filelock
The name clarifies the difference between this and the internal lock.
Also explain a bit better what it is.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 11 Jun 2010 11:17:52 +0000 (12:17 +0100)]
Optimize _GetJobIDsUnlocked
Currently we sort the list of job queue files twice (once in
utils.ListVisibleFiles with sort and then later with NiceSort). We apply
We also apply the _RE_JOB_FILE regular expression twice (once in
_ListJobFiles and once in _ExtractJobID). Removing this duplicate work
simplifies the code a little, and a couple of functions performing
basically the same job are collapsed.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
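The optimization amounts to matching each file name once and sorting once. Here is a sketch that assumes job files are named "job-<number>". Ganeti's real `_RE_JOB_FILE` pattern and `NiceSort` may differ. A plain integer sort key stands in for the natural sort, since the IDs are purely numeric.

```python
import re

# Assumed job file name pattern, modeled on _RE_JOB_FILE: "job-<number>".
JOB_FILE_RE = re.compile(r"job-([0-9]+)")


def get_job_ids(filenames):
    """Extract job IDs with a single regex pass and sort them numerically once."""
    ids = []
    for name in filenames:
        match = JOB_FILE_RE.fullmatch(name)
        if match:
            ids.append(match.group(1))
    return sorted(ids, key=int)


print(get_job_ids(["job-10", "lock", "job-2", "job-1", "serial"]))
# ['1', '2', '10']
```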
Guido Trotter [Fri, 11 Jun 2010 11:11:11 +0000 (12:11 +0100)]
Remove unused parameter from function
This also removes the relevant pylint disable.
No point in keeping unused parameters around: if/when we need them, it's
easy to add them back.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 11 Jun 2010 10:34:34 +0000 (11:34 +0100)]
Fix a TODO in _QueuedJob
Rather than raising Exception, use GenericError and explain a bit better
what happened.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 9 Jun 2010 17:12:35 +0000 (18:12 +0100)]
ListVisibleFiles: do optional sorting
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
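Optional sorting lets a caller that applies its own ordering skip a redundant sort. Here is a sketch. The name `list_visible_files` and the default `sort=True` are assumptions, not necessarily the signature of `utils.ListVisibleFiles`.

```python
import os


def list_visible_files(path, sort=True):
    """Return non-hidden entries in path, sorted only if requested."""
    files = [name for name in os.listdir(path) if not name.startswith(".")]
    if sort:
        files.sort()
    return files
```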
Michael Hanselmann [Thu, 10 Jun 2010 17:02:15 +0000 (19:02 +0200)]
Improve import-export unittest a bit
- Increase timeouts from 10 to 30 seconds (this still breaks when the
machine is busy, e.g. using bonnie++)
- Depend on only one timeout per test instead of three
- Reset variables before each test
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 10 Jun 2010 14:25:28 +0000 (16:25 +0200)]
Test client timeout for import-export daemon
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 10 Jun 2010 14:12:30 +0000 (16:12 +0200)]
Generate import-export unittest certs in parallel
Generating certificates can be slow.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 8 Jun 2010 16:40:40 +0000 (17:40 +0100)]
Enforce consistency in disks and nics input dicts
With this change unknown disk and nic parameters will be refused, rather
than silently ignored, so that one can't pass them in by mistake and not
realize what went wrong.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
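Refusing unknown keys instead of silently ignoring them can be sketched as a set difference against the allowed keys. The key sets and the helper `check_params` below are hypothetical. The real allowed parameters are defined elsewhere in Ganeti.

```python
# Hypothetical allowed keys; the real sets come from Ganeti's own definitions.
ALLOWED_DISK_KEYS = frozenset(["size", "mode"])
ALLOWED_NIC_KEYS = frozenset(["mac", "ip", "mode", "link"])


def check_params(kind, params, allowed):
    """Raise ValueError listing any keys of params not in the allowed set."""
    unknown = sorted(set(params) - allowed)
    if unknown:
        raise ValueError("Unknown %s parameter(s): %s"
                         % (kind, ", ".join(unknown)))
    return params


check_params("disk", {"size": 1024, "mode": "rw"}, ALLOWED_DISK_KEYS)  # accepted
```

A typo such as "sise" is then reported immediately, instead of the disk silently getting a default size.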
Guido Trotter [Thu, 10 Jun 2010 16:43:59 +0000 (17:43 +0100)]
TLMigrateInstance: pass lu to _Check*
The various _Check* helper functions expect an lu to be passed in, but
the TL is passed instead. This works... sometimes! :)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Wed, 9 Jun 2010 19:01:27 +0000 (20:01 +0100)]
Remove locking._CountingCondition
This class is unused and untested. We must have forgotten about it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 9 Jun 2010 11:07:25 +0000 (12:07 +0100)]
Remove the job queue drain rpc call
This call was introduced but never used. In two years.
Since it just creates/removes a file, the same can be done in simpler
ways, without a special rpc call, if/when we need it again. In the
meantime, let's give it to history.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>