e873317a 07/30/2008 02:31 pm Guido Trotter

Parallelize {Startup,Shutdown,Reboot}Instance

Reviewed-by: iustinp

4e0b4d2d 07/30/2008 02:30 pm Guido Trotter

Parallelize LUReinstallInstance

self.recalculate_locks[locking.LEVEL_NODE] could have any value and
everything would work anyway. We'll use the string 'replace' by
convention because in the future we might want an 'append' mode.

Reviewed-by: iustinp

c4a2fee1 07/30/2008 02:30 pm Guido Trotter

LogicalUnit._LockInstancesNodes helper function

This function is used to lock instances' primary and secondary nodes
after locking instances themselves.

Reviewed-by: iustinp

3977a4c1 07/30/2008 02:30 pm Guido Trotter

Make sharing locks possible

LUs can declare which locks they need by populating the
self.needed_locks dictionary, but those locks are always acquired as
exclusive. Make it possible to acquire shared locks as well, by
declaring a particular level as shared in the self.share_locks...

fb8dcb62 07/30/2008 02:29 pm Guido Trotter

Add LogicalUnit.DeclareLocks

This additional LogicalUnit function is optional to implement, but lets
you change your locking needs for one level just before locking it, but
after the previous levels have been already locked. It is useful for
example to calculate what nodes to lock after locking an instance....

74b5913f 07/30/2008 02:29 pm Guido Trotter

LURenameInstance, add/remove relevant locks

LURenameInstance forgot to remove the old lock name and add the new one,
making it impossible for parallel LUs to act on the instance (without a
master daemon restart). This also fixes burnin+rename with the
parallelization of {Start,Stop}Instance....

85f03e0d 07/30/2008 01:02 pm Michael Hanselmann

Rewrite job queue

We found several issues in the old job queue implementation. It had race
conditions, deadlocks and other deficiencies.

Short summary:
- _QueuedOpCode and _QueuedJob are now more or less data structures with a few
utility functions. __Setup is gone....

c0a8eb9e 07/30/2008 11:56 am Michael Hanselmann

workerpool: Log when waiting for a thread

Reviewed-by: iustinp

b1b6ea87 07/30/2008 11:43 am Iustin Pop

Rework master startup/shutdown/failover

This (big) patch reworks the master startup/shutdown and fixes the
master failover.

What does the patch do?

For master start/stop:
- remove the old ganeti-master script and its associated man page
- moves the ip start/stop directly into the backend.(Start|Stop)Master...

53beffbb 07/30/2008 11:34 am Iustin Pop

Expose utils.DaemonPidFileName

Since we need to compute this from outside, we change this to a
public function.

Reviewed-by: ultrotter

5675cd1f 07/30/2008 11:33 am Iustin Pop

Implement checking for the master role in rapi

This patch moves the CheckMaster function from ganeti-masterd to ssconf
(most logical place, it cannot go in utils since we would have recursive
imports between ssconf and utils) and changes ganeti-rapi to also call...

1c65840b 07/30/2008 11:32 am Iustin Pop

Add a new parameter to backend.(Start|Stop)Master

This patch adds a new, unused for now, parameter to the start and stop
master operations in backend. The idea behind it is that we need to be
able to control whether the IP (de)activation is coupled with daemon...

6aff91f6 07/29/2008 05:07 pm Michael Hanselmann

Log thread name when debug output is enabled

Reviewed-by: iustinp

8090e19f 07/29/2008 05:07 pm Michael Hanselmann

jqueue: Fix error logging

The passed parameters were not correct.

Reviewed-by: iustinp, ultrotter

bff2ddc5 07/29/2008 01:42 pm Iustin Pop

Fix constants typo

Reviewed-by: imsnah

99e88451 07/29/2008 12:06 pm Iustin Pop

Use constants for the pid file stems

Reviewed-by: imsnah

b2a1f511 07/29/2008 11:49 am Iustin Pop

Add a KillProcess function

We cannot depend on all environments to have a start-stop-daemon or
similar tool. We instead implement a KillProcess function that behaves
similarly to “start-stop-daemon --retry”.

Note that the attached unittest can hang in foreground if the child...
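
The retry behaviour described above can be sketched roughly as follows. This is only an illustration, not the function added by this commit: the names kill_process and is_process_alive, the timeout parameter and the 0.1 second polling interval are all assumptions, and a POSIX system is assumed.

```python
import errno
import os
import signal
import time


def is_process_alive(pid):
    """Return True if a process with this PID exists (hypothetical helper)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def _send_signal(pid, signum):
    """Send a signal, ignoring the case where the process is already gone."""
    try:
        os.kill(pid, signum)
    except OSError as err:
        if err.errno != errno.ESRCH:
            raise


def kill_process(pid, signum=signal.SIGTERM, timeout=30):
    """Send signum, poll for up to timeout seconds, then fall back to SIGKILL."""
    if pid <= 0:
        raise ValueError("Refusing to signal PID %r" % (pid,))
    _send_signal(pid, signum)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not is_process_alive(pid):
            return
        time.sleep(0.1)
    # Still present after the grace period (an unreaped zombie child also
    # counts as present here), so send the signal that cannot be ignored.
    _send_signal(pid, signal.SIGKILL)
```

One caveat of this sketch: a terminated child of the calling process stays visible to the existence check until it is reaped, so the caller waits for the full timeout in that case.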

d9f311d7 07/29/2008 11:49 am Iustin Pop

Change IsPidFileAlive into ReadPidFile

We already have a function to test if a PID is alive, so it makes more
sense to use function composition than force calling (since we need to
read PIDs from files in other places too). Now IsProcessAlive returns
False for PIDs <= 0, since this is the error return from ReadPidFile....
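
A minimal sketch of the composition described above, under stated assumptions: the names read_pid_file and is_process_alive are illustrative, and returning 0 on error is one possible choice for the "PID <= 0" error value mentioned in the message.

```python
import errno
import os


def read_pid_file(path):
    """Return the PID stored in path, or 0 if it cannot be read or parsed."""
    try:
        with open(path) as pid_file:
            return int(pid_file.read().strip())
    except (IOError, OSError, ValueError):
        return 0  # assumed error value; anything <= 0 means "no PID"


def is_process_alive(pid):
    """Return False for PIDs <= 0, otherwise check that the process exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except OSError as err:
        return err.errno == errno.EPERM
    return True


def is_pid_file_alive(path):
    """The old combined check, expressed as a composition of the two."""
    return is_process_alive(read_pid_file(path))
```

Because the error value is rejected by is_process_alive, callers can chain the two functions without checking for a read failure in between.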

3cd62121 07/28/2008 01:17 pm Michael Hanselmann

Move ganeti-rapi core code to daemon

All other daemons have their main code in themselves and not in a module.
This patch does the same to ganeti-rapi by moving the code from
lib/rapi/ to daemons/ganeti-rapi.

Reviewed-by: iustinp

e2ae9123 07/28/2008 01:16 pm Michael Hanselmann

Replace httperror module with ganeti.http

The generic HTTP server doesn't know about httperror based exceptions
and would treat them as unknown exceptions, thereby not doing the right
thing with HTTP errors.

Reviewed-by: iustinp

188c5e0a 07/28/2008 01:13 pm Michael Hanselmann

Implement job canceling on server side

Locking is not completely right due to a deadlock when the job calls
UpdateJob after changing its status.

Reviewed-by: ultrotter

533bb4b1 07/28/2008 12:16 pm Michael Hanselmann

Fix exception class name in utils.WritePidFile

Reviewed-by: iustinp

4cb1d919 07/28/2008 12:16 pm Michael Hanselmann

Add “canceled” status for opcodes

Reviewed-by: ultrotter

fae737ac 07/25/2008 03:47 pm Michael Hanselmann

Move code extracting job ID into function

It might come in handy at some point and makes the code a bit easier
to read.

Reviewed-by: iustinp

5d414478 07/25/2008 03:32 pm Oleksiy Mishchenko

Convert set to a list in LUGetTags

The set triggers an exception on a list-tags command and RAPI calls for tags
since it is not serializable by JSON.

Reviewed-by: iustinp
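
The failure mode can be reproduced with the standard library json module, used here as a stand-in for the serializer the code base relies on. The tag values are made up for illustration.

```python
import json

tags = set(["prod", "web"])  # illustrative tag values

# A set cannot be encoded as JSON: json.dumps raises TypeError.
try:
    json.dumps(tags)
    set_is_serializable = True
except TypeError:
    set_is_serializable = False

# Converting to a list first works; sorting keeps the output stable.
encoded = json.dumps(sorted(tags))
```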

a0638838 07/24/2008 07:34 pm Oleksiy Mishchenko

Switch RAPI to ganeti.http module

Reviewed-by: imsnah

c609f802 07/24/2008 02:32 pm Michael Hanselmann

Implement job archiving on the server side

So far no error reporting to the client is done. Clients don't get
notified if a job doesn't exist or couldn't be archived because of
its current status.

The internal cache is always cleaned when the preconditions didn't...

0cb94105 07/24/2008 02:32 pm Michael Hanselmann

Add directory for archived jobs

Reviewed-by: iustinp

ce594241 07/23/2008 07:56 pm Michael Hanselmann

Move code formatting job ID into a base class

A later patch will add a memory based job storage class, hence this
code is going into a separate class. It also changes the number format
to always use at least 10 digits, allowing up to 9'999'999'999 jobs to...
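
One way to get "at least 10 digits" is zero-padding with a format string. The sketch below assumes that approach; the function name format_job_id is hypothetical and the real format may differ.

```python
def format_job_id(serial):
    """Format a job serial number as a string of at least 10 digits."""
    if serial < 0:
        raise ValueError("Job serial must not be negative")
    # %010d pads with leading zeros up to 10 digits and never truncates.
    return "%010d" % serial
```

Zero-padded IDs of equal length also sort the same way as strings and as numbers, which is convenient for file listings.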

b330ac0b 07/23/2008 05:23 pm Guido Trotter

Add utils.{Write,Remove}PidFile

WritePidFile is a helper function that writes the current pid in a
pidfile within the ganeti run directory. RemovePidFile tries to delete

Reviewed-by: iustinp

fee80e90 07/23/2008 05:23 pm Guido Trotter

Add utils.IsPidFileAlive function

This helper function reads a pid from a file containing it and checks
whether it refers to a live process.

Reviewed-by: iustinp

04e1bfaf 07/23/2008 05:23 pm Guido Trotter

Invert nodes/instances locking order

An implementation mistake from the original design caused nodes to be
locked before instances, rather than after. This patch inverts the level
numbering, changing also the relevant unittests and the recursive
locking function starting point....

51ee2f49 07/23/2008 05:16 pm Oleksiy Mishchenko

Generalization of bulk output mapping

Reviewed-by: iustinp

21cc1fbd 07/23/2008 04:30 pm Michael Hanselmann

Rename JobStorage to DiskJobStorage

Reviewed-by: iustinp

205d71fd 07/23/2008 03:25 pm Michael Hanselmann

Fix logging with string job IDs

The job ID is now a string, hence logging must use %s instead of %d.

Reviewed-by: iustinp

dca1764e 07/23/2008 03:13 pm Iustin Pop

Simplify rapi.baserlib.MapFields()

We can use zip for simplifying this function. Actually, at this point
I'm not sure if it needs to be a separate function at all.

Reviewed-by: imsnah
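
A zip-based field mapping of the kind mentioned above might look like this. The signature and the sample field names are assumptions for illustration, not a copy of rapi.baserlib.MapFields().

```python
def map_fields(names, data):
    """Pair each field name with the value at the same position."""
    if len(names) != len(data):
        raise ValueError("Expected %d values, got %d" % (len(names), len(data)))
    return dict(zip(names, data))


# Usage with made-up field names and values:
row = map_fields(["name", "status"], ["instance1.example.com", "running"])
```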

3be9a705 07/23/2008 02:34 pm Michael Hanselmann

Make job ID a string

The docstring says that _NewSerialUnlocked returns “a string
representing the job identifier”. Until now it returned an
integer and this patch changes it.

Reviewed-by: iustinp

c3f0a12f 07/23/2008 01:06 pm Iustin Pop

Distribute the queue serial file after each update

This patch adds distribution of the queue serial file after each write
to it (but before a new job is created and written with that ID, and
before a response is returned, so we should be safe from crashes in...

c4beba1c 07/23/2008 01:06 pm Iustin Pop

Make the job storage init reuse a serial file

This will be needed for master failover. If we don't have a valid queue
directory, we need to reinitialize it, but we should keep the existing
serial number.

As such, we abstract the reading of the serial and if we find a valid...

42ff3343 07/23/2008 11:22 am Guido Trotter


This was a TODO for 2.0

Reviewed-by: iustinp

1a5c7281 07/22/2008 05:25 pm Guido Trotter

Convert SetInstanceParams to concurrency

Grab a lock for the instance we're working on, and update its params.

Reviewed-by: iustinp

ea94e1cd 07/22/2008 05:25 pm Guido Trotter

Use Update in SetInstanceParams

When we set the instance params we're not adding a new instance, but
just updating an existing one, so why use AddInstance?

Reviewed-by: iustinp

8659b73e 07/22/2008 05:25 pm Guido Trotter

Convert LUConnectConsole to concurrency

For ConnectConsole we just need to lock the instance we're connecting
to. We make a few rpcs to its primary node, but node daemons can now
handle multiple queries and nodes cannot be removed till they have
instances on them anyway. Note that since we return the ssh command, and...

43905206 07/22/2008 05:24 pm Guido Trotter

Add _ExpandAndLockInstance auxiliary function.

LUs that take an instance name as input and need to expand its name and
lock it can use it to simplify their ExpandNames call. Possibly, an
_ExpandAndLockNode will come as well.

Reviewed-by: iustinp

642339cf 07/22/2008 05:24 pm Guido Trotter

Convert two (simple) LUs to be concurrent

LUQueryClusterInfo and LUDumpClusterConfig can be made concurrent and
don't need to acquire any locks. In fact they don't interact with the
cluster at all, but just with its configuration, which is thread-safe by...

0eed6e61 07/22/2008 05:23 pm Guido Trotter

Add missing empty line

Two top level definitions were separated only by one empty line.
Fixing this.

Reviewed-by: imsnah

c2dca9af 07/22/2008 05:12 pm Oleksiy Mishchenko

Put the proper RAPI baserlib

Reviewed-by: imsnah

57f8615f 07/22/2008 05:05 pm Michael Hanselmann

Make argument to CleanCacheUnlocked mandatory

Not passing the argument means it has the value None. Iterating None
doesn't work:
>>> "123" in None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iterable argument required...

10b207d4 07/22/2008 04:33 pm Oleksiy Mishchenko

Split RAPI resources to pieces

Reviewed-by: iustinp

53b1d12b 07/22/2008 11:17 am Michael Hanselmann

Split conditions in worker pool

This patch splits the single threading.Condition object used in the
worker pool for synchronization into three.

- worker_to_pool: Notified if a worker wants to notify the pool
- pool_to_worker: Notified if the pool wants to notify a single...

de499029 07/21/2008 06:32 pm Michael Hanselmann

Add signal handler class

This signal handler class abstracts some of the code previously
used in other places. It also uninstalls its handler when Reset()
is called or the class is destructed, thereby restoring the
previous behaviour.

Reviewed-by: iustinp
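
A rough sketch of such a handler class, assuming a POSIX system and use from the main thread. The attribute names are illustrative, and only the Reset() path is reproduced here, not the destructor behaviour mentioned above.

```python
import signal


class SignalHandler(object):
    """Install a handler for one signal and remember the previous one."""

    def __init__(self, signum):
        self.signum = signum
        self.called = False
        # signal.signal() returns the handler that was active before.
        self._previous = signal.signal(signum, self._HandleSignal)
        self._installed = True

    def _HandleSignal(self, signum, frame):
        """Only record that the signal arrived; callers poll self.called."""
        self.called = True

    def Reset(self):
        """Restore the previous handler; safe to call more than once."""
        if self._installed:
            # None means the old handler was not installed from Python
            # and cannot be restored through the signal module.
            if self._previous is not None:
                signal.signal(self.signum, self._previous)
            self._installed = False
```

Callers would typically wrap their work in try/finally and call Reset() in the finally clause, so the previous behaviour is restored even on error.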

bac5ffc3 07/17/2008 03:51 pm Oleksiy Mishchenko

Implement jobs resource in RAPI

Reviewed-by: imsnah

8075ce7e 07/16/2008 03:17 pm Oleksiy Mishchenko

Breathe life into RAPI for trunk

Reviewed-by: imsnah

a7399f66 07/15/2008 06:47 pm Iustin Pop

Documentation updates

Reviewed-by: imsnah

0e46916d 07/15/2008 02:56 pm Iustin Pop

Rename BaseJO to BaseOpCode

Since we don't have for now a job definition object anymore, we rename
this class to BaseOpCode. It's still useful (and not merged with OpCode)
since it holds all the 'pure' logic (no custom field handling, etc.)
whereas OpCode holds opcode specific data (OP_ID handling, etc)....

f0d874fe 07/15/2008 01:49 pm Iustin Pop

Sort the job list in _GetJobIDsUnlocked

Since the IDs are integers, we can simply sort them.

Reviewed-by: imsnah

d4104181 07/14/2008 06:22 pm Iustin Pop

Further fixes to enable RAPI startup

Note that since RAPI itself doesn't use luxi.Client yet, nothing works,
but at least it can start up now.

Reviewed-by: imsnah

d3f0bf8f 07/14/2008 06:04 pm Iustin Pop

Add forgotten RAPI constant

This was forgotten in the forward-porting of RAPI.

Reviewed-by: imsnah

e2212007 07/14/2008 04:38 pm Iustin Pop

Improve cli.SubmitOpCode

Currently, the feedback_fn argument to SubmitOpCode is no longer used.
We still need it in burnin, so we re-enable it by making the code call
that function with the msg argument in case feedback_fn is callable. The
patch also modifies burnin to accept the new argument format (msg is not...

f1048938 07/14/2008 04:15 pm Iustin Pop

First version of user feedback fixes

This patch contains a raw version for fixing feedback_fn.

The new mechanism works as follows:
- instead of a per-Processor feedback_fn, there's one for each
ExecOpCode, so that feedback for different opcodes go via possibly...

ac0930b9 07/14/2008 02:27 pm Iustin Pop

Cache some jobs in memory

This patch adds a caching mechanism to the JobStorage. Note that it
does not make the memory cache authoritative.

The algorithm is:
- all jobs loaded from disks are entered in the cache
- all new jobs are entered in the cache...

8a70e415 07/14/2008 02:12 pm Iustin Pop

Fix JobStorage._GetJobIDsUnlocked

The job ID returned must be an integer (and the regex enforces that),
but we didn't convert it manually.

Reviewed-by: imsnah

911a495b 07/14/2008 01:08 pm Iustin Pop

Change JobStorage to work with ids not filenames

Currently some of the functions in JobStorage work with filenames (which
is an implementation detail and should only be used when dealing with
the storage) and not with job IDs. We need to change this in order to...

f1da30e6 07/11/2008 07:17 pm Michael Hanselmann

Add experimental persistency to job queue

It's not perfect and it's not finished, but it's a start.

- Serial number is read only once, but written on each update
- Jobs are kept only on disk (caching will be implemented)

Reviewed-by: iustinp

18682bca 07/11/2008 06:45 pm Iustin Pop

Convert to the logging module

The patch also switches some of the exception logs to use
logging.exception (and therefore the log message will have a different

(Note that this might not be a good choice in all cases, though)

Reviewed-by: imsnah

a237d0a8 07/11/2008 06:45 pm Iustin Pop

Add PID to all logs

This patch (for trunk) adds the PID to all daemon logs.

Reviewed-by: imsnah

a17a7623 07/11/2008 04:54 pm Iustin Pop

Fix backend.NodeVolumes handling of LVM output

This is the same fix as for GetVolumeList.

I've checked manually and all other places that call lvm commands are
already checking the output validity in terms of correct number of

Reviewed-by: ultrotter

df4c2628 07/11/2008 04:23 pm Iustin Pop

Fix backend.GetVolumeList handling of LVM output

Sometimes ‘lvs’ can spit error messages on stdout, even when one wants
to parse the output:
Inconsistent metadata copies found - updating to use version 2776

So we need to validate the output to guard against such cases....
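
The kind of validation described above can be sketched as below. The separator, the three-field layout (name, size, attributes) and the function name are assumptions for illustration; they are not taken from the actual patch.

```python
def parse_lvs_output(output, sep="|"):
    """Parse separator-delimited volume lines, skipping malformed ones.

    Returns (volumes, skipped): a dict mapping name to (size, attr),
    and the list of lines that did not have the expected shape.
    """
    volumes = {}
    skipped = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(sep)
        if len(fields) != 3:
            skipped.append(line)  # e.g. a stray error message on stdout
            continue
        name, size, attr = fields
        try:
            volumes[name] = (float(size), attr)
        except ValueError:
            skipped.append(line)  # size column was not a number
    return volumes, skipped
```

With the message quoted above mixed into otherwise valid output, the stray line ends up in skipped instead of breaking the parse.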

a43f68dc 07/11/2008 03:20 pm Michael Hanselmann

Add generic HTTP server classes

Some of the code is adopted from the 1.2 branch
(lib/rapi/ This code can be used as a base for the
various HTTP servers in Ganeti.

Reviewed-by: iustinp

af30b2fd 07/11/2008 01:25 pm Michael Hanselmann

Make "gnt-job list" work again

"gnt-job list" was broken after my recent changes in the RPC
between clients and the master. This patch makes it work again.

Reviewed-by: iustinp

8c229cc7 07/11/2008 12:47 pm Oleksiy Mishchenko

Initial copy of RAPI filebase to the trunk

Reviewed-by: iustinp

eb0f0ce0 07/10/2008 03:38 pm Michael Hanselmann

Move watcher's LockFile function to utils

Reviewed-by: iustinp

307149a8 07/10/2008 03:29 pm Iustin Pop

Switch _QueuedOpCode to have its own lock

Right now, the queued opcode doesn't have a lock, and instead relies on
the parent QueuedJob's lock.

This is not good for logging feedback, so it's better to have a lock for
each queuedopcode.

Reviewed-by: ultrotter

7996a135 07/10/2008 03:17 pm Iustin Pop

Add a simple decorator for instance methods

This is just a simple, hardcoded decorator for object methods needing
synchronization on the _lock instance attribute.

Reviewed-by: ultrotter
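
A decorator of this kind could be sketched as follows. The decorator name locked and the Counter class are hypothetical; only the hardcoded _lock attribute comes from the description above.

```python
import functools
import threading


def locked(fn):
    """Run the wrapped method while holding the instance's _lock."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        with self._lock:  # attribute name is hardcoded, as described
            return fn(self, *args, **kwargs)
    return wrapper


class Counter(object):
    """Small example class whose methods synchronize on self._lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    @locked
    def Increment(self):
        self.value += 1
        return self.value
```

Note that with a plain threading.Lock, a decorated method must not call another decorated method on the same object, or it will deadlock; threading.RLock avoids that.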

c8549bfd 07/10/2008 12:22 pm Michael Hanselmann

jqueue: Log more information when running opcodes

Reviewed-by: iustinp

ff5fac04 07/09/2008 05:46 pm Iustin Pop

Fix double-logging in daemons

Currently, in debug mode, both the logfile handler and the stderr
handler will log debug messages. Since the stderr is redirected to the
same logfile (to catch non-logged errors), it means log entries are

The patch adds an extra parameter to the logger.SetupDaemon() function...

68676a00 07/09/2008 01:41 pm Iustin Pop

Move the master socket in the ganeti run dir

... as it was intended from the beginning, but by mistake left in the
top run dir.

Reviewed-by: ultrotter

cb999543 07/09/2008 01:41 pm Iustin Pop

Reduce duplicate Attach() calls in bdev

Currently, the 'public' functions of bdev (FindDevice and
AttachOrAssemble) will call the Attach() method right after class

But the constructor itself calls this function, and therefore we have
duplicate Attach() calls (which are not cheap at all)....

468c5f77 07/09/2008 01:41 pm Iustin Pop

Convert to the logging module

This does not enhance in any way the messages; it just switches to the
new module.

Reviewed-by: imsnah

bb698c1f 07/09/2008 01:41 pm Iustin Pop

Convert to the logging module

The patch also logs all commands executed from RunCmd when we are at
debug level.

Reviewed-by: imsnah

d4fa5c23 07/09/2008 01:41 pm Iustin Pop

Remove the old locking functions

This removes (hopefully) all traces of the old locking functions and

Reviewed-by: imsnah

2467e0d3 07/09/2008 01:34 pm Michael Hanselmann

Remove old job queue code

Reviewed-by: iustinp

0bbe448c 07/09/2008 01:34 pm Michael Hanselmann

Change masterd/client RPC protocol

- Introduce abstraction class on client side
- Use constants for method names
- Adapt legacy function SubmitOpCode to use it

Reviewed-by: iustinp

3d8548c4 07/09/2008 01:34 pm Michael Hanselmann

Make luxi RPC more flexible

- Use constants for dict entries
- Handle exceptions on server side
- Rename client function to CallMethod to match server side naming

Reviewed-by: iustinp

e2715f69 07/09/2008 01:33 pm Michael Hanselmann

Add very simple job queue

Reviewed-by: iustinp

fbe9022f 07/08/2008 07:32 pm Guido Trotter

Convert LUTestDelay to concurrent usage

In order to do so:
- We set REQ_BGL to False
- We implement ExpandNames

That's it, really.

Reviewed-by: iustinp

68adfdb2 07/08/2008 07:32 pm Guido Trotter

Processor: Acquire locks before executing an LU

If we're running in a "new style" LU we may need some locks, as required
by the ExpandNames function, to be able to run. We'll walk up the lock
levels present in the needed_locks dictionary and acquire them, then run...

d465bdc8 07/08/2008 07:31 pm Guido Trotter

LogicalUnit: add ExpandNames function

New concurrent LUs will need to call ExpandNames so that any names
passed in by the user are canonicalized, and can be used by hooks,
locking and other parts of the code. This was done in CheckPrereq
before, but it's now split out, as it's needed for locking, which in...

36c381d7 07/08/2008 07:31 pm Guido Trotter

Processor: Move LU execution to its own method

This makes the try...finally code simpler, and helps adding a more
complex locking structure before the actual execution. It also fixes a
concurrency bug caused by the fact that write_count was read before
acquiring the BGL, and thus spurious config update hooks run could have...

5f33b613 07/08/2008 06:10 pm Michael Hanselmann

constants: Add job and opcode status strings

Reviewed-by: iustinp

b3558df1 07/08/2008 06:03 pm Michael Hanselmann

workerpool: Don't notify if there was no task

Workers have to notify their pool if they finished a task to make
the WorkerPool.Quiesce function work. This is done in the finally:
clause to notify even in case of an exception. However, before
we notified on each run, even if there was no task, thereby creating...

75afaefc 07/08/2008 05:42 pm Iustin Pop

Add a top level RUN_GANETI_DIR constant

This patch creates a base RUN_GANETI_DIR and then moves the other run
dir constants to use that (even if just setting BDEV_CACHE_DIR as equal
to it, rather than putting it deeper, for now).

Also we create a constant list of all the subdirs we need in RUN_DIR to...

bf94c0f5 07/08/2008 05:41 pm Iustin Pop

symlinks: Add DISK_LINKS_DIR constant

The DISK_LINKS_DIR points to the RUN_DIR/ganeti/instance-disks
directory, which will contain symlinks to the instances' disks. These
provide a stable name across all nodes for them, and permit
live-migration to happen....

fad50141 07/08/2008 02:16 pm Michael Hanselmann

luxi: Use serializer module instead of simplejson

Reviewed-by: iustinp

071448fb 07/08/2008 12:38 pm Michael Hanselmann

serializer.DumpJson: Control indentation by parameter

If the simplejson module supports indentation, it's always used. There
are cases where we might not want to use it or enable it only for
debugging purposes, such as in RPC.

Reviewed-by: iustinp
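
The idea of a caller-controlled indentation switch can be illustrated with the standard library json module, used here as a stand-in for simplejson. The function name dump_json and its default are assumptions.

```python
import json


def dump_json(data, indent=True):
    """Serialize data, with indentation only if the caller asks for it."""
    if indent:
        # Readable form, useful for files a human may inspect.
        return json.dumps(data, indent=2, sort_keys=True)
    # Compact form, better suited to RPC payloads.
    return json.dumps(data, sort_keys=True)
```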

6048c986 07/08/2008 12:14 pm Guido Trotter

Add a missing import to cmdlib

cmdlib uses some constants from locking (ie. locking levels) but doesn't
import it. This patch fixes the issue.

Reviewed-by: iustinp

f64c9de6 07/08/2008 11:55 am Guido Trotter

Fix an error accessing the cfg

Since the context is passed to LogicalUnit, rather than the cfg, we can
only access the cfg as self.cfg, self.context.cfg, or context.cfg (in
the constructor). cfg is not valid anymore.

Reviewed-by: iustinp

a2fd9afc 07/08/2008 11:49 am Guido Trotter

Add and remove instance/node locks

Whenever we add an instance or node to the cluster (i.e. to the config)
and whenever we remove them, we should add/remove locks as well. In the
future we may want to optimize this so that the configwriter does it, or
it's handled at the context level, but till we're adding/removing...

77b657a3 07/08/2008 11:49 am Guido Trotter

Pass context to LUs

Rather than passing a ConfigWriter to the LUs we'll pass the whole
context, from which a ConfigWriter can be extracted, but we can also
access the GanetiLockManager. This also fixes the places where a FakeLU
is created.

Reviewed-by: iustinp

0b097284 07/08/2008 11:49 am Guido Trotter

Fix a typo in LUTestDelay docstring

Reviewed-by: iustinp