root / daemons / ganeti-masterd @ 90b54c26

# Date Author Comment
b726aff0 06/15/2009 08:08 pm Iustin Pop

Convert node_start_master to new style result

This is used in multiple places outside cmdlib.py, so it's a more
interesting patch.

Signed-off-by: Iustin Pop
Reviewed-by: Guido Trotter

2971c913 05/21/2009 08:10 pm Iustin Pop

Add a luxi call for multi-job submit

As a workaround for the job submit timeouts that we have, this patch
adds a new luxi call for multi-job submit; the advantage is that all the
jobs are added to the queue and only afterwards can the workers start
processing them....

dd36d829 05/04/2009 04:51 pm Iustin Pop

Fix luxi serialization in ganeti-masterd

Currently, lib/luxi.py uses lib/serializer.py for encoding/decoding
messages, but the master daemon uses the simplejson module directly.
This is wrong as any non-trivial change to serializer.py will break the
master daemon....

77921a95 04/06/2009 11:21 am Iustin Pop

Disable synchronous (locking) queries

This patch raises an error in the master daemon in case the user
requests a locking query; accordingly, all clients were modified to send
only lockless queries. This is a short-term fix; for a proper fix the
clients should be modified to submit a job when the user requests a...

e566ddbd 04/06/2009 11:20 am Iustin Pop

Add some more debugging info to masterd

This patch will log data about queries, which are today completely
invisible (at the default log level) in the master log file.

Reviewed-by: imsnah

9dae41ad 02/27/2009 07:08 pm Guido Trotter

Create runtime dir in bootstrap

Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init
time. This patch creates it in InitCluster just before hv parameter
checking. Since the code to make a list of directories is already repeated
twice in the code, and this would be the third time, we abstract it into...

5de4474d 02/12/2009 09:32 am Iustin Pop

master daemon: allow skipping the voting process

This patch introduces a 'force' mode for the master daemon startup where
the voting process is not done, but the user has to confirm manually the
startup (before forking, of course).

Reviewed-by: imsnah

66baeccc 02/04/2009 05:11 pm Iustin Pop

Add one new luxi query: cluster info

This is the last query that RAPI executes via opcodes and is purely
static (config values only). As such, we can convert it safely to a
query instead of job.

Reviewed-by: imsnah

ec79568d 02/04/2009 12:30 pm Iustin Pop

Implement lockless query operations

This patch adds the framework for, and enables lockless OpQueryInstances. This
means that instances will be shown in ERROR_up or ERROR_down state, even though
this is not an error (but just an in-progress job).

The framework is implemented as follows:...

c979d253 01/21/2009 04:12 pm Iustin Pop

Fix some more pylint errors

Two are real errors (invalid names) and one is a style error (overriding
name from outer scope).

Reviewed-by: ultrotter

d21d09d6 01/20/2009 07:19 pm Iustin Pop

Update the logging output of job processing

(this is related to the master daemon log)

Currently it's not possible to follow (in the non-debug runs) the
logical execution thread of jobs. This is due to the fact that we don't
log the thread name (so we lose the association of log messages to jobs)...

7d88772a 01/09/2009 02:52 pm Iustin Pop

Rework the daemonization sequence

The current fork+close fds sequence has deficiencies which are hard to
work around:
- logging can start logging before we fork (e.g. if we need to emit
messages related to master checking), and thus use FDs which we...

e09fdcfa 01/06/2009 11:57 am Iustin Pop

Fix some pylint-detected issues

Two bad indentation cases and a missing variable.

Reviewed-by: imsnah

f8ad5591 12/18/2008 06:38 pm Michael Hanselmann

Prevent RPC timeout on auto-archiving jobs

With a large job queue, auto-archiving jobs can take a very long time,
causing timeouts on the luxi RPC layer. With this change, auto-
archive returns after half of the RPC timeout has passed. The user
will see how many jobs are left unchecked....
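
A minimal sketch of the time-budget idea described in this commit, not the actual Ganeti code: the function name, its parameters and the injectable `clock` are hypothetical, chosen only to illustrate stopping early and reporting how many jobs were left unchecked.

```python
import time


def archive_until_deadline(jobs, is_archivable, archive, budget_seconds,
                           clock=time.monotonic):
    """Archive eligible jobs until the time budget is used up.

    Returns a tuple (archived_count, unchecked_count). All names here are
    illustrative assumptions, not Ganeti's API.
    """
    deadline = clock() + budget_seconds
    archived = 0
    for checked, job in enumerate(jobs):
        if clock() >= deadline:
            # Out of time: report the jobs that were never examined.
            return archived, len(jobs) - checked
        if is_archivable(job):
            archive(job)
            archived += 1
    return archived, 0
```

A caller following the commit message would pass half of its RPC timeout as `budget_seconds` and show the second element of the result to the user.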

c41eea6e 12/11/2008 07:13 pm Iustin Pop

Fix epydoc format warnings

This patch should fix all outstanding epydoc parsing errors; as such, we
switch epydoc into verbose mode so that any new errors will be visible.

Reviewed-by: imsnah

bbe19c17 12/02/2008 07:07 am Iustin Pop

Fix master failover

The ssconf files were not updated by the master failover. We need to
push them, and since we already have RPC initialized, we can use the
standard ConfigWriter to do so - this will take care of both the config
file and the ssconf files....

1cb8d376 11/26/2008 06:49 pm Guido Trotter

ganeti-masterd: create RUN_GANETI_DIR as well

Since we're not sure ganeti-noded has started yet, we need to create
RUN_GANETI_DIR before SOCKET_DIR as well, with the proper permissions.

Reviewed-by: imsnah

227647ac 11/25/2008 07:11 pm Guido Trotter

Move the MASTER_SOCKET to SOCKET_DIR

Previously it was in the abstract Linux namespace, where unfortunately we
couldn't easily check from python the credentials of the connecting
clients. Now we also have to remove the file on exit and when starting.

Reviewed-by: imsnah
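
The consequence mentioned above (a socket that lives on the filesystem must be removed on exit and when starting) can be sketched as follows. This is an illustration using only the standard `socket` and `os` modules; the function names and the backlog value are assumptions, not the code from this patch.

```python
import os
import socket


def _remove_if_present(path):
    """Delete the socket file, ignoring the case where it does not exist."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def bind_master_socket(path):
    """Bind a Unix stream socket at a filesystem path.

    A stale file left behind by an unclean shutdown is removed first,
    otherwise bind() would fail with "address already in use".
    """
    _remove_if_present(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(5)  # backlog value is an arbitrary choice for this sketch
    return sock


def close_master_socket(sock, path):
    """Close the socket and remove its file on clean exit."""
    sock.close()
    _remove_if_present(path)
```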

d823660a 11/25/2008 07:11 pm Guido Trotter

ganeti-masterd: create SOCKET_DIR

If SOCKET_DIR doesn't exist we create it in the master daemon, before
trying to put a socket inside it.

Reviewed-by: imsnah

15486fa7 11/21/2008 12:46 pm Michael Hanselmann

ganeti-masterd: Remove PID file at the end

Removing the PID file should be the last thing done. This patch makes
sure it's also removed when master.server_cleanup() throws an exception.

Also initialize logging only after writing the PID file.

Reviewed-by: iustinp

4331f6cd 11/21/2008 12:45 pm Michael Hanselmann

Reuse HTTP client pool for RPC

ganeti-masterd: Add initialization and shutdown of RPC pool. It needs
to be shut down before forking.

ganeti.cli: Add decorator function to initialize and shut down RPC pool.

ganeti.rpc: Add functions to initialize and shut down RPC pool. Throw...

99aabbed 10/20/2008 05:47 pm Iustin Pop

Convert the job queue rpcs to address-based

The two main multi-node job queue RPC calls (jobqueue_update,
jobqueue_rename) are converted to address-based calls, in order to speed
up queue changes. For this, we need to change the _nodes attribute on
the jobqueue to be a dict {name: ip}, instead of a set....

82d9caef 10/20/2008 03:50 pm Iustin Pop

Remove the logger.py module

Since now we use only one function from the logger module
(SetupLogging), we move it to utils.py (which is already imported by all
users of this function), and we remove the module.

Reviewed-by: imsnah

d7cdb55d 10/16/2008 02:36 pm Iustin Pop

Improvements to the master startup checks

In order to account for future improvements to master failover, we move
the actual data gathering capabilities from ganeti-masterd into
bootstrap.py, and we leave only the verification into masterd.

The verification procedure is then changed to retry multiple times (up...

3ccafd0e 10/16/2008 11:37 am Iustin Pop

Add an interface for the drain flag changes/query

This adds the set/reset in the jqueue and luxi modules, and a way to
query it in OpQueryConfigValues, and also the command line interface for
it:
$ gnt-cluster queue info
The drain flag is unset
$ gnt-cluster queue drain...

6797ec29 10/15/2008 01:51 pm Iustin Pop

Implement transport of ganeti errors across luxi

This patch adds a generic method to identify the ganeti error given its
class name, and implements this across the luxi protocol.

Reviewed-by: imsnah

72737a7f 10/10/2008 12:55 pm Iustin Pop

Convert rpc module to RpcRunner

This big patch changes the call model used in internode-rpc from
standalone function calls in the rpc module to calls via an RpcRunner class,
that holds all the methods. This can be used in the future to enable
smarter processing in the RPC layer itself (some quick examples are not...

e92376d7 10/07/2008 11:03 am Iustin Pop

Implement job 'waiting' status

Background: when we have multiple jobs in the queue (more than just a
few), many of the jobs (up to the number of threads) will be in state
'running', although many of them could be actually blocked, waiting for
some locks. This is not good, as one cannot easily see what is...

07cd723a 10/06/2008 07:42 pm Iustin Pop

Implement job auto-archiving

This patch adds a new luxi call that implements auto-archiving of jobs
older than a certain age (or -1 for all completed jobs), and the gnt-job
command that makes use of this (with 'all' for -1).

Reviewed-by: imsnah

a42872ff 10/01/2008 08:36 pm Michael Hanselmann

Convert ganeti-master

Use simpleconfig instead of ssconf.

Reviewed-by: iustinp

ae5849b5 10/01/2008 08:33 pm Michael Hanselmann

Add new query to get cluster config values

This can be used to retrieve certain cluster config values from
within clients.

OpDumpClusterConfig was not used anywhere, hence I'm just reusing
it. The way ConfigWriter.DumpConfig returned the configuration
was not thread-safe, anyway (no deepcopy)....

36205981 09/09/2008 03:25 pm Iustin Pop

Implement master startup safety check

This is an initial version of the master startup checks. It's a very
rudimentary change, however in normal usage (an old master was started,
the rest of the cluster is functioning normally) it will succeed in
preventing wrong startups....

5c735209 08/29/2008 04:42 pm Iustin Pop

Make WaitForJobChanges deal with long jobs

This patch alters the WaitForJobChanges luxi-RPC call to have a
configurable timeout, so that the call behaves nicely with long jobs
that have no update.

We do this by adding a timeout parameter in the RPC call, and returning...

6c5a7090 08/27/2008 11:34 am Michael Hanselmann

Make sure that client programs get all messages

This is a large patch, but I can't figure out how to split it without
breaking stuff. The old way of getting messages by always getting the
last one didn't bring all messages to the client if they were added...

9894ece7 08/18/2008 02:12 pm Michael Hanselmann

Use Linux-specific way to name master socket

By using this Linux-specific way we don't have to care about removing the
socket file when quitting or starting (after an unclean shutdown). For a
more detailed description, see the comment in the patch.

Reviewed-by: schreiberal

dfe57c22 08/11/2008 07:27 pm Michael Hanselmann

Add RPC call to wait for job changes

This way clients can react faster to status or message changes and
don't have to poll anymore.

Reviewed-by: ultrotter

32f93223 08/08/2008 02:29 pm Michael Hanselmann

Add query function for exports

Reviewed-by: iustinp

c36176cc 08/06/2008 05:56 pm Michael Hanselmann

Notify job queue about added/removed nodes

The job queue maintains its own node list and must be notified
when nodes are added/removed.

Reviewed-by: iustinp

d8470559 08/06/2008 05:56 pm Michael Hanselmann

Implement {Add,Readd,Remove}Node in GanetiContext

By doing this we've a central place which coordinates what needs to be
done when adding or removing nodes. Another patch will add calls into
the job queue.

Two log messages move to config.py.

When removing a node, node_leave_cluster is now called after it has...

4c848b18 08/06/2008 04:35 pm Michael Hanselmann

jqueue: Don't pass the list of nodes to SubmitJob anymore

The job queue now maintains its own list and is updated when
nodes are added or removed from the cluster.

Reviewed-by: iustinp

9113300d 08/06/2008 04:35 pm Michael Hanselmann

masterd: Move job queue into context object

The job queue must be called from cmdlib when adding or removing
nodes to the cluster. Moving it to the context objects makes
this possible.

Reviewed-by: iustinp

02f7fe54 08/06/2008 11:26 am Michael Hanselmann

Implement query for nodes

Reviewed-by: iustinp

ee6c7b94 08/06/2008 11:25 am Michael Hanselmann

Implement query for instances

Queries don't create jobs and are more efficient. Log messages
are not yet stored anywhere.

Reviewed-by: iustinp

59f187eb 07/30/2008 03:32 pm Iustin Pop

Unify SetupDaemon/SetupLogging

The 'old-style' info, error, debug logs do not make much sense. This
patch unifies the SetupLogging and SetupDaemon functions. As a result,
all the commands log to a 'commands.log' file.

The patch also changes the log setup to keep going if there's an error...

b1b6ea87 07/30/2008 11:43 am Iustin Pop

Rework master startup/shutdown/failover

This (big) patch reworks the master startup/shutdown and fixes the
master failover.

What does the patch do?

For master start/stop:
- remove the old ganeti-master script and its associated man page
- moves the ip start/stop directly into the backend.(Start|Stop)Master...

5675cd1f 07/30/2008 11:33 am Iustin Pop

Implement checking for the master role in rapi

This patch moves the CheckMaster function from ganeti-masterd to ssconf
(most logical place, it cannot go in utils since we would have recursive
imports between ssconf and utils) and changes ganeti-rapi to also call...

99e88451 07/29/2008 12:06 pm Iustin Pop

Use constants for the pid file stems

Reviewed-by: imsnah

3a2c7775 07/24/2008 02:32 pm Michael Hanselmann

Fix RPC parameters for {Cancel,Archive}Job

They aren't tuples on the client side.

Reviewed-by: iustinp

8feda3ad 07/23/2008 05:23 pm Guido Trotter

ganeti-masterd: write and remove pidfile

Reviewed-by: iustinp

c3f0a12f 07/23/2008 01:06 pm Iustin Pop

Distribute the queue serial file after each update

This patch adds distribution of the queue serial file after each write
to it (but before a new job is created and written with that ID, and
before a response is returned, so we should be safe from crashes in...

610bc9ee 07/21/2008 06:32 pm Michael Hanselmann

Use new signal handler class in master daemon

Reviewed-by: ultrotter

36088c4c 07/14/2008 06:52 pm Michael Hanselmann

Fix previous patch using workerpool in masterd

The function to stop a worker pool is TerminateWorkers(), not Shutdown().

Reviewed-by: iustinp

23e50d39 07/14/2008 06:42 pm Michael Hanselmann

Use workerpool in master daemon

Reusing threads instead of starting one for each request is more efficient.

Reviewed-by: iustinp

0ed468d3 07/10/2008 03:48 pm Michael Hanselmann

Remove more old job queue code

Apparently I forgot to remove this code when removing the rest.

Reviewed-by: iustinp

ff5fac04 07/09/2008 05:46 pm Iustin Pop

Fix double-logging in daemons

Currently, in debug mode, both the logfile handler and the stderr
handler will log debug messages. Since the stderr is redirected to the
same logfile (to catch non-logged errors), it means log entries are
doubled.

The patch adds an extra parameter to the logger.SetupDaemon() function...

d4fa5c23 07/09/2008 01:41 pm Iustin Pop

Remove the old locking functions

This removes (hopefully) all traces of the old locking functions and
uses.

Reviewed-by: imsnah

2467e0d3 07/09/2008 01:34 pm Michael Hanselmann

Remove old job queue code

Reviewed-by: iustinp

0bbe448c 07/09/2008 01:34 pm Michael Hanselmann

Change masterd/client RPC protocol

- Introduce abstraction class on client side
- Use constants for method names
- Adopt legacy function SubmitOpCode to use it

Reviewed-by: iustinp

3d8548c4 07/09/2008 01:34 pm Michael Hanselmann

Make luxi RPC more flexible

- Use constants for dict entries
- Handle exceptions on server side
- Rename client function to CallMethod to match server side naming

Reviewed-by: iustinp

50a3fbb2 07/09/2008 01:34 pm Michael Hanselmann

Instantiate new job queue in master daemon

Reviewed-by: iustinp

3b316acb 07/03/2008 03:06 pm Iustin Pop

Add custom logging setup for daemons

It's better for daemons if:
- they log only to one log file
- the log level is included
- for debug runs, the filename/line number is included

This patch moves the custom formatter from the watcher to the logging...
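
The three requirements listed above can be sketched with the standard `logging` module. This is an assumption-laden illustration, not the formatter from the patch: the function names and the exact format string are hypothetical.

```python
import logging


def build_daemon_format(program, debug=False):
    """Build a log format string that always includes the log level and,
    for debug runs, the source filename and line number."""
    fmt = "%(asctime)s: " + program + " "
    if debug:
        fmt += "%(filename)s:%(lineno)d "
    fmt += "%(levelname)s %(message)s"
    return fmt


def setup_daemon_logging(logfile, program, debug=False):
    """Send all log output of the process to a single log file."""
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter(build_daemon_format(program, debug)))
    root = logging.getLogger()
    root.handlers = [handler]  # exactly one destination: the log file
    root.setLevel(logging.DEBUG if debug else logging.INFO)
    return root
```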

cc2bea8b 07/02/2008 02:58 pm Michael Hanselmann

ganeti-masterd: Remove unused locking code

Reviewed-by: iustinp, ultrotter

96cb3986 07/02/2008 02:58 pm Michael Hanselmann

ganeti-masterd: Use logging module

Reviewed-by: ultrotter, iustinp

984f7c32 07/01/2008 03:28 pm Guido Trotter

Context: s/GLM/glm/

Make the GanetiLockManager instance of GanetiContext lowercase

Reviewed-by: imsnah

a478cd7e 07/01/2008 01:43 pm Guido Trotter

Increase the thread size to 5

Now that we use the locking library to make sure running opcodes cannot
step on each other's toes, we can have a bigger thread size, and
potentially process many opcodes in a parallel manner.

Reviewed-by: iustinp

1c901d13 07/01/2008 01:43 pm Guido Trotter

Processor: pass context in and use it.

The processor used to create a new ConfigWriter when it was initialized.
We now have one in the context, so we'll just recycle it. First of all
we'll pass the context in when creating a new Processor object, then
we'll just use context.cfg, which is guaranteed to be initialized, wherever...

39dcf2ef 06/30/2008 03:37 pm Guido Trotter

ganeti-masterd: init and distribute common context

This patch creates a new GanetiContext class, which is used to hold
context common to all ganeti worker threads. As for the
GanetiLockingManager class it is paramount that there is only one such
class throughout the execution of Ganeti, so the class checks for that,...

0db7ac4d 06/23/2008 06:00 pm Guido Trotter

Handle any exception in ganeti-masterd

Currently, if an uncaught exception is thrown, it destroys the calling
thread. This patch changes the behaviour to failing the current job,
logging a message, but trying to keep the daemon up.

Reviewed-by: imsnah
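
The behaviour described here (fail the current job, log a message, keep the worker alive) might look roughly like the sketch below. The dict-based job, the status strings and the function name are hypothetical stand-ins, not Ganeti's job objects.

```python
import logging


def run_job_safely(job, execute):
    """Run one job so that no exception can escape into the worker thread.

    `job` is a plain dict and `execute` a callable taking that dict; both
    are assumptions made for this illustration.
    """
    try:
        job["result"] = execute(job)
        job["status"] = "success"
    except Exception as err:
        # Log the full traceback, then mark the job as failed instead of
        # letting the exception kill the calling thread.
        logging.exception("Unhandled error while processing job %s",
                          job.get("id"))
        job["status"] = "error"
        job["result"] = str(err)
    return job
```

Catching `Exception` rather than using a bare `except` leaves `KeyboardInterrupt` and `SystemExit` free to stop the daemon.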

ea6e6c2b 06/13/2008 01:14 pm Guido Trotter

Fail job on ganeti exceptions

When a Job raises a ganeti exception a message is printed but nothing is
reported in the job itself. It's better to update the job status, thus
notifying the client, possibly polling for the job result, of what went
wrong.

Reviewed-by: iustinp

ce862cd5 05/01/2008 02:15 pm Guido Trotter

ganeti-masterd: Some docstrings work

- Add a docstring to IOServer's constructor
- Add argument description to PoolWorker's and JobRunner's ones

Reviewed-by: iustinp

b74159ee 04/29/2008 10:37 am Iustin Pop

Disable forking in the master daemon

This patch adds a mechanism to disable utils.RunCmd in selected
programs. This is needed in the master daemon unless we confirm
threading doesn't pose any problems.

This makes cluster init fail, but creating new trunk clusters is anyway...

a4af651e 04/28/2008 04:02 pm Iustin Pop

Move the 'cmd' lock from cli.py to ganeti-masterd

This patch removes the lock and the lock options from cli.py and moves
them to the master.

Later during development we can remove it completely, but for now it's
good to protect any other tool that uses the lock directly....

685ee993 04/28/2008 04:01 pm Iustin Pop

Convert cli.SubmitOpCode to use the master

This patch converts the cli.py SubmitOpCode method to use the unix
protocol and thus execute the opcodes via the master.

The patch allows a partial burnin to work with the master. Currently the
query opcodes, since they are executed via the SubmitOpCode, are...

35049ff2 04/10/2008 06:36 pm Iustin Pop

Add per-opcode results to job processing

This patch changes the definition of a job and introduces per-opcode
results.

First, the result and status fields of a job are condensed into a single
'status' attribute. Then, we introduce an opcode status and one result...

c1f2901b 04/05/2008 06:29 pm Iustin Pop

Implement forking/master role checking in masterd

This patch adds checks for the master role and daemonize support to
ganeti-masterd.

The patch modifies the startup/shutdown of the server because:
- we want bind()/listen() to the master socket to occur before forking...

7a1ecaed 04/04/2008 03:44 pm Iustin Pop

Add a simple gnt-job script

This patch adds a very basic gnt-job script that allows job querying.
This goes on top of the previous master daemon patches.

Currently, because of the not-changed cmd lock, you can't query the jobs
as long as a job is running - you have to rm the cmd lock and then you...

ffeffa1d 04/01/2008 05:45 pm Iustin Pop

Initial tests with ganeti-masterd

This patch adds a very in-progress master daemon. This needs to be
launched manually, does not background itself, but can be used for
opcode execution.

Also parts of this code should be moved to luxi.py.

Reviewed-by: ultrotter