Statistics
| Branch: | Tag: | Revision:

root / lib / jqueue.py @ 8b3fd458

History | View | Annotate | Download (23.3 kB)

# Date Author Comment
72737a7f 10/10/2008 12:55 pm Iustin Pop

Convert rpc module to RpcRunner

This big patch changes the call model used in internode-rpc from
standalong function calls in the rpc module to via a RpcRunner class,
that holds all the methods. This can be used in the future to enable
smarter processing in the RPC layer itself (some quick examples are not...

e92376d7 10/07/2008 11:03 am Iustin Pop

Implement job 'waiting' status

Background: when we have multiple jobs in the queue (more than just a
few), many of the jobs (up to the number of threads) will be in state
'running', although many of them could be actually blocked, waiting for
some locks. This is not good, as one cannot easily see what is...

07cd723a 10/06/2008 07:42 pm Iustin Pop

Implement job auto-archiving

This patch adds a new luxi call that implements auto-archiving of jobs
older than a certain age (or -1 for all completed jobs), and the gnt-job
command that makes use of this (with 'all' for -1).

Reviewed-by: imsnah

1daae384 10/06/2008 04:29 pm Iustin Pop

Increase the number of threads to 25

Since our locks are not gathered nicely, we can have jobs that are
actually blocking on locks (parallel burnin shows this), so at least we
need to increase the number of threads above the usual number of jobs we
could have in a such a case....

c56ec146 09/30/2008 03:04 pm Iustin Pop

Enhance the job-related timestamps

This patch adds start, stop, and received timestamp for jobs (and allows
querying of them), and allows querying of the opcode timestamps.

Reviewed-by: imsnah

5b23c34c 09/29/2008 06:38 pm Iustin Pop

Add opcode execution log in job info

This patch adds the job execution log in “gnt-job info” and also allows
its selection in “gnt-job list” (however here it's not very useful as
it's not easy to parse). It does this by adding a new field in the query
job call, named ‘oplog’....

60dd1473 09/29/2008 04:09 pm Iustin Pop

Implement job summary in gnt-job list

It is not currently possibly to show a summary of the job in the output
of “gnt-job list”. The closes is listing the whole opcode(s), but that
is too verbose. Also, the default output (id, status) is not very
useful, unless one looks for (and knows about) an exact job ID....

3b87986e 09/29/2008 04:08 pm Iustin Pop

Nicely sort the job list

Unless we decide to change the job identifiers to integer, we should at
least sort the list returned by _GetJobIDsUnlocked.

Reviewed-by: imsnah

e74798c1 09/10/2008 08:46 pm Michael Hanselmann

jqueue: Add common RPC error handling function

We didn't decide yet what exactly it should do with failed nodes.

Reviewed-by: ultrotter

5c735209 08/29/2008 04:42 pm Iustin Pop

Make WaitForJobChanges deal with long jobs

This patch alters the WaitForJobChanges luxi-RPC call to have a
configurable timeout, so that the call behaves nicely with long jobs
that have no update.

We do this by adding a timeout parameter in the RPC call, and returning...

5685c1a5 08/27/2008 05:52 pm Michael Hanselmann

jqueue: Replace normal cache dict with weakref dict

A job should only exist once in memory. After the cache is cleaned,
there can still be references to a job somewhere else. If there
are multiple instances, one can get updated while a function is
waiting for changes on another instance. By using...

70552c46 08/27/2008 05:52 pm Michael Hanselmann

jqueue: Keep timestamp of opcode start and end

Reviewed-by: ultrotter

65548ed5 08/27/2008 05:48 pm Michael Hanselmann

jqueue: Reset run_op_idx after job is done

It can be confusing otherwise.

Reviewed-by: ultrotter

6c5a7090 08/27/2008 11:34 am Michael Hanselmann

Make sure that client programs get all messages

This is a large patch, but I can't figure out how to split it without
breaking stuff. The old way of getting messages by always getting the
last one didn't bring all messages to the client if they were added...

dfe57c22 08/11/2008 07:27 pm Michael Hanselmann

Add RPC call to wait for job changes

This way clients can react faster to status or message changes and
don't have to poll anymore.

Reviewed-by: ultrotter

d5e317ba 08/11/2008 07:27 pm Michael Hanselmann

jqueue: Change log message time format

See the comment in the patch.

Reviewed-by: ultrotter

abc1f2ce 08/08/2008 02:21 pm Michael Hanselmann

jqueue: Move archived jobs on all nodes

Otherwise one might have archived jobs back in the list after a master
failover.

Reviewed-by: iustinp

5d6fb8eb 08/08/2008 01:03 pm Michael Hanselmann

jstore: Change to not always require a lock

This way we can do locking when both noded and masterd are running
on the same machine, the latter holding an exclusive lock on the
queue.

Reviewed-by: iustinp

9f774ee8 08/08/2008 01:01 pm Michael Hanselmann

jqueue: Use new job queue RPC functions

Reviewed-by: iustinp

d2e03a33 08/06/2008 04:36 pm Michael Hanselmann

jqueue: Implement {Add,Remove}Node

These functions will be used to notify the queue about newly added
or removed nodes.

Reviewed-by: iustinp

4c848b18 08/06/2008 04:35 pm Michael Hanselmann

jqueue: Don't pass the list of nodes to SubmitJob anymore

The job queue now maintains its own list and is updated when
nodes are added or removed from the cluster.

Reviewed-by: iustinp

8e00939c 08/06/2008 04:35 pm Michael Hanselmann

Maintain node list in job queue

The code makes sure not to include the master in the list.

Reviewed-by: iustinp

23752136 08/05/2008 01:33 pm Michael Hanselmann

jqueue: Replicate jobs to all nodes

Newly added nodes are not yet taken care of. Queue locking on
non-master nodes is not yet correct.

Reviewed-by: iustinp

04ab05ce 08/04/2008 03:27 pm Michael Hanselmann

jqueue: Use new jstore module

Reviewed-by: iustinp

db37da70 07/31/2008 06:03 pm Michael Hanselmann

jqueue: Move assert into decorator

This reduces code duplication. A later patch will modify the job queue
a bit more and will need a change of this assert. The assertion is
also removed from all class-internal functions.

Reviewed-by: iustinp

5bdce580 07/31/2008 05:33 pm Michael Hanselmann

jqueue: Store context in job queue instead of worker pool

The job queue will need to access to configuration, which is provided
through the context object, to get a list of nodes.

Reviewed-by: iustinp

38206f3c 07/30/2008 05:04 pm Iustin Pop

Fix pylint-detected issues

This is mostly:
- whitespace fix (space at EOL in some files, not all, broken
indentation, etc)
- variable names overriding others (one is a real bug in there)
- too-long-lines
- cleanup of most unused imports (not all)...

85f03e0d 07/30/2008 01:02 pm Michael Hanselmann

Rewrite job queue

We found several issues in the old job queue implementation. It had race
conditions, deadlocks and other deficiencies.

Short summary:
- _QueuedOpCode and _QueuedJob are now more or less data structures with a few
utility functions. __Setup is gone....

8090e19f 07/29/2008 05:07 pm Michael Hanselmann

jqueue: Fix error logging

The passed parameters were not correct.

Reviewed-by: iustinp, ultrotter

188c5e0a 07/28/2008 01:13 pm Michael Hanselmann

Implement job canceling on server side

Locking is not completeley right due to a deadlock when the job calls
UpdateJob after changing its status.

Reviewed-by: ultrotter

4cb1d919 07/28/2008 12:16 pm Michael Hanselmann

Add “canceled” status for opcodes

Reviewed-by: ultrotter

fae737ac 07/25/2008 03:47 pm Michael Hanselmann

Move code extracting job ID into function

It might come in handy at some point and makes the code a bit easier
to read.

Reviewed-by: iustinp

c609f802 07/24/2008 02:32 pm Michael Hanselmann

Implement job archiving on the server side

So far no error reporting to the client is done. Clients don't get
noticed if a job doesn't exist or couldn't be archived because of
its current status.

The internal cache is always cleaned when the preconditions didn't...

0cb94105 07/24/2008 02:32 pm Michael Hanselmann

Add directory for archived jobs

Reviewed-by: iustinp

ce594241 07/23/2008 07:56 pm Michael Hanselmann

Move code formatting job ID into a base class

A later patch will add a memory based job storage class, hence this
code is going into a separate class. It also changes the number format
to always use at least 10 digits, allowing up to 9'999'999'999 jobs to...

21cc1fbd 07/23/2008 04:30 pm Michael Hanselmann

Rename JobStorage to DiskJobStorage

Reviewed-by: iustinp

205d71fd 07/23/2008 03:25 pm Michael Hanselmann

Fix logging with string job IDs

The job ID is now a string, hence logging must use %s instead of %d.

Reviewed-by: iustinp

3be9a705 07/23/2008 02:34 pm Michael Hanselmann

Make job ID a string

The docstring says that _NewSerialUnlocked returns “a string
representing the job identifier”. Until now it returned an
integer and this patch changes it.

Reviewed-by: iustinp

c3f0a12f 07/23/2008 01:06 pm Iustin Pop

Distribute the queue serial file after each update

This patch adds distribution of the queue serial file after each write
to it (but before a new job is created and written with that ID, and
before a response is returned, so we should be safe from crashes in...

c4beba1c 07/23/2008 01:06 pm Iustin Pop

Make the job storage init reuse a serial file

This will be needed for master failover. If we don't have a valid queue
directory, we need to reinitialize it, but we should keep the existing
serial number.

As such, we abstract the reading of the serial and if we find a valid...

57f8615f 07/22/2008 05:05 pm Michael Hanselmann

Make argument to CleanCacheUnlocked mandatory

Not passing the argument means it has the value None. Iterating None
doesn't work:
>>> "123" in None
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: iterable argument required...

bac5ffc3 07/17/2008 03:51 pm Oleksiy Mishchenko

Implement jobs resource in RAPI

Reviewed-by: imsnah

f0d874fe 07/15/2008 01:49 pm Iustin Pop

Sort the job list in _GetJobIDsUnlocked

Since the IDs are integers, we can simply sort them.

Reviewed-by: imsnah

f1048938 07/14/2008 04:15 pm Iustin Pop

First version of user feedback fixes

This patch contains a raw version for fixing feedback_fn.

The new mechanism works as follows:
- instead of a per-Processor feedback_fn, there's one for each
ExecOpCode, so that feedback for different opcodes go via possibly...

ac0930b9 07/14/2008 02:27 pm Iustin Pop

Cache some jobs in memory

This patch adds a caching mechanisms to the JobStorage. Note that is
does not make the memory cache authoritative.

The algorithm is:
- all jobs loaded from disks are entered in the cache
- all new jobs are entered in the cache...

8a70e415 07/14/2008 02:12 pm Iustin Pop

Fix JobStorage._GetJobIDsUnlocked

The job ID returned must be an integer (and the regex enforces that),
but we didn't convert it manually.

Reviewed-by: imsnah

911a495b 07/14/2008 01:08 pm Iustin Pop

Change JobStorage to work with ids not filenames

Currently some of the functions in JobStorage work with filenames (which
is an implementation detail and should only be used when dealing with
the storage) and not with job IDs. We need to change this in order to...

f1da30e6 07/11/2008 07:17 pm Michael Hanselmann

Add experimental persistency to job queue

It's not perfect and it's not finished, but it's a start.

- Serial number is read only once, but written on each update
- Jobs are kept only on disk (caching will be implemented)

Reviewed-by: iustinp

af30b2fd 07/11/2008 01:25 pm Michael Hanselmann

Make "gnt-job list" work again

"gnt-job list" was broken after my recent changes in the RPC
between clients and the master. This patch makes it work again.

Reviewed-by: iustinp

307149a8 07/10/2008 03:29 pm Iustin Pop

Switch _QueuedOpCode to have their own lock

Right now, the queued opcode doesn't have a lock, and instead relies on
the parent QueuedJob's lock.

This is not good for logging feedback, so it's better to have a lock for
each queuedopcode.

Reviewed-by: ultrotter

7996a135 07/10/2008 03:17 pm Iustin Pop

Add a simple decorator for instance methods

This is just a simple, hardcoded decorator for object methods needing
synchronization on the _lock instance attribute.

Reviewed-by: ultrotter

c8549bfd 07/10/2008 12:22 pm Michael Hanselmann

jqueue: Log more information when running opcodes

Reviewed-by: iustinp

2467e0d3 07/09/2008 01:34 pm Michael Hanselmann

Remove old job queue code

Reviewed-by: iustinp

e2715f69 07/09/2008 01:33 pm Michael Hanselmann

Add very simple job queue

Reviewed-by: iustinp

5d3a153a 06/13/2008 01:14 pm Guido Trotter

Fix a typo in jqueue.py

s/result/op_result/ (this code was never used, so this wasn't caught)

Reviewed-by: iustinp

35049ff2 04/10/2008 06:36 pm Iustin Pop

Add per-opcode results to job processing

This patch changes the definition of a job and introduces per-opcode
results.

First, the result and status fields of a job are condensed into a single
'status' attribute. Then, we introduce an opcode status and one result...

283439c9 04/07/2008 02:18 pm Iustin Pop

Implement selective job query

This patch implements query-ing of only selected jobs instead of all.

Reviewed-by: ultrotter

7a1ecaed 04/04/2008 03:44 pm Iustin Pop

Add a simple gnt-job script

This patch adds a very basic gnt-job script that allows job querying.
This goes on top of the previous master daemon patches.

Currently, because of the not-changed cmd lock, you can't query the jobs
as long as a job is running - you have to rm the cmd lock and then you...

498ae1cc 04/01/2008 04:04 pm Iustin Pop

A dumb queue implementation

This patch adds a very dumb in-memory only queue implementation.

Reviewed-by: imsnah