Job Queue
=========

.. contents::

Overview
--------

In Ganeti 1.2, operations in a cluster have to be done in a serialized way.
Virtually any operation locks the whole cluster by grabbing the global lock.
Other commands can't return before all work has been done.

By implementing a job queue and granular locking, we can lower the latency of
command execution inside a Ganeti cluster.


Detailed Design
---------------

Job execution: "Life of a Ganeti job"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. The client submits a job. A new job identifier is generated and
   assigned to the job. The job is then automatically replicated [#replic]_
   to all nodes in the cluster. The identifier is returned to the client.
#. A pool of worker threads waits for new jobs. If all workers are busy, the
   job has to wait, and the first worker to finish its work will grab it.
   Otherwise any of the waiting threads will pick up the new job.
#. The client waits for job status updates by calling a waiting RPC function.
   Log messages may be shown to the user. Until the job is started, it can
   also be canceled.
#. As soon as the job is finished, its final result and status can be retrieved
   from the server.
#. If the client archives the job, it gets moved to a history directory.
   There will be a method to archive all jobs older than a given age.

.. [#replic] We need replication in order to maintain consistency across
   all nodes in the system; the master node only differs in the fact that
   it is currently running the master daemon, but if it fails and we do a
   master failover, the jobs are still visible on the new master (even
   though they will be marked as failed).

   Failures to replicate a job to other nodes will only be flagged as
   errors in the master daemon log if more than half of the nodes failed;
   otherwise we ignore the failure and rely on the fact that the next
   update (for still running jobs) will retry the update. For finished
   jobs, it is less of a problem.

   Future improvements will look into checking the consistency of the job
   list and jobs themselves at master daemon startup.

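The way waiting worker threads pick up jobs (step 2 above) can be
illustrated with a minimal sketch. It is not the Ganeti implementation:
the function name ``run_worker_pool``, the sentinel-based shutdown and the
``("success", ...)``/``("error", ...)`` result tuples are assumptions made
for this example, and replication, logging and locking are left out.

.. code-block:: python

  import queue
  import threading


  def run_worker_pool(jobs, num_workers=3):
      """Run each callable in ``jobs`` on a pool of worker threads.

      Returns a dict mapping a job ID (assigned in submission order,
      starting at 1) to a ``(status, result)`` tuple.
      """
      pending = queue.Queue()
      results = {}
      results_lock = threading.Lock()

      def worker():
          while True:
              item = pending.get()  # blocks until a job is available
              if item is None:  # sentinel: no more jobs for this worker
                  pending.task_done()
                  return
              job_id, func = item
              try:
                  outcome = ("success", func())
              except Exception as err:
                  outcome = ("error", str(err))
              with results_lock:
                  results[job_id] = outcome
              pending.task_done()

      threads = [threading.Thread(target=worker) for _ in range(num_workers)]
      for thread in threads:
          thread.start()
      for job_id, func in enumerate(jobs, start=1):
          pending.put((job_id, func))
      for _ in threads:
          pending.put(None)  # one sentinel per worker
      pending.join()
      for thread in threads:
          thread.join()
      return results

If more jobs are submitted than there are workers, the extra jobs simply
stay in the queue until a worker becomes free, which matches the behaviour
described in the list above.
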

Job storage
~~~~~~~~~~~

Jobs are stored in the filesystem as individual files, serialized
using JSON (the standard serialization mechanism in Ganeti).

The choice of storing each job in its own file was made because:

- a file can be atomically replaced
- a file can easily be replicated to other nodes
- checking consistency across nodes can be implemented very easily, since
  all job files should be (at a given moment in time) identical

The other possible choices that were discussed and discounted were:

- single big file with all job data: not feasible due to difficult updates
- in-process databases: hard to replicate the entire database to the
  other nodes, and replicating individual operations does not mean we keep
  consistency

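Because all copies of a job file should be identical, a consistency check
can be as simple as comparing per-file digests. The following is only a
sketch of that idea: the helper names ``queue_fingerprint`` and
``differing_jobs`` are hypothetical, SHA-256 is an arbitrary choice, and
both queue directories are assumed to be readable locally (in a cluster,
the digests would have to be collected from each node).

.. code-block:: python

  import hashlib
  import os


  def queue_fingerprint(queue_dir):
      """Map each job file name in ``queue_dir`` to the SHA-256 of its content."""
      digests = {}
      for name in sorted(os.listdir(queue_dir)):
          path = os.path.join(queue_dir, name)
          if name.startswith("job-") and os.path.isfile(path):
              with open(path, "rb") as handle:
                  digests[name] = hashlib.sha256(handle.read()).hexdigest()
      return digests


  def differing_jobs(master_dir, node_dir):
      """Return the sorted names of job files that differ or are missing."""
      master = queue_fingerprint(master_dir)
      node = queue_fingerprint(node_dir)
      return sorted(name for name in master.keys() | node.keys()
                    if master.get(name) != node.get(name))

An empty result means the two directories hold the same job files with the
same content.
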

Queue structure
~~~~~~~~~~~~~~~

All file operations have to be done atomically by writing to a temporary file
and subsequently renaming it. Except for log messages, every change in a job is
stored and replicated to other nodes.

::

  /var/lib/ganeti/queue/
    job-1 (JSON encoded job description and status)
    […]
    job-37
    job-38
    job-39
    lock (Queue managing process opens this file in exclusive mode)
    serial (Last job ID used)
    version (Queue format version)

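The "write to a temporary file, then rename" pattern could look roughly
like the sketch below. The function name ``write_job_atomically``, the
temporary file prefix and the ``fsync`` call are assumptions of this
example, not part of the design. The only property relied upon is that a
rename within one filesystem is atomic on POSIX systems.

.. code-block:: python

  import json
  import os
  import tempfile


  def write_job_atomically(queue_dir, job_id, job_data):
      """Write ``job_data`` as JSON to ``<queue_dir>/job-<job_id>``.

      Readers see either the old file or the new one, never a partial write.
      """
      final_path = os.path.join(queue_dir, "job-%d" % job_id)
      # The temporary file lives in the same directory so that the rename
      # never crosses a filesystem boundary.
      fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".job-tmp-")
      try:
          with os.fdopen(fd, "w") as handle:
              json.dump(job_data, handle)
              handle.flush()
              os.fsync(handle.fileno())  # data is on disk before the rename
          os.replace(tmp_path, final_path)
      except BaseException:
          # Do not leave a stale temporary file behind on failure.
          if os.path.exists(tmp_path):
              os.unlink(tmp_path)
          raise
      return final_path

The same helper could be used for the ``serial`` file, since it is updated
in the same all-or-nothing fashion.
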

Locking
~~~~~~~

Locking in the job queue is a complicated topic. The job queue is called from
more than one thread and must be thread-safe. For simplicity, a single lock is
used for the whole job queue.

A more detailed description can be found in doc/locking.txt.

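As an illustration of the "single lock for the whole queue" approach, the
sketch below guards every public method of a toy queue with one lock. The
class ``JobQueue``, the decorator ``_requires_lock`` and the in-memory
storage are hypothetical and stand in for the real, file-backed queue. The
actual locking rules are the ones described in doc/locking.txt.

.. code-block:: python

  import functools
  import threading


  def _requires_lock(method):
      """Run ``method`` while holding the queue-wide lock."""
      @functools.wraps(method)
      def wrapper(self, *args, **kwargs):
          with self._lock:
              return method(self, *args, **kwargs)
      return wrapper


  class JobQueue:
      def __init__(self):
          self._lock = threading.Lock()  # one lock for the whole queue
          self._last_id = 0
          self._jobs = {}

      @_requires_lock
      def submit(self, ops):
          """Store a list of opcodes and return a new job ID."""
          self._last_id += 1
          self._jobs[self._last_id] = list(ops)
          return self._last_id

      @_requires_lock
      def count(self):
          return len(self._jobs)

Since ``threading.Lock`` is not reentrant, a locked method in this sketch
must not call another locked method. That is the price paid for the
simplicity of a single lock.
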

Internal RPC
~~~~~~~~~~~~

RPC calls available between Ganeti master and node daemons:

jobqueue_update(file_name, content)
  Writes a file in the job queue directory.
jobqueue_purge()
  Cleans the job queue directory completely, including archived jobs.
jobqueue_rename(old, new)
  Renames a file in the job queue directory.


Client RPC
~~~~~~~~~~

RPC between Ganeti clients and the Ganeti master daemon supports the following
operations:

SubmitJob(ops)
  Submits a list of opcodes and returns the job identifier. The identifier is
  guaranteed to be unique during the lifetime of a cluster.
WaitForJobChange(job_id, fields, […], timeout)
  This function waits until a job changes or a timeout expires. Whether a job
  has changed is determined by the fields passed and the last log message
  received.
QueryJobs(job_ids, fields)
  Returns field values for the job identifiers passed.
CancelJob(job_id)
  Cancels the job specified by identifier. This operation may fail if the job
  is already running, canceled or finished.
ArchiveJob(job_id)
  Moves a job into the …/archive/ directory. This operation will fail if the
  job has not been canceled or finished.

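To make the failure rules of these calls concrete, here is a small
in-memory stand-in for the master daemon. It is purely illustrative: the
class ``FakeMaster``, the lowercase status strings, the use of
``RuntimeError`` and the shape of the ``QueryJobs`` return value are
assumptions of this sketch. ``WaitForJobChange`` is omitted because its
full parameter list is not given here, and the sketch also assumes that
only jobs still in the queued state can be canceled.

.. code-block:: python

  import itertools

  FINAL_STATES = {"canceled", "success", "error"}


  class FakeMaster:
      """In-memory stand-in for the master daemon (not the real thing)."""

      def __init__(self):
          self._ids = itertools.count(1)  # identifiers are never reused
          self._jobs = {}
          self._archive = {}

      def SubmitJob(self, ops):
          job_id = next(self._ids)
          self._jobs[job_id] = {"ops": list(ops), "status": "queued"}
          return job_id

      def QueryJobs(self, job_ids, fields):
          # One list of field values per requested job (assumed shape).
          return [[self._jobs[job_id][field] for field in fields]
                  for job_id in job_ids]

      def CancelJob(self, job_id):
          job = self._jobs[job_id]
          if job["status"] != "queued":
              raise RuntimeError("job %d is %s and cannot be canceled"
                                 % (job_id, job["status"]))
          job["status"] = "canceled"

      def ArchiveJob(self, job_id):
          status = self._jobs[job_id]["status"]
          if status not in FINAL_STATES:
              raise RuntimeError("job %d is %s and cannot be archived"
                                 % (job_id, status))
          self._archive[job_id] = self._jobs.pop(job_id)

A client would typically submit a job, keep the returned identifier, and
use it for all later queries, cancellation and archival.
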

Job and opcode status
~~~~~~~~~~~~~~~~~~~~~

Each job and each opcode has, at any time, one of the following states:

Queued
  The job/opcode was submitted, but has not yet started.
Waiting
  The job/opcode is waiting for a lock to proceed.
Running
  The job/opcode is running.
Canceled
  The job/opcode was canceled before it started.
Success
  The job/opcode ran and finished successfully.
Error
  The job/opcode was aborted with an error.

If the master is aborted while a job is running, the job will be set to the
Error status once the master has started again.

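The states above, together with the restart rule, can be sketched as an
enumeration plus a small recovery helper. The names ``Status`` and
``recover_after_restart`` and the string values are hypothetical. The
sketch only handles jobs that were running, because that is the only case
the text specifies. What happens to waiting jobs is left open.

.. code-block:: python

  import enum


  class Status(enum.Enum):
      QUEUED = "queued"
      WAITING = "waiting"
      RUNNING = "running"
      CANCELED = "canceled"
      SUCCESS = "success"
      ERROR = "error"


  def recover_after_restart(jobs):
      """Set jobs that were running when the master stopped to Error.

      ``jobs`` maps job IDs to ``Status`` values and is updated in place.
      Returns the sorted list of job IDs whose status was changed.
      """
      interrupted = sorted(job_id for job_id, status in jobs.items()
                           if status is Status.RUNNING)
      for job_id in interrupted:
          jobs[job_id] = Status.ERROR
      return interrupted

Running the helper a second time changes nothing, so it is safe to call on
every master startup.
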

History
~~~~~~~

Archived jobs are kept in a separate directory,
/var/lib/ganeti/queue/archive/. This is done in order to speed up the
queue handling: by default, the jobs in the archive are not touched by
any functions. Only the current (unarchived) jobs are parsed, loaded,
and verified (if implemented) by the master daemon.

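Archiving all jobs older than a given age could be sketched as below. This
is an illustration under several assumptions: the function name
``archive_old_jobs`` is hypothetical, a job's age is taken from the file
modification time, and each job file is assumed to be a JSON object with a
``status`` key holding one of the lowercase strings shown. Only jobs that
are canceled or finished are moved, in line with the rule for ArchiveJob.

.. code-block:: python

  import json
  import os
  import time

  FINAL_STATES = {"canceled", "success", "error"}


  def archive_old_jobs(queue_dir, max_age_seconds, now=None):
      """Move finished job files older than ``max_age_seconds`` to archive/.

      Returns the sorted list of archived file names.
      """
      now = time.time() if now is None else now
      archive_dir = os.path.join(queue_dir, "archive")
      os.makedirs(archive_dir, exist_ok=True)
      moved = []
      for name in sorted(os.listdir(queue_dir)):
          path = os.path.join(queue_dir, name)
          if not (name.startswith("job-") and os.path.isfile(path)):
              continue  # skips lock, serial, version and the archive itself
          with open(path) as handle:
              status = json.load(handle).get("status")
          too_old = now - os.path.getmtime(path) > max_age_seconds
          if status in FINAL_STATES and too_old:
              os.rename(path, os.path.join(archive_dir, name))
              moved.append(name)
      return moved

After the move, the master daemon no longer has to parse these files when
it loads the current queue.
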

Ganeti updates
~~~~~~~~~~~~~~

The queue has to be completely empty for Ganeti updates with changes
in the job queue structure. In order to allow this, there will be a
way to prevent new jobs entering the queue.