============
Chained jobs
============

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`
- :doc:`LU-generated jobs <design-lu-generated-jobs>`


Current state and shortcomings
==============================

Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific
order. Due to the job queue's current design, such a guarantee cannot
be given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.

One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished,
the next group is submitted and waited for. There are different kinds
of clients for Ganeti, some of which don't share code (e.g. Python
clients vs. htools). This design proposes a solution which would be
implemented as part of the job queue in the master daemon.


Proposed changes
================

With the implementation of :ref:`job priorities
<jqueue-job-priority-design>` the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can't be acquired, allowing other
jobs/opcodes to be run in the meantime.

The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a
dependency on the success of job Y and would only be run once job Y is
finished. If there's a dependency on success and job Y failed, job X
would fail as well. Since such dependencies would use job IDs, the jobs
still need to be submitted in the right order.

.. pyassert::

  # Update description below if finalized job statuses change
  constants.JOBS_FINALIZED == frozenset([
    constants.JOB_STATUS_CANCELED,
    constants.JOB_STATUS_SUCCESS,
    constants.JOB_STATUS_ERROR,
    ])

The new attribute's value would be a list of two-valued tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only finalized statuses are accepted
(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
equivalent to specifying all finalized statuses (except
:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.

Any job referring to a cancelled job is also cancelled unless it
explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
status.

In case a referenced job cannot be found in the normal queue or the
archive, referring jobs fail as the status of the referenced job can't
be determined.
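
Taken together, these rules could be checked roughly as follows. This
is a minimal, hypothetical sketch: the status values mirror Ganeti's
``constants`` module, but the helper function and its return codes are
illustrative only, not the actual queue code.

.. code-block:: python

  JOB_STATUS_SUCCESS = "success"
  JOB_STATUS_ERROR = "error"
  JOB_STATUS_CANCELED = "canceled"

  JOBS_FINALIZED = frozenset([
    JOB_STATUS_CANCELED,
    JOB_STATUS_SUCCESS,
    JOB_STATUS_ERROR,
    ])

  # Possible outcomes of a dependency check
  (WAIT, CONTINUE, CANCEL, FAIL) = range(4)

  def CheckDependencies(depends, lookup_status):
    """Evaluates an opcode's "depend" attribute.

    depends: list of (job_id, requested_statuses) pairs
    lookup_status: function returning a job's status, or None if the
      job can be found neither in the queue nor in the archive

    """
    for (job_id, wanted) in depends:
      status = lookup_status(job_id)

      if status is None:
        # Referenced job is gone, its status can't be determined
        return FAIL

      if status not in JOBS_FINALIZED:
        # Referenced job hasn't finished yet, keep waiting
        return WAIT

      if (status == JOB_STATUS_CANCELED and
          JOB_STATUS_CANCELED not in wanted):
        # Cancellation propagates unless explicitly requested
        return CANCEL

      # An empty list is equivalent to all finalized statuses except
      # "canceled", which was already handled above
      if wanted and status not in wanted:
        return FAIL

    return CONTINUE

The real implementation would additionally have to resolve relative job
IDs and re-register waiting jobs for status notifications; the sketch
covers the decision logic only.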

With this change, clients can submit all wanted jobs in the right order
and proceed to wait for changes on all these jobs (see
``cli.JobExecutor``). The master daemon will take care of executing
them in the right order, while still presenting the client with a
simple interface.
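
As a rough sketch of that flow, assuming Ganeti's Python client library
(``cli.JobExecutor`` with its ``QueueJob`` and ``GetResults`` methods;
the opcode classes and their parameters here are illustrative only):

.. code-block:: python

  from ganeti import cli, opcodes

  je = cli.JobExecutor()

  # Queue all jobs in dependency order; the master daemon takes care
  # of the actual sequencing
  je.QueueJob("failover",
              opcodes.OpInstanceFailover(instance_name="inst1"))
  je.QueueJob("setparams",
              opcodes.OpNodeSetParams(node_name="node1", offline=True))

  # Wait for all jobs and collect (success, result) pairs
  for (success, result) in je.GetResults():
    if not success:
      raise RuntimeError("Job failed: %s" % result)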

Clients using the ``SubmitManyJobs`` interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.
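
A sketch of such a submission (opcode dicts are abbreviated; per the
description above, the negative IDs are assumed to count backwards from
the depending job within the same submission):

.. code-block:: python

  jobs = [
    # First job, relative ID -2 as seen from the third job
    [{"OP_ID": "OP_INSTANCE_REPLACE_DISKS"}],

    # Second job, relative ID -1 as seen from the third job
    [{"OP_ID": "OP_INSTANCE_MIGRATE"}],

    # Third job, only run once both previous jobs have succeeded
    [{"OP_ID": "OP_NODE_SET_PARAMS",
      "depend": [
        [-2, ["success"]],
        [-1, ["success"]],
        ],
      }],
    ]

  # A LUXI client would then submit all of them in one call, e.g.
  #   client.SubmitManyJobs(jobs)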

.. highlight:: javascript

Example data structures::

  # First job
  {
    "job_id": "6151",
    "ops": [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
      ],
  }

  # Second job, runs in parallel with first job
  {
    "job_id": "7687",
    "ops": [
      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
      ],
  }

  # Third job, depending on success of previous jobs
  {
    "job_id": "9218",
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [6151, ["success"]],
          [7687, ["success"]],
          ],
        "offline": True, },
      ],
  }


Other discussed solutions
=========================

Job-level attribute
-------------------

At first glance it might seem better to put dependencies on previous
jobs at the job level. However, it turns out that having the option of
defining only a single opcode in a job as having such a dependency can
be useful as well. The code complexity in the job queue is equivalent,
if not simpler.

Since opcodes are guaranteed to run in order, clients can just define
the dependency on the first opcode, as sketched below.
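
For example, a job can be gated on another job as a whole by attaching
the dependency to its first opcode only (a sketch, with opcode dicts
abbreviated as before):

.. code-block:: python

  job_ops = [
    # Only the first opcode carries the dependency...
    {"OP_ID": "OP_INSTANCE_FAILOVER",
     "depend": [[6151, ["success"]]],
     },
    # ...later opcodes wait implicitly, as opcodes within a job run
    # strictly in order
    {"OP_ID": "OP_NODE_SET_PARAMS"},
    ]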

Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and
would need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.


Client-side logic
-----------------

There's at least one implementation of a batched job executor twisted
into the ``burnin`` tool's code. While certainly possible, a
client-side solution should be avoided due to the variety of clients
already in use. For one, the :doc:`remote API <rapi>` client shouldn't
import non-standard modules. htools are written in Haskell and can't
use Python modules at all. A batched job executor contains quite a bit
of logic. Even if cleanly abstracted in a (Python) library, sharing
code between the different clients is difficult, if not impossible.


.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
|