============
Chained jobs
============

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`
- :doc:`LU-generated jobs <design-lu-generated-jobs>`


Current state and shortcomings
==============================

Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific order.
Due to the job queue's current design, such a guarantee cannot be
given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.

One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished, the
next group is submitted and waited for. However, there are different
kinds of clients for Ganeti, some of which don't share code (e.g. Python
clients vs. htools), so each of them would need its own implementation.
This design instead proposes a solution which would be implemented as
part of the job queue in the master daemon.


Proposed changes
================

With the implementation of :ref:`job priorities
<jqueue-job-priority-design>` the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can't be acquired, allowing other
jobs/opcodes to be run in the meantime.

The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a dependency
on the success of job Y and would only be run once job Y is finished. If
there's a dependency on success and job Y failed, job X would fail as
well. Since such dependencies would use job IDs, the jobs still need to
be submitted in the right order.

.. pyassert::

   # Update description below if finalized job statuses change
   constants.JOBS_FINALIZED == frozenset([
     constants.JOB_STATUS_CANCELED,
     constants.JOB_STATUS_SUCCESS,
     constants.JOB_STATUS_ERROR,
     ])

The new attribute's value would be a list of two-valued tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only final statuses are accepted
(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
equivalent to specifying all final statuses (except
:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.

Any job referring to a cancelled job is also cancelled unless it
explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
status.

In case a referenced job cannot be found in the normal queue or the
archive, referring jobs fail as the status of the referenced job can't
be determined.
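The rules above can be condensed into a short Python sketch. It is only
an illustration under stated assumptions, not the job queue's actual
code: the function name ``check_dependency``, its result values and the
convention of passing ``None`` for a job which can't be found are made
up for this example, and the status strings are assumed to be
``canceled``, ``success`` and ``error``.

.. code-block:: python

   # Hypothetical sketch; all names and result values are assumptions
   CANCELED = "canceled"
   SUCCESS = "success"
   ERROR = "error"
   FINALIZED = frozenset([CANCELED, SUCCESS, ERROR])

   (CONTINUE, WAIT, CANCEL, FAIL) = ("continue", "wait", "cancel", "fail")


   def check_dependency(dep_status, requested):
     """Decides what happens to an opcode depending on another job.

     dep_status: current status of the job depended upon, None if the
       job was found neither in the queue nor in the archive
     requested: list of requested final statuses, may be empty
     """
     if not FINALIZED.issuperset(requested):
       raise ValueError("Only final statuses can be requested")

     if dep_status is None:
       # Status can't be determined, the referring job fails
       return FAIL

     if dep_status not in FINALIZED:
       # The job depended upon hasn't finished yet
       return WAIT

     # An empty list means all final statuses except "canceled"
     wanted = frozenset(requested) or (FINALIZED - frozenset([CANCELED]))

     if dep_status in wanted:
       return CONTINUE
     elif dep_status == CANCELED:
       # Cancellation propagates unless explicitly requested
       return CANCEL
     else:
       return FAIL

In this sketch ``check_dependency("error", [])`` returns ``continue``,
because an empty list accepts both ``success`` and ``error``, while
``check_dependency("canceled", [])`` returns ``cancel``.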
 
With this change, clients can submit all required jobs in the right
order and proceed to wait for changes on all these jobs (see
``cli.JobExecutor``). The master daemon will take care of executing them
in the right order, while still presenting the client with a simple
interface.

Clients using the ``SubmitManyJobs`` interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.
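How relative job IDs could be turned into absolute ones is sketched
below in Python. This is an illustration only: ``resolve_dependencies``
is a hypothetical helper, and the sketch assumes that ``-1`` refers to
the job submitted immediately before the current one in the same
submission, ``-2`` to the one before that, and so on.

.. code-block:: python

   def resolve_dependencies(previous_job_ids, deps):
     """Replaces relative job IDs with absolute ones.

     previous_job_ids: IDs of the jobs already submitted in the same
       submission, in submission order
     deps: list of (job ID, list of requested statuses) tuples
     """
     result = []

     for (job_id, statuses) in deps:
       if job_id < 0:
         if -job_id > len(previous_job_ids):
           raise ValueError("Relative job ID %s points before the first"
                            " job of the submission" % job_id)
         # Negative indexing counts backwards from the latest job
         job_id = previous_job_ids[job_id]

       result.append((job_id, statuses))

     return result

If the first two jobs of a submission received the IDs 6151 and 7687, a
third job could express its dependencies as ``[(-2, ["success"]), (-1,
["success"])]``, which this sketch resolves to ``[(6151, ["success"]),
(7687, ["success"])]``.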
 
.. highlight:: javascript

Example data structures::

  # First job
  {
    "job_id": "6151",
    "ops": [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
      ],
  }

  # Second job, runs in parallel with first job
  {
    "job_id": "7687",
    "ops": [
      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
      ],
  }

  # Third job, depending on success of previous jobs
  {
    "job_id": "9218",
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [6151, ["success"]],
          [7687, ["success"]],
          ],
        "offline": true, },
      ],
  }


Implementation details
----------------------

Status while waiting for dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jobs waiting for dependencies are certainly not in the queue anymore and
therefore need to change their status from "queued". While waiting for
opcode locks the job is in the "waiting" status (the constant is named
``JOB_STATUS_WAITLOCK``, but the actual value is ``waiting``). There are
the following possibilities:

#. Introduce a new status, e.g. "waitdeps".

   Pro:

   - Clients know for sure a job is waiting for dependencies, not locks

   Con:

   - Code and tests would have to be updated/extended for the new status
   - List of possible state transitions certainly wouldn't get simpler
   - Breaks backwards compatibility, older clients might get confused

#. Use existing "waiting" status.

   Pro:

   - No client changes necessary, less code churn (note that there are
     clients which don't live in Ganeti core)
   - Clients don't need to know the difference between waiting for a job
     and waiting for a lock; it doesn't make a difference
   - Fewer state transitions (see commit ``5fd6b69479c0``, which removed
     many state transitions and disk writes)

   Con:

   - Not immediately visible what a job is waiting for, but it's the
     same issue with locks; this is the reason why the lock monitor
     (``gnt-debug locks``) was introduced; job dependencies can be shown
     as "locks" in the monitor

Based on these arguments, the proposal is to do the following:

- Rename ``JOB_STATUS_WAITLOCK`` constant to ``JOB_STATUS_WAITING`` to
  reflect its actual meaning: the job is waiting for something
- While waiting for dependencies and locks, jobs are in the "waiting"
  status
- Export dependency information in lock monitor; example output::

    Name      Mode Owner Pending
    job/27491 -    -     success:job/34709,job/21459
    job/21459 -    -     success,error:job/14513
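How entries of the ``Pending`` column in the example output could be
produced is sketched below in Python. The sketch rests on assumptions:
``format_pending`` is a hypothetical helper, and the column is read, by
analogy with locks, as listing the jobs waiting for the job named in the
first column, grouped by the statuses they requested.

.. code-block:: python

   def format_pending(waiters):
     """Formats the "Pending" entries for one job depended upon.

     waiters: list of (waiting job ID, list of requested statuses)
       tuples
     Returns one string per distinct list of requested statuses.
     """
     groups = {}
     order = []

     for (job_id, statuses) in waiters:
       key = ",".join(statuses)
       if key not in groups:
         groups[key] = []
         # Remember the order in which status lists first appeared
         order.append(key)
       groups[key].append("job/%s" % job_id)

     return ["%s:%s" % (key, ",".join(groups[key])) for key in order]

For the first row of the example output,
``format_pending([(34709, ["success"]), (21459, ["success"])])`` returns
``["success:job/34709,job/21459"]``.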
 

Cost of deserialization
~~~~~~~~~~~~~~~~~~~~~~~

To determine the status of a dependency job the job queue must have
access to its data structure. Other queue operations already do this,
e.g. archiving, watching a job's progress and querying jobs.

Initially (Ganeti 2.0/2.1) the job queue shared the job objects
in memory and protected them using locks. Ganeti 2.2 (see :doc:`design
document <design-2.2>`) changed the queue to read and deserialize jobs
from disk. This significantly reduced locking and code complexity.
Nowadays inotify is used to wait for changes on job files when watching
a job's progress.

Reading from disk and deserializing certainly has some cost associated
with it, but it's a significantly simpler architecture than
synchronizing in memory with locks. At the stage where dependencies are
evaluated the queue lock is held in shared mode, so different workers
can read at the same time (deliberately ignoring CPython's interpreter
lock).

It is expected that the majority of executed jobs won't use
dependencies and therefore won't be affected.
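Looking up the status of a dependency job by reading and deserializing
it from disk could look roughly like the following Python sketch. It is
not Ganeti's code: ``read_job_status`` is a hypothetical helper, and the
flat directory layout, the ``job-<ID>`` file names, the use of JSON and
the top-level ``status`` key are simplifying assumptions. A job found in
neither directory yields ``None``, the case in which referring jobs
would fail.

.. code-block:: python

   import errno
   import json
   import os.path


   def read_job_status(job_id, queue_dir, archive_dir):
     """Reads a job's status from disk, returns None if it isn't found.

     The "job-<ID>" file name and the top-level "status" key are
     assumptions made for this sketch only.
     """
     for dirname in [queue_dir, archive_dir]:
       filename = os.path.join(dirname, "job-%s" % job_id)
       try:
         with open(filename) as fd:
           # Deserialize the whole job just to look at its status
           return json.load(fd)["status"]
       except IOError as err:
         if err.errno != errno.ENOENT:
           raise
         # Not in this directory, try the next one

     return None

Every call reads and parses a file, which is the cost discussed above;
no job objects are shared in memory between callers.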
 

Other discussed solutions
=========================

Job-level attribute
-------------------

At first glance it might seem better to put dependencies on previous
jobs at the job level. However, it turns out that having the option of
defining only a single opcode in a job as having such a dependency can
be useful as well. The code complexity in the job queue is equivalent if
not simpler.

Since opcodes are guaranteed to run in order, clients that want a
dependency to apply to the whole job can just define it on the first
opcode.

Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and would
need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.


Client-side logic
-----------------

There's at least one implementation of a batched job executor woven
into the ``burnin`` tool's code. While certainly possible, a client-side
solution should be avoided due to the different clients already in use.
For one, the :doc:`remote API <rapi>` client shouldn't import
non-standard modules. htools are written in Haskell and can't use Python
modules. A batched job executor contains quite a bit of logic. Even if
cleanly abstracted in a (Python) library, sharing code between different
clients is difficult if not impossible.


.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: