============
Chained jobs
============

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`
- :doc:`LU-generated jobs <design-lu-generated-jobs>`


Current state and shortcomings
==============================

Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific order.
Due to the job queue's current design, such a guarantee cannot be
given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.

One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished, the
next group is submitted and waited for. There are different kinds of
clients for Ganeti, some of which don't share code (e.g. Python clients
vs. htools). This design proposes a solution which would be implemented
as part of the job queue in the master daemon.


Proposed changes
================

With the implementation of :ref:`job priorities
<jqueue-job-priority-design>` the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can't be acquired, allowing other
jobs/opcodes to be run in the meantime.

The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a dependency
on the success of job Y and would only be run once job Y is finished. If
there's a dependency on success and job Y failed, job X would fail as
well. Since such dependencies would use job IDs, the jobs still need to
be submitted in the right order.

.. pyassert::

   # Update description below if finalized job statuses change
   constants.JOBS_FINALIZED == frozenset([
     constants.JOB_STATUS_CANCELED,
     constants.JOB_STATUS_SUCCESS,
     constants.JOB_STATUS_ERROR,
     ])

The new attribute's value would be a list of two-valued tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only final statuses are accepted
(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
equivalent to specifying all final statuses (except
:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.

Any job referring to a cancelled job is also cancelled unless it
explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
status.

In case a referenced job cannot be found in the normal queue or the
archive, referring jobs fail as the status of the referenced job can't
be determined.

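The rules above can be summarized in a minimal Python sketch. It is not
the queue's actual code: the function names, the outcome values and the
``get_status`` callback are hypothetical, and the status strings are
assumed to match the values of the corresponding constants. A status of
``None`` stands for a job found neither in the queue nor in the archive.

```python
# Sketch only: all names are hypothetical, status strings are assumptions.
JOB_STATUS_CANCELED = "canceled"
JOB_STATUS_SUCCESS = "success"
JOB_STATUS_ERROR = "error"
JOBS_FINALIZED = frozenset([
    JOB_STATUS_CANCELED,
    JOB_STATUS_SUCCESS,
    JOB_STATUS_ERROR,
])

# Possible outcomes of a dependency check (hypothetical values)
WAIT, CONTINUE, CANCEL, FAIL = "wait", "continue", "cancel", "fail"


def check_dependency(dep_status, requested):
    """Evaluate one (job ID, requested statuses) dependency.

    dep_status is the current status of the referenced job, or None if
    the job could not be found.
    """
    requested = frozenset(requested)
    if not requested <= JOBS_FINALIZED:
        raise ValueError("Only final statuses can be requested")
    if dep_status is None:
        # Status of the referenced job can't be determined
        return FAIL
    if dep_status not in JOBS_FINALIZED:
        # Referenced job has not finished yet
        return WAIT
    if not requested:
        # Empty list: all final statuses except "canceled"
        requested = JOBS_FINALIZED - frozenset([JOB_STATUS_CANCELED])
    if dep_status in requested:
        return CONTINUE
    if dep_status == JOB_STATUS_CANCELED:
        # Cancellation propagates unless explicitly requested
        return CANCEL
    return FAIL


def check_dependencies(depends, get_status):
    """Evaluate all dependencies of an opcode.

    A failed or cancelled dependency decides the outcome immediately;
    otherwise the opcode waits until every dependency is fulfilled.
    """
    result = CONTINUE
    for job_id, requested in depends:
        outcome = check_dependency(get_status(job_id), requested)
        if outcome in (CANCEL, FAIL):
            return outcome
        if outcome == WAIT:
            result = WAIT
    return result
```

Returning as soon as one dependency fails or is cancelled is a design
choice of this sketch; the text above only requires that an opcode does
not run before all its dependencies are fulfilled.
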
With this change, clients can submit all wanted jobs in the right order
and proceed to wait for changes on all these jobs (see
``cli.JobExecutor``). The master daemon will take care of executing them
in the right order, while still presenting the client with a simple
interface.

Clients using the ``SubmitManyJobs`` interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.

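How relative job IDs could be turned into absolute ones is sketched
below. The sketch assumes that ``-1`` refers to the job immediately
preceding the current one in the same submission, ``-2`` to the one
before that, and so on; this interpretation, the function name and the
use of plain dictionaries for opcodes are assumptions made for
illustration only.

```python
def resolve_relative_job_ids(assigned_ids, jobs):
    """Replace relative job IDs in "depend" with absolute job IDs.

    assigned_ids: job IDs assigned to the submitted jobs, in order.
    jobs: list of jobs, each a list of opcode dictionaries.
    Returns new job definitions; the input is not modified.
    """
    resolved = []
    for index, ops in enumerate(jobs):
        new_ops = []
        for op in ops:
            new_op = dict(op)
            if "depend" in op:
                deps = []
                for dep_id, statuses in op["depend"]:
                    if dep_id < 0:
                        # Assumption: -1 is the previous job in the batch
                        target = index + dep_id
                        if target < 0:
                            raise ValueError(
                                "Relative job ID %s points before the"
                                " first job of the submission" % dep_id)
                        dep_id = assigned_ids[target]
                    deps.append([dep_id, list(statuses)])
                new_op["depend"] = deps
            new_ops.append(new_op)
        resolved.append(new_ops)
    return resolved
```

For instance, if the third job of a submission lists ``[-2,
["success"]]`` and ``[-1, ["success"]]``, both entries end up pointing
at the IDs assigned to the first and second job.
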
.. highlight:: javascript

Example data structures::

  # First job
  {
    "job_id": "6151",
    "ops": [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
      ],
  }

  # Second job, runs in parallel with first job
  {
    "job_id": "7687",
    "ops": [
      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
      ],
  }

  # Third job, depending on success of previous jobs
  {
    "job_id": "9218",
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [6151, ["success"]],
          [7687, ["success"]],
          ],
        "offline": True, },
      ],
  }


Implementation details
----------------------

Status while waiting for dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jobs waiting for dependencies are certainly not in the queue anymore and
therefore need to change their status from "queued". While waiting for
opcode locks the job is in the "waiting" status (the constant is named
``JOB_STATUS_WAITLOCK``, but the actual value is ``waiting``). There are
the following possibilities:

#. Introduce a new status, e.g. "waitdeps".

   Pro:

   - Clients know for sure a job is waiting for dependencies, not locks

   Con:

   - Code and tests would have to be updated/extended for the new status
   - List of possible state transitions certainly wouldn't get simpler
   - Breaks backwards compatibility, older clients might get confused

#. Use existing "waiting" status.

   Pro:

   - No client changes necessary, less code churn (note that there are
     clients which don't live in Ganeti core)
   - Clients don't need to know the difference between waiting for a job
     and waiting for a lock; it doesn't make a difference
   - Fewer state transitions (see commit ``5fd6b69479c0``, which removed
     many state transitions and disk writes)

   Con:

   - Not immediately visible what a job is waiting for, but it's the
     same issue with locks; this is the reason why the lock monitor
     (``gnt-debug locks``) was introduced; job dependencies can be shown
     as "locks" in the monitor

Based on these arguments, the proposal is to do the following:

- Rename ``JOB_STATUS_WAITLOCK`` constant to ``JOB_STATUS_WAITING`` to
  reflect its actual meaning: the job is waiting for something
- While waiting for dependencies and locks, jobs are in the "waiting"
  status
- Export dependency information in lock monitor; example output::

    Name      Mode Owner Pending
    job/27491 -    -     success:job/34709,job/21459
    job/21459 -    -     success,error:job/14513

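The "Pending" column in the example output could be produced along the
lines of the following sketch. It assumes that the column lists the jobs
waiting on the job named in the first column, grouped by the statuses
they requested; the function name and the input format are hypothetical
and not part of the lock monitor's interface.

```python
def format_pending(waiters):
    """Format jobs waiting on a job for the lock monitor.

    waiters: list of (job ID, requested statuses) tuples, one per job
    depending on the job being displayed.
    Returns one string per distinct combination of requested statuses,
    e.g. "success:job/34709,job/21459".
    """
    groups = {}
    for job_id, statuses in waiters:
        # Jobs requesting the same statuses share one entry
        groups.setdefault(tuple(statuses), []).append("job/%s" % job_id)
    return ["%s:%s" % (",".join(statuses), ",".join(names))
            for statuses, names in groups.items()]
```

Called with ``[(34709, ["success"]), (21459, ["success"])]`` the sketch
yields the single entry shown for ``job/27491`` above. How several
entries are joined in one column is left open here.
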

Cost of deserialization
~~~~~~~~~~~~~~~~~~~~~~~

To determine the status of a dependency job the job queue must have
access to its data structure. Other queue operations already do this,
e.g. archiving, watching a job's progress and querying jobs.

Initially (Ganeti 2.0/2.1) the job queue shared the job objects
in memory and protected them using locks. Ganeti 2.2 (see :doc:`design
document <design-2.2>`) changed the queue to read and deserialize jobs
from disk. This significantly reduced locking and code complexity.
Nowadays inotify is used to wait for changes on job files when watching
a job's progress.

Reading from disk and deserializing certainly has some cost associated
with it, but it's a significantly simpler architecture than
synchronizing in memory with locks. At the stage where dependencies are
evaluated the queue lock is held in shared mode, so different workers
can read at the same time (deliberately ignoring CPython's interpreter
lock).

It is expected that the majority of executed jobs won't use
dependencies and therefore won't be affected.


Other discussed solutions
=========================

Job-level attribute
-------------------

At first glance it might seem better to put dependencies on
previous jobs at a job level. However, it turns out that having the
option of defining only a single opcode in a job as having such a
dependency can be useful as well. The code complexity in the job queue
is equivalent if not simpler.

Since opcodes are guaranteed to run in order, clients can just define
the dependency on the first opcode.

Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and would
need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.

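As noted above, a dependency that should hold for a whole job can be
expressed by attaching it to the job's first opcode. A client-side
helper for this might look like the following sketch; the function name
is hypothetical and opcodes are represented as plain dictionaries with a
``depend`` key, as in the example data structures.

```python
def add_job_dependency(ops, depends):
    """Make a whole job depend on other jobs.

    ops: list of opcode dictionaries forming the job.
    depends: list of [job ID, requested statuses] entries.
    Returns a new opcode list in which the first opcode carries the
    dependencies; the input list and its opcodes are left untouched.
    """
    if not ops:
        raise ValueError("A job needs at least one opcode")
    first = dict(ops[0])
    # Keep dependencies already defined on the first opcode
    first["depend"] = ([list(dep) for dep in first.get("depend", [])] +
                       [list(dep) for dep in depends])
    return [first] + [dict(op) for op in ops[1:]]
```

Because opcodes run in order, the remaining opcodes of the job are held
back implicitly until the first one has been allowed to run.
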

Client-side logic
-----------------

There's at least one implementation of a batched job executor twisted
into the ``burnin`` tool's code. While certainly possible, a client-side
solution should be avoided due to the different clients already in use.
For one, the :doc:`remote API <rapi>` client shouldn't import
non-standard modules. htools are written in Haskell and can't use Python
modules. A batched job executor contains quite a bit of logic. Even if
cleanly abstracted in a (Python) library, sharing code between different
clients is difficult if not impossible.


.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: