==================================
Submitting jobs from logical units
==================================

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`


Current state and shortcomings
==============================

Some Ganeti operations want to execute as many operations in parallel
as possible. Examples are evacuating or failing over a node
(``gnt-node evacuate`` and ``gnt-node failover``). Unless large parts
of the code, e.g. the RPC layer, are made asynchronous, or threads are
used inside a logical unit, a job can only execute a single operation
at a time.

Currently, clients work around this limitation by retrieving the list
of desired targets and then submitting a number of jobs themselves.
This requires logic to be kept in the client, which in some cases leads
to duplication (e.g. in both the CLI and RAPI).


Proposed changes
================

The job queue lock is guaranteed to be released while an opcode/logical
unit is executing. This means an opcode can talk to the job queue and
submit more jobs. It then receives the job IDs, just as any job
submitter using the LUXI interface would. These job IDs are returned to
the client, which then waits for the jobs to finish.

Technically, the job queue already passes a number of callbacks to the
opcode processor. These are used for giving user feedback, notifying the
job queue that an opcode has acquired its locks, and checking whether
the opcode has been cancelled. A new callback function for submitting
jobs is added; its signature and result are equivalent to those of the
job queue's existing ``SubmitManyJobs`` function.
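
To make the callback mechanism more concrete, here is a minimal Python
sketch under simplifying assumptions: opcodes are plain strings and the
job queue is an in-memory toy that rejects jobs once it is full.
``OpExecCallbacks``, ``ToyJobQueue`` and ``QueueCallbacks`` are
hypothetical names chosen for illustration, not Ganeti's actual classes;
only the ``SubmitManyJobs`` name and its list of
``(success, job ID or error message)`` results come from this design.

.. code-block:: python

  import itertools
  from typing import Dict, List, Sequence, Tuple

  # Hypothetical simplification: an opcode is just a string here.
  Opcode = str
  # A job is a sequence of opcodes, e.g. ["op1", "op2"].
  Job = Sequence[Opcode]
  # (True, job ID) on success, (False, error message) on failure.
  SubmitResult = Tuple[bool, str]


  class OpExecCallbacks:
      """Hypothetical interface the job queue hands to the opcode processor."""

      def Feedback(self, msg: str) -> None:
          """Give feedback to the user."""

      def NotifyStart(self) -> None:
          """Notify the job queue that the opcode has acquired its locks."""

      def CheckCancel(self) -> None:
          """Check whether the opcode has been cancelled."""

      def SubmitManyJobs(self, jobs: List[Job]) -> List[SubmitResult]:
          """New callback: same signature and result as the queue's function."""
          raise NotImplementedError


  class ToyJobQueue:
      """Hypothetical stand-in job queue that rejects jobs once it is full."""

      def __init__(self, max_jobs: int) -> None:
          self._ids = itertools.count(1)
          self._max_jobs = max_jobs
          self.jobs: Dict[str, List[Opcode]] = {}

      def SubmitManyJobs(self, jobs: List[Job]) -> List[SubmitResult]:
          results: List[SubmitResult] = []
          for ops in jobs:
              # Jobs are submitted one by one, so some may fail while
              # others succeed.
              if len(self.jobs) >= self._max_jobs:
                  results.append((False, "Job queue is full"))
                  continue
              job_id = str(next(self._ids))
              self.jobs[job_id] = list(ops)
              results.append((True, job_id))
          return results


  class QueueCallbacks(OpExecCallbacks):
      """Hypothetical callbacks bound to a concrete queue."""

      def __init__(self, queue: ToyJobQueue) -> None:
          self._queue = queue

      def SubmitManyJobs(self, jobs: List[Job]) -> List[SubmitResult]:
          # Assumed safe because the queue lock is not held while an
          # opcode executes.
          return self._queue.SubmitManyJobs(jobs)


  queue = ToyJobQueue(max_jobs=2)
  cbs = QueueCallbacks(queue)
  print(cbs.SubmitManyJobs([["op1", "op2"], ["op3"], ["op4"]]))
  # [(True, '1'), (True, '2'), (False, 'Job queue is full')]

Delegating straight to the queue means a logical unit sees exactly the
same results a LUXI client would; in this sketch that is only safe
because the queue lock is assumed not to be held while the opcode runs.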

Logical units can submit jobs by returning an instance of a special
container class with a list of jobs, each of which is a list of opcodes
(e.g. ``[[op1, op2], [op3]]``). The opcode processor will recognize
instances of this class when used as a return value and will submit the
contained jobs. The submission status and job IDs returned by the
submission callback are used as the opcode's result, encapsulated in a
dictionary to allow for future extensions.
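
The following sketch illustrates how the opcode processor might
recognize the special return value and turn it into the opcode's
result. The container name ``ResultWithJobs``, the helper
``ProcessLUResult``, the example logical unit ``NodeEvacuateLU`` and the
``fake_submit`` stand-in for the submission callback are all
hypothetical; the real processor code may be structured differently.

.. code-block:: python

  from typing import Any, Callable, List, Sequence, Tuple

  Opcode = str  # hypothetical: opcodes are plain strings in this sketch
  SubmitResult = Tuple[bool, str]
  SubmitFn = Callable[[List[List[Opcode]]], List[SubmitResult]]


  class ResultWithJobs:
      """Hypothetical container an LU returns to request job submission."""

      def __init__(self, jobs: Sequence[Sequence[Opcode]]) -> None:
          # A list of jobs, each a list of opcodes, e.g. [[op1, op2], [op3]].
          self.jobs = [list(job) for job in jobs]


  def ProcessLUResult(result: Any, submit_many_jobs: SubmitFn) -> Any:
      """Post-process an LU's return value, as the opcode processor might."""
      if not isinstance(result, ResultWithJobs):
          # Any other return value is passed through unchanged.
          return result
      # Submit the jobs and wrap the per-job status in a dictionary, so
      # that more keys can be added later.
      return {"jobs": submit_many_jobs(result.jobs)}


  class NodeEvacuateLU:
      """Hypothetical LU creating one single-opcode job per instance."""

      def __init__(self, instances: Sequence[str]) -> None:
          self.instances = list(instances)

      def Exec(self) -> ResultWithJobs:
          return ResultWithJobs([["migrate-%s" % name]
                                 for name in self.instances])


  def fake_submit(jobs: List[List[Opcode]]) -> List[SubmitResult]:
      # Hypothetical stand-in for the submission callback: accepts all jobs.
      return [(True, str(1000 + i)) for i in range(len(jobs))]


  lu = NodeEvacuateLU(["inst1", "inst2"])
  print(ProcessLUResult(lu.Exec(), fake_submit))
  # {'jobs': [(True, '1000'), (True, '1001')]}
  print(ProcessLUResult("plain result", fake_submit))
  # plain result

Dispatching on the type of the return value leaves existing logical
units untouched, since only those that explicitly return the container
trigger a submission.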

.. highlight:: python

Example::

  {
    "jobs": [
      (True, "8149"),
      (True, "21019"),
      (False, "Submission failed"),
      (True, "31594"),
      ],
  }

Job submissions can fail for a variety of reasons, e.g. a full or
drained job queue. Lists of jobs cannot be submitted atomically, meaning
some might fail while others succeed. The client is responsible for
handling such cases.
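
Because each entry of the result must be inspected individually, a
client might handle partial failures along the lines of the following
Python sketch. It assumes the result dictionary format shown in the
example above; ``SplitSubmitResults``, ``WaitForGeneratedJobs`` and the
``wait_for_job`` callable, which stands in for however the client
actually waits for a job, are hypothetical.

.. code-block:: python

  from typing import Callable, Dict, List, Sequence, Tuple

  SubmitResult = Tuple[bool, str]


  def SplitSubmitResults(
          result: Dict[str, Sequence[SubmitResult]]) -> Tuple[List[str], List[str]]:
      """Hypothetical helper separating job IDs from submission errors."""
      job_ids: List[str] = []
      errors: List[str] = []
      for success, value in result["jobs"]:
          # The second element is a job ID on success, an error otherwise.
          if success:
              job_ids.append(value)
          else:
              errors.append(value)
      return job_ids, errors


  def WaitForGeneratedJobs(result: Dict[str, Sequence[SubmitResult]],
                           wait_for_job: Callable[[str], str]) -> Dict[str, str]:
      """Report failed submissions, then wait for every accepted job."""
      job_ids, errors = SplitSubmitResults(result)
      for msg in errors:
          print("Job submission failed: %s" % msg)
      return {job_id: wait_for_job(job_id) for job_id in job_ids}


  example = {
      "jobs": [
          (True, "8149"),
          (True, "21019"),
          (False, "Submission failed"),
          (True, "31594"),
      ],
  }
  print(SplitSubmitResults(example))
  # (['8149', '21019', '31594'], ['Submission failed'])
  print(WaitForGeneratedJobs(example, lambda job_id: "success"))
  # Job submission failed: Submission failed
  # {'8149': 'success', '21019': 'success', '31594': 'success'}

The sketch merely reports rejected submissions and still waits for the
accepted jobs; whether a real client should instead abort or retry
depends on the operation.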


Other discussed solutions
=========================

Instead of requiring the client to wait for the returned jobs, another
idea was to do so from within the submitting opcode in the master
daemon. While technically possible, doing so would have two major
drawbacks:

- Opcodes waiting for other jobs to finish would block a job queue
  worker thread
- All locks must be released before starting to wait; failing to do so
  can lead to deadlocks

Instead of returning the job IDs as part of the normal opcode result,
introducing a new opcode field, e.g. ``op_jobids``, was discussed and
dismissed. A new field would touch many areas and could break some
assumptions. There were also open questions about its semantics.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: