==================================
Submitting jobs from logical units
==================================

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`


Current state and shortcomings
==============================

Some Ganeti operations want to execute as many operations in parallel as
possible. Examples are evacuating or failing over a node (``gnt-node
evacuate``/``gnt-node failover``). Without changing large parts of the
code, e.g. the RPC layer, to be asynchronous, or using threads inside a
logical unit, only a single operation can be executed at a time per job.

Currently clients work around this limitation by retrieving the list of
desired targets and then re-submitting a number of jobs. This requires
logic to be kept in the client, in some cases leading to duplication
(e.g. CLI and RAPI).


Proposed changes
================

The job queue lock is guaranteed to be released while executing an
opcode/logical unit. This means an opcode can talk to the job queue and
submit more jobs. It then receives the job IDs, like any job submitter
using the LUXI interface would. These job IDs are returned to the
client, who will then proceed to wait for the jobs to finish.

Technically, the job queue already passes a number of callbacks to the
opcode processor. These are used for giving user feedback, notifying the
job queue of an opcode having gotten its locks, and checking whether the
opcode has been cancelled. A new callback function is added to submit
jobs. Its signature and result will be equivalent to the job queue's
existing ``SubmitManyJobs`` function.

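
The callback arrangement described above can be sketched as follows.
This is only an illustration under stated assumptions: the classes
``FakeJobQueue`` and ``OpcodeCallbacks``, the method names other than
``SubmitManyJobs``, and the rule that an empty job is rejected are all
hypothetical and not taken from Ganeti's code.

```python
class FakeJobQueue:
    """Hypothetical stand-in for the job queue (not Ganeti's real class)."""

    def __init__(self):
        self._next_id = 1
        self.jobs = {}

    def SubmitManyJobs(self, jobs):
        """Queue each job (a list of opcodes), return (status, data) pairs."""
        results = []
        for ops in jobs:
            if not ops:
                # Assumed failure mode, for illustration only
                results.append((False, "Job contains no opcodes"))
                continue
            job_id = str(self._next_id)
            self._next_id += 1
            self.jobs[job_id] = list(ops)
            results.append((True, job_id))
        return results


class OpcodeCallbacks:
    """Hypothetical bundle of callbacks passed to the opcode processor."""

    def __init__(self, queue):
        self._queue = queue
        self.messages = []
        self.started = False
        self.cancelled = False

    def Feedback(self, message):
        # Existing kind of callback: user feedback
        self.messages.append(message)

    def NotifyStart(self):
        # Existing kind of callback: the opcode has acquired its locks
        self.started = True

    def CheckCancel(self):
        # Existing kind of callback: has the opcode been cancelled?
        return self.cancelled

    def SubmitManyJobs(self, jobs):
        # New callback: same signature and result as the queue's function
        return self._queue.SubmitManyJobs(jobs)


queue = FakeJobQueue()
callbacks = OpcodeCallbacks(queue)
submitted = callbacks.SubmitManyJobs([["op1", "op2"], ["op3"]])
print(submitted)
```

Delegating directly to the queue keeps the callback's signature and
result identical to ``SubmitManyJobs``, which is the equivalence the
design asks for.
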
Logical units can submit jobs by returning an instance of a special
container class with a list of jobs, each of which is a list of opcodes
(e.g. ``[[op1, op2], [op3]]``). The opcode processor will recognize
instances of the special class when used as a return value and will
submit the contained jobs. The submission status and job IDs returned by
the submission callback are used as the opcode's result. The result
should be encapsulated in a dictionary, allowing for future extensions.

.. highlight:: javascript

Example::

  {
    "jobs": [
      (True, "8149"),
      (True, "21019"),
      (False, "Submission failed"),
      (True, "31594"),
      ],
  }

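
A minimal sketch of how the opcode processor might recognize the
container and build the result dictionary is given below. The names
``JobsResult``, ``process_result`` and ``fake_submit`` are hypothetical,
chosen for illustration; the design does not name the container class,
and the job IDs produced here are made up.

```python
class JobsResult:
    """Hypothetical container returned by a logical unit to submit jobs."""

    def __init__(self, jobs):
        # List of jobs, each of which is a list of opcodes
        self.jobs = jobs


def process_result(lu_result, submit_many_jobs):
    """Convert a logical unit's return value into the opcode result."""
    if isinstance(lu_result, JobsResult):
        # Wrap the submission results in a dictionary so that further
        # keys can be added later without changing the result type
        return {"jobs": submit_many_jobs(lu_result.jobs)}
    # Any other return value is passed through unchanged
    return lu_result


def fake_submit(jobs):
    """Stand-in submission callback; rejects empty jobs (assumption)."""
    results = []
    for index, ops in enumerate(jobs):
        if ops:
            results.append((True, str(1000 + index)))
        else:
            results.append((False, "Submission failed"))
    return results


opcode_result = process_result(JobsResult([["op1", "op2"], ["op3"]]),
                               fake_submit)
plain_result = process_result("done", fake_submit)
print(opcode_result)
```
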
Job submissions can fail for a variety of reasons, e.g. a full or
drained job queue. Lists of jobs cannot be submitted atomically, meaning
some might fail while others succeed. The client is responsible for
handling such cases.


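
Because submission is not atomic, a client has to inspect every entry of
the result. One possible way to do so is sketched here, assuming the
result layout from the example in this document; the helper
``split_submissions`` is hypothetical and not part of any Ganeti client
library.

```python
def split_submissions(opcode_result):
    """Separate submitted job IDs from submission error messages."""
    job_ids = []
    errors = []
    for status, data in opcode_result["jobs"]:
        if status:
            # Successful submission: data is the job ID
            job_ids.append(data)
        else:
            # Failed submission: data is the error message
            errors.append(data)
    return job_ids, errors


result = {
    "jobs": [
        (True, "8149"),
        (True, "21019"),
        (False, "Submission failed"),
        (True, "31594"),
    ],
}
job_ids, errors = split_submissions(result)
print(job_ids, errors)
```

The client can then wait for the jobs in ``job_ids`` and decide
separately whether to report or retry the entries in ``errors``.
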
Other discussed solutions
=========================

Instead of requiring the client to wait for the returned jobs, another
idea was to do so from within the submitting opcode in the master
daemon. While technically possible, doing so would have two major
drawbacks:

- Opcodes waiting for other jobs to finish block one job queue worker
  thread
- All locks must be released before starting the waiting process;
  failure to do so can lead to deadlocks

Instead of returning the job IDs as part of the normal opcode result,
introducing a new opcode field, e.g. ``op_jobids``, was discussed and
dismissed. A new field would touch many areas and possibly break some
assumptions. There were also questions about the semantics.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: