==========================================
Filtering of jobs for the Ganeti job queue
==========================================

.. contents:: :depth: 4

This is a design document detailing the semantics of the fine-grained control
of jobs in Ganeti. For the implementation there will be a separate
design document that also describes the vision for the Ganeti daemon
structure.


Current state and shortcomings
==============================

Control of the Ganeti job queue is quite limited. There is a single
status bit, the "drained flag". If set, no new jobs are accepted into
the queue. This is too coarse for some use cases.

- The queue might be required to be drained for several reasons,
  initiated by different persons or automatic programs. Each one
  should be able to indicate that their reasons for draining are over
  without affecting the others.

- There is no support for partial drains. For example, one might want
  to allow all jobs belonging to a manual (or externally coordinated)
  maintenance, while disallowing all other jobs.

- There is no support for blocking jobs by their op-codes, e.g.,
  disallowing all jobs that bring new instances to a cluster. This might
  be part of a maintenance preparation.

- There is no support for a soft version of draining, where all
  jobs currently in the queue are finished, while new jobs entering
  the queue are delayed until the drain is over.


Proposed changes
================

We propose to add filters on the job queue. These will be part of the
configuration and as such are persisted with it. Conceptually, the
filters are always processed when a job enters the queue and while it
is still in the queue. Of course, in the implementation, reevaluation
is only carried out if something could make the result change, e.g.,
a new job enters the queue, or the filter rules are changed.
There is no distinction between filter processing when a job is about
to enter the queue and while it is in the queue, as this can be
expressed by the filter rules themselves (see predicates below).

Format of a Filter rule
-----------------------

Filter rules are given by the following data.

- A UUID. This ensures that there can be different filter rules
  that otherwise have all parameters equal. In this way, multiple
  drains for different reasons are possible. The UUID is used to
  address the filter rule, in particular for deletion.

  If no UUID is provided at rule addition, Ganeti will create one.

- The watermark. This is the highest job id ever used, as valid at
  the moment when the filter was added. This data will be added
  automatically upon addition of the filter.

- A priority. This is a non-negative integer. Filters are processed
  in order of increasing priority until a rule applies. While there
  is a well-defined order in which rules of the same priority are
  evaluated (increasing watermark, then the UUID, are taken as tie
  breakers), it is not recommended to have rules of the same priority
  that overlap and have different actions associated.

- A list of predicates. The rule fires if all of them hold true
  for the job.

- An action. For the time being, one of the following, but more
  actions might be added in the future (in particular, future
  implementations might add an action making filtering continue with
  a different filter chain).

  - ACCEPT. The job will be accepted; no further filter rules
    are applied.
  - PAUSE. The job will be accepted to the queue and remain there;
    however, it is not executed. If an opcode is currently running,
    it continues, but the next opcode will not be started. For a paused
    job all locks it might have acquired will be released as soon as
    possible, at the latest when the currently running opcode has
    finished. The job queue will take care of this.
  - REJECT. The job is rejected. If it is already in the queue,
    it will be marked as cancelled.
  - CONTINUE. The filtering continues processing with the next
    rule. Such a rule will never have any direct or indirect effect,
    but it can serve as documentation for a "normally present, but
    currently disabled" rule.

- A reason trail, in the same format as reason trails for opcodes.
  This makes it possible to find out which maintenance (or other reason)
  caused the addition of this filter rule.

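The evaluation order and first-match semantics described above can be
illustrated with a minimal Python sketch. It is not the Ganeti
implementation: the class ``FilterRule``, the function ``decide``, the
modelling of predicates as callables and the default of accepting a job
that no rule matches are all assumptions made for this illustration.

.. code-block:: python

  from dataclasses import dataclass, field
  from typing import Callable, List

  # Action names as listed in this document.
  ACCEPT, PAUSE, REJECT, CONTINUE = "ACCEPT", "PAUSE", "REJECT", "CONTINUE"


  @dataclass
  class FilterRule:
      """Hypothetical in-memory form of a filter rule."""
      uuid: str
      watermark: int
      priority: int
      action: str
      # Predicates are modelled as callables on a job dict; this stands in
      # for the list-based predicates of the design.
      predicates: List[Callable[[dict], bool]] = field(default_factory=list)

      def applies(self, job: dict) -> bool:
          # A rule fires only if all of its predicates hold for the job.
          return all(pred(job) for pred in self.predicates)


  def decide(rules, job):
      """Return the action of the first applicable rule."""
      # Increasing priority; watermark, then UUID, break ties.
      ordered = sorted(rules, key=lambda r: (r.priority, r.watermark, r.uuid))
      for rule in ordered:
          # CONTINUE rules never decide anything; move on to the next rule.
          if rule.applies(job) and rule.action != CONTINUE:
              return rule.action
      # Assumption of this sketch: a job matched by no rule is accepted.
      return ACCEPT

With a priority 0 ``REJECT`` rule whose predicate is ``job["id"] > 10`` and
a priority 1 ``ACCEPT`` rule without predicates, ``decide`` returns
``REJECT`` for job 11 and ``ACCEPT`` for job 3.
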
Predicates available for the filter rules
-----------------------------------------

A predicate is a list, with the first element being the name of the
predicate and the rest being parameters suitable for that predicate.
In most cases, the name of the predicate will be a field of a job,
and there will be a single parameter, which is a boolean expression
(``filter``) in the sense of the Ganeti query language. However, no
assumption should be made that all predicates are of this shape. More
predicates may be added in the future.

- ``jobid``. The only parameter is a boolean expression. For this
  expression, there is only one field available, ``id``, which represents
  the id of the job to be filtered. In all value positions, the string
  ``watermark`` will be replaced by the value of the watermark.

- ``opcode``. The only parameter is a boolean expression. For this
  expression, ``OP_ID`` and all other fields present in the opcode are
  available. This predicate will hold true if the expression is true for
  at least one opcode in the job.

- ``reason``. The only parameter is a boolean expression. For this
  expression, the three fields ``source``, ``reason``, ``timestamp`` of
  reason trail entries are available. This predicate is true if one of
  the entries of one of the opcodes in this job satisfies the expression.

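How the three predicates above could be checked against a job is sketched
below. This is an illustration only: the job layout (a dict with ``id`` and
a list ``ops`` of opcode dicts, each carrying a ``reason`` list of
``[source, reason, timestamp]`` entries), the function names, the treatment
of missing fields and the small subset of query-language operators are
assumptions of the sketch, not part of the design.

.. code-block:: python

  import operator
  import re

  # Subset of comparison operators; the real query language has more.
  _COMPARE = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
              "<=": operator.le, ">": operator.gt, ">=": operator.ge}


  def eval_expr(expr, fields):
      """Evaluate a boolean expression against a dict of field values."""
      op, args = expr[0], expr[1:]
      if op == "!":
          return not eval_expr(args[0], fields)
      if op == "&":
          return all(eval_expr(a, fields) for a in args)
      if op == "|":
          return any(eval_expr(a, fields) for a in args)
      name, value = args
      if name not in fields:
          return False  # assumption: a missing field never matches
      if op == "=~":
          return re.search(value, str(fields[name])) is not None
      return _COMPARE[op](fields[name], value)


  def substitute(expr, watermark):
      """Replace the string 'watermark' in all value positions."""
      op, args = expr[0], expr[1:]
      if op in ("!", "&", "|"):
          return [op] + [substitute(a, watermark) for a in args]
      name, value = args
      return [op, name, watermark if value == "watermark" else value]


  def holds(predicate, job, watermark):
      """Check a single predicate of a rule against a job."""
      name, expr = predicate
      if name == "jobid":
          return eval_expr(substitute(expr, watermark), {"id": job["id"]})
      if name == "opcode":
          # True if at least one opcode of the job satisfies the expression.
          return any(eval_expr(expr, op) for op in job["ops"])
      if name == "reason":
          # True if any reason trail entry of any opcode satisfies it.
          keys = ("source", "reason", "timestamp")
          return any(eval_expr(expr, dict(zip(keys, entry)))
                     for op in job["ops"] for entry in op.get("reason", []))
      raise ValueError("unknown predicate: %s" % name)

A rule fires when ``holds`` is true for every predicate in its list, with
``watermark`` being the value stored in that rule.
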

Examples
========

Draining the queue.
::

  {'priority': 0,
   'predicates': [['jobid', ['>', 'id', 'watermark']]],
   'action': 'REJECT'}

Soft draining could be achieved by replacing ``REJECT`` by ``PAUSE`` in the
above example.

Pausing all new jobs not belonging to a specific maintenance.
::

  {'priority': 1,
   'predicates': [['jobid', ['>', 'id', 'watermark']],
                  ['reason', ['!', ['=~', 'reason', 'maintenance pink bunny']]]],
   'action': 'PAUSE'}

Canceling all queued instance creations and disallowing new such jobs.
::

  {'priority': 1,
   'predicates': [['opcode', ['=', 'OP_ID', 'OP_INSTANCE_CREATE']]],
   'action': 'REJECT'}


Interface
=========

Since queue control is intended to be used by external maintenance-handling
tools as well, the primary interface for manipulating queue filters is the
:doc:`rapi`. For convenience, a command-line interface will be added as well.

The following resources will be added.

- /2/filters/

  - GET returns the list of all currently set filters

  - POST adds a new filter

- /2/filters/[uuid]

  - GET returns the description of the specified filter

  - DELETE removes the specified filter

  - PUT replaces the specified filter rule, or creates it,
    if it doesn't exist already.

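As an illustration of how an external tool might address these resources,
the following sketch builds (but does not send) HTTP requests with the
Python standard library. The host name, the helper ``filter_request`` and
the JSON body layout (the rule fields from the examples above, sent as
they are) are assumptions; the design does not fix the request format
here. Port 5080 is the usual RAPI port.

.. code-block:: python

  import json
  import urllib.request

  # Hypothetical cluster address, for illustration only.
  BASE_URL = "https://cluster.example.com:5080"


  def filter_request(method, rule=None, uuid=None):
      """Build a request for the filter resources without sending it."""
      # The collection lives at /2/filters/, a single rule at /2/filters/[uuid].
      path = "/2/filters/" + (uuid or "")
      body = None if rule is None else json.dumps(rule).encode("utf-8")
      request = urllib.request.Request(BASE_URL + path, data=body,
                                       method=method)
      if body is not None:
          request.add_header("Content-Type", "application/json")
      return request


  # Drain rule from the examples section; no UUID given, so one would be
  # created on addition.
  drain = {"priority": 0,
           "predicates": [["jobid", [">", "id", "watermark"]]],
           "action": "REJECT"}
  add = filter_request("POST", rule=drain)

Sending such a request, for example with ``urllib.request.urlopen``,
additionally requires credentials and TLS settings, which are left out of
this sketch.
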
Security considerations
=======================

Filtering of jobs is not a security feature. It merely serves the purpose
of coordinating efforts and avoiding accidental conflicting
jobs. Everybody with appropriate credentials can modify the filter
rules, not just the originator of a rule. To avoid accidental
lock-out, requests modifying the queue are executed directly and do not
go through the queue themselves.