Revision e408eb8a

b/Makefile.am
454 454
	doc/design-oob.rst \
455 455
	doc/design-ovf-support.rst \
456 456
	doc/design-opportunistic-locking.rst \
457
	doc/design-optables.rst \
457 458
	doc/design-partitioned.rst \
458 459
	doc/design-query-splitting.rst \
459 460
	doc/design-query2.rst \
b/doc/design-draft.rst
23 23
   design-hugepages-support.rst
24 24
   design-cmdlib-unittests.rst
25 25
   design-hotplug.rst
26
   design-optables.rst
26 27

  
27 28
.. vim: set textwidth=72 :
28 29
.. Local Variables:
b/doc/design-optables.rst
1
==========================================
2
Filtering of jobs for the Ganeti job queue
3
==========================================
4

  
5
.. contents:: :depth: 4
6

  
7
This is a design document detailing the semantics of the fine-grained control
8
of jobs in Ganeti. For the implementation there will be a separate
9
design document that also describes the vision for the Ganeti daemon
10
structure.
11

  
12

  
13
Current state and shortcomings
14
==============================
15

  
16
Control of the Ganeti job queue is quite limited. There is a single
17
status bit, the "drained flag". If set, no new jobs are accepted to
18
the queue. This is too coarse for some use cases.
19

  
20
- The queue might be required to be drained for several reasons,
21
  initiated by different persons or automatic programs. Each one
22
  should be able to indicate that their reasons for draining are over
23
  without affecting the others.
24

  
25
- There is no support for partial drains. For example, one might want
26
  to allow all jobs belonging to a manual (or externally coordinated)
27
  maintenance, while disallowing all other jobs.
28

  
29
- There is no support for blocking jobs by their op-codes, e.g.,
30
  disallowing all jobs that bring new instances to a cluster. This might
31
  be part of a maintenance preparation.
32

  
33
- There is no support for a soft version of draining, where all
34
  jobs currently in the queue are finished, while new jobs entering
35
  the queue are delayed until the drain is over.
36

  
37

  
38
Proposed changes
39
================
40

  
41
We propose to add filters on the job queue. These will be part of the
42
configuration and as such are persisted with it. Conceptually, the
43
filters are always processed when a job enters the queue and while it
44
is still in the queue. Of course, in the implementation, reevaluation
45
is only carried out if something could make the result change, e.g.,
46
a new job enters the queue or the filter rules are changed.
47
There is no distinction between filter processing when a job is about
48
to enter the queue and while it is in the queue, as this can be
49
expressed by the filter rules themselves (see predicates below).
50

  
51
Format of a Filter rule
52
-----------------------
53

  
54
Filter rules are given by the following data.
55

  
56
- A UUID. This ensures that there can be different filter rules
57
  that otherwise have all parameters equal. In this way, multiple
58
  drains for different reasons are possible. The UUID is used to
59
  address the filter rule, in particular for deletion.
60

  
61
  If no UUID is provided at rule addition, Ganeti will create one.
62

  
63
- The watermark. This is the highest job id ever used, as valid at
64
  the moment when the filter was added. This data will be added
65
  automatically upon addition of the filter.
66

  
67
- A priority. This is a non-negative integer. Filters are processed
68
  in order of increasing priority until a rule applies. While there
69
  is a well-defined order in which rules of the same priority are
70
  evaluated (increasing watermark, then the UUID, are taken as tie
71
  breakers), it is not recommended to have rules of the same priority
72
  that overlap and have different actions associated.
73

  
74
- A list of predicates. The rule fires if all of them hold true
75
  for the job.
76

  
77
- An action. For the time being, one of the following, but more
78
  actions might be added in the future (in particular, future
79
  implementations might add an action making filtering continue with
80
  a different filter chain).
81

  
82
  - ACCEPT. The job will be accepted; no further filter rules
83
    are applied.
84
  - PAUSE. The job will be accepted to the queue and remain there;
85
    however, it is not executed. If an opcode is currently running,
86
    it continues, but the next opcode will not be started. For a paused
87
    job all locks it might have acquired will be released as soon as
88
    possible, at the latest when the currently running opcode has
89
    finished. The job queue will take care of this.
90
  - REJECT. The job is rejected. If it is already in the queue,
91
    it will be marked as cancelled.
92
  - CONTINUE. The filtering continues processing with the next
93
    rule. Such a rule will never have any direct or indirect effect,
94
    but it can serve as documentation for a "normally present, but
95
    currently disabled" rule.
96

  
97
- A reason trail, in the same format as reason trails for opcodes. 
98
  This makes it possible to find out which maintenance (or other reason) caused
99
  the addition of this filter rule.
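
The evaluation order described above can be sketched in Python. This is a
minimal illustration, not Ganeti code: the names ``FilterRule`` and ``decide``
are hypothetical, predicates are modelled as plain callables, and the fallback
of accepting a job when no rule applies is an assumption that this document
does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FilterRule:
    """Hypothetical in-memory form of a filter rule (not Ganeti code)."""
    uuid: str
    watermark: int
    priority: int
    action: str  # one of "ACCEPT", "PAUSE", "REJECT", "CONTINUE"
    predicates: List[Callable[[dict], bool]] = field(default_factory=list)

    def fires(self, job: dict) -> bool:
        # A rule fires only if all of its predicates hold for the job.
        return all(pred(job) for pred in self.predicates)


def decide(rules: List[FilterRule], job: dict) -> str:
    """Return the action of the first rule that applies to the job."""
    # Increasing priority first; watermark and then UUID break ties.
    ordered = sorted(rules, key=lambda r: (r.priority, r.watermark, r.uuid))
    for rule in ordered:
        # CONTINUE has no effect: processing moves on to the next rule.
        if rule.fires(job) and rule.action != "CONTINUE":
            return rule.action
    # Assumption: a job matched by no rule is accepted.
    return "ACCEPT"
```

With such a model, a priority 0 ``ACCEPT`` rule for one maintenance placed
before a priority 1 ``REJECT`` rule lets only the maintenance jobs through.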
100

  
101
Predicates available for the filter rules
102
-----------------------------------------
103

  
104
A predicate is a list, with the first element being the name of the
105
predicate and the rest being parameters suitable for that predicate.
106
In most cases, the name of the predicate will be a field of a job,
107
and there will be a single parameter, which is a boolean expression
108
(``filter``) in the sense
109
of the Ganeti query language. However, no assumption should be made
110
that all predicates are of this shape. More predicates may be added
111
in the future.
112

  
113
- ``jobid``. The only parameter is a boolean expression. For this expression,
114
  there is only one field available, ``id``, which represents the id of the job to be
115
  filtered. In all value positions, the string ``watermark`` will be
116
  replaced by the value of the watermark.
117

  
118
- ``opcode``. The only parameter is a boolean expression. For this expression, ``OP_ID``
119
  and all other fields present in the opcode are available. This predicate
120
  will hold true if the expression is true for at least one opcode in
121
  the job.
122

  
123
- ``reason``. The only parameter is a boolean expression. For this expression, the three
124
  fields ``source``, ``reason``, ``timestamp`` of reason trail entries
125
  are available. This predicate is true if one of the entries of one
126
  of the opcodes in this job satisfies the expression.
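
The semantics of the three predicates can be illustrated with a small Python
sketch. It is not the Ganeti query-language implementation: ``evaluate`` and
``predicate_holds`` are hypothetical helpers that support only the operators
``>``, ``=``, ``=~`` and ``!``, the job is assumed to be a dictionary with an
``id`` and a list ``ops``, and reason trail entries are assumed to be
dictionaries with the keys ``source``, ``reason`` and ``timestamp``. Treating a
comparison on a missing field as false is likewise an assumption.

```python
import re


def evaluate(expr, fields, watermark=None):
    """Evaluate a small subset of boolean expressions over a dict of fields."""
    op = expr[0]
    if op == "!":
        return not evaluate(expr[1], fields, watermark)
    name, value = expr[1], expr[2]
    # Only the jobid predicate passes a watermark; there the string
    # "watermark" in value position is replaced by the rule's watermark.
    if watermark is not None and value == "watermark":
        value = watermark
    if name not in fields:
        return False  # assumption: comparisons on missing fields are false
    actual = fields[name]
    if op == ">":
        return actual > value
    if op == "=":
        return actual == value
    if op == "=~":
        return re.search(value, actual) is not None
    raise ValueError("unsupported operator: %r" % (op,))


def predicate_holds(predicate, job, watermark):
    """Check one predicate, given as [name, expression], against a job."""
    name, expr = predicate[0], predicate[1]
    if name == "jobid":
        return evaluate(expr, {"id": job["id"]}, watermark)
    if name == "opcode":
        # True if the expression holds for at least one opcode of the job.
        return any(evaluate(expr, op) for op in job["ops"])
    if name == "reason":
        # True if one reason trail entry of one opcode satisfies it.
        return any(evaluate(expr, entry)
                   for op in job["ops"]
                   for entry in op.get("reason", []))
    raise ValueError("unknown predicate: %r" % (name,))
```

For a rule with watermark 10, ``predicate_holds(['jobid', ['>', 'id',
'watermark']], job, 10)`` holds exactly for jobs whose id is greater than 10.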
127

  
128

  
129
Examples
130
========
131

  
132
Draining the queue.
133
::
134

  
135
   {'priority': 0,
136
    'predicates': [['jobid', ['>', 'id', 'watermark']]],
137
    'action': 'REJECT'}
138

  
139
Soft draining could be achieved by replacing ``REJECT`` by ``PAUSE`` in the
140
above example.
141

  
142
Pausing all new jobs not belonging to a specific maintenance.
143
::
144

  
145
   {'priority': 1,
146
    'predicates': [['jobid', ['>', 'id', 'watermark']],
147
                   ['reason', ['!', ['=~', 'reason', 'maintenance pink bunny']]]],
148
    'action': 'PAUSE'}
149

  
150
Canceling all queued instance creations and disallowing new such jobs.
151
::
152

  
153
  {'priority': 1,
154
   'predicates': [['opcode', ['=', 'OP_ID', 'OP_INSTANCE_CREATE']]],
155
   'action': 'REJECT'}
156

  
157

  
158

  
159
Interface
160
=========
161

  
162
Since queue control is intended to be used by external maintenance-handling
163
tools as well, the primary interface for manipulating queue filters is the
164
:doc:`rapi`. For convenience, a command-line interface will be added as well.
165

  
166
The following resources will be added.
167

  
168
- /2/filters/
169

  
170
  - GET returns the list of all currently set filters
171

  
172
  - POST adds a new filter
173

  
174
- /2/filters/[uuid]
175

  
176
  - GET returns the description of the specified filter
177

  
178
  - DELETE removes the specified filter
179

  
180
  - PUT replaces the specified filter rule, or creates it,
181
    if it doesn't exist already.
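
As a rough illustration of how an external tool might address these resources,
the following Python sketch builds (but does not send) the corresponding HTTP
requests. The host name, the port, the placeholder UUID and the helper
``build_filter_request`` are assumptions made for the example, as is the
choice to send the rule as a JSON object with the fields used in the examples
above; authentication is left out.

```python
import json
import urllib.request

# Hypothetical RAPI endpoint; host and port are assumptions for this sketch.
BASE_URL = "https://cluster.example.com:5080"


def build_filter_request(method, rule=None, uuid=None):
    """Build a request for /2/filters/ or /2/filters/[uuid] without sending it."""
    path = "/2/filters/" if uuid is None else "/2/filters/%s" % uuid
    data = None if rule is None else json.dumps(rule).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# The queue-draining rule from the examples section.
drain = {"priority": 0,
         "predicates": [["jobid", [">", "id", "watermark"]]],
         "action": "REJECT"}

# POST adds a new filter; DELETE removes one addressed by its UUID.
add_req = build_filter_request("POST", rule=drain)
del_req = build_filter_request(
    "DELETE", uuid="00000000-0000-0000-0000-000000000000")  # placeholder UUID
```

A real client would pass such a request to ``urllib.request.urlopen`` together
with suitable credentials and TLS settings.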
182

  
183
Security considerations
184
=======================
185

  
186
Filtering of jobs is not a security feature. It merely serves the purpose
187
of coordinating efforts and avoiding accidental conflicting
188
jobs. Everybody with appropriate credentials can modify the filter
189
rules, not just the originator of a rule. To avoid accidental
190
lock-out, requests modifying the queue are executed directly, not
191
going through the queue themselves.
