Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited possibilities when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there are no means to power a node off and power it back
on. Supporting this is very useful in the following situations:

  * **Emergency Power Off**: During emergencies, time is critical and
    manual tasks just add latency which can be avoided through
    automation. If a server room overheats, halting the OS on the nodes
    is not enough. The nodes need to be powered off cleanly to prevent
    damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has
    to be powered off.
  * **Crashes**: Software bugs may crash a node. Having an
    OS-independent way to power-cycle a node helps to recover the node
    without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities through adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record**
  (:term:`SoR`), not a **State of World** (:term:`SoW`).  The maximum
  execution time of the **External Helper Command** will be limited to
  60s to prevent the cluster from getting locked for an undefined amount
  of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default, epo powers nodes off; with ``--on`` it tries to
|                    bring the cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore,
  this command will not do all the work on single-node clusters. On
  multi-node clusters the command tries to find another master or, if
  that is not possible, prepares everything to the point where the user
  has to shut down the master node manually. The latter also applies to
  the single-node cluster configuration.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB
  capabilities.  Otherwise, the node inherits the value of its node
  group or, if that is not set, the cluster-wide value. In other words,
  nodes have to explicitly opt out of OOB capabilities.
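
The inheritance rule above can be sketched in a few lines of Python. This is an illustration only: the function name ``effective_oob_program`` is hypothetical, and treating ``!`` at the node group or cluster level as an opt-out is an assumption, since this design only defines ``!`` for nodes.

```python
from typing import Optional

# Per this design, "!" on a node means "no OOB capabilities" (opt-out).
OPT_OUT = "!"


def effective_oob_program(node_value: Optional[str],
                          group_value: Optional[str],
                          cluster_value: Optional[str]) -> Optional[str]:
    """Return the OOB helper path that applies to a node, or None.

    Unset values (None or "") are skipped; the most specific value that is
    set wins. Treating "!" at the group or cluster level as an opt-out is
    an assumption of this sketch.
    """
    for value in (node_value, group_value, cluster_value):
        if not value:
            continue  # not set at this level, fall back to the next one
        return None if value == OPT_OUT else value
    return None  # nothing set anywhere: no OOB capabilities
```

With only a cluster-wide helper configured, every node inherits it unless the node itself is set to ``!``.
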

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and execution flag of the OOB program on all Master
     Candidates if the cluster parameter ``--oob-program`` is set or at
     least one node has the property ``--oob-program`` set. The OOB
     helper is only invoked on the master.
  2. Whether the node state powered matches the actual power state of
     the machine for those nodes where ``--oob-program`` is set.
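
The first check can be approximated with standard file tests. The sketch below uses a hypothetical helper, ``check_oob_program``, that only inspects the local file system; in the actual check it would have to run on each master candidate (for example through an RPC call), which is not shown here.

```python
import os
from typing import List


def check_oob_program(path: str) -> List[str]:
    """Return problems found with an OOB helper path (empty list if OK)."""
    problems = []
    # The design requires an absolute path for the OOB program.
    if not os.path.isabs(path):
        problems.append("OOB program %s is not an absolute path" % path)
    if not os.path.isfile(path):
        problems.append("OOB program %s does not exist" % path)
    elif not os.access(path, os.X_OK):
        # The file exists but the execution flag is missing.
        problems.append("OOB program %s is not executable" % path)
    return problems
```
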

New Node State
++++++++++++++

Ganeti currently supports the following two boolean states related to
nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations.

**offline**
  If offline, the cluster does not communicate with offline nodes;
  useful for nodes that are not reachable in order to avoid delays.

This design extends this list with the following boolean state:

**powered**
  If not powered, the cluster does not communicate with the node. If
  the node property ``--oob-program`` is not set, the powered state is
  not displayed.

Additionally, the meaning of the offline state is modified as follows:

**offline**
  If offline, the cluster does not communicate with offline nodes
  (**with the exception of OOB commands for nodes where**
  ``--oob-program`` **is set**); useful for nodes that are not reachable
  in order to avoid delays.
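
Taken together, these states decide whether the cluster may talk to a node. The following Python sketch assumes a simplified node record; the ``NodeState`` class and the ``may_contact`` function are hypothetical. It also assumes that OOB commands are allowed for nodes that are not powered, since powering such a node on requires the helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    offline: bool = False
    powered: bool = True               # assumed True when OOB is not in use
    oob_program: Optional[str] = None  # effective value; None = opted out


def may_contact(node: NodeState, is_oob_command: bool) -> bool:
    """Decide whether the cluster should communicate with a node."""
    if is_oob_command:
        # OOB commands are the exception for offline (and, by assumption,
        # unpowered) nodes, but only if the node has OOB capabilities.
        return node.oob_program is not None
    # Regular communication requires the node to be online and powered.
    return not node.offline and node.powered
```
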

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional output (:term:`SoR`, omitted if the node property
``--oob-program`` is not set): ``powered: [True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user
    will be prompted with ``"Do you really want to power [on|off|cycle]
    the following nodes: <display list of OOB capable nodes in the
    cluster>? (y/n)"``
  * For ``power-status``, the nodename is optional; if omitted, we list
    the power status of all OOB capable nodes in the cluster (:term:`SoW`).
  * The user should be warned and must confirm with yes when trying to
    ``power [off|cycle]`` a node with running instances.
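
The caveats above translate into a small planning step before any helper is invoked. In this hypothetical sketch, ``plan_power_command`` returns the target nodes and the confirmations to ask for. The first prompt follows the wording of this section; the function name, its arguments and the wording of the running-instances warning are assumptions. Rejecting nodes without OOB capabilities follows the error message defined for the power status listing.

```python
from typing import Dict, List, Optional, Tuple


def plan_power_command(command: str,
                       requested: Optional[List[str]],
                       oob_capable: List[str],
                       running: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (target nodes, confirmation prompts) for ``power <command>``.

    ``command`` is "on", "off", "cycle" or "status"; ``running`` maps a node
    name to the instances currently running on it.
    """
    prompts = []
    if requested:
        for node in requested:
            if node not in oob_capable:
                raise ValueError("Node %s does not support OOB commands" % node)
        targets = list(requested)
    else:
        # No nodenames given: act on every OOB-capable node in the cluster.
        targets = list(oob_capable)
        if command != "status":
            prompts.append("Do you really want to power %s the following "
                           "nodes: %s? (y/n)" % (command, ", ".join(targets)))
    if command in ("off", "cycle"):
        # Powering off or cycling a node with running instances needs a "yes".
        for node in targets:
            if running.get(node):
                prompts.append("Node %s has running instances (%s); type "
                               "'yes' to continue with power %s"
                               % (node, ", ".join(running[node]), command))
    return targets, prompts
```
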

Error Handling
^^^^^^^^^^^^^^

+-----------------------------+----------------------------------------------+
| Exception                   | Error Message                                |
+=============================+==============================================+
| OOB program return code != 0| OOB program execution failed ($ERROR_MSG)    |
+-----------------------------+----------------------------------------------+
| OOB program execution time  | OOB program execution timeout exceeded, OOB  |
| exceeds 60s                 | program execution aborted                    |
+-----------------------------+----------------------------------------------+

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+---------------+----------------+--------------------------+
| State before   |Command        | State after    | Comment                  |
| execution      |               | execution      |                          |
+================+===============+================+==========================+
| powered: False |``power off``  | powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to power off  |
|                |               |                | a machine that is already|
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power cycle``| powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to cycle a    |
|                |               |                | machine that is already  |
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power on``   | powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power off``  | powered: False |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power cycle``| powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power on``   | powered: True  | FYI: IPMI will complain  |
|                |               |                | if you try to power on   |
|                |               |                | a machine that is already|
|                |               |                | powered on               |
+----------------+---------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the
    :term:`SoR` only and not the :term:`SoW`. This can however create
    problems when the cluster administrator wants to bring the
    :term:`SoR` in sync with the :term:`SoW` without actually having to
    mess with the node(s). For this case, we allow direct modification
    of the powered state through ``gnt-node modify --powered=[yes|no]``
    as long as the node has OOB capabilities (i.e. ``--oob-program`` is
    set).
  * All node power state changes will be logged.
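
The table and the notes above amount to a simple transition rule for the recorded (:term:`SoR`) ``powered`` flag. The following Python sketch illustrates that rule; the function name ``next_powered_state`` and the use of the standard ``logging`` module are assumptions, not Ganeti's actual code.

```python
import logging


def next_powered_state(node: str, before: bool, command: str,
                       succeeded: bool) -> bool:
    """Return the recorded ``powered`` state after ``power <command>``."""
    if command not in ("on", "off", "cycle"):
        raise ValueError("unknown power command: %s" % command)
    if not succeeded or command == "cycle":
        # A failed command leaves the state unchanged, and a cycle ends in
        # the same state it started from (see the table above).
        after = before
    else:
        after = command == "on"
    if after != before:
        # All node power state changes are logged.
        logging.info("Node %s: powered changed from %s to %s",
                     node, before, after)
    return after
```
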

Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents :term:`SoW`)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine
    the power state.
  * If no nodenames are provided, we will list the power state of all
    nodes which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be
    listed.  Invoking the command on a node that does not meet this
    condition will result in an error message "Node X does not support
    OOB commands".
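
One way the listing could be produced from the helper's ``power-status`` replies is sketched below. It maps ``true``/``false`` to ``on``/``off`` and anything else, including a failed helper run, to ``unknown``. The function name ``format_power_status`` and its input format (a dict of parsed JSON replies, ``None`` on failure) are assumptions.

```python
from typing import Dict, Optional


def format_power_status(results: Dict[str, Optional[dict]]) -> str:
    """Render helper replies as a "Node / Power Status" listing."""
    lines = ["%-25s %s" % ("Node", "Power Status")]
    for node in sorted(results):
        data = results[node]
        if isinstance(data, dict) and isinstance(data.get("powered"), bool):
            status = "on" if data["powered"] else "off"
        else:
            # Helper failed or returned something unexpected.
            status = "unknown"
        lines.append("%-25s %s" % (node, status))
    return "\n".join(lines)
```
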

Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents :term:`SoR`)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will report the
  powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodenames are provided, we will report the health of all
    nodes in the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report
    their health. Invoking the command on a node that does not meet this
    condition will result in an error message "Node does not support OOB
    commands".

For error handling, see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

Error messages are passed from the helper program to Ganeti through
:manpage:`stderr(3)` (return code == 1).  On :manpage:`stdout(3)`, the
helper program will send data back to Ganeti (return code == 0). The
format of the data is JSON.
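
On the Ganeti side, invoking the helper boils down to running it with the command and node name as arguments, enforcing the 60s limit, and mapping the outcome to the messages from `Error Handling`_. A minimal sketch using Python's standard ``subprocess`` and ``json`` modules; the names ``run_oob`` and ``OobError`` are hypothetical, and the real implementation may differ, for example in how malformed JSON is reported.

```python
import json
import subprocess
from typing import Any

OOB_TIMEOUT = 60  # seconds, the maximum helper runtime in this design


class OobError(Exception):
    """Raised when the OOB helper fails or exceeds its time limit."""


def run_oob(program: str, command: str, node: str,
            timeout: float = OOB_TIMEOUT) -> Any:
    """Run ``<program> <command> <node>`` and return its parsed JSON output.

    Returns None for commands that print nothing (power-on/off/cycle).
    """
    try:
        result = subprocess.run([program, command, node],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the helper before re-raising the timeout.
        raise OobError("OOB program execution timeout exceeded,"
                       " OOB program execution aborted") from None
    if result.returncode != 0:
        # Error details are expected on stderr.
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    output = result.stdout.strip()
    return json.loads(output) if output else None
```
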

+-----------------+------------------------------+
| Command         | Expected output              |
+=================+==============================+
| ``power-on``    | None                         |
+-----------------+------------------------------+
| ``power-off``   | None                         |
+-----------------+------------------------------+
| ``power-cycle`` | None                         |
+-----------------+------------------------------+
| ``power-status``| ``{ "powered": true|false }``|
+-----------------+------------------------------+
| ``health``      | ::                           |
|                 |                              |
|                 |   [[item, status],           |
|                 |    [item, status],           |
|                 |    ...]                      |
+-----------------+------------------------------+
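
To make the calling convention concrete, here is a minimal helper skeleton in Python. It is only a sketch: the ``FAKE_POWER`` table is a hypothetical in-memory stand-in for a real management interface such as IPMI, and the health items it reports are made-up examples.

```python
import json
import sys
from typing import List

# Hypothetical stand-in for a real BMC/IPMI backend: node name -> powered.
FAKE_POWER = {"node1.example.com": True, "node2.example.com": False}

COMMANDS = ("power-on", "power-off", "power-cycle", "power-status", "health")


def main(argv: List[str]) -> int:
    """Handle ``<command> <nodename>`` and return the exit code."""
    if len(argv) != 2 or argv[0] not in COMMANDS:
        sys.stderr.write("usage: oob {%s} <nodename>\n" % "|".join(COMMANDS))
        return 1  # failure: message on stderr
    command, node = argv
    if node not in FAKE_POWER:
        sys.stderr.write("unknown node %s\n" % node)
        return 1
    if command == "power-on":
        FAKE_POWER[node] = True
    elif command == "power-off":
        FAKE_POWER[node] = False
    elif command == "power-cycle":
        pass  # a real helper would ask the BMC to cycle the machine here
    elif command == "power-status":
        json.dump({"powered": FAKE_POWER[node]}, sys.stdout)
    else:  # health: a list of [item, status] pairs
        json.dump([["Ambient Temp", "OK"], ["FAN 1 RPM", "WARNING"]],
                  sys.stdout)
    return 0  # success: data (if any) on stdout
```

Installed as an executable script, it would end with ``sys.exit(main(sys.argv[1:]))``.
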

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+------------------------------------------------------------------+
| Field  | Meaning                                                          |
+========+==================================================================+
| item   | String identifier of the item we are querying the health of,     |
|        | examples:                                                        |
|        |                                                                  |
|        |   * Ambient Temp                                                 |
|        |   * PS Redundancy                                                |
|        |   * FAN 1 RPM                                                    |
+--------+------------------------------------------------------------------+
| status | String; can take one of the following four values:               |
|        |                                                                  |
|        |   * OK                                                           |
|        |   * WARNING                                                      |
|        |   * CRITICAL                                                     |
|        |   * UNKNOWN                                                      |
+--------+------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to
    the author of the Helper Program to decide which items should be
    monitored and what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item
    status. It will however create log entries for items with status
    WARNING or CRITICAL for each run of the ``gnt-node oob health
    nodename`` command. Automatic actions (regular monitoring of the
    item status) are considered a new service and will be treated in a
    separate design document.
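
On the Ganeti side, the ``health`` reply could be validated against this format and the WARNING and CRITICAL items logged, as described in the note above. A sketch using Python's standard ``json`` and ``logging`` modules; the function name ``parse_health`` and the choice to reject malformed entries with ``ValueError`` are assumptions.

```python
import json
import logging
from typing import List, Tuple

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def parse_health(node: str, raw: str) -> List[Tuple[str, str]]:
    """Parse ``health`` output and log WARNING/CRITICAL items."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("health output must be a JSON list")
    items = []
    for entry in data:
        # Each entry must be an [item, status] pair of strings.
        if (not isinstance(entry, list) or len(entry) != 2
                or not all(isinstance(field, str) for field in entry)
                or entry[1] not in VALID_STATUSES):
            raise ValueError("invalid health entry: %r" % (entry,))
        item, status = entry
        if status in ("WARNING", "CRITICAL"):
            logging.warning("Node %s: %s is %s", node, item, status)
        items.append((item, status))
    return items
```
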

Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: