Revision e3c39cc3 doc/design-oob.rst

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

  
Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no means to power a node off and power it back
on. Supporting this is very useful in the following situations:

  * **Emergency Power Off**: During emergencies, time is critical and
    manual tasks just add latency, which can be avoided through
    automation. If a server room overheats, halting the OS on the nodes
    is not enough. The nodes need to be powered off cleanly to prevent
    damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has
    to be powered off.
  * **Crashes**: Software bugs may crash a node. Having an
    OS-independent way to power-cycle a node helps to recover the node
    without human intervention.

  
Overview
--------

Ganeti will be extended with OOB capabilities by adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

  
.. note::
  The new **Node State (powered)** is a **State of Record**
  (:term:`SoR`), not a **State of World** (:term:`SoW`).  The maximum
  execution time of the **External Helper Command** will be limited to
  60s to prevent the cluster from getting locked for an indefinite
  amount of time.

  
Detailed Design
---------------
......
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

  
This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.

  
With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

  
.. note::
  The master node cannot cleanly shut itself down. Therefore, this
  command will not do all the work on single-node clusters. On
  multi-node clusters, the command tries to find another master or, if
  that is not possible, prepares everything to the point where the user
  only has to shut down the master node itself. The latter also applies
  to single-node cluster configurations.

  
New ``gnt-node`` Property
+++++++++++++++++++++++++
......
| Options: ``--oob-program``: executable OOB program (absolute path)

  
.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB
  capabilities. Otherwise, the node group value or, failing that, the
  cluster-wide value is inherited. In other words, nodes have to opt out
  of OOB capabilities explicitly.

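The inheritance rule in the note above can be sketched as follows. This is a minimal illustration, not Ganeti's implementation: the function name ``effective_oob_program`` is hypothetical, ``None`` stands for "not set at this level", and the sketch assumes that only the node-level property uses the ``!`` opt-out marker and that the most specific value that is set wins.

.. code-block:: python

  from typing import Optional

  OPT_OUT = "!"  # node-level value meaning "no OOB capabilities"


  def effective_oob_program(node_value: Optional[str],
                            group_value: Optional[str],
                            cluster_value: Optional[str]) -> Optional[str]:
      """Return the OOB helper path that applies to a node, or None.

      Hypothetical helper: None means the parameter is not set at that
      level; only the node level is assumed to accept the "!" marker.
      """
      if node_value == OPT_OUT:
          return None  # the node explicitly opted out of OOB management
      # Otherwise the most specific value that is set wins:
      # node, then node group, then cluster.
      for value in (node_value, group_value, cluster_value):
          if value:
              return value
      return None

For example, a node without its own value in a group without a value falls back to the cluster-wide helper, while ``!`` on the node disables OOB regardless of the group and cluster settings.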
  
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++
......
| Option: None
| Additional Checks:

  
  1. Existence and executable flag of the OOB program on all Master
     Candidates, if the cluster parameter ``--oob-program`` is set or at
     least one node has the property ``--oob-program`` set. The OOB
     helper is only invoked on the master.
  2. Check whether the node state powered matches the actual power
     state of the machine for those nodes where ``--oob-program`` is
     set.

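The two additional checks could look roughly like the following sketch. The function names and message texts are hypothetical. The first check assumes plain POSIX file permissions (``os.access``). The second assumes the helper's ``power-status`` results have already been collected into a dictionary, with ``None`` for nodes whose state could not be determined.

.. code-block:: python

  import os
  from typing import Dict, List, Optional


  def check_oob_program(path: str) -> List[str]:
      """Check 1, run on each master candidate: exists and is executable."""
      errors = []
      if not os.path.isfile(path):
          errors.append("OOB program '%s' does not exist" % path)
      elif not os.access(path, os.X_OK):
          errors.append("OOB program '%s' is not executable" % path)
      return errors


  def check_power_state(recorded: Dict[str, bool],
                        reported: Dict[str, Optional[bool]]) -> List[str]:
      """Check 2: compare the powered SoR with the SoW from the helper.

      Nodes missing from `reported` or reported as None are flagged as
      having an unknown power state.
      """
      errors = []
      for node in sorted(recorded):
          actual = reported.get(node)
          if actual is None:
              errors.append("%s: power state unknown" % node)
          elif actual != recorded[node]:
              errors.append("%s: recorded powered=%s but machine reports %s"
                            % (node, recorded[node], actual))
      return errors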
  
New Node State
++++++++++++++
......
Ganeti supports the following two boolean states related to the nodes:

  
**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations.

  
**offline**
  If offline, the cluster does not communicate with the node; useful
  for nodes that are not reachable, in order to avoid delays.

  
We will extend this list with the following boolean state:

  
**powered**
  If not powered, the cluster does not communicate with the node. If
  the node property ``--oob-program`` is not set, the powered state is
  not displayed.

  
Additionally, we modify the meaning of the offline state as follows:

  
**offline**
  If offline, the cluster does not communicate with the node
  (**with the exception of OOB commands for nodes where**
  ``--oob-program`` **is set**); useful for nodes that are not
  reachable, in order to avoid delays.

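Taken together, the three states and the modified meaning of **offline** determine whether the cluster talks to a node at all. The following minimal sketch of that decision uses a hypothetical ``NodeState`` record rather than Ganeti's real node objects, and ``oob_capable`` stands for "``--oob-program`` is set for this node". It also assumes that OOB commands remain allowed for unpowered nodes, since ``power on`` has to reach them.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass
  class NodeState:
      """Hypothetical per-node flags; oob_capable: --oob-program is set."""
      drained: bool = False
      offline: bool = False
      powered: bool = True
      oob_capable: bool = False


  def may_contact(node: NodeState, oob_command: bool = False) -> bool:
      """Return True if the cluster may talk to the node for this request."""
      if oob_command:
          # OOB commands run the helper on the master instead of talking to
          # the node's OS, so offline and unpowered nodes are still allowed.
          return node.oob_capable
      # Regular communication is skipped for offline and unpowered nodes.
      return not node.offline and node.powered


  def may_allocate(node: NodeState) -> bool:
      """Drained nodes stay reachable but are excluded from allocations."""
      return may_contact(node) and not node.drained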
  
The corresponding command extensions are:

  
......
| Parameter:  [ ``nodename`` ... ]
| Option: None

  
Additional Output (:term:`SoR`, omitted if node property
``--oob-program`` is not set):
powered: ``[True|False]``

  
| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

  
......
| Options: None
| Caveats:

  
  * If no nodenames are passed to ``power [on|off|cycle]``, the user
    will be prompted with ``"Do you really want to power [on|off|cycle]
    the following nodes: <display list of OOB capable nodes in the
    cluster>? (y/n)"``
  * For ``power-status``, nodename is optional; if omitted, we list the
    power-status of all OOB capable nodes in the cluster (:term:`SoW`).
  * The user should be warned and has to confirm with yes when trying
    to ``power [off|cycle]`` a node with running instances.

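The caveats above amount to two confirmation steps. The sketch below is illustrative only. ``confirm_power_command`` and the warning text for nodes with running instances are hypothetical, and the ``ask`` parameter (defaulting to ``input``) exists so the prompts can be exercised without a terminal.

.. code-block:: python

  from typing import Callable, List, Sequence


  def confirm_power_command(command: str,
                            nodenames: Sequence[str],
                            oob_nodes: Sequence[str],
                            nodes_with_instances: Sequence[str],
                            ask: Callable[[str], str] = input) -> List[str]:
      """Return the nodes to act on, or an empty list if the user declined.

      command is one of "on", "off" or "cycle".
      """
      targets = list(nodenames)
      if not targets:
          # No nodenames given: target all OOB capable nodes, but ask first.
          prompt = ("Do you really want to power %s the following nodes: %s?"
                    " (y/n) " % (command, ", ".join(oob_nodes)))
          if ask(prompt).strip().lower() != "y":
              return []
          targets = list(oob_nodes)
      busy = [name for name in targets if name in nodes_with_instances]
      if command in ("off", "cycle") and busy:
          # Powering off/cycling nodes with running instances needs "yes".
          prompt = ("Nodes with running instances: %s. Type 'yes' to power"
                    " %s them: " % (", ".join(busy), command))
          if ask(prompt).strip().lower() != "yes":
              return []
      return targets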
  
Error Handling
^^^^^^^^^^^^^^

  
+-----------------------------+----------------------------------------------+
| Exception                   | Error Message                                |
+=============================+==============================================+
| OOB program return code != 0| OOB program execution failed ($ERROR_MSG)    |
+-----------------------------+----------------------------------------------+
| OOB program execution time  | OOB program execution timeout exceeded, OOB  |
| exceeds 60s                 | program execution aborted                    |
+-----------------------------+----------------------------------------------+

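Both error cases in the table can be detected with Python's standard ``subprocess`` module. The following is a sketch under assumptions, not Ganeti's actual code. ``run_oob_helper`` and ``OobError`` are hypothetical names, and the helper's stderr output is assumed to be what fills in ``$ERROR_MSG``.

.. code-block:: python

  import subprocess
  from typing import List

  OOB_TIMEOUT = 60  # seconds; maximum helper execution time from the design


  class OobError(Exception):
      """Hypothetical error type carrying the user-visible message."""


  def run_oob_helper(argv: List[str], timeout: float = OOB_TIMEOUT) -> str:
      """Run the OOB helper and return its stdout, or raise OobError."""
      try:
          result = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
      except subprocess.TimeoutExpired:
          # subprocess.run has already killed the helper at this point.
          raise OobError("OOB program execution timeout exceeded,"
                         " OOB program execution aborted") from None
      if result.returncode != 0:
          raise OobError("OOB program execution failed (%s)"
                         % result.stderr.strip())
      return result.stdout

Passing the helper path and its arguments as a list (no shell) avoids quoting problems with node names.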
  
Node State Changes
^^^^^^^^^^^^^^^^^^

  
+----------------+---------------+----------------+--------------------------+
| State before   |Command        | State after    | Comment                  |
| execution      |               | execution      |                          |
+================+===============+================+==========================+
| powered: False |``power off``  | powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to power off  |
|                |               |                | a machine that is already|
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power cycle``| powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to cycle a    |
|                |               |                | machine that is already  |
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power on``   | powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power off``  | powered: False |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power cycle``| powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power on``   | powered: True  | FYI: IPMI will complain  |
|                |               |                | if you try to power on   |
|                |               |                | a machine that is already|
|                |               |                | powered on               |
+----------------+---------------+----------------+--------------------------+

  
.. note::

  
  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off, since the powered state represents the
    :term:`SoR` only and not the :term:`SoW`. This can, however, create
    problems when the cluster administrator wants to bring the
    :term:`SoR` in sync with the :term:`SoW` without actually having to
    touch the node(s). For this case, we allow direct modification of
    the powered state through the ``gnt-node modify --powered=[yes|no]``
    command as long as the node has OOB capabilities (i.e.
    ``--oob-program`` is set).
  * All node power state changes will be logged.

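The table and the first note bullet reduce to a small update rule for the recorded (:term:`SoR`) powered state. In this hypothetical sketch, ``run_helper`` stands for whatever executes the OOB helper and reports success. Redundant commands are deliberately not rejected up front, matching the second bullet.

.. code-block:: python

  from typing import Callable, Dict, Optional

  # Recorded powered state after a successful command; None means the
  # command leaves the state unchanged (power-cycle, see the table).
  _RESULT: Dict[str, Optional[bool]] = {
      "power-on": True,
      "power-off": False,
      "power-cycle": None,
  }


  def next_powered_state(powered: bool, command: str,
                         run_helper: Callable[[str], bool]) -> bool:
      """Return the new recorded (SoR) powered state of a node.

      run_helper is a hypothetical callback that runs the OOB helper for
      the node and returns True on success.
      """
      if command not in _RESULT:
          raise ValueError("unsupported command: %s" % command)
      # Redundant requests (e.g. power-off while recorded as off) are not
      # rejected, because the SoR may differ from the SoW.
      if not run_helper(command):
          return powered  # on failure the node state remains unchanged
      target = _RESULT[command]
      return powered if target is None else target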
  
Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

  
| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

  
Example output (represents :term:`SoW`)::

  
  gnt-node oob power-status
  Node                      Power Status
......

  
.. note::

  
  * We use ``unknown`` in case the Helper Program could not determine
    the power state.
  * If no nodenames are provided, we will list the power state of all
    nodes which are not opted out of OOB management.
  * Only nodes which are not opted out of OOB management will be
    listed. Invoking the command on a node that does not meet this
    condition will result in an error message "Node X does not support
    OOB commands".

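The listing rules above can be sketched as a small formatting function. The names are hypothetical. The ``on``/``off`` labels for known states are an assumption not taken from this document, and the 26-character name column simply mirrors the header of the example output.

.. code-block:: python

  from typing import Dict, List, Optional


  def power_status_lines(requested: List[str],
                         oob_nodes: List[str],
                         status: Dict[str, Optional[bool]]) -> List[str]:
      """Format a power-status listing (SoW) for the requested nodes.

      oob_nodes: nodes not opted out of OOB management.
      status: helper results; None or missing means undetermined.
      """
      names = requested or oob_nodes  # no names given: all OOB capable nodes
      for name in names:
          if name not in oob_nodes:
              raise ValueError("Node %s does not support OOB commands" % name)
      lines = ["%-25s %s" % ("Node", "Power Status")]
      for name in names:
          powered = status.get(name)
          label = "unknown" if powered is None else ("on" if powered else "off")
          lines.append("%-25s %s" % (name, label))
      return lines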
  
Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

  
| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

  
Example output (represents :term:`SoR`)::

  
  gnt-node info node1.example.com
  Node name: node1.example.com
......
      - inst7.example.com

  
.. note::
  Only nodes which are not opted out of OOB management will report the
  powered state.

  
New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++
......

  
Caveats:

  
  * If no nodename(s) are provided, we will report the health of all
    nodes in the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out of OOB management will report
    their health. Invoking the command on a node that does not meet this
    condition will result in an error message "Node does not support OOB
    commands".

  
For error handling, see `Error Handling`_.

  
......
Return Codes
^^^^^^^^^^^^

  
+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

  
Error messages are passed from the helper program to Ganeti through
:manpage:`stderr(3)` (return code == 1).  On :manpage:`stdout(3)`, the
helper program will send data back to Ganeti (return code == 0). The
format of the data is JSON.

  
+-----------------+------------------------------+
| Command         | Expected output              |
+=================+==============================+
| ``power-on``    | None                         |
+-----------------+------------------------------+
| ``power-off``   | None                         |
+-----------------+------------------------------+
| ``power-cycle`` | None                         |
+-----------------+------------------------------+
| ``power-status``| ``{ "powered": true|false }``|
+-----------------+------------------------------+
| ``health``      | ::                           |
|                 |                              |
|                 |   [[item, status],           |
|                 |    [item, status],           |
|                 |    ...]                      |
+-----------------+------------------------------+

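The following minimal helper skeleton in Python makes the contract concrete. It rests on several assumptions. The invocation form ``helper <command> <nodename>`` is assumed, since the argument convention is not spelled out here. The in-memory ``_POWER`` dictionary is a hypothetical stand-in for real hardware access such as IPMI, and the health items are fixed examples. Failures go to stderr with return code 1, and data goes to stdout as JSON with return code 0.

.. code-block:: python

  import json
  import sys

  # Hypothetical in-memory stand-in for the real power controller (for
  # example a BMC reached via IPMI); maps node name -> powered.
  _POWER = {}


  def _power_action(command, node):
      if command == "power-on":
          _POWER[node] = True
      elif command == "power-off":
          _POWER[node] = False
      elif not _POWER.get(node, False):
          # power-cycle of a machine that is off: fail, as IPMI would.
          raise RuntimeError("%s is powered off, cannot cycle" % node)
      # A successful power-cycle leaves the node powered on.


  def _health(node):
      # Items and statuses are chosen by the helper author; fixed examples.
      return [["Ambient Temp", "OK"], ["FAN 1 RPM", "WARNING"]]


  def handle(command, node):
      """Return (return_code, stdout_text, stderr_text) for one call."""
      try:
          if command in ("power-on", "power-off", "power-cycle"):
              _power_action(command, node)
              return 0, "", ""  # no output expected for these commands
          if command == "power-status":
              data = {"powered": _POWER.get(node, False)}
          elif command == "health":
              data = _health(node)
          else:
              raise ValueError("unsupported command: %s" % command)
      except Exception as err:  # any failure: message on stderr, code 1
          return 1, "", "%s\n" % err
      return 0, json.dumps(data) + "\n", ""  # data on stdout as JSON


  def main(argv):
      if len(argv) != 3:
          sys.stderr.write("usage: %s <command> <nodename>\n" % argv[0])
          return 1
      code, out, err = handle(argv[1], argv[2])
      sys.stdout.write(out)
      sys.stderr.write(err)
      return code

  # A real helper script would end with: sys.exit(main(sys.argv))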
  
Data Format
^^^^^^^^^^^

  
For the health output, the fields are:

  
+--------+------------------------------------------------------------------+
| Field  | Meaning                                                          |
+========+==================================================================+
| item   | String identifier of the item we are querying the health of,     |
|        | examples:                                                        |
|        |                                                                  |
|        |   * Ambient Temp                                                 |
|        |   * PS Redundancy                                                |
|        |   * FAN 1 RPM                                                    |
+--------+------------------------------------------------------------------+
| status | String; Can take one of the following four values:               |
|        |                                                                  |
|        |   * OK                                                           |
|        |   * WARNING                                                      |
|        |   * CRITICAL                                                     |
|        |   * UNKNOWN                                                      |
+--------+------------------------------------------------------------------+

  
.. note::

  
  * The item output list is defined by the Helper Program. It is up to
    the author of the Helper Program to decide which items should be
    monitored and what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item
    status. It will, however, create log entries for items with status
    WARNING or CRITICAL for each run of the
    ``gnt-node oob health nodename`` command. Automatic actions (regular
    monitoring of the item status) are considered a new service and
    will be treated in a separate design document.

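On the Ganeti side, the health output has to be checked against this format before WARNING and CRITICAL items are logged. The following hypothetical sketch uses only the standard ``json`` and ``logging`` modules and assumes that malformed helper output should be treated as an error.

.. code-block:: python

  import json
  import logging
  from typing import List, Tuple

  VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


  def parse_health(stdout: str) -> List[Tuple[str, str]]:
      """Parse and validate the helper's health output ([[item, status]])."""
      data = json.loads(stdout)
      if not isinstance(data, list):
          raise ValueError("health output must be a JSON list")
      items = []
      for entry in data:
          if (not isinstance(entry, list) or len(entry) != 2
                  or not all(isinstance(field, str) for field in entry)):
              raise ValueError("invalid health entry: %r" % (entry,))
          item, status = entry
          if status not in VALID_STATUSES:
              raise ValueError("invalid status %r for item %r" % (status, item))
          items.append((item, status))
      return items


  def log_health(node: str,
                 items: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
      """Log WARNING/CRITICAL items and return them; no other action."""
      flagged = [(item, status) for item, status in items
                 if status in ("WARNING", "CRITICAL")]
      for item, status in flagged:
          logging.warning("%s: health item %s is %s", node, item, status)
      return flagged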
  
Logging
-------

  
The ``gnt-node power-[on|off]`` commands (power state changes) will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

  
.. vim: set textwidth=72 :
.. Local Variables: