Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

    
Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there are no means
to power a node off and power it back on. Supporting this is useful in the
following situations:

  * **Emergency Power Off**: During emergencies, time is critical and manual
    tasks just add latency which can be avoided through automation. If a server
    room overheats, halting the OS on the nodes is not enough. The nodes need
    to be powered off cleanly to prevent damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has to be
    powered off.
  * **Crashes**: Software bugs may crash a node. Having an OS-independent way to
    power-cycle a node helps to recover the node without human intervention.

    
Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command (``gnt-node
<command> nodename ...``). The supported commands are: ``power on``,
``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**.  The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster from
  getting locked for an indefinite amount of time.

    
Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

    
New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the
|                    cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a whole
cluster or part of it. It takes care of all steps needed to get the cluster into
a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the cluster back
to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore, this
  command will not do all the work on single-node clusters. On multi-node
  clusters the command tries to find another master or, if that is not
  possible, prepares everything to the point where the user has to shut down
  the master node manually. The same applies to the single-node cluster
  configuration.

    
New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the node group value or the cluster-wide
  value. That is, nodes have to opt out of OOB capabilities.
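
The inheritance rule in the note above can be sketched as follows. This is a minimal illustration, not Ganeti's implementation: the function name ``resolve_oob_program``, the use of ``None`` for "not set", and the precedence order (node, then node group, then cluster) are assumptions made for the example.

```python
from typing import Optional

OPT_OUT = "!"  # value from the note above: no OOB capabilities


def resolve_oob_program(node_value: Optional[str],
                        group_value: Optional[str],
                        cluster_value: Optional[str]) -> Optional[str]:
    """Return the effective OOB program path, or None if there is none.

    Hypothetical helper: None stands for "not set at this level".
    Assumption: the first level that sets a value wins, and "!" at that
    level means the node is opted out.
    """
    for value in (node_value, group_value, cluster_value):
        if value is None:
            continue  # not set here, fall through to the next level
        return None if value == OPT_OUT else value
    return None  # nothing set anywhere
```

With these assumptions, a node that sets ``!`` never gets a helper, while a node with no value of its own picks up whatever its group or the cluster defines.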

    
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and execution flag of the OOB program on all Master Candidates if
     the cluster parameter ``--oob-program`` is set or at least one node has
     the property ``--oob-program`` set. The OOB helper is only invoked on the
     master.
  2. Check whether the node state powered matches the actual power state of the
     machine for those nodes where ``--oob-program`` is set.
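
The second check compares the recorded state (SoR) with the state reported by the helper (SoW). A minimal sketch, assuming both are available as dictionaries keyed by node name; the function name and the use of ``None`` for a state the helper could not determine are hypothetical:

```python
from typing import Dict, List, Optional


def find_power_mismatches(recorded: Dict[str, bool],
                          actual: Dict[str, Optional[bool]]) -> List[str]:
    """Return the names of nodes whose recorded state differs from reality.

    Assumption: nodes whose actual state is unknown (None or missing)
    are skipped rather than reported as mismatches.
    """
    mismatches = []
    for name in sorted(recorded):
        sow = actual.get(name)  # None: the state could not be determined
        if sow is not None and sow != recorded[name]:
            mismatches.append(name)
    return mismatches
```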

    
New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations.

**offline**
  If offline, the cluster does not communicate with offline nodes; useful for
  nodes that are not reachable in order to avoid delays.

This design extends the list with the following boolean state:

**powered**
  If a node is not powered, the cluster does not communicate with it. If the
  node property ``--oob-program`` is not set, the powered state is not
  displayed.

Additionally, the meaning of the offline state is modified as follows:

**offline**
  If offline, the cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable in order to avoid delays.

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

    
New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <display list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power-status``, nodename is optional. If omitted, we list the
    power-status of all OOB capable nodes in the cluster (SoW).
  * The user should be warned and needs to confirm with yes if they try to
    ``power [off|cycle]`` a node with running instances.

    
Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+
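
The two error cases in the table above could be handled roughly as follows. This is a sketch built on Python's standard ``subprocess`` module, not the code Ganeti uses; ``OobError`` and ``run_oob_program`` are hypothetical names, and the error texts are taken from the table.

```python
import subprocess
from typing import List

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Hypothetical exception type for failed OOB helper runs."""


def run_oob_program(argv: List[str], timeout: float = OOB_TIMEOUT) -> str:
    """Run the helper and return its stdout, raising OobError on failure."""
    try:
        # subprocess.run kills the child process when the timeout expires
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        raise OobError("OOB program execution timeout exceeded, OOB "
                       "program execution aborted")
    if proc.returncode != 0:
        # the helper reports its error message on stderr
        raise OobError("OOB program execution failed (%s)"
                       % proc.stderr.strip())
    return proc.stdout
```

A caller would pass the helper path, the command and the node name, for example ``run_oob_program(["/usr/bin/oob", "power-status", "node1.example.com"])``.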

    
Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is already|
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is already|
|                |                 |                | powered on               |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to touch the node(s). For this case, we allow direct
    modification of the powered state through the ``gnt-node modify
    --powered=[yes|no]`` command as long as the node has OOB capabilities
    (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.
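
The state table and the first rule of the note can be written down as a small lookup. The sketch below copies the table entries as they are; the names ``TRANSITIONS`` and ``next_powered_state`` are hypothetical and only serve the illustration.

```python
from typing import Dict, Tuple

# (powered before, command) -> powered after, copied from the table above
TRANSITIONS: Dict[Tuple[bool, str], bool] = {
    (False, "off"): False,
    (False, "cycle"): False,
    (False, "on"): True,
    (True, "off"): False,
    (True, "cycle"): True,
    (True, "on"): True,
}


def next_powered_state(powered: bool, command: str, succeeded: bool) -> bool:
    """Return the recorded powered state after running a power command."""
    if not succeeded:
        return powered  # a failed command leaves the node state unchanged
    return TRANSITIONS[(powered, command)]
```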

    
Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".

    
Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

    
New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodename(s) are provided, we will report the health of all nodes in
    the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report their
    health. Invoking the command on a node that does not meet this condition
    will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_

    
OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

    
Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

Error messages are passed from the helper program to Ganeti through stderr
(return code == 1).  On stdout, the helper program will send data back to
Ganeti (return code == 0). The format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+
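
To make the contract above concrete, here is a minimal sketch of what a helper program could look like. It is illustrative only: ``FAKE_POWER`` is a hypothetical in-memory stand-in for a real management controller, and ``main`` takes the output streams as parameters so the example can be exercised without spawning a process.

```python
import json
import sys

COMMANDS = ("power-on", "power-off", "power-cycle", "power-status", "health")

# Hypothetical in-memory stand-in for a real management controller.
FAKE_POWER = {"node1.example.com": True}


def main(argv, stdout=sys.stdout, stderr=sys.stderr):
    """Handle ``<program> command nodename`` and return the exit code."""
    if len(argv) != 3 or argv[1] not in COMMANDS:
        stderr.write("usage: oob command nodename\n")
        return 1
    command, node = argv[1], argv[2]
    if node not in FAKE_POWER:
        stderr.write("unknown node %s\n" % node)  # error text goes to stderr
        return 1
    if command == "power-status":
        json.dump({"powered": FAKE_POWER[node]}, stdout)
    elif command == "health":
        json.dump([["Ambient Temp", "OK"]], stdout)  # made-up sample item
    elif command == "power-on":
        FAKE_POWER[node] = True
    elif command == "power-off":
        FAKE_POWER[node] = False
    # power-cycle produces no output and leaves the stand-in state as it is
    return 0
```

A real helper would end with ``sys.exit(main(sys.argv))`` and talk to the actual hardware instead of a dictionary.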

    
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        |   * Ambient Temp                                                   |
|        |   * PS Redundancy                                                  |
|        |   * FAN 1 RPM                                                      |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        |   * OK                                                             |
|        |   * WARNING                                                        |
|        |   * CRITICAL                                                       |
|        |   * UNKNOWN                                                        |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.
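
Selecting the health items that lead to a log entry could look like the sketch below. It assumes the helper's stdout is available as a JSON string in the ``[[item, status], ...]`` format described above; the function name ``items_to_log`` is hypothetical.

```python
import json
from typing import List, Tuple

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def items_to_log(health_json: str) -> List[Tuple[str, str]]:
    """Return the (item, status) pairs that should produce a log entry."""
    result = []
    for item, status in json.loads(health_json):
        if status not in VALID_STATUSES:
            raise ValueError("unexpected status %r for %r" % (status, item))
        if status in ("WARNING", "CRITICAL"):
            result.append((item, status))
    return result
```

Items reported as OK or UNKNOWN are accepted but not returned, matching the rule that only WARNING and CRITICAL are logged.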

    
Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of ``gnt-node
health``.

    
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: