Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

    
Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there is no way
to power a node off and power it back on. Supporting this is useful in the
following situations:

  * **Emergency Power Off**: During emergencies, time is critical and manual
    tasks add latency that automation can avoid. If a server room overheats,
    halting the OS on the nodes is not enough. The nodes need to be powered
    off cleanly to prevent damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has to be
    powered off.
  * **Crashes**: Software bugs may crash a node. Having an OS independent way to
    power-cycle a node helps to recover the node without human intervention.

    
Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command
(``gnt-node <command> nodename ...``). The supported commands are:
``power on``, ``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**.  The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster from
  getting locked for an undefined amount of time.

    
Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

    
New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!``, the node has no OOB capabilities.
  Otherwise, the node inherits the node group value or the cluster-wide
  value. In other words, nodes have to opt out of OOB capabilities.

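The inheritance rule in the note above can be sketched in a few lines of
Python. This is a minimal sketch, not Ganeti code: the function name
``resolve_oob_program`` and the use of ``None`` for "not set" are hypothetical,
the lookup order (node, then node group, then cluster) is an assumption, and
treating ``!`` at the group or cluster level as an opt-out is also an
assumption, since the design only defines ``!`` for nodes.

.. code-block:: python

  from typing import Optional

  OOB_OPT_OUT = "!"  # value from the note above: no OOB capabilities


  def resolve_oob_program(node_value: Optional[str],
                          group_value: Optional[str],
                          cluster_value: Optional[str]) -> Optional[str]:
      """Return the effective OOB program path, or None if there is none."""
      # Assumed precedence: the first level with a value decides.
      for value in (node_value, group_value, cluster_value):
          if value == OOB_OPT_OUT:
              # Explicit opt-out stops the lookup (assumption for the
              # group and cluster levels).
              return None
          if value:
              return value
      return None

With these assumptions, a node that sets nothing picks up the cluster-wide
program, while a node set to ``!`` ends up without one even if the cluster
defines a program.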
    
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and execution flag of the OOB program on all Master Candidates
     if the cluster parameter ``--oob-program`` is set or at least one node
     has the property ``--oob-program`` set. The OOB helper is only invoked
     on the master.
  2. Check whether the node state powered matches the actual power state of
     the machine for those nodes where ``--oob-program`` is set.

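The first check could look roughly like the following sketch. The function
name ``check_oob_program`` and the wording of the messages are hypothetical;
only the conditions (absolute path, file exists, execution flag set) come from
this design.

.. code-block:: python

  import os


  def check_oob_program(path):
      """Return a list of problems found with the OOB program at ``path``."""
      problems = []
      if not os.path.isabs(path):
          problems.append("OOB program path %s is not absolute" % path)
      elif not os.path.isfile(path):
          problems.append("OOB program %s does not exist" % path)
      elif not os.access(path, os.X_OK):
          problems.append("OOB program %s is not executable" % path)
      return problems

An empty list means the program passed all three conditions on the machine
where the function ran.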
    
New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations.

**offline**
  The cluster does not communicate with offline nodes; useful for nodes that
  are not reachable, in order to avoid delays.

This design extends the list with the following boolean state:

**powered**
  The cluster does not communicate with nodes that are not powered. If the
  node property ``--oob-program`` is not set, the powered state is not
  displayed.

It also modifies the meaning of the offline state as follows:

**offline**
  The cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable, in order to avoid delays.

    
The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
  the node in question

    
New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power-status``, nodename is optional. If omitted, the power status
    of all OOB capable nodes in the cluster is listed (SoW).
  * The user is warned and must confirm with yes when trying to
    ``power [off|cycle]`` a node with running instances.

    
Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+

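One way to enforce the 60s limit and produce the two error messages from the
table is sketched below, using the standard ``subprocess`` module. The names
``run_oob_program`` and ``OobError`` are hypothetical, and the sketch assumes
that ``$ERROR_MSG`` is whatever the helper wrote to its standard error stream.

.. code-block:: python

  import subprocess

  OOB_TIMEOUT = 60  # seconds, the limit stated in this design


  class OobError(Exception):
      """Raised when the OOB program fails or runs too long."""


  def run_oob_program(program, command, node, timeout=OOB_TIMEOUT):
      """Run ``program command node`` and return what it wrote to stdout."""
      try:
          result = subprocess.run(
              [program, command, node],
              stdout=subprocess.PIPE,
              stderr=subprocess.PIPE,
              universal_newlines=True,
              timeout=timeout,
          )
      except subprocess.TimeoutExpired:
          # subprocess.run() kills the child before re-raising the timeout.
          raise OobError("OOB program execution timeout exceeded, OOB"
                         " program execution aborted")
      if result.returncode != 0:
          raise OobError("OOB program execution failed (%s)"
                         % result.stderr.strip())
      return result.stdout

A call such as ``run_oob_program("/usr/bin/oob", "power-on",
"node1.example.com")`` would then either return the helper's output or raise
``OobError`` with one of the messages above.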
    
Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is already|
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is already|
|                |                 |                | powered on               |
+----------------+-----------------+----------------+--------------------------+

    
.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to touch the node(s). For this case, we allow direct
    modification of the powered state through the
    ``gnt-node modify --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.

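The state table and the first note can be condensed into a small function. This
is only an illustration of the recorded (SoR) transitions; the function name
``powered_after`` and its ``succeeded`` flag are hypothetical.

.. code-block:: python

  def powered_after(powered_before, command, succeeded=True):
      """Return the recorded powered state after running ``command``."""
      if not succeeded:
          # A failed command leaves the node state unchanged.
          return powered_before
      if command == "power on":
          return True
      if command == "power off":
          return False
      if command == "power cycle":
          # Both "power cycle" rows of the table keep the previous state.
          return powered_before
      raise ValueError("unsupported command: %r" % command)

Note that ``power cycle`` on a node recorded as powered off keeps it recorded
as powered off, exactly as in the second row of the table.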
    
Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

    
.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".

    
Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

    
New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodename(s) are provided, we will report the health of all nodes in
    the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report their
    health. Invoking the command on a node that does not meet this condition
    will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_

    
OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

    
Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

When the command fails (return code == 1), the helper program passes error
messages to Ganeti on stderr. When it succeeds (return code == 0), the helper
program sends data back to Ganeti on stdout. The format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+

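To make the contract between the two tables concrete, the following sketch
decodes a finished helper run. It is not the Ganeti implementation:
``parse_oob_result`` and ``OobCommandFailed`` are hypothetical names, and
rejecting return codes other than 0 and 1 with ``ValueError`` is an assumption
about how "unsupported/undefined" might be handled.

.. code-block:: python

  import json

  NO_OUTPUT_COMMANDS = ("power-on", "power-off", "power-cycle")


  class OobCommandFailed(Exception):
      """The helper reported failure (return code 1)."""


  def parse_oob_result(command, returncode, stdout, stderr=""):
      """Turn a finished helper run into a Python value."""
      if returncode == 1:
          # Error text arrives on stderr.
          raise OobCommandFailed(stderr.strip())
      if returncode != 0:
          raise ValueError("undefined OOB return code: %d" % returncode)
      if command in NO_OUTPUT_COMMANDS:
          return None  # these commands produce no output
      data = json.loads(stdout)
      if command == "power-status":
          powered = data["powered"]
          if not isinstance(powered, bool):
              raise ValueError("powered must be a JSON boolean")
          return powered
      if command == "health":
          # A list of [item, status] pairs.
          return [(str(item), str(status)) for item, status in data]
      raise ValueError("unknown OOB command: %r" % command)

For example, ``parse_oob_result("power-status", 0, '{ "powered": true }')``
yields ``True``, and a ``health`` reply becomes a list of ``(item, status)``
tuples.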
    
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        |   * Ambient Temp                                                   |
|        |   * PS Redundancy                                                  |
|        |   * FAN 1 RPM                                                      |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        |   * OK                                                             |
|        |   * WARNING                                                        |
|        |   * CRITICAL                                                       |
|        |   * UNKNOWN                                                        |
+--------+--------------------------------------------------------------------+

    
.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.

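The logging behaviour described in the note could be sketched as follows. The
function name ``log_health_items``, the logger name ``oob.health`` and the
mapping of the WARNING and CRITICAL statuses onto Python's logging levels of
the same name are assumptions made for illustration; the design only states
that such items are logged.

.. code-block:: python

  import logging

  VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
  LOG_LEVELS = {"WARNING": logging.WARNING, "CRITICAL": logging.CRITICAL}


  def log_health_items(node, items, logger=None):
      """Log WARNING/CRITICAL items and return the ones that were logged."""
      if logger is None:
          logger = logging.getLogger("oob.health")  # hypothetical name
      logged = []
      for item, status in items:
          if status not in VALID_STATUSES:
              raise ValueError("invalid status %r for item %r"
                               % (status, item))
          level = LOG_LEVELS.get(status)
          if level is not None:
              logger.log(level, "node %s: %s is %s", node, item, status)
              logged.append((item, status))
      return logged

Items with status OK or UNKNOWN pass through silently in this sketch, which
matches the statement that no further action is taken on the item status.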
    
Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of
``gnt-node health``.