Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there is no way
to power a node off and power it back on. Supporting this is useful in the
following situations:

  * **Emergency Power Off**: During emergencies, time is critical and manual
    tasks just add latency which can be avoided through automation. If a server
    room overheats, halting the OS on the nodes is not enough. The nodes need
    to be powered off cleanly to prevent damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has to be
    powered off.
  * **Crashes**: Software bugs may crash a node. Having an OS independent way to
    power-cycle a node helps to recover the node without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command (``gnt-node
<command> nodename ...``). The supported commands are: ``power on``,
``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**. The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster from
  getting locked for an undefined amount of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the value of its node group or the cluster-wide
  value, i.e. nodes have to opt out of OOB capabilities.
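
The inheritance rule in the note above can be sketched as follows. This is a
minimal illustration in Python 3, not Ganeti code: the function name
``resolve_oob_program``, the use of ``None`` for "not set" and the lookup order
(node, then node group, then cluster) are assumptions made for the example.

```python
from typing import Optional

OPT_OUT = "!"  # value from the note above: the node opts out of OOB


def resolve_oob_program(node_value: Optional[str],
                        group_value: Optional[str],
                        cluster_value: Optional[str]) -> Optional[str]:
    """Return the effective OOB program path, or None if there is none."""
    if node_value == OPT_OUT:
        # Explicit opt-out on the node wins over any inherited value.
        return None
    # Assumed lookup order: node, then node group, then cluster.
    for value in (node_value, group_value, cluster_value):
        if value:
            return value
    return None
```

With this sketch, a node without its own value on a cluster configured with
``/usr/bin/oob`` resolves to ``/usr/bin/oob``, while a node set to ``!``
resolves to ``None``.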

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and execution flag of the OOB program on all Master Candidates
     if the cluster parameter ``--oob-program`` is set or at least one node has
     the property ``--oob-program`` set. The OOB helper is only invoked on the
     master.
  2. Whether the node state powered matches the actual power state of the
     machine for those nodes where ``--oob-program`` is set.
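
The two additional checks could look roughly like the following Python 3
sketch. The function names and the dictionaries mapping node names to power
states are hypothetical; only ``os.path`` and ``os.access`` from the standard
library are used, and nothing here reflects the actual ``gnt-cluster verify``
implementation.

```python
import os


def check_oob_program(path):
    """Return a list of problems found with the OOB program at 'path'."""
    problems = []
    if not os.path.isabs(path):
        problems.append("%s is not an absolute path" % path)
    elif not os.path.isfile(path):
        problems.append("%s does not exist" % path)
    elif not os.access(path, os.X_OK):
        problems.append("%s is not executable" % path)
    return problems


def check_power_state(recorded, actual):
    """Return the sorted names of nodes whose SoR differs from the SoW.

    'recorded' maps node name -> powered flag as stored by the cluster (SoR),
    'actual' maps node name -> powered flag reported by the helper (SoW).
    Nodes missing from 'actual' (state unknown) are skipped.
    """
    return sorted(name for name, powered in recorded.items()
                  if name in actual and actual[name] != powered)
```

The first function would run on every Master Candidate, the second only on the
master, since that is where the helper is invoked.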

New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations

**offline**
  if offline, the cluster does not communicate with offline nodes; useful for
  nodes that are not reachable in order to avoid delays

This list will be extended with the following boolean state:

**powered**
  if not powered, the cluster does not communicate with the node; if the node
  property ``--oob-program`` is not set, the powered state is not displayed

Additionally, the meaning of the offline state will be modified as follows:

**offline**
  if offline, the cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable in order to avoid delays

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power status``, nodename is optional; if omitted, the power status
    of all OOB capable nodes in the cluster (SoW) is listed.
  * The user should be warned and must confirm with yes when trying to
    ``power [off|cycle]`` a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is already|
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is already|
|                |                 |                | powered on               |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to mess with the node(s). For this case, we allow direct
    modification of the powered state through the ``gnt-node modify
    --powered=[yes|no]`` command as long as the node has OOB capabilities
    (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.
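
The transition table above, together with the rule that a failed command
leaves the state unchanged, can be expressed as one small function. This is a
sketch in Python 3; the name ``next_powered_state`` and the short command
strings are illustrative assumptions, not part of the design.

```python
def next_powered_state(powered, command, succeeded):
    """Return the recorded 'powered' state (SoR) after a power command.

    'powered' is the state before execution, 'command' is one of
    "on", "off" or "cycle", and 'succeeded' tells whether the OOB
    helper reported success.
    """
    if command not in ("on", "off", "cycle"):
        raise ValueError("unknown power command: %r" % (command,))
    if not succeeded:
        # A failed command never changes the recorded state.
        return powered
    if command == "on":
        return True
    if command == "off":
        return False
    # "cycle" keeps the state it started from, per the table above.
    return powered
```

Note that ``power cycle`` on a node recorded as powered off stays at
``powered: False``, which matches the second row of the table.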

Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".
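
A listing in the shape of the example output could be produced along these
lines. The sketch is Python 3; the function name, the use of ``None`` for an
undetermined state and the column width are assumptions chosen to reproduce
the example, not a description of the real output code.

```python
def format_power_status(states):
    """Format a power status listing (SoW).

    'states' maps node name -> True (on), False (off) or None when the
    helper program could not determine the power state.
    """
    labels = {True: "on", False: "off", None: "unknown"}
    lines = ["%-25s %s" % ("Node", "Power Status")]
    for name in sorted(states):
        lines.append("%-25s %s" % (name, labels[states[name]]))
    return "\n".join(lines)
```

Calling it with ``{"node1.example.com": True, "node4.example.com": None}``
yields a header line followed by one ``on`` row and one ``unknown`` row.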

Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodename(s) are provided, we will report the health of all nodes in
    the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report their
    health. Invoking the command on a node that does not meet this condition
    will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s
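
To make the calling convention concrete, here is a skeleton of what a helper
program might look like, written in Python 3. It is only a sketch: the
in-memory ``FAKE_POWER`` dictionary stands in for real hardware access (for
example an IPMI wrapper), and the function names are hypothetical. The return
codes and output formats follow the subsections on return codes and data
format in this document.

```python
import json
import sys

# Hypothetical stand-in for real hardware access; a real helper would
# talk to the management controller of the node instead.
FAKE_POWER = {"node1.example.com": True}


def handle(command, nodename, power):
    """Return (exit code, stdout text, stderr text) for one helper call."""
    if nodename not in power:
        return 1, "", "unknown node %s" % nodename
    if command == "power-on":
        power[nodename] = True
        return 0, "", ""
    if command == "power-off":
        power[nodename] = False
        return 0, "", ""
    if command == "power-cycle":
        return 0, "", ""
    if command == "power-status":
        return 0, json.dumps({"powered": power[nodename]}), ""
    if command == "health":
        # Item names and statuses are up to the helper author.
        return 0, json.dumps([["Ambient Temp", "OK"]]), ""
    return 1, "", "unsupported command %s" % command


def main(argv):
    """Entry point; a real script would end with sys.exit(main(sys.argv))."""
    if len(argv) != 3:
        sys.stderr.write("usage: %s command nodename\n" % argv[0])
        return 1
    code, out, err = handle(argv[1], argv[2], FAKE_POWER)
    if out:
        sys.stdout.write(out + "\n")
    if err:
        sys.stderr.write(err + "\n")
    return code
```

Keeping the command handling in a pure function that returns the exit code
and both output streams makes the helper easy to test without touching any
hardware.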

Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

Error messages are passed from the helper program to Ganeti through stderr
(return code == 1). On stdout, the helper program will send data back to
Ganeti (return code == 0). The format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+
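
On the Ganeti side, invoking the helper with the 60s limit and decoding its
output might look like the sketch below. It assumes Python 3.5 or later
(``subprocess.run`` with ``timeout``); ``OobError`` and ``run_oob_helper`` are
hypothetical names, and treating every non-zero return code as a failure is
one possible reading of the "others" row in the return code table.

```python
import json
import subprocess

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Raised when the OOB helper fails or times out."""


def run_oob_helper(program, command, nodename, timeout=OOB_TIMEOUT):
    """Run 'program command nodename' and return the decoded JSON, if any."""
    try:
        result = subprocess.run([program, command, nodename],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising.
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted")
    if result.returncode != 0:
        # The error message is expected on stderr.
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    output = result.stdout.strip()
    # Commands such as power-on produce no output at all.
    return json.loads(output) if output else None
```

For ``power-status`` the return value would be a dictionary such as
``{"powered": True}``; for ``power-on`` it would be ``None``.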

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        |   * Ambient Temp                                                   |
|        |   * PS Redundancy                                                  |
|        |   * FAN 1 RPM                                                      |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        |   * OK                                                             |
|        |   * WARNING                                                        |
|        |   * CRITICAL                                                       |
|        |   * UNKNOWN                                                        |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.
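
The handling of health output described above (validate the status values,
log WARNING and CRITICAL items, take no further action) can be sketched in
Python 3 as follows. The function name, the logger name and the mapping of
item status to Python log levels are assumptions; the design only states that
log entries are created.

```python
import logging

VALID_STATUSES = frozenset(["OK", "WARNING", "CRITICAL", "UNKNOWN"])

# Assumed mapping from item status to Python log level.
LOG_LEVELS = {"WARNING": logging.WARNING, "CRITICAL": logging.CRITICAL}


def check_health_items(nodename, items, log=None):
    """Validate [[item, status], ...] and log WARNING/CRITICAL items.

    Returns the list of (item, status) pairs that were logged.
    """
    log = log or logging.getLogger("oob.health")
    flagged = []
    for item, status in items:
        if status not in VALID_STATUSES:
            raise ValueError("invalid status %r for item %r" % (status, item))
        if status in LOG_LEVELS:
            log.log(LOG_LEVELS[status], "%s: %s is %s",
                    nodename, item, status)
            flagged.append((item, status))
    return flagged
```

Items with status ``OK`` or ``UNKNOWN`` pass through silently in this sketch.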

Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of ``gnt-node
health``.