Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there is no way
to power a node off and power it back on. Supporting this is very handy in the
following situations:

  * **Emergency Power Off**: During emergencies, time is critical and manual
    tasks just add latency which can be avoided through automation. If a server
    room overheats, halting the OS on the nodes is not enough. The nodes need
    to be powered off cleanly to prevent damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has to be
    powered off.
  * **Crashes**: Software bugs may crash a node. Having an OS-independent way to
    power-cycle a node helps to recover the node without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command (``gnt-node
<command> nodename ...``). The supported commands are: ``power on``,
``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**.  The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster from
  getting locked for an indefinite amount of time.
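
The 60s limit on the helper could be enforced roughly as follows. This is
only a sketch, not Ganeti's implementation: the function name
``run_oob_helper`` and its return shape are assumptions made for
illustration, and the error text is borrowed from the error handling table of
this design.

.. code-block:: python

  import subprocess

  OOB_TIMEOUT_SECONDS = 60  # limit stated in this design


  def run_oob_helper(program, command, node_name, timeout=OOB_TIMEOUT_SECONDS):
      """Run the helper and return (returncode, stdout, stderr).

      Hypothetical wrapper: raises RuntimeError if the helper runs longer
      than ``timeout`` seconds.
      """
      try:
          result = subprocess.run(
              [program, command, node_name],
              capture_output=True,
              text=True,
              timeout=timeout,  # the child is killed once this expires
          )
      except subprocess.TimeoutExpired:
          raise RuntimeError(
              "OOB program execution timeout exceeded, "
              "OOB program execution aborted"
          ) from None
      return result.returncode, result.stdout, result.stderr

A caller would pass the configured absolute path, for example
``run_oob_helper("/usr/bin/oob", "power-status", "node1.example.com")``.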

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the
|                    cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a whole
cluster or part of it. It takes care of all steps needed to get the cluster into
a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the cluster back
to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore, this
  command will not do all the work on single node clusters. On multi node
  clusters the command tries to find another master or, if that is not
  possible, prepares everything to the point where the user has to shut down
  the master node manually. This also applies to the single node cluster
  configuration.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the value of its node group or, failing that,
  the cluster wide value. I.e. the nodes have to opt out from OOB capabilities.
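
The inheritance rule could be resolved as in the sketch below. The function
name ``resolve_oob_program`` and the use of ``None`` for "not set" are
assumptions made for illustration. The sketch also assumes that the ``!``
marker is only meaningful at node level, since this design does not say how
it behaves on a node group or on the cluster.

.. code-block:: python

  OOB_DISABLED = "!"  # marker from this design: the node opts out of OOB


  def resolve_oob_program(node_value, group_value, cluster_value):
      """Return the effective OOB program path, or None if there is none.

      Hypothetical helper: each argument is an absolute path, or None when
      the value is not set at that level.
      """
      if node_value == OOB_DISABLED:
          return None  # explicit opt-out wins over any inherited value
      # The most specific level that has a value wins.
      for value in (node_value, group_value, cluster_value):
          if value:
              return value
      return None

With this rule, a node without its own setting in a cluster configured with
``/usr/bin/oob`` resolves to ``/usr/bin/oob``, while a node set to ``!``
resolves to ``None``.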

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. existence and execution flag of OOB program on all Master Candidates if
     the cluster parameter ``--oob-program`` is set or at least one node has
     the property ``--oob-program`` set. The OOB helper is only invoked on the
     master
  2. check if node state powered matches actual power state of the machine for
     those nodes where ``--oob-program`` is set
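
The first check could look roughly like the sketch below when run on a master
candidate. The function name ``check_oob_program`` and the wording of the
messages are assumptions made for illustration, they are not part of this
design, and the sketch assumes a POSIX system.

.. code-block:: python

  import os


  def check_oob_program(path):
      """Return a list of problems found with the OOB program at ``path``.

      Hypothetical check; an empty list means the program looks usable.
      """
      problems = []
      if not os.path.isabs(path):
          problems.append("OOB program path is not absolute: %s" % path)
      elif not os.path.isfile(path):
          problems.append("OOB program does not exist: %s" % path)
      elif not os.access(path, os.X_OK):
          # The file exists but has no usable execution flag.
          problems.append("OOB program is not executable: %s" % path)
      return problems

The second check would compare the recorded ``powered`` state with the result
of the helper's ``power-status`` command and report any mismatch.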

New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations

**offline**
  if offline, the cluster does not communicate with offline nodes; useful for
  nodes that are not reachable in order to avoid delays

We will extend this list with the following boolean state:

**powered**
  if not powered, the cluster does not communicate with the node; if the node
  property ``--oob-program`` is not set, the state powered is not displayed

Additionally modify the meaning of the offline state as follows:

**offline**
  if offline, the cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable in order to avoid delays

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <display list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power-status``, nodename is optional. If omitted, we list the
    power-status of all OOB capable nodes in the cluster (SoW)
  * The user should be warned and needs to confirm with yes when trying to
    ``power [off|cycle]`` a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is already|
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is already|
|                |                 |                | powered on               |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to mess with the node(s). For this case, we allow direct
    modification of the powered state through the
    ``gnt-node modify --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged
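
The table and the first note can be condensed into a small transition
function. This is a sketch only: the name ``next_powered_state`` and the
short command names ``on``, ``off`` and ``cycle`` are assumptions made for
illustration.

.. code-block:: python

  def next_powered_state(powered, command, succeeded):
      """Return the recorded (SoR) powered state after a power command.

      Hypothetical helper: ``powered`` is the state before execution,
      ``command`` is one of "on", "off" or "cycle", and ``succeeded`` tells
      whether the OOB helper reported success.
      """
      if command not in ("on", "off", "cycle"):
          raise ValueError("unknown power command: %r" % (command,))
      if not succeeded:
          return powered  # a failed command leaves the state unchanged
      if command == "on":
          return True
      if command == "off":
          return False
      return powered  # "cycle" keeps whatever state was recorded before

Every row of the table is reproduced by this function. The rows where IPMI
complains end up in the same state whether the helper reports success or
failure.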

Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".

Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodename(s) are provided, we will report the health of all nodes in
    the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report their
    health. Invoking the command on a node that does not meet this condition
    will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

Error messages are passed from the helper program to Ganeti through StdErr
(return code == 1).  On StdOut, the helper program will send data back to
Ganeti (return code == 0). The format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+
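
On the Ganeti side, the helper's result could be interpreted along the
following lines. This is a sketch, not the actual implementation: the names
``OobError`` and ``parse_oob_result`` are assumptions made for illustration,
and the sketch treats every non-zero return code as a failure, in line with
the error handling table.

.. code-block:: python

  import json

  NO_OUTPUT_COMMANDS = ("power-on", "power-off", "power-cycle")


  class OobError(Exception):
      """Hypothetical error raised when the helper reports a failure."""


  def parse_oob_result(command, returncode, stdout, stderr):
      """Turn the helper's return code and output into a Python value."""
      if command not in NO_OUTPUT_COMMANDS + ("power-status", "health"):
          raise ValueError("unknown OOB command: %r" % (command,))
      if returncode != 0:
          # The error message arrives on StdErr.
          raise OobError("OOB program execution failed (%s)" % stderr.strip())
      if command in NO_OUTPUT_COMMANDS:
          return None  # these commands produce no data
      data = json.loads(stdout)  # data arrives on StdOut as JSON
      if command == "power-status":
          if not isinstance(data.get("powered"), bool):
              raise OobError("malformed power-status output")
          return data["powered"]
      # health: a list of [item, status] pairs
      return [(item, status) for item, status in data]

For example, ``parse_oob_result("power-status", 0, '{"powered": true}', "")``
returns ``True``.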

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        |   * Ambient Temp                                                   |
|        |   * PS Redundancy                                                  |
|        |   * FAN 1 RPM                                                      |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        |   * OK                                                             |
|        |   * WARNING                                                        |
|        |   * CRITICAL                                                       |
|        |   * UNKNOWN                                                        |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.
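
The logging rule for health items could be sketched as follows. The function
name ``log_health_items``, the logger name and the message format are
assumptions made for illustration. Only the four status values and the rule
to log WARNING and CRITICAL items come from this design.

.. code-block:: python

  import logging

  VALID_STATUSES = frozenset(["OK", "WARNING", "CRITICAL", "UNKNOWN"])


  def log_health_items(node_name, items, logger=None):
      """Log WARNING/CRITICAL health items and return the logged pairs.

      Hypothetical helper: ``items`` is the decoded ``health`` output, a
      list of [item, status] pairs.
      """
      if logger is None:
          logger = logging.getLogger("oob-health")  # assumed logger name
      logged = []
      for item, status in items:
          if status not in VALID_STATUSES:
              raise ValueError(
                  "invalid health status %r for item %r" % (status, item))
          if status == "CRITICAL":
              logger.critical("node %s: %s is %s", node_name, item, status)
          elif status == "WARNING":
              logger.warning("node %s: %s is %s", node_name, item, status)
          else:
              continue  # OK and UNKNOWN items are not logged
          logged.append((item, status))
      return logged

Items with status OK or UNKNOWN are accepted but produce no log entry, which
matches the note above.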

Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of ``gnt-node
health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: