Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no way to power a node off and back on.
Supporting this is very handy in the following situations:

  * **Emergency Power Off**: During emergencies, time is critical and
    manual tasks just add latency which can be avoided through
    automation. If a server room overheats, halting the OS on the nodes
    is not enough. The nodes need to be powered off cleanly to prevent
    damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has
    to be powered off.
  * **Crashes**: Software bugs may crash a node. Having an
    OS-independent way to power-cycle a node helps to recover the node
    without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record**
  (:term:`SoR`), not a **State of World** (:term:`SoW`). The maximum
  execution time of the **External Helper Command** will be limited to
  60s to prevent the cluster from being locked for an indefinite amount
  of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default, epo powers the nodes off; with ``--on`` it
|                    tries to bring the cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore,
  this command will not do all the work on single-node clusters. On
  multi-node clusters, the command tries to find another master; if
  that is not possible, it prepares everything to the point where the
  user has to shut down the master node manually. The same applies to
  single-node clusters.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!``, the node has no OOB
  capabilities. If it is not set, the node inherits the value of its
  node group or, failing that, the cluster-wide value. In other words,
  nodes have to opt out of OOB capabilities explicitly.
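
The opt-out and inheritance rule can be sketched as a lookup from the
most specific level to the least specific one. This is an illustration,
not Ganeti's code: ``effective_oob_program`` is a hypothetical name, and
``None`` stands for "not set" at a given level.

.. code-block:: python

  from typing import Optional

  OPT_OUT = "!"  # node value that disables OOB for that node


  def effective_oob_program(node_value: Optional[str],
                            group_value: Optional[str],
                            cluster_value: Optional[str]) -> Optional[str]:
      """Return the OOB program to use for a node, or None if it has no OOB.

      None means "not set" at that level (an assumption of this sketch).
      """
      if node_value == OPT_OUT:
          # Explicit opt-out wins over any group or cluster default.
          return None
      # Otherwise the most specific value that is set is used.
      for value in (node_value, group_value, cluster_value):
          if value:
              return value
      return None

For example, ``effective_oob_program("!", None, "/usr/bin/oob")``
returns ``None``: the node-level ``!`` wins over the cluster-wide
program, which is what makes OOB management opt-out rather than opt-in.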

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and executable flag of the OOB program on all master
     candidates, if the cluster parameter ``--oob-program`` is set or
     at least one node has the property ``--oob-program`` set. The OOB
     helper is only invoked on the master, but any master candidate may
     become the master.
  2. Whether the recorded node state ``powered`` matches the actual
     power state of the machine, for those nodes where
     ``--oob-program`` is set.
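
A minimal Python sketch of the two additional checks, assuming the
caller has already collected the recorded ``powered`` values and the
``power-status`` results per node. The function names and message texts
are hypothetical; only ``os.path`` and ``os.access`` from the standard
library are used.

.. code-block:: python

  import os
  from typing import Dict, List, Optional


  def check_oob_program(path: str) -> List[str]:
      """Check 1: the OOB program exists and has the execute flag set."""
      if not os.path.isabs(path):
          return ["OOB program '%s' is not an absolute path" % path]
      if not os.path.isfile(path):
          return ["OOB program '%s' not found" % path]
      if not os.access(path, os.X_OK):
          return ["OOB program '%s' is not executable" % path]
      return []


  def check_power_states(recorded: Dict[str, bool],
                         actual: Dict[str, Optional[bool]]) -> List[str]:
      """Check 2: compare the recorded 'powered' state (SoR) with the SoW.

      Both mappings should only contain nodes where --oob-program is set;
      a missing or None entry in 'actual' means the helper could not tell.
      """
      problems = []
      for node, powered in sorted(recorded.items()):
          observed = actual.get(node)
          if observed is None:
              problems.append("%s: actual power state unknown" % node)
          elif observed != powered:
              problems.append("%s: recorded powered=%s, machine reports %s"
                              % (node, powered, observed))
      return problems

Check 1 would have to run on every master candidate, while check 2 only
needs data that the master gathers through the helper.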

New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations.

**offline**
  If offline, the cluster does not communicate with offline nodes;
  useful for nodes that are not reachable in order to avoid delays.

This design extends the list with the following boolean state:

**powered**
  If not powered, the cluster does not communicate with the node. If
  the node property ``--oob-program`` is not set, the powered state is
  not displayed.

Additionally, the meaning of the offline state is modified as follows:

**offline**
  If offline, the cluster does not communicate with offline nodes
  (**with the exception of OOB commands for nodes where**
  ``--oob-program`` **is set**); useful for nodes that are not reachable
  in order to avoid delays.
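
One way to read the communication rules for the offline and powered
states is as a single predicate. The sketch below is an interpretation,
not Ganeti's implementation: ``NodeState`` and ``may_contact`` are
hypothetical, and it assumes that OOB commands (such as ``power on``)
are also allowed for nodes that are not powered, as the node state
change table implies.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Optional


  @dataclass
  class NodeState:
      """Hypothetical view of a node's recorded (SoR) states."""
      offline: bool = False
      powered: bool = True
      oob_program: Optional[str] = None  # effective program; None = no OOB


  def may_contact(node: NodeState, oob_command: bool) -> bool:
      """Return True if Ganeti may talk to the node for this operation."""
      has_oob = node.oob_program is not None
      if oob_command:
          # OOB commands reach the node through the helper rather than its
          # OS, so they are allowed even for offline or powered-off nodes.
          return has_oob
      if has_oob and not node.powered:
          return False  # not powered: no communication
      return not node.offline  # offline: no communication

The drained state does not appear here because it only affects
allocation, not communication.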

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional output (:term:`SoR`, omitted if the node property
``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no node names are passed to ``power [on|off|cycle]``, the user
    will be prompted with ``"Do you really want to power [on|off|cycle]
    the following nodes: <list of OOB capable nodes in the cluster>?
    (y/n)"``
  * For ``power status``, the node name is optional; if omitted, we
    list the power status of all OOB capable nodes in the cluster
    (:term:`SoW`).
  * The user is warned and has to confirm with "yes" before running
    ``power off`` or ``power cycle`` on a node with running instances.
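
The confirmation rules above can be sketched as a function that returns
the questions to ask, leaving the actual prompting to the caller. The
function name, its parameters and the wording of the running-instances
warning are hypothetical; the first prompt follows the text given above.

.. code-block:: python

  from typing import Dict, List


  def confirmation_prompts(action: str, requested: List[str],
                           oob_nodes: List[str],
                           running: Dict[str, List[str]]) -> List[str]:
      """Return the questions to ask before 'gnt-node power <action>'.

      requested: node names from the command line (may be empty);
      oob_nodes: OOB capable nodes; running: running instances per node.
      """
      if action not in ("on", "off", "cycle"):
          return []  # 'power status' only reads state, nothing to confirm
      prompts = []
      nodes = requested
      if not requested:
          nodes = oob_nodes
          prompts.append("Do you really want to power %s the following nodes:"
                         " %s? (y/n)" % (action, ", ".join(nodes)))
      if action in ("off", "cycle"):
          for node in nodes:
              if running.get(node):
                  prompts.append("Node %s has running instances (%s); type"
                                 " 'yes' to continue with 'power %s'"
                                 % (node, ", ".join(running[node]), action))
      return prompts

Returning the prompts instead of reading input directly keeps the rule
testable and lets a ``--force``-style option skip them in one place.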

Error Handling
^^^^^^^^^^^^^^

+-----------------------------+----------------------------------------------+
| Exception                   | Error Message                                |
+=============================+==============================================+
| OOB program return code != 0| OOB program execution failed ($ERROR_MSG)    |
+-----------------------------+----------------------------------------------+
| OOB program execution time  | OOB program execution timeout exceeded, OOB  |
| exceeds 60s                 | program execution aborted                    |
+-----------------------------+----------------------------------------------+
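
A minimal sketch of how a caller could enforce the 60s limit and map
failures to the two messages in the table, using only the standard
``subprocess`` module (Python 3.7 or later for ``capture_output``).
``run_oob`` and ``OobError`` are hypothetical names, not part of Ganeti,
and ``$ERROR_MSG`` is assumed to be the helper's stderr output.

.. code-block:: python

  import subprocess

  OOB_TIMEOUT = 60  # seconds, as required by this design


  class OobError(Exception):
      """Raised with one of the error messages from the error table."""


  def run_oob(program: str, command: str, node: str,
              timeout: float = OOB_TIMEOUT) -> str:
      """Run '<program> <command> <node>' and return its stdout on success."""
      try:
          result = subprocess.run([program, command, node],
                                  capture_output=True, text=True,
                                  timeout=timeout)
      except subprocess.TimeoutExpired:
          # subprocess.run() has already killed the helper at this point.
          raise OobError("OOB program execution timeout exceeded,"
                         " OOB program execution aborted") from None
      if result.returncode != 0:
          raise OobError("OOB program execution failed (%s)"
                         % result.stderr.strip())
      return result.stdout

Because ``subprocess.run`` kills the helper process when the timeout
expires, a hanging IPMI call cannot keep the cluster locked beyond the
limit.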

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+---------------+----------------+--------------------------+
| State before   |Command        | State after    | Comment                  |
| execution      |               | execution      |                          |
+================+===============+================+==========================+
| powered: False |``power off``  | powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to power off  |
|                |               |                | a machine that is already|
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power cycle``| powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to cycle a    |
|                |               |                | machine that is already  |
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power on``   | powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power off``  | powered: False |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power cycle``| powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power on``   | powered: True  | FYI: IPMI will complain  |
|                |               |                | if you try to power on   |
|                |               |                | a machine that is already|
|                |               |                | powered on               |
+----------------+---------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off, since the powered state represents the
    :term:`SoR` only and not the :term:`SoW`. This can, however, create
    problems when the cluster administrator wants to bring the
    :term:`SoR` in sync with the :term:`SoW` without having to touch the
    node(s). For this case, we allow direct modification of the powered
    state through the ``gnt-node modify --powered=[yes|no]`` command as
    long as the node has OOB capabilities (i.e. ``--oob-program`` is
    set).
  * All node power state changes will be logged.
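
The table and the first bullet of the note reduce to a small transition
function for the recorded (:term:`SoR`) ``powered`` value. A sketch with
hypothetical names:

.. code-block:: python

  # Recorded 'powered' value after a successful command, from the state
  # change table; None means the value stays as it was ('power cycle').
  POWERED_AFTER = {"on": True, "off": False, "cycle": None}


  def next_powered(before: bool, action: str, succeeded: bool) -> bool:
      """Return the SoR 'powered' value after 'gnt-node power <action>'."""
      if action not in POWERED_AFTER:
          raise ValueError("not a power state command: %r" % action)
      after = POWERED_AFTER[action]
      if not succeeded or after is None:
          return before  # failed commands and power cycle keep the old state
      return after

Note that a successful ``power off`` on a node already recorded as off
simply keeps ``False``, matching the decision not to block that case.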

Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power status``
| Parameters: [ ``nodename`` ... ]

Example output (represents :term:`SoW`)::

  gnt-node power status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine
    the power state.
  * If no node names are provided, we will list the power state of all
    nodes which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be
    listed. Invoking the command on a node that does not meet this
    condition will result in an error message "Node X does not support
    OOB commands".
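
A sketch of the node selection and the listing format shown in the
example output. ``power_status_lines`` is a hypothetical helper; the
column width is chosen to reproduce the example, and ``None`` in the
status mapping stands for ``unknown``.

.. code-block:: python

  from typing import Dict, List, Optional


  def power_status_lines(requested: List[str], oob_nodes: List[str],
                         status: Dict[str, Optional[bool]]) -> List[str]:
      """Format the 'gnt-node power status' listing (SoW).

      requested: node names given by the user (may be empty);
      oob_nodes: nodes not opted out of OOB management;
      status: helper result per node, None or missing meaning unknown.
      """
      nodes = requested or oob_nodes
      for node in nodes:
          if node not in oob_nodes:
              raise ValueError("Node %s does not support OOB commands" % node)
      lines = ["%-25s %s" % ("Node", "Power Status")]
      for node in nodes:
          powered = status.get(node)
          text = "unknown" if powered is None else ("on" if powered else "off")
          lines.append("%-25s %s" % (node, text))
      return lines

The same selection rule (default to all OOB capable nodes, reject
opted-out nodes) also fits the ``health`` command described below.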

Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents :term:`SoR`)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will report the
  powered state.

New ``gnt-node`` command: ``health``
++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no node names are provided, we will report the health of all
    nodes in the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report
    their health. Invoking the command on a node that does not meet this
    condition will result in an error message "Node does not support OOB
    commands".

For error handling, see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: ``command`` ``nodename``
| Command: ``power-{on|off|cycle|status}`` or ``health``
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s
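
To make the calling convention concrete, here is a minimal helper
program skeleton in Python. It is a sketch only: instead of talking to
real hardware (e.g. via IPMI), it keeps a hypothetical in-memory
``FAKE_POWER`` table, and it treats the ``None`` entries in the
expected-output table as "print nothing", which is an assumption.

.. code-block:: python

  import json
  import sys
  from typing import Dict, List, TextIO

  # Hypothetical stand-in for real hardware so the sketch runs anywhere;
  # a real helper would query IPMI or a vendor tool here instead.
  FAKE_POWER: Dict[str, bool] = {"node1.example.com": True}


  def main(argv: List[str], out: TextIO = sys.stdout,
           err: TextIO = sys.stderr) -> int:
      """Handle '<command> <nodename>'; return the process exit code."""
      if len(argv) != 2:
          err.write("usage: oob power-{on|off|cycle|status}|health NODE\n")
          return 1
      command, node = argv
      if node not in FAKE_POWER:
          err.write("unknown node %s\n" % node)
          return 1
      if command == "power-on":
          FAKE_POWER[node] = True  # no output expected
      elif command == "power-off":
          FAKE_POWER[node] = False  # no output expected
      elif command == "power-cycle":
          if not FAKE_POWER[node]:
              # Mirrors IPMI refusing to cycle a machine that is off.
              err.write("%s is powered off, cannot cycle\n" % node)
              return 1
      elif command == "power-status":
          json.dump({"powered": FAKE_POWER[node]}, out)
      elif command == "health":
          # Items and statuses are chosen by the helper's author.
          json.dump([["Ambient Temp", "OK"], ["FAN 1 RPM", "OK"]], out)
      else:
          err.write("unsupported command %s\n" % command)
          return 1
      return 0

  # Installed as an executable, the script would end with:
  #     sys.exit(main(sys.argv[1:]))

Errors go to stderr with exit code 1 and data goes to stdout with exit
code 0, following the return code conventions of this design.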

Return Codes
^^^^^^^^^^^^

+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

Error messages are passed from the helper program to Ganeti through
:manpage:`stderr(3)` (return code 1). On :manpage:`stdout(3)`, the
helper program sends data back to Ganeti (return code 0). The data is
encoded as JSON.

+-----------------+------------------------------+
| Command         | Expected output              |
+=================+==============================+
| ``power-on``    | None                         |
+-----------------+------------------------------+
| ``power-off``   | None                         |
+-----------------+------------------------------+
| ``power-cycle`` | None                         |
+-----------------+------------------------------+
| ``power-status``| ``{ "powered": true|false }``|
+-----------------+------------------------------+
| ``health``      | ::                           |
|                 |                              |
|                 |   [[item, status],           |
|                 |    [item, status],           |
|                 |    ...]                      |
+-----------------+------------------------------+
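
On the Ganeti side, a run of the helper can be interpreted with the
return code table and the expected output table. The following sketch
is hypothetical (``parse_oob_result`` and ``OobResultError`` are not
Ganeti names); it treats return codes other than 0 and 1 as
unsupported, as the return code table says.

.. code-block:: python

  import json
  from typing import Any


  class OobResultError(Exception):
      """Raised when the helper's exit code or output breaks the protocol."""


  def parse_oob_result(command: str, returncode: int,
                       stdout: str, stderr: str) -> Any:
      """Interpret one helper run using the return code and output tables."""
      if returncode == 1:
          raise OobResultError("OOB program execution failed (%s)"
                               % stderr.strip())
      if returncode != 0:
          raise OobResultError("unsupported return code %d" % returncode)
      if command in ("power-on", "power-off", "power-cycle"):
          return None  # these commands return no data
      try:
          data = json.loads(stdout)
      except ValueError as exc:  # json.JSONDecodeError is a ValueError
          raise OobResultError("invalid JSON from OOB program: %s" % exc)
      if command == "power-status":
          if (not isinstance(data, dict)
                  or not isinstance(data.get("powered"), bool)):
              raise OobResultError('expected {"powered": true|false}')
          return data["powered"]
      if command == "health":
          if not isinstance(data, list):
              raise OobResultError("expected a list of [item, status] pairs")
          return data
      raise OobResultError("unknown command %s" % command)

Keeping the interpretation separate from process execution lets the
protocol rules be checked without spawning the helper at all.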

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+------------------------------------------------------------------+
| Field  | Meaning                                                          |
+========+==================================================================+
| item   | String identifier of the item we are querying the health of,     |
|        | examples:                                                        |
|        |                                                                  |
|        |   * Ambient Temp                                                 |
|        |   * PS Redundancy                                                |
|        |   * FAN 1 RPM                                                    |
+--------+------------------------------------------------------------------+
| status | String; Can take one of the following four values:               |
|        |                                                                  |
|        |   * OK                                                           |
|        |   * WARNING                                                      |
|        |   * CRITICAL                                                     |
|        |   * UNKNOWN                                                      |
+--------+------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to
    the author of the Helper Program to decide which items should be
    monitored and what each corresponding return status is.
  * Ganeti will currently not take any action based on the item
    status. It will, however, create log entries for items with status
    WARNING or CRITICAL for each run of the ``gnt-node health
    nodename`` command. Automatic actions (regular monitoring of the
    item status) are considered a new service and will be treated in a
    separate design document.
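
A sketch of how the decoded health output could be validated against
the four allowed statuses, with WARNING and CRITICAL items logged and no
further action taken. It uses Python's standard ``logging`` module; the
logger name ``oob.health``, the message format and ``check_health``
itself are assumptions.

.. code-block:: python

  import logging
  from typing import List, Tuple

  STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
  LOGGED = ("WARNING", "CRITICAL")

  logger = logging.getLogger("oob.health")  # hypothetical logger name


  def check_health(node: str, items: list) -> List[Tuple[str, str]]:
      """Validate decoded health output and log WARNING/CRITICAL items."""
      result = []
      for entry in items:
          if (not isinstance(entry, list) or len(entry) != 2
                  or not all(isinstance(field, str) for field in entry)):
              raise ValueError("malformed health entry: %r" % (entry,))
          item, status = entry
          if status not in STATUSES:
              raise ValueError("invalid status %r for item %r"
                               % (status, item))
          if status in LOGGED:
              # Only a log entry; no automatic action is taken.
              logger.warning("%s: %s is %s", node, item, status)
          result.append((item, status))
      return result

Rejecting unknown status strings early keeps a misbehaving helper from
silently hiding problems behind values Ganeti would not log.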

Logging
-------

The ``gnt-node power [on|off]`` commands (power state changes) will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: