Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no means to power a node off and power it back
on. Supporting this is useful in the following situations:

* **Emergency Power Off**: During emergencies, time is critical and
  manual tasks just add latency, which can be avoided through
  automation. If a server room overheats, halting the OS on the nodes
  is not enough. The nodes need to be powered off cleanly to prevent
  damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has
  to be powered off.
* **Crashes**: Software bugs may crash a node. Having an
  OS-independent way to power-cycle a node helps to recover the node
  without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record**
  (:term:`SoR`), not a **State of World** (:term:`SoW`). The maximum
  execution time of the **External Helper Command** will be limited to
  60s to prevent the cluster from getting locked for an undefined amount
  of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameters: ``--on`` ``--force`` ``--groups`` ``--all``
| Options:
|   ``--on``: By default, ``epo`` powers nodes off; with ``--on`` it tries
|   to bring the cluster back online
|   ``--force``: Force the operation without asking for confirmation
|   ``--groups``: Operate on node groups instead of nodes
|   ``--all``: Operate on the whole cluster

This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore,
  this command will not do all the work on single-node clusters. On
  multi-node clusters, the command tries to find another master; if
  that is not possible, it prepares everything up to the point where
  the user has to shut down the master node manually. The latter also
  applies to single-node cluster configurations.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!``, then the node has no OOB
  capabilities. Otherwise, the node inherits the value of its node
  group or, if that is not set, the cluster-wide value. In other words,
  nodes have to explicitly opt out of OOB capabilities.

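The inheritance rule in the note above can be sketched as a lookup from the most specific level to the least specific one. This is a minimal illustration, not Ganeti code: the function name ``effective_oob_program`` and its arguments are hypothetical, and honouring ``!`` at the node group level is an assumption (the design only specifies it for nodes).

```python
from typing import Optional

# Marker used in this design to opt a node out of OOB management.
OPT_OUT = "!"


def effective_oob_program(node_value: Optional[str],
                          group_value: Optional[str],
                          cluster_value: Optional[str]) -> Optional[str]:
    """Return the OOB program that applies to a node, or None.

    Levels are checked from the node (most specific) to the cluster
    (least specific); unset levels are passed as None.
    """
    for value in (node_value, group_value, cluster_value):
        if value == OPT_OUT:
            # Opted out (assumption: "!" is honoured at group level too).
            return None
        if value:
            return value
    return None
```

Resolving from the node upwards keeps the opt-out cheap: a single ``!`` on the node overrides whatever the group or cluster defines.
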
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

1. Existence and execution flag of the OOB program on all master
   candidates, if the cluster parameter ``--oob-program`` is set or at
   least one node has the property ``--oob-program`` set. The OOB
   helper is only invoked on the master.
2. Check whether the node state powered matches the actual power state
   of the machine for those nodes where ``--oob-program`` is set.

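A local version of the first check might look like the sketch below. It assumes the check runs on the node being verified; how results are gathered from each master candidate is not shown, and the function name and messages are hypothetical.

```python
import os
from typing import List


def check_oob_program(path: str) -> List[str]:
    """Return the problems found with the OOB program at ``path``.

    An empty list means the program path is absolute, points to a
    regular file and is executable by the current user.
    """
    if not os.path.isabs(path):
        return ["OOB program %r is not an absolute path" % path]
    if not os.path.isfile(path):
        return ["OOB program %r does not exist" % path]
    if not os.access(path, os.X_OK):
        return ["OOB program %r is not executable" % path]
    return []
```
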
New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations.

**offline**
  If offline, the cluster does not communicate with offline nodes;
  useful for nodes that are not reachable in order to avoid delays.

This list will be extended with the following boolean state:

**powered**
  If not powered, the cluster does not communicate with the node. If
  the node property ``--oob-program`` is not set, the powered state is
  not displayed.

Additionally, the meaning of the offline state is modified as follows:

**offline**
  If offline, the cluster does not communicate with offline nodes
  (**with the exception of OOB commands for nodes where**
  ``--oob-program`` **is set**); useful for nodes that are not reachable
  in order to avoid delays.

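The communication rules for these states can be summarized in a small predicate. This is a hedged sketch with hypothetical names; it assumes that OOB commands (e.g. ``power on``) are also allowed for nodes recorded as not powered, which the design implies but does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    """Hypothetical, simplified view of the per-node flags in this design."""
    offline: bool = False
    powered: bool = True
    oob_program: Optional[str] = None  # effective OOB program, None if opted out


def may_contact(node: NodeState, oob_command: bool) -> bool:
    """Decide whether the cluster may talk to ``node``."""
    if oob_command:
        # OOB commands bypass the node OS, so offline or powered-off
        # nodes are reachable as long as an OOB program is set.
        return node.oob_program is not None
    if node.offline:
        return False
    # The powered state is only tracked for OOB-capable nodes.
    if node.oob_program is not None and not node.powered:
        return False
    return True
```
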
The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Additional output (:term:`SoR`, omitted if the node property
``--oob-program`` is not set): ``powered: [True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the
|   :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|   the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

* If no node names are passed to ``power [on|off|cycle]``, the user
  will be prompted with ``"Do you really want to power [on|off|cycle]
  the following nodes: <display list of OOB capable nodes in the
  cluster>? (y/n)"``
* For ``power status``, the node name is optional; if omitted, the
  power status of all OOB-capable nodes in the cluster is listed
  (:term:`SoW`).
* The user should be warned and has to confirm with "yes" when trying
  to ``power [off|cycle]`` a node with running instances.

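The two confirmation rules could be combined as in the sketch below. The function, its arguments and the wording of the running-instances question are hypothetical; only the first prompt text comes from this design.

```python
from typing import Dict, List, Sequence


def confirmation_prompts(command: str,
                         requested: Sequence[str],
                         oob_nodes: Sequence[str],
                         instances: Dict[str, List[str]]) -> List[str]:
    """Return the questions to ask before ``gnt-node power <command>``.

    command   -- one of "on", "off", "cycle", "status"
    requested -- node names given on the command line (may be empty)
    oob_nodes -- names of all OOB-capable nodes in the cluster
    instances -- hypothetical map: node name -> running instance names
    """
    if command == "status":
        return []  # read-only, never asks for confirmation
    prompts = []
    targets = list(requested) or list(oob_nodes)
    if not requested:
        prompts.append("Do you really want to power %s the following nodes:"
                       " %s? (y/n)" % (command, ", ".join(targets)))
    if command in ("off", "cycle"):
        busy = [name for name in targets if instances.get(name)]
        if busy:
            prompts.append("The following nodes have running instances: %s."
                           " Type 'yes' to continue" % ", ".join(busy))
    return prompts
```
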
Error Handling
^^^^^^^^^^^^^^

+-----------------------------+----------------------------------------------+
| Exception                   | Error Message                                |
+=============================+==============================================+
| OOB program return code != 0| OOB program execution failed ($ERROR_MSG)    |
+-----------------------------+----------------------------------------------+
| OOB program execution time  | OOB program execution timeout exceeded, OOB  |
| exceeds 60s                 | program execution aborted                    |
+-----------------------------+----------------------------------------------+

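Both error cases can be produced by a thin wrapper around the helper invocation. This sketch assumes the helper is run through Python's ``subprocess.run`` with its ``timeout`` argument; ``run_oob_helper`` and ``OobError`` are hypothetical names, and the messages follow the table above.

```python
import subprocess

# Maximum helper runtime mandated by this design, in seconds.
OOB_TIMEOUT = 60.0


class OobError(Exception):
    """Raised when the OOB helper fails or exceeds its time limit."""


def run_oob_helper(program: str, command: str, node: str,
                   timeout: float = OOB_TIMEOUT) -> str:
    """Run ``program command node`` and return its stdout on success."""
    try:
        result = subprocess.run([program, command, node],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        raise OobError("OOB program execution timeout exceeded,"
                       " OOB program execution aborted") from None
    except OSError as err:
        # For example, the program is missing or not executable.
        raise OobError("OOB program execution failed (%s)" % err) from None
    if result.returncode != 0:
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    return result.stdout
```
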
Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+---------------+----------------+--------------------------+
| State before   | Command       | State after    | Comment                  |
| execution      |               | execution      |                          |
+================+===============+================+==========================+
| powered: False | ``power off`` | powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to power off  |
|                |               |                | a machine that is already|
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power cycle``| powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to cycle a    |
|                |               |                | machine that is already  |
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False | ``power on``  | powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  | ``power off`` | powered: False |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power cycle``| powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  | ``power on``  | powered: True  | FYI: IPMI will complain  |
|                |               |                | if you try to power on   |
|                |               |                | a machine that is already|
|                |               |                | powered on               |
+----------------+---------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off, since the powered state represents the
    :term:`SoR` only and not the :term:`SoW`. This can, however, create
    problems when the cluster administrator wants to bring the
    :term:`SoR` in sync with the :term:`SoW` without actually having to
    touch the node(s). For this case, we allow direct modification of
    the powered state through the ``gnt-node modify
    --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.

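The table reduces to a small transition function for the recorded (:term:`SoR`) state. The sketch below is illustrative only; the function name is hypothetical and commands are passed without the ``power`` prefix.

```python
def next_powered_state(before: bool, command: str, succeeded: bool) -> bool:
    """Return the recorded powered state after a power command."""
    if command not in ("on", "off", "cycle"):
        raise ValueError("unknown power command: %r" % command)
    if not succeeded:
        # A failed command leaves the node state unchanged.
        return before
    if command == "on":
        return True
    if command == "off":
        return False
    # "cycle" keeps the recorded state: a powered-off node stays off,
    # a powered-on node comes back on.
    return before
```
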
Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power status``
| Parameters: [ ``nodename`` ... ]

Example output (represents :term:`SoW`)::

  gnt-node power status
  Node               Power Status
  node1.example.com  on
  node2.example.com  off
  node3.example.com  on
  node4.example.com  unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine
    the power state.
  * If no node names are provided, we will list the power state of all
    nodes which have not opted out of OOB management.
  * Only nodes which have not opted out of OOB management will be
    listed. Invoking the command on a node that does not meet this
    condition will result in an error message "Node X does not support
    OOB commands".

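Given the per-node results, the listing itself is simple to render. In this sketch (the function name is hypothetical), ``None`` stands for a node whose power state the helper program could not determine.

```python
from typing import Dict, Optional


def format_power_status(states: Dict[str, Optional[bool]]) -> str:
    """Render the power status listing.

    ``states`` maps node name -> True (on), False (off) or None when the
    helper program could not determine the power state.
    """
    words = {True: "on", False: "off", None: "unknown"}
    width = max([len("Node")] + [len(name) for name in states]) + 2
    lines = ["Node".ljust(width) + "Power Status"]
    for name in sorted(states):
        lines.append(name.ljust(width) + words[states[name]])
    return "\n".join(lines)
```
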
Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Example output (represents :term:`SoR`)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which have not opted out of OOB management will report the
  powered state.

New ``gnt-node`` command: ``health``
++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

* If no node names are provided, we will report the health of all
  nodes in the cluster which have ``--oob-program`` set.
* Only nodes which have not opted out of OOB management will report
  their health. Invoking the command on a node that does not meet this
  condition will result in an error message "Node X does not support OOB
  commands".

For error handling, see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

Error messages are passed from the helper program to Ganeti through
:manpage:`stderr(3)` (return code == 1). On :manpage:`stdout(3)`, the
helper program will send data back to Ganeti (return code == 0). The
format of the data is JSON.

+-----------------+------------------------------+
| Command         | Expected output              |
+=================+==============================+
| ``power-on``    | None                         |
+-----------------+------------------------------+
| ``power-off``   | None                         |
+-----------------+------------------------------+
| ``power-cycle`` | None                         |
+-----------------+------------------------------+
| ``power-status``| ``{ "powered": true|false }``|
+-----------------+------------------------------+
| ``health``      | ::                           |
|                 |                              |
|                 |   [[item, status],           |
|                 |    [item, status],           |
|                 |    ...]                      |
+-----------------+------------------------------+

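On the Ganeti side, the helper's standard output could be validated per command roughly as follows. This is a sketch: ``parse_helper_output`` is a hypothetical name, and ignoring any stray output from ``power-on``, ``power-off`` and ``power-cycle`` is an assumption.

```python
import json
from typing import Any, Optional


def parse_helper_output(command: str, stdout: str) -> Optional[Any]:
    """Decode and validate the stdout of a successful helper run.

    Raises ValueError if the output does not match the expected format.
    """
    if command in ("power-on", "power-off", "power-cycle"):
        # No output is expected; anything printed is ignored (assumption).
        return None
    if command not in ("power-status", "health"):
        raise ValueError("unsupported command %r" % command)
    data = json.loads(stdout)  # json.JSONDecodeError is a ValueError
    if command == "power-status":
        if not (isinstance(data, dict)
                and isinstance(data.get("powered"), bool)):
            raise ValueError('power-status: expected {"powered": true|false}')
        return data
    # health: a list of [item, status] pairs
    if not (isinstance(data, list)
            and all(isinstance(entry, list) and len(entry) == 2
                    for entry in data)):
        raise ValueError("health: expected [[item, status], ...]")
    return data
```
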
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+------------------------------------------------------------------+
| Field  | Meaning                                                          |
+========+==================================================================+
| item   | String identifier of the item we are querying the health of,     |
|        | examples:                                                        |
|        |                                                                  |
|        | * Ambient Temp                                                   |
|        | * PS Redundancy                                                  |
|        | * FAN 1 RPM                                                      |
+--------+------------------------------------------------------------------+
| status | String; can take one of the following four values:               |
|        |                                                                  |
|        | * OK                                                             |
|        | * WARNING                                                        |
|        | * CRITICAL                                                       |
|        | * UNKNOWN                                                        |
+--------+------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to
    the author of the Helper Program to decide which items should be
    monitored and what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item
    status. It will, however, create log entries for items with status
    WARNING or CRITICAL for each run of the ``gnt-node health
    nodename`` command. Automatic actions (regular monitoring of the
    item status) are considered a new service and will be treated in a
    separate design document.

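The logging rule from the note above can be expressed as a short loop over the decoded health list. The sketch uses Python's standard ``logging`` module; the logger name ``oob``, the function name and the message format are hypothetical.

```python
import logging
from typing import List, Sequence, Tuple

# The four status values defined by this design.
VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
# Statuses that produce a log entry on every health run.
LOGGED_STATUSES = ("WARNING", "CRITICAL")

logger = logging.getLogger("oob")  # hypothetical logger name


def log_health(node: str,
               items: Sequence[Sequence[str]]) -> List[Tuple[str, str]]:
    """Validate health items and log those in WARNING or CRITICAL state.

    Returns the (item, status) pairs that were logged; no other action
    is taken, matching the current scope of this design.
    """
    logged = []
    for item, status in items:
        if status not in VALID_STATUSES:
            raise ValueError("%s: invalid status %r for item %r"
                             % (node, status, item))
        if status in LOGGED_STATUSES:
            logger.warning("%s: health item %r is %s", node, item, status)
            logged.append((item, status))
    return logged
```
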
Logging
-------

The ``gnt-node power [on|off]`` (power state changes) commands will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: