Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there is no way
to power a node off and power it back on. Supporting this is useful in the
following situations:

* **Emergency Power Off**: During emergencies, time is critical and manual
  tasks add latency that automation can avoid. If a server
  room overheats, halting the OS on the nodes is not enough. The nodes need
  to be powered off cleanly to prevent damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has to be
  powered off.
* **Crashes**: Software bugs may crash a node. Having an OS-independent way to
  power-cycle a node helps to recover the node without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command (``gnt-node
<command> nodename ...``). The supported commands are: ``power on``,
``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**: it reflects what Ganeti has
  recorded, not the actual power state of the machine. The maximum execution
  time of the **External Helper Command** will be limited to 60s to prevent
  the cluster from getting locked for an undefined amount of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the
|          cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a whole
cluster or part of it. It takes care of all steps needed to get the cluster into
a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the cluster back
to life.

.. note::
  The master node is not able to shut itself down cleanly. Therefore, this
  command will not do all the work on single-node clusters. On multi-node
  clusters the command tries to find another master or, if that is not
  possible, prepares everything to the point where the user has to shut down
  the master node manually. The same applies to the single-node cluster
  configuration.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the value of its node group or the cluster-wide
  value. That is, nodes have to opt out of OOB capabilities.
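
The inheritance rule in the note above can be sketched as follows. This is a
minimal illustration, not Ganeti code: the function name
``resolve_oob_program`` and the lookup order (node, then node group, then
cluster) are assumptions, and only a node-level ``!`` is treated as an opt-out.

```python
from typing import Optional

OPT_OUT = "!"  # marker from the note above: the node has no OOB capabilities


def resolve_oob_program(node_value: Optional[str],
                        group_value: Optional[str],
                        cluster_value: Optional[str]) -> Optional[str]:
    """Return the effective OOB program path, or None if OOB is unavailable.

    Assumed precedence (not stated explicitly in the design): node value,
    then node group value, then cluster-wide value.
    """
    if node_value == OPT_OUT:
        return None  # explicit opt-out on the node
    for value in (node_value, group_value, cluster_value):
        if value:
            return value  # the first value that is set wins
    return None  # nothing configured at any level
```

With this sketch, a node without its own setting picks up ``/usr/bin/oob``
from the cluster, while a node set to ``!`` resolves to ``None`` regardless of
what the group or cluster define.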

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Check the existence and executable flag of the OOB program on all Master
     Candidates if the cluster parameter ``--oob-program`` is set or at least
     one node has the property ``--oob-program`` set. The OOB helper itself is
     only invoked on the master.
  2. Check whether the node state ``powered`` matches the actual power state of
     the machine for those nodes where ``--oob-program`` is set.
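
The two checks could look roughly like the sketch below. The function names
``check_oob_program`` and ``powered_mismatches`` are hypothetical, and the
absolute-path test is an extra assumption derived from the "absolute path"
requirement on ``--oob-program``; the real verify code may be organised
differently.

```python
import os


def check_oob_program(path):
    """Return a list of problems found for the configured OOB program."""
    problems = []
    if not os.path.isabs(path):
        problems.append("%s is not an absolute path" % path)
    elif not os.path.isfile(path):
        problems.append("%s does not exist" % path)
    elif not os.access(path, os.X_OK):
        problems.append("%s is not executable" % path)
    return problems


def powered_mismatches(recorded, actual):
    """Return the sorted names of nodes whose SoR differs from the SoW.

    Both arguments map node names to booleans; nodes missing from
    ``actual`` (no power status available) are skipped.
    """
    return sorted(name for name in recorded
                  if name in actual and recorded[name] != actual[name])
```

The first function would run on every Master Candidate, the second on the
master after it has collected the power status of all OOB capable nodes.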

New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations

**offline**
  if offline, the cluster does not communicate with offline nodes; useful for
  nodes that are not reachable in order to avoid delays

This list will be extended with the following boolean state:

**powered**
  if not powered, the cluster does not communicate with the node; if
  the node property ``--oob-program`` is not set, the state powered is not
  displayed

Additionally, the meaning of the offline state is modified as follows:

**offline**
  if offline, the cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable in order to avoid delays

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <display list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power-status``, nodename is optional; if omitted, we list the
    power-status of all OOB capable nodes in the cluster (SoW)
  * The user should be warned and needs to confirm with yes when trying to
    ``power [off|cycle]`` a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+
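
A minimal sketch of how the two error cases in the table might be mapped onto
one exception. ``OobError`` and ``run_oob_program`` are hypothetical names,
and the use of ``subprocess.run`` is an assumption for illustration; the
messages and the 60s limit are taken from this document.

```python
import subprocess

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Raised when the OOB program fails or exceeds its time budget."""


def run_oob_program(program, command, node_name, timeout=OOB_TIMEOUT):
    """Run ``program command node_name`` and return what it wrote to stdout."""
    try:
        result = subprocess.run([program, command, node_name],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted")
    if result.returncode != 0:
        # the helper reports its error message on stderr
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    return result.stdout
```

A caller would wrap each ``gnt-node power ...`` or ``health`` invocation in
this function and leave the recorded node state untouched whenever
``OobError`` is raised.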

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is        |
|                |                 |                | already powered off      |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is        |
|                |                 |                | already powered on       |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to touch the node(s). For this case, we allow direct
    modification of the powered state through the ``gnt-node modify
    --powered=[yes|no]`` command as long as the node has OOB capabilities
    (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.
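
The transition table and the first bullet of the note can be condensed into a
single function. This is only a sketch; ``powered_after`` is a hypothetical
name and the commands are passed as the bare words ``on``, ``off`` and
``cycle``.

```python
def powered_after(powered_before, command, succeeded=True):
    """Return the recorded ``powered`` state after running a power command."""
    if command not in ("on", "off", "cycle"):
        raise ValueError("unknown power command: %r" % (command,))
    if not succeeded:
        return powered_before  # a failed command leaves the state unchanged
    if command == "cycle":
        return powered_before  # per the table, a cycle keeps the recorded state
    return command == "on"  # "on" records True, "off" records False
```

Note that the function only describes the State of Record; whether the
machine actually changed its power state is a separate question answered by
``power-status``.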

Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".
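
As an illustration of the listing, the sketch below renders a mapping of node
names to power states, using ``None`` for a state the Helper Program could not
determine. ``format_power_status`` is a hypothetical name and the exact column
layout is an assumption, not the final output format.

```python
def format_power_status(states):
    """Render {node name: True/False/None} as a two-column text listing."""
    labels = {True: "on", False: "off", None: "unknown"}
    # the first column is as wide as the longest node name (or the header)
    width = max([len("Node")] + [len(name) for name in states])
    lines = ["%-*s  %s" % (width, "Node", "Power Status")]
    for name in sorted(states):
        lines.append("%-*s  %s" % (width, name, labels[states[name]]))
    return "\n".join(lines)
```

Nodes that are opted out from OOB management would simply be left out of the
mapping before it is passed to the function.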

Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

* If no nodename(s) are provided, we will report the health of all nodes in
  the cluster which have ``--oob-program`` set.
* Only nodes which are not opted out from OOB management will report their
  health. Invoking the command on a node that does not meet this condition
  will result in an error message "Node X does not support OOB commands".

For error handling see `Error Handling`_

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

Error messages are passed from the helper program to Ganeti through StdErr
(return code == 1). On StdOut, the helper program will send data back to
Ganeti (return code == 0). The format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+
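
To make the contract concrete, here is a sketch of what a trivial Helper
Program could look like. Everything about the backend is made up for
illustration: ``FAKE_POWER`` is a hypothetical in-memory stand-in for real
hardware access, and the health items are example values. Only the command
names, the return codes and the JSON shapes come from the tables above.

```python
import json
import sys

# Hypothetical in-memory stand-in for real OOB hardware access.
FAKE_POWER = {"node1.example.com": True}


def main(argv, power=FAKE_POWER, out=sys.stdout, err=sys.stderr):
    """Handle one OOB command; return the process exit code (0 or 1)."""
    if len(argv) != 3:
        err.write("usage: oob <command> <nodename>\n")
        return 1
    command, node = argv[1], argv[2]
    if node not in power:
        err.write("unknown node %s\n" % node)
        return 1
    if command == "power-on":
        power[node] = True
    elif command == "power-off":
        power[node] = False
    elif command == "power-cycle":
        pass  # a real helper would power the machine off and on again
    elif command == "power-status":
        json.dump({"powered": power[node]}, out)
    elif command == "health":
        json.dump([["Ambient Temp", "OK"], ["FAN 1 RPM", "OK"]], out)
    else:
        err.write("unsupported command %s\n" % command)
        return 1
    return 0

# A real script would end with: sys.exit(main(sys.argv))
```

Installed as ``/usr/bin/oob``, such a script would be invoked as in the
example above, e.g. ``/usr/bin/oob power-on node1.example.com``.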

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        | * Ambient Temp                                                     |
|        | * PS Redundancy                                                    |
|        | * FAN 1 RPM                                                        |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        | * OK                                                               |
|        | * WARNING                                                          |
|        | * CRITICAL                                                         |
|        | * UNKNOWN                                                          |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.
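
On the Ganeti side, consuming the health output could look like the sketch
below. The names ``parse_health`` and ``log_health`` and the logger name
``oob`` are hypothetical; rejecting statuses outside the four listed values is
an assumption about how strictly the format would be enforced.

```python
import json
import logging

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def parse_health(stdout_data):
    """Decode the helper's JSON health output into (item, status) tuples."""
    items = []
    for entry in json.loads(stdout_data):
        if len(entry) != 2:
            raise ValueError("malformed health entry: %r" % (entry,))
        item, status = entry
        if status not in VALID_STATUSES:
            raise ValueError("invalid status %r for item %r" % (status, item))
        items.append((item, status))
    return items


def log_health(node_name, items, logger=logging.getLogger("oob")):
    """Log WARNING and CRITICAL items; return the entries that were logged."""
    flagged = [(item, status) for item, status in items
               if status in ("WARNING", "CRITICAL")]
    for item, status in flagged:
        level = logging.WARNING if status == "WARNING" else logging.CRITICAL
        logger.log(level, "%s: %s is %s", node_name, item, status)
    return flagged
```

Items with status ``OK`` or ``UNKNOWN`` pass through ``parse_health`` but are
not logged, which matches the rule that only WARNING and CRITICAL create log
entries.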

Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of ``gnt-node
health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: