Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

Background
----------

Ganeti currently has no support for Out of Band (OOB) management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no means to power a node off and power it back on.
Supporting this is useful in the following situations:

* **Emergency Power Off**: During emergencies, time is critical and manual
  tasks add latency that automation can avoid. If a server room overheats,
  halting the OS on the nodes is not enough. The nodes need to be powered off
  cleanly to prevent damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has to be
  powered off.
* **Crashes**: Software bugs may crash a node. Having an OS-independent way to
  power-cycle a node helps to recover the node without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command
(``gnt-node <command> nodename ...``). The supported commands are:
``power on``, ``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record (SoR)**, not a
  **State of World (SoW)**: it reflects what Ganeti has recorded rather than
  the actual power state of the machine. The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster
  from getting locked for an undefined amount of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the node group value or, failing that, the
  cluster-wide value. In other words, nodes have to opt out of OOB
  capabilities.

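The inheritance rule in the note above can be sketched in Python as follows.
This is only an illustration under stated assumptions: the function name
``resolve_oob_program``, the constant ``OPT_OUT`` and the use of ``None`` for
an unset value are hypothetical and not part of this design. Treating ``!`` at
the node group or cluster level as "no OOB program" is also an assumption,
since the design only defines ``!`` for nodes.

.. code-block:: python

  OPT_OUT = "!"  # value with which a node opts out of OOB management


  def resolve_oob_program(node_value=None, group_value=None, cluster_value=None):
      """Return the effective OOB program path, or None if there is none.

      Each argument is the ``--oob-program`` value at that level, or None if
      the value is unset (hypothetical representation).
      """
      # The most specific level wins: node, then node group, then cluster.
      for value in (node_value, group_value, cluster_value):
          if value == OPT_OUT:
              return None  # explicit opt-out stops the lookup
          if value:
              return value
      return None

With these assumptions, ``resolve_oob_program(None, None, "/usr/bin/oob")``
yields the cluster-wide program, while a node value of ``!`` yields ``None``
regardless of what the node group or cluster define.
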
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

1. Existence and execution flag of the OOB program on all Master Candidates,
   if the cluster parameter ``--oob-program`` is set or at least one node has
   the property ``--oob-program`` set. The OOB helper is only invoked on the
   master.
2. Whether the node state powered matches the actual power state of the
   machine, for those nodes where ``--oob-program`` is set.

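The first check could look roughly like the sketch below. The function name
``check_oob_program`` and the wording of the returned messages are
hypothetical; the design only requires that existence and the execution flag
are verified.

.. code-block:: python

  import os


  def check_oob_program(path):
      """Return a list of problems found with the OOB program at ``path``."""
      problems = []
      if not os.path.isabs(path):
          # The design requires --oob-program to be an absolute path.
          problems.append("%s is not an absolute path" % path)
      elif not os.path.isfile(path):
          problems.append("%s is not an existing regular file" % path)
      elif not os.access(path, os.X_OK):
          problems.append("%s is not executable" % path)
      return problems

An empty list means the program passed the check on the machine where the
function ran; ``gnt-cluster verify`` would have to run the equivalent on every
Master Candidate.
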
New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations.

**offline**
  The cluster does not communicate with offline nodes; useful for nodes that
  are not reachable, in order to avoid delays.

This list will be extended with the following boolean state:

**powered**
  If a node is not powered, the cluster does not communicate with it. If the
  node property ``--oob-program`` is not set, the powered state is not
  displayed.

Additionally, the meaning of the offline state will be modified as follows:

**offline**
  The cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable, in order to avoid delays.

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|   the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

* If no nodenames are passed to ``power [on|off|cycle]``, the user will be
  prompted with ``"Do you really want to power [on|off|cycle] the following
  nodes: <list of OOB capable nodes in the cluster>? (y/n)"``
* For ``power-status``, nodename is optional; if omitted, the power status of
  all OOB capable nodes in the cluster (SoW) is listed.
* The user should be warned and must confirm with yes when trying to
  ``power [off|cycle]`` a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+

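As an illustration of the two error cases above, the following sketch runs the
OOB program with a 60s limit and maps both failures to the messages from the
table. ``run_oob_program`` and ``OobError`` are hypothetical names, and the
argument order (command, then node name) follows the helper program interface
described later in this document. It is not the actual Ganeti implementation.

.. code-block:: python

  import subprocess

  OOB_TIMEOUT = 60  # seconds, the limit stated in this design


  class OobError(Exception):
      """Raised when the OOB program fails or exceeds the timeout."""


  def run_oob_program(program, command, node_name, timeout=OOB_TIMEOUT):
      """Run ``program command node_name`` and return its standard output."""
      try:
          result = subprocess.run(
              [program, command, node_name],
              stdout=subprocess.PIPE,
              stderr=subprocess.PIPE,
              universal_newlines=True,
              timeout=timeout,  # the child is killed when this expires
          )
      except subprocess.TimeoutExpired:
          raise OobError("OOB program execution timeout exceeded, "
                         "OOB program execution aborted")
      if result.returncode != 0:
          # The helper reports its error message on standard error.
          raise OobError("OOB program execution failed (%s)"
                         % result.stderr.strip())
      return result.stdout

A caller would catch ``OobError`` and leave the recorded node state untouched,
since a failed command must not change it.
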
Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is        |
|                |                 |                | already powered off      |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is        |
|                |                 |                | already powered on       |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off, since the powered state represents the **SoR** only
    and not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to touch the node(s). For this case, we allow direct
    modification of the powered state through the
    ``gnt-node modify --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.

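The transition table and the first note can be condensed into a small
function. This is a sketch only; the name ``powered_after`` and its arguments
are hypothetical and chosen for illustration.

.. code-block:: python

  def powered_after(command, powered_before, succeeded):
      """Return the recorded ``powered`` state (SoR) after a power command.

      ``command`` is one of "on", "off" or "cycle"; ``succeeded`` tells
      whether the OOB program reported success.
      """
      if not succeeded:
          return powered_before  # a failed command leaves the SoR unchanged
      if command == "on":
          return True
      if command == "off":
          return False
      if command == "cycle":
          return powered_before  # both cycle rows keep the previous state
      raise ValueError("unknown power command: %r" % command)

For example, ``powered_after("off", True, True)`` gives ``False``, while a
failed ``power off`` on the same node keeps ``True``.
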
Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which have not opted out of OOB management.
  * Only nodes which have not opted out of OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".

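The node selection rules from the note above might be expressed as in the
following sketch. The function ``select_oob_nodes`` and the dictionary that
maps node names to their effective OOB program (``None`` for nodes that opted
out) are hypothetical helpers, not part of the design.

.. code-block:: python

  def select_oob_nodes(requested, oob_programs):
      """Return the node names an OOB listing should cover.

      ``requested`` is the list of node names given on the command line and
      ``oob_programs`` maps every node name to its effective OOB program
      path, or None if the node opted out (hypothetical data structure).
      """
      if not requested:
          # No names given: cover all nodes that have an OOB program.
          return sorted(name for name, prog in oob_programs.items() if prog)
      for name in requested:
          if not oob_programs.get(name):
              raise ValueError("Node %s does not support OOB commands" % name)
      return list(requested)
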
Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which have not opted out of OOB management will
  report the powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

* If no nodename(s) are provided, we will report the health of all nodes in
  the cluster which have ``--oob-program`` set.
* Only nodes which have not opted out of OOB management will report their
  health. Invoking the command on a node that does not meet this condition
  will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

When a command fails (return code == 1), the helper program passes its error
message to Ganeti on standard error. When a command succeeds (return code ==
0), the helper program sends data back to Ganeti on standard output. The
format of the data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+

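A caller could turn the ``power-status`` output into the ``on``, ``off`` and
``unknown`` values used in the power status listing roughly as follows. The
function name ``parse_power_status`` is hypothetical, and mapping malformed
output to ``unknown`` is an assumption made for this sketch rather than
something the design prescribes.

.. code-block:: python

  import json


  def parse_power_status(stdout):
      """Map the helper's ``power-status`` output to "on", "off" or "unknown"."""
      try:
          powered = json.loads(stdout)["powered"]
      except (ValueError, KeyError, TypeError):
          # Not valid JSON, not a JSON object, or no "powered" key.
          return "unknown"
      if powered is True:
          return "on"
      if powered is False:
          return "off"
      return "unknown"
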
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        | * Ambient Temp                                                     |
|        | * PS Redundancy                                                    |
|        | * FAN 1 RPM                                                        |
+--------+--------------------------------------------------------------------+
| status | String; can take one of the following four values:                 |
|        |                                                                    |
|        | * OK                                                               |
|        | * WARNING                                                          |
|        | * CRITICAL                                                         |
|        | * UNKNOWN                                                          |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.

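The following sketch shows how the ``health`` output could be parsed and
reduced to the items that produce log entries. ``items_to_log`` and
``VALID_STATUSES`` are hypothetical names, and rejecting a status outside the
four listed values is an assumption of this example.

.. code-block:: python

  import json

  VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


  def items_to_log(stdout):
      """Return the (item, status) pairs that should produce log entries."""
      flagged = []
      for item, status in json.loads(stdout):
          if status not in VALID_STATUSES:
              raise ValueError("invalid status %r for item %r" % (status, item))
          if status in ("WARNING", "CRITICAL"):
              flagged.append((item, status))
      return flagged

For the input ``[["Ambient Temp", "OK"], ["FAN 1 RPM", "CRITICAL"]]`` only the
``FAN 1 RPM`` entry is returned for logging.
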
Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of
``gnt-node health``.