Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band Cluster Node Management Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes in a
cluster. It relies on the OS running on the nodes and therefore has limited
options when the OS is not responding. The command ``gnt-node powercycle``
can be issued to attempt a reboot of a node that crashed, but there is no way
to power a node off and power it back on. Supporting this is useful in the
following situations:

* **Emergency Power Off**: During emergencies, time is critical and manual
  tasks just add latency which can be avoided through automation. If a server
  room overheats, halting the OS on the nodes is not enough. The nodes need
  to be powered off cleanly to prevent damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has to be
  powered off.
* **Crashes**: Software bugs may crash a node. Having an OS-independent way to
  power-cycle a node helps to recover the node without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new **Cluster
Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
new **Node State (powered)** and support in ``gnt-node`` for invoking an
**External Helper Command** which executes the actual OOB command (``gnt-node
<command> nodename ...``). The supported commands are: ``power on``,
``power off``, ``power cycle``, ``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record
  (SoR)**, not a **State of World (SoW)**. The maximum execution time of the
  **External Helper Command** will be limited to 60s to prevent the cluster from
  getting locked for an undefined amount of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
  Otherwise, the node inherits the node group value or, failing that, the
  cluster-wide value. That is, nodes have to opt out of OOB capabilities.

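The lookup order in the note above can be sketched in Python as follows. This
is only an illustration and not Ganeti code: the function name
``effective_oob_program`` and the use of ``None`` for "not set" are
assumptions, and the sketch only handles the ``!`` opt-out on the node itself,
which is the one case this design describes.

```python
from typing import Optional

OPT_OUT = "!"  # node-level value meaning "no OOB capabilities"


def effective_oob_program(node: Optional[str],
                          group: Optional[str],
                          cluster: Optional[str]) -> Optional[str]:
    """Return the OOB program to use for a node, or None if it has none.

    None stands for "not set" at that level (an assumption of this sketch).
    """
    if node == OPT_OUT:
        # The node explicitly opted out; do not look at group or cluster.
        return None
    # Most specific setting wins: node, then node group, then cluster.
    for value in (node, group, cluster):
        if value is not None:
            return value
    return None
```

With this reading, a node without its own setting in a cluster configured with
``/usr/bin/oob`` resolves to ``/usr/bin/oob``, while a node set to ``!``
resolves to no program at all.
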
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and executable flag of the OOB program on all Master
     Candidates, if the cluster parameter ``--oob-program`` is set or at least
     one node has the property ``--oob-program`` set. The OOB helper itself is
     invoked only on the master.
  2. Whether the node state powered matches the actual power state of the
     machine, for those nodes where ``--oob-program`` is set.

New Node State
++++++++++++++

Ganeti currently supports the following two boolean states related to the
nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them from
  allocation operations

**offline**
  if offline, the cluster does not communicate with offline nodes; useful for
  nodes that are not reachable in order to avoid delays

This list will be extended with the following boolean state:

**powered**
  if not powered, the cluster does not communicate with the node. If the node
  property ``--oob-program`` is not set, the powered state is not displayed.

In addition, the meaning of the offline state is modified as follows:

**offline**
  if offline, the cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
  useful for nodes that are not reachable in order to avoid delays

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Additional Output (SoR, omitted if node property ``--oob-program`` is not set):
powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the SoR with the SoW manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
  the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
    prompted with ``"Do you really want to power [on|off|cycle] the following
    nodes: <display list of OOB capable nodes in the cluster>? (y/n)"``
  * For ``power-status``, nodename is optional; if omitted, we list the
    power-status of all OOB capable nodes in the cluster (SoW).
  * The user is warned and has to confirm with yes before ``power [off|cycle]``
    is applied to a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+------------------------------+-----------------------------------------------+
| Exception                    | Error Message                                 |
+==============================+===============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
+------------------------------+-----------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
| exceeds 60s                  | program execution aborted                     |
+------------------------------+-----------------------------------------------+

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is already|
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is already|
|                |                 |                | powered on               |
+----------------+-----------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the **SoR** only and
    not the **SoW**. This can however create problems when the cluster
    administrator wants to bring the **SoR** in sync with the **SoW** without
    actually having to touch the node(s). For this case, we allow direct
    modification of the powered state through the
    ``gnt-node modify --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.

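The table above amounts to a small transition function over the SoR. A minimal
sketch, assuming a hypothetical function ``powered_after`` and a boolean
``succeeded`` flag that carries the outcome of the helper run:

```python
def powered_after(powered_before: bool, command: str,
                  succeeded: bool = True) -> bool:
    """Return the recorded powered state after running a power command."""
    if not succeeded:
        # A failed command leaves the Node State unchanged.
        return powered_before
    if command == "on":
        return True
    if command == "off":
        return False
    if command == "cycle":
        # Per the table, a cycle keeps whatever state was recorded before.
        return powered_before
    raise ValueError("not a state-changing power command: %r" % command)
```

Note that this only models the State of Record. It makes no claim about what
the hardware actually did, which is why the manual ``--powered`` override
exists.
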
Node Power Status Listing (SoW)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents **SoW**)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine the power
    state.
  * If no nodenames are provided, we will list the power state of all nodes
    which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be listed.
    Invoking the command on a node that does not meet this condition will
    result in an error message "Node X does not support OOB commands".

Node Power Status Listing (SoR)
+++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Example output (represents **SoR**)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will
  report the powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

* If no nodename(s) are provided, we will report the health of all nodes in
  the cluster which have ``--oob-program`` set.
* Only nodes which are not opted out from OOB management will report their
  health. Invoking the command on a node that does not meet this condition
  will result in an error message "Node does not support OOB commands".

For error handling see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+---------------+--------------------------+
| Return code   | Meaning                  |
+===============+==========================+
| 0             | Command succeeded        |
+---------------+--------------------------+
| 1             | Command failed           |
+---------------+--------------------------+
| others        | Unsupported/undefined    |
+---------------+--------------------------+

When a command fails (return code == 1), the helper program passes its error
message to Ganeti on stderr. When a command succeeds (return code == 0), the
helper program sends its data back to Ganeti on stdout. The format of the data
is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+

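How the master might invoke the helper under these rules can be sketched as
below. The names ``run_oob`` and ``OobError`` are hypothetical and not part of
Ganeti. Only the 60s limit, the return code convention, the JSON output and
the two error messages from the `Error Handling`_ table are taken from this
document.

```python
import json
import subprocess

OOB_TIMEOUT = 60  # seconds; the runtime limit stated in this design


class OobError(Exception):
    """Helper failure or timeout (hypothetical name, not a Ganeti class)."""


def run_oob(program, command, node, timeout=OOB_TIMEOUT):
    """Run ``program command node``; return decoded JSON output or None."""
    try:
        result = subprocess.run(
            [program, command, node],
            capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted") from None
    if result.returncode != 0:
        # Any non-zero code is treated as a failure; stderr carries the reason.
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    output = result.stdout.strip()
    # power-on, power-off and power-cycle print nothing on success.
    return json.loads(output) if output else None
```

Treating every non-zero return code as a failure is a choice made for this
sketch; the table above leaves codes other than 0 and 1 undefined.
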
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------------------------------+
| Field  | Meaning                                                            |
+========+====================================================================+
| item   | String identifier of the item we are querying the health of,       |
|        | examples:                                                          |
|        |                                                                    |
|        | * Ambient Temp                                                     |
|        | * PS Redundancy                                                    |
|        | * FAN 1 RPM                                                        |
+--------+--------------------------------------------------------------------+
| status | String; Can take one of the following four values:                 |
|        |                                                                    |
|        | * OK                                                               |
|        | * WARNING                                                          |
|        | * CRITICAL                                                         |
|        | * UNKNOWN                                                          |
+--------+--------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to the
    author of the Helper Program to decide which items should be monitored and
    what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item status. It
    will however create log entries for items with status WARNING or CRITICAL
    for each run of the ``gnt-node oob health nodename`` command. Automatic
    actions (regular monitoring of the item status) are considered a new
    service and will be treated in a separate design document.

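For illustration, a helper program honouring this contract might look like the
following sketch. It answers from hard-coded stand-in data (``FAKE_POWER`` and
``FAKE_HEALTH``, both invented for this example) where a real helper would talk
to the management controller, and the node name is a placeholder.

```python
import json
import sys

# Stand-in data; a real helper would query the management controller.
FAKE_POWER = {"node1.example.com": True}
FAKE_HEALTH = [["Ambient Temp", "OK"], ["FAN 1 RPM", "WARNING"]]


def main(argv, out=sys.stdout, err=sys.stderr):
    """Handle ``oob <command> <nodename>`` and return the exit code."""
    if len(argv) != 3:
        err.write("usage: oob <command> <nodename>\n")
        return 1
    _, command, node = argv
    if node not in FAKE_POWER:
        err.write("unknown node %s\n" % node)
        return 1
    if command == "power-status":
        json.dump({"powered": FAKE_POWER[node]}, out)
    elif command == "health":
        json.dump(FAKE_HEALTH, out)
    elif command in ("power-on", "power-off", "power-cycle"):
        pass  # a real helper would act here; no output is expected
    else:
        err.write("unsupported command %s\n" % command)
        return 1
    return 0
```

An installed script would end with ``sys.exit(main(sys.argv))`` so that the
return value becomes the process exit code.
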
Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will create log
entries following current Ganeti logging practices. In addition, health items
with status WARNING or CRITICAL will be logged for each run of ``gnt-node
health``.

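Selecting the health items that need a log entry is a simple filter over the
decoded ``health`` output. A sketch, using the hypothetical helper name
``items_to_log``:

```python
LOGGED_STATUSES = ("WARNING", "CRITICAL")


def items_to_log(health):
    """Return the (item, status) pairs that should produce a log entry."""
    return [(item, status) for item, status in health
            if status in LOGGED_STATUSES]
```

Items reported as OK or UNKNOWN are dropped by this filter, matching the rule
that only WARNING and CRITICAL are logged.
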