root / doc / design-oob.rst @ 7fa310f6
1 | 1e86ee97 | Marc Schmitt | Ganeti Node OOB Management Framework |
---|---|---|---|
2 | 1e86ee97 | Marc Schmitt | ==================================== |
3 | 1e86ee97 | Marc Schmitt | |
4 | 1e86ee97 | Marc Schmitt | Objective |
5 | 1e86ee97 | Marc Schmitt | --------- |
6 | 1e86ee97 | Marc Schmitt | |
7 | 1e86ee97 | Marc Schmitt | Extend Ganeti with Out of Band Cluster Node Management Capabilities. |
8 | 1e86ee97 | Marc Schmitt | |
9 | 1e86ee97 | Marc Schmitt | Background |
10 | 1e86ee97 | Marc Schmitt | ---------- |
11 | 1e86ee97 | Marc Schmitt | |
12 | 1e86ee97 | Marc Schmitt | Ganeti currently has no support for Out of Band management of the nodes in a |
13 | 1e86ee97 | Marc Schmitt | cluster. It relies on the OS running on the nodes and therefore has limited |
14 | 1e86ee97 | Marc Schmitt | options when the OS is not responding. The command ``gnt-node powercycle`` |
15 | 1e86ee97 | Marc Schmitt | can be issued to attempt a reboot of a node that crashed, but there is no way |
16 | 1e86ee97 | Marc Schmitt | to power a node off and power it back on. Supporting this is useful in the |
17 | 1e86ee97 | Marc Schmitt | following situations: |
18 | 1e86ee97 | Marc Schmitt | |
19 | 1e86ee97 | Marc Schmitt | * **Emergency Power Off**: During emergencies, time is critical and manual |
20 | 1e86ee97 | Marc Schmitt | tasks just add latency which can be avoided through automation. If a server |
21 | 1e86ee97 | Marc Schmitt | room overheats, halting the OS on the nodes is not enough. The nodes need |
22 | 1e86ee97 | Marc Schmitt | to be powered off cleanly to prevent damage to equipment. |
23 | 1e86ee97 | Marc Schmitt | * **Repairs**: In most cases, repairing a node means that the node has to be |
24 | 1e86ee97 | Marc Schmitt | powered off. |
25 | 1e86ee97 | Marc Schmitt | * **Crashes**: Software bugs may crash a node. Having an OS-independent way to |
26 | 1e86ee97 | Marc Schmitt | power-cycle a node helps to recover the node without human intervention. |
27 | 1e86ee97 | Marc Schmitt | |
28 | 1e86ee97 | Marc Schmitt | Overview |
29 | 1e86ee97 | Marc Schmitt | -------- |
30 | 1e86ee97 | Marc Schmitt | |
31 | 1e86ee97 | Marc Schmitt | Ganeti will be extended with OOB capabilities through adding a new **Cluster |
32 | 1e86ee97 | Marc Schmitt | Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a |
33 | 1e86ee97 | Marc Schmitt | new **Node State (powered)** and support in ``gnt-node`` for invoking an |
34 | 1e86ee97 | Marc Schmitt | **External Helper Command** which executes the actual OOB command (``gnt-node |
35 | 1e86ee97 | Marc Schmitt | <command> nodename ...``). The supported commands are: ``power on``, |
36 | 1e86ee97 | Marc Schmitt | ``power off``, ``power cycle``, ``power status`` and ``health``. |
37 | 1e86ee97 | Marc Schmitt | |
38 | 1e86ee97 | Marc Schmitt | .. note:: |
39 | 1e86ee97 | Marc Schmitt | The new **Node State (powered)** is a **State of Record |
40 | 1e86ee97 | Marc Schmitt | (SoR)**, not a **State of World (SoW)**. The maximum execution time of the |
41 | 1e86ee97 | Marc Schmitt | **External Helper Command** will be limited to 60s to prevent the cluster from |
42 | 1e86ee97 | Marc Schmitt | getting locked for an undefined amount of time. |
43 | 1e86ee97 | Marc Schmitt | |
44 | 1e86ee97 | Marc Schmitt | Detailed Design |
45 | 1e86ee97 | Marc Schmitt | --------------- |
46 | 1e86ee97 | Marc Schmitt | |
47 | 1e86ee97 | Marc Schmitt | New ``gnt-cluster`` Parameter |
48 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++++++ |
49 | 1e86ee97 | Marc Schmitt | |
50 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-cluster`` |
51 | 1e86ee97 | Marc Schmitt | | Command: ``modify|init`` |
52 | 1e86ee97 | Marc Schmitt | | Parameters: ``--oob-program`` |
53 | 1e86ee97 | Marc Schmitt | | Options: ``--oob-program``: executable OOB program (absolute path) |
54 | 1e86ee97 | Marc Schmitt | |
55 | effb49b4 | René Nussbaumer | New ``gnt-cluster epo`` Command |
56 | effb49b4 | René Nussbaumer | +++++++++++++++++++++++++++++++ |
57 | effb49b4 | René Nussbaumer | |
58 | effb49b4 | René Nussbaumer | | Program: ``gnt-cluster`` |
59 | effb49b4 | René Nussbaumer | | Command: ``epo`` |
60 | effb49b4 | René Nussbaumer | | Parameter: ``--on`` ``--force`` ``--groups`` ``--all`` |
61 | effb49b4 | René Nussbaumer | | Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the |
62 | effb49b4 | René Nussbaumer | | cluster back online |
63 | effb49b4 | René Nussbaumer | | ``--force``: To force the operation without asking for confirmation |
64 | effb49b4 | René Nussbaumer | | ``--groups``: To operate on groups instead of nodes |
65 | effb49b4 | René Nussbaumer | | ``--all``: To operate on the whole cluster |
66 | effb49b4 | René Nussbaumer | |
67 | effb49b4 | René Nussbaumer | This is a convenience command to allow easy emergency power off of a whole |
68 | effb49b4 | René Nussbaumer | cluster or part of it. It takes care of all steps needed to get the cluster into |
69 | effb49b4 | René Nussbaumer | a sane state to turn off the nodes. |
70 | effb49b4 | René Nussbaumer | |
71 | effb49b4 | René Nussbaumer | With ``--on`` it does the reverse and tries to bring the rest of the cluster back |
72 | effb49b4 | René Nussbaumer | to life. |
73 | effb49b4 | René Nussbaumer | |
74 | effb49b4 | René Nussbaumer | .. note:: |
75 | effb49b4 | René Nussbaumer | The master node cannot shut itself down cleanly. Therefore, this |
76 | effb49b4 | René Nussbaumer | command will not do all the work on single-node clusters. On multi-node |
77 | effb49b4 | René Nussbaumer | clusters the command tries to find another master or, if that is not possible, |
78 | effb49b4 | René Nussbaumer | prepares everything to the point where the user only has to shut down the |
79 | effb49b4 | René Nussbaumer | master node manually. The same applies to the single-node cluster configuration. |
80 | effb49b4 | René Nussbaumer | |
81 | 1e86ee97 | Marc Schmitt | New ``gnt-node`` Property |
82 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++ |
83 | 1e86ee97 | Marc Schmitt | |
84 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
85 | 1e86ee97 | Marc Schmitt | | Command: ``modify|add`` |
86 | 1e86ee97 | Marc Schmitt | | Parameters: ``--oob-program`` |
87 | 1e86ee97 | Marc Schmitt | | Options: ``--oob-program``: executable OOB program (absolute path) |
88 | 1e86ee97 | Marc Schmitt | |
89 | 1e86ee97 | Marc Schmitt | .. note:: |
90 | 1e86ee97 | Marc Schmitt | If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities. |
91 | 1e86ee97 | Marc Schmitt | Otherwise, the node inherits the value of its node group or, failing that, the |
92 | 1e86ee97 | Marc Schmitt | cluster-wide value. That is, nodes have to opt out of OOB capabilities. |
93 | 1e86ee97 | Marc Schmitt | |
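The inheritance rule in the note above can be sketched as follows. This is a minimal illustration, not Ganeti code: the function name ``resolve_oob_program`` and the use of ``None`` for "not set" are assumptions, and treating ``!`` at the group or cluster level as an opt-out is also an assumption, since the design only describes it for nodes.

```python
from typing import Optional

OPT_OUT = "!"  # marker from the design: the node has no OOB capabilities


def resolve_oob_program(node_value: Optional[str],
                        group_value: Optional[str],
                        cluster_value: Optional[str]) -> Optional[str]:
    """Return the effective OOB program path for a node, or None.

    The first value that is set wins, in the order node, node group,
    cluster. A winning value of "!" means the node is opted out.
    """
    for candidate in (node_value, group_value, cluster_value):
        if candidate:
            # Assumption: "!" is honoured at whichever level it is found.
            return None if candidate == OPT_OUT else candidate
    return None  # nothing configured anywhere: no OOB support
```

With this sketch, a node that sets nothing and belongs to a group that sets nothing simply ends up with the cluster-wide program.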
94 | 1e86ee97 | Marc Schmitt | Addition to ``gnt-cluster verify`` |
95 | 1e86ee97 | Marc Schmitt | ++++++++++++++++++++++++++++++++++ |
96 | 1e86ee97 | Marc Schmitt | |
97 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-cluster`` |
98 | 1e86ee97 | Marc Schmitt | | Command: ``verify`` |
99 | 1e86ee97 | Marc Schmitt | | Parameter: None |
100 | 1e86ee97 | Marc Schmitt | | Option: None |
101 | 1e86ee97 | Marc Schmitt | | Additional Checks: |
102 | 1e86ee97 | Marc Schmitt | |
103 | 1e86ee97 | Marc Schmitt | 1. existence and executable flag of the OOB program on all Master Candidates if |
104 | 1e86ee97 | Marc Schmitt | the cluster parameter ``--oob-program`` is set or at least one node has |
105 | 1e86ee97 | Marc Schmitt | the property ``--oob-program`` set. The OOB helper is only invoked on the |
106 | 1e86ee97 | Marc Schmitt | master. |
107 | 1e86ee97 | Marc Schmitt | 2. check whether the node state powered matches the actual power state of the machine for |
108 | 1e86ee97 | Marc Schmitt | those nodes where ``--oob-program`` is set |
109 | 1e86ee97 | Marc Schmitt | |
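The first additional check could look roughly like the sketch below. It is an illustration only: the function name ``check_oob_program`` and the wording of the returned messages are hypothetical and not taken from Ganeti.

```python
import os


def check_oob_program(path):
    """Return a list of problems found with the OOB program at ``path``.

    An empty list means the program exists, is a regular file and has
    an executable flag that applies to the current user.
    """
    problems = []
    if not os.path.isabs(path):
        # The design requires an absolute path for --oob-program.
        problems.append("OOB program path is not absolute: %s" % path)
    elif not os.path.isfile(path):
        problems.append("OOB program not found: %s" % path)
    elif not os.access(path, os.X_OK):
        problems.append("OOB program is not executable: %s" % path)
    return problems
```

Running such a check on every Master Candidate matters because any of them may become the master, which is where the helper is invoked.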
110 | 1e86ee97 | Marc Schmitt | New Node State |
111 | 1e86ee97 | Marc Schmitt | ++++++++++++++ |
112 | 1e86ee97 | Marc Schmitt | |
113 | 1e86ee97 | Marc Schmitt | Ganeti supports the following two boolean states related to the nodes: |
114 | 1e86ee97 | Marc Schmitt | |
115 | 1e86ee97 | Marc Schmitt | **drained** |
116 | 1e86ee97 | Marc Schmitt | The cluster still communicates with drained nodes but excludes them from |
117 | 1e86ee97 | Marc Schmitt | allocation operations |
118 | 1e86ee97 | Marc Schmitt | |
119 | 1e86ee97 | Marc Schmitt | **offline** |
120 | 1e86ee97 | Marc Schmitt | if offline, the cluster does not communicate with offline nodes; useful for |
121 | 1e86ee97 | Marc Schmitt | nodes that are not reachable in order to avoid delays |
122 | 1e86ee97 | Marc Schmitt | |
123 | 1e86ee97 | Marc Schmitt | This design extends the list with the following boolean state: |
124 | 1e86ee97 | Marc Schmitt | |
125 | 1e86ee97 | Marc Schmitt | **powered** |
126 | 1e86ee97 | Marc Schmitt | if not powered, the cluster does not communicate with unpowered nodes; if |
127 | 1e86ee97 | Marc Schmitt | the node property ``--oob-program`` is not set, the state powered is not |
128 | 1e86ee97 | Marc Schmitt | displayed |
129 | 1e86ee97 | Marc Schmitt | |
130 | 1e86ee97 | Marc Schmitt | Additionally modify the meaning of the offline state as follows: |
131 | 1e86ee97 | Marc Schmitt | |
132 | 1e86ee97 | Marc Schmitt | **offline** |
133 | 1e86ee97 | Marc Schmitt | if offline, the cluster does not communicate with offline nodes (**with the |
134 | 1e86ee97 | Marc Schmitt | exception of OOB commands for nodes where** ``--oob-program`` **is set**); |
135 | 1e86ee97 | Marc Schmitt | useful for nodes that are not reachable in order to avoid delays |
136 | 1e86ee97 | Marc Schmitt | |
137 | 1e86ee97 | Marc Schmitt | The corresponding command extensions are: |
138 | 1e86ee97 | Marc Schmitt | |
139 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
140 | 1e86ee97 | Marc Schmitt | | Command: ``info`` |
141 | 1e86ee97 | Marc Schmitt | | Parameter: [ ``nodename`` ... ] |
142 | 1e86ee97 | Marc Schmitt | | Option: None |
143 | 1e86ee97 | Marc Schmitt | |
144 | 1e86ee97 | Marc Schmitt | Additional Output (SoR, omitted if node property ``--oob-program`` is not set): |
145 | 1e86ee97 | Marc Schmitt | powered: ``[True|False]`` |
146 | 1e86ee97 | Marc Schmitt | |
147 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
148 | 1e86ee97 | Marc Schmitt | | Command: ``modify`` |
149 | 1e86ee97 | Marc Schmitt | | Parameter: nodename |
150 | 1e86ee97 | Marc Schmitt | | Option: [ ``--powered=yes|no`` ] |
151 | 1e86ee97 | Marc Schmitt | | Reasoning: sometimes you will need to sync the SoR with the SoW manually |
152 | 1e86ee97 | Marc Schmitt | | Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for |
153 | 1e86ee97 | Marc Schmitt | | the node in question |
154 | 1e86ee97 | Marc Schmitt | |
155 | 1e86ee97 | Marc Schmitt | New ``gnt-node`` commands: ``power [on|off|cycle|status]`` |
156 | 1e86ee97 | Marc Schmitt | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
157 | 1e86ee97 | Marc Schmitt | |
158 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
159 | 1e86ee97 | Marc Schmitt | | Command: ``power [on|off|cycle|status]`` |
160 | 1e86ee97 | Marc Schmitt | | Parameters: [ ``nodename`` ... ] |
161 | 1e86ee97 | Marc Schmitt | | Options: None |
162 | 1e86ee97 | Marc Schmitt | | Caveats: |
163 | 1e86ee97 | Marc Schmitt | |
164 | 1e86ee97 | Marc Schmitt | * If no nodenames are passed to ``power [on|off|cycle]``, the user will be |
165 | 1e86ee97 | Marc Schmitt | prompted with ``"Do you really want to power [on|off|cycle] the following |
166 | 1e86ee97 | Marc Schmitt | nodes: <list of OOB capable nodes in the cluster>? (y/n)"`` |
167 | 1e86ee97 | Marc Schmitt | * For ``power-status``, nodename is optional; if omitted, we list the |
168 | 1e86ee97 | Marc Schmitt | power-status of all OOB capable nodes in the cluster (SoW) |
169 | 1e86ee97 | Marc Schmitt | * The user is warned and must confirm with yes when trying to |
170 | 1e86ee97 | Marc Schmitt | ``power [off|cycle]`` a node with running instances. |
171 | 1e86ee97 | Marc Schmitt | |
172 | 1e86ee97 | Marc Schmitt | Error Handling |
173 | 1e86ee97 | Marc Schmitt | ^^^^^^^^^^^^^^ |
174 | 1e86ee97 | Marc Schmitt | |
175 | 1e86ee97 | Marc Schmitt | +------------------------------+-----------------------------------------------+ |
176 | 1e86ee97 | Marc Schmitt | | Exception | Error Message | |
177 | 1e86ee97 | Marc Schmitt | +==============================+===============================================+ |
178 | 1e86ee97 | Marc Schmitt | | OOB program return code != 0 | OOB program execution failed ($ERROR_MSG) | |
179 | 1e86ee97 | Marc Schmitt | +------------------------------+-----------------------------------------------+ |
180 | 1e86ee97 | Marc Schmitt | | OOB program execution time | OOB program execution timeout exceeded, OOB | |
181 | 1e86ee97 | Marc Schmitt | | exceeds 60s | program execution aborted | |
182 | 1e86ee97 | Marc Schmitt | +------------------------------+-----------------------------------------------+ |
183 | 1e86ee97 | Marc Schmitt | |
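The two error cases in the table above map naturally onto a wrapper around the helper invocation. The following is a minimal sketch using Python's standard ``subprocess`` module; the names ``run_oob`` and ``OobError`` are hypothetical, and the real Ganeti implementation may differ.

```python
import subprocess

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Raised when the OOB helper fails or exceeds its time limit."""


def run_oob(program, command, node, timeout=OOB_TIMEOUT):
    """Run ``program command node`` and return its stdout as text."""
    try:
        result = subprocess.run([program, command, node],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted")
    if result.returncode != 0:
        # The helper reports its error message on stderr.
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    return result.stdout
```

A caller would use it as ``run_oob("/usr/bin/oob", "power-status", "node1.example.com")`` and leave the node state untouched whenever ``OobError`` is raised.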
184 | 1e86ee97 | Marc Schmitt | Node State Changes |
185 | 1e86ee97 | Marc Schmitt | ^^^^^^^^^^^^^^^^^^ |
186 | 1e86ee97 | Marc Schmitt | |
187 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
188 | 1e86ee97 | Marc Schmitt | | State before | Command | State after | Comment | |
189 | 1e86ee97 | Marc Schmitt | | execution | | execution | | |
190 | 1e86ee97 | Marc Schmitt | +================+=================+================+==========================+ |
191 | 1e86ee97 | Marc Schmitt | | powered: False | ``power off`` | powered: False | FYI: IPMI will complain | |
192 | 1e86ee97 | Marc Schmitt | | | | | if you try to power off | |
193 | 1e86ee97 | Marc Schmitt | | | | | a machine that is already| |
194 | 1e86ee97 | Marc Schmitt | | | | | powered off | |
195 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
196 | 1e86ee97 | Marc Schmitt | | powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain | |
197 | 1e86ee97 | Marc Schmitt | | | | | if you try to cycle a | |
198 | 1e86ee97 | Marc Schmitt | | | | | machine that is already | |
199 | 1e86ee97 | Marc Schmitt | | | | | powered off | |
200 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
201 | 1e86ee97 | Marc Schmitt | | powered: False | ``power on`` | powered: True | | |
202 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
203 | 1e86ee97 | Marc Schmitt | | powered: True | ``power off`` | powered: False | | |
204 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
205 | 1e86ee97 | Marc Schmitt | | powered: True | ``power cycle`` | powered: True | | |
206 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
207 | 1e86ee97 | Marc Schmitt | | powered: True | ``power on`` | powered: True | FYI: IPMI will complain | |
208 | 1e86ee97 | Marc Schmitt | | | | | if you try to power on | |
209 | 1e86ee97 | Marc Schmitt | | | | | a machine that is already| |
210 | 1e86ee97 | Marc Schmitt | | | | | powered on | |
211 | 1e86ee97 | Marc Schmitt | +----------------+-----------------+----------------+--------------------------+ |
212 | 1e86ee97 | Marc Schmitt | |
213 | 1e86ee97 | Marc Schmitt | .. note:: |
214 | 1e86ee97 | Marc Schmitt | |
215 | 1e86ee97 | Marc Schmitt | * If the command fails, the Node State remains unchanged. |
216 | 1e86ee97 | Marc Schmitt | * We will not prevent the user from trying to power off a node that is |
217 | 1e86ee97 | Marc Schmitt | already powered off since the powered state represents the **SoR** only and |
218 | 1e86ee97 | Marc Schmitt | not the **SoW**. This can however create problems when the cluster |
219 | 1e86ee97 | Marc Schmitt | administrator wants to bring the **SoR** in sync with the **SoW** without |
220 | 1e86ee97 | Marc Schmitt | actually having to mess with the node(s). For this case, we allow direct |
221 | 1e86ee97 | Marc Schmitt | modification of the powered state through the |
222 | 1e86ee97 | Marc Schmitt | ``gnt-node modify --powered=[yes|no]`` command as long as the node has OOB |
223 | 1e86ee97 | Marc Schmitt | capabilities (i.e. ``--oob-program`` is set). |
224 | 1e86ee97 | Marc Schmitt | * All node power state changes will be logged. |
225 | 1e86ee97 | Marc Schmitt | |
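The state table and the first note above reduce to a small transition function on the State of Record. This is a sketch for illustration; the function name ``next_powered_state`` and the ``succeeded`` flag are assumptions rather than part of the design.

```python
def next_powered_state(powered: bool, command: str, succeeded: bool) -> bool:
    """Return the recorded ``powered`` state after a power command.

    ``command`` is one of "on", "off" or "cycle". If the helper failed,
    the State of Record stays unchanged.
    """
    if command not in ("on", "off", "cycle"):
        raise ValueError("unsupported power command: %r" % command)
    if not succeeded:
        return powered
    if command == "on":
        return True
    if command == "off":
        return False
    # "cycle" leaves the recorded state as it was in both table rows.
    return powered
```

Note that the function never consults the State of World; syncing the two remains the job of ``power status`` or a manual ``--powered`` change.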
226 | 1e86ee97 | Marc Schmitt | Node Power Status Listing (SoW) |
227 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++++++++ |
228 | 1e86ee97 | Marc Schmitt | |
229 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
230 | 1e86ee97 | Marc Schmitt | | Command: ``power-status`` |
231 | 1e86ee97 | Marc Schmitt | | Parameters: [ ``nodename`` ... ] |
232 | 1e86ee97 | Marc Schmitt | |
233 | 1e86ee97 | Marc Schmitt | Example output (represents **SoW**):: |
234 | 1e86ee97 | Marc Schmitt | |
235 | 1e86ee97 | Marc Schmitt | gnt-node oob power-status |
236 | 1e86ee97 | Marc Schmitt | Node Power Status |
237 | 1e86ee97 | Marc Schmitt | node1.example.com on |
238 | 1e86ee97 | Marc Schmitt | node2.example.com off |
239 | 1e86ee97 | Marc Schmitt | node3.example.com on |
240 | 1e86ee97 | Marc Schmitt | node4.example.com unknown |
241 | 1e86ee97 | Marc Schmitt | |
242 | 1e86ee97 | Marc Schmitt | .. note:: |
243 | 1e86ee97 | Marc Schmitt | |
244 | 1e86ee97 | Marc Schmitt | * We use ``unknown`` in case the Helper Program could not determine the power |
245 | 1e86ee97 | Marc Schmitt | state. |
246 | 1e86ee97 | Marc Schmitt | * If no nodenames are provided, we will list the power state of all nodes |
247 | 1e86ee97 | Marc Schmitt | which are not opted out from OOB management. |
248 | 1e86ee97 | Marc Schmitt | * Only nodes which are not opted out from OOB management will be listed. |
249 | 1e86ee97 | Marc Schmitt | Invoking the command on a node that does not meet this condition will |
250 | 1e86ee97 | Marc Schmitt | result in an error message "Node X does not support OOB commands". |
251 | 1e86ee97 | Marc Schmitt | |
252 | 1e86ee97 | Marc Schmitt | Node Power Status Listing (SoR) |
253 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++++++++ |
254 | 1e86ee97 | Marc Schmitt | |
255 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
256 | 1e86ee97 | Marc Schmitt | | Command: ``info`` |
257 | 1e86ee97 | Marc Schmitt | | Parameter: [ ``nodename`` ... ] |
258 | 1e86ee97 | Marc Schmitt | | Option: None |
259 | 1e86ee97 | Marc Schmitt | |
260 | 1e86ee97 | Marc Schmitt | Example output (represents **SoR**):: |
261 | 1e86ee97 | Marc Schmitt | |
262 | 1e86ee97 | Marc Schmitt | gnt-node info node1.example.com |
263 | 1e86ee97 | Marc Schmitt | Node name: node1.example.com |
264 | 1e86ee97 | Marc Schmitt | primary ip: 192.168.1.1 |
265 | 1e86ee97 | Marc Schmitt | secondary ip: 192.168.2.1 |
266 | 1e86ee97 | Marc Schmitt | master candidate: True |
267 | 1e86ee97 | Marc Schmitt | drained: False |
268 | 1e86ee97 | Marc Schmitt | offline: False |
269 | 1e86ee97 | Marc Schmitt | powered: True |
270 | 1e86ee97 | Marc Schmitt | primary for instances: |
271 | 1e86ee97 | Marc Schmitt | - inst1.example.com |
272 | 1e86ee97 | Marc Schmitt | - inst2.example.com |
273 | 1e86ee97 | Marc Schmitt | - inst3.example.com |
274 | 1e86ee97 | Marc Schmitt | secondary for instances: |
275 | 1e86ee97 | Marc Schmitt | - inst4.example.com |
276 | 1e86ee97 | Marc Schmitt | - inst5.example.com |
277 | 1e86ee97 | Marc Schmitt | - inst6.example.com |
278 | 1e86ee97 | Marc Schmitt | - inst7.example.com |
279 | 1e86ee97 | Marc Schmitt | |
280 | 1e86ee97 | Marc Schmitt | .. note:: |
281 | 1e86ee97 | Marc Schmitt | Only nodes which are not opted out from OOB management will |
282 | 1e86ee97 | Marc Schmitt | report the powered state. |
283 | 1e86ee97 | Marc Schmitt | |
284 | 1e86ee97 | Marc Schmitt | New ``gnt-node`` oob subcommand: ``health`` |
285 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++++++++++++++++++++ |
286 | 1e86ee97 | Marc Schmitt | |
287 | 1e86ee97 | Marc Schmitt | | Program: ``gnt-node`` |
288 | 1e86ee97 | Marc Schmitt | | Command: ``health`` |
289 | 1e86ee97 | Marc Schmitt | | Parameters: [ ``nodename`` ... ] |
290 | 1e86ee97 | Marc Schmitt | | Options: None |
291 | 1e86ee97 | Marc Schmitt | | Example: ``/usr/bin/oob health node5.example.com`` |
292 | 1e86ee97 | Marc Schmitt | |
293 | 1e86ee97 | Marc Schmitt | Caveats: |
294 | 1e86ee97 | Marc Schmitt | |
295 | 1e86ee97 | Marc Schmitt | * If no nodename(s) are provided, we will report the health of all nodes in |
296 | 1e86ee97 | Marc Schmitt | the cluster which have ``--oob-program`` set. |
297 | 1e86ee97 | Marc Schmitt | * Only nodes which are not opted out from OOB management will report their |
298 | 1e86ee97 | Marc Schmitt | health. Invoking the command on a node that does not meet this condition |
299 | 1e86ee97 | Marc Schmitt | will result in an error message "Node does not support OOB commands". |
300 | 1e86ee97 | Marc Schmitt | |
301 | 1e86ee97 | Marc Schmitt | For error handling see `Error Handling`_ |
302 | 1e86ee97 | Marc Schmitt | |
303 | 1e86ee97 | Marc Schmitt | OOB Program (Helper Program) Parameters, Return Codes and Data Format |
304 | 1e86ee97 | Marc Schmitt | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
305 | 1e86ee97 | Marc Schmitt | |
306 | 1e86ee97 | Marc Schmitt | | Program: executable OOB program (absolute path) |
307 | 1e86ee97 | Marc Schmitt | | Parameters: command nodename |
308 | 1e86ee97 | Marc Schmitt | | Command: [power-{on|off|cycle|status}|health] |
309 | 1e86ee97 | Marc Schmitt | | Options: None |
310 | 1e86ee97 | Marc Schmitt | | Example: ``/usr/bin/oob power-on node1.example.com`` |
311 | 1e86ee97 | Marc Schmitt | | Caveat: maximum runtime is limited to 60s |
312 | 1e86ee97 | Marc Schmitt | |
313 | 1e86ee97 | Marc Schmitt | Return Codes |
314 | 1e86ee97 | Marc Schmitt | ^^^^^^^^^^^^ |
315 | 1e86ee97 | Marc Schmitt | |
316 | 1e86ee97 | Marc Schmitt | +---------------+--------------------------+ |
317 | 1e86ee97 | Marc Schmitt | | Return code | Meaning | |
318 | 1e86ee97 | Marc Schmitt | +===============+==========================+ |
319 | 1e86ee97 | Marc Schmitt | | 0 | Command succeeded | |
320 | 1e86ee97 | Marc Schmitt | +---------------+--------------------------+ |
321 | 1e86ee97 | Marc Schmitt | | 1 | Command failed | |
322 | 1e86ee97 | Marc Schmitt | +---------------+--------------------------+ |
323 | 1e86ee97 | Marc Schmitt | | others | Unsupported/undefined | |
324 | 1e86ee97 | Marc Schmitt | +---------------+--------------------------+ |
325 | 1e86ee97 | Marc Schmitt | |
326 | 1e86ee97 | Marc Schmitt | Error messages are passed from the helper program to Ganeti through StdErr |
327 | 1e86ee97 | Marc Schmitt | (return code == 1). On StdOut, the helper program will send data back to |
328 | 1e86ee97 | Marc Schmitt | Ganeti (return code == 0). The format of the data is JSON. |
329 | 1e86ee97 | Marc Schmitt | |
330 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
331 | 1e86ee97 | Marc Schmitt | | Command | Expected output | |
332 | 1e86ee97 | Marc Schmitt | +==================+===============================+ |
333 | 1e86ee97 | Marc Schmitt | | ``power-on`` | None | |
334 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
335 | 1e86ee97 | Marc Schmitt | | ``power-off`` | None | |
336 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
337 | 1e86ee97 | Marc Schmitt | | ``power-cycle`` | None | |
338 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
339 | 1e86ee97 | Marc Schmitt | | ``power-status`` | ``{ "powered": true|false }`` | |
340 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
341 | 1e86ee97 | Marc Schmitt | | ``health`` | :: | |
342 | 1e86ee97 | Marc Schmitt | | | | |
343 | 1e86ee97 | Marc Schmitt | | | [[item, status], | |
344 | 1e86ee97 | Marc Schmitt | | | [item, status], | |
345 | 1e86ee97 | Marc Schmitt | | | ...] | |
346 | 1e86ee97 | Marc Schmitt | +------------------+-------------------------------+ |
347 | 1e86ee97 | Marc Schmitt | |
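To make the contract above concrete, here is a skeleton of a helper program. It is a sketch under stated assumptions: the function names ``handle`` and ``main`` are hypothetical, and the returned power state and health item are hard-coded placeholders where a real helper would query IPMI or similar hardware.

```python
import json
import sys

COMMANDS = ("power-on", "power-off", "power-cycle", "power-status", "health")


def handle(command, node):
    """Return ``(exit_code, stdout_text, stderr_text)`` for one request."""
    if command not in COMMANDS:
        return 1, "", "unknown command: %s" % command
    if command == "power-status":
        # Placeholder value; a real helper would ask the hardware.
        return 0, json.dumps({"powered": True}), ""
    if command == "health":
        # Placeholder item; the item list is up to the helper author.
        return 0, json.dumps([["Ambient Temp", "OK"]]), ""
    # power-on, power-off and power-cycle produce no output on success.
    return 0, "", ""


def main(argv):
    """Entry point taking ``[program, command, nodename]``."""
    if len(argv) != 3:
        sys.stderr.write("usage: %s command nodename\n" % argv[0])
        return 1
    code, out, err = handle(argv[1], argv[2])
    if out:
        sys.stdout.write(out + "\n")
    if err:
        sys.stderr.write(err + "\n")
    return code
```

An installed script would end with ``sys.exit(main(sys.argv))`` so that the return value becomes the process return code.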
348 | 1e86ee97 | Marc Schmitt | Data Format |
349 | 1e86ee97 | Marc Schmitt | ^^^^^^^^^^^ |
350 | 1e86ee97 | Marc Schmitt | |
351 | 1e86ee97 | Marc Schmitt | For the health output, the fields are: |
352 | 1e86ee97 | Marc Schmitt | |
353 | 1e86ee97 | Marc Schmitt | +--------+--------------------------------------------------------------------+ |
354 | 1e86ee97 | Marc Schmitt | | Field | Meaning | |
355 | 1e86ee97 | Marc Schmitt | +========+====================================================================+ |
356 | 1e86ee97 | Marc Schmitt | | item | String identifier of the item we are querying the health of, | |
357 | 1e86ee97 | Marc Schmitt | | | examples: | |
358 | 1e86ee97 | Marc Schmitt | | | | |
359 | 1e86ee97 | Marc Schmitt | | | * Ambient Temp | |
360 | 1e86ee97 | Marc Schmitt | | | * PS Redundancy | |
361 | 1e86ee97 | Marc Schmitt | | | * FAN 1 RPM | |
362 | 1e86ee97 | Marc Schmitt | +--------+--------------------------------------------------------------------+ |
363 | 1e86ee97 | Marc Schmitt | | status | String; Can take one of the following four values: | |
364 | 1e86ee97 | Marc Schmitt | | | | |
365 | 1e86ee97 | Marc Schmitt | | | * OK | |
366 | 1e86ee97 | Marc Schmitt | | | * WARNING | |
367 | 1e86ee97 | Marc Schmitt | | | * CRITICAL | |
368 | 1e86ee97 | Marc Schmitt | | | * UNKNOWN | |
369 | 1e86ee97 | Marc Schmitt | +--------+--------------------------------------------------------------------+ |
370 | 1e86ee97 | Marc Schmitt | |
371 | 1e86ee97 | Marc Schmitt | .. note:: |
372 | 1e86ee97 | Marc Schmitt | |
373 | 1e86ee97 | Marc Schmitt | * The item output list is defined by the Helper Program. It is up to the |
374 | 1e86ee97 | Marc Schmitt | author of the Helper Program to decide which items should be monitored and |
375 | 1e86ee97 | Marc Schmitt | what each corresponding return status is. |
376 | 1e86ee97 | Marc Schmitt | * Ganeti will currently not take any actions based on the item status. It |
377 | 1e86ee97 | Marc Schmitt | will however create log entries for items with status WARNING or CRITICAL |
378 | 1e86ee97 | Marc Schmitt | for each run of the ``gnt-node oob health nodename`` command. Automatic |
379 | 1e86ee97 | Marc Schmitt | actions (regular monitoring of the item status) are considered a new service |
380 | 1e86ee97 | Marc Schmitt | and will be treated in a separate design document. |
381 | 1e86ee97 | Marc Schmitt | |
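On the Ganeti side, the JSON sent by the helper has to be decoded and validated. The sketch below shows one possible way to do that and to log WARNING and CRITICAL items; the function names, the logger name ``oob`` and the log message format are assumptions made for illustration.

```python
import json
import logging

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def parse_power_status(stdout):
    """Decode ``{"powered": true|false}`` and return the boolean."""
    powered = json.loads(stdout)["powered"]
    if not isinstance(powered, bool):
        raise ValueError("powered must be a JSON boolean")
    return powered


def parse_health(stdout, node, log=None):
    """Decode ``[[item, status], ...]`` into a list of tuples.

    Items with status WARNING or CRITICAL are logged; no other action
    is taken, matching the note above.
    """
    log = log or logging.getLogger("oob")
    result = []
    for item, status in json.loads(stdout):
        if status not in VALID_STATUSES:
            raise ValueError("invalid status %r for item %r" % (status, item))
        if status in ("WARNING", "CRITICAL"):
            log.warning("node %s: %s is %s", node, item, status)
        result.append((item, status))
    return result
```

Rejecting unknown status strings early keeps a misbehaving helper from silently producing entries that nothing downstream understands.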
382 | 1e86ee97 | Marc Schmitt | Logging |
383 | 1e86ee97 | Marc Schmitt | ------- |
384 | 1e86ee97 | Marc Schmitt | |
385 | 1e86ee97 | Marc Schmitt | The ``gnt-node power-[on|off]`` (power state changes) commands will create log |
386 | 1e86ee97 | Marc Schmitt | entries following current Ganeti logging practices. In addition, health items |
387 | 1e86ee97 | Marc Schmitt | with status WARNING or CRITICAL will be logged for each run of ``gnt-node |
388 | 1e86ee97 | Marc Schmitt | health``. |