Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore has
limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no means to power a node off and back on. This
capability is very useful in the following situations:

* **Emergency Power Off**: During emergencies, time is critical and
  manual tasks just add latency, which can be avoided through
  automation. If a server room overheats, halting the OS on the nodes
  is not enough. The nodes need to be powered off cleanly to prevent
  damage to equipment.
* **Repairs**: In most cases, repairing a node means that the node has
  to be powered off.
* **Crashes**: Software bugs may crash a node. Having an OS-independent
  way to power-cycle a node helps to recover the node without human
  intervention.

Overview
--------

Ganeti will be extended with OOB capabilities by adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command**, which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

.. note::
   The new **Node State (powered)** is a **State of Record**
   (:term:`SoR`), not a **State of World** (:term:`SoW`). The maximum
   execution time of the **External Helper Command** will be limited to
   60s to prevent the cluster from getting locked for an unbounded
   amount of time.

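The 60s limit on the **External Helper Command** maps naturally onto a
subprocess timeout. The following Python snippet is a minimal sketch,
not Ganeti's actual code. The name ``run_oob_helper`` and the
convention of reporting a timeout as a return code of ``None`` are
assumptions made for illustration.

.. code-block:: python

    import subprocess

    # 60s limit from this design; the constant name is hypothetical.
    OOB_TIMEOUT = 60


    def run_oob_helper(program, command, node_name, timeout=OOB_TIMEOUT):
        """Run ``<program> <command> <node_name>`` with a hard time limit.

        Returns a (returncode, stdout, stderr) tuple. If the helper runs
        too long, subprocess.run() kills it and the return code is None.
        """
        try:
            result = subprocess.run(
                [program, command, node_name],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired as err:
            return None, "", "timeout after %ss" % err.timeout
        return result.returncode, result.stdout, result.stderr

For example, ``run_oob_helper("/usr/bin/oob", "power-status",
"node1.example.com")`` would invoke the helper as in the examples of
this document and, on success, return its JSON output.
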
Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options:
|   ``--on``: try to bring the cluster back online (default: power off)
|   ``--force``: force the operation without asking for confirmation
|   ``--groups``: operate on node groups instead of nodes
|   ``--all``: operate on the whole cluster

This is a convenience command that allows an easy emergency power off of
a whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state before turning off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

.. note::
   The master node is not able to cleanly shut itself down. Therefore,
   this command cannot do all the work on single-node clusters. On
   multi-node clusters, the command tries to find another master; if
   that is not possible, it prepares everything up to the point where
   the user has to shut down the master node manually. The latter also
   applies to single-node clusters.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
   If ``--oob-program`` is set to ``!``, the node has no OOB
   capabilities. If it is not set for the node at all, the node inherits
   the value of its node group or, if that is not set either, the
   cluster-wide value. In other words, nodes have to explicitly opt out
   of OOB capabilities.

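The opt-out and inheritance rules above can be summarized as a small
lookup. This Python sketch assumes a hypothetical representation in
which each level (node, node group, cluster) holds either ``None`` (not
set), ``"!"`` or an absolute path. Treating ``"!"`` as an opt-out at the
group or cluster level as well is an extension this design does not
state.

.. code-block:: python

    from typing import Optional

    # Marker used in this design to opt a node out of OOB management.
    OPT_OUT = "!"


    def effective_oob_program(node_value: Optional[str],
                              group_value: Optional[str],
                              cluster_value: Optional[str]) -> Optional[str]:
        """Return the OOB program that applies to a node, or None."""
        # The most specific level with a value wins: node, group, cluster.
        for value in (node_value, group_value, cluster_value):
            if value:
                # "!" disables OOB; honouring it above node level is assumed.
                return None if value == OPT_OUT else value
        return None

With this rule, a node without its own value on a cluster configured
with ``/usr/bin/oob`` gets ``/usr/bin/oob``, whereas a node whose own
value is ``!`` gets ``None``.
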
Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

1. Existence and execute permission of the OOB program on all master
   candidates, if the cluster parameter ``--oob-program`` is set or at
   least one node has the property ``--oob-program`` set. The OOB
   helper is only invoked on the master, so checking all master
   candidates ensures it is still available after a master failover.
2. Whether the node state ``powered`` (:term:`SoR`) matches the actual
   power state of the machine (:term:`SoW`) for those nodes where
   ``--oob-program`` is set.

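The first check reduces to a few file system tests. The Python sketch
below shows them for a single path. It would have to run on each master
candidate, and how Ganeti collects the results from those nodes is
outside its scope. The function name and the messages are hypothetical.

.. code-block:: python

    import os


    def check_oob_program(path):
        """Return a list of problems with an OOB program path; empty is OK."""
        problems = []
        if not os.path.isabs(path):
            problems.append("OOB program %r is not an absolute path" % path)
        elif not os.path.isfile(path):
            problems.append("OOB program %r does not exist" % path)
        elif not os.access(path, os.X_OK):
            # The file exists but lacks the execute permission.
            problems.append("OOB program %r is not executable" % path)
        return problems
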
New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations.

**offline**
  The cluster does not communicate with offline nodes; useful for nodes
  that are not reachable, in order to avoid delays.

This design extends the list with the following boolean state:

**powered**
  If not powered, the cluster does not communicate with the node. If the
  node property ``--oob-program`` is not set, the ``powered`` state is
  not displayed.

Additionally, the meaning of the offline state is modified as follows:

**offline**
  The cluster does not communicate with offline nodes (**with the
  exception of OOB commands for nodes where** ``--oob-program`` **is
  set**); useful for nodes that are not reachable, in order to avoid
  delays.

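Taken together, the **offline** and **powered** states decide whether
the cluster may contact a node. **drained** only affects allocation, so
it plays no role here. The following Python sketch illustrates the
rule. The ``NodeState`` class is hypothetical. The sketch also assumes
that OOB commands are allowed for nodes that are not powered, since
otherwise ``power on`` could never be issued. The design implies this
but does not state it.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class NodeState:
        """Hypothetical subset of a node's flags relevant to communication."""
        offline: bool = False
        powered: bool = True
        oob_capable: bool = False  # an OOB program applies to this node


    def may_contact(node: NodeState, oob_command: bool) -> bool:
        """Return True if the cluster may talk to ``node`` for this operation."""
        if oob_command:
            # OOB commands bypass offline/powered, but only for nodes that
            # are not opted out of OOB management.
            return node.oob_capable
        # Regular operations skip offline and unpowered nodes.
        return not node.offline and node.powered
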
The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Additional output (:term:`SoR`, omitted if the node property
``--oob-program`` is not set):

  powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: ``nodename``
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
  the node in question

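The caveat on ``gnt-node modify --powered`` amounts to a guard before
the State of Record is updated. In this Python sketch, nodes are plain
dictionaries with the hypothetical keys ``name``, ``oob_program`` (the
effective value after inheritance) and ``powered``.

.. code-block:: python

    def modify_powered(node, value):
        """Apply ``--powered=yes|no`` to the node's State of Record."""
        if value not in ("yes", "no"):
            raise ValueError("--powered must be 'yes' or 'no', got %r" % value)
        if not node.get("oob_program"):
            raise ValueError("Node %s does not support OOB commands"
                             % node["name"])
        # Only the recorded state changes; no OOB command is sent.
        node["powered"] = value == "yes"
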
New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

* If no node names are passed to ``power [on|off|cycle]``, the user
  will be prompted with ``"Do you really want to power [on|off|cycle]
  the following nodes: <display list of OOB capable nodes in the
  cluster>? (y/n)"``
* For ``power status``, the node name is optional; if it is omitted,
  the power status of all OOB-capable nodes in the cluster is listed
  (:term:`SoW`).
* The user is warned and has to confirm with "yes" when trying to
  ``power [off|cycle]`` a node with running instances.

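The caveats can be read as a planning step that runs before any helper
is invoked. The following Python sketch is one possible reading, not
Ganeti's implementation. The function name, its arguments and the
warning text are hypothetical. It returns the target nodes, whether
confirmation is required, and the warnings to show. It also rejects
nodes that are opted out of OOB management, using the error message this
design specifies for such nodes.

.. code-block:: python

    def plan_power_command(action, requested, oob_nodes, running_instances):
        """Decide targets and confirmation for ``gnt-node power <action>``.

        action: "on", "off", "cycle" or "status"
        requested: node names given on the command line (may be empty)
        oob_nodes: set of nodes not opted out of OOB management
        running_instances: dict mapping node name -> running instances
        """
        if action not in ("on", "off", "cycle", "status"):
            raise ValueError("unknown power action %r" % action)
        for name in requested:
            if name not in oob_nodes:
                raise ValueError("Node %s does not support OOB commands" % name)
        # Without explicit node names, the command applies to all OOB nodes.
        targets = list(requested) if requested else sorted(oob_nodes)
        # Changing the power of all OOB nodes implicitly needs a (y/n) prompt.
        confirm = not requested and action != "status"
        warnings = []
        if action in ("off", "cycle"):
            for name in targets:
                instances = running_instances.get(name)
                if instances:
                    warnings.append("Node %s has running instances: %s"
                                    % (name, ", ".join(instances)))
            # Running instances always require an explicit confirmation.
            confirm = confirm or bool(warnings)
        return targets, confirm, warnings
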
Error Handling
^^^^^^^^^^^^^^

+------------------------------+---------------------------------------------+
| Exception                    | Error Message                               |
+==============================+=============================================+
| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)   |
+------------------------------+---------------------------------------------+
| OOB program execution time   | OOB program execution timeout exceeded, OOB |
| exceeds 60s                  | program execution aborted                   |
+------------------------------+---------------------------------------------+

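The two rows of the table translate directly into a mapping from the
helper's outcome to a user-facing message. In this Python sketch, a
timeout is represented by a return code of ``None``. That convention
and the function name are assumptions. The message texts are taken
from the table.

.. code-block:: python

    def oob_error_message(returncode, stderr):
        """Map a helper outcome to the error message from the table, or None."""
        if returncode is None:
            # Assumed convention: the caller reports a timeout as None.
            return ("OOB program execution timeout exceeded, "
                    "OOB program execution aborted")
        if returncode != 0:
            # $ERROR_MSG is whatever the helper wrote to stderr.
            return "OOB program execution failed (%s)" % stderr.strip()
        return None
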
Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+-----------------+----------------+--------------------------+
| State before   | Command         | State after    | Comment                  |
| execution      |                 | execution      |                          |
+================+=================+================+==========================+
| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to power off  |
|                |                 |                | a machine that is        |
|                |                 |                | already powered off      |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
|                |                 |                | if you try to cycle a    |
|                |                 |                | machine that is already  |
|                |                 |                | powered off              |
+----------------+-----------------+----------------+--------------------------+
| powered: False | ``power on``    | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power off``   | powered: False |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power cycle`` | powered: True  |                          |
+----------------+-----------------+----------------+--------------------------+
| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
|                |                 |                | if you try to power on   |
|                |                 |                | a machine that is        |
|                |                 |                | already powered on       |
+----------------+-----------------+----------------+--------------------------+

.. note::

   * If the command fails, the node state remains unchanged.
   * We will not prevent the user from trying to power off a node that
     is already powered off, since the powered state represents only the
     :term:`SoR` and not the :term:`SoW`. This can, however, create
     problems when the cluster administrator wants to bring the
     :term:`SoR` in sync with the :term:`SoW` without actually having to
     touch the node(s). For this case, we allow direct modification of
     the powered state through the ``gnt-node modify --powered=[yes|no]``
     command, as long as the node has OOB capabilities (i.e.
     ``--oob-program`` is set).
   * All node power state changes will be logged.

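The table and the first and last notes can be condensed into one
transition function. This Python sketch uses hypothetical names and the
standard ``logging`` module as a stand-in for Ganeti's own logging. It
illustrates the rules and does not prescribe an implementation.

.. code-block:: python

    import logging

    log = logging.getLogger("oob")


    def powered_after(node_name, action, powered_before, succeeded):
        """Return the recorded ``powered`` state after a power command."""
        if not succeeded:
            # A failed command leaves the node state unchanged.
            return powered_before
        if action == "on":
            powered = True
        elif action == "off":
            powered = False
        elif action == "cycle":
            # A power cycle ends in the state it started from.
            powered = powered_before
        else:
            raise ValueError("unknown power action %r" % action)
        if powered != powered_before:
            # All node power state changes are logged.
            log.info("Node %s powered: %s -> %s",
                     node_name, powered_before, powered)
        return powered
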
Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power status``
| Parameters: [ ``nodename`` ... ]

Example output (represents :term:`SoW`)::

  gnt-node power status
  Node               Power Status
  node1.example.com  on
  node2.example.com  off
  node3.example.com  on
  node4.example.com  unknown

.. note::

   * ``unknown`` is used if the helper program could not determine the
     power state.
   * If no node names are provided, the power state of all nodes which
     are not opted out of OOB management is listed.
   * Only nodes which are not opted out of OOB management will be
     listed. Invoking the command on a node that does not meet this
     condition will result in the error message "Node X does not support
     OOB commands".

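Producing this listing from the helper's ``power-status`` output, which
is ``{ "powered": true|false }``, involves mapping anything that cannot
be interpreted to ``unknown``. This is a minimal Python sketch. Both
function names are hypothetical, and treating a non-zero return code or
malformed JSON as ``unknown`` is an assumption.

.. code-block:: python

    import json


    def parse_power_status(returncode, stdout):
        """Return True/False from the helper output, or None if unknown."""
        if returncode != 0:
            return None
        try:
            data = json.loads(stdout)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return None
        powered = data.get("powered") if isinstance(data, dict) else None
        return powered if isinstance(powered, bool) else None


    def format_power_status(results):
        """Render {node name: True/False/None} as a two-column listing."""
        labels = {True: "on", False: "off", None: "unknown"}
        rows = [("Node", "Power Status")]
        rows += [(name, labels[state]) for name, state in sorted(results.items())]
        width = max(len(name) for name, _ in rows)
        return "\n".join("%s  %s" % (name.ljust(width), status)
                         for name, status in rows)
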
Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter: [ ``nodename`` ... ]
| Option: None

Example output (represents :term:`SoR`)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
   Only nodes which are not opted out of OOB management will report the
   powered state.

New ``gnt-node`` command: ``health``
++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

* If no node names are provided, the health of all nodes in the
  cluster which have ``--oob-program`` set is reported.
* Only nodes which are not opted out of OOB management will report
  their health. Invoking the command on a node that does not meet this
  condition will result in the error message "Node X does not support
  OOB commands".

For error handling, see `Error Handling`_.

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: ``command`` ``nodename``
| Command: ``[power-{on|off|cycle|status}|health]``
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s

Return Codes
^^^^^^^^^^^^

+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

On failure (return code 1), the helper program passes error messages to
Ganeti through :manpage:`stderr(3)`. On success (return code 0), it
sends data back to Ganeti on :manpage:`stdout(3)`. The format of the
data is JSON.

+------------------+-------------------------------+
| Command          | Expected output               |
+==================+===============================+
| ``power-on``     | None                          |
+------------------+-------------------------------+
| ``power-off``    | None                          |
+------------------+-------------------------------+
| ``power-cycle``  | None                          |
+------------------+-------------------------------+
| ``power-status`` | ``{ "powered": true|false }`` |
+------------------+-------------------------------+
| ``health``       | ::                            |
|                  |                               |
|                  |   [[item, status],            |
|                  |    [item, status],            |
|                  |    ...]                       |
+------------------+-------------------------------+

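To make the calling convention concrete, here is a skeleton of a helper
program in Python. It is a sketch under explicit assumptions. The
in-memory ``_FAKE_POWER`` table stands in for real hardware access (for
example via IPMI), refusing ``power-cycle`` on a powered-off node only
mimics the IPMI behaviour noted earlier, and all messages are
hypothetical. It follows the conventions above: JSON on stdout with
return code 0 on success, and a message on stderr with return code 1 on
failure.

.. code-block:: python

    import json
    import sys

    # Hypothetical stand-in for the node's management controller.
    _FAKE_POWER = {"node1.example.com": True}


    def main(argv, out=sys.stdout, err=sys.stderr):
        """Handle ``oob <command> <nodename>``; return the exit code."""
        if len(argv) != 3:
            err.write("usage: %s <command> <nodename>\n" % argv[0])
            return 1
        command, node = argv[1], argv[2]
        if node not in _FAKE_POWER:
            err.write("unknown node %s\n" % node)
            return 1
        if command == "power-on":
            _FAKE_POWER[node] = True
        elif command == "power-off":
            _FAKE_POWER[node] = False
        elif command == "power-cycle":
            if not _FAKE_POWER[node]:
                err.write("cannot cycle %s: node is powered off\n" % node)
                return 1
        elif command == "power-status":
            json.dump({"powered": _FAKE_POWER[node]}, out)
        elif command == "health":
            json.dump([["Ambient Temp", "OK"], ["PS Redundancy", "OK"]], out)
        else:
            err.write("unsupported command %s\n" % command)
            return 1
        return 0

A real helper script would end with ``sys.exit(main(sys.argv))`` so that
it can be invoked as, for example,
``/usr/bin/oob power-status node1.example.com``.
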
Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+--------------------------------------------+
| Field  | Meaning                                    |
+========+============================================+
| item   | String identifier of the item whose health |
|        | is queried, for example:                   |
|        |                                            |
|        | * Ambient Temp                             |
|        | * PS Redundancy                            |
|        | * FAN 1 RPM                                |
+--------+--------------------------------------------+
| status | String; one of the following four values:  |
|        |                                            |
|        | * OK                                       |
|        | * WARNING                                  |
|        | * CRITICAL                                 |
|        | * UNKNOWN                                  |
+--------+--------------------------------------------+

.. note::

   * The item output list is defined by the helper program. It is up to
     the author of the helper program to decide which items should be
     monitored and what each corresponding return status is.
   * For now, Ganeti will not take any action based on the item status.
     It will, however, create log entries for items with status WARNING
     or CRITICAL for each run of the ``gnt-node health nodename``
     command. Automatic actions (regular monitoring of the item status)
     are considered a new service and will be treated in a separate
     design document.

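Parsing the ``health`` output and selecting the entries that must be
logged could look like the following Python sketch. Rejecting unknown
status values and the log message format are assumptions. The design
only specifies the four allowed values and that WARNING and CRITICAL
items are logged. The function names are hypothetical.

.. code-block:: python

    import json
    import logging

    VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
    log = logging.getLogger("oob.health")


    def parse_health(stdout):
        """Parse the helper's JSON health output into (item, status) pairs."""
        items = []
        for entry in json.loads(stdout):
            item, status = entry  # each entry is [item, status]
            if status not in VALID_STATUSES:
                raise ValueError("invalid status %r for item %r"
                                 % (status, item))
            items.append((item, status))
        return items


    def log_health(node_name, items):
        """Log WARNING and CRITICAL items and return them."""
        flagged = [(i, s) for i, s in items if s in ("WARNING", "CRITICAL")]
        for item, status in flagged:
            log.warning("Node %s: health item %r is %s",
                        node_name, item, status)
        return flagged
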
Logging
-------

The ``gnt-node power [on|off]`` commands (power state changes) will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: