====================
Instance auto-repair
====================

.. contents:: :depth: 4

This is a design document detailing the implementation of self-repair and
recreation of instances in Ganeti. It also discusses ideas that might be
useful for further self-repair situations in the future.

Current state and shortcomings
==============================

Ganeti currently doesn't do any sort of self-repair or self-recreate of
instances:

- If a drbd instance is broken (its primary or secondary nodes go
  offline or need to be drained) an admin or an external tool must fail
  it over if necessary, and then trigger a disk replacement.
- If a plain instance is broken (or both nodes of a drbd instance are)
  an admin or an external tool must recreate its disk and reinstall it.

Moreover, in an oversubscribed cluster the operations mentioned above
might fail for lack of capacity until a node is repaired or a new one is
added. In this case an external tool would also need to go through any
"pending-recreate" or "pending-repair" instances and fix them.

Proposed changes
================

We'd like to increase the self-repair capabilities of Ganeti, at least
with regards to instances. In order to do so we plan to add mechanisms
to mark an instance as "due for being repaired" and then have the
relevant repair performed as soon as it is possible on the cluster.

The self-repair will be written as part of ganeti-watcher or as an extra
watcher component that is called less often.

As the first version we'll only handle the case in which an instance
lives on an offline or drained node. In the future we may add more
self-repair capabilities for errors Ganeti can detect.

New attributes (or tags)
------------------------

In order to know when to perform a self-repair operation we need to know
whether it is allowed by the cluster administrator.

This can be implemented as either new attributes or tags. Tags could be
acceptable as they would only be read and interpreted by the self-repair
tool (part of the watcher), and not by the Ganeti core opcodes and node
RPCs. The following tags would be needed:

ganeti:watcher:autorepair:<type>
++++++++++++++++++++++++++++++++

(instance/nodegroup/cluster)
Allow repairs to happen on an instance that has the tag, or that lives
in a cluster or nodegroup which does. The repair types are listed in
order of perceived risk, lower to higher, and each type also allows the
operations of the lower ones:

- ``fix-storage`` allows a disk replacement or another operation that
  fixes the instance backend storage without affecting the instance
  itself. This can for example recover from a broken drbd secondary, but
  risks data loss if something is wrong on the primary but the secondary
  was somehow recoverable.
- ``migrate`` allows an instance migration. This can recover from a
  drained primary, but can cause an instance crash in some cases (bugs).
- ``failover`` allows instance reboot on the secondary. This can recover
  from an offline primary, but the instance will lose its running state.
- ``reinstall`` allows disks to be recreated and an instance to be
  reinstalled. This can recover from primary and secondary both being
  offline, or from an offline primary in the case of non-redundant
  instances. It causes data loss.

Each repair type allows all the operations in the previous types, in the
order above, in order to ensure a repair can be completed fully. As
such, a repair of a lower type might not be able to proceed if it
detects an error condition that requires a more risky or drastic
solution, but never vice versa (if a worse solution is allowed then so
is a better one).

If there are multiple ``ganeti:watcher:autorepair:<type>`` tags in an
object (cluster, node group or instance), the least destructive tag
takes precedence. When multiplicity happens across objects, the nearest
tag wins. For example, if in a cluster with two instances, *I1* and
*I2*, *I1* has ``failover``, and the cluster itself has both
``fix-storage`` and ``reinstall``, *I1* will end up with ``failover``
and *I2* with ``fix-storage``.
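
The precedence rules above could be sketched as follows. This is only an
illustration, not the watcher's code: the function names
(``object_repair_type``, ``effective_repair_type``, ``allows``) are
hypothetical, and tags are assumed to be available as plain lists of
strings per object.

```python
PREFIX = "ganeti:watcher:autorepair:"
# Repair types from least to most destructive, in the order given above.
REPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]


def object_repair_type(tags):
    """Return the least destructive repair type tagged on one object, or None."""
    suffixes = [t[len(PREFIX):] for t in tags if t.startswith(PREFIX)]
    types = [s for s in suffixes if s in REPAIR_TYPES]
    if not types:
        return None
    return min(types, key=REPAIR_TYPES.index)


def effective_repair_type(instance_tags, group_tags, cluster_tags):
    """Nearest object with a type tag wins: instance, then group, then cluster."""
    for tags in (instance_tags, group_tags, cluster_tags):
        repair_type = object_repair_type(tags)
        if repair_type is not None:
            return repair_type
    return None


def allows(repair_type, operation):
    """A repair type allows its own operation and every less risky one."""
    return REPAIR_TYPES.index(operation) <= REPAIR_TYPES.index(repair_type)
```

With the example above, calling ``effective_repair_type`` for *I1* (its
own ``failover`` tag plus the two cluster tags) gives ``failover``, and
for *I2* (no instance tags) gives ``fix-storage``.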

ganeti:watcher:autorepair:suspend[:<timestamp>]
+++++++++++++++++++++++++++++++++++++++++++++++

(instance/nodegroup/cluster)
If this tag is encountered no autorepair operations will start for the
instance (or for any instance, if present at the cluster or group
level). Any job which already started will be allowed to finish, but
then the autorepair system will not proceed further until this tag is
removed, or the timestamp passes (in which case the tag will be removed
automatically by the watcher).

Note that depending on how this tag is used there might still be race
conditions related to it for an external tool that uses it
programmatically, as no "lock tag" or tag "test-and-set" operation is
present at this time. While this is known we won't solve these race
conditions in the first version.

It might also be useful to have an operation that tags all instances
matching a filter on some characteristic. But again, this wouldn't be
specific to this tag.

If there are multiple
``ganeti:watcher:autorepair:suspend[:<timestamp>]`` tags in an object,
the form without timestamp takes precedence (permanent suspension); or,
if all object tags have a timestamp, the one with the highest timestamp.
When multiplicity happens across objects, the nearest tag wins, as
above. This makes it possible to suspend cluster-enabled repairs with a
single tag in the cluster object; or to suspend them only for a certain
node group or instance. At the same time, it is possible to re-enable
cluster-suspended repairs in a particular instance or group by applying
an enable tag to them.
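
A minimal sketch of how the suspension rules might be evaluated is given
below. Several things here are assumptions rather than part of the
design: the helper names, timestamps being numeric (seconds since the
epoch), an expired suspend tag being treated as absent (the watcher
would remove it anyway), and an active suspend tag beating an enable tag
on the *same* object.

```python
import math

PREFIX = "ganeti:watcher:autorepair:"
SUSPEND = PREFIX + "suspend"
REPAIR_TYPES = ("fix-storage", "migrate", "failover", "reinstall")


def suspension_expiry(tags):
    """Return math.inf for a permanent suspension, the highest timestamp
    if all suspend tags carry one, or None without any suspend tag."""
    expiries = []
    for tag in tags:
        if tag == SUSPEND:
            return math.inf  # the form without timestamp takes precedence
        if tag.startswith(SUSPEND + ":"):
            expiries.append(float(tag[len(SUSPEND) + 1:]))
    return max(expiries) if expiries else None


def has_enable_tag(tags):
    """True if the object carries any ganeti:watcher:autorepair:<type> tag."""
    return any(t.startswith(PREFIX) and t[len(PREFIX):] in REPAIR_TYPES
               for t in tags)


def is_suspended(instance_tags, group_tags, cluster_tags, now):
    """Walk from the nearest object outwards; the first one that carries
    an active suspend tag or an enable tag decides."""
    for tags in (instance_tags, group_tags, cluster_tags):
        expiry = suspension_expiry(tags)
        if expiry is not None and now < expiry:
            return True   # assumption: suspend wins within one object
        if has_enable_tag(tags):
            return False  # a nearer enable tag overrides a farther suspend
    return False
```

For instance, a cluster-level suspend tag suspends every instance,
except those where the instance or its node group carries an enable tag
of its own.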

ganeti:watcher:autorepair:pending:<type>:<id>:<timestamp>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

(instance)
If this tag is present a repair of type ``type`` is pending on the
target instance. This means that either jobs are being run, or it's
waiting for resource availability. ``id`` is the unique id identifying
this repair, ``timestamp`` is the time when this tag was first applied
to this instance for this ``id`` (we will "update" the tag by adding a
"new copy" of it and removing the old version as we run more jobs, but
the timestamp will never change for the same repair).

``jobs`` is the list of jobs already run or being run to repair the
instance (separated by a plus sign, *+*). If the instance has just
been put in pending state but no job has run yet, this list is empty.

This tag will be set by Ganeti if an equivalent autorepair tag is
present and a repair is needed, or can be set by an external tool to
request a repair as a "once off".

If multiple instances of this tag are present they will be handled in
order of timestamp.

ganeti:watcher:autorepair:result:<type>:<id>:<timestamp>:<result>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

(instance)
If this tag is present a repair of type ``type`` has been performed on
the instance and has been completed by ``timestamp``. The result is
either ``success``, ``failure`` or ``enoperm``, and ``jobs`` is a
*+*-separated list of jobs that were executed for this repair.

An ``enoperm`` result is returned when the repair was carried out as far
as possible, but the repair type does not permit proceeding any further.
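
To make the two tag layouts concrete, here is a small parsing sketch. It
is not the tool's actual code: ``RepairTag``, ``parse_repair_tag``,
``format_repair_tag`` and ``add_job`` are hypothetical names, and it
assumes that neither ``id`` nor ``timestamp`` contains a colon. The
timestamp is kept as text, since this document does not fix its format.

```python
from typing import NamedTuple, Optional, Tuple

PREFIX = "ganeti:watcher:autorepair:"


class RepairTag(NamedTuple):
    kind: str              # "pending" or "result"
    repair_type: str
    repair_id: str
    timestamp: str         # left unparsed on purpose
    result: Optional[str]  # only set for result tags
    jobs: Tuple[str, ...]


def parse_repair_tag(tag):
    """Parse a pending or result tag; return None for any other tag."""
    if not tag.startswith(PREFIX):
        return None
    fields = tag[len(PREFIX):].split(":")
    if fields[0] == "pending" and len(fields) == 5:
        kind, rtype, rid, stamp, jobs = fields
        result = None
    elif fields[0] == "result" and len(fields) == 6:
        kind, rtype, rid, stamp, result, jobs = fields
    else:
        return None
    # An empty job list means no job has been run yet.
    return RepairTag(kind, rtype, rid, stamp, result,
                     tuple(jobs.split("+")) if jobs else ())


def format_repair_tag(parsed):
    """Inverse of parse_repair_tag."""
    fields = [parsed.kind, parsed.repair_type, parsed.repair_id,
              parsed.timestamp]
    if parsed.kind == "result":
        fields.append(parsed.result)
    fields.append("+".join(parsed.jobs))
    return PREFIX + ":".join(fields)


def add_job(parsed, job_id):
    """Build the "new copy" of a tag with one more job; timestamp unchanged."""
    return parsed._replace(jobs=parsed.jobs + (str(job_id),))
```

Updating a pending tag then amounts to adding
``format_repair_tag(add_job(old, job_id))`` to the instance and removing
the old tag string.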

Possible states, and transitions
--------------------------------

At any point an instance can be in one of the following health states:

Healthy
+++++++

The instance lives only on online nodes. The autorepair system will
never touch these instances. Any ``repair:pending`` tags will be removed
and marked ``success`` with no jobs attached to them.

This state can transition to:

- Needs-repair, repair disallowed (node offlined or drained, no
  autorepair tag)
- Needs-repair, autorepair allowed (node offlined or drained, autorepair
  tag present)
- Suspended (a suspend tag is added)

Suspended
+++++++++

Whenever a ``repair:suspend`` tag is added the autorepair code won't
touch the instance until the timestamp on the tag has passed, if
present. The tag will be removed afterwards (and the instance will
transition to its correct state, depending on its health and other
tags).

Note that when an instance is suspended any pending repair is
interrupted, but jobs which were submitted before the suspension are
allowed to finish.

Needs-repair, repair disallowed
+++++++++++++++++++++++++++++++

The instance lives on an offline or drained node, but no autorepair tag
is set, or the autorepair tag set is of a type not powerful enough to
finish the repair. The autorepair system will never touch these
instances, and they can transition to:

- Healthy (manual repair)
- Pending repair (a ``repair:pending`` tag is added)
- Needs-repair, repair allowed always (an autorepair always tag is added)
- Suspended (a suspend tag is added)

Needs-repair, repair allowed always
+++++++++++++++++++++++++++++++++++

A ``repair:pending`` tag is added, and the instance transitions to the
Pending Repair state. The autorepair tag is preserved.

Of course if a ``repair:suspend`` tag is found no pending tag will be
added, and the instance will instead transition to the Suspended state.

Pending repair
++++++++++++++

When an instance is in this state the following will happen:

If a ``repair:suspend`` tag is found the instance won't be touched and
will be moved to the Suspended state. Any jobs which were already
running will be left untouched.

If there are still jobs running related to the instance and scheduled by
this repair they will be given more time to run, and the instance will
be checked again later. The state transitions to itself.

If no jobs are running and the instance is detected to be healthy, the
``repair:result`` tag will be added, and the current active
``repair:pending`` tag will be removed. It will then transition to the
Healthy state if there are no ``repair:pending`` tags, or to the Pending
state otherwise: there, the instance being healthy, those tags will be
resolved without any operation as well (note that this is the same as
transitioning to the Healthy state, where ``repair:pending`` tags would
also be resolved).

If no jobs are running and the instance still has issues:

- if the last job(s) failed it can either be retried a few times, if
  deemed to be safe, or the repair can transition to the Failed state.
  The ``repair:result`` tag will be added, and the active
  ``repair:pending`` tag will be removed (further ``repair:pending``
  tags will not be able to proceed, as explained by the Failed state,
  until the failure state is cleared)
- if the last job(s) succeeded but there are not enough resources to
  proceed, the state will transition to itself and no jobs are
  scheduled. The tag is left untouched (and later checked again). This
  basically just delays any repairs: the current ``pending`` tag stays
  active, and any others are untouched.
- if the last job(s) succeeded but the repair type does not allow
  proceeding any further the ``repair:result`` tag is added with an
  ``enoperm`` result, and the current ``repair:pending`` tag is removed.
  The instance is now back to "Needs-repair, repair disallowed",
  "Needs-repair, autorepair allowed", or "Pending" if there is already a
  future tag that can repair the instance.
- if the last job(s) succeeded and the repair can continue new job(s)
  can be submitted, and the ``repair:pending`` tag can be updated.
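
The handling of a Pending repair instance described above can be read as
a decision function. The sketch below mirrors the order of the checks in
this section; the ``Action`` names, the boolean inputs and the
``retries_left`` counter are all illustrative assumptions (the design
only says failed jobs may be retried "a few times").

```python
from enum import Enum


class Action(Enum):
    SUSPEND = "move to Suspended"
    WAIT = "stay Pending, check again later"
    SUCCESS = "add result:success, remove pending tag"
    RETRY = "retry the failed job(s)"
    FAILURE = "add result:failure, move to Failed"
    ENOPERM = "add result:enoperm, remove pending tag"
    SUBMIT = "submit new job(s), update pending tag"


def pending_step(suspended, jobs_running, healthy, last_jobs_failed,
                 retries_left, resources_available, type_allows_next):
    """Pick the next action for one instance in the Pending repair state."""
    if suspended:
        return Action.SUSPEND   # already running jobs are left untouched
    if jobs_running:
        return Action.WAIT      # give the jobs more time
    if healthy:
        return Action.SUCCESS
    if last_jobs_failed:
        # Assumption: a simple retry budget decides between retry and Failed.
        return Action.RETRY if retries_left > 0 else Action.FAILURE
    if not resources_available:
        return Action.WAIT      # pending tag is left untouched
    if not type_allows_next:
        return Action.ENOPERM
    return Action.SUBMIT
```

Keeping the decision separate from the code that submits jobs and edits
tags makes each transition easy to check in isolation.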

Failed
++++++

If repairing an instance has failed a ``repair:result:failure`` tag is
added. The presence of this tag is used to detect that an instance is in
this state, and it will not be touched until the failure is investigated
and the tag is removed.

An external tool or person needs to investigate the state of the
instance and remove this tag once they are sure the instance is repaired
and safe to turn back to the normal autorepair system.

(Alternatively we can use the suspended state (indefinitely or
temporarily) to mark the instance as "do not touch" when we think a
human needs to look at it. To be decided.)

A graph with the possible transitions follows; note that in the graph,
following the implementation, the two ``Needs repair`` states have been
coalesced into one; and the ``Suspended`` state disappears, for it
becomes an attribute of the instance object (its auto-repair policy).

.. digraph:: "auto-repair-states"

  node     [shape=circle, style=filled, fillcolor="#BEDEF1",
            width=2, fixedsize=true];
  healthy  [label="Healthy"];
  needsrep [label="Needs repair"];
  pendrep  [label="Pending repair"];
  failed   [label="Failed repair"];
  disabled [label="(no state)", width=1.25];

  {rank=same; needsrep}
  {rank=same; healthy}
  {rank=same; pendrep}
  {rank=same; failed}
  {rank=same; disabled}

  // These nodes are needed to be the "origin" of the "initial state" arrows.
  node [width=.5, label="", style=invis];
  inih;
  inin;
  inip;
  inif;
  inix;

  edge [fontsize=10, fontname="Arial Bold", fontcolor=blue]

  inih -> healthy  [label="No tags or\nresult:success"];
  inip -> pendrep  [label="Tag:\nautorepair:pending"];
  inif -> failed   [label="Tag:\nresult:failure"];
  inix -> disabled [fontcolor=black, label="ArNotEnabled"];

  edge [fontcolor="orange"];

  healthy -> healthy [label="No problems\ndetected"];

  healthy -> needsrep [
             label="Brokenness\ndetected in\nfirst half of\nthe tool run"];

  pendrep -> healthy [
             label="All jobs\ncompleted\nsuccessfully /\ninstance healthy"];

  pendrep -> failed [label="Some job(s)\nfailed"];

  edge [fontcolor="red"];

  needsrep -> pendrep [
              label="Repair\nallowed and\ninitial job(s)\nsubmitted"];

  needsrep -> needsrep [
              label="Repairs suspended\n(no-op) or enabled\nbut not powerful enough\n(result: enoperm)"];

  pendrep -> pendrep [label="More jobs\nsubmitted"];

Repair operation
----------------

Possible repairs are:

- Replace-disks (drbd, if the secondary is down), (or other storage
  specific fixes)
- Migrate (shared storage, rbd, drbd, if the primary is drained)
- Failover (shared storage, rbd, drbd, if the primary is down)
- Recreate disks + reinstall (all nodes down, plain, files or drbd)

Note that more than one of these operations may need to happen before a
full repair is completed (e.g. if a drbd primary goes offline, first a
failover will happen, then a replace-disks).

The self-repair tool will first take care of all needs-repair instances
that can be brought into ``pending`` state, and transition them as
described above.

Then it will go through any ``repair:pending`` instances and handle them
as described above.

Note that the repair tool MAY "group" instances by performing common
repair jobs for them (e.g. node evacuate).
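
As a rough illustration of how the list of possible repairs maps onto
node states, the sketch below picks the *next* operation for a single
instance. It is not the tool's logic: ``next_repair`` is a hypothetical
name, the three node states are simplified to strings, and the template
names in ``SHARED`` and ``LOCAL`` are stand-ins for "shared storage,
rbd" and "plain, files". The order of the checks for drbd is also an
assumption.

```python
ONLINE, DRAINED, OFFLINE = "online", "drained", "offline"
SHARED = {"sharedfile", "rbd"}  # illustrative: externally redundant storage
LOCAL = {"plain", "file"}       # illustrative: non-redundant templates


def next_repair(template, primary, secondary=None):
    """Return the next repair operation for an instance, or None."""
    if template == "drbd":
        if primary == OFFLINE and secondary == OFFLINE:
            return "recreate-disks+reinstall"  # all nodes down
        if primary == OFFLINE:
            return "failover"
        if secondary != ONLINE:
            return "replace-disks"
        if primary == DRAINED:
            return "migrate"
        return None
    if template in SHARED:
        # No secondary node: only the state of the primary matters.
        return {OFFLINE: "failover", DRAINED: "migrate"}.get(primary)
    if template in LOCAL and primary == OFFLINE:
        return "recreate-disks+reinstall"
    return None
```

Calling the function again after each completed step reproduces the
multi-step case mentioned above: a drbd instance with an offline primary
first gets ``failover``; once the roles are swapped, the offline node is
the secondary and the next call yields ``replace-disks``.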

Staging of work
---------------

- First version: recreate-disks + reinstall (2.6.1)
- Second version: failover and migrate repairs (2.7)
- Third version: replace disks repair (2.7 or 2.8)

Future work
===========

One important piece of work will be reporting what the autorepair system
is "thinking" and exporting this in a form that can be read by an
outside user or system. In order to do this we need a better
communication system than embedding this information into tags. This
should be designed in an extensible way that can be used in general for
Ganeti to provide "advisory" information about entities it manages, and
for an external system to "advise" Ganeti over what it can do, but in a
less direct manner than submitting individual jobs.

Note that cluster verify checks some errors that are actually
instance-specific (e.g. a missing backend disk on a drbd node) or
node-specific (e.g. an extra lvm device). If we were to split these into
"instance verify", "node verify" and "cluster verify", then we could
easily use this tool to perform some of those repairs as well.

Finally, self-repairs could also be extended to the cluster level, for
example to concepts like "N+1 failures", missing master candidates,
etc., or to the node level for some specific types of errors.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: