====================
Instance auto-repair
====================

.. contents:: :depth: 4

This is a design document detailing the implementation of self-repair
and recreation of instances in Ganeti. It also discusses ideas that
might be useful for further self-repair situations in the future.

Current state and shortcomings
==============================

Ganeti currently doesn't do any sort of self-repair or self-recreation
of instances:

- If a drbd instance is broken (its primary or secondary node goes
  offline or needs to be drained) an admin or an external tool must fail
  it over if necessary, and then trigger a disk replacement.
- If a plain instance is broken (or both nodes of a drbd instance are)
  an admin or an external tool must recreate its disk and reinstall it.

Moreover, in an oversubscribed cluster the operations mentioned above
might fail for lack of capacity until a node is repaired or a new one
is added. In this case an external tool would also need to go through
any "pending-recreate" or "pending-repair" instances and fix them.

Proposed changes
================

We'd like to increase the self-repair capabilities of Ganeti, at least
with regard to instances. In order to do so we plan to add mechanisms
to mark an instance as "due for being repaired" and then have the
relevant repair performed as soon as it is possible on the cluster.

The self-repair will be written as part of ganeti-watcher or as an extra
watcher component that is called less often.

In the first version we'll only handle the case in which an instance
lives on an offline or drained node. In the future we may add more
self-repair capabilities for errors Ganeti can detect.

New attributes (or tags)
------------------------

In order to know when to perform a self-repair operation we need to know
whether it is allowed by the cluster administrator.

This can be implemented as either new attributes or tags. Tags could be
acceptable as they would only be read and interpreted by the self-repair
tool (part of the watcher), and not by the Ganeti core opcodes and node
RPCs. The following tags would be needed:

ganeti:watcher:autorepair:<type>
++++++++++++++++++++++++++++++++

(instance/nodegroup/cluster)
Allow repairs to happen on an instance that has the tag, or that lives
in a cluster or nodegroup which does. Types of repair are in order of
perceived risk, lower to higher, and each type includes allowing the
operations in the lower ones:

- ``fix-storage`` allows a disk replacement or another operation that
  fixes the instance backend storage without affecting the instance
  itself. This can for example recover from a broken drbd secondary, but
  risks data loss if something is wrong on the primary but the secondary
  was somehow recoverable.
- ``migrate`` allows an instance migration. This can recover from a
  drained primary, but can cause an instance crash in some cases (bugs).
- ``failover`` allows instance reboot on the secondary. This can recover
  from an offline primary, but the instance will lose its running state.
- ``reinstall`` allows disks to be recreated and an instance to be
  reinstalled. This can recover from the primary and secondary both
  being offline, or from an offline primary in the case of non-redundant
  instances. It causes data loss.

Each repair type allows all the operations in the previous types, in the
order above, in order to ensure a repair can be completed fully. As such
a repair of a lower type might not be able to proceed if it detects an
error condition that requires a more risky or drastic solution, but
never vice versa (if a worse solution is allowed then so is a better
one).

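The ordering rule above can be sketched in a few lines of Python. This
is only an illustration, not watcher code: the names ``REPAIR_TYPES``,
``allowed_repair_type`` and ``is_permitted`` are hypothetical, and
combining the tags found on the instance, nodegroup and cluster by
taking the riskiest type is an assumption that this document does not
settle.

.. code-block:: python

    from typing import Iterable, Optional

    # Repair types in the order listed above, lowest perceived risk first.
    REPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]
    AUTOREPAIR_PREFIX = "ganeti:watcher:autorepair:"


    def allowed_repair_type(tags: Iterable[str]) -> Optional[str]:
        """Return the riskiest repair type granted by ``tags``, or None.

        ``tags`` is expected to hold the tags of the instance together
        with those of its nodegroup and cluster (assumption).
        """
        granted = [
            tag[len(AUTOREPAIR_PREFIX):]
            for tag in tags
            if tag.startswith(AUTOREPAIR_PREFIX)
        ]
        # Ignore anything that is not a repair type, e.g. the suspend tag.
        known = [rtype for rtype in granted if rtype in REPAIR_TYPES]
        if not known:
            return None
        return max(known, key=REPAIR_TYPES.index)


    def is_permitted(needed: str, allowed: Optional[str]) -> bool:
        """A type permits itself and every lower-risk type, never more."""
        if allowed is None:
            return False
        return REPAIR_TYPES.index(needed) <= REPAIR_TYPES.index(allowed)

With only the ``migrate`` tag set, ``is_permitted("fix-storage",
"migrate")`` holds while ``is_permitted("failover", "migrate")`` does
not, which matches the "never vice versa" rule.
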
ganeti:watcher:autorepair:suspend[:<timestamp>]
+++++++++++++++++++++++++++++++++++++++++++++++

(instance/nodegroup/cluster)
If this tag is encountered no autorepair operations will start for the
instance (or for any instance, if present at the cluster or group
level). Any job which has already started will be allowed to finish,
but then the autorepair system will not proceed further until this tag
is removed, or the timestamp passes (in which case the tag will be
removed automatically by the watcher).

Note that depending on how this tag is used there might still be race
conditions related to it for an external tool that uses it
programmatically, as no "lock tag" or tag "test-and-set" operation is
present at this time. While this is known we won't solve these race
conditions in the first version.

It might also be useful to have an operation that easily tags all
instances matching a filter on some characteristic. But again, this
wouldn't be specific to this tag.

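A minimal sketch of how the suspend tag might be evaluated follows. The
function names are hypothetical, and the timestamp is assumed to be
seconds since the Unix epoch, which this document does not specify.

.. code-block:: python

    import time
    from typing import Iterable, List, Optional, Tuple

    SUSPEND_TAG = "ganeti:watcher:autorepair:suspend"


    def suspend_expiry(tag: str) -> Optional[float]:
        """Return the expiry time of a suspend tag, None if it has none.

        Raises ValueError if ``tag`` is not a suspend tag.
        """
        if tag == SUSPEND_TAG:
            return None
        if tag.startswith(SUSPEND_TAG + ":"):
            # Assumption: the timestamp is seconds since the Unix epoch.
            return float(tag[len(SUSPEND_TAG) + 1:])
        raise ValueError("not a suspend tag: %r" % tag)


    def is_suspended(tags: Iterable[str],
                     now: Optional[float] = None) -> Tuple[bool, List[str]]:
        """Return (suspended, expired_tags) for a collection of tags.

        ``expired_tags`` lists the suspend tags whose timestamp has
        passed, so that the caller can remove them.
        """
        now = time.time() if now is None else now
        suspended = False
        expired = []
        for tag in tags:
            if tag != SUSPEND_TAG and not tag.startswith(SUSPEND_TAG + ":"):
                continue
            expiry = suspend_expiry(tag)
            if expiry is None or expiry > now:
                suspended = True
            else:
                expired.append(tag)
        return suspended, expired

Reading the tags and removing the expired ones are separate steps here,
so this sketch is subject to the same race conditions as any other tag
user.
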
ganeti:watcher:repair:pending:<type>:<id>:<timestamp>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

(instance)
If this tag is present a repair of type ``type`` is pending on the
target instance. This means that either jobs are being run, or it's
waiting for resource availability. ``id`` is the unique id identifying
this repair, ``timestamp`` is the time when this tag was first applied
to this instance for this ``id`` (we will "update" the tag by adding a
"new copy" of it and removing the old version as we run more jobs, but
the timestamp will never change for the same repair).

``jobs`` is the list of jobs already run or being run to repair the
instance. If the instance has just been put in pending state but no job
has run yet, this list is empty.

This tag will be set by Ganeti if an equivalent autorepair tag is
present and a repair is needed, or can be set by an external tool to
request a repair as a one-off.

If multiple instances of this tag are present they will be handled in
order of timestamp.

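The tag layout and its "update by copy" rule could be modelled as
below. This is a sketch under assumptions: the class and function names
are hypothetical, the timestamp is taken to be an integer number of
seconds, job ids are taken to be integers, and the job list is assumed
to be comma separated as it is for the result tag.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Iterable, List, Optional

    PENDING_PREFIX = "ganeti:watcher:repair:pending:"


    @dataclass
    class PendingRepair:
        repair_type: str
        repair_id: str
        timestamp: int
        jobs: List[int] = field(default_factory=list)

        @classmethod
        def parse(cls, tag: str) -> "PendingRepair":
            """Parse a pending tag; raise ValueError if malformed."""
            if not tag.startswith(PENDING_PREFIX):
                raise ValueError("not a pending tag: %r" % tag)
            rtype, rid, stamp, jobs = tag[len(PENDING_PREFIX):].split(":")
            return cls(rtype, rid, int(stamp),
                       [int(job) for job in jobs.split(",") if job])

        def format(self) -> str:
            jobs = ",".join(str(job) for job in self.jobs)
            return "%s%s:%s:%d:%s" % (PENDING_PREFIX, self.repair_type,
                                      self.repair_id, self.timestamp, jobs)

        def with_job(self, job_id: int) -> "PendingRepair":
            """Return the "new copy" of the tag; the timestamp is kept."""
            return PendingRepair(self.repair_type, self.repair_id,
                                 self.timestamp, self.jobs + [job_id])


    def next_pending(tags: Iterable[str]) -> Optional[PendingRepair]:
        """Return the pending repair with the oldest timestamp, if any."""
        pending = [PendingRepair.parse(tag) for tag in tags
                   if tag.startswith(PENDING_PREFIX)]
        if not pending:
            return None
        return min(pending, key=lambda repair: repair.timestamp)

A caller would add ``repair.with_job(job_id).format()`` as a new tag and
then remove ``repair.format()``, in that order, so that the instance is
never left without a pending tag in between.
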
ganeti:watcher:repair:result:<type>:<id>:<timestamp>:<result>:<jobs>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

(instance)
If this tag is present a repair of type ``type`` has been performed on
the instance and has been completed by ``timestamp``. The result is
either ``success``, ``failure`` or ``enoperm``, and ``jobs`` is a
comma-separated list of jobs that were executed for this repair.

An ``enoperm`` result is returned when the repair was carried as far as
possible, but the repair type does not permit proceeding any further.

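As an illustration, a result tag could be built and inspected as
follows. The helper names are hypothetical, and the timestamp is again
assumed to be an integer number of seconds since the Unix epoch.

.. code-block:: python

    import time
    from typing import Iterable, Optional

    RESULT_PREFIX = "ganeti:watcher:repair:result:"
    RESULTS = ("success", "failure", "enoperm")


    def result_tag(repair_type: str, repair_id: str, result: str,
                   jobs: Iterable[int],
                   now: Optional[float] = None) -> str:
        """Format a result tag, rejecting unknown result values."""
        if result not in RESULTS:
            raise ValueError("unknown result: %r" % result)
        stamp = int(time.time() if now is None else now)
        job_list = ",".join(str(job) for job in jobs)
        return "%s%s:%s:%d:%s:%s" % (RESULT_PREFIX, repair_type,
                                     repair_id, stamp, result, job_list)


    def has_failure(tags: Iterable[str]) -> bool:
        """Return True if any result tag records a ``failure``."""
        for tag in tags:
            if not tag.startswith(RESULT_PREFIX):
                continue
            fields = tag[len(RESULT_PREFIX):].split(":")
            # Fields: type, id, timestamp, result, jobs.
            if len(fields) == 5 and fields[3] == "failure":
                return True
        return False
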
Possible states, and transitions
--------------------------------

At any point an instance can be in one of the following health states:

Healthy
+++++++

The instance lives only on online nodes. The autorepair system will
never touch these instances. Any ``repair:pending`` tags will be removed
and marked ``success`` with no jobs attached to them.

This state can transition to:

- Needs-repair, repair disallowed (node offlined or drained, no
  autorepair tag)
- Needs-repair, autorepair allowed (node offlined or drained, autorepair
  tag present)
- Suspended (a suspend tag is added)

Suspended
+++++++++

Whenever an ``autorepair:suspend`` tag is added the autorepair code
won't touch the instance until the timestamp on the tag, if present,
has passed. The tag will be removed afterwards (and the instance will
transition to its correct state, depending on its health and other
tags).

Note that when an instance is suspended any pending repair is
interrupted, but jobs which were submitted before the suspension are
allowed to finish.

Needs-repair, repair disallowed
+++++++++++++++++++++++++++++++

The instance lives on an offline or drained node, but no autorepair tag
is set, or the autorepair tag set is of a type not powerful enough to
finish the repair. The autorepair system will never touch these
instances, and they can transition to:

- Healthy (manual repair)
- Pending repair (a ``repair:pending`` tag is added)
- Needs-repair, repair allowed always (an autorepair always tag is added)
- Suspended (a suspend tag is added)

Needs-repair, repair allowed always
+++++++++++++++++++++++++++++++++++

A ``repair:pending`` tag is added, and the instance transitions to the
Pending Repair state. The autorepair tag is preserved.

Of course if an ``autorepair:suspend`` tag is found no pending tag will
be added, and the instance will instead transition to the Suspended
state.

Pending repair
++++++++++++++

When an instance is in this stage the following will happen:

If an ``autorepair:suspend`` tag is found the instance won't be touched
and will be moved to the Suspended state. Any jobs which were already
running will be left untouched.

If there are still jobs running related to the instance and scheduled by
this repair they will be given more time to run, and the instance will
be checked again later. The state transitions to itself.

If no jobs are running and the instance is detected to be healthy, the
``repair:result`` tag will be added, and the current active
``repair:pending`` tag will be removed. It will then transition to the
Healthy state if there are no further ``repair:pending`` tags, or to the
Pending state otherwise: there, the instance being healthy, those tags
will be resolved without any operation as well (note that this is the
same as transitioning to the Healthy state, where ``repair:pending``
tags would also be resolved).

If no jobs are running and the instance still has issues:

- if the last job(s) failed it can either be retried a few times, if
  deemed to be safe, or the repair can transition to the Failed state.
  The ``repair:result`` tag will be added, and the active
  ``repair:pending`` tag will be removed (further ``repair:pending``
  tags will not be able to proceed, as explained by the Failed state,
  until the failure state is cleared)
- if the last job(s) succeeded but there are not enough resources to
  proceed, the state will transition to itself and no jobs are
  scheduled. The tag is left untouched (and later checked again). This
  basically just delays any repairs: the current ``pending`` tag stays
  active, and any others are untouched.
- if the last job(s) succeeded but the repair type does not allow
  proceeding any further the ``repair:result`` tag is added with an
  ``enoperm`` result, and the current ``repair:pending`` tag is removed.
  The instance is now back to "Needs-repair, repair disallowed",
  "Needs-repair, autorepair allowed", or "Pending" if there is already a
  future tag that can repair the instance.
- if the last job(s) succeeded and the repair can continue new job(s)
  can be submitted, and the ``repair:pending`` tag can be updated.

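The decision made for an instance in the Pending repair state can be
summarised as a single function. This is a sketch only: the ``Step``
names and the boolean inputs are hypothetical, the checks are evaluated
in the order in which the cases are listed above, and how each input is
computed (for instance whether a retry is "deemed to be safe") is left
open, as it is in this document.

.. code-block:: python

    from enum import Enum


    class Step(Enum):
        SUSPEND = "move to Suspended, leave running jobs untouched"
        WAIT = "jobs still running, check again later"
        SUCCEED = "add a success result tag, remove the pending tag"
        RETRY = "resubmit the failed job(s)"
        FAIL = "add a failure result tag, remove the pending tag"
        DELAY = "not enough resources, keep the pending tag"
        ENOPERM = "add an enoperm result tag, remove the pending tag"
        CONTINUE = "submit the next job(s), update the pending tag"


    def pending_step(*, suspended: bool, jobs_running: bool,
                     healthy: bool, last_jobs_failed: bool,
                     retry_is_safe: bool, has_resources: bool,
                     type_allows_next: bool) -> Step:
        """Pick what to do next for one pending repair."""
        if suspended:
            return Step.SUSPEND
        if jobs_running:
            return Step.WAIT
        if healthy:
            return Step.SUCCEED
        if last_jobs_failed:
            return Step.RETRY if retry_is_safe else Step.FAIL
        if not has_resources:
            return Step.DELAY
        if not type_allows_next:
            return Step.ENOPERM
        return Step.CONTINUE

Keeping the decision free of side effects means the tag updates and job
submissions can be applied, logged or dry-run separately.
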
Failed
++++++

If repairing an instance has failed a ``repair:result`` tag with a
``failure`` result is added. The presence of this tag is used to detect
that an instance is in this state, and it will not be touched until the
failure is investigated and the tag is removed.

An external tool or person needs to investigate the state of the
instance and remove this tag once they are sure the instance is repaired
and safe to hand back to the normal autorepair system.

(Alternatively we can use the suspended state (indefinitely or
temporarily) to mark the instance as "do not touch" when we think a
human needs to look at it. To be decided).

Repair operation
----------------

Possible repairs are:

- Replace-disks (drbd, if the secondary is down), (or other storage
  specific fixes)
- Migrate (shared storage, rbd, drbd, if the primary is drained)
- Failover (shared storage, rbd, drbd, if the primary is down)
- Recreate disks + reinstall (all nodes down, plain, file or drbd)

Note that more than one of these operations may need to happen before a
full repair is completed (e.g. if a drbd primary goes offline, first a
failover will happen, then a replace-disks).

The self-repair tool will first take care of all needs-repair instances
that can be brought into ``pending`` state, and transition them as
described above.

Then it will go through any ``repair:pending`` instances and handle them
as described above.

Note that the repair tool MAY "group" instances by performing common
repair jobs for them (e.g. node evacuate).

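A rough sketch of how the list of operations might be chosen is given
below. Everything in it is an assumption made for illustration: the
function name, the node state labels, and the template labels ``shared``
and ``file`` are hypothetical, only single-node faults and the
all-nodes-offline case are covered, and following a migration with a
replace-disks is inferred by analogy with the failover example above.

.. code-block:: python

    from typing import List, Optional

    ONLINE, DRAINED, OFFLINE = "online", "drained", "offline"
    STATES = (ONLINE, DRAINED, OFFLINE)


    def plan_repair(template: str, primary: str,
                    secondary: Optional[str] = None) -> List[str]:
        """Return the ordered operations needed to repair an instance.

        Raises NotImplementedError for combinations this sketch does
        not cover (for example a drbd instance with both nodes drained).
        """
        if primary not in STATES:
            raise ValueError("unknown node state: %r" % primary)
        if template == "drbd":
            if secondary not in STATES:
                raise ValueError("drbd needs a valid secondary state")
            if primary == OFFLINE and secondary == OFFLINE:
                return ["recreate-disks", "reinstall"]
            if primary == ONLINE:
                return [] if secondary == ONLINE else ["replace-disks"]
            if secondary == ONLINE:
                move = "migrate" if primary == DRAINED else "failover"
                # After the move the faulty node holds the secondary.
                return [move, "replace-disks"]
        elif template in ("rbd", "shared"):
            plans = {ONLINE: [], DRAINED: ["migrate"], OFFLINE: ["failover"]}
            return plans[primary]
        elif template in ("plain", "file"):
            if primary == ONLINE:
                return []
            if primary == OFFLINE:
                return ["recreate-disks", "reinstall"]
        raise NotImplementedError(
            "no plan for %s with primary=%s, secondary=%s"
            % (template, primary, secondary))

For a drbd instance whose primary went offline,
``plan_repair("drbd", "offline", "online")`` yields a failover followed
by a replace-disks, as in the example above. Each returned operation
would still have to be checked against the repair type allowed for the
instance before any job is submitted.
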
Staging of work
---------------

- First version: recreate-disks + reinstall (2.6.1)
- Second version: failover and migrate repairs (2.7)
- Third version: replace disks repair (2.7 or 2.8)

Future work
===========

One important piece of work will be reporting what the autorepair system
is "thinking" and exporting this in a form that can be read by an
outside user or system. In order to do this we need a better
communication system than embedding this information into tags. This
should be designed in an extensible way that can be used in general for
Ganeti to provide "advisory" information about entities it manages, and
for an external system to "advise" Ganeti over what it can do, but in a
less direct manner than submitting individual jobs.

Note that cluster verify checks some errors that are actually
instance-specific (e.g. a missing backend disk on a drbd node) or
node-specific (e.g. an extra lvm device). If we were to split these into
"instance verify", "node verify" and "cluster verify", then we could
easily use this tool to perform some of those repairs as well.

Finally, self-repairs could also be extended to the cluster level (for
example concepts like "N+1 failures", missing master candidates, etc.)
or to the node level for some specific types of errors.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: