Revision 395aa879
b/doc/design-2.1.rst

doesn't have a Ganeti-provided script, so nothing will be done for that
hypervisor)

Automated disk repairs infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Replacing defective disks in an automated fashion is quite difficult with the
current version of Ganeti. These changes will introduce additional
functionality and interfaces to simplify automating disk replacements on a
Ganeti node.

Fix node volume group
+++++++++++++++++++++

This is the most difficult addition, as it can lead to data loss if it is not
properly safeguarded.

The operation must be performed only when all the other nodes that have
instances in common with the target node are healthy, i.e. this is the only
node with problems. We also have to double-check that every instance on this
node has at least one good copy of its data.

This might mean that we have to enhance the GetMirrorStatus calls and
introduce a smarter version that can tell us more about the status of an
instance.

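The safeguard described above can be sketched as a pre-check. This is a minimal illustration under a simplified cluster model, not Ganeti code: the ``DiskStatus`` and ``Instance`` types, the ``can_fix_node_vg`` function and the per-node disk status map are all hypothetical, and it assumes that "a good copy of the data" must live on a node other than the one being fixed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DiskStatus(Enum):
    """Hypothetical health of one copy of one disk on one node."""
    OK = "ok"
    DEGRADED = "degraded"
    MISSING = "missing"


@dataclass
class Instance:
    """Hypothetical, simplified view of an instance for this check."""
    name: str
    nodes: list[str]  # primary first, then secondaries
    # disk_status[node][disk_index] -> status of that copy on that node
    disk_status: dict[str, dict[int, DiskStatus]] = field(default_factory=dict)


def can_fix_node_vg(target: str, instances: list[Instance],
                    healthy_nodes: set[str]) -> tuple[bool, list[str]]:
    """Return (allowed, reasons) for repairing the volume group on target.

    Allowed only if every other node sharing an instance with target is
    healthy and every disk of every instance on target still has at least
    one OK copy on some node other than target.
    """
    reasons: list[str] = []
    for inst in instances:
        if target not in inst.nodes:
            continue  # instance does not touch the target node
        peers = [node for node in inst.nodes if node != target]
        for node in peers:
            if node not in healthy_nodes:
                reasons.append(f"{inst.name}: peer node {node} is not healthy")
        # Collect every disk index reported on any node of the instance.
        disk_indices = {idx for per_node in inst.disk_status.values()
                        for idx in per_node}
        for idx in sorted(disk_indices):
            good_copy = any(
                inst.disk_status.get(node, {}).get(idx) is DiskStatus.OK
                for node in peers)
            if not good_copy:
                reasons.append(
                    f"{inst.name}: disk {idx} has no good copy off {target}")
    return (not reasons, reasons)
```

For example, an instance mirrored on node1 and node2 whose only good copy is on node2 passes the check for node1 as long as node2 is healthy, while a non-mirrored instance with disks on node1 always blocks the operation, because it has no copy anywhere else.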
Stop allocation on a given PV
+++++++++++++++++++++++++++++

This is relatively simple. First we need a "list PVs" opcode (and its
associated logical unit) and then a "set PV status" opcode/LU. Together these
should allow both checking and changing the disk/PV status.

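At the LVM level both operations map onto existing LVM2 tools: ``pvs`` can report a PV's attribute flags (the first character is ``a`` when the PV is allocatable), and ``pvchange --allocatable n`` stops LVM from allocating new extents on a PV. The sketch below only illustrates, under that assumption, how an LU might build and parse these commands; ``PVInfo``, ``build_list_pvs_cmd``, ``parse_pvs_output`` and ``build_set_pv_status_cmd`` are hypothetical helpers, and nothing is executed.

```python
from __future__ import annotations

from dataclasses import dataclass

SEP = "|"  # field separator requested from pvs


@dataclass(frozen=True)
class PVInfo:
    """Hypothetical record returned by a "list PVs" LU."""
    name: str
    vg_name: str
    allocatable: bool


def build_list_pvs_cmd() -> list[str]:
    """Command reporting the name, VG and attribute flags of every PV."""
    return ["pvs", "--noheadings", "--separator", SEP,
            "-o", "pv_name,vg_name,pv_attr"]


def parse_pvs_output(output: str) -> list[PVInfo]:
    """Parse pvs output; the first attr character is 'a' if allocatable."""
    result = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, vg_name, attr = line.split(SEP)
        result.append(PVInfo(name, vg_name, attr.startswith("a")))
    return result


def build_set_pv_status_cmd(pv_name: str, allocatable: bool) -> list[str]:
    """Command enabling or disabling new allocations on one PV."""
    return ["pvchange", "--allocatable", "y" if allocatable else "n", pv_name]
```

Marking a failing PV non-allocatable this way leaves existing logical volumes in place and only prevents new ones from landing on it, which fits the "stop allocation" goal without touching instance data.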
Instance disk status
++++++++++++++++++++

This new opcode (or change to an existing opcode) must list every combination
of instance disk index and node, together with its status. This will make it
possible to determine which part of the instance, if any, is broken.

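The result of such an opcode can be pictured as one entry per (disk index, node) pair. In this illustrative sketch the ``DiskNodeStatus`` tuple, its status strings and the ``find_broken`` helper are hypothetical; the real result format of the opcode is not specified in this design.

```python
from __future__ import annotations

from typing import NamedTuple


class DiskNodeStatus(NamedTuple):
    """One (disk index, node) combination of an instance and its status."""
    disk_index: int
    node: str
    status: str  # hypothetical values: "ok", "degraded", "faulty"


def find_broken(entries: list[DiskNodeStatus]) -> list[tuple[int, str]]:
    """Return the sorted (disk index, node) pairs that are not healthy."""
    return sorted((e.disk_index, e.node) for e in entries if e.status != "ok")
```

For an instance with two disks mirrored on node1 and node2, a faulty second disk on node2 would be reported as the single pair ``(1, "node2")``, which is exactly the information a repair step needs.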
Repair instance
+++++++++++++++

This new opcode/LU/RAPI call will run ``replace-disks -p`` as needed to fix
the instance status. It only affects primary instances; secondary instances can
simply be moved away.

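The "as needed" decision can be sketched as follows, assuming the LU already knows which (disk index, node) pairs of the instance are unhealthy. The ``plan_repair`` helper is hypothetical, and expressing the action as a ``gnt-instance replace-disks -p`` command line is only for illustration; a real LU would submit the corresponding opcode instead.

```python
from __future__ import annotations


def plan_repair(instance: str, primary: str,
                broken: list[tuple[int, str]]) -> list[list[str]]:
    """Return the commands needed to repair instance.

    broken lists the (disk index, node) pairs found unhealthy. Only
    breakage on the primary node is handled, by replacing the disks on
    the primary; breakage on a secondary is left to moving it away.
    """
    if any(node == primary for _, node in broken):
        # A single replace-disks -p call rebuilds the primary's copies.
        return [["gnt-instance", "replace-disks", "-p", instance]]
    return []  # nothing to do on the primary
```

Issuing at most one command per instance keeps the operation idempotent: running it on a healthy instance, or on one whose only problem is on the secondary, is a no-op.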
Migrate node
++++++++++++

This new opcode/LU/RAPI call will take over the current ``gnt-node migrate``
code and run migrate for all instances on the node.

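Conceptually, the LU walks over the instances on the node and issues one migration per instance. The sketch below assumes that only instances having this node as primary are migrated (live migration moves an instance from its primary to its secondary node) and renders each step as a ``gnt-instance migrate`` command line for readability; the ``plan_node_migration`` helper is hypothetical.

```python
from __future__ import annotations


def plan_node_migration(node: str,
                        primaries: dict[str, str]) -> list[list[str]]:
    """Return one migrate command per instance whose primary is node.

    primaries maps instance name -> primary node name. Instances are
    processed in sorted order so that the resulting job list is
    deterministic.
    """
    return [["gnt-instance", "migrate", name]
            for name, primary in sorted(primaries.items())
            if primary == node]
```

Building the full list before executing anything makes it easy to show the operator what will happen, or to submit the migrations as separate jobs.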
Evacuate node
++++++++++++++

This new opcode/LU/RAPI call will take over the current ``gnt-node evacuate``
code and run replace-secondary with an iallocator script for all instances on
the node.

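A similar sketch applies to evacuation, assuming that the instances to handle are those using the node as a secondary and that a secondary replacement is requested through the ``-I`` (iallocator) option of ``gnt-instance replace-disks``. The ``plan_node_evacuation`` helper and its arguments are hypothetical; the iallocator name is simply passed through.

```python
from __future__ import annotations


def plan_node_evacuation(node: str, secondaries: dict[str, list[str]],
                         iallocator: str) -> list[list[str]]:
    """Return one replace-secondary command per instance using node.

    secondaries maps instance name -> list of secondary node names. The
    choice of the new secondary is left to the iallocator script.
    """
    return [["gnt-instance", "replace-disks", "-I", iallocator, name]
            for name, nodes in sorted(secondaries.items())
            if node in nodes]
```

Delegating the target choice to the iallocator keeps placement policy out of the LU itself, so the same evacuation code works with any allocator the cluster uses.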
|
External interface changes
--------------------------
