Revision 9dd964b9 man/harep.rst

DESCRIPTION
-----------

Harep is the Ganeti auto-repair tool. It is able to detect that an instance is
broken and to generate a sequence of jobs that will fix it, in accordance with
the policies set by the administrator.

Harep is able to recognize what state an instance is in (healthy, suspended,
needs repair, repair disallowed, pending repair, repair failed) and to lead it
through a sequence of steps that will bring the instance back to the healthy
state. Therefore, harep is mainly meant to be run regularly and frequently
using a cron job, so that it can actually follow the instance through the
whole process. At every run, harep will update the tags it adds to instances
that describe their repair status, and will submit the jobs that actually
perform the required repair operations.
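
For example, harep could be scheduled via cron along these lines; the path,
schedule, and user are illustrative and depend on the installation (harep
needs to run where it can reach the Ganeti master daemon, typically the
master node)::

  # /etc/cron.d/harep -- illustrative: run harep every 15 minutes
  */15 * * * * root /usr/sbin/harep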

By default, harep only reports on the health status of instances, but doesn't
perform any action, as repair actions might be potentially dangerous.
Therefore, harep will only touch instances that it has been explicitly
authorized to work on.

The tags enabling harep can be associated with single instances, with a node
group, or with the whole cluster, thereby affecting all the instances they
contain. The possible tags share the common structure::

  ganeti:watcher:autorepair:<type>

where ``<type>`` can have the following values:

* ``fix-storage``: allow disk replacement or fixing the backend without
  affecting the instance itself (e.g. a broken DRBD secondary)
* ``migrate``: allow instance migration
* ``failover``: allow instance reboot on the secondary node
* ``reinstall``: allow disks to be recreated and the instance to be
  reinstalled
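
For example, ``failover`` repairs could be authorized for a single instance
with ``gnt-instance add-tags`` (the instance name below is illustrative)::

  gnt-instance add-tags instance1.example.com \
      ganeti:watcher:autorepair:failover

The same tag can be applied to a node group with ``gnt-group add-tags`` or to
the whole cluster with ``gnt-cluster add-tags``.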

Each element in this list includes all the authorizations of the previous
one, with ``fix-storage`` being the least powerful and ``reinstall`` being
the most powerful.

In case multiple autorepair tags act on the same instance, only one can
actually be active. The conflict is resolved according to the following
rules:

#. If multiple tags are on the same object, the least destructive one takes
   precedence.
#. If the tags are across objects, the nearest tag wins.

Example: a cluster has instances I1 and I2, where I1 has the ``failover``
tag, and the cluster itself has both ``fix-storage`` and ``reinstall``.
Instance I1 will be allowed to ``failover``, while instance I2 will only be
allowed to ``fix-storage``.
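
The setup of this example could be created as follows (instance name as in
the example above)::

  gnt-cluster add-tags ganeti:watcher:autorepair:fix-storage \
      ganeti:watcher:autorepair:reinstall
  gnt-instance add-tags I1 ganeti:watcher:autorepair:failover

By rule 1, the least destructive of the two cluster-level tags
(``fix-storage``) is the one that applies cluster-wide; by rule 2, the
instance-level ``failover`` tag overrides it for I1.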

OPTIONS
-------
|