HAREP(1) Ganeti | Version @GANETI_VERSION@
==========================================

NAME
----

harep - Ganeti auto-repair tool

SYNOPSIS
--------

**harep** [ [**-L** | **\--luxi** ] = *socket* ] [ \--job-delay = *seconds* ]

**harep** \--version

DESCRIPTION
-----------

Harep is the Ganeti auto-repair tool. It is able to detect that an instance
is broken and to generate a sequence of jobs that will fix it, in accordance
with the policies set by the administrator.

Harep is able to recognize what state an instance is in (healthy, suspended,
needs repair, repair disallowed, pending repair, repair failed) and to lead
it through a sequence of steps that will bring the instance back to the
healthy state. Therefore, harep is mainly meant to be run regularly and
frequently, e.g. from a cron job, so that it can actually follow the instance
through the whole process. At every run, harep updates the tags it adds to
instances describing their repair status, and submits the jobs that actually
perform the required repair operations.

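Because harep has to observe each instance over several runs, a natural
deployment is a cron entry on the master node. The following sketch is
purely illustrative: the binary path and the 15-minute interval are
assumptions, not values mandated by Ganeti.

```shell
# Illustrative crontab entry (path and interval are assumptions; adjust
# for your installation): run harep every 15 minutes on the master node,
# so it can follow instances through the repair states described above.
*/15 * * * * root /usr/sbin/harep
```
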
By default, harep only reports on the health status of instances but doesn't
perform any action, as repairs might be potentially dangerous. Therefore,
harep will only touch instances that it has been explicitly authorized to
work on.

The tags enabling harep can be associated with single instances, with a node
group, or with the whole cluster, thereby affecting all the instances they
contain. The possible tags share the common structure::

  ganeti:watcher:autorepair:<type>

where ``<type>`` can have the following values:

* ``fix-storage``: allow disk replacement or fixing the backend without
  affecting the instance itself (e.g. a broken DRBD secondary)
* ``migrate``: allow instance migration
* ``failover``: allow instance reboot on the secondary node
* ``reinstall``: allow disks to be recreated and the instance to be
  reinstalled

Each tag in this list includes all the authorizations of the previous ones,
with ``fix-storage`` being the least powerful and ``reinstall`` being the
most powerful.
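
The authorization tags are set with the standard Ganeti tag commands at the
desired level. A hedged sketch, assuming a node group named ``group1`` and
an instance named ``inst1`` (both names are illustrative):

```shell
# Allow harep to fix storage problems for every instance in the cluster.
gnt-cluster add-tags ganeti:watcher:autorepair:fix-storage

# Allow migration for all instances in node group "group1"
# (illustrative name).
gnt-group add-tags group1 ganeti:watcher:autorepair:migrate

# Allow the full repair sequence, up to reinstallation, for "inst1"
# (illustrative name).
gnt-instance add-tags inst1 ganeti:watcher:autorepair:reinstall
```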

In case multiple autorepair tags act on the same instance, only one can
actually be active. The conflict is resolved according to the following
rules:

#. if multiple tags are on the same object, the least destructive takes
   precedence.

#. if the tags are across objects, the nearest tag wins.

Example: a cluster has instances I1 and I2, where I1 carries the
``failover`` tag while the cluster itself carries both ``fix-storage`` and
``reinstall``. The I1 instance will be allowed to ``failover``, the I2
instance only to ``fix-storage``.
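
The example above corresponds to the following tag assignments (instance
name I1 as in the text; the commands are a sketch of the standard Ganeti
tag interface):

```shell
# Cluster-wide: both the weakest and the strongest authorization.
gnt-cluster add-tags ganeti:watcher:autorepair:fix-storage \
                     ganeti:watcher:autorepair:reinstall

# Instance-level tag on I1: being nearer, it overrides the cluster tags,
# so I1 is authorized up to "failover".
gnt-instance add-tags I1 ganeti:watcher:autorepair:failover

# I2 carries no instance-level tag; the two cluster tags conflict on the
# same object, so the least destructive one ("fix-storage") applies to I2.
```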


OPTIONS
-------

The options that can be passed to the program are as follows:

-L *socket*, \--luxi=*socket*
  collect data via Luxi, optionally using the given *socket* path.

\--job-delay=*seconds*
  insert this much delay before the execution of repair jobs, to allow the
  tool to continue processing instances.

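Typical invocations might look as follows; the socket path below is an
illustrative assumption, not a documented Ganeti default.

```shell
# Report-only run against the default Luxi socket.
harep

# Collect data from an explicit Luxi socket (path is an assumption) and
# wait 30 seconds before submitting each repair job.
harep --luxi=/var/run/ganeti/socket/ganeti-query --job-delay=30
```
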
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: