ganeti-watcher(8) Ganeti | Version @GANETI_VERSION@
===================================================

Name
----

ganeti-watcher - Ganeti cluster watcher

Synopsis
--------

**ganeti-watcher** [``--debug``]
[``--job-age=``*age*]
[``--ignore-pause``]

DESCRIPTION
-----------

**ganeti-watcher** is a periodically run script which is
responsible for keeping the instances in the correct status. It has
two separate functions, one for the master node and another one
that runs on every node.

If the watcher is disabled at cluster level (via the
**gnt-cluster watcher pause** command), it will exit without doing
anything. The cluster-level pause can be overridden via the
``--ignore-pause`` option, for example when the watcher is paused
for maintenance but the administrator wants to run it just once.

The ``--debug`` option will increase the verbosity of the watcher
and also activate logging to standard error.

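As an illustration of the pause override, the following sketch pauses
the watcher for a maintenance window, runs it once by hand, and then
resumes it. The ``run`` helper, the ``one_off_watcher_run`` function
and the ``DRY_RUN`` variable are hypothetical names introduced here;
the ``1h`` duration is only an example value, and
**gnt-cluster watcher continue** is assumed to be the matching resume
command.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper: print each command, and execute it only when
  # DRY_RUN is unset or empty.
  run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
      "$@"
    fi
  }

  # Sketch of a one-off watcher run during a maintenance window.
  one_off_watcher_run() {
    # Pause the watcher cluster-wide (the duration is an example value).
    run gnt-cluster watcher pause 1h
    # ... maintenance work would happen here ...
    # Run the watcher once despite the pause, logging to standard error.
    run ganeti-watcher --ignore-pause --debug
    # Resume the normal periodic runs (assumed resume command).
    run gnt-cluster watcher continue
  }

Calling ``DRY_RUN=1 one_off_watcher_run`` only prints the three
commands, which may be useful for checking the sequence before running
it on a real cluster.
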
Master operations
~~~~~~~~~~~~~~~~~

On the master node, the watcher's primary function is to keep all
instances which are marked as *up* in the configuration file
running, by trying to start them a limited number of times.

Another function is to "repair" DRBD links by reactivating the
block devices of instances which have secondaries on nodes that
have been rebooted.

The watcher will also archive old jobs (older than the age given
via the ``--job-age`` option, which defaults to 6 hours), in order
to keep the job queue manageable.

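To make the age cutoff concrete, the sketch below converts an age
value into seconds and checks whether a job is older than that age.
It is not the watcher's own code: the function names are hypothetical,
and the accepted syntax (a number with an optional ``s``, ``m``, ``h``,
``d`` or ``w`` suffix) is an assumption of this example, since the
exact format of *age* is not described here.

.. code-block:: bash

  #!/bin/bash
  # Convert an age such as "6h" into seconds. The suffix set
  # (s, m, h, d, w) is an assumption of this sketch.
  age_to_seconds() {
    local age=$1 number unit
    case $age in
      *[smhdw]) number=${age%?}; unit=${age#"$number"} ;;
      *)        number=$age;     unit=s ;;
    esac
    # Reject empty or non-numeric values before doing arithmetic.
    case $number in
      ''|*[!0-9]*) echo "invalid age: $age" >&2; return 1 ;;
    esac
    # 10# forces base 10 so that values like "08" are not read as octal.
    case $unit in
      s) echo $((10#$number)) ;;
      m) echo $((10#$number * 60)) ;;
      h) echo $((10#$number * 3600)) ;;
      d) echo $((10#$number * 86400)) ;;
      w) echo $((10#$number * 604800)) ;;
    esac
  }

  # Succeed if a job that ended at END_TIME is older than AGE at NOW
  # (both times given as seconds since the epoch).
  is_archivable() {
    local end_time=$1 now=$2 age=$3 max_age
    max_age=$(age_to_seconds "$age") || return 2
    [ $((now - end_time)) -gt "$max_age" ]
  }

With the default of 6 hours, ``age_to_seconds 6h`` yields 21600
seconds, so under these assumptions a job that finished more than
21600 seconds ago would be a candidate for archiving.
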
Node operations
~~~~~~~~~~~~~~~

The watcher will restart any daemons that are down and that should
be running on the current node.

In addition, it will execute any scripts which exist under the
"watcher" directory in the Ganeti hooks directory
(``@SYSCONFDIR@/ganeti/hooks``). This should be used for lightweight
actions, like starting any extra daemons.

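A lightweight hook of this kind could look like the following sketch,
which starts an extra daemon only when it is not already running. The
file name, the helper functions and the PID file convention are all
hypothetical; the naming and permission rules that hook scripts must
follow are not covered here.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical hook, for example installed as
  # @SYSCONFDIR@/ganeti/hooks/watcher/50-extra-daemon

  # Succeed if the PID recorded in the given file belongs to a live
  # process. kill -0 only tests for existence; it assumes the caller
  # may signal the process (for example when running as root).
  is_running() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile" 2>/dev/null)
    case $pid in
      ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
  }

  # Run the given start command unless the PID file points at a live
  # process.
  ensure_running() {
    local pidfile=$1
    shift
    if is_running "$pidfile"; then
      return 0
    fi
    "$@"
  }

A hook built this way would end with a call such as
``ensure_running /var/run/mydaemon.pid /etc/init.d/mydaemon start``,
where both paths are placeholders. Keeping the check to a single PID
lookup keeps the hook cheap, in line with the advice to use hooks only
for lightweight actions.
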
If the cluster parameter ``maintain_node_health`` is enabled, then the
watcher will also shut down instances and DRBD devices if the node is
declared as offline by known master candidates.

The watcher does synchronous queries but will submit jobs for
executing the changes. Due to locking, the jobs may execute much
later than the watcher submits them.

FILES
-----

The command has a set of state files (one per group) located at
``@LOCALSTATEDIR@/lib/ganeti/watcher.GROUP-UUID.data`` (only used on the
master) and a log file at
``@LOCALSTATEDIR@/log/ganeti/watcher.log``. Removing any of these files
will not affect correct operation; removing a state file will just
cause the restart counters for the instances to reset to zero, and
mark nodes as freshly rebooted (so for example DRBD minors will be
re-activated).

In some cases it is even desirable to reset the watcher state, for
example after maintenance actions, or when you want to simulate the
reboot of all nodes. In this case, you can remove all state files:

.. code-block:: bash

  rm -f @LOCALSTATEDIR@/lib/ganeti/watcher.*.data
  rm -f @LOCALSTATEDIR@/lib/ganeti/watcher.*.instance-status
  rm -f @LOCALSTATEDIR@/lib/ganeti/instance-status

And then re-run the watcher.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: