ganeti-watcher(8) Ganeti | Version @GANETI_VERSION@
===================================================

Name
----

ganeti-watcher - Ganeti cluster watcher

Synopsis
--------

**ganeti-watcher** [``--debug``]
[``--job-age=``*age*]
[``--ignore-pause``]

DESCRIPTION
-----------

The **ganeti-watcher** is a periodically run script which is
responsible for keeping the instances in the correct state. It has
two separate functions, one for the master node and another one
that runs on every node.

If the watcher is disabled at cluster level (via the
**gnt-cluster watcher pause** command), it will exit without doing
anything. The cluster-level pause can be overridden via the
``--ignore-pause`` option, for example when the watcher is paused
for the duration of a maintenance but the administrator wants to
run it just once.

The ``--debug`` option will increase the verbosity of the watcher
and also activate logging to the standard error.
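
As a rough illustration of how the options above combine, the following
Python sketch assembles a command line from the three documented
options. The helper name ``build_watcher_argv`` is hypothetical and not
part of Ganeti; only the option names come from the synopsis, and the
value given to ``--job-age`` is passed through unchanged because its
accepted format is not described here.

```python
def build_watcher_argv(debug=False, job_age=None, ignore_pause=False):
    """Return an argv list for ganeti-watcher (hypothetical helper)."""
    argv = ["ganeti-watcher"]
    if debug:
        # More verbose output, plus logging to standard error.
        argv.append("--debug")
    if job_age is not None:
        # The value is passed through as given by the caller.
        argv.append("--job-age=%s" % job_age)
    if ignore_pause:
        # Run once even if the watcher is paused at cluster level.
        argv.append("--ignore-pause")
    return argv


if __name__ == "__main__":
    # One-off verbose run during a maintenance pause.
    print(" ".join(build_watcher_argv(debug=True, ignore_pause=True)))
```

The resulting list could be handed to ``subprocess.run`` on a node where
the watcher is installed.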

Master operations
~~~~~~~~~~~~~~~~~

On the master node, the watcher's primary function is to keep running
all instances which are marked as *up* in the configuration file, by
trying to start them a limited number of times.

Another function is to "repair" DRBD links by reactivating the
block devices of instances which have secondaries on nodes that
have been rebooted.

The watcher will also archive old jobs (older than the age given
via the ``--job-age`` option, which defaults to 6 hours), in order
to keep the job queue manageable.
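
The age-based selection of jobs to archive can be pictured with the
minimal Python sketch below. It is not the watcher's implementation:
the function name ``jobs_to_archive``, the ``(job_id, finished_at)``
record layout, and the choice to measure age from the finish time are
all assumptions made for the example. Only the 6 hour default comes
from the text above.

```python
from datetime import datetime, timedelta

# Default age stated in this manual page.
DEFAULT_JOB_AGE = timedelta(hours=6)


def jobs_to_archive(jobs, now, max_age=DEFAULT_JOB_AGE):
    """Return the IDs of jobs that are strictly older than max_age.

    jobs is an iterable of (job_id, finished_at) pairs; this layout is
    an assumption for illustration only.
    """
    cutoff = now - max_age
    return [job_id for job_id, finished_at in jobs if finished_at < cutoff]


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0)
    jobs = [
        (1, now - timedelta(hours=7)),  # older than the default age
        (2, now - timedelta(hours=1)),  # recent, kept in the queue
    ]
    print(jobs_to_archive(jobs, now))
```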

Node operations
~~~~~~~~~~~~~~~

The watcher will restart any daemons that are down and that are
appropriate for the current node.

In addition, it will execute any scripts which exist under the
"watcher" directory in the Ganeti hooks directory
(``@SYSCONFDIR@/ganeti/hooks``). This should be used for lightweight
actions, like starting any extra daemons.

If the cluster parameter ``maintain_node_health`` is enabled, then the
watcher will also shut down instances and DRBD devices if the node is
declared as offline by known master candidates.

The watcher does synchronous queries but will submit jobs for
executing the changes. Due to locking, it could be that the jobs
execute much later than the watcher submits them.

FILES
-----

The command has a state file located at
``@LOCALSTATEDIR@/lib/ganeti/watcher.data`` (only used on the master)
and a log file at ``@LOCALSTATEDIR@/log/ganeti/watcher.log``. Removal
of either file will not affect correct operation; the removal of the
state file will just cause the restart counters for the instances to
reset to zero.
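
Why removing the state file is harmless can be seen in the Python
sketch below, which models per-instance restart counters kept in a
file. It is an illustration under stated assumptions, not the watcher's
code: the JSON layout, the function names and the ``MAX_RESTARTS``
value are hypothetical, since this page specifies neither the file
format nor the actual retry limit.

```python
import json
import os

# Hypothetical limit; the real value is not given in this manual page.
MAX_RESTARTS = 5


def load_counters(path):
    """Load restart counters; a missing file means all counters are zero."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


def should_restart(counters, instance):
    """Allow a restart only while the instance is under the limit."""
    return counters.get(instance, 0) < MAX_RESTARTS


def record_restart(counters, instance):
    """Count one more restart attempt for the instance."""
    counters[instance] = counters.get(instance, 0) + 1


def save_counters(path, counters):
    """Write the counters back so the next run sees them."""
    with open(path, "w") as handle:
        json.dump(counters, handle)
```

In this model, deleting the file makes ``load_counters`` return an
empty mapping, so every instance becomes eligible for restart attempts
again.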