ganeti-watcher(8) Ganeti | Version @GANETI_VERSION@
===================================================

Name
----

ganeti-watcher - Ganeti cluster watcher

Synopsis
--------

**ganeti-watcher** [``--debug``]
[``--job-age=``*age*]
[``--ignore-pause``]

DESCRIPTION
-----------

The **ganeti-watcher** is a periodically run script that is
responsible for keeping the instances in the correct status. It has
two separate functions: one for the master node and another one
that runs on every node.

If the watcher is disabled at cluster level (via the
**gnt-cluster watcher pause** command), it will exit without doing
anything. The cluster-level pause can be overridden via the
``--ignore-pause`` option, for example if the watcher needs to be
disabled in general during maintenance but the administrator wants
to run it just once.

The ``--debug`` option will increase the verbosity of the watcher
and also activate logging to the standard error.
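
As a sketch of the run-once case described above, the watcher can be
invoked by hand with both options while the cluster-level pause stays
in place. The ``run_watcher_once`` function and the ``WATCHER``
variable are hypothetical names used only for illustration; ``WATCHER``
exists so that the call can be dry-run with ``WATCHER=echo``.

.. code-block:: bash

  # Hypothetical helper: run the watcher once, ignoring the
  # cluster-level pause and logging verbosely to standard error.
  # WATCHER defaults to ganeti-watcher (assumed to be in PATH).
  run_watcher_once() {
    "${WATCHER:-ganeti-watcher}" --ignore-pause --debug "$@"
  }

Calling ``run_watcher_once`` on a paused cluster performs a single
watcher run; the pause itself is not lifted.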

Master operations
~~~~~~~~~~~~~~~~~

On the master node, the watcher's primary function is to keep running
all instances which are marked as *up* in the configuration file, by
trying to start them a limited number of times.

Another function is to "repair" DRBD links by reactivating the
block devices of instances which have secondaries on nodes that
have been rebooted.

The watcher will also archive old jobs (older than the age given
via the ``--job-age`` option, which defaults to 6 hours), in order
to keep the job queue manageable.

Node operations
~~~~~~~~~~~~~~~

The watcher will restart any down daemons that are appropriate for
the current node.

In addition, it will execute any scripts which exist under the
"watcher" directory in the Ganeti hooks directory
(``@SYSCONFDIR@/ganeti/hooks``). This should be used for lightweight
actions, like starting any extra daemons.
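
A lightweight hook of this kind might look like the following sketch.
The file name, the daemon, its pid file and its start command are all
hypothetical; the only thing taken from the text above is that scripts
placed under ``@SYSCONFDIR@/ganeti/hooks/watcher`` are run by the
watcher.

.. code-block:: bash

  #!/bin/sh
  # Hypothetical hook, for example
  # @SYSCONFDIR@/ganeti/hooks/watcher/50-extra-daemon

  # ensure_running PIDFILE COMMAND [ARGS...]
  # Runs COMMAND only if PIDFILE does not name a process that can be
  # signalled (kill -0 sends no signal, it only checks).
  ensure_running() {
    pidfile=$1
    shift
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
      return 0  # already running, nothing to do
    fi
    "$@"
  }

  # Example call (daemon name and paths are placeholders):
  # ensure_running /var/run/mydaemon.pid /etc/init.d/mydaemon start

Keeping the check to a pid file lookup keeps the hook cheap, which
matters because it runs on every watcher invocation.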

If the cluster parameter ``maintain_node_health`` is enabled, then the
watcher will also shut down instances and DRBD devices if the node is
declared as offline by known master candidates.

The watcher does synchronous queries but will submit jobs for
executing the changes. Due to locking, it could be that the jobs
execute much later than the watcher submits them.

FILES
-----

The command has a set of state files (one per node group) located at
``@LOCALSTATEDIR@/lib/ganeti/watcher.GROUP-UUID.data`` (only used on the
master) and a log file at
``@LOCALSTATEDIR@/log/ganeti/watcher.log``. Removing any of these files
will not affect correct operation; removing a state file will
just cause the restart counters for the instances to reset to zero and
mark nodes as freshly rebooted (so, for example, DRBD minors will be
re-activated).

In some cases, it's even desirable to reset the watcher state, for
example after maintenance actions, or when you want to simulate the
reboot of all nodes. In this case, you can remove all state files:

.. code-block:: bash

  rm -f @LOCALSTATEDIR@/lib/ganeti/watcher.*.data
  rm -f @LOCALSTATEDIR@/lib/ganeti/watcher.*.instance-status
  rm -f @LOCALSTATEDIR@/lib/ganeti/instance-status

And then re-run the watcher.
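
The reset and re-run sequence above can also be wrapped in a small
helper, sketched here. The function name ``reset_watcher_state`` is
hypothetical, and the state directory is passed as an argument because
``@LOCALSTATEDIR@`` is a build-time placeholder; ``/var/lib/ganeti`` in
the usage comment is an assumption about a typical installation.

.. code-block:: bash

  # reset_watcher_state STATEDIR
  # Removes the watcher state files under STATEDIR and prints each
  # path that was removed.
  reset_watcher_state() {
    statedir=$1
    for f in "$statedir"/watcher.*.data \
             "$statedir"/watcher.*.instance-status \
             "$statedir"/instance-status; do
      # A glob without matches stays literal, so skip missing paths.
      [ -e "$f" ] || continue
      rm -f -- "$f" && printf '%s\n' "$f"
    done
  }

  # Example (path is an assumption):
  # reset_watcher_state /var/lib/ganeti && ganeti-watcher

Printing the removed paths leaves a record of which node groups had
their restart counters reset.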

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: