<!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
<!-- Fill in your name for FIRSTNAME and SURNAME. -->
<!-- Please adjust the date whenever revising the manpage. -->
<!ENTITY dhdate "<date>June 08, 2010</date>">
<!-- SECTION should be 1-8, maybe w/ subsection other parameters are
allowed: see man(7), man(1). -->
<!ENTITY dhsection "<manvolnum>8</manvolnum>">
<!ENTITY dhucpackage "<refentrytitle>ganeti-watcher</refentrytitle>">
<!ENTITY dhpackage "ganeti-watcher">
<!ENTITY debian "<productname>Debian</productname>">
<!ENTITY gnu "<acronym>GNU</acronym>">
<!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
<!ENTITY footer SYSTEM "footer.sgml">
<holder>Google Inc.</holder>
<refmiscinfo>Ganeti 2.2</refmiscinfo>
<refname>&dhpackage;</refname>
<refpurpose>Ganeti cluster watcher</refpurpose>
<command>&dhpackage; </command>
<arg><option>--debug</option></arg>
<arg><option>--job-age=<replaceable>age</replaceable></option></arg>
<arg><option>--ignore-pause</option></arg>
<title>DESCRIPTION</title>
<command>&dhpackage;</command> is a script that is run periodically
and is responsible for keeping the instances in the correct
status. It has two separate functions: one for the master node
and another that runs on every node.
If the watcher is disabled at cluster level (via
the <command>gnt-cluster watcher pause</command> command), it
will exit without doing anything. The cluster-level pause can be
overridden via the <option>--ignore-pause</option> option; this is
useful, for example, when the watcher needs to stay disabled during
maintenance but the administrator wants to run it just once.
The <option>--debug</option> option increases the verbosity
of the watcher and also enables logging to standard error.
<title>Master operations</title>
On the master node, the watcher's primary function is to keep
running all instances which are marked as <emphasis>up</emphasis>
in the configuration file, by trying to start them a limited
number of times.
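
The <quote>limited number of times</quote> behavior can be pictured with the following minimal Python sketch. It is an illustration only, not the watcher's actual code: the limit <literal>MAX_RESTART_ATTEMPTS</literal>, the function <literal>instances_to_start</literal> and the data layout are all hypothetical, and the real limit is not stated in this page.

```python
# Hypothetical limit; the real watcher's value is not given in this page.
MAX_RESTART_ATTEMPTS = 5


def instances_to_start(instances, restart_counts):
    """Return the names of instances that should get a start attempt.

    instances: dict mapping name -> (marked_up, running)
    restart_counts: dict mapping name -> attempts made so far (mutated)
    """
    to_start = []
    for name, (marked_up, running) in sorted(instances.items()):
        if not marked_up or running:
            # Running, or intentionally down: forget earlier attempts.
            restart_counts.pop(name, None)
            continue
        attempts = restart_counts.get(name, 0)
        if attempts >= MAX_RESTART_ATTEMPTS:
            continue  # Give up until the counter is reset.
        restart_counts[name] = attempts + 1
        to_start.append(name)
    return to_start
```

In this sketch an instance that keeps failing is retried on successive runs until its counter reaches the assumed limit, after which it is left alone.
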
Another function is to <quote>repair</quote> DRBD links by
reactivating the block devices of instances which have
secondaries on nodes that have been rebooted.
The watcher will also archive old jobs (older than the age
given via the <option>--job-age</option> option, which
defaults to 6 hours), in order to keep the job queue
manageable.
<title>Node operations</title>
The watcher will restart any daemons that are down and that are
appropriate for the current node.
In addition, it will execute any scripts which exist under the
<quote>watcher</quote> directory in the Ganeti hooks directory
(@SYSCONFDIR@/ganeti/hooks). This should be used only for
lightweight actions, such as starting extra daemons.
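
As a rough illustration of what <quote>execute any scripts under a directory</quote> can look like, here is a minimal Python sketch. It is not the watcher's implementation: the function name <literal>run_watcher_hooks</literal>, the name-sorted execution order, the choice to skip non-executable files and the handling of exit codes are all assumptions made for the example.

```python
import os
import subprocess


def run_watcher_hooks(hooks_dir):
    """Run each executable regular file in hooks_dir, in name order.

    Returns a list of (script_name, exit_code) pairs. The ordering and
    the skip rules are assumptions of this sketch.
    """
    results = []
    if not os.path.isdir(hooks_dir):
        return results
    for name in sorted(os.listdir(hooks_dir)):
        path = os.path.join(hooks_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # Skip directories and non-executable files.
        # A failing script is recorded but does not stop the others.
        proc = subprocess.run([path], check=False)
        results.append((name, proc.returncode))
    return results
```

Each script runs to completion before the next one starts in this sketch, which is one reason such scripts should stay lightweight.
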
If the cluster
parameter <literal>maintain_node_health</literal> is enabled,
then the watcher will also shut down instances and DRBD devices
if the node is declared as offline by known master candidates.
The watcher performs its queries synchronously but submits jobs
to carry out any changes. Because of locking, those jobs may
execute much later than the watcher submits them.
The command has a state file located at
<filename>@LOCALSTATEDIR@/lib/ganeti/watcher.data</filename>
(only used on the master) and a log file at
<filename>@LOCALSTATEDIR@/log/ganeti/watcher.log</filename>. Removal of
either file will not affect correct operation; the removal of
the state file will just cause the restart counters for the
instances to reset to zero.
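
The following minimal Python sketch shows why a missing state file is harmless: a loader that treats an absent or unreadable file as <quote>no counters yet</quote> starts every instance at zero again. The JSON format and the function names <literal>load_restart_counts</literal> and <literal>save_restart_counts</literal> are assumptions for illustration; the actual on-disk format of the state file is not described in this page.

```python
import json


def load_restart_counts(path):
    """Return the saved restart counters, or an empty dict if none exist."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (FileNotFoundError, ValueError):
        # Missing or unparsable state: all counters start at zero again.
        return {}
    return data if isinstance(data, dict) else {}


def save_restart_counts(path, counts):
    """Write the counters back so the next run can continue counting."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(counts, handle)
```
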
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document:nil
sgml-default-dtd-file:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil