+ gnt-node evacuate -I IALLOCATOR_SCRIPT NODE
+ gnt-node evacuate -n DESTINATION_NODE NODE
+
+The first version will compute the new secondary for each instance in
+turn using the given iallocator script, whereas the second one will
+simply move all instances to DESTINATION_NODE.
+
+Removal
++++++++
+
+Once a node no longer has any instances (neither primary nor secondary),
+it's easy to remove it from the cluster::
+
+ gnt-node remove NODE_NAME
+
+This will deconfigure the node, stop the ganeti daemons on it and leave
+it, as much as possible, in the state it was in before it joined the
+cluster.
+
+Storage handling
+++++++++++++++++
+
+When using LVM (either standalone or with DRBD), debugging and fixing
+storage errors can become tedious. Furthermore, even file-based storage
+can become complicated to handle manually on many hosts. Ganeti provides
+a couple of commands to help with automation.
+
+Logical volumes
+~~~~~~~~~~~~~~~
+
+The ``volumes`` command is specific to LVM handling. It lists the
+logical volumes on a given node, or on all nodes, together with their
+association to instances::
+
+ node1# gnt-node volumes
+ Node PhysDev VG Name Size Instance
+ node1 /dev/sdb1 xenvg e61fbc97-….disk0 512M instance17
+ node1 /dev/sdb1 xenvg ebd1a7d1-….disk0 512M instance19
+ node2 /dev/sdb1 xenvg 0af08a3d-….disk0 512M instance20
+ node2 /dev/sdb1 xenvg cc012285-….disk0 512M instance16
+ node2 /dev/sdb1 xenvg f0fac192-….disk0 512M instance18
+
+The above command maps each logical volume to its volume group and
+underlying physical volume, and (where applicable) to an instance.
+
+.. _storage-units-label:
+
+Generalized storage handling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 2.1
+
+Starting with Ganeti 2.1, a new storage framework has been implemented
+that tries to abstract the handling of the storage type the cluster
+uses.
+
+The first operation is listing the backend storage units and their
+space usage::
+
+ node1# gnt-node list-storage
+ Node Name Size Used Free
+ node1 /dev/sda7 673.8G 0M 673.8G
+ node1 /dev/sdb1 698.6G 1.5G 697.1G
+ node2 /dev/sda7 673.8G 0M 673.8G
+ node2 /dev/sdb1 698.6G 1.0G 697.6G
+
+The default is to list LVM physical volumes. It's also possible to list
+the LVM volume groups::
+
+ node1# gnt-node list-storage -t lvm-vg
+ Node Name Size
+ node1 xenvg 1.3T
+ node2 xenvg 1.3T
+
+Next is repairing storage units, which is currently only implemented for
+volume groups and does the equivalent of ``vgreduce --removemissing``::
+
+ node1# gnt-node repair-storage node2 lvm-vg xenvg
+ Sun Oct 25 22:21:45 2009 Repairing storage unit 'xenvg' on node2 ...
+
+Last is the modification of volume properties, which is (again) only
+implemented for LVM physical volumes and allows toggling the
+``allocatable`` value::
+
+ node1# gnt-node modify-storage --allocatable=no node2 lvm-pv /dev/sdb1
+
+Use of the storage commands
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All these commands are needed when recovering a node from a disk failure
+(a combined example is sketched after the list):
+
+- first, we need to recover from complete LVM failure (due to missing
+ disk), by running the ``repair-storage`` command
+- second, we need to change allocation on any partially-broken disk
+ (i.e. LVM still sees it, but it has bad blocks) by running
+ ``modify-storage``
+- then we can evacuate the instances as needed
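+
+A possible recovery session, assuming ``node2`` lost the disk backing
+part of the ``xenvg`` volume group and that ``/dev/sdb1`` is the
+partially-broken physical volume (node and device names are purely
+illustrative, and either evacuation form shown earlier can be used)::
+
+  node1# gnt-node repair-storage node2 lvm-vg xenvg
+  node1# gnt-node modify-storage --allocatable=no node2 lvm-pv /dev/sdb1
+  node1# gnt-node evacuate -I IALLOCATOR_SCRIPT node2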
+
+
+Cluster operations
+------------------
+
+Besides the cluster initialisation command (which is detailed in the
+:doc:`install` document) and the master failover command which is
+explained under node handling, there are a couple of other cluster
+operations available.
+
+.. _cluster-config-label:
+
+Standard operations
++++++++++++++++++++
+
+One of the few commands that can be run on any node (not only the
+master) is the ``getmaster`` command::
+
+ node2# gnt-cluster getmaster
+ node1.example.com
+ node2#
+
+It is possible to query and change global cluster parameters via the
+``info`` and ``modify`` commands::
+
+ node1# gnt-cluster info
+ Cluster name: cluster.example.com
+ Cluster UUID: 07805e6f-f0af-4310-95f1-572862ee939c
+ Creation time: 2009-09-25 05:04:15
+ Modification time: 2009-10-18 22:11:47
+ Master node: node1.example.com
+ Architecture (this node): 64bit (x86_64)
+ …
+ Tags: foo
+ Default hypervisor: xen-pvm
+ Enabled hypervisors: xen-pvm
+ Hypervisor parameters:
+ - xen-pvm:
+ root_path: /dev/sda1
+ …
+ Cluster parameters:
+ - candidate pool size: 10
+ …
+ Default instance parameters:
+ - default:
+ memory: 128
+ …
+ Default nic parameters:
+ - default:
+ link: xen-br0
+ …
+
+The various parameters above can be changed via the ``modify`` command
+as follows (a few illustrative invocations are shown after the list):
+
+- the hypervisor parameters can be changed via ``modify -H
+  xen-pvm:root_path=…``, and so on for other hypervisors, keys and values
+- the "default instance parameters" are changeable via ``modify -B
+ parameter=value…`` syntax
+- the cluster parameters are changeable via separate options to the
+ modify command (e.g. ``--candidate-pool-size``, etc.)
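+
+For example, the following invocations (with purely illustrative values)
+would change the Xen root path, the default instance memory and the
+candidate pool size, respectively::
+
+  node1# gnt-cluster modify -H xen-pvm:root_path=/dev/xvda1
+  node1# gnt-cluster modify -B memory=256
+  node1# gnt-cluster modify --candidate-pool-size=12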
+
+For a detailed option list, see the :manpage:`gnt-cluster(8)` man page.
+
+The cluster version can be obtained via the ``version`` command::
+
+  node1# gnt-cluster version
+ Software version: 2.1.0
+ Internode protocol: 20
+ Configuration format: 2010000
+ OS api version: 15
+ Export interface: 0
+
+This is not very useful except when debugging Ganeti.
+
+Global node commands
+++++++++++++++++++++
+
+There are two commands provided for replicating files to all nodes of a
+cluster and for running commands on all the nodes::
+
+ node1# gnt-cluster copyfile /path/to/file
+ node1# gnt-cluster command ls -l /path/to/file
+
+These are simple wrappers over scp/ssh, and more advanced usage can be
+obtained using :manpage:`dsh(1)` and similar commands. They are
+nevertheless useful, for example, to update an OS script from the master
+node.
+
+Cluster verification
+++++++++++++++++++++
+
+There are three commands that relate to global cluster checks. The first
+one is ``verify``, which gives an overview of the cluster state,
+highlighting any issues. In normal operation, this command should return
+no ``ERROR`` messages::
+
+ node1# gnt-cluster verify
+ Sun Oct 25 23:08:58 2009 * Verifying global settings
+ Sun Oct 25 23:08:58 2009 * Gathering data (2 nodes)
+ Sun Oct 25 23:09:00 2009 * Verifying node status
+ Sun Oct 25 23:09:00 2009 * Verifying instance status
+ Sun Oct 25 23:09:00 2009 * Verifying orphan volumes
+ Sun Oct 25 23:09:00 2009 * Verifying remaining instances
+ Sun Oct 25 23:09:00 2009 * Verifying N+1 Memory redundancy
+ Sun Oct 25 23:09:00 2009 * Other Notes
+ Sun Oct 25 23:09:00 2009 - NOTICE: 5 non-redundant instance(s) found.
+ Sun Oct 25 23:09:00 2009 * Hooks Results
+
+The second command is ``verify-disks``, which checks that the instances'
+disks have the correct status based on the desired instance state
+(up/down)::
+
+ node1# gnt-cluster verify-disks
+
+Note that this command will show no output when disks are healthy.
+
+The last command is used to repair any discrepancies between Ganeti's
+recorded disk sizes and the actual disk sizes (disk size information is
+needed for proper activation and growth of DRBD-based disks)::
+
+ node1# gnt-cluster repair-disk-sizes
+ Sun Oct 25 23:13:16 2009 - INFO: Disk 0 of instance instance1 has mismatched size, correcting: recorded 512, actual 2048
+ Sun Oct 25 23:13:17 2009 - WARNING: Invalid result from node node4, ignoring node results
+
+The above output shows one instance with a mismatched recorded disk
+size, and a node which returned invalid data; all primary instances of
+that node were therefore ignored.
+
+Configuration redistribution
+++++++++++++++++++++++++++++
+
+If the verify command complains about file mismatches between the master
+and other nodes (due to node problems or because you manually modified
+configuration files), you can force a push of the master configuration
+to all other nodes via the ``redist-conf`` command::
+
+ node1# gnt-cluster redist-conf
+ node1#
+
+This command will be silent unless there are problems sending updates to
+the other nodes.
+
+
+Cluster renaming
+++++++++++++++++
+
+It is possible to rename a cluster, or to change its IP address, via the
+``rename`` command. If only the IP address has changed, you need to pass
+the current name, and Ganeti will realise that its IP address has
+changed::
+
+ node1# gnt-cluster rename cluster.example.com
+ This will rename the cluster to 'cluster.example.com'. If
+ you are connected over the network to the cluster name, the operation
+ is very dangerous as the IP address will be removed from the node and
+ the change may not go through. Continue?
+ y/[n]/?: y
+ Failure: prerequisites not met for this operation:
+ Neither the name nor the IP address of the cluster has changed
+
+In the above output, neither value has changed since the cluster
+initialisation, so the operation is not performed.
+
+Queue operations
+++++++++++++++++
+
+The job queue execution in Ganeti 2.0 and higher can be inspected,
+suspended and resumed via the ``queue`` command::
+
+ node1~# gnt-cluster queue info
+ The drain flag is unset
+ node1~# gnt-cluster queue drain
+ node1~# gnt-instance stop instance1
+ Failed to submit job for instance1: Job queue is drained, refusing job
+ node1~# gnt-cluster queue info
+ The drain flag is set
+ node1~# gnt-cluster queue undrain
+
+This is most useful if you have an active cluster and you need to
+upgrade the Ganeti software, or simply restart the software on any node
+(a condensed example follows the list):
+
+#. suspend the queue via ``queue drain``
+#. wait until there are no more running jobs via ``gnt-job list``
+#. restart the master or another node, or upgrade the software
+#. resume the queue via ``queue undrain``
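+
+In condensed form, such a maintenance session could look like this (the
+waiting step is shown in parentheses and the actual upgrade or restart
+is left out)::
+
+  node1# gnt-cluster queue drain
+  node1# gnt-job list
+  (wait until no jobs are listed as running, then upgrade or restart)
+  node1# gnt-cluster queue undrain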
+
+.. note:: this command only stores a local flag file, and if you fail
+   over the master, it will have no effect on the new master.
+
+
+Watcher control
++++++++++++++++
+
+The :manpage:`ganeti-watcher` is a program, usually scheduled via
+``cron``, that takes care of cluster maintenance operations (restarting
+downed instances, activating down DRBD disks, etc.). However, during
+maintenance and troubleshooting it can get in your way, and disabling it
+by commenting out the cron job is not ideal, as this can easily be
+forgotten. Thus there are some commands for automated control of the
+watcher: ``pause``, ``info`` and ``continue``::
+
+ node1~# gnt-cluster watcher info
+ The watcher is not paused.
+ node1~# gnt-cluster watcher pause 1h
+ The watcher is paused until Mon Oct 26 00:30:37 2009.
+ node1~# gnt-cluster watcher info
+ The watcher is paused until Mon Oct 26 00:30:37 2009.
+ node1~# ganeti-watcher -d
+ 2009-10-25 23:30:47,984: pid=28867 ganeti-watcher:486 DEBUG Pause has been set, exiting
+ node1~# gnt-cluster watcher continue
+ The watcher is no longer paused.
+ node1~# ganeti-watcher -d
+ 2009-10-25 23:31:04,789: pid=28976 ganeti-watcher:345 DEBUG Archived 0 jobs, left 0
+ 2009-10-25 23:31:05,884: pid=28976 ganeti-watcher:280 DEBUG Got data from cluster, writing instance status file
+ 2009-10-25 23:31:06,061: pid=28976 ganeti-watcher:150 DEBUG Data didn't change, just touching status file
+ node1~# gnt-cluster watcher info
+ The watcher is not paused.
+ node1~#
+
+The exact details of the argument to the ``pause`` command are available
+in the manpage.
+
+.. note:: this command only stores a local flag file, and if you fail
+   over the master, it will have no effect on the new master.
+
+Node auto-maintenance
++++++++++++++++++++++
+
+If the cluster parameter ``maintain_node_health`` is enabled (see the
+manpage for :command:`gnt-cluster`, the init and modify subcommands),
+then the following will happen automatically:
+
+- the watcher will shut down any instances running on offline nodes
+- the watcher will deactivate any DRBD devices on offline nodes
+
+In the future, more actions are planned, so only enable this parameter
+if the nodes are completely dedicated to Ganeti; otherwise it might be
+possible to lose data due to auto-maintenance actions.
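+
+The parameter itself can be toggled at cluster init time or later via
+``gnt-cluster modify``; assuming the standard option name (check the
+:command:`gnt-cluster` manpage for your version), enabling it would look
+like this::
+
+  node1# gnt-cluster modify --maintain-node-health=yes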
+
+Removing a cluster entirely
++++++++++++++++++++++++++++
+
+The usual method to clean up a cluster is to run ``gnt-cluster
+destroy``; however, if the Ganeti installation is broken in any way,
+this will not run.
+
+In such a case, it is possible to manually clean up most, if not all,
+traces of a cluster installation by following these steps on all of the
+nodes:
+
+1. Shut down all instances. This depends on the virtualisation method
+ used (Xen, KVM, etc.):