file
The instance will use plain files as backend for its disks. No
redundancy is provided, and this is somewhat more difficult to
- configure for high performance.
+ configure for high performance. Note that for security reasons the
+ file storage directory must be listed under
+ ``/etc/ganeti/file-storage-paths``, and that file is not copied
+ automatically to all nodes by Ganeti.
+
+sharedfile
+ The instance will use plain files as backend, but Ganeti assumes that
+ those files will be available and in sync automatically on all nodes.
+ This allows live migration and failover of instances using this
+ method. As for ``file``, the file storage directory must be listed under
+ ``/etc/ganeti/file-storage-paths`` or Ganeti will refuse to create
+ instances under it.
plain
The instance will use LVM devices as backend for its disks. No
The instance will use Volumes inside a RADOS cluster as backend for its
disks. It will access them using the RADOS block device (RBD).
+ext
+ The instance will use an external storage provider. See
+ :manpage:`ganeti-extstorage-interface(7)` for how to implement one.
+
+
IAllocator
~~~~~~~~~~
- Arguments for the NICs of the instance; by default, a single-NIC
instance is created. The IP and/or bridge of the NIC can be changed
- via ``--nic 0:ip=IP,bridge=BRIDGE``
+ via ``--net 0:ip=IP,link=BRIDGE``
-See the manpage for gnt-instance for the detailed option list.
+See :manpage:`ganeti-instance(8)` for the detailed option list.
For example, if you want to create a highly available instance, with a
single disk of 50GB and the default memory size, having primary node
steps::
# instance is located on A, B
- $ gnt-instance replace -n %nodeC% %instance1%
+ $ gnt-instance replace-disks -n %nodeC% %instance1%
# instance has moved from (A, B) to (A, C)
# we now flip the primary/secondary nodes
$ gnt-instance migrate %instance1%
# instance lives on (C, A)
# we can then change A to D via:
- $ gnt-instance replace -n %nodeD% %instance1%
+ $ gnt-instance replace-disks -n %nodeD% %instance1%
Which brings it into the final configuration of ``(C, D)``. Note that we
needed to do two replace-disks operations (two copies of the instance
Otherwise, if you plan to re-create the cluster, you can just go ahead
and rerun ``gnt-cluster init``.
+Monitoring the cluster
+----------------------
+
+Starting with Ganeti 2.8, a monitoring daemon is available, providing
+information about the status and the performance of the system.
+
+The monitoring daemon runs on every node, listening on TCP port 1815. Each
+instance of the daemon provides information related to the node it is running
+on.
+
+.. include:: monitoring-query-format.rst
+
Tags handling
-------------
/cluster foo
/instances/instance1 owner:bar
+Autorepair
+----------
+
+The tool ``harep`` can be used to automatically fix some problems that are
+present in the cluster.
+
+It is mainly meant to be run regularly and automatically as a cron job.
+When executed, it does not fix all the issues of the cluster's instances
+immediately. Instead, it cycles each instance through a series of states,
+one per ``harep`` execution, and every state performs a step towards the
+resolution of the problem. This process goes on until the instance is
+brought back to a healthy state, or until the tool realizes that it cannot
+fix the instance and therefore marks it as being in a failure state.
+
+Allowing harep to act on the cluster
+++++++++++++++++++++++++++++++++++++
+
+By default, ``harep`` checks the status of the cluster but is not allowed to
+perform any modification. Modifications must be explicitly allowed by an
+appropriate use of tags. Tagging can be applied at various levels, and can
+enable different kinds of autorepair, as described below.
+
+All the tags that authorize ``harep`` to perform modifications follow this
+syntax::
+
+ ganeti:watcher:autorepair:<type>
+
+where ``<type>`` indicates the kind of intervention that can be performed. Every
+possible value of ``<type>`` includes all the authorizations of the
+previous one, plus its own. The possible values, in increasing order of
+severity, are:
+
+- ``fix-storage`` allows a disk replacement or another operation that
+ fixes the instance backend storage without affecting the instance
+ itself. This can for example recover from a broken drbd secondary, but
+ risks data loss if something is wrong on the primary but the secondary
+ was somehow recoverable.
+- ``migrate`` allows an instance migration. This can recover from a
+ drained primary, but can cause an instance crash in some cases (bugs).
+- ``failover`` allows instance reboot on the secondary. This can recover
+ from an offline primary, but the instance will lose its running state.
+- ``reinstall`` allows disks to be recreated and an instance to be
+ reinstalled. This can recover from the primary and secondary both being
+ offline, or from an offline primary in the case of non-redundant
+ instances. It causes data loss.
+
+These autorepair tags can be applied to a cluster, a node group or an instance,
+and will act where they are applied and on everything in the entity's sub-tree
+(e.g. a tag applied to a node group will apply to all the instances contained in
+that node group, but not to the rest of the cluster).
+
+If there are multiple ``ganeti:watcher:autorepair:<type>`` tags in an
+object (cluster, node group or instance), the least destructive tag
+takes precedence. When tags are set on several nested objects, the nearest
+tag wins. For example, if in a cluster with two instances, *I1* and
+*I2*, *I1* has ``failover``, and the cluster itself has both
+``fix-storage`` and ``reinstall``, *I1* will end up with ``failover``
+and *I2* with ``fix-storage``.
+
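The precedence rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not the logic ``harep`` itself uses; the helper names ``repair_types`` and ``effective_type`` are hypothetical, and the sketch assumes each object's tags are available as a plain list of strings.

```python
# Hypothetical helpers; not part of Ganeti or harep.
PREFIX = "ganeti:watcher:autorepair:"
# Repair types ordered from least to most destructive.
SEVERITY = ["fix-storage", "migrate", "failover", "reinstall"]


def repair_types(tags):
    """Return the autorepair types named by one object's tags."""
    found = []
    for tag in tags:
        if tag.startswith(PREFIX):
            kind = tag[len(PREFIX):]
            # Suspend and result tags share the prefix; skip them.
            if kind in SEVERITY:
                found.append(kind)
    return found


def effective_type(instance_tags, group_tags, cluster_tags):
    """Resolve the autorepair type that applies to one instance.

    Objects are checked from nearest (instance) to farthest (cluster).
    The first object carrying any autorepair tag decides, and within
    that object the least destructive type wins. Returns None when no
    object authorizes repairs.
    """
    for tags in (instance_tags, group_tags, cluster_tags):
        kinds = repair_types(tags)
        if kinds:
            return min(kinds, key=SEVERITY.index)
    return None


cluster = [PREFIX + "fix-storage", PREFIX + "reinstall"]
i1 = effective_type([PREFIX + "failover"], [], cluster)
i2 = effective_type([], [], cluster)
```

With the tags from the example above, ``i1`` resolves to ``failover`` (the instance's own tag is nearest) and ``i2`` to ``fix-storage`` (the least destructive of the two cluster tags).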
+Limiting harep
+++++++++++++++
+
+Sometimes it is useful to stop ``harep`` from performing its task temporarily
+without disrupting its configuration, that is, without removing the
+authorization tags. Suspend tags are provided for this purpose.
+
+Suspend tags can be added to a cluster, a node group or an instance, and act on
+the entity's entire sub-tree. No operation will be performed by ``harep`` on the
+instances protected by a suspend tag. Their syntax is as follows::
+
+ ganeti:watcher:autorepair:suspend[:<timestamp>]
+
+If there are multiple suspend tags in an object, the form without timestamp
+takes precedence (permanent suspension). If all the suspend tags of the object
+have a timestamp, the one with the highest timestamp takes precedence.
+
+Tags with a timestamp will be automatically removed once the time indicated by
+the timestamp has passed. Indefinite suspension tags have to be removed manually.
+
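As a rough illustration of how several suspend tags on one object combine, the following Python sketch applies the rules above. The function name ``suspension`` is hypothetical, and the sketch assumes that ``<timestamp>`` is an integer number of seconds since the epoch, which the text above does not state.

```python
# Hypothetical helper; not part of Ganeti or harep.
SUSPEND = "ganeti:watcher:autorepair:suspend"


def suspension(tags, now):
    """Return (suspended, until) for the tags of one object.

    ``until`` is None for a permanent suspension or when the object
    is not suspended. Timestamps are assumed to be integer seconds
    since the epoch (an assumption, not taken from the text).
    """
    permanent = False
    latest = None
    for tag in tags:
        if tag == SUSPEND:
            permanent = True
        elif tag.startswith(SUSPEND + ":"):
            stamp = int(tag[len(SUSPEND) + 1:])
            if latest is None or stamp > latest:
                latest = stamp
    if permanent:
        # The form without timestamp takes precedence.
        return True, None
    if latest is not None and latest > now:
        # Otherwise the highest timestamp decides.
        return True, latest
    return False, None
```

For example, an object carrying both ``ganeti:watcher:autorepair:suspend`` and a timestamped suspend tag is treated as permanently suspended, whatever the timestamp says.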
+Result reporting
+++++++++++++++++
+
+``harep`` reports the result of its actions both through its CLI and by
+adding tags to the instances it operated on. Such tags follow the syntax
+described here::
+
+ ganeti:watcher:autorepair:result:<type>:<id>:<timestamp>:<result>:<jobs>
+
+If this tag is present, a repair of type ``type`` has been performed on
+the instance and has been completed by ``timestamp``. The result is
+either ``success``, ``failure`` or ``enoperm``, and ``jobs`` is a
+``+``-separated list of jobs that were executed for this repair.
+
+An ``enoperm`` result is an error state due to permission problems. It
+is returned when the repair cannot proceed because it would require performing
+an operation that is not allowed by the ``ganeti:watcher:autorepair:<type>`` tag
+that defines the autorepair permissions of the instance.
+
+NB: if an instance repair ends up in a failure state, it will not be touched
+again by ``harep`` until it has been manually fixed by the system administrator
+and the ``ganeti:watcher:autorepair:result:failure:*`` tag has been manually
+removed.
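
A script that inspects instance tags could split a result tag into its fields along these lines. This is a minimal sketch: ``parse_result_tag`` and ``RepairResult`` are hypothetical names, the sample tag values are made up, and the sketch assumes that none of the fields contains a ``:`` character. The timestamp is kept as a string because its format is not specified above.

```python
from collections import namedtuple

# Hypothetical names; not part of Ganeti or harep.
RESULT_PREFIX = "ganeti:watcher:autorepair:result:"
RepairResult = namedtuple("RepairResult", "type id timestamp result jobs")


def parse_result_tag(tag):
    """Split a result tag into its fields, or return None.

    Assumes no field contains ':'; jobs are separated by '+'.
    """
    if not tag.startswith(RESULT_PREFIX):
        return None
    fields = tag[len(RESULT_PREFIX):].split(":")
    if len(fields) != 5:
        return None
    kind, repair_id, stamp, result, jobs = fields
    if result not in ("success", "failure", "enoperm"):
        return None
    job_list = jobs.split("+") if jobs else []
    return RepairResult(kind, repair_id, stamp, result, job_list)


# Made-up sample values, for illustration only.
sample = RESULT_PREFIX + "failover:abc123:1400000000:success:101+102"
parsed = parse_result_tag(sample)
```

Here ``parsed.type`` is ``failover``, ``parsed.result`` is ``success`` and ``parsed.jobs`` holds the two job IDs as strings. Tags that do not match the syntax yield ``None``.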
Job operations
--------------
++++++++++
The ``cfgupgrade`` tool is used to upgrade between major (and minor)
-Ganeti versions. Point-releases are usually transparent for the admin.
+Ganeti versions, and to roll back. Point-releases are usually
+transparent for the admin.
More information about the upgrade procedure is listed on the wiki at
http://code.google.com/p/ganeti/wiki/UpgradeNotes.
See :doc:`separate documentation for move-instance <move-instance>`.
+users-setup
++++++++++++
+
+Ganeti can either be run entirely as root, or with every daemon running as
+its own specific user (if the parameters ``--with-user-prefix`` and/or
+``--with-group-prefix`` have been specified at ``./configure``-time).
+
+If split users are activated, they are required to exist on the system,
+and they need to belong to the proper groups in order for the access
+permissions to files and programs to be correct.
+
+The ``users-setup`` tool, when run, takes care of setting up the proper
+users and groups.
+
+The tool does not accept any parameters, and requires root permissions to run.
+
.. TODO: document cluster-merge tool