Add documentation and modify manpages for the RBD disk template.
Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
versions 0.11.X or above have shown good behavior).
- `DRBD <http://www.drbd.org/>`_, kernel module and userspace utils,
version 8.0.7 or above
+- `RBD <http://ceph.newdream.net/>`_, kernel modules (rbd.ko/libceph.ko)
+ and userspace utils (ceph-common)
- `LVM2 <http://sourceware.org/lvm2/>`_
- `OpenSSH <http://www.openssh.com/portable.html>`_
- `bridge utilities <http://www.linuxfoundation.org/en/Net:Bridge>`_
usually they can be installed via the standard package manager. Also
many of them will already be installed on a standard machine. On
Debian/Ubuntu, you can use this command line to install all required
-packages, except for DRBD and Xen::
+packages, except for RBD, DRBD and Xen::
$ apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
ndisc6 python python-pyopenssl openssl \
the instance sees the same virtual drive in all cases, the node-level
configuration varies between them.
-There are four disk templates you can choose from:
+There are five disk templates you can choose from:
diskless
The instance has no disks. Only used for special purpose operating
to obtain a highly available instance that can be failed over to a
remote node should the primary one fail.
+rbd
+ The instance will use volumes inside a RADOS cluster as the backend
+ for its disks. It will access them using the RADOS block device (RBD).
+
IAllocator
~~~~~~~~~~
target node, or the operation will fail if that's not possible. See
:ref:`instance-startup-label` for details.
+If the instance's disk template is of type rbd, then you can specify
+the target node (which can be any node) explicitly, or specify an
+iallocator plugin. If you omit both, the default iallocator will be
+used to determine the target node::
+
+ gnt-instance failover -n TARGET_NODE INSTANCE_NAME
+
Live migrating an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~
which case the target node should have at least the instance's current
runtime memory free.
+If the instance's disk template is of type rbd, then you can specify
+the target node (which can be any node) explicitly, or specify an
+iallocator plugin. If you omit both, the default iallocator will be
+used to determine the target node::
+
+ gnt-instance migrate -n TARGET_NODE INSTANCE_NAME
+
Moving an instance (offline)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6. Remove the ganeti state directory (``rm -rf /var/lib/ganeti/*``),
replacing the path with the correct path for your installation.
+7. If using RBD, run ``rbd unmap /dev/rbdN`` to unmap the RBD disks.
+ Then remove the RBD disk images used by Ganeti, identified by their
+ UUIDs (``rbd rm uuid.rbd.diskN``).
+
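+As an illustrative sketch of that cleanup step (the device and image
+names below are examples; use ``rbd showmapped`` to find the real
+mappings on your node)::
+
+  $ rbd showmapped
+  $ rbd unmap /dev/rbd0
+  $ rbd rm uuid.rbd.disk0
+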
On the master node, remove the cluster from the master-netdev (usually
``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by running
``ip a del $clusterip/32 dev xen-br0`` (use the correct cluster ip and
Command line interface changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The node selection options in instanece add and instance replace disks
+The node selection options in instance add and instance replace disks
can be replaced by the new ``--iallocator=NAME`` option (shortened to
``-I``), which will cause the auto-assignment of nodes with the
passed iallocator. The selected node(s) will be shown as part of the
You can also use file-based storage only, without LVM, but this setup is
not detailed in this document.
+If you choose to use RBD-based instances, there's no need for LVM
+provisioning. However, this feature is experimental, and is not
+recommended for production clusters.
+
While you can use an existing system, please note that the Ganeti
installation is intrusive in terms of changes to the system
configuration, and it's best to use a newly-installed system without
}
}
+Installing RBD
++++++++++++++++
+
+Recommended on all nodes: RBD_ is required if you want to create
+instances with RBD disks residing inside a RADOS cluster (i.e. to make
+use of the rbd disk template). RBD-based instances can fail over or
+migrate to any other node in the Ganeti cluster, enabling you to
+exploit all of Ganeti's high availability (HA) features.
+
+.. attention::
+ Be careful though: rbd is still experimental! For now it is
+ recommended only for testing purposes. No sensitive data should be
+ stored there.
+
+.. _RBD: http://ceph.newdream.net/
+
+You will need the ``rbd`` and ``libceph`` kernel modules, the RBD/Ceph
+userspace utils (ceph-common Debian package) and an appropriate
+Ceph/RADOS configuration file on every VM-capable node.
+
+You will also need a working RADOS Cluster accessible by the above
+nodes.
+
+RADOS Cluster
+~~~~~~~~~~~~~
+
+You will need a working RADOS Cluster accessible by all VM-capable nodes
+to use the RBD template. For more information on setting up a RADOS
+Cluster, refer to the `official docs <http://ceph.newdream.net/>`_.
+
+If you want to use a pool for storing RBD disk images other than the
+default (``rbd``), you should first create the pool in the RADOS
+Cluster, and then set the corresponding rbd disk parameter named
+``pool``.
+
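+For example, to create a pool named ``mypool`` and point Ganeti at it
+(the pool name is illustrative, and the exact commands may vary with
+your Ceph and Ganeti versions)::
+
+  $ rados mkpool mypool
+  $ gnt-cluster modify --disk-parameters rbd:pool=mypool
+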
+Kernel Modules
+~~~~~~~~~~~~~~
+
+Unless your distribution already provides them, you might need to
+compile the ``rbd`` and ``libceph`` modules from source. The modules
+are included in Linux kernel 3.2 and above. If you want to run an
+older kernel, or your kernel doesn't include them, you will have to
+build them as external modules from Linux kernel source 3.2 or above.
+
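+As a quick sanity check that the modules are available on a node, you
+can try loading them by hand (``modprobe rbd`` pulls in ``libceph`` as
+a dependency)::
+
+  $ modprobe rbd
+  $ lsmod | grep -E 'rbd|libceph'
+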
+Userspace Utils
+~~~~~~~~~~~~~~~
+
+The RBD template has been tested with ``ceph-common`` v0.38 and
+above. We recommend using the latest version of ``ceph-common``.
+
+.. admonition:: Debian
+
+ On Debian, you can just install the RBD/Ceph userspace utils with
+ the following command::
+
+ apt-get install ceph-common
+
+Configuration file
+~~~~~~~~~~~~~~~~~~
+
+You should also provide an appropriate configuration file
+(``ceph.conf``) in ``/etc/ceph``. For the rbd userspace utils, you'll
+only need to specify the IP addresses of the RADOS Cluster monitors.
+
+.. admonition:: ceph.conf
+
+ Sample configuration file::
+
+ [mon.a]
+ host = example_monitor_host1
+ mon addr = 1.2.3.4:6789
+ [mon.b]
+ host = example_monitor_host2
+ mon addr = 1.2.3.5:6789
+ [mon.c]
+ host = example_monitor_host3
+ mon addr = 1.2.3.6:6789
+
+For more information, please see the `Ceph Docs
+<http://ceph.newdream.net/docs/latest/>`_.
+
Other required software
+++++++++++++++++++++++
stripes
Number of stripes to use for new LVs.
+List of parameters available for the **rbd** template:
+
+pool
+ The RADOS cluster pool inside which all rbd volumes will reside.
+ When a new RADOS cluster is deployed, the default pool for rbd
+ volumes (Images in RADOS terminology) is ``rbd``.
+
The option ``--maintain-node-health`` allows one to enable/disable
automatic maintenance actions on nodes. Currently these include
automatic shutdown of instances and deactivation of DRBD devices on
^^^
| **add**
-| {-t|--disk-template {diskless | file \| plain \| drbd}}
+| {-t|--disk-template {diskless | file \| plain \| drbd \| rbd}}
| {--disk=*N*: {size=*VAL* \| adopt=*LV*}[,vg=*VG*][,metavg=*VG*][,mode=*ro\|rw*]
| \| {-s|--os-size} *SIZE*}
| [--no-ip-check] [--no-name-check] [--no-start] [--no-install]
drbd
Disk devices will be drbd (version 8.x) on top of lvm volumes.
+rbd
+ Disk devices will be rbd volumes residing inside a RADOS cluster.
+
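+For example, a minimal rbd-based instance creation could look like the
+following (the node, OS and instance names are illustrative)::
+
+  gnt-instance add -t rbd -s 10G -o debootstrap -n node1.example.com \
+    instance1.example.com
+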
The optional second value of the ``-n (--node)`` is used for the drbd
template type and specifies the remote node.
{*amount*}
Grows an instance's disk. This is only possible for instances having a
-plain or drbd disk template.
+plain, drbd or rbd disk template.
Note that this command only changes the block device size; it will not
grow the actual filesystems, partitions, etc. that live on that
to the arguments in the create instance operation, with a suffix
denoting the unit.
-Note that the disk grow operation might complete on one node but fail
-on the other; this will leave the instance with different-sized LVs on
-the two nodes, but this will not create problems (except for unused
-space).
+For instances with a drbd template, note that the disk grow operation
+might complete on one node but fail on the other; this will leave the
+instance with different-sized LVs on the two nodes, but this will not
+create problems (except for unused space).
If you do not want gnt-instance to wait for the new disk region to be
synced, use the ``--no-wait-for-sync`` option.
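
For example, to grow the first disk of an instance by 2 GiB without
waiting for the resync (the instance name is illustrative)::

    gnt-instance grow-disk --no-wait-for-sync instance1.example.com 0 2G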
FAILOVER
^^^^^^^^
-**failover** [-f] [--ignore-consistency] [--shutdown-timeout=*N*]
-[--submit] [--ignore-ipolicy] {*instance*}
+| **failover** [-f] [--ignore-consistency] [--ignore-ipolicy]
+| [--shutdown-timeout=*N*]
+| [{-n|--target-node} *node* \| {-I|--iallocator} *name*]
+| [--submit]
+| {*instance*}
Failover will stop the instance (if running), change its primary node,
and if it was originally running it will start it again (on the new
primary). This only works for instances with drbd template (in which
case you can only fail to the secondary node) and for externally
-mirrored templates (shared storage) (which can change to any other
+mirrored templates (blockdev and rbd) (which can change to any other
node).
+If the instance's disk template is of type blockdev or rbd, then you
+can explicitly specify the target node (which can be any node) using
+the ``-n`` or ``--target-node`` option, or specify an iallocator plugin
+using the ``-I`` or ``--iallocator`` option. If you omit both, the default
+iallocator will be used to determine the target node.
+
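+For example, to fail over an rbd-based instance either to an explicit
+target node or via the default iallocator (names are illustrative)::
+
+  gnt-instance failover -n node2.example.com instance1.example.com
+  gnt-instance failover instance1.example.com
+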
Normally the failover will check the consistency of the disks before
failing over the instance. If you are trying to migrate instances off
a dead node, this will fail. Use the ``--ignore-consistency`` option
**migrate** [-f] [--allow-failover] [--non-live]
[--migration-mode=live\|non-live] [--ignore-ipolicy]
-[--no-runtime-changes] {*instance*}
-
-Migrate will move the instance to its secondary node without
-shutdown. It only works for instances having the drbd8 disk template
-type.
+[--no-runtime-changes]
+[{-n|--target-node} *node* \| {-I|--iallocator} *name*] {*instance*}
+
+Migrate will move the instance to its secondary node without shutdown.
+As with failover, it only works for instances having the drbd disk
+template or an externally mirrored disk template type such as blockdev
+or rbd.
+
+If the instance's disk template is of type blockdev or rbd, then you can
+explicitly specify the target node (which can be any node) using the
+``-n`` or ``--target-node`` option, or specify an iallocator plugin
+using the ``-I`` or ``--iallocator`` option. If you omit both, the
+default iallocator will be used to determine the target node.
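+
+For example, to migrate an rbd-based instance using an iallocator
+plugin (``hail``, the allocator shipped with Ganeti's htools, is used
+here as an illustration)::
+
+  gnt-instance migrate -I hail instance1.example.com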
The migration command needs a perfectly healthy instance, as we rely
on the dual-master capability of drbd8 and the disks of the instance