rbd disk template documentation and manpages

author Stratos Psomadakis <psomas@grnet.gr>

Fri, 10 Feb 2012 10:29:13 +0000 (12:29 +0200)

committer Iustin Pop <iustin@google.com>

Fri, 10 Feb 2012 15:10:33 +0000 (16:10 +0100)
diff --git a/INSTALL b/INSTALL
index a3bc7a7..767c160 100644
--- a/INSTALL
+++ b/INSTALL
@@ -19,6 +19,8 @@ Before installing, please verify that you have the following programs:
    versions 0.11.X or above have shown good behavior).
  - `DRBD <http://www.drbd.org/>`_, kernel module and userspace utils,
    version 8.0.7 or above
+- `RBD <http://ceph.newdream.net/>`_, kernel modules (rbd.ko/libceph.ko)
+  and userspace utils (ceph-common)
  - `LVM2 <http://sourceware.org/lvm2/>`_
  - `OpenSSH <http://www.openssh.com/portable.html>`_
  - `bridge utilities <http://www.linuxfoundation.org/en/Net:Bridge>`_
@@ -50,7 +52,7 @@ These programs are supplied as part of most Linux distributions, so
  usually they can be installed via the standard package manager. Also
  many of them will already be installed on a standard machine. On
  Debian/Ubuntu, you can use this command line to install all required
-packages, except for DRBD and Xen::
+packages, except for RBD, DRBD and Xen::
  
    $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
                      ndisc6 python python-pyopenssl openssl \
diff --git a/doc/admin.rst b/doc/admin.rst
index 0f910ef..c9cbd96 100644
--- a/doc/admin.rst
+++ b/doc/admin.rst
@@ -115,7 +115,7 @@ The are multiple options for the storage provided to an instance; while
  the instance sees the same virtual drive in all cases, the node-level
  configuration varies between them.
  
-There are four disk templates you can choose from:
+There are five disk templates you can choose from:
  
  diskless
    The instance has no disks. Only used for special purpose operating
@@ -138,6 +138,10 @@ drbd
    to obtain a highly available instance that can be failed over to a
    remote node should the primary one fail.
  
+rbd
+  The instance will use Volumes inside a RADOS cluster as backend for its
+  disks. It will access them using the RADOS block device (RBD).
+
  IAllocator
  ~~~~~~~~~~
  
@@ -510,6 +514,13 @@ The instance will be started with an amount of memory between its
  target node, or the operation will fail if that's not possible. See
  :ref:`instance-startup-label` for details.
  
+If the instance's disk template is of type rbd, then you can specify
+the target node (which can be any node) explicitly, or specify an
+iallocator plugin. If you omit both, the default iallocator will be
+used to determine the target node::
+
+  gnt-instance failover -n TARGET_NODE INSTANCE_NAME
+
  Live migrating an instance
  ~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -530,6 +541,13 @@ migrating it, unless the ``--no-runtime-changes`` option is passed, in
  which case the target node should have at least the instance's current
  runtime memory free.
  
+If the instance's disk template is of type rbd, then you can specify
+the target node (which can be any node) explicitly, or specify an
+iallocator plugin. If you omit both, the default iallocator will be
+used to determine the target node::
+
+   gnt-instance migrate -n TARGET_NODE INSTANCE_NAME
+
  Moving an instance (offline)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -1247,6 +1265,10 @@ of a cluster installation by following these steps on all of the nodes:
  6. Remove the ganeti state directory (``rm -rf /var/lib/ganeti/*``),
     replacing the path with the correct path for your installation.
  
+7. If using RBD, run ``rbd unmap /dev/rbdN`` to unmap the RBD disks.
+   Then remove the RBD disk images used by Ganeti, identified by their
+   UUIDs (``rbd rm uuid.rbd.diskN``).
+
  On the master node, remove the cluster from the master-netdev (usually
  ``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by running
  ``ip a del $clusterip/32 dev xen-br0`` (use the correct cluster ip and
diff --git a/doc/iallocator.rst b/doc/iallocator.rst
index 57a4388..723b948 100644
--- a/doc/iallocator.rst
+++ b/doc/iallocator.rst
@@ -41,7 +41,7 @@ using the first one whose filename matches the one given by the user.
  Command line interface changes
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-The node selection options in instanece add and instance replace disks
+The node selection options in instance add and instance replace disks
  can be replace by the new ``--iallocator=NAME`` option (shortened to
  ``-I``), which will cause the auto-assignement of nodes with the
  passed iallocator. The selected node(s) will be show as part of the
diff --git a/doc/install.rst b/doc/install.rst
index d899abd..d46cf8e 100644
--- a/doc/install.rst
+++ b/doc/install.rst
@@ -69,6 +69,10 @@ all Ganeti features. The volume group name Ganeti uses (by default) is
  You can also use file-based storage only, without LVM, but this setup is
  not detailed in this document.
  
+If you choose to use RBD-based instances, there's no need for LVM
+provisioning. However, this feature is experimental, and is not
+recommended for production clusters.
+
  While you can use an existing system, please note that the Ganeti
  installation is intrusive in terms of changes to the system
  configuration, and it's best to use a newly-installed system without
@@ -300,6 +304,88 @@ instances on a node.
         }
       }
  
+Installing RBD
++++++++++++++++
+
+Recommended on all nodes: RBD_ is required if you want to create
+instances with RBD disks residing inside a RADOS cluster (make use of
+the rbd disk template). RBD-based instances can fail over or migrate to
+any other node in the Ganeti cluster, enabling you to exploit all of
+Ganeti's high availability (HA) features.
+
+.. attention::
+   Be careful though: rbd is still experimental! For now it is
+   recommended only for testing purposes.  No sensitive data should be
+   stored there.
+
+.. _RBD: http://ceph.newdream.net/
+
+You will need the ``rbd`` and ``libceph`` kernel modules, the RBD/Ceph
+userspace utils (ceph-common Debian package) and an appropriate
+Ceph/RADOS configuration file on every VM-capable node.
+
+You will also need a working RADOS Cluster accessible by the above
+nodes.
+
+RADOS Cluster
+~~~~~~~~~~~~~
+
+You will need a working RADOS Cluster accessible by all VM-capable nodes
+to use the RBD template. For more information on setting up a RADOS
+Cluster, refer to the `official docs <http://ceph.newdream.net/>`_.
+
+If you want to use a pool for storing RBD disk images other than the
+default (``rbd``), you should first create the pool in the RADOS
+Cluster, and then set the corresponding rbd disk parameter named
+``pool``.
+
+Kernel Modules
+~~~~~~~~~~~~~~
+
+Unless your distribution already provides them, you might need to compile
+the ``rbd`` and ``libceph`` modules from source. You will need Linux
+Kernel 3.2 or above for the kernel modules. Alternatively you will have
+to build them as external modules (from Linux Kernel source 3.2 or
+above), if you want to run a less recent kernel, or your kernel doesn't
+include them.
+
+Userspace Utils
+~~~~~~~~~~~~~~~
+
+The RBD template has been tested with ``ceph-common`` v0.38 and
+above. We recommend using the latest version of ``ceph-common``.
+
+.. admonition:: Debian
+
+   On Debian, you can just install the RBD/Ceph userspace utils with
+   the following command::
+
+      apt-get install ceph-common
+
+Configuration file
+~~~~~~~~~~~~~~~~~~
+
+You should also provide an appropriate configuration file
+(``ceph.conf``) in ``/etc/ceph``. For the rbd userspace utils, you'll
+only need to specify the IP addresses of the RADOS Cluster monitors.
+
+.. admonition:: ceph.conf
+
+   Sample configuration file::
+
+    [mon.a]
+           host = example_monitor_host1
+           mon addr = 1.2.3.4:6789
+    [mon.b]
+           host = example_monitor_host2
+           mon addr = 1.2.3.5:6789
+    [mon.c]
+           host = example_monitor_host3
+           mon addr = 1.2.3.6:6789
+
+For more information, please see the `Ceph Docs
+<http://ceph.newdream.net/docs/latest/>`_.
+
  Other required software
  +++++++++++++++++++++++
  
diff --git a/man/gnt-cluster.rst b/man/gnt-cluster.rst
index 81a05b9..cfd994f 100644
--- a/man/gnt-cluster.rst
+++ b/man/gnt-cluster.rst
@@ -445,6 +445,13 @@ List of parameters available for the **plain** template:
  stripes
      Number of stripes to use for new LVs.
  
+List of parameters available for the **rbd** template:
+
+pool
+    The RADOS cluster pool, inside which all rbd volumes will reside.
+    When a new RADOS cluster is deployed, the default pool to put rbd
+    volumes (Images in RADOS terminology) is 'rbd'.
+
  The option ``--maintain-node-health`` allows one to enable/disable
  automatic maintenance actions on nodes. Currently these include
  automatic shutdown of instances and deactivation of DRBD devices on
diff --git a/man/gnt-instance.rst b/man/gnt-instance.rst
index 2102452..09f3105 100644
--- a/man/gnt-instance.rst
+++ b/man/gnt-instance.rst
@@ -27,7 +27,7 @@ ADD
  ^^^
  
  | **add**
-| {-t|--disk-template {diskless | file \| plain \| drbd}}
+| {-t|--disk-template {diskless | file \| plain \| drbd \| rbd}}
  | {--disk=*N*: {size=*VAL* \| adopt=*LV*}[,vg=*VG*][,metavg=*VG*][,mode=*ro\|rw*]
  |  \| {-s|--os-size} *SIZE*}
  | [--no-ip-check] [--no-name-check] [--no-start] [--no-install]
@@ -588,6 +588,9 @@ plain
  drbd
      Disk devices will be drbd (version 8.x) on top of lvm volumes.
  
+rbd
+    Disk devices will be rbd volumes residing inside a RADOS cluster.
+
  
  The optional second value of the ``-n (--node)`` is used for the drbd
  template type and specifies the remote node.
@@ -1321,7 +1324,7 @@ GROW-DISK
  {*amount*}
  
  Grows an instance's disk. This is only possible for instances having a
-plain or drbd disk template.
+plain, drbd or rbd disk template.
  
  Note that this command only change the block device size; it will not
  grow the actual filesystems, partitions, etc. that live on that
@@ -1341,10 +1344,10 @@ amount to increase the disk with in mebibytes) or can be given similar
  to the arguments in the create instance operation, with a suffix
  denoting the unit.
  
-Note that the disk grow operation might complete on one node but fail
-on the other; this will leave the instance with different-sized LVs on
-the two nodes, but this will not create problems (except for unused
-space).
+For instances with a drbd template, note that the disk grow operation
+might complete on one node but fail on the other; this will leave the
+instance with different-sized LVs on the two nodes, but this will not
+create problems (except for unused space).
  
  If you do not want gnt-instance to wait for the new disk region to be
  synced, use the ``--no-wait-for-sync`` option.
@@ -1401,16 +1404,25 @@ Recovery
  FAILOVER
  ^^^^^^^^
  
-**failover** [-f] [--ignore-consistency] [--shutdown-timeout=*N*]
-[--submit] [--ignore-ipolicy] {*instance*}
+| **failover** [-f] [--ignore-consistency] [--ignore-ipolicy]
+| [--shutdown-timeout=*N*]
+| [{-n|--target-node} *node* \| {-I|--iallocator} *name*]
+| [--submit]
+| {*instance*}
  
  Failover will stop the instance (if running), change its primary node,
  and if it was originally running it will start it again (on the new
  primary). This only works for instances with drbd template (in which
  case you can only fail to the secondary node) and for externally
-mirrored templates (shared storage) (which can change to any other
+mirrored templates (blockdev and rbd) (which can change to any other
  node).
  
+If the instance's disk template is of type blockdev or rbd, then you
+can explicitly specify the target node (which can be any node) using
+the ``-n`` or ``--target-node`` option, or specify an iallocator plugin
+using the ``-I`` or ``--iallocator`` option. If you omit both, the default
+iallocator will be used to specify the target node.
+
  Normally the failover will check the consistency of the disks before
  failing over the instance. If you are trying to migrate instances off
  a dead node, this will fail. Use the ``--ignore-consistency`` option
@@ -1443,11 +1455,19 @@ MIGRATE
  
  **migrate** [-f] [--allow-failover] [--non-live]
  [--migration-mode=live\|non-live] [--ignore-ipolicy]
-[--no-runtime-changes] {*instance*}
-
-Migrate will move the instance to its secondary node without
-shutdown. It only works for instances having the drbd8 disk template
-type.
+[--no-runtime-changes]
+[{-n|--target-node} *node* \| {-I|--iallocator} *name*] {*instance*}
+
+Migrate will move the instance to its secondary node without shutdown.
+As with failover, it only works for instances having the drbd disk
+template or an externally mirrored disk template type such as blockdev
+or rbd.
+
+If the instance's disk template is of type blockdev or rbd, then you can
+explicitly specify the target node (which can be any node) using the
+``-n`` or ``--target-node`` option, or specify an iallocator plugin
+using the ``-I`` or ``--iallocator`` option. If you omit both, the
+default iallocator will be used to specify the target node.
  
  The migration command needs a perfectly healthy instance, as we rely
  on the dual-master capability of drbd8 and the disks of the instance
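
Illustrative note, not part of the patch: the failover and migrate synopses above let you pick the target node for rbd (and blockdev) instances either explicitly with ``-n`` or through an iallocator with ``-I``, falling back to the default iallocator when neither is given. The sketch below shows that selection rule as a small command-line builder. The function name ``build_move_command`` and the node and instance names in the usage note are hypothetical; only the ``gnt-instance`` subcommands and the ``-n``/``-I`` options come from the documentation in this patch.

```python
from typing import List, Optional


def build_move_command(action: str, instance: str,
                       target_node: Optional[str] = None,
                       iallocator: Optional[str] = None) -> List[str]:
    """Build an argv list for ``gnt-instance failover`` or ``migrate``.

    Hypothetical helper for illustration; it only assembles the
    command line and does not talk to a Ganeti cluster.
    """
    if action not in ("failover", "migrate"):
        raise ValueError("action must be 'failover' or 'migrate'")
    # The synopsis lists -n and -I as alternatives, so reject both together.
    if target_node is not None and iallocator is not None:
        raise ValueError("give either a target node or an iallocator, not both")

    argv = ["gnt-instance", action]
    if target_node is not None:
        argv += ["-n", target_node]
    elif iallocator is not None:
        argv += ["-I", iallocator]
    # With neither option, the default iallocator determines the target node.
    argv.append(instance)
    return argv
```

For example, ``build_move_command("failover", "instance1.example.com", target_node="node2.example.com")`` yields the argument list for ``gnt-instance failover -n node2.example.com instance1.example.com``, which could then be passed to ``subprocess.run`` on the master node.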