============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into the following tasks.

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support.
- Allow configuration of Ceph RBDs through Ganeti.
- Write a data collector to monitor Ceph nodes.

Ceph is a distributed storage system which provides data access as
files, objects and blocks. As part of this project, we're interested
in integrating Ceph's block device (RBD) directly with Qemu/KVM.

Primary components/daemons of Ceph:

- Monitor - serves as the authentication point for clients.
- Metadata - stores all the filesystem metadata (not configured here,
  as it is not required for RBD).
- OSD - object storage devices; one daemon for each drive/location.

Currently, Ganeti supports RBD volumes on a pre-configured Ceph
cluster. This is enabled through the RBD disk template, which accesses
the volumes via the RBD Linux kernel driver: the volumes are mapped on
the host as local block devices, which are then attached to the
instances. This method incurs an additional overhead. We plan to
remove it by using Qemu's RBD driver to give KVM instances direct
access to RBD volumes.
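
To make the difference concrete, the following sketch shows how the
same volume would be handed to KVM under the two access methods; the
pool and volume names are examples only::

  # Example only: the same RBD volume (pool "rbd", image "vol-0") as it
  # would appear on the KVM command line under the two access methods.
  pool, name = "rbd", "vol-0"

  # Current (kernelspace) path: the volume is mapped on the host via
  # the RBD kernel driver and KVM is given the local block device.
  kernelspace_drive = "file=/dev/rbd/%s/%s,if=virtio" % (pool, name)

  # Proposed (userspace) path: Qemu opens the volume directly through
  # librbd, with no local mapping in the I/O path.
  userspace_drive = "file=rbd:%s/%s,if=virtio" % (pool, name)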

In addition, since Ganeti currently relies on a pre-configured Ceph
cluster, allowing the configuration of Ceph nodes through Ganeti would
be a valuable addition to its core features.

Qemu/KVM Direct RBD Integration
===============================

A new disk parameter ``access`` is introduced. To simplify the
prototype implementation, it is added at the cluster/node-group level.
It specifies the access method, either ``userspace`` or
``kernelspace``, and is accessible to StartInstance() in hv_kvm.py.
The device path, ``rbd:<pool>/<vol_name>``, is generated by
RADOSBlockDevice and added to the params dictionary as
``kvm_dev_path``.
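
A minimal sketch of how StartInstance() could use this parameter
follows; apart from ``kvm_dev_path``, the names used here
(``_SelectRbdPath``, ``dev_path``) are placeholders rather than the
final hv_kvm.py interface::

  # Illustrative only: choose the path KVM is given for an RBD disk,
  # based on the ``access`` disk parameter. Only ``kvm_dev_path`` is
  # taken from this design; the other names are placeholders.
  def _SelectRbdPath(disk_params):
    if disk_params.get("access") == "userspace":
      # Qemu opens the volume through librbd; the local kernel
      # mapping is bypassed during instance operation.
      return disk_params["kvm_dev_path"]  # "rbd:<pool>/<vol_name>"
    # kernelspace (default): keep the locally mapped block device.
    return disk_params["dev_path"]        # e.g. "/dev/rbd0"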

This approach ensures that no disk-template-specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (such as Gluster).
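
For illustration, such template independence can be achieved by
letting each block-device class report its own userspace path; the
classes below are simplified stand-ins for the real Ganeti storage
classes, and the method name is a placeholder::

  # Illustrative only: each storage class knows its own userspace URI,
  # so hv_kvm.py needs no storage-specific branches. These are
  # simplified stand-ins, not the real Ganeti classes.
  class RADOSBlockDevice(object):
    def __init__(self, pool, name):
      self.pool, self.name = pool, name

    def GetUserspacePath(self):  # placeholder method name
      return "rbd:%s/%s" % (self.pool, self.name)

  class GlusterVolume(object):
    def __init__(self, server, volume, image):
      self.server, self.volume, self.image = server, volume, image

    def GetUserspacePath(self):
      return ("gluster://%s/%s/%s" %
              (self.server, self.volume, self.image))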

Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but it can still be used by administrators
and OS scripts.

In the ``gnt-instance info`` output, ``access:userspace/kernelspace``
will be added to the Disks category. This applies to KVM-based
instances only.

Ceph configuration on Ganeti nodes
==================================

This design proposes the configuration of a distributed storage pool
(Ceph or Gluster) through Ganeti. For now, the document focuses on
configuring a Ceph cluster. A prerequisite of this setup is the
installation of the Ceph packages on all the nodes concerned.

At Ganeti cluster init, the user will set distributed-storage-specific
options, which will be stored at the cluster level. The storage
cluster will be initialized using ``gnt-storage``. For the prototype,
only a single storage pool/node-group is configured.

The following steps take place when a node-group is initialized as a
storage cluster (a sketch of this flow follows the list):

- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
  file.
- Fetch the cluster configuration parameters and create a distributed
  storage object accordingly.
- Issue an 'init distributed storage' RPC to the group nodes (if any).
- On each node, the ``ceph`` cli tool will run the appropriate
  services.
- Mark the nodes as well as the node-group as
  distributed-storage-enabled.
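
The sketch below outlines one possible shape for this flow; all
function and RPC names are placeholders, not an agreed interface::

  # Illustrative only: master-side flow for initializing a node-group
  # as a Ceph storage cluster. All names are placeholders.
  CEPH_CONF = "/etc/ceph/ceph.conf"

  def InitDistributedStorage(rpc_runner, group_nodes, cluster_opts):
    # Refuse to touch nodes that already carry a ceph configuration.
    configured = [node for node in group_nodes
                  if rpc_runner.call_file_exists(node, CEPH_CONF)]
    if configured:
      raise RuntimeError("Ceph already configured on: %s" %
                         ", ".join(configured))

    # Build the distributed storage object from the cluster-level
    # options given at 'gnt-cluster init -S ...' time.
    storage = {"storage_type": "ceph",
               "devices": cluster_opts.get("disk"),
               "options": cluster_opts}

    # Issue the 'init distributed storage' RPC to the group nodes
    # (if any); each node daemon runs the appropriate ceph services.
    for node in group_nodes:
      rpc_runner.call_distributed_storage_init(node, storage)

    # The caller then marks the nodes and the node-group as
    # distributed-storage-enabled in the cluster configuration.
    return storage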

The storage cluster will operate at the node-group level. The Ceph
cluster will be initialized using ``gnt-storage``, to which a new
sub-command, ``init-distributed-storage``, will be added.

The configuration of the nodes will be handled through an init
function called by the node daemons running on the respective nodes.
A new RPC is introduced to handle the calls.

A new object will be created to send the storage parameters to the
node: storage_type, devices, node_role (mon/osd), etc.
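
A minimal sketch of such an object is given below; the class name and
the serialization helper are illustrative only::

  # Illustrative only: parameters shipped to a node by the 'init
  # distributed storage' RPC. Field names follow the text above.
  class DistributedStorageParams(object):
    __slots__ = ["storage_type", "devices", "node_role"]

    def __init__(self, storage_type, devices, node_role):
      self.storage_type = storage_type  # e.g. "ceph"
      self.devices = devices            # block devices for the OSDs
      self.node_role = node_role        # "mon", "osd" or both

    def ToDict(self):
      # Serialize for transmission over the node daemon RPC.
      return dict((name, getattr(self, name))
                  for name in self.__slots__)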

A new node can be directly assigned to the storage-enabled node-group.
During the ``gnt-node add`` process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to a storage-enabled node-group.
``gnt-node add --readd`` needs to be performed to issue the RPCs that
spawn the appropriate services on the newly assigned node.

Following are the affected commands.::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided
which apply at the cluster level; a sketch of how such an option
string might be parsed follows the command list below.::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will only be allowed when there is no
initialized storage cluster.::

  $ gnt-storage init-distributed-storage -s{--storage-type} ceph \
      <node-group>

Ensure that no other node-group is configured as a distributed storage
cluster, and configure Ceph on the specified node-group. If there is
no node in the node-group, it will only be marked as distributed
storage enabled and no action will be taken.::

  $ gnt-group assign-nodes <group> <node>

This ensures that the node is offline if the specified node-group is
distributed-storage-capable. Ceph configuration on the newly assigned
node is not performed at this step.::

  $ gnt-node modify --offline yes <node>

If the node is part of a storage node-group, the offline call will
stop/remove the Ceph daemons running on it.::

  $ gnt-node add --readd <node>

If the node is now part of the storage node-group, issue the init
distributed storage RPC to the respective node. This step is required
after assigning a node to the storage-enabled node-group.::

  $ gnt-node remove <node>

A warning will be issued stating that the node is part of distributed
storage; mark it offline before removal.
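
As a rough illustration of the ``-S`` specification format used above
(``ceph:disk=/dev/sdb,option=value``), the sketch below parses such a
string into a storage type and an options dictionary; the function
name and return shape are illustrative, not the actual Ganeti
option-parsing code::

  # Illustrative only: parse a '-S <type>:<key>=<value>,...' storage
  # specification, e.g. "ceph:disk=/dev/sdb,option=value".
  def ParseStorageSpec(spec):
    if ":" in spec:
      storage_type, opt_str = spec.split(":", 1)
    else:
      storage_type, opt_str = spec, ""

    options = {}
    for item in filter(None, opt_str.split(",")):
      if "=" not in item:
        raise ValueError("Invalid storage option '%s'" % item)
      key, value = item.split("=", 1)
      options[key] = value
    return storage_type, options

  # ParseStorageSpec("ceph:disk=/dev/sdb,option=value")
  # -> ("ceph", {"disk": "/dev/sdb", "option": "value"})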

Data collector for Ceph
-----------------------

Due to the loopback bug in Ceph (http://tracker.ceph.com/issues/3076),
one may run into daemon hang issues while performing writes to an RBD
volume through the block device mapping. This bug applies only when
the RBD volume is stored on an OSD running on the local node. To
mitigate this issue, we can create storage pools on different
node-groups and access RBD volumes from different pools.

.. vim: set textwidth=72 :