============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

Objective
=========

The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into the following tasks.

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support.
- Allow configuration of Ceph RBDs through Ganeti.
- Write a data collector to monitor Ceph nodes.

Background
==========

Ceph RBD
--------

Ceph is a distributed storage system that provides data access as
files, objects and blocks. As part of this project, we're interested in
integrating Ceph's block device (RBD) directly with Qemu/KVM.

The primary components/daemons of Ceph are:

- Monitor - Serves as the authentication point for clients.
- Metadata - Stores all the filesystem metadata (not configured here,
  as it is not required for RBD).
- OSD - Object storage devices; one daemon per drive/location.
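
For orientation, a minimal ``/etc/ceph/ceph.conf`` describing such a
cluster might look as follows (illustrative values only; the hostnames,
fsid and monitor address are made up)::

  [global]
      fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
      mon host = 192.0.2.10

  [mon.a]
      host = node1

  [osd.0]
      host = node2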

RBD support in Ganeti
---------------------

Currently, Ganeti supports RBD volumes on a pre-configured Ceph cluster.
This is enabled through RBD disk templates. These templates allow access
to RBD volumes through the RBD Linux kernel driver: the volumes are
mapped to the host as local block devices, which are then attached to
the instances. This method incurs an additional overhead. We plan to
remove that overhead by using Qemu's RBD driver to give KVM instances
direct access to RBD volumes.
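
For reference, the current ``kernelspace`` flow maps each volume on the
host with the ``rbd`` CLI before the instance can use it, roughly as
follows::

  $ rbd map <pool>/<vol_name>    # volume appears as e.g. /dev/rbd0
  $ rbd showmapped               # list the volumes mapped on this host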

Moreover, Ganeti currently depends on the Ceph cluster having been
configured in advance. Allowing configuration of the Ceph nodes through
Ganeti itself would be a valuable addition to its core features.


Qemu/KVM Direct RBD Integration
===============================

A new disk parameter, ``access``, is introduced. It is added at the
cluster/node-group level to simplify the prototype implementation. It
specifies the access method as either ``userspace`` or ``kernelspace``
and is accessible to StartInstance() in hv_kvm.py. The device path,
``rbd:<pool>/<vol_name>``, is generated by RADOSBlockDevice and added
to the params dictionary as ``kvm_dev_path``.
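
A minimal sketch of how StartInstance() could choose the device path
based on this parameter; the attribute names used here (``params``,
``dev_path``, ``rbd_pool``, ``rbd_name``) are illustrative assumptions,
not the final Ganeti API::

  def _GetDiskDevicePath(disk):
      """Return the path QEMU should open for the given disk (sketch)."""
      if disk.params.get("access") == "userspace":
          # Let QEMU's built-in librbd driver talk to Ceph directly.
          return "rbd:%s/%s" % (disk.rbd_pool, disk.rbd_name)
      # kernelspace: use the local block device created by "rbd map".
      return disk.dev_path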

This approach ensures that no disk-template-specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (like Gluster).

Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but it can be used by administrators and OS
scripts.
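
In ``userspace`` mode the disk would then show up on the KVM command
line roughly as below (an illustrative invocation; the exact flags
Ganeti generates may differ)::

  kvm ... -drive file=rbd:<pool>/<vol_name>,format=raw,if=virtio ...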

Updated commands
----------------
::

  $ gnt-instance info

``access:userspace/kernelspace`` will be added to the Disks category.
This output applies to KVM-based instances only.
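
The relevant part of the output might then look like the following
(illustrative only; the exact formatting is up to the implementation)::

  Disks:
    - disk/0: rbd, size 10.0G
      access: userspace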

Ceph configuration on Ganeti nodes
==================================

This document proposes the configuration of a distributed storage
pool (Ceph or Gluster) through Ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup
is the installation of the Ceph packages on all the nodes concerned.

At Ganeti cluster init, the user will set distributed-storage-specific
options which will be stored at the cluster level. The storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.

The following steps take place when a node-group is initialized as a
storage cluster (a sketch of the flow follows the list).

- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
  file on each node.
- Fetch the cluster configuration parameters and create a distributed
  storage object accordingly.
- Issue an 'init distributed storage' RPC to the group's nodes (if any).
- On each node, the ``ceph`` CLI tool will start the appropriate
  services.
- Mark the nodes as well as the node-group as
  distributed-storage-enabled.
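
A minimal sketch of this initialization flow; the helper, RPC and
attribute names (``HasFile``, ``call_init_dist_storage``,
``dist_storage_enabled``) are hypothetical and stand in for whatever
the implementation ends up using::

  def InitDistributedStorage(nodegroup, nodes, cluster_opts, rpc):
      """Initialize a node-group as a Ceph storage cluster (sketch)."""
      for node in nodes:
          # Step 1: refuse to proceed if a cluster already exists.
          if node.HasFile("/etc/ceph/ceph.conf"):
              raise RuntimeError("ceph already configured on %s" % node.name)
      # Step 2: build the storage parameters from the cluster options.
      params = {"storage_type": "ceph",
                "devices": cluster_opts["devices"],
                "node_role": "mon"}
      for node in nodes:
          # Steps 3 and 4: RPC to the node; its daemon runs the ceph CLI.
          rpc.call_init_dist_storage(node, params)
          node.dist_storage_enabled = True
      # Step 5: finally mark the whole node-group as enabled.
      nodegroup.dist_storage_enabled = True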

The storage cluster will operate at the node-group level. The Ceph
cluster will be initialized using ``gnt-storage``; a new sub-command,
``init-distributed-storage``, will be added to it.

The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC is
introduced to handle these calls.

A new object will be created to send the storage parameters to the
node: storage_type, devices, node_role (mon/osd), etc.
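
A possible shape for that object, with field names taken from the
design text (the class name and layout are assumptions)::

  class DistStorageParams(object):
      """Parameters sent to a node joining the storage cluster."""

      def __init__(self, storage_type, devices, node_role):
          self.storage_type = storage_type   # e.g. "ceph"
          self.devices = devices             # block devices for the OSDs
          self.node_role = node_role         # "mon" or "osd"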

A new node can be directly assigned to the storage-enabled node-group.
During the 'gnt-node add' process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to the storage-enabled node-group.
``gnt-node --readd`` needs to be performed to issue the RPCs that spawn
the appropriate services on the newly assigned node.
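
Assuming a storage-enabled group named ``storage-group``, moving an
existing node into it would therefore look roughly like this
(illustrative command sequence)::

  $ gnt-node modify --offline yes node4
  $ gnt-group assign-nodes storage-group node4
  $ gnt-node add --readd node4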

Updated Commands
----------------

The following commands are affected::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided which
apply at the cluster level::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will only be allowed when there is no
initialized storage cluster::

  $ gnt-storage init-distributed-storage {-s|--storage-type} ceph \
      <node-group>

This ensures that no other node-group is configured as a distributed
storage cluster and configures Ceph on the specified node-group. If
there is no node in the node-group, it will only be marked as
distributed-storage enabled and no action will be taken::

  $ gnt-group assign-nodes <group> <node>

This ensures that the node is offline if the specified node-group is
distributed-storage capable. Ceph configuration of the newly assigned
node is not performed at this step::

  $ gnt-node --offline

If the node is part of the storage node-group, an offline call will
stop/remove the Ceph daemons::

  $ gnt-node add --readd

If the node is now part of the storage node-group, the init distributed
storage RPC is issued to the respective node. This step is required
after assigning a node to the storage-enabled node-group::

  $ gnt-node remove

A warning will be issued stating that the node is part of the
distributed storage; mark it offline before removal.

Data collector for Ceph
-----------------------

TBD

Future Work
-----------

Due to the loopback bug in Ceph, one may run into daemon hang issues
while performing writes to an RBD volume through the block device
mapping. This bug applies only when the RBD volume is stored on an OSD
running on the local node. In order to mitigate this issue, we can
create storage pools on different node-groups and access RBD volumes
on different pools. See http://tracker.ceph.com/issues/3076.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: