============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

Objective
=========

The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into the following tasks:

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support.
- Allow configuration of Ceph RBDs through Ganeti.
- Write a data collector to monitor Ceph nodes.

Background
==========

Ceph RBD
--------

Ceph is a distributed storage system that provides data access as
files, objects and blocks. As part of this project, we are interested
in integrating Ceph's block device (RBD) directly with Qemu/KVM.

The primary components/daemons of Ceph are:

- Monitor - serves as the authentication point for clients.
- Metadata - stores all the filesystem metadata (not configured here,
  as it is not required for RBD).
- OSD - object storage devices, one daemon per drive/location.

RBD support in Ganeti
---------------------

Currently, Ganeti supports RBD volumes on a pre-configured Ceph
cluster. This is enabled through the RBD disk template, which provides
access to RBD volumes via the RBD Linux (kernel) driver: the volumes
are mapped to the host as local block devices, which are then attached
to the instances. This method incurs an additional overhead. We plan to
remove that overhead by using Qemu's RBD driver to give KVM instances
direct access to RBD volumes.

Also, Ganeti currently requires the Ceph cluster to be configured
outside of Ganeti. Allowing configuration of Ceph nodes through Ganeti
would be a good addition to its prime features.

Qemu/KVM Direct RBD Integration
===============================

A new disk parameter ``access`` is introduced. It is added at
cluster/node-group level to simplify the prototype implementation. It
specifies the access method, either ``userspace`` or ``kernelspace``,
and is accessible to ``StartInstance()`` in ``hv_kvm.py``. The device
path, ``rbd:<pool>/<vol_name>``, is generated by ``RADOSBlockDevice``
and is added to the params dictionary as ``kvm_dev_path``.

This approach ensures that no disk-template-specific changes are
required in ``hv_kvm.py``, allowing easy integration of other
distributed storage systems (like Gluster).
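
A minimal sketch of the intended selection logic, assuming a
hypothetical helper inside ``hv_kvm.py`` (illustrative only, not the
actual implementation)::

  def _GetDiskDevPath(disk_params, kernel_dev_path):
    """Choose the path QEMU should open for one instance disk.

    ``access`` and ``kvm_dev_path`` are the parameters described
    above; ``kernel_dev_path`` is the local block device created by
    the kernel RBD mapping (e.g. /dev/rbd0).
    """
    if disk_params.get("access") == "userspace":
      # Direct librbd access, e.g. "rbd:<pool>/<vol_name>" as
      # generated by RADOSBlockDevice.
      return disk_params["kvm_dev_path"]
    # "kernelspace" (default): use the locally mapped block device.
    return kernel_dev_path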

Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but can be used by administrators and OS
scripts.

Updated commands
----------------

::

  $ gnt-instance info

``access:userspace/kernelspace`` will be added to the Disks category.
This output applies to KVM-based instances only.
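
For illustration only (the exact formatting is not fixed by this
design), the relevant part of the output might look like::

  Disks:
    - disk/0: rbd, size 10.0G
      access: userspace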

Ceph configuration on Ganeti nodes
==================================

This document proposes the configuration of a distributed storage
pool (Ceph or Gluster) through Ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup is
the installation of the Ceph packages on all the nodes concerned.

At Ganeti cluster init, the user will set distributed-storage-specific
options which will be stored at cluster level. The storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.

The following steps take place when a node-group is initialized as a
storage cluster (a rough sketch of the flow follows the list):

- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
  file on each node.
- Fetch the cluster configuration parameters and create a distributed
  storage object accordingly.
- Issue an 'init distributed storage' RPC to the group nodes (if any).
- On each node, the ``ceph`` CLI tool will run the appropriate
  services.
- Mark the nodes as well as the node-group as
  distributed-storage-enabled.
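
A rough sketch of this flow (all names below are hypothetical; the
real implementation would go through the Ganeti RPC layer and config
objects)::

  CEPH_CONF = "/etc/ceph/ceph.conf"

  def InitDistributedStorage(nodegroup, rpc, storage_params):
    """Initialize a node-group as a Ceph storage cluster (sketch)."""
    for node in nodegroup.nodes:
      # Refuse nodes that already belong to a Ceph cluster.
      if rpc.call_file_exists(node, CEPH_CONF):
        raise RuntimeError("Node %s already has %s" %
                           (node.name, CEPH_CONF))

    for node in nodegroup.nodes:
      # Send the storage parameters to the node daemon, which runs the
      # ``ceph`` CLI tool to start the appropriate services.
      rpc.call_init_distributed_storage(node, storage_params)
      node.distributed_storage = True

    # Finally, mark the node-group itself as storage-enabled.
    nodegroup.distributed_storage = True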

The storage cluster will operate at node-group level. The Ceph cluster
will be initialized using ``gnt-storage``, to which a new sub-command
``init-distributed-storage`` will be added.

The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC
is introduced to handle these calls.

A new object will be created to send the storage parameters to the
node: storage_type, devices, node_role (mon/osd), etc.
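
A minimal sketch of such an object (the class name is an assumption;
only the attributes listed above come from this design)::

  class DistributedStorageParams(object):
    """Storage parameters sent to a node (illustrative sketch).

    The real implementation would presumably be a serializable Ganeti
    config object rather than a plain class.
    """
    def __init__(self, storage_type, devices, node_role):
      self.storage_type = storage_type  # e.g. "ceph"
      self.devices = devices            # e.g. ["/dev/sdb"]
      self.node_role = node_role        # "mon" or "osd"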

A new node can be directly assigned to a storage-enabled node-group.
During the 'gnt-node add' process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to a storage-enabled node-group.
``gnt-node add --readd`` needs to be performed to issue the RPCs for
spawning the appropriate services on the newly assigned node.

Updated Commands
----------------

Following are the affected commands::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided which
apply at cluster level::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will only be allowed when there is no
initialized storage cluster::

  $ gnt-storage init-distributed-storage -s{--storage-type} ceph \
      <node-group>

Ensure that no other node-group is configured as a distributed storage
cluster and configure Ceph on the specified node-group. If there is no
node in the node-group, it will only be marked as distributed storage
enabled and no action will be taken::

  $ gnt-group assign-nodes <group> <node>

It ensures that the node is offline if the specified node-group is
distributed storage capable. Ceph configuration of the newly assigned
node is not performed at this step::

  $ gnt-node --offline

If the node is part of a storage node-group, an offline call will
stop/remove the Ceph daemons::

  $ gnt-node add --readd

If the node is now part of the storage node-group, the init distributed
storage RPC is issued to the respective node. This step is required
after assigning a node to the storage-enabled node-group::

  $ gnt-node remove

A warning will be issued stating that the node is part of distributed
storage; mark it offline before removal.
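
The ``-S`` values above follow a simple ``type:key=value,...`` format.
As a rough illustration (a hypothetical helper, not existing Ganeti
code), parsing such a value could look like::

  def ParseStorageOption(value):
    """Split "-S" values like "ceph:disk=/dev/sdb,option=value".

    Returns the storage type and a dict of its options.
    """
    storage_type, _, rest = value.partition(":")
    options = {}
    if rest:
      for item in rest.split(","):
        key, _, val = item.partition("=")
        options[key] = val
    return storage_type, options

  # ParseStorageOption("ceph:disk=/dev/sdb,option=value")
  # => ("ceph", {"disk": "/dev/sdb", "option": "value"})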

Data collector for Ceph
-----------------------

TBD

Future Work
-----------

Due to the loopback bug in Ceph, one may run into daemon hangs while
performing writes to an RBD volume through the block device mapping.
This bug applies only when the RBD volume is stored on an OSD running
on the local node. To mitigate this issue, we can create storage pools
on different node-groups and access RBD volumes on different pools:
http://tracker.ceph.com/issues/3076

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: