============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

Objective
=========

The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into the following tasks.

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support.
- Allow configuration of Ceph RBDs through Ganeti.
- Write a data collector to monitor Ceph nodes.

Background
==========

Ceph RBD
--------

Ceph is a distributed storage system which provides data access as
files, objects and blocks. As part of this project, we're interested in
integrating Ceph's block device (RBD) directly with Qemu/KVM.

Primary components/daemons of Ceph:

- Monitor - Serves as the authentication point for clients.
- Metadata - Stores all the filesystem metadata (not configured here,
  as it is not required for RBD).
- OSD - Object storage devices. One daemon for each drive/location.

RBD support in Ganeti
---------------------

Currently, Ganeti supports RBD volumes on a pre-configured Ceph cluster.
This is enabled through RBD disk templates. These templates allow
access to RBD volumes through the RBD Linux driver. The volumes are
mapped to the host as local block devices which are then attached to
the instances. This method incurs additional overhead. We plan to
resolve it by using Qemu's RBD driver to enable direct access to RBD
volumes for KVM instances.

Also, Ganeti currently uses RBD volumes on a pre-configured Ceph
cluster. Allowing configuration of Ceph nodes through Ganeti will be a
good addition to its prime features.


Qemu/KVM Direct RBD Integration
===============================

A new disk param ``access`` is introduced. It's added at
cluster/node-group level to simplify prototype implementation.
It will specify the access method either as ``userspace`` or
``kernelspace``. It's accessible to StartInstance() in hv_kvm.py. The
device path, ``rbd:<pool>/<vol_name>``, is generated by RADOSBlockDevice
and is added to the params dictionary as ``kvm_dev_path``.

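As a rough illustration of the ``access`` param described above, the
following sketch picks the path handed to KVM for a disk. The function
and constant names are hypothetical and are not Ganeti's actual code.
Falling back to ``kernelspace`` when ``access`` is unset is also an
assumption made only for this example.

```python
from typing import Dict

# Values of the ``access`` disk param named in this design.
ACCESS_USERSPACE = "userspace"
ACCESS_KERNELSPACE = "kernelspace"


def rbd_kvm_dev_path(pool: str, vol_name: str) -> str:
    """Build a path of the form ``rbd:<pool>/<vol_name>`` (hypothetical helper)."""
    return "rbd:%s/%s" % (pool, vol_name)


def select_disk_path(params: Dict[str, str], local_dev_path: str) -> str:
    """Return the disk path for KVM according to the ``access`` param.

    ``local_dev_path`` is the locally mapped block device. Defaulting to
    kernelspace access is an assumption of this sketch.
    """
    access = params.get("access", ACCESS_KERNELSPACE)
    if access == ACCESS_USERSPACE:
        # Direct access through Qemu's RBD driver.
        return params["kvm_dev_path"]
    if access == ACCESS_KERNELSPACE:
        # Access through the local block device mapping.
        return local_dev_path
    raise ValueError("unknown access mode: %r" % access)
```

Because the hypervisor side only reads ``kvm_dev_path`` from the params
dictionary, it needs no knowledge of how the path was built.
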
This approach ensures that no disk template specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (like Gluster).

Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but can be used by administrators and OS
scripts.

Updated commands
----------------

::

  $ gnt-instance info

``access:userspace/kernelspace`` will be added to the Disks category.
This output applies to KVM-based instances only.

Ceph configuration on Ganeti nodes
==================================

This document proposes configuration of a distributed storage
pool (Ceph or Gluster) through Ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup
is the installation of Ceph packages on all the concerned nodes.

At Ganeti cluster init, the user will set distributed-storage specific
options which will be stored at cluster level. The storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.

The following steps take place when a node-group is initialized as a
storage cluster.

- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
  file on each node.
- Fetch cluster configuration parameters and create a distributed
  storage object accordingly.
- Issue an 'init distributed storage' RPC to group nodes (if any).
- On each node, the ``ceph`` CLI tool will run the appropriate services.
- Mark the nodes as well as the node-group as
  distributed-storage-enabled.

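The steps above can be sketched roughly as follows. This is only an
illustration: ``NodeGroup``, ``init_distributed_storage`` and the two
callbacks are hypothetical names, the RPC is replaced by a plain
callable, and what should happen when an existing Ceph configuration is
found is left open, as it is in this design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodeGroup:
    """Hypothetical stand-in for a Ganeti node-group."""
    name: str
    nodes: List[str] = field(default_factory=list)
    storage_enabled: bool = False
    storage_nodes: List[str] = field(default_factory=list)


def init_distributed_storage(
    group: NodeGroup,
    cluster_params: Dict[str, str],
    has_ceph_conf: Callable[[str], bool],
    init_rpc: Callable[[str, Dict[str, object]], None],
) -> Dict[str, object]:
    """Walk through the initialization steps for one node-group."""
    # Step 1: check each node for an existing /etc/ceph/ceph.conf.
    preconfigured = [n for n in group.nodes if has_ceph_conf(n)]

    # Step 2: build the storage object from cluster-level parameters.
    storage: Dict[str, object] = {
        "storage_type": "ceph",
        "params": dict(cluster_params),
        "preconfigured_nodes": preconfigured,
    }

    # Steps 3 and 4: issue the init RPC to every node in the group; the
    # node side would start the services through the ceph CLI tool.
    for node in group.nodes:
        init_rpc(node, storage)
        # Step 5 (node part): mark the node as storage-enabled.
        group.storage_nodes.append(node)

    # Step 5 (group part): mark the group, even if it has no nodes.
    group.storage_enabled = True
    return storage
```

With an empty node-group the loop body never runs, so the group is only
marked as storage-enabled and no RPC is issued.
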
The storage cluster will operate at a node-group level. The Ceph
cluster will be initiated using ``gnt-storage``. A new sub-command
``init-distributed-storage`` will be added to it.

The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC is
introduced to handle the calls.

A new object will be created to send the storage parameters to the
node: storage_type, devices, node_role (mon/osd) etc.

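One possible shape for such an object is sketched below. The field names
``storage_type``, ``devices`` and ``node_role`` come from this design.
The class name ``StorageParams``, the default role and the
``to_dict()`` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Roles named in this design; "etc." suggests the real set may grow.
VALID_ROLES = ("mon", "osd")


@dataclass
class StorageParams:
    """Hypothetical container for the parameters sent to a node."""
    storage_type: str
    devices: List[str] = field(default_factory=list)
    node_role: str = "osd"  # default chosen only for this sketch

    def __post_init__(self) -> None:
        # Reject roles that this sketch does not know about.
        if self.node_role not in VALID_ROLES:
            raise ValueError("unknown node role: %r" % self.node_role)

    def to_dict(self) -> Dict[str, object]:
        """Return a plain dict, e.g. for serialization into an RPC call."""
        return asdict(self)
```

For example, ``StorageParams("ceph", ["/dev/sdb"], "osd").to_dict()``
yields a dictionary with the three fields listed above.
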
A new node can be directly assigned to the storage-enabled node-group.
During the 'gnt-node add' process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to a storage-enabled node-group.
``gnt-node add --readd`` needs to be performed to issue RPCs for
spawning appropriate services on the newly assigned node.

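The offline requirement amounts to a simple precondition check, sketched
below. The function name and its boolean arguments are hypothetical, and
the real check would operate on Ganeti's node and node-group objects.

```python
def check_node_assignment(node_offline: bool,
                          group_storage_enabled: bool) -> None:
    """Refuse to assign an online node to a storage-enabled node-group.

    Groups that are not storage-enabled accept any node in this sketch.
    """
    if group_storage_enabled and not node_offline:
        raise ValueError(
            "node must be offline before it is assigned to a "
            "storage-enabled node-group"
        )
```

Note that the check only guards the assignment. The Ceph services are
started later, when the node is re-added.
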
Updated Commands
----------------

The following commands are affected.

::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided which
apply at cluster level.

::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will be allowed only when there is no
initialized storage cluster.

::

  $ gnt-storage init-distributed-storage -s{--storage-type} ceph \
    <node-group>

This ensures that no other node-group is configured as a distributed
storage cluster and configures Ceph on the specified node-group. If
there is no node in the node-group, it will only be marked as
distributed-storage-enabled and no action will be taken.

::

  $ gnt-group assign-nodes <group> <node>

This ensures that the node is offline if the specified node-group is
distributed-storage-capable. Ceph configuration on the newly assigned
node is not performed at this step.

::

  $ gnt-node --offline

If the node is part of the storage node-group, an offline call will
stop/remove the Ceph daemons.

::

  $ gnt-node add --readd

If the node is now part of the storage node-group, this issues the init
distributed storage RPC to the respective node. This step is required
after assigning a node to the storage-enabled node-group.

::

  $ gnt-node remove

A warning will be issued stating that the node is part of distributed
storage and must be marked offline before removal.

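The value passed to ``-S`` in the commands above has the form
``<type>:<key>=<value>,...``. A minimal sketch of how such a string
could be split is shown below. ``parse_storage_option`` is a
hypothetical helper, not an existing Ganeti function, and treating
duplicate keys as "last one wins" is an assumption.

```python
from typing import Dict, Tuple


def parse_storage_option(value: str) -> Tuple[str, Dict[str, str]]:
    """Split ``<type>:<key>=<value>,...`` into a type and an options dict."""
    storage_type, sep, rest = value.partition(":")
    if not sep or not storage_type:
        raise ValueError("expected '<type>:<options>', got %r" % value)

    options: Dict[str, str] = {}
    for item in rest.split(","):
        if not item:
            # Tolerate empty items such as a trailing comma.
            continue
        key, eq, val = item.partition("=")
        if not eq or not key:
            raise ValueError("expected 'key=value', got %r" % item)
        options[key] = val
    return storage_type, options
```

For instance, ``ceph:disk=/dev/sdb,option=value`` would be split into
the storage type ``ceph`` and the options ``disk`` and ``option``.
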
Data collector for Ceph
-----------------------

TBD

Future Work
-----------

Due to the loopback bug in Ceph, one may run into daemon hang issues
while performing writes to an RBD volume through block device mapping.
This bug is applicable only when the RBD volume is stored on the OSD
running on the local node. In order to mitigate this issue, we can
create storage pools on different node-groups and access RBD
volumes on different pools.
http://tracker.ceph.com/issues/3076

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: