Revision e77c026d

b/doc/design-ceph-ganeti-support.rst

============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

Objective
=========

The project aims to improve Ceph RBD support in Ganeti. It can be
primarily divided into the following tasks.

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support.
- Allow configuration of Ceph RBDs through Ganeti.
- Write a data collector to monitor Ceph nodes.

Background
==========

Ceph RBD
--------

Ceph is a distributed storage system which provides data access as
files, objects and blocks. As part of this project, we're interested in
integrating Ceph's block device (RBD) directly with Qemu/KVM.

Primary components/daemons of Ceph:

- Monitor - Serves as the authentication point for clients.
- Metadata - Stores all the filesystem metadata (not configured here,
  as it is not required for RBD).
- OSD - Object storage devices, one daemon per drive/location.

RBD support in Ganeti
---------------------

Currently, Ganeti supports RBD volumes on a pre-configured Ceph
cluster. This is enabled through the RBD disk template, which provides
access to RBD volumes via the RBD Linux kernel driver: the volumes are
mapped on the host as local block devices, which are then attached to
the instances. This method incurs additional overhead. We plan to
remove this overhead by using Qemu's RBD driver to give KVM instances
direct access to RBD volumes.

Also, Ganeti currently uses RBD volumes on a pre-configured Ceph
cluster. Allowing configuration of Ceph nodes through Ganeti would be a
good addition to its prime features.


Qemu/KVM Direct RBD Integration
===============================

A new disk parameter ``access`` is introduced. It is added at
cluster/node-group level to simplify the prototype implementation.
It specifies the access method as either ``userspace`` or
``kernelspace`` and is accessible to StartInstance() in hv_kvm.py. The
device path, ``rbd:<pool>/<vol_name>``, is generated by RADOSBlockDevice
and is added to the params dictionary as ``kvm_dev_path``.
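
As an illustration only, the sketch below shows how the hypervisor code
could choose the drive path based on this parameter. The helper name
``_GetRBDDriveArgument`` and the ``dev_path`` key are assumptions made
for the sketch, not part of hv_kvm.py; the design only specifies that
``kvm_dev_path`` carries the ``rbd:<pool>/<vol_name>`` string::

  def _GetRBDDriveArgument(params):
    """Return the value of KVM's "-drive file=" option for one RBD disk.

    'params' is the disk params dictionary; 'kvm_dev_path' holds the
    rbd:<pool>/<vol_name> string produced by RADOSBlockDevice, while
    'dev_path' (an assumed key) is the kernel-mapped local device.
    """
    if params.get("access") == "userspace":
      # Userspace access: let Qemu talk to the Ceph cluster via librbd.
      return params["kvm_dev_path"]
    # Kernelspace access: fall back to the local block device mapping.
    return params["dev_path"]

  # Example:
  #   _GetRBDDriveArgument({"access": "userspace",
  #                         "kvm_dev_path": "rbd:rbd/disk0",
  #                         "dev_path": "/dev/rbd0"}) -> "rbd:rbd/disk0"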

  
This approach ensures that no disk-template-specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (like Gluster).

Note that the RBD volume is still mapped as a local block device as
before. The local mapping is not used during instance operation in the
``userspace`` access mode, but it can be used by administrators and OS
scripts.

Updated commands
----------------

::

  $ gnt-instance info

``access:userspace/kernelspace`` will be added to the Disks category.
This output applies to KVM-based instances only.

Ceph configuration on Ganeti nodes
==================================

This document proposes the configuration of a distributed storage
pool (Ceph or Gluster) through Ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup is
the installation of the Ceph packages on all the concerned nodes.

At Ganeti cluster init, the user will set distributed-storage-specific
options which will be stored at cluster level. The storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.

The following steps take place when a node-group is initialized as a
storage cluster.

  - Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
    file on each node.
  - Fetch cluster configuration parameters and create a distributed
    storage object accordingly.
  - Issue an 'init distributed storage' RPC to the group nodes (if any).
  - On each node, the ``ceph`` CLI tool will start the appropriate
    services.
  - Mark the nodes as well as the node-group as
    distributed-storage-enabled.
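
A minimal node-side sketch of this flow is given below, assuming a
hypothetical RPC handler name (``InitDistributedStorageOnNode``) and
payload layout; it only illustrates the ordering of the steps above,
not the actual node daemon code::

  import os

  CEPH_CONF = "/etc/ceph/ceph.conf"

  def InitDistributedStorageOnNode(storage_params):
    """Hypothetical handler for the 'init distributed storage' RPC."""
    # Check for an existing Ceph cluster on this node.
    if os.path.exists(CEPH_CONF):
      raise RuntimeError("Node already belongs to a Ceph cluster")

    # Decide which services the ceph CLI tool should start on this
    # node; the actual invocation is deliberately left out of the sketch.
    services = ["ceph-%s" % role for role in storage_params["node_role"]]

    # Marking the node and the node-group as distributed-storage-enabled
    # happens on the master side after all nodes have answered.
    return services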

  
The storage cluster will operate at node-group level. The Ceph
cluster will be initiated using gnt-storage. A new sub-command
``init-distributed-storage`` will be added to it.

The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC
is introduced to handle the calls.

A new object will be created to send the storage parameters to the
node: storage_type, devices, node_role (mon/osd), etc.
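
As an illustration only, this object could look like the sketch below;
any field or method beyond storage_type, devices and node_role is an
assumption, not part of the design::

  class DistributedStorageParams(object):
    """Parameters sent to a node daemon joining the storage cluster."""

    def __init__(self, storage_type, devices, node_role, options=None):
      self.storage_type = storage_type    # e.g. "ceph"
      self.devices = devices              # e.g. ["/dev/sdb"]
      self.node_role = node_role          # e.g. ["mon", "osd"]
      self.options = options or {}        # extra option=value pairs

    def ToDict(self):
      """Serialize the parameters for the RPC payload."""
      return {
        "storage_type": self.storage_type,
        "devices": self.devices,
        "node_role": self.node_role,
        "options": self.options,
      }

  # Example: a node acting as both monitor and OSD.
  #   DistributedStorageParams("ceph", ["/dev/sdb"], ["mon", "osd"]).ToDict()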

  
A new node can be directly assigned to the storage-enabled node-group.
During the 'gnt-node add' process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to a storage-enabled node-group.
``gnt-node --readd`` needs to be performed to issue the RPCs for
spawning the appropriate services on the newly assigned node.

Updated Commands
----------------

The following commands are affected::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided which
apply at cluster level::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will be allowed only when there is no
initialized storage cluster::

  $ gnt-storage init-distributed-storage -s{--storage-type} ceph \
    <node-group>

Ensure that no other node-group is configured as a distributed storage
cluster and configure Ceph on the specified node-group. If there is no
node in the node-group, it will only be marked as distributed-storage
enabled and no action will be taken::

  $ gnt-group assign-nodes <group> <node>

This ensures that the node is offline if the specified node-group is
distributed-storage capable. Ceph configuration on the newly assigned
node is not performed at this step::

  $ gnt-node --offline

If the node is part of a storage node-group, the offline call will
stop/remove the Ceph daemons::

  $ gnt-node add --readd

If the node is now part of the storage node-group, the init distributed
storage RPC is issued to the respective node. This step is required
after assigning a node to the storage-enabled node-group::

  $ gnt-node remove

A warning will be issued stating that the node is part of distributed
storage; it must be marked offline before removal.

Data collector for Ceph
-----------------------

TBD

Future Work
-----------

Due to the loopback bug in Ceph, one may run into daemon hangs while
performing writes to an RBD volume through the block device mapping.
This bug applies only when the RBD volume is stored on an OSD running
on the local node. In order to mitigate this issue, we can create
storage pools on different node-groups and access RBD volumes on
different pools.
http://tracker.ceph.com/issues/3076

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:

b/doc/design-draft.rst

   design-cmdlib-unittests.rst
   design-hotplug.rst
   design-optables.rst
   design-ceph-ganeti-support.rst

.. vim: set textwidth=72 :
.. Local Variables:
