======================================
Ganeti shared storage support for 2.3+
======================================

This document describes the changes in Ganeti 2.3+ compared to the
Ganeti 2.3 storage model.

.. contents:: :depth: 4

Objective
=========

The aim is to introduce support for externally mirrored, shared storage.
This includes two distinct disk templates:

- A shared filesystem containing instance disks as regular files,
  typically residing on a networked or cluster filesystem (e.g. NFS,
  AFS, Ceph, OCFS2).
- Instance images stored on shared block devices, typically LUNs
  residing on a SAN appliance.

Background
==========

DRBD is currently the only shared storage backend supported by Ganeti.
DRBD offers the advantages of high availability while running on
commodity hardware at the cost of high network I/O for block-level
synchronization between hosts. DRBD's master-slave model has greatly
influenced Ganeti's design, primarily by introducing the concept of
primary and secondary nodes and thus defining an instance's “mobility
domain”.

Although DRBD has many advantages, many sites choose to use networked
storage appliances for Virtual Machine hosting, such as SAN and/or NAS,
which provide shared storage without the administrative overhead of DRBD
or the limitation of a 1:1 master-slave setup. Furthermore, new
distributed filesystems such as Ceph are becoming viable alternatives to
expensive storage appliances. Support for both modes of operation, i.e.
shared block storage and shared file storage backends, would make Ganeti
a robust choice for high-availability virtualization clusters.

Throughout this document, the term “externally mirrored storage” will
refer to both modes of shared storage, indicating that Ganeti does not
need to take care of the mirroring process from one host to another.

Use cases
=========

We consider the following use cases:

- A virtualization cluster with Fibre Channel shared storage, mapping at
  least one LUN per instance, accessible by the whole cluster.
- A virtualization cluster with instance images stored as files on an
  NFS server.
- A virtualization cluster storing instance images on a Ceph volume.

Design Overview
===============

The design addresses the following procedures:

- Refactoring of all code referring to constants.DTS_NET_MIRROR.
- Obsolescence of the primary-secondary concept for externally mirrored
  storage.
- Introduction of a shared file storage disk template for use with
  networked filesystems.
- Introduction of a shared block device disk template with device
  adoption.

Additionally, mid- to long-term goals include:

- Support for external “storage pools”.
- Introduction of an interface for communicating with external scripts,
  providing methods for the various stages of a block device's and
  instance's life-cycle. In order to provide storage provisioning
  capabilities for various SAN appliances, external helpers in the form
  of a “storage driver” may also be introduced.

Refactoring of all code referring to constants.DTS_NET_MIRROR
=============================================================

Currently, all storage-related decision-making depends on a number of
frozensets in lib/constants.py, typically constants.DTS_NET_MIRROR.
However, constants.DTS_NET_MIRROR is used to signify two different
attributes:

- A storage device that is shared
- A storage device whose mirroring is supervised by Ganeti

We propose the introduction of two new frozensets to ease
decision-making:

- constants.DTS_EXT_MIRROR, holding externally mirrored disk templates
- constants.DTS_MIRRORED, being the union of constants.DTS_EXT_MIRROR
  and constants.DTS_INT_MIRROR.

Additionally, DTS_NET_MIRROR will be renamed to DTS_INT_MIRROR to reflect
the status of the storage as internally mirrored by Ganeti.

Thus, checks could be grouped into the following categories:

- Mobility checks, like whether an instance failover or migration is
  possible, should check against constants.DTS_MIRRORED.
- Syncing actions should be performed only for templates in
  constants.DTS_INT_MIRROR.
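
The following Python sketch illustrates the proposed grouping and the two
kinds of checks. The template name strings are assumptions chosen for
illustration, and `can_move` and `needs_sync` are hypothetical helper
names standing in for the mobility and syncing checks described above::

  # Hypothetical disk template names, for illustration only.
  DT_PLAIN = "plain"
  DT_DRBD8 = "drbd"
  DT_FILE = "file"
  DT_SHARED_FILE = "sharedfile"
  DT_BLOCK = "blockdev"

  # Templates whose mirroring Ganeti supervises (formerly DTS_NET_MIRROR).
  DTS_INT_MIRROR = frozenset([DT_DRBD8])

  # Templates whose mirroring is handled outside Ganeti.
  DTS_EXT_MIRROR = frozenset([DT_SHARED_FILE, DT_BLOCK])

  # Any template that allows an instance to run on more than one node.
  DTS_MIRRORED = DTS_INT_MIRROR | DTS_EXT_MIRROR


  def can_move(disk_template):
    """Mobility check used before failover or migration."""
    return disk_template in DTS_MIRRORED


  def needs_sync(disk_template):
    """Syncing only makes sense when Ganeti drives the mirroring itself."""
    return disk_template in DTS_INT_MIRROR

With this split, a shared block device can be migrated (it is in
DTS_MIRRORED) while Ganeti never tries to resynchronize it (it is not in
DTS_INT_MIRROR).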

Obsolescence of the primary-secondary node model
================================================

The primary-secondary node concept has primarily evolved through the use
of DRBD. In a globally shared storage framework without need for
external sync (e.g. SAN, NAS, etc.), such a notion does not apply for the
following reasons:

1. Access to the storage does not necessarily imply different roles for
   the nodes (e.g. primary vs secondary).
2. The same storage is available to potentially more than 2 nodes. Thus,
   an instance backed by a SAN LUN, for example, may actually migrate to
   any of the other nodes and not just a pre-designated failover node.

The proposed solution is to use the iallocator framework for run-time
decision making during migration and failover, for instances with disk
templates in constants.DTS_EXT_MIRROR. Modifications to gnt-instance and
gnt-node will be required to accept target node and/or iallocator
specification for these operations. Modifications of the iallocator
protocol will be required to address at least the following needs:

- Allocation tools must be able to distinguish between internal and
  external storage.
- Migration/failover decisions must take into account shared storage
  availability.
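
A minimal sketch of how failover/migration target selection could differ
between the two template families follows. It assumes the DTS_INT_MIRROR
and DTS_EXT_MIRROR grouping proposed earlier (with hypothetical template
names), and uses a trivial "most free memory" policy purely as a stand-in
for a real iallocator; `select_target_node` and its parameters are
hypothetical::

  # Hypothetical template groupings, for illustration only.
  DTS_INT_MIRROR = frozenset(["drbd"])
  DTS_EXT_MIRROR = frozenset(["sharedfile", "blockdev"])


  def select_target_node(disk_template, primary, secondary, group_nodes,
                         free_memory, target_node=None):
    """Pick a failover/migration target for an instance.

    group_nodes: names of the nodes that can reach the instance's storage.
    free_memory: dict mapping node name to free memory in MiB.
    target_node: optional explicit target given by the administrator.
    """
    if disk_template in DTS_INT_MIRROR:
      # DRBD-style storage: the only possible target is the secondary.
      if target_node not in (None, secondary):
        raise ValueError("target must be the secondary node %s" % secondary)
      return secondary

    if disk_template not in DTS_EXT_MIRROR:
      raise ValueError("disk template %s is not mirrored" % disk_template)

    # Externally mirrored storage: any other node with access will do.
    candidates = [n for n in group_nodes if n != primary]
    if target_node is not None:
      if target_node not in candidates:
        raise ValueError("node %s cannot host the instance" % target_node)
      return target_node
    if not candidates:
      raise ValueError("no target node available")
    # Stand-in for an iallocator: prefer the node with most free memory.
    return max(candidates, key=lambda n: free_memory.get(n, 0))

For DRBD the target is fixed by the secondary node, whereas for
externally mirrored templates the candidate set grows to every node with
access to the storage, which is where an iallocator adds value.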

Introduction of a shared file disk template
===========================================

Basic shared file storage support can be implemented by creating a new
disk template based on the existing FileStorage class, with only minor
modifications in lib/bdev.py. The shared file disk template relies on a
shared filesystem (e.g. NFS, AFS, Ceph, OCFS2 over SAN or DRBD) being
mounted on all nodes under the same path, where instance images will be
saved.

A new cluster initialization option is added to specify the mountpoint
of the shared filesystem.
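
Because every node computes instance disk paths relative to the same
mountpoint, a given disk file resolves to the same location on every
node. The sketch below assumes a hypothetical
``<mountpoint>/<instance name>/disk<N>`` layout (this document does not
specify one); `shared_file_disk_path` and `shared_dir_is_mounted` are
hypothetical helpers::

  import os.path


  def shared_file_disk_path(shared_dir, instance_name, disk_index):
    """Return the path of an instance disk below the shared mountpoint.

    The "<shared_dir>/<instance_name>/disk<N>" layout is an assumption.
    """
    root = os.path.normpath(shared_dir)
    path = os.path.normpath(os.path.join(root, instance_name,
                                         "disk%d" % disk_index))
    # Refuse names such as "../etc" that would escape the shared directory.
    if os.path.commonpath([root, path]) != root:
      raise ValueError("invalid instance name %r" % instance_name)
    return path


  def shared_dir_is_mounted(shared_dir):
    """Per-node sanity check that the shared filesystem is mounted."""
    return os.path.ismount(shared_dir)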

The remainder of this document deals with shared block storage.

Introduction of a shared block device template
==============================================

Basic shared block device support will be implemented with an additional
disk template. This disk template will not feature any kind of storage
control (provisioning, removal, resizing, etc.), but will instead rely
on the adoption of already-existing block devices (e.g. SAN LUNs, NBD
devices, remote iSCSI targets, etc.).

The shared block device template will make the following assumptions:

- The adopted block device has a consistent name across all nodes,
  enforced e.g. via udev rules.
- The device will be available with the same path on all nodes in the
  node group.
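
Since the template adopts existing devices rather than creating them, a
natural per-node check is that the agreed path exists and is a block
device. The sketch below is hypothetical (`is_block_device` is not an
existing Ganeti function); it follows symlinks such as those created by
udev, so it could be run on every node of the group to verify the
assumptions above::

  import os
  import stat


  def is_block_device(dev_path):
    """True if dev_path (following symlinks) is an existing block device.

    Symlinks such as udev-managed names under /dev/disk/by-id are
    resolved by os.stat, so the same dev_path works on every node.
    """
    try:
      st = os.stat(dev_path)
    except OSError:
      # Missing path or dangling symlink: the device is not available.
      return False
    return stat.S_ISBLK(st.st_mode)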

Long-term shared storage goals
==============================

Storage pool handling
---------------------

A new cluster configuration attribute will be introduced, named
“storage_pools”, modeled as a dictionary mapping storage pools to
external storage drivers (see below), e.g.::

 {
  "nas1": "foostore",
  "nas2": "foostore",
  "cloud1": "barcloud",
 }

Ganeti will not interpret the contents of this dictionary, although it
will provide methods for manipulating it under some basic constraints
(pool identifier uniqueness, driver existence). The manipulation of
storage pools will be performed by implementing new options to the
`gnt-cluster` command::

 gnt-cluster modify --add-pool nas1 foostore
 gnt-cluster modify --remove-pool nas1 # Only possible if no instances
                                       # are using the pool
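
The constraints mentioned above (unique pool identifiers, existing
drivers, no instances using a pool that is being removed) can be sketched
as follows. The function names are hypothetical, and the set of known
drivers and the set of pools in use are assumptions passed in as plain
arguments, since this document does not say how Ganeti would obtain
them::

  def add_pool(pools, pool, driver, known_drivers):
    """Return a copy of pools with pool -> driver added.

    Enforces the two basic constraints: unique pool names and an
    existing (known) storage driver.
    """
    if pool in pools:
      raise ValueError("storage pool %s already exists" % pool)
    if driver not in known_drivers:
      raise ValueError("unknown storage driver %s" % driver)
    new_pools = dict(pools)
    new_pools[pool] = driver
    return new_pools


  def remove_pool(pools, pool, pools_in_use):
    """Return a copy of pools without pool, if no instance uses it."""
    if pool not in pools:
      raise ValueError("unknown storage pool %s" % pool)
    if pool in pools_in_use:
      raise ValueError("storage pool %s is still in use" % pool)
    return dict((name, drv) for name, drv in pools.items() if name != pool)

Returning new dictionaries instead of mutating the existing one keeps the
validation separate from the point where the configuration is written.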

Furthermore, the storage pools will be used to indicate storage
availability to different node groups, thus specifying the instances'
“mobility domain”.
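
One way to derive an instance's mobility domain from pool availability
is sketched below. The per-group pool mapping and the node-to-group
mapping are hypothetical data structures; this document does not define
how pool availability will be stored::

  def mobility_domain(pool, group_pools, node_group):
    """Return the sorted names of nodes that can run instances in pool.

    group_pools: node group name -> set of pools available to that group.
    node_group: node name -> name of the node group it belongs to.
    """
    return sorted(node for node, group in node_group.items()
                  if pool in group_pools.get(group, ()))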

New disk templates will also be necessary to facilitate the use of external
storage. The proposed addition is a whole template namespace created by
prefixing the pool names with a fixed string, e.g. “ext:”, forming names
like “ext:nas1”, “ext:cloud1”.
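
Resolving such a template name to its pool and driver is then a simple
lookup. The helper below is hypothetical and assumes the “ext:” prefix
and the example `storage_pools` dictionary shown earlier::

  EXT_TEMPLATE_PREFIX = "ext:"  # assumed fixed prefix

  STORAGE_POOLS = {
    "nas1": "foostore",
    "nas2": "foostore",
    "cloud1": "barcloud",
  }


  def resolve_ext_template(disk_template, storage_pools):
    """Map e.g. "ext:nas1" to ("nas1", "foostore").

    Returns None for templates outside the external namespace.
    """
    if not disk_template.startswith(EXT_TEMPLATE_PREFIX):
      return None
    pool = disk_template[len(EXT_TEMPLATE_PREFIX):]
    if pool not in storage_pools:
      raise ValueError("unknown storage pool %s" % pool)
    return pool, storage_pools[pool]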

Interface to the external storage drivers
-----------------------------------------

In addition to external storage pools, a new interface will be
introduced to allow external scripts to provision and manipulate shared
storage.

In order to provide storage provisioning and manipulation (e.g. growing,
renaming) capabilities, each instance's disk template may be associated
with an external “storage driver” which, based on the instance's
configuration and tags, will perform all supported storage operations
using auxiliary means (e.g. XML-RPC, ssh, etc.).

A “storage driver” will have to provide the following methods:

- Create a disk
- Remove a disk
- Rename a disk
- Resize a disk
- Attach a disk to a given node
- Detach a disk from a given node

The proposed storage driver architecture borrows heavily from the OS
interface and follows a one-script-per-function approach. A storage
driver is expected to provide the following scripts:

- `create`
- `resize`
- `rename`
- `remove`
- `attach`
- `detach`
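
By analogy with OS definitions, a driver could be a directory holding one
executable per operation. The sketch below assumes that layout (this
document does not fix a directory location, so the path is a parameter)
and reports which required scripts are missing or not executable;
`missing_driver_scripts` is a hypothetical helper::

  import os

  # One script per storage operation, as listed above.
  DRIVER_SCRIPTS = ("create", "resize", "rename", "remove", "attach",
                    "detach")


  def missing_driver_scripts(driver_dir):
    """Return the required scripts that are absent or not executable."""
    missing = []
    for name in DRIVER_SCRIPTS:
      path = os.path.join(driver_dir, name)
      if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        missing.append(name)
    return missing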

These executables will be called once for each disk with no arguments,
and all required information will be passed through environment
variables. The following environment variables will always be present on
each invocation:

- `INSTANCE_NAME`: The instance's name.
- `INSTANCE_UUID`: The instance's UUID.
- `INSTANCE_TAGS`: The instance's tags.
- `DISK_INDEX`: The current disk index.
- `LOGICAL_ID`: The disk's logical ID (if it exists).
- `POOL`: The storage pool the instance belongs to.

Additional variables may be available in a per-script context (see
below).
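
A minimal sketch of assembling this environment for one disk follows.
The helper name is hypothetical, and two details are assumptions not
specified here: tags are sorted and joined with spaces into
`INSTANCE_TAGS`, and `LOGICAL_ID` is simply omitted when the disk has
none. Per-script variables such as `DISK_SIZE` are merged in through
`extra`::

  def build_driver_env(instance_name, instance_uuid, tags, disk_index,
                       pool, logical_id=None, extra=None):
    """Build the environment for one storage driver invocation."""
    env = {
      "INSTANCE_NAME": instance_name,
      "INSTANCE_UUID": instance_uuid,
      # Assumed format: space-separated, sorted for determinism.
      "INSTANCE_TAGS": " ".join(sorted(tags)),
      "DISK_INDEX": str(disk_index),
      "POOL": pool,
    }
    if logical_id is not None:
      env["LOGICAL_ID"] = logical_id
    # Per-script variables, e.g. {"DISK_SIZE": 10240} for create/resize.
    # Environment values must be strings, hence the conversion.
    for key, value in (extra or {}).items():
      env[key] = str(value)
    return env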

Of particular importance is the disk's logical ID, which will act as
glue between Ganeti and the external storage drivers; there are two
possible ways of using a disk's logical ID in a storage driver:

1. Simply use it as a unique identifier (e.g. UUID) and keep a separate,
   external database linking it to the actual storage.
2. Encode all useful storage information in the logical ID and have the
   driver decode it at runtime.
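
The second approach could look like the sketch below, which packs the
pool, instance UUID and disk index into a single string. The
``pool/uuid/diskN`` format is entirely hypothetical; a real driver is
free to choose any encoding, as long as its own scripts can decode it::

  def encode_logical_id(pool, instance_uuid, disk_index):
    """Build a self-describing logical ID, e.g. "nas1/<uuid>/disk0"."""
    for part in (pool, instance_uuid):
      if "/" in part:
        raise ValueError("'/' is reserved as a separator: %r" % part)
    return "%s/%s/disk%d" % (pool, instance_uuid, disk_index)


  def decode_logical_id(logical_id):
    """Inverse of encode_logical_id; returns (pool, uuid, disk_index)."""
    # Unpacking raises ValueError unless there are exactly three parts.
    pool, instance_uuid, disk = logical_id.split("/")
    if not disk.startswith("disk"):
      raise ValueError("malformed logical ID %r" % logical_id)
    return pool, instance_uuid, int(disk[len("disk"):])

The trade-off is that the first approach keeps the logical ID opaque but
needs external state, while the second needs no extra database but ties
the ID format to the driver.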

All scripts should exit with status 0 on success and with a non-zero
status on error, accompanied by an appropriate error message on stderr.
Furthermore, the following special cases are defined:

1. `create`: In case of success, a string representing the disk's
   logical ID must be printed on stdout; it will be saved in the
   instance's configuration and can later be used by the other scripts
   of the same storage driver. The logical ID may be based on the
   instance name, instance UUID and/or disk index.

   Additional environment variables present:

   - `DISK_SIZE`: The requested disk size in MiB.

2. `resize`: In case of success, the script should print the new disk
   size on stdout.

   Additional environment variables present:

   - `DISK_SIZE`: The requested disk size in MiB.

3. `rename`: On success, a new logical ID should be printed on stdout,
   which will replace the old one. This script is meant to rename the
   instance's backing store and update the disk's logical ID in case
   one of them is bound to the instance name.

   Additional environment variables present:

   - `NEW_INSTANCE_NAME`: The instance's new name.
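
The calling convention above can be sketched with Python's standard
`subprocess` module. The helper names are hypothetical, the default
`PATH` is an assumption, and the output parsing assumes `resize` prints
the size as a plain integer::

  import subprocess

  DEFAULT_PATH = "/usr/sbin:/usr/bin:/sbin:/bin"  # assumed minimal PATH


  class StorageDriverError(Exception):
    pass


  def run_driver_script(script_path, env):
    """Run one driver script with no arguments; return its stdout.

    A non-zero exit status is turned into StorageDriverError carrying
    the script's stderr, as required by the interface.
    """
    full_env = {"PATH": DEFAULT_PATH}
    full_env.update(env)
    proc = subprocess.Popen([script_path], env=full_env,
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    if proc.returncode != 0:
      raise StorageDriverError("%s failed (exit code %d): %s" %
                               (script_path, proc.returncode, err.strip()))
    return out


  def parse_logical_id(output):
    """Output of create/rename: a single non-empty logical ID."""
    logical_id = output.strip()
    if not logical_id:
      raise StorageDriverError("script returned an empty logical ID")
    return logical_id


  def parse_new_size(output):
    """Output of resize: the new disk size as an integer."""
    return int(output.strip())

A `create` call would then run the driver's `create` script with the
environment described above, including `DISK_SIZE`, and pass its stdout
to `parse_logical_id` before storing the result in the configuration.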

.. vim: set textwidth=72 :