==================
File-based Storage
==================

This page describes the proposed file-based storage for the 2.0 version
of Ganeti. The project consists of extending Ganeti to support a
filesystem image as a Virtual Block Device (VBD) in Dom0 as the primary
storage for a VM.

Objective
=========

Goals:

* file-based storage for virtual machines running in a Xen-based
  Ganeti cluster

* failover of file-based virtual machines between cluster nodes

* export/import of file-based virtual machines

* reuse of existing image files

* allow Ganeti to initialize the cluster without checking for a volume
  group (e.g. xenvg)

Non-goals:

* any kind of data mirroring between clusters for file-based instances
  (this should be achieved by using shared storage)

* special support for live migration

* encryption of VBDs

* compression of VBDs

Background
==========

Ganeti is a virtual server management software tool built on top of the
Xen VM monitor and other open source software.

Since Ganeti currently supports only block devices as the storage
backend for virtual machines, the wish came up to provide a file-based
backend. Such an option makes it possible to store VBDs on basically
every filesystem and therefore allows external data storage (e.g. SAN,
NAS) to be deployed in clusters.

Overview
========

Introduction
++++++++++++

Xen (and other hypervisors) can use a file as the primary storage for a
VM. One file represents one VBD.

Advantages/Disadvantages
++++++++++++++++++++++++

Advantages of file-backed VBDs:

* support of sparse allocation

* easy from a management/backup point of view (e.g. you can just copy
  the files around)

* external storage (e.g. SAN, NAS) can be used to store VMs

Disadvantages of file-backed VBDs:

* possible performance loss for I/O-intensive workloads

* using sparse files requires care to ensure the sparseness is
  preserved when copying, and there is no header in which metadata
  relating back to the VM can be stored
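
The copying caveat can be illustrated with GNU coreutils (the file name
and sizes below are illustrative, not part of the design):

```shell
# Create a ~100 MiB sparse test file (only one 1 KiB block is allocated).
dd if=/dev/zero of=vm1disk bs=1k seek=100k count=1

# GNU cp can keep the holes; a naive byte-for-byte copy allocates
# every block on the destination.
cp --sparse=always vm1disk vm1disk.sparse
cp --sparse=never  vm1disk vm1disk.full

# Allocated blocks: the sparse copy stays tiny, the full copy does not.
du -k vm1disk.sparse vm1disk.full
```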

Xen-related specifications
++++++++++++++++++++++++++

Driver
~~~~~~

There are several ways to realize the required functionality with an
underlying Xen hypervisor.

1) loopback driver
^^^^^^^^^^^^^^^^^^

Advantages:

* available in most precompiled kernels

* stable, since it has been in the kernel tree for a long time

* easy to set up

Disadvantages:

* buffers writes very aggressively, which can affect guest filesystem
  correctness in the event of a host crash

* can even cause out-of-memory kernel crashes in Dom0 under heavy
  write load

* substantial slowdowns under heavy I/O workloads

* the default number of supported loop devices is only 8

* doesn't support QCOW files
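
The loop-device limit can be raised via the ``max_loop`` module
parameter; the value below is only an illustration:

```
# /etc/modprobe.d/loop.conf -- raise the loop-device limit
# (or pass max_loop=64 on the kernel command line if loop is built in)
options loop max_loop=64
```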

2) ``blktap`` driver
^^^^^^^^^^^^^^^^^^^^

Advantages:

* higher performance than the loopback driver

* more scalable

* better safety properties for VBD data

* the Xen team strongly encourages its use

* already in the Xen tree

* supports QCOW files

* asynchronous driver (i.e. high performance)

Disadvantages:

* not enabled in most precompiled kernels

* stable, but not as well tested as the loopback driver

3) ``ublkback`` driver
^^^^^^^^^^^^^^^^^^^^^^

The Xen Roadmap states "Work is well under way to implement a
``ublkback`` driver that supports all of the various qemu file format
plugins".

Furthermore, the Roadmap includes the following:

  "... A special high-performance qcow plugin is also under
  development, that supports better metadata caching, asynchronous IO,
  and allows request reordering with appropriate safety barriers to
  enforce correctness. It remains both forward and backward compatible
  with existing qcow disk images, but makes adjustments to qemu's
  default allocation policy when creating new disks such as to
  optimize performance."

File types
~~~~~~~~~~

Raw disk image file
^^^^^^^^^^^^^^^^^^^

Advantages:

* resizing supported

* sparse files (filesystem dependent)

* simple and easily exportable

Disadvantages:

* the underlying filesystem needs to support sparse files (most
  filesystems do, though)

QCOW disk image file
^^^^^^^^^^^^^^^^^^^^

Advantages:

* smaller file size, even on filesystems which don't support holes
  (i.e. sparse files)

* snapshot support, where the image only represents changes made to an
  underlying disk image

* optional zlib-based compression

* optional AES encryption

Disadvantages:

* resizing not supported yet (it's on the way)

VMDK disk image file
^^^^^^^^^^^^^^^^^^^^

This file format is directly based on the qemu vmdk driver, which is
synchronous and thus slow.

Detailed Design
===============

Terminology
+++++++++++

* **VBD** (Virtual Block Device): Persistent storage available to a
  virtual machine, providing the abstraction of an actual block
  storage device. VBDs may be actual block devices, filesystem images,
  or remote/network storage.

* **Dom0** (Domain 0): The first domain to be started on a Xen
  machine. Domain 0 is responsible for managing the system.

* **VM** (Virtual Machine): The environment in which a hosted
  operating system runs, providing the abstraction of a dedicated
  machine. A VM may be identical to the underlying hardware (as in
  full virtualization), or it may differ (as in paravirtualization).
  In the case of Xen the domU (unprivileged domain) instance is meant.

* **QCOW**: QEMU (a processor emulator) image format.

Implementation
++++++++++++++

Managing file-based instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The option for file-based storage will be added to the ``gnt-instance``
utility.

Add Instance
^^^^^^^^^^^^

Example::

  gnt-instance add -t file:[path=[,driver=loop[,reuse[,...]]]] \
  --disk 0:size=5G --disk 1:size=10G -n node -o debian-etch instance2

This will create a file-based instance with e.g. the following files:

* ``/sda`` -> 5GB
* ``/sdb`` -> 10GB

The default directory where files will be stored is
``/srv/ganeti/file-storage/``. This can be changed by setting the
``<path>`` option. This option denotes the full path to the directory
where the files are stored. The file type will be "raw" for the first
release of Ganeti 2.0. However, the code will be extensible to more
file types, since Ganeti will store information about the file type of
each image file. Internally Ganeti will keep track of the used driver,
the file type and the full path to the file for every VBD. Example:
``"logical_id" : [FD_LOOP, FT_RAW, "/instance1/sda"]``

If the ``--reuse`` flag is set, Ganeti checks for existing files in the
corresponding directory (e.g. ``/xen/instance2/``). If one or more
files in this directory are present and correctly named (the naming
conventions will be defined in Ganeti version 2.0) Ganeti will set a
VM up with these. If no file can be found or the names are invalid the
operation will be aborted.

Remove instance
^^^^^^^^^^^^^^^

Instance removal will differ from the current implementation only in
deleting the VBD files instead of the corresponding block devices
(e.g. logical volumes).

Starting/Stopping Instance
^^^^^^^^^^^^^^^^^^^^^^^^^^

Nothing has to be changed here, as the Xen tools don't differentiate
between file-based and blockdevice-based instances in this case.

Export/Import instance
^^^^^^^^^^^^^^^^^^^^^^

Provided "dump/restore" is used in the "export" and "import" guest-OS
scripts, no modifications are needed when file-based instances are
exported/imported. If any other backup tool (which requires access to
the mounted file system) is used, the image file can be temporarily
mounted. This can be done in different ways:

Mount a raw image file via the loopback driver::

  mount -o loop /srv/ganeti/file-storage/instance1/sda1 /mnt/disk

Mount a raw image file via the blkfront driver (the Dom0 kernel needs
the corresponding module to do the following operation)::

  xm block-attach 0 tap:aio:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0
  mount /dev/xvda1 /mnt/disk

Mount a qcow image file via the blkfront driver (the Dom0 kernel needs
the corresponding module to do the following operation)::

  xm block-attach 0 tap:qcow:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0
  mount /dev/xvda1 /mnt/disk

High availability features with file-based instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Failing over an instance
^^^^^^^^^^^^^^^^^^^^^^^^

Failover is done in the same way as with block device backends. The
instance gets stopped on the primary node and started on the secondary.
The roles of primary and secondary get swapped. Note: if a failover is
done, Ganeti will assume that the corresponding VBD location (i.e.
directory) is the same on the source and destination node. If one or
more of the corresponding files are not present on the destination
node, Ganeti will abort the operation.

Replacing instance disks
^^^^^^^^^^^^^^^^^^^^^^^^

Since there is no data mirroring for file-backed VMs, there is no such
operation.

Evacuation of a node
^^^^^^^^^^^^^^^^^^^^

Since there is no data mirroring for file-backed VMs, there is no such
operation.

Live migration
^^^^^^^^^^^^^^

Live migration is possible using file-backed VBDs. However, the
administrator has to make sure that the corresponding files are exactly
the same on the source and destination nodes.

Xen Setup
+++++++++

File creation
~~~~~~~~~~~~~

Creating a raw file is simple. The following example creates a sparse
file of 2 GB; the ``seek`` option instructs ``dd`` to create a sparse
file::

  dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1
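
Whether the result is actually sparse can be verified by comparing the
apparent size with the number of blocks allocated on disk:

```shell
# Apparent size is ~2 GB, but only one 1 KiB block is allocated.
dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1
ls -l vm1disk    # apparent size
du -k vm1disk    # blocks actually allocated: only a few KiB
```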

Creation of QCOW image files can be done with the ``qemu-img`` utility
(in Debian it comes with the ``qemu`` package).
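
For example, a qcow counterpart to the raw file above could be created
as follows (file name and size are illustrative; recent qemu versions
default to the newer qcow2 format instead of the original qcow):

```shell
# Create a 2 GB disk image in qcow format (as used by blktap's tap:qcow).
qemu-img create -f qcow vm1disk.qcow 2G
```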

Config file
~~~~~~~~~~~

The Xen config file will have the following modifications if one
chooses the file-based disk template:

1) loopback driver and raw file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

  disk = ['file:</path/to/file>,sda1,w']

2) blktap driver and raw file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

  disk = ['tap:aio:</path/to/file>,sda1,w']

3) blktap driver and qcow file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

  disk = ['tap:qcow:</path/to/file>,sda1,w']

Other hypervisors
+++++++++++++++++

Other hypervisors mostly have different ways of making storage
available to their virtual instances/machines. This is beyond the
scope of this document.