=============================================================================
Management of storage types and disk templates, incl. storage space reporting
=============================================================================

.. contents:: :depth: 4

Background
==========

Currently, there is no consistent management of different variants of storage
in Ganeti. One direct consequence is that storage space reporting is currently
broken for all storage that is not based on LVM technology. This design looks
at the root causes and proposes a way to fix it.

Proposed changes
================

We propose to streamline handling of different storage types and disk
templates. Currently, there is no consistent implementation for dis/enabling
of disk templates and/or storage types.

Our idea is to introduce a list of enabled disk templates, which can be
used by instances in the cluster. Based on this list, we want to provide
storage reporting mechanisms for the available disk templates. Since some
disk templates share the same underlying storage technology (for example
``drbd`` and ``plain`` are based on ``lvm``), we map disk templates to storage
types and implement storage space reporting for each storage type.

Configuration changes
---------------------

Add a new attribute "enabled_disk_templates" (type: list of strings) to the
cluster config which holds the disk templates, for example "drbd", "file",
or "ext". This attribute represents the list of disk templates that are
enabled cluster-wide for usage by the instances. It will not be possible to
create instances with a disk template that is not enabled, nor will it be
possible to remove a disk template from the list while there are still
instances using it.

The list of enabled disk templates can contain any non-empty subset of
the currently implemented disk templates: ``blockdev``, ``diskless``, ``drbd``,
``ext``, ``file``, ``plain``, ``rbd``, and ``sharedfile``. See
``DISK_TEMPLATES`` in ``constants.py``.
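
The following sketch illustrates the checks implied by the two preceding
paragraphs. The constant's value is taken from the list above; the function
name, signature, and error handling are illustrative assumptions, not the
actual implementation::

  DISK_TEMPLATES = frozenset([
    "blockdev", "diskless", "drbd", "ext", "file", "plain", "rbd",
    "sharedfile"])

  def CheckEnabledDiskTemplates(enabled_disk_templates, used_disk_templates):
    """Validate a proposed value of 'enabled_disk_templates'.

    @param enabled_disk_templates: proposed list of enabled disk templates
    @param used_disk_templates: set of templates used by existing instances

    """
    if not enabled_disk_templates:
      raise ValueError("At least one disk template must be enabled")
    unknown = set(enabled_disk_templates) - DISK_TEMPLATES
    if unknown:
      raise ValueError("Unknown disk template(s): %s" %
                       ", ".join(sorted(unknown)))
    # A disk template may not be disabled while instances still use it
    still_used = used_disk_templates - set(enabled_disk_templates)
    if still_used:
      raise ValueError("Disk template(s) still in use by instances: %s" %
                       ", ".join(sorted(still_used)))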

Note that the above-mentioned list of enabled disk templates is just a
"mechanism" parameter that defines which disk templates the cluster can use.
Further filtering of what is allowed can go into the ipolicy, which is not
covered in this design doc. Note that it is possible to force an instance to
use a disk template that is not allowed by the ipolicy. This is not possible
if the template is not enabled by the cluster.

The ipolicy also contains a list of enabled disk templates. Since the
cluster-wide enabled disk templates should be a stronger constraint, the list
of enabled disk templates in the ipolicy should be a subset of those. In case
the user tries to create an inconsistent situation here, ``gnt-cluster``
should display this as an error.
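
A minimal sketch of this subset check (the function name and error reporting
are assumptions, not the actual ``gnt-cluster`` code)::

  def CheckIPolicyDiskTemplates(ipolicy_disk_templates,
                                enabled_disk_templates):
    """Reject ipolicy disk templates that are not enabled cluster-wide."""
    not_enabled = set(ipolicy_disk_templates) - set(enabled_disk_templates)
    if not_enabled:
      raise ValueError("ipolicy allows disk template(s) that are not enabled"
                       " on the cluster: %s" % ", ".join(sorted(not_enabled)))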

We consider the first disk template in the list to be the default template for
instance creation and storage reporting. This will remove the need to specify
the disk template with ``-t`` on instance creation. Note: it would be better
to take the default disk template from the node-group-specific ipolicy.
However, when using the iallocator, the nodegroup can only be determined from
the node, which is chosen by the iallocator, which in turn needs the disk
template first. To solve this chicken-and-egg problem, we first need to extend
``gnt-instance add`` to accept a nodegroup in the first place.
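
Expressed as code, the default simply falls out of the ordering of the list
(the helper name is hypothetical)::

  def GetDefaultDiskTemplate(enabled_disk_templates):
    """Return the cluster-wide default disk template."""
    return enabled_disk_templates[0]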

Currently, cluster-wide dis/enabling of disk templates is not implemented
consistently. ``lvm``-based disk templates are enabled by specifying a volume
group name on cluster initialization and can only be disabled by explicitly
using the option ``--no-lvm-storage``. This will be replaced by adding/removing
``drbd`` and ``plain`` from the set of enabled disk templates.

The option ``--no-drbd-storage`` is also subsumed by dis/enabling the
disk template ``drbd`` on the cluster.

Up until now, file storage and shared file storage could be dis/enabled at
``./configure`` time. This will also be replaced by adding/removing the
respective disk templates from the set of enabled disk templates.

There is currently no possibility to dis/enable the disk templates
``diskless``, ``blockdev``, ``ext``, and ``rbd``. By introducing the set of
enabled disk templates, we will require these disk templates to be explicitly
enabled in order to be used. The idea is that the administrator of the cluster
can tailor the cluster configuration to what is actually needed in the
cluster. There is hope that this will lead to cleaner code, better
performance, and fewer bugs.

When upgrading the configuration from a version that did not have the list
of enabled disk templates, we have to decide which disk templates are enabled
based on the current configuration of the cluster. We propose the following
update logic to be implemented in the online update of the config in
the ``Cluster`` class in ``objects.py`` (a sketch of this logic is given
below):

- If a ``volume_group_name`` exists, then enable ``drbd`` and ``plain``.
- If ``file`` or ``sharedfile`` was enabled at configure time, add the
  respective disk template to the list of enabled disk templates.
- For the disk templates ``diskless``, ``blockdev``, ``ext``, and ``rbd``, we
  inspect the current cluster configuration regarding whether or not there
  are instances that use one of those disk templates. We will add only those
  that are currently in use.

The order in which the list of enabled disk templates is built up will be
determined by a preference order based on when in the history of Ganeti the
disk templates were introduced (thus being a heuristic for which are used
more than others).
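
The following sketch illustrates this upgrade step. It is an illustration
only; the helper name, the way the configure-time settings are passed in, and
the exact preference order are assumptions, not the actual implementation::

  # A heuristic preference order; the exact order shown here is illustrative.
  DISK_TEMPLATE_PREFERENCE = ["drbd", "plain", "file", "sharedfile",
                              "blockdev", "diskless", "rbd", "ext"]

  def UpgradeEnabledDiskTemplates(cluster, file_enabled, sharedfile_enabled,
                                  used_disk_templates):
    """Derive 'enabled_disk_templates' when upgrading an old configuration.

    @param cluster: the Cluster configuration object
    @param file_enabled: whether file storage was enabled at configure time
    @param sharedfile_enabled: whether shared file storage was enabled
    @param used_disk_templates: set of templates used by existing instances

    """
    if getattr(cluster, "enabled_disk_templates", None):
      return  # already upgraded, nothing to do
    enabled = set()
    if cluster.volume_group_name:
      enabled.update(["drbd", "plain"])
    if file_enabled:
      enabled.add("file")
    if sharedfile_enabled:
      enabled.add("sharedfile")
    # diskless, blockdev, ext and rbd are only enabled if actually in use
    enabled.update(used_disk_templates &
                   set(["diskless", "blockdev", "ext", "rbd"]))
    cluster.enabled_disk_templates = [t for t in DISK_TEMPLATE_PREFERENCE
                                      if t in enabled]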

The list of enabled disk templates can be specified on cluster initialization
with ``gnt-cluster init`` using the optional parameter
``--enabled-disk-templates``. If it is not set, it will be set to a default
set of enabled disk templates, which includes ``drbd`` and ``plain``. The
list can be shrunk or extended by ``gnt-cluster modify`` using the same
parameter.

Storage reporting
-----------------

The storage reporting in ``gnt-node list`` will be the first user of the
newly introduced list of enabled disk templates. Currently, storage reporting
works only for lvm-based storage. We want to extend that and report storage
for the enabled disk templates. By default, ``gnt-node list`` will only report
on storage of the default disk template (the first in the list of enabled
disk templates). One can explicitly ask for storage reporting on the other
enabled disk templates with the ``-o`` option.

Some of the currently implemented disk templates share the same base storage
technology. Since the storage reporting is based on the underlying technology
rather than on the user-facing disk templates, we introduce storage types to
represent the underlying technology. There will be a mapping from disk
templates to storage types, which will be used by the storage reporting
backend to pick the right method for estimating the storage for the different
disk templates.

The proposed storage types are ``blockdev``, ``diskless``, ``ext``, ``file``,
``lvm-pv``, ``lvm-vg``, ``rados``.

The mapping from disk templates to storage types will be: ``drbd`` and
``plain`` to ``lvm-vg``, ``file`` and ``sharedfile`` to ``file``, and all
others to their obvious counterparts.
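
Written out as a Python constant, the mapping would look roughly as follows
(the constant's name is illustrative; the actual name in ``constants.py`` may
differ)::

  # Mapping of disk templates to the storage type that backs them
  DISK_TEMPLATES_STORAGE_TYPE = {
    "blockdev": "blockdev",
    "diskless": "diskless",
    "drbd": "lvm-vg",
    "ext": "ext",
    "file": "file",
    "plain": "lvm-vg",
    "rbd": "rados",
    "sharedfile": "file",
  }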

Note that there is no disk template mapping to ``lvm-pv``, because this
storage type is currently only used to enable the user to mark physical
volumes as (un)allocatable (see ``man gnt-node``). It is not possible to
create an instance on a storage unit that is of type ``lvm-pv`` directly,
therefore it is not included in the mapping.

The storage reporting for file and sharedfile storage will report space
on the file storage dir, which is currently limited to one directory.
In the future, once there is support for more directories or for
per-nodegroup directories, this can be changed.

For now, we will implement storage reporting only for lvm-based and
file-based storage, that is, for the disk templates ``file``, ``sharedfile``,
``plain``, and ``drbd``. For the disk template ``diskless``, there is
obviously nothing to report about. When implementing storage reporting for
file, we can also use it for ``sharedfile``, since it uses the same file
system mechanisms to determine the free space. In the future, we can optimize
storage reporting for shared storage by not querying all nodes that use a
common shared file system for the same space information.
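
For file-based storage, the free space can be read off the file system that
hosts the file storage directory, for example via ``statvfs``. A minimal
sketch (the function name and the returned keys are illustrative, not the
actual interface of ``lib/storage.py``)::

  import os

  def GetFileStorageSpaceInfo(path):
    """Return total and free space (in MiB) of the file system behind path."""
    st = os.statvfs(path)
    total = (st.f_blocks * st.f_frsize) // (1024 * 1024)
    free = (st.f_bfree * st.f_frsize) // (1024 * 1024)
    return {"storage_size": total, "storage_free": free}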

In the future, we will extend storage reporting to shared storage types like
``rados`` and ``ext``. Note that it will not make sense to query each node for
storage reporting on a storage unit that is used by several nodes.

We will not implement storage reporting for the ``blockdev`` disk template,
because block devices are always adopted after being provided by the system
administrator, thus coming from outside Ganeti. There is no point in storage
reporting for block devices, because Ganeti will never try to allocate storage
inside a block device.

RPC changes
-----------

The noded RPC call that reports node storage space will be changed to
accept a list of <storage_type>,<key> string tuples. For each of them, it will
report the free amount of storage space found on storage <key> as known
by the requested storage_type. Depending on the storage_type, the key would
be a volume group name in the case of lvm, a directory name for file-based
storage, and a rados pool name for rados storage.
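
The following sketch illustrates the shape of such a request and the per-type
dispatch on the node. The names and the exact payload format are assumptions
for illustration, not the actual RPC definition::

  # Example payload sent to noded: one (storage_type, key) pair per storage
  # unit that the master wants to be reported on.
  storage_units = [
    ("lvm-vg", "xenvg"),                   # volume group name
    ("file", "/srv/ganeti/file-storage"),  # file storage directory
    ("rados", "rbd"),                      # rados pool name
  ]

  def GetNodeStorageInfo(storage_units, reporters):
    """Report space information for each requested storage unit.

    @param reporters: mapping of storage type to a function that takes the
      key and returns the space information for that storage unit

    """
    return [reporters[stype](key) for (stype, key) in storage_units]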

Masterd will know, through the mapping of storage types to storage calculation
functions, which mechanism each storage type uses for storage calculation, and
it will invoke only the needed ones.

Note that for file and sharedfile the node knows which directories are allowed
and won't allow any other directory to be queried for security reasons. The
actual path still needs to be passed to distinguish the two, as the type will
be the same for both.

These calculations will be implemented in the node storage system
(currently ``lib/storage.py``), but querying will still happen through the
``node info`` call, to avoid requiring an extra RPC each time.

Ganeti reporting
----------------

``gnt-node list`` can be queried for the different disk templates, if they
are enabled. By default, it will just report information about the default
disk template. Examples::

  > gnt-node list
  Node                       DTotal DFree MTotal MNode MFree Pinst Sinst
  mynode1                      3.6T  3.6T  64.0G 1023M 62.2G     1     0
  mynode2                      3.6T  3.6T  64.0G 1023M 62.0G     2     1
  mynode3                      3.6T  3.6T  64.0G 1023M 62.3G     0     2

  > gnt-node list -o dtotal/drbd,dfree/file
  Node      DTotal (drbd, myvg) DFree (file, mydir)
  mynode1                 3.6T                    -
  mynode2                 3.6T                    -

Note that for drbd, we only report the space of the vg and only if it was not
renamed to something different from the default volume group name. With this
design, there is also no possibility to ask about the meta volume group. We
restrict the design here to make the transition to storage pools easier (as it
is an interim state only). It is the administrator's responsibility to ensure
that there is enough space for the meta volume group.

When storage pools are implemented, we will switch from referencing the disk
template to referencing the storage pool name. For that, of course, the pool
names need to be unique over all storage types. For drbd, we will use the
default 'drbd' storage pool and possibly a second lvm-based storage pool for
the metavg. It will be possible to rename storage pools (thus also the default
lvm storage pool). There will be new functionality to ask which storage pools
are available and of what type. Storage pools will have a storage pool type,
which is one of the disk templates. There can be more than one storage pool
based on the same disk template, therefore we will then start referencing the
storage pool name instead of the disk template.

Note: As of version 2.10, ``gnt-node list`` only reports storage space
information for the default disk template, as supporting more options
turned out to be not feasible without storage pools.

Besides ``gnt-node list``, storage space information is also
displayed in ``gnt-node list-storage``. This will also adapt to the
extended storage reporting capabilities. The user can specify a storage
type using ``--storage-type``. If the user requests storage information about
a storage type which does not support space reporting, a warning is
emitted. If no storage type is specified explicitly, ``gnt-node
list-storage`` will try to report storage on the storage type of the
default disk template. If the default disk template's storage type does
not support space reporting, an error message is emitted.

``gnt-cluster info`` will report which disk templates are enabled, i.e.
which ones are supported according to the cluster configuration. Example
output::

  > gnt-cluster info
  [...]
  Cluster parameters:
    - [...]
    - enabled disk templates: plain, drbd, sharedfile, rados
    - [...]

``gnt-node list-storage`` will not be affected by any changes, since this
design is restricted only to free storage reporting for non-shared storage
types.

Allocator changes
-----------------

The iallocator protocol doesn't need to change: since we know which
disk template an instance has, we'll pass only the "free" value for that
disk template to the iallocator, when asking for an allocation to be
made. Note that for DRBD nowadays we ignore the case when vg and metavg
are different, and we only consider the main volume group. Fixing this is
outside the scope of this design.

Although the iallocator protocol itself does not need to change, the
invocation of the iallocator needs quite some adaptation. So far, it
always requested LVM storage information, no matter whether that was the
disk template to be considered for the allocation. For instance
allocation, this is the disk template of the instance.
TODO: consider other allocator requests.

With this design, we ensure forward compatibility with respect to storage
pools. For now, we'll report space for all available disk templates that
are based on non-shared storage types; in the future, for all available
storage pools.

Rebalancing changes
-------------------

Hbal will not need changes, as it already handles this. We don't forecast
any changes needed to it.

Space reporting changes
-----------------------

By default, Hspace will report by assuming the allocation will happen on
the default disk template for the cluster/nodegroup. An option will be added
to manually specify a different storage.

Interactions with Partitioned Ganeti
------------------------------------

The design for :doc:`Partitioned Ganeti <design-partitioned>` also deals
with reporting free space. Partitioned Ganeti has a different way to
report free space for LVM on nodes where the ``exclusive_storage`` flag
is set. That doesn't interact directly with this design, as the specifics
of how the free space is computed are not in the scope of this design.
But the ``node info`` call contains the value of the
``exclusive_storage`` flag, which is currently only meaningful for the
LVM storage type. Additional flags like the ``exclusive_storage`` flag
for lvm might be useful for other disk templates / storage types as well.
We therefore extend the RPC call from <storage_type>,<key> to
<storage_type>,<key>,[<param>] to include any disk-template-specific
(or storage-type-specific) parameters in the RPC call.
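
A sketch of the extended request format (the concrete parameter encoding is
an assumption for illustration)::

  # Each storage unit is extended from (storage_type, key) to
  # (storage_type, key, params); params carries storage-type-specific
  # settings, e.g. the exclusive_storage flag for LVM.
  storage_units = [
    ("lvm-vg", "xenvg", [True]),               # exclusive_storage enabled
    ("file", "/srv/ganeti/file-storage", []),  # no extra parameters
  ]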

The reporting of free spindles, also part of Partitioned Ganeti, is not
covered by this design doc, as those are seen as a separate resource.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: