=============================================================================
Management of storage types and disk templates, incl. storage space reporting
=============================================================================

    
.. contents:: :depth: 4

    
Background
==========

    
Currently, there is no consistent management of different variants of storage
in Ganeti. One direct consequence is that storage space reporting is currently
broken for all storage that is not based on LVM technology. This design looks
at the root causes and proposes a way to fix it.

    
Proposed changes
================

    
We propose to streamline handling of different storage types and disk templates.
Currently, there is no consistent implementation for enabling or disabling
disk templates and/or storage types.

    
Our idea is to introduce a list of enabled disk templates, which can be
used by instances in the cluster. Based on this list, we want to provide
storage reporting mechanisms for the available disk templates. Since some
disk templates share the same underlying storage technology (for example
``drbd`` and ``plain`` are based on ``lvm``), we map disk templates to storage
types and implement storage space reporting for each storage type.

    
Configuration changes
---------------------

    
Add a new attribute ``enabled_disk_templates`` (type: list of strings) to the
cluster config which holds disk templates, for example, ``drbd``, ``file``,
or ``ext``. This attribute represents the list of disk templates that are
enabled cluster-wide for usage by the instances. It will not be possible to
create instances with a disk template that is not enabled, nor will it be
possible to remove a disk template from the list if there are still instances
using it.
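
The two restrictions above could be checked along the following lines. This is
only a minimal sketch: the function names and the ``instance_templates``
argument (a mapping from instance name to disk template) are hypothetical and
are not part of Ganeti's actual code.

.. code-block:: python

  # Hypothetical helpers illustrating the two rules; not Ganeti's real API.

  def check_instance_creation(enabled_disk_templates, disk_template):
      """Reject instance creation with a disk template that is not enabled."""
      if disk_template not in enabled_disk_templates:
          raise ValueError("Disk template '%s' is not enabled in this cluster"
                           % disk_template)


  def check_new_enabled_list(new_enabled, instance_templates):
      """Reject an update that would disable a template still in use.

      instance_templates maps instance names to their disk template.
      """
      in_use = set(instance_templates.values())
      missing = sorted(in_use - set(new_enabled))
      if missing:
          raise ValueError("Disk templates still in use by instances: %s"
                           % ", ".join(missing))

Both helpers raise ``ValueError`` on a violation and return ``None`` otherwise,
so they can be called before the configuration is modified.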

    
The list of enabled disk templates can contain any non-empty subset of
the currently implemented disk templates: ``blockdev``, ``diskless``, ``drbd``,
``ext``, ``file``, ``plain``, ``rbd``, and ``sharedfile``. See
``DISK_TEMPLATES`` in ``constants.py``.

    
Note that the abovementioned list of enabled disk templates is just a
"mechanism" parameter that defines which disk templates the cluster can use.
Further filtering about what's allowed can go in the ipolicy, which is not
covered in this design doc. Note that it is possible to force an instance to
use a disk template that is not allowed by the ipolicy. This is not possible
if the template is not enabled by the cluster.

    
FIXME: In what way should verification between the enabled disk templates in
the cluster and in the ipolicy take place?

    
We consider the first disk template in the list to be the default template for
instance creation and storage reporting. This will remove the need to specify
the disk template with ``-t`` on instance creation.

    
Currently, cluster-wide enabling and disabling of disk templates is not
implemented consistently. ``lvm``-based disk templates are enabled by
specifying a volume group name on cluster initialization and can only be
disabled by explicitly using the option ``--no-lvm-storage``. This will be
replaced by adding/removing ``drbd`` and ``plain`` from the set of enabled
disk templates.

    
Up to now, file storage and shared file storage could be enabled or disabled
at ``./configure`` time. This will also be replaced by adding/removing the
respective disk templates from the set of enabled disk templates.

    
There is currently no possibility to enable or disable the disk templates
``diskless``, ``blockdev``, ``ext``, and ``rbd``. By introducing the set of
enabled disk templates, we will require these disk templates to be explicitly
enabled in order to be used. The idea is that the administrator of the cluster
can tailor the cluster configuration to what is actually needed in the cluster.
We hope that this will lead to cleaner code, better performance and fewer
bugs.

    
When upgrading the configuration from a version that did not have the list
of enabled disk templates, we have to decide which disk templates are enabled
based on the current configuration of the cluster. We propose the following
update logic to be implemented in the online update of the config in
the ``Cluster`` class in ``objects.py``:

- If a ``volume_group_name`` exists, then enable ``drbd`` and ``plain``.
  (TODO: can we narrow that down further?)
- If ``file`` or ``sharedfile`` was enabled at configure time, add the
  respective disk template to the list of enabled disk templates.
- For disk templates ``diskless``, ``blockdev``, ``ext``, and ``rbd``, we
  inspect the current cluster configuration regarding whether or not there
  are instances that use one of those disk templates. We will add only those
  that are currently in use.

The order in which the list of enabled disk templates is built up will be
determined by a preference order based on when in the history of Ganeti the
disk templates were introduced (thus being a heuristic for which are used
more than others).
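
The upgrade rules above can be sketched as a small function. Note the
assumptions: the function name, its arguments, and in particular the concrete
``PREFERENCE_ORDER`` are illustrative placeholders, since this design does not
spell out the actual historical order of the disk templates.

.. code-block:: python

  # Assumed preference order (oldest template first). The design does not
  # state the real order, so treat this list as a placeholder.
  PREFERENCE_ORDER = ["drbd", "plain", "file", "sharedfile",
                      "blockdev", "diskless", "ext", "rbd"]

  LVM_TEMPLATES = frozenset(["drbd", "plain"])
  USAGE_BASED_TEMPLATES = frozenset(["diskless", "blockdev", "ext", "rbd"])


  def upgrade_enabled_disk_templates(volume_group_name, file_enabled,
                                     sharedfile_enabled, templates_in_use):
      """Derive the enabled disk templates for a pre-existing cluster."""
      enabled = set()
      # Rule 1: a configured volume group enables the LVM-based templates.
      if volume_group_name:
          enabled |= LVM_TEMPLATES
      # Rule 2: carry over what was enabled at configure time.
      if file_enabled:
          enabled.add("file")
      if sharedfile_enabled:
          enabled.add("sharedfile")
      # Rule 3: the remaining templates are enabled only if instances use them.
      enabled |= USAGE_BASED_TEMPLATES & set(templates_in_use)
      # Build the final list according to the preference order.
      return [t for t in PREFERENCE_ORDER if t in enabled]

The first element of the returned list would then act as the default disk
template of the upgraded cluster.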

    
The list of enabled disk templates can be specified on cluster initialization
with ``gnt-cluster init`` using the optional parameter
``--enabled-disk-templates``. If it is not set, it will be set to a default
set of enabled disk templates, which includes the following disk templates:
``drbd`` and ``plain``. The list can be shrunk or extended by
``gnt-cluster modify`` using the same parameter.

    
Storage reporting
-----------------

    
The storage reporting in ``gnt-node list`` will be the first user of the
newly introduced list of enabled disk templates. Currently, storage reporting
works only for lvm-based storage. We want to extend that and report storage
for the enabled disk templates. The default of ``gnt-node list`` will only
report on storage of the default disk template (the first in the list of enabled
disk templates). One can explicitly ask for storage reporting on the other
enabled disk templates with the ``-o`` option.

    
Some of the currently implemented disk templates share the same base storage
technology. Since the storage reporting is based on the underlying technology
rather than on the user-facing disk templates, we introduce storage types to
represent the underlying technology. There will be a mapping from disk templates
to storage types, which will be used by the storage reporting backend to pick
the right method for estimating the storage for the different disk templates.

    
The proposed storage types are ``blockdev``, ``diskless``, ``ext``, ``file``,
``lvm-pv``, ``lvm-vg``, ``rados``.

    
The mapping from disk templates to storage types will be: ``drbd`` and ``plain``
to ``lvm-vg``, ``file`` and ``sharedfile`` to ``file``, and all others to their
obvious counterparts.
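
Written out as a table, the mapping could look like the sketch below. The
dictionary and function names are hypothetical, and the entry ``rbd`` to
``rados`` is our reading of "obvious counterparts", since ``rados`` is the only
proposed storage type without a disk template of the same name.

.. code-block:: python

  # Illustrative mapping from disk templates to storage types.
  # Names are assumptions; "rbd" -> "rados" is an interpretation of the text.
  DISK_TEMPLATE_TO_STORAGE_TYPE = {
      "blockdev": "blockdev",
      "diskless": "diskless",
      "drbd": "lvm-vg",
      "ext": "ext",
      "file": "file",
      "plain": "lvm-vg",
      "rbd": "rados",
      "sharedfile": "file",
  }


  def storage_types_for(enabled_disk_templates):
      """Return the distinct storage types behind the given disk templates.

      The order of first appearance is preserved, so the storage type of the
      default (first) disk template comes first.
      """
      result = []
      for template in enabled_disk_templates:
          storage_type = DISK_TEMPLATE_TO_STORAGE_TYPE[template]
          if storage_type not in result:
              result.append(storage_type)
      return result

With such a table, the reporting backend needs one space-estimation method per
storage type rather than one per disk template.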

    
Note that there is no disk template mapping to ``lvm-pv``, because this storage
type is currently only used to enable the user to mark it as (un)allocatable.
(See ``man gnt-node``.) It is not possible to create an instance on a storage
unit that is of type ``lvm-pv`` directly, therefore it is not included in the
mapping.

    
The storage reporting for file storage will report space on the file storage
dir, which is currently limited to one directory. In the future, if we have
support for more directories, or for per-nodegroup directories, this can be
changed.

    
For now, we will implement only the storage reporting for non-shared storage,
that is, disk templates ``file``, ``plain``, and ``drbd``. For disk template
``diskless``, there is obviously nothing to report. When implementing
storage reporting for ``file``, we can also use it for ``sharedfile``, since it
uses the same file system mechanisms to determine the free space. In the
future, we can optimize storage reporting for shared storage by not querying
all nodes that use a common shared file for the same space information.

    
In the future, we will extend storage reporting for shared storage types like
``rados`` and ``ext``. Note that it will not make sense to query each node for
storage reporting on a storage unit that is used by several nodes.

    
We will not implement storage reporting for the ``blockdev`` disk template,
because block devices are always adopted after being provided by the system
administrator, thus coming from outside Ganeti. There is no point in storage
reporting for block devices, because Ganeti will never try to allocate storage
inside a block device.

    
RPC changes
-----------

    
The noded RPC call that reports node storage space will be changed to
accept a list of <disktemplate>,<key> string tuples. For each of them, it will
report the free amount of storage space found on storage <key> as known
by the requested disk template. Depending on the disk template, the key would
be a volume group name in case of lvm-based disk templates, a directory name
for the file and shared file storage, and a rados pool name for rados storage.
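
The request handling described above might be sketched as follows. This is not
the actual noded code: the function names, the result format, and the choice
of MiB as unit are assumptions, and only the ``file`` storage type is covered
because the other types would have to call external tools.

.. code-block:: python

  import shutil

  # Assumed mapping, restricted to the templates handled in this sketch.
  TEMPLATE_TO_STORAGE_TYPE = {"file": "file", "sharedfile": "file"}


  def _file_free_mib(directory):
      """Return the free space (in MiB) of the file system holding directory."""
      return shutil.disk_usage(directory).free // (1024 * 1024)


  # One reporter per storage type; "lvm-vg" and "rados" are left out here
  # because they would need to invoke external tools.
  FREE_SPACE_REPORTERS = {"file": _file_free_mib}


  def report_free_space(requests):
      """Answer a list of (disk_template, key) requests.

      Returns (disk_template, key, free_mib) tuples; free_mib is None when
      this sketch has no reporter for the template's storage type.
      """
      results = []
      for disk_template, key in requests:
          storage_type = TEMPLATE_TO_STORAGE_TYPE.get(disk_template)
          reporter = FREE_SPACE_REPORTERS.get(storage_type)
          free_mib = reporter(key) if reporter is not None else None
          results.append((disk_template, key, free_mib))
      return results

Because ``file`` and ``sharedfile`` share one storage type, both requests end
up in the same reporter and differ only in the directory passed as key.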

    
Masterd will know through the mapping of disk templates to storage types which
storage type uses which mechanism for storage calculation and invoke only the
needed ones.

    
Note that for file and sharedfile the node knows which directories are allowed
and won't allow any other directory to be queried for security reasons. The
actual path still needs to be passed to distinguish the two, as the type will
be the same for both.
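
A node-side check of this kind could look like the sketch below. The function
name and the exact-match semantics after path normalization are assumptions,
not a description of how the node daemon actually validates paths.

.. code-block:: python

  import os.path


  def check_allowed_directory(path, allowed_dirs):
      """Return the normalized path if it is an allowed storage directory.

      Raises ValueError otherwise. Only exact matches (after normalization)
      are accepted in this sketch; subdirectories are rejected.
      """
      normalized = os.path.normpath(path)
      allowed = [os.path.normpath(d) for d in allowed_dirs]
      if normalized not in allowed:
          raise ValueError("Directory '%s' may not be queried" % path)
      return normalized

Normalizing before the comparison means that a request containing ``..``
components is compared by the directory it actually refers to.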

    
These calculations will be implemented in the node storage system
(currently ``lib/storage.py``) but querying will still happen through the
``node info`` call, to avoid requiring an extra RPC each time.

    
Ganeti reporting
----------------

    
``gnt-node list`` can be queried for the different disk templates, if they
are enabled. By default, it will just report information about the default
disk template. Examples::

    
  > gnt-node list
  Node                       DTotal DFree MTotal MNode MFree Pinst Sinst
  mynode1                      3.6T  3.6T  64.0G 1023M 62.2G     1     0
  mynode2                      3.6T  3.6T  64.0G 1023M 62.0G     2     1
  mynode3                      3.6T  3.6T  64.0G 1023M 62.3G     0     2

    
  > gnt-node list -o dtotal/drbd,dfree/file
  Node      DTotal (drbd, myvg) DFree (file, mydir)
  mynode1                 3.6T                    -
  mynode2                 3.6T                    -
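
How the ``-o`` field names from the second example could be split into a field
and a disk template is sketched below. The parser and its names are
hypothetical; the only thing taken from this design is the
``<field>/<disktemplate>`` form and the fallback to the default disk template.

.. code-block:: python

  # Fields that accept a "/<disktemplate>" suffix in this sketch.
  STORAGE_FIELDS = frozenset(["dtotal", "dfree"])


  def parse_storage_field(field, default_template):
      """Split one field name into (name, disk_template).

      Storage fields without a suffix refer to the default disk template;
      all other fields carry no disk template (None).
      """
      name, sep, template = field.partition("/")
      if name not in STORAGE_FIELDS:
          if sep:
              raise ValueError("Field '%s' takes no disk template" % name)
          return (name, None)
      return (name, template if sep else default_template)


  def parse_output_fields(option_value, default_template):
      """Parse the comma-separated value given to the -o option."""
      return [parse_storage_field(f, default_template)
              for f in option_value.split(",")]

For instance, ``parse_output_fields("dtotal/drbd,dfree/file", "plain")`` yields
the pairs ``("dtotal", "drbd")`` and ``("dfree", "file")``.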

    
Note that for drbd, we only report the space of the vg and only if it was not
renamed to something different from the default volume group name. With this
design, there is also no possibility to ask about the meta volume group. We
restrict the design here to make the transition to storage pools easier (as it
is an interim state only). It is the administrator's responsibility to ensure
that there is enough space for the meta volume group.

    
When storage pools are implemented, we will switch from referencing the disk
template to referencing the storage pool name. For that, of course, the pool
names need to be unique over all storage types. For drbd, we will use the
default 'drbd' storage pool and possibly a second lvm-based storage pool for
the metavg. It will be possible to rename storage pools (thus also the default
lvm storage pool). There will be new functionality to ask about what storage
pools are available and of what type. Storage pools will have a storage pool
type which is one of the disk templates. There can be more than one storage
pool based on the same disk template, therefore we will then start referencing
the storage pool name instead of the disk template.

    
``gnt-cluster info`` will report which disk templates are enabled, i.e.
which ones are supported according to the cluster configuration. Example
output::

    
  > gnt-cluster info
  [...]
  Cluster parameters:
    - [...]
    - enabled disk templates: plain, drbd, sharedfile, rbd
    - [...]

    
``gnt-node list-storage`` will not be affected by any changes, since this design
is restricted only to free storage reporting for non-shared storage types.

    
Allocator changes
-----------------

    
The iallocator protocol doesn't need to change: since we know which
disk template an instance has, we'll pass only the "free" value for that
disk template to the iallocator when asking for an allocation to be
made. Note that for DRBD we currently ignore the case when vg and metavg
are different, and we only consider the main volume group. Fixing this is
outside the scope of this design.

    
With this design, we ensure forward-compatibility with respect to storage
pools. For now, we'll report space for all available disk templates that
are based on non-shared storage types; in the future, for all available
storage pools.

    
Rebalancing changes
-------------------

    
Hbal will not need changes, as it already handles this case. We don't
foresee any changes needed to it.

    
Space reporting changes
-----------------------

    
By default, Hspace will report under the assumption that the allocation
will happen on the default disk template for the cluster/nodegroup. An
option will be added to manually specify a different storage.

    
Interactions with Partitioned Ganeti
------------------------------------

    
The design for :doc:`Partitioned Ganeti <design-partitioned>` also deals
with reporting free space. Partitioned Ganeti has a different way to
report free space for LVM on nodes where the ``exclusive_storage`` flag
is set. That doesn't interact directly with this design, as the specifics
of how the free space is computed are not in the scope of this design.
But the ``node info`` call contains the value of the
``exclusive_storage`` flag, which is currently only meaningful for the
LVM storage type. Additional flags like the ``exclusive_storage`` flag
for lvm might be useful for other disk templates / storage types as well.
We therefore extend the RPC call with <disktemplate>,<key> to
<disktemplate>,<key>,<params> to include any disk-template-specific
(or storage-type specific) parameters in the RPC call.
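
One possible shape for the extended request is sketched below. The
``StorageQuery`` name, the use of a dictionary for ``<params>``, and the helper
function are assumptions made for illustration; this design only fixes that a
third, template-specific element is added.

.. code-block:: python

  import collections

  # Hypothetical container for one <disktemplate>,<key>,<params> request.
  StorageQuery = collections.namedtuple(
      "StorageQuery", ["disk_template", "key", "params"])


  def make_storage_query(disk_template, key, exclusive_storage=False):
      """Build a request triple, adding LVM-specific parameters if needed.

      The exclusive_storage flag is only attached for the lvm-based disk
      templates, where it is currently meaningful.
      """
      params = {}
      if disk_template in ("drbd", "plain"):
          params["exclusive_storage"] = exclusive_storage
      return StorageQuery(disk_template, key, params)

Templates without specific parameters simply carry an empty ``params``
dictionary, so the element count of the triple stays the same for all of them.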

    
The reporting of free spindles, also part of Partitioned Ganeti, is not
covered by this design doc, as those are seen as a separate resource.

    
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: