=============================================================================
Management of storage types and disk templates, incl. storage space reporting
=============================================================================

.. contents:: :depth: 4

Background
==========

Currently, there is no consistent management of different variants of storage
in Ganeti. One direct consequence is that storage space reporting is currently
broken for all storage that is not based on lvm technology. This design looks
at the root causes and proposes a way to fix it.

Proposed changes
================

We propose to streamline handling of different storage types and disk templates.
Currently, there is no consistent implementation for enabling and disabling
disk templates and storage types.

Our idea is to introduce a list of enabled disk templates, which can be
used by instances in the cluster. Based on this list, we want to provide
storage reporting mechanisms for the available disk templates. Since some
disk templates share the same underlying storage technology (for example,
``drbd`` and ``plain`` are based on ``lvm``), we map disk templates to storage
types and implement storage space reporting for each storage type.

Configuration changes
---------------------

Add a new attribute "enabled_disk_templates" (type: list of strings) to the
cluster config which holds disk templates, for example, "drbd", "file",
or "ext". This attribute represents the list of disk templates that are enabled
cluster-wide for usage by the instances. It will not be possible to create
instances with a disk template that is not enabled, nor will it be possible
to remove a disk template from the list if there are still instances using it.

The list of enabled disk templates can contain any non-empty subset of
the currently implemented disk templates: ``blockdev``, ``diskless``, ``drbd``,
``ext``, ``file``, ``plain``, ``rbd``, and ``sharedfile``. See
``DISK_TEMPLATES`` in ``constants.py``.

Note that the above-mentioned list of enabled disk templates is just a
"mechanism" parameter that defines which disk templates the cluster can use.
Further filtering about what's allowed can go in the ipolicy, which is not
covered in this design doc. Note that it is possible to force an instance to
use a disk template that is not allowed by the ipolicy. This is not possible
if the template is not enabled by the cluster.

The ipolicy also contains a list of enabled disk templates. Since the
cluster-wide enabled disk templates should be a stronger constraint, the list
of enabled disk templates in the ipolicy should be a subset of those. In case
the user tries to create an inconsistent situation here, gnt-cluster should
emit a warning.
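The constraints described so far can be sketched as follows. This is a minimal
illustration, not Ganeti code: the function names, the exception type and the
plain-list representation of the configuration are assumptions made for this
example.

```python
# Illustrative sketch only; the names below are hypothetical, not Ganeti APIs.

# Disk templates listed in this design (see DISK_TEMPLATES in constants.py).
DISK_TEMPLATES = frozenset([
    "blockdev", "diskless", "drbd", "ext", "file", "plain", "rbd",
    "sharedfile",
])


class DiskTemplateError(Exception):
    """Raised when a request violates the cluster-wide list."""


def check_new_enabled_list(new_enabled, templates_in_use):
    """Validate a proposed list of enabled disk templates."""
    if not new_enabled:
        raise DiskTemplateError("at least one disk template must be enabled")
    unknown = set(new_enabled) - DISK_TEMPLATES
    if unknown:
        raise DiskTemplateError("unknown disk templates: %s" %
                                ", ".join(sorted(unknown)))
    # A template cannot be removed while instances still use it.
    still_used = set(templates_in_use) - set(new_enabled)
    if still_used:
        raise DiskTemplateError("instances still use: %s" %
                                ", ".join(sorted(still_used)))


def check_instance_template(template, enabled):
    """Refuse instance creation with a template that is not enabled."""
    if template not in enabled:
        raise DiskTemplateError("disk template '%s' is not enabled" % template)


def ipolicy_warnings(ipolicy_templates, enabled):
    """Return one warning per ipolicy template the cluster does not enable."""
    return ["ipolicy allows '%s', which is not enabled cluster-wide" % t
            for t in ipolicy_templates if t not in enabled]
```

The first two checks are hard errors, while the ipolicy check only produces
warnings, mirroring the distinction made in the text.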

We consider the first disk template in the list to be the default template for
instance creation and storage reporting. This will remove the need to specify
the disk template with ``-t`` on instance creation. Note: It would be
better to take the default disk template from the node-group-specific
ipolicy. However, when using the iallocator, the nodegroup can only be
determined from the node, which is chosen by the iallocator, which in
turn needs the disk template first. To solve this chicken-and-egg problem,
we first need to extend ``gnt-instance add`` to accept a nodegroup in the
first place.

Currently, cluster-wide dis/enabling of disk templates is not implemented
consistently. ``lvm``-based disk templates are enabled by specifying a volume
group name on cluster initialization and can only be disabled by explicitly
using the option ``--no-lvm-storage``. This will be replaced by adding
``drbd`` and ``plain`` to, or removing them from, the set of enabled disk
templates.

Until now, file storage and shared file storage could be dis/enabled at
``./configure`` time. This will also be replaced by adding the respective
disk templates to, or removing them from, the set of enabled disk templates.

There is currently no possibility to dis/enable the disk templates
``diskless``, ``blockdev``, ``ext``, and ``rbd``. By introducing the set of
enabled disk templates, we will require these disk templates to be explicitly
enabled in order to be used. The idea is that the administrator of the cluster
can tailor the cluster configuration to what is actually needed in the cluster.
There is hope that this will lead to cleaner code, better performance and fewer
bugs.

When upgrading the configuration from a version that did not have the list
of enabled disk templates, we have to decide which disk templates are enabled
based on the current configuration of the cluster. We propose the following
update logic to be implemented in the online update of the config in
the ``Cluster`` class in ``objects.py``:

- If a ``volume_group_name`` exists, then enable ``drbd`` and ``plain``.
  (TODO: can we narrow that down further?)
- If ``file`` or ``sharedfile`` was enabled at configure time, add the
  respective disk template to the list of enabled disk templates.
- For disk templates ``diskless``, ``blockdev``, ``ext``, and ``rbd``, we
  inspect the current cluster configuration regarding whether or not there
  are instances that use one of those disk templates. We will add only those
  that are currently in use.

The order in which the list of enabled disk templates is built up will be
determined by a preference order based on when in the history of Ganeti the
disk templates were introduced (thus being a heuristic for which are used
more than others).
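The upgrade rules can be sketched roughly as follows. The function name, its
parameters and in particular the concrete ``PREFERENCE_ORDER`` are assumptions
made for illustration; this design only states that the order follows the
history of Ganeti, not which sequence that is.

```python
# Illustrative sketch only; not the actual objects.py implementation.

# Hypothetical preference order. The concrete sequence is an assumption.
PREFERENCE_ORDER = ["drbd", "plain", "file", "sharedfile", "blockdev",
                    "rbd", "ext", "diskless"]


def upgrade_enabled_disk_templates(volume_group_name, file_enabled,
                                   sharedfile_enabled, instance_templates):
    """Derive the enabled disk templates for a config that lacks the list.

    instance_templates is an iterable with the disk template of every
    instance currently in the configuration.
    """
    enabled = set()
    if volume_group_name:
        enabled.update(["drbd", "plain"])
    if file_enabled:
        enabled.add("file")
    if sharedfile_enabled:
        enabled.add("sharedfile")
    # These templates are only enabled if some instance already uses them.
    in_use = set(instance_templates)
    for template in ("diskless", "blockdev", "ext", "rbd"):
        if template in in_use:
            enabled.add(template)
    # Emit the result in preference order; the first entry is the default.
    return [t for t in PREFERENCE_ORDER if t in enabled]
```

For example, ``upgrade_enabled_disk_templates("xenvg", False, False, [])``
yields ``["drbd", "plain"]`` under these assumptions. The sketch does not
handle the case where no rule matches and the resulting list would be empty.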

The list of enabled disk templates can be specified on cluster initialization
with ``gnt-cluster init`` using the optional parameter
``--enabled-disk-templates``. If it is not set, it will be set to a default
set of enabled disk templates, which includes the following disk templates:
``drbd`` and ``plain``. The list can be shrunk or extended by
``gnt-cluster modify`` using the same parameter.

Storage reporting
-----------------

The storage reporting in ``gnt-node list`` will be the first user of the
newly introduced list of enabled disk templates. Currently, storage reporting
works only for lvm-based storage. We want to extend that and report storage
for the enabled disk templates. The default of ``gnt-node list`` will only
report on storage of the default disk template (the first in the list of enabled
disk templates). One can explicitly ask for storage reporting on the other
enabled disk templates with the ``-o`` option.

Some of the currently implemented disk templates share the same base storage
technology. Since the storage reporting is based on the underlying technology
rather than on the user-facing disk templates, we introduce storage types to
represent the underlying technology. There will be a mapping from disk templates
to storage types, which will be used by the storage reporting backend to pick
the right method for estimating the storage for the different disk templates.

The proposed storage types are ``blockdev``, ``diskless``, ``ext``, ``file``,
``lvm-pv``, ``lvm-vg``, ``rados``.

The mapping from disk templates to storage types will be: ``drbd`` and ``plain``
to ``lvm-vg``, ``file`` and ``sharedfile`` to ``file``, and all others to their
obvious counterparts.
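Written out as a table, the mapping could look like the sketch below. The
constant and function names are hypothetical, and mapping ``rbd`` to ``rados``
is our reading of "obvious counterparts".

```python
# Illustrative sketch; the constant and function names are hypothetical.
DISK_TEMPLATE_TO_STORAGE_TYPE = {
    "blockdev": "blockdev",
    "diskless": "diskless",
    "drbd": "lvm-vg",
    "ext": "ext",
    "file": "file",
    "plain": "lvm-vg",
    "rbd": "rados",  # assumed to be the "obvious counterpart"
    "sharedfile": "file",
}


def storage_types_for(enabled_disk_templates):
    """Return the distinct storage types behind the given disk templates,
    in order of first appearance."""
    result = []
    for template in enabled_disk_templates:
        storage_type = DISK_TEMPLATE_TO_STORAGE_TYPE[template]
        if storage_type not in result:
            result.append(storage_type)
    return result
```

Because several disk templates collapse onto one storage type, a cluster with
``drbd`` and ``plain`` enabled needs only a single ``lvm-vg`` report.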

Note that there is no disk template mapping to ``lvm-pv``, because this storage
type is currently only used to enable the user to mark it as (un)allocatable.
(See ``man gnt-node``.) It is not possible to create an instance on a storage
unit that is of type ``lvm-pv`` directly, therefore it is not included in the
mapping.

The storage reporting for file storage will report space on the file storage
dir, which is currently limited to one directory. In the future, if we have
support for more directories, or for per-nodegroup directories, this can be
changed.

For now, we will implement only the storage reporting for non-shared storage,
that is, disk templates ``file``, ``plain``, and ``drbd``. For disk template
``diskless``, there is obviously nothing to report. When implementing
storage reporting for ``file``, we can also use it for ``sharedfile``, since it
uses the same file system mechanisms to determine the free space. In the
future, we can optimize storage reporting for shared storage by not querying
all nodes that use a common shared file for the same space information.

In the future, we will extend storage reporting to shared storage types like
``rados`` and ``ext``. Note that it will not make sense to query each node for
storage reporting on a storage unit that is used by several nodes.

We will not implement storage reporting for the ``blockdev`` disk template,
because block devices are always adopted after being provided by the system
administrator, thus coming from outside Ganeti. There is no point in storage
reporting for block devices, because Ganeti will never try to allocate storage
inside a block device.

RPC changes
-----------

The noded RPC call that reports node storage space will be changed to
accept a list of <storage_type>,<key> string tuples. For each of them, it will
report the amount of free storage space found on storage <key> as known
by the requested storage_type. Depending on the storage_type, the key would
be a volume group name in case of lvm, a directory name for the file-based
storage, and a rados pool name for rados storage.

Through the mapping of storage types to storage calculation functions, masterd
will know which mechanism each storage type uses for storage calculation and
will invoke only the needed ones.
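A dispatch table of this kind might look like the following sketch. All names
are hypothetical, the LVM function is only a placeholder, and reporting in MiB
with ``os.statvfs`` for file storage is an assumption rather than something
this design prescribes.

```python
# Illustrative sketch; function and constant names are hypothetical.
import os


def _lvm_vg_free(vg_name):
    """Placeholder: a real implementation would query LVM for vg_name."""
    raise NotImplementedError("LVM query is not part of this sketch")


def _file_free(path):
    """Free space in MiB of the file system holding path, via statvfs."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // (1024 * 1024)


# Maps a storage type to the function that computes its free space.
STORAGE_FREE_FUNCTIONS = {
    "lvm-vg": _lvm_vg_free,
    "file": _file_free,
}


def report_storage(storage_units, functions=STORAGE_FREE_FUNCTIONS):
    """Answer a list of (storage_type, key) requests.

    Returns one dict per request; "free" is None when no reporting
    function is known for the storage type.
    """
    results = []
    for storage_type, key in storage_units:
        fn = functions.get(storage_type)
        free = fn(key) if fn is not None else None
        results.append({"type": storage_type, "name": key, "free": free})
    return results
```

Only the functions for the requested storage types are invoked, so a request
for ``("file", "/some/dir")`` never touches LVM.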

Note that for file and sharedfile the node knows which directories are allowed
and won't allow any other directory to be queried for security reasons. The
actual path still needs to be passed to distinguish the two, as the type will
be the same for both.
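The directory check on the node side could be approximated as below. The
function name and the example paths are hypothetical; the sketch only
normalizes paths and does not resolve symbolic links.

```python
# Illustrative sketch; the function name is hypothetical.
import os.path


def check_file_storage_dir(path, allowed_dirs):
    """Return the normalized path if it is one of the allowed directories.

    Raises ValueError otherwise, so that arbitrary directories cannot be
    queried through the RPC.
    """
    normalized = os.path.normpath(path)
    allowed = set(os.path.normpath(d) for d in allowed_dirs)
    if not os.path.isabs(normalized) or normalized not in allowed:
        raise ValueError("directory '%s' may not be queried" % path)
    return normalized
```

With ``/srv/ganeti/file-storage`` as the only allowed directory, a request for
``/srv/ganeti/file-storage/../../../etc`` is rejected, because it normalizes
to ``/etc``.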

These calculations will be implemented in the node storage system
(currently ``lib/storage.py``) but querying will still happen through the
``node info`` call, to avoid requiring an extra RPC each time.

Ganeti reporting
----------------

``gnt-node list`` can be queried for the different disk templates, if they
are enabled. By default, it will just report information about the default
disk template. Examples::

  > gnt-node list
  Node                       DTotal DFree MTotal MNode MFree Pinst Sinst
  mynode1                      3.6T  3.6T  64.0G 1023M 62.2G     1     0
  mynode2                      3.6T  3.6T  64.0G 1023M 62.0G     2     1
  mynode3                      3.6T  3.6T  64.0G 1023M 62.3G     0     2

  > gnt-node list -o dtotal/drbd,dfree/file
  Node      DTotal (drbd, myvg) DFree (file, mydir)
  mynode1                 3.6T                    -
  mynode2                 3.6T                    -

Note that for drbd, we only report the space of the vg and only if it was not
renamed to something different than the default volume group name. With this
design, there is also no possibility to ask about the meta volume group. We
restrict the design here to make the transition to storage pools easier (as it
is an interim state only). It is the administrator's responsibility to ensure
that there is enough space for the meta volume group.

When storage pools are implemented, we will switch from referencing the disk
template to referencing the storage pool name. For that, of course, the pool
names need to be unique across all storage types. For drbd, we will use the
default 'drbd' storage pool and possibly a second lvm-based storage pool for
the metavg. It will be possible to rename storage pools (thus also the default
lvm storage pool). There will be new functionality to ask about what storage
pools are available and of what type. Storage pools will have a storage pool
type which is one of the disk templates. There can be more than one storage
pool based on the same disk template, therefore we will then start referencing
the storage pool name instead of the disk template.

``gnt-cluster info`` will report which disk templates are enabled, i.e.
which ones are supported according to the cluster configuration. Example
output::

  > gnt-cluster info
  [...]
  Cluster parameters:
    - [...]
    - enabled disk templates: plain, drbd, sharedfile, rbd
    - [...]

``gnt-node list-storage`` will not be affected by any changes, since this design
is restricted only to free storage reporting for non-shared storage types.

Allocator changes
-----------------

The iallocator protocol doesn't need to change: since we know which
disk template an instance has, we'll pass only the "free" value for that
disk template to the iallocator, when asking for an allocation to be
made. Note that for DRBD we currently ignore the case where vg and metavg
are different, and we only consider the main volume group. Fixing this is
outside the scope of this design.
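Selecting the value handed to the iallocator then reduces to a lookup, as in
this small sketch. The function name and the layout of ``node_storage`` are
assumptions made for illustration.

```python
# Illustrative sketch; the function name and data layout are hypothetical.

def free_space_for_allocator(node_storage, disk_template):
    """Pick the single "free" value to pass to the iallocator.

    node_storage maps disk template names to dicts with a "free" entry,
    as they might be collected from storage reporting. For drbd, the
    entry is assumed to describe the main volume group only.
    """
    info = node_storage.get(disk_template)
    if info is None:
        raise KeyError("no storage report for '%s'" % disk_template)
    return info["free"]
```

The iallocator keeps seeing a single free-space number per node, which is why
its protocol can stay unchanged.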

With this design, we ensure forward-compatibility with respect to storage
pools. For now, we'll report space for all available disk templates that
are based on non-shared storage types; in the future, for all available
storage pools.

Rebalancing changes
-------------------

Hbal will not need changes, as it already handles this. We do not foresee
any changes to it.

Space reporting changes
-----------------------

By default, hspace will report under the assumption that the allocation will
happen on the default disk template for the cluster/nodegroup. An option will
be added to manually specify a different storage.

Interactions with Partitioned Ganeti
------------------------------------

The design for :doc:`Partitioned Ganeti <design-partitioned>` also deals
with reporting free space. Partitioned Ganeti has a different way to
report free space for LVM on nodes where the ``exclusive_storage`` flag
is set. That doesn't interact directly with this design, as the specifics
of how the free space is computed are not in the scope of this design.
But the ``node info`` call contains the value of the
``exclusive_storage`` flag, which is currently only meaningful for the
LVM storage type. Additional flags like the ``exclusive_storage`` flag
for lvm might be useful for other disk templates / storage types as well.
We therefore extend the RPC call from <storage_type>,<key> to
<storage_type>,<key>,[<param>] to include any disk-template-specific
(or storage-type specific) parameters in the RPC call.
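One way to picture the extended request is the sketch below. The helper name
is hypothetical, and passing ``exclusive_storage`` as the only parameter for
``lvm-vg`` is an assumption based on the paragraph above.

```python
# Illustrative sketch; the helper name is hypothetical.

def make_storage_unit(storage_type, key, exclusive_storage=False):
    """Build a (storage_type, key, [params]) request tuple.

    Only the lvm-vg storage type carries a parameter here (the
    exclusive_storage flag); other types get an empty parameter list.
    """
    if storage_type == "lvm-vg":
        params = [bool(exclusive_storage)]
    else:
        params = []
    return (storage_type, key, params)
```

Keeping the parameters in a list leaves room for further storage-type-specific
flags without changing the shape of the tuple again.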

The reporting of free spindles, also part of Partitioned Ganeti, is outside
the scope of this design doc, as those are seen as a separate resource.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: