============================
Storage free space reporting
============================

.. contents:: :depth: 4

Background
==========

Currently, space reporting is broken for all storage types except drbd
or lvm (plain). This design looks at the root causes and proposes a way
to fix it.

Proposed changes
================

The changes below will streamline Ganeti to properly support
interaction with different storage types.

Configuration changes
---------------------

Add a new attribute "enabled_storage_types" (type: list of strings) to the
cluster config, holding the enabled storage types, for example "plain",
"drbd", or "ext". We treat the first entry in the list as the default type.

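As a minimal sketch of the "first entry is the default" rule, assuming the
cluster config is available as a plain dictionary (the helper name
``default_storage_type`` and the dictionary layout are hypothetical, not part
of Ganeti):

```python
def default_storage_type(enabled_storage_types):
    """Return the default storage type, i.e. the first list entry."""
    if not enabled_storage_types:
        # An empty list leaves the cluster without a default type.
        raise ValueError("no storage types are enabled")
    return enabled_storage_types[0]


# Hypothetical fragment of a cluster configuration.
cluster_config = {"enabled_storage_types": ["plain", "drbd", "ext"]}
default = default_storage_type(cluster_config["enabled_storage_types"])
```

With the list above, ``default`` is ``"plain"``; reordering the list is what
would change the default.
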
For file storage, we'll report the storage space on the file storage dir,
which is currently limited to one directory. In the future, if we have
support for more directories, or for per-nodegroup directories, this can be
changed.

Note that the above-mentioned enabled_storage_types are just "mechanism"
parameters that define which storage types the cluster can use. Further
filtering about what's allowed can go in the ipolicy, but these changes are
not covered in this design doc.

Since the ipolicy currently has a list of enabled storage types, we'll
use that to decide which storage type is the default, and to select it
automatically for new instance creations and for reporting.

Enabling/disabling of storage types at ``./configure`` time will
eventually be removed.

RPC changes
-----------

The noded RPC call that reports node storage space will be changed to
accept a list of <type>,<key> string tuples. For each of them, it will
report the free amount of storage space found on storage <key> as known
by the requested storage type. Example types are ``lvm``,
``filesystem``, or ``rados``, and the key would be a volume group name
in the case of lvm, a directory name for the filesystem, and a rados pool
name for rados storage.

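The shape of such a call can be sketched as follows. This is only an
illustration under stated assumptions: the function names, the result keys,
and the reporter table are hypothetical, only the ``filesystem`` type is
actually computed (via the standard library), and any other type simply
yields ``None``:

```python
import shutil
import tempfile


def filesystem_free_mib(path):
    """Free space, in MiB, of the filesystem that contains `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


# Hypothetical table of per-type reporters; only "filesystem" is sketched.
REPORTERS = {"filesystem": filesystem_free_mib}


def report_storage_space(storage_units):
    """Report free space for each (type, key) tuple, in request order."""
    results = []
    for storage_type, key in storage_units:
        reporter = REPORTERS.get(storage_type)
        # Types without a reporter in this sketch are reported as None.
        free = reporter(key) if reporter is not None else None
        results.append(
            {"type": storage_type, "name": key, "storage_free": free})
    return results


# The temporary directory stands in for a file storage directory.
report = report_storage_space(
    [("filesystem", tempfile.gettempdir()), ("lvm", "myvg")])
```

Keeping one result per requested tuple, in the same order, lets the caller
match answers to requests without a second lookup.
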
For now, we will implement only the storage reporting for non-shared storage,
that is ``filesystem`` and ``lvm``. For shared storage types like ``rados``
and ``ext`` we will not implement a free space calculation, because it does
not make sense to query each node for the free space of a commonly used
storage.

Masterd will know (through a constant map) which storage type uses which
type for storage calculation (i.e. ``plain`` and ``drbd`` use ``lvm``,
``file`` uses ``filesystem``, etc.) and query the one needed (or all of the
needed ones).

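A constant map of this kind might look like the sketch below. The constant
and function names are hypothetical, and the map lists only the three
pairings named above; the entries hidden behind "etc." are deliberately left
out rather than guessed:

```python
# Which storage calculation type each storage type relies on.
STORAGE_CALCULATION_TYPE = {
    "plain": "lvm",
    "drbd": "lvm",
    "file": "filesystem",
}


def calculation_types_to_query(storage_types):
    """Return the distinct calculation types needed, in first-seen order."""
    needed = []
    for storage_type in storage_types:
        calc_type = STORAGE_CALCULATION_TYPE.get(storage_type)
        # Skip unmapped types and avoid querying the same type twice.
        if calc_type is not None and calc_type not in needed:
            needed.append(calc_type)
    return needed


to_query = calculation_types_to_query(["plain", "drbd", "file", "ext"])
```

Here ``plain`` and ``drbd`` collapse into a single ``lvm`` query, and ``ext``
contributes nothing because it has no entry in this sketch.
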
Note that for file and sharedfile the node knows which directories are
allowed and won't allow any other directory to be queried for security
reasons. The actual path still needs to be passed to distinguish the
two, as the type will be the same for both.

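One way to picture the node-side check is the sketch below. It assumes the
node holds its allowed directories as a list of paths and that only an exact
match (after path normalization) is accepted; the function name and the
example paths are hypothetical:

```python
import os


def is_allowed_storage_dir(path, allowed_dirs):
    """Return True only if `path` resolves to one of the allowed dirs."""
    # realpath() collapses "..", trailing slashes, and symlinks, so a
    # crafted path cannot pass the check by spelling alone.
    candidate = os.path.realpath(path)
    return candidate in {os.path.realpath(d) for d in allowed_dirs}


# Hypothetical directories for file and sharedfile storage.
allowed = ["/srv/ganeti/file-storage", "/srv/ganeti/shared-file-storage"]
ok = is_allowed_storage_dir("/srv/ganeti/file-storage/", allowed)
rejected = is_allowed_storage_dir("/srv/ganeti/file-storage/../../..", allowed)
```

Because both directories share the same type, the path itself is what tells
the two apart, which is why it still has to travel with the request.
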
These calculations will be implemented in the node storage system
(currently lib/storage.py) but querying will still happen through the
``node info`` call, to avoid requiring an extra RPC each time.

Ganeti reporting
----------------

``gnt-node list`` can be queried for the different storage types, if they
are enabled. By default, it will just report information about the default
storage type. Examples::

  > gnt-node list
  Node    DTotal DFree MTotal MNode MFree Pinst Sinst
  mynode1   3.6T  3.6T  64.0G 1023M 62.2G     1     0
  mynode2   3.6T  3.6T  64.0G 1023M 62.0G     2     1
  mynode3   3.6T  3.6T  64.0G 1023M 62.3G     0     2

  > gnt-node list -o dtotal/lvm,dfree/rados
  Node    DTotal (Lvm, myvg) DFree (Rados, myrados)
  mynode1               3.6T                      -
  mynode2               3.6T                      -

Note that for drbd, we only report the space of the vg and only if it was not
renamed to something different from the default volume group name. With this
design, there is also no possibility to ask about the meta volume group. We
restrict the design here to make the transition to storage pools easier (as it
is an interim state only). It is the administrator's responsibility to ensure
that there is enough space for the meta volume group.

When storage pools are implemented, we switch from referencing the storage
type to referencing the storage pool name. For that, of course, the pool
names need to be unique over all storage types. For drbd, we will use the
default 'lvm' storage pool and possibly a second lvm-based storage pool for
the metavg. It will be possible to rename storage pools (thus also the default
lvm storage pool). There will be new functionality to ask about what storage
pools are available and of what type.

``gnt-cluster info`` will report which storage types are enabled, i.e.
which ones are supported according to the cluster configuration. Example
output::

  > gnt-cluster info
  [...]
  Cluster parameters:
    - [...]
    - enabled storage types: plain (default), drbd, lvm, rados
    - [...]

``gnt-node list-storage`` will not be affected by any changes, since this design
describes only free storage reporting for non-shared storage types.

Allocator changes
-----------------

The iallocator protocol doesn't need to change: since we know which
storage type an instance has, we'll pass only the "free" value for that
storage type to the iallocator when asking for an allocation to be
made. Note that for DRBD we currently ignore the case when vg and metavg
are different, and we only consider the main VG. Fixing this is outside
the scope of this design.

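The selection step can be sketched as follows, assuming the per-node free
space is already available as a dictionary keyed by storage calculation type.
The function name, the dictionary layouts, and the numbers are illustrative
assumptions, not the actual iallocator input format:

```python
def free_space_for_instance(node_free, storage_type, calculation_type_of):
    """Pick the single "free" value that matters for one instance."""
    # Translate the instance's storage type into the calculation type
    # under which the node reported its free space.
    return node_free[calculation_type_of[storage_type]]


# Hypothetical per-node report, in MiB, and a hypothetical mapping.
node_free = {"lvm": 3774873, "filesystem": 512000}
calculation_type_of = {"plain": "lvm", "drbd": "lvm", "file": "filesystem"}

drbd_free = free_space_for_instance(node_free, "drbd", calculation_type_of)
file_free = free_space_for_instance(node_free, "file", calculation_type_of)
```

The allocator still receives one number per node, which is why its protocol
can stay unchanged.
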
With this design, we ensure forward-compatibility with respect to storage
pools. For now, we'll report space for all available (non-shared) storage
types; in the future, for all available storage pools.

Rebalancing changes
-------------------

Hbal already handles this case, so we don't foresee any changes to it.

Space reporting changes
-----------------------

By default, hspace will report under the assumption that the allocation will
happen on the default storage for the cluster/nodegroup. An option will be
added to manually specify a different storage.

Interactions with Partitioned Ganeti
------------------------------------

The design for :doc:`Partitioned Ganeti <design-partitioned>` also deals
with reporting free space. Partitioned Ganeti has a different way to
report free space for LVM on nodes where the ``exclusive_storage`` flag
is set. That doesn't interact directly with this design, as the specifics
of how the free space is computed are not in the scope of this design.
But the ``node info`` call contains the value of the
``exclusive_storage`` flag, which is currently only meaningful for the
LVM back-end. Additional flags like the ``exclusive_storage`` flag
for lvm might be useful for other storage types as well. We therefore
extend the RPC call with <type>,<key> to <type>,<key>,<params> to
include any storage-type specific parameters in the RPC call.

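The extended tuple could be modelled as in the sketch below. The class name,
the choice of a dictionary for ``<params>``, and the list-based wire layout
are all assumptions made for illustration; the design only states that a
third, storage-type specific element is added:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageUnit:
    """One <type>,<key>,<params> entry of the storage space request."""
    storage_type: str
    key: str
    # Storage-type specific parameters; empty when a type needs none.
    params: dict = field(default_factory=dict)


def serialize(units):
    """Turn storage units into plain lists suitable for an RPC payload."""
    return [[u.storage_type, u.key, dict(u.params)] for u in units]


payload = serialize([
    StorageUnit("lvm", "myvg", {"exclusive_storage": True}),
    StorageUnit("filesystem", "/srv/ganeti/file-storage"),
])
```

Carrying the flag inside the per-unit parameters keeps it next to the storage
it applies to, instead of as a call-wide value that only LVM understands.
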
The reporting of free spindles, also part of Partitioned Ganeti, is not
covered by this design doc, as those are seen as a separate resource.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: