HSPACE(1) Ganeti | Version @GANETI_VERSION@
===========================================

NAME
----

hspace - Cluster space analyzer for Ganeti

SYNOPSIS
--------

**hspace** {backend options...} [algorithm options...] [request options...]
[output options...] [-v... | -q]

**hspace** \--version

Backend options:

{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
**\--simulate** *spec* | **-I** *path* }

Algorithm options:

**[ \--max-cpu *cpu-ratio* ]**
**[ \--min-disk *disk-ratio* ]**
**[ -O *name...* ]**

Request options:

**[\--disk-template** *template* **]**

**[\--standard-alloc** *disk,ram,cpu* **]**

**[\--tiered-alloc** *disk,ram,cpu* **]**

Output options:

**[\--machine-readable**[=*CHOICE*] **]**
**[-p**[*fields*]**]**

DESCRIPTION
-----------

hspace computes how many additional instances can be fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until the point where no N+1-compliant allocation is possible.
It uses the exact same allocation algorithm as the hail iallocator
plugin in *allocate* mode.

The output of the program is designed either for human consumption (the
default) or, when enabled with the ``--machine-readable`` option
(described further below), for machine consumption. In the latter case,
it is intended to be interpreted as a shell fragment (or parsed as a
*key=value* file). Options which extend the output (e.g. -p, -v) will
output the additional information on stderr (such that the stdout is
still parseable).

By default, the instance specifications will be read from the cluster;
the options ``--standard-alloc`` and ``--tiered-alloc`` can be used to
override them.

The following keys are available in the machine-readable output of the
script (all prefixed with *HTS_*):

SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN, SPEC_DISK_TEMPLATE
  These represent the specifications of the instance model used for
  allocation (the memory, disk, cpu, requested nodes, disk template).

TSPEC_INI_MEM, TSPEC_INI_DSK, TSPEC_INI_CPU, ...
  Only defined when tiered allocation is enabled, these are similar to
  the above specifications but show the initial starting spec for
  tiered allocation.

CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
  These represent the total memory, disk, CPU count and total nodes in
  the cluster.

INI_SCORE, FIN_SCORE
  These are the initial (current) and final cluster score (see the hbal
  man page for details about the scoring algorithm).

INI_INST_CNT, FIN_INST_CNT
  The initial and final instance count.

INI_MEM_FREE, FIN_MEM_FREE
  The initial and final total free memory in the cluster (but this
  doesn't necessarily mean available for use).

INI_MEM_AVAIL, FIN_MEM_AVAIL
  The initial and final total available memory for allocation in the
  cluster. If allocating redundant instances, new instances could
  increase the reserved memory, so not necessarily all of this memory
  can be used for new instance allocations.

INI_MEM_RESVD, FIN_MEM_RESVD
  The initial and final reserved memory (for redundancy/N+1 purposes).

INI_MEM_INST, FIN_MEM_INST
  The initial and final memory used for instances (actual runtime used
  RAM).

INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
  The initial and final memory overhead, i.e. memory used for the node
  itself and unaccounted memory (e.g. due to hypervisor overhead).

INI_MEM_EFF, FIN_MEM_EFF
  The initial and final memory efficiency, represented as instance
  memory divided by total memory.

INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
  Initial disk stats, similar to the memory ones.

FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
  Final disk stats, similar to the memory ones.

INI_CPU_INST, FIN_CPU_INST
  Initial and final number of virtual CPUs used by instances.

INI_CPU_EFF, FIN_CPU_EFF
  The initial and final CPU efficiency, represented as the count of
  virtual instance CPUs divided by the total physical CPU count.

INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
  The initial and final maximum per-node available memory. This is not
  very useful as a metric but can give an impression of the status of
  the nodes; as an example, this value restricts the maximum instance
  size that can still be created on the cluster.

INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
  Like the above, but for disk.

TSPEC
  This parameter holds the pairs of specifications and counts of
  instances that can be created in the *tiered allocation* mode. The
  value of the key is a space-separated list of values; each value is
  of the form *memory,disk,vcpu=count*, where memory, disk and vcpu
  are the values for the current spec, and count is how many instances
  of this spec can be created. A complete value for this variable
  could be: **4096,102400,2=225 2560,102400,2=20 512,102400,2=21**.
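As an illustration, a TSPEC value of the form above can be split into
per-tier tuples; this is a hedged sketch (the function and field names
are our own, and the sample value is the one quoted above):

```python
# Sketch: parse the TSPEC key from hspace's machine-readable output
# into (memory, disk, vcpus, count) tuples. The value is a
# space-separated list of "memory,disk,vcpu=count" items.
def parse_tspec(value):
    """Parse a TSPEC string into a list of (mem_mib, disk_mib, vcpus, count)."""
    specs = []
    for item in value.split():
        spec, count = item.split("=")
        mem, disk, vcpu = spec.split(",")
        specs.append((int(mem), int(disk), int(vcpu), int(count)))
    return specs

tiers = parse_tspec("4096,102400,2=225 2560,102400,2=20 512,102400,2=21")
# Total memory the tiered allocation would consume across all tiers:
total_mem = sum(mem * count for (mem, disk, vcpu, count) in tiers)
```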

KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
  These represent the metrics of used resources at the start of the
  computation (only for tiered allocation mode). The NPU value is the
  "normalized" CPU count, i.e. the number of virtual CPUs divided by
  the maximum ratio of virtual to physical CPUs.
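For example (illustrative numbers only, not hspace output), the
normalization described above works out as:

```python
# Sketch of the "normalized" CPU (NPU) figure described above: virtual
# CPUs divided by the maximum virtual-to-physical ratio (the --max-cpu
# value). The sample figures are invented for illustration.
def normalized_cpu(virtual_cpus, max_cpu_ratio):
    return virtual_cpus / max_cpu_ratio

npu = normalized_cpu(64, 4.0)  # 64 vCPUs at a 4.0 ratio -> 16.0
```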

KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
  These represent the total resources allocated during the tiered
  allocation process. In effect, they represent how much is readily
  available for allocation.

KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
  These represent the resources left over after the tiered allocation
  has been completed: free, but effectively unallocable, because some
  other resource has been exhausted. For example, the cluster might
  still have 100GiB of disk free, but with no memory left for
  instances no further instance can be allocated, so in effect that
  disk space is unallocable. Note that the CPUs here represent
  instance virtual CPUs, and in case the *\--max-cpu* option hasn't
  been specified, this will be -1.

ALLOC_USAGE
  The current usage, represented as the initial number of instances
  divided by the final number of instances.

ALLOC_COUNT
  The number of instances allocated (delta between FIN_INST_CNT and
  INI_INST_CNT).

ALLOC_FAIL*_CNT
  For the last attempt at allocation (which would have increased
  FIN_INST_CNT by one, had it succeeded), this is the count of the
  failure reasons per failure type; currently defined are FAILMEM,
  FAILDISK and FAILCPU, which represent errors due to not enough
  memory, disk and CPUs, and FAILN1, which represents a non-N+1
  compliant cluster on which no instances can be allocated at all.

ALLOC_FAIL_REASON
  The reason for most of the failures, being one of the above FAIL*
  strings.

OK
  A marker representing the successful end of the computation, with
  value "1". If this key is not present in the output, the computation
  failed and any values present should not be relied upon.

Many of the ``INI_``/``FIN_`` metrics will also be displayed with a
``TRL_`` prefix, denoting the cluster status at the end of the tiered
allocation run.

The human output format should be self-explanatory, so it is not
described further.
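As a sketch of consuming the machine-readable format, the output can be
parsed as a *key=value* file and the OK marker checked before trusting
any values. The sample text below is fabricated for illustration; the
key names come from the list above:

```python
# Sketch: parse hspace's machine-readable output (a key=value shell
# fragment) and refuse to trust the values unless the OK marker is
# present. The sample output is illustrative, not real hspace data.
def parse_hspace(text):
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key] = value
    return result

sample = """\
HTS_INI_INST_CNT=10
HTS_FIN_INST_CNT=25
HTS_ALLOC_COUNT=15
HTS_OK=1
"""

stats = parse_hspace(sample)
if stats.get("HTS_OK") != "1":
    raise RuntimeError("hspace computation failed; ignore the other keys")
allocated = int(stats["HTS_FIN_INST_CNT"]) - int(stats["HTS_INI_INST_CNT"])
```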

OPTIONS
-------

The options that can be passed to the program are as follows:

\--disk-template *template*
  Overrides the disk template for the instance read from the cluster;
  one of the Ganeti disk templates (e.g. plain, drbd, and so on)
  should be passed in.

\--spindle-use *spindles*
  Override the spindle use for the instance read from the cluster. The
  value can be 0 (for example for instances with very low I/O), but
  not negative. For shared storage the value is ignored.

\--max-cpu=*cpu-ratio*
  The maximum virtual-to-physical cpu ratio, as a floating point
  number greater than or equal to one. For example, specifying
  *cpu-ratio* as **2.5** means that, for a 4-cpu machine, a maximum of
  10 virtual cpus should be allowed to be in use for primary
  instances. A value of exactly one means there will be no
  over-subscription of CPU (except for the CPU time used by the node
  itself), and values below one do not make sense, as that means other
  resources (e.g. disk) won't be fully utilised due to CPU
  restrictions.

\--min-disk=*disk-ratio*
  The minimum amount of free disk space remaining, as a floating point
  number. For example, specifying *disk-ratio* as **0.25** means that
  at least one quarter of disk space should be left free on nodes.

-l *rounds*, \--max-length=*rounds*
  Restrict the number of instance allocations to this length. This is
  not very useful in practice, but can be used for testing hspace
  itself, or to limit the runtime for very big clusters.

-p, \--print-nodes
  Prints the before and after node status, in a format designed to
  allow the user to understand the node's most important parameters.
  See the man page **htools**\(1) for more details about this option.

-O *name*
  This option (which can be given multiple times) will mark nodes as
  being *offline*. This means a couple of things:

  - instances won't be placed on these nodes, not even temporarily;
    e.g. the *replace primary* move is not available if the secondary
    node is offline, since this move requires a failover.
  - these nodes will not be included in the score calculation (except
    for the percentage of instances on offline nodes)

  Note that the algorithm will also mark as offline any nodes which
  are reported by RAPI as such, or that have "?" in file-based input
  in any numeric fields.

-S *filename*, \--save-cluster=*filename*
  If given, the state of the cluster at the end of the allocation is
  saved to a file named *filename.alloc*, and if tiered allocation is
  enabled, the state after tiered allocation will be saved to
  *filename.tiered*. This allows re-feeding the cluster state to
  either hspace itself (with different parameters) or, for example,
  to hbal, via the ``-t`` option.

-t *datafile*, \--text-data=*datafile*
  Backend specification: the name of the file holding node and
  instance information (if not collecting via RAPI or LUXI). This or
  one of the other backends must be selected. The option is described
  in the man page **htools**\(1).

-m *cluster*
  Backend specification: collect data directly from the *cluster*
  given as an argument via RAPI. The option is described in the man
  page **htools**\(1).

-L [*path*]
  Backend specification: collect data directly from the master daemon,
  which is to be contacted via LUXI (an internal Ganeti protocol). The
  option is described in the man page **htools**\(1).

\--simulate *description*
  Backend specification: similar to the **-t** option, this allows
  overriding the cluster data with a simulated cluster. For details
  about the description, see the man page **htools**\(1).

\--standard-alloc *disk,ram,cpu*
  This option overrides the instance size read from the cluster for
  the *standard* allocation mode, where we simply allocate instances
  of the same, fixed size until the cluster runs out of space.

  The specification given is similar to the *\--simulate* option and
  it holds:

  - the disk size of the instance (units can be used)
  - the memory size of the instance (units can be used)
  - the vcpu count for the instance

  An example description would be *100G,4g,2*, describing an instance
  specification of 100GB of disk space, 4GiB of memory and 2 VCPUs.
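Such a specification can be parsed mechanically. The sketch below
follows the UNITS section (lower-case suffixes are binary, upper-case
are SI, plain numbers are mebibytes) but handles only the single-letter
suffixes for brevity, and its rounding of SI values is this sketch's
own choice, not necessarily Ganeti's:

```python
# Sketch: parse a disk,ram,cpu spec such as "100G,4g,2" into mebibytes
# and a vcpu count. Lower-case m/g/t are binary (MiB/GiB/TiB); upper-case
# M/G/T are SI; a bare number is already in MiB.
BINARY = {"m": 1, "g": 1024, "t": 1024 ** 2}      # multipliers to MiB
SI = {"M": 10 ** 6, "G": 10 ** 9, "T": 10 ** 12}  # bytes per unit

def to_mib(value):
    if value[-1] in BINARY:
        return int(value[:-1]) * BINARY[value[-1]]
    if value[-1] in SI:
        return int(value[:-1]) * SI[value[-1]] // 2 ** 20  # truncating
    return int(value)  # no suffix: already mebibytes

def parse_alloc_spec(spec):
    disk, ram, cpu = spec.split(",")
    return to_mib(disk), to_mib(ram), int(cpu)

disk_mib, ram_mib, vcpus = parse_alloc_spec("100G,4g,2")
```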

\--tiered-alloc *disk,ram,cpu*
  This option overrides the instance size for the *tiered* allocation
  mode. In this mode, the algorithm starts from the given
  specification and allocates until there is no more space; then it
  decreases the specification and tries the allocation again. The
  decrease is done on the metric that last failed during allocation.
  The argument should have the same format as for
  ``--standard-alloc``.

  Also note that the normal allocation and the tiered allocation are
  independent, and both start from the initial cluster state; as such,
  the instance counts for these two modes are not related to one
  another.
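The tiered loop can be sketched as a toy model. This is a deliberate
simplification against invented single-pool capacities, with a halving
shrink step of our own choosing; real hspace runs the full hail
placement algorithm at every step:

```python
# Toy sketch of tiered allocation: allocate the current spec while it
# fits, then halve the metric that last failed and retry, stopping at a
# lower bound. All capacities and the shrink policy are illustrative.
def tiered_allocate(free, spec, min_spec):
    """free/spec/min_spec: dicts with 'mem' and 'disk' keys (MiB)."""
    tiers = []  # (spec snapshot, count) pairs, like the TSPEC key
    free, spec = dict(free), dict(spec)
    while True:
        count = 0
        failed = None
        while failed is None:
            for metric in ("mem", "disk"):
                if free[metric] < spec[metric]:
                    failed = metric  # this metric just failed
                    break
            else:
                for metric in ("mem", "disk"):
                    free[metric] -= spec[metric]
                count += 1
        if count:
            tiers.append((dict(spec), count))
        # Shrink the metric that last failed; stop at the lower bound.
        if spec[failed] // 2 < min_spec[failed]:
            return tiers
        spec[failed] //= 2

result = tiered_allocate(
    {"mem": 10240, "disk": 409600},
    {"mem": 4096, "disk": 102400},
    {"mem": 512, "disk": 51200},
)
```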

\--machine-readable[=*choice*]
  By default, the output of the program is in "human-readable" format,
  i.e. text descriptions. By passing this flag you can either enable
  (``--machine-readable`` or ``--machine-readable=yes``) or explicitly
  disable (``--machine-readable=no``) the machine-readable format
  described above.

-v, \--verbose
  Increase the output verbosity. Each usage of this option will
  increase the verbosity (currently more than 2 doesn't make sense)
  from the default of one.

-q, \--quiet
  Decrease the output verbosity. Each usage of this option will
  decrease the verbosity (less than zero doesn't make sense) from the
  default of one.

-V, \--version
  Just show the program version and exit.

UNITS
~~~~~

By default, all unit-accepting options use mebibytes. Explicit binary
units can be selected using the lower-case letters *m*, *g* and *t*
(or their longer equivalents *mib*, *gib*, *tib*, for which case
doesn't matter). Units in the SI system can be selected using the
upper-case letters *M*, *G* and *T* (or their longer equivalents *MB*,
*GB*, *TB*, for which case doesn't matter).

More details about the difference between the SI and binary systems
can be read in the **units**\(7) man page.
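The difference matters at the sizes involved here; the figures below
are plain arithmetic (how Ganeti rounds SI values internally is not
specified in this page):

```python
# Plain arithmetic illustrating the binary vs SI distinction above:
# "1g" (binary) is 1 GiB = 1024 MiB, while "1G" (SI) is 10**9 bytes,
# roughly 953.67 MiB. The rounding here is our own choice.
MIB = 2 ** 20

one_gib_in_mib = 2 ** 30 // MIB          # binary gigabyte: 1024 MiB
one_gb_in_mib = round(10 ** 9 / MIB, 2)  # SI gigabyte: ~953.67 MiB
```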

EXIT STATUS
-----------

The exit status of the command will be zero, unless for some reason
the algorithm failed fatally (e.g. wrong node or instance data).

BUGS
----

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: