HSPACE(1) Ganeti | Version @GANETI_VERSION@
===========================================

NAME
----

hspace - Cluster space analyzer for Ganeti

SYNOPSIS
--------

**hspace** {backend options...} [algorithm options...] [request options...]
[ -p [*fields*] ] [-v... | -q]

**hspace** --version

Backend options:

{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
**--simulate** *spec* }


Algorithm options:

**[ --max-cpu *cpu-ratio* ]**
**[ --min-disk *disk-ratio* ]**
**[ -O *name...* ]**


Request options:

**[--memory** *mem* **]**
**[--disk** *disk* **]**
**[--disk-template** *template* **]**
**[--vcpus** *vcpus* **]**
**[--tiered-alloc** *spec* **]**


DESCRIPTION
-----------


hspace computes how many additional instances can be fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until the point where we don't have any N+1 possible
allocation. It uses the exact same allocation algorithm as the hail
iallocator plugin in *allocate* mode.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a *key=value* file). Options which extend the
output (e.g. -p, -v) print the additional information on stderr
(such that the stdout is still parseable).
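
For illustration, a calling script could capture and evaluate the
output along the following lines (a minimal sketch only; the luxi
backend and the use of ``eval`` are assumptions of the example, not
requirements of hspace)::

    # Run hspace against the local master daemon and evaluate the
    # key=value pairs it prints on stdout.
    eval "$(hspace -L)"

    # The variables are now available to the shell, e.g.:
    echo "Instances that can still be allocated: ${HTS_ALLOC_COUNT}"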

The following keys are available in the output of the script (all
prefixed with *HTS_*):

SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
  These represent the specifications of the instance model used for
  allocation (the memory, disk, cpu, requested nodes).

CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
  These represent the total memory, disk, CPU count and total nodes in
  the cluster.

INI_SCORE, FIN_SCORE
  These are the initial (current) and final cluster score (see the hbal
  man page for details about the scoring algorithm).

INI_INST_CNT, FIN_INST_CNT
  The initial and final instance count.

INI_MEM_FREE, FIN_MEM_FREE
  The initial and final total free memory in the cluster (but this
  doesn't necessarily mean available for use).

INI_MEM_AVAIL, FIN_MEM_AVAIL
  The initial and final total available memory for allocation in the
  cluster. If allocating redundant instances, new instances could
  increase the reserved memory, so this doesn't necessarily mean the
  entirety of this memory can be used for new instance allocations.

INI_MEM_RESVD, FIN_MEM_RESVD
  The initial and final reserved memory (for redundancy/N+1 purposes).

INI_MEM_INST, FIN_MEM_INST
  The initial and final memory used for instances (actual runtime used
  RAM).

INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
  The initial and final memory overhead, i.e. memory used for the node
  itself and unaccounted memory (e.g. due to hypervisor overhead).

INI_MEM_EFF, FIN_MEM_EFF
  The initial and final memory efficiency, represented as instance
  memory divided by total memory.

INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
  Initial disk stats, similar to the memory ones.

FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
  Final disk stats, similar to the memory ones.

INI_CPU_INST, FIN_CPU_INST
  Initial and final number of virtual CPUs used by instances.

INI_CPU_EFF, FIN_CPU_EFF
  The initial and final CPU efficiency, represented as the count of
  virtual instance CPUs divided by the total physical CPU count.

INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
  The initial and final maximum per-node available memory. This is not
  very useful as a metric in itself, but it gives an impression of the
  status of the nodes; as an example, this value limits the maximum
  instance size that can still be created on the cluster.

INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
  Like the above, but for disk.

TSPEC
  If the tiered allocation mode has been enabled, this parameter holds
  the pairs of specifications and counts of instances that can be
  created in this mode. The value of the key is a space-separated list
  of values; each value is of the form *memory,disk,vcpu=count*, where
  memory, disk and vcpu are the values of the current spec, and count
  is how many instances of this spec can be created. A complete value
  for this variable could be: **4096,102400,2=225 2560,102400,2=20
  512,102400,2=21**.
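
  As a sketch of how this list could be consumed (assuming tiered
  allocation was enabled and the output has been evaluated as in the
  earlier ``eval`` example)::

      # Print one "spec => count" line per tier; the field layout
      # (memory,disk,vcpu=count) is as documented above.
      for entry in ${HTS_TSPEC}; do
          spec="${entry%=*}"
          count="${entry#*=}"
          echo "spec ${spec} fits ${count} more instances"
      done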

KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
  These represent the metrics of used resources at the start of the
  computation (only for tiered allocation mode). The NPU value is the
  "normalized" CPU count, i.e. the number of virtual CPUs divided by
  the maximum ratio of virtual to physical CPUs (for example, 20
  virtual CPUs at a maximum ratio of 4 give an NPU value of 5).

KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
  These represent the total resources allocated during the tiered
  allocation process. In effect, they represent how much is readily
  available for allocation.

KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
  These represent the resources left over (either free, as in
  unallocable, or allocable on their own) after the tiered allocation
  has been completed. They better represent the resources that are
  actually unallocable because some other resource has been exhausted.
  For example, the cluster might still have 100GiB of disk free, but
  with no memory left for instances we cannot allocate another
  instance, so in effect the disk space is unallocable. Note that the
  CPUs here represent instance virtual CPUs, and in case the
  *--max-cpu* option hasn't been specified this will be -1.

ALLOC_USAGE
  The current usage, represented as the initial number of instances
  divided by the final number of instances.

ALLOC_COUNT
  The number of instances allocated (delta between FIN_INST_CNT and
  INI_INST_CNT).

ALLOC_FAIL*_CNT
  For the last allocation attempt (which would have increased
  FIN_INST_CNT by one, had it succeeded), this is the count of the
  failure reasons per failure type; currently defined are FAILMEM,
  FAILDISK and FAILCPU, which represent errors due to not enough
  memory, disk and CPUs, and FAILN1, which represents a cluster that
  is not N+1 compliant and on which we can't allocate instances at
  all.

ALLOC_FAIL_REASON
  The reason for most of the failures, being one of the above FAIL*
  strings.

OK
  A marker representing the successful end of the computation, having
  the value "1". If this key is not present in the output it means
  that the computation failed and any values present should not be
  relied upon.
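
  A consumer should therefore check this marker before trusting the
  rest of the values; a minimal sketch, building on the ``eval``
  example above::

      # Abort if the computation did not finish successfully.
      if [ "${HTS_OK:-0}" != "1" ]; then
          echo "hspace computation failed" >&2
          exit 1
      fi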

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will also be displayed with a TRL_ prefix, and denote the
cluster status at the end of the tiered allocation run.

OPTIONS
-------

The options that can be passed to the program are as follows:

--memory *mem*
  The memory size of the instances to be placed (defaults to 4GiB).

--disk *disk*
  The disk size of the instances to be placed (defaults to 100GiB).

--disk-template *template*
  The disk template for the instance; one of the Ganeti disk templates
  (e.g. plain, drbd, and so on) should be passed in.

--vcpus *vcpus*
  The number of VCPUs of the instances to be placed (defaults to 1).

--max-cpu=*cpu-ratio*
  The maximum virtual to physical cpu ratio, as a floating point
  number greater than or equal to one. For example, specifying
  *cpu-ratio* as **2.5** means that, for a 4-cpu machine, a maximum of
  10 virtual cpus should be allowed to be in use for primary
  instances. A value below one does not make sense, as that would
  prevent the physical CPUs from being fully used.

--min-disk=*disk-ratio*
  The minimum amount of free disk space remaining, as a floating point
  number. For example, specifying *disk-ratio* as **0.25** means that
  at least one quarter of disk space should be left free on nodes.

-p, --print-nodes
  Prints the before and after node status, in a format designed to
  allow the user to understand the node's most important parameters.

  It is possible to customise the listed information by passing a
  comma-separated list of field names to this option (the field list
  is currently undocumented), or to extend the default field list by
  prefixing the additional field list with a plus sign; an example
  invocation is shown after the field list below. By default, the
  node list will contain the following information:

  F
    a character denoting the status of the node, with '-' meaning an
    offline node, '*' meaning N+1 failure and blank meaning a good
    node

  Name
    the node name

  t_mem
    the total node memory

  n_mem
    the memory used by the node itself

  i_mem
    the memory used by instances

  x_mem
    the amount of memory which seems to be in use but for which it
    cannot be determined why or by which instance; usually this means
    that the hypervisor has some overhead or that there are other
    reporting errors

  f_mem
    the free node memory

  r_mem
    the reserved node memory, which is the amount of free memory
    needed for N+1 compliance

  t_dsk
    total disk

  f_dsk
    free disk

  pcpu
    the number of physical cpus on the node

  vcpu
    the number of virtual cpus allocated to primary instances

  pcnt
    number of primary instances

  scnt
    number of secondary instances

  p_fmem
    percent of free memory

  p_fdsk
    percent of free disk

  r_cpu
    ratio of virtual to physical cpus

  lCpu
    the dynamic CPU load (if the information is available)

  lMem
    the dynamic memory load (if the information is available)

  lDsk
    the dynamic disk load (if the information is available)

  lNet
    the dynamic net load (if the information is available)
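
  For example (a sketch only; the ``=+`` syntax for extending the
  default field list is assumed, and the dynamic load fields require
  load data to be available)::

      # Print the default node fields plus the dynamic CPU and memory
      # load columns, before and after the allocation run.
      hspace -L --print-nodes=+lCpu,lMem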

-O *name*
  This option (which can be given multiple times) will mark nodes as
  being *offline*. This means a couple of things:

  - instances won't be placed on these nodes, not even temporarily;
    e.g. the *replace primary* move is not available if the secondary
    node is offline, since this move requires a failover.
  - these nodes will not be included in the score calculation (except
    for the percentage of instances on offline nodes)

  Note that the algorithm will also mark as offline any nodes which
  are reported by RAPI as such, or that have "?" in file-based input
  in any numeric fields.

-t *datafile*, --text-data=*datafile*
  The name of the file holding node and instance information (if not
  collecting via RAPI or LUXI). This or one of the other backends must
  be selected.

-S *filename*, --save-cluster=*filename*
  If given, the state of the cluster at the end of the allocation is
  saved to a file named *filename.alloc*, and if tiered allocation is
  enabled, the state after tiered allocation will be saved to
  *filename.tiered*. This allows re-feeding the cluster state to
  either hspace itself (with different parameters) or for example
  hbal.
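
  A sketch of such a round trip (the file name is only an example, and
  hbal's *-t* text backend is assumed)::

      # Compute allocation capacity and save the resulting cluster
      # state, then feed that hypothetical cluster to hbal.
      hspace -L -S /tmp/cluster-state
      hbal -t /tmp/cluster-state.alloc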

-m *cluster*
  Collect data directly from the *cluster* given as an argument via
  RAPI. If the argument doesn't contain a colon (:), then it is
  converted into a fully-built URL by prepending ``https://`` and
  appending the default RAPI port; otherwise it's considered a
  fully-specified URL and is used as-is.

-L [*path*]
  Collect data directly from the master daemon, which is to be
  contacted via LUXI (an internal Ganeti protocol). An optional *path*
  argument is interpreted as the path to the unix socket on which the
  master daemon listens; otherwise, the default path used by Ganeti
  when installed with *--localstatedir=/var* is used.

--simulate *description*
  Instead of using actual data, build an empty cluster given a node
  description. The *description* parameter must be a comma-separated
  list of five elements, describing in order:

  - the allocation policy for this node group
  - the number of nodes in the cluster
  - the disk size of the nodes, in mebibytes
  - the memory size of the nodes, in mebibytes
  - the cpu core count for the nodes

  An example description would be **preferred,20,102400,16384,4**,
  describing a 20-node cluster where each node has 100GiB of disk
  space, 16GiB of memory and 4 CPU cores. Note that currently all
  nodes must have the same specs.

  This option can be given multiple times, and each new use defines a
  new node group. Hence different node groups can have different
  allocation policies and node count/specifications.
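
  For instance, a hypothetical two-group simulation could look like
  this (the group sizes, specs and disk template are arbitrary
  examples)::

      # Two "preferred" node groups of different sizes; each node has
      # the listed disk (MiB), memory (MiB) and CPU core count.
      hspace --simulate preferred,20,102400,16384,4 \
             --simulate preferred,10,204800,32768,8 \
             --disk-template drbd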

--tiered-alloc *spec*
  Besides the standard, fixed-size allocation, also do a tiered
  allocation scheme where the algorithm starts from the given
  specification and allocates until there is no more space; then it
  decreases the specification and tries the allocation again. The
  decrease is done on the metric that last failed during
  allocation. The specification given is similar to the *--simulate*
  option and it holds:

  - the disk size of the instance
  - the memory size of the instance
  - the vcpu count for the instance

  An example description would be *10240,8192,2*, describing an
  initial starting specification of 10GiB of disk space, 8GiB of
  memory and 2 VCPUs.

  Also note that the normal allocation and the tiered allocation are
  independent, and both start from the initial cluster state; as such,
  the instance counts for these two modes are not related to one
  another.
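
  A sketch of a tiered run against a simulated cluster (all values are
  illustrative)::

      # Start from a 10GiB-disk/8GiB-RAM/2-VCPU spec and shrink it as
      # resources run out; the per-tier counts end up in HTS_TSPEC.
      hspace --simulate preferred,20,102400,16384,4 \
             --tiered-alloc 10240,8192,2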

-v, --verbose
  Increase the output verbosity. Each usage of this option will
  increase the verbosity (currently more than 2 doesn't make sense)
  from the default of one.

-q, --quiet
  Decrease the output verbosity. Each usage of this option will
  decrease the verbosity (less than zero doesn't make sense) from the
  default of one.

-V, --version
  Just show the program version and exit.

EXIT STATUS
-----------

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).

BUGS
----

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: