.TH HSPACE 1 2009-06-01 htools "Ganeti H-tools"
.SH NAME
hspace \- Cluster space analyzer for Ganeti

.SH SYNOPSIS
.B hspace
.B "[backend options...]"
.B "[algorithm options...]"
.B "[request options...]"
.BI "[ -p[" fields "] ]"
.B "[-v... | -q]"

.B hspace
.B --version

.TP
Backend options:
.BI " -m " cluster
|
.BI " -L[" path "]"
|
.BI " -t " data-file
|
.BI " --simulate " spec

.TP
Algorithm options:
.BI "[ --max-cpu " cpu-ratio " ]"
.BI "[ --min-disk " disk-ratio " ]"
.BI "[ -O " name... " ]"

.TP
Request options:
.BI "[--memory " mem "]"
.BI "[--disk " disk "]"
.BI "[--req-nodes " req-nodes "]"
.BI "[--vcpus " vcpus "]"
.BI "[--tiered-alloc " spec "]"


.SH DESCRIPTION
hspace computes how many additional instances can fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until the point where no N+1\(hycompliant allocation is
possible. It uses the exact same allocation algorithm as the hail
iallocator plugin.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a \fIkey=value\fR file). Options which extend
the output (e.g. \-p, \-v) will output the additional information on
stderr (such that the stdout is still parseable).
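As an illustrative sketch (not part of hspace itself), the key=value output can be consumed from a shell script like this; the sample values below are invented, and the commented-out invocation is only one possible way to obtain real output:

```shell
# Sample hspace stdout (values invented for illustration). In a real
# run this would instead be something like: eval "$(hspace -L)"
sample_output='HTS_OK=1
HTS_ALLOC_COUNT=225
HTS_FIN_INST_CNT=300'

eval "$sample_output"

# HTS_OK=1 marks a successful computation; only then trust the values
if [ "${HTS_OK:-0}" -eq 1 ]; then
    echo "allocation succeeded: ${HTS_ALLOC_COUNT} instances placed"
else
    echo "computation failed" >&2
fi
```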

The following keys are available in the output of the script (all
prefixed with \fIHTS_\fR):
.TP
.I SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
These represent the specifications of the instance model used for
allocation (the memory, disk, cpu, requested nodes).

.TP
.I CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
These represent the total memory, disk, CPU count and total nodes in
the cluster.

.TP
.I INI_SCORE, FIN_SCORE
These are the initial (current) and final cluster score (see the hbal
man page for details about the scoring algorithm).

.TP
.I INI_INST_CNT, FIN_INST_CNT
The initial and final instance count.

.TP
.I INI_MEM_FREE, FIN_MEM_FREE
The initial and final total free memory in the cluster (not all of
which is necessarily available for use).

.TP
.I INI_MEM_AVAIL, FIN_MEM_AVAIL
The initial and final total available memory for allocation in the
cluster. If allocating redundant instances, new instances could
increase the reserved memory, so not necessarily all of this memory
can be used for new instance allocations.

.TP
.I INI_MEM_RESVD, FIN_MEM_RESVD
The initial and final reserved memory (for redundancy/N+1 purposes).

.TP
.I INI_MEM_INST, FIN_MEM_INST
The initial and final memory used for instances (actual runtime used
RAM).

.TP
.I INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
The initial and final memory overhead \(em memory used for the node
itself and unaccounted memory (e.g. due to hypervisor overhead).

.TP
.I INI_MEM_EFF, FIN_MEM_EFF
The initial and final memory efficiency, represented as instance
memory divided by total memory.
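For illustration, the efficiency keys are plain ratios; with invented numbers (196608 MiB of instance memory out of 327680 MiB total):

```shell
# memory efficiency = instance memory / total memory (both in MiB);
# the input numbers here are invented for illustration
inst_mem=196608
total_mem=327680
eff=$(awk -v i="$inst_mem" -v t="$total_mem" 'BEGIN { printf "%.2f", i / t }')
echo "HTS_FIN_MEM_EFF=$eff"
```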

.TP
.I INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
Initial disk stats, similar to the memory ones.

.TP
.I FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
Final disk stats, similar to the memory ones.

.TP
.I INI_CPU_INST, FIN_CPU_INST
Initial and final number of virtual CPUs used by instances.

.TP
.I INI_CPU_EFF, FIN_CPU_EFF
The initial and final CPU efficiency, represented as the count of
virtual instance CPUs divided by the total physical CPU count.

.TP
.I INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
The initial and final maximum per\(hynode available memory. This is not
very useful as a metric but can give an impression of the status of
the nodes; for example, this value restricts the maximum instance
size that can still be created on the cluster.

.TP
.I INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
Like the above, but for disk.

.TP
.I TSPEC
If the tiered allocation mode has been enabled, this parameter holds
the pairs of specifications and counts of instances that can be
created in this mode. The value of the key is a space\(hyseparated list
of values; each value is of the form \fImemory,disk,vcpu=count\fR,
where memory, disk and vcpu are the values of the current spec, and
count is how many instances of this spec can be created. A complete
value for this variable could be: \fB4096,102400,2=225
2560,102400,2=20 512,102400,2=21\fR.
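As a sketch of how such a TSPEC value might be split in a shell script (the value below is the documentation example above, not output from a real run):

```shell
# Split a TSPEC value into its spec=count pairs, then into fields,
# using only POSIX parameter expansion.
HTS_TSPEC='4096,102400,2=225 2560,102400,2=20 512,102400,2=21'

for pair in $HTS_TSPEC; do          # word-split on spaces
    spec=${pair%=*}                 # memory,disk,vcpu
    count=${pair#*=}                # instance count for this spec
    mem=${spec%%,*}
    rest=${spec#*,}
    dsk=${rest%%,*}
    cpu=${rest#*,}
    echo "mem=${mem}MiB disk=${dsk}MiB vcpus=${cpu}: ${count} instances"
done
```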

.TP
.I KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
These represent the metrics of used resources at the start of the
computation (only for tiered allocation mode). The NPU value is the
"normalized" CPU count, i.e. the number of virtual CPUs divided by the
maximum ratio of virtual to physical CPUs.
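For example, with 48 virtual CPUs in use and a virtual\(hyto\(hyphysical ratio cap of 2.5 (both numbers invented for illustration), the NPU works out as:

```shell
# NPU = virtual CPUs / maximum virtual-to-physical CPU ratio;
# both input values are invented for this example
vcpus=48
max_cpu=2.5
npu=$(awk -v v="$vcpus" -v r="$max_cpu" 'BEGIN { printf "%.1f", v / r }')
echo "KM_USED_NPU=$npu"
```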

.TP
.I KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
These represent the total resources allocated during the tiered
allocation process. In effect, they represent how much is readily
available for allocation.

.TP
.I KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
These represent the resources left over (either free as in
unallocable, or allocable on their own) after the tiered allocation
has been completed. They better represent the actual unallocable
resources, because some other resource has been exhausted. For
example, the cluster might still have 100GiB disk free, but with no
memory left for instances, we cannot allocate another instance, so in
effect the disk space is unallocable. Note that the CPUs here
represent instance virtual CPUs, and in case the \fI--max-cpu\fR
option hasn't been specified this will be \-1.

.TP
.I ALLOC_USAGE
The current usage, represented as the initial number of instances
divided by the final number of instances.

.TP
.I ALLOC_COUNT
The number of instances allocated (delta between FIN_INST_CNT and
INI_INST_CNT).

.TP
.I ALLOC_FAIL*_CNT
For the last attempt at allocation (which would have increased
FIN_INST_CNT by one, had it succeeded), this is the count of the
failure reasons per failure type; currently defined are FAILMEM,
FAILDISK and FAILCPU, which represent errors due to not enough memory,
disk and CPUs, and FAILN1, which represents a non N+1 compliant
cluster on which we can't allocate instances at all.

.TP
.I ALLOC_FAIL_REASON
The reason for most of the failures, being one of the above FAIL*
strings.

.TP
.I OK
A marker representing the successful end of the computation, and
having value "1". If this key is not present in the output, it means
that the computation failed and any values present should not be
relied upon.

.PP

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will also be displayed with a TRL_ prefix, and denote the
cluster status at the end of the tiered allocation run.

.SH OPTIONS
The options that can be passed to the program are as follows:

.TP
.BI "--memory " mem
The memory size of the instances to be placed (defaults to 4GiB).

.TP
.BI "--disk " disk
The disk size of the instances to be placed (defaults to 100GiB).

.TP
.BI "--req-nodes " num-nodes
The number of nodes for the instances; the default of two means
mirrored instances, while passing one means plain type instances.

.TP
.BI "--vcpus " vcpus
The number of VCPUs of the instances to be placed (defaults to 1).

.TP
.BI "--max-cpu " cpu-ratio
The maximum virtual\(hyto\(hyphysical cpu ratio, as a floating point
number greater than or equal to one. For example, specifying
\fIcpu-ratio\fR as \fB2.5\fR means that, for a 4\(hycpu machine, a
maximum of 10 virtual cpus should be allowed to be in use for primary
instances. A value of exactly one means no CPU oversubscription is
allowed.
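The per\(hynode cap on primary\(hyinstance virtual cpus follows directly from this ratio; a quick back\(hyof\(hythe\(hyenvelope check with the example numbers above:

```shell
# max virtual cpus on a node = physical cpus * --max-cpu ratio
# (4 and 2.5 are the example values from the text above)
pcpus=4
cpu_ratio=2.5
max_vcpus=$(awk -v p="$pcpus" -v r="$cpu_ratio" 'BEGIN { printf "%d", p * r }')
echo "at most $max_vcpus virtual cpus for primary instances"
```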

.TP
.BI "--min-disk " disk-ratio
The minimum amount of free disk space remaining, as a floating point
number. For example, specifying \fIdisk-ratio\fR as \fB0.25\fR means
that at least one quarter of disk space should be left free on nodes.

.TP
.B -p, --print-nodes
Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

It is possible to customise the listed information by passing a
comma\(hyseparated list of field names to this option (the field list
is currently undocumented), or to extend the default field list by
prefixing the additional field list with a plus sign. By default, the
node list will contain the following information:
.RS
.TP
.B F
a character denoting the status of the node, with '\-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
the amount of memory which seems to be in use, but whose cause or
owning instance cannot be determined; usually this means that the
hypervisor has some overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pcpu
the number of physical cpus on the node
.TP
.B vcpu
the number of virtual cpus allocated to primary instances
.TP
.B pcnt
number of primary instances
.TP
.B scnt
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.TP
.B r_cpu
ratio of virtual to physical cpus
.TP
.B lCpu
the dynamic CPU load (if the information is available)
.TP
.B lMem
the dynamic memory load (if the information is available)
.TP
.B lDsk
the dynamic disk load (if the information is available)
.TP
.B lNet
the dynamic net load (if the information is available)
.RE

.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR, and instances won't be placed on these nodes.

Note that hspace will also mark as offline any nodes which are
reported by RAPI as such, or that have "?" in file\(hybased input in any
numeric fields.
.RE

.TP
.BI "-t " datafile ", --text-data=" datafile
The name of the file holding node and instance information (if not
collecting via RAPI or LUXI). This or one of the other backends must
be selected.

.TP
.BI "-S " filename ", --save-cluster=" filename
If given, the state of the cluster at the end of the allocation is
saved to a file named \fIfilename.alloc\fR, and if tiered allocation
is enabled, the state after tiered allocation will be saved to
\fIfilename.tiered\fR. This allows re\(hyfeeding the cluster state to
either hspace itself (with different parameters) or, for example, hbal.

.TP
.BI "-m " cluster
Collect data directly from the
.I cluster
given as an argument via RAPI. If the argument doesn't contain a colon
(:), then it is converted into a fully\(hybuilt URL by prepending
https:// and appending the default RAPI port; otherwise it is
considered a fully\(hyspecified URL and is used as\(hyis.
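The URL construction rule above can be sketched as follows; this is an illustration rather than the actual hspace code, and 5080 as the default RAPI port is an assumption here, so check your installation:

```shell
# Mirror the -m argument handling described above: bare names get a
# scheme and port added; anything containing a colon is taken verbatim.
# The port 5080 is an assumed default, not read from any configuration.
build_rapi_url() {
    case "$1" in
        *:*) echo "$1" ;;               # already a full URL, use as-is
        *)   echo "https://$1:5080" ;;  # assumed default RAPI port
    esac
}

build_rapi_url cluster.example.com   # -> https://cluster.example.com:5080
build_rapi_url https://10.0.0.1:5080 # contains a colon, used as-is
```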

.TP
.BI "-L[" path "]"
Collect data directly from the master daemon, which is to be contacted
via LUXI (an internal Ganeti protocol). An optional \fIpath\fR
argument is interpreted as the path to the unix socket on which the
master daemon listens; otherwise, the default path used by Ganeti when
installed with \fI--localstatedir=/var\fR is used.

.TP
.BI "--simulate " description
Instead of using actual data, build an empty cluster given a node
description. The \fIdescription\fR parameter must be a
comma\(hyseparated list of four elements, describing in order:

.RS

.RS
.TP
the number of nodes in the cluster

.TP
the disk size of the nodes, in mebibytes

.TP
the memory size of the nodes, in mebibytes

.TP
the cpu core count for the nodes

.RE

An example description would be \fB20,102400,16384,4\fR, describing a
20\(hynode cluster where each node has 100GiB of disk space, 16GiB of
memory and 4 CPU cores. Note that currently all nodes must have the
same specs.

.RE
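As an illustration of what such a description amounts to (a sketch, not hspace code), the aggregate capacity of the simulated cluster is simply the per\(hynode spec multiplied by the node count:

```shell
# Parse a --simulate description (nodes,disk,mem,cpus; sizes per node,
# in MiB) and print the aggregate simulated capacity.
spec='20,102400,16384,4'
IFS=, read -r nodes disk mem cpus <<EOF
$spec
EOF
echo "total: $((nodes * disk)) MiB disk, $((nodes * mem)) MiB ram, $((nodes * cpus)) cores"
```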

.TP
.BI "--tiered-alloc " spec
Besides the standard, fixed\(hysize allocation, also do a tiered
allocation scheme, where the algorithm starts from the given
specification and allocates until there is no more space; then it
decreases the specification and tries the allocation again. The
decrease is done on the metric that last failed during allocation. The
specification given is similar to the \fI--simulate\fR option and it
holds:

.RS

.RS

.TP
the disk size of the instance

.TP
the memory size of the instance

.TP
the vcpu count for the instance

.RE

An example description would be \fB10240,8192,2\fR, describing an
initial starting specification of 10GiB of disk space, 8GiB of memory
and 2 VCPUs.

Also note that the normal allocation and the tiered allocation are
independent, and both start from the initial cluster state; as such,
the instance counts for these two modes are not related to one
another.

.RE

.TP
.B -v, --verbose
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one. At verbosity 2, the location of the new instances is
shown on standard error.

.TP
.B -q, --quiet
Decrease the output verbosity. Each usage of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.

.TP
.B -V, --version
Just show the program version and exit.

.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm failed fatally (e.g. wrong node or instance data).

.SH BUGS

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
very large clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.SH ENVIRONMENT

If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
present in the environment, they will override the default names for
the nodes and instances files. These will, of course, have no effect
when the RAPI or LUXI backends are used.

.SH SEE ALSO
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"

.SH "COPYRIGHT"
.PP
Copyright (C) 2009 Google Inc. Permission is granted to copy,
distribute and/or modify under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
.PP
On Debian systems, the complete text of the GNU General Public License
can be found in /usr/share/common-licenses/GPL.