.TH HSPACE 1 2009-06-01 htools "Ganeti H-tools"
.SH NAME
hspace \- Cluster space analyzer for Ganeti

.SH SYNOPSIS
.B hspace
.B "[backend options...]"
.B "[algorithm options...]"
.B "[request options...]"
.B "[-p]"
.B "[-v... | -q]"

.B hspace
.B --version

.TP
Backend options:
.BI " -m " cluster
|
.BI " -L[" path "]"
|
.BI " -n " nodes-file
.BI " -i " instances-file
|
.BI " --simulate " spec

.TP
Algorithm options:
.BI "[ --max-cpu " cpu-ratio " ]"
.BI "[ --min-disk " disk-ratio " ]"
.BI "[ -O " name... " ]"

.TP
Request options:
.BI "[--memory " mem "]"
.BI "[--disk " disk "]"
.BI "[--req-nodes " req-nodes "]"
.BI "[--vcpus " vcpus "]"
.BI "[--tiered-alloc " spec "]"

.SH DESCRIPTION
hspace computes how many additional instances can fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until the point where no further N+1-compliant allocation is
possible. It uses the exact same allocation algorithm as the hail
iallocator plugin.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a \fIkey=value\fR file). Options which extend
the output (e.g. -p, -v) will print the additional information on
stderr (so that stdout remains parseable).
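
Because the output is a shell fragment, it can be consumed directly by
a script. A minimal sketch follows; the sample values are hypothetical
stand-ins for what a real hspace run would print:

```shell
# Hypothetical hspace stdout, captured as a shell fragment.
output='HTS_OK=1
HTS_INI_INST_CNT=10
HTS_FIN_INST_CNT=34'

# Evaluate the fragment, then check the OK marker before trusting values.
eval "$output"
if [ "${HTS_OK:-0}" = "1" ]; then
  echo "allocated $((HTS_FIN_INST_CNT - HTS_INI_INST_CNT)) more instances"
fi
# prints: allocated 24 more instances
```

In practice one would use something like `eval "$(hspace ...)"`, since
the stdout stream is designed to be safe to evaluate.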

The following keys are available in the output of the script (all
prefixed with \fIHTS_\fR):
.TP
.I SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
These represent the specifications of the instance model used for
allocation (the memory, disk, cpu, requested nodes).

.TP
.I CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
These represent the total memory, disk, CPU count and total node count
of the cluster.

.TP
.I INI_SCORE, FIN_SCORE
These are the initial (current) and final cluster score (see the hbal
man page for details about the scoring algorithm).

.TP
.I INI_INST_CNT, FIN_INST_CNT
The initial and final instance count.

.TP
.I INI_MEM_FREE, FIN_MEM_FREE
The initial and final total free memory in the cluster (which is not
necessarily all available for use).

.TP
.I INI_MEM_AVAIL, FIN_MEM_AVAIL
The initial and final total memory available for allocation in the
cluster. When allocating redundant instances, new instances can
increase the reserved memory, so not necessarily all of this memory
can be used for new instance allocations.

.TP
.I INI_MEM_RESVD, FIN_MEM_RESVD
The initial and final reserved memory (for redundancy/N+1 purposes).

.TP
.I INI_MEM_INST, FIN_MEM_INST
The initial and final memory used by instances (actual runtime used
RAM).

.TP
.I INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
The initial and final memory overhead: memory used by the node itself
plus unaccounted memory (e.g. due to hypervisor overhead).

.TP
.I INI_MEM_EFF, FIN_MEM_EFF
The initial and final memory efficiency, computed as instance memory
divided by total memory.

.TP
.I INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
Initial disk stats, similar to the memory ones.

.TP
.I FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
Final disk stats, similar to the memory ones.

.TP
.I INI_CPU_INST, FIN_CPU_INST
Initial and final number of virtual CPUs used by instances.

.TP
.I INI_CPU_EFF, FIN_CPU_EFF
The initial and final CPU efficiency, computed as the count of virtual
instance CPUs divided by the total physical CPU count.

.TP
.I INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
The initial and final maximum per-node available memory. This is not
very useful as a metric but can give an impression of the status of
the nodes; for example, this value restricts the maximum instance size
that can still be created on the cluster.

.TP
.I INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
Like the above, but for disk.

.TP
.I TSPEC
If the tiered allocation mode has been enabled, this key holds the
pairs of specifications and counts of instances that can be created in
this mode. The value of the key is a space-separated list of values;
each value is of the form \fImemory,disk,vcpu=count\fR, where memory,
disk and vcpu are the values of the current spec, and count is how
many instances of this spec can be created. A complete value for this
variable could be: \fB4096,102400,2=225
2560,102400,2=20 512,102400,2=21\fR.
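
As a sketch, a TSPEC value in this format can be split with standard
shell word splitting and parameter expansion (the value below is the
hypothetical example from above):

```shell
# Hypothetical TSPEC value: space-separated memory,disk,vcpu=count pairs.
HTS_TSPEC='4096,102400,2=225 2560,102400,2=20 512,102400,2=21'

total=0
for pair in $HTS_TSPEC; do
  spec=${pair%=*}     # the memory,disk,vcpu triple, e.g. 4096,102400,2
  count=${pair#*=}    # how many such instances fit, e.g. 225
  echo "spec $spec: $count instances"
  total=$((total + count))
done
echo "total tiered instances: $total"
# prints: total tiered instances: 266
```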

.TP
.I ALLOC_USAGE
The current usage, computed as the initial number of instances divided
by the final number of instances.

.TP
.I ALLOC_COUNT
The number of instances allocated (the delta between FIN_INST_CNT and
INI_INST_CNT).

.TP
.I ALLOC_FAIL*_CNT
For the last attempt at allocation (which would have increased
FIN_INST_CNT by one, had it succeeded), this is the count of the
failure reasons per failure type; currently defined are FAILMEM,
FAILDISK and FAILCPU, which represent errors due to not enough memory,
disk and CPUs, and FAILN1, which represents a non-N+1-compliant
cluster on which we can't allocate instances at all.

.TP
.I ALLOC_FAIL_REASON
The reason for most of the failures, being one of the above FAIL*
strings.

.TP
.I OK
A marker representing the successful end of the computation, with
value "1". If this key is not present in the output, it means that the
computation failed and any values present should not be relied upon.

.PP

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will also be displayed with a TRL_ prefix, denoting the
cluster status at the end of the tiered allocation run.

.SH OPTIONS
The options that can be passed to the program are as follows:

.TP
.BI "--memory " mem
The memory size of the instances to be placed (defaults to 4GiB).

.TP
.BI "--disk " disk
The disk size of the instances to be placed (defaults to 100GiB).

.TP
.BI "--req-nodes " num-nodes
The number of nodes for the instances; the default of two means
mirrored instances, while passing one means plain type instances.

.TP
.BI "--vcpus " vcpus
The number of VCPUs of the instances to be placed (defaults to 1).

.TP
.BI "--max-cpu " cpu-ratio
The maximum virtual-to-physical cpu ratio, as a floating point number
greater than or equal to one. For example, specifying \fIcpu-ratio\fR
as \fB2.5\fR means that, for a 4-cpu machine, a maximum of 10 virtual
cpus should be allowed to be in use for primary instances. A value of
exactly one means no CPU over-subscription is allowed.
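
The arithmetic above can be checked with a one-liner; awk is used here
only because plain shell arithmetic is integer-only:

```shell
cpu_ratio=2.5   # the --max-cpu value
physical=4      # physical CPUs on the node
# Maximum virtual CPUs usable by primary instances on this node.
max_vcpus=$(awk -v r="$cpu_ratio" -v p="$physical" 'BEGIN { printf "%d\n", r * p }')
echo "$max_vcpus"
# prints: 10
```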

.TP
.BI "--min-disk " disk-ratio
The minimum amount of free disk space remaining, as a floating point
number. For example, specifying \fIdisk-ratio\fR as \fB0.25\fR means
that at least one quarter of disk space should be left free on nodes.

.TP
.B -p, --print-nodes
Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

The node list will contain this information:
.RS
.TP
.B F
a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
the amount of memory which seems to be in use but cannot be attributed
to any instance; usually this means that the hypervisor has some
overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pcpu
the number of physical cpus on the node
.TP
.B vcpu
the number of virtual cpus allocated to primary instances
.TP
.B pri
number of primary instances
.TP
.B sec
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.TP
.B r_cpu
ratio of virtual to physical cpus
.TP
.B lCpu
the dynamic CPU load (if the information is available)
.TP
.B lMem
the dynamic memory load (if the information is available)
.TP
.B lDsk
the dynamic disk load (if the information is available)
.TP
.B lNet
the dynamic net load (if the information is available)
.RE

.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR, and instances won't be placed on these nodes.

Note that hspace will also mark as offline any nodes which are
reported by RAPI as such, or that have "?" in any numeric field of
file-based input.
.RE

.TP
.BI "-n" nodefile ", --nodes=" nodefile
The name of the file holding node information (if not collecting via
RAPI), instead of the default \fInodes\fR file (but see below how to
customize the default value via the environment).

.TP
.BI "-i" instancefile ", --instances=" instancefile
The name of the file holding instance information (if not collecting
via RAPI), instead of the default \fIinstances\fR file (but see below
how to customize the default value via the environment).

.TP
.BI "-m" cluster
Collect data not from files but directly from the
.I cluster
given as an argument, via RAPI. If the argument doesn't contain a
colon (:), then it is converted into a fully-built URL by prepending
https:// and appending the default RAPI port; otherwise it's
considered a fully-specified URL and is used as-is.
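
A sketch of that URL handling in shell; the port number used here is
an assumption (5080 is the usual Ganeti RAPI default, but verify it
against your installation):

```shell
arg="cluster.example.com"
case "$arg" in
  *:*) url="$arg" ;;                 # contains a colon: use as-is
  *)   url="https://${arg}:5080" ;;  # bare name: prepend scheme, append port
esac
echo "$url"
# prints: https://cluster.example.com:5080
```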

.TP
.BI "-L[" path "]"
Collect data not from files but directly from the master daemon, which
is to be contacted via LUXI (an internal Ganeti protocol). An optional
\fIpath\fR argument is interpreted as the path to the unix socket on
which the master daemon listens; otherwise, the default path used by
Ganeti when installed with "--localstatedir=/var" is used.

.TP
.BI "--simulate " description
Instead of using actual data, build an empty cluster given a node
description. The \fIdescription\fR parameter must be a comma-separated
list of four elements, describing in order:

.RS

.RS
.TP
the number of nodes in the cluster

.TP
the disk size of the nodes, in mebibytes

.TP
the memory size of the nodes, in mebibytes

.TP
the cpu core count for the nodes

.RE

An example description would be \fB20,102400,16384,4\fR, describing a
20-node cluster where each node has 100GiB of disk space, 16GiB of
memory and 4 CPU cores. Note that currently all nodes must have the
same specs.

.RE
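
The four comma-separated fields can be split with a plain read; a
sketch using the example description above:

```shell
spec='20,102400,16384,4'   # nodes,disk_MiB,memory_MiB,cpus

# Split the description into its four fields on commas.
IFS=, read -r nodes disk mem cpus <<EOF
$spec
EOF

echo "$nodes nodes: $((disk / 1024)) GiB disk, $((mem / 1024)) GiB RAM, $cpus cores each"
# prints: 20 nodes: 100 GiB disk, 16 GiB RAM, 4 cores each
```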

.TP
.BI "--tiered-alloc " spec
Besides the standard, fixed-size allocation, also do a tiered
allocation scheme, where the algorithm starts from the given
specification and allocates until there is no more space; then it
decreases the specification and tries the allocation again. The
decrease is done on the metric that last failed during allocation. The
specification given is similar to that of the \fB--simulate\fR option
and holds:

.RS

.RS

.TP
the disk size of the instance

.TP
the memory size of the instance

.TP
the vcpu count for the instance

.RE

An example description would be \fB10240,8192,2\fR, describing an
initial starting specification of 10GiB of disk space, 8GiB of memory
and 2 VCPUs.

Also note that the normal allocation and the tiered allocation are
independent, and both start from the initial cluster state; as such,
the instance counts for these two modes are not related to one
another.

.RE

.TP
.B -v, --verbose
Increase the output verbosity. Each use of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one. At verbosity 2, the location of the new instances is
shown on standard error.

.TP
.B -q, --quiet
Decrease the output verbosity. Each use of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.

.TP
.B -V, --version
Just show the program version and exit.

.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm failed fatally (e.g. wrong node or instance data).

.SH BUGS

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.SH ENVIRONMENT

If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
present in the environment, they will override the default names for
the nodes and instances files. These will, of course, have no effect
when the RAPI or LUXI backends are used.

.SH SEE ALSO
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"

.SH "COPYRIGHT"
.PP
Copyright (C) 2009 Google Inc. Permission is granted to copy,
distribute and/or modify under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
.PP
On Debian systems, the complete text of the GNU General Public License
can be found in /usr/share/common-licenses/GPL.