1 .TH HSPACE 1 2009-06-01 htools "Ganeti H-tools"
3 hspace \- Cluster space analyzer for Ganeti
7 .B "[backend options...]"
8 .B "[algorithm options...]"
.B "[request options...]"
10 .BI "[ -p[" fields "] ]"
24 .BI " --simulate " spec
28 .BI "[ --max-cpu " cpu-ratio " ]"
29 .BI "[ --min-disk " disk-ratio " ]"
30 .BI "[ -O " name... " ]"
34 .BI "[--memory " mem "]"
35 .BI "[--disk " disk "]"
36 .BI "[--req-nodes " req-nodes "]"
37 .BI "[--vcpus " vcpus "]"
38 .BI "[--tiered-alloc " spec "]"
hspace computes how many additional instances can fit on a cluster
while maintaining N+1 status.
45 The program will try to place instances, all of the same size, on the
46 cluster, until the point where we don't have any N+1 possible
allocation. It uses the exact same allocation algorithm as the hail
iallocator plugin.
The output of the program is designed to be interpreted as a shell
51 fragment (or parsed as a \fIkey=value\fR file). Options which extend
52 the output (e.g. \-p, \-v) will output the additional information on
53 stderr (such that the stdout is still parseable).
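The stdout format described above can be consumed directly by a shell
(or by any key=value parser). A minimal sketch, not part of hspace
itself, of parsing that output in Python:

```python
# Parse hspace stdout lines of the form KEY=value into a dict.
# The sample below is illustrative; real output has many more keys.
def parse_hspace_output(text):
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that isn't key=value
        key, _, value = line.partition("=")
        result[key] = value
    return result

sample = "HTS_INI_INST_CNT=10\nHTS_FIN_INST_CNT=25\nHTS_OK=1"
values = parse_hspace_output(sample)
print(values["HTS_FIN_INST_CNT"])  # -> 25
```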
55 The following keys are available in the output of the script (all
56 prefixed with \fIHTS_\fR):
58 .I SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
59 These represent the specifications of the instance model used for
60 allocation (the memory, disk, cpu, requested nodes).
63 .I CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
These represent the total memory, disk, CPU count and total nodes in
the cluster.
68 .I INI_SCORE, FIN_SCORE
69 These are the initial (current) and final cluster score (see the hbal
70 man page for details about the scoring algorithm).
73 .I INI_INST_CNT, FIN_INST_CNT
74 The initial and final instance count.
77 .I INI_MEM_FREE, FIN_MEM_FREE
The initial and final total free memory in the cluster (though this
doesn't necessarily mean it is all available for use).
82 .I INI_MEM_AVAIL, FIN_MEM_AVAIL
83 The initial and final total available memory for allocation in the
cluster. If allocating redundant instances, new instances could
increase the reserved memory, so the entirety of this memory isn't
necessarily available for new instance allocations.
89 .I INI_MEM_RESVD, FIN_MEM_RESVD
90 The initial and final reserved memory (for redundancy/N+1 purposes).
93 .I INI_MEM_INST, FIN_MEM_INST
The initial and final memory used for instances (actual runtime used
memory).
98 .I INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
99 The initial and final memory overhead \(em memory used for the node
itself and unaccounted memory (e.g. due to hypervisor overhead).
.I INI_MEM_EFF, FIN_MEM_EFF
104 The initial and final memory efficiency, represented as instance
105 memory divided by total memory.
108 .I INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
109 Initial disk stats, similar to the memory ones.
112 .I FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
113 Final disk stats, similar to the memory ones.
116 .I INI_CPU_INST, FIN_CPU_INST
117 Initial and final number of virtual CPUs used by instances.
120 .I INI_CPU_EFF, FIN_CPU_EFF
121 The initial and final CPU efficiency, represented as the count of
122 virtual instance CPUs divided by the total physical CPU count.
125 .I INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
126 The initial and final maximum per\(hynode available memory. This is not
127 very useful as a metric but can give an impression of the status of
the nodes; as an example, this value restricts the maximum instance
size that can still be created on the cluster.
132 .I INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
133 Like the above but for disk.
.I TSPEC
If the tiered allocation mode has been enabled, this parameter holds
138 the pairs of specifications and counts of instances that can be
139 created in this mode. The value of the key is a space\(hyseparated list
140 of values; each value is of the form \fImemory,disk,vcpu=count\fR
141 where the memory, disk and vcpu are the values for the current spec,
142 and count is how many instances of this spec can be created. A
143 complete value for this variable could be: \fB4096,102400,2=225
144 2560,102400,2=20 512,102400,2=21\fR.
147 .I KM_USED_CPU, KM_USED_MEM, KM_USED_DSK
These represent the metrics of used resources at the start of the
149 computation (only for tiered allocation mode).
152 .I KM_POOL_CPU, KM_POOL_MEM, KM_POOL_DSK
These represent the total resources allocated during the tiered
154 allocation process. In effect, they represent how much is readily
155 available for allocation.
158 .I KM_UNAV_CPU, KM_UNAV_MEM, KM_UNAV_DSK
These represent the resources left over (either free as in
unallocatable, or allocatable only on their own) after the tiered
allocation has completed. They better reflect the actual
unallocatable resources: some other resource has been exhausted, so
these cannot be used for further allocations. For example, the
cluster might still have 100GiB of free disk, but with no memory left
for instances we cannot allocate another instance, so in effect the
disk space is unallocatable. Note that the CPUs here represent
instance virtual CPUs, and in case the \fI--max-cpu\fR option hasn't
been specified this will be -1.
The current usage, represented as the initial number of instances
divided by the final number of instances.
The number of instances allocated (the delta between FIN_INST_CNT and
INI_INST_CNT).
For the last attempt at allocation (which would have increased
FIN_INST_CNT by one, had it succeeded), this is the count of the
183 failure reasons per failure type; currently defined are FAILMEM,
184 FAILDISK and FAILCPU which represent errors due to not enough memory,
185 disk and CPUs, and FAILN1 which represents a non N+1 compliant cluster
186 on which we can't allocate instances at all.
The reason for most of the failures, being one of the above FAIL*
reasons.
195 A marker representing the successful end of the computation, and
196 having value "1". If this key is not present in the output it means
that the computation failed and any values present should not be
relied upon.
202 If the tiered allocation mode is enabled, then many of the INI_/FIN_
203 metrics will be also displayed with a TRL_ prefix, and denote the
204 cluster status at the end of the tiered allocation run.
207 The options that can be passed to the program are as follows:
.BI "--memory " mem
The memory size of the instances to be placed (defaults to 4GiB).
.BI "--disk " disk
The disk size of the instances to be placed (defaults to 100GiB).
218 .BI "--req-nodes " num-nodes
219 The number of nodes for the instances; the default of two means
220 mirrored instances, while passing one means plain type instances.
.BI "--vcpus " vcpus
The number of VCPUs of the instances to be placed (defaults to 1).
.BI "--max-cpu " cpu-ratio
The maximum virtual\(hyto\(hyphysical cpu ratio, as a floating point
number greater than or equal to one. For example, specifying
\fIcpu-ratio\fR as \fB2.5\fR means that, for a 4\(hycpu machine, a
maximum of 10 virtual cpus should be allowed to be in use for primary
instances. A value of exactly one means there will be no CPU
over\(hysubscription, and values below one do not make sense, as that
would mean other resources (e.g. disk) couldn't be fully used due to
CPU restrictions.
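The arithmetic of the cpu-ratio example above can be written out as
follows (the helper name is mine, not an hspace function):

```python
# Maximum virtual cpus allowed for primary instances on one node,
# given its physical cpu count and the --max-cpu ratio. With a ratio
# of 2.5, a 4-cpu node may run at most 10 virtual cpus.
def max_primary_vcpus(physical_cpus, cpu_ratio):
    return int(physical_cpus * cpu_ratio)

print(max_primary_vcpus(4, 2.5))  # -> 10
```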
236 .BI "--min-disk " disk-ratio
237 The minimum amount of free disk space remaining, as a floating point
238 number. For example, specifying \fIdisk-ratio\fR as \fB0.25\fR means
239 that at least one quarter of disk space should be left free on nodes.
.BI "-p[" fields "]"
Prints the before and after node status, in a format designed to allow
244 the user to understand the node's most important parameters.
246 It is possible to customise the listed information by passing a
247 comma\(hyseparated list of field names to this option (the field list is
currently undocumented). By default, the node list will contain these
fields:
253 a character denoting the status of the node, with '\-' meaning an
254 offline node, '*' meaning N+1 failure and blank meaning a good node
260 the total node memory
263 the memory used by the node itself
266 the memory used by instances
the amount of memory which seems to be in use but for which it cannot
be determined why or by which instance; usually this means that the
hypervisor has some
271 overhead or that there are other reporting errors
the reserved node memory, which is the amount of free memory needed
for N+1 redundancy
287 the number of physical cpus on the node
290 the number of virtual cpus allocated to primary instances
293 number of primary instances
296 number of secondary instances
299 percent of free memory
305 ratio of virtual to physical cpus
308 the dynamic CPU load (if the information is available)
311 the dynamic memory load (if the information is available)
314 the dynamic disk load (if the information is available)
317 the dynamic net load (if the information is available)
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
323 being \fIoffline\fR, and instances won't be placed on these nodes.
325 Note that hspace will also mark as offline any nodes which are
reported by RAPI as such, or that have "?" in file\(hybased input in any
numeric fields.
331 .BI "-t" datafile ", --text-data=" datafile
332 The name of the file holding node and instance information (if not
collecting via RAPI or LUXI). This or one of the other backends must
be selected.
.BI "-m " cluster
Collect data directly from the \fIcluster\fR given as an argument via
RAPI. If the argument doesn't contain a colon
341 (:), then it is converted into a fully\(hybuilt URL via prepending
342 https:// and appending the default RAPI port, otherwise it's
343 considered a fully\(hyspecified URL and is used as\(hyis.
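The URL normalization described above can be sketched as follows; the
helper name and the concrete default port value (5080, Ganeti's usual
RAPI port) are assumptions, as the text only says "the default RAPI
port":

```python
# Normalize a RAPI endpoint argument: a bare name gets https:// and
# the default port prepended/appended, while anything containing a
# colon is treated as a fully-specified URL and used as-is.
def build_rapi_url(arg, default_port=5080):
    if ":" in arg:
        return arg
    return "https://%s:%d" % (arg, default_port)

print(build_rapi_url("cluster.example.com"))
# -> https://cluster.example.com:5080
```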
.BI "-L[" path "]"
Collect data directly from the master daemon, which is to be contacted
via LUXI (an internal Ganeti protocol). An optional \fIpath\fR
349 argument is interpreted as the path to the unix socket on which the
350 master daemon listens; otherwise, the default path used by ganeti when
351 installed with \fI--localstatedir=/var\fR is used.
354 .BI "--simulate " description
355 Instead of using actual data, build an empty cluster given a node
356 description. The \fIdescription\fR parameter must be a
357 comma\(hyseparated list of four elements, describing in order:
363 the number of nodes in the cluster
366 the disk size of the nodes, in mebibytes
369 the memory size of the nodes, in mebibytes
372 the cpu core count for the nodes
376 An example description would be \fB20,102400,16384,4\fR describing a
377 20\(hynode cluster where each node has 100GiB of disk space, 16GiB of
378 memory and 4 CPU cores. Note that all nodes must have the same specs
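Splitting such a description into its four fields is mechanical; a
small sketch using the example above (the function and field names are
mine):

```python
# Split a --simulate description (nodes,disk,memory,cpus, with disk
# and memory in mebibytes) into named fields.
def parse_simulate_spec(spec):
    nodes, disk_mib, mem_mib, cpus = (int(x) for x in spec.split(","))
    return {"nodes": nodes, "disk_mib": disk_mib,
            "mem_mib": mem_mib, "cpus": cpus}

print(parse_simulate_spec("20,102400,16384,4")["nodes"])  # -> 20
```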
384 .BI "--tiered-alloc " spec
385 Beside the standard, fixed\(hysize allocation, also do a tiered
386 allocation scheme where the algorithm starts from the given
387 specification and allocates until there is no more space; then it
388 decreases the specification and tries the allocation again. The
decrease is done on the metric that last failed during allocation. The
specification given is similar to the \fI--simulate\fR option and it
holds, in order:
398 the disk size of the instance
401 the memory size of the instance
the vcpu count for the instance
An example description would be \fB10240,8192,2\fR describing an
initial starting specification of 10GiB of disk space, 8GiB of memory
and 2 VCPUs.
412 Also note that the normal allocation and the tiered allocation are
413 independent, and both start from the initial cluster state; as such,
the instance counts for these two modes are not related to one
another.
.B "-v, --verbose"
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one. At verbosity 2, the location of the new instances is
shown on standard error.
.B "-q, --quiet"
Decrease the output verbosity. Each usage of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.
.B "-V, --version"
Just show the program version and exit.
The exit status of the command will be zero, unless for some reason
438 the algorithm fatally failed (e.g. wrong node or instance data).
442 The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
very large clusters.
446 The algorithm doesn't rebalance the cluster or try to get the optimal
447 fit; it just allocates in the best place for the current step, without
448 taking into consideration the impact on future placements.
452 If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
453 present in the environment, they will override the default names for
the nodes and instances files. These will of course have no effect
455 when the RAPI or Luxi backends are used.
458 .BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
463 Copyright (C) 2009 Google Inc. Permission is granted to copy,
464 distribute and/or modify under the terms of the GNU General Public
465 License as published by the Free Software Foundation; either version 2
466 of the License, or (at your option) any later version.
468 On Debian systems, the complete text of the GNU General Public License
469 can be found in /usr/share/common-licenses/GPL.