-.TH HBAL 1 2009-03-22 htools "Ganeti H-tools"
+.TH HBAL 1 2009-03-23 htools "Ganeti H-tools"
.SH NAME
hbal \- Cluster balancer for Ganeti
.B "[-C]"
.B "[-p]"
.B "[-o]"
+.B "[-v... | -q]"
.BI "[-l" limit "]"
.BI "[-O" name... "]"
+.BI "[-e" score "]"
.BI "[-m " cluster "]"
.BI "[-n " nodes-file " ]"
.BI "[-i " instances-file "]"
+.BI "[--max-cpu " cpu-ratio "]"
+.BI "[--min-disk " disk-ratio "]"
.B hbal
.B --version
failover/migrate and replace-disks such that we change one of the
instance nodes, and the other one remains (but possibly with changed
role, e.g. from primary it becomes secondary). The list is:
- - failover (f)
- - replace secondary (r)
- - replace primary, a composite move (f, r, f)
- - failover and replace secondary, also composite (f, r)
- - replace secondary and failover, also composite (r, f)
+.RS 4
+.TP 3
+\(em
+failover (f)
+.TP
+\(em
+replace secondary (r)
+.TP
+\(em
+replace primary, a composite move (f, r, f)
+.TP
+\(em
+failover and replace secondary, also composite (f, r)
+.TP
+\(em
+replace secondary and failover, also composite (r, f)
+.RE
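The composite moves in the list above can be traced with a small model. This is only an illustrative sketch, not hbal's code: it assumes an instance placement is a (primary, secondary) pair, and the names failover, replace_secondary and apply_moves are hypothetical.

```python
# Hypothetical model: a placement is a (primary, secondary) tuple of node names.

def failover(placement):
    """f: the secondary becomes primary and the primary becomes secondary."""
    primary, secondary = placement
    return (secondary, primary)


def replace_secondary(placement, new_node):
    """r: the secondary moves to new_node; the primary stays where it is."""
    primary, _ = placement
    return (primary, new_node)


def apply_moves(placement, moves, new_node):
    """Apply a move string such as "frf" one step at a time."""
    for move in moves:
        if move == "f":
            placement = failover(placement)
        elif move == "r":
            placement = replace_secondary(placement, new_node)
        else:
            raise ValueError("unknown move: %r" % move)
    return placement


if __name__ == "__main__":
    start = ("node1", "node2")
    for name, moves in [
        ("failover", "f"),
        ("replace secondary", "r"),
        ("replace primary", "frf"),
        ("failover and replace secondary", "fr"),
        ("replace secondary and failover", "rf"),
    ]:
        print(name, apply_moves(start, moves, "node3"))
```

In every case one of the two original nodes is still part of the placement afterwards, which is the property the list relies on.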
We don't do the only remaining possibility of replacing both nodes
(r,f,r,f or the equivalent f,r,f,r) since this move needs an
As said before, the algorithm tries to minimise the cluster score at
each step. Currently this score is computed as a sum of the following
components:
- - coefficient of variance of the percent of free memory
- - coefficient of variance of the percent of reserved memory
- - coefficient of variance of the percent of free disk
- - percentage of nodes failing N+1 check
- - percentage of instances living (either as primary or secondary) on
- offline nodes
+.RS 4
+.TP 3
+\(em
+coefficient of variance of the percent of free memory
+.TP
+\(em
+coefficient of variance of the percent of reserved memory
+.TP
+\(em
+coefficient of variance of the percent of free disk
+.TP
+\(em
+percentage of nodes failing N+1 check
+.TP
+\(em
+percentage of instances living (either as primary or secondary) on
+offline nodes
+.TP
+\(em
+coefficient of variance of the ratio of virtual-to-physical cpus (for
+primary instances of the node)
+.RE
The free memory and free disk values help ensure that all nodes are
somewhat balanced in their resource usage. The reserved memory helps
instances the same size and spread across the nodes equally), all
values would be zero. This doesn't happen too often in practice :)
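As a rough illustration of how such a score could be summed, here is a minimal sketch. It is not the hbal implementation: it assumes the coefficient of variance is the population standard deviation divided by the mean, that percentages are expressed as fractions between 0 and 1, and that each node is a dict with the hypothetical keys p_fmem, p_rmem, p_fdsk, r_cpu and fail_n1.

```python
from statistics import mean, pstdev


def coeff_of_variance(values):
    """Population standard deviation divided by the mean (0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def cluster_score(nodes, offline_instance_fraction=0.0):
    """Sum the components; lower is better, zero means perfectly balanced."""
    # Assumed per-node keys, named after the fields listed in this page.
    metrics = ("p_fmem", "p_rmem", "p_fdsk", "r_cpu")
    score = sum(coeff_of_variance([n[key] for n in nodes]) for key in metrics)
    # Fraction of nodes failing the N+1 check.
    score += sum(1 for n in nodes if n["fail_n1"]) / len(nodes)
    # Fraction of instances living on offline nodes, supplied by the caller.
    score += offline_instance_fraction
    return score


if __name__ == "__main__":
    node = {"p_fmem": 0.5, "p_rmem": 0.1, "p_fdsk": 0.7,
            "r_cpu": 1.0, "fail_n1": False}
    print(cluster_score([dict(node), dict(node)]))
```

With identical nodes every component is zero, matching the ideal case described above; any imbalance between nodes raises the score.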
+.SS OFFLINE INSTANCES
+
+Since current Ganeti versions do not report the memory used by offline
+(down) instances, ignoring the run status of instances will cause
+wrong calculations. For this reason, the algorithm subtracts the
+memory size of down instances from the free node memory of their
+primary node, in effect simulating the startup of such instances.
+
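The adjustment described above amounts to a simple subtraction. The sketch below is an assumption-laden illustration, not hbal's code: the function name and the "memory" and "running" keys are hypothetical, and the instance list is assumed to hold only instances whose primary node is the node in question.

```python
def effective_free_memory(node_free_mem, primary_instances):
    """Free memory as if every down primary instance had been started."""
    down_memory = sum(
        inst["memory"] for inst in primary_instances if not inst["running"]
    )
    return node_free_mem - down_memory


if __name__ == "__main__":
    instances = [
        {"memory": 1024, "running": True},   # already counted by the node
        {"memory": 512, "running": False},   # down: reserve its memory
    ]
    print(effective_free_memory(4096, instances))
```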
.SS OTHER POSSIBLE METRICS
It would be desirable to add more metrics to the algorithm, especially
dynamically-computed metrics, such as:
- - CPU usage of instances, combined with VCPU versus PCPU count
- - Disk IO usage
- - Network IO
+.RS 4
+.TP 3
+\(em
+CPU usage of instances
+.TP
+\(em
+Disk IO usage
+.TP
+\(em
+Network IO
+.RE
.SH OPTIONS
The options that can be passed to the program are as follows:
.B f_dsk
free disk
.TP
+.B pcpu
+the number of physical cpus on the node
+.TP
+.B vcpu
+the number of virtual cpus allocated to primary instances
+.TP
.B pri
number of primary instances
.TP
.TP
.B p_fdsk
percent of free disk
+.TP
+.B r_cpu
+ratio of virtual to physical cpus
.RE
.TP
status.
The line will contain four fields:
- - initial cluster score
- - number of steps in the solution
- - final cluster score
- - improvement in the cluster score
+.RS
+.RS 4
+.TP 3
+\(em
+initial cluster score
+.TP
+\(em
+number of steps in the solution
+.TP
+\(em
+final cluster score
+.TP
+\(em
+improvement in the cluster score
+.RE
+.RE
.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR. This means a couple of things:
.RS
-.TP
--
+.RS 4
+.TP 3
+\(em
instances won't be placed on these nodes, not even temporarily;
e.g. the \fIreplace primary\fR move is not available if the secondary
node is offline, since this move requires a failover.
.TP
--
+\(em
these nodes will not be included in the score calculation (except for
the percentage of instances on offline nodes)
.RE
+Note that hbal will also mark as offline any nodes which are reported
+by RAPI as such, or that have "?" in any numeric field of the
+file-based input.
+.RE
+
+.TP
+.BI "-e" score ", --min-score=" score
+This parameter denotes the minimum score we are happy with and alters
+the computation in two ways:
+.RS
+.RS 4
+.TP 3
+\(em
+if the initial cluster score is lower than this value, then we
+don't enter the algorithm at all, and exit with success
+.TP
+\(em
+during the iterative process, if we reach a score lower than this
+value, we exit the algorithm
+.RE
+The default value of the parameter is currently \fI1e-9\fR (chosen
+empirically).
+.RE
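The two effects of the minimum score can be sketched as follows. This is a simplified illustration under stated assumptions: the real search for the next move is replaced by a precomputed list of scores (step_scores), and the function name balance is hypothetical.

```python
DEFAULT_MIN_SCORE = 1e-9  # default value quoted in this page


def balance(initial_score, step_scores, min_score=DEFAULT_MIN_SCORE):
    """Return the scores actually reached before the algorithm stops.

    step_scores stands in for the scores the real search would produce.
    """
    # First effect: a cluster that is already good enough is left alone.
    if initial_score < min_score:
        return []
    reached = []
    for score in step_scores:
        reached.append(score)
        # Second effect: stop as soon as a step gets below the threshold.
        if score < min_score:
            break
    return reached


if __name__ == "__main__":
    print(balance(0.5, [0.3, 0.05, 0.01], min_score=0.1))
```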
.TP
.BI "-n" nodefile ", --nodes=" nodefile
The name of the file holding node information (if not collecting via
-RAPI), instead of the default
-.I nodes
-file.
+RAPI), instead of the default \fInodes\fR file (but see below how to
+customize the default value via the environment).
.TP
.BI "-i" instancefile ", --instances=" instancefile
The name of the file holding instance information (if not collecting
-via RAPI), instead of the default
-.I instances
-file.
+via RAPI), instead of the default \fIinstances\fR file (but see below
+how to customize the default value via the environment).
.TP
.BI "-m" cluster
Collect data not from files but directly from the
.I cluster
-given as an argument via RAPI. This work for both Ganeti 1.2 and
-Ganeti 2.0.
+given as an argument via RAPI. If the argument doesn't contain a colon
+(:), then it is converted into a full URL by prepending
+https:// and appending the default RAPI port; otherwise it's
+considered a fully-specified URL and is used as-is.
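The conversion rule for the cluster argument can be sketched like this. The function name rapi_url is hypothetical, and the port value 5080 is an assumption about the default RAPI port that this page does not state.

```python
DEFAULT_RAPI_PORT = 5080  # assumption: not given in this page


def rapi_url(cluster):
    """Turn a bare cluster name into a URL; leave anything with a colon alone."""
    if ":" in cluster:
        # Treated as a fully-specified URL and used as-is.
        return cluster
    return "https://%s:%d" % (cluster, DEFAULT_RAPI_PORT)


if __name__ == "__main__":
    print(rapi_url("cluster.example.com"))
    print(rapi_url("https://cluster.example.com:1234"))
```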
.TP
.BI "-l" N ", --max-length=" N
automate the execution of the balancing.
.TP
+.BI "--max-cpu " cpu-ratio
+The maximum virtual-to-physical cpu ratio, as a floating point number
+greater than zero. For example, specifying \fIcpu-ratio\fR as
+\fB2.5\fR means that, for a 4-cpu machine, a maximum of 10 virtual
+cpus should be allowed to be in use for primary instances.
+
+.TP
+.BI "--min-disk " disk-ratio
+The minimum amount of free disk space remaining, as a floating point
+number between zero and one. For example, specifying \fIdisk-ratio\fR
+as \fB0.25\fR means that at least one quarter of disk space should be
+left free on nodes. A value of one doesn't make sense though, as that
+means no disk space can be used on the node.
+
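The arithmetic behind the two limits can be checked with a short sketch. The function names and the idea of testing a single node are assumptions made for illustration; they do not describe how hbal applies the limits internally.

```python
def max_vcpus(physical_cpus, cpu_ratio):
    """Largest number of virtual cpus allowed for primary instances."""
    return physical_cpus * cpu_ratio


def min_free_disk(total_disk, disk_ratio):
    """Smallest amount of disk that must stay free on the node."""
    return total_disk * disk_ratio


def node_within_limits(pcpu, vcpu, total_disk, free_disk, cpu_ratio, disk_ratio):
    """True if the node respects both the cpu and the disk limit."""
    return (vcpu <= max_vcpus(pcpu, cpu_ratio)
            and free_disk >= min_free_disk(total_disk, disk_ratio))


if __name__ == "__main__":
    print(max_vcpus(4, 2.5))          # the 4-cpu example from the text
    print(min_free_disk(1000, 0.25))  # one quarter must stay free
```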
+.TP
.B -v, --verbose
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
-default of zero.
+default of one.
+
+.TP
+.B -q, --quiet
+Decrease the output verbosity. Each usage of this option will decrease
+the verbosity (less than zero doesn't make sense) from the default of
+one.
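How repeated -v and -q flags combine with the default of one can be sketched with a counting parser. This uses Python's argparse purely as an illustration; hbal's own option parsing is not shown in this page, and the function name verbosity is hypothetical.

```python
import argparse

DEFAULT_VERBOSITY = 1  # default stated in this page


def verbosity(argv):
    """Each -v adds one to the default level, each -q subtracts one."""
    parser = argparse.ArgumentParser(prog="hbal-sketch")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="count", default=0)
    args = parser.parse_args(argv)
    return DEFAULT_VERBOSITY + args.verbose - args.quiet


if __name__ == "__main__":
    print(verbosity([]))
    print(verbosity(["-v"]))
    print(verbosity(["-q"]))
```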
.TP
.B -V, --version
The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).
+.SH ENVIRONMENT
+
+If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
+present in the environment, they will override the default names for
+the nodes and instances files. These will of course have no effect
+when RAPI is used.
+
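The lookup of the default file names can be sketched as below. The function name default_files is hypothetical; the fallback names nodes and instances are the defaults mentioned in the OPTIONS section.

```python
import os


def default_files(environ=None):
    """Return the default (nodes, instances) file names.

    environ defaults to the process environment; passing a dict makes the
    function easy to try out without touching real variables.
    """
    env = os.environ if environ is None else environ
    nodes = env.get("HTOOLS_NODES", "nodes")
    instances = env.get("HTOOLS_INSTANCES", "instances")
    return nodes, instances


if __name__ == "__main__":
    print(default_files({}))
    print(default_files({"HTOOLS_NODES": "/tmp/cluster1.nodes"}))
```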
.SH BUGS
The program does not check its input data for consistency, and aborts
.SH SEE ALSO
.BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"
+
+.SH "COPYRIGHT"
+.PP
+Copyright (C) 2009 Google Inc. Permission is granted to copy,
+distribute and/or modify under the terms of the GNU General Public
+License as published by the Free Software Foundation; either version 2
+of the License, or (at your option) any later version.
+.PP
+On Debian systems, the complete text of the GNU General Public License
+can be found in /usr/share/common-licenses/GPL.