X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/d2ac55261a8c8ffb0400f30842509b420c11d6dd..0c936d24dccef20cdca0b5fb96c786731e284173:/hbal.1 diff --git a/hbal.1 b/hbal.1 index 352a8c7..3f0c915 100644 --- a/hbal.1 +++ b/hbal.1 @@ -1,4 +1,4 @@ -.TH HBAL 1 2009-03-22 htools "Ganeti H-tools" +.TH HBAL 1 2009-03-23 htools "Ganeti H-tools" .SH NAME hbal \- Cluster balancer for Ganeti @@ -7,11 +7,15 @@ hbal \- Cluster balancer for Ganeti .B "[-C]" .B "[-p]" .B "[-o]" +.B "[-v... | -q]" .BI "[-l" limit "]" .BI "[-O" name... "]" +.BI "[-e" score "]" .BI "[-m " cluster "]" .BI "[-n " nodes-file " ]" .BI "[-i " instances-file "]" +.BI "[--max-cpu " cpu-ratio "]" +.BI "[--min-disk " disk-ratio "]" .B hbal .B --version @@ -41,11 +45,23 @@ The possible move type for an instance are combinations of failover/migrate and replace-disks such that we change one of the instance nodes, and the other one remains (but possibly with changed role, e.g. from primary it becomes secondary). The list is: - - failover (f) - - replace secondary (r) - - replace primary, a composite move (f, r, f) - - failover and replace secondary, also composite (f, r) - - replace secondary and failover, also composite (r, f) +.RS 4 +.TP 3 +\(em +failover (f) +.TP +\(em +replace secondary (r) +.TP +\(em +replace primary, a composite move (f, r, f) +.TP +\(em +failover and replace secondary, also composite (f, r) +.TP +\(em +replace secondary and failover, also composite (r, f) +.RE We don't do the only remaining possibility of replacing both nodes (r,f,r,f or the equivalent f,r,f,r) since these move needs an @@ -58,12 +74,28 @@ give better scores but will result in more disk replacements. As said before, the algorithm tries to minimise the cluster score at each step. 
Currently this score is computed as a sum of the following components: - - coefficient of variance of the percent of free memory - - coefficient of variance of the percent of reserved memory - - coefficient of variance of the percent of free disk - - percentage of nodes failing N+1 check - - percentage of instances living (either as primary or secondary) on - offline nodes +.RS 4 +.TP 3 +\(em +coefficient of variance of the percent of free memory +.TP +\(em +coefficient of variance of the percent of reserved memory +.TP +\(em +coefficient of variance of the percent of free disk +.TP +\(em +percentage of nodes failing N+1 check +.TP +\(em +percentage of instances living (either as primary or secondary) on +offline nodes +.TP +\(em +coefficient of variance of the ratio of virtual-to-physical cpus (for +primary instances of the node) +.RE The free memory and free disk values help ensure that all nodes are somewhat balanced in their resource usage. The reserved memory helps @@ -96,13 +128,29 @@ On a perfectly balanced cluster (all nodes the same size, all instances the same size and spread across the nodes equally), all values would be zero. This doesn't happen too often in practice :) +.SS OFFLINE INSTANCES + +Since current Ganeti versions do not report the memory used by offline +(down) instances, ignoring the run status of instances will cause +wrong calculations. For this reason, the algorithm subtracts the +memory size of down instances from the free node memory of their +primary node, in effect simulating the startup of such instances. 
+ .SS OTHER POSSIBLE METRICS It would be desirable to add more metrics to the algorithm, especially dynamically-computed metrics, such as: - - CPU usage of instances, combined with VCPU versus PCPU count - - Disk IO usage - - Network IO +.RS 4 +.TP 3 +\(em +CPU usage of instances +.TP +\(em +Disk IO usage +.TP +\(em +Network IO +.RE .SH OPTIONS The options that can be passed to the program are as follows: @@ -152,6 +200,12 @@ total disk .B f_dsk free disk .TP +.B pcpu +the number of physical cpus on the node +.TP +.B vcpu +the number of virtual cpus allocated to primary instances +.TP .B pri number of primary instances .TP @@ -163,6 +217,9 @@ percent of free memory .TP .B p_fdsk percent of free disk +.TP +.B r_cpu +ratio of virtual to physical cpus .RE .TP @@ -172,47 +229,83 @@ when one wants to look at multiple clusters at once and check their status. The line will contain four fields: - - initial cluster score - - number of steps in the solution - - final cluster score - - improvement in the cluster score +.RS +.RS 4 +.TP 3 +\(em +initial cluster score +.TP +\(em +number of steps in the solution +.TP +\(em +final cluster score +.TP +\(em +improvement in the cluster score +.RE +.RE .TP .BI "-O " name This option (which can be given multiple times) will mark nodes as being \fIoffline\fR. This means a couple of things: .RS -.TP -- +.RS 4 +.TP 3 +\(em instances won't be placed on these nodes, not even temporarily; e.g. the \fIreplace primary\fR move is not available if the secondary node is offline, since this move requires a failover. .TP -- +\(em these nodes will not be included in the score calculation (except for the percentage of instances on offline nodes) .RE +Note that hbal will also mark as offline any nodes which are reported +by RAPI as such, or that have "?" in file-based input in any numeric +fields. 
+.RE + +.TP +.BI "-e" score ", --min-score=" score +This parameter denotes the minimum score we are happy with and alters +the computation in two ways: +.RS +.RS 4 +.TP 3 +\(em +if the cluster has the initial score lower than this value, then we +don't enter the algorithm at all, and exit with success +.TP +\(em +during the iterative process, if we reach a score lower than this +value, we exit the algorithm +.RE +The default value of the parameter is currently \fI1e-9\fR (chosen +empirically). +.RE .TP .BI "-n" nodefile ", --nodes=" nodefile The name of the file holding node information (if not collecting via -RAPI), instead of the default -.I nodes -file. +RAPI), instead of the default \fInodes\fR file (but see below how to +customize the default value via the environment). .TP .BI "-i" instancefile ", --instances=" instancefile The name of the file holding instance information (if not collecting -via RAPI), instead of the default -.I instances -file. +via RAPI), instead of the default \fIinstances\fR file (but see below +how to customize the default value via the environment). .TP .BI "-m" cluster Collect data not from files but directly from the .I cluster -given as an argument via RAPI. This work for both Ganeti 1.2 and -Ganeti 2.0. +given as an argument via RAPI. If the argument doesn't contain a colon +(:), then it is converted into a fully-built URL via prepending +https:// and appending the default RAPI port, otherwise it's +considered a fully-specified URL and is used as-is. .TP .BI "-l" N ", --max-length=" N @@ -220,10 +313,31 @@ Restrict the solution to this length. This can be used for example to automate the execution of the balancing. .TP +.BI "--max-cpu " cpu-ratio +The maximum virtual-to-physical cpu ratio, as a floating point number +between zero and one. For example, specifying \fIcpu-ratio\fR as +\fB2.5\fR means that, for a 4-cpu machine, a maximum of 10 virtual +cpus should be allowed to be in use for primary instances. 
+ +.TP +.BI "--min-disk " disk-ratio +The minimum amount of free disk space remaining, as a floating point +number. For example, specifying \fIdisk-ratio\fR as \fB0.25\fR means +that at least one quarter of disk space should be left free on nodes. A +value of one doesn't make sense though, as that means no disk space can +be used on the node. + +.TP .B -v, --verbose Increase the output verbosity. Each usage of this option will increase the verbosity (currently more than 2 doesn't make sense) from the -default of zero. +default of one. + +.TP +.B -q, --quiet +Decrease the output verbosity. Each usage of this option will decrease +the verbosity (less than zero doesn't make sense) from the default of +one. .TP .B -V, --version @@ -234,6 +348,13 @@ Just show the program version and exit. The exist status of the command will be zero, unless for some reason the algorithm fatally failed (e.g. wrong node or instance data). +.SH ENVIRONMENT + +If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are +present in the environment, they will override the default names for +the nodes and instances files. These will of course have no effect +when RAPI is used. + .SH BUGS The program does not check its input data for consistency, and aborts @@ -426,3 +547,13 @@ list (but hopefully will end in the same state). .SH SEE ALSO .BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), " .BR gnt-node "(8)" + +.SH "COPYRIGHT" +.PP +Copyright (C) 2009 Google Inc. Permission is granted to copy, +distribute and/or modify under the terms of the GNU General Public +License as published by the Free Software Foundation; either version 2 +of the License, or (at your option) any later version. +.PP +On Debian systems, the complete text of the GNU General Public License +can be found in /usr/share/common-licenses/GPL.
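
Reviewer note: the score description and the new --min-score option in this patch can be illustrated with a small sketch. The Python below is not hbal's implementation; it only sums the six components the man page lists and applies the documented 1e-9 threshold. The function names, the dictionary keys (borrowed from the column names p_fmem, p_fdsk and r_cpu, plus a hypothetical p_rmem for reserved memory), the use of the population standard deviation, and the choice to express percentages as fractions are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Default threshold documented for -e / --min-score in the patched man page.
DEFAULT_MIN_SCORE = 1e-9


def coeff_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def cluster_score(nodes, n1_failures, instances_on_offline, total_instances):
    """Sum the six score components listed in the man page.

    `nodes` is a list of dicts with hypothetical keys p_fmem, p_rmem,
    p_fdsk (fractions of the node total) and r_cpu (virtual/physical cpus).
    Percentages are expressed as fractions here, which is an assumption.
    """
    components = [
        coeff_of_variation([n["p_fmem"] for n in nodes]),   # free memory
        coeff_of_variation([n["p_rmem"] for n in nodes]),   # reserved memory
        coeff_of_variation([n["p_fdsk"] for n in nodes]),   # free disk
        n1_failures / len(nodes),                           # nodes failing N+1
        (instances_on_offline / total_instances) if total_instances else 0.0,
        coeff_of_variation([n["r_cpu"] for n in nodes]),    # vcpu/pcpu ratio
    ]
    return sum(components)


def needs_balancing(score, min_score=DEFAULT_MIN_SCORE):
    """Per the -e description: skip the algorithm if the score is below min_score."""
    return score >= min_score
```

On a perfectly balanced cluster every component is zero, so `needs_balancing` returns False and the run would exit with success straight away, which matches the first bullet of the --min-score description.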