X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/26f7b0980f9e94b83c7baaad44bc1c2f2a199bf1..2b8463047a71e40e86233b7f35698f478350986f:/man/hbal.rst diff --git a/man/hbal.rst b/man/hbal.rst index a63ab90..49fd9ec 100644 --- a/man/hbal.rst +++ b/man/hbal.rst @@ -1,5 +1,5 @@ -HBAL(1) htools | Ganeti H-tools -=============================== +HBAL(1) Ganeti | Version @GANETI_VERSION@ +========================================= NAME ---- @@ -27,8 +27,10 @@ Algorithm options: **[ -g *delta* ]** **[ --min-gain-limit *threshold* ]** **[ -O *name...* ]** **[ --no-disk-moves ]** +**[ --no-instance-moves ]** **[ -U *util-file* ]** **[ --evac-mode ]** +**[ --select-instances *inst...* ]** **[ --exclude-instances *inst...* ]** Reporting options: @@ -215,82 +217,9 @@ The options that can be passed to the program are as follows: new jobset. -p, --print-nodes - Prints the before and after node status, in a format designed to - allow the user to understand the node's most important parameters. - - It is possible to customise the listed information by passing a - comma-separated list of field names to this option (the field list - is currently undocumented), or to extend the default field list by - prefixing the additional field list with a plus sign. 
By default, - the node list will contain the following information: - - F - a character denoting the status of the node, with '-' meaning an - offline node, '*' meaning N+1 failure and blank meaning a good - node - - Name - the node name - - t_mem - the total node memory - - n_mem - the memory used by the node itself - - i_mem - the memory used by instances - - x_mem - amount memory which seems to be in use but cannot be determined - why or by which instance; usually this means that the hypervisor - has some overhead or that there are other reporting errors - - f_mem - the free node memory - - r_mem - the reserved node memory, which is the amount of free memory - needed for N+1 compliance - - t_dsk - total disk - - f_dsk - free disk - - pcpu - the number of physical cpus on the node - - vcpu - the number of virtual cpus allocated to primary instances - - pcnt - number of primary instances - - scnt - number of secondary instances - - p_fmem - percent of free memory - - p_fdsk - percent of free disk - - r_cpu - ratio of virtual to physical cpus - - lCpu - the dynamic CPU load (if the information is available) - - lMem - the dynamic memory load (if the information is available) - - lDsk - the dynamic disk load (if the information is available) - - lNet - the dynamic net load (if the information is available) + Prints the before and after node status, in a format designed to allow + the user to understand the node's most important parameters. See the + man page **htools**(1) for more details about this option. --print-instances Prints the before and after instance map. This is less useful as the @@ -355,12 +284,23 @@ The options that can be passed to the program are as follows: a much quicker balancing, but of course the improvements are limited. It is up to the user to decide when to use one or another. +--no-instance-moves + This parameter prevents hbal from using instance moves + (i.e. "gnt-instance migrate/failover") operations. 
This will only use + the slow disk-replacement operations, and will also provide a worse + balance, but can be useful if moving instances around is deemed unsafe + or not preferred. + --evac-mode This parameter restricts the list of instances considered for moving to the ones living on offline/drained nodes. It can be used as a (bulk) replacement for Ganeti's own *gnt-node evacuate*, with the note that it doesn't guarantee full evacuation. +--select-instances=*instances* + This parameter marks the given instances (as a comma-separated list) + as the only ones being moved during the rebalance. + --exclude-instances=*instances* This parameter marks the given instances (as a comma-separated list) from being moved during the rebalance. @@ -422,17 +362,22 @@ The options that can be passed to the program are as follows: jobset will be executed in parallel. The jobsets themselves are executed serially. + The execution of the job series can be interrupted, see below for + signal handling. + -l *N*, --max-length=*N* Restrict the solution to this length. This can be used for example to automate the execution of the balancing. --max-cpu=*cpu-ratio* - The maximum virtual to physical cpu ratio, as a floating point - number between zero and one. For example, specifying *cpu-ratio* as - **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual - cpus should be allowed to be in use for primary instances. A value - of one doesn't make sense though, as that means no disk space can be - used on it. + The maximum virtual to physical cpu ratio, as a floating point number + greater than or equal to one. For example, specifying *cpu-ratio* as + **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus + should be allowed to be in use for primary instances. A value of + exactly one means there will be no over-subscription of CPU (except + for the CPU time used by the node itself), and values below one do not + make sense, as that means other resources (e.g. 
disk) won't be fully + utilised due to CPU restrictions. --min-disk=*disk-ratio* The minimum amount of free disk space remaining, as a floating point @@ -457,25 +402,45 @@ The options that can be passed to the program are as follows: -V, --version Just show the program version and exit. +SIGNAL HANDLING +--------------- + +When executing jobs via LUXI (using the ``-X`` option), normally hbal +will execute all jobs until either one errors out or all the jobs finish +successfully. + +Since balancing can take a long time, it is possible to stop hbal early +in two ways: + +- by sending a ``SIGINT`` (``^C``), hbal will register the termination + request, and will wait until the currently submitted jobs finish, at + which point it will exit (with exit code 1) +- by sending a ``SIGTERM``, hbal will immediately exit (with exit code + 2); it is the responsibility of the user to follow up with Ganeti the + result of the currently-executing jobs + +Note that in any situation, it's perfectly safe to kill hbal, either via +the above signals or via any other signal (e.g. ``SIGQUIT``, +``SIGKILL``), since the jobs themselves are processed by Ganeti whereas +hbal (after submission) only watches their progression. In this case, +the user will again have to query Ganeti for job results. + EXIT STATUS ----------- -The exit status of the command will be zero, unless for some reason -the algorithm fatally failed (e.g. wrong node or instance data), or -(in case of job execution) any job has failed. +The exit status of the command will be zero, unless for some reason the +algorithm fatally failed (e.g. wrong node or instance data), or (in case +of job execution) either one of the jobs has failed or the balancing was +interrupted early. BUGS ---- -The program does not check its input data for consistency, and aborts -with cryptic errors messages in this case. +The program does not check all its input data for consistency, and +sometimes aborts with cryptic error messages on invalid data.
The algorithm is not perfect. -The output format is not easily scriptable, and the program should -feed moves directly into Ganeti (either via RAPI or via a gnt-debug -input file). - EXAMPLE ------- @@ -648,19 +613,8 @@ done. Otherwise, if only the migrate is done, the input data is changed in a way that the program will output a different solution list (but hopefully will end in the same state). -SEE ALSO --------- - -**hspace**(1), **hscan**(1), **hail**(1), **ganeti**(7), -**gnt-instance**(8), **gnt-node**(8) - -COPYRIGHT ---------- - -Copyright (C) 2009, 2010, 2011 Google Inc. Permission is granted to -copy, distribute and/or modify under the terms of the GNU General -Public License as published by the Free Software Foundation; either -version 2 of the License, or (at your option) any later version. - -On Debian systems, the complete text of the GNU General Public License -can be found in /usr/share/common-licenses/GPL. +.. vim: set textwidth=72 : +.. Local Variables: +.. mode: rst +.. fill-column: 72 +.. End:
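
The ``SIGINT`` behaviour documented in the new SIGNAL HANDLING section (finish the jobset already submitted, then exit with code 1) can be modelled in a few lines. This is only an illustrative Python sketch, not hbal's implementation (hbal is written in Haskell); the names `run_jobsets` and `execute` are hypothetical, and returning 1 when the interrupt arrives during the final jobset is an assumption, since the man page text does not spell out that case.

```python
import signal

EXIT_OK = 0
EXIT_INTERRUPTED = 1  # exit code the man page documents for SIGINT


def run_jobsets(jobsets, execute):
    """Run jobsets serially; a SIGINT only takes effect between jobsets.

    `execute` is a hypothetical callable that submits one jobset and
    blocks until it finishes. Returns (exit_code, completed_jobsets).
    """
    stop_requested = False

    def on_sigint(signum, frame):
        # Only record the request; the jobset in flight is not aborted.
        nonlocal stop_requested
        stop_requested = True

    previous = signal.signal(signal.SIGINT, on_sigint)
    completed = []
    try:
        for jobset in jobsets:
            execute(jobset)
            completed.append(jobset)
            if stop_requested:
                # Assumption: report "interrupted" even if this was the
                # last jobset in the series.
                return EXIT_INTERRUPTED, completed
        return EXIT_OK, completed
    finally:
        # Restore the earlier handler (None means it was not set from Python).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

``SIGTERM`` is deliberately left at its default action in this sketch, which matches the documented "exit immediately" semantics only loosely: the exit code 2 and the need to query Ganeti afterwards for job results come from the man page text, not from this model.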