-HBAL(1) htools | Ganeti H-tools
-===============================
+HBAL(1) Ganeti | Version @GANETI_VERSION@
+=========================================
NAME
----
**[ -g *delta* ]** **[ --min-gain-limit *threshold* ]**
**[ -O *name...* ]**
**[ --no-disk-moves ]**
+**[ --no-instance-moves ]**
**[ -U *util-file* ]**
**[ --evac-mode ]**
+**[ --select-instances *inst...* ]**
**[ --exclude-instances *inst...* ]**
Reporting options:
new jobset.
-p, --print-nodes
- Prints the before and after node status, in a format designed to
- allow the user to understand the node's most important parameters.
-
- It is possible to customise the listed information by passing a
- comma-separated list of field names to this option (the field list
- is currently undocumented), or to extend the default field list by
- prefixing the additional field list with a plus sign. By default,
- the node list will contain the following information:
-
- F
- a character denoting the status of the node, with '-' meaning an
- offline node, '*' meaning N+1 failure and blank meaning a good
- node
-
- Name
- the node name
-
- t_mem
- the total node memory
-
- n_mem
- the memory used by the node itself
-
- i_mem
- the memory used by instances
-
- x_mem
- amount memory which seems to be in use but cannot be determined
- why or by which instance; usually this means that the hypervisor
- has some overhead or that there are other reporting errors
-
- f_mem
- the free node memory
-
- r_mem
- the reserved node memory, which is the amount of free memory
- needed for N+1 compliance
-
- t_dsk
- total disk
-
- f_dsk
- free disk
-
- pcpu
- the number of physical cpus on the node
-
- vcpu
- the number of virtual cpus allocated to primary instances
-
- pcnt
- number of primary instances
-
- scnt
- number of secondary instances
-
- p_fmem
- percent of free memory
-
- p_fdsk
- percent of free disk
-
- r_cpu
- ratio of virtual to physical cpus
-
- lCpu
- the dynamic CPU load (if the information is available)
-
- lMem
- the dynamic memory load (if the information is available)
-
- lDsk
- the dynamic disk load (if the information is available)
-
- lNet
- the dynamic net load (if the information is available)
+ Prints the before and after node status, in a format designed to allow
+ the user to understand the node's most important parameters. See the
+ man page **htools**(1) for more details about this option.
--print-instances
Prints the before and after instance map. This is less useful as the
a much quicker balancing, but of course the improvements are
limited. It is up to the user to decide when to use one or another.
+--no-instance-moves
+ This parameter prevents hbal from using instance move operations
+ (i.e. "gnt-instance migrate/failover"). hbal will then only use the
+ slow disk-replacement operations and will provide a worse balance,
+ but this can be useful if moving instances around is deemed unsafe
+ or not preferred.
+
--evac-mode
This parameter restricts the list of instances considered for moving
to the ones living on offline/drained nodes. It can be used as a
(bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
note that it doesn't guarantee full evacuation.
+--select-instances=*instances*
+ This parameter marks the given instances (as a comma-separated list)
+ as the only ones being moved during the rebalance.
+
--exclude-instances=*instances*
This parameter marks the given instances (as a comma-separated list)
from being moved during the rebalance.
jobset will be executed in parallel. The jobsets themselves are
executed serially.
+ The execution of the job series can be interrupted; see the SIGNAL
+ HANDLING section below.
+
-l *N*, --max-length=*N*
Restrict the solution to this length. This can be used for example
to automate the execution of the balancing.
--max-cpu=*cpu-ratio*
- The maximum virtual to physical cpu ratio, as a floating point
- number between zero and one. For example, specifying *cpu-ratio* as
- **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual
- cpus should be allowed to be in use for primary instances. A value
- of one doesn't make sense though, as that means no disk space can be
- used on it.
+ The maximum virtual to physical cpu ratio, as a floating point number
+ greater than or equal to one. For example, specifying *cpu-ratio* as
+ **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
+ should be allowed to be in use for primary instances. A value of
+ exactly one means there will be no over-subscription of CPU (except
+ for the CPU time used by the node itself), and values below one do not
+ make sense, as that means other resources (e.g. disk) won't be fully
+ utilised due to CPU restrictions.
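The *cpu-ratio* arithmetic above can be illustrated with a minimal sketch. The helper name `max_vcpus` is hypothetical and not part of hbal, and rounding down for fractional results is an assumption made here, not documented behaviour.

```python
import math


def max_vcpus(physical_cpus: int, cpu_ratio: float) -> int:
    """Upper bound on virtual CPUs for primary instances on one node.

    Hypothetical helper: multiplies the physical CPU count by the
    --max-cpu ratio. Rounding down is an assumption of this sketch.
    """
    return math.floor(physical_cpus * cpu_ratio)


# The example from the option description: a 4-cpu machine at ratio 2.5.
print(max_vcpus(4, 2.5))
# A ratio of exactly one means no over-subscription of CPU.
print(max_vcpus(4, 1.0))
```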
--min-disk=*disk-ratio*
The minimum amount of free disk space remaining, as a floating point
-V, --version
Just show the program version and exit.
+SIGNAL HANDLING
+---------------
+
+When executing jobs via LUXI (using the ``-X`` option), normally hbal
+will execute all jobs until either one errors out or all the jobs finish
+successfully.
+
+Since balancing can take a long time, it is possible to stop hbal early
+in two ways:
+
+- by sending a ``SIGINT`` (``^C``); hbal will register the termination
+ request and wait until the currently submitted jobs finish, at
+ which point it will exit (with exit code 1)
+- by sending a ``SIGTERM``; hbal will exit immediately (with exit code
+ 2), and it is the responsibility of the user to follow up with Ganeti
+ on the result of the currently-executing jobs
+
+Note that in any situation, it's perfectly safe to kill hbal, either via
+the above signals or via any other signal (e.g. ``SIGQUIT``,
+``SIGKILL``), since the jobs themselves are processed by Ganeti whereas
+hbal (after submission) only watches their progression. In this case,
+the user will again have to query Ganeti for job results.
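The two interruption modes described above can be sketched roughly as follows. This is an illustration in Python, not hbal's actual implementation: the names `JobRunner`, `request_stop`, `terminate_now` and `install_handlers` are hypothetical, job failures are not modelled, and checking the stop flag before each new jobset (rather than after the last one) is an assumption.

```python
import signal
import sys


class JobRunner:
    """Hypothetical model of serial jobset execution with a graceful stop."""

    def __init__(self):
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # SIGINT-style request: only record it; the running jobset continues.
        self.stop_requested = True

    def run(self, jobsets, execute):
        """Run jobsets serially; `execute` blocks until a jobset finishes."""
        for jobset in jobsets:
            if self.stop_requested:
                return 1  # interrupted early, as for SIGINT
            execute(jobset)
        return 0


def terminate_now(signum, frame):
    # SIGTERM-style request: exit at once; submitted jobs keep running
    # in the job queue and must be checked separately.
    sys.exit(2)


def install_handlers(runner):
    """Wire the two signals to the two behaviours (main thread only)."""
    signal.signal(signal.SIGINT, runner.request_stop)
    signal.signal(signal.SIGTERM, terminate_now)
```

In this sketch a stop request arriving during a jobset lets that jobset complete and prevents any later jobset from being started, which mirrors the "wait until the currently submitted jobs finish" behaviour.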
+
EXIT STATUS
-----------
-The exit status of the command will be zero, unless for some reason
-the algorithm fatally failed (e.g. wrong node or instance data), or
-(in case of job execution) any job has failed.
+The exit status of the command will be zero, unless for some reason the
+algorithm fatally failed (e.g. wrong node or instance data), or (in case
+of job execution) either one of the jobs has failed or the balancing was
+interrupted early.
BUGS
----
-The program does not check its input data for consistency, and aborts
-with cryptic errors messages in this case.
+The program does not check all its input data for consistency, and
+sometimes aborts with cryptic error messages when given invalid data.
The algorithm is not perfect.
-The output format is not easily scriptable, and the program should
-feed moves directly into Ganeti (either via RAPI or via a gnt-debug
-input file).
-
EXAMPLE
-------
changed in a way that the program will output a different solution
list (but hopefully will end in the same state).
-SEE ALSO
---------
-
-**hspace**(1), **hscan**(1), **hail**(1), **ganeti**(7),
-**gnt-instance**(8), **gnt-node**(8)
-
-COPYRIGHT
----------
-
-Copyright (C) 2009, 2010, 2011 Google Inc. Permission is granted to
-copy, distribute and/or modify under the terms of the GNU General
-Public License as published by the Free Software Foundation; either
-version 2 of the License, or (at your option) any later version.
-
-On Debian systems, the complete text of the GNU General Public License
-can be found in /usr/share/common-licenses/GPL.
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End: