**hbal** {backend options...} [algorithm options...] [reporting options...]
-**hbal** --version
+**hbal** \--version
Backend options:
-{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* }
+{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
+**-I** *path* }
Algorithm options:
-**[ --max-cpu *cpu-ratio* ]**
-**[ --min-disk *disk-ratio* ]**
+**[ \--max-cpu *cpu-ratio* ]**
+**[ \--min-disk *disk-ratio* ]**
**[ -l *limit* ]**
**[ -e *score* ]**
-**[ -g *delta* ]** **[ --min-gain-limit *threshold* ]**
+**[ -g *delta* ]** **[ \--min-gain-limit *threshold* ]**
**[ -O *name...* ]**
-**[ --no-disk-moves ]**
-**[ --no-instance-moves ]**
+**[ \--no-disk-moves ]**
+**[ \--no-instance-moves ]**
**[ -U *util-file* ]**
-**[ --evac-mode ]**
-**[ --select-instances *inst...* ]**
-**[ --exclude-instances *inst...* ]**
+**[ \--evac-mode ]**
+**[ \--select-instances *inst...* ]**
+**[ \--exclude-instances *inst...* ]**
Reporting options:
**[ -C[ *file* ] ]**
**[ -p[ *fields* ] ]**
-**[ --print-instances ]**
-**[ -o ]**
+**[ \--print-instances ]**
+**[ -S *file* ]**
**[ -v... | -q ]**
The algorithm used is designed to be stable (i.e. it will give you the
same results when restarting it from the middle of the solution) and
-reasonably fast. It is not, however, designed to be a perfect
-algorithm--it is possible to make it go into a corner from which
-it can find no improvement, because it looks only one "step" ahead.
+reasonably fast. It is not, however, designed to be a perfect algorithm:
+it is possible to make it go into a corner from which it can find no
+improvement, because it looks only one "step" ahead.
By default, the program will show the solution incrementally as it is
computed, in a somewhat cryptic format; for getting the actual Ganeti
- an instance to move onto an offline node (offline nodes are either
read from the cluster or declared with *-O*)
- an exclusion-tag based conflict (exclusion tags are read from the
- cluster and/or defined via the *--exclusion-tags* option)
-- a max vcpu/pcpu ratio to be exceeded (configured via *--max-cpu*)
+ cluster and/or defined via the *\--exclusion-tags* option)
+- a max vcpu/pcpu ratio to be exceeded (configured via *\--max-cpu*)
- min disk free percentage to go below the configured limit
- (configured via *--min-disk*)
+ (configured via *\--min-disk*)
CLUSTER SCORING
~~~~~~~~~~~~~~~
It works by tagging instances with certain tags and then building
exclusion maps based on these. Which tags are actually used is
-configured either via the command line (option *--exclusion-tags*)
+configured either via the command line (option *\--exclusion-tags*)
or via adding them to the cluster tags:
---exclusion-tags=a,b
+\--exclusion-tags=a,b
This will make all instance tags of the form *a:\**, *b:\** be
considered for the exclusion map
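
As a rough illustration of the prefix matching described above, the following Python sketch selects the instance tags that would be considered for the exclusion map. The function name `exclusion_tags_for` and the sample tags are assumptions made for this example; they are not part of hbal.

```python
def exclusion_tags_for(instance_tags, prefixes):
    """Return the instance tags of the form '<prefix>:*' for any given prefix."""
    # str.startswith accepts a tuple and matches if any element matches.
    wanted = tuple(prefix + ":" for prefix in prefixes)
    return [tag for tag in instance_tags if tag.startswith(wanted)]


# Hypothetical instance tags; with --exclusion-tags=a,b only a:* and b:* count.
selected = exclusion_tags_for(["a:web", "b:db", "c:mail", "ab:x"], ["a", "b"])
print(selected)
```

Note that the colon is part of the match, so a tag such as `ab:x` is not selected by the prefix `a`.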
The options that can be passed to the program are as follows:
--C, --print-commands
+-C, \--print-commands
Print the command list at the end of the run. Without this, the
program will only show a shorter, but cryptic output.
parallel (due to resource allocation in Ganeti) and thus we start a
new jobset.
--p, --print-nodes
+-p, \--print-nodes
Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters. See the
- man page **htools**(1) for more details about this option.
+ man page **htools**\(1) for more details about this option.
---print-instances
+\--print-instances
Prints the before and after instance map. This is less useful than
the node status, but it can help in understanding instance moves.
reported by RAPI as such, or that have "?" in file-based input in
any numeric fields.
--e *score*, --min-score=*score*
+-e *score*, \--min-score=*score*
This parameter denotes the minimum score we are happy with and alters
the computation in two ways:
The default value of the parameter is currently ``1e-9`` (chosen
empirically).
--g *delta*, --min-gain=*delta*
+-g *delta*, \--min-gain=*delta*
Since the balancing algorithm can sometimes result in just very tiny
improvements, that bring less gain than they cost in relocation
time, this parameter (defaulting to 0.01) represents the minimum
gain we require during a step, to continue balancing.
---min-gain-limit=*threshold*
+\--min-gain-limit=*threshold*
The above min-gain option will only take effect if the cluster score
is already below *threshold* (defaults to 0.1). The rationale behind
this setting is that at high cluster scores (badly balanced
threshold, the total gain is only the threshold value, so we can
exit early.
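
The interaction between the two options can be sketched as a small stopping rule. This is only a reading of the text above, not hbal's actual code: the function name `keep_balancing` is hypothetical, and it is an assumption that the score before the step is the one compared against *threshold* and that the comparison is strict.

```python
def keep_balancing(score_before, score_after, min_gain=0.01, min_gain_limit=0.1):
    """Decide whether another balancing step is worthwhile (sketch only)."""
    gain = score_before - score_after
    if gain <= 0:
        # The step did not improve the cluster score at all.
        return False
    if score_before >= min_gain_limit:
        # Score is still high: the min-gain check is not applied yet.
        return True
    # Score is already below the limit: require at least min_gain per step.
    return gain >= min_gain


print(keep_balancing(5.0, 4.999))   # high score, tiny gain still accepted
print(keep_balancing(0.05, 0.049))  # low score, gain below min_gain
```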
---no-disk-moves
+\--no-disk-moves
This parameter prevents hbal from using disk move
(i.e. "gnt-instance replace-disks") operations. This will result in
a much quicker balancing, but of course the improvements are
limited. It is up to the user to decide when to use one or another.
---no-instance-moves
+\--no-instance-moves
This parameter prevents hbal from using instance moves
(i.e. "gnt-instance migrate/failover") operations. This will only use
the slow disk-replacement operations, and will also provide a worse
balance, but can be useful if moving instances around is deemed unsafe
or not preferred.
---evac-mode
+\--evac-mode
This parameter restricts the list of instances considered for moving
to the ones living on offline/drained nodes. It can be used as a
(bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
note that it doesn't guarantee full evacuation.
---select-instances=*instances*
+\--select-instances=*instances*
This parameter marks the given instances (as a comma-separated list)
as the only ones being moved during the rebalance.
---exclude-instances=*instances*
+\--exclude-instances=*instances*
This parameter excludes the given instances (as a comma-separated
list) from being moved during the rebalance.
metrics and thus the influence of the dynamic utilisation will be
practically insignificant.
--S *filename*, --save-cluster=*filename*
+-S *filename*, \--save-cluster=*filename*
If given, the state of the cluster before the balancing is saved to
the given file plus the extension "original"
(i.e. *filename*.original), and the state at the end of the
(i.e. *filename*.balanced). This allows re-feeding the cluster state
to either hbal itself or for example hspace via the ``-t`` option.
--t *datafile*, --text-data=*datafile*
+-t *datafile*, \--text-data=*datafile*
Backend specification: the name of the file holding node and instance
information (if not collecting via RAPI or LUXI). This or one of the
other backends must be selected. The option is described in the man
- page **htools**(1).
+ page **htools**\(1).
-m *cluster*
Backend specification: collect data directly from the *cluster* given
as an argument via RAPI. The option is described in the man page
- **htools**(1).
+ **htools**\(1).
-L [*path*]
Backend specification: collect data directly from the master daemon,
which is to be contacted via LUXI (an internal Ganeti protocol). The
- option is described in the man page **htools**(1).
+ option is described in the man page **htools**\(1).
-X
When using the Luxi backend, hbal can also execute the given
The execution of the job series can be interrupted, see below for
signal handling.
--l *N*, --max-length=*N*
+-l *N*, \--max-length=*N*
Restrict the solution to this length. This can be used for example
to automate the execution of the balancing.
---max-cpu=*cpu-ratio*
+\--max-cpu=*cpu-ratio*
The maximum virtual to physical cpu ratio, as a floating point number
greater than or equal to one. For example, specifying *cpu-ratio* as
**2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
make sense, as that means other resources (e.g. disk) won't be fully
utilised due to CPU restrictions.
---min-disk=*disk-ratio*
+\--min-disk=*disk-ratio*
The minimum amount of free disk space remaining, as a floating point
number. For example, specifying *disk-ratio* as **0.25** means that
at least one quarter of disk space should be left free on nodes.
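
The arithmetic behind these two limits can be checked with a short Python sketch. The helper names and the disk size used below are illustrative assumptions; only the ratios and the 4-cpu example come from the text above.

```python
def max_virtual_cpus(physical_cpus, cpu_ratio):
    """Upper bound on virtual cpus for a node under a given --max-cpu ratio."""
    return physical_cpus * cpu_ratio


def min_free_disk(total_disk, disk_ratio):
    """Disk space that must stay free under a given --min-disk ratio."""
    return total_disk * disk_ratio


# A 4-cpu machine with a ratio of 2.5 allows at most 10 virtual cpus.
print(max_virtual_cpus(4, 2.5))
# A hypothetical node with 1000 units of disk and a ratio of 0.25
# must keep 250 units free.
print(min_free_disk(1000, 0.25))
```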
--G *uuid*, --group=*uuid*
+-G *uuid*, \--group=*uuid*
On a multi-group cluster, select this group for
processing. Otherwise hbal will abort, since it cannot balance
multiple groups at the same time.
--v, --verbose
+-v, \--verbose
Increase the output verbosity. Each usage of this option will
increase the verbosity (currently more than 2 doesn't make sense)
from the default of one.
--q, --quiet
+-q, \--quiet
Decrease the output verbosity. Each usage of this option will
decrease the verbosity (less than zero doesn't make sense) from the
default of one.
--V, --version
+-V, \--version
Just show the program version and exit.
SIGNAL HANDLING
- by sending a ``SIGINT`` (``^C``), hbal will register the termination
request, and will wait until the currently submitted jobs finish, at
- which point it will exit (with exit code 1)
+ which point it will exit (with exit code 0 if all jobs finished
+ correctly, otherwise with exit code 1 as usual)
+
- by sending a ``SIGTERM``, hbal will immediately exit (with exit code
- 2); it is the responsibility of the user to follow up with Ganeti the
- result of the currently-executing jobs
+ 2\); it is the responsibility of the user to follow up with Ganeti
+ and check the result of the currently-executing jobs
Note that in any situation, it's perfectly safe to kill hbal, either via
the above signals or via any other signal (e.g. ``SIGQUIT``,
``SIGKILL``), since the jobs themselves are processed by Ganeti whereas
hbal (after submission) only watches their progression. In this case,
-the use will again have to query Ganeti for job results.
+the user will have to query Ganeti for job results.
EXIT STATUS
-----------
The exit status of the command will be zero, unless for some reason the
-algorithm fatally failed (e.g. wrong node or instance data), or (in case
-of job execution) either one of the jobs has failed or the balancing was
-interrupted early.
+algorithm failed (e.g. wrong node or instance data), invalid command
+line options were given, or (in case of job execution) one of the jobs
+has failed.
+
+Once job execution via Luxi has started (``-X``), if the balancing was
+interrupted early (via ``SIGINT``, or via ``--max-length``) but all jobs
+executed successfully, then the exit status is zero. A non-zero exit
+code means that the cluster state should be investigated: either a job
+failed or its status could not be computed, which can also point to a
+problem on the Ganeti side.
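
For scripts that wrap hbal, the exit codes described here and under SIGNAL HANDLING could be interpreted roughly as in the Python sketch below. The function names and the returned labels are assumptions made for illustration, and treating code 2 as "terminated" relies only on the ``SIGTERM`` description above; other causes of that code are not ruled out by this text.

```python
import subprocess


def describe_hbal_exit(code):
    """Map an hbal exit code to a short, hypothetical label."""
    if code == 0:
        # Finished, or interrupted early with all submitted jobs successful.
        return "ok"
    if code == 2:
        # Documented for SIGTERM: job results must be checked in Ganeti.
        return "terminated"
    # Any other non-zero code: a job failed or its status is unknown.
    return "investigate"


def run_hbal(args):
    """Run hbal with the given argument list and describe its exit code."""
    # Requires hbal on PATH; not executed as part of this sketch.
    result = subprocess.run(["hbal", *args])
    return describe_hbal_exit(result.returncode)


print(describe_hbal_exit(0))
print(describe_hbal_exit(1))
```

A caller would use it as, for example, `run_hbal(["-L", "-X"])` and then query Ganeti for job results whenever the label is not "ok".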
BUGS
----