-HBAL(1) htools | Ganeti H-tools
-===============================
+HBAL(1) Ganeti | Version @GANETI_VERSION@
+=========================================
NAME
----
**[ -g *delta* ]** **[ --min-gain-limit *threshold* ]**
**[ -O *name...* ]**
**[ --no-disk-moves ]**
+**[ --no-instance-moves ]**
**[ -U *util-file* ]**
**[ --evac-mode ]**
+**[ --select-instances *inst...* ]**
**[ --exclude-instances *inst...* ]**
Reporting options:
new jobset.
-p, --print-nodes
- Prints the before and after node status, in a format designed to
- allow the user to understand the node's most important parameters.
-
- It is possible to customise the listed information by passing a
- comma-separated list of field names to this option (the field list
- is currently undocumented), or to extend the default field list by
- prefixing the additional field list with a plus sign. By default,
- the node list will contain the following information:
-
- F
- a character denoting the status of the node, with '-' meaning an
- offline node, '*' meaning N+1 failure and blank meaning a good
- node
-
- Name
- the node name
-
- t_mem
- the total node memory
-
- n_mem
- the memory used by the node itself
-
- i_mem
- the memory used by instances
-
- x_mem
- amount memory which seems to be in use but cannot be determined
- why or by which instance; usually this means that the hypervisor
- has some overhead or that there are other reporting errors
-
- f_mem
- the free node memory
-
- r_mem
- the reserved node memory, which is the amount of free memory
- needed for N+1 compliance
-
- t_dsk
- total disk
-
- f_dsk
- free disk
-
- pcpu
- the number of physical cpus on the node
-
- vcpu
- the number of virtual cpus allocated to primary instances
-
- pcnt
- number of primary instances
-
- scnt
- number of secondary instances
-
- p_fmem
- percent of free memory
-
- p_fdsk
- percent of free disk
-
- r_cpu
- ratio of virtual to physical cpus
-
- lCpu
- the dynamic CPU load (if the information is available)
-
- lMem
- the dynamic memory load (if the information is available)
-
- lDsk
- the dynamic disk load (if the information is available)
-
- lNet
- the dynamic net load (if the information is available)
+ Prints the before and after node status, in a format designed to allow
+ the user to understand the node's most important parameters. See the
+ man page **htools**(1) for more details about this option.
--print-instances
Prints the before and after instance map. This is less useful as the
node status, but it can help in understanding instance moves.
--o, --oneline
- Only shows a one-line output from the program, designed for the case
- when one wants to look at multiple clusters at once and check their
- status.
-
- The line will contain four fields:
-
- - initial cluster score
- - number of steps in the solution
- - final cluster score
- - improvement in the cluster score
-
-O *name*
This option (which can be given multiple times) will mark nodes as
being *offline*. This means a couple of things:
a much quicker balancing, but of course the improvements are
limited. It is up to the user to decide when to use one or another.
+--no-instance-moves
+ This parameter prevents hbal from using instance move operations
+ (i.e. "gnt-instance migrate/failover"). hbal will then use only the
+ slow disk-replacement operations and will provide a worse balance,
+ but this can be useful if moving instances around is deemed unsafe
+ or not preferred.
+
--evac-mode
This parameter restricts the list of instances considered for moving
to the ones living on offline/drained nodes. It can be used as a
(bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
note that it doesn't guarantee full evacuation.
+--select-instances=*instances*
+ This parameter marks the given instances (as a comma-separated list)
+ as the only ones being moved during the rebalance.
+
--exclude-instances=*instances*
This parameter marks the given instances (as a comma-separated list)
from being moved during the rebalance.
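
As a rough illustration of how the two instance filters described above could combine, the following Python sketch computes the set of movable instances. The function names are hypothetical and not part of hbal, and applying the exclusion after the selection is an assumption; the text above does not say how hbal treats the two options when both are given.

```python
def parse_instance_list(value):
    """Split a comma-separated option value into a set of instance names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def movable_instances(all_instances, select=None, exclude=None):
    """Return the instances that may be moved, preserving input order.

    Assumption: exclusions are applied after the selection.
    """
    selected = parse_instance_list(select) if select else None
    excluded = parse_instance_list(exclude) if exclude else set()
    return [
        name for name in all_instances
        if (selected is None or name in selected) and name not in excluded
    ]


instances = ["inst1", "inst2", "inst3"]
print(movable_instances(instances, select="inst1,inst2"))  # ['inst1', 'inst2']
print(movable_instances(instances, exclude="inst2"))       # ['inst1', 'inst3']
```
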
metrics and thus the influence of the dynamic utilisation will be
practically insignificant.
--t *datafile*, --text-data=*datafile*
- The name of the file holding node and instance information (if not
- collecting via RAPI or LUXI). This or one of the other backends must
- be selected.
-
-S *filename*, --save-cluster=*filename*
If given, the state of the cluster before the balancing is saved to
the given file plus the extension "original"
(i.e. *filename*.original), and the state at the end of the
balancing is saved to the given file plus the extension "balanced"
(i.e. *filename*.balanced). This allows re-feeding the cluster state
- to either hbal itself or for example hspace.
+ to either hbal itself or for example hspace via the ``-t`` option.
+
+-t *datafile*, --text-data=*datafile*
+ Backend specification: the name of the file holding node and instance
+ information (if not collecting via RAPI or LUXI). This or one of the
+ other backends must be selected. The option is described in the man
+ page **htools**(1).
-m *cluster*
- Collect data directly from the *cluster* given as an argument via
- RAPI. If the argument doesn't contain a colon (:), then it is
- converted into a fully-built URL via prepending ``https://`` and
- appending the default RAPI port, otherwise it's considered a
- fully-specified URL and is used as-is.
+ Backend specification: collect data directly from the *cluster* given
+ as an argument via RAPI. The option is described in the man page
+ **htools**(1).
-L [*path*]
- Collect data directly from the master daemon, which is to be
- contacted via the luxi (an internal Ganeti protocol). An optional
- *path* argument is interpreted as the path to the unix socket on
- which the master daemon listens; otherwise, the default path used by
- ganeti when installed with *--localstatedir=/var* is used.
+ Backend specification: collect data directly from the master daemon,
+ which is to be contacted via LUXI (an internal Ganeti protocol). The
+ option is described in the man page **htools**(1).
-X
When using the Luxi backend, hbal can also execute the given
jobset will be executed in parallel. The jobsets themselves are
executed serially.
+ The execution of the job series can be interrupted; see the SIGNAL
+ HANDLING section below.
+
-l *N*, --max-length=*N*
Restrict the solution to this length. This can be used for example
to automate the execution of the balancing.
--max-cpu=*cpu-ratio*
- The maximum virtual to physical cpu ratio, as a floating point
- number between zero and one. For example, specifying *cpu-ratio* as
- **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual
- cpus should be allowed to be in use for primary instances. A value
- of one doesn't make sense though, as that means no disk space can be
- used on it.
+ The maximum virtual to physical cpu ratio, as a floating point number
+ greater than or equal to one. For example, specifying *cpu-ratio* as
+ **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
+ should be allowed to be in use for primary instances. A value of
+ exactly one means there will be no over-subscription of CPU (except
+ for the CPU time used by the node itself), and values below one do not
+ make sense, as that means other resources (e.g. disk) won't be fully
+ utilised due to CPU restrictions.
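
The arithmetic in the example above can be sketched in a few lines of Python. The helper name `max_vcpus` is hypothetical and not part of hbal; rounding down and rejecting ratios below one are assumptions made for illustration only.

```python
import math


def max_vcpus(physical_cpus: int, cpu_ratio: float) -> int:
    """Return the virtual CPU limit implied by a --max-cpu style ratio."""
    if cpu_ratio < 1:
        # Assumption: treat ratios below one as an error, since the text
        # says such values do not make sense.
        raise ValueError("cpu_ratio should be greater than or equal to one")
    # Assumption: fractional results are rounded down.
    return math.floor(physical_cpus * cpu_ratio)


print(max_vcpus(4, 2.5))  # 10, matching the 4-cpu example above
print(max_vcpus(4, 1.0))  # 4, i.e. no over-subscription
```
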
--min-disk=*disk-ratio*
The minimum amount of free disk space remaining, as a floating point
-V, --version
Just show the program version and exit.
-EXIT STATUS
------------
+SIGNAL HANDLING
+---------------
+
+When executing jobs via LUXI (using the ``-X`` option), normally hbal
+will execute all jobs until either one errors out or all the jobs finish
+successfully.
+
+Since balancing can take a long time, it is possible to stop hbal early
+in two ways:
-The exist status of the command will be zero, unless for some reason
-the algorithm fatally failed (e.g. wrong node or instance data).
+- by sending a ``SIGINT`` (``^C``), hbal will register the termination
+ request, and will wait until the currently submitted jobs finish, at
+ which point it will exit (with exit code 1)
+- by sending a ``SIGTERM``, hbal will immediately exit (with exit code
+ 2); it is the responsibility of the user to follow up with Ganeti on
+ the result of the currently-executing jobs
-ENVIRONMENT
+Note that in any situation, it's perfectly safe to kill hbal, either via
+the above signals or via any other signal (e.g. ``SIGQUIT``,
+``SIGKILL``), since the jobs themselves are processed by Ganeti whereas
+hbal (after submission) only watches their progression. In this case,
+the user will again have to query Ganeti for job results.
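
The behaviour described above can be modelled with a small Python sketch. This is not hbal's implementation (hbal is not written in Python); the names `state`, `run_jobsets` and `install_handlers` are hypothetical, and `execute` stands in for whatever submits a jobset and waits for it. Only the exit codes 1 and 2 come from the text above.

```python
import signal
import sys

# Mutable flag shared between the SIGINT handler and the main loop.
state = {"stop_requested": False}


def on_sigint(signum, frame):
    # Only record the request; the current jobset is allowed to finish.
    state["stop_requested"] = True


def on_sigterm(signum, frame):
    # Exit at once; jobs already submitted keep running inside Ganeti.
    sys.exit(2)


def install_handlers():
    signal.signal(signal.SIGINT, on_sigint)
    signal.signal(signal.SIGTERM, on_sigterm)


def run_jobsets(jobsets, execute):
    """Run jobsets serially; return 1 if a stop was requested, else 0."""
    for jobset in jobsets:
        execute(jobset)  # blocks until the jobs in this jobset finish
        if state["stop_requested"]:
            return 1
    return 0
```

A caller would run `install_handlers()` once and then pass the result of `run_jobsets(...)` to `sys.exit`. In either interrupted case, the outcome of the submitted jobs still has to be checked in Ganeti itself.
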
+
+EXIT STATUS
-----------
-If the variables **HTOOLS_NODES** and **HTOOLS_INSTANCES** are present
-in the environment, they will override the default names for the nodes
-and instances files. These will have of course no effect when the RAPI
-or Luxi backends are used.
+The exit status of the command will be zero, unless for some reason the
+algorithm fatally failed (e.g. wrong node or instance data), or (in case
+of job execution) either one of the jobs has failed or the balancing was
+interrupted early.
BUGS
----
-The program does not check its input data for consistency, and aborts
-with cryptic errors messages in this case.
+The program does not check all its input data for consistency, and
+sometimes aborts with cryptic error messages when given invalid data.
The algorithm is not perfect.
-The output format is not easily scriptable, and the program should
-feed moves directly into Ganeti (either via RAPI or via a gnt-debug
-input file).
-
EXAMPLE
-------
changed in a way that the program will output a different solution
list (but hopefully will end in the same state).
-SEE ALSO
---------
-
-**hspace**(1), **hscan**(1), **hail**(1), **ganeti**(7),
-**gnt-instance**(8), **gnt-node**(8)
-
-COPYRIGHT
----------
-
-Copyright (C) 2009, 2010 Google Inc. Permission is granted to copy,
-distribute and/or modify under the terms of the GNU General Public
-License as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
-
-On Debian systems, the complete text of the GNU General Public License
-can be found in /usr/share/common-licenses/GPL.
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End: