update hroller man page: node group filtering is implemented

diff --git a/man/hbal.rst b/man/hbal.rst

index f8f2fb1..766baac 100644
--- a/man/hbal.rst
+++ b/man/hbal.rst
@@ -11,32 +11,35 @@ SYNOPSIS
  
  **hbal** {backend options...} [algorithm options...] [reporting options...]
  
-**hbal** --version
+**hbal** \--version
  
  
  Backend options:
  
-{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* }
+{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
+**-I** *path* }
  
  Algorithm options:
  
-**[ --max-cpu *cpu-ratio* ]**
-**[ --min-disk *disk-ratio* ]**
+**[ \--max-cpu *cpu-ratio* ]**
+**[ \--min-disk *disk-ratio* ]**
  **[ -l *limit* ]**
  **[ -e *score* ]**
-**[ -g *delta* ]** **[ --min-gain-limit *threshold* ]**
+**[ -g *delta* ]** **[ \--min-gain-limit *threshold* ]**
  **[ -O *name...* ]**
-**[ --no-disk-moves ]**
+**[ \--no-disk-moves ]**
+**[ \--no-instance-moves ]**
  **[ -U *util-file* ]**
-**[ --evac-mode ]**
-**[ --exclude-instances *inst...* ]**
+**[ \--evac-mode ]**
+**[ \--select-instances *inst...* ]**
+**[ \--exclude-instances *inst...* ]**
  
  Reporting options:
  
  **[ -C[ *file* ] ]**
  **[ -p[ *fields* ] ]**
-**[ --print-instances ]**
-**[ -o ]**
+**[ \--print-instances ]**
+**[ -S *file* ]**
  **[ -v... | -q ]**
  
  
@@ -50,9 +53,9 @@ the cluster into a better state.
  
  The algorithm used is designed to be stable (i.e. it will give you the
  same results when restarting it from the middle of the solution) and
-reasonably fast. It is not, however, designed to be a perfect
-algorithm--it is possible to make it go into a corner from which
-it can find no improvement, because it looks only one "step" ahead.
+reasonably fast. It is not, however, designed to be a perfect algorithm:
+it is possible to make it go into a corner from which it can find no
+improvement, because it looks only one "step" ahead.
  
  By default, the program will show the solution incrementally as it is
  computed, in a somewhat cryptic format; for getting the actual Ganeti
@@ -90,10 +93,10 @@ At each step, we prevent an instance move if it would cause:
  - an instance to move onto an offline node (offline nodes are either
    read from the cluster or declared with *-O*)
  - an exclusion-tag based conflict (exclusion tags are read from the
-  cluster and/or defined via the *--exclusion-tags* option)
-- a max vcpu/pcpu ratio to be exceeded (configured via *--max-cpu*)
+  cluster and/or defined via the *\--exclusion-tags* option)
+- a max vcpu/pcpu ratio to be exceeded (configured via *\--max-cpu*)
  - min disk free percentage to go below the configured limit
-  (configured via *--min-disk*)
+  (configured via *\--min-disk*)
  
  CLUSTER SCORING
  ~~~~~~~~~~~~~~~
@@ -176,10 +179,10 @@ which would make the respective node a SPOF for the given service.
  
  It works by tagging instances with certain tags and then building
  exclusion maps based on these. Which tags are actually used is
-configured either via the command line (option *--exclusion-tags*)
+configured either via the command line (option *\--exclusion-tags*)
  or via adding them to the cluster tags:
  
---exclusion-tags=a,b
+\--exclusion-tags=a,b
    This will make all instance tags of the form *a:\**, *b:\** be
    considered for the exclusion map
  
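As a rough illustration of the exclusion-tag idea described above, the following Python sketch filters instance tags by the configured prefixes and reports tags shared by several instances on one node. The helper names (`exclusion_tags`, `conflicts`) and the input layout are hypothetical, and treating any shared tag on a node as a conflict is an assumption; this is not hbal's actual implementation.

```python
def exclusion_tags(instance_tags, prefixes):
    """Return the tags of the form 'prefix:*' for the configured prefixes."""
    wanted = tuple(p + ":" for p in prefixes)
    return sorted(t for t in instance_tags if t.startswith(wanted))


def conflicts(node_instances, prefixes):
    """Map each exclusion tag to the instances on one node that share it.

    node_instances is a hypothetical dict: instance name -> list of tags.
    Only tags carried by more than one instance are reported.
    """
    seen = {}
    for name, tags in node_instances.items():
        for tag in exclusion_tags(tags, prefixes):
            seen.setdefault(tag, []).append(name)
    return {tag: names for tag, names in seen.items() if len(names) > 1}
```

With `--exclusion-tags=a,b`, a tag such as `a:web` would be considered, while `c:db` would be ignored.
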
@@ -196,7 +199,7 @@ OPTIONS
  
  The options that can be passed to the program are as follows:
  
--C, --print-commands
+-C, \--print-commands
    Print the command list at the end of the run. Without this, the
    program will only show a shorter, but cryptic output.
  
@@ -214,100 +217,15 @@ The options that can be passed to the program are as follows:
    parallel (due to resource allocation in Ganeti) and thus we start a
    new jobset.
  
--p, --print-nodes
-  Prints the before and after node status, in a format designed to
-  allow the user to understand the node's most important parameters.
+-p, \--print-nodes
+  Prints the before and after node status, in a format designed to allow
+  the user to understand the node's most important parameters. See the
+  man page **htools**\(1) for more details about this option.
  
-  It is possible to customise the listed information by passing a
-  comma-separated list of field names to this option (the field list
-  is currently undocumented), or to extend the default field list by
-  prefixing the additional field list with a plus sign. By default,
-  the node list will contain the following information:
-
-  F
-    a character denoting the status of the node, with '-' meaning an
-    offline node, '*' meaning N+1 failure and blank meaning a good
-    node
-
-  Name
-    the node name
-
-  t_mem
-    the total node memory
-
-  n_mem
-    the memory used by the node itself
-
-  i_mem
-    the memory used by instances
-
-  x_mem
-    amount memory which seems to be in use but cannot be determined
-    why or by which instance; usually this means that the hypervisor
-    has some overhead or that there are other reporting errors
-
-  f_mem
-    the free node memory
-
-  r_mem
-    the reserved node memory, which is the amount of free memory
-    needed for N+1 compliance
-
-  t_dsk
-    total disk
-
-  f_dsk
-    free disk
-
-  pcpu
-    the number of physical cpus on the node
-
-  vcpu
-    the number of virtual cpus allocated to primary instances
-
-  pcnt
-    number of primary instances
-
-  scnt
-    number of secondary instances
-
-  p_fmem
-    percent of free memory
-
-  p_fdsk
-    percent of free disk
-
-  r_cpu
-    ratio of virtual to physical cpus
-
-  lCpu
-    the dynamic CPU load (if the information is available)
-
-  lMem
-    the dynamic memory load (if the information is available)
-
-  lDsk
-    the dynamic disk load (if the information is available)
-
-  lNet
-    the dynamic net load (if the information is available)
-
---print-instances
+\--print-instances
    Prints the before and after instance map. This is less useful than the
    node status, but it can help in understanding instance moves.
  
--o, --oneline
-  Only shows a one-line output from the program, designed for the case
-  when one wants to look at multiple clusters at once and check their
-  status.
-
-  The line will contain four fields:
-
-  - initial cluster score
-  - number of steps in the solution
-  - final cluster score
-  - improvement in the cluster score
-
  -O *name*
    This option (which can be given multiple times) will mark nodes as
    being *offline*. This means a couple of things:
@@ -322,7 +240,7 @@ The options that can be passed to the program are as follows:
    reported by RAPI as such, or that have "?" in file-based input in
    any numeric fields.
  
--e *score*, --min-score=*score*
+-e *score*, \--min-score=*score*
    This parameter denotes the minimum score we are happy with and alters
    the computation in two ways:
  
@@ -334,13 +252,13 @@ The options that can be passed to the program are as follows:
    The default value of the parameter is currently ``1e-9`` (chosen
    empirically).
  
--g *delta*, --min-gain=*delta*
+-g *delta*, \--min-gain=*delta*
    Since the balancing algorithm can sometimes result in just very tiny
    improvements that bring less gain than they cost in relocation
    time, this parameter (defaulting to 0.01) represents the minimum
    gain we require during a step to continue balancing.
  
---min-gain-limit=*threshold*
+\--min-gain-limit=*threshold*
    The above min-gain option will only take effect if the cluster score
    is already below *threshold* (defaults to 0.1). The rationale behind
    this setting is that at high cluster scores (badly balanced
@@ -349,19 +267,30 @@ The options that can be passed to the program are as follows:
    threshold, the total gain is only the threshold value, so we can
    exit early.
  
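The interaction between `--min-gain` and `--min-gain-limit` can be sketched as a stopping test. This is only an illustration under stated assumptions: the function name `should_continue` is hypothetical, lower scores are taken to mean a better-balanced cluster, and the threshold is compared against the score before the step, which the text does not spell out.

```python
def should_continue(score_before, score_after, min_gain=0.01, min_gain_limit=0.1):
    """Decide whether balancing goes on after one step (illustrative only).

    Defaults mirror the documented ones: min_gain 0.01, threshold 0.1.
    """
    gain = score_before - score_after
    if gain <= 0:
        # Assumption: a step that does not improve the score ends balancing.
        return False
    if score_before >= min_gain_limit:
        # Badly balanced cluster: the min-gain check is not applied yet.
        return True
    # Below the threshold, each step must bring at least min_gain.
    return gain >= min_gain
```

For example, a step from 0.5 to 0.499 would continue (the score is above the threshold), while a step from 0.05 to 0.045 would stop, since the gain is below 0.01.
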
---no-disk-moves
+\--no-disk-moves
    This parameter prevents hbal from using disk move
    (i.e. "gnt-instance replace-disks") operations. This will result in
    a much quicker balancing, but of course the improvements are
    limited. It is up to the user to decide when to use one or another.
  
---evac-mode
+\--no-instance-moves
+  This parameter prevents hbal from using instance move
+  (i.e. "gnt-instance migrate/failover") operations. Only the slow
+  disk-replacement operations will then be used, which also gives a
+  worse balance, but this can be useful if moving instances around is
+  deemed unsafe or not preferred.
+
+\--evac-mode
    This parameter restricts the list of instances considered for moving
    to the ones living on offline/drained nodes. It can be used as a
    (bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
    note that it doesn't guarantee full evacuation.
  
---exclude-instances=*instances*
+\--select-instances=*instances*
+  This parameter marks the given instances (as a comma-separated list)
+  as the only ones being moved during the rebalance.
+
+\--exclude-instances=*instances*
    This parameter excludes the given instances (as a comma-separated
    list) from being moved during the rebalance.
  
@@ -385,32 +314,29 @@ The options that can be passed to the program are as follows:
    metrics and thus the influence of the dynamic utilisation will be
    practically insignificant.
  
--t *datafile*, --text-data=*datafile*
-  The name of the file holding node and instance information (if not
-  collecting via RAPI or LUXI). This or one of the other backends must
-  be selected.
-
--S *filename*, --save-cluster=*filename*
+-S *filename*, \--save-cluster=*filename*
    If given, the state of the cluster before the balancing is saved to
    the given file plus the extension "original"
    (i.e. *filename*.original), and the state at the end of the
    balancing is saved to the given file plus the extension "balanced"
    (i.e. *filename*.balanced). This allows re-feeding the cluster state
-  to either hbal itself or for example hspace.
+  to either hbal itself or for example hspace via the ``-t`` option.
+
+-t *datafile*, \--text-data=*datafile*
+  Backend specification: the name of the file holding node and instance
+  information (if not collecting via RAPI or LUXI). This or one of the
+  other backends must be selected. The option is described in the man
+  page **htools**\(1).
  
  -m *cluster*
- Collect data directly from the *cluster* given as an argument via
- RAPI. If the argument doesn't contain a colon (:), then it is
- converted into a fully-built URL via prepending ``https://`` and
- appending the default RAPI port, otherwise it's considered a
- fully-specified URL and is used as-is.
+  Backend specification: collect data directly from the *cluster* given
+  as an argument via RAPI. The option is described in the man page
+  **htools**\(1).
  
  -L [*path*]
-  Collect data directly from the master daemon, which is to be
-  contacted via the luxi (an internal Ganeti protocol). An optional
-  *path* argument is interpreted as the path to the unix socket on
-  which the master daemon listens; otherwise, the default path used by
-  ganeti when installed with *--localstatedir=/var* is used.
+  Backend specification: collect data directly from the master daemon,
+  which is to be contacted via LUXI (an internal Ganeti protocol). The
+  option is described in the man page **htools**\(1).
  
  -X
    When using the Luxi backend, hbal can also execute the given
@@ -422,60 +348,93 @@ The options that can be passed to the program are as follows:
    jobset will be executed in parallel. The jobsets themselves are
    executed serially.
  
--l *N*, --max-length=*N*
+  The execution of the job series can be interrupted, see below for
+  signal handling.
+
+-l *N*, \--max-length=*N*
    Restrict the solution to this length. This can be used for example
    to automate the execution of the balancing.
  
---max-cpu=*cpu-ratio*
-  The maximum virtual to physical cpu ratio, as a floating point
-  number between zero and one. For example, specifying *cpu-ratio* as
-  **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual
-  cpus should be allowed to be in use for primary instances. A value
-  of one doesn't make sense though, as that means no disk space can be
-  used on it.
-
---min-disk=*disk-ratio*
+\--max-cpu=*cpu-ratio*
+  The maximum virtual to physical cpu ratio, as a floating point number
+  greater than or equal to one. For example, specifying *cpu-ratio* as
+  **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
+  should be allowed to be in use for primary instances. A value of
+  exactly one means there will be no over-subscription of CPU (except
+  for the CPU time used by the node itself), and values below one do not
+  make sense, as that means other resources (e.g. disk) won't be fully
+  utilised due to CPU restrictions.
+
+\--min-disk=*disk-ratio*
    The minimum amount of free disk space remaining, as a floating point
    number. For example, specifying *disk-ratio* as **0.25** means that
    at least one quarter of disk space should be left free on nodes.
  
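The arithmetic behind `--max-cpu` and `--min-disk` can be written out as a small Python sketch. The function names and parameters are hypothetical, and treating both limits as inclusive is an assumption; the sketch only restates the examples given in the option descriptions.

```python
def max_vcpus(pcpus, cpu_ratio):
    """Largest number of virtual CPUs allowed for primary instances."""
    return pcpus * cpu_ratio


def move_allowed(pcpus, vcpus_after, free_disk_after, total_disk,
                 cpu_ratio, disk_ratio):
    """Check a node's state after a move against both limits.

    Assumption: a node exactly at a limit is still acceptable.
    """
    cpu_ok = vcpus_after <= max_vcpus(pcpus, cpu_ratio)
    disk_ok = free_disk_after >= total_disk * disk_ratio
    return cpu_ok and disk_ok
```

With *cpu-ratio* 2.5 on a 4-cpu node, `max_vcpus(4, 2.5)` gives 10, matching the example in the text; with *disk-ratio* 0.25, a node with 100 units of disk must keep at least 25 free.
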
--G *uuid*, --group=*uuid*
+-G *uuid*, \--group=*uuid*
    On a multi-group cluster, select this group for
    processing. Otherwise hbal will abort, since it cannot balance
    multiple groups at the same time.
  
--v, --verbose
+-v, \--verbose
    Increase the output verbosity. Each usage of this option will
    increase the verbosity (currently more than 2 doesn't make sense)
    from the default of one.
  
--q, --quiet
+-q, \--quiet
    Decrease the output verbosity. Each usage of this option will
    decrease the verbosity (less than zero doesn't make sense) from the
    default of one.
  
--V, --version
+-V, \--version
    Just show the program version and exit.
  
+SIGNAL HANDLING
+---------------
+
+When executing jobs via LUXI (using the ``-X`` option), normally hbal
+will execute all jobs until either one errors out or all the jobs finish
+successfully.
+
+Since balancing can take a long time, it is possible to stop hbal early
+in two ways:
+
+- by sending a ``SIGINT`` (``^C``), hbal will register the termination
+  request, and will wait until the currently submitted jobs finish, at
+  which point it will exit (with exit code 0 if all jobs finished
+  correctly, otherwise with exit code 1 as usual)
+
+- by sending a ``SIGTERM``, hbal will immediately exit (with exit code
+  2\); it is the responsibility of the user to follow up with Ganeti
+  and check the result of the currently-executing jobs
+
+Note that in any situation, it's perfectly safe to kill hbal, either via
+the above signals or via any other signal (e.g. ``SIGQUIT``,
+``SIGKILL``), since the jobs themselves are processed by Ganeti whereas
+hbal (after submission) only watches their progression. In this case,
+the user will have to query Ganeti for job results.
+
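A wrapper script could use the signal behaviour described above roughly as follows. This is a minimal sketch: the helper names `describe_exit` and `stop_gracefully` are hypothetical, and reading exit code 2 as "terminated by SIGTERM" is only valid when the caller knows it sent that signal.

```python
import signal
import subprocess


def describe_exit(code, sigterm_sent=False):
    """Turn an hbal exit code into a short hint for the operator."""
    if code == 0:
        return "all submitted jobs finished successfully"
    if code == 2 and sigterm_sent:
        return "hbal exited immediately; query Ganeti for job results"
    return "non-zero exit; investigate the cluster state"


def stop_gracefully(proc):
    """Send SIGINT to a running hbal (a subprocess.Popen) and wait.

    Per the text above, hbal then waits for the currently submitted
    jobs to finish before exiting.
    """
    proc.send_signal(signal.SIGINT)
    return proc.wait()
```

A caller might start the run with `proc = subprocess.Popen(["hbal", "-L", "-X"])`, later call `stop_gracefully(proc)`, and pass the returned code to `describe_exit`.
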
  EXIT STATUS
  -----------
  
-The exit status of the command will be zero, unless for some reason
-the algorithm fatally failed (e.g. wrong node or instance data), or
-(in case of job execution) any job has failed.
+The exit status of the command will be zero, unless the algorithm
+failed (e.g. wrong node or instance data), invalid command line options
+were given, or (in case of job execution) one of the jobs has failed.
+
+Once job execution via Luxi has started (``-X``), if the balancing was
+interrupted early (via *SIGINT*, or via ``--max-length``) but all jobs
+executed successfully, then the exit status is zero; a non-zero exit
+code means that the cluster state should be investigated, since a job
+failed or we couldn't compute its status and this can also point to a
+problem on the Ganeti side.
  
  BUGS
  ----
  
-The program does not check its input data for consistency, and aborts
-with cryptic errors messages in this case.
+The program does not check all its input data for consistency, and
+sometimes aborts with cryptic error messages when given invalid data.
  
  The algorithm is not perfect.
  
-The output format is not easily scriptable, and the program should
-feed moves directly into Ganeti (either via RAPI or via a gnt-debug
-input file).
-
  EXAMPLE
  -------