HBAL(1) Ganeti | Version @GANETI_VERSION@
=========================================

NAME
----

hbal \- Cluster balancer for Ganeti

SYNOPSIS
--------

**hbal** {backend options...} [algorithm options...] [reporting options...]

**hbal** \--version

Backend options:

{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
**-I** *path* }

Algorithm options:

**[ \--max-cpu *cpu-ratio* ]**
**[ \--min-disk *disk-ratio* ]**
**[ -l *limit* ]**
**[ -e *score* ]**
**[ -g *delta* ]** **[ \--min-gain-limit *threshold* ]**
**[ -O *name...* ]**
**[ \--no-disk-moves ]**
**[ \--no-instance-moves ]**
**[ -U *util-file* ]**
**[ \--ignore-dynu ]**
**[ \--evac-mode ]**
**[ \--select-instances *inst...* ]**
**[ \--exclude-instances *inst...* ]**

Reporting options:

**[ -C[ *file* ] ]**
**[ -p[ *fields* ] ]**
**[ \--print-instances ]**
**[ -S *file* ]**
**[ -v... | -q ]**

DESCRIPTION
-----------

hbal is a cluster balancer that looks at the current state of the
cluster (nodes with their total and free disk, memory, etc.) and
instance placement, and computes a series of steps designed to bring
the cluster into a better state.

The algorithm used is designed to be stable (i.e. it will give you the
same results when restarting it from the middle of the solution) and
reasonably fast. It is not, however, designed to be a perfect algorithm:
it is possible to make it go into a corner from which it can find no
improvement, because it looks only one "step" ahead.

The program accesses the cluster state via RAPI or LUXI. It also
requests data over the network from all MonDs when the ``--mond``
option is given. Currently it uses only the data produced by the
CPUload collector.

By default, the program will show the solution incrementally as it is
computed, in a somewhat cryptic format; for getting the actual Ganeti
command list, use the **-C** option.
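
For example, to compute a balancing solution via the local master
daemon (the LUXI backend) and print the corresponding command list at
the end, one could run::

    $ hbal -L -C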

ALGORITHM
~~~~~~~~~

The program works in independent steps; at each step, we compute the
best instance move that lowers the cluster score.

The possible move types for an instance are combinations of
failover/migrate and replace-disks such that we change one of the
instance nodes, and the other one remains (but possibly with changed
role, e.g. from primary it becomes secondary). The list is:

- failover (f)
- replace secondary (r)
- replace primary, a composite move (f, r, f)
- failover and replace secondary, also composite (f, r)
- replace secondary and failover, also composite (r, f)

We don't do the only remaining possibility of replacing both nodes
(r,f,r,f or the equivalent f,r,f,r) since this move needs an
exhaustive search over both candidate primary and secondary nodes, and
is O(n*n) in the number of nodes. Furthermore, it doesn't seem to
give better scores but will result in more disk replacements.

PLACEMENT RESTRICTIONS
~~~~~~~~~~~~~~~~~~~~~~

At each step, we prevent an instance move if it would cause:

- a node to go into N+1 failure state
- an instance to move onto an offline node (offline nodes are either
  read from the cluster or declared with *-O*; drained nodes are
  considered offline)
- an exclusion-tag based conflict (exclusion tags are read from the
  cluster and/or defined via the *\--exclusion-tags* option)
- a max vcpu/pcpu ratio to be exceeded (configured via *\--max-cpu*)
- min disk free percentage to go below the configured limit
  (configured via *\--min-disk*)

CLUSTER SCORING
~~~~~~~~~~~~~~~

As said before, the algorithm tries to minimise the cluster score at
each step. Currently this score is computed as a weighted sum of the
following components:

- standard deviation of the percent of free memory
- standard deviation of the percent of reserved memory
- standard deviation of the percent of free disk
- count of nodes failing N+1 check
- count of instances living (either as primary or secondary) on
  offline nodes; in the sense of hbal (and the other htools) drained
  nodes are considered offline
- count of instances living (as primary) on offline nodes; this
  differs from the above metric by helping failover of such instances
  in 2-node clusters
- standard deviation of the ratio of virtual-to-physical cpus (for
  primary instances of the node)
- standard deviation of the dynamic load on the nodes, for cpus,
  memory, disk and network
- standard deviation of the CPU load provided by MonD

The free memory and free disk values help ensure that all nodes are
somewhat balanced in their resource usage. The reserved memory helps
to ensure that nodes are somewhat balanced in holding secondary
instances, and that no node keeps too much memory reserved for
N+1. And finally, the N+1 percentage helps guide the algorithm towards
eliminating N+1 failures, if possible.

Except for the N+1 failures and offline instances counts, we use the
standard deviation since when used with values within a fixed range
(we use percents expressed as values between zero and one) it gives
consistent results across all metrics (there are some small issues
related to different means, but it works generally well). The 'count'
type values will have higher scores and thus will matter more for
balancing; thus these are better for hard constraints (like evacuating
nodes and fixing N+1 failures). For example, the offline instances
count (i.e. the number of instances living on offline nodes) will
cause the algorithm to actively move instances away from offline
nodes. This, coupled with the restriction on placement given by
offline nodes, will cause evacuation of such nodes.

The dynamic load values need to be read from an external file (Ganeti
doesn't supply them), and are computed for each node as: sum of
primary instance cpu load, sum of primary instance memory load, sum of
primary and secondary instance disk load (as DRBD generates write load
on secondary nodes too in the normal case, and in degraded scenarios
also read load), and sum of primary instance network load. An example
of how to generate these values for input to hbal would be to track
``xm list`` for instances over a day, compute the delta of the cpu
values, and feed that via the *-U* option for all instances (and keep
the other metrics as one). For the algorithm to work, all that is
needed is that the values are consistent for a metric across all
instances (e.g. all instances use cpu% to report cpu usage, and not
something related to number of CPU seconds used if the CPUs are
different), and that they are normalised to between zero and one. Note
that it's recommended to not have zero as the load value for any
instance metric since then secondary instances are not well balanced.
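
A sketch of such a utilisation file (the instance names and values are
purely illustrative), in the "instance_name cpu_util mem_util
disk_util net_util" format accepted by the *-U* option, could be::

    $ cat dyn-util.txt
    instance1.example.com 0.65 0.25 0.30 0.10
    instance2.example.com 0.10 0.25 0.05 0.20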

The CPUload from MonD's data collector will be used only if all MonDs
are running, otherwise it won't affect the cluster score. Since we
can't find the CPU load of each instance, we assume that the CPU load
of an instance is proportional to the number of its vcpus. With this
heuristic, instances from nodes with high CPU load will tend to move
to nodes with less CPU load.
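
For example, to include the MonD CPU load data in the balancing (in
addition to the static resource information from the LUXI backend),
one could run::

    $ hbal -L --mond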

On a perfectly balanced cluster (all nodes the same size, all
instances the same size and spread across the nodes equally), the
values for all metrics would be zero. This doesn't happen too often in
practice :)

OFFLINE INSTANCES
~~~~~~~~~~~~~~~~~

Since current Ganeti versions do not report the memory used by offline
(down) instances, ignoring the run status of instances will cause
wrong calculations. For this reason, the algorithm subtracts the
memory size of down instances from the free node memory of their
primary node, in effect simulating the startup of such instances.

EXCLUSION TAGS
~~~~~~~~~~~~~~

The exclusion tags mechanism is designed to prevent instances which
run the same workload (e.g. two DNS servers) from landing on the same
node, which would make the respective node a SPOF for the given
service.

It works by tagging instances with certain tags and then building
exclusion maps based on these. Which tags are actually used is
configured either via the command line (option *\--exclusion-tags*)
or via adding them to the cluster tags:

\--exclusion-tags=a,b
  This will make all instance tags of the form *a:\**, *b:\** be
  considered for the exclusion map

cluster tags *htools:iextags:a*, *htools:iextags:b*
  This will make instance tags *a:\**, *b:\** be considered for the
  exclusion map. More precisely, the suffix of cluster tags starting
  with *htools:iextags:* will become the prefix of the exclusion tags.

Both the above forms mean that two instances both having (e.g.) the
tag *a:foo* or *b:bar* won't end up on the same node.
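
As a sketch of the second form (the tag prefix and instance names are
illustrative), one could declare *service* as an exclusion tag prefix
at the cluster level and then tag the two DNS servers accordingly::

    $ gnt-cluster add-tags htools:iextags:service
    $ gnt-instance add-tags dns1 service:dns
    $ gnt-instance add-tags dns2 service:dns

With these tags in place, hbal will not place *dns1* and *dns2* on the
same node.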

OPTIONS
-------

The options that can be passed to the program are as follows:

-C, \--print-commands
  Print the command list at the end of the run. Without this, the
  program will only show a shorter, but cryptic output.

  Note that the moves list will be split into independent steps,
  called "jobsets", but only for visual inspection, not for actual
  parallelisation. It is not possible to parallelise these directly
  when executed via "gnt-instance" commands, since a compound command
  (e.g. failover and replace-disks) must be executed
  serially. Parallel execution is only possible when using the Luxi
  backend and the *-L* option.

  The algorithm for splitting the moves into jobsets is to accumulate
  moves until the next move touches nodes already touched by the
  current moves; at that point the moves can't be executed in parallel
  (due to resource allocation in Ganeti), so we start a new jobset.

-p, \--print-nodes
  Prints the before and after node status, in a format designed to allow
  the user to understand the node's most important parameters. See the
  man page **htools**\(1) for more details about this option.

\--print-instances
  Prints the before and after instance map. This is less useful than
  the node status, but it can help in understanding instance moves.

-O *name*
  This option (which can be given multiple times) will mark nodes as
  being *offline*. This means a couple of things:

  - instances won't be placed on these nodes, not even temporarily;
    e.g. the *replace primary* move is not available if the secondary
    node is offline, since this move requires a failover.
  - these nodes will not be included in the score calculation (except
    for the percentage of instances on offline nodes)

  Note that the algorithm will also mark as offline any nodes which
  are reported by RAPI as such, or that have "?" in file-based input
  in any numeric fields.
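
  For example, to plan a rebalance that treats two nodes (the names
  here are illustrative) as offline and therefore moves instances away
  from them::

      $ hbal -L -O node3.example.com -O node7.example.com -C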

-e *score*, \--min-score=*score*
  This parameter denotes the minimum score we are happy with and alters
  the computation in two ways:

  - if the cluster has an initial score lower than this value, then we
    don't enter the algorithm at all, and exit with success
  - during the iterative process, if we reach a score lower than this
    value, we exit the algorithm

  The default value of the parameter is currently ``1e-9`` (chosen
  empirically).

-g *delta*, \--min-gain=*delta*
  Since the balancing algorithm can sometimes result in just very tiny
  improvements, that bring less gain than they cost in relocation
  time, this parameter (defaulting to 0.01) represents the minimum
  gain we require during a step, to continue balancing.

\--min-gain-limit=*threshold*
  The above min-gain option will only take effect if the cluster score
  is already below *threshold* (defaults to 0.1). The rationale behind
  this setting is that at high cluster scores (badly balanced
  clusters), we don't want to abort the rebalance too quickly, as
  later gains might still be significant. However, under the
  threshold, the total gain is only the threshold value, so we can
  exit early.
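
  For example, to stop balancing once the score is below 0.1 and the
  remaining per-step gains fall under 0.05 (the values here are
  illustrative)::

      $ hbal -L -g 0.05 --min-gain-limit=0.1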

\--no-disk-moves
  This parameter prevents hbal from using disk move
  (i.e. "gnt-instance replace-disks") operations. This will result in
  a much quicker balancing, but of course the improvements are
  limited. It is up to the user to decide when to use one or the
  other.

\--no-instance-moves
  This parameter prevents hbal from using instance moves
  (i.e. "gnt-instance migrate/failover") operations. This will only use
  the slow disk-replacement operations, and will also provide a worse
  balance, but can be useful if moving instances around is deemed unsafe
  or not preferred.

\--evac-mode
  This parameter restricts the list of instances considered for moving
  to the ones living on offline/drained nodes. It can be used as a
  (bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
  note that it doesn't guarantee full evacuation.
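
  For example, to compute (and, with *-X*, execute) only the moves
  needed to empty offline or drained nodes::

      $ hbal -L -X --evac-mode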

\--select-instances=*instances*
  This parameter marks the given instances (as a comma-separated list)
  as the only ones being moved during the rebalance.

\--exclude-instances=*instances*
  This parameter excludes the given instances (as a comma-separated
  list) from being moved during the rebalance.
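
  For example, to rebalance while leaving two particular instances
  (the names here are illustrative) untouched::

      $ hbal -L --exclude-instances=db1.example.com,db2.example.com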

-U *util-file*
  This parameter specifies a file holding instance dynamic utilisation
  information that will be used to tweak the balancing algorithm to
  equalise load on the nodes (as opposed to static resource
  usage). The file is in the format "instance_name cpu_util mem_util
  disk_util net_util" where the "_util" parameters are interpreted as
  numbers and the instance name must match exactly the instance as
  read from Ganeti. In case of unknown instance names, the program
  will abort.

  If not given, the default values are one for all metrics and thus
  dynamic utilisation has only one effect on the algorithm: the
  equalisation of the secondary instances across nodes (this is the
  only metric that is not tracked by another, dedicated value, and
  thus the disk load of instances will cause secondary instance
  equalisation). Note that a value of one will also slightly influence
  the primary instance count, but that is already tracked via other
  metrics and thus the influence of the dynamic utilisation will be
  practically insignificant.
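
  For example, assuming a utilisation file named *dyn-util.txt* in the
  format above::

      $ hbal -L -U dyn-util.txt -C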

\--ignore-dynu
  If given, all dynamic utilisation information will be ignored by
  assuming it to be 0. This option will take precedence over any data
  passed by the ``-U`` option or by the MonDs with the ``--mond`` and
  the ``--mond-data`` options.

-S *filename*, \--save-cluster=*filename*
  If given, the state of the cluster before the balancing is saved to
  the given file plus the extension "original"
  (i.e. *filename*.original), and the state at the end of the
  balancing is saved to the given file plus the extension "balanced"
  (i.e. *filename*.balanced). This allows re-feeding the cluster state
  to either hbal itself or for example hspace via the ``-t`` option.
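
  For example, to save both states under an illustrative path and then
  re-run the balancing simulation from the saved "balanced" state::

      $ hbal -L -S /tmp/hbal-state
      $ hbal -t /tmp/hbal-state.balanced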

-t *datafile*, \--text-data=*datafile*
  Backend specification: the name of the file holding node and instance
  information (if not collecting via RAPI or LUXI). This or one of the
  other backends must be selected. The option is described in the man
  page **htools**\(1).

\--mond
  If given, the program will query all MonDs to fetch data from the
  supported data collectors over the network.

\--mond-data *datafile*
  The name of the file holding the data provided by MonD, to override
  querying MonDs over the network. This is mostly used for debugging. The
  file must be in JSON format and present an array of JSON objects,
  one for every node, with two members. The first member named ``node``
  is the name of the node and the second member named ``reports`` is an
  array of report objects. The report objects must be in the same format
  as produced by the monitoring agent.
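
  A minimal sketch of such a file (the node names are illustrative,
  and the contents of the report objects are elided since they follow
  the monitoring agent's own format)::

      $ cat mond-data.json
      [
        { "node": "node1.example.com",
          "reports": [ { ... } ] },
        { "node": "node2.example.com",
          "reports": [ { ... } ] }
      ]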

-m *cluster*
  Backend specification: collect data directly from the *cluster* given
  as an argument via RAPI. The option is described in the man page
  **htools**\(1).

-L [*path*]
  Backend specification: collect data directly from the master daemon,
  which is to be contacted via LUXI (an internal Ganeti protocol). The
  option is described in the man page **htools**\(1).

-X
  When using the Luxi backend, hbal can also execute the given
  commands. The execution method is to execute the individual jobsets
  (see the *-C* option for details) in separate stages, aborting if at
  any time a jobset doesn't have all its jobs successful. Each step in
  the balancing solution will be translated into exactly one Ganeti job
  (having between one and three OpCodes), and all the steps in a
  jobset will be executed in parallel. The jobsets themselves are
  executed serially.

  The execution of the job series can be interrupted; see below for
  signal handling.
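
  For example, to compute a balancing solution and submit it for
  execution through the master daemon in one go::

      $ hbal -L -X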

-l *N*, \--max-length=*N*
  Restrict the solution to this length. This can be used for example
  to automate the execution of the balancing.
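
  For example, to execute at most three moves per run (e.g. from a
  periodic job that slowly converges the cluster)::

      $ hbal -L -X --max-length=3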

\--max-cpu=*cpu-ratio*
  The maximum virtual to physical cpu ratio, as a floating point number
  greater than or equal to one. For example, specifying *cpu-ratio* as
  **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
  should be allowed to be in use for primary instances. A value of
  exactly one means there will be no over-subscription of CPU (except
  for the CPU time used by the node itself), and values below one do not
  make sense, as that means other resources (e.g. disk) won't be fully
  utilised due to CPU restrictions.

\--min-disk=*disk-ratio*
  The minimum amount of free disk space remaining, as a floating point
  number. For example, specifying *disk-ratio* as **0.25** means that
  at least one quarter of disk space should be left free on nodes.
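
  For example, to balance while allowing at most a 2.5 vcpu/pcpu ratio
  and keeping at least 25% of disk space free on every node::

      $ hbal -L --max-cpu=2.5 --min-disk=0.25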

-G *uuid*, \--group=*uuid*
  On a multi-group cluster, select this group for
  processing. Otherwise hbal will abort, since it cannot balance
  multiple groups at the same time.

-v, \--verbose
  Increase the output verbosity. Each usage of this option will
  increase the verbosity (currently more than 2 doesn't make sense)
  from the default of one.

-q, \--quiet
  Decrease the output verbosity. Each usage of this option will
  decrease the verbosity (less than zero doesn't make sense) from the
  default of one.

-V, \--version
  Just show the program version and exit.

SIGNAL HANDLING
---------------

When executing jobs via LUXI (using the ``-X`` option), normally hbal
will execute all jobs until either one errors out or all the jobs finish
successfully.

Since balancing can take a long time, it is possible to stop hbal early
in two ways:

- by sending a ``SIGINT`` (``^C``), hbal will register the termination
  request, and will wait until the currently submitted jobs finish, at
  which point it will exit (with exit code 0 if all jobs finished
  correctly, otherwise with exit code 1 as usual)

- by sending a ``SIGTERM``, hbal will immediately exit (with exit code
  2\); it is the responsibility of the user to follow up with Ganeti
  and check the result of the currently-executing jobs

Note that in any situation, it's perfectly safe to kill hbal, either via
the above signals or via any other signal (e.g. ``SIGQUIT``,
``SIGKILL``), since the jobs themselves are processed by Ganeti whereas
hbal (after submission) only watches their progression. In this case,
the user will have to query Ganeti for job results.

EXIT STATUS
-----------

The exit status of the command will be zero, unless for some reason
the algorithm failed (e.g. wrong node or instance data), the command
line options were invalid, or (in case of job execution) one of the
jobs has failed.

Once job execution via Luxi has started (``-X``), if the balancing was
interrupted early (via *SIGINT*, or via ``--max-length``) but all jobs
executed successfully, then the exit status is zero; a non-zero exit
code means that the cluster state should be investigated, since a job
failed or we couldn't compute its status, and this can also point to a
problem on the Ganeti side.

BUGS
----

The program does not check all its input data for consistency, and
sometimes aborts with cryptic error messages when given invalid data.

The algorithm is not perfect.

EXAMPLE
-------

Note that these examples are not for the latest version (they don't
have full node data).

Default output
~~~~~~~~~~~~~~

With the default options, the program shows each individual step and
the improvements it brings in cluster score::

    $ hbal
    Loaded 20 nodes, 80 instances
    Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy.
    Initial score: 0.52329131
    Trying to minimize the CV...
        1. instance14  node1:node10  => node16:node10 0.42109120 a=f r:node16 f
        2. instance54  node4:node15  => node16:node15 0.31904594 a=f r:node16 f
        3. instance4   node5:node2   => node2:node16  0.26611015 a=f r:node16
        4. instance48  node18:node20 => node2:node18  0.21361717 a=r:node2 f
        5. instance93  node19:node18 => node16:node19 0.16166425 a=r:node16 f
        6. instance89  node3:node20  => node2:node3   0.11005629 a=r:node2 f
        7. instance5   node6:node2   => node16:node6  0.05841589 a=r:node16 f
        8. instance94  node7:node20  => node20:node16 0.00658759 a=f r:node16
        9. instance44  node20:node2  => node2:node15  0.00438740 a=f r:node15
       10. instance62  node14:node18 => node14:node16 0.00390087 a=r:node16
       11. instance13  node11:node14 => node11:node16 0.00361787 a=r:node16
       12. instance19  node10:node11 => node10:node7  0.00336636 a=r:node7
       13. instance43  node12:node13 => node12:node1  0.00305681 a=r:node1
       14. instance1   node1:node2   => node1:node4   0.00263124 a=r:node4
       15. instance58  node19:node20 => node19:node17 0.00252594 a=r:node17
    Cluster score improved from 0.52329131 to 0.00252594

In the above output, we can see:

- the input data (here from files) shows a cluster with 20 nodes and
  80 instances
- the cluster is not initially N+1 compliant
- the initial score is 0.52329131

The step list follows, showing the instance, its initial
primary/secondary nodes, the new primary/secondary, the cluster score
after the move, and the actions taken in this step (with 'f' denoting
failover/migrate and 'r' denoting replace secondary).

Finally, the program shows the improvement in cluster score.

A more detailed output is obtained via the *-C* and *-p* options::

    $ hbal
    Loaded 20 nodes, 80 instances
    Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy.
    Initial cluster status:
    N1 Name   t_mem f_mem r_mem t_dsk f_dsk pri sec  p_fmem  p_fdsk
     * node1  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
       node2  32762 31280 12000  1861  1026   0   8 0.95476 0.55179
     * node3  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
     * node4  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
     * node5  32762  1280  6000  1861   978   5   5 0.03907 0.52573
     * node6  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
     * node7  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
       node8  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node9  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
     * node10 32762  7280 12000  1861  1026   4   4 0.22221 0.55179
       node11 32762  7280  6000  1861   922   4   5 0.22221 0.49577
       node12 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node13 32762  7280  6000  1861   922   4   5 0.22221 0.49577
       node14 32762  7280  6000  1861   922   4   5 0.22221 0.49577
     * node15 32762  7280 12000  1861  1131   4   3 0.22221 0.60782
       node16 32762 31280     0  1861  1860   0   0 0.95476 1.00000
       node17 32762  7280  6000  1861  1106   5   3 0.22221 0.59479
     * node18 32762  1280  6000  1396   561   5   3 0.03907 0.40239
     * node19 32762  1280  6000  1861  1026   5   3 0.03907 0.55179
       node20 32762 13280 12000  1861   689   3   9 0.40535 0.37068

    Initial score: 0.52329131
    Trying to minimize the CV...
        1. instance14  node1:node10  => node16:node10 0.42109120 a=f r:node16 f
        2. instance54  node4:node15  => node16:node15 0.31904594 a=f r:node16 f
        3. instance4   node5:node2   => node2:node16  0.26611015 a=f r:node16
        4. instance48  node18:node20 => node2:node18  0.21361717 a=r:node2 f
        5. instance93  node19:node18 => node16:node19 0.16166425 a=r:node16 f
        6. instance89  node3:node20  => node2:node3   0.11005629 a=r:node2 f
        7. instance5   node6:node2   => node16:node6  0.05841589 a=r:node16 f
        8. instance94  node7:node20  => node20:node16 0.00658759 a=f r:node16
        9. instance44  node20:node2  => node2:node15  0.00438740 a=f r:node15
       10. instance62  node14:node18 => node14:node16 0.00390087 a=r:node16
       11. instance13  node11:node14 => node11:node16 0.00361787 a=r:node16
       12. instance19  node10:node11 => node10:node7  0.00336636 a=r:node7
       13. instance43  node12:node13 => node12:node1  0.00305681 a=r:node1
       14. instance1   node1:node2   => node1:node4   0.00263124 a=r:node4
       15. instance58  node19:node20 => node19:node17 0.00252594 a=r:node17
    Cluster score improved from 0.52329131 to 0.00252594

    Commands to run to reach the above solution:
      echo step 1
      echo gnt-instance migrate instance14
      echo gnt-instance replace-disks -n node16 instance14
      echo gnt-instance migrate instance14
      echo step 2
      echo gnt-instance migrate instance54
      echo gnt-instance replace-disks -n node16 instance54
      echo gnt-instance migrate instance54
      echo step 3
      echo gnt-instance migrate instance4
      echo gnt-instance replace-disks -n node16 instance4
      echo step 4
      echo gnt-instance replace-disks -n node2 instance48
      echo gnt-instance migrate instance48
      echo step 5
      echo gnt-instance replace-disks -n node16 instance93
      echo gnt-instance migrate instance93
      echo step 6
      echo gnt-instance replace-disks -n node2 instance89
      echo gnt-instance migrate instance89
      echo step 7
      echo gnt-instance replace-disks -n node16 instance5
      echo gnt-instance migrate instance5
      echo step 8
      echo gnt-instance migrate instance94
      echo gnt-instance replace-disks -n node16 instance94
      echo step 9
      echo gnt-instance migrate instance44
      echo gnt-instance replace-disks -n node15 instance44
      echo step 10
      echo gnt-instance replace-disks -n node16 instance62
      echo step 11
      echo gnt-instance replace-disks -n node16 instance13
      echo step 12
      echo gnt-instance replace-disks -n node7 instance19
      echo step 13
      echo gnt-instance replace-disks -n node1 instance43
      echo step 14
      echo gnt-instance replace-disks -n node4 instance1
      echo step 15
      echo gnt-instance replace-disks -n node17 instance58

    Final cluster status:
    N1 Name   t_mem f_mem r_mem t_dsk f_dsk pri sec  p_fmem  p_fdsk
       node1  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node2  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node3  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node4  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node5  32762  7280  6000  1861  1078   4   5 0.22221 0.57947
       node6  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node7  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node8  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node9  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node10 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node11 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
       node12 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node13 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
       node14 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
       node15 32762  7280  6000  1861  1031   4   4 0.22221 0.55408
       node16 32762  7280  6000  1861  1060   4   4 0.22221 0.57007
       node17 32762  7280  6000  1861  1006   5   4 0.22221 0.54105
       node18 32762  7280  6000  1396   761   4   2 0.22221 0.54570
       node19 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
       node20 32762 13280  6000  1861  1089   3   5 0.40535 0.58565

Here we see, besides the step list, the initial and final cluster
status, with the final one showing all nodes being N+1 compliant, and
the command list to reach the final solution. In the initial listing,
we see which nodes are not N+1 compliant.

The algorithm is stable as long as each step above is fully completed,
e.g. in step 8, both the migrate and the replace-disks are
done. Otherwise, if only the migrate is done, the input data is
changed in a way that the program will output a different solution
list (but hopefully will end in the same state).

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: