.TH HBAL 1 2009-03-23 htools "Ganeti H-tools"
.SH NAME
hbal \- Cluster balancer for Ganeti

.SH SYNOPSIS
.B hbal
.B "[-C]"
.B "[-p]"
.B "[-o]"
.B "[-v... | -q]"
.BI "[-l " limit "]"
.BI "[-O " name... "]"
.BI "[-e " score "]"
.BI "[-m " cluster "]"
.BI "[-n " nodes-file "]"
.BI "[-i " instances-file "]"

.B hbal
.B --version

.SH DESCRIPTION
hbal is a cluster balancer that looks at the current state of the
cluster (nodes with their total and free disk, memory, etc.) and
instance placement and computes a series of steps designed to bring
the cluster into a better state.

The algorithm to do so is designed to be stable (i.e. it will give you
the same results when restarting it from the middle of the solution)
and reasonably fast. It is not, however, designed to be a perfect
algorithm: it is possible to make it go into a corner from which it
can find no improvement, because it only looks one "step" ahead.

By default, the program will show the solution incrementally as it is
computed, in a somewhat cryptic format; for getting the actual Ganeti
command list, use the \fB-C\fR option.
.SS ALGORITHM

The program works in independent steps; at each step, we compute the
best instance move that lowers the cluster score.

The possible move types for an instance are combinations of
failover/migrate and replace-disks such that we change one of the
instance nodes, and the other one remains (but possibly with changed
role, e.g. from primary it becomes secondary). The list is:
.RS 4
.TP 3
\(em
failover (f)
.TP
\(em
replace secondary (r)
.TP
\(em
replace primary, a composite move (f, r, f)
.TP
\(em
failover and replace secondary, also composite (f, r)
.TP
\(em
replace secondary and failover, also composite (r, f)
.RE

We don't do the only remaining possibility of replacing both nodes
(r,f,r,f or the equivalent f,r,f,r) since this move needs an
exhaustive search over both candidate primary and secondary nodes, and
is O(n*n) in the number of nodes. Furthermore, it doesn't seem to
give better scores but will result in more disk replacements.
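
As a rough illustration of the one-step search described above, the
following Python sketch enumerates the five move types for each
instance and keeps the single move that lowers the score the most. The
function names, the dictionary-based placement and the score callback
are assumptions made for this example; they are not taken from the
hbal sources.

.in +4n
.nf
```python
def candidate_moves(primary, secondary, nodes):
    """Yield (actions, new_primary, new_secondary) for one instance."""
    # failover (f): the two nodes swap roles
    yield ("f", secondary, primary)
    for n in nodes:
        if n in (primary, secondary):
            continue
        # replace secondary (r)
        yield ("r:%s" % n, primary, n)
        # replace primary (f, r, f)
        yield ("f r:%s f" % n, n, secondary)
        # failover and replace secondary (f, r)
        yield ("f r:%s" % n, secondary, n)
        # replace secondary and failover (r, f)
        yield ("r:%s f" % n, n, primary)


def best_step(placement, nodes, score):
    """Return (move, score) for the best single move.

    placement maps instance name to (primary, secondary); score is a
    hypothetical callback returning the cluster score of a placement.
    move is None when no single move lowers the score.
    """
    best = None
    best_score = score(placement)
    for inst, (pri, sec) in sorted(placement.items()):
        for actions, new_pri, new_sec in candidate_moves(pri, sec, nodes):
            trial = dict(placement)
            trial[inst] = (new_pri, new_sec)
            s = score(trial)
            if s < best_score:
                best, best_score = (inst, actions, new_pri, new_sec), s
    return best, best_score
```
.fi
.in

In this sketch each instance yields one failover plus four moves per
candidate node, so a step stays linear in the number of nodes, unlike
the replace-both-nodes move.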
.SS CLUSTER SCORING

As said before, the algorithm tries to minimise the cluster score at
each step. Currently this score is computed as a sum of the following
components:
.RS 4
.TP 3
\(em
coefficient of variation of the percent of free memory
.TP
\(em
coefficient of variation of the percent of reserved memory
.TP
\(em
coefficient of variation of the percent of free disk
.TP
\(em
percentage of nodes failing N+1 check
.TP
\(em
percentage of instances living (either as primary or secondary) on
offline nodes
.RE

The free memory and free disk values help ensure that all nodes are
somewhat balanced in their resource usage. The reserved memory helps
to ensure that nodes are somewhat balanced in holding secondary
instances, and that no node keeps too much memory reserved for
N+1. And finally, the N+1 percentage helps guide the algorithm towards
eliminating N+1 failures, if possible.

Except for the N+1 failures and offline instances percentage, we use
the coefficient of variation since this brings the values into the same
unit so to speak, and with a restricted domain of values (between zero
and one). The percentage of N+1 failures, while also in this numeric
range, doesn't actually have the same meaning, but it has been shown to
work well.
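
The sum described above can be sketched in Python as follows. This is
only an illustration: the dictionary keys, the function names and the
use of the population standard deviation are assumptions of this
example, and the two percentages are expected as fractions between
zero and one.

.in +4n
.nf
```python
from statistics import mean, pstdev


def coeff_of_variation(values):
    """Standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def cluster_score(nodes, n1_failed_pct, offline_inst_pct):
    """Sum of the five score components.

    `nodes` is a list of dicts with the hypothetical keys p_fmem,
    p_rmem and p_fdsk (fractions between 0 and 1).
    """
    return (coeff_of_variation([n["p_fmem"] for n in nodes])
            + coeff_of_variation([n["p_rmem"] for n in nodes])
            + coeff_of_variation([n["p_fdsk"] for n in nodes])
            + n1_failed_pct
            + offline_inst_pct)
```
.fi
.in

With identical values on every node and no N+1 failures or offline
instances, every term of this sum is zero.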

The other alternative, using for N+1 checks the coefficient of
variation of (N+1 fail=1, N+1 pass=0) across nodes, could hint the
algorithm to make more N+1 failures if most nodes are N+1 fail
already. Since this (making N+1 failures) is not allowed by other
rules of the algorithm, the N+1 checks would simply not work
anymore in this case.
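
A small Python experiment makes this effect visible. It is a sketch
only: the helper names are invented for this example, and the
population standard deviation is an assumption.

.in +4n
.nf
```python
from statistics import mean, pstdev


def cv(values):
    """Standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def n1_flags(failing, total):
    """1 for each node failing N+1, 0 for each node passing."""
    return [1] * failing + [0] * (total - failing)


# 20 nodes: first half of them failing N+1, then three quarters.
half_failing = cv(n1_flags(10, 20))
most_failing = cv(n1_flags(15, 20))
```
.fi
.in

Here half_failing is 1.0 while most_failing is about 0.58, so a
score built on this measure would reward adding failures, whereas the
plain percentage rises from 0.5 to 0.75.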

The offline instances percentage (meaning the percentage of instances
living on offline nodes) will cause the algorithm to actively move
instances away from offline nodes. This, coupled with the restriction
on placement given by offline nodes, will cause evacuation of such
nodes.

On a perfectly balanced cluster (all nodes the same size, all
instances the same size and spread across the nodes equally), all
values would be zero. This doesn't happen too often in practice :)
.SS OFFLINE INSTANCES

Since current Ganeti versions do not report the memory used by offline
(down) instances, ignoring the run status of instances will cause
wrong calculations. For this reason, the algorithm subtracts the
memory size of down instances from the free node memory of their
primary node, in effect simulating the startup of such instances.
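
The adjustment can be pictured with the following Python sketch. The
data layout (a dictionary of free memory per node and a list of
instance tuples) and the function name are assumptions made for the
example.

.in +4n
.nf
```python
def adjusted_free_memory(nodes, instances):
    """Return {node: free memory} after reserving room for down instances.

    nodes:     {node_name: reported free memory}
    instances: iterable of (name, memory, primary_node, is_running)
    """
    free = dict(nodes)
    for _name, mem, primary, running in instances:
        if not running:
            # Pretend the instance has been started on its primary node.
            free[primary] -= mem
    return free
```
.fi
.in

Running instances are left alone, since their memory is already
missing from the free memory reported for the node.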
.SS OTHER POSSIBLE METRICS

It would be desirable to add more metrics to the algorithm, especially
dynamically-computed metrics, such as:
.RS 4
.TP 3
\(em
CPU usage of instances, combined with VCPU versus PCPU count
.TP
\(em
Disk IO usage
.TP
\(em
Network IO
.RE
.SH OPTIONS
The options that can be passed to the program are as follows:
.TP
.B -C, --print-commands
Print the command list at the end of the run. Without this, the
program will only show a shorter, but cryptic, output.
.TP
.B -p, --print-nodes
Print the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

The node list will contain the following information:
.RS
.TP
.B F
a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
the amount of memory which seems to be in use, but for which it cannot
be determined why or by which instance; usually this means that the
hypervisor has some overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pri
number of primary instances
.TP
.B sec
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.RE
.TP
.B -o, --oneline
Show only a one-line output from the program, designed for the case
when one wants to look at multiple clusters at once and check their
status.

The line will contain four fields:
.RS
.RS 4
.TP 3
\(em
initial cluster score
.TP
\(em
number of steps in the solution
.TP
\(em
final cluster score
.TP
\(em
improvement in the cluster score
.RE
.RE
.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR. This means a couple of things:
.RS
.RS 4
.TP 3
\(em
instances won't be placed on these nodes, not even temporarily;
e.g. the \fIreplace primary\fR move is not available if the secondary
node is offline, since this move requires a failover.
.TP
\(em
these nodes will not be included in the score calculation (except for
the percentage of instances on offline nodes)
.RE
Note that hbal will also mark as offline any nodes which are reported
by RAPI as such, or that have "?" in any numeric field of the
file-based input.
.RE
.TP
.BI "-e " score ", --min-score=" score
This parameter denotes the minimum score we are happy with and alters
the computation in two ways:
.RS
.RS 4
.TP 3
\(em
if the cluster has an initial score lower than this value, then we
don't enter the algorithm at all, and exit with success
.TP
\(em
during the iterative process, if we reach a score lower than this
value, we exit the algorithm
.RE
The default value of the parameter is currently \fI1e-9\fR (chosen
empirically).
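
The two effects of this threshold can be sketched in Python as below.
The function name and the precomputed list of step scores are
hypothetical stand-ins for the real search, used only to show where
the two checks sit.

.in +4n
.nf
```python
def balance(initial_score, step_scores, min_score=1e-9):
    """Return the scores of the steps that would be kept.

    step_scores is the score reached after each successive step, given
    here as a precomputed list instead of an actual search.
    """
    kept = []
    if initial_score < min_score:
        # Already good enough: do not enter the algorithm at all.
        return kept
    for s in step_scores:
        kept.append(s)
        if s < min_score:
            # Target reached: stop iterating.
            break
    return kept
```
.fi
.in

Under these assumptions, a higher threshold simply shortens the
solution; it never changes the steps that precede the cut-off.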
.RE

.TP
.BI "-n " nodefile ", --nodes=" nodefile
The name of the file holding node information (if not collecting via
RAPI), instead of the default \fInodes\fR file (but see below how to
customize the default value via the environment).

.TP
.BI "-i " instancefile ", --instances=" instancefile
The name of the file holding instance information (if not collecting
via RAPI), instead of the default \fIinstances\fR file (but see below
how to customize the default value via the environment).

.TP
.BI "-m " cluster
Collect data not from files but directly from the
.I cluster
given as an argument via RAPI. If the argument doesn't contain a colon
(:), then it is converted into a fully-built URL via prepending
https:// and appending the default RAPI port, otherwise it's
considered a fully-specified URL and is used unchanged.
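
The rule can be sketched in Python as follows. The function name is
invented for this example, and the port number 5080 is an assumption:
this page only speaks of "the default RAPI port".

.in +4n
.nf
```python
DEFAULT_RAPI_PORT = 5080  # assumption, not stated in this page


def rapi_url(cluster, port=DEFAULT_RAPI_PORT):
    """Expand a bare cluster name into a full URL; leave URLs alone."""
    if ":" in cluster:
        # Contains a colon: treated as a fully-specified URL.
        return cluster
    return "https://%s:%d" % (cluster, port)
```
.fi
.in

Note that, by this rule, a bare host:port pair also counts as fully
specified and is not given an https:// prefix.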

.TP
.BI "-l " N ", --max-length=" N
Restrict the solution to this length. This can be used for example to
automate the execution of the balancing.

.TP
.B -v, --verbose
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one.

.TP
.B -q, --quiet
Decrease the output verbosity. Each usage of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.

.TP
.B -V, --version
Just show the program version and exit.
.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).

.SH ENVIRONMENT

If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
present in the environment, they will override the default names for
the nodes and instances files. These will of course have no effect
when RAPI is used.
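
The lookup of the default file names can be pictured with this short
Python sketch; the function name and the optional environ argument
are assumptions made so that the example is easy to try out.

.in +4n
.nf
```python
import os


def default_files(environ=None):
    """Return (nodes_file, instances_file) honouring the environment."""
    env = os.environ if environ is None else environ
    return (env.get("HTOOLS_NODES", "nodes"),
            env.get("HTOOLS_INSTANCES", "instances"))
```
.fi
.in

These are only defaults: the file names given with the \fB-n\fR and
\fB-i\fR options are used instead of them.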
.SH BUGS

The program does not check its input data for consistency, and aborts
with cryptic error messages when the data is inconsistent.

The algorithm is not perfect.

The algorithm doesn't deal with non-\fBdrbd\fR instances, and chokes
on input data which has such instances.

The output format is not easily scriptable, and the program should
feed moves directly into Ganeti (either via RAPI or via a gnt-debug
input file).

.SH EXAMPLE

Note that these examples are not for the latest version (they don't
have full node data).
.SS Default output

With the default options, the program shows each individual step and
the improvements it brings in cluster score:

.in +4n
.nf
.RB "$" " hbal"
Loaded 20 nodes, 80 instances
Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy.
Initial score: 0.52329131
Trying to minimize the CV...
    1. instance14  node1:node10  => node16:node10 0.42109120 a=f r:node16 f
    2. instance54  node4:node15  => node16:node15 0.31904594 a=f r:node16 f
    3. instance4   node5:node2   => node2:node16  0.26611015 a=f r:node16
    4. instance48  node18:node20 => node2:node18  0.21361717 a=r:node2 f
    5. instance93  node19:node18 => node16:node19 0.16166425 a=r:node16 f
    6. instance89  node3:node20  => node2:node3   0.11005629 a=r:node2 f
    7. instance5   node6:node2   => node16:node6  0.05841589 a=r:node16 f
    8. instance94  node7:node20  => node20:node16 0.00658759 a=f r:node16
    9. instance44  node20:node2  => node2:node15  0.00438740 a=f r:node15
   10. instance62  node14:node18 => node14:node16 0.00390087 a=r:node16
   11. instance13  node11:node14 => node11:node16 0.00361787 a=r:node16
   12. instance19  node10:node11 => node10:node7  0.00336636 a=r:node7
   13. instance43  node12:node13 => node12:node1  0.00305681 a=r:node1
   14. instance1   node1:node2   => node1:node4   0.00263124 a=r:node4
   15. instance58  node19:node20 => node19:node17 0.00252594 a=r:node17
Cluster score improved from 0.52329131 to 0.00252594
.fi
.in

In the above output, we can see:
  - the input data (here from files) shows a cluster with 20 nodes and
    80 instances
  - the cluster is not initially N+1 compliant
  - the initial score is 0.52329131

The step list follows, showing the instance, its initial
primary/secondary nodes, the new primary/secondary nodes, the cluster
score, and the actions taken in this step (with 'f' denoting
failover/migrate and 'r' denoting replace secondary).
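
To make the notation concrete, the following Python sketch turns one
step line into the commands it stands for. The function name is
invented, and the mapping (f to gnt-instance migrate, r:node to
gnt-instance replace-disks -n node) is inferred from the example
output in this page rather than taken from the hbal sources.

.in +4n
.nf
```python
def step_commands(line):
    """Turn one step line into the gnt-instance commands it stands for."""
    # Everything after " a=" is the space-separated action list.
    head, actions = line.split(" a=")
    # Fields before it: step number, instance, old nodes, "=>", ...
    instance = head.split()[1]
    commands = []
    for action in actions.split():
        if action == "f":
            commands.append("gnt-instance migrate %s" % instance)
        elif action.startswith("r:"):
            commands.append("gnt-instance replace-disks -n %s %s"
                            % (action[2:], instance))
        else:
            raise ValueError("unknown action: %r" % action)
    return commands
```
.fi
.in

Applied to step 3 of the listing, this gives a migrate of instance4
followed by a replace-disks of instance4 to node16.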

Finally, the program shows the improvement in cluster score.

A more detailed output is obtained via the \fB-C\fR and \fB-p\fR options:

.in +4n
.nf
.RB "$" " hbal -C -p"
Loaded 20 nodes, 80 instances
Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy.
Initial cluster status:
N1 Name   t_mem f_mem r_mem t_dsk f_dsk pri sec  p_fmem  p_fdsk
 * node1  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
   node2  32762 31280 12000  1861  1026   0   8 0.95476 0.55179
 * node3  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
 * node4  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
 * node5  32762  1280  6000  1861   978   5   5 0.03907 0.52573
 * node6  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
 * node7  32762  1280  6000  1861  1026   5   3 0.03907 0.55179
   node8  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node9  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
 * node10 32762  7280 12000  1861  1026   4   4 0.22221 0.55179
   node11 32762  7280  6000  1861   922   4   5 0.22221 0.49577
   node12 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node13 32762  7280  6000  1861   922   4   5 0.22221 0.49577
   node14 32762  7280  6000  1861   922   4   5 0.22221 0.49577
 * node15 32762  7280 12000  1861  1131   4   3 0.22221 0.60782
   node16 32762 31280     0  1861  1860   0   0 0.95476 1.00000
   node17 32762  7280  6000  1861  1106   5   3 0.22221 0.59479
 * node18 32762  1280  6000  1396   561   5   3 0.03907 0.40239
 * node19 32762  1280  6000  1861  1026   5   3 0.03907 0.55179
   node20 32762 13280 12000  1861   689   3   9 0.40535 0.37068

Initial score: 0.52329131
Trying to minimize the CV...
    1. instance14  node1:node10  => node16:node10 0.42109120 a=f r:node16 f
    2. instance54  node4:node15  => node16:node15 0.31904594 a=f r:node16 f
    3. instance4   node5:node2   => node2:node16  0.26611015 a=f r:node16
    4. instance48  node18:node20 => node2:node18  0.21361717 a=r:node2 f
    5. instance93  node19:node18 => node16:node19 0.16166425 a=r:node16 f
    6. instance89  node3:node20  => node2:node3   0.11005629 a=r:node2 f
    7. instance5   node6:node2   => node16:node6  0.05841589 a=r:node16 f
    8. instance94  node7:node20  => node20:node16 0.00658759 a=f r:node16
    9. instance44  node20:node2  => node2:node15  0.00438740 a=f r:node15
   10. instance62  node14:node18 => node14:node16 0.00390087 a=r:node16
   11. instance13  node11:node14 => node11:node16 0.00361787 a=r:node16
   12. instance19  node10:node11 => node10:node7  0.00336636 a=r:node7
   13. instance43  node12:node13 => node12:node1  0.00305681 a=r:node1
   14. instance1   node1:node2   => node1:node4   0.00263124 a=r:node4
   15. instance58  node19:node20 => node19:node17 0.00252594 a=r:node17
Cluster score improved from 0.52329131 to 0.00252594

Commands to run to reach the above solution:
  echo step 1
  echo gnt-instance migrate instance14
  echo gnt-instance replace-disks -n node16 instance14
  echo gnt-instance migrate instance14
  echo step 2
  echo gnt-instance migrate instance54
  echo gnt-instance replace-disks -n node16 instance54
  echo gnt-instance migrate instance54
  echo step 3
  echo gnt-instance migrate instance4
  echo gnt-instance replace-disks -n node16 instance4
  echo step 4
  echo gnt-instance replace-disks -n node2 instance48
  echo gnt-instance migrate instance48
  echo step 5
  echo gnt-instance replace-disks -n node16 instance93
  echo gnt-instance migrate instance93
  echo step 6
  echo gnt-instance replace-disks -n node2 instance89
  echo gnt-instance migrate instance89
  echo step 7
  echo gnt-instance replace-disks -n node16 instance5
  echo gnt-instance migrate instance5
  echo step 8
  echo gnt-instance migrate instance94
  echo gnt-instance replace-disks -n node16 instance94
  echo step 9
  echo gnt-instance migrate instance44
  echo gnt-instance replace-disks -n node15 instance44
  echo step 10
  echo gnt-instance replace-disks -n node16 instance62
  echo step 11
  echo gnt-instance replace-disks -n node16 instance13
  echo step 12
  echo gnt-instance replace-disks -n node7 instance19
  echo step 13
  echo gnt-instance replace-disks -n node1 instance43
  echo step 14
  echo gnt-instance replace-disks -n node4 instance1
  echo step 15
  echo gnt-instance replace-disks -n node17 instance58

Final cluster status:
N1 Name   t_mem f_mem r_mem t_dsk f_dsk pri sec  p_fmem  p_fdsk
   node1  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node2  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node3  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node4  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node5  32762  7280  6000  1861  1078   4   5 0.22221 0.57947
   node6  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node7  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node8  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node9  32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node10 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node11 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
   node12 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node13 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
   node14 32762  7280  6000  1861  1022   4   4 0.22221 0.54951
   node15 32762  7280  6000  1861  1031   4   4 0.22221 0.55408
   node16 32762  7280  6000  1861  1060   4   4 0.22221 0.57007
   node17 32762  7280  6000  1861  1006   5   4 0.22221 0.54105
   node18 32762  7280  6000  1396   761   4   2 0.22221 0.54570
   node19 32762  7280  6000  1861  1026   4   4 0.22221 0.55179
   node20 32762 13280  6000  1861  1089   3   5 0.40535 0.58565

.fi
.in

Here we see, besides the step list, the initial and final cluster
status, with the final one showing all nodes being N+1 compliant, and
the command list to reach the final solution. In the initial listing,
we see which nodes are not N+1 compliant.

The algorithm is stable as long as each step above is fully completed,
e.g. in step 8, both the migrate and the replace-disks are
done. Otherwise, if only the migrate is done, the input data is
changed in a way that the program will output a different solution
list (but hopefully will end in the same state).

.SH SEE ALSO
.BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"