1 | d0003b35 | Iustin Pop | .TH HBAL 1 2009-03-23 htools "Ganeti H-tools" |
---|---|---|---|
2 | a9211170 | Iustin Pop | .SH NAME |
3 | a9211170 | Iustin Pop | hbal \- Cluster balancer for Ganeti |
4 | a9211170 | Iustin Pop | |
5 | a9211170 | Iustin Pop | .SH SYNOPSIS |
6 | a9211170 | Iustin Pop | .B hbal |
7 | a9211170 | Iustin Pop | .B "[-C]" |
8 | a9211170 | Iustin Pop | .B "[-p]" |
9 | a9211170 | Iustin Pop | .B "[-o]" |
10 | d09b6ed3 | Iustin Pop | .B "[-v... | -q]" |
11 | d2ac5526 | Iustin Pop | .BI "[-l" limit "]" |
12 | d2ac5526 | Iustin Pop | .BI "[-O" name... "]" |
13 | b0517d61 | Iustin Pop | .BI "[-e" score "]" |
14 | d2ac5526 | Iustin Pop | .BI "[-m " cluster "]" |
15 | a9211170 | Iustin Pop | .BI "[-n " nodes-file " ]" |
16 | d2ac5526 | Iustin Pop | .BI "[-i " instances-file "]" |
17 | a9211170 | Iustin Pop | |
18 | b0045e4d | Iustin Pop | .B hbal |
19 | b0045e4d | Iustin Pop | .B --version |
20 | b0045e4d | Iustin Pop | |
21 | a9211170 | Iustin Pop | .SH DESCRIPTION |
22 | a9211170 | Iustin Pop | hbal is a cluster balancer that looks at the current state of the |
23 | a9211170 | Iustin Pop | cluster (nodes with their total and free disk, memory, etc.) and |
24 | a9211170 | Iustin Pop | instance placement and computes a series of steps designed to bring |
25 | a9211170 | Iustin Pop | the cluster into a better state. |
26 | a9211170 | Iustin Pop | |
27 | a9211170 | Iustin Pop | The algorithm to do so is designed to be stable (i.e. it will give you |
28 | a9211170 | Iustin Pop | the same results when restarting it from the middle of the solution) |
29 | a9211170 | Iustin Pop | and reasonably fast. It is not, however, designed to be a perfect |
30 | a9211170 | Iustin Pop | algorithm - it is possible to make it go into a corner from which it |
31 | a9211170 | Iustin Pop | can find no improvement, because it only looks one "step" ahead. |
32 | a9211170 | Iustin Pop | |
33 | a9211170 | Iustin Pop | By default, the program will show the solution incrementally as it is |
34 | a9211170 | Iustin Pop | computed, in a somewhat cryptic format; for getting the actual Ganeti |
35 | a9211170 | Iustin Pop | command list, use the \fB-C\fR option. |
36 | a9211170 | Iustin Pop | |
37 | a9211170 | Iustin Pop | .SS ALGORITHM |
38 | a9211170 | Iustin Pop | |
39 | b0045e4d | Iustin Pop | The program works in independent steps; at each step, we compute the |
40 | a9211170 | Iustin Pop | best instance move that lowers the cluster score. |
41 | a9211170 | Iustin Pop | |
42 | a9211170 | Iustin Pop | The possible move types for an instance are combinations of |
43 | a9211170 | Iustin Pop | failover/migrate and replace-disks such that we change one of the |
44 | a9211170 | Iustin Pop | instance nodes, and the other one remains (but possibly with changed |
45 | a9211170 | Iustin Pop | role, e.g. from primary it becomes secondary). The list is: |
46 | d0003b35 | Iustin Pop | .RS 4 |
47 | d0003b35 | Iustin Pop | .TP 3 |
48 | d0003b35 | Iustin Pop | \(em |
49 | d0003b35 | Iustin Pop | failover (f) |
50 | d0003b35 | Iustin Pop | .TP |
51 | d0003b35 | Iustin Pop | \(em |
52 | d0003b35 | Iustin Pop | replace secondary (r) |
53 | d0003b35 | Iustin Pop | .TP |
54 | d0003b35 | Iustin Pop | \(em |
55 | d0003b35 | Iustin Pop | replace primary, a composite move (f, r, f) |
56 | d0003b35 | Iustin Pop | .TP |
57 | d0003b35 | Iustin Pop | \(em |
58 | d0003b35 | Iustin Pop | failover and replace secondary, also composite (f, r) |
59 | d0003b35 | Iustin Pop | .TP |
60 | d0003b35 | Iustin Pop | \(em |
61 | d0003b35 | Iustin Pop | replace secondary and failover, also composite (r, f) |
62 | d0003b35 | Iustin Pop | .RE |
63 | a9211170 | Iustin Pop | |
64 | a9211170 | Iustin Pop | We don't do the only remaining possibility of replacing both nodes |
65 | a9211170 | Iustin Pop | (r,f,r,f or the equivalent f,r,f,r) since this move needs an |
66 | a9211170 | Iustin Pop | exhaustive search over both candidate primary and secondary nodes, and |
67 | a9211170 | Iustin Pop | is O(n*n) in the number of nodes. Furthermore, it doesn't seem to |
68 | a9211170 | Iustin Pop | give better scores but will result in more disk replacements. |
69 | a9211170 | Iustin Pop | |
70 | a9211170 | Iustin Pop | .SS CLUSTER SCORING |
71 | a9211170 | Iustin Pop | |
72 | b0045e4d | Iustin Pop | As said before, the algorithm tries to minimise the cluster score at |
73 | a9211170 | Iustin Pop | each step. Currently this score is computed as a sum of the following |
74 | a9211170 | Iustin Pop | components: |
75 | d0003b35 | Iustin Pop | .RS 4 |
76 | d0003b35 | Iustin Pop | .TP 3 |
77 | d0003b35 | Iustin Pop | \(em |
78 | d0003b35 | Iustin Pop | coefficient of variance of the percent of free memory |
79 | d0003b35 | Iustin Pop | .TP |
80 | d0003b35 | Iustin Pop | \(em |
81 | d0003b35 | Iustin Pop | coefficient of variance of the percent of reserved memory |
82 | d0003b35 | Iustin Pop | .TP |
83 | d0003b35 | Iustin Pop | \(em |
84 | d0003b35 | Iustin Pop | coefficient of variance of the percent of free disk |
85 | d0003b35 | Iustin Pop | .TP |
86 | d0003b35 | Iustin Pop | \(em |
87 | d0003b35 | Iustin Pop | percentage of nodes failing N+1 check |
88 | d0003b35 | Iustin Pop | .TP |
89 | d0003b35 | Iustin Pop | \(em |
90 | d0003b35 | Iustin Pop | percentage of instances living (either as primary or secondary) on |
91 | d0003b35 | Iustin Pop | offline nodes |
92 | d0003b35 | Iustin Pop | .RE |
93 | a9211170 | Iustin Pop | |
94 | a9211170 | Iustin Pop | The free memory and free disk values help ensure that all nodes are |
95 | a9211170 | Iustin Pop | somewhat balanced in their resource usage. The reserved memory helps |
96 | a9211170 | Iustin Pop | to ensure that nodes are somewhat balanced in holding secondary |
97 | a9211170 | Iustin Pop | instances, and that no node keeps too much memory reserved for |
98 | a9211170 | Iustin Pop | N+1. And finally, the N+1 percentage helps guide the algorithm towards |
99 | a9211170 | Iustin Pop | eliminating N+1 failures, if possible. |
100 | a9211170 | Iustin Pop | |
101 | d2ac5526 | Iustin Pop | Except for the N+1 failures and offline instances percentage, we use |
102 | d2ac5526 | Iustin Pop | the coefficient of variance since this brings the values into the same |
103 | d2ac5526 | Iustin Pop | unit so to speak, and with a restricted domain of values (between zero |
104 | d2ac5526 | Iustin Pop | and one). The percentage of N+1 failures, while also in this numeric |
105 | d2ac5526 | Iustin Pop | range, doesn't actually have the same meaning, but it has been shown to work |
106 | d2ac5526 | Iustin Pop | well. |
107 | a9211170 | Iustin Pop | |
108 | a9211170 | Iustin Pop | The alternative, using for the N+1 checks the coefficient of |
109 | a9211170 | Iustin Pop | variance of (N+1 fail=1, N+1 pass=0) across nodes, could hint the |
110 | a9211170 | Iustin Pop | algorithm to make more N+1 failures if most nodes are N+1 fail |
111 | a9211170 | Iustin Pop | already. Since this (making N+1 failures) is not allowed by other |
112 | a9211170 | Iustin Pop | rules of the algorithm, the N+1 checks would simply not work |
113 | a9211170 | Iustin Pop | anymore in this case. |
114 | a9211170 | Iustin Pop | |
115 | d2ac5526 | Iustin Pop | The offline instances percentage (meaning the percentage of instances |
116 | d2ac5526 | Iustin Pop | living on offline nodes) will cause the algorithm to actively move |
117 | d2ac5526 | Iustin Pop | instances away from offline nodes. This, coupled with the restriction |
118 | d2ac5526 | Iustin Pop | on placement given by offline nodes, will cause evacuation of such |
119 | d2ac5526 | Iustin Pop | nodes. |
120 | d2ac5526 | Iustin Pop | |
121 | a9211170 | Iustin Pop | On a perfectly balanced cluster (all nodes the same size, all |
122 | a9211170 | Iustin Pop | instances the same size and spread across the nodes equally), all |
123 | a9211170 | Iustin Pop | values would be zero. This doesn't happen too often in practice :) |
124 | a9211170 | Iustin Pop | |
125 | d0003b35 | Iustin Pop | .SS OFFLINE INSTANCES |
126 | d0003b35 | Iustin Pop | |
127 | d0003b35 | Iustin Pop | Since current Ganeti versions do not report the memory used by offline |
128 | d0003b35 | Iustin Pop | (down) instances, ignoring the run status of instances will cause |
129 | d0003b35 | Iustin Pop | wrong calculations. For this reason, the algorithm subtracts the |
130 | d0003b35 | Iustin Pop | memory size of down instances from the free node memory of their |
131 | d0003b35 | Iustin Pop | primary node, in effect simulating the startup of such instances. |
132 | d0003b35 | Iustin Pop | |
133 | a9211170 | Iustin Pop | .SS OTHER POSSIBLE METRICS |
134 | a9211170 | Iustin Pop | |
135 | a9211170 | Iustin Pop | It would be desirable to add more metrics to the algorithm, especially |
136 | a9211170 | Iustin Pop | dynamically-computed metrics, such as: |
137 | d0003b35 | Iustin Pop | .RS 4 |
138 | d0003b35 | Iustin Pop | .TP 3 |
139 | d0003b35 | Iustin Pop | \(em |
140 | d0003b35 | Iustin Pop | CPU usage of instances, combined with VCPU versus PCPU count |
141 | d0003b35 | Iustin Pop | .TP |
142 | d0003b35 | Iustin Pop | \(em |
143 | d0003b35 | Iustin Pop | Disk IO usage |
144 | d0003b35 | Iustin Pop | .TP |
145 | d0003b35 | Iustin Pop | \(em |
146 | d0003b35 | Iustin Pop | Network IO |
147 | d0003b35 | Iustin Pop | .RE |
148 | a9211170 | Iustin Pop | |
149 | a9211170 | Iustin Pop | .SH OPTIONS |
150 | a9211170 | Iustin Pop | The options that can be passed to the program are as follows: |
151 | a9211170 | Iustin Pop | .TP |
152 | a9211170 | Iustin Pop | .B -C, --print-commands |
153 | a9211170 | Iustin Pop | Print the command list at the end of the run. Without this, the |
154 | a9211170 | Iustin Pop | program will only show a shorter, but cryptic output. |
155 | a9211170 | Iustin Pop | .TP |
156 | a9211170 | Iustin Pop | .B -p, --print-nodes |
157 | a9211170 | Iustin Pop | Prints the before and after node status, in a format designed to allow |
158 | a9211170 | Iustin Pop | the user to understand the node's most important parameters. |
159 | a9211170 | Iustin Pop | |
160 | a9211170 | Iustin Pop | The node list will contain the following information: |
161 | d2ac5526 | Iustin Pop | .RS |
162 | d2ac5526 | Iustin Pop | .TP |
163 | d2ac5526 | Iustin Pop | .B F |
164 | d2ac5526 | Iustin Pop | a character denoting the status of the node, with '-' meaning an |
165 | d2ac5526 | Iustin Pop | offline node, '*' meaning N+1 failure and blank meaning a good node |
166 | d2ac5526 | Iustin Pop | .TP |
167 | d2ac5526 | Iustin Pop | .B Name |
168 | d2ac5526 | Iustin Pop | the node name |
169 | d2ac5526 | Iustin Pop | .TP |
170 | d2ac5526 | Iustin Pop | .B t_mem |
171 | d2ac5526 | Iustin Pop | the total node memory |
172 | d2ac5526 | Iustin Pop | .TP |
173 | d2ac5526 | Iustin Pop | .B n_mem |
174 | d2ac5526 | Iustin Pop | the memory used by the node itself |
175 | d2ac5526 | Iustin Pop | .TP |
176 | d2ac5526 | Iustin Pop | .B i_mem |
177 | d2ac5526 | Iustin Pop | the memory used by instances |
178 | d2ac5526 | Iustin Pop | .TP |
179 | d2ac5526 | Iustin Pop | .B x_mem |
180 | d2ac5526 | Iustin Pop | amount of memory which seems to be in use, although it cannot be determined why or |
181 | d2ac5526 | Iustin Pop | by which instance; usually this means that the hypervisor has some |
182 | d2ac5526 | Iustin Pop | overhead or that there are other reporting errors |
183 | d2ac5526 | Iustin Pop | .TP |
184 | d2ac5526 | Iustin Pop | .B f_mem |
185 | d2ac5526 | Iustin Pop | the free node memory |
186 | d2ac5526 | Iustin Pop | .TP |
187 | d2ac5526 | Iustin Pop | .B r_mem |
188 | d2ac5526 | Iustin Pop | the reserved node memory, which is the amount of free memory needed |
189 | d2ac5526 | Iustin Pop | for N+1 compliance |
190 | d2ac5526 | Iustin Pop | .TP |
191 | d2ac5526 | Iustin Pop | .B t_dsk |
192 | d2ac5526 | Iustin Pop | total disk |
193 | d2ac5526 | Iustin Pop | .TP |
194 | d2ac5526 | Iustin Pop | .B f_dsk |
195 | d2ac5526 | Iustin Pop | free disk |
196 | d2ac5526 | Iustin Pop | .TP |
197 | d2ac5526 | Iustin Pop | .B pri |
198 | d2ac5526 | Iustin Pop | number of primary instances |
199 | d2ac5526 | Iustin Pop | .TP |
200 | d2ac5526 | Iustin Pop | .B sec |
201 | d2ac5526 | Iustin Pop | number of secondary instances |
202 | d2ac5526 | Iustin Pop | .TP |
203 | d2ac5526 | Iustin Pop | .B p_fmem |
204 | d2ac5526 | Iustin Pop | percent of free memory |
205 | d2ac5526 | Iustin Pop | .TP |
206 | d2ac5526 | Iustin Pop | .B p_fdsk |
207 | d2ac5526 | Iustin Pop | percent of free disk |
208 | d2ac5526 | Iustin Pop | .RE |
209 | a9211170 | Iustin Pop | |
210 | a9211170 | Iustin Pop | .TP |
211 | a9211170 | Iustin Pop | .B -o, --oneline |
212 | a9211170 | Iustin Pop | Only shows a one-line output from the program, designed for the case |
213 | a9211170 | Iustin Pop | when one wants to look at multiple clusters at once and check their |
214 | a9211170 | Iustin Pop | status. |
215 | a9211170 | Iustin Pop | |
216 | a9211170 | Iustin Pop | The line will contain four fields: |
217 | d0003b35 | Iustin Pop | .RS |
218 | d0003b35 | Iustin Pop | .RS 4 |
219 | d0003b35 | Iustin Pop | .TP 3 |
220 | d0003b35 | Iustin Pop | \(em |
221 | d0003b35 | Iustin Pop | initial cluster score |
222 | d0003b35 | Iustin Pop | .TP |
223 | d0003b35 | Iustin Pop | \(em |
224 | d0003b35 | Iustin Pop | number of steps in the solution |
225 | d0003b35 | Iustin Pop | .TP |
226 | d0003b35 | Iustin Pop | \(em |
227 | d0003b35 | Iustin Pop | final cluster score |
228 | d0003b35 | Iustin Pop | .TP |
229 | d0003b35 | Iustin Pop | \(em |
230 | d0003b35 | Iustin Pop | improvement in the cluster score |
231 | d0003b35 | Iustin Pop | .RE |
232 | d0003b35 | Iustin Pop | .RE |
233 | a9211170 | Iustin Pop | |
234 | a9211170 | Iustin Pop | .TP |
235 | d2ac5526 | Iustin Pop | .BI "-O " name |
236 | d2ac5526 | Iustin Pop | This option (which can be given multiple times) will mark nodes as |
237 | d2ac5526 | Iustin Pop | being \fIoffline\fR. This means a couple of things: |
238 | d2ac5526 | Iustin Pop | .RS |
239 | d0003b35 | Iustin Pop | .RS 4 |
240 | d0003b35 | Iustin Pop | .TP 3 |
241 | d0003b35 | Iustin Pop | \(em |
242 | d2ac5526 | Iustin Pop | instances won't be placed on these nodes, not even temporarily; |
243 | d2ac5526 | Iustin Pop | e.g. the \fIreplace primary\fR move is not available if the secondary |
244 | d2ac5526 | Iustin Pop | node is offline, since this move requires a failover. |
245 | d2ac5526 | Iustin Pop | .TP |
246 | d0003b35 | Iustin Pop | \(em |
247 | d2ac5526 | Iustin Pop | these nodes will not be included in the score calculation (except for |
248 | d2ac5526 | Iustin Pop | the percentage of instances on offline nodes) |
249 | d2ac5526 | Iustin Pop | .RE |
250 | d0003b35 | Iustin Pop | .RE |
251 | d2ac5526 | Iustin Pop | |
252 | d2ac5526 | Iustin Pop | .TP |
253 | b0517d61 | Iustin Pop | .BI "-e" score ", --min-score=" score |
254 | b0517d61 | Iustin Pop | This parameter denotes the minimum score we are happy with and alters |
255 | b0517d61 | Iustin Pop | the computation in two ways: |
256 | b0517d61 | Iustin Pop | .RS |
257 | b0517d61 | Iustin Pop | .RS 4 |
258 | b0517d61 | Iustin Pop | .TP 3 |
259 | b0517d61 | Iustin Pop | \(em |
260 | b0517d61 | Iustin Pop | if the cluster has the initial score lower than this value, then we |
261 | b0517d61 | Iustin Pop | don't enter the algorithm at all, and exit with success |
262 | b0517d61 | Iustin Pop | .TP |
263 | b0517d61 | Iustin Pop | \(em |
264 | b0517d61 | Iustin Pop | during the iterative process, if we reach a score lower than this |
265 | b0517d61 | Iustin Pop | value, we exit the algorithm |
266 | b0517d61 | Iustin Pop | .RE |
267 | b0517d61 | Iustin Pop | The default value of the parameter is currently \fI1e-9\fR (chosen |
268 | b0517d61 | Iustin Pop | empirically). |
269 | b0517d61 | Iustin Pop | .RE |
270 | b0517d61 | Iustin Pop | |
271 | b0517d61 | Iustin Pop | .TP |
272 | a9211170 | Iustin Pop | .BI "-n" nodefile ", --nodes=" nodefile |
273 | a9211170 | Iustin Pop | The name of the file holding node information (if not collecting via |
274 | 7b255913 | Iustin Pop | RAPI), instead of the default \fInodes\fR file (but see below how to |
275 | 7b255913 | Iustin Pop | customize the default value via the environment). |
276 | a9211170 | Iustin Pop | |
277 | a9211170 | Iustin Pop | .TP |
278 | a9211170 | Iustin Pop | .BI "-i" instancefile ", --instances=" instancefile |
279 | a9211170 | Iustin Pop | The name of the file holding instance information (if not collecting |
280 | 7b255913 | Iustin Pop | via RAPI), instead of the default \fIinstances\fR file (but see below |
281 | 7b255913 | Iustin Pop | how to customize the default value via the environment). |
282 | a9211170 | Iustin Pop | |
283 | a9211170 | Iustin Pop | .TP |
284 | a9211170 | Iustin Pop | .BI "-m" cluster |
285 | a9211170 | Iustin Pop | Collect data not from files but directly from the |
286 | a9211170 | Iustin Pop | .I cluster |
287 | a9211170 | Iustin Pop | given as an argument via RAPI. This works for both Ganeti 1.2 and |
288 | a9211170 | Iustin Pop | Ganeti 2.0. |
289 | a9211170 | Iustin Pop | |
290 | a9211170 | Iustin Pop | .TP |
291 | a9211170 | Iustin Pop | .BI "-l" N ", --max-length=" N |
292 | a9211170 | Iustin Pop | Restrict the solution to this length. This can be used for example to |
293 | a9211170 | Iustin Pop | automate the execution of the balancing. |
294 | a9211170 | Iustin Pop | |
295 | a9211170 | Iustin Pop | .TP |
296 | a9211170 | Iustin Pop | .B -v, --verbose |
297 | a9211170 | Iustin Pop | Increase the output verbosity. Each usage of this option will increase |
298 | a9211170 | Iustin Pop | the verbosity (currently more than 2 doesn't make sense) from the |
299 | d09b6ed3 | Iustin Pop | default of one. |
300 | d09b6ed3 | Iustin Pop | |
301 | d09b6ed3 | Iustin Pop | .TP |
302 | d09b6ed3 | Iustin Pop | .B -q, --quiet |
303 | d09b6ed3 | Iustin Pop | Decrease the output verbosity. Each usage of this option will decrease |
304 | d09b6ed3 | Iustin Pop | the verbosity (less than zero doesn't make sense) from the default of |
305 | d09b6ed3 | Iustin Pop | one. |
306 | a9211170 | Iustin Pop | |
307 | a9211170 | Iustin Pop | .TP |
308 | a9211170 | Iustin Pop | .B -V, --version |
309 | a9211170 | Iustin Pop | Just show the program version and exit. |
310 | a9211170 | Iustin Pop | |
311 | a9211170 | Iustin Pop | .SH EXIT STATUS |
312 | a9211170 | Iustin Pop | |
313 | a9211170 | Iustin Pop | The exit status of the command will be zero, unless for some reason |
314 | a9211170 | Iustin Pop | the algorithm fatally failed (e.g. wrong node or instance data). |
315 | a9211170 | Iustin Pop | |
316 | 7b255913 | Iustin Pop | .SH ENVIRONMENT |
317 | 7b255913 | Iustin Pop | |
318 | 7b255913 | Iustin Pop | If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are |
319 | 7b255913 | Iustin Pop | present in the environment, they will override the default names for |
320 | 7b255913 | Iustin Pop | the nodes and instances files. These will of course have no effect |
321 | 7b255913 | Iustin Pop | when RAPI is used. |
322 | 7b255913 | Iustin Pop | |
323 | a9211170 | Iustin Pop | .SH BUGS |
324 | a9211170 | Iustin Pop | |
325 | a9211170 | Iustin Pop | The program does not check its input data for consistency, and aborts |
326 | a9211170 | Iustin Pop | with cryptic error messages in this case. |
327 | a9211170 | Iustin Pop | |
328 | a9211170 | Iustin Pop | The algorithm is not perfect. |
329 | a9211170 | Iustin Pop | |
330 | d0003b35 | Iustin Pop | The algorithm doesn't deal with non-\fBdrbd\fR instances, and chokes |
331 | d0003b35 | Iustin Pop | on input data which has such instances. |
332 | d0003b35 | Iustin Pop | |
333 | a9211170 | Iustin Pop | The output format is not easily scriptable, and the program should |
334 | a9211170 | Iustin Pop | feed moves directly into Ganeti (either via RAPI or via a gnt-debug |
335 | a9211170 | Iustin Pop | input file). |
336 | a9211170 | Iustin Pop | |
337 | a9211170 | Iustin Pop | .SH EXAMPLE |
338 | a9211170 | Iustin Pop | |
339 | d2ac5526 | Iustin Pop | Note that these examples are not for the latest version (they don't have |
340 | d2ac5526 | Iustin Pop | full node data). |
341 | d2ac5526 | Iustin Pop | |
342 | a9211170 | Iustin Pop | .SS Default output |
343 | a9211170 | Iustin Pop | |
344 | a9211170 | Iustin Pop | With the default options, the program shows each individual step and |
345 | a9211170 | Iustin Pop | the improvements it brings in cluster score: |
346 | a9211170 | Iustin Pop | |
347 | a9211170 | Iustin Pop | .in +4n |
348 | a9211170 | Iustin Pop | .nf |
349 | a9211170 | Iustin Pop | .RB "$" " hbal" |
350 | a9211170 | Iustin Pop | Loaded 20 nodes, 80 instances |
351 | a9211170 | Iustin Pop | Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy. |
352 | a9211170 | Iustin Pop | Initial score: 0.52329131 |
353 | a9211170 | Iustin Pop | Trying to minimize the CV... |
354 | a9211170 | Iustin Pop | 1. instance14 node1:node10 => node16:node10 0.42109120 a=f r:node16 f |
355 | a9211170 | Iustin Pop | 2. instance54 node4:node15 => node16:node15 0.31904594 a=f r:node16 f |
356 | a9211170 | Iustin Pop | 3. instance4 node5:node2 => node2:node16 0.26611015 a=f r:node16 |
357 | a9211170 | Iustin Pop | 4. instance48 node18:node20 => node2:node18 0.21361717 a=r:node2 f |
358 | a9211170 | Iustin Pop | 5. instance93 node19:node18 => node16:node19 0.16166425 a=r:node16 f |
359 | a9211170 | Iustin Pop | 6. instance89 node3:node20 => node2:node3 0.11005629 a=r:node2 f |
360 | a9211170 | Iustin Pop | 7. instance5 node6:node2 => node16:node6 0.05841589 a=r:node16 f |
361 | a9211170 | Iustin Pop | 8. instance94 node7:node20 => node20:node16 0.00658759 a=f r:node16 |
362 | a9211170 | Iustin Pop | 9. instance44 node20:node2 => node2:node15 0.00438740 a=f r:node15 |
363 | a9211170 | Iustin Pop | 10. instance62 node14:node18 => node14:node16 0.00390087 a=r:node16 |
364 | a9211170 | Iustin Pop | 11. instance13 node11:node14 => node11:node16 0.00361787 a=r:node16 |
365 | a9211170 | Iustin Pop | 12. instance19 node10:node11 => node10:node7 0.00336636 a=r:node7 |
366 | a9211170 | Iustin Pop | 13. instance43 node12:node13 => node12:node1 0.00305681 a=r:node1 |
367 | a9211170 | Iustin Pop | 14. instance1 node1:node2 => node1:node4 0.00263124 a=r:node4 |
368 | a9211170 | Iustin Pop | 15. instance58 node19:node20 => node19:node17 0.00252594 a=r:node17 |
369 | a9211170 | Iustin Pop | Cluster score improved from 0.52329131 to 0.00252594 |
370 | a9211170 | Iustin Pop | .fi |
371 | a9211170 | Iustin Pop | .in |
372 | a9211170 | Iustin Pop | |
373 | a9211170 | Iustin Pop | In the above output, we can see: |
374 | a9211170 | Iustin Pop | - the input data (here from files) shows a cluster with 20 nodes and |
375 | a9211170 | Iustin Pop | 80 instances |
376 | a9211170 | Iustin Pop | - the cluster is not initially N+1 compliant |
377 | a9211170 | Iustin Pop | - the initial score is 0.52329131 |
378 | a9211170 | Iustin Pop | |
379 | a9211170 | Iustin Pop | The step list follows, showing the instance, its initial |
380 | a9211170 | Iustin Pop | primary/secondary nodes, the new primary/secondary nodes, the cluster score, |
381 | a9211170 | Iustin Pop | and the actions taken in this step (with 'f' denoting failover/migrate |
382 | a9211170 | Iustin Pop | and 'r' denoting replace secondary). |
383 | a9211170 | Iustin Pop | |
384 | a9211170 | Iustin Pop | Finally, the program shows the improvement in cluster score. |
385 | a9211170 | Iustin Pop | |
386 | a9211170 | Iustin Pop | A more detailed output is obtained via the \fB-C\fR and \fB-p\fR options: |
387 | a9211170 | Iustin Pop | |
388 | a9211170 | Iustin Pop | .in +4n |
389 | a9211170 | Iustin Pop | .nf |
390 | a9211170 | Iustin Pop | .RB "$" " hbal -C -p" |
391 | a9211170 | Iustin Pop | Loaded 20 nodes, 80 instances |
392 | a9211170 | Iustin Pop | Cluster is not N+1 happy, continuing but no guarantee that the cluster will end N+1 happy. |
393 | a9211170 | Iustin Pop | Initial cluster status: |
394 | a9211170 | Iustin Pop | N1 Name t_mem f_mem r_mem t_dsk f_dsk pri sec p_fmem p_fdsk |
395 | a9211170 | Iustin Pop | * node1 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
396 | a9211170 | Iustin Pop | node2 32762 31280 12000 1861 1026 0 8 0.95476 0.55179 |
397 | a9211170 | Iustin Pop | * node3 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
398 | a9211170 | Iustin Pop | * node4 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
399 | a9211170 | Iustin Pop | * node5 32762 1280 6000 1861 978 5 5 0.03907 0.52573 |
400 | a9211170 | Iustin Pop | * node6 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
401 | a9211170 | Iustin Pop | * node7 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
402 | a9211170 | Iustin Pop | node8 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
403 | a9211170 | Iustin Pop | node9 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
404 | a9211170 | Iustin Pop | * node10 32762 7280 12000 1861 1026 4 4 0.22221 0.55179 |
405 | a9211170 | Iustin Pop | node11 32762 7280 6000 1861 922 4 5 0.22221 0.49577 |
406 | a9211170 | Iustin Pop | node12 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
407 | a9211170 | Iustin Pop | node13 32762 7280 6000 1861 922 4 5 0.22221 0.49577 |
408 | a9211170 | Iustin Pop | node14 32762 7280 6000 1861 922 4 5 0.22221 0.49577 |
409 | a9211170 | Iustin Pop | * node15 32762 7280 12000 1861 1131 4 3 0.22221 0.60782 |
410 | a9211170 | Iustin Pop | node16 32762 31280 0 1861 1860 0 0 0.95476 1.00000 |
411 | a9211170 | Iustin Pop | node17 32762 7280 6000 1861 1106 5 3 0.22221 0.59479 |
412 | a9211170 | Iustin Pop | * node18 32762 1280 6000 1396 561 5 3 0.03907 0.40239 |
413 | a9211170 | Iustin Pop | * node19 32762 1280 6000 1861 1026 5 3 0.03907 0.55179 |
414 | a9211170 | Iustin Pop | node20 32762 13280 12000 1861 689 3 9 0.40535 0.37068 |
415 | a9211170 | Iustin Pop | |
416 | a9211170 | Iustin Pop | Initial score: 0.52329131 |
417 | a9211170 | Iustin Pop | Trying to minimize the CV... |
418 | a9211170 | Iustin Pop | 1. instance14 node1:node10 => node16:node10 0.42109120 a=f r:node16 f |
419 | a9211170 | Iustin Pop | 2. instance54 node4:node15 => node16:node15 0.31904594 a=f r:node16 f |
420 | a9211170 | Iustin Pop | 3. instance4 node5:node2 => node2:node16 0.26611015 a=f r:node16 |
421 | a9211170 | Iustin Pop | 4. instance48 node18:node20 => node2:node18 0.21361717 a=r:node2 f |
422 | a9211170 | Iustin Pop | 5. instance93 node19:node18 => node16:node19 0.16166425 a=r:node16 f |
423 | a9211170 | Iustin Pop | 6. instance89 node3:node20 => node2:node3 0.11005629 a=r:node2 f |
424 | a9211170 | Iustin Pop | 7. instance5 node6:node2 => node16:node6 0.05841589 a=r:node16 f |
425 | a9211170 | Iustin Pop | 8. instance94 node7:node20 => node20:node16 0.00658759 a=f r:node16 |
426 | a9211170 | Iustin Pop | 9. instance44 node20:node2 => node2:node15 0.00438740 a=f r:node15 |
427 | a9211170 | Iustin Pop | 10. instance62 node14:node18 => node14:node16 0.00390087 a=r:node16 |
428 | a9211170 | Iustin Pop | 11. instance13 node11:node14 => node11:node16 0.00361787 a=r:node16 |
429 | a9211170 | Iustin Pop | 12. instance19 node10:node11 => node10:node7 0.00336636 a=r:node7 |
430 | a9211170 | Iustin Pop | 13. instance43 node12:node13 => node12:node1 0.00305681 a=r:node1 |
431 | a9211170 | Iustin Pop | 14. instance1 node1:node2 => node1:node4 0.00263124 a=r:node4 |
432 | a9211170 | Iustin Pop | 15. instance58 node19:node20 => node19:node17 0.00252594 a=r:node17 |
433 | a9211170 | Iustin Pop | Cluster score improved from 0.52329131 to 0.00252594 |
434 | a9211170 | Iustin Pop | |
435 | a9211170 | Iustin Pop | Commands to run to reach the above solution: |
436 | a9211170 | Iustin Pop | echo step 1 |
437 | a9211170 | Iustin Pop | echo gnt-instance migrate instance14 |
438 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance14 |
439 | a9211170 | Iustin Pop | echo gnt-instance migrate instance14 |
440 | a9211170 | Iustin Pop | echo step 2 |
441 | a9211170 | Iustin Pop | echo gnt-instance migrate instance54 |
442 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance54 |
443 | a9211170 | Iustin Pop | echo gnt-instance migrate instance54 |
444 | a9211170 | Iustin Pop | echo step 3 |
445 | a9211170 | Iustin Pop | echo gnt-instance migrate instance4 |
446 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance4 |
447 | a9211170 | Iustin Pop | echo step 4 |
448 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node2 instance48 |
449 | a9211170 | Iustin Pop | echo gnt-instance migrate instance48 |
450 | a9211170 | Iustin Pop | echo step 5 |
451 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance93 |
452 | a9211170 | Iustin Pop | echo gnt-instance migrate instance93 |
453 | a9211170 | Iustin Pop | echo step 6 |
454 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node2 instance89 |
455 | a9211170 | Iustin Pop | echo gnt-instance migrate instance89 |
456 | a9211170 | Iustin Pop | echo step 7 |
457 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance5 |
458 | a9211170 | Iustin Pop | echo gnt-instance migrate instance5 |
459 | a9211170 | Iustin Pop | echo step 8 |
460 | a9211170 | Iustin Pop | echo gnt-instance migrate instance94 |
461 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance94 |
462 | a9211170 | Iustin Pop | echo step 9 |
463 | a9211170 | Iustin Pop | echo gnt-instance migrate instance44 |
464 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node15 instance44 |
465 | a9211170 | Iustin Pop | echo step 10 |
466 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance62 |
467 | a9211170 | Iustin Pop | echo step 11 |
468 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node16 instance13 |
469 | a9211170 | Iustin Pop | echo step 12 |
470 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node7 instance19 |
471 | a9211170 | Iustin Pop | echo step 13 |
472 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node1 instance43 |
473 | a9211170 | Iustin Pop | echo step 14 |
474 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node4 instance1 |
475 | a9211170 | Iustin Pop | echo step 15 |
476 | a9211170 | Iustin Pop | echo gnt-instance replace-disks -n node17 instance58 |
477 | a9211170 | Iustin Pop | |
478 | a9211170 | Iustin Pop | Final cluster status: |
479 | a9211170 | Iustin Pop | N1 Name t_mem f_mem r_mem t_dsk f_dsk pri sec p_fmem p_fdsk |
480 | a9211170 | Iustin Pop | node1 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
481 | a9211170 | Iustin Pop | node2 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
482 | a9211170 | Iustin Pop | node3 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
483 | a9211170 | Iustin Pop | node4 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
484 | a9211170 | Iustin Pop | node5 32762 7280 6000 1861 1078 4 5 0.22221 0.57947 |
485 | a9211170 | Iustin Pop | node6 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
486 | a9211170 | Iustin Pop | node7 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
487 | a9211170 | Iustin Pop | node8 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
488 | a9211170 | Iustin Pop | node9 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
489 | a9211170 | Iustin Pop | node10 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
490 | a9211170 | Iustin Pop | node11 32762 7280 6000 1861 1022 4 4 0.22221 0.54951 |
491 | a9211170 | Iustin Pop | node12 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
492 | a9211170 | Iustin Pop | node13 32762 7280 6000 1861 1022 4 4 0.22221 0.54951 |
493 | a9211170 | Iustin Pop | node14 32762 7280 6000 1861 1022 4 4 0.22221 0.54951 |
494 | a9211170 | Iustin Pop | node15 32762 7280 6000 1861 1031 4 4 0.22221 0.55408 |
495 | a9211170 | Iustin Pop | node16 32762 7280 6000 1861 1060 4 4 0.22221 0.57007 |
496 | a9211170 | Iustin Pop | node17 32762 7280 6000 1861 1006 5 4 0.22221 0.54105 |
497 | a9211170 | Iustin Pop | node18 32762 7280 6000 1396 761 4 2 0.22221 0.54570 |
498 | a9211170 | Iustin Pop | node19 32762 7280 6000 1861 1026 4 4 0.22221 0.55179 |
499 | a9211170 | Iustin Pop | node20 32762 13280 6000 1861 1089 3 5 0.40535 0.58565 |
500 | a9211170 | Iustin Pop | |
501 | a9211170 | Iustin Pop | .fi |
502 | a9211170 | Iustin Pop | .in |
503 | a9211170 | Iustin Pop | |
504 | a9211170 | Iustin Pop | Here we see, besides the step list, the initial and final cluster |
505 | a9211170 | Iustin Pop | status, with the final one showing all nodes being N+1 compliant, and |
506 | a9211170 | Iustin Pop | the command list to reach the final solution. In the initial listing, |
507 | a9211170 | Iustin Pop | we see which nodes are not N+1 compliant. |
508 | a9211170 | Iustin Pop | |
509 | a9211170 | Iustin Pop | The algorithm is stable as long as each step above is fully completed, |
510 | a9211170 | Iustin Pop | e.g. in step 8, both the migrate and the replace-disks are |
511 | a9211170 | Iustin Pop | done. Otherwise, if only the migrate is done, the input data is |
512 | a9211170 | Iustin Pop | changed in a way that the program will output a different solution |
513 | a9211170 | Iustin Pop | list (but hopefully will end in the same state). |
514 | a9211170 | Iustin Pop | |
515 | a9211170 | Iustin Pop | .SH SEE ALSO |
516 | d2ac5526 | Iustin Pop | .BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), " |
517 | d2ac5526 | Iustin Pop | .BR gnt-node "(8)" |
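
The relation between the action column of the default output (for example `a=f r:node16 f`) and the command list printed with -C in the EXAMPLE section can be sketched as follows. This is an illustrative Python sketch inferred only from the sample output shown in this page, not hbal's actual implementation; the helper names `parse_step` and `commands_for_step` are hypothetical, and the sketch assumes the step line has exactly the seven whitespace-separated fields seen in the sample.

```python
def parse_step(line):
    """Split one step line of the default output into (instance, actions).

    Assumes the layout seen in the sample output:
    '<n>. <instance> <old pri:sec> => <new pri:sec> <score> a=<actions>'
    """
    fields = line.split(None, 6)
    if len(fields) != 7 or not fields[6].startswith("a="):
        raise ValueError(f"unexpected step line: {line!r}")
    instance = fields[1]
    actions = fields[6][2:]  # drop the leading 'a='
    return instance, actions


def commands_for_step(instance, actions):
    """Map an action string such as 'f r:node16 f' to gnt-instance commands.

    'f' is a failover/migrate; 'r:<node>' replaces the secondary with <node>.
    """
    commands = []
    for token in actions.split():
        if token == "f":
            commands.append(f"gnt-instance migrate {instance}")
        elif token.startswith("r:"):
            new_secondary = token[2:]
            commands.append(
                f"gnt-instance replace-disks -n {new_secondary} {instance}"
            )
        else:
            raise ValueError(f"unknown action: {token!r}")
    return commands


step = "1. instance14 node1:node10 => node16:node10 0.42109120 a=f r:node16 f"
for command in commands_for_step(*parse_step(step)):
    print(command)
```

Applied to step 1 of the sample, this yields the same three commands that the -C listing shows under "step 1" (migrate, replace-disks to node16, migrate). As noted in the example discussion, all commands of a step have to be completed for the remaining solution to stay valid.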