.TH HN1 1 2009-03-22 htools "Ganeti H-tools"
.SH NAME
hn1 \- N+1 fixer for Ganeti

.SH SYNOPSIS
.B hn1
.B "[-C]"
.B "[-p]"
.B "[-o]"
.BI "[-m " cluster "]"
.BI "[-n " nodes-file "]"
.BI "[-i " instances-file "]"
.BI "[-d " depth "]"
.BI "[-r " max-removals "]"
.BI "[-L " max-delta "]"
.BI "[-l " min-delta "]"

.B hn1
.B --version

.SH DESCRIPTION
hn1 is a cluster N+1 fixer that tries to compute the minimum number of
moves needed to make all nodes N+1 compliant.

The algorithm is designed to be a 'perfect' algorithm: it always
examines the entire solution space until it finds the minimum
solution. The algorithm can be tweaked via the \fB-d\fR, \fB-r\fR,
\fB-L\fR and \fB-l\fR options.

By default, the program shows the solution in a somewhat cryptic
format; to get the actual Ganeti command list, use the \fB-C\fR
option.

\fBNote:\fR this program is somewhat deprecated; \fBhbal(1)\fR usually
gives much faster results, and a better cluster. It is recommended
to use this program only when \fBhbal\fR doesn't give an N+1 compliant
cluster.

.SS ALGORITHM

The algorithm works in multiple rounds, of increasing \fIdepth\fR,
until we have a solution.

First, before starting the solution computation, we compute all the
N+1-fail nodes and the instances they hold. These instances are
candidates for replacement (and only these!).

The program then starts at \fIdepth\fR one (unless overridden via the
\fB-d\fR option), and at each round:
- it tries to remove from the cluster as many instances as the
current depth in order to make the cluster N+1 compliant
- then, for each of the possible instance combinations that allow
this (unless the total size is reduced via the \fB-r\fR option),
it tries to put them back on the cluster while maintaining N+1
compliance

It might be that at a given round, the results are:
- no instance combination that can be put back; this means it is not
possible to make the cluster N+1 compliant with this number of
instances being moved, so we increase the depth and go on to the
next round
- one or more successful results, in which case we take the one that
has as few changes as possible (a change meaning that a replace-disks
operation is needed)

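.PP
The rounds described above can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions, not the
program's actual code: the real N+1 check is replaced by a toy rule
(a node is compliant when the instances it holds fit in its capacity),
and the names \fIcompliant\fR, \fIsolve\fR, \fIcapacity\fR,
\fIplacement\fR and \fIsizes\fR are hypothetical.
.nf
```python
from itertools import combinations, product


def compliant(capacity, placement, sizes):
    """Toy stand-in for the N+1 check (an assumption of this sketch):
    every node must hold no more than its capacity."""
    load = dict.fromkeys(capacity, 0)
    for inst, node in placement.items():
        load[node] += sizes[inst]
    return all(load[n] <= capacity[n] for n in capacity)


def solve(capacity, placement, sizes, start_depth=1):
    """Search by increasing depth; return (depth, delta, new placement)
    for the first depth that has a solution, or None if none exists."""
    load = dict.fromkeys(capacity, 0)
    for inst, node in placement.items():
        load[node] += sizes[inst]
    failing = {n for n in capacity if load[n] > capacity[n]}
    # Only instances held by failing nodes are candidates for moving.
    candidates = sorted(i for i, n in placement.items() if n in failing)

    for depth in range(start_depth, len(candidates) + 1):
        best = None
        for removed in combinations(candidates, depth):
            rest = {i: n for i, n in placement.items() if i not in removed}
            if not compliant(capacity, rest, sizes):
                continue  # removing these instances does not fix the cluster
            # Try every way of putting the removed instances back.
            for targets in product(sorted(capacity), repeat=depth):
                trial = {**rest, **dict(zip(removed, targets))}
                if not compliant(capacity, trial, sizes):
                    continue
                delta = sum(trial[i] != placement[i] for i in removed)
                if best is None or delta < best[0]:
                    best = (delta, trial)
        if best is not None:
            return depth, best[0], best[1]
    return None
```
.fi

Both nested loops are exhaustive, which is why the cost of a round
grows so quickly with the depth.
.PP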
The main problem with the algorithm is that, being an exhaustive
search, the CPU time required grows very quickly with
depth. On a 20-node, 80-instance cluster, depths up to 5-6 are
quickly computed, and depth 10 could already take days.

Since the algorithm is designed to prune the search space as quickly
as possible, if by luck we find a good solution early at a given
depth, then the other solutions which would result in a bigger delta
(the number of changes) will not be investigated, and the program will
finish fast. Since this is random and depends on where in the full
solution space the good solution will be, there are two options for
cutting down the time needed:
- \fB-l\fR makes any solution that has delta lower than its
parameter succeed instantly
- \fB-L\fR makes any solution with delta higher than its parameter
be rejected instantly (without descending further in the search tree)

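.PP
The effect of the two cut-offs can be illustrated with a small
depth-first search in Python. This is a hedged sketch, not the
program's implementation: \fIbest_solution\fR, \fIoptions\fR and
\fIis_valid\fR are hypothetical names, each option is a
(cost, target) pair whose cost is 1 for a move and 0 for staying in
place, and the comparisons follow the "or equal" wording used in the
OPTIONS section.
.nf
```python
def best_solution(options, is_valid, min_delta=None, max_delta=None):
    """Return (best, visited): best is (delta, choices) or None, and
    visited counts the search-tree nodes that were examined."""
    best = None
    visited = 0

    def descend(index, delta, chosen):
        nonlocal best, visited
        visited += 1
        # Like -L: reject a partial solution whose delta is already too high.
        if max_delta is not None and delta >= max_delta:
            return False
        # Ordinary pruning: this branch cannot beat the best one so far.
        if best is not None and delta >= best[0]:
            return False
        if index == len(options):
            if not is_valid(chosen):
                return False
            best = (delta, list(chosen))
            # Like -l: a good-enough solution ends the whole search.
            return min_delta is not None and delta <= min_delta
        for cost, target in options[index]:
            chosen.append(target)
            done = descend(index + 1, delta + cost, chosen)
            chosen.pop()
            if done:
                return True
        return False

    descend(0, 0, [])
    return best, visited
```
.fi

With a generous \fImin_delta\fR the search stops at the first
acceptable solution, trading optimality for time; with a tight
\fImax_delta\fR whole subtrees are skipped, but no solution may be
found at all.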
.SH OPTIONS
The options that can be passed to the program are as follows:
.TP
.B -C, --print-commands
Print the command list at the end of the run. Without this, the
program will only show a shorter, but cryptic output.
.TP
.B -p, --print-nodes
Print the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

The node list will contain the following information:
.RS
.TP
.B F
a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
the amount of memory that seems to be in use but cannot be attributed
to the node or to any instance; usually this means that the hypervisor
has some overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pri
number of primary instances
.TP
.B sec
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.RE

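.PP
As a reading aid for the columns above, the following Python sketch
shows how the derived values could relate to the raw ones. The
formulas are assumptions drawn from the field descriptions, not from
the program's source: \fIx_mem\fR is taken as the memory left
unexplained by the other columns, the percentages are scaled to 0-100,
and a node is flagged as an N+1 failure when its free memory is below
its reserved memory. The function name \fIderived_fields\fR is
hypothetical.
.nf
```python
def derived_fields(t_mem, n_mem, i_mem, f_mem, r_mem, t_dsk, f_dsk,
                   offline=False):
    """Compute the derived node-list columns from the raw ones
    (assumed relationships, see the accompanying text)."""
    if offline:
        flag = "-"
    elif f_mem < r_mem:
        flag = "*"  # not enough free memory for N+1 compliance
    else:
        flag = " "
    return {
        "F": flag,
        # Memory in use that neither the node nor the instances explain.
        "x_mem": t_mem - n_mem - i_mem - f_mem,
        "p_fmem": 100.0 * f_mem / t_mem,
        "p_fdsk": 100.0 * f_dsk / t_dsk,
    }
```
.fi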
.TP
.BI "-n" nodefile ", --nodes=" nodefile
The name of the file holding node information (if not collecting via
RAPI), instead of the default
.I nodes
file.

.TP
.BI "-i" instancefile ", --instances=" instancefile
The name of the file holding instance information (if not collecting
via RAPI), instead of the default
.I instances
file.

.TP
.BI "-m" cluster
Collect data not from files but directly from the
.I cluster
given as an argument via RAPI. This works for both Ganeti 1.2 and
Ganeti 2.0.

.TP
.BI "-d" DEPTH ", --depth=" DEPTH
Start the algorithm directly at depth \fIDEPTH\fR, so that we don't
examine lower depths. This will be faster if we know that no solution
exists at lower depths, so that searching them is unnecessary.

.TP
.BI "-l" MIN-DELTA ", --min-delta=" MIN-DELTA
If we find a solution with delta lower than or equal to \fIMIN-DELTA\fR,
consider this a success and don't examine further.

.TP
.BI "-L" MAX-DELTA ", --max-delta=" MAX-DELTA
If, while computing a solution, its intermediate delta is already
higher than or equal to \fIMAX-DELTA\fR, consider this a failure and abort
(as if N+1 checks have failed).

.TP
.B -V, --version
Just show the program version and exit.

.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).

.SH BUGS

The program does not check its input data for consistency, and aborts
with cryptic error messages when the data is inconsistent.

The algorithm doesn't know when it won't be possible to reach N+1
compliance at all, and will happily churn CPU for ages without
realising it won't reach a solution.

The algorithm is too slow.

The output format is not easily scriptable, and the program should
feed moves directly into Ganeti (either via RAPI or via a gnt-debug
input file).

.SH SEE ALSO
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"