Revision d2ac5526
b/README | ||
---|---|---|
242 | 242 |
|
243 | 243 |
gnt-node list -oname,mtotal,mnode,mfree,dtotal,dfree \ |
244 | 244 |
--separator '|' --no-headers > nodes |
245 |
gnt-instance list -oname,admin_ram,sda_size,pnode,snodes \ |
|
245 |
gnt-instance list -oname,admin_ram,sda_size,status,pnode,snodes \
|
|
246 | 246 |
--separator '|' --no-head > instances |
247 | 247 |
|
248 | 248 |
These two files should be saved under the names of 'nodes' and 'instances'. |
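As an illustration of the file format produced by the `gnt-node list` command above, here is a minimal sketch of a reader for the 'nodes' file. It assumes one node per line, fields separated by '|' in the order given to `-o` (name, mtotal, mnode, mfree, dtotal, dfree), and that all numeric fields are plain integers. The `NodeRow` type, the `parse_nodes` function and the sample data are hypothetical and are not part of the htools code.

```python
from typing import List, NamedTuple


class NodeRow(NamedTuple):
    """One line of the 'nodes' file (hypothetical helper type)."""
    name: str
    mtotal: int
    mnode: int
    mfree: int
    dtotal: int
    dfree: int


def parse_nodes(text: str) -> List[NodeRow]:
    """Parse '|'-separated node lines; blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, *numbers = line.split("|")
        if len(numbers) != 5:
            raise ValueError("expected 6 fields, got: %r" % line)
        # Assumption: every numeric column is an integer value.
        rows.append(NodeRow(name, *(int(n) for n in numbers)))
    return rows


# Made-up sample data, for illustration only.
sample = "node1.example.com|4096|512|2048|100000|50000\n"
nodes = parse_nodes(sample)
```

Real cluster output may contain non-numeric placeholders (for example for unreachable nodes), which this sketch would reject with a `ValueError`.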
b/hbal.1 | ||
---|---|---|
1 |
.TH HBAL 1 2009-03-14 htools "Ganeti H-tools"
|
|
1 |
.TH HBAL 1 2009-03-22 htools "Ganeti H-tools"
|
|
2 | 2 |
.SH NAME |
3 | 3 |
hbal \- Cluster balancer for Ganeti |
4 | 4 |
|
... | ... | |
7 | 7 |
.B "[-C]" |
8 | 8 |
.B "[-p]" |
9 | 9 |
.B "[-o]" |
10 |
.B "-l" |
|
11 |
.BI "[ -m " cluster "]" |
|
10 |
.BI "[-l" limit "]" |
|
11 |
.BI "[-O" name... "]" |
|
12 |
.BI "[-m " cluster "]" |
|
12 | 13 |
.BI "[-n " nodes-file " ]" |
13 |
.BI "[ -i " instances-file "]"
|
|
14 |
.BI "[-i " instances-file "]" |
|
14 | 15 |
|
15 | 16 |
.B hbal |
16 | 17 |
.B --version |
... | ... | |
61 | 62 |
- coefficient of variance of the percent of reserved memory |
62 | 63 |
- coefficient of variance of the percent of free disk |
63 | 64 |
- percentage of nodes failing N+1 check |
65 |
- percentage of instances living (either as primary or secondary) on |
|
66 |
offline nodes |
|
64 | 67 |
|
65 | 68 |
The free memory and free disk values help ensure that all nodes are |
66 | 69 |
somewhat balanced in their resource usage. The reserved memory helps |
... | ... | |
69 | 72 |
N+1. And finally, the N+1 percentage helps guide the algorithm towards |
70 | 73 |
eliminating N+1 failures, if possible. |
71 | 74 |
|
72 |
Except for the N+1 failures, we use the coefficient of variance since |
|
73 |
this brings the values into the same unit so to speak, and with a |
|
74 |
restrict domain of values (between zero and one). The percentage of |
|
75 |
N+1 failures, while also in this numeric range, doesn't actually has |
|
76 |
the same meaning, but it has shown to work well. |
|
75 |
Except for the N+1 failures and offline instances percentage, we use |
|
76 |
the coefficient of variance since this brings the values into the same |
|
77 |
unit so to speak, and with a restricted domain of values (between zero |
|
78 |
and one). The percentage of N+1 failures, while also in this numeric |
|
79 |
range, doesn't actually have the same meaning, but it has been shown to work |
|
80 |
well. |
|
77 | 81 |
|
78 | 82 |
The other alternative, using for N+1 checks the coefficient of |
79 | 83 |
variance of (N+1 fail=1, N+1 pass=0) across nodes could hint the |
... | ... | |
82 | 86 |
rules of the algorithm, so the N+1 checks would simply not work |
83 | 87 |
anymore in this case. |
84 | 88 |
|
89 |
The offline instances percentage (meaning the percentage of instances |
|
90 |
living on offline nodes) will cause the algorithm to actively move |
|
91 |
instances away from offline nodes. This, coupled with the restriction |
|
92 |
on placement given by offline nodes, will cause evacuation of such |
|
93 |
nodes. |
|
94 |
|
|
85 | 95 |
On a perfectly balanced cluster (all nodes the same size, all |
86 | 96 |
instances the same size and spread across the nodes equally), all |
87 | 97 |
values would be zero. This doesn't happen too often in practice :) |
... | ... | |
106 | 116 |
the user to understand the node's most important parameters. |
107 | 117 |
|
108 | 118 |
The node list will contain these informations: |
109 |
- a character denoting the status of the node, with '-' meaning an |
|
110 |
offline node, '*' meaning N+1 failure and blank meaning a good |
|
111 |
node |
|
112 |
- the node name |
|
113 |
- the total node memory |
|
114 |
- the memory used by the node itself |
|
115 |
- the free node memory |
|
116 |
- the reserved node memory, which is the amount of free memory |
|
117 |
needed for N+1 compliance |
|
118 |
- total disk |
|
119 |
- free disk |
|
120 |
- number of primary instances |
|
121 |
- number of secondary instances |
|
122 |
- percent of free memory |
|
123 |
- percent of free disk |
|
119 |
.RS |
|
120 |
.TP |
|
121 |
.B F |
|
122 |
a character denoting the status of the node, with '-' meaning an |
|
123 |
offline node, '*' meaning N+1 failure and blank meaning a good node |
|
124 |
.TP |
|
125 |
.B Name |
|
126 |
the node name |
|
127 |
.TP |
|
128 |
.B t_mem |
|
129 |
the total node memory |
|
130 |
.TP |
|
131 |
.B n_mem |
|
132 |
the memory used by the node itself |
|
133 |
.TP |
|
134 |
.B i_mem |
|
135 |
the memory used by instances |
|
136 |
.TP |
|
137 |
.B x_mem |
|
138 |
amount of memory which seems to be in use but cannot be determined why or |
|
139 |
by which instance; usually this means that the hypervisor has some |
|
140 |
overhead or that there are other reporting errors |
|
141 |
.TP |
|
142 |
.B f_mem |
|
143 |
the free node memory |
|
144 |
.TP |
|
145 |
.B r_mem |
|
146 |
the reserved node memory, which is the amount of free memory needed |
|
147 |
for N+1 compliance |
|
148 |
.TP |
|
149 |
.B t_dsk |
|
150 |
total disk |
|
151 |
.TP |
|
152 |
.B f_dsk |
|
153 |
free disk |
|
154 |
.TP |
|
155 |
.B pri |
|
156 |
number of primary instances |
|
157 |
.TP |
|
158 |
.B sec |
|
159 |
number of secondary instances |
|
160 |
.TP |
|
161 |
.B p_fmem |
|
162 |
percent of free memory |
|
163 |
.TP |
|
164 |
.B p_fdsk |
|
165 |
percent of free disk |
|
166 |
.RE |
|
124 | 167 |
|
125 | 168 |
.TP |
126 | 169 |
.B -o, --oneline |
... | ... | |
135 | 178 |
- improvement in the cluster score |
136 | 179 |
|
137 | 180 |
.TP |
181 |
.BI "-O " name |
|
182 |
This option (which can be given multiple times) will mark nodes as |
|
183 |
being \fIoffline\fR. This means a couple of things: |
|
184 |
.RS |
|
185 |
.TP |
|
186 |
- |
|
187 |
instances won't be placed on these nodes, not even temporarily; |
|
188 |
e.g. the \fIreplace primary\fR move is not available if the secondary |
|
189 |
node is offline, since this move requires a failover. |
|
190 |
.TP |
|
191 |
- |
|
192 |
these nodes will not be included in the score calculation (except for |
|
193 |
the percentage of instances on offline nodes) |
|
194 |
.RE |
|
195 |
|
|
196 |
.TP |
|
138 | 197 |
.BI "-n" nodefile ", --nodes=" nodefile |
139 | 198 |
The name of the file holding node information (if not collecting via |
140 | 199 |
RAPI), instead of the default |
... | ... | |
188 | 247 |
|
189 | 248 |
.SH EXAMPLE |
190 | 249 |
|
250 |
Note that these examples are not for the latest version (they don't have |
|
251 |
full node data). |
|
252 |
|
|
191 | 253 |
.SS Default output |
192 | 254 |
|
193 | 255 |
With the default options, the program shows each individual step and |
... | ... | |
362 | 424 |
list (but hopefully will end in the same state). |
363 | 425 |
|
364 | 426 |
.SH SEE ALSO |
365 |
hn1(1), ganeti(7), gnt-instance(8), gnt-node(8) |
|
427 |
.BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), " |
|
428 |
.BR gnt-node "(8)" |
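The hbal.1 text above describes score components based on the coefficient of variance of per-node percentages. A minimal sketch of that statistic follows, assuming it is computed as the population standard deviation divided by the mean; the actual hbal implementation may differ, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Returns 0.0 for an empty list or a zero mean, to avoid dividing by zero.
    """
    if not values:
        return 0.0
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


# Hypothetical per-node "percent of free memory" values (as fractions).
balanced = [0.5, 0.5, 0.5, 0.5]
skewed = [0.9, 0.1, 0.5, 0.5]

cv_balanced = coefficient_of_variation(balanced)
cv_skewed = coefficient_of_variation(skewed)
```

Under this definition a perfectly balanced set of nodes yields zero, which matches the man page's remark that all values would be zero on a perfectly balanced cluster, while uneven usage yields a larger value.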
b/hn1.1 | ||
---|---|---|
1 |
.TH HN1 1 2009-03-14 htools "Ganeti H-tools"
|
|
1 |
.TH HN1 1 2009-03-22 htools "Ganeti H-tools"
|
|
2 | 2 |
.SH NAME |
3 | 3 |
hn1 \- N+1 fixer for Ganeti |
4 | 4 |
|
... | ... | |
92 | 92 |
the user to understand the node's most important parameters. |
93 | 93 |
|
94 | 94 |
The node list will contain these informations: |
95 |
- a character denoting the status of the node, with '-' meaning an |
|
96 |
offline node, '*' meaning N+1 failure and blank meaning a good |
|
97 |
node |
|
98 |
- the node name |
|
99 |
- the total node memory |
|
100 |
- the memory used by the node itself |
|
101 |
- the free node memory |
|
102 |
- the reserved node memory, which is the amount of free memory |
|
103 |
needed for N+1 compliance |
|
104 |
- total disk |
|
105 |
- free disk |
|
106 |
- number of primary instances |
|
107 |
- number of secondary instances |
|
108 |
- percent of free memory |
|
109 |
- percent of free disk |
|
95 |
.RS |
|
96 |
.TP |
|
97 |
.B F |
|
98 |
a character denoting the status of the node, with '-' meaning an |
|
99 |
offline node, '*' meaning N+1 failure and blank meaning a good node |
|
100 |
.TP |
|
101 |
.B Name |
|
102 |
the node name |
|
103 |
.TP |
|
104 |
.B t_mem |
|
105 |
the total node memory |
|
106 |
.TP |
|
107 |
.B n_mem |
|
108 |
the memory used by the node itself |
|
109 |
.TP |
|
110 |
.B i_mem |
|
111 |
the memory used by instances |
|
112 |
.TP |
|
113 |
.B x_mem |
|
114 |
amount of memory which seems to be in use but cannot be determined why or |
|
115 |
by which instance; usually this means that the hypervisor has some |
|
116 |
overhead or that there are other reporting errors |
|
117 |
.TP |
|
118 |
.B f_mem |
|
119 |
the free node memory |
|
120 |
.TP |
|
121 |
.B r_mem |
|
122 |
the reserved node memory, which is the amount of free memory needed |
|
123 |
for N+1 compliance |
|
124 |
.TP |
|
125 |
.B t_dsk |
|
126 |
total disk |
|
127 |
.TP |
|
128 |
.B f_dsk |
|
129 |
free disk |
|
130 |
.TP |
|
131 |
.B pri |
|
132 |
number of primary instances |
|
133 |
.TP |
|
134 |
.B sec |
|
135 |
number of secondary instances |
|
136 |
.TP |
|
137 |
.B p_fmem |
|
138 |
percent of free memory |
|
139 |
.TP |
|
140 |
.B p_fdsk |
|
141 |
percent of free disk |
|
142 |
.RE |
|
110 | 143 |
|
111 | 144 |
.TP |
112 | 145 |
.BI "-n" nodefile ", --nodes=" nodefile |
... | ... | |
171 | 204 |
input file). |
172 | 205 |
|
173 | 206 |
.SH SEE ALSO |
174 |
hbal(1), ganeti(7), gnt-instance(8), gnt-node(8) |
|
207 |
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), " |
|
208 |
.BR gnt-node "(8)" |
b/hscan.1 | ||
---|---|---|
1 |
.TH HSCAN 1 2009-03-22 htools "Ganeti H-tools" |
|
2 |
.SH NAME |
|
3 |
hscan \- Scan clusters via RAPI and save node/instance data |
|
4 |
|
|
5 |
.SH SYNOPSIS |
|
6 |
.B hscan |
|
7 |
.B "[-p]" |
|
8 |
.B "[--no-headers]" |
|
9 |
.BI "[-d " path "]" |
|
10 |
.I cluster... |
|
11 |
|
|
12 |
.B hscan |
|
13 |
.B --version |
|
14 |
|
|
15 |
.SH DESCRIPTION |
|
16 |
hscan is a tool for scanning clusters via RAPI and saving their data |
|
17 |
in the input format used by |
|
18 |
.BR hbal "(1) and " hn1 "(1)." |
|
19 |
It will also show a one-line score for each cluster scanned or, if |
|
20 |
desired, the cluster state as shown by the \fB-p\fR option to the other |
|
21 |
tools. |
|
22 |
|
|
23 |
For each cluster, two files named \fIcluster\fB.instances\fR and |
|
24 |
\fIcluster\fB.nodes\fR will be generated holding the instance and node |
|
25 |
data. These files can then be used in \fBhbal\fR(1) or \fBhn1\fR(1) |
|
26 |
via the \fB-i\fR and \fB-n\fR options. |
|
27 |
|
|
28 |
.SH OPTIONS |
|
29 |
The options that can be passed to the program are as follows: |
|
30 |
|
|
31 |
.TP |
|
32 |
.B -p, --print-nodes |
|
33 |
Prints the node status for each cluster after the cluster's one-line |
|
34 |
status display, in a format designed to allow the user to understand |
|
35 |
the node's most important parameters. For details, see the man page |
|
36 |
for \fBhbal\fR(1). |
|
37 |
|
|
38 |
.TP |
|
39 |
.BI "-d " path |
|
40 |
Save the node and instance data for each cluster under \fIpath\fR, |
|
41 |
instead of the current directory. |
|
42 |
|
|
43 |
.TP |
|
44 |
.B -V, --version |
|
45 |
Just show the program version and exit. |
|
46 |
|
|
47 |
.SH EXIT STATUS |
|
48 |
|
|
49 |
The exit status of the command will be zero, unless for some reason |
|
50 |
loading the input data failed fatally (e.g. wrong node or instance |
|
51 |
data). |
|
52 |
|
|
53 |
.SH BUGS |
|
54 |
|
|
55 |
The program does not check its input data for consistency, and aborts |
|
56 |
with cryptic error messages in this case. |
|
57 |
|
|
58 |
.SH SEE ALSO |
|
59 |
.BR hbal "(1), " hn1 "(1), " ganeti "(7), " gnt-instance "(8), " gnt-node "(8)" |