Revision d2ac5526

b/README
242 242

  
243 243
    gnt-node list -oname,mtotal,mnode,mfree,dtotal,dfree \
244 244
      --separator '|' --no-headers > nodes
245
    gnt-instance list -oname,admin_ram,sda_size,pnode,snodes \
245
    gnt-instance list -oname,admin_ram,sda_size,status,pnode,snodes \
246 246
      --separator '|' --no-headers > instances
247 247

  
248 248
These two files should be saved under the names of 'nodes' and 'instances'.
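
As a rough illustration of the format these two commands produce, the following Python sketch splits one line of each file into named fields. The field order is taken from the `-o` lists above. The function names, the sample host names, the assumption that every numeric field is an integer, and the assumption that `snodes` is a comma-separated list are illustrative only and are not taken from the htools source.

```python
# Field order as given to "-o" in the two commands above.
NODE_FIELDS = ["name", "mtotal", "mnode", "mfree", "dtotal", "dfree"]
INSTANCE_FIELDS = ["name", "admin_ram", "sda_size", "status", "pnode", "snodes"]


def parse_line(line, fields, sep="|"):
    """Split one separator-delimited line into a dict keyed by field name."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(fields), len(parts)))
    return dict(zip(fields, parts))


def parse_node(line):
    """Parse one line of the 'nodes' file (assumes integer sizes)."""
    rec = parse_line(line, NODE_FIELDS)
    for key in NODE_FIELDS[1:]:
        rec[key] = int(rec[key])
    return rec


def parse_instance(line):
    """Parse one line of the 'instances' file (assumes integer sizes)."""
    rec = parse_line(line, INSTANCE_FIELDS)
    rec["admin_ram"] = int(rec["admin_ram"])
    rec["sda_size"] = int(rec["sda_size"])
    # Assumption: secondary nodes are comma-separated; may be empty.
    rec["snodes"] = [s for s in rec["snodes"].split(",") if s]
    return rec
```

A line that does not have exactly the expected number of fields raises `ValueError`, which is one way to catch a file produced with the older field list (without `status`).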
b/hbal.1
1
.TH HBAL 1 2009-03-14 htools "Ganeti H-tools"
1
.TH HBAL 1 2009-03-22 htools "Ganeti H-tools"
2 2
.SH NAME
3 3
hbal \- Cluster balancer for Ganeti
4 4

  
......
7 7
.B "[-C]"
8 8
.B "[-p]"
9 9
.B "[-o]"
10
.B "-l"
11
.BI "[ -m " cluster "]"
10
.BI "[-l " limit "]"
11
.BI "[-O " name... "]"
12
.BI "[-m " cluster "]"
12 13
.BI "[-n " nodes-file " ]"
13
.BI "[ -i " instances-file "]"
14
.BI "[-i " instances-file "]"
14 15

  
15 16
.B hbal
16 17
.B --version
......
61 62
  - coefficient of variance of the percent of reserved memory
62 63
  - coefficient of variance of the percent of free disk
63 64
  - percentage of nodes failing N+1 check
65
  - percentage of instances living (either as primary or secondary) on
66
    offline nodes
64 67

  
65 68
The free memory and free disk values help ensure that all nodes are
66 69
somewhat balanced in their resource usage. The reserved memory helps
......
69 72
N+1. And finally, the N+1 percentage helps guide the algorithm towards
70 73
eliminating N+1 failures, if possible.
71 74

  
72
Except for the N+1 failures, we use the coefficient of variance since
73
this brings the values into the same unit so to speak, and with a
74
restricted domain of values (between zero and one). The percentage of
75
N+1 failures, while also in this numeric range, doesn't actually have
76
the same meaning, but it has been shown to work well.
75
Except for the N+1 failures and offline instances percentage, we use
76
the coefficient of variance since this brings the values into the same
77
unit so to speak, and with a restricted domain of values (between zero
78
and one). The percentage of N+1 failures, while also in this numeric
79
range, doesn't actually have the same meaning, but it has been shown to work
80
well.
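
To make the score components described in this part of the man page concrete, here is a minimal Python sketch of a coefficient of variation (the man page's "coefficient of variance") and of a plain failure percentage. The function names are hypothetical, and the use of the population standard deviation is an assumption. How hbal weights or combines the components is not shown in this excerpt, so the sketch does not attempt it.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 if the mean is zero).

    Assumption: population standard deviation; the excerpt does not say
    which variant hbal uses.
    """
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def failing_fraction(flags):
    """Fraction of entries that are true, e.g. nodes failing the N+1 check."""
    return sum(1 for f in flags if f) / len(flags)
```

For identical per-node values the coefficient is zero, which matches the man page's remark that a perfectly balanced cluster would score zero on every component.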
77 81

  
78 82
The other alternative, using for N+1 checks the coefficient of
79 83
variance of (N+1 fail=1, N+1 pass=0) across nodes could hint the
......
82 86
rules of the algorithm, so the N+1 checks would simply not work
83 87
anymore in this case.
84 88

  
89
The offline instances percentage (meaning the percentage of instances
90
living on offline nodes) will cause the algorithm to actively move
91
instances away from offline nodes. This, coupled with the restriction
92
on placement given by offline nodes, will cause evacuation of such
93
nodes.
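
The offline instances percentage described above can be sketched as follows. This is an illustration only: the function name and the representation of an instance as a `(primary, [secondaries])` pair are assumptions, not the data model used by hbal.

```python
def offline_instance_fraction(instances, offline_nodes):
    """Fraction of instances whose primary or any secondary node is offline.

    instances: iterable of (primary_node, list_of_secondary_nodes) pairs.
    offline_nodes: iterable of node names marked offline (e.g. via -O).
    """
    instances = list(instances)
    if not instances:
        return 0.0
    offline = set(offline_nodes)
    affected = sum(
        1
        for pnode, snodes in instances
        if pnode in offline or offline.intersection(snodes)
    )
    return affected / len(instances)
```

Moving an instance off an offline node lowers this value, which is why including it in the score pushes the balancer towards evacuating such nodes.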
94

  
85 95
On a perfectly balanced cluster (all nodes the same size, all
86 96
instances the same size and spread across the nodes equally), all
87 97
values would be zero. This doesn't happen too often in practice :)
......
106 116
the user to understand the node's most important parameters.
107 117

  
108 118
The node list will contain the following information:
109
  - a character denoting the status of the node, with '-' meaning an
110
    offline node, '*' meaning N+1 failure and blank meaning a good
111
    node
112
  - the node name
113
  - the total node memory
114
  - the memory used by the node itself
115
  - the free node memory
116
  - the reserved node memory, which is the amount of free memory
117
    needed for N+1 compliance
118
  - total disk
119
  - free disk
120
  - number of primary instances
121
  - number of secondary instances
122
  - percent of free memory
123
  - percent of free disk
119
.RS
120
.TP
121
.B F
122
a character denoting the status of the node, with '-' meaning an
123
offline node, '*' meaning N+1 failure and blank meaning a good node
124
.TP
125
.B Name
126
the node name
127
.TP
128
.B t_mem
129
the total node memory
130
.TP
131
.B n_mem
132
the memory used by the node itself
133
.TP
134
.B i_mem
135
the memory used by instances
136
.TP
137
.B x_mem
138
amount of memory that appears to be in use but cannot be attributed to
139
any instance or cause; usually this means that the hypervisor has some
140
overhead or that there are other reporting errors
141
.TP
142
.B f_mem
143
the free node memory
144
.TP
145
.B r_mem
146
the reserved node memory, which is the amount of free memory needed
147
for N+1 compliance
148
.TP
149
.B t_dsk
150
total disk
151
.TP
152
.B f_dsk
153
free disk
154
.TP
155
.B pri
156
number of primary instances
157
.TP
158
.B sec
159
number of secondary instances
160
.TP
161
.B p_fmem
162
percent of free memory
163
.TP
164
.B p_fdsk
165
percent of free disk
166
.RE
124 167

  
125 168
.TP
126 169
.B -o, --oneline
......
135 178
  - improvement in the cluster score
136 179

  
137 180
.TP
181
.BI "-O " name
182
This option (which can be given multiple times) will mark nodes as
183
being \fIoffline\fR. This means a couple of things:
184
.RS
185
.TP
186
-
187
instances won't be placed on these nodes, not even temporarily;
188
e.g. the \fIreplace primary\fR move is not available if the secondary
189
node is offline, since this move requires a failover.
190
.TP
191
-
192
these nodes will not be included in the score calculation (except for
193
the percentage of instances on offline nodes)
194
.RE
195

  
196
.TP
138 197
.BI "-n" nodefile ", --nodes=" nodefile
139 198
The name of the file holding node information (if not collecting via
140 199
RAPI), instead of the default
......
188 247

  
189 248
.SH EXAMPLE
190 249

  
250
Note that these examples are not for the latest version (they don't have
251
full node data).
252

  
191 253
.SS Default output
192 254

  
193 255
With the default options, the program shows each individual step and
......
362 424
list (but hopefully will end in the same state).
363 425

  
364 426
.SH SEE ALSO
365
hn1(1), ganeti(7), gnt-instance(8), gnt-node(8)
427
.BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
428
.BR gnt-node "(8)"
b/hn1.1
1
.TH HN1 1 2009-03-14 htools "Ganeti H-tools"
1
.TH HN1 1 2009-03-22 htools "Ganeti H-tools"
2 2
.SH NAME
3 3
hn1 \- N+1 fixer for Ganeti
4 4

  
......
92 92
the user to understand the node's most important parameters.
93 93

  
94 94
The node list will contain the following information:
95
  - a character denoting the status of the node, with '-' meaning an
96
    offline node, '*' meaning N+1 failure and blank meaning a good
97
    node
98
  - the node name
99
  - the total node memory
100
  - the memory used by the node itself
101
  - the free node memory
102
  - the reserved node memory, which is the amount of free memory
103
    needed for N+1 compliance
104
  - total disk
105
  - free disk
106
  - number of primary instances
107
  - number of secondary instances
108
  - percent of free memory
109
  - percent of free disk
95
.RS
96
.TP
97
.B F
98
a character denoting the status of the node, with '-' meaning an
99
offline node, '*' meaning N+1 failure and blank meaning a good node
100
.TP
101
.B Name
102
the node name
103
.TP
104
.B t_mem
105
the total node memory
106
.TP
107
.B n_mem
108
the memory used by the node itself
109
.TP
110
.B i_mem
111
the memory used by instances
112
.TP
113
.B x_mem
114
amount of memory that appears to be in use but cannot be attributed to
115
any instance or cause; usually this means that the hypervisor has some
116
overhead or that there are other reporting errors
117
.TP
118
.B f_mem
119
the free node memory
120
.TP
121
.B r_mem
122
the reserved node memory, which is the amount of free memory needed
123
for N+1 compliance
124
.TP
125
.B t_dsk
126
total disk
127
.TP
128
.B f_dsk
129
free disk
130
.TP
131
.B pri
132
number of primary instances
133
.TP
134
.B sec
135
number of secondary instances
136
.TP
137
.B p_fmem
138
percent of free memory
139
.TP
140
.B p_fdsk
141
percent of free disk
142
.RE
110 143

  
111 144
.TP
112 145
.BI "-n" nodefile ", --nodes=" nodefile
......
171 204
input file).
172 205

  
173 206
.SH SEE ALSO
174
hbal(1), ganeti(7), gnt-instance(8), gnt-node(8)
207
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
208
.BR gnt-node "(8)"
b/hscan.1
1
.TH HSCAN 1 2009-03-22 htools "Ganeti H-tools"
2
.SH NAME
3
hscan \- Scan clusters via RAPI and save node/instance data
4

  
5
.SH SYNOPSIS
6
.B hscan
7
.B "[-p]"
8
.B "[--no-headers]"
9
.BI "[-d " path "]"
10
.I cluster...
11

  
12
.B hscan
13
.B --version
14

  
15
.SH DESCRIPTION
16
hscan is a tool for scanning clusters via RAPI and saving their data
17
in the input format used by
18
.BR hbal "(1) and " hn1 "(1)."
19
It will also show a one-line score for each cluster scanned or, if
20
desired, the cluster state as shown by the \fB-p\fR option to the other
21
tools.
22

  
23
For each cluster, two files named \fIcluster\fB.instances\fR and
24
\fIcluster\fB.nodes\fR will be generated holding the instance and node
25
data. These files can then be used in \fBhbal\fR(1) or \fBhn1\fR(1)
26
via the \fB-i\fR and \fB-n\fR options.
27

  
28
.SH OPTIONS
29
The options that can be passed to the program are as follows:
30

  
31
.TP
32
.B -p, --print-nodes
33
Prints the node status for each cluster after the cluster's one-line
34
status display, in a format designed to allow the user to understand
35
the node's most important parameters. For details, see the man page
36
for \fBhbal\fR(1).
37

  
38
.TP
39
.BI "-d " path
40
Save the node and instance data for each cluster under \fIpath\fR,
41
instead of the current directory.
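
The file naming described in this man page can be sketched in Python as below. The helper name is hypothetical, and joining `path` directly with the cluster name is an assumption based on the description above, not on the hscan source.

```python
import os.path


def hscan_output_files(cluster, path="."):
    """Return the (nodes_file, instances_file) pair expected for a cluster.

    Assumption: files are named <cluster>.nodes and <cluster>.instances
    and are placed directly under 'path' (the -d argument).
    """
    nodes_file = os.path.join(path, cluster + ".nodes")
    instances_file = os.path.join(path, cluster + ".instances")
    return nodes_file, instances_file
```

The two returned names are what would then be passed to the `-n` and `-i` options of hbal or hn1.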
42

  
43
.TP
44
.B -V, --version
45
Just show the program version and exit.
46

  
47
.SH EXIT STATUS
48

  
49
The exit status of the command will be zero, unless for some reason
50
loading the input data failed fatally (e.g. wrong node or instance
51
data).
52

  
53
.SH BUGS
54

  
55
The program does not check its input data for consistency, and aborts
56
with cryptic error messages in this case.
57

  
58
.SH SEE ALSO
59
.BR hbal "(1), " hn1 "(1), " ganeti "(7), " gnt-instance "(8), " gnt-node "(8)"
