Revision e4f08c46
b/COPYING  

1 
GNU GENERAL PUBLIC LICENSE 

2 
Version 2, June 1991 

3  
4 
Copyright (C) 1989, 1991 Free Software Foundation, Inc., 

5 
51 Franklin Street, Fifth Floor, Boston, MA 021101301 USA 

6 
Everyone is permitted to copy and distribute verbatim copies 

7 
of this license document, but changing it is not allowed. 

8  
9 
Preamble 

10  
11 
The licenses for most software are designed to take away your 

12 
freedom to share and change it. By contrast, the GNU General Public 

13 
License is intended to guarantee your freedom to share and change free 

14 
softwareto make sure the software is free for all its users. This 

15 
General Public License applies to most of the Free Software 

16 
Foundation's software and to any other program whose authors commit to 

17 
using it. (Some other Free Software Foundation software is covered by 

18 
the GNU Lesser General Public License instead.) You can apply it to 

19 
your programs, too. 

20  
21 
When we speak of free software, we are referring to freedom, not 

22 
price. Our General Public Licenses are designed to make sure that you 

23 
have the freedom to distribute copies of free software (and charge for 

24 
this service if you wish), that you receive source code or can get it 

25 
if you want it, that you can change the software or use pieces of it 

26 
in new free programs; and that you know you can do these things. 

27  
28 
To protect your rights, we need to make restrictions that forbid 

29 
anyone to deny you these rights or to ask you to surrender the rights. 

30 
These restrictions translate to certain responsibilities for you if you 

31 
distribute copies of the software, or if you modify it. 

32  
33 
For example, if you distribute copies of such a program, whether 

34 
gratis or for a fee, you must give the recipients all the rights that 

35 
you have. You must make sure that they, too, receive or can get the 

36 
source code. And you must show them these terms so they know their 

37 
rights. 

38  
39 
We protect your rights with two steps: (1) copyright the software, and 

40 
(2) offer you this license which gives you legal permission to copy, 

41 
distribute and/or modify the software. 

42  
43 
Also, for each author's protection and ours, we want to make certain 

44 
that everyone understands that there is no warranty for this free 

45 
software. If the software is modified by someone else and passed on, we 

46 
want its recipients to know that what they have is not the original, so 

47 
that any problems introduced by others will not reflect on the original 

48 
authors' reputations. 

49  
50 
Finally, any free program is threatened constantly by software 

51 
patents. We wish to avoid the danger that redistributors of a free 

52 
program will individually obtain patent licenses, in effect making the 

53 
program proprietary. To prevent this, we have made it clear that any 

54 
patent must be licensed for everyone's free use or not licensed at all. 

55  
56 
The precise terms and conditions for copying, distribution and 

57 
modification follow. 

58  
59 
GNU GENERAL PUBLIC LICENSE 

60 
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 

61  
62 
0. This License applies to any program or other work which contains 

63 
a notice placed by the copyright holder saying it may be distributed 

64 
under the terms of this General Public License. The "Program", below, 

65 
refers to any such program or work, and a "work based on the Program" 

66 
means either the Program or any derivative work under copyright law: 

67 
that is to say, a work containing the Program or a portion of it, 

68 
either verbatim or with modifications and/or translated into another 

69 
language. (Hereinafter, translation is included without limitation in 

70 
the term "modification".) Each licensee is addressed as "you". 

71  
72 
Activities other than copying, distribution and modification are not 

73 
covered by this License; they are outside its scope. The act of 

74 
running the Program is not restricted, and the output from the Program 

75 
is covered only if its contents constitute a work based on the 

76 
Program (independent of having been made by running the Program). 

77 
Whether that is true depends on what the Program does. 

78  
79 
1. You may copy and distribute verbatim copies of the Program's 

80 
source code as you receive it, in any medium, provided that you 

81 
conspicuously and appropriately publish on each copy an appropriate 

82 
copyright notice and disclaimer of warranty; keep intact all the 

83 
notices that refer to this License and to the absence of any warranty; 

84 
and give any other recipients of the Program a copy of this License 

85 
along with the Program. 

86  
87 
You may charge a fee for the physical act of transferring a copy, and 

88 
you may at your option offer warranty protection in exchange for a fee. 

89  
90 
2. You may modify your copy or copies of the Program or any portion 

91 
of it, thus forming a work based on the Program, and copy and 

92 
distribute such modifications or work under the terms of Section 1 

93 
above, provided that you also meet all of these conditions: 

94  
95 
a) You must cause the modified files to carry prominent notices 

96 
stating that you changed the files and the date of any change. 

97  
98 
b) You must cause any work that you distribute or publish, that in 

99 
whole or in part contains or is derived from the Program or any 

100 
part thereof, to be licensed as a whole at no charge to all third 

101 
parties under the terms of this License. 

102  
103 
c) If the modified program normally reads commands interactively 

104 
when run, you must cause it, when started running for such 

105 
interactive use in the most ordinary way, to print or display an 

106 
announcement including an appropriate copyright notice and a 

107 
notice that there is no warranty (or else, saying that you provide 

108 
a warranty) and that users may redistribute the program under 

109 
these conditions, and telling the user how to view a copy of this 

110 
License. (Exception: if the Program itself is interactive but 

111 
does not normally print such an announcement, your work based on 

112 
the Program is not required to print an announcement.) 

113  
114 
These requirements apply to the modified work as a whole. If 

115 
identifiable sections of that work are not derived from the Program, 

116 
and can be reasonably considered independent and separate works in 

117 
themselves, then this License, and its terms, do not apply to those 

118 
sections when you distribute them as separate works. But when you 

119 
distribute the same sections as part of a whole which is a work based 

120 
on the Program, the distribution of the whole must be on the terms of 

121 
this License, whose permissions for other licensees extend to the 

122 
entire whole, and thus to each and every part regardless of who wrote it. 

123  
124 
Thus, it is not the intent of this section to claim rights or contest 

125 
your rights to work written entirely by you; rather, the intent is to 

126 
exercise the right to control the distribution of derivative or 

127 
collective works based on the Program. 

128  
129 
In addition, mere aggregation of another work not based on the Program 

130 
with the Program (or with a work based on the Program) on a volume of 

131 
a storage or distribution medium does not bring the other work under 

132 
the scope of this License. 

133  
134 
3. You may copy and distribute the Program (or a work based on it, 

135 
under Section 2) in object code or executable form under the terms of 

136 
Sections 1 and 2 above provided that you also do one of the following: 

137  
138 
a) Accompany it with the complete corresponding machinereadable 

139 
source code, which must be distributed under the terms of Sections 

140 
1 and 2 above on a medium customarily used for software interchange; or, 

141  
142 
b) Accompany it with a written offer, valid for at least three 

143 
years, to give any third party, for a charge no more than your 

144 
cost of physically performing source distribution, a complete 

145 
machinereadable copy of the corresponding source code, to be 

146 
distributed under the terms of Sections 1 and 2 above on a medium 

147 
customarily used for software interchange; or, 

148  
149 
c) Accompany it with the information you received as to the offer 

150 
to distribute corresponding source code. (This alternative is 

151 
allowed only for noncommercial distribution and only if you 

152 
received the program in object code or executable form with such 

153 
an offer, in accord with Subsection b above.) 

154  
155 
The source code for a work means the preferred form of the work for 

156 
making modifications to it. For an executable work, complete source 

157 
code means all the source code for all modules it contains, plus any 

158 
associated interface definition files, plus the scripts used to 

159 
control compilation and installation of the executable. However, as a 

160 
special exception, the source code distributed need not include 

161 
anything that is normally distributed (in either source or binary 

162 
form) with the major components (compiler, kernel, and so on) of the 

163 
operating system on which the executable runs, unless that component 

164 
itself accompanies the executable. 

165  
166 
If distribution of executable or object code is made by offering 

167 
access to copy from a designated place, then offering equivalent 

168 
access to copy the source code from the same place counts as 

169 
distribution of the source code, even though third parties are not 

170 
compelled to copy the source along with the object code. 

171  
172 
4. You may not copy, modify, sublicense, or distribute the Program 

173 
except as expressly provided under this License. Any attempt 

174 
otherwise to copy, modify, sublicense or distribute the Program is 

175 
void, and will automatically terminate your rights under this License. 

176 
However, parties who have received copies, or rights, from you under 

177 
this License will not have their licenses terminated so long as such 

178 
parties remain in full compliance. 

179  
180 
5. You are not required to accept this License, since you have not 

181 
signed it. However, nothing else grants you permission to modify or 

182 
distribute the Program or its derivative works. These actions are 

183 
prohibited by law if you do not accept this License. Therefore, by 

184 
modifying or distributing the Program (or any work based on the 

185 
Program), you indicate your acceptance of this License to do so, and 

186 
all its terms and conditions for copying, distributing or modifying 

187 
the Program or works based on it. 

188  
189 
6. Each time you redistribute the Program (or any work based on the 

190 
Program), the recipient automatically receives a license from the 

191 
original licensor to copy, distribute or modify the Program subject to 

192 
these terms and conditions. You may not impose any further 

193 
restrictions on the recipients' exercise of the rights granted herein. 

194 
You are not responsible for enforcing compliance by third parties to 

195 
this License. 

196  
197 
7. If, as a consequence of a court judgment or allegation of patent 

198 
infringement or for any other reason (not limited to patent issues), 

199 
conditions are imposed on you (whether by court order, agreement or 

200 
otherwise) that contradict the conditions of this License, they do not 

201 
excuse you from the conditions of this License. If you cannot 

202 
distribute so as to satisfy simultaneously your obligations under this 

203 
License and any other pertinent obligations, then as a consequence you 

204 
may not distribute the Program at all. For example, if a patent 

205 
license would not permit royaltyfree redistribution of the Program by 

206 
all those who receive copies directly or indirectly through you, then 

207 
the only way you could satisfy both it and this License would be to 

208 
refrain entirely from distribution of the Program. 

209  
210 
If any portion of this section is held invalid or unenforceable under 

211 
any particular circumstance, the balance of the section is intended to 

212 
apply and the section as a whole is intended to apply in other 

213 
circumstances. 

214  
215 
It is not the purpose of this section to induce you to infringe any 

216 
patents or other property right claims or to contest validity of any 

217 
such claims; this section has the sole purpose of protecting the 

218 
integrity of the free software distribution system, which is 

219 
implemented by public license practices. Many people have made 

220 
generous contributions to the wide range of software distributed 

221 
through that system in reliance on consistent application of that 

222 
system; it is up to the author/donor to decide if he or she is willing 

223 
to distribute software through any other system and a licensee cannot 

224 
impose that choice. 

225  
226 
This section is intended to make thoroughly clear what is believed to 

227 
be a consequence of the rest of this License. 

228  
229 
8. If the distribution and/or use of the Program is restricted in 

230 
certain countries either by patents or by copyrighted interfaces, the 

231 
original copyright holder who places the Program under this License 

232 
may add an explicit geographical distribution limitation excluding 

233 
those countries, so that distribution is permitted only in or among 

234 
countries not thus excluded. In such case, this License incorporates 

235 
the limitation as if written in the body of this License. 

236  
237 
9. The Free Software Foundation may publish revised and/or new versions 

238 
of the General Public License from time to time. Such new versions will 

239 
be similar in spirit to the present version, but may differ in detail to 

240 
address new problems or concerns. 

241  
242 
Each version is given a distinguishing version number. If the Program 

243 
specifies a version number of this License which applies to it and "any 

244 
later version", you have the option of following the terms and conditions 

245 
either of that version or of any later version published by the Free 

246 
Software Foundation. If the Program does not specify a version number of 

247 
this License, you may choose any version ever published by the Free Software 

248 
Foundation. 

249  
250 
10. If you wish to incorporate parts of the Program into other free 

251 
programs whose distribution conditions are different, write to the author 

252 
to ask for permission. For software which is copyrighted by the Free 

253 
Software Foundation, write to the Free Software Foundation; we sometimes 

254 
make exceptions for this. Our decision will be guided by the two goals 

255 
of preserving the free status of all derivatives of our free software and 

256 
of promoting the sharing and reuse of software generally. 

257  
258 
NO WARRANTY 

259  
260 
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 

261 
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 

262 
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 

263 
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 

264 
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 

265 
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 

266 
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 

267 
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 

268 
REPAIR OR CORRECTION. 

269  
270 
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 

271 
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 

272 
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 

273 
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 

274 
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 

275 
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 

276 
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 

277 
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 

278 
POSSIBILITY OF SUCH DAMAGES. 

279  
280 
END OF TERMS AND CONDITIONS 

281  
282 
How to Apply These Terms to Your New Programs 

283  
284 
If you develop a new program, and you want it to be of the greatest 

285 
possible use to the public, the best way to achieve this is to make it 

286 
free software which everyone can redistribute and change under these terms. 

287  
288 
To do so, attach the following notices to the program. It is safest 

289 
to attach them to the start of each source file to most effectively 

290 
convey the exclusion of warranty; and each file should have at least 

291 
the "copyright" line and a pointer to where the full notice is found. 

292  
293 
<one line to give the program's name and a brief idea of what it does.> 

294 
Copyright (C) <year> <name of author> 

295  
296 
This program is free software; you can redistribute it and/or modify 

297 
it under the terms of the GNU General Public License as published by 

298 
the Free Software Foundation; either version 2 of the License, or 

299 
(at your option) any later version. 

300  
301 
This program is distributed in the hope that it will be useful, 

302 
but WITHOUT ANY WARRANTY; without even the implied warranty of 

303 
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 

304 
GNU General Public License for more details. 

305  
306 
You should have received a copy of the GNU General Public License along 

307 
with this program; if not, write to the Free Software Foundation, Inc., 

308 
51 Franklin Street, Fifth Floor, Boston, MA 021101301 USA. 

309  
310 
Also add information on how to contact you by electronic and paper mail. 

311  
312 
If the program is interactive, make it output a short notice like this 

313 
when it starts in an interactive mode: 

314  
315 
Gnomovision version 69, Copyright (C) year name of author 

316 
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 

317 
This is free software, and you are welcome to redistribute it 

318 
under certain conditions; type `show c' for details. 

319  
320 
The hypothetical commands `show w' and `show c' should show the appropriate 

321 
parts of the General Public License. Of course, the commands you use may 

322 
be called something other than `show w' and `show c'; they could even be 

323 
mouseclicks or menu itemswhatever suits your program. 

324  
325 
You should also get your employer (if you work as a programmer) or your 

326 
school, if any, to sign a "copyright disclaimer" for the program, if 

327 
necessary. Here is a sample; alter the names: 

328  
329 
Yoyodyne, Inc., hereby disclaims all copyright interest in the program 

330 
`Gnomovision' (which makes passes at compilers) written by James Hacker. 

331  
332 
<signature of Ty Coon>, 1 April 1989 

333 
Ty Coon, President of Vice 

334  
335 
This General Public License does not permit incorporating your program into 

336 
proprietary programs. If your program is a subroutine library, you may 

337 
consider it more useful to permit linking proprietary applications with the 

338 
library. If this is what you want to do, use the GNU Lesser General 

339 
Public License instead of this License. 
b/Makefile  

1 
HSRCS := $(wildcard src/*.hs) 

2 
HDDIR = apidoc 

3  
4 
# Haskell rules 

5  
6 
all: 

7 
$(MAKE) C src 

8  
9 
README.html: README 

10 
rst2html $< $@ 

11  
12 
doc: README.html 

13 
rm rf $(HDDIR) 

14 
mkdir p $(HDDIR)/src 

15 
cp hscolour.css $(HDDIR)/src 

16 
for file in $(HSRCS); do \ 

17 
HsColour css anchor \ 

18 
$$file > $(HDDIR)/src/`basename $$file .hs`.html ; \ 

19 
done 

20 
ln sf hn1.html $(HDDIR)/src/Main.html 

21 
haddock odir $(HDDIR) html ignoreallexports \ 

22 
t hn1 p haddockprologue \ 

23 
sourcemodule="src/%{MODULE/.//}.html" \ 

24 
sourceentity="src/%{MODULE/.//}.html#%{NAME}" \ 

25 
$(HSRCS) 

26  
27 
clean: 

28 
rm f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \ 

29 
gmon.out *.hi README.html TAGS 

30  
31 
.PHONY : all doc clean hn1 
b/README  

1 
Cluster tools (haneti?) 

2 
======================== 

3  
4 
These are some simple cluster tools for fixing common problems. Right now N+1 

5 
and rebalancing are included. 

6  
7 
.. contents:: 

8  
9 
Cluster N+1 solver 

10 
 

11  
12 
This program runs a very simple brute force algorithm over the instance 

13 
placement space in order to determine the shortest number of replacedisks 

14 
needed to fix the cluster. Note this means we won't get a balanced cluster, 

15 
just one that passes N+1 checks. 

16  
17 
Also note that the set of all instance placements on a 20/80 cluster is 

18 
(20*19)^80, that is ~10^200, so... 

19  
20 
Algorithm 

21 
+++++++++ 

22  
23 
The algorithm is a simple twophase process. 

24  
25 
In phase 1 we determine the removal set, that is the set of instances that when 

26 
removed completely from the cluster, make it healthy again. The instances that 

27 
can go into the set are all the primary and secondary instances of the failing 

28 
nodes. The result from this phase is actually a list  we compute all sets of 

29 
the same minimum length. 

30  
31 
So basically we aim to determine here: what is the minimum number of instances 

32 
that need to be removed (this is called the removal depth) and which are the 

33 
actual combinations that fit (called the list of removal sets). 

34  
35 
In phase 2, for each removal set computed in the previous phase, we take the 

36 
removed instances and try to determine where we can put them so that the 

37 
cluster is still passing N+1 checks. From this list of possible solutions 

38 
(called the list of solutions), we compute the one that has the smallest delta 

39 
from the original state (the delta is the number of replace disks that needs to 

40 
be run) and chose this as the final solution. 

41  
42 
Implementation 

43 
++++++++++++++ 

44  
45 
Of course, a naive implementation based on the above description will run for 

46 
long periods of time, so the implementation has to be smart in order to prune 

47 
the solution space as eagerly as possible. 

48  
49 
In the following, we use as example a set of test data (a cluster with 20 

50 
nodes, 80 instances that has 5 nodes failing N+1 checks for a total of 12 

51 
warnings). 

52  
53 
On this set, the minimum depth is 4 (anything below fails), and for this depth 

54 
the current version of the algorithm generates 5 removal sets; a previous 

55 
version of the first phase generated a slightly different set of instances, with 

56 
two removal sets. For the original version of the algorithm: 

57  
58 
 the first, nonoptimized implementation computed a solution of delta=4 in 30 

59 
minutes on serverclass CPUs and was still running when aborted 10 minutes 

60 
later 

61 
 the intermediate optimized version computed the whole solution space and 

62 
found a delta=3 solution in around 10 seconds on a laptopclass CPU (total 

63 
number of solutions ~600k) 

64 
 latest version on server CPUs (which actually computes more removal sets) 

65 
computes depth=4 in less than a second and depth=5 in around 2 seconds, and 

66 
depth=6 in less than 20 seconds; depth=8 takes under five minutes (this is 

67 
10^10 bigger solution space) 

68  
69 
Note that when (artificially) increasing the depth to 5 the number of removal 

70 
sets grows fast (~3000) and a (again artificial) depth 6 generates 61k removal 

71 
sets. Therefore, it is possible to restrict the number of solution sets 

72 
examined via a commandline option. 

73  
74 
The factors that influence the run time are: 

75  
76 
 the removal depth; for each increase with one of the depth, we grow the 

77 
solution space by the number of nodes squared (since a new instance can live 

78 
any two nodes as primary/secondary, therefore (almost) N times N); i.e., 

79 
depth=1 will create a N^2 solution space, depth two will make this N^4, 

80 
depth three will be N^6, etc. 

81 
 the removal depth again; for each increase in the depth, there will be more 

82 
valid removal sets, and the space of solutions increases linearly with the 

83 
number of removal sets 

84  
85 
Therefore, the smaller the depth the faster the algorithm will be; it doesn't 

86 
seem like this algorithm will work for clusters of 100 nodes and many many 

87 
small instances (e.g. 256MB instances on 16GB nodes). 

88  
89 
Currently applied optimizations: 

90  
91 
 when choosing where to place an instance in phase two, there are N*(N1) 

92 
possible primary/secondary options; however, if instead of iterating over all 

93 
p * s pairs, we first determine the set of primary nodes that can hold this 

94 
instance (without failing N+1), we can cut (N1) secondary placements for 

95 
each primary node removed; and since this applies at every iteration of phase 

96 
2 it linearly decreases the solution space, and on full clusters, this can 

97 
mean a fourfive times reductions of solution space 

98 
 since the number of solutions is very high even for smaller depths (on the 

99 
test data, depth=4 results in 1.8M solutions) we can't compare them at the 

100 
end, so at each iteration in phase 2 we only promote the best solution out of 

101 
our own set of solutions 

102 
 since the placement of instances can only increase the delta of the solution 

103 
(placing a new instance will add zero or more replacedisks steps), it means 

104 
the delta will only increase while recursing during phase 2; therefore, if we 

105 
know at one point that we have a current delta that is equal or higher to the 

106 
delta of the best solution so far, we can abort the recursion; this cuts a 

107 
tremendous number of branches; further promotion of the best solution from 

108 
one removal set to another can cut entire removal sets after a few recursions 

109  
110 
Command line usage 

111 
++++++++++++++++++ 

112  
113 
Synopsis:: 

114  
115 
hn1 [n NODES_FILE] [i INSTANCES_FILE] [d START_DEPTH] \ 

116 
[r MAX_REMOVALS] [l MIN_DELTA] [L MAX_DELTA] \ 

117 
[p] [C] 

118  
119 
The n and i options change the names of the input files. The d option 

120 
changes the start depth, as a higher depth can give (with a longer computation 

121 
time) a solution with better delta. The r option restricts at each depth the 

122 
number of solutions considered  with r=1000 for example even depth=10 finishes 

123 
in less than a second. 

124  
125 
The p option will show the cluster state after the solution is implemented, 

126 
while the C option will show the needed gntinstance commands to implement 

127 
it. 

128  
129 
The l (mindelta) and L (maxdelta) options restrict the solution in the 

130 
following ways: 

131  
132 
 mindelta will cause the search to abort early once we find a solution with 

133 
delta less than or equal to this parameter; this can cause extremely fast 

134 
results in case a desired solution is found quickly; the default value for 

135 
this parameter is zero, so once we find a "perfect" solution we finish early 

136 
 maxdelta causes rejection of valid solution but which have delta higher 

137 
than the value of this parameter; this can reduce the depth of the search 

138 
tree, with sometimes significant speedups; by default, this optimization is 

139 
not used 

140  
141 
Individually or combined, these two parameters can (if there are any) very 

142 
fast result; on our test data, depth=34 (max depth!) is solved in 2 seconds 

143 
with mindelta=0/maxdelta=1 (since there is such a solution), and the 

144 
extremely low maxdelta causes extreme pruning. 

145  
146 
Cluster rebalancer 

147 
 

148  
149 
Compared to the N+1 solver, the rebalancer uses a very simple algorithm: 

150 
repeatedly try to move each instance one step, so that the cluster score 

151 
becomes better. We stop when no further move can improve the score. 

152  
153 
The algorithm is divided into rounds (all identical): 

154  
155 
#. Repeat for each instance: 

156  
157 
#. Compute score after the potential failover of the instance 

158  
159 
#. For each node that is different from the current primary/secondary 

160  
161 
#. Compute score after replacing the primary with this new node 

162  
163 
#. Compute score after replacing the secondary with this new node 

164  
165  
166 
#. Out of this N*2+1 possible new scores (and their associated move) for 

167 
this instance, we choose the one that is the best in terms of cluster 

168 
score, and then proceed to the next instance 

169  
170 
Since we don't compute all combinations of moves for instances (e.g. the first 

171 
instance's all moves Cartesian product with second instance's all moves, etc.) 

172 
but we proceed serially instance A, then B, then C, the total computations we 

173 
make in one steps is simply N(number of nodes)*2+1 times I(number of instances), 

174 
instead of (N*2+1)^I. So therefore the runtime for a round is trivial. 

175  
176 
Further rounds are done, since the relocation of instances might offer better 

177 
places for instances which we didn't move, or simply didn't move to the best 

178 
place. It is possible to limit the rounds, but usually the algorithm finishes 

179 
after a few rounds by itself. 

180  
181 
Note that the cluster *must* be N+1 compliant before this algorithm is run, and 

182 
will stay at each move N+1 compliant. Therefore, the final cluster will be N+1 

183 
compliant. 

184  
185 
Singleround solutions 

186 
++++++++++++++++++++++ 

187  
188 
Singleround solutions have the very nice property that they are 

189 
incrementallyvalid. In other words, if you have a 10step solution, at each 

190 
step the cluster is both N+1 compliant and better than the previous step. 

191  
192 
This means that you can stop at any point and you will have a better cluster. 

193 
For this reason, singleround solutions are recommended in the common case of 

194 
let's make this better. Multiround solutions will be better though when adding 

195 
a couple of new, empty nodes to the cluster due to the many relocations needed. 

196  
197  
198 
Multiround solutions 

199 
+++++++++++++++++++++ 

200  
201 
A multiround solution (not for a single round), due to deduplication of moves 

202 
(i.e. just put the instance directly in its final place, and not move it five 

203 
times around) loses both these properties. It might be that it's not possible to 

204 
directly put the instance on the final nodes. So it can be possible that yes, 

205 
the cluster is happy in the final solution and nice, but you cannot do the steps 

206 
in the shown order. Solving this (via additional instance move(s)) is left to 

207 
the user. 

208  
209 
Command line usage 

210 
++++++++++++++++++ 

211  
212 
Synopsis:: 

213  
214 
hbal [n NODES_FILE] [i INSTANCES_FILE] \ 

215 
[r MAX_ROUNDS] \ 

216 
[p] [C] 

217  
218 
The n and i options change the names of the input files. The r option 

219 
restricts the maximum number of rounds (and is more of safety measure). 

220  
221 
The p option will show the cluster state after the solution is implemented, 

222 
while the C option will show the needed gntinstance commands to implement 

223 
it. 

224  
225 
Integration with Ganeti 

226 
 

227  
228 
The programs needs only the output of the node list and instance list. That is, 

229 
they need the following two commands to be run:: 

230  
231 
gntnode list oname,mtotal,mfree,dtotal,dfree,pinst_list,sinst_list \ 

232 
separator '' noheaders > nodes 

233 
gntinstance list oname,admin_ram,sda_size \ 

234 
separator '' nohead > instances 

235  
236 
These two files should be saved under the names of 'nodes' and 'instances'. 

237  
238 
When run, the programs will show some informational messages and output the 

239 
chosen solution, in the form of a list of instance name and chosen 

240 
primary/secondary nodes. The user then needs to run the necessary commands to 

241 
get the instances to live on those nodes. 

242  
243 
Note that sda_size is less than the total disk size of an instance by 4352 

244 
MiB, so if disk space is at a premium the calculation could be wrong; in this 

245 
case, please adjust the values manually. 
b/haddockprologue  

1 
This is the internal documentation for hn1, an experimental N+1 

2 
cluster solver. 

3  
4 
Start with the "Main" module, the follow with "Cluster" and then the 

5 
rest. 
b/hscolour.css  

1  
2 
.keyglyph, .layout {color: red;} 

3 
.keyword {color: blue;} 

4 
.comment, .comment a {color: green;} 

5 
.str, .chr {color: teal;} 

6 
.keyword,.conid, .varid, .conop, .varop, .num, .cpp, .sel, .definition {} 
b/src/Cluster.hs  

1 
{ Implementation of clusterwide logic. 

2  
3 
This module holds all pure clusterlogic; I\/O related functionality 

4 
goes into the "Main" module. 

5  
6 
} 

7  
8 
module Cluster 

9 
( 

10 
 * Types 

11 
NodeList 

12 
, InstanceList 

13 
, Placement 

14 
, Solution(..) 

15 
, Table(..) 

16 
, Removal 

17 
 * Generic functions 

18 
, totalResources 

19 
 * First phase functions 

20 
, computeBadItems 

21 
 * Second phase functions 

22 
, computeSolution 

23 
, applySolution 

24 
, printSolution 

25 
, printNodes 

26 
 * Balacing functions 

27 
, checkMove 

28 
, compCV 

29 
, printStats 

30 
 * Loading functions 

31 
, loadData 

32 
) where 

33  
34 
import Data.List 

35 
import Data.Maybe (isNothing, fromJust) 

36 
import Text.Printf (printf) 

37 
import Data.Function 

38  
39 
import qualified Container 

40 
import qualified Instance 

41 
import qualified Node 

42 
import Utils 

43  
44 
type NodeList = Container.Container Node.Node 

45 
type InstanceList = Container.Container Instance.Instance 

46 
type Score = Double 

47  
48 
  The description of an instance placement. 

49 
type Placement = (Int, Int, Int) 

50  
51 
{  A cluster solution described as the solution delta and the list 

52 
of placements. 

53  
54 
} 

55 
data Solution = Solution Int [Placement] 

56 
deriving (Eq, Ord, Show) 

57  
58 
  Returns the delta of a solution or 1 for Nothing 

59 
solutionDelta :: Maybe Solution > Int 

60 
solutionDelta sol = case sol of 

61 
Just (Solution d _) > d 

62 
_ > 1 

63  
64 
  A removal set. 

65 
data Removal = Removal NodeList [Instance.Instance] 

66  
67 
  An instance move definition 

68 
data IMove = Failover 

69 
 ReplacePrimary Int 

70 
 ReplaceSecondary Int 

71 
deriving (Show) 

72  
73 
  The complete state for the balancing solution 

74 
data Table = Table NodeList InstanceList Score [Placement] 

75 
deriving (Show) 

76  
77 
 General functions 

78  
79 
  Cap the removal list if needed. 

80 
capRemovals :: [a] > Int > [a] 

81 
capRemovals removals max_removals = 

82 
if max_removals > 0 then 

83 
take max_removals removals 

84 
else 

85 
removals 

86  
87 
  Check if the given node list fails the N+1 check. 

88 
verifyN1Check :: [Node.Node] > Bool 

89 
verifyN1Check nl = any Node.failN1 nl 

90  
91 
  Verifies the N+1 status and return the affected nodes. 

92 
verifyN1 :: [Node.Node] > [Node.Node] 

93 
verifyN1 nl = filter Node.failN1 nl 

94  
95 
{ Add an instance and return the new node and instance maps. } 

96 
addInstance :: NodeList > Instance.Instance > 

97 
Node.Node > Node.Node > Maybe NodeList 

98 
addInstance nl idata pri sec = 

99 
let pdx = Node.idx pri 

100 
sdx = Node.idx sec 

101 
in do 

102 
pnode < Node.addPri pri idata 

103 
snode < Node.addSec sec idata pdx 

104 
new_nl < return $ Container.addTwo sdx snode 

105 
pdx pnode nl 

106 
return new_nl 

107  
108 
  Remove an instance and return the new node and instance maps. 

109 
removeInstance :: NodeList > Instance.Instance > NodeList 

110 
removeInstance nl idata = 

111 
let pnode = Instance.pnode idata 

112 
snode = Instance.snode idata 

113 
pn = Container.find pnode nl 

114 
sn = Container.find snode nl 

115 
new_nl = Container.addTwo 

116 
pnode (Node.removePri pn idata) 

117 
snode (Node.removeSec sn idata) nl in 

118 
new_nl 

119  
120 
  Remove an instance and return the new node map. 

121 
removeInstances :: NodeList > [Instance.Instance] > NodeList 

122 
removeInstances = foldl' removeInstance 

123  
124 
  Compute the total free disk and memory in the cluster. 

125 
totalResources :: Container.Container Node.Node > (Int, Int) 

126 
totalResources nl = 

127 
foldl' 

128 
(\ (mem, disk) node > (mem + (Node.f_mem node), 

129 
disk + (Node.f_disk node))) 

130 
(0, 0) (Container.elems nl) 

131  
132 
{  Compute a new version of a cluster given a solution. 

133  
134 
This is not used for computing the solutions, but for applying a 

135 
(knowngood) solution to the original cluster for final display. 

136  
137 
It first removes the relocated instances after which it places them on 

138 
their new nodes. 

139  
140 
} 

141 
applySolution :: NodeList > InstanceList > [Placement] > NodeList 

142 
applySolution nl il sol = 

143 
let odxes = map (\ (a, b, c) > (Container.find a il, 

144 
Node.idx (Container.find b nl), 

145 
Node.idx (Container.find c nl)) 

146 
) sol 

147 
idxes = (\ (x, _, _) > x) (unzip3 odxes) 

148 
nc = removeInstances nl idxes 

149 
in 

150 
foldl' (\ nz (a, b, c) > 

151 
let new_p = Container.find b nz 

152 
new_s = Container.find c nz in 

153 
fromJust (addInstance nz a new_p new_s) 

154 
) nc odxes 

155  
156  
157 
 First phase functions 

158  
159 
{  Given a list 1,2,3..n build a list of pairs [(1, [2..n]), (2, 

160 
[3..n]), ...] 

161  
162 
} 

163 
genParts :: [a] > Int > [(a, [a])] 

164 
genParts l count = 

165 
case l of 

166 
[] > [] 

167 
x:xs > 

168 
if length l < count then 

169 
[] 

170 
else 

171 
(x, xs) : (genParts xs count) 

172  
173 
  Generates combinations of count items from the names list. 

174 
genNames :: Int > [b] > [[b]] 

175 
genNames count1 names1 = 

176 
let aux_fn count names current = 

177 
case count of 

178 
0 > [current] 

179 
_ > 

180 
concatMap 

181 
(\ (x, xs) > aux_fn (count  1) xs (x:current)) 

182 
(genParts names count) 

183 
in 

184 
aux_fn count1 names1 [] 

185  
186 
{  Computes the pair of bad nodes and instances. 

187  
188 
The bad node list is computed via a simple 'verifyN1' check, and the 

189 
bad instance list is the list of primary and secondary instances of 

190 
those nodes. 

191  
192 
} 

193 
computeBadItems :: NodeList > InstanceList > 

194 
([Node.Node], [Instance.Instance]) 

195 
computeBadItems nl il = 

196 
let bad_nodes = verifyN1 $ Container.elems nl 

197 
bad_instances = map (\idx > Container.find idx il) $ 

198 
sort $ nub $ concat $ 

199 
map (\ n > (Node.slist n) ++ (Node.plist n)) bad_nodes 

200 
in 

201 
(bad_nodes, bad_instances) 

202  
203  
204 
{  Checks if removal of instances results in N+1 pass. 

205  
206 
Note: the check removal cannot optimize by scanning only the affected 

207 
nodes, since the cluster is known to be not healthy; only the check 

208 
placement can make this shortcut. 

209  
210 
} 

211 
checkRemoval :: NodeList > [Instance.Instance] > Maybe Removal 

212 
checkRemoval nl victims = 

213 
let nx = removeInstances nl victims 

214 
failN1 = verifyN1Check (Container.elems nx) 

215 
in 

216 
if failN1 then 

217 
Nothing 

218 
else 

219 
Just $ Removal nx victims 

220  
221  
222 
  Computes the removals list for a given depth 

223 
computeRemovals :: Cluster.NodeList 

224 
> [Instance.Instance] 

225 
> Int 

226 
> [Maybe Cluster.Removal] 

227 
computeRemovals nl bad_instances depth = 

228 
map (checkRemoval nl) $ genNames depth bad_instances 

229  
230 
 Second phase functions 

231  
232 
  Singlenode relocation cost 

233 
nodeDelta :: Int > Int > Int > Int 

234 
nodeDelta i p s = 

235 
if i == p  i == s then 

236 
0 

237 
else 

238 
1 

239  
240 
{ Compute best solution. 

241  
242 
This function compares two solutions, choosing the minimum valid 

243 
solution. 

244 
} 

245 
compareSolutions :: Maybe Solution > Maybe Solution > Maybe Solution 

246 
compareSolutions a b = case (a, b) of 

247 
(Nothing, x) > x 

248 
(x, Nothing) > x 

249 
(x, y) > min x y 

250  
251 
  Compute best table. Note that the ordering of the arguments is important. 

252 
compareTables :: Table > Table > Table 

253 
compareTables a@(Table _ _ a_cv _) b@(Table _ _ b_cv _ ) = 

254 
if a_cv > b_cv then b else a 

255  
256 
  Check if a given delta is worse then an existing solution. 

257 
tooHighDelta :: Maybe Solution > Int > Int > Bool 

258 
tooHighDelta sol new_delta max_delta = 

259 
if new_delta > max_delta && max_delta >=0 then 

260 
True 

261 
else 

262 
case sol of 

263 
Nothing > False 

264 
Just (Solution old_delta _) > old_delta <= new_delta 

265  
266 
{ Check if placement of instances still keeps the cluster N+1 compliant. 

267  
268 
This is the workhorse of the allocation algorithm: given the 

269 
current node and instance maps, the list of instances to be 

270 
placed, and the current solution, this will return all possible 

271 
solution by recursing until all target instances are placed. 

272  
273 
} 

274 
checkPlacement :: NodeList  ^ The current node list 

275 
> [Instance.Instance]  ^ List of instances still to place 

276 
> [Placement]  ^ Partial solution until now 

277 
> Int  ^ The delta of the partial solution 

278 
> Maybe Solution  ^ The previous solution 

279 
> Int  ^ Abort if the we go above this delta 

280 
> Maybe Solution  ^ The new solution 

281 
checkPlacement nl victims current current_delta prev_sol max_delta = 

282 
let target = head victims 

283 
opdx = Instance.pnode target 

284 
osdx = Instance.snode target 

285 
vtail = tail victims 

286 
have_tail = (length vtail) > 0 

287 
nodes = Container.elems nl 

288 
in 

289 
foldl' 

290 
(\ accu_p pri > 

291 
let 

292 
pri_idx = Node.idx pri 

293 
upri_delta = current_delta + nodeDelta pri_idx opdx osdx 

294 
new_pri = Node.addPri pri target 

295 
fail_delta1 = tooHighDelta accu_p upri_delta max_delta 

296 
in 

297 
if fail_delta1  isNothing(new_pri) then accu_p 

298 
else let pri_nl = Container.add pri_idx (fromJust new_pri) nl in 

299 
foldl' 

300 
(\ accu sec > 

301 
let 

302 
sec_idx = Node.idx sec 

303 
upd_delta = upri_delta + 

304 
nodeDelta sec_idx opdx osdx 

305 
fail_delta2 = tooHighDelta accu upd_delta max_delta 

306 
new_sec = Node.addSec sec target pri_idx 

307 
in 

308 
if sec_idx == pri_idx  fail_delta2  

309 
isNothing new_sec then accu 

310 
else let 

311 
nx = Container.add sec_idx (fromJust new_sec) pri_nl 

312 
plc = (Instance.idx target, pri_idx, sec_idx) 

313 
c2 = plc:current 

314 
result = 

315 
if have_tail then 

316 
checkPlacement nx vtail c2 upd_delta 

317 
accu max_delta 

318 
else 

319 
Just (Solution upd_delta c2) 

320 
in compareSolutions accu result 

321 
) accu_p nodes 

322 
) prev_sol nodes 

323  
324 
  Apply a move 

325 
applyMove :: NodeList > Instance.Instance 

326 
> IMove > (Maybe NodeList, Instance.Instance, Int, Int) 

327 
applyMove nl inst Failover = 

328 
let old_pdx = Instance.pnode inst 

329 
old_sdx = Instance.snode inst 

330 
old_p = Container.find old_pdx nl 

331 
old_s = Container.find old_sdx nl 

332 
int_p = Node.removePri old_p inst 

333 
int_s = Node.removeSec old_s inst 

334 
new_p = Node.addPri int_s inst 

335 
new_s = Node.addSec int_p inst old_sdx 

336 
new_nl = if isNothing(new_p)  isNothing(new_s) then Nothing 

337 
else Just $ Container.addTwo old_pdx (fromJust new_s) 

338 
old_sdx (fromJust new_p) nl 

339 
in (new_nl, Instance.setBoth inst old_sdx old_pdx, old_sdx, old_pdx) 

340  
341 
applyMove nl inst (ReplacePrimary new_pdx) = 

342 
let old_pdx = Instance.pnode inst 

343 
old_sdx = Instance.snode inst 

344 
old_p = Container.find old_pdx nl 

345 
old_s = Container.find old_sdx nl 

346 
tgt_n = Container.find new_pdx nl 

347 
int_p = Node.removePri old_p inst 

348 
int_s = Node.removeSec old_s inst 

349 
new_p = Node.addPri tgt_n inst 

350 
new_s = Node.addSec int_s inst new_pdx 

351 
new_nl = if isNothing(new_p)  isNothing(new_s) then Nothing 

352 
else Just $ Container.add new_pdx (fromJust new_p) $ 

353 
Container.addTwo old_pdx int_p 

354 
old_sdx (fromJust new_s) nl 

355 
in (new_nl, Instance.setPri inst new_pdx, new_pdx, old_sdx) 

356  
357 
applyMove nl inst (ReplaceSecondary new_sdx) = 

358 
let old_pdx = Instance.pnode inst 

359 
old_sdx = Instance.snode inst 

360 
old_s = Container.find old_sdx nl 

361 
tgt_n = Container.find new_sdx nl 

362 
int_s = Node.removeSec old_s inst 

363 
new_s = Node.addSec tgt_n inst old_pdx 

364 
new_nl = if isNothing(new_s) then Nothing 

365 
else Just $ Container.addTwo new_sdx (fromJust new_s) 

366 
old_sdx int_s nl 

367 
in (new_nl, Instance.setSec inst new_sdx, old_pdx, new_sdx) 

368  
369 
checkSingleStep :: Table  ^ The original table 

370 
> Instance.Instance  ^ The instance to move 

371 
> Table  ^ The current best table 

372 
> IMove  ^ The move to apply 

373 
> Table  ^ The final best table 

374 
checkSingleStep ini_tbl target cur_tbl move = 

375 
let 

376 
Table ini_nl ini_il _ ini_plc = ini_tbl 

377 
(tmp_nl, new_inst, pri_idx, sec_idx) = 

378 
applyMove ini_nl target move 

379 
in 

380 
if isNothing tmp_nl then cur_tbl 

381 
else 

382 
let tgt_idx = Instance.idx target 

383 
upd_nl = fromJust tmp_nl 

384 
upd_cvar = compCV upd_nl 

385 
upd_il = Container.add tgt_idx new_inst ini_il 

386 
tmp_plc = filter (\ (t, _, _) > t /= tgt_idx) ini_plc 

387 
upd_plc = (tgt_idx, pri_idx, sec_idx):tmp_plc 

388 
upd_tbl = Table upd_nl upd_il upd_cvar upd_plc 

389 
in 

390 
compareTables cur_tbl upd_tbl 

391  
392 
  Compute the best next move. 

393 
checkMove :: Table  ^ The current solution 

394 
> [Instance.Instance]  ^ List of instances still to move 

395 
> Table  ^ The new solution 

396 
checkMove ini_tbl victims = 

397 
let target = head victims 

398 
opdx = Instance.pnode target 

399 
osdx = Instance.snode target 

400 
vtail = tail victims 

401 
have_tail = (length vtail) > 0 

402 
Table ini_nl _ _ _ = ini_tbl 

403 
nodes = filter (\node > let idx = Node.idx node 

404 
in idx /= opdx && idx /= osdx) 

405 
$ Container.elems ini_nl 

406 
aft_failover = checkSingleStep ini_tbl target ini_tbl Failover 

407 
next_tbl = 

408 
foldl' 

409 
(\ accu_p new_node > 

410 
let 

411 
new_idx = Node.idx new_node 

412 
pmoves = [ReplacePrimary new_idx, 

413 
ReplaceSecondary new_idx] 

414 
in 

415 
foldl' (checkSingleStep ini_tbl target) accu_p pmoves 

416 
) aft_failover nodes 

417 
in if have_tail then checkMove next_tbl vtail 

418 
else next_tbl 

419  
420  
421  
422 
{  Auxiliary function for solution computation. 

423  
424 
We write this in an explicit recursive fashion in order to control 

425 
earlyabort in case we have met the min delta. We can't use foldr 

426 
instead of explicit recursion since we need the accumulator for the 

427 
abort decision. 

428  
429 
} 

430 
advanceSolution :: [Maybe Removal]  ^ The removal to process 

431 
> Int  ^ Minimum delta parameter 

432 
> Int  ^ Maximum delta parameter 

433 
> Maybe Solution  ^ Current best solution 

434 
> Maybe Solution  ^ New best solution 

435 
advanceSolution [] _ _ sol = sol 

436 
advanceSolution (Nothing:xs) m n sol = advanceSolution xs m n sol 

437 
advanceSolution ((Just (Removal nx removed)):xs) min_d max_d prev_sol = 

438 
let new_sol = checkPlacement nx removed [] 0 prev_sol max_d 

439 
new_delta = solutionDelta $! new_sol 

440 
in 

441 
if new_delta >= 0 && new_delta <= min_d then 

442 
new_sol 

443 
else 

444 
advanceSolution xs min_d max_d new_sol 

445  
446 
  Computes the placement solution. 

447 
solutionFromRemovals :: [Maybe Removal]  ^ The list of (possible) removals 

448 
> Int  ^ Minimum delta parameter 

449 
> Int  ^ Maximum delta parameter 

450 
> Maybe Solution  ^ The best solution found 

451 
solutionFromRemovals removals min_delta max_delta = 

452 
advanceSolution removals min_delta max_delta Nothing 

453  
454 
{  Computes the solution at the given depth. 

455  
456 
This is a wrapper over both computeRemovals and 

457 
solutionFromRemovals. In case we have no solution, we return Nothing. 

458  
459 
} 

460 
computeSolution :: NodeList  ^ The original node data 

461 
> [Instance.Instance]  ^ The list of /bad/ instances 

462 
> Int  ^ The /depth/ of removals 

463 
> Int  ^ Maximum number of removals to process 

464 
> Int  ^ Minimum delta parameter 

465 
> Int  ^ Maximum delta parameter 

466 
> Maybe Solution  ^ The best solution found (or Nothing) 

467 
computeSolution nl bad_instances depth max_removals min_delta max_delta = 

468 
let 

469 
removals = computeRemovals nl bad_instances depth 

470 
removals' = capRemovals removals max_removals 

471 
in 

472 
solutionFromRemovals removals' min_delta max_delta 

473  
474 
 Solution display functions (pure) 

475  
476 
  Given the original and final nodes, computes the relocation description. 

477 
computeMoves :: String  ^ The instance name 

478 
> String  ^ Original primary 

479 
> String  ^ Original secondary 

480 
> String  ^ New primary 

481 
> String  ^ New secondary 

482 
> (String, [String]) 

483 
 ^ Tuple of moves and commands list; moves is containing 

484 
 either @/f/@ for failover or @/r:name/@ for replace 

485 
 secondary, while the command list holds gntinstance 

486 
 commands (without that prefix), e.g \"@failover instance1@\" 

487 
computeMoves i a b c d = 

488 
if c == a then { Same primary } 

489 
if d == b then { Same sec??! } 

490 
("", []) 

491 
else { Change of secondary } 

492 
(printf "r:%s" d, 

493 
[printf "replacedisks n %s %s" d i]) 

494 
else 

495 
if c == b then { Failover and ... } 

496 
if d == a then { that's all } 

497 
("f", [printf "failover %s" i]) 

498 
else 

499 
(printf "f r:%s" d, 

500 
[printf "failover %s" i, 

501 
printf "replacedisks n %s %s" d i]) 

502 
else 

503 
if d == a then { ... and keep primary as secondary } 

504 
(printf "r:%s f" c, 

505 
[printf "replacedisks n %s %s" c i, 

506 
printf "failover %s" i]) 

507 
else 

508 
if d == b then { ... keep same secondary } 

509 
(printf "f r:%s f" c, 

510 
[printf "failover %s" i, 

511 
printf "replacedisks n %s %s" c i, 

512 
printf "failover %s" i]) 

513  
514 
else { Nothing in common } 

515 
(printf "r:%s f r:%s" c d, 

516 
[printf "replacedisks n %s %s" c i, 

517 
printf "failover %s" i, 

518 
printf "replacedisks n %s %s" d i]) 

519  
520 
{ Converts a solution to string format } 

521 
printSolution :: InstanceList 

522 
> [(Int, String)] 

523 
> [(Int, String)] 

524 
> [Placement] 

525 
> ([String], [[String]]) 

526 
printSolution il ktn kti sol = 

527 
unzip $ map 

528 
(\ (i, p, s) > 

529 
let inst = Container.find i il 

530 
inam = fromJust $ lookup (Instance.idx inst) kti 

531 
npri = fromJust $ lookup p ktn 

532 
nsec = fromJust $ lookup s ktn 

533 
opri = fromJust $ lookup (Instance.pnode inst) ktn 

534 
osec = fromJust $ lookup (Instance.snode inst) ktn 

535 
(moves, cmds) = computeMoves inam opri osec npri nsec 

536  
537 
in 

538 
(printf " I: %s\to: %s+>%s\tn: %s+>%s\ta: %s" 

539 
inam opri osec npri nsec moves, 

540 
cmds) 

541 
) sol 

542  
543 
  Print the node list. 

544 
printNodes :: [(Int, String)] > NodeList > String 

545 
printNodes ktn nl = 

546 
let snl = sortBy (compare `on` Node.idx) (Container.elems nl) 

547 
snl' = map (\ n > ((fromJust $ lookup (Node.idx n) ktn), n)) snl 

548 
in unlines $ map (uncurry Node.list) snl' 

549  
550 
  Compute the mem and disk covariance. 

551 
compDetailedCV :: NodeList > (Double, Double) 

552 
compDetailedCV nl = 

553 
let nstats = map Node.normUsed $ Container.elems nl 

554 
(mem_l, dsk_l) = unzip nstats 

555 
mem_cv = varianceCoeff mem_l 

556 
dsk_cv = varianceCoeff dsk_l 

557 
in (mem_cv, dsk_cv) 

558  
559 
  Compute the 'total' variance. 

560 
compCV :: NodeList > Double 

561 
compCV nl = 

562 
let (mem_cv, dsk_cv) = compDetailedCV nl 

563 
in mem_cv + dsk_cv 

564  
565 
printStats :: NodeList > String 

566 
printStats nl = 

567 
let (mem_cv, dsk_cv) = compDetailedCV nl 

568 
in printf "mem=%.8f, dsk=%.8f" mem_cv dsk_cv 

569  
570 
 Balancing functions 

571  
572 
 Loading functions 

573  
574 
{  Convert newline and delimiterseparated text. 

575  
576 
This function converts a text in tabular format as generated by 

577 
@gntinstance list@ and @gntnode list@ to a list of objects using a 

578 
supplied conversion function. 

579  
580 
} 

581 
loadTabular :: String > ([String] > (String, a)) 

582 
> (a > Int > a) > ([(String, Int)], [(Int, a)]) 

583 
loadTabular text_data convert_fn set_fn = 

584 
let lines_data = lines text_data 

585 
rows = map (sepSplit '') lines_data 

586 
kerows = (map convert_fn rows) 

587 
idxrows = map (\ (idx, (k, v)) > ((k, idx), (idx, set_fn v idx))) 

588 
(zip [0..] kerows) 

589 
in unzip idxrows 

590  
591 
  Set the primary or secondary node indices on the instance list. 

592 
fixInstances :: [(Int, Node.Node)] 

593 
> (Node.Node > [Int])  ^ Either 'Node.slist' or 'Node.plist' 

594 
> (Instance.Instance > Int > Instance.Instance) 

595 
 ^ Either 'Instance.setSec' or 'Instance.setPri' 

596 
> [(Int, Instance.Instance)] 

597 
> [(Int, Instance.Instance)] 

598 
fixInstances nl list_fn set_fn il = 

599 
concat $ map 

600 
(\ (n_idx, n) > 

601 
map 

602 
(\ i_idx > 

603 
let oldi = fromJust (lookup i_idx il) 

604 
in 

605 
(i_idx, set_fn oldi n_idx) 

606 
) (list_fn n) 

607 
) nl 

608  
609 
  Splits and returns a list of indexes based on an Instance assoc list. 

610 
csi :: String > [(String, Int)] > [Int] 

611 
csi values il = 

612 
map 

613 
(\ x > fromJust (lookup x il)) 

614 
(commaSplit values) 

615  
616 
{ Initializer function that loads the data from a node and list file 

617 
and massages it into the correct format. } 

618 
loadData :: String  ^ Node data in text format 

619 
> String  ^ Instance data in text format 

620 
> (Container.Container Node.Node, 

621 
Container.Container Instance.Instance, 

622 
[(Int, String)], [(Int, String)]) 

623 
loadData ndata idata = 

624 
{ instance file: name mem disk } 

625 
let (kti, il) = loadTabular idata 

626 
(\ (i:j:k:[]) > (i, Instance.create j k)) Instance.setIdx 

627 
{ node file: name mem disk plist slist } 

628 
(ktn, nl) = loadTabular ndata 

629 
(\ (i:jt:jf:kt:kf:l:m:[]) > 

630 
(i, Node.create jt jf kt kf (csi l kti) (csi m kti))) 

631 
Node.setIdx 

632 
il2 = fixInstances nl Node.slist Instance.setSec $ 

633 
fixInstances nl Node.plist Instance.setPri il 

634 
il3 = Container.fromAssocList il2 

635 
nl3 = Container.fromAssocList 

636 
(map (\ (k, v) > (k, Node.buildPeers v il3 (length nl))) nl) 

637 
in 

638 
(nl3, il3, swapPairs ktn, swapPairs kti) 
b/src/Container.hs  

1 
{ Module abstracting the node and instance container implementation. 

2  
3 
This is currently implemented on top of an 'IntMap', which seems to 

4 
give the best performance for our workload. 

5  
6 
} 

7  
8 
module Container 

9 
( 

10 
 * Types 

11 
Container 

12 
 * Creation 

13 
, empty 

14 
, fromAssocList 

15 
 * Query 

16 
, size 

17 
, find 

18 
 * Update 

19 
, add 

20 
, addTwo 

21 
, remove 

22 
 * Conversion 

23 
, elems 

24 
) where 

25  
26 
import qualified Data.IntMap as IntMap 

27  
28 
type Key = IntMap.Key 

29 
type Container = IntMap.IntMap 

30  
31 
  Create an empty container. 

32 
empty :: Container a 

33 
empty = IntMap.empty 

34  
35 
  Returns the number of elements in the map. 

36 
size :: Container a > Int 

37 
size = IntMap.size 

38  
39 
  Locate a key in the map (must exist). 

40 
find :: Key > Container a > a 

41 
find k c = c IntMap.! k 

42  
43 
  Locate a keyin the map returning a default value if not existing. 

44 
findWithDefault :: a > Key > Container a > a 

45 
findWithDefault = IntMap.findWithDefault 

46  
47 
  Add or update one element to the map. 

48 
add :: Key > a > Container a > Container a 

49 
add k v c = IntMap.insert k v c 

50  
51 
  Remove an element from the map. 

52 
remove :: Key > Container a > Container a 

53 
remove = IntMap.delete 

54  
55 
  Return the list of values in the map. 

56 
elems :: Container a > [a] 

57 
elems = IntMap.elems 

58  
59 
  Create a map from an association list. 

60 
fromAssocList :: [(Key, a)] > Container a 

61 
fromAssocList = IntMap.fromList 

62  
63 
  Create a map from an association list with a combining function. 

64 
fromListWith :: (a > a > a) > [(Key, a)] > Container a 

65 
fromListWith = IntMap.fromListWith 

66  
67 
  Fold over the values of the map. 

68 
fold :: (a > b > b) > b > Container a > b 

69 
fold = IntMap.fold 

70  
71 
  Add or update two elements of the map. 

72 
addTwo :: Key > a > Key > a > Container a > Container a 

73 
addTwo k1 v1 k2 v2 c = add k1 v1 $ add k2 v2 c 
b/src/Instance.hs  

1 
{ Module describing an instance. 

2  
3 
The instance data type holds very few fields, the algorithm 

4 
intelligence is in the "Node" and "Cluster" modules. 

5  
6 
} 

7 
module Instance where 

8  
9 
data Instance = Instance { mem :: Int  ^ memory of the instance 

10 
, disk :: Int  ^ disk size of instance 

11 
, pnode :: Int  ^ original primary node 

12 
, snode :: Int  ^ original secondary node 

13 
, idx :: Int  ^ internal index for bookkeeping 

14 
} deriving (Show) 

15  
16 
create :: String > String > Instance 

17 
create mem_init disk_init = Instance { 

18 
mem = read mem_init, 

19 
disk = read disk_init, 

20 
pnode = 1, 

21 
snode = 1, 

22 
idx = 1 

23 
} 

24  
25 
  Changes the primary node of the instance. 

26 
setPri :: Instance  ^ the original instance 

27 
> Int  ^ the new primary node 

28 
> Instance  ^ the modified instance 

29 
setPri t p = t { pnode = p } 

30  
31 
  Changes the secondary node of the instance. 

32 
setSec :: Instance  ^ the original instance 

33 
> Int  ^ the new secondary node 

34 
> Instance  ^ the modified instance 

35 
setSec t s = t { snode = s } 

36  
37 
  Changes both nodes of the instance. 

38 
setBoth :: Instance  ^ the original instance 

39 
> Int  ^ new primary node index 

40 
> Int  ^ new secondary node index 

41 
> Instance  ^ the modified instance 

42 
setBoth t p s = t { pnode = p, snode = s } 

43  
44 
  Changes the index. 

45 
 This is used only during the building of the data structures. 

46 
setIdx :: Instance  ^ the original instance 

47 
> Int  ^ new index 

48 
> Instance  ^ the modified instance 

49 
setIdx t i = t { idx = i } 
b/src/Makefile  

1 
all: hn1 hbal 

2  
3 
hn1: 

4 
ghc make O2 W hn1 

5  
6 
hbal: 

7 
ghc make O2 W hbal 

8  
9 
clean: 

10 
rm f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \ 

11 
gmon.out *.hi README.html TAGS 

12  
13 
.PHONY : all clean hn1 hbal 
b/src/Node.hs  

1 
{ Module describing a node. 

2  
3 
All updates are functional (copybased) and return a new node with 

4 
updated value. 

5 
} 

6  
7 
module Node 

8 
( 

9 
Node(failN1, idx, f_mem, f_disk, slist, plist) 

10 
 * Constructor 

11 
, create 

12 
 ** Finalization after data loading 

13 
, buildPeers 

14 
, setIdx 

15 
 * Instance (re)location 

16 
, removePri 

17 
, removeSec 

18 
, addPri 

19 
, addSec 

20 
 * Statistics 

21 
, normUsed 

22 
 * Formatting 

23 
, list 

24 
) where 

25  
26 
import Data.List 

27 
import Text.Printf (printf) 

28  
29 
import qualified Container 

30 
import qualified Instance 

31 
import qualified PeerMap 

32  
33 
import Utils 

34 
Also available in: Unified diff