Revision e4f08c46

b/COPYING
1
		    GNU GENERAL PUBLIC LICENSE
2
		       Version 2, June 1991
3

  
4
 Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
5
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6
 Everyone is permitted to copy and distribute verbatim copies
7
 of this license document, but changing it is not allowed.
8

  
9
			    Preamble
10

  
11
  The licenses for most software are designed to take away your
12
freedom to share and change it.  By contrast, the GNU General Public
13
License is intended to guarantee your freedom to share and change free
14
software--to make sure the software is free for all its users.  This
15
General Public License applies to most of the Free Software
16
Foundation's software and to any other program whose authors commit to
17
using it.  (Some other Free Software Foundation software is covered by
18
the GNU Lesser General Public License instead.)  You can apply it to
19
your programs, too.
20

  
21
  When we speak of free software, we are referring to freedom, not
22
price.  Our General Public Licenses are designed to make sure that you
23
have the freedom to distribute copies of free software (and charge for
24
this service if you wish), that you receive source code or can get it
25
if you want it, that you can change the software or use pieces of it
26
in new free programs; and that you know you can do these things.
27

  
28
  To protect your rights, we need to make restrictions that forbid
29
anyone to deny you these rights or to ask you to surrender the rights.
30
These restrictions translate to certain responsibilities for you if you
31
distribute copies of the software, or if you modify it.
32

  
33
  For example, if you distribute copies of such a program, whether
34
gratis or for a fee, you must give the recipients all the rights that
35
you have.  You must make sure that they, too, receive or can get the
36
source code.  And you must show them these terms so they know their
37
rights.
38

  
39
  We protect your rights with two steps: (1) copyright the software, and
40
(2) offer you this license which gives you legal permission to copy,
41
distribute and/or modify the software.
42

  
43
  Also, for each author's protection and ours, we want to make certain
44
that everyone understands that there is no warranty for this free
45
software.  If the software is modified by someone else and passed on, we
46
want its recipients to know that what they have is not the original, so
47
that any problems introduced by others will not reflect on the original
48
authors' reputations.
49

  
50
  Finally, any free program is threatened constantly by software
51
patents.  We wish to avoid the danger that redistributors of a free
52
program will individually obtain patent licenses, in effect making the
53
program proprietary.  To prevent this, we have made it clear that any
54
patent must be licensed for everyone's free use or not licensed at all.
55

  
56
  The precise terms and conditions for copying, distribution and
57
modification follow.
58

  
59
		    GNU GENERAL PUBLIC LICENSE
60
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
61

  
62
  0. This License applies to any program or other work which contains
63
a notice placed by the copyright holder saying it may be distributed
64
under the terms of this General Public License.  The "Program", below,
65
refers to any such program or work, and a "work based on the Program"
66
means either the Program or any derivative work under copyright law:
67
that is to say, a work containing the Program or a portion of it,
68
either verbatim or with modifications and/or translated into another
69
language.  (Hereinafter, translation is included without limitation in
70
the term "modification".)  Each licensee is addressed as "you".
71

  
72
Activities other than copying, distribution and modification are not
73
covered by this License; they are outside its scope.  The act of
74
running the Program is not restricted, and the output from the Program
75
is covered only if its contents constitute a work based on the
76
Program (independent of having been made by running the Program).
77
Whether that is true depends on what the Program does.
78

  
79
  1. You may copy and distribute verbatim copies of the Program's
80
source code as you receive it, in any medium, provided that you
81
conspicuously and appropriately publish on each copy an appropriate
82
copyright notice and disclaimer of warranty; keep intact all the
83
notices that refer to this License and to the absence of any warranty;
84
and give any other recipients of the Program a copy of this License
85
along with the Program.
86

  
87
You may charge a fee for the physical act of transferring a copy, and
88
you may at your option offer warranty protection in exchange for a fee.
89

  
90
  2. You may modify your copy or copies of the Program or any portion
91
of it, thus forming a work based on the Program, and copy and
92
distribute such modifications or work under the terms of Section 1
93
above, provided that you also meet all of these conditions:
94

  
95
    a) You must cause the modified files to carry prominent notices
96
    stating that you changed the files and the date of any change.
97

  
98
    b) You must cause any work that you distribute or publish, that in
99
    whole or in part contains or is derived from the Program or any
100
    part thereof, to be licensed as a whole at no charge to all third
101
    parties under the terms of this License.
102

  
103
    c) If the modified program normally reads commands interactively
104
    when run, you must cause it, when started running for such
105
    interactive use in the most ordinary way, to print or display an
106
    announcement including an appropriate copyright notice and a
107
    notice that there is no warranty (or else, saying that you provide
108
    a warranty) and that users may redistribute the program under
109
    these conditions, and telling the user how to view a copy of this
110
    License.  (Exception: if the Program itself is interactive but
111
    does not normally print such an announcement, your work based on
112
    the Program is not required to print an announcement.)
113

  
114
These requirements apply to the modified work as a whole.  If
115
identifiable sections of that work are not derived from the Program,
116
and can be reasonably considered independent and separate works in
117
themselves, then this License, and its terms, do not apply to those
118
sections when you distribute them as separate works.  But when you
119
distribute the same sections as part of a whole which is a work based
120
on the Program, the distribution of the whole must be on the terms of
121
this License, whose permissions for other licensees extend to the
122
entire whole, and thus to each and every part regardless of who wrote it.
123

  
124
Thus, it is not the intent of this section to claim rights or contest
125
your rights to work written entirely by you; rather, the intent is to
126
exercise the right to control the distribution of derivative or
127
collective works based on the Program.
128

  
129
In addition, mere aggregation of another work not based on the Program
130
with the Program (or with a work based on the Program) on a volume of
131
a storage or distribution medium does not bring the other work under
132
the scope of this License.
133

  
134
  3. You may copy and distribute the Program (or a work based on it,
135
under Section 2) in object code or executable form under the terms of
136
Sections 1 and 2 above provided that you also do one of the following:
137

  
138
    a) Accompany it with the complete corresponding machine-readable
139
    source code, which must be distributed under the terms of Sections
140
    1 and 2 above on a medium customarily used for software interchange; or,
141

  
142
    b) Accompany it with a written offer, valid for at least three
143
    years, to give any third party, for a charge no more than your
144
    cost of physically performing source distribution, a complete
145
    machine-readable copy of the corresponding source code, to be
146
    distributed under the terms of Sections 1 and 2 above on a medium
147
    customarily used for software interchange; or,
148

  
149
    c) Accompany it with the information you received as to the offer
150
    to distribute corresponding source code.  (This alternative is
151
    allowed only for noncommercial distribution and only if you
152
    received the program in object code or executable form with such
153
    an offer, in accord with Subsection b above.)
154

  
155
The source code for a work means the preferred form of the work for
156
making modifications to it.  For an executable work, complete source
157
code means all the source code for all modules it contains, plus any
158
associated interface definition files, plus the scripts used to
159
control compilation and installation of the executable.  However, as a
160
special exception, the source code distributed need not include
161
anything that is normally distributed (in either source or binary
162
form) with the major components (compiler, kernel, and so on) of the
163
operating system on which the executable runs, unless that component
164
itself accompanies the executable.
165

  
166
If distribution of executable or object code is made by offering
167
access to copy from a designated place, then offering equivalent
168
access to copy the source code from the same place counts as
169
distribution of the source code, even though third parties are not
170
compelled to copy the source along with the object code.
171

  
172
  4. You may not copy, modify, sublicense, or distribute the Program
173
except as expressly provided under this License.  Any attempt
174
otherwise to copy, modify, sublicense or distribute the Program is
175
void, and will automatically terminate your rights under this License.
176
However, parties who have received copies, or rights, from you under
177
this License will not have their licenses terminated so long as such
178
parties remain in full compliance.
179

  
180
  5. You are not required to accept this License, since you have not
181
signed it.  However, nothing else grants you permission to modify or
182
distribute the Program or its derivative works.  These actions are
183
prohibited by law if you do not accept this License.  Therefore, by
184
modifying or distributing the Program (or any work based on the
185
Program), you indicate your acceptance of this License to do so, and
186
all its terms and conditions for copying, distributing or modifying
187
the Program or works based on it.
188

  
189
  6. Each time you redistribute the Program (or any work based on the
190
Program), the recipient automatically receives a license from the
191
original licensor to copy, distribute or modify the Program subject to
192
these terms and conditions.  You may not impose any further
193
restrictions on the recipients' exercise of the rights granted herein.
194
You are not responsible for enforcing compliance by third parties to
195
this License.
196

  
197
  7. If, as a consequence of a court judgment or allegation of patent
198
infringement or for any other reason (not limited to patent issues),
199
conditions are imposed on you (whether by court order, agreement or
200
otherwise) that contradict the conditions of this License, they do not
201
excuse you from the conditions of this License.  If you cannot
202
distribute so as to satisfy simultaneously your obligations under this
203
License and any other pertinent obligations, then as a consequence you
204
may not distribute the Program at all.  For example, if a patent
205
license would not permit royalty-free redistribution of the Program by
206
all those who receive copies directly or indirectly through you, then
207
the only way you could satisfy both it and this License would be to
208
refrain entirely from distribution of the Program.
209

  
210
If any portion of this section is held invalid or unenforceable under
211
any particular circumstance, the balance of the section is intended to
212
apply and the section as a whole is intended to apply in other
213
circumstances.
214

  
215
It is not the purpose of this section to induce you to infringe any
216
patents or other property right claims or to contest validity of any
217
such claims; this section has the sole purpose of protecting the
218
integrity of the free software distribution system, which is
219
implemented by public license practices.  Many people have made
220
generous contributions to the wide range of software distributed
221
through that system in reliance on consistent application of that
222
system; it is up to the author/donor to decide if he or she is willing
223
to distribute software through any other system and a licensee cannot
224
impose that choice.
225

  
226
This section is intended to make thoroughly clear what is believed to
227
be a consequence of the rest of this License.
228

  
229
  8. If the distribution and/or use of the Program is restricted in
230
certain countries either by patents or by copyrighted interfaces, the
231
original copyright holder who places the Program under this License
232
may add an explicit geographical distribution limitation excluding
233
those countries, so that distribution is permitted only in or among
234
countries not thus excluded.  In such case, this License incorporates
235
the limitation as if written in the body of this License.
236

  
237
  9. The Free Software Foundation may publish revised and/or new versions
238
of the General Public License from time to time.  Such new versions will
239
be similar in spirit to the present version, but may differ in detail to
240
address new problems or concerns.
241

  
242
Each version is given a distinguishing version number.  If the Program
243
specifies a version number of this License which applies to it and "any
244
later version", you have the option of following the terms and conditions
245
either of that version or of any later version published by the Free
246
Software Foundation.  If the Program does not specify a version number of
247
this License, you may choose any version ever published by the Free Software
248
Foundation.
249

  
250
  10. If you wish to incorporate parts of the Program into other free
251
programs whose distribution conditions are different, write to the author
252
to ask for permission.  For software which is copyrighted by the Free
253
Software Foundation, write to the Free Software Foundation; we sometimes
254
make exceptions for this.  Our decision will be guided by the two goals
255
of preserving the free status of all derivatives of our free software and
256
of promoting the sharing and reuse of software generally.
257

  
258
			    NO WARRANTY
259

  
260
  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
261
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
262
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
263
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
264
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
265
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
266
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
267
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
268
REPAIR OR CORRECTION.
269

  
270
  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
271
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
272
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
273
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
274
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
275
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
276
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
277
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
278
POSSIBILITY OF SUCH DAMAGES.
279

  
280
		     END OF TERMS AND CONDITIONS
281

  
282
	    How to Apply These Terms to Your New Programs
283

  
284
  If you develop a new program, and you want it to be of the greatest
285
possible use to the public, the best way to achieve this is to make it
286
free software which everyone can redistribute and change under these terms.
287

  
288
  To do so, attach the following notices to the program.  It is safest
289
to attach them to the start of each source file to most effectively
290
convey the exclusion of warranty; and each file should have at least
291
the "copyright" line and a pointer to where the full notice is found.
292

  
293
    <one line to give the program's name and a brief idea of what it does.>
294
    Copyright (C) <year>  <name of author>
295

  
296
    This program is free software; you can redistribute it and/or modify
297
    it under the terms of the GNU General Public License as published by
298
    the Free Software Foundation; either version 2 of the License, or
299
    (at your option) any later version.
300

  
301
    This program is distributed in the hope that it will be useful,
302
    but WITHOUT ANY WARRANTY; without even the implied warranty of
303
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
304
    GNU General Public License for more details.
305

  
306
    You should have received a copy of the GNU General Public License along
307
    with this program; if not, write to the Free Software Foundation, Inc.,
308
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
309

  
310
Also add information on how to contact you by electronic and paper mail.
311

  
312
If the program is interactive, make it output a short notice like this
313
when it starts in an interactive mode:
314

  
315
    Gnomovision version 69, Copyright (C) year name of author
316
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
317
    This is free software, and you are welcome to redistribute it
318
    under certain conditions; type `show c' for details.
319

  
320
The hypothetical commands `show w' and `show c' should show the appropriate
321
parts of the General Public License.  Of course, the commands you use may
322
be called something other than `show w' and `show c'; they could even be
323
mouse-clicks or menu items--whatever suits your program.
324

  
325
You should also get your employer (if you work as a programmer) or your
326
school, if any, to sign a "copyright disclaimer" for the program, if
327
necessary.  Here is a sample; alter the names:
328

  
329
  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
330
  `Gnomovision' (which makes passes at compilers) written by James Hacker.
331

  
332
  <signature of Ty Coon>, 1 April 1989
333
  Ty Coon, President of Vice
334

  
335
This General Public License does not permit incorporating your program into
336
proprietary programs.  If your program is a subroutine library, you may
337
consider it more useful to permit linking proprietary applications with the
338
library.  If this is what you want to do, use the GNU Lesser General
339
Public License instead of this License.
b/Makefile
1
HSRCS := $(wildcard src/*.hs)
2
HDDIR = apidoc
3

  
4
# Haskell rules
5

  
6
all:
7
	$(MAKE) -C src
8

  
9
README.html: README
10
	rst2html $< $@
11

  
12
doc: README.html
13
	rm -rf $(HDDIR)
14
	mkdir -p $(HDDIR)/src
15
	cp hscolour.css $(HDDIR)/src
16
	for file in $(HSRCS); do \
17
        HsColour -css -anchor \
18
        $$file > $(HDDIR)/src/`basename $$file .hs`.html ; \
19
    done
20
	ln -sf hn1.html $(HDDIR)/src/Main.html
21
	haddock --odir $(HDDIR) --html --ignore-all-exports \
22
	    -t hn1 -p haddock-prologue \
23
        --source-module="src/%{MODULE/.//}.html" \
24
        --source-entity="src/%{MODULE/.//}.html#%{NAME}" \
25
	    $(HSRCS)
26

  
27
clean:
28
	rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \
29
        gmon.out *.hi README.html TAGS
30

  
31
.PHONY : all doc clean hn1
b/README
1
Cluster tools (h-aneti?)
2
========================
3

  
4
These are some simple cluster tools for fixing common problems. Right now N+1
5
and rebalancing are included.
6

  
7
.. contents::
8

  
9
Cluster N+1 solver
10
------------------
11

  
12
This program runs a very simple brute force algorithm over the instance
13
placement space in order to determine the shortest number of replace-disks
14
needed to fix the cluster. Note this means we won't get a balanced cluster,
15
just one that passes N+1 checks.
16

  
17
Also note that the set of all instance placements on a 20/80 cluster is
18
(20*19)^80, that is ~10^200, so...
19

  
20
Algorithm
21
+++++++++
22

  
23
The algorithm is a simple two-phase process.
24

  
25
In phase 1 we determine the removal set, that is the set of instances that when
26
removed completely from the cluster, make it healthy again. The instances that
27
can go into the set are all the primary and secondary instances of the failing
28
nodes. The result from this phase is actually a list - we compute all sets of
29
the same minimum length.
30

  
31
So basically we aim to determine here: what is the minimum number of instances
32
that need to be removed (this is called the removal depth) and which are the
33
actual combinations that fit (called the list of removal sets). 
34

  
35
In phase 2, for each removal set computed in the previous phase, we take the
36
removed instances and try to determine where we can put them so that the
37
cluster is still passing N+1 checks. From this list of possible solutions
38
(called the list of solutions), we compute the one that has the smallest delta
39
from the original state (the delta is the number of replace disks that needs to
40
be run) and chose this as the final solution.
41

  
42
Implementation
43
++++++++++++++
44

  
45
Of course, a naive implementation based on the above description will run for
46
long periods of time, so the implementation has to be smart in order to prune
47
the solution space as eagerly as possible.
48

  
49
In the following, we use as example a set of test data (a cluster with 20
50
nodes, 80 instances that has 5 nodes failing N+1 checks for a total of 12
51
warnings).
52

  
53
On this set, the minimum depth is 4 (anything below fails), and for this depth
54
the current version of the algorithm generates 5 removal sets; a previous
55
version of the first phase generated a slightly different set of instances, with
56
two removal sets. For the original version of the algorithm:
57

  
58
- the first, non-optimized implementation computed a solution of delta=4 in 30
59
  minutes on server-class CPUs and was still running when aborted 10 minutes
60
  later
61
- the intermediate optimized version computed the whole solution space and
62
  found a delta=3 solution in around 10 seconds on a laptop-class CPU (total
63
  number of solutions ~600k)
64
- latest version on server CPUs (which actually computes more removal sets)
65
  computes depth=4 in less than a second and depth=5 in around 2 seconds, and
66
  depth=6 in less than 20 seconds; depth=8 takes under five minutes (this is
67
  10^10 bigger solution space)
68

  
69
Note that when (artificially) increasing the depth to 5 the number of removal
70
sets grows fast (~3000) and a (again artificial) depth 6 generates 61k removal
71
sets. Therefore, it is possible to restrict the number of solution sets
72
examined via a command-line option.
73

  
74
The factors that influence the run time are:
75

  
76
- the removal depth; for each increase with one of the depth, we grow the
77
  solution space by the number of nodes squared (since a new instance can live
78
  any two nodes as primary/secondary, therefore (almost) N times N); i.e.,
79
  depth=1 will create a N^2 solution space, depth two will make this N^4,
80
  depth three will be N^6, etc.
81
- the removal depth again; for each increase in the depth, there will be more
82
  valid removal sets, and the space of solutions increases linearly with the
83
  number of removal sets
84

  
85
Therefore, the smaller the depth the faster the algorithm will be; it doesn't
86
seem like this algorithm will work for clusters of 100 nodes and many many
87
small instances (e.g. 256MB instances on 16GB nodes).
88

  
89
Currently applied optimizations:
90

  
91
- when choosing where to place an instance in phase two, there are N*(N-1)
92
  possible primary/secondary options; however, if instead of iterating over all
93
  p * s pairs, we first determine the set of primary nodes that can hold this
94
  instance (without failing N+1), we can cut (N-1) secondary placements for
95
  each primary node removed; and since this applies at every iteration of phase
96
  2 it linearly decreases the solution space, and on full clusters, this can
97
  mean a four-five times reductions of solution space
98
- since the number of solutions is very high even for smaller depths (on the
99
  test data, depth=4 results in 1.8M solutions) we can't compare them at the
100
  end, so at each iteration in phase 2 we only promote the best solution out of
101
  our own set of solutions
102
- since the placement of instances can only increase the delta of the solution
103
  (placing a new instance will add zero or more replace-disks steps), it means
104
  the delta will only increase while recursing during phase 2; therefore, if we
105
  know at one point that we have a current delta that is equal or higher to the
106
  delta of the best solution so far, we can abort the recursion; this cuts a
107
  tremendous number of branches; further promotion of the best solution from
108
  one removal set to another can cut entire removal sets after a few recursions
109

  
110
Command line usage
111
++++++++++++++++++
112

  
113
Synopsis::
114

  
115
    hn1 [-n NODES_FILE] [-i INSTANCES_FILE] [-d START_DEPTH] \
116
        [-r MAX_REMOVALS] [-l MIN_DELTA] [-L MAX_DELTA] \
117
        [-p] [-C]
118

  
119
The -n and -i options change the names of the input files. The -d option
120
changes the start depth, as a higher depth can give (with a longer computation
121
time) a solution with better delta. The -r option restricts at each depth the
122
number of solutions considered - with r=1000 for example even depth=10 finishes
123
in less than a second.
124

  
125
The -p option will show the cluster state after the solution is implemented,
126
while the -C option will show the needed gnt-instance commands to implement
127
it.
128

  
129
The -l (--min-delta) and -L (--max-delta) options restrict the solution in the
130
following ways:
131

  
132
- min-delta will cause the search to abort early once we find a solution with
133
  delta less than or equal to this parameter; this can cause extremely fast
134
  results in case a desired solution is found quickly; the default value for
135
  this parameter is zero, so once we find a "perfect" solution we finish early
136
- max-delta causes rejection of valid solution but which have delta higher
137
  than the value of this parameter; this can reduce the depth of the search
138
  tree, with sometimes significant speedups; by default, this optimization is
139
  not used
140

  
141
Individually or combined, these two parameters can (if there are any) very
142
fast result; on our test data, depth=34 (max depth!) is solved in 2 seconds
143
with min-delta=0/max-delta=1 (since there is such a solution), and the
144
extremely low max-delta causes extreme pruning.
145

  
146
Cluster rebalancer
147
------------------
148

  
149
Compared to the N+1 solver, the rebalancer uses a very simple algorithm:
150
repeatedly try to move each instance one step, so that the cluster score
151
becomes better. We stop when no further move can improve the score.
152

  
153
The algorithm is divided into rounds (all identical):
154

  
155
#. Repeat for each instance:
156

  
157
    #. Compute score after the potential failover of the instance
158

  
159
    #. For each node that is different from the current primary/secondary
160

  
161
        #. Compute score after replacing the primary with this new node
162

  
163
        #. Compute score after replacing the secondary with this new node
164

  
165

  
166
    #. Out of this N*2+1 possible new scores (and their associated move) for
167
       this instance, we choose the one that is the best in terms of cluster
168
       score, and then proceed to the next instance
169

  
170
Since we don't compute all combinations of moves for instances (e.g. the first
171
instance's all moves Cartesian product with second instance's all moves, etc.)
172
but we proceed serially instance A, then B, then C, the total computations we
173
make in one steps is simply N(number of nodes)*2+1 times I(number of instances),
174
instead of (N*2+1)^I. So therefore the runtime for a round is trivial.
175

  
176
Further rounds are done, since the relocation of instances might offer better
177
places for instances which we didn't move, or simply didn't move to the best
178
place. It is possible to limit the rounds, but usually the algorithm finishes
179
after a few rounds by itself.
180

  
181
Note that the cluster *must* be N+1 compliant before this algorithm is run, and
182
will stay at each move N+1 compliant. Therefore, the final cluster will be N+1
183
compliant.
184

  
185
Single-round solutions
186
++++++++++++++++++++++
187

  
188
Single-round solutions have the very nice property that they are
189
incrementally-valid. In other words, if you have a 10-step solution, at each
190
step the cluster is both N+1 compliant and better than the previous step.
191

  
192
This means that you can stop at any point and you will have a better cluster.
193
For this reason, single-round solutions are recommended in the common case of
194
let's make this better. Multi-round solutions will be better though when adding
195
a couple of new, empty nodes to the cluster due to the many relocations needed.
196

  
197

  
198
Multi-round solutions
199
+++++++++++++++++++++
200

  
201
A multi-round solution (not for a single round), due to de-duplication of moves
202
(i.e. just put the instance directly in its final place, and not move it five
203
times around) loses both these properties. It might be that it's not possible to
204
directly put the instance on the final nodes. So it can be possible that yes,
205
the cluster is happy in the final solution and nice, but you cannot do the steps
206
in the shown order. Solving this (via additional instance move(s)) is left to
207
the user.
208

  
209
Command line usage
210
++++++++++++++++++
211

  
212
Synopsis::
213

  
214
    hbal [-n NODES_FILE] [-i INSTANCES_FILE] \
215
         [-r MAX_ROUNDS] \
216
         [-p] [-C]
217

  
218
The -n and -i options change the names of the input files. The -r option
219
restricts the maximum number of rounds (and is more of safety measure).
220

  
221
The -p option will show the cluster state after the solution is implemented,
222
while the -C option will show the needed gnt-instance commands to implement
223
it.
224

  
225
Integration with Ganeti
226
-----------------------
227

  
228
The programs needs only the output of the node list and instance list. That is,
229
they need the following two commands to be run::
230

  
231
    gnt-node list -oname,mtotal,mfree,dtotal,dfree,pinst_list,sinst_list \
232
      --separator '|' --no-headers > nodes
233
    gnt-instance list -oname,admin_ram,sda_size \
234
      --separator '|' --no-head > instances
235

  
236
These two files should be saved under the names of 'nodes' and 'instances'.
237

  
238
When run, the programs will show some informational messages and output the
239
chosen solution, in the form of a list of instance name and chosen
240
primary/secondary nodes. The user then needs to run the necessary commands to
241
get the instances to live on those nodes.
242

  
243
Note that sda_size is less than the total disk size of an instance by 4352
244
MiB, so if disk space is at a premium the calculation could be wrong; in this
245
case, please adjust the values manually.
b/haddock-prologue
1
This is the internal documentation for hn1, an experimental N+1
2
cluster solver.
3

  
4
Start with the "Main" module, the follow with "Cluster" and then the
5
rest.
b/hscolour.css
1

  
2
.keyglyph, .layout {color: red;}
3
.keyword {color: blue;}
4
.comment, .comment a {color: green;}
5
.str, .chr {color: teal;}
6
.keyword,.conid, .varid, .conop, .varop, .num, .cpp, .sel, .definition {}
b/src/Cluster.hs
1
{-| Implementation of cluster-wide logic.
2

  
3
This module holds all pure cluster-logic; I\/O related functionality
4
goes into the "Main" module.
5

  
6
-}
7

  
8
module Cluster
9
    (
10
     -- * Types
11
     NodeList
12
    , InstanceList
13
    , Placement
14
    , Solution(..)
15
    , Table(..)
16
    , Removal
17
    -- * Generic functions
18
    , totalResources
19
    -- * First phase functions
20
    , computeBadItems
21
    -- * Second phase functions
22
    , computeSolution
23
    , applySolution
24
    , printSolution
25
    , printNodes
26
    -- * Balacing functions
27
    , checkMove
28
    , compCV
29
    , printStats
30
    -- * Loading functions
31
    , loadData
32
    ) where
33

  
34
import Data.List
35
import Data.Maybe (isNothing, fromJust)
36
import Text.Printf (printf)
37
import Data.Function
38

  
39
import qualified Container
40
import qualified Instance
41
import qualified Node
42
import Utils
43

  
44
type NodeList = Container.Container Node.Node
45
type InstanceList = Container.Container Instance.Instance
46
type Score = Double
47

  
48
-- | The description of an instance placement.
49
type Placement = (Int, Int, Int)
50

  
51
{- | A cluster solution described as the solution delta and the list
52
of placements.
53

  
54
-}
55
data Solution = Solution Int [Placement]
56
                deriving (Eq, Ord, Show)
57

  
58
-- | Returns the delta of a solution or -1 for Nothing
59
solutionDelta :: Maybe Solution -> Int
60
solutionDelta sol = case sol of
61
                      Just (Solution d _) -> d
62
                      _ -> -1
63

  
64
-- | A removal set.
65
data Removal = Removal NodeList [Instance.Instance]
66

  
67
-- | An instance move definition
68
data IMove = Failover
69
           | ReplacePrimary Int
70
           | ReplaceSecondary Int
71
             deriving (Show)
72

  
73
-- | The complete state for the balancing solution
74
data Table = Table NodeList InstanceList Score [Placement]
75
             deriving (Show)
76

  
77
-- General functions
78

  
79
-- | Cap the removal list if needed.
80
capRemovals :: [a] -> Int -> [a]
81
capRemovals removals max_removals =
82
    if max_removals > 0 then
83
        take max_removals removals
84
    else
85
        removals
86

  
87
-- | Check if the given node list fails the N+1 check.
88
verifyN1Check :: [Node.Node] -> Bool
89
verifyN1Check nl = any Node.failN1 nl
90

  
91
-- | Verifies the N+1 status and return the affected nodes.
92
verifyN1 :: [Node.Node] -> [Node.Node]
93
verifyN1 nl = filter Node.failN1 nl
94

  
95
{-| Add an instance and return the new node and instance maps. -}
96
addInstance :: NodeList -> Instance.Instance ->
97
               Node.Node -> Node.Node -> Maybe NodeList
98
addInstance nl idata pri sec =
99
  let pdx = Node.idx pri
100
      sdx = Node.idx sec
101
  in do
102
      pnode <- Node.addPri pri idata
103
      snode <- Node.addSec sec idata pdx
104
      new_nl <- return $ Container.addTwo sdx snode
105
                         pdx pnode nl
106
      return new_nl
107

  
108
-- | Remove an instance and return the new node and instance maps.
109
removeInstance :: NodeList -> Instance.Instance -> NodeList
110
removeInstance nl idata =
111
  let pnode = Instance.pnode idata
112
      snode = Instance.snode idata
113
      pn = Container.find pnode nl
114
      sn = Container.find snode nl
115
      new_nl = Container.addTwo
116
               pnode (Node.removePri pn idata)
117
               snode (Node.removeSec sn idata) nl in
118
  new_nl
119

  
120
-- | Remove an instance and return the new node map.
121
removeInstances :: NodeList -> [Instance.Instance] -> NodeList
122
removeInstances = foldl' removeInstance
123

  
124
-- | Compute the total free disk and memory in the cluster.
125
totalResources :: Container.Container Node.Node -> (Int, Int)
126
totalResources nl =
127
    foldl'
128
    (\ (mem, disk) node -> (mem + (Node.f_mem node),
129
                            disk + (Node.f_disk node)))
130
    (0, 0) (Container.elems nl)
131

  
132
{- | Compute a new version of a cluster given a solution.
133

  
134
This is not used for computing the solutions, but for applying a
135
(known-good) solution to the original cluster for final display.
136

  
137
It first removes the relocated instances after which it places them on
138
their new nodes.
139

  
140
 -}
141
applySolution :: NodeList -> InstanceList -> [Placement] -> NodeList
142
applySolution nl il sol =
143
    let odxes = map (\ (a, b, c) -> (Container.find a il,
144
                                     Node.idx (Container.find b nl),
145
                                     Node.idx (Container.find c nl))
146
                    ) sol
147
        idxes = (\ (x, _, _) -> x) (unzip3 odxes)
148
        nc = removeInstances nl idxes
149
    in
150
      foldl' (\ nz (a, b, c) ->
151
                 let new_p = Container.find b nz
152
                     new_s = Container.find c nz in
153
                 fromJust (addInstance nz a new_p new_s)
154
           ) nc odxes
155

  
156

  
157
-- First phase functions
158

  
159
{- | Given a list 1,2,3..n build a list of pairs [(1, [2..n]), (2,
160
    [3..n]), ...]
161

  
162
-}
163
genParts :: [a] -> Int -> [(a, [a])]
164
genParts l count =
165
    case l of
166
      [] -> []
167
      x:xs ->
168
          if length l < count then
169
              []
170
          else
171
              (x, xs) : (genParts xs count)
172

  
173
-- | Generates combinations of count items from the names list.
174
genNames :: Int -> [b] -> [[b]]
175
genNames count1 names1 =
176
  let aux_fn count names current =
177
          case count of
178
            0 -> [current]
179
            _ ->
180
                concatMap
181
                (\ (x, xs) -> aux_fn (count - 1) xs (x:current))
182
                (genParts names count)
183
  in
184
    aux_fn count1 names1 []
185

  
186
{- | Computes the pair of bad nodes and instances.
187

  
188
The bad node list is computed via a simple 'verifyN1' check, and the
189
bad instance list is the list of primary and secondary instances of
190
those nodes.
191

  
192
-}
193
computeBadItems :: NodeList -> InstanceList ->
194
                   ([Node.Node], [Instance.Instance])
195
computeBadItems nl il =
196
  let bad_nodes = verifyN1 $ Container.elems nl
197
      bad_instances = map (\idx -> Container.find idx il) $
198
                      sort $ nub $ concat $
199
                      map (\ n -> (Node.slist n) ++ (Node.plist n)) bad_nodes
200
  in
201
    (bad_nodes, bad_instances)
202

  
203

  
204
{- | Checks if removal of instances results in N+1 pass.
205

  
206
Note: the check removal cannot optimize by scanning only the affected
207
nodes, since the cluster is known to be not healthy; only the check
208
placement can make this shortcut.
209

  
210
-}
211
checkRemoval :: NodeList -> [Instance.Instance] -> Maybe Removal
212
checkRemoval nl victims =
213
  let nx = removeInstances nl victims
214
      failN1 = verifyN1Check (Container.elems nx)
215
  in
216
    if failN1 then
217
      Nothing
218
    else
219
      Just $ Removal nx victims
220

  
221

  
222
-- | Computes the removals list for a given depth
223
computeRemovals :: Cluster.NodeList
224
                 -> [Instance.Instance]
225
                 -> Int
226
                 -> [Maybe Cluster.Removal]
227
computeRemovals nl bad_instances depth =
228
    map (checkRemoval nl) $ genNames depth bad_instances
229

  
230
-- Second phase functions
231

  
232
-- | Single-node relocation cost
233
nodeDelta :: Int -> Int -> Int -> Int
234
nodeDelta i p s =
235
    if i == p || i == s then
236
        0
237
    else
238
        1
239

  
240
{-| Compute best solution.
241

  
242
    This function compares two solutions, choosing the minimum valid
243
    solution.
244
-}
245
compareSolutions :: Maybe Solution -> Maybe Solution -> Maybe Solution
246
compareSolutions a b = case (a, b) of
247
  (Nothing, x) -> x
248
  (x, Nothing) -> x
249
  (x, y) -> min x y
250

  
251
-- | Compute best table. Note that the ordering of the arguments is important.
252
compareTables :: Table -> Table -> Table
253
compareTables a@(Table _ _ a_cv _) b@(Table _ _ b_cv _ ) =
254
    if a_cv > b_cv then b else a
255

  
256
-- | Check if a given delta is worse then an existing solution.
257
tooHighDelta :: Maybe Solution -> Int -> Int -> Bool
258
tooHighDelta sol new_delta max_delta =
259
    if new_delta > max_delta && max_delta >=0 then
260
        True
261
    else
262
        case sol of
263
          Nothing -> False
264
          Just (Solution old_delta _) -> old_delta <= new_delta
265

  
266
{-| Check if placement of instances still keeps the cluster N+1 compliant.
267

  
268
    This is the workhorse of the allocation algorithm: given the
269
    current node and instance maps, the list of instances to be
270
    placed, and the current solution, this will return all possible
271
    solution by recursing until all target instances are placed.
272

  
273
-}
274
checkPlacement :: NodeList            -- ^ The current node list
275
               -> [Instance.Instance] -- ^ List of instances still to place
276
               -> [Placement]         -- ^ Partial solution until now
277
               -> Int                 -- ^ The delta of the partial solution
278
               -> Maybe Solution      -- ^ The previous solution
279
               -> Int                 -- ^ Abort if the we go above this delta
280
               -> Maybe Solution      -- ^ The new solution
281
checkPlacement nl victims current current_delta prev_sol max_delta =
282
  let target = head victims
283
      opdx = Instance.pnode target
284
      osdx = Instance.snode target
285
      vtail = tail victims
286
      have_tail = (length vtail) > 0
287
      nodes = Container.elems nl
288
  in
289
    foldl'
290
    (\ accu_p pri ->
291
         let
292
             pri_idx = Node.idx pri
293
             upri_delta = current_delta + nodeDelta pri_idx opdx osdx
294
             new_pri = Node.addPri pri target
295
             fail_delta1 = tooHighDelta accu_p upri_delta max_delta
296
         in
297
           if fail_delta1 || isNothing(new_pri) then accu_p
298
           else let pri_nl = Container.add pri_idx (fromJust new_pri) nl in
299
                foldl'
300
                (\ accu sec ->
301
                     let
302
                         sec_idx = Node.idx sec
303
                         upd_delta = upri_delta +
304
                                     nodeDelta sec_idx opdx osdx
305
                         fail_delta2 = tooHighDelta accu upd_delta max_delta
306
                         new_sec = Node.addSec sec target pri_idx
307
                     in
308
                       if sec_idx == pri_idx || fail_delta2 ||
309
                          isNothing new_sec then accu
310
                       else let
311
                           nx = Container.add sec_idx (fromJust new_sec) pri_nl
312
                           plc = (Instance.idx target, pri_idx, sec_idx)
313
                           c2 = plc:current
314
                           result =
315
                               if have_tail then
316
                                   checkPlacement nx vtail c2 upd_delta
317
                                                  accu max_delta
318
                               else
319
                                   Just (Solution upd_delta c2)
320
                      in compareSolutions accu result
321
                ) accu_p nodes
322
    ) prev_sol nodes
323

  
324
-- | Apply a move
325
applyMove :: NodeList -> Instance.Instance
326
          -> IMove -> (Maybe NodeList, Instance.Instance, Int, Int)
327
applyMove nl inst Failover =
328
    let old_pdx = Instance.pnode inst
329
        old_sdx = Instance.snode inst
330
        old_p = Container.find old_pdx nl
331
        old_s = Container.find old_sdx nl
332
        int_p = Node.removePri old_p inst
333
        int_s = Node.removeSec old_s inst
334
        new_p = Node.addPri int_s inst
335
        new_s = Node.addSec int_p inst old_sdx
336
        new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing
337
                 else Just $ Container.addTwo old_pdx (fromJust new_s)
338
                      old_sdx (fromJust new_p) nl
339
    in (new_nl, Instance.setBoth inst old_sdx old_pdx, old_sdx, old_pdx)
340

  
341
applyMove nl inst (ReplacePrimary new_pdx) =
342
    let old_pdx = Instance.pnode inst
343
        old_sdx = Instance.snode inst
344
        old_p = Container.find old_pdx nl
345
        old_s = Container.find old_sdx nl
346
        tgt_n = Container.find new_pdx nl
347
        int_p = Node.removePri old_p inst
348
        int_s = Node.removeSec old_s inst
349
        new_p = Node.addPri tgt_n inst
350
        new_s = Node.addSec int_s inst new_pdx
351
        new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing
352
                 else Just $ Container.add new_pdx (fromJust new_p) $
353
                      Container.addTwo old_pdx int_p
354
                               old_sdx (fromJust new_s) nl
355
    in (new_nl, Instance.setPri inst new_pdx, new_pdx, old_sdx)
356

  
357
applyMove nl inst (ReplaceSecondary new_sdx) =
358
    let old_pdx = Instance.pnode inst
359
        old_sdx = Instance.snode inst
360
        old_s = Container.find old_sdx nl
361
        tgt_n = Container.find new_sdx nl
362
        int_s = Node.removeSec old_s inst
363
        new_s = Node.addSec tgt_n inst old_pdx
364
        new_nl = if isNothing(new_s) then Nothing
365
                 else Just $ Container.addTwo new_sdx (fromJust new_s)
366
                      old_sdx int_s nl
367
    in (new_nl, Instance.setSec inst new_sdx, old_pdx, new_sdx)
368

  
369
checkSingleStep :: Table -- ^ The original table
370
                -> Instance.Instance -- ^ The instance to move
371
                -> Table -- ^ The current best table
372
                -> IMove -- ^ The move to apply
373
                -> Table -- ^ The final best table
374
checkSingleStep ini_tbl target cur_tbl move =
375
    let
376
        Table ini_nl ini_il _ ini_plc = ini_tbl
377
        (tmp_nl, new_inst, pri_idx, sec_idx) =
378
            applyMove ini_nl target move
379
    in
380
      if isNothing tmp_nl then cur_tbl
381
      else
382
          let tgt_idx = Instance.idx target
383
              upd_nl = fromJust tmp_nl
384
              upd_cvar = compCV upd_nl
385
              upd_il = Container.add tgt_idx new_inst ini_il
386
              tmp_plc = filter (\ (t, _, _) -> t /= tgt_idx) ini_plc
387
              upd_plc = (tgt_idx, pri_idx, sec_idx):tmp_plc
388
              upd_tbl = Table upd_nl upd_il upd_cvar upd_plc
389
          in
390
            compareTables cur_tbl upd_tbl
391

  
392
-- | Compute the best next move.
393
checkMove :: Table            -- ^ The current solution
394
          -> [Instance.Instance] -- ^ List of instances still to move
395
          -> Table            -- ^ The new solution
396
checkMove ini_tbl victims =
397
  let target = head victims
398
      opdx = Instance.pnode target
399
      osdx = Instance.snode target
400
      vtail = tail victims
401
      have_tail = (length vtail) > 0
402
      Table ini_nl _ _ _ = ini_tbl
403
      nodes = filter (\node -> let idx = Node.idx node
404
                               in idx /= opdx && idx /= osdx)
405
              $ Container.elems ini_nl
406
      aft_failover = checkSingleStep ini_tbl target ini_tbl Failover
407
      next_tbl =
408
          foldl'
409
          (\ accu_p new_node ->
410
               let
411
                   new_idx = Node.idx new_node
412
                   pmoves = [ReplacePrimary new_idx,
413
                             ReplaceSecondary new_idx]
414
               in
415
                 foldl' (checkSingleStep ini_tbl target) accu_p pmoves
416
          ) aft_failover nodes
417
  in if have_tail then checkMove next_tbl vtail
418
     else next_tbl
419

  
420

  
421

  
422
{- | Auxiliary function for solution computation.
423

  
424
We write this in an explicit recursive fashion in order to control
425
early-abort in case we have met the min delta. We can't use foldr
426
instead of explicit recursion since we need the accumulator for the
427
abort decision.
428

  
429
-}
430
advanceSolution :: [Maybe Removal] -- ^ The removal to process
431
                -> Int             -- ^ Minimum delta parameter
432
                -> Int             -- ^ Maximum delta parameter
433
                -> Maybe Solution  -- ^ Current best solution
434
                -> Maybe Solution  -- ^ New best solution
435
advanceSolution [] _ _ sol = sol
436
advanceSolution (Nothing:xs) m n sol = advanceSolution xs m n sol
437
advanceSolution ((Just (Removal nx removed)):xs) min_d max_d prev_sol =
438
    let new_sol = checkPlacement nx removed [] 0 prev_sol max_d
439
        new_delta = solutionDelta $! new_sol
440
    in
441
      if new_delta >= 0 && new_delta <= min_d then
442
          new_sol
443
      else
444
          advanceSolution xs min_d max_d new_sol
445

  
446
-- | Computes the placement solution.
447
solutionFromRemovals :: [Maybe Removal] -- ^ The list of (possible) removals
448
                     -> Int             -- ^ Minimum delta parameter
449
                     -> Int             -- ^ Maximum delta parameter
450
                     -> Maybe Solution  -- ^ The best solution found
451
solutionFromRemovals removals min_delta max_delta =
452
    advanceSolution removals min_delta max_delta Nothing
453

  
454
{- | Computes the solution at the given depth.
455

  
456
This is a wrapper over both computeRemovals and
457
solutionFromRemovals. In case we have no solution, we return Nothing.
458

  
459
-}
460
computeSolution :: NodeList        -- ^ The original node data
461
                -> [Instance.Instance] -- ^ The list of /bad/ instances
462
                -> Int             -- ^ The /depth/ of removals
463
                -> Int             -- ^ Maximum number of removals to process
464
                -> Int             -- ^ Minimum delta parameter
465
                -> Int             -- ^ Maximum delta parameter
466
                -> Maybe Solution  -- ^ The best solution found (or Nothing)
467
computeSolution nl bad_instances depth max_removals min_delta max_delta =
468
  let
469
      removals = computeRemovals nl bad_instances depth
470
      removals' = capRemovals removals max_removals
471
  in
472
    solutionFromRemovals removals' min_delta max_delta
473

  
474
-- Solution display functions (pure)
475

  
476
-- | Given the original and final nodes, computes the relocation description.
477
computeMoves :: String -- ^ The instance name
478
             -> String -- ^ Original primary
479
             -> String -- ^ Original secondary
480
             -> String -- ^ New primary
481
             -> String -- ^ New secondary
482
             -> (String, [String])
483
                -- ^ Tuple of moves and commands list; moves is containing
484
                -- either @/f/@ for failover or @/r:name/@ for replace
485
                -- secondary, while the command list holds gnt-instance
486
                -- commands (without that prefix), e.g \"@failover instance1@\"
487
computeMoves i a b c d =
488
    if c == a then {- Same primary -}
489
        if d == b then {- Same sec??! -}
490
            ("-", [])
491
        else {- Change of secondary -}
492
            (printf "r:%s" d,
493
             [printf "replace-disks -n %s %s" d i])
494
    else
495
        if c == b then {- Failover and ... -}
496
            if d == a then {- that's all -}
497
                ("f", [printf "failover %s" i])
498
            else
499
                (printf "f r:%s" d,
500
                 [printf "failover %s" i,
501
                  printf "replace-disks -n %s %s" d i])
502
        else
503
            if d == a then {- ... and keep primary as secondary -}
504
                (printf "r:%s f" c,
505
                 [printf "replace-disks -n %s %s" c i,
506
                  printf "failover %s" i])
507
            else
508
                if d == b then {- ... keep same secondary -}
509
                    (printf "f r:%s f" c,
510
                     [printf "failover %s" i,
511
                      printf "replace-disks -n %s %s" c i,
512
                      printf "failover %s" i])
513

  
514
                else {- Nothing in common -}
515
                    (printf "r:%s f r:%s" c d,
516
                     [printf "replace-disks -n %s %s" c i,
517
                      printf "failover %s" i,
518
                      printf "replace-disks -n %s %s" d i])
519

  
520
{-| Converts a solution to string format -}
521
printSolution :: InstanceList
522
              -> [(Int, String)]
523
              -> [(Int, String)]
524
              -> [Placement]
525
              -> ([String], [[String]])
526
printSolution il ktn kti sol =
527
  unzip $ map
528
    (\ (i, p, s) ->
529
       let inst = Container.find i il
530
           inam = fromJust $ lookup (Instance.idx inst) kti
531
           npri = fromJust $ lookup p ktn
532
           nsec = fromJust $ lookup s ktn
533
           opri = fromJust $ lookup (Instance.pnode inst) ktn
534
           osec = fromJust $ lookup (Instance.snode inst) ktn
535
           (moves, cmds) =  computeMoves inam opri osec npri nsec
536

  
537
       in
538
         (printf "  I: %s\to: %s+>%s\tn: %s+>%s\ta: %s"
539
                 inam opri osec npri nsec moves,
540
          cmds)
541
    ) sol
542

  
543
-- | Print the node list.
544
printNodes :: [(Int, String)] -> NodeList -> String
545
printNodes ktn nl =
546
    let snl = sortBy (compare `on` Node.idx) (Container.elems nl)
547
        snl' = map (\ n -> ((fromJust $ lookup (Node.idx n) ktn), n)) snl
548
    in unlines $ map (uncurry Node.list) snl'
549

  
550
-- | Compute the mem and disk covariance.
551
compDetailedCV :: NodeList -> (Double, Double)
552
compDetailedCV nl =
553
    let nstats = map Node.normUsed $ Container.elems nl
554
        (mem_l, dsk_l) = unzip nstats
555
        mem_cv = varianceCoeff mem_l
556
        dsk_cv = varianceCoeff dsk_l
557
    in (mem_cv, dsk_cv)
558

  
559
-- | Compute the 'total' variance.
560
compCV :: NodeList -> Double
561
compCV nl =
562
    let (mem_cv, dsk_cv) = compDetailedCV nl
563
    in mem_cv + dsk_cv
564

  
565
printStats :: NodeList -> String
566
printStats nl =
567
    let (mem_cv, dsk_cv) = compDetailedCV nl
568
    in printf "mem=%.8f, dsk=%.8f" mem_cv dsk_cv
569

  
570
-- Balancing functions
571

  
572
-- Loading functions
573

  
574
{- | Convert newline and delimiter-separated text.
575

  
576
This function converts a text in tabular format as generated by
577
@gnt-instance list@ and @gnt-node list@ to a list of objects using a
578
supplied conversion function.
579

  
580
-}
581
loadTabular :: String -> ([String] -> (String, a))
582
            -> (a -> Int -> a) -> ([(String, Int)], [(Int, a)])
583
loadTabular text_data convert_fn set_fn =
584
    let lines_data = lines text_data
585
        rows = map (sepSplit '|') lines_data
586
        kerows = (map convert_fn rows)
587
        idxrows = map (\ (idx, (k, v)) -> ((k, idx), (idx, set_fn v idx)))
588
                  (zip [0..] kerows)
589
    in unzip idxrows
590

  
591
-- | Set the primary or secondary node indices on the instance list.
592
fixInstances :: [(Int, Node.Node)]
593
             -> (Node.Node -> [Int]) -- ^ Either 'Node.slist' or 'Node.plist'
594
             -> (Instance.Instance -> Int -> Instance.Instance)
595
                -- ^ Either 'Instance.setSec' or 'Instance.setPri'
596
             -> [(Int, Instance.Instance)]
597
             -> [(Int, Instance.Instance)]
598
fixInstances nl list_fn set_fn il =
599
    concat $ map
600
               (\ (n_idx, n) ->
601
                   map
602
                   (\ i_idx ->
603
                        let oldi = fromJust (lookup i_idx il)
604
                        in
605
                          (i_idx, set_fn oldi n_idx)
606
                   ) (list_fn n)
607
              ) nl
608

  
609
-- | Splits and returns a list of indexes based on an Instance assoc list.
610
csi :: String -> [(String, Int)] -> [Int]
611
csi values il =
612
    map
613
    (\ x -> fromJust (lookup x il))
614
    (commaSplit values)
615

  
616
{-| Initializer function that loads the data from a node and list file
617
    and massages it into the correct format. -}
618
loadData :: String -- ^ Node data in text format
619
         -> String -- ^ Instance data in text format
620
         -> (Container.Container Node.Node,
621
             Container.Container Instance.Instance,
622
             [(Int, String)], [(Int, String)])
623
loadData ndata idata =
624
    {- instance file: name mem disk -}
625
    let (kti, il) = loadTabular idata
626
                    (\ (i:j:k:[]) -> (i, Instance.create j k)) Instance.setIdx
627
    {- node file: name mem disk plist slist -}
628
        (ktn, nl) = loadTabular ndata
629
                    (\ (i:jt:jf:kt:kf:l:m:[]) ->
630
                         (i, Node.create jt jf kt kf (csi l kti) (csi m kti)))
631
                    Node.setIdx
632
        il2 = fixInstances nl Node.slist Instance.setSec $
633
              fixInstances nl Node.plist Instance.setPri il
634
        il3 = Container.fromAssocList il2
635
        nl3 = Container.fromAssocList
636
             (map (\ (k, v) -> (k, Node.buildPeers v il3 (length nl))) nl)
637
    in
638
      (nl3, il3, swapPairs ktn, swapPairs kti)
b/src/Container.hs
1
{-| Module abstracting the node and instance container implementation.
2

  
3
This is currently implemented on top of an 'IntMap', which seems to
4
give the best performance for our workload.
5

  
6
-}
7

  
8
module Container
9
    (
10
     -- * Types
11
     Container
12
     -- * Creation
13
    , empty
14
    , fromAssocList
15
     -- * Query
16
    , size
17
    , find
18
     -- * Update
19
    , add
20
    , addTwo
21
    , remove
22
    -- * Conversion
23
    , elems
24
    ) where
25

  
26
import qualified Data.IntMap as IntMap
27

  
28
type Key = IntMap.Key
29
type Container = IntMap.IntMap
30

  
31
-- | Create an empty container.
32
empty :: Container a
33
empty = IntMap.empty
34

  
35
-- | Returns the number of elements in the map.
36
size :: Container a -> Int
37
size = IntMap.size
38

  
39
-- | Locate a key in the map (must exist).
40
find :: Key -> Container a -> a
41
find k c = c IntMap.! k
42

  
43
-- | Locate a keyin the map returning a default value if not existing.
44
findWithDefault :: a -> Key -> Container a -> a
45
findWithDefault = IntMap.findWithDefault
46

  
47
-- | Add or update one element to the map.
48
add :: Key -> a -> Container a -> Container a
49
add k v c = IntMap.insert k v c
50

  
51
-- | Remove an element from the map.
52
remove :: Key -> Container a -> Container a
53
remove = IntMap.delete
54

  
55
-- | Return the list of values in the map.
56
elems :: Container a -> [a]
57
elems = IntMap.elems
58

  
59
-- | Create a map from an association list.
60
fromAssocList :: [(Key, a)] -> Container a
61
fromAssocList = IntMap.fromList
62

  
63
-- | Create a map from an association list with a combining function.
64
fromListWith :: (a -> a -> a) -> [(Key, a)] -> Container a
65
fromListWith = IntMap.fromListWith
66

  
67
-- | Fold over the values of the map.
68
fold :: (a -> b -> b) -> b -> Container a -> b
69
fold = IntMap.fold
70

  
71
-- | Add or update two elements of the map.
72
addTwo :: Key -> a -> Key -> a -> Container a -> Container a
73
addTwo k1 v1 k2 v2 c = add k1 v1 $ add k2 v2 c
b/src/Instance.hs
1
{-| Module describing an instance.
2

  
3
The instance data type holds very few fields, the algorithm
4
intelligence is in the "Node" and "Cluster" modules.
5

  
6
-}
7
module Instance where
8

  
9
data Instance = Instance { mem :: Int -- ^ memory of the instance
10
                         , disk :: Int -- ^ disk size of instance
11
                         , pnode :: Int -- ^ original primary node
12
                         , snode :: Int -- ^ original secondary node
13
                         , idx :: Int -- ^ internal index for book-keeping
14
                         } deriving (Show)
15

  
16
create :: String -> String -> Instance
17
create mem_init disk_init = Instance {
18
                              mem = read mem_init,
19
                              disk = read disk_init,
20
                              pnode = -1,
21
                              snode = -1,
22
                              idx = -1
23
                            }
24

  
25
-- | Changes the primary node of the instance.
26
setPri :: Instance -- ^ the original instance
27
        -> Int -- ^ the new primary node
28
        -> Instance -- ^ the modified instance
29
setPri t p = t { pnode = p }
30

  
31
-- | Changes the secondary node of the instance.
32
setSec :: Instance -- ^ the original instance
33
        -> Int  -- ^ the new secondary node
34
        -> Instance -- ^ the modified instance
35
setSec t s = t { snode = s }
36

  
37
-- | Changes both nodes of the instance.
38
setBoth :: Instance -- ^ the original instance
39
         -> Int -- ^ new primary node index
40
         -> Int -- ^ new secondary node index
41
         -> Instance -- ^ the modified instance
42
setBoth t p s = t { pnode = p, snode = s }
43

  
44
-- | Changes the index.
45
-- This is used only during the building of the data structures.
46
setIdx :: Instance -- ^ the original instance
47
        -> Int -- ^ new index
48
        -> Instance -- ^ the modified instance
49
setIdx t i = t { idx = i }
b/src/Makefile
1
all: hn1 hbal
2

  
3
hn1:
4
	ghc --make -O2 -W hn1
5

  
6
hbal:
7
	ghc --make -O2 -W hbal
8

  
9
clean:
10
	rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \
11
        gmon.out *.hi README.html TAGS
12

  
13
.PHONY : all clean hn1 hbal
b/src/Node.hs
1
{-| Module describing a node.
2

  
3
    All updates are functional (copy-based) and return a new node with
4
    updated value.
5
-}
6

  
7
module Node
8
    (
9
      Node(failN1, idx, f_mem, f_disk, slist, plist)
10
    -- * Constructor
11
    , create
12
    -- ** Finalization after data loading
13
    , buildPeers
14
    , setIdx
15
    -- * Instance (re)location
16
    , removePri
17
    , removeSec
18
    , addPri
19
    , addSec
20
    -- * Statistics
21
    , normUsed
22
    -- * Formatting
23
    , list
24
    ) where
25

  
26
import Data.List
27
import Text.Printf (printf)
28

  
29
import qualified Container
30
import qualified Instance
31
import qualified PeerMap
32

  
33
import Utils
34

  
... This diff was truncated because it exceeds the maximum size that can be displayed.

Also available in: Unified diff