Revision e4f08c46
b/COPYING | ||
---|---|---|
1 |
GNU GENERAL PUBLIC LICENSE |
|
2 |
Version 2, June 1991 |
|
3 |
|
|
4 |
Copyright (C) 1989, 1991 Free Software Foundation, Inc., |
|
5 |
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
|
6 |
Everyone is permitted to copy and distribute verbatim copies |
|
7 |
of this license document, but changing it is not allowed. |
|
8 |
|
|
9 |
Preamble |
|
10 |
|
|
11 |
The licenses for most software are designed to take away your |
|
12 |
freedom to share and change it. By contrast, the GNU General Public |
|
13 |
License is intended to guarantee your freedom to share and change free |
|
14 |
software--to make sure the software is free for all its users. This |
|
15 |
General Public License applies to most of the Free Software |
|
16 |
Foundation's software and to any other program whose authors commit to |
|
17 |
using it. (Some other Free Software Foundation software is covered by |
|
18 |
the GNU Lesser General Public License instead.) You can apply it to |
|
19 |
your programs, too. |
|
20 |
|
|
21 |
When we speak of free software, we are referring to freedom, not |
|
22 |
price. Our General Public Licenses are designed to make sure that you |
|
23 |
have the freedom to distribute copies of free software (and charge for |
|
24 |
this service if you wish), that you receive source code or can get it |
|
25 |
if you want it, that you can change the software or use pieces of it |
|
26 |
in new free programs; and that you know you can do these things. |
|
27 |
|
|
28 |
To protect your rights, we need to make restrictions that forbid |
|
29 |
anyone to deny you these rights or to ask you to surrender the rights. |
|
30 |
These restrictions translate to certain responsibilities for you if you |
|
31 |
distribute copies of the software, or if you modify it. |
|
32 |
|
|
33 |
For example, if you distribute copies of such a program, whether |
|
34 |
gratis or for a fee, you must give the recipients all the rights that |
|
35 |
you have. You must make sure that they, too, receive or can get the |
|
36 |
source code. And you must show them these terms so they know their |
|
37 |
rights. |
|
38 |
|
|
39 |
We protect your rights with two steps: (1) copyright the software, and |
|
40 |
(2) offer you this license which gives you legal permission to copy, |
|
41 |
distribute and/or modify the software. |
|
42 |
|
|
43 |
Also, for each author's protection and ours, we want to make certain |
|
44 |
that everyone understands that there is no warranty for this free |
|
45 |
software. If the software is modified by someone else and passed on, we |
|
46 |
want its recipients to know that what they have is not the original, so |
|
47 |
that any problems introduced by others will not reflect on the original |
|
48 |
authors' reputations. |
|
49 |
|
|
50 |
Finally, any free program is threatened constantly by software |
|
51 |
patents. We wish to avoid the danger that redistributors of a free |
|
52 |
program will individually obtain patent licenses, in effect making the |
|
53 |
program proprietary. To prevent this, we have made it clear that any |
|
54 |
patent must be licensed for everyone's free use or not licensed at all. |
|
55 |
|
|
56 |
The precise terms and conditions for copying, distribution and |
|
57 |
modification follow. |
|
58 |
|
|
59 |
GNU GENERAL PUBLIC LICENSE |
|
60 |
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION |
|
61 |
|
|
62 |
0. This License applies to any program or other work which contains |
|
63 |
a notice placed by the copyright holder saying it may be distributed |
|
64 |
under the terms of this General Public License. The "Program", below, |
|
65 |
refers to any such program or work, and a "work based on the Program" |
|
66 |
means either the Program or any derivative work under copyright law: |
|
67 |
that is to say, a work containing the Program or a portion of it, |
|
68 |
either verbatim or with modifications and/or translated into another |
|
69 |
language. (Hereinafter, translation is included without limitation in |
|
70 |
the term "modification".) Each licensee is addressed as "you". |
|
71 |
|
|
72 |
Activities other than copying, distribution and modification are not |
|
73 |
covered by this License; they are outside its scope. The act of |
|
74 |
running the Program is not restricted, and the output from the Program |
|
75 |
is covered only if its contents constitute a work based on the |
|
76 |
Program (independent of having been made by running the Program). |
|
77 |
Whether that is true depends on what the Program does. |
|
78 |
|
|
79 |
1. You may copy and distribute verbatim copies of the Program's |
|
80 |
source code as you receive it, in any medium, provided that you |
|
81 |
conspicuously and appropriately publish on each copy an appropriate |
|
82 |
copyright notice and disclaimer of warranty; keep intact all the |
|
83 |
notices that refer to this License and to the absence of any warranty; |
|
84 |
and give any other recipients of the Program a copy of this License |
|
85 |
along with the Program. |
|
86 |
|
|
87 |
You may charge a fee for the physical act of transferring a copy, and |
|
88 |
you may at your option offer warranty protection in exchange for a fee. |
|
89 |
|
|
90 |
2. You may modify your copy or copies of the Program or any portion |
|
91 |
of it, thus forming a work based on the Program, and copy and |
|
92 |
distribute such modifications or work under the terms of Section 1 |
|
93 |
above, provided that you also meet all of these conditions: |
|
94 |
|
|
95 |
a) You must cause the modified files to carry prominent notices |
|
96 |
stating that you changed the files and the date of any change. |
|
97 |
|
|
98 |
b) You must cause any work that you distribute or publish, that in |
|
99 |
whole or in part contains or is derived from the Program or any |
|
100 |
part thereof, to be licensed as a whole at no charge to all third |
|
101 |
parties under the terms of this License. |
|
102 |
|
|
103 |
c) If the modified program normally reads commands interactively |
|
104 |
when run, you must cause it, when started running for such |
|
105 |
interactive use in the most ordinary way, to print or display an |
|
106 |
announcement including an appropriate copyright notice and a |
|
107 |
notice that there is no warranty (or else, saying that you provide |
|
108 |
a warranty) and that users may redistribute the program under |
|
109 |
these conditions, and telling the user how to view a copy of this |
|
110 |
License. (Exception: if the Program itself is interactive but |
|
111 |
does not normally print such an announcement, your work based on |
|
112 |
the Program is not required to print an announcement.) |
|
113 |
|
|
114 |
These requirements apply to the modified work as a whole. If |
|
115 |
identifiable sections of that work are not derived from the Program, |
|
116 |
and can be reasonably considered independent and separate works in |
|
117 |
themselves, then this License, and its terms, do not apply to those |
|
118 |
sections when you distribute them as separate works. But when you |
|
119 |
distribute the same sections as part of a whole which is a work based |
|
120 |
on the Program, the distribution of the whole must be on the terms of |
|
121 |
this License, whose permissions for other licensees extend to the |
|
122 |
entire whole, and thus to each and every part regardless of who wrote it. |
|
123 |
|
|
124 |
Thus, it is not the intent of this section to claim rights or contest |
|
125 |
your rights to work written entirely by you; rather, the intent is to |
|
126 |
exercise the right to control the distribution of derivative or |
|
127 |
collective works based on the Program. |
|
128 |
|
|
129 |
In addition, mere aggregation of another work not based on the Program |
|
130 |
with the Program (or with a work based on the Program) on a volume of |
|
131 |
a storage or distribution medium does not bring the other work under |
|
132 |
the scope of this License. |
|
133 |
|
|
134 |
3. You may copy and distribute the Program (or a work based on it, |
|
135 |
under Section 2) in object code or executable form under the terms of |
|
136 |
Sections 1 and 2 above provided that you also do one of the following: |
|
137 |
|
|
138 |
a) Accompany it with the complete corresponding machine-readable |
|
139 |
source code, which must be distributed under the terms of Sections |
|
140 |
1 and 2 above on a medium customarily used for software interchange; or, |
|
141 |
|
|
142 |
b) Accompany it with a written offer, valid for at least three |
|
143 |
years, to give any third party, for a charge no more than your |
|
144 |
cost of physically performing source distribution, a complete |
|
145 |
machine-readable copy of the corresponding source code, to be |
|
146 |
distributed under the terms of Sections 1 and 2 above on a medium |
|
147 |
customarily used for software interchange; or, |
|
148 |
|
|
149 |
c) Accompany it with the information you received as to the offer |
|
150 |
to distribute corresponding source code. (This alternative is |
|
151 |
allowed only for noncommercial distribution and only if you |
|
152 |
received the program in object code or executable form with such |
|
153 |
an offer, in accord with Subsection b above.) |
|
154 |
|
|
155 |
The source code for a work means the preferred form of the work for |
|
156 |
making modifications to it. For an executable work, complete source |
|
157 |
code means all the source code for all modules it contains, plus any |
|
158 |
associated interface definition files, plus the scripts used to |
|
159 |
control compilation and installation of the executable. However, as a |
|
160 |
special exception, the source code distributed need not include |
|
161 |
anything that is normally distributed (in either source or binary |
|
162 |
form) with the major components (compiler, kernel, and so on) of the |
|
163 |
operating system on which the executable runs, unless that component |
|
164 |
itself accompanies the executable. |
|
165 |
|
|
166 |
If distribution of executable or object code is made by offering |
|
167 |
access to copy from a designated place, then offering equivalent |
|
168 |
access to copy the source code from the same place counts as |
|
169 |
distribution of the source code, even though third parties are not |
|
170 |
compelled to copy the source along with the object code. |
|
171 |
|
|
172 |
4. You may not copy, modify, sublicense, or distribute the Program |
|
173 |
except as expressly provided under this License. Any attempt |
|
174 |
otherwise to copy, modify, sublicense or distribute the Program is |
|
175 |
void, and will automatically terminate your rights under this License. |
|
176 |
However, parties who have received copies, or rights, from you under |
|
177 |
this License will not have their licenses terminated so long as such |
|
178 |
parties remain in full compliance. |
|
179 |
|
|
180 |
5. You are not required to accept this License, since you have not |
|
181 |
signed it. However, nothing else grants you permission to modify or |
|
182 |
distribute the Program or its derivative works. These actions are |
|
183 |
prohibited by law if you do not accept this License. Therefore, by |
|
184 |
modifying or distributing the Program (or any work based on the |
|
185 |
Program), you indicate your acceptance of this License to do so, and |
|
186 |
all its terms and conditions for copying, distributing or modifying |
|
187 |
the Program or works based on it. |
|
188 |
|
|
189 |
6. Each time you redistribute the Program (or any work based on the |
|
190 |
Program), the recipient automatically receives a license from the |
|
191 |
original licensor to copy, distribute or modify the Program subject to |
|
192 |
these terms and conditions. You may not impose any further |
|
193 |
restrictions on the recipients' exercise of the rights granted herein. |
|
194 |
You are not responsible for enforcing compliance by third parties to |
|
195 |
this License. |
|
196 |
|
|
197 |
7. If, as a consequence of a court judgment or allegation of patent |
|
198 |
infringement or for any other reason (not limited to patent issues), |
|
199 |
conditions are imposed on you (whether by court order, agreement or |
|
200 |
otherwise) that contradict the conditions of this License, they do not |
|
201 |
excuse you from the conditions of this License. If you cannot |
|
202 |
distribute so as to satisfy simultaneously your obligations under this |
|
203 |
License and any other pertinent obligations, then as a consequence you |
|
204 |
may not distribute the Program at all. For example, if a patent |
|
205 |
license would not permit royalty-free redistribution of the Program by |
|
206 |
all those who receive copies directly or indirectly through you, then |
|
207 |
the only way you could satisfy both it and this License would be to |
|
208 |
refrain entirely from distribution of the Program. |
|
209 |
|
|
210 |
If any portion of this section is held invalid or unenforceable under |
|
211 |
any particular circumstance, the balance of the section is intended to |
|
212 |
apply and the section as a whole is intended to apply in other |
|
213 |
circumstances. |
|
214 |
|
|
215 |
It is not the purpose of this section to induce you to infringe any |
|
216 |
patents or other property right claims or to contest validity of any |
|
217 |
such claims; this section has the sole purpose of protecting the |
|
218 |
integrity of the free software distribution system, which is |
|
219 |
implemented by public license practices. Many people have made |
|
220 |
generous contributions to the wide range of software distributed |
|
221 |
through that system in reliance on consistent application of that |
|
222 |
system; it is up to the author/donor to decide if he or she is willing |
|
223 |
to distribute software through any other system and a licensee cannot |
|
224 |
impose that choice. |
|
225 |
|
|
226 |
This section is intended to make thoroughly clear what is believed to |
|
227 |
be a consequence of the rest of this License. |
|
228 |
|
|
229 |
8. If the distribution and/or use of the Program is restricted in |
|
230 |
certain countries either by patents or by copyrighted interfaces, the |
|
231 |
original copyright holder who places the Program under this License |
|
232 |
may add an explicit geographical distribution limitation excluding |
|
233 |
those countries, so that distribution is permitted only in or among |
|
234 |
countries not thus excluded. In such case, this License incorporates |
|
235 |
the limitation as if written in the body of this License. |
|
236 |
|
|
237 |
9. The Free Software Foundation may publish revised and/or new versions |
|
238 |
of the General Public License from time to time. Such new versions will |
|
239 |
be similar in spirit to the present version, but may differ in detail to |
|
240 |
address new problems or concerns. |
|
241 |
|
|
242 |
Each version is given a distinguishing version number. If the Program |
|
243 |
specifies a version number of this License which applies to it and "any |
|
244 |
later version", you have the option of following the terms and conditions |
|
245 |
either of that version or of any later version published by the Free |
|
246 |
Software Foundation. If the Program does not specify a version number of |
|
247 |
this License, you may choose any version ever published by the Free Software |
|
248 |
Foundation. |
|
249 |
|
|
250 |
10. If you wish to incorporate parts of the Program into other free |
|
251 |
programs whose distribution conditions are different, write to the author |
|
252 |
to ask for permission. For software which is copyrighted by the Free |
|
253 |
Software Foundation, write to the Free Software Foundation; we sometimes |
|
254 |
make exceptions for this. Our decision will be guided by the two goals |
|
255 |
of preserving the free status of all derivatives of our free software and |
|
256 |
of promoting the sharing and reuse of software generally. |
|
257 |
|
|
258 |
NO WARRANTY |
|
259 |
|
|
260 |
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY |
|
261 |
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN |
|
262 |
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES |
|
263 |
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED |
|
264 |
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF |
|
265 |
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS |
|
266 |
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE |
|
267 |
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, |
|
268 |
REPAIR OR CORRECTION. |
|
269 |
|
|
270 |
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING |
|
271 |
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR |
|
272 |
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, |
|
273 |
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING |
|
274 |
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED |
|
275 |
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY |
|
276 |
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER |
|
277 |
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE |
|
278 |
POSSIBILITY OF SUCH DAMAGES. |
|
279 |
|
|
280 |
END OF TERMS AND CONDITIONS |
|
281 |
|
|
282 |
How to Apply These Terms to Your New Programs |
|
283 |
|
|
284 |
If you develop a new program, and you want it to be of the greatest |
|
285 |
possible use to the public, the best way to achieve this is to make it |
|
286 |
free software which everyone can redistribute and change under these terms. |
|
287 |
|
|
288 |
To do so, attach the following notices to the program. It is safest |
|
289 |
to attach them to the start of each source file to most effectively |
|
290 |
convey the exclusion of warranty; and each file should have at least |
|
291 |
the "copyright" line and a pointer to where the full notice is found. |
|
292 |
|
|
293 |
<one line to give the program's name and a brief idea of what it does.> |
|
294 |
Copyright (C) <year> <name of author> |
|
295 |
|
|
296 |
This program is free software; you can redistribute it and/or modify |
|
297 |
it under the terms of the GNU General Public License as published by |
|
298 |
the Free Software Foundation; either version 2 of the License, or |
|
299 |
(at your option) any later version. |
|
300 |
|
|
301 |
This program is distributed in the hope that it will be useful, |
|
302 |
but WITHOUT ANY WARRANTY; without even the implied warranty of |
|
303 |
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
|
304 |
GNU General Public License for more details. |
|
305 |
|
|
306 |
You should have received a copy of the GNU General Public License along |
|
307 |
with this program; if not, write to the Free Software Foundation, Inc., |
|
308 |
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. |
|
309 |
|
|
310 |
Also add information on how to contact you by electronic and paper mail. |
|
311 |
|
|
312 |
If the program is interactive, make it output a short notice like this |
|
313 |
when it starts in an interactive mode: |
|
314 |
|
|
315 |
Gnomovision version 69, Copyright (C) year name of author |
|
316 |
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. |
|
317 |
This is free software, and you are welcome to redistribute it |
|
318 |
under certain conditions; type `show c' for details. |
|
319 |
|
|
320 |
The hypothetical commands `show w' and `show c' should show the appropriate |
|
321 |
parts of the General Public License. Of course, the commands you use may |
|
322 |
be called something other than `show w' and `show c'; they could even be |
|
323 |
mouse-clicks or menu items--whatever suits your program. |
|
324 |
|
|
325 |
You should also get your employer (if you work as a programmer) or your |
|
326 |
school, if any, to sign a "copyright disclaimer" for the program, if |
|
327 |
necessary. Here is a sample; alter the names: |
|
328 |
|
|
329 |
Yoyodyne, Inc., hereby disclaims all copyright interest in the program |
|
330 |
`Gnomovision' (which makes passes at compilers) written by James Hacker. |
|
331 |
|
|
332 |
<signature of Ty Coon>, 1 April 1989 |
|
333 |
Ty Coon, President of Vice |
|
334 |
|
|
335 |
This General Public License does not permit incorporating your program into |
|
336 |
proprietary programs. If your program is a subroutine library, you may |
|
337 |
consider it more useful to permit linking proprietary applications with the |
|
338 |
library. If this is what you want to do, use the GNU Lesser General |
|
339 |
Public License instead of this License. |
b/Makefile | ||
---|---|---|
1 |
HSRCS := $(wildcard src/*.hs) |
|
2 |
HDDIR = apidoc |
|
3 |
|
|
4 |
# Haskell rules |
|
5 |
|
|
6 |
all: |
|
7 |
$(MAKE) -C src |
|
8 |
|
|
9 |
README.html: README |
|
10 |
rst2html $< $@ |
|
11 |
|
|
12 |
doc: README.html |
|
13 |
rm -rf $(HDDIR) |
|
14 |
mkdir -p $(HDDIR)/src |
|
15 |
cp hscolour.css $(HDDIR)/src |
|
16 |
for file in $(HSRCS); do \ |
|
17 |
HsColour -css -anchor \ |
|
18 |
$$file > $(HDDIR)/src/`basename $$file .hs`.html ; \ |
|
19 |
done |
|
20 |
ln -sf hn1.html $(HDDIR)/src/Main.html |
|
21 |
haddock --odir $(HDDIR) --html --ignore-all-exports \ |
|
22 |
-t hn1 -p haddock-prologue \ |
|
23 |
--source-module="src/%{MODULE/.//}.html" \ |
|
24 |
--source-entity="src/%{MODULE/.//}.html#%{NAME}" \ |
|
25 |
$(HSRCS) |
|
26 |
|
|
27 |
clean: |
|
28 |
rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \ |
|
29 |
gmon.out *.hi README.html TAGS |
|
30 |
|
|
31 |
.PHONY : all doc clean hn1 |
b/README | ||
---|---|---|
1 |
Cluster tools (h-aneti?) |
|
2 |
======================== |
|
3 |
|
|
4 |
These are some simple cluster tools for fixing common problems. Right now N+1 |
|
5 |
and rebalancing are included. |
|
6 |
|
|
7 |
.. contents:: |
|
8 |
|
|
9 |
Cluster N+1 solver |
|
10 |
------------------ |
|
11 |
|
|
12 |
This program runs a very simple brute force algorithm over the instance |
|
13 |
placement space in order to determine the shortest number of replace-disks |
|
14 |
needed to fix the cluster. Note this means we won't get a balanced cluster, |
|
15 |
just one that passes N+1 checks. |
|
16 |
|
|
17 |
Also note that the set of all instance placements on a 20/80 cluster is |
|
18 |
(20*19)^80, that is ~10^200, so... |
|
19 |
|
|
20 |
Algorithm |
|
21 |
+++++++++ |
|
22 |
|
|
23 |
The algorithm is a simple two-phase process. |
|
24 |
|
|
25 |
In phase 1 we determine the removal set, that is the set of instances that when |
|
26 |
removed completely from the cluster, make it healthy again. The instances that |
|
27 |
can go into the set are all the primary and secondary instances of the failing |
|
28 |
nodes. The result from this phase is actually a list - we compute all sets of |
|
29 |
the same minimum length. |
|
30 |
|
|
31 |
So basically we aim to determine here: what is the minimum number of instances |
|
32 |
that need to be removed (this is called the removal depth) and which are the |
|
33 |
actual combinations that fit (called the list of removal sets). |
|
34 |
|
|
35 |
In phase 2, for each removal set computed in the previous phase, we take the |
|
36 |
removed instances and try to determine where we can put them so that the |
|
37 |
cluster is still passing N+1 checks. From this list of possible solutions |
|
38 |
(called the list of solutions), we compute the one that has the smallest delta |
|
39 |
from the original state (the delta is the number of replace disks that needs to |
|
40 |
be run) and chose this as the final solution. |
|
41 |
|
|
42 |
Implementation |
|
43 |
++++++++++++++ |
|
44 |
|
|
45 |
Of course, a naive implementation based on the above description will run for |
|
46 |
long periods of time, so the implementation has to be smart in order to prune |
|
47 |
the solution space as eagerly as possible. |
|
48 |
|
|
49 |
In the following, we use as example a set of test data (a cluster with 20 |
|
50 |
nodes, 80 instances that has 5 nodes failing N+1 checks for a total of 12 |
|
51 |
warnings). |
|
52 |
|
|
53 |
On this set, the minimum depth is 4 (anything below fails), and for this depth |
|
54 |
the current version of the algorithm generates 5 removal sets; a previous |
|
55 |
version of the first phase generated a slightly different set of instances, with |
|
56 |
two removal sets. For the original version of the algorithm: |
|
57 |
|
|
58 |
- the first, non-optimized implementation computed a solution of delta=4 in 30 |
|
59 |
minutes on server-class CPUs and was still running when aborted 10 minutes |
|
60 |
later |
|
61 |
- the intermediate optimized version computed the whole solution space and |
|
62 |
found a delta=3 solution in around 10 seconds on a laptop-class CPU (total |
|
63 |
number of solutions ~600k) |
|
64 |
- latest version on server CPUs (which actually computes more removal sets) |
|
65 |
computes depth=4 in less than a second and depth=5 in around 2 seconds, and |
|
66 |
depth=6 in less than 20 seconds; depth=8 takes under five minutes (this is |
|
67 |
10^10 bigger solution space) |
|
68 |
|
|
69 |
Note that when (artificially) increasing the depth to 5 the number of removal |
|
70 |
sets grows fast (~3000) and a (again artificial) depth 6 generates 61k removal |
|
71 |
sets. Therefore, it is possible to restrict the number of solution sets |
|
72 |
examined via a command-line option. |
|
73 |
|
|
74 |
The factors that influence the run time are: |
|
75 |
|
|
76 |
- the removal depth; for each increase with one of the depth, we grow the |
|
77 |
solution space by the number of nodes squared (since a new instance can live |
|
78 |
any two nodes as primary/secondary, therefore (almost) N times N); i.e., |
|
79 |
depth=1 will create a N^2 solution space, depth two will make this N^4, |
|
80 |
depth three will be N^6, etc. |
|
81 |
- the removal depth again; for each increase in the depth, there will be more |
|
82 |
valid removal sets, and the space of solutions increases linearly with the |
|
83 |
number of removal sets |
|
84 |
|
|
85 |
Therefore, the smaller the depth the faster the algorithm will be; it doesn't |
|
86 |
seem like this algorithm will work for clusters of 100 nodes and many many |
|
87 |
small instances (e.g. 256MB instances on 16GB nodes). |
|
88 |
|
|
89 |
Currently applied optimizations: |
|
90 |
|
|
91 |
- when choosing where to place an instance in phase two, there are N*(N-1) |
|
92 |
possible primary/secondary options; however, if instead of iterating over all |
|
93 |
p * s pairs, we first determine the set of primary nodes that can hold this |
|
94 |
instance (without failing N+1), we can cut (N-1) secondary placements for |
|
95 |
each primary node removed; and since this applies at every iteration of phase |
|
96 |
2 it linearly decreases the solution space, and on full clusters, this can |
|
97 |
mean a four-five times reductions of solution space |
|
98 |
- since the number of solutions is very high even for smaller depths (on the |
|
99 |
test data, depth=4 results in 1.8M solutions) we can't compare them at the |
|
100 |
end, so at each iteration in phase 2 we only promote the best solution out of |
|
101 |
our own set of solutions |
|
102 |
- since the placement of instances can only increase the delta of the solution |
|
103 |
(placing a new instance will add zero or more replace-disks steps), it means |
|
104 |
the delta will only increase while recursing during phase 2; therefore, if we |
|
105 |
know at one point that we have a current delta that is equal or higher to the |
|
106 |
delta of the best solution so far, we can abort the recursion; this cuts a |
|
107 |
tremendous number of branches; further promotion of the best solution from |
|
108 |
one removal set to another can cut entire removal sets after a few recursions |
|
109 |
|
|
110 |
Command line usage |
|
111 |
++++++++++++++++++ |
|
112 |
|
|
113 |
Synopsis:: |
|
114 |
|
|
115 |
hn1 [-n NODES_FILE] [-i INSTANCES_FILE] [-d START_DEPTH] \ |
|
116 |
[-r MAX_REMOVALS] [-l MIN_DELTA] [-L MAX_DELTA] \ |
|
117 |
[-p] [-C] |
|
118 |
|
|
119 |
The -n and -i options change the names of the input files. The -d option |
|
120 |
changes the start depth, as a higher depth can give (with a longer computation |
|
121 |
time) a solution with better delta. The -r option restricts at each depth the |
|
122 |
number of solutions considered - with r=1000 for example even depth=10 finishes |
|
123 |
in less than a second. |
|
124 |
|
|
125 |
The -p option will show the cluster state after the solution is implemented, |
|
126 |
while the -C option will show the needed gnt-instance commands to implement |
|
127 |
it. |
|
128 |
|
|
129 |
The -l (--min-delta) and -L (--max-delta) options restrict the solution in the |
|
130 |
following ways: |
|
131 |
|
|
132 |
- min-delta will cause the search to abort early once we find a solution with |
|
133 |
delta less than or equal to this parameter; this can cause extremely fast |
|
134 |
results in case a desired solution is found quickly; the default value for |
|
135 |
this parameter is zero, so once we find a "perfect" solution we finish early |
|
136 |
- max-delta causes rejection of valid solution but which have delta higher |
|
137 |
than the value of this parameter; this can reduce the depth of the search |
|
138 |
tree, with sometimes significant speedups; by default, this optimization is |
|
139 |
not used |
|
140 |
|
|
141 |
Individually or combined, these two parameters can (if there are any) very |
|
142 |
fast result; on our test data, depth=34 (max depth!) is solved in 2 seconds |
|
143 |
with min-delta=0/max-delta=1 (since there is such a solution), and the |
|
144 |
extremely low max-delta causes extreme pruning. |
|
145 |
|
|
146 |
Cluster rebalancer |
|
147 |
------------------ |
|
148 |
|
|
149 |
Compared to the N+1 solver, the rebalancer uses a very simple algorithm: |
|
150 |
repeatedly try to move each instance one step, so that the cluster score |
|
151 |
becomes better. We stop when no further move can improve the score. |
|
152 |
|
|
153 |
The algorithm is divided into rounds (all identical): |
|
154 |
|
|
155 |
#. Repeat for each instance: |
|
156 |
|
|
157 |
#. Compute score after the potential failover of the instance |
|
158 |
|
|
159 |
#. For each node that is different from the current primary/secondary |
|
160 |
|
|
161 |
#. Compute score after replacing the primary with this new node |
|
162 |
|
|
163 |
#. Compute score after replacing the secondary with this new node |
|
164 |
|
|
165 |
|
|
166 |
#. Out of this N*2+1 possible new scores (and their associated move) for |
|
167 |
this instance, we choose the one that is the best in terms of cluster |
|
168 |
score, and then proceed to the next instance |
|
169 |
|
|
170 |
Since we don't compute all combinations of moves for instances (e.g. the first |
|
171 |
instance's all moves Cartesian product with second instance's all moves, etc.) |
|
172 |
but we proceed serially instance A, then B, then C, the total computations we |
|
173 |
make in one steps is simply N(number of nodes)*2+1 times I(number of instances), |
|
174 |
instead of (N*2+1)^I. So therefore the runtime for a round is trivial. |
|
175 |
|
|
176 |
Further rounds are done, since the relocation of instances might offer better |
|
177 |
places for instances which we didn't move, or simply didn't move to the best |
|
178 |
place. It is possible to limit the rounds, but usually the algorithm finishes |
|
179 |
after a few rounds by itself. |
|
180 |
|
|
181 |
Note that the cluster *must* be N+1 compliant before this algorithm is run, and |
|
182 |
will stay at each move N+1 compliant. Therefore, the final cluster will be N+1 |
|
183 |
compliant. |
|
184 |
|
|
185 |
Single-round solutions |
|
186 |
++++++++++++++++++++++ |
|
187 |
|
|
188 |
Single-round solutions have the very nice property that they are |
|
189 |
incrementally-valid. In other words, if you have a 10-step solution, at each |
|
190 |
step the cluster is both N+1 compliant and better than the previous step. |
|
191 |
|
|
192 |
This means that you can stop at any point and you will have a better cluster. |
|
193 |
For this reason, single-round solutions are recommended in the common case of |
|
194 |
let's make this better. Multi-round solutions will be better though when adding |
|
195 |
a couple of new, empty nodes to the cluster due to the many relocations needed. |
|
196 |
|
|
197 |
|
|
198 |
Multi-round solutions |
|
199 |
+++++++++++++++++++++ |
|
200 |
|
|
201 |
A multi-round solution (not for a single round), due to de-duplication of moves |
|
202 |
(i.e. just put the instance directly in its final place, and not move it five |
|
203 |
times around) loses both these properties. It might be that it's not possible to |
|
204 |
directly put the instance on the final nodes. So it can be possible that yes, |
|
205 |
the cluster is happy in the final solution and nice, but you cannot do the steps |
|
206 |
in the shown order. Solving this (via additional instance move(s)) is left to |
|
207 |
the user. |
|
208 |
|
|
209 |
Command line usage |
|
210 |
++++++++++++++++++ |
|
211 |
|
|
212 |
Synopsis:: |
|
213 |
|
|
214 |
hbal [-n NODES_FILE] [-i INSTANCES_FILE] \ |
|
215 |
[-r MAX_ROUNDS] \ |
|
216 |
[-p] [-C] |
|
217 |
|
|
218 |
The -n and -i options change the names of the input files. The -r option |
|
219 |
restricts the maximum number of rounds (and is more of safety measure). |
|
220 |
|
|
221 |
The -p option will show the cluster state after the solution is implemented, |
|
222 |
while the -C option will show the needed gnt-instance commands to implement |
|
223 |
it. |
|
224 |
|
|
225 |
Integration with Ganeti |
|
226 |
----------------------- |
|
227 |
|
|
228 |
The programs needs only the output of the node list and instance list. That is, |
|
229 |
they need the following two commands to be run:: |
|
230 |
|
|
231 |
gnt-node list -oname,mtotal,mfree,dtotal,dfree,pinst_list,sinst_list \ |
|
232 |
--separator '|' --no-headers > nodes |
|
233 |
gnt-instance list -oname,admin_ram,sda_size \ |
|
234 |
--separator '|' --no-head > instances |
|
235 |
|
|
236 |
These two files should be saved under the names of 'nodes' and 'instances'. |
|
237 |
|
|
238 |
When run, the programs will show some informational messages and output the |
|
239 |
chosen solution, in the form of a list of instance name and chosen |
|
240 |
primary/secondary nodes. The user then needs to run the necessary commands to |
|
241 |
get the instances to live on those nodes. |
|
242 |
|
|
243 |
Note that sda_size is less than the total disk size of an instance by 4352 |
|
244 |
MiB, so if disk space is at a premium the calculation could be wrong; in this |
|
245 |
case, please adjust the values manually. |
b/haddock-prologue | ||
---|---|---|
1 |
This is the internal documentation for hn1, an experimental N+1 |
|
2 |
cluster solver. |
|
3 |
|
|
4 |
Start with the "Main" module, the follow with "Cluster" and then the |
|
5 |
rest. |
b/hscolour.css | ||
---|---|---|
1 |
|
|
2 |
.keyglyph, .layout {color: red;} |
|
3 |
.keyword {color: blue;} |
|
4 |
.comment, .comment a {color: green;} |
|
5 |
.str, .chr {color: teal;} |
|
6 |
.keyword,.conid, .varid, .conop, .varop, .num, .cpp, .sel, .definition {} |
b/src/Cluster.hs | ||
---|---|---|
1 |
{-| Implementation of cluster-wide logic. |
|
2 |
|
|
3 |
This module holds all pure cluster-logic; I\/O related functionality |
|
4 |
goes into the "Main" module. |
|
5 |
|
|
6 |
-} |
|
7 |
|
|
8 |
module Cluster |
|
9 |
( |
|
10 |
-- * Types |
|
11 |
NodeList |
|
12 |
, InstanceList |
|
13 |
, Placement |
|
14 |
, Solution(..) |
|
15 |
, Table(..) |
|
16 |
, Removal |
|
17 |
-- * Generic functions |
|
18 |
, totalResources |
|
19 |
-- * First phase functions |
|
20 |
, computeBadItems |
|
21 |
-- * Second phase functions |
|
22 |
, computeSolution |
|
23 |
, applySolution |
|
24 |
, printSolution |
|
25 |
, printNodes |
|
26 |
-- * Balacing functions |
|
27 |
, checkMove |
|
28 |
, compCV |
|
29 |
, printStats |
|
30 |
-- * Loading functions |
|
31 |
, loadData |
|
32 |
) where |
|
33 |
|
|
34 |
import Data.List |
|
35 |
import Data.Maybe (isNothing, fromJust) |
|
36 |
import Text.Printf (printf) |
|
37 |
import Data.Function |
|
38 |
|
|
39 |
import qualified Container |
|
40 |
import qualified Instance |
|
41 |
import qualified Node |
|
42 |
import Utils |
|
43 |
|
|
44 |
type NodeList = Container.Container Node.Node |
|
45 |
type InstanceList = Container.Container Instance.Instance |
|
46 |
type Score = Double |
|
47 |
|
|
48 |
-- | The description of an instance placement. |
|
49 |
type Placement = (Int, Int, Int) |
|
50 |
|
|
51 |
{- | A cluster solution described as the solution delta and the list |
|
52 |
of placements. |
|
53 |
|
|
54 |
-} |
|
55 |
data Solution = Solution Int [Placement] |
|
56 |
deriving (Eq, Ord, Show) |
|
57 |
|
|
58 |
-- | Returns the delta of a solution or -1 for Nothing |
|
59 |
solutionDelta :: Maybe Solution -> Int |
|
60 |
solutionDelta sol = case sol of |
|
61 |
Just (Solution d _) -> d |
|
62 |
_ -> -1 |
|
63 |
|
|
64 |
-- | A removal set. |
|
65 |
data Removal = Removal NodeList [Instance.Instance] |
|
66 |
|
|
67 |
-- | An instance move definition |
|
68 |
data IMove = Failover |
|
69 |
| ReplacePrimary Int |
|
70 |
| ReplaceSecondary Int |
|
71 |
deriving (Show) |
|
72 |
|
|
73 |
-- | The complete state for the balancing solution |
|
74 |
data Table = Table NodeList InstanceList Score [Placement] |
|
75 |
deriving (Show) |
|
76 |
|
|
77 |
-- General functions |
|
78 |
|
|
79 |
-- | Cap the removal list if needed. |
|
80 |
capRemovals :: [a] -> Int -> [a] |
|
81 |
capRemovals removals max_removals = |
|
82 |
if max_removals > 0 then |
|
83 |
take max_removals removals |
|
84 |
else |
|
85 |
removals |
|
86 |
|
|
87 |
-- | Check if the given node list fails the N+1 check. |
|
88 |
verifyN1Check :: [Node.Node] -> Bool |
|
89 |
verifyN1Check nl = any Node.failN1 nl |
|
90 |
|
|
91 |
-- | Verifies the N+1 status and return the affected nodes. |
|
92 |
verifyN1 :: [Node.Node] -> [Node.Node] |
|
93 |
verifyN1 nl = filter Node.failN1 nl |
|
94 |
|
|
95 |
{-| Add an instance and return the new node and instance maps. -} |
|
96 |
addInstance :: NodeList -> Instance.Instance -> |
|
97 |
Node.Node -> Node.Node -> Maybe NodeList |
|
98 |
addInstance nl idata pri sec = |
|
99 |
let pdx = Node.idx pri |
|
100 |
sdx = Node.idx sec |
|
101 |
in do |
|
102 |
pnode <- Node.addPri pri idata |
|
103 |
snode <- Node.addSec sec idata pdx |
|
104 |
new_nl <- return $ Container.addTwo sdx snode |
|
105 |
pdx pnode nl |
|
106 |
return new_nl |
|
107 |
|
|
108 |
-- | Remove an instance and return the new node and instance maps. |
|
109 |
removeInstance :: NodeList -> Instance.Instance -> NodeList |
|
110 |
removeInstance nl idata = |
|
111 |
let pnode = Instance.pnode idata |
|
112 |
snode = Instance.snode idata |
|
113 |
pn = Container.find pnode nl |
|
114 |
sn = Container.find snode nl |
|
115 |
new_nl = Container.addTwo |
|
116 |
pnode (Node.removePri pn idata) |
|
117 |
snode (Node.removeSec sn idata) nl in |
|
118 |
new_nl |
|
119 |
|
|
120 |
-- | Remove an instance and return the new node map. |
|
121 |
removeInstances :: NodeList -> [Instance.Instance] -> NodeList |
|
122 |
removeInstances = foldl' removeInstance |
|
123 |
|
|
124 |
-- | Compute the total free disk and memory in the cluster. |
|
125 |
totalResources :: Container.Container Node.Node -> (Int, Int) |
|
126 |
totalResources nl = |
|
127 |
foldl' |
|
128 |
(\ (mem, disk) node -> (mem + (Node.f_mem node), |
|
129 |
disk + (Node.f_disk node))) |
|
130 |
(0, 0) (Container.elems nl) |
|
131 |
|
|
132 |
{- | Compute a new version of a cluster given a solution. |
|
133 |
|
|
134 |
This is not used for computing the solutions, but for applying a |
|
135 |
(known-good) solution to the original cluster for final display. |
|
136 |
|
|
137 |
It first removes the relocated instances after which it places them on |
|
138 |
their new nodes. |
|
139 |
|
|
140 |
-} |
|
141 |
applySolution :: NodeList -> InstanceList -> [Placement] -> NodeList |
|
142 |
applySolution nl il sol = |
|
143 |
let odxes = map (\ (a, b, c) -> (Container.find a il, |
|
144 |
Node.idx (Container.find b nl), |
|
145 |
Node.idx (Container.find c nl)) |
|
146 |
) sol |
|
147 |
idxes = (\ (x, _, _) -> x) (unzip3 odxes) |
|
148 |
nc = removeInstances nl idxes |
|
149 |
in |
|
150 |
foldl' (\ nz (a, b, c) -> |
|
151 |
let new_p = Container.find b nz |
|
152 |
new_s = Container.find c nz in |
|
153 |
fromJust (addInstance nz a new_p new_s) |
|
154 |
) nc odxes |
|
155 |
|
|
156 |
|
|
157 |
-- First phase functions |
|
158 |
|
|
159 |
{- | Given a list 1,2,3..n build a list of pairs [(1, [2..n]), (2, |
|
160 |
[3..n]), ...] |
|
161 |
|
|
162 |
-} |
|
163 |
genParts :: [a] -> Int -> [(a, [a])] |
|
164 |
genParts l count = |
|
165 |
case l of |
|
166 |
[] -> [] |
|
167 |
x:xs -> |
|
168 |
if length l < count then |
|
169 |
[] |
|
170 |
else |
|
171 |
(x, xs) : (genParts xs count) |
|
172 |
|
|
173 |
-- | Generates combinations of count items from the names list. |
|
174 |
genNames :: Int -> [b] -> [[b]] |
|
175 |
genNames count1 names1 = |
|
176 |
let aux_fn count names current = |
|
177 |
case count of |
|
178 |
0 -> [current] |
|
179 |
_ -> |
|
180 |
concatMap |
|
181 |
(\ (x, xs) -> aux_fn (count - 1) xs (x:current)) |
|
182 |
(genParts names count) |
|
183 |
in |
|
184 |
aux_fn count1 names1 [] |
|
185 |
|
|
186 |
{- | Computes the pair of bad nodes and instances. |
|
187 |
|
|
188 |
The bad node list is computed via a simple 'verifyN1' check, and the |
|
189 |
bad instance list is the list of primary and secondary instances of |
|
190 |
those nodes. |
|
191 |
|
|
192 |
-} |
|
193 |
computeBadItems :: NodeList -> InstanceList -> |
|
194 |
([Node.Node], [Instance.Instance]) |
|
195 |
computeBadItems nl il = |
|
196 |
let bad_nodes = verifyN1 $ Container.elems nl |
|
197 |
bad_instances = map (\idx -> Container.find idx il) $ |
|
198 |
sort $ nub $ concat $ |
|
199 |
map (\ n -> (Node.slist n) ++ (Node.plist n)) bad_nodes |
|
200 |
in |
|
201 |
(bad_nodes, bad_instances) |
|
202 |
|
|
203 |
|
|
204 |
{- | Checks if removal of instances results in N+1 pass. |
|
205 |
|
|
206 |
Note: the check removal cannot optimize by scanning only the affected |
|
207 |
nodes, since the cluster is known to be not healthy; only the check |
|
208 |
placement can make this shortcut. |
|
209 |
|
|
210 |
-} |
|
211 |
checkRemoval :: NodeList -> [Instance.Instance] -> Maybe Removal |
|
212 |
checkRemoval nl victims = |
|
213 |
let nx = removeInstances nl victims |
|
214 |
failN1 = verifyN1Check (Container.elems nx) |
|
215 |
in |
|
216 |
if failN1 then |
|
217 |
Nothing |
|
218 |
else |
|
219 |
Just $ Removal nx victims |
|
220 |
|
|
221 |
|
|
222 |
-- | Computes the removals list for a given depth |
|
223 |
computeRemovals :: Cluster.NodeList |
|
224 |
-> [Instance.Instance] |
|
225 |
-> Int |
|
226 |
-> [Maybe Cluster.Removal] |
|
227 |
computeRemovals nl bad_instances depth = |
|
228 |
map (checkRemoval nl) $ genNames depth bad_instances |
|
229 |
|
|
230 |
-- Second phase functions |
|
231 |
|
|
232 |
-- | Single-node relocation cost |
|
233 |
nodeDelta :: Int -> Int -> Int -> Int |
|
234 |
nodeDelta i p s = |
|
235 |
if i == p || i == s then |
|
236 |
0 |
|
237 |
else |
|
238 |
1 |
|
239 |
|
|
240 |
{-| Compute best solution. |
|
241 |
|
|
242 |
This function compares two solutions, choosing the minimum valid |
|
243 |
solution. |
|
244 |
-} |
|
245 |
compareSolutions :: Maybe Solution -> Maybe Solution -> Maybe Solution |
|
246 |
compareSolutions a b = case (a, b) of |
|
247 |
(Nothing, x) -> x |
|
248 |
(x, Nothing) -> x |
|
249 |
(x, y) -> min x y |
|
250 |
|
|
251 |
-- | Compute best table. Note that the ordering of the arguments is important. |
|
252 |
compareTables :: Table -> Table -> Table |
|
253 |
compareTables a@(Table _ _ a_cv _) b@(Table _ _ b_cv _ ) = |
|
254 |
if a_cv > b_cv then b else a |
|
255 |
|
|
256 |
-- | Check if a given delta is worse then an existing solution. |
|
257 |
tooHighDelta :: Maybe Solution -> Int -> Int -> Bool |
|
258 |
tooHighDelta sol new_delta max_delta = |
|
259 |
if new_delta > max_delta && max_delta >=0 then |
|
260 |
True |
|
261 |
else |
|
262 |
case sol of |
|
263 |
Nothing -> False |
|
264 |
Just (Solution old_delta _) -> old_delta <= new_delta |
|
265 |
|
|
266 |
{-| Check if placement of instances still keeps the cluster N+1 compliant. |
|
267 |
|
|
268 |
This is the workhorse of the allocation algorithm: given the |
|
269 |
current node and instance maps, the list of instances to be |
|
270 |
placed, and the current solution, this will return all possible |
|
271 |
solution by recursing until all target instances are placed. |
|
272 |
|
|
273 |
-} |
|
274 |
checkPlacement :: NodeList -- ^ The current node list |
|
275 |
-> [Instance.Instance] -- ^ List of instances still to place |
|
276 |
-> [Placement] -- ^ Partial solution until now |
|
277 |
-> Int -- ^ The delta of the partial solution |
|
278 |
-> Maybe Solution -- ^ The previous solution |
|
279 |
-> Int -- ^ Abort if the we go above this delta |
|
280 |
-> Maybe Solution -- ^ The new solution |
|
281 |
checkPlacement nl victims current current_delta prev_sol max_delta = |
|
282 |
let target = head victims |
|
283 |
opdx = Instance.pnode target |
|
284 |
osdx = Instance.snode target |
|
285 |
vtail = tail victims |
|
286 |
have_tail = (length vtail) > 0 |
|
287 |
nodes = Container.elems nl |
|
288 |
in |
|
289 |
foldl' |
|
290 |
(\ accu_p pri -> |
|
291 |
let |
|
292 |
pri_idx = Node.idx pri |
|
293 |
upri_delta = current_delta + nodeDelta pri_idx opdx osdx |
|
294 |
new_pri = Node.addPri pri target |
|
295 |
fail_delta1 = tooHighDelta accu_p upri_delta max_delta |
|
296 |
in |
|
297 |
if fail_delta1 || isNothing(new_pri) then accu_p |
|
298 |
else let pri_nl = Container.add pri_idx (fromJust new_pri) nl in |
|
299 |
foldl' |
|
300 |
(\ accu sec -> |
|
301 |
let |
|
302 |
sec_idx = Node.idx sec |
|
303 |
upd_delta = upri_delta + |
|
304 |
nodeDelta sec_idx opdx osdx |
|
305 |
fail_delta2 = tooHighDelta accu upd_delta max_delta |
|
306 |
new_sec = Node.addSec sec target pri_idx |
|
307 |
in |
|
308 |
if sec_idx == pri_idx || fail_delta2 || |
|
309 |
isNothing new_sec then accu |
|
310 |
else let |
|
311 |
nx = Container.add sec_idx (fromJust new_sec) pri_nl |
|
312 |
plc = (Instance.idx target, pri_idx, sec_idx) |
|
313 |
c2 = plc:current |
|
314 |
result = |
|
315 |
if have_tail then |
|
316 |
checkPlacement nx vtail c2 upd_delta |
|
317 |
accu max_delta |
|
318 |
else |
|
319 |
Just (Solution upd_delta c2) |
|
320 |
in compareSolutions accu result |
|
321 |
) accu_p nodes |
|
322 |
) prev_sol nodes |
|
323 |
|
|
324 |
-- | Apply a move |
|
325 |
applyMove :: NodeList -> Instance.Instance |
|
326 |
-> IMove -> (Maybe NodeList, Instance.Instance, Int, Int) |
|
327 |
applyMove nl inst Failover = |
|
328 |
let old_pdx = Instance.pnode inst |
|
329 |
old_sdx = Instance.snode inst |
|
330 |
old_p = Container.find old_pdx nl |
|
331 |
old_s = Container.find old_sdx nl |
|
332 |
int_p = Node.removePri old_p inst |
|
333 |
int_s = Node.removeSec old_s inst |
|
334 |
new_p = Node.addPri int_s inst |
|
335 |
new_s = Node.addSec int_p inst old_sdx |
|
336 |
new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing |
|
337 |
else Just $ Container.addTwo old_pdx (fromJust new_s) |
|
338 |
old_sdx (fromJust new_p) nl |
|
339 |
in (new_nl, Instance.setBoth inst old_sdx old_pdx, old_sdx, old_pdx) |
|
340 |
|
|
341 |
applyMove nl inst (ReplacePrimary new_pdx) = |
|
342 |
let old_pdx = Instance.pnode inst |
|
343 |
old_sdx = Instance.snode inst |
|
344 |
old_p = Container.find old_pdx nl |
|
345 |
old_s = Container.find old_sdx nl |
|
346 |
tgt_n = Container.find new_pdx nl |
|
347 |
int_p = Node.removePri old_p inst |
|
348 |
int_s = Node.removeSec old_s inst |
|
349 |
new_p = Node.addPri tgt_n inst |
|
350 |
new_s = Node.addSec int_s inst new_pdx |
|
351 |
new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing |
|
352 |
else Just $ Container.add new_pdx (fromJust new_p) $ |
|
353 |
Container.addTwo old_pdx int_p |
|
354 |
old_sdx (fromJust new_s) nl |
|
355 |
in (new_nl, Instance.setPri inst new_pdx, new_pdx, old_sdx) |
|
356 |
|
|
357 |
applyMove nl inst (ReplaceSecondary new_sdx) = |
|
358 |
let old_pdx = Instance.pnode inst |
|
359 |
old_sdx = Instance.snode inst |
|
360 |
old_s = Container.find old_sdx nl |
|
361 |
tgt_n = Container.find new_sdx nl |
|
362 |
int_s = Node.removeSec old_s inst |
|
363 |
new_s = Node.addSec tgt_n inst old_pdx |
|
364 |
new_nl = if isNothing(new_s) then Nothing |
|
365 |
else Just $ Container.addTwo new_sdx (fromJust new_s) |
|
366 |
old_sdx int_s nl |
|
367 |
in (new_nl, Instance.setSec inst new_sdx, old_pdx, new_sdx) |
|
368 |
|
|
369 |
checkSingleStep :: Table -- ^ The original table |
|
370 |
-> Instance.Instance -- ^ The instance to move |
|
371 |
-> Table -- ^ The current best table |
|
372 |
-> IMove -- ^ The move to apply |
|
373 |
-> Table -- ^ The final best table |
|
374 |
checkSingleStep ini_tbl target cur_tbl move = |
|
375 |
let |
|
376 |
Table ini_nl ini_il _ ini_plc = ini_tbl |
|
377 |
(tmp_nl, new_inst, pri_idx, sec_idx) = |
|
378 |
applyMove ini_nl target move |
|
379 |
in |
|
380 |
if isNothing tmp_nl then cur_tbl |
|
381 |
else |
|
382 |
let tgt_idx = Instance.idx target |
|
383 |
upd_nl = fromJust tmp_nl |
|
384 |
upd_cvar = compCV upd_nl |
|
385 |
upd_il = Container.add tgt_idx new_inst ini_il |
|
386 |
tmp_plc = filter (\ (t, _, _) -> t /= tgt_idx) ini_plc |
|
387 |
upd_plc = (tgt_idx, pri_idx, sec_idx):tmp_plc |
|
388 |
upd_tbl = Table upd_nl upd_il upd_cvar upd_plc |
|
389 |
in |
|
390 |
compareTables cur_tbl upd_tbl |
|
391 |
|
|
392 |
-- | Compute the best next move. |
|
393 |
checkMove :: Table -- ^ The current solution |
|
394 |
-> [Instance.Instance] -- ^ List of instances still to move |
|
395 |
-> Table -- ^ The new solution |
|
396 |
checkMove ini_tbl victims = |
|
397 |
let target = head victims |
|
398 |
opdx = Instance.pnode target |
|
399 |
osdx = Instance.snode target |
|
400 |
vtail = tail victims |
|
401 |
have_tail = (length vtail) > 0 |
|
402 |
Table ini_nl _ _ _ = ini_tbl |
|
403 |
nodes = filter (\node -> let idx = Node.idx node |
|
404 |
in idx /= opdx && idx /= osdx) |
|
405 |
$ Container.elems ini_nl |
|
406 |
aft_failover = checkSingleStep ini_tbl target ini_tbl Failover |
|
407 |
next_tbl = |
|
408 |
foldl' |
|
409 |
(\ accu_p new_node -> |
|
410 |
let |
|
411 |
new_idx = Node.idx new_node |
|
412 |
pmoves = [ReplacePrimary new_idx, |
|
413 |
ReplaceSecondary new_idx] |
|
414 |
in |
|
415 |
foldl' (checkSingleStep ini_tbl target) accu_p pmoves |
|
416 |
) aft_failover nodes |
|
417 |
in if have_tail then checkMove next_tbl vtail |
|
418 |
else next_tbl |
|
419 |
|
|
420 |
|
|
421 |
|
|
422 |
{- | Auxiliary function for solution computation. |
|
423 |
|
|
424 |
We write this in an explicit recursive fashion in order to control |
|
425 |
early-abort in case we have met the min delta. We can't use foldr |
|
426 |
instead of explicit recursion since we need the accumulator for the |
|
427 |
abort decision. |
|
428 |
|
|
429 |
-} |
|
430 |
advanceSolution :: [Maybe Removal] -- ^ The removal to process |
|
431 |
-> Int -- ^ Minimum delta parameter |
|
432 |
-> Int -- ^ Maximum delta parameter |
|
433 |
-> Maybe Solution -- ^ Current best solution |
|
434 |
-> Maybe Solution -- ^ New best solution |
|
435 |
advanceSolution [] _ _ sol = sol |
|
436 |
advanceSolution (Nothing:xs) m n sol = advanceSolution xs m n sol |
|
437 |
advanceSolution ((Just (Removal nx removed)):xs) min_d max_d prev_sol = |
|
438 |
let new_sol = checkPlacement nx removed [] 0 prev_sol max_d |
|
439 |
new_delta = solutionDelta $! new_sol |
|
440 |
in |
|
441 |
if new_delta >= 0 && new_delta <= min_d then |
|
442 |
new_sol |
|
443 |
else |
|
444 |
advanceSolution xs min_d max_d new_sol |
|
445 |
|
|
446 |
-- | Computes the placement solution. |
|
447 |
solutionFromRemovals :: [Maybe Removal] -- ^ The list of (possible) removals |
|
448 |
-> Int -- ^ Minimum delta parameter |
|
449 |
-> Int -- ^ Maximum delta parameter |
|
450 |
-> Maybe Solution -- ^ The best solution found |
|
451 |
solutionFromRemovals removals min_delta max_delta = |
|
452 |
advanceSolution removals min_delta max_delta Nothing |
|
453 |
|
|
454 |
{- | Computes the solution at the given depth. |
|
455 |
|
|
456 |
This is a wrapper over both computeRemovals and |
|
457 |
solutionFromRemovals. In case we have no solution, we return Nothing. |
|
458 |
|
|
459 |
-} |
|
460 |
computeSolution :: NodeList -- ^ The original node data |
|
461 |
-> [Instance.Instance] -- ^ The list of /bad/ instances |
|
462 |
-> Int -- ^ The /depth/ of removals |
|
463 |
-> Int -- ^ Maximum number of removals to process |
|
464 |
-> Int -- ^ Minimum delta parameter |
|
465 |
-> Int -- ^ Maximum delta parameter |
|
466 |
-> Maybe Solution -- ^ The best solution found (or Nothing) |
|
467 |
computeSolution nl bad_instances depth max_removals min_delta max_delta = |
|
468 |
let |
|
469 |
removals = computeRemovals nl bad_instances depth |
|
470 |
removals' = capRemovals removals max_removals |
|
471 |
in |
|
472 |
solutionFromRemovals removals' min_delta max_delta |
|
473 |
|
|
474 |
-- Solution display functions (pure) |
|
475 |
|
|
476 |
-- | Given the original and final nodes, computes the relocation description. |
|
477 |
computeMoves :: String -- ^ The instance name |
|
478 |
-> String -- ^ Original primary |
|
479 |
-> String -- ^ Original secondary |
|
480 |
-> String -- ^ New primary |
|
481 |
-> String -- ^ New secondary |
|
482 |
-> (String, [String]) |
|
483 |
-- ^ Tuple of moves and commands list; moves is containing |
|
484 |
-- either @/f/@ for failover or @/r:name/@ for replace |
|
485 |
-- secondary, while the command list holds gnt-instance |
|
486 |
-- commands (without that prefix), e.g \"@failover instance1@\" |
|
487 |
computeMoves i a b c d = |
|
488 |
if c == a then {- Same primary -} |
|
489 |
if d == b then {- Same sec??! -} |
|
490 |
("-", []) |
|
491 |
else {- Change of secondary -} |
|
492 |
(printf "r:%s" d, |
|
493 |
[printf "replace-disks -n %s %s" d i]) |
|
494 |
else |
|
495 |
if c == b then {- Failover and ... -} |
|
496 |
if d == a then {- that's all -} |
|
497 |
("f", [printf "failover %s" i]) |
|
498 |
else |
|
499 |
(printf "f r:%s" d, |
|
500 |
[printf "failover %s" i, |
|
501 |
printf "replace-disks -n %s %s" d i]) |
|
502 |
else |
|
503 |
if d == a then {- ... and keep primary as secondary -} |
|
504 |
(printf "r:%s f" c, |
|
505 |
[printf "replace-disks -n %s %s" c i, |
|
506 |
printf "failover %s" i]) |
|
507 |
else |
|
508 |
if d == b then {- ... keep same secondary -} |
|
509 |
(printf "f r:%s f" c, |
|
510 |
[printf "failover %s" i, |
|
511 |
printf "replace-disks -n %s %s" c i, |
|
512 |
printf "failover %s" i]) |
|
513 |
|
|
514 |
else {- Nothing in common -} |
|
515 |
(printf "r:%s f r:%s" c d, |
|
516 |
[printf "replace-disks -n %s %s" c i, |
|
517 |
printf "failover %s" i, |
|
518 |
printf "replace-disks -n %s %s" d i]) |
|
519 |
|
|
520 |
{-| Converts a solution to string format -} |
|
521 |
printSolution :: InstanceList |
|
522 |
-> [(Int, String)] |
|
523 |
-> [(Int, String)] |
|
524 |
-> [Placement] |
|
525 |
-> ([String], [[String]]) |
|
526 |
printSolution il ktn kti sol = |
|
527 |
unzip $ map |
|
528 |
(\ (i, p, s) -> |
|
529 |
let inst = Container.find i il |
|
530 |
inam = fromJust $ lookup (Instance.idx inst) kti |
|
531 |
npri = fromJust $ lookup p ktn |
|
532 |
nsec = fromJust $ lookup s ktn |
|
533 |
opri = fromJust $ lookup (Instance.pnode inst) ktn |
|
534 |
osec = fromJust $ lookup (Instance.snode inst) ktn |
|
535 |
(moves, cmds) = computeMoves inam opri osec npri nsec |
|
536 |
|
|
537 |
in |
|
538 |
(printf " I: %s\to: %s+>%s\tn: %s+>%s\ta: %s" |
|
539 |
inam opri osec npri nsec moves, |
|
540 |
cmds) |
|
541 |
) sol |
|
542 |
|
|
543 |
-- | Print the node list. |
|
544 |
printNodes :: [(Int, String)] -> NodeList -> String |
|
545 |
printNodes ktn nl = |
|
546 |
let snl = sortBy (compare `on` Node.idx) (Container.elems nl) |
|
547 |
snl' = map (\ n -> ((fromJust $ lookup (Node.idx n) ktn), n)) snl |
|
548 |
in unlines $ map (uncurry Node.list) snl' |
|
549 |
|
|
550 |
-- | Compute the mem and disk covariance. |
|
551 |
compDetailedCV :: NodeList -> (Double, Double) |
|
552 |
compDetailedCV nl = |
|
553 |
let nstats = map Node.normUsed $ Container.elems nl |
|
554 |
(mem_l, dsk_l) = unzip nstats |
|
555 |
mem_cv = varianceCoeff mem_l |
|
556 |
dsk_cv = varianceCoeff dsk_l |
|
557 |
in (mem_cv, dsk_cv) |
|
558 |
|
|
559 |
-- | Compute the 'total' variance. |
|
560 |
compCV :: NodeList -> Double |
|
561 |
compCV nl = |
|
562 |
let (mem_cv, dsk_cv) = compDetailedCV nl |
|
563 |
in mem_cv + dsk_cv |
|
564 |
|
|
565 |
printStats :: NodeList -> String |
|
566 |
printStats nl = |
|
567 |
let (mem_cv, dsk_cv) = compDetailedCV nl |
|
568 |
in printf "mem=%.8f, dsk=%.8f" mem_cv dsk_cv |
|
569 |
|
|
570 |
-- Balancing functions |
|
571 |
|
|
572 |
-- Loading functions |
|
573 |
|
|
574 |
{- | Convert newline and delimiter-separated text. |
|
575 |
|
|
576 |
This function converts a text in tabular format as generated by |
|
577 |
@gnt-instance list@ and @gnt-node list@ to a list of objects using a |
|
578 |
supplied conversion function. |
|
579 |
|
|
580 |
-} |
|
581 |
loadTabular :: String -> ([String] -> (String, a)) |
|
582 |
-> (a -> Int -> a) -> ([(String, Int)], [(Int, a)]) |
|
583 |
loadTabular text_data convert_fn set_fn = |
|
584 |
let lines_data = lines text_data |
|
585 |
rows = map (sepSplit '|') lines_data |
|
586 |
kerows = (map convert_fn rows) |
|
587 |
idxrows = map (\ (idx, (k, v)) -> ((k, idx), (idx, set_fn v idx))) |
|
588 |
(zip [0..] kerows) |
|
589 |
in unzip idxrows |
|
590 |
|
|
591 |
-- | Set the primary or secondary node indices on the instance list. |
|
592 |
fixInstances :: [(Int, Node.Node)] |
|
593 |
-> (Node.Node -> [Int]) -- ^ Either 'Node.slist' or 'Node.plist' |
|
594 |
-> (Instance.Instance -> Int -> Instance.Instance) |
|
595 |
-- ^ Either 'Instance.setSec' or 'Instance.setPri' |
|
596 |
-> [(Int, Instance.Instance)] |
|
597 |
-> [(Int, Instance.Instance)] |
|
598 |
fixInstances nl list_fn set_fn il = |
|
599 |
concat $ map |
|
600 |
(\ (n_idx, n) -> |
|
601 |
map |
|
602 |
(\ i_idx -> |
|
603 |
let oldi = fromJust (lookup i_idx il) |
|
604 |
in |
|
605 |
(i_idx, set_fn oldi n_idx) |
|
606 |
) (list_fn n) |
|
607 |
) nl |
|
608 |
|
|
609 |
-- | Splits and returns a list of indexes based on an Instance assoc list. |
|
610 |
csi :: String -> [(String, Int)] -> [Int] |
|
611 |
csi values il = |
|
612 |
map |
|
613 |
(\ x -> fromJust (lookup x il)) |
|
614 |
(commaSplit values) |
|
615 |
|
|
616 |
{-| Initializer function that loads the data from a node and list file |
|
617 |
and massages it into the correct format. -} |
|
618 |
loadData :: String -- ^ Node data in text format |
|
619 |
-> String -- ^ Instance data in text format |
|
620 |
-> (Container.Container Node.Node, |
|
621 |
Container.Container Instance.Instance, |
|
622 |
[(Int, String)], [(Int, String)]) |
|
623 |
loadData ndata idata = |
|
624 |
{- instance file: name mem disk -} |
|
625 |
let (kti, il) = loadTabular idata |
|
626 |
(\ (i:j:k:[]) -> (i, Instance.create j k)) Instance.setIdx |
|
627 |
{- node file: name mem disk plist slist -} |
|
628 |
(ktn, nl) = loadTabular ndata |
|
629 |
(\ (i:jt:jf:kt:kf:l:m:[]) -> |
|
630 |
(i, Node.create jt jf kt kf (csi l kti) (csi m kti))) |
|
631 |
Node.setIdx |
|
632 |
il2 = fixInstances nl Node.slist Instance.setSec $ |
|
633 |
fixInstances nl Node.plist Instance.setPri il |
|
634 |
il3 = Container.fromAssocList il2 |
|
635 |
nl3 = Container.fromAssocList |
|
636 |
(map (\ (k, v) -> (k, Node.buildPeers v il3 (length nl))) nl) |
|
637 |
in |
|
638 |
(nl3, il3, swapPairs ktn, swapPairs kti) |
b/src/Container.hs | ||
---|---|---|
1 |
{-| Module abstracting the node and instance container implementation. |
|
2 |
|
|
3 |
This is currently implemented on top of an 'IntMap', which seems to |
|
4 |
give the best performance for our workload. |
|
5 |
|
|
6 |
-} |
|
7 |
|
|
8 |
module Container |
|
9 |
( |
|
10 |
-- * Types |
|
11 |
Container |
|
12 |
-- * Creation |
|
13 |
, empty |
|
14 |
, fromAssocList |
|
15 |
-- * Query |
|
16 |
, size |
|
17 |
, find |
|
18 |
-- * Update |
|
19 |
, add |
|
20 |
, addTwo |
|
21 |
, remove |
|
22 |
-- * Conversion |
|
23 |
, elems |
|
24 |
) where |
|
25 |
|
|
26 |
import qualified Data.IntMap as IntMap |
|
27 |
|
|
28 |
type Key = IntMap.Key |
|
29 |
type Container = IntMap.IntMap |
|
30 |
|
|
31 |
-- | Create an empty container. |
|
32 |
empty :: Container a |
|
33 |
empty = IntMap.empty |
|
34 |
|
|
35 |
-- | Returns the number of elements in the map. |
|
36 |
size :: Container a -> Int |
|
37 |
size = IntMap.size |
|
38 |
|
|
39 |
-- | Locate a key in the map (must exist). |
|
40 |
find :: Key -> Container a -> a |
|
41 |
find k c = c IntMap.! k |
|
42 |
|
|
43 |
-- | Locate a keyin the map returning a default value if not existing. |
|
44 |
findWithDefault :: a -> Key -> Container a -> a |
|
45 |
findWithDefault = IntMap.findWithDefault |
|
46 |
|
|
47 |
-- | Add or update one element to the map. |
|
48 |
add :: Key -> a -> Container a -> Container a |
|
49 |
add k v c = IntMap.insert k v c |
|
50 |
|
|
51 |
-- | Remove an element from the map. |
|
52 |
remove :: Key -> Container a -> Container a |
|
53 |
remove = IntMap.delete |
|
54 |
|
|
55 |
-- | Return the list of values in the map. |
|
56 |
elems :: Container a -> [a] |
|
57 |
elems = IntMap.elems |
|
58 |
|
|
59 |
-- | Create a map from an association list. |
|
60 |
fromAssocList :: [(Key, a)] -> Container a |
|
61 |
fromAssocList = IntMap.fromList |
|
62 |
|
|
63 |
-- | Create a map from an association list with a combining function. |
|
64 |
fromListWith :: (a -> a -> a) -> [(Key, a)] -> Container a |
|
65 |
fromListWith = IntMap.fromListWith |
|
66 |
|
|
67 |
-- | Fold over the values of the map. |
|
68 |
fold :: (a -> b -> b) -> b -> Container a -> b |
|
69 |
fold = IntMap.fold |
|
70 |
|
|
71 |
-- | Add or update two elements of the map. |
|
72 |
addTwo :: Key -> a -> Key -> a -> Container a -> Container a |
|
73 |
addTwo k1 v1 k2 v2 c = add k1 v1 $ add k2 v2 c |
b/src/Instance.hs | ||
---|---|---|
1 |
{-| Module describing an instance. |
|
2 |
|
|
3 |
The instance data type holds very few fields, the algorithm |
|
4 |
intelligence is in the "Node" and "Cluster" modules. |
|
5 |
|
|
6 |
-} |
|
7 |
module Instance where |
|
8 |
|
|
9 |
data Instance = Instance { mem :: Int -- ^ memory of the instance |
|
10 |
, disk :: Int -- ^ disk size of instance |
|
11 |
, pnode :: Int -- ^ original primary node |
|
12 |
, snode :: Int -- ^ original secondary node |
|
13 |
, idx :: Int -- ^ internal index for book-keeping |
|
14 |
} deriving (Show) |
|
15 |
|
|
16 |
create :: String -> String -> Instance |
|
17 |
create mem_init disk_init = Instance { |
|
18 |
mem = read mem_init, |
|
19 |
disk = read disk_init, |
|
20 |
pnode = -1, |
|
21 |
snode = -1, |
|
22 |
idx = -1 |
|
23 |
} |
|
24 |
|
|
25 |
-- | Changes the primary node of the instance. |
|
26 |
setPri :: Instance -- ^ the original instance |
|
27 |
-> Int -- ^ the new primary node |
|
28 |
-> Instance -- ^ the modified instance |
|
29 |
setPri t p = t { pnode = p } |
|
30 |
|
|
31 |
-- | Changes the secondary node of the instance. |
|
32 |
setSec :: Instance -- ^ the original instance |
|
33 |
-> Int -- ^ the new secondary node |
|
34 |
-> Instance -- ^ the modified instance |
|
35 |
setSec t s = t { snode = s } |
|
36 |
|
|
37 |
-- | Changes both nodes of the instance. |
|
38 |
setBoth :: Instance -- ^ the original instance |
|
39 |
-> Int -- ^ new primary node index |
|
40 |
-> Int -- ^ new secondary node index |
|
41 |
-> Instance -- ^ the modified instance |
|
42 |
setBoth t p s = t { pnode = p, snode = s } |
|
43 |
|
|
44 |
-- | Changes the index. |
|
45 |
-- This is used only during the building of the data structures. |
|
46 |
setIdx :: Instance -- ^ the original instance |
|
47 |
-> Int -- ^ new index |
|
48 |
-> Instance -- ^ the modified instance |
|
49 |
setIdx t i = t { idx = i } |
b/src/Makefile | ||
---|---|---|
1 |
all: hn1 hbal |
|
2 |
|
|
3 |
hn1: |
|
4 |
ghc --make -O2 -W hn1 |
|
5 |
|
|
6 |
hbal: |
|
7 |
ghc --make -O2 -W hbal |
|
8 |
|
|
9 |
clean: |
|
10 |
rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \ |
|
11 |
gmon.out *.hi README.html TAGS |
|
12 |
|
|
13 |
.PHONY : all clean hn1 hbal |
b/src/Node.hs | ||
---|---|---|
1 |
{-| Module describing a node. |
|
2 |
|
|
3 |
All updates are functional (copy-based) and return a new node with |
|
4 |
updated value. |
|
5 |
-} |
|
6 |
|
|
7 |
module Node |
|
8 |
( |
|
9 |
Node(failN1, idx, f_mem, f_disk, slist, plist) |
|
10 |
-- * Constructor |
|
11 |
, create |
|
12 |
-- ** Finalization after data loading |
|
13 |
, buildPeers |
|
14 |
, setIdx |
|
15 |
-- * Instance (re)location |
|
16 |
, removePri |
|
17 |
, removeSec |
|
18 |
, addPri |
|
19 |
, addSec |
|
20 |
-- * Statistics |
|
21 |
, normUsed |
|
22 |
-- * Formatting |
|
23 |
, list |
|
24 |
) where |
|
25 |
|
|
26 |
import Data.List |
|
27 |
import Text.Printf (printf) |
|
28 |
|
|
29 |
import qualified Container |
|
30 |
import qualified Instance |
|
31 |
import qualified PeerMap |
|
32 |
|
|
33 |
import Utils |
|
34 |
|
Also available in: Unified diff