Initial import

author Iustin Pop <iustin@google.com>

Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)

committer Iustin Pop <iustin@google.com>

Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)
author Iustin Pop <iustin@google.com>
Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)
committer Iustin Pop <iustin@google.com>
Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)
diff --git a/COPYING b/COPYING

new file mode 100644 (file)

index 0000000..d511905
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,339 @@
+                   GNU GENERAL PUBLIC LICENSE
+                      Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                           Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                   GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+                           NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+                    END OF TERMS AND CONDITIONS
+
+           How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, write to the Free Software Foundation, Inc.,
+    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/Makefile b/Makefile

new file mode 100644 (file)

index 0000000..a4ef134
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,31 @@
+HSRCS := $(wildcard src/*.hs)
+HDDIR = apidoc
+
+# Haskell rules
+
+all:
+       $(MAKE) -C src
+
+README.html: README
+       rst2html $< $@
+
+doc: README.html
+       rm -rf $(HDDIR)
+       mkdir -p $(HDDIR)/src
+       cp hscolour.css $(HDDIR)/src
+       for file in $(HSRCS); do \
+        HsColour -css -anchor \
+        $$file > $(HDDIR)/src/`basename $$file .hs`.html ; \
+    done
+       ln -sf hn1.html $(HDDIR)/src/Main.html
+       haddock --odir $(HDDIR) --html --ignore-all-exports \
+           -t hn1 -p haddock-prologue \
+        --source-module="src/%{MODULE/.//}.html" \
+        --source-entity="src/%{MODULE/.//}.html#%{NAME}" \
+           $(HSRCS)
+
+clean:
+       rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \
+        gmon.out *.hi README.html TAGS
+
+.PHONY : all doc clean hn1
diff --git a/README b/README

new file mode 100644 (file)

index 0000000..273df49
--- /dev/null
+++ b/README
@@ -0,0 +1,245 @@
+Cluster tools (h-aneti?)
+========================
+
+These are some simple cluster tools for fixing common problems. Right now N+1
+and rebalancing are included.
+
+.. contents::
+
+Cluster N+1 solver
+------------------
+
+This program runs a very simple brute force algorithm over the instance
+placement space in order to determine the shortest number of replace-disks
+needed to fix the cluster. Note this means we won't get a balanced cluster,
+just one that passes N+1 checks.
+
+Also note that the set of all instance placements on a 20/80 cluster is
+(20*19)^80, that is ~10^200, so...
+
+Algorithm
++++++++++
+
+The algorithm is a simple two-phase process.
+
+In phase 1 we determine the removal set, that is the set of instances that when
+removed completely from the cluster, make it healthy again. The instances that
+can go into the set are all the primary and secondary instances of the failing
+nodes. The result from this phase is actually a list - we compute all sets of
+the same minimum length.
+
+So basically we aim to determine here: what is the minimum number of instances
+that need to be removed (this is called the removal depth) and which are the
+actual combinations that fit (called the list of removal sets). 
+
+In phase 2, for each removal set computed in the previous phase, we take the
+removed instances and try to determine where we can put them so that the
+cluster is still passing N+1 checks. From this list of possible solutions
+(called the list of solutions), we compute the one that has the smallest delta
+from the original state (the delta is the number of replace disks that needs to
+be run) and chose this as the final solution.
+
+Implementation
+++++++++++++++
+
+Of course, a naive implementation based on the above description will run for
+long periods of time, so the implementation has to be smart in order to prune
+the solution space as eagerly as possible.
+
+In the following, we use as example a set of test data (a cluster with 20
+nodes, 80 instances that has 5 nodes failing N+1 checks for a total of 12
+warnings).
+
+On this set, the minimum depth is 4 (anything below fails), and for this depth
+the current version of the algorithm generates 5 removal sets; a previous
+version of the first phase generated a slightly different set of instances, with
+two removal sets. For the original version of the algorithm:
+
+- the first, non-optimized implementation computed a solution of delta=4 in 30
+  minutes on server-class CPUs and was still running when aborted 10 minutes
+  later
+- the intermediate optimized version computed the whole solution space and
+  found a delta=3 solution in around 10 seconds on a laptop-class CPU (total
+  number of solutions ~600k)
+- latest version on server CPUs (which actually computes more removal sets)
+  computes depth=4 in less than a second and depth=5 in around 2 seconds, and
+  depth=6 in less than 20 seconds; depth=8 takes under five minutes (this is
+  10^10 bigger solution space)
+
+Note that when (artificially) increasing the depth to 5 the number of removal
+sets grows fast (~3000) and a (again artificial) depth 6 generates 61k removal
+sets. Therefore, it is possible to restrict the number of solution sets
+examined via a command-line option.
+
+The factors that influence the run time are:
+
+- the removal depth; for each increase with one of the depth, we grow the
+  solution space by the number of nodes squared (since a new instance can live
+  any two nodes as primary/secondary, therefore (almost) N times N); i.e.,
+  depth=1 will create a N^2 solution space, depth two will make this N^4,
+  depth three will be N^6, etc.
+- the removal depth again; for each increase in the depth, there will be more
+  valid removal sets, and the space of solutions increases linearly with the
+  number of removal sets
+
+Therefore, the smaller the depth the faster the algorithm will be; it doesn't
+seem like this algorithm will work for clusters of 100 nodes and many many
+small instances (e.g. 256MB instances on 16GB nodes).
+
+Currently applied optimizations:
+
+- when choosing where to place an instance in phase two, there are N*(N-1)
+  possible primary/secondary options; however, if instead of iterating over all
+  p * s pairs, we first determine the set of primary nodes that can hold this
+  instance (without failing N+1), we can cut (N-1) secondary placements for
+  each primary node removed; and since this applies at every iteration of phase
+  2 it linearly decreases the solution space, and on full clusters, this can
+  mean a four-five times reductions of solution space
+- since the number of solutions is very high even for smaller depths (on the
+  test data, depth=4 results in 1.8M solutions) we can't compare them at the
+  end, so at each iteration in phase 2 we only promote the best solution out of
+  our own set of solutions
+- since the placement of instances can only increase the delta of the solution
+  (placing a new instance will add zero or more replace-disks steps), it means
+  the delta will only increase while recursing during phase 2; therefore, if we
+  know at one point that we have a current delta that is equal or higher to the
+  delta of the best solution so far, we can abort the recursion; this cuts a
+  tremendous number of branches; further promotion of the best solution from
+  one removal set to another can cut entire removal sets after a few recursions
+
+Command line usage
+++++++++++++++++++
+
+Synopsis::
+
+    hn1 [-n NODES_FILE] [-i INSTANCES_FILE] [-d START_DEPTH] \
+        [-r MAX_REMOVALS] [-l MIN_DELTA] [-L MAX_DELTA] \
+        [-p] [-C]
+
+The -n and -i options change the names of the input files. The -d option
+changes the start depth, as a higher depth can give (with a longer computation
+time) a solution with better delta. The -r option restricts at each depth the
+number of solutions considered - with r=1000 for example even depth=10 finishes
+in less than a second.
+
+The -p option will show the cluster state after the solution is implemented,
+while the -C option will show the needed gnt-instance commands to implement
+it.
+
+The -l (--min-delta) and -L (--max-delta) options restrict the solution in the
+following ways:
+
+- min-delta will cause the search to abort early once we find a solution with
+  delta less than or equal to this parameter; this can cause extremely fast
+  results in case a desired solution is found quickly; the default value for
+  this parameter is zero, so once we find a "perfect" solution we finish early
+- max-delta causes rejection of valid solution but which have delta higher
+  than the value of this parameter; this can reduce the depth of the search
+  tree, with sometimes significant speedups; by default, this optimization is
+  not used
+
+Individually or combined, these two parameters can (if there are any) very
+fast result; on our test data, depth=34 (max depth!) is solved in 2 seconds
+with min-delta=0/max-delta=1 (since there is such a solution), and the
+extremely low max-delta causes extreme pruning.
+
+Cluster rebalancer
+------------------
+
+Compared to the N+1 solver, the rebalancer uses a very simple algorithm:
+repeatedly try to move each instance one step, so that the cluster score
+becomes better. We stop when no further move can improve the score.
+
+The algorithm is divided into rounds (all identical):
+
+#. Repeat for each instance:
+
+    #. Compute score after the potential failover of the instance
+
+    #. For each node that is different from the current primary/secondary
+
+        #. Compute score after replacing the primary with this new node
+
+        #. Compute score after replacing the secondary with this new node
+
+
+    #. Out of this N*2+1 possible new scores (and their associated move) for
+       this instance, we choose the one that is the best in terms of cluster
+       score, and then proceed to the next instance
+
+Since we don't compute all combinations of moves for instances (e.g. the first
+instance's all moves Cartesian product with second instance's all moves, etc.)
+but we proceed serially instance A, then B, then C, the total computations we
+make in one steps is simply N(number of nodes)*2+1 times I(number of instances),
+instead of (N*2+1)^I. So therefore the runtime for a round is trivial.
+
+Further rounds are done, since the relocation of instances might offer better
+places for instances which we didn't move, or simply didn't move to the best
+place. It is possible to limit the rounds, but usually the algorithm finishes
+after a few rounds by itself.
+
+Note that the cluster *must* be N+1 compliant before this algorithm is run, and
+will stay at each move N+1 compliant. Therefore, the final cluster will be N+1
+compliant.
+
+Single-round solutions
+++++++++++++++++++++++
+
+Single-round solutions have the very nice property that they are
+incrementally-valid. In other words, if you have a 10-step solution, at each
+step the cluster is both N+1 compliant and better than the previous step.
+
+This means that you can stop at any point and you will have a better cluster.
+For this reason, single-round solutions are recommended in the common case of
+let's make this better. Multi-round solutions will be better though when adding
+a couple of new, empty nodes to the cluster due to the many relocations needed.
+
+
+Multi-round solutions
++++++++++++++++++++++
+
+A multi-round solution (not for a single round), due to de-duplication of moves
+(i.e. just put the instance directly in its final place, and not move it five
+times around) loses both these properties. It might be that it's not possible to
+directly put the instance on the final nodes. So it can be possible that yes,
+the cluster is happy in the final solution and nice, but you cannot do the steps
+in the shown order. Solving this (via additional instance move(s)) is left to
+the user.
+
+Command line usage
+++++++++++++++++++
+
+Synopsis::
+
+    hbal [-n NODES_FILE] [-i INSTANCES_FILE] \
+         [-r MAX_ROUNDS] \
+         [-p] [-C]
+
+The -n and -i options change the names of the input files. The -r option
+restricts the maximum number of rounds (and is more of safety measure).
+
+The -p option will show the cluster state after the solution is implemented,
+while the -C option will show the needed gnt-instance commands to implement
+it.
+
+Integration with Ganeti
+-----------------------
+
+The programs needs only the output of the node list and instance list. That is,
+they need the following two commands to be run::
+
+    gnt-node list -oname,mtotal,mfree,dtotal,dfree,pinst_list,sinst_list \
+      --separator '|' --no-headers > nodes
+    gnt-instance list -oname,admin_ram,sda_size \
+      --separator '|' --no-head > instances
+
+These two files should be saved under the names of 'nodes' and 'instances'.
+
+When run, the programs will show some informational messages and output the
+chosen solution, in the form of a list of instance name and chosen
+primary/secondary nodes. The user then needs to run the necessary commands to
+get the instances to live on those nodes.
+
+Note that sda_size is less than the total disk size of an instance by 4352
+MiB, so if disk space is at a premium the calculation could be wrong; in this
+case, please adjust the values manually.
diff --git a/haddock-prologue b/haddock-prologue

new file mode 100644 (file)

index 0000000..7216b93
--- /dev/null
+++ b/haddock-prologue
@@ -0,0 +1,5 @@
+This is the internal documentation for hn1, an experimental N+1
+cluster solver.
+
+Start with the "Main" module, the follow with "Cluster" and then the
+rest.
diff --git a/hscolour.css b/hscolour.css

new file mode 100644 (file)

index 0000000..b0dc6e9
--- /dev/null
+++ b/hscolour.css
@@ -0,0 +1,6 @@
+
+.keyglyph, .layout {color: red;}
+.keyword {color: blue;}
+.comment, .comment a {color: green;}
+.str, .chr {color: teal;}
+.keyword,.conid, .varid, .conop, .varop, .num, .cpp, .sel, .definition {}
diff --git a/src/Cluster.hs b/src/Cluster.hs

new file mode 100644 (file)

index 0000000..484ba58
--- /dev/null
+++ b/src/Cluster.hs
@@ -0,0 +1,638 @@
+{-| Implementation of cluster-wide logic.
+
+This module holds all pure cluster-logic; I\/O related functionality
+goes into the "Main" module.
+
+-}
+
+module Cluster
+    (
+     -- * Types
+     NodeList
+    , InstanceList
+    , Placement
+    , Solution(..)
+    , Table(..)
+    , Removal
+    -- * Generic functions
+    , totalResources
+    -- * First phase functions
+    , computeBadItems
+    -- * Second phase functions
+    , computeSolution
+    , applySolution
+    , printSolution
+    , printNodes
+    -- * Balacing functions
+    , checkMove
+    , compCV
+    , printStats
+    -- * Loading functions
+    , loadData
+    ) where
+
+import Data.List
+import Data.Maybe (isNothing, fromJust)
+import Text.Printf (printf)
+import Data.Function
+
+import qualified Container
+import qualified Instance
+import qualified Node
+import Utils
+
+type NodeList = Container.Container Node.Node
+type InstanceList = Container.Container Instance.Instance
+type Score = Double
+
+-- | The description of an instance placement.
+type Placement = (Int, Int, Int)
+
+{- | A cluster solution described as the solution delta and the list
+of placements.
+
+-}
+data Solution = Solution Int [Placement]
+                deriving (Eq, Ord, Show)
+
+-- | Returns the delta of a solution or -1 for Nothing
+solutionDelta :: Maybe Solution -> Int
+solutionDelta sol = case sol of
+                      Just (Solution d _) -> d
+                      _ -> -1
+
+-- | A removal set.
+data Removal = Removal NodeList [Instance.Instance]
+
+-- | An instance move definition
+data IMove = Failover
+           | ReplacePrimary Int
+           | ReplaceSecondary Int
+             deriving (Show)
+
+-- | The complete state for the balancing solution
+data Table = Table NodeList InstanceList Score [Placement]
+             deriving (Show)
+
+-- General functions
+
+-- | Cap the removal list if needed.
+capRemovals :: [a] -> Int -> [a]
+capRemovals removals max_removals =
+    if max_removals > 0 then
+        take max_removals removals
+    else
+        removals
+
+-- | Check if the given node list fails the N+1 check.
+verifyN1Check :: [Node.Node] -> Bool
+verifyN1Check nl = any Node.failN1 nl
+
+-- | Verifies the N+1 status and return the affected nodes.
+verifyN1 :: [Node.Node] -> [Node.Node]
+verifyN1 nl = filter Node.failN1 nl
+
+{-| Add an instance and return the new node and instance maps. -}
+addInstance :: NodeList -> Instance.Instance ->
+               Node.Node -> Node.Node -> Maybe NodeList
+addInstance nl idata pri sec =
+  let pdx = Node.idx pri
+      sdx = Node.idx sec
+  in do
+      pnode <- Node.addPri pri idata
+      snode <- Node.addSec sec idata pdx
+      new_nl <- return $ Container.addTwo sdx snode
+                         pdx pnode nl
+      return new_nl
+
+-- | Remove an instance and return the new node and instance maps.
+removeInstance :: NodeList -> Instance.Instance -> NodeList
+removeInstance nl idata =
+  let pnode = Instance.pnode idata
+      snode = Instance.snode idata
+      pn = Container.find pnode nl
+      sn = Container.find snode nl
+      new_nl = Container.addTwo
+               pnode (Node.removePri pn idata)
+               snode (Node.removeSec sn idata) nl in
+  new_nl
+
+-- | Remove an instance and return the new node map.
+removeInstances :: NodeList -> [Instance.Instance] -> NodeList
+removeInstances = foldl' removeInstance
+
+-- | Compute the total free disk and memory in the cluster.
+totalResources :: Container.Container Node.Node -> (Int, Int)
+totalResources nl =
+    foldl'
+    (\ (mem, disk) node -> (mem + (Node.f_mem node),
+                            disk + (Node.f_disk node)))
+    (0, 0) (Container.elems nl)
+
+{- | Compute a new version of a cluster given a solution.
+
+This is not used for computing the solutions, but for applying a
+(known-good) solution to the original cluster for final display.
+
+It first removes the relocated instances after which it places them on
+their new nodes.
+
+ -}
+applySolution :: NodeList -> InstanceList -> [Placement] -> NodeList
+applySolution nl il sol =
+    let odxes = map (\ (a, b, c) -> (Container.find a il,
+                                     Node.idx (Container.find b nl),
+                                     Node.idx (Container.find c nl))
+                    ) sol
+        idxes = (\ (x, _, _) -> x) (unzip3 odxes)
+        nc = removeInstances nl idxes
+    in
+      foldl' (\ nz (a, b, c) ->
+                 let new_p = Container.find b nz
+                     new_s = Container.find c nz in
+                 fromJust (addInstance nz a new_p new_s)
+           ) nc odxes
+
+
+-- First phase functions
+
+{- | Given a list 1,2,3..n build a list of pairs [(1, [2..n]), (2,
+    [3..n]), ...]
+
+-}
+genParts :: [a] -> Int -> [(a, [a])]
+genParts l count =
+    case l of
+      [] -> []
+      x:xs ->
+          if length l < count then
+              []
+          else
+              (x, xs) : (genParts xs count)
+
+-- | Generates combinations of count items from the names list.
+genNames :: Int -> [b] -> [[b]]
+genNames count1 names1 =
+  let aux_fn count names current =
+          case count of
+            0 -> [current]
+            _ ->
+                concatMap
+                (\ (x, xs) -> aux_fn (count - 1) xs (x:current))
+                (genParts names count)
+  in
+    aux_fn count1 names1 []
+
+{- | Computes the pair of bad nodes and instances.
+
+The bad node list is computed via a simple 'verifyN1' check, and the
+bad instance list is the list of primary and secondary instances of
+those nodes.
+
+-}
+computeBadItems :: NodeList -> InstanceList ->
+                   ([Node.Node], [Instance.Instance])
+computeBadItems nl il =
+  let bad_nodes = verifyN1 $ Container.elems nl
+      bad_instances = map (\idx -> Container.find idx il) $
+                      sort $ nub $ concat $
+                      map (\ n -> (Node.slist n) ++ (Node.plist n)) bad_nodes
+  in
+    (bad_nodes, bad_instances)
+
+
+{- | Checks if removal of instances results in N+1 pass.
+
+Note: the check removal cannot optimize by scanning only the affected
+nodes, since the cluster is known to be not healthy; only the check
+placement can make this shortcut.
+
+-}
+checkRemoval :: NodeList -> [Instance.Instance] -> Maybe Removal
+checkRemoval nl victims =
+  let nx = removeInstances nl victims
+      failN1 = verifyN1Check (Container.elems nx)
+  in
+    if failN1 then
+      Nothing
+    else
+      Just $ Removal nx victims
+
+
+-- | Computes the removals list for a given depth
+computeRemovals :: Cluster.NodeList
+                 -> [Instance.Instance]
+                 -> Int
+                 -> [Maybe Cluster.Removal]
+computeRemovals nl bad_instances depth =
+    map (checkRemoval nl) $ genNames depth bad_instances
+
+-- Second phase functions
+
+-- | Single-node relocation cost
+nodeDelta :: Int -> Int -> Int -> Int
+nodeDelta i p s =
+    if i == p || i == s then
+        0
+    else
+        1
+
+{-| Compute best solution.
+
+    This function compares two solutions, choosing the minimum valid
+    solution.
+-}
+compareSolutions :: Maybe Solution -> Maybe Solution -> Maybe Solution
+compareSolutions a b = case (a, b) of
+  (Nothing, x) -> x
+  (x, Nothing) -> x
+  (x, y) -> min x y
+
+-- | Compute best table. Note that the ordering of the arguments is important.
+compareTables :: Table -> Table -> Table
+compareTables a@(Table _ _ a_cv _) b@(Table _ _ b_cv _ ) =
+    if a_cv > b_cv then b else a
+
+-- | Check if a given delta is worse then an existing solution.
+tooHighDelta :: Maybe Solution -> Int -> Int -> Bool
+tooHighDelta sol new_delta max_delta =
+    if new_delta > max_delta && max_delta >=0 then
+        True
+    else
+        case sol of
+          Nothing -> False
+          Just (Solution old_delta _) -> old_delta <= new_delta
+
+{-| Check if placement of instances still keeps the cluster N+1 compliant.
+
+    This is the workhorse of the allocation algorithm: given the
+    current node and instance maps, the list of instances to be
+    placed, and the current solution, this will return all possible
+    solution by recursing until all target instances are placed.
+
+-}
+checkPlacement :: NodeList            -- ^ The current node list
+               -> [Instance.Instance] -- ^ List of instances still to place
+               -> [Placement]         -- ^ Partial solution until now
+               -> Int                 -- ^ The delta of the partial solution
+               -> Maybe Solution      -- ^ The previous solution
+               -> Int                 -- ^ Abort if the we go above this delta
+               -> Maybe Solution      -- ^ The new solution
+checkPlacement nl victims current current_delta prev_sol max_delta =
+  let target = head victims
+      opdx = Instance.pnode target
+      osdx = Instance.snode target
+      vtail = tail victims
+      have_tail = (length vtail) > 0
+      nodes = Container.elems nl
+  in
+    foldl'
+    (\ accu_p pri ->
+         let
+             pri_idx = Node.idx pri
+             upri_delta = current_delta + nodeDelta pri_idx opdx osdx
+             new_pri = Node.addPri pri target
+             fail_delta1 = tooHighDelta accu_p upri_delta max_delta
+         in
+           if fail_delta1 || isNothing(new_pri) then accu_p
+           else let pri_nl = Container.add pri_idx (fromJust new_pri) nl in
+                foldl'
+                (\ accu sec ->
+                     let
+                         sec_idx = Node.idx sec
+                         upd_delta = upri_delta +
+                                     nodeDelta sec_idx opdx osdx
+                         fail_delta2 = tooHighDelta accu upd_delta max_delta
+                         new_sec = Node.addSec sec target pri_idx
+                     in
+                       if sec_idx == pri_idx || fail_delta2 ||
+                          isNothing new_sec then accu
+                       else let
+                           nx = Container.add sec_idx (fromJust new_sec) pri_nl
+                           plc = (Instance.idx target, pri_idx, sec_idx)
+                           c2 = plc:current
+                           result =
+                               if have_tail then
+                                   checkPlacement nx vtail c2 upd_delta
+                                                  accu max_delta
+                               else
+                                   Just (Solution upd_delta c2)
+                      in compareSolutions accu result
+                ) accu_p nodes
+    ) prev_sol nodes
+
+-- | Apply a move
+applyMove :: NodeList -> Instance.Instance
+          -> IMove -> (Maybe NodeList, Instance.Instance, Int, Int)
+applyMove nl inst Failover =
+    let old_pdx = Instance.pnode inst
+        old_sdx = Instance.snode inst
+        old_p = Container.find old_pdx nl
+        old_s = Container.find old_sdx nl
+        int_p = Node.removePri old_p inst
+        int_s = Node.removeSec old_s inst
+        new_p = Node.addPri int_s inst
+        new_s = Node.addSec int_p inst old_sdx
+        new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing
+                 else Just $ Container.addTwo old_pdx (fromJust new_s)
+                      old_sdx (fromJust new_p) nl
+    in (new_nl, Instance.setBoth inst old_sdx old_pdx, old_sdx, old_pdx)
+
+applyMove nl inst (ReplacePrimary new_pdx) =
+    let old_pdx = Instance.pnode inst
+        old_sdx = Instance.snode inst
+        old_p = Container.find old_pdx nl
+        old_s = Container.find old_sdx nl
+        tgt_n = Container.find new_pdx nl
+        int_p = Node.removePri old_p inst
+        int_s = Node.removeSec old_s inst
+        new_p = Node.addPri tgt_n inst
+        new_s = Node.addSec int_s inst new_pdx
+        new_nl = if isNothing(new_p) || isNothing(new_s) then Nothing
+                 else Just $ Container.add new_pdx (fromJust new_p) $
+                      Container.addTwo old_pdx int_p
+                               old_sdx (fromJust new_s) nl
+    in (new_nl, Instance.setPri inst new_pdx, new_pdx, old_sdx)
+
+applyMove nl inst (ReplaceSecondary new_sdx) =
+    let old_pdx = Instance.pnode inst
+        old_sdx = Instance.snode inst
+        old_s = Container.find old_sdx nl
+        tgt_n = Container.find new_sdx nl
+        int_s = Node.removeSec old_s inst
+        new_s = Node.addSec tgt_n inst old_pdx
+        new_nl = if isNothing(new_s) then Nothing
+                 else Just $ Container.addTwo new_sdx (fromJust new_s)
+                      old_sdx int_s nl
+    in (new_nl, Instance.setSec inst new_sdx, old_pdx, new_sdx)
+
+checkSingleStep :: Table -- ^ The original table
+                -> Instance.Instance -- ^ The instance to move
+                -> Table -- ^ The current best table
+                -> IMove -- ^ The move to apply
+                -> Table -- ^ The final best table
+checkSingleStep ini_tbl target cur_tbl move =
+    let
+        Table ini_nl ini_il _ ini_plc = ini_tbl
+        (tmp_nl, new_inst, pri_idx, sec_idx) =
+            applyMove ini_nl target move
+    in
+      if isNothing tmp_nl then cur_tbl
+      else
+          let tgt_idx = Instance.idx target
+              upd_nl = fromJust tmp_nl
+              upd_cvar = compCV upd_nl
+              upd_il = Container.add tgt_idx new_inst ini_il
+              tmp_plc = filter (\ (t, _, _) -> t /= tgt_idx) ini_plc
+              upd_plc = (tgt_idx, pri_idx, sec_idx):tmp_plc
+              upd_tbl = Table upd_nl upd_il upd_cvar upd_plc
+          in
+            compareTables cur_tbl upd_tbl
+
+-- | Compute the best next move.
+checkMove :: Table            -- ^ The current solution
+          -> [Instance.Instance] -- ^ List of instances still to move
+          -> Table            -- ^ The new solution
+checkMove ini_tbl victims =
+  let target = head victims
+      opdx = Instance.pnode target
+      osdx = Instance.snode target
+      vtail = tail victims
+      have_tail = (length vtail) > 0
+      Table ini_nl _ _ _ = ini_tbl
+      nodes = filter (\node -> let idx = Node.idx node
+                               in idx /= opdx && idx /= osdx)
+              $ Container.elems ini_nl
+      aft_failover = checkSingleStep ini_tbl target ini_tbl Failover
+      next_tbl =
+          foldl'
+          (\ accu_p new_node ->
+               let
+                   new_idx = Node.idx new_node
+                   pmoves = [ReplacePrimary new_idx,
+                             ReplaceSecondary new_idx]
+               in
+                 foldl' (checkSingleStep ini_tbl target) accu_p pmoves
+          ) aft_failover nodes
+  in if have_tail then checkMove next_tbl vtail
+     else next_tbl
+
+
+
+{- | Auxiliary function for solution computation.
+
+We write this in an explicit recursive fashion in order to control
+early-abort in case we have met the min delta. We can't use foldr
+instead of explicit recursion since we need the accumulator for the
+abort decision.
+
+-}
+advanceSolution :: [Maybe Removal] -- ^ The removal to process
+                -> Int             -- ^ Minimum delta parameter
+                -> Int             -- ^ Maximum delta parameter
+                -> Maybe Solution  -- ^ Current best solution
+                -> Maybe Solution  -- ^ New best solution
+advanceSolution [] _ _ sol = sol
+advanceSolution (Nothing:xs) m n sol = advanceSolution xs m n sol
+advanceSolution ((Just (Removal nx removed)):xs) min_d max_d prev_sol =
+    let new_sol = checkPlacement nx removed [] 0 prev_sol max_d
+        new_delta = solutionDelta $! new_sol
+    in
+      if new_delta >= 0 && new_delta <= min_d then
+          new_sol
+      else
+          advanceSolution xs min_d max_d new_sol
+
+-- | Computes the placement solution.
+solutionFromRemovals :: [Maybe Removal] -- ^ The list of (possible) removals
+                     -> Int             -- ^ Minimum delta parameter
+                     -> Int             -- ^ Maximum delta parameter
+                     -> Maybe Solution  -- ^ The best solution found
+solutionFromRemovals removals min_delta max_delta =
+    advanceSolution removals min_delta max_delta Nothing
+
+{- | Computes the solution at the given depth.
+
+This is a wrapper over both computeRemovals and
+solutionFromRemovals. In case we have no solution, we return Nothing.
+
+-}
+computeSolution :: NodeList        -- ^ The original node data
+                -> [Instance.Instance] -- ^ The list of /bad/ instances
+                -> Int             -- ^ The /depth/ of removals
+                -> Int             -- ^ Maximum number of removals to process
+                -> Int             -- ^ Minimum delta parameter
+                -> Int             -- ^ Maximum delta parameter
+                -> Maybe Solution  -- ^ The best solution found (or Nothing)
+computeSolution nl bad_instances depth max_removals min_delta max_delta =
+  let
+      removals = computeRemovals nl bad_instances depth
+      removals' = capRemovals removals max_removals
+  in
+    solutionFromRemovals removals' min_delta max_delta
+
+-- Solution display functions (pure)
+
+-- | Given the original and final nodes, computes the relocation description.
+computeMoves :: String -- ^ The instance name
+             -> String -- ^ Original primary
+             -> String -- ^ Original secondary
+             -> String -- ^ New primary
+             -> String -- ^ New secondary
+             -> (String, [String])
+                -- ^ Tuple of moves and commands list; moves is containing
+                -- either @/f/@ for failover or @/r:name/@ for replace
+                -- secondary, while the command list holds gnt-instance
+                -- commands (without that prefix), e.g \"@failover instance1@\"
+computeMoves i a b c d =
+    if c == a then {- Same primary -}
+        if d == b then {- Same sec??! -}
+            ("-", [])
+        else {- Change of secondary -}
+            (printf "r:%s" d,
+             [printf "replace-disks -n %s %s" d i])
+    else
+        if c == b then {- Failover and ... -}
+            if d == a then {- that's all -}
+                ("f", [printf "failover %s" i])
+            else
+                (printf "f r:%s" d,
+                 [printf "failover %s" i,
+                  printf "replace-disks -n %s %s" d i])
+        else
+            if d == a then {- ... and keep primary as secondary -}
+                (printf "r:%s f" c,
+                 [printf "replace-disks -n %s %s" c i,
+                  printf "failover %s" i])
+            else
+                if d == b then {- ... keep same secondary -}
+                    (printf "f r:%s f" c,
+                     [printf "failover %s" i,
+                      printf "replace-disks -n %s %s" c i,
+                      printf "failover %s" i])
+
+                else {- Nothing in common -}
+                    (printf "r:%s f r:%s" c d,
+                     [printf "replace-disks -n %s %s" c i,
+                      printf "failover %s" i,
+                      printf "replace-disks -n %s %s" d i])
+
+{-| Converts a solution to string format -}
+printSolution :: InstanceList
+              -> [(Int, String)]
+              -> [(Int, String)]
+              -> [Placement]
+              -> ([String], [[String]])
+printSolution il ktn kti sol =
+  unzip $ map
+    (\ (i, p, s) ->
+       let inst = Container.find i il
+           inam = fromJust $ lookup (Instance.idx inst) kti
+           npri = fromJust $ lookup p ktn
+           nsec = fromJust $ lookup s ktn
+           opri = fromJust $ lookup (Instance.pnode inst) ktn
+           osec = fromJust $ lookup (Instance.snode inst) ktn
+           (moves, cmds) =  computeMoves inam opri osec npri nsec
+
+       in
+         (printf "  I: %s\to: %s+>%s\tn: %s+>%s\ta: %s"
+                 inam opri osec npri nsec moves,
+          cmds)
+    ) sol
+
+-- | Print the node list.
+printNodes :: [(Int, String)] -> NodeList -> String
+printNodes ktn nl =
+    let snl = sortBy (compare `on` Node.idx) (Container.elems nl)
+        snl' = map (\ n -> ((fromJust $ lookup (Node.idx n) ktn), n)) snl
+    in unlines $ map (uncurry Node.list) snl'
+
+-- | Compute the mem and disk covariance.
+compDetailedCV :: NodeList -> (Double, Double)
+compDetailedCV nl =
+    let nstats = map Node.normUsed $ Container.elems nl
+        (mem_l, dsk_l) = unzip nstats
+        mem_cv = varianceCoeff mem_l
+        dsk_cv = varianceCoeff dsk_l
+    in (mem_cv, dsk_cv)
+
+-- | Compute the 'total' variance.
+compCV :: NodeList -> Double
+compCV nl =
+    let (mem_cv, dsk_cv) = compDetailedCV nl
+    in mem_cv + dsk_cv
+
+printStats :: NodeList -> String
+printStats nl =
+    let (mem_cv, dsk_cv) = compDetailedCV nl
+    in printf "mem=%.8f, dsk=%.8f" mem_cv dsk_cv
+
+-- Balancing functions
+
+-- Loading functions
+
+{- | Convert newline and delimiter-separated text.
+
+This function converts a text in tabular format as generated by
+@gnt-instance list@ and @gnt-node list@ to a list of objects using a
+supplied conversion function.
+
+-}
+loadTabular :: String -> ([String] -> (String, a))
+            -> (a -> Int -> a) -> ([(String, Int)], [(Int, a)])
+loadTabular text_data convert_fn set_fn =
+    let lines_data = lines text_data
+        rows = map (sepSplit '|') lines_data
+        kerows = (map convert_fn rows)
+        idxrows = map (\ (idx, (k, v)) -> ((k, idx), (idx, set_fn v idx)))
+                  (zip [0..] kerows)
+    in unzip idxrows
+
+-- | Set the primary or secondary node indices on the instance list.
+fixInstances :: [(Int, Node.Node)]
+             -> (Node.Node -> [Int]) -- ^ Either 'Node.slist' or 'Node.plist'
+             -> (Instance.Instance -> Int -> Instance.Instance)
+                -- ^ Either 'Instance.setSec' or 'Instance.setPri'
+             -> [(Int, Instance.Instance)]
+             -> [(Int, Instance.Instance)]
+fixInstances nl list_fn set_fn il =
+    concat $ map
+               (\ (n_idx, n) ->
+                   map
+                   (\ i_idx ->
+                        let oldi = fromJust (lookup i_idx il)
+                        in
+                          (i_idx, set_fn oldi n_idx)
+                   ) (list_fn n)
+              ) nl
+
+-- | Splits and returns a list of indexes based on an Instance assoc list.
+csi :: String -> [(String, Int)] -> [Int]
+csi values il =
+    map
+    (\ x -> fromJust (lookup x il))
+    (commaSplit values)
+
+{-| Initializer function that loads the data from a node and list file
+    and massages it into the correct format. -}
+loadData :: String -- ^ Node data in text format
+         -> String -- ^ Instance data in text format
+         -> (Container.Container Node.Node,
+             Container.Container Instance.Instance,
+             [(Int, String)], [(Int, String)])
+loadData ndata idata =
+    {- instance file: name mem disk -}
+    let (kti, il) = loadTabular idata
+                    (\ (i:j:k:[]) -> (i, Instance.create j k)) Instance.setIdx
+    {- node file: name mem disk plist slist -}
+        (ktn, nl) = loadTabular ndata
+                    (\ (i:jt:jf:kt:kf:l:m:[]) ->
+                         (i, Node.create jt jf kt kf (csi l kti) (csi m kti)))
+                    Node.setIdx
+        il2 = fixInstances nl Node.slist Instance.setSec $
+              fixInstances nl Node.plist Instance.setPri il
+        il3 = Container.fromAssocList il2
+        nl3 = Container.fromAssocList
+             (map (\ (k, v) -> (k, Node.buildPeers v il3 (length nl))) nl)
+    in
+      (nl3, il3, swapPairs ktn, swapPairs kti)
diff --git a/src/Container.hs b/src/Container.hs

new file mode 100644 (file)

index 0000000..74ce2b5
--- /dev/null
+++ b/src/Container.hs
@@ -0,0 +1,73 @@
+{-| Module abstracting the node and instance container implementation.
+
+This is currently implemented on top of an 'IntMap', which seems to
+give the best performance for our workload.
+
+-}
+
+module Container
+    (
+     -- * Types
+     Container
+     -- * Creation
+    , empty
+    , fromAssocList
+     -- * Query
+    , size
+    , find
+     -- * Update
+    , add
+    , addTwo
+    , remove
+    -- * Conversion
+    , elems
+    ) where
+
+import qualified Data.IntMap as IntMap
+
+type Key = IntMap.Key
+type Container = IntMap.IntMap
+
+-- | Create an empty container.
+empty :: Container a
+empty = IntMap.empty
+
+-- | Returns the number of elements in the map.
+size :: Container a -> Int
+size = IntMap.size
+
+-- | Locate a key in the map (must exist).
+find :: Key -> Container a -> a
+find k c = c IntMap.! k
+
+-- | Locate a keyin the map returning a default value if not existing.
+findWithDefault :: a -> Key -> Container a -> a
+findWithDefault = IntMap.findWithDefault
+
+-- | Add or update one element to the map.
+add :: Key -> a -> Container a -> Container a
+add k v c = IntMap.insert k v c
+
+-- | Remove an element from the map.
+remove :: Key -> Container a -> Container a
+remove = IntMap.delete
+
+-- | Return the list of values in the map.
+elems :: Container a -> [a]
+elems = IntMap.elems
+
+-- | Create a map from an association list.
+fromAssocList :: [(Key, a)] -> Container a
+fromAssocList = IntMap.fromList
+
+-- | Create a map from an association list with a combining function.
+fromListWith :: (a -> a -> a) -> [(Key, a)] -> Container a
+fromListWith = IntMap.fromListWith
+
+-- | Fold over the values of the map.
+fold :: (a -> b -> b) -> b -> Container a -> b
+fold = IntMap.fold
+
+-- | Add or update two elements of the map.
+addTwo :: Key -> a -> Key -> a -> Container a -> Container a
+addTwo k1 v1 k2 v2 c = add k1 v1 $ add k2 v2 c
diff --git a/src/Instance.hs b/src/Instance.hs

new file mode 100644 (file)

index 0000000..f16e867
--- /dev/null
+++ b/src/Instance.hs
@@ -0,0 +1,49 @@
+{-| Module describing an instance.
+
+The instance data type holds very few fields, the algorithm
+intelligence is in the "Node" and "Cluster" modules.
+
+-}
+module Instance where
+
+data Instance = Instance { mem :: Int -- ^ memory of the instance
+                         , disk :: Int -- ^ disk size of instance
+                         , pnode :: Int -- ^ original primary node
+                         , snode :: Int -- ^ original secondary node
+                         , idx :: Int -- ^ internal index for book-keeping
+                         } deriving (Show)
+
+create :: String -> String -> Instance
+create mem_init disk_init = Instance {
+                              mem = read mem_init,
+                              disk = read disk_init,
+                              pnode = -1,
+                              snode = -1,
+                              idx = -1
+                            }
+
+-- | Changes the primary node of the instance.
+setPri :: Instance -- ^ the original instance
+        -> Int -- ^ the new primary node
+        -> Instance -- ^ the modified instance
+setPri t p = t { pnode = p }
+
+-- | Changes the secondary node of the instance.
+setSec :: Instance -- ^ the original instance
+        -> Int  -- ^ the new secondary node
+        -> Instance -- ^ the modified instance
+setSec t s = t { snode = s }
+
+-- | Changes both nodes of the instance.
+setBoth :: Instance -- ^ the original instance
+         -> Int -- ^ new primary node index
+         -> Int -- ^ new secondary node index
+         -> Instance -- ^ the modified instance
+setBoth t p s = t { pnode = p, snode = s }
+
+-- | Changes the index.
+-- This is used only during the building of the data structures.
+setIdx :: Instance -- ^ the original instance
+        -> Int -- ^ new index
+        -> Instance -- ^ the modified instance
+setIdx t i = t { idx = i }
diff --git a/src/Makefile b/src/Makefile

new file mode 100644 (file)

index 0000000..760d297
--- /dev/null
+++ b/src/Makefile
@@ -0,0 +1,13 @@
+all: hn1 hbal
+
+hn1:
+       ghc --make -O2 -W hn1
+
+hbal:
+       ghc --make -O2 -W hbal
+
+clean:
+       rm -f *.o *.cmi *.cmo *.cmx *.old hn1 zn1 *.prof *.ps *.stat *.aux \
+        gmon.out *.hi README.html TAGS
+
+.PHONY : all clean hn1 hbal
diff --git a/src/Node.hs b/src/Node.hs

new file mode 100644 (file)

index 0000000..c1d5bc8
--- /dev/null
+++ b/src/Node.hs
@@ -0,0 +1,192 @@
+{-| Module describing a node.
+
+    All updates are functional (copy-based) and return a new node with
+    updated value.
+-}
+
+module Node
+    (
+      Node(failN1, idx, f_mem, f_disk, slist, plist)
+    -- * Constructor
+    , create
+    -- ** Finalization after data loading
+    , buildPeers
+    , setIdx
+    -- * Instance (re)location
+    , removePri
+    , removeSec
+    , addPri
+    , addSec
+    -- * Statistics
+    , normUsed
+    -- * Formatting
+    , list
+    ) where
+
+import Data.List
+import Text.Printf (printf)
+
+import qualified Container
+import qualified Instance
+import qualified PeerMap
+
+import Utils
+
+data Node = Node { t_mem :: Int -- ^ total memory (Mib)
+                 , f_mem :: Int -- ^ free memory (MiB)
+                 , t_disk :: Int -- ^ total disk space (MiB)
+                 , f_disk :: Int -- ^ free disk space (MiB)
+                 , plist :: [Int] -- ^ list of primary instance indices
+                 , slist :: [Int] -- ^ list of secondary instance indices
+                 , idx :: Int -- ^ internal index for book-keeping
+                 , peers:: PeerMap.PeerMap -- ^ primary node to instance
+                                           -- mapping
+                 , failN1:: Bool -- ^ whether the node has failed n1
+                 , maxRes :: Int -- ^ maximum memory needed for
+                                   -- failover by primaries of this node
+  } deriving (Show)
+
+{- | Create a new node.
+
+The index and the peers maps are empty, and will be need to be update
+later via the 'setIdx' and 'buildPeers' functions.
+
+-}
+create :: String -> String -> String -> String -> [Int] -> [Int] -> Node
+create mem_t_init mem_f_init disk_t_init disk_f_init
+       plist_init slist_init = Node
+    {
+      t_mem = read mem_t_init,
+      f_mem = read mem_f_init,
+      t_disk = read disk_t_init,
+      f_disk = read disk_f_init,
+      plist = plist_init,
+      slist = slist_init,
+      failN1 = True,
+      idx = -1,
+      peers = PeerMap.empty,
+      maxRes = 0
+    }
+
+-- | Changes the index.
+-- This is used only during the building of the data structures.
+setIdx :: Node -> Int -> Node
+setIdx t i = t {idx = i}
+
+-- | Given the rmem, free memory and disk, computes the failn1 status.
+computeFailN1 :: Int -> Int -> Int -> Bool
+computeFailN1 new_rmem new_mem new_disk =
+    new_mem <= new_rmem || new_disk <= 0
+
+
+-- | Computes the maximum reserved memory for peers from a peer map.
+computeMaxRes :: PeerMap.PeerMap -> PeerMap.Elem
+computeMaxRes new_peers = PeerMap.maxElem new_peers
+
+-- | Builds the peer map for a given node.
+buildPeers :: Node -> Container.Container Instance.Instance -> Int -> Node
+buildPeers t il num_nodes =
+    let mdata = map
+                (\i_idx -> let inst = Container.find i_idx il
+                           in (Instance.pnode inst, Instance.mem inst))
+                (slist t)
+        pmap = PeerMap.accumArray (+) 0 (0, num_nodes - 1) mdata
+        new_rmem = computeMaxRes pmap
+        new_failN1 = computeFailN1 new_rmem (f_mem t) (f_disk t)
+    in t {peers=pmap, failN1 = new_failN1, maxRes = new_rmem}
+
+-- | Removes a primary instance.
+removePri :: Node -> Instance.Instance -> Node
+removePri t inst =
+    let iname = Instance.idx inst
+        new_plist = delete iname (plist t)
+        new_mem = f_mem t + Instance.mem inst
+        new_disk = f_disk t + Instance.disk inst
+        new_failn1 = computeFailN1 (maxRes t) new_mem new_disk
+    in t {plist = new_plist, f_mem = new_mem, f_disk = new_disk,
+          failN1 = new_failn1}
+
+-- | Removes a secondary instance.
+removeSec :: Node -> Instance.Instance -> Node
+removeSec t inst =
+    let iname = Instance.idx inst
+        pnode = Instance.pnode inst
+        new_slist = delete iname (slist t)
+        new_disk = f_disk t + Instance.disk inst
+        old_peers = peers t
+        old_peem = PeerMap.find pnode old_peers
+        new_peem =  old_peem - (Instance.mem inst)
+        new_peers = PeerMap.add pnode new_peem old_peers
+        old_rmem = maxRes t
+        new_rmem = if old_peem < old_rmem then
+                       old_rmem
+                   else
+                       computeMaxRes new_peers
+        new_failn1 = computeFailN1 new_rmem (f_mem t) new_disk
+    in t {slist = new_slist, f_disk = new_disk, peers = new_peers,
+          failN1 = new_failn1, maxRes = new_rmem}
+
+-- | Adds a primary instance.
+addPri :: Node -> Instance.Instance -> Maybe Node
+addPri t inst =
+    let iname = Instance.idx inst
+        new_mem = f_mem t - Instance.mem inst
+        new_disk = f_disk t - Instance.disk inst
+        new_failn1 = computeFailN1 (maxRes t) new_mem new_disk in
+      if new_failn1 then
+        Nothing
+      else
+        let new_plist = iname:(plist t) in
+        Just t {plist = new_plist, f_mem = new_mem, f_disk = new_disk,
+                failN1 = new_failn1}
+
+-- | Adds a secondary instance.
+addSec :: Node -> Instance.Instance -> Int -> Maybe Node
+addSec t inst pdx =
+    let iname = Instance.idx inst
+        old_peers = peers t
+        new_disk = f_disk t - Instance.disk inst
+        new_peem = PeerMap.find pdx old_peers + Instance.mem inst
+        new_peers = PeerMap.add pdx new_peem old_peers
+        new_rmem = max (maxRes t) new_peem
+        new_failn1 = computeFailN1 new_rmem (f_mem t) new_disk in
+    if new_failn1 then
+        Nothing
+    else
+        let new_slist = iname:(slist t) in
+        Just t {slist = new_slist, f_disk = new_disk,
+                peers = new_peers, failN1 = new_failn1,
+                maxRes = new_rmem}
+
+-- | Simple converter to string.
+str :: Node -> String
+str t =
+    printf ("Node %d (mem=%5d MiB, disk=%5.2f GiB)\n  Primaries:" ++
+            " %s\nSecondaries: %s")
+      (idx t) (f_mem t) ((f_disk t) `div` 1024)
+      (commaJoin (map show (plist t)))
+      (commaJoin (map show (slist t)))
+
+-- | String converter for the node list functionality.
+list :: String -> Node -> String
+list n t =
+    let pl = plist t
+        sl = slist t
+        (mp, dp) = normUsed t
+    in
+      printf "  %s(%d)\t%5d\t%5d\t%3d\t%3d\t%s\t%s\t%.5f\t%.5f"
+                 n (idx t) (f_mem t) ((f_disk t) `div` 1024)
+                 (length pl) (length sl)
+                 (commaJoin (map show pl))
+                 (commaJoin (map show sl))
+                 mp dp
+
+-- | Normalize the usage status
+-- This converts the used memory and disk values into a normalized integer
+-- value, currently expresed as per mille of totals
+
+normUsed :: Node -> (Double, Double)
+normUsed n =
+    let mp = (fromIntegral $ f_mem n) / (fromIntegral $ t_mem n)
+        dp = (fromIntegral $ f_disk n) / (fromIntegral $ t_disk n)
+    in (mp, dp)
diff --git a/src/PeerMap.hs b/src/PeerMap.hs

new file mode 100644 (file)

index 0000000..284fa70
--- /dev/null
+++ b/src/PeerMap.hs
@@ -0,0 +1,76 @@
+{-|
+  Module abstracting the peer map implementation.
+
+This is abstracted separately since the speed of peermap updates can
+be a significant part of the total runtime, and as such changing the
+implementation should be easy in case it's needed.
+
+-}
+
+module PeerMap (
+                PeerMap,
+                Key,
+                Elem,
+                empty,
+                create,
+                accumArray,
+                PeerMap.find,
+                add,
+                remove,
+                maxElem
+                )
+    where
+
+import Data.Maybe (fromMaybe)
+import Data.List
+import Data.Function
+import Data.Ord
+
+type Key = Int
+type Elem = Int
+type PeerMap = [(Key, Elem)]
+
+empty :: PeerMap
+empty = []
+
+create :: Key -> PeerMap
+create _ = []
+
+-- | Our reverse-compare function
+pmCompare :: (Key, Elem) -> (Key, Elem) -> Ordering
+pmCompare a b = (compare `on` snd) b a
+
+addWith :: (Elem -> Elem -> Elem) -> Key -> Elem -> PeerMap -> PeerMap
+addWith fn k v lst =
+    let r = lookup k lst
+    in
+      case r of
+        Nothing -> insertBy pmCompare (k, v) lst
+        Just o -> insertBy pmCompare (k, fn o v) (remove k lst)
+
+accumArray :: (Elem -> Elem -> Elem) -> Elem -> (Key, Key) ->
+              [(Key, Elem)] -> PeerMap
+accumArray fn _ _ lst =
+    case lst of
+      [] -> empty
+      (k, v):xs -> addWith fn k v $ accumArray fn undefined undefined xs
+
+find :: Key -> PeerMap -> Elem
+find k c = fromMaybe 0 $ lookup k c
+
+add :: Key -> Elem -> PeerMap -> PeerMap
+add k v c = addWith (\_ n -> n) k v c
+
+remove :: Key -> PeerMap -> PeerMap
+remove k c = case c of
+               [] -> []
+               (x@(x', _)):xs -> if k == x' then xs
+                            else x:(remove k xs)
+
+to_list :: PeerMap -> [Elem]
+to_list c = snd $ unzip c
+
+maxElem :: PeerMap -> Elem
+maxElem c = case c of
+              [] -> 0
+              (_, v):_ -> v
diff --git a/src/Utils.hs b/src/Utils.hs

new file mode 100644 (file)

index 0000000..d54bdde
--- /dev/null
+++ b/src/Utils.hs
@@ -0,0 +1,53 @@
+{-| Utility functions -}
+
+module Utils where
+
+import Data.List
+
+import Debug.Trace
+
+-- | To be used only for debugging, breaks referential integrity.
+debug :: Show a => a -> a
+debug x = trace (show x) x
+
+-- | Comma-join a string list.
+commaJoin :: [String] -> String
+commaJoin = intercalate ","
+
+-- | Split a string on a separator and return an array.
+sepSplit :: Char -> String -> [String]
+sepSplit sep s
+    | x == "" && xs == [] = []
+    | xs == []            = [x]
+    | ys == []            = x:"":[]
+    | otherwise           = x:(sepSplit sep ys)
+    where (x, xs) = break (== sep) s
+          ys = drop 1 xs
+
+-- | Partial application of sepSplit to @'.'@
+commaSplit :: String -> [String]
+commaSplit = sepSplit ','
+
+-- | Swap a list of @(a, b)@ into @(b, a)@
+swapPairs :: [(a, b)] -> [(b, a)]
+swapPairs = map (\ (a, b) -> (b, a))
+
+-- Simple and slow statistical functions, please replace with better versions
+
+-- | Mean value of a list.
+meanValue :: Floating a => [a] -> a
+meanValue lst = (sum lst) / (fromIntegral $ length lst)
+
+-- | Standard deviation.
+stdDev :: Floating a => [a] -> a
+stdDev lst =
+    let mv = meanValue lst
+        square = (^ (2::Int)) -- silences "defaulting the constraint..."
+        av = sum $ map square $ map (\e -> e - mv) lst
+        bv = sqrt (av / (fromIntegral $ length lst))
+    in bv
+
+
+-- | Coefficient of variation.
+varianceCoeff :: Floating a => [a] -> a
+varianceCoeff lst = (stdDev lst) / (fromIntegral $ length lst)
diff --git a/src/hbal.hs b/src/hbal.hs

new file mode 100644 (file)

index 0000000..c5089db
--- /dev/null
+++ b/src/hbal.hs
@@ -0,0 +1,164 @@
+{-| Solver for N+1 cluster errors
+
+-}
+
+module Main (main) where
+
+import Data.List
+import Data.Function
+import Monad
+import System
+import System.IO
+import System.Console.GetOpt
+import qualified System
+
+import Text.Printf (printf)
+
+import qualified Container
+import qualified Cluster
+
+-- | Command line options structure.
+data Options = Options
+    { optShowNodes :: Bool
+    , optShowCmds  :: Bool
+    , optNodef     :: FilePath
+    , optInstf     :: FilePath
+    , optMaxRounds :: Int
+    } deriving Show
+
+-- | Default values for the command line options.
+defaultOptions :: Options
+defaultOptions  = Options
+ { optShowNodes = False
+ , optShowCmds  = False
+ , optNodef     = "nodes"
+ , optInstf     = "instances"
+ , optMaxRounds = -1
+ }
+
+{- | Start computing the solution at the given depth and recurse until
+we find a valid solution or we exceed the maximum depth.
+
+-}
+iterateDepth :: Cluster.Table
+             -> Int                 -- ^ Current round
+             -> Int                 -- ^ Max rounds
+             -> IO Cluster.Table
+iterateDepth ini_tbl cur_round max_rounds =
+    let Cluster.Table _ ini_il ini_cv ini_plc = ini_tbl
+        all_inst = Container.elems ini_il
+        fin_tbl = Cluster.checkMove ini_tbl all_inst
+        (Cluster.Table _ _ fin_cv fin_plc) = fin_tbl
+        ini_plc_len = length ini_plc
+        fin_plc_len = length fin_plc
+        allowed_next = (max_rounds < 0 || cur_round < max_rounds)
+    in
+      do
+        printf "  - round %d: " cur_round
+        hFlush stdout
+        let msg =
+                if fin_cv < ini_cv then
+                    if not allowed_next then
+                        printf "%.8f, %d moves (stopping due to round limit)\n"
+                               fin_cv
+                               (fin_plc_len - ini_plc_len)
+                    else
+                        printf "%.8f, %d moves\n" fin_cv
+                                   (fin_plc_len - ini_plc_len)
+                else
+                    "no improvement, stopping\n"
+        putStr msg
+        hFlush stdout
+        (if fin_cv < ini_cv then -- this round made success, try deeper
+             if allowed_next
+             then iterateDepth fin_tbl (cur_round + 1) max_rounds
+             -- don't go deeper, but return the better solution
+             else return fin_tbl
+         else
+             return ini_tbl)
+
+-- | Options list and functions
+options :: [OptDescr (Options -> Options)]
+options =
+    [ Option ['p']     ["print-nodes"]
+      (NoArg (\ opts -> opts { optShowNodes = True }))
+      "print the final node list"
+    , Option ['C']     ["print-commands"]
+      (NoArg (\ opts -> opts { optShowCmds = True }))
+      "print the ganeti command list for reaching the solution"
+     , Option ['n']     ["nodes"]
+      (ReqArg (\ f opts -> opts { optNodef = f }) "FILE")
+      "the node list FILE"
+     , Option ['i']     ["instances"]
+      (ReqArg (\ f opts -> opts { optInstf =  f }) "FILE")
+      "the instance list FILE"
+     , Option ['r']     ["max-rounds"]
+      (ReqArg (\ i opts -> opts { optMaxRounds =  (read i)::Int }) "N")
+      "do not run for more than R rounds(useful for very unbalanced clusters)"
+     ]
+
+-- | Command line parser, using the 'options' structure.
+parseOpts :: [String] -> IO (Options, [String])
+parseOpts argv =
+    case getOpt Permute options argv of
+      (o,n,[]  ) ->
+          return (foldl (flip id) defaultOptions o, n)
+      (_,_,errs) ->
+          ioError (userError (concat errs ++ usageInfo header options))
+      where header = "Usage: hbal [OPTION...]"
+
+-- | Main function.
+main :: IO ()
+main = do
+  cmd_args <- System.getArgs
+  (opts, _) <- parseOpts cmd_args
+  (nl, il, ktn, kti) <- liftM2 Cluster.loadData
+                        (readFile $ optNodef opts)
+                        (readFile $ optInstf opts)
+  printf "Loaded %d nodes, %d instances\n"
+             (Container.size nl)
+             (Container.size il)
+  let (bad_nodes, bad_instances) = Cluster.computeBadItems nl il
+  printf "Initial check done: %d bad nodes, %d bad instances.\n"
+             (length bad_nodes) (length bad_instances)
+
+  when (length bad_nodes > 0) $ do
+         putStrLn "Cluster is not N+1 happy, please fix N+1 first. Exiting."
+         exitWith $ ExitFailure 1
+
+  when (optShowNodes opts) $
+       do
+         putStrLn "Initial cluster status:"
+         putStrLn $ Cluster.printNodes ktn nl
+
+  let ini_cv = Cluster.compCV nl
+      ini_tbl = Cluster.Table nl il ini_cv []
+  printf "Initial coefficients: overall %.8f, %s\n"
+       ini_cv (Cluster.printStats nl)
+
+  putStrLn "Trying to minimize the CV..."
+  fin_tbl <- iterateDepth ini_tbl 1 (optMaxRounds opts)
+  let (Cluster.Table fin_nl _ fin_cv fin_plc) = fin_tbl
+      ord_plc = reverse fin_plc
+  printf "Final coefficients:   overall %.8f, %s\n"
+         fin_cv
+         (Cluster.printStats fin_nl)
+
+  printf "Solution length=%d\n" (length ord_plc)
+
+  let (sol_strs, cmd_strs) = Cluster.printSolution il ktn kti ord_plc
+  putStr $ unlines $ sol_strs
+  when (optShowCmds opts) $
+       do
+         putStrLn ""
+         putStrLn "Commands to run to reach the above solution:"
+         putStr $ unlines $ map ("  echo gnt-instance " ++) $ concat cmd_strs
+  when (optShowNodes opts) $
+       do
+         let (orig_mem, orig_disk) = Cluster.totalResources nl
+             (final_mem, final_disk) = Cluster.totalResources fin_nl
+         putStrLn ""
+         putStrLn "Final cluster status:"
+         putStrLn $ Cluster.printNodes ktn fin_nl
+         printf "Original: mem=%d disk=%d\n" orig_mem orig_disk
+         printf "Final:    mem=%d disk=%d\n" final_mem final_disk
diff --git a/src/hn1.hs b/src/hn1.hs

new file mode 100644 (file)

index 0000000..73d7549
--- /dev/null
+++ b/src/hn1.hs
@@ -0,0 +1,169 @@
+{-| Solver for N+1 cluster errors
+
+-}
+
+module Main (main) where
+
+import Data.List
+import Data.Function
+import Monad
+import System
+import System.IO
+import System.Console.GetOpt
+import qualified System
+
+import Text.Printf (printf)
+
+import qualified Container
+import qualified Instance
+import qualified Cluster
+
+-- | Command line options structure.
+data Options = Options
+    { optShowNodes   :: Bool
+    , optShowCmds    :: Bool
+    , optNodef       :: FilePath
+    , optInstf       :: FilePath
+    , optMinDepth    :: Int
+    , optMaxRemovals :: Int
+    , optMinDelta    :: Int
+    , optMaxDelta    :: Int
+    } deriving Show
+
+-- | Default values for the command line options.
+defaultOptions :: Options
+defaultOptions    = Options
+ { optShowNodes   = False
+ , optShowCmds    = False
+ , optNodef       = "nodes"
+ , optInstf       = "instances"
+ , optMinDepth    = 1
+ , optMaxRemovals = -1
+ , optMinDelta    = 0
+ , optMaxDelta    = -1
+ }
+
+{- | Start computing the solution at the given depth and recurse until
+we find a valid solution or we exceed the maximum depth.
+
+-}
+iterateDepth :: Cluster.NodeList
+             -> [Instance.Instance]
+             -> Int
+             -> Int
+             -> Int
+             -> Int
+             -> IO (Maybe Cluster.Solution)
+iterateDepth nl bad_instances depth max_removals min_delta max_delta =
+    let
+        max_depth = length bad_instances
+        sol = Cluster.computeSolution nl bad_instances depth
+              max_removals min_delta max_delta
+    in
+      do
+        printf "%d " depth
+        hFlush stdout
+        case sol `seq` sol of
+          Nothing ->
+              if depth > max_depth then
+                  return Nothing
+              else
+                  iterateDepth nl bad_instances (depth + 1)
+                               max_removals min_delta max_delta
+          _ -> return sol
+
+-- | Options list and functions
+options :: [OptDescr (Options -> Options)]
+options =
+    [ Option ['p']     ["print-nodes"]
+      (NoArg (\ opts -> opts { optShowNodes = True }))
+      "print the final node list"
+    , Option ['C']     ["print-commands"]
+      (NoArg (\ opts -> opts { optShowCmds = True }))
+      "print the ganeti command list for reaching the solution"
+     , Option ['n']     ["nodes"]
+      (ReqArg (\ f opts -> opts { optNodef = f }) "FILE")
+      "the node list FILE"
+     , Option ['i']     ["instances"]
+      (ReqArg (\ f opts -> opts { optInstf =  f }) "FILE")
+      "the instance list FILE"
+     , Option ['d']     ["depth"]
+      (ReqArg (\ i opts -> opts { optMinDepth =  (read i)::Int }) "D")
+      "start computing the solution at depth D"
+     , Option ['r']     ["max-removals"]
+      (ReqArg (\ i opts -> opts { optMaxRemovals =  (read i)::Int }) "R")
+      "do not process more than R removal sets (useful for high depths)"
+     , Option ['L']     ["max-delta"]
+      (ReqArg (\ i opts -> opts { optMaxDelta =  (read i)::Int }) "L")
+      "refuse solutions with delta higher than L"
+     , Option ['l']     ["min-delta"]
+      (ReqArg (\ i opts -> opts { optMinDelta =  (read i)::Int }) "L")
+      "return once a solution with delta L or lower has been found"
+     ]
+
+-- | Command line parser, using the 'options' structure.
+parseOpts :: [String] -> IO (Options, [String])
+parseOpts argv =
+    case getOpt Permute options argv of
+      (o,n,[]  ) ->
+          return (foldl (flip id) defaultOptions o, n)
+      (_,_,errs) ->
+          ioError (userError (concat errs ++ usageInfo header options))
+      where header = "Usage: hn1 [OPTION...]"
+
+-- | Main function.
+main :: IO ()
+main = do
+  cmd_args <- System.getArgs
+  (opts, _) <- parseOpts cmd_args
+  let min_depth = optMinDepth opts
+  (nl, il, ktn, kti) <- liftM2 Cluster.loadData
+                        (readFile $ optNodef opts)
+                        (readFile $ optInstf opts)
+  printf "Loaded %d nodes, %d instances\n"
+             (Container.size nl)
+             (Container.size il)
+  let (bad_nodes, bad_instances) = Cluster.computeBadItems nl il
+  printf "Initial check done: %d bad nodes, %d bad instances.\n"
+             (length bad_nodes) (length bad_instances)
+
+  when (null bad_instances) $ do
+         putStrLn "Happy time! Cluster is fine, no need to burn CPU."
+         exitWith ExitSuccess
+
+  when (length bad_instances < min_depth) $ do
+         printf "Error: depth %d is higher than the number of bad instances.\n"
+                min_depth
+         exitWith $ ExitFailure 2
+
+  putStr "Computing solution: depth "
+  hFlush stdout
+
+  result <- iterateDepth nl bad_instances min_depth (optMaxRemovals opts)
+            (optMinDelta opts) (optMaxDelta opts)
+  let (min_d, solution) =
+          case result of
+            Just (Cluster.Solution a b) -> (a, b)
+            Nothing -> (-1, [])
+  when (min_d == -1) $ do
+         putStrLn "failed. Try to run with higher depth."
+         exitWith $ ExitFailure 1
+
+  printf "found.\nSolution (delta=%d):\n" $! min_d
+  let (sol_strs, cmd_strs) = Cluster.printSolution il ktn kti solution
+  putStr $ unlines $ sol_strs
+  when (optShowCmds opts) $
+       do
+         putStrLn ""
+         putStrLn "Commands to run to reach the above solution:"
+         putStr $ unlines $ map ("  echo gnt-instance " ++) $ concat cmd_strs
+  when (optShowNodes opts) $
+       do
+         let (orig_mem, orig_disk) = Cluster.totalResources nl
+             ns = Cluster.applySolution nl il solution
+             (final_mem, final_disk) = Cluster.totalResources ns
+         putStrLn ""
+         putStrLn "Final cluster status:"
+         putStrLn $ Cluster.printNodes ktn ns
+         printf "Original: mem=%d disk=%d\n" orig_mem orig_disk
+         printf "Final:    mem=%d disk=%d\n" final_mem final_disk
author	Iustin Pop <iustin@google.com>
	Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)
committer	Iustin Pop <iustin@google.com>
	Wed, 28 Jan 2009 11:09:09 +0000 (12:09 +0100)
COPYING	[new file with mode: 0644]	patch \| blob
Makefile	[new file with mode: 0644]	patch \| blob
README	[new file with mode: 0644]	patch \| blob
haddock-prologue	[new file with mode: 0644]	patch \| blob
hscolour.css	[new file with mode: 0644]	patch \| blob
src/Cluster.hs	[new file with mode: 0644]	patch \| blob
src/Container.hs	[new file with mode: 0644]	patch \| blob
src/Instance.hs	[new file with mode: 0644]	patch \| blob
src/Makefile	[new file with mode: 0644]	patch \| blob
src/Node.hs	[new file with mode: 0644]	patch \| blob
src/PeerMap.hs	[new file with mode: 0644]	patch \| blob
src/Utils.hs	[new file with mode: 0644]	patch \| blob
src/hbal.hs	[new file with mode: 0644]	patch \| blob
src/hn1.hs	[new file with mode: 0644]	patch \| blob