Iustin Pop [Mon, 18 May 2009 06:46:12 +0000 (08:46 +0200)]
Add a copy of Rapi.hs as IAlloc.hs
This will be used in two ways:
- format the response to Ganeti (easy, implemented)
- parse the input data and build the node/instance lists (hard :)
Iustin Pop [Mon, 18 May 2009 06:18:21 +0000 (08:18 +0200)]
Remove the apidoc dir on clean
Iustin Pop [Sun, 26 Apr 2009 14:15:33 +0000 (16:15 +0200)]
hbal: add a --quiet option
This option is the opposite of the --verbose option, and it allows
decreasing the verbosity level from the default of one to zero (which
currently doesn't show the warning messages for missing disk/memory).
Iustin Pop [Sat, 25 Apr 2009 15:38:22 +0000 (17:38 +0200)]
hbal: Simplify the oneline formatting
This patch moves the oneline format into a separate function for easier
usage.
Iustin Pop [Sat, 25 Apr 2009 15:32:46 +0000 (17:32 +0200)]
hbal: Early exit when we don't have any instances
For clusters with no instances, there is no point in computing either
the score or in running the algorithm. In this case, we exit prematurely
and when running in one-line mode we show dummy information.
Iustin Pop [Sat, 25 Apr 2009 15:27:27 +0000 (17:27 +0200)]
hbal: Add a new min-score option
This new parameter causes the algorithm to finish (or even not start at
all) if we reach/have a score better than it.
Iustin Pop [Sat, 25 Apr 2009 14:31:25 +0000 (16:31 +0200)]
hbal: Change hardcoded tests to monadic composition
In some cases we manually do “if isNothing … then Nothing else …”, which
can be very easily replaced with a monadic construct in the Maybe monad.
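A minimal sketch of the kind of rewrite meant here (not the actual hbal
code; the table type and both function names are made up for
illustration):

```haskell
import Data.Maybe (fromJust, isNothing)

-- Hypothetical index-to-name table, for illustration only.
type NameList = [(Int, String)]

-- Manual style: each lookup is tested for Nothing by hand.
pairManual :: NameList -> Int -> Int -> Maybe (String, String)
pairManual tbl p s =
  let mp = lookup p tbl
      ms = lookup s tbl
  in if isNothing mp
       then Nothing
       else if isNothing ms
              then Nothing
              else Just (fromJust mp, fromJust ms)

-- Monadic style: the Maybe monad short-circuits on the first Nothing.
pairMonadic :: NameList -> Int -> Int -> Maybe (String, String)
pairMonadic tbl p s = do
  pn <- lookup p tbl
  sn <- lookup s tbl
  return (pn, sn)
```

Both functions return the same results; the second one just leaves the
Nothing handling to the monad.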
Iustin Pop [Tue, 21 Apr 2009 07:05:52 +0000 (09:05 +0200)]
Fix the makefile dist rule
It was missing a dependency on the Version.hs file, so right after “make
clean”, a “make dist” used to fail.
Iustin Pop [Tue, 21 Apr 2009 06:50:04 +0000 (08:50 +0200)]
Update NEWS file for the 0.0.8 release
Iustin Pop [Mon, 20 Apr 2009 12:12:13 +0000 (14:12 +0200)]
Increase allowed missing memory to 512MB
Since Xen seems to “steal” some amounts of memory (depending on total
node memory), we increase the maximum allowed missing memory to 512MB,
based on gathered data from multiple machines.
Iustin Pop [Mon, 20 Apr 2009 11:13:17 +0000 (13:13 +0200)]
Update man pages with the env variables
This patch documents the environment variables in the man pages of hbal
and hn1.
Iustin Pop [Mon, 20 Apr 2009 11:07:09 +0000 (13:07 +0200)]
Add reading the file names from env vars
This patch adds support for selecting the instance/node file names via
two environment variables (HTOOLS_NODES, HTOOLS_INSTANCES).
Unfortunately we still have lots of duplicated code, since the options
are not unified.
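A rough sketch of how such a lookup could work, assuming the variables
simply override a built-in default. The helper names and the default
file names “nodes” and “instances” are assumptions, not taken from the
patch:

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (getEnvironment)

-- Pick the value of a variable from an environment list, or a default.
envOr :: [(String, String)] -> String -> FilePath -> FilePath
envOr env var def = fromMaybe def (lookup var env)

-- Resolve both input file names from the process environment.
-- The defaults used here are assumed, for illustration only.
inputFiles :: IO (FilePath, FilePath)
inputFiles = do
  env <- getEnvironment
  return ( envOr env "HTOOLS_NODES" "nodes"
         , envOr env "HTOOLS_INSTANCES" "instances" )
```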
Iustin Pop [Mon, 20 Apr 2009 11:00:10 +0000 (13:00 +0200)]
Implement error checks for extra arguments
Neither hbal nor hn1 takes any arguments besides the options, so if any
are passed it is most likely an unintended error. This patch aborts in
such cases.
Iustin Pop [Mon, 20 Apr 2009 10:00:51 +0000 (12:00 +0200)]
Fix the makefile clean rule
Use the $HPROGS variable instead of hardcoding the program names.
Iustin Pop [Mon, 20 Apr 2009 10:00:29 +0000 (12:00 +0200)]
Add a gitignore file
Iustin Pop [Sat, 18 Apr 2009 22:17:05 +0000 (00:17 +0200)]
Implement writing the command list to a script
This patch adds support in hbal for writing the command list to a shell
script, with error checking and allowing for early exit.
Iustin Pop [Thu, 16 Apr 2009 09:02:48 +0000 (11:02 +0200)]
hbal: Abort for invalid offline node names
Since it's easy to pass a wrong node name as offline, we should abort
instead of silently ignoring it.
Iustin Pop [Mon, 23 Mar 2009 07:26:08 +0000 (08:26 +0100)]
Update NEWS in preparation for the 0.0.7 release
Iustin Pop [Mon, 23 Mar 2009 07:12:40 +0000 (08:12 +0100)]
More documentation updates
This removes most of the content of the README file (obsoleted by new
algorithm and man pages), modifies the Makefile to include the built
documentation in the source archive (so that haddock/hscolour are not
needed) and updates the haddock-prologue with current information.
Iustin Pop [Mon, 23 Mar 2009 06:40:49 +0000 (07:40 +0100)]
More man page updates
This moves some data from README to the man pages and has other general
improvements.
Iustin Pop [Sun, 22 Mar 2009 22:32:50 +0000 (23:32 +0100)]
Add checks for missing disk space
This small patch adds disk space checks to the Cluster.checkData
function, and simplifies the warning messages a little.
Iustin Pop [Sun, 22 Mar 2009 22:30:42 +0000 (23:30 +0100)]
Include DRBD overhead in sda/sdb size
For Ganeti 1.2 which doesn't have the ‘disk_usage’ instance query field,
we need to manually include the DRBD overhead (per disk). This patch
modifies the RAPI collection to do this, but loading from disk does not
as it's unknown if the query came from hscan or RAPI 1.2 or RAPI 2.0...
Iustin Pop [Sun, 22 Mar 2009 22:12:16 +0000 (23:12 +0100)]
Documentation updates
This patch adds a man page for hscan and updates the README and other
man pages with the latest changes.
Iustin Pop [Sun, 22 Mar 2009 21:33:55 +0000 (22:33 +0100)]
Update all needed node fields on f_mem change
This fixes the setFmem function which didn't compute other related
fields after free memory change. Ideally, this should be abstracted so
that add/remove Pri and similar functions could reuse it instead of
duplicating code.
Iustin Pop [Sun, 22 Mar 2009 10:40:58 +0000 (11:40 +0100)]
Fix interaction between down instances and nodes
If an instance is down, its memory is not reflected in the node used
memory, and thus the node free memory is higher than the actual value.
This patch deducts the memory for such instances from the node free
memory, allowing a correct calculation for such cases.
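As an illustration of the deduction (a simplified sketch; the record and
its field names are hypothetical, not the real Instance type):

```haskell
-- Hypothetical, simplified instance record.
data Instance = Instance
  { iMem    :: Int   -- memory assigned to the instance, in MB
  , running :: Bool  -- whether the instance is currently up
  }

-- The node reports free memory without counting down instances, so we
-- subtract their memory to get the value that holds once they start.
adjustedFreeMem :: Int -> [Instance] -> Int
adjustedFreeMem reportedFree insts =
  reportedFree - sum [iMem i | i <- insts, not (running i)]
```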
Iustin Pop [Sun, 22 Mar 2009 10:24:23 +0000 (11:24 +0100)]
Add a new instance field denoting run status
This patch modifies Rapi, the Cluster.loadData and hscan serialization to load
and save the instance run status. At instance level, we add both a boolean
field denoting the true/false run status, and a string field which holds the
original value (since we don't have a 1-to-1 mapping) for use in hscan
serialization.
The run status is not yet used.
Iustin Pop [Sun, 22 Mar 2009 10:03:52 +0000 (11:03 +0100)]
Show the x_mem/i_mem in node list
This patch adds checking of cluster data in the binaries and display of
node's x_mem/i_mem in the node list.
Iustin Pop [Sun, 22 Mar 2009 10:02:59 +0000 (11:02 +0100)]
Add functions to check and fix cluster data
This patch adds a checkData function which goes over the node list and computes
the unaccounted memory, returning a list of warning messages (if any) and the
updated nodes.
Iustin Pop [Sun, 22 Mar 2009 09:55:12 +0000 (10:55 +0100)]
Add a new node field x_mem
Nodes can have some memory unaccounted for, due to (e.g.) hypervisor
overhead, rounding errors in reporting, etc.
It is better if we model this memory explicitly instead of hiding it,
and since the n_mem addition it is actually required to do so.
The new attribute is not yet used.
Iustin Pop [Sun, 22 Mar 2009 09:52:50 +0000 (10:52 +0100)]
Split common CLI functionality into a module
This patch moves the common CLI functionality (as much as currently
possible) into a separate module. This means we only have one parseOpts
and that Utils.hs no longer holds this kind of function.
Iustin Pop [Sun, 22 Mar 2009 00:18:22 +0000 (01:18 +0100)]
Remove unused and obsolete function
The Node.str function is very old and is not useful since the node
objects have many more fields today. This patch removes it, and if
needed a full node display can be done via ‘show’.
Iustin Pop [Sat, 21 Mar 2009 23:25:55 +0000 (00:25 +0100)]
Add node memory field to Node objects
This patch adds a new n_mem field to the node objects, and implements
read/save/show support for it. The field is not currently used (except
in the node list) but will be used for checking data consistency and
instance up/down status.
Iustin Pop [Sat, 21 Mar 2009 23:12:29 +0000 (00:12 +0100)]
Pass actual types to node/instance constructors
This patch changes the parameters passed to the node and instance
constructors from generic Strings (which are then parsed via “read”) to
the actual used types, by converting them earlier in Cluster.loadData.
Iustin Pop [Sat, 21 Mar 2009 23:06:51 +0000 (00:06 +0100)]
Small change in hscan
This fixes a mistake between Int/Integer. Should be more careful :)
Iustin Pop [Sat, 21 Mar 2009 22:51:54 +0000 (23:51 +0100)]
Add hscan to Makefile
Iustin Pop [Sat, 21 Mar 2009 22:50:47 +0000 (23:50 +0100)]
Add the hscan tool
This patch adds an hscan tool that loads data from clusters via RAPI and
writes it to files that can be later used offline.
Iustin Pop [Sat, 21 Mar 2009 22:48:49 +0000 (23:48 +0100)]
Some small changes in preparation for hscan
This patch does some small changes:
- fixes a comment
- exports more node functions (unneeded now, but hscan will use them)
- fixes Makefile rule for building the programs
Iustin Pop [Sat, 21 Mar 2009 21:00:22 +0000 (22:00 +0100)]
Add a separate type for the [(Int, String)] list
This is added for better readability, since this is very often used in
declarations.
Iustin Pop [Sat, 21 Mar 2009 14:48:12 +0000 (15:48 +0100)]
Handle offline nodes correctly in cluster scoring
This patch changes two things with regard to offline nodes:
- first, it only calculates the various coefficients across online
  nodes
- second, it adds a new score denoting the percentage of instances
  which live on such nodes
The first change allows correct score computation in presence of offline
nodes (whose properties we don't need to take into account), while the
second change actively evacuates offline nodes.
Iustin Pop [Sat, 21 Mar 2009 11:20:15 +0000 (12:20 +0100)]
Show offline nodes in the node status list
This patch adds a new ‘-’ flag for the node status which denotes offline
nodes.
Iustin Pop [Fri, 20 Mar 2009 23:26:00 +0000 (00:26 +0100)]
Restrict move list based on offline node status
This patch changes the Cluster.checkInstanceMove function to restrict
the target move list based on which nodes are online.
Iustin Pop [Fri, 20 Mar 2009 22:48:07 +0000 (23:48 +0100)]
Add command line support for offlining nodes
This patch modifies hbal (only, hn1 not yet) for setting nodes offline.
Iustin Pop [Fri, 20 Mar 2009 22:45:15 +0000 (23:45 +0100)]
Add a new 'offline' Node attribute
This patch adds a new node attribute - offline - which will serve to
skip nodes from the target candidate list.
Iustin Pop [Fri, 20 Mar 2009 22:43:59 +0000 (23:43 +0100)]
More fixes to the Makefile
Iustin Pop [Fri, 20 Mar 2009 21:46:09 +0000 (22:46 +0100)]
Small doc update in Node.hs
Iustin Pop [Fri, 20 Mar 2009 18:58:35 +0000 (19:58 +0100)]
Some updates to the apidoc rules
Iustin Pop [Fri, 20 Mar 2009 18:26:10 +0000 (19:26 +0100)]
Fix/enhance makefile rules after the rename
Iustin Pop [Fri, 20 Mar 2009 17:17:27 +0000 (18:17 +0100)]
Add a .gitattributes file
This will enhance the ‘dist’ rule by skipping unneeded files.
Iustin Pop [Fri, 20 Mar 2009 17:16:21 +0000 (18:16 +0100)]
Introduce a namespace for the modules
The modules are moved from the ‘top’ namespace to ‘Ganeti.HTools’, in
compliance with standard practices.
Iustin Pop [Mon, 16 Mar 2009 07:35:48 +0000 (08:35 +0100)]
Update NEWS for version 0.0.6
Iustin Pop [Sat, 14 Mar 2009 19:25:41 +0000 (20:25 +0100)]
Abstract the version format into a function
This patch moves the version string creation into a function in Utils
which shows some more information.
Iustin Pop [Sat, 14 Mar 2009 19:12:02 +0000 (20:12 +0100)]
Add a man page for hn1 and update the hbal one
A new man page and typos fixed in hbal.1.
Iustin Pop [Sat, 14 Mar 2009 11:53:43 +0000 (12:53 +0100)]
Add a manpage for hbal
Iustin Pop [Sat, 14 Mar 2009 11:49:43 +0000 (12:49 +0100)]
Add a --version option
This patch adds a -V, --version command line option that shows the
program version and also updates the hn1 usage string (similar to hbal).
Iustin Pop [Sat, 14 Mar 2009 11:41:13 +0000 (12:41 +0100)]
Move a function around in hbal.hs
This just reorders some functions for a more logical ordering.
Iustin Pop [Sat, 14 Mar 2009 11:35:04 +0000 (12:35 +0100)]
Show the step counter in the solution list
This patch changes the solution list to include a step counter so that
it's more clear these are successive steps (in a definite order), and
not just an unordered list of changes.
Iustin Pop [Sat, 14 Mar 2009 11:16:52 +0000 (12:16 +0100)]
Use gnt-instance migrate instead of failover
This patch changes the gnt-instance failover to migrate, and fixes a bug
in the formatting of commands.
Iustin Pop [Sat, 14 Mar 2009 11:14:17 +0000 (12:14 +0100)]
hbal: added a verbose setting and changed output
This patch added a verbose output and changed the output so that by
default it is less verbose and more clear.
Iustin Pop [Sat, 14 Mar 2009 09:11:36 +0000 (10:11 +0100)]
Add a new move FailoverAndReplace
This patch adds a new instance move, FailoverAndReplace, which promotes
the old secondary to primary and then uses a new secondary node.
This is the last move that we can do within the limitations of one node
changed per move.
Iustin Pop [Fri, 13 Mar 2009 19:42:56 +0000 (20:42 +0100)]
Some more docstring updates
Iustin Pop [Fri, 13 Mar 2009 19:11:12 +0000 (20:11 +0100)]
Enhance the command list for the solution
This patch moves the formatting of the command list to Cluster.hs and
enhances it with separator messages between the steps.
Iustin Pop [Fri, 13 Mar 2009 18:53:41 +0000 (19:53 +0100)]
Add a new ReplaceAndFailover move
This patch adds a new replace secondary and failover move (equivalent to
“r:x f”), which can improve the solution (since we are testing more
options at each step).
Iustin Pop [Fri, 13 Mar 2009 18:52:44 +0000 (19:52 +0100)]
Some whitespace changes
Aligned the comments in Instance.hs
Iustin Pop [Fri, 13 Mar 2009 07:49:12 +0000 (08:49 +0100)]
Convert hbal from multiple rounds to a step-method
Currently hbal does multiple rounds, stopping when a round doesn't
bring improvements. With the recent changes to not remove instances from
the candidate list, this is obsolete as the first round will always run
to the end of the improvements.
This patch changes this so that the Cluster.checkMove function doesn't
recurse, but just computes the next best move (as its docstring says).
This means we can actually incrementally compute and print the solution,
and this is needed as otherwise an instance could move twice and the
second time it needs the current placement to compute the exact command
line and operation needed for the move.
Iustin Pop [Fri, 13 Mar 2009 07:09:35 +0000 (08:09 +0100)]
Rework the solution printing in Cluster.hs
This abstracts the individual placement solution so that it can be used
independently.
Iustin Pop [Thu, 12 Mar 2009 20:21:14 +0000 (21:21 +0100)]
Remove the restriction of one-move-per-round
The current code restricts each instance to one move per round. This is
bad, as a computation restarted in the middle of the solution will have
a different set of instances to work with and will thus lead to a different
end-solution.
Once this is applied, further rounds are not possible since the first
round will have tried all instances at its end. As such, the removal of
the rounds feature will be next.
The code adds a hard-coded 100 moves limit, which for big clusters is
actually small.
Iustin Pop [Thu, 12 Mar 2009 20:16:41 +0000 (21:16 +0100)]
Add a header to node lists and print more data
This prints the total memory/disk and also adds a header.
Iustin Pop [Thu, 12 Mar 2009 20:07:47 +0000 (21:07 +0100)]
Rename the maxRes to r_mem
This is to keep in style with the other memory variables.
Iustin Pop [Thu, 12 Mar 2009 19:54:41 +0000 (20:54 +0100)]
Display the reserved memory too in node lists
This is useful and not easy to compute otherwise.
Iustin Pop [Thu, 12 Mar 2009 19:31:57 +0000 (20:31 +0100)]
First try to embed VCS id in binaries
This patch attempts to embed the VCS id in binaries, based on the way
other projects seem to do this.
Iustin Pop [Wed, 11 Mar 2009 08:22:45 +0000 (09:22 +0100)]
Fix the Makefile clean rule
This removes obsolete entries from the clean rule and adds the hbal
binary.
Iustin Pop [Wed, 11 Mar 2009 08:12:11 +0000 (09:12 +0100)]
Change the N1 score to percent of N1 failures
Since, with very many N+1 failures in a cluster, we could actually
degrade the N1 CV by making a node N+1 compliant, we need to make sure
this value only decreases when fixing non-compliant nodes.
The easiest way is to compute the N+1 score as a percentage of failed
nodes, with the caveat that the domain of values might not be fully
compatible with the other scores. It is still [0, 1] but does not vary
like the others.
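A small sketch of such a score, assuming a simplified node record with a
boolean N+1 failure flag (the record and function names are illustrative
only):

```haskell
-- Hypothetical, simplified node record.
data Node = Node
  { nodeName :: String
  , failN1   :: Bool   -- True if the node is not N+1 compliant
  }

-- Fraction of nodes failing N+1, in [0, 1]. Fixing a non-compliant
-- node can only lower this value, unlike a variance-based measure.
n1Score :: [Node] -> Double
n1Score []    = 0
n1Score nodes =
  fromIntegral (length (filter failN1 nodes)) / fromIntegral (length nodes)
```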
Iustin Pop [Wed, 11 Mar 2009 08:08:31 +0000 (09:08 +0100)]
Add two new variables in the cluster score
This patch adds two new variables to the cluster score:
- variance of the failN1 attribute
- variance of the reserved memory percentage
The variance of the failN1 helps make the cluster N+1 happy, whereas the
reserved memory percentage helps balance the unused memory for
redundancy on the nodes.
Iustin Pop [Wed, 11 Mar 2009 08:07:06 +0000 (09:07 +0100)]
Add the node reserved memory percentage
This patch adds the node attribute “reserved memory percentage” that is
derived from the maximum reserved memory for a node and its total
memory.
This will be useful for enhancing the balancing algorithm.
Iustin Pop [Wed, 11 Mar 2009 07:14:56 +0000 (08:14 +0100)]
Record the running cluster CV in placements
This patch adds a score variable to the placement type, so we can record
the changes in the cluster CV for later display.
This gives visibility into the decrease of the parameters and can show
which are the most important steps to perform (out of the full move
list).
Iustin Pop [Wed, 11 Mar 2009 07:13:18 +0000 (08:13 +0100)]
Also print cluster coefficients in hn1
This patch adds printing the initial and final cluster coefficients in
hn1 too, to better understand the found solution.
Iustin Pop [Tue, 10 Mar 2009 20:20:06 +0000 (21:20 +0100)]
Beautify the cluster status list
This patch removes the primary/secondary instance lists from the node
status and also replaces the tabbed formatting with explicit width
formatting.
Iustin Pop [Tue, 10 Mar 2009 19:35:01 +0000 (20:35 +0100)]
Beautify solution list
This patch makes the tabular solution list nicer, by changing from tabs
to explicit widths.
Iustin Pop [Tue, 10 Mar 2009 18:59:50 +0000 (19:59 +0100)]
Limit string literals to 80-char columns
Learned how multi-line string literals work in Haskell :)
Iustin Pop [Mon, 9 Mar 2009 20:47:48 +0000 (21:47 +0100)]
Add a news file and make the 0.0.5 release
Iustin Pop [Mon, 9 Mar 2009 20:37:39 +0000 (21:37 +0100)]
Beautify: strip common suffix from names
This patch automatically removes the longest common (domain, i.e.
starting with a dot) suffix from the node and instance names. This gives
a much clearer display, and this format is compatible with the way
Ganeti accepts shortened names.
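One possible way to compute this, as a sketch under the assumption that
the suffix must be shared by every name (the function names are
hypothetical and this is not the code from the patch):

```haskell
import Data.List (isSuffixOf, tails)

-- All suffixes of a name that start with a dot, longest first.
domainSuffixes :: String -> [String]
domainSuffixes name = [t | t@('.':_) <- tails name]

-- Longest dot-started suffix common to all names, or "" if none.
longestDomain :: [String] -> String
longestDomain [] = ""
longestDomain names@(n:_) =
  case [s | s <- domainSuffixes n, all (s `isSuffixOf`) names] of
    (s:_) -> s
    []    -> ""

-- Drop a known suffix from the end of a name.
dropDomain :: String -> String -> String
dropDomain suffix name = take (length name - length suffix) name

-- Shorten every name by the common domain suffix.
shortenNames :: [String] -> [String]
shortenNames names = map (dropDomain (longestDomain names)) names
```

Only suffixes of the first name need to be tried, since a common suffix
must be a suffix of every name, including that one.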
Iustin Pop [Mon, 9 Mar 2009 19:50:08 +0000 (20:50 +0100)]
hbal: allow, but warn on, N+1 failed clusters
Based on the node changes, we remove the N+1 check and only show a
warning instead.
Iustin Pop [Mon, 9 Mar 2009 19:46:24 +0000 (20:46 +0100)]
Change the node N+1 check model
Currently, we fail a new instance placement if the new node status is
not N+1 compliant. This means that an allocation on an already N+1
failed node still fails, even though (conceptually) we're not worse than
before.
This patch changes this model to fail the allocation *only* if the node
was N+1 compliant before. This allows balancing to work on non-N+1 happy
clusters, with the caveat that they probably won't be N+1 happy at the
end.
Since we skip N+1 check in some cases, we add a new “failHealth” check
that verifies the node still has strictly positive free memory and disk
space.
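The new rule can be sketched roughly as follows (a simplified
illustration; the record, its fields and the function name are
assumptions, not the actual Node code):

```haskell
-- Hypothetical, simplified node record.
data Node = Node
  { fMem   :: Int   -- free memory
  , fDsk   :: Int   -- free disk
  , failN1 :: Bool  -- True if the node is not N+1 compliant
  }

-- Accept a placement unless it turns a compliant node into a failing
-- one, or leaves the node without free memory or disk.
allowPlacement :: Node -> Node -> Bool
allowPlacement old new = not brokeN1 && healthy
  where
    brokeN1 = not (failN1 old) && failN1 new
    healthy = fMem new > 0 && fDsk new > 0
```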
Iustin Pop [Mon, 9 Mar 2009 19:33:13 +0000 (20:33 +0100)]
Show which nodes are not N+1 compliant in output
This patch adds a '*' character to nodes which are not N+1 compliant to
the output, to help with understanding pre- and post-changes cluster
status.
Iustin Pop [Sun, 22 Feb 2009 12:20:00 +0000 (13:20 +0100)]
Don't build documentation for the Main modules
This fixes the doc issue which exists since the addition of hbal. Now
make doc makes sense again.
Iustin Pop [Sun, 22 Feb 2009 12:18:52 +0000 (13:18 +0100)]
Change the total disk/mem to Double
Since we only use the totals for computations, and we always convert
them via fromIntegral, let's just store them directly as Doubles.
Iustin Pop [Sun, 22 Feb 2009 12:15:12 +0000 (13:15 +0100)]
A no-code change s/disk/dsk/
This just makes indentation nicer in many expressions.
Iustin Pop [Sun, 22 Feb 2009 12:05:08 +0000 (13:05 +0100)]
Compute the p_mem / p_dsk statically
This patch changes the computation of p_mem / p_dsk from on-demand
(whenever the cluster stats are computed) to after-modify (after a node
is modified, we update its stats). This brings a good speed-up as only
one or two nodes are usually changed between cluster-wide stats
computations.
Iustin Pop [Sun, 15 Feb 2009 13:53:38 +0000 (14:53 +0100)]
Documentation updates
Iustin Pop [Sun, 15 Feb 2009 13:48:39 +0000 (14:48 +0100)]
Simplify the checkInstanceMove function
This patch flattens the two folds into one, by simply building the whole
list of moves instead of the double recursion (nodes and then each
node's moves). This has no functional change, but it's much cleaner.
Iustin Pop [Sun, 15 Feb 2009 13:40:42 +0000 (14:40 +0100)]
A small optimization in node computation
Currently we always compute the available node list for moves (for an
instance) based on the nodes of the initial table. This works fine,
but it is a repeated calculation.
We optimize this by passing a node list (of indexes, not full objects),
which helps in two ways:
- faster to filter later
- allows restriction of target nodes by enforcing only this subset as
  target for moves
Iustin Pop [Sun, 15 Feb 2009 13:40:13 +0000 (14:40 +0100)]
Container: add a 'keys' function
Iustin Pop [Sun, 15 Feb 2009 13:35:53 +0000 (14:35 +0100)]
Replace a foldl by foldl'
Iustin Pop [Sun, 15 Feb 2009 13:10:41 +0000 (14:10 +0100)]
Split checkMove into two
This cleans up and splits the individual instance move into a separate function.
Iustin Pop [Sat, 14 Feb 2009 21:22:26 +0000 (22:22 +0100)]
Change the balancing algorithm
This patch changes the balancing algorithm to not iterate linearly over
the instances (in a random, but fixed order), instead selecting at each
step the best next move. This should allow a better score (most of the
time), and usually also a shorter solution.
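The general shape of such a greedy loop, as a generic sketch (this is
not the Cluster code; the state type, the score function and the move
generator are all left abstract and the names are made up):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Pick the candidate with the lowest score, if it beats the current one.
bestStep :: (a -> Double) -> (a -> [a]) -> a -> Maybe a
bestStep score moves cur =
  case moves cur of
    [] -> Nothing
    cs -> let best = minimumBy (comparing score) cs
          in if score best < score cur then Just best else Nothing

-- Apply the best move repeatedly until nothing improves the score,
-- returning the successive states in order.
balance :: (a -> Double) -> (a -> [a]) -> a -> [a]
balance score moves cur =
  case bestStep score moves cur of
    Nothing   -> []
    Just next -> next : balance score moves next
```

Since every accepted step strictly lowers the score, the loop stops as
soon as no candidate move is an improvement.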
Iustin Pop [Sat, 14 Feb 2009 08:05:06 +0000 (09:05 +0100)]
Add RAPI support to hn1
This patch moves a function to Utils and changes hn1 to be able to take
data from RAPI.
Iustin Pop [Sat, 14 Feb 2009 08:00:56 +0000 (09:00 +0100)]
Implement oneline-output for hbal
Iustin Pop [Sat, 14 Feb 2009 07:51:15 +0000 (08:51 +0100)]
Do not try both http and https against the server
This patch changes the tryRapi function so that if the http request
succeeded, we don't try https too.
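The short-circuit can be sketched like this, assuming each request is an
IO action returning Either an error message or the data (orElse is a
made-up helper name, and fetchUrl in the comment is an assumed function,
not a real library call):

```haskell
-- Run the first action; only run the second if the first one failed.
orElse :: IO (Either String a) -> IO (Either String a) -> IO (Either String a)
orElse first second = do
  result <- first
  case result of
    Right _ -> return result
    Left _  -> second

-- Hypothetical usage, with fetchUrl standing in for the real request:
--   fetchUrl ("http://" ++ host) `orElse` fetchUrl ("https://" ++ host)
```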
Iustin Pop [Sat, 14 Feb 2009 07:46:16 +0000 (08:46 +0100)]
Simplify some JSON transforms
... hopefully this is more clear.
Iustin Pop [Fri, 13 Feb 2009 21:26:23 +0000 (22:26 +0100)]
Add compatibility with rapi v1
The patch adds compatibility with RAPI v1, and this required some new
JSON functions as valFromObj doesn't behave nicely.
Some other unrelated changes were done too.