Iustin Pop [Tue, 9 Mar 2010 14:40:44 +0000 (15:40 +0100)]
Fix iallocator crash when no solutions exist
Commit 5436576 added an un-guarded `head' call, which crashes with
“Prelude.head: empty list” when no results exist for the per-instance
allocation/relocation calls.
This patch fixes this, and also adds another check for an unguarded
`head' call during parsing of input data.
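A guarded replacement for `head' could look like the sketch below. The name firstSolution and the error text are hypothetical and not taken from the patch; the point is only that an empty result list becomes a reportable error instead of a crash.

```haskell
import Data.Maybe (listToMaybe)

-- | Hypothetical helper: return the first solution, or an error
-- message when the list is empty (where 'head' would crash).
firstSolution :: [a] -> Either String a
firstSolution =
  maybe (Left "No valid allocation solutions") Right . listToMaybe
```

The caller can then pattern-match on Left/Right and produce a proper iallocator failure response.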
Iustin Pop [Fri, 26 Feb 2010 13:42:41 +0000 (14:42 +0100)]
Fix a haddock comment issue
For some versions of haddock, this can create problems.
Iustin Pop [Thu, 25 Feb 2010 13:47:17 +0000 (14:47 +0100)]
Abstract instance running states into a list
This replaces manual checks in a few places in the code with a
single list defined once.
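As a rough illustration of the idea, assuming the running states are the status strings shown here (the list contents and the isRunning helper are assumptions, not copied from the code):

```haskell
-- | Assumed list of instance statuses that count as "running".
runningStates :: [String]
runningStates = ["running", "ERROR_up"]

-- | Hypothetical helper replacing scattered string comparisons.
isRunning :: String -> Bool
isRunning status = status `elem` runningStates
```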
Iustin Pop [Thu, 25 Feb 2010 13:39:13 +0000 (14:39 +0100)]
A number of small fixes from hlint
Iustin Pop [Thu, 25 Feb 2010 12:49:37 +0000 (13:49 +0100)]
Add a lint target that uses hlint
Iustin Pop [Thu, 25 Feb 2010 12:35:37 +0000 (13:35 +0100)]
Fix unused-do-binds for ghc 6.12
GHC 6.12 has some new warnings, which are valid in most cases except
(IMHO) printf usage.
Iustin Pop [Thu, 25 Feb 2010 12:34:29 +0000 (13:34 +0100)]
Fix unused imports for ghc 6.12
GHC 6.12 has become more picky about unused imports, so we need to
remove/tighten some of them.
Iustin Pop [Wed, 24 Feb 2010 15:21:33 +0000 (16:21 +0100)]
Allow overriding the ghc compiler used
… via a GHC make variable.
Iustin Pop [Tue, 23 Feb 2010 17:10:51 +0000 (18:10 +0100)]
hscan: implement LUXI backend scanning
This allows hscan to work also with NO_CURL (but only for the local
machine, of course).
Iustin Pop [Tue, 23 Feb 2010 12:53:26 +0000 (13:53 +0100)]
Loader: abort for unknown to-be-excluded instances
Iustin Pop [Tue, 23 Feb 2010 12:18:13 +0000 (13:18 +0100)]
Enable hbal to use the new command line option
Iustin Pop [Tue, 23 Feb 2010 12:13:23 +0000 (13:13 +0100)]
balance function: use the movable flag directly
Instead of deciding based on secondary node, use the new flag.
Iustin Pop [Tue, 23 Feb 2010 12:09:46 +0000 (13:09 +0100)]
Update the loader pipeline to set the movable flag
This updates the movable flag on instances if they have only one node
(we don't rely on OpMoveInstance) or if they are set so via the command
line options.
This doesn't yet enable the use of the new flag.
Iustin Pop [Tue, 23 Feb 2010 11:56:13 +0000 (12:56 +0100)]
Add a 'movable' flag on instances
This will be used instead of checking for no secondary and for
simplifying 'do not touch' instances.
Iustin Pop [Tue, 23 Feb 2010 09:40:07 +0000 (10:40 +0100)]
Add an option for excluding instances from moves
Iustin Pop [Mon, 22 Feb 2010 14:18:13 +0000 (15:18 +0100)]
Update NEWS file for the 0.2.4 release
Iustin Pop [Mon, 22 Feb 2010 13:46:24 +0000 (14:46 +0100)]
Update the hail man page
This adds a short note for the new iallocator mode.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 17 Feb 2010 09:09:22 +0000 (10:09 +0100)]
Implement IAllocator node evacuate request
This patch adds the new request loading/execution (trivial), but the
actual response formatting becomes more difficult as now the response
type differs by request.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 17 Feb 2010 09:06:52 +0000 (10:06 +0100)]
Add a tryEvac function
This will be used by the node evacuate IAllocator request type.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 17 Feb 2010 08:54:05 +0000 (09:54 +0100)]
Move a type declaration to Node.hs
We'll need AllocElement in both Cluster and IAlloc in the future, so we
move it to Node.hs which is imported by both.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 16 Feb 2010 13:22:48 +0000 (14:22 +0100)]
Change an internal type from Maybe to list
In preparation for multiple responses, we change from Maybe to List
(both used in the container sense).
This allows us to keep the same workflow for all kinds of requests.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 16 Feb 2010 12:30:57 +0000 (13:30 +0100)]
IAllocator: move some keys into per-request data
Since not all structures will have these keys in the future, we move
them into per-structure keys.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 9 Feb 2010 10:52:08 +0000 (11:52 +0100)]
Document the evac mode
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 9 Feb 2010 10:48:21 +0000 (11:48 +0100)]
Implement evacuation mode in hbal
This mode restricts the list of instances to be moved to the instances
living on the offline (and drained) nodes.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 9 Feb 2010 10:11:17 +0000 (11:11 +0100)]
Add an evac mode CLI option
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 8 Feb 2010 14:21:54 +0000 (15:21 +0100)]
Reorder options in CLI.hs
This should be no code change, just reordering of the options.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 4 Feb 2010 10:30:53 +0000 (11:30 +0100)]
Update documentation for the text backend
Iustin Pop [Thu, 4 Feb 2010 09:58:39 +0000 (10:58 +0100)]
Update NEWS file for the 0.2.3 release
Iustin Pop [Wed, 3 Feb 2010 08:59:55 +0000 (09:59 +0100)]
Fix secondary node selection for existing N+1
In case a secondary node is already N+1 failed, currently the node
selection will accept a node that cannot start (at all) the new instance
as valid. This is wrong, so we add a new simple check to prevent the
case of instance's memory size being higher than the node's free (not
available, which might be lower than 0 for N+1 failures) memory.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
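The check described above can be sketched as follows. The Node record and its field names are a simplified, hypothetical model (memory in MiB), not the real Node.hs type.

```haskell
-- | Simplified, hypothetical node model.
data Node = Node
  { freeMem  :: Int  -- ^ memory currently free on the node
  , availMem :: Int  -- ^ free memory minus the N+1 reservation; may be negative
  } deriving (Show)

-- | A node that cannot hold the instance in its free memory is never
-- a valid secondary, regardless of its N+1 state.
canHostAsSecondary :: Node -> Int -> Bool
canHostAsSecondary node instMem = instMem <= freeMem node
```

Note that the comparison deliberately uses the free memory and not the available memory, since the latter can already be below zero on an N+1 failed node.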
Iustin Pop [Tue, 2 Feb 2010 16:57:49 +0000 (17:57 +0100)]
Rewrite the node add checks for simpler layout
This will make it clearer than many if…then choices.
Iustin Pop [Thu, 14 Jan 2010 16:18:51 +0000 (17:18 +0100)]
Move instance relocation test upper in the chain
Currently we test each instance for relocation in checkMove; however, it
is a little more clear if we pass only the relocatable instances to
checkMove. The patch also slightly rewrites (indentation/style) the
second half of the checkMove function.
Iustin Pop [Thu, 14 Jan 2010 16:05:29 +0000 (17:05 +0100)]
Split the balancing function in two parts
Currently in the balancing function we do two things:
- decide whether to do a new balancing round or not
- and actually compute the balancing round
This is not nice, as the two parts are conceptually separate, so this
patch splits the decision on whether to descend or not to a new
function.
Iustin Pop [Thu, 14 Jan 2010 14:15:11 +0000 (15:15 +0100)]
Small update to the Makefile
Iustin Pop [Thu, 14 Jan 2010 14:09:39 +0000 (15:09 +0100)]
Makefile: Switch from subshell to $(MAKE) -C
It seems that set -e does not affect subshell (only simple commands),
and thus we don't actually get failures from make check being run in a
subshell. Rather than trying to handle this better, we remove the
subshell and invoke make with the required subdirectory.
René Nussbaumer [Tue, 12 Jan 2010 09:55:16 +0000 (10:55 +0100)]
Fixing a typo in option description
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 7 Jan 2010 12:58:22 +0000 (13:58 +0100)]
live-test fixes for the text backend changes
Iustin Pop [Thu, 7 Jan 2010 11:49:16 +0000 (12:49 +0100)]
Update hscan to generate new-format text files
This also updates its manpage; some of its information was really old.
Iustin Pop [Thu, 7 Jan 2010 11:33:39 +0000 (12:33 +0100)]
Switch the text file format to single-file
This patch changes from the two separate files to a single file, with
sections separated by a blank line. Currently only the node and instance
data is accepted, later the cluster tags will be read too via this
format.
This makes all the programs accept the new format, but hscan doesn't yet
generate it.
Iustin Pop [Thu, 7 Jan 2010 10:44:24 +0000 (11:44 +0100)]
Change the signatures of the text loader slightly
This is in preparation for the text format changes.
Iustin Pop [Tue, 29 Dec 2009 19:13:42 +0000 (20:13 +0100)]
Update NEWS file for the 0.2.2 release
Iustin Pop [Tue, 29 Dec 2009 19:07:55 +0000 (20:07 +0100)]
Update hbal man page to note that we use stddev
We actually use stddev and not the coefficient of variation (as wrongly
noted before), so we update the documentation appropriately.
We also note that the dynamic load values must be pre-normalized, since
we don't do such a normalization in the code.
Iustin Pop [Tue, 29 Dec 2009 18:50:03 +0000 (19:50 +0100)]
Update comments in live-test about clusters
The live test needs a populated cluster, so let's write this down.
Iustin Pop [Mon, 28 Dec 2009 11:01:49 +0000 (12:01 +0100)]
Improve the dist build rule
This changes the 'dist' rule to also do a check that the archive can
build all the programs and passes the check test itself, and shows the
sha1sum at the end automatically.
Iustin Pop [Mon, 28 Dec 2009 10:13:04 +0000 (11:13 +0100)]
Remove Version.hs during clean too
Ganeti/HTools/Version.hs is generated at build time from version (which
is the only one shipped), so it must be removed by the clean rule.
Iustin Pop [Mon, 28 Dec 2009 10:09:25 +0000 (11:09 +0100)]
Fix small typo
This was found, of all things, via lintian during the Debian packaging…
Iustin Pop [Fri, 11 Dec 2009 17:01:10 +0000 (18:01 +0100)]
Convert n1_score metric from % to count
This increases the priority of fixing N+1 failures compared to balancing
metrics.
Iustin Pop [Fri, 11 Dec 2009 16:54:18 +0000 (17:54 +0100)]
Merge branch 'master' into next
* master:
Use the oper_ram field if available
rapi, luxi: treat drained nodes as offline
Iustin Pop [Fri, 11 Dec 2009 16:47:07 +0000 (17:47 +0100)]
Metric: count of primary instances/offline nodes
This helps with evacuation/failover of instances on 2-node clusters with
one node offline.
Iustin Pop [Fri, 11 Dec 2009 16:41:01 +0000 (17:41 +0100)]
Offline instance metric: change from % to count
Currently we use the offline instance percentage (with range [0, 1]),
but this is not good, since we want the evacuation of such instances to
have a high priority; therefore we change this to a count of offline
instances, which has higher weight than a metric with range [0, 1].
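The difference in weight can be seen in a small sketch. Both function names are hypothetical; they only contrast the two ways of turning the same data into a score component.

```haskell
-- | Old style: fraction of instances on offline nodes, in [0, 1].
offlinePercent :: Int -> Int -> Double
offlinePercent offline total
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total

-- | New style: the plain count, which grows by 1 per affected instance.
offlineCount :: Int -> Double
offlineCount = fromIntegral
```

With 2 affected instances out of 200, the fraction contributes about 0.01 to the score while the count contributes 2, so evacuating those instances outweighs the metrics that stay within [0, 1].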
Iustin Pop [Fri, 11 Dec 2009 16:17:28 +0000 (17:17 +0100)]
Use the oper_ram field if available
For the RAPI and LUXI backends, we can get the actual memory usage (if
instances are running) via the oper_ram, whereas backend/memory only
tells what the instance will use at the next boot.
Not using oper_ram means that the node model is flawed and we consider
wrong values for the instance's memory (resulting sometimes in hilarious
values such as x_mem = -700 MB).
Iustin Pop [Wed, 9 Dec 2009 10:29:22 +0000 (11:29 +0100)]
rapi, luxi: treat drained nodes as offline
Commit e97f211 changed the iallocator backend to handle drained nodes as
offline. This commit completes that change by making the rapi and luxi
backend do the same (the text backend ignores any '?' values which are
returned by ganeti when nodes have problems, so it doesn't need this
change).
Iustin Pop [Wed, 2 Dec 2009 16:49:38 +0000 (17:49 +0100)]
Add a live-test script
This can be used to test that all the existing commands work correctly. It
needs a running cluster with at least one instance to run all the tests.
Iustin Pop [Wed, 2 Dec 2009 14:58:14 +0000 (15:58 +0100)]
Fix typo breaking LUXI backend
This really shows the need for actual dist-time full testing (not
unittests).
Iustin Pop [Wed, 2 Dec 2009 10:25:43 +0000 (11:25 +0100)]
Update NEWS file for the 0.2.1 release
Iustin Pop [Wed, 2 Dec 2009 10:51:47 +0000 (11:51 +0100)]
Fix unittests after instance tags addition
Iustin Pop [Wed, 2 Dec 2009 10:25:21 +0000 (11:25 +0100)]
Merge branch 'next'
* next:
Update documentation for the iextags
Re-wrap the README
Configure exclusion tags via the cluster tags
hail: add '-p' option intended for debugging use
Read cluster tags in the IAllocator backend
Read cluster tags in the LUXI backend
Read cluster tags in the RAPI backend
Introduce support for reading the cluster tags
Collapse the statistical functions into one
Specialize the math functions
Use conflicting primaries count in cluster score
Node: add function for conflicting primary count
Add a new node list field
Add a command-line option to filter exclusion tags
Introduce tag-based exclusion of primary instances
Add a tags attribute to instances
Small change in some list arguments
Use either \- or \(hy in manpages
Iustin Pop [Wed, 2 Dec 2009 10:24:48 +0000 (11:24 +0100)]
Update documentation for the iextags
Iustin Pop [Wed, 2 Dec 2009 10:20:59 +0000 (11:20 +0100)]
Re-wrap the README
… since we added the fill-column 72 setting.
Iustin Pop [Tue, 1 Dec 2009 12:49:49 +0000 (13:49 +0100)]
Configure exclusion tags via the cluster tags
This patch adds reading of the exclusion tags from the cluster tags: any
tags starting with htools:iextags: will convert their suffix into an
exclusion tags prefix. In other words, "htools:iextags:service" will
cause any "service:X" tag to become an exclusion group.
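A minimal sketch of the mapping described above. Only the "htools:iextags:" prefix comes from the commit message; the function names are hypothetical.

```haskell
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (mapMaybe)

-- | Cluster tag prefix named in the commit message.
iextagsPrefix :: String
iextagsPrefix = "htools:iextags:"

-- | Extract exclusion-tag prefixes from the cluster tags.
exclusionPrefixes :: [String] -> [String]
exclusionPrefixes = mapMaybe (stripPrefix iextagsPrefix)

-- | Keep only the instance tags of the form "<prefix>:X".
exclusionTags :: [String] -> [String] -> [String]
exclusionTags prefixes = filter matches
  where matches tag = any (\p -> (p ++ ":") `isPrefixOf` tag) prefixes
```

For example, the cluster tag "htools:iextags:service" yields the prefix "service", under which an instance tag such as "service:dns" is kept while "owner:bob" is dropped.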
Iustin Pop [Tue, 1 Dec 2009 12:45:02 +0000 (13:45 +0100)]
hail: add '-p' option intended for debugging use
This prints the initial node list on stderr, since stdout is reserved for the
iallocator protocol (even though ganeti won't pass -p itself).
Iustin Pop [Tue, 1 Dec 2009 11:17:19 +0000 (12:17 +0100)]
Read cluster tags in the IAllocator backend
Iustin Pop [Tue, 1 Dec 2009 10:47:15 +0000 (11:47 +0100)]
Read cluster tags in the LUXI backend
Iustin Pop [Tue, 1 Dec 2009 09:53:58 +0000 (10:53 +0100)]
Read cluster tags in the RAPI backend
This also shows them in hbal in verbose mode.
Iustin Pop [Fri, 27 Nov 2009 15:13:12 +0000 (16:13 +0100)]
Introduce support for reading the cluster tags
While these are not actually populated from the backends, and all the
programs ignore them, this patch contains the changes in the function
types required.
Iustin Pop [Tue, 24 Nov 2009 11:50:48 +0000 (12:50 +0100)]
hspace: quote non-alphanum values in shell output
The tiered allocation output which contains spaces makes the output of
hspace non-sourceable. This patch adds a new function to ensure
non-alphanumeric values are quoted such that the output can be parsed
easily via the shell.
The patch also fixes a bug in the DSK_AVAIL key (found after adding the
quoting) which added an extra space at the end of these keys.
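One way to do such quoting, as a sketch only: the function names and the single-quote strategy are assumptions, and the real patch may quote differently.

```haskell
import Data.Char (isAlphaNum)

-- | Quote a value for a POSIX shell unless it is purely alphanumeric.
-- Embedded single quotes are written as '\'' (close, escaped quote, reopen).
shellQuote :: String -> String
shellQuote s
  | not (null s) && all isAlphaNum s = s
  | otherwise = "'" ++ concatMap escape s ++ "'"
  where escape '\'' = "'\\''"
        escape c    = [c]

-- | Format one KEY=value line without trailing spaces.
formatKey :: String -> String -> String
formatKey key value = key ++ "=" ++ shellQuote value
```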
Iustin Pop [Tue, 17 Nov 2009 01:17:16 +0000 (02:17 +0100)]
Collapse the statistical functions into one
This allows us to get rid of two duplicate list length computations,
with a minor speedup.
Iustin Pop [Tue, 17 Nov 2009 01:04:38 +0000 (02:04 +0100)]
Specialize the math functions
The statistics functions are currently defined as polymorphic with a
Floating constraint. Changing this to monomorphic on Double type makes
them stricter and much more performant (~70% speedup). This is a cheap
way to recoup some of the losses incurred by the recent proliferation of
metrics.
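For illustration, a standard deviation written directly against Double might look like this. It is a sketch assuming the population formula, not the actual htools code.

```haskell
import Data.List (foldl')

-- | Population standard deviation, monomorphic on Double.
-- The list length is computed only once.
stdDev :: [Double] -> Double
stdDev [] = 0
stdDev xs = sqrt (sumSq / n)
  where
    n     = fromIntegral (length xs)
    mean  = sum xs / n
    sumSq = foldl' (\acc x -> let d = x - mean in acc + d * d) 0 xs
```

With a concrete Double signature GHC does not need to pass a Floating dictionary at each call, which is one plausible source of the speedup noted above.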
Iustin Pop [Sat, 14 Nov 2009 23:02:17 +0000 (00:02 +0100)]
Use conflicting primaries count in cluster score
This small patch adds the number of conflicting primaries in the cluster
score. This is different from the other non-CV metrics where we usually
compute the percentage of failing instances (for that metric); but for a
somewhat big cluster, 1-2% failing instances will be too small a value
to cause the relocation of conflicting instances (future patches will
also switch other non-CV metrics to this method).
Iustin Pop [Sat, 14 Nov 2009 23:01:04 +0000 (00:01 +0100)]
Node: add function for conflicting primary count
Iustin Pop [Sat, 14 Nov 2009 09:26:30 +0000 (10:26 +0100)]
Add a new node list field
This patch adds a new node list field (ptags), showing the primary
instance tags.
Iustin Pop [Wed, 11 Nov 2009 16:37:09 +0000 (17:37 +0100)]
Add a command-line option to filter exclusion tags
Since we don't want all instance tags to be used for exclusion, we add a
command line option to filter on these. Since the iallocator protocol
cannot accept command line options, currently it's not possible to
specify these for hail, and thus it will never use any exclusion tags.
Iustin Pop [Wed, 11 Nov 2009 13:14:18 +0000 (14:14 +0100)]
Introduce tag-based exclusion of primary instances
This patch introduces exclusion of primary instances based on tags. This
is incomplete as currently all tags are being excluded, and we don't
optimise towards relocation of instances sharing tags on the same node.
Iustin Pop [Wed, 11 Nov 2009 10:01:36 +0000 (11:01 +0100)]
Add a tags attribute to instances
… and read it in all the loaders. hscan is modified to save it to the
files it generates.
The attribute is not yet used in any place.
Iustin Pop [Wed, 11 Nov 2009 09:37:33 +0000 (10:37 +0100)]
Small change in some list arguments
This is simpler than the concat operator.
Iustin Pop [Tue, 10 Nov 2009 17:24:17 +0000 (18:24 +0100)]
Use either \- or \(hy in manpages
This reduces warnings from lintian when building Debian packages.
Iustin Pop [Tue, 10 Nov 2009 13:48:39 +0000 (14:48 +0100)]
Update NEWS file for the 0.2.0 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 10 Nov 2009 13:23:34 +0000 (14:23 +0100)]
Rewrite NEWS for better RST compatibility
The text-only version should still be very readable, but the RST output
will be better hopefully.
Iustin Pop [Tue, 10 Nov 2009 12:59:56 +0000 (13:59 +0100)]
Allow overriding the field list in -p
The print nodes option can now accept an optional field list to
customise the output. This is ugly, since the field names do not match
the header names, but it is at least barely customisable (at runtime).
Iustin Pop [Mon, 9 Nov 2009 16:18:12 +0000 (17:18 +0100)]
Update hspace manpage with tiered allocation info
Also fixes some other small issues in man pages.
Iustin Pop [Mon, 9 Nov 2009 15:49:48 +0000 (16:49 +0100)]
Move more node-listing functionality in Node.hs
This will prepare for the runtime-selectable field list.
Iustin Pop [Mon, 9 Nov 2009 14:51:23 +0000 (15:51 +0100)]
Change the default dynamic usage to baseUtil
This fixed the unbalanced secondary instances on partially empty
clusters, and helps in general for the cases where real utilisation data
is not available.
Iustin Pop [Mon, 9 Nov 2009 13:43:35 +0000 (14:43 +0100)]
Add a few comments in the scoring function
Iustin Pop [Fri, 6 Nov 2009 16:09:43 +0000 (17:09 +0100)]
Enhance the error reporting for Rapi and Luxi
Currently the JSON conversion in Rapi and Luxi gives something
like:
  Error: failed to load data. Details:
  Unable to read Double
This doesn't tell one where the error is (in a node specification? and
which node? etc.). This patch annotates such messages with the owner
node:
  Error: failed to load data. Details:
  Node 'node1' key 'mtotal': Unable to read Double
For errors during parsing of the node/instance name (unlikely, but
still), the output is:
  Error: failed to load data. Details:
  Parsing new node key 'name': Unable to read String
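The annotation step can be sketched with a plain Either. The helper names annotate and nodeKeyContext are hypothetical; only the message layout follows the examples above.

```haskell
-- | Prefix an error with its context; successes pass through unchanged.
annotate :: String -> Either String a -> Either String a
annotate ctx (Left err) = Left (ctx ++ ": " ++ err)
annotate _   ok         = ok

-- | Build the "Node '<name>' key '<key>'" context string.
nodeKeyContext :: String -> String -> String
nodeKeyContext node key = "Node '" ++ node ++ "' key '" ++ key ++ "'"
```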
Iustin Pop [Fri, 6 Nov 2009 14:26:35 +0000 (15:26 +0100)]
Change the Utils.fromObj signature
Currently the fromObj function takes a JSON object which is then
converted into a list of (String, JSValue) in which we make a lookup.
However, most of the callers of this function call it repeatedly on the
same object, which means we do the object→list conversion repeatedly.
This patch converts it to take directly the list, and converts its
callers to do the conversion themselves (and only once).
While this is not in the hot-path today, it would be if we ever were to
process much data over Luxi (or RAPI), and is a good cleanup in any
case.
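The shape of the change, sketched with a stand-in value type: the real function works on the JSON library's values, and the Value type and error text here are assumptions made to keep the example self-contained.

```haskell
-- | Minimal stand-in for a JSON value.
data Value = VNum Double | VStr String
  deriving (Show, Eq)

-- | Look up a key in an association list that the caller has already
-- obtained from the object, so the conversion happens only once.
fromObj :: String -> [(String, Value)] -> Either String Value
fromObj key fields =
  maybe (Left ("key '" ++ key ++ "' not found")) Right (lookup key fields)
```

A caller that needs several keys converts the object to a list once and then calls fromObj on that same list for each key.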
Iustin Pop [Fri, 6 Nov 2009 13:22:39 +0000 (14:22 +0100)]
Rework the tiered spec output format
Iustin Pop [Mon, 2 Nov 2009 15:25:26 +0000 (16:25 +0100)]
A small style change in Node.hs
This imports PeerMap as P and reindents some lines.
Iustin Pop [Mon, 2 Nov 2009 10:45:29 +0000 (11:45 +0100)]
hspace: show tiered-alloc stats in the output
This is a first attempt to get a readable output of tiered allocation
stats in hspace's output. Not very nice, but it should be somewhat
parseable.
Iustin Pop [Mon, 2 Nov 2009 09:32:13 +0000 (10:32 +0100)]
hspace: fix stats printing for tiered mode
Iustin Pop [Mon, 2 Nov 2009 09:26:09 +0000 (10:26 +0100)]
Make some CLI options more consistent
Both the simulate and the tiered allocation mode take a machine spec on
input via a comma-separated list. This patch makes this a little bit
more consistent (always use disk,ram,cpu in this order).
Iustin Pop [Fri, 30 Oct 2009 11:21:17 +0000 (12:21 +0100)]
Implement first version of tiered allocations
This patch adds the first version of tiered allocations where we
decrease instance specs on allocation failure and retry the allocation.
The output is not yet stable and the output changes are not documented
(yet).
Iustin Pop [Fri, 30 Oct 2009 11:07:28 +0000 (12:07 +0100)]
Add support for shrinking instance specs
This patch adds a function that, for some given failure modes, shrinks a
given instance in the hope that allocation will succeed when retried
with the new spec.
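A possible shape for such a shrinking function, as a sketch. The type names, the step sizes and the choice of which failure modes are shrinkable are all assumptions for illustration.

```haskell
-- | Hypothetical failure modes and instance spec (memory/disk in MiB).
data FailMode = FailMem | FailDisk | FailCPU | FailN1
  deriving (Show, Eq)

data Spec = Spec { specMem :: Int, specDisk :: Int, specCpu :: Int }
  deriving (Show, Eq)

-- | Assumed shrink steps per resource.
memStep, diskStep, cpuStep :: Int
memStep  = 64
diskStep = 256
cpuStep  = 1

-- | Shrink the resource that caused the failure by one step, or return
-- Nothing if the failure is not resource-related or no step is left.
shrinkSpec :: FailMode -> Spec -> Maybe Spec
shrinkSpec mode spec = case mode of
  FailMem  -> step specMem  memStep  (\v -> spec { specMem = v })
  FailDisk -> step specDisk diskStep (\v -> spec { specDisk = v })
  FailCPU  -> step specCpu  cpuStep  (\v -> spec { specCpu = v })
  FailN1   -> Nothing
  where
    step get unit set
      | get spec > unit = Just (set (get spec - unit))
      | otherwise       = Nothing
```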
Iustin Pop [Fri, 30 Oct 2009 10:16:53 +0000 (11:16 +0100)]
hspace: Abstract the instance listing
This also converts it to formatTable from hardcoded listing.
Iustin Pop [Fri, 30 Oct 2009 08:29:42 +0000 (17:29 +0900)]
Rework the instance spec CLI options
This patch reworks the internal handling of the instance spec CLI
option, and adds a tiered spec option that will be used in hspace to
enable the (auxiliary) tiered-spec allocation mode.
It also introduces a new data type for holding the instance
specification.
Iustin Pop [Fri, 30 Oct 2009 09:17:36 +0000 (10:17 +0100)]
Convert option parsing to a monadic flow
This allows us to do verification of option arguments in the assignment
functions themselves.
Iustin Pop [Wed, 21 Oct 2009 11:06:40 +0000 (20:06 +0900)]
Some cleanup of Loader.mergeData
This doesn't need to be a monadic function, let's make it a simpler one.
Iustin Pop [Wed, 21 Oct 2009 08:58:56 +0000 (17:58 +0900)]
hbal: ignore unknown instance in dynload file
Since the utilisation file might be generated at a different time from
the hbal run, and instances could disappear in the meantime, it's better
to simply ignore unknown instances rather than abort.
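The tolerant behaviour can be sketched with a map from instance name to load. The representation and the name applyLoads are assumptions; the real loader works on richer instance records.

```haskell
import Data.List (foldl')
import qualified Data.Map as M

-- | Apply (name, load) entries to the known instances. Entries naming
-- unknown instances are skipped, since M.adjust leaves the map
-- untouched when the key is absent.
applyLoads :: M.Map String Double -> [(String, Double)] -> M.Map String Double
applyLoads = foldl' (\m (name, load) -> M.adjust (const load) name m)
```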
Iustin Pop [Wed, 21 Oct 2009 08:49:08 +0000 (17:49 +0900)]
Fix hbal man page w.r.t. --print-instances
The ordering was wrong: node list details were shown under
--print-instances.
Iustin Pop [Wed, 21 Oct 2009 08:47:52 +0000 (17:47 +0900)]
Expand the --print-instances output
This adds run status, resource parameters and load parameters for
instances.
Iustin Pop [Mon, 19 Oct 2009 06:06:59 +0000 (15:06 +0900)]
Old update to the NEWS file
0.1.8 was never documented in the NEWS file.
Iustin Pop [Sun, 18 Oct 2009 21:50:40 +0000 (06:50 +0900)]
Change the Container.findByName function
This patch changes the signature and implementation of the function;
returning the item makes more sense (saves a lookup later again in the
container, and applying idx is cheap), and the previous implementation
was ugly.