root @ 06fb841e

# Date Author Comment
06fb841e 12/01/2010 07:08 pm Iustin Pop

Add two utility functions for the Result type

Actually, this just moves the functions from the QC module to Types, and
removes a duplicate entry from Cluster.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

99b63608 12/01/2010 07:08 pm Iustin Pop

Rework the types used during data loading

This improves on the previous change. Currently, the node and instance
lists shipped around during data loading are (again) association lists.
For instances it's not a big issue, but the node list is rewritten
continuously while we assign instances to nodes, and that is very slow....

2d0ca2c5 12/01/2010 07:08 pm Iustin Pop

Loader functions: move from assoc lists to maps

When loading big clusters, the association lists become a bit slow, so
we'll replace this with a simple Map String Int; the change is trivial
and can be reverted easily, while it brings up a good speedup in the...
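
As a rough illustration of the change described above, the sketch below contrasts lookup in an association list with lookup in a `Map String Int`. The names `NameMap`, `lookupAssoc`, `lookupMap` and `fromAssoc` are assumptions made for this example and are not taken from the htools sources.

```haskell
import qualified Data.Map as M

-- | Hypothetical alias: maps a node or instance name to its index.
type NameMap = M.Map String Int

-- | Association-list lookup: a linear scan, O(n) per query.
lookupAssoc :: String -> [(String, Int)] -> Maybe Int
lookupAssoc = lookup

-- | Map lookup: O(log n) per query.
lookupMap :: String -> NameMap -> Maybe Int
lookupMap = M.lookup

-- | Build the map once from the pairs collected during loading.
fromAssoc :: [(String, Int)] -> NameMap
fromAssoc = M.fromList
```

Because both lookups return `Maybe Int`, callers can switch between the two representations without changing their own types, which is consistent with the remark that the change is easy to revert.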

6ff78049 12/01/2010 07:08 pm Iustin Pop

Convert some leftovers to NameAssoc

The type alias NameAssoc was introduced a long time ago, but there
are a few not-yet-converted cases. In preparation for changes to that
type, let's make sure we use it consistently.

Signed-off-by: Iustin Pop <>...

646aa028 12/01/2010 03:00 pm Iustin Pop

hbal: implement handling of multi-group clusters

On a single-group cluster, we proceed as before. On multi-group
clusters, we require selection of the desired group (currently via UUID
only).

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f4161783 12/01/2010 03:00 pm Iustin Pop

Add Cluster.splitCluster for node groups

This splits a top-level cluster information into the component node
groups. Instances go to the group of their primary node, but otherwise we
don't disallow split instances.

Signed-off-by: Iustin Pop <>...

ae16cf83 12/01/2010 03:00 pm Iustin Pop

Add the man html files to gitignore

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

5ef78537 12/01/2010 03:00 pm Iustin Pop

Rework Container.hs and improve test coverage

Since some of the functions we export from Container.hs are 1:1
identical to IntMap, we can just export the originals and remove the
wrappers. This reduces the code we need to unittest.

Furthermore, we add two simple unittests for the two non-trivial...

a423b510 12/01/2010 03:00 pm Iustin Pop

Add new command-line option for group selection

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

32b8d9c0 12/01/2010 03:00 pm Iustin Pop

Add two functions for checking cluster consistency

For now, we don't support instances allocated across two groups, and we
will reject such clusters. The isClusterConsistent function will return
a list of inconsistent instances, potentially allowing operation without...

d8bcd0a8 12/01/2010 03:00 pm Iustin Pop

Add function for nodes to (nodegroup, nodes) split

Unittests included. The function will be needed for consistency checks
in the algorithms.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>
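
One possible shape for such a split is sketched below. The `Node` record, the `GroupID` alias and the function name `splitByGroup` are hypothetical stand-ins for this example; the real function and types may differ.

```haskell
import qualified Data.Map as M

-- | Hypothetical alias for a node group UUID.
type GroupID = String

-- | Simplified stand-in for a node; only the fields needed here.
data Node = Node
  { nodeName  :: String
  , nodeGroup :: GroupID
  } deriving (Show, Eq)

-- | Split a node list into (group, nodes) pairs, sorted by group ID.
-- 'flip (++)' appends later nodes after earlier ones, so the original
-- order is preserved inside each group.
splitByGroup :: [Node] -> [(GroupID, [Node])]
splitByGroup nodes =
  M.toList (M.fromListWith (flip (++)) [(nodeGroup n, [n]) | n <- nodes])
```

A consistency check could then look at each instance and verify that all of its nodes appear in the same group's list.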

c4d98278 12/01/2010 03:00 pm Iustin Pop

Add a type alias for UUIDs

This is to potentially allow easier changes later.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

e474b5b5 11/28/2010 07:02 pm Iustin Pop

Also build HTML versions of man pages

b7b29191 11/24/2010 03:55 pm Iustin Pop

RAPI: read the group UUID from the server

This depends on future support from Ganeti (2.4+).

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

31463db5 11/24/2010 03:55 pm Iustin Pop

IAlloc: read group uuid from the input message

This makes the code incompatible with JSON files from Ganeti pre-2.4.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

b3707354 11/24/2010 03:55 pm Iustin Pop

Text: read/save the node group UUID

Compatibility with old text files is kept by using the default UUID if
the file (or even some records) don't have a UUID.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f5ed8632 11/24/2010 03:55 pm Iustin Pop

Luxi: read the node uuid from the cluster

This makes the code incompatible with Ganeti pre-2.4.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

a68004b7 11/24/2010 03:55 pm Iustin Pop

Node: add the node group's UUID

This is not used anywhere yet, and the backends are all just adding the
default UUID, not the real one.

The patch also allows displaying the group UUID in the node list.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

9b9da389 11/24/2010 03:54 pm Iustin Pop

Utils: add a default UUID

This will be used as a placeholder for the cases when we need a UUID
(any UUID), but we don't have one handy.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

a3e8da03 11/23/2010 04:46 pm Iustin Pop

Merge branch 'devel-0.2' into master

7570569e 11/23/2010 12:58 pm Iustin Pop

Improve the standard deviation computation

This does just two passes, instead of three, over the list. This reduces
the overall runtime well enough (~25%) in some tests, but it's not
reproducible using profiling, so I don't know how much the function
itself is being sped-up....
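
A two-pass computation of this kind might look like the sketch below. It assumes the population standard deviation (dividing by n); the commit message does not say which variant is used, and the name `stdDev` is chosen for this example only.

```haskell
import Data.List (foldl')

-- | Population standard deviation in two passes over the list.
-- Returns NaN for an empty list, since the mean is then 0/0.
stdDev :: [Double] -> Double
stdDev xs = sqrt (sqSum / n)
  where
    -- Pass 1: accumulate the sum and the element count together.
    (total, n) = foldl' step (0, 0) xs
    step (s, c) x =
      let s' = s + x
          c' = c + 1
      in s' `seq` c' `seq` (s', c')
    mean = total / n
    -- Pass 2: sum of squared deviations from the mean.
    sqSum = foldl' (\acc x -> acc + (x - mean) * (x - mean)) 0 xs
```

A naive version would traverse the list three times (once each for `sum`, `length` and the squared deviations); folding the sum and the count together removes one traversal.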

543e859d 11/23/2010 12:58 pm Iustin Pop

hbal: change handling of signal

Currently, hbal does a one-two signal handling, where the first signal
causes graceful termination, and the second one an immediate one (either
SIGINT or SIGTERM can be used, interchangeably). However, this poses a
timing problem: if two programs want to send a graceful termination...

5e718042 11/19/2010 01:35 pm Iustin Pop

Simu loader: move the loading to non-IO code

While we don't actually have IO code in the Simu loader, we do have the
same interface. So we move the code again to a separate parseData
function which is exported.

b3f0710c 11/19/2010 01:08 pm Iustin Pop

Luxi loader: split parsing from loading

748bfcc2 11/19/2010 01:06 pm Iustin Pop

Rapi loader: split parsing from loading

The change is similar to the text loader change.

dadfc261 11/19/2010 01:00 pm Iustin Pop

Text loader: split parsing from loadData

This change, which will be followed by similar changes in the other
loaders, splits the parsing of the data from the actual loading from
disk. Since the parsing doesn't usually involve IO actions, we will be
able to better test the parsing. The loading becomes a smaller part of...

9d775204 11/11/2010 01:02 pm Iustin Pop

Ignore nodes which are not vm_capable

This breaks compatibility with Ganeti pre-2.3.

92d43268 11/09/2010 09:51 am Iustin Pop

Merge branch 'devel-0.2'

  • devel-0.2:
    Fix tag exclusion weight

306cccd5 11/09/2010 09:37 am Iustin Pop

Fix tag exclusion weight

Currently, the tag exclusion metric has a weight of one, which means
there might be cases where we won't move instances around because it
upsets the cluster metrics. However, we do want to make a higher effort
for cleaning up tag collisions, so we increase the weight to an...

718f135d 10/26/2010 12:37 pm Iustin Pop

Force UTF-8 locale for pandoc invocation

Pandoc 1.5.x uses the locale information to parse its input files (only
1.5; earlier and later versions always use UTF-8). Hence we need to enforce a
UTF-8 locale for proper parsing of input files.

49148d15 10/25/2010 06:39 pm Iustin Pop

Move from hand-written man pages to RST/pandoc

This simplifies the maintenance of the man pages, and unifies the rst-to-*
converter to pandoc.

92921ea4 10/21/2010 04:58 pm Iustin Pop

Add design for htools/Ganeti 2.3 sync

This is a work in progress, will be modified along with the progress
of Ganeti 2.3.

ca8e1c6a 10/07/2010 04:09 pm Iustin Pop

Update NEWS file for 0.2.7 release

e3ae9508 10/07/2010 03:42 pm Iustin Pop

Fix some warnings in unittests

4886952e 10/06/2010 04:23 pm Iustin Pop

Add a hack for normalized CPU values in hspace

Currently, the key metrics/tiered spec computations show the virtual cpu
count. However, since we do have a maximum ratio Vcpu/Pcpu, we can also
show the “normalized” cpu count, i.e. the equivalent physical cpu count...

03c6d8fa 10/06/2010 03:56 pm Iustin Pop

Improve the error message for tiered alloc option

03cb89f0 09/15/2010 06:30 pm Iustin Pop

hbal: implement user-friendly termination requests

Currently, hbal will abort immediately when requested (^C, or SIGINT,
etc.). This is not nice, since then the already started jobs need to be
tracked manually.

This patch adds a signal handler for SIGINT and SIGTERM, which will, the...

5f715404 09/03/2010 06:10 pm Iustin Pop

Document the gain options in hbal's manpage

848b65c9 09/03/2010 06:02 pm Iustin Pop

Use the mingain options in the balancing algorithm

Also adds them in hbal.

4f807a57 09/03/2010 03:35 pm Iustin Pop

Add new CLI options for min gain during balancing

Recent hbal seems to run many steps for small improvements (< 1e-3), so
we should stop early in this case.

We add a new option (-g), that will be used for the minimum gain during
balancing. This check will only become active when the cluster score is...

d78ceb9e 09/02/2010 03:45 pm Iustin Pop

Makefile: make the rst2html converter more strict

This will make the automated builds flag any problems.

adc5c176 09/02/2010 03:43 pm Iustin Pop

Add some more debugging functions

These are just variations of the standard debug, but are provided for
simpler code, since laziness is something causing non-computation of
debug statements.

74e89a14 09/02/2010 03:43 pm Iustin Pop

Fix ReplaceSecondary moves for offline nodes

The addition of a new secondary on a node is doing two memory tests:
- in strict mode, reject if we get into N+1 failure
- reject if the new instance memory is greater than the free memory (not
available memory) on the node...

49d977db 09/02/2010 03:43 pm Iustin Pop

Update NEWS file

db43d7b3 08/30/2010 12:12 pm Iustin Pop

Update man pages for the new -S option

10852adb 08/30/2010 12:12 pm Iustin Pop

hspace: mark new instances as running

Otherwise the saved cluster state and the in-memory one are wrong.

3e9501d0 08/30/2010 12:12 pm Iustin Pop

Implement cluster state saving in hspace

This also uncovered a few issues with the allocation model (instances
not being marked up, etc.).

Compared to hbal, hspace will generate either one or two files (for both
the standard and the tiered allocation mode), depending on the input...

94d08202 08/30/2010 12:12 pm Iustin Pop

Change iterateAlloc to return the instance list

The Cluster.iterateAlloc and tieredAlloc functions are changed to also
return the updated instance list, since it is needed to have a “full”
cluster view.

748654f7 08/30/2010 12:12 pm Iustin Pop

Implement cluster state saving in hbal

Also move the LUXI execution (-X) to the end, after all the output
messages are printed. No good in waiting for the messages for a long
while, especially as they are not up-to-date stats after the job
execution, just an estimation of what the state will be.

4a273e97 08/30/2010 12:12 pm Iustin Pop

Abstract the cluster serialization from hscan.hs

This is currently hardcoded in an internal function in hscan.hs, and we
move it to Text.hs for later use.

02da9d07 08/25/2010 07:40 pm Iustin Pop

Add a new option --save-cluster

This option will in the future be used to serialize the cluster state in
hbal and hspace after the rebalance/allocation steps.

50811e2c 08/25/2010 07:04 pm Iustin Pop

Add unittest for Node text serialization

This checks that the Node text serialization and deserialization
operations are idempotent when combined with each other.
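
The property being tested can be sketched without any test framework, as below. The two-field `Node` record and the `name|mem` line format are simplifications assumed for this example; they are not the actual serialization format.

```haskell
-- | Simplified stand-in for a node: a name and a memory size.
data Node = Node
  { name :: String
  , mem  :: Int
  } deriving (Show, Eq)

-- | Hypothetical text format: fields separated by '|'.
serialize :: Node -> String
serialize (Node n m) = n ++ "|" ++ show m

-- | Parse a line back; Nothing if the line is malformed.
deserialize :: String -> Maybe Node
deserialize s = case break (== '|') s of
  (n, _ : rest) | [(m, "")] <- reads rest -> Just (Node n m)
  _ -> Nothing

-- | The round-trip property: reading back what was written
-- yields the original value.
roundTrips :: Node -> Bool
roundTrips node = deserialize (serialize node) == Just node
```

Note that `roundTrips` fails for a name that contains the separator character, which is one reason a test generator restricted to hostname-like names is useful for this kind of check.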

a070c426 08/25/2010 06:53 pm Iustin Pop

Switch unittest to custom hostnames

Currently, the hostnames are almost fully arbitrary chars, which breaks
the assumption that nodes/instances will be normal DNS hostnames.

This patch adds some custom generators for these hostnames, that will
allow better testing of text loader serialization/deserialization.

3bf75b7d 08/24/2010 07:30 pm Iustin Pop

Move text serialization functions to Text.hs

Currently these are in hscan, and cannot be reused easily.

57ef88df 07/29/2010 07:03 am Iustin Pop

Fix a couple of typos in the manpages

Again, thanks to lintian.

0ca66853 07/27/2010 09:44 pm Iustin Pop

hail: fix error message for failed multi-evac

Currently we show the instance index, but this makes no sense outside
the current running program. Instead, we show the instance name.

84edb64b 07/27/2010 03:03 am Iustin Pop

Update NEWS file for the 0.2.6 release

303bb0ed 07/27/2010 03:03 am Iustin Pop

NEWS: Add double blank lines before headers

This looks better for text-only viewing…

f688711c 07/23/2010 03:50 am Iustin Pop

hscan: return exit code 2 for RAPI failures

If some clusters failed during RAPI collection, exit with exit code 2 so
that tests can detect this failure.

b7478ce1 07/23/2010 03:32 am Iustin Pop

More enhancements to live-test.sh

b8262965 07/22/2010 04:57 pm Iustin Pop

Fix another haddock issue

691dcd2a 07/22/2010 06:03 am Iustin Pop

Remove an obsolete function and add Utils tests

b880f1d1 07/22/2010 03:32 am Iustin Pop

Extend the live-test

The (recently-enabled) live test coverage stats found a few low-hanging
fruits in the tests we do…

7e9e8245 07/22/2010 02:27 am Iustin Pop

Use --union for hpc sum

… which fixes the issue noted in the previous commit (almost a brown
paper bag change).

dc61c50b 07/22/2010 01:43 am Iustin Pop

Preliminary support for coverage during live-test

While this doesn't work correctly yet (hpc sum seems to only take common
modules, not the sum of modules?), it prepares for gathering coverage
data during live-test (as an alternative to unittest coverage data).

223dbe53 07/22/2010 01:42 am Iustin Pop

Add some more imports to QC.hs

This is needed so that in the coverage report we list all modules, even
the ones we don't test at all, such that we get the complete results.

c3c7a0c1 07/22/2010 01:42 am Iustin Pop

Change the meaning of the N+1 fail metric

Currently, this metric tracks the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric since it will rarely change during a step (only at the last
instance moving away). Therefore we replace it with the count of...

8a3b30ca 07/22/2010 01:42 am Iustin Pop

Introduce per-metric weights

Currently all metrics have the same weight (we just sum them together).
However, for the hard constraints (N+1 failures, offline nodes, etc.)
we should handle the metrics differently based on their meaning. For
example, an instance living on a primary offline node is worse than an...
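
The idea of replacing a plain sum with a weighted sum can be sketched as follows. The function names and the weight values used in the test are illustrative assumptions; the actual weights are not given in the visible part of the message.

```haskell
-- | Unweighted score: all metrics count equally.
plainScore :: [Double] -> Double
plainScore = sum

-- | Weighted score: each metric is paired with its own weight,
-- given as (weight, value).
weightedScore :: [(Double, Double)] -> Double
weightedScore = sum . map (uncurry (*))
```

With all weights set to 1 the weighted score reduces to the plain sum, so the previous behaviour is a special case of the new one.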

2cae47e9 07/22/2010 01:42 am Iustin Pop

Allow balancing moves to introduce N+1 errors

This patch switches the applyMove function to the extended versions of
Node.addPri and addSec, and passes the override flag based on the state
of the node that we're moving away from.

3e3c9393 07/22/2010 01:42 am Iustin Pop

Introduce a relaxed add instance mode

In case an instance is living on an offline node, it doesn't make sense
to refuse moving it because that would create N+1 failures; failing N+1
is still much better than not running at all. Similarly, if the
secondary node of an instance is offline, meaning the instance doesn't...

2849670b 07/19/2010 02:20 pm Iustin Pop

Remove obsolete Container.maxNameLen

This was only used in one place (hbal), and is made obsolete by the change to
the dual name/alias structure.

14c972c7 07/19/2010 02:20 pm Iustin Pop

hbal: print short names in steps list

This was a regression from the name handling changes, as we started
using the original names for the solution list (which is not designed
for parsing/feeding back into ganeti).

fb33aaaf 07/19/2010 02:20 pm Iustin Pop

Remove an obsolete function

printSolution is no longer used, as we print the solution iteratively
now.

6dfa04fd 07/19/2010 12:13 am Iustin Pop

Allow '+' in node list fields

When the field list is prefixed with a plus sign, this will extend the
default field list, instead of replacing it entirely.
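
A minimal sketch of the prefix handling is shown below. The contents of `defaultFields` and the helper names are assumptions for this example; only the '+' semantics come from the commit message.

```haskell
-- | Hypothetical default field list.
defaultFields :: [String]
defaultFields = ["name", "pcnt", "scnt"]

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOn sep rest

-- | A leading '+' extends the defaults; otherwise the given
-- comma-separated list replaces them entirely.
selectFields :: String -> [String]
selectFields ('+' : rest) = defaultFields ++ splitOn ',' rest
selectFields spec         = splitOn ',' spec
```

Under these assumptions, `selectFields "+peermap,index"` keeps the three default fields and appends `peermap` and `index`, while `selectFields "peermap,index"` yields only the two named fields.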

16f08e82 07/19/2010 12:13 am Iustin Pop

Update the node list fields

This patch renames the pri/sec to pcnt/scnt, and adds the real primary
and secondary instance lists, the peermap and the index of a node as
selectable options.

124b7cd7 07/19/2010 12:13 am Iustin Pop

Cleanup a node's peer map when possible

If the last secondary instance of a peer is deleted (detected by the new
peer memory value being equal to zero), then the pair (pdx, 0) should be
deleted completely. This is not an optimization per se, but rather cleanup...
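
The cleanup rule can be sketched as below, assuming the peer map is an `IntMap` from peer node index to reserved memory. That representation and the name `updatePeer` are assumptions for this example.

```haskell
import qualified Data.IntMap as IM

-- | Hypothetical peer map: peer node index -> memory reserved for it.
type PeerMap = IM.IntMap Int

-- | Store the new memory value for a peer. A value of zero means the
-- last secondary instance of that peer is gone, so the entry is
-- removed instead of being kept as (pdx, 0).
updatePeer :: Int -> Int -> PeerMap -> PeerMap
updatePeer pdx 0   = IM.delete pdx
updatePeer pdx mem = IM.insert pdx mem
```

Dropping the zero entries keeps two peer maps with the same effective contents equal under `==`, which makes node comparisons in tests more predictable.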

f9acea10 07/16/2010 09:31 pm Iustin Pop

Fix handling of offline options and short names

This needs to be abstracted in a separate function, but in the meantime
we fix the issue in both places.

Signed-off-by: Iustin Pop <>

95446d7a 06/21/2010 12:12 pm Iustin Pop

Fix another haddock special-char issue

db079755 06/21/2010 05:59 am Iustin Pop

Remove JOB_STATUS_GONE and add unittests

… for the serialization/deserialization of the job and opcode status.

Job status 'gone' was not actually used. It can be reintroduced if
needed.

41065165 06/21/2010 05:46 am Iustin Pop

Add opcode status constants/type

This mirrors, again, the Ganeti constants, and is added for future use.

7e98f782 06/21/2010 05:46 am Iustin Pop

Rename the job status constants

The rename is done such that we match Ganeti's own constants.

95f490de 06/08/2010 03:48 am Iustin Pop

Optimise the Luxi.recvMsg function

Since the current buffer cannot contain (during network reads) an EOM,
we should look for the EOM only in the newly-received string. While
this shouldn't make much difference, in some tests it cuts the recvMsg
total time by around half....
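
The optimisation can be illustrated with a pure helper that scans only the new chunk. The marker value `'\3'` and the name `feedChunk` are assumptions for this sketch, and `String` is used instead of whatever buffer type the real code uses.

```haskell
-- | End-of-message marker (assumed value for this sketch).
eom :: Char
eom = '\3'

-- | Feed a newly received chunk to the receiver. The accumulated
-- buffer is known not to contain an EOM, so only the chunk is scanned.
-- Left: no EOM yet, returns the grown buffer.
-- Right: a complete message plus the leftover after the marker.
feedChunk :: String -> String -> Either String (String, String)
feedChunk buf chunk = case break (== eom) chunk of
  (_, [])         -> Left (buf ++ chunk)
  (msg, _ : rest) -> Right (buf ++ msg, rest)
```

Scanning `buf ++ chunk` on every read would re-examine the same bytes repeatedly; scanning only `chunk` keeps the search cost proportional to the data just received.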

04282772 06/08/2010 01:09 am Iustin Pop

Complete the client Luxi implementation

All current Luxi calls are supported after this patch. A bug in
ArchiveJob is also fixed (Ganeti's job IDs are strings).

9622919d 06/08/2010 12:35 am Iustin Pop

Add support for more LUXI calls

While not all are directly useful, having them will open some possibilities
(e.g. polling for job changes in hbal's -X mode, and auto-archiving the
jobs once they are successful).

4a007641 06/03/2010 12:08 am Iustin Pop

Fix some lint errors in the unit tests

683b1ca7 06/02/2010 11:55 pm Iustin Pop

Change the Luxi operations structure

Currently, we define the LuxiOp type as a simple enumeration, and leave
the arguments structure to the users of the Ganeti.Luxi module. This is
suboptimal for a couple of reasons: first, we decouple the operation
type from operation arguments, and that means we don't use the type...

9c0a748f 06/01/2010 11:51 pm Iustin Pop

Fix a warning in Loader tests

Incomplete pattern match…

c088674b 06/01/2010 08:54 pm Iustin Pop

Add a few Loader tests

These are not comprehensive, but at least we have a start.

8c5652f6 05/30/2010 09:16 pm Iustin Pop

Modify the test runner to show test exceptions

QuickCheck's batch driver (at least v1) doesn't show the test aborts,
but simply discards the specific exception and increases the abort
count. This makes it hard to debug the tests, so we modify our own test...

9e35522c 05/28/2010 12:13 pm Iustin Pop

Reduce the warnings during the unittests

Since the unittests are not 'clean' from the p.o.v. of type
declarations, and cannot be made clean in all respects (e.g. orphan
instances), we silence some warnings for the test target, to have a
cleaner output.

06fe0cea 05/28/2010 12:39 am Iustin Pop

Improve the test driver

The tests are moved to a separate data structure, and we can select a
subset of tests to run.

88f25dd0 05/28/2010 12:25 am Iustin Pop

Introduce OpCode unittests

f36a8028 05/28/2010 12:00 am Iustin Pop

Introduce support for optional keys in JObjects

Some keys are optional in the Ganeti opcodes (e.g. ‘node’ in the
OpReplaceDisks), and as such we need to transform them into a Maybe value,
instead of failing.

The patch reworks a bit fromObj and adds maybeFromObj which parses such...
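
The difference between a mandatory and an optional key can be sketched as follows. To stay self-contained, this example models an object as a list of string pairs and parses `Int` values only; the real functions work on JSON values, and their signatures may differ.

```haskell
-- | Simplified stand-in for a JSON object: key/raw-value pairs.
type JObject = [(String, String)]

-- | Parse a raw value as an Int.
parseInt :: String -> Either String Int
parseInt s = case reads s of
  [(v, "")] -> Right v
  _         -> Left ("cannot read Int from '" ++ s ++ "'")

-- | Mandatory key: a missing key is an error.
fromObj :: String -> JObject -> Either String Int
fromObj key obj =
  maybe (Left ("key '" ++ key ++ "' not found")) parseInt (lookup key obj)

-- | Optional key: a missing key yields Nothing, but a key that is
-- present with a malformed value is still an error.
maybeFromObj :: String -> JObject -> Either String (Maybe Int)
maybeFromObj key obj = traverse parseInt (lookup key obj)
```

Keeping the malformed-value case as an error means that an optional key only relaxes the presence requirement, not the validity of the value.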

c96d44df 05/27/2010 11:37 pm Iustin Pop

Replace fromJResult with annotateJResult

This patch removes all old uses of fromJResult with the annotated
version, and removes the non-annotated version. All JSON parsing points
should now have annotated errors.

c8b662f1 05/27/2010 11:32 pm Iustin Pop

Add annotations to loadJSArray

This allows, for example, the RAPI backend to detail which information
(instance or node data) fails to parse.

50d26669 05/27/2010 11:23 pm Iustin Pop

Change fromObj error messages

Currently fromObj doesn't detail what we're trying to read, which can
lead to cryptic messages: "Cannot read Int". The patch changes this
function to annotate the error messages with the key/value we're trying
to convert, by using a new version of fromJResult....
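
A sketch of the annotation idea follows. The helper names (`annotate`, `readInt`, `fromObjInt`) and the exact message wording are assumptions for this example; only the unannotated text "Cannot read Int" comes from the commit message.

```haskell
-- | Read an Int, failing with an unannotated message.
readInt :: String -> Either String Int
readInt s = case reads s of
  [(v, "")] -> Right v
  _         -> Left "Cannot read Int"

-- | Prefix an error with a description of what was being converted.
annotate :: String -> Either String a -> Either String a
annotate ctx (Left err) = Left (ctx ++ ": " ++ err)
annotate _   ok         = ok

-- | Look up a key in a list of raw pairs and convert its value,
-- reporting the key and the raw value on failure.
fromObjInt :: String -> [(String, String)] -> Either String Int
fromObjInt key obj = case lookup key obj of
  Nothing  -> Left ("key '" ++ key ++ "' not found")
  Just raw -> annotate ("key '" ++ key ++ "', value '" ++ raw ++ "'") (readInt raw)
```

With this wrapper, a failure reads "key 'mem', value 'abc': Cannot read Int" rather than the bare conversion error, so the offending field can be located directly.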

82ea2874 05/27/2010 01:11 am Iustin Pop

A few more small Node unit-tests

39d11971 05/25/2010 08:17 pm Iustin Pop

Add more unittests

Instance, Node and Text modules have improved coverage.

3fea6959 05/20/2010 07:45 pm Iustin Pop

Add more unit tests for allocation/balance

The patch adds some simple unit-tests for both the allocation function
(we can allocate small instances on an empty cluster, we can allocate in
tiered mode starting from any size) and the balancing functions (one...

3ce8009a 05/20/2010 01:31 pm Iustin Pop

Move two functions from hspace to Cluster.hs

This is done so we can test a longer pipeline.