root / Ganeti @ 4bc33d60

# Date Author Comment
4bc33d60 01/07/2011 05:39 pm Iustin Pop

Instance relocation: stay within the current group

This patch adds a new top-level relocation function that restricts the
relocation to the instance's group, and switches hail to it.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

cb0c77ff 12/30/2010 03:56 pm Iustin Pop

Container: remove fromAssocList

Container.fromAssocList is just a re-export of IntMap.fromList; it
makes sense to remove it and simply export the original name, as it
needs just a bit of renaming in the rest of the code.

Signed-off-by: Iustin Pop <>...

a3eee4ad 12/30/2010 03:56 pm Iustin Pop

Parallelize the balancing computations

This small patch changes the balancing computation to work in
parallel, if possible.

While the normal linking is against the single-threaded runtime, if
the code is linked against the multi-threaded one, the balancing will...

d5ccec02 12/30/2010 03:46 pm Iustin Pop

Allocation routines: return list of resource stats

Currently, the allocation routines (iterateAlloc and tieredAlloc)
return only the final state of the cluster and the list of allocated
instances. For better visibility in how the cluster resources change,...

f52dadb2 12/30/2010 03:43 pm Iustin Pop

Fix updating of available (V)CPUs in CStats

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

fd3fe74d 12/30/2010 03:43 pm Iustin Pop

RAPI: implement backwards compat with Ganeti 2.3

This is a cheap way to get back compatibility with Ganeti 2.3 (and
lower) in the RAPI backend. It is however not very safe (the /2/groups
resource could fail due to other reasons), so it is added only
temporarily....

a083e855 12/30/2010 03:42 pm Iustin Pop

Document Utils.tryFromObj

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

6bc39970 12/30/2010 03:41 pm Iustin Pop

Add 'Read' instances for most objects

This allows a cluster structure to be easily serialized via "read";
together with the already existing instances of Show, this gives a
poor man's serialization/deserialization implementation.

The patch also exports the compDetailedCV function from Cluster.hs, so...
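As a rough illustration of the Show/Read round trip described above, here is a minimal sketch. The `Node` record and its fields are hypothetical and much smaller than the real htools objects; only the derived `Show` and `Read` instances are the point.

```haskell
-- Hypothetical, simplified record; not the actual htools Node type.
data Node = Node
  { nodeName    :: String
  , nodeMem     :: Int
  , nodeOffline :: Bool
  } deriving (Show, Read, Eq)

-- | Serialise a value using its derived Show instance.
serialise :: Node -> String
serialise = show

-- | Parse a value back using the derived Read instance.
-- Returns Nothing unless the whole input is consumed by a single parse.
deserialise :: String -> Maybe Node
deserialise s = case reads s of
  [(n, rest)] | all (`elem` " \n\t") rest -> Just n
  _                                       -> Nothing
```

Using `reads` instead of `read` avoids a runtime exception on malformed input, which seems preferable when the text comes from a file.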

33e44f0c 12/30/2010 11:49 am Iustin Pop

Add maybePrintInsts for the instance listing

This again abstracts the instance listing a bit. Due to the fact that
I don't want to import Cluster.hs in CLI.hs, we pass the already
generated output. It also moves the instance display to stderr.

Signed-off-by: Iustin Pop <>...

417f6b50 12/30/2010 11:49 am Iustin Pop

Add maybePrintNodes for abstracting the node list

Since this bit of code (including the “when (isJust …)”) is used in
multiple places, let's abstract it in a function that is used
consistently. One (bad?) side-effect is that all node lists are done
to stderr, including the ones from hbal where it was previously done...

4188449c 12/30/2010 11:48 am Iustin Pop

Add maybeSaveData for cluster state saving

This functionality was replicated in multiple places (hbal & hspace),
so we abstract it for better clarity.

Additionally, in hbal we now save the state both before and after
balancing.

Signed-off-by: Iustin Pop <>...

c0e31451 12/30/2010 11:45 am Iustin Pop

Convert Text.serializeCluster to ClusterData

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f4f6eb0b 12/30/2010 11:44 am Iustin Pop

Convert the rest of the pipeline to ClusterData

This patch converts the backends and mergeData to the new ClusterData
type.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

017a0c3d 12/30/2010 11:44 am Iustin Pop

Move part of the loader pipeline to ClusterData

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

34c00528 12/30/2010 11:40 am Iustin Pop

Convert Loader.RqType to ClusterData

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

7b6e99b3 12/30/2010 11:38 am Iustin Pop

Add a new type ClusterData

This will be used to hold all the disparate uses of the cluster data:
we have either tuples with these four elements, or functions taking
these four arguments, etc.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>
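The motivation above (replacing four-element tuples and four-argument functions with one type) can be sketched as follows. The field names and the choice of groups, nodes, instances and tags as the four elements are assumptions for illustration; the commit message does not list them.

```haskell
-- Hypothetical field names and element types, for illustration only.
data ClusterData = ClusterData
  { cdGroups    :: [String]
  , cdNodes     :: [String]
  , cdInstances :: [String]
  , cdTags      :: [String]
  } deriving (Show, Eq)

-- | Shared formatting helper.
counts :: Int -> Int -> Int -> String
counts g n i =
  show g ++ " groups, " ++ show n ++ " nodes, " ++ show i ++ " instances"

-- | Tuple-based style: every caller must match all four positions.
summaryTuple :: ([String], [String], [String], [String]) -> String
summaryTuple (gl, nl, il, _) = counts (length gl) (length nl) (length il)

-- | Record-based style: callers name only the fields they need.
summary :: ClusterData -> String
summary cd =
  counts (length (cdGroups cd)) (length (cdNodes cd)) (length (cdInstances cd))
```

With the record, adding a fifth element later would not require touching every pattern match.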

6c7448bb 12/30/2010 11:38 am Iustin Pop

Simulation backend: read the allocation policy too

This patch moves the allocation policy from hardcoded to be read from
the given specification, and extends the error message for invalid
specifications.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

9983063b 12/30/2010 11:36 am Iustin Pop

Simulation backend: allow multiple node groups

This patch changes the behaviour of the --simulation option to be an
incremental option, where each new use defines a new node group. This
allows simulation of more complex clusters.

Signed-off-by: Iustin Pop <>...

50211c86 12/23/2010 05:17 pm Iustin Pop

Merge branch 'stable-0.2'

  • devel-0.2:
    Update NEWS file for 0.2.8 release
    hbal: return meaningful exit code for job failures
    Change the balancing function

4715711d 12/23/2010 02:25 pm Iustin Pop

Change the balancing function

Currently the balancing function is a modified version of the standard
deviation (stddev divided by list length), due to historical reasons.

While this works fine for small clusters, for big clusters it makes
the balancing effect too "weak", and in some cases it refuses to...

949397c8 12/23/2010 11:16 am Iustin Pop

Move some tiered spec functionality to Cluster.hs

This splits out a bit of code from hspace.hs and moves it into its own
function in Cluster.hs.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

10ef6b4e 12/20/2010 02:23 pm Iustin Pop

Change the Node.group attribute

Currently, the Node.group attribute is the UUID of the group, as until
recently Ganeti didn't export the node group properties. Since it does
so now, we make the following changes (again apologies for a big
patch):

- we change the group attribute to be an index, similar to the way an...

e4d8071d 12/20/2010 02:23 pm Iustin Pop

Text.hs: also save the group data when serialising

This should have been in the previous patches, but is sent separately for
clarity.

The live-test script is updated to read the first node from the
cluster, now that the text files don't start anymore with the node...

aec636b9 12/20/2010 02:23 pm Iustin Pop

hail: display group names in info messages

This patch switches from the group index to the group name for the
informational messages in the hail results.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

748d5d50 12/20/2010 02:23 pm Iustin Pop

Generalise the sepSplit function

Currently it works on splitting strings by individual chars, but we
can generalise it to split lists by list elements, which means we can
reuse it later in the Text module for splitting both lists of chars by
'|' or lists of lines by empty newlines. The change also makes the...
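A generalised splitter of this kind might look like the sketch below. It is one possible implementation, not necessarily the code in the patch; the only requirement on the element type is `Eq`.

```haskell
-- | Split a list on a separator element (sketch, assuming Eq is enough).
sepSplit :: Eq a => a -> [a] -> [[a]]
sepSplit sep s
  | null s    = []                  -- nothing to split
  | null xs   = [x]                 -- no separator left
  | null ys   = [x, []]             -- input ended with a separator
  | otherwise = x : sepSplit sep ys
  where
    (x, xs) = break (== sep) s      -- x: up to the separator, xs: the rest
    ys      = drop 1 xs             -- skip the separator itself
```

The same function then covers both uses mentioned above: `sepSplit '|' "a|b|c"` splits a string into fields, and `sepSplit "" ["n1", "n2", "", "i1"]` splits a list of lines into sections at empty lines.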

a604456d 12/20/2010 02:23 pm Iustin Pop

Text.hs: change to use sepSplit

The new sepSplit function can split based on empty lines, so we remove
the hackish text splitting from before and simply use sepSplit. This
is needed as the addition of extra sections would have increased the
code linearly, which we don't want :)...

afcd5a0b 12/20/2010 02:23 pm Iustin Pop

Text.hs: also read cluster tags from the data file

This means that a file with the correct information is as accurate as
the other backends (Luxi, Rapi). Serialization of tags is in the next
patch.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

716c6be5 12/20/2010 02:23 pm Iustin Pop

Text.hs: serialize cluster tags when writing data

This is the complement to the reading part. Now the live-test works
correctly against clusters with configured exclusion tags.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

b2ba4669 12/20/2010 02:23 pm Iustin Pop

Implement a JSON instance for AllocPolicy

This will allow reading this attribute via the Rapi/Luxi backends.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

2ddabf4f 12/20/2010 02:23 pm Iustin Pop

Rapi: read the allocation policy from the cluster

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

c4c37257 12/20/2010 02:23 pm Iustin Pop

Luxi: read the allocation policy from the cluster

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f4c7d37a 12/20/2010 02:23 pm Iustin Pop

Text: read/write the allocation policy

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

73206d0a 12/20/2010 02:23 pm Iustin Pop

IAllocator: respect the alloc_policy for groups

This patch changes the allocate mode to respect the alloc_policy for
groups. It does this by changing the sort key from simply the solution
score, to a tuple with two elements: the alloc policy (which is now an...
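The two-element sort key can be illustrated with a small sketch. The constructor names, the `GroupSolution` record and the assumption that a lower score is better are all hypothetical; the idea shown is only that a derived `Ord` on the policy makes a `(policy, score)` tuple sort by policy first and by score second.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical constructors; the derived Ord ranks them in declaration order.
data AllocPolicy = Preferred | LastResort | Unallocable
  deriving (Show, Eq, Ord)

-- Hypothetical per-group result.
data GroupSolution = GroupSolution
  { gsGroup  :: String
  , gsPolicy :: AllocPolicy
  , gsScore  :: Double      -- assumption: lower is better
  } deriving (Show, Eq)

-- | Order solutions by allocation policy first, then by score.
sortSolutions :: [GroupSolution] -> [GroupSolution]
sortSolutions = sortBy (comparing (\s -> (gsPolicy s, gsScore s)))
```

Under these assumptions, a group marked as last resort is chosen only when no preferred group has a solution, even if its score is better.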

a679e9dc 12/20/2010 02:23 pm Iustin Pop

Rework the data loader pipelines to read groups

This (invasive) patch changes all the loader pipelines to read the node
groups data from the cluster, via the various backends. It is invasive
as it needs coordinated changes across all the loaders.

Note that the new group data is not used, just returned....

f4531f51 12/20/2010 02:22 pm Iustin Pop

Add lookupGroup utility function

This will be used in the various backends similar to the lookupNode
function.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

0dc1bf87 12/20/2010 02:20 pm Iustin Pop

Add a new Group.hs module describing node groups

This is not yet used by the rest of the code.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

edd0a48f 12/20/2010 02:20 pm Iustin Pop

Add the new OpQueryGroups opcode definition

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

dec88196 12/09/2010 04:08 pm Iustin Pop

Improve error reporting for small clusters

When doing a two-node allocation on a cluster/group in which only one
node is online, or a one-node allocation without any online nodes, we
didn't show a valid error message. The patch changes tryAlloc to "fail
hard" in this case, to make the failure explicit....

9b1584fc 12/09/2010 04:08 pm Iustin Pop

hail/allocate: implement multi-group support

This is a bit hackish. We add a new function that takes the input data,
splits it into groups, runs the original tryAlloc for each group, and
then chooses the best solution, but adds the log messages from all the...

859fc11d 12/09/2010 04:08 pm Iustin Pop

Add a 'log' attribute to allocation solutions

And also a couple of functions for describing a given solution; these
will be used in the future instead of the ones currently in hail.

The patch also enhances the description of failure messages.

Signed-off-by: Iustin Pop <>...

85d0ddc3 12/09/2010 04:08 pm Iustin Pop

Change AllocSolution from tuple to its own type

Tuples are good for two, three, at most four elements. Beyond that, the
continuous pattern matching and construction/deconstruction becomes
tedious.

Since in the future we'll probably keep more information in the...

a334d536 12/01/2010 07:08 pm Iustin Pop

Cleanup AllocSolution after AllocElement changes

Since we added the score to AllocElement, we don't need to wrap
AllocElement in yet another tuple, just to attach the cluster score. So
we simplify the AllocSolution type.

Signed-off-by: Iustin Pop <>...

7d3f4253 12/01/2010 07:08 pm Iustin Pop

AllocElement: extend with the cluster score

AllocElement, a type used as a result of allocations, holds the status
of the nodes after the allocation. In most cases, we'll compare this
allocation result with others, to see which allocation decision makes
the most sense. This comparison is done via the cluster score....

06fb841e 12/01/2010 07:08 pm Iustin Pop

Add two utility functions for the Result type

Actually, this just moves the functions from the QC module to Types, and
removes a duplicate entry from Cluster.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

99b63608 12/01/2010 07:08 pm Iustin Pop

Rework the types used during data loading

This improves on the previous change. Currently, the node and instance
lists shipped around during data loading are (again) association lists.
For instances it's not a big issue, but the node list is rewritten
continuously while we assign instances to nodes, and that is very slow....

2d0ca2c5 12/01/2010 07:08 pm Iustin Pop

Loader functions: move from assoc lists to maps

When loading big clusters, the association lists become a bit slow, so
we'll replace this with a simple Map String Int; the change is trivial
and can be reverted easily, while it brings up a good speedup in the...
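The change from association lists to maps can be pictured with the sketch below. The type and function names are hypothetical; only `lookup`, `Data.Map.lookup` and `Data.Map.fromList` are real library functions.

```haskell
import qualified Data.Map as M

-- Hypothetical names for the old and new name-to-index tables.
type NameList = [(String, Int)]
type NameMap  = M.Map String Int

-- | Old style: linear scan of the association list.
lookupOld :: String -> NameList -> Maybe Int
lookupOld = lookup

-- | New style: logarithmic lookup in a balanced tree.
lookupNew :: String -> NameMap -> Maybe Int
lookupNew = M.lookup

-- | Converting between the two is a single call, which is why the
-- change is easy to make and easy to revert.
toMap :: NameList -> NameMap
toMap = M.fromList
```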

6ff78049 12/01/2010 07:08 pm Iustin Pop

Convert some leftovers to NameAssoc

The type alias NameAssoc was introduced a long time ago, but there are
still a few not-yet-converted cases. In preparation for changes to that
type, let's make sure we use it consistently.

Signed-off-by: Iustin Pop <>...

f4161783 12/01/2010 03:00 pm Iustin Pop

Add Cluster.splitCluster for node groups

This splits a top-level cluster information into the component node
groups. Instances go to the group of their primary node, but otherwise we
don't disallow split instances.

Signed-off-by: Iustin Pop <>...

5ef78537 12/01/2010 03:00 pm Iustin Pop

Rework Container.hs and improve test coverage

Since some of the functions we export from Container.hs are 1:1
identical to IntMap, we can just export the originals and remove the
wrappers. This reduces the code we need to unittest.

Furthermore, we add two simple unittests for the two non-trivial...

a423b510 12/01/2010 03:00 pm Iustin Pop

Add new command-line option for group selection

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

32b8d9c0 12/01/2010 03:00 pm Iustin Pop

Add two functions for checking cluster consistency

For now, we don't support instances allocated across two groups, and we
will reject such clusters. The isClusterConsistent function will return
a list of inconsistent instances, potentially allowing operation without...

d8bcd0a8 12/01/2010 03:00 pm Iustin Pop

Add function for nodes to (nodegroup, nodes) split

Unittests included. The function will be needed for consistency checks
in the algorithms.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

c4d98278 12/01/2010 03:00 pm Iustin Pop

Add a type alias for UUIDs

This is to potentially allow easier changes later.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

b7b29191 11/24/2010 03:55 pm Iustin Pop

RAPI: read the group UUID from the server

This depends on future support from Ganeti (2.4+).

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

31463db5 11/24/2010 03:55 pm Iustin Pop

IAlloc: read group uuid from the input message

This makes the code incompatible with JSON files from Ganeti pre-2.4.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

b3707354 11/24/2010 03:55 pm Iustin Pop

Text: read/save the node group UUID

Compatibility with old text files is kept by using the default UUID if
the file (or even some records) doesn't have a UUID.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f5ed8632 11/24/2010 03:55 pm Iustin Pop

Luxi: read the node uuid from the cluster

This makes the code incompatible with Ganeti pre-2.4.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

a68004b7 11/24/2010 03:55 pm Iustin Pop

Node: add the node group's UUID

This is not used anywhere yet, and the backends are all just adding the
default UUID, not the real one.

The patch also allows displaying the group UUID in the node list.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

9b9da389 11/24/2010 03:54 pm Iustin Pop

Utils: add a default UUID

This will be used as a placeholder for the cases when we need a UUID
(any UUID), but we don't have one handy.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

a3e8da03 11/23/2010 04:46 pm Iustin Pop

Merge branch 'devel-0.2' into master

7570569e 11/23/2010 12:58 pm Iustin Pop

Improve the standard deviation computation

This does just two passes, instead of three, over the list. This reduces
the overall runtime well enough (~25%) in some tests, but it's not
reproducible using profiling, so I don't know how much the function
itself is being sped-up....
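A two-pass computation of this kind could look like the sketch below. It is not the code from the patch: the function name is hypothetical and the population form of the standard deviation is an assumption. The first pass accumulates the sum and the length together; the second accumulates the squared deviations from the mean.

```haskell
import Data.List (foldl')

-- | Population standard deviation in two passes over the list (sketch).
-- Returns NaN for an empty list, since the mean is then undefined.
stdDev :: [Double] -> Double
stdDev xs = sqrt (sqSum / fromIntegral n)
  where
    -- Pass 1: sum and element count in a single strict fold.
    (total, n) = foldl' step (0, 0 :: Int) xs
    step (s, c) x =
      let s' = s + x
          c' = c + 1
      in s' `seq` c' `seq` (s', c')
    mean = total / fromIntegral n
    -- Pass 2: sum of squared deviations from the mean.
    sqSum = foldl' (\acc x -> let d = x - mean in acc + d * d) 0 xs
```

A naive version would call `sum`, `length` and then a third traversal for the deviations; folding the sum and the count together removes one of those traversals.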

5e718042 11/19/2010 01:35 pm Iustin Pop

Simu loader: move the loading to non-IO code

While we don't actually have IO code in the Simu loader, we do have the
same interface. So we move the code again to a separate parseData
function which is exported.

b3f0710c 11/19/2010 01:08 pm Iustin Pop

Luxi loader: split parsing from loading

748bfcc2 11/19/2010 01:06 pm Iustin Pop

Rapi loader: split parsing from loading

The change is similar to the text loader change.

dadfc261 11/19/2010 01:00 pm Iustin Pop

Text loader: split parsing from loadData

This change, which will be followed by similar changes in the other
loaders, splits the parsing of the data from the actual loading from
disk. Since the parsing doesn't usually involve IO actions, we will be
able to better test the parsing. The loading becomes a smaller part of...

9d775204 11/11/2010 01:02 pm Iustin Pop

Ignore nodes which are not vm_capable

This breaks compatibility with Ganeti pre-2.3.

306cccd5 11/09/2010 09:37 am Iustin Pop

Fix tag exclusion weight

Currently, the tag exclusion metric has a weight of one, which means
there might be cases where we won't move instances around because it
upsets the cluster metrics. However, we do want to make a higher effort
for cleaning up tag collisions, so we increase the weight to an...

e3ae9508 10/07/2010 03:42 pm Iustin Pop

Fix some warnings in unittests

03c6d8fa 10/06/2010 03:56 pm Iustin Pop

Improve the error message for tiered alloc option

848b65c9 09/03/2010 06:02 pm Iustin Pop

Use the mingain options in the balancing algorithm

Also adds them in hbal.

4f807a57 09/03/2010 03:35 pm Iustin Pop

Add new CLI options for min gain during balancing

Recent hbal seems to run many steps for small improvements (< 1e-3), so
we should stop early in this case.

We add a new option (-g), that will be used for the minimum gain during
balancing. This check will only become active when the cluster score is...

74e89a14 09/02/2010 03:43 pm Iustin Pop

Fix ReplaceSecondary moves for offline nodes

The addition of a new secondary on a node is doing two memory tests:
- in strict mode, reject if we get into N+1 failure
- reject if the new instance memory is greater than the free memory (not
available memory) on the node...

adc5c176 09/02/2010 03:43 pm Iustin Pop

Add some more debugging functions

These are just variations of the standard debug, but are provided for
simpler code, since laziness can cause debug statements not to be
evaluated.

94d08202 08/30/2010 12:12 pm Iustin Pop

Change iterateAlloc to return the instance list

The Cluster.iterateAlloc and tieredAlloc functions are changed to also
return the updated instance list, since it is needed to have a “full”
cluster view.

4a273e97 08/30/2010 12:12 pm Iustin Pop

Abstract the cluster serialization from hscan.hs

This is currently hardcoded in an internal function in hscan.hs, and we
move it to Text.hs for later use.

02da9d07 08/25/2010 07:40 pm Iustin Pop

Add a new option --save-cluster

This option will in the future be used to serialize the cluster state in
hbal and hspace after the rebalance/allocation steps.

50811e2c 08/25/2010 07:04 pm Iustin Pop

Add unittest for Node text serialization

This checks that the Node text serialization and deserialization
operations are idempotent when combined with each other.

a070c426 08/25/2010 06:53 pm Iustin Pop

Switch unittest to custom hostnames

Currently, the hostnames are almost fully arbitrary chars, which breaks
the assumption that nodes/instances will be normal DNS hostnames.

This patch adds some custom generators for these hostnames, that will
allow better testing of text loader serialization/deserialization.

3bf75b7d 08/24/2010 07:30 pm Iustin Pop

Move text serialization functions to Text.hs

Currently these are in hscan, and cannot be reused easily.

0ca66853 07/27/2010 09:44 pm Iustin Pop

hail: fix error message for failed multi-evac

Currently we show the instance index, but this makes no sense outside
the current running program. Instead, we show the instance name.

b8262965 07/22/2010 04:57 pm Iustin Pop

Fix another haddock issue

691dcd2a 07/22/2010 06:03 am Iustin Pop

Remove an obsolete function and add Utils tests

2cae47e9 07/22/2010 01:42 am Iustin Pop

Allow balancing moves to introduce N+1 errors

This patch switches the applyMove function to the extended versions of
Node.addPri and addSec, and passes the override flag based on the state
of the node that we're moving away from.

8a3b30ca 07/22/2010 01:42 am Iustin Pop

Introduce per-metric weights

Currently all metrics have the same weight (we just sum them together).
However, for the hard constraints (N+1 failures, offline nodes, etc.)
we should handle the metrics differently based on their meaning. For
example, an instance living on a primary offline node is worse than an...

c3c7a0c1 07/22/2010 01:42 am Iustin Pop

Change the meaning of the N+1 fail metric

Currently, this metric tracks the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric since it will rarely change during a step (only at the last
instance moving away). Therefore we replace it with the count of...

223dbe53 07/22/2010 01:42 am Iustin Pop

Add some more imports to QC.hs

This is needed so that in the coverage report we list all modules, even
the ones we don't test at all, such that we get the complete results.

3e3c9393 07/22/2010 01:42 am Iustin Pop

Introduce a relaxed add instance mode

In case an instance is living on an offline node, it doesn't make sense
to refuse moving it because that would create N+1 failures; failing N+1
is still much better than not running at all. Similarly, if the
secondary node of an instance is offline, meaning the instance doesn't...

fb33aaaf 07/19/2010 02:20 pm Iustin Pop

Remove an obsolete function

printSolution is no longer used, as we print the solution iteratively
now.

14c972c7 07/19/2010 02:20 pm Iustin Pop

hbal: print short names in steps list

This was a regression from the name handling changes, as we started
using the original names for the solution list (which is not designed
for parsing/feeding back into ganeti).

2849670b 07/19/2010 02:20 pm Iustin Pop

Remove obsolete Container.maxNameLen

This was only used in one place (hbal), and is made obsolete by the
change to the dual name/alias structure.

6dfa04fd 07/19/2010 12:13 am Iustin Pop

Allow '+' in node list fields

When the field list is prefixed with a plus sign, this will extend the
default field list, instead of replacing it entirely.
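The '+' prefix rule can be sketched as below. The default field list, the function names and the comma-separated input format are assumptions for illustration; the commit message only states that a leading plus sign extends the defaults instead of replacing them.

```haskell
-- Hypothetical default field list.
defaultFields :: [String]
defaultFields = ["name", "pcnt", "scnt"]

-- | Split a comma-separated string into fields (assumed input format).
splitComma :: String -> [String]
splitComma s = case break (== ',') s of
  (x, [])       -> [x]
  (x, _ : rest) -> x : splitComma rest

-- | A leading '+' extends the defaults; otherwise the list replaces them.
selectFields :: String -> [String]
selectFields ('+' : rest) = defaultFields ++ splitComma rest
selectFields spec         = splitComma spec
```

With these definitions, `selectFields "+peermap,index"` keeps the three defaults and appends the two extra fields, while `selectFields "name,index"` yields exactly those two.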

16f08e82 07/19/2010 12:13 am Iustin Pop

Update the node list fields

This patch renames the pri/sec to pcnt/scnt, and adds the real primary
and secondary instance lists, the peermap and the index of a node as
selectable options.

124b7cd7 07/19/2010 12:13 am Iustin Pop

Cleanup a node's peer map when possible

If the last secondary instance of a peer is deleted (detected by the new
peer memory value being equal to zero), then the pair (pdx, 0) should be
deleted completely. This is not optimization per se, but rather cleanup...
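The cleanup rule can be shown with a small sketch. Representing the peer map as an `IntMap` from peer index to reserved memory is an assumption made here for brevity, and `updatePeer` is a hypothetical name; the point is only that a zero value removes the entry rather than storing `(pdx, 0)`.

```haskell
import qualified Data.IntMap as IM

-- Assumed representation: peer node index -> memory reserved for that peer.
type PeerMap = IM.IntMap Int

-- | Store the new peer memory, dropping the entry entirely when it is zero.
updatePeer :: Int -> Int -> PeerMap -> PeerMap
updatePeer pdx 0   = IM.delete pdx
updatePeer pdx mem = IM.insert pdx mem
```

Keeping zero-valued entries out of the map means two nodes with the same real peers compare equal, regardless of which instances were added and removed along the way.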

95446d7a 06/21/2010 12:12 pm Iustin Pop

Fix another haddock special-char issue

db079755 06/21/2010 05:59 am Iustin Pop

Remove JOB_STATUS_GONE and add unittests

… for the serialization/deserialization of the job and opcode status.

Job status 'gone' was not actually used. It can be reintroduced if
needed.

41065165 06/21/2010 05:46 am Iustin Pop

Add opcode status constants/type

This mirrors, again, the Ganeti constants, and is added for future use.

7e98f782 06/21/2010 05:46 am Iustin Pop

Rename the job status constants

The rename is done such that we match Ganeti's own constants.

95f490de 06/08/2010 03:48 am Iustin Pop

Optimise the Luxi.recvMsg function

Since the current buffer cannot contain (during network reads) an EOM,
we should look for the EOM only in the newly-received string. While
this shouldn't make much difference, in some tests it cuts the recvMsg
total time by around half....

04282772 06/08/2010 01:09 am Iustin Pop

Complete the client Luxi implementation

All current Luxi calls are supported after this patch. A bug in
ArchiveJob is also fixed (Ganeti's job IDs are strings).

9622919d 06/08/2010 12:35 am Iustin Pop

Add support for more LUXI calls

While not all are directly useful, having them will open some possibilities
(e.g. polling for job changes in hbal's -X mode, and auto-archiving the
jobs once they are successful).