root / Ganeti / HTools / Cluster.hs @ d6c76bd5

# Date Author Comment
d6c76bd5 02/01/2011 02:08 pm Iustin Pop

tryAlloc: restrict valid node pairs to same-group

This is a cheap way to make capacity calculation work well with
multi-group clusters.

There are two alternatives in implementing this:

- we can split the cluster into groups, run individual group
allocation, and then try to recombine the groups; but this doesn't...

40ee14bc 02/01/2011 02:08 pm Iustin Pop

Cluster.hs: add a new type alias

Just a bit of small cleanup, since we might want to use more functions
with this signature in the future.

Signed-off-by: Iustin Pop <>
Reviewed-by: Michael Hanselmann <>
Reviewed-by: Guido Trotter <>

1bc47d38 01/07/2011 05:39 pm Iustin Pop

Convert node evacuation to multi-group

This patch does the necessary changes to make the new tryMGEvac work
correctly: each instance remains inside its primary node's group when
it is evacuated.

This is done by splitting up the to-be-evacuated instance list per...

2ca68e2b 01/07/2011 05:39 pm Iustin Pop

Evacuation: extract the inner fold function

This makes the code more readable, which will help with the
multi-group evacuation.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

4bc33d60 01/07/2011 05:39 pm Iustin Pop

Instance relocation: stay within the current group

This patch adds a new top-level relocation function that restricts the
relocation to the instance's group, and switches hail to it.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

cb0c77ff 12/30/2010 03:56 pm Iustin Pop

Container: remove fromAssocList

Container.fromAssocList is just a re-export of IntMap.fromList; it
makes sense to remove it and simply export the original name, as it
needs just a bit of renaming in the rest of the code.

Signed-off-by: Iustin Pop <>...

a3eee4ad 12/30/2010 03:56 pm Iustin Pop

Parallelize the balancing computations

This small patch changes the balancing computation to work in
parallel, if possible.

While the normal linking is against the single-threaded runtime, if
the code is linked against the multi-threaded one, the balancing will...

d5ccec02 12/30/2010 03:46 pm Iustin Pop

Allocation routines: return list of resource stats

Currently, the allocation routines (iterateAlloc and tieredAlloc)
return only the final state of the cluster and the list of allocated
instances. For better visibility in how the cluster resources change,...

f52dadb2 12/30/2010 03:43 pm Iustin Pop

Fix updating of available (V)CPUs in CStats

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

6bc39970 12/30/2010 03:41 pm Iustin Pop

Add 'Read' instances for most objects

This allows a cluster structure to be easily serialized via "read";
together with the already existing instances of Show, this gives a
poor man's serialization/deserialization implementation.

The patch also exports the compDetailedCV function from Cluster.hs, so...
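
The Show/Read round trip described in this commit can be sketched as follows. This is only an illustration: the `Node` record and the `serialize`/`deserialize` helpers are hypothetical stand-ins, not the actual htools types or functions.

```haskell
-- Hypothetical stand-in for a cluster object; the real htools types
-- carry many more fields.
data Node = Node
  { name  :: String
  , tMem  :: Double
  , group :: Int
  } deriving (Show, Read, Eq)

-- Serialize a list of nodes to a String via the derived Show instance.
serialize :: [Node] -> String
serialize = show

-- Deserialize via the derived Read instance. Using 'reads' instead of
-- 'read' turns malformed input into Nothing rather than a runtime error.
deserialize :: String -> Maybe [Node]
deserialize s = case reads s of
  [(nodes, rest)] | all (`elem` " \n\t") rest -> Just nodes
  _                                           -> Nothing
```

For any value built only from types with derived instances, `deserialize (serialize xs)` should give back `Just xs`.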

50211c86 12/23/2010 05:17 pm Iustin Pop

Merge branch 'stable-0.2'

  • devel-0.2:
    Update NEWS file for 0.2.8 release
    hbal: return meaningful exit code for job failures
    Change the balancing function

4715711d 12/23/2010 02:25 pm Iustin Pop

Change the balancing function

Currently the balancing function is a modified version of the standard
deviation (stddev divided by list length), due to historical reasons.

While this works fine for small clusters, for big clusters it makes
the balancing effect too "weak", and in some cases it refuses to...
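
To illustrate why dividing by the list length weakens the metric on big clusters, here is a minimal sketch. It assumes the old function is literally "population stddev divided by list length", as the message says; the names `stdDev` and `modifiedStdDev` are hypothetical, and the replacement function is not shown because the message is truncated here.

```haskell
-- Plain population standard deviation of a list of metric values.
stdDev :: [Double] -> Double
stdDev [] = 0
stdDev xs = sqrt (sum [ (x - mean) ^ (2 :: Int) | x <- xs ] / n)
  where n    = fromIntegral (length xs)
        mean = sum xs / n

-- The "modified" variant as described in the commit message:
-- the standard deviation divided once more by the list length.
modifiedStdDev :: [Double] -> Double
modifiedStdDev [] = 0
modifiedStdDev xs = stdDev xs / fromIntegral (length xs)
```

With the same relative imbalance repeated across more nodes, `stdDev` stays constant while `modifiedStdDev` shrinks in proportion to the node count, so improvements on a large cluster look smaller.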

949397c8 12/23/2010 11:16 am Iustin Pop

Move some tiered spec functionality to Cluster.hs

This splits out a bit of code from hspace.hs and moves it into its own
function in Cluster.hs.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

73206d0a 12/20/2010 02:23 pm Iustin Pop

IAllocator: respect the alloc_policy for groups

This patch changes the allocate mode to respect the alloc_policy for
groups. It does this by changing the sort key from simply the solution
score, to a tuple with two elements: the alloc policy (which is now an...
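
The change of sort key can be sketched as below. The `AllocPolicy` constructors, the `GroupSolution` record and both sort functions are hypothetical names chosen for illustration, and the sketch assumes that a lower score is better and that constructor order encodes policy preference.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical policy type; the derived Ord follows constructor order,
-- so Preferred sorts before LastResort.
data AllocPolicy = Preferred | LastResort | Unallocable
  deriving (Show, Eq, Ord)

-- Hypothetical per-group allocation result.
data GroupSolution = GroupSolution
  { gsGroup  :: String
  , gsPolicy :: AllocPolicy
  , gsScore  :: Double      -- assumed: lower is better
  } deriving (Show, Eq)

-- Old behaviour: sort on the solution score only.
sortByScore :: [GroupSolution] -> [GroupSolution]
sortByScore = sortBy (comparing gsScore)

-- New behaviour: sort on the tuple (policy, score), so the score only
-- breaks ties between groups with the same policy.
sortByPolicyThenScore :: [GroupSolution] -> [GroupSolution]
sortByPolicyThenScore = sortBy (comparing (\s -> (gsPolicy s, gsScore s)))
```

With the tuple key, a preferred group with a worse score still sorts ahead of a last-resort group with a better one.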

aec636b9 12/20/2010 02:23 pm Iustin Pop

hail: display group names in info messages

This patch switches from the group index to the group name for the
informational messages in the hail results.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

10ef6b4e 12/20/2010 02:23 pm Iustin Pop

Change the Node.group attribute

Currently, the Node.group attribute is the UUID of the group, as until
recently Ganeti didn't export the node group properties. Since it does
so now, we make the following changes (again apologies for a big
patch):

- we change the group attribute to be an index, similar to the way an...

dec88196 12/09/2010 04:08 pm Iustin Pop

Improve error reporting for small clusters

When doing a two-node allocation on a cluster/group in which only one
node is online, or a one-node allocation without any online nodes, we
didn't show a valid error message. The patch changes tryAlloc to "fail
hard" in this case, to make the failure explicit....

9b1584fc 12/09/2010 04:08 pm Iustin Pop

hail/allocate: implement multi-group support

This is a bit hackish. We add a new function that takes the input data,
splits it into groups, runs the original tryAlloc for each group, and
then chooses the best solution, but adds the log messages from all the...

859fc11d 12/09/2010 04:08 pm Iustin Pop

Add a 'log' attribute to allocation solutions

And also a couple of functions for describing a given solution; these
will be used in the future instead of the ones currently in hail.

The patch also enhances the description of failure messages.

Signed-off-by: Iustin Pop <>...

85d0ddc3 12/09/2010 04:08 pm Iustin Pop

Change AllocSolution from tuple to its own type

Tuples are good for two, three, at most four elements. Beyond that, the
continuous pattern matching and construction/deconstruction becomes
tedious.

Since in the future we'll probably keep more information in the...

a334d536 12/01/2010 07:08 pm Iustin Pop

Cleanup AllocSolution after AllocElement changes

Since we added the score to AllocElement, we don't need to wrap
AllocElement in yet another tuple, just to attach the cluster score. So
we simplify the AllocSolution type.

Signed-off-by: Iustin Pop <>...

7d3f4253 12/01/2010 07:08 pm Iustin Pop

AllocElement: extend with the cluster score

AllocElement, a type used as a result of allocations, holds the status
of the nodes after the allocation. In most cases, we'll compare this
allocation result with others, to see which allocation decision makes
the most sense. This comparison is done via the cluster score....

06fb841e 12/01/2010 07:08 pm Iustin Pop

Add two utility functions for the Result type

Actually, this just moves the functions from the QC module to Types, and
removes a duplicate entry from Cluster.

Signed-off-by: Iustin Pop <>
Reviewed-by: Balazs Lecz <>

f4161783 12/01/2010 03:00 pm Iustin Pop

Add Cluster.splitCluster for node groups

This splits a top-level cluster information into the component node
groups. Instances go to the group of their primary node, but otherwise we
don't disallow split instances.

Signed-off-by: Iustin Pop <>...

32b8d9c0 12/01/2010 03:00 pm Iustin Pop

Add two functions for checking cluster consistency

For now, we don't support instances allocated across two groups, and we
will reject such clusters. The isClusterConsistent function will return
a list of inconsistent instances, potentially allowing operation without...

306cccd5 11/09/2010 09:37 am Iustin Pop

Fix tag exclusion weight

Currently, the tag exclusion metric has a weight of one, which means
there might be cases where we won't move instances around because it
upsets the cluster metrics. However, we do want to make a higher effort
for cleaning up tag collisions, so we increase the weight to an...

848b65c9 09/03/2010 06:02 pm Iustin Pop

Use the mingain options in the balancing algorithm

Also adds them in hbal.

94d08202 08/30/2010 12:12 pm Iustin Pop

Change iterateAlloc to return the instance list

The Cluster.iterateAlloc and tieredAlloc functions are changed to also
return the updated instance list, since it is needed to have a “full”
cluster view.

0ca66853 07/27/2010 09:44 pm Iustin Pop

hail: fix error message for failed multi-evac

Currently we show the instance index, but this makes no sense outside
the current running program. Instead, we show the instance name.

c3c7a0c1 07/22/2010 01:42 am Iustin Pop

Change the meaning of the N+1 fail metric

Currently, this metric tracks the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric since it will rarely change during a step (only at the last
instance moving away). Therefore we replace it with the count of...

8a3b30ca 07/22/2010 01:42 am Iustin Pop

Introduce per-metric weights

Currently all metrics have the same weight (we just sum them together).
However, for the hard constraints (N+1 failures, offline nodes, etc.)
we should handle the metrics differently based on their meaning. For
example, an instance living on a primary offline node is worse than an...

2cae47e9 07/22/2010 01:42 am Iustin Pop

Allow balancing moves to introduce N+1 errors

This patch switches the applyMove function to the extended versions of
Node.addPri and addSec, and passes the override flag based on the state
of the node that we're moving away from.

14c972c7 07/19/2010 02:20 pm Iustin Pop

hbal: print short names in steps list

This was a regression from the name handling changes, as we started
using the original names for the solution list (which is not designed
for parsing/feeding back into ganeti).

fb33aaaf 07/19/2010 02:20 pm Iustin Pop

Remove an obsolete function

printSolution is no longer used, as we print the solution iteratively
now.

6dfa04fd 07/19/2010 12:13 am Iustin Pop

Allow '+' in node list fields

When the field list is prefixed with a plus sign, this will extend the
default field list, instead of replacing it entirely.
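
A minimal sketch of the plus-prefix behaviour, assuming a comma-separated field string. The default field names and the functions `splitOnComma` and `selectFields` are hypothetical, not taken from the htools source.

```haskell
-- Hypothetical default field list, for illustration only.
defaultFields :: [String]
defaultFields = ["name", "pmem", "pdsk"]

-- Split a comma-separated string into its parts.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (f, [])       -> [f]
  (f, _ : rest) -> f : splitOnComma rest

-- A leading '+' extends the defaults; anything else replaces them.
-- Empty field names (e.g. from a trailing comma) are dropped.
selectFields :: String -> [String]
selectFields ('+' : extra) = defaultFields ++ filter (not . null) (splitOnComma extra)
selectFields fields        = filter (not . null) (splitOnComma fields)
```

Here `selectFields "+load"` appends `load` to the defaults, while `selectFields "name,load"` uses exactly those two fields.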

3fea6959 05/20/2010 07:45 pm Iustin Pop

Add more unit tests for allocation/balance

The patch adds some simple unit-tests for both the allocation function
(we can allocate small instances on an empty cluster, we can allocate in
tiered mode starting from any size) and the balancing functions (one...

3ce8009a 05/20/2010 01:31 pm Iustin Pop

Move two functions from hspace to Cluster.hs

This is done so we can test a longer pipeline.

8423f76b 05/20/2010 01:31 pm Iustin Pop

Make CStats an instance of Show

This helps debugging via ghci.

3e4480e0 05/20/2010 12:07 pm Iustin Pop

Stop modifying names for internal computations

Currently the name used internally is modified and holds the shortened
name of the nodes/instances. This has caused issues before, since we
always have to strip the suffix from input data and reapply it if we...

f4c0b8c5 05/18/2010 07:31 pm Iustin Pop

Remove the noLimit values and always use limits

This patch moves away from allowing no-limits for disk/cpu ratios, and always
uses a real limit. For disk, it's simple since we use 0, which means no
reservations for disks. For CPU, we set an (arbitrary) limit of 64 v/p,...

e2436511 05/04/2010 02:42 pm Iustin Pop

Fix hspace's KM metrics

We returned the KM_POOL_* metrics as the final state, not as the delta
between the final and the initial state.

9b8fac3d 04/15/2010 12:50 pm Iustin Pop

Add a new function to compute allocation deltas

Given two cluster states, the new function can answer the following
questions:

- how many resources are currently allocated
- how many resources are finally allocated (delta from above is how much we
can actually allocate on the cluster)...

86ecce4a 04/15/2010 12:27 pm Iustin Pop

Introduce total vcpu tracking in CStats

We add a new field that tracks the available virtual cpus (expressed as
node cpus times the vcpu ratio).

5182e970 02/25/2010 03:39 pm Iustin Pop

A number of small fixes from hlint

c424cdc8 02/23/2010 02:13 pm Iustin Pop

balance function: use the movable flag directly

Instead of deciding based on secondary node, use the new flag.

12b0511d 02/22/2010 04:19 pm Iustin Pop

Add a tryEvac function

This will be used by the node evacuate IAllocator request type.

Signed-off-by: Iustin Pop <>

1fe81531 02/22/2010 04:19 pm Iustin Pop

Move a type declaration to Node.hs

We'll need AllocElement in both Cluster and IAlloc in the future, so we
move it to Node.hs which is imported by both.

Signed-off-by: Iustin Pop <>

23f9ab76 02/22/2010 04:19 pm Iustin Pop

Change an internal type from Maybe to list

In preparation for multiple responses, we change from Maybe to List
(both used in the container sense).

This allows us to keep the same workflow for all kinds of requests.

Signed-off-by: Iustin Pop <>

2e28ac32 02/22/2010 03:50 pm Iustin Pop

Implement evacuation mode in hbal

This mode restricts the list of instances to be moved to the instances
living on the offline (and drained) nodes.

Signed-off-by: Iustin Pop <>

a804261a 01/14/2010 06:38 pm Iustin Pop

Move instance relocation test higher up in the chain

Currently we test each instance for relocation in checkMove; however, it
is a little more clear if we pass only the relocatable instances to
checkMove. The patch also slightly rewrites (indentation/style) the...

5ad86777 01/14/2010 06:05 pm Iustin Pop

Split the balancing function in two parts

Currently in the balancing function we do two things:

- decide whether to do a new balancing round or not
- actually compute the balancing round

This is not nice, as the two parts are conceptually separate, so this...

0c860cff 12/11/2009 07:01 pm Iustin Pop

Convert n1_score metric from % to count

This increases the priority of fixing N+1 failures compared to balancing
metrics.

673f0f00 12/11/2009 06:47 pm Iustin Pop

Metric: count of primary instances/offline nodes

This helps with evacuation/failover of instances on 2-node clusters with
one node offline.

e4d31268 12/11/2009 06:43 pm Iustin Pop

Offline instance metric: change from % to count

Currently we use the offline instance percentage (with range [0, 1]),
but this is not good, since we want the evacuation of such instances to
have a high priority; therefore we change this to a count of offline...

d844fe88 11/17/2009 11:44 am Iustin Pop

Use conflicting primaries count in cluster score

This small patch adds the number of conflicting primaries in the cluster
score. This is different from the other non-CV metrics where we usually
compute the percentage of failing instances (for that metric); but for a...

e98fb766 11/10/2009 02:59 pm Iustin Pop

Allow overriding the field list in -p

The print nodes option can now accept an optional field list to
customise the output. This is ugly, since the field names do not match
the header names, but it is at least barely customisable (at runtime).

76354e11 11/09/2009 05:49 pm Iustin Pop

Move more node-listing functionality into Node.hs

This will prepare for the runtime-selectable field list.

daee4bed 11/09/2009 03:43 pm Iustin Pop

Add a few comments in the scoring function

30ff0c73 10/21/2009 11:47 am Iustin Pop

Expand the --print-instances output

This adds run status, resource parameters and load parameters for
instances.

8c9af2f0 10/19/2009 12:17 am Iustin Pop

Simplify the cstats initializer

Since all values are initialized to zero, the exact ordering is not
important and thus we can use the positional mode for simpler code.

The patch also adds docstrings to the cstats functions.

668c03b3 10/19/2009 12:11 am Iustin Pop

Simplify Cluster.computeMoves

Since we now have an actual type for describing the instance moves
(IMove), it's simpler to convert this into the move description/move
commands, rather than re-computing the move based on initial and final
nodes. This makes the shell commands computation and over-Luxi command...

eb2598ab 10/18/2009 11:20 pm Iustin Pop

Remove obsolete export

The ‘Placement’ type has been moved to Types.hs but we kept exporting it
from Cluster, which is not needed.

c5f7412e 10/18/2009 08:21 pm Iustin Pop

Generalise the node/instance listing

This patch introduces a generic formatTable function (based on, and
similar to the Ganeti one, but different and more FP in style) and
changes the node and instance listing to it.

The node list (due to the many variables) is still a little bit hackish...

ad6cffe4 10/18/2009 07:38 pm Iustin Pop

Fix instance listing for non-redundant case

ee9724b9 10/16/2009 04:59 pm Iustin Pop

Start using the utilisation scores in balancing

This enables the per-node load/total available capacity scores to be
used in balancing. Note that the total available capacity is currently
fixed at zero and cannot be changed by the user.

183a9c3d 10/16/2009 10:09 am Iustin Pop

Show the load on nodes in node lists

The strange printf usage is due to some limitation (it seems) in ghc for
very long argument lists. The whole printout should be rewritten later.

507fda3f 10/15/2009 05:00 pm Iustin Pop

Allow displaying the instance map in hbal

This is similar to --print-nodes, but with much fewer fields.

f5b553da 10/14/2009 04:41 pm Iustin Pop

Style change: cluster CStats camel-casing

This is again the cs_x to csX name change.

2060348b 10/14/2009 04:41 pm Iustin Pop

Style change: node and instance attributes

This changes from a_b to aB in all node and instance attributes, to
match the standard Haskell style. Also attributes that should have been
camel-cased but weren't were changed (e.g. plist → pList, pnode →
pNode).

fca250e9 10/14/2009 01:45 pm Iustin Pop

Modify the internals of the detailed CV scores

Before we used a tuple; since we'll need more metrics in the future,
it's simpler to transform this into a list of doubles, whose elements
are handled homogeneously by all the code that needs them.

dfbbd43a 10/14/2009 11:56 am Iustin Pop

Change iMoveToJob to properly create migrates

The current Cluster.iMoveToJob always creates failovers, which is not
what we want. This simply uses the original instance's status to select
between these two (this is not optimal by the way, since the status...

924f9c16 10/14/2009 11:55 am Iustin Pop

Extend the MoveJob type to hold the instance index

This will be needed in order to generate the proper instance move commands.

Signed-off-by: Iustin Pop <>

a2e90275 10/02/2009 06:54 pm Iustin Pop

Store the instance move in the MoveJobs

This will automatically sort our Ganeti jobs into the independent job
sets, and then we can submit them separately.

92e32d76 10/02/2009 06:48 pm Iustin Pop

Move some more type definitions to Types.hs

6b20875c 10/02/2009 06:37 pm Iustin Pop

Add a function converting Placements into Jobs

This converts from htools-specific Placements into Ganeti standard
OpCodes, which will later allow execution via Luxi.

3173c987 10/02/2009 05:52 pm Iustin Pop

Record the move being performed in a Placement

This will allow a more descriptive output later in the solution list, as
opposed to trying to reconstruct the move from the node indices.

The patch also documents the Placement members.

0e8ae201 10/02/2009 02:56 pm Iustin Pop

hbal: Implement grouping of moves into jobsets

Since moving two instances between different node-quadruples (inst X: A,
B → C, D and inst Y: E, F → G, H) can be parallelised by Ganeti, it
makes sense to split the operation list into jobsets whose execution...
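
One way to split an ordered move list into jobsets is a greedy pass, sketched below. The sketch assumes the order of moves must be preserved and that two moves conflict when they share any node; the `Move` type and `groupMoves` are hypothetical names, and the actual hbal implementation may differ.

```haskell
import Data.List (foldl', intersect)

type Node = String

-- Hypothetical move description: the instance plus every node it
-- touches (old and new primary/secondary).
data Move = Move
  { mvInstance :: String
  , mvNodes    :: [Node]
  } deriving (Show, Eq)

type JobSet = [Move]

-- Greedy grouping: a move joins the current jobset when it shares no
-- node with the moves already in it; otherwise it opens a new jobset.
-- The accumulator holds the jobsets (newest first) and the nodes used
-- by the current jobset.
groupMoves :: [Move] -> [JobSet]
groupMoves = reverse . map reverse . fst . foldl' step ([], [])
  where
    step ([], _) m = ([[m]], mvNodes m)
    step (cur : done, busy) m
      | null (mvNodes m `intersect` busy) = ((m : cur) : done, mvNodes m ++ busy)
      | otherwise                         = ([m] : cur : done, mvNodes m)
```

For the example in the message, instance X (A, B to C, D) and instance Y (E, F to G, H) land in the same jobset, while a later move touching A or E starts a new one.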

fbb95f28 09/28/2009 05:09 pm Iustin Pop

Turn on, and fix, more warnings

The Makefile was intended to be -Wall and not simply -W, but I missed
that. This enables more warnings and also enables -Werror (except for
the tests).

f25e5aac 08/30/2009 06:55 pm Iustin Pop

Split the balancing algorithm in two parts

Currently the computation, recursing part and the IO part (progress
updates) of the balancing main function (iterateDepth) are all in the
same function, which makes it hard to test. This patch moves the
decision/computation part (whether to proceed one more round, whether we...

c0501c69 08/26/2009 11:07 am Iustin Pop

Implement support for 'cheap' moves only

This patch adds support for cheap (failover/migrate) operations only in
the balancing algorithm and in the hbal command line options.

This allows a very quick balancing (compared to allowing replace-disks)
which can be useful as a scheduled operation.

c9926b22 08/26/2009 10:40 am Iustin Pop

Use migrate or failover based on instance state

While we can't guarantee that the instance will be in the same state by
the time the migrate/failover command will be run, we can at least try
to do the right thing assuming no other changes to the cluster state....

2485487d 07/14/2009 05:15 pm Iustin Pop

Fix a few hlint errors

7d11799b 07/09/2009 04:58 pm Iustin Pop

Fix a haddock issue

31e7ac17 07/09/2009 04:16 pm Iustin Pop

hspace: fix failure handling of tryAlloc results

Currently hspace doesn't handle failures from tryAlloc correctly; this
patch changes the iterateDepth function in hspace to return a Result (…)
so that errors can be propagated correctly.

The patch also changes one output key to be more clear and a typo in...

478df686 07/09/2009 03:44 pm Iustin Pop

Change the tryAlloc/tryReloc workflow

Currently, the tryAlloc and tryReloc function return a list with all the
results, both failures and successes. This is fine for hail, which does
one round of allocations, but is not so good for hspace, which does
iterative rounds; since at each (successful) step we only take the best...

685935f7 07/08/2009 08:30 pm Iustin Pop

Simplify the Cluster.tryAlloc structures

Currently the tryAlloc function calls the
allocateOnSingle/allocateOnPair and then builds a new tuple with those
functions' results plus the new node list. This is however suboptimal
in two respects:
- the new nodes added are the 'old' versions of the respective nodes,...

8880d889 07/08/2009 07:38 pm Iustin Pop

Slight change to the internal allocation results

Currently the Cluster.AllocSolution type is defined as a list of
‘(OpResult Node.List, …)’ and the results for applyMove are defined as
‘(OpResult Node.List, …)’. Both of these mean that the failure/success
indication is hidden in the first element of the tuple, which makes it...

de4ac2c2 07/08/2009 12:49 pm Iustin Pop

hspace: move instance count and score into CStats

Currently the instance count and cluster score are separated from the
other initial/final phase stats, even though they are very similar. This
patch moves computation of these two into totalResources/CStats and...

8c4c6a8a 07/07/2009 12:56 pm Iustin Pop

Export more stats in hspace

This patch changes Cluster.totalResources to compute more resources and
prints them in hspace.

16103319 07/07/2009 11:06 am Iustin Pop

Fix score calculation to work with empty clusters

Currently the cluster score calculation includes an offline instance
percentage, expressed as “offline inst / (offline + online inst)”, which
results in NaN for empty clusters. This patch changes the calculation...
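
The NaN comes from floating-point 0/0. A minimal sketch of the problem and one possible guard follows; the function names are hypothetical, and the guard shown is an assumption, since the message is truncated before it describes the actual fix.

```haskell
-- Offline ratio exactly as described: offline / (offline + online).
-- For an empty cluster this is 0/0, which is NaN for Double.
naiveOfflineRatio :: Int -> Int -> Double
naiveOfflineRatio offline online =
  fromIntegral offline / fromIntegral (offline + online)

-- Guarded variant (one possible fix, assumed here): treat an empty
-- cluster as having no offline instances.
safeOfflineRatio :: Int -> Int -> Double
safeOfflineRatio offline online
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total
  where total = offline + online
```

Because NaN compares false against everything, a NaN score would make every score comparison fail, which is why the empty-cluster case needs explicit handling.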

41c3b292 07/07/2009 12:13 am Iustin Pop

Simplify Cluster.computeMoves

This patch changes the function Cluster.computeMoves to use guards and a
couple of subexpressions in order to greatly simplify it.

9f6dcdea 07/06/2009 11:50 pm Iustin Pop

Fix hlint-generated warnings

This big patch cleans up the code per hlint indications. Many removals
of extra parentheses, replacements of concat . map with concatMap,
extra dollar signs, eta reductions, etc. were performed.

The code still compiles and passes a couple of manual tests on sample...

f2280553 07/05/2009 03:53 pm Iustin Pop

Introduce a new type for allocation results

Currently the allocation/move operations workflow return ‘Maybe a’,
which is very convenient but loses all details about the failure mode.

This patch introduces a new data type which encodes the specific failure...
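
The difference between `Maybe a` and a result type that records the failure mode can be sketched as follows. The `FailMode` constructors, the constructor names of `OpResult` and the `reserveMem` helpers are assumptions made for illustration; the message is truncated before it shows the real definition.

```haskell
-- Hypothetical failure modes; the real set is not shown in the message.
data FailMode = FailMem | FailDisk | FailCPU
  deriving (Show, Eq)

-- A result type that keeps the reason for a failure.
data OpResult a
  = OpFail FailMode
  | OpGood a
  deriving (Show, Eq)

-- With Maybe, the caller only learns that the operation failed.
reserveMemMaybe :: Int -> Int -> Maybe Int
reserveMemMaybe free req
  | req > free = Nothing
  | otherwise  = Just (free - req)

-- With OpResult, the caller also learns why it failed.
reserveMem :: Int -> Int -> OpResult Int
reserveMem free req
  | req > free = OpFail FailMem
  | otherwise  = OpGood (free - req)
```

Callers can then count or report failures per mode instead of collapsing them all into `Nothing`.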

266aea94 07/05/2009 03:21 pm Iustin Pop

Remove hn1 and related code

hn1 was deprecated for a while and this patch removes it altogether. The
support code in Cluster.hs is also removed.

301789f4 07/03/2009 10:01 pm Iustin Pop

Fix totalResources avail disk computation

This uses the newly-added Node.availDisk to compute the actual available
disk correctly, and displays the total allocatable disk in hspace.

1a7eff0e 07/03/2009 12:50 am Iustin Pop

Add a new type for cluster statistics

Currently totalResources returns a 5-tuple of integers. This is not easy
to handle, as each change on the return type means that each caller must
be updated.

This patch adds a new type for cluster stats and uses that instead as...

e2af3156 07/02/2009 01:33 pm Iustin Pop

Add display of more stats in hspace

This patch changes Cluster.totalResources to compute more details about
the cluster status, and enhances hspace to display more of these.

0c936d24 06/16/2009 12:52 pm Iustin Pop

Fix a haddock/docstring issue

78694255 06/12/2009 02:22 am Iustin Pop

Fix the various monomorphism warnings

In a few places (e.g. tryRead or any printf call) it's a little bit hard
to add the correct type signatures, but in the end it is possible to fix
these warnings (which can bite one in subtle cases).

3c64b5aa 06/12/2009 01:12 am Iustin Pop

Small changes to the node list output

This is just some cleanup of the node list output, adding pcpu/vcpu
counters, and making the display slightly nicer.