tryAlloc: restrict valid node pairs to same-group
This is a cheap way to make capacity calculation work well with multi-group clusters.
There are two alternatives in implementing this:
- we can split the cluster into groups, run individual group allocation, and then try to recombine the groups; but this doesn't...
Cluster.hs: add a new type alias
Just a bit of small cleanup, since we might want to use more functions with this signature in the future.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Rapi: read and use the vm_capable node flag
Similar to the IAllocator change, this patch reads and uses the vm_capable flag in Rapi. Furthermore, it changes the group UUID reading to the same maybeFromObj infrastructure.
Signed-off-by: Iustin Pop <iustin@google.com>...
IAllocator: read and use the vm_capable node flag
This allows non-vm_capable nodes, which don't export runtime data, to not break the IAllocator message parsing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
IAllocator: replace fake policy with real one
This small patch actually reads the allocation policy from the IAllocator message.
JSON: improve error reporting
Currently, we list the entire object in error messages. But for large objects (e.g. an IAllocator message), this makes the output unreadable, as the elements are containers themselves.
To simplify the reporting, we only list the keys, as this is more...
JSON functions: change signature of (maybe)fromObj
Currently, fromObj/maybeFromObj take the key first, and then the object. This is suboptimal, as this form is not easy to use with partial function application.
To make it easier to switch between tryFromObj, fromObj and...
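The benefit of an object-first argument order can be sketched as follows. This is only an illustration: the Obj type (an association list of Int values standing in for a JSON object), the parseNode helper and the key names are assumptions, not the actual htools code.

```haskell
-- Simplified stand-in for a JSON object (assumption: the real code
-- works on JSON values, not on plain Int association lists).
type Obj = [(String, Int)]

-- Object first, key second, so that `fromObj o` is a reusable getter.
fromObj :: Obj -> String -> Either String Int
fromObj o k = maybe (Left ("key '" ++ k ++ "' not found")) Right (lookup k o)

-- Optional variant: a missing key is not an error.
maybeFromObj :: Obj -> String -> Either String (Maybe Int)
maybeFromObj o k = Right (lookup k o)

-- Hypothetical parser: bind the object once, then extract several keys.
parseNode :: Obj -> Either String (Int, Int, Maybe Int)
parseNode o = do
  let get = fromObj o
  mem  <- get "mtotal"
  disk <- get "dtotal"
  cpu  <- maybeFromObj o "ctotal"
  return (mem, disk, cpu)
```

With the key-first order, the same code would need a lambda or flip for every getter.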
Convert node evacuation to multi-group
This patch does the necessary changes to make the new tryMGEvac work correctly: each instance remains inside its primary node's group when it is evacuated.
This is done by splitting up the to-be-evacuated instance list per...
Evacuation: extract the inner fold function
This makes the code more readable, which will help with the multi-group evacuation.
Rapi: fully evaluate the body in getUrl
Currently, the Rapi.getUrl function returns the body without evaluating it, and the other functions (loadData, parseData) do the same. In effect, the top-level structure returned from loadData can be a thunk which depends on the curl operation, thus keeping the curl...
Rapi: move the curl options list to a separate var
A small cleanup, this just moves the options to a separate list to avoid instantiation at every call.
Node: add and export a 'used disk' function
This is similar to iMem.
Node: Export the instance memory function
This exports the iMem function as a standalone function, instead of being hardcoded in showField.
Instance relocation: stay within the current group
This patch adds a new top-level relocation function that restricts the relocation to the instance's group, and switches hail to it.
Container: remove fromAssocList
Container.fromAssocList is just a re-export of IntMap.fromList; it makes sense to remove it and simply export the original name, as it needs just a bit of renaming in the rest of the code.
Parallelize the balancing computations
This small patch changes the balancing computation to work in parallel, if possible.
While the normal linking is against the single-threaded runtime, if the code is linked against the multi-threaded one, the balancing will...
Allocation routines: return list of resource stats
Currently, the allocation routines (iterateAlloc and tieredAlloc) return only the final state of the cluster and the list of allocated instances. For better visibility in how the cluster resources change,...
Fix updating of available (V)CPUs in CStats
RAPI: implement backwards compat with Ganeti 2.3
This is a cheap way to get back compatibility with Ganeti 2.3 (and lower) in the RAPI backend. It is however not very safe (the /2/groups resource could fail due to other reasons), so it is added only temporarily....
Document Utils.tryFromObj
Add 'Read' instances for most objects
This allows a cluster structure to be easily serialized via "read"; together with the already existing instances of Show, this gives a poor man's serialization/deserialization implementation.
The patch also exports the compDetailedCV function from Cluster.hs, so...
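As a rough illustration of the Show/Read round trip described above, consider the sketch below. The Node record and its fields are hypothetical stand-ins, not the real htools types.

```haskell
import Data.Char (isSpace)

-- Hypothetical, much reduced node type; deriving both Show and Read
-- is what gives the "poor man's" serialization.
data Node = Node
  { nodeName    :: String
  , nodeMem     :: Int
  , nodeOffline :: Bool
  } deriving (Show, Read, Eq)

-- Serialize via the derived Show instance.
serialize :: Node -> String
serialize = show

-- Deserialize via the derived Read instance; `reads` is used instead
-- of `read` so that malformed input yields Nothing, not an exception.
deserialize :: String -> Maybe Node
deserialize s = case reads s of
  [(n, rest)] | all isSpace rest -> Just n
  _                              -> Nothing
```

For any value n of this type, deserialize (serialize n) is expected to return Just n, while arbitrary garbage input returns Nothing.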
Add maybePrintInsts for the instance listing
This again abstracts a bit the instance listing. Due to the fact that I don't want to import Cluster.hs in CLI.hs, we pass the already generated output. It also moves the instance display to stderr.
Add maybePrintNodes for abstracting the node list
Since this bit of code (including the “when (isJust …)”) is used in multiple places, let's abstract it in a function that is used consistently. One (bad?) side-effect is that all node lists are done to stderr, including the ones from hbal where it was previously done...
Add maybeSaveData for cluster state saving
This functionality was replicated in multiple places (hbal & hspace), so we abstract it for better clarity.
Additionally, in hbal we now save the state both before and after balancing.
Convert Text.serializeCluster to ClusterData
Convert the rest of the pipeline to ClusterData
This patch converts the backends and mergeData to the new ClusterData type.
Move part of the loader pipeline to ClusterData
Convert Loader.RqType to ClusterData
Add a new type ClusterData
This will be used to hold all the disparate uses of the cluster data: we have either tuples with these four elements, or functions taking these four arguments, etc.
Simulation backend: read the allocation policy too
This patch moves the allocation policy from hardcoded to be read from the given specification, and extends the error message for invalid specifications.
Simulation backend: allow multiple node groups
This patch changes the behaviour of the --simulation option to be an incremental option, where each new use defines a new node group. This allows simulation of more complex clusters.
Merge branch 'stable-0.2'
Change the balancing function
Currently the balancing function is a modified version of the standard deviation (stddev divided by list length), due to historical reasons.
While this works fine for small clusters, for big clusters it makes the balancing effect too "weak", and in some cases it refuses to...
Move some tiered spec functionality to Cluster.hs
This splits out a bit of code from hspace.hs and moves it into its own function in Cluster.hs.
IAllocator: respect the alloc_policy for groups
This patch changes the allocate mode to respect the alloc_policy for groups. It does this by changing the sort key from simply the solution score, to a tuple with two elements: the alloc policy (which is now an...
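The tuple sort key can be sketched as below. The constructor names, the GroupSolution record and the rankSolutions function are assumptions made for this illustration (with a lower score taken to be better); they are not the code from the patch.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical policy type; the derived Ord follows constructor
-- order, so "preferred" sorts before "last resort".
data AllocPolicy
  = AllocPreferred
  | AllocLastResort
  | AllocUnallocable
  deriving (Show, Eq, Ord)

-- Hypothetical per-group allocation result.
data GroupSolution = GroupSolution
  { gsGroup  :: String
  , gsPolicy :: AllocPolicy
  , gsScore  :: Double
  } deriving (Show, Eq)

-- Sort by (policy, score): the policy dominates, and the score only
-- breaks ties between groups that share the same policy.
rankSolutions :: [GroupSolution] -> [GroupSolution]
rankSolutions = sortBy (comparing (\s -> (gsPolicy s, gsScore s)))
```

With this key, a last-resort group with an excellent score still ranks after every preferred group.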
Text: read/write the allocation policy
Luxi: read the allocation policy from the cluster
Rapi: read the allocation policy from the cluster
Implement a JSON instance for AllocPolicy
This will allow reading this attribute via the Rapi/Luxi backends.
Text.hs: serialize cluster tags when writing data
This is the complement to the reading part. Now the live-test works correctly against clusters with configured exclusion tags.
Text.hs: also read cluster tags from the data file
This means that a file with the correct information is as accurate as the other backends (Luxi, Rapi). Serialization of tags is in the next patch.
Text.hs: change to use sepSplit
The new sepSplit function can split based on empty lines, so we remove the hackish text splitting from before and simply use sepSplit. This is needed as the addition of extra sections would have increased the code linearly, which we don't want :)...
Generalise the sepSplit function
Currently it works on splitting strings by individual chars, but we can generalise it to split lists by list elements, which means we can reuse it later in the Text module for splitting both lists of chars by '|' or lists of lines by empty newlines. The change also makes the...
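A generalised splitter of this kind might look like the sketch below. The body is an assumption written for illustration and may differ from the actual htools implementation, in particular in how empty input and trailing separators are treated.

```haskell
-- Split a list on every occurrence of a separator element.
-- Works for any element type with equality, so the same function
-- splits a String on a Char or a list of lines on the empty line.
sepSplit :: Eq a => a -> [a] -> [[a]]
sepSplit sep s = case break (== sep) s of
  ([], [])        -> []                        -- empty input
  (chunk, [])     -> [chunk]                   -- no separator left
  (chunk, [_])    -> [chunk, []]               -- trailing separator
  (chunk, _:rest) -> chunk : sepSplit sep rest -- split and continue
```

For example, sepSplit '|' "a|b|c" splits a record into its fields, and sepSplit "" applied to a list of lines splits a file into sections separated by empty lines.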
hail: display group names in info messages
This patch switches from the group index to the group name for the informational messages in the hail results.
Text.hs: also save the group data when serialising
This should have been in the previous patches, but is sent separately for clarity.
The live-test script is updated to read the first node from the cluster, now that the text files don't start anymore with the node...
Change the Node.group attribute
Currently, the Node.group attribute is the UUID of the group, as until recently Ganeti didn't export the node group properties. Since it does so now, we make the following changes (again apologies for a big patch):
- we change the group attribute to be an index, similar to the way an...
Rework the data loader pipelines to read groups
This (invasive) patch changes all the loader pipelines to read the node groups data from the cluster, via the various backends. It is invasive as it needs coordinated changes across all the loaders.
Note that the new group data is not used, just returned....
Add lookupGroup utility function
This will be used in the various backends similar to the lookupNode function.
Add a new Group.hs module describing node groups
This is not yet used by the rest of the code.
Improve error reporting for small clusters
When doing a two-node allocation on a cluster/group in which only one node is online, or a one-node allocation without any online nodes, we didn't show a valid error message. The patch changes tryAlloc to "fail hard" in this case, to make the failure explicit....
hail/allocate: implement multi-group support
This is a bit hackish. We add a new function that takes the input data, splits it into groups, runs the original tryAlloc for each group, and then chooses the best solution, but adds the log messages from all the...
Add a 'log' attribute to allocation solutions
And also a couple of functions for describing a given solution; these will be used in the future instead of the ones currently in hail.
The patch also enhances the description of failure messages.
Change AllocSolution from tuple to its own type
Tuples are good for two, three, at most four elements. Beyond that, the continuous pattern matching and construction/deconstruction becomes tedious.
Since in the future we'll probably keep more information in the...
Cleanup AllocSolution after AllocElement changes
Since we added the score to AllocElement, we don't need to wrap AllocElement in yet another tuple, just to attach the cluster score. So we simplify the AllocSolution type.
AllocElement: extend with the cluster score
AllocElement, a type used as a result of allocations, holds the status of the nodes after the allocation. In most cases, we'll compare this allocation result with others, to see which allocation decision makes the most sense. This comparison is done via the cluster score....
Add two utility functions for the Result type
Actually, this just moves the functions from the QC module to Types, and removes a duplicate entry from Cluster.
Rework the types used during data loading
This improves on the previous change. Currently, the node and instance lists shipped around during data loading are (again) association lists. For instances it's not a big issue, but the node list is rewritten continuously while we assign instances to nodes, and that is very slow....
Loader functions: move from assoc lists to maps
When loading big clusters, the association lists become a bit slow, so we'll replace this with a simple Map String Int; the change is trivial and can be reverted easily, while it brings up a good speedup in the...
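A minimal sketch of the name-to-index map follows, assuming Data.Map from the containers package. The helper names assignIndices and lookupName and the error text are hypothetical and only illustrate the idea.

```haskell
import qualified Data.Map as M

-- Name-to-index mapping backed by a Map instead of an association
-- list; lookups become O(log n) instead of O(n).
type NameAssoc = M.Map String Int

-- Hypothetical helper: number the names in the order given.
assignIndices :: [String] -> NameAssoc
assignIndices names = M.fromList (zip names [0 ..])

-- Hypothetical helper: resolve a name, reporting unknown names.
lookupName :: NameAssoc -> String -> Either String Int
lookupName assoc name =
  maybe (Left ("Unknown name '" ++ name ++ "'")) Right (M.lookup name assoc)
```

Because only the representation behind the type changes, callers that go through such helpers need no changes, which matches the claim that the switch is easy to revert.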
Convert some leftovers to NameAssoc
The type alias NameAssoc was introduced a long time ago, but there are a few not-yet-converted cases. In preparation for changes to that type, let's make sure we use it consistently.
Add Cluster.splitCluster for node groups
This splits the top-level cluster information into the component node groups. Instances go to the group of their primary node, but otherwise we don't disallow split instances.
Rework Container.hs and improve test coverage
Since some of the functions we export from Container.hs are 1:1 identical to IntMap, we can just export the originals and remove the wrappers. This reduces the code we need to unittest.
Furthermore, we add two simple unittests for the two non-trivial...
Add new command-line option for group selection
Add two functions for checking cluster consistency
For now, we don't support instances allocated across two groups, and we will reject such clusters. The isClusterConsistent function will return a list of inconsistent instances, potentially allowing operation without...
Add function for nodes to (nodegroup, nodes) split
Unittests included. The function will be needed for consistency checks in the algorithms.
Add a type alias for UUIDs
This is to potentially allow easier changes later.
RAPI: read the group UUID from the server
This depends on future support from Ganeti (2.4+).
IAlloc: read group uuid from the input message
This makes the code incompatible with JSON files from Ganeti pre-2.4.
Text: read/save the node group UUID
Compatibility with old text files is kept by using the default UUID if the file (or even some records) doesn't have a UUID.
Luxi: read the node uuid from the cluster
This makes the code incompatible with Ganeti pre-2.4.
Node: add the node group's UUID
This is not used anywhere yet, and the backends are all just adding the default UUID, not the real one.
The patch also allows displaying the group UUID in the node list.
Utils: add a default UUID
This will be used as a placeholder for the cases when we need a UUID (any UUID), but we don't have one handy.
Merge branch 'devel-0.2' into master
Improve the standard deviation computation
This does just two passes, instead of three, over the list. This reduces the overall runtime well enough (~25%) in some tests, but it's not reproducible using profiling, so I don't know how much the function itself is being sped-up....
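One way to compute a standard deviation in two passes is sketched below: the first pass gathers the sum and the element count together, the second sums the squared deviations. The stdDev name and the choice of the population formula are assumptions for this illustration; the actual patch may differ.

```haskell
import Data.List (foldl')

-- Population standard deviation in two passes over the list.
-- Note: an empty list yields NaN (0/0); callers should guard it.
stdDev :: [Double] -> Double
stdDev xs = sqrt (sqSum / count)
  where
    -- Pass 1: accumulate sum and element count in a single fold,
    -- forcing both accumulators to avoid building thunks.
    (total, count) = foldl' step (0, 0) xs
    step (s, n) x = let s' = s + x
                        n' = n + 1
                    in s' `seq` n' `seq` (s', n')
    mean = total / count
    -- Pass 2: sum of squared deviations from the mean.
    sqSum = foldl' (\acc x -> acc + (x - mean) * (x - mean)) 0 xs
```

A naive version would traverse the list three times (sum, length, squared deviations); fusing the first two traversals is what removes one pass.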
Simu loader: move the loading to non-IO code
While we don't actually have IO code in the Simu loader, we do have the same interface. So we move the code again to a separate parseData function which is exported.
Luxi loader: split parsing from loading
Rapi loader: split parsing from loading
The change is similar to the text loader change.
Text loader: split parsing from loadData
This change, which will be followed by similar changes in the other loaders, splits the parsing of the data from the actual loading from disk. Since the parsing doesn't usually involve IO actions, we will be able to better test the parsing. The loading becomes a smaller part of...
Ignore nodes which are not vm_capable
This breaks compatibility with Ganeti pre-2.3.
Fix tag exclusion weight
Currently, the tag exclusion metric has a weight of one, which means there might be cases where we won't move instances around because it upsets the cluster metrics. However, we do want to make a higher effort for cleaning up tag collisions, so we increase the weight to an...
Fix some warnings in unittests
Improve the error message for tiered alloc option
Use the mingain options in the balancing algorithm
Also adds them in hbal.
Add new CLI options for min gain during balancing
Recent hbal seems to run many steps for small improvements (< 1e-3), so we should stop early in this case.
We add a new option (-g), that will be used for the minimum gain during balancing. This check will only become active when the cluster score is...
Add some more debugging functions
These are just variations of the standard debug, but are provided for simpler code, since laziness is something causing non-computation of debug statements.
Fix ReplaceSecondary moves for offline nodes
The addition of a new secondary on a node is doing two memory tests:
- in strict mode, reject if we get into N+1 failure
- reject if the new instance memory is greater than the free memory (not available memory) on the node...
Change iterateAlloc to return the instance list
The Cluster.iterateAlloc and tieredAlloc functions are changed to also return the updated instance list, since it is needed to have a “full” cluster view.
Abstract the cluster serialization from hscan.hs
This is currently hardcoded in an internal function in hscan.hs, and we move it to Text.hs for later use.
Add a new option --save-cluster
This option will in the future be used to serialize the cluster state in hbal and hspace after the rebalance/allocation steps.
Add unittest for Node text serialization
This checks that the Node text serialization and deserialization operations are idempotent when combined with each other.
Switch unittest to custom hostnames
Currently, the hostnames are almost fully arbitrary chars, which breaks the assumption that nodes/instances will be normal DNS hostnames.
This patch adds some custom generators for these hostnames, that will allow better testing of text loader serialization/deserialization.
Move text serialization functions to Text.hs
Currently these are in hscan, and cannot be reused easily.
hail: fix error message for failed multi-evac
Currently we show the instance index, but this makes no sense outside the current running program. Instead, we show the instance name.
Fix another haddock issue
Remove an obsolete function and add Utils tests
Add some more imports to QC.hs
This is needed so that in the coverage report we list all modules, even the ones we don't test at all, such that we get the complete results.
Change the meaning of the N+1 fail metric
Currently, this metric tracks the nodes failing the N+1 check. While this helps (in some cases) to evacuate such nodes, it's not a good metric since it will rarely change during a step (only at the last instance moving away). Therefore we replace it with the count of...
Introduce per-metric weights
Currently all metrics have the same weight (we just sum them together). However, for the hard constraints (N+1 failures, offline nodes, etc.) we should handle the metrics differently based on their meaning. For example, an instance living on a primary offline node is worse than an...
Allow balancing moves to introduce N+1 errors
This patch switches the applyMove function to the extended versions of Node.addPri and addSec, and passes the override flag based on the state of the node that we're moving away from.
Introduce a relaxed add instance mode
In case an instance is living on an offline node, it doesn't make sense to refuse moving it because that would create N+1 failures; failing N+1 is still much better than not running at all. Similarly, if the secondary node of an instance is offline, meaning the instance doesn't...
Remove obsolete Container.maxNameLen
This was only used in one place (hbal), and is made obsolete by the change to the dual name/alias structure.