Cleanup AllocSolution after AllocElement changes
Since we added the score to AllocElement, we don't need to wrap AllocElement in yet another tuple, just to attach the cluster score. So we simplify the AllocSolution type.
Signed-off-by: Iustin Pop <iustin@google.com>...
AllocElement: extend with the cluster score
AllocElement, a type used as a result of allocations, holds the status of the nodes after the allocation. In most cases, we'll compare this allocation result with others, to see which allocation decision makes the most sense. This comparison is done via the cluster score....
Add two utility functions for the Result type
Actually, this just moves the functions from the QC module to Types, and removes a duplicate entry from Cluster.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Rework the types used during data loading
This improves on the previous change. Currently, the node and instance lists shipped around during data loading are (again) association lists. For instances it's not a big issue, but the node list is rewritten continuously while we assign instances to nodes, and that is very slow....
Loader functions: move from assoc lists to maps
When loading big clusters, the association lists become a bit slow, so we'll replace this with a simple Map String Int; the change is trivial and can be reverted easily, while it brings up a good speedup in the...
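A minimal sketch of the idea, assuming the map goes from element name to index. The alias NameMap and the helpers fromAssoc and lookupIdx are hypothetical names used only for illustration; they are not taken from the loader code.

```haskell
import qualified Data.Map as M

-- Hypothetical alias for the name-to-index mapping.
type NameMap = M.Map String Int

-- Build the map once from an existing association list.
fromAssoc :: [(String, Int)] -> NameMap
fromAssoc = M.fromList

-- Resolve a name to its index; unknown names become an error message.
-- Lookup is logarithmic in the map size, instead of linear as with
-- 'lookup' on an association list.
lookupIdx :: NameMap -> String -> Either String Int
lookupIdx m name =
  maybe (Left ("Unknown name: " ++ name)) Right (M.lookup name m)
```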
Convert some leftovers to NameAssoc
The type alias NameAssoc was introduced a long time ago, but there are a few not-yet-converted cases. In preparation for changes to that type, let's make sure we use it consistently.
Add Cluster.splitCluster for node groups
This splits the top-level cluster information into the component node groups. Instances go to the group of their primary node, but otherwise we don't disallow split instances.
Rework Container.hs and improve test coverage
Since some of the functions we export from Container.hs are 1:1 identical to IntMap, we can just export the originals and remove the wrappers. This reduces the code we need to unittest.
Furthermore, we add two simple unittests for the two non-trivial...
Add new command-line option for group selection
Add two functions for checking cluster consistency
For now, we don't support instances allocated across two groups, and we will reject such clusters. The isClusterConsistent function will return a list of inconsistent instances, potentially allowing operation without...
Add function for nodes to (nodegroup, nodes) split
Unittests included. The function will be needed for consistency checks in the algorithms.
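A minimal sketch of such a split, assuming each node carries the UUID of its group. The Node record and the name splitByGroup are hypothetical simplifications for illustration, not the actual types or function from the patch.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- Hypothetical, stripped-down node record.
data Node = Node
  { nodeName  :: String
  , nodeGroup :: String  -- group UUID
  } deriving (Show, Eq)

-- Split a flat node list into (group, nodes) pairs. The sort is stable,
-- so nodes keep their relative order inside each group.
splitByGroup :: [Node] -> [(String, [Node])]
splitByGroup nodes =
  [ (nodeGroup n, grp)
  | grp@(n : _) <- groupBy ((==) `on` nodeGroup)
                           (sortBy (comparing nodeGroup) nodes)
  ]
```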
Add a type alias for UUIDs
This is to potentially allow easier changes later.
RAPI: read the group UUID from the server
This depends on future support from Ganeti (2.4+).
IAlloc: read group uuid from the input message
This makes the code incompatible with JSON files from Ganeti pre-2.4.
Text: read/save the node group UUID
Compatibility with old text files is kept by using the default UUID if the file (or even some records) don't have a UUID.
Luxi: read the node uuid from the cluster
This makes the code incompatible with Ganeti pre-2.4.
Node: add the node group's UUID
This is not used anywhere yet, and the backends are all just adding the default UUID, not the real one.
The patch also allows displaying the group UUID in the node list.
Utils: add a default UUID
This will be used as a placeholder for the cases when we need a UUID(any UUID), but we don't have one handy.
Merge branch 'devel-0.2' into master
Improve the standard deviation computation
This does just two passes, instead of three, over the list. This reduces the overall runtime well enough (~25%) in some tests, but it's not reproducible using profiling, so I don't know how much the function itself is being sped-up....
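For illustration, a two-pass computation could look like the sketch below: the first pass gathers the length and the sum together, the second one accumulates the squared deviations. This is an assumption about the general shape only; the function name stdDev and the choice of the population formula (dividing by n) are ours, not taken from the patch.

```haskell
import Data.List (foldl')

-- Population standard deviation in two passes over the list.
-- Note: an empty list yields NaN, since we divide by a zero length.
stdDev :: [Double] -> Double
stdDev xs =
  let -- pass 1: element count and sum, computed together
      (n, total) = foldl' (\(c, s) x -> (c + 1, s + x)) (0, 0) xs
      mean = total / n
      -- pass 2: sum of squared deviations from the mean
      sqSum = foldl' (\acc x -> acc + (x - mean) ^ (2 :: Int)) 0 xs
  in sqrt (sqSum / n)
```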
Simu loader: move the loading to non-IO code
While we don't actually have IO code in the Simu loader, we do have the same interface. So we move the code again to a separate parseData function which is exported.
Luxi loader: split parsing from loading
Rapi loader: split parsing from loading
The change is similar to the text loader change.
Text loader: split parsing from loadData
This change, which will be followed by similar changes in the other loaders, splits the parsing of the data from the actual loading from disk. Since the parsing doesn't usually involve IO actions, we will be able to better test the parsing. The loading becomes a smaller part of...
Ignore nodes which are not vm_capable
This breaks compatibility with Ganeti pre-2.3.
Fix tag exclusion weight
Currently, the tag exclusion metric has a weight of one, which means there might be cases where we won't move instances around because it upsets the cluster metrics. However, we do want to make a higher effort for cleaning up tag collisions, so we increase the weight to an...
Fix some warnings in unittests
Improve the error message for tiered alloc option
Use the mingain options in the balancing algorithm
Also adds them in hbal.
Add new CLI options for min gain during balancing
Recent hbal seems to run many steps for small improvements (< 1e-3), so we should stop early in this case.
We add a new option (-g), that will be used for the minimum gain during balancing. This check will only become active when the cluster score is...
Add some more debugging functions
These are just variations of the standard debug, but are provided for simpler code, since laziness is something causing non-computation of debug statements.
Fix ReplaceSecondary moves for offline nodes
The addition of a new secondary on a node is doing two memory tests:
- in strict mode, reject if we get into N+1 failure
- reject if the new instance memory is greater than the free memory (not available memory) on the node...
Change iterateAlloc to return the instance list
The Cluster.iterateAlloc and tieredAlloc functions are changed to also return the updated instance list, since it is needed to have a “full” cluster view.
Abstract the cluster serialization from hscan.hs
This is currently hardcoded in an internal function in hscan.hs, and we move it to Text.hs for later use.
Add a new option --save-cluster
This option will in the future be used to serialize the cluster state in hbal and hspace after the rebalance/allocation steps.
Add unittest for Node text serialization
This checks that the Node text serialization and deserialization operations are idempotent when combined with each other.
Switch unittest to custom hostnames
Currently, the hostnames are almost fully arbitrary chars, which breaks the assumption that nodes/instances will be normal DNS hostnames.
This patch adds some custom generators for these hostnames, that will allow better testing of text loader serialization/deserialization.
Move text serialization functions to Text.hs
Currently these are in hscan, and cannot be reused easily.
hail: fix error message for failed multi-evac
Currently we show the instance index, but this makes no sense outside the current running program. Instead, we show the instance name.
Fix another haddock issue
Remove an obsolete function and add Utils tests
Add some more imports to QC.hs
This is needed so that in the coverage report we list all modules, even the ones we don't test at all, such that we get the complete results.
Change the meaning of the N+1 fail metric
Currently, this metric tracks the nodes failing the N+1 check. While this helps (in some cases) to evacuate such nodes, it's not a good metric since rarely it will change during a step (only at the last instance moving away). Therefore we replace it with the count of...
Introduce per-metric weights
Currently all metrics have the same weight (we just sum them together). However, for the hard constraints (N+1 failures, offline nodes, etc.) we should handle the metrics differently based on their meaning. For example, an instance living on a primary offline node is worse than an...
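As a rough sketch, moving from a plain sum to per-metric weights amounts to a weighted sum. The function names and the idea of passing the weights as a separate list are assumptions made for this example; the actual weights and metric order are not given here.

```haskell
-- Old behaviour: all metrics count equally.
plainScore :: [Double] -> Double
plainScore = sum

-- New behaviour (sketch): each metric is scaled by its own weight before
-- summing. The weight list is assumed to be in the same order as the
-- metric list; extra elements in either list are ignored by zipWith.
weightedScore :: [Double] -> [Double] -> Double
weightedScore weights metrics = sum (zipWith (*) weights metrics)
```

With all weights set to 1, weightedScore gives the same result as plainScore, so the old behaviour is a special case of the new one.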
Allow balancing moves to introduce N+1 errors
This patch switches the applyMove function to the extended versions of Node.addPri and addSec, and passes the override flag based on the state of the node that we're moving away from.
Introduce a relaxed add instance mode
In case an instance is living on an offline node, it doesn't make sense to refuse moving it because that would create N+1 failures; failing N+1 is still much better than not running at all. Similarly, if the secondary node of an instance is offline, meaning the instance doesn't...
Remove obsolete Container.maxNameLen
This was only used in one place (hbal), and is made obsolete by the change to the dual name/alias structure.
hbal: print short names in steps list
This was a regression from the name handling changes, as we started using the original names for the solution list (which is not designed for parsing/feeding back into ganeti).
Remove an obsolete function
printSolution is no longer used, as we print the solution iteratively now.
Allow '+' in node list fields
When the field list is prefixed with a plus sign, this will extend the default field list, instead of replacing it entirely.
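A minimal sketch of this behaviour, assuming a comma-separated field specification. The default field list shown and the names parseFields and splitOn are placeholders for illustration, not the real defaults or functions.

```haskell
-- Split a string on a separator character (small local helper).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Placeholder default field list; the real defaults differ.
defaultFields :: [String]
defaultFields = ["name", "pcnt", "scnt"]

-- A leading '+' extends the defaults; otherwise the given list
-- replaces them entirely. Empty field names are dropped.
parseFields :: String -> [String]
parseFields ('+' : extra) = defaultFields ++ fieldsOf extra
parseFields spec          = fieldsOf spec

fieldsOf :: String -> [String]
fieldsOf = filter (not . null) . splitOn ','
```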
Update the node list fields
This patch renames the pri/sec to pcnt/scnt, and adds the real primary and secondary instance lists, the peermap and the index of a node as selectable options.
Cleanup a node's peer map when possible
If the last secondary instance of a peer is deleted (detected by the new peer memory value being equal to zero), then the pair (pdx, 0) should be deleted completely. This is not an optimization per se, but rather cleanup...
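The cleanup can be sketched as follows, assuming the peer map is an IntMap from peer node index to reserved memory. The alias PeerMap and the helper setPeerMem are hypothetical names for this example; how the new memory value is computed is left out on purpose.

```haskell
import qualified Data.IntMap as IM

-- Assumed representation: peer node index -> memory reserved for it.
type PeerMap = IM.IntMap Int

-- Store the new memory value for a peer. A value of zero means the last
-- secondary instance for that peer is gone, so the entry is removed
-- instead of keeping a (pdx, 0) pair around.
setPeerMem :: Int -> Int -> PeerMap -> PeerMap
setPeerMem pdx newMem pm
  | newMem == 0 = IM.delete pdx pm
  | otherwise   = IM.insert pdx newMem pm
```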
Fix another haddock special-char issue
Remove JOB_STATUS_GONE and add unittests
… for the serialization/deserialization of the job and opcode status.
Job status 'gone' was not actually used. It can be reintroduced if needed.
Fix some lint errors in the unit tests
Change the Luxi operations structure
Currently, we define the LuxiOp type as a simple enumeration, and leave the arguments structure to the users of the Ganeti.Luxi module. This is suboptimal for a couple of reasons: first, we decouple the operation type from operation arguments, and that means we don't use the type...
Fix a warning in Loader tests
Incomplete pattern match…
Add a few Loader tests
These are not comprehensive, but at least we have a start.
Reduce the warnings during the unittests
Since the unittests are not 'clean' from the p.o.v. of type declarations, and cannot be made clean in all respects (e.g. orphan instances), we silence some warnings for the test target, to have a cleaner output.
Introduce OpCode unittests
Introduce support for optional keys in JObjects
Some keys are optional in the Ganeti opcodes (e.g. ‘node’ in the OpReplaceDisks), and as such we need to transform them into a Maybe value, instead of failing.
The patch reworks a bit fromObj and adds maybeFromObj which parses such...
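To illustrate the difference between a required and an optional key, here is a simplified sketch. The names fromObj and maybeFromObj come from the message above, but the signatures below are stand-ins of our own: an association list of strings replaces the real JSON object type, and only Int values are parsed.

```haskell
-- Hypothetical stand-in for a JSON object: key -> raw value.
type JObject = [(String, String)]

-- Parse a raw value as Int, naming the key in the error message.
parseInt :: String -> String -> Either String Int
parseInt key raw = case reads raw of
  [(v, "")] -> Right v
  _         -> Left ("Cannot read Int for key '" ++ key ++ "': " ++ show raw)

-- Required key: a missing key is an error.
fromObj :: String -> JObject -> Either String Int
fromObj key obj = case lookup key obj of
  Nothing  -> Left ("Key '" ++ key ++ "' not found")
  Just raw -> parseInt key raw

-- Optional key: a missing key is fine and yields Nothing, while a
-- present but malformed value is still reported as an error.
maybeFromObj :: String -> JObject -> Either String (Maybe Int)
maybeFromObj key obj = case lookup key obj of
  Nothing  -> Right Nothing
  Just raw -> fmap Just (parseInt key raw)
```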
Replace fromJResult with annotateJResult
This patch replaces all old uses of fromJResult with the annotated version, and removes the non-annotated version. All JSON parsing points should now have annotated errors.
Add annotations to loadJSArray
This allows, for example, the RAPI backend to detail which information (instance or node data) fails to parse.
Change fromObj error messages
Currently fromObj doesn't detail what we're trying to read, which can lead to cryptic messages: "Cannot read Int". The patch changes this function to annotate the error messages with the key/value we're trying to convert, by using a new version of fromJResult....
A few more small Node unit-tests
Add more unittests
Instance, Node and Text modules have improved coverage.
Add more unit tests for allocation/balance
The patch adds some simple unit-tests for both the allocation function (we can allocate small instances on an empty cluster, we can allocate in tiered mode starting from any size) and the balancing functions (one...
Move two functions from hspace to Cluster.hs
This is done so we can test a longer pipeline.
Make CStats instance of show
This helps debugging via ghci.
Another haddock fix…
Accept both full and short names in CLI
This patch introduces some new functionality in the base Element type and in Container which supports searching for all 'known' names of an element, such that both short and full names are accepted for various options like '-O' and '--excluded-instances'.
Stop modifying names for internal computations
Currently the name used internally is modified and holds the shortened name of the nodes/instances. This has caused issues before, since we always have to strip the suffix from input data and reapply it if we...
Add a new node/instance field
This new field ('alias') will hold the shortened/beautified display name. When resetting the name, the alias is reset too, and there's a new function to update only the alias.
Change some test constants
First, we reduce the max size of the disks, since Int on 32bits will overflow for big simulated clusters. This is a real issue, that will need fixing in real life, but for now we just "silence" this test.
Second, we increase the amount of time a test is allowed to run,...
Fix some haddock comments
Add more unit tests
This increases the overall coverage by 5%-10% (depending on coverage type). Some modules are still not unittested at all, as HUnit is a better choice for them.
Shuffle some constants around
… and export more functions. This will help with unit testing.
Remove the noLimit values and always use limits
This patch moves away from allowing no-limits for disk/cpu ratios, and always uses a real limit. For disk, it's simple since we use 0, which means no reservations for disks. For CPU, we set an (arbitrary) limit of 64 v/p,...
Fix hspace's KM metrics
We returned the KM_POOL_* metrics as the final state, not as the delta between the final and the initial state.
Fix Node hiCpu computation
In case we're not enabling limits, let's restrict this to -1, instead of -1 times the number of pcpus.
Add a new function to compute allocation deltas
Given two cluster states, the new function can answer the following questions:
- how many resources are currently allocated
- how many resources are finally allocated (delta from above is how much we can actually allocate on the cluster)...
Introduce total vcpu tracking in CStats
We add a new field that tracks the available virtual cpus (expressed as node cpus times the vcpu ratio).
Merge branch 'master' into next
Fix iallocator crash when no solutions exist
Commit 5436576 added an un-guarded `head' call, which crashes with “Prelude.head: empty list” when no results exist for the per-instance allocation/relocation calls.
This patch fixes this, and also adds another check for an unguarded...
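The general shape of such a guard is sketched below: pattern-match on the list instead of calling head, and turn the empty case into an error value. The function name firstSolution and the error text are made up for this example and are not the ones used in the patch.

```haskell
-- Return the first solution, or an error message when there is none,
-- instead of crashing in Prelude.head on an empty list.
firstSolution :: [a] -> Either String a
firstSolution []      = Left "No valid allocation solutions"
firstSolution (x : _) = Right x
```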
Fix IAllocator multi-evacuate message
Since Ganeti passes full host names (not common-suffix-stripped), we need to remove the suffix from the evac_nodes keys too. In case one node is not part of the cluster, it will lead to a wrong error message, but for now it fixes the problem.
Fix a haddock comment issue
For some versions of haddock, this can create problems.
Abstract instance running states into a list
This replaces some manual checks in a few places in the code with a single list defined once.
A number of small fixes from hlint
Fix unused-do-binds for ghc 6.12
GHC 6.12 has some new warnings, which are valid in most cases except (IMHO) printf usage.
Fix unused imports for ghc 6.12
GHC 6.12 has become more picky about unused imports, so we need to remove/tighten some of them.
hscan: implement LUXI backend scanning
This allows hscan to work also with NO_CURL (but only for the local machine, of course).
Loader: abort for unknown to-be-excluded instances
balance function: use the movable flag directly
Instead of deciding based on secondary node, use the new flag.
Update the loader pipeline to set the movable flag
This updates the movable flag on instances if they have only one node (we don't rely on OpMoveInstance) or if they are set so via the command line options.
This doesn't yet enable the use of the new flag.
Add a 'movable' flag on instances
This will be used instead of checking for no secondary and for simplifying 'do not touch' instances.
Add an option for excluding instances from moves
Implement IAllocator node evacuate request
This patch adds the new request loading/execution (trivial), but the actual response formatting becomes more difficult as now the response type differs by request.
Signed-off-by: Iustin Pop <iustin@google.com>
Add a tryEvac function
This will be used by the node evacuate IAllocator request type.
Move a type declaration to Node.hs
We'll need AllocElement in both Cluster and IAlloc in the future, so we move it to Node.hs which is imported by both.