Add more unit tests
This increases the overall coverage by 5%-10% (depending on coverage type). Some modules are still not unit-tested at all, as HUnit is a better choice for them.
Shuffle some constants around
… and export more functions. This will help with unit testing.
Remove the noLimit values and always use limits
This patch moves away from allowing no-limits for disk/cpu ratios and always uses a real limit. For disk, it's simple since we use 0, which means no reservations for disks. For CPU, we set an (arbitrary) limit of 64 v/p,...
hspace: change handling of N+1 bad clusters
Currently we just print a fake result and exit early. This is bad, since it doesn't use the same codepaths for all the result printing, and has already led to a bug where hspace appears to be completely ignoring the...
Fix hspace's KM metrics
We returned the KM_POOL_* metrics as the final state, not as the delta between the final and the initial state.
Update NEWS file for the 0.2.5 release
Update hspace man page
hspace: show more metrics
This patch adds the metrics of used/allocable/unallocable resources.
Fix Node hiCpu computation
In case we're not enabling limits, let's restrict this to -1, instead of -1 times the number of pcpus.
Add a new function to compute allocation deltas
Given two cluster states, the new function can answer the following questions:
- how many resources are currently allocated
- how many resources are finally allocated (the delta from the above is how much we can actually allocate on the cluster)...
Introduce total vcpu tracking in CStats
We add a new field that tracks the available virtual cpus (expressed as node cpus times the vcpu ratio).
Merge branch 'master' into next
Fix iallocator crash when no solutions exist
Commit 5436576 added an unguarded `head' call, which crashes with “Prelude.head: empty list” when no results exist for the per-instance allocation/relocation calls.
This patch fixes this, and also adds another check for an unguarded...
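A minimal sketch of guarding such a call, for illustration only. The names `safeHead` and `firstSolution` and the error text are hypothetical and are not taken from the htools sources:

```haskell
-- Hypothetical helpers; not the actual htools code.

-- | Total alternative to 'head': returns Nothing on an empty list
-- instead of raising "Prelude.head: empty list".
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- | Pick the first candidate solution, or return a readable error
-- when no solutions exist.
firstSolution :: [String] -> Either String String
firstSolution sols =
  case safeHead sols of
    Nothing -> Left "No valid allocation solutions"
    Just s  -> Right s
```

With this shape, `firstSolution []` yields a `Left` value that the caller can report, rather than crashing the program.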
Fix IAllocator multi-evacuate message
Since Ganeti passes full host names (not common-suffix-stripped), we need to remove the suffix from the evac_nodes keys too. In case one node is not part of the cluster, it will lead to a wrong error message, but for now it fixes the problem.
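As an illustration of the suffix removal, a small sketch follows. The function name `stripHostSuffix` and the example domain are assumptions made for this example, not the code from the patch:

```haskell
import Data.List (stripPrefix)

-- | Remove a known suffix from a host name (hypothetical helper).
-- Names that do not end in the suffix are returned unchanged, which
-- matches the caveat above about nodes outside the cluster.
stripHostSuffix :: String -> String -> String
stripHostSuffix suffix name =
  maybe name reverse (stripPrefix (reverse suffix) (reverse name))

-- | Apply the suffix removal to every key in a list of node names.
stripAll :: String -> [String] -> [String]
stripAll suffix = map (stripHostSuffix suffix)
```

For example, `stripAll ".example.com" ["node1.example.com", "other.host"]` gives `["node1", "other.host"]`.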
Fix a haddock comment issue
For some versions of haddock, this can create problems.
Abstract instance running states into a list
This replaces some manual checks in a few places in the code with a single list defined once.
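The idea can be sketched as below. The state strings and the names `runningStates` and `isRunning` are assumptions chosen for illustration; the actual list in the patch may differ:

```haskell
-- | States in which an instance counts as running.
-- The concrete values here are assumed, not copied from the patch.
runningStates :: [String]
runningStates = ["running", "ERROR_up"]

-- | One membership test replaces scattered manual comparisons.
isRunning :: String -> Bool
isRunning state = state `elem` runningStates
```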
A number of small fixes from hlint
Add a lint target that uses hlint
Fix unused-do-binds for ghc 6.12
GHC 6.12 has some new warnings, which are valid in most cases except (IMHO) printf usage.
Fix unused imports for ghc 6.12
GHC 6.12 has become more picky about unused imports, so we need to remove/tighten some of them.
Allow overriding the ghc compiler used
… via a GHC make variable.
hscan: implement LUXI backend scanning
This allows hscan to work also with NO_CURL (but only for the local machine, of course).
Loader: abort for unknown to-be-excluded instances
Enable hbal to use the new command line option
balance function: use the movable flag directly
Instead of deciding based on secondary node, use the new flag.
Update the loader pipeline to set the movable flag
This updates the movable flag on instances if they have only one node (we don't rely on OpMoveInstance) or if they are set so via the command line options.
This doesn't yet enable the use of the new flag.
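A rough sketch of such a loader step is shown here. The `Instance` record, its field names and `updateMovable` are simplified, hypothetical stand-ins, and the sketch assumes that both conditions clear the flag:

```haskell
import Data.Maybe (isNothing)

-- | Simplified, hypothetical instance record for this example only.
data Instance = Instance
  { instName  :: String
  , secondary :: Maybe String  -- Nothing when the instance has only one node
  , movable   :: Bool
  } deriving (Show, Eq)

-- | Clear the movable flag for single-node instances and for instances
-- excluded on the command line; leave other instances unchanged.
updateMovable :: [String] -> Instance -> Instance
updateMovable excluded inst
  | isNothing (secondary inst)    = inst { movable = False }
  | instName inst `elem` excluded = inst { movable = False }
  | otherwise                     = inst
```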
Add a 'movable' flag on instances
This will be used instead of checking for no secondary and for simplifying 'do not touch' instances.
Add an option for excluding instances from moves
Update NEWS file for the 0.2.4 release
Update the hail man page
This adds a short note for the new iallocator mode.
Signed-off-by: Iustin Pop <iustin@google.com>
Implement IAllocator node evacuate request
This patch adds the new request loading/execution (trivial), but the actual response formatting becomes more difficult as now the response type differs by request.
Add a tryEvac function
This will be used by the node evacuate IAllocator request type.
Move a type declaration to Node.hs
We'll need AllocElement in both Cluster and IAlloc in the future, so we move it to Node.hs which is imported by both.
Change an internal type from Maybe to list
In preparation for multiple responses, we change from Maybe to List (both used in the container sense).
This allows us to keep the same workflow for all kinds of requests.
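To illustrate why a list keeps the workflow uniform, here is a small sketch. The names `formatResponse` and `formatSingle` and the message texts are hypothetical:

```haskell
import Data.Maybe (maybeToList)

-- | Hypothetical formatter that handles zero, one or many results.
formatResponse :: [String] -> String
formatResponse []      = "No solution found"
formatResponse results = "Solutions: " ++ unwords results

-- | A computation that yields at most one result (Maybe) can feed the
-- same formatter once converted to a list.
formatSingle :: Maybe String -> String
formatSingle = formatResponse . maybeToList
```

A request producing several results calls `formatResponse` directly, so no separate code path is needed.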
IAllocator: move some keys into per-request data
Since not all structures will have these keys in the future, we move them into per-structure keys.
Document the evac mode
Implement evacuation mode in hbal
This mode restricts the list of instances to be moved to the instances living on the offline (and drained) nodes.
Add an evac mode CLI option
Reorder options in CLI.hs
This should be no code change, just reordering of the options.
Update documentation for the text backend
Update NEWS file for the 0.2.3 release
Fix secondary node selection for existing N+1
In case a secondary node is already N+1 failed, currently the node selection will accept a node that cannot start (at all) the new instance as valid. This is wrong, so we add a new simple check to prevent the...
Rewrite the node add checks for simpler layout
This will make it clearer than many if…then choices.
Move instance relocation test upper in the chain
Currently we test each instance for relocation in checkMove; however, it is a little clearer if we pass only the relocatable instances to checkMove. The patch also slightly rewrites (indentation/style) the...
Split the balancing function in two parts
Currently in the balancing function we do two things:
- deciding whether to do a new balancing round or not
- and actually computing the balancing round
This is not nice, as the two parts are conceptually separate, so this...
Small update to the Makefile
Makefile: Switch from subshell to $(MAKE) -C
It seems that set -e does not affect subshells (only simple commands), and thus we don't actually get failures from make check being run in a subshell. Rather than trying to handle this better, we remove the subshell and invoke make with the required subdirectory.
Fixing a typo in option description
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
live-test fixes for the text backend changes
Update hscan to generate new-format text files
This also updates its manpage; some of the information was really old.
Switch the text file format to single-file
This patch changes from the two separate files to a single file, with sections separated by a blank line. Currently only the node and instance data is accepted; later the cluster tags will be read too via this format....
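A sketch of splitting such a file into its sections, under the assumption that exactly two sections (nodes, then instances) are expected. The names `splitSections` and `parseData` are hypothetical, and the per-line field format is left untouched:

```haskell
-- | Split a list of lines into sections separated by blank lines.
splitSections :: [String] -> [[String]]
splitSections ls =
  case break null ls of
    (sec, [])       -> [sec]
    (sec, _ : rest) -> sec : splitSections rest

-- | Return the raw node and instance lines, or an error when the file
-- does not contain exactly two sections (an assumption of this sketch).
parseData :: String -> Either String ([String], [String])
parseData txt =
  case splitSections (lines txt) of
    [nodes, insts] -> Right (nodes, insts)
    secs           -> Left ("expected 2 sections, got " ++ show (length secs))
```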
Change the signatures of the text loader slightly
This is in preparation for the text format changes.
Update NEWS file for the 0.2.2 release
Update hbal man page to note that we use stddev
We actually use stddev and not the coefficient of variation (as wrongly noted before), so we update the documentation appropriately.
We also note that the dynamic load values must be pre-normalized, since we don't do such a normalization in the code.
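The difference between the two measures can be sketched as follows. This uses the population standard deviation, which is an assumption of the example; the function names are illustrative and not taken from the htools sources:

```haskell
-- | Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Population standard deviation (assumed variant for this sketch).
stdDev :: [Double] -> Double
stdDev xs = sqrt (sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs))
  where m = mean xs

-- | Coefficient of variation: standard deviation relative to the mean.
coeffVar :: [Double] -> Double
coeffVar xs = stdDev xs / mean xs
```

Scaling every input by 10 multiplies `stdDev` by 10 but leaves `coeffVar` unchanged. Because the standard deviation depends on the scale of its inputs, values on different scales have to be normalized before they are compared or combined.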
Update comments in live-test about clusters
The live test needs a populated cluster, so let's write this down.
Improve the dist build rule
This changes the 'dist' rule to also do a check that the archive can build all the programs and passes the check test itself, and shows the sha1sum at the end automatically.
Remove Version.hs during clean too
Ganeti/HTools/Version.hs is generated at build time from version (which is the only one shipped), so it must be removed by the clean rule.
Fix small typo
This was found, of all things, via lintian during the Debian packaging…
Convert n1_score metric from % to count
This increases the priority of fixing N+1 failures compared to balancing metrics.
Metric: count of primary instances/offline nodes
This helps with evacuation/failover of instances on 2-node clusters with one node offline.
Offline instance metric: change from % to count
Currently we use the offline instance percentage (with range [0, 1]), but this is not good, since we want the evacuation of such instances to have a high priority; therefore we change this to a count of offline...
Use the oper_ram field if available
For the RAPI and LUXI backends, we can get the actual memory usage (if instances are running) via the oper_ram field, whereas backend/memory only tells what the instance will use at the next boot.
Not using oper_ram means that the node model is flawed and we consider...
rapi, luxi: treat drained nodes as offline
Commit e97f211 changed the iallocator backend to handle drained nodes as offline. This commit completes that change by making the rapi and luxi backend do the same (the text backend ignores any '?' values which are...
Add a live-test script
This can be used to test that all the existing commands work correctly. It needs a running cluster with at least one instance to run all the tests.
Fix typo breaking LUXI backend
This really shows the need for actual dist-time full testing (not unittests).
Update NEWS file for the 0.2.1 release
Fix unittests after instance tags addition
Merge branch 'next'
Update documentation for the iextags
Re-wrap the README
… since we added the fill-column 72 setting.
Configure exclusion tags via the cluster tags
This patch adds reading of the exclusion tags from the cluster tags: any tags starting with htools:iextags: will convert their suffix into an exclusion tags prefix. In other words, "htools:iextags:service" will...
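The prefix handling described here could look roughly like the sketch below. The names `iextagsPrefix` and `exclusionPrefixes` are hypothetical, and the sketch covers only the extraction of suffixes, not how they are used afterwards:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Marker that identifies exclusion-tag configuration in cluster tags.
iextagsPrefix :: String
iextagsPrefix = "htools:iextags:"

-- | Keep only the cluster tags carrying the marker and return their
-- suffixes; all other tags are ignored.
exclusionPrefixes :: [String] -> [String]
exclusionPrefixes = mapMaybe (stripPrefix iextagsPrefix)
```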
hail: add '-p' option intended for debugging use
This prints the initial node list on stderr, since stdout is reserved for the iallocator protocol (even though ganeti won't pass -p itself).
Read cluster tags in the IAllocator backend
Read cluster tags in the LUXI backend
Read cluster tags in the RAPI backend
This also shows them in hbal in verbose mode.
Introduce support for reading the cluster tags
While these are not actually populated from the backends, and all the programs ignore them, this patch contains the changes in the function types required.
hspace: quote non-alphanum values in shell output
The tiered allocation output which contains spaces makes the output of hspace non-sourceable. This patch adds a new function to ensure non-alphanumeric values are quoted such that the output can be parsed...
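One possible way to quote such values is sketched below. The names `shellQuote` and `formatKeyValue`, and the choice of single-quote escaping, are assumptions and may differ from the function added by the patch:

```haskell
import Data.Char (isAlphaNum)

-- | Leave purely alphanumeric values as they are; wrap anything else
-- in single quotes, escaping embedded single quotes for the shell.
shellQuote :: String -> String
shellQuote v
  | all isAlphaNum v = v
  | otherwise        = "'" ++ concatMap escape v ++ "'"
  where
    escape '\'' = "'\\''"
    escape c    = [c]

-- | Render one KEY=value line that a shell can source.
formatKeyValue :: String -> String -> String
formatKeyValue key val = key ++ "=" ++ shellQuote val
```

For example, `formatKeyValue "SPEC" "a b"` produces `SPEC='a b'`, which a shell can source without word splitting.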
Collapse the statistical functions into one
This allows us to get rid of two duplicate list length computations, with a minor speedup.
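A sketch of the general idea: one function returns both statistics and counts the list only once. The name `meanStdDev` and the use of the population standard deviation are assumptions of this example, not the code from the patch:

```haskell
import Data.List (foldl')

-- | Mean and population standard deviation of a non-empty list,
-- with the element count computed once and shared by both results.
meanStdDev :: [Double] -> (Double, Double)
meanStdDev xs = (m, sqrt (sqSum / n))
  where
    -- One pass yields both the sum and the number of elements.
    (total, count) = foldl' (\(s, c) x -> (s + x, c + 1)) (0, 0 :: Int) xs
    n     = fromIntegral count
    m     = total / n
    -- A second pass accumulates the squared deviations from the mean.
    sqSum = foldl' (\acc x -> acc + (x - m) * (x - m)) 0 xs
```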
Specialize the math functions
The statistics functions are currently defined as polymorphic with a Floating constraint. Changing this to monomorphic on Double type makes them stricter and much more performant (~70% speedup). This is a cheap way to recoup some of the losses incurred by the recent proliferation of...
Use conflicting primaries count in cluster score
This small patch adds the number of conflicting primaries in the cluster score. This is different from the other non-CV metrics where we usually compute the percentage of failing instances (for that metric); but for a...
Node: add function for conflicting primary count
Add a new node list field
This patch adds a new node list field (ptags), showing the primary instance tags.
Add a command-line option to filter exclusion tags
Since we don't want all instance tags to be used for exclusion, we add a command line option to filter on these. Since the iallocator protocol cannot accept command line options, currently it's not possible to...
Introduce tag-based exclusion of primary instances
This patch introduces exclusion of primary instances based on tags. This is incomplete as currently all tags are being excluded, and we don't optimise towards relocation of instances sharing tags on the same node.
Add a tags attribute to instances
… and read it in all the loaders. hscan is modified to save it to the files it generates.
The attribute is not yet used in any place.
Small change in some list arguments
This is simpler than the concat operator.
Use either \- or \(hy in manpages
This reduces warnings from lintian when building Debian packages.
Update NEWS file for the 0.2.0 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Rewrite NEWS for better RST compatibility
The text-only version should still be very readable, but the RST output will be better hopefully.
Allow overriding the field list in -p
The print nodes option can now accept an optional field list to customise the output. This is ugly, since the field names do not match the header names, but it is at least barely customisable (at runtime).
Update hspace manpage with tiered allocation info
Also fixes some other small issues in man pages.
Move more node-listing functionality in Node.hs
This will prepare for the runtime-selectable field list.
Change the default dynamic usage to baseUtil
This fixes the unbalanced secondary instances on partially empty clusters, and helps in general for the cases where real utilisation data is not available.
Add a few comments in the scoring function
Enhance the error reporting for Rapi and Luxi
Currently the JSON conversions in Rapi and Luxi give something like: Error: failed to load data. Details: Unable to read Double
This doesn't tell one where the error is (in a node specification? and...
Change the Utils.fromObj signature
Currently the fromObj function takes a JSON object which is then converted into a list of (String, JSValue) in which we make a lookup. However, most of the callers of this function call it repeatedly on the same object, which means we do the object→list conversion repeatedly....
Rework the tiered spec output format
A small style change in Node.hs
This imports PeerMap as P and reindents some lines.