Split the balancing algorithm in two parts
Currently the computation, recursing part and the IO part (progress updates) of the balancing main function (iterateDepth) are all in the same function, which makes it hard to test. This patch moves the decision/computation part (whether to proceed one more round, whether we...
Implement support for 'cheap' moves only
This patch adds support for cheap (failover/migrate) operations only in the balancing algorithm and in the hbal command line options.
This allows a very quick balancing (compared to allowing replace-disks) which can be useful as a scheduled operation.
Simplify the wrapIO function
This fixes one warning from hlint.
Use migrate or failover based on instance state
While we can't guarantee that the instance will be in the same state by the time the migrate/failover command will be run, we can at least try to do the right thing assuming no other changes to the cluster state....
Update NEWS file for the 0.1.6 release
Improve the error message for command line errors
Instead of using ioError . userError, we format the error ourselves. This is nicer - no ‘)’ at the end of the output.
Add a simulated cluster data loader
This is useful especially for hspace, where we might want to simulate a hypothetical cluster to check allocation beforehand.
Fix a typo in hbal.hs
Signed-off-by: Guido Trotter <ultrotter@google.com>
CLI: Handle error better
This patch adds an error handler for any exceptions that are raised during the external data load phase. This can be improved further, but it's a good start.
Unify the command line options and structures
This patch moves all the command line options and their internal representation into CLI.hs. This means that duplicated options between any two binaries are no longer declared twice, and that we no longer need the two *Option classes.
Document the --vcpus option to hspace
Fix a few hlint errors
Man page updates
This patch beautifies the man pages for hbal and hspace.
CLI: Prevent incompatible options from being selected
This patch makes CLI abort if more than one backend is selected.
Update documentation for the new luxi backend
Add support for luxi backend in CLI/hspace/hbal
This patch changes the backend selection method in CLI to prefer, in order:
  - a RAPI specification
  - a Luxi specification
  - and finally the node/instance files
It also modifies hspace and hbal to provide a ‘-L’ command line option...
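The preference order above can be sketched as follows. This is only an illustration: the Options record, its field names and the Backend type are hypothetical and are not taken from CLI.hs, and representing the Luxi specification as a socket path is an assumption.

```haskell
-- Hypothetical option record; the field names are illustrative only.
data Options = Options
  { optRapi     :: Maybe String    -- RAPI specification, if given
  , optLuxi     :: Maybe FilePath  -- Luxi specification (assumed: a socket path)
  , optNodeFile :: FilePath        -- node data file
  , optInstFile :: FilePath        -- instance data file
  }

-- Hypothetical result type naming the backend that will be used.
data Backend
  = UseRapi String
  | UseLuxi FilePath
  | UseFiles FilePath FilePath
  deriving (Show, Eq)

-- Prefer RAPI, then Luxi, and only then fall back to the data files.
selectBackend :: Options -> Backend
selectBackend opts =
  case (optRapi opts, optLuxi opts) of
    (Just spec, _)       -> UseRapi spec
    (Nothing, Just sock) -> UseLuxi sock
    (Nothing, Nothing)   -> UseFiles (optNodeFile opts) (optInstFile opts)
```

With this shape, an options value that carries both a RAPI and a Luxi specification silently resolves to RAPI; rejecting such combinations would need a separate check.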
Initial commit of the luxi backend
This patch adds a luxi backend that allows direct query of the master daemon on the local node. This patch doesn't enable the backend to be used.
There are a couple of things still missing in the implementation:
  - we don't have a master timeout in reads and writes, only a...
Introduce timeout in RAPI queries
The patch adds two constants in Types.hs for connect and query timeout, then modifies Rapi.hs to use them as the connect and general curl timeout.
Rapi could be improved more, as currently we wait double the total timeout due to not aborting early in case the node queries failed.
Update NEWS file for the 0.1.5 release
This is basically a hspace release, so the changelog is small.
Fix a haddock issue
hspace: fix failure handling of tryAlloc results
Currently hspace doesn't handle failures from tryAlloc correctly; this patch changes the iterateDepth function in hspace to return a Result (…) so that errors can be propagated correctly.
The patch also changes one output key to be more clear and a typo in...
Change the tryAlloc/tryReloc workflow
Currently, the tryAlloc and tryReloc functions return a list with all the results, both failures and successes. This is fine for hail, which does one round of allocations, but is not so good for hspace, which does iterative rounds; since at each (successful) step we only take the best...
Simplify the Cluster.tryAlloc structures
Currently the tryAlloc function calls allocateOnSingle/allocateOnPair and then builds a new tuple with those functions' result plus the new node list. This is however suboptimal in two respects:
  - the new nodes added are the 'old' versions of the respective nodes,...
Slight change to the internal allocation results
Currently the Cluster.AllocSolution type is defined as a list of ‘(OpResult Node.List, …)’ and the results for applyMove are defined as ‘(OpResult Node.List, …)’. Both of these mean that the failure/success indication is hidden in the first element of this tuple, which makes it...
Add a 'tags' makefile target
This uses hasktags for building emacs TAGS.
hspace: switch output to shell-script format
This (big) patch changes the output of hspace from text-format (separated by ‘: ’) to a shell-snippet, in ‘key=value’ format.
This will allow sourcing the output or parsing it via awk/sed/etc.
hspace: move instance count and score into CStats
Currently the instance count and cluster score are separated from the other initial/final phase stats, even though they are very similar. This patch moves computation of these two into totalResources/CStats and...
Fix unittests
The recent OpResult and CPU values additions broke unittests.
Export more stats in hspace
This patch changes Cluster.totalResources to compute more resources and prints them in hspace.
Show errors on stderr instead of stdout
Currently many of the exit and warning conditions mistakenly display error messages on stdout, which makes parsing the output of programs harder. This patch attempts to fix such occurrences.
Fix score calculation to work with empty clusters
Currently the cluster score calculation includes an offline instance percentage, expressed as “offline inst / (offline + online inst)”, which results in NaN for empty clusters. This patch changes the calculation...
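The NaN problem described above is easy to reproduce in isolation. The sketch below is not the code from this patch: the function names are hypothetical, and the guard shown is just one possible way to avoid the division by zero, since the actual change is not spelled out in this entry.

```haskell
-- The ratio as described in the message: offline / (offline + online).
-- With no instances at all this evaluates 0 / 0, which is NaN for Double.
offlineRatio :: Int -> Int -> Double
offlineRatio offline online =
  fromIntegral offline / fromIntegral (offline + online)

-- One possible guard (an assumption, not necessarily what the patch does):
-- treat an empty cluster as having no offline instances.
safeOfflineRatio :: Int -> Int -> Double
safeOfflineRatio offline online
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total
  where
    total = offline + online
```

Because any arithmetic involving NaN yields NaN, a single empty-cluster term like this would poison the whole cluster score, which is why the case needs explicit handling.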
hspace: convert N1 error exit into FailN1 result
Currently hspace exits with an error if the cluster is not N+1 compliant at the beginning of the run. This patch changes hspace such that this condition is instead treated as a zero-allocation-possible, FailN1 mode....
Some docstring updates
hspace: add display of instance spec
This is mostly for user-friendliness in the default mode, when we don't specify the instance parameters.
Optimize the Utils.stdDev function
This patch optimizes the stdDev function in two respects:
  - first, we don't do sum . map which builds an intermediate list, but instead use a fold over the list to build incrementally the sum; this should reduce both the time and space characteristics, as we...
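The first point can be illustrated with the sketch below. It is not the Utils.stdDev source: both function names are hypothetical, and dividing by n (population standard deviation) is an assumption. Only the "fold instead of sum . map" idea comes from the message above.

```haskell
import Data.List (foldl')

-- Variant using sum . map, as criticised in the message above.
stdDevNaive :: [Double] -> Double
stdDevNaive xs = sqrt (sum (map (\x -> (x - mean) * (x - mean)) xs) / n)
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n

-- Variant accumulating the squared deviations with a strict left fold.
-- Assumes a non-empty list; an empty list yields NaN.
stdDevFold :: [Double] -> Double
stdDevFold xs = sqrt (sqSum / n)
  where
    n     = fromIntegral (length xs)
    mean  = sum xs / n
    sqSum = foldl' (\acc x -> acc + (x - mean) * (x - mean)) 0 xs
```

Both variants compute the same value; the fold version simply keeps a single strict accumulator while it walks the list.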
Take the foldl out of Loader.fixNodes
Currently Loader.fixNodes is foldl' with a complicated function. It makes more sense to take foldl' out of this function (and put it into the caller) and let fixNodes be only this internal function.
Simplify Cluster.computeMoves
This patch changes the function Cluster.computeMoves to use guards and a couple of subexpressions in order to greatly simplify it.
Fix hlint-generated warnings
This big patch cleans up the code per hlint indications. Many removals of extra parentheses, replacements of concat . map with concatMap, extra dollar signs, eta reductions, etc. were performed.
The code still compiles and passes a couple of manual tests on sample...
Add computation of the failure reason in hspace
This patch enhances hspace to report why the allocation sequence stopped, both in absolute error count and for the top reason.
Return correct failure data from Node.add*
This patch alters the Node.addPri/addSec to return correct failure data. It removes the computeFailN1 function from the module as that used to combine both mem and disk checks in the same function and thus the real...
Introduce a new type for allocation results
Currently the allocation/move operations workflow returns ‘Maybe a’, which is very convenient but loses all details about the failure mode.
This patch introduces a new data type which encodes the specific failure...
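The difference between ‘Maybe a’ and a result type that records the failure mode can be sketched as below. The OpResult name appears elsewhere in this log and FailN1 is mentioned for hspace; the OpFail/OpGood constructors, the other failure modes and the addInstance helper are assumptions made for illustration only.

```haskell
-- Illustrative failure modes; only FailN1 is named in this log,
-- the other constructors are assumptions.
data FailMode = FailMem | FailDisk | FailN1
  deriving (Show, Eq)

-- A result that is either a failure with its reason, or a success.
-- The constructor names are assumptions.
data OpResult a = OpFail FailMode | OpGood a
  deriving (Show, Eq)

-- Hypothetical check: try to place an instance needing (needMem, needDisk)
-- on a node with (freeMem, freeDisk), returning the remaining resources.
addInstance :: Int -> Int -> Int -> Int -> OpResult (Int, Int)
addInstance freeMem freeDisk needMem needDisk
  | needMem > freeMem   = OpFail FailMem
  | needDisk > freeDisk = OpFail FailDisk
  | otherwise           = OpGood (freeMem - needMem, freeDisk - needDisk)
```

With ‘Maybe’, both failing branches would collapse into Nothing; here the caller can tell a memory failure from a disk failure and count them separately.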
Remove hn1 and related code
hn1 was deprecated for a while and this patch removes it altogether. The support code in Cluster.hs is also removed.
Display two more stats in hspace
This adds two new stats - sum of reserved ram and disk.
Fix totalResources avail disk computation
This uses the newly-added Node.availDisk to compute the actual available disk correctly, and displays the total allocatable disk in hspace.
Add an availDisk node function
This function returns the amount of available disk, which depends on whether a low disk limit has been configured or not and on the free disk space of the node.
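A minimal sketch of such a function follows. The Node record here is hypothetical (the field names and the use of Maybe for "no limit configured" are assumptions), and clamping the result at zero when the free space is below the limit is also an assumption.

```haskell
-- Hypothetical node record; only the two fields needed here.
data Node = Node
  { freeDisk     :: Int        -- free disk space on the node
  , lowDiskLimit :: Maybe Int  -- Nothing means no low disk limit is set
  }

-- Disk that can still be allocated on the node.
-- Without a limit this is simply the free disk; with a limit, only the
-- space above the limit counts, and never less than zero.
availDisk :: Node -> Int
availDisk node =
  case lowDiskLimit node of
    Nothing    -> freeDisk node
    Just limit -> max 0 (freeDisk node - limit)
```

Summing this value over all nodes would give a total allocatable disk figure rather than the raw free disk.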
Add two new autocomputed vars to Nodes
Currently we track the max disk usage/max vcpus as percentages, however, sometimes it's easier to check against minimum free disk or maximum number of cpus, as units instead of percentages.
This patch adds two new variables, lo_dsk, hi_cpu, which are recomputed...
Add a new type for cluster statistics
Currently totalResources returns a 5-tuple of integers. This is not easy to handle, as each change on the return type means that each caller must be updated.
This patch adds a new type for cluster stats and uses that instead as...
Enhance hspace resource display
The display of cluster resources is extracted into a separate function and enhanced to display more stats.
Add display of more stats in hspace
This patch changes Cluster.totalResources to compute more details about the cluster status, and enhances hspace to display more of these.
Update NEWS file for the 0.1.4 release
Fix a haddock/docstring issue
Fix some hscan bugs
Currently hscan has a number of bugs:
  - doesn't add the common suffix (csf) to the instance's nodes
  - doesn't export the cpus for either nodes or instances
  - doesn't support single-node instances
This patch fixes these issues.
Some documentation updates for the new parameters
Add cpu/disk limits in hbal
Add setting of node limits in hspace
Implement cpu/disk limits in instance moves
We modify Node.addPri/addSec to take into account the limits on instance adds.
Add two new node attributes
Two new attributes, min disk free ratio and max cpu usage, are added to the nodes. These will be used in the future to restrict allocation.
Fix 'unused X' warnings
This removes some unused functions and imports to cleanup the warnings.
Fix the various monomorphism warnings
In a few places (e.g. tryRead or any printf call) it's a little bit hard to add the correct type signatures, but in the end it is possible to fix these warnings (which can bite one in subtle cases).
Small changes to the node list output
This is just some cleanup of the node list output, adding pcpu/vcpu counters, and making the display slightly nicer.
Add cpu ratio to cluster calculation
Update cpu counters correctly after pinst changes
The cpu counters are updated on primary instance adds/removes.
Add cpu-count-related attributes to nodes
This patch adds cpu-count related attributes to nodes:
  - total cpus
  - cpus in use
  - ratio of virtual:physical cpus
We also set correctly the cpu values at load time, but we don't do anything yet while moving instances around. The cpu ratio is shown in...
Add a new vcpus attribute to instances
This patch adds reading of vcpu count for instances, in preparation for using the vcpu ratio in cluster scoring.
Fix reading of total disk space in iallocator
IAllocator currently uses a wrong key name for reading the total disk space (‘disk_usage’ which was copied from RAPI, but the actual iallocator key is ‘disk_space_total’).
This patch fixes that and also makes iallocator always use this key,...
Update NEWS and README for the 0.1.3 release
Small updates to the documentation and make a new small release.
Fix the ReplacePrimary instance move
During a replace-primary instance move, on the real cluster the instance is temporarily started on the secondary, and as such we must check that the secondary node can hold it for this duration. Currently the code does not, and depending on cluster scoring it will put instances on such...
Update NEWS file for the 0.1.2 release
Update the README file with hspace information
Fix hspace with plain type instances
This also fixes other required node numbers.
Add a man page for hspace
Rework the tryAlloc/tryReloc functions
Currently tryAlloc/tryReloc do not return the new instance, as this is not needed for IAllocator alloc/reloc requests. However, for computing the space, the new instance is useful, so we modify these functions to return this information too....
Add a utility function for triples
Initial add of the hspace tool
This is a tool that checks how many instances (of same size, specified by command line arguments) can be added to a cluster while remaining N+1 compliant.
Small doc change
And an alignment issue.
Ensure consistent naming of the tools
This patch makes sure that all references to the name of the software use ganeti-htools, not simply htools.
Small documentation update
Add copyright/license information
This doc-patch adds copyright and license information to (hopefully) all needed files.
tests: move the test declaration in QC.hs
This patch moves the test declaration into QC.hs, so that test.hs has to be modified only when we add a new test category.
Small whitespace change
Move some alloc functions from hail into Cluster
These are generic enough to be used from multiple places, so they belong in Cluster.hs rather than in the hail source.
Move the RqType and Request types to Loader.hs
These two will be more generic than now, and belong somewhere else - Loader.hs is a generic module for data loading, thus we move them there.
Cleanup an old function
Also replace a type with its synonym.
Lots of documentation updates
This patch only makes doc build changes, doc changes and function moves (for more logical documentation). It should have no impact at all on the code.
Change the check rule in Makefile
Since ghc won't trigger recompilation due to the -fhpc flag, it's not useful to rm && make test, as this will only relink the binary. Therefore we simplify this rule.
A simple test for Container.addTwo
Add some very trivial Instance tests
This is more of an exercise in QuickCheck than strong testing.
Finish removal of unused params from PeerMap
This completes the removal started earlier by removing the need to pass the number of nodes to Node.buildPeers, which is now unused.
Add test infrastructure and initial tests
This patch adds a QuickCheck-based test infrastructure and initial tests based on it. The PeerMap module has 100% coverage ☺
Side-note: one has to read the source of QuickCheck to see how to use it (especially the Batch submodule), the docs are not enough…
Some cleanup of the PeerMap module
This patch removes some unused functions and does some cleanup of the remaining ones.
Remove unused parameters from PeerMap creation
We remove some unused arguments (added way back for compatibility with Arrays, which we didn't use in the end). This makes the code clearer (and doesn't need the Ndx type to be an instance of Num).
Remove an unused type synonym
Add a separate function for looking up instances
Currently we (wrongly) use lookupNode to lookup instances, just because the name assoc list has the same type. This patch adds a separate function for it.
Add type synonyms for the node/instance indices
This is a first step towards full datatype renaming. That requires more changes, so at first we only want to document clearly what is a node index, what is an instance index, and what is a plain Int.
Change the module import hierarchy
This patch makes the Types module a base module, and Node/Instance ones import it, from the previous (opposite) situation. This will allow in the future to use newtypes for the index and name types.
Update NEWS file for the second attempt at 0.1.1
Add a maintainer-clean makefile rule
This splits the current “clean” rule into proper clean (cleaning of build artifacts) and maintainer-clean (cleaning of distributed files). This should make it better for Debian packaging.
Port offline node fixes from Rapi to IAllocator
The IAllocator source was copied from Rapi before the offline node fixes were made. This patch changes it so that offline nodes are accepted correctly.
Update NEWS file for the 0.1.1 release
Also replace tabs in an older entry with spaces.
Fix loading of plain instances via iallocator
Currently iallocator is broken when reading single-node instances (and with an ugly error message). This patch fixes this case, by marking them with secondary node “noSecondary” like the rest of the code.