Add a function converting Placements into Jobs
This converts from htools-specific Placements into Ganeti standard OpCodes, which will later allow execution via Luxi.
Record the move being performed in a Placement
This will allow a more descriptive output later in the solution list, as opposed to trying to reconstruct the move from the node indices.
The patch also documents the Placement members.
Split the Luxi generic parts from the loader
The Luxi loader implements both a generic Ganeti Luxi client and the loader; it is better if these two are separated. The patch adds a Ganeti/Luxi.hs (not under HTools!) since that is generic for Ganeti, and not necessarily related to htools.
hbal: Implement grouping of moves into jobsets
Since moving two instances between different node-quadruples (inst X: A,B → C, D and inst Y: E, F → G, H) can be parallelised by Ganeti, it makes sense to split the operation list into jobsets whose execution...
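The grouping idea can be sketched as follows. This is only an illustration under stated assumptions, not the hbal implementation: the `Move` type, its fields and the greedy `splitJobs` function are hypothetical names, and the rule used here (start a new jobset as soon as a move shares a node with the current one) is one simple way to keep each jobset parallelisable.

```haskell
import Data.List (foldl')

-- | Hypothetical move description: the instance name plus every node
-- the move touches (old and new primary/secondary).
data Move = Move
  { moveInst  :: String
  , moveNodes :: [Int]
  } deriving (Show, Eq)

-- | A jobset is a group of moves that touch disjoint nodes.
type JobSet = [Move]

-- | True if the move shares at least one node with a move in the jobset.
conflicts :: Move -> JobSet -> Bool
conflicts m = any (any (`elem` moveNodes m) . moveNodes)

-- | Greedy, order-preserving split: a move joins the current jobset
-- unless it conflicts with it, in which case a new jobset is started.
splitJobs :: [Move] -> [JobSet]
splitJobs = reverse . map reverse . foldl' step []
  where
    step [] m = [[m]]
    step (cur:done) m
      | conflicts m cur = [m] : cur : done
      | otherwise       = (m : cur) : done
```

With moves X on nodes 1-4, Y on nodes 5-8 and Z on nodes 1 and 9, this puts X and Y into one jobset and Z into the next.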
Change ExtLoader to only handle I/O errors
Due to the Control.Exception changes between 6.8 and 6.10, using it portably is difficult. Since we're only interested in handling I/O errors, we can use prelude's catch and not have to deal with Control.Exception at all....
Brown-paper-bag release fixing haddock issues
Haddock doesn't like pre-processed files (at least not in all versions). Thus we need to remove the ExtLoader module from the haddock-processed file list.
Brown bag fix: invert a test
During testing I used the test inversely to see it triggers correctly, and committed by mistake the inverted test. Fixing it.
Turn on, and fix, more warnings
The Makefile was intended to be -Wall and not simply -W, but I missed that. This enables more warnings and also enables -Werror (except for the tests).
Add support for building without curl
Since curl is not always needed (e.g. when only using luxi or, less likely, file backends only) and is also not always available, it is useful to be able to build without it. This of course disables the RAPI backend.
This patch changes ExtLoader to build with the ‘-cpp’ option which makes...
Split the external data loader out of CLI.hs
Currently the external data loader is in CLI.hs, which makes all programs that need cli functionality (options, etc.) link against the network modules (most importantly curl). This patch splits this functionality into a new module such that (for example) hail which only...
Fix luxi recvMsg for messages bigger than 4K
This patch fixes a logic bug in luxi that breaks receive of messages bigger than 4096 bytes. The send message is not impacted as it uses a different algorithm.
Add some more instance tests
This includes instance text load tests.
Test some cases for the cluster score computation
Split the balancing algorithm in two parts
Currently the computation, recursing part and the IO part (progress updates) of the balancing main function (iterateDepth) are all in the same function, which makes it hard to test. This patch moves the decision/computation part (whether to proceed one more round, whether we...
Implement support for 'cheap' moves only
This patch adds support for cheap (failover/migrate) operations only in the balancing algorithm and in the hbal command line options.
This allows a very quick balancing (compared to allowing replace-disks) which can be useful as a scheduled operation.
Simplify the wrapIO function
This fixes one warning from hlint.
Use migrate or failover based on instance state
While we can't guarantee that the instance will be in the same state by the time the migrate/failover command is run, we can at least try to do the right thing assuming no other changes to the cluster state....
Improve the error message for command line errors
Instead of using ioError . userError, we format the error ourselves. This is nicer - no ‘)’ at the end of the output.
Add a simulated cluster data loader
This is useful especially for hspace, where we might want to simulate a hypothetical cluster to check allocation beforehand.
CLI: Handle error better
This patch adds an error handler for any exceptions that are raised during the external data load phase. This can be improved further, but it's a good start.
Unify the command line options and structures
This patch moves all the command line options and their internal representation into CLI.hs. This means that duplicated options between any two binaries are no longer declared twice, and that we no longer need the two *Option classes.
Fix a few hlint errors
CLI: Prevent incompatible options from being selected
This patch makes CLI abort if more than one backend is selected.
Add support for luxi backend in CLI/hspace/hbal
This patch changes the backend selection method in CLI to prefer, in order:
  - a RAPI specification
  - a Luxi specification
  - and finally the node/instance files
It also modifies hspace and hbal to provide a ‘-L’ command line option...
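The preference order described above can be sketched as a small pure function. The `Options` record, its field names and the `Backend` constructors are hypothetical and only serve the illustration; the real CLI option types are not shown in this log.

```haskell
-- | Hypothetical subset of the parsed command line options.
data Options = Options
  { optRapi     :: Maybe String    -- ^ RAPI endpoint, if given
  , optLuxi     :: Maybe FilePath  -- ^ Luxi socket path, if given
  , optNodeFile :: FilePath        -- ^ node data file
  , optInstFile :: FilePath        -- ^ instance data file
  }

-- | Hypothetical backend descriptor.
data Backend
  = Rapi String
  | Luxi FilePath
  | Files FilePath FilePath
  deriving (Show, Eq)

-- | Prefer RAPI, then Luxi, and fall back to the node/instance files.
chooseBackend :: Options -> Backend
chooseBackend o =
  case (optRapi o, optLuxi o) of
    (Just url, _)        -> Rapi url
    (Nothing, Just sock) -> Luxi sock
    (Nothing, Nothing)   -> Files (optNodeFile o) (optInstFile o)
```

Keeping the choice in a pure function makes the ordering easy to test independently of the actual data loading.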
Initial commit of the luxi backend
This patch adds a luxi backend that allows direct query of the master daemon on the local node. This patch doesn't enable the backend to be used.
There are a couple of things still missing in the implementation:
  - we don't have a master timeout in reads and writes, only a...
Introduce timeout in RAPI queries
The patch adds two constants in Types.hs for connect and query timeout, then modifies Rapi.hs to use them as the connect and general curl timeout.
Rapi could be improved more, as currently we wait double the total timeout due to not aborting early in case the node queries failed.
Fix a haddock issue
hspace: fix failure handling of tryAlloc results
Currently hspace doesn't handle failures from tryAlloc correctly; this patch changes the iterateDepth function in hspace to return a Result (…) so that errors can be propagated correctly.
The patch also changes one output key to be more clear and a typo in...
Change the tryAlloc/tryReloc workflow
Currently, the tryAlloc and tryReloc functions return a list with all the results, both failures and successes. This is fine for hail, which does one round of allocations, but is not so good for hspace, which does iterative rounds; since at each (successful) step we only take the best...
Simplify the Cluster.tryAlloc structures
Currently the tryAlloc function calls the allocateOnSingle/allocateOnPair and then builds a new tuple with those functions' results plus the new node list. This is however suboptimal in two respects:
  - the new nodes added are the 'old' versions of the respective nodes,...
Slight change to the internal allocation results
Currently the Cluster.AllocSolution type is defined as a list of ‘(OpResult Node.List, …)’ and the results for applyMove are defined as ‘(OpResult Node.List, …)’. Both of these mean that the failure/success indication is hidden in the first element of the tuple, which makes it...
hspace: switch output to shell-script format
This (big) patch changes the output of hspace from text-format (separated by ‘: ’) to a shell-snippet, in ‘key=value’ format.
This will allow sourcing the output or parsing it via awk/sed/etc.
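A minimal sketch of producing such ‘key=value’ output is shown here. The function names are hypothetical, and the single-quote escaping is an assumption added so that the result is safe to source from a shell; the log does not say whether hspace quotes its values.

```haskell
-- | Quote a value for POSIX shells: wrap it in single quotes and
-- replace each embedded single quote with the sequence '\''.
-- (The quoting is an assumption of this sketch.)
shellQuote :: String -> String
shellQuote s = "'" ++ concatMap esc s ++ "'"
  where
    esc '\'' = "'\\''"
    esc c    = [c]

-- | Render key/value pairs as one KEY='value' line each.
formatKeys :: [(String, String)] -> String
formatKeys = unlines . map line
  where
    line (k, v) = k ++ "=" ++ shellQuote v
```

The keys are emitted unchanged, so the caller is expected to pass valid shell variable names.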
hspace: move instance count and score into CStats
Currently the instance count and cluster score are separated from the other initial/final phase stats, even though they are very similar. This patch moves computation of these two into totalResources/CStats and...
Fix unittests
The recent OpResult and CPU values additions broke unittests.
Export more stats in hspace
This patch changes Cluster.totalResources to compute more resources and prints them in hspace.
Show errors on stderr instead of stdout
Currently many of the exit and warning conditions mistakenly display error messages on stdout, which makes parsing the output of programs harder. This patch attempts to fix such occurrences.
Fix score calculation to work with empty clusters
Currently the cluster score calculation includes an offline instance percentage, expressed as “offline inst / (offline + online inst)”, which results in NaN for empty clusters. This patch changes the calculation...
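The NaN comes from evaluating 0/0 in floating point when the cluster has no instances. One possible guard is sketched below; the function name is hypothetical, and treating an empty cluster as having a ratio of 0 is an assumption of this sketch, since the log entry is cut off before it describes the actual change.

```haskell
-- | Fraction of instances that are offline.
-- Assumption: an empty cluster counts as 0 instead of dividing 0 by 0.
offlineRatio :: Int -> Int -> Double
offlineRatio offline online
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total
  where
    total = offline + online
```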
Optimize the Utils.stdDev function
This patch optimizes the stdDev function in two respects:
  - first, we don't do sum . map which builds an intermediate list, but instead use a fold over the list to build incrementally the sum; this should reduce both the time and space characteristics, as we...
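The fold-based approach from the first point can be sketched as below. This is not the Utils.stdDev code itself: it assumes the population form of the standard deviation (dividing by the number of elements), and it returns NaN for an empty list.

```haskell
import Data.List (foldl')

-- | Population standard deviation using strict folds instead of
-- sum . map, so no intermediate list is built.
stdDev :: [Double] -> Double
stdDev xs = sqrt (sqSum / n)
  where
    -- One pass for both the sum and the element count.
    (total, count) = foldl' step (0, 0) xs
    step (s, c) x =
      let s' = s + x
          c' = c + 1
      in s' `seq` c' `seq` (s', c')
    n = fromIntegral (count :: Int)
    mean = total / n
    -- Second pass: accumulate the squared deviations from the mean.
    sqSum = foldl' (\acc x -> acc + (x - mean) ^ (2 :: Int)) 0 xs
```

The explicit `seq` calls force both components of the accumulator at each step, which `foldl'` alone would not do for a tuple.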
Take the foldl out of Loader.fixNodes
Currently Loader.fixNodes is foldl' with a complicated function. It makes more sense to take foldl' out of this function (and put it into the caller) and let fixNodes be only this internal function.
Simplify Cluster.computeMoves
This patch changes the function Cluster.computeMoves to use guards and a couple of subexpressions in order to greatly simplify it.
Fix hlint-generated warnings
This big patch cleans up the code per hlint indications. Many removals of extra parentheses, replacements of concat . map with concatMap, extra dollar signs, eta reductions, etc. were performed.
The code still compiles and passes a couple of manual tests on sample...
Add computation of the failure reason in hspace
This patch enhances hspace to report why the allocation sequence stopped, both in absolute error count and for the top reason.
Return correct failure data from Node.add*
This patch alters the Node.addPri/addSec to return correct failure data. It removes the computeFailN1 function from the module as that used to combine both mem and disk checks in the same function and thus the real...
Introduce a new type for allocation results
Currently the allocation/move operations workflow returns ‘Maybe a’, which is very convenient but loses all details about the failure mode.
This patch introduces a new data type which encodes the specific failure...
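To illustrate the difference from ‘Maybe a’, here is a sketch of a result type that carries the failure mode. The constructor names, the `FailMode` values and the toy `addInstance` function are assumptions made for this example; the entry above is truncated before it names the actual type.

```haskell
-- | Hypothetical failure modes for an allocation or move.
data FailMode = FailMem | FailDisk | FailCPU
  deriving (Eq, Show)

-- | Like Maybe, but the failure case records why the operation failed.
data OpResult a = OpFail FailMode | OpGood a
  deriving (Eq, Show)

instance Functor OpResult where
  fmap _ (OpFail e) = OpFail e
  fmap f (OpGood x) = OpGood (f x)

instance Applicative OpResult where
  pure = OpGood
  OpFail e <*> _ = OpFail e
  OpGood f <*> r = fmap f r

instance Monad OpResult where
  OpFail e >>= _ = OpFail e
  OpGood x >>= f = f x

-- | Toy node: (free memory, free disk).
type Node = (Int, Int)

-- | Place an instance needing (memory, disk) on a node, reporting
-- which resource was insufficient on failure.
addInstance :: Node -> (Int, Int) -> OpResult Node
addInstance (fmem, fdsk) (imem, idsk)
  | imem > fmem = OpFail FailMem
  | idsk > fdsk = OpFail FailDisk
  | otherwise   = OpGood (fmem - imem, fdsk - idsk)
```

Because the type is a monad, chained operations stop at the first failure and keep its reason, which is what a caller needs in order to count failures per cause.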
Remove hn1 and related code
hn1 was deprecated for a while and this patch removes it altogether. The support code in Cluster.hs is also removed.
Fix totalResources avail disk computation
This uses the newly-added Node.availDisk to compute the actual available disk correctly, and display the total allocatable disk in hspace.
Add an availDisk node function
This function returns the amount of available disk, which depends on whether a low disk limit has been configured or not and on the free disk space of the node.
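A minimal sketch of such a function follows. The `Node` record and its field names are hypothetical, and the optional limit is modelled with `Maybe` purely for illustration; the real node type may encode "no limit" differently.

```haskell
-- | Hypothetical, reduced node record.
data Node = Node
  { freeDisk  :: Int        -- ^ free disk space on the node
  , loDiskLim :: Maybe Int  -- ^ minimum disk to keep free, if configured
  }

-- | Disk that can still be allocated: all free disk when no limit is
-- configured, otherwise only the part above the limit (never negative).
availDisk :: Node -> Int
availDisk n =
  case loDiskLim n of
    Nothing -> freeDisk n
    Just lo -> max 0 (freeDisk n - lo)
```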
Add two new autocomputed vars to Nodes
Currently we track the max disk usage/max vcpus as percentages, however sometimes it's easier to check against minimum free disk or maximum number of cpus, as units instead of percentages.
This patch adds two new variables, lo_dsk, hi_cpu, which are recomputed...
Add a new type for cluster statistics
Currently totalResources returns a 5-tuple of integers. This is not easy to handle, as each change on the return type means that each caller must be updated.
This patch adds a new type for cluster stats and uses that instead as...
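The benefit of a record over a tuple can be sketched as follows. The field names and the reduced `NodeInfo` input type are hypothetical; the log does not list which five values the tuple held.

```haskell
import Data.List (foldl')

-- | Hypothetical per-node input data.
data NodeInfo = NodeInfo
  { nFreeMem   :: Int
  , nFreeDisk  :: Int
  , nTotalMem  :: Int
  , nTotalDisk :: Int
  }

-- | Cluster-wide statistics as a record: adding a field later does not
-- break callers that only read the existing ones.
data CStats = CStats
  { csFmem  :: Int  -- ^ total free memory
  , csFdsk  :: Int  -- ^ total free disk
  , csTmem  :: Int  -- ^ total memory
  , csTdsk  :: Int  -- ^ total disk
  , csNodes :: Int  -- ^ number of nodes counted
  } deriving (Show, Eq)

emptyCStats :: CStats
emptyCStats = CStats 0 0 0 0 0

-- | Fold one node into the running statistics.
updateCStats :: CStats -> NodeInfo -> CStats
updateCStats cs n = cs
  { csFmem  = csFmem cs + nFreeMem n
  , csFdsk  = csFdsk cs + nFreeDisk n
  , csTmem  = csTmem cs + nTotalMem n
  , csTdsk  = csTdsk cs + nTotalDisk n
  , csNodes = csNodes cs + 1
  }

totalResources :: [NodeInfo] -> CStats
totalResources = foldl' updateCStats emptyCStats
```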
Add display of more stats in hspace
This patch changes Cluster.totalResources to compute more details about the cluster status, and enhances hspace to display more of these.
Fix a haddock/docstring issue
Implement cpu/disk limits in instance moves
We modify Node.addPri/addSec to take into account the limits on instance adds.
Add two new node attributes
Two new min disk free ratio and max cpu usage attributes are added to the nodes. These will be used in the future to restrict allocation.
Fix 'unused X' warnings
This removes some unused functions and imports to clean up the warnings.
Fix the various monomorphism warnings
In a few places (e.g. tryRead or any printf call) it's a little bit hard to add the correct type signatures, but in the end it is possible to fix these warnings (which can bite one in subtle cases).
Small changes to the node list output
This is just some cleanup of the node list output, adding pcpu/vcpu counters, and making the display slightly nicer.
Add cpu ratio to cluster calculation
Update cpu counters correctly after pinst changes
The cpu counters are updated on primary instance adds/removes.
Add cpu-count-related attributes to nodes
This patch adds cpu-count related attributes to nodes:
  - total cpus
  - cpus in use
  - ratio of virtual:physical cpus
We also set correctly the cpu values at load time, but we don't do anything yet while moving instances around. The cpu ratio is shown in...
Add a new vcpus attribute to instances
This patch adds reading of vcpu count for instances, in preparation for using the vcpu ratio in cluster scoring.
Fix reading of total disk space in iallocator
IAllocator currently uses a wrong key name for reading the total disk space (‘disk_usage’ which was copied from RAPI, but the actual iallocator key is ‘disk_space_total’).
This patch fixes that and also makes iallocator always use this key,...
Fix the ReplacePrimary instance move
During a replace-primary instance move, on the real cluster the instance is temporarily started on the secondary, and as such we must check that the secondary node can hold it for this duration. Currently the code does not, and depending on cluster scoring it will put instances on such...
Rework the tryAlloc/tryReloc functions
Currently tryAlloc/tryReloc do not return the new instance, as this is not needed for IAllocator alloc/reloc requests. However, for computing the space, the new instance is useful, so we modify these functions to return this information too....
Add a utility function for triples
Small doc change
And an alignment issue.
Ensure consistent naming of the tools
This patch makes sure that all references to the name of the software are ganeti-htools, not simply htools.
Add copyright/license information
This doc-patch adds copyright and license information to (hopefully) all needed files.
tests: move the test declaration in QC.hs
This patch moves the test declaration into QC.hs, so that test.hs has to be modified only when we add a new test category.
Move some alloc functions from hail into Cluster
These are generic enough to be used from multiple places, they belong better in Cluster.hs than in the hail source.
Small whitespace change
Move the RqType and Request types to Loader.hs
These two will be more generic than now, and belong somewhere else - Loader.hs is a generic module for data loading, thus we move them there.
Cleanup an old function
Also replace a type with its synonym.
Lots of documentation updates
This patch does only doc build changes, doc changes and function move around (for more logical documentation). It should have no impact at all on the code.
A simple test for Container.addTwo
Add some very trivial Instance tests
This is more of an exercise in QuickCheck than strong testing.
Finish removal of unused params from PeerMap
This completes the removal started earlier by removing the need to pass the number of nodes to Node.buildPeers, which is now unused.
Add test infrastructure and initial tests
This patch adds a QuickCheck-based test infrastructure and initial tests based on it. The PeerMap module has a 100% coverage ☺
Side-note: one has to read the source of QuickCheck to see how to use it (especially the Batch submodule), the docs are not enough…
Some cleanup of the PeerMap module
This patch removes some unused functions and does some cleanup of the remaining ones.
Remove unused parameters from PeerMap creation
We remove some unused arguments (added way back for compatibility with Arrays, which we didn't use in the end). This makes the code clearer (and doesn't need the Ndx type to be an instance of Num).
Remove an unused type synonym
Add a separate function for looking up instances
Currently we (wrongly) use lookupNode to lookup instances, just because the name assoc list has the same type. This patch adds a separate function for it.
Add type synonyms for the node/instance indices
This is a first step towards full datatype renaming. That requires more changes, so at first we only want to document clearly what is a node index, what is an instance index, and what is a plain Int.
Change the module import hierarchy
This patch makes the Types module a base module, and Node/Instance ones import it, from the previous (opposite) situation. This will allow in the future to use newtypes for the index and name types.
Port offline node fixes from Rapi to IAllocator
The IAllocator source was copied from Rapi before the offline node fixes were made. This changes it so that offline nodes are accepted correctly.
Fix loading of plain instances via iallocator
Currently iallocator is broken when reading single-node instances (and with an ugly error message). This patch fixes this case, by marking them with secondary node “noSecondary” like the rest of the code.
Fix some haddock issues
Slash is a reserved char. Slash is a reserved char. Slash is a…
hail: do not allocate on offline/drained nodes
This patch implements filtering out of the offline/drained nodes and fixes a bug in IAllocator.hs parsing (similar to an older bug in Rapi.hs from where the code was copied).
hail: Implement non-mirrored instance allocation
This patch implements non-mirrored instance allocation, by allocating as secondary node “noSecondary”.
Implement hail allocate (for 2-node requests)
This patch implements allocate for two-node requests. One-node requests can be done as soon as we have a valid allocateOn function for single nodes.
Working implementation of relocate
This patch completes the implementation of hail relocate. It maps all valid destination nodes through a ReplaceSecondary IMove, filters out the failed relocations, computes the resulting scores and picks the lowest one.
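The map/filter/score/pick sequence described for relocate can be sketched generically. The `pickBest` function and its parameters are hypothetical: `tryMove` stands in for applying a ReplaceSecondary move to a candidate node and `score` for the cluster scoring function, neither of which is shown in this log.

```haskell
import Data.List (minimumBy)
import Data.Maybe (mapMaybe)
import Data.Ord (comparing)

-- | Try every destination, drop the failed attempts, score the
-- resulting clusters and return the candidate with the lowest score.
-- Returns Nothing when no destination is feasible.
pickBest :: (node -> Maybe cluster)   -- ^ attempt the move (assumed)
         -> (cluster -> Double)       -- ^ cluster score, lower is better
         -> [node]                    -- ^ candidate destination nodes
         -> Maybe (node, cluster, Double)
pickBest tryMove score dests =
  case mapMaybe attempt dests of
    []    -> Nothing
    cands -> Just (minimumBy (comparing third) cands)
  where
    attempt d = do
      c <- tryMove d
      return (d, c, score c)
    third (_, _, s) = s
```

Handling the empty candidate list explicitly avoids calling `minimumBy` on an empty list, which would raise an error.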
Start implementing the hail functionality
This patch implements a very stupid (and broken) version of hail ‘allocate’.
Remove ktn/kti from first half of loader
This patch removes the ktn/kti lists from most parts of the first half of the loading sequence. Some remain as the [(String, Int)] is the nicest way to lookup names and get indices back.
Remove the ktn/kti from second half of loading
This removes the return of ktn/kti from Loader.mergeData and associated functions.
Remove most uses of ktn/kti
This patch removes all uses of ktn/kti from the past-loader stages.
Add some utility functions for kt deprecation
These will be used to remove even more uses of ktn/kti in non-critical paths.
Remove some extraneous uses of ktn/kti
Since we have Node/Instance.name, we can now simplify a few constructs.
Strip the suffix from the names in the objects
This strips the suffix from the objects themselves, not only from the ktn/kti vars.
Make IAlloc.loadData return maps
This patch makes the format of IAlloc.loadData the same as that of Loader.mergeData.
Move common loading sequence in CLI
This patch moves the common loading sequence to CLI, such that hbal/hn1 and possible future scripts that take the input from the same sources can use it.
Move checkData from Cluster to Loader
This moves the remaining loading function to Loader (together with its associated support functions).