Read cluster tags in the IAllocator backend
Read cluster tags in the LUXI backend
Read cluster tags in the RAPI backend
This also shows them in hbal in verbose mode.
Introduce support for reading the cluster tags
While these are not actually populated by the backends, and all the programs ignore them, this patch contains the required changes in the function types.
Add a command-line option to filter exclusion tags
Since we don't want all instance tags to be used for exclusion, we add a command line option to filter on these. Since the iallocator protocol cannot accept command line options, currently it's not possible to...
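A minimal sketch of such a filter, assuming the option supplies a list of tag prefixes and that only tags starting with one of them take part in exclusion. The function name `filterExclusionTags` and the prefix-matching rule are hypothetical, not the actual htools option.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical filter: keep only the instance tags that start with one of
-- the user-supplied prefixes. With no prefixes, no tag is used for
-- exclusion, so instances are not excluded by accident.
filterExclusionTags :: [String] -> [String] -> [String]
filterExclusionTags prefixes = filter (\tag -> any (`isPrefixOf` tag) prefixes)
```

For example, with the prefix `service:` the tags `service:web` and `service:db` are kept, while `owner:bob` is ignored.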
Add a new node list field
This patch adds a new node list field (ptags), showing the primary instance tags.
Node: add function for conflicting primary count
Use conflicting primaries count in cluster score
This small patch adds the number of conflicting primaries to the cluster score. This is different from the other non-CV metrics where we usually compute the percentage of failing instances (for that metric); but for a...
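One way to count conflicting primaries, assuming each node keeps a map from exclusion tag to the number of its primary instances carrying that tag (the hypothetical `PTags` type below): a tag shared by k primaries contributes k - 1 conflicts. This is a sketch, not the htools code.

```haskell
import qualified Data.Map as Map

-- Hypothetical per-node map: exclusion tag -> number of primary
-- instances on this node carrying that tag.
type PTags = Map.Map String Int

-- Build the map from the tag lists of a node's primary instances.
buildPTags :: [[String]] -> PTags
buildPTags instTags = Map.fromListWith (+) [(t, 1) | ts <- instTags, t <- ts]

-- A tag carried by k primaries contributes k - 1 conflicts. Summing all
-- counts and subtracting the number of distinct tags gives that total.
conflictingPrimaries :: PTags -> Int
conflictingPrimaries tags = sum (Map.elems tags) - Map.size tags
```

Summing this value over all nodes would give a plain count that can be added to the cluster score, as the commit describes, rather than a percentage.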
Specialize the math functions
The statistics functions are currently defined as polymorphic with a Floating constraint. Changing this to monomorphic on the Double type makes them stricter and much more performant (~70% speedup). This is a cheap way to recoup some of the losses incurred by the recent proliferation of...
Collapse the statistical functions into one
This allows us to get rid of two duplicate list length computations, with a minor speedup.
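A sketch of a combined, Double-only statistics helper: one strict pass accumulates the count, the sum and the sum of squares, so the list length is computed only once. The name `meanAndStdDev` and the choice of population (not sample) standard deviation are assumptions for illustration.

```haskell
import Data.List (foldl')

-- Hypothetical combined helper, monomorphic on Double. A single strict
-- fold collects (count, sum, sum of squares); mean and population
-- standard deviation are derived from these three values.
meanAndStdDev :: [Double] -> (Double, Double)
meanAndStdDev [] = (0, 0)
meanAndStdDev xs =
  let step (n, s, sq) x = n `seq` s `seq` sq `seq` (n + 1, s + x, sq + x * x)
      (cnt, total, squares) = foldl' step (0 :: Int, 0, 0) xs
      n' = fromIntegral cnt
      mean = total / n'
      -- Clamp tiny negative results caused by floating-point rounding.
      variance = max 0 (squares / n' - mean * mean)
  in (mean, sqrt variance)
```

The sum-of-squares formula is cheap but can lose precision when the values are large compared to their spread; a two-pass or Welford-style update trades some speed for better accuracy.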
Introduce tag-based exclusion of primary instances
This patch introduces exclusion of primary instances based on tags. This is incomplete as currently all tags are being excluded, and we don't optimise towards relocation of instances sharing tags on the same node.
Add a tags attribute to instances
… and read it in all the loaders. hscan is modified to save it to the files it generates.
The attribute is not yet used in any place.
Small change in some list arguments
This is simpler than the concat operator.
Allow overriding the field list in -p
The print nodes option can now accept an optional field list to customise the output. This is ugly, since the field names do not match the header names, but it is at least barely customisable (at runtime).
Move more node-listing functionality into Node.hs
This will prepare for the runtime-selectable field list.
Change the default dynamic usage to baseUtil
This fixes the unbalanced secondary instances on partially empty clusters, and helps in general for the cases where real utilisation data is not available.
Add a few comments in the scoring function
Enhance the error reporting for Rapi and Luxi
Currently the JSON conversion in Rapi and Luxi gives something like: Error: failed to load data. Details: Unable to read Double
This doesn't tell one where the error is (in a node specification? and...
Change the Utils.fromObj signature
Currently the fromObj function takes a JSON object which is then converted into a list of (String, JSValue) in which we make a lookup. However, most of the callers of this function call it repeatedly on the same object, which means we do the object→list conversion repeatedly....
Make some CLI options more consistent
Both the simulate and the tiered allocation mode take a machine spec as input via a comma-separated list. This patch makes this a little bit more consistent (always use disk,ram,cpu in this order).
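A sketch of parsing such a spec while validating it, returning an error message instead of failing later. The `ISpec` record, its field names and the non-negative integer rule are hypothetical; only the disk,ram,cpu order comes from the commit.

```haskell
-- Hypothetical instance spec; field order follows disk,ram,cpu.
data ISpec = ISpec { specDisk :: Int, specRam :: Int, specCpu :: Int }
  deriving (Show, Eq)

-- Split a string on commas (no escaping or quoting).
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitCommas rest

-- Parse one non-negative integer, naming the field on error.
readField :: String -> String -> Either String Int
readField name str = case reads str of
  [(v, "")] | v >= 0 -> Right v
  _                  -> Left ("invalid " ++ name ++ " value: " ++ show str)

-- Validate while parsing, so a bad argument is reported immediately.
parseISpec :: String -> Either String ISpec
parseISpec arg = case splitCommas arg of
  [d, r, c] -> ISpec <$> readField "disk" d <*> readField "ram" r <*> readField "cpu" c
  _         -> Left ("expected disk,ram,cpu but got " ++ show arg)
```

Returning `Either` keeps the check inside the option-assignment step, which fits the monadic option parsing described further down in this log.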
hspace: show tiered-alloc stats in the output
This is a first attempt to get a readable output of tiered allocation stats in hspace's output. Not very nice, but it should be somewhat parseable.
A small style change in Node.hs
This imports PeerMap as P and reindents some lines.
Add support for shrinking instance specs
This patch adds a function that, for some given failure modes, shrinks a given instance in the hope that allocation will succeed when retried with the new spec.
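A sketch of such a shrinking step, assuming three failure modes (memory, disk, CPU) and fixed decrements. The types, the name `shrinkByType` and the step sizes are illustrative assumptions, not htools' actual values.

```haskell
-- Hypothetical failure modes and instance spec (memory and disk in MiB).
data FailMode = FailMem | FailDisk | FailCPU
  deriving (Show, Eq)

data Spec = Spec { sMem :: Int, sDisk :: Int, sCpu :: Int }
  deriving (Show, Eq)

-- Shrink only the resource that caused the failure. Nothing means the
-- spec cannot be reduced any further, so the caller should stop retrying.
shrinkByType :: Spec -> FailMode -> Maybe Spec
shrinkByType spec mode = case mode of
    FailMem  | sMem spec > memStep   -> Just (spec { sMem = sMem spec - memStep })
    FailDisk | sDisk spec > diskStep -> Just (spec { sDisk = sDisk spec - diskStep })
    FailCPU  | sCpu spec > 1         -> Just (spec { sCpu = sCpu spec - 1 })
    _                                -> Nothing
  where
    -- Illustrative step sizes, not values taken from htools.
    memStep, diskStep :: Int
    memStep = 128
    diskStep = 1024
```

A tiered allocation loop would call this after each failed attempt and retry with the returned spec until it gets `Nothing`.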
Convert option parsing to a monadic flow
This allows us to do verification of option arguments in the assignment functions themselves.
Rework the instance spec CLI options
This patch reworks the internal handling of the instance spec CLI option, and adds a tiered spec option that will be used in hspace to enable the (auxiliary) tiered-spec allocation mode.
It also introduces a new data type for holding the instance...
Some cleanup of Loader.mergeData
This doesn't need to be a monadic function, let's make it a simpler one.
hbal: ignore unknown instance in dynload file
Since the utilisation file might be generated at a different time from the hbal run, and instances could disappear in the meantime, it's better to simply ignore unknown instances rather than abort.
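A sketch of the "skip instead of abort" lookup, assuming the instances are indexed by name in a map and each utilisation entry is a (name, load) pair. The `NameIndex` type and `resolveUtil` are hypothetical names.

```haskell
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Hypothetical index of known instances: name -> instance index.
type NameIndex = Map.Map String Int

-- Attach each utilisation entry to its instance index; entries naming
-- instances that are no longer known are silently dropped.
resolveUtil :: NameIndex -> [(String, Double)] -> [(Int, Double)]
resolveUtil known = mapMaybe lookupEntry
  where
    lookupEntry (name, load) = fmap (\idx -> (idx, load)) (Map.lookup name known)
```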
Expand the --print-instances output
This adds run status, resource parameters and load parameters for instances.
Change the Container.findByName function
This patch changes the signature and implementation of the function; returning the item makes more sense (saves a lookup later again in the container, and applying idx is cheap), and the previous implementation was ugly.
Some small style fixes
Simplify the cstats initializer
Since all values are initialized to zero, the exact ordering is not important and thus we can use the positional mode for simpler code.
The patch also adds docstrings to the cstats functions.
Simplify Cluster.computeMoves
Since we now have an actual type for describing the instance moves (IMove), it's simpler to convert this into the move description/move commands, rather than re-computing the move based on initial and final nodes. This makes the shell commands computation and over-Luxi command...
Remove obsolete export
The ‘Placement’ type has been moved to Types.hs but we kept exporting it from Cluster, which is not needed.
Generalise the node/instance listing
This patch introduces a generic formatTable function (based on, and similar to the Ganeti one, but different and more FP in style) and changes the node and instance listing to it.
The node list (due to the many variables) is still a little bit hackish...
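A small sketch of a table formatter in this spirit: each column is padded to its widest cell, numeric columns right-aligned and text columns left-aligned. The signature and the per-column Bool flag are assumptions; this is not the htools formatTable itself.

```haskell
import Data.List (transpose)

-- Hypothetical formatter: the Bool per column marks it as numeric
-- (right-aligned) or text (left-aligned). Rows are assumed rectangular.
formatTable :: [Bool] -> [[String]] -> [String]
formatTable numeric rows = map formatRow rows
  where
    widths = map (maximum . map length) (transpose rows)
    formatRow row = unwords (zipWith3 pad numeric widths row)
    pad isNum w cell
      | isNum     = replicate (w - length cell) ' ' ++ cell
      | otherwise = cell ++ replicate (w - length cell) ' '
```

Keeping the formatter pure and generic means the node and instance listings only have to produce rows of strings.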
Fix instance listing for non-redundant case
Fix two haddock/happy docstring issues
Start using the utilisation scores in balancing
This enables the per-node load/total available capacity scores to be used in balancing. Note that the total available capacity is currently fixed at zero and cannot be changed by the user.
Add loading and processing of utilisation data
This patch adds loading and processing the utilisation data during instance moves. While the data is not yet used, it is correctly modified by instance changes between nodes.
hbal has the new ‘-U’ command line argument for this. The format of the...
Add an option to input utilisation data
Merge the Node.setPri and Node.addCpus functions
The latter is only used right after the former in the Loader module, and we'll need more of this 'update not with the data of this instance' functionality (which is different than addPri where all information must...
Move some utility functions to Utils.hs
These were already duplicated (Text and Simu) and we need tryRead in more places.
Show the load on nodes in node lists
The strange printf usage is due to some limitation (it seems) in ghc for very long argument lists. The whole printout should be rewritten later.
Add initial structure for utilisation balancing
This patch adds the datatypes and modifies the nodes and instance types to have such attributes. They are not used yet in any way.
Allow displaying the instance map in hbal
This is similar to --print-nodes, but with much fewer fields.
Add an explicit export list to Instance.hs
This exports all functions, but it's still good to have.
More hlint fixes
This makes (for now) the code hlint-clean. This is per se not a huge gain, but it allows easier tracking of regressions in style later (one or two new violations are easier to diagnose when not hidden among 20 “known” ones).
Style change: camel-casing of unittests
Style change: cluster CStats camel-casing
This is again the cs_x to csX name change.
Style change: node and instance attributes
This changes from a_b to aB in all node and instance attributes, to match the standard Haskell style. Also attributes that should have been camel-cased but weren't were changed (e.g. plist → pList, pnode → pNode).
Modify the internals of the detailed CV scores
Before we used a tuple; since we'll need more metrics in the future, it's simpler to transform this into a list of doubles, whose elements are handled homogeneously by all the code that needs them.
Add a command line option for executing jobs
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add two specialized Luxi calls
These are higher level wrappers over the basic callMethod.
Signed-off-by: Iustin Pop <iustin@google.com>
Change iMoveToJob to properly create migrates
The current Cluster.iMoveToJob always creates failovers, which is not what we want. This simply uses the original instance's status to select between these two (this is not optimal by the way, since the status...
Extend the MoveJob type to hold the instance index
This will be needed in order to generate the proper instance move commands.
Add a simple module for dealing with Ganeti jobs
This holds for now just the job status definitions and its serialisation to/from JSON.
Fix haddock issues with tuple members
It seems that haddock cannot document tuple members - but arguably, once one needs to do that, tuples should not be used anymore.
This just moves the comments to the tuple comment.
Signed-off-by: Iustin Pop <iustin@google.com>...
parseNode: don't lookup values in drained nodes
Currently parseNode skips looking for values in offline nodes, but tries to read them for drained ones. With this patch we treat offline and drained nodes in the same way (which is compatible with the iallocator...
Store the instance move in the MoveJobs
This will automatically sort our Ganeti jobs into the independent jobsets, and then we can submit them separately.
Move some more type definitions to Types.hs
Add a function converting Placements into Jobs
This converts from htools-specific Placements into Ganeti standard OpCodes, which will later allow execution via Luxi.
Add a small implementation of OpCodes
These are just a few opcodes we need for executing instance moves.
Record the move being performed in a Placement
This will allow a more descriptive output later in the solution list, as opposed to trying to reconstruct the move from the node indices.
The patch also documents the Placement members.
Add definitions for more Luxi calls
Split the Luxi generic parts from the loader
The Luxi loader implements both a generic Ganeti Luxi client and the loader; it is better if these two are separated. The patch adds a Ganeti/Luxi.hs (not under HTools!) since that is generic for Ganeti, and not related necessarily to htools.
hbal: Implement grouping of moves into jobsets
Since moving two instances between different node-quadruples (inst X: A, B → C, D and inst Y: E, F → G, H) can be parallelised by Ganeti, it makes sense to split the operation list into jobsets whose execution...
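A greedy sketch of such a split, assuming each move is described by the set of nodes it touches: consecutive moves stay in the same jobset while they share no node, and the first conflicting move opens a new jobset. The `Move` type and `splitJobSets` are hypothetical, and this is one possible grouping strategy, not necessarily the one hbal uses.

```haskell
import qualified Data.Set as Set

-- Hypothetical move description: the instance name and every node the
-- move touches (old and new primary and secondary).
data Move = Move { moveInst :: String, moveNodes :: [String] }
  deriving (Show, Eq)

-- Greedily group consecutive moves into jobsets. A move joins the current
-- jobset only if it shares no node with the moves already in it; otherwise
-- it starts a new jobset. Moves touching the same node therefore keep
-- their relative order across jobsets.
splitJobSets :: [Move] -> [[Move]]
splitJobSets = go Set.empty []
  where
    go _ [] [] = []
    go _ acc [] = [reverse acc]
    go used acc (m : ms)
      | Set.null (Set.intersection used nodes) =
          go (Set.union used nodes) (m : acc) ms
      | otherwise = reverse acc : go nodes [m] ms
      where
        nodes = Set.fromList (moveNodes m)
```

Jobs inside one jobset can then be submitted in parallel, with the jobsets themselves executed one after another.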
Change ExtLoader to only handle I/O errors
Due to the Control.Exception changes between 6.8 and 6.10, using it portably is difficult. Since we're only interested in handling I/O errors, we can use prelude's catch and not have to deal with Control.Exception at all....
Brown-paper-bag release fixing haddock issues
Haddock doesn't like pre-processed files (at least not in all versions). Thus we need to remove the ExtLoader module from the haddock-processed file list.
Brown bag fix: invert a test
During testing I used the test inversely to see that it triggers correctly, and mistakenly committed the inverted test. Fixing it.
Turn on, and fix, more warnings
The Makefile was intended to be -Wall and not simply -W, but I missed that. This enables more warnings and also enables -Werror (except for the tests).
Add support for building without curl
Since curl is not always needed (e.g. when using only the luxi or, less likely, the file backends) and is also not always available, it is useful to be able to build without it. This of course disables the RAPI backend.
This patch changes ExtLoader to build with the ‘-cpp’ option which makes...
Split the external data loader out of CLI.hs
Currently the external data loader is in CLI.hs, which makes all programs that need cli functionality (options, etc.) link against the network modules (most importantly curl). This patch splits this functionality into a new module such that (for example) hail which only...
Fix luxi recvMsg for messages bigger than 4K
This patch fixes a logic bug in luxi that breaks receiving messages bigger than 4096 bytes. Sending messages is not affected, as it uses a different algorithm.
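The commit does not describe the bug itself; the general requirement is that a receive loop keeps reading until the end-of-message marker arrives, however many bounded reads that takes. A pure sketch of that loop, assuming the marker is ETX (`'\3'`) and modelling the socket reads as a list of chunks; `recvMsg` here is a hypothetical stand-in, not the htools function.

```haskell
-- End-of-message marker; assumed to be ETX ('\3') for this sketch.
eom :: Char
eom = '\3'

-- Accumulate chunks, as returned by successive bounded socket reads (e.g.
-- of at most 4096 bytes each), until the end-of-message marker appears.
-- Returns the complete message plus whatever followed the marker in the
-- same chunk, or Nothing if the input ends before the marker is seen.
recvMsg :: [String] -> Maybe (String, String)
recvMsg = go []
  where
    go _ [] = Nothing
    go acc (chunk : rest) =
      case break (== eom) chunk of
        (before, _ : after) -> Just (concat (reverse (before : acc)), after)
        (before, [])        -> go (before : acc) rest
```

Any data after the marker belongs to the next message and must be kept as a buffer for the following call.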
Add some more instance tests
This includes instance text load tests.
Test some cases for the cluster score computation
Split the balancing algorithm in two parts
Currently the computation, recursing part and the IO part (progress updates) of the balancing main function (iterateDepth) are all in the same function, which makes it hard to test. This patch moves the decision/computation part (whether to proceed one more round, whether we...
Implement support for 'cheap' moves only
This patch adds support for cheap (failover/migrate) operations only in the balancing algorithm and in the hbal command line options.
This allows a very quick balancing (compared to allowing replace-disks)which can be useful as a scheduled operation.
Simplify the wrapIO function
This fixes one warning from hlint.
Use migrate or failover based on instance state
While we can't guarantee that the instance will be in the same state by the time the migrate/failover command will be run, we can at least try to do the right thing assuming no other changes to the cluster state....
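The selection itself is a single decision on the recorded run state. A sketch with a simplified, hypothetical `MoveOp` type (not Ganeti's real opcode definitions):

```haskell
-- Hypothetical, simplified opcodes for moving an instance's primary role
-- to its secondary node.
data MoveOp = OpMigrate String | OpFailover String
  deriving (Show, Eq)

-- A running instance is live-migrated; a stopped one only needs a
-- failover. The run state is a snapshot taken when the data was loaded
-- and may be stale by the time the job executes.
moveOpFor :: Bool -> String -> MoveOp
moveOpFor running name
  | running   = OpMigrate name
  | otherwise = OpFailover name
```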
Improve the error message for command line errors
Instead of using ioError . userError, we format the error ourselves. This is nicer - no ‘)’ at the end of the output.
Add a simulated cluster data loader
This is useful especially for hspace, where we might want to simulate a hypothetical cluster to check allocation beforehand.
CLI: Handle errors better
This patch adds an error handler for any exceptions that are raised during the external data load phase. This can be improved further, but it's a good start.
Unify the command line options and structures
This patch moves all the command line options and their internal representation into CLI.hs. This means that duplicated options between any two binaries are no longer declared twice, and that we no longer need the two *Option classes.
Fix a few hlint errors
CLI: Prevent incompatible options from being selected
This patch makes CLI abort if more than one backend is selected.
Add support for luxi backend in CLI/hspace/hbal
This patch changes the backend selection method in CLI to prefer, in order:
 - a RAPI specification
 - a Luxi specification
 - and finally the node/instance files
It also modifies hspace and hbal to provide a ‘-L’ command line option...
Initial commit of the luxi backend
This patch adds a luxi backend that allows direct query of the master daemon on the local node. This patch doesn't enable the backend to be used.
There are a couple of things still missing in the implementation:
 - we don't have a master timeout in reads and writes, only a...
Introduce timeout in RAPI queries
The patch adds two constants in Types.hs for connect and query timeout, then modifies Rapi.hs to use them as the connect and general curl timeout.
Rapi could be improved more, as currently we wait double the total timeout due to not aborting early in case the node queries failed.
Fix a haddock issue
hspace: fix failure handling of tryAlloc results
Currently hspace doesn't handle failures from tryAlloc correctly; this patch changes the iterateDepth function in hspace to return a Result (…) so that errors can be propagated correctly.
The patch also changes one output key to be more clear and a typo in...
Change the tryAlloc/tryReloc workflow
Currently, the tryAlloc and tryReloc functions return a list with all the results, both failures and successes. This is fine for hail, which does one round of allocations, but is not so good for hspace, which does iterative rounds; since at each (successful) step we only take the best...
Simplify the Cluster.tryAlloc structures
Currently the tryAlloc function calls the allocateOnSingle/allocateOnPair and then builds a new tuple with those functions' result plus the new node list. This is however suboptimal in two respects:
 - the new nodes added are the 'old' versions of the respective nodes,...
Slight change to the internal allocation results
Currently the Cluster.AllocSolution type is defined as a list of ‘(OpResult Node.list, …)’ and the results for applyMove are defined as ‘(OpResult Node.List, …)’. Both of these mean that the failure/success indication is hidden in the first elements of this tuple, which makes it...
hspace: switch output to shell-script format
This (big) patch changes the output of hspace from text-format (separated by ‘: ’) to a shell-snippet, in ‘key=value’ format.
This will allow sourcing the output or parsing it via awk/sed/etc.
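A sketch of rendering statistics as sourceable `key=value` lines. The `HTS_` prefix, the upper-casing of keys and the quoting rule are assumptions of this example; the point is that values with spaces or quotes must be shell-quoted for sourcing to work.

```haskell
import Data.Char (isAlphaNum, toUpper)

-- Render (key, value) pairs as shell variable assignments, one per line.
-- The "HTS_" prefix, upper-casing and quoting rule are assumptions of this
-- sketch: values made only of safe characters are left bare; anything else
-- is single-quoted, with embedded single quotes written as '\''.
shellFormat :: [(String, String)] -> String
shellFormat = unlines . map line
  where
    line (k, v) = "HTS_" ++ map toUpper k ++ "=" ++ quote v
    quote v
      | not (null v) && all safe v = v
      | otherwise = "'" ++ concatMap esc v ++ "'"
    safe c = isAlphaNum c || c `elem` "._-"
    esc '\'' = "'\\''"
    esc c = [c]
```

For example, the pair ("ini_score", "1.5") becomes the line HTS_INI_SCORE=1.5, which both a shell and a simple awk/sed split on ‘=’ can consume.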
hspace: move instance count and score into CStats
Currently the instance count and cluster score are separated from the other initial/final phase stats, even though they are very similar. This patch moves computation of these two into totalResources/CStats and...
Fix unittests
The recent OpResult and CPU values additions broke unittests.
Export more stats in hspace
This patch changes Cluster.totalResources to compute more resources and prints them in hspace.
Show errors on stderr instead of stdout
Currently many of the exit and warning conditions mistakenly display error messages on stdout, which makes parsing the output of programs harder. This patch attempts to fix such occurrences.
Fix score calculation to work with empty clusters
Currently the cluster score calculation includes an offline instance percentage, expressed as “offline inst / (offline + online inst)”, which results in NaN for empty clusters. This patch changes the calculation...
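The commit text is truncated before it says how the calculation changes; one straightforward guard is to define the ratio as zero when there are no instances at all. A sketch with a hypothetical `offlineRatio` helper:

```haskell
-- Fraction of offline instances, defined as 0 for a cluster with no
-- instances instead of the NaN that 0 / 0 would produce.
offlineRatio :: Int -> Int -> Double
offlineRatio offline online
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total
  where
    total = offline + online
```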
Optimize the Utils.stdDev function
This patch optimizes the stdDev function in two respects:
 - first, we don't do sum . map which builds an intermediate list, but instead use a fold over the list to build incrementally the sum; this should reduce both the time and space characteristics, as we...
Take the foldl out of Loader.fixNodes
Currently Loader.fixNodes is foldl' with a complicated function. It makes more sense to take foldl' out of this function (and put it into the caller) and let fixNodes be only this internal function.