Add a tryEvac function
This will be used by the node evacuate IAllocator request type.
Signed-off-by: Iustin Pop <iustin@google.com>
Change an internal type from Maybe to list
In preparation for multiple responses, we change from Maybe to list (both used in the container sense).
This allows us to keep the same workflow for all kinds of requests.
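As a rough illustration of the Maybe-to-list change (the function names here are hypothetical and not taken from the patch), a caller written against lists handles "no result", "one result" and, later, "several results" with the same code:

```haskell
import Data.Maybe (maybeToList)

-- Hypothetical single-result picker: at most one candidate.
pickOne :: [Int] -> Maybe Int
pickOne []    = Nothing
pickOne (x:_) = Just x

-- The same result as a list: zero or one element today,
-- possibly several once multiple responses are supported.
pickMany :: [Int] -> [Int]
pickMany = maybeToList . pickOne

-- Callers written against lists treat all cases uniformly.
describe :: [Int] -> String
describe []   = "no solution"
describe sols = "solutions: " ++ show sols
```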
Move a type declaration to Node.hs
We'll need AllocElement in both Cluster and IAlloc in the future, so we move it to Node.hs which is imported by both.
Implement evacuation mode in hbal
This mode restricts the list of instances to be moved to the instances living on the offline (and drained) nodes.
Move instance relocation test upper in the chain
Currently we test each instance for relocation in checkMove; however, it is a little clearer if we pass only the relocatable instances to checkMove. The patch also slightly rewrites (indentation/style) the...
Split the balancing function in two parts
Currently in the balancing function we do two things:
- take the decision whether to do a new balancing round or not
- and actually compute the balancing round
This is not nice, as the two parts are conceptually separate, so this...
Convert n1_score metric from % to count
This increases the priority of fixing N+1 failures compared to balancing metrics.
Metric: count of primary instances/offline nodes
This helps with evacuation/failover of instances on 2-node clusters with one node offline.
Offline instance metric: change from % to count
Currently we use the offline instance percentage (with range [0, 1]), but this is not good, since we want the evacuation of such instances to have a high priority; therefore we change this to a count of offline...
Use conflicting primaries count in cluster score
This small patch adds the number of conflicting primaries in the cluster score. This is different from the other non-CV metrics where we usually compute the percentage of failing instances (for that metric); but for a...
Allow overriding the field list in -p
The print nodes option can now accept an optional field list to customise the output. This is ugly, since the field names do not match the header names, but it is at least barely customisable (at runtime).
Move more node-listing functionality in Node.hs
This will prepare for the runtime-selectable field list.
Add a few comments in the scoring function
Expand the --print-instances output
This adds run status, resource parameters and load parameters for instances.
Simplify the cstats initializer
Since all values are initialized to zero, the exact ordering is not important and thus we can use the positional mode for simpler code.
The patch also adds docstrings to the cstats functions.
Simplify Cluster.computeMoves
Since we now have an actual type for describing the instance moves (IMove), it's simpler to convert this into the move description/move commands, rather than re-computing the move based on initial and final nodes. This makes the shell commands computation and over-Luxi command...
Remove obsolete export
The ‘Placement’ type has been moved to Types.hs but we kept exporting it from Cluster, which is not needed.
Generalise the node/instance listing
This patch introduces a generic formatTable function (based on, and similar to the Ganeti one, but different and more FP in style) and changes the node and instance listing to it.
The node list (due to the many variables) is still a little bit hackish...
Fix instance listing for non-redundant case
Start using the utilisation scores in balancing
This enables the per-node load/total available capacity scores to be used in balancing. Note that the total available capacity is currently fixed at zero and cannot be changed by the user.
Show the load on nodes in node lists
The strange printf usage is due to some limitation (it seems) in ghc for very long argument lists. The whole printout should be rewritten later.
Allow displaying the instance map in hbal
This is similar to --print-nodes, but with much fewer fields.
Style change: cluster CStats camel-casing
This is again the cs_x to csX name change.
Style change: node and instance attributes
This changes from a_b to aB in all node and instance attributes, to match the standard Haskell style. Also attributes that should have been camel-cased but weren't were changed (e.g. plist → pList, pnode → pNode).
Modify the internals of the detailed CV scores
Before we used a tuple; since we'll need more metrics in the future, it's simpler to transform this into a list of doubles, whose elements are handled homogeneously by all the code that needs them.
Change iMoveToJob to properly create migrates
The current Cluster.iMoveToJob always creates failovers, which is not what we want. This simply uses the original instance's status to select between these two (this is not optimal by the way, since the status...
Extend the MoveJob type to hold the instance index
This will be needed in order to generate the proper instance move commands.
Store the instance move in the MoveJobs
This will automatically sort our Ganeti jobs into the independent jobsets, and then we can submit them separately.
Move some more type definitions to Types.hs
Add a function converting Placements into Jobs
This converts from htools-specific Placements into Ganeti standard OpCodes, which will later allow execution via Luxi.
Record the move being performed in a Placement
This will allow a more descriptive output later in the solution list, as opposed to trying to reconstruct the move from the node indices.
The patch also documents the Placement members.
hbal: Implement grouping of moves into jobsets
Since moving two instances between different node-quadruples (inst X: A, B → C, D and inst Y: E, F → G, H) can be parallelised by Ganeti, it makes sense to split the operation list into jobsets whose execution...
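A minimal sketch of one way such grouping could work, assuming a greedy, order-preserving strategy; the `Move` shape and `groupMoves` name are hypothetical and the actual hbal implementation may differ:

```haskell
import Data.List (intersect)

-- Hypothetical move shape: instance name plus every node it touches.
type Move = (String, [Int])

-- Greedy grouping: a move joins the current (latest) jobset if it shares
-- no node with any move already in it; otherwise it starts a new jobset.
-- The original order of the moves is preserved.
groupMoves :: [Move] -> [[Move]]
groupMoves = reverse . map reverse . foldl step []
  where
    step [] m = [[m]]
    step (cur:done) m
      | any (conflicts m) cur = [m] : cur : done
      | otherwise             = (m : cur) : done
    conflicts (_, a) (_, b) = not (null (a `intersect` b))
```

With moves X on nodes 1-4, Y on nodes 5-8 and Z on nodes 1 and 9, X and Y land in the same jobset while Z, which shares node 1 with X, starts a second one.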
Turn on, and fix, more warnings
The Makefile was intended to be -Wall and not simply -W, but I missed that. This enables more warnings and also enables -Werror (except for the tests).
Split the balancing algorithm in two parts
Currently the computation, recursing part and the IO part (progress updates) of the balancing main function (iterateDepth) are all in the same function, which makes it hard to test. This patch moves the decision/computation part (whether to proceed one more round, whether we...
Implement support for 'cheap' moves only
This patch adds support for cheap (failover/migrate) operations only in the balancing algorithm and in the hbal command line options.
This allows a very quick balancing (compared to allowing replace-disks) which can be useful as a scheduled operation.
Use migrate or failover based on instance state
While we can't guarantee that the instance will be in the same state by the time the migrate/failover command will be run, we can at least try to do the right thing assuming no other changes to the cluster state....
Fix a few hlint errors
Fix a haddock issue
hspace: fix failure handling of tryAlloc results
Currently hspace doesn't handle failures from tryAlloc correctly; this patch changes the iterateDepth function in hspace to return a Result (…) so that errors can be propagated correctly.
The patch also changes one output key to be more clear and a typo in...
Change the tryAlloc/tryReloc workflow
Currently, the tryAlloc and tryReloc functions return a list with all the results, both failures and successes. This is fine for hail, which does one round of allocations, but is not so good for hspace, which does iterative rounds; since at each (successful) step we only take the best...
Simplify the Cluster.tryAlloc structures
Currently the tryAlloc function calls the allocateOnSingle/allocateOnPair and then builds a new tuple with those functions' results plus the new node list. This is however suboptimal in two respects:
- the new nodes added are the 'old' versions of the respective nodes,...
Slight change to the internal allocation results
Currently the Cluster.AllocSolution type is defined as a list of ‘(OpResult Node.list, …)’ and the results for applyMove are defined as ‘(OpResult Node.List, …)’. Both of these mean that the failure/success indication is hidden in the first elements of this tuple, which makes it...
hspace: move instance count and score into CStats
Currently the instance count and cluster score are separated from the other initial/final phase stats, even though they are very similar. This patch moves computation of these two into totalResources/CStats and...
Export more stats in hspace
This patch changes Cluster.totalResources to compute more resources and prints them in hspace.
Fix score calculation to work with empty clusters
Currently the cluster score calculation includes an offline instance percentage, expressed as “offline inst / (offline + online inst)”, which results in NaN for empty clusters. This patch changes the calculation...
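To make the NaN case concrete, here is a small sketch; the function names are hypothetical, and the guarded variant is only one possible fix, not necessarily the one this patch applies:

```haskell
-- Naive ratio: for an empty cluster this evaluates 0/0, which is NaN.
offlineRatioNaive :: Int -> Int -> Double
offlineRatioNaive offline online =
  fromIntegral offline / fromIntegral (offline + online)

-- Guarded variant (an assumption): an empty cluster scores zero.
offlineRatioSafe :: Int -> Int -> Double
offlineRatioSafe offline online
  | total == 0 = 0
  | otherwise  = fromIntegral offline / fromIntegral total
  where total = offline + online
```

A NaN component is troublesome because every comparison against it is false, so a score containing it can never be judged better or worse than another.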
This patch changes the function Cluster.computeMoves to use guards and a couple of subexpressions in order to greatly simplify it.
Fix hlint-generated warnings
This big patch cleans up the code per hlint indications. Many removals of extra parentheses, replacements of concat . map with concatMap, extra dollar signs, eta reductions, etc. were performed.
The code still compiles and passes a couple of manual tests on sample...
Introduce a new type for allocation results
Currently the allocation/move operations workflow returns ‘Maybe a’, which is very convenient but loses all details about the failure mode.
This patch introduces a new data type which encodes the specific failure...
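As an illustration only, a Maybe-like type that records why an operation failed could look like the following; the constructor names and the `reserveMem` helper are assumptions made for this sketch, not the patch's definitions:

```haskell
-- Hypothetical failure modes; the real constructors may differ.
data FailMode = FailMem | FailDisk | FailCPU
  deriving (Eq, Show)

-- Like Maybe, but the failure case says what went wrong.
data OpResult a = OpFail FailMode | OpGood a
  deriving (Eq, Show)

instance Functor OpResult where
  fmap _ (OpFail e) = OpFail e
  fmap f (OpGood x) = OpGood (f x)

instance Applicative OpResult where
  pure = OpGood
  OpFail e <*> _ = OpFail e
  OpGood f <*> r = fmap f r

instance Monad OpResult where
  OpFail e >>= _ = OpFail e
  OpGood x >>= f = f x

-- Toy check: reserve 'need' units of memory out of 'free'.
reserveMem :: Int -> Int -> OpResult Int
reserveMem free need
  | need > free = OpFail FailMem
  | otherwise   = OpGood (free - need)
```

Because the type is a monad, chained operations keep the convenience of ‘Maybe a’ while the first failure's reason is carried through to the caller.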
Remove hn1 and related code
hn1 was deprecated for a while and this patch removes it altogether. The support code in Cluster.hs is also removed.
Fix totalResources avail disk computation
This uses the newly-added Node.availDisk to compute the actual available disk correctly, and display the total allocatable disk in hspace.
Add a new type for cluster statistics
Currently totalResources returns a 5-tuple of integers. This is not easy to handle, as each change on the return type means that each caller must be updated.
This patch adds a new type for cluster stats and uses that instead as...
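A sketch of the idea, with hypothetical field names (the real CStats has its own set of fields): callers read named fields, so adding a field later does not break them the way widening a tuple would.

```haskell
-- Hypothetical fields; the real CStats type may differ.
data CStats = CStats
  { csFmem :: Int  -- total free memory
  , csFdsk :: Int  -- total free disk
  , csMmem :: Int  -- largest free memory on a single node
  , csMdsk :: Int  -- largest free disk on a single node
  } deriving (Eq, Show)

-- All fields start at zero, so positional construction is enough.
emptyCStats :: CStats
emptyCStats = CStats 0 0 0 0

-- Fold one node's (free memory, free disk) pair into the stats.
updateCStats :: CStats -> (Int, Int) -> CStats
updateCStats cs (fmem, fdsk) =
  cs { csFmem = csFmem cs + fmem
     , csFdsk = csFdsk cs + fdsk
     , csMmem = max (csMmem cs) fmem
     , csMdsk = max (csMdsk cs) fdsk
     }

totalResources :: [(Int, Int)] -> CStats
totalResources = foldl updateCStats emptyCStats
```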
Add display of more stats in hspace
This patch changes Cluster.totalResources to compute more details about the cluster status, and enhances hspace to display more of these.
Fix a haddock/docstring issue
Fix the various monomorphism warnings
In a few places (e.g. tryRead or any printf call) it's a little bit hard to add the correct type signatures, but in the end it is possible to fix these warnings (which can bite one in subtle cases).
Small changes to the node list output
This is just some cleanup of the node list output, adding pcpu/vcpu counters, and making the display slightly nicer.
Add cpu ratio to cluster calculation
Add cpu-count-related attributes to nodes
This patch adds cpu-count related attributes to nodes:
- total cpus
- cpus in use
- ratio of virtual:physical cpus
We also set correctly the cpu values at load time, but we don't do anything yet while moving instances around. The cpu ratio is shown in...
Fix the ReplacePrimary instance move
During a replace-primary instance move, on the real cluster the instance is temporarily started on the secondary, and as such we must check that the secondary node can hold it for this duration. Currently the code does not, and depending on cluster scoring it will put instances on such...
Rework the tryAlloc/tryReloc functions
Currently tryAlloc/tryReloc do not return the new instance, as this is not needed for IAllocator alloc/reloc requests. However, for computing the space, the new instance is useful, so we modify these functions to return this information too....
Add copyright/license information
This doc-patch adds copyright and license information to (hopefully) all needed files.
Move some alloc functions from hail into Cluster
These are generic enough to be used from multiple places, so they belong better in Cluster.hs than in the hail source.
Small whitespace change
Cleanup an old function
Also replace a type with its synonym.
Lots of documentation updates
This patch does only doc build changes, doc changes and function move around (for more logical documentation). It should have no impact at all on the code.
Remove an unused type synonym
Add type synonyms for the node/instance indices
This is a first step towards full datatype renaming. That requires more changes, so at first we only want to document clearly what is a node index, what is an instance index, and what is a plain Int.
Change the module import hierarchy
This patch makes the Types module a base module, and Node/Instance ones import it, from the previous (opposite) situation. This will allow in the future to use newtypes for the index and name types.
hail: Implement non-mirrored instance allocation
This patch implements non-mirrored instance allocation, by allocating as secondary node “noSecondary”.
Implement hail allocate (for 2-node requests)
This patch implements allocate for two-node requests. One-node requests can be done as soon as we have a valid allocateOn function for single nodes.
Working implementation of relocate
This patch completes the implementation of hail relocate. It maps all valid destination nodes through a ReplaceSecondary IMove, filters out the failed relocations, computes the resulting scores and picks the lowest one.
Remove most uses of ktn/kti
This patch removes all uses of ktn/kti from the past-loader stages.
Remove some extraneous uses of ktn/kti
Since we have Node/Instance.name, we can now simplify a few constructs.
Move checkData from Cluster to Loader
This moves the remaining loading function to Loader (together with its associated support functions).
More code reorganizations
This new big patch does a couple of more cleanups in the loading of data chapter:
- introduce a Types module that holds most types (except the base Node/Instance/etc.) so that multiple other modules can use these (instead of only Cluster and its users)...
Rework the loader model
This big patch changes the loader model from “string data as common format” to actual object structures as common format.
The text loading functions move from Cluster.hs to a new Text.hs module, some common functions are moved to a new Loader.hs module, and the...
Experimental support for non-redundant instances
This patch adds experimental support to hbal for non-redundant instances (i.e. instances with only one node). They are currently handled as non-moveable, and as such the algorithm simply ignores them.
Support needs to be added when reading from RAPI via hscan, and...
Small doc addition
Introduce nice errors on invalid input fields
This patch switches from plain read to a wrapper over readsPrec that returns better error messages than the built-in 'Prelude: no parse'.
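A minimal sketch of such a wrapper, assuming an `Either String` return type and a made-up error message; the signature and wording of the real helper may differ. It uses `reads`, the Prelude front end to readsPrec:

```haskell
-- Hypothetical wrapper: succeed only when there is exactly one parse
-- that consumes the whole string; otherwise report the field and value.
tryRead :: Read a => String -> String -> Either String a
tryRead name s =
  case reads s of
    [(v, "")] -> Right v
    _         -> Left ("Can't parse value '" ++ s ++ "' for field " ++ name)
```

For example, `tryRead "mem" "abc" :: Either String Int` yields a `Left` naming both the field and the offending value, instead of aborting with 'Prelude: no parse'.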
Split node/instance parsing into functions
This allows easy checking for valid format of the input data (row-wise).
Add initial validation checks in Cluster.loadData
This patch converts loadTabular and loadData to a monadic form, thus allowing meaningful error messages from the node/instance load routines.
Convert Cluster.loadData to Result return
This patch changes Cluster.loadData to return a Result, instead of directly the values; this will allow us to return meaningful error values (e.g. when an instance lives on an unknown node) rather than simply abort. Currently the result is always an Ok, the actual signalling of...
Don't consider offline nodes as N+1 failed
This is just a cosmetic (I hope) change; the nodes shouldn't be used anyway, and we only correct the display message.
Add support for 'offline' nodes
This patch drops compatibility with Ganeti 1.2 and adds support for offline nodes in the cluster. When reading from RAPI, the drained nodes are considered offline so that we don't allocate on them too.
hbal: Add a new min-score option
This new parameter causes the algorithm to finish (or even not start at all) if we reach/have a score better than it.
hbal: Change hardcoded tests to monadic composition
In some cases we manually do “if isNothing … then Nothing else …”, which can be very easily replaced with a monadic construct in the Maybe monad.
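The two styles can be compared side by side in a toy example; `takeMem` and both callers are hypothetical and not taken from hbal:

```haskell
import Data.Maybe (fromJust, isNothing)

-- Toy step: take 'need' units out of 'free', failing if not enough.
takeMem :: Int -> Int -> Maybe Int
takeMem need free
  | need > free = Nothing
  | otherwise   = Just (free - need)

-- Manual style: test the intermediate result by hand.
twoStepsManual :: Int -> Maybe Int
twoStepsManual free =
  let r1 = takeMem 100 free
  in if isNothing r1
       then Nothing
       else takeMem 200 (fromJust r1)

-- Monadic style: the Maybe monad short-circuits on the first Nothing.
twoStepsMonadic :: Int -> Maybe Int
twoStepsMonadic free = takeMem 100 free >>= takeMem 200
```

Both functions compute the same result, but the monadic form has no explicit test and no partial `fromJust`.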
Increase allowed missing memory to 512MB
Since Xen seems to “steal” some amounts of memory (depending on total node memory), we increase the maximum allowed missing memory to 512MB, based on gathered data from multiple machines.
Implement writing the command list to a script
This patch adds support in hbal for writing the command list to a shell script, with error checking and allowing for early exit.
Add checks for missing disk space
This small patch adds disk space checks to the Cluster.checkData function, and simplifies a little the warning messages.
Fix interaction between down instances and nodes
If an instance is down, its memory is not reflected in the node used memory, and thus the node free memory is higher than the actual value. This patch deducts the memory for such instances from the node free...
Add a new instance field denoting run status
This patch modifies Rapi, the Cluster.loadData and hscan serialization to load and save the instance run status. At instance level, we add both a boolean field denoting the true/false run status, and a string field which holds the...
Show the x_mem/i_mem in node list
This patch adds checking of cluster data in the binaries and display of node's x_mem/i_mem in the node list.
Add functions to check and fix cluster data
This patch adds a checkData function which goes over the node list and computes the unaccounted memory, returning a list of warning messages (if any) and the updated nodes.
Add node memory field to Node objects
This patch adds a new n_mem field to the node objects, and implements read/save/show support for it. The field is not currently used (except in the node list) but will be used for checking data consistency and instance up/down status.
Pass actual types to node/instance constructors
This patch changes the parameters passed to the node and instance constructors from generic Strings (which are then parsed via “read”) to the actual used types, by converting them earlier in Cluster.loadData.
Some small changes in preparation for hscan
This patch does some small changes:
- fixes a comment
- exports more node functions (unneeded now, but hscan will use them)
- fixes Makefile rule for building the programs
Add a separate type for the [(Int, String)] list
This is added for better readability, since this is very often used in declarations.
Handle correctly offline nodes in cluster scoring
This patch changes two things with regard to offline nodes:
- first, it only calculates the various coefficients across online nodes
- second, it adds a new score denoting the percentage of instances...
Show offline nodes in the node status list
This patch adds a new ‘-’ flag for the node status which denotes offline nodes.
Restrict move list based on offline node status
This patch changes the Cluster.checkInstanceMove function to restrict the target move list based on which nodes are online.
Some updates to the apidoc rules