Remove obsolete TODO
Originally, hroller started as a tool for offline maintenance only. There it made sense to warn about instances still running. By now, default planning is to migrate instances off the nodes to be rebooted, with options for other behavior (like pretending that all instances...
Index instances by their UUID
No longer index instances by their name but by their UUID in the cluster config. This change touches large parts of the code, as the following adjustments were necessary: * Change the index key to UUID in the configuration and the...
Merge branch 'stable-2.8' into master
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
In tiered allocation, cut non-promising shrinking tries
The heuristics for tiered allocation have been improved: hspace now chooses to shrink next a resource such that shrinking only this resource allows a valid allocation, if such a resource exists....
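A minimal Haskell sketch of this refinement, not the hspace code. It assumes a hypothetical `Resource` type and per-node lists of failing resources, and it approximates "shrinking only this resource allows a valid allocation" as "this resource is the only failure on some node", falling back to the most frequent failure otherwise.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical resource type; hspace uses its own failure-mode types.
data Resource = Mem | Disk | Cpu | Spindles
  deriving (Eq, Ord, Show)

-- Most frequent element of a list (ties go to the smallest element).
mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent xs =
  case sortBy (comparing (Down . length)) (group (sort xs)) of
    ((x : _) : _) -> Just x
    _ -> Nothing

-- Each inner list holds the resources blocking allocation on one node.
-- Prefer a resource that is the sole obstacle on some node, since shrinking
-- it alone would make that node usable; otherwise fall back to the resource
-- that fails on the most nodes.
nextToShrink :: [[Resource]] -> Maybe Resource
nextToShrink perNode =
  case mostFrequent [r | [r] <- perNode] of
    Just r -> Just r
    Nothing -> mostFrequent (concat perNode)
```

For example, with failures `[[Mem, Disk], [Mem, Cpu], [Mem, Spindles], [Disk]]` the sketch picks `Disk`, even though `Mem` fails on more nodes.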
Merge branch 'stable-2.8' into 'master'
Improve hspace shrinking strategy
In tiered allocation, hspace next shrinks the resource of the instance that causes failure on the most nodes. While this is not a bad strategy in general, it can lead hspace into a dead end if, for a large number of nodes, a particular resource blocks any further allocation of...
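A minimal sketch of the strategy described here, assuming a hypothetical `Resource` type and a list of the failing resources per node (the real hspace code works on its own failure-mode types): count how often each resource blocks allocation and pick the most frequent one.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical resource type; hspace uses its own failure-mode types.
data Resource = Mem | Disk | Cpu | Spindles
  deriving (Eq, Ord, Show)

-- Each inner list holds the resources blocking allocation on one node.
-- Pick the resource that blocks allocation on the most nodes.
mostCommonFailure :: [[Resource]] -> Maybe Resource
mostCommonFailure perNode =
  case sortBy (comparing (Down . length)) (group (sort (concat perNode))) of
    ((r : _) : _) -> Just r
    _ -> Nothing
```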
Merge branch 'stable-2.7' into stable-2.8
Conflicts: (trivial, take union of added files/tests) Makefile.am test/hs/shelltests/htools-hspace.test...
Make shrinkByType aware of individual disks
When shrinking an instance, you can't just get a smaller disk footprint while leaving the individual disks as they are. Make the shrink heuristic aware of that fact, and decrease all individual disks as well. Fixes issue 484....
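A hedged sketch of keeping the total footprint and the individual disks consistent while shrinking. `ISpec`, `shrinkDisks` and the fixed per-disk step are hypothetical; the message does not say by how much each disk is decreased.

```haskell
-- Hypothetical instance spec holding per-disk sizes (in MiB).
newtype ISpec = ISpec {diskSizes :: [Int]}
  deriving (Eq, Show)

-- The disk footprint is always derived from the individual disks.
totalDisk :: ISpec -> Int
totalDisk = sum . diskSizes

-- Shrink every disk by the same step; refuse if a disk would reach zero,
-- so the footprint and the individual disks never disagree.
shrinkDisks :: Int -> ISpec -> Maybe ISpec
shrinkDisks step spec
  | any (<= step) (diskSizes spec) = Nothing
  | otherwise = Just spec {diskSizes = map (subtract step) (diskSizes spec)}
```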
Index nodes by their UUID
No longer index nodes by their name but by their UUID in the cluster config. This change touches large parts of the code, as the following adjustments were necessary: * Change the index key to UUID in the configuration and the ConfigWriter, including all methods....
Add missing parenthesis to description of --machine-readable
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add type annotation to avoid monomorphism restriction
Even though we need the let-bound variable showMoves only at type [(String, String)] -> IO (), its most general type would be (PrintfArg a, PrintfArg b) => [(a, b)] -> IO (). This causes the monomorphism restriction to apply to that binding,...
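A small illustration of the same situation with hypothetical names (`renderMoves`, `fmt`). It is a pure analogue of `showMoves`: `fmt` is a binding without arguments, so its inferred type falls under the monomorphism restriction, and the explicit signature states the one type that is actually needed.

```haskell
import Text.Printf (printf)

-- Hypothetical pure analogue of hroller's showMoves.
renderMoves :: [(String, String)] -> String
renderMoves moves =
  let -- 'fmt' has no arguments, so the monomorphism restriction applies:
      -- its inferred type has class constraints (from printf) that would
      -- not be generalized. The signature fixes the intended type.
      fmt :: [(String, String)] -> String
      fmt = concatMap (\(inst, node) -> printf "%s -> %s\n" inst node)
   in fmt moves
```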
add option --print-moves to hroller
If non-redundant instances are present in the cluster, hroller will plan for them to move to other nodes while the group is rebooted. This adds an option to also show this plan.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
hspace prints info about spindles
Statistics about spindles are tracked. In human-readable output, spindles are printed only when used (i.e., exclusive storage is enabled). For machine-oriented output, they are always there.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>...
Add support for shrinking an instance spindles-wise
This makes tiered allocation in hspace work also with respect to spindles.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Spindles become part of htools resource spec
Spindles are now part of the resource spec. Instances get created with spindles specified (which are simply ignored when exclusive storage is disabled).
htools cluster score takes spindles into account
When exclusive storage is enabled, spindles are used instead of disk space to compute the cluster score.
Comments and variable names in computePDsk have been changed to match the actual code.
Update spindles when moving instances in htools
Spindles get updated, and errors are raised when not enough free spindles exist. No new error is raised when exclusive storage is disabled.
Unit tests included.
Unit tests for htools and exclusive storage
The existing tests are also run on nodes with exclusive storage enabled. The values for spindles and exclusive storage are set in a consistent way, for both nodes and instances.
Load complete instance disk information through LUXI
Information about the size and spindles of all the disks of an instance is loaded by the LUXI backend, instead of faking one equivalent big disk. In this way, instance policy checks are more accurate.
Load node spindles data in htools
The data structure for nodes gets a new field for free spindles, and the existing field for total spindles gets renamed to avoid identifying it with the node parameter that had the same name. These fields get filled with...
Refactor reading live data in htools
This simplifies handling individual items differently.
Check real spindles in ipolicies
When exclusive storage is enabled, the spindles in instance disks are used to check the instance policies (as outlined in design-partitioned.rst).
Check the full instance specs in htools
Spindles and disk count are checked too. Existing functions have been refactored, so common parts are not duplicated.
Add spindles to instance disks in htools
A new data type is introduced for disks to store both size and spindles. When available, spindles are filled with input data. Except for loading and storing, spindles are ignored.
Load exclusive_storage in htools
The node parameter is loaded into the data structures. No behavior is modified yet.
Add unit test for text backend + fix bug
Test serialization and deserialization of instances. Fix the check of the secondary node.
Restrict instance moves in hroller to the same node group
When scheduling rolling reboots, hroller looks for nodes to evacuate the non-redundant instances to. This is done by greedily moving instances to other nodes that can take them, policy-wise and capacity...
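A minimal sketch of greedy, group-restricted evacuation, assuming hypothetical `Target` records and checking only free memory; the policy checks mentioned above are omitted, and `planMoves` is not hroller's actual function.

```haskell
-- Hypothetical target node: name, node group, free memory (MiB).
data Target = Target {tName :: String, tGroup :: String, tFree :: Int}
  deriving (Show)

-- Greedily place each (instance, group, memory) on the first node of the
-- same group with enough free memory, updating that node's free memory.
-- Returns Nothing as soon as one instance cannot be placed.
planMoves :: [Target] -> [(String, String, Int)] -> Maybe [(String, String)]
planMoves _ [] = Just []
planMoves targets ((inst, grp, mem) : rest) =
  case break fits targets of
    (_, []) -> Nothing
    (before, t : after) ->
      let t' = t {tFree = tFree t - mem}
       in ((inst, tName t) :) <$> planMoves (before ++ t' : after) rest
  where
    fits t = tGroup t == grp && tFree t >= mem
```

Being greedy, this can fail where a smarter assignment would succeed; the trade-off is simplicity and predictable output.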
Add tests for network-aware allocation
hail-alloc-invalid-network defines a cluster with two node groups and an allocation request which does not fit on any of the groups. Group 1 has invalid disk-templates while Group 2 is not connected to the right networks....
Honor network connections in hail
Before trying to allocate nodes in node groups, node groups are now filtered based on the networks they are connected to and the networks which are required by the new instance.
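A sketch of the group filter, assuming the required networks have already been collected from the instance's NICs; `Group`, `groupNetworks` and `eligibleGroups` are hypothetical names, not hail's API.

```haskell
-- Hypothetical node group with the UUIDs of its connected networks.
data Group = Group {groupName :: String, groupNetworks :: [String]}
  deriving (Show)

-- Keep only the groups connected to every network the instance requires.
eligibleGroups :: [String] -> [Group] -> [Group]
eligibleGroups required = filter connected
  where
    connected g = all (`elem` groupNetworks g) required
```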
Signed-off-by: Thomas Thrainer <thomasth@google.com>...
Parse NIC data from allocation request in hail
Add a NIC type and extend the Instance type by a list of NICs. Parse the NICs in allocation requests and store them for now. Later patches will make use of this field in order to ensure that the requested instance is only placed in node groups which are connected to those...
Support group networks in Text backend
The Text backend now parses network UUIDs (comma separated) and serializes them in the same form. The test data is adapted to the new format.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Parse node group networks
Extend the Group by the network IDs it is connected to. Adapt the IAlloc backend such that the networks are parsed correctly. This also required the adaptation of test data.
hroller: option to ignore non-redundant instances
Add an option to hroller restoring the old behavior of not taking any non-redundant instances into account when forming reboot groups.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Make hroller also plan for non-redundant instances
Non-redundant instances need to be moved to a different node before maintenance of the node. Even though they can be moved to any node, there must be enough capacity to host the instances of the reboot group to be evacuated....
hroller: option to skip nodes with non-redundant instances
So far, hroller ignores the fact that non-redundant instances exist. One option to deal with non-redundant instances is to not schedule those nodes for reboot. This is supported by adding the option --skip-non-redundant....
Remove trailing whitespace
Support online-maintenance in hroller
Make hroller take into account the nodes that (redundant) instances will be migrated to. This behavior can be overridden by the --offline-maintenance option, which will make hroller plan under the assumption that all instances will be shut down before starting...
Support construction of the graph of all reboot constraints
For online rolling reboots, there are two kinds of restrictions. First, we cannot reboot the primary and secondary nodes of an instance together. Second, two nodes cannot be rebooted simultaneously if...
Add option --one-step-only to hroller
Add a new option to hroller to only output information about the first reboot group. Together with the option --node-tags, this allows for the following workflow: first tag all nodes; then repeatedly compute the first reboot group, handle these nodes and remove their tags. In between...
Sort reboot groups by size
Make hroller output the node groups not containing the master node sorted by size, largest group first. The master node still remains the last node of the last reboot group. In this way, most progress is made when switching back to normal cluster operations after the...
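A sketch of this ordering with a hypothetical `orderRebootGroups`, assuming reboot groups are plain lists of node names: groups without the master come first, largest first, and the master's group comes last with the master as its final node.

```haskell
import Data.List (partition, sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical helper: order reboot groups so that groups without the
-- master come first (largest first) and the master reboots last.
orderRebootGroups :: Eq a => a -> [[a]] -> [[a]]
orderRebootGroups master groups =
  sortBy (comparing (Down . length)) others ++ map masterLast withMaster
  where
    (withMaster, others) = partition (master `elem`) groups
    -- Move the master to the end of its own group.
    masterLast g = filter (/= master) g ++ [master]
```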
Fix lint errors (redundant bracket)
Add option to hroller to select nodes based on tags
Add option --node-tags to tell hroller to consider only nodes with these tags. A use case would be a tag tracking the nodes on which the maintenance has not yet been carried out, e.g., if rolling reboots are interleaved with other cluster operations....
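A sketch of the tag filter, assuming that a node qualifies if it carries at least one of the given tags (the message does not say whether any or all tags must match). `Node`, `nodeTags` and `filterByTags` are hypothetical.

```haskell
import Data.List (intersect)

-- Hypothetical node record with its tags.
data Node = Node {nodeName :: String, nodeTags :: [String]}
  deriving (Show)

-- Nothing means the option was not given: keep all nodes. Otherwise keep
-- nodes sharing at least one tag with the requested list (assumption).
filterByTags :: Maybe [String] -> [Node] -> [Node]
filterByTags Nothing = id
filterByTags (Just tags) = filter (not . null . intersect tags . nodeTags)
```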
Make Rapi backend set node tags correctly
Since the htools representation of a node now allows adding the node tags, populate this field correctly in the Rapi backend.
Make LUXI backend set node tags correctly
Since the htools representation of a node now allows adding the node tags, populate this field correctly in the LUXI backend.
Extend the text format to contain node tags
In order to allow htools to make use of node tags, add them to the text format. This is done by adding a new column at the end of the node lines. If this column is missing, the default value (which is the empty list) is left unchanged, thus yielding the current...
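A sketch of reading the optional trailing column, assuming '|'-separated fields and comma-separated tags. `splitOn` and `nodeTagsFromLine` are hypothetical helpers, and the number of mandatory columns is a parameter rather than the real format's value.

```haskell
-- Split on a delimiter without extra library dependencies.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Hypothetical: a node line has 'mandatory' fields followed by an optional
-- tags column. A missing or empty column yields the default, [].
nodeTagsFromLine :: Int -> String -> [String]
nodeTagsFromLine mandatory line =
  case drop mandatory (splitOn '|' line) of
    (tagField : _) | not (null tagField) -> splitOn ',' tagField
    _ -> []
```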
Extend the Node in the htools to allow adding node tags
Since hroller (and probably other tools in the future) will support node selection based on node tags, extend the node data structure to allow adding this information.
Make hroller filter the nodes before coloring the graph
Hroller used to first compute a coloring of the node graph and then filter out the nodes that it had to work on. While the only filtering was according to node groups, this did not make a difference, as there...
Make mkNodeGraph ignore edges to non-present nodes
Change the behavior of mkNodeGraph to tacitly ignore all instances where one of the nodes is not in the list of nodes. In this way, we can construct sub-graphs by filtering the nodes and ignoring any possibly added isolated nodes for the missing indexes....
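A sketch of the edge filter, assuming nodes are identified by `Int` indices and each instance contributes an edge between its primary and (optional) secondary node; `nodeGraphEdges` is a hypothetical name, not the real mkNodeGraph.

```haskell
-- Hypothetical edge between two node indices.
type Edge = (Int, Int)

-- Each instance is (primary, Maybe secondary). Instances without a
-- secondary add no edge, and edges touching a node that is not in the
-- present list are dropped, so filtering nodes first gives a sub-graph.
nodeGraphEdges :: [Int] -> [(Int, Maybe Int)] -> [Edge]
nodeGraphEdges present insts =
  [(p, s) | (p, Just s) <- insts, p `elem` present, s `elem` present]
```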
hspace: Handle multiple ipolicy specs
With tiered allocation, hspace uses all the max specs in turn as the initial instance spec.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add multiple min/max specs in instance policy
Now instance policies can contain more than one min/max spec. This is the main element of the "Constrained instance sizes" section in the "Partitioned Ganeti" design doc.
This is a big patch, but changing the type of a configuration item requires...
Extend the simulation backend to also simulate a master node
In a simulated cluster, as created by the simulation backend to the htools, make the first node of the first node group the master node. In this way, htools (like hroller) that require a master node...
Extend Text format by marking the master node
Sometimes, e.g., for hroller, it is necessary to know which node is the master node. Therefore this information has to be included in the text format as well. Since we never use an offline node as the master node, this information can be put in the "is...
Make hroller insist on finding precisely one master node
As people rely on the master node being the last node of the last group, make hroller fail if no master node can be found in the cluster. This happens, e.g., if a backend format is used that does not...
In Rapi, set master correctly
The cluster data contains the information about the master node. Use this information to set the isMaster bit correctly.
In Luxi, set the master correctly
Utility function to set the master node in a node list
The information about which node is the master node is a cluster-wide setting, in most formats provided independently of the node information. Most backends therefore have to set the isMaster bit independently in the...
Make Hroller present master node last
If, in the list of nodes to be scheduled for maintenance, one is marked as being the master node, schedule it as the last node in the last group.
Extend the node description by isMaster
Extend the description of the node by the property of being the master node; also provide an appropriate setter function. This property will be used, e.g., by hroller to schedule the reboot of the master last.
Fix warnings hlint 1.8.43 complained about
These lines are OK according to previous versions of hlint but trigger an error with version 1.8.43.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Merge branch 'devel-2.7'
Make the disks parameter available to the constructor
In that way, tools building on Instance will benefit from the corrected verification semantics of the instance policy on disk space.
Verify individual disks in Instance
Instance policy on disks is specified on a per-disk basis. So extend the instance description by the sizes of the individual disks and modify the instance policy verification to correctly check individual disks.
Refactor ispecs in ipolicy structures
Minimum and maximum instance specs are put together into a single element of the instance policy. This is in preparation for introducing multiple min/max specs.
Signed-off-by: Iustin Pop <iustin@google.com>...
Change hbal behaviour in case of early exit
Currently, hbal exits with status 1 if early exit is requested, even when all jobs are successful. This is counter-intuitive behaviour, so let's fix it (Issue 386).
Note that the man page had conflicting information already, so it's a...
Fix low verbosity levels in htools
In a few cases, we tested the verbosity level for (== 0) instead of testing whether it is higher/lower than a certain value. If the user passes multiple "--quiet" options, this can result in negative verbosity levels, which behave like "extra verbosity"....
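A sketch of the problem, assuming a default verbosity of 1 that each `--quiet` decrements and each `--verbose` increments; `verbosityFrom`, `isQuietBuggy` and `isQuiet` are hypothetical helpers. An equality test misses negative levels, while a threshold comparison does not.

```haskell
-- Hypothetical: verbosity starts at 1; each --verbose adds one and each
-- --quiet subtracts one, so repeated --quiet gives negative values.
verbosityFrom :: Int -> Int -> Int
verbosityFrom verboseFlags quietFlags = 1 + verboseFlags - quietFlags

-- Buggy check: a level of -1 is not 0, so it is not treated as quiet.
isQuietBuggy :: Int -> Bool
isQuietBuggy v = v == 0

-- Fixed check: anything at or below 0 counts as quiet.
isQuiet :: Int -> Bool
isQuiet v = v <= 0
```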
HRoller: print only online nodes
To make the graphs work even when instances live on offline nodes (e.g. because we're offlining them just to exclude them, or because they still have instances on them), we just filter them out at the end, when we're going to print out the result....
HRoller: allow filtering by node group
Accept the -G option, and if it's passed, require that it matches a node group; then only output nodes belonging to that group.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Switch the curl bindings from optional to required
Currently, we support curl being optional via some sporting exercises: ifdefs in the code, data types that represent the 'Curl is disabled' state, etc. However, with the future work on RPC, we would even have to make the dependencies list conditional on it, etc. This is too...
Add CLI-level option to override the priority
This just defines the new priority option, with the same name as the Python one.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Enable use of the priority option in hbal
This patch adds the option to hbal, and uses it to tweak the submitted jobs. There are also two small shelltests for testing the parsing.
Make hbal opcode annotation more generic
Currently, hbal code always uses the annotateOpCode function, which means we would have to pass the options data to all functions in the call chain if we wanted to make this more flexible.
By abstracting the type of the annotator and passing it as an argument...
Remove use of 'head' and add hlint warning for it
Since 'head' is unsafe to use in most cases, this patch removes its use from most of the code, adds a lint warning for it (and for tail as well), and adds override annotations in the few cases where it's actually OK to use it (mainly when using head over the result of...
Harep.hs: fix a couple typos in comments and docstrings
Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix typo in a comment
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
harep: create repair jobs
Implement 'doRepair' to create a repair job from a list of opcodes if the instance's policy allows it (otherwise set an ENOPERM result label), and the instance was previously healthy (i.e. not in ArFailed or ArPendingRepair)....
harep: do not wait for repair job completion to set tags
Because of instance locks, after submitting a repair job we weren't able to set the "pending" tag until at least the first opcode of the job finished. Introduce a small delay in the repair job so as to allow the subsequent...
harep: finish execution with a summary of instance states
The harep tool prints messages for every action that it performs (or that it doesn't perform). In case nothing is to be done at all, always print some statistics of the current state of the cluster....
harep: initial parsing of tags
Parse auto-repair tags to set each instance in one of ArHealthy, ArFailed, or ArPendingRepair. The implementation tries to be well-behaved when old tags have been left behind, which future patches will still try not to do....
harep: check for completed jobs at the start of the program
As a first step before detecting any brokenness with instances, see if any of our previous repairs have completed, and move instances to ArFailed or ArHealthy accordingly. Do nothing if there are still running jobs for the...
harep: pure function to detect brokenness with instances
Add a 'detectBroken' function that determines whether an instance is in an unhealthy state, and what's needed to repair it. The repair is specified as an AutoRepairType constant, and a list of opcodes. The opcodes will only be...
Program/Harep.hs: add skeleton for the new auto-repair tool
harep(1) detects certain kinds of problems with instances and applies the allowed set of solutions. See doc/design-autorepair.rst.
Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
HTools/Types.hs: minor adjustments to auto-repair types
In particular:
- make ArHealthy take an optional AutoRepairData; this allows representing the situation where a repair completed successfully, and hence there's an associated tag we might want to know about....
CLI.hs: fix double spaces in option help strings
Some help strings with continuation backslashes ('\') were providing a space both before and after the backslash, resulting in double spaces in help output. Provide it only after the backslash, which fixes the issue and...
Loader.hs: ignore expired ArSuspended policies
At the moment, because 'mergeData' is pure, it may set instance auto-repair policies that are of the form `ArSuspended $ Until timestamp_in_the_past`. If later on the auto-repair tool notices this, it has lost access to what...
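A sketch of ignoring expired suspensions, using simplified, hypothetical stand-ins for the auto-repair types (timestamps as plain `Integer` seconds); the real types in HTools/Types.hs carry more information.

```haskell
-- Hypothetical, simplified stand-ins for the auto-repair policy types.
data SuspendTime = Forever | Until Integer
  deriving (Eq, Show)

data ArPolicy = ArEnabled | ArSuspended SuspendTime | ArNotEnabled
  deriving (Eq, Show)

-- A suspension whose end time is not in the future no longer applies.
isExpired :: Integer -> ArPolicy -> Bool
isExpired now (ArSuspended (Until t)) = t <= now
isExpired _ _ = False

-- Drop expired suspensions before choosing a policy, so the next
-- applicable tag (if any) takes effect instead.
dropExpired :: Integer -> [ArPolicy] -> [ArPolicy]
dropExpired now = filter (not . isExpired now)
```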
Loader.hs: rewrite extractExTags to use chompPrefix
Loader.hs: set instance auto-repair policy in mergeData
'getArPolicy' and 'setArPolicy' follow the precedence rules introduced in b1eb71c: within an object, the most restrictive tag wins; across objects, the nearest tag wins.
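A sketch of these precedence rules with a simplified, hypothetical `ArPolicy` whose constructors are listed from most to least restrictive (an assumption; the real ordering also involves the repair types). Tag lists are given per object, nearest first, e.g. instance, node group, cluster.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical policy type, constructors ordered from most to least
-- restrictive, so the derived Ord makes 'minimum' the most restrictive.
data ArPolicy = ArNotEnabled | ArSuspended | ArEnabled
  deriving (Eq, Ord, Show)

-- Within an object the most restrictive tag wins; across objects the
-- nearest object that has any tag wins.
effectivePolicy :: [[ArPolicy]] -> Maybe ArPolicy
effectivePolicy objs =
  listToMaybe [minimum tags | tags <- objs, not (null tags)]
```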
Signed-off-by: Dato Simó <dato@google.com>...
Instance.hs: add an 'arPolicy' field for auto-repair policy
Add initial constants and Haskell ADTs for auto repair
In this commit, the AutoRepairType and AutoRepairResult types are defined, with the possible values specified in doc/design-autorepair.rst.
HTools/Types.hs: more auto-repair types
AutoRepairPolicy, AutoRepairStatus, and other auxiliary types are added. These are used only internally by the auto-repair tool, and parsed from the various object tags as defined in the design doc.
Fix a bad data type in Hcheck.hs
While trying to understand why some code was not being tested, I realised that we have a bad data type in Hcheck.hs.
We have "data Level = GroupLvl | ClusterLvl", but then we need to pass the group name/index as well, so we have functions that look like the...
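Since the message is cut off, the following is only a guess at the shape of the fix: carrying the group name inside the constructor so that it cannot be passed inconsistently alongside the level. `levelPrefix` is a hypothetical function.

```haskell
-- Hypothetical reshaped type: the group name travels with the level
-- instead of being passed as a separate argument.
data Level = GroupLvl String | ClusterLvl
  deriving (Eq, Show)

-- Example consumer that previously would have needed both a Level and a
-- separately passed group name.
levelPrefix :: Level -> String
levelPrefix ClusterLvl = "Cluster"
levelPrefix (GroupLvl name) = "Group " ++ name
```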
Move src/Ganeti/HTools/Program.hs to Program/Main.hs
This removes one more tab conflict; this is the last module in our code where we have both x.hs and x/.
Furthermore, we collapse all actual code into the new Main.hs module, leaving the htools.hs basically empty (will allow better testing in...
Rename htools/ to src/
Per offline discussions, this is the first patch of the renames. Tested with "make distcheck", seems to work fine.
The only change outside of the renaming is a bit of simplification in the .gitignore rules; otherwise, simply s/htools/src/....