Add option for loading serialized instances
Monitoring CLI tools might have to load serialized lists of instances (mainly for testing reasons). This patch adds an option to allow that.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Factor out lv info gathering function
The buildJsonReport function will soon have to couple instance data with LVInfo data. In preparation for that, and to keep the function readable, the instructions for obtaining LVInfos are factored out...
Add "instance" field to LVInfo
Extend the LVInfo data structure with a field storing the name of the instance it is paired with.
Update the tests accordingly.
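A minimal sketch of what the extended record and the coupling step could look like. It assumes the owner of each LV is known from a lookup table keyed by LV name. `LVInfo`'s field names, `pairWithInstances` and the association list are hypothetical, not the actual Ganeti definitions.

```haskell
-- | Sketch of an LVInfo record with the new "instance" field. Only the
-- fields needed here are modelled; all field names are hypothetical.
data LVInfo = LVInfo
  { lviName     :: String
  , lviVgName   :: String
  , lviInstance :: Maybe String  -- ^ owning instance, if known
  } deriving (Show, Eq)

-- | Fill in the instance field from a hypothetical association list
-- mapping LV names to instance names. LVs without an owner get Nothing.
pairWithInstances :: [(String, String)] -> [LVInfo] -> [LVInfo]
pairWithInstances owners = map pair
  where
    pair lv = lv { lviInstance = lookup (lviName lv) owners }
```

Using `Maybe String` keeps LVs that belong to no instance representable, which a plain `String` field could not do.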
Factor out the getInstances function
The getInstances function can be useful in general, but is defined inside the InstStatus data collector. This commit takes it out and adds it to a proper (newly created) library.
Signed-off-by: Michele Tartara <mtartara@google.com>...
Extraction of storage info by type
There was a bug in the node queries. It was assumed that the returned storage space information was in a particular order. With the changes in the storage reporting, this order is no longer reliable; in particular, the...
Turn 'exclusive_storage' into storage parameter (hs)
This is the Haskell implementation of my patch "Extend RPC call 'node_info' by storage parameters". It turns the 'exclusive storage' flag into a storage parameter of the LVM storage types. Besides that, this patch moves some types into the Types.hs....
Prevent LV parser compile error
The LV parser is not compiled correctly by more recent versions of GHC because of stricter checks.
lvCommand is certainly a non-empty list, but the compiler still rejects it, asking for explicit handling of the empty-list case....
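A sketch of the general technique: pattern-match the command list totally, handling the empty case explicitly, instead of relying on the list being non-empty. `lvCommandSketch` and `splitCommand` are hypothetical stand-ins. The real contents of lvCommand are not shown above.

```haskell
-- | Hypothetical stand-in for lvCommand: the program name followed by
-- its arguments.
lvCommandSketch :: [String]
lvCommandSketch = ["lvs", "--noheadings"]

-- | Split a command into program and arguments, as a call such as
-- readProcess needs them. The empty list gets its own equation, so the
-- match is total even though the list above is never empty.
splitCommand :: [String] -> Either String (String, [String])
splitCommand []            = Left "empty command"
splitCommand (prog : args) = Right (prog, args)
```

On success, the pair can be passed on as `System.Process.readProcess prog args ""`. The `Left` branch turns a "can't happen" case into an explicit error rather than an incomplete pattern.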
Load CPUs used by the node OS in htools
A new field is added to the Node type, and it's used to initialize the used CPUs field.
The signature of Node.create has been split across lines to match the parameter list.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>...
Export CPUs used by the node OS
They are exported through the LUXI, RAPI, and IAllocator interfaces.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Add LV collector to the monitoring daemon
Allow the monitoring daemon to use the LV data collector.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add LV data collector
This commit adds the LV data collector.
Also, the lvCommand function did not provide the value expected by the readProcess function, so it was fixed.
Add LV parser
Add the parser for getting the information about the logical volumes in the system.
hroller: option --full-evacuation
Add an option to hroller to plan for full evacuation of the nodes to be rebooted, i.e., also plan for replacement secondary nodes for all instances on the node after migrating out instances with this node as primary.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Extract a partition functional
Separate the partitionNonRedundant function in hroller into a general functional that partitions a list of nodes according to some clearing strategy and the specialization of moving non-redundant instances out. In this way, we don't have to...
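A minimal sketch of such a functional, assuming a clearing strategy is a function that takes the current state (for example, the cluster) and a node, and returns the updated state if the node can be cleared. `partitionByClearing` and its argument order are hypothetical, not hroller's actual signature.

```haskell
-- | Partition nodes into those the strategy could clear and those it
-- could not, threading the state through successful clearings.
-- Both result lists keep the input order.
partitionByClearing :: (s -> n -> Maybe s) -> s -> [n] -> (s, [n], [n])
partitionByClearing clear = go [] []
  where
    go ok bad st [] = (st, reverse ok, reverse bad)
    go ok bad st (n : ns) =
      case clear st n of
        Just st' -> go (n : ok) bad st' ns  -- cleared: keep new state
        Nothing  -> go ok (n : bad) st ns   -- not clearable: state unchanged
```

Moving non-redundant instances out then becomes one particular `clear` argument. Other strategies can reuse the same traversal.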
Extract functional for greedily clearing nodes
The method clearNodes in hroller greedily clears nodes of non-redundant instances by moving them to a different node. This patch separates the greedy clearing algorithm from the specialization to non-redundant instances; in this way, we don't have to duplicate code...
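A sketch of a generic greedy clearing step, assuming a placement check that takes a target's state and an item, and returns the updated target on success. Every item is put on the first target that accepts it, and clearing fails if some item fits nowhere. `greedyClear` and the capacity model in the test are hypothetical.

```haskell
import Control.Monad (foldM)

-- | Greedily place every item on the first target that accepts it.
-- 'place' returns the target's updated state when the item fits.
greedyClear :: (n -> i -> Maybe n) -> [n] -> [i] -> Maybe [n]
greedyClear place = foldM step
  where
    step targets item = tryTargets [] targets
      where
        tryTargets _ [] = Nothing  -- no target can take this item
        tryTargets seen (t : ts) =
          case place t item of
            Just t' -> Just (reverse seen ++ t' : ts)
            Nothing -> tryTargets (t : seen) ts
```

Because the placement check is a parameter, the same loop can serve non-redundant instances or any other kind of item.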
Make hroller not consider offline nodes for evacuation
When planning where to evacuate the non-redundant instances of the nodes to be rebooted, it doesn't make sense to consider offline nodes. So add this restriction to hroller.
Update comments in hroller code
hroller schedules moves of instances to have rebooted nodes free of instances with this node as primary. Update the comments to reflect that this move planning is for non-redundant instances only.
Remove obsolete TODO
Originally, hroller started as a tool for offline maintenance only. There it made sense to warn about instances still running. By now, the default planning is to migrate instances off the nodes to be rebooted, with options for other behavior (like pretending that all instances...
Make NodeInfo (hs) accept arbitrary storage types
So far, the Haskell implementation of NodeInfo just requests storage information about volume groups. With this patch, storage info for arbitrary storage types can be requested.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Storage utility functions for Haskell
In order to extend the Haskell version of the NodeInfo query, we need some utility functions to deal with disk templates and storage types.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
Merge branch 'stable-2.8' into master
Merge branch 'stable-2.7' into stable-2.8
For node queries allow short forms of host names
For node queries, use the host-name filter instead of the simple equality-based one.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Provide a special filter for host names
Host names are usually given in short form, e.g., node1 or node1.sub instead of the fully qualified node1.sub.example.com. Therefore, comparing node names only by equality is too restrictive. This patch provides an...
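A sketch of one plausible matching rule, assuming a short name should match only at a label boundary: "node1" and "node1.sub" match "node1.sub.example.com", but "node" does not. `hostNameMatches` is hypothetical, and Ganeti's actual filter may treat edge cases differently.

```haskell
import Data.List (isPrefixOf)

-- | Does a (possibly short) host name match a fully qualified one?
-- It matches if it is equal, or if it is a prefix ending right before
-- a dot, so partial labels such as "node" vs "node1" never match.
hostNameMatches :: String -> String -> Bool
hostNameMatches short full =
  short == full || (short ++ ".") `isPrefixOf` full
```

Requiring the trailing dot is what distinguishes this from a plain prefix test, which would wrongly let "node" select "node1".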
Index instances by their UUID
No longer index instances by their name but by their UUID in the cluster config. This change affects large parts of the code, as the following adjustments were necessary:
 * Change the index key to UUID in the configuration and the...
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
In tiered allocation, cut non-promising shrinking tries
The heuristics for tiered allocation have been improved: if there is a resource whose shrinking alone makes a valid allocation possible, that resource is shrunk next....
Merge branch 'stable-2.8' into 'master'
Revert "Storage utility functions for Haskell"
This reverts commit 88d27b8aa8adc2e5ced773909f1d40812c5a6ea7.
Revert "Make NodeInfo (hs) accept arbitrary storage types"
This reverts commit e89525a859b2e841c08fce506c0b68b97c7efe61.
Rename directory 'Block' to 'Storage'
This patch renames the 'Block' directory to 'Storage' in the Haskell code base. The same rename was done in the Python code base earlier this quarter. We generalize the name, because we needed a place for general storage...
Improve hspace shrinking strategy
In tiered allocation, hspace next shrinks the resource of the instance that causes failure on most nodes. While this is not a bad strategy in general, it can lead hspace into a dead end if, for a large number of nodes, a particular resource blocks any further allocation of...
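A sketch of the baseline strategy described above, which this patch improves on. It assumes each node reports the single resource that made allocation fail there, and picks the resource reported most often. `Resource` and `mostBlockingResource` are hypothetical simplifications of hspace's failure statistics.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- | Hypothetical resource type.
data Resource = Mem | Disk | Cpu deriving (Show, Eq, Ord)

-- | Pick the resource that blocks allocation on the most nodes. Ties go
-- to the resource that comes first in the derived ordering, because
-- sortBy is stable.
mostBlockingResource :: [Resource] -> Maybe Resource
mostBlockingResource failures =
  case sortBy (comparing (Down . length)) (group (sort failures)) of
    ((r : _) : _) -> Just r
    _             -> Nothing  -- no failures reported
```

The dead end described above arises because this rule only counts failures. It does not ask whether shrinking the chosen resource can actually lead to a valid allocation.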
Convenience function for iterating while the result is Ok
For a function f :: a -> GenericResult a, iterate it (in the sense of the monad) until the result is Bad; return the list of values that occurred.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
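A minimal sketch, assuming GenericResult has the shape `Bad e | Ok a`. The name `iterateOk` is an assumption, and this version returns the successful results without the starting value. Whether the real function includes the starting value is not stated above.

```haskell
-- | Minimal stand-in for GenericResult (assumed shape).
data GenericResult e a = Bad e | Ok a deriving (Show, Eq)

-- | Apply f repeatedly, collecting every Ok value, and stop at the
-- first Bad. The starting value itself is not included.
iterateOk :: (a -> GenericResult e a) -> a -> [a]
iterateOk f x = case f x of
  Ok y  -> y : iterateOk f y
  Bad _ -> []
```

The result is built lazily, so a consumer can also stop early on a function that never returns Bad.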
Provide witness for the sum-type structure of GenericResult
GenericResult, while rightfully a type of its own, is isomorphic to Either. So, also provide the case analysis function (i.e., the universal arrow out of the sum).
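A sketch of the case analysis function, analogous to `either` for `Either`, again assuming GenericResult has the shape `Bad e | Ok a`. `genericResult`, `toEither` and `fromEither` are illustrative names. The two conversions witness the isomorphism mentioned above.

```haskell
-- | Minimal stand-in for GenericResult (assumed shape).
data GenericResult e a = Bad e | Ok a deriving (Show, Eq)

-- | Case analysis: the universal arrow out of the sum.
genericResult :: (e -> c) -> (a -> c) -> GenericResult e a -> c
genericResult f _ (Bad e) = f e
genericResult _ g (Ok a)  = g a

-- | The isomorphism with Either, written via the case analysis.
toEither :: GenericResult e a -> Either e a
toEither = genericResult Left Right

fromEither :: Either e a -> GenericResult e a
fromEither = either Bad Ok
```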
Refactor NodeInfo RPC regarding storage reporting
The NodeInfo RPC call is refactored to now handle more than just storage reporting for volume groups.
Since NodeInfo now returns storage space information not necessarily for volume groups, but also for other storage...
Add storage type to NodeInfo result
So far, the storage information returned from the RPC call NodeInfo contained only information about volume groups. In order to extend the storage reporting to other storage types, we include another field "type" in the result of...
Conflicts: (trivial, take union of added files/tests)
    Makefile.am
    test/hs/shelltests/htools-hspace.test...
Make shrinkByType aware of individual disks
When shrinking an instance, you can't just reduce the disk footprint while leaving the individual disks as they are. Make the shrink heuristic aware of that fact, and decrease all individual disks as well. Fixes issue 484....
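A sketch of the idea, assuming shrinking removes a fixed step from every disk and fails once any disk would reach zero. The `Disk` type, `shrinkStep` and `shrinkDisks` are hypothetical, and the actual step used by hspace is not given above.

```haskell
-- | Hypothetical disk type; sizes in MiB.
newtype Disk = Disk { dskSize :: Int } deriving (Show, Eq)

-- | Hypothetical shrink step in MiB, not taken from the source.
shrinkStep :: Int
shrinkStep = 256

-- | Shrink every disk by one step, so that the total footprint and the
-- individual disks stay consistent. Fails if a disk would drop to zero
-- or below.
shrinkDisks :: [Disk] -> Maybe [Disk]
shrinkDisks disks
  | any ((<= shrinkStep) . dskSize) disks = Nothing
  | otherwise = Just [d { dskSize = dskSize d - shrinkStep } | d <- disks]
```

Shrinking each disk, rather than only a summed total, keeps per-disk policy checks meaningful after the shrink.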
Fix lookup of xen toolstack in Haskell
There was a bug in the Haskell implementation of the node query which made the lookup of the Xen toolstack (xm/xl) fail.
Index nodes by their UUID
No longer index nodes by their name but by their UUID in the cluster config. This change affects large parts of the code, as the following adjustments were necessary:
 * Change the index key to UUID in the configuration and the ConfigWriter, including all methods....
Add missing parenthesis to description of --machine-readable
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add hvparams to RPC call 'node_info'
This patch adds the hvparams parameter to the RPC call 'node_info'. It also adjusts the related code in noded.py and Query/Node.hs.
Add type annotation to avoid monomorphism restriction
Even though we need the let-bound variable showMoves only at type [(String, String)] -> IO (), its most general type would be (PrintfArg a, PrintfArg b) => [(a, b)] -> IO (). This causes the monomorphism restriction to apply to that binding,...
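A small illustration of the same situation. A binding without syntactic arguments whose most general type carries `PrintfArg` constraints falls under the monomorphism restriction. An explicit signature fixes it at the one type actually used. `formatMoves` and `formatMove` are hypothetical. They return strings instead of printing, so the behavior is easy to check.

```haskell
import Text.Printf (printf)

-- | Render (instance, node) moves, one line each.
formatMoves :: [(String, String)] -> [String]
formatMoves = map formatMove
  where
    -- No arguments on the left-hand side, so without this signature the
    -- monomorphism restriction would apply: the most general type has
    -- PrintfArg/PrintfType constraints. The signature pins the type.
    formatMove :: (String, String) -> String
    formatMove = uncurry (printf "%s %s")
```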
add option --print-moves to hroller
If non-redundant instances are present in the cluster, hroller will plan for them to move to other nodes while the group is rebooted. This adds an option to also show this plan.
hspace prints info about spindles
Statistics about spindles are tracked. In human-readable output, spindles are printed only when used (i.e., exclusive storage is enabled). For machine-oriented output, they are always there.
Add support for shrinking an instance spindles-wise
This makes tiered allocation in hspace work also with respect to spindles.
Spindles become part of htools resource spec
Spindles are now part of resource spec. Instances get created with spindles specified (which are just ignored when exclusive storage is disabled).
htools cluster score takes spindles into account
When exclusive storage is enabled, spindles are used instead of disk space to compute the cluster score.
Comments and variable names in computePDsk have been changed to match the actual code.
Update spindles when moving instances in htools
Spindles get updated, and errors are raised when not enough free spindles exist. No new error is raised when exclusive storage is disabled.
Unit tests included.
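A sketch of that rule under stated assumptions: free spindles are always updated, but a shortage is reported as an error only when exclusive storage is enabled. The `Node` record, its fields and `allocSpindles` are hypothetical and much simpler than htools' node type.

```haskell
-- | Hypothetical, minimal node record.
data Node = Node
  { nodeFreeSpindles :: Int
  , nodeExclStorage  :: Bool
  } deriving (Show, Eq)

-- | Account for an instance needing 'needed' spindles on a node.
-- With exclusive storage disabled, no new error is raised.
allocSpindles :: Int -> Node -> Either String Node
allocSpindles needed node
  | nodeExclStorage node && needed > nodeFreeSpindles node =
      Left "not enough free spindles"
  | otherwise =
      Right node { nodeFreeSpindles = nodeFreeSpindles node - needed }
```

With exclusive storage disabled, the counter may go negative in this sketch. That keeps the old behavior of not failing on spindles.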
Unit tests for htools and exclusive storage
The existing tests are run also on nodes with exclusive storage enabled. The values for spindles and exclusive storage are set in a consistent way, for both nodes and instances.
Load complete instance disk information through LUXI
Information about size and spindles of all the disks of an instance is loaded by the LUXI backend, instead of faking one equivalent big disk. In this way, instance policy checks are more accurate.
Load node spindles data in htools
The data structure for nodes gets a new field for free spindles, and the existing field for total spindles gets renamed to avoid identifying it with the node parameter that had the same name. These fields get filled with...
Refactor reading live data in htools
This simplifies different handling of individual items.
Export node spindles
Node spindles (queried live) are exported through the LUXI, RAPI, and iallocator interfaces.
Add a force option to the ClusterSetParams Opcode
If set, the op code will, in particular, try to set the master IP on the new netdev, even if shutting down the master IP on the old netdev failed.
Fix shadowing of library function
The "reads" field shadows a library function from Prelude. This commitfixes the problem.
Let the monitoring daemon provide the diskstats collector
Add the collector to the list of those provided by the monitoring daemon.
Add diskstats data collector
Add a new data collector responsible for gathering disk performance statistics.
Add a CLI parameter for input files
For many data collectors it is useful (especially for testing) to be able to specify an input file.
This commit adds a generic option for doing that.
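A sketch of such an option using `System.Console.GetOpt` from base. The option names `-f`/`--input-file`, the `Options` record and `parseArgs` are hypothetical. The source does not name the actual option.

```haskell
import System.Console.GetOpt

-- | Hypothetical collector options: only the input file is modelled.
newtype Options = Options { optInputFile :: Maybe FilePath }
  deriving (Show, Eq)

defaultOptions :: Options
defaultOptions = Options { optInputFile = Nothing }

-- | One generic option to read data from a file instead of the live
-- system (the option name is an assumption).
options :: [OptDescr (Options -> Options)]
options =
  [ Option ['f'] ["input-file"]
      (ReqArg (\f o -> o { optInputFile = Just f }) "FILE")
      "read input from FILE instead of the live system"
  ]

-- | Apply all recognised options to the defaults; report parse errors.
parseArgs :: [String] -> Either [String] Options
parseArgs argv =
  case getOpt Permute options argv of
    (fs, _, [])  -> Right (foldl (flip id) defaultOptions fs)
    (_, _, errs) -> Left errs
```

Making the option generic means each collector only has to decide what to do with `optInputFile`. The parsing itself is shared.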
Add /proc/diskstats parser
Add a parser for interpreting the content of the /proc/diskstats file, providing information about the state of the disks of the system.
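A minimal whitespace-based sketch of such a parser. Each line of /proc/diskstats starts with the major number, minor number and device name, followed by 11 counters. Newer kernels append further counters, which this sketch ignores. The `Diskstats` record and function names are hypothetical. The real Ganeti parser may use a parser-combinator library and model every field by name.

```haskell
import Text.Read (readMaybe)

-- | One /proc/diskstats line (hypothetical representation).
data Diskstats = Diskstats
  { dsMajor :: Int
  , dsMinor :: Int
  , dsName  :: String
  , dsStats :: [Integer]  -- ^ the first 11 counters after the name
  } deriving (Show, Eq)

-- | Parse one line; Nothing if it is malformed or too short.
parseDiskstatsLine :: String -> Maybe Diskstats
parseDiskstatsLine line =
  case words line of
    (ma : mi : name : rest) | length rest >= 11 -> do
      major <- readMaybe ma
      minor <- readMaybe mi
      stats <- mapM readMaybe (take 11 rest)
      return (Diskstats major minor name stats)
    _ -> Nothing

-- | Parse the whole file; fails if any line is malformed.
parseDiskstats :: String -> Maybe [Diskstats]
parseDiskstats = mapM parseDiskstatsLine . lines
```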
Check real spindles in ipolicies
When exclusive storage is enabled, the spindles in instance disks are used to check the instance policies (as outlined in design-partitioned.rst).
Check the full instance specs in htools
Spindles and disk count are checked too. Existing functions have been refactored, so common parts are not duplicated.
Add spindles to instance disks in htools
A new data type is introduced for disks to store both size and spindles. When available, spindles are filled with input data. Except for loading and storing, spindles are ignored.
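A sketch of such a disk type, assuming sizes in MiB and spindles as an optional value, since not every input source provides them. `Disk`, its fields and `totalSize` are hypothetical names. `totalSize` illustrates that computations other than loading and storing look only at the size.

```haskell
-- | Hypothetical disk type with size and optional spindles.
data Disk = Disk
  { dskSize     :: Int        -- ^ size in MiB (assumed unit)
  , dskSpindles :: Maybe Int  -- ^ Nothing when the input lacks spindles
  } deriving (Show, Eq)

-- | Total disk footprint of an instance; spindles are ignored here.
totalSize :: [Disk] -> Int
totalSize = sum . map dskSize
```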
Load exclusive_storage in htools
The node parameter is loaded into the data structures. No behavior is modified yet.
New function to load JSON arrays of optional values
This will be needed to load spindles in some htools backends. Unit tests are provided.
Add unit test for text backend + fix bug
Test serialization and deserialization of instances. Fix the check of the secondary node.
Restrict instance moves in hroller to the same node group
When scheduling rolling reboots, hroller looks for nodes to evacuate the non-redundant instances to. This is done by greedily moving instances to other nodes that can take them, policy wise and capacity...
Add tests for network-aware allocation
hail-alloc-invalid-network defines a cluster with two node groups and an allocation request which does not fit on any of the groups. Group 1 has invalid disk-templates while Group 2 is not connected to the right networks....
Honor network connections in hail
Before trying to allocate nodes in node groups, node groups are now filtered based on the networks they are connected to and the networks which are required by the new instance.
Signed-off-by: Thomas Thrainer <thomasth@google.com>...
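A sketch of the filtering step, assuming a group qualifies only if it is connected to every network the instance requires, with networks identified by UUID strings. `Group`, its fields and `filterGroupsByNetworks` are hypothetical simplifications of hail's types.

```haskell
-- | Hypothetical node group: name and UUIDs of connected networks.
data Group = Group
  { grpName     :: String
  , grpNetworks :: [String]
  } deriving (Show, Eq)

-- | Keep only the groups connected to all required networks. An
-- instance without NICs (no required networks) fits every group.
filterGroupsByNetworks :: [String] -> [Group] -> [Group]
filterGroupsByNetworks required = filter connected
  where
    connected g = all (`elem` grpNetworks g) required
```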
Parse NIC data from allocation request in hail
Add a NIC type and extend the Instance type by a list of NICs. Parse the NICs in allocation requests and store them for now. Later patches will make use of this field in order to ensure that the requested instance is only placed in node groups which are connected to those...
Support group networks in Text backend
The Text backend now parses network UUIDs (comma separated) and serializes them in the same form. The test data is adapted to the new format.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
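A sketch of parsing and serializing such a comma-separated field. It assumes an empty field means "no networks" and that UUIDs never contain commas. `parseNetworks` and `serializeNetworks` are hypothetical names, not the Text backend's actual functions.

```haskell
import Data.List (intercalate)

-- | Split a comma-separated list of network UUIDs. The empty field
-- is taken to mean "no networks" (an assumption).
parseNetworks :: String -> [String]
parseNetworks "" = []
parseNetworks s = case break (== ',') s of
  (item, [])       -> [item]
  (item, _ : rest) -> item : parseNetworks rest

-- | Serialize in the same comma-separated form.
serializeNetworks :: [String] -> String
serializeNetworks = intercalate ","
```

For well-formed input, `serializeNetworks . parseNetworks` gives back the original field, which is the property a backend round-trip test would check.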
Parse node group networks
Extend the Group by the network ids it is connected to. Adapt the IAlloc backend so that the networks are parsed correctly. This also required adapting the test data.
Add disks_active to configuration
This flag tracks whether the disks of an instance are supposed to be active. That's the case when an instance is running or when its disks got activated explicitly (and in a couple of other cases). It will be used by the watcher to re-activate disks after a node reboot....
Add spindles field to disk object
The field is filled with the value provided on the command line.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
hroller: option to ignore non-redundant instances
Add an option to hroller restoring the old behavior of not taking any non-redundant instances into account when forming reboot groups.
Make hroller also plan for non-redundant instances
Non-redundant instances need to be moved to a different node before maintenance of the node. Even though they can be moved to any node, there must be enough capacity to host the instances of the reboot group to be evacuated....
hroller: option to skip nodes with non-redundant instances
So far, hroller ignores the fact that non-redundant instances exist. One option to deal with non-redundant instances is to not schedule those nodes for reboot. This is supported by adding the option --skip-non-redundant....
Remove trailing whitespace
RPC 'node_info': <storage_type,key> instead of vg_names
This replaces the field 'vg_names' in the RPC call 'node_info' with 'storage_units'. A storage unit is a tuple <storage_type,key> and a generalization of a vg_name. The list of vg names is replaced by...
Make HS ConfD client IPv6 compatible
The Haskell ConfD client was assuming internet addresses to be IPv4. This patch modifies the client so that it is able to automatically detect the protocol it should use by analyzing the address it is told to connect to....
Factor out resolveAddr function
This function can be useful to many parts of the code to convert the string representation of an IP (v4 or v6) address into the proper data type.
Use dcName in mon-collector
Instead of manually specifying the name of the data collectors in mon-collector, just use the dcName field each of them exports.
Factor out the mergeStatuses function
It will be used by multiple data collectors, not only the DRBD collector.
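A sketch of what merging collector statuses can look like, assuming a status is a code plus a message. The merged code is the worse of the two, and non-empty messages are joined by newlines. `StatusCode`, its constructors and this `mergeStatuses` signature are hypothetical and may differ from the DRBD collector's real function.

```haskell
-- | Hypothetical status codes, ordered from best to worst.
data StatusCode = StatusOk | StatusWarning | StatusBad
  deriving (Show, Eq, Ord)

-- | Merge two (code, message) pairs: keep the worst code and join the
-- non-empty messages with a newline.
mergeStatuses :: (StatusCode, String) -> (StatusCode, String)
              -> (StatusCode, String)
mergeStatuses (c1, m1) (c2, m2) = (max c1 c2, joinMsg m1 m2)
  where
    joinMsg "" m = m
    joinMsg m "" = m
    joinMsg a b  = a ++ "\n" ++ b
```

Since `mergeStatuses` is associative with `(StatusOk, "")` as identity, a collector can fold it over any number of per-item statuses.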
Add global status field to the instance status collector
The global status is computed from the statuses of the single instances.
The output JSON format is adapted to include this piece of information, as prescribed by the design document.
Export the Instance Status collector report
It will need to be accessed by the monitoring daemon.
Add inst-status-xen to the monitoring daemon
Enable the monitoring daemon to invoke the Xen instance status data collector.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Add HS functions for getting the instance reason path
The getInstReasonFilename function is built to resemble the corresponding Python function.
Add module containing function for getting info from Xen
The Xen instance status data collector will need to get some information from the hypervisor. This commit introduces a module providing such functions.
Add the core of the instance status collector
Add the Xen instance status data collector with only its core features.The next commits will add more reporting functionalities.
The access to the collector is made possible through the mon-collector tool....
Export the actual instance state
Compute the actual state of the instance and export it.
Determine status of one instance
Add a function for determining whether the status of an instance is OK, and represent this information in the corresponding field of the report.
Include the reason trail in the instance collector output
Fetch the reason trail from file, failing gracefully if it is not found, and include it in the output of the instance status data collector.
Export Instance Status collector information
Name, version, format version, category and kind of the Instance Status data collector are now exported.
Factor out function for building report
Instead of building the report as part of the "Main" function, have it built by its own dedicated function, so that it will be able to export it directly to the monitoring daemon when needed.
Support online-maintenance in hroller
Make hroller take into account the nodes that (redundant) instances will be migrated to. This behavior can be overridden by the --offline-maintenance option, which will make hroller plan under the assumption that all instances will be shut down before starting...
Support construction of the graph of all reboot constraints
For online rolling reboots, there are two kinds of restrictions. First, we cannot reboot the primary and secondary nodes of an instance together. Second, two nodes cannot be rebooted simultaneously if...
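A sketch of the first kind of constraint only, since the description of the second is cut off. It assumes an instance can be reduced to its primary and optional secondary node name. Each instance with a secondary contributes an edge between its two nodes, and nodes joined by an edge must not be in the same reboot group. `Inst` and `conflictEdges` are hypothetical.

```haskell
import Data.List (nub)

-- | Hypothetical instance: primary node and optional secondary node.
data Inst = Inst
  { instPrimary   :: String
  , instSecondary :: Maybe String
  }

-- | Undirected "cannot reboot together" edges, normalized so that the
-- smaller name comes first and each edge appears once.
conflictEdges :: [Inst] -> [(String, String)]
conflictEdges insts =
  nub [ (min p s, max p s) | Inst p (Just s) <- insts, p /= s ]
```

A scheduler can then treat reboot groups as independent sets of this graph. Instances without a secondary add no edge, because only mirrored instances span two nodes.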
Add option --one-step-only to hroller
Add a new option to hroller to only output information about the first reboot group. Together with the option --node-tags, this allows for the following workflow. First tag all nodes; then repeatedly compute the first node group, handle these nodes and remove the tags. In between...