Make hbal deal with no-LVM storage space properly
Since 2.6, hbal crashes when used on a cluster where no LVM storage is enabled at all. The problem is that it always queries for fields that only make sense for certain types of storage. This patch will...
Merge branch 'stable-2.8' into stable-2.9
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Fix integer overflow problem in hbal
waitForJobs in src/Ganeti/Jobs.hs has an integer overflow that (at least on amd64) causes it to break after waiting for ~10 minutes. This results in hbal sleeping forever (when compiled with squeeze's ghc 6.12.1) or crashing (when...
Add missing space
Also, refactor the line to keep it under 80 chars.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Allow classic queries to use either names or UUIDs
When UUIDs are used in CLI commands, such addressing of objects fails or succeeds inconsistently across object types. Worse yet, some calls do not fail, but simply return no result. This is due to the way the...
luxid: fix detection of master node in node query
Ganeti.Config.getNodeRole would rely on clusterMasterNode returning the master node name, however clusterMasterNode returns the master node's UUID. We fix this and a similar issue in Ganeti.Query.Node.nodeFields....
Fix gnt-network list-tags
Define network tags in the Haskell part.
This fixes issue 641.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Avoid lines longer than 80 chars
...as they're a lint error.
Fix evacuation out of drained node
Refactor reading live data in htools
This simplifies different handling of individual items.
Cherry-picked from 8c72f7119f50a11661aacba2a1abffdfdc6f7cfa.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
eta-reduce isIpV6
This is not only better style, but also fixes a lint error. Also use the infix form of `elem` to increase readability.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
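A minimal sketch of what such an eta-reduction looks like. The pointful "before" version is an assumption made for illustration and is not copied from the Ganeti source:

```haskell
-- Hypothetical pointful version, with `elem` in prefix form.
isIpV6Pointful :: String -> Bool
isIpV6Pointful ip = elem ':' ip

-- Eta-reduced version: the argument is dropped and `elem` is used
-- infix, as a left section.
isIpV6 :: String -> Bool
isIpV6 = (':' `elem`)
```

Both definitions behave identically; the second simply avoids naming an argument that is only passed through.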
Ganeti.Rpc: use brackets for ipv6 addresses
We detect an IPv6 vs V4 address based on colons, rather than passing the family from the cluster object, to be more future-proof (in case we ever support mixed clusters).
Unfortunately quite a bit more code is required to test this: we need an...
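The bracketing idea can be sketched as follows. All function names here (`formatHost`, `buildUrl`) and the URL layout are illustrative assumptions, not the actual Ganeti.Rpc code:

```haskell
-- | Assumed detection rule: any host containing a colon is IPv6.
isIpV6 :: String -> Bool
isIpV6 = (':' `elem`)

-- | Wrap IPv6 literals in brackets so that the colon separating the
-- port stays unambiguous; leave everything else untouched.
formatHost :: String -> String
formatHost host
  | isIpV6 host = "[" ++ host ++ "]"
  | otherwise   = host

-- | Hypothetical URL builder from host, port and path.
buildUrl :: String -> Int -> String -> String
buildUrl host port path =
  "https://" ++ formatHost host ++ ":" ++ show port ++ "/" ++ path
```

Detecting the family from the address text keeps the helper independent of any cluster-wide setting, which is the design choice the commit message argues for.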
Fix socket permissions after master-failover
When using gnt-cluster master-failover, on the soon-to-be-master the luxi daemon is started by the node daemon. This makes the luxi daemon inherit the node daemon's umask 077, making the communication socket unreadable to group members. When using Ganeti with non-root...
Fix path for serial file
It is actually located inside the queue directory.
Make the inst-status-xen collector more resilient
The data collectors should be able to provide as much information as possible even when the system is badly degraded. This patch modifies the instance status collector for xen so that it can keep providing as much data as possible, even...
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Perform proper cleanup on termination of Haskell daemons
Haskell daemons did not perform proper cleanup at termination. There was no code for removing the pid file, and the code in LuxiD for removing the unix socket file was not working, because it is implemented with a "finally" statement,...
Mark the DSA host pubkey as optional
Commit a9542a4 introduced support for DSA SSH keys. However, the dsahostkeypub field added to the config is not marked as optional in the Haskell components. As a result, luxid thinks the config file is corrupt and refuses to start. We...
Replace LD_* constants with DT_* constants
LD_* constants are basically like DT_* constants, except that both file and shared file were mapped to file. In order not to have to maintain three slightly different sets of disk-related constants (DT, LD and ST), we merge...
Make the DRBD collector more failure-resilient
If information about instances is not available, just log the error and continue without it.
Add function to unwrap Results logging failures
Add logWarningIfBad, a utility function similar to exitIfBad, that logs a warning and returns a default value instead of just crashing the program if the unpacked value is Bad.
Signed-off-by: Michele Tartara <mtartara@google.com>...
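A rough sketch of such a helper, assuming a simplified `Result` type and plain stderr output in place of Ganeti's own logging functions (both are stand-ins chosen for illustration, as is the argument order):

```haskell
import System.IO (hPutStrLn, stderr)

-- Simplified stand-in for Ganeti's Result type (assumption).
data Result a = Bad String | Ok a

-- | Unwrap a Result. On Bad, log a warning and fall back to the
-- supplied default instead of terminating the program.
logWarningIfBad :: String -> a -> Result a -> IO a
logWarningIfBad msg defVal (Bad err) = do
  hPutStrLn stderr ("WARNING: " ++ msg ++ ": " ++ err)
  return defVal
logWarningIfBad _ _ (Ok v) = return v
```

A collector could then call something like `logWarningIfBad "Cannot read instance list" [] result` and carry on with an empty list when the lookup fails.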
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Use FQDN to check master node status
The master node name in SS conf is stored as FQDN, so also use the FQDN on each node to check if it is the master node.
This fixes issue 551.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Support DSA SSH keys in bootstrap
As outlined in issue 338, Ganeti failed to initialize a cluster if no RSA SSH key is present on the master node. This patch extends Ganeti's support to DSA keys, so clusters with only DSA keys are possible now.
This fixes issue 338....
Include VCS version in `gnt-cluster version`
Also print the VCS version in the output of `gnt-cluster version`. This makes the VCS version also available over RAPI, etc.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add cleanup parameter to instance failover
Most of the code is shared with instance migrate, so we actually only need to add the parameter and pass its value along to the common code.
Also, tests and harep are updated to support the right set of options to...
gnt-cluster verify: consider shared file storage
This patch enhances 'gnt-cluster verify' so that it now validates the acceptance and existence of the shared storage directory.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
gnt-cluster modify --shared-file-storage-dir
This patch adds the option '--shared-file-storage-dir' to 'gnt-cluster modify' to change the default directory for instances using shared file storage at cluster runtime.
Signed-off-by: Helga Velroyen <helgav@google.com>...
gnt-cluster {init, modify} --file-storage-dir
This patch implements consistent usage and behavior of the --file-storage-dir option in 'gnt-cluster init' and 'gnt-cluster modify'. It includes a bunch of unit tests as well.
Additionally, it enables the previously written unit...
Fix permission errors for split users
Correctly set ownership and permissions for daemon log files, correct the name of the luxid logfile and set the ownership of the query socket correctly.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Conflicts: src/Ganeti/Utils.hs (trivial)
Let ReqNodeInstances work with node UUIDs
The "primaryNode" and "secondaryNode" fields of "Instance" entities in the cluster configuration were changed to use UUIDs instead of names. The ReqNodeInstances query inside Confd was not upgraded yet, and was thus...
Add documentation line to getNodeInstances
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add debug logging to Confd
Knowing the replies actually sent helps track down problems much more efficiently.
Fix permission problem related to Issue 477
Commit 91525dee856951ace940c78b6254a1c7344b4803 fixed Issue 477 but broke "gnt-cluster info".
This commit offers a solution to both problems, by changing the permission of the socket instead of changing the permission the confd process is run...
Add hs function to easily change file ownership
The Haskell library functions only allow changing file ownership using uid/gid. A function for doing that with explicit names is added by this commit.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Rename queryd to luxid
As queryd will, in the future, handle all LUXI requests, queue jobs and most likely perform various other tasks, it is renamed to luxid already. This will save some headache when upgrading Ganeti installations, as we don't have to deal with a daemon rename....
Add queryd daemon (split from confd)
queryd is added as a new daemon which handles configuration queries over LUXI. This functionality was removed from confd, which now only handles queries over the network.
The queryd user is added to the master group such that it can access...
Extract ConfigReader from Confd/Server.hs
Confd's functionality to watch the Ganeti configuration file is extracted to the ConfigReader module. No functional changes are introduced.
This extraction will enable us to split queryd from confd, as queryd will have to use the same functionality....
Add timestamps to haskell network query fields
Add timestamp fields to the list of available network query fields in the Haskell code.
Signed-off-by: Christos Stavrakakis <cstavr@grnet.gr>
Reviewed-by: Helga Velroyen <helgav@google.com>
Merge branch 'stable-2.8' into master
Merge branch 'stable-2.7' into stable-2.8
Conflicts:...
Verify file storage path
This patch adds two verification steps to 'gnt-cluster verify':
- The configured file storage directory is checked against the allowed file storage directories file.
- We check whether the configured file storage directory exists and is writable on each node....
Allow modify_etc_hosts to be changed
The modify_etc_hosts option, enabling the cluster to modify the /etc/hosts files of nodes, and to keep them in sync, could only be set at cluster init time.
With this commit it can now be changed through modify_etc_hosts as well....
Add luxiReqQueryNetworks to LuxiOp
When the QueryNetwork was introduced as a method, apparently it was forgotten in the Haskell world. Add it here as well.
Log received message at debug level
At debug level, we can well afford to have a detailed entry for each message received by a server.
Log RPC errors from inside executeRpcCall
executeRpcCall is the function to be used for executing RPCs, so it makes sense to use it as the single point for logging all the RPC call errors.
Fixes Issue 293.
Factor out the logRpcErrors function
This function can be useful to multiple RPC calls, therefore it is moved to the file containing the common RPC functions.
Also, it is made more generic by changing its signature.
Add support for querying network timestamps
Add creation and modification timestamps when creating a new network, and extend the available query fields for networks with these fields, namely 'ctime' and 'mtime'.
Signed-off-by: Christos Stavrakakis <cstavr@grnet.gr>...
Prevent silent failure in case of connection problems
While running "gnt-node list", if a query to ConfD fails (especially because of permission problems) it used to just fail silently, with gnt-node showing question marks instead of data.
With this patch, ConfD records the error in its log file, together with a...
Include "instance" information in LV data collector
This commit enables the logical volume data collector to get information about the instances and to link it to the information about logical volumes.
The list of parameters accepted by the collector is expanded to allow proper...
Add "includeLogicalId" function for Disks
This function checks whether a disk contains a given LVM logical ID, directly or through its children.
Unit tests are added as well.
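The recursive check can be sketched roughly as below. The `Disk` and `LogicalId` types are heavily simplified stand-ins, and `hasLogicalId` is a hypothetical name, so none of this should be read as the actual Ganeti definitions:

```haskell
-- Simplified stand-ins (assumptions): real disks carry many more fields.
data LogicalId
  = LIDPlain String String  -- ^ volume group and logical volume name
  | LIDOther                -- ^ any non-LVM logical ID
  deriving (Eq, Show)

data Disk = Disk
  { diskLogicalId :: LogicalId
  , diskChildren  :: [Disk]
  }

-- | True if the disk itself, or any disk below it, has the given
-- LVM logical ID.
hasLogicalId :: String -> String -> Disk -> Bool
hasLogicalId vg lv disk =
  diskLogicalId disk == LIDPlain vg lv
  || any (hasLogicalId vg lv) (diskChildren disk)
```

The recursion over children matters for layered disk types, where the logical volumes sit underneath a top-level device.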
Add option for loading serialized instances
Monitoring CLI tools might have to load serialized lists of instances (mainly for testing reasons). This patch adds an option to allow that.
Factor out lv info gathering function
The buildJsonReport function will soon have to perform the coupling of instance data with LVInfo data. In preparation for that, in order to make it more readable, the instructions for obtaining LVInfos are factored out...
Add "instance" field to LVInfo
Extend the LVInfo data structure with the field for storing the name of the instance it is paired with.
Update the tests accordingly.
Factor out the getInstances function
The getInstances function can be useful in general, but is defined inside the InstStatus data collector. This commit takes it out and adds it to a proper (newly created) library.
Extraction of storage info by type
There was a bug in the node queries. It was assumed that the returned storage space information was in a particular order. With the changes in the storage reporting, this order is not that reliable anymore, in particular, the...
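To illustrate the general idea of selecting storage information by type rather than by position, here is a small sketch. The types, constructor names and `lookupStorage` are all invented for this example and do not mirror the real RPC result structures:

```haskell
import Data.List (find)

-- Hypothetical storage type tags and a minimal info record.
data StorageType = StorageFile | StorageLvmVg | StorageLvmPv
  deriving (Eq, Show)

data StorageInfo = StorageInfo
  { siType :: StorageType
  , siFree :: Int  -- ^ free space, in MiB
  }

-- | Pick the entry for a given storage type, wherever it appears in
-- the list, instead of assuming a fixed position.
lookupStorage :: StorageType -> [StorageInfo] -> Maybe StorageInfo
lookupStorage t = find ((== t) . siType)
```

Returning a `Maybe` also makes the "no such storage reported" case explicit, which a positional lookup hides.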
Clean up workaround for host name filtering
These functions simply served as a workaround to express host name matching by regular expressions, instead of using the correct equality filter on host names that provides the correct matching already.
Do not handle host queries specially
As, since 91c1a265, the equality used for host names already is based on matching, there is no need to use a special function for this any more.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Turn 'exclusive_storage' into storage parameter (hs)
This is the Haskell implementation of my patch "Extend RPC call 'node_info' by storage parameters". It turns the 'exclusive storage' flag into a storage parameter of the LVM storage types. Besides that, this patch moves some types into Types.hs....
Prevent LV parser compile error
The LV parser is not compiled correctly by more recent versions of GHC because of stricter checks.
lvCommand is surely a non-empty list, but the compiler still refuses it, asking for explicit management of the empty list case....
Export CPUs used by the node OS
They are exported through the LUXI, RAPI, and IAllocator interfaces.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Load CPUs used by the node OS in htools
A new field is added to the Node type, and it's used to initialize the used CPUs field.
The signature of Node.create has been split across lines to match the parameter list.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>...
Add LV collector to the monitoring daemon
Allow the monitoring daemon to use the LV data collector.
Add LV data collector
This commit adds the LV data collector.
Also, the lvCommand function was not providing the correct value as expected by the readProcess function, so it was fixed.
Add LV parser
Add the parser for getting the information about the logical volumes in the system.
hroller: option --full-evacuation
Add an option to hroller, to plan for full evacuation of the nodes to be rebooted, i.e., also plan for replacement secondary nodes for all instances on the node after migrating out instances with this node as primary.
Extract a partition functional
Separate the partitionNonRedundant function in hroller into a general functional that partitions a list of nodes according to some clearing strategy and the specialization of moving non-redundant instances out. In this way, we don't have to...
Extract functional for greedily clearing nodes
The method clearNodes in hroller greedily clears nodes of non-redundant instances by moving them to a different node. This patch separates the greedy clearing algorithm from the specialization to non-redundant instances; in this way, we don't have to duplicate code...
Make hroller not consider offline nodes for evacuation
When planning where to evacuate the non-redundant instances of the nodes to be rebooted, it doesn't make sense to consider offline nodes. So add this restriction to hroller.
Update comments in hroller code
hroller schedules moves of instances to have rebooted nodes free of instances with this node as primary. Update the comments to reflect that this move planning is for non-redundant instances only.
Remove obsolete TODO
Originally, hroller started as a tool for offline maintenance only. There it made sense to warn about instances still running. By now, default planning is to migrate instances off the nodes to be rebooted, with options for other behavior (like pretending that all instances...
Make NodeInfo (hs) accept arbitrary storage types
So far, the Haskell implementation of NodeInfo just requests storage information about volume groups. With this patch, storage info for arbitrary storage types can be requested.
Storage utility functions for Haskell
In order to extend the Haskell version of the NodeInfo query, we need some utility functions to deal with disk templates and storage types.
Support big-step shrinking in tiered allocation
In tiered allocation, if by shrinking only a single resource a valid allocation can be found, shrinking is bound to shrink on this resource. Of course, after shrinking that resource a little bit without finding...
For node queries allow short forms of host names
For node queries use the host-name filter instead of the simple equality-based one.
Provide a special filter for host names
For host names, usually short forms are used, e.g., node1 or node1.sub instead of the fully qualified node1.sub.example.com. Therefore comparing node names only by equality is too restrictive. This patch provides an...
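One plausible way to express such a match is a dot-aware prefix test, sketched below. `hostMatches` is a hypothetical name, and the real filter may differ in details such as case handling:

```haskell
import Data.List (isPrefixOf)

-- | True if the given (possibly short) name equals the full name or
-- is a prefix of it that ends exactly at a domain-label boundary.
hostMatches :: String -> String -> Bool
hostMatches short full =
  short == full || (short ++ ".") `isPrefixOf` full
```

Appending the dot before the prefix test prevents `node` from matching `node1.sub.example.com`, while `node1` and `node1.sub` both match.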
Index instances by their UUID
No longer index instances by their name but by their UUID in the cluster config. This change touches large parts of the code, as the following adjustments were necessary:
* Change the index key to UUID in the configuration and the...
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
In tiered allocation, cut non-promising shrinking tries
The heuristic for tiered allocation has been improved in that it next shrinks a resource for which shrinking only that resource yields a valid allocation, if such a resource exists....
Merge branch 'stable-2.8' into 'master'
Revert "Storage utility functions for Haskell"
This reverts commit 88d27b8aa8adc2e5ced773909f1d40812c5a6ea7.
Revert "Make NodeInfo (hs) accept arbitrary storage types"
This reverts commit e89525a859b2e841c08fce506c0b68b97c7efe61.
Rename directory 'Block' to 'Storage'
This patch renames the 'Block' directory to 'Storage' in the Haskell code base. The same rename was done in the Python code base earlier this quarter. We generalize the name, because we needed a place for general storage...
Improve hspace shrinking strategy
In tiered allocation, hspace next shrinks the resource of the instance that causes failure on the most nodes. While this is not a bad strategy in general, it can lead hspace into a dead end if for a large number of nodes a particular resource blocks any further allocation of...
Convenience function for iterating while the result is Ok
For a function f :: a -> GenericResult a, iterate it (in the sense of the monad) until the result is Bad; return the list of values that occurred.
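A possible shape for such an iterator is sketched below, using a locally defined two-parameter `GenericResult`. Whether the starting value is part of the returned list is an assumption made here; the commit message does not say:

```haskell
-- Local stand-in for the result type (assumption).
data GenericResult e a = Bad e | Ok a

-- | Apply f repeatedly, feeding each Ok value back in, and collect
-- the values seen until f returns Bad. The starting value is
-- included in the output (an assumption of this sketch).
iterateOk :: (a -> GenericResult e a) -> a -> [a]
iterateOk f x = x : case f x of
  Ok y  -> iterateOk f y
  Bad _ -> []
```

Because the list is produced lazily, a caller can consume only a prefix even when f never fails.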
Provide witness for the sum-type structure of GenericResult
GenericResult, while rightfully a type of its own, is isomorphic to Either. So, also provide the case analysis function (i.e., the universal arrow out of the sum).
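The case analysis function and the isomorphism it witnesses can be sketched as follows. The type is redefined locally for self-containment, and `toEither` and `fromEither` are illustrative names added for this example:

```haskell
-- Local stand-in for the result type (assumption).
data GenericResult a b = Bad a | Ok b
  deriving (Eq, Show)

-- | Case analysis on GenericResult, analogous to `either`.
genericResult :: (a -> c) -> (b -> c) -> GenericResult a b -> c
genericResult f _ (Bad a) = f a
genericResult _ g (Ok b)  = g b

-- | The two directions of the isomorphism with Either.
toEither :: GenericResult a b -> Either a b
toEither = genericResult Left Right

fromEither :: Either a b -> GenericResult a b
fromEither = either Bad Ok
```

Composing `fromEither` with `toEither` in either order gives the identity, which is what makes `genericResult` play the same role for this type that `either` plays for `Either`.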