Add a scheduler to keep track of the job queue
To allow informed decisions on when to start a job, luxid needs to keep track of the (active part of the) job queue. Add a scheduler, similar to the config reader, that does this, but also schedules jobs to be executed. At the...
Move FStat related function to Utils
In this way, the functions that decide, based on fstat, whether a file needs to be reloaded can be used by other parts as well, in particular to monitor progress in the job queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Rename enqueueJobs to startJobs
This reflects better what the method actually does. Later, we will add a job scheduler that will provide a proper enqueue method.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Add default_iallocator_params cluster parameter
Add a cluster parameter to hold the iallocator parameters used by the default instance allocator. Implement the option to modify config.data, query config.data and upgrade man pages, tests and cfgupgrade tool. The new default_iallocator_params is...
Modify --mond to yes|no option
Modify the --mond option used by hail, hbal and hinfo from a non-argument option to a yes|no option.
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Activate QA for rapi queries via luxi
This patch enables QA testing for rapi queries for the newly transformed queries from python to haskell (groups, instances, nodes, export, and networks). So far, the QA did not distinguish between resources that cannot be...
Set the received time stamp for new jobs
Since luxid now handles the job submission requests, it is also its responsibility to set the received time stamps. Do this.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Provide a function to set the received time stamp of a job
This is the pure function for changing the received time stamp; obtaining the actual time stamp has to be done in IO.
Document the jobqueue timestamp format
...and also provide a method to get the current time in that format.
Fix removal of duplicates
Commit ede6df3d02 introduced a bug in the node queries where disk templates were paired up wrongly with their storage unit keys due to removal of duplicates at the wrong place. This patch fixes it.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Fix retrieval of number of instances of a node
This patch fixes a FIXME to make the retrieval of the number of primary and secondary instances share more common code.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Use hypervisor / storage information only when requested
So far, the node queries ignored the list of fields and just requested all available information from the backend. That means, for example if only hypervisor information is requested, still the storage space calculation is...
Remove duplicate storage units in node query
This is a little performance tweak for the node queries. So far, the query code mapped the disk templates to storage units. It could happen that two disk templates were mapped to the same storage unit and therefore the storage space...
Make luxid job submission be defined by replication
When receiving jobs to be submitted, make luxid replicate them to all master candidates and then return. The actual execution can be handled asynchronously.
Add function to enqueue jobs
Add a function that ensures that a given set of jobs gets executed at the appropriate time. At the moment, this is still the simple mechanism of handing over everything to masterd; but even at this stage, it has the benefit of allowing the removal of code duplication in...
Add a function justBad to filter the Bad value of a list
In the same way as justOk allows filtering the Ok values, add justBad to filter the Bad values. While there, simplify the definition of justOk.
Add wrapper to replicate many jobs
Add a convenience wrapper around replicateJob to replicate many jobs to the master candidates.
Add function to replicate a job to the master candidates
As luxid will be handling the job queue soon, add a utility to replicate jobs to all master candidates. Also log errors.
Compress JobqueueUpdate RPCs
Noded understands compressed RPCs for updating files in the (replicated) job queue. Make use of this feature.
Release internal lock for serial file later
When allocating new jobs, the new serial is replicated among all master candidates. To avoid races with a later job id allocation, keep the internal lock till after the replication attempt.
Rename LuxiSocket to MasterSocket
Rename the constants naming the socket used to connect to masterd, as the name LuxiSocket hints at luxid, which is different from masterd.
Instance queries: remove opcodes and LU
Removes the remains of the instance queries.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Export and network queries: remove opcodes and LUs
Removes the remains of the export (aka backup) and network queries.
Group queries: remove opcodes and LUs
Removes the remains of the group query code.
Node queries: remove opcodes and LUs
Removes the remains of the node query code.
Remove instance query python code
This patch removes the python code for the instance queries. So far, it replaces it by 'NotImplemented' exceptions. In a later patch of this series, the remaining part is removed completely.
Switch to Haskell for group queries
This patch removes the group query implementation in python in order to use the new Haskell implementation.
Switch to haskell for export (aka backup) queries
This patch removes the python implementation of export (aka backup) queries. So far, it is replaced by 'NotImplemented' exceptions, but later in this series it will be replaced completely.
Switch to Haskell for network queries
This patch removes the python implementation of network queries and replaces it with 'NotImplemented' exceptions. It will be removed completely once all queries are switched to Haskell.
masterd: implement query via luxi
The master daemon so far still did queries via the python implementation. This patch makes it use the haskell implementation and removes the node queries from the list of OP-queriable entities.
Implement 'QueryInstances' call in Haskell luxi server
While the command line uses the generic 'Query' call, rapi calls 'QueryInstances'. 'QueryInstances' so far was not fully implemented in the Haskell implementation of the luxi server. This was discovered when trying to...
Fix bug regarding node UUID in haskell node queries
When moving from python to haskell node queries, a bug was discovered where a node's UUID was mistakenly compared to a node's name. This indirectly caused the cluster epo operation to fail, because it was not...
Remove --enable-split-query option
Switching from python to haskell queries, this patch removes the option to dis/enable the haskell queries at configure time.
hsqueeze: fix position of option in gnt-node power
hsqueeze can produce a shell script with the commands to squeeze the cluster; in the script, fix the position of the '-f' option in the 'gnt-node power' command.
Add missing spindles parameter to idisk
When spindles were added to Ganeti, apparently it was forgotten to add the parameter to the Haskell data type as well. Do this now.
Allow the NIC VLAN to be set to an empty string
The NIC VLAN has previously not been modified via Haskell, causing the INicParams class not to be used. With the recent job queue refactorings, a modification definition is recorded, and for an empty string (which is a legal default value) a crash happens. This patch...
Add the aggregate NIC VLAN instance field
Allow the retrieval of the VLANs of all the NICs through nic.vlans.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
Add NIC VLAN field retrieval to Haskell queries
The field was added to Python queries in an earlier version, and now has to be added to the Haskell queries as well.
hsqueeze: add option to show or save commands
Add an option to hsqueeze to show, or save in a file, the commands that have to be carried out.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
hsqueeze: when balancing also keep the move sequence
In hsqueeze, when computing the balancing sequence, also remember the sequence of moves that lead there.
Add function to get the moves between two configurations
Add a function that, given two adjacent cluster configurations of a balancing sequence, computes the moves that led from the first to the second configuration.
In the list of involved nodes, drop "no secondary"
When grouping moves into jobs, a new job set is started if the new move involves a node also touched by a previous move. When computing the list of involved nodes, the new primary and secondary nodes of the...
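The grouping rule stated above can be illustrated with a small sketch. It is written in Python for brevity and is not the htools code: the function name `group_moves` and the representation of a move as a collection of node names are assumptions made for this example, and how a missing secondary node is encoded is deliberately left out.

```python
def group_moves(moves):
    """Split a sequence of moves into job sets.

    Each move is given as the collection of node names it involves
    (a hypothetical representation chosen for this sketch). A new job
    set is started whenever a move touches a node that is already
    used by an earlier move in the current set.
    """
    jobsets = []
    current = []
    touched = set()
    for nodes in moves:
        nodes = set(nodes)
        if current and touched & nodes:
            # Conflict with the current set: close it and start a new one.
            jobsets.append(current)
            current = []
            touched = set()
        current.append(sorted(nodes))
        touched |= nodes
    if current:
        jobsets.append(current)
    return jobsets
```

With the moves `[("a", "b"), ("c", "d"), ("b", "e")]`, the first two moves share no node and end up in one job set, while the third touches node `b` again and therefore opens a second set.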
Move saving of a command list to CLI
Move the function that saves a list of commands in a file to CLI.hs. In this way, it is reusable by other htools.
Merge branch 'stable-2.10' into master
Add NodeGroup to InstanceConsoleInfoParams
Before, calls to `gnt-instance list -o console` with an instance on a node with a custom SSH port failed because of missing group configuration. This patch fixes the problem.
Signed-off-by: Petr Pudlak <pudlak@google.com>...
Add "ndp/ssh_port" node group configuration parameter
The parameter is added to Haskell sources, from which the corresponding Python code is generated.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
group queries: test niceSort and remove FIXME
In an effort to get rid of the python queries soon, this patch fixes a FIXME of the group queries regarding the missing testing of niceSort in this context. Due to the lack of actually weirdly named hostnames, this patch...
Don't allow optional node parameters
Ganeti does not support optional fields in parameters (hypervisor-params, disk-params, etc.). OpenVSwitch related node parameters were the exception to this rule, which caused numerous problems related to import/export and (de-)serialization....
Haskell instance queries report 'USER_down'
Update instance queries on the Haskell codebase to report 'USER_down', similarly to the Python instance queries.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Add instance state 'USER_down'
Add instance state 'USER_down', a state used in reporting only; it represents the situation in which the user has shut down the instance but Ganeti's configuration still has this instance marked as 'ADMIN_up'.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Add Haskell hypervisor instance state
Add 'InstanceState' datatype which is the Haskell counterpart of the Python type 'HvInstanceState'.
Add missing Constructor for SetParamsMods
Disks and nics can not only be addressed by indices, but also by name. Hence add a constructor for this case as well, to be faithful to the python world.
Ignore hlint warning "Error: Too strict if" in Server.hs
A previous patch [229da00] added an annotation for ignoring the warning, but to the middle of a function, which doesn't compile. This patch moves the annotation to the end of the function to correct the problem....
This warning appears only in newer versions of hlint (mine was v1.8.43),and in this case it's reported incorrectly. The arguments to "showJSON" have different types, therefore it's not possible to move "showJSON" in...
Make luxid handle SubmitManyJobs
Handle this request by writing the jobs to the queue and informing masterd; masterd will then also distribute the jobs to all master candidates.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Make luxid handle SubmitJob
As luxid is to take over responsibility for the job queue, handle this request by writing the job to the queue and then informing masterd; masterd will also distribute the job to all master candidates.
Add the predicate of the queue being open
Adding jobs to the queue is only allowed if the queue is not drained.
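As a rough illustration of such a predicate, here is a minimal Python sketch. It rests on one assumption that is not spelled out in this entry: that the queue counts as drained exactly when the queue drain file exists on disk. The function name `is_queue_open` and the path argument are hypothetical.

```python
import os


def is_queue_open(drain_file):
    """Return True if new jobs may be added to the queue.

    Assumption of this sketch: the queue is drained exactly when the
    drain file at the given (hypothetical) path exists.
    """
    return not os.path.exists(drain_file)
```

A submission handler would check this predicate first and reject the request while the queue is drained.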
Provide path to the queue drain file
Since luxid is going to write to the job queue, it needs to honor drains of the queue as well.
Add Luxi Request to pick up a job in the queue
During the transition to the new daemon layout, from step 2 onwards, luxid will write to the queue but masterd will trigger the execution. Therefore, add a new luxi request to tell masterd to pick up a job that has already been written to the queue....
Move Haskell constants to proper module
Move Haskell constants from module 'Ganeti.HsConstants', which was a transitional module part of the Haskell to Python constant generation infrastructure, to module 'Ganeti.Constants'.
Tear down Py2Hs constant infrastructure
Tear down the Python to Haskell constant conversion infrastructure, which includes eliminating the autotool 'convert-constants' and the Haskell module 'Ganeti.PyConstants', which held the converted constants.
Provide means of locking a file
To avoid two processes simultaneously accessing the same on-file structure, like the job queue, file locks are used. Therefore, provide this functionality in Haskell as well.
Provide path to the queue lock file
To avoid several processes accessing the queue at the same time, Ganeti locks the queue via a lock file on disk. Provide the path to this file.
Provide a function to write jobs to disk
This function writes a (non-archived) job to disk. The filename can be computed from the job id, which is part of the job.
Hs2Py constants: additional module jstore
Add constants from additional modules ('ganeti.jstore') to the Haskell to Python constant generation.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Hs2Py constants: additional module errors
Add constants from module 'ganeti.errors' to the Haskell to Python constant generation.
Provide means to allocate new job ids
Add utility functions to allocate new job ids by increasing the value stored in the serial file. As this function is used in a multi-threaded program, synchronize access over an internal lock.
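The idea of this entry can be sketched in a few lines of Python. This is an illustration only, not the Haskell code of the commit: the function name `allocate_job_ids`, the plain-text integer format of the serial file and the temporary file suffix are assumptions made for the example, and replication to master candidates is omitted.

```python
import os
import threading

# Process-internal lock serialising access to the serial file.
_serial_lock = threading.Lock()


def allocate_job_ids(serial_file, count):
    """Reserve `count` new job ids by advancing the stored serial.

    Assumption of this sketch: the serial file holds a single decimal
    integer, the highest id handed out so far.
    """
    with _serial_lock:
        with open(serial_file) as f:
            serial = int(f.read().strip())
        new_serial = serial + count
        # Write the new value to a temporary file and rename it over
        # the old one, so readers never see a partial write.
        tmp = serial_file + ".new"
        with open(tmp, "w") as f:
            f.write("%d\n" % new_serial)
        os.replace(tmp, serial_file)
        return list(range(serial + 1, new_serial + 1))
```

Holding the lock across the read and the write is what keeps two threads from handing out the same id.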
Add function to get master candidates from configuration
With luxi daemon taking over part of the job queue management, it will also be responsible for replicating the queue to all master candidates. Therefore, add a function to extract the list of master candidates from...
Support RPC asking to replicate part of the job queue
To be able to replicate the job queue, in particular the serial, luxid needs to be able to send the jobqueue_update RPC. So add its definition.
Provide function to obtain the unique element of a list
This version of 'the' properly lives in the 'Result' monad, as opposed to the traditional one calling 'error'. The reason why it is 'Bad' that not precisely one element is returned is given as argument....
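The behaviour described here can be mimicked in Python as follows. This is only an analogy to the Haskell function: modelling 'Result' as a tagged tuple and the name `the_unique` are assumptions of this sketch.

```python
def the_unique(reason, items):
    """Return ("Ok", x) if `items` holds exactly one element x.

    Otherwise return ("Bad", reason) instead of raising, where
    `reason` is the caller-supplied explanation. The tagged tuples
    stand in for a Result type in this sketch.
    """
    items = list(items)
    if len(items) == 1:
        return ("Ok", items[0])
    return ("Bad", reason)
```

For example, `the_unique("expected exactly one master", ["node1"])` yields `("Ok", "node1")`, while an empty or longer list yields `("Bad", "expected exactly one master")`.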
Provide method to read job serial number
This method allows reading the maximal job serial number from disk.
Provide convenience function to create Job from op-codes
This function handles the pure part of generating a job, i.e., assuming the job id already assigned and not setting time stamps.
Add function to resolve dependencies in meta op code
When queueing many jobs, the dependencies between them need to be resolved with the knowledge of their respective job id. Lift the computation of the absolute dependency to the level of MetaOpCodes.
Add function to compute the absolute id of a dependency
SubmitManyJobs also accepts jobs with dependencies given as relative ids. Together with the absolute id of the job, once assigned, the dependency can be resolved. Add a function doing this computation....
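A minimal Python sketch of this computation follows. It assumes that a relative id is a negative integer counting back from the depending job (so -1 means the job submitted immediately before it) and that absolute ids are positive; both the encoding and the name `absolute_job_dep` are assumptions of this example rather than statements about the Haskell code.

```python
def absolute_job_dep(dep, job_id):
    """Resolve a dependency id against the id of the depending job.

    Assumption of this sketch: negative values are relative ids that
    count back from `job_id`; positive values are already absolute.
    """
    if dep < 0:
        resolved = job_id + dep
        if resolved <= 0:
            raise ValueError("relative dependency %d points before the "
                             "first job (job id %d)" % (dep, job_id))
        return resolved
    return dep
```

For a job that was assigned id 42, a dependency of -1 resolves to 41, while a dependency of 17 is returned unchanged.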
Provide a convenience method to obtain a QueuedOpCode
When generating jobs from sequences of op-codes, it is necessary to wrap op-codes into queued form.
Add utility function tryAndLogIOError
This function allows using 'IO a' objects in a safe way, using the 'try' function; the outcome is reported as a 'Result'. IOErrors are logged and the result is 'Bad', while in the case of no errors, a result-yielding...
Hs2Py constants: additional module qlang
Add constants from additional modules ('ganeti.qlang') to the Haskell to Python constant generation.
Provide utility to atomically write a file
To keep our on-file data consistent at any moment, we change file contents by atomically replacing the file with a new one.
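The replace-by-rename technique mentioned here can be sketched in Python as follows. This is an illustration, not the Haskell utility added by the commit; the name `atomic_write` is hypothetical, and the sketch relies on the rename being atomic when the temporary file lives in the same directory (and thus on the same file system) as the target.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` with `data` in one atomic step.

    The data is first written to a temporary file in the same
    directory, flushed to disk, and then renamed over the target, so
    readers see either the old or the new contents, never a mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Creating the temporary file next to the target is the important design choice: a rename across file systems would degrade to a copy and lose atomicity.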
Hs2Py constants: additional module luxi
Add constants from additional modules ('ganeti.luxi') to the Haskell to Python constant generation.
Fix reference to vcs version in query server
Fix reference to vcs version in query server to take its value from the Haskell constant in 'Ganeti.Version' instead of using the constant generated from Python.
Use configure constants instead of generated
Replace uses of the generated 'AF_INET*' constants with the constants in Haskell's 'AutoConf'.
Hs2Py constants: add 'UUID_REGEX'
Add constant 'UUID_REGEX' to the Haskell to Python constant generation.
When loading configuration fails, include the reason
Before, the message explaining why a failure happened (like a parsing error) was lost.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Fix getNodeRole
In the configuration, the master node is now given by its uuid. Therefore, compare the uuid and not the name to find out if a given node is the master.
Fix documentation
Fix documentation in constants containing values in seconds.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Hs2Py constants: 'hvsParameterTypes' and 'hvsParameters'
Add constants 'hvsParameters' and 'hvsParameterTypes' to the Haskell to Python constant generation.
hsqueeze: support planning for onlining nodes
If the amount of free resources falls below a given threshold, hsqueeze will suggest putting standby nodes back online until the minimum of free resources is reached, or all standby nodes are online.
Add a --minimal-resources option
Add a new option, to be used by hsqueeze, to specify the amount of free resources that has to be on each node, in order not to start onlining standby nodes. It is given as a multiple of the standard allocation, as specified by the instance policy....
Text Backend: correctly read data for offline nodes
With standby nodes, simply ignoring the specification of an offline node is not sufficient any more.
Change default for target resource to 2.0
The target resources, as used by hsqueeze, are supposed to be strictly higher than the minimal resources. However, keeping minimal resources of less than a single instance is not a useful reserve.
SimpleRetry on BlockDev.Remove()
Sometimes, upon disk removal, corresponding file descriptors are kept briefly open by various processes (hypervisor, blkid, etc.). With this patch, we retry several times before raising the appropriate error, thus making disk removal more robust against those corner cases....
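The retry pattern described in this entry looks roughly like the following Python sketch. It is a generic stand-in written for illustration and does not reproduce Ganeti's own retry utility: the helper name `retry`, its parameters and the choice of `OSError` as the retryable error are all assumptions.

```python
import time


def retry(fn, attempts=5, delay=0.1, retry_on=(OSError,)):
    """Call `fn` until it succeeds or `attempts` tries are used up.

    Only the exception types in `retry_on` trigger another try; the
    last failure is re-raised so the caller sees the real error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Give the other process a moment to release the device.
            time.sleep(delay)
```

A removal routine would be wrapped as `retry(lambda: remove_device(dev))`, where `remove_device` is a placeholder for the actual removal call.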
Add possibility to compress to OpInstanceCreate
OpInstanceCreate now supports the 'compress' option. It allows enabling compression during instance imports.
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Support import with compressed instance moves
Support compressing instance data while sending it to the target node on instance imports.
Add local compression to OpBackupExport
OpBackupExport is extended by a compress parameter. This parameter (either 'none' or 'gzip') controls if instance disks are compressed before being sent over the network to the destination node.
Signed-off-by: Thomas Thrainer <thomasth@google.com>...
Add possibility to compress to OpInstanceMove
OpInstanceMove now supports the 'compress' option. It allows enabling compression for intra-cluster instance moves.
Remove trailing whitespace
Remove trailing whitespace from OpCodes.hs and OpParams.hs.
Add hsqueeze planning for compression
Add a new htool, hsqueeze, for dynamic power management. This commit only implements the first useful part: planning (but not executing) taking nodes offline while still keeping within the resource limit.
Provide --target-resources option
Add a new option, to be used by hsqueeze, to specify the target free resources on each node. It is given as a multiple of the standard allocation, as specified in the instance policy.