Use the new MVarLock in the job queue and the query server
A small refactoring was done in the handling of ArchiveJob so that it was possible to use 'withLock'.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Don't log anything during forking a job process
As it seems that using stderr by both the master process and child processes could be a cause of forking problems, logging is now deferred until the forking handshake is over and the master and child process are...
Add 'install_image' param to 'Cluster'
The 'Cluster.install_image' param holds the location of the image to be used for the safe installation instances.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Remove all references to the masterd socket
...as masterd is no more.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Make luxid activate the master IP on startup
This is the last task currently done by masterd, so by making luxid take this over, we can get rid of masterd.
Use getMasterOrCandidates
...instead of replicating the functionality on the fly.
Add the compression tools parameter
This patch makes the myriad of changes necessary for the compression tool parameter to be added. The filtering of compression tools for suspicious entries has been added for this exact purpose.
Signed-off-by: Hrvoje Ribicic <riba@google.com>...
Use 'getInstDisks' function to retrieve the disks
Change Haskell's Query code to use Config's 'getInstDisks' function in order to retrieve the instance's disks.
Signed-off-by: Ilias Tsitsimpis <iliastsi@grnet.gr>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Retry forking a new process several times
Apparently due to some library bug, forking sometimes fails: The new process is running, but it doesn't start executing. Therefore we retry the attempt several times.
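The retry idea described above can be sketched as follows. This is only an illustration in Python, not the code from the commit: the names `ForkError` and `fork_with_retries`, and the convention that a failed attempt raises an exception, are assumptions made for the example.

```python
import time


class ForkError(Exception):
    """Hypothetical error: the child was created but never started executing."""


def fork_with_retries(try_fork, attempts=3, delay=0.1):
    """Call try_fork() up to `attempts` times and return its first result.

    try_fork is assumed to raise ForkError when the child does not
    respond; only that exception is treated as retryable.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return try_fork()
        except ForkError as err:
            last_error = err
            # Pause before the next attempt, but not after the last one.
            if attempt < attempts:
                time.sleep(delay)
    raise ForkError("giving up after %d attempts: %s" % (attempts, last_error))
```

Bounding the number of attempts keeps a persistent failure visible instead of looping forever.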
Signed-off-by: Petr Pudlak <pudlak@google.com>...
Pass the debug level to forked jobs
When forking off jobs, make them inherit the debug level of the parent process (i.e., of luxid). In this way, we can debug jobs in test clusters without cluttering production logs. We pass the debug level through the environment instead...
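Passing a debug level through the environment can be sketched like this. The variable name `EXAMPLE_DEBUG_LEVEL` and both helper functions are hypothetical and chosen for illustration only; the commit does not name the variable it uses.

```python
import os
import subprocess

# Hypothetical variable name, used only in this sketch.
DEBUG_ENV_VAR = "EXAMPLE_DEBUG_LEVEL"


def child_environment(debug_level, base_env=None):
    """Return a copy of the environment with the debug level added."""
    env = dict(os.environ if base_env is None else base_env)
    env[DEBUG_ENV_VAR] = str(debug_level)
    return env


def read_debug_level(environ=None):
    """Read the debug level in the child; default to 0 when it is unset."""
    environ = os.environ if environ is None else environ
    return int(environ.get(DEBUG_ENV_VAR, "0"))


def spawn_job(argv, debug_level):
    """Start a child process that inherits the parent's debug level."""
    return subprocess.Popen(argv, env=child_environment(debug_level))
```

The parent would call `spawn_job`, and the child would call `read_debug_level()` once at startup.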
Add VTypeFloat
...in order not to have to declare floating point values as VTypeInt and rely on the sloppiness of the JSON specification to not distinguish between integers and floating point numbers.
When checking job death, check if its lock is the Luxi lock
In this case, the call trying to acquire a shared lock always succeeds, because the daemon already has an exclusive lock, which falsely reports that the job has died.
Cancel jobs by sending SIGTERM
We can only send the signal if the job is alive and if there is a process ID in the job file (which means that the signal handler has been installed). If it's missing, we need to wait and retry.
In addition, after we send the signal, we wait for the job to actually...
When forking a job, close all unnecessary file descriptors
This is a bit problematic as there is no portable way to list all open file descriptors, and we can't track them all, because they're also opened by third party libraries such as inotify. Therefore we use...
When starting the Luxi daemon, check if it's able to fork
If a Haskell program is compiled with -threaded, then inheriting open file descriptors doesn't work, which breaks our job death detection mechanism. (And on older GHC versions even forking doesn't work.)...
Make luxid aware of SIGCHLD
As luxid forks off processes now, it may receive SIGCHLD signals. Hence add a handler for this. Since we obtain the success of the child from the job file, ignoring is good enough.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Add Haskell and Python modules for running jobs as processes
They will be used by Luxi daemon to spawn jobs as separate processes.
The communication protocol between the Luxi daemon and a spawned process is described in the documentation of module Ganeti.Query.Exec....
Add a utility function for writing and replicating a job
Use the function where appropriate.
Also, the handling of CancelJob is slightly refactored to use ResultT, which is used by the new function.
Extend 'lockFile' to return the file descriptor
.. of the locked file so that it can be closed later, if needed.
Report non-existent jobs as such
When WaitForJobChange is queried for a non-existent job, report this as an error.
Add the zeroing-image option
This patch adds the zeroing-image option to gnt-cluster and the OpBackupExport params. The many changes are all minor, yet necessary.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Switch to ClientType as identifier
...instead of Either String JobId.
Make LuxiD query WConfD for locks
Since WConfD is now the authoritative source for locks, make LuxiD query this daemon for lock information rather than the master daemon.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Make locks field use live data
So far, the description of the locks fields was made under the assumption that lock queries wouldn't be answered by Luxid anyway, and hence it was enough to parse such requests. However, now luxid will answer these queries after getting a snapshot of the locks status from wconfd. Hence make the fields...
Add Foldable/Traversable instances for GenericContainer
This makes working with it easier as it allows use of many standard functions.
Merge branch 'stable-2.11' into master
Show mac prefix setting in gnt-cluster info
Include mac-prefix setting in the output of 'gnt-cluster info' command.
This fixes part of issue 239.
Signed-off-by: Dimitris Bliablias <bl.dimitris@gmail.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Have SubmitManyJobs add entries to the reason trail
Not only SubmitJobToDrainedQueue (and therefore SubmitJob) but also SubmitManyJobs has to add "gnt:opcode:*" entries to the reason trail.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Have LuxiD add the "gnt:opcode" reason trail entry
The entry used to be added in jqueue.py, but after switching the queue management from masterd to luxid it had been lost. Now, make LuxiD responsible for adding it.
Signed-off-by: Michele Tartara <mtartara@google.com>...
Add query support for locks to luxid
While requests only get forwarded, it still helps to get luxid feature-complete with respect to master.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Cherry-picked from commit a6e406ce376453e90e598c7be68809d6a7bd7d41...
Provide fields for lock queries
For luxid to be feature-complete with respect to masterd, it also needs to answer requests about locks. This includes knowing the fields available for locks.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>...
Allow clients of UDSServer to use different monads
.. as long as they're instances of "MonadBaseControl IO" and "MonadLog". This allows the UDSServer to call functions like "fork" within monads such as "ResultT e IO" or "ReaderT IO".
Add 'instance_communication_parameter' to 'Cluster'
Conflicts:
lib/client/gnt_node.py: trivial
src/Ganeti/Query/Query.hs: import ALL the functions
Consider job-IDs queried for twice only once
As reading jobs from disk is an expensive operation, when querying for jobs, we optimize by considering which values the job-id is asked for in the filter. As any reasonable person would not add the same clause twice in an Or-clause, the implicit assumption was that the...
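A minimal sketch of the underlying idea, written in Python rather than the Haskell of the actual change: collapse repeated job IDs before any job is read from disk, so that each job is loaded at most once. The function name is hypothetical.

```python
def unique_job_ids(job_ids):
    """Return job_ids without duplicates, keeping first-occurrence order."""
    seen = set()
    result = []
    for job_id in job_ids:
        if job_id not in seen:
            seen.add(job_id)
            result.append(job_id)
    return result
```

Keeping the original order means the query result does not change shape for callers that never repeated an ID.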
Merge branch 'stable-2.10' into stable-2.11
Merge branch 'stable-2.9' into stable-2.10
Allow classic queries to use either names or UUIDs
When UUIDs are used in CLI commands, such addressing of objects fails or succeeds inconsistently across object types. Worse yet, some calls do not fail, but simply return no result. This is due to the way the...
Remove wildcard luxi operation matching in luxid
In that way, we explicitly name the operations that are not handled by luxid and explain the reason. In particular, we can be sure that newly added luxid operations won't be forgotten in luxid.
Implement QueryExports in luxid
...by handling as a classical query, using that queries for export are already implemented. Note that QueryExport is slightly different from other Query* Luxi requests, in that the fields are not passed with the request, but have a fixed value....
Implement ChangeJobPriority in luxid
For jobs still queued, we ask the queue to change the priority, and replicate the changed job. For jobs that have already been started, we have to contact the job directly, which, at the moment, means forwarding the request to masterd....
Use new error handling functions in SubmitJobToDrainedQueue
This somewhat shortens and simplifies the code.
Use new error functions when querying locks
This helps to handle errors coming from the Luxi client.
Use new error handling functions for querying jobs
Since we already touched getJobIDs, and this function is already based on ResultT, use new error functions here as well.
Update getDirJobIDs to use ResultT
Also simplify code and remove unused functions.
Rename 'resultT' to 'toError'
.. to better correspond to its generalized type.
Make safeRenameFile create dirs with defined permissions
If, and only if, safeRenameFile creates a new directory, make sure it has well defined permissions. While there, also optimize for the common case. The main use of safeRenameFile is archiving jobs. As...
Enable network tags in Haskell code
Prior to the creation of the 2.10 branch, network tags were broken, and the Haskell code introduced there mistakenly accepted this as the desired functionality. This patch fixes this in a very simple way.
Implement auto-archiving of jobs
As luxid is taking over the handling of the job queue, it also needs to handle the automated archiving of jobs. Here we replicate the semantics of the current python implementation of archiving as many jobs older than the given time as possible,...
Make ArchiveJob in luxid create the archive, if necessary
As jobs are archived in groups of 10000, creating new subdirectories of the archive might be necessary when archiving a job. Use a function that takes care of this.
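The grouping can be illustrated with a short Python sketch. Only the group size of 10000 comes from the description above; the "job-<id>" file naming, the use of the integer quotient as the subdirectory name, and the function names are assumptions made for the example.

```python
import os

# Group size taken from the commit message above.
JOBS_PER_ARCHIVE_DIRECTORY = 10000


def archive_subdir(job_id):
    """Name of the archive subdirectory for a job (assumed: integer quotient)."""
    return str(job_id // JOBS_PER_ARCHIVE_DIRECTORY)


def archive_job(queue_dir, archive_root, job_id):
    """Move a job file into its archive subdirectory, creating it if needed."""
    name = "job-%d" % job_id  # assumed file naming, for illustration only
    target_dir = os.path.join(archive_root, archive_subdir(job_id))
    # exist_ok makes the common case (directory already there) a no-op.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, name)
    os.rename(os.path.join(queue_dir, name), target)
    return target
```

With this layout, a new subdirectory is only needed when a job ID crosses a multiple of 10000.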
fix off-by-one error in indentation
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Implement ArchiveJob queries in luxid
With luxid taking over the tasks of masterd, archiving jobs also belongs to its responsibilities. As archiving a job affects the global state of the job queue, synchronise over the queue lock.
Make use of fieldListToFieldMap
...to avoid duplicating that code all over Ganeti.Query.
Provide a utility function to map FieldList to FieldMap
As the same construction is used in several places, it is better to have it factored out as a named function.
Remove dead Ganeti.Query.Job.loadRuntimeData
This function was exported from the module, but actually never used anywhere in the code base. So clean it up.
luxid: fix detection of master node in node query
Ganeti.Config.getNodeRole would rely on clusterMasterNode returning the master node name, however clusterMasterNode returns the master node's UUID. We fix this and a similar issue in Ganeti.Query.Node.nodeFields....
Update set_watcher_pause to use ClockTime instead of Double
This only affects the internal representation in the Haskell part.
Make configuration available to the scheduler
In this way, scheduling decisions can depend on the configuration of the cluster. At the moment, this is only the maximal number of jobs to be run in parallel, but in the future this will also include job filters....
Make max_running_jobs queryable
As we have introduced a new cluster parameter, it should be also visible when querying about the cluster configuration.
Use ClockTime instead of Double in fields in Objects.hs
This affects "mtime" and "ctime" fields in all data types.
This also forces explicit declaration of how the fields are serialized in Query.
Implement job cancellation in luxid
As luxid handles the job queue, this daemon is the natural place to handle job cancellation. Answering to CancelJob requests is also necessary for luxid to be feature compliant with masterd, even for command-line requests only....
User shutdown hypervisor parameter
Add user shutdown parameter for KVM. Based on this parameter, decide what information to report for a KVM instance, for example, distinguish between 'ADMIN_down' and 'USER_down'.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Fix whitespace
Fix whitespace in several modules.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Also consider filter fields for deciding if using live data
If the query fields don't require live data, we use the shortcut and don't request live data. However, we cannot take this shortcut if the fields the filter depends on require live data.
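The decision described above can be sketched in Python as a check over both the requested fields and the fields mentioned in the filter. The nested-list filter shape and the function names are hypothetical; they only serve to make the example self-contained.

```python
def filter_fields(flt):
    """Collect the field names referenced by a nested filter expression.

    Assumed (hypothetical) filter shape: ["&", f1, f2, ...], ["|", ...],
    ["!", f], or a comparison [op, field_name, value].
    """
    op = flt[0]
    if op in ("&", "|", "!"):
        fields = set()
        for sub in flt[1:]:
            fields |= filter_fields(sub)
        return fields
    return {flt[1]}


def needs_live_data(query_fields, flt, live_fields):
    """True if any requested field or any filter field requires live data."""
    wanted = set(query_fields)
    if flt is not None:
        wanted |= filter_fields(flt)
    return bool(wanted & set(live_fields))
```

Looking only at `query_fields` would wrongly take the shortcut for a query that selects static fields but filters on a live one.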
Make luxid handle SetDrainFlag
Make luxid also handle queries to drain the job queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix sign in drain_flag request
The drain flag is set, if the queue is not open.
ssconf: Add Gluster mount directory
This commit adds the gluster storage directory to ssconf (without actually using its value just yet).
Signed-off-by: Santi Raffa <rsanti@google.com>
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
Implement fields query for instance
Support the query for the fields available for instances.
Remove the hvsGlobals from instance query fields
...to be consistent with the python implementation.
When interpreting [] as "all fields", sort nicely
When asked for all fields, we promise to return the list of fields sorted according to niceSort. Keep this promise.
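As an illustration of the kind of ordering meant here, the following Python sketch sorts names so that embedded numbers compare numerically. It assumes that this is what niceSort does; it is not the niceSort implementation itself, and the function names are hypothetical.

```python
import re

# Splitting on a captured digit group puts text at even positions and
# digit runs at odd positions of the resulting list.
_DIGITS = re.compile(r"(\d+)")


def nice_sort_key(name):
    """Sort key that compares digit runs as integers and the rest as text."""
    parts = _DIGITS.split(name)
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]


def nice_sorted(names):
    """Return names in natural order, e.g. 'disk2' before 'disk10'."""
    return sorted(names, key=nice_sort_key)
```

Because text and numbers always sit at matching positions in the key, two keys never compare an integer with a string.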
Merge branch 'stable-2.8' into stable-2.9
Fix gnt-network list-tags
Define network tags in haskell part.
This fixes issue 641.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Handle QueryConfigValues
Make luxid handle the QueryConfigValues call providing certain simple status information about the cluster.
Add a predicate for watcher pause
Add a predicate, in IO, to test whether the watcher is paused.
Implement SetWatcherPause in luxid
Make luxid handle SetWatcherPause correctly.
Move the generalized IO client from Luxi to UDSServer
No code is changed in this patch (except imports and qualifiers), only moved.
Generalize the IO client handling in Luxi
... to be usable for WConfd as well. A daemon handler is encapsulated into a `Handler` data type, which is then passed to a generic `listener`.
The changes are done in Luxi.hs so that the differences are visible and...
Add the Unix domain socket path to the Server data type
This simplifies code for closing such a socket.
Encapsulate a server socket and its parameters
Instead of passing a bare server socket around, we pass it encapsulated in a data type together with parameters such as read/write timeouts.
Rename getClient/Server to getLuxiClient/Server
Later they will be split into LUXI-specific and general parts.
Make luxid support WaitForJobChange
Make luxid support WaitForJobChange, waiting for a job to change on certain monitored fields.
Make luxid use the JQScheduler
Make luxid use the job scheduler instead of immediately starting every received job.
Rename enqueueJobs to startJobs
This reflects better what the method actually does. Later, we will add a job scheduler that will provide a proper enqueue method.
Add default_iallocator_params cluster parameter
Add a cluster parameter to hold the iallocator parameters used by the default instance allocator. Implement the option to modify config.data, query config.data and upgrade man pages, tests and cfgupgrade tool. The new default_iallocator_params is...
Set the received time stamp for new jobs
Since luxid now handles the job submission requests, it is also its responsibility to set the received time stamps. Do this.
Fix retrieval of number of instances of a node
This patch fixes a FIXME to make the retrieval of the number of primary and secondary instances share more common code.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Use hypervisor / storage information only when requested
So far, the node queries ignored the list of fields and just requested all available information from the backend. That means, for example if only hypervisor information is requested, still the storage space calculation is...
Make luxid job submission be defined by replication
When receiving jobs to be submitted, make luxid replicate them to all master candidates and then return. The actual execution can be handled asynchronously.
Rename LuxiSocket to MasterSocket
Rename the constants naming the socket used to connect to masterd, as the name LuxiSocket hints at luxid, which is different from masterd.
Implement 'QueryInstances' call in Haskell luxi server
While the command line uses the generic 'Query' call, rapi calls 'QueryInstances'. 'QueryInstances' so far was not fully implemented in the Haskell implementation of the luxi server. This was discovered when trying to...
Fix bug regarding node UUID in haskell node queries
When moving from python to haskell node queries, a bug was discovered where a node's UUID was mistakenly compared to a node's name. This indirectly caused the cluster epo operation to fail, because it was not...
Add the aggregate NIC VLAN instance field
Allow the retrieval of the VLANs of all the NICs through nic.vlans.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>