Change return type of internal rmJob
...to also provide the job itself. In this way, the function can also be used for tasks that require temporarily removing a job from the queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Add a function changing the priority of an opcode
This pure function follows the semantics that an opcode, including its priority, may only be changed if the opcode is not finalized.
Add a function to change the priority of a job
...by changing the priority of the non-finished opcodes.
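The rule from the two commits above (finalized opcodes are left alone, and a job's priority is changed through its non-finished opcodes) can be sketched as follows. This is an illustrative Python sketch, not the Haskell code of the patch; the type names, field names, and the set of finalized states are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List

# Assumed set of "finalized" states; the real state names are not given in this log.
FINALIZED_STATES = {"success", "error", "canceled"}


@dataclass(frozen=True)
class QueuedOpCode:  # hypothetical stand-in for a queued opcode
    name: str
    status: str
    priority: int


@dataclass(frozen=True)
class QueuedJob:  # hypothetical stand-in for a queued job
    job_id: int
    ops: List[QueuedOpCode]


def change_opcode_priority(priority: int, op: QueuedOpCode) -> QueuedOpCode:
    """Return a copy of the opcode with the new priority, unless it is finalized."""
    if op.status in FINALIZED_STATES:
        return op
    return replace(op, priority=priority)


def change_job_priority(priority: int, job: QueuedJob) -> QueuedJob:
    """Change a job's priority by changing each of its non-finalized opcodes."""
    return replace(job, ops=[change_opcode_priority(priority, op) for op in job.ops])
```

Both functions are pure: they return new values and leave their arguments untouched.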
Add functions for manipulating errors in Result(T)
There is often a need to manipulate these errors, for example to convert a String from Result into an exception. These functions make this easier.
Function 'toErrorStr' lifts 'Result' to any 'MonadError'. This is useful...
Remove FromString in favor of Error from standard libraries
They have the very same functionality, and using our own FromString only causes unnecessary code duplication.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>...
Add Alternative instances for GenericResult and ResultT
This allows using Alternative-specific combinators, namely `optional`.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Cherry-pick of 78209a84b0f6be27fd381ac2...
Add andRestArguments to IDiskParams
In this way, we can pass through the opaque parameters required for disk creation and modification in the case of external storage.
Add function providing the canonical andRestArguments
The field catching the remaining fields will always be of the same shape, so add a function for this to make usage simple.
Add additional constructor AndRestArguments to OptionalType
A field of this type will capture all the remaining fields of an object as JSValues. Obviously, the intended use is to have precisely one such field. This mechanism will allow passing opaque values through, as it is, e.g., required for...
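The idea of a single field that captures "all the remaining fields" can be illustrated with a small Python sketch. This is not the Template Haskell mechanism of the patch; the function name and the example field names are assumptions chosen for the illustration.

```python
from typing import Any, Dict, Iterable, Tuple


def split_known_fields(
    obj: Dict[str, Any], known: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a decoded JSON object into declared fields and an opaque rest.

    The second component plays the role of the catch-all field: it holds
    every key that is not explicitly declared, with its value left untouched.
    """
    known_set = set(known)
    named = {k: v for k, v in obj.items() if k in known_set}
    rest = {k: v for k, v in obj.items() if k not in known_set}
    return named, rest
```

For example, `split_known_fields({"size": 1024, "vendor_opt": "x"}, ["size"])` keeps `size` as a declared field and passes `vendor_opt` through unchanged in the rest.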
Make safeRenameFile create dirs with defined permissions
If, and only if, safeRenameFile creates a new directory, make sure it has well-defined permissions. While there, also optimize for the common case. The main use of safeRenameFile is archiving jobs. As...
Add constant for subdir permissions within the job queue
When archiving jobs, new directories have to be created, as jobs are archived in groups of 10000. Add a constant describing the permissions of these newly created directories.
Note that, due to the type, the constant cannot be part...
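As an illustration of the grouping of archived jobs into directories of 10000, here is a small Python sketch. The group size comes from the commit message; the permission value, the `job-<id>` file name, and all function names are assumptions for the example, not values taken from the patch.

```python
import os

JOBS_PER_ARCHIVE_DIRECTORY = 10000  # group size stated in the commit message
ARCHIVE_SUBDIR_MODE = 0o750  # hypothetical value; the real constant is not shown here


def archive_subdir(job_id: int) -> str:
    """Name of the archive subdirectory that holds the given job."""
    return str(job_id // JOBS_PER_ARCHIVE_DIRECTORY)


def archived_job_path(archive_root: str, job_id: int) -> str:
    """Full path of an archived job file (the 'job-<id>' name is an assumption)."""
    return os.path.join(archive_root, archive_subdir(job_id), "job-%d" % job_id)
```

A new subdirectory is therefore needed whenever a job id crosses a multiple of 10000, which is where the permissions constant comes into play.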
Add utility to fix permissions
Especially when creating new directories, we need to make sure ownership and permissions are set correctly. Provide a function to do so.
Add data type describing permissions and possibly owners
When creating new files, and, more importantly, new directories, it is relevant to set permissions, and possibly owners, correctly. Provide a type specifying the target configuration.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Merge branch 'stable-2.10' into stable-2.11
Enable network tags in Haskell code
Prior to the creation of the 2.10 branch, network tags were broken, and the Haskell code introduced there mistakenly accepted this as the desired functionality. This patch fixes this in a very simple way.
Signed-off-by: Hrvoje Ribicic <riba@google.com>...
Merge branch 'stable-2.9' into stable-2.10
Add 'provider' to IDiskParams
IDISK_PROVIDER was included in Python's IDISK_PARAMS, so it should also be included in the Haskell code.
Now that luxid creates and enqueues jobs, without this patch the ExtStorage interface is broken as the user cannot pass the disk...
Disabling client certificate usage
This patch temporarily disables the usage of the client SSL certificates. The handling of RPC connections had a conceptual flaw, because the certificates lack a proper signature. For this, Ganeti needs to implement a CA,...
Implement auto-archiving of jobs
As luxid is taking over the handling of the job queue, it also needs to handle the automated archiving of jobs. Here we replicate the semantics of the current Python implementation of archiving as many jobs older than the given time as possible,...
Add a utility function to try archiving jobs
Provide a function that walks through a list of job ids and archives them if appropriate. Abort that process if a given timeout is reached.
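The walk-with-timeout described above might look roughly like the following Python sketch. The function name, the injected `try_archive` callback, the injectable clock, and the shape of the return value are all assumptions for illustration; the actual utility is written in Haskell.

```python
import time
from typing import Callable, Sequence, Tuple


def archive_some_jobs(
    job_ids: Sequence[int],
    try_archive: Callable[[int], bool],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, int]:
    """Walk through job ids, archiving where appropriate, until the timeout.

    Returns (number of jobs archived, number of jobs not looked at).
    `try_archive` is a hypothetical callback that archives one job if it is
    eligible and reports whether it did so.
    """
    deadline = clock() + timeout_seconds
    archived = 0
    for index, job_id in enumerate(job_ids):
        if clock() >= deadline:
            # Timeout reached: abort and report how many jobs remain unchecked.
            return archived, len(job_ids) - index
        if try_archive(job_id):
            archived += 1
    return archived, 0
```

Passing the clock as a parameter keeps the function easy to test without waiting for real time to pass.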
Support computation on Timestamp
As timestamps are also used to determine if an event is sufficiently long in the past (e.g., on archiving jobs), support adding a time interval to a Timestamp.
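Adding an interval to a timestamp can be sketched as below, under the assumption that a timestamp is a pair of whole seconds and microseconds. That representation and the function name are assumptions for the example and are not stated in this log.

```python
from typing import Tuple

Timestamp = Tuple[int, int]  # assumed representation: (seconds, microseconds)

MICROSECONDS_PER_SECOND = 1_000_000


def add_to_timestamp(timestamp: Timestamp, delta: Timestamp) -> Timestamp:
    """Add an interval to a timestamp, carrying microsecond overflow into seconds."""
    seconds, microseconds = timestamp
    delta_seconds, delta_microseconds = delta
    carry, microseconds_sum = divmod(
        microseconds + delta_microseconds, MICROSECONDS_PER_SECOND
    )
    return (seconds + delta_seconds + carry, microseconds_sum)
```

With this representation, "older than a given age" reduces to comparing `add_to_timestamp(event, age)` against the current time as ordinary tuples.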
Add constructor function for Timestamp
Provide means to get Ganeti's internal timestamps from standard clock time.
Add a predicate on Jobs on whether it can be archived
Jobs usually are archived a given time after they have finished. For finalized jobs without end-time, the start-time is taken in lieu. This function provides the pure predicate for this decision.
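A minimal Python sketch of such a predicate follows. The record type, its field names, the use of plain numbers for times, and the treatment of jobs with neither end-time nor start-time are assumptions made for the example; only the end-time/start-time fallback comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSummary:  # hypothetical reduced view of a job
    finalized: bool
    start_time: Optional[float] = None
    end_time: Optional[float] = None


def is_archivable(now: float, min_age: float, job: JobSummary) -> bool:
    """Decide whether a job finished at least `min_age` before `now`.

    For finalized jobs without an end time, the start time is used instead.
    """
    if not job.finalized:
        return False
    reference = job.end_time if job.end_time is not None else job.start_time
    if reference is None:
        # Assumption for this sketch: without any timestamp, do not archive.
        return False
    return reference + min_age <= now
```

Because the current time is passed in as an argument, the predicate stays pure and can be tested without touching the clock.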
Make ArchiveJob in luxid create the archive, if necessary
As jobs are archived in groups of 10000, creating new subdirectories of the archive might be necessary when archiving a job. Use a function that takes care of this.
Provide a safe version of rename
...that also creates the target directory, if needed.
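One way to write a rename that creates the target directory on demand is sketched below in Python. It tries the plain rename first and creates the directory only when that fails; the function name, the `dir_mode` parameter, and its default value are assumptions for the example, not the Haskell implementation.

```python
import os


def safe_rename_file(src: str, dst: str, dir_mode: int = 0o750) -> None:
    """Rename src to dst, creating dst's directory if it does not exist yet.

    The directory is only created, and its permissions only set, when the
    first rename attempt fails because the target directory is missing.
    """
    try:
        os.rename(src, dst)  # common case: the target directory already exists
    except FileNotFoundError:
        target_dir = os.path.dirname(dst)
        if os.path.isdir(target_dir):
            raise  # the directory exists, so the source itself is missing
        os.makedirs(target_dir)
        os.chmod(target_dir, dir_mode)  # hypothetical default permissions
        os.rename(src, dst)
```

Trying the rename first avoids an extra file system check in the usual case where the directory is already there.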
Fix expectation for the return value of jobqueue_rename
On success, jobqueue_rename returns a list containing one null per change request.
fix off-by-one error in indentation
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Implement ArchiveJob queries in luxid
With luxid taking over the tasks of masterd, archiving jobs also belongs to its responsibilities. As archiving a job affects the global state of the job queue, synchronise over the queue lock.
Add RPC call jobqueue_rename
Archiving jobs is also replicated to all master candidates. Therefore luxid needs to be aware of this RPC call.
luxid: fix detection of master node in node query
Ganeti.Config.getNodeRole would rely on clusterMasterNode returning the master node name; however, clusterMasterNode returns the master node's UUID. We fix this and a similar issue in Ganeti.Query.Node.nodeFields....
When updating job queue, support virtual paths
When replicating parts of the job queue, allow for virtual paths in the RPC call. In this way, replication will also work correctly in a vcluster setup. Note that makeVirtualPath lives in IO, and hence cannot be part of the pure encoding...
Add a module to support virtual clusters
Virtual clusters are an efficient way to test how Ganeti behaves on a large cluster without requiring a large number of machines. Now that more tasks like job replication are done by luxid, provide that functionality in Haskell as well....
Move vcluster-related constants to Constants.hs
...as, in that way, they will also be available in Haskell, where job replication happens as well.
Clean up luxidMaxRunningJobs
Now that the number of jobs maximally running in parallel is a run-time option, this magic constant is not needed any more.
Make the scheduler use the max_running_jobs config parameter
Use the run-time configuration to decide on the number of jobs scheduled for execution instead of using a hard-coded constant.
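The scheduling decision described above can be reduced to a small pure function, sketched here in Python. The function name and the list-based representation of running and queued jobs are assumptions for illustration; the real scheduler is part of the Haskell code.

```python
from typing import List, Sequence, Tuple


def select_jobs_to_run(
    max_running_jobs: int, running: Sequence[int], queued: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Pick as many queued jobs as there are free slots.

    `max_running_jobs` is read from the cluster configuration at run time
    rather than from a hard-coded constant. Returns (jobs to start, jobs
    that remain queued), preserving queue order.
    """
    free_slots = max(0, max_running_jobs - len(running))
    return list(queued[:free_slots]), list(queued[free_slots:])
```

If the configured limit is lowered below the number of jobs already running, the sketch simply starts nothing until enough jobs have finished.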
Make configuration available to the scheduler
In this way, scheduling decisions can depend on the configuration of the cluster. At the moment, this is only the maximal number of jobs to be run in parallel, but in the future this will also include job filters....
Make max_running_jobs queryable
As we have introduced a new cluster parameter, it should also be visible when querying the cluster configuration.
Add opcode parameter for the maximal number of running jobs
This parameter of OpClusterSetParams will allow setting the maximal number of jobs to be run simultaneously.
Add parameter max_running_jobs to the cluster configuration
This cluster-wide parameter will determine the maximal number of non-finalized jobs that should be in a non-queued state at the same time.
Implement job cancellation in luxid
As luxid handles the job queue, this daemon is the natural place to handle job cancellation. Answering CancelJob requests is also necessary for luxid to be feature compliant with masterd, even for command-line requests only....
Provide a function to compute the canceled version of a job
When a job gets canceled while still queued, dequeuing requires luxid to mark it as canceled. So provide the necessary pure function to do so.
Support canceling dequeued jobs
Even after jobs have been handed over for execution, it might still be possible to cancel them. One such case would be the job still waiting for a lock. Eventually, we will have to communicate with the job directly, but as long as execution is...
Add dequeuing to the job scheduler
This only removes queued jobs from the queue and indicates whether the job was found in the queue. For jobs that are already started from the queue's point of view, it might still be possible to cancel them, e.g., if they are still waiting for locks....
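The behaviour described above (remove a job from the queued part only, and report whether it was there) can be sketched in Python as follows. The function name and the representation of the queue as a list of job ids are assumptions for the example.

```python
from typing import List, Sequence, Tuple


def dequeue_job(job_id: int, queued: Sequence[int]) -> Tuple[bool, List[int]]:
    """Remove a job from the list of queued jobs.

    Returns (whether the job was found among the queued jobs, remaining
    queue). Jobs already handed over for execution are not touched here.
    """
    remaining = [queued_id for queued_id in queued if queued_id != job_id]
    return len(remaining) != len(queued), remaining
```

A caller can use the boolean to decide whether some other way of cancelling the job has to be attempted.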
Fix Kvmd imports for Ubuntu 13.04 64
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
User shutdown hypervisor parameter
Add user shutdown parameter for KVM. Based on this parameter, decide what information to report for a KVM instance, for example, distinguish between 'ADMIN_down' and 'USER_down'.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Add KVM daemon daemonize
Add KVM daemon entry point, command-line options, backgrounding, etc.
Add KVM daemon logic
Add KVM daemon logic, which contains monitors for Qmp sockets and directory/file watching.
Generalize and reuse Unix domain sockets
Refactor module 'Ganeti.UDSServer' so the KVM daemon can reuse code declared in this module to handle Unix domain sockets.
KVM daemon datatype, user and group
Fix whitespace
Fix whitespace in several modules.
Fix according to the Ganeti style guide
Also consider filter fields for deciding if using live data
If the query fields don't require live data, we use the shortcut and don't request live data. However, we cannot take this shortcut if the fields the filter depends on require live data.
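The decision described above amounts to checking both the requested fields and the fields used by the filter. A small Python sketch follows; the function name and the example field names are assumptions for illustration, and the real query code is in Haskell.

```python
from typing import Iterable, Set


def needs_live_data(
    requested_fields: Iterable[str],
    filter_fields: Iterable[str],
    live_fields: Set[str],
) -> bool:
    """Decide whether live data has to be collected for a query.

    Live data is needed if any requested field, or any field the filter
    depends on, is a live field.
    """
    relevant = set(requested_fields) | set(filter_fields)
    return bool(relevant & live_fields)
```

With a hypothetical live field `oper_state`, a query for `name` filtered on `oper_state` still needs live data, even though the output itself does not.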
Increase job queue polling interval
Now that all jobs are monitored with inotify, increase the polling interval.
After detecting a finished job, schedule again
In order to obtain a higher throughput of jobs, schedule new jobs as soon as a job was detected to have finished.
Attach a watcher for jobs
Add a function that can serve as an event handler for inotify updating a job in the job queue if the corresponding job file changes. Also attach it to all jobs selected to be run.
JQScheduler: always pass JobWithStat
When attaching inotifies to jobs, we need to preserve it through potential requeuing actions. Also, this information is needed for cleaning up.
Cleanup inotifies
When cleaning up finished jobs, remove the inotify attached to them, if any.
Add an optional inotify to jobs in the scheduler
This provides the infrastructure to monitor running jobs by inotify, and hence update the queue promptly upon job changes.
Make luxid handle SetDrainFlag
Make luxid also handle queries to drain the job queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add RPC for setting the queue drain flag
As luxid is also responsible for handling requests to drain the job queue, we need the corresponding RPC in Haskell as well.
Fix sign in drain_flag request
The drain flag is set if the queue is not open.
Reinstantiate inotify after a lost file
When watching a file, reinstantiate the inotify if notified of an event that removes the watch. Such events are likely to happen, as our usual way to "modify" a file is to atomically replace it by another one.
Improve debug-logging for watch file
Also log, at debug level only, when a change of a watched file was observed, but the change did not result in any change of the derived value.
Improve debugging by logging inotify events
At debug level, not only log that an inotify triggered, but also log the actual event.
Verify client certificates
This patch adds a step to 'gnt-cluster verify' to verify the existence and validity of the nodes' client certificates. Since this is a crucial point of the security concept, the verification is very detailed with expressive error messages and well tested by unit tests....
Verify incoming RPCs against candidate map
From this patch on, incoming RPC calls are checked against the map of valid master candidate certificates. If no map is present, the cluster is assumed to be in bootstrap/upgrade mode and compares the incoming call...
Extend RPC call to create SSL certificates
So far the RPC call 'node_crypto_tokens' only retrieved the certificate digest of an existing certificate. This call is now enhanced to also create a new certificate and return the respective digest. This will be used in various...
Store candidate certificates in ssconf
This patch enables Ganeti to store the candidate certificate map in ssconf. A utility function to read it is provided as well.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Add candidate certificate map to configuration
At the end of this patch series, incoming RPC calls are legitimized against a map of master candidate nodes' SSL certificate digests. This patch adds the map itself to the cluster's configuration.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Retrieve a node's certificate digest
In various cluster operations, the master node needs to retrieve the digest of a node's SSL certificate. For this purpose, we add an RPC call to retrieve the digest. The function is designed in a general way to make it possible...
Merge branch 'stable-2.10' into master
break line longer than 80 chars
hsqueeze: tag nodes before offlining them
hsqueeze is supposed to tag nodes before powering them down, so that it can later recognize which nodes can be activated. When showing the commands to execute, also add the tagging commands.
hsqueeze: only consider nodes that are not secondaries
If an instance has a secondary node, it cannot be easily moved to every node (in the same node group), as otherwise no node would be distinguished as secondary. As hsqueeze should only consider nodes where moving the instances away...
Gluster: add the Shared File storage type
The shared file and gluster disk templates should not report their disk space information like file does, because they do not behave the same.
If a cluster pulls from the same, shared source of storage then it is...
Gluster: add userspace access support
Add support for the QEMU gluster: protocol. Also change the access mode routines so they check the access parameter for all templates.
Signed-off-by: Santi Raffa <rsanti@google.com>
Signed-off-by: Thomas Thrainer <thomasth@google.com>...
Gluster: mount automatically
Add parameters to the Gluster disk template so Gluster can manage the mount point autonomously.
Signed-off-by: Santi Raffa <rsanti@google.com>
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
Gluster: use ssconf value for mountpoint directory
Gluster still does not mount anything autonomously, but this commit changes where Gluster expects its mountpoint to be.
ssconf: Add Gluster mount directory
This commit adds the gluster storage directory to ssconf (without actually using its value just yet).
Gluster: minimal implementation
Add Gluster to Ganeti by essentially cloning the shared file behaviour everywhere in the code base.
Implement fields query for instance
Support the query for the fields available for instances.
Remove the hvsGlobals from instance query fields
...to be consistent with the python implementation.
When interpreting [] as "all fields", sort nicely
When asked for all fields, we promise to return the list of fields sorted according to niceSort. Keep this promise.
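To illustrate what a "nice" sort order means in practice, here is a Python sketch of a natural sort that compares embedded numbers by value. It is an assumption-laden illustration of the general technique, not the definition of Ganeti's niceSort; the function names are chosen for the example.

```python
import re
from typing import Iterable, List, Tuple

# Split a name into alternating runs of ASCII digits and non-digits.
_CHUNK = re.compile(r"([0-9]+)|([^0-9]+)")


def nice_sort_key(name: str) -> List[Tuple[int, int, str]]:
    """Build a sort key in which digit runs compare numerically."""
    key = []
    for digits, text in _CHUNK.findall(name):
        if digits:
            key.append((0, int(digits), ""))
        else:
            key.append((1, 0, text))
    return key


def nice_sort(names: Iterable[str]) -> List[str]:
    """Sort names so that, e.g., 'disk2' comes before 'disk10'."""
    return sorted(names, key=nice_sort_key)
```

A plain lexicographic sort would place `disk10` before `disk2`; the key above avoids that by comparing the numeric parts as integers.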
Merge branch 'stable-2.8' into stable-2.9
Fix race in watchFile
As the calling of watchFile and the evaluation of the initial getFStatSafe takes non-zero time, the file could have changed before inotify was set up properly. Solve this problem by an additional check for the watched value to have changed immediately...
Fix gnt-network list-tags
Define network tags in the Haskell part.
This fixes issue 641.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Use a data type when generating Python types of OpCodes
Currently they are generated only as Strings.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Refactor OpCodeDescriptor from a tuple to a data type
This greatly enhances code readability.
Also fix monadic types "Q ExpQ" [which is "Q (Q Exp)"] to "Q Exp".
Add showValueList to PyValue for proper String instances
It's the same trick ShowS uses. We add a type class function for showing a list to PyValue and then just use it in the instance for `[a]`. This way we have the proper String instance without any overlapping/incoherent instances....
Rename PyValueInstances.hs to PyValue.hs
Now the file contains the type class declaration as well.
Move PyValue into PyValueInstances.hs, import it in THH.hs
This puts all PyValue code into one module, getting rid of orphan instances.
Make the duration field optional null-serialized
The time in SetWatcherPause is optional (with Nothing meaning that the pause should be canceled), but the serialization is not that of a Maybe Double; instead Just values serialize as they are and Nothing serializes to null. Fortunately, we already...
Handle QueryConfigValues
Make luxid handle the QueryConfigValues call providing certain simple status information about the cluster.
Add a predicate for watcher pause
Add a predicate, in IO, to test whether the watcher is paused.
Provide path to watcher pause file
Extend Path.hs to also provide the path to the file indicating whether the watcher is paused.
Implement SetWatcherPause in luxid
Make luxid handle SetWatcherPause correctly.
Add the RPC-call set_watcher_pause
With luxid taking over responsibility for handling watcher-pause requests, it needs to know about this RPC. So have it available in Haskell as well.
The time field for SetWatcherPause is optional
A JSON null value is used to indicate that the pause should be canceled.
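The convention of an optional time that serializes either as a plain number or as null can be sketched in Python as follows. The function names are assumptions for the example; the sketch only shows the wire format idea, not the Haskell serialization code.

```python
import json
from typing import Optional


def encode_pause_until(until: Optional[float]) -> str:
    """Serialize the optional time: a number as is, 'no time' as JSON null."""
    return json.dumps(until)


def decode_pause_until(text: str) -> Optional[float]:
    """Parse the optional time; JSON null means the pause should be canceled."""
    value = json.loads(text)
    return None if value is None else float(value)
```

Here `None` plays the role of the missing time, so a request carrying `null` is read as "cancel the pause" rather than as an error.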
Generate a separate return type for the job queue update RPC
The instantiation of RPC requires a bidirectional functional dependency between call type and return type. Hence we cannot use Unit everywhere.