Add reason trail pickup constant
Add a constant for the reason trail, representing the pickup of a job from the disk.
Also, refactor the other constants a bit so that part of the definition can be shared in a hierarchical fashion.
Signed-off-by: Michele Tartara <mtartara@google.com>...
Have LuxiD add the "gnt:opcode" reason trail entry
The entry used to be added in jqueue.py, but it was lost after queue management switched from masterd to luxid. Now, make LuxiD responsible for adding it.
Add function for extending the reason trail in Luxid
The function will be used by the next commit.
Also, remove a few trailing whitespaces lying around the file.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
While at it, fix the order of imports in OpCodes.hs
.. so that Ganeti imports are below library imports and ordered alphabetically.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Add a function for generating OpCode reason src. names
The function converts the opcode name to lowercase with underscores, strips the 'Op' prefix and prepends Constants.opcodeReasonSrcOpcode.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>...
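The name conversion described above can be illustrated with a small Python sketch. It is not the Haskell function from the commit: the constant value is assumed from the "gnt:opcode" entry mentioned earlier in this log, and the ":" separator and both function names are hypothetical.

```python
import re

# Assumption: stands in for Constants.opcodeReasonSrcOpcode.
OPCODE_REASON_SRC_OPCODE = "gnt:opcode"


def camel_to_underscore(name):
    """Turn 'ClusterVerify' into 'cluster_verify'."""
    # Insert an underscore before every capital letter except the first one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def opcode_reason_src(opcode_name, separator=":"):
    """Build a reason source name from an opcode constructor name.

    The separator between prefix and opcode name is an assumption.
    """
    if opcode_name.startswith("Op"):
        opcode_name = opcode_name[2:]  # strip the 'Op' prefix
    return OPCODE_REASON_SRC_OPCODE + separator + camel_to_underscore(opcode_name)
```

Under these assumptions, opcode_reason_src("OpClusterVerify") yields "gnt:opcode:cluster_verify".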
Add a TH function for lower-cased stripped opcode names
The function strips the 'Op' prefix from a constructor name and converts it to lower-case with underscores.
Generalize genConstrToStr to custom monadic functions
This will allow compile-time checks for constructor names.
Add query support for locks to luxid
While requests only get forwarded, it still helps to get luxid feature-complete with respect to master.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Cherry-picked from commit a6e406ce376453e90e598c7be68809d6a7bd7d41...
Provide fields for lock queries
For luxid to be feature-complete with respect to masterd, it also needs to answer requests about locks. This includes knowing the fields available for locks.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>...
Merge branch 'stable-2.10' into stable-2.11
Merge branch 'stable-2.9' into stable-2.10
Make hbal deal with no-LVM storage space properly
Since 2.6, hbal crashes when used on a cluster where no LVM storage is enabled at all. The problem is that it always queries for fields that only sometimes make sense for certain types of storage. This patch will...
Conflicts: NEWS: take both additions configure.ac: ignore suffix bump...
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Use node UUID as client certificate serial number
It turns out that some implementations of OpenSSL are more pedantic in checking the certificates than others. In this particular case, the SSL connection could not be established when the serial number of the certificates...
Revert "Disabling client certificate usage"
This reverts commit 45f75526b848, which was introduced to temporarily disable the implementation of SSL client certificates. As this patch series fixes the reason for the disabling, we are rolling back the patch....
Merge branch 'stable-2.8' into stable-2.9
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Fix integer overflow problem in hbal
waitForJobs in src/Ganeti/Jobs.hs has an integer overflow that (at least on amd64) causes it to break after waiting for ~10 minutes. This results in hbal sleeping forever (when compiled with squeeze's ghc 6.12.1) or crashing (when...
Add missing space
Also, refactor the line to keep it under 80 chars.
Remove the HTOOLS configuration variable
.. and update the code that uses it.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Consider job-IDs queried for twice only once
As reading jobs from disk is an expensive operation, when querying for jobs, we optimize by considering which values the job-id is asked for in the filter. As any reasonable person would not add the same clause twice in an Or-clause, the implicit assumption was that the...
Fix 'JobIdListOnly' type from 'List' to 'Map'
Allow classic queries to use either names or UUIDs
When UUIDs are used in CLI commands, such addressing of objects fails or succeeds inconsistently across object types. Worse yet, some calls do not fail, but simply return no result. This is due to the way the...
Change return type of internal rmJob
...to also provide the job itself. In this way, the function can also be used for tasks that require temporarily removing a job from the queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
When enqueuing new jobs, respect job ID
When adding new jobs, don't add them at the end, but at a position that fits with their job id. In this way, we can build operations that require fully dequeuing a job and adding it later after some modifications.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
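A minimal Python sketch of enqueuing by job id, assuming the queue is a list already sorted by id and each job is a dict with an "id" key (both assumptions; the actual change lives in the Haskell scheduler):

```python
import bisect


def enqueue_by_id(queue, job):
    """Return a new queue with 'job' inserted at the position matching its id.

    'queue' is assumed to be sorted by job id already.
    """
    ids = [queued["id"] for queued in queue]
    position = bisect.bisect_left(ids, job["id"])
    return queue[:position] + [job] + queue[position:]
```

With this shape, a job that was dequeued for modification returns to the slot its id dictates rather than to the end of the queue.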
Provide a function to change the priority of a queued job
There is a separation of responsibilities here. For jobs still in the queue, it is the responsibility of the queue (scheduler); for started jobs, the job itself has to take care of it. To avoid the job transitioning in between, it is temporarily dequeued during...
Implement ChangeJobPriority in luxid
For jobs still queued, we ask the queue to change the priority, and replicate the changed job. For jobs that have already been started, we have to contact the job directly, which, at the moment, means forwarding the request to masterd....
Add a function changing the priority of an opcode
This pure function follows the semantics that an opcode, including its priority, may only be changed if the opcode is not finalized.
Add a function to change the priority of a job
...by changing the priority of the non-finished opcodes.
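The two priority commits above can be sketched in Python as pure functions. The dict layout and the set of finalized status names are assumptions made for illustration; the original functions are written in Haskell.

```python
# Assumption: status names that count as "finalized".
FINALIZED_STATUSES = frozenset(["success", "error", "canceled"])


def change_opcode_priority(opcode, priority):
    """Return a copy of 'opcode' with the new priority, unless it is finalized."""
    if opcode["status"] in FINALIZED_STATUSES:
        return opcode  # finalized opcodes must not be changed
    return dict(opcode, priority=priority)


def change_job_priority(job, priority):
    """Return a copy of 'job' whose non-finished opcodes carry the new priority."""
    new_ops = [change_opcode_priority(op, priority) for op in job["ops"]]
    return dict(job, ops=new_ops)
```

Neither function mutates its input, which mirrors the pure style the commit message describes.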
Add functions for manipulating errors in Result(T)
There is often a need to manipulate these errors, for example to convert a String from Result into an exception. These functions make this easier.
Function 'toErrorStr' lifts 'Result' to any 'MonadError'. This is useful...
Remove FromString in favor of Error from standard libraries
They have the very same functionality, and using our own FromString only causes unnecessary code duplication.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>...
Add Alternative instances for GenericResult and ResultT
This allows the use of Alternative-specific combinators, namely `optional`.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Cherry-pick of 78209a84b0f6be27fd381ac2...
Add andRestArguments to IDiskParams
In this way, we can pass through the opaque parameters required for disk creation and modification in the case of external storage.
Add function providing the canonical andRestArguments
The field catching the remaining fields will always be of the same shape, so add a function for this to make usage simple.
Add additional constructor AndRestArguments to OptionalType
A field of this type will capture all the remaining fields of an object as JSValues. Obviously, the intended use is to have precisely one such field. This mechanism will allow passing opaque values through, as is, e.g., required for...
Make safeRenameFile create dirs with defined permissions
If, and only if, safeRenameFile creates a new directory, make sure it has well defined permissions. While there, also optimize for the common case. The main use of safeRenameFile is archiving jobs. As...
Add constant for subdir permissions within the job queue
When archiving jobs, new directories have to be created, as jobs are archived in groups of 10000. Add a constant describing the permissions of these newly created directories.
Note that, due to the type, the constant cannot be part...
Add utility to fix permissions
Especially when creating new directories, we need to make sure ownership and permissions are set correctly. Provide a function to do so.
Add data type describing permissions and possibly owners
When creating new files, and, more importantly, new directories, it is relevant to set permissions, and possibly owners, correctly. Provide a type specifying the target configuration.
Enable network tags in Haskell code
Prior to the creation of the 2.10 branch, network tags were broken, and the Haskell code introduced there mistakenly accepted this as the desired functionality. This patch fixes this in a very simple way.
Signed-off-by: Hrvoje Ribicic <riba@google.com>...
Add 'provider' to IDiskParams
IDISK_PROVIDER was included in Python's IDISK_PARAMS, so it should also be included in the Haskell code.
Now that luxid creates and enqueues jobs, without this patch the ExtStorage interface is broken as the user cannot pass the disk...
Disabling client certificate usage
This patch temporarily disables the usage of the client SSL certificates. The handling of RPC connections had a conceptual flaw, because the certificates lack a proper signature. For this, Ganeti needs to implement a CA,...
Implement auto-archiving of jobs
As luxid is taking over the handling of the job queue, it also needs to handle the automated archiving of jobs. Here we replicate the semantics of the current Python implementation of archiving as many jobs older than the given time as possible,...
Add a utility function to try archiving jobs
Provide a function that walks through a list of job ids and archives them if appropriate. Abort that process if a given timeout is reached.
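The walk-with-timeout behaviour can be sketched as follows. This is a Python illustration only; the callback try_archive, the return shape and the injectable clock are hypothetical and not taken from the commit.

```python
import time


def archive_some_jobs(job_ids, try_archive, timeout, clock=time.monotonic):
    """Try to archive each job id in turn, stopping once 'timeout' seconds passed.

    'try_archive' is a caller-supplied function returning True if the job was
    archived. Returns (number archived, list of ids not looked at).
    """
    deadline = clock() + timeout
    archived = 0
    for index, job_id in enumerate(job_ids):
        if clock() >= deadline:
            # Timeout reached: report what is left for a later run.
            return archived, list(job_ids[index:])
        if try_archive(job_id):
            archived += 1
    return archived, []
```

Returning the unvisited ids lets a caller resume later; whether the original function reports this is not stated in the log.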
Support computation on Timestamp
As timestamps are also used to determine if an event is sufficiently long in the past (e.g., on archiving jobs), support adding a time interval to a Timestamp.
Add constructor function for Timestamp
Provide means to get Ganeti's internal timestamps from standard clock time.
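The two Timestamp commits above can be illustrated in Python, assuming a timestamp is represented as a (seconds, microseconds) pair. That representation and both function names are assumptions for this sketch.

```python
def timestamp_from_clock_time(clock_time):
    """Split a floating-point clock time into (seconds, microseconds)."""
    total_usec = int(round(clock_time * 1000000))
    return divmod(total_usec, 1000000)


def advance_timestamp(seconds, timestamp):
    """Add a whole number of seconds to a (seconds, microseconds) timestamp."""
    sec, usec = timestamp
    return (sec + seconds, usec)
```

Working on integer microseconds avoids a microsecond component of 1000000 appearing through rounding.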
Add a predicate on Jobs on whether it can be archived
Jobs usually are archived a given time after they have finished. For finalized jobs without end-time, the start-time is taken in lieu. This function provides the pure predicate for this decision.
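A Python sketch of such a predicate, with times given in seconds. The dict keys, the strict comparison, and the treatment of a finalized job that has neither timestamp are assumptions; the log does not specify them.

```python
def job_archivable(now, age, job):
    """Decide whether 'job' finished at least 'age' seconds before 'now'.

    'job' is assumed to be a dict with keys 'finalized', 'end_time' and
    'start_time'; the times are numbers or None.
    """
    if not job["finalized"]:
        return False
    reference = job["end_time"]
    if reference is None:
        reference = job["start_time"]  # fall back to the start time
    if reference is None:
        return False  # assumption: without any timestamp, do not archive
    return reference + age < now
```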
Make ArchiveJob in luxid create the archive, if necessary
As jobs are archived in groups of 10000, creating new subdirectories of the archive might be necessary when archiving a job. Use a function that takes care of this.
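The grouping by 10000 can be illustrated as below. Only the group size comes from the log; the directory layout, the "job-<id>" file name and the function names are assumptions for this sketch.

```python
import posixpath

JOBS_PER_ARCHIVE_DIRECTORY = 10000  # group size named in the commit message


def archive_subdir(job_id):
    """Name of the archive subdirectory holding the given job id."""
    return str(job_id // JOBS_PER_ARCHIVE_DIRECTORY)


def archived_job_path(archive_root, job_id):
    """Hypothetical path of an archived job file below 'archive_root'."""
    return posixpath.join(archive_root, archive_subdir(job_id), "job-%d" % job_id)
```

A new subdirectory becomes necessary whenever the first job of a new group of 10000 is archived, which is the case the safe rename has to handle.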
Provide a safe version of rename
...that also creates the target directory, if needed.
Fix expectation for the return value of jobqueue_rename
On success, jobqueue_rename returns a list containing one null per change request.
fix off-by-one error in indentation
Implement ArchiveJob queries in luxid
With luxid taking over the tasks of masterd, archiving jobs also belongs to its responsibilities. As archiving a job affects the global state of the job queue, synchronise over the queue lock.
Add RPC call jobqueue_rename
Archiving jobs is also replicated to all master candidates. Therefore luxid needs to be aware of this RPC call.
luxid: fix detection of master node in node query
Ganeti.Config.getNodeRole would rely on clusterMasterNode returning the master node name; however, clusterMasterNode returns the master node's UUID. We fix this and a similar issue in Ganeti.Query.Node.nodeFields....
When updating job queue, support virtual paths
When replicating parts of the job queue, allow for virtual paths in the RPC call. In this way, replication will also work correctly in a vcluster setup. Note that makeVirtualPath lives in IO, and hence cannot be part of the pure encoding...
Add a module to support virtual clusters
Virtual clusters are an efficient way to test how Ganeti behaves on a large cluster without requiring a large number of machines. Now that more tasks like job replication are done by luxid, provide that functionality in Haskell as well....
Move vcluster-related constants to Constants.hs
...as, in that way, they will also be available in Haskell, where job replication happens as well.
Clean up luxidMaxRunningJobs
Now that the number of jobs maximally running in parallel is a run-time option, this magic constant is not needed any more.
Make the scheduler use the max_running_jobs config parameter
Use the run-time configuration to decide on the number of jobs scheduled for execution instead of using a hard-coded constant.
Make configuration available to the scheduler
In this way, scheduling decisions can depend on the configuration of the cluster. At the moment, this is only the maximal number of jobs to be run in parallel, but in the future this will also include job filters....
Make max_running_jobs queryable
As we have introduced a new cluster parameter, it should also be visible when querying about the cluster configuration.
Add opcode parameter for the maximal number of running jobs
This parameter of OpClusterSetParams will allow setting the maximal number of jobs to be run simultaneously.
Add parameter max_running_jobs to the cluster configuration
This cluster-wide parameter will determine the maximal number of non-finalized jobs that should be in a non-queued state at the same time.
Implement job cancellation in luxid
As luxid handles the job queue, this daemon is the natural place to handle job cancellation. Answering CancelJob requests is also necessary for luxid to be feature compliant with masterd, even for command-line requests only....
Provide a function to compute the canceled version of a job
When a job gets canceled while still queued, dequeuing requires luxid to mark it as canceled. So provide the necessary pure function to do so.
Support canceling dequeued jobs
Even after jobs have been handed over for execution, it might still be possible to cancel them. One such case would be the job still waiting for a lock. Eventually, we will have to communicate to the job directly, but as long as execution is...
Add dequeuing to the job scheduler
This only removes queued jobs from the queue and indicates whether the job was found in the queue. For jobs that are already started from the queue's point of view, it might still be possible to cancel them, e.g., if they are still waiting for locks....
Fix Kvmd imports for Ubuntu 13.04 64
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
User shutdown hypervisor parameter
Add user shutdown parameter for KVM. Based on this parameter, decide what information to report for a KVM instance, for example, distinguish between 'ADMIN_down' and 'USER_down'.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Add KVM daemon daemonize
Add KVM daemon entry point, command-line options, backgrounding, etc
Add KVM daemon logic
Add KVM daemon logic, which contains monitors for Qmp sockets and directory/file watching.
Generalize and reuse Unix domain sockets
Refactor module 'Ganeti.UDSServer' so the KVM daemon can reuse code declared in this module to handle Unix domain sockets.
KVM daemon datatype, user and group
Fix whitespace
Fix whitespace in several modules.
Fix according to the Ganeti style guide
Also consider filter fields for deciding if using live data
If the query fields don't require live data, we use the shortcut and don't request live data. However, we cannot take this shortcut if the fields the filter depends on require live data.
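The decision can be expressed as a small Python sketch. The function name and the representation of fields as plain strings are assumptions; the field names in the usage note are only illustrative.

```python
def needs_live_data(requested_fields, filter_fields, live_fields):
    """Return True if any requested or filter field needs live data.

    'live_fields' is the set of field names that can only be answered by
    querying the nodes.
    """
    relevant = set(requested_fields) | set(filter_fields)
    return bool(relevant & set(live_fields))
```

For instance, a query requesting only a name but filtering on a live field such as a memory value still has to collect live data, which is the case the fix covers.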
Increase job queue polling interval
Now that all jobs are monitored with inotify, increase the polling interval.
After detecting a finished job, schedule again
In order to obtain a higher throughput of jobs, schedule new jobs as soon as a job was detected to have finished.
Attach a watcher for jobs
Add a function that can serve as an event handler for inotify, updating a job in the job queue if the corresponding job file changes. Also attach it to all jobs selected to be run.
JQScheduler: always pass JobWithStat
When attaching inotifies to jobs, we need to preserve it through potential requeuing actions. Also, this information is needed for cleaning up.
Cleanup inotifies
When cleaning up finished jobs, remove the inotify attached to them, if any.
Add an optional inotify to jobs in the scheduler
This provides the infrastructure to monitor running jobs by inotify, and hence update the queue promptly upon job changes.
Make luxid handle SetDrainFlag
Make luxid also handle queries to drain the job queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add RPC for setting the queue drain flag
As luxid is also responsible for handling requests to drain the job queue,we need the corresponding RPC in Haskell as well.
Fix sign in drain_flag request
The drain flag is set, if the queue is not open.
Reinstantiate inotify after a lost file
When watching a file, reinstantiate the inotify if notified of an event that removes the watch. Such events are likely to happen, as our usual way to "modify" a file is to atomically replace it by another one.
Improve debug-logging for watch file
Also log, at debug level only, when a change of a watched file was observed, but the change did not result in any change of derived value.
Improve debugging by logging inotify events
At debug level, not only log that an inotify triggered, but also log the actual event.
Verify client certificates
This patch adds a step to 'gnt-cluster verify' to verify the existence and validity of the nodes' client certificates. Since this is a crucial point of the security concept, the verification is very detailed with expressive error messages and well tested by unit tests....
Verify incoming RPCs against candidate map
From this patch on, incoming RPC calls are checked against the map of valid master candidate certificates. If no map is present, the cluster is assumed to be in bootstrap/upgrade mode and compares the incoming call...
Extend RPC call to create SSL certificates
So far, the RPC call 'node_crypto_tokens' only retrieved the certificate digest of an existing certificate. This call is now enhanced to also create a new certificate and return the respective digest. This will be used in various...
Store candidate certificates in ssconf
This patch enables Ganeti to store the candidate certificate map in ssconf. A utility function to read it is provided as well.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Add candidate certificate map to configuration
At the end of this patch series, incoming RPC calls are legitimized against a map of master candidate nodes' SSL certificate digests. This patch adds the map itself to the cluster's configuration.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Retrieve a node's certificate digest
In various cluster operations, the master node needs to retrieve the digest of a node's SSL certificate. For this purpose, we add an RPC call to retrieve the digest. The function is designed in a general way to make it possible...
Merge branch 'stable-2.10' into master
break line longer than 80 chars