Add job_id and index to the reason trail
The reason trail will contain an item indicating the job_id and the index number of the current opcode inside the job queue.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Fix job queue directory permission problems
If split users are used, the queue directory could only be accessed by masterd, but also confd needs to be able to read it, e.g. when it is queried as part of "gnt-job list".
This commit fixes the permissions in such a way to allow proper access rights....
jqueue: Improve inotify error reporting
This addresses issue 218. When the number of inotify watches is exhausted, for example by being set too low from the beginning or by other programs, waiting for a job to change would just report a lost job (e.g. “Error checking job status: Job with id 7817 lost”)....
Replicate queue drain flag across all master candidates
Until now, the flag was unset on a master failover unless the “$localstatedir/lib/ganeti/queue/drain” file existed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
jqueue: Don't modify input opcode when changing priority
Commit 4679547 implemented the ability to change a job's priority after it was submitted. The code contained a bug whereby it would modify the input data for an opcode, something the job queue shouldn't do (logical...
Cleanup ht's use of positive/strictpositive
Currently, ht.py uses incorrect terminology for positive/non-negative numbers. Per http://en.wikipedia.org/wiki/Positive_number, this is the correct terminology:
- A number is positive if it is greater than zero.
- A number is negative if it is less than zero.
...
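The terminology quoted above can be written down as a few predicates. This is only a minimal sketch: the function names are hypothetical and are not taken from ht.py.

```python
def is_positive(value):
    """Return True if value is greater than zero."""
    return value > 0


def is_negative(value):
    """Return True if value is less than zero."""
    return value < 0


def is_non_negative(value):
    """Return True if value is zero or positive (that is, not negative)."""
    return value >= 0
```

Zero is the case that distinguishes the terms: it is non-negative, but neither positive nor negative.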
jqueue: Set task ID for jobs added to workerpool
The job ID is re-used as the task ID, as job IDs are unique.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
jqueue: Allow changing of job priority
This is due to a feature request. Sometimes one wants to change the priority of a job after it has been submitted, e.g. after submitting an important job only to later notice many other pending jobs which will be processed first. Priority changes only take effect at the next lock...
jqueue/mcpu: Determine priority using callback
Instead of being given the priority for acquiring locks by means of a parameter, mcpu will now call back. This is in preparation for implementing a command to change a job's priority on the fly and allows changing it while locks are being acquired (taking effect on the next...
Merge branch 'devel-2.6'
jqueue: Return jobs to queue when shutting down
When a job is still waiting for locks and the queue is shutting down, it should be returned to the queue and not actually start processing. Until now, jobs which transitioned from “queued” to “waiting” were already considered to be running as far as the shutdown code was concerned....
jqueue: Factorize code to modify job
A new function will be added to change a job's priority.
jqueue: Add docstring for _DetermineJobDirectories
Somehow this was missed in commit 0422250e.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
jqueue: Fix comments in _SubmitJobUnlocked
gnt-job: List archived jobs if requested
If requested via a filter or by including the “archived” output, archived jobs will be loaded and shown. This is significantly slower than just listing normal jobs, therefore by default they are not loaded at all....
jqueue: Correct docstring
The description was not accurate.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
jqueue: Add new in-memory attribute for archived jobs
This attribute is set to True for jobs which were restored from an archived file. A new filter will act on this field.
jqueue: Look at archived jobs when watching
First: This enables the use of “gnt-job watch $id” for archived jobs.
Now, the reason for actually making this work is that during sufficiently large group or node evacuations jobs are archived before the client gets to poll for their output. This led to situations where...
Implement virtual cluster support in Python code
- pathutils: Prepend node-specific prefix path
- RPC: Use virtual paths (see vcluster.py)
- SSH: Pass environment variables, use destination's node directory when copying files using scp, use GANETI_HOSTNAME to determine hostname
...
Migrate lib/{jqueue,jstore}.py from constants to pathutils
File system paths moved from constants to pathutils.
Switch job IDs to numeric
This has been a long-standing cleanup item, which we've always refrained from doing due to the high estimated effort needed.
In reality, it turned out that after some infrastructure improvements (the previous patches), the actual job queue-related changes are quite...
jqueue: Move functions related to job ID to jstore
These don't really need to be in jqueue, and a new function will be added to convert job IDs to an integer for queries.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Add job support to query2 via LUXI
This enables the use of filters through query2 when listing jobs.
jqueue: Cache prepared field list in _JobChangesChecker
… instead of re-calculating it on every file change.
jqueue: Convert GetInfo to query2
This rather inefficient implementation (fields are evaluated on every call to GetInfo) is not good for WaitForJobChanges and doesn't support filters, but that will be rectified in later patches.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
jqueue._QueuedOpCode: Change a docstring
There was a typo and it's not necessary to repeat the class name.
Merge branch 'devel-2.5'
Merge branch 'stable-2.5' into devel-2.5
jqueue: Factorize checking job processor's result
This allows for more unittesting.
jqueue: Fix epylint errors introduced in 37d76f1e4
serializer: Remove JSON indentation and dict key sorting
Serializing to JSON using “simplejson” is significantly slower when indentation and/or sorting of dictionary keys is used. In simplejson 1.x the difference isn't that big, but with simplejson 2.x the difference...
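The two serialization styles can be compared with a small sketch. It uses the standard library's json module as a stand-in for simplejson, and the function names and sample data are hypothetical. It only shows that both forms decode to the same data and makes no timing claim.

```python
import json


def dump_pretty(data):
    """Serialize with indentation and sorted keys (the slower style)."""
    return json.dumps(data, indent=2, sort_keys=True)


def dump_compact(data):
    """Serialize without indentation or key sorting."""
    return json.dumps(data, separators=(",", ":"))


# Hypothetical job-like structure, used only for illustration.
sample = {"id": 42, "ops": [{"OP_ID": "OP_TEST", "status": "queued"}]}
pretty = dump_pretty(sample)
compact = dump_compact(sample)
```

Both strings load back to the same object. The compact one is shorter and skips the sorting and indentation work.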
jqueue: Fix deadlock between job queue and dependency manager
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue...
jqueue: Add code to prepare for queue shutdown
Doing so will prevent job submissions (similar to a drained queue), but won't affect currently running jobs. No further jobs will be executed.
jqueue: Factorize code checking for drained queue
This is in preparation for a clean(er) shutdown of masterd.
jqueue: Allow zero jobs to be submitted at once
If cmdlib.LUNodeMigrate was called for a node without primary instances it would try to submit an empty list of jobs. This was never visible via CLI as there we check the list of primary instances first.
Convert job queue's RPC to generated code
With these changes job queue RPC will finally show up on the lock monitor. See below for an example. A job queue-specific class is used to restrict the use of a static list for name resolution to the job queue. Further improvements can be made to not re-create the whole RPC client...
Fixes to errors/warnings raised by pylint 0.24
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them:
DeprecationWarning fixes for pylint
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to a lot of DeprecationWarning being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst)....
ensure-dirs: Set permissions on job files in queue
This was a regression from 2.4.
jqueue: Add short delay before detecting job changes
By sleeping for 100ms after receiving a notification for a changed job file the job is given some additional time to change again. This significantly reduces the number of LUXI calls for WaitForJobChanges...
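The idea of waiting briefly before reading a changed job can be sketched as follows. This is not the jqueue implementation: the class name, the read_job callback and the injectable sleep function are assumptions made for illustration.

```python
import time


class ChangeCoalescer:
    """Delay reading a job after a change notification.

    Changes that arrive during the delay are picked up by the single read
    that follows, so they do not each trigger a separate read.
    """

    def __init__(self, read_job, delay=0.1, sleep=time.sleep):
        self._read_job = read_job  # hypothetical callable returning job state
        self._delay = delay        # 100ms, as described in the commit message
        self._sleep = sleep        # injectable so tests need not really sleep

    def on_notification(self):
        # Give the writer a moment to update the job file again.
        self._sleep(self._delay)
        # Read once; this sees the latest state written during the delay.
        return self._read_job()
```

A caller would invoke on_notification() from its file-change handler and report only the state it returns.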
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894
Rename *_STATUS_WAITLOCK to …_WAITING
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs.
Fix locking issue with job dependencies
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode....
jqueue: Read-only jobs don't need processor lock
jqueue: Implement submitting multiple jobs with dependencies
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool.
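Resolving relative job IDs might look roughly like the sketch below. It assumes that a relative ID is a negative integer counting back from the current job within the same submission (-1 being the job submitted just before it). The excerpt does not state this convention, and the function name and data layout are hypothetical.

```python
def resolve_dependencies(assigned_ids, index, depends):
    """Turn relative dependency IDs into absolute job IDs.

    assigned_ids: absolute IDs given to the jobs of this submission, in order
    index: position of the current job within the submission
    depends: dependency job IDs; negative values are treated as relative
    """
    resolved = []
    for dep in depends:
        if dep < 0:
            target = index + dep
            if target < 0:
                raise ValueError(
                    "relative job ID %d points before the submission" % dep)
            resolved.append(assigned_ids[target])
        else:
            # Absolute IDs are passed through unchanged.
            resolved.append(dep)
    return resolved
```

For the third job of a submission that was assigned IDs 890, 891 and 892, a dependency list of [-1, -2, 500] resolves to [891, 890, 500].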
jqueue: Add “writable” flag to memory objects
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places.
Implement chained jobs
An overview is available in the design document for this change, doc/design-chained-jobs.rst.
When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired...
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown.
Merge branch 'devel-2.4'
Fix off-by-one bug in job serial generation
Commit 009e73d0 (September 2009) changed the job queue to generate multiple job serials at once. Ever since it would return one more than requested.
The “serial” file in the job queue directory is defined to contain the...
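The class of bug described here can be illustrated with a small sketch. It assumes, purely for the example, that the stored value is the last serial handed out. How the real “serial” file is interpreted is not shown in this excerpt, and the function name is hypothetical.

```python
def new_serials(last_serial, count):
    """Return (serials, new_last_serial) with exactly `count` new serials."""
    if count < 0:
        raise ValueError("count must not be negative")
    first = last_serial + 1
    # An off-by-one variant would use range(first, first + count + 1) and
    # hand out one serial more than requested.
    serials = list(range(first, first + count))
    return serials, last_serial + count
```

Requesting three serials after serial 10 yields 11, 12 and 13, and the stored value advances to 13.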
jqueue: Allow loading of archived jobs
Chained jobs need to look at previous jobs, including archived ones. A nice side-effect of this change is the ability to look at archived jobs using “gnt-job info <id>” as long as the ID is known.
jqueue: Fix potential race condition when cancelling queued jobs
When a job was cancelled, its status would be changed and the file written again. Since this was a final status, the job file could be moved anytime for archival. If the job was still in the queue, however,...
jqueue: Update worker thread name to include opcode summary
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Add opcode summary to SubmitManyJobs errors
Requested-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
gnt-cluster master-failover: Undrain queue
- Move functions for drain status (tracked via file) from jqueue to jstore
- Undrain queue on master failover if necessary
- Add QA test
jqueue: Fix cancelling while in waitlock in queue
Since the recent change to leave jobs in the “waitlock” status (commit 5fd6b6947), cancelling a job while it's back in the queue would break. This patch handles these cases and adds a unittest.
jqueue: Keep jobs in “waitlock” while returning to queue
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision for job priorities to...
jqueue: Fix bug when cancelling jobs
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unittest to check for this situation.
jqueue/gnt-job: Add job priority fields for display
These fields can help with debugging.
jqueue: Resume jobs from “waitlock” status (2nd try)
Commit 5ef699a0e had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible.
jqueue, CancelJob: Check status only once per call
This simplifies the code a bit--the status is only checked once.
Fix docstring typo in jqueue._JobProcessor._MarkWaitlock
epydoc complained: “File …/ganeti/jqueue.py, line 886, in ganeti.jqueue._JobProcessor._MarkWaitlock Warning: Redefinition of type for job”
jqueue: Use priority for acquiring locks
jqueue: Use timeout when acquiring locks
As already noted in the design document, an opcode's priority is increased when the lock(s) can't be acquired within a certain amount of time, except at the highest priority, where in such a case a blocking acquire is used....
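The escalation rule can be sketched as a simple loop. This is a simplification and not the jqueue code: the acquire callback, the constant and the convention that a numerically lower value means a more urgent priority are assumptions made for illustration.

```python
HIGHEST_PRIORITY = -20  # assumption: numerically lower means more urgent


def acquire_with_escalation(acquire, priority, timeout,
                            highest=HIGHEST_PRIORITY):
    """Acquire locks, raising the priority after each timed-out attempt.

    acquire: hypothetical callable (priority, timeout) -> bool, where a
             timeout of None means "block until acquired"
    Returns the priority at which the locks were finally acquired.
    """
    while True:
        if priority <= highest:
            # Already at the highest priority: use a blocking acquire.
            acquire(priority, None)
            return priority
        if acquire(priority, timeout):
            return priority
        # Timed out: increase the priority by one step and try again.
        priority -= 1
```

Starting at priority 0 with an acquire that times out twice, the locks are obtained on the third attempt at priority -2.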
jqueue: Introduce per-opcode context object
This is better to group per-opcode data.
jqueue: Rename current_op to better reflect what it actually is
jqueue: Separate function for in-memory variables
jqueue: Change model from per-job to per-opcode processing
In order to support priorities, the processing of jobs needs to be changed. Instead of processing jobs as a whole, the code is changed to process one opcode at a time and then return to the queue. See the...
jqueue: Use priority for worker pool
A small helper function is added to make this easier. Priorities are not yet used in all necessary places.
jqueue: Add missing docstring to _QueuedJob.Cancel
This was forgotten in commit 099b2870b.
jqueue: Move CancelJob logic to separate function
Moving the internals of this function will allow it to be used from unittests in the future. Splitting this into a pure, side-effect free function and an impure one makes the pure function easily testable....
Merge branch 'devel-2.2'
(no conflicts, took LGTM from original commit)
Signed-off-by: Iustin Pop <iustin@google.com>...
jqueue: Ensure only accepted priorities are allowed for submitting jobs
Quoting the design document: “Submitted opcodes can have one of the priorities listed below. Other priorities are reserved for internal use”. Submitting jobs at priority -20 should not be allowed....
Add support for job priority to opcodes and job queue objects
This allows clients to submit opcodes with a priority. Except for being tracked by the job queue, it is not yet used by any code.
Unittests for jqueue._QueuedOpCode and jqueue._QueuedJob are provided for...
Remove mcpu's ReportLocks callback
This is no longer needed with the new lock monitor. One callback is kept to check for cancelled jobs.
Revert "jqueue: Resume jobs from “waitlock” status"
This reverts commit 4008c8edae31a3971fa8c4b200238afc8005d3d4.
While it worked in my initial tests, I've now found cases where this doesn't work properly as it is. More work is needed and will be done as part of the...
jqueue: Resume jobs from “waitlock” status
After an unclean restart of ganeti-masterd, jobs in the “waitlock” status can be safely restarted. They hadn't modified the cluster yet.
jqueue: Move queue inspection into separate function
This makes the init function a lot smaller while not changing functionality.
jqueue: Don't update file in MarkUnfinishedOps
This reduces the number of updates to the job files. It's used in two places while processing a job and the file is updated just afterwards.
Move job queue to new ganeti.runtime
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
jqueue: Use separate function for encoding errors
Comes with unittest.
hansmi helped me with merging the conflict. Thanks
Conflicts: lib/workerpool.py
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
workerpool: Allow setting task name
With this patch, the task name is added to the thread name and will show up in logs. Log messages from jobs will look like “pid=578/JobQueue14/Job13 mcpu:289 DEBUG LU locks acquired/cluster/BGL/shared”.
jqueue: Remove lock status field
With the job queue changes for Ganeti 2.2, watched and queried jobs are loaded directly from disk, rendering the in-memory “lock_status” field useless. Writing it to disk would be possible, but has a huge cost at runtime (when tested, processing 1'000 opcodes involved 4'000 additional...
jqueue: Mark opcodes following failed ones as failed, too
When an opcode fails, the job queue would leave following opcodes as “queued”, which can be quite confusing. With this patch, they're all marked as failed and assertions are added to check this.
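The behaviour described above can be sketched on a plain list of opcode statuses. The status strings and the function name are hypothetical stand-ins and not the constants used in jqueue.

```python
# Hypothetical status values, used only for this sketch.
STATUS_QUEUED = "queued"
STATUS_SUCCESS = "success"
STATUS_ERROR = "error"


def mark_following_as_failed(statuses):
    """Return a copy in which every opcode after the first failure is failed."""
    result = list(statuses)
    if STATUS_ERROR not in result:
        # Nothing failed, so nothing needs to change.
        return result
    first_failed = result.index(STATUS_ERROR)
    for idx in range(first_failed + 1, len(result)):
        result[idx] = STATUS_ERROR
    # Mirror the commit's idea of asserting the resulting state.
    assert all(status == STATUS_ERROR for status in result[first_failed:])
    return result
```

A job whose second of four opcodes failed ends up with one successful opcode followed by three failed ones, instead of two left as “queued”.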
jqueue: Work around race condition between job processing and archival
This is a simplified version of a patch I sent earlier to make sure the job file is only written once with a finalized status.
Support for resolving hostnames to IPv6 addresses
This patch enables IPv6 name resolution by using socket.getaddrinfo instead of socket.gethostbyname_ex.
It renames the HostInfo class to Hostname and unifies its use throughout the code. This is achieved by using static calls where no object is...
jqueue: More checks for cancelling queued job
We can also check when the lock status is updated. This will improve job cancelling.
jqueue: Add more debug output
Fix a few job archival issues
This patch fixes two issues with job archival. First, LoadJobFromDisk can return 'None' for no-such-job, and we shouldn't add None to the job list; we can't anyway, as this raises an exception:
node1# gnt-job archive foo...
Change handling of non-Ganeti errors in jqueue
Currently, if a job execution raises a Ganeti-specific error (i.e. subclass of GenericError), then we encode it as (error class, [error args]). This matches the RAPI documentation.
However, if we get a non-Ganeti error, then we encode it as simply...
workerpool: Change signature of AddTask function to not use *args
By changing it to a normal parameter, which must be a sequence, we can start using keyword parameters.
Before this patch all arguments to “AddTask(self, *args)” were passed as arguments to the worker's “RunTask” method. Priorities, which should be...