Fix a typo in a debug message
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Check for own locks when checking job death in Luxi
Otherwise a job that is being started is falsely reported as dead.
Mark a job as failed, if it fails to start
...and add a reason trail message. Otherwise failed jobs hang, never finishing.
When checking job death, check if its lock is the Luxi lock
In this case, the call trying to acquire a shared lock always succeeds, because the daemon already has an exclusive lock, which falsely reports that the job has died.
Signed-off-by: Petr Pudlak <pudlak@google.com>...
Add reason-trail entry on failing jobs
When failing a job, add an entry to the reason trail, indicating what made the job fail (e.g., failed to fork or detected job death).
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Clean up dead jobs from the job queue
Make the onTimeWatcher of the job queue scheduler also verify that all notionally running jobs are indeed alive. If a job is found dead, remove it from the list of running jobs and update the job file to reflect the unexpected death....
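The sweep described above can be sketched as follows. This is only an illustration under stated assumptions: `sweepDeadJobs` and `isAlive` are hypothetical names, and the liveness check is passed in as a parameter rather than modelling how the scheduler actually probes a job.

```haskell
-- Hypothetical helper: split the notionally running jobs into (alive, dead).
-- 'isAlive' stands in for whatever liveness check is used (for example,
-- probing a job's livelock file); it is an assumption, not Ganeti's API.
sweepDeadJobs :: (job -> IO Bool) -> [job] -> IO ([job], [job])
sweepDeadJobs isAlive = foldr step (return ([], []))
  where
    -- Check the current job first, then the rest, so that the checks run
    -- in list order and both result lists keep the original order.
    step j rest = do
      alive <- isAlive j
      (live, dead) <- rest
      return (if alive then (j : live, dead) else (live, j : dead))
```

A caller would keep the first component as the new list of running jobs and update the job files of the second component to record the unexpected death.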
Execute jobs as processes from Luxi
...instead of just letting the master daemon handle them.
We try to start all given jobs independently and requeue those that failed.
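A minimal sketch of "start each job independently, requeue the failures", assuming a start action that reports failure as a `Left` value. The names `attemptStart` and `start` are hypothetical and do not reflect the actual process-spawning code.

```haskell
import Data.Either (partitionEithers)

-- Try to start every job on its own, so that one failure does not prevent
-- the remaining jobs from starting. 'start' is a hypothetical stand-in for
-- the real spawning action.
-- Returns the jobs that failed (with the error) and the jobs that started
-- (with whatever handle 'start' produced).
attemptStart :: (job -> IO (Either err pid))
             -> [job]
             -> IO ([(job, err)], [(job, pid)])
attemptStart start jobs = do
  results <- mapM (\j -> fmap (tag j) (start j)) jobs
  return (partitionEithers results)
  where
    -- Pair each outcome with the job it belongs to.
    tag j (Left e) = Left (j, e)
    tag j (Right p) = Right (j, p)
```

The first component of the result is what would be put back into the queue; the second is what would be tracked as running.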
Add a livelock file for the Luxi daemon
The file is initialized and kept within JQStatus. It is temporarily assigned to jobs spawned by Luxi until they create their own livelock files.
Fix a race in rescheduling jobs
When handling the queue, in particular when analyzing job dependencies, we assume that all non-finalized jobs are present in the Queue data structure. When rescheduling jobs we move them from the running part of the queue to the scheduled part again. In order to comply with the...
Schedule only jobs where all job dependencies are finished
Jobs may depend on other jobs in the sense that they may only be started once a given job is finalized. For a job process, however, it is hard to determine the status of a different job without a significant overhead....
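One way the scheduler could make this decision itself is sketched below. It relies on the assumption that every non-finalized job is present in the queue, so a dependency that is absent from the queue counts as finalized. The `Job` record and the function names are hypothetical simplifications.

```haskell
import Data.List (partition)

-- Hypothetical, simplified job record: an ID plus the IDs of the jobs that
-- must be finalized before this one may start.
data Job = Job { jobId :: Int, dependsOn :: [Int] }
  deriving (Eq, Show)

-- A job is eligible if none of its dependencies is still pending.
eligible :: [Int] -> Job -> Bool
eligible pending job = all (`notElem` pending) (dependsOn job)

-- Split the queued jobs into (eligible, blocked). Any job that is still
-- running or queued is treated as not yet finalized.
selectEligible :: [Job] -> [Job] -> ([Job], [Job])
selectEligible running queued = partition (eligible pending) queued
  where
    pending = map jobId (running ++ queued)
```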
Merge branch 'stable-2.11' into master
Change return type of internal rmJob
...to also provide the job itself. In this way, the function can also be used for tasks that require temporarily removing a job from the queue.
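The shape of such a removal function might look like the following sketch. It works on a plain list with a hypothetical `Job` record; the name `removeJob` and its signature are illustrative and are not the signature of the internal rmJob.

```haskell
-- Hypothetical, simplified job record.
data Job = Job { jobId :: Int, jobName :: String }
  deriving (Eq, Show)

-- Remove the job with the given ID from the queue and hand it back to the
-- caller together with the remaining queue. If no such job exists, the
-- queue is returned unchanged.
removeJob :: Int -> [Job] -> (Maybe Job, [Job])
removeJob jid queue =
  case break ((== jid) . jobId) queue of
    (before, job : after) -> (Just job, before ++ after)
    _ -> (Nothing, queue)
```

Returning the job lets a caller modify it and put it back, and the `Maybe` also tells the caller whether the job was in the queue at all.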
When enqueuing new jobs, respect job ID
When adding new jobs, don't add them at the end, but at a position that fits with their job id. In this way, we can build operations that require fully dequeuing a job and adding it later after some modifications.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
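Inserting at the position that fits the job ID can be sketched with `insertBy` from `Data.List`. The `Job` record and `enqueueById` are hypothetical, and the sketch assumes the queue is already sorted by job ID.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

-- Hypothetical, simplified job record.
data Job = Job { jobId :: Int, jobName :: String }
  deriving (Eq, Show)

-- Insert a job so that a queue sorted by job ID stays sorted, instead of
-- appending the job at the end.
enqueueById :: Job -> [Job] -> [Job]
enqueueById = insertBy (comparing jobId)
```

With this, a job that was dequeued, modified, and re-added ends up in the same relative position it had before.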
Provide a function to change the priority of a queued job
There is a separation of responsibilities here. For jobs still in the queue, it is the responsibility of the queue (scheduler); for started jobs, the job itself has to take care of it. To avoid the job transitioning in between, it is temporarily dequeued during...
Update getDirJobIDs to use ResultT
Also simplify code and remove unused functions.
Make the scheduler use the max_running_jobs config parameter
Use the run-time configuration to decide on the number of jobs scheduled for execution instead of using a hard-coded constant.
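As a rough sketch of what respecting such a limit means: given the configured maximum, the jobs already running, and the queued jobs, only as many queued jobs are selected as there are free slots. `selectJobsToRun` is a hypothetical name, and reading max_running_jobs from the configuration is left out.

```haskell
-- Pick as many queued jobs as there are free slots. The first argument
-- plays the role of the max_running_jobs configuration value.
-- Returns (jobs to start now, jobs that stay queued).
selectJobsToRun :: Int -> [job] -> [job] -> ([job], [job])
selectJobsToRun maxRunning running queued =
  splitAt (max 0 (maxRunning - length running)) queued
```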
Make configuration available to the scheduler
In this way, scheduling decisions can depend on the configuration of the cluster. At the moment, this is only the maximal number of jobs to be run in parallel, but in the future this will also include job filters....
Add dequeuing to the job scheduler
This only removes queued jobs from the queue and indicates whether the job was found in the queue. For jobs that are already started from the queue's point of view, it might still be possible to cancel them, e.g., if they are still waiting for locks....
After detecting a finished job, schedule again
In order to obtain a higher throughput of jobs, schedule new jobs as soon as a job was detected to have finished.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Attach a watcher for jobs
Add a function that can serve as an event handler for inotify, updating a job in the job queue if the corresponding job file changes. Also attach it to all jobs selected to be run.
JQScheduler: always pass JobWithStat
When attaching inotifies to jobs, we need to preserve it through potential requeuing actions. Also, this information is needed for cleaning up.
Cleanup inotifies
When cleaning up finished jobs, remove the inotify attached to them, if any.
Add an optional inotify to jobs in the scheduler
This provides the infrastructure to monitor running jobs by inotify, and hence update the queue promptly upon job changes.
Make luxid inspect the job queue on startup
Since luxid handles the scheduling, make luxid also read the queue upon restart. In this way, jobs get scheduled in the same way, independent of luxid restarts.
Use the jobFinalized predicate in JQScheduler
...to improve readability.
Don't assume we win the archive race
The job scheduler in luxid regularly watches for changes of the job files to determine progress of jobs. As these files are updated atomically, reading them will always succeed---until they're archived. While luxid is quite...
Make JQScheduler handle failure on job starting
Given that luxid (at the moment) connects to masterd for starting jobs, it may be that this inter-process communication fails. In this case, just reschedule the jobs instead of killing the scheduler thread....
fix typo in log message
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Jose Lopes <jabolopes@google.com>
Differentiate watchers in luxid
luxid has two time-based watcher threads, one for the configuration, and one for the job queue. To improve readability of the debug output, make the two watchers use different debug messages when the timer fires.
Add a scheduler to keep track of the job queue
In order to allow informed decisions on when to start a job, it is necessary for luxid to keep track of the (active part of the) job queue. Add a scheduler, similar to the config reader, that does this, but also schedules jobs to be executed. At the...