
============
Chained jobs
============

.. contents:: :depth: 4

This is a design document about the innards of Ganeti's job processing.
Readers are advised to study previous design documents on the topic:

- :ref:`Original job queue <jqueue-original-design>`
- :ref:`Job priorities <jqueue-job-priority-design>`
- :doc:`LU-generated jobs <design-lu-generated-jobs>`


Current state and shortcomings
==============================

Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific order.
Due to the job queue's current design, such a guarantee cannot be
given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.

One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished, the
next group is submitted and waited for. There are different kinds of
clients for Ganeti, some of which don't share code (e.g. Python clients
vs. htools). This design proposes a solution which would be implemented
as part of the job queue in the master daemon.


Proposed changes
================

With the implementation of :ref:`job priorities
<jqueue-job-priority-design>` the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can't be acquired, allowing other
jobs/opcodes to be run in the meantime.

The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a dependency
on the success of job Y and would only be run once job Y is finished. If
there's a dependency on success and job Y failed, job X would fail as
well. Since such dependencies would use job IDs, the jobs still need to
be submitted in the right order.

.. pyassert::

   # Update description below if finalized job statuses change
   constants.JOBS_FINALIZED == frozenset([
     constants.JOB_STATUS_CANCELED,
     constants.JOB_STATUS_SUCCESS,
     constants.JOB_STATUS_ERROR,
     ])

The new attribute's value would be a list of two-valued tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only final statuses are accepted
(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
equivalent to specifying all final statuses (except
:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.

Any job referring to a cancelled job is also cancelled unless it
explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
status.

In case a referenced job cannot be found in the normal queue or the
archive, referring jobs fail as the status of the referenced job can't
be determined.
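
The rules above can be summarized in a small decision function. The
following is a sketch only; the names (``_EvaluateDependency``, the
result constants) are invented for this document and do not mirror the
actual queue code:

.. code-block:: python

  # Sketch only: these names are illustrative and do not correspond
  # to the actual job queue implementation
  JOB_STATUS_CANCELED = "canceled"
  JOBS_FINALIZED = frozenset(["success", "error", JOB_STATUS_CANCELED])

  # Possible outcomes of evaluating a single dependency
  (WAIT, CONTINUE, CANCEL, FAIL) = range(4)

  def _EvaluateDependency(status, requested):
    """Decide how one dependency affects the depending job.

    @param status: Status of the referenced job, or None if it can't
      be found in the queue or the archive
    @param requested: Requested statuses; an empty list is equivalent
      to all final statuses except "canceled"

    """
    if status is None:
      # Status of the referenced job can't be determined
      return FAIL

    if status not in JOBS_FINALIZED:
      # Referenced job hasn't reached a final status yet
      return WAIT

    if not requested:
      requested = JOBS_FINALIZED - frozenset([JOB_STATUS_CANCELED])

    if status == JOB_STATUS_CANCELED and status not in requested:
      # Cancellations propagate unless explicitly requested
      return CANCEL

    if status in requested:
      return CONTINUE

    # Final, but not with a requested status (e.g. a dependency on
    # success while the referenced job failed)
    return FAIL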

With this change, clients can submit all wanted jobs in the right order
and proceed to wait for changes on all these jobs (see
``cli.JobExecutor``). The master daemon will take care of executing them
in the right order, while still presenting the client with a simple
interface.

Clients using the ``SubmitManyJobs`` interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.

.. highlight:: javascript

Example data structures::

  # First job
  {
    "job_id": "6151",
    "ops": [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
      ],
  }

  # Second job, runs in parallel with first job
  {
    "job_id": "7687",
    "ops": [
      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
      ],
  }

  # Third job, depending on success of previous jobs
  {
    "job_id": "9218",
    "ops": [
      { "OP_ID": "OP_NODE_SET_PARAMS",
        "depend": [
          [6151, ["success"]],
          [7687, ["success"]],
          ],
        "offline": true, },
      ],
  }
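
The third job above names its dependencies by absolute job IDs and
therefore requires the first two jobs to have been submitted already.
With ``SubmitManyJobs`` the same dependencies could be written as
``[[-2, ["success"]], [-1, ["success"]]]`` and all three jobs submitted
in one call. A rough sketch of how such relative IDs might be resolved
on the master side (the helper is hypothetical, not the actual daemon
code):

.. code-block:: python

  def _ResolveRelativeDeps(prior_job_ids, deps):
    """Convert relative job IDs in a "depend" list to absolute ones.

    Hypothetical helper for illustration only.

    @param prior_job_ids: IDs of jobs submitted before the current one
      in the same SubmitManyJobs call, in submission order
    @param deps: List of [job_id, requested_statuses] pairs

    """
    result = []
    for (job_id, statuses) in deps:
      if job_id < 0:
        # -1 refers to the job submitted immediately before this one,
        # -2 to the one before that, and so on
        if -job_id > len(prior_job_ids):
          raise ValueError("Relative job ID %d out of range" % job_id)
        job_id = prior_job_ids[job_id]
      result.append([job_id, statuses])
    return result

With ``prior_job_ids`` set to ``[6151, 7687]``, the dependency
``[-2, ["success"]]`` resolves to ``[6151, ["success"]]``.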


Implementation details
----------------------

Status while waiting for dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jobs waiting for dependencies are certainly not in the queue anymore and
therefore need to change their status from "queued". While waiting for
opcode locks the job is in the "waiting" status (the constant is named
``JOB_STATUS_WAITLOCK``, but the actual value is ``waiting``). There are
the following possibilities:

#. Introduce a new status, e.g. "waitdeps".

   Pro:

   - Clients know for sure a job is waiting for dependencies, not locks

   Con:

   - Code and tests would have to be updated/extended for the new status
   - The list of possible state transitions certainly wouldn't get
     simpler
   - Breaks backwards compatibility, older clients might get confused

#. Use the existing "waiting" status.

   Pro:

   - No client changes necessary, less code churn (note that there are
     clients which don't live in Ganeti core)
   - Clients don't need to know the difference between waiting for a job
     and waiting for a lock; it doesn't make a difference
   - Fewer state transitions (see commit ``5fd6b69479c0``, which removed
     many state transitions and disk writes)

   Con:

   - Not immediately visible what a job is waiting for, but it's the
     same issue with locks; this is the reason why the lock monitor
     (``gnt-debug locks``) was introduced; job dependencies can be shown
     as "locks" in the monitor

Based on these arguments, the proposal is to do the following:

- Rename the ``JOB_STATUS_WAITLOCK`` constant to ``JOB_STATUS_WAITING``
  to reflect its actual meaning: the job is waiting for something
- While waiting for dependencies and locks, jobs are in the "waiting"
  status
- Export dependency information in the lock monitor; example output::

    Name      Mode Owner Pending
    job/27491 -    -     success:job/34709,job/21459
    job/21459 -    -     success,error:job/14513
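
As a sketch, the ``Pending`` column above could be derived from a job's
dependencies roughly as follows (illustrative only, not the actual
monitor code):

.. code-block:: python

  def _FormatPendingDeps(deps):
    """Format job dependencies in the style shown above.

    Illustrative sketch; the real lock monitor code differs.

    @param deps: List of (job_id, requested_statuses) pairs

    """
    # Group referenced jobs by their requested status set so that two
    # jobs awaited with "success" render as "success:job/X,job/Y"
    groups = {}
    for (job_id, statuses) in deps:
      groups.setdefault(tuple(statuses), []).append(job_id)
    return " ".join("%s:%s" % (",".join(statuses),
                               ",".join("job/%s" % i for i in job_ids))
                    for (statuses, job_ids) in sorted(groups.items()))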


Cost of deserialization
~~~~~~~~~~~~~~~~~~~~~~~

To determine the status of a dependency job the job queue must have
access to its data structure. Other queue operations already do this,
e.g. archiving, watching a job's progress and querying jobs.

Initially (Ganeti 2.0/2.1) the job queue shared the job objects
in memory and protected them using locks. Ganeti 2.2 (see :doc:`design
document <design-2.2>`) changed the queue to read and deserialize jobs
from disk. This significantly reduced locking and code complexity.
Nowadays inotify is used to wait for changes on job files when watching
a job's progress.

Reading from disk and deserializing certainly has some cost associated
with it, but it's a significantly simpler architecture than
synchronizing in memory with locks. At the stage where dependencies are
evaluated the queue lock is held in shared mode, so different workers
can read at the same time (deliberately ignoring CPython's interpreter
lock).

It is expected that the majority of executed jobs won't use
dependencies and therefore won't be affected.
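
A minimal sketch of the work involved in such a check (the directory
layout and JSON structure assumed here are simplifications, not
Ganeti's actual on-disk format):

.. code-block:: python

  import json
  import os.path

  def _GetDependencyJobStatus(queue_dir, job_id):
    """Load a job file from disk and return its status, if available.

    @return: The job's status string, or None if the job file can't be
      found (in which case the depending job fails)

    """
    for dirname in ("", "archive"):
      path = os.path.join(queue_dir, dirname, "job-%s" % job_id)
      if os.path.exists(path):
        # The deserialization cost is paid here; the queue lock is
        # held in shared mode, so other workers can read concurrently
        with open(path) as fd:
          return json.load(fd)["status"]
    return None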


Other discussed solutions
=========================

Job-level attribute
-------------------

At first glance it might seem better to put dependencies on previous
jobs at the job level. However, it turns out that having the option of
defining only a single opcode in a job as having such a dependency can
be useful as well. The code complexity in the job queue is equivalent,
if not simpler.

Since opcodes are guaranteed to run in order, clients can just define
the dependency on the first opcode.
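
As an illustration (Python-style, mirroring the structures above): a
hypothetical job with two opcodes only needs the attribute on its first
opcode, yet the whole job is held back:

.. code-block:: python

  # Because opcodes run in order, the second opcode implicitly waits
  # for the dependency declared on the first one
  job = {
    "ops": [
      {"OP_ID": "OP_INSTANCE_FAILOVER", "depend": [[-1, ["success"]]]},
      {"OP_ID": "OP_INSTANCE_STARTUP"},
      ],
  }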

Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and would
need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.


Client-side logic
-----------------

There's at least one implementation of a batched job executor twisted
into the ``burnin`` tool's code. While certainly possible, a client-side
solution should be avoided due to the different clients already in use.
For one, the :doc:`remote API <rapi>` client shouldn't import
non-standard modules. htools are written in Haskell and can't use Python
modules. A batched job executor contains a fair amount of logic. Even if
cleanly abstracted in a (Python) library, sharing code between different
clients is difficult, if not impossible.


.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: