In Ganeti 1.2, operations in a cluster have to be done in a serialized way.
Virtually any operation locks the whole cluster by grabbing the global lock.
Other commands cannot return until all of that work has been done.

By implementing a job queue and granular locking, we can lower the latency of
command execution inside a Ganeti cluster.

20 Job execution—“Life of a Ganeti job”
21 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. The client submits a job. A new job identifier is generated and
   assigned to the job. The job is then automatically replicated to all nodes
   in the cluster. The identifier is returned to the client.
#. A pool of worker threads waits for new jobs. If all are busy, the job has
   to wait and the first worker finishing its work will grab it. Otherwise any
   of the waiting threads will pick up the new job.
#. The client waits for job status updates by calling a waiting RPC function.
   Log messages may be shown to the user. Until the job is started, it can also
   be canceled.
#. As soon as the job is finished, its final result and status can be
   retrieved.
#. If the client archives the job, it gets moved to a history directory.
   This could also be done regularly using a cron script.

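
The worker pool in step 2 can be illustrated with a minimal sketch. It is not
Ganeti's implementation: the function name ``run_jobs``, the status strings and
the use of plain callables as jobs are assumptions made for the example, and
replication and the client RPC are left out.

```python
import queue
import threading


def run_jobs(jobs, num_workers=2):
    """Run callables on a fixed pool of worker threads.

    Returns a dict mapping job ID to a (status, result) tuple. The status
    strings "success" and "error" are illustrative assumptions.
    """
    pending = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()  # blocks until a job (or sentinel) arrives
            if item is None:      # sentinel: no more jobs for this worker
                return
            job_id, func = item
            try:
                outcome = ("success", func())
            except Exception as err:
                outcome = ("error", str(err))
            with results_lock:
                results[job_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    # Job IDs are assigned at submission time, in submission order.
    for job_id, func in enumerate(jobs, start=1):
        pending.put((job_id, func))
    for _ in threads:
        pending.put(None)
    for thread in threads:
        thread.join()
    return results
```

If every worker is busy, a submitted job simply stays in ``pending`` until the
first worker to finish calls ``pending.get()`` again, which mirrors the
behaviour described in the list above.
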
All file operations have to be done atomically by writing to a temporary file
and subsequently renaming it. Except for log messages, every change in a job is
stored and replicated to other nodes.

::

  /var/lib/ganeti/queue/
    job-1 (JSON encoded job description and status)
    lock (Queue managing process opens this file in exclusive mode)
    serial (Last job ID used)
    version (Queue format version)

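
The write-then-rename pattern and the ``serial`` file can be sketched as
follows. This is an illustration under assumptions, not Ganeti's code: the
helper names ``atomic_write``, ``next_job_id`` and ``store_job`` are
hypothetical, and the ``serial`` file is assumed to hold a plain decimal
number. Replication to other nodes is not shown.

```python
import json
import os
import tempfile


def atomic_write(path, data):
    """Write data to path via a temporary file in the same directory."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Renaming within one filesystem is atomic on POSIX systems, so
        # readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def next_job_id(queue_dir):
    """Read the last used job ID from 'serial', increment and store it."""
    serial_path = os.path.join(queue_dir, "serial")
    try:
        with open(serial_path) as serial_file:
            last = int(serial_file.read().strip())
    except FileNotFoundError:
        last = 0  # assumption: an absent file means no job was created yet
    new_id = last + 1
    atomic_write(serial_path, "%d\n" % new_id)
    return new_id


def store_job(queue_dir, job_id, job):
    """Store a JSON encoded job description as job-<id>."""
    atomic_write(os.path.join(queue_dir, "job-%d" % job_id), json.dumps(job))
```

The temporary file is created in the queue directory itself because a rename
is only atomic when source and target are on the same filesystem.
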
Locking in the job queue is a complicated topic. The queue is called from more
than one thread and must be thread-safe. For simplicity, a single lock is used
for the whole job queue.

A more detailed description can be found in doc/locking.txt.

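
A single-lock design of this kind might look like the sketch below. The class
``JobQueue``, the decorator ``locked`` and the status string ``"queued"`` are
hypothetical names chosen for the example; the real locking scheme is the one
described in doc/locking.txt.

```python
import functools
import threading


def locked(method):
    """Run the wrapped method while holding the queue's single lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class JobQueue:
    """Toy in-memory job queue; every public method takes the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}
        self._last_id = 0

    @locked
    def submit(self, ops):
        self._last_id += 1
        self._jobs[self._last_id] = {"ops": list(ops), "status": "queued"}
        return self._last_id

    @locked
    def status(self, job_id):
        return self._jobs[job_id]["status"]
```

Because ``threading.Lock`` is not reentrant, methods wrapped with ``locked``
must not call each other in this sketch. That restriction is the price of the
simplicity gained by using one lock.
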
RPC calls available between Ganeti master and node daemons:

jobqueue_update(file_name, content)
  Writes a file in the job queue directory.

Cleans the job queue directory completely, including archived jobs.

jobqueue_rename(old, new)
  Renames a file in the job queue directory.

RPC between Ganeti clients and the Ganeti master daemon supports the following
operations:

Submits a list of opcodes and returns the job identifier. The identifier is
guaranteed to be unique during the lifetime of a cluster.

WaitForJobChange(job_id, fields, […], timeout)
  This function waits until a job changes or a timeout expires. Whether a job
  counts as changed is determined by the fields passed and the last log
  message.

QueryJobs(job_ids, fields)
  Returns field values for the job identifiers passed.

Cancels the job specified by identifier. This operation may fail if the job
is already running, canceled or finished.

Moves a job into the …/archive/ directory. This operation will fail if the
job has not been canceled or finished.

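
The waiting behaviour of WaitForJobChange can be sketched with a condition
variable. This is a simplified in-memory model, not the RPC itself: the class
``JobTracker`` and its methods are hypothetical, the caller passes the field
values it last saw, and log messages are ignored.

```python
import threading


class JobTracker:
    """In-memory stand-in for the master's view of job fields."""

    def __init__(self):
        self._cond = threading.Condition()
        self._jobs = {}

    def update(self, job_id, **fields):
        """Change job fields and wake up all waiting callers."""
        with self._cond:
            self._jobs.setdefault(job_id, {}).update(fields)
            self._cond.notify_all()

    def wait_for_change(self, job_id, fields, previous, timeout):
        """Wait until the selected fields differ from the previous values.

        Returns the new field values, or None if the timeout expired first.
        """
        def current():
            job = self._jobs.get(job_id, {})
            return [job.get(name) for name in fields]

        with self._cond:
            changed = self._cond.wait_for(
                lambda: current() != previous, timeout)
            return current() if changed else None
```

A client would call ``wait_for_change`` in a loop, feeding the returned values
back in as ``previous``, until the status field reports a final state.
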
104 Job and opcode status
105 ~~~~~~~~~~~~~~~~~~~~~
Each job and each opcode has, at any time, one of the following states:

The job/opcode was submitted, but has not started yet.

The job/opcode is running.

The job/opcode was canceled before it started.

The job/opcode ran and finished successfully.

The job/opcode was aborted with an error.

If the master is aborted while a job is running, the job will be set to the
Error status once the master has started again.

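
The restart rule can be sketched as a pass over all stored jobs. The constant
values and the function name ``recover_after_restart`` are assumptions for
illustration; only the rule itself (a job that was running is set to the Error
status) comes from the text.

```python
# Assumed string values for two of the states; the real identifiers may differ.
STATUS_RUNNING = "running"
STATUS_ERROR = "error"


def recover_after_restart(jobs):
    """Mark jobs that were running when the master stopped as failed.

    jobs maps a job ID to a dict with at least a "status" key. The dict is
    modified in place and the IDs of the changed jobs are returned.
    """
    changed = []
    for job_id, job in sorted(jobs.items()):
        if job["status"] == STATUS_RUNNING:
            job["status"] = STATUS_ERROR
            changed.append(job_id)
    return changed
```

Jobs in any other state are left untouched, so queued jobs can still be picked
up by a worker after the restart.
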
Archived jobs are kept in a separate directory, /var/lib/ganeti/queue/archive/.
Keeping them out of the main queue directory is meant to speed up queue
handling.

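
Archiving then amounts to a rename into the archive directory. In the sketch
below, the function name ``archive_job`` and the set of state names are
assumptions; the check mirrors the rule that only canceled or finished jobs
may be archived.

```python
import os

# Assumed names for the states in which archiving is allowed.
ARCHIVABLE_STATES = {"canceled", "success", "error"}


def archive_job(queue_dir, job_id, status):
    """Move job-<id> from the queue directory into its archive/ subdirectory."""
    if status not in ARCHIVABLE_STATES:
        raise ValueError("job %d is not canceled or finished" % job_id)
    archive_dir = os.path.join(queue_dir, "archive")
    os.makedirs(archive_dir, exist_ok=True)
    name = "job-%d" % job_id
    target = os.path.join(archive_dir, name)
    # A rename within one filesystem is atomic, matching the rule that all
    # file operations on the queue are done atomically.
    os.rename(os.path.join(queue_dir, name), target)
    return target
```
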
The queue has to be completely empty for Ganeti updates with changes in the job