Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti.
It is divided into functional sections.

Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and
acquired by the Processor (in mcpu.py) with the aid of the Ganeti
Locking Library (locking.py). They are acquired in the following order:

* BGL: this is the Big Ganeti Lock; it exists for backwards
  compatibility. New LUs acquire it in a shared fashion and are able
  to execute all together (barring other lock waits), while old LUs
  acquire it exclusively and can only execute one at a time, and not
  at the same time as new LUs.
* Instance locks: can be declared in ExpandNames() or DeclareLocks()
  by an LU, and have the same name as the instance itself. They are
  acquired as a set. Internally the locking library acquires them in
  alphabetical order.
* Node locks: can be declared in ExpandNames() or DeclareLocks() by an
  LU, and have the same name as the node itself. They are acquired as
  a set. Internally the locking library acquires them in alphabetical
  order. Given this order it's possible to safely acquire a set of
  instances, and then the nodes they reside on.

The ConfigWriter (in config.py) is also protected by a SharedLock,
which is shared by functions that read the config and acquired
exclusively by functions that modify it. Since the ConfigWriter calls
rpc.call_upload_file to all nodes to distribute the config without
holding the node locks, this call must be able to execute on the nodes
in parallel with other operations (but not necessarily concurrently
with itself on the same file, as inside the ConfigWriter this is
called with the internal config lock held).
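
The shared/exclusive behaviour described above can be sketched with a
small readers-writer lock. This is an illustrative toy (no fairness
guarantees, simplified API), not Ganeti's actual locking.SharedLock:

```python
import threading

class SharedLock:
    """Toy readers-writer lock: many concurrent shared holders
    (config readers), or one exclusive holder (config writers).
    A sketch only; the real class lives in locking.py."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, shared=False):
        with self._cond:
            if shared:
                # Readers wait only for an active writer.
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                # A writer waits for everyone.
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()
```

Functions that only read the config would call ``acquire(shared=True)``,
while modifying functions would call ``acquire()`` for exclusive access.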


Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. The job queue can be called
from functions called by the queue itself (e.g. logical units), but
special attention must be paid not to create deadlocks or an invalid
state.

The single queue lock is used from all classes involved in the queue
handling. During development we tried to split the locks, but deemed it
too dangerous and difficult at the time. Job queue functions acquiring
the lock can be safely called from the rest of the code, as the lock is
released before leaving the job queue again. Unlocked functions should
only be called from job queue related classes (e.g. in jqueue.py), and
the lock must be acquired beforehand.
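
The locked-public / unlocked-internal split might look like the
following; the class and method names here are made up for
illustration (see jqueue.py for the real code):

```python
import threading

class JobQueue:
    """Sketch of the single-lock pattern: public methods take the
    queue lock themselves; _Unlocked* helpers assume the caller
    already holds it and must never be called from outside."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def AddJob(self, job_id, ops):
        # Public entry point: safe to call from any thread.
        with self._lock:
            return self._UnlockedAddJob(job_id, ops)

    def GetJob(self, job_id):
        with self._lock:
            return self._jobs.get(job_id)

    def _UnlockedAddJob(self, job_id, ops):
        # Internal helper: only valid with self._lock held, so other
        # locked methods can compose it without re-taking the lock.
        self._jobs[job_id] = ops
        return job_id
```

Because the lock is released before every public method returns, the
rest of the code can call these methods freely without deadlocking.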

In the job queue worker (``_JobQueueWorker``), the lock must be
released before calling the LU processor. Otherwise a deadlock can
occur when log messages are added to opcode results.
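
The reason the lock must be dropped first: with a non-reentrant lock,
a feedback callback that itself takes the queue lock would block
forever if the worker still held it while running the opcode. A
hypothetical sketch of the safe ordering:

```python
import threading

queue_lock = threading.Lock()   # non-reentrant, like a queue lock
results = []

def log_feedback(msg):
    # Called from inside opcode execution; takes the queue lock to
    # record the message, so the worker must NOT hold it here.
    with queue_lock:
        results.append(msg)

def run_job(opcode_fn):
    with queue_lock:
        pass  # e.g. mark the job as running
    # Lock released before calling into the LU processor: opcode_fn
    # may invoke log_feedback(), which acquires the lock itself.
    opcode_fn(log_feedback)
    with queue_lock:
        results.append("done")
```

Had ``run_job`` kept ``queue_lock`` held across ``opcode_fn``, the
first ``log_feedback`` call would deadlock.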


Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. In order to avoid
conflicts and/or corruption when an eventual master daemon or another
node daemon is running, it must be held for all job queue operations.

There's one special case for the node daemon running on the master
node. If grabbing the lock in exclusive mode fails on startup, the
code assumes all checks have been done by the process keeping the
lock.
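
One common way to implement such a try-exclusive check on POSIX is a
non-blocking ``flock()``. This sketch is illustrative only (function
name and return convention are made up, and it is Unix-specific), not
the node daemon's actual code:

```python
import fcntl
import os

def try_exclusive_lock(path):
    """Try to take an exclusive flock on *path* without blocking.

    Returns the open fd on success, or None if another process
    already holds the lock -- the startup case described above, where
    we assume the lock holder has performed the checks.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock is held elsewhere; give up immediately instead of
        # blocking the daemon's startup.
        os.close(fd)
        return None
    return fd
```

The lock is released automatically when the fd is closed or the
process exits, so a crashed daemon cannot leave it stuck.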

.. vim: set textwidth=72 :