Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti.
It is divided into functional sections.


Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and acquired by
the Processor (in mcpu.py) with the aid of the Ganeti Locking Library
(locking.py). They are acquired in the following order:

* BGL: this is the Big Ganeti Lock; it exists for backward compatibility. New
  LUs acquire it in shared mode and can all execute together (barring other
  lock waits), while old LUs acquire it exclusively, so they execute only one
  at a time, and never at the same time as new LUs.
* Instance locks: can be declared in ExpandNames() or DeclareLocks() by an LU,
  and have the same name as the instance itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order.
* Node locks: can be declared in ExpandNames() or DeclareLocks() by an LU, and
  have the same name as the node itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order. Given
  this order it is possible to safely acquire a set of instance locks, and
  then the locks of the nodes they reside on.
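
The alphabetical-ordering rule is what makes set acquisition deadlock-free:
any two LUs whose lock sets overlap contend on their first common name
instead of each holding part of the other's set. A minimal sketch of the idea
(this ``LockSet`` is an invented stand-in, not the class in locking.py):

```python
import threading

class LockSet(object):
    """Invented stand-in for set-wise lock acquisition (not locking.py)."""

    def __init__(self, names):
        self._locks = dict((name, threading.Lock()) for name in names)

    def acquire(self, wanted):
        # Sorting gives every caller the same global acquisition order,
        # so overlapping requests cannot deadlock.
        order = sorted(wanted)
        for name in order:
            self._locks[name].acquire()
        return order

    def release(self, wanted):
        for name in wanted:
            self._locks[name].release()

instances = LockSet(["inst1", "inst2", "inst3"])
# Two LUs asking for {inst1, inst2} and {inst1, inst3} both contend on
# "inst1" first, so neither can wait on a lock the other already holds.
held = instances.acquire({"inst2", "inst1"})
instances.release(held)
```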

The ConfigWriter (in config.py) is also protected by a SharedLock, which is
acquired in shared mode by functions that read the config and exclusively by
functions that modify it. Since the ConfigWriter calls rpc.call_upload_file on
all nodes to distribute the config without holding the node locks, this call
must be able to execute on the nodes in parallel with other operations (but
not necessarily concurrently with itself on the same file, as inside the
ConfigWriter it is called with the internal config lock held).
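
The shared/exclusive behaviour can be pictured with a toy reader/writer lock;
Ganeti's actual SharedLock (in locking.py) offers more than this sketch:

```python
import threading

class SimpleSharedLock(object):
    """Toy reader/writer lock illustrating the semantics only."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, shared=True):
        with self._cond:
            if shared:
                # Readers wait only for a writer, then pile in together.
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                # A writer waits until nobody else holds the lock at all.
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()

# Config-reading functions share the lock; a modifying one is exclusive.
lock = SimpleSharedLock()
lock.acquire(shared=True)   # reader
lock.acquire(shared=True)   # many readers may hold it at once
lock.release()
lock.release()
lock.acquire(shared=False)  # writer, blocks out all readers
lock.release()
```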


Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. One must only take care not to call
back into the job queue from functions called by the queue itself (e.g.
logical units).

The single queue lock is used by all classes involved in queue handling.
During development we tried to split it into several locks, but deemed that
too dangerous and difficult at the time. While the lock is held, called
functions must not grab the queue lock again (i.e. call only functions with a
``Unlocked`` suffix).

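The ``Unlocked`` convention can be illustrated with a short sketch (class and
method names below are invented, not the real job queue API): public entry
points take the single lock exactly once, then operate only through helpers
that assume it is already held:

```python
import threading

class SketchQueue(object):
    """Invented example of the single-lock / Unlocked-suffix pattern."""

    def __init__(self):
        self._lock = threading.Lock()   # the single queue lock
        self._jobs = []

    def _AppendUnlocked(self, job):
        # Touches shared state but never grabs self._lock: only safe to
        # call while the caller already holds the lock.
        self._jobs.append(job)

    def AddJob(self, job):
        with self._lock:                # lock taken exactly once
            self._AppendUnlocked(job)

    def AddJobs(self, jobs):
        with self._lock:
            for job in jobs:
                # Calling AddJob() here instead would try to take the
                # non-reentrant lock a second time and deadlock.
                self._AppendUnlocked(job)
```
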
In the job queue worker (``_JobQueueWorker``), the lock must be released before
calling the LU processor. Otherwise a deadlock can occur when log messages are
added to opcode results.


Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. In order to avoid conflicts
and/or corruption while a master daemon or another node daemon may be running,
it must be held for all job queue operations.

There is one special case for the node daemon running on the master node. If
grabbing the lock exclusively fails on startup, the code assumes all checks
have already been done by the process holding the lock.
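
The document does not show how that startup check is implemented; one common
way to get such try-exclusive semantics is a non-blocking flock() on a lock
file, sketched here with invented names:

```python
import fcntl
import os
import tempfile

def try_exclusive(path):
    """Try to take an exclusive, non-blocking lock on path.

    Returns the open fd if we got the lock, or None if another process
    (or another open file description) already holds it -- in which case
    a daemon could assume the holder has done the startup checks.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        os.close(fd)
        return None
    return fd

lockfile = os.path.join(tempfile.gettempdir(), "sketch-queue.lock")
fd = try_exclusive(lockfile)
if fd is None:
    # Somebody else holds the lock: skip the checks they already did.
    pass
```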