Revision 9725b53d
b/Makefile.am

| old | new | |
|---|---|---|
| 136 | 136 | `doc/iallocator.rst \` |
| 137 | 137 | `doc/index.rst \` |
| 138 | 138 | `doc/install.rst \` |
| | 139 | `+ doc/locking.rst \` |
| 139 | 140 | `doc/rapi.rst \` |
| 140 | 141 | `doc/security.rst` |
| 141 | 142 | |
| ... | ... | |
| 202 | 203 | `doc/examples/dumb-allocator \` |
| 203 | 204 | `doc/examples/hooks/ethers \` |
| 204 | 205 | `doc/examples/hooks/ipsec.in \` |
| 205 | | `- doc/locking.txt \` |
| 206 | 206 | `test/testutils.py \` |
| 207 | 207 | `test/mocks.py \` |
| 208 | 208 | `$(dist_TESTS) \` |
b/doc/design-2.0.rst

| old | new | |
|---|---|---|
| 692 | 692 | one thread and must be thread-safe. For simplicity, a single lock is used for |
| 693 | 693 | the whole job queue. |
| 694 | 694 | |
| 695 | | `- A more detailed description can be found in doc/locking.txt.` |
| | 695 | `+ A more detailed description can be found in doc/locking.rst.` |
| 696 | 696 | |
| 697 | 697 | |
| 698 | 698 | Internal RPC |
b/doc/index.rst

| old | new | |
|---|---|---|
| 14 | 14 | `security.rst` |
| 15 | 15 | `design-2.0.rst` |
| 16 | 16 | `design-2.1.rst` |
| | 17 | `+ locking.rst` |
| 17 | 18 | `hooks.rst` |
| 18 | 19 | `iallocator.rst` |
| 19 | 20 | `rapi.rst` |
b/doc/locking.rst (new file)

Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti.
It is divided into functional sections.


Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and acquired by
the Processor (in mcpu.py) with the aid of the Ganeti Locking Library
(locking.py). They are acquired in the following order:

* BGL: the Big Ganeti Lock; it exists for backwards compatibility. New LUs
  acquire it in a shared fashion and are able to execute all together
  (barring other lock waits), while old LUs acquire it exclusively and can
  only execute one at a time, and never at the same time as new LUs.
* Instance locks: can be declared in ExpandNames() or DeclareLocks() by an LU,
  and have the same name as the instance itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order.
* Node locks: can be declared in ExpandNames() or DeclareLocks() by an LU, and
  have the same name as the node itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order. Given
  this order it's possible to safely acquire a set of instances, and then the
  nodes they reside on.
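The acquisition order described above (instance locks before node locks, each set taken alphabetically) can be sketched as follows. This is a minimal illustration of the deadlock-avoidance idea, not Ganeti's actual locking.py API; the class and names are hypothetical.

```python
import threading

class LevelLockManager:
    """Toy lock manager: locks at each level are acquired as a set, in
    alphabetical order, and levels are taken in a fixed order (instances
    before nodes), so concurrent acquirers cannot deadlock."""

    def __init__(self):
        self._locks = {}                 # name -> threading.Lock
        self._meta = threading.Lock()    # protects the _locks dict

    def _get(self, name):
        with self._meta:
            return self._locks.setdefault(name, threading.Lock())

    def acquire_set(self, names):
        # Sorting gives every caller the same global order, so two
        # threads acquiring overlapping sets cannot deadlock.
        acquired = []
        for name in sorted(names):
            lock = self._get(name)
            lock.acquire()
            acquired.append(lock)
        return acquired

    def release_set(self, locks):
        for lock in reversed(locks):
            lock.release()

manager = LevelLockManager()
# An LU first acquires its instance locks as a set, then the node locks:
instance_locks = manager.acquire_set(["inst2.example.com", "inst1.example.com"])
node_locks = manager.acquire_set(["node3.example.com", "node1.example.com"])
manager.release_set(node_locks)
manager.release_set(instance_locks)
```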
The ConfigWriter (in config.py) is also protected by a SharedLock, which is
shared by functions that read the config and acquired exclusively by functions
that modify it. Since the ConfigWriter calls rpc.call_upload_file to all nodes
to distribute the config without holding the node locks, this call must be able
to execute on the nodes in parallel with other operations (but not necessarily
concurrently with itself on the same file, as inside the ConfigWriter it is
called with the internal config lock held).
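The shared/exclusive semantics described for the ConfigWriter's SharedLock can be sketched with a minimal reader-writer lock. This is an illustrative implementation built on `threading.Condition`, not Ganeti's actual SharedLock from locking.py.

```python
import threading

class SharedLock:
    """Minimal reader-writer lock: many shared holders (config readers)
    or a single exclusive holder (config writers)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, shared=False):
        with self._cond:
            if shared:
                # Readers wait only for an active writer.
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                # A writer waits until no one else holds the lock.
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()

config_lock = SharedLock()
config_lock.acquire(shared=True)   # a function reading the config
config_lock.release()
config_lock.acquire(shared=False)  # a function modifying the config
config_lock.release()
```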
Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. The job queue can be called from
functions called by the queue itself (e.g. logical units), but special
attention must be paid not to create deadlocks or an invalid state.

The single queue lock is used by all classes involved in queue handling.
During development we tried to split locks, but deemed it too dangerous and
difficult at the time. Job queue functions acquiring the lock can be safely
called from the rest of the code, as the lock is released before leaving the
job queue again. Unlocked functions should only be called from job-queue
related classes (e.g. in jqueue.py), and the lock must be acquired beforehand.

In the job queue worker (``_JobQueueWorker``), the lock must be released before
calling the LU processor. Otherwise a deadlock can occur when log messages are
added to opcode results.
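The pattern above, where a single lock guards queue state but is released before re-entering long-running code, can be sketched as follows. All names here are hypothetical, chosen only to mirror the description; this is not jqueue.py's actual structure.

```python
import functools
import threading

_queue_lock = threading.RLock()

def locked(fn):
    """Public job-queue entry points take the single queue lock and
    release it again before returning, so they can be called safely
    from anywhere."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _queue_lock:
            return fn(*args, **kwargs)
    return wrapper

class JobQueue:
    def __init__(self):
        self._jobs = {}

    @locked
    def add_job(self, job_id, opcodes):
        self._jobs[job_id] = list(opcodes)

    @locked
    def _pop_job(self):
        return self._jobs.popitem()

    def run_next(self, processor):
        # Take the lock only to update queue state, then drop it before
        # calling the (long-running) processor, so the processor can call
        # back into the queue (e.g. to add log messages to opcode
        # results) without deadlocking.
        job_id, ops = self._pop_job()
        for op in ops:
            processor(job_id, op)   # executed without the queue lock

queue = JobQueue()
queue.add_job("job-1", ["OP_TEST"])
results = []
queue.run_next(lambda job_id, op: results.append((job_id, op)))
```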
Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. In order to avoid conflicts
and/or corruption when an eventual master daemon or another node daemon is
running, it must be held for all job queue operations.

There's one special case for the node daemon running on the master node. If
grabbing the lock in exclusive mode fails on startup, the code assumes all
checks have been done by the process holding the lock.
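The startup behaviour described above, trying to grab the queue lock exclusively and falling back gracefully if another process already holds it, can be sketched with a non-blocking `flock`. The helper name and lock-file path are illustrative, not Ganeti's actual code, and this relies on Linux `flock` semantics (independently opened descriptors of the same file conflict with each other).

```python
import fcntl
import tempfile

def try_exclusive(path):
    """Try to grab an exclusive, non-blocking lock on the given lock
    file. Returns (file_object, True) on success, or (file_object,
    False) if some other holder already has the lock."""
    fobj = open(path, "a")
    try:
        fcntl.flock(fobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fobj, True
    except BlockingIOError:
        # Someone else holds the lock; assume they did the checks.
        return fobj, False

lock_path = tempfile.mkstemp(suffix=".lock")[1]
first, got_first = try_exclusive(lock_path)
second, got_second = try_exclusive(lock_path)
# The first grab succeeds; the second attempt (standing in for another
# daemon) fails and must assume the holder already performed the checks.
fcntl.flock(first, fcntl.LOCK_UN)
first.close()
second.close()
```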
/dev/null (removed file: doc/locking.txt)

The old doc/locking.txt is deleted; its content matches the new
doc/locking.rst above, apart from two typos fixed in the rename
("ExpandNames() o DeclareLocks()" becomes "ExpandNames() or DeclareLocks()"
in the Instance locks and Node locks items).