Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti. It is divided
into functional sections.


Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and acquired by
the Processor (in mcpu.py) with the aid of the Ganeti Locking Library
(locking.py). They are acquired in the following order:

* BGL: this is the Big Ganeti Lock, which exists for backwards compatibility.
  New LUs acquire it in shared mode and can all execute together (barring
  waits on other locks). Old LUs acquire it exclusively and can only execute
  one at a time, and never concurrently with new LUs.
* Instance locks: can be declared in ExpandNames() or DeclareLocks() by an LU,
  and have the same name as the instance itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order.
* Node locks: can be declared in ExpandNames() or DeclareLocks() by an LU, and
  have the same name as the node itself. They are acquired as a set.
  Internally the locking library acquires them in alphabetical order. Given
  this order (instances before nodes), it is possible to safely acquire a set
  of instances and then the nodes they reside on.

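The ordering rule above prevents deadlocks. Locks are taken by level (BGL, then instances, then nodes), and alphabetically within each level. Because every LU follows the same global order, no two LUs can each hold a lock the other is waiting for. The following is a minimal Python sketch of that idea. `OrderedLockManager` and its methods are hypothetical and do not reproduce the locking.py API. For brevity, the BGL is modelled as a plain exclusive lock rather than a shared one.

```python
import threading

# Hypothetical level names, in the acquisition order described above.
LEVEL_ORDER = ("bgl", "instance", "node")


class OrderedLockManager:
    """Toy lock manager holding one threading.Lock per (level, name) pair.

    Illustrative sketch only; not Ganeti's locking.py API. The BGL is
    modelled as an exclusive lock here, although real new-style LUs
    acquire it in shared mode.
    """

    def __init__(self, instances, nodes):
        self._locks = {("bgl", "BGL"): threading.Lock()}
        for name in instances:
            self._locks[("instance", name)] = threading.Lock()
        for name in nodes:
            self._locks[("node", name)] = threading.Lock()

    def acquire(self, wanted):
        """Acquire the locks given as {level: set_of_names}.

        Levels are walked in LEVEL_ORDER and names are sorted within each
        level, so every caller uses one global order. Returns the list of
        (level, name) keys in the order they were acquired.
        """
        acquired = []
        for level in LEVEL_ORDER:
            for name in sorted(wanted.get(level, ())):
                self._locks[(level, name)].acquire()
                acquired.append((level, name))
        return acquired

    def release(self, acquired):
        # Release in the reverse order of acquisition.
        for key in reversed(acquired):
            self._locks[key].release()
```

Consider the request `{"node": {"nodeB", "nodeA"}, "instance": {"inst2", "inst1"}, "bgl": {"BGL"}}`. It acquires BGL, inst1, inst2, nodeA, nodeB in that order, however the request itself is written.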
The ConfigWriter (in config.py) is also protected by a SharedLock. Functions
that read the config acquire it in shared mode, and functions that modify it
acquire it exclusively. The ConfigWriter calls rpc.call_upload_file on all
nodes to distribute the config without holding the node locks. This call must
therefore be able to execute on the nodes in parallel with other operations
(but not necessarily concurrently with itself on the same file, since inside
the ConfigWriter it is called with the internal config lock held).

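The shared/exclusive behaviour described for the ConfigWriter can be sketched with a small reader-writer lock built on `threading.Condition`. This is an illustrative assumption, not Ganeti's `SharedLock` from locking.py. `SimpleSharedLock` and `ToyConfig` are hypothetical names. The sketch gives writers no priority, so a steady stream of readers could starve them.

```python
import threading


class SimpleSharedLock:
    """Minimal shared/exclusive lock (hypothetical; not locking.SharedLock)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # threads currently holding the lock shared
        self._writer = False   # True while one thread holds it exclusively

    def acquire(self, shared):
        with self._cond:
            if shared:
                # Readers only wait for an active writer to finish.
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                # A writer waits until nobody holds the lock at all.
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self):
        with self._cond:
            # If a writer holds the lock, no reader can, so the caller
            # must be that writer; otherwise it is one of the readers.
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


class ToyConfig:
    """Hypothetical config holder guarded the way the text describes."""

    def __init__(self):
        self._lock = SimpleSharedLock()
        self._data = {}

    def get(self, key):
        self._lock.acquire(shared=True)    # readers share the lock
        try:
            return self._data.get(key)
        finally:
            self._lock.release()

    def set(self, key, value):
        self._lock.acquire(shared=False)   # writers hold it exclusively
        try:
            self._data[key] = value
        finally:
            self._lock.release()
```

Any number of `get()` calls can proceed together. A `set()` waits until all of them have released the lock, and blocks new readers while it runs.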

Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. The job queue can also be called from
functions that the queue itself invokes (e.g. logical units), but care must be
taken not to create deadlocks or leave the queue in an invalid state.

The single queue lock is used by all classes involved in queue handling. During
development we tried to split the lock, but deemed it too dangerous and
difficult at the time. Job queue functions that acquire the lock can safely be
called from anywhere else in the code, as the lock is released before control
leaves the job queue again. Unlocked functions should only be called from job
queue related classes (e.g. in jqueue.py), and only with the lock already
acquired.

In the job queue worker (``_JobQueueWorker``), the lock must be released before
calling the LU processor. Otherwise a deadlock can occur when log messages are
added to opcode results.

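The deadlock mentioned above arises because the processor reports log messages back into the queue, which needs the same non-reentrant lock. The following sketch assumes a plain `threading.Lock`. `ToyQueue`, `run_processor` and `worker_run_job` are hypothetical stand-ins, not the real `_JobQueueWorker` code.

```python
import threading


class ToyQueue:
    def __init__(self):
        self.lock = threading.Lock()   # single, non-reentrant queue lock
        self.log = []
        self.results = {}

    def add_log(self, job_id, msg):
        # Called back by the processor while the opcode runs; needs the lock.
        with self.lock:
            self.log.append((job_id, msg))


def run_processor(queue, job_id):
    """Hypothetical stand-in for the LU processor; it logs via the queue."""
    queue.add_log(job_id, "starting")
    queue.add_log(job_id, "done")
    return "ok"


def worker_run_job(queue, job_id):
    # 1. Update the job state while holding the lock.
    with queue.lock:
        queue.results[job_id] = "running"
    # 2. Call the processor WITHOUT the lock: otherwise add_log() would
    #    block forever on the lock this same thread already holds.
    result = run_processor(queue, job_id)
    # 3. Re-acquire the lock to record the outcome.
    with queue.lock:
        queue.results[job_id] = result
    return result
```

Moving `run_processor()` inside the first `with queue.lock:` block would make the first `add_log()` call hang, which is the deadlock the text warns about.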

Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. It must be held for all job
queue operations, in order to avoid conflicts and/or corruption when a master
daemon or another node daemon may also be running.

There is one special case for the node daemon running on the master node. If
acquiring the lock in exclusive mode fails on startup, the code assumes that
all checks have already been done by the process holding the lock.
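This is a minimal sketch of the startup special case. It assumes the node daemon's job queue lock is an `flock()`-style lock on a file. The helper names and the lock file path are hypothetical, and `fcntl` is Unix-only.

```python
import fcntl
import os


def try_lock_queue_exclusive(path):
    """Try to take an exclusive, non-blocking flock() on a lock file.

    Returns the open file descriptor on success, or None if another open
    file description (e.g. another process) already holds the lock.
    Hypothetical helper; not Ganeti's actual code.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def startup_checks(path, run_checks):
    """Run run_checks() only if the exclusive queue lock can be taken."""
    fd = try_lock_queue_exclusive(path)
    if fd is None:
        # Special case from the text: the lock is held elsewhere, so assume
        # the holder has already done the checks and skip them.
        return None
    run_checks()
    # The caller keeps holding the lock for subsequent job queue operations.
    return fd
```

Closing the returned descriptor releases the lock, since `flock()` locks belong to the open file description.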