Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti.
It is divided into functional sections.


Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and
acquired by the Processor (in mcpu.py) with the aid of the Ganeti
Locking Library (locking.py). They are acquired in the following order:

  * BGL: this is the Big Ganeti Lock, which exists for backwards
    compatibility. New LUs acquire it in shared mode and can all
    execute together (barring other lock waits), while old LUs acquire
    it exclusively and can only execute one at a time, never at the
    same time as new LUs.
  * Instance locks: can be declared in ExpandNames() or DeclareLocks()
    by an LU, and have the same name as the instance itself. They are
    acquired as a set. Internally, the locking library acquires them in
    alphabetical order.
  * Node locks: can be declared in ExpandNames() or DeclareLocks() by an
    LU, and have the same name as the node itself. They are acquired as
    a set. Internally, the locking library acquires them in alphabetical
    order. Given this order, it is possible to safely acquire a set of
    instances, and then the nodes they reside on.
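The ordering above can be illustrated with a small sketch. It is not
the code in locking.py: ``acquisition_order`` and ``LockRegistry`` are
hypothetical names, and for brevity every lock (including the BGL) is a
plain exclusive ``threading.Lock`` rather than a shared lock.

```python
import threading
from contextlib import contextmanager


def acquisition_order(instances, nodes):
    """Return (level, name) pairs in the order described above."""
    order = [("bgl", "BGL")]
    order += [("instance", name) for name in sorted(instances)]
    order += [("node", name) for name in sorted(nodes)]
    return order


class LockRegistry:
    """Toy registry holding one plain lock per (level, name) pair."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _get(self, key):
        # Create the lock for this key on first use.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    @contextmanager
    def held(self, instances=(), nodes=()):
        taken = []
        try:
            # BGL first, then instances, then nodes, each set sorted.
            for key in acquisition_order(instances, nodes):
                lock = self._get(key)
                lock.acquire()
                taken.append(lock)
            yield
        finally:
            # Release in reverse order of acquisition.
            for lock in reversed(taken):
                lock.release()
```

Because every caller walks the same global order, two threads that need
overlapping sets of instances and nodes cannot end up waiting on each
other in a cycle.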

The ConfigWriter (in config.py) is also protected by a SharedLock, which
is acquired in shared mode by functions that read the config and
exclusively by functions that modify it. Since the ConfigWriter calls
rpc.call_upload_file on all nodes to distribute the config without
holding the node locks, this call must be able to execute on the nodes
in parallel with other operations (but not necessarily concurrently with
itself on the same file, as inside the ConfigWriter it is called with
the internal config lock held).
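As a rough illustration of the shared/exclusive pattern, the following
sketch guards a dictionary with a minimal readers-writer lock. The
classes ``SimpleSharedLock`` and ``ToyConfig`` are made up for this
example and are much simpler than the real SharedLock and ConfigWriter
(no fairness, no reentrancy, no file distribution).

```python
import threading


class SimpleSharedLock:
    """Minimal shared/exclusive lock (no fairness, not reentrant)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, shared):
        with self._cond:
            if shared:
                # Readers only wait for an active writer.
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                # A writer waits for everyone else to leave.
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


class ToyConfig:
    """Hypothetical config holder guarded by the lock above."""

    def __init__(self):
        self._lock = SimpleSharedLock()
        self._data = {}

    def get(self, key):
        self._lock.acquire(shared=True)
        try:
            return self._data.get(key)
        finally:
            self._lock.release()

    def set(self, key, value):
        self._lock.acquire(shared=False)
        try:
            self._data[key] = value
            # A real writer would distribute the file at this point,
            # holding the config lock but no node locks.
        finally:
            self._lock.release()
```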


Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. The job queue can be called
from functions called by the queue itself (e.g. logical units), but
special attention must be paid not to create deadlocks or an invalid
state.

The single queue lock is used by all classes involved in queue
handling. During development we tried to split the lock, but deemed it
too dangerous and difficult at the time. Job queue functions that
acquire the lock can be safely called from the rest of the code, as the
lock is released before leaving the job queue again. Unlocked functions
should only be called from job queue related classes (e.g. in
jqueue.py), and the lock must be acquired beforehand.
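The split between locked and unlocked functions might look roughly like
the sketch below. ``ToyQueue`` and its method names are invented for
illustration and do not mirror the actual classes in jqueue.py.

```python
import threading


class ToyQueue:
    """Hypothetical queue with one lock shared by all its methods."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}
        self._next_id = 1

    def _new_id_unlocked(self):
        # Unlocked helper: the caller must already hold self._lock.
        assert self._lock.locked()
        job_id = self._next_id
        self._next_id += 1
        return job_id

    def submit(self, ops):
        # Public function: takes the lock and releases it before
        # returning, so it is safe to call from any other code.
        with self._lock:
            job_id = self._new_id_unlocked()
            self._jobs[job_id] = list(ops)
        return job_id

    def count(self):
        with self._lock:
            return len(self._jobs)
```

Outside code only ever calls ``submit()`` and ``count()``; the
``_unlocked`` helper stays private to the queue classes.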

In the job queue worker (``_JobQueueWorker``), the lock must be released
before calling the LU processor. Otherwise a deadlock can occur when log
messages are added to opcode results.
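A minimal sketch of why the lock has to be dropped first, assuming a
non-reentrant lock and a log callback that takes the queue lock itself.
``ToyJob``, ``ToyWorker`` and the ``processor`` callable are
hypothetical stand-ins, not the real worker or processor interfaces.

```python
import threading


class ToyJob:
    def __init__(self, ops):
        self.ops = list(ops)
        self.log = []
        self.status = "queued"


class ToyWorker:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the queue lock

    def append_log(self, job, message):
        # Called back from the processor; takes the queue lock itself.
        with self.lock:
            job.log.append(message)

    def run_job(self, job, processor):
        with self.lock:
            job.status = "running"
        # The lock is released here. If it were still held, the call
        # to append_log() below would block forever.
        for op in job.ops:
            processor(op, lambda msg: self.append_log(job, msg))
        with self.lock:
            job.status = "success"
```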


Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. To avoid conflicts
and/or corruption when a master daemon or another node daemon may be
running, it must be held for all job queue operations.

There's one special case for the node daemon running on the master node.
If grabbing the lock in exclusive mode fails on startup, the code
assumes all checks have been done by the process keeping the lock.
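The startup check could be sketched as a non-blocking attempt to take
an exclusive lock. This example assumes the lock is a file lock taken
with ``fcntl.flock`` on a Unix system, which the text above does not
state; ``try_exclusive_lock`` and the lock file path are hypothetical.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Return an open fd holding the lock, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A caller that gets ``None`` back would skip its own startup checks, on
the assumption that the current lock holder has already performed them.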

.. vim: set textwidth=72 :