Ganeti locking
==============

Introduction
------------

This document describes lock order dependencies in Ganeti.
It is divided into functional sections.


Opcode Execution Locking
------------------------

These locks are declared by Logical Units (LUs) (in cmdlib.py) and acquired by
the Processor (in mcpu.py) with the aid of the Ganeti Locking Library
(locking.py). They are acquired in the following order:

  * BGL: this is the Big Ganeti Lock, which exists for backwards
    compatibility. New LUs acquire it in a shared fashion and are able to
    execute all together (barring other lock waits), while old LUs acquire it
    exclusively and can only execute one at a time, and not at the same time
    as new LUs.
  * Instance locks: can be declared in ExpandNames() or DeclareLocks() by an LU,
    and have the same name as the instance itself. They are acquired as a set.
    Internally the locking library acquires them in alphabetical order.
  * Node locks: can be declared in ExpandNames() or DeclareLocks() by an LU, and
    have the same name as the node itself. They are acquired as a set.
    Internally the locking library acquires them in alphabetical order. Given
    this order it's possible to safely acquire a set of instances, and then the
    nodes they reside on.
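
The ordering rule for instance and node locks can be illustrated with a small,
self-contained sketch. It is not Ganeti's locking API: ``acquire_in_order``,
``acquisition_order``, the level constants and the lock tables are hypothetical
names, and plain ``threading.Lock`` objects stand in for the locks of the real
library. The BGL is left out because it needs a shared mode. The sketch only
shows instance locks being taken before node locks, with names sorted
alphabetically within each level.

```python
import threading
from contextlib import ExitStack

# Hypothetical level numbers; lower levels are always acquired first.
LEVEL_INSTANCE = 1
LEVEL_NODE = 2

# Hypothetical lock tables: one lock per instance or node, keyed by its name.
LOCKS = {
    LEVEL_INSTANCE: {name: threading.Lock() for name in ("inst1", "inst2")},
    LEVEL_NODE: {name: threading.Lock() for name in ("node1", "node2", "node3")},
}


def acquisition_order(wanted):
    """Return (level, name) pairs sorted by level, then alphabetically."""
    return sorted((level, name) for level, names in wanted.items() for name in names)


def acquire_in_order(wanted):
    """Acquire all wanted locks in the global order and return an ExitStack.

    Closing the returned stack releases the locks in reverse order.
    """
    stack = ExitStack()
    try:
        for level, name in acquisition_order(wanted):
            stack.enter_context(LOCKS[level][name])
    except BaseException:
        stack.close()
        raise
    return stack


# The caller may list levels and names in any order; the helper normalises it.
wanted = {LEVEL_NODE: {"node3", "node1"}, LEVEL_INSTANCE: {"inst2", "inst1"}}
order = acquisition_order(wanted)
with acquire_in_order(wanted):
    held = [LOCKS[level][name].locked() for level, name in order]
```

Because every caller goes through the same sorted order, two threads asking for
overlapping sets cannot each hold a lock the other is waiting for.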

The ConfigWriter (in config.py) is also protected by a SharedLock, which is
shared by functions that read the config and acquired exclusively by functions
that modify it. Since the ConfigWriter calls rpc.call_upload_file on all nodes
to distribute the config without holding the node locks, this call must be able
to execute on the nodes in parallel with other operations (but not necessarily
concurrently with itself on the same file, as inside the ConfigWriter this is
called with the internal config lock held).
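
The shared/exclusive pattern described for the ConfigWriter can be sketched as
follows. ``SimpleSharedLock`` and ``Config`` are hypothetical stand-ins written
for this illustration; they are not Ganeti's ``SharedLock`` or ``ConfigWriter``,
and the sketch ignores fairness (a steady stream of readers could starve a
writer).

```python
import threading


class SimpleSharedLock:
    """Hypothetical minimal shared/exclusive lock (not Ganeti's SharedLock)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of current shared holders
        self._writer = False   # True while the lock is held exclusively

    def acquire(self, shared):
        with self._cond:
            if shared:
                # Shared holders only wait for an exclusive holder to leave.
                self._cond.wait_for(lambda: not self._writer)
                self._readers += 1
            else:
                # An exclusive holder waits until nobody holds the lock.
                self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0)
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


class Config:
    """Toy configuration store guarded by the lock above."""

    def __init__(self):
        self._lock = SimpleSharedLock()
        self._data = {"cluster_name": "example"}

    def get(self, key):
        self._lock.acquire(shared=True)    # readers share the lock
        try:
            return self._data[key]
        finally:
            self._lock.release()

    def set(self, key, value):
        self._lock.acquire(shared=False)   # writers hold it exclusively
        try:
            self._data[key] = value
            # Per the text above, distribution to the nodes would happen
            # here: config lock held, but no node locks held.
        finally:
            self._lock.release()


cfg = Config()
cfg.set("cluster_name", "cluster1")
name = cfg.get("cluster_name")
```

Since writers are serialised by the exclusive mode, two distributions of the
same file can never run at the same time in this model, which matches the
reasoning in the paragraph above.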


Job Queue Locking
-----------------

The job queue is designed to be thread-safe. This means that its public
functions can be called from any thread. The job queue can be called from
functions called by the queue itself (e.g. logical units), but special
attention must be paid not to create deadlocks or an invalid state.

The single queue lock is used by all classes involved in the queue handling.
During development we tried to split locks, but deemed it to be too dangerous
and difficult at the time. Job queue functions acquiring the lock can be safely
called from all the rest of the code, as the lock is released before leaving
the job queue again. Unlocked functions should only be called from job queue
related classes (e.g. in jqueue.py) and the lock must be acquired beforehand.
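
The split between locked public functions and unlocked helpers can be sketched
like this. ``ToyJobQueue``, the ``locked`` decorator and all method names are
hypothetical and only illustrate the convention; they are not taken from
jqueue.py.

```python
import functools
import threading


def locked(method):
    """Run a method with the instance's single queue lock held."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class ToyJobQueue:
    def __init__(self):
        self._lock = threading.Lock()   # the single queue lock
        self._jobs = {}
        self._next_id = 1

    def _new_id_unlocked(self):
        """Unlocked helper: the caller must already hold self._lock."""
        # locked() only tells us that some thread holds the lock, which is
        # enough to catch the obvious mistake in this sketch.
        assert self._lock.locked(), "queue lock must be held"
        job_id = self._next_id
        self._next_id += 1
        return job_id

    @locked
    def submit_job(self, ops):
        """Public entry point: takes the lock and releases it on return."""
        job_id = self._new_id_unlocked()
        self._jobs[job_id] = list(ops)
        return job_id

    @locked
    def query_job(self, job_id):
        """Public entry point: safe to call from any thread."""
        return self._jobs.get(job_id)


queue = ToyJobQueue()
first = queue.submit_job(["OP_A"])
second = queue.submit_job(["OP_B", "OP_C"])
```

Public methods never leave the queue with the lock still held, so outside code
can call them freely; helpers with the ``_unlocked`` suffix stay internal.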

In the job queue worker (``_JobQueueWorker``), the lock must be released before
calling the LU processor. Otherwise a deadlock can occur when log messages are
added to opcode results.
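
A minimal sketch of why the worker has to drop the lock first, assuming a
non-reentrant queue lock and a processor that reports log messages back through
a public queue function. ``ToyQueue``, ``run_next``, ``add_log_message`` and
``processor`` are hypothetical names, not the real ``_JobQueueWorker`` code.

```python
import threading


class ToyQueue:
    def __init__(self):
        self._lock = threading.Lock()   # non-reentrant queue lock
        self._pending = [("job1", "OP_TEST")]
        self.log = []

    def add_log_message(self, job_id, message):
        """Public function: acquires the queue lock on its own."""
        with self._lock:
            self.log.append((job_id, message))

    def run_next(self, processor):
        """Hypothetical worker step: fetch a job, run it, store the result."""
        with self._lock:
            job_id, opcode = self._pending.pop(0)
        # The lock is released here, before the processor runs. Calling
        # processor() inside the "with" block above would deadlock as soon
        # as it calls add_log_message(), which needs the same lock.
        result = processor(self, job_id, opcode)
        with self._lock:
            self.log.append((job_id, "result: %s" % result))
        return result


def processor(queue, job_id, opcode):
    """Stand-in for the LU processor; it logs through the queue."""
    queue.add_log_message(job_id, "executing %s" % opcode)
    return "success"


queue = ToyQueue()
result = queue.run_next(processor)
```

The worker takes the lock twice for short bookkeeping steps and holds nothing
while the long-running processor executes.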


Node Daemon Locking
-------------------

The node daemon contains a lock for the job queue. In order to avoid conflicts
and/or corruption when a master daemon or another node daemon may also be
running, it must be held for all job queue operations.

There's one special case for the node daemon running on the master node. If
grabbing the lock in exclusive mode fails on startup, the code assumes all
checks have been done by the process keeping the lock.
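
The startup special case can be sketched as a non-blocking attempt to take an
exclusive lock. This assumes the lock is an advisory file lock taken with
``fcntl.flock`` on a Unix system, which the text above does not state;
``try_exclusive_lock``, ``startup`` and the lock file path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to lock path exclusively without blocking.

    Returns the open file descriptor on success, or None if the lock is
    already held through another open file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def startup(lock_path, run_checks):
    """Return (fd, checks_done_here) for a daemon starting up."""
    fd = try_exclusive_lock(lock_path)
    if fd is None:
        # Another process keeps the lock; assume it already ran the checks.
        return None, False
    run_checks()
    return fd, True


lock_path = os.path.join(tempfile.mkdtemp(), "queue.lock")
ran = []
# The first caller plays the process that already keeps the lock.
holder_fd, holder_checked = startup(lock_path, lambda: ran.append("holder"))
# The second caller plays the node daemon starting on the same machine.
node_fd, node_checked = startup(lock_path, lambda: ran.append("node"))
os.close(holder_fd)   # closing the descriptor releases the lock
```

In this model the second caller skips its checks instead of waiting, which is
the behaviour the paragraph above describes for the master node.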

.. vim: set textwidth=72 :