Revision 5ee09f03

b/doc/design-2.1.rst
changing too much of the core code, while addressing issues and adding new
features and improvements over 2.0, in a timely fashion.

  
12
.. contents:: :depth: 3
12
.. contents:: :depth: 4

  
Objective
=========
......
this is a lightweight framework, for abstracting the different storage
operations, and not for modelling the storage hierarchy.

  
83

  
84
Locking improvements
85
~~~~~~~~~~~~~~~~~~~~
86

  
Current State and shortcomings
++++++++++++++++++++++++++++++

  
The class ``LockSet`` (see ``lib/locking.py``) is a container for one or many
``SharedLock`` instances. It provides an interface to add/remove locks and to
acquire and subsequently release any number of the locks it contains.

  
Locks in a ``LockSet`` are always acquired in alphabetic order. Due to the way
we're using locks for nodes and instances (the single cluster lock isn't
affected by this issue), this can lead to long delays: an operation that needs
only one lock may have to wait for another operation that already holds it
while that operation is itself waiting to acquire further locks held by yet
another operation.

  
The following demonstration assumes four instance locks: ``inst1``,
``inst2``, ``inst3`` and ``inst4``.

  
#. Operation A grabs the lock for instance ``inst4``.
#. Operation B wants to acquire all instance locks in alphabetic order, but it
   has to wait for ``inst4``.
#. Operation C tries to lock ``inst1``, but it has to wait until
   Operation B (which is trying to acquire all locks) releases the lock again.
#. Operation A finishes and releases the lock on ``inst4``. Operation B can
   continue and eventually releases all locks.
#. Operation C can get the ``inst1`` lock and finishes.

  
Technically, there's no need for Operation C to wait for Operation A, and
subsequently Operation B, to finish. Operation B can't continue until
Operation A is done anyway (it has to wait for ``inst4``).
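
The scenario above can be reproduced in a few lines. This is only a sketch: it
uses plain ``threading.Lock`` objects as hypothetical stand-ins for the
``SharedLock`` instances, and the helper ``acquire_in_order`` is invented for
illustration. It probes each lock without blocking to show where a blocking
acquire would get stuck, and which locks would be held in the meantime.

```python
import threading

# Hypothetical stand-ins for the instance locks; the real code manages
# SharedLock objects through LockSet.
locks = {name: threading.Lock()
         for name in ("inst1", "inst2", "inst3", "inst4")}


def acquire_in_order(names):
    """Acquire locks in alphabetic order, stopping at the first busy one.

    Returns (held, blocked_on). A blocking acquire would wait at
    blocked_on while still holding every lock listed in held.
    """
    held = []
    for name in sorted(names):
        if not locks[name].acquire(blocking=False):
            return held, name
        held.append(name)
    return held, None


# Operation A grabs inst4.
locks["inst4"].acquire()

# Operation B wants every instance lock and gets stuck on inst4.
held_by_b, b_waits_for = acquire_in_order(locks)

# Operation C only needs inst1, which B keeps holding while it waits.
c_got_inst1 = locks["inst1"].acquire(blocking=False)

print(held_by_b, b_waits_for, c_got_inst1)
# ['inst1', 'inst2', 'inst3'] inst4 False
```

Operation C is blocked even though the lock it needs is held by an operation
that cannot make progress itself.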

  
Proposed changes
++++++++++++++++

  
119
Non-blocking lock acquiring
120
^^^^^^^^^^^^^^^^^^^^^^^^^^^
121

  
Acquiring locks for OpCode execution is currently always done in blocking
mode. The acquire calls won't return until the lock has successfully been
acquired (or an error occurred, although we won't cover that case here).

  
It must be possible to acquire ``SharedLock`` and ``LockSet`` in a
non-blocking way. Both must support a timeout and give up trying to acquire
the lock(s) after the specified amount of time.
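
As an illustration of what a timeout-aware acquire could look like, here is a
minimal sketch of an exclusive lock built on ``threading.Condition``. The class
``TimedLock`` and its interface are assumptions made for this example. It is
not the ``SharedLock`` implementation and it leaves out shared mode entirely.

```python
import threading


class TimedLock:
    """Toy exclusive lock with an optional acquire timeout (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False

    def acquire(self, timeout=None):
        """Return True once acquired, False if the timeout expired first.

        timeout=None blocks indefinitely, which matches today's behaviour.
        """
        with self._cond:
            # wait_for() returns False if the predicate is still false
            # when the timeout (in seconds) runs out.
            if not self._cond.wait_for(lambda: not self._held, timeout):
                return False
            self._held = True
            return True

    def release(self):
        with self._cond:
            if not self._held:
                raise RuntimeError("release of a lock that is not held")
            self._held = False
            self._cond.notify()


lock = TimedLock()
first = lock.acquire()               # free, so acquired immediately
second = lock.acquire(timeout=0.05)  # already held, gives up after 50 ms
lock.release()
third = lock.acquire(timeout=0.05)   # free again
print(first, second, third)          # True False True
```

Returning a boolean instead of raising keeps the caller in control of what
happens after a timeout. That is one possible design, not a decision made in
this document.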

  
130
Retry acquiring locks
131
^^^^^^^^^^^^^^^^^^^^^
132

  
To prevent other operations from waiting for a long time, as described in the
demonstration above, ``LockSet`` must not hold locks for a prolonged period of
time while trying to acquire two or more locks. Instead, if it fails to
acquire all requested locks, it should release all locks again and sleep for
some time before retrying, using an increasing timeout for each attempt to
acquire all locks.

  
A good timeout value needs to be determined. In any case, ``LockSet`` should
proceed to acquire locks in blocking mode after a few (unsuccessful) attempts
to acquire all requested locks.

  
One proposal for the timeout is to use ``2**tries`` seconds, where ``tries``
is the number of unsuccessful tries.

  
In the demonstration above, this would allow Operation C to continue after
Operation B unsuccessfully tried to acquire all locks and released all
acquired locks (``inst1``, ``inst2`` and ``inst3``) again.
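
The retry strategy described in this section could be sketched as follows.
The function names, the ``max_tries``, ``unit`` and ``pause`` parameters and
the use of plain ``threading.Lock`` objects are assumptions for this example.
The document leaves the actual timeout value and number of attempts open.

```python
import threading
import time


def try_acquire_all(locks, timeout):
    """All-or-nothing attempt, bounded by ``timeout`` seconds in total."""
    deadline = time.monotonic() + timeout
    held = []
    for name in sorted(locks):  # keep the alphabetic acquisition order
        remaining = max(0.0, deadline - time.monotonic())
        if not locks[name].acquire(timeout=remaining):
            # Give everything back so other operations can proceed.
            for got in reversed(held):
                locks[got].release()
            return False
        held.append(name)
    return True


def acquire_with_retries(locks, max_tries=3, unit=1.0, pause=0.1):
    """Retry with a timeout of unit * 2**tries, then fall back to blocking.

    Returns the number of unsuccessful timed attempts.
    """
    for tries in range(max_tries):
        if try_acquire_all(locks, unit * 2 ** tries):
            return tries
        time.sleep(pause)  # sleep for some time before the next attempt
    # After a few unsuccessful attempts, acquire in blocking mode.
    for name in sorted(locks):
        locks[name].acquire()
    return max_tries


locks = {name: threading.Lock() for name in ("inst1", "inst2", "inst3")}
failed_attempts = acquire_with_retries(locks, unit=0.01)
print(failed_attempts)  # 0, since nothing else holds these locks
```

With ``unit=1.0`` the timed attempts use 1, 2 and 4 seconds, matching the
``2**tries`` proposal. The blocking fallback guarantees that an operation
needing many locks eventually gets them.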

  
Other solutions discussed
+++++++++++++++++++++++++

  
There was also some discussion on going one step further and extending the job
queue (see ``lib/jqueue.py``) to select the next task for a worker depending on
whether it can acquire the necessary locks. While this may reduce the number of
necessary worker threads and/or increase throughput on large clusters with many
jobs, it also brings many potential problems with it, such as contention and
increased memory usage. As this would be an extension of the changes proposed
above, it could be implemented at a later point in time, but we decided to stay
with the simpler solution for now.
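
To make the discussed (and not adopted) alternative more concrete, a
lock-aware job selection could look roughly like this. It is a purely
hypothetical sketch: ``pick_runnable_job`` and its data structures do not
exist in ``lib/jqueue.py``, and the bookkeeping of which locks are in use is
assumed to be maintained elsewhere.

```python
def pick_runnable_job(pending, locks_in_use):
    """Return the first pending job whose locks are all free, or None.

    pending is a list of (job_id, set_of_lock_names) in queue order and
    locks_in_use is the set of lock names currently held by running jobs.
    """
    for job_id, needed in pending:
        if needed.isdisjoint(locks_in_use):
            return job_id
    return None


pending = [
    ("job1", {"inst4"}),
    ("job2", {"inst1", "inst4"}),
    ("job3", {"inst1"}),
]
chosen = pick_runnable_job(pending, locks_in_use={"inst4"})
print(chosen)  # job3
```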

  

  
Feature changes
---------------

  
