Revision ca9ccea8 doc/design-2.1.rst

b/doc/design-2.1.rst
161 161
extension of the changes proposed before it could be implemented at a later
162 162
point in time, but we decided to stay with the simpler solution for now.
163 163

  
164
Implementation details
165
++++++++++++++++++++++
166

  
167
``SharedLock`` redesign
168
^^^^^^^^^^^^^^^^^^^^^^^
169

  
170
The current design of ``SharedLock`` is not good for supporting timeouts
171
when acquiring a lock and there are also minor fairness issues in it. We
172
plan to address both with a redesign. A proof of concept implementation was
173
written and resulted in significantly simpler code.
174

  
175
Currently ``SharedLock`` uses two separate queues for shared and exclusive
176
acquires and waiters get to run in turns. This means if an exclusive acquire
177
is released, the lock will allow shared waiters to run and vice versa.
178
Although it's still fair in the end there is a slight bias towards shared
179
waiters in the current implementation. The same implementation with two
180
shared queues can not support timeouts without adding a lot of complexity.
181

  
182
Our proposed redesign changes ``SharedLock`` to have only one single queue.
183
There will be one condition (see Condition_ for a note about performance) in
184
the queue per exclusive acquire and two for all shared acquires (see below for
185
an explanation). The maximum queue length will always be ``2 + (number of
186
exclusive acquires waiting)``. The number of queue entries for shared acquires
187
can vary from 0 to 2.
188

  
189
The two conditions for shared acquires are a bit special. They will be used
190
in turn. When the lock is instantiated, no conditions are in the queue. As
191
soon as the first shared acquire arrives (and there are holder(s) or waiting
192
acquires; see Acquire_), the active condition is added to the queue. Until
193
it becomes the topmost condition in the queue and has been notified, any
194
shared acquire is added to this active condition. When the active condition
195
is notified, the conditions are swapped and further shared acquires are
196
added to the previously inactive condition (which has now become the active
197
condition). After all waiters on the previously active (now inactive) and
198
now notified condition received the notification, it is removed from the
199
queue of pending acquires.
200

  
201
This means shared acquires will skip any exclusive acquire in the queue. We
202
believe it's better to improve parallelization on operations only asking for
203
shared (or read-only) locks. Exclusive operations holding the same lock can
204
not be parallelized.
205

  
206

  
207
Acquire
208
*******
209

  
210
For exclusive acquires a new condition is created and appended to the queue.
211
Shared acquires are added to the active condition for shared acquires and if
212
the condition is not yet on the queue, it's appended.
213

  
214
The next step is to wait for our condition to be on the top of the queue (to
215
guarantee fairness). If the timeout expired, we return to the caller without
216
acquiring the lock. On every notification we check whether the lock has been
217
deleted, in which case an error is returned to the caller.
218

  
219
The lock can be acquired if we're on top of the queue (there is no one else
220
ahead of us). For an exclusive acquire, there must not be other exclusive or
221
shared holders. For a shared acquire, there must not be an exclusive holder.
222
If these conditions are all true, the lock is acquired and we return to the
223
caller. In any other case we wait again on the condition.
224

  
225
If it was the last waiter on a condition, the condition is removed from the
226
queue.
227

  
228
Optimization: There's no need to touch the queue if there are no pending
229
acquires and no current holders. The caller can have the lock immediately.
230

  
231
.. image:: design-2.1-lock-acquire.png
232

  
233

  
234
Release
235
*******
236

  
237
First the lock removes the caller from the internal owner list. If there are
238
pending acquires in the queue, the first (the oldest) condition is notified.
239

  
240
If the first condition was the active condition for shared acquires, the
241
inactive condition will be made active. This ensures fairness with exclusive
242
locks by forcing consecutive shared acquires to wait in the queue.
243

  
244
.. image:: design-2.1-lock-release.png
245

  
246

  
247
Delete
248
******
249

  
250
The caller must either hold the lock in exclusive mode already or the lock
251
must be acquired in exclusive mode. Trying to delete a lock while it's held
252
in shared mode must fail.
253

  
254
After ensuring the lock is held in exclusive mode, the lock will mark itself
255
as deleted and continue to notify all pending acquires. They will wake up,
256
notice the deleted lock and return an error to the caller.
257

  
258

  
259
Condition
260
^^^^^^^^^
261

  
262
Note: This is not necessary for the locking changes above, but it may be a
263
good optimization (pending performance tests).
264

  
265
The existing locking code in Ganeti 2.0 uses Python's built-in
266
``threading.Condition`` class. Unfortunately ``Condition`` implements
267
timeouts by sleeping 1ms to 20ms between tries to acquire the condition lock
268
in non-blocking mode. This requires unnecessary context switches and
269
contention on the CPython GIL (Global Interpreter Lock).
270

  
271
By using POSIX pipes (see ``pipe(2)``) we can use the operating system's
272
support for timeouts on file descriptors (see ``select(2)``). A custom
273
condition class will have to be written for this.
274

  
275
On instantiation the class creates a pipe. After each notification the
276
previous pipe is abandoned and re-created (technically the old pipe needs to
277
stay around until all notifications have been delivered).
278

  
279
All waiting clients of the condition use ``select(2)`` or ``poll(2)`` to
280
wait for notifications, optionally with a timeout. A notification will be
281
signalled to the waiting clients by closing the pipe. If the pipe wasn't
282
closed during the timeout, the waiting function returns to its caller
283
nonetheless.
284

  
164 285

  
165 286
Feature changes
166 287
---------------

Also available in: Unified diff