161 |
161 |
extension of the changes proposed before it could be implemented at a later
|
162 |
162 |
point in time, but we decided to stay with the simpler solution for now.
|
163 |
163 |
|
|
164 |
Implementation details
|
|
165 |
++++++++++++++++++++++
|
|
166 |
|
|
167 |
``SharedLock`` redesign
|
|
168 |
^^^^^^^^^^^^^^^^^^^^^^^
|
|
169 |
|
|
170 |
The current design of ``SharedLock`` is not good for supporting timeouts
|
|
171 |
when acquiring a lock and there are also minor fairness issues in it. We
|
|
172 |
plan to address both with a redesign. A proof of concept implementation was
|
|
173 |
written and resulted in significantly simpler code.
|
|
174 |
|
|
175 |
Currently ``SharedLock`` uses two separate queues for shared and exclusive
|
|
176 |
acquires and waiters get to run in turns. This means if an exclusive acquire
|
|
177 |
is released, the lock will allow shared waiters to run and vice versa.
|
|
178 |
Although it's still fair in the end there is a slight bias towards shared
|
|
179 |
waiters in the current implementation. The same implementation with two
|
|
180 |
shared queues can not support timeouts without adding a lot of complexity.
|
|
181 |
|
|
182 |
Our proposed redesign changes ``SharedLock`` to have only one single queue.
|
|
183 |
There will be one condition (see Condition_ for a note about performance) in
|
|
184 |
the queue per exclusive acquire and two for all shared acquires (see below for
|
|
185 |
an explanation). The maximum queue length will always be ``2 + (number of
|
|
186 |
exclusive acquires waiting)``. The number of queue entries for shared acquires
|
|
187 |
can vary from 0 to 2.
|
|
188 |
|
|
189 |
The two conditions for shared acquires are a bit special. They will be used
|
|
190 |
in turn. When the lock is instantiated, no conditions are in the queue. As
|
|
191 |
soon as the first shared acquire arrives (and there are holder(s) or waiting
|
|
192 |
acquires; see Acquire_), the active condition is added to the queue. Until
|
|
193 |
it becomes the topmost condition in the queue and has been notified, any
|
|
194 |
shared acquire is added to this active condition. When the active condition
|
|
195 |
is notified, the conditions are swapped and further shared acquires are
|
|
196 |
added to the previously inactive condition (which has now become the active
|
|
197 |
condition). After all waiters on the previously active (now inactive) and
|
|
198 |
now notified condition received the notification, it is removed from the
|
|
199 |
queue of pending acquires.
|
|
200 |
|
|
201 |
This means shared acquires will skip any exclusive acquire in the queue. We
|
|
202 |
believe it's better to improve parallelization on operations only asking for
|
|
203 |
shared (or read-only) locks. Exclusive operations holding the same lock can
|
|
204 |
not be parallelized.
|
|
205 |
|
|
206 |
|
|
207 |
Acquire
|
|
208 |
*******
|
|
209 |
|
|
210 |
For exclusive acquires a new condition is created and appended to the queue.
|
|
211 |
Shared acquires are added to the active condition for shared acquires and if
|
|
212 |
the condition is not yet on the queue, it's appended.
|
|
213 |
|
|
214 |
The next step is to wait for our condition to be on the top of the queue (to
|
|
215 |
guarantee fairness). If the timeout expired, we return to the caller without
|
|
216 |
acquiring the lock. On every notification we check whether the lock has been
|
|
217 |
deleted, in which case an error is returned to the caller.
|
|
218 |
|
|
219 |
The lock can be acquired if we're on top of the queue (there is no one else
|
|
220 |
ahead of us). For an exclusive acquire, there must not be other exclusive or
|
|
221 |
shared holders. For a shared acquire, there must not be an exclusive holder.
|
|
222 |
If these conditions are all true, the lock is acquired and we return to the
|
|
223 |
caller. In any other case we wait again on the condition.
|
|
224 |
|
|
225 |
If it was the last waiter on a condition, the condition is removed from the
|
|
226 |
queue.
|
|
227 |
|
|
228 |
Optimization: There's no need to touch the queue if there are no pending
|
|
229 |
acquires and no current holders. The caller can have the lock immediately.
|
|
230 |
|
|
231 |
.. image:: design-2.1-lock-acquire.png
|
|
232 |
|
|
233 |
|
|
234 |
Release
|
|
235 |
*******
|
|
236 |
|
|
237 |
First the lock removes the caller from the internal owner list. If there are
|
|
238 |
pending acquires in the queue, the first (the oldest) condition is notified.
|
|
239 |
|
|
240 |
If the first condition was the active condition for shared acquires, the
|
|
241 |
inactive condition will be made active. This ensures fairness with exclusive
|
|
242 |
locks by forcing consecutive shared acquires to wait in the queue.
|
|
243 |
|
|
244 |
.. image:: design-2.1-lock-release.png
|
|
245 |
|
|
246 |
|
|
247 |
Delete
|
|
248 |
******
|
|
249 |
|
|
250 |
The caller must either hold the lock in exclusive mode already or the lock
|
|
251 |
must be acquired in exclusive mode. Trying to delete a lock while it's held
|
|
252 |
in shared mode must fail.
|
|
253 |
|
|
254 |
After ensuring the lock is held in exclusive mode, the lock will mark itself
|
|
255 |
as deleted and continue to notify all pending acquires. They will wake up,
|
|
256 |
notice the deleted lock and return an error to the caller.
|
|
257 |
|
|
258 |
|
|
259 |
Condition
|
|
260 |
^^^^^^^^^
|
|
261 |
|
|
262 |
Note: This is not necessary for the locking changes above, but it may be a
|
|
263 |
good optimization (pending performance tests).
|
|
264 |
|
|
265 |
The existing locking code in Ganeti 2.0 uses Python's built-in
|
|
266 |
``threading.Condition`` class. Unfortunately ``Condition`` implements
|
|
267 |
timeouts by sleeping 1ms to 20ms between tries to acquire the condition lock
|
|
268 |
in non-blocking mode. This requires unnecessary context switches and
|
|
269 |
contention on the CPython GIL (Global Interpreter Lock).
|
|
270 |
|
|
271 |
By using POSIX pipes (see ``pipe(2)``) we can use the operating system's
|
|
272 |
support for timeouts on file descriptors (see ``select(2)``). A custom
|
|
273 |
condition class will have to be written for this.
|
|
274 |
|
|
275 |
On instantiation the class creates a pipe. After each notification the
|
|
276 |
previous pipe is abandoned and re-created (technically the old pipe needs to
|
|
277 |
stay around until all notifications have been delivered).
|
|
278 |
|
|
279 |
All waiting clients of the condition use ``select(2)`` or ``poll(2)`` to
|
|
280 |
wait for notifications, optionally with a timeout. A notification will be
|
|
281 |
signalled to the waiting clients by closing the pipe. If the pipe wasn't
|
|
282 |
closed during the timeout, the waiting function returns to its caller
|
|
283 |
nonetheless.
|
|
284 |
|
164 |
285 |
|
165 |
286 |
Feature changes
|
166 |
287 |
---------------
|