Revision 5eeb7168
b/doc/design-daemons.rst

MasterD will cease to exist as a daemon on its own at this point, but not
before.

WConfD details
--------------

WConfD will communicate with its clients through a Unix domain socket for both
configuration management and locking. Clients can issue multiple RPC calls
through one socket. For each such call the client sends a JSON request
document with a remote function name and data for its arguments. The server
replies with a JSON response document containing either the result of the
call or an error signalling a failure.

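As an illustration, a minimal Python client for such a protocol could look as
follows; the socket path, the newline-delimited framing and the JSON field
names are assumptions made for this sketch, not something prescribed by this
design::

  import json
  import socket

  WCONFD_SOCKET = "/var/run/ganeti/wconfd.sock"  # hypothetical path

  class WConfdClient(object):
    """Toy client issuing JSON RPC calls over one Unix domain socket."""

    def __init__(self, path=WCONFD_SOCKET):
      self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
      self._sock.connect(path)
      self._in = self._sock.makefile("r")

    def call(self, method, args):
      """Send one request document and return the decoded result."""
      request = {"method": method, "args": args}
      self._sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
      reply = json.loads(self._in.readline())
      if not reply["success"]:
        raise RuntimeError(reply["result"])
      return reply["result"]
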
There will be a special RPC call for identifying a client when connecting to
WConfD. The client will tell WConfD its job number and process ID. WConfD will
fail any other RPC calls before a client identifies this way.

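With the sketch above, such an identification call could be the first call
issued on a fresh connection; the RPC name and argument order are purely
illustrative::

  import os

  def identify(client, job_id):
    """Hypothetical identification call: job number plus process ID."""
    return client.call("IdentifyClient", [job_id, os.getpid()])
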
Any state associated with client processes will be mirrored on persistent
storage and linked to the identity of processes so that the WConfD daemon will
be able to resume its operation at any point after a restart or a crash. WConfD
will track each client's process start time along with its process ID to be
able to detect if a process dies and its process ID is reused. WConfD will clear
all locks and other state associated with a client if it detects its process
no longer exists.

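One way to detect a dead or recycled client process, assuming a Linux /proc
filesystem is available, is to compare the recorded start time with the
current one (field 22 of ``/proc/<pid>/stat`` is the start time in clock
ticks since boot); the helper names below are illustrative::

  def proc_start_time(pid):
    """Return the start time of a process, or None if it no longer exists.

    The naive parsing below assumes the process name contains no spaces.
    """
    try:
      with open("/proc/%d/stat" % pid) as statfile:
        return int(statfile.read().split()[21])
    except (EnvironmentError, ValueError, IndexError):
      return None

  def client_is_dead(pid, recorded_start_time):
    """True if the process no longer exists or its PID has been reused."""
    return proc_start_time(pid) != recorded_start_time
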
Configuration management
++++++++++++++++++++++++

The new configuration management protocol will be implemented in the following
steps:

#. Reimplement all current methods of ``ConfigWriter`` for reading and writing
   the configuration of a cluster in WConfD.
#. Expose each of those functions in WConfD as an RPC function. This will allow
   easy future extensions or modifications.
#. Replace ``ConfigWriter`` with a stub (preferably automatically generated
   from the Haskell code) that will contain the same methods as the current
   ``ConfigWriter`` and delegate all calls to its methods to WConfD; a sketch
   of such a stub follows below.

After this step it'll be possible to access the configuration from separate
processes.

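The stub mentioned in the last step could, for instance, delegate each method
to an RPC call of the same name; the method names below are only examples,
and the real stub would preferably be generated from the Haskell code::

  class ConfigWriterStub(object):
    """Illustrative stub forwarding ConfigWriter methods to WConfD."""

    def __init__(self, client):
      self._client = client  # a connected WConfdClient, see above

    def GetNodeList(self):
      return self._client.call("GetNodeList", [])

    def GetInstanceInfo(self, instance_name):
      return self._client.call("GetInstanceInfo", [instance_name])
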
Future aims:

- Optionally refactor the RPC calls to reduce their number or improve their
  efficiency (for example by obtaining a larger set of data instead of
  querying items one by one).

Locking
+++++++

The new locking protocol will be implemented as follows:

Re-implement the current locking mechanism in WConfD and expose it for RPC
calls. All current locks will be mapped into a data structure that will
uniquely identify them (storing a lock's level together with its name).

WConfD will impose a linear order on locks. The order will be compatible
with the current ordering of lock levels so that existing code will work
without changes.

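For illustration, such a lock identity and a compatible linear order could be
as simple as a (level, name) pair compared lexicographically; the level names
and their relative order below are placeholders, not the definitive mapping::

  LEVEL_CLUSTER, LEVEL_INSTANCE, LEVEL_NODEGROUP, LEVEL_NODE = range(4)

  def lock_id(level, name):
    """Uniquely identify a lock by its level together with its name."""
    return (level, name)

  def may_request(held, requested):
    """A lock may only be requested if it is greater than all held locks."""
    return all(lock < requested for lock in held)
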
WConfD will keep the set of currently held locks for each client. The
protocol will allow the following operations on the set:

*Update:*
  Update the current set of locks according to a given list. The list contains
  locks and their desired level (release / shared / exclusive). To prevent
  deadlocks, WConfD will check that all newly requested locks (or already held
  locks requested to be upgraded to *exclusive*) are greater in the sense of
  the linear order than all currently held locks, and fail the operation if
  not. Only the locks in the list will be updated; other locks already held
  will be left intact. If the operation fails, the client's lock set will be
  left intact. A sketch of the deadlock check follows after this list.
*Opportunistic union:*
  Add as many locks as possible from a given set to the current set within a
  given timeout. WConfD will again check the proper order of locks and
  acquire only the ones that are allowed wrt. the current set. Returns the
  set of acquired locks, possibly empty. Immediate. Never fails. (It would also
  be possible to extend the operation to try to wait until a given number of
  locks is available, or a given timeout elapses.)
*List:*
  List the current set of held locks. Immediate, never fails.
*Intersection:*
  Retain only a given set of locks in the current one. This function is
  provided for convenience; it is redundant wrt. *list* and *update*. Immediate,
  never fails.

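The following sketch shows how *update* and *opportunistic union* could
behave with respect to the linear order; it only models a single client's
lock set, so contention between clients and the timeout handling are left
out, and all names are illustrative::

  def update_locks(held, changes):
    """Apply ``changes`` (lock -> "release"/"shared"/"exclusive") to ``held``.

    Fails without modifying ``held`` if a newly requested lock, or a held
    lock upgraded to exclusive, is not greater than every other held lock.
    """
    for lock, level in changes.items():
      newly_requested = level != "release" and lock not in held
      upgraded = level == "exclusive" and held.get(lock) == "shared"
      if newly_requested or upgraded:
        if any(other >= lock for other in held if other != lock):
          raise ValueError("out-of-order request for %r" % (lock,))
    result = dict(held)
    for lock, level in changes.items():
      if level == "release":
        result.pop(lock, None)
      else:
        result[lock] = level
    return result

  def opportunistic_union(held, wanted):
    """Acquire as many of ``wanted`` (as shared) as the order permits."""
    acquired = []
    for lock in sorted(wanted):
      if all(other < lock for other in held):
        held[lock] = "shared"
        acquired.append(lock)
    return acquired
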
After this step it'll be possible to use locks from jobs running as separate
processes.

The above set of operations allows clients to use various work-flows. In
particular:

Pessimistic strategy:
  Lock all potentially relevant resources (for example all nodes), determine
  which will be needed, and release all the others.
Optimistic strategy:
  Determine what locks need to be acquired without holding any. Lock the
  required set of locks. Determine the set of required locks again and check if
  they are all held. If not, release everything and restart. A sketch of this
  strategy follows below.

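A client-side sketch of the optimistic strategy, using a hypothetical
``UpdateLocks`` RPC and a caller-supplied ``compute_needed`` function that
returns the set of required lock identifiers, could look like this::

  def acquire_optimistically(client, compute_needed):
    """Retry until every lock that turns out to be needed is actually held."""
    while True:
      needed = compute_needed()  # determined without holding any locks
      request = [(lock, "shared") for lock in sorted(needed)]
      client.call("UpdateLocks", [request])
      if compute_needed() <= needed:  # requirements still covered?
        return needed
      # a required lock is missing: release everything and start over
      client.call("UpdateLocks", [[(lock, "release") for lock in needed]])
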
.. COMMENTED OUT:
    Start with the smallest set of locks and when determining what more
    relevant resources will be needed, expand the set. If an *union* operation
    fails, release all locks, acquire the desired union and restart the
    operation so that all preconditions and possible concurrent changes are
    checked again.

Future aims:

- Add more fine-grained locks to prevent unnecessary blocking of jobs. This
  could include locks on parameters of entities or locks on their states (so
  that a node remains online, but otherwise can change, etc.). In particular,
  adding, moving and removing instances currently blocks the whole node.
- Add checks that all modified configuration parameters belong to entities
  the client has locked and log violations.
- Make the above checks mandatory.
- Automate optimistic locking and checking the locks in logical units.
  For example, this could be accomplished by allowing some of the initial
  phases of `LogicalUnit` (such as `ExpandNames` and `DeclareLocks`) to be run
  repeatedly, checking if the set of locks requested the second time is
  contained in the set acquired after the first pass.
- Add the possibility for a job to reserve hardware resources such as disk
  space or memory on nodes. Most likely as a new, special kind of instance
  that would only block its resources and could later be converted to a
  regular instance. This would allow long-running jobs such as instance
  creation or move to lock the corresponding nodes, acquire the resources and
  turn the locks into shared ones, keeping an exclusive lock only on the
  instance.
- Use a more sophisticated algorithm for preventing deadlocks, such as a
  `wait-for graph`_ (a sketch of cycle detection on such a graph follows
  below). This would allow fewer *union* failures and more optimistic,
  scalable acquisition of locks.

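As an example of what deadlock detection on a wait-for graph could look like,
the sketch below checks a mapping from each client to the set of clients it
is waiting for; it is not part of this design::

  def has_deadlock(wait_for):
    """Return True iff the wait-for graph contains a cycle."""
    visited, on_stack = set(), set()

    def visit(node):
      if node in on_stack:
        return True  # we came back to a node on the current path: a cycle
      if node in visited:
        return False
      visited.add(node)
      on_stack.add(node)
      if any(visit(successor) for successor in wait_for.get(node, ())):
        return True
      on_stack.remove(node)
      return False

    return any(visit(node) for node in wait_for)
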
.. _`wait-for graph`: http://en.wikipedia.org/wiki/Wait-for_graph


Further considerations
======================