Revision 5eeb7168

b/doc/design-daemons.rst
   MasterD will cease to exist as a daemon on its own at this point, but not
   before.

WConfD details
--------------

WConfD will communicate with its clients through a Unix domain socket for both
configuration management and locking. Clients can issue multiple RPC calls
through one socket. For each such call the client sends a JSON request
document with a remote function name and data for its arguments. The server
replies with a JSON response document containing either the result or an
error signalling a failure.
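
For illustration, a client issuing such calls could look like the sketch
below. The socket path, the newline-delimited framing and the request and
response field names are assumptions made for this example only; the actual
wire format is defined by WConfD::

  import json
  import socket

  # Assumed socket path and framing, for illustration only.
  WCONFD_SOCKET = "/var/run/ganeti/wconfd.sock"

  class WConfdClient(object):
      """Minimal client able to issue several RPC calls over one socket."""

      def __init__(self, path=WCONFD_SOCKET):
          self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
          self.sock.connect(path)
          self.reader = self.sock.makefile("r")

      def call(self, method, args):
          """Send one JSON request document and return the reply's result."""
          request = {"method": method, "args": args}
          self.sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
          reply = json.loads(self.reader.readline())
          if not reply.get("success", False):
              raise RuntimeError(reply.get("error"))
          return reply.get("result")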

  
There will be a special RPC call for identifying a client when connecting to
WConfD. The client will tell WConfD its job number and process ID. WConfD will
fail any other RPC calls before a client identifies this way.
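
Building on the sketch above, the identification step would then be the first
call issued on a fresh connection (``IdentifyClient`` is a placeholder, not
the final RPC name)::

  import os

  client = WConfdClient()
  # Identify this process to WConfD before any other call is allowed.
  client.call("IdentifyClient", {"job_id": 1234, "pid": os.getpid()})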

  
Any state associated with client processes will be mirrored on persistent
storage and linked to the identity of processes so that the WConfD daemon will
be able to resume its operation at any point after a restart or a crash. WConfD
will track each client's process start time along with its process ID to be
able to detect if a process dies and its process ID is reused. WConfD will
clear all locks and other state associated with a client if it detects that its
process no longer exists.
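
As a minimal illustration of the idea, assuming a Linux ``/proc`` filesystem,
a recorded ``(process ID, start time)`` pair could be checked for liveness as
follows (this is not WConfD's actual implementation)::

  def process_start_time(pid):
      """Return the kernel's start time for ``pid``, or None if it is gone.

      Field 22 of /proc/<pid>/stat is the start time in clock ticks since
      boot; it is parsed after the closing parenthesis of the command name,
      which may itself contain spaces.
      """
      try:
          with open("/proc/%d/stat" % pid) as stat_file:
              data = stat_file.read()
      except IOError:
          return None
      fields = data[data.rfind(")") + 2:].split()
      return int(fields[19])  # field 22 overall, field 20 after the comm field

  def client_is_alive(pid, recorded_start_time):
      """A client is alive only if its PID exists and was not reused."""
      return process_start_time(pid) == recorded_start_time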

  
Configuration management
++++++++++++++++++++++++

The new configuration management protocol will be implemented in the following
steps:

#. Reimplement all current methods of ``ConfigWriter`` for reading and writing
   the configuration of a cluster in WConfD.
#. Expose each of those functions in WConfD as an RPC function. This will allow
   easy future extensions or modifications.
#. Replace ``ConfigWriter`` with a stub (preferably automatically generated
   from the Haskell code) that will contain the same methods as the current
   ``ConfigWriter`` and delegate all calls to its methods to WConfD, as
   sketched below.
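
Hand-written here for illustration (the intention is to generate the real
stub), the delegation could look as follows, reusing the placeholder RPC
client from above; the method and RPC names are examples only::

  class ConfigWriter(object):
      """Drop-in stand-in for the old ConfigWriter, delegating to WConfD."""

      def __init__(self, wconfd_client):
          self._client = wconfd_client

      def GetClusterName(self):
          return self._client.call("GetClusterName", {})

      def GetNodeList(self):
          return self._client.call("GetNodeList", {})

      def AddInstance(self, instance):
          return self._client.call("AddInstance", {"instance": instance})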

  
After this step it'll be possible to access the configuration from separate
processes.

Future aims:

-  Optionally refactor the RPC calls to reduce their number or improve their
   efficiency (for example by obtaining a larger set of data instead of
   querying items one by one).

Locking
+++++++

The new locking protocol will be implemented as follows:

Re-implement the current locking mechanism in WConfD and expose it for RPC
calls. All current locks will be mapped into a data structure that will
uniquely identify them (storing a lock's level together with its name).

WConfD will impose a linear order on locks. The order will be compatible
with the current ordering of lock levels so that existing code will work
without changes.
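
One way to picture such identifiers is as ``(level, name)`` pairs compared
lexicographically; the concrete representation and the numeric level values
below are placeholders, not WConfD's actual constants::

  # All locks on a lower level precede all locks on a higher level, and locks
  # on the same level are ordered by name.
  LEVEL_CLUSTER, LEVEL_NODEGROUP, LEVEL_NODE = 0, 1, 2

  def lock_id(level, name):
      """A lock identifier; tuples compare lexicographically, which yields
      the linear order used by the deadlock check."""
      return (level, name)

  assert lock_id(LEVEL_NODEGROUP, "group1") < lock_id(LEVEL_NODE, "node1")
  assert lock_id(LEVEL_NODE, "node1") < lock_id(LEVEL_NODE, "node2")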

  
WConfD will keep the set of currently held locks for each client. The
protocol will allow the following operations on the set:

*Update:*
  Update the current set of locks according to a given list. The list contains
  locks and their desired level (release / shared / exclusive). To prevent
  deadlocks, WConfD will check that all newly requested locks (or already held
  locks requested to be upgraded to *exclusive*) are greater in the sense of
  the linear order than all currently held locks, and fail the operation if
  not. Only the locks in the list will be updated; other locks already held
  will be left intact. If the operation fails, the client's lock set will be
  left intact.
*Opportunistic union:*
  Add as many locks as possible from a given set to the current set within a
  given timeout. WConfD will again check the proper order of locks and
  acquire only the ones that are allowed wrt. the current set. Returns the
  set of acquired locks, possibly empty. Immediate. Never fails. (It would also
  be possible to extend the operation to try to wait until a given number of
  locks is available, or a given timeout elapses.)
*List:*
  List the current set of held locks. Immediate, never fails.
*Intersection:*
  Retain only a given set of locks in the current one. This function is
  provided for convenience; it is redundant wrt. *list* and *update*. Immediate,
  never fails.
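
Using the placeholder client and lock identifiers from the earlier sketches,
the four operations could be exercised as follows (the RPC names and argument
shapes are again assumptions, not the final protocol)::

  node1 = lock_id(LEVEL_NODE, "node1")
  node2 = lock_id(LEVEL_NODE, "node2")

  # Update: request node1 exclusively and node2 shared in a single call.
  client.call("UpdateLocks", {"locks": [[node1, "exclusive"],
                                        [node2, "shared"]]})

  # Opportunistic union: acquire whatever subset is currently available.
  acquired = client.call("OpportunisticLockUnion",
                         {"locks": [node1, node2], "timeout": 10.0})

  # List: inspect what this client currently holds.
  held = client.call("ListLocks", {})

  # Intersection: keep only node1, releasing everything else held.
  client.call("IntersectLocks", {"locks": [node1]})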

  
After this step it'll be possible to use locks from jobs running as separate
processes.

The above set of operations allows the clients to use various work-flows. In
particular:

Pessimistic strategy:
  Lock all potentially relevant resources (for example all nodes), determine
  which will be needed, and release all the others.
Optimistic strategy:
  Determine what locks need to be acquired without holding any. Lock the
  required set of locks. Determine the set of required locks again and check if
  they are all held. If not, release everything and restart.
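
The optimistic strategy, for example, reduces to a simple retry loop. The
sketch below reuses the placeholder RPCs from above and assumes a
caller-supplied ``compute_required_locks`` helper::

  def run_optimistically(client, compute_required_locks, do_work):
      """Retry until the locks computed while holding locks are all held."""
      while True:
          wanted = compute_required_locks()
          # Acquire in ascending order to satisfy the deadlock check.
          client.call("UpdateLocks",
                      {"locks": [[lock, "exclusive"]
                                 for lock in sorted(wanted)]})
          if set(compute_required_locks()) <= set(wanted):
              return do_work()
          # Something changed in between: release everything and start over.
          client.call("IntersectLocks", {"locks": []})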

  
.. COMMENTED OUT:
  Start with the smallest set of locks and when determining what more
  relevant resources will be needed, expand the set. If a *union* operation
  fails, release all locks, acquire the desired union and restart the
  operation so that all preconditions and possible concurrent changes are
  checked again.

Future aims:

-  Add more fine-grained locks to prevent unnecessary blocking of jobs. This
   could include locks on parameters of entities or locks on their states (so
   that a node remains online, but otherwise can change, etc.). In particular,
   adding, moving and removing instances currently blocks the whole node.
-  Add checks that all modified configuration parameters belong to entities
   the client has locked, and log violations.
-  Make the above checks mandatory.
-  Automate optimistic locking and checking the locks in logical units.
   For example, this could be accomplished by allowing some of the initial
   phases of `LogicalUnit` (such as `ExpandNames` and `DeclareLocks`) to be run
   repeatedly, checking if the set of locks requested the second time is
   contained in the set acquired after the first pass.
-  Add the possibility for a job to reserve hardware resources such as disk
   space or memory on nodes, most likely as a new, special kind of instance
   that would only block its resources and could later be converted into a
   regular instance. This would allow long-running jobs such as instance
   creation or move to lock the corresponding nodes, acquire the resources and
   turn the locks into shared ones, keeping an exclusive lock only on the
   instance.
-  Use a more sophisticated algorithm for preventing deadlocks, such as a
   `wait-for graph`_. This would allow fewer *union* failures and more
   optimistic, scalable acquisition of locks.

.. _`wait-for graph`: http://en.wikipedia.org/wiki/Wait-for_graph


Further considerations
======================

