Revision aa355c79 doc/design-2.1.rst

b/doc/design-2.1.rst
292 292
caller nonetheless.
293 293

  
294 294

  
295
Node daemon availability
296
~~~~~~~~~~~~~~~~~~~~~~~~
297

  
298
Current State and shortcomings
299
++++++++++++++++++++++++++++++
300

  
301
Currently, when a Ganeti node suffers serious system disk damage, the
302
migration/failover of an instance may not correctly shut down the virtual
303
machine on the broken node, causing instance duplication. The ``gnt-node
304
powercycle`` command can be used to force a node reboot and thus to
305
avoid duplicated instances. This command relies on node daemon
306
availability, though, and thus can fail if the node daemon has some
307
pages swapped out of RAM, for example.
308

  
309

  
310
Proposed changes
311
++++++++++++++++
312

  
313
The proposed solution forces the node daemon to run exclusively in RAM. It
314
uses Python ctypes to call ``mlockall(MCL_CURRENT | MCL_FUTURE)`` on
315
the node daemon process and all its children. In addition, another log
316
handler has been implemented for the node daemon to redirect to
317
``/dev/console`` messages that cannot be written to the logfile.
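
As an illustration of the ``mlockall`` call described above, here is a minimal sketch of how it might be issued through ctypes. The function name ``lock_process_memory`` is hypothetical, and the ``MCL_CURRENT``/``MCL_FUTURE`` values are the Linux ones from ``<sys/mman.h>``; this is not taken from the Ganeti sources.

```python
import ctypes
import ctypes.util
import logging
import os

# Assumption: Linux values from <sys/mman.h>; other platforms may differ.
MCL_CURRENT = 1
MCL_FUTURE = 2


def lock_process_memory(flags=MCL_CURRENT | MCL_FUTURE):
    """Try to pin all current and future pages of this process in RAM.

    Hypothetical helper. Returns True on success and False if libc or
    mlockall cannot be found, or if the call itself fails.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        logging.error("Cannot locate the C library; memory not locked")
        return False
    try:
        libc = ctypes.CDLL(libc_name, use_errno=True)
    except OSError as err:
        logging.error("Cannot load %s: %s", libc_name, err)
        return False

    # getattr with a default avoids AttributeError on a missing symbol.
    mlockall = getattr(libc, "mlockall", None)
    if mlockall is None:
        logging.error("mlockall is not available in %s", libc_name)
        return False
    mlockall.argtypes = [ctypes.c_int]
    mlockall.restype = ctypes.c_int

    if mlockall(flags) != 0:
        # Usually a missing privilege or a too-low RLIMIT_MEMLOCK.
        logging.error("mlockall failed: %s", os.strerror(ctypes.get_errno()))
        return False
    return True
```

Memory locks are not inherited across ``fork()``, so a sketch like this would have to be called again in each child process to cover "all its children".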
318

  
319
With these changes, the node daemon can successfully run basic tasks such as
320
a powercycle request even when the system disk is heavily damaged and
321
reading/writing to disk fails constantly.
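
The log handler that redirects unwritable messages to ``/dev/console`` could look roughly like the sketch below. The class name ``ConsoleFallbackFileHandler`` and its ``console_path`` parameter are hypothetical, and the approach (overriding ``handleError`` on a standard ``logging.FileHandler``) is one possible way to get this behaviour, not necessarily the one used in the actual change.

```python
import logging


class ConsoleFallbackFileHandler(logging.FileHandler):
    """File handler that falls back to the console when the file is unwritable.

    Hypothetical sketch: the logging module calls handleError() when
    writing a record to the stream raises, so the fallback lives there.
    """

    def __init__(self, filename, console_path="/dev/console"):
        super().__init__(filename)
        self._console_path = console_path

    def handleError(self, record):
        # The write to the logfile failed; try the console instead.
        try:
            with open(self._console_path, "a") as console:
                console.write(self.format(record) + "\n")
        except Exception:
            # Console is unusable too: defer to the default behaviour.
            super().handleError(record)
```

Opening the console for every failed record is slow but avoids holding a file descriptor that may itself go stale; a real daemon might prefer to open it once at startup.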
322

  
323

  
295 324
Feature changes
296 325
---------------
297 326

  
