
Revision bc69c426

ID bc69c4265a660190048e5e2dd298df412017c4fa
Parent 22114677
Child a1116f57

Added by Jose A. Lopes about 10 years ago

Stop watcher from restarting down instances during an opcode

This patch makes the watcher check whether an instance that is down
is also locked by some LU before attempting to restart it. Without
checking the lock status, the watcher could, for example, mistake an
instance that is being failed over for one that is actually down.

This problem occurs because there is a significant time window between
'xm stop' and 'xm destroy' during which an instance is reported as
being down, but the watcher should not act during this period.
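
The check described above can be sketched as follows. This is a minimal
illustration, not the watcher's actual code: the InstanceInfo class, its
fields, and instances_to_restart are hypothetical names chosen for this
example.

```python
from dataclasses import dataclass


@dataclass
class InstanceInfo:
    """Hypothetical snapshot of what the watcher knows about one instance."""
    name: str
    is_down: bool    # the hypervisor reports the instance as down
    is_locked: bool  # some LU currently holds the instance lock


def instances_to_restart(instances):
    """Return names of instances that are down and not locked by any LU."""
    return [i.name for i in instances if i.is_down and not i.is_locked]


snapshot = [
    InstanceInfo("inst1", is_down=True, is_locked=False),   # genuinely down
    InstanceInfo("inst2", is_down=True, is_locked=True),    # e.g. mid-failover
    InstanceInfo("inst3", is_down=False, is_locked=False),  # running
]
restart = instances_to_restart(snapshot)
```

Only inst1 is selected here: inst2 is reported as down but is skipped
because an opcode still holds its lock.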

This fixes issue 734.
This introduces issue 743.

Unfortunately, this fix introduces a race condition, because at the
moment it is not possible to query the instance status and the lock
status simultaneously. It won't be possible to fix this race
condition until the locks have been migrated completely to Haskell
and the cluster configuration can be functionally updated in Haskell,
which will also allow the simultaneous queries.
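
The race can be illustrated with a small simulation. This is only a
sketch under assumptions: FakeCluster, begin_failover, and
watcher_decides_restart are hypothetical names, and the order of the
two queries (lock status first, instance status second) is assumed for
the example rather than taken from the watcher.

```python
class FakeCluster:
    """Hypothetical stand-in for cluster state; not a real API."""

    def __init__(self):
        self.down = False
        self.locked = False

    def begin_failover(self):
        # An LU takes the instance lock and the instance goes down.
        self.locked = True
        self.down = True


def watcher_decides_restart(cluster, between_queries=None):
    """Combine two separate queries; the state may change in between."""
    locked = cluster.locked      # first query: lock status
    if between_queries is not None:
        between_queries()        # the cluster changes here
    down = cluster.down          # second query: instance status
    return down and not locked


quiet = FakeCluster()
no_race = watcher_decides_restart(quiet)

racy = FakeCluster()
raced = watcher_decides_restart(racy, between_queries=racy.begin_failover)
```

In the second call the stale "unlocked" answer is combined with a fresh
"down" answer, so the sketch decides to restart an instance that is in
fact being failed over. A single query returning both values at once
would close this window.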

Signed-off-by: Jose A. Lopes <>
Reviewed-by: Michele Tartara <>
