Revision bc69c426
ID | bc69c4265a660190048e5e2dd298df412017c4fa |
Parent | 22114677 |
Child | a1116f57 |
Stop watcher from restarting down instances during an opcode
This patch changes the watcher to check whether an instance that is
down is also locked by some LU before attempting to restart it.
Without checking the lock status, the watcher could, for example,
treat an instance that is being failed over as one that is actually
down and needs restarting.
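The check described above can be sketched as follows. This is only an
illustration and not the code from the patch: the InstanceInfo record
and both function names are hypothetical, and the sketch assumes the
down/locked flags have already been obtained from the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical snapshot of what the watcher knows about an instance."""
    name: str
    is_down: bool    # the hypervisor reports the instance as not running
    is_locked: bool  # some LU currently holds a lock on the instance


def should_restart(inst):
    """Restart only instances that are down and not locked by an opcode."""
    return inst.is_down and not inst.is_locked


def instances_to_restart(instances):
    """Return the names of the instances the watcher may restart."""
    return [inst.name for inst in instances if should_restart(inst)]
```

An instance that is down because a failover is in progress is also
locked, so under this rule it would be skipped rather than restarted.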
This problem occurs because there is a significant time window between
'xm stop' and 'xm destroy' during which an instance is reported as
being down, but the watcher should not act during this period.
This fixes issue 734.
This introduces issue 743.
Unfortunately, this fix introduces a race condition, because at the
moment it is not possible to query the instance status and the lock
status simultaneously. This race condition cannot be fixed until the
locks have been migrated completely to Haskell and the cluster
configuration can be functionally updated in Haskell, which will also
allow the simultaneous queries.
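The race can be illustrated with a small simulation. Everything here
is hypothetical: FakeCluster, its methods, and the order of the two
queries are assumptions made for the sketch, not the watcher's actual
implementation. It only shows why two separate queries can produce an
inconsistent view.

```python
class FakeCluster:
    """Hypothetical cluster whose state may change between two queries."""

    def __init__(self):
        self.locked = False
        self.down = False

    def query_locked(self):
        return self.locked

    def query_down(self):
        return self.down

    def start_failover(self):
        # An opcode takes the instance lock and then stops the instance.
        self.locked = True
        self.down = True


def watcher_decides_restart(cluster, between_queries=None):
    """Query the lock status and the instance status separately.

    between_queries, if given, is called between the two queries to
    simulate another job changing the cluster state in that window.
    """
    locked = cluster.query_locked()
    if between_queries is not None:
        between_queries()
    down = cluster.query_down()
    return down and not locked
```

If a failover starts between the two queries, the watcher sees "not
locked" from the first query and "down" from the second, and so
decides to restart an instance that is in fact locked. A single
query returning both values from one consistent snapshot would
avoid this.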
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>