Revision f5116c87

« Previous | Next »

ID	f5116c873189a6e0743b29d0690e516621631728

Added by Iustin Pop almost 14 years ago

watcher: smarter handling of instance records

This patch implements a few changes to the instance handling. First, old
instances which no longer exist on the cluster are removed from the
state file, to keep things clean.

Second, the instance restart counters are reset every 8 hours, since
some error cases might be transient (e.g. networking issues, or machine
temporarily down), and if the problem takes more than 5 restarts but is
not permanent, watcher will not restart the instance. The value of 8
hours is, I think, both conservative (as not to hammer the cluster too
often with restarts) and fast enough to clear semi-transient problems.

And last, if an instance is not restarted due to exhausted retries, this
should be warned, otherwise it's hard to understand why watcher doesn't
want to restart an ERROR_down instance.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Files

added
modified
copied
renamed
deleted

View differences

daemons
- ganeti-watcher (diff)

Synnefo » snf-ganeti

Revision f5116c87

Files