X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/819358e178135369da8eb22a91c184518f89ecfb..99c7cd5be025e86745aa46003ca0962609e0b4e2:/doc/design-autorepair.rst

diff --git a/doc/design-autorepair.rst b/doc/design-autorepair.rst
index 54dc914..5ab446b 100644
--- a/doc/design-autorepair.rst
+++ b/doc/design-autorepair.rst
@@ -81,6 +81,14 @@ error condition that requires a more risky or drastic solution, but
 never vice versa (if a worse solution is allowed then so is a better
 one).
 
+If there are multiple ``ganeti:watcher:autorepair:`` tags in an
+object (cluster, node group or instance), the least destructive tag
+takes precedence. When multiplicity happens across objects, the nearest
+tag wins. For example, if in a cluster with two instances, *I1* and
+*I2*, *I1* has ``failover``, and the cluster itself has both
+``fix-storage`` and ``reinstall``, *I1* will end up with ``failover``
+and *I2* with ``fix-storage``.
+
 ganeti:watcher:autorepair:suspend[:]
 +++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -102,6 +110,17 @@ It might also be useful to easily have an operation that tags all
 instances matching a filter on some charateristic. But again, this
 wouldn't be specific to this tag.
 
+If there are multiple
+``ganeti:watcher:autorepair:suspend[:]`` tags in an object,
+the form without timestamp takes precedence (permanent suspension); or,
+if all object tags have a timestamp, the one with the highest timestamp.
+When multiplicity happens across objects, the nearest tag wins, as
+above. This makes it possible to suspend cluster-enabled repairs with a
+single tag in the cluster object; or to suspend them only for a certain
+node group or instance. At the same time, it is possible to re-enable
+cluster-suspended repairs in a particular instance or group by applying
+an enable tag to them.
+
 ganeti:watcher:autorepair:pending::::
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -115,8 +134,8 @@ to this instance for this ``id`` (we will "update" the tag by adding a
 the timestamp will never change for the same repair)
 
 ``jobs`` is the list of jobs already run or being run to repair the
-instance. If the instance has just been put in pending state but no job
-has run yet, this list is empty.
+instance (separated by a plus sign, *+*). If the instance has just
+been put in pending state but no job has run yet, this list is empty.
 
 This tag will be set by ganeti if an equivalent autorepair tag is
 present and a a repair is needed, or can be set by an external tool to
@@ -131,8 +150,8 @@ ganeti:watcher:autorepair:result:::::
 (instance)
 If this tag is present a repair of type ``type`` has been performed on
 the instance and has been completed by ``timestamp``. The result is
-either ``success``, ``failure`` or ``enoperm``, and jobs is a comma
-separated list of jobs that were executed for this repair.
+either ``success``, ``failure`` or ``enoperm``, and jobs is a
+*+*-separated list of jobs that were executed for this repair.
 
 An ``enoperm`` result is returned when the repair was brought on until
 possible, but the repair type doesn't consent to proceed further.
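
The precedence rules this patch documents for the repair-type tags, and the new *+*-separated ``jobs`` field, can be illustrated with a short sketch. This is not Ganeti's implementation. The function names (``least_destructive``, ``effective_repair_type``, ``split_jobs``) and the plain-list representation of tags are hypothetical. The ordering of repair types from least to most destructive (``fix-storage``, ``migrate``, ``failover``, ``reinstall``) is an assumption; the patch itself only implies that ``fix-storage`` ranks below ``reinstall``. The suspend tag's timestamp rule is not covered here.

```python
# Hypothetical sketch of the tag precedence rules; not Ganeti's actual code.
PREFIX = "ganeti:watcher:autorepair:"

# Assumed ordering, least destructive first.
REPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]


def least_destructive(tags):
    """Return the least destructive repair type among one object's tags.

    Tags that are not plain repair-type tags (suspend, pending, result,
    or unrelated tags) are ignored. Returns None if no repair type is set.
    """
    suffixes = [t[len(PREFIX):] for t in tags if t.startswith(PREFIX)]
    found = [s for s in suffixes if s in REPAIR_TYPES]
    if not found:
        return None
    return min(found, key=REPAIR_TYPES.index)


def effective_repair_type(instance_tags, group_tags, cluster_tags):
    """Nearest object wins: instance first, then node group, then cluster."""
    for tags in (instance_tags, group_tags, cluster_tags):
        choice = least_destructive(tags)
        if choice is not None:
            return choice
    return None


def split_jobs(jobs_field):
    """Split the '+'-separated jobs field; an empty field means no jobs yet."""
    return jobs_field.split("+") if jobs_field else []


# The example from the patch: I1 carries ``failover``; the cluster carries
# both ``fix-storage`` and ``reinstall``; I2 has no autorepair tag of its own.
cluster = [PREFIX + "fix-storage", PREFIX + "reinstall"]
i1 = effective_repair_type([PREFIX + "failover"], [], cluster)
i2 = effective_repair_type([], [], cluster)
```

Under these assumptions ``i1`` resolves to ``failover`` and ``i2`` to ``fix-storage``, matching the outcome described in the first hunk.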