Revision 96eeb742 doc/design-internal-shutdown.rst

.. contents:: :depth: 2

This is a design document detailing the implementation of a way for Ganeti to
detect whether an instance marked as up but not running was shut down
gracefully by the user from inside the instance itself.

Current state and shortcomings
==============================

Ganeti keeps track of the desired status of instances in order to be able to
take proper action (e.g. a reboot) on the instances that happen to crash.
Currently, the only way to properly shut down an instance is through Ganeti's
own commands, which can be used to mark an instance as ``ADMIN_down``.

If a user shuts down an instance from inside, through the proper command of the
operating system it is running, the instance will be shut down gracefully, but
Ganeti is not aware of that: the desired status of the instance will still be
...
================

We propose to modify Ganeti in such a way that it will detect when an instance
was shut down as a result of an explicit request from the user. When such a
situation is detected, instead of reporting an error as happens now, either the
state of the instance will be set to ``ADMIN_down``, or the instance will be
automatically rebooted, depending on an instance-specific configuration value.
If no such parameter is found, the default behavior will be to follow the
apparent will of the user, that is, to set to ``ADMIN_down`` an instance that
was shut down correctly from inside.
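
The decision described above can be sketched as follows. This is a minimal
illustration only. The design does not name the instance-specific configuration
value, so the key ``user_shutdown_action`` and its values ``admin_down`` and
``reboot`` are hypothetical.

```python
from enum import Enum
from typing import Mapping


class UserShutdownAction(Enum):
    """Hypothetical values of the instance-specific configuration value."""

    ADMIN_DOWN = "admin_down"
    REBOOT = "reboot"


# Hypothetical parameter name; the design does not specify the real key.
USER_SHUTDOWN_PARAM = "user_shutdown_action"


def action_after_user_shutdown(params: Mapping[str, str]) -> UserShutdownAction:
    """Choose what to do with an instance that was shut down from inside.

    When the parameter is absent, follow the apparent will of the user and
    mark the instance as ADMIN_down. Unknown values raise ValueError.
    """
    value = params.get(USER_SHUTDOWN_PARAM)
    if value is None:
        return UserShutdownAction.ADMIN_DOWN
    return UserShutdownAction(value)
```

Rejecting unknown values, rather than silently falling back to the default, is
a choice made by this sketch.
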

The rest of this design document details the implementation of instance
shutdown detection for Xen. The KVM implementation is detailed in
:doc:`design-kvmd`.

Implementation
==============

...

If the state is ``---s--``, the instance was properly shut down.

If the instance was properly shut down and it is still marked as ``running`` by
Ganeti, it means that it was shut down from inside by the user, and the Ganeti
status of the instance needs to be changed to ``ADMIN_down``.
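
The check above can be sketched in Python as shown below. The sketch assumes
the usual ``xm list`` layout: a header line, then one line per domain with the
columns Name, ID, Mem, VCPUs, State and Time(s). The function names are
hypothetical. The set of instances Ganeti marks as ``running`` is passed in
rather than read from the cluster configuration.

```python
from typing import AbstractSet, Dict, List

# State column of a domain that was properly shut down, per this design.
SHUTDOWN_STATE = "---s--"


def parse_xm_list(output: str) -> Dict[str, str]:
    """Map each domain name to its six-character ``xm list`` State column."""
    states: Dict[str, str] = {}
    for line in output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        states[fields[0]] = fields[4]
    return states


def shut_down_from_inside(states: Dict[str, str],
                          marked_running: AbstractSet[str]) -> List[str]:
    """Instances Ganeti still marks as running whose domain is shut down.

    These are the instances whose Ganeti status must become ADMIN_down.
    """
    return sorted(name for name in marked_running
                  if states.get(name) == SHUTDOWN_STATE)
```
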

This will be done at regular intervals by the group watcher, just before
deciding which instances to reboot.

On top of that, at the same time, the watcher will also need to issue
``xm destroy`` commands for all the domains that are in a crashed or shutdown
state, since Xen will no longer do this automatically because of the
``preserve`` setting in their config files.
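
A sketch of this cleanup step follows. It assumes the watcher already has the
State column of every domain, for instance parsed from ``xm list``. A domain is
treated as stopped when the shutdown flag (``s``, fourth position) or the
crashed flag (``c``, fifth position) is set. The command runner is injectable,
so the logic can be exercised without Xen. The function names are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError if the command fails.
    subprocess.run(list(cmd), check=True)


def is_stopped(xm_state: str) -> bool:
    """True if the shutdown (s) or crashed (c) flag is set in the State column."""
    return len(xm_state) == 6 and (xm_state[3] == "s" or xm_state[4] == "c")


def destroy_stopped_domains(xm_states: Dict[str, str],
                            run: Runner = _run) -> List[str]:
    """Issue ``xm destroy`` for each stopped domain; return their names."""
    destroyed: List[str] = []
    for name, state in sorted(xm_states.items()):
        if is_stopped(state):
            run(["xm", "destroy", name])
            destroyed.append(name)
    return destroyed
```
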

This behavior will be limited to the domains shut down from inside, because it
will actually keep the resources of the domain busy until the watcher performs
the cleanup (with the default settings, this can take up to 5 minutes). Still,
this is considered acceptable, because domains are not frequently shut down
this way. The cleanup function will also be run automatically just before
performing any job that requires resources to be available (such as when
creating a new instance), in order to ensure that the new resource allocation
starts from a clean state. Operations that only query the state of instances
will not run the cleanup function.
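
The rule of running the cleanup only before jobs that need resources can be
sketched as follows. ``Job``, ``requires_resources`` and ``run_job`` are
hypothetical names used for illustration; they are not Ganeti's job or opcode
API. The cleanup itself is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """Hypothetical job description used only for this sketch."""

    name: str
    requires_resources: bool  # e.g. True for instance creation
    action: Callable[[], object]


def run_job(job: Job, cleanup: Callable[[], None]) -> object:
    """Run the cleanup first only if the job needs node resources."""
    if job.requires_resources:
        cleanup()  # destroy stopped domains and update the configuration
    # Query-only jobs skip the cleanup entirely.
    return job.action()
```
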

The cleanup operation includes both node-specific operations (the actual
destruction of the stopped domains) and configuration changes, to be performed
...
shutdown procedure as usual.

The ``gnt-instance list`` command will need to be able to handle the situation
where an instance was shut down internally but not yet cleaned up. The
``admin_state`` field will keep its current meaning unchanged. The
``oper_state`` field will get a new possible state, ``S``, meaning that the
instance was shut down internally.
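
A minimal sketch of how the new ``S`` value could be derived follows. It
assumes the State column of the domain is available, or ``None`` when the
domain no longer exists. ``current_value`` is a placeholder for whatever the
field reports today, which this sketch leaves untouched. The function name is
hypothetical.

```python
from typing import Optional, TypeVar, Union

T = TypeVar("T")

# State column of a domain that was shut down but not yet destroyed.
SHUTDOWN_STATE = "---s--"


def oper_state(xm_state: Optional[str], current_value: T) -> Union[str, T]:
    """Return "S" for a domain shut down internally but not yet cleaned up."""
    if xm_state == SHUTDOWN_STATE:
        return "S"
    # Otherwise keep the value the field would report today.
    return current_value
```

In this sketch, ``S`` is transient. Once the cleanup has destroyed the domain,
the domain no longer reports the ``---s--`` state and the existing value is
returned.
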