============================================================
Detection of user-initiated shutdown from inside an instance
============================================================

.. contents:: :depth: 2

This is a design document detailing the implementation of a way for Ganeti to
detect whether an instance marked as up but not running was shut down
gracefully by the user from inside the instance itself.

Current state and shortcomings
==============================

Ganeti keeps track of the desired status of instances in order to be able to
take proper action (e.g. reboot) on the instances that happen to crash.
Currently, the only way to properly shut down an instance is through Ganeti's
own commands, which can be used to mark an instance as ``ADMIN_down``.

If a user shuts down an instance from inside, through the proper command of the
operating system it is running, the instance will be shut down gracefully, but
Ganeti is not aware of that: the desired status of the instance will still be
marked as ``running``, so when the watcher realises that the instance is down,
it will restart it. This behaviour is usually not what the user expects.

Proposed changes
================

We propose to modify Ganeti in such a way that it will detect when an instance
was shut down as a result of an explicit request from the user. When such a
situation is detected, instead of presenting an error as happens now, either
the state of the instance will be set to ``ADMIN_down``, or the instance will be
automatically rebooted, depending on an instance-specific configuration value.
The default behavior, in case no such parameter is found, will be to follow the
apparent will of the user and set to ``ADMIN_down`` an instance that was
shut down correctly from inside.
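
As a rough illustration of the decision described above, the following sketch
maps a detected user shutdown to one of the two outcomes. The parameter name
``user_shutdown_policy`` and its values are hypothetical: this design does not
name the instance-specific configuration value, and a plain dict stands in for
the instance configuration.

```python
# Hypothetical policy values; the design document does not fix these names.
POLICY_MARK_DOWN = "mark-down"
POLICY_REBOOT = "reboot"


def action_for_user_shutdown(instance_params):
    """Return the action to take for an instance shut down from inside.

    instance_params is a plain dict standing in for the instance-specific
    configuration (an assumption of this sketch).
    """
    policy = instance_params.get("user_shutdown_policy", POLICY_MARK_DOWN)
    if policy == POLICY_REBOOT:
        return "reboot"
    # Default: follow the apparent will of the user.
    return "set ADMIN_down"
```

When the parameter is missing, the sketch falls back to marking the instance
as ``ADMIN_down``, matching the default described above.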

The rest of this design document details the implementation of instance shutdown
detection for Xen. The KVM implementation is detailed in :doc:`design-kvmd`.

Implementation
==============

Xen knows why a domain is being shut down (a crash or an explicit shutdown
or poweroff request), but such information is not usually readily available
externally, because all such cases lead to the virtual machine being destroyed
immediately after the event is detected.

Still, Xen allows the instance configuration file to define which action is to
be taken in all those cases through the ``on_poweroff``, ``on_shutdown`` and
``on_crash`` variables. When they are set to ``preserve``, Xen will not destroy
the domains automatically.
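
A minimal sketch of how such assignments could be rendered for a domain
configuration file follows. The helper name is hypothetical, and the default
variable names are taken from the list above; they should be checked against
the documentation of the Xen version in use before relying on them.

```python
def preserve_actions(variables=("on_poweroff", "on_crash")):
    """Render config assignments that keep a stopped domain around.

    The variable names are an assumption taken from this design document,
    not verified against any particular Xen release.
    """
    return "".join("%s = 'preserve'\n" % name for name in variables)
```

Calling ``preserve_actions()`` returns one assignment per line, ready to be
appended to the rest of the generated configuration.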

When the domain is not destroyed, it can be viewed by using ``xm list`` (or
``xl list`` in newer Xen versions), and the ``State`` field of the output will
provide useful information.

If the state is ``----c-``, it means the instance has crashed.

If the state is ``---s--``, it means the instance was shut down properly.
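
The two cases above can be told apart with a small helper such as the one
sketched here. The function name and return values are hypothetical, and the
sketch simplifies by looking for the flag letter anywhere in the field rather
than at a fixed position.

```python
def classify_domain_state(state):
    """Classify the State column printed by ``xm list`` or ``xl list``."""
    if "c" in state:
        return "crashed"
    if "s" in state:
        return "shutdown"
    return "other"
```

For example, ``classify_domain_state("---s--")`` yields ``"shutdown"``, while
any state carrying neither flag is reported as ``"other"``.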

If the instance was shut down properly and it is still marked as ``running`` by
Ganeti, it means that it was shut down from inside by the user, and the Ganeti
status of the instance needs to be changed to ``ADMIN_down``.

This will be done at regular intervals by the group watcher, just before
deciding which instances to reboot.

On top of that, at the same time, the watcher will also need to issue ``xm
destroy`` commands for all the domains that are in a crashed or shutdown state,
since this will no longer be done automatically by Xen because of the
``preserve`` setting in their config files.
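
The watcher pass described above could be planned along these lines. This is
only a sketch: the function name, the input mapping and the string values for
the desired status and the domain state are assumptions made for illustration,
not Ganeti data structures.

```python
def plan_watcher_pass(instances):
    """Split instances into those to mark ADMIN_down and domains to destroy.

    instances maps an instance name to a (desired_status, domain_state)
    pair, where desired_status is "running" or "ADMIN_down" and
    domain_state is "crashed", "shutdown" or "other" (hypothetical values).
    """
    mark_down = []
    destroy = []
    for name, (desired, domain_state) in sorted(instances.items()):
        # Preserved domains are no longer destroyed by Xen itself.
        if domain_state in ("crashed", "shutdown"):
            destroy.append(name)
        # Cleanly shut down while Ganeti still wants it running:
        # the user shut it down from inside.
        if domain_state == "shutdown" and desired == "running":
            mark_down.append(name)
    return mark_down, destroy
```

A crashed instance ends up only in the list of domains to destroy, so it
remains a candidate for the reboot decision that follows.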

This behavior will be limited to the domains shut down from inside, because it
will actually keep the resources of the domain busy until the watcher performs
the cleanup (which, with the default setting, can take up to 5 minutes).
Still, this is considered acceptable, because it is not frequent for a domain to
be shut down this way. The cleanup function will also be run automatically just
before performing any job that requires resources to be available (such as when
creating a new instance), in order to ensure that the new resource allocation
starts from a clean state. Functions that only query the state of instances
will not run the cleanup function.
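
The rule for when the cleanup runs can be sketched as follows. The job
representation (a dict with ``needs_resources`` and ``run`` keys) and the
function name are hypothetical and only serve to show the ordering.

```python
def run_job(job, cleanup):
    """Run a job, calling cleanup first if the job needs free resources.

    job is a dict with a boolean "needs_resources" and a callable "run";
    both keys are assumptions of this sketch.
    """
    if job["needs_resources"]:
        cleanup()
    return job["run"]()
```

A job that creates an instance would set ``needs_resources`` to ``True``,
while a pure query would leave it ``False`` and skip the cleanup.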

The cleanup operation includes both node-specific operations (the actual
destruction of the stopped domains) and configuration changes, to be performed
on the master node (marking as offline an instance that was shut down
internally). The watcher, on the master node, will fetch the list of instances
that have been shut down from inside (recognizable by their ``oper_state``,
as described below). It will then submit a series of ``InstanceShutdown`` jobs
that will mark such instances as ``ADMIN_down`` and clean them up (once
the functionality of ``InstanceShutdown`` has been extended as specified
in the rest of this design document).

LUs performing operations other than an explicit cleanup will have to be
modified to perform the cleanup as well, either by submitting a job to perform
the cleanup (to be completed before actually performing the task at hand) or by
explicitly performing the cleanup themselves through the RPC calls.

Other required changes
++++++++++++++++++++++

The implementation of this design document will require some commands to be
changed in order to cope with the new shutdown procedure.

With the default shutdown action in Xen set to ``preserve``, the Ganeti
command for shutting down instances would leave them in a shutdown but
preserved state. Therefore, it will have to be changed so that it immediately
performs the cleanup of the instance after verifying its correct shutdown.
It will also correctly deal with instances that have been shut down from
inside but are still active according to Ganeti, by detecting this
situation, destroying the instance and carrying out the rest of the Ganeti
shutdown procedure as usual.

The ``gnt-instance list`` command will need to be able to handle the situation
where an instance was shut down internally but not yet cleaned up. The
``admin_state`` field will keep its current meaning unchanged. The
``oper_state`` field will get a new possible state, ``S``, meaning that the
instance was shut down internally.

The ``State`` field of the ``gnt-instance info`` command will, in such a case,
show a message stating that the instance was supposed to be running but was
shut down internally.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: