============================================================
Detection of user-initiated shutdown from inside an instance
============================================================

    
.. contents:: :depth: 2

    
This is a design document detailing the implementation of a way for Ganeti to
detect whether an instance marked as up but not running was shut down gracefully
by the user from inside the instance itself.

    
Current state and shortcomings
==============================

    
Ganeti keeps track of the desired status of instances in order to be able to
take proper action (e.g. a reboot) on instances that happen to crash.
Currently, the only way to properly shut down an instance is through Ganeti's
own commands, which can be used to mark an instance as ``ADMIN_down``.

    
If a user shuts down an instance from inside, through the proper command of the
operating system it is running, the instance will be shut down gracefully, but
Ganeti is not aware of that: the desired status of the instance will still be
marked as ``running``, so when the watcher realises that the instance is down,
it will restart it. This behaviour is usually not what the user expects.

    
Proposed changes
================

    
We propose to modify Ganeti in such a way that it will detect when an instance
was shut down as a result of an explicit request from the user. When such a
situation is detected, instead of presenting an error as it happens now, either
the state of the instance will be set to ``ADMIN_down``, or the instance will be
automatically rebooted, depending on an instance-specific configuration value.
The default behavior in case no such parameter is found will be to follow the
apparent will of the user, and set to ``ADMIN_down`` an instance that was
shut down correctly from inside.

    
The rest of this design document details the implementation of instance shutdown
detection for Xen.  The KVM implementation is detailed in :doc:`design-kvmd`.

    
Implementation
==============

    
Xen knows why a domain is being shut down (a crash or an explicit shutdown
or poweroff request), but such information is not usually readily available
externally, because all such cases lead to the virtual machine being destroyed
immediately after the event is detected.

    
Still, Xen allows the instance configuration file to define which action is
taken in each of those cases through the ``on_poweroff``, ``on_shutdown`` and
``on_crash`` variables. By setting them to ``preserve``, Xen will avoid
destroying the domains automatically.
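
For illustration only, the relevant part of a domain configuration file might
look like the fragment below. It uses the Python-style assignment syntax of
``xm`` configuration files and shows just two of the variables named above;
which variables a given Xen toolstack and version accepts is an assumption to
be checked against its documentation.

```python
# Hypothetical excerpt of a Xen domain configuration file.
# "preserve" keeps the stopped domain around so that its final state
# can still be inspected before it is destroyed explicitly.
on_poweroff = "preserve"
on_crash = "preserve"
```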

    
When the domain is not destroyed, it can be viewed by using ``xm list`` (or
``xl list`` in newer Xen versions), and the ``State`` field of the output will
indicate why the domain stopped.

    
If the state is ``----c-``, the instance has crashed.

    
If the state is ``---s--``, the instance was shut down properly.

    
If the instance was shut down properly and it is still marked as ``running`` by
Ganeti, it means that it was shut down from inside by the user, and the Ganeti
status of the instance needs to be changed to ``ADMIN_down``.
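
The check described above can be sketched in Python as follows. This is not
Ganeti code: the enum, the function names and the ``marked_running`` parameter
are hypothetical, and the sketch assumes that the ``c`` and ``s`` flags are the
only information needed from the ``State`` field.

```python
from enum import Enum


class DomainStatus(Enum):
    """Coarse classification of a preserved Xen domain (illustrative names)."""
    CRASHED = "crashed"
    SHUTDOWN = "shutdown"
    OTHER = "other"


def classify_state(state_field: str) -> DomainStatus:
    """Classify the State column printed by ``xm list`` or ``xl list``."""
    if "c" in state_field:
        return DomainStatus.CRASHED
    if "s" in state_field:
        return DomainStatus.SHUTDOWN
    return DomainStatus.OTHER


def was_shut_down_by_user(state_field: str, marked_running: bool) -> bool:
    """Return True if the domain stopped cleanly while Ganeti still wants it up."""
    return marked_running and classify_state(state_field) is DomainStatus.SHUTDOWN
```

With these definitions, ``was_shut_down_by_user("---s--", True)`` is true,
while a crashed domain (``----c-``) or an instance that Ganeti already
considers down is not treated as a user-initiated shutdown.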

    
This will be done at regular intervals by the group watcher, just before
deciding which instances to reboot.

    
At the same time, the watcher will also need to issue ``xm destroy`` commands
for all the domains that are in a crashed or shutdown state, since Xen will no
longer do this automatically because of the ``preserve`` setting in their
config files.

    
This behavior will be limited to the domains shut down from inside, because it
keeps the resources of the domain busy until the watcher performs the cleanup
(which, with the default setting, can take up to 5 minutes).
Still, this is considered acceptable, because it is not frequent for a domain to
be shut down this way. The cleanup function will also be run automatically just
before performing any job that requires resources to be available (such as
creating a new instance), in order to ensure that the new resource allocation
starts from a clean state. Functions that only query the state of instances
will not run the cleanup function.

    
The cleanup operation includes both node-specific operations (the actual
destruction of the stopped domains) and configuration changes, to be performed
on the master node (marking as offline an instance that was shut down
internally). The watcher, on the master node, will fetch the list of instances
that have been shut down from inside (recognizable by their ``oper_state``
as described below). It will then submit a series of ``InstanceShutdown`` jobs
that will mark such instances as ``ADMIN_down`` and clean them up (once
the functionality of ``InstanceShutdown`` has been extended as specified
in the rest of this design document).
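
A minimal sketch of the selection step on the master node is given below. It
is not the watcher's actual code: the ``InstanceView`` record, the ``admin_up``
flag and the dictionary used to describe a job are hypothetical, and the value
``"S"`` for ``oper_state`` is the one this document proposes for
``gnt-instance list``.

```python
from typing import Dict, Iterable, List, NamedTuple


class InstanceView(NamedTuple):
    """Hypothetical per-instance record; not Ganeti's actual data model."""
    name: str
    admin_up: bool    # True while Ganeti wants the instance running
    oper_state: str   # "S" is assumed to mean "shut down from inside"


def plan_shutdown_jobs(instances: Iterable[InstanceView]) -> List[Dict[str, str]]:
    """Describe one InstanceShutdown job per internally stopped instance."""
    return [
        {"op": "InstanceShutdown", "instance_name": inst.name}
        for inst in instances
        if inst.admin_up and inst.oper_state == "S"
    ]
```

Only instances that Ganeti still expects to be running are selected, so an
instance that is already ``ADMIN_down`` does not get a second shutdown job.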

    
LUs performing operations other than an explicit cleanup will have to be
modified to perform the cleanup as well, either by submitting a job to perform
the cleanup (to be completed before actually performing the task at hand) or by
explicitly performing the cleanup themselves through the RPC calls.

    
Other required changes
++++++++++++++++++++++

    
The implementation of this design document will require some commands to be
changed in order to cope with the new shutdown procedure.

    
With the default shutdown action in Xen set to ``preserve``, the Ganeti
command for shutting down instances would leave them in a shutdown but
preserved state. Therefore, it will have to be changed so that it
immediately performs the cleanup of the instance after verifying its correct
shutdown. Also, it will correctly deal with instances that have been shut down
from inside but are still active according to Ganeti, by detecting this
situation, destroying the instance and carrying out the rest of the Ganeti
shutdown procedure as usual.

    
The ``gnt-instance list`` command will need to be able to handle the situation
where an instance was shut down internally but not yet cleaned up.  The
``admin_state`` field will maintain the current meaning unchanged. The
``oper_state`` field will get a new possible state, ``S``, meaning that the
instance was shut down internally.

    
In such a case, the ``State`` field of the ``gnt-instance info`` command will
show a message stating that the instance was supposed to be running but was
shut down internally.

    
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: