Revision 96eeb742

b/doc/design-internal-shutdown.rst
5 5
.. contents:: :depth: 2
6 6

  
7 7
This is a design document detailing the implementation of a way for Ganeti to
8
detect whether a machine marked as up but not running was shutdown gracefully
9
by the user from inside the machine itself.
8
detect whether an instance marked as up but not running was shutdown gracefully
9
by the user from inside the instance itself.
10 10

  
11 11
Current state and shortcomings
12 12
==============================
13 13

  
14 14
Ganeti keeps track of the desired status of instances in order to be able to
15
take proper actions (e.g.: reboot) on the ones that happen to crash.
16
Currently, the only way to properly shut down a machine is through Ganeti's own
17
commands, that will mark an instance as ``ADMIN_down``.
15
take proper action (e.g.: reboot) on the instances that happen to crash.
16
Currently, the only way to properly shut down an instance is through Ganeti's
17
own commands, which can be used to mark an instance as ``ADMIN_down``.
18

  
18 19
If a user shuts down an instance from inside, through the proper command of the
19 20
operating system it is running, the instance will be shut down gracefully, but
20 21
Ganeti is not aware of that: the desired status of the instance will still be
......
25 26
================
26 27

  
27 28
We propose to modify Ganeti in such a way that it will detect when an instance
28
was shutdown because of an explicit user request. When such a situation is
29
detected, instead of presenting an error as it happens now, either the state
30
of the instance will be set to ADMIN_down, or the instance will be
31
automatically rebooted, depending on a instance-specific configuration value.
32
The default behavior in case no such parameter is found will be to follow
33
the apparent will of the user, and setting to ADMIN_down an instance that
34
was shut down correctly from inside.
35

  
36
This design document applies to the Xen backend of Ganeti, because it uses
37
features specific of such hypervisor. Initial analysis suggests that a similar
38
approach might be used for KVM as well, so this design document will be later
39
extended to add more details about it.
29
was shutdown as a result of an explicit request from the user. When such a
30
situation is detected, instead of presenting an error as it happens now, either
31
the state of the instance will be set to ``ADMIN_down``, or the instance will be
32
automatically rebooted, depending on an instance-specific configuration value.
33
The default behavior in case no such parameter is found will be to follow the
34
apparent will of the user and set to ``ADMIN_down`` an instance that was
35
shut down correctly from inside.
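
The choice described above can be sketched as a small lookup with a default. This is only an illustration: the text does not name the configuration value, so the key ``on_internal_shutdown`` and both action names are hypothetical.

```python
# Hypothetical names: the design text does not specify the parameter
# or its possible values.
ACTION_MARK_DOWN = "mark-down"  # set the instance to ADMIN_down
ACTION_REBOOT = "reboot"        # automatically reboot the instance


def action_on_internal_shutdown(instance_params):
    """Pick the action for an instance that was shut down from inside.

    instance_params is a plain dict of instance-specific settings.
    When the (hypothetical) key is missing, follow the apparent will
    of the user and mark the instance as down.
    """
    return instance_params.get("on_internal_shutdown", ACTION_MARK_DOWN)
```

With no parameter set, the function returns the mark-down action, matching the default behavior stated in the text.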
36

  
37
The rest of this design document details the implementation of instance shutdown
38
detection for Xen.  The KVM implementation is detailed in :doc:`design-kvmd`.
40 39

  
41 40
Implementation
42 41
==============
......
60 59
If the state is ``---s--``, it means the instance was properly shut down.
61 60

  
62 61
If the instance was properly shut down and it is still marked as ``running`` by
63
Ganeti, it means that it was shutdown from inside by the user, and the ganeti
62
Ganeti, it means that it was shutdown from inside by the user, and the Ganeti
64 63
status of the instance needs to be changed to ``ADMIN_down``.
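
A minimal sketch of this check, assuming the Xen state string and the Ganeti admin state are already available as plain strings. The constant and function names are hypothetical, and ``ADMIN_up`` is an assumed label for an instance that Ganeti marks as running.

```python
# Hypothetical constants for illustration; not Ganeti's actual identifiers.
XEN_STATE_SHUTDOWN = "---s--"  # state string quoted in the text
ADMIN_UP = "ADMIN_up"          # assumed label for "marked as running"
ADMIN_DOWN = "ADMIN_down"


def new_admin_state(xen_state, admin_state):
    """Return the admin state the instance should have after the check.

    Only an instance that Xen reports as properly shut down while
    Ganeti still expects it to be running is switched to ADMIN_down;
    every other combination is left unchanged.
    """
    if xen_state == XEN_STATE_SHUTDOWN and admin_state == ADMIN_UP:
        return ADMIN_DOWN
    return admin_state
```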
65 64

  
66 65
This will be done at regular intervals by the group watcher, just before
67 66
deciding which instances to reboot.
68 67

  
69
On top of that, at the same times, the watcher will also need to issue ``xm
70
destroy`` commands for all the domains that are in crashed or shutdown state,
68
On top of that, at the same time, the watcher will also need to issue ``xm
69
destroy`` commands for all the domains that are in a crashed or shutdown state,
71 70
since this will not be done automatically by Xen anymore because of the
72 71
``preserve`` setting in their config files.
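
The selection of domains to destroy could look roughly like the sketch below. It assumes each domain is given as a ``(name, state)`` pair, where ``state`` is the six-character flag string, and that ``s`` and ``c`` in that string mark shutdown and crashed domains respectively. The helper names are hypothetical, and the commands are only built, not executed.

```python
def domains_to_destroy(domains):
    """Return the names of domains in a shutdown or crashed state.

    domains is an iterable of (name, state) pairs. The meaning of the
    "s" and "c" flags is an assumption of this sketch.
    """
    return [name for name, state in domains
            if "s" in state or "c" in state]


def destroy_commands(domains):
    """Build one ``xm destroy`` argument list per stopped domain."""
    return [["xm", "destroy", name] for name in domains_to_destroy(domains)]
```

Returning argument lists rather than running the commands keeps the selection logic easy to check on its own.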
73 72

  
74 73
This behavior will be limited to the domains shut down from inside, because it
75 74
will actually keep the resources of the domain busy until the watcher does the
76 75
cleanup job (which, with the default setting, can take up to 5 minutes).
77
Still, this is considered acceptable, because it is not frequent for a domain
78
to be shut down this way. The cleanup function will be also run
79
automatically just before performing any job that requires resources to be
80
available (such as when creating a new instance), in order to ensure that the
81
new resource allocation happens starting from a clean state. Functionalities
82
that only query the state of instances will not run the cleanup function.
76
Still, this is considered acceptable, because it is not frequent for a domain to
77
be shut down this way. The cleanup function will also be run automatically just
78
before performing any job that requires resources to be available (such as when
79
creating a new instance), in order to ensure that the new resource allocation
80
happens starting from a clean state. Functionalities that only query the state
81
of instances will not run the cleanup function.
83 82

  
84 83
The cleanup operation includes both node-specific operations (the actual
85 84
destruction of the stopped domains) and configuration changes, to be performed
......
112 111
shutdown procedure as usual.
113 112

  
114 113
The ``gnt-instance list`` command will need to be able to handle the situation
115
where an instance was shutdown internally but not yet cleaned up.
116
The ``admin_state`` field will maintain the current meaning unchanged. The
114
where an instance was shutdown internally but not yet cleaned up.  The
115
``admin_state`` field will maintain the current meaning unchanged. The
117 116
``oper_state`` field will get a new possible state, ``S``, meaning that the
118 117
instance was shut down internally.
119 118

  
