=======
Hotplug
=======

.. contents:: :depth: 4

This is a design document detailing the implementation of device
hotplugging in Ganeti. The logic used is hypervisor agnostic, but the
initial implementation will target the KVM hypervisor. The
implementation adds ``python-fdsend`` as a new dependency.

    
Current state and shortcomings
==============================

Currently, Ganeti supports the addition/removal/modification of devices
(NICs, disks), but the actual modification takes place only after
rebooting the instance. Consequently, an instance cannot change
network, get a new disk, etc. without a hard reboot.

    
Until now, in the case of the KVM hypervisor, the code neither names
devices nor places them in specific PCI slots. Devices are appended to
the KVM command line, and Ganeti lets KVM decide where to place them.
This means that a device residing in PCI slot 5 may, after a reboot
(e.g. due to another device's removal), be moved to another PCI slot
and probably get renamed too (due to udev rules, etc.).

    
In order for migration to succeed, the process on the target node
should be started with exactly the same machine version, CPU
architecture and PCI configuration as the running process. During
instance creation/startup, Ganeti creates a KVM runtime file with all
the necessary information to generate the KVM command. This runtime
file is used during instance migration to start a new, identical KVM
process. The current format includes the fixed part of the final KVM
command, a list of NICs, and the hvparams dict. It does not allow easy
manipulation of disks, because they are encapsulated in the fixed KVM
command.

    
Proposed changes
================

    
For the case of the KVM hypervisor, QEMU exposes 32 PCI slots to the
instance. Disks and NICs occupy some of these slots. Recent versions of
QEMU have introduced monitor commands that allow the addition/removal
of PCI devices. Devices are referenced based on their name or their
position on the virtual PCI bus. To be able to use these commands, we
need to be able to assign each device a unique name.

    
To keep track of where each device is plugged in, we add a ``pci`` slot
to the Disk and NIC objects, but we save it only in runtime files,
since it is hypervisor-specific information. This is added for easy
object manipulation and is guaranteed not to be written back to the
config.

    
We propose to make use of QEMU 1.0 monitor commands so that
modifications to devices take effect instantly, without the need for a
hard reboot. The only change exposed to the end-user will be the
addition of a ``--hotplug`` option to the ``gnt-instance modify``
command.

    
Upon hotplugging, the PCI configuration of an instance changes, and the
runtime files should be updated accordingly. Currently this is
impossible in the case of disk hotplug, because disks are included in
the command-line entry of the runtime file, contrary to NICs, which are
correctly treated separately. We change the format of the runtime
files: we remove the disks from the fixed KVM command and create a new
entry containing only them. The KVM options concerning disks are
generated during ``_ExecuteKVMCommand()``, just like for NICs.

    
Design decisions
================

    
What should each device's ID be? Currently, KVM does not support
arbitrary IDs for devices; only names starting with a letter, at most
32 characters long, and containing only the special characters '.',
'_' and '-' are supported. We use the device's PCI slot and name it
``<device type>-pci-<slot>`` (for debugging purposes we could add part
of the UUID as well).
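
A minimal sketch of this naming scheme and KVM's ID constraints (the
helper name is hypothetical, not actual Ganeti code):

```python
import re

# KVM device IDs: must start with a letter, be at most 32 characters
# long, and contain only alphanumerics plus '.', '_' and '-'.
_KVM_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9._-]{0,31}$")

def kvm_device_id(dev_type, pci_slot):
    """Build an ID of the form <device type>-pci-<slot>."""
    dev_id = "%s-pci-%d" % (dev_type, pci_slot)
    if not _KVM_ID_RE.match(dev_id):
        raise ValueError("invalid KVM device ID: %s" % dev_id)
    return dev_id

print(kvm_device_id("nic", 6))   # nic-pci-6
print(kvm_device_id("disk", 5))  # disk-pci-5
```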

    
Who decides where to hotplug each device? Since this is a
hypervisor-specific matter, there is no point in having the master node
decide such a thing. The master node just has to request that noded
hotplug a device. To this end, the hypervisor-specific code should
parse the current PCI configuration (i.e. the ``info pci`` QEMU monitor
command), find the first available slot, and hotplug the device there.
By having noded decide where to hotplug a device, we ensure that no
error will occur due to duplicate slot assignment (if masterd kept
track of PCI reservations and noded failed to return the PCI slot that
a device was plugged into, then the next hotplug would fail).
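
The slot-selection step can be sketched as follows, assuming ``info
pci`` output in the usual QEMU text form (the parsing details are
illustrative only, not Ganeti code):

```python
import re

# Lines of interest in "info pci" monitor output look like:
#   Bus  0, device   3, function 0:
_SLOT_RE = re.compile(r"Bus\s+0,\s+device\s+(\d+),\s+function")

def first_free_pci_slot(info_pci_output, max_slots=32):
    """Return the first PCI slot not present in the monitor output."""
    used = set(int(m.group(1))
               for m in _SLOT_RE.finditer(info_pci_output))
    for slot in range(max_slots):
        if slot not in used:
            return slot
    raise RuntimeError("no free PCI slot available")

sample = """
  Bus  0, device   0, function 0:
  Bus  0, device   1, function 0:
  Bus  0, device   2, function 0:
  Bus  0, device   3, function 0:
"""
print(first_free_pci_slot(sample))  # 4
```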

    
Where should we keep track of devices' PCI slots? As already mentioned,
we must keep track of devices' PCI slots to successfully migrate
instances. The first option is to save this info in the config data,
which would allow us to place each device in the same PCI slot after a
reboot. This would require making the hypervisor return the PCI slot
chosen for each device and storing this information in the config data.
Additionally, the whole instance configuration would have to be
returned with the PCI slots filled in after instance start, and each
instance would have to keep track of its current PCI reservations. We
decide not to go in this direction, in order to keep things simple and
to avoid adding hypervisor-specific info to the configuration data
(``pci_reservations`` at the instance level and ``pci`` at the device
level). For the aforementioned reasons, we decide to store this info
only in the KVM runtime files.

    
Where should we place the devices upon instance startup? QEMU has 4
pre-occupied PCI slots by default, so the hypervisor can use the
remaining ones for disks and NICs. Currently, the PCI configuration is
not preserved after a reboot. Each time an instance starts, KVM assigns
PCI slots to devices based on their ordering in the Ganeti
configuration, i.e. the second disk will be placed after the first, the
third NIC after the second, etc. Since we decided that there is no need
to keep track of devices' PCI slots, there is no need to change the
current functionality.

    
How do we deal with existing instances? Hotplug depends on runtime file
manipulation: the runtime file stores the PCI info and every device the
KVM process is currently using. Existing runtime files have no PCI info
in their devices and have the block devices encapsulated inside the
kvm_cmd entry. Thus, hotplugging of existing devices will not be
possible; still, migration and hotplugging of new devices will succeed.
The workaround happens upon loading the KVM runtime: if we detect the
old-style format, we add an empty list for block devices, and upon
saving the KVM runtime we include this empty list as well. Switching
entirely to the new format will happen upon instance reboot.
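
The loading workaround described above can be sketched as follows (the
three-entry old format and the JSON serialization are assumptions made
for illustration, not the exact Ganeti on-disk format):

```python
import json

def load_kvm_runtime(serialized):
    """Load a runtime file, upgrading old-style content on the fly."""
    runtime = json.loads(serialized)
    if len(runtime) == 3:
        # Old style: [kvm_cmd, nics, hvparams].  Disks are still
        # encapsulated inside kvm_cmd, so add an empty block device
        # list; it is written back on the next save.
        runtime.append([])
    kvm_cmd, nics, hvparams, disks = runtime
    return kvm_cmd, nics, hvparams, disks

old_style = json.dumps([["kvm", "-name", "test"], [], {}])
kvm_cmd, nics, hvparams, disks = load_kvm_runtime(old_style)
print(disks)  # [] -- hotplugged disks get appended here from now on
```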

    
Configuration changes
---------------------

The ``NIC`` and ``Disk`` objects get one extra slot: ``pci``. It refers
to the PCI slot that the device gets plugged into.

    
In order to be able to live-migrate successfully, the runtime files
should be updated every time a live modification (hotplug) takes place.
To this end, we change the format of the runtime files. The KVM options
referring to the instance's disks are no longer recorded as part of the
KVM command line. Disks are treated separately, just as we treat NICs
right now. We insert and remove entries to reflect the current PCI
configuration.

    
Backend changes
---------------

Introduce one new RPC call:

- hotplug_device(DEVICE_TYPE, ACTION, device, ...)

where DEVICE_TYPE can be either NIC or Disk, and ACTION either REMOVE
or ADD.
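
As an illustration, noded could dispatch this call to the
hypervisor-specific code roughly as follows (the constants, method
names and stub class are hypothetical, not Ganeti's actual API):

```python
NIC, DISK = "NIC", "DISK"
ADD, REMOVE = "ADD", "REMOVE"

def hotplug_device(hyper, instance, dev_type, action, device):
    """Dispatch a hotplug request to the hypervisor-specific code."""
    dispatch = {
        (NIC, ADD): hyper.hotplug_nic_add,
        (NIC, REMOVE): hyper.hotplug_nic_remove,
        (DISK, ADD): hyper.hotplug_disk_add,
        (DISK, REMOVE): hyper.hotplug_disk_remove,
    }
    try:
        handler = dispatch[(dev_type, action)]
    except KeyError:
        raise ValueError("unsupported hotplug request: %s %s"
                         % (action, dev_type))
    return handler(instance, device)

# Minimal stub standing in for the KVM hypervisor backend:
class StubKVM:
    def hotplug_nic_add(self, instance, device):
        return "added NIC %s to %s" % (device, instance)
    def hotplug_nic_remove(self, instance, device):
        return "removed NIC %s from %s" % (device, instance)
    def hotplug_disk_add(self, instance, device):
        return "added disk %s to %s" % (device, instance)
    def hotplug_disk_remove(self, instance, device):
        return "removed disk %s from %s" % (device, instance)

print(hotplug_device(StubKVM(), "test", NIC, ADD, "nic-pci-6"))
```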

    
Hypervisor changes
------------------

    
We implement hotplug on top of the KVM hypervisor. We take advantage of
the QEMU 1.0 monitor commands (``device_add``, ``device_del``,
``drive_add``, ``drive_del``, ``netdev_add``, ``netdev_del``). QEMU
refers to devices based on their id; we use the ``uuid`` to name them
properly. If a device is about to be hotplugged, we parse the output of
``info pci``, find the occupied PCI slots, and choose the first
available one; the whole device object is then appended to the
corresponding entry in the runtime file.

    
Concerning NIC handling, we build on top of the existing logic (first
create a tap with _OpenTap() and then pass its file descriptor to the
KVM process). To this end, we need to pass access rights to the
corresponding file descriptor over the monitor socket (a UNIX domain
socket). The open file is passed as a socket-level control message
(SCM_RIGHTS), using the ``fdsend`` python library.
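
The fd-passing step can be sketched with the standard library alone:
Python 3.9+ exposes the same SCM_RIGHTS mechanism that ``fdsend``
wraps (POSIX only; a pipe stands in for the tap device here):

```python
import os
import socket

# Pass an open file descriptor as an SCM_RIGHTS control message over
# a UNIX domain socket -- what fdsend does under the hood.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r_fd, w_fd = os.pipe()                   # stands in for the tap fd
socket.send_fds(sender, [b"tap"], [r_fd])

msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, maxfds=1)
os.write(w_fd, b"ping")
data = os.read(fds[0], 4)                # received fd shares the pipe
print(msg, data)
```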

    
User interface
--------------

The new ``--hotplug`` option to ``gnt-instance modify`` is introduced,
which forces live modifications.

    
Enabling hotplug
++++++++++++++++

Hotplug will be optional during ``gnt-instance modify``. For existing
instances, after installing a version that supports hotplugging, we
have the restriction that hotplug will not be supported for existing
devices. The reason is that old runtime files lack:

1. Device PCI configuration info.

2. A separate block device entry.

Hotplug will be supported only for KVM in the first implementation. For
all other hypervisors, the backend will raise an exception in case
hotplug is requested.

    
NIC hotplug
+++++++++++

The user can add/modify/remove NICs either with hotplugging or not. If
a NIC is to be added, a tap is created first and configured properly
with the kvm-vif-bridge script. Then the instance gets a new network
interface. Since there is no QEMU monitor command to modify a NIC, we
modify a NIC by temporarily removing the existing one and adding a new
one with the new configuration. When removing a NIC, the corresponding
tap gets removed as well.

::

 gnt-instance modify --net add --hotplug test
 gnt-instance modify --net 1:mac=aa:00:00:55:44:33 --hotplug test
 gnt-instance modify --net 1:remove --hotplug test

    
Disk hotplug
++++++++++++

The user can add and remove disks with hotplugging or not. The QEMU
monitor supports resizing of disks, but the initial implementation will
support only disk addition/removal.

::

 gnt-instance modify --disk add:size=1G --hotplug test
 gnt-instance modify --disk 1:remove --hotplug test

    
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: