=======
Hotplug
=======

.. contents:: :depth: 4

This is a design document detailing the implementation of device
hotplug in Ganeti. The logic used is hypervisor-agnostic, but the
initial implementation will target the KVM hypervisor. The
implementation adds ``python-fdsend`` as a new dependency. If it is not
installed, hotplug will not be possible and the user will be notified
with a warning.

Current state and shortcomings
==============================

Currently, Ganeti supports the addition/removal/modification of devices
(NICs, disks), but the actual modification takes place only after
rebooting the instance. As a result, an instance cannot change networks,
get a new disk, etc. without a hard reboot.

Until now, in the case of the KVM hypervisor, the code has neither named
devices nor placed them in specific PCI slots. Devices are appended to
the KVM command line and Ganeti lets KVM decide where to place them.
This means that a device residing in PCI slot 5 may, after a reboot
(e.g. due to the removal of another device), be moved to another PCI
slot and probably get renamed too (due to udev rules, etc.).

For a migration to succeed, the process on the target node must be
started with exactly the same machine version, CPU architecture, and PCI
configuration as the running process. During instance creation/startup,
Ganeti creates a KVM runtime file with all the information necessary to
generate the KVM command. This runtime file is used during instance
migration to start a new, identical KVM process. The current format
includes the fixed part of the final KVM command, a list of NICs, and
the hvparams dict. It does not allow easy manipulation of disks, because
they are encapsulated in the fixed KVM command.

Proposed changes
================

In the case of the KVM hypervisor, QEMU exposes 32 PCI slots to the
instance. Disks and NICs occupy some of these slots. Recent versions of
QEMU have introduced monitor commands that allow the addition/removal of
PCI devices. Devices are referenced based on their name or position on
the virtual PCI bus. To be able to use these commands, we need to be
able to assign each device a unique name.

To keep track of where each device is plugged in, we add a ``pci`` slot
to the Disk and NIC objects, but we save it only in runtime files, since
it is hypervisor-specific information. This is added for easy object
manipulation, and it is ensured that it is never written back to the
config.

We propose to make use of QEMU 1.7 QMP commands so that modifications to
devices take effect instantly, without the need for a hard reboot. The
only change exposed to the end user will be the addition of a
``--hotplug`` option to the ``gnt-instance modify`` command.

Upon hotplugging, the PCI configuration of an instance is changed.
Runtime files should be updated correspondingly. Currently this is
impossible in the case of disk hotplug, because disks are included in
the command-line entry of the runtime file, contrary to NICs, which are
correctly treated separately. We change the format of runtime files: we
remove disks from the fixed KVM command and create a new entry
containing them only. KVM options concerning disks are generated during
``_ExecuteKVMCommand()``, just like for NICs.
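
The runtime-file change can be sketched as follows; the entry contents
below are illustrative placeholders, not Ganeti's exact serialization:

```python
import json

# Illustrative sketch: disks move out of the fixed KVM command line into
# a separate entry, mirroring how NICs are already stored separately.
old_runtime = [
    "kvm -name test -drive file=/path/to/disk0",   # disks buried in kvm_cmd
    ["<serialized NIC 0>", "<serialized NIC 1>"],  # separate NIC entry
    {"acpi": True},                                # hvparams
]

new_runtime = [
    "kvm -name test",                              # fixed part, no disk options
    ["<serialized NIC 0>", "<serialized NIC 1>"],
    {"acpi": True},
    ["<serialized Disk 0>"],                       # new separate disk entry
]

# Both layouts remain plain JSON-serializable structures.
roundtrip = json.loads(json.dumps(new_runtime))
```

With disks in their own entry, hotplug only has to insert or remove list
items instead of rewriting the fixed command line.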

Design decisions
================

What should each device's ID be? Currently, KVM does not support
arbitrary IDs for devices; only names starting with a letter, at most 32
characters long, and containing only the special characters '.', '_',
and '-' are supported. For debugging purposes, and in order to be more
informative, devices will be named after:
<device type>-<part of uuid>-pci-<slot>.
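
A minimal sketch of this naming scheme (the helper name is hypothetical,
and taking the first uuid component as the "<part of uuid>" is an
assumption for illustration):

```python
import re

# KVM id constraints stated above: start with a letter, at most 32
# characters, only '.', '_' and '-' as special characters.
_KVM_ID_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9._-]{0,31}$")

def MakeDeviceId(dev_type, uuid, pci_slot):
    """Build an id like 'disk-9f23ab10-pci-5' (hypothetical helper)."""
    # Using the first uuid component is an assumption for illustration.
    dev_id = "%s-%s-pci-%d" % (dev_type, uuid.split("-")[0], pci_slot)
    if not _KVM_ID_RE.match(dev_id):
        raise ValueError("invalid KVM device id: %s" % dev_id)
    return dev_id
```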

Who decides where to hotplug each device? As long as this is a
hypervisor-specific matter, there is no point in the master node
deciding such a thing. The master node just has to request that noded
hotplug a device. To this end, hypervisor-specific code should parse the
current PCI configuration (i.e. the ``query-pci`` QMP command), find the
first available slot, and hotplug the device there. By having noded
decide where to hotplug a device, we ensure that no error will occur due
to duplicate slot assignment (if masterd kept track of PCI reservations
and noded failed to return the PCI slot that the device was plugged
into, then the next hotplug would fail).
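
The slot-selection step could look like the following sketch, assuming
the shape of a ``query-pci`` reply (a list of buses, each with a
``devices`` list whose members carry a ``slot`` field); the helper name
is hypothetical:

```python
def GetFreePciSlot(query_pci_reply, max_slots=32):
    """Return the first unoccupied slot out of QEMU's 32 PCI slots."""
    occupied = set()
    for bus in query_pci_reply:
        for dev in bus["devices"]:
            occupied.add(dev["slot"])
    for slot in range(max_slots):
        if slot not in occupied:
            return slot
    raise RuntimeError("no free PCI slot left")

# Example reply fragment: QEMU's four default pre-occupied slots.
reply = [{"bus": 0, "devices": [{"slot": s} for s in range(4)]}]
```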

Where should we keep track of devices' PCI slots? As already mentioned,
we must keep track of devices' PCI slots to successfully migrate
instances. The first option is to save this information in the config
data, which would allow us to place each device in the same PCI slot
after a reboot. This would require making the hypervisor return the PCI
slot chosen for each device and storing this information in the config
data. Additionally, the whole instance configuration would have to be
returned with the PCI slots filled in after instance start, and each
instance would have to keep track of its current PCI reservations. We
decide not to go in this direction, in order to keep things simple and
to avoid adding hypervisor-specific information to the configuration
data (``pci_reservations`` at the instance level and ``pci`` at the
device level). For the aforementioned reasons, we decide to store this
information only in the KVM runtime files.

Where should the devices be placed upon instance startup? QEMU has, by
default, 4 pre-occupied PCI slots, so the hypervisor can use the
remaining ones for disks and NICs. Currently, the PCI configuration is
not preserved after a reboot. Each time an instance starts, KVM assigns
PCI slots to devices based on their ordering in the Ganeti
configuration, i.e. the second disk will be placed after the first, the
third NIC after the second, etc. Since we decided that there is no need
to keep track of devices' PCI slots, there is no need to change the
current functionality.
113

    
114
How to deal with existing instances? Hotplug depends on runtime file
115
manipulation. It stores there pci info and every device the kvm process is
116
currently using. Existing files have no pci info in devices and have block
117
devices encapsulated inside kvm_cmd entry. Thus hotplugging of existing devices
118
will not be possible. Still migration and hotplugging of new devices will
119
succeed. The workaround will happen upon loading kvm runtime: if we detect old
120
style format we will add an empty list for block devices and upon saving kvm
121
runtime we will include this empty list as well. Switching entirely to new
122
format will happen upon instance reboot.
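
The loading workaround can be sketched like this (the entry layout is
illustrative; the real runtime file also carries serialized objects):

```python
import json

def LoadKvmRuntime(serialized):
    """Load a runtime file, upgrading old-style content on the fly."""
    loaded = json.loads(serialized)
    if len(loaded) == 3:      # old format: kvm_cmd, nics, hvparams
        loaded.append([])     # no separately tracked block devices
    return loaded             # new format: kvm_cmd, nics, hvparams, disks
```

Saving the runtime after such a load writes the empty list back, so the
file converges to the new format without touching the kvm_cmd entry.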

Configuration changes
---------------------

The ``NIC`` and ``Disk`` objects get one extra slot: ``pci``. It refers
to the PCI slot that the device gets plugged into.

In order to be able to live-migrate successfully, runtime files should
be updated every time a live modification (hotplug) takes place. To this
end, we change the format of runtime files. The KVM options referring to
the instance's disks are no longer recorded as part of the KVM command
line. Disks are treated separately, just as we treat NICs right now. We
insert and remove entries to reflect the current PCI configuration.
137

    
138

    
139
Backend changes
140
---------------
141

    
142
Introduce one new RPC call:
143

    
144
- hotplug_device(DEVICE_TYPE, ACTION, device, ...)
145

    
146
where DEVICE_TYPE can be either NIC or Disk, and ACTION either REMOVE or ADD.
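
A hedged sketch of how noded could dispatch this call to
hypervisor-specific code; the constant and method names here are
hypothetical, not Ganeti's actual API:

```python
DEV_NIC, DEV_DISK = "NIC", "Disk"
ACT_ADD, ACT_REMOVE = "ADD", "REMOVE"

def HotplugDevice(hyp, dev_type, action, device):
    """Route a hotplug request to the right hypervisor method."""
    dispatch = {
        (DEV_NIC, ACT_ADD): hyp.HotAddNic,
        (DEV_NIC, ACT_REMOVE): hyp.HotDelNic,
        (DEV_DISK, ACT_ADD): hyp.HotAddDisk,
        (DEV_DISK, ACT_REMOVE): hyp.HotDelDisk,
    }
    try:
        fn = dispatch[(dev_type, action)]
    except KeyError:
        raise ValueError("unsupported hotplug request: %s %s"
                         % (action, dev_type))
    return fn(device)
```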
147

    
148
Hypervisor changes
149
------------------
150

    
151
We implement hotplug on top of the KVM hypervisor. We take advantage of
152
QEMU 1.7 QMP commands (``device_add``, ``device_del``,
153
``blockdev-add``, ``netdev_add``, ``netdev_del``). Since ``drive_del``
154
is not yet implemented in QMP we use the one of HMP. QEMU
155
refers to devices based on their id. We use ``uuid`` to name them
156
properly. If a device is about to be hotplugged we parse the output of
157
``query-pci`` and find the occupied PCI slots. We choose the first
158
available and the whole device object is appended to the corresponding
159
entry in the runtime file.
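
Assuming the standard QMP wire format ({"execute": ..., "arguments":
...}), the hotplug commands could be built as in this sketch (helper
names are hypothetical; ``addr`` is the qdev property addressing a PCI
slot):

```python
def QmpDeviceAdd(driver, dev_id, pci_slot):
    """Build a ``device_add`` command targeting a given PCI slot."""
    return {
        "execute": "device_add",
        "arguments": {
            "driver": driver,
            "id": dev_id,
            "addr": "0x%x" % pci_slot,   # slot on the virtual PCI bus
        },
    }

def QmpDeviceDel(dev_id):
    """Build the matching ``device_del`` command."""
    return {"execute": "device_del", "arguments": {"id": dev_id}}

cmd = QmpDeviceAdd("virtio-blk-pci", "disk-9f23ab10-pci-5", 5)
```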

Concerning NIC handling, we build on top of the existing logic (first
create a tap with _OpenTap() and then pass its file descriptor to the
KVM process). To this end, we need to pass access rights to the
corresponding file descriptor over the QMP socket (a UNIX domain
socket). The open file is passed as a socket-level control message
(SCM), using the ``fdsend`` python library.
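
``fdsend`` wraps the SCM_RIGHTS control-message mechanism; a minimal
stdlib sketch of the same idea (hypothetical helpers, not Ganeti's or
``fdsend``'s code) looks like this:

```python
import array
import socket

def SendFdOverSocket(sock, payload, fd):
    """Send *payload* with *fd* attached as an SCM_RIGHTS message."""
    rights = array.array("i", [fd])
    sock.sendmsg([payload],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])

def RecvFdOverSocket(sock, buflen=4096):
    """Receive a payload plus the single attached file descriptor."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        buflen, socket.CMSG_LEN(fds.itemsize))
    level, ctype, data = ancdata[0]
    assert level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS
    fds.frombytes(data[:fds.itemsize])
    return msg, fds[0]
```

In the hotplug case, the payload would be the ``getfd`` QMP command
naming the descriptor, so that a later ``netdev_add`` can reference it.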

User interface
--------------

A new ``--hotplug`` option to ``gnt-instance modify`` is introduced,
which forces live modifications.

Enabling hotplug
++++++++++++++++

Hotplug will be optional during ``gnt-instance modify``. For existing
instances, after installing a version that supports hotplugging, there
is the restriction that hotplug will not be supported for existing
devices. The reason is that old runtime files lack:

1. Device PCI configuration info.

2. A separate block device entry.

Hotplug will be supported only for KVM in the first implementation. For
all other hypervisors, the backend will raise an exception in case
hotplug is requested.
191

    
192

    
193
NIC Hotplug
194
+++++++++++
195

    
196
The user can add/modify/remove NICs either with hotplugging or not. If a
197
NIC is to be added a tap is created first and configured properly with
198
kvm-vif-bridge script. Then the instance gets a new network interface.
199
Since there is no QEMU monitor command to modify a NIC, we modify a NIC
200
by temporary removing the existing one and adding a new with the new
201
configuration. When removing a NIC the corresponding tap gets removed as
202
well.
203

    
204
::
205

    
206
 gnt-instance modify --net add --hotplug test
207
 gnt-instance modify --net 1:mac=aa:00:00:55:44:33 --hotplug test
208
 gnt-instance modify --net 1:remove --hotplug test
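
The remove-then-add modification described above could be sketched as
the following QMP sequence (the wrapper and argument layouts are
illustrative, and the tap/fd handling is omitted):

```python
def HotModifyNic(run_qmp, dev_id, netdev_id, netdev_args, device_args):
    """Modify a NIC by deleting it and re-adding it with new parameters."""
    ops = [
        {"execute": "device_del", "arguments": {"id": dev_id}},
        {"execute": "netdev_del", "arguments": {"id": netdev_id}},
        {"execute": "netdev_add", "arguments": dict(netdev_args, id=netdev_id)},
        {"execute": "device_add", "arguments": dict(device_args, id=dev_id)},
    ]
    for op in ops:
        run_qmp(op)
    return ops

executed = []
HotModifyNic(executed.append, "nic-9f23ab10-pci-6", "netdev-9f23ab10",
             {"type": "tap"}, {"driver": "virtio-net-pci"})
```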

Disk Hotplug
++++++++++++

The user can add and remove disks with hotplugging or not. The QEMU
monitor supports resizing of disks, but the initial implementation will
support only disk addition/deletion.

::

 gnt-instance modify --disk add:size=1G --hotplug test
 gnt-instance modify --disk 1:remove --hotplug test
222

    
223

    
224
Dealing with chroot and uid pool (and disks in general)
225
-------------------------------------------------------
226

    
227
The design so far covers all issues that arise without addressing the
228
case where the kvm process will not run with root privileges.
229
Specifically:
230

    
231
- in case of chroot, the kvm process cannot see the newly created device
232

    
233
- in case of uid pool security model, the kvm process is not allowed
234
  to access the device
235

    
236
For NIC hotplug we address this problem by using the ``getfd`` QMP
237
command and passing the file descriptor to the kvm process over the
238
monitor socket using SCM_RIGHTS. For disk hotplug and in case of uid
239
pool we can let the hypervisor code temporarily ``chown()`` the  device
240
before the actual hotplug. Still this is insufficient in case of chroot.
241
In this case, we need to ``mknod()`` the device inside the chroot. Both
242
workarounds can be avoided, if we make use of the ``add-fd``
243
QMP command, that was introduced in version 1.7. This command is the
244
equivalent of NICs' `get-fd`` for disks and will allow disk hotplug in
245
every case. So, if the QMP does not support the ``add-fd``
246
command, we will not allow disk hotplug
247
and notify the user with the corresponding warning.
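
That capability check could be sketched as follows, assuming the shape
of a ``query-commands`` reply (a list of {"name": ...} entries); the
helper name is hypothetical:

```python
def HasAddFdSupport(query_commands_reply):
    """True if the monitor advertises the ``add-fd`` QMP command."""
    supported = set(item["name"] for item in query_commands_reply)
    return "add-fd" in supported

# Example reply fragments for a new and an old monitor.
modern = [{"name": "device_add"}, {"name": "add-fd"}]
old = [{"name": "device_add"}]
```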

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
.. End: