===============================
Ganeti OS installation redesign
===============================

.. contents:: :depth: 3

This design document details a new OS installation procedure that is more
secure, provides more features, and is easier to use for many common tasks
than the current one.

Current state and shortcomings
==============================

As of Ganeti 2.10, each instance is associated with an OS definition. An OS
definition is a set of scripts (i.e., ``create``, ``export``, ``import``,
``rename``) that are executed with root privileges on the primary host of the
instance. These scripts are responsible for performing all the OS-related
tasks, namely, creating an instance, setting up an operating system on the
instance's disks, exporting/importing the instance, and renaming the instance.

These scripts receive, through environment variables, a fixed set of instance
parameters (such as the hypervisor, the name of the instance, the number of
disks and their location) and a set of user-defined parameters. Both the
instance and user-defined parameters are written to the Ganeti configuration
file, to allow future reinstalls of the instance, and to various log files,
namely:

* node daemon log file: contains DEBUG strings of the ``/os_validate``,
  ``/instance_os_add`` and ``/instance_start`` RPC calls.

* master daemon log file: DEBUG strings related to the same RPC calls are stored
  here as well.

* commands log: the CLI commands that create a new instance, including their
  parameters, are logged here.

* RAPI log: the RAPI commands that create a new instance, including their
  parameters, are logged here.

* job logs: the job files stored in the job queue, or in its archive, contain
  the parameters.

The current situation presents a number of shortcomings:

* Having the installation scripts run as root on the nodes does not allow
  user-defined OS scripts, as they would pose a huge security risk.
  Furthermore, even a script without malicious intentions might end up
  disrupting a node because of a bug.

* Ganeti cannot be used to create instances starting from user-provided disk
  images: even in the (hypothetical) case in which the scripts are completely
  secure and run not by root but by an unprivileged user with only the power to
  mount arbitrary files as disk images, this is still a security issue. It has
  been proven that a carefully crafted file system might exploit kernel
  vulnerabilities to gain control of the system. Therefore, directly mounting
  images on the Ganeti nodes is not an option.

* There is no way to inject files into an existing disk image. A common use case
  is for the system administrator to provide a standard image of the system, to
  be later personalized with the network configuration, private keys identifying
  the machine, ssh keys of the users, and so on. A possible workaround would be
  for the scripts to mount the image (only if it is trusted!) and to receive
  the configurations and ssh keys as user-defined OS parameters. Unfortunately,
  this is also not an option for security-sensitive material (such as the ssh
  keys) because the OS parameters are stored in many places on the system, as
  already described above.

* Most other virtualization software allows only instance images, but no
  installation scripts. This difference makes the interaction between Ganeti and
  other software difficult.

Proposed changes
================

In order to fix the shortcomings of the current state, we plan to introduce the
following changes.

OS parameter categories
+++++++++++++++++++++++

Change the OS parameters to have three categories:

* ``public``: the current behavior. The parameter is logged and stored freely.

* ``private``: the parameter is saved inside the Ganeti configuration (to allow
  for instance reinstall) but it is not shown in logs, job logs, or passed back
  via RAPI.

* ``secret``: the parameter is not saved inside the Ganeti configuration.
  Reinstalls are impossible unless the data is passed again. The parameter will
  not appear in any log file. When an operation is performed jointly by
  multiple daemons (such as MasterD and LuxiD), Ganeti currently sometimes
  serializes jobs on disk and later reloads them. Secret parameters will not be
  serialized to disk. They will be passed around as part of the LUXI calls
  exchanged by the daemons, and only kept in memory, in order to reduce their
  accessibility as much as possible. In case of failure of the master node,
  these parameters will be lost and cannot be recovered because they are not
  serialized. As a result, the job cannot be taken over by the new master. This
  is an expected and accepted side effect of jobs with secret parameters: if
  they fail, they'll have to be restarted manually.

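The three categories can be illustrated with a minimal Python sketch. It is
not Ganeti code: the function names ``for_logging`` and ``for_config``, the
``<redacted>`` placeholder, and the sample parameters are assumptions made
only for this example, and masking (rather than omitting) non-public values
in logs is one possible choice.

```python
import json

PUBLIC, PRIVATE, SECRET = "public", "private", "secret"


def for_logging(params):
    """Return a view that is safe to log: only public values stay readable."""
    return {
        name: (value if visibility == PUBLIC else "<redacted>")
        for name, (value, visibility) in params.items()
    }


def for_config(params):
    """Return what may be saved in the configuration: all but secrets."""
    return {
        name: (value, visibility)
        for name, (value, visibility) in params.items()
        if visibility != SECRET
    }


# Hypothetical parameters, one per category.
params = {
    "dist": ("jessie", PUBLIC),
    "root_password_hash": ("$6$example", PRIVATE),
    "api_token": ("s3cr3t", SECRET),
}

print(json.dumps(for_logging(params), sort_keys=True))
print(sorted(for_config(params)))
```

With this split, a reinstall can reuse everything returned by ``for_config``,
while the secret values have to be supplied again by the caller.
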
Metadata
++++++++

In order to allow metadata to be sent inside the instance, a communication
mechanism between the instance and the host will be created. This mechanism
will be bidirectional (e.g.: to allow the setup process going on inside the
instance to communicate its progress to the host). Each instance will have
access exclusively to its own metadata, and it will only be able to communicate
with its host over this channel. This is the approach followed by the
``cloud-init`` tool and more details will be provided in the `Communication
mechanism`_ and `Metadata service`_ sections.

Installation procedure
++++++++++++++++++++++

A new installation procedure will be introduced. There will be two sets of
parameters, namely, installation parameters, which are used mainly for installs
and reinstalls, and execution parameters, which are used in all the other runs
that are not part of an installation procedure. Also, it will be possible to
use an installation medium and/or run the OS scripts in an optional virtualized
environment, and optionally use a personalization package. This section details
all of these options.

The set of installation parameters will make it possible, for example, to
attach an installation floppy/cdrom/network, change the boot device order, or
specify a disk image to be used. Through this set of parameters, the
administrator will have to provide the hypervisor with a location for an
installation medium for the instance (e.g., a boot disk, a network image, etc.).
This medium will carry out the installation of the instance onto the instance's
disks and will then be responsible for getting the parameters for configuring
the instance, such as network interfaces, IP address, and hostname. These
parameters are taken from the metadata. The installation parameters will be
stored in the configuration of Ganeti and used in future reinstalls, but not
during normal execution.

The instance is reinstalled using the same installation parameters as the
first installation. However, it will be the administrator's responsibility to
ensure that the installation medium is still available at the proper location
when a reinstall occurs.

The parameter ``--os-parameters`` can still be used to specify the OS
parameters. However, without OS scripts, Ganeti cannot do more than a syntactic
check to validate the supplied OS parameter string. As a result, this string
will be passed directly to the instance as part of the metadata. If OS scripts
are used and the installation procedure is running inside a virtualized
environment, Ganeti will take these parameters from the metadata and pass them
to the OS scripts as environment variables.

Ganeti allows the following installation options:

* Use a disk image:

  Currently, it is already possible to specify an installation medium, such as
  a cdrom, but not a disk image. Therefore, a new parameter ``--os-image`` will
  be used to specify the location of a disk image which will be dumped to the
  instance's first disk before the instance is started. The location of the
  image can be a URL and, if this is the case, Ganeti will download this image.

* Run OS scripts:

  The parameter ``--os-type`` (short version: ``-o``) is currently used to
  specify the OS scripts. This parameter will still be used to specify the OS
  scripts with the difference that these scripts may optionally run inside a
  virtualized environment for safety reasons, depending on whether they are
  trusted or not. For more details on trusted and untrusted OS scripts, refer
  to the `Installation process in a virtualized environment`_ section. Note
  that this parameter will become optional, thus allowing a user to create an
  instance specifying only, for example, a disk image or a cdrom image to boot
  from.

* Personalization package

  As part of the instance creation command, it will be possible to indicate a
  URL for a "personalization package", which is an archive containing a set of
  files meant to be overlaid on top of the OS file system at the end of the
  setup process and before the VM is started for the first time in normal mode.
  Ganeti will provide a mechanism for receiving and unpacking this archive,
  independently of whether the installation is being performed inside the
  virtualized environment or not.

  The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or
  ``.tgz``) and contain the files according to the directory structure that will
  be recreated on the installation disk. Files contained in this archive will
  overwrite files with the same path created during the installation procedure
  (if any). The URL of the "personalization package" will have to specify an
  extension to identify the file format (in order to allow for more formats to
  be supported in the future). The URL will be stored as part of the
  configuration of the instance (therefore, the URL should not contain
  confidential information, but the files available there can).

  It is up to the system administrator to ensure that a package is actually
  available at that URL at install and reinstall time. The contents of the
  package are allowed to change. E.g.: a system administrator might create a
  package containing the private keys of the instance being created. When the
  instance is reinstalled, a new package with new keys can be made available
  there, thus allowing instance reinstall without the need to store keys. A
  username and a password can be specified together with the URL. If the URL is
  an HTTP(S) URL, they will be used as basic access authentication credentials
  to access that URL. The username and password will not be saved in the config,
  and will have to be provided again in case a reinstall is requested.

  The downloaded personalization package will not be stored locally on the node
  for longer than it is needed while unpacking it and adding its files to the
  instance being created. The personalization package will be overlaid on top
  of the instance filesystem after the scripts that created it have been
  executed. In order for the files in the package to be automatically overlaid
  on top of the instance filesystem, it is required that the appliance is
  actually able to mount the instance's disks. As a result, this will not work
  for every filesystem.

* Combine a disk image, OS scripts, and a personalization package

  It will be possible to combine a disk image, OS scripts, and a personalization
  package, either with or without a virtualized environment (see the exception
  below). At least an installation medium or OS scripts must be specified.

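The overlay step of the personalization package can be sketched in Python as
follows. This is only an illustration under stated assumptions: the function
name ``overlay_package`` is hypothetical, the target directory stands in for
the already mounted instance filesystem, downloading the archive is left out,
and rejecting anything other than regular files and directories is a
conservative choice made for the example rather than a requirement of this
design.

```python
import io
import os
import tarfile
import tempfile

SUPPORTED_EXTENSIONS = (".tar.gz", ".tgz")


def overlay_package(archive_path, target_root):
    """Unpack a TAR-GZIP package on top of target_root, overwriting files."""
    if not archive_path.endswith(SUPPORTED_EXTENSIONS):
        raise ValueError("unsupported package format: %s" % archive_path)
    root = os.path.realpath(target_root)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Conservative choice: accept only regular files and directories.
            if not (member.isfile() or member.isdir()):
                raise ValueError("unsupported member: %s" % member.name)
            # Refuse absolute paths and paths escaping the target directory.
            dest = os.path.realpath(os.path.join(root, member.name))
            if dest != root and not dest.startswith(root + os.sep):
                raise ValueError("unsafe path in package: %s" % member.name)
        archive.extractall(root, members=members)


# Usage: overlay a one-file package on a directory standing in for the
# mounted instance filesystem.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "rootfs")
os.makedirs(os.path.join(target, "etc"))
with open(os.path.join(target, "etc", "hostname"), "w") as handle:
    handle.write("template\n")

package = os.path.join(workdir, "personalization.tgz")
payload = b"instance1.example.com\n"
with tarfile.open(package, "w:gz") as archive:
    info = tarfile.TarInfo("etc/hostname")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

overlay_package(package, target)
with open(os.path.join(target, "etc", "hostname")) as handle:
    hostname = handle.read()
```

After the call, ``etc/hostname`` holds the content shipped in the package,
which matches the rule that packaged files overwrite files created during the
installation.
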
The disk image of the actual virtual appliance, which bootstraps the virtual
environment used in the installation procedure, will be read only, so that a
pristine copy of the appliance can be started every time a new instance needs
to be created and to further increase security. The data the instance needs
to write at runtime will only be stored in RAM and disappear as soon as the
instance is stopped.

The parameter ``--enable-safe-install=yes|no`` will be used to give the
administrator control over whether to use a virtualized environment for the
installation procedure. By default, a virtualized environment will be used.
Note that some feature combinations, such as using untrusted scripts, will
require the virtualized environment. In this case, Ganeti will not allow
disabling the virtualized environment.

Implementation
==============

The implementation of this design will happen as an ordered sequence of steps,
of increasing impact on the system and, in some cases, dependent on each other:

#. Private and secret instance parameters
#. Communication mechanism between host and instance
#. Metadata service
#. Personalization package (inside a virtualization environment)
#. Instance creation via a disk image
#. Instance creation inside a virtualized environment

Some of these steps need to be specified in more detail than what is already
written in the `Proposed changes`_ section. Extra details will be provided in
the following subsections.

Communication mechanism
+++++++++++++++++++++++

The communication mechanism will be an exclusive, generic, bidirectional
communication channel between Ganeti hosts and guests.

exclusive
  The communication mechanism allows communication between a guest and its host,
  but it does not allow a guest to communicate with other guests or reach the
  outside world.

generic
  The communication mechanism allows a guest to reach any service on the host,
  not just the metadata service. Examples of valid communication include, but
  are not limited to, accessing the metadata service, sending commands to
  Ganeti, requesting changes to parameters, such as those related to
  distribution upgrades, and letting Ganeti control a helper instance, such as
  the one for performing OS installs inside a safe environment.

bidirectional
  The communication mechanism allows communication to be initiated from either
  party, namely, from a host to a guest or guest to host.

Note that Ganeti will allow communication with any service (e.g., daemon) running
on the host and, as a result, Ganeti will not be responsible for ensuring that
only the metadata service is reachable. It is the responsibility of each system
administrator to ensure that the extra firewalling and routing rules specified
on the host provide the necessary protection on a given Ganeti installation and,
at the same time, do not accidentally override the behaviour hereby described
which makes the communication between the host and the guest exclusive, generic,
and bidirectional, unless intended.

The communication mechanism will be enabled automatically during an installation
procedure that requires a virtualized environment, but, for backwards
compatibility, it will be disabled when the instance is running normally, unless
explicitly requested. Specifically, a new parameter ``--communication=yes|no``
(short version: ``-C``) will be added to ``gnt-instance add`` and ``gnt-instance
modify``. This parameter will determine whether the communication mechanism is
enabled for a particular instance. The value of this parameter will be saved as
part of the instance's configuration.

The communication mechanism will be implemented through network interfaces on
the host and the guest, and Ganeti will be responsible for the host side,
namely, creating a TAP interface for each guest and configuring these interfaces
to have name ``gnt.com.%d``, where ``%d`` is a unique number within the host
(e.g., ``gnt.com.0`` and ``gnt.com.1``), IP address ``169.254.169.254``, and
netmask ``255.255.255.255``. The interface's name allows DHCP servers to
recognize which interfaces are part of the communication mechanism.

This network interface will be connected to the guest's last network interface,
which is meant to be used exclusively for the communication mechanism and is
defined after all the user-defined interfaces. The last interface was chosen
(as opposed to the first one, for example) because the first interface is
generally understood as the main gateway out, and also because it minimizes the
impact on existing systems, for example, in a scenario where the system
administrator has a running cluster and wants to enable the communication
mechanism for already existing instances, which might have been created with
older versions of Ganeti. Further, DBus should assist in keeping the guest
network interfaces more stable.

On the guest side, each instance will have its own MAC address and IP address.
Both the guest's MAC address and IP address must be unique within a single
cluster. An IP is unique within a single cluster, and not within a single host,
in order to minimize disruption of connectivity, for example, during live
migration, in particular since an instance is not aware when it changes host.
Unfortunately, a side-effect of this decision is that a cluster can have at most
as many instances with communication enabled as fit in a ``/16`` network. If
it is necessary to overcome this limit, it should be possible to allow
different networks to be configured link-local only.

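The ``/16`` limit can be made concrete with a short Python sketch. It assumes
that guest addresses are drawn from the link-local network ``169.254.0.0/16``
(implied by the host-side address, but not spelled out above) and uses a
hypothetical ``allocate_guest_ip`` helper with a naive first-free strategy; a
real allocator would also have to persist the assignment in the cluster
configuration.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
HOST_SIDE = ipaddress.ip_address("169.254.169.254")


def allocate_guest_ip(in_use):
    """Return the first link-local address not yet used in the cluster."""
    for candidate in LINK_LOCAL.hosts():
        if candidate != HOST_SIDE and candidate not in in_use:
            return candidate
    raise RuntimeError("no free communication address left in the cluster")


# Usable guest addresses: the whole network minus the network address, the
# broadcast address and the address reserved for the host side.
capacity = LINK_LOCAL.num_addresses - 3

first = allocate_guest_ip(set())
second = allocate_guest_ip({first})
print(capacity, first, second)
```
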
The guest will use the DHCP protocol on its last network interface to contact a
DHCP server running on the host and thus determine its IP address. The DHCP
server is configured, started, and stopped by Ganeti and it will be listening
exclusively on the TAP network interfaces of the guests in order not to
interfere with a potential DHCP server running on the same host. Furthermore,
the DHCP server will only recognize MAC and IP address pairs that have been
approved by Ganeti.

The TAP network interfaces created for each guest share the same IP address.
Therefore, it will be necessary to extend the routing table with rules specific
to each guest. This can be achieved with the following command, which takes the
guest's unique IP address and its TAP interface::

  route add -host <ip> dev <ifname>

This rule has the additional advantage of preventing guests from trying to lease
IP addresses from the DHCP server other than the one that has been assigned to
them by Ganeti. The guest could lie about its MAC address to the DHCP server
and try to steal another guest's IP address; however, this routing rule will
block traffic (i.e., IP packets carrying the wrong IP) from the DHCP server to
the malicious guest. Similarly, the guest could lie about its IP address (i.e.,
simply assign a predefined IP address, perhaps from another guest); however,
replies from the host will not be routed to the malicious guest.

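As an illustration, the per-guest route could be assembled as shown in this
Python sketch. The helper names ``tap_name`` and ``route_command`` are
hypothetical; only the ``gnt.com.%d`` naming scheme and the ``route add
-host`` command come from the text above, and the sketch merely builds the
argument list without executing it.

```python
import ipaddress


def tap_name(index):
    """Name of the host-side TAP interface, following gnt.com.%d."""
    if index < 0:
        raise ValueError("interface index must be non-negative")
    return "gnt.com.%d" % index


def route_command(guest_ip, index):
    """Build the argument list of the host route for a single guest."""
    ip = ipaddress.ip_address(guest_ip)  # raises ValueError if malformed
    return ["route", "add", "-host", str(ip), "dev", tap_name(index)]


print(" ".join(route_command("169.254.0.1", 0)))
```

Passing the command as an argument list (for instance to
``subprocess.check_call``) avoids shell quoting issues with the interface
name.
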
This routing rule ensures that the communication channel is exclusive but, as
mentioned before, it will not prevent guests from accessing any service on the
host. It is the system administrator's responsibility to employ the necessary
``iptables`` rules. In order to achieve this, Ganeti will provide ``ifup``
hooks associated with the guest network interfaces which will give system
administrators the opportunity to customize their own ``iptables``, if
necessary. Ganeti will also provide examples of such hooks. However, these are
meant to be personalized for each Ganeti installation and not to be taken as
production-ready scripts.

For KVM, an instance will be started with a unique MAC address and the file
descriptor for the TAP network interface meant to be used by the communication
mechanism. Ganeti will be responsible for generating a unique MAC address for
the guest, opening the TAP interface, and passing its file descriptor to KVM::

  kvm -net nic,macaddr=<mac> -net tap,fd=<tap-fd> ...

For Xen, a network interface will be created on the host (using the ``vif``
parameter of the Xen configuration file). Each instance will have its
corresponding ``vif`` network interface on the host. The ``vif-route`` script
of Xen might be helpful in implementing this.

dnsmasq
+++++++

The previous section describes the communication mechanism and explains the role
of the DHCP server. Note that any DHCP server can be used in the implementation
of the communication mechanism. However, the DHCP server employed should not
violate the properties described in the previous section, which state that the
communication mechanism should be exclusive, generic, and bidirectional, unless
this is intentional.

In our experiments, we have used dnsmasq. In this section, we describe how to
properly configure dnsmasq to work on a given Ganeti installation. This is
particularly important if, in this Ganeti installation, dnsmasq will share the
node with one or more DHCP servers running in parallel.

First, it is important to become familiar with the operational modes of dnsmasq,
which are well explained in the `FAQ
<http://www.thekelleys.org.uk/dnsmasq/docs/FAQ>`_ under the question ``What are
these strange "bind-interface" and "bind-dynamic" options?``. The rest of this
section assumes the reader is familiar with these operational modes.

bind-dynamic
  dnsmasq SHOULD be configured in the ``bind-dynamic`` mode (if supported) in
  order to allow other DHCP servers to run on the same node. In this mode,
  dnsmasq can serve the communication mechanism by listening on the TAP
  interfaces that match the pattern ``gnt.com.*`` (e.g.,
  ``interface=gnt.com.*``). For extra safety, interfaces matching the pattern
  ``eth*`` and the name ``lo`` should be configured such that dnsmasq will
  always ignore them (e.g., ``except-interface=eth*`` and
  ``except-interface=lo``).

bind-interfaces
  dnsmasq MAY be configured in the ``bind-interfaces`` mode (if supported) in
  order to allow other DHCP servers to run on the same node. Unfortunately,
  because dnsmasq cannot dynamically adjust to TAP interfaces that are created
  and destroyed by the system, dnsmasq must be restarted with a new
  configuration file each time an instance is created or destroyed.

  Also, the interfaces cannot be patterns, such as ``gnt.com.*``. Instead, the
  interfaces must be explicitly specified, for example,
  ``interface=gnt.com.0,gnt.com.1``. Moreover, dnsmasq cannot bind to the TAP
  interfaces if they all have the same IPv4 address. As a result, it is
  necessary to enable IPv6 on these TAP interfaces and to assign an IPv6
  address to them.

wildcard
  dnsmasq CANNOT be configured in the ``wildcard`` mode if there is
  (at least) another DHCP server running on the same node.

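A configuration for the recommended ``bind-dynamic`` mode could be generated
along the lines of the following Python sketch. The ``dnsmasq_config`` helper
is hypothetical, and expressing the MAC/IP pairs approved by Ganeti as
``dhcp-host`` entries is an assumption of this example. The sketch is also
deliberately incomplete: DHCP range and lease settings, which a working
setup needs, are omitted.

```python
def dnsmasq_config(leases):
    """Render dnsmasq options for the bind-dynamic mode.

    ``leases`` maps a guest MAC address to the IP address approved for it.
    """
    lines = [
        "bind-dynamic",
        "interface=gnt.com.*",
        "except-interface=eth*",
        "except-interface=lo",
    ]
    # One static assignment per approved guest (assumption, see above).
    for mac, ip in sorted(leases.items()):
        lines.append("dhcp-host=%s,%s" % (mac, ip))
    return "\n".join(lines) + "\n"


config = dnsmasq_config({"aa:00:00:00:00:01": "169.254.0.1"})
print(config)
```
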
Metadata service
++++++++++++++++

An instance will be able to reach the metadata service on
``169.254.169.254:80`` in order to, for example, retrieve its metadata. This IP
address and port were chosen for compatibility with the OpenStack and Amazon
EC2 metadata service. The metadata service will be provided by a single daemon,
which will determine the source instance for a given request and reply with the
metadata pertaining to that instance.

Where possible, the metadata will be provided in a way compatible with Amazon
EC2, at::

  http://169.254.169.254/<version>/meta-data/*

Ganeti-specific metadata that does not fit this structure will be provided
at::

  http://169.254.169.254/ganeti/<version>/meta_data.json

where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
indicate the most recent available protocol version.

If needed in the future, this structure also allows us to support OpenStack's
metadata at::

  http://169.254.169.254/openstack/<version>/meta_data.json

A bi-directional, pipe-like communication channel will also be provided. The
instance will be able to receive data from the host by a GET request at::

  http://169.254.169.254/ganeti/<version>/read

and to send data to the host by a POST request at::

  http://169.254.169.254/ganeti/<version>/write

As in a pipe, once the data are read, they will not be in the buffer anymore, so
subsequent GET requests to ``read`` will not return the same data. However,
unlike a pipe, it will not be possible to perform blocking I/O operations.

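The pipe-like semantics of ``read`` and ``write`` can be modelled with a small
in-memory Python sketch. The ``GuestChannel`` class and its method names are
hypothetical, and the HTTP layer is left out entirely; the example only shows
that data is consumed when read and that a read on an empty buffer returns
immediately instead of blocking.

```python
import collections


class GuestChannel:
    """In-memory model of the per-instance read/write buffers."""

    def __init__(self):
        self._to_guest = collections.deque()
        self._from_guest = collections.deque()

    def host_send(self, data):
        """Host side: queue data for the guest."""
        self._to_guest.append(data)

    def guest_read(self):
        """Model of GET .../read: drain and return the pending data."""
        data = b"".join(self._to_guest)
        self._to_guest.clear()
        return data

    def guest_write(self, data):
        """Model of POST .../write: queue data for the host."""
        self._from_guest.append(data)

    def host_receive(self):
        """Host side: drain and return what the guest has sent."""
        data = b"".join(self._from_guest)
        self._from_guest.clear()
        return data


channel = GuestChannel()
channel.host_send(b"step 1\n")
channel.host_send(b"step 2\n")
first_read = channel.guest_read()
second_read = channel.guest_read()  # nothing left, returns without blocking
channel.guest_write(b"progress: 50%\n")
```
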
The OS parameters will be accessible through a GET request at::

  http://169.254.169.254/ganeti/<version>/os/parameters.json

as a JSON serialized dictionary having the parameter name as the key, and the
pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the
user-provided value of the parameter, and ``<visibility>`` is either ``public``,
``private`` or ``secret``.

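A consumer running inside the instance could decode this document as in the
following Python sketch. The function names are hypothetical, the sample
document is made up, and the ``OSP_`` prefix used when turning parameters into
environment variables is an assumption of this example. Fetching the document
(for instance with ``urllib.request``) is left out so that the sketch stays
self-contained.

```python
import json

VISIBILITIES = {"public", "private", "secret"}


def parse_os_parameters(document):
    """Decode parameters.json into {name: (value, visibility)}."""
    params = {}
    for name, pair in json.loads(document).items():
        value, visibility = pair  # a JSON array of exactly two elements
        if visibility not in VISIBILITIES:
            raise ValueError("unknown visibility for %s: %r" % (name, visibility))
        params[name] = (value, visibility)
    return params


def as_environment(params):
    """Turn the parameters into environment variables for the OS scripts."""
    return {
        "OSP_%s" % name.upper(): str(value)
        for name, (value, _visibility) in params.items()
    }


sample = '{"dist": ["jessie", "public"], "root_key": ["AAAA", "secret"]}'
parsed = parse_os_parameters(sample)
environment = as_environment(parsed)
```
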
The installation scripts to be run inside the virtualized environment will be
available at::

  http://169.254.169.254/ganeti/<version>/os/scripts/<script_name>

where ``<script_name>`` is the name of the script.

Rationale
---------

The choice of using a network interface for instance-host communication, as
opposed to VirtIO, XenBus or other methods, is motivated by the desire for a
generic, hypervisor-independent way of creating a communication channel that
doesn't require unusual (para)virtualization drivers.
At the same time, a network interface was preferred over solutions involving
virtual floppy or USB devices because the latter tend to be detected and
configured by the guest operating systems, sometimes even in prominent positions
in the user interface, whereas it is fairly common to have an unconfigured
network interface in a system, usually without any negative side effects.

Installation process in a virtualized environment
+++++++++++++++++++++++++++++++++++++++++++++++++

In the new OS installation scenario, we distinguish between trusted and
untrusted code.

The trusted installation code maintains the behavior of the current one and
requires no modifications, with the scripts running on the node the instance is
being created on. The untrusted code is stored in a subdirectory of the OS
definition called ``untrusted``. This directory contains scripts that are
equivalent to the already existing ones (``create``, ``export``, ``import``,
``rename``) but that will be run inside a virtualized environment, to protect
the host from malicious tampering.

The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
code running operations that might be dangerous (such as mounting a
user-provided image).

By default, all new OS definitions will have to be explicitly marked as trusted
by the cluster administrator (with a new ``gnt-os modify`` command) before they
can run code on the host. Otherwise, only the untrusted part of the code will be
allowed to run, inside the virtual appliance. For backwards compatibility
reasons, when upgrading an existing cluster, all the installed OSes will be
marked as trusted, so that they can keep running with no changes.

In order to allow for the highest flexibility, if both a trusted and an
untrusted script are provided for the same operation (e.g., ``create``), both of
them will be executed at the same time, one on the host, and one inside the
installation appliance. They will be allowed to communicate with each other
through the already described communication mechanism, in order to orchestrate
their execution (e.g.: the untrusted code might execute the installation, while
the trusted one receives status updates from it and delivers them to a user
interface).

The cluster administrator will have an option to completely disable scripts
running on the host, leaving only the ones running in the VM.

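The rules above for deciding where the scripts of a given operation run can
be summarized in a short Python sketch. The function ``plan_scripts`` and its
parameter names are hypothetical and only restate the rules of this section;
they do not describe an existing Ganeti interface.

```python
def plan_scripts(has_trusted, has_untrusted, os_is_trusted,
                 host_scripts_disabled=False):
    """Return the set of places where scripts for one operation will run.

    has_trusted / has_untrusted: whether the OS definition ships a script
    for the operation at the top level / in its ``untrusted`` subdirectory.
    os_is_trusted: whether the administrator marked the OS as trusted.
    host_scripts_disabled: cluster-wide option forbidding scripts on the host.
    """
    places = set()
    if has_trusted and os_is_trusted and not host_scripts_disabled:
        places.add("host")
    if has_untrusted:
        # Untrusted code always runs inside the installation appliance.
        places.add("appliance")
    return places


both = plan_scripts(True, True, os_is_trusted=True)
unmarked = plan_scripts(True, True, os_is_trusted=False)
host_off = plan_scripts(True, False, os_is_trusted=True,
                        host_scripts_disabled=True)
```

An empty result, as in the last call, means that no script is allowed to run
for that operation under the given settings.
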
Ganeti will provide a script to be run at install time that can be used to
create the virtualized environment that will perform the OS installation of new
instances.
This script will build a debootstrapped basic Debian system including software
that will read the metadata, set up the environment variables and launch the
installation scripts inside the virtualized environment. The script will also
provide hooks for personalization.

It will also be possible to use other self-made virtualized environments, as
long as they connect to Ganeti over the described communication mechanism and
they know how to read and use the provided metadata to create a new instance.

While performing an installation in the virtualized environment, a customizable
timeout will be used to detect possible problems with the installation process,
and to kill the virtualized environment. The timeout will be optional and set on
a cluster basis by the administrator. If set, it will be the total time allowed
to set up an instance inside the appliance. It is mainly meant as a safety
measure to prevent an instance taken over by malicious scripts from being
available for a long time.

Alternatives to design and implementation
=========================================

This section lists alternatives to design and implementation, which came up
during the development of this design document, that will not be implemented.
Please read carefully through the limitations and security concerns of each of
these alternatives.

Port forwarding in KVM
++++++++++++++++++++++

The communication mechanism could have been implemented in KVM using guest port
forwarding, as opposed to network interfaces. There are two alternatives in
KVM's guest port forwarding, namely, creating a forwarding device, such as a
TCP/IP connection, or executing a command. However, we have determined that
neither of these options is viable.

A TCP/IP forwarding device can be created through the following KVM invocation::

  kvm -net nic -net \
    user,restrict=on,net=169.254.0.0/16,host=169.254.169.253,
    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...

This invocation even has the advantage that it can block undesired traffic
(i.e., traffic that is not explicitly specified in the arguments) and it can
remap ports, which would have allowed the metadata service daemon to run on port
8080 instead of 80. However, in this scheme, KVM opens the TCP connection only
once, when it is started, and, if the connection breaks, KVM will not
reestablish the connection. Furthermore, opening the TCP connection only once
interferes with the HTTP protocol, which needs to dynamically establish and
close connections.

The alternative to the TCP/IP forwarding device is to execute a command. The
KVM invocation for this is, for example, the following::

  kvm -net nic -net \
    "user,restrict=on,net=169.254.0.0/16,host=169.254.169.253,
    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...

The advantage of this approach is that the command is executed each time the
guest initiates a connection. This would be the ideal situation; however, it is
only supported in KVM 1.2 and above, and, therefore, not viable because we want
to provide support for at least KVM version 1.0, which is the version provided
by Ubuntu LTS.

Alternatives to the DHCP server
+++++++++++++++++++++++++++++++

There are alternatives to using the DHCP server, for example, assigning a
fixed IP address to guests, such as the IP address ``169.254.169.253``.
However, this introduces a routing problem, namely, how to route incoming
packets from the same source IP to the host. This problem can be overcome in a
number of ways.

The first solution is to use NAT to translate the incoming guest IP address, for
example, ``169.254.169.253``, to a unique IP address, for example,
``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated, users can
resort to ``iptables``. Note that this has not yet been tested.

Another option, which has been tested, but only in a prototype, is to connect
the TAP network interfaces of the guests to a bridge. The bridge takes the
configuration from the TAP network interfaces, namely, IP address
``169.254.169.254`` and netmask ``255.255.255.255``, thus leaving those
interfaces without an IP address. Note that in this setting, guests will be
able to reach each other; therefore, if necessary, additional ``iptables`` rules
can be put in place to prevent it.