1 | e5eaa80a | Michele Tartara | =============================== |
---|---|---|---|
2 | e5eaa80a | Michele Tartara | Ganeti OS installation redesign |
3 | e5eaa80a | Michele Tartara | =============================== |
4 | e5eaa80a | Michele Tartara | |
5 | e5eaa80a | Michele Tartara | .. contents:: :depth: 3 |
6 | e5eaa80a | Michele Tartara | |
7 | e5eaa80a | Michele Tartara | This is a design document detailing a new OS installation procedure, which is |
8 | e5eaa80a | Michele Tartara | more secure, provides more features, and is easier to use for many common |
9 | e5eaa80a | Michele Tartara | tasks than the current one. |
10 | e5eaa80a | Michele Tartara | |
11 | e5eaa80a | Michele Tartara | Current state and shortcomings |
12 | e5eaa80a | Michele Tartara | ============================== |
13 | e5eaa80a | Michele Tartara | |
14 | e5eaa80a | Michele Tartara | As of Ganeti 2.10, each instance is associated with an OS definition. An OS |
15 | 1a7c1456 | Jose A. Lopes | definition is a set of scripts (i.e., ``create``, ``export``, ``import``, |
16 | 1a7c1456 | Jose A. Lopes | ``rename``) that are executed with root privileges on the primary host of the |
17 | 1a7c1456 | Jose A. Lopes | instance. These scripts are responsible for performing all the OS-related |
18 | 1a7c1456 | Jose A. Lopes | tasks, namely, create an instance, set up an operating system on the instance's |
19 | 1a7c1456 | Jose A. Lopes | disks, export/import the instance, and rename the instance. |
20 | 1a7c1456 | Jose A. Lopes | |
21 | 1a7c1456 | Jose A. Lopes | These scripts receive, through environment variables, a fixed set of instance |
22 | 1a7c1456 | Jose A. Lopes | parameters (such as the hypervisor, the name of the instance, the number of |
23 | 1a7c1456 | Jose A. Lopes | disks and their location) and a set of user-defined parameters. Both the |
24 | 1a7c1456 | Jose A. Lopes | instance and user-defined parameters are written in the configuration file of |
25 | 1a7c1456 | Jose A. Lopes | Ganeti, to allow future reinstalls of the instance, and in various log files, |
26 | 1a7c1456 | Jose A. Lopes | namely: |
27 | e5eaa80a | Michele Tartara | |
28 | e5eaa80a | Michele Tartara | * node daemon log file: contains DEBUG strings of the ``/os_validate``, |
29 | e5eaa80a | Michele Tartara | ``/instance_os_add`` and ``/instance_start`` RPC calls. |
30 | e5eaa80a | Michele Tartara | |
31 | e5eaa80a | Michele Tartara | * master daemon log file: DEBUG strings related to the same RPC calls are stored |
32 | e5eaa80a | Michele Tartara | here as well. |
33 | e5eaa80a | Michele Tartara | |
34 | e5eaa80a | Michele Tartara | * commands log: the CLI commands that create a new instance, including their |
35 | e5eaa80a | Michele Tartara | parameters, are logged here. |
36 | e5eaa80a | Michele Tartara | |
37 | e5eaa80a | Michele Tartara | * RAPI log: the RAPI commands that create a new instance, including their |
38 | e5eaa80a | Michele Tartara | parameters, are logged here. |
39 | e5eaa80a | Michele Tartara | |
40 | e5eaa80a | Michele Tartara | * job logs: the job files stored in the job queue, or in its archive, contain |
41 | e5eaa80a | Michele Tartara | the parameters. |
42 | e5eaa80a | Michele Tartara | |
43 | e5eaa80a | Michele Tartara | The current situation presents a number of shortcomings: |
44 | e5eaa80a | Michele Tartara | |
45 | 1a7c1456 | Jose A. Lopes | * Having the installation scripts run as root on the nodes does not allow |
46 | 1a7c1456 | Jose A. Lopes | user-defined OS scripts, as they would pose a huge security risk. |
47 | e5eaa80a | Michele Tartara | Furthermore, even a script without malicious intentions might end up |
48 | 1a7c1456 | Jose A. Lopes | disrupting a node because of a bug. |
49 | e5eaa80a | Michele Tartara | |
50 | e5eaa80a | Michele Tartara | * Ganeti cannot be used to create instances starting from user provided disk |
51 | 1a7c1456 | Jose A. Lopes | images: even in the (hypothetical) case in which the scripts are completely |
52 | e5eaa80a | Michele Tartara | secure and run not by root but by an unprivileged user with only the power to |
53 | 1a7c1456 | Jose A. Lopes | mount arbitrary files as disk images, this is still a security issue. It has |
54 | 1a7c1456 | Jose A. Lopes | been proven that a carefully crafted file system might exploit kernel |
55 | e5eaa80a | Michele Tartara | vulnerabilities to gain control of the system. Therefore, directly mounting |
56 | e5eaa80a | Michele Tartara | images on the Ganeti nodes is not an option. |
57 | e5eaa80a | Michele Tartara | |
58 | e5eaa80a | Michele Tartara | * There is no way to inject files into an existing disk image. A common use case |
59 | e5eaa80a | Michele Tartara | is for the system administrator to provide a standard image of the system, to |
60 | e5eaa80a | Michele Tartara | be later personalized with the network configuration, private keys identifying |
61 | 1a7c1456 | Jose A. Lopes | the machine, ssh keys of the users, and so on. A possible workaround would be |
62 | e5eaa80a | Michele Tartara | for the scripts to mount the image (only if this is trusted!) and to receive |
63 | e5eaa80a | Michele Tartara | the configurations and ssh keys as user defined OS parameters. Unfortunately, |
64 | e5eaa80a | Michele Tartara | this is also not an option for security sensitive material (such as the ssh |
65 | e5eaa80a | Michele Tartara | keys) because the OS parameters are stored in many places on the system, as |
66 | e5eaa80a | Michele Tartara | already described above. |
67 | e5eaa80a | Michele Tartara | |
68 | 1a7c1456 | Jose A. Lopes | * Most other virtualization software allows only instance images, but no |
69 | 1a7c1456 | Jose A. Lopes | installation scripts. This difference makes the interaction between Ganeti and |
70 | e5eaa80a | Michele Tartara | other software difficult. |
71 | e5eaa80a | Michele Tartara | |
72 | e5eaa80a | Michele Tartara | Proposed changes |
73 | e5eaa80a | Michele Tartara | ================ |
74 | e5eaa80a | Michele Tartara | |
75 | e5eaa80a | Michele Tartara | In order to fix the shortcomings of the current state, we plan to introduce the |
76 | 56c934da | Jose A. Lopes | following changes. |
77 | 56c934da | Jose A. Lopes | |
78 | 1a7c1456 | Jose A. Lopes | OS parameter categories |
79 | 1a7c1456 | Jose A. Lopes | +++++++++++++++++++++++ |
80 | 56c934da | Jose A. Lopes | |
81 | 56c934da | Jose A. Lopes | Change the OS parameters to have three categories: |
82 | 56c934da | Jose A. Lopes | |
83 | 56c934da | Jose A. Lopes | * ``public``: the current behavior. The parameter is logged and stored freely. |
84 | 56c934da | Jose A. Lopes | |
85 | 56c934da | Jose A. Lopes | * ``private``: the parameter is saved inside the Ganeti configuration (to allow |
86 | 56c934da | Jose A. Lopes | for instance reinstall) but it is not shown in logs, job logs, or passed back |
87 | 56c934da | Jose A. Lopes | via RAPI. |
88 | 56c934da | Jose A. Lopes | |
89 | 56c934da | Jose A. Lopes | * ``secret``: the parameter is not saved inside the Ganeti configuration. |
90 | 56c934da | Jose A. Lopes | Reinstalls are impossible unless the data is passed again. The parameter will |
91 | 56c934da | Jose A. Lopes | not appear in any log file. When a functionality is performed jointly by |
92 | 56c934da | Jose A. Lopes | multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes |
93 | 56c934da | Jose A. Lopes | serializes jobs on disk and later reloads them. Secret parameters will not be |
94 | 56c934da | Jose A. Lopes | serialized to disk. They will be passed around as part of the LUXI calls |
95 | 56c934da | Jose A. Lopes | exchanged by the daemons, and only kept in memory, in order to reduce their |
96 | 56c934da | Jose A. Lopes | accessibility as much as possible. In case of failure of the master node, |
97 | 56c934da | Jose A. Lopes | these parameters will be lost and cannot be recovered because they are not |
98 | 56c934da | Jose A. Lopes | serialized. As a result, the job cannot be taken over by the new master. This |
99 | 56c934da | Jose A. Lopes | is an expected and accepted side effect of jobs with secret parameters: if |
100 | 56c934da | Jose A. Lopes | they fail, they'll have to be restarted manually. |
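
The three categories above can be illustrated with a minimal Python sketch. The class ``OsParams`` and its method names are hypothetical and do not correspond to Ganeti's actual code. The sketch only shows which categories would reach the configuration, the logs, and the installation itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OsParams:
    """Hypothetical container for the three OS parameter categories."""

    public: Dict[str, str] = field(default_factory=dict)
    private: Dict[str, str] = field(default_factory=dict)
    secret: Dict[str, str] = field(default_factory=dict)

    def for_config(self) -> Dict[str, Dict[str, str]]:
        # Public and private values are persisted so that reinstalls work;
        # secret values are never written to the configuration.
        return {"public": dict(self.public), "private": dict(self.private)}

    def for_log(self) -> Dict[str, str]:
        # Only public values may be logged; private and secret values are
        # left out entirely.
        return dict(self.public)

    def for_installation(self) -> Dict[str, str]:
        # All three categories are available to the installation itself.
        merged = dict(self.public)
        merged.update(self.private)
        merged.update(self.secret)
        return merged
```

Under these assumptions, a reinstall would read back ``for_config()`` and require the caller to supply the secret values again.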
101 | 56c934da | Jose A. Lopes | |
102 | 56c934da | Jose A. Lopes | Metadata |
103 | 56c934da | Jose A. Lopes | ++++++++ |
104 | 56c934da | Jose A. Lopes | |
105 | 56c934da | Jose A. Lopes | In order to allow metadata to be sent inside the instance, a communication |
106 | 56c934da | Jose A. Lopes | mechanism between the instance and the host will be created. This mechanism |
107 | 56c934da | Jose A. Lopes | will be bidirectional (e.g.: to allow the setup process going on inside the |
108 | 56c934da | Jose A. Lopes | instance to communicate its progress to the host). Each instance will have |
109 | 56c934da | Jose A. Lopes | access exclusively to its own metadata, and it will only be able to communicate |
110 | 56c934da | Jose A. Lopes | with its host over this channel. This is the approach followed by the |
111 | 56c934da | Jose A. Lopes | ``cloud-init`` tool and more details will be provided in the `Communication |
112 | 1a7c1456 | Jose A. Lopes | mechanism`_ and `Metadata service`_ sections. |
113 | 56c934da | Jose A. Lopes | |
114 | 56c934da | Jose A. Lopes | Installation procedure |
115 | 56c934da | Jose A. Lopes | ++++++++++++++++++++++ |
116 | 56c934da | Jose A. Lopes | |
117 | 1a7c1456 | Jose A. Lopes | A new installation procedure will be introduced. There will be two sets of |
118 | 1a7c1456 | Jose A. Lopes | parameters, namely, installation parameters, which are used mainly for installs |
119 | 1a7c1456 | Jose A. Lopes | and reinstalls, and execution parameters, which are used in all the other runs |
120 | 1a7c1456 | Jose A. Lopes | that are not part of an installation procedure. Also, it will be possible to |
121 | 1a7c1456 | Jose A. Lopes | use an installation medium and/or run the OS scripts in an optional virtualized |
122 | 1a7c1456 | Jose A. Lopes | environment, and optionally use a personalization package. This section details |
123 | 1a7c1456 | Jose A. Lopes | all of these options. |
124 | 1a7c1456 | Jose A. Lopes | |
125 | 1a7c1456 | Jose A. Lopes | The set of installation parameters will make it possible, for example, to attach an |
126 | 1a7c1456 | Jose A. Lopes | installation floppy/cdrom/network, change the boot device order, or specify a |
127 | 1a7c1456 | Jose A. Lopes | disk image to be used. Through this set of parameters, the administrator will |
128 | 1a7c1456 | Jose A. Lopes | have to provide the hypervisor a location for an installation medium for the |
129 | 1a7c1456 | Jose A. Lopes | instance (e.g., a boot disk, a network image, etc.). This medium will carry out |
130 | 1a7c1456 | Jose A. Lopes | the installation of the instance onto the instance's disks and will then be |
131 | 1a7c1456 | Jose A. Lopes | responsible for getting the parameters for configuring the instance, such as |
132 | 1a7c1456 | Jose A. Lopes | network interfaces, IP address, and hostname. These parameters are taken from |
133 | 1a7c1456 | Jose A. Lopes | the metadata. The installation parameters will be stored in the configuration |
134 | 1a7c1456 | Jose A. Lopes | of Ganeti and used in future reinstalls, but not during normal execution. |
135 | 56c934da | Jose A. Lopes | |
136 | 56c934da | Jose A. Lopes | The instance is reinstalled using the same installation parameters from the |
137 | 56c934da | Jose A. Lopes | first installation. However, it will be the administrator's responsibility to |
138 | 1a7c1456 | Jose A. Lopes | ensure that the installation medium is still available at the proper location |
139 | 56c934da | Jose A. Lopes | when a reinstall occurs. |
140 | 56c934da | Jose A. Lopes | |
141 | 56c934da | Jose A. Lopes | The parameter ``--os-parameters`` can still be used to specify the OS |
142 | 56c934da | Jose A. Lopes | parameters. However, without OS scripts, Ganeti cannot do more than a syntactic |
143 | 1a7c1456 | Jose A. Lopes | check to validate the supplied OS parameter string. As a result, this string |
144 | 1a7c1456 | Jose A. Lopes | will be passed directly to the instance as part of the metadata. If OS scripts |
145 | 1a7c1456 | Jose A. Lopes | are used and the installation procedure is running inside a virtualized |
146 | 1a7c1456 | Jose A. Lopes | environment, Ganeti will take these parameters from the metadata and pass them |
147 | 1a7c1456 | Jose A. Lopes | to the OS scripts as environment variables. |
148 | 1a7c1456 | Jose A. Lopes | |
149 | 1a7c1456 | Jose A. Lopes | Ganeti allows the following installation options: |
150 | 56c934da | Jose A. Lopes | |
151 | 56c934da | Jose A. Lopes | * Use a disk image: |
152 | 56c934da | Jose A. Lopes | |
153 | 56c934da | Jose A. Lopes | Currently, it is already possible to specify an installation medium, such as |
154 | 56c934da | Jose A. Lopes | a cdrom, but not a disk image. Therefore, a new parameter ``--os-image`` will |
155 | 56c934da | Jose A. Lopes | be used to specify the location of a disk image which will be dumped to the |
156 | 56c934da | Jose A. Lopes | instance's first disk before the instance is started. The location of the |
157 | 56c934da | Jose A. Lopes | image can be a URL and, if this is the case, Ganeti will download this image. |
158 | 56c934da | Jose A. Lopes | |
159 | 56c934da | Jose A. Lopes | * Run OS scripts: |
160 | 56c934da | Jose A. Lopes | |
161 | 56c934da | Jose A. Lopes | The parameter ``--os-type`` (short version: ``-o``) is currently used to |
162 | 56c934da | Jose A. Lopes | specify the OS scripts. This parameter will still be used to specify the OS |
163 | 1a7c1456 | Jose A. Lopes | scripts with the difference that these scripts may optionally run inside a |
164 | 56c934da | Jose A. Lopes | virtualized environment for safety reasons, depending on whether they are |
165 | 56c934da | Jose A. Lopes | trusted or not. For more details on trusted and untrusted OS scripts, refer |
166 | 1a7c1456 | Jose A. Lopes | to the `Installation process in a virtualized environment`_ section. Note |
167 | 1a7c1456 | Jose A. Lopes | that this parameter will become optional thus allowing a user to create an |
168 | 1a7c1456 | Jose A. Lopes | instance specifying only, for example, a disk image or a cdrom image to boot |
169 | 1a7c1456 | Jose A. Lopes | from. |
170 | 56c934da | Jose A. Lopes | |
171 | 56c934da | Jose A. Lopes | * Personalization package |
172 | 56c934da | Jose A. Lopes | |
173 | 56c934da | Jose A. Lopes | As part of the instance creation command, it will be possible to indicate a |
174 | 56c934da | Jose A. Lopes | URL for a "personalization package", which is an archive containing a set of |
175 | 56c934da | Jose A. Lopes | files meant to be overlaid on top of the OS file system at the end of the |
176 | 56c934da | Jose A. Lopes | setup process and before the VM is started for the first time in normal mode. |
177 | 1a7c1456 | Jose A. Lopes | Ganeti will provide a mechanism for receiving and unpacking this archive, |
178 | 1a7c1456 | Jose A. Lopes | independently of whether the installation is being performed inside the |
179 | 1a7c1456 | Jose A. Lopes | virtualized environment or not. |
180 | 56c934da | Jose A. Lopes | |
181 | 56c934da | Jose A. Lopes | The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or |
182 | 56c934da | Jose A. Lopes | ``.tgz``) and contain the files according to the directory structure that will |
183 | 56c934da | Jose A. Lopes | be recreated on the installation disk. Files contained in this archive will |
184 | 56c934da | Jose A. Lopes | overwrite files with the same path created during the installation procedure |
185 | 56c934da | Jose A. Lopes | (if any). The URL of the "personalization package" will have to specify an |
186 | 56c934da | Jose A. Lopes | extension to identify the file format (in order to allow for more formats to |
187 | 56c934da | Jose A. Lopes | be supported in the future). The URL will be stored as part of the |
188 | 56c934da | Jose A. Lopes | configuration of the instance (therefore, the URL should not contain |
189 | 56c934da | Jose A. Lopes | confidential information, but the files available there can). |
190 | 56c934da | Jose A. Lopes | |
191 | 56c934da | Jose A. Lopes | It is up to the system administrator to ensure that a package is actually |
192 | 56c934da | Jose A. Lopes | available at that URL at install and reinstall time. The contents of the |
193 | 56c934da | Jose A. Lopes | package are allowed to change. For example, a system administrator might create a |
194 | 56c934da | Jose A. Lopes | package containing the private keys of the instance being created. When the |
195 | 56c934da | Jose A. Lopes | instance is reinstalled, a new package with new keys can be made available |
196 | 56c934da | Jose A. Lopes | there, thus allowing instance reinstall without the need to store keys. A |
197 | 56c934da | Jose A. Lopes | username and a password can be specified together with the URL. If the URL is |
198 | 56c934da | Jose A. Lopes | an HTTP(S) URL, they will be used as basic access authentication credentials to |
199 | 56c934da | Jose A. Lopes | access that URL. The username and password will not be saved in the config, |
200 | 56c934da | Jose A. Lopes | and will have to be provided again in case a reinstall is requested. |
201 | 56c934da | Jose A. Lopes | |
202 | 56c934da | Jose A. Lopes | The downloaded personalization package will not be stored locally on the node |
203 | 56c934da | Jose A. Lopes | for longer than it is needed while unpacking it and adding its files to the |
204 | 56c934da | Jose A. Lopes | instance being created. The personalization package will be overlaid on top |
205 | 56c934da | Jose A. Lopes | of the instance filesystem after the scripts that created it have been |
206 | 56c934da | Jose A. Lopes | executed. In order for the files in the package to be automatically overlaid |
207 | 56c934da | Jose A. Lopes | on top of the instance filesystem, it is required that the appliance is |
208 | 56c934da | Jose A. Lopes | actually able to mount the instance's disks. As a result, this will not work |
209 | 56c934da | Jose A. Lopes | for every filesystem. |
210 | 56c934da | Jose A. Lopes | |
211 | 56c934da | Jose A. Lopes | * Combine a disk image, OS scripts, and a personalization package |
212 | 56c934da | Jose A. Lopes | |
213 | 56c934da | Jose A. Lopes | It will be possible to combine a disk image, OS scripts, and a personalization |
214 | 1a7c1456 | Jose A. Lopes | package, either with or without a virtualized environment (see the exception |
215 | 1a7c1456 | Jose A. Lopes | below). At least an installation medium or OS scripts should be specified. |
216 | 56c934da | Jose A. Lopes | |
217 | 56c934da | Jose A. Lopes | The disk image of the actual virtual appliance, which bootstraps the virtual |
218 | 56c934da | Jose A. Lopes | environment used in the installation procedure, will be read only, so that a |
219 | 56c934da | Jose A. Lopes | pristine copy of the appliance can be started every time a new instance needs |
220 | 56c934da | Jose A. Lopes | to be created and to further increase security. The data the instance needs |
221 | 56c934da | Jose A. Lopes | to write at runtime will only be stored in RAM and disappear as soon as the |
222 | 56c934da | Jose A. Lopes | instance is stopped. |
223 | 56c934da | Jose A. Lopes | |
224 | 56c934da | Jose A. Lopes | The parameter ``--enable-safe-install=yes|no`` will be used to give the |
225 | 56c934da | Jose A. Lopes | administrator control over whether to use a virtualized environment for the |
226 | 56c934da | Jose A. Lopes | installation procedure. By default, a virtualized environment will be used. |
227 | 56c934da | Jose A. Lopes | Note that some feature combinations, such as using untrusted scripts, will |
228 | 56c934da | Jose A. Lopes | require the virtualized environment. In this case, Ganeti will not allow |
229 | 56c934da | Jose A. Lopes | disabling the virtualized environment. |
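
The handling of the personalization package described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names ``is_supported_package`` and ``overlay_package`` are hypothetical, the archive is assumed to have been downloaded already, and the target directory stands in for the mounted instance filesystem.

```python
import os
import tarfile
from urllib.parse import urlparse

# Extensions named in this document for the TAR-GZIP format.
SUPPORTED_EXTENSIONS = (".tar.gz", ".tgz")


def is_supported_package(url: str) -> bool:
    """Check that the URL path ends in a supported archive extension."""
    return urlparse(url).path.lower().endswith(SUPPORTED_EXTENSIONS)


def overlay_package(archive_path: str, target_root: str) -> None:
    """Unpack a TAR-GZIP archive on top of target_root.

    Files already present under target_root are overwritten by archive
    members with the same path.
    """
    root = os.path.realpath(target_root)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            dest = os.path.realpath(os.path.join(root, member.name))
            # Refuse entries whose path would escape the target directory.
            if dest != root and not dest.startswith(root + os.sep):
                raise ValueError("unsafe path in archive: %s" % member.name)
        archive.extractall(root)
```

The path check is deliberately simple and does not cover every hostile archive (for example, link members). This is one more reason to unpack untrusted packages only inside the virtualized environment.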
230 | e5eaa80a | Michele Tartara | |
231 | e5eaa80a | Michele Tartara | Implementation |
232 | e5eaa80a | Michele Tartara | ============== |
233 | e5eaa80a | Michele Tartara | |
234 | e5eaa80a | Michele Tartara | The implementation of this design will happen as an ordered sequence of steps, |
235 | e5eaa80a | Michele Tartara | of increasing impact on the system and, in some cases, dependent on each other: |
236 | e5eaa80a | Michele Tartara | |
237 | e5eaa80a | Michele Tartara | #. Private and secret instance parameters |
238 | e5eaa80a | Michele Tartara | #. Communication mechanism between host and instance |
239 | e5eaa80a | Michele Tartara | #. Metadata service |
240 | e5eaa80a | Michele Tartara | #. Personalization package (inside a virtualization environment) |
241 | 56c934da | Jose A. Lopes | #. Instance creation via a disk image |
242 | 56c934da | Jose A. Lopes | #. Instance creation inside a virtualized environment |
243 | e5eaa80a | Michele Tartara | |
244 | e5eaa80a | Michele Tartara | Some of these steps need to be more deeply specified w.r.t. what is already |
245 | e5eaa80a | Michele Tartara | written in the `Proposed changes`_ Section. Extra details will be provided in |
246 | e5eaa80a | Michele Tartara | the following subsections. |
247 | e5eaa80a | Michele Tartara | |
248 | 1a7c1456 | Jose A. Lopes | Communication mechanism |
249 | 1a7c1456 | Jose A. Lopes | +++++++++++++++++++++++ |
250 | 1a7c1456 | Jose A. Lopes | |
251 | 1a7c1456 | Jose A. Lopes | The communication mechanism will be an exclusive, generic, bidirectional |
252 | 1a7c1456 | Jose A. Lopes | communication channel between Ganeti hosts and guests. |
253 | 1a7c1456 | Jose A. Lopes | |
254 | 1a7c1456 | Jose A. Lopes | exclusive |
255 | 1a7c1456 | Jose A. Lopes | The communication mechanism allows communication between a guest and its host, |
256 | 1a7c1456 | Jose A. Lopes | but it does not allow a guest to communicate with other guests or reach the |
257 | 1a7c1456 | Jose A. Lopes | outside world. |
258 | 1a7c1456 | Jose A. Lopes | |
259 | 1a7c1456 | Jose A. Lopes | generic |
260 | 1a7c1456 | Jose A. Lopes | The communication mechanism allows a guest to reach any service on the host, |
261 | 1a7c1456 | Jose A. Lopes | not just the metadata service. Examples of valid communication include, but |
262 | 1a7c1456 | Jose A. Lopes | are not limited to, accessing the metadata service, sending commands to Ganeti, |
263 | 1a7c1456 | Jose A. Lopes | requesting changes to parameters, such as those related to distribution |
264 | 1a7c1456 | Jose A. Lopes | upgrades, and letting Ganeti control a helper instance, such as the one for |
265 | 1a7c1456 | Jose A. Lopes | performing OS installs inside a safe environment. |
266 | 1a7c1456 | Jose A. Lopes | |
267 | 1a7c1456 | Jose A. Lopes | bidirectional |
268 | 1a7c1456 | Jose A. Lopes | The communication mechanism allows communication to be initiated from either |
269 | 1a7c1456 | Jose A. Lopes | party, namely, from a host to a guest or guest to host. |
270 | 1a7c1456 | Jose A. Lopes | |
271 | 1a7c1456 | Jose A. Lopes | Note that Ganeti will allow communication with any service (e.g., daemon) running |
272 | 1a7c1456 | Jose A. Lopes | on the host and, as a result, Ganeti will not be responsible for ensuring that |
273 | 1a7c1456 | Jose A. Lopes | only the metadata service is reachable. It is the responsibility of each system |
274 | 1a7c1456 | Jose A. Lopes | administrator to ensure that the extra firewalling and routing rules specified |
275 | 1a7c1456 | Jose A. Lopes | on the host provide the necessary protection on a given Ganeti installation and, |
276 | 1a7c1456 | Jose A. Lopes | at the same time, do not accidentally override the behaviour hereby described |
277 | 1a7c1456 | Jose A. Lopes | which makes the communication between the host and the guest exclusive, generic, |
278 | 1a7c1456 | Jose A. Lopes | and bidirectional, unless intended. |
279 | 56c934da | Jose A. Lopes | |
280 | 56c934da | Jose A. Lopes | The communication mechanism will be enabled automatically during an installation |
281 | 1a7c1456 | Jose A. Lopes | procedure that requires a virtualized environment, but, for backwards |
282 | 1a7c1456 | Jose A. Lopes | compatibility, it will be disabled when the instance is running normally, unless |
283 | 1a7c1456 | Jose A. Lopes | explicitly requested. Specifically, a new parameter ``--communication=yes|no`` |
284 | 1a7c1456 | Jose A. Lopes | (short version: ``-C``) will be added to ``gnt-instance add`` and ``gnt-instance |
285 | 1a7c1456 | Jose A. Lopes | modify``. This parameter will determine whether the communication mechanism is |
286 | 1a7c1456 | Jose A. Lopes | enabled for a particular instance. The value of this parameter will be saved as |
287 | 1a7c1456 | Jose A. Lopes | part of the instance's configuration. |
288 | 1a7c1456 | Jose A. Lopes | |
289 | 1a7c1456 | Jose A. Lopes | The communication mechanism will be implemented through network interfaces on |
290 | 1a7c1456 | Jose A. Lopes | the host and the guest, and Ganeti will be responsible for the host side, |
291 | 1a7c1456 | Jose A. Lopes | namely, creating a TAP interface for each guest and configuring these interfaces |
292 | 1ab752c8 | Jose A. Lopes | to have name ``gnt.com.%d``, where ``%d`` is a unique number within the host |
293 | 1ab752c8 | Jose A. Lopes | (e.g., ``gnt.com.0`` and ``gnt.com.1``), IP address ``169.254.169.254``, and |
294 | 1ab752c8 | Jose A. Lopes | netmask ``255.255.255.255``. The interface's name allows DHCP servers to |
295 | 1ab752c8 | Jose A. Lopes | recognize which interfaces are part of the communication mechanism. |
296 | 1ab752c8 | Jose A. Lopes | |
297 | 1ab752c8 | Jose A. Lopes | This network interface will be connected to the guest's last network interface, |
298 | 1ab752c8 | Jose A. Lopes | which is meant to be used exclusively for the communication mechanism and is |
299 | 1ab752c8 | Jose A. Lopes | defined after all the user-defined interfaces. The last interface was chosen |
300 | 1ab752c8 | Jose A. Lopes | (as opposed to the first one, for example) because the first interface is |
301 | 1ab752c8 | Jose A. Lopes | generally understood as the main gateway out, and also because it minimizes the |
302 | 1ab752c8 | Jose A. Lopes | impact on existing systems, for example, in a scenario where the system |
303 | 1ab752c8 | Jose A. Lopes | administrator has a running cluster and wants to enable the communication |
304 | 1ab752c8 | Jose A. Lopes | mechanism for already existing instances, which might have been created with |
305 | 1ab752c8 | Jose A. Lopes | older versions of Ganeti. Further, DBus should assist in keeping the guest |
306 | 1ab752c8 | Jose A. Lopes | network interfaces more stable. |
307 | 1a7c1456 | Jose A. Lopes | |
308 | 1a7c1456 | Jose A. Lopes | On the guest side, each instance will have its own MAC address and IP address. |
309 | 1a7c1456 | Jose A. Lopes | Both the guest's MAC address and IP address must be unique within a single |
310 | 1a7c1456 | Jose A. Lopes | cluster. An IP is unique within a single cluster, and not within a single host, |
311 | 1a7c1456 | Jose A. Lopes | in order to minimize disruption of connectivity, for example, during live |
312 | 1a7c1456 | Jose A. Lopes | migration, in particular since an instance is not aware when it changes host. |
313 | 1a7c1456 | Jose A. Lopes | Unfortunately, a side effect of this decision is that a cluster can have at |
314 | 1a7c1456 | Jose A. Lopes | most as many instances (with communication enabled) as a ``/16`` network allows. If |
315 | 1a7c1456 | Jose A. Lopes | it is necessary to overcome this limit, it should be possible to allow different |
316 | 1a7c1456 | Jose A. Lopes | networks to be configured link-local only. |
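
The cluster-wide uniqueness of guest addresses can be sketched in Python as follows. The sketch assumes that guests draw their addresses from the IPv4 link-local range ``169.254.0.0/16``, which contains the host-side address ``169.254.169.254`` mentioned earlier. The function ``allocate_guest_ip`` is hypothetical and is not part of Ganeti.

```python
import ipaddress
from typing import Iterable

# Assumption: the communication network is the IPv4 link-local /16.
POOL = ipaddress.ip_network("169.254.0.0/16")
# Host-side address shared by all TAP interfaces, as stated in this document.
HOST_SIDE = ipaddress.ip_address("169.254.169.254")


def allocate_guest_ip(in_use: Iterable[str]) -> str:
    """Return the first pool address not yet used anywhere in the cluster."""
    taken = {ipaddress.ip_address(addr) for addr in in_use}
    taken.add(HOST_SIDE)
    for candidate in POOL.hosts():
        if candidate not in taken:
            return str(candidate)
    raise RuntimeError("no free communication address left in the cluster")
```

Under these assumptions the pool offers 65,534 usable addresses, one of which is reserved for the host side. This is where the per-cluster limit comes from.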
317 | 1a7c1456 | Jose A. Lopes | |
318 | 1a7c1456 | Jose A. Lopes | The guest will use the DHCP protocol on its last network interface to contact a |
319 | 1a7c1456 | Jose A. Lopes | DHCP server running on the host and thus determine its IP address. The DHCP |
320 | 1a7c1456 | Jose A. Lopes | server is configured, started, and stopped by Ganeti, and it will be listening |
321 | 1a7c1456 | Jose A. Lopes | exclusively on the TAP network interfaces of the guests in order not to |
322 | 1a7c1456 | Jose A. Lopes | interfere with a potential DHCP server running on the same host. Furthermore, |
323 | 1a7c1456 | Jose A. Lopes | the DHCP server will only recognize MAC and IP address pairs that have been |
324 | 1a7c1456 | Jose A. Lopes | approved by Ganeti. |
325 | 1a7c1456 | Jose A. Lopes | |
326 | 1a7c1456 | Jose A. Lopes | The TAP network interfaces created for each guest share the same IP address. |
327 | 1a7c1456 | Jose A. Lopes | Therefore, it will be necessary to extend the routing table with rules specific |
328 | 1a7c1456 | Jose A. Lopes | to each guest. This can be achieved with the following command, which takes the |
329 | 1a7c1456 | Jose A. Lopes | guest's unique IP address and its TAP interface:: |
330 | 1a7c1456 | Jose A. Lopes | |
331 | 1a7c1456 | Jose A. Lopes | route add -host <ip> dev <ifname> |
332 | 1a7c1456 | Jose A. Lopes | |
333 | 1a7c1456 | Jose A. Lopes | This rule has the additional advantage of preventing guests from trying to lease |
334 | 1a7c1456 | Jose A. Lopes | IP addresses from the DHCP server other than the one that has been assigned to |
335 | 1a7c1456 | Jose A. Lopes | them by Ganeti. The guest could lie about its MAC address to the DHCP server |
336 | 1a7c1456 | Jose A. Lopes | and try to steal another guest's IP address; however, this routing rule will |
337 | 1a7c1456 | Jose A. Lopes | block traffic (i.e., IP packets carrying the wrong IP) from the DHCP server to |
338 | 1a7c1456 | Jose A. Lopes | the malicious guest. Similarly, the guest could lie about its IP address (i.e., |
339 | 1a7c1456 | Jose A. Lopes | simply assign a predefined IP address, perhaps from another guest); however, |
340 | 1a7c1456 | Jose A. Lopes | replies from the host will not be routed to the malicious guest. |
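
The per-guest routing rules could be generated along the following lines. This is a minimal Python sketch: the helper names and the guest addresses are hypothetical, and only the ``gnt.com.%d`` naming scheme and the ``route add -host <ip> dev <ifname>`` command come from this document. The sketch builds the commands but does not run them.

```python
from typing import Dict, List


def tap_name(index: int) -> str:
    """Build the host-side interface name following ``gnt.com.%d``."""
    return "gnt.com.%d" % index


def route_command(guest_ip: str, ifname: str) -> List[str]:
    """Return the argument vector for ``route add -host <ip> dev <ifname>``."""
    return ["route", "add", "-host", guest_ip, "dev", ifname]


# Hypothetical guest addresses, keyed by the TAP index on this host.
guests: Dict[int, str] = {0: "169.254.0.1", 1: "169.254.0.2"}

commands = [route_command(ip, tap_name(idx)) for idx, ip in sorted(guests.items())]
for argv in commands:
    print(" ".join(argv))
# route add -host 169.254.0.1 dev gnt.com.0
# route add -host 169.254.0.2 dev gnt.com.1
```

Keeping the command as an argument list rather than a shell string avoids quoting issues if the list is later passed to ``subprocess.run``. Running it would require root privileges on the node.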
341 | 1a7c1456 | Jose A. Lopes | |
342 | 1a7c1456 | Jose A. Lopes | This routing rule ensures that the communication channel is exclusive but, as |
343 | 1a7c1456 | Jose A. Lopes | mentioned before, it will not prevent guests from accessing any service on the |
344 | 1a7c1456 | Jose A. Lopes | host. It is the system administrator's responsibility to employ the necessary |
345 | 1a7c1456 | Jose A. Lopes | ``iptables`` rules. In order to achieve this, Ganeti will provide ``ifup`` |
346 | 1a7c1456 | Jose A. Lopes | hooks associated with the guest network interfaces, which will give system |
347 | 1a7c1456 | Jose A. Lopes | administrators the opportunity to customize their own ``iptables`` rules, if |
348 | 1a7c1456 | Jose A. Lopes | necessary. Ganeti will also provide examples of such hooks. However, these are |
349 | 1a7c1456 | Jose A. Lopes | meant to be personalized for each Ganeti installation and not to be taken as |
350 | 1a7c1456 | Jose A. Lopes | production-ready scripts. |
351 | 1a7c1456 | Jose A. Lopes | |
352 | 1a7c1456 | Jose A. Lopes | For KVM, an instance will be started with a unique MAC address and the file |
353 | 1a7c1456 | Jose A. Lopes | descriptor for the TAP network interface meant to be used by the communication |
354 | 1a7c1456 | Jose A. Lopes | mechanism. Ganeti will be responsible for generating a unique MAC address for |
355 | 1a7c1456 | Jose A. Lopes | the guest, opening the TAP interface, and passing its file descriptor to KVM:: |
356 | 1a7c1456 | Jose A. Lopes | |
357 | 1a7c1456 | Jose A. Lopes | kvm -net nic,macaddr=<mac> -net tap,fd=<tap-fd> ... |
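
As an illustration of the KVM case, the following Python sketch generates a MAC address and assembles the two ``-net`` options shown above. How Ganeti actually picks MAC addresses is not specified here. The sketch simply assumes a random, locally administered unicast address that is checked against the addresses already in use, and both function names are hypothetical.

```python
import random
from typing import List, Optional, Set


def generate_mac(in_use: Set[str], rng: Optional[random.Random] = None) -> str:
    """Return a random locally administered unicast MAC not found in in_use."""
    rng = rng or random.Random()
    while True:
        octets = [rng.randrange(256) for _ in range(6)]
        # Set the locally administered bit and clear the multicast bit.
        octets[0] = (octets[0] | 0x02) & 0xFE
        mac = ":".join("%02x" % octet for octet in octets)
        if mac not in in_use:
            return mac


def kvm_net_args(mac: str, tap_fd: int) -> List[str]:
    """Build the ``-net nic,...`` and ``-net tap,...`` options for KVM."""
    return ["-net", "nic,macaddr=%s" % mac, "-net", "tap,fd=%d" % tap_fd]
```

A launcher using these options would also need to keep the TAP file descriptor open in the KVM process, for example through the ``pass_fds`` argument of ``subprocess.Popen``.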
358 | 1a7c1456 | Jose A. Lopes | |
359 | 1a7c1456 | Jose A. Lopes | For Xen, a network interface will be created on the host (using the ``vif`` |
360 | 1a7c1456 | Jose A. Lopes | parameter of the Xen configuration file). Each instance will have its |
361 | 1a7c1456 | Jose A. Lopes | corresponding ``vif`` network interface on the host. The ``vif-route`` script |
362 | 1a7c1456 | Jose A. Lopes | of Xen might be helpful in implementing this. |
363 | 1a7c1456 | Jose A. Lopes | |
364 | 1ab752c8 | Jose A. Lopes | dnsmasq |
365 | 1ab752c8 | Jose A. Lopes | +++++++ |
366 | 1ab752c8 | Jose A. Lopes | |
367 | 1ab752c8 | Jose A. Lopes | The previous section describes the communication mechanism and explains the role |
368 | 1ab752c8 | Jose A. Lopes | of the DHCP server. Note that any DHCP server can be used in the implementation |
369 | 1ab752c8 | Jose A. Lopes | of the communication mechanism. However, the DHCP server employed should not |
370 | 1ab752c8 | Jose A. Lopes | violate the properties described in the previous section, which state that the |
371 | 1ab752c8 | Jose A. Lopes | communication mechanism should be exclusive, generic, and bidirectional, unless |
372 | 1ab752c8 | Jose A. Lopes | this is intentional. |
373 | 1ab752c8 | Jose A. Lopes | |
374 | 1ab752c8 | Jose A. Lopes | In our experiments, we have used dnsmasq. In this section, we describe how to |
375 | 1ab752c8 | Jose A. Lopes | properly configure dnsmasq to work on a given Ganeti installation. This is |
376 | 1ab752c8 | Jose A. Lopes | particularly important if, in this Ganeti installation, dnsmasq will share the |
377 | 1ab752c8 | Jose A. Lopes | node with one or more DHCP servers running in parallel. |
378 | 1ab752c8 | Jose A. Lopes | |
379 | 1ab752c8 | Jose A. Lopes | First, it is important to become familiar with the operational modes of dnsmasq, |
380 | 1ab752c8 | Jose A. Lopes | which are well explained in the `FAQ |
381 | 1ab752c8 | Jose A. Lopes | <http://www.thekelleys.org.uk/dnsmasq/docs/FAQ>`_ under the question ``What are |
382 | 1ab752c8 | Jose A. Lopes | these strange "bind-interface" and "bind-dynamic" options?``. The rest of this |
383 | 1ab752c8 | Jose A. Lopes | section assumes the reader is familiar with these operational modes. |
384 | 1ab752c8 | Jose A. Lopes | |
385 | 1ab752c8 | Jose A. Lopes | bind-dynamic |
386 | 1ab752c8 | Jose A. Lopes | dnsmasq SHOULD be configured in the ``bind-dynamic`` mode (if supported) in |
387 | 1ab752c8 | Jose A. Lopes | order to allow other DHCP servers to run on the same node. In this mode, |
388 | 1ab752c8 | Jose A. Lopes | dnsmasq can serve the TAP interfaces of the communication mechanism by |
389 | 1ab752c8 | Jose A. Lopes | listening on the interfaces that match the pattern ``gnt.com.*`` (e.g., |
390 | 1ab752c8 | Jose A. Lopes | ``interface=gnt.com.*``). For extra safety, interfaces matching the pattern |
391 | 1ab752c8 | Jose A. Lopes | ``eth*`` and the name ``lo`` should be configured such that dnsmasq will |
392 | 1ab752c8 | Jose A. Lopes | always ignore them (e.g., ``except-interface=eth*`` and |
393 | 1ab752c8 | Jose A. Lopes | ``except-interface=lo``). |
394 | 1ab752c8 | Jose A. Lopes | |
395 | 1ab752c8 | Jose A. Lopes | bind-interfaces |
396 | 1ab752c8 | Jose A. Lopes | dnsmasq MAY be configured in the ``bind-interfaces`` mode (if supported) in |
397 | 1ab752c8 | Jose A. Lopes | order to allow other DHCP servers to run on the same node. Unfortunately, |
398 | 1ab752c8 | Jose A. Lopes | because dnsmasq cannot dynamically adjust to TAP interfaces that are created |
399 | 1ab752c8 | Jose A. Lopes | and destroyed by the system, dnsmasq must be restarted with a new |
400 | 1ab752c8 | Jose A. Lopes | configuration file each time an instance is created or destroyed. |
401 | 1ab752c8 | Jose A. Lopes | |
402 | 1ab752c8 | Jose A. Lopes | Also, the interfaces cannot be patterns, such as ``gnt.com.*``. Instead, the |
403 | 1ab752c8 | Jose A. Lopes | interfaces must be explicitly specified, for example, |
404 | 1ab752c8 | Jose A. Lopes | ``interface=gnt.com.0,gnt.com.1``. Moreover, dnsmasq cannot bind to the TAP |
405 | 1ab752c8 | Jose A. Lopes | interfaces if they all have the same IPv4 address. As a result, it is |
406 | 1ab752c8 | Jose A. Lopes | necessary to enable IPv6 on these TAP interfaces and to assign an IPv6 |
407 | 1ab752c8 | Jose A. Lopes | address to them. |
408 | 1ab752c8 | Jose A. Lopes | |
409 | 1ab752c8 | Jose A. Lopes | wildcard |
410 | 1ab752c8 | Jose A. Lopes | dnsmasq CANNOT be configured in the ``wildcard`` mode if at least one other |
411 | 1ab752c8 | Jose A. Lopes | DHCP server is running on the same node. |
412 | 1ab752c8 | Jose A. Lopes | |
413 | 1a7c1456 | Jose A. Lopes | Metadata service |
414 | 1a7c1456 | Jose A. Lopes | ++++++++++++++++ |
415 | 1a7c1456 | Jose A. Lopes | |
416 | 1a7c1456 | Jose A. Lopes | An instance will be able to reach the metadata service on ``169.254.169.254:80`` in |
417 | 1a7c1456 | Jose A. Lopes | order to, for example, retrieve its metadata. This IP address and port were |
418 | 1a7c1456 | Jose A. Lopes | chosen for compatibility with the OpenStack and Amazon EC2 metadata services. |
419 | 1a7c1456 | Jose A. Lopes | The metadata service will be provided by a single daemon, which will determine |
420 | 1a7c1456 | Jose A. Lopes | the source instance for a given request and reply with the metadata pertaining |
421 | 1a7c1456 | Jose A. Lopes | to that instance. |
422 | e5eaa80a | Michele Tartara | |
423 | e5eaa80a | Michele Tartara | Where possible, the metadata will be provided in a way compatible with Amazon |
424 | e5eaa80a | Michele Tartara | EC2, at:: |
425 | e5eaa80a | Michele Tartara | |
426 | e5eaa80a | Michele Tartara | http://169.254.169.254/<version>/meta-data/* |
427 | e5eaa80a | Michele Tartara | |
428 | 1a7c1456 | Jose A. Lopes | Ganeti-specific metadata that does not fit this structure will be provided |
429 | 1a7c1456 | Jose A. Lopes | at:: |
430 | e5eaa80a | Michele Tartara | |
431 | e5eaa80a | Michele Tartara | http://169.254.169.254/ganeti/<version>/meta_data.json |
432 | e5eaa80a | Michele Tartara | |
433 | 1a7c1456 | Jose A. Lopes | where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to |
434 | 1a7c1456 | Jose A. Lopes | indicate the most recent available protocol version. |
435 | e5eaa80a | Michele Tartara | |
436 | e5eaa80a | Michele Tartara | If needed in the future, this structure also allows us to support OpenStack's |
437 | e5eaa80a | Michele Tartara | metadata at:: |
438 | e5eaa80a | Michele Tartara | |
439 | e5eaa80a | Michele Tartara | http://169.254.169.254/openstack/<version>/meta_data.json |
440 | e5eaa80a | Michele Tartara | |
441 | 1a7c1456 | Jose A. Lopes | A bi-directional, pipe-like communication channel will also be provided. The |
442 | 1a7c1456 | Jose A. Lopes | instance will be able to receive data from the host by a GET request at:: |
443 | e5eaa80a | Michele Tartara | |
444 | e5eaa80a | Michele Tartara | http://169.254.169.254/ganeti/<version>/read |
445 | e5eaa80a | Michele Tartara | |
446 | e5eaa80a | Michele Tartara | and to send data to the host by a POST request at:: |
447 | e5eaa80a | Michele Tartara | |
448 | e5eaa80a | Michele Tartara | http://169.254.169.254/ganeti/<version>/write |
449 | e5eaa80a | Michele Tartara | |
450 | e5eaa80a | Michele Tartara | As in a pipe, once the data are read, they will not be in the buffer anymore, so |
451 | 1a7c1456 | Jose A. Lopes | subsequent GET requests to ``read`` will not return the same data. However, |
452 | 1a7c1456 | Jose A. Lopes | unlike a pipe, it will not be possible to perform blocking I/O operations. |
453 | e5eaa80a | Michele Tartara | |
454 | 1a7c1456 | Jose A. Lopes | The OS parameters will be accessible through a GET request at:: |
455 | e5eaa80a | Michele Tartara | |
456 | e5eaa80a | Michele Tartara | http://169.254.169.254/ganeti/<version>/os/parameters.json |
457 | e5eaa80a | Michele Tartara | |
458 | e5eaa80a | Michele Tartara | as a JSON serialized dictionary having the parameter name as the key, and the |
459 | e5eaa80a | Michele Tartara | pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the |
460 | e5eaa80a | Michele Tartara | user-provided value of the parameter, and ``<visibility>`` is either ``public``, |
461 | e5eaa80a | Michele Tartara | ``private`` or ``secret``. |
462 | e5eaa80a | Michele Tartara | |
463 | 56c934da | Jose A. Lopes | The installation scripts to be run inside the virtualized environment will be |
464 | 56c934da | Jose A. Lopes | available at:: |
465 | e5eaa80a | Michele Tartara | |
466 | 56c934da | Jose A. Lopes | http://169.254.169.254/ganeti/<version>/os/scripts/<script_name> |
467 | e5eaa80a | Michele Tartara | |
468 | e5eaa80a | Michele Tartara | where ``<script_name>`` is the name of the script. |
469 | e5eaa80a | Michele Tartara | |
470 | e5eaa80a | Michele Tartara | Rationale |
471 | e5eaa80a | Michele Tartara | --------- |
472 | e5eaa80a | Michele Tartara | |
473 | e5eaa80a | Michele Tartara | The choice of using a network interface for instance-host communication, as |
474 | e5eaa80a | Michele Tartara | opposed to VirtIO, XenBus or other methods, is motivated by the desire for a |
475 | e5eaa80a | Michele Tartara | generic, hypervisor-independent way of creating a communication channel that |
476 | e5eaa80a | Michele Tartara | does not require unusual (para)virtualization drivers. |
477 | e5eaa80a | Michele Tartara | At the same time, a network interface was preferred over solutions involving |
478 | e5eaa80a | Michele Tartara | virtual floppy or USB devices because the latter tend to be detected and |
479 | e5eaa80a | Michele Tartara | configured by the guest operating systems, sometimes even in prominent positions |
480 | e5eaa80a | Michele Tartara | in the user interface, whereas it is fairly common to have an unconfigured |
481 | e5eaa80a | Michele Tartara | network interface in a system, usually without any negative side effects. |
482 | e5eaa80a | Michele Tartara | |
483 | e5eaa80a | Michele Tartara | Installation process in a virtualized environment |
484 | e5eaa80a | Michele Tartara | +++++++++++++++++++++++++++++++++++++++++++++++++ |
485 | e5eaa80a | Michele Tartara | |
486 | e5eaa80a | Michele Tartara | In the new OS installation scenario, we distinguish between trusted and |
487 | e5eaa80a | Michele Tartara | untrusted code. |
488 | e5eaa80a | Michele Tartara | |
489 | e5eaa80a | Michele Tartara | The trusted installation code maintains the behavior of the current one and |
490 | e5eaa80a | Michele Tartara | requires no modifications, with the scripts running on the node the instance is |
491 | e5eaa80a | Michele Tartara | being created on. The untrusted code is stored in a subdirectory of the OS |
492 | e5eaa80a | Michele Tartara | definition called ``untrusted``. This directory contains scripts that are |
493 | e5eaa80a | Michele Tartara | equivalent to the already existing ones (``create``, ``export``, ``import``, |
494 | e5eaa80a | Michele Tartara | ``rename``) but that will be run inside a virtualized environment, to protect |
495 | e5eaa80a | Michele Tartara | the host from malicious tampering. |
496 | e5eaa80a | Michele Tartara | |
497 | e5eaa80a | Michele Tartara | The ``untrusted`` code is meant to either be untrusted itself, or to be trusted |
498 | e5eaa80a | Michele Tartara | code running operations that might be dangerous (such as mounting a |
499 | e5eaa80a | Michele Tartara | user-provided image). |
500 | e5eaa80a | Michele Tartara | |
501 | e5eaa80a | Michele Tartara | By default, all new OS definitions will have to be explicitly marked as trusted |
502 | e5eaa80a | Michele Tartara | by the cluster administrator (with a new ``gnt-os modify`` command) before they |
503 | e5eaa80a | Michele Tartara | can run code on the host. Otherwise, only the untrusted part of the code will be |
504 | e5eaa80a | Michele Tartara | allowed to run, inside the virtual appliance. For backwards compatibility |
505 | e5eaa80a | Michele Tartara | reasons, when upgrading an existing cluster, all the installed OSes will be |
506 | e5eaa80a | Michele Tartara | marked as trusted, so that they can keep running with no changes. |
507 | e5eaa80a | Michele Tartara | |
508 | e5eaa80a | Michele Tartara | In order to allow for the highest flexibility, if both a trusted and an |
509 | e5eaa80a | Michele Tartara | untrusted script are provided for the same operation (e.g., ``create``), both of |
510 | e5eaa80a | Michele Tartara | them will be executed at the same time, one on the host, and one inside the |
511 | e5eaa80a | Michele Tartara | installation appliance. They will be allowed to communicate with each other |
512 | e5eaa80a | Michele Tartara | through the already described communication mechanism, in order to orchestrate |
513 | e5eaa80a | Michele Tartara | their execution (e.g., the untrusted code might execute the installation, while |
514 | e5eaa80a | Michele Tartara | the trusted one receives status updates from it and delivers them to a user |
515 | e5eaa80a | Michele Tartara | interface). |
516 | e5eaa80a | Michele Tartara | |
517 | e5eaa80a | Michele Tartara | The cluster administrator will have an option to completely disable scripts |
518 | e5eaa80a | Michele Tartara | running on the host, leaving only the ones running in the VM. |
519 | e5eaa80a | Michele Tartara | |
520 | e5eaa80a | Michele Tartara | Ganeti will provide a script to be run at install time that can be used to |
521 | e5eaa80a | Michele Tartara | create the virtualized environment that will perform the OS installation of new |
522 | e5eaa80a | Michele Tartara | instances. |
523 | 1a7c1456 | Jose A. Lopes | This script will build a debootstrapped basic Debian system including software |
524 | e5eaa80a | Michele Tartara | that will read the metadata, set up the environment variables and launch the |
525 | e5eaa80a | Michele Tartara | installation scripts inside the virtualized environment. The script will also |
526 | e5eaa80a | Michele Tartara | provide hooks for personalization. |
527 | e5eaa80a | Michele Tartara | |
528 | e5eaa80a | Michele Tartara | It will also be possible to use other self-made virtualized environments, as |
529 | e5eaa80a | Michele Tartara | long as they connect to Ganeti over the described communication mechanism and |
530 | e5eaa80a | Michele Tartara | they know how to read and use the provided metadata to create a new instance. |
531 | e5eaa80a | Michele Tartara | |
532 | 1a7c1456 | Jose A. Lopes | While performing an installation in the virtualized environment, a customizable |
533 | 1a7c1456 | Jose A. Lopes | timeout will be used to detect possible problems with the installation process, |
534 | 1a7c1456 | Jose A. Lopes | and to kill the virtualized environment. The timeout will be optional and set on |
535 | 1a7c1456 | Jose A. Lopes | a cluster basis by the administrator. If set, it will be the total time allowed |
536 | 1a7c1456 | Jose A. Lopes | to set up an instance inside the appliance. It is mainly meant as a safety |
537 | 1a7c1456 | Jose A. Lopes | measure to prevent an instance taken over by malicious scripts from remaining |
538 | 1a7c1456 | Jose A. Lopes | available for a long time. |
539 | 1a7c1456 | Jose A. Lopes | |
540 | 1a7c1456 | Jose A. Lopes | Alternatives to design and implementation |
541 | 1a7c1456 | Jose A. Lopes | ========================================= |
542 | 1a7c1456 | Jose A. Lopes | |
543 | 1a7c1456 | Jose A. Lopes | This section lists design and implementation alternatives that came up |
544 | 1a7c1456 | Jose A. Lopes | during the development of this design document and that will not be implemented. |
545 | 1a7c1456 | Jose A. Lopes | Please read carefully through the limitations and security concerns of each of |
546 | 1a7c1456 | Jose A. Lopes | these alternatives. |
547 | 1a7c1456 | Jose A. Lopes | |
548 | 1a7c1456 | Jose A. Lopes | Port forwarding in KVM |
549 | 1a7c1456 | Jose A. Lopes | ++++++++++++++++++++++ |
550 | 1a7c1456 | Jose A. Lopes | |
551 | 1a7c1456 | Jose A. Lopes | The communication mechanism could have been implemented in KVM using guest port |
552 | 1a7c1456 | Jose A. Lopes | forwarding, as opposed to network interfaces. There are two alternatives in |
553 | 1a7c1456 | Jose A. Lopes | KVM's guest port forwarding, namely, creating a forwarding device, such as a |
554 | 1a7c1456 | Jose A. Lopes | TCP/IP connection, or executing a command. However, we have determined that |
555 | 1a7c1456 | Jose A. Lopes | neither of these options is viable. |
556 | 1a7c1456 | Jose A. Lopes | |
557 | 1a7c1456 | Jose A. Lopes | A TCP/IP forwarding device can be created through the following KVM invocation:: |
558 | 1a7c1456 | Jose A. Lopes | |
559 | 1a7c1456 | Jose A. Lopes | kvm -net nic -net \ |
560 | 1a7c1456 | Jose A. Lopes | user,restrict=on,net=169.254.0.0/16,host=169.254.169.253, |
561 | 1a7c1456 | Jose A. Lopes | guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ... |
562 | 1a7c1456 | Jose A. Lopes | |
563 | 1a7c1456 | Jose A. Lopes | This invocation even has the advantage that it can block undesired traffic |
564 | 1a7c1456 | Jose A. Lopes | (i.e., traffic that is not explicitly specified in the arguments) and it can |
565 | 1a7c1456 | Jose A. Lopes | remap ports, which would have allowed the metadata service daemon to run on port |
566 | 1a7c1456 | Jose A. Lopes | 8080 instead of 80. However, in this scheme, KVM opens the TCP connection only |
567 | 1a7c1456 | Jose A. Lopes | once, when it is started, and, if the connection breaks, KVM will not |
568 | 1a7c1456 | Jose A. Lopes | reestablish the connection. Furthermore, opening the TCP connection only once |
569 | 1a7c1456 | Jose A. Lopes | interferes with the HTTP protocol, which needs to dynamically establish and |
570 | 1a7c1456 | Jose A. Lopes | close connections. |
571 | 1a7c1456 | Jose A. Lopes | |
572 | 1a7c1456 | Jose A. Lopes | The alternative to the TCP/IP forwarding device is to execute a command. The |
573 | 1a7c1456 | Jose A. Lopes | KVM invocation for this is, for example, the following:: |
574 | 1a7c1456 | Jose A. Lopes | |
575 | 1a7c1456 | Jose A. Lopes | kvm -net nic -net \ |
576 | 1a7c1456 | Jose A. Lopes | "user,restrict=on,net=169.254.0.0/16,host=169.254.169.253, |
577 | 1a7c1456 | Jose A. Lopes | guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ... |
578 | 1a7c1456 | Jose A. Lopes | |
579 | 1a7c1456 | Jose A. Lopes | The advantage of this approach is that the command is executed each time the |
580 | 1a7c1456 | Jose A. Lopes | guest initiates a connection. This is the ideal situation; however, it is only |
581 | 1a7c1456 | Jose A. Lopes | supported in KVM 1.2 and above, and, therefore, not viable because we want to |
582 | 1a7c1456 | Jose A. Lopes | provide support for at least KVM version 1.0, which is the version provided by |
583 | 1a7c1456 | Jose A. Lopes | Ubuntu LTS. |
584 | 1a7c1456 | Jose A. Lopes | |
585 | 1a7c1456 | Jose A. Lopes | Alternatives to the DHCP server |
586 | 1a7c1456 | Jose A. Lopes | +++++++++++++++++++++++++++++++ |
587 | 1a7c1456 | Jose A. Lopes | |
588 | 1a7c1456 | Jose A. Lopes | There are alternatives to using the DHCP server, for example, assigning a |
589 | 1a7c1456 | Jose A. Lopes | fixed IP address to guests, such as the IP address ``169.254.169.253``. |
590 | 1a7c1456 | Jose A. Lopes | However, this introduces a routing problem, namely, how to route incoming |
591 | 1a7c1456 | Jose A. Lopes | packets from the same source IP to the host. This problem can be overcome in a |
592 | 1a7c1456 | Jose A. Lopes | number of ways. |
593 | 1a7c1456 | Jose A. Lopes | |
594 | 1a7c1456 | Jose A. Lopes | The first solution is to use NAT to translate the incoming guest IP address, for |
595 | 1a7c1456 | Jose A. Lopes | example, ``169.254.169.253``, to a unique IP address, for example, |
596 | 1a7c1456 | Jose A. Lopes | ``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated, users can |
597 | 1a7c1456 | Jose A. Lopes | resort to ``iptables``. Note that this has not yet been tested. |
598 | 1a7c1456 | Jose A. Lopes | |
599 | 1a7c1456 | Jose A. Lopes | Another option, which has been tested, but only in a prototype, is to connect |
600 | 1a7c1456 | Jose A. Lopes | the TAP network interfaces of the guests to a bridge. The bridge takes the |
601 | 1a7c1456 | Jose A. Lopes | configuration from the TAP network interfaces, namely, IP address |
602 | 1a7c1456 | Jose A. Lopes | ``169.254.169.254`` and netmask ``255.255.255.255``, thus leaving those |
603 | 1a7c1456 | Jose A. Lopes | interfaces without an IP address. Note that in this setting, guests will be |
604 | 1a7c1456 | Jose A. Lopes | able to reach each other; therefore, if necessary, additional ``iptables`` rules |
605 | 1a7c1456 | Jose A. Lopes | can be put in place to prevent it. |
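
As an illustration of the metadata service layout described in this document, the following Python sketch builds the Ganeti-specific URLs and filters the OS parameters dictionary by visibility. It is only a sketch under stated assumptions: the helper names (``endpoint``, ``parameters_by_visibility``), the sample payload, and the assumption that each ``(<value>, <visibility>)`` pair is serialized as a two-element JSON array are illustrative and not part of the design.

```python
import json

# Base address of the Ganeti-specific part of the metadata service.
BASE_URL = "http://169.254.169.254/ganeti"


def endpoint(version, path):
    """Build a metadata URL, e.g. .../ganeti/latest/os/parameters.json.

    `version` is either a YYYY-MM-DD date or the string "latest".
    """
    return "{}/{}/{}".format(BASE_URL, version, path.lstrip("/"))


def parameters_by_visibility(raw_json, allowed=("public",)):
    """Return {name: value} for parameters whose visibility is in `allowed`.

    Assumption: the service serializes each (<value>, <visibility>) pair
    as a two-element JSON array.
    """
    params = json.loads(raw_json)
    return {
        name: value
        for name, (value, visibility) in params.items()
        if visibility in allowed
    }


# Hypothetical payload, shaped like the body that a GET request to
# endpoint("latest", "os/parameters.json") might return.
SAMPLE = json.dumps({
    "dhcp": ["yes", "public"],
    "root_password": ["example-only", "secret"],
})

if __name__ == "__main__":
    print(endpoint("latest", "os/parameters.json"))
    print(parameters_by_visibility(SAMPLE))
```

An installation script running inside the virtualized environment could, for instance, log only the ``public`` parameters while still passing ``private`` and ``secret`` ones to the installer.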