===============================
Ganeti OS installation redesign
===============================

.. contents:: :depth: 3

This is a design document detailing a new OS installation procedure that is
more secure, provides more features, and is easier to use for many common
tasks than the current one.

Current state and shortcomings
==============================

As of Ganeti 2.10, each instance is associated with an OS definition. An OS
definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
that are executed with root privileges on the primary host of the instance to
perform all the OS-related functionality (setting up an operating system inside
the disks of the instance being created, exporting/importing the instance,
renaming it).

These scripts receive, as environment variables, a fixed set of parameters
related to the instance (such as the hypervisor, the name of the instance, the
number of disks, and their location) and a set of user-defined parameters.
These parameters are also written to the Ganeti configuration file, to allow
future reinstalls of the instance, and to various log files, namely:

* node daemon log file: contains DEBUG strings of the ``/os_validate``,
  ``/instance_os_add`` and ``/instance_start`` RPC calls.

* master daemon log file: DEBUG strings related to the same RPC calls are stored
  here as well.

* commands log: the CLI commands that create a new instance, including their
  parameters, are logged here.

* RAPI log: the RAPI commands that create a new instance, including their
  parameters, are logged here.

* job logs: the job files stored in the job queue, or in its archive, contain
  the parameters.

The current situation presents a number of shortcomings:

* Having the installation scripts run as root on the nodes doesn't allow
  user-defined OS scripts, as they would pose a huge security issue.
  Furthermore, even a script without malicious intentions might end up
  disrupting a node because of a bug in it.

* Ganeti cannot be used to create instances starting from user-provided disk
  images: even in the (hypothetical) case where the scripts are completely
  secure and run not by root but by an unprivileged user with only the power to
  mount arbitrary files as disk images, this is a security issue. It has been
  proven that a carefully crafted file system might exploit kernel
  vulnerabilities to gain control of the system. Therefore, directly mounting
  images on the Ganeti nodes is not an option.

* There is no way to inject files into an existing disk image. A common use case
  is for the system administrator to provide a standard image of the system, to
  be later personalized with the network configuration, private keys identifying
  the machine, ssh keys of the users and so on. A possible workaround would be
  for the scripts to mount the image (only if this is trusted!) and to receive
  the configurations and ssh keys as user-defined OS parameters. Unfortunately,
  this is also not an option for security-sensitive material (such as the ssh
  keys) because the OS parameters are stored in many places on the system, as
  already described above.

* Most other virtualization software simply works with instance images, not with
  installation scripts. This difference makes the interaction of Ganeti with
  other software difficult.

Proposed changes
================

In order to fix the shortcomings of the current state, we plan to introduce the
following changes:

* Change the OS parameters to have three categories:

  * ``public``: the current behavior. The parameter is logged and stored freely.

  * ``private``: the parameter is saved inside the Ganeti configuration (to allow
    for instance reinstall) but it is not shown in logs or job logs, and it is
    not passed back via RAPI.

  * ``secret``: the parameter is not saved inside the Ganeti configuration.
    Reinstalls are impossible unless the data is passed again. The parameter will
    not appear in any log file. When a functionality is performed jointly by
    multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes
    serializes jobs on disk and later reloads them. Secret parameters will not be
    serialized to disk. They will be passed around as part of the LUXI calls
    exchanged by the daemons, and only kept in memory, in order to reduce their
    accessibility as much as possible. In case of failure of the master node,
    these parameters will be lost and cannot be recovered because they are not
    serialized. As a result, the job cannot be taken over by the new master.
    This is an expected and accepted side effect of jobs with secret parameters:
    if they fail, they'll have to be restarted manually.

* A new OS installation procedure, based on a safe virtualized environment.
  This virtualized environment will run, as much as possible, with the same
  hardware parameters as the actual instance being installed. This will also
  reduce the memory usage on the host (specifically, in Dom0 for Xen
  installations). Each instance will have these possible execution modes:

  * ``run``: the default mode, used when the machine is running normally. In
    this mode, the OS installation procedure is run before the instance is
    started for the first time.

  * ``self_install``: the first run of the instance will use a different set
    of parameters from all the successive runs. This set of "install
    parameters" will make it possible, e.g., to attach an installation
    floppy/cdrom/network, change the boot device order, or specify an OS image
    to be used. Through this set of parameters, the administrator will have to
    provide the hypervisor with a way to find an installation medium for the
    instance (e.g., a boot disk, a network image, etc). This medium will then
    install the instance itself on the disks and will then be responsible for
    getting the parameters for configuring it (its network interfaces, IP
    address, hostname, etc.) from a set of metadata provided by Ganeti (e.g.,
    using an approach comparable to the one of the ``cloud-init`` tool). When
    this installation mode is used, no OS installation script is required.

    In order for the installation of an OS from an image to be possible, the
    ``--os-type`` parameter will be extended to support an additional format:
    ``--os-type image:<URL>`` will instruct Ganeti to take an image from the
    specified position. For the initial implementation, URL can be either a
    filename or a publicly accessible HTTP or FTP resource. Once the instance
    image is received, it will be dd-ed onto the first disk of the instance.

    When an image is specified, ``--os-parameters`` can still be used, and its
    content will be passed to the instance as part of the metadata. Note that,
    as part of the OS scripts, there is a file specifying what parameters are
    expected. With OS images, though, none of the traditional structure of OS
    scripts is in place, so there will be no check regarding what parameters can
    be specified: they will all be passed, as long as the ``--os-parameters``
    string is syntactically valid.

    The set of ``self_install`` parameters will be stored as part of the
    instance configuration, so that they can be used to reinstall the instance.
    It will be the user's responsibility to ensure that the OS image or any
    installation media is still available in the proper position when a
    reinstall happens. After the first run, the instance will revert to ``run``
    mode.

  * ``install``: Ganeti will start the instance using a virtual appliance
    specifically made for installing Ganeti instances. Scripts analogous to the
    current ones will run inside this instance. The disks of the instance being
    installed will be connected to this virtual appliance, so that the scripts
    can mount them and modify them as needed, as currently happens, but with the
    additional protection given by this happening in a VM. The disk of the
    virtual appliance will be read-only, so that a pristine copy of the
    appliance can be started every time a new instance needs to be created, to
    further increase security. The data the instance needs to write at runtime
    will only be stored in RAM, and disappear as soon as the instance is
    stopped. Metadata will also be provided to this virtual appliance, which
    will take care of converting them to environment variables for the
    installation scripts. After the first run, the instance will revert to
    ``run`` mode.

* In order to allow for the metadata to be sent inside the instance, a
  communication mechanism between the instance and the host will be created.
  This mechanism will be bidirectional (e.g.: to allow the setup process going
  on inside the instance to communicate its progress to the host). Each instance
  will have access exclusively to its own metadata, and it will be only able to
  communicate with its host over this channel. More details will be provided in
  the `Communication mechanism and metadata service`_ section.

* As part of the instance creation command it will be possible to indicate a URL
  for a "personalization package", that is, an archive containing a set of files
  meant to be overlaid on top of the operating system file system at the end of
  the setup process, before the VM is started for the first time in ``run``
  mode. Ganeti will provide a mechanism for receiving and unpacking this
  archive as part of the ``install`` execution mode, whereas in ``self_install``
  mode it will only be provided as metadata for the instance to use.

  The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or
  ``.tgz``) and will contain the files according to the directory structure
  that will be recreated on the installation disk. Files contained in this
  archive will overwrite files with the same path created during the install
  procedure (if any). The URL of the "personalization package" will have to
  specify an extension to identify the file format (in order to allow for more
  formats to be supported in the future).

  The URL will be stored as part of the configuration of the instance
  (therefore, the URL should not contain confidential information, but the
  files available there can). It is up to the system administrator to ensure
  that a package is actually available at that URL at install and reinstall
  time. The content of the package is allowed to change. E.g.: a system
  administrator might create a package containing the private keys of the
  instance being created. When the instance is reinstalled, a new package with
  new keys can be made available there, therefore allowing instance reinstall
  without the need to store keys.

  Together with the URL, a username and a password can be specified as well.
  If the URL is an HTTP(S) URL, they will be used as basic access
  authentication credentials to access that URL. The username and password
  will not be saved in the config, and will have to be provided again in case
  a reinstall is requested.

  The downloaded personalization package will not be stored locally on the
  node for longer than it is needed while unpacking it and adding its files to
  the instance being created. The personalization package will be overlaid on
  top of the instance filesystem after the scripts that created it have been
  executed. In order for the files in the package to be automatically overlaid
  on top of the instance filesystem, the appliance must actually be able to
  mount the instance disks; therefore, this will not work for every filesystem.

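
The three OS parameter categories described above could be handled along the
following lines. This is only a sketch under stated assumptions: the function
names, the ``<redacted>`` placeholder, the sample parameter names and the
choice to log the name of a non-public parameter while hiding its value are
all hypothetical and are not part of this design.

```python
from typing import Dict, Tuple

# Visibility labels mirroring the three categories of the design.
PUBLIC, PRIVATE, SECRET = "public", "private", "secret"

# Hypothetical in-memory representation: name -> (value, visibility).
OsParams = Dict[str, Tuple[str, str]]


def params_for_log(params: OsParams) -> Dict[str, str]:
    """Return a copy that is safe to log: only public values are shown."""
    return {
        name: value if visibility == PUBLIC else "<redacted>"
        for name, (value, visibility) in params.items()
    }


def params_for_config(params: OsParams) -> OsParams:
    """Return what may be saved in the configuration: secrets are dropped."""
    return {
        name: (value, visibility)
        for name, (value, visibility) in params.items()
        if visibility != SECRET
    }


# Made-up sample parameters, one per category.
params = {
    "dhcp": ("yes", PUBLIC),
    "root_password_hash": ("$6$abc", PRIVATE),
    "ssh_host_key": ("-----BEGIN...", SECRET),
}
```

With this split, a reinstall can reuse everything returned by
``params_for_config``, while secret values have to be passed again.
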
Implementation
==============

The implementation of this design will happen as an ordered sequence of steps,
of increasing impact on the system and, in some cases, dependent on each other:

#. Private and secret instance parameters
#. Communication mechanism between host and instance
#. Metadata service
#. Personalization package (inside a virtualization environment)
#. ``self_install`` mode
#. ``install`` mode (inside a virtualization environment)

Some of these steps need to be more deeply specified w.r.t. what is already
written in the `Proposed changes`_ Section. Extra details will be provided in
the following subsections.

Communication mechanism and metadata service
++++++++++++++++++++++++++++++++++++++++++++

The communication mechanism and the metadata service are described together
because they are deeply tied. The communication mechanism, however, will need
to be more generic because it can be used for other purposes in the future
(like allowing instances to explicitly send commands to Ganeti, or letting
Ganeti control a helper instance, like the one hereby introduced for performing
OS installs inside a safe environment).

The communication mechanism will be enabled automatically when the instance is
in ``self_install`` or ``install`` mode, but for backwards compatibility it will
be disabled when the instance is in ``run`` mode unless it is explicitly
requested. Specifically, a new parameter ``--communication`` (short version:
``-C``), with possible values ``true`` or ``false``, will be added to
``gnt-instance add`` and ``gnt-instance modify``. It will determine whether the
instance will have a communication channel set up to interact with the host and
to receive metadata. The value of this parameter will be saved as part of the
configuration of the instance.

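
The rule for enabling the communication mechanism can be summarized in a few
lines. The function below is a hypothetical helper written for illustration
only; its name and signature are assumptions, and only the mode names and the
meaning of ``--communication`` come from this document.

```python
def communication_enabled(mode: str, requested: bool = False) -> bool:
    """Decide whether the host/instance channel should be set up.

    ``mode`` is the execution mode of the instance and ``requested`` is the
    saved value of the ``--communication`` parameter.
    """
    if mode in ("self_install", "install"):
        # Always enabled while the instance is being installed.
        return True
    if mode == "run":
        # Disabled by default for backwards compatibility.
        return requested
    raise ValueError("unknown execution mode: %r" % (mode,))
```
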
When the communication mechanism is enabled, Ganeti will create a new network
interface inside the instance. This additional network interface will be the
last one in the instance, after all the user-defined ones. On the host side,
this interface will only be accessible to the host itself, and not routed
outside the machine.
On this network interface, the instance will use the IP address 169.254.169.1
with netmask 255.255.255.0.
The host will be on the same network, with the IP address 169.254.169.254.

The way to create this interface depends on the specific hypervisor being used.
In KVM, it is possible to create a network interface inside the instance without
having a corresponding interface created on the host. Using a command like::

  kvm -net nic -net \
    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080

a network interface will be created inside the VM, part of the 169.254.169.0/24
network, where the host will be visible to the VM at address .253 and port 8080
of the host will be reachable at 169.254.169.254 on port 80.

In Xen, unfortunately, such a capability is not present, and an actual network
interface has to be created on the host (using the ``vif`` parameter of the Xen
configuration file). Each instance will have its corresponding ``vif`` network
interface on the host. These interfaces will not be connected to each other in
any way, and Ganeti will not configure them to allow traffic to be forwarded
beyond the host machine. The ``vif-route`` script of Xen might be helpful in
implementing this.
It will be the system administrator's responsibility to ensure that the extra
firewalling and routing rules specified on the host don't allow this
accidentally.

The instance will be able to connect to 169.254.169.254:80, and issue GET
requests to an HTTP server that will provide the instance metadata.

This IP address and port were chosen for compatibility with the way OpenStack
and Amazon EC2 provide metadata to their instances. The metadata will be
provided by a single daemon, which will determine what instance the request
comes from and reply with the metadata specific to that instance.

Where possible, the metadata will be provided in a way compatible with Amazon
EC2, at::

  http://169.254.169.254/<version>/meta-data/*

If some metadata are Ganeti-specific and don't fit this structure, they will be
provided at::

  http://169.254.169.254/ganeti/<version>/meta_data.json

``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
the most recent available protocol version.

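
A metadata server has to decide whether the ``<version>`` path component is
acceptable. The following sketch shows one possible check; the function name
is hypothetical, and requiring zero-padded month and day fields is an
assumption of this example rather than something this design mandates.

```python
from datetime import datetime


def is_valid_version(version: str) -> bool:
    """Accept ``latest`` or a calendar date written as YYYY-MM-DD."""
    if version == "latest":
        return True
    try:
        parsed = datetime.strptime(version, "%Y-%m-%d")
    except ValueError:
        # Not a date at all, or an impossible one such as month 13.
        return False
    # strptime also accepts unpadded fields ("2013-1-5"); formatting the
    # parsed date back and comparing enforces the exact YYYY-MM-DD form.
    return parsed.strftime("%Y-%m-%d") == version
```
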
If needed in the future, this structure also allows us to support OpenStack's
metadata at::

  http://169.254.169.254/openstack/<version>/meta_data.json

A bi-directional, pipe-like communication channel will be provided. The instance
will be able to receive data from the host by a GET request at::

  http://169.254.169.254/ganeti/<version>/read

and to send data to the host by a POST request at::

  http://169.254.169.254/ganeti/<version>/write

As in a pipe, once the data are read, they will not be in the buffer anymore, so
subsequent GET requests to ``read`` will not return the same data twice.
Unlike a pipe, though, it will not be possible to perform blocking I/O
operations.

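
The read-once, non-blocking behaviour of the channel can be modelled with a
small in-memory buffer. This is a minimal sketch of the semantics only: the
``MessageBuffer`` class is hypothetical, it covers a single direction of the
channel, and it leaves out the HTTP layer and any per-instance bookkeeping.

```python
from collections import deque
from typing import Deque


class MessageBuffer:
    """One direction of the channel: written data can be read exactly once."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def write(self, data: bytes) -> None:
        """Queue data for the other side (e.g. the body of a POST)."""
        if data:
            self._chunks.append(data)

    def read(self) -> bytes:
        """Return all pending data and forget it.

        The call never blocks: if nothing is pending, it returns an empty
        byte string immediately.
        """
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data
```

A reader that wants to wait for new data therefore has to poll ``read``
repeatedly, since an empty result is returned right away.
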
The OS parameters will be accessible through a GET request at::

  http://169.254.169.254/ganeti/<version>/os/parameters.json

as a JSON serialized dictionary having the parameter name as the key, and the
pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the
user-provided value of the parameter, and ``<visibility>`` is either ``public``,
``private`` or ``secret``.

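
Code running inside the instance might decode the OS parameters document as
sketched below. Since JSON has no tuple type, the example assumes that each
``(<value>, <visibility>)`` pair is serialized as a two-element array; the
function name and the sample payload are hypothetical.

```python
import json
from typing import Dict, Tuple

VISIBILITIES = {"public", "private", "secret"}


def parse_os_parameters(payload: str) -> Dict[str, Tuple[str, str]]:
    """Decode the parameters document into name -> (value, visibility)."""
    raw = json.loads(payload)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object at the top level")
    result = {}
    for name, pair in raw.items():
        if not isinstance(pair, list) or len(pair) != 2:
            raise ValueError("parameter %r is not a two-element pair" % name)
        value, visibility = pair
        if visibility not in VISIBILITIES:
            raise ValueError("parameter %r has an unknown visibility" % name)
        result[name] = (value, visibility)
    return result


# Made-up payload, shaped as the body of a successful GET might be.
sample = '{"dhcp": ["yes", "public"], "admin_key": ["ssh-rsa AAAA", "secret"]}'
os_params = parse_os_parameters(sample)
```
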
The installation scripts to be run inside the virtualized environment while the
instance is run in ``install`` mode will be available at::

  http://169.254.169.254/<version>/ganeti/os/scripts/<script_name>

where ``<script_name>`` is the name of the script.


Rationale
---------

The choice of using a network interface for instance-host communication, as
opposed to VirtIO, XenBus or other methods, is motivated by the desire to have
a generic, hypervisor-independent way of creating a communication channel that
doesn't require unusual (para)virtualization drivers.
At the same time, a network interface was preferred over solutions involving
virtual floppy or USB devices because the latter tend to be detected and
configured by the guest operating systems, sometimes even in prominent positions
in the user interface, whereas it is fairly common to have an unconfigured
network interface in a system, usually without any negative side effects.


Installation process in a virtualized environment
+++++++++++++++++++++++++++++++++++++++++++++++++

In the new OS installation scenario, we distinguish between trusted and
untrusted code.

The trusted installation code maintains the behavior of the current one and
requires no modifications, with the scripts running on the node the instance is
being created on. The untrusted code is stored in a subdirectory of the OS
definition called ``untrusted``. This directory contains scripts that are
equivalent to the already existing ones (``create``, ``export``, ``import``,
``rename``) but that will be run inside a virtualized environment, to protect
the host from malicious tampering.

The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
code running operations that might be dangerous (such as mounting a
user-provided image).

By default, all new OS definitions will have to be explicitly marked as trusted
by the cluster administrator (with a new ``gnt-os modify`` command) before they
can run code on the host. Otherwise, only the untrusted part of the code will be
allowed to run, inside the virtual appliance. For backwards compatibility
reasons, when upgrading an existing cluster, all the installed OSes will be
marked as trusted, so that they can keep running with no changes.

In order to allow for the highest flexibility, if both a trusted and an
untrusted script are provided for the same operation (e.g. ``create``), both of
them will be executed at the same time, one on the host, and one inside the
installation appliance. They will be allowed to communicate with each other
through the already described communication mechanism, in order to orchestrate
their execution (e.g.: the untrusted code might execute the installation, while
the trusted one receives status updates from it and delivers them to a user
interface).

The cluster administrator will have an option to completely disable scripts
running on the host, leaving only the ones running in the VM.

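
The rules about where the scripts for an operation may run can be condensed
into a small decision function. This is a sketch for illustration: the
function and parameter names are hypothetical, and the ``"host"`` and
``"appliance"`` labels are placeholders for whatever the implementation
ends up using.

```python
from typing import List


def plan_scripts(
    os_trusted: bool,
    has_trusted_script: bool,
    has_untrusted_script: bool,
    host_scripts_enabled: bool = True,
) -> List[str]:
    """Return where the scripts for one operation (e.g. create) will run."""
    locations = []
    # Host execution needs a trusted script, an OS definition marked as
    # trusted, and host scripts not disabled by the cluster administrator.
    if has_trusted_script and os_trusted and host_scripts_enabled:
        locations.append("host")
    # Untrusted scripts only ever run inside the installation appliance.
    if has_untrusted_script:
        locations.append("appliance")
    return locations
```

When both locations are returned, the two scripts run at the same time and
may coordinate over the communication mechanism.
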
Ganeti will provide a script to be run at install time that can be used to
create the virtualized environment that will perform the OS installation of new
instances.
This script will build a basic debootstrapped Debian system including software
that will read the metadata, set up the environment variables and launch the
installation scripts inside the virtualized environment. The script will also
provide hooks for personalization.

It will also be possible to use other self-made virtualized environments, as
long as they connect to Ganeti over the described communication mechanism and
they know how to read and use the provided metadata to create a new instance.

While performing an installation in the virtualized environment, a
configurable timeout will be used to detect possible problems with the
installation process, and to kill the virtualized environment. The timeout will
be optional and set on a cluster basis by the administrator. If set, it will be
the total time allowed to set up an instance inside the appliance. It is mainly
meant as a safety measure to prevent an instance taken over by malicious scripts
from remaining available for a long time.

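
The optional installation timeout amounts to a watchdog loop around the
appliance. The sketch below only illustrates that idea: ``is_finished`` and
``kill`` are hypothetical callables standing in for the hypervisor-specific
operations, and the polling approach itself is an assumption of this example.

```python
import time
from typing import Callable, Optional


def wait_for_install(
    is_finished: Callable[[], bool],
    kill: Callable[[], None],
    timeout: Optional[float],
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for the installation; kill the appliance if it takes too long.

    ``timeout`` is the total time allowed, in seconds, or None when the
    administrator has not set one. Returns True if the installation finished
    and False if the appliance was killed.
    """
    deadline = None if timeout is None else clock() + timeout
    while not is_finished():
        if deadline is not None and clock() >= deadline:
            kill()
            return False
        sleep(poll_interval)
    return True
```

The ``clock`` and ``sleep`` parameters exist only so that the loop can be
exercised without real waiting.
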
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: