Revision e5eaa80a

b/doc/design-draft.rst
   design-daemons.rst
   design-hsqueeze.rst
   design-ssh-ports.rst
   design-os.rst

.. vim: set textwidth=72 :
.. Local Variables:
b/doc/design-os.rst
===============================
Ganeti OS installation redesign
===============================

.. contents:: :depth: 3

This is a design document detailing a new OS installation procedure that is
more secure, provides more features, and is easier to use for many common tasks
than the current one.

Current state and shortcomings
==============================

As of Ganeti 2.10, each instance is associated with an OS definition. An OS
definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
that are executed with root privileges on the primary host of the instance to
perform all the OS-related functionality (setting up an operating system inside
the disks of the instance being created, exporting/importing the instance,
renaming it).

These scripts receive, as environment variables, a fixed set of parameters
related to the instance (such as the hypervisor, the name of the instance, the
number of disks, and their location) and a set of user-defined parameters.
These parameters are also written in the configuration file of Ganeti, to allow
future reinstalls of the instance, and in various log files, namely:

* node daemon log file: contains DEBUG strings of the ``/os_validate``,
  ``/instance_os_add`` and ``/instance_start`` RPC calls.

* master daemon log file: DEBUG strings related to the same RPC calls are stored
  here as well.

* commands log: the CLI commands that create a new instance, including their
  parameters, are logged here.

* RAPI log: the RAPI commands that create a new instance, including their
  parameters, are logged here.

* job logs: the job files stored in the job queue, or in its archive, contain
  the parameters.

The current situation presents a number of shortcomings:

* Having the installation scripts run as root on the nodes doesn't allow
  user-defined OS scripts, as they would pose a huge security issue.
  Furthermore, even a script without malicious intentions might end up
  disrupting a node because of a bug in it.

* Ganeti cannot be used to create instances starting from user-provided disk
  images: even in the (hypothetical) case where the scripts are completely
  secure and run not by root but by an unprivileged user with only the power to
  mount arbitrary files as disk images, this is a security issue. It has been
  demonstrated that a carefully crafted file system can exploit kernel
  vulnerabilities to gain control of the system. Therefore, directly mounting
  images on the Ganeti nodes is not an option.

* There is no way to inject files into an existing disk image. A common use case
  is for the system administrator to provide a standard image of the system, to
  be later personalized with the network configuration, private keys identifying
  the machine, ssh keys of the users and so on. A possible workaround would be
  for the scripts to mount the image (only if this is trusted!) and to receive
  the configurations and ssh keys as user-defined OS parameters. Unfortunately,
  this is also not an option for security-sensitive material (such as the ssh
  keys) because the OS parameters are stored in many places on the system, as
  already described above.

* Most other virtualization software simply works with instance images, not
  with installation scripts. This difference makes the interaction of Ganeti
  with other software difficult.

Proposed changes
================

In order to fix the shortcomings of the current state, we plan to introduce the
following changes:

* Change the OS parameters to have three categories:

  * ``public``: the current behavior. The parameter is logged and stored freely.

  * ``private``: the parameter is saved inside the Ganeti configuration (to
    allow the instance to be reinstalled) but it is not shown in logs or job
    logs, nor passed back via RAPI.

  * ``secret``: the parameter is not saved inside the Ganeti configuration.
    Reinstalls are impossible unless the data is passed again. The parameter
    will not appear in any log file. When an operation is performed jointly by
    multiple daemons (such as MasterD and LuxiD), Ganeti currently sometimes
    serializes jobs to disk and later reloads them. Secret parameters will not
    be serialized to disk. They will be passed around as part of the LUXI calls
    exchanged by the daemons, and only kept in memory, in order to reduce their
    accessibility as much as possible. In case of failure of the master node,
    these parameters will be lost and cannot be recovered because they are not
    serialized. As a result, the job cannot be taken over by the new master.
    This is an expected and accepted side effect of jobs with secret
    parameters: if they fail, they will have to be restarted manually.

* A new OS installation procedure, based on a safe virtualized environment.
  This virtualized environment will run with the same hardware parameters as the
  actual instance being installed, as much as possible. This will also make it
  possible to reduce memory usage on the host (specifically, in Dom0 for Xen
  installations). Each instance will have these possible execution modes:

  * ``run``: the default mode, used when the machine is running normally and
    the OS installation procedure is run before starting the instance for the
    first time.

  * ``self_install``: the first run of the instance will use a different set of
    parameters from all subsequent runs. This set of "install parameters" will
    make it possible, e.g., to attach an installation floppy/cdrom/network,
    change the boot device order, or specify an OS image to be used. Through
    this set of parameters, the administrator will have to provide the
    hypervisor with a way to find an installation medium for the instance
    (e.g., a boot disk, a network image, etc.). This medium will then install
    the instance itself on the disks and will then be responsible for getting
    the parameters for configuring it (its network interfaces, IP address,
    hostname, etc.) from a set of metadata provided by Ganeti (e.g., using an
    approach comparable to the one of the ``cloud-init`` tool). When this
    installation mode is used, no OS installation script is required.

    In order for the installation of an OS from an image to be possible, the
    ``--os-type`` parameter will be extended to support a new additional
    format: ``--os-type image:<URL>`` will instruct Ganeti to take an image
    from the specified location. For the initial implementation, the URL can
    be either a filename or a publicly accessible HTTP or FTP resource. Once
    the instance image is received, it will be written with ``dd`` onto the
    first disk of the instance.

    When an image is specified, ``--os-parameters`` can still be used, and its
    content will be passed to the instance as part of the metadata. Note that,
    as part of the OS scripts, there is a file specifying what parameters are
    expected. With OS images, though, none of the traditional structure of OS
    scripts is in place, so there will be no check regarding what parameters
    can be specified: they will all be passed, as long as the
    ``--os-parameters`` string is syntactically valid.

    The set of ``self_install`` parameters will be stored as part of the
    instance configuration, so that they can be used to reinstall the
    instance. It will be the user's responsibility to ensure that the OS image
    or any installation media is still available in the proper location when a
    reinstall happens. After the first run, the instance will revert to
    ``run`` mode.

  * ``install``: Ganeti will start the instance using a virtual appliance
    specifically made for installing Ganeti instances. Scripts analogous to the
    current ones will run inside this instance. The disks of the instance being
    installed will be connected to this virtual appliance, so that the scripts
    can mount and modify them as needed, as currently happens, but with the
    additional protection given by this happening in a VM. The disk of the
    virtual appliance will be read-only, so that a pristine copy of the
    appliance can be started every time a new instance needs to be created, to
    further increase security. The data the appliance needs to write at
    runtime will only be stored in RAM, and will disappear as soon as the
    appliance is stopped. Metadata will also be provided to this virtual
    appliance, which will take care of converting it into environment
    variables for the installation scripts. After the first run, the instance
    will revert to ``run`` mode.

* In order to allow for the metadata to be sent inside the instance, a
  communication mechanism between the instance and the host will be created.
  This mechanism will be bidirectional (e.g., to allow the setup process running
  inside the instance to report its progress to the host). Each instance will
  have access exclusively to its own metadata, and it will only be able to
  communicate with its host over this channel. More details will be provided in
  the `Communication mechanism and metadata service`_ section.

* As part of the instance creation command it will be possible to indicate a URL
  for a "personalization package", that is, an archive containing a set of files
  meant to be overlaid on top of the operating system file system at the end of
  the setup process, before the VM is started for the first time in ``run``
  mode. Ganeti will provide a mechanism for receiving and unpacking this archive
  as part of the ``install`` execution mode, whereas in ``self_install`` mode it
  will only be provided as metadata for the instance to use.

  The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or
  ``.tgz``) and will contain the files according to the directory structure
  that will be recreated on the installation disk. Files contained in this
  archive will overwrite files with the same path created during the install
  procedure (if any). The URL of the "personalization package" will have to
  specify an extension to identify the file format (in order to allow for more
  formats to be supported in the future).

  The URL will be stored as part of the configuration of the instance
  (therefore, the URL should not contain confidential information, but the
  files available there can). It is up to the system administrator to ensure
  that a package is actually available at that URL at install and reinstall
  time. The content of the package is allowed to change. E.g., a system
  administrator might create a package containing the private keys of the
  instance being created. When the instance is reinstalled, a new package with
  new keys can be made available there, therefore allowing instance reinstall
  without the need to store keys.

  Together with the URL, a username and a password can be specified too. If the
  URL is an HTTP(S) URL, they will be used as basic access authentication
  credentials to access that URL. The username and password will not be saved
  in the config, and will have to be provided again in case a reinstall is
  requested. The downloaded personalization package will not be stored locally
  on the node for longer than it is needed while unpacking it and adding its
  files to the instance being created.

  The personalization package will be overlaid on top of the instance
  filesystem after the scripts that created it have been executed. In order for
  the files in the package to be automatically overlaid on top of the instance
  filesystem, the appliance must actually be able to mount the instance disks;
  therefore, this will not work with every filesystem.

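The following Python sketch illustrates how the two URL-valued inputs described above might be checked and used. It is not part of the design: the function names, the exact set of accepted URL schemes, and the choice to accept only regular files and directories in the package are assumptions. ``parse_os_type`` separates an ``image:<URL>`` value from a plain OS name, ``check_package_url`` enforces the ``.tar.gz``/``.tgz`` extension, and ``overlay_package`` unpacks an already downloaded package over the mounted instance filesystem, validating every member first so that a malicious archive cannot write outside the instance.

```python
import os
import shutil
import tarfile
from urllib.parse import urlparse

# Hypothetical helpers; none of these names exist in Ganeti. The accepted
# image locations follow the text: a plain filename, HTTP or FTP.
IMAGE_PREFIX = "image:"
IMAGE_SCHEMES = {"", "http", "ftp"}  # "" means a plain filename
PACKAGE_EXTENSIONS = (".tar.gz", ".tgz")


def parse_os_type(value):
    """Split an --os-type value into ("image", url) or ("os", name)."""
    if not value.startswith(IMAGE_PREFIX):
        return ("os", value)
    url = value[len(IMAGE_PREFIX):]
    if not url or urlparse(url).scheme not in IMAGE_SCHEMES:
        raise ValueError("unsupported image location: %r" % url)
    return ("image", url)


def check_package_url(url):
    """Require a known archive extension on the URL path (query ignored)."""
    if not urlparse(url).path.endswith(PACKAGE_EXTENSIONS):
        raise ValueError("unknown personalization package format: %r" % url)
    return url


def overlay_package(archive_path, root):
    """Unpack a downloaded .tar.gz package over the mounted filesystem root.

    Existing files are overwritten. Only regular files and directories are
    accepted, and no member may resolve to a path outside root.
    """
    root = os.path.realpath(root)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # First pass: validate everything, so a bad archive changes nothing.
        for member in members:
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError("member escapes root: %r" % member.name)
            if not (member.isdir() or member.isfile()):
                raise ValueError("unsupported member type: %r" % member.name)
        # Second pass: write directories and files on top of the instance.
        for member in members:
            target = os.path.realpath(os.path.join(root, member.name))
            if member.isdir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the permission bits, but drop setuid/setgid/sticky.
            os.chmod(target, member.mode & 0o777)
```

Validating all members before writing anything means a rejected archive leaves the instance filesystem untouched; resolving paths with ``os.path.realpath`` also rejects members that would be written through an existing symbolic link pointing outside the mounted filesystem.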
  
Implementation
==============

The implementation of this design will happen as an ordered sequence of steps,
of increasing impact on the system and, in some cases, dependent on each other:

#. Private and secret instance parameters
#. Communication mechanism between host and instance
#. Metadata service
#. Personalization package (inside a virtualization environment)
#. ``self_install`` mode
#. ``install`` mode (inside a virtualization environment)

Some of these steps need to be specified in more detail than in the
`Proposed changes`_ section. Extra details are provided in the following
subsections.

Communication mechanism and metadata service
++++++++++++++++++++++++++++++++++++++++++++

The communication mechanism and the metadata service are described together
because they are closely tied. However, the communication mechanism will need
to be more generic, because it might be used for other purposes in the future
(such as allowing instances to explicitly send commands to Ganeti, or letting
Ganeti control a helper instance, like the one introduced here for performing
OS installs inside a safe environment).

The communication mechanism will be enabled automatically when the instance is
in ``self_install`` or ``install`` mode, but for backwards compatibility it will
be disabled when the instance is in ``run`` mode unless it is explicitly
requested. Specifically, a new parameter ``--communication`` (short version:
``-C``), with possible values ``true`` or ``false``, will be added to
``gnt-instance add`` and ``gnt-instance modify``. It will determine whether the
instance will have a communication channel set up to interact with the host and
to receive metadata. The value of this parameter will be saved as part of the
configuration of the instance.

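The rule for when the channel exists fits in a few lines. This is a hypothetical Python sketch, not Ganeti code: the function names are illustrative, and treating the option value as a case-insensitive ``true``/``false`` string is an assumption.

```python
# Hypothetical sketch of the rule above; names are illustrative only.
EXECUTION_MODES = ("run", "self_install", "install")


def parse_communication(text):
    """Parse the value given to --communication (assumed: "true"/"false")."""
    value = text.strip().lower()
    if value not in ("true", "false"):
        raise ValueError("--communication expects true or false, got %r" % text)
    return value == "true"


def communication_enabled(mode, communication=False):
    """Decide whether the instance gets the host communication channel.

    The channel is always on in the installation modes; in "run" mode it
    depends on the value saved in the instance configuration.
    """
    if mode not in EXECUTION_MODES:
        raise ValueError("unknown execution mode: %r" % mode)
    return mode != "run" or bool(communication)
```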
  
When the communication mechanism is enabled, Ganeti will create a new network
interface inside the instance. This additional network interface will be the
last one in the instance, after all the user-defined ones. On the host side,
this interface will only be accessible to the host itself, and not routed
outside the machine. On this network interface, the instance will use the IP
address 169.254.169.1 with netmask 255.255.255.0. The host will be on the same
network, with the IP address 169.254.169.254.

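The addressing scheme can be checked with Python's standard ``ipaddress`` module. The sketch below also shows iproute2 commands a Linux guest might use to configure the interface; how the guest actually configures it, and the interface name (``eth1`` in the usage example), are assumptions left to the guest. 169.254.0.0/16 is the IPv4 link-local range (RFC 3927), which fits the requirement that this traffic never leaves the host.

```python
import ipaddress

# Addresses from the design; the interface name is chosen by the guest
# (any name used below is a hypothetical example).
COMM_NETWORK = ipaddress.ip_network("169.254.169.0/24")
INSTANCE_IFACE = ipaddress.ip_interface("169.254.169.1/24")
HOST_ADDRESS = ipaddress.ip_address("169.254.169.254")


def addressing_is_consistent():
    """Check that both ends share the /24 and that it is link-local."""
    return (INSTANCE_IFACE.network == COMM_NETWORK
            and HOST_ADDRESS in COMM_NETWORK
            and INSTANCE_IFACE.ip != HOST_ADDRESS
            and COMM_NETWORK.is_link_local)


def guest_setup_commands(iface):
    """iproute2 commands a Linux guest might run to configure the channel."""
    return [
        "ip addr add %s dev %s" % (INSTANCE_IFACE.with_prefixlen, iface),
        "ip link set dev %s up" % iface,
    ]
```

For example, ``guest_setup_commands("eth1")`` returns ``ip addr add 169.254.169.1/24 dev eth1`` followed by ``ip link set dev eth1 up``.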
  
The way to create this interface depends on the specific hypervisor being used.
In KVM, it is possible to create a network interface inside the instance without
having a corresponding interface created on the host, using a command like::

  kvm -net nic -net \
    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080

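a network interface will be created inside the VM, attached to the
169.254.169.0/24 network, where the guest sees the host at address .253. With
``restrict=on`` the guest is isolated from the host and the outside world except
for explicit forwarding rules: here, TCP connections from the guest to
169.254.169.254:80 are forwarded to port 8080 on the host's loopback interface.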

In Xen, unfortunately, such a capability is not present, and an actual network
interface has to be created on the host (using the ``vif`` parameter of the Xen
configuration file). Each instance will have its corresponding ``vif`` network
interface on the host. These interfaces will not be connected to each other in
any way, and Ganeti will not configure them to allow traffic to be forwarded
beyond the host machine. The ``vif-route`` script of Xen might be helpful in
implementing this. It will be the system administrator's responsibility to
ensure that any extra firewalling and routing rules configured on the host
don't accidentally allow such forwarding.

The instance will be able to connect to 169.254.169.254:80, and issue GET
requests to an HTTP server that will provide the instance metadata.

This IP address and port were chosen for compatibility with the way OpenStack
and Amazon EC2 provide metadata to instances. The metadata will be provided by
a single daemon, which will determine which instance the request comes from and
reply with the metadata specific to that instance.

Where possible, the metadata will be provided in a way compatible with Amazon
EC2, at::

  http://169.254.169.254/<version>/meta-data/*

If some metadata are Ganeti-specific and don't fit this structure, they will be
provided at::

  http://169.254.169.254/ganeti/<version>/meta_data.json

``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
the most recent available protocol version.

If needed in the future, this structure also allows us to support OpenStack's
metadata at::

  http://169.254.169.254/openstack/<version>/meta_data.json

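A client inside the instance only needs to assemble these URLs and validate ``<version>``. The following Python sketch is illustrative: the helper names are hypothetical, the date check simply requires a real calendar date in YYYY-MM-DD form, and ``fetch`` issues a plain GET request with the standard library.

```python
import datetime
import re
import urllib.request
from urllib.parse import quote

# Hypothetical client-side helpers for the metadata URLs described above.
METADATA_ROOT = "http://169.254.169.254"
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_version(version):
    """Accept "latest" or a valid calendar date written as YYYY-MM-DD."""
    if version == "latest":
        return version
    if not _DATE_RE.fullmatch(version):
        raise ValueError("invalid metadata version: %r" % version)
    datetime.datetime.strptime(version, "%Y-%m-%d")  # raises on bad dates
    return version


def ec2_url(version, key=""):
    """EC2-compatible location, e.g. key="hostname"."""
    return "%s/%s/meta-data/%s" % (
        METADATA_ROOT, check_version(version), quote(key))


def ganeti_url(version):
    return "%s/ganeti/%s/meta_data.json" % (
        METADATA_ROOT, check_version(version))


def openstack_url(version):
    return "%s/openstack/%s/meta_data.json" % (
        METADATA_ROOT, check_version(version))


def fetch(url, timeout=5):
    """Issue the GET request from inside the instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```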
  
A bi-directional, pipe-like communication channel will be provided. The instance
will be able to receive data from the host by a GET request at::

  http://169.254.169.254/ganeti/<version>/read

and to send data to the host by a POST request at::

  http://169.254.169.254/ganeti/<version>/write

As in a pipe, once the data are read, they will not be in the buffer anymore, so
subsequent GET requests to ``read`` will not return the same data twice.
Unlike a pipe, though, it will not be possible to perform blocking I/O
operations.

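The read/write semantics can be modelled as two in-memory buffers, one per direction. This Python sketch is hypothetical: the class and method names are illustrative, and returning an empty body when nothing is buffered is an assumption about how a non-blocking ``read`` could behave, since the design does not specify it.

```python
import threading


class PipeBuffer:
    """One direction of the channel: writes append, reads drain.

    Hypothetical model; reads never block and return b"" when empty.
    """

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()  # the HTTP server may be multi-threaded

    def write(self, chunk):
        with self._lock:
            self._data.extend(chunk)

    def read(self, limit=None):
        """Remove and return up to `limit` bytes (all buffered data if None)."""
        with self._lock:
            size = len(self._data) if limit is None else min(limit, len(self._data))
            chunk = bytes(self._data[:size])
            del self._data[:size]
            return chunk


class Channel:
    """Per-instance channel, as seen by a hypothetical metadata daemon."""

    def __init__(self):
        self.to_instance = PipeBuffer()  # filled by the host, drained by GET read
        self.to_host = PipeBuffer()      # filled by POST write, drained by the host

    def handle_get_read(self):
        return self.to_instance.read()

    def handle_post_write(self, body):
        self.to_host.write(body)
```

Because ``read`` removes what it returns, a second GET immediately after the first yields an empty body, which matches the "not the same data twice" rule.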
  
The OS parameters will be accessible through a GET request at::

  http://169.254.169.254/ganeti/<version>/os/parameters.json

as a JSON-serialized dictionary having the parameter name as the key, and the
pair ``(<value>, <visibility>)`` as the value (encoded as a two-element JSON
array), where ``<value>`` is the user-provided value of the parameter, and
``<visibility>`` is either ``public``, ``private`` or ``secret``.

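The following Python sketch shows the shape of ``parameters.json`` together with the filtering implied by the three visibility categories: every parameter reaches the instance, secret parameters are never persisted, and only public ones may be logged. The helper names are hypothetical, and the sketch does not reproduce how Ganeti actually stores parameters.

```python
import json

# Hypothetical helpers; visibility names are the ones used in the design.
VISIBILITIES = ("public", "private", "secret")


def parameters_json(params):
    """Serialize {name: (value, visibility)} for os/parameters.json.

    The instance receives every parameter, whatever its visibility.
    JSON has no tuples, so each pair becomes a two-element array.
    """
    for name, (_value, visibility) in params.items():
        if visibility not in VISIBILITIES:
            raise ValueError("invalid visibility for %s: %r" % (name, visibility))
    return json.dumps(params, sort_keys=True)


def persistable(params):
    """Parameters that may be saved in the Ganeti configuration."""
    return {name: p for name, p in params.items() if p[1] != "secret"}


def loggable(params):
    """Parameters that may appear in logs, job files and RAPI replies."""
    return {name: p for name, p in params.items() if p[1] == "public"}
```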
  
The installation scripts to be run inside the virtualized environment while the
instance is run in ``install`` mode will be available at::

  http://169.254.169.254/ganeti/<version>/os/scripts/<script_name>

where ``<script_name>`` is the name of the script.

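Inside the appliance, a small agent could fetch ``parameters.json`` and the scripts, and expose the parameters to the scripts as environment variables. In this hypothetical Python sketch, the base URL assumes the ``/ganeti/<version>/`` prefix used by the other Ganeti-specific URLs, with ``latest`` as the version, and the ``OSP_`` prefix with upper-cased names is an assumption modelled on the variables that the current OS scripts receive.

```python
import json
import re

# Assumed base URL: the /ganeti/<version>/ prefix of the other Ganeti URLs.
BASE_URL = "http://169.254.169.254/ganeti/latest"
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def script_url(script_name, base=BASE_URL):
    """URL of an installation script such as "create"."""
    if not script_name or "/" in script_name:
        raise ValueError("invalid script name: %r" % script_name)
    return "%s/os/scripts/%s" % (base, script_name)


def parameters_to_env(parameters_json, base_env=None):
    """Turn parameters.json content into environment variables for scripts.

    The OSP_ prefix and the name mangling are assumptions; base_env carries
    the fixed, non-user parameters (instance name, disks, and so on).
    """
    env = dict(base_env or {})
    for name, (value, _visibility) in json.loads(parameters_json).items():
        env["OSP_" + _UNSAFE.sub("_", name).upper()] = str(value)
    return env
```

The resulting dictionary could then be passed as the ``env`` argument of ``subprocess.run`` when launching the downloaded script.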
  
Rationale
---------

The choice of using a network interface for instance-host communication, as
opposed to VirtIO, XenBus or other methods, is due to the desire for a generic,
hypervisor-independent way of creating a communication channel that doesn't
require unusual (para)virtualization drivers.
At the same time, a network interface was preferred over solutions involving
virtual floppy or USB devices because the latter tend to be detected and
configured by the guest operating systems, sometimes even in prominent positions
in the user interface, whereas it is fairly common to have an unconfigured
network interface in a system, usually without any negative side effects.

Installation process in a virtualized environment
+++++++++++++++++++++++++++++++++++++++++++++++++

In the new OS installation scenario, we distinguish between trusted and
untrusted code.

The trusted installation code maintains the behavior of the current one and
requires no modifications, with the scripts running on the node the instance is
being created on. The untrusted code is stored in a subdirectory of the OS
definition called ``untrusted``. This directory contains scripts that are
equivalent to the already existing ones (``create``, ``export``, ``import``,
``rename``) but that will be run inside a virtualized environment, to protect
the host from malicious tampering.

The ``untrusted`` code is meant to be either untrusted itself, or trusted code
performing operations that might be dangerous (such as mounting a user-provided
image).

By default, all new OS definitions will have to be explicitly marked as trusted
by the cluster administrator (with a new ``gnt-os modify`` command) before they
can run code on the host. Otherwise, only the untrusted part of the code will be
allowed to run, inside the virtual appliance. For backwards compatibility
reasons, when upgrading an existing cluster, all the installed OSes will be
marked as trusted, so that they can keep running with no changes.

In order to allow for the highest flexibility, if both a trusted and an
untrusted script are provided for the same operation (e.g. ``create``), both of
them will be executed at the same time, one on the host, and one inside the
installation appliance. They will be allowed to communicate with each other
through the already described communication mechanism, in order to orchestrate
their execution (e.g., the untrusted code might execute the installation, while
the trusted one receives status updates from it and delivers them to a user
interface).

The cluster administrator will have an option to completely disable scripts
running on the host, leaving only the ones running in the VM.

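The rules above decide, for each operation, where its scripts may run. This Python sketch is a hypothetical model, not Ganeti code: ``OsDefinition`` and ``plan_operation`` are illustrative names, and an empty result stands for an operation the OS definition cannot perform under the current settings.

```python
from dataclasses import dataclass, field

# Hypothetical model of the trust rules; not Ganeti's actual data structures.


@dataclass
class OsDefinition:
    name: str
    trusted: bool = False  # set explicitly by the cluster administrator
    host_scripts: frozenset = field(default_factory=frozenset)
    untrusted_scripts: frozenset = field(default_factory=frozenset)


def plan_operation(os_def, operation, host_scripts_disabled=False):
    """Return where the scripts for `operation` run: "host", "vm", both or none.

    Host scripts need a trusted OS and must not be disabled cluster-wide;
    untrusted scripts always run inside the installation appliance. When
    both are returned, they run concurrently and talk over the channel.
    """
    places = set()
    if (operation in os_def.host_scripts and os_def.trusted
            and not host_scripts_disabled):
        places.add("host")
    if operation in os_def.untrusted_scripts:
        places.add("vm")
    return places
```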
  
Ganeti will provide a script to be run at install time that can be used to
create the virtualized environment that will perform the OS installation of new
instances. This script will build a basic Debian system with debootstrap,
including a program that will read the metadata, set up the environment
variables and launch the installation scripts inside the virtualized
environment. The script will also provide hooks for personalization.

It will also be possible to use other custom-built virtualized environments, as
long as they connect to Ganeti over the described communication mechanism and
they know how to read and use the provided metadata to create a new instance.

While performing an installation in the virtualized environment, a
configurable timeout will be used to detect possible problems with the
installation process, and to kill the virtualized environment. The timeout will
be optional and set cluster-wide by the administrator. If set, it will be the
total time allowed to set up an instance inside the appliance. It is mainly
meant as a safety measure to prevent an instance taken over by malicious scripts
from remaining available for a long time.

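A minimal way to enforce such a timeout is to run the process hosting the appliance under a deadline. In this Python sketch the command is a placeholder for whatever starts the appliance (an assumption, since the design does not specify it), and ``None`` models the unset, optional timeout. ``subprocess.run`` kills and reaps the child before re-raising ``TimeoutExpired``.

```python
import subprocess


def run_with_deadline(cmd, timeout=None):
    """Run the appliance command, killing it after `timeout` seconds.

    `cmd` is a placeholder for whatever starts the appliance; timeout=None
    means the optional cluster-wide limit is not set. Returns the exit
    status, or None if the deadline expired and the process was killed.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # run() has already killed the child and waited for it here.
        return None
```

This only covers an appliance that runs as a child process, as with KVM; a Xen domain outlives the command that creates it, so there the expiry would instead have to destroy the domain.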
  
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
