=================
Ganeti 2.1 design
=================

This document describes the major changes in Ganeti 2.1 compared to
the 2.0 version.

The 2.1 version will be a relatively small release. Its main aim is to avoid
changing too much of the core code, while addressing issues and adding new
features and improvements over 2.0, in a timely fashion.

.. contents:: :depth: 3

Objective
=========

Ganeti 2.1 will add features to help further automate cluster operations,
further improve scalability to even bigger clusters, and make it easier to
debug the Ganeti core.

Background
==========

Overview
========

Detailed design
===============

As for 2.0, we divide the 2.1 design into three areas:

- core changes, which affect the master daemon/job queue/locking or all/most
  logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)

Core changes
------------

Feature changes
---------------

Redistribute Config
~~~~~~~~~~~~~~~~~~~

Current State and shortcomings
++++++++++++++++++++++++++++++

Currently LURedistributeConfig triggers a copy of the updated configuration
file to all master candidates and of the ssconf files to all nodes. There are
other files which are maintained manually but which are important to keep in
sync. These are:

- RAPI SSL key/certificate file (``rapi.pem``) (on master candidates)
- RAPI user/password file (``rapi_users``) (on master candidates)

Furthermore, there are some hypervisor-specific files which we may also want
to keep in sync:

- the xen-hvm hypervisor uses one shared file for all VNC passwords, and copies
  the file only once, during node add. This design is subject to revision, so
  that different groups of instances can have different passwords via the use
  of hypervisor parameters, and so that xen-hvm and kvm use the same mechanism
  to provide password-protected VNC sessions. In general, though, it would be
  useful if the VNC password files were copied as well, to avoid unwanted VNC
  password changes on instance failover/migrate.

Optionally, the admin may also want to ship files such as the global
``xend.conf`` file and the network scripts to all nodes.

Proposed changes
++++++++++++++++

RedistributeConfig will be changed to also copy the RAPI files, and to ask
every enabled hypervisor for a list of additional files to copy. We may also
want to add a global list of files on the cluster object, which will be
propagated as well, or a hook to calculate them. If we implement this feature,
there should be a way to specify whether a file must be shipped to all nodes or
just to master candidates.

This code will also be shared (via tasklets or by other means, if tasklets are
not ready for 2.1) with the AddNode and SetNodeParams LUs, so that the relevant
files will be automatically shipped to new master candidates as they are set.
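
A minimal sketch of how RedistributeConfig could assemble its file lists,
assuming a hypothetical ``get_ancillary_files()`` hook on each hypervisor class
and a hypothetical ``FileSpec`` record carrying the destination (all nodes or
master candidates only). The configuration file and ssconf handling are left
out, and the file names are illustrative, not Ganeti's actual paths.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical destination markers.
ALL_NODES = "all-nodes"
MASTER_CANDIDATES = "master-candidates"


@dataclass(frozen=True)
class FileSpec:
    """A file to redistribute and the set of nodes it must reach."""
    path: str
    target: str  # ALL_NODES or MASTER_CANDIDATES


class Hypervisor:
    """Base class: by default a hypervisor asks for no extra files."""

    def get_ancillary_files(self) -> List[FileSpec]:
        return []


class XenHvmHypervisor(Hypervisor):
    """Example hypervisor publishing its VNC password files."""

    def __init__(self, vnc_password_files: Iterable[str]) -> None:
        self.vnc_password_files = list(vnc_password_files)

    def get_ancillary_files(self) -> List[FileSpec]:
        return [FileSpec(p, ALL_NODES) for p in self.vnc_password_files]


# Files that would always be shipped (names illustrative).
BASE_FILES = [
    FileSpec("rapi.pem", MASTER_CANDIDATES),
    FileSpec("rapi_users", MASTER_CANDIDATES),
]


def compute_file_lists(enabled_hypervisors: Iterable[Hypervisor],
                       cluster_files: Iterable[FileSpec]) -> Dict[str, Set[str]]:
    """Group every file to redistribute by its destination set."""
    specs = list(BASE_FILES)
    for hv in enabled_hypervisors:
        specs.extend(hv.get_ancillary_files())
    specs.extend(cluster_files)  # optional admin-defined global list

    result: Dict[str, Set[str]] = {ALL_NODES: set(), MASTER_CANDIDATES: set()}
    for spec in specs:
        result[spec.target].add(spec.path)
    # Master candidates are nodes too, so they also get the all-nodes files.
    result[MASTER_CANDIDATES] |= result[ALL_NODES]
    return result
```

Keeping the destination on each ``FileSpec`` means AddNode and SetNodeParams
could reuse the same function and simply pick the set matching the node's role.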

VNC Console Password
~~~~~~~~~~~~~~~~~~~~

Current State and shortcomings
++++++++++++++++++++++++++++++

Currently only the xen-hvm hypervisor supports setting a password to connect
to the instances' VNC console, and it has one common password stored in a file.

This doesn't allow different passwords for different instances/groups of
instances, and makes it necessary to remember to copy the file around the
cluster when the password changes.

Proposed changes
++++++++++++++++

We'll turn the VNC password file into a ``vnc_password_file`` hypervisor
parameter. This way it can have a cluster default, but also a different value
for each instance. The VNC-enabled hypervisors (xen-hvm and kvm) will publish
all the password files in use to the cluster, so that a redistribute-config
will ship them to all nodes (see the Redistribute Config proposed changes
above).

The current ``VNC_PASSWORD_FILE`` constant will be removed, but its value will
be used as the default ``HV_VNC_PASSWORD_FILE`` value, thus retaining backwards
compatibility with 2.0.

The code to export the list of VNC password files from the hypervisors to
RedistributeConfig will be shared between the KVM and xen-hvm hypervisors.
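
The shared helper could look roughly like the following sketch: instance
values override the cluster default, and the set of distinct files in use is
what gets published to RedistributeConfig. The helper names and the default
path are assumptions for illustration; only the parameter name comes from this
design.

```python
from typing import Dict, List

HV_VNC_PASSWORD_FILE = "vnc_password_file"
# Illustrative cluster default, standing in for the old constant's value.
DEFAULT_VNC_PASSWORD_FILE = "/etc/ganeti/vnc-cluster-password"


def fill_hvparams(cluster_defaults: Dict[str, str],
                  instance_overrides: Dict[str, str]) -> Dict[str, str]:
    """Instance values override the cluster defaults."""
    filled = dict(cluster_defaults)
    filled.update(instance_overrides)
    return filled


def vnc_password_files_in_use(cluster_defaults: Dict[str, str],
                              instances: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical shared helper for all VNC-enabled hypervisors.

    The cluster default is always included, so that instances created later
    with default parameters find the file already in place on every node.
    """
    files = {cluster_defaults[HV_VNC_PASSWORD_FILE]}
    for overrides in instances.values():
        files.add(fill_hvparams(cluster_defaults, overrides)[HV_VNC_PASSWORD_FILE])
    return sorted(files)
```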

Disk/Net parameters
~~~~~~~~~~~~~~~~~~~

Current State and shortcomings
++++++++++++++++++++++++++++++

Currently disks and network interfaces have a few tweakable options, and all
the rest is left to a default we chose. We increasingly find that we need to
tweak some of these parameters, for example to disable barriers for DRBD
devices, or to allow striping for the LVM volumes.

Moreover, for many of these parameters it would be nice to have cluster-wide
defaults, and then be able to change them per disk/interface.

Proposed changes
++++++++++++++++

We will add new cluster-level diskparams and netparams, which will contain all
the tweakable parameters. All values which have a sensible cluster-wide default
will go into this new structure, while parameters whose values are unique to
each disk/interface will not.

Example of network parameters:
  - mode: bridge/route
  - link: for mode "bridge", the bridge to connect to; for mode "route", it can
    contain the routing table or the destination interface

Example of disk parameters:
  - stripe: number of LVM stripes
  - stripe_size: LVM stripe size
  - meta_flushes: DRBD, enable/disable metadata "barriers"
  - data_flushes: DRBD, enable/disable data "barriers"

Some parameters are bound to be disk-type specific (DRBD vs. LVM vs. files) or
hypervisor specific (NIC models, for example), but for now they will all live in
the same structure. Each component is supposed to validate only the parameters
it knows about, and Ganeti itself will make sure that no "globally unknown"
parameters are added, and that no parameter has different meanings for
different components.
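
One way to picture the split between component-level and Ganeti-level checks
is a registry in which each component declares the parameters it understands.
The registry layout and function names below are hypothetical, and comparing
declared types is only a rough approximation of "no parameter has different
meanings for different components".

```python
from typing import Dict, Mapping

# Hypothetical registry: component -> {parameter name: expected type}.
COMPONENT_PARAMS: Dict[str, Dict[str, type]] = {
    "lvm": {"stripe": int, "stripe_size": int},
    "drbd": {"meta_flushes": bool, "data_flushes": bool},
}


def check_registry(registry: Mapping[str, Mapping[str, type]]) -> None:
    """Refuse a name declared by two components with different types."""
    seen: Dict[str, tuple] = {}
    for component, params in registry.items():
        for name, ptype in params.items():
            if name in seen and seen[name][1] is not ptype:
                raise ValueError("parameter %r defined differently by %s and %s"
                                 % (name, seen[name][0], component))
            seen.setdefault(name, (component, ptype))


def check_globally_known(params, registry) -> None:
    """Ganeti-level check: every key must be known to some component."""
    known = set()
    for component_params in registry.values():
        known.update(component_params)
    unknown = sorted(set(params) - known)
    if unknown:
        raise ValueError("globally unknown parameters: %s" % ", ".join(unknown))


def component_view(component, params, registry):
    """The subset of parameters a single component is expected to validate."""
    wanted = registry[component]
    return {name: value for name, value in params.items() if name in wanted}
```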

As for the BEPARAMS, the parameters will be kept in a "default" category,
which will allow us to expand on this by creating instance "classes" in the
future. Instance classes are not a feature we plan to implement in 2.1, though.
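
A sketch of the resulting layout, assuming the cluster stores each parameter
set as a dictionary keyed by category, with only ``default`` populated in 2.1.
The parameter values (including the ``br0`` bridge name) are illustrative,
and ``fill_params`` is a hypothetical helper name.

```python
import copy
from typing import Any, Dict

# Hypothetical cluster-level structures; future instance classes would add
# more keys next to "default".
CLUSTER_DISKPARAMS: Dict[str, Dict[str, Any]] = {
    "default": {"stripe": 1, "meta_flushes": True, "data_flushes": True},
}
CLUSTER_NETPARAMS: Dict[str, Dict[str, Any]] = {
    "default": {"mode": "bridge", "link": "br0"},
}


def fill_params(cluster_params: Dict[str, Dict[str, Any]],
                overrides: Dict[str, Any],
                category: str = "default") -> Dict[str, Any]:
    """Per-disk/per-NIC values take precedence over the category defaults."""
    filled = copy.deepcopy(cluster_params[category])  # never alter the defaults
    filled.update(overrides)
    return filled
```

Copying before merging keeps the cluster defaults untouched, so changing a
default later affects every disk or NIC that does not override it.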

Non-bridged instances support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current State and shortcomings
++++++++++++++++++++++++++++++

Currently each instance NIC must be connected to a bridge, and if the bridge is
not specified the default cluster one is used. This makes it impossible to use
the Xen vif-route network scripts, or other alternative mechanisms that don't
need a bridge to work.

Proposed changes
++++++++++++++++

The new "mode" network parameter will distinguish between bridged interfaces
and routed ones.

When mode is "bridge", the "link" parameter will contain the bridge the
instance should be connected to, effectively keeping the current behaviour. The
value has been moved from a NIC field to a parameter to allow easier
manipulation of the cluster default.

When mode is "route", the ip field of the interface will become mandatory, to
allow for a route to be set. In the future we may also want to accept multiple
IPs or IP/mask values for this purpose. We will evaluate possible meanings of
the link parameter to signify a routing table to be used, which would allow for
isolation between instance groups (as happens today with different bridges).
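
The per-mode rules above could be checked along these lines. This is a sketch
under assumptions: the NIC is modelled as a plain dictionary with an ``ip``
field and a ``params`` override dictionary, and ``check_nic`` is a hypothetical
name; only the ``mode``/``link`` parameters and their values come from this
design.

```python
import ipaddress
from typing import Any, Dict

NIC_MODE_BRIDGED = "bridge"
NIC_MODE_ROUTED = "route"


def check_nic(nic: Dict[str, Any], cluster_netparams: Dict[str, str]) -> Dict[str, str]:
    """Validate one NIC and return its filled-in network parameters."""
    params = dict(cluster_netparams)
    params.update(nic.get("params", {}))
    mode = params.get("mode")
    if mode == NIC_MODE_BRIDGED:
        # "link" names the bridge; it usually comes from the cluster default.
        if not params.get("link"):
            raise ValueError("bridged NIC needs a bridge in 'link'")
    elif mode == NIC_MODE_ROUTED:
        # Without an IP address there is nothing to route to.
        if not nic.get("ip"):
            raise ValueError("routed NIC requires an IP address")
        ipaddress.ip_address(nic["ip"])  # raises ValueError if malformed
    else:
        raise ValueError("unknown NIC mode %r" % (mode,))
    return params
```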

For now we won't add a parameter to specify which network script gets called
for which instance, so in a mixed cluster the network script must be able to
handle both cases. The default KVM vif script will be changed to do so. (Xen
doesn't have a Ganeti-provided script, so nothing will be done for that
hypervisor.)
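
The dispatch a mode-aware vif script needs can be sketched as below. The real
KVM vif script is a shell script; this Python version only returns the
commands instead of running them. The environment variable names
(``INTERFACE``, ``MODE``, ``LINK``, ``IP``), defaulting to bridged mode,
treating ``link`` as a routing table, and the IPv4 ``/32`` host route are all
assumptions here.

```python
from typing import Dict, List


def vif_commands(env: Dict[str, str]) -> List[List[str]]:
    """Commands a mode-aware vif script might run for one interface."""
    iface = env["INTERFACE"]
    cmds = [["ip", "link", "set", iface, "up"]]
    mode = env.get("MODE", "bridge")  # assumed default: today's behaviour
    if mode == "bridge":
        # "link" is the bridge to attach the tap device to.
        cmds.append(["brctl", "addif", env["LINK"], iface])
    elif mode == "route":
        # Host route to the instance IP; "link", if set, selects a table.
        route = ["ip", "route", "replace", env["IP"] + "/32", "dev", iface]
        if env.get("LINK"):
            route += ["table", env["LINK"]]
        cmds.append(route)
    else:
        raise ValueError("unknown NIC mode %r" % (mode,))
    return cmds
```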

External interface changes
--------------------------

OS API
~~~~~~

The OS API of Ganeti 2.0 has been built with extensibility in mind. Since we
pass everything as environment variables, it's a lot easier to send new
information to the OSes without breaking backwards compatibility. This section
of the design outlines the proposed extensions to the API and their
implementation.

API Version Compatibility Handling
++++++++++++++++++++++++++++++++++

In 2.1 there will be a new OS API version (e.g. 15), which should be mostly
compatible with API 10, except for some newly added variables. Since it's easy
not to pass some variables, we'll be able to handle Ganeti 2.0 OSes by just
filtering out the newly added pieces of information. We will still encourage
OSes to declare support for the new API after checking that the new variables
don't cause any conflict for them, and we will drop API 10 support after
Ganeti 2.1 has been released.
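
The filtering step could be as simple as the following sketch. The version
number 15 is the design's own example; ``INSTANCE_HYPERVISOR`` and the
``INSTANCE_OS_PARAMETER_`` prefix come from this design, while the
``INSTANCE_HV_`` prefix for hypervisor parameters and the function name are
hypothetical.

```python
from typing import Dict, Iterable

OS_API_V15 = 15  # "e.g. 15": the actual number is not fixed by this design

# Variables added with the new API.
NEW_VARIABLES = frozenset(["INSTANCE_HYPERVISOR"])
NEW_PREFIXES = ("INSTANCE_HV_", "INSTANCE_OS_PARAMETER_")


def os_environment(full_env: Dict[str, str],
                   declared_versions: Iterable[int]) -> Dict[str, str]:
    """Return the environment to pass to an OS declaring these API versions."""
    if max(declared_versions) >= OS_API_V15:
        return dict(full_env)
    # API 10 (Ganeti 2.0) OS: drop everything it does not know about.
    return {name: value for name, value in full_env.items()
            if name not in NEW_VARIABLES and not name.startswith(NEW_PREFIXES)}
```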

New Environment variables
+++++++++++++++++++++++++

Some variables have never been added to the OS API but would definitely be
useful for the OSes. We plan to add an INSTANCE_HYPERVISOR variable to allow
the OS to make changes relevant to the virtualization the instance is going to
use. Since this field is immutable for each instance, the OS can tailor the
install to it without having to make sure the instance can run under any
virtualization technology.

We also want the OS to know the particular hypervisor parameters, to be able to
customize the install even more. Since the parameters can change, though, we
will pass them only as an "FYI": if an OS ties some instance functionality to
the value of a particular hypervisor parameter, manual changes or a reinstall
may be needed to adapt the instance to the new environment. This is not a
regression compared to today, because even when the OSes are blind to this
information, they sometimes still need to make compromises and cannot satisfy
all possible parameter values.

OS Parameters
+++++++++++++

Currently we are witnessing some degree of "OS proliferation" just to change
a simple installation behavior. This means that the same OS gets installed on
the cluster multiple times, with different names, to customize just one
installation behavior. Usually such OSes try to share as much as possible
through symlinks, but this still causes complications on the user side,
especially when multiple parameters must be cross-matched.

For example, today if you want to install Debian etch, lenny or squeeze you
probably need to install the debootstrap OS multiple times, changing its
configuration file, and calling it debootstrap-etch, debootstrap-lenny or
debootstrap-squeeze. Furthermore, if you have for example a "server" and a
"development" environment which install different packages/configuration files
and must be available for all installs, you'll probably end up with
debootstrap-etch-server, debootstrap-etch-dev, debootstrap-lenny-server,
debootstrap-lenny-dev, etc. Crossing more than two parameters quickly becomes
unmanageable.

In order to avoid this, we plan to make OSes more customizable, by allowing
arbitrary flags to be passed to them. These will be special "OS parameters"
which will be handled by Ganeti mostly like hypervisor or backend parameters.
This slightly complicates the interface, but allows one OS (for example
"debootstrap") to be customizable without requiring copies to perform
different actions.

Each OS will be able to declare which parameters it supports by listing them
one per line in a special "parameters" file in the OS dir. The parameters can
have a per-OS cluster default, or be specified at instance creation time. They
will then be passed to the OS scripts as INSTANCE_OS_PARAMETER_<NAME> with
their specified value. The only value checking that will be performed is that
the OS parameter value is a string, with only "normal" characters in it.

It will be impossible to change parameters for an instance, except at reinstall
time. Upon reinstall with a different OS, the parameters will by default be
discarded and reset to the default (or passed) values, unless a special
``--keep-known-os-parameters`` flag is passed.
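
The reinstall rule can be summarized by this sketch. The design only covers
reinstalling with a different OS; keeping the old values when the OS stays the
same is an assumption here, as is the function name. Validation of the passed
values is left to the checks shown for instance creation.

```python
from typing import Dict, Iterable


def reinstall_os_params(old_os: str, old_params: Dict[str, str],
                        new_os: str, new_supported: Iterable[str],
                        new_defaults: Dict[str, str], passed: Dict[str, str],
                        keep_known: bool = False) -> Dict[str, str]:
    supported = set(new_supported)
    # Start from the new OS's cluster defaults for the parameters it supports.
    result = {k: v for k, v in new_defaults.items() if k in supported}
    # Same OS (assumed), or --keep-known-os-parameters: carry over the old
    # values that the new OS still understands.
    if new_os == old_os or keep_known:
        result.update((k, v) for k, v in old_params.items() if k in supported)
    # Explicitly passed values always win.
    result.update(passed)
    return result
```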

IAllocator changes
~~~~~~~~~~~~~~~~~~

Current State and shortcomings
++++++++++++++++++++++++++++++

The iallocator interface allows creation of instances without manually
specifying nodes; instead, one specifies a plugin which will do the required
computations and produce a valid node list.

However, the interface is quite awkward to use:

- one cannot set a 'default' iallocator script
- one cannot use it to easily test if allocation would succeed
- some new functionality, such as rebalancing clusters and calculating
  capacity estimates, is needed

Proposed changes
++++++++++++++++

Two areas of improvement are proposed:

- improving the use of the current interface
- extending the IAllocator API to cover more automation

Default iallocator names
^^^^^^^^^^^^^^^^^^^^^^^^

The cluster will hold, for each type of iallocator, a (possibly empty)
list of modules that will be used automatically.

If the list is empty, the behaviour will remain the same.

If the list has one entry, then Ganeti will behave as if
'--iallocator' was specified on the command line, i.e. it will use this
allocator by default. If the user passed nodes, however, those will be
used in preference.

If the list has multiple entries, they will be tried in order until
one gives a successful answer.
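
The selection rules above can be sketched as follows. ``run_allocator`` stands
for whatever actually invokes a plugin and is a hypothetical callback, as is
the ``AllocationError`` exception; letting an explicit ``--iallocator`` on the
command line override the cluster list is an assumption. A single-entry list
naturally behaves like a default ``--iallocator``.

```python
from typing import Callable, Dict, List, Optional, Sequence


class AllocationError(Exception):
    """Raised when an iallocator cannot find a placement."""


def select_nodes(request: Dict,
                 explicit_nodes: Optional[Sequence[str]],
                 cli_allocator: Optional[str],
                 default_allocators: Sequence[str],
                 run_allocator: Callable[[str, Dict], List[str]]) -> List[str]:
    # Nodes given by the user are always used in preference.
    if explicit_nodes:
        return list(explicit_nodes)
    # An explicit --iallocator (assumed) overrides the cluster list.
    candidates = [cli_allocator] if cli_allocator else list(default_allocators)
    if not candidates:
        # Empty list: same as today, the user must give nodes or an allocator.
        raise AllocationError("no nodes given and no iallocator configured")
    errors = []
    for name in candidates:  # tried in order until one succeeds
        try:
            return run_allocator(name, request)
        except AllocationError as err:
            errors.append("%s: %s" % (name, err))
    raise AllocationError("all iallocators failed: " + "; ".join(errors))
```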

Dry-run allocation
^^^^^^^^^^^^^^^^^^

The create instance LU will get a new 'dry-run' option that will just
simulate the placement and return the chosen node lists after running
all the usual checks.

Cluster balancing
^^^^^^^^^^^^^^^^^

Instance additions, removals and moves can lead to a situation where the
load on the nodes is not spread equally. For this, a new iallocator mode
called ``balance`` will be implemented, in which the plugin, given the
current cluster state and a maximum number of operations, will need to
compute the instance relocations needed in order to achieve a "better"
cluster (by whatever measure the script considers better).
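
Since the measure of "better" is left to the plugin, the following is only one
possible heuristic, not the one any real plugin uses: a greedy hill-climb that
repeatedly applies the single instance move which most reduces the spread
(population standard deviation) of per-node memory usage, stopping after
``max_ops`` moves or when no move helps. The input layout and function name are
assumptions, and secondary (DRBD) nodes are ignored.

```python
from statistics import pstdev
from typing import Dict, List, Tuple


def balance(node_memory: Dict[str, int],
            instances: Dict[str, Tuple[str, int]],
            max_ops: int) -> List[Tuple[str, str, str]]:
    """node_memory: node -> total memory; instances: name -> (node, memory).

    Returns at most max_ops moves as (instance, source node, target node).
    """
    placement = {name: node for name, (node, _) in instances.items()}
    mem = {name: size for name, (_, size) in instances.items()}

    def used_memory() -> Dict[str, int]:
        used = dict.fromkeys(node_memory, 0)
        for inst, node in placement.items():
            used[node] += mem[inst]
        return used

    def score(used: Dict[str, int]) -> float:
        # Lower is better: spread of the per-node memory usage ratios.
        return pstdev(used[n] / node_memory[n] for n in node_memory)

    moves = []
    for _ in range(max_ops):
        used = used_memory()
        current = score(used)
        best = None
        for inst, src in placement.items():
            for dst in node_memory:
                if dst == src or used[dst] + mem[inst] > node_memory[dst]:
                    continue  # same node, or target lacks free memory
                used[src] -= mem[inst]  # try the move...
                used[dst] += mem[inst]
                candidate = score(used)
                used[src] += mem[inst]  # ...and undo it
                used[dst] -= mem[inst]
                if candidate < current and (best is None or candidate < best[0]):
                    best = (candidate, inst, src, dst)
        if best is None:
            break  # no single move improves the score any more
        _, inst, src, dst = best
        placement[inst] = dst
        moves.append((inst, src, dst))
    return moves
```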

Cluster capacity calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this mode, called ``capacity``, given an instance specification and
the current cluster state (similar to the ``allocate`` mode), the
plugin needs to return:

- how many instances can be allocated on the cluster with that specification
- on which nodes these will be allocated (in order)
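
A greedy sketch of what a ``capacity`` plugin might compute, assuming each
node's free resources and the instance specification are given as dictionaries
with the same keys (here ``mem`` and ``disk``), and that instances need only a
primary node. Each step places one instance on the fitting node with the most
free memory; ties go to the alphabetically first node. The function name and
input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def capacity(free: Dict[str, Dict[str, int]],
             spec: Dict[str, int]) -> Tuple[int, List[str]]:
    """Return (instance count, nodes in allocation order)."""
    if not spec or any(amount <= 0 for amount in spec.values()):
        raise ValueError("instance specification must be non-empty and positive")
    free = {node: dict(res) for node, res in free.items()}  # don't mutate input
    order: List[str] = []
    while True:
        fitting = [n for n, res in free.items()
                   if all(res.get(k, 0) >= amount for k, amount in spec.items())]
        if not fitting:
            return len(order), order
        # Most free memory first; sorting makes ties resolve by node name.
        node = max(sorted(fitting), key=lambda n: free[n].get("mem", 0))
        for k, amount in spec.items():
            free[node][k] -= amount
        order.append(node)
```

For example, with 4096 and 2048 units of free memory on ``n1`` and ``n2`` and
instances needing 1024, the sketch places six instances, alternating towards
whichever node has more memory left.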