===================================
Design for virtual clusters support
===================================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run using mocks as a normal user and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-to-one
  correspondence between machines and nodes

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could be double that time

The unittests have clear advantages: they are quick to run and do not
require many machines. On the other hand, QA is actually able to run
end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
be able to test most, if not all, of Ganeti's functionality but without
requiring actual hardware, full machine ownership or root access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as a master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which results in a
limit of at most one node daemon per machine, unless we use multiple
name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management. All tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested). Both the ‘text’ backend for hbal/hspace and the input files for
hail are text-based and loaded from the filesystem.

Proposed changes
================

The end-goal is to have full support for “virtual clusters”, i.e. be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a single, reasonably powerful
machine, under a single user account and without any special
privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able to
  test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least not
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts that just ``dd`` without    |
|               |                       |mounting filesystems                |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master ip), IP query, |work without a master IP; for the IP|
|               |etc.                   |query operations the test machine   |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node add       |-                      |SSH command must be adjusted        |
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, so on |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call to privileged commands, it     |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is to simply give the (single) machine used for
tests as many IPs as there are nodes, with each IP corresponding to a
different name, so that no changes are needed to the core RPC library.
Unfortunately this has the downside of requiring root rights for setting
up the extra IPs and hostnames.
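
As an illustration of the one-IP-per-name scheme, the following minimal
sketch generates one address and one hostname per virtual node and
renders them as ``/etc/hosts``-style lines. The network range, the
hostname pattern and both function names are assumptions made for this
example; they are not part of Ganeti.

```python
import ipaddress


def virtual_node_addresses(count, network="192.0.2.0/24",
                           name_pattern="node%d.example.com"):
    """Return a list of (ip, hostname) pairs, one per virtual node.

    The default network and name pattern are hypothetical example values.
    """
    hosts = ipaddress.ip_network(network).hosts()
    pairs = []
    # Pair node indices 1..count with the usable host addresses in order.
    for idx, ip in zip(range(1, count + 1), hosts):
        pairs.append((str(ip), name_pattern % idx))
    if len(pairs) < count:
        raise ValueError("network %s is too small for %d nodes" %
                         (network, count))
    return pairs


def hosts_file_lines(pairs):
    """Format (ip, hostname) pairs as lines suitable for /etc/hosts."""
    return ["%s\t%s" % (ip, name) for (ip, name) in pairs]


if __name__ == "__main__":
    for line in hosts_file_lines(virtual_node_addresses(3)):
        print(line)
```

Actually assigning the generated addresses to an interface and writing
the hosts file is the step that needs root rights; the sketch only
computes the data.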

An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would eliminate the need for root
rights. We expect that this will get implemented as a second step of
this design, but as the port is currently static, it will require
changes in many places.

The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
introduce an environment variable (``GANETI_ROOTDIR``) acting as a
prefix for essentially all paths. An environment variable is easier to
transport through several levels of programs (shell scripts, Python,
etc.) than a command line parameter. In Python code this prefix will be
applied to all paths in ``constants.py``. Every virtual node will get
its own root directory. The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios not existent in real life; currently
  noded either owns the entire ``/var/lib/ganeti`` directory or shares
  it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others are
  “lost”
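
A minimal sketch of how such a prefix could be applied to an absolute
path is shown here. ``GANETI_ROOTDIR`` is the variable proposed in this
section; the helper name ``add_root_prefix`` and the example paths are
hypothetical and only illustrate the idea.

```python
import os


def add_root_prefix(path, environ=None):
    """Prefix an absolute path with GANETI_ROOTDIR, if that is set.

    Without the variable (the normal, non-virtual case) the path is
    returned unchanged.
    """
    if environ is None:
        environ = os.environ
    root = environ.get("GANETI_ROOTDIR", "")
    if not root:
        return path
    if not os.path.isabs(root):
        raise ValueError("GANETI_ROOTDIR must be an absolute path")
    if not os.path.isabs(path):
        raise ValueError("only absolute paths can be prefixed")
    # os.path.join would discard the root if the second argument were
    # absolute, hence the leading separator is stripped first.
    return os.path.join(root, path.lstrip(os.sep))
```

With ``GANETI_ROOTDIR=/tmp/vcluster/node1``, a path such as
``/var/lib/ganeti`` would then resolve below the per-node root, so two
node daemons on the same machine never write to the same directory.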

In case the use of an environment variable turns out to be too
difficult, a compile-time prefix path could be used. This would then
require one Ganeti installation per virtual node, but it might be good
enough.

``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified to not activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's use
of the hostname, it's recommended that the master node is set to the
machine hostname (also a requirement for the master daemon).

CLI scripts
-----------

As long as the master node is set to the machine hostname, these should
work fine.

Cluster initialisation
----------------------

The cluster initialisation procedure might be a bit more involved (this
has not been tried yet). A script will be used to set up all necessary
IP addresses and hostnames, as well as creating the initial directory
structure. Building ``config.data`` manually should not be necessary.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` (with the help of ``ensure-dirs``) and to correctly
  set up the extra IPs/hostnames
- changes to the daemon startup tools to correctly launch the daemons
  per virtual node
- changes to ``constants.py`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- possibly, small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code
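
The first two items could be sketched roughly as follows: one root
directory per virtual node containing the ``lib``, ``run`` and ``log``
subtrees, plus a per-node environment carrying ``GANETI_ROOTDIR`` for
the daemons started on behalf of that node. The directory layout and
both function names are assumptions made for this example; a real tool
would delegate the directory creation to ``ensure-dirs``.

```python
import os
import tempfile

# The three localstatedir subtrees named in this design.
STATE_SUBDIRS = ("lib", "run", "log")


def create_node_roots(base_dir, node_names):
    """Create <base_dir>/<node>/var/{lib,run,log}/ganeti for each node.

    Returns a dict mapping each node name to its root directory.
    """
    roots = {}
    for name in node_names:
        root = os.path.join(base_dir, name)
        for sub in STATE_SUBDIRS:
            os.makedirs(os.path.join(root, "var", sub, "ganeti"),
                        exist_ok=True)
        roots[name] = root
    return roots


def node_environment(root, base_env=None):
    """Return a copy of the environment with GANETI_ROOTDIR set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GANETI_ROOTDIR"] = root
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        roots = create_node_roots(base, ["node1", "node2"])
        for node, root in roots.items():
            print(node, node_environment(root, {})["GANETI_ROOTDIR"])
```

The environment returned by ``node_environment`` could be passed to
whatever mechanism launches the daemons, for example through the
``env`` argument of ``subprocess.Popen``.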

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: