===================================
Design for virtual clusters support
===================================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run using mocks as a normal user and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-machine-to-one-node
  correspondence

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could take double that time

On the one hand, the unittests have a clear advantage: they are quick to
run and do not require many machines; on the other hand, QA is actually
able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
being able to test most, if not all, of Ganeti's functionality, but
without requiring actual hardware, full machine ownership or root
access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which results in a
limit of at most one node daemon per machine, unless we use multiple
name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input files
for hail are text-based, loaded from the filesystem.

Proposed changes
================

The end-goal is to have full support for “virtual clusters”, i.e. to be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a single, reasonably powerful
machine, under a single user account and without any special privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able
  to test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)


``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least none that
are virtualisation-related, as the master node can be a non-vm_capable
node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts that just ``dd`` without    |
|               |                       |mounting filesystems                |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master ip), IP query, |work without a master IP; for the IP|
|               |etc.                   |query operations the test machine   |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node add       |-                      |SSH command must be adjusted        |
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, so on |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call to privileged commands, it     |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is to have as many IP addresses on the (single)
test machine as there are virtual nodes, with each IP corresponding to
a different hostname; thus no changes are needed to the core RPC
library. Unfortunately this has the downside of requiring root rights
for setting up the extra IPs and hostnames.
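
The name-and-IP-alias approach can be sketched as follows; the ``vnode``
naming scheme and the loopback ``127.0.100.0/24`` range are illustrative
assumptions, not something Ganeti prescribes:

```python
def virtual_nodes(count, base="vnode", net_prefix="127.0.100."):
    """Return one (hostname, ip) pair per virtual node."""
    return [("%s%d" % (base, i), "%s%d" % (net_prefix, i))
            for i in range(1, count + 1)]

def hosts_fragment(nodes):
    """Render the /etc/hosts lines that root would have to install."""
    return "\n".join("%s\t%s" % (ip, name) for name, ip in nodes)
```

Actually binding these addresses (e.g. as loopback aliases) and editing
``/etc/hosts`` is precisely the step that still requires root rights.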

An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would eliminate the need for root
rights. We expect that this will get implemented as a second step of
this design, but since the port is currently static, it will require
changes in many places.

The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
introduce an environment variable (``GANETI_ROOTDIR``) acting as a
prefix for essentially all paths. An environment variable is easier to
transport through several levels of programs (shell scripts, Python,
etc.) than a command line parameter. In Python code this prefix will be
applied to all paths in ``constants.py``. Every virtual node will get
its own root directory. The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios not existent in real life; currently
  noded either owns the entire ``/var/lib/ganeti`` directory or shares
  it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”

In case the use of an environment variable turns out to be too
difficult, a compile-time prefix path could be used. This would then
require one Ganeti installation per virtual node, but it might be good
enough.

``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified not to activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's use
of the hostname, it's recommended that the master node is set to the
machine hostname (also a requirement for the master daemon).

CLI scripts
-----------

As long as the master node is set to the machine hostname, these should
work fine.

Cluster initialisation
----------------------

It is possible that the cluster initialisation procedure will be a bit
more involved (this has not been tried yet). A script will be used to
set up all necessary IP addresses and hostnames, as well as to create
the initial directory structure. Building ``config.data`` manually
should not be necessary.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` (with the help of ``ensure-dirs``) and to correctly
  set up the extra IPs/hostnames
- changes to the daemon startup tools to correctly launch the daemons
  per virtual node
- changes to ``constants.py`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: