==========================
Virtual clusters support
==========================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run using mocks as a normal user and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical
  or virtual) and will build an actual cluster, with a one machine to
  one node correspondence

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could take double that time

On one hand, the unittests have a clear advantage: they are quick to
run and do not require many machines; on the other hand, QA is
actually able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
being able to test most, if not all, of Ganeti's functionality, but
without requiring actual hardware, full machine ownership or root
access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due
to our current “hostname” method of identifying nodes, which limits us
to at most one node daemon per machine, unless we use multiple name
and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and
can even load RAPI data from the filesystem (so the RAPI backend can
be tested), and both the ‘text’ backend for hbal/hspace and the input
files for hail are text-based, loaded from the file-system.
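
For illustration, the sketch below shows how the tools could be
driven from a small test harness without any privileges, assuming
hbal's ``-t`` (text backend) and ``-L`` (custom LUXI socket path)
options; the data file and socket names are examples only:

.. code-block:: python

   # Sketch: drive hbal from a test harness, without privileges.
   # The data file and socket path below are examples only.
   import subprocess

   def run_hbal_on_text_data(data_file):
       """Run hbal on a file-based cluster description (text backend)."""
       return subprocess.run(["hbal", "-t", data_file],
                             capture_output=True, text=True, check=True)

   def run_hbal_on_luxi(socket_path):
       """Run hbal against a (virtual) cluster via a custom LUXI socket."""
       return subprocess.run(["hbal", "-L" + socket_path],
                             capture_output=True, text=True, check=True)

   if __name__ == "__main__":
       print(run_hbal_on_text_data("cluster.data").stdout)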

Proposed changes
================

The end-goal is to have full support for “virtual clusters”, i.e. to
be able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a reasonably powerful but single
machine, under a single user account and without any special
privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able
  to test reliably yet, and as such scaling problems remain
  undiagnosed
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least no
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts (that just ``dd`` without   |
|               |                       |mounting filesystems)               |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master IP), IP query, |work without a master IP. For the IP|
|               |etc.                   |query operations, the test machine  |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, so on |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily (sketch below)     |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call privileged commands, it should |
|               |                       |work                                |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |not very interesting for testing    |
|               |                       |anyway                              |
+---------------+-----------------------+------------------------------------+
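
For the *node oob* row, a minimal sketch of a mocked out-of-band
helper is shown below; the command names and the ``power-status``
reply format are assumptions based on the out-of-band design and
should be verified against that document:

.. code-block:: python

   #!/usr/bin/env python
   # Mock OOB helper: accepts the usual OOB verbs but never touches
   # real hardware.  Command names and reply formats are assumptions
   # based on the OOB design document.
   import json
   import sys

   def main(argv):
       if len(argv) < 3:
           sys.stderr.write("Usage: mock-oob <command> <node>\n")
           return 1
       command, _node = argv[1], argv[2]
       if command == "power-status":
           # pretend the (virtual) node is always powered on
           print(json.dumps({"powered": True}))
       elif command == "health":
           print(json.dumps([["virtual-node", "OK"]]))
       elif command in ("power-on", "power-off", "power-cycle"):
           pass  # nothing to do for a virtual node
       else:
           sys.stderr.write("unknown OOB command: %s\n" % command)
           return 1
       return 0

   if __name__ == "__main__":
       sys.exit(main(sys.argv))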

It seems that much of the functionality works as-is, or could work
with small adjustments, even in a non-privileged setup. The bigger
problem is the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to how nodes are
tracked, the proposal is to simply create as many IP aliases on the
(single) test machine as there are virtual nodes, and have each IP
correspond to a different name, so that no changes are needed to the
core RPC library. Unfortunately this has the downside of requiring
root rights for setting up the extra IPs and hostnames.
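
A rough sketch of such a (root-requiring) setup step is shown below;
the address range, hostnames and interface are examples only:

.. code-block:: python

   # Sketch: create per-virtual-node IP aliases and hostname entries.
   # Must run as root; the addresses and names below are examples only.
   import subprocess

   NUM_NODES = 16
   ADDR_TMPL = "192.0.2.%d"             # TEST-NET-1 example addresses
   NAME_TMPL = "vnode%02d.example.com"  # example virtual node names

   def setup_virtual_nodes():
       hosts_lines = []
       for idx in range(1, NUM_NODES + 1):
           addr = ADDR_TMPL % idx
           name = NAME_TMPL % idx
           # add the IP alias on the loopback interface
           subprocess.check_call(["ip", "addr", "add", addr + "/32",
                                  "dev", "lo"])
           hosts_lines.append("%s %s\n" % (addr, name))
       # make the extra names resolvable
       with open("/etc/hosts", "a") as hosts:
           hosts.writelines(hosts_lines)

   if __name__ == "__main__":
       setup_virtual_nodes()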

An alternative option is to implement per-node IP/port support in
Ganeti (especially in the RPC layer), which would eliminate the need
for root rights. We expect that this will be implemented as a second
step of this design.

The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
add a command line parameter which can override this path (via
injection into ``_autoconf.py``); a rough sketch is given after the
list below. The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios not present in real life; currently
  noded either owns the entire ``/var/lib/ganeti`` directory or shares
  it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”
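
A minimal sketch of how such an override could look at a daemon entry
point is given below; the ``--localstatedir`` option name and the
path layout are illustrative only, with the actual flag to be defined
by the implementation:

.. code-block:: python

   # Sketch: per-daemon override of the localstatedir-derived paths.
   # The option name and the exact path layout are illustrative only.
   import optparse
   import os

   def parse_options(args=None):
       parser = optparse.OptionParser()
       parser.add_option("--localstatedir", default="/var",
                         help="Base directory for the lib/run/log trees")
       options, _ = parser.parse_args(args)
       return options

   def build_paths(localstatedir):
       # mirrors the lib/run/log split mentioned above
       return {
           "data_dir": os.path.join(localstatedir, "lib", "ganeti"),
           "run_dir": os.path.join(localstatedir, "run", "ganeti"),
           "log_dir": os.path.join(localstatedir, "log", "ganeti"),
       }

   if __name__ == "__main__":
       opts = parse_options()
       print(build_paths(opts.localstatedir))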


``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified not to activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's
use of the hostname, it's recommended that the master node is set to
the machine hostname (also a requirement for the master daemon).

CLI scripts
-----------

As long as the master node is set to the machine hostname, these
should work fine.

Cluster initialisation
----------------------

It is possible that the cluster initialisation procedure will be a bit
more involved (this has not been tried yet). In any case, we can build
a ``config.data`` file manually, without having to actually run
``gnt-cluster init``.
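
For illustration, one way to grow such a hand-built configuration is
to clone an existing node entry, as sketched below; this assumes the
usual JSON layout of ``config.data`` (a top-level ``nodes`` mapping
with per-node ``name`` and ``primary_ip`` fields), which must be
checked against the actual object definitions:

.. code-block:: python

   # Sketch: duplicate a node entry inside a hand-built config.data.
   # The assumed layout (top-level "nodes" mapping, per-node "name" and
   # "primary_ip" fields) must be verified against the real object
   # definitions; other fields (groups, candidacy, ...) may also need
   # adjusting.
   import copy
   import json

   def add_virtual_node(config_path, new_name, new_ip):
       with open(config_path) as cfg_file:
           config = json.load(cfg_file)
       nodes = config["nodes"]
       template = copy.deepcopy(next(iter(nodes.values())))
       template["name"] = new_name
       template["primary_ip"] = new_ip
       nodes[new_name] = template
       with open(config_path, "w") as cfg_file:
           json.dump(config, cfg_file, indent=2)

   if __name__ == "__main__":
       add_virtual_node("config.data", "vnode02.example.com", "192.0.2.2")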

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` and to correctly set up the extra IPs/hostnames
  (a combined sketch is given after this list)
- changes to the startup daemon tools to correctly launch the daemons
  per virtual node
- changes to ``noded`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality,
  to better separate privileged and non-privileged code
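
A rough combined sketch of the first three items, i.e. a helper that
creates the per-virtual-node directory trees and prints the
corresponding daemon invocations, is given below; the
``--localstatedir`` and ``--bind`` option names are hypothetical (to
be defined by the implementation), and the base directory and
addresses are examples only:

.. code-block:: python

   # Sketch: prepare per-virtual-node localstatedir trees and show how
   # the node daemons could be started.  The --localstatedir and --bind
   # option names are hypothetical; base directory and addresses are
   # examples only.
   import os

   BASE_DIR = os.path.expanduser("~/vcluster")
   NUM_NODES = 16

   def prepare_tree(node_idx):
       """Create the lib/run/log tree for one virtual node."""
       root = os.path.join(BASE_DIR, "vnode%02d" % node_idx)
       for sub in ("lib", "run", "log"):
           path = os.path.join(root, sub, "ganeti")
           if not os.path.isdir(path):
               os.makedirs(path)
       return root

   def noded_command(node_idx, root):
       """Return a (hypothetical) command line for this node's daemon."""
       addr = "192.0.2.%d" % node_idx
       return ["ganeti-noded",
               "--localstatedir=%s" % root,
               "--bind=%s" % addr]

   if __name__ == "__main__":
       for idx in range(1, NUM_NODES + 1):
           tree = prepare_tree(idx)
           print(" ".join(noded_command(idx, tree)))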

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: