==========================
Virtual clusters support
==========================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run using mocks as a normal user and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-to-one
  correspondence between machines and nodes

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could be double that time

On the one hand, the unittests have clear advantages: they are quick to
run and do not require many machines. On the other hand, QA is actually
able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
being able to test most, if not all, of Ganeti's functionality but
without requiring actual hardware, full machine ownership or root
access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as a master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which results in a
limit of at most one node daemon per machine, unless we use multiple
name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input files
for hail are text-based, loaded from the filesystem.

Proposed changes
================

The end goal is to have full support for “virtual clusters”, i.e. to be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a single, reasonably powerful
machine, under a single user account and without any special
privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able to
  test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least not
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts (that just ``dd`` without   |
|               |                       |mounting filesystems)               |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master ip), IP query, |work without a master IP. For the IP|
|               |etc.                   |query operations, the test machine  |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, so on |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call to privileged commands, it     |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostnames. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is simply to give the (single) machine used for
tests as many IPs as there are nodes, and to have each IP correspond to
a different name; thus no changes are needed to the core RPC library.
Unfortunately this has the downside of requiring root rights for
setting up the extra IPs and hostnames.

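As an illustration of the one-IP-per-name proposal, the following is a
minimal sketch that only *plans* the extra addresses and names; it does
not apply them. The address range, the ``vnodeN`` naming scheme, the
domain and the use of the loopback device are all assumptions made for
this example and are not part of the design.

```python
import ipaddress


def plan_virtual_nodes(count, first_ip="192.0.2.10", domain="example.com",
                       device="lo"):
    """Return (hosts_lines, ip_commands) for `count` virtual nodes.

    All defaults are illustrative assumptions, not Ganeti settings.
    """
    base = ipaddress.ip_address(first_ip)
    hosts_lines = []
    ip_commands = []
    for index in range(count):
        address = base + index
        # Hypothetical naming scheme: vnode1, vnode2, ...
        name = "vnode%d.%s" % (index + 1, domain)
        # One /etc/hosts-style line per virtual node.
        hosts_lines.append("%s %s" % (address, name))
        # Applying this command needs root, which is the downside
        # mentioned in the text.
        ip_commands.append("ip addr add %s/32 dev %s" % (address, device))
    return hosts_lines, ip_commands


hosts, commands = plan_virtual_nodes(3)
print("\n".join(hosts))
print("\n".join(commands))
```

Keeping the planning step separate from applying it means that only the
final step (editing ``/etc/hosts`` and adding the addresses) has to be
run as root.
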
An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would eliminate the need for root
rights. We expect that this will get implemented as a second step of
this design.

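To make the per-node IP/port idea more concrete, here is a small sketch
of the kind of lookup an RPC client could perform if nodes were
addressed by an (IP, port) pair instead of by hostname alone. The
function names, the loopback address and the base port are hypothetical;
none of this is existing Ganeti API.

```python
def build_endpoint_map(node_names, ip="127.0.0.1", base_port=20000):
    """Map each node name to a distinct (ip, port) pair on one machine.

    The base port is an arbitrary unprivileged port chosen for the
    example, not a Ganeti default.
    """
    endpoints = {}
    for offset, name in enumerate(sorted(node_names)):
        endpoints[name] = (ip, base_port + offset)
    return endpoints


def resolve(endpoints, node_name):
    """Return the (ip, port) a hypothetical RPC client would connect to."""
    try:
        return endpoints[node_name]
    except KeyError:
        raise LookupError("unknown virtual node: %s" % node_name)


endpoints = build_endpoint_map(["vnode2", "vnode1", "vnode3"])
print(resolve(endpoints, "vnode2"))
```

Since every daemon binds to the same address on a different unprivileged
port, no extra IPs or hostnames need to be configured, which is why this
variant would not need root rights.
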
The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
add a command line parameter which can override this path (via injection
into ``_autoconf.py``). The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios that do not exist in real life;
  currently noded either owns the entire ``/var/lib/ganeti`` directory
  or shares it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”


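A separate ``localstatedir`` per virtual node could be prepared along
the following lines. This is only a sketch: the ``<root>/<node>/var``
layout and the function name are assumptions, and the ``ganeti``
subdirectories simply mirror the ``/var/lib/ganeti`` path mentioned
above.

```python
import os
import tempfile

# The three parts of localstatedir named in the design.
SUBDIRS = ("lib", "run", "log")


def create_node_trees(root, node_names):
    """Create <root>/<node>/var/{lib,run,log}/ganeti for each node.

    Returns a dict mapping node name to its private localstatedir.
    """
    result = {}
    for name in node_names:
        localstatedir = os.path.join(root, name, "var")
        for sub in SUBDIRS:
            os.makedirs(os.path.join(localstatedir, sub, "ganeti"),
                        exist_ok=True)
        result[name] = localstatedir
    return result


# A temporary directory stands in for the real root of the test setup.
root = tempfile.mkdtemp(prefix="vcluster-")
trees = create_node_trees(root, ["vnode1", "vnode2"])
for node, path in sorted(trees.items()):
    print(node, path)
```

Each returned path is what would be handed to the corresponding node
daemon through the proposed override parameter, so that no two daemons
write to the same directory.
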
``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified to not activate the daemons, or the
start-stop tool has to take this into account. Due to watcher's use of
the hostname, it's recommended that the master node is set to the
machine hostname (also a requirement for the master daemon).

CLI scripts
-----------

As long as the master node is set to the machine hostname, these should
work fine.

Cluster initialisation
----------------------

It could be possible that the cluster initialisation procedure is a bit
more involved (this has not been tried yet). In any case, we can build a
``config.data`` file manually, without having to actually run
``gnt-cluster init``.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` and to correctly set up the extra IPs/hostnames
- changes to the startup daemon tools to correctly launch the daemons
  per virtual node
- changes to ``noded`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: