==========================
 Virtual clusters support
==========================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unit tests, which run as a normal user using mocks and test small
  bits of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and build an actual cluster, with a one-to-one correspondence
  between machines and nodes

The difference in run time between these two is significant:

- the unit tests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could take double that time

On the one hand, the unit tests have a clear advantage: they are quick
to run and do not require many machines. On the other hand, QA is
actually able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
be able to test most, if not all, of Ganeti's functionality but without
requiring actual hardware, full machine ownership or root access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which results in a
limit of at most one node daemon per machine, unless we use multiple
name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input files
for hail are text-based, loaded from the filesystem.

Proposed changes
================

The end goal is to have full support for “virtual clusters”, i.e. to be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a reasonably powerful, but single
machine, under a single user account and without any special privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able to
  test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least not
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts (that just ``dd`` without   |
|               |                       |mounting filesystems)               |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master ip), IP query, |work without a master IP. For the IP|
|               |etc.                   |query operations, the test machine  |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, etc.  |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should work fine                    |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call privileged commands, it        |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is to simply give the (single) machine used for
tests as many IPs as there are nodes, and have each IP correspond to a
different name; thus no changes are needed to the core RPC library.
Unfortunately this has the downside of requiring root rights for setting
up the extra IPs and hostnames.
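
As an illustration of the one-name-per-IP scheme, the following minimal
sketch plans the name/address pairs for a given number of virtual nodes.
The function names, the ``example.com`` domain and the ``192.0.2.0/24``
documentation address range are assumptions made for this example only;
they are not part of Ganeti, and actually configuring the addresses on
the machine still requires root rights.

.. code-block:: python

  import ipaddress


  def plan_virtual_nodes(count, first_ip="192.0.2.10", domain="example.com"):
      """Return a list of (hostname, ip) pairs, one per virtual node."""
      base = ipaddress.ip_address(first_ip)
      # Hypothetical naming scheme: node1, node2, ... on consecutive IPs.
      return [
          ("node%d.%s" % (i + 1, domain), str(base + i))
          for i in range(count)
      ]


  def hosts_lines(plan):
      """Format the plan as hosts-file style lines (IP first, then name)."""
      return ["%s\t%s" % (ip, name) for name, ip in plan]

Calling ``plan_virtual_nodes(3)`` yields three distinct names, each with
its own address, which is the property the hostname-based node
identification relies on.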

An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would eliminate the need for root
rights. We expect that this will get implemented as a second step of
this design.

The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
add a command line parameter which can override this path (via injection
into ``_autoconf.py``). The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios not existent in real life; currently
  noded either owns the entire ``/var/lib/ganeti`` directory or shares
  it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”
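
The per-node directory separation could be prepared along the following
lines. This is only a sketch under stated assumptions: the layout (one
directory per virtual node, each containing ``lib``, ``run`` and
``log``) and all function names are hypothetical, chosen to mirror the
``localstatedir`` structure mentioned above rather than taken from any
existing Ganeti tool.

.. code-block:: python

  import os
  import tempfile

  # Sub-directories of localstatedir named in the text above.
  SUBDIRS = ("lib", "run", "log")


  def make_node_tree(root, node_name):
      """Create <root>/<node_name>/{lib,run,log}; return the node's root."""
      node_root = os.path.join(root, node_name)
      for sub in SUBDIRS:
          os.makedirs(os.path.join(node_root, sub), exist_ok=True)
      return node_root


  def make_cluster_trees(root, node_names):
      """Create one private tree per virtual node; map name -> directory."""
      return {name: make_node_tree(root, name) for name in node_names}


  def demo_layout(node_names):
      """Build the trees in a throwaway directory and list their contents."""
      with tempfile.TemporaryDirectory() as root:
          trees = make_cluster_trees(root, node_names)
          return {name: sorted(os.listdir(path))
                  for name, path in trees.items()}

Because every virtual node gets its own tree, a file written by one node
daemon is not visible in the directory of another, so a missing upload
on a single node remains detectable.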


``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.
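
Why per-address binding matters can be shown with plain sockets. The
sketch below is a generic illustration using TCP and the standard
``socket`` module; it is not code from any Ganeti daemon, and the helper
names are made up for this example. It shows that a second listener
cannot reuse an address/port pair that is already taken, which is why
each daemon instance on a shared machine needs its own address.

.. code-block:: python

  import socket


  def bind_listener(address, port=0):
      """Bind and listen on (address, port); port 0 picks any free port."""
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      try:
          sock.bind((address, port))
          sock.listen(1)
      except OSError:
          # Do not leak the socket if the address is unavailable.
          sock.close()
          raise
      return sock


  def port_is_free_on(address, port):
      """Return True if a new listener can bind to (address, port)."""
      try:
          bind_listener(address, port).close()
      except OSError:
          return False
      return True

While a listener returned by ``bind_listener("127.0.0.1")`` is open,
``port_is_free_on("127.0.0.1", port)`` returns ``False`` for its port.
Whether the same port is available on a second address depends on that
address being configured on the machine.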

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified to not activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's use
of the hostname, it's recommended that the master node is set to the
machine hostname (also a requirement for the master daemon).

CLI scripts
-----------

As long as the master node is set to the machine hostname, these should
work fine.

Cluster initialisation
----------------------

The cluster initialisation procedure might be a bit more involved (this
has not been tried yet). In any case, we can build a ``config.data``
file manually, without having to actually run ``gnt-cluster init``.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` and to correctly set up the extra IPs/hostnames
- changes to the daemon startup tools to correctly launch the daemons
  per virtual node
- changes to ``noded`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: