===================================
Design for virtual clusters support
===================================

Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run as a normal user using mocks and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-to-one machine
  to node correspondence

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could be double that time

On the one hand, the unittests have a clear advantage: they are quick
to run and do not require many machines. On the other hand, QA is
actually able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
being able to test most, if not all, of Ganeti's functionality, but
without requiring actual hardware, full machine ownership or root
access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which limits us to
at most one node daemon per machine, unless we use multiple name and IP
aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, RAPI
data can even be loaded from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input
files for hail are text-based, loaded from the file-system.

Proposed changes
================

The end goal is to have full support for “virtual clusters”, i.e. to be
able to run a “big” cluster (hundreds of virtual nodes and up to
thousands of virtual instances) on a single, reasonably powerful
machine, under a single user account and without any special
privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something we haven't been able to
  test reliably yet, and as such scaling problems remain undiagnosed
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least no
virtualisation-related ones, as the master node can be a non-vm_capable
node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts that just ``dd`` without    |
|               |                       |mounting filesystems                |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master IP), IP query, |work without a master IP; for the IP|
|               |etc.                   |query operations the test machine   |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node add       |-                      |SSH command must be adjusted        |
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, etc.  |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and ganeti config      |current user; internal ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call privileged commands, it        |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as-is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem
is the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is to simply assign the (single) test machine one
IP per virtual node, and to have each IP correspond to a different
hostname; thus no changes are needed to the core RPC library.
Unfortunately this has the downside of requiring root rights for
setting up the extra IPs and hostnames.
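
A minimal sketch of such a setup, assuming ten virtual nodes; the node
names and the TEST-NET-1 address range are illustrative only, and the
``ip`` and ``/etc/hosts`` changes are exactly the parts that require
root::

  import subprocess

  # Illustrative names and addresses; adjust to the local environment
  NODES = ["node%02d.example.com" % i for i in range(1, 11)]

  for i, name in enumerate(NODES, start=1):
      addr = "192.0.2.%d" % i
      # Requires root: add an alias address on the loopback interface
      subprocess.check_call(["ip", "address", "add", "%s/32" % addr,
                             "dev", "lo"])
      # Requires root: map the virtual node name to the alias address
      with open("/etc/hosts", "a") as hosts:
          hosts.write("%s %s\n" % (addr, name))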

An alternative option is to implement per-node IP/port support in
Ganeti (especially in the RPC layer), which would eliminate the need
for root rights. We expect that this will be implemented as a second
step of this design, but since the port is currently static, it will
require changes in many places.

The only remaining problem is sharing the ``localstatedir`` structure
(lib, run, log) amongst the daemons, for which we propose to introduce
an environment variable (``GANETI_ROOTDIR``) acting as a prefix for
essentially all paths. An environment variable is easier to transport
through several levels of programs (shell scripts, Python, etc.) than a
command line parameter. In Python code this prefix will be applied to
all paths in ``constants.py`` (a sketch of this is shown after the list
below). Every virtual node will get its own root directory. The
rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios not existent in real life; currently
  noded either owns the entire ``/var/lib/ganeti`` directory or shares
  it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”
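
A minimal sketch of how such prefixing could look in ``constants.py``;
the ``_Prefixed`` helper and the exact constant layout are illustrative,
not existing Ganeti code::

  import os

  # Prefix for all paths; empty for a regular (non-virtual) installation
  _ROOTDIR = os.environ.get("GANETI_ROOTDIR", "")

  def _Prefixed(path):
      """Returns an absolute path with the virtual root prefix applied."""
      assert os.path.isabs(path)
      return _ROOTDIR + path

  DATA_DIR = _Prefixed("/var/lib/ganeti")
  RUN_DIR = _Prefixed("/var/run/ganeti")
  LOG_DIR = _Prefixed("/var/log/ganeti")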

In case the use of an environment variable turns out to be too
difficult, a compile-time prefix path could be used. This would then
require one Ganeti installation per virtual node, but it might be good
enough.

    
201
``rapi``
202
--------
203

    
204
The RAPI daemon is not privileged and furthermore we only need one per
205
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of the daemons will be customised with per-IP binds,
the watcher either has to be modified not to activate the daemons, or
the start-stop tool has to take this into account. Due to the watcher's
use of the hostname, it's recommended that the master node be set to
the machine hostname (also a requirement for the master daemon).

    
222
CLI scripts
223
-----------
224

    
225
As long as the master node is set to the machine hostname, these should
226
work fine.

Cluster initialisation
----------------------

It is possible that the cluster initialisation procedure will turn out
to be a bit more involved (this has not been tried yet). A script will
be used to set up all necessary IP addresses and hostnames, as well as
to create the initial directory structure. Building ``config.data``
manually should not be necessary.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` (with the help of ``ensure-dirs``) and to correctly
  set up the extra IP/hostnames; the directory part is sketched after
  this list
- changes to the startup daemon tools to correctly launch the daemons
  per virtual node
- changes to ``constants.py`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code
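
A minimal sketch of the directory part of such a tool; the subdirectory
list and the base path are illustrative, and a real tool would defer to
``ensure-dirs`` for the exact layout and permissions::

  import os

  # Illustrative subset of the per-node localstatedir layout
  SUBDIRS = ["lib/ganeti", "run/ganeti", "log/ganeti"]

  def SetupVirtualNode(rootdir):
      """Creates the localstatedir skeleton for one virtual node."""
      for subdir in SUBDIRS:
          path = os.path.join(rootdir, subdir)
          if not os.path.isdir(path):
              os.makedirs(path)

  for name in ["node01", "node02", "node03"]:
      SetupVirtualNode(os.path.join("/srv/vcluster", name))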

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: