===================================
Design for virtual clusters support
===================================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run as a normal user, using mocks, and test small
  bits of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-to-one
  correspondence between machines and nodes

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could take double that time

On the one hand, the unittests have a clear advantage: they are quick to
run and do not require many machines; on the other hand, QA is actually
able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
being able to test most, if not all, of Ganeti's functionality, but
without requiring actual hardware, full machine ownership or root access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which allows at most
one node daemon per machine unless we use multiple name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input files
for hail are text-based and loaded from the filesystem.


Proposed changes
================

The end-goal is to have full support for “virtual clusters”, i.e. to be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a single, reasonably powerful
machine, under a single user account and without any special
privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able
  to test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least not
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and HTools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts that just ``dd`` without    |
|               |                       |mounting filesystems                |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master IP), IP query, |work without a master IP; for the IP|
|               |etc.                   |query operations the test machine   |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node add       |-                      |SSH command must be adjusted        |
+---------------+-----------------------+------------------------------------+
|node setup     |SSH, /etc/hosts, etc.  |Can already be disabled from the    |
|               |                       |cluster configuration               |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and Ganeti config      |current user; internal Ganeti files |
|               |                       |should work fine                    |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call privileged commands, it        |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+
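
As an illustration of the *node oob* row above, a mock out-of-band
program for virtual nodes could be as small as the hypothetical sketch
below. The calling convention (command name, then node name) and the
JSON answer of ``power-status`` are assumptions to be checked against
Ganeti's out-of-band design. Since every command is a separate
invocation, a real mock would also have to persist the power table
instead of keeping it in memory.

.. code-block:: python

   import json
   import sys

   # Hypothetical power table; it only lasts for one process, so a real
   # mock would persist it, e.g. under the virtual node's root directory.
   _POWER_STATE = {}


   def handle_oob(command, node_name):
       """Answer a mocked out-of-band command; return the text for stdout.

       Assumption: only ``power-status`` produces output, a JSON object
       with a ``powered`` flag; the other commands print nothing.
       """
       if command in ("power-on", "power-cycle"):
           # A virtual node has nothing to cycle; it simply ends up "on".
           _POWER_STATE[node_name] = True
           return ""
       if command == "power-off":
           _POWER_STATE[node_name] = False
           return ""
       if command == "power-status":
           powered = _POWER_STATE.get(node_name, True)
           return json.dumps({"powered": powered})
       raise ValueError("unsupported OOB command: %s" % command)


   if __name__ == "__main__" and len(sys.argv) == 3:
       # Assumed invocation: <program> <command> <node name>
       sys.stdout.write(handle_oob(sys.argv[1], sys.argv[2]))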

It seems that much of the functionality works as-is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
actually running multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostnames. Since
changing this method would imply significant changes to how nodes are
tracked, the proposal is to give the (single) machine used for tests as
many IP addresses as there are nodes, with each IP corresponding to a
different name; thus no changes are needed to the core RPC library.
Unfortunately, this has the downside of requiring root rights for
setting up the extra IPs and hostnames.
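
The IP-per-node scheme can be sketched as follows: the hypothetical
helper below generates ``/etc/hosts``-style lines, one address and one
name per virtual node. The default network (the ``192.0.2.0/24``
documentation range) and the ``node%d.example.com`` naming pattern are
placeholders; adding the addresses to an interface and installing the
lines still needs root, as stated above.

.. code-block:: python

   import ipaddress


   def virtual_node_hosts(count, network="192.0.2.0/24",
                          name_pattern="node%d.example.com"):
       """Return /etc/hosts lines mapping one address per virtual node.

       ``network`` and ``name_pattern`` are illustrative defaults; any
       network with enough free host addresses would do.
       """
       # iter() because hosts() may return a list for single-host networks.
       hosts = iter(ipaddress.ip_network(network).hosts())
       lines = []
       for index in range(1, count + 1):
           try:
               address = next(hosts)
           except StopIteration:
               raise ValueError("network %s too small for %d nodes"
                                % (network, count)) from None
           lines.append("%s\t%s" % (address, name_pattern % index))
       return lines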

An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would remove the need for root
rights. We expect that this will be implemented as a second step of
this design, but as the port is currently static, it will require
changes in many places.
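
To make the alternative concrete, the sketch below shows the kind of
lookup the RPC layer would need: an address *and* a port per node, so
that several virtual nodes could share one address. ``NodeEndpoint``,
``resolve_endpoint`` and the ``DEFAULT_PORT`` value are hypothetical
names for illustration, not existing Ganeti code.

.. code-block:: python

   import collections

   # Hypothetical per-node endpoint; today only the address varies per
   # node and the port is a single cluster-wide constant.
   NodeEndpoint = collections.namedtuple("NodeEndpoint", ["address", "port"])

   # Placeholder standing in for the static node daemon port.
   DEFAULT_PORT = 1811


   def resolve_endpoint(node_name, node_addresses, node_ports=None):
       """Return where RPC calls for ``node_name`` should be sent.

       ``node_addresses`` maps node names to IP addresses; ``node_ports``
       optionally overrides the port per node (needed when several
       virtual nodes share one address).
       """
       port = (node_ports or {}).get(node_name, DEFAULT_PORT)
       return NodeEndpoint(node_addresses[node_name], port)

With such a lookup, all virtual node daemons could bind to the same
address on distinct unprivileged ports.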

The only remaining problem is sharing the ``localstatedir`` structure
(lib, run, log) amongst the daemons, for which we propose to introduce
an environment variable (``GANETI_ROOTDIR``) acting as a prefix for
essentially all paths. An environment variable is easier to pass
through several levels of programs (shell scripts, Python, etc.) than a
command line parameter. In Python code this prefix will be applied to
all paths in ``constants.py``. Every virtual node will get its own root
directory. The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios that do not exist in real life;
  currently noded either owns the entire ``/var/lib/ganeti`` directory
  or shares it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others
  are “lost”
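
The prefixing could look roughly like the sketch below. The helper
``_prefixed`` and the example constants are hypothetical, not the
actual contents of ``constants.py``. The detail worth noting is that
``os.path.join`` discards everything before an absolute component, so
the leading slash of each path has to be stripped before joining.

.. code-block:: python

   import os


   def _prefixed(path, environ=None):
       """Prefix an absolute path with $GANETI_ROOTDIR, if it is set.

       With the variable unset (or empty) the path is returned unchanged,
       so a normal installation behaves exactly as before.
       """
       if environ is None:
           environ = os.environ
       root = environ.get("GANETI_ROOTDIR", "")
       if not root:
           return path
       # os.path.join("/a", "/b") == "/b", hence the lstrip.
       return os.path.join(root, path.lstrip("/"))


   # Illustrative constants; each virtual node gets its own root directory.
   DATA_DIR = _prefixed("/var/lib/ganeti")
   RUN_DIR = _prefixed("/var/run/ganeti")
   LOG_DIR = _prefixed("/var/log/ganeti")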

In case the use of an environment variable turns out to be too
difficult, a compile-time prefix path could be used. This would then
require one Ganeti installation per virtual node, but it might be good
enough.

``rapi``
--------

The RAPI daemon is not privileged and, furthermore, we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has much the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified not to activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's use
of the hostname, it's recommended that the master node is set to the
machine's hostname (this is also a requirement for the master daemon).
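
One way for the start-stop tool to handle per-IP binds is sketched
below: each virtual node's daemon gets its own bind address and its own
``GANETI_ROOTDIR``. The helper name is hypothetical, and the ``-b``
bind-address option is an assumption about the daemons' command line
that should be checked against their ``--help`` output.

.. code-block:: python

   import os


   def daemon_command(daemon, node_address, node_rootdir, base_env=None):
       """Return (argv, env) for starting ``daemon`` for one virtual node.

       Assumption: the daemon takes its bind address via ``-b``. The
       environment carries GANETI_ROOTDIR so that every path the daemon
       uses lands under the virtual node's own root directory.
       """
       env = dict(os.environ if base_env is None else base_env)
       env["GANETI_ROOTDIR"] = node_rootdir
       argv = [daemon, "-b", node_address]
       return argv, env

The returned pair could be passed to ``subprocess.Popen(argv, env=env)``,
once per virtual node.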

CLI scripts
-----------

As long as the master node is set to the machine's hostname, these
should work fine.

Cluster initialisation
----------------------

The cluster initialisation procedure may be a bit more involved (this
has not been tried yet). A script will be used to set up all necessary
IP addresses and hostnames, as well as to create the initial directory
structure. Building ``config.data`` manually should not be necessary.
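
The directory part of such a script might look like the following
sketch, which creates a ``lib``, ``run`` and ``log`` tree under a
separate root per virtual node. The function name and the layout below
each root are assumptions; a real script would rather delegate to
``ensure-dirs`` so that ownership and permissions match what the
daemons expect.

.. code-block:: python

   import os


   def create_virtual_node_dirs(base_dir, node_names,
                                subdirs=("var/lib/ganeti", "var/run/ganeti",
                                         "var/log/ganeti")):
       """Create one root directory per virtual node under ``base_dir``.

       Returns a mapping of node name to its root directory, suitable as
       the GANETI_ROOTDIR value when starting that node's daemons.
       """
       roots = {}
       for name in node_names:
           root = os.path.join(base_dir, name)
           for subdir in subdirs:
               # exist_ok makes the script safe to re-run.
               os.makedirs(os.path.join(root, subdir), exist_ok=True)
           roots[name] = root
       return roots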

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` (with the help of ``ensure-dirs``) and to correctly
  set up the extra IPs/hostnames
- changes to the daemon startup tools to correctly launch the daemons
  per virtual node
- changes to ``constants.py`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: