==========================
Virtual clusters support
==========================


Introduction
============

Currently there are two ways to test the Ganeti (including HTools) code
base:

- unittests, which run using mocks as a normal user and test small bits
  of the code
- QA/burnin/live-test, which require actual hardware (either physical or
  virtual) and will build an actual cluster, with a one-to-one
  correspondence between machines and nodes

The difference in time between these two is significant:

- the unittests run in about 1-2 minutes
- a so-called ‘quick’ QA (without burnin) runs in about an hour, and a
  full QA could take double that time

On one hand, the unittests have a clear advantage: they are quick to run
and do not require many machines. On the other hand, QA is actually able
to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes:
be able to test most, if not all, of Ganeti's functionality but without
requiring actual hardware, full machine ownership or root access.


Current situation
=================

Ganeti
------

It is possible, given a manually built ``config.data`` and
``_autoconf.py``, to run the masterd under the current user as a
single-node cluster master. However, the node daemon and related
functionality (cluster initialisation, master failover, etc.) are not
directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to
our current “hostname” method of identifying nodes, which results in a
limit of at most one node daemon per machine, unless we use multiple
name and IP aliases.

HTools
------

In HTools the situation is better, since it doesn't have to deal with
actual machine management: all tools can use a custom LUXI path, and can
even load RAPI data from the filesystem (so the RAPI backend can be
tested), and both the ‘text’ backend for hbal/hspace and the input files
for hail are text-based, loaded from the filesystem.

Proposed changes
================

The end-goal is to have full support for “virtual clusters”, i.e. be
able to run a “big” cluster (hundreds of virtual nodes and towards
thousands of virtual instances) on a reasonably powerful, but single
machine, under a single user account and without any special privileges.

This would have significant advantages:

- being able to test certain changes end-to-end, without requiring a
  complicated setup
- being better able to estimate Ganeti's behaviour and performance as
  the cluster size grows; this is something that we haven't been able to
  test reliably yet, and as such we have not yet diagnosed scaling
  problems
- easier integration with external tools (and even with HTools)

``masterd``
-----------

As described above, ``masterd`` already works reasonably well in a
virtual setup, as it won't execute external programs and it shouldn't
directly read files from the local filesystem (or at least not
virtualisation-related ones, as the master node can be a
non-vm_capable node).

``noded``
---------

The node daemon executes many privileged operations, but they can be
split into a few general categories:

+---------------+-----------------------+------------------------------------+
|Category       |Description            |Solution                            |
+===============+=======================+====================================+
|disk operations|Disk creation and      |Use only diskless or file-based     |
|               |removal                |instances                           |
+---------------+-----------------------+------------------------------------+
|disk query     |Node disk total/free,  |Not supported currently, could use  |
|               |used in node listing   |file-based                          |
|               |and htools             |                                    |
+---------------+-----------------------+------------------------------------+
|hypervisor     |Instance start, stop   |Use the *fake* hypervisor           |
|operations     |and query              |                                    |
+---------------+-----------------------+------------------------------------+
|instance       |Bridge existence query |Unprivileged operation, can be used |
|networking     |                       |with an existing bridge at system   |
|               |                       |level or use NIC-less instances     |
+---------------+-----------------------+------------------------------------+
|instance OS    |OS add, OS rename,     |Only used with non-diskless         |
|operations     |export and import      |instances; could work with custom OS|
|               |                       |scripts (that just ``dd`` without   |
|               |                       |mounting filesystems)               |
+---------------+-----------------------+------------------------------------+
|node networking|IP address management  |Not supported; Ganeti will need to  |
|               |(master IP), IP query, |work without a master IP. For the IP|
|               |etc.                   |query operations, the test machine  |
|               |                       |would need externally-configured IPs|
+---------------+-----------------------+------------------------------------+
|node setup     |ssh, /etc/hosts, etc.  |Can already be disabled from the    |
|               |                       |cluster config                      |
+---------------+-----------------------+------------------------------------+
|master failover|start/stop the master  |Doable (as long as we use a single  |
|               |daemon                 |user), might get tricky w.r.t. paths|
|               |                       |to executables                      |
+---------------+-----------------------+------------------------------------+
|file upload    |Uploading of system    |The only issue could be with system |
|               |files, job queue files |files, which are not owned by the   |
|               |and Ganeti config      |current user; internal Ganeti files |
|               |                       |should be working fine              |
+---------------+-----------------------+------------------------------------+
|node oob       |Out-of-band commands   |Since these are user-defined, we can|
|               |                       |mock them easily                    |
+---------------+-----------------------+------------------------------------+
|node OS        |List the existing OSes |No special privileges needed, so    |
|discovery      |and their properties   |works fine as-is                    |
+---------------+-----------------------+------------------------------------+
|hooks          |Running hooks for given|No special privileges needed        |
|               |operations             |                                    |
+---------------+-----------------------+------------------------------------+
|iallocator     |Calling an iallocator  |No special privileges needed        |
|               |script                 |                                    |
+---------------+-----------------------+------------------------------------+
|export/import  |Exporting and importing|When exporting/importing file-based |
|               |instances              |instances, this should work, as the |
|               |                       |listening ports are dynamically     |
|               |                       |chosen                              |
+---------------+-----------------------+------------------------------------+
|hypervisor     |The validation of      |As long as the hypervisors don't    |
|validation     |hypervisor parameters  |call privileged commands, it        |
|               |                       |should work                         |
+---------------+-----------------------+------------------------------------+
|node powercycle|The ability to power   |Privileged, so not supported, but   |
|               |cycle a node remotely  |anyway not very interesting for     |
|               |                       |testing                             |
+---------------+-----------------------+------------------------------------+

It seems that much of the functionality works as is, or could work with
small adjustments, even in a non-privileged setup. The bigger problem is
the actual use of multiple node daemons per machine.

Multiple ``noded`` per machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently Ganeti identifies nodes simply by their hostname. Since
changing this method would imply significant changes to tracking the
nodes, the proposal is to simply give the (single) machine used for
tests as many IPs as there are nodes, and have each IP correspond to a
different name; thus no changes are needed to the core RPC library.
Unfortunately this has the downside of requiring root rights for
setting up the extra IPs and hostnames.

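The one-IP-per-node scheme can be sketched as a small generator for the
name/address pairs. This is only an illustration under stated
assumptions: the ``192.0.2.0/24`` documentation network, the ``vnodeN``
naming pattern, the ``example.com`` domain and both function names are
placeholders rather than anything prescribed by this design. Installing
the resulting addresses and ``/etc/hosts`` lines would still need root
rights.

```python
import ipaddress
from itertools import islice


def virtual_node_hosts(count, network="192.0.2.0/24", domain="example.com"):
    """Return (address, hostname) pairs, one per virtual node.

    The network, domain and "vnodeN" naming are illustrative assumptions.
    """
    # hosts() skips the network and broadcast addresses of the range.
    addresses = list(islice(ipaddress.ip_network(network).hosts(), count))
    if len(addresses) < count:
        raise ValueError("network %s is too small for %d nodes"
                         % (network, count))
    return [(str(address), "vnode%d.%s" % (index, domain))
            for index, address in enumerate(addresses, start=1)]


def format_hosts_lines(entries):
    """Render the pairs in the usual /etc/hosts "address<TAB>name" layout."""
    return "\n".join("%s\t%s" % (address, name) for address, name in entries)
```

Calling ``format_hosts_lines(virtual_node_hosts(3))`` yields three lines
that an administrator could review before adding them to the test
machine's configuration.
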
An alternative option is to implement per-node IP/port support in Ganeti
(especially in the RPC layer), which would eliminate the need for root
rights. We expect that this will get implemented as a second step of
this design.

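As a rough sketch of what per-node IP/port support could mean, the
lookup below gives every virtual node the same address but a distinct
unprivileged port. ``NodeEndpoint``, ``build_endpoints`` and ``resolve``
are hypothetical names, the base port is arbitrary, and 1811 is assumed
here to be the usual node daemon port; none of this describes the actual
RPC layer.

```python
from dataclasses import dataclass

# Assumption: 1811 is taken as the usual noded port.
DEFAULT_NODED_PORT = 1811


@dataclass(frozen=True)
class NodeEndpoint:
    """Where the RPC layer would contact one node daemon."""
    address: str
    port: int = DEFAULT_NODED_PORT


def build_endpoints(node_names, address="127.0.0.1", base_port=20000):
    """Assign each virtual node its own port on a shared address."""
    if base_port <= 1023:
        # Ports up to 1023 normally require special privileges to bind.
        raise ValueError("base_port must be an unprivileged port")
    endpoints = {}
    for offset, name in enumerate(node_names):
        port = base_port + offset
        if port > 65535:
            raise ValueError("ran out of ports for node %s" % name)
        endpoints[name] = NodeEndpoint(address, port)
    return endpoints


def resolve(endpoints, node_name):
    """Look up a node; unknown names fall back to hostname + default port."""
    return endpoints.get(node_name, NodeEndpoint(node_name))
```

The fallback in ``resolve`` mirrors the current hostname-based
identification, so nodes without an explicit entry would behave as they
do today.
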
The only remaining problem is with sharing the ``localstatedir``
structure (lib, run, log) amongst the daemons, for which we propose to
add a command line parameter which can override this path (via injection
into ``_autoconf.py``). The rationale for this is two-fold:

- having two or more node daemons writing to the same directory might
  introduce artificial scenarios that do not exist in real life;
  currently noded either owns the entire ``/var/lib/ganeti`` directory
  or shares it with masterd, but never with another noded
- having separate directories allows cluster verify to correctly check
  the consistency of file upload operations; otherwise, as long as one
  node daemon wrote a file successfully, the results from all others are
  “lost”


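A minimal sketch of the per-node directory layout follows, assuming each
virtual node gets its own root containing the ``lib``, ``run`` and
``log`` subtrees named above. The function name, the ``<root>/<node>``
layout and the ``ganeti`` leaf directory are illustrative assumptions,
not a description of an existing tool.

```python
from pathlib import Path

# The three localstatedir subtrees mentioned in the text.
SUBDIRS = ("lib", "run", "log")


def create_node_trees(root, node_names, package="ganeti"):
    """Create <root>/<node>/{lib,run,log}/<package> for every node.

    Returns a mapping from node name to that node's private root, which
    is what a per-daemon localstatedir override would point at.
    """
    trees = {}
    for name in node_names:
        node_root = Path(root) / name
        for subdir in SUBDIRS:
            # exist_ok makes the setup safe to re-run on an existing tree.
            (node_root / subdir / package).mkdir(parents=True, exist_ok=True)
        trees[name] = node_root
    return trees
```

Because no two nodes share a directory, a file upload that succeeds on
one virtual node leaves no trace in another node's tree, which is the
property cluster verify relies on.
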
``rapi``
--------

The RAPI daemon is not privileged and furthermore we only need one per
cluster, so it presents no issues.

``confd``
---------

``confd`` has somewhat the same issues as the node daemon regarding
multiple daemons per machine, but the per-address binding still works.

``ganeti-watcher``
------------------

Since the startup of daemons will be customised with per-IP binds, the
watcher either has to be modified to not activate the daemons, or the
start-stop tool has to take this into account. Due to the watcher's use
of the hostname, it's recommended that the master node is set to the
machine hostname (also a requirement for the master daemon).

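To illustrate what a customised startup with per-IP binds might involve,
the sketch below only builds the argument lists a start-stop tool could
run, one per virtual node. It rests on two loudly stated assumptions:
that the node daemon accepts ``-b`` for its bind address, and that the
proposed ``localstatedir`` override is exposed as a flag, here given the
hypothetical name ``--localstatedir``. Both are parameters so they can
be replaced with whatever the real tools end up using.

```python
def noded_command(address, state_dir, binary="ganeti-noded",
                  bind_flag="-b", state_flag="--localstatedir"):
    """Build the argv for one virtual node's daemon.

    bind_flag is an assumed option; state_flag is a hypothetical name
    for the override proposed in this design.
    """
    return [binary, bind_flag, address, state_flag, str(state_dir)]


def startup_plan(nodes):
    """Map node name -> argv, given node name -> (address, state_dir)."""
    return {name: noded_command(address, state_dir)
            for name, (address, state_dir) in nodes.items()}
```

Keeping the plan as plain data means the watcher and the start-stop tool
could consult the same mapping instead of each guessing how the daemons
were started.
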
CLI scripts
-----------

As long as the master node is set to the machine hostname, these should
work fine.

Cluster initialisation
----------------------

The cluster initialisation procedure could turn out to be a bit more
involved (this has not been tried yet). In any case, we can build a
``config.data`` file manually, without having to actually run
``gnt-cluster init``.

Needed tools
============

With the above investigation results in mind, the only things we need
are:

- a tool to set up the per-virtual-node tree structure of
  ``localstatedir`` and to correctly set up the extra IPs/hostnames
- changes to the daemon startup tools to correctly launch the daemons
  per virtual node
- changes to ``noded`` to override the ``localstatedir`` path
- documentation for running such a virtual cluster
- and possibly small fixes to the node daemon backend functionality, to
  better separate privileged and non-privileged code

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: