function processes and wait for all of them to terminate.


Inter-cluster instance moves
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current state and shortcomings
++++++++++++++++++++++++++++++

With the current design of Ganeti, moving whole instances between
different clusters involves a lot of manual work. There are several ways
to move instances, one of them being to export the instance, manually
copying all data to the new cluster before importing it again. Manual
changes to the instance's configuration, such as the IP address, may be
necessary in the new environment. The goal is to improve and automate
this process in Ganeti 2.2.


Proposed changes
++++++++++++++++

Authorization, Authentication and Security
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Until now, each Ganeti cluster was a self-contained entity and wouldn't
talk to other Ganeti clusters. Nodes within a cluster only had to trust
the other nodes in the same cluster, and the network used for
replication was trusted, too (hence the ability to use a separate, local
network for replication).

For inter-cluster instance transfers this model must be weakened. Nodes
in one cluster will have to talk to nodes in other clusters, sometimes
in other locations and, most importantly, over untrusted network
connections.

Various options have been considered for securing and authenticating
the data transfer from one machine to another. To reduce the risk of
accidentally overwriting data due to software bugs, authenticating the
arriving data was considered critical. Eventually we decided to use
socat's OpenSSL options (``OPENSSL:``, ``OPENSSL-LISTEN:`` et al.),
which provide us with encryption, authentication and authorization when
used with separate keys and certificates.
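
As a minimal sketch of how such a transfer could be wired up, the
following Python fragment wraps socat's OpenSSL addresses. It is
illustrative only; the port, device paths and certificate file names
are assumptions, not Ganeti's actual implementation::

  import subprocess

  def receive_disk(cert, key, peer_cert, port, device):
    """Accept one authenticated connection, write the stream to a device."""
    listen = ("OPENSSL-LISTEN:%d,reuseaddr,cert=%s,key=%s,cafile=%s,"
              "verify=1" % (port, cert, key, peer_cert))
    with open(device, "wb") as dev:
      # -u makes socat unidirectional: read first address, write second
      subprocess.check_call(["socat", "-u", listen, "STDOUT"], stdout=dev)

  def send_disk(cert, key, peer_cert, host, port, device):
    """Read a device and send it to the destination's listener."""
    connect = ("OPENSSL:%s:%d,cert=%s,key=%s,cafile=%s,verify=1"
               % (host, port, cert, key, peer_cert))
    with open(device, "rb") as dev:
      subprocess.check_call(["socat", "-u", "STDIN", connect], stdin=dev)

Pointing ``cafile`` at the peer's certificate means each side accepts
only the one certificate generated for this particular move.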

Combinations of OpenSSH, GnuPG and Netcat were deemed too complex to set
up from within Ganeti. Any solution involving OpenSSH would require a
dedicated user with a home directory and likely automated modifications
to the user's ``$HOME/.ssh/authorized_keys`` file. When using Netcat,
GnuPG or another encryption method would be necessary to transfer the
data over an untrusted network. socat combines both in one program and
is already a dependency.

Each of the two clusters will have to generate an RSA key. The public
parts are exchanged between the clusters by a third party, such as an
administrator or a system interacting with Ganeti via the remote API
("third party" from here on). After receiving each other's public key,
the clusters can start talking to each other.

All encrypted connections must be verified on both sides. Neither side
may accept unverified certificates. The generated certificate should
only be valid for the time necessary to move the instance.
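
A hedged sketch of generating such a short-lived key and certificate
via the ``openssl`` command line follows; the validity period, subject
and file names are assumptions, and in Ganeti this would more likely be
a variant of ``bootstrap.GenerateSelfSignedSslCert``::

  import subprocess

  def generate_move_credentials(key_file, cert_file, validity_days=1):
    """Create an RSA key and a self-signed, short-lived certificate."""
    subprocess.check_call([
      "openssl", "req", "-new", "-x509", "-newkey", "rsa:2048",
      "-nodes",                     # unattended use, no passphrase
      "-days", str(validity_days),  # valid only for the move itself
      "-subj", "/CN=ganeti-instance-move",
      "-keyout", key_file, "-out", cert_file,
      ])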

On the web, the destination cluster would be equivalent to an HTTPS
server requiring verifiable client certificates. The browser would be
equivalent to the source cluster and must verify the server's
certificate while providing a client certificate to the server.


Copying data
^^^^^^^^^^^^

To simplify the implementation, we decided to operate at a block-device
level only, allowing us to easily support non-DRBD instance moves.

Inter-cluster instance moves will re-use the existing export and import
scripts supplied by instance OS definitions. Unlike simply copying the
raw data, this allows the use of filesystem-specific utilities to dump
only the used parts of the disk and to exclude certain disks from the
move. Compression should be used to further reduce the amount of data
transferred.

The export script writes all data to stdout and the import script reads
it from stdin again. To avoid copying data and to reduce disk space
consumption, everything is read from the disk and sent directly over
the network, where it is written straight to the new block device.
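
On the source side this could translate into a pipeline along the lines
of the sketch below; the script invocation, the choice of gzip for
compression and the error handling are assumptions::

  import subprocess

  def stream_disk(export_script, env, openssl_addr):
    """Pipe an export script through gzip into socat, with no
    intermediate file.
    """
    export = subprocess.Popen([export_script], env=env,
                              stdout=subprocess.PIPE)
    compress = subprocess.Popen(["gzip", "-c"], stdin=export.stdout,
                                stdout=subprocess.PIPE)
    export.stdout.close()  # let export receive SIGPIPE on failure
    send = subprocess.Popen(["socat", "-u", "STDIN", openssl_addr],
                            stdin=compress.stdout)
    compress.stdout.close()
    for proc in (export, compress, send):
      if proc.wait() != 0:
        raise RuntimeError("disk transfer pipeline failed")

The destination side would run the reverse pipeline, feeding socat's
output through ``gzip -d`` into the import script.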


Workflow
^^^^^^^^

#. Third party tells source cluster to shut down instance, asks for the
   instance specification and for the public part of an encryption key
#. Third party tells destination cluster to create an instance with the
   same specifications as on the source cluster, to prepare for an
   instance move with the key received from the source cluster, and
   receives the public part of the destination's encryption key
#. Third party hands the public part of the destination's encryption
   key together with all necessary information to the source cluster
   and tells it to start the move
#. Source cluster connects to destination cluster for each disk and
   transfers its data using the instance OS definition's export and
   import scripts
#. Due to the asynchronous nature of the whole process, the destination
   cluster checks after transferring each disk whether all disks have
   been transferred; if so, it destroys the encryption key
#. After sending all disks, the source cluster destroys its key
#. Destination cluster runs the OS definition's rename script to adjust
   instance settings if needed (e.g. the IP address)
#. Destination cluster starts the instance if requested at the
   beginning by the third party
#. Source cluster removes the instance if requested
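
From the third party's point of view, the workflow could be driven
roughly as in the sketch below. The ``src``/``dst`` client objects and
all method names are hypothetical and merely illustrate the order of
operations, not an actual Ganeti remote API::

  def move_instance(src, dst, name, start=True, remove=True):
    """Move one instance between two cluster client objects."""
    src.shutdown_instance(name)
    spec = src.get_instance_info(name)
    src_cert = src.create_move_certificate(name)

    # Destination creates a matching instance and waits for the data
    dst.create_instance(spec)
    dst_cert = dst.prepare_instance_move(name, peer_cert=src_cert)

    # Source streams each disk; both sides destroy their keys when done
    src.start_instance_move(name, peer_cert=dst_cert,
                            target=dst.master_address())

    dst.run_rename_script(name)  # adjust e.g. the IP address
    if start:
      dst.startup_instance(name)
    if remove:
      src.remove_instance(name)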


Miscellaneous notes
^^^^^^^^^^^^^^^^^^^

- A very similar system could also be used for instance exports within
  the same cluster. Currently OpenSSH is being used, but could be
  replaced by socat and SSL/TLS.
- During the design of inter-cluster instance moves we also discussed
  encrypting instance exports using GnuPG.
- While most instances should have exactly the same configuration as
  on the source cluster, setting them up with a different disk layout
  might be helpful in some use-cases.
- A cleanup operation, similar to the one available for failed instance
  migrations, should be provided.
- ``ganeti-watcher`` should remove instances pending a move from another
  cluster after a certain amount of time. This takes care of failures
  somewhere in the process.
- RSA keys can be generated using the existing
  ``bootstrap.GenerateSelfSignedSslCert`` function, though it might be
  useful not to write both parts into a single file, requiring small
  changes to the function. The public part always starts with
  ``-----BEGIN CERTIFICATE-----`` and ends with ``-----END
  CERTIFICATE-----`` (a sketch of such a helper follows this list).
- The source and destination cluster might be different when it comes
  to available hypervisors, kernels, etc. The destination cluster
  should refuse to accept an instance move if it can't fulfill an
  instance's requirements.
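
Should the combined file need splitting, those markers make extracting
the public part straightforward; a small, hypothetical helper (the file
layout is an assumption)::

  BEGIN_MARK = "-----BEGIN CERTIFICATE-----"
  END_MARK = "-----END CERTIFICATE-----"

  def extract_public_part(pem_path):
    """Return only the certificate part of a combined key/cert file."""
    data = open(pem_path).read()
    start = data.index(BEGIN_MARK)
    end = data.index(END_MARK) + len(END_MARK)
    return data[start:end] + "\n"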


Feature changes
---------------