Revision 5b2069a9

b/doc/design-2.2.rst
function processes and wait for all of them to terminate.

  
Inter-cluster instance moves
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current state and shortcomings
++++++++++++++++++++++++++++++

  
With the current design of Ganeti, moving whole instances between
different clusters involves a lot of manual work. There are several ways
to move instances, one of them being to export the instance, manually
copy all data to the new cluster and then import it again. Manual
changes to the instance's configuration, such as the IP address, may be
necessary in the new environment. The goal is to improve and automate
this process in Ganeti 2.2.

  
Proposed changes
++++++++++++++++

Authorization, Authentication and Security
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  
Until now, each Ganeti cluster was a self-contained entity and wouldn't
talk to other Ganeti clusters. Nodes within clusters only had to trust
the other nodes in the same cluster and the network used for replication
was trusted, too (hence the ability to use a separate, local network
for replication).

  
For inter-cluster instance transfers this model must be weakened. Nodes
in one cluster will have to talk to nodes in other clusters, sometimes
in other locations and, very importantly, via untrusted network
connections.

  
Various options have been considered for securing and authenticating the
data transfer from one machine to another. To reduce the risk of
accidentally overwriting data due to software bugs, authenticating the
arriving data was considered critical. Eventually we decided to use
socat's OpenSSL options (``OPENSSL:``, ``OPENSSL-LISTEN:`` et al), which
provide us with encryption, authentication and authorization when used
with separate keys and certificates.
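
As a rough illustration of how such socat addresses might be assembled,
the following Python sketch builds the address strings for the listening
and the connecting side. The helper names, host name, port and file
paths are hypothetical and only serve the example; the address types and
the ``cert``, ``key``, ``cafile`` and ``verify`` options are socat's
own. It is not meant to describe the actual implementation, and
depending on the socat version further options may be needed.

```python
def socat_listen_address(port, cert, key, peer_cert):
    """Build the socat address for the receiving (destination) side.

    All parameters are hypothetical: ``cert``/``key`` are this side's
    certificate and private key files, ``peer_cert`` is the certificate
    received from the other cluster.
    """
    opts = [f"cert={cert}", f"key={key}", f"cafile={peer_cert}", "verify=1"]
    return f"OPENSSL-LISTEN:{port}," + ",".join(opts)


def socat_connect_address(host, port, cert, key, peer_cert):
    """Build the socat address for the sending (source) side."""
    opts = [f"cert={cert}", f"key={key}", f"cafile={peer_cert}", "verify=1"]
    return f"OPENSSL:{host}:{port}," + ",".join(opts)


if __name__ == "__main__":
    # Example values only; none of these paths exist in Ganeti.
    print(socat_listen_address(11000, "dst.pem", "dst.key", "src.pem"))
    print(socat_connect_address("node1.example.com", 11000,
                                "src.pem", "src.key", "dst.pem"))
```

With ``verify=1`` on both ends, each side only accepts a peer whose
certificate validates against the file given in ``cafile``.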

  
Combinations of OpenSSH, GnuPG and Netcat were deemed too complex to set
up from within Ganeti. Any solution involving OpenSSH would require a
dedicated user with a home directory and likely automated modifications
to the user's ``$HOME/.ssh/authorized_keys`` file. When using Netcat,
GnuPG or another encryption method would be necessary to transfer the
data over an untrusted network. socat combines both in one program and
is already a dependency.

  
Each of the two clusters will have to generate an RSA key. The public
parts are exchanged between the clusters by a third party, such as an
administrator or a system interacting with Ganeti via the remote API
("third party" from here on). After receiving each other's public key,
the clusters can start talking to each other.

  
All encrypted connections must be verified on both sides. Neither side
may accept unverified certificates. The generated certificate should
only be valid for the time necessary to move the instance.

  
On the web, the destination cluster would be equivalent to an HTTPS
server requiring verifiable client certificates. The browser would be
equivalent to the source cluster and must verify the server's
certificate while providing a client certificate to the server.
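
To make the analogy concrete, the sketch below expresses the same
two-sided verification with Python's standard ``ssl`` module. It only
illustrates the trust model; the design itself relies on socat, and the
function name and parameters are hypothetical. Disabling the host name
check is an assumption made here because the peer is identified by the
certificate exchanged beforehand rather than by its name.

```python
import ssl


def make_move_context(is_destination, own_cert=None, own_key=None,
                      peer_cert=None):
    """Return an SSL context that refuses unverified peers.

    ``is_destination`` selects the server role (destination cluster) or
    the client role (source cluster). The file arguments are optional so
    the policy can be inspected without real certificate files.
    """
    if is_destination:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    else:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: the peer is pinned by its exchanged certificate, so
    # the host name is not checked.
    ctx.check_hostname = False
    # Both roles must present a certificate and verify the other's.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if own_cert is not None:
        ctx.load_cert_chain(certfile=own_cert, keyfile=own_key)
    if peer_cert is not None:
        # Trust only the certificate handed over by the third party.
        ctx.load_verify_locations(cafile=peer_cert)
    return ctx
```

A handshake made with these contexts fails unless each side presents a
certificate the other one was explicitly told to trust.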

  
Copying data
^^^^^^^^^^^^

  
To simplify the implementation, we decided to operate at a block-device
level only, allowing us to easily support non-DRBD instance moves.

  
Inter-cluster instance moves will re-use the existing export and import
scripts supplied by instance OS definitions. Unlike simply copying the
raw data, this allows the use of filesystem-specific utilities to dump
only the used parts of the disk and to exclude certain disks from the
move. Compression should be used to further reduce the amount of data
transferred.

  
The export script writes all data to stdout and the import script reads
it from stdin again. To avoid copying data and reduce disk space
consumption, everything is read from the disk and sent directly over the
network, where it is written straight to the new block device.
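
The stdout-to-stdin streaming described above could be sketched in
Python as follows. The ``pipe`` helper is hypothetical, and the two
commands in the example are stand-ins: in a real move the producer would
be the export script (plus compression and socat) and the consumer the
import script on the other side.

```python
import subprocess
import sys


def pipe(producer_cmd, consumer_cmd):
    """Stream producer's stdout into consumer's stdin without a temp file.

    Returns whatever the consumer writes to its own stdout and raises
    RuntimeError if either process exits with a non-zero status.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close our copy so the producer gets SIGPIPE if the consumer exits.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError("transfer pipeline failed")
    return output


if __name__ == "__main__":
    # Stand-ins for the export and import scripts.
    export_cmd = [sys.executable, "-c",
                  "import sys; sys.stdout.write('disk-data')"]
    import_cmd = [sys.executable, "-c",
                  "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(pipe(export_cmd, import_cmd))
```

Because the data flows through a pipe, no intermediate copy of the disk
is stored on either node.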

  
Workflow
^^^^^^^^

  
#. Third party tells source cluster to shut down instance, asks for the
   instance specification and for the public part of an encryption key
#. Third party tells destination cluster to create an instance with the
   same specifications as on source cluster and to prepare for an
   instance move with the key received from the source cluster and
   receives the public part of the destination's encryption key
#. Third party hands public part of the destination's encryption key
   together with all necessary information to source cluster and tells
   it to start the move
#. Source cluster connects to destination cluster for each disk and
   transfers its data using the instance OS definition's export and
   import scripts
#. Due to the asynchronous nature of the whole process, the destination
   cluster checks whether all disks have been transferred every time
   after transferring a single disk; if so, it destroys the encryption
   key
#. After sending all disks, the source cluster destroys its key
#. Destination cluster runs OS definition's rename script to adjust
   instance settings if needed (e.g. IP address)
#. Destination cluster starts the instance if requested at the beginning
   by the third party
#. Source cluster removes the instance if requested
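
The destination-side bookkeeping from the workflow above (check after
every disk, destroy the key once all disks have arrived) might look
roughly like this. The class and attribute names are hypothetical, and
"destroying" the key is reduced to dropping the reference; a real
implementation would also remove the key material from disk.

```python
class PendingMove:
    """Track which disks of an incoming instance are still outstanding."""

    def __init__(self, disk_names, key):
        self._pending = set(disk_names)
        self.key = key  # encryption key used for this move only

    def disk_finished(self, name):
        """Record one transferred disk; return True once all are done.

        Disks may finish in any order because transfers are
        asynchronous, so the check runs after every single disk.
        """
        self._pending.discard(name)
        if not self._pending:
            self.key = None  # stand-in for destroying the key
            return True
        return False


if __name__ == "__main__":
    move = PendingMove(["disk0", "disk1"], key="secret")
    print(move.disk_finished("disk1"), move.key is None)
    print(move.disk_finished("disk0"), move.key is None)
```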

  
Miscellaneous notes
^^^^^^^^^^^^^^^^^^^

  
- A very similar system could also be used for instance exports within
  the same cluster. Currently OpenSSH is being used, but could be
  replaced by socat and SSL/TLS.
- During the design of intra-cluster instance moves we also discussed
  encrypting instance exports using GnuPG.
- While most instances should have exactly the same configuration as
  on the source cluster, setting them up with a different disk layout
  might be helpful in some use-cases.
- A cleanup operation, similar to the one available for failed instance
  migrations, should be provided.
- ``ganeti-watcher`` should remove instances pending a move from another
  cluster after a certain amount of time. This takes care of failures
  somewhere in the process.
- RSA keys can be generated using the existing
  ``bootstrap.GenerateSelfSignedSslCert`` function, though it might be
  useful to not write both parts into a single file, requiring small
  changes to the function. The public part always starts with
  ``-----BEGIN CERTIFICATE-----`` and ends with ``-----END
  CERTIFICATE-----``.
- The source and destination cluster might be different when it comes
  to available hypervisors, kernels, etc. The destination cluster should
  refuse to accept an instance move if it can't fulfill an instance's
  requirements.
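
Since the public part is delimited by the two marker lines mentioned
above, it can be separated from a combined key-and-certificate file with
a simple search. The following is a minimal sketch; the function name is
hypothetical and it is not part of ``bootstrap``.

```python
import re

_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL)


def extract_public_part(pem_text):
    """Return the first certificate block found in ``pem_text``.

    Raises ValueError if the text contains no certificate, so a file
    holding only a private key is never handed out by mistake.
    """
    match = _CERT_RE.search(pem_text)
    if match is None:
        raise ValueError("no certificate found")
    return match.group(0)


if __name__ == "__main__":
    # Dummy content; real files contain base64-encoded data.
    combined = ("-----BEGIN RSA PRIVATE KEY-----\nAAA\n"
                "-----END RSA PRIVATE KEY-----\n"
                "-----BEGIN CERTIFICATE-----\nBBB\n"
                "-----END CERTIFICATE-----\n")
    print(extract_public_part(combined))
```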

  
Feature changes
---------------

  
b/doc/security.rst
will be set at source configure time. Symlinks or command line
parameters may be used to use different files.

  
Inter-cluster instance moves
----------------------------

  
To move instances between clusters, different clusters must be able to
communicate with each other over a secure channel. Up to and including
Ganeti 2.1, clusters were self-contained entities and had no knowledge
of other clusters. With Ganeti 2.2, clusters can exchange data if tokens
(encryption certificates) were exchanged by a trusted third party
beforehand.

  
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
