Revision ac932df1 doc/design-2.1.rst
reading/writing to disk fails constantly.

New Features
------------

Automated Ganeti Cluster Merger
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current situation
+++++++++++++++++

Currently there is no easy way to merge two or more clusters together,
although this is a much-needed missing piece for optimizing resource
usage. The goal of this design doc is to come up with an easy-to-use
solution which allows you to merge two or more clusters together.

Initial contact
+++++++++++++++

As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this, we have to prepare the cluster before we can go ahead with the
actual merge: at the very least, we have to replace the SSH keys on the
affected nodes before we can perform any operation with the ``gnt-``
commands.

To make this an automated process, we ask the user for the root
password of every cluster we have to merge. We use the password to
fetch the current ``id_dsa`` key and then rely on that SSH key for all
further communication until the cluster is fully merged.

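Once the key has been fetched, every further command can be run non-interactively over SSH. The following is a minimal sketch, assuming the OpenSSH ``ssh`` client is used and the fetched ``id_dsa`` key was saved to a local file; the helpers ``build_ssh_cmd`` and ``run_remote`` are hypothetical and not part of Ganeti:

```python
import subprocess


def build_ssh_cmd(host, key_file, command, user="root"):
    """Return an argv list that runs `command` on `host` via OpenSSH.

    Hypothetical helper: the fetched key is passed with -i, and
    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a missing or rejected key is noticed immediately.
    """
    return [
        "ssh",
        "-i", key_file,         # authenticate with the fetched id_dsa key
        "-oBatchMode=yes",      # never fall back to interactive prompts
        "%s@%s" % (user, host),
        command,                # executed by the remote user's shell
    ]


def run_remote(host, key_file, command):
    """Run `command` on `host` and return its standard output as text."""
    return subprocess.check_output(
        build_ssh_cmd(host, key_file, command), universal_newlines=True)
```

For example, ``run_remote("master.example.com", "/root/merge_id_dsa", "gnt-node list --no-headers -o name")`` would return the node names of the remote cluster (host name and key path are placeholders).
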
Cluster merge
+++++++++++++

After the initial contact we perform the cluster merge:

1. Grab the list of nodes
2. On all nodes, add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher``, as it would otherwise try to restart the
   Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop the local ``ganeti-masterd``
8. Merge the config:

   1. Open our own cluster ``config.data``
   2. Open the cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances of the merging cluster
   7. Adjust the port if the instance has a DRBD disk layout:

      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (indices 1 and 3)

   8. Add the instances to our own cluster ``config.data``

9. Start ``ganeti-masterd`` with ``--no-voting --yes-do-it``
10. Run ``gnt-node add --readd`` on all merging nodes
11. Run ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again

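Sub-step 7 of step 8 is needed because each cluster allocated its DRBD ports independently from the same range, so the ports used by the merging cluster's disks may already be in use in our cluster. The following minimal sketch shows the rewrite on plain tuples, independent of Ganeti's objects. It assumes only what the steps above state (the port sits at index 2 of ``logical_id`` and at indices 1 and 3 of ``physical_id``); the helper names are hypothetical, the port range 11000-14999 is an assumed default, and the tuple contents are illustrative values only:

```python
def next_free_port(used_ports, first=11000, last=14999):
    """Return the lowest port in [first, last] that is not in used_ports.

    The default range is an assumption for illustration only.
    """
    for port in range(first, last + 1):
        if port not in used_ports:
            return port
    raise RuntimeError("no free DRBD port left in %d-%d" % (first, last))


def remap_drbd_port(logical_id, physical_id, port):
    """Return copies of logical_id and physical_id using the new port.

    The port is replaced at index 2 of logical_id and at indices 1 and 3
    of physical_id; all other fields are copied unchanged.
    """
    logical = list(logical_id)
    logical[2] = port
    physical = list(physical_id)
    physical[1] = physical[3] = port
    return tuple(logical), tuple(physical)


# Illustrative values: ports already used by disks of our own cluster
used = {11000, 11001}
new_port = next_free_port(used)
used.add(new_port)  # reserve it so the next disk gets a different port
lid, pid = remap_drbd_port(
    ("node3", "node4", 11000, 0, 0, "secret"),
    ("192.0.2.3", 11000, "192.0.2.4", 11000),
    new_port)
print("%d %d %d %d" % (new_port, lid[2], pid[1], pid[3]))  # 11002 11002 11002 11002
```

Allocating from our own cluster's pool, rather than keeping the merging cluster's port numbers, is what guarantees that no two DRBD devices in the merged cluster share a port.
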
Rollback
++++++++

Until we actually (re)add any nodes, we can abort and roll back the
merge at any point. After merging the config, though, we have to restore
the backup copy of ``config.data`` (from another master candidate
node). For security reasons it is also a good idea to undo the
``id_dsa.pub`` distribution by going to every affected node and removing
the ``id_dsa.pub`` key again. Finally, we must remember to start the
Ganeti daemons and the instances again.

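Removing the key again amounts to filtering it out of each node's ``authorized_keys``. A minimal sketch of that filtering step, assuming plain entries of the form ``<type> <base64-key> [comment]`` without leading options; the function name is hypothetical, and fetching and writing back the file on each node is left out:

```python
def remove_authorized_key(authorized_keys, pubkey):
    """Return authorized_keys content with every entry for pubkey removed.

    Entries are compared on their first two fields (key type and
    base64-encoded key), so a different trailing comment still matches.
    Entries with leading options (e.g. from="...") are not recognised.
    """
    target = pubkey.split()[:2]
    if len(target) < 2:
        raise ValueError("expected '<type> <base64-key> [comment]'")
    kept = [line for line in authorized_keys.splitlines()
            if line.split()[:2] != target]
    return "\n".join(kept) + "\n" if kept else ""
```

Matching on the key itself rather than on the whole line avoids leaving the key behind when the comment field differs between nodes.
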
Verification
++++++++++++

Last but not least, we should verify that the merge was successful.
Therefore we run ``gnt-cluster verify``, which checks that the cluster
overall is in a healthy state. Additionally, it is possible to compare
the list of instances/nodes with a list made prior to the merge, to
make sure we did not lose any data, instances or nodes.

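The comparison can be automated with a few set operations. A minimal sketch, assuming the names were collected beforehand, for example with ``gnt-node list --no-headers -o name`` and ``gnt-instance list --no-headers -o name`` on both clusters before the merge and on the merged cluster afterwards; the function name is hypothetical:

```python
def compare_merge(ours_before, theirs_before, merged_after):
    """Compare name lists (nodes or instances) around a cluster merge.

    After the merge, the cluster should contain exactly the union of both
    clusters' names. Returns (missing, unexpected) as sorted lists; both
    are empty if the merge lost nothing and added nothing.
    """
    expected = set(ours_before) | set(theirs_before)
    actual = set(merged_after)
    return sorted(expected - actual), sorted(actual - expected)


missing, unexpected = compare_merge(
    ["node1", "node2"], ["node3", "node4"], ["node1", "node2", "node3"])
print("missing: %s, unexpected: %s" % (missing, unexpected))
# missing: ['node4'], unexpected: []
```

The same function works for node names and instance names; it should be run once for each.
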
Appendix
++++++++

cluster-merge.py
^^^^^^^^^^^^^^^^

Used to merge the cluster config. This is a proof of concept (POC) and
might differ from the actual production code.

::

  #!/usr/bin/python
  # Merge the config.data of another cluster into our own (offline).
  # Usage: cluster-merge.py <path to the merging cluster's config.data>

  import sys

  from ganeti import config
  from ganeti import constants

  if len(sys.argv) != 2:
    sys.exit("Usage: %s <config.data of merging cluster>" % sys.argv[0])

  # Our own configuration (default location), opened while masterd is
  # stopped; the merging cluster's configuration is only read from.
  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # AddNode/AddInstance expect an execution context id; since no real
  # job is running, a unique dummy id per call is used instead.
  fake_id = 0
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    # Merging nodes must not become master candidates right away
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        # DRBD ports were allocated by the other cluster and may clash
        # with ours, so allocate a fresh port from our own pool
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1

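The script above only covers step 8 of the cluster merge. As a minimal sketch of how the subsequent steps 9 to 11 could be sequenced, the following hypothetical helper only builds the command lines named in those steps; it does not run them, and executing them (and handling failures) is left to the caller:

```python
def post_merge_commands(merging_nodes):
    """Return the commands of merge steps 9 to 11, in execution order.

    Step 9 starts the master daemon without voting, step 10 re-adds each
    merging node, and step 11 redistributes the configuration.
    """
    commands = [["ganeti-masterd", "--no-voting", "--yes-do-it"]]
    for node in merging_nodes:
        commands.append(["gnt-node", "add", "--readd", node])
    commands.append(["gnt-cluster", "redist-conf"])
    return commands


for argv in post_merge_commands(["node3.example.com", "node4.example.com"]):
    print(" ".join(argv))
```

Returning argv lists instead of running them keeps the ordering testable and lets the caller decide whether to execute locally on the master or over SSH.
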
Feature changes
---------------
