From: René Nussbaumer
Date: Fri, 12 Feb 2010 10:39:13 +0000 (+0100)
Subject: Adding design doc for cluster merger
X-Git-Tag: v2.2.0beta0~171
X-Git-Url: https://code.grnet.gr/git/ganeti-local/commitdiff_plain/3605691e2488d12de6b0a9b03de1284737f7836a

Adding design doc for cluster merger

Signed-off-by: René Nussbaumer
Signed-off-by: Iustin Pop
Reviewed-by: Iustin Pop
(applied slightly modified patch)
---

diff --git a/Makefile.am b/Makefile.am
index 10650fc..7d585ef 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -145,6 +145,7 @@ docrst = \
 	doc/design-2.0.rst \
 	doc/design-2.1.rst \
 	doc/design-2.2.rst \
+	doc/design-cluster-merger.rst \
 	doc/devnotes.rst \
 	doc/glossary.rst \
 	doc/hooks.rst \
diff --git a/doc/design-cluster-merger.rst b/doc/design-cluster-merger.rst
new file mode 100644
index 0000000..2ab744e
--- /dev/null
+++ b/doc/design-cluster-merger.rst
@@ -0,0 +1,127 @@
+=====================
+Ganeti Cluster Merger
+=====================
+
+Current situation
+=================
+
+Currently there's no easy way to merge two or more clusters together.
+But in order to optimize resources, this is a needed missing piece. The
+goal of this design doc is to come up with an easy-to-use solution which
+allows you to merge two or more clusters together.
+
+Initial contact
+===============
+
+As the design of Ganeti is based on an autonomous system, Ganeti by
+itself has no way to reach nodes outside of its cluster. To overcome
+this situation we're required to prepare the cluster before we can go
+ahead with the actual merge: we have to replace at least the ssh keys on
+the affected nodes before we can perform any operation with ``gnt-``
+commands.
+
+To make this an automated process we'll ask the user to provide us with
+the root password of every cluster we have to merge. We use the password
+to grab the current ``id_dsa`` key and then rely on that ssh key for any
+further communication to be made until the cluster is fully merged.
+
+Cluster merge
+=============
+
+After initial contact we do the cluster merge:
+
+1. Grab the list of nodes
+2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
+3. Stop all instances running on the merging cluster
+4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
+5. Stop all Ganeti daemons on all merging nodes
+6. Grab the ``config.data`` from the master of the merging cluster
+7. Stop local ``ganeti-masterd``
+8. Merge the config:
+
+   1. Open our own cluster ``config.data``
+   2. Open cluster ``config.data`` of the merging cluster
+   3. Grab all nodes of the merging cluster
+   4. Set ``master_candidate`` to false on all merging nodes
+   5. Add the nodes to our own cluster ``config.data``
+   6. Grab all the instances on the merging cluster
+   7. Adjust the port if the instance has drbd layout:
+
+      1. In ``logical_id`` (index 2)
+      2. In ``physical_id`` (index 1 and 3)
+
+   8. Add the instances to our own cluster ``config.data``
+
+9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
+10. ``gnt-node add --readd`` on all merging nodes
+11. ``gnt-cluster redist-conf``
+12. Restart ``ganeti-masterd`` normally
+13. Enable ``ganeti-watcher`` again
+14. Start all merging instances again
+
+Rollback
+========
+
+Until we actually (re)add any nodes we can abort and roll back the merge
+at any point. After merging the config, though, we have to get the backup
+copy of ``config.data`` (from another master candidate node). And for
+security reasons it's a good idea to undo ``id_dsa.pub`` distribution by
+going to every affected node and removing the ``id_dsa.pub`` key again.
+Also, we have to keep in mind that we have to start the Ganeti daemons
+and start up the instances again.
+
+Verification
+============
+
+Last but not least we should verify that the merge was successful.
+Therefore we run ``gnt-cluster verify``, which ensures that the cluster
+overall is in a healthy state. Additionally, it's possible to compare
+the list of instances/nodes with a list made prior to the merge to
+make sure we didn't lose any data/instance/node.
+
+Appendix
+========
+
+cluster-merge.py
+----------------
+
+Used to merge the cluster config. This is a POC and might differ from
+actual production code.
+
+::
+
+  #!/usr/bin/python
+
+  import sys
+  from ganeti import config
+  from ganeti import constants
+
+  c_mine = config.ConfigWriter(offline=True)
+  c_other = config.ConfigWriter(sys.argv[1])
+
+  fake_id = 0
+  for node in c_other.GetNodeList():
+    node_info = c_other.GetNodeInfo(node)
+    node_info.master_candidate = False
+    c_mine.AddNode(node_info, str(fake_id))
+    fake_id += 1
+
+  for instance in c_other.GetInstanceList():
+    instance_info = c_other.GetInstanceInfo(instance)
+    for dsk in instance_info.disks:
+      if dsk.dev_type in constants.LDS_DRBD:
+        port = c_mine.AllocatePort()
+        logical_id = list(dsk.logical_id)
+        logical_id[2] = port
+        dsk.logical_id = tuple(logical_id)
+        physical_id = list(dsk.physical_id)
+        physical_id[1] = physical_id[3] = port
+        dsk.physical_id = tuple(physical_id)
+    c_mine.AddInstance(instance_info, str(fake_id))
+    fake_id += 1
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
diff --git a/doc/index.rst b/doc/index.rst
index 0d1e1c0..15b5054 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -17,6 +17,7 @@ Contents:
    design-2.0.rst
    design-2.1.rst
    design-2.2.rst
+   design-cluster-merger.rst
    locking.rst
    hooks.rst
    iallocator.rst
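
The DRBD port adjustment described in the design doc (a new port written to index 2 of ``logical_id`` and to indexes 1 and 3 of ``physical_id``) can be tried out without a Ganeti installation. The following is a minimal sketch, not Ganeti code: the ``Disk`` tuple, ``remap_drbd_port`` and ``allocate_port`` are hypothetical stand-ins, the start port 11000 is arbitrary, and the sample tuple contents are only illustrative. Only the index positions come from the patch above.

```python
from collections import namedtuple
from itertools import count

# Hypothetical stand-in for a Ganeti disk object; only the two fields
# touched by the merge are modelled here.
Disk = namedtuple("Disk", ["logical_id", "physical_id"])


def remap_drbd_port(disk, new_port):
    """Return a copy of disk with new_port at logical_id[2] and at
    physical_id[1] and physical_id[3], the indexes named in the doc."""
    logical_id = list(disk.logical_id)
    logical_id[2] = new_port
    physical_id = list(disk.physical_id)
    physical_id[1] = physical_id[3] = new_port
    return Disk(tuple(logical_id), tuple(physical_id))


# Hypothetical allocator: hands out consecutive ports from an arbitrary
# start value, standing in for the AllocatePort() call in the POC.
_ports = count(11000)


def allocate_port():
    return next(_ports)


# Illustrative sample data; the real tuple contents may differ.
old = Disk(
    logical_id=("node1", "node2", 11005, 0, 1, "secret"),
    physical_id=("192.0.2.1", 11005, "192.0.2.2", 11005, 0, "secret"),
)
new = remap_drbd_port(old, allocate_port())
print(new.logical_id)
print(new.physical_id)
```

Because the sketch builds a new tuple instead of mutating the disk in place, the original values remain available, which is convenient if the merge has to be rolled back before any node is re-added.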