=====================
Ganeti Cluster Merger
=====================

Current situation
=================

Currently there is no easy way to merge two or more clusters together.
However, this is a missing piece that is needed to optimize resource
usage. The goal of this design doc is to come up with an easy-to-use
solution which allows you to merge two or more clusters together.

Initial contact
===============

As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this, we have to prepare the clusters before we can go ahead with the
actual merge: we have to replace at least the SSH keys on the affected
nodes before we can run any operation through the ``gnt-`` commands.

To make this an automated process, we ask the user to provide the root
password of every cluster to be merged. We use the password to grab the
current ``id_dsa`` key and then rely on that SSH key for all further
communication until the cluster is fully merged.
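
A minimal sketch of how the grabbed key might be used for all later
communication, assuming the merging cluster's ``id_dsa`` has already
been copied to a local file. The helper names ``ssh_argv`` and
``run_on_node`` and the key path are hypothetical; only standard
OpenSSH client options are used: ``-i`` selects the identity file, and
``-o BatchMode=yes`` makes ``ssh`` fail instead of prompting for a
password, so the root password is needed only once.

```python
import shlex
import subprocess


def ssh_argv(node, key_file, command, user="root"):
    """Build an ssh command line that authenticates with key_file only.

    Hypothetical helper: key_file is the id_dsa grabbed from the merging
    cluster using the root password. BatchMode=yes stops ssh from falling
    back to an interactive password prompt.
    """
    return [
        "ssh",
        "-i", key_file,
        "-o", "BatchMode=yes",
        "%s@%s" % (user, node),
        command,
    ]


def run_on_node(node, key_file, command):
    """Run command on node via ssh and return its stdout (raises on error)."""
    argv = ssh_argv(node, key_file, command)
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Show (without running it) the command that would list the nodes of
    # the merging cluster; host name and key path are placeholders.
    argv = ssh_argv("node1.example.com", "/tmp/merge/id_dsa", "gnt-node list")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument list separately from running it keeps the key
handling in one place and makes the commands easy to log or dry-run.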

Cluster merge
=============

After initial contact we do the cluster merge:

1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:

   1. Open our own cluster ``config.data``
   2. Open cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances on the merging cluster
   7. Adjust the port if the instance has drbd layout:

      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (index 1 and 3)

   8. Add the instances to our own cluster ``config.data``

9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again
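
Step 8.7 only rewrites the port fields of each DRBD disk and leaves
every other field untouched. The standalone sketch below shows that
tuple rewriting on plain Python tuples, without importing Ganeti. The
example tuples are illustrative placeholders that follow the indices
given above (port at index 2 of ``logical_id``, at indices 1 and 3 of
``physical_id``); ``allocate_port`` is a hypothetical stand-in for our
own cluster's port allocator. Taking the new port from our own cluster
presumably avoids clashes with ports already in use there.

```python
import itertools


def readjust_drbd_ports(logical_id, physical_id, port):
    """Return copies of both id tuples with the DRBD port replaced.

    The port sits at index 2 of logical_id and at indices 1 and 3 of
    physical_id; all other fields are copied unchanged.
    """
    new_logical = list(logical_id)
    new_logical[2] = port
    new_physical = list(physical_id)
    new_physical[1] = new_physical[3] = port
    return tuple(new_logical), tuple(new_physical)


# Hypothetical stand-in for our cluster's port allocator
# (the appendix script uses ConfigWriter.AllocatePort instead).
allocate_port = itertools.count(11000).__next__

# Illustrative placeholder ids; only the port positions matter here.
logical_id = ("node1", "node2", 11005, 0, 0, "secret")
physical_id = ("192.0.2.1", 11005, "192.0.2.2", 11005, 0, "secret")

logical_id, physical_id = readjust_drbd_ports(
    logical_id, physical_id, allocate_port())
print(logical_id)   # the port field (index 2) is now 11000
print(physical_id)  # both port fields (indices 1 and 3) are now 11000
```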
61 |
|
62 |
Rollback |
63 |
======== |
64 |
|
65 |
Until we actually (re)add any nodes we can abort and rollback the merge |
66 |
at any point. After merging the config, though, we've to get the backup |
67 |
copy of ``config.data`` (from another master candidate node). And for |
68 |
security reasons it's a good idea to undo ``id_dsa.pub`` distribution by |
69 |
going on every affected node and remove the ``id_dsa.pub`` key again. |
70 |
Also we've to keep in mind, that we've to start the Ganeti daemons and |
71 |
starting up the instances again. |
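
Adding our key (step 2 of the merge) and removing it during rollback are
inverse operations on each node's ``authorized_keys`` file. The sketch
below treats the file contents as plain text; the function names are
hypothetical, and matching whole lines after stripping whitespace is a
simplifying assumption, since a real entry may carry options or a
comment that differ from the ``id_dsa.pub`` file. Making the add
idempotent means re-running step 2 after a partial failure does not
duplicate the key.

```python
def add_key(authorized_keys, pub_key):
    """Return authorized_keys content with pub_key present exactly once."""
    pub_key = pub_key.strip()
    lines = authorized_keys.splitlines()
    # Only append if no existing line is the same key (idempotent add).
    if pub_key not in (line.strip() for line in lines):
        lines.append(pub_key)
    return "\n".join(lines) + "\n"


def remove_key(authorized_keys, pub_key):
    """Return authorized_keys content with every copy of pub_key removed."""
    pub_key = pub_key.strip()
    kept = [line for line in authorized_keys.splitlines()
            if line.strip() != pub_key]
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    # Placeholder keys, not real key material.
    ours = "ssh-dss AAAAexampleOURS root@master"
    original = "ssh-rsa AAAAexampleOTHER admin@laptop\n"
    merged = add_key(original, ours)
    assert add_key(merged, ours) == merged       # adding twice changes nothing
    assert remove_key(merged, ours) == original  # rollback restores the file
```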
72 |
|
73 |
Verification |
74 |
============ |
75 |
|
76 |
Last but not least we should verify that the merge was successful. |
77 |
Therefore we run ``gnt-cluster verify``, which ensures that the cluster |
78 |
overall is in a healthy state. Additional it's also possible to compare |
79 |
the list of instances/nodes with a list made prior to the upgrade to |
80 |
make sure we didn't lose any data/instance/node. |
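
A minimal sketch of that comparison, assuming the instance (or node)
names have been captured as plain lists before and after the merge, for
example from the output of ``gnt-instance list`` and ``gnt-node list``;
how the lists are collected is left open here, and the function name is
hypothetical. Since the merge only adds nodes and instances to our
cluster, every name recorded beforehand on any of the clusters should
still be present afterwards.

```python
def missing_after_merge(before_lists, after):
    """Return the names recorded before the merge that are now missing.

    before_lists: one list of names per cluster (ours and each merged
    cluster), recorded before the merge.
    after: the names listed on the merged cluster.
    """
    expected = set()
    for names in before_lists:
        expected.update(names)
    return sorted(expected - set(after))


if __name__ == "__main__":
    ours = ["inst1", "inst2"]
    theirs = ["inst3"]
    merged = ["inst1", "inst2", "inst3"]
    lost = missing_after_merge([ours, theirs], merged)
    if lost:
        print("missing after merge:", ", ".join(lost))
    else:
        print("all instances present")
```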
81 |
|
82 |
Appendix |
83 |
======== |
84 |
|
85 |
cluster-merge.py |
86 |
---------------- |
87 |
|
88 |
Used to merge the cluster config. This is a POC and might differ from |
89 |
actual production code. |
90 |
|
91 |
:: |
92 |
|
93 |
#!/usr/bin/python |
94 |
|
95 |
import sys |
96 |
from ganeti import config |
97 |
from ganeti import constants |
98 |
|
99 |
c_mine = config.ConfigWriter(offline=True) |
100 |
c_other = config.ConfigWriter(sys.argv[1]) |
101 |
|
102 |
fake_id = 0 |
103 |
for node in c_other.GetNodeList(): |
104 |
node_info = c_other.GetNodeInfo(node) |
105 |
node_info.master_candidate = False |
106 |
c_mine.AddNode(node_info, str(fake_id)) |
107 |
fake_id += 1 |
108 |
|
109 |
for instance in c_other.GetInstanceList(): |
110 |
instance_info = c_other.GetInstanceInfo(instance) |
111 |
for dsk in instance_info.disks: |
112 |
if dsk.dev_type in constants.LDS_DRBD: |
113 |
port = c_mine.AllocatePort() |
114 |
logical_id = list(dsk.logical_id) |
115 |
logical_id[2] = port |
116 |
dsk.logical_id = tuple(logical_id) |
117 |
physical_id = list(dsk.physical_id) |
118 |
physical_id[1] = physical_id[3] = port |
119 |
dsk.physical_id = tuple(physical_id) |
120 |
c_mine.AddInstance(instance_info, str(fake_id)) |
121 |
fake_id += 1 |
122 |
|

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: