=====================
Ganeti Cluster Merger
=====================

Current situation
=================

Currently there's no easy way to merge two or more clusters together.
Yet merging is needed in order to optimize resource usage. The goal of
this design doc is to come up with an easy-to-use solution which allows
you to merge two or more clusters together.

Initial contact
===============

Because Ganeti is designed as an autonomous system, it has no way to
reach nodes outside of its own cluster. To overcome this we have to
prepare the cluster before we can go ahead with the actual merge: we
have to replace at least the ssh keys on the affected nodes before we
can do any operation with the ``gnt-`` commands.

To make this an automated process we'll ask the user to provide the
root password of every cluster to be merged. We use the password to
grab the current ``id_dsa`` key and then rely on that ssh key for all
further communication until the cluster is fully merged.

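Once the key has been fetched, every remote operation can go through
key-based ssh. The following is a minimal sketch of that idea, not the
merger's actual code: the helper names ``build_ssh_command`` and
``parse_node_list``, the key path and the host names are assumptions
made for illustration. It only builds the command line and parses
sample output; it does not contact any host.

```python
def build_ssh_command(host, key_path, remote_command):
    """Return an argv list that runs remote_command as root via key-based ssh."""
    return [
        "ssh",
        "-i", key_path,         # private key fetched during initial contact
        "-o", "BatchMode=yes",  # never fall back to a password prompt
        "root@%s" % host,
    ] + list(remote_command)


def parse_node_list(output):
    """Turn the output of "gnt-node list --no-headers -o name" into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


# Hypothetical master of the merging cluster and a hypothetical key location.
cmd = build_ssh_command(
    "master.other.example.com",
    "/tmp/merge/other_id_dsa",
    ["gnt-node", "list", "--no-headers", "-o", "name"],
)

# Sample output as the remote command might print it (made-up node names).
nodes = parse_node_list("node1.other.example.com\nnode2.other.example.com\n")
```

Building the command as a list instead of a single string avoids shell
quoting problems when it is later handed to something like
``subprocess.call``.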
Cluster merge
=============

After initial contact we do the cluster merge:

1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:

   1. Open our own cluster ``config.data``
   2. Open cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances on the merging cluster
   7. Adjust the port if the instance has drbd layout:

      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (index 1 and 3)

   8. Add the instances to our own cluster ``config.data``

9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again

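The port adjustment for drbd instances is the only place in the config
merge where instance data is rewritten rather than copied. Below is a
minimal sketch of that rewrite on plain tuples, independent of the
Ganeti libraries. The function name ``replace_port`` and all tuple
contents are made-up sample values; only the positions of the port
(index 2 in ``logical_id``, indices 1 and 3 in ``physical_id``) are
taken from the list above.

```python
def replace_port(logical_id, physical_id, new_port):
    """Return copies of both ids with the DRBD port replaced by new_port."""
    logical = list(logical_id)
    logical[2] = new_port                 # port lives at index 2
    physical = list(physical_id)
    physical[1] = physical[3] = new_port  # port appears at index 1 and 3
    return tuple(logical), tuple(physical)


# Made-up sample values; 11000 stands for the port used on the merging cluster.
old_logical = ("node1", "node2", 11000, 0, 0, "secret")
old_physical = ("192.0.2.1", 11000, "192.0.2.2", 11000, 0, "secret")

# 11005 stands for a port newly allocated on our own cluster.
new_logical, new_physical = replace_port(old_logical, old_physical, 11005)
```

The ids are tuples and therefore immutable, which is why the sketch
converts them to lists, changes the port and converts them back.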
Rollback
========

Until we actually (re)add any nodes we can abort and roll back the
merge at any point. After merging the config, though, we have to
restore ``config.data`` from the backup copy (taken from another master
candidate node). For security reasons it is also a good idea to undo
the ``id_dsa.pub`` distribution by going to every affected node and
removing the ``id_dsa.pub`` key again. We also have to keep in mind
that the Ganeti daemons and the instances have to be started again.

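Adding and later removing the public key amounts to editing the
``authorized_keys`` file on each node. The following is a minimal
sketch of both directions as pure string functions, so that applying
and undoing the change are symmetric. The function names and the sample
keys are assumptions for illustration; reading and writing the actual
file on each node is left out.

```python
def add_key(authorized_keys, pubkey):
    """Return authorized_keys content with pubkey appended if it is missing."""
    lines = authorized_keys.splitlines()
    key = pubkey.strip()
    if key not in lines:
        lines.append(key)
    return "\n".join(lines) + "\n"


def remove_key(authorized_keys, pubkey):
    """Return authorized_keys content without any line equal to pubkey."""
    key = pubkey.strip()
    lines = [line for line in authorized_keys.splitlines() if line != key]
    return "\n".join(lines) + "\n" if lines else ""


# Sample data; the key material is shortened and not a real key.
existing = "ssh-rsa AAAAexisting admin@node1\n"
merger_key = "ssh-dss AAAAmerger root@master\n"

merged = add_key(existing, merger_key)      # merge step: distribute the key
restored = remove_key(merged, merger_key)   # rollback: take it out again
```

Because ``add_key`` skips a key that is already present, running the
distribution twice leaves the file unchanged, which keeps a retried
merge from piling up duplicate entries.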
Verification
============

Last but not least we should verify that the merge was successful. To
do so we run ``gnt-cluster verify``, which checks that the cluster as a
whole is in a healthy state. Additionally, the list of instances and
nodes can be compared with a list made prior to the merge to make sure
we didn't lose any data, instance or node.

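The before/after comparison boils down to a set difference. A minimal
sketch, assuming the names have already been collected into plain
Python lists (for example from the output of ``gnt-instance list`` and
``gnt-node list`` on all clusters before the merge and on the merged
cluster afterwards); the function name and all instance and node names
are made up for illustration.

```python
def missing_after_merge(before, after):
    """Return the names present before the merge but absent afterwards."""
    return sorted(set(before) - set(after))


# Made-up names; "before" is the union of what all clusters reported
# prior to the merge, "after" is what the merged cluster reports.
instances_before = ["web1", "db1", "mail1"]
instances_after = ["db1", "web1", "mail1", "build1"]
nodes_before = ["node1", "node2", "node3"]
nodes_after = ["node1", "node3"]

lost_instances = missing_after_merge(instances_before, instances_after)
lost_nodes = missing_after_merge(nodes_before, nodes_after)
```

An empty result means nothing was lost; any name that is returned
should be investigated before the merge is considered complete.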
Appendix
========

cluster-merge.py
----------------

Used to merge the cluster config. This is a proof of concept and might
differ from the actual production code.

::

  #!/usr/bin/python

  import sys
  from ganeti import config
  from ganeti import constants

  # Our own cluster config and the config.data of the merging cluster,
  # the latter given as first command line argument
  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # Counter used to generate the id passed along with each Add* call
  fake_id = 0

  # Copy over all nodes, demoting them from master candidate
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  # Copy over all instances, giving every DRBD disk a port allocated
  # on our own cluster
  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: