=====================
Ganeti Cluster Merger
=====================

    
Current situation
=================

    
Currently there's no easy way to merge two or more clusters together,
although being able to do so is needed to optimize resource usage. The
goal of this design doc is to come up with an easy-to-use solution which
allows you to merge two or more clusters together.

    
Initial contact
===============

    
As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this situation we're required to prepare the cluster before we can go
ahead with the actual merge: we have to replace at least the ssh keys on
the affected nodes before we can do any operation with ``gnt-``
commands.

    
To make this an automated process we'll ask the user to provide us with
the root password of every cluster we have to merge. We use the password
to grab the current ``id_dsa`` key and then rely on that ssh key for any
further communication until the cluster is fully merged.

    
Cluster merge
=============

    
After initial contact we do the cluster merge:

    
1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:

    
   1. Open our own cluster ``config.data``
   2. Open cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances on the merging cluster
   7. Adjust the port if the instance has drbd layout:

    
      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (index 1 and 3)

    
   8. Add the instances to our own cluster ``config.data``

    
9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again
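
The port adjustment in sub-step 7 can be tried out without a running
cluster. What follows is a minimal sketch, not Ganeti code:
``remap_drbd_port`` and ``allocate_port`` are hypothetical names, the
disk is a plain dict standing in for the real disk object, and every
tuple value except the port positions named above is a placeholder. As
in the proof of concept in the appendix, the new port is assumed to come
from our own cluster's allocator.

```python
import itertools


def remap_drbd_port(disk, allocate_port):
    """Return a copy of ``disk`` that uses a freshly allocated port.

    ``disk`` is a plain dict (an assumption made for this sketch);
    ``allocate_port`` is any callable returning an unused port.
    """
    port = allocate_port()
    logical_id = list(disk["logical_id"])
    logical_id[2] = port                    # index 2 holds the port
    physical_id = list(disk["physical_id"])
    physical_id[1] = physical_id[3] = port  # indices 1 and 3 hold the port
    return dict(disk,
                logical_id=tuple(logical_id),
                physical_id=tuple(physical_id))


# Hypothetical allocator handing out ports from our own cluster's pool.
_pool = itertools.count(11000)


def allocate_port():
    return next(_pool)


# Placeholder values; only the port positions come from the design.
disk = {
    "logical_id": ("node-a", "node-b", 11005, 0, 0, "secret"),
    "physical_id": ("192.0.2.1", 11005, "192.0.2.2", 11005, 0, "secret"),
}
merged = remap_drbd_port(disk, allocate_port)
```

The function returns a new dict instead of changing its argument, so the
configuration read from the merging cluster stays available unchanged
should the merge be aborted.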

    
Rollback
========

    
Until we actually (re)add any nodes we can abort and roll back the merge
at any point. After merging the config, though, we have to restore the
backup copy of ``config.data`` (from another master candidate node). For
security reasons it's also a good idea to undo the ``id_dsa.pub``
distribution by going to every affected node and removing the
``id_dsa.pub`` key again. We also have to keep in mind that the Ganeti
daemons and the instances need to be started again.
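
Distributing the key and undoing that distribution are mirror
operations on ``authorized_keys``. The sketch below shows one way to
keep both idempotent. It works on the file content as a string and
leaves out reading and writing the file on each node; ``add_key`` and
``remove_key`` are hypothetical helpers, and the key strings are
placeholders.

```python
def add_key(authorized_keys, pubkey):
    """Return the file content with ``pubkey`` present exactly once."""
    lines = authorized_keys.splitlines()
    key = pubkey.strip()
    if key not in (line.strip() for line in lines):
        lines.append(key)
    return "\n".join(lines) + "\n"


def remove_key(authorized_keys, pubkey):
    """Return the file content with every copy of ``pubkey`` removed."""
    key = pubkey.strip()
    kept = [line for line in authorized_keys.splitlines()
            if line.strip() != key]
    return "\n".join(kept) + "\n" if kept else ""


ours = "ssh-dss AAAAexample root@our-master"   # placeholder public key
before = "ssh-rsa AAAAother admin@example\n"   # entry already on the node
during = add_key(before, ours)                 # state while merging
after = remove_key(during, ours)               # state after rollback
```

Because adding twice changes nothing and removing restores the original
content, a rollback can be rerun safely on nodes where it was
interrupted.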

    
Verification
============

    
Last but not least we should verify that the merge was successful.
Therefore we run ``gnt-cluster verify``, which ensures that the cluster
overall is in a healthy state. Additionally, it's possible to compare
the list of instances/nodes with a list made prior to the merge to
make sure we didn't lose any data/instance/node.
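
The list comparison boils down to a set difference. A minimal sketch,
assuming the names were recorded on every cluster before the merge and
are collected again afterwards (for example from ``gnt-node list`` and
``gnt-instance list`` output; gathering them is left out here).
``missing_after_merge`` and the host names are made up for
illustration.

```python
def missing_after_merge(before, after):
    """Return names present before the merge but absent afterwards."""
    return sorted(set(before) - set(after))


# Hypothetical inventories: names recorded on both clusters before the
# merge, and names reported by the merged cluster afterwards.
nodes_before = ["node1.example.com", "node2.example.com",
                "node3.example.com"]
nodes_after = ["node1.example.com", "node3.example.com"]

lost = missing_after_merge(nodes_before, nodes_after)
```

An empty result means nothing was lost; the same function can be used
for the instance list.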

    
Appendix
========

    
cluster-merge.py
----------------

    
Used to merge the cluster config. This is a proof of concept and might
differ from the actual production code.

    
::

    
  #!/usr/bin/python

  import sys
  from ganeti import config
  from ganeti import constants

  # Our own cluster config, and the config.data of the merging cluster
  # given as the first command line argument.
  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # Dummy id passed as second argument to AddNode/AddInstance.
  fake_id = 0
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    # Merging nodes must not stay master candidates.
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        # Allocate a port from our own cluster and store it in
        # logical_id (index 2) and physical_id (index 1 and 3).
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1

    
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: