Revision ac932df1

b/doc/design-2.1.rst
reading/writing to disk fails constantly.


New Features
------------

Automated Ganeti Cluster Merger
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  
Current situation
+++++++++++++++++

Currently there is no easy way to merge two or more clusters. Merging
is, however, a missing piece that is needed to optimize resource usage.
The goal of this design doc is to come up with an easy-to-use solution
that allows you to merge two or more clusters.

  
Initial contact
+++++++++++++++

As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this we have to prepare the cluster before we can go ahead with the
actual merge: we have to replace at least the ssh keys on the affected
nodes before we can perform any operation with ``gnt-`` commands.

To make this an automated process we ask the user to provide the root
password of every cluster that is to be merged. We use the password to
grab the current ``id_dsa`` key and then rely on that ssh key for any
further communication until the cluster is fully merged.

  
Cluster merge
+++++++++++++

After initial contact we do the cluster merge:

1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:

   1. Open our own cluster ``config.data``
   2. Open cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances on the merging cluster
   7. Adjust the port if the instance has drbd layout:

      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (index 1 and 3)

   8. Add the instances to our own cluster ``config.data``

9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again
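
The port adjustment in step 8.7 can be illustrated on plain tuples,
without a Ganeti installation. This is a minimal sketch, not the merger
code itself: the helper name ``remap_drbd_port`` and the sample tuple
contents are made up for illustration. Only the index positions (2 in
``logical_id``, 1 and 3 in ``physical_id``) are taken from the list
above.

```python
def remap_drbd_port(logical_id, physical_id, new_port):
    """Return copies of both IDs with the DRBD port replaced.

    Hypothetical helper: only the index positions come from the design;
    all other tuple fields are treated as opaque and left unchanged.
    """
    logical = list(logical_id)
    logical[2] = new_port
    physical = list(physical_id)
    physical[1] = physical[3] = new_port
    return tuple(logical), tuple(physical)


# Made-up sample values; real IDs would come from ``config.data``.
old_logical = ("node1", "node2", 11000, 0, 1, "secret")
old_physical = ("192.0.2.1", 11000, "192.0.2.2", 11000, 0, "secret")

new_logical, new_physical = remap_drbd_port(old_logical, old_physical, 11042)
```

Returning new tuples instead of modifying the disk object in place keeps
the original values available, which may help if the merge has to be
aborted.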

  
Rollback
++++++++

Until we actually (re)add any nodes we can abort and roll back the merge
at any point. After merging the config, though, we have to restore the
backup copy of ``config.data`` (from another master candidate node). For
security reasons it is also a good idea to undo the ``id_dsa.pub``
distribution by going to every affected node and removing the
``id_dsa.pub`` key again. We also have to keep in mind that the Ganeti
daemons and the instances have to be started again.
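
Undoing the key distribution amounts to deleting one line from each
node's ``authorized_keys``. A minimal sketch of that text operation,
assuming the file content has already been read into a string: the
function name ``remove_key`` and the key strings are made up for
illustration, and fetching or writing the file on the node is left out.

```python
def remove_key(authorized_keys_text, public_key):
    """Return the file content without lines equal to public_key.

    Hypothetical helper: lines are compared after stripping surrounding
    whitespace, and all other lines are kept in their original order.
    """
    wanted = public_key.strip()
    kept = [line for line in authorized_keys_text.splitlines()
            if line.strip() != wanted]
    return "".join(line + "\n" for line in kept)


# Made-up keys for illustration only.
merger_key = "ssh-dss AAAA-EXAMPLE-MERGER root@master-a"
before = ("ssh-dss AAAA-EXAMPLE-OTHER root@node-b\n"
          + merger_key + "\n")

after = remove_key(before, merger_key)
```

Calling the function a second time on the result leaves it unchanged, so
the rollback step can be repeated safely on a node that was already
cleaned up.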

  
Verification
++++++++++++

Last but not least we should verify that the merge was successful.
To do so we run ``gnt-cluster verify``, which checks that the cluster
overall is in a healthy state. Additionally, it is possible to compare
the list of instances/nodes with a list made prior to the merge to
make sure we didn't lose any data/instance/node.
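
The list comparison can be as simple as a set difference. A minimal
sketch, assuming the node (or instance) names were saved as lists of
strings before the merge and collected again afterwards: the function
name ``missing_after_merge`` and the host names are made up for
illustration.

```python
def missing_after_merge(before, after):
    """Return the names present before the merge but absent afterwards."""
    return sorted(set(before) - set(after))


# Made-up names; the real lists would be gathered from both clusters
# before the merge and from the merged cluster afterwards.
nodes_before = ["node1.a.example.com", "node1.b.example.com"]
nodes_after = ["node1.a.example.com"]

lost = missing_after_merge(nodes_before, nodes_after)
```

An empty result means nothing from the earlier list went missing. It
does not replace ``gnt-cluster verify``, which checks the health of the
cluster rather than its inventory.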

  
Appendix
++++++++

cluster-merge.py
^^^^^^^^^^^^^^^^

Used to merge the cluster config. This is a proof of concept (POC) and
might differ from the actual production code.

  
::

  #!/usr/bin/python

  import sys
  from ganeti import config
  from ganeti import constants

  # Our own cluster config and the config.data of the merging cluster
  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # Counter that gives every Add* call a distinct second argument
  fake_id = 0

  # Add all merging nodes, demoted from master candidate
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  # Add all merging instances; DRBD disks get a port from our cluster
  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1

  
Feature changes
---------------

  
/dev/null
=====================
Ganeti Cluster Merger
=====================

  
Current situation
=================

Currently there is no easy way to merge two or more clusters. Merging
is, however, a missing piece that is needed to optimize resource usage.
The goal of this design doc is to come up with an easy-to-use solution
that allows you to merge two or more clusters.

  
Initial contact
===============

As the design of Ganeti is based on an autonomous system, Ganeti by
itself has no way to reach nodes outside of its cluster. To overcome
this we have to prepare the cluster before we can go ahead with the
actual merge: we have to replace at least the ssh keys on the affected
nodes before we can perform any operation with ``gnt-`` commands.

To make this an automated process we ask the user to provide the root
password of every cluster that is to be merged. We use the password to
grab the current ``id_dsa`` key and then rely on that ssh key for any
further communication until the cluster is fully merged.

  
Cluster merge
=============

After initial contact we do the cluster merge:

1. Grab the list of nodes
2. On all nodes add our own ``id_dsa.pub`` key to ``authorized_keys``
3. Stop all instances running on the merging cluster
4. Disable ``ganeti-watcher`` as it tries to restart Ganeti daemons
5. Stop all Ganeti daemons on all merging nodes
6. Grab the ``config.data`` from the master of the merging cluster
7. Stop local ``ganeti-masterd``
8. Merge the config:

   1. Open our own cluster ``config.data``
   2. Open cluster ``config.data`` of the merging cluster
   3. Grab all nodes of the merging cluster
   4. Set ``master_candidate`` to false on all merging nodes
   5. Add the nodes to our own cluster ``config.data``
   6. Grab all the instances on the merging cluster
   7. Adjust the port if the instance has drbd layout:

      1. In ``logical_id`` (index 2)
      2. In ``physical_id`` (index 1 and 3)

   8. Add the instances to our own cluster ``config.data``

9. Start ``ganeti-masterd`` with ``--no-voting`` ``--yes-do-it``
10. ``gnt-node add --readd`` on all merging nodes
11. ``gnt-cluster redist-conf``
12. Restart ``ganeti-masterd`` normally
13. Enable ``ganeti-watcher`` again
14. Start all merging instances again

  
Rollback
========

Until we actually (re)add any nodes we can abort and roll back the merge
at any point. After merging the config, though, we have to restore the
backup copy of ``config.data`` (from another master candidate node). For
security reasons it is also a good idea to undo the ``id_dsa.pub``
distribution by going to every affected node and removing the
``id_dsa.pub`` key again. We also have to keep in mind that the Ganeti
daemons and the instances have to be started again.

  
Verification
============

Last but not least we should verify that the merge was successful.
To do so we run ``gnt-cluster verify``, which checks that the cluster
overall is in a healthy state. Additionally, it is possible to compare
the list of instances/nodes with a list made prior to the merge to
make sure we didn't lose any data/instance/node.

  
Appendix
========

cluster-merge.py
----------------

Used to merge the cluster config. This is a proof of concept (POC) and
might differ from the actual production code.

  
::

  #!/usr/bin/python

  import sys
  from ganeti import config
  from ganeti import constants

  # Our own cluster config and the config.data of the merging cluster
  c_mine = config.ConfigWriter(offline=True)
  c_other = config.ConfigWriter(sys.argv[1])

  # Counter that gives every Add* call a distinct second argument
  fake_id = 0

  # Add all merging nodes, demoted from master candidate
  for node in c_other.GetNodeList():
    node_info = c_other.GetNodeInfo(node)
    node_info.master_candidate = False
    c_mine.AddNode(node_info, str(fake_id))
    fake_id += 1

  # Add all merging instances; DRBD disks get a port from our cluster
  for instance in c_other.GetInstanceList():
    instance_info = c_other.GetInstanceInfo(instance)
    for dsk in instance_info.disks:
      if dsk.dev_type in constants.LDS_DRBD:
        port = c_mine.AllocatePort()
        logical_id = list(dsk.logical_id)
        logical_id[2] = port
        dsk.logical_id = tuple(logical_id)
        physical_id = list(dsk.physical_id)
        physical_id[1] = physical_id[3] = port
        dsk.physical_id = tuple(physical_id)
    c_mine.AddInstance(instance_info, str(fake_id))
    fake_id += 1

  
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
