====================================
Moving instances across node groups
====================================

This design document explains the changes needed in Ganeti to perform
instance moves across node groups. Reader familiarity with the following
existing documents is advised:

- :doc:`Current IAllocator specification <iallocator>`
- :doc:`Shared storage model in 2.3+ <design-shared-storage>`

Motivation and design proposal
==============================

At the moment, moving instances away from their primary or secondary
nodes with the ``relocate`` and ``multi-evacuate`` IAllocator calls
restricts target nodes to those on the same node group. This ensures a
mobility domain is never crossed, and allows normal operation of each
node group to be confined within itself.

It is desirable, however, to have a way of moving instances across node
groups so that, for example, it is possible to move a set of instances
to another group for policy reasons, or completely empty a given group
to perform maintenance operations.

To implement this, we propose a new ``multi-relocate`` IAllocator call
that will be able to compute inter-group instance moves, taking into
account mobility domains as appropriate. The interface proposed below
should be enough to cover the use cases mentioned above.

.. _multi-reloc-detailed-design:

Detailed design
===============

We introduce a new ``multi-relocate`` IAllocator call whose input will
be a list of instances to move, and a "mode of operation" that will
determine what groups will be candidates to receive the new instances.

The mode of operation will be one of:

- *Stay in group*: the instances will be moved off their current nodes,
  but will stay in the same group; this is what the ``relocate`` call
  does, but here it can act on multiple instances. (Typically, the
  source nodes will be marked as drained, to avoid just exchanging
  instances among them.)

- *Change group*: this mode accepts one extra parameter, a list of node
  group UUIDs; the instances will be moved away from their current
  group, to any of the groups in this list. If the list is empty, the
  request is, simply, "change group": the instances are placed in any
  group but their original one.

- *Any*: for each instance, any group is valid, including its current
  one.

In all modes, the groups' ``alloc_policy`` attribute will be honored.
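
As a minimal sketch of the request side, the helper below builds a
hypothetical ``multi-relocate`` request with the three modes above. The
field names (``reloc_mode``, ``target_groups``) and mode strings are
illustrative assumptions for this document, not the final wire format,
which is defined by the IAllocator protocol::

```python
# Hypothetical request builder for the proposed multi-relocate call.
# Field names and mode strings are assumptions made for illustration.
ALLOWED_MODES = ("keep-group", "change-group", "any-group")

def build_multi_relocate_request(instances, mode, target_groups=None):
    """Build an illustrative multi-relocate IAllocator request.

    target_groups is only meaningful in "change-group" mode; an empty
    list there means "any group but the current one".
    """
    if mode not in ALLOWED_MODES:
        raise ValueError("unknown relocation mode: %s" % mode)
    request = {
        "type": "multi-relocate",
        "instances": list(instances),
        "reloc_mode": mode,
    }
    if mode == "change-group":
        # An empty list is valid: it means "any group but the original"
        request["target_groups"] = list(target_groups or [])
    return request

req = build_multi_relocate_request(
    ["inst1.example.com", "inst2.example.com"],
    "change-group",
    target_groups=["uuid-group-b"])
```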

.. _multi-reloc-result:

Result
------

In all storage models, an inter-group move can be modeled as a sequence
of **replace secondary**, **migration** and **failover** operations
(when shared storage is used, they will all be failover or migration
operations within the corresponding mobility domain).

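For a DRBD instance, one plausible decomposition along these lines is
sketched below: point the secondary at a node in the target group, fail
over onto it, then rebuild the secondary inside the target group. The
exact ordering is an assumption of this sketch, not a prescription of
the design; node names are hypothetical::

```python
# Illustrative decomposition of one DRBD inter-group move into the
# opcode kinds named above. The ordering is an assumption for this
# sketch; the allocator is free to choose a different sequence.
def drbd_inter_group_move(instance, new_group_nodes):
    """Return a possible opcode sequence moving a DRBD instance into
    a new group, using two nodes of that group."""
    node_a, node_b = new_group_nodes[:2]
    return [
        # Step 1: make a node of the target group the new secondary
        {"OP_ID": "OP_INSTANCE_REPLACE_DISKS",
         "instance_name": instance,
         "mode": "replace_new_secondary",
         "remote_node": node_a},
        # Step 2: fail over, so the primary is now in the target group
        {"OP_ID": "OP_INSTANCE_FAILOVER",
         "instance_name": instance},
        # Step 3: replace the old (remote) secondary with a node of
        # the target group, completing the move
        {"OP_ID": "OP_INSTANCE_REPLACE_DISKS",
         "instance_name": instance,
         "mode": "replace_new_secondary",
         "remote_node": node_b},
    ]
```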
The result is expected to be a list of jobsets. Each jobset contains
lists of serialized opcodes. Example::

  [
    [
      { "OP_ID": "OP_INSTANCE_MIGRATE",
        "instance_name": "inst1.example.com"
      },
      { "OP_ID": "OP_INSTANCE_MIGRATE",
        "instance_name": "inst2.example.com"
      }
    ],
    [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
        "instance_name": "inst2.example.com",
        "mode": "replace_new_secondary",
        "remote_node": "node4.example.com"
      }
    ],
    [
      { "OP_ID": "OP_INSTANCE_FAILOVER",
        "instance_name": "inst8.example.com"
      }
    ]
  ]

Accepted opcodes:

- ``OP_INSTANCE_FAILOVER``
- ``OP_INSTANCE_MIGRATE``
- ``OP_INSTANCE_REPLACE_DISKS``

Starting with the first set, Ganeti will submit all jobs of a set at the
same time, enabling execution in parallel. Upon completion of all jobs
in a set, the process is repeated for the next one. Ganeti is at liberty
to abort the execution of the relocation after any jobset. In such a
case the user is notified and can restart the relocation.
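
The execution loop described above can be sketched as follows. The
``submit_job`` and ``wait_for_job`` callbacks are placeholders standing
in for the Ganeti job queue (not its real API), and, following the
example, each jobset entry is treated as a single serialized opcode
submitted as one job::

```python
# Sketch of the jobset execution loop: jobsets run sequentially, the
# jobs inside each set run in parallel, and execution may stop at any
# jobset boundary. submit_job/wait_for_job are hypothetical callbacks.
ACCEPTED_OP_IDS = frozenset([
    "OP_INSTANCE_FAILOVER",
    "OP_INSTANCE_MIGRATE",
    "OP_INSTANCE_REPLACE_DISKS",
])

def execute_jobsets(jobsets, submit_job, wait_for_job):
    """Run jobsets in order; return the number of fully executed sets.

    submit_job(op) -> job_id submits one job; wait_for_job(job_id)
    -> bool blocks until that job finishes and reports success.
    Execution stops after the first jobset containing a failed job,
    leaving the user free to restart the relocation.
    """
    for completed, jobset in enumerate(jobsets):
        for op in jobset:
            if op["OP_ID"] not in ACCEPTED_OP_IDS:
                raise ValueError("opcode not allowed: %s" % op["OP_ID"])
        # Submit every job of the set at once so they run in parallel
        job_ids = [submit_job(op) for op in jobset]
        # Wait for all of them before moving on to the next set
        results = [wait_for_job(jid) for jid in job_ids]
        if not all(results):
            return completed
    return len(jobsets)
```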

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: