====================================
Moving instances across node groups
====================================

This design document explains the changes needed in Ganeti to perform
instance moves across node groups. Reader familiarity with the
following existing documents is advised:

- :doc:`Current IAllocator specification <iallocator>`
- :doc:`Shared storage model in 2.3+ <design-shared-storage>`

Motivation and design proposal
==============================

At the moment, moving instances away from their primary or secondary
nodes with the ``relocate`` and ``multi-evacuate`` IAllocator calls
restricts target nodes to those in the same node group. This ensures a
mobility domain is never crossed, and allows normal operation of each
node group to be confined within itself.

It is desirable, however, to have a way of moving instances across node
groups so that, for example, it is possible to move a set of instances
to another group for policy reasons, or completely empty a given group
to perform maintenance operations.

To implement this, we propose the addition of new IAllocator calls to
compute inter-group instance moves and group-aware node evacuation,
taking into account mobility domains as appropriate. The interface
proposed below should be enough to cover the use cases mentioned above.

With the implementation of this design proposal, the previous
``multi-evacuate`` mode will be deprecated.

.. _multi-reloc-detailed-design:

Detailed design
===============

All requests honor the groups' ``alloc_policy`` attribute.

Changing instances' groups
--------------------------

Takes a list of instances and a list of node group UUIDs; the instances
will be moved away from their current group, to any of the groups in
the target list. All instances need to have their primary node in the
same group, which must not be one of the target groups. If the target
group list is empty, the request is simply "change group" and the
instances are placed in any group but their original one.

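For illustration, such a request could be expressed along the
following lines (the field and value names in this sketch are purely
illustrative; the authoritative request format belongs to the
IAllocator specification)::

  {
    "type": "change-group",
    "instances": ["inst1.example.com", "inst2.example.com"],
    "target_groups": ["uuid-of-group2", "uuid-of-group3"]
  }
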
Node evacuation
---------------

Evacuates instances off their primary and/or secondary nodes. The
evacuation mode can be given as ``primary-only``, ``secondary-only`` or
``all``. The call is given a list of instances whose primary nodes need
to be in the same node group. The returned nodes need to be in the same
group as the original primary node.

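As an equally illustrative sketch (again, the exact field names are a
matter for the IAllocator specification, not this document), an
evacuation request could look like::

  {
    "type": "node-evacuate",
    "instances": ["inst1.example.com", "inst2.example.com"],
    "evac_mode": "primary-only"
  }
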
.. _multi-reloc-result:

Result
------

In all storage models, an inter-group move can be modeled as a sequence
of **replace secondary**, **migration** and **failover** operations
(when shared storage is used, they will all be failover or migration
operations within the corresponding mobility domain). For a DRBD-based
instance, for example, the secondary is first replaced with a node in
the target group, the instance is then migrated or failed over to that
node, and finally the former primary (now the secondary) is replaced
with a second node in the target group.

The result of the operations described above must contain two lists of
instances and a list of jobs (each of which is a list of serialized
opcodes) to actually execute the operation. :doc:`Job dependencies
<design-chained-jobs>` can be used to force jobs to run in a certain
order while still making use of parallelism.

The two lists of instances describe which instances could be
moved/migrated and which couldn't for some reason ("unsuccessful"). The
union of the instances in the two lists must be equal to the set of
instances given in the original request. The successful list of
instances contains elements as follows::

  (instance name, target group name, [chosen node names])

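A successful entry could, for example, look like this (instance,
group and node names are purely illustrative)::

  ("inst1.example.com", "group2",
   ["node4.example.com", "node5.example.com"])
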
The choice of names is simply for readability reasons (for example,
Ganeti could log the computed solution in the job information) and for
being able to check (manually) for consistency that the generated
opcodes match the intended target groups/nodes. Note that for the
node-evacuate operation, the group is not changed, but it should still
be returned as such (as it's easier to have the same return type for
both operations).

The unsuccessful list of instances contains elements as follows::

  (instance name, explanation)

where ``explanation`` is a string describing why the plugin was not
able to relocate the instance.

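For example (the explanation text is illustrative)::

  ("inst5.example.com", "no target group node has enough memory")
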
The client is given a list of job IDs (see the :doc:`design for
LU-generated jobs <design-lu-generated-jobs>`) which it can watch.
Failures should be reported to the user.

.. highlight:: python

Example job list::

  [
    # First job
    [
      { "OP_ID": "OP_INSTANCE_MIGRATE",
        "instance_name": "inst1.example.com",
      },
      { "OP_ID": "OP_INSTANCE_MIGRATE",
        "instance_name": "inst2.example.com",
      },
    ],
    # Second job
    [
      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
        "depends": [
          # Relative dependency: wait for the previous job in this
          # submission (-1) and require that it succeeded
          [-1, ["success"]],
          ],
        "instance_name": "inst2.example.com",
        "mode": "replace_new_secondary",
        "remote_node": "node4.example.com",
      },
    ],
    # Third job
    [
      { "OP_ID": "OP_INSTANCE_FAILOVER",
        "depends": [
          # Wait for the job two positions back (-2); the empty status
          # list means any final status is acceptable
          [-2, []],
          ],
        "instance_name": "inst8.example.com",
      },
    ],
  ]

Accepted opcodes:

- ``OP_INSTANCE_FAILOVER``
- ``OP_INSTANCE_MIGRATE``
- ``OP_INSTANCE_REPLACE_DISKS``
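
As a sanity check, a consumer of the returned job list could verify
that only these opcodes appear in it. A minimal sketch, assuming each
job is a list of serialized opcodes as described above (the
``CheckJobList`` helper is illustrative, not part of any existing
Ganeti API)::

  #: Opcodes a plugin may emit for the requests in this design
  _ACCEPTED_OP_IDS = frozenset([
    "OP_INSTANCE_FAILOVER",
    "OP_INSTANCE_MIGRATE",
    "OP_INSTANCE_REPLACE_DISKS",
    ])

  def CheckJobList(jobs):
    """Verify that all opcodes in a list of jobs are acceptable.

    @param jobs: list of jobs, each a list of serialized opcodes
    @raise ValueError: if an unknown or missing opcode ID is found

    """
    for (idx, job) in enumerate(jobs):
      for op in job:
        op_id = op.get("OP_ID")
        if op_id not in _ACCEPTED_OP_IDS:
          raise ValueError("Job %d contains unsupported opcode %r" %
                           (idx, op_id))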

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: