X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/6d267b8142f9ec81240446110e2026bbb36ad1f8..65c9591cb40abb922116ed475fd2693df689db38:/doc/design-multi-reloc.rst

diff --git a/doc/design-multi-reloc.rst b/doc/design-multi-reloc.rst
index fca80fa..039d51d 100644
--- a/doc/design-multi-reloc.rst
+++ b/doc/design-multi-reloc.rst
@@ -23,52 +23,124 @@ groups so that, for example, it is possible to move a set of
 instances to another group for policy reasons, or completely empty a
 given group to perform maintenance operations.
 
-To implement this, we propose a new ``multi-relocate`` IAllocator call
-that will be able to compute inter-group instance moves, taking into
-account mobility domains as appropriate. The interface proposed below
-should be enough to cover the use cases mentioned above.
+To implement this, we propose the addition of new IAllocator calls to
+compute inter-group instance moves and group-aware node evacuation,
+taking into account mobility domains as appropriate. The interface
+proposed below should be enough to cover the use cases mentioned above.
+
+With the implementation of this design proposal, the previous
+``multi-evacuate`` mode will be deprecated.
 
 .. _multi-reloc-detailed-design:
 
 Detailed design
 ===============
 
-We introduce a new ``multi-relocate`` IAllocator call whose input will
-be a list of instances to move, and a "mode of operation" that will
-determine what groups will be candidates to receive the new instances.
+All requests honor the groups' ``alloc_policy`` attribute.
 
-The mode of operation will be one of:
+Changing instance's groups
+--------------------------
 
-- *Stay in group*: the instances will be moved off their current nodes,
-  but will stay in the same group; this is what the ``relocate`` call
-  does, but here it can act on multiple instances. (Typically, the
-  source nodes will be marked as drained, to avoid just exchanging
-  instances among them.)
+Takes a list of instances and a list of node group UUIDs; the instances
+will be moved away from their current group, to any of the groups in the
+target list. All instances need to have their primary node in the same
+group, which may not be a target group. If the target group list is
+empty, the request is simply "change group" and the instances are placed
+in any group but their original one.
 
-- *Change group*: this mode accepts one extra parameter, a list of node
-  group UUIDs; the instances will be moved away from their current
-  group, to any of the groups in this list. If the list is empty, the
-  request is, simply, "change group": the instances are placed in any
-  group but their original one.
+Node evacuation
+---------------
 
-- *Any*: for each instance, any group is valid, including its current
-  one.
+Evacuates instances off their primary nodes. The evacuation mode
+can be given as ``primary-only``, ``secondary-only`` or
+``all``. The call is given a list of instances whose primary nodes need
+to be in the same node group. The returned nodes need to be in the same
+group as the original primary node.
 
-In all modes, the groups' ``alloc_policy`` attribute will be honored.
+.. _multi-reloc-result:
 
 Result
 ------
 
 In all storage models, an inter-group move can be modeled as a sequence
-of **replace secondary** and **failover** operations (when shared
-storage is used, they will all be failover operations within the
-corresponding mobility domain). This will be represented as a list of
-``(instance, [operations])`` pairs.
-
-For replace secondary operations, a new secondary node must be
-specified. For failover operations, a node *may* be specified when
-necessary, e.g. when shared storage is in use and there's no designated
-secondary for the instance.
+of **replace secondary**, **migration** and **failover** operations
+(when shared storage is used, they will all be failover or migration
+operations within the corresponding mobility domain).
+
+The result of the operations described above must contain two lists of
+instances and a list of jobs (each of which is a list of serialized
+opcodes) to actually execute the operation. :doc:`Job dependencies
+` can be used to force jobs to run in a certain
+order while still making use of parallelism.
+
+The two lists of instances describe which instances could be
+moved/migrated and which couldn't for some reason ("unsuccessful"). The
+union of the instances in the two lists must be equal to the set of
+instances given in the original request. The successful list of
+instances contains elements as follows::
+
+  (instance name, target group name, [chosen node names])
+
+The choice of names is simply for readability reasons (for example,
+Ganeti could log the computed solution in the job information) and for
+being able to check (manually) for consistency that the generated
+opcodes match the intended target groups/nodes. Note that for the
+node-evacuate operation, the group is not changed, but it should still
+be returned as such (as it's easier to have the same return type for
+both operations).
+
+The unsuccessful list of instances contains elements as follows::
+
+  (instance name, explanation)
+
+where ``explanation`` is a string describing why the plugin was not able
+to relocate the instance.
+
+The client is given a list of job IDs (see the :doc:`design for
+LU-generated jobs `) which it can watch.
+Failures should be reported to the user.
+
+.. highlight:: python
+
+Example job list::
+
+  [
+    # First job
+    [
+      { "OP_ID": "OP_INSTANCE_MIGRATE",
+        "instance_name": "inst1.example.com",
+      },
+      { "OP_ID": "OP_INSTANCE_MIGRATE",
+        "instance_name": "inst2.example.com",
+      },
+    ],
+    # Second job
+    [
+      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
+        "depends": [
+          [-1, ["success"]],
+          ],
+        "instance_name": "inst2.example.com",
+        "mode": "replace_new_secondary",
+        "remote_node": "node4.example.com",
+      },
+    ],
+    # Third job
+    [
+      { "OP_ID": "OP_INSTANCE_FAILOVER",
+        "depends": [
+          [-2, []],
+          ],
+        "instance_name": "inst8.example.com",
+      },
+    ],
+  ]
+
+Accepted opcodes:
+
+- ``OP_INSTANCE_FAILOVER``
+- ``OP_INSTANCE_MIGRATE``
+- ``OP_INSTANCE_REPLACE_DISKS``
 
 .. vim: set textwidth=72 :
 .. Local Variables:
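The **Result** section of this patch sets out rules that can be checked mechanically. The result carries two instance lists. Their union must equal the requested set. Only the three accepted opcodes may appear in the job list. Below is a minimal Python sketch of such a check. It assumes the result has already been parsed into the tuple and job-list shapes shown in the patch. The names ``check_result`` and ``ACCEPTED_OPCODES`` and the message strings are hypothetical, not Ganeti code. Two further rules are assumptions on top of the text. The two lists must be disjoint. Every ``depends`` entry must be a negative, relative reference to an earlier job in the same list.

```python
# Hypothetical consistency check for the relocation result format; the names
# and messages are illustrative assumptions, not part of Ganeti.

ACCEPTED_OPCODES = frozenset([
    "OP_INSTANCE_FAILOVER",
    "OP_INSTANCE_MIGRATE",
    "OP_INSTANCE_REPLACE_DISKS",
])


def check_result(requested, moved, failed, jobs):
    """Return a list of problems found in a result (empty if none found).

    requested -- instance names from the original request
    moved     -- list of (instance name, target group name, [node names])
    failed    -- list of (instance name, explanation)
    jobs      -- list of jobs, each a list of opcode dicts
    """
    problems = []
    moved_names = set(name for (name, _group, _nodes) in moved)
    failed_names = set(name for (name, _why) in failed)

    # Assumption: an instance is either moved or unsuccessful, never both.
    both = moved_names & failed_names
    if both:
        problems.append("in both lists: %s" % ", ".join(sorted(both)))

    # Together, the two lists must cover exactly the requested instances.
    if moved_names | failed_names != set(requested):
        problems.append("instance lists do not match the request")

    for idx, job in enumerate(jobs):
        for op in job:
            op_id = op.get("OP_ID")
            if op_id not in ACCEPTED_OPCODES:
                problems.append("job %d: opcode %s not accepted" % (idx, op_id))
            # Assumption: dependencies are relative (negative) and must point
            # at an earlier job in this same list.
            for dep, _statuses in op.get("depends", []):
                if dep >= 0 or idx + dep < 0:
                    problems.append("job %d: bad dependency %d" % (idx, dep))
    return problems
```

Run it on the example job list from the patch, with ``inst1``, ``inst2`` and ``inst8`` all reported as moved. It returns an empty list, because each dependency points at the first job.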
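The example job list in the patch writes dependencies as ``[relative job, [statuses]]``, with negative numbers such as ``-1`` and ``-2``. This sketch reads ``-1`` as "the job immediately before this one in the list" and ``-2`` as "two jobs back". That reading fits the example, but it is an interpretation, not a description of Ganeti's job queue. Under it, converting the references to absolute job IDs is simple arithmetic. The sketch also assumes the jobs are submitted in list order and receive consecutive IDs starting at a hypothetical ``first_job_id``. ``resolve_relative_deps`` is an illustrative helper, not a Ganeti function.

```python
def resolve_relative_deps(jobs, first_job_id):
    """Return a copy of ``jobs`` with negative dependencies made absolute.

    Hypothetical helper: assumes the jobs are submitted in list order and
    receive consecutive IDs starting at ``first_job_id``.
    """
    resolved = []
    for idx, job in enumerate(jobs):
        this_id = first_job_id + idx
        new_job = []
        for op in job:
            op = dict(op)  # shallow copy so the caller's opcodes stay unchanged
            if "depends" in op:
                op["depends"] = [
                    # -1 refers to the previous job, -2 to the one before it;
                    # non-negative IDs are assumed to be absolute already.
                    [this_id + dep if dep < 0 else dep, list(statuses)]
                    for dep, statuses in op["depends"]
                ]
            new_job.append(op)
        resolved.append(new_job)
    return resolved
```

Take ``first_job_id = 100``. The second job's ``[-1, ["success"]]`` becomes ``[100, ["success"]]``, and the third job's ``[-2, []]`` becomes ``[100, []]``. Both therefore wait on the first job, the two migrations. The status lists are copied through unchanged.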