X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/2f7140ba34e2d8af70e05cbdb32ec4a4cadcc466..172679c916c8e08f9a5c46c68878b13ec4d5279d:/doc/iallocator.rst

diff --git a/doc/iallocator.rst b/doc/iallocator.rst
index 467f79d..b8c752c 100644
--- a/doc/iallocator.rst
+++ b/doc/iallocator.rst
@@ -1,7 +1,7 @@
 Ganeti automatic instance allocation
 ====================================
 
-Documents Ganeti version 2.0
+Documents Ganeti version 2.4
 
 .. contents::
 
@@ -68,7 +68,14 @@ Input message
 ~~~~~~~~~~~~~
 
 The input message will be the JSON encoding of a dictionary containing
-the following:
+all the required information to perform the operation. We explain the
+contents of this dictionary in two parts: common information that every
+type of operation requires, and operation-specific information.
+
+Common information
+++++++++++++++++++
+
+All input dictionaries to the IAllocator must carry the following keys:
 
 version
   the version of the protocol; this document
@@ -84,82 +91,20 @@ enabled_hypervisors
   the list of enabled hypervisors
 
 request
-  a dictionary containing the request data:
+  a dictionary containing the details of the request; the keys vary
+  depending on the type of operation that's being requested, as
+  explained in `Operation-specific input`_ below.
 
- type - the request type; this can be either ``allocate`` or ``relocate``; - the ``allocate`` request is used when a new instance needs to be - placed on the cluster, while the ``relocate`` request is used when - an existing instance needs to be moved within the cluster +nodegroups + a dictionary with the data for the cluster's node groups; it is keyed + on the group UUID, and the values are a dictionary with the following + keys: name - the name of the instance; if the request is a realocation, then - this name will be found in the list of instances (see below), - otherwise is the FQDN of the new instance - - required_nodes - how many nodes should the algorithm return; while this information - can be deduced from the instace's disk template, it's better if - this computation is left to Ganeti as then allocator scripts are - less sensitive to changes to the disk templates - - disk_space_total - the total disk space that will be used by this instance on the - (new) nodes; again, this information can be computed from the list - of instance disks and its template type, but Ganeti is better - suited to compute it - - If the request is an allocation, then there are extra fields in the - request dictionary: - - disks - list of dictionaries holding the disk definitions for this - instance (in the order they are exported to the hypervisor): - - mode - either ``r`` or ``w`` denoting if the disk is read-only or - writable - - size - the size of this disk in mebibytes - - nics - a list of dictionaries holding the network interfaces for this - instance, containing: - - ip - the IP address that Ganeti know for this instance, or null - - mac - the MAC address for this interface - - bridge - the bridge to which this interface will be connected - - vcpus - the number of VCPUs for the instance - - disk_template - the disk template for the instance - - memory - the memory size for the instance - - os - the OS type for the instance - - tags - the list of the instance's tags - - 
hypervisor - the hypervisor of this instance - - - If the request is of type relocate, then there is one more entry in - the request dictionary, named ``relocate_from``, and it contains a - list of nodes to move the instance away from; note that with Ganeti - 2.0, this list will always contain a single node, the current - secondary of the instance. + the node group name + alloc_policy + the allocation policy of the node group (consult the semantics of + this attribute in the :manpage:`gnt-group(8)` manpage) instances a dictionary with the data for the current existing instance on the @@ -176,7 +121,7 @@ instances nodes dictionary with the data for the nodes in the cluster, indexed by - the node name; the dict contains: + the node name; the dict contains [*]_ : total_disk the total disk size of this node (mebibytes) @@ -221,13 +166,150 @@ nodes i_pri_up_memory: total memory required by running primary instances + group: + the node group that this node belongs to + No allocations should be made on nodes having either the ``drained`` or ``offline`` flags set. More details about these of node status flags is available in the manpage :manpage:`ganeti(7)`. +.. [*] Note that no run-time data is present for offline, drained or + non-vm_capable nodes; this means the tags total_memory, + reserved_memory, free_memory, total_disk, free_disk, total_cpus, + i_pri_memory and i_pri_up memory will be absent + +Operation-specific input +++++++++++++++++++++++++ + +All input dictionaries to the IAllocator carry, in the ``request`` +dictionary, detailed information about the operation that's being +requested. The required keys vary depending on the type of operation, as +follows. + +In all cases, it includes: + + type + the request type; this can be either ``allocate``, ``relocate``, + ``change-group``, ``node-evacuate`` or ``multi-evacuate``. The + ``allocate`` request is used when a new instance needs to be placed + on the cluster. 
The ``relocate`` request is used when an existing + instance needs to be moved within its node group. + + The ``multi-evacuate`` protocol used to request that the script + computes the optimal relocate solution for all secondary instances + of the given nodes. It is now deprecated and should no longer be + used. + + The ``change-group`` request is used to relocate multiple instances + across multiple node groups. ``node-evacuate`` evacuates instances + off their node(s). These are described in a separate :ref:`design + document `. + +For both allocate and relocate mode, the following extra keys are needed +in the ``request`` dictionary: + + name + the name of the instance; if the request is a realocation, then this + name will be found in the list of instances (see below), otherwise + is the FQDN of the new instance; type *string* + + required_nodes + how many nodes should the algorithm return; while this information + can be deduced from the instace's disk template, it's better if + this computation is left to Ganeti as then allocator scripts are + less sensitive to changes to the disk templates; type *integer* + + disk_space_total + the total disk space that will be used by this instance on the + (new) nodes; again, this information can be computed from the list + of instance disks and its template type, but Ganeti is better + suited to compute it; type *integer* + +.. 
pyassert:: + + constants.DISK_ACCESS_SET == set([constants.DISK_RDONLY, + constants.DISK_RDWR]) + +Allocation needs, in addition: + + disks + list of dictionaries holding the disk definitions for this + instance (in the order they are exported to the hypervisor): + + mode + either :pyeval:`constants.DISK_RDONLY` or + :pyeval:`constants.DISK_RDWR` denoting if the disk is read-only or + writable + + size + the size of this disk in mebibytes + + nics + a list of dictionaries holding the network interfaces for this + instance, containing: + + ip + the IP address that Ganeti know for this instance, or null + + mac + the MAC address for this interface + + bridge + the bridge to which this interface will be connected + + vcpus + the number of VCPUs for the instance + + disk_template + the disk template for the instance + + memory + the memory size for the instance + + os + the OS type for the instance + + tags + the list of the instance's tags + + hypervisor + the hypervisor of this instance + +Relocation: + + relocate_from + a list of nodes to move the instance away from (note that with + Ganeti 2.0, this list will always contain a single node, the + current secondary of the instance); type *list of strings* + +As for ``node-evacuate``, it needs the following request arguments: -Respone message -~~~~~~~~~~~~~~~ + instances + a list of instance names to evacuate; type *list of strings* + + evac_mode + specify which instances to evacuate; one of ``primary-only``, + ``secondary-only``, ``all``, type *string* + +``change-group`` needs the following request arguments: + + instances + a list of instance names whose group to change; type + *list of strings* + + target_groups + must either be the empty list, or contain a list of group UUIDs that + should be considered for relocating instances to; type + *list of strings* + +Finally, in the case of multi-evacuate, there's one single request +argument (in addition to ``type``): + + evac_nodes + the names of the nodes to be 
evacuated; type *list of strings* + +Response message +~~~~~~~~~~~~~~~~ The response message is much more simple than the input one. It is also a dict having three keys: @@ -239,12 +321,26 @@ info a string with information from the scripts; if the allocation fails, this will be shown to the user -nodes - the list of nodes computed by the algorithm; even if the algorithm - failed (i.e. success is false), this must be returned as an empty - list; also note that the length of this list must equal the - ``requested_nodes`` entry in the input message, otherwise Ganeti - will consider the result as failed +result + the output of the algorithm; even if the algorithm failed + (i.e. success is false), this must be returned as an empty list + + for allocate/relocate, this is the list of node(s) for the instance; + note that the length of this list must equal the ``requested_nodes`` + entry in the input message, otherwise Ganeti will consider the result + as failed + + for the ``node-evacuate`` and ``change-group`` modes, this is a + dictionary containing, among other information, a list of lists of + serialized opcodes; see the :ref:`design document + ` for a detailed description + + for multi-evacuation mode, this is a list of lists; each element of + the list is a list of instance name and the new secondary node + +.. 
note:: Current Ganeti version accepts either ``result`` or ``nodes`` + as a backwards-compatibility measure (older versions only supported + ``nodes``) Examples -------- @@ -252,42 +348,22 @@ Examples Input messages to scripts ~~~~~~~~~~~~~~~~~~~~~~~~~ -Input message, new instance allocation:: +Input message, new instance allocation (common elements are listed this +time, but not included in further examples below):: { + "version": 2, + "cluster_name": "cluster1.example.com", "cluster_tags": [], - "request": { - "required_nodes": 2, - "name": "instance3.example.com", - "tags": [ - "type:test", - "owner:foo" - ], - "type": "allocate", - "disks": [ - { - "mode": "w", - "size": 1024 - }, - { - "mode": "w", - "size": 2048 - } - ], - "nics": [ - { - "ip": null, - "mac": "00:11:22:33:44:55", - "bridge": null - } - ], - "vcpus": 1, - "disk_template": "drbd", - "memory": 2048, - "disk_space_total": 3328, - "os": "etch-image" + "enabled_hypervisors": [ + "xen-pvm" + ], + "nodegroups": { + "f4e06e0d-528a-4963-a5ad-10f3e114232d": { + "name": "default", + "alloc_policy": "preferred" + } }, - "cluster_name": "cluster1.example.com", "instances": { "instance1.example.com": { "tags": [], @@ -315,7 +391,7 @@ Input message, new instance allocation:: "nodes": [ "nodee1.com" ], - "os": "etch-image" + "os": "debootstrap+default" }, "instance2.example.com": { "tags": [], @@ -344,53 +420,103 @@ Input message, new instance allocation:: "node2.example.com", "node3.example.com" ], - "os": "etch-image" + "os": "debootstrap+default" } }, - "version": 1, "nodes": { "node1.example.com": { "total_disk": 858276, - "primary_ip": "192.168.1.1", - "secondary_ip": "192.168.2.1", + "primary_ip": "198.51.100.1", + "secondary_ip": "192.0.2.1", "tags": [], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 856740, "total_memory": 4095 }, "node2.example.com": { "total_disk": 858240, - "primary_ip": "192.168.1.3", - "secondary_ip": "192.168.2.3", + "primary_ip": 
"198.51.100.2", + "secondary_ip": "192.0.2.2", "tags": ["test"], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 848320, "total_memory": 4095 }, "node3.example.com.com": { "total_disk": 572184, - "primary_ip": "192.168.1.3", - "secondary_ip": "192.168.2.3", + "primary_ip": "198.51.100.3", + "secondary_ip": "192.0.2.3", "tags": [], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 570648, "total_memory": 4095 } + }, + "request": { + "type": "allocate", + "name": "instance3.example.com", + "required_nodes": 2, + "disk_space_total": 3328, + "disks": [ + { + "mode": "w", + "size": 1024 + }, + { + "mode": "w", + "size": 2048 + } + ], + "nics": [ + { + "ip": null, + "mac": "00:11:22:33:44:55", + "bridge": null + } + ], + "vcpus": 1, + "disk_template": "drbd", + "memory": 2048, + "os": "debootstrap+default", + "tags": [ + "type:test", + "owner:foo" + ], + hypervisor: "xen-pvm" } } -Input message, reallocation. Since only the request entry in the input -message is changed, we show only this changed entry:: +Input message, reallocation:: - "request": { - "relocate_from": [ - "node3.example.com" - ], - "required_nodes": 1, - "type": "relocate", - "name": "instance2.example.com", - "disk_space_total": 832 - }, + { + "version": 2, + ... + "request": { + "type": "relocate", + "name": "instance2.example.com", + "required_nodes": 1, + "disk_space_total": 832, + "relocate_from": [ + "node3.example.com" + ] + } + } + +Input message, node evacuation:: + + { + "version": 2, + ... 
+    "request": {
+      "type": "multi-evacuate",
+      "evac_nodes": [
+        "node2"
+      ]
+    }
+  }
 
 
 Response messages
@@ -398,35 +524,66 @@ Response messages
 Successful response message::
 
   {
+    "success": true,
     "info": "Allocation successful",
-    "nodes": [
+    "result": [
       "node2.example.com",
       "node1.example.com"
-    ],
-    "success": true
+    ]
   }
 
 Failed response message::
 
   {
+    "success": false,
     "info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
-    "nodes": [],
-    "success": false
+    "result": []
   }
 
+Successful node evacuation message::
+
+  {
+    "success": true,
+    "info": "Request successful",
+    "result": [
+      [
+        "instance1",
+        "node3"
+      ],
+      [
+        "instance2",
+        "node1"
+      ]
+    ]
+  }
+
+
 Command line messages
 ~~~~~~~~~~~~~~~~~~~~~
 
 ::
 
-  # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
+  # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance3
   Selected nodes for the instance: node1.example.com
   * creating instance disks...
   [...]
 
-  # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
+  # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance4
   Failure: prerequisites not met for this operation:
-  Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )
+  Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 1 (already selected: )
 
-  # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
+  # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance5
   Failure: prerequisites not met for this operation:
-  Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
+  Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 2 (already selected: node1.example.com)
+
+Reference implementation
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Ganeti's default iallocator is "hail", which is available when "htools"
+components have been enabled at build time (see :doc:`install-quick` for
+more details).
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
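The allocation example in this change lists two disks of 1024 and 2048 MiB for a ``drbd`` instance together with a ``disk_space_total`` of 3328, which illustrates why the protocol leaves this computation to Ganeti. The 256 MiB difference is consistent with 128 MiB of DRBD metadata per disk. The Python sketch below assumes exactly that overhead (``DRBD_META_MIB`` is a name chosen for this sketch, not quoted from Ganeti) and covers only the ``drbd`` and ``plain`` templates.

```python
# Assumption: DRBD adds 128 MiB of metadata per disk, which reproduces the
# 3328 MiB in the allocation example; only drbd and plain are covered here.
DRBD_META_MIB = 128


def disk_space_total(disk_template, disks):
    """Per-node disk space in MiB for the given template and disk list."""
    sizes = [disk["size"] for disk in disks]
    if disk_template == "drbd":
        # every DRBD disk carries its data plus a fixed metadata area
        return sum(size + DRBD_META_MIB for size in sizes)
    if disk_template == "plain":
        return sum(sizes)
    raise ValueError("template not covered by this sketch: %s" % disk_template)


example = [{"mode": "w", "size": 1024}, {"mode": "w", "size": 2048}]
print(disk_space_total("drbd", example))  # 3328
```

If the assumed overhead differs in a given Ganeti version, a script doing this itself would silently disagree with Ganeti, which is the drift the protocol avoids by sending ``disk_space_total`` precomputed.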
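To make the input and response contract described in this change concrete, here is a minimal greedy allocator for the ``allocate`` and ``relocate`` request types. It is a sketch under stated assumptions, not hail's algorithm: it assumes the input message is read from the file named by the first command-line argument and that the response is printed on standard output, and it assumes offline or drained nodes carry boolean ``offline``/``drained`` keys in their node dictionaries. Following the footnote on missing run-time data, nodes without ``free_memory`` or ``free_disk`` are skipped.

```python
import json
import sys


def usable_nodes(data):
    """Yield (name, info) pairs for nodes that may receive an instance.

    Assumes boolean "offline"/"drained" keys mark unusable nodes and skips
    nodes without run-time data (no free_memory/free_disk keys).
    """
    for name, info in data["nodes"].items():
        if info.get("offline") or info.get("drained"):
            continue
        if "free_memory" not in info or "free_disk" not in info:
            continue
        yield name, info


def solve(data):
    """Build a response dict (success, info, result) for one request."""
    req = data["request"]
    if req["type"] == "allocate":
        memory = req["memory"]
        exclude = set()
    elif req["type"] == "relocate":
        inst = data["instances"][req["name"]]
        memory = inst["memory"]
        # never pick a node the instance already uses or is leaving
        exclude = set(inst["nodes"]) | set(req["relocate_from"])
    else:
        return {"success": False,
                "info": "request type %r not handled" % req["type"],
                "result": []}

    wanted = req["required_nodes"]
    fits = [(name, info) for name, info in usable_nodes(data)
            if name not in exclude
            and info["free_memory"] >= memory
            and info["free_disk"] >= req["disk_space_total"]]
    # greedy choice: nodes with the most free memory first
    fits.sort(key=lambda pair: pair[1]["free_memory"], reverse=True)
    chosen = [name for name, _ in fits[:wanted]]

    if len(chosen) < wanted:
        # on failure the result key must still be present, as an empty list
        return {"success": False,
                "info": "found %d of %d required nodes" % (len(chosen), wanted),
                "result": []}
    return {"success": True, "info": "Allocation successful",
            "result": chosen}


def main(path):
    with open(path) as fd:
        data = json.load(fd)
    print(json.dumps(solve(data)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # assumption: the input message's file name is the only argument
    main(sys.argv[1])
```

Sorting by free memory spreads instances across nodes; a production allocator such as hail weighs many more metrics, and this sketch ignores node groups and ``alloc_policy`` entirely.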
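The response rules in this change (the ``success``, ``info`` and ``result`` keys, the legacy ``nodes`` key, and the length rule tied to ``required_nodes``) can also be checked mechanically, for example when testing a new script against recorded input messages. ``check_response`` below is a hypothetical helper, not part of Ganeti. It enforces the length rule only for successful ``allocate``/``relocate`` answers, since the other modes return differently shaped results.

```python
import json

# request types whose result is a plain list of node names
NODE_LIST_TYPES = ("allocate", "relocate")


def check_response(request, raw):
    """Return a list of problems in a script's JSON response (empty if none).

    request: the "request" dict from the input message.
    raw: the text the script wrote on its standard output.
    """
    try:
        resp = json.loads(raw)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        return ["response is not valid JSON: %s" % err]
    if not isinstance(resp, dict):
        return ["response is not a JSON object"]

    problems = ["missing key %r" % key
                for key in ("success", "info") if key not in resp]
    # "nodes" is the older name of "result", still accepted for compatibility
    if "result" in resp:
        result = resp["result"]
    elif "nodes" in resp:
        result = resp["nodes"]
    else:
        problems.append("missing key 'result' (or legacy 'nodes')")
        return problems

    if resp.get("success") and request["type"] in NODE_LIST_TYPES:
        wanted = request["required_nodes"]
        if not isinstance(result, list) or len(result) != wanted:
            got = len(result) if isinstance(result, list) else "non-list"
            problems.append("expected %d nodes, got %s" % (wanted, got))
    return problems
```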