X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/5bbd3f7f72506933b9ab86d2d1faff85ab3b4551..e4c03256a608b64421258ea50f60d6fa032ef0a4:/doc/iallocator.rst diff --git a/doc/iallocator.rst b/doc/iallocator.rst index 9dfc847..c6b150b 100644 --- a/doc/iallocator.rst +++ b/doc/iallocator.rst @@ -1,7 +1,7 @@ Ganeti automatic instance allocation ==================================== -Documents Ganeti version 2.0 +Documents Ganeti version 2.5 .. contents:: @@ -68,7 +68,14 @@ Input message ~~~~~~~~~~~~~ The input message will be the JSON encoding of a dictionary containing -the following: +all the required information to perform the operation. We explain the +contents of this dictionary in two parts: common information that every +type of operation requires, and operation-specific information. + +Common information +++++++++++++++++++ + +All input dictionaries to the IAllocator must carry the following keys: version the version of the protocol; this document @@ -84,89 +91,27 @@ enabled_hypervisors the list of enabled hypervisors request - a dictionary containing the request data: + a dictionary containing the details of the request; the keys vary + depending on the type of operation that's being requested, as + explained in `Operation-specific input`_ below. 
- type - the request type; this can be either ``allocate`` or ``relocate``; - the ``allocate`` request is used when a new instance needs to be - placed on the cluster, while the ``relocate`` request is used when - an existing instance needs to be moved within the cluster +nodegroups + a dictionary with the data for the cluster's node groups; it is keyed + on the group UUID, and the values are a dictionary with the following + keys: name - the name of the instance; if the request is a realocation, then - this name will be found in the list of instances (see below), - otherwise is the FQDN of the new instance - - required_nodes - how many nodes should the algorithm return; while this information - can be deduced from the instace's disk template, it's better if - this computation is left to Ganeti as then allocator scripts are - less sensitive to changes to the disk templates - - disk_space_total - the total disk space that will be used by this instance on the - (new) nodes; again, this information can be computed from the list - of instance disks and its template type, but Ganeti is better - suited to compute it - - If the request is an allocation, then there are extra fields in the - request dictionary: - - disks - list of dictionaries holding the disk definitions for this - instance (in the order they are exported to the hypervisor): - - mode - either ``r`` or ``w`` denoting if the disk is read-only or - writable - - size - the size of this disk in mebibytes - - nics - a list of dictionaries holding the network interfaces for this - instance, containing: - - ip - the IP address that Ganeti know for this instance, or null - - mac - the MAC address for this interface - - bridge - the bridge to which this interface will be connected - - vcpus - the number of VCPUs for the instance - - disk_template - the disk template for the instance - - memory - the memory size for the instance - - os - the OS type for the instance - - tags - the list of the instance's tags - - 
hypervisor - the hypervisor of this instance - - - If the request is of type relocate, then there is one more entry in - the request dictionary, named ``relocate_from``, and it contains a - list of nodes to move the instance away from; note that with Ganeti - 2.0, this list will always contain a single node, the current - secondary of the instance. + the node group name + alloc_policy + the allocation policy of the node group (consult the semantics of + this attribute in the :manpage:`gnt-group(8)` manpage) instances a dictionary with the data for the current existing instance on the cluster, indexed by instance name; the contents are similar to the instance definitions for the allocate mode, with the addition of: - admin_up + admin_state if this instance is set to run (but not the actual status of the instance) @@ -176,7 +121,7 @@ instances nodes dictionary with the data for the nodes in the cluster, indexed by - the node name; the dict contains: + the node name; the dict contains [*]_ : total_disk the total disk size of this node (mebibytes) @@ -221,13 +166,145 @@ nodes i_pri_up_memory: total memory required by running primary instances + group: + the node group that this node belongs to + No allocations should be made on nodes having either the ``drained`` or ``offline`` flags set. More details about these of node status - flags is available in the manpage *ganeti(7)*. + flags is available in the manpage :manpage:`ganeti(7)`. + +.. [*] Note that no run-time data is present for offline, drained or + non-vm_capable nodes; this means the tags total_memory, + reserved_memory, free_memory, total_disk, free_disk, total_cpus, + i_pri_memory and i_pri_up memory will be absent + +Operation-specific input +++++++++++++++++++++++++ + +All input dictionaries to the IAllocator carry, in the ``request`` +dictionary, detailed information about the operation that's being +requested. The required keys vary depending on the type of operation, as +follows. 
+ +In all cases, it includes: + + type + the request type; this can be one of ``allocate``, ``relocate``, + ``change-group`` or ``node-evacuate``. The + ``allocate`` request is used when a new instance needs to be placed + on the cluster. The ``relocate`` request is used when an existing + instance needs to be moved within its node group. + + The ``multi-evacuate`` protocol was used to request that the script + compute the optimal relocate solution for all secondary instances + of the given nodes. It is now deprecated and need only be + implemented if backwards compatibility with Ganeti 2.4 and lower is + needed. + + The ``change-group`` request is used to relocate multiple instances + across multiple node groups. ``node-evacuate`` evacuates instances + off their node(s). These are described in a separate :ref:`design + document `. + +For both allocate and relocate mode, the following extra keys are needed +in the ``request`` dictionary: + + name + the name of the instance; if the request is a relocation, then this + name will be found in the list of instances (see below), otherwise + it is the FQDN of the new instance; type *string* + + required_nodes + how many nodes should the algorithm return; while this information + can be deduced from the instance's disk template, it's better if + this computation is left to Ganeti as then allocator scripts are + less sensitive to changes to the disk templates; type *integer* + + disk_space_total + the total disk space that will be used by this instance on the + (new) nodes; again, this information can be computed from the list + of instance disks and its template type, but Ganeti is better + suited to compute it; type *integer* + +..
pyassert:: + + constants.DISK_ACCESS_SET == set([constants.DISK_RDONLY, + constants.DISK_RDWR]) + +Allocation needs, in addition: + disks + list of dictionaries holding the disk definitions for this + instance (in the order they are exported to the hypervisor): + + mode + either :pyeval:`constants.DISK_RDONLY` or + :pyeval:`constants.DISK_RDWR` denoting if the disk is read-only or + writable + + size + the size of this disk in mebibytes + + nics + a list of dictionaries holding the network interfaces for this + instance, containing: -Respone message -~~~~~~~~~~~~~~~ + ip + the IP address that Ganeti knows for this instance, or null + + mac + the MAC address for this interface + + bridge + the bridge to which this interface will be connected + + vcpus + the number of VCPUs for the instance + + disk_template + the disk template for the instance + + memory + the memory size for the instance + + os + the OS type for the instance + + tags + the list of the instance's tags + + hypervisor + the hypervisor of this instance + +Relocation: + + relocate_from + a list of nodes to move the instance away from (note that with + Ganeti 2.0, this list will always contain a single node, the + current secondary of the instance); type *list of strings* + +As for ``node-evacuate``, it needs the following request arguments: + + instances + a list of instance names to evacuate; type *list of strings* + + evac_mode + specify which instances to evacuate; one of ``primary-only``, + ``secondary-only``, ``all``; type *string* + +``change-group`` needs the following request arguments: + + instances + a list of instance names whose group to change; type + *list of strings* + + target_groups + must either be the empty list, or contain a list of group UUIDs that + should be considered for relocating instances to; type + *list of strings* + +Response message +~~~~~~~~~~~~~~~~ The response message is much more simple than the input one.
It is also a dict having three keys: @@ -239,12 +316,23 @@ info a string with information from the scripts; if the allocation fails, this will be shown to the user -nodes - the list of nodes computed by the algorithm; even if the algorithm - failed (i.e. success is false), this must be returned as an empty - list; also note that the length of this list must equal the - ``requested_nodes`` entry in the input message, otherwise Ganeti - will consider the result as failed +result + the output of the algorithm; even if the algorithm failed + (i.e. success is false), this must be returned as an empty list + + for allocate/relocate, this is the list of node(s) for the instance; + note that the length of this list must equal the ``required_nodes`` + entry in the input message, otherwise Ganeti will consider the result + as failed + + for the ``node-evacuate`` and ``change-group`` modes, this is a + dictionary containing, among other information, a list of lists of + serialized opcodes; see the :ref:`design document + ` for a detailed description + +..
note:: Current Ganeti version accepts either ``result`` or ``nodes`` + as a backwards-compatibility measure (older versions only supported + ``nodes``) Examples -------- @@ -252,42 +340,22 @@ Examples Input messages to scripts ~~~~~~~~~~~~~~~~~~~~~~~~~ -Input message, new instance allocation:: +Input message, new instance allocation (common elements are listed this +time, but not included in further examples below):: { + "version": 2, + "cluster_name": "cluster1.example.com", "cluster_tags": [], - "request": { - "required_nodes": 2, - "name": "instance3.example.com", - "tags": [ - "type:test", - "owner:foo" - ], - "type": "allocate", - "disks": [ - { - "mode": "w", - "size": 1024 - }, - { - "mode": "w", - "size": 2048 - } - ], - "nics": [ - { - "ip": null, - "mac": "00:11:22:33:44:55", - "bridge": null - } - ], - "vcpus": 1, - "disk_template": "drbd", - "memory": 2048, - "disk_space_total": 3328, - "os": "etch-image" + "enabled_hypervisors": [ + "xen-pvm" + ], + "nodegroups": { + "f4e06e0d-528a-4963-a5ad-10f3e114232d": { + "name": "default", + "alloc_policy": "preferred" + } }, - "cluster_name": "cluster1.example.com", "instances": { "instance1.example.com": { "tags": [], @@ -315,7 +383,7 @@ Input message, new instance allocation:: "nodes": [ "nodee1.com" ], - "os": "etch-image" + "os": "debootstrap+default" }, "instance2.example.com": { "tags": [], @@ -344,53 +412,90 @@ Input message, new instance allocation:: "node2.example.com", "node3.example.com" ], - "os": "etch-image" + "os": "debootstrap+default" } }, - "version": 1, "nodes": { "node1.example.com": { "total_disk": 858276, - "primary_ip": "192.168.1.1", - "secondary_ip": "192.168.2.1", + "primary_ip": "198.51.100.1", + "secondary_ip": "192.0.2.1", "tags": [], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 856740, "total_memory": 4095 }, "node2.example.com": { "total_disk": 858240, - "primary_ip": "192.168.1.3", - "secondary_ip": "192.168.2.3", + "primary_ip": 
"198.51.100.2", + "secondary_ip": "192.0.2.2", "tags": ["test"], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 848320, "total_memory": 4095 }, "node3.example.com.com": { "total_disk": 572184, - "primary_ip": "192.168.1.3", - "secondary_ip": "192.168.2.3", + "primary_ip": "198.51.100.3", + "secondary_ip": "192.0.2.3", "tags": [], + "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d", "free_memory": 3505, "free_disk": 570648, "total_memory": 4095 } + }, + "request": { + "type": "allocate", + "name": "instance3.example.com", + "required_nodes": 2, + "disk_space_total": 3328, + "disks": [ + { + "mode": "w", + "size": 1024 + }, + { + "mode": "w", + "size": 2048 + } + ], + "nics": [ + { + "ip": null, + "mac": "00:11:22:33:44:55", + "bridge": null + } + ], + "vcpus": 1, + "disk_template": "drbd", + "memory": 2048, + "os": "debootstrap+default", + "tags": [ + "type:test", + "owner:foo" + ], + hypervisor: "xen-pvm" } } -Input message, reallocation. Since only the request entry in the input -message is changed, we show only this changed entry:: +Input message, reallocation:: - "request": { - "relocate_from": [ - "node3.example.com" - ], - "required_nodes": 1, - "type": "relocate", - "name": "instance2.example.com", - "disk_space_total": 832 - }, + { + "version": 2, + ... 
+ "request": { + "type": "relocate", + "name": "instance2.example.com", + "required_nodes": 1, + "disk_space_total": 832, + "relocate_from": [ + "node3.example.com" + ] + } + } Response messages @@ -398,35 +503,66 @@ Response messages Successful response message:: { + "success": true, "info": "Allocation successful", - "nodes": [ + "result": [ "node2.example.com", "node1.example.com" - ], - "success": true + ] } Failed response message:: { + "success": false, "info": "Can't find a suitable node for position 2 (already selected: node2.example.com)", - "nodes": [], - "success": false + "result": [] } +Successful node evacuation message:: + + { + "success": true, + "info": "Request successful", + "result": [ + [ + "instance1", + "node3" + ], + [ + "instance2", + "node1" + ] + ] + } + + Command line messages ~~~~~~~~~~~~~~~~~~~~~ :: - # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3 + # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance3 Selected nodes for the instance: node1.example.com * creating instance disks... [...] 
- # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4 + # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance4 Failure: prerequisites not met for this operation: - Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: ) + Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 1 (already selected: ) - # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5 + # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance5 Failure: prerequisites not met for this operation: - Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com) + Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 2 (already selected: node1.example.com) + +Reference implementation +~~~~~~~~~~~~~~~~~~~~~~~~ + +Ganeti's default iallocator is "hail" which is available when "htools" +components have been enabled at build time (see :doc:`install-quick` for +more details). + +.. vim: set textwidth=72 : +.. Local Variables: +.. mode: rst +.. fill-column: 72 +.. End:
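
The message formats documented in this change can be exercised with a small script. The following is only a rough sketch, not how ``hail`` or any shipped allocator works: the function name ``build_response`` and the "most free memory first" policy are assumptions made for illustration, and only the ``allocate`` and ``relocate`` request types are handled. It reads the ``request``, ``nodes`` and ``instances`` keys described above and returns a dictionary with the ``success``, ``info`` and ``result`` keys.

```python
import json


def build_response(message):
    """Build a response dict for an ``allocate`` or ``relocate`` request.

    Hypothetical policy: prefer the usable nodes with the most free memory.
    """
    request = message["request"]
    req_type = request["type"]
    if req_type not in ("allocate", "relocate"):
        return {"success": False,
                "info": "Unsupported request type: %s" % req_type,
                "result": []}

    # Nodes that must not be chosen: the ones to move away from and, for a
    # relocation, the nodes the instance already uses (an assumed policy).
    excluded = set(request.get("relocate_from", []))
    if req_type == "relocate":
        excluded.update(message["instances"][request["name"]]["nodes"])

    # "memory" is only present in allocate requests.
    needed_memory = request.get("memory", 0)

    candidates = []
    for name, node in message["nodes"].items():
        # Never allocate on offline or drained nodes.
        if node.get("offline") or node.get("drained"):
            continue
        if name in excluded:
            continue
        # Run-time data such as free_memory may be absent for some nodes.
        free = node.get("free_memory")
        if free is None or free < needed_memory:
            continue
        candidates.append((-free, name))

    # Most free memory first; the node name breaks ties.
    candidates.sort()
    wanted = request["required_nodes"]
    if len(candidates) < wanted:
        return {"success": False,
                "info": "Only %d of %d required nodes are usable"
                        % (len(candidates), wanted),
                "result": []}
    return {"success": True,
            "info": "Allocation successful",
            "result": [name for _, name in candidates[:wanted]]}


if __name__ == "__main__":
    example = {
        "nodes": {
            "node1.example.com": {"free_memory": 3505},
            "node2.example.com": {"free_memory": 4000},
        },
        "instances": {},
        "request": {"type": "allocate", "name": "instance3.example.com",
                    "required_nodes": 2, "memory": 2048},
    }
    print(json.dumps(build_response(example), indent=2))
```

The returned list always has exactly ``required_nodes`` entries on success and is empty on failure, matching the constraints the response description places on ``result``.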