Ganeti automatic instance allocation
====================================
-Documents Ganeti version 2.0
+Documents Ganeti version 2.4
.. contents::
~~~~~~~~~~~~~
The input message will be the JSON encoding of a dictionary containing
-the following:
+all the required information to perform the operation. We explain the
+contents of this dictionary in two parts: common information that every
+type of operation requires, and operation-specific information.
+
+Common information
+++++++++++++++++++
+
+All input dictionaries to the IAllocator must carry the following keys:
version
the version of the protocol; this document
the list of enabled hypervisors
request
- a dictionary containing the request data:
+ a dictionary containing the details of the request; the keys vary
+ depending on the type of operation that's being requested, as
+ explained in `Operation-specific input`_ below.
- type
- the request type; this can be either ``allocate`` or ``relocate``;
- the ``allocate`` request is used when a new instance needs to be
- placed on the cluster, while the ``relocate`` request is used when
- an existing instance needs to be moved within the cluster
+nodegroups
+ a dictionary with the data for the cluster's node groups; it is keyed
+ on the group UUID, and the values are a dictionary with the following
+ keys:
name
- the name of the instance; if the request is a realocation, then
- this name will be found in the list of instances (see below),
- otherwise is the FQDN of the new instance
-
- required_nodes
- how many nodes should the algorithm return; while this information
- can be deduced from the instace's disk template, it's better if
- this computation is left to Ganeti as then allocator scripts are
- less sensitive to changes to the disk templates
-
- disk_space_total
- the total disk space that will be used by this instance on the
- (new) nodes; again, this information can be computed from the list
- of instance disks and its template type, but Ganeti is better
- suited to compute it
-
- If the request is an allocation, then there are extra fields in the
- request dictionary:
-
- disks
- list of dictionaries holding the disk definitions for this
- instance (in the order they are exported to the hypervisor):
-
- mode
- either ``r`` or ``w`` denoting if the disk is read-only or
- writable
-
- size
- the size of this disk in mebibytes
-
- nics
- a list of dictionaries holding the network interfaces for this
- instance, containing:
-
- ip
- the IP address that Ganeti know for this instance, or null
-
- mac
- the MAC address for this interface
-
- bridge
- the bridge to which this interface will be connected
-
- vcpus
- the number of VCPUs for the instance
-
- disk_template
- the disk template for the instance
-
- memory
- the memory size for the instance
-
- os
- the OS type for the instance
-
- tags
- the list of the instance's tags
-
- hypervisor
- the hypervisor of this instance
-
-
- If the request is of type relocate, then there is one more entry in
- the request dictionary, named ``relocate_from``, and it contains a
- list of nodes to move the instance away from; note that with Ganeti
- 2.0, this list will always contain a single node, the current
- secondary of the instance.
+ the node group name
+ alloc_policy
+ the allocation policy of the node group (consult the semantics of
+ this attribute in the :manpage:`gnt-group(8)` manpage)
instances
a dictionary with the data for the current existing instance on the
nodes
dictionary with the data for the nodes in the cluster, indexed by
- the node name; the dict contains:
+ the node name; the dict contains [*]_ :
total_disk
the total disk size of this node (mebibytes)
i_pri_up_memory:
total memory required by running primary instances
+ group:
+ the node group that this node belongs to
+
No allocations should be made on nodes having either the ``drained``
or ``offline`` flags set. More details about these node status
flags are available in the manpage :manpage:`ganeti(7)`.
+.. [*] Note that no run-time data is present for offline, drained or
+ non-vm_capable nodes; this means the keys total_memory,
+ reserved_memory, free_memory, total_disk, free_disk, total_cpus,
+ i_pri_memory and i_pri_up_memory will be absent
+
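As an illustration of the rules above, the following is a minimal sketch of how an allocator script might pick candidate nodes from the input message. The helper name ``usable_nodes`` and the sample message are hypothetical and not part of Ganeti; the sketch only assumes that each node dictionary carries the ``offline`` and ``drained`` flags and that run-time keys may be absent, as described in the footnote.

```python
import json

# Run-time keys that, per the footnote above, may be missing for
# offline, drained or non-vm_capable nodes (subset, for illustration).
RUNTIME_KEYS = ("total_memory", "free_memory", "total_disk", "free_disk")


def usable_nodes(message):
    """Return the sorted names of nodes that may receive allocations.

    Hypothetical helper: not part of Ganeti itself.
    """
    result = []
    for name, node in sorted(message["nodes"].items()):
        # No allocations on nodes flagged as offline or drained.
        if node.get("offline") or node.get("drained"):
            continue
        # Skip nodes for which no run-time data was reported.
        if any(key not in node for key in RUNTIME_KEYS):
            continue
        result.append(name)
    return result


# Made-up sample input, reduced to the keys used by the sketch.
sample = json.loads("""
{
  "version": 2,
  "nodes": {
    "node1.example.com": {"offline": false, "drained": false,
                          "total_memory": 4095, "free_memory": 3505,
                          "total_disk": 858276, "free_disk": 856740},
    "node2.example.com": {"offline": true, "drained": false}
  }
}
""")

print(usable_nodes(sample))
```

Using ``dict.get`` for the flags and an explicit membership test for the run-time keys keeps the sketch tolerant of nodes that report only part of the data.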
+Operation-specific input
+++++++++++++++++++++++++
+
+All input dictionaries to the IAllocator carry, in the ``request``
+dictionary, detailed information about the operation that's being
+requested. The required keys vary depending on the type of operation, as
+follows.
+
+In all cases, it includes:
+
+ type
+ the request type; this can be one of ``allocate``, ``relocate``,
+ ``change-group``, ``node-evacuate`` or ``multi-evacuate``. The
+ ``allocate`` request is used when a new instance needs to be placed
+ on the cluster. The ``relocate`` request is used when an existing
+ instance needs to be moved within its node group.
+
+ The ``multi-evacuate`` request was used to ask the script to
+ compute the optimal relocation solution for all secondary instances
+ of the given nodes. It is now deprecated and should no longer be
+ used.
+
+ The ``change-group`` request is used to relocate multiple instances
+ across multiple node groups. ``node-evacuate`` evacuates instances
+ off their node(s). These are described in a separate :ref:`design
+ document <multi-reloc-detailed-design>`.
+
+For both allocate and relocate mode, the following extra keys are needed
+in the ``request`` dictionary:
+
+ name
+ the name of the instance; if the request is a relocation, then this
+ name will be found in the list of instances (see below), otherwise
+ it is the FQDN of the new instance; type *string*
+
+ required_nodes
+ how many nodes should the algorithm return; while this information
+ can be deduced from the instance's disk template, it's better if
+ this computation is left to Ganeti as then allocator scripts are
+ less sensitive to changes to the disk templates; type *integer*
+
+ disk_space_total
+ the total disk space that will be used by this instance on the
+ (new) nodes; again, this information can be computed from the list
+ of instance disks and its template type, but Ganeti is better
+ suited to compute it; type *integer*
+
+.. pyassert::
+
+ constants.DISK_ACCESS_SET == set([constants.DISK_RDONLY,
+ constants.DISK_RDWR])
+
+Allocation needs, in addition:
+
+ disks
+ list of dictionaries holding the disk definitions for this
+ instance (in the order they are exported to the hypervisor):
+
+ mode
+ either :pyeval:`constants.DISK_RDONLY` or
+ :pyeval:`constants.DISK_RDWR` denoting if the disk is read-only or
+ writable
+
+ size
+ the size of this disk in mebibytes
+
+ nics
+ a list of dictionaries holding the network interfaces for this
+ instance, containing:
+
+ ip
+ the IP address that Ganeti knows for this instance, or null
+
+ mac
+ the MAC address for this interface
+
+ bridge
+ the bridge to which this interface will be connected
+
+ vcpus
+ the number of VCPUs for the instance
+
+ disk_template
+ the disk template for the instance
+
+ memory
+ the memory size for the instance
+
+ os
+ the OS type for the instance
+
+ tags
+ the list of the instance's tags
+
+ hypervisor
+ the hypervisor of this instance
+
+Relocation:
+
+ relocate_from
+ a list of nodes to move the instance away from (note that with
+ Ganeti 2.0, this list will always contain a single node, the
+ current secondary of the instance); type *list of strings*
+
+As for ``node-evacuate``, it needs the following request arguments:
-Respone message
-~~~~~~~~~~~~~~~
+ instances
+ a list of instance names to evacuate; type *list of strings*
+
+ evac_mode
+ specifies which instances to evacuate; one of ``primary-only``,
+ ``secondary-only`` or ``all``; type *string*
+
+``change-group`` needs the following request arguments:
+
+ instances
+ a list of instance names whose group to change; type
+ *list of strings*
+
+ target_groups
+ must either be the empty list, or contain a list of group UUIDs that
+ should be considered for relocating instances to; type
+ *list of strings*
+
+Finally, in the case of multi-evacuate, there's one single request
+argument (in addition to ``type``):
+
+ evac_nodes
+ the names of the nodes to be evacuated; type *list of strings*
+
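The per-type key requirements listed above can be summarised as a lookup table. The following is a minimal sketch of a check an allocator script might run on the ``request`` dictionary before doing any work; the names ``REQUIRED_KEYS`` and ``missing_request_keys`` are hypothetical, and the key sets are taken only from the lists in this section.

```python
# Keys shared by the allocate and relocate modes.
COMMON_ALLOC_KEYS = {"name", "required_nodes", "disk_space_total"}

# Extra keys expected in the "request" dictionary, per request type.
REQUIRED_KEYS = {
    "allocate": COMMON_ALLOC_KEYS | {
        "disks", "nics", "vcpus", "disk_template",
        "memory", "os", "tags", "hypervisor",
    },
    "relocate": COMMON_ALLOC_KEYS | {"relocate_from"},
    "node-evacuate": {"instances", "evac_mode"},
    "change-group": {"instances", "target_groups"},
    "multi-evacuate": {"evac_nodes"},  # deprecated mode
}


def missing_request_keys(request):
    """Return the sorted list of keys missing from a request dict.

    Hypothetical helper: raises ValueError for an unknown type.
    """
    req_type = request.get("type")
    if req_type not in REQUIRED_KEYS:
        raise ValueError("unknown request type: %r" % (req_type,))
    return sorted(REQUIRED_KEYS[req_type] - set(request))


relocate = {
    "type": "relocate",
    "name": "instance2.example.com",
    "required_nodes": 1,
    "disk_space_total": 832,
    "relocate_from": ["node3.example.com"],
}
print(missing_request_keys(relocate))

incomplete = {"type": "node-evacuate", "instances": ["instance1"]}
print(missing_request_keys(incomplete))
```

An empty list means every key named in this section is present; the check says nothing about the types or values of those keys.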
+Response message
+~~~~~~~~~~~~~~~~
The response message is much simpler than the input one. It is
also a dict having three keys:
success
- a boolean value denoting if the allocation was successfull or not
+ a boolean value denoting if the allocation was successful or not
info
a string with information from the scripts; if the allocation fails,
this will be shown to the user
-nodes
- the list of nodes computed by the algorithm; even if the algorithm
- failed (i.e. success is false), this must be returned as an empty
- list; also note that the length of this list must equal the
- ``requested_nodes`` entry in the input message, otherwise Ganeti
- will consider the result as failed
+result
+ the output of the algorithm; even if the algorithm failed
+ (i.e. success is false), this must be returned as an empty list
+
+ for allocate/relocate, this is the list of node(s) for the instance;
+ note that the length of this list must equal the ``required_nodes``
+ entry in the input message, otherwise Ganeti will consider the result
+ as failed
+
+ for the ``node-evacuate`` and ``change-group`` modes, this is a
+ dictionary containing, among other information, a list of lists of
+ serialized opcodes; see the :ref:`design document
+ <multi-reloc-result>` for a detailed description
+
+ for multi-evacuation mode, this is a list of lists; each element of
+ the list is a two-item list holding an instance name and its new
+ secondary node
+
+.. note:: The current Ganeti version accepts either ``result`` or ``nodes``
+ as a backwards-compatibility measure (older versions only supported
+ ``nodes``)
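
The constraints on allocate/relocate responses can be sketched as follows. This is an illustration only: ``build_response`` and ``check_node_list`` are hypothetical helpers, and the check mirrors just the rules stated above (an empty list on failure, a list whose length equals ``required_nodes`` on success, and ``nodes`` accepted as a legacy alias for ``result``).

```python
import json


def build_response(success, info, result):
    """Serialize a response message with the three documented keys."""
    return json.dumps({"success": success, "info": info, "result": result})


def check_node_list(request, response_text):
    """Return True if an allocate/relocate response is well-formed.

    Hypothetical helper, not Ganeti's own validation code.
    """
    response = json.loads(response_text)
    # Accept the legacy "nodes" key when "result" is absent.
    result = response.get("result", response.get("nodes"))
    if not response["success"]:
        # A failed run must still return an empty list.
        return result == []
    return (isinstance(result, list)
            and len(result) == request["required_nodes"])


request = {"type": "allocate", "required_nodes": 2}

good = build_response(True, "Allocation successful",
                      ["node2.example.com", "node1.example.com"])
short = build_response(True, "Allocation successful",
                       ["node2.example.com"])
failed = build_response(False, "Can't find a suitable node", [])

print(check_node_list(request, good))
print(check_node_list(request, short))
print(check_node_list(request, failed))
```
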
Examples
--------
Input messages to scripts
~~~~~~~~~~~~~~~~~~~~~~~~~
-Input message, new instance allocation::
+Input message, new instance allocation (common elements are listed this
+time, but not included in further examples below)::
{
+ "version": 2,
+ "cluster_name": "cluster1.example.com",
"cluster_tags": [],
- "request": {
- "required_nodes": 2,
- "name": "instance3.example.com",
- "tags": [
- "type:test",
- "owner:foo"
- ],
- "type": "allocate",
- "disks": [
- {
- "mode": "w",
- "size": 1024
- },
- {
- "mode": "w",
- "size": 2048
- }
- ],
- "nics": [
- {
- "ip": null,
- "mac": "00:11:22:33:44:55",
- "bridge": null
- }
- ],
- "vcpus": 1,
- "disk_template": "drbd",
- "memory": 2048,
- "disk_space_total": 3328,
- "os": "etch-image"
+ "enabled_hypervisors": [
+ "xen-pvm"
+ ],
+ "nodegroups": {
+ "f4e06e0d-528a-4963-a5ad-10f3e114232d": {
+ "name": "default",
+ "alloc_policy": "preferred"
+ }
},
- "cluster_name": "cluster1.example.com",
"instances": {
"instance1.example.com": {
"tags": [],
"nodes": [
"nodee1.com"
],
- "os": "etch-image"
+ "os": "debootstrap+default"
},
"instance2.example.com": {
"tags": [],
"node2.example.com",
"node3.example.com"
],
- "os": "etch-image"
+ "os": "debootstrap+default"
}
},
- "version": 1,
"nodes": {
"node1.example.com": {
"total_disk": 858276,
- "primary_ip": "192.168.1.1",
- "secondary_ip": "192.168.2.1",
+ "primary_ip": "198.51.100.1",
+ "secondary_ip": "192.0.2.1",
"tags": [],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 856740,
"total_memory": 4095
},
"node2.example.com": {
"total_disk": 858240,
- "primary_ip": "192.168.1.3",
- "secondary_ip": "192.168.2.3",
+ "primary_ip": "198.51.100.2",
+ "secondary_ip": "192.0.2.2",
"tags": ["test"],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 848320,
"total_memory": 4095
},
"node3.example.com.com": {
"total_disk": 572184,
- "primary_ip": "192.168.1.3",
- "secondary_ip": "192.168.2.3",
+ "primary_ip": "198.51.100.3",
+ "secondary_ip": "192.0.2.3",
"tags": [],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 570648,
"total_memory": 4095
}
+ },
+ "request": {
+ "type": "allocate",
+ "name": "instance3.example.com",
+ "required_nodes": 2,
+ "disk_space_total": 3328,
+ "disks": [
+ {
+ "mode": "w",
+ "size": 1024
+ },
+ {
+ "mode": "w",
+ "size": 2048
+ }
+ ],
+ "nics": [
+ {
+ "ip": null,
+ "mac": "00:11:22:33:44:55",
+ "bridge": null
+ }
+ ],
+ "vcpus": 1,
+ "disk_template": "drbd",
+ "memory": 2048,
+ "os": "debootstrap+default",
+ "tags": [
+ "type:test",
+ "owner:foo"
+ ],
+ "hypervisor": "xen-pvm"
}
}
-Input message, reallocation. Since only the request entry in the input
-message is changed, we show only this changed entry::
+Input message, reallocation::
- "request": {
- "relocate_from": [
- "node3.example.com"
- ],
- "required_nodes": 1,
- "type": "relocate",
- "name": "instance2.example.com",
- "disk_space_total": 832
- },
+ {
+ "version": 2,
+ ...
+ "request": {
+ "type": "relocate",
+ "name": "instance2.example.com",
+ "required_nodes": 1,
+ "disk_space_total": 832,
+ "relocate_from": [
+ "node3.example.com"
+ ]
+ }
+ }
+
+Input message, node evacuation (deprecated ``multi-evacuate`` mode)::
+
+ {
+ "version": 2,
+ ...
+ "request": {
+ "type": "multi-evacuate",
+ "evac_nodes": [
+ "node2"
+ ]
+ }
+ }
Response messages
Successful response message::
{
+ "success": true,
"info": "Allocation successful",
- "nodes": [
+ "result": [
"node2.example.com",
"node1.example.com"
- ],
- "success": true
+ ]
}
Failed response message::
{
+ "success": false,
"info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
- "nodes": [],
- "success": false
+ "result": []
}
+Successful node evacuation message::
+
+ {
+ "success": true,
+ "info": "Request successful",
+ "result": [
+ [
+ "instance1",
+ "node3"
+ ],
+ [
+ "instance2",
+ "node1"
+ ]
+ ]
+ }
+
+
Command line messages
~~~~~~~~~~~~~~~~~~~~~
::
- # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
+ # gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance3
Selected nodes for the instance: node1.example.com
* creating instance disks...
[...]
- # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
+ # gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance4
Failure: prerequisites not met for this operation:
- Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )
+ Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 1 (already selected: )
- # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
+ # gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator hail -o debootstrap+default instance5
Failure: prerequisites not met for this operation:
- Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
+ Can't compute nodes using iallocator 'hail': Can't find a suitable node for position 2 (already selected: node1.example.com)
+
+Reference implementation
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Ganeti's default iallocator is "hail", which is available when "htools"
+components have been enabled at build time (see :doc:`install-quick` for
+more details).
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End: