Ganeti automatic instance allocation
====================================
-Documents Ganeti version 2.1
+Documents Ganeti version 2.9
.. contents::
Command line interface changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The node selection options in instanece add and instance replace disks
+The node selection options in instance add and instance replace disks
can be replaced by the new ``--iallocator=NAME`` option (shortened to
``-I``), which will cause the auto-assignment of nodes with the
-passed iallocator. The selected node(s) will be show as part of the
+passed iallocator. The selected node(s) will be shown as part of the
command output.
IAllocator API
~~~~~~~~~~~~~
The input message will be the JSON encoding of a dictionary containing
-the following:
+all the required information to perform the operation. We explain the
+contents of this dictionary in two parts: common information that every
+type of operation requires, and operation-specific information.
+
+Common information
+++++++++++++++++++
+
+All input dictionaries to the IAllocator must carry the following keys:
version
the version of the protocol; this document
enabled_hypervisors
the list of enabled hypervisors
-request
- a dictionary containing the request data:
-
- type
- the request type; this can be either ``allocate``, ``relocate`` or
- ``multi-evacuate``; the ``allocate`` request is used when a new
- instance needs to be placed on the cluster, while the ``relocate``
- request is used when an existing instance needs to be moved within
- the cluster; the ``multi-evacuate`` protocol requests that the
- script computes the optimal relocate solution for all secondary
- instances of the given nodes
-
- The following keys are needed in allocate/relocate mode:
-
- name
- the name of the instance; if the request is a realocation, then this
- name will be found in the list of instances (see below), otherwise
- is the FQDN of the new instance
-
- required_nodes
- how many nodes should the algorithm return; while this information
- can be deduced from the instace's disk template, it's better if
- this computation is left to Ganeti as then allocator scripts are
- less sensitive to changes to the disk templates
-
- disk_space_total
- the total disk space that will be used by this instance on the
- (new) nodes; again, this information can be computed from the list
- of instance disks and its template type, but Ganeti is better
- suited to compute it
-
- If the request is an allocation, then there are extra fields in the
- request dictionary:
-
- disks
- list of dictionaries holding the disk definitions for this
- instance (in the order they are exported to the hypervisor):
-
- mode
- either ``ro`` or ``rw`` denoting if the disk is read-only or
- writable
-
- size
- the size of this disk in mebibytes
-
- nics
- a list of dictionaries holding the network interfaces for this
- instance, containing:
-
- ip
- the IP address that Ganeti know for this instance, or null
-
- mac
- the MAC address for this interface
-
- bridge
- the bridge to which this interface will be connected
-
- vcpus
- the number of VCPUs for the instance
-
- disk_template
- the disk template for the instance
-
- memory
- the memory size for the instance
-
- os
- the OS type for the instance
-
- tags
- the list of the instance's tags
-
- hypervisor
- the hypervisor of this instance
-
-
- If the request is of type relocate, then there is one more entry in
- the request dictionary, named ``relocate_from``, and it contains a
- list of nodes to move the instance away from; note that with Ganeti
- 2.0, this list will always contain a single node, the current
- secondary of the instance.
-
- The multi-evacuate mode has instead a single request argument:
+ipolicy
+ the cluster-wide instance policy (for information; the per-node group
+ values take precedence and should be used instead)
- nodes
- the names of the nodes to be evacuated
+request
+ a dictionary containing the details of the request; the keys vary
+ depending on the type of operation that's being requested, as
+ explained in `Operation-specific input`_ below.
nodegroups
a dictionary with the data for the cluster's node groups; it is keyed
name
the node group name
alloc_policy
- the allocation policy of the node group
+ the allocation policy of the node group (consult the semantics of
+ this attribute in the :manpage:`gnt-group(8)` manpage)
+ networks
+ the list of network UUIDs this node group is connected to
+ ipolicy
+ the instance policy of the node group
+ tags
+ the list of node group tags
instances
a dictionary with the data for the currently existing instances on the
cluster, indexed by instance name; the contents are similar to the
instance definitions for the allocate mode, with the addition of:
- admin_up
+ admin_state
if this instance is set to run (but not the actual status of the
instance)
reserved_memory, free_memory, total_disk, free_disk, total_cpus,
i_pri_memory and i_pri_up_memory will be absent
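A script can sanity-check this common part of the input before it looks at the request. The following Python sketch is illustrative only. The helper names ``check_common_keys`` and ``nodes_by_group`` are hypothetical, and the sketch checks only the keys named in this section. It also assumes, as in the example input message later in this document, that ``nodegroups`` is keyed by group UUID and that each node entry has a ``group`` key.

```python
# Top-level keys that this section lists as mandatory in every input message.
REQUIRED_KEYS = ("version", "enabled_hypervisors", "ipolicy", "request",
                 "nodegroups", "instances")


def check_common_keys(data):
    """Return the sorted names of mandatory top-level keys missing from data."""
    return sorted(key for key in REQUIRED_KEYS if key not in data)


def nodes_by_group(data):
    """Map each node group UUID to the sorted names of its member nodes.

    Assumes, as in the example input message, that "nodegroups" is keyed
    by group UUID and that every node entry has a "group" key.
    """
    groups = {uuid: [] for uuid in data["nodegroups"]}
    for name, node in data.get("nodes", {}).items():
        groups.setdefault(node["group"], []).append(name)
    return {uuid: sorted(names) for uuid, names in groups.items()}
```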
+Operation-specific input
+++++++++++++++++++++++++
+
+All input dictionaries to the IAllocator carry, in the ``request``
+dictionary, detailed information about the operation that's being
+requested. The required keys vary depending on the type of operation, as
+follows.
+
+In all cases, it includes:
+
+ type
+ the request type; this can be one of ``allocate``, ``multi-allocate``,
+ ``relocate``, ``change-group`` or ``node-evacuate``. The
+ ``allocate`` request is used when a new instance needs to be placed
+ on the cluster. The ``relocate`` request is used when an existing
+ instance needs to be moved within its node group.
+
+ The ``multi-evacuate`` protocol used to request that the script
+ computes the optimal relocate solution for all secondary instances
+ of the given nodes. It is now deprecated and only needs to be
+ implemented if backwards compatibility with Ganeti 2.4 and lower is
+ needed.
+
+ The ``change-group`` request is used to relocate multiple instances
+ across multiple node groups. ``node-evacuate`` evacuates instances
+ off their node(s). These are described in a separate :ref:`design
+ document <multi-reloc-detailed-design>`.
+
+ The ``multi-allocate`` request is used to allocate multiple
+ instances on the cluster. Otherwise, the request is very similar
+ to the ``allocate`` one. For more details, see
+ :doc:`Ganeti bulk create <design-bulk-create>`.
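The rest of the request depends on ``type``, so an allocator script will typically branch on it first. A minimal Python sketch of such a dispatcher follows. ``dispatch`` and the ``handlers`` mapping are hypothetical names. Rejecting unsupported types with an exception is a design choice of this sketch, not Ganeti behavior.

```python
# Request types listed above; "multi-evacuate" is deprecated but still
# recognised, so a script may choose to support it for Ganeti 2.4 and lower.
KNOWN_TYPES = ("allocate", "multi-allocate", "relocate", "change-group",
               "node-evacuate", "multi-evacuate")


def dispatch(data, handlers):
    """Call the handler registered for data["request"]["type"].

    handlers maps request type names to callables that take the whole
    input dictionary; unknown or unhandled types raise ValueError so that
    the script can report a failure instead of guessing.
    """
    req_type = data["request"]["type"]
    if req_type not in KNOWN_TYPES:
        raise ValueError("unknown request type: %s" % req_type)
    handler = handlers.get(req_type)
    if handler is None:
        raise ValueError("request type not implemented: %s" % req_type)
    return handler(data)
```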
+
+For both allocate and relocate mode, the following extra keys are needed
+in the ``request`` dictionary:
+
+ name
+ the name of the instance; if the request is a relocation, then this
+ name will be found in the list of instances (see above), otherwise
+ it is the FQDN of the new instance; type *string*
+
+ required_nodes
+ how many nodes should the algorithm return; while this information
+ can be deduced from the instance's disk template, it's better if
+ this computation is left to Ganeti as then allocator scripts are
+ less sensitive to changes to the disk templates; type *integer*
+
+ disk_space_total
+ the total disk space that will be used by this instance on the
+ (new) nodes; again, this information can be computed from the list
+ of instance disks and its template type, but Ganeti is better
+ suited to compute it; type *integer*
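The types stated for these keys can be checked mechanically. The sketch below uses a hypothetical ``request_errors`` helper. It collects every problem instead of stopping at the first one, so all of them could be reported together in the ``info`` field of a failed response. Booleans are rejected explicitly because Python treats ``bool`` as a subclass of ``int``.

```python
# Keys shared by "allocate" and "relocate" requests, with the types given
# in the list above.
COMMON_REQUEST_KEYS = {"name": str, "required_nodes": int,
                       "disk_space_total": int}


def request_errors(request):
    """Return a list of human-readable problems with the shared request keys."""
    errors = []
    for key, expected in COMMON_REQUEST_KEYS.items():
        if key not in request:
            errors.append("missing key %r" % key)
        elif (not isinstance(request[key], expected)
              or isinstance(request[key], bool)):
            errors.append("key %r should be of type %s"
                          % (key, expected.__name__))
    return errors
```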
+
+.. pyassert::
+
+ constants.DISK_ACCESS_SET == set([constants.DISK_RDONLY,
+ constants.DISK_RDWR])
+
+Allocation needs, in addition:
+
+ disks
+ list of dictionaries holding the disk definitions for this
+ instance (in the order they are exported to the hypervisor):
+
+ mode
+ either :pyeval:`constants.DISK_RDONLY` or
+ :pyeval:`constants.DISK_RDWR` denoting if the disk is read-only or
+ writable
+
+ size
+ the size of this disk in mebibytes
+
+ nics
+ a list of dictionaries holding the network interfaces for this
+ instance, containing:
+
+ ip
+ the IP address that Ganeti knows for this instance, or null
+
+ mac
+ the MAC address for this interface
+
+ bridge
+ the bridge to which this interface will be connected
+
+ vcpus
+ the number of VCPUs for the instance
+
+ disk_template
+ the disk template for the instance
+
+ memory
+ the memory size for the instance
+
+ os
+ the OS type for the instance
+
+ tags
+ the list of the instance's tags
+
+ hypervisor
+ the hypervisor of this instance
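As stated above, scripts should rely on the ``disk_space_total`` value that Ganeti sends. The Python sketch below only shows where a figure such as the ``3328`` in the example input message comes from. It assumes 128 MiB of DRBD metadata per disk on top of the disk sizes; that overhead is an assumption and is not stated in this document. Only the ``plain`` and ``drbd`` templates are covered, and ``estimate_disk_space_total`` is a hypothetical name.

```python
# Assumed per-disk DRBD metadata overhead in MiB (not stated in this
# document); it reproduces the 3328 of the example input message.
DRBD_META_MIB = 128


def estimate_disk_space_total(disk_template, disks):
    """Estimate the disk space in MiB that an instance needs on a node."""
    data_mib = sum(disk["size"] for disk in disks)
    if disk_template == "plain":
        return data_mib
    if disk_template == "drbd":
        # One metadata area per disk on top of the data itself.
        return data_mib + DRBD_META_MIB * len(disks)
    raise ValueError("disk template not covered by this sketch: %s"
                     % disk_template)
```

With the two disks of the example (1024 and 2048 MiB) and the ``drbd`` template, this gives 1024 + 2048 + 2 * 128 = 3328.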
+
+Relocation:
+
+ relocate_from
+ a list of nodes to move the instance away from; for DRBD-based
+ instances, this will contain a single node, the current secondary
+ of the instance, whereas for shared-storage instances, it will
+ also contain a single node, the current primary of the instance;
+ type *list of strings*
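A relocation has to pick replacement nodes inside the instance's node group. The Python sketch below uses a hypothetical ``relocation_candidates`` helper to list the nodes that share a group with the nodes in ``relocate_from``. It relies on the ``group`` key of the node entries shown in the example input message. It omits capacity checks and does not exclude the instance's other current node(s), both of which a real allocator must also handle.

```python
def relocation_candidates(data):
    """Return names of possible replacement nodes for a relocate request.

    Only nodes in the same node group(s) as the nodes listed in
    relocate_from are returned; capacity checks and the exclusion of the
    instance's other current node(s) are left out of this sketch.
    """
    nodes = data["nodes"]
    source = set(data["request"]["relocate_from"])
    groups = {nodes[name]["group"] for name in source}
    return sorted(name for name, node in nodes.items()
                  if name not in source and node["group"] in groups)
```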
+
+``node-evacuate`` needs the following request arguments:
+
+ instances
+ a list of instance names to evacuate; type *list of strings*
+
+ evac_mode
+ specify which instances to evacuate; one of ``primary-only``,
+ ``secondary-only`` or ``all``; type *string*
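Both arguments can be validated before any planning starts. Here is a sketch with a hypothetical ``check_node_evacuate`` helper, which accepts only the three ``evac_mode`` values listed above:

```python
# Values accepted for evac_mode, as listed above.
EVAC_MODES = ("primary-only", "secondary-only", "all")


def check_node_evacuate(request):
    """Validate a node-evacuate request; return (instances, evac_mode)."""
    instances = request["instances"]
    mode = request["evac_mode"]
    if not isinstance(instances, list) or not all(
            isinstance(name, str) for name in instances):
        raise ValueError("instances must be a list of strings")
    if mode not in EVAC_MODES:
        raise ValueError("evac_mode must be one of: %s" % ", ".join(EVAC_MODES))
    return instances, mode
```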
+
+``change-group`` needs the following request arguments:
+
+ instances
+ a list of names of the instances whose group should be changed; type
+ *list of strings*
+
+ target_groups
+ must either be the empty list, or contain a list of group UUIDs that
+ should be considered for relocating instances to; type
+ *list of strings*
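When choosing target groups for ``change-group``, a script can take each group's ``alloc_policy`` into account. The details of the Python sketch below are hypothetical. It assumes the policy values ``preferred``, ``last_resort`` and ``unallocable``; see :manpage:`gnt-group(8)` for their actual semantics. It skips unallocable groups and ranks preferred groups first. When ``target_groups`` is empty, it considers every group except those passed in as the instances' current groups; that reading of the empty list is an assumption.

```python
# Rank of the allocation policies this sketch accepts; groups whose policy
# is missing from this table (for example "unallocable") are skipped.
POLICY_RANK = {"preferred": 0, "last_resort": 1}


def candidate_groups(nodegroups, target_groups, current_groups=()):
    """Return usable target group UUIDs for change-group, best first.

    An explicit target_groups list is used as given; an empty one is
    assumed here to mean "any group the instances are not already in".
    """
    wanted = target_groups or [uuid for uuid in nodegroups
                               if uuid not in current_groups]
    usable = [uuid for uuid in wanted
              if nodegroups[uuid]["alloc_policy"] in POLICY_RANK]
    # sorted() is stable, so groups with the same policy keep their order.
    return sorted(usable,
                  key=lambda uuid: POLICY_RANK[nodegroups[uuid]["alloc_policy"]])
```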
+
+``multi-allocate`` needs the following request arguments:
+
+ instances
+ a list of request dictionaries, one per instance to allocate; type
+ *list of dictionaries*
Response message
~~~~~~~~~~~~~~~~
entry in the input message, otherwise Ganeti will consider the result
as failed
- for multi-evacuation mode, this is a list of lists; each element of
- the list is a list of instance name and the new secondary node
+ for the ``node-evacuate`` and ``change-group`` modes, this is a
+ dictionary containing, among other information, a list of lists of
+ serialized opcodes; see the :ref:`design document
+ <multi-reloc-result>` for a detailed description
+
+ for the ``multi-allocate`` mode, this is a tuple of two lists: the
+ first element of the tuple is the list of successful allocations,
+ each entry holding the instance name as its first element and the
+ node placement as its second; the second element of the tuple is
+ the list of instances whose allocation failed.
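JSON has no tuple type, so the two-element ``result`` of a ``multi-allocate`` response travels as a JSON array. The sketch below uses a hypothetical ``parse_multi_allocate_response`` helper to split it into two parts: a mapping from instance name to placement, and a list of failed instances. Representing the node placement as a list of node names is an assumption.

```python
import json


def parse_multi_allocate_response(text):
    """Return (placements, failed) from a multi-allocate response message.

    JSON has no tuples, so the two-element "result" arrives as an array;
    each successful entry is assumed to be [instance_name, node_list].
    """
    response = json.loads(text)
    if not response["success"]:
        raise RuntimeError(response.get("info", "allocation failed"))
    succeeded, failed = response["result"]
    placements = {name: nodes for name, nodes in succeeded}
    return placements, list(failed)
```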
.. note:: Current Ganeti version accepts either ``result`` or ``nodes``
as a backwards-compatibility measure (older versions only supported
Input messages to scripts
~~~~~~~~~~~~~~~~~~~~~~~~~
-Input message, new instance allocation::
+Input message, new instance allocation (common elements are listed this
+time, but not included in further examples below)::
{
+ "version": 2,
+ "cluster_name": "cluster1.example.com",
"cluster_tags": [],
- "request": {
- "required_nodes": 2,
- "name": "instance3.example.com",
- "tags": [
- "type:test",
- "owner:foo"
- ],
- "type": "allocate",
- "disks": [
- {
- "mode": "w",
- "size": 1024
+ "enabled_hypervisors": [
+ "xen-pvm"
+ ],
+ "nodegroups": {
+ "f4e06e0d-528a-4963-a5ad-10f3e114232d": {
+ "name": "default",
+ "alloc_policy": "preferred",
+ "networks": ["net-uuid-1", "net-uuid-2"],
+ "ipolicy": {
+ "disk-templates": ["drbd", "plain"],
+ "minmax": [
+ {
+ "max": {
+ "cpu-count": 2,
+ "disk-count": 8,
+ "disk-size": 2048,
+ "memory-size": 12800,
+ "nic-count": 8,
+ "spindle-use": 8
+ },
+ "min": {
+ "cpu-count": 1,
+ "disk-count": 1,
+ "disk-size": 1024,
+ "memory-size": 128,
+ "nic-count": 1,
+ "spindle-use": 1
+ }
+ }
+ ],
+ "spindle-ratio": 32.0,
+ "std": {
+ "cpu-count": 1,
+ "disk-count": 1,
+ "disk-size": 1024,
+ "memory-size": 128,
+ "nic-count": 1,
+ "spindle-use": 1
+ },
+ "vcpu-ratio": 4.0
},
- {
- "mode": "w",
- "size": 2048
- }
- ],
- "nics": [
- {
- "ip": null,
- "mac": "00:11:22:33:44:55",
- "bridge": null
- }
- ],
- "vcpus": 1,
- "disk_template": "drbd",
- "memory": 2048,
- "disk_space_total": 3328,
- "os": "debootstrap+default"
+ "tags": ["ng-tag-1", "ng-tag-2"]
+ }
},
- "cluster_name": "cluster1.example.com",
"instances": {
"instance1.example.com": {
"tags": [],
"os": "debootstrap+default"
}
},
- "version": 1,
"nodes": {
"node1.example.com": {
"total_disk": 858276,
"primary_ip": "198.51.100.1",
"secondary_ip": "192.0.2.1",
"tags": [],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 856740,
"total_memory": 4095
"primary_ip": "198.51.100.2",
"secondary_ip": "192.0.2.2",
"tags": ["test"],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 848320,
"total_memory": 4095
"primary_ip": "198.51.100.3",
"secondary_ip": "192.0.2.3",
"tags": [],
+ "group": "f4e06e0d-528a-4963-a5ad-10f3e114232d",
"free_memory": 3505,
"free_disk": 570648,
"total_memory": 4095
}
+ },
+ "request": {
+ "type": "allocate",
+ "name": "instance3.example.com",
+ "required_nodes": 2,
+ "disk_space_total": 3328,
+ "disks": [
+ {
+ "mode": "rw",
+ "size": 1024
+ },
+ {
+ "mode": "rw",
+ "size": 2048
+ }
+ ],
+ "nics": [
+ {
+ "ip": null,
+ "mac": "00:11:22:33:44:55",
+ "bridge": null
+ }
+ ],
+ "vcpus": 1,
+ "disk_template": "drbd",
+ "memory": 2048,
+ "os": "debootstrap+default",
+ "tags": [
+ "type:test",
+ "owner:foo"
+ ],
+ "hypervisor": "xen-pvm"
}
}
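Given an input such as the one above, a first filtering step might keep only the nodes whose free resources cover the request. The ``fitting_nodes`` helper below is hypothetical. It treats ``disk_space_total`` as the space needed on each selected node, which is an assumption. It skips nodes that lack ``free_memory`` or ``free_disk``, which the node description notes can be absent, and it ignores node groups, instance policies and DRBD node pairing.

```python
def fitting_nodes(data):
    """Names of nodes whose free memory and disk fit an allocate request.

    Nodes without free_memory/free_disk are skipped. Treating
    disk_space_total as the space needed on each selected node is an
    assumption of this sketch.
    """
    request = data["request"]
    fits = []
    for name, node in sorted(data["nodes"].items()):
        if "free_memory" not in node or "free_disk" not in node:
            continue
        if (node["free_memory"] >= request["memory"]
                and node["free_disk"] >= request["disk_space_total"]):
            fits.append(name)
    return fits
```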
-Input message, reallocation. Since only the request entry in the input
-message is changed, we show only this changed entry::
-
- "request": {
- "relocate_from": [
- "node3.example.com"
- ],
- "required_nodes": 1,
- "type": "relocate",
- "name": "instance2.example.com",
- "disk_space_total": 832
- },
-
+Input message, relocation::
-Input message, node evacuation::
-
- "request": {
- "evac_nodes": [
- "node2"
- ],
- "type": "multi-evacuate"
- },
+ {
+ "version": 2,
+ ...
+ "request": {
+ "type": "relocate",
+ "name": "instance2.example.com",
+ "required_nodes": 1,
+ "disk_space_total": 832,
+ "relocate_from": [
+ "node3.example.com"
+ ]
+ }
+ }
Response messages
Successful response message::
{
+ "success": true,
"info": "Allocation successful",
"result": [
"node2.example.com",
"node1.example.com"
- ],
- "success": true
+ ]
}
Failed response message::
{
+ "success": false,
"info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
- "result": [],
- "success": false
+ "result": []
}
Successful node evacuation message (legacy ``multi-evacuate`` format)::
{
+ "success": true,
"info": "Request successful",
"result": [
[
"instance2",
"node1"
]
- ],
- "success": true
+ ]
}
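A small helper can produce the three response keys. For illustration only, the Python sketch below assumes two things: the script receives the path of a file holding the input message as its first command-line argument, and it writes the response to standard output. ``make_response`` and ``main`` are hypothetical names, and the placeholder policy rejects every request.

```python
import json
import sys


def make_response(success, info, result):
    """Build a response message with the three keys used in the examples."""
    return {"success": bool(success), "info": str(info), "result": result}


def main(argv, out=sys.stdout):
    """Read the input message and write a response to out.

    Assumption: the input message is passed as the path of a file in
    argv[1]. The placeholder policy below rejects every request.
    """
    with open(argv[1]) as handle:
        data = json.load(handle)
    req_type = data["request"]["type"]
    response = make_response(False, "request type %s not handled" % req_type, [])
    out.write(json.dumps(response) + "\n")
```

A standalone script would end by calling ``main(sys.argv)``.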
Reference implementation
~~~~~~~~~~~~~~~~~~~~~~~~
+Ganeti's default iallocator is "hail", which is available when "htools"
-ganeti-htools project. In order to see its source code please clone
-``git://git.ganeti.org/htools.git``. Note that htools is implemented
-using the Haskell programming language.
+Ganeti's default iallocator is "hail" which is available when "htools"
+components have been enabled at build time (see :doc:`install-quick` for
+more details).
.. vim: set textwidth=72 :
.. Local Variables: