- `Python IP address manipulation library
<http://code.google.com/p/ipaddr-py/>`_
- `Bitarray Python library <http://pypi.python.org/pypi/bitarray/>`_
+- `GNU Make <http://www.gnu.org/software/make/>`_
These programs are supplied as part of most Linux distributions, so
usually they can be installed via the standard package manager. On
Debian/Ubuntu, you can use this command line to install all required
packages, except for RBD, DRBD and Xen::
- $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
+ $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping make \
ndisc6 python python-pyopenssl openssl \
python-pyparsing python-simplejson python-bitarray \
python-pyinotify python-pycurl python-ipaddr socat fping
Or on newer distributions (e.g. Debian Wheezy) the above becomes::
- $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
+ $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping make \
ndisc6 python python-openssl openssl \
python-pyparsing python-simplejson python-bitarray \
python-pyinotify python-pycurl python-ipaddr socat fping
On Fedora, to install all required packages except RBD, DRBD and Xen::
- $ yum install openssh openssh-clients bridge-utils iproute ndisc6 \
+ $ yum install openssh openssh-clients bridge-utils iproute ndisc6 make \
pyOpenSSL pyparsing python-simplejson python-inotify \
python-lxml socat fping python-bitarray python-ipaddr
or ``cabal``, after installing a required non-Haskell dependency::
- $ apt-get install libpcre3-dev
+ $ apt-get install libpcre3-dev libcurl4-openssl-dev
$ cabal install hslogger Crypto text hinotify==0.3.2 regex-pcre \
attoparsec vector snap-server
doc/install.rst \
doc/locking.rst \
doc/manpages-disabled.rst \
+ doc/monitoring-query-format.rst \
doc/move-instance.rst \
doc/news.rst \
doc/ovfconverter.rst \
# Things to build but not to install (add it to EXTRA_DIST if it should be
# distributed)
noinst_DATA = \
- doc/html \
$(BUILT_EXAMPLES) \
doc/examples/bash_completion \
doc/examples/bash_completion-debug \
$(manhtml)
+if HAS_SPHINX
if MANPAGES_IN_DOC
noinst_DATA += doc/man-html
+else
+noinst_DATA += doc/html
+endif
endif
gnt_scripts = \
for the unit tests (and only used for testing).
-Version 2.8.0 beta1
--------------------
+Version 2.8.0 rc1
+-----------------
-*(Released Mon, 24 Jun 2013)*
+*(Released Fri, 2 Aug 2013)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- hail now honors network restrictions when allocating nodes. This led to an
update of the IAllocator protocol. See the IAllocator documentation for
details.
+- confd now only answers static configuration requests over the network.
+  luxid was extracted; it listens on the local LUXI socket and responds
+  to live queries. This allows finer-grained permissions if using
+  separate users.
New features
~~~~~~~~~~~~
- New command ``show-ispecs-cmd`` for ``gnt-cluster`` and ``gnt-group``.
It prints the command line to set the current policies, to ease
changing them.
+- Add the ``vnet_hdr`` HV parameter for KVM, to control whether the tap
+ devices for KVM virtio-net interfaces will get created with VNET_HDR
+ (IFF_VNET_HDR) support. If set to false, it disables offloading on the
+ virtio-net interfaces, which prevents host kernel tainting and log
+ flooding, when dealing with broken or malicious virtio-net drivers.
+ It's set to true by default.
+- Instance failover now supports a ``--cleanup`` parameter for fixing previous
+ failures.
New dependencies
~~~~~~~~~~~~~~~~
- The minimum Python version needed to run Ganeti is now 2.6.
- ``yaml`` library (only for running the QA).
+Since 2.8.0 beta1
+~~~~~~~~~~~~~~~~~
+
+- Fix upgrading/downgrading from 2.7
+- Increase maximum RAPI message size
+- Documentation updates
+- Split ``confd`` between ``luxid`` and ``confd``
+- Merge 2.7 series up to the 2.7.1 release
+- Allow the ``modify_etc_hosts`` option to be changed
+- Add better debugging for ``luxid`` queries
+- Expose bulk parameter for GetJobs in RAPI client
+- Expose missing ``network`` fields in RAPI
+- Add some ``cluster verify`` tests
+- Some unittest fixes
+- Fix a malfunction in ``hspace``'s tiered allocation
+- Fix query compatibility between Haskell and Python implementations
+- Add the ``vnet_hdr`` HV parameter for KVM
+- Add ``--cleanup`` to instance failover
+- Change the connected groups format in ``gnt-network info`` output; it
+ was previously displayed as a raw list by mistake. (Merged from 2.7)
+
+
+Version 2.8.0 beta1
+-------------------
+
+*(Released Mon, 24 Jun 2013)*
+
+This was the first beta release of the 2.8 series. All important changes
+are listed in the latest 2.8 entry.
+
Version 2.7.1
-------------
$ /usr/lib/ganeti/ensure-dirs --full-run
#. Create the (missing) required users and make users part of the required
-groups on all nodes::
+ groups on all nodes::
$ /usr/lib/ganeti/tools/users-setup
upgrading again.
#. Install the old Ganeti version on all nodes
+
+   NB: in Ganeti 2.8, the ``cmdlib.py`` file was split into a series of
+   files contained in the ``cmdlib`` directory. If Ganeti was installed
+   from source and not from a package, when downgrading to a pre-2.8
+   version remember to remove the ``cmdlib`` directory from the
+   directory containing the Ganeti python files (which usually is
+   ``${PREFIX}/lib/python${VERSION}/dist-packages/ganeti``).
+ A simpler upgrade/downgrade procedure will be made available in future
+ versions of Ganeti.
+
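+   For example, assuming Python 2.7 and a ``/usr`` prefix (adjust the
+   path to your installation), the removal could be done with::
+
+     $ rm -rf /usr/lib/python2.7/dist-packages/ganeti/cmdlib
+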
#. Restart daemons on all nodes::
$ /etc/init.d/ganeti restart
AC_MSG_ERROR([Sphinx 1.0 or higher is required])
fi
fi
+AM_CONDITIONAL([HAS_SPHINX], [test -n "$SPHINX"])
AC_ARG_ENABLE([manpages-in-doc],
[AS_HELP_STRING([--enable-manpages-in-doc],
shopt -s expand_aliases
alias in_chroot='schroot -c $CHNAME -d / '
-alias subst_variables='sed \
- -e "s/\${ARCH}/$ARCH/" \
- -e "s*\${CHDIR}*$CHDIR*" \
- -e "s/\${CHNAME}/$CHNAME/" \
- -e "s/\${CHROOTNAME}/$CHROOTNAME/" \
- -e "s*\${CHROOT_DIR}*$CHROOT_DIR*" \
- -e "s/\${COMP_FILENAME}/$COMP_FILENAME/" \
- -e "s/\${DIST_RELEASE}/$DIST_RELEASE/"'
+function subst_variables {
+ sed \
+ -e "s/\${ARCH}/$ARCH/" \
+ -e "s*\${CHDIR}*$CHDIR*" \
+ -e "s/\${CHNAME}/$CHNAME/" \
+ -e "s/\${CHROOTNAME}/$CHROOTNAME/" \
+ -e "s*\${CHROOT_DIR}*$CHROOT_DIR*" \
+ -e "s/\${COMP_FILENAME}/$COMP_FILENAME/" \
+    -e "s/\${DIST_RELEASE}/$DIST_RELEASE/" "$@"
+}
# Generate chroot configurations
cat $ACTUAL_DATA_DIR/temp.schroot.conf.in | subst_variables > $TEMP_CHROOT_CONF
- :doc:`design-reason-trail`
- :doc:`design-autorepair`
+- :doc:`design-device-uuid-name`
The following designs have been partially implemented in Ganeti 2.8:
and works on Linux, but is not portable; however, Ganeti doesn't work on
non-Linux systems at the moment.
+Luxi daemon
+-----------
+
+The ``luxid`` daemon (automatically enabled if ``confd`` is enabled at
+build time) serves local (UNIX socket) queries about the run-time
+configuration. Answering these means talking to other cluster nodes,
+exactly as ``masterd`` does. See the notes for ``masterd`` regarding
+permission-based protection.
+
Conf daemon
-----------
In Ganeti 2.8, the ``confd`` daemon (if enabled at build time) serves
-both network-originated queries (about the static configuration) and
-local (UNIX socket) queries (about the run-time configuration; answering
-these means talking to other cluster nodes, which makes use of the
-internal RPC SSL certificate). This makes it a bit more sensitive to
-bugs (a remote attacker could get direct access to the intra-cluster
-RPC), so to harden security it's recommended to:
-
-- disable confd at build time if it's not needed in your setup
-- otherwise, configure Ganeti (at build time) to use separate users, so
- that the confd daemon doesn't also have access to the server SSL/TLS
+network-originated queries about parts of the static cluster
+configuration.
+
+If Ganeti is not configured (at build time) to use separate users,
+``confd`` has access to all Ganeti-related files (including internal RPC
+SSL certificates). This makes it a bit more sensitive to bugs (a remote
+attacker could get direct access to the intra-cluster RPC), so to harden
+security it's recommended to:
+
+- disable confd at build time if it (and ``luxid``) is not needed in
+ your setup.
+- configure Ganeti (at build time) to use separate users, so that the
+ confd daemon doesn't also have access to the server SSL/TLS
certificates.
-
-NB: the second suggestion is not valid since Ganeti 2.8.0~beta1, because confd
-needs access to the certificate in order to communicate on the network.
-This will be fixed when the planned split of the two functionalities
-(local/remote querying) of confd into two separate daemons will take place,
-in a future Ganeti version.
+- add firewall rules to protect the ``confd`` port or bind it to a
+ trusted address. Make sure that all nodes can access the daemon, as
+ the monitoring daemon requires it.
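+
+As an illustration (a sketch only; it assumes the default ``confd`` port
+of 1814/udp and 192.0.2.0/24 as the cluster network, both of which may
+differ in your setup), such firewall rules could look like::
+
+  $ iptables -A INPUT -p udp --dport 1814 -s 192.0.2.0/24 -j ACCEPT
+  $ iptables -A INPUT -p udp --dport 1814 -j DROP
+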
Monitoring daemon
-----------------
errors.ECODE_INVAL)
# set up ssh config and /etc/hosts
- sshline = utils.ReadFile(pathutils.SSH_HOST_RSA_PUB)
- sshkey = sshline.split(" ")[1]
+ rsa_sshkey = ""
+ dsa_sshkey = ""
+ if os.path.isfile(pathutils.SSH_HOST_RSA_PUB):
+ sshline = utils.ReadFile(pathutils.SSH_HOST_RSA_PUB)
+ rsa_sshkey = sshline.split(" ")[1]
+ if os.path.isfile(pathutils.SSH_HOST_DSA_PUB):
+ sshline = utils.ReadFile(pathutils.SSH_HOST_DSA_PUB)
+ dsa_sshkey = sshline.split(" ")[1]
+ if not rsa_sshkey and not dsa_sshkey:
+ raise errors.OpPrereqError("Failed to find SSH public keys",
+ errors.ECODE_ENVIRON)
if modify_etc_hosts:
utils.AddHostToEtcHosts(hostname.name, hostname.ip)
# init of cluster config file
cluster_config = objects.Cluster(
serial_no=1,
- rsahostkeypub=sshkey,
+ rsahostkeypub=rsa_sshkey,
+ dsahostkeypub=dsa_sshkey,
highest_used_port=(constants.FIRST_DRBD_PORT - 1),
mac_prefix=mac_prefix,
volume_group_name=vg_name,
CLEANUP_OPT = cli_option("--cleanup", dest="cleanup",
default=False, action="store_true",
- help="Instead of performing the migration, try to"
- " recover from a failed cleanup. This is safe"
+ help="Instead of performing the migration/failover,"
+ " try to recover from a failed cleanup. This is safe"
" to run even if the instance is healthy, but it"
" will create extra replication traffic and"
" briefly disrupt the replication (like during the"
- " migration")
+ " migration/failover")
STATIC_OPT = cli_option("-s", "--static", dest="static",
action="store_true", default=False,
ToStdout("Configuration format: %s", result["config_version"])
ToStdout("OS api version: %s", result["os_api_version"])
ToStdout("Export interface: %s", result["export_version"])
+ ToStdout("VCS version: %s", result["vcs_version"])
return 0
[FORCE_OPT, IGNORE_CONSIST_OPT] + SUBMIT_OPTS +
[SHUTDOWN_TIMEOUT_OPT,
DRY_RUN_OPT, PRIORITY_OPT, DST_NODE_OPT, IALLOCATOR_OPT,
- IGNORE_IPOLICY_OPT],
+ IGNORE_IPOLICY_OPT, CLEANUP_OPT],
"[-f] <instance>", "Stops the instance, changes its primary node and"
" (if it was originally running) starts it on the new node"
" (the secondary for mirrored instances or any node"
if group_list:
ToStdout(" connected to node groups:")
- for group in group_list:
- ToStdout(" %s", group)
+ for group, nic_mode, nic_link in group_list:
+ ToStdout(" %s (%s on %s)", group, nic_mode, nic_link)
else:
ToStdout(" not connected to any node group")
"config_version": constants.CONFIG_VERSION,
"os_api_version": max(constants.OS_API_VERSIONS),
"export_version": constants.EXPORT_VERSION,
+ "vcs_version": constants.VCS_VERSION,
"architecture": runtime.GetArchInfo(),
"name": cluster.cluster_name,
"master": self.cfg.GetMasterNodeName(),
self.new_diskparams = objects.FillDict(cluster.diskparams, {})
if self.op.diskparams:
for dt_name, dt_params in self.op.diskparams.items():
- if dt_name not in self.op.diskparams:
+ if dt_name not in self.new_diskparams:
self.new_diskparams[dt_name] = dt_params
else:
self.new_diskparams[dt_name].update(dt_params)
(errcode, msg) = _VerifyCertificate(cert_filename)
self._ErrorIf(errcode, constants.CV_ECLUSTERCERT, None, msg, code=errcode)
- self._ErrorIf(not utils.CanRead(constants.CONFD_USER,
+ self._ErrorIf(not utils.CanRead(constants.LUXID_USER,
pathutils.NODED_CERT_FILE),
constants.CV_ECLUSTERCERT,
None,
pathutils.NODED_CERT_FILE + " must be accessible by the " +
- constants.CONFD_USER + " user")
+ constants.LUXID_USER + " user")
feedback_fn("* Verifying hypervisor parameters")
" pnode/snode while others do not",
errors.ECODE_INVAL)
- if self.op.iallocator is None:
+ if not has_nodes and self.op.iallocator is None:
default_iallocator = self.cfg.GetDefaultIAllocator()
- if default_iallocator and has_nodes:
+ if default_iallocator:
self.op.iallocator = default_iallocator
else:
raise errors.OpPrereqError("No iallocator or nodes on the instances"
"""Check prerequisite.
"""
- cluster = self.cfg.GetClusterInfo()
- default_vg = self.cfg.GetVGName()
- ec_id = self.proc.GetECId()
+ if self.op.iallocator:
+ cluster = self.cfg.GetClusterInfo()
+ default_vg = self.cfg.GetVGName()
+ ec_id = self.proc.GetECId()
- if self.op.opportunistic_locking:
- # Only consider nodes for which a lock is held
- node_whitelist = self.cfg.GetNodeNames(
- list(self.owned_locks(locking.LEVEL_NODE)))
- else:
- node_whitelist = None
+ if self.op.opportunistic_locking:
+ # Only consider nodes for which a lock is held
+ node_whitelist = self.cfg.GetNodeNames(
+ list(self.owned_locks(locking.LEVEL_NODE)))
+ else:
+ node_whitelist = None
- insts = [_CreateInstanceAllocRequest(op, ComputeDisks(op, default_vg),
- _ComputeNics(op, cluster, None,
- self.cfg, ec_id),
- _ComputeFullBeParams(op, cluster),
- node_whitelist)
- for op in self.op.instances]
+ insts = [_CreateInstanceAllocRequest(op, ComputeDisks(op, default_vg),
+ _ComputeNics(op, cluster, None,
+ self.cfg, ec_id),
+ _ComputeFullBeParams(op, cluster),
+ node_whitelist)
+ for op in self.op.instances]
- req = iallocator.IAReqMultiInstanceAlloc(instances=insts)
- ial = iallocator.IAllocator(self.cfg, self.rpc, req)
+ req = iallocator.IAReqMultiInstanceAlloc(instances=insts)
+ ial = iallocator.IAllocator(self.cfg, self.rpc, req)
- ial.Run(self.op.iallocator)
+ ial.Run(self.op.iallocator)
- if not ial.success:
- raise errors.OpPrereqError("Can't compute nodes using"
- " iallocator '%s': %s" %
- (self.op.iallocator, ial.info),
- errors.ECODE_NORES)
+ if not ial.success:
+ raise errors.OpPrereqError("Can't compute nodes using"
+ " iallocator '%s': %s" %
+ (self.op.iallocator, ial.info),
+ errors.ECODE_NORES)
- self.ia_result = ial.result
+ self.ia_result = ial.result
if self.op.dry_run:
self.dry_run_result = objects.FillDict(self._ConstructPartialResult(), {
"""Contructs the partial result.
"""
- (allocatable, failed) = self.ia_result
+ if self.op.iallocator:
+ (allocatable, failed_insts) = self.ia_result
+ allocatable_insts = map(compat.fst, allocatable)
+ else:
+ allocatable_insts = [op.instance_name for op in self.op.instances]
+ failed_insts = []
+
return {
- opcodes.OpInstanceMultiAlloc.ALLOCATABLE_KEY:
- map(compat.fst, allocatable),
- opcodes.OpInstanceMultiAlloc.FAILED_KEY: failed,
+ opcodes.OpInstanceMultiAlloc.ALLOCATABLE_KEY: allocatable_insts,
+ opcodes.OpInstanceMultiAlloc.FAILED_KEY: failed_insts,
}
def Exec(self, feedback_fn):
"""Executes the opcode.
"""
- op2inst = dict((op.instance_name, op) for op in self.op.instances)
- (allocatable, failed) = self.ia_result
-
jobs = []
- for (name, node_names) in allocatable:
- op = op2inst.pop(name)
+ if self.op.iallocator:
+ op2inst = dict((op.instance_name, op) for op in self.op.instances)
+ (allocatable, failed) = self.ia_result
- (op.pnode_uuid, op.pnode) = \
- ExpandNodeUuidAndName(self.cfg, None, node_names[0])
- if len(node_names) > 1:
- (op.snode_uuid, op.snode) = \
- ExpandNodeUuidAndName(self.cfg, None, node_names[1])
+ for (name, node_names) in allocatable:
+ op = op2inst.pop(name)
- jobs.append([op])
+ (op.pnode_uuid, op.pnode) = \
+ ExpandNodeUuidAndName(self.cfg, None, node_names[0])
+ if len(node_names) > 1:
+ (op.snode_uuid, op.snode) = \
+ ExpandNodeUuidAndName(self.cfg, None, node_names[1])
- missing = set(op2inst.keys()) - set(failed)
- assert not missing, \
- "Iallocator did return incomplete result: %s" % utils.CommaJoin(missing)
+ jobs.append([op])
+
+ missing = set(op2inst.keys()) - set(failed)
+ assert not missing, \
+        "Iallocator returned an incomplete result: %s" % \
+ utils.CommaJoin(missing)
+ else:
+ jobs.extend([op] for op in self.op.instances)
return ResultWithJobs(jobs, **self._ConstructPartialResult())
self._migrater = \
TLMigrateInstance(self, self.op.instance_uuid, self.op.instance_name,
- False, True, False, self.op.ignore_consistency, True,
+ self.op.cleanup, True, False,
+ self.op.ignore_consistency, True,
self.op.shutdown_timeout, self.op.ignore_ipolicy)
self.tasklets = [self._migrater]
"SHUTDOWN_TIMEOUT": self.op.shutdown_timeout,
"OLD_PRIMARY": self.cfg.GetNodeName(source_node_uuid),
"NEW_PRIMARY": self.op.target_node,
+ "FAILOVER_CLEANUP": self.op.cleanup,
}
if instance.disk_template in constants.DTS_INT_MIRROR:
return lvnames
def _AllDisks(self):
- """Compute the list of all Disks.
+ """Compute the list of all Disks (recursively, including children).
"""
+ def DiskAndAllChildren(disk):
+      """Returns a list containing the given disk and all of its children.
+
+ """
+ disks = [disk]
+ if disk.children:
+ for child_disk in disk.children:
+ disks.extend(DiskAndAllChildren(child_disk))
+ return disks
+
disks = []
for instance in self._config_data.instances.values():
- disks.extend(instance.disks)
+ for disk in instance.disks:
+ disks.extend(DiskAndAllChildren(disk))
return disks
def _AllNICs(self):
return self._config_data.cluster.enabled_hypervisors[0]
@locking.ssynchronized(_config_lock, shared=1)
- def GetHostKey(self):
+ def GetRsaHostKey(self):
"""Return the rsa hostkey from the config.
@rtype: string
return self._config_data.cluster.rsahostkeypub
@locking.ssynchronized(_config_lock, shared=1)
+ def GetDsaHostKey(self):
+ """Return the dsa hostkey from the config.
+
+ @rtype: string
+ @return: the dsa hostkey
+
+ """
+ return self._config_data.cluster.dsahostkeypub
+
+ @locking.ssynchronized(_config_lock, shared=1)
def GetDefaultIAllocator(self):
"""Get the default instance allocator for this cluster.
HV_VIF_SCRIPT = "vif_script"
HV_XEN_CMD = "xen_cmd"
HV_VNET_HDR = "vnet_hdr"
+HV_VIRIDIAN = "viridian"
HVS_PARAMETER_TYPES = {
HV_VIF_SCRIPT: VTYPE_STRING,
HV_XEN_CMD: VTYPE_STRING,
HV_VNET_HDR: VTYPE_BOOL,
+ HV_VIRIDIAN: VTYPE_BOOL,
}
HVS_PARAMETERS = frozenset(HVS_PARAMETER_TYPES.keys())
HV_CPU_WEIGHT: 256,
HV_VIF_TYPE: HT_HVM_VIF_IOEMU,
HV_VIF_SCRIPT: "",
+ HV_VIRIDIAN: False,
HV_XEN_CMD: XEN_CMD_XM,
},
HT_KVM: {
constants.HV_VIF_TYPE:
hv_base.ParamInSet(False, constants.HT_HVM_VALID_VIF_TYPES),
constants.HV_VIF_SCRIPT: hv_base.OPT_FILE_CHECK,
+ constants.HV_VIRIDIAN: hv_base.NO_CHECK,
constants.HV_XEN_CMD:
hv_base.ParamInSet(True, constants.KNOWN_XEN_COMMANDS),
}
config.write("acpi = 1\n")
else:
config.write("acpi = 0\n")
+ if hvp[constants.HV_VIRIDIAN]:
+ config.write("viridian = 1\n")
+ else:
+ config.write("viridian = 0\n")
+
config.write("apic = 1\n")
config.write("device_model = '%s'\n" % hvp[constants.HV_DEVICE_MODEL])
config.write("boot = '%s'\n" % hvp[constants.HV_BOOT_ORDER])
__slots__ = [
"serial_no",
"rsahostkeypub",
+ "dsahostkeypub",
"highest_used_port",
"tcpudp_port_pool",
"mac_prefix",
" for details"),
(COMMENT_ATTR, None, ht.TMaybeString,
"Comment describing the purpose of the opcode"),
- (constants.OPCODE_REASON, None, ht.TMaybeList,
+ (constants.OPCODE_REASON, ht.EmptyList, ht.TMaybeList,
"The reason trail, describing why the OpCode is executed"),
]
OP_RESULT = None
_PIgnoreIpolicy,
_PIAllocFromDesc("Iallocator for deciding the target node for"
" shared-storage instances"),
+ ("cleanup", False, ht.TBool,
+ "Whether a previously failed failover should be cleaned up"),
]
OP_RESULT = ht.TNone
"API version for OS template scripts"),
"export_version": ("ExportVersion", QFT_NUMBER, constants.EXPORT_VERSION,
"Import/export file format version"),
+ "vcs_version": ("VCSVersion", QFT_TEXT, constants.VCS_VERSION,
+ "VCS version"),
}
"""Writes the cluster-wide equally known_hosts file.
"""
- utils.WriteFile(file_name, mode=0600,
- data="%s ssh-rsa %s\n" % (cfg.GetClusterName(),
- cfg.GetHostKey()))
+ data = ""
+ if cfg.GetRsaHostKey():
+ data += "%s ssh-rsa %s\n" % (cfg.GetClusterName(), cfg.GetRsaHostKey())
+ if cfg.GetDsaHostKey():
+ data += "%s ssh-dss %s\n" % (cfg.GetClusterName(), cfg.GetDsaHostKey())
+
+ utils.WriteFile(file_name, mode=0600, data=data)
if uid_range not in uid_pool:
raise errors.OpPrereqError(
"User-id range to be removed is not found in the current"
- " user-id pool: %s" % uid_range, errors.ECODE_INVAL)
+ " user-id pool: %s" % str(uid_range), errors.ECODE_INVAL)
uid_pool.remove(uid_range)
pae
Valid for the Xen HVM and KVM hypervisors.
- A boolean option that specifies if the hypervisor should enabled
+ A boolean option that specifies if the hypervisor should enable
PAE support for this instance. The default is false, disabling PAE
support.
+viridian
+ Valid for the Xen HVM hypervisor.
+
+ A boolean option that specifies if the hypervisor should enable
+ viridian (Hyper-V) for this instance. The default is false,
+ disabling viridian support.
+
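+  For example (a sketch; the instance name is illustrative), viridian
+  support could be enabled on an existing instance with::
+
+    $ gnt-instance modify -H viridian=true instance1.example.com
+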
use\_localtime
Valid for the Xen HVM and KVM hypervisors.
Path to the userspace KVM (or qemu) program.
+vnet\_hdr
+ Valid for the KVM hypervisor.
+
+ This boolean option determines whether the tap devices used by the
+ KVM paravirtual nics (virtio-net) will get created with VNET_HDR
+ (IFF_VNET_HDR) support.
+
+  If set to false, it effectively disables offloading on the virtio-net
+ interfaces, which prevents host kernel tainting and log flooding,
+ when dealing with broken or malicious virtio-net drivers.
+
+ It is set to ``true`` by default.
+
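+  For example, offloading could be disabled when creating a KVM
+  instance (a sketch; the disk size, OS name and instance name are
+  illustrative)::
+
+    $ gnt-instance add -t plain -s 10G -o debootstrap -I hail \
+        -H kvm:vnet_hdr=false instance1.example.com
+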
The ``-O (--os-parameters)`` option allows customisation of the OS
parameters. The actual parameter names and values depend on the OS
being used, but the syntax is the same key=value. For example, setting
BATCH-CREATE
^^^^^^^^^^^^
-**batch-create** {instances\_file.json}
+| **batch-create**
+| [{-I|\--iallocator} *instance allocator*]
+| {instances\_file.json}
This command (similar to the Ganeti 1.2 **batcher** tool) submits
-multiple instance creation jobs based on a definition file. The
-instance configurations do not encompass all the possible options for
-the **add** command, but only a subset.
+multiple instance creation jobs based on a definition file. This
+file can contain all options which are valid when adding an instance
+with the exception of the ``iallocator`` field. The IAllocator is,
+for optimization purposes, only allowed to be set for the whole batch
+operation using the ``--iallocator`` parameter.
-The instance file should be a valid-formed JSON file, containing a
-dictionary with instance name and instance parameters. The accepted
-parameters are:
+The instance file must be a well-formed JSON file, containing an
+array of dictionaries with instance creation parameters. All parameters
+(except ``iallocator``) which are valid for the instance creation
+opcode are allowed. The most important ones are:
-disk\_size
- The size of the disks of the instance.
+instance\_name
+ The FQDN of the new instance.
disk\_template
The disk template to use for the instance, the same as in the
**add** command.
-backend
+disks
+ Array of disk specifications. Each entry describes one disk as a
+ dictionary of disk parameters.
+
+beparams
A dictionary of backend parameters.
hypervisor
- A dictionary with a single key (the hypervisor name), and as value
- the hypervisor options. If not passed, the default hypervisor and
- hypervisor options will be inherited.
+ The hypervisor for the instance.
-mac, ip, mode, link
- Specifications for the one NIC that will be created for the
- instance. 'bridge' is also accepted as a backwards compatible
- key.
+hvparams
+ A dictionary with the hypervisor options. If not passed, the default
+ hypervisor options will be inherited.
nics
List of NICs that will be created for the instance. Each entry
Please don't provide the "mac, ip, mode, link" parent keys if you
use this method for specifying NICs.
-primary\_node, secondary\_node
+pnode, snode
The primary and optionally the secondary node to use for the
- instance (in case an iallocator script is not used).
-
-iallocator
- Instead of specifying the nodes, an iallocator script can be used
- to automatically compute them.
+ instance (in case an iallocator script is not used). If those
+ parameters are given, they have to be given consistently for all
+ instances in the batch operation.
start
whether to start the instance
A simple definition for two instances can be (with most of the
parameters taken from the cluster defaults)::
- {
- "instance3": {
- "template": "drbd",
- "os": "debootstrap",
- "disk_size": ["25G"],
- "iallocator": "dumb"
+ [
+ {
+ "mode": "create",
+ "instance_name": "instance1.example.com",
+ "disk_template": "drbd",
+ "os_type": "debootstrap",
+ "disks": [{"size":"1024"}],
+ "nics": [{}],
+ "hypervisor": "xen-pvm"
},
- "instance5": {
- "template": "drbd",
- "os": "debootstrap",
- "disk_size": ["25G"],
- "iallocator": "dumb",
+ {
+ "mode": "create",
+ "instance_name": "instance2.example.com",
+ "disk_template": "drbd",
+ "os_type": "debootstrap",
+ "disks": [{"size":"4096", "mode": "rw", "vg": "xenvg"}],
+ "nics": [{}],
"hypervisor": "xen-hvm",
"hvparams": {"acpi": true},
- "backend": {"maxmem": 512, "minmem": 256}
+ "beparams": {"maxmem": 512, "minmem": 256}
}
- }
+ ]
The command will display the IDs of the submitted jobs, as
follows::
# gnt-instance batch-create instances.json
- instance3: 11224
- instance5: 11225
+ Submitted jobs 37, 38
REMOVE
^^^^^^
| **failover** [-f] [\--ignore-consistency] [\--ignore-ipolicy]
| [\--shutdown-timeout=*N*]
| [{-n|\--target-node} *node* \| {-I|\--iallocator} *name*]
+| [\--cleanup]
| [\--submit] [\--print-job-id]
| {*instance*}
If ``--ignore-ipolicy`` is given, any instance policy violations occurring
during this operation are ignored.
+If the ``--cleanup`` option is passed, the operation changes from
+performing a failover to attempting recovery from a previously failed
+failover. In this mode, Ganeti checks if the instance runs on the
+correct node (and updates its configuration if not) and ensures the
+instance's disks are configured correctly.
+
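+For example, recovering from a previously failed failover could be done
+with (the instance name is illustrative)::
+
+  $ gnt-instance failover --cleanup instance1.example.com
+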
See **ganeti**\(7) for a description of ``--submit`` and other common
options.
| [\--gateway6=*GATEWAY6*]
| [\--mac-prefix=*MACPREFIX*]
| [\--submit] [\--print-job-id]
+| [\--no-conflicts-check]
| {*network*}
Creates a new network with the given name. The network will be unused
``--gateway6`` options. IP pool is meaningless for IPv6 so those two
values can be used for EUI64 generation from a NIC's MAC address.
+The ``--no-conflicts-check`` option can be used to skip the check for
+conflicting IP addresses.
+
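+For example (a sketch; the network name and address range are
+illustrative)::
+
+  $ gnt-network add --network=192.0.2.0/24 --no-conflicts-check net1
+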
Note that when connecting a network to a node group (see below) you
can also specify the NIC mode and link that will be used by instances on
that group to physically connect to this network. This allows the system
CONNECT
~~~~~~~
-| **connect** {*network*} {*mode*} {*link*} [*groups*...]
+| **connect**
+| [\--no-conflicts-check]
+| {*network*} {*mode*} {*link*} [*groups*...]
Connect a network to given node groups (all if not specified) with the
network parameters *mode* and *link*. Every network interface will
inherit those parameters if assigned in a network.
+The ``--no-conflicts-check`` option can be used to skip the check for
+conflicting IP addresses.
+
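+For example, connecting a network to a single node group in bridged
+mode could look like (a sketch; names are illustrative)::
+
+  $ gnt-network connect --no-conflicts-check net1 bridged br0 group1
+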
DISCONNECT
~~~~~~~~~~
- a node to go into N+1 failure state
- an instance to move onto an offline node (offline nodes are either
- read from the cluster or declared with *-O*)
+ read from the cluster or declared with *-O*; drained nodes are
+ considered offline)
- an exclusion-tag based conflict (exclusion tags are read from the
cluster and/or defined via the *\--exclusion-tags* option)
- a max vcpu/pcpu ratio to be exceeded (configured via *\--max-cpu*)
~~~~~~~~~~~~~~~
As said before, the algorithm tries to minimise the cluster score at
-each step. Currently this score is computed as a sum of the following
-components:
+each step. Currently this score is computed as a weighted sum of the
+following components:
- standard deviation of the percent of free memory
- standard deviation of the percent of reserved memory
- standard deviation of the percent of free disk
- count of nodes failing N+1 check
- count of instances living (either as primary or secondary) on
- offline nodes
+  offline nodes; in the sense of hbal (and the other htools), drained
+  nodes are considered offline
- count of instances living (as primary) on offline nodes; this
differs from the above metric by helping failover of such instances
in 2-node clusters
, opTargetNodeUuid = Nothing
, opIgnoreIpolicy = False
, opIallocator = Nothing
+ , opMigrationCleanup = False
}
])
| offSec ->
-- * Cluster definitions
$(buildObject "Cluster" "cluster" $
[ simpleField "rsahostkeypub" [t| String |]
+ , simpleField "dsahostkeypub" [t| String |]
, simpleField "highest_used_port" [t| Int |]
, simpleField "tcpudp_port_pool" [t| [Int] |]
, simpleField "mac_prefix" [t| String |]
, pMigrationTargetNodeUuid
, pIgnoreIpolicy
, pIallocator
+ , pMigrationCleanup
])
, ("OpInstanceMigrate",
[ pInstanceName
, ("config_version", showJSON C.configVersion)
, ("os_api_version", showJSON $ maximum C.osApiVersions)
, ("export_version", showJSON C.exportVersion)
+ , ("vcs_version", showJSON C.vcsVersion)
, ("architecture", showJSON arch_tuple)
, ("name", showJSON $ clusterClusterName cluster)
, ("master", showJSON (case master of
"OP_INSTANCE_FAILOVER" ->
OpCodes.OpInstanceFailover <$> genFQDN <*> return Nothing <*>
arbitrary <*> arbitrary <*> genMaybe genNodeNameNE <*>
- return Nothing <*> arbitrary <*> genMaybe genNameNE
+ return Nothing <*> arbitrary <*> genMaybe genNameNE <*> arbitrary
"OP_INSTANCE_MIGRATE" ->
OpCodes.OpInstanceMigrate <$> genFQDN <*> return Nothing <*>
arbitrary <*> arbitrary <*> genMaybe genNodeNameNE <*>
def testNormal(self):
tmpname = utils.PathJoin(self.tmpdir, "foobar")
os.mkdir(tmpname)
+ os.chmod(tmpname, 0755)
self.assertTrue(os.path.isdir(tmpname))
(status, msg) = \
backend._VerifyRestrictedCmdDirectory(tmpname,
cluster_config = objects.Cluster(
serial_no=1,
rsahostkeypub="",
+ dsahostkeypub="",
highest_used_port=(constants.FIRST_DRBD_PORT - 1),
mac_prefix="aa:00:00",
volume_group_name="xenvg",
cfg = mocks.FakeConfig()
ssh.WriteKnownHostsFile(cfg, self.tmpfile)
self.assertFileContent(self.tmpfile,
- "%s ssh-rsa %s\n" % (cfg.GetClusterName(),
- mocks.FAKE_CLUSTER_KEY))
+ "%s ssh-rsa %s\n%s ssh-dss %s\n" %
+ (cfg.GetClusterName(), mocks.FAKE_CLUSTER_KEY,
+ cfg.GetClusterName(), mocks.FAKE_CLUSTER_KEY))
class TestGetUserFiles(unittest.TestCase):
def GetNodeList(self):
return ["a", "b", "c"]
- def GetHostKey(self):
+ def GetRsaHostKey(self):
+ return FAKE_CLUSTER_KEY
+
+ def GetDsaHostKey(self):
return FAKE_CLUSTER_KEY
def GetClusterName(self):
"""
cfg["cluster"]["rsahostkeypub"] = ""
+ cfg["cluster"]["dsahostkeypub"] = ""
for instance in cfg["instances"].values():
for disk in instance["disks"]:
RandomizeDiskSecrets(disk)