X-Git-Url: https://code.grnet.gr/git/ganeti-local/blobdiff_plain/d7f1864041513efa7b086f6b81fdabcd65714c6e..d971402f988becaf3e8d8d555ed03cb4482e09e5:/NEWS diff --git a/NEWS b/NEWS index 1783ea1..f9b8a72 100644 --- a/NEWS +++ b/NEWS @@ -1,422 +1,2375 @@ -Ganeti-htools release notes -=========================== - - -Version 0.2.8 (Thu, 23 Dec 2010) --------------------------------- - -A bug fix release: - -- fixed balancing function for big clusters, which will improve corner - cases where hbal didn't see any solution even though the cluster was - obviously not well balanced -- fixed exit code of hbal in case of (Luxi) job errors -- changed the signal handling in hbal in order to make hbal control - easier: instead of synchronising on the count of signals, make SIGINT - cause graceful termination, and SIGTERM an immediate one -- increased the tag exclusion weight so that it has greater importance - during the balancing -- slight improvement to the speed of balancing via algorithm tweaks - - -Version 0.2.7 (Thu, 07 Oct 2010) --------------------------------- - -Bug fixes: - -- fixed the error message for hail multi-evacuation mode -- improve evacuation mode for offline secondary nodes (ignore available - memory) - -New features: - -- add a new option ``-S`` to hbal and hspace that saves the cluster - state at the end of the processing in the text format used by the - ``-t`` option, for later re-processing -- a two new options to hbal, -g and --min-gain-limit, that should help - in limiting the number of balances steps with a low gain in the final - stages -- hbal, when executing jobs, will now wait for the current jobs to - finish at the first stop (e.g. 
^C); if the user wants immediate exit, - another signal should be sent -- added “normalized” physical CPU units in hspace output (NPU), which - represents units of physical CPUs free/used, based on the max-cpu - ratio - - -Version 0.2.6 (Mon, 26 Jul 2010) --------------------------------- - -Exactly three months since the last release. Many internal changes, plus -a couple of important changes in the balancing algorithm. - -First, the balancing may now introduce N+1 errors, if this solves other, -more critical problems. For the moment, this means that moving instances -away from offline nodes is allowed even if it creates N+1 errors, and -that means evacuation can be done in more cases. - -Second, the scoring for N+1 has changed. In previous versions, it simply -counted the number of failing N+1 nodes, which means moving an instance -away from a N+1 failed node (but without the node 'clearing' the N+1 -status) was not reflected in the cluster score. As such, the balancing -algorithm managed to clear N+1 errors only sometimes, since usually it -takes more than one move for this, and the first prerequisite move was -not 'rewarded' appropriately and thus it was not selected. Now, it is -possible to fix many more error cases than before: on a simulated 40 -node cluster full with instances (symmetrically allocated on all nodes), -around five nodes can be evacuated before N+1 errors can be solved, -whereas 0.2.5 could evacuate at best one node. - -There were some other internal changes to the scoring algorithm, such -that now the metrics have associated weights, and they are not all of -the same importance anymore. As of now, the only change is that offline -instances have a higher weight, which should favour proper node -evacuations. 
- -Among the other changes: - -- fixed the hspace KM_POOL_* metrics, which were returned as the final - state and not as the delta between the initial and final states -- fixed hspace handling of N+1 failing clusters: before, it used to - generate a 'fake' response, and the structure of this response was not - always in sync with the real responses, leading to missing items; - currently it proceeds correctly through the code (skipping the - computation), and uses the same display mechanisms as the normal case -- fixed hscan exit code for RAPI failures: previously it finished with - success even if all the clusters failed, which was creating issues - with the live-test script; now it exits with exit code 2 for RAPI - failures (unfortunately this is still not optimal as LUXI failures - will use exit code 1, the same as the command line) -- changed the limit values for CPU/disk, which previously were used - optionally, whereas now they are always used; the default cpu ratio - limit is now 64 VCPUs per PCPU -- changed the internal handling of the short name vs. 
original - (Ganeti-provided) name; now internally we always use the full name, - and only in display routines we show the shortened (called 'alias') - name; as a result, the -O and --excluded-instances options now accept - both the full name and the shortened name -- changed internal handling of JSON conversions and errors, such that - now we show a better context for failure messages, which should help - with diagnosing the malformed message -- changed the names for a few node fields, and added some more nodes; - this is most likely to help with debugging, and not with regular - operation though -- changed the node fields option to allow the '+' prefix to mean 'extend - the default fields list' rather than start from fresh (similar to - Ganeti's implementation) -- a few internal changes related to the LUXI protocol implementation, - which should make it more safe against potential bugs, one - optiomization that should help with large messages, and some patches - in preparation for potential expansion of the LUXI backend functionality - -And finally, many improvements on unittests and the live-test -script. 
Test coverage is much enhanced, and the test infrastructure has -better error reporting; this should lead down-the-road to better code -and fewer bugs… - - -Version 0.2.5 (Mon, 26 Apr 2010) --------------------------------- - -Some internal cleanup plus a few user-visible changes: - -- new option for marking instances as 'do-not-move' during rebalancing -- allow ``hscan`` to scan the local cluster via Luxi -- add more metrics to ``hspace`` which show the delta between original - state and final state better (only valid for tiered allocation) - - -Version 0.2.4 (Mon, 22 Feb 2010) --------------------------------- - -Two improvements for node evacuation: - -- hbal takes a new parameter ``--evac-mode`` that restricts the - instances to be moved to the ones on offline/drained nodes, which - should reduce the work done -- hail supports the new ``multi-evacuate`` mode of the IAllocator - protocol, that will be released in a minor release on the Ganeti 2.1 - branch - - -Version 0.2.3 (Thu, 4 Feb 2010) --------------------------------- - -A small release: - -- Fixes selection of secondary node: previously, if the cluster had - many N+1 failures, a N+1 failed node could be selected as secondary - even if it did not have enough memory to allow the instance to be - migrated/failed over to it; this is bad for automated tools, since - we can get the cluster in an unhealthy state -- Switch the text backend to a single input file, that is generated - now by hscan and shouldn't be generated manually via - gnt-node/instance list anymore; this allows richer information to be - kept in the file, and simplifies a little the internals of the text - backend - - -Version 0.2.2 (Tue, 29 Dec 2009) --------------------------------- - -Small release, 0.2.1 was broken and thus this was released earlier: - -- Release 0.2.1 broke the LUXI backend due to a typo, fixed -- Added a live-test script that should catch errors like the above one - in the future (needs a working, non-empty cluster) -- 
Changed RAPI and LUXI backends to treat drained nodes as offline, - similar to the IAllocator backend change in 0.2.0 (which was wrongly - marked as affecting all backends) -- Changed the metrics for offline instances and N1 score from percent to - count, in order to increase the priority of evacuations -- Added a new metric (offline primary instances) which should fix the - evacuation of a offline node in a 2-node cluster - - -Version 0.2.1 (Wed, 2 Dec 2009) --------------------------------- - -- Added instance exclusion defined via instance tags -- Fixed the output of hspace to be again parseable from the shell - - -Version 0.2.0 (Tue, 10 Nov 2009) --------------------------------- - -A significant release, with a few new major features: - -- Added direct execution of the hbal solution when using the Luxi - backend; the steps for each instance moves are submitted as a single - jobs, and the different jobs are submitted as groups in order to - parallelise the execution of moves -- Added support for balancing based on dynamic utilisation data for - instances, fed in via a text file; by default, all instances are - considered equal and this change also improves the equalisation of - secondary instances per node -- Added support for tiered capacity calculation in hspace, where we - start from a maximum instance spec and decrease the spec when we run - out of resources; this should give a better measure of available - capacity on 'fragmented' clusters; this is done separately from the - current fixed-mode computation +News +==== -Also there have been many minor improvements: -- Added option for showing instances (“--print-instances”), similar to - the print nodes option -- Added support for customising the node list via an argument to the - print nodes option in the form of a comma-separated list of field - names; currently the field names are not documented, expecting further - changes in a next release -- Enhanced the error reporting in the Luxi and Rapi backends -- 
Changed the handling of drained nodes, now being treated the same as - offline nodes, for Ganeti 2.0.4+ compatibility -- A number of internal changes, simplifying code and merging some - disparate functions -- Simplify the build system in relation to creation of archives +Version 2.7.0 beta0 +------------------- +*(unreleased)* -Version 0.1.8 (Tue, 29 Sep 2009) --------------------------------- +- ``gnt-instance batch-create`` has been changed to use the bulk create + opcode from Ganeti. This led to incompatible changes in the format of + the JSON file. It is no longer a custom dict but a dict + compatible with the ``OpInstanceCreate`` opcode. -- Brown-paper-bag release fixing haddock issues +Version 2.6.0 +------------- -Version 0.1.7 (Mon, 28 Sep 2009) --------------------------------- +*(Released Fri, 27 Jul 2012)* -- Fixed a bug in the Luxi backend for big responses -- Fixed test suite exit code in presence of test failures -- Changed the migrate operation to run instead failover for instances - which were marked as not running in the input data (this could have - been changed since then, but it's better than today's always migrate) -- Added support for 'cheap' moves only (only migrate/failover) in - balancing -- Added support for building without curl (thus no RAPI backend) +.. attention:: The ``LUXI`` protocol has been made more consistent + regarding its handling of command arguments. This, however, leads to + incompatibility issues with previous versions. Please ensure that you + restart Ganeti daemons soon after the upgrade, otherwise most + ``LUXI`` calls (job submission, setting/resetting the drain flag, + pausing/resuming the watcher, cancelling and archiving jobs, querying + the cluster configuration) will fail.
-Version 0.1.6 (Wed, 19 Aug 2009) --------------------------------- -- Added support for Luxi (the native Ganeti protocol) -- Added support for simulated clusters (for hspace only) -- Added timeouts for the RAPI backend -- Fixed a few inconsistencies in the command line handling -- Fixed handling of errors while loading data -- The 'network' is a new dependency due to the Luxi addition - - -Version 0.1.5 (Thu, 09 Jul 2009) --------------------------------- - -- Removed obsolete hn1 program; this allowed removal of a lot of - supporting code -- Lots of changes in hspace: the output now is a shell fragment in order - for script to source it or parse it easier; added failure reasons; - optimised to use less memory for large clusters -- Optimized the scoring algorithm (used by all tools) so that now - computations should be faster - - -Version 0.1.4 (Tue, 16 Jun 2009) --------------------------------- - -- Added CPU count/ratio of virtual-to-physical CPUs to the cluster - scoring methods; this means that now the balancer, the iallocator - plugin and so on will try to keep the VCPU-to-PCPU ratio equal across - the cluster -- Fixed some hscan bugs -- Fixed the way iallocator reads the total disk size (was broken and it - was always falling back to summing the disk sizes) -- Internals: fixed most compile-time warnings - - -Version 0.1.3 (Fri, 05 Jun 2009) --------------------------------- - -- Fix a bug in the ReplacePrimary instance moves, affecting most of the - tools - - -Version 0.1.2 (Tue, 02 Jun 2009) --------------------------------- - -- Add a new program, “hspace”, which computes the free space on a - cluster (based on a given instance spec) -- Improvements in API docs and partially in the user docs -- Started adding unittests - - -Version 0.1.1 (Tue, 26 May 2009) --------------------------------- - -- Add a new program, “hail”, which is an iallocator plugin and can - allocate/relocate instances -- Experimental support for non-mirrored instances (hail supports 
them, - hbal should no longer abort when it finds such instances and simply - ignore them) -- The RAPI port and/or scheme can be overriden now, and even “file://” - schemes can be used if the message body has been saved under the - appropriate name -- Lots of code reorganization, esp. rewritten loading pipeline -- Better data checking and better error messages in case validation - fails; tools now consider nodes with error in input data (‘?’ returned - by ganeti) as offline -- Small enhancement to the makefile for simpler packaging - - -Version 0.1.0 (Tue, 19 May 2009) --------------------------------- - -- Drop compatibility with Ganeti 1.2 -- Add a new minimum score option (with a very low default), should help - with very good clusters (but is still not optimal) -- Add a --quiet option to hbal -- Add support for reading offline nodes directly from the cluster +New features +~~~~~~~~~~~~ +Instance run status ++++++++++++++++++++ -Version 0.0.8 (Tue, 21 Apr 2009) --------------------------------- +The current ``admin_up`` field, which used to denote whether an instance +should be running or not, has been removed. Instead, ``admin_state`` is +introduced, with 3 possible values -- ``up``, ``down`` and ``offline``. 
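The three ``admin_state`` values can be modelled roughly as follows. This is only a sketch: ``reserved_resources`` is a hypothetical helper written for illustration and is not part of Ganeti; the reservation rules it encodes are the ones given further on in this section (everything reserved for ``up`` and ``down``, only disk space for ``offline``).

```python
# Hypothetical illustration; none of these names come from the Ganeti code base.
ADMIN_STATES = ("up", "down", "offline")


def reserved_resources(admin_state):
    """Return the kinds of resources kept reserved for an instance."""
    if admin_state not in ADMIN_STATES:
        raise ValueError("unknown admin_state: %r" % (admin_state,))
    if admin_state == "offline":
        # Offline instances keep only their disk space.
        return {"disk"}
    # Both "up" and "down" instances can be (re)started at any time,
    # so all of their resources stay reserved.
    return {"disk", "memory", "cpu"}
```

A capacity tool could then skip memory and CPU accounting for any instance whose state is ``offline``.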
-- hbal: prevent mismatches in wrong node names being passed to -O, by - aborting in this case -- add the ability to write the commands (-C) to a script via (-C), - so that it can be later executed directly; this has also changed the - commands to include the ncessary -f flags to skip confirmations -- add checks for extra argument in hbal and hn1, so that unintended - errors are catched -- raise the accepted “missing” memory limit to 512MB, to cover usual Xen - reservations - - -Version 0.0.7 (Mon, 23 Mar 2009) --------------------------------- +The rationale behind this is that an instance being “down” can have +different meanings: -- added support for offline nodes, which are not used as targets for - instance relocation and if they hold instances the hbal algorithm will - attempt to relocate these away -- added support for offline instances, which now will no longer skew the - free memory estimation of nodes; the algorithm will no longer create - conditions for N+1 failures when such instances are later started -- implemented a complete model of node resources, in order to prevent an - unintended re-occurrence of cases like the offline instance were we - miscalculate some node resource; this gives warning now in case the - node reported free disk or free memory deviates by more than a set - amount from the expected value -- a new tool *hscan* that can generate the input text-file for the other - tools by collection via RAPI -- some small changes to the build system to make it more friendly; also - included the generated documentation in the source archive - - -Version 0.0.6 (Mon, 16 Mar 2009) --------------------------------- - -- re-factored the hbal algorithm to make it stable in the sense that it - gives the same solution when restarted from the middle; barring - rounding of disk/memory and incomplete reporting from Ganeti (for - 1.2), it should be now feasible to rely on its output without - generating moves ad infinitum -- the hbal algorithm now uses two
more variables: the node N+1 failures - and the amount of reserved memory; the first of which tries to ‘fix’ - the N+1 status, the latter tries to distribute secondaries more - equally -- the hbal algorithm now uses two more moves at each step: - replace+failover and failover+replace (besides the original failover, - replace, and failover+replace+failover) -- slightly changed the build system to embed GIT version/tags into the - binaries so that we know for a binary from which tree it was done, - either via ‘--version’ or via “strings hbal|grep version” -- changed the solution list and in general the hbal output to be more - clear by default, and changed “gnt-instance failover” to “gnt-instance - migrate” -- added man pages for the two binaries - - -Version 0.0.5 (Mon, 09 Mar 2009) --------------------------------- - -- a few small improvements for hbal (possibly undone by later changes), - hbal is now quite faster -- fix documentation building -- allow hbal to work on non N+1 compliant clusters, but without - guarantees that the end cluster will be compliant; in any case, this - should give a smaller number of nodes that are not compliant if the - cluster state permits it -- strip common domain suffix from nodes and instances, so that output is - shorter and hopefully clearer - - -Version 0.0.4 (Sun, 15 Feb 2009) --------------------------------- - -- better balancing algorithm in hbal -- implemented an RAPI collector, now the cluster data can be gathered - automatically via RAPI and doesn't need manual export of node and - instance list - - -Version 0.0.3 (Wed, 28 Jan 2009) --------------------------------- - -- initial release of the hbal, a cluster rebalancing tool -- input data format changed due to hbal requirements - - -Version 0.0.2 (Tue, 06 Jan 2009) --------------------------------- - -- fix handling of some common cases (cluster N+1 compliant from the - start, too big depth given, failure to compute solution) -- add option to print the needed command 
list for reaching the proposed - solution - - -Version 0.0.1 (Tue, 06 Jan 2009) --------------------------------- - -- initial release of hn1 tool - -.. vim: set textwidth=72 : +- it could be down during a reboot +- it could be temporarily down for a reinstall +- or it could be down because it is deprecated and kept just for its + disk + +The previous Boolean state was making it difficult to do capacity +calculations: should Ganeti reserve memory for a down instance? Now, the +tri-state field makes it clear: + +- in ``up`` and ``down`` state, all resources are reserved for the + instance, and it can be at any time brought up if it is down +- in ``offline`` state, only disk space is reserved for it, but not + memory or CPUs + +The field can have an extra use: since the transition between ``up`` and +``down`` and vice versa is done via ``gnt-instance start/stop``, but +transition between ``offline`` and ``down`` is done via ``gnt-instance +modify``, it is possible to give different rights to users. For +example, owners of an instance could be allowed to start/stop it, but +not transition it out of the offline state. + +Instance policies and specs ++++++++++++++++++++++++++++ + +In previous Ganeti versions, an instance creation request was not +limited in its minimum size, and was limited in its maximum size only by +the cluster resources. As such, any policy could be implemented only in +third-party clients (RAPI clients, or shell wrappers over ``gnt-*`` +tools). Furthermore, calculating cluster capacity via ``hspace`` again +required external input with regard to instance sizes.
+ +In order to improve these workflows and to allow, for example, better +per-node-group differentiation, we introduced instance specs, which +allow declaring: + +- minimum instance disk size, disk count, memory size, cpu count +- maximum values for the above metrics +- and “standard” values (used in ``hspace`` to calculate the standard + sized instances) + +The minimum/maximum values can also be customised at node-group level, +for example allowing more powerful hardware to support bigger instance +memory sizes. + +Besides the instance specs, there are a few other settings belonging to +the instance policy framework. It is now possible to customise, per +cluster and node-group: + +- the list of allowed disk templates +- the maximum ratio of VCPUs per PCPUs (to control CPU oversubscription) +- the maximum ratio of instances to spindles (see below for more + information) for local storage + +All these together should allow all tools that talk to Ganeti to know +the ranges of allowed values for instances and the +over-subscription that is allowed. + +For the VCPU/PCPU ratio, we already have the VCPU configuration from the +instance configuration, and the physical CPU configuration from the +node. For the spindle ratios, however, we did not previously track these +values, so new parameters have been added: + +- a new node parameter ``spindle_count``, defaults to 1, customisable at + node group or node level +- a new backend parameter (for instances), ``spindle_use``, defaults to 1 + +Note that spindles in this context do not need to mean actual +mechanical hard-drives; it's just a relative number for both the node +I/O capacity and instance I/O consumption. + +Instance migration behaviour +++++++++++++++++++++++++++++ + +While live-migration is in general preferable to failover, it is +possible that for some workloads it is actually worse, due to the +variable time of the “suspend” phase during live migration.
+ +To allow the tools to work consistently over such instances (without +having to hard-code instance names), a new backend parameter +``always_failover`` has been added to control the migration/failover +behaviour. When set to True, all migration requests for an instance will +instead fall back to failover. + +Instance memory ballooning +++++++++++++++++++++++++++ + +Initial support for memory ballooning has been added. The memory for an +instance is no longer fixed (backend parameter ``memory``), but instead +can vary between minimum and maximum values (backend parameters +``minmem`` and ``maxmem``). Currently we only change an instance's +memory when: + +- live migrating or failing over an instance and the target node + doesn't have enough memory +- the user requests changing the memory via ``gnt-instance modify + --runtime-memory`` + +Instance CPU pinning +++++++++++++++++++++ + +In order to control the use of specific CPUs by instances, support for +controlling CPU pinning has been added for the Xen, HVM and LXC +hypervisors. This is controlled by a new hypervisor parameter +``cpu_mask``; details about possible values for this are in the +:manpage:`gnt-instance(8)`. Note that use of the most specific (precise +VCPU-to-CPU mapping) form will work well only when all nodes in your +cluster have the same number of CPUs. + +Disk parameters ++++++++++++++++ + +Another area in which Ganeti was not customisable was the parameters +used for storage configuration, e.g. how many stripes to use for LVM, +DRBD resync configuration, etc. + +To improve this area, we've added disk parameters, which are +customisable at cluster and node group level, and which allow +specifying various parameters for disks (DRBD has the most parameters +currently), for example: + +- DRBD resync algorithm and parameters (e.g.
speed) +- the default VG for meta-data volumes for DRBD +- number of stripes for LVM (plain disk template) +- the RBD pool + +These parameters can be modified via ``gnt-cluster modify -D …`` and +``gnt-group modify -D …``, and are used either at instance creation (in +case of LVM stripes, for example) or at disk “activation” time +(e.g. resync speed). + +Rados block device support +++++++++++++++++++++++++++ + +A Rados (http://ceph.com/wiki/Rbd) storage backend has been added, +denoted by the ``rbd`` disk template type. This is considered +experimental; feedback is welcome. For details on configuring it, see +the :doc:`install` document and the :manpage:`gnt-cluster(8)` man page. + +Master IP setup ++++++++++++++++ + +The existing master IP functionality works well only in simple setups (a +single network shared by all nodes); however, if nodes belong to +different networks, then the ``/32`` setup and lack of routing +information are not sufficient. + +To allow the master IP to function well in more complex cases, the +system was reworked as follows: + +- a master IP netmask setting has been added +- the master IP activation/turn-down code was moved from the node daemon + to a separate script +- whether to run the Ganeti-supplied master IP script or a user-supplied + one is a ``gnt-cluster init`` setting + +Details about the location of the standard and custom setup scripts are +in the man page :manpage:`gnt-cluster(8)`; for information about the +setup script protocol, look at the Ganeti-supplied script. + +SPICE support ++++++++++++++ + +The `SPICE `_ support has been +improved. + +It is now possible to use TLS-protected connections, and when renewing +or changing the cluster certificates (via ``gnt-cluster renew-crypto``), +it is now possible to specify SPICE or SPICE CA certificates. Also, it +is possible to configure a password for SPICE sessions via the +hypervisor parameter ``spice_password_file``.
+ +There are also new parameters to control the compression and streaming +options (e.g. ``spice_image_compression``, ``spice_streaming_video``, +etc.). For details, see the man page :manpage:`gnt-instance(8)` and look +for the spice parameters. + +Lastly, it is now possible to see the SPICE connection information via +``gnt-instance console``. + +OVF converter ++++++++++++++ + +A new tool (``tools/ovfconverter``) has been added that supports +conversion between Ganeti and the `Open Virtualization Format +`_ (both to and +from). + +This relies on the ``qemu-img`` tool to convert the disk formats, so the +actual compatibility with other virtualization solutions depends on it. + +Confd daemon changes +++++++++++++++++++++ + +The configuration query daemon (``ganeti-confd``) is now optional, and +has been rewritten in Haskell; whether to use the daemon at all, use the +Python (default) or the Haskell version is selectable at configure time +via the ``--enable-confd`` parameter, which can take one of the +``haskell``, ``python`` or ``no`` values. If not used, disabling the +daemon will result in a smaller footprint; for larger systems, we +welcome feedback on the Haskell version which might become the default +in future versions. + +If you want to use ``gnt-node list-drbd`` you need to have the Haskell +daemon running. The Python version doesn't implement the new call. + + +User interface changes +~~~~~~~~~~~~~~~~~~~~~~ + +We have replaced the ``--disks`` option of ``gnt-instance +replace-disks`` with a more flexible ``--disk`` option, which allows +adding and removing disks at arbitrary indices (Issue 188). Furthermore, +disk size and mode can be changed upon recreation (via ``gnt-instance +recreate-disks``, which accepts the same ``--disk`` option). + +As many people are used to a ``show`` command, we have added that as an +alias to ``info`` on all ``gnt-*`` commands. 
+ +The ``gnt-instance grow-disk`` command has a new mode in which it can +accept the target size of the disk, instead of the delta; this can be +safer since two runs in absolute mode will be idempotent, and +sometimes it's also easier to specify the desired size directly. + +Also, the handling of instances with regard to offline secondaries has +been improved. Instance operations should no longer fail merely because +one of the instance's secondary nodes is offline, as long as it is safe +to proceed. + +A new command ``list-drbd`` has been added to the ``gnt-node`` script to +support debugging of DRBD issues on nodes. It provides a mapping of DRBD +minors to instance names. + +API changes +~~~~~~~~~~~ + +RAPI coverage has improved, with (for example) new resources for +recreate-disks, node power-cycle, etc. + +Compatibility +~~~~~~~~~~~~~ + +There is partial support for ``xl`` in the Xen hypervisor; feedback is +welcome. + +Python 2.7 is better supported, and after Ganeti 2.6 we will investigate +whether to still support Python 2.4 or move to Python 2.6 as the minimum +required version. + +Support for Fedora has been slightly improved; the provided example +init.d script should work better on it and the INSTALL file should +document the needed dependencies. + +Internal changes +~~~~~~~~~~~~~~~~ + +The deprecated ``QueryLocks`` LUXI request has been removed. Use +``Query(what=QR_LOCK, ...)`` instead. + +The LUXI requests :pyeval:`luxi.REQ_QUERY_JOBS`, +:pyeval:`luxi.REQ_QUERY_INSTANCES`, :pyeval:`luxi.REQ_QUERY_NODES`, +:pyeval:`luxi.REQ_QUERY_GROUPS`, :pyeval:`luxi.REQ_QUERY_EXPORTS` and +:pyeval:`luxi.REQ_QUERY_TAGS` are deprecated and will be removed in a +future version. :pyeval:`luxi.REQ_QUERY` should be used instead. + +RAPI client: ``CertificateError`` now derives from +``GanetiApiError``. This should make it easier to handle Ganeti +errors.
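The practical effect of the ``CertificateError`` change can be sketched as below. The two exception classes are local stand-ins that only mirror the inheritance relationship described above, and ``fetch_info`` and ``safe_fetch`` are hypothetical helpers; nothing here reproduces the actual RAPI client code.

```python
class GanetiApiError(Exception):
    """Stand-in for the RAPI client's base error class."""


class CertificateError(GanetiApiError):
    """Stand-in for the certificate error, derived from the base class."""


def fetch_info(fail_with=None):
    # Hypothetical helper that simulates a RAPI call, optionally failing.
    if fail_with is not None:
        raise fail_with
    return {"name": "cluster.example.com"}


def safe_fetch(fail_with=None):
    # A single handler for the base class now also covers certificate
    # problems, so callers no longer need a separate except clause.
    try:
        return "ok", fetch_info(fail_with)
    except GanetiApiError as err:
        return "error", type(err).__name__
```

Code that needs to treat certificate failures differently can still catch ``CertificateError`` first, before the more general handler.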
+ +Deprecation warnings due to PyCrypto/paramiko import in +``tools/setup-ssh`` have been silenced, as they are usually safe; please +make sure to run an up-to-date paramiko version, if you use this tool. + +The QA scripts now depend on Python 2.5 or above (the main code base +still works with Python 2.4). + +The configuration file (``config.data``) is now written without +indentation for performance reasons; if you want to edit it, it can be +re-formatted via ``tools/fmtjson``. + +A number of bugs have been fixed in the cluster merge tool. + +``x509`` certificate verification (used in import-export) has been +changed to allow the same clock skew as permitted by the cluster +verification. This will remove some rare but hard-to-diagnose errors in +import-export. + + +Version 2.6.0 rc4 +----------------- + +*(Released Thu, 19 Jul 2012)* + +Very few changes from rc4 to the final release, only bugfixes: + +- integrated fixes from release 2.5.2 (fix general boot flag for KVM + instance, fix CDROM booting for KVM instances) +- fixed node group modification of node parameters +- fixed issue in LUClusterVerifyGroup with multi-group clusters +- fixed generation of bash completion to ensure a stable ordering +- fixed a few typos + + +Version 2.6.0 rc3 +----------------- + +*(Released Fri, 13 Jul 2012)* + +Third release candidate for 2.6. The following changes were done from +rc3 to rc4: + +- Fixed ``UpgradeConfig`` w.r.t. disk parameters on disk objects. +- Fixed an inconsistency in the LUXI protocol with the provided + arguments (NOT backwards compatible) +- Fixed a bug with node group ipolicy where ``min`` was greater than + the cluster ``std`` value +- Implemented a new ``gnt-node list-drbd`` call to list DRBD minors for + easier instance debugging on nodes (requires ``hconfd`` to work) + + +Version 2.6.0 rc2 +----------------- + +*(Released Tue, 03 Jul 2012)* + +Second release candidate for 2.6.
The following changes were done from +rc2 to rc3: + +- Fixed ``gnt-cluster verify`` regarding ``master-ip-script`` on + non-master candidates +- Fixed a RAPI regression on missing beparams/memory +- Fixed redistribution of files on offline nodes +- Added the possibility to run activate-disks even though secondaries are + offline. This change also relaxes the strictness of some other + commands which use activate disks internally: + * ``gnt-instance start|reboot|rename|backup|export`` +- Made it possible to safely remove an instance if its secondaries are + offline +- Made it possible to reinstall even though secondaries are offline + + +Version 2.6.0 rc1 +----------------- + +*(Released Mon, 25 Jun 2012)* + +First release candidate for 2.6. The following changes were done from +rc1 to rc2: + +- Fixed bugs with disk parameters and ``rbd`` templates as well as + ``instance_os_add`` +- Made ``gnt-instance modify`` more consistent regarding new NIC/Disk + behaviour. It now supports the modify operation +- ``hcheck`` implemented to analyze cluster health and possibility of + improving health by rebalance +- ``hbal`` has been improved in dealing with split instances + + +Version 2.6.0 beta2 +------------------- + +*(Released Mon, 11 Jun 2012)* + +Second beta release of 2.6. The following changes were done from beta2 +to rc1: + +- Fixed ``daemon-util`` with non-root user models +- Fixed creation of plain instances with ``--no-wait-for-sync`` +- Fix wrong iv_names when running ``cfgupgrade`` +- Export more information in RAPI group queries +- Fixed bug when changing instance network interfaces +- Extended burnin to do NIC changes +- query: Added ``<``, ``>``, ``<=``, ``>=`` comparison operators +- Changed default for DRBD barriers +- Fixed DRBD error reporting for syncer rate +- Verify the options on disk parameters + +And of course various fixes to documentation and improved unittests and +QA.
+ + +Version 2.6.0 beta1 +------------------- + +*(Released Wed, 23 May 2012)* + +First beta release of 2.6. The following changes were done from beta1 to +beta2: + +- integrated patch for distributions without ``start-stop-daemon`` +- adapted example init.d script to work on Fedora +- fixed log handling in Haskell daemons +- adapted checks in the watcher for pycurl linked against libnss +- add partial support for ``xl`` instead of ``xm`` for Xen +- fixed a type issue in cluster verification +- fixed ssconf handling in the Haskell code (was breaking confd in IPv6 + clusters) + +Plus integrated fixes from the 2.5 branch: + +- fixed ``kvm-ifup`` to use ``/bin/bash`` +- fixed parallel build failures +- KVM live migration when using a custom keymap + + +Version 2.5.2 +------------- + +*(Released Tue, 24 Jul 2012)* + +A small bugfix release, with no new features: + +- fixed bash-isms in kvm-ifup, for compatibility with systems which use a + different default shell (e.g. Debian, Ubuntu) +- fixed KVM startup and live migration with a custom keymap (fixes Issue + 243 and Debian bug #650664) +- fixed compatibility with KVM versions that don't support multiple boot + devices (fixes Issue 230 and Debian bug #624256) + +Additionally, a few fixes were done to the build system (fixed parallel +build failures) and to the unittests (fixed race condition in test for +FileID functions, and the default enable/disable mode for QA test is now +customisable). + + +Version 2.5.1 +------------- + +*(Released Fri, 11 May 2012)* + +A small bugfix release. + +The main issues solved are on the topic of compatibility with newer LVM +releases: + +- fixed parsing of ``lv_attr`` field +- adapted to new ``vgreduce --removemissing`` behaviour where sometimes + the ``--force`` flag is needed + +Also on the topic of compatibility, ``tools/lvmstrap`` has been changed +to accept kernel 3.x too (was hardcoded to 2.6.*). 
+ +A regression present in 2.5.0 that broke handling (in the gnt-* scripts) +of hook results and that also made display of other errors suboptimal +was fixed; the code now behaves like 2.4 and earlier. + +Another change in 2.5, the cleanup of the OS scripts environment, was too +aggressive: it removed even the ``PATH`` variable, which required the OS +scripts to *always* export it. Since this is a bit too strict, +we now export a minimal PATH, the same that we export for hooks. + +The fix for issue 201 (Preserve bridge MTU in KVM ifup script) was +integrated into this release. + +Finally, a few other miscellaneous changes were done (no new features, +just small improvements): + +- Fix ``gnt-group --help`` display +- Fix hardcoded Xen kernel path +- Fix grow-disk handling of invalid units +- Update synopsis for ``gnt-cluster repair-disk-sizes`` +- Accept both PUT and POST in noded (makes future upgrade to 2.6 easier) + + +Version 2.5.0 +------------- + +*(Released Thu, 12 Apr 2012)* + +Incompatible/important changes and bugfixes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- The default of the ``/2/instances/[instance_name]/rename`` RAPI + resource's ``ip_check`` parameter changed from ``True`` to ``False`` + to match the underlying LUXI interface. +- The ``/2/nodes/[node_name]/evacuate`` RAPI resource was changed to use + body parameters, see :doc:`RAPI documentation `. The server does + not maintain backwards-compatibility as the underlying operation + changed in an incompatible way. The RAPI client can talk to old + servers, but it needs to be told so as the return value changed. +- When creating file-based instances via RAPI, the ``file_driver`` + parameter no longer defaults to ``loop`` and must be specified. +- The deprecated ``bridge`` NIC parameter is no longer supported. Use + ``link`` instead. +- Support for the undocumented and deprecated RAPI instance creation + request format version 0 has been dropped. 
Use version 1, supported + since Ganeti 2.1.3 and :doc:`documented `, instead. +- Pyparsing 1.4.6 or above is required, see :doc:`installation + documentation `. +- The "cluster-verify" hooks are now executed per group by the + ``OP_CLUSTER_VERIFY_GROUP`` opcode. This maintains the same behavior + if you just run ``gnt-cluster verify``, which generates one opcode per + group. +- The environment as passed to the OS scripts is cleared, and thus no + environment variables defined in the node daemon's environment will be + inherited by the scripts. +- The :doc:`iallocator ` mode ``multi-evacuate`` has been + deprecated. +- :doc:`New iallocator modes ` have been added to + support operations involving multiple node groups. +- Offline nodes are ignored when failing over an instance. +- Support for KVM version 1.0, which changed the version reporting format + from 3 to 2 digits. +- TCP/IP ports used by DRBD disks are returned to a pool upon instance + removal. +- ``Makefile`` is now compatible with Automake 1.11.2 +- Includes all bugfixes made in the 2.4 series + +New features +~~~~~~~~~~~~ + +- The ganeti-htools project has been merged into the ganeti-core source + tree and will be built as part of Ganeti (see :doc:`install-quick`). +- Implemented support for :doc:`shared storage `. +- Add support for disks larger than 2 TB in ``lvmstrap`` by supporting + GPT-style partition tables (requires `parted + `_). +- Added support for floppy drive and 2nd CD-ROM drive in KVM hypervisor. +- Allowed adding tags on instance creation. +- Export instance tags to hooks (``INSTANCE_TAGS``, see :doc:`hooks`) +- Allow instances to be started in a paused state, enabling the user to + see the complete console output on boot using the console. +- Added new hypervisor flag to control default reboot behaviour + (``reboot_behavior``). +- Added support for KVM keymaps (hypervisor parameter ``keymap``). 
+- Improved out-of-band management support: + + - Added ``gnt-node health`` command reporting the health status of + nodes. + - Added ``gnt-node power`` command to manage power status of nodes. + - Added command for emergency power-off (EPO), ``gnt-cluster epo``. + +- Instance migration can fall back to failover if instance is not + running. +- Filters can be used when listing nodes, instances, groups and locks; + see *ganeti(7)* manpage. +- Added post-execution status as variables to :doc:`hooks ` + environment. +- Instance tags are exported/imported together with the instance. +- When given an explicit job ID, ``gnt-job info`` will work for archived + jobs. +- Jobs can define dependencies on other jobs (not yet supported via + RAPI or command line, but used by internal commands and usable via + LUXI). + + - Lock monitor (``gnt-debug locks``) shows jobs waiting for + dependencies. + +- Instance failover is now available as a RAPI resource + (``/2/instances/[instance_name]/failover``). +- ``gnt-instance info`` defaults to static information if primary node + is offline. +- Opcodes have a new ``comment`` attribute. +- Added basic SPICE support to KVM hypervisor. +- ``tools/ganeti-listrunner`` allows passing of arguments to executable. + +Node group improvements +~~~~~~~~~~~~~~~~~~~~~~~ + +- ``gnt-cluster verify`` has been modified to check groups separately, + thereby improving performance. +- Node group support has been added to ``gnt-cluster verify-disks``, + which now operates per node group. +- Watcher has been changed to work better with node groups. + + - One process and state file per node group. + - Slow watcher in one group doesn't block other group's watcher. + +- Added new command, ``gnt-group evacuate``, to move all instances in a + node group to other groups. +- Added ``gnt-instance change-group`` to move an instance to another + node group. +- ``gnt-cluster command`` and ``gnt-cluster copyfile`` now support + per-group operations. 
+- Node groups can be tagged. +- Some operations switch from an exclusive to a shared lock as soon as + possible. +- Instance's primary and secondary nodes' groups are now available as + query fields (``pnode.group``, ``pnode.group.uuid``, ``snodes.group`` + and ``snodes.group.uuid``). + +Misc +~~~~ + +- Numerous updates to documentation and manpages. + + - :doc:`RAPI ` documentation now has detailed parameter + descriptions. + - Some opcode/job results are now also documented, see :doc:`RAPI + `. + +- A lockset's internal lock is now also visible in lock monitor. +- Log messages from job queue workers now contain information about the + opcode they're processing. +- ``gnt-instance console`` no longer requires the instance lock. +- A short delay when waiting for job changes reduces the number of LUXI + requests significantly. +- DRBD metadata volumes are overwritten with zeros during disk creation. +- Out-of-band commands no longer acquire the cluster lock in exclusive + mode. +- ``devel/upload`` now uses correct permissions for directories. + + +Version 2.5.0 rc6 +----------------- + +*(Released Fri, 23 Mar 2012)* + +This was the sixth release candidate of the 2.5 series. + + +Version 2.5.0 rc5 +----------------- + +*(Released Mon, 9 Jan 2012)* + +This was the fifth release candidate of the 2.5 series. + + +Version 2.5.0 rc4 +----------------- + +*(Released Thu, 27 Oct 2011)* + +This was the fourth release candidate of the 2.5 series. + + +Version 2.5.0 rc3 +----------------- + +*(Released Wed, 26 Oct 2011)* + +This was the third release candidate of the 2.5 series. + + +Version 2.5.0 rc2 +----------------- + +*(Released Tue, 18 Oct 2011)* + +This was the second release candidate of the 2.5 series. + + +Version 2.5.0 rc1 +----------------- + +*(Released Tue, 4 Oct 2011)* + +This was the first release candidate of the 2.5 series. + + +Version 2.5.0 beta3 +------------------- + +*(Released Wed, 31 Aug 2011)* + +This was the third beta release of the 2.5 series. 
+ + +Version 2.5.0 beta2 +------------------- + +*(Released Mon, 22 Aug 2011)* + +This was the second beta release of the 2.5 series. + + +Version 2.5.0 beta1 +------------------- + +*(Released Fri, 12 Aug 2011)* + +This was the first beta release of the 2.5 series. + + +Version 2.4.5 +------------- + +*(Released Thu, 27 Oct 2011)* + +- Fixed bug when parsing command line parameter values ending in + backslash +- Fixed assertion error after unclean master shutdown +- Disable HTTP client pool for RPC, significantly reducing memory usage + of master daemon +- Fixed queue archive creation with wrong permissions + + +Version 2.4.4 +------------- + +*(Released Tue, 23 Aug 2011)* + +Small bug-fixes: + +- Fixed documentation for importing with ``--src-dir`` option +- Fixed a bug in ``ensure-dirs`` with queue/archive permissions +- Fixed a parsing issue with DRBD 8.3.11 in the Linux kernel + + +Version 2.4.3 +------------- + +*(Released Fri, 5 Aug 2011)* + +Many bug-fixes and a few small features: + +- Fixed argument order in ``ReserveLV`` and ``ReserveMAC`` which caused + issues when you tried to add an instance with two MAC addresses in one + request +- KVM: fixed per-instance stored UID value +- KVM: configure bridged NICs at migration start +- KVM: Fix a bug where instance will not start with newer KVM versions + (>= 0.14) +- Added OS search path to ``gnt-cluster info`` +- Fixed an issue with ``file_storage_dir`` where you were forced to + provide an absolute path, but the documentation states it is a + relative path; the documentation was right +- Added a new parameter to instance stop/start called ``--no-remember`` + that will make the state change not be remembered +- Implemented ``no_remember`` at RAPI level +- Improved the documentation +- Node evacuation: don't call IAllocator if node is already empty +- Fixed bug in DRBD8 replace disks on current nodes +- Fixed bug in recreate-disks for DRBD instances +- Moved assertion checking locks in ``gnt-instance 
replace-disks`` + causing it to abort with not owning the right locks for some situation +- Job queue: Fixed potential race condition when cancelling queued jobs +- Fixed off-by-one bug in job serial generation +- ``gnt-node volumes``: Fix instance names +- Fixed aliases in bash completion +- Fixed a bug in reopening log files after being sent a SIGHUP +- Added a flag to burnin to allow specifying VCPU count +- Bugfixes to non-root Ganeti configuration + + +Version 2.4.2 +------------- + +*(Released Thu, 12 May 2011)* + +Many bug-fixes and a few new small features: + +- Fixed a bug related to log opening failures +- Fixed a bug in instance listing with orphan instances +- Fixed a bug which prevented resetting the cluster-level node parameter + ``oob_program`` to the default +- Many fixes related to the ``cluster-merge`` tool +- Fixed a race condition in the lock monitor, which caused failures + during (at least) creation of many instances in parallel +- Improved output for gnt-job info +- Removed the quiet flag on some ssh calls which prevented debugging + failures +- Improved the N+1 failure messages in cluster verify by actually + showing the memory values (needed and available) +- Increased lock attempt timeouts so that when executing long operations + (e.g. 
DRBD replace-disks) other jobs do not enter 'blocking acquire' + too early and thus prevent the use of the 'fair' mechanism +- Changed instance query data (``gnt-instance info``) to not acquire + locks unless needed, thus allowing its use on locked instances if only + static information is asked for +- Improved behaviour with filesystems that do not support rename on an + opened file +- Fixed the behaviour of ``prealloc_wipe_disks`` cluster parameter which + kept locks on all nodes during the wipe, which is unneeded +- Fixed ``gnt-watcher`` handling of errors during hooks execution +- Fixed bug in ``prealloc_wipe_disks`` with small disk sizes (less than + 10GiB) which caused the wipe to fail right at the end in some cases +- Fixed master IP activation when doing master failover with no-voting +- Fixed bug in ``gnt-node add --readd`` which allowed the re-adding of + the master node itself +- Fixed potential data-loss under disk-full conditions, where Ganeti + wouldn't correctly check the return code and would consider + partially-written files 'correct' +- Fixed bug related to multiple VGs and DRBD disk replacing +- Added new disk parameter ``metavg`` that allows placement of the meta + device for DRBD in a different volume group +- Fixed error handling in the node daemon when the system libc doesn't + have major number 6 (i.e. if ``libc.so.6`` is not the actual libc) +- Fixed lock release during replace-disks, which kept cluster-wide locks + when doing disk replaces with an iallocator script +- Added check for missing bridges in cluster verify +- Handle EPIPE errors while writing to the terminal better, so that + piping the output to e.g. 
``less`` doesn't cause a backtrace +- Fixed rare case where a ^C during Luxi calls could have been + interpreted as server errors, instead of simply terminating +- Fixed a race condition in LUGroupAssignNodes (``gnt-group + assign-nodes``) +- Added a few more parameters to the KVM hypervisor, allowing a second + CDROM, custom disk type for CDROMs and a floppy image +- Removed redundant message in instance rename when the name is given + already as a FQDN +- Added option to ``gnt-instance recreate-disks`` to allow creating the + disks on new nodes, allowing recreation when the original instance + nodes are completely gone +- Added option when converting disk templates to DRBD to skip waiting + for the resync, in order to make the instance available sooner +- Added two new variables to the OS scripts environment (containing the + instance's nodes) +- Made the root_path an optional parameter for the xen-pvm hypervisor, + to allow use of ``pvgrub`` as bootloader +- Changed the instance memory modifications to only check out-of-memory + conditions on memory increases, and turned the secondary node warnings + into errors (they can still be overridden via ``--force``) +- Fixed the handling of a corner case when the Python installation gets + corrupted (e.g. a bad disk) while ganeti-noded is running and we try + to execute a command that doesn't exist +- Fixed a bug in ``gnt-instance move`` (LUInstanceMove) when the primary + node of the instance returned failures during instance shutdown; this + adds the option ``--ignore-consistency`` to gnt-instance move + +And as usual, various improvements to the error messages, documentation +and man pages. + + +Version 2.4.1 +------------- + +*(Released Wed, 09 Mar 2011)* + +Emergency bug-fix release. ``tools/cfgupgrade`` was broken and overwrote +the RAPI users file if run twice (even with ``--dry-run``). + +The release fixes that bug (nothing else changed). 
+ + +Version 2.4.0 +------------- + +*(Released Mon, 07 Mar 2011)* + +Final 2.4.0 release. Just a few small fixes: + +- Fixed RAPI node evacuate +- Fixed the kvm-ifup script +- Fixed internal error handling for special job cases +- Updated man page to specify the escaping feature for options + + +Version 2.4.0 rc3 +----------------- + +*(Released Mon, 28 Feb 2011)* + +A critical fix for the ``prealloc_wipe_disks`` feature: it is possible +that this feature wiped the disks of the wrong instance, leading to loss +of data. + +Other changes: + +- Fixed title of query field containing instance name +- Expanded the glossary in the documentation +- Fixed one unittest (internal issue) + + +Version 2.4.0 rc2 +----------------- + +*(Released Mon, 21 Feb 2011)* + +A number of bug fixes plus just a couple functionality changes. + +On the user-visible side, the ``gnt-* list`` command output has changed +with respect to "special" field states. The current rc1 style of display +can be re-enabled by passing a new ``--verbose`` (``-v``) flag, but in +the default output mode special fields states are displayed as follows: + +- Offline resource: ``*`` +- Unavailable/not applicable: ``-`` +- Data missing (RPC failure): ``?`` +- Unknown field: ``??`` + +Another user-visible change is the addition of ``--force-join`` to +``gnt-node add``. 
+ +As for bug fixes: + +- ``tools/cluster-merge`` has seen many fixes and is now enabled again +- Fixed regression in RAPI/instance reinstall where all parameters were + required (instead of optional) +- Fixed ``gnt-cluster repair-disk-sizes``, was broken since Ganeti 2.2 +- Fixed iallocator usage (offline nodes were not considered offline) +- Fixed ``gnt-node list`` with respect to non-vm_capable nodes +- Fixed hypervisor and OS parameter validation with respect to + non-vm_capable nodes +- Fixed ``gnt-cluster verify`` with respect to offline nodes (mostly + cosmetic) +- Fixed ``tools/listrunner`` with respect to agent-based usage + + +Version 2.4.0 rc1 +----------------- + +*(Released Fri, 4 Feb 2011)* + +Many changes and fixes since the beta1 release. While there were some +internal changes, the code has been mostly stabilised for the RC +release. + +Note: the dumb allocator was removed in this release, as it was not kept +up-to-date with the IAllocator protocol changes. It is recommended to +use the ``hail`` command from the ganeti-htools package. + +Note: the 2.4 and up versions of Ganeti are not compatible with the +0.2.x branch of ganeti-htools. You need to upgrade to +ganeti-htools-0.3.0 (or later). 
+ +Regressions fixed from 2.3 +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- Fixed the ``gnt-cluster verify-disks`` command +- Made ``gnt-cluster verify-disks`` work in parallel (as opposed to + serially on nodes) +- Fixed disk adoption breakage +- Fixed wrong headers in instance listing for field aliases + +Other bugs fixed +~~~~~~~~~~~~~~~~ + +- Fixed corner case in KVM handling of NICs +- Fixed many cases of wrong handling of non-vm_capable nodes +- Fixed a bug where a missing instance symlink was not possible to + recreate with any ``gnt-*`` command (now ``gnt-instance + activate-disks`` does it) +- Fixed the volume group name as reported by ``gnt-cluster + verify-disks`` +- Increased timeouts for the import-export code, hopefully leading to + fewer aborts due to network or instance timeouts +- Fixed bug in ``gnt-node list-storage`` +- Fixed bug where not all daemons were started on cluster + initialisation, but only at the first watcher run +- Fixed many bugs in the OOB implementation +- Fixed watcher behaviour in presence of instances with offline + secondaries +- Fixed instance list output for instances running on the wrong node +- a few fixes to the cluster-merge tool, but it still cannot merge + multi-node groups (currently it is not recommended to use this tool) + + +Improvements +~~~~~~~~~~~~ + +- Improved network configuration for the KVM hypervisor +- Added e1000 as a supported NIC for Xen-HVM +- Improved the lvmstrap tool to also be able to use partitions, as + opposed to full disks +- Improved speed of disk wiping (the cluster parameter + ``prealloc_wipe_disks``), so that it has a low impact on the total time + of instance creations +- Added documentation for the OS parameters +- Changed ``gnt-instance deactivate-disks`` so that it can work if the + hypervisor is not responding +- Added display of blacklisted and hidden OS information in + ``gnt-cluster info`` +- Extended ``gnt-cluster verify`` to also validate hypervisor, backend, + NIC and node parameters, which 
might create problems with currently + invalid (but undetected) configuration files, but prevents validation + failures when unrelated parameters are modified +- Changed cluster initialisation to wait for the master daemon to become + available +- Expanded the RAPI interface: + + - Added config redistribution resource + - Added activation/deactivation of instance disks + - Added export of console information + +- Implemented log file reopening on SIGHUP, which allows using + logrotate(8) for the Ganeti log files +- Added a basic OOB helper script as an example + + +Version 2.4.0 beta1 +------------------- + +*(Released Fri, 14 Jan 2011)* + +User-visible +~~~~~~~~~~~~ + +- Fixed timezone issues when formatting timestamps +- Added support for node groups, available via ``gnt-group`` and other + commands +- Added out-of-band framework and management, see :doc:`design + document ` +- Removed support for roman numbers from ``gnt-node list`` and + ``gnt-instance list``. +- Allowed modification of master network interface via ``gnt-cluster + modify --master-netdev`` +- Accept offline secondaries while shutting down instance disks +- Added ``blockdev_prefix`` parameter to Xen PVM and HVM hypervisors +- Added support for multiple LVM volume groups +- Avoid sorting nodes for ``gnt-node list`` if specific nodes are + requested +- Added commands to list available fields: + + - ``gnt-node list-fields`` + - ``gnt-group list-fields`` + - ``gnt-instance list-fields`` + +- Updated documentation and man pages + +Integration +~~~~~~~~~~~ + +- Moved ``rapi_users`` file into separate directory, now named + ``.../ganeti/rapi/users``, ``cfgupgrade`` moves the file and creates a + symlink +- Added new tool for running commands on many machines, + ``tools/ganeti-listrunner`` +- Implemented more verbose result in ``OpInstanceConsole`` opcode, also + improving the ``gnt-instance console`` output +- Allowed customisation of disk index separator at ``configure`` time +- Export node group 
allocation policy to :doc:`iallocator ` +- Added support for non-partitioned md disks in ``lvmstrap`` +- Added script to gracefully power off KVM instances +- Split ``utils`` module into smaller parts +- Changed query operations to return more detailed information, e.g. + whether a piece of information is unavailable due to an offline node. To use + this new functionality, the LUXI call ``Query`` must be used. Field + information is now stored by the master daemon and can be retrieved + using ``QueryFields``. Instances, nodes and groups can also be queried + using the new opcodes ``OpQuery`` and ``OpQueryFields`` (not yet + exposed via RAPI). The following commands make use of this + infrastructure change: + + - ``gnt-group list`` + - ``gnt-group list-fields`` + - ``gnt-node list`` + - ``gnt-node list-fields`` + - ``gnt-instance list`` + - ``gnt-instance list-fields`` + - ``gnt-debug locks`` + +Remote API +~~~~~~~~~~ + +- New RAPI resources (see :doc:`rapi`): + + - ``/2/modify`` + - ``/2/groups`` + - ``/2/groups/[group_name]`` + - ``/2/groups/[group_name]/assign-nodes`` + - ``/2/groups/[group_name]/modify`` + - ``/2/groups/[group_name]/rename`` + - ``/2/instances/[instance_name]/disk/[disk_index]/grow`` + +- RAPI changes: + + - Implemented ``no_install`` for instance creation + - Implemented OS parameters for instance reinstallation, allowing + use of special settings on reinstallation (e.g. for preserving data) + +Misc +~~~~ + +- Added IPv6 support in import/export +- Pause DRBD synchronization while wiping disks on instance creation +- Updated unittests and QA scripts +- Improved network parameters passed to KVM +- Converted man pages from docbook to reStructuredText + + +Version 2.3.1 +------------- + +*(Released Mon, 20 Dec 2010)* + +Released version 2.3.1~rc1 without any changes. 
+ + +Version 2.3.1 rc1 +----------------- + +*(Released Wed, 1 Dec 2010)* + +- impexpd: Disable OpenSSL compression in socat if possible (backport + from master, commit e90739d625b, see :doc:`installation guide + ` for details) +- Changed unittest coverage report to exclude test scripts +- Added script to check version format + + +Version 2.3.0 +------------- + +*(Released Wed, 1 Dec 2010)* + +Released version 2.3.0~rc1 without any changes. + + +Version 2.3.0 rc1 +----------------- + +*(Released Fri, 19 Nov 2010)* + +A number of bugfixes and documentation updates: + +- Update ganeti-os-interface documentation +- Fixed a bug related to duplicate MACs or similar items which should be + unique +- Fix breakage in OS state modify +- Reinstall instance: disallow offline secondaries (fixes bug related to + OS changing but reinstall failing) +- plus all the other fixes between 2.2.1 and 2.2.2 + + +Version 2.3.0 rc0 +----------------- + +*(Released Tue, 2 Nov 2010)* + +- Fixed clearing of the default iallocator using ``gnt-cluster modify`` +- Fixed master failover race with watcher +- Fixed a bug in ``gnt-node modify`` which could lead to an inconsistent + configuration +- Accept previously stopped instance for export with instance removal +- Simplify and extend the environment variables for instance OS scripts +- Added new node flags, ``master_capable`` and ``vm_capable`` +- Added optional instance disk wiping prior to allocation. This is a + cluster-wide option and can be set/modified using + ``gnt-cluster {init,modify} --prealloc-wipe-disks``. 
+- Added IPv6 support, see :doc:`design document ` and + :doc:`install-quick` +- Added a new watcher option (``--ignore-pause``) +- Added option to ignore offline node on instance start/stop + (``--ignore-offline``) +- Allow overriding OS parameters with ``gnt-instance reinstall`` +- Added ability to change node's secondary IP address using ``gnt-node + modify`` +- Implemented privilege separation for all daemons except + ``ganeti-noded``, see ``configure`` options +- Complain if an instance's disk is marked faulty in ``gnt-cluster + verify`` +- Implemented job priorities (see ``ganeti(7)`` manpage) +- Ignore failures while shutting down instances during failover from + offline node +- Exit daemon's bootstrap process only once daemon is ready +- Export more information via ``LUInstanceQuery``/remote API +- Improved documentation, QA and unittests +- RAPI daemon now watches ``rapi_users`` all the time and doesn't need a + restart if the file was created or changed +- Added LUXI protocol version sent with each request and response, + allowing detection of server/client mismatches +- Moved the Python scripts among gnt-* and ganeti-* into modules +- Moved all code related to setting up SSH to an external script, + ``setup-ssh`` +- Infrastructure changes for node group support in future versions + + +Version 2.2.2 +------------- + +*(Released Fri, 19 Nov 2010)* + +A few small bugs fixed, and some improvements to the build system: + +- Fix documentation regarding conversion to drbd +- Fix validation of parameters in cluster modify (``gnt-cluster modify + -B``) +- Fix error handling in node modify with multiple changes +- Allow remote imports without checked names + + +Version 2.2.1 +------------- + +*(Released Tue, 19 Oct 2010)* + +- Disable SSL session ID cache in RPC client + + +Version 2.2.1 rc1 +----------------- + +*(Released Thu, 14 Oct 2010)* + +- Fix interaction between Curl/GnuTLS and the Python's HTTP server + (thanks Apollon Oikonomopoulos!), finally allowing 
the use of Curl + with GnuTLS +- Fix problems with interaction between Curl and Python's HTTP server, + resulting in increased speed in many RPC calls +- Improve our release script to prevent breakage with older aclocal and + Python 2.6 + + +Version 2.2.1 rc0 +----------------- + +*(Released Thu, 7 Oct 2010)* + +- Fixed issue 125, replace hardcoded "xenvg" in ``gnt-cluster`` with + value retrieved from master +- Added support for blacklisted or hidden OS definitions +- Added simple lock monitor (accessible via ``gnt-debug locks``) +- Added support for -mem-path in KVM hypervisor abstraction layer +- Allow overriding instance parameters in tool for inter-cluster + instance moves (``tools/move-instance``) +- Improved opcode summaries (e.g. in ``gnt-job list``) +- Improve consistency of OS listing by sorting it +- Documentation updates + + +Version 2.2.0.1 +--------------- + +*(Released Fri, 8 Oct 2010)* + +- Rebuild with a newer autotools version, to fix python 2.6 compatibility + + +Version 2.2.0 +------------- + +*(Released Mon, 4 Oct 2010)* + +- Fixed regression in ``gnt-instance rename`` + + +Version 2.2.0 rc2 +----------------- + +*(Released Wed, 22 Sep 2010)* + +- Fixed OS_VARIANT variable for OS scripts +- Fixed cluster tag operations via RAPI +- Made ``setup-ssh`` exit with non-zero code if an error occurred +- Disabled RAPI CA checks in watcher + + +Version 2.2.0 rc1 +----------------- + +*(Released Mon, 23 Aug 2010)* + +- Support DRBD versions of the format "a.b.c.d" +- Updated manpages +- Re-introduce support for usage from multiple threads in RAPI client +- Instance renames and modify via RAPI +- Work around race condition between processing and archival in job + queue +- Mark opcodes following failed one as failed, too +- Job field ``lock_status`` was removed due to difficulties making it + work with the changed job queue in Ganeti 2.2; a better way to monitor + locks is expected for a later 2.2.x release +- Fixed dry-run behaviour with many commands +- 
Support ``ssh-agent`` again when adding nodes +- Many additional bugfixes + + +Version 2.2.0 rc0 +----------------- + +*(Released Fri, 30 Jul 2010)* + +Important change: the internal RPC mechanism between Ganeti nodes has +changed from using a home-grown http library (based on the Python base +libraries) to using the PycURL library. This requires that PycURL is +installed on nodes. Please note that on Debian/Ubuntu, PycURL is linked +against GnuTLS by default. cURL's support for GnuTLS had known issues +before cURL 7.21.0 and we recommend using the latest cURL release or +linking against OpenSSL. Most other distributions already link PycURL +and cURL against OpenSSL. The command:: + + python -c 'import pycurl; print pycurl.version' + +can be used to determine the libraries PycURL and cURL are linked +against. + +Other significant changes: + +- Rewrote much of the internals of the job queue, in order to achieve + better parallelism; this decouples job query operations from the job + processing, and it should allow much nicer behaviour of the master + daemon under load, and it also has uncovered some long-standing bugs + related to the job serialisation (now fixed) +- Added a default iallocator setting to the cluster parameters, + eliminating the need to always pass nodes or an iallocator for + operations that require selection of new node(s) +- Added experimental support for the LXC virtualization method +- Added support for OS parameters, which allows the installation of + instances to pass parameters to OS scripts in order to customise the + instance +- Added a hypervisor parameter controlling the migration type (live or + non-live), since hypervisors have various levels of reliability; this + has renamed the 'live' parameter to 'mode' +- Added a cluster parameter ``reserved_lvs`` that denotes reserved + logical volumes, meaning that cluster verify will ignore them and not + flag their presence as errors +- The watcher will now reset the error count for failed 
instances after + 8 hours, thus allowing self-healing if the problem that caused the + instances to be down/fail to start has cleared in the meantime +- Added a cluster parameter ``drbd_usermode_helper`` that makes Ganeti + check for, and warn, if the drbd module parameter ``usermode_helper`` + is not consistent with the cluster-wide setting; this is needed to + make diagnosing failed drbd creations easier +- Started adding base IPv6 support, but this is not yet + enabled/available for use +- Rename operations (cluster, instance) will now return the new name, + which is especially useful if a short name was passed in +- Added support for instance migration in RAPI +- Added a tool to pre-configure nodes for the SSH setup, before joining + them to the cluster; this will allow in the future a simplified model + for node joining (but not yet fully enabled in 2.2); this needs the + paramiko python library +- Fixed handling of name-resolving errors +- Fixed consistency of job results on the error path +- Fixed master-failover race condition when executed multiple times in + sequence +- Fixed many bugs related to the job queue (mostly introduced during the + 2.2 development cycle, so not all are impacting 2.1) +- Fixed instance migration with missing disk symlinks +- Fixed handling of unknown jobs in ``gnt-job archive`` +- And many other small fixes/improvements + +Internal changes: + +- Enhanced both the unittest and the QA coverage +- Switched the opcode validation to a generic model, and extended the + validation to all opcode parameters +- Changed more parts of the code that write shell scripts to use the + same class for this +- Switched the master daemon to use the asyncore library for the Luxi + server endpoint + + +Version 2.2.0 beta0 +------------------- + +*(Released Thu, 17 Jun 2010)* + +- Added tool (``move-instance``) and infrastructure to move instances + between separate clusters (see :doc:`separate documentation + ` and :doc:`design document `) +- Added 
per-request RPC timeout +- RAPI now requires a Content-Type header for requests with a body (e.g. + ``PUT`` or ``POST``) which must be set to ``application/json`` (see + :rfc:`2616` (HTTP/1.1), section 7.2.1) +- ``ganeti-watcher`` attempts to restart ``ganeti-rapi`` if RAPI is not + reachable +- Implemented initial support for running Ganeti daemons as separate + users, see configure-time flags ``--with-user-prefix`` and + ``--with-group-prefix`` (only ``ganeti-rapi`` is supported at this + time) +- Instances can be removed after export (``gnt-backup export + --remove-instance``) +- Self-signed certificates generated by Ganeti now use a 2048 bit RSA + key (instead of 1024 bit) +- Added new cluster configuration file for cluster domain secret +- Import/export now use SSL instead of SSH +- Added support for showing estimated time when exporting an instance, + see the ``ganeti-os-interface(7)`` manpage and look for + ``EXP_SIZE_FD`` + + +Version 2.1.8 +------------- + +*(Released Tue, 16 Nov 2010)* + +Some more bugfixes. Unless critical bugs occur, this will be the last +2.1 release: + +- Fix case of MAC special-values +- Fix mac checker regex +- backend: Fix typo causing "out of range" error +- Add missing --units in gnt-instance list man page + + +Version 2.1.7 +------------- + +*(Released Tue, 24 Aug 2010)* + +Bugfixes only: + - Don't ignore secondary node silently on non-mirrored disk templates + (issue 113) + - Fix --master-netdev arg name in gnt-cluster(8) (issue 114) + - Fix usb_mouse parameter breaking with vnc_console (issue 109) + - Properly document the usb_mouse parameter + - Fix path in ganeti-rapi(8) (issue 116) + - Adjust error message when the ganeti user's .ssh directory is + missing + - Add same-node-check when changing the disk template to drbd + + +Version 2.1.6 +------------- + +*(Released Fri, 16 Jul 2010)* + +Bugfixes only: + - Add an option to only select some reboot types during qa/burnin. 
+ (on some hypervisors consecutive reboots are not supported) + - Fix infrequent race condition in master failover. Sometimes the old + master ip address would still be detected as up for a short time + after it was removed, causing failover to fail. + - Decrease mlockall warnings when the ctypes module is missing. On + Python 2.4 we support running even if no ctypes module is installed, + but we were too verbose about this issue. + - Fix building on old distributions, on which man doesn't have a + --warnings option. + - Fix RAPI not to ignore the MAC address on instance creation + - Implement the old instance creation format in the RAPI client. + + +Version 2.1.5 +------------- + +*(Released Thu, 01 Jul 2010)* + +A small bugfix release: + - Fix disk adoption: broken by strict --disk option checking in 2.1.4 + - Fix batch-create: broken in the whole 2.1 series due to a lookup on + a non-existing option + - Fix instance create: the --force-variant option was ignored + - Improve pylint 0.21 compatibility and warnings with Python 2.6 + - Fix modify node storage with non-FQDN arguments + - Fix RAPI client to authenticate under Python 2.6 when used + for more than 5 requests needing authentication + - Fix gnt-instance modify -t (storage) giving a wrong error message + when converting a non-shutdown drbd instance to plain + + +Version 2.1.4 +------------- + +*(Released Fri, 18 Jun 2010)* + +A small bugfix release: + + - Fix live migration of KVM instances started with older Ganeti + versions which had fewer hypervisor parameters + - Fix gnt-instance grow-disk on down instances + - Fix an error-reporting bug during instance migration + - Better checking of the ``--net`` and ``--disk`` values, to avoid + silently ignoring broken ones + - Fix an RPC error reporting bug affecting, for example, RAPI client + users + - Fix bug triggered by different API version os-es on different nodes + - Fix a bug in instance startup with custom hvparams: OS level + parameters would fail to 
be applied. + - Fix the RAPI client under Python 2.6 (but more work is needed to + make it work completely well with OpenSSL) + - Fix handling of errors when resolving names from DNS + + +Version 2.1.3 +------------- + +*(Released Thu, 3 Jun 2010)* + +A medium sized development cycle. Some new features, and some +fixes/small improvements/cleanups. + +Significant features +~~~~~~~~~~~~~~~~~~~~ + +The node daemon now tries to mlock itself into memory, unless the +``--no-mlock`` flag is passed. It also doesn't fail if it can't write +its logs, and falls back to console logging. This allows emergency +features such as ``gnt-node powercycle`` to work even in the event of a +broken node disk (tested offlining the disk hosting the node's +filesystem and dropping its memory caches; don't try this at home). + +KVM: add vhost-net acceleration support. It can be tested with a new +enough version of the kernel and of qemu-kvm. + +KVM: Add instance chrooting feature. If you use privilege dropping for +your VMs you can also now force them to chroot to an empty directory, +before starting the emulated guest. + +KVM: Add maximum migration bandwidth and maximum downtime tweaking +support (requires a new-enough version of qemu-kvm). + +Cluster verify will now warn if the master node doesn't have the master +ip configured on it. + +Add a new (incompatible) instance creation request format to RAPI which +supports all parameters (previously only a subset was supported, and it +wasn't possible to extend the old format to accommodate all the new +features). The old format is still supported, and a client can check for +this feature, before using it, by checking for its presence in the +``features`` RAPI resource. + +Now with ancient Latin support. Try it passing the ``--roman`` option to +``gnt-instance info``, ``gnt-cluster info`` or ``gnt-node list`` +(requires the python-roman module to be installed, in order to work). 
+ +Other changes +~~~~~~~~~~~~~ + +As usual many internal code refactorings, documentation updates, and +such. Among others: + + - Lots of improvements and cleanups to the experimental Remote API + (RAPI) client library. + - A new unit test suite for the core daemon libraries. + - A fix to creating missing directories makes sure the umask is not + applied anymore. This enforces the same directory permissions + everywhere. + - Better handling of terminating daemons with ctrl+c (used when running + them in debugging mode). + - Fix a race condition in live migrating a KVM instance, when stat() + on the old proc status file returned EINVAL, which is an unexpected + value. + - Fixed manpage checking with newer man and utf-8 characters. But now + you need the en_US.UTF-8 locale enabled to build Ganeti from git. + + +Version 2.1.2.1 +--------------- + +*(Released Fri, 7 May 2010)* + +Fix a bug which prevented untagged KVM instances from starting. + + +Version 2.1.2 +------------- + +*(Released Fri, 7 May 2010)* + +Another release with a long development cycle, during which many +different features were added. + +Significant features +~~~~~~~~~~~~~~~~~~~~ + +The KVM hypervisor now can run the individual instances as non-root, to +reduce the impact of a VM being hijacked due to bugs in the +hypervisor. It is possible to run all instances as a single (non-root) +user, to manually specify a user for each instance, or to dynamically +allocate a user out of a cluster-wide pool to each instance, with the +guarantee that no two instances will run under the same user ID on any +given node. + +An experimental RAPI client library, that can be used standalone +(without the other Ganeti libraries), is provided in the source tree as +``lib/rapi/client.py``. Note this client might change its interface in +the future, as we iterate on its capabilities. + +A new command, ``gnt-cluster renew-crypto`` has been added to easily +replace the cluster's certificates and crypto keys. 
This might help in +case they have been compromised, or have simply expired. + +A new disk option for instance creation has been added that allows one +to "adopt" currently existing logical volumes, with data +preservation. This should allow easier migration to Ganeti from +unmanaged (or managed via other software) instances. + +Another disk improvement is the possibility to convert between redundant +(DRBD) and plain (LVM) disk configuration for an instance. This should +allow better scalability (starting with one node and growing the +cluster, or shrinking a two-node cluster to one node). + +A new feature that could help with automated node failovers has been +implemented: if a node sees itself as offline (by querying the master +candidates), it will try to shutdown (hard) all instances and any active +DRBD devices. This reduces the risk of duplicate instances if an +external script automatically fails over the instances on such nodes. To +enable this, the cluster parameter ``maintain_node_health`` should be +enabled; in the future this option (per the name) will enable other +automatic maintenance features. + +Instance export/import now will reuse the original instance +specifications for all parameters; that means exporting an instance, +deleting it and then importing it back should give an almost identical +instance. Note that the default import behaviour has changed from +before, where it created only one NIC; now it recreates the original +number of NICs. + +Cluster verify has added a few new checks: SSL certificates validity, +/etc/hosts consistency across the cluster, etc. + +Other changes +~~~~~~~~~~~~~ + +As usual, many internal changes were done, documentation fixes, +etc. 
Among others: + +- Fixed cluster initialization with disabled cluster storage (regression + introduced in 2.1.1) +- File-based storage supports growing the disks +- Fixed behaviour of node role changes +- Fixed cluster verify for some corner cases, plus a general rewrite of + cluster verify to allow future extension with more checks +- Fixed log spamming by watcher and node daemon (regression introduced + in 2.1.1) +- Fixed possible validation issues when changing the list of enabled + hypervisors +- Fixed cleanup of /etc/hosts during node removal +- Fixed RAPI response for invalid methods +- Fixed bug with hashed passwords in ``ganeti-rapi`` daemon +- Multiple small improvements to the KVM hypervisor (VNC usage, booting + from ide disks, etc.) +- Allow OS changes without re-installation (to record a changed OS + outside of Ganeti, or to allow OS renames) +- Allow instance creation without OS installation (useful for example if + the OS will be installed manually, or restored from a backup not in + Ganeti format) +- Implemented option to make cluster ``copyfile`` use the replication + network +- Added list of enabled hypervisors to ssconf (possibly useful for + external scripts) +- Added a new tool (``tools/cfgupgrade12``) that allows upgrading from + 1.2 clusters +- A partial form of node re-IP is possible via node readd, which now + allows changed node primary IP +- Command line utilities now show an informational message if the job is + waiting for a lock +- The logs of the master daemon now show the PID/UID/GID of the + connected client + + +Version 2.1.1 +------------- + +*(Released Fri, 12 Mar 2010)* + +During the 2.1.0 long release candidate cycle, a lot of improvements and +changes have accumulated which were released later as 2.1.1. + +Major changes +~~~~~~~~~~~~~ + +The node evacuate command (``gnt-node evacuate``) was significantly +rewritten, and as such the IAllocator protocol was changed - a new +request type has been added. 
This unfortunate change during a stable +series is designed to improve performance of node evacuations; on +clusters with more than about five nodes and which are well-balanced, +evacuation should proceed in parallel for all instances of the node +being evacuated. As such, any existing IAllocator scripts need to be +updated, otherwise the above command will fail due to the unknown +request. The provided "dumb" allocator has not been updated; but the +ganeti-htools package supports the new protocol since version 0.2.4. + +Another important change is increased validation of node and instance +names. This might create problems in special cases, if invalid host +names are being used. + +Also, a new layer of hypervisor parameters has been added, that sits at +OS level between the cluster defaults and the instance ones. This allows +customisation of virtualization parameters depending on the installed +OS. For example instances with OS 'X' may have a different KVM kernel +(or any other parameter) than the cluster defaults. This is intended to +help manage multiple OSes on the same cluster, without manual +modification of each instance's parameters. + +A tool for merging clusters, ``cluster-merge``, has been added in the +tools sub-directory. + +Bug fixes +~~~~~~~~~ + +- Improved the int/float conversions that should make the code more + robust in face of errors from the node daemons +- Fixed the remove node code in case of internal configuration errors +- Fixed the node daemon behaviour in face of inconsistent queue + directory (e.g. read-only file-system where we can't open the files + read-write, etc.) 
+- Fixed the behaviour of gnt-node modify for master candidate demotion; + now it either aborts cleanly or, if given the new "auto_promote" + parameter, will automatically promote other nodes as needed +- Fixed compatibility with (as yet unreleased) Python 2.6.5 that would + completely prevent Ganeti from working +- Fixed bug for instance export when not all disks were successfully + exported +- Fixed behaviour of node add when the new node is slow in starting up + the node daemon +- Fixed handling of signals in the LUXI client, which should improve + behaviour of command-line scripts +- Added checks for invalid node/instance names in the configuration (now + flagged during cluster verify) +- Fixed watcher behaviour for disk activation errors +- Fixed two potentially endless loops in http library, which led to the + RAPI daemon hanging and consuming 100% CPU in some cases +- Fixed bug in RAPI daemon related to hashed passwords +- Fixed bug for unintended qemu-level bridging of multi-NIC KVM + instances +- Enhanced compatibility with non-Debian OSes, by not using absolute + paths in some commands and allowing customisation of the ssh + configuration directory +- Fixed possible future issue with new Python versions by abiding to the + proper use of ``__slots__`` attribute on classes +- Added checks that should prevent directory traversal attacks +- Many documentation fixes based on feedback from users + +New features +~~~~~~~~~~~~ + +- Added an "early_release" mode for instance replace disks and node + evacuate, where we release locks earlier and thus allow higher + parallelism within the cluster +- Added watcher hooks, intended to allow the watcher to restart other + daemons (e.g. 
from the ganeti-nbma project), but they can be used of + course for any other purpose +- Added a compile-time disable for DRBD barriers, to increase + performance if the administrator trusts the power supply or the + storage system to not lose writes +- Added the option of using syslog for logging instead of, or in + addition to, Ganeti's own log files +- Removed boot restriction for paravirtual NICs for KVM, recent versions + can indeed boot from a paravirtual NIC +- Added a generic debug level for many operations; while this is not + used widely yet, it allows one to pass the debug value all the way to + the OS scripts +- Enhanced the hooks environment for instance moves (failovers, + migrations) where the primary/secondary nodes changed during the + operation, by adding {NEW,OLD}_{PRIMARY,SECONDARY} vars +- Enhanced data validations for many user-supplied values; one important + item is the restrictions imposed on instance and node names, which + might reject some (invalid) host names +- Add a configure-time option to disable file-based storage, if it's not + needed; this allows greater security separation between the master + node and the other nodes from the point of view of the inter-node RPC + protocol +- Added user notification in interactive tools if job is waiting in the + job queue or trying to acquire locks +- Added log messages when a job is waiting for locks +- Added filtering by node tags in instance operations which admit + multiple instances (start, stop, reboot, reinstall) +- Added a new tool for cluster mergers, ``cluster-merge`` +- Parameters from command line which are of the form ``a=b,c=d`` can now + use backslash escapes to pass in values which contain commas, + e.g. 
``a=b\,c,d=e`` where the 'a' parameter would get the value + ``b,c`` +- For KVM, the instance name is the first parameter passed to KVM, so + that it's more visible in the process list + + +Version 2.1.0 +------------- + +*(Released Tue, 2 Mar 2010)* + +Ganeti 2.1 brings many improvements with it. Major changes: + +- Added infrastructure to ease automated disk repairs +- Added new daemon to export configuration data in a cheaper way than + using the remote API +- Instance NICs can now be routed instead of being associated with a + networking bridge +- Improved job locking logic to reduce impact of jobs acquiring multiple + locks waiting for other long-running jobs + +In-depth implementation details can be found in the Ganeti 2.1 design +document. + +Details +~~~~~~~ + +- Added chroot hypervisor +- Added more options to xen-hvm hypervisor (``kernel_path`` and + ``device_model``) +- Added more options to xen-pvm hypervisor (``use_bootloader``, + ``bootloader_path`` and ``bootloader_args``) +- Added the ``use_localtime`` option for the xen-hvm and kvm + hypervisors, and the default value for this has changed to false (in + 2.0 xen-hvm always enabled it) +- Added luxi call to submit multiple jobs in one go +- Added cluster initialization option to not modify ``/etc/hosts`` + file on nodes +- Added network interface parameters +- Added dry run mode to some LUs +- Added RAPI resources: + + - ``/2/instances/[instance_name]/info`` + - ``/2/instances/[instance_name]/replace-disks`` + - ``/2/nodes/[node_name]/evacuate`` + - ``/2/nodes/[node_name]/migrate`` + - ``/2/nodes/[node_name]/role`` + - ``/2/nodes/[node_name]/storage`` + - ``/2/nodes/[node_name]/storage/modify`` + - ``/2/nodes/[node_name]/storage/repair`` + +- Added OpCodes to evacuate or migrate all instances on a node +- Added new command to list storage elements on nodes (``gnt-node + list-storage``) and modify them (``gnt-node modify-storage``) +- Added new ssconf files with master candidate IP address + 
(``ssconf_master_candidates_ips``), node primary IP address + (``ssconf_node_primary_ips``) and node secondary IP address + (``ssconf_node_secondary_ips``) +- Added ``ganeti-confd`` and a client library to query the Ganeti + configuration via UDP +- Added ability to run hooks after cluster initialization and before + cluster destruction +- Added automatic mode for disk replace (``gnt-instance replace-disks + --auto``) +- Added ``gnt-instance recreate-disks`` to re-create (empty) disks + after catastrophic data-loss +- Added ``gnt-node repair-storage`` command to repair damaged LVM volume + groups +- Added ``gnt-instance move`` command to move instances +- Added ``gnt-cluster watcher`` command to control watcher +- Added ``gnt-node powercycle`` command to powercycle nodes +- Added new job status field ``lock_status`` +- Added parseable error codes to cluster verification (``gnt-cluster + verify --error-codes``) and made output less verbose (use + ``--verbose`` to restore previous behaviour) +- Added UUIDs to the main config entities (cluster, nodes, instances) +- Added support for OS variants +- Added support for hashed passwords in the Ganeti remote API users file + (``rapi_users``) +- Added option to specify maximum timeout on instance shutdown +- Added ``--no-ssh-init`` option to ``gnt-cluster init`` +- Added new helper script to start and stop Ganeti daemons + (``daemon-util``), with the intent to reduce the work necessary to + adjust Ganeti for non-Debian distributions and to start/stop daemons + from one place +- Added more unittests +- Fixed critical bug in ganeti-masterd startup +- Removed the configure-time ``kvm-migration-port`` parameter, this is + now customisable at the cluster level for both the KVM and Xen + hypervisors using the new ``migration_port`` parameter +- Pass ``INSTANCE_REINSTALL`` variable to OS installation script when + reinstalling an instance +- Allowed ``@`` in tag names +- Migrated to Sphinx (http://sphinx.pocoo.org/) for 
documentation +- Many documentation updates +- Distribute hypervisor files on ``gnt-cluster redist-conf`` +- ``gnt-instance reinstall`` can now reinstall multiple instances +- Updated many command line parameters +- Introduced new OS API version 15 +- No longer support a default hypervisor +- Treat virtual LVs as inexistent +- Improved job locking logic to reduce lock contention +- Match instance and node names case insensitively +- Reimplemented bash completion script to be more complete +- Improved burnin + + +Version 2.0.6 +------------- + +*(Released Thu, 4 Feb 2010)* + +- Fix cleaner behaviour on nodes not in a cluster (Debian bug 568105) +- Fix a string formatting bug +- Improve safety of the code in some error paths +- Improve data validation in the master of values returned from nodes + + +Version 2.0.5 +------------- + +*(Released Thu, 17 Dec 2009)* + +- Fix security issue due to missing validation of iallocator names; this + allows local and remote execution of arbitrary executables +- Fix failure of gnt-node list during instance removal +- Ship the RAPI documentation in the archive + + +Version 2.0.4 +------------- + +*(Released Wed, 30 Sep 2009)* + +- Fixed many wrong messages +- Fixed a few bugs related to the locking library +- Fixed MAC checking at instance creation time +- Fixed a DRBD parsing bug related to gaps in /proc/drbd +- Fixed a few issues related to signal handling in both daemons and + scripts +- Fixed the example startup script provided +- Fixed insserv dependencies in the example startup script (patch from + Debian) +- Fixed handling of drained nodes in the iallocator framework +- Fixed handling of KERNEL_PATH parameter for xen-hvm (Debian bug + #528618) +- Fixed error related to invalid job IDs in job polling +- Fixed job/opcode persistence on unclean master shutdown +- Fixed handling of partial job processing after unclean master + shutdown +- Fixed error reporting from LUs, previously all errors were converted + into execution errors 
+- Fixed error reporting from burnin +- Decreased significantly the memory usage of the job queue +- Optimised slightly multi-job submission +- Optimised slightly opcode loading +- Backported the multi-job submit framework from the development + branch; multi-instance start and stop should be faster +- Added script to clean archived jobs after 21 days; this will reduce + the size of the queue directory +- Added some extra checks in disk size tracking +- Added an example ethers hook script +- Added a cluster parameter that prevents Ganeti from modifying + /etc/hosts +- Added more node information to RAPI responses +- Added a ``gnt-job watch`` command that allows following the output of a + job +- Added a bind-address option to ganeti-rapi +- Added more checks to the configuration verify +- Enhanced the burnin script such that some operations can be retried + automatically +- Converted instance reinstall to multi-instance model + + +Version 2.0.3 +------------- + +*(Released Fri, 7 Aug 2009)* + +- Added ``--ignore-size`` to the ``gnt-instance activate-disks`` command + to allow using the pre-2.0.2 behaviour in activation, if any existing + instances have mismatched disk sizes in the configuration +- Added ``gnt-cluster repair-disk-sizes`` command to check and update + any configuration mismatches for disk sizes +- Added ``gnt-cluster masterfailover --no-voting`` to allow master + failover to work on two-node clusters +- Fixed the ``--net`` option of ``gnt-backup import``, which was + unusable +- Fixed detection of OS script errors in ``gnt-backup export`` +- Fixed exit code of ``gnt-backup export`` + + +Version 2.0.2 +------------- + +*(Released Fri, 17 Jul 2009)* + +- Added experimental support for striped logical volumes; this should + enhance performance but comes with a higher complexity in the block + device handling; striping is only enabled when passing + ``--with-lvm-stripecount=N`` to ``configure``, but codepaths are + affected even in the non-striped 
mode +- Improved resiliency against transient failures at the end of DRBD + resyncs, and in general of DRBD resync checks +- Fixed a couple of issues with exports and snapshot errors +- Fixed a couple of issues in instance listing +- Added display of the disk size in ``gnt-instance info`` +- Fixed checking for valid OSes in instance creation +- Fixed handling of the "vcpus" parameter in instance listing and in + general of invalid parameters +- Fixed http server library, and thus RAPI, to handle invalid + username/password combinations correctly; this means that now they + report unauthorized for queries too, not only for modifications, + allowing earlier detection of configuration problems +- Added a new "role" node list field, equivalent to the master/master + candidate/drained/offline flags combinations +- Fixed cluster modify and changes of candidate pool size +- Fixed cluster verify error messages for wrong files on regular nodes +- Fixed a couple of issues with node demotion from master candidate role +- Fixed node readd issues +- Added non-interactive mode for ``ganeti-masterd --no-voting`` startup +- Added a new ``--no-voting`` option for masterfailover to fix failover + on two-node clusters when the former master node is unreachable +- Added instance reinstall over RAPI + + +Version 2.0.1 +------------- + +*(Released Tue, 16 Jun 2009)* + +- added ``-H``/``-B`` startup parameters to ``gnt-instance``, which will + allow re-adding the start in single-user option (regression from 1.2) +- the watcher writes the instance status to a file, to allow monitoring + to report the instance status (from the master) based on cached + results of the watcher's queries; while this can get stale if the + watcher is being locked due to other work on the cluster, this is + still an improvement +- the watcher now also restarts the node daemon and the rapi daemon if + they died +- fixed the watcher to handle full and drained queue cases +- hooks export more instance data in the 
environment, which helps if + hook scripts need to take action based on the instance's properties + (no longer need to query back into ganeti) +- instance failovers when the instance is stopped do not check for free + RAM, so that failing over a stopped instance is possible in low memory + situations +- rapi uses queries for tags instead of jobs (for less job traffic), and + for cluster tags it won't talk to masterd at all but read them from + ssconf +- a couple of error handling fixes in RAPI +- drbd handling: improved the error handling of inconsistent disks after + resync to reduce the frequency of "there are some degraded disks for + this instance" messages +- fixed a bug in live migration when DRBD doesn't want to reconnect (the + error handling path called a wrong function name) + + +Version 2.0.0 +------------- + +*(Released Wed, 27 May 2009)* + +- no changes from rc5 + + +Version 2.0 rc5 +--------------- + +*(Released Wed, 20 May 2009)* + +- fix a couple of bugs (validation, argument checks) +- fix ``gnt-cluster getmaster`` on non-master nodes (regression) +- some small improvements to RAPI and IAllocator +- make watcher automatically start the master daemon if down + + +Version 2.0 rc4 +--------------- + +*(Released Mon, 27 Apr 2009)* + +- change the OS list to not require locks; this helps with big clusters +- fix ``gnt-cluster verify`` and ``gnt-cluster verify-disks`` when the + volume group is broken +- ``gnt-instance info``, without any arguments, doesn't run for all + instances anymore; either pass ``--all`` or pass the desired + instances; this helps against mistakes on big clusters where listing + the information for all instances takes a long time +- miscellaneous doc and man pages fixes + + +Version 2.0 rc3 +--------------- + +*(Released Wed, 8 Apr 2009)* + +- Change the internal locking model of some ``gnt-node`` commands, in + order to reduce contention (and blocking of master daemon) when + batching many creation/reinstall jobs +- Fixes to Xen 
soft reboot +- No longer build documentation at build time, instead distribute it in + the archive, in order to reduce the need for the whole docbook/rst + toolchains + + +Version 2.0 rc2 +--------------- + +*(Released Fri, 27 Mar 2009)* + +- Now the cfgupgrade script works and can upgrade 1.2.7 clusters to 2.0 +- Fix watcher startup sequence, improves the behaviour of busy clusters +- Some other fixes in ``gnt-cluster verify``, ``gnt-instance + replace-disks``, ``gnt-instance add``, ``gnt-cluster queue``, KVM VNC + bind address and other places +- Some documentation fixes and updates + + +Version 2.0 rc1 +--------------- + +*(Released Mon, 2 Mar 2009)* + +- More documentation updates, now all docs should be more-or-less + up-to-date +- A couple of small fixes (mixed hypervisor clusters, offline nodes, + etc.) +- Added a customizable HV_KERNEL_ARGS hypervisor parameter (for Xen PVM + and KVM) +- Fix an issue related to $libdir/run/ganeti and cluster creation + + +Version 2.0 beta2 +----------------- + +*(Released Thu, 19 Feb 2009)* + +- Xen PVM and KVM have switched the default value for the instance root + disk to the first partition on the first drive, instead of the whole + drive; this means that the OS installation scripts must be changed + accordingly +- Man pages have been updated +- RAPI has been switched by default to HTTPS, and the exported functions + should all work correctly +- RAPI v1 has been removed +- Many improvements to the KVM hypervisor +- Block device errors are now better reported +- Many other bugfixes and small improvements + + +Version 2.0 beta1 +----------------- + +*(Released Mon, 26 Jan 2009)* + +- Version 2 is a general rewrite of the code and therefore the + differences are too many to list, see the design document for 2.0 in + the ``doc/`` subdirectory for more details +- In this beta version there is not yet a migration path from 1.2 (there + will be one in the final 2.0 release) +- A few significant changes are: + + - all commands 
are executed by a daemon (``ganeti-masterd``) and the + various ``gnt-*`` commands are just front-ends to it + - all the commands are entered into, and executed from, a job queue, + see the ``gnt-job(8)`` manpage + - the RAPI daemon supports read-write operations, secured by basic + HTTP authentication on top of HTTPS + - DRBD version 0.7 support has been removed, DRBD 8 is the only + supported version (when migrating from Ganeti 1.2 to 2.0, you need + to migrate to DRBD 8 first while still running Ganeti 1.2) + - DRBD devices are using statically allocated minor numbers, which + will be assigned to existing instances during the migration process + - there is support for both Xen PVM and Xen HVM instances running on + the same cluster + - KVM virtualization is supported too + - file-based storage has been implemented, which means that it is + possible to run the cluster without LVM and DRBD storage, for + example using a shared filesystem exported from shared storage (and + still have live migration) + + +Version 1.2.7 +------------- + +*(Released Tue, 13 Jan 2009)* + +- Change the default reboot type in ``gnt-instance reboot`` to "hard" +- Reuse the old instance MAC address by default on instance import, if + the instance name is the same.
+- Handle situations in which the node info RPC returns incomplete + results (issue 46) +- Add checks for TCP/UDP port collisions in ``gnt-cluster verify`` +- Improved version of batcher: + + - state file support + - instance MAC address support + - support for HVM clusters/instances + +- Add an option to show the number of CPU sockets and nodes in + ``gnt-node list`` +- Support OSes that handle more than one version of the OS API (but do + not change the current API in any other way) +- Fix ``gnt-node migrate`` +- Add a ``gnt-debug`` man page +- Fix various other typos and small issues +- Increase disk resync maximum speed to 60MB/s (from 30MB/s) + + +Version 1.2.6 +------------- + +*(Released Wed, 24 Sep 2008)* + +- new ``--hvm-nic-type`` and ``--hvm-disk-type`` flags to control the + type of NIC and disk exported to fully virtualized instances. +- provide access to the serial console of HVM instances +- instance auto_balance flag, set by default. If turned off it will + avoid warnings on cluster verify if there is not enough memory to fail + over an instance. In the future it will prevent automatically failing + it over, once that is supported. +- batcher tool for instance creation, see ``tools/README.batcher`` +- ``gnt-instance reinstall --select-os`` to interactively select a new + operating system when reinstalling an instance. +- when changing the memory amount on instance modify a check has been + added that the instance will be able to start. Warnings are also + emitted if the instance will not be able to fail over, if auto_balance + is true. +- documentation fixes +- sync fields between ``gnt-instance list/modify/add/import`` +- fix a race condition in drbd when the sync speed was set after giving + the device a remote peer.
+ + +Version 1.2.5 +------------- + +*(Released Tue, 22 Jul 2008)* + +- note: the allowed size and number of tags per object were reduced +- fix a bug in ``gnt-cluster verify`` with inconsistent volume groups +- fixed twisted 8.x compatibility +- fixed ``gnt-instance replace-disks`` with iallocator +- add TCP keepalives on twisted connections to detect restarted nodes +- disk increase support, see ``gnt-instance grow-disk`` +- implement bulk node/instance query for RAPI +- add tags in node/instance listing (optional) +- experimental migration (and live migration) support, read the man page + for ``gnt-instance migrate`` +- the ``ganeti-watcher`` logs are now timestamped, and the watcher also + has some small improvements in handling its state file + + +Version 1.2.4 +------------- + +*(Released Fri, 13 Jun 2008)* + +- Experimental readonly, REST-based remote API implementation; + automatically started on master node, TCP port 5080, if enabled by + ``--enable-rapi`` parameter to configure script. +- Instance allocator support. Add and import instance accept a + ``--iallocator`` parameter, and call that instance allocator to decide + which node to use for the instance. The iallocator document describes + what's expected from an allocator script. +- ``gnt-cluster verify`` N+1 memory redundancy checks: Unless passed the + ``--no-nplus1-mem`` option ``gnt-cluster verify`` now checks that if a + node is lost there is still enough memory to fail over the instances + that reside on it. +- ``gnt-cluster verify`` hooks: it is now possible to add post-hooks to + ``gnt-cluster verify``, to check for site-specific compliance. All the + hooks will run, and their output, if any, will be displayed. Any + failing hook will make the verification return an error value. 
+- ``gnt-cluster verify`` now checks that its peers are reachable on the + primary and secondary interfaces +- ``gnt-node add`` now supports the ``--readd`` option, to readd a node + that is still declared as part of the cluster and has failed. +- ``gnt-* list`` commands now accept a new ``-o +field`` way of + specifying output fields, which just adds the chosen fields to the + default ones. +- ``gnt-backup`` now has a new ``remove`` command to delete an existing + export from the filesystem. +- New per-instance parameters hvm_acpi, hvm_pae and hvm_cdrom_image_path + have been added. Using them you can enable/disable ACPI and PAE + support, and specify a path for a CD image to be exported to the + instance. These parameters, as their names suggest, only work on HVM + clusters. +- When upgrading an HVM cluster to Ganeti 1.2.4, the values for ACPI and + PAE support will be set to the previously hardcoded values, but the + (previously hardcoded) path to the CDROM ISO image will be unset and, + if required, needs to be set manually with ``gnt-instance modify`` + after the upgrade. +- The address to which an instance's VNC console is bound is now + selectable per-instance, rather than being cluster-wide. Of course + this only applies to instances controlled via VNC, so currently just + applies to HVM clusters.
+ + +Version 1.2.3 +------------- + +*(Released Mon, 18 Feb 2008)* + +- more tweaks to the disk activation code (especially helpful for DRBD) +- change the default ``gnt-instance list`` output format, now there is + one combined status field (see the manpage for the exact values this + field will have) +- some more fixes for the MAC export to hooks change +- make Ganeti not break with DRBD 8.2.x (which changed the version + format in ``/proc/drbd``) (issue 24) +- add an upgrade tool from "remote_raid1" disk template to "drbd" disk + template, allowing migration from DRBD0.7+MD to DRBD8 + + +Version 1.2.2 +------------- + +*(Released Wed, 30 Jan 2008)* + +- fix ``gnt-instance modify`` breakage introduced in 1.2.1 with the HVM + support (issue 23) +- add command aliases infrastructure and a few aliases +- allow listing of VCPUs in ``gnt-instance list`` and improve the + man pages and the ``--help`` option of ``gnt-node + list``/``gnt-instance list`` +- fix ``gnt-backup list`` with down nodes (issue 21) +- change the tools location (move from $pkgdatadir to $pkglibdir/tools) +- fix the dist archive and add a check against including svn/git files + in the future +- some developer-related changes: improve the burnin and the QA suite, + add an upload script for testing during development + + +Version 1.2.1 +------------- + +*(Released Wed, 16 Jan 2008)* + +- experimental HVM support, read the install document, section + "Initializing the cluster" +- allow per-instance kernel and initrd paths for the PVM hypervisor +- add a new command ``gnt-cluster verify-disks`` which uses a new + algorithm to improve the reconnection of the DRBD pairs if the device + on the secondary node has gone away +- make logical volume code auto-activate LVs at disk activation time +- slightly improve the speed of activating disks +- allow specification of the MAC address at instance creation time, and + changing it later via ``gnt-instance modify`` +- fix handling of external commands that
generate lots of output on + stderr +- update documentation with regard to minimum version of DRBD8 supported + + +Version 1.2.0 +------------- + +*(Released Tue, 4 Dec 2007)* + +- Log the ``xm create`` output to the node daemon log on failure (to + help diagnose the error) +- In debug mode, log the output of all failed external commands +- Change parsing of lvm commands to ignore stderr + + +Version 1.2 beta3 +----------------- + +*(Released Wed, 28 Nov 2007)* + +- Another round of updates to the DRBD 8 code to deal with more failures + in the replace secondary node operation +- Some more logging of failures in disk operations (lvm, drbd) +- A few documentation updates +- QA updates + + +Version 1.2 beta2 +----------------- + +*(Released Tue, 13 Nov 2007)* + +- Change configuration file format from Python's Pickle to JSON. + Upgrading is possible using the cfgupgrade utility. +- Add support for DRBD 8.0 (new disk template ``drbd``) which allows for + faster replace disks and is more stable (DRBD 8 has many improvements + compared to DRBD 0.7) +- Added command line tags support (see man pages for ``gnt-instance``, + ``gnt-node``, ``gnt-cluster``) +- Added instance rename support +- Added multi-instance startup/shutdown +- Added cluster rename support +- Added ``gnt-node evacuate`` to simplify some node operations +- Added instance reboot operation that can speed up reboot as compared + to stop and start +- Soften the requirement that hostnames are in FQDN format +- The ``ganeti-watcher`` now activates drbd pairs after secondary node + reboots +- Removed dependency on Debian's patched fping that uses the + non-standard ``-S`` option +- Now the OS definitions are searched for in multiple, configurable + paths (easier for distros to package) +- Some changes to the hooks infrastructure (especially the new + post-configuration update hook) +- Other small bugfixes + +.. vim: set textwidth=72 syntax=rst : .. Local Variables: .. mode: rst .. fill-column: 72