DRBD: ignore unreadable meta devices
The DRBD driver can ignore dead disks but not dead meta devices (for which it refuses to configure the minor). To handle this case, we check that the meta device is readable and if not we ignore it (the same as when backend._RecursiveAssembleBD didn't find it)....
gnt-cluster verify: Warn if node time diverges too far
The warning will be generated if the clocks diverge by more than 150 seconds. Due to the way the RPC system works, we cannot get exact time differences, e.g. if one of the queried nodes is broken. The comparison is done using a...
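The master can only bracket a node's reported time between the moment it sent the RPC and the moment the reply arrived. One way to compare under that constraint is to widen that window by the tolerance. The sketch below assumes this approach. The constant, function and parameter names are hypothetical and are not taken from the actual verify code.

```python
# Maximum tolerated clock difference in seconds (the 150 s from the text).
MAX_CLOCK_SKEW = 150.0


def check_node_time(node_time, rpc_start, rpc_end, max_skew=MAX_CLOCK_SKEW):
    """Return None if node_time is plausible, else the estimated divergence.

    Because the RPC takes a non-zero (and, with a broken node, possibly long)
    time, the node clock is compared against the whole locally observed
    interval [rpc_start, rpc_end] rather than a single timestamp.
    """
    if node_time < rpc_start - max_skew:
        return rpc_start - node_time
    if node_time > rpc_end + max_skew:
        return node_time - rpc_end
    return None
```

Comparing against the interval avoids false warnings when one slow node delays the whole call. The trade-off is that skews smaller than the RPC duration plus the tolerance go unreported.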
KVM: fail when a routed nic has no ip
This shouldn't happen, but if it does it's better to fail at this level, rather than create a broken NIC script, which is hard to debug.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Enable batch mode for devel/upload
Since the rsync/ssh calls are done in parallel, they cannot properly read a password or a key confirmation from stdin. As such, it's better to enable batch mode so that they fail right away instead of prompting and then timing...
cmdlib: Work around race condition in DRBD before version 8.0.13
DRBD goes into sync mode for a short amount of time after executing the "resize" command. DRBD 8.x below version 8.0.13 contains a bug whereby calling "resize" in sync mode fails.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Bump version to 2.1.0~rc1
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Remove testJsonIndent unittest
It can't work on older distributions where simplejson doesn't have indentation support.
QA: improve usability with cluster-init: False
When not initialising the cluster, consider all nodes as already added, so that multi-node tests (e.g. export, replace) work correctly (if there are nodes, of course).
Signed-off-by: Iustin Pop <iustin@google.com>...
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove these, thus simplifying slightly a few calls....
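Here is a sketch of the helper's behaviour after this change. It assumes the helper only maps str() over its argument and joins the results with ", ", as the text describes. The name comma_join is a stand-in for utils.CommaJoin, not its actual source.

```python
def comma_join(names):
    """Join the string form of each item with ", " (no quoting)."""
    # str() is applied to every element, so callers need not convert first.
    return ", ".join(str(name) for name in names)
```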
Re-add “nic.bridges” field to RAPI bulk instance list
Commit 495cfdf0 removed “nic.bridges” from the default list for bulk instance list RAPI requests.
build-bash-completion: Check for None before comparing
Comparing a number with None is not a good idea:
>>> (0 < None, 0 > None)
(False, True)
This patch also adds build-bash-completion to the list of checked Python scripts and wraps one line of more...
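In Python 2, None sorts below every number, so an unset bound silently changes the result. Python 3 raises TypeError instead. The fix is to test for None explicitly before comparing. A minimal sketch follows, with a hypothetical function name.

```python
def in_range(value, lower=None, upper=None):
    """Return True if value lies within the optional [lower, upper] bounds.

    A bound of None means "unbounded". It is tested explicitly instead of
    being compared with value, which gives wrong answers in Python 2 and
    raises TypeError in Python 3.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```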
Revert "Get rid of utils.CommaJoin"
This reverts commit 6915bc28fe053e92aa16cf2d974d205f1140219c based on a thread on ganeti-devel.
Conflicts:
lib/cmdlib.py (due to the error code classification, trivial)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add check for OpenSSL entropy status
By checking for this explicitly, the errors (SSLEAY_RAND_BYTES, “PRNG not seeded”) will happen in the start-up phase of the daemon and not only when executing remote procedure calls.
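A start-up check of this kind can be sketched with the Python standard library's ssl.RAND_status(). That function returns True once OpenSSL's generator has been seeded with enough entropy. This is only an illustration: the daemon may use a different OpenSSL binding, and check_entropy is a hypothetical name.

```python
import ssl


def check_entropy():
    """Fail early, at daemon start-up, if OpenSSL's PRNG is not seeded."""
    # ssl.RAND_status() is True when the generator has sufficient entropy.
    if not ssl.RAND_status():
        raise RuntimeError("OpenSSL random number generator is not seeded")
```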
A couple of doc updates
Clarify the fact that temporary HV/BE params in instance start override and do not extend the configured parameters; and change the instance list headers from HVM_* to * since many of the parameters apply to KVM too. Also fix a typo in the rapi documentation for '/2/nodes'....
Handle EEXIST in utils.RenameFile
This should fix an issue I've seen exactly once during testing. It might have been caused by parallel RPC calls to archive jobs.
[…] ganeti-noded:112 ERROR Error in RPC call […] File "/usr/lib/python2.4/site-packages/ganeti/backend.py", line 2365, in JobQueueRename...
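The sketch below shows the likely shape of such a fix. It assumes the function can create a missing target directory on demand. Two parallel callers can then race to create the same directory, and an EEXIST error only means the other caller won. The signature is hypothetical, not the one in utils.

```python
import errno
import os


def rename_file(old, new, mkdir=False, mkdir_mode=0o750):
    """Rename old to new, optionally creating new's parent directory."""
    try:
        return os.rename(old, new)
    except OSError as err:
        # Only a missing target directory is handled, and only on request.
        if not mkdir or err.errno != errno.ENOENT:
            raise
    try:
        os.makedirs(os.path.dirname(new), mkdir_mode)
    except OSError as err:
        # A concurrent caller may have created the directory meanwhile.
        if err.errno != errno.EEXIST:
            raise
    return os.rename(old, new)
```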
Remove unused parameter “unlock” from cmdlib._WaitForSync
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix off-by-one error when modifying instance NIC
For an instance with exactly one NIC:
$ gnt-instance modify --net 1:ip=1.2.3.4 inst1
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Invalid NIC index 1, valid values are 0 to 1...
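For an instance with n NICs the valid indices are 0 to n-1. The output above reports "0 to 1" for a single-NIC instance. The sketch below validates and reports the same range. The function name is hypothetical.

```python
def check_nic_index(idx, nic_count):
    """Raise ValueError unless 0 <= idx < nic_count."""
    if nic_count == 0:
        raise ValueError("Instance has no NICs")
    if not 0 <= idx < nic_count:
        # The highest valid index is nic_count - 1, not nic_count.
        raise ValueError("Invalid NIC index %d, valid values are 0 to %d"
                         % (idx, nic_count - 1))
```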
Re-add check for duplicate instance IP
This was originally implemented in 0ce8f948 and partially rolled back in 9b65e0d4. Apart from re-adding the check, this patch does some housekeeping by renaming the “_helper” function to “_AddIpAddress”.
Fix gnt-instance list documentation
(1) Both the man page and the online help report the link and mode fields, which in the code are called nic_link and nic_mode.
(2) Add missing fields to the online help.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
config: Style fixes
Add packaging notes to documentation
This includes a few paragraphs about daemon-util.
Fix epydoc error
sphinx: Treat warnings as errors
This makes it easier to catch warnings.
Include INSTALL in documentation
Convert INSTALL to RST
This is in preparation for including it in the larger documentation.
Fix change of cluster nic parameters
To stay on the safe side, we check all instances for errors and, if there are any problems, refuse to act and report the errors we found.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix mispopulation of nic parameters at nic modify
There's a bug in Ganeti 2.1 rc0 that causes NIC parameters to be populated from the "filled in" dict, even if we're not changing any values in them. This patch fixes the problem by populating them from the correct...
NIC.CheckParameterSyntax: fix bridged check
We should compare the strings for equality with "==", not check with "is" whether they point to the same memory location; otherwise we skip the actual check.
Signed-off-by: Guido Trotter <ultrotter@google.com>
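The problem is that `is` tests object identity. A string built at run time can equal a constant without being the same object, so an identity check can silently fail. A short illustration follows; the constant and function names are hypothetical.

```python
NIC_MODE_BRIDGED = "bridged"


def is_bridged(mode):
    # Compare string values; "is" would only compare object identity.
    return mode == NIC_MODE_BRIDGED


# A string built at run time is equal to, but usually not the same object
# as, the constant defined above.
mode = "".join(["brid", "ged"])
```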
Bump version to 2.1.0~rc0
Also add one item to NEWS.
Update RAPI documentation on job results
This documents the new error classifier added for OpPrereqError.
Revert "Backport AC_PATH_PROGS_FEATURE_CHECK"
This reverts commit 52b699ecaa688a2aaac00fa64558e249d0bc9a26.
Fix and simplify socat escape detection
- Program paths should not be --with-… options (see Autoconf docs)
- Simplify checks for escape functionality
- Make SOCAT_USE_ESCAPE variable a bool
Use “daemon-util” to reload SSH keys
KVMHypervisor: fix broken error format string
ConfigWriter: move _temporary_ids to reservation
In order to do this we need to pass a job id when reserving a resource. We have one during _EnsureUUIDs because we passed it in from AddNode and AddInstance. During config upgrade we use a fake job ID which we then...
ConfigWriter: move _temporary_macs to reservation
This solves the race conditions in MAC reservation, as MACs are now actually reserved under the current ec id.
ConfigWriter: simplify GenerateDRBDSecret
We can do this by adding a new TemporaryReservationManager
TemporaryReservationManager
Add errors.ReservationError
Remove exceptions list from GenerateUniqueID
It's not used anywhere, so it's dead code.
Processor: support a unique execution id
When the processor is executing a job, it can export the execution id to its callers. This is not supported for Queries, as they're not executed in a job.
Add config.DropECReservations
For now this function does nothing, but it gets called by mcpu when the execution of an LU is done, making sure any pending reservations are dropped.
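The reservation scheme in these commits can be pictured as a small manager. It records reserved values per execution context (ec) id and forgets them when that context finishes. The sketch below assumes a single-process, lock-protected design. Its method names only loosely mirror the commit subjects, and it is not Ganeti's implementation.

```python
import threading


class ReservationError(Exception):
    """Raised when a resource is already reserved or cannot be generated."""


class TemporaryReservationManager(object):
    """Track resources reserved by running jobs, keyed by ec id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ec_reserved = {}  # ec_id -> set of reserved resources

    def _all_reserved(self):
        # Caller must hold self._lock.
        reserved = set()
        for resources in self._ec_reserved.values():
            reserved.update(resources)
        return reserved

    def reserve(self, ec_id, resource):
        with self._lock:
            if resource in self._all_reserved():
                raise ReservationError("Duplicate reservation for %r"
                                       % (resource,))
            self._ec_reserved.setdefault(ec_id, set()).add(resource)

    def drop_ec_reservations(self, ec_id):
        # Called once the LU for this execution context has finished.
        with self._lock:
            self._ec_reserved.pop(ec_id, None)

    def generate(self, existing, generate_one, ec_id, retries=64):
        """Return a new value not in existing or reserved, and reserve it."""
        with self._lock:
            taken = set(existing) | self._all_reserved()
            for _ in range(retries):
                candidate = generate_one()
                if candidate not in taken:
                    self._ec_reserved.setdefault(ec_id, set()).add(candidate)
                    return candidate
        raise ReservationError("Unable to generate a unique value")
```

Reserving a value under the current job's ec id is what closes the race. Two jobs generating MACs concurrently can no longer both pick the same value before either one is written to the configuration.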
config.Add{Node,Instance}: get the ec id
This is ok because adding a node or instance cannot happen in a query.
We get the ec id from the LU and pass it to _EnsureUUID, which for now will not use it.
Fix pylint 'E' (error) codes
This patch adds some silences and tweaks the code slightly so that “pylint --rcfile pylintrc -e ganeti” doesn't give any errors.
The biggest change is in jqueue.py, the move of _RequireOpenQueue out of the JobQueue class. Since that is actually a function and not a method...
A few more small documentation updates
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Remove obsolete statement in autogen.sh
Nowadays we have actual files (tracked by VCS) in autotools/, so we know the directory exists.
Add use_localtime parameter for xen-hvm and kvm
Currently xen-hvm and kvm use different real-time clock settings by default. To reduce confusion, this patch adds an optional use_localtime parameter.
If the real time clock on the instance is set to local time, the...
Introduce 'global hypervisor parameters' support
This patch adds support for global hypervisor parameters in instance creation, instance modification, instance query and at instance load time.
We basically prevent any query on these parameters, discard them at load...
Remove the KVM_MIGRATION_PORT configure.ac param
Since this is easily configurable at run-time, we remove the configure-time parameter. If anyone is building custom packages, then the default can be tweaked by a one-line patch to constants.py.
Note that this also fixes the type of parameter, the default from...
Documentation updates for the global hvparams
This patch does multiple documentation updates for the new framework, all pretty straightforward.
Fix the init script
The rewrite after the introduction of the daemon-util script has a copy-paste error.
gnt-*: Print better error message for uninitialized cluster
Cache JSON encoders and sort keys
The sort_keys argument is supported since simplejson 1.3.
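Creating the encoders once and reusing them avoids rebuilding them on every call, and sort_keys=True makes the output deterministic. The sketch below uses the standard library json module, which accepts the same sort_keys and indent arguments as simplejson. The function name is hypothetical.

```python
import json

# Built once at import time and reused for every serialization.
_JSON_ENCODER = json.JSONEncoder(sort_keys=True)
_JSON_INDENT_ENCODER = json.JSONEncoder(sort_keys=True, indent=2)


def dump_json(data, indent=False):
    """Serialize data with keys sorted; optionally pretty-print."""
    encoder = _JSON_INDENT_ENCODER if indent else _JSON_ENCODER
    return encoder.encode(data)
```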
Add new “daemon-util” script to start/stop Ganeti daemons
Until now, Ganeti started and stopped its own daemons using custom functions: a daemon was simply executed to start it and later sent the appropriate signals to stop it. Init scripts would have to pay attention to the PID file and...
configure: check for socat and its escape feature
Currently we use a static value for the socat path, or we trust the user-provided one. With this patch we still trust any user-provided value, but if none is passed we check for socat on the machine we're...
kvm console: use socat raw mode with escape
If this is enabled at configure time, we pass in different parameters to the socat console, making it a lot more manageable.
Backport AC_PATH_PROGS_FEATURE_CHECK
In order to allow working with older versions of autoconf we backport this macro, but only if it's not defined already (by autoconf itself).
This commit can be reverted after we decide support for autoconf 2.61 and below should be deprecated....
Migration: add check for listening target
This patch adds a check for listening on the remote port in Xen and KVM migrations. This will generate a single “load of migration failed” message for KVM, but will otherwise not prevent the migration. For Xen (which...
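Such a check amounts to attempting a TCP connection to the migration port on the target node. Below is a minimal sketch using the standard socket module. The function name, the default timeout and the use of a plain connect are assumptions, not a description of how the hypervisor code actually probes the port.

```python
import socket


def is_port_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (OSError, socket.timeout):
        # Refused, unreachable or timed out: nothing is listening for us.
        return False
    conn.close()
    return True
```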
TLMigrateInstance: add error messages during Exec
Currently the migration of an instance doesn't show any error until the end. We add two messages that better show the progress:
node1# gnt-instance migrate -f instance5
Wed Nov 4 04:04:34 2009 Migrating instance instance5...
hypervisors: switch to using HV_MIGRATION_PORT
This changes KVM to use HV_MIGRATION_PORT instead of KVM_MIGRATION_PORT and enables passing the port for Xen migrations.
Since KVM_MIGRATION_PORT is not used anymore, we stop exporting it from constants.py....
Introduce HV_MIGRATION_PORT hypervisor parameter
This parameter will replace the direct use of KVM_MIGRATION_PORT and the implicit use of the Xen migration port.
While it doesn't make sense to change this at instance level, we don't have any other infrastructure for cluster-wide hypervisor parameters, so...
hypervisors: change MigrateInstance API
Currently the $hypervisor.MigrateInstance takes the instance name. This patch changes it to take the instance object, such that other instance properties (especially hvparams) are available to it.
Revert the instance IP conflicts
Since instances can live in different VLANs from nodes (especially in routed mode), based on the 'link' parameter, we shouldn't always restrict having duplicate IPs. Thus we only check the node IPs/cluster IP for now.
Update gitignore rules
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Introduce a wrapper for hostname resolving
Currently a few of the LUs' CheckPrereq methods use utils.HostInfo, which raises a resolver error in case of failure. This deviates from the convention that CheckPrereq should raise an OpPrereqError if the error is in the 'pre' phase (so that it can be retried)....
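Here is a sketch of such a wrapper. It assumes a resolver call that raises socket.gaierror on failure. OpPrereqError here is a local stand-in for the Ganeti exception, and both the function name and the "resolver_error" code string are hypothetical.

```python
import socket


class OpPrereqError(Exception):
    """Local stand-in for ganeti.errors.OpPrereqError."""


def resolve_for_prereq(hostname):
    """Resolve hostname, turning resolver failures into OpPrereqError."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror as err:
        # Reported as a prerequisite failure so the operation can be retried.
        raise OpPrereqError("Cannot resolve %s: %s" % (hostname, err),
                            "resolver_error")
```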
Add a configuration verify check for duplicate IPs
This patch adds a check that the cluster IP, the nodes' primary (and secondary, if enabled) IPs and the instances' NIC IPs are unique in the cluster.
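Here is a sketch of the uniqueness check. It assumes the addresses have already been collected into (owner, ip) pairs: the cluster IP, each node's primary and (if enabled) secondary IP, and each instance NIC IP. The function name and input shape are hypothetical.

```python
def find_duplicate_ips(owners):
    """Return {ip: [owner, ...]} for every IP used by more than one owner.

    NICs without an address (ip is None) are skipped.
    """
    seen = {}
    for owner, ip in owners:
        if ip is None:
            continue
        seen.setdefault(ip, []).append(owner)
    return {ip: names for ip, names in seen.items() if len(names) > 1}
```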
Workaround fake failures in drbd+live migration
This patch is an attempt to fix the ugly issue during migration: Cannot resync disks on node …: [True, 100]
If my understanding is correct, sometimes we poll the /proc/drbd file at an inopportune moment, while it's being updated, or while the DRBD device...
Another round of pylint-related style fixes
A newer version of pylint, more warnings…
Revert "configure: check for socat and its escape feature"
This reverts commit 37fc2cf5ba8919cef407199ee540aad4b1a9a2b6, since it introduces configure.ac changes that depend on very very new autoconf macros that are not present in current stable distros (and it was not...
Revert "kvm console: use socat raw mode with escape"
This reverts commit ce0eb6694e3fb2510035501539c7acc92a0f174e, since it depends on 37fc2cf5ba8919cef407199ee540aad4b1a9a2b6 which will be reverted too.
Add an example script for backing up the config
This requires git and lockfile-progs, and only backs up config.data (see the comments for why).
Change behaviour of ConfigWriter._WriteConfig
This patch changes the behaviour of _WriteConfig in case of configuration errors:
- before, it used to abort the saving (even though the in-memory configuration used by current jobs had already changed)
- now, we log it (both to the log and to the user) but continue, since...
Fix version number in README
utils: Convert to utils.Retry
Throw specific error when ':' exists in PV names
While ':' is not actually a supported character in PV names (it has a special meaning for commands like lvcreate), we should throw specific errors for this case instead of the generic “Can't create LV”.
This patch does two things:...
Change bdev.LogicalVolume.GetPVInfo usage
We will need to selectively enumerate the PVs of (possibly) many VGs and not only the allocatable ones. For this we make the VG selection and the allocatable filtering optional. The two callers are modified for this...
Implement cluster verify checks for wrong PV names
Since ':' is not a valid character in PV names (for the way Ganeti uses LVM), we need to check this and warn the user. This patch adds a new NV_PVLIST cluster verify check and verifies the PV names returned from...
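The name check itself is simple. The sketch below assumes the PV names have already been collected from the node; the function name is hypothetical. In lvcreate's argument syntax, ':' separates a PV path from a physical-extent range. A name containing ':' is therefore ambiguous on the command line.

```python
def find_bad_pv_names(pv_names):
    """Return the PV names containing ':', which LVM tools treat specially."""
    return [name for name in pv_names if ":" in name]
```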
jqueue: Convert to utils.Retry
hv_xen: Convert to utils.Retry
bootstrap: Convert to utils.Retry
bdev: Convert to utils.Retry
Also replaces a hardcoded limit of 15 seconds with 1/4 of NET_RECONFIG_TIMEOUT.
backend: Convert to utils.Retry
Add generic retry loop function
There are quite a few retry loops with timeouts in Ganeti's code. Duplicating code is not good, so this patch introduces a new function named “utils.Retry” to remedy this situation.
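Here is a minimal sketch of such a helper. The retried callable signals "not yet" by raising an exception, and the loop sleeps between attempts until a deadline passes. The names retry, RetryAgain and RetryTimeout are assumptions for illustration. They are not guaranteed to match the signature of utils.Retry.

```python
import time


class RetryAgain(Exception):
    """Raised by the retried function to request another attempt."""


class RetryTimeout(Exception):
    """Raised when the function did not succeed before the timeout."""


def retry(fn, delay, timeout, wait_fn=time.sleep, time_fn=time.monotonic):
    """Call fn until it returns without raising RetryAgain.

    Waits delay seconds between attempts and gives up after timeout seconds.
    """
    deadline = time_fn() + timeout
    while True:
        try:
            return fn()
        except RetryAgain:
            pass
        remaining = deadline - time_fn()
        if remaining <= 0:
            raise RetryTimeout()
        # Never sleep past the deadline.
        wait_fn(min(delay, remaining))
```

Making the wait and clock functions parameters lets unit tests substitute fakes instead of actually sleeping.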
Ignore log messages in unittests
Some improvements to gnt-node repair-storage
Currently the repair storage has two issues:
- down instances are aborting the operation, even though they should be ignored (it's not technically possible to know their disk status unless we would activate their disks)...
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argument style. Any remaining one-argument uses do not break anything; they just lose information about the errors.
Add ecode to rpc.py's RpcResult.Raise()
This patch adds a new ecode argument to RpcResult.Raise(). This allows specifying the error code (for both OpExec and OpPrereq errors).
Note that this patch also makes the OpExecError exceptions raised from _FindFaultInstanceDisks have the error code classification....
Introduce two-argument style for OpPrereqError
This patch introduces a two-argument style for OpPrereqError. Only the direct raise calls in cmdlib.py are converted; other users will follow.
cli.py is modified to handle both two-argument style and the current...
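The two-argument style, and client code that accepts both styles, can be sketched as follows. OpPrereqError is a local stand-in, and the ECODE_INVAL name and format_error helper are hypothetical. The "wrong_input" value is the error type shown in the gnt-instance output elsewhere in this log.

```python
class OpPrereqError(Exception):
    """Stand-in: args are (message,) or (message, error_code)."""


ECODE_INVAL = "wrong_input"  # hypothetical name for this error code


def format_error(err):
    """Render an OpPrereqError raised in either argument style."""
    header = "Failure: prerequisites not met for this operation:\n"
    if len(err.args) == 2:
        msg, ecode = err.args
        return header + "error type: %s, error details:\n%s" % (ecode, msg)
    # Old one-argument style: no error classification is available.
    return header + str(err.args[0])
```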
Remove the OpRetryError exception
This is only used in two places, in an error path that is no longer valid since Ganeti 2.0. We remove the try..except since we should not get it anymore (and if we do, then we should catch it in all config.Update cases) and we remove the exception class completely....
Activate disks while exporting an instance
Exporting an instance that is not running or has no activated disks will fail. This patch makes sure to activate disks before exporting an instance if it's in the ADMIN_down state.
Epydoc fixes
backend: Don't overwrite function parameter with loop variable
Add QA test for “gnt-node {list,modify,repair}-storage”
Unify the query fields for the storage framework
This patch unifies the query fields in the storage framework for all types. Note that the information is still computed on-demand, so if e.g. the used disk space is not requested for the ‘file’ type, it won't be...
Make cluster initialization more reliable
There was a race condition between starting the node daemon and sending requests to write the ssconf files. With this patch, the initialization waits up to ten seconds for the node daemon to become responsive.
Don't show warnings on ADMIN_down instance failover
Before:
$ gnt-instance failover -f inst1
… checking disk consistency between source and target
… - WARNING: Can't find disk on node node21.example.com
… shutting down instance on source node
After:
$ gnt-instance failover -f inst1...
Update NEWS
Add rapi_users changes, rearrange a bit and one wording change.
Add remote API users and passwords documentation