Introduce 'global hypervisor parameters' support
This patch adds support for global hypervisor parameters in instance creation, instance modification, instance query and at instance load time.
We basically prevent any query on these parameters, discard them at load...
Remove the KVM_MIGRATION_PORT configure.ac param
Since this is easily configurable at run-time, we remove the configure-time parameter. If anyone is building custom packages, then the default can be tweaked by a one-line patch to constants.py.
Note that this also fixes the type of parameter, the default from...
gnt-*: Print better error message for uninitialized cluster
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Cache JSON encoders and sort keys
The sort_keys argument is supported since simplejson 1.3.
Add new “daemon-util” script to start/stop Ganeti daemons
Until now, Ganeti started and stopped its own daemons using custom functions. To start, the daemon was just executed and then sent the appropriate signals to stop it again. Init scripts would have to pay attention to the PID file and...
kvm console: use socat raw mode with escape
If this is enabled at configure time, we pass in different parameters to the socat console, making it a lot more manageable.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Migration: add check for listening target
This patch adds a check for listening on the remote port in Xen and KVM migrations. This will generate a single “load of migration failed” message for KVM, but will not otherwise prevent the migration. For Xen (which...
TLMigrateInstance: add error messages during Exec
Currently the migration of an instance doesn't show any error until the end. We add two messages that better show the progress:
node1# gnt-instance migrate -f instance5
Wed Nov 4 04:04:34 2009 Migrating instance instance5...
hypervisors: switch to using HV_MIGRATION_PORT
This changes KVM to use HV_MIGRATION_PORT instead of KVM_MIGRATION_PORT and enables passing the port for Xen migrations.
Since KVM_MIGRATION_PORT is not used anymore, we stop exporting it from constants.py....
Introduce HV_MIGRATION_PORT hypervisor parameter
This parameter will replace the direct use of KVM_MIGRATION_PORT and the implicit use of the Xen migration port.
While it doesn't make sense to change this at instance level, we don't have any other infrastructure for cluster-wide hypervisor parameters, so...
hypervisors: change MigrateInstance API
Currently the $hypervisor.MigrateInstance takes the instance name. This patch changes it to take the instance object, such that other instance properties (especially hvparams) are available to it.
Signed-off-by: Iustin Pop <iustin@google.com>...
Revert the instance IP conflicts
Since instances can live in different VLANs from nodes (especially in routed mode), based on the 'link' parameter, we shouldn't always restrict having duplicate IPs. Thus we only check the node IPs/cluster IP for now.
Introduce a wrapper for hostname resolving
Currently a few of the LU's CheckPrereq use utils.HostInfo which raises a resolver error in case of failure. This is an exception from the standard that CheckPrereq should raise an OpPrereqError if the error is in the 'pre' phase (so that it can be retried)....
Add a configuration verify check for duplicate IPs
This patch adds a check that the cluster IP, the nodes primary (and secondary, if enabled) IP and the instances NIC IPs are unique in the cluster.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
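A minimal sketch of such a uniqueness check, under stated assumptions: the function name, the argument layout and the owner labels are hypothetical and are not taken from the patch.

```python
from collections import defaultdict


def find_duplicate_ips(cluster_ip, node_ips, instance_nic_ips):
    """Return {ip: [owners]} for every IP that has more than one owner."""
    owners = defaultdict(list)
    owners[cluster_ip].append("cluster")
    # node_ips maps node name -> list of primary (and secondary) IPs
    for node, ips in sorted(node_ips.items()):
        for ip in ips:
            owners[ip].append("node %s" % node)
    # instance_nic_ips maps instance name -> list of NIC IPs (None if unset)
    for instance, ips in sorted(instance_nic_ips.items()):
        for ip in ips:
            if ip is not None:
                owners[ip].append("instance %s" % instance)
    return {ip: who for ip, who in owners.items() if len(who) > 1}
```

An empty result means all addresses are unique; otherwise each entry names the conflicting owners so the verify step can report them.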
Workaround fake failures in drbd+live migration
This patch is an attempt to fix the ugly issue during migration: Cannot resync disks on node …: [True, 100]
If my understanding is correct, sometimes we poll the /proc/drbd file at an inopportune moment, while it's being updated, or while the DRBD device...
Another round of pylint-related style fixes
A newer version of pylint, more warnings…
Revert "kvm console: use socat raw mode with escape"
This reverts commit ce0eb6694e3fb2510035501539c7acc92a0f174e, since it depends on 37fc2cf5ba8919cef407199ee540aad4b1a9a2b6 which will be reverted too.
Change behaviour of ConfigWriter._WriteConfig
This patch changes the behaviour of _WriteConfig in case of configuration errors:
- before, it used to abort the saving (even though the in-memory configuration used by current jobs has already changed)
- now, we log it (both to the log and to the user) but continue, since...
utils: Convert to utils.Retry
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Throw specific error when ':' exists in PV names
While ':' is not actually a supported character in PV names (it has a special meaning for commands like lvcreate), we should throw specific errors for this case instead of generic “Can't create LV”.
This patch does two things:...
Change bdev.LogicalVolume.GetPVInfo usage
We will need to enumerate selectively the PVs of (possibly) many VGs and not only the allocatable ones. For this we make the VG selection and the allocatable filtering optional. The two callers are modified for this...
Implement cluster verify checks for wrong PV names
Since ':' is not a valid character in PV names (for the way Ganeti uses LVM), we need to check this and warn the user. This patch adds a new NV_PVLIST cluster verify check and verifies the PV names returned from...
jqueue: Convert to utils.Retry
hv_xen: Convert to utils.Retry
bootstrap: Convert to utils.Retry
bdev: Convert to utils.Retry
Also replaces a hardcoded limit of 15 seconds with 1/4 of NET_RECONFIG_TIMEOUT.
backend: Convert to utils.Retry
Add generic retry loop function
There are quite a few retry loops with timeouts in Ganeti's code. Duplicating code is not good, so this patch introduces a new function named “utils.Retry” to remedy this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
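A minimal sketch of what a generic retry loop with a timeout can look like. The names `retry`, `RetryAgain` and `RetryTimeout` and the argument order are assumptions made for illustration; the actual signature of utils.Retry is not shown here and may differ.

```python
import time


class RetryAgain(Exception):
    """Raised by the callback to request another attempt (hypothetical name)."""


class RetryTimeout(Exception):
    """Raised when the deadline passes without success (hypothetical name)."""


def retry(fn, delay, timeout, args=(), _time=time.time, _sleep=time.sleep):
    """Call fn(*args) until it stops raising RetryAgain or timeout expires."""
    deadline = _time() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        remaining = deadline - _time()
        if remaining <= 0:
            raise RetryTimeout()
        # Never sleep past the deadline
        _sleep(min(delay, remaining))
```

Callers signal "not ready yet" by raising the retry exception, so the loop, the sleeping and the deadline handling live in one place instead of being duplicated per module.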
Some improvements to gnt-node repair-storage
Currently the repair storage has two issues:
- down instances are aborting the operation, even though they should be ignored (it's not technically possible to know their disk status unless we would activate their disks)...
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argument style. Any leftovers as one-argument are not breaking anything, just losing information about the errors.
Add ecode to rpc.py's RpcResult.Raise()
This patch adds a new ecode argument to RpcResult.Raise(). This allows specifying the error code (for both OpExec and OpPrereq errors).
Note that this patch also makes the OpExecError exceptions raised from _FindFaultInstanceDisks have the error code classification....
Introduce two-argument style for OpPrereqError
This patch introduces a two-argument style for OpPrereqError. Only the direct raise calls in cmdlib.py are converted, other users will follow.
cli.py is modified to handle both two-argument style and the current...
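A minimal sketch of how a caller can accept both styles, assuming the second argument is an error code. The class below is a stand-in, and the formatter name, the output wording and the example error code are hypothetical.

```python
class OpPrereqError(Exception):
    """Stand-in for the real exception; args are (message[, error code])."""


def format_prereq_error(err):
    """Render both the two-argument and the older one-argument style."""
    if len(err.args) == 2:
        message, ecode = err.args
        return "Prerequisites not met (error type: %s): %s" % (ecode, message)
    # One-argument style still works, but carries no classification
    return "Prerequisites not met: %s" % (err,)
```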
Remove the OpRetryError exception
This is only used in two places, in an error path that is no longer valid since Ganeti 2.0. We remove the try..except since we should not get it anymore (and if we do, then we should catch it in all config.Update cases) and we remove the exception class completely....
Activate disks while exporting an instance
Exporting an instance not running or without activated disks will fail. This patch makes sure to activate disks before exporting an instance if it's in the ADMIN_down state.
Epydoc fixes
backend: Don't overwrite function parameter with loop variable
Unify the query fields for the storage framework
This patch unifies the query fields in the storage framework for all types. Note that the information is still computed on-demand, so if e.g. the used disk space is not requested for the ‘file’ type, it won't be...
Make cluster initialization more reliable
There was a race condition between starting the node daemon and sending requests to write the ssconf files. With this patch, the initialization waits up to ten seconds for the node daemon to become responsive.
Don't show warnings on ADMIN_down instance failover
Before:
$ gnt-instance failover -f inst1
… checking disk consistency between source and target
… - WARNING: Can't find disk on node node21.example.com
… shutting down instance on source node
After:
$ gnt-instance failover -f inst1...
http.auth: Add new function to verify passwords
This new function supports two schemes for passwords:
- Old-style cleartext passwords
- Hashed passwords according to RFC2617 (H(A1))
Schemes are differentiated by their prefix, a concept also used in OpenLDAP. Cleartext passwords can no longer start...
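A minimal sketch of prefix-based verification. In RFC 2617, H(A1) is the MD5 hex digest of "username:realm:password". The `{HA1}` prefix spelling, the function name and its signature are assumptions for illustration, not taken from the patch.

```python
import hashlib
import hmac

HA1_PREFIX = "{HA1}"  # assumed prefix spelling, not taken from the patch


def verify_password(username, realm, password, stored):
    """Check a password against a stored cleartext or H(A1) entry."""
    if stored.upper().startswith(HA1_PREFIX):
        # RFC 2617: A1 = username ":" realm ":" password, H = MD5 hex digest
        a1 = "%s:%s:%s" % (username, realm, password)
        actual = hashlib.md5(a1.encode("utf-8")).hexdigest()
        expected = stored[len(HA1_PREFIX):].lower()
    else:
        # No recognized prefix: treat the entry as old-style cleartext
        actual, expected = password, stored
    # Constant-time comparison on bytes
    return hmac.compare_digest(actual.encode("utf-8"), expected.encode("utf-8"))
```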
Fix another style issue
For the Nth time, re-fix shadowing of outer-scope variable :)
Fix an error handling case in TLReplaceDisks
pylint is your friend, since the compiler doesn't exist.
Provide feedback from redistributing configuration
This is particularly useful for “gnt-cluster redist-conf”, but also for all other cases where the configuration files are rewritten on other nodes.
$ gnt-cluster redist-conf
… Copy of file /var/lib/ganeti/config.data to node … failed: Error while...
Fix gnt-node evacuate w. iallocator
Commit 2bb5c911 moved around and changed the _RunAllocator function in the DiskReplace → TaskLet conversion, but in the process it changed the relocate_from argument from a list of nodes to just the secondary node. This breaks the protocol and current iallocator scripts....
InstanceIpToNodePrimaryIpQuery: use a query dict
In 95b487b we changed InstanceIpToNodePrimaryIpQuery to be able to query multiple instances at once. We also need to be able to query ips belonging to a specific nic link, so what we do is:
1) Move the "query" argument to a dict, containing different fields...
SimpleConfigReader: ips are partitioned by link
We were already half-doing it, but this completes the process.
1) We don't maintain a list of ips or an ip->instance map
2) We add a new link,ip->instance map (link->ips list we had)
3) We add the link parameter to GetInstanceByIp (making it...
SimpleConfigReader: queries for default nicparams
GetDefaultNicParams returns the default nic parameters.
GetDefaultNicLink returns the default nic link.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Import errors in confd init
It's used by some functions defined there.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Allow '@' in tag values
This allows using an email address (as is) as part of a tag. The main problem that could arise is when parsing tags from a shell script, but (AFAIK) '@' is not a special character when used in values (happy to be corrected if not true)....
cmdlib._AssembleInstanceDisks: Fix case where variable wouldn't be set
The “result” variable may not be set and/or come from the previous loop.
KVM netscript: add static routes, with no suffix
The /32 suffix is useless, since the kernel already assumes single-host, if no suffix is specified. Moreover we prefer these routes to be "static" so that routing daemons, if present, won't mess with them....
Adding '--no-ssh-init' option to 'gnt-cluster init'.
Allows the initialization of a cluster without the creation or distribution of SSH key pairs. Includes changes for LeaveCluster and RPC.
Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
KVMHypervisor: implement instance policy routing
Until now we relied on traffic from instances being policy routed via a rule based on the instance network. With this change we can enforce it on the instance interfaces. Since the ip rules survive interface...
KVMHypervisor: configure v6 parameters on nic
In routing mode we are tweaking a few parameters on the interface. With this patch we'll tweak both the v4 and v6 ones.
confd: query the pnode of multiple instances at once
Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Try to reduce wrong errors in InstanceShutdown
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes translates into error messages like:
Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed...
Revert breakage introduced in e4e9b80
Commit e4e9b8064787df01a79846a40f49c8ae06a8eb0e introduced two problems in backend.InstanceShutdown():
- first, it reduced the check interval significantly (especially for the first few checks); there are very few production VMs that shutdown in...
Xen: Ignore the retry argument in stop instance
Commit 4ad4511 changed the KVM hypervisor to send multiple shutdown requests to the monitor, but it didn't change this for the Xen hypervisor. We simply remove the return on retry model, since we do want to send multiple shutdown signals for both Xen and KVM (even if the...
Ensure RpcResult has “payload” attribute
Also add assertions to avoid missing attributes in the future. They won't be included in optimized bytecode.
Introduce checks for /sys and /proc
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted.
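A minimal sketch of one way to test for these mounts, using os.path.ismount. This is an assumption for illustration: the patch may detect the filesystems differently, and the function name is hypothetical.

```python
import os


def missing_special_filesystems(paths=("/proc", "/sys"),
                                ismount=os.path.ismount):
    """Return the listed paths that are not mount points on this node."""
    return [path for path in paths if not ismount(path)]
```

An empty list means both filesystems are mounted; the `ismount` parameter only exists so the check can be exercised without touching the real system.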
Fix serializer unittests
Commit d22b29997cd broke the serializer unittests with certain versions of simplejson. This patch removes sort_keys again and implements a slightly more efficient way of detecting simplejson functionality. The serializer unittests no longer...
bootstrap: Factorize HMAC key generation
Make bootstrap._GenerateSelfSignedSslCert public
serializer: Sort keys in JSON
mcpu: Use new timeout class for timeout
locking: Convert pipe condition to new timeout class
locking.LockSet: Move timeout calculation to separate class
This class can also be used by mcpu.
locking, mcpu: Ensure timeout is always >= 0.0
locking.LockSet: Improve assertions
locking: Factorize LockSet.acquire
By moving the main code of LockSet.acquire to its own function we reduce the code complexity a bit and clarify the exception handling.
This also fixes a case where a lock acquire timeout wasn't handled correctly, leading to obscure error messages....
mcpu: Make sure added locks are released on errors
opcodes: Add missing shutdown_timeout to OpRemoveInstance
luxi: Pass socket path directly to exception, not in tuple
gnt-* use the correct opcode slot to build opcodes
gnt-* scripts were building wrong opcodes for commands which had the shutdown_timeout slot (due to missing testing after renaming). Fixing.
Also change SHUTDOWN_TIMEOUT_OPT dest field name to "shutdown_timeout":...
rapi: fix tag operations
This patch fixes the tag PUT/DELETE operations, and additionally changes the Tags* functions to take only positional and not keyword arguments (the defaults do not make any sense at all, and they are always called with all arguments)....
cli: add SHUTDOWN_TIMEOUT_OPT
Add timeout options to other LUs
All the LUs that shut down the instance need to be able to pass the timeout parameter as well.
mcpu: Change lock attempt timeout calculation
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance.
Code and docstring style fixes
Found using pylint and epydoc.
mcpu: Improve lock reporting with timeouts
mcpu: Implement lock timeouts
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made.
Lock status reporting will be improved in a separate patch....
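A minimal sketch of a timeout schedule matching the description above: ten bounded attempts with ±5% variation, then a blocking acquire. The doubling growth rule and the function name are assumptions; the patch may calculate the individual values differently.

```python
import random


def lock_attempt_timeouts(attempts=10, minimum=0.1, maximum=10.0,
                          jitter=0.05, rand=random.random):
    """Return per-attempt timeouts, ending with None for a blocking acquire."""
    timeouts = []
    base = minimum
    for _ in range(attempts):
        clamped = max(minimum, min(base, maximum))
        # Spread jobs apart by up to +/- jitter (5% by default)
        variation = clamped * jitter * (2.0 * rand() - 1.0)
        timeouts.append(clamped + variation)
        base *= 2.0  # assumed growth rule; the patch may use another one
    timeouts.append(None)  # final attempt blocks until the locks are free
    return timeouts
```

The random variation keeps two jobs that started together from retrying in lockstep, which is the "fighting" the message refers to.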
mcpu: Remove unused exclusive_BGL attribute
locking.LockSet: Implement acquire timeouts
The timeout passed to LockSet.acquire() is measured over all lock acquires. If LockSet.acquire fails to acquire all requested locks within the specified amount of time, all locks are released again and the acquire fails....
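A minimal sketch of the "one timeout over all acquires" idea, using plain threading.Lock objects rather than Ganeti's own lock classes. The function name is hypothetical and this is not the LockSet implementation.

```python
import threading
import time


def acquire_all(locks, timeout):
    """Acquire every lock within one shared timeout, or release and fail."""
    deadline = time.monotonic() + timeout
    held = []
    for lock in locks:
        remaining = deadline - time.monotonic()
        if remaining > 0:
            acquired = lock.acquire(timeout=remaining)
        else:
            # Budget already used up: make one non-blocking attempt
            acquired = lock.acquire(blocking=False)
        if not acquired:
            # Out of time: undo the partial acquisition
            for owned in reversed(held):
                owned.release()
            return False
        held.append(lock)
    return True
```

Each individual acquire only gets whatever is left of the overall budget, so the total wait never exceeds the timeout the caller asked for.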
Accept shutdown timeout from the user
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than...
ChrootManager: clean StopInstance
Currently it has lots of duplicated code, and internal retries. Clean it up with the following assumptions:
We'll probably be called more than once.
It is ok to fail to stop, unless we're called with force=True.
If we're called only once, and with force=True it's ok not to run the...
cli: add a timeout option
KVMHypervisor: use the StopInstance retry feature
Since we know StopInstance is going to be called more than once (at least twice, once with force and once without, but normally quite a lot more) we don't need our own sleep/loop, and we can just send one monitor...
backend.InstanceShutdown: small cleanup
1) unhardcode the timeout, abstracting it in a constant
2) Use time.time() rather than hiding the timeout in a range()
3) call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but once...
Add default instance shutdown timeout constant
It reflects the "current" two minutes we give to the instance.
Hypervisors: Add retry= to StopInstance
Currently some hypervisors need the stop operations to be retried more than once, while other ones only do it in one pass. With this change we'll handle retries outside the hypervisor code, but telling whether this is the first try or not....
Get rid of utils.CommaJoin
- We never remember to use it (5 uses vs 21 " ,".join())
- It's longer to write than " ,".join()
- The added value of the apostrophe in the string is not very much
Match instance and node names case insensitively
Since DNS cannot contain two names with different cases anyway, this should be ok.
Add case_sensitive keyword to MatchNameComponent
Now featuring unit testing, and more deterministic results on some corner cases.
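A minimal sketch of name matching with a case_sensitive switch. The matching rule used here (an exact match, or the key as the leading dot-separated component, with ambiguous matches rejected) and the function name are assumptions for illustration; the real MatchNameComponent may behave differently in corner cases.

```python
import re


def match_name_component(key, names, case_sensitive=True):
    """Return the single name matching key exactly or as a leading component."""
    if key in names:
        return key
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(r"^%s(\..*)?$" % re.escape(key), flags)
    matches = [name for name in names if pattern.match(name)]
    if len(matches) == 1:
        return matches[0]
    # No match, or more than one candidate: refuse to guess
    return None
```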
VNC password: move to hv param and use in kvm
Check the OS name for variants
If an OS supports variants, unless --force-variant is specified a valid variant must be passed.
Add force_variant slot to Create/ReinstallInstance
These two opcodes need to know whether an unknown variant must be forced through or not.
Allow --force-variant for instance add/reinstall
Passing this option makes an undeclared variant be passed to the os "as is", hoping it'll be able to figure it out (as per the design doc).
Update client os lists to name+variant format
List of OSes are displayed by gnt-os list, rapi, and gnt-instance reinstall --select-os, and checked by burnin. In all of these show the list with name+variant, if the os has variants.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
cli.CalculateOSNames
Given an os and its variants, return a list of "full" os names.
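A minimal sketch of building such "full" names, assuming the name+variant format means the two parts joined by a literal '+'. The function name, the example OS names and the behaviour for an OS without variants are assumptions.

```python
def calculate_os_names(os_name, os_variants):
    """Return full OS names, one per variant, or just the bare name."""
    if os_variants:
        return ["%s+%s" % (os_name, variant) for variant in os_variants]
    # An OS without variants is listed under its plain name
    return [os_name]
```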