History | View | Annotate | Download (327.4 kB)
Instance creation: implement --no-install mode
This is a simple patch that adds the no-install mode for instancecreation, allowing import from foreign source of the actual OS (insteadof requiring the preparation of data in a form expected by the import...
Allow OS changes without reinstallation
This patch modifies LUSetInstanceParms to allow OS name changes, withoutreinstallation, in case an OS gets renamed on-disk.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
cmdlib: Abstract OS checks
This patch moves the node-has-os checks to a separate function.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix behaviour of gnt-node modify -C no
The current check on whether we require auto_promote or not is wrong, aswe check whether we will have exactly the correct number of mastercandidates left. But it is fine if we have more (e.g. when CPS=10 andmc_remaning=19) than the current number, and in that case we shouldn't...
Rightname confd's HMAC key
Currently, the ganeti-confd's HMAC key is called “cluster HMAC key” orsimply “HMAC key” everywhere. With the implementation of inter-clusterinstance moves, another HMAC key will be introduced for signing criticaldata. They can not be the same, so this patch clarifies the purpose of the...
Implement conversion from drbd to plain
This is much simpler than the opposite, with fewer possibilities offailures.
Implement conversion from plain to drbd
This patch adds a new mode to instance modify, the changing of the disktemplate. For now only plain to drbd conversion is supported, and thenew secondary node must be specified manually (no iallocator support).
The procedure for conversion works as follows:...
Abstract check that an instance is down
Multiple LUs require that an instance is not running while they operateon the instance (reinstall, rename, modify, recreate disks, deactivatedisks). The code to do this check is duplicate many times, and not very...
Abstract node free disk space check
Both create instance and grow disk check the free disk space on nodesusing the same, duplicate code. Since we'll need this in other places inthe future, we abstract the check into a new function.
The patch adjusts the error message to be more in-line with the one for...
Abstract disk template verification
This is a simple check, but we'll need it in multiple places.
LUCreateInstance: implement disk adoption mode
This new mode, valid only for the plain template disk, allows creationof an instance based on existing logical volumes (preserving data),rather than creation of new volumes and OS creation.
The new mode works as follows:...
LUCreateInstance: Move parameter init earlier
This way, the parameters are available in CheckArguments too.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Verify cluster certificates in LUVerifyCluster
When using pyOpenSSL 0.7 or above, LUClusterVerify will start to show awarning 30 days before a certificate expires. 7 days before thecertificate expires, the warning becomes an error. Once expired,LUVerifyCluster will always report an error. The latter is also supported...
Add constant with cluster X509 certificates
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Merge branch 'stable-2.1' into devel-2.1
Improve cluster verify with hypervisor errors
In case the hypervisor has issues on one node, currentlybackend.VerifyNode will exit via an exception (two exit paths possible,one via HypervisorError from hypervisor.Verify(), and one via RPCFailfrom GetInstanceList). This is bad as it invalidates all other checks of...
Fix cluster verify with simulate-errors
In simulate errors mode, the test "ntime_diff is not None" will beignored, and thus a None value will try to be formatted as %.01f. Weworkaround this by formatting it before, and then only using %s, whichcan format a 'None' value....
Validate the os-specific hypervisor parameters
This adds a validation similar to the one for cluster-wide hypervisorparamters.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Rework the node modify for mc-demotion
The current code in LUSetNodeParms regarding the demotion from mastercandidate role is complicated and duplicates the code in ConfigWriter,where such decisions should be made. Furthermore, we still cannot demotenodes (not even with force), if other regular nodes exist....
Fix typo that makes cluster verify to ignore hooks
The return from LUVerifyCluster should be True (or equivalent) for pass,and False (or equivalent) for fail. The HooksCallBack function uses '1'(= True) when a hook fails, which is exactly the opposite of what we...
Fix redistribute config and offline nodes
We need to manually filter out offline nodes before usingrpc.call_upload_file and rpc.call_write_ssconf_files, since these methodare static (they work without a ConfigWriter instance) and thus do notknow which nodes are offline and which are not)....
Add support for per-os-hypervisor parameters
This patch implements all modifications to support per-os-hypervisorparameters in the framework.
Signed-off-by: René Nussbaumer <rn@google.com>Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Validate the hostnames at creation time
This patch adds validation of new names used, i.e. at cluster init time,node add time, and instance creation.
For instances, especially when using «--no-name-check» (which skips DNSchecks), we should validate the give name, and also normalize it...
Implement disabling of file-based storage
Rationale: the file-based storage backend can add/remove files under acertain directory. However, the master node is also controlling thesetting of the file-based root directory, so basically it means we can'tprevent arbitrary modifications by the master of the node's filesystem....
Switch from os.path.join to utils.PathJoin
This passes a full burnin with lots of instances, and should be safe aswe mostly to join a known root (various constants) to a run-timevariable.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fix bug in LUQueryConfigValues
LUQueryConfigValues supports multiple output fields. If the client askedfor the watcher pause status, it would not get a list, but simply thevalue.
Fix typo in LUVerifyCluster when checking node time
The first argument to _ErrorIf should always be True in this case.
Add LUNodeEvacuationStrategy
Implement support for mevac in OpTestAllocator
Implement IAllocator multi-evacuate mode
This is a new mode that request a solution for the evacuation ofmultiple nodes. The external script will be fed a list of names, and isexpected to return a list of [instance, new_node(s)] lists, detailingthe evacuation path of each instance....
Accept both 'nodes' and 'result from iallocator
This patch switches the default result key from 'nodes' to 'result'. Theold name is still accepted for backwards-compatiblity, and should beremoved in later versions.
Signed-off-by: Iustin Pop <iustin@google.com>...
Change internal API for the IAllocator class
Currently the 'name' parameter in the constructor is required (as anon-keyword argument). Since the (to follow) node evac IAllocator modedoesn't have 'name' as a valid argument, we're moving this one into the...
Remove redundant code in IAllocator class
This moves the setting of the request member on the in_data, of therequest type, and of the branching basef on request type outside ofindividual functions and directly into the constructor.
Since the values we're using externally are identical to the...
Simplify a bit _GetWantedNodes
This should have been done in the _ExpandNodeName patch.
Fix a wrong docstring
There's no such thing as OpProgrammerError (I found this as I wrote itin code in another place, and pylint complained).
Remove boiler-plate code about node/instance names
Currently we have lots of duplication of the error-checking (and properexception raising) around node/instance name expansion. LUCreateInstanceis the only place where we have abstracted this.
This patch creates two functions (ExpandNodeName and ExpandInstanceName)...
Release all node locks during disk replace
This patch extends commit 7ea7bcf by releasing all node locks in diskreplace for the early release mode. The rationale behind this is:
- LUCreateInstance already releases all node locks while waiting for disk synchronization, and does an instance startup later...
Auto-enable early release for offline old nodes
In case the old node is offline, we won't be able to talk to it toremove the storage, and in most cases the node is poweredoff/unreachable.
In this case, it makes no sense to delay the storage release, so we...
Run instance hooks on more nodes
This should fix issue 68: some hooks should be run on more nodes thancurrently. GrowDisk runs on both nodes, remove run the post hook on theinstance's nodes, and failover and migrate run the post hook on thesource node too....
Add {NEW,OLD}_{PRIMARY,SECONDARY} vars to hooks
Per issue 71, the migrate and failover need special variables forkeeping the nodes consistent during instance migrations.
Pass debug mode to noded for OS-related calls
Add a generic 'debug_level' attribute to opcodes
Also automatically fix opcodes which have this missing in the LU initroutine.
Add an early release lock/storage for disk replace
This patch adds an early_release parameter in the OpReplaceDisks andOpEvacuateNode opcodes, allowing earlier release of storage and moreimportantly of internal Ganeti locks.
The behaviour of the early release is that any locks and storage on all...
TLReplaceDisks: Delay iallocator when evacuating node
When evacuating nodes, the iallocator was run for allinstances without taking planned changes into consideration.This patch delays part of CheckPrereq and running theiallocator for node evacuation....
Implement debug level across OS-related RPC calls
This doesn't implement the full functionality, we need to add the debuglevel to the opcodes too, but at least won't require changing the RPCcalls during the 2.1 series.
Second try to fix LUVerifyCluster
My previous patch, commit 785d142, fixed the case where a node is markedoffline. With this patch it'll also handle other failures correctly.
LUVerifyCluster: Fix bug with offline nodes
[…] * Other Notes - NOTICE: 1 offline node(s) found. * Hooks ResultsFailure: command execution error:iteration over non-sequence
Commit a0c9776a introduced an error simulation mode to LUVerifyCluster.Due to a small mistake, offline nodes weren't skipped when checking the...
Merge remote branch 'origin/stable-2.1' into devel-2.1
Fix flipping MC flag bug
Currently unofflining or undraining an already functional mastercandidate node, can cause it to demote itself. In order to avoid that weonly trigger the self-promotion check if the node is not currently acandidate.
Merge branch 'devel-2.0' into devel-2.1
Conflicts: lib/backend.py - trivial merge...
LURemoveNode safety in face of wrong node list
LURemoveNode runs under the BGL, which means we're guaranteed that thelist of nodes as retrieved in CheckPrereq is still valid inBuildHooksEnv. However, we can make Ganeti handle failures in case thelocking is broken (or the node list has been modified otherwise) easily,...
Ensure all int/float conversions are handled right
int()/float() can raise either ValueError (in case of int("a")), orTypeError (in case of int(None)). We had many bugs over time due tothis, and a recent one was just diagnosed, so we go over the codebase...
Normalize MAC addresses to all lower.
This change will normalize the MAC to all lower after validation.
Signed-off-by: René Nussbaumer <rn@google.com>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
LURenameCluster: run post hook on all nodes
Since the cluster name might be used for various purposes on nodes, weshould let all nodes "know" about a cluster rename by running the posthook on all nodes. This will make cluster rename slightlyslower/costlier, but it is not/shouldn't be an operation that is run...
Further pylint disables, mostly for Unused args
Many of our functions have to follow a given API, and thus we have tokeep a given signature, but pylint doesn't understand this. Therefore,we silence this warning.
The patch does a few other cleanups.
LUDiagnoseOS._DiagnoseByOS: remove unused arg
The node_list argument to _DiagnoseByOS is not used, and is obsoleted bythe fact that the rlist argument already has the valid nodes as keys(assuming RPC behaviour didn't change). Thus, we remove it and silence...
Convert to static methods (where appropriate)
Many methods are simple pure functions, and not depending on the objectstate. We convert these to staticmethods.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Add targeted pylint disables
This patch should have only:
- pylint disables- docstring changes- whitespace changes
Fix an error message
Detected by an 'Unused variable' warning.
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Add targetted pylint disables
This patch adds targeted pylint disables, where it makes sense (eitherdue to limitations in pylint or due to historical usage), and also a fewblanket ones in rapi where all the names are… “different”.
Fix two bugs in seldom-used codepaths
New version of pylint, new bugs found!
Clarifiy some more wide pylint disables
This removes/updates some module-wide pylint disables.
Implement BuildHooksEnv for NoHooksLU
This just adds a stub function that raises an assertion error; thisaccomplishes two things:
- silences many pylint warnings- if we ever stumble upon this, a specific assertion error is (hopefully) clearer than just a not implemented error...
Merge branch 'stable-2.0' into stable-2.1
CreateInstance: allow no ip check with start mode
Since gnt-instance start doesn't do any checks on the IP, it doesn'tmake much sense to do so in instance create (with start) if the userexpressly passes in ‘--no-ip-check’. Removing this requirement eases the...
Op/LUCreateInstance support for (no) name checks
This adds a new opcode parameter ‘name_check’ (similar to ip_check) thatis not required to be present (to easy backwards compatibility fortools).
It also adds a CheckArguments to LUCreateInstance and changes the...
Improve LUQueryNodes for lockless case
In most uses of LUQueryNodes, we don't take a lock. This means that theinstance data is not protected across GetInstanceList andGetInstanceInfo, and this can lead to instances not existing anymore.
Switching to GetAllInstanceInfo means that we get a single,...
gnt-cluster verify: Warn if node time diverges too far
The warning will be generated if the clocks diverge by morethan 150 seconds. Due to the way the RPC system works, wecannot get exact time differences, e.g. if one of thequeried nodes is broken. The comparision is done using a...
cmdlib: Work around race condition in DRBD before version 8.0.13
DRBD goes into sync mode for a short amount of time afterexecuting the "resize" command. DRBD 8.x below version8.0.13 contains a bug whereby calling "resize" in syncmode fails.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of thecallers (that I could find) to it. Since CommaJoin does str(i) for i inparam, we can remove these, thus simplifying slightly a few calls....
Revert "Get rid of utils.CommaJoin"
This reverts commit 6915bc28fe053e92aa16cf2d974d205f1140219c based on thread onganeti-devel.
Conflicts:
lib/cmdlib.py (due to the error code classification, trivial)
Remove unused parameter “unlock” from cmdlib._WaitForSync
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix off-by-one error when modifying instance NIC
For an instance with exactly one NIC:
$ gnt-instance modify --net 1:ip=1.2.3.4 inst1Failure: prerequisites not met for this operation:error type: wrong_input, error details:Invalid NIC index 1, valid values are 0 to 1...
Re-add check for duplicate instance IP
This was originally implemented in 0ce8f948 and partiallyrolled back in 9b65e0d4. Apart from re-adding the check,this patch does some housekeeping by renaming the “_helper”function to “_AddIpAddress”.
Fix change of cluster nic parameters
To stay on the safe side, we check for errors in all instances, andrefuse to act, reporting on the errors we found, if there are anyproblems.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix mispopulation of nic parameters at nic modify
There's a bug in Ganeti 2.1 rc0 that makes nic parameters be populatedfrom the "filled in" dict, even if we're not changing any values inthem. This patch fixes the problem, by populating them from the correct...
ConfigWriter: move _temporary_ids to reservation
In order to do this we need to pass a job id when reserving a resource.We have one during _EnsureUUIDs because we passed it in from AddNode andAddInstance. During config upgrade we use a fake job ID which we then...
ConfigWriter: move _temporary_macs to reservation
This solves the race conditions in mac reservation, as macs are actuallyreserved, under the current ec id.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
ConfigWriter: simplify GenerateDRBDSecret
We can do this by adding a new TemporaryReservationManager
config.Add{Node,Instance}: get the ec id
This is ok because adding a node or instance cannot happen in a query.
We get the ec id from the LU and pass it to _EnsureUUID, which willthen for now not use it.
Introduce 'global hypervisor parameters' support
This patch adds support for global hypervisor parameters in instancecreation, instance modification, instance query and at instance loadtime.
We basically prevent any query on these parameters, discard them at load...
TLMigrateInstance: add error messagess during Exec
Currently the migration of an instance doesn't show any error until theend. We add two messages that show better the progress:
node1# gnt-instance migrate -f instance5Wed Nov 4 04:04:34 2009 Migrating instance instance5...
Introduce a wrapper for hostname resolving
Currently a few of the LU's CheckPrereq use utils.HostInfo which raisesa resolver error in case of failure. This is an exception from thestandard that CheckPrereq should raise an OpPrereqError if the error isin the 'pre' phase (so that it can be retried)....
Another round of pylint-related style fixes
A newer version of pylint, more warnings…
Implement cluster verify checks for wrong PV names
Since ':' is not a valid character in PV names (for the way Ganeti usesLVM), we need to check this and warn the user. This patch adds a newNV_PVLIST cluster verify check and verifies the PV names returned from...
jqueue: Convert to utils.Retry
Some improvements to gnt-node repair-storage
Currently the repair storage has two issues:
- down instances are aborting the operation, even though they should be ignored (it's not technically possible to know their disk status unless we would activate their disks)...
Add ecode to rpc.py's RpcResult.Raise()
This patch adds a new ecode argument to RpcResult.Raise(). This allowsspecifying the error code (for both OpExec and OpPrereq errors).
Note that this patch also makes the OpExecError exceptions raised from_FindFaultInstanceDisks have the error code classification....
Introduce two-argument style for OpPrereqError
This patch introduces a two-argument style for OpPrereqError. Only thedirect raise calls in cmdlib.py are converted, other users will follow.
cli.py is modified to handle both two-argument style and the current...
Remove the OpRetryError exception
This is only used in two places, in an error path that is no longervalid since Ganeti 2.0. We remove the try..except since we should notget it anymore (and if we do, then we should catch it in allconfig.Update cases) and we remove the exception class completely....
Activate disks while exporting an instance
Exporting an instance not running or without activated diskswill fail. This patch makes sure to activate disks beforeexporting an instance if it's in the ADMIN_down state.
Unify the query fields for the storage framework
This patch unifies the query fields in the storage framework for alltypes. Note that the information is still computed on-demand, so if e.g.the used disk space is not requested for the ‘file’ type, it won't be...
Don't show warnings on ADMIN_down instance failover
Before:$ gnt-instance failover -f inst1… checking disk consistency between source and target… - WARNING: Can't find disk on node node21.example.com… shutting down instance on source node
After:$ gnt-instance failover -f inst1...
Fix another style issue
For the Nth time, re-fix shadowing of outer-scope variable :)
Fix an error handling case in TLReplaceDisks
pylint is your friend, since the compiler doesn't exist.