Add --add-uids/--remove-uids to gnt-cluster modify
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add --uid-pool option to gnt-cluster modify
Fix cluster behaviour with disabled file storage
There are a few issues with disabled file storage:
- cluster initialization is broken by default, as it uses the 'no' setting which is not a valid path
- some other parts of the code require the file storage dir to be a...
Add an identify-defaults option for import
When importing an instance, all the saved values will be used as explicitly specified values, overriding the cluster defaults. This means export+import will change the status (from default to explicitly specified) of parameters....
Fix create/import verification of hvparams
Currently the instance creation checks the cluster hv defaults + the new parameters for validity, ignoring the os-specific hvparams (this was an oversight during the implementation of the os hvp). This patch uses the...
Reuse NIC information from export
If the user doesn't pass any nics in import, do not use a default one-nic, but instead read the nics from the export file as is.
Fortunately the export and the way nics are read from the command line are compatible…
Signed-off-by: Iustin Pop <iustin@google.com>...
Reuse backend parameters from export
Similar to the previous patches, if we're missing some parameters and the export has them (either in the new style or old-style), we reuse them.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reuse disk information from export
If the user doesn't pass the disk information on import, automatically reuse the number and size of disks. This loses the iv_name attribute, but that is only cosmetic and cannot be changed by the user.
Reuse hypervisor parameters in import
If available, we reuse the parameters from the export info.
Read disk template from export info
This patch changes the instance import to read the disk template automatically from the export info, if the opcode doesn't already specify a disk template.
To do this, we have a couple of additional changes:
- change from required parameter to optional one for disk_template...
CreateInstance: separate the reading of the export
We move the reading of the export to a separate function, to simplify CheckPrereq and also read it earlier. This will allow building the missing opcode parameters from the export information, instead of...
Move code from ExpandNames to CheckPrereq
This is needed since only in CheckPrereq do we have the nodes locked, and future import enhancements will need to have access to the export info during the parameter build.
CreateInstance: Move some code to CheckArguments
ExpandNames holds too much non-locking code (it was the first LU to be converted to ExpandNames, and we didn't have CheckArguments at that point), and this patch moves the checks that are lock-independent to CheckArguments....
Handle errors better for wrong nic_count in export
This fixes an old 'FIXME' entry.
Add a new cluster parameter maintain_node_health
This will be used to conditionally enable the watcher node maintenance feature.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Allow file storage to be grown
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix burnin error when trying to grow a file volume
Abstract the growable disk types into a Ganeti constant, and only run disk grow, from burnin, on them.
A rewrite of LUClusterVerify
Per issue 90, current cluster verify is very very brittle. It's one of the oldest pieces of code, with only additions without cleanups over the last years.
Among its problems:
- data initialization interspersed with verification of RPC results,...
Some epydoc fixes
Instance creation: implement --no-install mode
This is a simple patch that adds the no-install mode for instance creation, allowing import from foreign source of the actual OS (instead of requiring the preparation of data in a form expected by the import...
Allow OS changes without reinstallation
This patch modifies LUSetInstanceParms to allow OS name changes, without reinstallation, in case an OS gets renamed on-disk.
cmdlib: Abstract OS checks
This patch moves the node-has-os checks to a separate function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix behaviour of gnt-node modify -C no
The current check on whether we require auto_promote or not is wrong, as we check whether we will have exactly the correct number of master candidates left. But it is fine if we have more (e.g. when CPS=10 and mc_remaning=19) than the current number, and in that case we shouldn't...
Rightname confd's HMAC key
Currently, the ganeti-confd's HMAC key is called “cluster HMAC key” or simply “HMAC key” everywhere. With the implementation of inter-cluster instance moves, another HMAC key will be introduced for signing critical data. They cannot be the same, so this patch clarifies the purpose of the...
Implement conversion from drbd to plain
This is much simpler than the opposite, with fewer possibilities of failure.
Implement conversion from plain to drbd
This patch adds a new mode to instance modify, the changing of the disk template. For now only plain to drbd conversion is supported, and the new secondary node must be specified manually (no iallocator support).
The procedure for conversion works as follows:...
Abstract check that an instance is down
Multiple LUs require that an instance is not running while they operate on the instance (reinstall, rename, modify, recreate disks, deactivate disks). The code to do this check is duplicated many times, and not very...
Abstract node free disk space check
Both create instance and grow disk check the free disk space on nodes using the same, duplicate code. Since we'll need this in other places in the future, we abstract the check into a new function.
The patch adjusts the error message to be more in-line with the one for...
Abstract disk template verification
This is a simple check, but we'll need it in multiple places.
LUCreateInstance: implement disk adoption mode
This new mode, valid only for the plain disk template, allows creation of an instance based on existing logical volumes (preserving data), rather than creation of new volumes and OS creation.
The new mode works as follows:...
LUCreateInstance: Move parameter init earlier
This way, the parameters are available in CheckArguments too.
Verify cluster certificates in LUVerifyCluster
When using pyOpenSSL 0.7 or above, LUClusterVerify will start to show a warning 30 days before a certificate expires. 7 days before the certificate expires, the warning becomes an error. Once expired, LUVerifyCluster will always report an error. The latter is also supported...
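The thresholds described above can be sketched as follows. This is only an illustration: the function name, the return values and the handling of the exact boundary days are assumptions, and the real check reads the expiry date from the certificate rather than taking it as an argument.

```python
import datetime

# Thresholds taken from the commit message above.
WARN_DAYS = 30
ERROR_DAYS = 7


def classify_cert_expiry(not_after, now):
    """Return "ok", "warning" or "error" for a certificate expiry time.

    Hypothetical helper; both arguments are datetime.datetime objects.
    """
    remaining = not_after - now
    if remaining <= datetime.timedelta(days=ERROR_DAYS):
        # Covers both "expires within 7 days" and "already expired".
        return "error"
    if remaining <= datetime.timedelta(days=WARN_DAYS):
        return "warning"
    return "ok"
```

Passing the current time in explicitly keeps the sketch easy to test.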
Add constant with cluster X509 certificates
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Merge branch 'stable-2.1' into devel-2.1
Improve cluster verify with hypervisor errors
In case the hypervisor has issues on one node, currently backend.VerifyNode will exit via an exception (two exit paths possible, one via HypervisorError from hypervisor.Verify(), and one via RPCFail from GetInstanceList). This is bad as it invalidates all other checks of...
Fix cluster verify with simulate-errors
In simulate errors mode, the test "ntime_diff is not None" will be ignored, and thus a None value will try to be formatted as %.01f. We work around this by formatting it before, and then only using %s, which can format a 'None' value....
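A minimal sketch of the workaround described above, assuming a hypothetical helper name and message text (neither is taken from the patch): the number is formatted first, and the final message only uses %s.

```python
def format_time_diff(ntime_diff):
    """Pre-format a possibly missing time difference (hypothetical helper)."""
    if ntime_diff is None:
        # "%.01f" % None would raise TypeError, so fall back to plain text.
        return "None"
    return "%.01f" % ntime_diff


# Illustrative message text only; %s accepts the pre-formatted string.
message = "time difference: %s" % format_time_diff(None)
```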
Validate the os-specific hypervisor parameters
This adds a validation similar to the one for cluster-wide hypervisor parameters.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Rework the node modify for mc-demotion
The current code in LUSetNodeParms regarding the demotion from master candidate role is complicated and duplicates the code in ConfigWriter, where such decisions should be made. Furthermore, we still cannot demote nodes (not even with force), if other regular nodes exist....
Fix typo that makes cluster verify to ignore hooks
The return from LUVerifyCluster should be True (or equivalent) for pass, and False (or equivalent) for fail. The HooksCallBack function uses '1' (= True) when a hook fails, which is exactly the opposite of what we...
Fix redistribute config and offline nodes
We need to manually filter out offline nodes before using rpc.call_upload_file and rpc.call_write_ssconf_files, since these methods are static (they work without a ConfigWriter instance) and thus do not know which nodes are offline and which are not....
Add support for per-os-hypervisor parameters
This patch implements all modifications to support per-os-hypervisor parameters in the framework.
Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Validate the hostnames at creation time
This patch adds validation of new names used, i.e. at cluster init time, node add time, and instance creation.
For instances, especially when using «--no-name-check» (which skips DNS checks), we should validate the given name, and also normalize it...
Implement disabling of file-based storage
Rationale: the file-based storage backend can add/remove files under a certain directory. However, the master node is also controlling the setting of the file-based root directory, so basically it means we can't prevent arbitrary modifications by the master of the node's filesystem....
Switch from os.path.join to utils.PathJoin
This passes a full burnin with lots of instances, and should be safe as we mostly join a known root (various constants) to a run-time variable.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fix bug in LUQueryConfigValues
LUQueryConfigValues supports multiple output fields. If the client asked for the watcher pause status, it would not get a list, but simply the value.
Fix typo in LUVerifyCluster when checking node time
The first argument to _ErrorIf should always be True in this case.
Add LUNodeEvacuationStrategy
Implement support for mevac in OpTestAllocator
Implement IAllocator multi-evacuate mode
This is a new mode that requests a solution for the evacuation of multiple nodes. The external script will be fed a list of names, and is expected to return a list of [instance, new_node(s)] lists, detailing the evacuation path of each instance....
Accept both 'nodes' and 'result' from iallocator
This patch switches the default result key from 'nodes' to 'result'. The old name is still accepted for backwards compatibility, and should be removed in later versions.
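The fallback between the two keys might look roughly like this. It is a sketch under assumptions: the function name and the error handling are invented for illustration, and only the two key names come from the commit message.

```python
def get_allocator_result(reply):
    """Extract the result list from a parsed iallocator reply (a dict).

    Hypothetical helper: prefers the new 'result' key and falls back to
    the old 'nodes' key for backwards compatibility.
    """
    if "result" in reply:
        return reply["result"]
    if "nodes" in reply:
        return reply["nodes"]
    raise KeyError("iallocator reply has neither 'result' nor 'nodes'")
```

Checking the new key first means a reply that carries both keys is read in the new style.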
Change internal API for the IAllocator class
Currently the 'name' parameter in the constructor is required (as a non-keyword argument). Since the (to follow) node evac IAllocator mode doesn't have 'name' as a valid argument, we're moving this one into the...
Remove redundant code in IAllocator class
This moves the setting of the request member on the in_data, of the request type, and of the branching based on request type outside of individual functions and directly into the constructor.
Since the values we're using externally are identical to the...
Simplify a bit _GetWantedNodes
This should have been done in the _ExpandNodeName patch.
Fix a wrong docstring
There's no such thing as OpProgrammerError (I found this as I wrote it in code in another place, and pylint complained).
Remove boiler-plate code about node/instance names
Currently we have lots of duplication of the error-checking (and proper exception raising) around node/instance name expansion. LUCreateInstance is the only place where we have abstracted this.
This patch creates two functions (ExpandNodeName and ExpandInstanceName)...
Release all node locks during disk replace
This patch extends commit 7ea7bcf by releasing all node locks in disk replace for the early release mode. The rationale behind this is:
- LUCreateInstance already releases all node locks while waiting for disk synchronization, and does an instance startup later...
Auto-enable early release for offline old nodes
In case the old node is offline, we won't be able to talk to it to remove the storage, and in most cases the node is powered off/unreachable.
In this case, it makes no sense to delay the storage release, so we...
Run instance hooks on more nodes
This should fix issue 68: some hooks should be run on more nodes than currently. GrowDisk runs on both nodes, remove runs the post hook on the instance's nodes, and failover and migrate run the post hook on the source node too....
Add {NEW,OLD}_{PRIMARY,SECONDARY} vars to hooks
Per issue 71, the migrate and failover need special variables for keeping the nodes consistent during instance migrations.
Pass debug mode to noded for OS-related calls
Add a generic 'debug_level' attribute to opcodes
Also automatically fix opcodes which have this missing in the LU init routine.
Add an early release lock/storage for disk replace
This patch adds an early_release parameter in the OpReplaceDisks and OpEvacuateNode opcodes, allowing earlier release of storage and more importantly of internal Ganeti locks.
The behaviour of the early release is that any locks and storage on all...
TLReplaceDisks: Delay iallocator when evacuating node
When evacuating nodes, the iallocator was run for all instances without taking planned changes into consideration. This patch delays part of CheckPrereq and running the iallocator for node evacuation....
Implement debug level across OS-related RPC calls
This doesn't implement the full functionality, we need to add the debug level to the opcodes too, but at least won't require changing the RPC calls during the 2.1 series.
Second try to fix LUVerifyCluster
My previous patch, commit 785d142, fixed the case where a node is marked offline. With this patch it'll also handle other failures correctly.
LUVerifyCluster: Fix bug with offline nodes
[…]
* Other Notes
  - NOTICE: 1 offline node(s) found.
* Hooks Results
Failure: command execution error:
iteration over non-sequence
Commit a0c9776a introduced an error simulation mode to LUVerifyCluster. Due to a small mistake, offline nodes weren't skipped when checking the...
Merge remote branch 'origin/stable-2.1' into devel-2.1
Fix flipping MC flag bug
Currently unofflining or undraining an already functional master candidate node can cause it to demote itself. In order to avoid that we only trigger the self-promotion check if the node is not currently a candidate.
Merge branch 'devel-2.0' into devel-2.1
Conflicts: lib/backend.py - trivial merge...
LURemoveNode safety in face of wrong node list
LURemoveNode runs under the BGL, which means we're guaranteed that the list of nodes as retrieved in CheckPrereq is still valid in BuildHooksEnv. However, we can make Ganeti handle failures in case the locking is broken (or the node list has been modified otherwise) easily,...
Ensure all int/float conversions are handled right
int()/float() can raise either ValueError (in case of int("a")), or TypeError (in case of int(None)). We had many bugs over time due to this, and a recent one was just diagnosed, so we go over the codebase...
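The point about the two exception types can be illustrated with a small helper. The helper name and the default-value behaviour are assumptions made for this sketch; the patch itself may instead raise a more specific error at each call site.

```python
def to_int(value, default=None):
    """Convert value to int, returning default on any conversion failure.

    Hypothetical helper: catches ValueError (e.g. int("a")) as well as
    TypeError (e.g. int(None)), since int() can raise either.
    """
    try:
        return int(value)
    except (ValueError, TypeError):
        return default
```

Catching only ValueError would let int(None) escape as an unhandled TypeError, which is the class of bug the commit message describes.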
Normalize MAC addresses to all lower.
This change will normalize the MAC to all lower after validation.
Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
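A sketch of "validate, then lowercase" for the MAC normalization above. The regular expression and the function name are assumptions for illustration and are not taken from the patch.

```python
import re

# Assumed format: six colon-separated hex octets, case-insensitive.
_MAC_RE = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def normalize_mac(mac):
    """Validate a MAC address and return it in lower case (hypothetical)."""
    if not _MAC_RE.fullmatch(mac):
        raise ValueError("Invalid MAC address %r" % (mac,))
    return mac.lower()
```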
LURenameCluster: run post hook on all nodes
Since the cluster name might be used for various purposes on nodes, we should let all nodes "know" about a cluster rename by running the post hook on all nodes. This will make cluster rename slightly slower/costlier, but it is not/shouldn't be an operation that is run...
Further pylint disables, mostly for Unused args
Many of our functions have to follow a given API, and thus we have to keep a given signature, but pylint doesn't understand this. Therefore, we silence this warning.
The patch does a few other cleanups.
LUDiagnoseOS._DiagnoseByOS: remove unused arg
The node_list argument to _DiagnoseByOS is not used, and is obsoleted by the fact that the rlist argument already has the valid nodes as keys (assuming RPC behaviour didn't change). Thus, we remove it and silence...
Convert to static methods (where appropriate)
Many methods are simple pure functions that do not depend on the object state. We convert these to staticmethods.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Add targeted pylint disables
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Fix an error message
Detected by an 'Unused variable' warning.
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Add targeted pylint disables
This patch adds targeted pylint disables, where it makes sense (either due to limitations in pylint or due to historical usage), and also a few blanket ones in rapi where all the names are… “different”.
Fix two bugs in seldom-used codepaths
New version of pylint, new bugs found!
Clarify some more wide pylint disables
This removes/updates some module-wide pylint disables.
Implement BuildHooksEnv for NoHooksLU
This just adds a stub function that raises an assertion error; this accomplishes two things:
- silences many pylint warnings
- if we ever stumble upon this, a specific assertion error is (hopefully) clearer than just a not implemented error...
Merge branch 'stable-2.0' into stable-2.1
CreateInstance: allow no ip check with start mode
Since gnt-instance start doesn't do any checks on the IP, it doesn't make much sense to do so in instance create (with start) if the user expressly passes in ‘--no-ip-check’. Removing this requirement eases the...
Op/LUCreateInstance support for (no) name checks
This adds a new opcode parameter ‘name_check’ (similar to ip_check) that is not required to be present (to ease backwards compatibility for tools).
It also adds a CheckArguments to LUCreateInstance and changes the...
Improve LUQueryNodes for lockless case
In most uses of LUQueryNodes, we don't take a lock. This means that the instance data is not protected across GetInstanceList and GetInstanceInfo, and this can lead to instances not existing anymore.
Switching to GetAllInstanceInfo means that we get a single,...
gnt-cluster verify: Warn if node time diverges too far
The warning will be generated if the clocks diverge by more than 150 seconds. Due to the way the RPC system works, we cannot get exact time differences, e.g. if one of the queried nodes is broken. The comparison is done using a...
cmdlib: Work around race condition in DRBD before version 8.0.13
DRBD goes into sync mode for a short amount of time after executing the "resize" command. DRBD 8.x below version 8.0.13 contains a bug whereby calling "resize" in sync mode fails.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove these, thus simplifying slightly a few calls....
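Based only on the description above (no quotes, str(i) applied to each item), a stand-in for such a helper could look like this. It is not the actual utils.CommaJoin source, and the separator is an assumption.

```python
def comma_join(names):
    """Join items with ", " after converting each one to a string.

    Stand-in sketch, not the real utils.CommaJoin implementation.
    """
    return ", ".join(str(i) for i in names)
```

Because the conversion happens inside the helper, callers can pass integers or other objects directly instead of wrapping each item in str() themselves.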
Revert "Get rid of utils.CommaJoin"
This reverts commit 6915bc28fe053e92aa16cf2d974d205f1140219c based on thread on ganeti-devel.
Conflicts:
lib/cmdlib.py (due to the error code classification, trivial)
Remove unused parameter “unlock” from cmdlib._WaitForSync
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix off-by-one error when modifying instance NIC
For an instance with exactly one NIC:
$ gnt-instance modify --net 1:ip=1.2.3.4 inst1
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Invalid NIC index 1, valid values are 0 to 1...
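For an instance with one NIC, the only index that can be modified is 0, so the message above is off by one. A bounds check that reports the correct range might be sketched like this; the function name and the use of ValueError are assumptions, and the truncated commit message does not show how the real patch does it.

```python
def check_nic_index(index, nic_count):
    """Validate the index of an existing NIC (hypothetical helper).

    Valid indices for modification are 0 .. nic_count - 1.
    """
    if not 0 <= index < nic_count:
        raise ValueError("Invalid NIC index %d, valid values are 0 to %d" %
                         (index, nic_count - 1))
    return index
```

The sketch does not special-case an instance without any NICs, where the reported range would be empty.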
Re-add check for duplicate instance IP
This was originally implemented in 0ce8f948 and partially rolled back in 9b65e0d4. Apart from re-adding the check, this patch does some housekeeping by renaming the “_helper” function to “_AddIpAddress”.
Fix change of cluster nic parameters
To stay on the safe side, we check for errors in all instances, and refuse to act, reporting on the errors we found, if there are any problems.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix mispopulation of nic parameters at nic modify
There's a bug in Ganeti 2.1 rc0 that makes nic parameters be populated from the "filled in" dict, even if we're not changing any values in them. This patch fixes the problem, by populating them from the correct...