Make the snapshot decision based on disk type
… instead of disk size, which is not as reliable. This actually simplifies the code, but it still leaves the possibility of stack overflows if the disk data structure is corrupted.
Signed-off-by: Iustin Pop <iustin@google.com>...
Merge branch 'devel-2.0' into devel-2.1
Conflicts: lib/backend.py - trivial merge...
Ensure all int/float conversions are handled right
int()/float() can raise either ValueError (in case of int("a")) or TypeError (in case of int(None)). We had many bugs over time due to this, and a recent one was just diagnosed, so we go over the codebase...
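The failure mode described above can be illustrated with a minimal sketch of a conversion helper that catches both exception types. The name `TryConvertInt` is hypothetical and is not taken from the Ganeti codebase.

```python
def TryConvertInt(value, default=None):
  """Convert value to int, returning default if conversion fails.

  Hypothetical helper, for illustration only.
  """
  try:
    return int(value)
  except (ValueError, TypeError):
    # ValueError covers int("a"), TypeError covers int(None)
    return default
```

Catching only one of the two exceptions would let the other one propagate to the caller.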
backend._OSOndiskAPIVersion: remove obsolete arg
The 'name' argument is not used anymore, probably since before 2.0. Since this is an internal function, we can just remove it (from its caller too).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Convert to static methods (where appropriate)
Many methods are simple pure functions that do not depend on the object state. We convert these to staticmethods.
Add targeted pylint disables
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Add targeted pylint disables
This patch adds targeted pylint disables, where it makes sense (either due to limitations in pylint or due to historical usage), and also a few blanket ones in rapi where all the names are… “different”.
Merge branch 'stable-2.1' into devel-2.1
Merge branch 'stable-2.0' into stable-2.1
Move the hooks file mask into constants.py
This will allow reuse of the same mask for multiple validations.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Security issue: add validation of script names
This patch unifies the search for external scripts to always go through utils.FindFile and implements in that function a restriction on valid chars in file names and (additionally) that the passed name is the...
gnt-cluster verify: Warn if node time diverges too far
The warning will be generated if the clocks diverge by more than 150 seconds. Due to the way the RPC system works, we cannot get exact time differences, e.g. if one of the queried nodes is broken. The comparison is done using a...
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove these, thus simplifying slightly a few calls....
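A minimal sketch of the behaviour described above, assuming the function joins its items with ", " after converting each one with str(). The body is an illustration under that assumption, not a copy of the actual implementation.

```python
def CommaJoin(names):
  """Join items with ", ", converting each item to str first (sketch)."""
  return ", ".join(str(val) for val in names)


# Callers can pass mixed types without stringifying them first
print(CommaJoin([1, "node2", 3.5]))
```

Because the conversion happens inside the function, call sites no longer need their own str() calls or list comprehensions.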
Use “daemon-util” to reload SSH keys
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix pylint 'E' (error) codes
This patch adds some silences and tweaks the code slightly so that “pylint --rcfile pylintrc -e ganeti” doesn't give any errors.
The biggest change is in jqueue.py, the move of _RequireOpenQueue out of the JobQueue class. Since that is actually a function and not a method...
Add new “daemon-util” script to start/stop Ganeti daemons
Until now, Ganeti started and stopped its own daemons using custom functions. To start, the daemon was just executed and then sent the appropriate signals to stop it again. Init scripts would have to pay attention to the PID file and...
hypervisors: change MigrateInstance API
Currently the $hypervisor.MigrateInstance takes the instance name. This patch changes it to take the instance object, such that other instance properties (especially hvparams) are available to it.
Workaround fake failures in drbd+live migration
This patch is an attempt to fix the ugly issue during migration: Cannot resync disks on node …: [True, 100]
If my understanding is correct, sometimes we poll the /proc/drbd file at an inopportune moment, while it's being updated, or while the DRBD device...
Another round of pylint-related style fixes
A newer version of pylint, more warnings…
Implement cluster verify checks for wrong PV names
Since ':' is not a valid character in PV names (for the way Ganeti uses LVM), we need to check this and warn the user. This patch adds a new NV_PVLIST cluster verify check and verifies the PV names returned from...
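The check described above could be sketched as follows. The function name `FindInvalidPvNames` and the sample device paths are hypothetical; only the rule that ':' is not allowed comes from the commit message.

```python
def FindInvalidPvNames(pv_names):
  """Return the PV names that contain ':' (hypothetical helper)."""
  return [name for name in pv_names if ":" in name]


# Sample input, invented for illustration
bad = FindInvalidPvNames(["/dev/sda1", "/dev/disk/by-path/pci-0000:00:1f.2"])
for name in bad:
  print("WARNING: invalid character ':' in PV name '%s'" % name)
```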
backend: Convert to utils.Retry
Epydoc fixes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
backend: Don't overwrite function parameter with loop variable
Adding '--no-ssh-init' option to 'gnt-cluster init'.
Allows the initialization of a cluster without the creation or distribution of SSH key pairs. Includes changes for LeaveCluster and RPC.
Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Try to reduce wrong errors in InstanceShutdown
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes translates into error messages like:
Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed...
Revert breakage introduced in e4e9b80
Commit e4e9b8064787df01a79846a40f49c8ae06a8eb0e introduced two problems in backend.InstanceShutdown():
- first, it reduced the check interval significantly (especially for the first few checks); there are very few production VMs that shutdown in...
Introduce checks for /sys and /proc
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted.
Add timeout options to other LUs
All the LUs that shut down the instance need to be able to pass the timeout parameter as well.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Code and docstring style fixes
Found using pylint and epydoc.
Accept shutdown timeout from the user
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than...
backend.InstanceShutdown: small cleanup
1) unhardcode the timeout, abstracting it in a constant
2) Use time.time() rather than hiding the timeout in a range()
3) call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but once...
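The three points above can be sketched as a deadline-based polling loop. This is an illustration only: `WaitForShutdown`, its parameters, and the callables passed in are hypothetical stand-ins, not the actual backend code.

```python
import time


def WaitForShutdown(is_running, stop, timeout, interval=1.0):
  """Retry stop() until the instance is gone or the timeout expires.

  Hypothetical sketch: is_running and stop are caller-supplied callables.
  Returns True if the instance stopped, False otherwise.
  """
  deadline = time.time() + timeout  # explicit deadline instead of range()
  while time.time() < deadline:
    if not is_running():
      return True
    stop()  # called repeatedly, assumed harmless after the first call
    time.sleep(interval)
  return not is_running()
```

Computing the deadline from time.time() keeps the total wait close to the requested timeout regardless of how long each stop() call takes.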
Populate OS variants if an api >= 15 is present
Adding the file name to the os_files dict will fill in the full path and get it checked; if present, we also read it and split it into lines, one per declared variant.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
OSEnvironment: populate OS_VARIANT
According to the design, on api_version >= 15 the OS variant is the part of the OS name after the "+" sign. If none is found, we just pass in the first variant an OS declares (which is bound to exist, as we check for it in _TryOSFromDisk)....
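A minimal sketch of the naming rule described above, assuming the variant is everything after the first "+" and that the list of declared variants is non-empty. The helper name `SplitOsName` and the sample OS names are hypothetical.

```python
def SplitOsName(name, declared_variants):
  """Return (base_name, variant) for an OS name (hypothetical helper).

  Falls back to the first declared variant when the name has no "+".
  """
  if "+" in name:
    base, variant = name.split("+", 1)
  else:
    base, variant = name, declared_variants[0]
  return base, variant


print(SplitOsName("debootstrap+lenny", ["etch", "lenny"]))
print(SplitOsName("debootstrap", ["etch", "lenny"]))
```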
OSFromDisk: handle variants when loading os
When we load an OS from disk, we need _TryOSFromDisk to get the real name, without any variant. This allows any functionality that uses the instance OS to handle a name with a variant.
Add per-node variants list to OS diagnose output
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Convert os api version file name to a constant
TryOSFromDisk: s/os_scripts/os_files/
We'll be using this dict/loop to check more than just scripts, so we're renaming the variables appropriately.
TryOSFromDisk: only check actual os scripts for +x
Currently all checked files in the loop are os scripts, so nothing will change, but in the future we only want the +x bit on actual os scripts, not necessarily all files.
Remove secrets and kill confd on cluster leave
Treat virtual LVs as inexistent
Currently, “gnt-cluster verify” and “gnt-cluster verify-disks” use the list of LVs as returned by backend.GetVolumeList to determine whether an LV exists or not. However, LVs can also be ‘virtual’, which is handled correctly (i.e. as missing) by the bdev code, but not by this function....
Use ReadFile/WriteFile in more places
This survived QA, burnin and unittests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Add disk copy support at backend and the rpc level
This uses a simple 'dd if=… | ssh $target dd of=…' method, like the ExportSnapshot (which uses the OS export; here we want full disk-level copy and not any FS-level changes).
Merge commit 'origin/next' into branch-2.1
Merge branch 'master' into next
Fix detecting of errors in export
This should fix issue 61, by explicitly calling bash (which is now a non-explicit dependency) and setting the pipefail option.
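The idea can be sketched as below: the pipeline is run through bash with pipefail set, so a failure in any stage (not only the last one) shows up in the exit code. The helper name `RunPipeline` is hypothetical and the sketch assumes bash is available on the PATH.

```python
import subprocess


def RunPipeline(pipeline):
  """Run a shell pipeline under bash with pipefail (hypothetical helper).

  Returns the exit code, which is non-zero if any stage failed.
  """
  cmd = ["bash", "-c", "set -o pipefail; " + pipeline]
  return subprocess.call(cmd)
```

Without pipefail, a pipeline such as `false | cat` reports success because only the exit status of the last command is considered.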
Use objects for blockdev_getmirrorstatus RPC call result
This patch changes the return type for backend.BlockdevGetmirrorstatus from a list of tuples to a list of objects.BlockDevStatus instances.
Use object for blockdev_find RPC call result
This patch changes the return type for backend.BlockdevFind to an object (objects.BlockDevStatus). Before, a tuple was used. Adding more values to this tuple causes a lot of work. Converting the result to an object with...
rpc: add rpc call for getting disk size
Note that this exports the disk size as bdev returns it, in bytes. The value will be converted to MiB in cmdlib.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True, ganeti-masterd will be started with the new --no-voting --yes-do-it options.
This new option is set to True only on masterfailover, when no_voting is...
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file, but actually contain a daemon's name, because the various functions that work with pidfiles abstract the filename from the daemon name themselves. Removing the constants and using the actual daemon name...
Change GetNodeDaemonPort to GetDaemonPort in utils
GetNodeDaemonPort is used to look up the node daemon port in the services file, and if not found to return the default one. We make it a generic function, which accepts the daemon name as input, so that it can be used...
Generate a shared HMAC key at cluster init time
This key is shared on all nodes (via cmdlib._RedistributeAncillaryFiles) and will be used for HMAC authentication of confd messages.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix backend import errors from GetHypervisorClass
The merge of commit 360b0dc into branch-2.1 broke import of backend, since it uses hypervisor.GetHypervisor() which returns an instance of the hypervisor. Some of the hypervisors create directories at init time,...
Merge branch 'next' into branch-2.1
Conflicts: lib/backend.py: non-trivial conflict but easy to solve
backend: Only build once the list of upload files
The list of upload files is currently built at every UploadFile() call. This patch moves it to a separate variable which is initialized only once.
This won't make much difference but I regard it as cleanup....
Fix pylint warnings
Fix some typos
backend.DemoteFromMC: don't fail for missing files
If the config file is missing when the DemoteFromMC() function is called, it will raise a ProgrammerError. Instead of changing the utils.CreateBackup() function, which is called from multiple places, for now we only change the DemoteFromMC() function to not call it if the file is...
Merge branch 'master' into branch-2.1
Introduce OS api version 15
Also, since Ganeti 2.1 will be compatible with both 10 and 15, change the OS_API_VERSION constant to be an OS_API_VERSIONS set, and update the places in the code that used that constant to use something else.
In particular:
- in the qa for now we just create a fake version 10 OS...
_OSOndiskAPIVersion: save a loop
The api_versions list is first stripped and then converted to integer. This patch combines the two operations.
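Combining the two passes might look like the sketch below. The function name `ParseApiVersions` and the sample input are hypothetical; only the strip-then-convert idea comes from the commit message.

```python
def ParseApiVersions(lines):
  """Strip and convert each line to int in a single pass (sketch)."""
  return [int(line.strip()) for line in lines]


print(ParseApiVersions(["10\n", " 15 \n"]))
```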
Use ReadFile.splitlines() rather than readlines
A few places in the code open a file "manually" rather than using our wrapper function, because they need an array with the lines. Combining the result of utils.ReadFile with splitlines() we get rid of the exceptions....
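The pattern can be sketched as follows. The `ReadFile` defined here is a simplified stand-in for the wrapper mentioned above, written only so the example is self-contained, and `ReadLines` is a hypothetical name.

```python
def ReadFile(path):
  """Simplified stand-in for the file-reading wrapper (assumption)."""
  with open(path) as fd:
    return fd.read()


def ReadLines(path):
  """Return the file contents as a list of lines without newlines."""
  return ReadFile(path).splitlines()
```

Unlike readlines(), splitlines() drops the trailing newline characters, so callers do not need to strip each line afterwards.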
Rename _OSOndiskVersion to _OSOndiskAPIVersion
This makes what versions we're talking about clearer.
backend.StartMaster: fix variable name
As per comments for patch “Convert node_start_master to new style result”, the ‘payload’ variable is renamed to ‘err_msgs’.
Fix a typo in backend.InstanceReboot docstring
The documentation for the reboot was wrong. This patch fixes it and updates the docstring with more details.
Fix various pylint warnings
There were multiple issues:
- copy-paste resulted in wrong indentation
- wrong function name
- missing spaces around assignment
- overriding built-in names (type, dir) or already defined ones (errors, hypervisor)
Fix backend.{Start,Stop}Master
Commit c26a6bd21c17641f718369caed88ae16947fa774 changed GetMasterInfo not to return a tuple anymore, but didn't update its two callers in backend.py, which were trying to extract the values from the second tuple element. This causes a stack trace in node-daemon.log....
Simplify usage of backend._FindDisks
Since all users of _FindDisks now return new-style results, we can simply make it raise an exception and not deal with the status field.
Convert all backend functions to exceptions
Instead of returning (False, msg) from rpc endpoints, we always raise exceptions (the non-endpoint, internal functions can remain as is). This means that the error paths are agnostic to how the failure is signalled...
Simplify the RPC result framework in backend.py
Since now all functions fail via _Fail, the return True, … is redundant as all normal return paths have it, and thus the True value can be added in the ganeti-noded handler.
This means that all functions can now forget about the special result...
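The two commits above describe a pattern that can be sketched roughly as follows: backend functions signal failure by raising through _Fail, and a single handler adds the status flag. The exception class name `RPCFail`, the handler `HandleRequest`, and the example function `CheckBridges` are assumptions made for illustration, not the actual node daemon code.

```python
class RPCFail(Exception):
  """Signals a failed backend call (class name is an assumption)."""


def _Fail(msg, *args):
  """Format the message and raise instead of returning (False, msg)."""
  if args:
    msg = msg % args
  raise RPCFail(msg)


def HandleRequest(fn, *args):
  """Hypothetical stand-in for the node daemon handler.

  Adds the True/False status so backend functions need not return it.
  """
  try:
    return (True, fn(*args))
  except RPCFail as err:
    return (False, str(err))


def CheckBridges(bridges, known):
  """Example backend-style function (hypothetical)."""
  missing = [br for br in bridges if br not in known]
  if missing:
    _Fail("Missing bridges %s", ", ".join(missing))
```

With this split, a backend function simply returns its payload on success, and every error path looks the same to the caller.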
Convert hooks_runner rpc to new style result
This also converts (and fixes) unittests and mock objects to deal with this change, and the custom hook verifier in cmdlib.LUClusterVerify.
Convert iallocator_runner rpc to new result style
This patch converts this rpc into the new style. Since the function already had some error handling, we remove this custom error reporting and replace it with our (new-style) result type. This allows significant...
Convert the file storage rpcs to new style result
This patch converts all three file-storage rpcs (create, delete, rename) to new style result. This is done in a single patch as they all use a helper function which itself needs to/can be converted.
Convert the jobqueue rpc to new style result
This patch converts the job queue rpc calls to the new style result. It's done in a single patch as there are helper functions (in both jqueue and backend) that are used by multiple rpcs and need synchronized change....
Big rewrite of the OS-related functions
Currently the OSes have a special, customized error handling: the OS object can represent either a valid OS, or an invalid OS. The associated functions, instead of raising an exception or failing, create custom OS objects representing failed OSes....
Remove old invalid-os related functionality
We no longer need OS objects to be able to represent invalid OSes. This cleans up the code handling those cases.
Convert node_leave_cluster rpc to new style result
This patch converts this rpc call to the new style result, and in the process also changes the meaning of the QuitGanetiException's arguments and the node daemon rpc call exception handler.
The problem with the exception handler is that we used a two-stage one,...
Convert node_volumes rpc to new style result
Convert master_info rpc to new style result
This was more tricky as the backend function is used by other functions in backend.py. As such, it must be handled specially - it must always raise an exception and not simply return False, err.
Convert write_ssconf_files to new style
The patch also adds logging of errors from the ConfigWriter in case the RPC fails (although today we don't have failure modes).
Convert instance_list rpc to new style result
Since backend.GetInstanceList() is used both as RPC endpoint and as internal function, it can't return (status, value). Instead it returns only valid instance info, and failures are denoted by exceptions; and...
Convert node_info rpc to new style result
This patch also does some cleanup and enforces valid results (with proper type, i.e. int for memory/disk values) from remote node, otherwise we handle the result as failure.
We do this so that we can remove custom processing in rpc.py which is...
Convert node_verify rpc to new result style
Convert node_start_master to new style result
This is used in multiple places outside cmdlib.py, so it's a more interesting patch.
Convert node_stop_master rpc to new style result
Convert instance_os_import rpc to new style result
This changes from a list of booleans to «status, error messages». This means that instead of knowing which disk has failed (position based), we get a list of all failures (with details how they failed).
Convert instance_info rpc to new style result
Convert all_instances_info rpc to new result style
Convert bridges_exist to new style result
This was a very simple (boolean) RPC, so converting it to actually have more value with the new style results was more difficult.
Convert vg_list rpc to new style result
This doesn't have known failure modes but converting will help later.
We also now call utils.ListVolumeGroups() directly instead of the backend.ListVolumeGroups() so that we don't have to undo the (status, value) result type....
Convert volume_list rpc to new style result
This is a big change, because we need to cleanup its users too.
The call and thus LUVerifyDisks LU used to differentiate between failure at node level and failure at LV level, by returning different types in the RPC result. This is way too complicated for our needs....
Convert export_remove rpc to new style
This converts the export_remove rpc to new style result and also fixes an old TODO by adding exception handling (and conversion to failure).
Convert export_list rpc to new style result
This is used in multiple places, so it has a few more changes than the previous ones.
Convert export_info rpc to new style result
This also removes some code from ganeti-noded and rpc.py, which should not do such processing of data (and be simply glue code). (Or alternatively they could, if we had better infrastructure).