History | View | Annotate | Download (88.1 kB)
Fix unsafe variant initializer in _TryOSFromDisk
In case an OS has inconsistent declarations, we might get into a casewhere one node reports a valid variants list (with OS API >=15), andanother node has OS API < 15, in which case its supported_variants gets...
Add checks for master IP in cluster verify
This also updates a comment in the unittest for utils.py. We unittestthe new function for two things: correct reporting on real case (forlocalhost), and correct reporting with a mocked-out TcpPing that returns...
Fix some pylint warnings
Disable warnings for:- except Exception,- use of __errno_location,- redeclaration of handleError()
Signed-off-by: Luca Bigliardi <shammash@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Lock PowercycleNode child in memory
Signed-off-by: Luca Bigliardi <shammash@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
backend: remove a couple of useless mkdir calls
Those directories must exist for the node daemon to run (it's in thenode daemon's list of ensured directories) and those functions are onlycalled by the node daemon, so there's no point in those checks+mkdir...
Add CleanupInstance hypervisor call
Currently some hypervisors (namely kvm) need to do some cleanup aftermaking sure an instance is stopped. With the moving of the retry cyclein backend those cleanups were never done. In order to solve this we adda new optional hypervisor function, CleanupInstance, which gets called...
backend: Consolidate code opening real block device
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Export more instance parameters in instance export
Currently the backend parameters are not exported automatically, butonly a few directly in the '[instance]' section. Hypervisor type andhypervisor parameters are not exported at all.
This patch creates two separate sections for the be and hv parameters,...
Export the nicparams too during instance export
The patch tries to export all params (based on the dict defined inconstants), using None for missing keys.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix backend.VerifyNode behaviour for VG problems
In case LVM is broken, backend.GetVolumeList will raise an RPC exception(as expected since it's a function exposed over RPC). Therefore we mustbe prepared to catch any such exceptions, so that we don't fail the...
backend: Two small style fixes
- Pass keyword parameter as such- Replace “not x == y” with “x != y”
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Rename SSL_CERT_FILE to NODED_CERT_FILE
To be consistent with RAPI_CERT_FILE, the rather generic named“SSL_CERT_FILE” constant is renamed to “NODED_CERT_FILE”. The actual filename is not changed.
Rightname confd's HMAC key
Currently, the ganeti-confd's HMAC key is called “cluster HMAC key” orsimply “HMAC key” everywhere. With the implementation of inter-clusterinstance moves, another HMAC key will be introduced for signing criticaldata. They can not be the same, so this patch clarifies the purpose of the...
utils.CreateBackup: Use human-readable instead of seconds since Epoch
Seconds since the Epoch are not easily readable by a human. Using aformatted timestamp makes it easier (e.g.“….backup-2010-03-12_14_02_43.…”). This patch also makes OS logfiles usethis formatted timestamp....
Improve cluster verify with hypervisor errors
In case the hypervisor has issues on one node, currentlybackend.VerifyNode will exit via an exception (two exit paths possible,one via HypervisorError from hypervisor.Verify(), and one via RPCFailfrom GetInstanceList). This is bad as it invalidates all other checks of...
Fix node volumes list for stripped volumes
Currently backend.NodeVolumes() drops everything except the first PV,thus we get a truncated result. The patch is not the nicest, as Pythondoesn't have a simple `concat' function, so I had to change the listcomprehension to an explicit loop....
Switch more code to PathJoin
This should remove most of the remaining constructs which can bereplaced by PathJoin.
Add caller-validation on Disk.StaticDevPath
Since in objects we don't have access to utils.py, we add a warning thatthe result value from objects.Disk.StaticDevPath might not be a validpath, and change its only caller to validate the path.
Signed-off-by: Iustin Pop <iustin@google.com>...
Implement disabling of file-based storage
Rationale: the file-based storage backend can add/remove files under acertain directory. However, the master node is also controlling thesetting of the file-based root directory, so basically it means we can'tprevent arbitrary modifications by the master of the node's filesystem....
Replace os.path.sep.join(seq) with utils.PathJoin
This is a no-op change, but at least we concentrate the calls to pathjoins into a single function.
A use in utils.FindFile is left as-is (don't want to raise exceptionsthere, at least for now).
Abstract OS log names computation
The various OS operations create log files in a specific directory(constants.LOG_OS_DIR). The construction of the log names is howeverspread and duplicated across multiple functions.
This patch abstracts this into a separate function that also validates...
Remove superfluous warnings in HooksRunner
For non-existing hooks (the majority of cases probably), logging awarning every time is not helpful. So we first check if we have a validdirectory.
Switch from os.path.join to utils.PathJoin
This passes a full burnin with lots of instances, and should be safe aswe mostly to join a known root (various constants) to a run-timevariable.
Add an extra safety layer to _CleanDirectory
In order to protect from accidental use of _CleanDirectory on a randomdirectory, we add a list of allowed clean directories, somewhat similarto _ALLOWED_UPLOAD_FILES (but statically computed).
Implement utils.RunParts and use it for hooks
This function is a generic pythonic version of runparts. We currentlyuse it in the backend HooksRunner, but we'll use it for runningdifferent directories as well.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Change backend hooks runner to use RunCmd
And save lots of lines of code, in the process
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Merge branch 'stable-2.1' into devel-2.1
Implement debug level across OS-related RPC calls
This doesn't implement the full functionality, we need to add the debuglevel to the opcodes too, but at least won't require changing the RPCcalls during the 2.1 series.
Make the snapshot decision based on disk type
… instead of disk size, which is not as reliable. This actuallysimplifies the code; but it still leaves the possibility of stackoverflows if the disk data structure is corrupted.
Merge branch 'devel-2.0' into devel-2.1
Conflicts: lib/backend.py - trivial merge...
Ensure all int/float conversions are handled right
int()/float() can raise either ValueError (in case of int("a")), orTypeError (in case of int(None)). We had many bugs over time due tothis, and a recent one was just diagnosed, so we go over the codebase...
backend._OSOndiskAPIVersion: remove obsolete arg
The 'name' argument is not used anymore, probably since before 2.0.Since this is an internal function, we can just remove it (from itscaller too).
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Convert to static methods (where appropriate)
Many methods are simple pure functions, and not depending on the objectstate. We convert these to staticmethods.
Add targeted pylint disables
This patch should have only:
- pylint disables- docstring changes- whitespace changes
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Add targetted pylint disables
This patch adds targeted pylint disables, where it makes sense (eitherdue to limitations in pylint or due to historical usage), and also a fewblanket ones in rapi where all the names are… “different”.
Merge branch 'stable-2.0' into stable-2.1
Move the hooks file mask into constants.py
This will allow reuse of the same mask for multiple validations.
Security issue: add validation of script names
This patch unifies the search for external script to always go throughutils.FindFile and implements in that function a restriction on validchars in file names and (additionally) that the passed name is the...
gnt-cluster verify: Warn if node time diverges too far
The warning will be generated if the clocks diverge by morethan 150 seconds. Due to the way the RPC system works, wecannot get exact time differences, e.g. if one of thequeried nodes is broken. The comparision is done using a...
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of thecallers (that I could find) to it. Since CommaJoin does str(i) for i inparam, we can remove these, thus simplifying slightly a few calls....
Use “daemon-util” to reload SSH keys
Fix pylint 'E' (error) codes
This patch adds some silences and tweaks the code slightly so that“pylint --rcfile pylintrc -e ganeti” doesn't give any errors.
The biggest change is in jqueue.py, the move of _RequireOpenQueue out ofthe JobQueue class. Since that is actually a function and not a method...
Add new “daemon-util” script to start/stop Ganeti daemons
Until now, Ganeti started and stopped its own daemons using custom functions.To start, the daemon was just executed and then sent the appropriate signals tostop it again. Init scripts would have to pay attention to the PID file and...
hypervisors: change MigrateInstance API
Currently the $hypervisor.MigrateInstance takes the instance name. Thispatch changes it to take the instance object, such that other instanceproperties (especially hvparams) are available to it.
Workaround fake failures in drbd+live migration
This patch is an attempt to fix the ugly issue during migration: Cannot resync disks on node …: [True, 100]
If my understanding is correct, sometimes we poll the /proc/drbd file atan inoportune moment, while it's being updated, or while the DRBD device...
Another round of pylint-related style fixes
A newer version of pylint, more warnings…
Implement cluster verify checks for wrong PV names
Since ':' is not a valid character in PV names (for the way Ganeti usesLVM), we need to check this and warn the user. This patch adds a newNV_PVLIST cluster verify check and verifies the PV names returned from...
backend: Convert to utils.Retry
Epydoc fixes
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
backend: Don't overwrite function parameter with loop variable
Adding '--no-ssh-init' option to 'gnt-cluster init'.
Allows the initialization of a cluster without the creation or distributionof SSH key pairs. Includes changes for LeaveCluster and RPC.
Signed-off-by: Ken Wehr <ksw@google.com>Signed-off-by: Guido Trotter <ultrotter@google.com>...
Try to reduce wrong errors in InstanceShutdown
In backend.InstanceShutdown(), there is a race condition betweenchecking that the instance exists and trying to shut it down whichtranslates sometime in error messages like:
Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed...
Revert breakage introduced in e4e9b80
Commit e4e9b8064787df01a79846a40f49c8ae06a8eb0e introduced two problemsin backend.InstanceShutdown():
- first, it reduced the check interval significantly (especially for the first few checks); there are very few production VMs that shutdown in...
Introduce checks for /sys and /proc
This patch adds checks for /proc and /sys in cluster verify, sinceGaneti relies on these special filesystems to be mounted.
Add timeout options to other LUs
All the LUs that shut down the instance need to be able too pass thetimeout parameter as well.
Code and docstring style fixes
Found using pylint and epydoc.
Accept shutdown timeout from the user
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout- the opcode is changed to hold one- the LU is changed to optionally get one- the rpc is changed to carry one- the backend is changed to take it as a parameter rather than...
backend.InstanceShutdown: small cleanup
1) unhardcode the timeout, abstracting it in a constant2) Use time.time() rather than hiding the timeout in a range()3) call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but once...
Populate OS variants if an api >= 15 is present
Adding the file name to the os_files dict will fill in the full path andget it checked, if present we also read it and split into lines, one perdeclared variant.
OSEnvironment: populate OS_VARIANT
According to the design on api_version >= 15 the OS variant is the partof the OS name after the "+" sign. If none is found, we just pass in thefirst variant an OS declares (which is bound to exist, as we check forit in _TryOSFromDisk)....
OSFromDisk: handle variants when loading os
When we load an OS from disk, we need _TryOSFromDisk to get the realname, without any variant. This allows any functionality that uses theinstance OS to handle a name with a variant.
Add per-node variants list to OS diagnose output
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Convert os api version file name to a constant
TryOSFromDisk: s/os_scripts/os_files/
We'll be using this dict/loop to check more than just scripts, so we'rerenaming the variables appropriately.
TryOSFromDisk: only check actual os scripts for +x
Currently all checked files in the loop are os scripts, so nothing willchange, but in the future we only want the +x bit on actual os scripts,not necessarily all files.
Remove secrets and kill confd on cluster leave
Treat virtual LVs as inexistent
Currently, “gnt-cluster verify” and “gnt-cluster verify-disks” use thelist of LVs as returned by backend.GetVolumeList to determine whether anLV exists or not. However, LVs can also be ‘virtual’, which is handledcorrectly (i.e. as missing) by the bdev code, but not by this function....
Use ReadFile/WriteFile in more places
This survived QA, burnin and unittests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Luca Bigliardi <shammash@google.com>
Add disk copy support at backend and the rpc level
This uses a simple 'dd if=… | ssh $target dd of=…' method, like theExportSnapshot (which uses the OS export; here we want full disk-levelcopy and not any FS-level changes).
Merge commit 'origin/next' into branch-2.1
Merge branch 'master' into next
Fix detecting of errors in export
This should fix issue 61, by explicitely calling bash (which is is now anon-explicit dependency) and setting the pipefail command.
Use objects for blockdev_getmirrorstatus RPC call result
This patch changes the return type for backend.BlockdevGetmirrorstatus froma list of tuples to a list of objects.BlockDevStatus instances.
Use object for blockdev_find RPC call result
This patch changes the return type for backend.BlockdevFind to an object(objects.BlockDevStatus). Before a tuple was used. Adding more values tothis tuple causes a lot of work. Converting the result to an object with...
rpc: add rpc call for getting disk size
Note that this exports the disk size as bdev returns it, in bytes. Thevalue will be converted to MiB in cmdlib.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,ganeti-masterd will be started with the new --no-voting --yes-do-itoptions.
This new option is set to True only on masterfailover, when no_voting is...
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file,but actually contain a daemon's name, because the various functions thatwork with pidfiles abstract the filename from the daemon namethemselves. Removing the constants and using the actual daemon name...
Change GetNodeDaemonPort to GetDaemonPort in utils
GetNodeDaemonPort is used to lookup the node daemon port in the servicesfile, and if not found to return the default one. We make it a genericfunction, which accepts the daemon name in input, so that it can be used...
Generate a shared HMAC key at cluster init time
This key is shared on all nodes (via cmdlib._RedistributeAncillaryFiles)and will be used for HMAC authentication of confd messages.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix backend import errors from GetHypervisorClass
The merge of commit 360b0dc into branch-2.1 broke import of backend,since it uses hypervisor.GetHypervisor() which returns an instance ofthe hypervisor. Some of the hypervisors create directories at init time,...
Merge branch 'next' into branch-2.1
Conflicts: lib/backend.py: non-trivial conflict but easy to solve
backend: Only build once the list of upload files
The list of upload files is built currently at every UploadFile() call.This patch moves it to a separate variable which is initialized onlyonce.
This won't make much difference but I regard it as cleanup....
Fix pylint warnings
Fix some typos
backend.DemoteFromMC: don't fail for missing files
If the config file is missing when the DemoteFromMC() function iscalled, it will raise a ProgrammerError. Instead of changing theutils.CreateBackup() file which is called from multiple places, for nowwe only change the DemoteFromMC() function to not call it if the file is...
Merge branch 'master' into branch-2.1
Introduce OS api version 15
Also, since Ganeti 2.1 will be compatible with both 10 and 15, changethe OS_API_VERSION constant to be an OS_API_VERSIONS set, and update theplaces in the code that used that constat to use something else.
In particular: - in the qa for now we just create a fake version 10 OS...
_OSOndiskAPIVersion: save a loop
The api_versions list is first stripped and then converted to integer.Combining the two operations.
Use ReadFile.splitlines() rather than readlines
A few places in the code open a file "manually" rather than using ourwrapper function, because they need an array with the lines. Combiningthe result of utils.ReadFile with splitlines() we get rid of theexceptions....
Rename _OSOndiskVersion to _OSOndiskAPIVersion
This makes what versions we're talking about clearer.
backend.StartMaster: fix variable name
As per comments for patch “Convert node_start_master to new styleresult”, the ‘payload’ variable is renamed to ‘err_msgs’.
Fix a typo in backend.InstanceReboot docstring
The documentation for the reboot was wrong. This patch fixes it andupdates the docstring with more details.
Fix various pylint warnings
There were multiple issues: - copy-paste resulted in wrong indentation - wrong function name - missing spaces around assignment - overriding built-in names (type, dir) or already defines ones (errors, hypervisor)
Fix backend.{Start,Stop}Master
Commit c26a6bd21c17641f718369caed88ae16947fa774 changed GetMasterInfonot to return a tuple anymore, but didn't update its two callers inbackend.py, which were trying to extract the values from the secondtuple element. This causes a stack trace in node-daemon.log....