Michael Hanselmann [Fri, 30 Oct 2009 16:36:59 +0000 (17:36 +0100)]
Add generic retry loop function
There are quite a few retry loops with timeouts in Ganeti's
code. Duplicating code is not good, so this patch introduces
a new function named “utils.Retry” to remedy this situation.
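
A minimal sketch of what such a generic retry loop could look like. The
names below (retry, RetryAgain, RetryTimeout, sleep_fn, time_fn) are
assumptions made for illustration and are not taken from the patch itself:

```python
import time


class RetryAgain(Exception):
    """Raised by the callee to request another attempt (hypothetical name)."""


class RetryTimeout(Exception):
    """Raised when the timeout expires without success (hypothetical name)."""


def retry(fn, delay, timeout, args=(), sleep_fn=time.sleep, time_fn=time.time):
    """Call fn(*args) until it stops raising RetryAgain or the timeout expires."""
    deadline = time_fn() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        remaining = deadline - time_fn()
        if remaining <= 0:
            raise RetryTimeout()
        # Never sleep past the deadline.
        sleep_fn(min(delay, remaining))
```

Passing the sleep and clock functions as parameters keeps such a helper
testable without real delays.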
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 3 Nov 2009 10:29:30 +0000 (11:29 +0100)]
Ignore log messages in unittests
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 27 Oct 2009 07:54:24 +0000 (16:54 +0900)]
Some improvements to gnt-node repair-storage
Currently the repair storage has two issues:
- down instances are aborting the operation, even though they should be
ignored (it's not technically possible to know their disk status
unless we would activate their disks)
- if the VG is so broken that disks cannot be activated via gnt-instance
activate-disks or gnt-instance startup, it's not possible to repair
the VG at all
The patch makes the opcode skip down instances and also introduces an
``--ignore-consistency`` flag for forcing the execution of the LU.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 05:55:15 +0000 (14:55 +0900)]
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argument
style. Any leftovers as one-argument are not breaking anything, just
losing information about the errors.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 05:27:46 +0000 (14:27 +0900)]
Add ecode to rpc.py's RpcResult.Raise()
This patch adds a new ecode argument to RpcResult.Raise(). This allows
specifying the error code (for both OpExec and OpPrereq errors).
Note that this patch also makes the OpExecError exceptions raised from
_FindFaultInstanceDisks have the error code classification.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 05:15:53 +0000 (14:15 +0900)]
Introduce two-argument style for OpPrereqError
This patch introduces a two-argument style for OpPrereqError. Only the
direct raise calls in cmdlib.py are converted, other users will follow.
cli.py is modified to handle both two-argument style and the current
format. RAPI doesn't need modification as the way we encode errors is
already using a list for the error arguments, so RAPI users only need to
start checking the list length and the second argument.
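
As a rough illustration of handling both styles on the client side, the
sketch below checks the argument count. The class is a stand-in, and the
function name, message format and the "wrong_input" code are assumptions,
not the actual cli.py code:

```python
class OpPrereqError(Exception):
    """Stand-in for the real exception; only the args handling matters here."""


def format_prereq_error(err):
    """Format an error raised in one-argument or two-argument style."""
    if len(err.args) == 2:
        # New style: (message, error code)
        msg, ecode = err.args
        return "Failure: prerequisites not met (%s): %s" % (ecode, msg)
    # Old style: a single message, no error classification available
    return "Failure: prerequisites not met: %s" % (err,)
```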
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 2 Nov 2009 11:52:49 +0000 (12:52 +0100)]
Remove the OpRetryError exception
This is only used in two places, in an error path that is no longer
valid since Ganeti 2.0. We remove the try..except since we should not
get it anymore (and if we do, then we should catch it in all
config.Update cases) and we remove the exception class completely.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 29 Oct 2009 17:31:26 +0000 (18:31 +0100)]
Activate disks while exporting an instance
Exporting an instance not running or without activated disks
will fail. This patch makes sure to activate disks before
exporting an instance if it's in the ADMIN_down state.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 30 Oct 2009 16:33:18 +0000 (17:33 +0100)]
Epydoc fixes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 30 Oct 2009 13:46:30 +0000 (14:46 +0100)]
backend: Don't overwrite function parameter with loop variable
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 29 Oct 2009 11:42:29 +0000 (12:42 +0100)]
Add QA test for “gnt-node {list,modify,repair}-storage”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 27 Oct 2009 03:56:00 +0000 (12:56 +0900)]
Unify the query fields for the storage framework
This patch unifies the query fields in the storage framework for all
types. Note that the information is still computed on-demand, so if e.g.
the used disk space is not requested for the ‘file’ type, it won't be
computed on nodes.
Summary of changes:
- improve the LVM storage type to support multiple lvm fields in the
LIST_FIELDS declaration and constant (not-computed via lvm commands)
fields
- rename utils.GetFilesystemFreeSpace to utils.GetFilesystemStats
returning tuple of (total, free)
- add used and free as valid fields for lvm-vg (used being computed as
vg_size-vg_free)
- make allocatable accepted for all types (ones which are always
allocatable always return True)
- add a new list field ‘type’ that gives the currently selected type; not
very useful today (except for understanding what the default output
is) but in the future might help if we want to list multiple types
- add type, size and allocatable to the default output field list
- update the man page with details on how, for file storage, size ≠ used
+ free for non-mountpoint cases
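
For illustration only, a function returning a (total, free) tuple can be
sketched with os.statvfs. The snake_case name and the choice of bytes as
the unit are assumptions; the real utils.GetFilesystemStats may differ:

```python
import os


def get_filesystem_stats(path):
    """Return (total, free) in bytes for the filesystem containing path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bavail counts blocks available to unprivileged users
    free = st.f_bavail * st.f_frsize
    return (total, free)
```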
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 29 Oct 2009 17:30:56 +0000 (18:30 +0100)]
Make cluster initialization more reliable
There was a race condition between starting the node daemon
and sending requests to write the ssconf files. With this
patch, the initialization waits up to ten seconds for the
node daemon to become responsive.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 29 Oct 2009 16:06:13 +0000 (17:06 +0100)]
Don't show warnings on ADMIN_down instance failover
Before:
$ gnt-instance failover -f inst1
… checking disk consistency between source and target
… - WARNING: Can't find disk on node node21.example.com
… shutting down instance on source node
After:
$ gnt-instance failover -f inst1
… not checking disk consistency as instance is not running
… shutting down instance on source node
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 29 Oct 2009 10:09:19 +0000 (11:09 +0100)]
Update NEWS
Add rapi_users changes, rearrange a bit and one wording change.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 28 Oct 2009 18:33:54 +0000 (19:33 +0100)]
Add remote API users and passwords documentation
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 28 Oct 2009 17:08:28 +0000 (18:08 +0100)]
ganeti-rapi: Use new function to verify passwords
This enables the use of hashed passwords in rapi_users.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 28 Oct 2009 17:07:53 +0000 (18:07 +0100)]
http.auth: Add new function to verify passwords
This new function supports two schemes for passwords:
- Old-style cleartext passwords
- Hashed passwords according to RFC2617 (H(A1))
Schemes are differentiated by their prefix, a concept also
used in OpenLDAP. Cleartext passwords can no longer start
with an opening brace ("{") unless they're prefixed with
"{cleartext}" (case insensitive).
Currently there's no documentation for rapi_users at all.
It'll be added in a subsequent patch.
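
A hedged sketch of prefix-based scheme selection follows. The "{ha1}"
prefix, the function name and its parameters are assumptions for
illustration; only "{cleartext}" is named above. H(A1) is computed as in
RFC 2617, i.e. MD5 over "username:realm:password":

```python
import hashlib


def verify_password(username, realm, stored, supplied):
    """Check a supplied password against a stored, optionally prefixed, value."""
    scheme = "cleartext"
    value = stored
    if stored.startswith("{"):
        end = stored.find("}")
        if end == -1:
            # Opening brace without a scheme prefix: treat as invalid
            return False
        scheme = stored[1:end].lower()  # prefixes are case insensitive
        value = stored[end + 1:]
    if scheme == "cleartext":
        return value == supplied
    if scheme == "ha1":  # assumed prefix name for the hashed scheme
        a1 = "%s:%s:%s" % (username, realm, supplied)
        return hashlib.md5(a1.encode("utf-8")).hexdigest() == value.lower()
    return False
```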
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 27 Oct 2009 14:24:46 +0000 (15:24 +0100)]
Makefile.am: Add more checks to distcheck-hook
Also use grep only to convert find's output to an exit status.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 27 Oct 2009 02:33:21 +0000 (11:33 +0900)]
Documentation updates
Our admin guide was very very trivial. This patch updates it to contain
advice on when to use which commands, removes the instance
administration part from the installation guide (moved to the admin
guide), and adds a walkthrough document that should be useable as a
starting point for new admins.
The patch also adds emacs variables to the documents, and rewraps some
which were not already at 72 chars.
The doc updates also show backwards-compatible commands for Ganeti 2.0,
as we don't have a good up-to-date 2.0 document and people might refer
to this set of documentation even when running that.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 04:59:38 +0000 (13:59 +0900)]
Fix another style issue
For the Nth time, re-fix shadowing of outer-scope variable :)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 03:24:19 +0000 (12:24 +0900)]
Make gnt-node list-storage more standard
This patch adds support for the -o+field,… format that the other list
commands accept and changes the format of the allocatable field from
simply str(bool) to Y/N.
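
The two behaviours can be sketched roughly as below. The function names
and the default field list passed in are hypothetical, not the actual
gnt-node code:

```python
def select_fields(option, defaults):
    """Resolve an -o value: '+a,b' extends the defaults, 'a,b' replaces them."""
    if not option:
        return list(defaults)
    if option.startswith("+"):
        return list(defaults) + option[1:].split(",")
    return option.split(",")


def format_allocatable(value):
    """Render a boolean as Y/N instead of str(bool)."""
    return "Y" if value else "N"
```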
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 03:13:01 +0000 (12:13 +0900)]
Rename the node storage commands
To reduce confusion, the following gnt-node commands are renamed:
- physical-volumes → list-storage
- modify-volume → modify-storage
- repair-volume → repair-storage
The NEWS file is updated accordingly and it also gets emacs local
variables.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 27 Oct 2009 04:44:55 +0000 (13:44 +0900)]
Fix an error handling case in TLReplaceDisks
pylint is your friend, since the compiler doesn't exist.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 27 Oct 2009 13:14:15 +0000 (14:14 +0100)]
Provide feedback from redistributing configuration
This is particularly useful for “gnt-cluster redist-conf”, but
also for all other cases where the configuration files are
rewritten on other nodes.
$ gnt-cluster redist-conf
… Copy of file /var/lib/ganeti/config.data to node … failed: Error while
executing backend function: [Errno 1] Operation not permitted
… Error while uploading ssconf files to node …: Error while executing backend
function: [Errno 1] Operation not permitted
$ gnt-node modify --offline no --force node3.example.com
… - WARNING: Not enough master candidates (desired 10, new value will be 4)
… Copy of file /var/lib/ganeti/config.data to node node8.example.com failed:
Error while executing backend function: [Errno 1] Operation not permitted
Modified node node3.example.com
- offline -> True
- master_candidate -> auto-demotion due to offline
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 18:22:00 +0000 (19:22 +0100)]
bash_completion: Move common code into function
This reduces the size of the script by about 9 kB.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 11:43:19 +0000 (12:43 +0100)]
Makefile.am: Wrap long lines
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 11:39:05 +0000 (12:39 +0100)]
Include NEWS in documentation again
This was implemented in 350ecfecca and reverted in 700bb84367 after it
broke “make distcheck”. With other changes in this patch series this
will work now.
Contributing to the original problem was that the news.rst file
was not distributed. When we distribute the built documentation,
the source must also be included (see Automake manual).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 11:00:30 +0000 (12:00 +0100)]
Makefile.am: Don't include MAINTAINERCLEANFILES in EXTRA_DIST
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 10:56:57 +0000 (11:56 +0100)]
Makefile.am: Use noinst_DATA instead of all-local target
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 26 Oct 2009 10:48:55 +0000 (11:48 +0100)]
Makefile.am: Make HTML doc building depend on stamp file
This patch also adds an explicit list of all files written by
sphinx (“docoutput”).
By using an explicit list the build process is more predictable
and will allow us to include the NEWS file again.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 23 Oct 2009 10:51:29 +0000 (12:51 +0200)]
Makefile.am: Use dependencies to create symlinks only if necessary
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 23 Oct 2009 10:04:29 +0000 (12:04 +0200)]
Makefile.am: Move stamp-directories to BUILT_SOURCES
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 26 Oct 2009 12:21:30 +0000 (21:21 +0900)]
Fix gnt-debug breakage due to options move
Commits d3ed23f and 4eb6265 broke gnt-debug due to renamed option
targets. Sorry again!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 26 Oct 2009 12:07:47 +0000 (21:07 +0900)]
Fix gnt-node evacuate w. iallocator
Commit 2bb5c911 moved around and changed the _RunAllocator function in
the DiskReplace → TaskLet conversion, but in the process it changed the
relocate_from argument from a list of nodes to just the secondary node.
This breaks the protocol and current iallocator scripts.
This patch fixes that but also adds a local variable 'instance' since
it's not nice to write self.instance so many times.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 23 Oct 2009 14:42:37 +0000 (10:42 -0400)]
InstanceIpToNodePrimaryIpQuery: use a query dict
In 95b487b we changed InstanceIpToNodePrimaryIpQuery to be able to query
multiple instances at once. We also need to be able to query ips
belonging to a specific nic link, so what we do is:
1) Move the "query" argument to a dict, containing different fields
2) Make the "query for a single ip" or "query for a list" options explicit.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 23 Oct 2009 14:29:37 +0000 (10:29 -0400)]
SimpleConfigReader: ips are partitioned by link
We were already half-doing it, but this completes the process.
1) We don't maintain a list of ips or an ip->instance map
2) We add a new link,ip->instance map (link->ips list we had)
3) We add the link parameter to GetInstanceByIp (making it
GetInstanceByLinkIp)
4) We change the GetInstanceByIp caller to pass None as link
(thus for now using only the default link)
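
A simplified sketch of a (link, ip) -> instance map, with None meaning
"use the default link". The class name, constructor arguments and method
name are illustrative assumptions, not the SimpleConfigReader interface:

```python
class LinkIpIndex:
    """Index instances by (link, ip) rather than by ip alone."""

    def __init__(self, default_link, nics):
        # nics: iterable of (instance_name, link_or_None, ip)
        self._default_link = default_link
        self._by_link_ip = {}
        for instance, link, ip in nics:
            key = (link if link is not None else default_link, ip)
            self._by_link_ip[key] = instance

    def get_instance_by_link_ip(self, ip, link=None):
        """Return the instance owning ip on link, or None if unknown."""
        if link is None:
            link = self._default_link
        return self._by_link_ip.get((link, ip))
```

Keying on the pair allows the same address to exist on two different
links without ambiguity.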
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 23 Oct 2009 14:26:56 +0000 (10:26 -0400)]
SimpleConfigReader: queries for default nicparams
GetDefaultNicParams returns the default nic parameters.
GetDefaultNicLink returns the default nic link.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 7 Oct 2009 14:40:46 +0000 (15:40 +0100)]
Use RUN_IN_TEMPDIR in Makefile.am
Since we have this variable and use it in other places, remove the only
leftover hardcoded place.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Thu, 8 Oct 2009 09:12:25 +0000 (10:12 +0100)]
Import errors in confd __init__
It's used by some functions defined there.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Mon, 26 Oct 2009 05:39:49 +0000 (14:39 +0900)]
Allow '@' in tag values
This allows using an email address (as is) as part of a tag. The main
problem that could arise is when parsing tags from a shell script, but
(AFAIK) '@' is not a special character when used in values (happy to be
corrected if not true).
The patch also moves the re to be compiled at class init time, should
use less resources; in my tests it is fine to use a compiled re from
multiple threads.
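
To illustrate compiling the expression once at class definition time, a
small sketch follows. The character set shown is an assumption chosen for
the example and is not the actual Ganeti tag pattern:

```python
import re


class TagChecker:
    """Validates tag values with a pattern compiled once, at class creation."""

    # Hypothetical character set; note that '@' is accepted.
    VALID_TAG_RE = re.compile(r"[\w.+*/:@-]+")

    @classmethod
    def is_valid(cls, tag):
        """Return True if the whole tag consists of allowed characters."""
        return cls.VALID_TAG_RE.fullmatch(tag) is not None
```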
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 26 Oct 2009 05:28:20 +0000 (14:28 +0900)]
Fix gnt-node modify-volume
This was broken by me in 064c21f, sorry!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 26 Oct 2009 05:18:23 +0000 (14:18 +0900)]
gnt-node: add short option -t for --storage-type
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 22 Oct 2009 21:43:31 +0000 (17:43 -0400)]
init script: allow singling out confd as well
Currently we can start/stop the various subdaemons, but not confd.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Michael Hanselmann [Thu, 22 Oct 2009 15:20:16 +0000 (17:20 +0200)]
cmdlib._AssembleInstanceDisks: Fix case where variable wouldn't be set
The “result” variable may not be set and/or come from the previous loop.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 22 Oct 2009 15:15:28 +0000 (17:15 +0200)]
Makefile: Use path from configure script for sphinx-build
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Ken Wehr <ksw@google.com>
Guido Trotter [Thu, 22 Oct 2009 14:52:11 +0000 (10:52 -0400)]
KVM netscript: add static routes, with no suffix
The /32 suffix is useless, since the kernel already assumes single-host,
if no suffix is specified. Moreover we prefer these routes to be
"static" so that routing daemons, if present, won't mess with them.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 22 Oct 2009 12:04:54 +0000 (14:04 +0200)]
gnt-job manpage: Remove detailed description for lock_status
The format changed in the meantime and should be self-explanatory.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Wed, 21 Oct 2009 22:44:07 +0000 (18:44 -0400)]
KVMHypervisor: configure v6 parameters on nic
In routing mode we are tweaking a few parameters on the interface. With
this patch we'll tweak both the v4 and v6 ones.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 21 Oct 2009 22:13:21 +0000 (18:13 -0400)]
KVMHypervisor: implement instance policy routing
Until now we relied on traffic from instances being policy routed via a
rule based on the instance network. With this change we can enforce it
on the instance interfaces. Since the ip rules survive interface
disappearing and reappearing, we need first to remove leftover rules,
and then to apply the new one, when creating the interface.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 21 Oct 2009 21:25:44 +0000 (17:25 -0400)]
Man page for ganeti-confd
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Ken Wehr [Wed, 21 Oct 2009 16:15:47 +0000 (12:15 -0400)]
Adding '--no-ssh-init' option to 'gnt-cluster init'.
Allows the initialization of a cluster without the creation or distribution
of SSH key pairs. Includes changes for LeaveCluster and RPC.
Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Flavio Silvestrow [Wed, 21 Oct 2009 19:01:01 +0000 (20:01 +0100)]
confd: query the pnode of multiple instances at once
Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Thu, 22 Oct 2009 06:49:23 +0000 (15:49 +0900)]
Try to reduce wrong errors in InstanceShutdown
In backend.InstanceShutdown(), there is a race condition between
checking that the instance exists and trying to shut it down which
sometimes translates into error messages like:
Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed
to force stop instance instance9: Failed to stop instance instance9:
exited with exit code 1, Error: Domain 'instance9' does not exist.
To fix this, we ignore any hypervisor StopInstance() errors if the
instance doesn't exist anymore, since our purpose (to make the instance
go away) is already accomplished.
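
In outline, the idea could be sketched as follows. HypervisorError and
the stop_instance/list_instances method names are hypothetical stand-ins
for the real hypervisor interface:

```python
class HypervisorError(Exception):
    """Stand-in for the hypervisor layer's error class."""


def shutdown_instance(hypervisor, name):
    """Stop an instance, ignoring stop errors if it has already disappeared."""
    try:
        hypervisor.stop_instance(name)
    except HypervisorError:
        if name in hypervisor.list_instances():
            # Still running, so the failure is genuine
            raise
        # Instance is gone; the goal is already accomplished
```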
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 22 Oct 2009 06:36:40 +0000 (15:36 +0900)]
Revert breakage introduced in e4e9b80
Commit e4e9b8064787df01a79846a40f49c8ae06a8eb0e introduced two problems
in backend.InstanceShutdown():
- first, it reduced the check interval significantly (especially for the
first few checks); there are very few production VMs that shutdown in
one second, and while not breaking anything this creates unnecessary
load for the hypervisor
- second, a wrong test added to the while condition (“not tried_once”)
means that we only sleep once for an instance, and after that we
immediately kill it forcefully
These two together mean that any instance which is not lucky enough to
finish in roughly 1-1.5 seconds (the time it takes to sleep and verify
again the instance list) will have this happen:
2009-10-21 23:33:46,034: pid=16634 INFO Called for inst9 w. False/False
2009-10-21 23:33:47,440: pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
2009-10-21 23:33:47,440: pid=16634 INFO Called for inst9 w. True/False
The “Called…” are logs from the hypervisor shutdown function. This means
of course that at restart time:
[12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
[12775866.644689] EXT3-fs: write access will be enabled during recovery.
[12775868.533674] kjournald starting. Commit interval 5 seconds
[12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
[12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
[12775868.551803] EXT3-fs: recovery complete.
[12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.
This patch reverts the broken test and changes the sleep to a fixed
duration of five seconds, since it makes no sense to check that often
for shutdown (and after ~20 seconds we anyway reach a stable value of
five seconds).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 22 Oct 2009 06:07:33 +0000 (15:07 +0900)]
Xen: Ignore the retry argument in stop instance
Commit 4ad4511 changed the KVM hypervisor to send multiple shutdown
requests to the monitor, but it didn't change this for the Xen
hypervisor. We simply remove the return on retry model, since we do want
to send multiple shutdown signals for both Xen and KVM (even if the
behaviour is not perfect, they should behave the same).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 20 Oct 2009 16:17:12 +0000 (18:17 +0200)]
Ensure RpcResult has “payload” attribute
Also add assertions to avoid missing attributes in the future.
They won't be included in optimized bytecode.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 20 Oct 2009 20:28:49 +0000 (16:28 -0400)]
Fix typo in install.rst
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Tue, 20 Oct 2009 15:49:09 +0000 (11:49 -0400)]
install.rst: mention xen config for live migration
This addresses issue 75.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 19 Oct 2009 16:48:47 +0000 (18:48 +0200)]
Bump version to 2.1.0~beta2
I forgot to bump the configure.ac version before tagging the 2.1.0~beta1
release. Since we cannot remove old tags (see “On Re-tagging” in git-tag(1)),
we have to call this release 2.1.0~beta2.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Mon, 19 Oct 2009 05:12:45 +0000 (14:12 +0900)]
Introduce checks for /sys and /proc
This patch adds checks for /proc and /sys in cluster verify, since
Ganeti relies on these special filesystems to be mounted.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 16:23:23 +0000 (18:23 +0200)]
Fix serializer unittests
Commit d22b29997cd broke the serializer unittests with certain
versions of simplejson. This patch removes sort_keys again
and implements a slightly more efficient way of detecting
simplejson functionality. The serializer unittests no longer
use a partially broken mock, but rather a function to convert all
tuples to lists before comparing.
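
Such a conversion helper might look roughly like this (the function name
is an assumption). JSON has no tuple type, so data that went through a
dump/load cycle comes back with lists where tuples were:

```python
def tuples_to_lists(value):
    """Recursively replace tuples with lists in nested lists and dicts."""
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value
```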
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 14:32:00 +0000 (16:32 +0200)]
cfgupgrade: Implement upgrade to 2.1.0
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 14:30:44 +0000 (16:30 +0200)]
bootstrap: Factorize HMAC key generation
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 14:21:26 +0000 (16:21 +0200)]
Make bootstrap._GenerateSelfSignedSslCert public
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 14:33:31 +0000 (16:33 +0200)]
cfgupgrade: Remove Ganeti 1.2 support
This also fixes a few typos.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 16 Oct 2009 13:56:21 +0000 (15:56 +0200)]
serializer: Sort keys in JSON
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 12 Oct 2009 15:38:47 +0000 (17:38 +0200)]
Bump version to 2.1.0~beta0
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 15 Oct 2009 12:39:14 +0000 (14:39 +0200)]
mcpu: Use new timeout class for timeout
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 15 Oct 2009 12:38:31 +0000 (14:38 +0200)]
locking: Convert pipe condition to new timeout class
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 15 Oct 2009 12:32:58 +0000 (14:32 +0200)]
locking.LockSet: Move timeout calculation to separate class
This class can also be used by mcpu.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 17:23:58 +0000 (19:23 +0200)]
locking, mcpu: Ensure timeout is always >= 0.0
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:34:04 +0000 (18:34 +0200)]
locking.LockSet: Improve assertions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:33:47 +0000 (18:33 +0200)]
locking: Factorize LockSet.acquire
By moving the main code of LockSet.acquire to its own function
we reduce the code complexity a bit and clarify the exception
handling.
This also fixes a case where a lock acquire timeout wasn't
handled correctly, leading to obscure error messages.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:31:30 +0000 (18:31 +0200)]
mcpu: Make sure added locks are released on errors
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:30:40 +0000 (18:30 +0200)]
Test LockSet.acquire return value for timeout
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:30:19 +0000 (18:30 +0200)]
opcodes: Add missing shutdown_timeout to OpRemoveInstance
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 13 Oct 2009 16:29:56 +0000 (18:29 +0200)]
luxi: Pass socket path directly to exception, not in tuple
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 13 Oct 2009 12:26:49 +0000 (13:26 +0100)]
gnt-* use the correct opcode slot to build opcodes
gnt-* scripts were building wrong opcodes for commands which had the
shutdown_timeout slot (due to missing testing after renaming). Fixing.
Also change SHUTDOWN_TIMEOUT_OPT dest field name to "shutdown_timeout":
it was set to "timeout". It would still work that way, but possibly be
confusing.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 13 Oct 2009 11:22:35 +0000 (12:22 +0100)]
Update NEWS for instance shutdown timeout
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 13 Oct 2009 12:12:57 +0000 (14:12 +0200)]
Update documentation for recreate-disks
This also clarifies the UUIDs NEWS entry.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 13 Oct 2009 12:01:20 +0000 (14:01 +0200)]
rapi: fix tag operations
This patch fixes the tag PUT/DELETE operations, and additionally changes
the _Tags_* functions to take only positional and not keyword arguments
(the defaults do not make any sense at all, and they are always called
with all arguments).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 12 Oct 2009 15:18:37 +0000 (17:18 +0200)]
Update NEWS for Ganeti 2.1
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 12 Oct 2009 15:19:25 +0000 (17:19 +0200)]
Convert NEWS to ASCII
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 12 Oct 2009 18:05:48 +0000 (19:05 +0100)]
Update manpages for --shutdown-timeout
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Mon, 12 Oct 2009 11:49:50 +0000 (12:49 +0100)]
Add timeout options to other LUs
All the LUs that shut down the instance need to be able to pass the
timeout parameter as well.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Mon, 12 Oct 2009 15:43:55 +0000 (16:43 +0100)]
cli: add SHUTDOWN_TIMEOUT_OPT
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 12 Oct 2009 10:46:22 +0000 (12:46 +0200)]
mcpu: Change lock attempt timeout calculation
With this patch all timeouts are pre-calculated. The interface of
the _LockTimeoutStrategy class is also changed a bit; NextAttempt
now returns a new instance.
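The interface change can be pictured with the sketch below. It is not the
mcpu code: the class name, constructor and CurrentTimeout method are
hypothetical, and only the idea of NextAttempt returning a new instance over
pre-calculated timeouts is taken from the message above.

```python
class LockTimeoutStrategy(object):
    """Hypothetical stand-in for mcpu._LockTimeoutStrategy."""

    __slots__ = ["_timeouts"]

    def __init__(self, timeouts):
        # All timeouts are calculated up front and never modified afterwards.
        self._timeouts = tuple(timeouts)

    def NextAttempt(self):
        """Return a new strategy object for the following attempt."""
        return LockTimeoutStrategy(self._timeouts[1:])

    def CurrentTimeout(self):
        """Return the timeout for this attempt, or None once exhausted."""
        if self._timeouts:
            return self._timeouts[0]
        return None
```

Because every attempt gets its own object, a caller holding an older
strategy still sees the timeout that belonged to its attempt.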
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 7 Oct 2009 16:15:25 +0000 (18:15 +0200)]
Code and docstring style fixes
Found using pylint and epydoc.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 7 Oct 2009 14:11:22 +0000 (16:11 +0200)]
mcpu: Improve lock reporting with timeouts
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 7 Oct 2009 12:58:06 +0000 (14:58 +0200)]
mcpu: Implement lock timeouts
The timeout is always between ~0.1 and ~10.0 seconds. A small
variation of ±5% is added to prevent different jobs from
fighting each other. After 10 attempts to acquire the locks with
a timeout, a blocking acquire is made.
Lock status reporting will be improved in a separate patch.
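A minimal sketch of the scheme described above, not the code from this
patch. The function names are hypothetical, and the geometric growth from
0.1 to 10.0 seconds is an assumption; the message only states the range, the
5% variation and the fallback to a blocking acquire after 10 attempts.

```python
import random


def calc_timeouts(min_timeout=0.1, max_timeout=10.0, attempts=10):
    """Return one timeout per attempt, growing from min to max.

    Assumption: geometric growth; the real curve may differ.
    """
    ratio = (max_timeout / min_timeout) ** (1.0 / (attempts - 1))
    return [min_timeout * ratio ** i for i in range(attempts)]


def add_variation(timeout, variation=0.05):
    """Scale the timeout by a random factor within +/- variation."""
    return timeout * random.uniform(1.0 - variation, 1.0 + variation)


def acquire_with_retries(try_acquire, timeouts):
    """Call try_acquire(timeout) once per timeout, then once blocking.

    try_acquire is a hypothetical callable returning True on success;
    a timeout of None means "block until the locks are acquired".
    """
    for timeout in timeouts:
        if try_acquire(add_variation(timeout)):
            return True
    return try_acquire(None)
```

The random variation keeps jobs that started at the same moment from
retrying in lockstep.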
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 5 Oct 2009 14:16:15 +0000 (16:16 +0200)]
mcpu: Remove unused exclusive_BGL attribute
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 9 Oct 2009 11:39:17 +0000 (13:39 +0200)]
locking.LockSet: Implement acquire timeouts
The timeout passed to LockSet.acquire() is measured over all lock acquires. If
LockSet.acquire fails to acquire all requested locks within the specified
amount of time, all locks are released again and the acquire fails.
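The behaviour can be sketched with plain threading locks. This is an
illustration only, assuming Python 3 (Lock.acquire with a timeout argument,
time.monotonic); acquire_all is a hypothetical helper, not the LockSet
implementation.

```python
import threading
import time


def acquire_all(locks, timeout):
    """Acquire every lock within one overall timeout, or none of them."""
    deadline = time.monotonic() + timeout
    acquired = []
    for lock in locks:
        # The budget is shared: each acquire only gets the time left over.
        remaining = deadline - time.monotonic()
        if remaining <= 0 or not lock.acquire(timeout=remaining):
            # Give back everything taken so far before reporting failure.
            for held in reversed(acquired):
                held.release()
            return False
        acquired.append(lock)
    return True
```

For example, acquire_all([lock_a, lock_b], 1.0) returns False and leaves
lock_a free if lock_b stays held elsewhere for the whole second.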
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 9 Oct 2009 13:03:39 +0000 (14:03 +0100)]
Update gnt-instance(8) for shutdown --timeout
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 9 Oct 2009 10:52:38 +0000 (11:52 +0100)]
Accept shutdown timeout from the user
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than
hardcoding it in the function
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 9 Oct 2009 10:52:02 +0000 (11:52 +0100)]
cli: add a timeout option
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 9 Oct 2009 11:04:07 +0000 (12:04 +0100)]
ChrootManager: clean StopInstance
Currently it has lots of duplicated code, and internal retries.
Clean it up with the following assumptions:
- We'll probably be called more than once.
- It is ok to fail to stop, unless we're called with force=True.
- If we're called only once, and with force=True, it's ok not to run the
chroot "cleanup" script (it's a destroy after all, why should chroots
have more chances than other instances?).
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Fri, 9 Oct 2009 10:49:54 +0000 (11:49 +0100)]
KVMHypervisor: use the StopInstance retry feature
Since we know StopInstance is going to be called more than once (at
least twice, once with force and once without, but normally quite a lot
more) we don't need our own sleep/loop, and we can just send one monitor
command per call.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Fri, 9 Oct 2009 10:35:34 +0000 (11:35 +0100)]
backend.InstanceShutdown: small cleanup
1) unhardcode the timeout, abstracting it in a constant
2) Use time.time() rather than hiding the timeout in a range()
3) call hyper.StopInstance multiple times
-- currently all hypervisors just ignore all calls but one
4) Use hyper.ListInstances() rather than GetInstanceList([hv_name])
-- it's cheaper :)
5) Change the final message to "forcing" from "using destroy"
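The loop shape from points 2) and 3) might look roughly like the sketch
below. It is not the backend code: shutdown_instance, its parameters and
FakeHypervisor are hypothetical, and the StopInstance signature is assumed
for the example. Only the time.time() deadline, the repeated StopInstance
calls, ListInstances() and the final forced stop come from the list above.

```python
import time


def shutdown_instance(hyper, name, timeout=120, poll_interval=5,
                      sleep_fn=time.sleep):
    """Ask the hypervisor to stop an instance until a deadline, then force."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if name not in hyper.ListInstances():
            return True
        # Called on every iteration; a hypervisor may ignore the repeats.
        hyper.StopInstance(name)
        sleep_fn(poll_interval)
    # Timeout expired: force the instance down.
    hyper.StopInstance(name, force=True)
    return name not in hyper.ListInstances()


class FakeHypervisor(object):
    """Test double; stops the instance after a number of soft requests."""

    def __init__(self, stops_after):
        self.running = set(["inst1"])
        self.soft_calls = 0
        self.forced = False
        self._stops_after = stops_after

    def ListInstances(self):
        return sorted(self.running)

    def StopInstance(self, name, force=False):
        if force:
            self.forced = True
            self.running.discard(name)
            return
        self.soft_calls += 1
        if (self._stops_after is not None and
                self.soft_calls >= self._stops_after):
            self.running.discard(name)
```

Comparing against a wall-clock deadline keeps the total wait bounded by the
timeout even when individual hypervisor calls are slow, which a fixed
range() of iterations does not guarantee.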
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Fri, 9 Oct 2009 10:17:57 +0000 (11:17 +0100)]
Add default instance shutdown timeout constant
It reflects the "current" two minutes we give to the instance.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>