ganeti-local
Michael Hanselmann [Fri, 30 Oct 2009 16:36:59 +0000 (17:36 +0100)]
Add generic retry loop function

There are quite a few retry loops with timeouts in Ganeti's
code. Duplicating code is not good, so this patch introduces
a new function named “utils.Retry” to remedy this situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
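
The commit does not show the signature of utils.Retry. As a rough sketch, assuming the wrapped function signals "try again" by raising an exception, a generic retry loop with a deadline could look like this. The names RetryAgain and RetryTimeout and the parameters delay, timeout and args are hypothetical, not taken from Ganeti:

```python
import time


class RetryAgain(Exception):
    """Raised by the wrapped function to request another attempt."""


class RetryTimeout(Exception):
    """Raised when the deadline passes without a successful attempt."""


def Retry(fn, delay, timeout, args=(), _time_fn=time.time, _sleep_fn=time.sleep):
    """Call fn(*args) until it returns without raising RetryAgain.

    Waits `delay` seconds between attempts and gives up after `timeout`
    seconds by raising RetryTimeout. Any other exception propagates.
    """
    deadline = _time_fn() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        remaining = deadline - _time_fn()
        if remaining <= 0:
            raise RetryTimeout("Timeout of %s seconds expired" % timeout)
        # Never sleep past the deadline
        _sleep_fn(min(delay, remaining))
```

Injecting the clock and sleep functions keeps the loop testable without any real waiting.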

Michael Hanselmann [Tue, 3 Nov 2009 10:29:30 +0000 (11:29 +0100)]
Ignore log messages in unittests

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 27 Oct 2009 07:54:24 +0000 (16:54 +0900)]
Some improvements to gnt-node repair-storage

Currently, repair-storage has two issues:

- down instances abort the operation, even though they should be
  ignored (it's not technically possible to know their disk status
  without activating their disks)
- if the VG is so broken that disks cannot be activated via gnt-instance
  activate-disks or gnt-instance startup, it's not possible to repair
  the VG at all

The patch makes the opcode skip down instances and also introduces an
``--ignore-consistency`` flag for forcing the execution of the LU.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 05:55:15 +0000 (14:55 +0900)]
Convert the rest of the OpPrereqError users

This finishes the conversion of OpPrereqError creation to two-argument
style. Any leftover one-argument uses do not break anything; they just
lose information about the errors.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 05:27:46 +0000 (14:27 +0900)]
Add ecode to rpc.py's RpcResult.Raise()

This patch adds a new ecode argument to RpcResult.Raise(). This allows
specifying the error code (for both OpExec and OpPrereq errors).

Note that this patch also makes the OpExecError exceptions raised from
_FindFaultInstanceDisks have the error code classification.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 05:15:53 +0000 (14:15 +0900)]
Introduce two-argument style for OpPrereqError

This patch introduces a two-argument style for OpPrereqError. Only the
direct raise calls in cmdlib.py are converted, other users will follow.

cli.py is modified to handle both two-argument style and the current
format. RAPI doesn't need modification as the way we encode errors is
already using a list for the error arguments, so RAPI users only need to
start checking the list length and the second argument.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
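
To illustrate the two-argument style, here is a minimal sketch of how a caller such as cli.py might format both the new (message, error_code) form and the old one-argument form. The error-code constants and the output format are assumptions for illustration only:

```python
# Hypothetical error classifications; the real constants are not listed here.
ECODE_INVAL = "wrong_input"
ECODE_STATE = "wrong_state"


class OpPrereqError(Exception):
    """Prerequisite failure: args are (msg,) or (msg, error_code)."""


def FormatPrereqError(err):
    """Render an OpPrereqError written in either argument style."""
    if len(err.args) == 2:
        msg, ecode = err.args
        return "%s (error type: %s)" % (msg, ecode)
    if err.args:
        # Old one-argument style: still works, but carries no classification
        return str(err.args[0])
    return "Unknown prerequisite error"
```

Because the arguments are just a sequence (err.args), a client that only receives them as an encoded list can apply the same length check.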

Iustin Pop [Mon, 2 Nov 2009 11:52:49 +0000 (12:52 +0100)]
Remove the OpRetryError exception

This is only used in two places, in an error path that is no longer
valid since Ganeti 2.0. We remove the try..except since we should not
get it anymore (and if we do, then we should catch it in all
config.Update cases) and we remove the exception class completely.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 29 Oct 2009 17:31:26 +0000 (18:31 +0100)]
Activate disks while exporting an instance

Exporting an instance not running or without activated disks
will fail. This patch makes sure to activate disks before
exporting an instance if it's in the ADMIN_down state.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 30 Oct 2009 16:33:18 +0000 (17:33 +0100)]
Epydoc fixes

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Fri, 30 Oct 2009 13:46:30 +0000 (14:46 +0100)]
backend: Don't overwrite function parameter with loop variable

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 29 Oct 2009 11:42:29 +0000 (12:42 +0100)]
Add QA test for “gnt-node {list,modify,repair}-storage”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 27 Oct 2009 03:56:00 +0000 (12:56 +0900)]
Unify the query fields for the storage framework

This patch unifies the query fields in the storage framework for all
types. Note that the information is still computed on-demand, so if e.g.
the used disk space is not requested for the ‘file’ type, it won't be
computed on nodes.

Summary of changes:
- improve the LVM storage type to support multiple lvm fields in the
  LIST_FIELDS declaration and constant (not-computed via lvm commands)
  fields
- rename utils.GetFilesystemFreeSpace to utils.GetFilesystemStats
  returning tuple of (total, free)
- add used and free as valid fields for lvm-vg (used being computed as
  vg_size - vg_free)
- make allocatable a valid field for all types (types which are always
  allocatable always return True)
- add a new list field ‘type’ that gives the currently selected type; not
  very useful today (except for understanding what the default output
  is), but it might help in the future if we want to list multiple types
- add type, size and allocatable to the default output field list
- update the man page with details on how, for file storage, size ≠ used
  + free for non-mountpoint cases

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
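
A sketch of the on-demand idea, assuming os.statvfs as the data source and MiB as the unit (neither is stated in the commit): GetFilesystemStats returns (total, free), the lvm-vg used value is derived as vg_size - vg_free, and QueryStorage, a hypothetical helper, evaluates only the fields that were requested:

```python
import os

_MIB = 1024 * 1024


def GetFilesystemStats(path):
    """Return (total, free) in MiB for the filesystem containing path."""
    st = os.statvfs(path)
    total = (st.f_blocks * st.f_frsize) // _MIB
    # f_bavail: blocks available to unprivileged users
    free = (st.f_bavail * st.f_frsize) // _MIB
    return total, free


def LvmVgUsed(vg_size, vg_free):
    """'used' for lvm-vg is derived rather than queried from LVM."""
    return vg_size - vg_free


def QueryStorage(fields, sources):
    """Evaluate only the requested fields.

    sources maps a field name to a zero-argument callable; fields that are
    not requested are never computed.
    """
    return [sources[name]() for name in fields]
```

Since free space here excludes reserved blocks, total is not necessarily equal to used + free, which is in the spirit of the man page note above.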

Michael Hanselmann [Thu, 29 Oct 2009 17:30:56 +0000 (18:30 +0100)]
Make cluster initialization more reliable

There was a race condition between starting the node daemon
and sending requests to write the ssconf files. With this
patch, the initialization waits up to ten seconds for the
node daemon to become responsive.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 29 Oct 2009 16:06:13 +0000 (17:06 +0100)]
Don't show warnings on ADMIN_down instance failover

Before:
$ gnt-instance failover -f inst1
… checking disk consistency between source and target
… - WARNING: Can't find disk on node node21.example.com
… shutting down instance on source node

After:
$ gnt-instance failover -f inst1
… not checking disk consistency as instance is not running
… shutting down instance on source node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 29 Oct 2009 10:09:19 +0000 (11:09 +0100)]
Update NEWS

Add rapi_users changes, rearrange a bit and make one wording change.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 28 Oct 2009 18:33:54 +0000 (19:33 +0100)]
Add remote API users and passwords documentation

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 28 Oct 2009 17:08:28 +0000 (18:08 +0100)]
ganeti-rapi: Use new function to verify passwords

This enables the use of hashed passwords in rapi_users.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 28 Oct 2009 17:07:53 +0000 (18:07 +0100)]
http.auth: Add new function to verify passwords

This new function supports two schemes for passwords:
- Old-style cleartext passwords
- Hashed passwords according to RFC2617 (H(A1))

Schemes are differentiated by their prefix, a concept also
used in OpenLDAP. Cleartext passwords can no longer start
with an opening brace ("{") unless they're prefixed with
"{cleartext}" (case insensitive).

Currently there's no documentation for rapi_users at all.
It will be added in a subsequent patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
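
A minimal sketch of prefix-based password checking, assuming the stored value is plain cleartext, "{cleartext}<password>" or "{HA1}<hex digest>" (the HA1 prefix name and the realm argument are assumptions here). H(A1) follows RFC 2617, i.e. the MD5 hex digest of username:realm:password:

```python
import hashlib
import hmac
import re

# "{scheme}value"; scheme names are compared case-insensitively
_SCHEME_RE = re.compile(r"^\{([^}]*)\}(.*)$", re.DOTALL)


def _Equal(a, b):
    # Constant-time comparison on bytes (also works for non-ASCII input)
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def VerifyPassword(username, password, expected, realm):
    """Check a submitted password against a stored rapi_users entry."""
    if not expected.startswith("{"):
        # Old-style cleartext entry without a prefix
        return _Equal(password, expected)
    m = _SCHEME_RE.match(expected)
    if not m:
        # A leading "{" must introduce a scheme prefix
        return False
    scheme, value = m.group(1).lower(), m.group(2)
    if scheme == "cleartext":
        return _Equal(password, value)
    if scheme == "ha1":
        # RFC 2617: H(A1) = MD5(username ":" realm ":" password)
        a1 = "%s:%s:%s" % (username, realm, password)
        ha1 = hashlib.md5(a1.encode("utf-8")).hexdigest()
        return _Equal(ha1, value.lower())
    # Unknown scheme: reject instead of guessing
    return False
```

Rejecting any unparsable entry that starts with "{" is what makes the "{cleartext}" prefix necessary for such passwords.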

Michael Hanselmann [Tue, 27 Oct 2009 14:24:46 +0000 (15:24 +0100)]
Makefile.am: Add more checks to distcheck-hook

Also use grep only to convert find's output to an exit status.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 27 Oct 2009 02:33:21 +0000 (11:33 +0900)]
Documentation updates

Our admin guide was very very trivial. This patch updates it to contain
advice on when to use which commands, removes the instance
administration part from the installation guide (moved to the admin
guide), and adds a walkthrough document that should be usable as a
starting point for new admins.

The patch also adds emacs variables to the documents, and rewraps some
which were not already at 72 chars.

The doc updates also show backwards-compatible commands for Ganeti 2.0,
as we don't have a good up-to-date 2.0 document and people might refer
to this set of documentation even when running that.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 04:59:38 +0000 (13:59 +0900)]
Fix another style issue

For the Nth time, re-fix shadowing of outer-scope variable :)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 03:24:19 +0000 (12:24 +0900)]
Make gnt-node list-storage more standard

This patch adds support for the -o+field,… format that the other list
commands accept and changes the format of the allocatable field from
simply str(bool) to Y/N.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
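
The -o+field convention can be sketched as follows: a plain comma-separated list replaces the default fields, while a leading "+" appends to them. ParseFields and FormatBool are illustrative names, not necessarily the functions used by gnt-node:

```python
def ParseFields(selected, default):
    """Parse an -o value: "a,b" replaces the defaults, "+a,b" extends them."""
    if selected is None:
        return list(default)
    if selected.startswith("+"):
        return list(default) + selected[1:].split(",")
    return selected.split(",")


def FormatBool(value):
    """Render booleans as Y/N instead of str(bool)."""
    return "Y" if value else "N"
```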

Iustin Pop [Tue, 27 Oct 2009 03:13:01 +0000 (12:13 +0900)]
Rename the node storage commands

To reduce confusion, the following gnt-node commands are renamed:

- physical-volumes → list-storage
- modify-volume → modify-storage
- repair-volume → repair-storage

The NEWS file is updated accordingly and it also gets emacs local
variables.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 27 Oct 2009 04:44:55 +0000 (13:44 +0900)]
Fix an error handling case in TLReplaceDisks

pylint is your friend, since the compiler doesn't exist.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 27 Oct 2009 13:14:15 +0000 (14:14 +0100)]
Provide feedback from redistributing configuration

This is particularly useful for “gnt-cluster redist-conf”, but
also for all other cases where the configuration files are
rewritten on other nodes.

$ gnt-cluster redist-conf
… Copy of file /var/lib/ganeti/config.data to node … failed: Error while
executing backend function: [Errno 1] Operation not permitted
… Error while uploading ssconf files to node …: Error while executing backend
function: [Errno 1] Operation not permitted

$ gnt-node modify --offline no --force node3.example.com
… - WARNING: Not enough master candidates (desired 10, new value will be 4)
… Copy of file /var/lib/ganeti/config.data to node node8.example.com failed:
Error while executing backend function: [Errno 1] Operation not permitted
Modified node node3.example.com
 - offline -> True
 - master_candidate -> auto-demotion due to offline

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 18:22:00 +0000 (19:22 +0100)]
bash_completion: Move common code into function

This reduces the size of the script by about 9 kB.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 11:43:19 +0000 (12:43 +0100)]
Makefile.am: Wrap long lines

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 11:39:05 +0000 (12:39 +0100)]
Include NEWS in documentation again

This was implemented in 350ecfecca and reverted in 700bb84367
after it broke “make distcheck”. With the other changes in this
patch series, it works now.

Contributing to the original problem was that the news.rst file
was not distributed. When we distribute the built documentation,
the source must also be included (see Automake manual).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 11:00:30 +0000 (12:00 +0100)]
Makefile.am: Don't include MAINTAINERCLEANFILES in EXTRA_DIST

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 10:56:57 +0000 (11:56 +0100)]
Makefile.am: Use noinst_DATA instead of all-local target

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Mon, 26 Oct 2009 10:48:55 +0000 (11:48 +0100)]
Makefile.am: Make HTML doc building depend on stamp file

This patch also adds an explicit list of all files written by
sphinx (“docoutput”).

By using an explicit list the build process is more predictable
and will allow us to include the NEWS file again.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 23 Oct 2009 10:51:29 +0000 (12:51 +0200)]
Makefile.am: Use dependencies to create symlinks only if necessary

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Fri, 23 Oct 2009 10:04:29 +0000 (12:04 +0200)]
Makefile.am: Move stamp-directories to BUILT_SOURCES

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Mon, 26 Oct 2009 12:21:30 +0000 (21:21 +0900)]
Fix gnt-debug breakage due to options move

Commits d3ed23f and 4eb6265 broke gnt-debug due to renamed option
targets. Sorry again!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 26 Oct 2009 12:07:47 +0000 (21:07 +0900)]
Fix gnt-node evacuate w. iallocator

Commit 2bb5c911 moved around and changed the _RunAllocator function in
the DiskReplace → TaskLet conversion, but in the process it changed the
relocate_from argument from a list of nodes to just the secondary node.
This breaks the protocol and current iallocator scripts.

This patch fixes that but also adds a local variable 'instance' since
it's not nice to write self.instance so many times.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Fri, 23 Oct 2009 14:42:37 +0000 (10:42 -0400)]
InstanceIpToNodePrimaryIpQuery: use a query dict

In 95b487b we changed InstanceIpToNodePrimaryIpQuery to be able to query
multiple instances at once. We also need to be able to query ips
belonging to a specific nic link, so what we do is:

1) Move the "query" argument to a dict, containing different fields
2) Make the "query for a single ip" and "query for a list" options explicit.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Fri, 23 Oct 2009 14:29:37 +0000 (10:29 -0400)]
SimpleConfigReader: ips are partitioned by link

We were already half-doing it, but this completes the process.

1) We no longer maintain a list of ips or an ip->instance map
2) We add a new link,ip->instance map (link->ips list we had)
3) We add the link parameter to GetInstanceByIp (making it
   GetInstanceByLinkIp)
4) We change the GetInstanceByIp caller to pass None as link
   (thus for now using only the default link)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
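
A sketch of the link,ip->instance map, assuming that a NIC without an explicit link belongs to the default link and that passing None as the link selects that default, as in point 4. The class name, the input layout and the argument order of GetInstanceByLinkIp are hypothetical:

```python
class LinkIpIndex:
    """Map (nic link, ip) -> instance name; a None link means the default."""

    def __init__(self, instances, default_link):
        # instances: {name: [(link or None, ip or None), ...]}
        self._default_link = default_link
        self._map = {}
        for name, nics in instances.items():
            for link, ip in nics:
                if ip is None:
                    continue
                self._map[(link or default_link, ip)] = name

    def GetInstanceByLinkIp(self, ip, link):
        """Return the instance owning ip on link, or None if unknown."""
        if link is None:
            link = self._default_link
        return self._map.get((link, ip))
```

Keying on the (link, ip) pair is what allows the same address to be reused on different links without collisions.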

Guido Trotter [Fri, 23 Oct 2009 14:26:56 +0000 (10:26 -0400)]
SimpleConfigReader: queries for default nicparams

GetDefaultNicParams returns the default nic parameters.
GetDefaultNicLink returns the default nic link.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 7 Oct 2009 14:40:46 +0000 (15:40 +0100)]
Use RUN_IN_TEMPDIR in Makefile.am

Since we have this variable and use it in other places, remove the only
leftover hardcoded place.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Guido Trotter [Thu, 8 Oct 2009 09:12:25 +0000 (10:12 +0100)]
Import errors in confd __init__

It's used by some functions defined there.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Iustin Pop [Mon, 26 Oct 2009 05:39:49 +0000 (14:39 +0900)]
Allow '@' in tag values

This allows using an email address (as is) as part of a tag. The main
problem that could arise is when parsing tags from a shell script, but
(AFAIK) '@' is not a special character when used in values (happy to be
corrected if not true).

The patch also moves the re to be compiled at class init time, which
should use fewer resources; in my tests, using a compiled re from
multiple threads works fine.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
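
A sketch of a class-level, compile-once tag pattern that accepts '@'. The exact set of allowed characters is an assumption; only the addition of '@' and the compile-once placement come from the commit:

```python
import re


class TaggableObject:
    # Compiled once, when the class is defined, and shared by all instances;
    # the commit reports that sharing a compiled re across threads works.
    # Assumed character set; '@' lets tags contain e-mail addresses.
    VALID_TAG_RE = re.compile(r"[\w.+*/:@-]+")

    @classmethod
    def IsValidTag(cls, tag):
        """Return True if the whole tag consists of allowed characters."""
        return cls.VALID_TAG_RE.fullmatch(tag) is not None
```

Using fullmatch rather than a "^...$" pattern avoids accepting a tag with a trailing newline.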

Iustin Pop [Mon, 26 Oct 2009 05:28:20 +0000 (14:28 +0900)]
Fix gnt-node modify-volume

This was broken by me in 064c21f, sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 26 Oct 2009 05:18:23 +0000 (14:18 +0900)]
gnt-node: add short option -t for --storage-type

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 22 Oct 2009 21:43:31 +0000 (17:43 -0400)]
init script: allow singling out confd as well

Currently we can start/stop the various subdaemons, but not confd.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Michael Hanselmann [Thu, 22 Oct 2009 15:20:16 +0000 (17:20 +0200)]
cmdlib._AssembleInstanceDisks: Fix case where variable wouldn't be set

The “result” variable may not be set and/or come from the previous loop.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 22 Oct 2009 15:15:28 +0000 (17:15 +0200)]
Makefile: Use path from configure script for sphinx-build

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Ken Wehr <ksw@google.com>

Guido Trotter [Thu, 22 Oct 2009 14:52:11 +0000 (10:52 -0400)]
KVM netscript: add static routes, with no suffix

The /32 suffix is useless, since the kernel already assumes a single host
if no suffix is specified. Moreover, we prefer these routes to be
"static" so that routing daemons, if present, won't mess with them.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 22 Oct 2009 12:04:54 +0000 (14:04 +0200)]
gnt-job manpage: Remove detailed description for lock_status

The format changed in the meantime and should be self-explanatory.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Wed, 21 Oct 2009 22:44:07 +0000 (18:44 -0400)]
KVMHypervisor: configure v6 parameters on nic

In routing mode we are tweaking a few parameters on the interface. With
this patch we'll tweak both the v4 and v6 ones.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 21 Oct 2009 22:13:21 +0000 (18:13 -0400)]
KVMHypervisor: implement instance policy routing

Until now we relied on traffic from instances being policy routed via a
rule based on the instance network. With this change we can enforce it
on the instance interfaces. Since the ip rules survive the interface
disappearing and reappearing, when creating the interface we first need
to remove any leftover rules and then apply the new one.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 21 Oct 2009 21:25:44 +0000 (17:25 -0400)]
Man page for ganeti-confd

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Ken Wehr [Wed, 21 Oct 2009 16:15:47 +0000 (12:15 -0400)]
Adding '--no-ssh-init' option to 'gnt-cluster init'.

Allows the initialization of a cluster without the creation or distribution
of SSH key pairs. Includes changes for LeaveCluster and RPC.

Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Flavio Silvestrow [Wed, 21 Oct 2009 19:01:01 +0000 (20:01 +0100)]
confd: query the pnode of multiple instances at once

Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Thu, 22 Oct 2009 06:49:23 +0000 (15:49 +0900)]
Try to reduce wrong errors in InstanceShutdown

In backend.InstanceShutdown(), there is a race condition between
checking that the instance exists and trying to shut it down, which
sometimes translates into error messages like:

Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed
to force stop instance instance9: Failed to stop instance instance9:
exited with exit code 1, Error: Domain 'instance9' does not exist.

To fix this, we ignore any hypervisor StopInstance() errors if the
instance doesn't exist anymore, since our purpose (to make the instance
go away) is already accomplished.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
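
The "ignore the error if the instance is already gone" rule can be sketched as below. The hypervisor object with StopInstance(name, force) and ListInstances() is a hypothetical stand-in for the real hypervisor interface:

```python
class HypervisorError(Exception):
    """Stand-in for the hypervisor layer's error type."""


def ForceStop(hv, name):
    """Force-stop an instance, tolerating the 'already gone' race.

    Between checking that the instance exists and stopping it, the
    instance may exit on its own; the stop call then fails although the
    goal (no running instance) has been reached.
    """
    try:
        hv.StopInstance(name, force=True)
    except HypervisorError:
        if name in hv.ListInstances():
            # Still running: this is a real failure
            raise
        # Instance has disappeared: ignore the error
```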

Iustin Pop [Thu, 22 Oct 2009 06:36:40 +0000 (15:36 +0900)]
Revert breakage introduced in e4e9b80

Commit e4e9b8064787df01a79846a40f49c8ae06a8eb0e introduced two problems
in backend.InstanceShutdown():

- first, it reduced the check interval significantly (especially for the
  first few checks); there are very few production VMs that shut down in
  one second, and while not breaking anything this creates unnecessary
  load for the hypervisor
- second, a wrong test added to the while condition (“not tried_once”)
  means that we only sleep once for an instance, and after that we
  immediately kill it forcefully

Together, these two mean that any instance which is not lucky enough to
finish in roughly 1-1.5 seconds (the time it takes to sleep and verify
the instance list again) will have this happen:

2009-10-21 23:33:46,034:  pid=16634 INFO Called for inst9 w. False/False
2009-10-21 23:33:47,440:  pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
2009-10-21 23:33:47,440:  pid=16634 INFO Called for inst9 w. True/False

The “Called…” are logs from the hypervisor shutdown function. This means
of course that at restart time:

[12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
[12775866.644689] EXT3-fs: write access will be enabled during recovery.
[12775868.533674] kjournald starting.  Commit interval 5 seconds
[12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
[12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
[12775868.551803] EXT3-fs: recovery complete.
[12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.

This patch reverts the broken test and changes the sleep to a fixed
duration of five seconds, since it makes no sense to check that often
for shutdown (and after ~20 seconds we reach a stable value of five
seconds anyway).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
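
A sketch of the corrected loop under these assumptions: soft shutdown requests are repeated, the instance list is polled at a fixed five-second interval with no "tried once" shortcut, and a forced stop happens only after the overall timeout has expired. The hypervisor interface and the parameter names are hypothetical:

```python
import time


def ShutdownInstance(hv, name, timeout, interval=5.0,
                     _time_fn=time.time, _sleep_fn=time.sleep):
    """Shut down `name`, forcing it only after `timeout` seconds.

    Returns True for a clean shutdown, False if a forced stop was needed.
    """
    start = _time_fn()
    while _time_fn() - start < timeout:
        # Repeated soft requests; no "tried once" check ends the loop early
        hv.StopInstance(name, force=False)
        if name not in hv.ListInstances():
            return True
        # Fixed polling interval instead of a short, growing one
        _sleep_fn(interval)
    # Only now give up on a clean shutdown
    hv.StopInstance(name, force=True)
    return False
```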

Iustin Pop [Thu, 22 Oct 2009 06:07:33 +0000 (15:07 +0900)]
Xen: Ignore the retry argument in stop instance

Commit 4ad4511 changed the KVM hypervisor to send multiple shutdown
requests to the monitor, but it didn't change this for the Xen
hypervisor. We simply remove the return-on-retry model, since we do want
to send multiple shutdown signals for both Xen and KVM (even if the
behaviour is not perfect, they should behave the same).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 20 Oct 2009 16:17:12 +0000 (18:17 +0200)]
Ensure RpcResult has “payload” attribute

Also add assertions to avoid missing attributes in the future.
They won't be included in optimized bytecode.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Tue, 20 Oct 2009 20:28:49 +0000 (16:28 -0400)]
Fix typo in install.rst

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Guido Trotter [Tue, 20 Oct 2009 15:49:09 +0000 (11:49 -0400)]
install.rst: mention xen config for live migration

This addresses issue 75.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Tag: v2.1.0beta2
Michael Hanselmann [Mon, 19 Oct 2009 16:48:47 +0000 (18:48 +0200)]
Bump version to 2.1.0~beta2

I forgot to bump the configure.ac version before tagging the 2.1.0~beta1
release. Since we cannot remove old tags (see “On Re-tagging” in git-tag(1)),
we have to call this release 2.1.0~beta2.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Iustin Pop [Mon, 19 Oct 2009 05:12:45 +0000 (14:12 +0900)]
Introduce checks for /sys and /proc

This patch adds checks for /proc and /sys in cluster verify, since
Ganeti relies on these special filesystems being mounted.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Tag: v2.1.0beta1
Michael Hanselmann [Fri, 16 Oct 2009 16:23:23 +0000 (18:23 +0200)]
Fix serializer unittests

Commit d22b29997cd broke the serializer unittests with certain
versions of simplejson. This patch removes sort_keys again
and implements a slightly more efficient way of detecting
simplejson functionality. The serializer unittests no longer
use a partially broken mock, but rather a function to convert all
tuples to lists before comparing.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
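
The comparison strategy can be sketched as a recursive helper that turns every tuple into a list, mirroring what a JSON round trip does, so that expected and decoded values compare equal. TuplesToLists and RoundTripMatches are illustrative names, not necessarily the helpers in the test suite:

```python
import json


def TuplesToLists(value):
    """Recursively replace tuples (and lists) with lists, like JSON does."""
    if isinstance(value, (tuple, list)):
        return [TuplesToLists(item) for item in value]
    if isinstance(value, dict):
        return {key: TuplesToLists(item) for key, item in value.items()}
    return value


def RoundTripMatches(data):
    """Compare a JSON round trip against the tuple-free version of data."""
    return json.loads(json.dumps(data)) == TuplesToLists(data)
```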

Michael Hanselmann [Fri, 16 Oct 2009 14:32:00 +0000 (16:32 +0200)]
cfgupgrade: Implement upgrade to 2.1.0

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 16 Oct 2009 14:30:44 +0000 (16:30 +0200)]
bootstrap: Factorize HMAC key generation

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 16 Oct 2009 14:21:26 +0000 (16:21 +0200)]
Make bootstrap._GenerateSelfSignedSslCert public

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 16 Oct 2009 14:33:31 +0000 (16:33 +0200)]
cfgupgrade: Remove Ganeti 1.2 support

This also fixes a few typos.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 16 Oct 2009 13:56:21 +0000 (15:56 +0200)]
serializer: Sort keys in JSON

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Tag: v2.1.0beta0
Michael Hanselmann [Mon, 12 Oct 2009 15:38:47 +0000 (17:38 +0200)]
Bump version to 2.1.0~beta0

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 15 Oct 2009 12:39:14 +0000 (14:39 +0200)]
mcpu: Use new timeout class for timeout

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 15 Oct 2009 12:38:31 +0000 (14:38 +0200)]
locking: Convert pipe condition to new timeout class

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 15 Oct 2009 12:32:58 +0000 (14:32 +0200)]
locking.LockSet: Move timeout calculation to separate class

This class can also be used by mcpu.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agolocking, mcpu: Ensure timeout is always >= 0.0
Michael Hanselmann [Tue, 13 Oct 2009 17:23:58 +0000 (19:23 +0200)]
locking, mcpu: Ensure timeout is always >= 0.0

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agolocking.LockSet: Improve assertions
Michael Hanselmann [Tue, 13 Oct 2009 16:34:04 +0000 (18:34 +0200)]
locking.LockSet: Improve assertions

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agolocking: Factorize LockSet.acquire
Michael Hanselmann [Tue, 13 Oct 2009 16:33:47 +0000 (18:33 +0200)]
locking: Factorize LockSet.acquire

By moving the main code of LockSet.acquire to its own function
we reduce the code complexity a bit and clarify the exception
handling.

This also fixes a case where a lock acquire timeout wasn't
handled correctly, leading to obscure error messages.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agomcpu: Make sure added locks are released on errors
Michael Hanselmann [Tue, 13 Oct 2009 16:31:30 +0000 (18:31 +0200)]
mcpu: Make sure added locks are released on errors

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoTest LockSet.acquire return value for timeout
Michael Hanselmann [Tue, 13 Oct 2009 16:30:40 +0000 (18:30 +0200)]
Test LockSet.acquire return value for timeout

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoopcodes: Add missing shutdown_timeout to OpRemoveInstance
Michael Hanselmann [Tue, 13 Oct 2009 16:30:19 +0000 (18:30 +0200)]
opcodes: Add missing shutdown_timeout to OpRemoveInstance

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoluxi: Pass socket path directly to exception, not in tuple
Michael Hanselmann [Tue, 13 Oct 2009 16:29:56 +0000 (18:29 +0200)]
luxi: Pass socket path directly to exception, not in tuple
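The difference shows up in the exception's args and in its string form. The sketch below uses a hypothetical stand-in exception class, not luxi's real error type. Wrapping the path in a tuple nests it one level deeper, so str(err) prints a tuple instead of the bare path.

```python
class NoMasterError(Exception):
    """Hypothetical stand-in for the error raised when the socket is absent."""


def make_error_old(path):
    # Old style: the path is wrapped in a tuple before being passed on
    return NoMasterError((path,))


def make_error_new(path):
    # New style: the path is passed directly as the only argument
    return NoMasterError(path)
```

With the new style, err.args[0] is the path itself, which is what handlers formatting the message expect.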

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agognt-* use the correct opcode slot to build opcodes
Guido Trotter [Tue, 13 Oct 2009 12:26:49 +0000 (13:26 +0100)]
gnt-* use the correct opcode slot to build opcodes

gnt-* scripts were building wrong opcodes for commands that have the
shutdown_timeout slot (due to missing testing after the rename). This
fixes them.

Also change the SHUTDOWN_TIMEOUT_OPT dest field name from "timeout" to
"shutdown_timeout". The old name would still have worked, but it could
be confusing.
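With optparse, the dest field decides which attribute of the parsed options receives the value. The sketch below shows an option whose dest matches the opcode slot name. The flag --shutdown-timeout is taken from the manpage entry in this log. The integer type, the 120-second default and the helper parse_shutdown_args are assumptions made for illustration.

```python
from optparse import OptionParser, make_option

# Assumption: a two-minute default, expressed in seconds
DEFAULT_SHUTDOWN_TIMEOUT = 120

# dest="shutdown_timeout" stores the value in opts.shutdown_timeout,
# the same name as the opcode slot, which avoids mixing up the two.
SHUTDOWN_TIMEOUT_OPT = make_option(
    "--shutdown-timeout", dest="shutdown_timeout", type="int",
    default=DEFAULT_SHUTDOWN_TIMEOUT, metavar="<secs>",
    help="Maximum time to wait for instance shutdown")


def parse_shutdown_args(argv):
    """Hypothetical helper: return the shutdown timeout parsed from argv."""
    parser = OptionParser(option_list=[SHUTDOWN_TIMEOUT_OPT])
    opts, _ = parser.parse_args(argv)
    return opts.shutdown_timeout
```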

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoUpdate NEWS for instance shutdown timeout
Guido Trotter [Tue, 13 Oct 2009 11:22:35 +0000 (12:22 +0100)]
Update NEWS for instance shutdown timeout

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoUpdate documentation for recreate-disks
Iustin Pop [Tue, 13 Oct 2009 12:12:57 +0000 (14:12 +0200)]
Update documentation for recreate-disks

This also clarifies the UUIDs NEWS entry.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agorapi: fix tag operations
Iustin Pop [Tue, 13 Oct 2009 12:01:20 +0000 (14:01 +0200)]
rapi: fix tag operations

This patch fixes the tag PUT/DELETE operations, and additionally changes
the _Tags_* functions to take only positional and not keyword arguments
(the defaults do not make any sense at all, and they are always called
with all arguments).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoUpdate NEWS for Ganeti 2.1
Michael Hanselmann [Mon, 12 Oct 2009 15:18:37 +0000 (17:18 +0200)]
Update NEWS for Ganeti 2.1

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoConvert NEWS to ASCII
Michael Hanselmann [Mon, 12 Oct 2009 15:19:25 +0000 (17:19 +0200)]
Convert NEWS to ASCII

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoUpdate manpages for --shutdown-timeout
Guido Trotter [Mon, 12 Oct 2009 18:05:48 +0000 (19:05 +0100)]
Update manpages for --shutdown-timeout

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd timeout options to other LUs
Guido Trotter [Mon, 12 Oct 2009 11:49:50 +0000 (12:49 +0100)]
Add timeout options to other LUs

All the LUs that shut down an instance need to be able to pass the
timeout parameter as well.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agocli: add SHUTDOWN_TIMEOUT_OPT
Guido Trotter [Mon, 12 Oct 2009 15:43:55 +0000 (16:43 +0100)]
cli: add SHUTDOWN_TIMEOUT_OPT

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agomcpu: Change lock attempt timeout calculation
Michael Hanselmann [Mon, 12 Oct 2009 10:46:22 +0000 (12:46 +0200)]
mcpu: Change lock attempt timeout calculation

With this patch all timeouts are pre-calculated. The interface of
the _LockTimeoutStrategy class is also changed a bit; NextAttempt
now returns a new instance.
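Here is a minimal sketch of such an interface, using a strategy object that never mutates. The timeouts are computed up front and stored in a tuple. NextAttempt returns a fresh object that points at the following timeout. The class name LockAttemptStrategy and the GetTimeout method are hypothetical; only NextAttempt is named in the commit message. In this sketch, a None timeout after the list is exhausted means "acquire blocking".

```python
class LockAttemptStrategy:
    """Hypothetical sketch of an immutable lock-attempt strategy."""

    def __init__(self, timeouts, attempt=0):
        self._timeouts = tuple(timeouts)  # pre-calculated, never modified
        self._attempt = attempt

    def GetTimeout(self):
        """Timeout for the current attempt, or None to block."""
        if self._attempt < len(self._timeouts):
            return self._timeouts[self._attempt]
        return None

    def NextAttempt(self):
        """Return a new strategy for the next attempt; self is unchanged."""
        return type(self)(self._timeouts, self._attempt + 1)
```

An immutable design means a caller holding an older strategy object still sees the same timeout, so no shared state has to be reset between attempts.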

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoCode and docstring style fixes
Michael Hanselmann [Wed, 7 Oct 2009 16:15:25 +0000 (18:15 +0200)]
Code and docstring style fixes

Found using pylint and epydoc.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agomcpu: Improve lock reporting with timeouts
Michael Hanselmann [Wed, 7 Oct 2009 14:11:22 +0000 (16:11 +0200)]
mcpu: Improve lock reporting with timeouts

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agomcpu: Implement lock timeouts
Michael Hanselmann [Wed, 7 Oct 2009 12:58:06 +0000 (14:58 +0200)]
mcpu: Implement lock timeouts

The timeout is always between ~0.1 and ~10.0 seconds. A small
variation of ±5% is added to prevent different jobs from
fighting each other. After 10 attempts to acquire the locks with
a timeout, a blocking acquire is made.

Lock status reporting will be improved in a separate patch.
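Here is a rough sketch of the schedule described above: ten timed attempts between about 0.1 and 10.0 seconds, each varied by ±5%, followed by a blocking acquire. The commit does not say how the timeout grows between attempts, so geometric growth is an assumption here. The function names are hypothetical.

```python
import random

# Parameters from the commit message; geometric growth is an assumption
MIN_TIMEOUT = 0.1
MAX_TIMEOUT = 10.0
TIMEOUT_ATTEMPTS = 10
VARIATION = 0.05  # +/-5%


def calc_lock_timeouts(rng=random):
    """Return TIMEOUT_ATTEMPTS jittered timeouts from ~0.1 to ~10.0 s."""
    ratio = (MAX_TIMEOUT / MIN_TIMEOUT) ** (1.0 / (TIMEOUT_ATTEMPTS - 1))
    timeouts = []
    for i in range(TIMEOUT_ATTEMPTS):
        base = MIN_TIMEOUT * ratio ** i
        # Random variation so competing jobs do not retry in lockstep
        timeouts.append(base * rng.uniform(1.0 - VARIATION, 1.0 + VARIATION))
    return timeouts


def acquire_with_timeouts(acquire_fn, rng=random):
    """Call acquire_fn(timeout) for each timeout, then blocking (None)."""
    for timeout in calc_lock_timeouts(rng):
        if acquire_fn(timeout):
            return True
    return acquire_fn(None)
```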

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agomcpu: Remove unused exclusive_BGL attribute
Michael Hanselmann [Mon, 5 Oct 2009 14:16:15 +0000 (16:16 +0200)]
mcpu: Remove unused exclusive_BGL attribute

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agolocking.LockSet: Implement acquire timeouts
Michael Hanselmann [Fri, 9 Oct 2009 11:39:17 +0000 (13:39 +0200)]
locking.LockSet: Implement acquire timeouts

The timeout passed to LockSet.acquire() is measured over all lock acquires. If
LockSet.acquire fails to acquire all requested locks within the specified
amount of time, all locks are released again and the acquire fails.
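The all-or-nothing behaviour can be sketched with plain threading.Lock objects and a single deadline. The deadline is shared by every acquire, and each remaining timeout is clamped at 0.0. If any acquire times out or raises, everything acquired so far is released. The function acquire_all is hypothetical and is not the LockSet.acquire implementation.

```python
import threading  # noqa: F401  (locks passed in are threading.Lock objects)
import time


def acquire_all(locks, timeout=None):
    """Acquire every lock within `timeout` seconds overall (None: block).

    Returns True on success. On timeout returns False, and on timeout or
    error all locks acquired so far are released again.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    acquired = []
    success = False
    try:
        for lock in locks:
            if deadline is None:
                lock.acquire()
            else:
                remaining = max(0.0, deadline - time.monotonic())
                if not lock.acquire(timeout=remaining):
                    return False  # the finally block releases held locks
            acquired.append(lock)
        success = True
        return True
    finally:
        if not success:
            for held in reversed(acquired):
                held.release()
```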

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoUpdate gnt-instance(8) for shutdown --timeout
Guido Trotter [Fri, 9 Oct 2009 13:03:39 +0000 (14:03 +0100)]
Update gnt-instance(8) for shutdown --timeout

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAccept shutdown timeout from the user
Guido Trotter [Fri, 9 Oct 2009 10:52:38 +0000 (11:52 +0100)]
Accept shutdown timeout from the user

Using the new --timeout option:

- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than
  hardcoding it in the function

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agocli: add a timeout option
Guido Trotter [Fri, 9 Oct 2009 10:52:02 +0000 (11:52 +0100)]
cli: add a timeout option

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoChrootManager: clean StopInstance
Guido Trotter [Fri, 9 Oct 2009 11:04:07 +0000 (12:04 +0100)]
ChrootManager: clean StopInstance

Currently it has lots of duplicated code and internal retries.
Clean it up based on the following assumptions:

- We'll probably be called more than once.
- It is ok to fail to stop, unless we're called with force=True.
- If we're called only once, and with force=True, it's ok not to run
  the chroot "cleanup" script (it's a destroy after all, so why should
  chroots have more chances than other instances?).

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

14 years agoKVMHypervisor: use the StopInstance retry feature
Guido Trotter [Fri, 9 Oct 2009 10:49:54 +0000 (11:49 +0100)]
KVMHypervisor: use the StopInstance retry feature

Since we know StopInstance is going to be called more than once (at
least twice, once with force and once without, but normally quite a lot
more) we don't need our own sleep/loop, and we can just send one monitor
command per call.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

14 years agobackend.InstanceShutdown: small cleanup
Guido Trotter [Fri, 9 Oct 2009 10:35:34 +0000 (11:35 +0100)]
backend.InstanceShutdown: small cleanup

1) Stop hardcoding the timeout; move it into a constant
2) Use time.time() rather than hiding the timeout in a range()
3) Call hyper.StopInstance multiple times
   -- currently all hypervisors just ignore all calls but once
4) Use hyper.ListInstances() rather than GetInstanceList([hv_name])
   -- it's cheaper :)
5) Change the final message to "forcing" from "using destroy"
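Here is a minimal sketch of the resulting loop. It is driven by a deadline based on time.time(), calls StopInstance repeatedly, checks ListInstances, and forces the stop once the timeout expires. The hypervisor interface (ListInstances() returning names, StopInstance(name, force=False)), the 5-second poll interval and the function name are assumptions for illustration. They do not reproduce Ganeti's actual signatures.

```python
import time

DEFAULT_SHUTDOWN_TIMEOUT = 120  # seconds ("two minutes"); assumed value


def shutdown_instance(hyper, name, timeout=DEFAULT_SHUTDOWN_TIMEOUT,
                      interval=5.0, clock=time.time, sleep=time.sleep):
    """Hypothetical sketch: return True on clean shutdown, False if forced."""
    deadline = clock() + timeout
    while clock() < deadline:
        if name not in hyper.ListInstances():
            return True  # the instance went down by itself
        # Ask politely again; hypervisors must tolerate repeated calls
        hyper.StopInstance(name, force=False)
        sleep(interval)
    if name not in hyper.ListInstances():
        return True
    # Timeout expired: "forcing"
    hyper.StopInstance(name, force=True)
    return False
```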

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

14 years agoAdd default instance shutdown timeout constant
Guido Trotter [Fri, 9 Oct 2009 10:17:57 +0000 (11:17 +0100)]
Add default instance shutdown timeout constant

It reflects the two minutes we currently give an instance to shut down.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>