ganeti-local
Michael Hanselmann [Tue, 9 Mar 2010 20:12:40 +0000 (21:12 +0100)]
Provide unittests for http.auth

To simplify writing unittests, one data structure class in http.server is
also changed. According to the coverage utility, this provides 95%
coverage.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 9 Mar 2010 20:11:55 +0000 (21:11 +0100)]
http.auth: Fix bug with checking hashed passwords

When a username and password were sent for a resource not requiring
authentication, the credentials were rejected if the user in question
had a hashed password. The reason was that the function GetAuthRealm
used to return None if no authentication was necessary. However, the
authentication realm is necessary to verify hashed passwords. This is
fixed by requiring GetAuthRealm to always return a realm and moving
the decision whether to require authentication into a separate
function.
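
A minimal sketch of why the realm is needed, assuming hashed entries use
the HTTP digest "HA1" form (MD5 of "user:realm:password") and are marked
with a "{HA1}" prefix. The function names and the prefix handling are
illustrative assumptions, not the http.auth code itself.

```python
import hashlib

HA1_SCHEME = "{HA1}"  # assumed marker for hashed password entries


def compute_ha1(username, realm, password):
    """Return HA1 as used by HTTP digest auth: MD5("user:realm:password")."""
    data = ":".join([username, realm, password])
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def verify_password(username, realm, stored, supplied):
    """Check a supplied clear-text password against a stored entry."""
    if stored.upper().startswith(HA1_SCHEME):
        if realm is None:
            # The failure mode described above: without a realm the
            # supplied password cannot be hashed for comparison.
            raise ValueError("realm is required to verify a hashed password")
        expected = stored[len(HA1_SCHEME):].lower()
        return compute_ha1(username, realm, supplied) == expected
    # Entries without the marker are treated as clear text
    return stored == supplied
```

Because the realm is part of the hashed string, the same password hashed
for a different realm does not match, which is why a realm has to be
available even for resources that do not require authentication.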

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Tue, 9 Mar 2010 16:03:19 +0000 (16:03 +0000)]
Clarify cluster nic parameters in install.rst

There were a few outdated options specified there. This patch unifies
the description under only one section, and updates it.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 9 Mar 2010 12:21:54 +0000 (13:21 +0100)]
Add the auto_promote option to cli and gnt-node

This allows one to cleanly set a node offline and promote other nodes
as needed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 9 Mar 2010 12:07:08 +0000 (13:07 +0100)]
Rework the node modify for mc-demotion

The current code in LUSetNodeParms regarding the demotion from master
candidate role is complicated and duplicates the code in ConfigWriter,
where such decisions should be made. Furthermore, we still cannot demote
nodes (not even with force), if other regular nodes exist.

This patch adds a new opcode attribute ‘auto_promote’, and changes the
decision tree as follows:

- if the node will be set to offline or drained or explicitly demoted
  from master candidate, and this parameter is set, then we lock all
  nodes in ExpandNames()
- later, in CheckPrereq(), if the node is indeed a master candidate, and
  the future state (as computed via GetMasterCandidateStats with the
  current node in the exception list) has fewer nodes than it should,
  and we didn't lock all nodes, we exit with an exception
- in Exec, if we locked all nodes, we do an AdjustCandidatePool() run, to
  ensure nodes are locked as needed (we do it before updating the node
  to remove a warning, and to avoid being left with an inconsistent
  state if the LU fails between the two steps)

Note that in Exec we run the AdjustCP irrespective of any node state
change (just based on lock status), so we might simplify the CheckPrereq
even more by not checking the future state, basically requiring
auto_promote/lock_all for master candidates, since the case where we
have more than needed master candidates is rarer; OTOH, this would prevent
manual promotion ahead of time of another node, which is why I didn't
choose this way.
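
The first two steps of the decision tree can be sketched as below. This
is a simplified illustration, not the LUSetNodeParms code: PrereqError
and all function and parameter names are hypothetical, and the candidate
counts are passed in rather than computed from the configuration.

```python
class PrereqError(Exception):
    """Hypothetical stand-in for the error raised from CheckPrereq()."""


def may_leave_candidate_pool(offline, drained, master_candidate):
    """True if the request sets the node offline/drained or demotes it.

    Each flag is True, False, or None (None meaning "not changed").
    """
    return offline is True or drained is True or master_candidate is False


def lock_all_nodes(offline, drained, master_candidate, auto_promote):
    """ExpandNames(): lock all nodes only if auto_promote may have work."""
    return bool(auto_promote) and may_leave_candidate_pool(
        offline, drained, master_candidate)


def check_prereq(is_candidate, future_candidates, wanted_candidates,
                 locked_all):
    """CheckPrereq(): fail if the pool would shrink and cannot be refilled."""
    if (is_candidate and future_candidates < wanted_candidates
            and not locked_all):
        raise PrereqError("too few master candidates left; use auto_promote")
```

With all nodes locked, the Exec step is then free to adjust the
candidate pool before the node itself is updated.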

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 9 Mar 2010 14:08:37 +0000 (15:08 +0100)]
Fix node volumes list for striped volumes

Currently backend.NodeVolumes() drops everything except the first PV,
thus we get a truncated result. The patch is not the nicest, as Python
doesn't have a simple `concat' function, so I had to change the list
comprehension to an explicit loop.
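
A minimal sketch of the explicit loop, assuming each volume is reported
as one "name|vg|size|devices" record in which a striped volume lists
several comma-separated devices. The record layout and the function
names are assumptions made for illustration.

```python
def parse_volume_line(line, sep="|"):
    """Split one record into one row per physical volume."""
    name, vg, size, devices = line.strip().split(sep)
    rows = []
    for dev in devices.split(","):
        # "/dev/sda1(0)" -> "/dev/sda1"; the extent offset is dropped
        rows.append({"name": name, "vg": vg, "size": size,
                     "dev": dev.split("(")[0]})
    return rows


def node_volumes(lines):
    """Collect rows for all volumes, keeping every PV of a striped volume."""
    volumes = []
    for line in lines:
        # extend() does the "concat" a single list comprehension
        # over the lines cannot express directly
        volumes.extend(parse_volume_line(line))
    return volumes
```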

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 9 Mar 2010 14:15:40 +0000 (15:15 +0100)]
Fix typo that makes cluster verify ignore hooks

The return from LUVerifyCluster should be True (or equivalent) for pass,
and False (or equivalent) for fail. The HooksCallBack function uses '1'
(= True) when a hook fails, which is exactly the opposite of what we
want: it makes failed hooks reset the result to success,
overriding actual failures in cluster verify.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Iustin Pop [Tue, 9 Mar 2010 12:39:29 +0000 (13:39 +0100)]
Fix redistribute config and offline nodes

We need to manually filter out offline nodes before using
rpc.call_upload_file and rpc.call_write_ssconf_files, since these
methods are static (they work without a ConfigWriter instance) and thus
do not know which nodes are offline and which are not.

Note that we add a new ConfigWriter._UnlockedGetOnlineNodeList() method
rather than hardcoding the filtering of online nodes in _WriteConfig.
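
A minimal sketch of the filtering, assuming a node record with "name"
and "offline" attributes. The Node type, the function names and the
upload callback are hypothetical; the real helper lives on ConfigWriter.

```python
from collections import namedtuple

# Hypothetical minimal node record
Node = namedtuple("Node", ["name", "offline"])


def get_online_node_list(nodes):
    """Return the names of all nodes not marked offline."""
    return [node.name for node in nodes if not node.offline]


def redistribute(nodes, upload):
    """Call a static upload helper with the online nodes only."""
    targets = get_online_node_list(nodes)
    upload(targets)
    return targets
```

Keeping the filter in one helper means every caller of a static RPC
method can reuse it instead of repeating the comprehension.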

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 9 Mar 2010 09:40:47 +0000 (10:40 +0100)]
Adding documentation for “gnt-os modify”

This finishes the integration of per-os-hypervisor parameters by updating
the man page.

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 9 Mar 2010 09:40:45 +0000 (10:40 +0100)]
Add “gnt-os modify” for per-os-hypervisor parameters

Introduce “gnt-os modify” command to make it possible to set the
per-os-hypervisor parameters.

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 9 Mar 2010 09:40:46 +0000 (10:40 +0100)]
Show per-os-hypervisor parameters in “gnt-cluster info”

Let gnt-cluster info show us the per-os-hypervisor parameters.

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 9 Mar 2010 09:40:43 +0000 (10:40 +0100)]
Add support for per-os-hypervisor parameters

This patch implements all modifications to support per-os-hypervisor
parameters in the framework.

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 9 Mar 2010 09:40:44 +0000 (10:40 +0100)]
cli: Add ArgOs for later use in gnt-os

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Timothy Kuhlman [Mon, 8 Mar 2010 15:43:40 +0000 (15:43 +0000)]
KVM: Fix unintended qemu-level bridging of nics

Each nic should be connected to its own qemu vlan, to avoid them all
bridging together.

Signed-off-by: Timothy Kuhlman <timkuhlman@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 17:35:59 +0000 (18:35 +0100)]
Support passing in file object in utils.FileLock

This way we can re-use file objects opened in other places. Also add more
unittests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 17:34:19 +0000 (18:34 +0100)]
Convert utils.FileLock to utils.Retry

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 17:39:49 +0000 (18:39 +0100)]
Support arguments in utils.RunInSeparateProcess

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Fri, 5 Mar 2010 17:21:35 +0000 (18:21 +0100)]
Validate the hostnames at creation time

This patch adds validation of new names used, i.e. at cluster init time,
node add time, and instance creation.

For instances, especially when using «--no-name-check» (which skips DNS
checks), we should validate the given name, and also normalize it
(otherwise, we could have two instances named inst1 and Inst1).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 15:55:27 +0000 (16:55 +0100)]
Add a function to validate and normalize hostnames

This differs slightly from the specification, by allowing names to start
with digits, not checking the length of individual components, and
allowing underscores.
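
A minimal sketch of such a check, under the relaxations listed above
(leading digits and underscores allowed, no per-label length check). The
regular expression, the 255-character limit, the rejection of a trailing
dot and the function name are assumptions for illustration.

```python
import re

# Lower-case letters, digits, dots, underscores, hyphens; 1-255 characters
_VALID_NAME_RE = re.compile(r"[a-z0-9._-]{1,255}")


def normalize_hostname(hostname):
    """Lower-case a hostname and reject syntactically invalid names."""
    name = hostname.lower()
    if (_VALID_NAME_RE.fullmatch(name) is None
            or ".." in name          # empty label in the middle
            or name.startswith(".")  # empty first label
            or name.endswith(".")):  # assumption: no trailing dot either
        raise ValueError("invalid hostname: %r" % hostname)
    return name
```

Normalizing to lower case before storing the name is what keeps "inst1"
and "Inst1" from coexisting as two different instances.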

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 10:52:32 +0000 (11:52 +0100)]
ListVisibleFiles: require normalized path names

This patch changes ListVisibleFiles to raise ProgrammerError if it's
passed a non-absolute/non-normalized path name, and adds unittests for
this behaviour.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 10:28:33 +0000 (11:28 +0100)]
Switch more code to PathJoin

This should remove most of the remaining constructs which can be
replaced by PathJoin.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 10:17:59 +0000 (11:17 +0100)]
Add caller-validation on Disk.StaticDevPath

Since in objects we don't have access to utils.py, we add a warning that
the result value from objects.Disk.StaticDevPath might not be a valid
path, and change its only caller to validate the path.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 10:07:52 +0000 (11:07 +0100)]
hv_kvm: remove hard-coded path constructs

This switches hv_kvm to PathJoin. There are still a few cases of direct
path construction, but those _should_ be safe.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 09:56:14 +0000 (10:56 +0100)]
hv_fake: remove hard-coded path constructs

This switches hv_fake to PathJoin.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 5 Mar 2010 09:53:08 +0000 (10:53 +0100)]
hv_chroot: remove hard-coded path constructs

This patch abstracts the computation of an instance's root directory into
a separate function (that uses PathJoin instead of "%s/%s").

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 4 Mar 2010 14:52:13 +0000 (15:52 +0100)]
Add strict name validation for the LVM backend

Currently we don't enforce name validation for the LVM backend, on the
idea that LVM itself will reject invalid names and we catch those
errors.

However, recent LVM documents the accepted VG/LV name space, so it's
easy to add this in the LVM backend code.

In addition, the patch replaces some hardcoded /dev/ constructions with
utils.PathJoin().

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 4 Mar 2010 13:15:07 +0000 (14:15 +0100)]
Implement disabling of file-based storage

Rationale: the file-based storage backend can add/remove files under a
certain directory. However, the master node is also controlling the
setting of the file-based root directory, so basically it means we can't
prevent arbitrary modifications by the master of the node's filesystem.

In order to mitigate this for setups where the file-based storage is not
used, we introduce a new setting at ./configure time, that controls the
enable/disable of file-based storage. Since this is not modifiable by
the master (over RPC), it is now possible in this case to prevent
unintended modifications of the node's filesystem from the master.

The new setting is used in bdev.py to not expose the file-based storage
at all, and in cmdlib.py to prevent attempts at creation of such
instances.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 14:07:56 +0000 (15:07 +0100)]
Replace os.path.sep.join(seq) with utils.PathJoin

This is a no-op change, but at least we concentrate the calls to path
joins into a single function.

A use in utils.FindFile is left as-is (don't want to raise exceptions
there, at least for now).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 13:24:41 +0000 (14:24 +0100)]
Abstract OS log names computation

The various OS operations create log files in a specific directory
(constants.LOG_OS_DIR). The construction of the log names is however
spread and duplicated across multiple functions.

This patch abstracts this into a separate function that also validates
the log name, which should be safer in the long run. We also rename the
export logs from having a prefix of “exp” to “export”, since it was the
only operation that had an abbreviated prefix.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 12:50:52 +0000 (13:50 +0100)]
Remove superfluous warnings in HooksRunner

For non-existing hooks (the majority of cases probably), logging a
warning every time is not helpful. So we first check if we have a valid
directory.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 12:18:32 +0000 (13:18 +0100)]
Switch from os.path.join to utils.PathJoin

This passes a full burnin with lots of instances, and should be safe as
we mostly join a known root (various constants) to a run-time
variable.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 10:37:18 +0000 (11:37 +0100)]
utils: Add a PathJoin function

This will replace os.path.join, which is not safe against directory
traversal issues.
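
A minimal sketch of a traversal-safe join, assuming the intended rules
are: the root must be a normalized absolute path, the result must be
normalized, and the result must stay below the root. The names and the
exact checks are illustrative and may differ from utils.PathJoin.

```python
import posixpath


def is_norm_abs_path(path):
    """True if path is absolute and already in normalized form."""
    return posixpath.isabs(path) and posixpath.normpath(path) == path


def path_join(root, *parts):
    """Join parts onto root, refusing results that escape root."""
    if not is_norm_abs_path(root):
        raise ValueError("root is not a normalized absolute path: %r" % root)
    result = posixpath.join(root, *parts)
    # ".." or "." components leave the result non-normalized
    if not is_norm_abs_path(result):
        raise ValueError("result is not normalized: %r" % result)
    # An absolute component silently replaces the root in os.path.join
    prefix = root.rstrip("/") + "/"
    if result != root and not result.startswith(prefix):
        raise ValueError("result %r is outside root %r" % (result, root))
    return result
```

Both "../etc/passwd" and "/etc/passwd" as a second argument would be
accepted by a plain join; here they raise ValueError instead.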

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Mar 2010 09:38:18 +0000 (10:38 +0100)]
Add an extra safety layer to _CleanDirectory

In order to protect from accidental use of _CleanDirectory on a random
directory, we add a list of allowed clean directories, somewhat similar
to _ALLOWED_UPLOAD_FILES (but statically computed).
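
A minimal sketch of the safety layer, assuming the allowed directories
are known when the helper is created. The factory function and its names
are hypothetical; the point is only that the cleaner refuses any path
that is not on the list.

```python
import os


def make_clean_directory(allowed_dirs):
    """Return a cleaner that only operates on the given directories."""
    allowed = frozenset(allowed_dirs)

    def clean_directory(path, exclude=()):
        """Remove regular files directly under path, except excluded ones."""
        if path not in allowed:
            raise ValueError("%r is not in the list of allowed clean "
                             "directories" % path)
        if not os.path.isdir(path):
            return
        excluded = {os.path.normpath(item) for item in exclude}
        for name in os.listdir(path):
            full_name = os.path.join(path, name)
            if full_name in excluded:
                continue
            # Only plain files are removed; links and directories stay
            if os.path.isfile(full_name) and not os.path.islink(full_name):
                os.remove(full_name)

    return clean_directory
```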

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Vitaly Kuznetsov [Tue, 2 Mar 2010 13:28:29 +0000 (13:28 +0000)]
Avoid absolute path for privileged commands

Using an absolute path for a privileged command is a bad idea, as the path may
vary between distributions: for example, /usr/sbin/brctl in Debian and
/sbin/brctl in ALTLinux. Relying on $PATH is a better idea.

Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru>
Reviewed-by: Iustin Pop <iustin@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 2 Mar 2010 12:15:25 +0000 (13:15 +0100)]
Merge branch 'stable-2.1' into devel-2.1

* stable-2.1:
  Make stable release 2.1.0

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Iustin Pop [Tue, 2 Mar 2010 10:15:59 +0000 (11:15 +0100)]
Make stable release 2.1.0 [tag: v2.1.0]

It is about time (rc0 was almost four months ago)…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 15:42:13 +0000 (16:42 +0100)]
watcher: Acquire lock early and give more friendly message

By opening the lock file early, other programs can lock the
state file to prevent ganeti-watcher from restarting daemons.
Using the pause feature is inherently prone to race conditions.

Previously, a traceback was logged when the lock file couldn't
be acquired. Now a more friendly message is shown.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Vitaly Kuznetsov [Thu, 25 Feb 2010 18:39:16 +0000 (18:39 +0000)]
Make SSH_CONFIG_DIR customizable

This patch adds the ability to customize the ssh config directory with
--with-ssh-config-dir (instead of the hardcoded /etc/ssh value). This is useful
in Linux distributions with custom ssh config directories (/etc/openssh in
ALTLinux, for example).

Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Fri, 26 Feb 2010 14:15:25 +0000 (15:15 +0100)]
Merge branch 'stable-2.1' into devel-2.1

* stable-2.1:
  Add NLD constants to Ganeti
  Fix two potentially endless loops in http library
  Fix bug in LUQueryConfigValues
  Fix typo in LUVerifyCluster when checking node time

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 26 Feb 2010 13:49:44 +0000 (14:49 +0100)]
Add NLD constants to Ganeti

This avoids the need for them to be injected in the nbma repository.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Fri, 26 Feb 2010 13:12:39 +0000 (14:12 +0100)]
Make pylint happy

I was using a version of pylint that was too old to report all of
these. This patch fixes the new lint errors.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 13:36:19 +0000 (14:36 +0100)]
Merge remote branch 'origin/devel-2.0' into devel-2.1

* origin/devel-2.0:
  Fix two potentially endless loops in http library
  Update NEWS file and bump version to 2.0.6
  ganeti-cleaner: does 'echo 0' instead of 'exit 0'

Conflicts:
NEWS: Trivial
configure.ac: Trivial

Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 12:32:11 +0000 (13:32 +0100)]
Fix two potentially endless loops in http library

The first can be problematic if poll(2) returns POLLHUP|POLLERR on a
socket. Previously this was only respected for SOCKOP_RECV, but since
these events can also occur on other socket operations, esp. in
combination with OpenSSL, letting the socket functions handle
POLLHUP|POLLERR seems to be the right thing.

The second is a typo leading to an endless loop if the first line of an
HTTP connection is empty (simply "\r\n"). Instead of removing the empty
line, it would remove anything after it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 26 Feb 2010 12:41:40 +0000 (13:41 +0100)]
Move watcher's EnsureDaemon function to utils

This is going to be used from the nbma repository, to ensure that the
nld daemon is running.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Fri, 26 Feb 2010 10:01:10 +0000 (11:01 +0100)]
Adding tool for automated cluster-merger

This is the implementation of docs/design-cluster-merger.rst. It allows
the automatic merging of one or more clusters into the invoking cluster.

While this version is tested and working it still needs some tweaking
here and there for error handling and user experience.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Balazs Lecz [Thu, 18 Feb 2010 17:40:41 +0000 (17:40 +0000)]
Add multi-key support to the serializer

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 12:32:11 +0000 (13:32 +0100)]
Fix two potentially endless loops in http library

The first can be problematic if poll(2) returns POLLHUP|POLLERR on a
socket. Previously this was only respected for SOCKOP_RECV, but since
these events can also occur on other socket operations, esp. in
combination with OpenSSL, letting the socket functions handle
POLLHUP|POLLERR seems to be the right thing.

The second is a typo leading to an endless loop if the first line of an
HTTP connection is empty (simply "\r\n"). Instead of removing the empty
line, it would remove anything after it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 26 Feb 2010 09:03:54 +0000 (10:03 +0100)]
Fix bug in LUQueryConfigValues

LUQueryConfigValues supports multiple output fields. If the client asked
for the watcher pause status, it would not get a list, but simply the
value.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 26 Feb 2010 11:25:21 +0000 (12:25 +0100)]
Add watcher hooks

These hooks are run on all nodes, after the "base" daemons are started.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Fri, 19 Feb 2010 12:20:59 +0000 (12:20 +0000)]
Abstract starting the node daemons

We're using a separate function for this, as we're going to add some
functionality to this feature.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 19 Feb 2010 12:12:16 +0000 (12:12 +0000)]
ganeti-watcher: remove unused Indent function

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 23 Feb 2010 16:11:15 +0000 (17:11 +0100)]
Fix typo in LUVerifyCluster when checking node time

The first argument to _ErrorIf should always be True in this case.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 22 Feb 2010 11:29:10 +0000 (12:29 +0100)]
Add make target to generate unittest coverage report

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Mon, 22 Feb 2010 17:22:28 +0000 (18:22 +0100)]
Catch disk activation errors in watcher

If activating disks fails for some reason, the watcher didn't
catch the exception. With this patch it's caught and logged.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Fri, 19 Feb 2010 16:55:52 +0000 (17:55 +0100)]
Add unittests for ganeti.opcodes

According to “coverage”, this covers 99% of the code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 19 Feb 2010 15:36:48 +0000 (15:36 +0000)]
Disable warning for not calling ProcessEvent  init

This class doesn't need its constructor to be called.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Fri, 19 Feb 2010 14:52:09 +0000 (14:52 +0000)]
Implement utils.RunParts and use it for hooks

This function is a generic pythonic version of runparts. We currently
use it in the backend HooksRunner, but we'll use it for running
different directories as well.
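
A minimal sketch of a run-parts style helper: it walks a directory in
sorted order, skips entries that are not executable files, and records
what happened to each. The function name, the status strings and the
result layout are assumptions, not the utils.RunParts interface.

```python
import os
import subprocess


def run_parts(dir_name, env=None):
    """Run every executable file in dir_name, in sorted name order.

    Returns a list of (name, status, detail) tuples, where status is
    "skipped", "error" or "run".
    """
    results = []
    try:
        names = sorted(os.listdir(dir_name))
    except OSError:
        # A missing or unreadable directory simply yields no results
        return results
    for name in names:
        path = os.path.join(dir_name, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            results.append((name, "skipped", None))
            continue
        try:
            proc = subprocess.run([path], env=env, capture_output=True,
                                  text=True, check=False)
        except OSError as err:
            results.append((name, "error", str(err)))
            continue
        results.append((name, "run", proc))
    return results
```

A caller such as a hooks runner can then inspect proc.returncode and
proc.stdout for every entry whose status is "run".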

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Fri, 19 Feb 2010 13:14:12 +0000 (13:14 +0000)]
Change backend hooks runner to use RunCmd

This saves many lines of code in the process.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Fri, 19 Feb 2010 15:35:37 +0000 (15:35 +0000)]
Add reset_env option to RunCmd

This allows running a command with only the passed-in environment, rather
than just updating the default one with it.
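
A minimal sketch of the two modes, using the standard subprocess module.
The helper names are hypothetical and any extra variables the real
RunCmd may set are left out.

```python
import os
import subprocess


def build_cmd_env(env=None, reset_env=False):
    """Return the environment for a child process.

    With reset_env=False the passed-in variables are layered on top of a
    copy of the current environment; with reset_env=True only they are
    used.
    """
    cmd_env = {} if reset_env else os.environ.copy()
    if env:
        cmd_env.update(env)
    return cmd_env


def run_cmd(cmd, env=None, reset_env=False):
    """Run cmd (a list of arguments) and return the CompletedProcess."""
    return subprocess.run(cmd, env=build_cmd_env(env, reset_env),
                          capture_output=True, text=True, check=False)
```

Note that with reset_env=True the child does not even inherit PATH, so
commands should then be given by full path or PATH passed explicitly.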

Now with unit testing.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Mon, 22 Feb 2010 14:37:43 +0000 (15:37 +0100)]
Make it possible to pass custom private key path to SshRunner.Run

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Mon, 22 Feb 2010 15:26:41 +0000 (16:26 +0100)]
Show message when job is waiting in queue or for locks

Jobs submitted via the standard command line utilities didn't give any
indication that anything is happening while they were waiting in the job
queue (e.g. due to other jobs using all worker threads) or acquiring
locks. This could be very confusing for people not familiar with Ganeti's
architecture. Now they'll show a message after the first WaitForJobChanges
timeout.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 22 Feb 2010 12:39:22 +0000 (13:39 +0100)]
Handle EAGAIN in LUXI client

If too many clients try to connect to the master at the same time, some of
them might fail if the master doesn't accept the connections fast enough.
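
A minimal sketch of retrying a connect that fails with EAGAIN, assuming
a UNIX stream socket and a fixed delay between attempts. The function
names, the attempt count and the delay are illustrative choices, not the
LUXI client's actual retry policy.

```python
import errno
import socket
import time


def _unix_stream_socket():
    """Default socket factory: a UNIX domain stream socket."""
    return socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)


def connect_with_retry(address, attempts=10, delay=0.1,
                       make_socket=_unix_stream_socket, sleep=time.sleep):
    """Connect to address, retrying while the server reports EAGAIN."""
    for _ in range(attempts):
        sock = make_socket()
        try:
            sock.connect(address)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EAGAIN:
                raise  # any other error is not a "server busy" condition
            sleep(delay)
    raise OSError(errno.EAGAIN,
                  "server still busy after %d attempts" % attempts)
```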

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Wed, 17 Feb 2010 17:59:07 +0000 (18:59 +0100)]
Update the IAllocator documentation

This should be rewritten from a 'change document' (e.g. "Ganeti only
supports...") to a 'current implementation document', but in the
meantime we can at least update it with the multi-evac changes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 16:41:26 +0000 (17:41 +0100)]
Switch gnt-node evacuate to the new opcode

This switches gnt-node to the new opcode, and in the process also
enables multi-node arguments for it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 09:28:45 +0000 (10:28 +0100)]
Add LUNodeEvacuationStrategy

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 09:28:45 +0000 (10:28 +0100)]
Add a new opcode for node evacuation

We add this as a new opcode since we don't want to alter the behaviour
of current opcodes/lus.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 09:25:59 +0000 (10:25 +0100)]
Implement support for mevac in OpTestAllocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 09:23:38 +0000 (10:23 +0100)]
Implement IAllocator multi-evacuate mode

This is a new mode that requests a solution for the evacuation of
multiple nodes. The external script will be fed a list of names, and is
expected to return a list of [instance, new_node(s)] lists, detailing
the evacuation path of each instance.
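
A minimal sketch of checking such a reply, assuming each entry is a flat
list whose first element is the instance name and whose remaining
elements are the new node name(s). That reading of "[instance,
new_node(s)]", and the function name, are assumptions.

```python
def parse_multi_evacuate_result(result):
    """Turn [[instance, node, ...], ...] into {instance: [node, ...]}."""
    if not isinstance(result, list):
        raise ValueError("result must be a list")
    plan = {}
    for entry in result:
        if not isinstance(entry, list) or len(entry) < 2:
            raise ValueError("entry must be [instance, new_node(s)]: %r"
                             % (entry,))
        if not all(isinstance(item, str) for item in entry):
            raise ValueError("entry must contain only names: %r" % (entry,))
        instance, new_nodes = entry[0], entry[1:]
        if instance in plan:
            raise ValueError("instance listed twice: %r" % instance)
        plan[instance] = new_nodes
    return plan
```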

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 17 Feb 2010 17:28:36 +0000 (18:28 +0100)]
Accept both 'nodes' and 'result' from iallocator

This patch switches the default result key from 'nodes' to 'result'. The
old name is still accepted for backwards-compatibility, and should be
removed in later versions.
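
A minimal sketch of the fallback, assuming the allocator reply has
already been parsed into a dictionary. The function name is
hypothetical.

```python
def extract_allocator_result(reply):
    """Return the allocator's answer, preferring the new 'result' key."""
    if "result" in reply:
        return reply["result"]
    if "nodes" in reply:
        # Old key name, still accepted for backwards compatibility
        return reply["nodes"]
    raise KeyError("allocator reply has neither 'result' nor 'nodes'")
```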

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 16 Feb 2010 09:33:57 +0000 (10:33 +0100)]
Change internal API for the IAllocator class

Currently the 'name' parameter in the constructor is required (as a
non-keyword argument). Since the (to follow) node evac IAllocator mode
doesn't have 'name' as a valid argument, we're moving this one into the
per-request key, leaving the constructor required arguments more
abstract.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Mon, 15 Feb 2010 16:45:38 +0000 (17:45 +0100)]
Remove redundant code in IAllocator class

This moves the setting of the request member in in_data, the setting of
the request type, and the branching based on request type out of the
individual functions and directly into the constructor.

Since the values we're using externally are identical to the
constants.py values, we're also using those directly.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agobootstrap: Wait for node daemon when adding new node
Michael Hanselmann [Thu, 18 Feb 2010 18:39:04 +0000 (19:39 +0100)]
bootstrap: Wait for node daemon when adding new node

Until now this was only done for the master node, though
the problem originally fixed in 8f215968 also occurs for
other node daemons.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoMerge branch 'stable-2.1' into devel-2.1
Michael Hanselmann [Thu, 18 Feb 2010 12:22:40 +0000 (13:22 +0100)]
Merge branch 'stable-2.1' into devel-2.1

* stable-2.1:
  Fix ssh host key checking with no-key-check

14 years agoReset tempfile module after fork where useful
Michael Hanselmann [Wed, 17 Feb 2010 15:53:41 +0000 (16:53 +0100)]
Reset tempfile module after fork where useful

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoMove RunInSeparateProcess to ganeti.utils
Michael Hanselmann [Wed, 17 Feb 2010 15:46:46 +0000 (16:46 +0100)]
Move RunInSeparateProcess to ganeti.utils

This function could be useful in other places and this
way we can easily unittest it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoAdd function to reset tempfile module after fork
Michael Hanselmann [Wed, 17 Feb 2010 15:31:28 +0000 (16:31 +0100)]
Add function to reset tempfile module after fork

On fork, the tempfile module's pseudo random generator is
not reset. If several processes (e.g. two children or parent
and child) try to create a temporary file, they'll conflict.
This function can be used to reset the name generator which
contains the pseudo random generator.
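
A sketch of what such a reset can look like. It relies on the private
CPython attribute tempfile._name_sequence, which is an implementation
detail; the function name here is an assumption, and recent Python
versions already re-seed the generator after a fork on their own.

```python
import tempfile


def reset_tempfile_module():
    """Drop tempfile's cached name generator (hypothetical helper).

    The next temporary file created in this process builds a new
    generator, and with it a new pseudo random generator.
    """
    # _name_sequence is private to CPython's tempfile; fail loudly if absent.
    if not hasattr(tempfile, "_name_sequence"):
        raise RuntimeError("tempfile has no _name_sequence attribute")
    tempfile._name_sequence = None
```

Calling this in the child right after os.fork() returns is the intended
use.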

A unittest is included. It is in a separate script because
it changes a variable in the tempfile module to speed up the
test.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoFix ssh host key checking with no-key-check
Iustin Pop [Thu, 18 Feb 2010 09:12:19 +0000 (10:12 +0100)]
Fix ssh host key checking with no-key-check

In case we add a node with “--no-ssh-key-check”, this should override
any default yes/ask values in the system-wide (or user) ssh key check.

Currently this only works in batch mode, whereas in non-batch we only
override a 'no'. The patch fixes SshRunner such that in non-batch mode
we enforce the value of StrictHostKeyChecking in all cases.
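
The intended behaviour can be sketched as follows. BatchMode and
StrictHostKeyChecking are standard ssh options; the function and its
parameter names are assumptions for illustration and do not reproduce
SshRunner itself.

```python
def host_key_options(batch, ask_key, strict_host_check):
    """Build ssh -o options for host key checking (hypothetical helper)."""
    if batch and ask_key:
        raise ValueError("cannot ask about host keys in batch mode")
    options = []
    if batch:
        options.append("-oBatchMode=yes")
    # Always set StrictHostKeyChecking explicitly, so that system-wide or
    # per-user defaults can never override the caller's choice.
    if ask_key:
        value = "ask"
    elif strict_host_check:
        value = "yes"
    else:
        value = "no"
    options.append("-oStrictHostKeyChecking=%s" % value)
    return options
```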

Bug found and initial investigation by Theo Van Dinter.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoSimplify a bit _GetWantedNodes
Iustin Pop [Wed, 17 Feb 2010 16:42:48 +0000 (17:42 +0100)]
Simplify a bit _GetWantedNodes

This should have been done in the _ExpandNodeName patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoFix a wrong docstring
Iustin Pop [Wed, 17 Feb 2010 15:23:40 +0000 (16:23 +0100)]
Fix a wrong docstring

There's no such thing as OpProgrammerError (I found this as I wrote it
in code in another place, and pylint complained).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoganeti-noded: Fix bug when export didn't succeed for all disks
Michael Hanselmann [Tue, 16 Feb 2010 18:29:06 +0000 (19:29 +0100)]
ganeti-noded: Fix bug when export didn't succeed for all disks

snap_disks can contain boolean values. They weren't handled correctly.
The error message was “Error while executing backend function: Invalid
object passed to FromDict: expected dict, got <type 'bool'>”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoRemove boiler-plate code about node/instance names
Iustin Pop [Wed, 17 Feb 2010 12:47:43 +0000 (13:47 +0100)]
Remove boiler-plate code about node/instance names

Currently we have lots of duplication of the error-checking (and proper
exception raising) around node/instance name expansion. LUCreateInstance
is the only place where we have abstracted this.

This patch creates two functions (ExpandNodeName and ExpandInstanceName)
that will either raise the proper exception or return the expanded name.
This allows a lot of cleanup of duplicate code.
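
The pattern is roughly the following. This is a simplified sketch: the
exception class, the signatures and the prefix-matching rule are
assumptions, not the functions introduced by the patch.

```python
class NameExpansionError(Exception):
    """Hypothetical stand-in for the error raised on an unknown name."""


def expand_name(known_names, name, kind):
    """Return the full name for 'name', or raise a descriptive error."""
    if name in known_names:
        return name
    # Accept an unambiguous short name, e.g. "node1" for "node1.example.com".
    matches = [full for full in known_names if full.startswith(name + ".")]
    if len(matches) != 1:
        raise NameExpansionError("%s '%s' not known" % (kind, name))
    return matches[0]


def expand_node_name(known_nodes, name):
    return expand_name(known_nodes, name, "Node")


def expand_instance_name(known_instances, name):
    return expand_name(known_instances, name, "Instance")
```

Callers then need a single call instead of repeating the lookup, the
check for a missing result and the exception raising.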

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoMerge remote branch 'origin/stable-2.1' into devel-2.1
Michael Hanselmann [Mon, 15 Feb 2010 17:03:45 +0000 (18:03 +0100)]
Merge remote branch 'origin/stable-2.1' into devel-2.1

* origin/stable-2.1:
  Fix bug introduced in commit 413b747
  Fix locking bug causing high CPU usage
  Fix confd procotol design description
  Implement instance rename QA tests
  Fix "gnt-instance rename" functionality

Reviewed-by: Guido Trotter <ultrotter@google.com>

14 years agoAdd unittest for utils._FingerprintFile
Michael Hanselmann [Fri, 12 Feb 2010 17:48:03 +0000 (18:48 +0100)]
Add unittest for utils._FingerprintFile

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14 years agoRelease all node locks during disk replace
Iustin Pop [Mon, 15 Feb 2010 13:34:07 +0000 (14:34 +0100)]
Release all node locks during disk replace

This patch extends commit 7ea7bcf by releasing all node locks in disk
replace for the early release mode. The rationale behind this is:

- LUCreateInstance already releases all node locks while waiting for
  disk synchronization, and does an instance startup later
- WaitForSync only runs (for disk template 'drbd') 'lvs' and reads
  /proc/drbd on the primary node, which should be (modulo bugs in LVM)
  safe for parallel run

In any case, the worst I could foresee is a node having N lvs commands
run in parallel on it, while being a primary for disk storage. Based on
create instance doing this safely, and the fact that burnin with more
than two instances per node is safe, I think this can be applied.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoUnify a few re.compile calls in DRBD
Iustin Pop [Mon, 15 Feb 2010 10:44:00 +0000 (11:44 +0100)]
Unify a few re.compile calls in DRBD

These are both cleanups and, in the case of _MassageProcData, switching
from a weaker RE to a stronger one (we now need cs: in the line,
previously any line starting with \d+: was accepted).
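
To illustrate the difference between the two checks, here is a small
sketch. Both patterns and the sample lines are assumptions modelled on
the description above, not the expressions used in the DRBD code.

```python
import re

# Weaker check: any line starting with "<number>:".
WEAK_RE = re.compile(r"^\s*\d+:")
# Stronger check: the line must also carry a "cs:" field right after it.
STRICT_RE = re.compile(r"^\s*\d+:\s+cs:")


def device_lines(lines, pattern=STRICT_RE):
    """Return only the lines that look like per-device status lines."""
    return [line for line in lines if pattern.match(line)]
```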

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAuto-enable early release for offline old nodes
Iustin Pop [Mon, 15 Feb 2010 09:27:59 +0000 (10:27 +0100)]
Auto-enable early release for offline old nodes

In case the old node is offline, we won't be able to talk to it to
remove the storage, and in most cases the node is powered
off/unreachable.

In this case, it makes no sense to delay the storage release, so we
automatically enable early_release mode, gaining parallelism during node
evacuation.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoSkip line-length warnings in man
Iustin Pop [Thu, 11 Feb 2010 16:35:36 +0000 (17:35 +0100)]
Skip line-length warnings in man

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoRevert "Workaround man page output for long PREFIX dirs"
Iustin Pop [Thu, 11 Feb 2010 16:31:41 +0000 (17:31 +0100)]
Revert "Workaround man page output for long PREFIX dirs"

This reverts commit 83d9f4366f3aa9ae360e27bfe6619402793e9eb5.

man is still unable to wrap some long lines, so we simply revert this patch
(and filter out the specific message in autotools/check-man).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoRun instance hooks on more nodes
Iustin Pop [Thu, 11 Feb 2010 15:06:24 +0000 (16:06 +0100)]
Run instance hooks on more nodes

This should fix issue 68: some hooks should be run on more nodes than
currently. GrowDisk runs on both nodes, remove runs the post hook on the
instance's nodes, and failover and migrate run the post hook on the
source node too.

Thanks to Maxence for the initial investigation and patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd {NEW,OLD}_{PRIMARY,SECONDARY} vars to hooks
Iustin Pop [Thu, 11 Feb 2010 14:42:49 +0000 (15:42 +0100)]
Add {NEW,OLD}_{PRIMARY,SECONDARY} vars to hooks

Per issue 71, the migrate and failover need special variables for
keeping the nodes consistent during instance migrations.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoWorkaround man page output for long PREFIX dirs
Iustin Pop [Thu, 11 Feb 2010 12:58:26 +0000 (13:58 +0100)]
Workaround man page output for long PREFIX dirs

A long PREFIX variable (to configure) will result in very long
LOCALSTATEDIR, which when concatenated with lib/ganeti/ (and even more
items under it) will go over the 80 char line length we enforce in the
man checker.

To work around this, we change two things:

- use a specific REPLACE_VARS_MAN which adds breaking points after each
  slash in paths
- replace some <filename> entries with <literallayout> so that docbook
  generates a non-fill block around them (only a few cases need this
  after the breaking points are added)

Note that with normal prefixes (e.g. / or /usr/local) this won't happen.

The patch also fixes a wording in the watcher man page.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoPass debug mode to noded for OS-related calls
Iustin Pop [Wed, 10 Feb 2010 15:50:25 +0000 (16:50 +0100)]
Pass debug mode to noded for OS-related calls

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoConvert scripts to pass options to the JobExecutor
Iustin Pop [Wed, 10 Feb 2010 10:30:33 +0000 (11:30 +0100)]
Convert scripts to pass options to the JobExecutor

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd the options attribute to cli.JobExecutor
Iustin Pop [Tue, 9 Feb 2010 14:43:23 +0000 (15:43 +0100)]
Add the options attribute to cli.JobExecutor

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd debug mode to burnin
Iustin Pop [Tue, 9 Feb 2010 13:15:00 +0000 (14:15 +0100)]
Add debug mode to burnin

There are two entry points to job execution in burnin, ExecOp and
ExecOrQueue, and these are modified to call the new _SetDebug method on
the opcodes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoUpdate scripts to pass 'opts' to SubmitOpCode
Iustin Pop [Tue, 9 Feb 2010 13:10:40 +0000 (14:10 +0100)]
Update scripts to pass 'opts' to SubmitOpCode

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoImplement generic CLI options->opcode updates
Iustin Pop [Tue, 9 Feb 2010 13:02:18 +0000 (14:02 +0100)]
Implement generic CLI options->opcode updates

This patch changes SubmitOpCode and SubmitOrSend such that a single
function copies the generic CLI options into opcode attributes. Once all
scripts pass the opts argument to SubmitOpCode, this will allow passing
the debug parameter or the dry-run one to the LUs.
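
A minimal sketch of the idea, assuming the parsed options carry
'dry_run' and 'debug' attributes and the opcodes have 'dry_run' and
'debug_level' attributes. The class and function names below are
placeholders, not the ones from cli.py.

```python
class FakeOpCode(object):
    """Stand-in for an opcode; only the generic attributes are modelled."""

    def __init__(self):
        self.dry_run = False
        self.debug_level = 0


def set_generic_opcode_opts(opcodes, options):
    """Copy generic CLI options onto every opcode (hypothetical helper)."""
    if options is None:
        # Callers that do not pass options yet keep the opcode defaults.
        return
    for op in opcodes:
        op.dry_run = bool(getattr(options, "dry_run", False))
        op.debug_level = int(getattr(options, "debug", 0) or 0)
```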

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoChange the debug CLI option to integer/count
Iustin Pop [Wed, 10 Feb 2010 16:54:22 +0000 (17:54 +0100)]
Change the debug CLI option to integer/count

This changes from boolean to integer/count (for a future differentiation
based on the actual debug level). All the uses of the code only test
its boolean status, so it still works as an integer value.
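
With optparse, a counting option can be declared as in the sketch below.
The option names are assumptions; only the use of action="count" is the
point.

```python
from optparse import OptionParser


def parse_debug(argv):
    """Return the debug level: the number of times -d/--debug was given."""
    parser = OptionParser()
    parser.add_option("-d", "--debug", dest="debug", action="count",
                      default=0, help="increase the debug level")
    options, _ = parser.parse_args(argv)
    return options.debug
```

Existing code that only checks "if opts.debug:" keeps working, since 0 is
false and any positive count is true.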

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoAdd a generic 'debug_level' attribute to opcodes
Iustin Pop [Tue, 9 Feb 2010 13:01:59 +0000 (14:01 +0100)]
Add a generic 'debug_level' attribute to opcodes

Also automatically fix opcodes which have this missing in the LU init
routine.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

14 years agoFix bug introduced in commit 413b747
Michael Hanselmann [Wed, 10 Feb 2010 13:47:52 +0000 (14:47 +0100)]
Fix bug introduced in commit 413b747

While commit 413b747 fixed the issue of poll(2) returning too
soon, it didn't work when the poll(2) call should've been
blocking. This is now fixed and verified.
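
One way to handle both cases is sketched below: retry when poll(2)
returns early, and keep blocking when no timeout was requested. This is
an illustration under assumptions (function name, timeout in seconds),
not the code from either commit.

```python
import math
import select
import time


def wait_for_condition(fd, event, timeout):
    """Wait for 'event' on 'fd'; timeout in seconds, or None to block.

    Returns the reported event mask, or None if the timeout expired.
    """
    poller = select.poll()
    poller.register(fd, event)
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if deadline is None:
            remaining_ms = None  # poll() blocks until an event arrives
        else:
            remaining = max(0.0, deadline - time.monotonic())
            remaining_ms = int(math.ceil(remaining * 1000))
        ready = poller.poll(remaining_ms)
        if ready:
            return ready[0][1]
        # An empty result before the deadline means poll() returned too
        # soon; loop again with the recomputed remaining time.
        if deadline is not None and time.monotonic() >= deadline:
            return None
```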

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>