ganeti-local
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:04:08 +0000 (18:04 +0300)]
Remove 15-second sleep from LUInstanceCreate

Remove the 15-second sleep when wait_for_sync is not set. LUInstanceCreate
already calls _WaitForSync with oneshot=True, which performs an internal
wait-loop for disks to start syncing.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 4 Aug 2011 10:52:40 +0000 (12:52 +0200)]
Add a readability alias

lu.glm.list_owned becomes lu.owned_locks, which is clearer for the
reader.

Also rename three variables (previously named owned_locks) to make it
clearer what they track.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Thu, 4 Aug 2011 12:29:09 +0000 (14:29 +0200)]
Fix broken object references in docstrings

The module is called “objects”, not “object”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 09:46:10 +0000 (11:46 +0200)]
Add “gnt-instance change-group” command

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 09:45:46 +0000 (11:45 +0200)]
Add opcode to change instance's group

This is quite similar to evacuating a group, but the locking
is different.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 4 Aug 2011 12:21:05 +0000 (14:21 +0200)]
Factorize checking instance's node groups

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 14:35:10 +0000 (16:35 +0200)]
ganeti-cleaner: Remove old watcher state files

Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.
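
A minimal sketch of the age-based cleanup, written in Python purely for
illustration (it is not the cleaner's actual code). The file name pattern
watcher.*.data and the function name are assumptions; only the 21-day
retention comes from the message above.

```python
import glob
import os
import time

MAX_AGE_DAYS = 21  # retention period named in the commit message


def remove_old_state_files(data_dir, now=None, max_age_days=MAX_AGE_DAYS):
    """Delete watcher state files not modified within max_age_days.

    The glob pattern is an assumption about how per-group state files
    are named. Returns the sorted list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for path in glob.glob(os.path.join(data_dir, "watcher.*.data")):
        # Only files whose last modification is older than the cutoff go.
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

State files of groups that still exist keep getting rewritten by their
watcher, so only the files of removed groups age past the cutoff.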

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:14:36 +0000 (15:14 +0200)]
Remove WATCHER_STATEFILE constant

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:15:00 +0000 (15:15 +0200)]
cfgupgrade: Remove old watcher state file

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:49:20 +0000 (15:49 +0200)]
ganeti-watcher: Split for node groups

This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.

The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).

Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.

Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.
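
The two-level locking described above can be sketched as follows. This is
an illustration only: it assumes flock(2)-style locks via Python's
fcntl.flock, and the helper names lock_file and start_group_watcher are
hypothetical, not the watcher's real functions.

```python
import fcntl
import os


def lock_file(path, exclusive, blocking=True):
    """Open path and take a flock() on it; return the locked descriptor."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    flags = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        flags |= fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
    except OSError:
        os.close(fd)
        raise
    return fd


def start_group_watcher(lock_path, state_path):
    """Take both locks a per-group watcher needs (hypothetical helper)."""
    # Shared lock: any number of watchers may hold it at once, but a
    # command holding it exclusively keeps all of them out.
    global_fd = lock_file(lock_path, exclusive=False, blocking=False)
    try:
        # Exclusive lock: at most one watcher per node group state file.
        state_fd = lock_file(state_path, exclusive=True, blocking=False)
    except OSError:
        os.close(global_fd)
        raise
    return global_fd, state_fd
```

A command that wants to block the watcher would call
lock_file(lock_path, exclusive=True), which waits until every watcher has
released its shared lock.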

The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.

Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 09:44:28 +0000 (11:44 +0200)]
Lock potential target nodes for group evacuation

All potential target nodes should be locked while calculating
a group evacuation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 08:46:51 +0000 (10:46 +0200)]
Small changes in group evacuation

- Use OpPrereqError in CheckPrereq
- Clarify command synopsis

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 08:43:41 +0000 (10:43 +0200)]
cmdlib: Factorize getting iallocator

The same logic will be used for changing an instance's group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 3 Aug 2011 15:01:10 +0000 (17:01 +0200)]
Add design document for Ganeti 2.5

Including the designs which were actually implemented.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:03:38 +0000 (18:03 +0300)]
Pause DRBD sync for OS install if not wait_for_sync

When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync
in the background while performing the rest of the installation steps,
including OS installation.

However, OS installation is a very disk-intensive task that interferes badly
with the background I/O caused by DRBD's initial sync. To this end, we pause
the background sync before OS installation and unpause it afterwards, which
yields a significant speed boost for OS installation. The following should be
noted:

a) The user has requested not to wait for sync, i.e. the instance will be
   non-redundant for an unspecified interval anyway and delaying this by a
   couple of minutes is not a big compromise.

b) This approach is also followed during disk wiping.
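
The pause/unpause pattern can be sketched like this. It is a toy model,
not the LU code: pause_fn, resume_fn and install_os are hypothetical
callables standing in for the real RPC calls, and the try/finally is an
assumption about how one would guarantee the unpause.

```python
from contextlib import contextmanager


@contextmanager
def paused_sync(pause_fn, resume_fn, disks):
    """Pause background sync for disks and always resume it afterwards."""
    pause_fn(disks)
    try:
        yield
    finally:
        # Resume even if the wrapped step fails, so the instance does not
        # stay non-redundant longer than necessary.
        resume_fn(disks)


def create_instance(disks, wait_for_sync, install_os, pause_fn, resume_fn):
    """Run the OS install, pausing sync only if we did not wait for it."""
    if wait_for_sync:
        # The sync has already finished, so there is nothing to pause.
        install_os()
        return
    with paused_sync(pause_fn, resume_fn, disks):
        install_os()
```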

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: simplify an if check]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Wed, 3 Aug 2011 15:57:39 +0000 (17:57 +0200)]
Fix documentation of gnt-instance failover

Explain that we only start the instance on the new node if it was
originally running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Aug 2011 15:08:57 +0000 (17:08 +0200)]
Small doc patch for gnt-node evacuate

Briefly explain the relation between node evacuate and the instance
commands.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 2 Aug 2011 08:14:05 +0000 (10:14 +0200)]
Change the backend.InstanceLogName signature

This now uses the component for the transfer (if available); otherwise
(e.g. in installs/renames) it uses nothing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 2 Aug 2011 08:12:05 +0000 (10:12 +0200)]
Instance transfer: export component name to backend

This modifies the RPC layer to export the component name too to the
backend, so that it can be used in log files and messages.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 2 Aug 2011 08:06:34 +0000 (10:06 +0200)]
Instance transfer: add argument for the 'component'

Currently, data transfer is done mainly with just the instance name,
but when we have instances with multiple disks this is not enough to
distinguish between the different transfers being done for the
instance.

Some parts of the code do have knowledge of the part being transferred
(i.e. DiskTransfer.name), but if I understood correctly not all, so I
decided to add a new argument to the respective disk import/disk
export classes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Aug 2011 11:48:46 +0000 (13:48 +0200)]
Optimise use of repeated/looping GetInstanceInfo

Similar to the previous patch, this adds a helper function to
eliminate repeated calls into ConfigWriter.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Aug 2011 11:16:48 +0000 (13:16 +0200)]
Optimise use of repeated/looping GetNodeInfo

This adds a new ConfigWriter.GetMultiNodeInfo function and replaces
multiple/looping calls to GetNodeInfo with it.
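
A toy model of the idea, assuming the cost being avoided is per-call
overhead such as taking the configuration lock. ToyConfig and its method
signatures are hypothetical and only mirror the names above loosely; the
real ConfigWriter may differ.

```python
import threading


class ToyConfig:
    """Stand-in for a config store guarded by a single lock."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)
        self._lock = threading.Lock()
        self.lock_acquisitions = 0  # counts how often the lock was taken

    def get_node_info(self, name):
        """Look up one node; takes the lock once per call."""
        with self._lock:
            self.lock_acquisitions += 1
            return self._nodes.get(name)

    def get_multi_node_info(self, names):
        """Look up many nodes under a single lock acquisition."""
        with self._lock:
            self.lock_acquisitions += 1
            return [(name, self._nodes.get(name)) for name in names]
```

A loop such as [cfg.get_node_info(n) for n in names] takes the lock once
per node, while cfg.get_multi_node_info(names) takes it once in total.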

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Aug 2011 10:59:02 +0000 (12:59 +0200)]
Fix lint errors

It turns out that the only use of the operator module was for
itemgetter, so patch eb62069e should have removed that import too.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

René Nussbaumer [Wed, 3 Aug 2011 12:34:24 +0000 (14:34 +0200)]
gnt-node.rst: Fix a typo

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 3 Aug 2011 09:47:41 +0000 (11:47 +0200)]
Add two more compat functions

operator.itemgetter(0) → fst
operator.itemgetter(1) → snd

snd is not used yet, but it makes sense to add both.
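
For illustration, one way such helpers could be defined and used; the
definitions below are a sketch and not necessarily how the compat module
writes them.

```python
def fst(seq):
    """Return the first element, like operator.itemgetter(0)."""
    return seq[0]


def snd(seq):
    """Return the second element, like operator.itemgetter(1)."""
    return seq[1]


pairs = [("node1", 3), ("node2", 1)]
names = list(map(fst, pairs))      # first element of every pair
by_count = sorted(pairs, key=snd)  # order pairs by their second element
```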

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Tue, 2 Aug 2011 13:01:34 +0000 (15:01 +0200)]
Fix types passed to IAllocator

In iallocator mode reloc, the reloc_from parameter takes a list. Half of
the code already forced this parameter to a list; we add the other two
cases where it is needed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 2 Aug 2011 12:59:00 +0000 (14:59 +0200)]
htools: change absolute to relative symlinks

Currently we use absolute symlinks, but this doesn't work when we
install remotely (because we first install to a local temp dir, then
rsync to remote machines). To fix this, we change to manually-computed
relative paths, which is not ideal, but it works.
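
To show why relative links survive the staging step, here is a small
Python sketch (the actual change lives in the build system, not in
Python). The helper name and the example paths are made up for the
illustration.

```python
import os


def make_relative_symlink(target, link_path):
    """Create link_path pointing at target via a relative path.

    Returns the relative path stored in the link.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Express the target relative to the directory holding the link, so
    # the link keeps working when the whole tree is copied elsewhere.
    rel_target = os.path.relpath(os.path.abspath(target), link_dir)
    os.symlink(rel_target, link_path)
    return rel_target
```

With a staged tree containing lib/htools and bin/hbal, the link stores
../lib/htools and therefore still resolves after the tree is moved or
rsynced to another prefix.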

One possible alternative would be to use hard-links…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 2 Aug 2011 09:48:09 +0000 (11:48 +0200)]
jqueue: Add short delay before detecting job changes

By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).

This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
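
The effect of the delay can be sketched as below. The function and its
parameters are hypothetical stand-ins for the job queue internals; only
the 100ms figure comes from the text above.

```python
import time


def wait_for_job_change(wait_for_notification, read_job,
                        delay=0.1, sleep=time.sleep):
    """Block until the job file changes, then settle briefly before reading.

    wait_for_notification and read_job are hypothetical callables; sleep
    is injectable so the behaviour can be tested without real waiting.
    """
    wait_for_notification()  # returns once the job file was modified
    sleep(delay)             # give the job time to change again
    return read_job()        # one read now covers every change so far
```

Changes that arrive during the delay are reported in the same reply, so
the client needs fewer round trips to follow a fast-moving job.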

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 28 Jul 2011 11:37:20 +0000 (13:37 +0200)]
Add primary/second nodes' group as query fields

These will be very useful for ganeti-watcher as it needs to retrieve
instances by group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 2 Aug 2011 06:58:27 +0000 (08:58 +0200)]
Fix doclint failures

Commit 54ca6e4b2 renamed some arguments, but didn't also rename them
in the docstrings.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:56:05 +0000 (15:56 +0200)]
watcher: Separate function for writing instance status file

For now this will do another query to the master daemon, but with the
split for node groups this issue will go away.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:49:55 +0000 (15:49 +0200)]
watcher: Make RAPI error messages less technical

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:43:14 +0000 (15:43 +0200)]
watcher.state: Use strings, not objects

Until now the state class would receive instances as objects
(ganeti.watcher.Instance), but this is not necessary. By using strings
the interface is simplified.

This patch also simplifies some code accessing the internal structures,
e.g. setting a key of a dictionary. Some instances of “del dict[key]”
are replaced with “dict.pop(key, None)” to suppress any exceptions if
the key doesn't exist.
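
The difference between the two forms, shown on a made-up state
dictionary (the keys and values are illustrative only):

```python
state = {"inst1": {"restart_count": 2}}

# "del" raises KeyError when the key is missing.
try:
    del state["inst2"]
    missing_raised = False
except KeyError:
    missing_raised = True

# pop() with a default returns the default instead of raising.
removed = state.pop("inst2", None)
existing = state.pop("inst1", None)
```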

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:20:42 +0000 (15:20 +0200)]
watcher: Raise error on unknown hook status

Also, remove punctuation from one error message.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:19:04 +0000 (15:19 +0200)]
watcher: Reformat constants

Make them match with style guide.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 29 Jul 2011 13:13:58 +0000 (15:13 +0200)]
Add new watcher constants

WATCHER_STATEFILE will be removed at the end of this
patch series.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Stephen Shirley [Fri, 29 Jul 2011 12:15:40 +0000 (14:15 +0200)]
Fix formatting of frozensets

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 28 Jul 2011 09:26:36 +0000 (11:26 +0200)]
cli: Add constant for node group option

ganeti-watcher will use this constant to pass the option to itself for
processing all node groups.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Fri, 29 Jul 2011 08:55:44 +0000 (10:55 +0200)]
Replace %r with '%s' in masterd/instance.py

I still don't know why Michael is a fan of %r, but in the meantime
this patch changes:

WARNING: import u'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1

into:

WARNING: import 'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Stephen Shirley [Mon, 20 Jun 2011 15:52:55 +0000 (17:52 +0200)]
Add "reboot_behavior" hypervisor flag

During instance installations, you do not want the instance to reboot
and start again with the same parameters, as that will most likely
re-start the install process. Therefore, when the instance requests a
reboot it should instead shut down. This flag allows this to be
controlled.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Andrea Spadaccini [Fri, 29 Jul 2011 09:18:55 +0000 (10:18 +0100)]
Removed non-existing -t option from the gnt-cluster man page

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 28 Jul 2011 13:21:30 +0000 (15:21 +0200)]
Clear the OS scripts environment

The OS scripts currently run with the whole noded environment; this is
different from the hooks which run with a cleared one and most likely
an oversight.

This _might_ create problems when upgrading, so it needs to be clearly
announced for the new version.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 27 Jul 2011 08:22:20 +0000 (10:22 +0200)]
watcher: Split state class into separate module

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 27 Jul 2011 08:46:52 +0000 (10:46 +0200)]
Rename watcher's constant for instance status file

“upfile” is a bad name.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Andrea Spadaccini [Thu, 28 Jul 2011 19:37:04 +0000 (20:37 +0100)]
Fixed a typo in the installation tutorial

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 12:14:18 +0000 (14:14 +0200)]
watcher: Split node maintenance into separate module

The node maintenance class is standalone.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Andrea Spadaccini [Thu, 28 Jul 2011 11:23:47 +0000 (12:23 +0100)]
Fixed doc compilation under Sphinx 1.0.7

Sphinx 1.0.7 complains if an indented block in .warning starts with :option.
This fixes it.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 28 Jul 2011 11:10:28 +0000 (13:10 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  Add support for cluster/OS parameters in QA
  Add OS search path to gnt-cluster info

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 27 Jul 2011 15:18:48 +0000 (17:18 +0200)]
Remove requirement for variants on OS API v15+

This removes:

- the check in backend that such OSes have a variants file and, if it
  exists, that it is non-empty; in order for this to work, we also
  rework the logic in backend._TryOSFromDisk to allow for optional OS
  files
- the check in cluster verify that such OSes have a non-empty variant
  list (the check for consistent variants is still kept)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Thu, 28 Jul 2011 09:18:08 +0000 (11:18 +0200)]
Add support for cluster/OS parameters in QA

Currently there is no way to QA with (for example) an initrd because
the QA only inits the cluster with the default parameters. This makes
it impossible to QA using anything but the default parameters, which
doesn't always work.

Additionally, we add OS parameters and OS hypervisor parameters, for
completeness and for testing that these commands also work.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 27 Jul 2011 16:46:47 +0000 (18:46 +0200)]
Revert "cli.JobExecutor: Feedback function for info output"

This reverts commit 7421df8e5f2cf31022085b332d1300640ba5854b.

The feedback_fn argument to JobExecutor is used for PollJob, and thus
has a fixed signature: a single arg, tuple of (timestamp, log type,
log message). Its use as a drop-in replacement for ToStdout doesn't
work, as that function has a different signature.

For now, I propose to revert this, until we either change JobExecutor
to use the same log messages (and add an intermediate wrapper between
JobExecutor and ToStdout) or we add another parameter to
JobExecutor.__init__.
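
The intermediate wrapper mentioned as one option could look roughly like
this. It is a sketch, not proposed code: to_stdout only imitates a
printf-style output helper, and both function signatures here are
assumptions made for the example.

```python
import sys


def to_stdout(fmt, *args, stream=None):
    """Stand-in for a printf-style output helper (hypothetical signature)."""
    stream = sys.stdout if stream is None else stream
    stream.write((fmt % args if args else fmt) + "\n")


def make_feedback_fn(output_fn):
    """Adapt a printf-style function to the single-tuple feedback signature."""
    def feedback_fn(entry):
        # The feedback argument is one tuple: (timestamp, log type, message).
        _timestamp, _log_type, log_msg = entry
        # Pass the message as an argument, never as the format string, so
        # "%" characters inside it are printed literally.
        output_fn("%s", log_msg)
    return feedback_fn
```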

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Agata Murawska [Thu, 28 Jul 2011 08:10:02 +0000 (10:10 +0200)]
Extend the ovf-support design with format translation

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Wed, 27 Jul 2011 16:45:16 +0000 (18:45 +0200)]
Add a QA constant for cluster verify command

This seems to be used and reused multiple times, let's abstract it…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 27 Jul 2011 16:02:46 +0000 (18:02 +0200)]
Fix group verification of offline nodes

Commit aef59ae7 reworked the file verification, but forgot to take
into account offline nodes.

This has gone undetected so far because we don't test clusters with
offline nodes in QA :(

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 27 Jul 2011 11:53:24 +0000 (13:53 +0200)]
Disallow variants for OSes that don't support them

Otherwise we get no variant checks at all, but the variant is still
recorded.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 27 Jul 2011 10:32:43 +0000 (12:32 +0200)]
Fix QA OS API failure

The patch changing the OS API in QA to 20 was not complete, sorry.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 26 Jul 2011 17:11:28 +0000 (19:11 +0200)]
QA: test using OS API v20

v20 is (mostly) a superset of the other versions, so testing with it
should be better than with v10. This properly detects the breakage
fixed by the previous patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 26 Jul 2011 16:34:46 +0000 (18:34 +0200)]
Fix OS queries for API v20 w/parameters

OS parameters is a list of tuples, so we can't pass it directly to
utils.NiceSort, hence we use a sort key.
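
A small sketch of the problem and the fix. nice_sort below is a
simplified stand-in written for this example, not the code of
utils.NiceSort, and the parameter tuples are made up.

```python
import re


def nice_sort(items, key=None):
    """Sort so embedded numbers compare numerically (simplified stand-in)."""
    def natural(value):
        text = value if key is None else key(value)
        # Split into digit and non-digit runs; tag each part so numbers
        # and strings are never compared with each other directly.
        return [(0, int(part), "") if part.isdigit() else (1, 0, part)
                for part in re.split(r"(\d+)", text)]
    return sorted(items, key=natural)


# OS parameters as (name, description) tuples.
params = [("param10", "tenth"), ("param2", "second")]
sorted_params = nice_sort(params, key=lambda p: p[0])
```

Calling nice_sort(params) without a key fails in this sketch, because the
string splitting is applied to a tuple rather than to a string.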

This was not detected in QA since QA only tests API v10 :(

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 26 Jul 2011 11:03:20 +0000 (13:03 +0200)]
Add helper for declaring all locks shared

This patch adds a function for abstracting
“dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicate
assignment for the share_locks in LUInstanceQuerydata.

Additionally, it moves the _SupportsOob function to the helper
function list.
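
Such a helper might look like the following. The LEVELS list and the
function name share_all are placeholders for this sketch; only the
dict.fromkeys expression is taken from the message.

```python
# Placeholder for locking.LEVELS; the real list of levels may differ.
LEVELS = ["cluster", "instance", "nodegroup", "node"]


def share_all():
    """Return a share_locks mapping that declares every level shared."""
    # 1 means "acquire in shared mode" for the given locking level.
    return dict.fromkeys(LEVELS, 1)
```

Each call builds a fresh dictionary, so one LU changing a level back to
exclusive does not affect any other.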

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 11:12:11 +0000 (13:12 +0200)]
Add ht-based result checks to opcodes

This adds the infrastructure necessary to check opcode results using
ht-based functions. Checks are added for two opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 09:34:00 +0000 (11:34 +0200)]
Change OpClusterVerifyDisks to per-group opcodes

Until now verifying disks, which is also used by the watcher,
would lock all nodes and instances. With this patch the opcode
is changed to operate per node group, requiring fewer locks.

Both “gnt-cluster” and “ganeti-watcher” are changed for the
new interface.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 09:32:22 +0000 (11:32 +0200)]
cmdlib: Give instance name in error message on group evacuation

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 09:31:42 +0000 (11:31 +0200)]
cmdlib: Factorize mapping instance LVs to node/volume

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 26 Jul 2011 09:31:04 +0000 (11:31 +0200)]
cli.JobExecutor: Feedback function for info output

This will be used in the watcher where we don't want to
pollute stdout unless in debug mode.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Ben Lipton [Mon, 25 Jul 2011 17:22:36 +0000 (13:22 -0400)]
Add OS search path to gnt-cluster info

Otherwise, it's pretty hard to figure it out from the command line.

Signed-off-by: Ben Lipton <benlipton@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Tue, 26 Jul 2011 08:46:02 +0000 (10:46 +0200)]
cluster-merge: remove a hardcoded constant

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Tue, 26 Jul 2011 08:23:52 +0000 (10:23 +0200)]
cluster-merge: remove option list from usage

It doesn't make sense to have to keep them up to date twice, and --help
already lists all of them with help strings.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Mon, 25 Jul 2011 15:17:23 +0000 (15:17 +0000)]
cluster-merge: add instance restart strategy opt

Right now we always restart all instances, which is not right if some
instances were already down for other reasons. Thus we add an option to
decide how to handle this. The right default should be "up" which is:
"restart all instances which were switched off by the merge", but since
that's not implemented yet, the default remains the old one, for now.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Mon, 25 Jul 2011 16:42:52 +0000 (18:42 +0200)]
Fix recompilation of htools on regen-vcs-version

Currently, most htools code depends on Constants.hs which is generated
from constants.py and also depends on _autoconf.py. Also, _autoconf.py
depends on vcs-version, which all together means that when 'make
regen-vcs-version' is run, for example by ./devel/upload, most of the
Haskell code needs recompilation.

Since htools already has its 'optimised' vcs-version (and doesn't use
the _autoconf.VCS_VERSION constants), we can optimise this as follows:

- _autoconf.py doesn't contain the VCS_VERSION anymore, and that is
  instead moved to _vcsversion.py
- constants.py depends on and imports this new module
- _autoconf.py doesn't get regenerated at vcs-version changes, but
  only at re-running configure/changing Makefile time

The end result is that only htools/Ganeti/HTools/Version.hs is
recompiled now, which is a significant speedup (usually < 1 second
versus 10 seconds previously).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Mon, 25 Jul 2011 16:26:18 +0000 (18:26 +0200)]
Add another name for the --yes-do-it option

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Mon, 25 Jul 2011 11:07:36 +0000 (13:07 +0200)]
Most boring patch ever

s/'/"/ in (hopefully) the right places.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Mon, 25 Jul 2011 13:02:23 +0000 (15:02 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  Reopen daemon's stdio on SIGHUP
  Reopen log file only once after SIGHUP
  Don't leak file descriptors when setting up daemon output
  Fix aliases in bash completion

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 25 Jul 2011 12:35:51 +0000 (14:35 +0200)]
Reopen daemon's stdio on SIGHUP

Before this patch daemons would continue to refer to an old logfile for
their standard I/O if they had been asked to reopen the log (SIGHUP).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 25 Jul 2011 11:08:22 +0000 (13:08 +0200)]
Reopen log file only once after SIGHUP

Commit b6fa9a44 added a re-openable log handler. The log file is
reopened when a daemon is sent a HUP signal. Due to a bug in the code,
fixed by this patch, the log file would be reopened for every single log
message thereafter.
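
The intended behaviour can be sketched with a small handler built on
Python's logging.FileHandler. The class and method names are made up for
the illustration; the assumption is that the bug amounted to a "reopen
requested" flag that was never cleared.

```python
import logging


class ReopenableFileHandler(logging.FileHandler):
    """File handler that reopens its file once per reopen request."""

    def __init__(self, filename):
        super().__init__(filename, mode="a")
        self._reopen = False
        self.reopen_count = 0  # for illustration only

    def request_reopen(self):
        """Ask for a reopen before the next record (e.g. from a HUP handler)."""
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            # Clear the flag first; without this the file would be
            # reopened for every later record as well.
            self._reopen = False
            if self.stream is not None:
                self.stream.close()
            self.stream = self._open()
            self.reopen_count += 1
        super().emit(record)
```

After the log file is rotated away and request_reopen() is called, the
next record goes to a newly created file and later records reuse the same
stream.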

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 25 Jul 2011 10:02:24 +0000 (12:02 +0200)]
Don't leak file descriptors when setting up daemon output

When a daemon's output is configured using “utils.SetupDaemonFDs”, the
function must use dup2(2). Unfortunately the code didn't close the
original file descriptors, leaking them in the process.
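
A minimal sketch of the pattern, assuming a helper that points one
descriptor at a file. redirect_fd is a hypothetical name and is not the
body of utils.SetupDaemonFDs.

```python
import os


def redirect_fd(path, target_fd):
    """Make target_fd refer to path without leaking a descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        if fd != target_fd:
            # After dup2() both descriptors refer to the same open file.
            os.dup2(fd, target_fd)
    finally:
        if fd != target_fd:
            # Close the temporary descriptor; omitting this is the kind
            # of leak the message describes.
            os.close(fd)
```

A daemon would typically call this for descriptors 1 and 2; the guard for
fd == target_fd covers the case where open() already returned the wanted
number.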

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agohtools: rework the algorithm for ChangeAll mode
Iustin Pop [Mon, 18 Jul 2011 11:17:41 +0000 (13:17 +0200)]
htools: rework the algorithm for ChangeAll mode

I think I've identified the problem with the current ChangeAll
mode. The current algorithm works as follows:

- identify a new primary by choosing the node which gives best score
  as new secondary
- failover to it
- identify a new secondary by choosing the node which gives best score
  as new secondary

This means that the future primary is 'fixed' after the first
iteration, leading to possibly suboptimal results. This patch changes
the algorithm to do what, in hindsight, seems the obvious thing to do:
- generate all pairs (primary, secondary)
- identify the pair that after the above sequence (r:np, f, r:ns)
  gives the best group score

This fixes some of the corner cases I've seen in relocation, but not
all; the remaining cases are related to multi-instance relocation and
while they can't be fixed in the current framework, the needed
rebalancing is much smaller than with the current algorithm.
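
The pair search can be sketched as follows. This is Python rather than the
Haskell used by htools, and it is only an outline: “score” is a hypothetical
callable standing in for "group score after the sequence (r:np, f, r:ns)",
and the assumption that a lower score is better is mine.

```python
import itertools


def best_pair(nodes, score):
    """Return the (primary, secondary) pair with the lowest score.

    score is a hypothetical callable taking (primary, secondary) and
    returning the resulting group score; lower is assumed to be better.
    """
    # All ordered pairs of distinct nodes, so no primary is fixed up front.
    pairs = itertools.permutations(nodes, 2)
    return min(pairs, key=lambda pair: score(*pair))
```

Compared with choosing the primary first and the secondary afterwards, this
evaluates every combination, at the cost of a quadratic number of score
computations.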

The patch also fixes an issue with the docstring of another function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agognt-instance info: Return static info if node offline
Michael Hanselmann [Fri, 22 Jul 2011 11:27:05 +0000 (13:27 +0200)]
gnt-instance info: Return static info if node offline

Before this patch “gnt-instance info” would fail with the error message
“Error checking node $node: Node is marked offline” if the instance's
primary node is marked offline and the user didn't explicitly request
static information only. With this patch the LU will automatically
return static information if the instance's primary node is marked
offline.

Some explicit loops are changed to map().

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoIgnore offline primary when failing over
Michael Hanselmann [Fri, 22 Jul 2011 11:04:57 +0000 (13:04 +0200)]
Ignore offline primary when failing over

When the source node for a failover is marked offline, there's no need
to require the user to specify “--ignore-consistency”.

To make it work at all, a number of bugs introduced by the merge of
migration and failover are also fixed by this patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agohtools: replace two hardcoded uses of pri+sec nodes
Iustin Pop [Fri, 15 Jul 2011 08:53:45 +0000 (10:53 +0200)]
htools: replace two hardcoded uses of pri+sec nodes

These two cases replace explicit uses of the primary and secondary nodes
with Instance.allNodes, which makes the code more flexible if the
internal layout of the instance changes.

I've verified that the output of involvedNodes is not required to be
4 elements long, and as such the function docstring has been updated.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agohtools: add target_node member to migrate opcode
Iustin Pop [Sat, 9 Jul 2011 17:48:36 +0000 (19:48 +0200)]
htools: add target_node member to migrate opcode

… and failover too. Not many changes otherwise except for
serialisation and unittests.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agohtools: do not change node disk for non-local storage
Iustin Pop [Sat, 9 Jul 2011 09:17:10 +0000 (11:17 +0200)]
htools: do not change node disk for non-local storage

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agohtools: add more functions for local disk storage
Iustin Pop [Sat, 9 Jul 2011 09:01:49 +0000 (11:01 +0200)]
htools: add more functions for local disk storage

These will be used in Node.hs for proper add/remove instance
code. Furthermore, we restrict the movable status to the right disk
templates only, so that we don't attempt to move the 'wrong' instance
types.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoInitial design doc for OVF support
Agata Murawska [Thu, 21 Jul 2011 15:31:44 +0000 (17:31 +0200)]
Initial design doc for OVF support

Signed-off-by: Agata Murawska <agatamurawska@google.com>
[iustin@google.com: fixed formatting issues]

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix aliases in bash completion
Michael Hanselmann [Fri, 22 Jul 2011 09:55:46 +0000 (11:55 +0200)]
Fix aliases in bash completion

Ever since commit 2d48a3a2 aliases were not included in the bash
completion script. This patch also replaces one tab with two spaces.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agognt-instance console: Use query instead of opcode
Michael Hanselmann [Fri, 22 Jul 2011 06:14:39 +0000 (08:14 +0200)]
gnt-instance console: Use query instead of opcode

This means opening the console no longer requires the instance lock,
allowing it to be used during long-running operations (e.g. replacing a
disk).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoMerge branch 'devel-2.4'
Michael Hanselmann [Fri, 22 Jul 2011 09:05:55 +0000 (11:05 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  gnt-node volumes: Fix instance names

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd opcode attribute for comments
Michael Hanselmann [Fri, 22 Jul 2011 05:35:44 +0000 (07:35 +0200)]
Add opcode attribute for comments

This attribute allows programmatic submitters of jobs (e.g. iallocator)
to add a comment to each opcode, describing its purpose. Example:

$ gnt-job info 123
Job ID: 123
  …
  Opcodes:
    OP_INSTANCE_REPLACE_DISKS
      …
      Input fields:
        comment: Replaces disks on inst1.example.com
      …

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agognt-node volumes: Fix instance names
Michael Hanselmann [Fri, 22 Jul 2011 08:27:09 +0000 (10:27 +0200)]
gnt-node volumes: Fix instance names

Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return
the LV name, but to include the volume group name (e.g.
“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume
names in LUNodeQueryvols, stopping instance names from being displayed in
“gnt-node volumes”.

This patch fixes the issue and does some cleanup.
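
The mismatch can be pictured with a small sketch. The data layout and
function names are assumptions made for illustration, not the LU code: once
the per-node LV lists contain qualified “vg/lv” names, lookups have to be
made with the qualified name as well.

```python
def map_volumes_to_instances(instance_lvs):
    """Build a {(node, "vg/lv"): instance} lookup table.

    instance_lvs is a hypothetical {instance: {node: ["vg/lv", ...]}} dict.
    """
    result = {}
    for instance, lvs_by_node in instance_lvs.items():
        for node, names in lvs_by_node.items():
            for name in names:
                result[(node, name)] = instance
    return result


def owner_of(mapping, node, vg_name, lv_name):
    """Return the owning instance, or None if the volume is unknown."""
    # Look up by the qualified name; the bare LV name would never match.
    return mapping.get((node, "%s/%s" % (vg_name, lv_name)))
```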

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFixed one option name and a typo in the docs
Andrea Spadaccini [Thu, 21 Jul 2011 15:54:42 +0000 (16:54 +0100)]
Fixed one option name and a typo in the docs

The -g vg-name option was deprecated in commit
04367e70ad71eea3f0f19e7889dc68fb9783c98a.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix instance failover (missing argument)
Michael Hanselmann [Thu, 21 Jul 2011 13:22:23 +0000 (15:22 +0200)]
Fix instance failover (missing argument)

More fallout from commit 323f9095b49d.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoImplement instance failover via RAPI
Michael Hanselmann [Thu, 21 Jul 2011 13:20:45 +0000 (15:20 +0200)]
Implement instance failover via RAPI

No idea why this was missed before.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoExport job dependencies through lock monitor
Michael Hanselmann [Thu, 21 Jul 2011 08:47:27 +0000 (10:47 +0200)]
Export job dependencies through lock monitor

This makes them visible to the user. Example:

$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agolocking.GLM: Allow adding locks to monitor
Michael Hanselmann [Thu, 21 Jul 2011 08:49:21 +0000 (10:49 +0200)]
locking.GLM: Allow adding locks to monitor

This will be used for exporting job dependencies through
the lock monitor.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoMake lock monitor more versatile
Michael Hanselmann [Wed, 13 Jul 2011 20:43:22 +0000 (22:43 +0200)]
Make lock monitor more versatile

With this change it'll be possible to register other lock information
providers. One use case for this is job dependencies, which can be shown
in the output of “gnt-debug locks”, too.

The lock monitor is changed to accept more than one return value from
the function providing the information. Unfortunately it's hard to keep
weak references to bound methods, so I settled on keeping a weak
reference on the object instead (see note in docstring).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoUpdate documentation regarding Haskell dependencies
Iustin Pop [Fri, 8 Jul 2011 14:07:42 +0000 (16:07 +0200)]
Update documentation regarding Haskell dependencies

These were forgotten when the supported library versions were changed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: add two more small unittests
Iustin Pop [Fri, 8 Jul 2011 13:52:14 +0000 (15:52 +0200)]
htools: add two more small unittests

This adds tests for the opToResult and eitherToResult functions from
Types.hs, and changes two other tests for the same module to test JSON
serialisation (which automatically also tests the lower-level to/from
string conversion functions).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: update hail man page with the new modes
Iustin Pop [Fri, 8 Jul 2011 13:23:26 +0000 (15:23 +0200)]
htools: update hail man page with the new modes

Also mark the deprecated modes we no longer support.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: a few more hlint fixes
Iustin Pop [Fri, 8 Jul 2011 13:18:07 +0000 (15:18 +0200)]
htools: a few more hlint fixes

Tested only on GHC 7.x, will test on 6.1x too before commit.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: further docstring fixes
Iustin Pop [Fri, 8 Jul 2011 12:52:01 +0000 (14:52 +0200)]
htools: further docstring fixes

This adds parameter documentation for Cluster.iMoveToJob (I think it
was not clear if the new or old node list is needed) and fixes other
docstring style issues.

After this patch, all modules except for CLI.hs (which has many
obvious declarations for command-line options) and QC.hs (unittests)
have 100% doc-strings.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: add JSON instance for EvacMode
Iustin Pop [Fri, 8 Jul 2011 12:19:17 +0000 (14:19 +0200)]
htools: add JSON instance for EvacMode

This abstracts the JSON parsing of the type EvacMode near its
definition, and simplifies its conversion in IAlloc.parseData.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>