Michael Hanselmann [Fri, 29 Jul 2011 14:35:10 +0000 (16:35 +0200)]
ganeti-cleaner: Remove old watcher state files
Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:14:36 +0000 (15:14 +0200)]
Remove WATCHER_STATEFILE constant
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:15:00 +0000 (15:15 +0200)]
cfgupgrade: Remove old watcher state file
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:49:20 +0000 (15:49 +0200)]
ganeti-watcher: Split for node groups
This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.
The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).
Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.
Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.
The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.
Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
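The locking scheme described above can be sketched as follows. This is a minimal illustration, assuming advisory flock(2)-style locks; the helper names and the lock file name used in the usage note are hypothetical and not taken from the patch.

```python
import fcntl
import os


def acquire_lock(path, exclusive, blocking=True):
    """Open path and take a shared or exclusive advisory lock on it.

    Returns the file descriptor; closing it releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    flags = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        flags |= fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
    except OSError:
        os.close(fd)
        raise
    return fd


def try_lock(path, exclusive):
    """Return the fd on success, or None if an incompatible lock is held."""
    try:
        return acquire_lock(path, exclusive, blocking=False)
    except BlockingIOError:
        return None
```

Under these assumptions every watcher process would call try_lock(lock_path, exclusive=False) on the dedicated lock file, so any number of them can run concurrently, while a command that wants to block the watcher calls acquire_lock(lock_path, exclusive=True) and waits until all shared holders are gone. The per-group state file would be locked exclusively in the same way.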
Michael Hanselmann [Wed, 3 Aug 2011 09:44:28 +0000 (11:44 +0200)]
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculating
a group evacuation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 08:46:51 +0000 (10:46 +0200)]
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq
- Clarify command synopsis
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 08:43:41 +0000 (10:43 +0200)]
cmdlib: Factorize getting iallocator
The same logic will be used for changing an instance's group.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 15:01:10 +0000 (17:01 +0200)]
Add design document for Ganeti 2.5
Including the designs which were actually implemented.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:03:38 +0000 (18:03 +0300)]
Pause DRBD sync for OS install if not wait_for_sync
When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync
in the background while performing the rest of the installation steps,
including OS installation.
However, OS installation is a very disk-intensive task that interferes badly
with the background I/O caused by DRBD's initial sync. To this end, we pause
the background sync before OS installation and unpause it afterwards, which
yields a significant speed boost for OS installation. The following should be
noted:
a) The user has requested not to wait for sync, i.e. the instance will be
non-redundant for an unspecified interval anyway and delaying this by a
couple of minutes is not a big compromise.
b) This approach is also followed during disk wiping.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: simplify an if check]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:57:39 +0000 (17:57 +0200)]
Fix documentation of gnt-instance failover
Explain that we only start the instance on the new node if it was
originally running.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:08:57 +0000 (17:08 +0200)]
Small doc patch for gnt-node evacuate
Just explain a bit the relation between node evacuate and instance
commands.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:23:04 +0000 (17:23 +0200)]
Fix small typo in docstring
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 11:44:00 +0000 (13:44 +0200)]
Fix typo in NEWS
“--dry-run” starts with two dashes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:14:05 +0000 (10:14 +0200)]
Change the backend.InstanceLogName signature
This now uses the component for the transfer (if available), otherwise
(e.g. in installs/renames) nothing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:12:05 +0000 (10:12 +0200)]
Instance transfer: export component name to backend
This modifies the RPC layer to export the component name too to the
backend, so that it can be used in log files and messages.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:06:34 +0000 (10:06 +0200)]
Instance transfer: add argument for the 'component'
Currently, transfer data is done mainly with just the instance name,
but when we have instances with multiple disks this is not enough to
distinguish between the different transfers being done for the
instance.
Some parts of the code do have knowledge of the part being transferred
(i.e. DiskTransfer.name), but if I understood correctly not all, so I
decided to add a new argument to the respective disk import/disk
export classes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 11:48:46 +0000 (13:48 +0200)]
Optimise use of repeated/looping GetInstanceInfo
Similar to the previous patch, this adds a helper function to
eliminate repeated calls into ConfigWriter.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 11:16:48 +0000 (13:16 +0200)]
Optimise use of repeated/looping GetNodeInfo
This adds a new ConfigWriter.GetMultiNodeInfo function and replaces
multiple/looping calls to GetNodeInfo with it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 10:59:02 +0000 (12:59 +0200)]
Fix lint errors
It turns out that the only use of the operator module was for
itemgetter, so patch eb62069e should have removed that import too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Wed, 3 Aug 2011 12:34:24 +0000 (14:34 +0200)]
gnt-node.rst: Fix a typo
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 09:47:41 +0000 (11:47 +0200)]
Add two more compat functions
operator.itemgetter(0) → fst
operator.itemgetter(1) → snd
snd is not used yet, but it makes sense to add both.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
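For illustration, one possible definition of the two helpers is shown below. This is a sketch only; the actual implementation in the compat module may differ.

```python
def fst(seq):
    """Return the first element of a sequence, like operator.itemgetter(0)."""
    return seq[0]


def snd(seq):
    """Return the second element of a sequence, like operator.itemgetter(1)."""
    return seq[1]


# Example data is made up for this sketch.
pairs = [("node1", 3), ("node2", 1)]
names = [fst(p) for p in pairs]
counts = [snd(p) for p in pairs]
```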
Pedro Macedo [Tue, 2 Aug 2011 15:19:36 +0000 (17:19 +0200)]
Add a flag to burnin to allow specifying VCPU count.
Signed-off-by: Pedro Macedo <pmacedo@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 13:01:34 +0000 (15:01 +0200)]
Fix types passed to IAllocator
Iallocator mode reloc, parameter reloc_from takes a list; half of the
code already forced this parameter to list, we add the other two cases
where it is needed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 12:59:00 +0000 (14:59 +0200)]
htools: change absolute to relative symlinks
Currently we use absolute symlinks, but this doesn't work when we
install remotely (due to install first to local temp dir, then rsync
to remote machines). To fix, we change to manually-computed relative
paths, which is not best, but it works.
One possible alternative would be to use hard-links…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
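The idea of a relative symlink can be sketched in Python as below. The patch itself computes the paths manually in the build system; the function name and the directory layout in the usage note are hypothetical.

```python
import os


def relative_symlink(target, link_path):
    """Create link_path pointing at target via a path relative to the link.

    Because the stored path is relative, the link keeps working when the
    whole tree is copied or moved to another prefix.
    """
    rel = os.path.relpath(target, os.path.dirname(link_path))
    os.symlink(rel, link_path)
    return rel
```

With a staging tree containing lib/htools and bin/, relative_symlink(staging + "/lib/htools", staging + "/bin/hbal") stores "../lib/htools" in the link, which stays valid after the tree is synced elsewhere.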
Michael Hanselmann [Tue, 2 Aug 2011 09:48:09 +0000 (11:48 +0200)]
jqueue: Add short delay before detecting job changes
By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).
This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
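The effect of the delay can be sketched as follows. This is a simplified illustration, not the jqueue code; all names are hypothetical, and the notification and state-reading callbacks are supplied by the caller.

```python
import time


def wait_for_change(wait_notification, read_state, last_state,
                    delay=0.1, sleep=time.sleep):
    """Wait for one change notification, pause briefly, then read the state.

    Changes that arrive during the pause are folded into a single read,
    so the caller sees fewer, larger updates.
    Returns the new state, or None if nothing actually changed.
    """
    wait_notification()
    sleep(delay)  # give the job some time to change again
    state = read_state()
    if state != last_state:
        return state
    return None
```

The sleep function is a parameter only so that the sketch can be exercised without real delays.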
Michael Hanselmann [Thu, 28 Jul 2011 11:37:20 +0000 (13:37 +0200)]
Add primary/secondary nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieve
instances by group.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 06:58:27 +0000 (08:58 +0200)]
Fix doclint failures
Commit 54ca6e4b2 renamed some arguments, but didn't also rename them
in the docstrings.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:56:05 +0000 (15:56 +0200)]
watcher: Separate function for writing instance status file
For now this will do another query to the master daemon, but with the
split for node groups this issue will go away.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:49:55 +0000 (15:49 +0200)]
watcher: Make RAPI error messages less technical
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:43:14 +0000 (15:43 +0200)]
watcher.state: Use strings, not objects
Until now the state class would receive instances as objects
(ganeti.watcher.Instance), but this is not necessary. By using strings
the interface is simplified.
This patch also simplifies some code accessing the internal structures,
e.g. setting a key of a dictionary. Some instances of “del dict[key]”
are replaced with “dict.pop(key, None)” to suppress any exceptions if
the key doesn't exist.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
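The difference between the two deletion styles mentioned above, as a small sketch (the function name and the state layout are made up for the example):

```python
def forget_instance(state, name):
    """Drop an instance's entry; a missing key is silently ignored."""
    state.pop(name, None)


def forget_instance_strict(state, name):
    """Drop an instance's entry; raises KeyError if the key is missing."""
    del state[name]
```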
Michael Hanselmann [Fri, 29 Jul 2011 13:20:42 +0000 (15:20 +0200)]
watcher: Raise error on unknown hook status
Also, remove punctuation from one error message.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:19:04 +0000 (15:19 +0200)]
watcher: Reformat constants
Make them match with style guide.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:13:58 +0000 (15:13 +0200)]
Add new watcher constants
WATCHER_STATEFILE will be removed at the end of this
patch series.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Stephen Shirley [Fri, 29 Jul 2011 12:15:40 +0000 (14:15 +0200)]
Fix formatting of frozensets
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 28 Jul 2011 09:26:36 +0000 (11:26 +0200)]
cli: Add constant for node group option
ganeti-watcher will use this constant to pass the option to itself for
processing all node groups.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 29 Jul 2011 08:55:44 +0000 (10:55 +0200)]
Replace %r with '%s' in masterd/instance.py
I still don't know why Michael is a fan of %r, but in the meantime
this patch changes:
WARNING: import u'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1
into:
WARNING: import 'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Stephen Shirley [Mon, 20 Jun 2011 15:52:55 +0000 (17:52 +0200)]
Add "reboot_behavior" hypervisor flag
During instance installations, you do not want the instance to reboot
and start again with the same parameters, as that will most likely
re-start the install process. Therefore, when the instance requests a
reboot it should instead shutdown. This flag allows this to be
controlled.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Fri, 29 Jul 2011 09:18:55 +0000 (10:18 +0100)]
Removed non-existing -t option from the gnt-cluster man page
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 28 Jul 2011 13:21:30 +0000 (15:21 +0200)]
Clear the OS scripts environment
The OS scripts currently run with the whole noded environment; this is
different from the hooks which run with a cleared one and most likely
an oversight.
This _might_ create problems when upgrading, so it needs to be clearly
announced for the new version.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
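What "running with a cleared environment" means can be sketched as below. This is a generic illustration using the standard subprocess module, not the backend code; the function name and the variable used in the usage note are hypothetical.

```python
import subprocess


def run_with_clean_env(argv, env):
    """Run argv with exactly the variables in env.

    Nothing from the parent process's environment is inherited, because
    an explicit env mapping replaces it entirely.
    """
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

A script started via run_with_clean_env(argv, {"OS_NAME": "debootstrap"}) sees OS_NAME but none of the daemon's own variables, which is why scripts relying on inherited variables could break after an upgrade.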
Michael Hanselmann [Wed, 27 Jul 2011 08:22:20 +0000 (10:22 +0200)]
watcher: Split state class into separate module
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 27 Jul 2011 08:46:52 +0000 (10:46 +0200)]
Rename watcher's constant for instance status file
“upfile” is a bad name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Andrea Spadaccini [Thu, 28 Jul 2011 19:37:04 +0000 (20:37 +0100)]
Fixed a typo in the installation tutorial
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Jul 2011 12:14:18 +0000 (14:14 +0200)]
watcher: Split node maintenance into separate module
The node maintenance class is standalone.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Thu, 28 Jul 2011 11:23:47 +0000 (12:23 +0100)]
Fixed doc compilation under Sphinx 1.0.7
Sphinx 1.0.7 complains if an indented block in .warning starts with :option.
This fixes it.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 28 Jul 2011 11:10:28 +0000 (13:10 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
Add support for cluster/OS parameters in QA
Add OS search path to gnt-cluster info
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Jul 2011 15:18:48 +0000 (17:18 +0200)]
Remove requirement for variants on OS API v15+
This removes:
- the check in backend that such OSes have a variants file or, if it
exists, that it is non-empty; in order for this to work, we also rework
the logic in backend._TryOSFromDisk to allow for optional OS files
- the check in cluster verify that such OSes have a non-empty variant
list (the check for consistent variants is still kept)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 28 Jul 2011 09:18:08 +0000 (11:18 +0200)]
Add support for cluster/OS parameters in QA
Currently there is no way to QA with (for example) an initrd because
the QA only inits the cluster with the default parameters. This makes
it impossible to QA using anything but the default parameters, which
doesn't always work.
Additionally, we add OS parameters and OS hypervisor parameters, for
completeness and for testing that these commands also work.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Jul 2011 16:46:47 +0000 (18:46 +0200)]
Revert "cli.JobExecutor: Feedback function for info output"
This reverts commit 7421df8e5f2cf31022085b332d1300640ba5854b.
The feedback_fn argument to JobExecutor is used for PollJob, and thus
has a fixed signature: a single arg, tuple of (timestamp, log type,
log message). Its use as a drop-in replacement for ToStdout doesn't
work, as that function has a different signature.
For now, I propose to revert this, until we either change JobExecutor
to use the same log messages (and add an intermediate wrapper between
JobExecutor and ToStdout) or we add another parameter to
JobExecutor.__init__.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
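The "intermediate wrapper" option mentioned above could look roughly like this. It is a sketch under assumptions: the output function is assumed to take a printf-style format string plus arguments, and the names are hypothetical.

```python
def make_feedback_fn(write):
    """Adapt a printf-style output function to the feedback signature.

    The returned function takes a single (timestamp, log type, log message)
    tuple, as described for PollJob, and forwards only the message.
    """
    def feedback_fn(entry):
        (_timestamp, _log_type, log_msg) = entry
        write("%s", log_msg)
    return feedback_fn
```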
Agata Murawska [Thu, 28 Jul 2011 08:10:02 +0000 (10:10 +0200)]
Extend the ovf-support design with format translation
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 27 Jul 2011 16:45:16 +0000 (18:45 +0200)]
Add a QA constant for cluster verify command
This seems to be used and reused multiple times, let's abstract it…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Jul 2011 16:02:46 +0000 (18:02 +0200)]
Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to take
into account offline nodes.
into account offline nodes.
The fact that this was not detected yet is due to the fact that we
don't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Jul 2011 11:53:24 +0000 (13:53 +0200)]
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is still
recorded.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Jul 2011 10:32:43 +0000 (12:32 +0200)]
Fix QA OS API failure
The patch changing the OS API in QA to 20 was not complete, sorry.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 26 Jul 2011 17:11:28 +0000 (19:11 +0200)]
QA: test using OS API v20
v20 is (mostly) a superset of the other versions, so testing with it
should be better than with v10. This properly detects the breakage
fixed by the previous patch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 26 Jul 2011 16:34:46 +0000 (18:34 +0200)]
Fix OS queries for API v20 w/parameters
OS parameters is a list of tuples, so we can't pass it directly to
utils.NiceSort, hence we use a sort key.
This was not detected in QA since QA only tests API v10 :(
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
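Why a sort key is needed can be shown with a stand-in natural sort. The function below is a simplified sketch written for this example, not the implementation of utils.NiceSort, and the parameter data is invented.

```python
import re


def nice_sort(values, key=None):
    """Sort strings so that embedded numbers compare numerically.

    With key given, key(value) must return the string to sort by.
    """
    def split(text):
        # re.split with a capturing group alternates text and digit runs,
        # so ints and strs always line up position by position.
        return [int(part) if part.isdigit() else part
                for part in re.split(r"(\d+)", text)]

    if key is None:
        return sorted(values, key=split)
    return sorted(values, key=lambda value: split(key(value)))


# Hypothetical (name, documentation) tuples, as in an OS parameter list.
params = [("param10", "tenth"), ("param2", "second"), ("param1", "first")]
by_name = nice_sort(params, key=lambda param: param[0])
```

Passing the tuples without a key fails in this sketch because the splitting step expects a string, which mirrors the kind of problem the patch describes.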
Iustin Pop [Tue, 26 Jul 2011 11:03:20 +0000 (13:03 +0200)]
Add helper for declaring all locks shared
This patch adds a function for abstracting
“dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicate
assignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper
function list.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
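The abstraction amounts to something like the following sketch. The level names and the function name are placeholders chosen for this example; only the dict.fromkeys expression comes from the patch description.

```python
# Placeholder lock levels standing in for locking.LEVELS.
LEVELS = ["cluster", "instance", "nodegroup", "node"]


def share_all_locks():
    """Return a fresh mapping that marks every lock level as shared."""
    return dict.fromkeys(LEVELS, 1)
```

Returning a new dictionary on each call means one logical unit can later change a single level without affecting others.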
Michael Hanselmann [Tue, 26 Jul 2011 11:12:11 +0000 (13:12 +0200)]
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results using
ht-based functions. Checks are added for two opcodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Jul 2011 09:34:00 +0000 (11:34 +0200)]
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher,
would lock all nodes and instances. With this patch the opcode
is changed to operate on per nodegroup, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the
new interface.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Jul 2011 09:32:22 +0000 (11:32 +0200)]
cmdlib: Give instance name in error message on group evacuation
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Jul 2011 09:31:42 +0000 (11:31 +0200)]
cmdlib: Factorize mapping instance LVs to node/volume
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 26 Jul 2011 09:31:04 +0000 (11:31 +0200)]
cli.JobExecutor: Feedback function for info output
This will be used in the watcher where we don't want to
pollute stdout unless in debug mode.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Ben Lipton [Mon, 25 Jul 2011 17:22:36 +0000 (13:22 -0400)]
Add OS search path to gnt-cluster info
Otherwise, it's pretty hard to figure it out from the command line.
Signed-off-by: Ben Lipton <benlipton@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 26 Jul 2011 08:46:02 +0000 (10:46 +0200)]
cluster-merge: remove a hardcoded constant
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 26 Jul 2011 08:23:52 +0000 (10:23 +0200)]
cluster-merge: remove option list from usage
It doesn't make sense to have to keep them up to date twice, and --help
already lists all of them with help strings.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 25 Jul 2011 15:17:23 +0000 (15:17 +0000)]
cluster-merge: add instance restart strategy opt
Right now we always restart all instances, which is not right if some
instances were already down for other reasons. Thus we add an option to
decide how to handle this. The right default should be "up" which is:
"restart all options which were switched off by the merge", but since
that's not implemented yet, the default remains the old one, for now.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 25 Jul 2011 16:42:52 +0000 (18:42 +0200)]
Fix recompilation of htools on regen-vcs-version
Currently, most htools code depends on Constants.hs which is generated
from constants.py and also depends on _autoconf.py. Also, _autoconf.py
depends on vcs-version, which all together means that when 'make
regen-vcs-version' is run, for example by ./devel/upload, most of the
Haskell code needs recompilation.
Since htools already has its 'optimised' vcs-version (and doesn't use
the _autoconf.VCS_VERSION constants), we can optimise this as follows:
- _autoconf.py doesn't contain the VCS_VERSION anymore, and that is
instead moved to _vcsversion.py
- constants.py depends on and imports this new module
- _autoconf.py doesn't get regenerated at vcs-version changes, but
only at re-running configure/changing Makefile time
The end result is that only htools/Ganeti/HTools/Version.hs is
recompiled now, which is a significant speedup (usually < 1 second
versus 10 seconds previously).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 25 Jul 2011 16:26:18 +0000 (18:26 +0200)]
Add another name for the --yes-do-it option
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 25 Jul 2011 11:07:36 +0000 (13:07 +0200)]
Most boring patch ever
s/'/"/ in (hopefully) the right places.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 13:02:23 +0000 (15:02 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
Reopen daemon's stdio on SIGHUP
Reopen log file only once after SIGHUP
Don't leak file descriptors when setting up daemon output
Fix aliases in bash completion
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 12:35:51 +0000 (14:35 +0200)]
Reopen daemon's stdio on SIGHUP
Before this patch daemons would continue to refer to an old logfile for
their standard I/O if they had been asked to reopen the log (SIGHUP).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 11:08:22 +0000 (13:08 +0200)]
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file is
reopened when a daemon is sent a HUP signal. Due to a bug in the code,
fixed by this patch, the log file would be reopened for every single log
message thereafter.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 10:02:24 +0000 (12:02 +0200)]
Don't leak file descriptors when setting up daemon output
When a daemon's output is configured using “utils.SetupDaemonFDs”, the
function must use dup2(2). Unfortunately the code didn't close the
original file descriptors, leaking them in the process.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
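The leak and its fix can be sketched as follows. This is an illustration of the dup2(2) pattern only, not the code of utils.SetupDaemonFDs; the function name is hypothetical.

```python
import os


def redirect_fd(path, target_fd):
    """Make target_fd refer to the file at path.

    The descriptor opened for the file is only needed until dup2() has
    copied it onto target_fd; closing it afterwards avoids the leak.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        if fd != target_fd:
            os.dup2(fd, target_fd)
    finally:
        if fd != target_fd:
            os.close(fd)
```

For a daemon, target_fd would typically be 1 or 2 (standard output and standard error).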
Iustin Pop [Mon, 18 Jul 2011 11:17:41 +0000 (13:17 +0200)]
htools: rework the algorithm for ChangeAll mode
I think I've identified the problem with the current ChangeAll
mode. The current algorithm works as follows:
- identify a new primary by choosing the node which gives best score
as new secondary
- failover to it
- identify a new secondary by choosing the node which gives best score
as new secondary
This means that the future primary is 'fixed' after the first
iteration, leading to possibly suboptimal results. This patch changes
the algorithm to do what, in hindsight, seems the obvious thing to do:
- generate all pairs (primary, secondary)
- identify the pair that after the above sequence (r:np, f, r:ns)
gives the best group score
This fixes some of the corner cases I've seen in relocation, but not
all; the remaining cases are related to multi-instance relocation and
while they can't be fixed in the current framework, the needed
rebalancing is much smaller than with the current algorithm.
The patch also fixes an issue with the docstring of another function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
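The new selection strategy can be sketched in Python as below. The real code lives in htools and is written in Haskell; this sketch assumes a caller-supplied score function where lower values are better, and all names and numbers are hypothetical.

```python
from itertools import permutations


def best_pair(candidates, score):
    """Return the (primary, secondary) pair with the lowest score.

    Every ordered pair of distinct nodes is evaluated, instead of fixing
    the primary first and only then searching for a secondary.
    Returns None if fewer than two candidates are available.
    """
    pairs = permutations(candidates, 2)
    return min(pairs, key=lambda pair: score(*pair), default=None)
```

With a score table in which ("n2", "n3") is the cheapest pair, best_pair finds it even if "n2" would not have been the best first step on its own, which is the case the old two-step approach could miss.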
Michael Hanselmann [Fri, 22 Jul 2011 11:27:05 +0000 (13:27 +0200)]
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message
“Error checking node $node: Node is marked offline” if the instance's
primary node is marked offline and the user didn't explicitly request
static information only. With this patch the LU will automatically
return static information if the instance's primary node is marked
offline.
Some explicit loops are changed to map().
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 11:04:57 +0000 (13:04 +0200)]
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no need
to require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge of
migration and failover are also fixed by this patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 15 Jul 2011 08:53:45 +0000 (10:53 +0200)]
htools: replace two hardcoded uses of pri+sec nodes
These two cases replace explicit uses of primary and secondary nodes with
Instance.allNodes, which means the code is more flexible if the
internal layout of the instance changes.
I've verified that the output of involvedNodes is not required to be
4-element long, and as such the function docstring has been updated.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 9 Jul 2011 17:48:36 +0000 (19:48 +0200)]
htools: add target_node member to migrate opcode
… and failover too. Not many changes otherwise except for
serialisation and unittests.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 9 Jul 2011 09:17:10 +0000 (11:17 +0200)]
htools: do not change node disk for non-local storage
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 9 Jul 2011 09:01:49 +0000 (11:01 +0200)]
htools: add more functions for local disk storage
These will be used in Node.hs for proper add/remove instance
code. Furthermore, we restrict the movable status to the right disk
templates only, so that we don't attempt to move the 'wrong' instance
types.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Agata Murawska [Thu, 21 Jul 2011 15:31:44 +0000 (17:31 +0200)]
Initial design doc for OVF support
Signed-off-by: Agata Murawska <agatamurawska@google.com>
[iustin@google.com: fixed formatting issues]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 09:55:46 +0000 (11:55 +0200)]
Fix aliases in bash completion
Ever since commit 2d48a3a2 aliases were not included in the bash
completion script. This patch also replaces one tab with two spaces.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 06:14:39 +0000 (08:14 +0200)]
gnt-instance console: Use query instead of opcode
This means opening the console no longer requires the instance lock,
allowing it to be used during long-running operations (e.g. replacing a
disk).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 09:05:55 +0000 (11:05 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
gnt-node volumes: Fix instance names
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 05:35:44 +0000 (07:35 +0200)]
Add opcode attribute for comments
This attribute allows programmatic submitters of jobs (e.g. iallocator)
to add a comment to each opcode, describing its purpose. Example:
$ gnt-job info 123
Job ID: 123
…
Opcodes:
OP_INSTANCE_REPLACE_DISKS
…
Input fields:
comment: Replaces disks on inst1.example.com
…
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 08:27:09 +0000 (10:27 +0200)]
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return
the LV name, but to include the volume group name (e.g.
“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume
names in LUNodeQueryvols, stopping instance names from being displayed in
“gnt-node volumes”.
This patch fixes the issue and does some cleanup.
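The mismatch described above can be illustrated with a minimal sketch.
It is not the code from this patch: the helper names, the instance name
and the volume names are hypothetical, and only the "vg/lv" key format
is taken from the commit message.

```python
def BuildVolumeMap(instance_lvs):
  """Maps each "vg/lv" volume name to the instance owning it."""
  vol2inst = {}
  for (inst_name, volumes) in instance_lvs.items():
    for vol_name in volumes:
      vol2inst[vol_name] = inst_name
  return vol2inst


def LookupInstance(vol2inst, vg_name, lv_name):
  """Looks up the owning instance using the full "vg/lv" key."""
  return vol2inst.get("%s/%s" % (vg_name, lv_name), "-")


# Hypothetical data; the volume names carry the volume group prefix
instance_lvs = {
  "inst1.example.com": ["xenvg/aaaa.disk0_data", "xenvg/aaaa.disk0_meta"],
  }
vol2inst = BuildVolumeMap(instance_lvs)

# A lookup by bare LV name finds nothing, so no instance name is shown
broken = vol2inst.get("aaaa.disk0_data", "-")

# A lookup including the volume group name finds the owner again
fixed = LookupInstance(vol2inst, "xenvg", "aaaa.disk0_data")
```

Building the key from both the volume group and the LV name keeps the
lookup consistent with the names stored in the map.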
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Thu, 21 Jul 2011 15:54:42 +0000 (16:54 +0100)]
Fixed one option name and a typo in the docs
The -g vg-name option was deprecated in commit
04367e70ad71eea3f0f19e7889dc68fb9783c98a.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 21 Jul 2011 13:22:23 +0000 (15:22 +0200)]
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 21 Jul 2011 13:20:45 +0000 (15:20 +0200)]
Implement instance failover via RAPI
No idea why this was missed before.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 21 Jul 2011 08:47:27 +0000 (10:47 +0200)]
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name Pending
job/890 job:891,892
job/892 job:894
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 21 Jul 2011 08:49:21 +0000 (10:49 +0200)]
locking.GLM: Allow adding locks to monitor
This will be used for exporting job dependencies through
the lock monitor.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 13 Jul 2011 20:43:22 +0000 (22:43 +0200)]
Make lock monitor more versatile
With this change it'll be possible to register other lock information
providers. One use case for this is job dependencies, which can be shown
in the output of “gnt-debug locks”, too.
The lock monitor is changed to accept more than one return value from
the function providing the information. Unfortunately it's hard to keep
weak references to bound methods, so I settled on keeping a weak
reference on the object instead (see note in docstring).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 8 Jul 2011 14:07:42 +0000 (16:07 +0200)]
Update documentation regarding Haskell dependencies
These were forgotten when the supported library versions were changed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 13:52:14 +0000 (15:52 +0200)]
htools: add two more small unittests
This adds tests for the opToResult and eitherToResult functions from
Types.hs, and changes two other tests for the same module to test JSON
serialisation (which automatically also tests the lower-level to/from
string conversion functions).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 13:23:26 +0000 (15:23 +0200)]
htools: update hail man page with the new modes
Also mark the deprecated modes we no longer support.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 13:18:07 +0000 (15:18 +0200)]
htools: a few more hlint fixes
Tested only on GHC 7.x, will test on 6.1x too before commit.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 12:52:01 +0000 (14:52 +0200)]
htools: further docstring fixes
This adds parameter documentation for Cluster.iMoveToJob (I think it
was not clear if the new or old node list is needed) and fixes other
docstring style issues.
After this patch, all modules except for CLI.hs (which has many
obvious declarations for command-line options) and QC.hs (unittests)
have 100% doc-strings.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 12:19:17 +0000 (14:19 +0200)]
htools: add JSON instance for EvacMode
This abstracts the JSON parsing of the type EvacMode near its
definition, and simplifies its conversion in IAlloc.parseData.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 8 Jul 2011 11:53:14 +0000 (13:53 +0200)]
htools: add human-readable output to hspace
Currently, hspace can only output a machine-readable format that
(while detailed) is hard for people to parse quickly. This patch adds
(and enables by default) a human-readable output that shows the most
important metrics in a simple format.
Most of the work of the patch is in moving the display of various
metrics from the 'main' function to separate functions, each of which
can output either a machine or human intended format.
The patch also corrects a bug in the CPU efficiency display: before,
the efficiency was computed as instance virtual CPUs divided by total
physical CPUs, which is almost always supra-unitary. More correct is
to divide by the total virtual CPUs, which shows a more meaningful
number (when the p-to-v CPU ratio has been defined correctly).
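The difference between the two computations can be shown with a small
worked example. The numbers and the “CpuEfficiency” helper are made up
for illustration; the sketch is written in Python rather than taken
from the hspace code.

```python
def CpuEfficiency(instance_vcpus, total_cpus):
  """Returns the ratio of instance virtual CPUs to the given CPU total."""
  return float(instance_vcpus) / total_cpus


# Assumed cluster: 16 physical CPUs with a virtual-to-physical ratio of 4
physical_cpus = 16
vcpu_ratio = 4
virtual_cpus = physical_cpus * vcpu_ratio

# Assumed load: instances using 48 virtual CPUs in total
instance_vcpus = 48

# Old computation: divides by physical CPUs, result is above 1
old_efficiency = CpuEfficiency(instance_vcpus, physical_cpus)

# New computation: divides by virtual CPUs, result is a fraction of capacity
new_efficiency = CpuEfficiency(instance_vcpus, virtual_cpus)
```

With these numbers the old formula gives 3.0 and the new one 0.75, i.e.
three quarters of the virtual CPU capacity is in use.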
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 21 Jul 2011 11:35:36 +0000 (13:35 +0200)]
Fix job constants use in htools
Commit 56c094b4 added use of job constants, but I didn't pay
attention and ended up mixing things: job constants were used for
opcode ones, and the job ones didn't get converted.
This patch corrects it and uses only C.* constants throughout the Jobs
module.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 21 Jul 2011 09:53:29 +0000 (11:53 +0200)]
Add error state to LUGroupEvacuate's exceptions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>