ganeti-local
12 years agoUpdate NEWS for 2.5
Michael Hanselmann [Tue, 9 Aug 2011 12:10:33 +0000 (14:10 +0200)]
Update NEWS for 2.5

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoSmall improvements for cluster verify
Michael Hanselmann [Fri, 12 Aug 2011 09:08:53 +0000 (11:08 +0200)]
Small improvements for cluster verify

- Check if BGL is actually owned
- Show group name as feedback

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Use locks when querying for resource information
Michael Hanselmann [Fri, 12 Aug 2011 12:48:11 +0000 (14:48 +0200)]
watcher: Use locks when querying for resource information

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoAllow locking to be used via OpQuery
Michael Hanselmann [Fri, 12 Aug 2011 12:47:47 +0000 (14:47 +0200)]
Allow locking to be used via OpQuery

The original design for query2 specifically excluded locking, but now
it's turned out that it would be a good thing to have in watcher. This
patch adds a new parameter to OpQuery and enables its use in LUQuery. A
missing function is added to LUGroupQuery, a comment clarified in
_NodeQuery and all locks declared as shared acquires in the same LU.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoDocument job results for RAPI where possible
Michael Hanselmann [Fri, 12 Aug 2011 12:06:54 +0000 (14:06 +0200)]
Document job results for RAPI where possible

Some opcodes aren't documented yet.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoopcodes: Add more result checks, add some comments
Michael Hanselmann [Fri, 12 Aug 2011 12:06:16 +0000 (14:06 +0200)]
opcodes: Add more result checks, add some comments

Some of these will be used by the RAPI documentation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agosphinx_ext: Allow documenting opcode results
Michael Hanselmann [Fri, 12 Aug 2011 12:05:35 +0000 (14:05 +0200)]
sphinx_ext: Allow documenting opcode results

Will be used by RAPI documentation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoht: Allow adding comment to type descriptions
Michael Hanselmann [Fri, 12 Aug 2011 12:03:51 +0000 (14:03 +0200)]
ht: Allow adding comment to type descriptions

This will be used to add some more details to type descriptions, e.g. on
opcode parameters or result values. The implementation is very similar
to “WithDesc”.

I chose to use “[…]” after finding “/*…*/” hard to read and spot. At
some point we'll have to introduce proper formatting (e.g. using HTML).

Example with a comment:
  List of ((Length 2) and (Item 0 is (NonEmptyString [name of changed
  parameter]), item 1 is Anything))
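
A minimal sketch of how such a comment wrapper could work next to a
description wrapper. The names with_desc, comment and non_empty_string
are hypothetical stand-ins, not the actual functions in the ht module:

```python
def with_desc(text):
    """Attach a description to a check function (stand-in for "WithDesc")."""
    def wrap(fn):
        fn.desc = text
        return fn
    return wrap


def comment(text):
    """Return a wrapper that appends "[text]" to a check's description."""
    def wrap(fn):
        def checked(value):
            # The comment only changes the description, not the check itself.
            return fn(value)
        checked.desc = "%s [%s]" % (getattr(fn, "desc", fn.__name__), text)
        return checked
    return wrap


@with_desc("NonEmptyString")
def non_empty_string(value):
    return isinstance(value, str) and bool(value)


# Hypothetical usage: annotate an existing check with a comment.
param_name = comment("name of changed parameter")(non_empty_string)
print(param_name.desc)
```

The wrapped check behaves like the original one; only the text shown
to the reader differs.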

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoClarify job ID-related type checks, add unittests
Michael Hanselmann [Fri, 12 Aug 2011 10:09:24 +0000 (12:09 +0200)]
Clarify job ID-related type checks, add unittests

Instead of a rather complicated expression only “JobId” is output. Job
ID lists (like generated by “SubmitManyJobs”) are limited to two-item
lists. Unittests are added.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoChange OpClusterVerifyConfig's result, verify results
Michael Hanselmann [Fri, 12 Aug 2011 09:14:07 +0000 (11:14 +0200)]
Change OpClusterVerifyConfig's result, verify results

This patch removes the list of node groups (not used anymore since
commit fcad7225e3fc) from OpClusterVerifyConfig's result and adds result
verification to all OpClusterVerify* opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoFixed error in Makefile.am, changing spaces with tabs
Andrea Spadaccini [Fri, 12 Aug 2011 11:02:18 +0000 (12:02 +0100)]
Fixed error in Makefile.am, changing spaces with tabs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdded the test data files for netutils to Makefile.am
Andrea Spadaccini [Fri, 12 Aug 2011 10:08:52 +0000 (11:08 +0100)]
Added the test data files for netutils to Makefile.am

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoUse LU-generated jobs for verifying cluster
Michael Hanselmann [Thu, 11 Aug 2011 16:15:20 +0000 (18:15 +0200)]
Use LU-generated jobs for verifying cluster

This patch moves the logic for verifying the various node groups in a
cluster into the master daemon. Job dependencies are used to ensure the
configuration, which requires the BGL, is verified first.

With this change it will be possible to expose whole-cluster
verification through the remote API without requiring additional client
logic on top of standard features like LU-generated jobs and job
dependencies.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoopcodes: Use variables for verification parameters
Michael Hanselmann [Thu, 11 Aug 2011 16:11:17 +0000 (18:11 +0200)]
opcodes: Use variables for verification parameters

Just some cleanup before the 2.5 release.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agomcpu: Specify actual received type on opcode issue
Michael Hanselmann [Thu, 11 Aug 2011 14:37:30 +0000 (16:37 +0200)]
mcpu: Specify actual received type on opcode issue

This helped me debug an issue with opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoUse resource kind as OpQuery*'s description
Michael Hanselmann [Thu, 11 Aug 2011 12:10:42 +0000 (14:10 +0200)]
Use resource kind as OpQuery*'s description

This gives a hint as to what's queried. “QUERY(instance)” or
“QUERY(node)” are way better than just “QUERY”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoAdded helper functions in netutils and related constants
Andrea Spadaccini [Thu, 11 Aug 2011 10:30:47 +0000 (11:30 +0100)]
Added helper functions in netutils and related constants

Added the following functions to netutils:
- IsValidInterface
- GetInterfaceIpAddresses
- _GetIpAddressesFromIpOutput

Added the following static methods to netutils.IPAddress:
- GetAddressFamilyFromVersion
- GetVersionFromAddressFamily

Added unit tests for the new methods in netutils.IPAddress, for the IP
address search regex and for GetInterfaceIpAddresses

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix epydoc error in rlib2.py
Michael Hanselmann [Thu, 11 Aug 2011 10:18:59 +0000 (12:18 +0200)]
Fix epydoc error in rlib2.py

I blindly assumed epydoc would use normal reST, but it turns out it uses
its own “epytext” in our configuration. Since the latter doesn't support
blockquotes, I just make the paragraph a literal block.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix typo in rlib2's docstring
Michael Hanselmann [Wed, 10 Aug 2011 15:30:26 +0000 (17:30 +0200)]
Fix typo in rlib2's docstring

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Benjamin Lipton <benlipton@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoDocumentation fixes and clarification
Michael Hanselmann [Wed, 10 Aug 2011 15:22:49 +0000 (17:22 +0200)]
Documentation fixes and clarification

- In README, refer to “install.rst”, not “install.html”
- In rapi.rst, wrap line longer than 72 characters
- In rlib2.py, update and clarify description of POST vs. PUT

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agognt-instance: Rename _SHUTDOWN_* to _EXPAND_*
Michael Hanselmann [Tue, 9 Aug 2011 14:42:41 +0000 (16:42 +0200)]
gnt-instance: Rename _SHUTDOWN_* to _EXPAND_*

Once upon a time these constants were only used for stopping instances,
but pretty soon they became more useful. Let's rename them.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoList returned fields in RAPI documentation
Michael Hanselmann [Tue, 9 Aug 2011 14:24:32 +0000 (16:24 +0200)]
List returned fields in RAPI documentation

Also replace console types with constants.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agorlib2: Exclude oplog/opresult from bulk job list
Michael Hanselmann [Tue, 9 Aug 2011 13:24:04 +0000 (15:24 +0200)]
rlib2: Exclude oplog/opresult from bulk job list

These fields can get rather large. Excluding them from the big bulk list
reduces the amount of data. They are still available via per-job
requests.
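
As a rough sketch, the bulk field list can be derived from the full
list by filtering out the large fields. The field names and variable
names below are illustrative assumptions; the real lists live in rlib2
and may differ:

```python
# Illustrative field names; the real list lives in rlib2 and may differ.
J_FIELDS = ["id", "ops", "status", "summary", "opstatus", "opresult",
            "oplog", "received_ts", "start_ts", "end_ts"]

# Fields that can grow large and are therefore left out of the bulk list.
_LARGE_FIELDS = frozenset(["oplog", "opresult"])

# Per-job requests keep using J_FIELDS; the bulk listing uses this one.
J_FIELDS_BULK = [name for name in J_FIELDS if name not in _LARGE_FIELDS]
```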

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agorlib: Expose node group tags
Michael Hanselmann [Tue, 9 Aug 2011 11:26:11 +0000 (13:26 +0200)]
rlib: Expose node group tags

Commit 1ffd26739d3 added support for tagging node groups. Also add a
check for exposed fields.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agorapi: Bulk support for jobs
Michael Hanselmann [Tue, 9 Aug 2011 11:13:49 +0000 (13:13 +0200)]
rapi: Bulk support for jobs

This was requested in issue 181.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoFixed an error in the documentation of _GetKVMVersion
Andrea Spadaccini [Tue, 9 Aug 2011 12:31:39 +0000 (13:31 +0100)]
Fixed an error in the documentation of _GetKVMVersion

Fixed an epydoc compilation error that I introduced with the last commit.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoMention globbing filters in ganeti(7) manpage
Michael Hanselmann [Tue, 9 Aug 2011 12:11:02 +0000 (14:11 +0200)]
Mention globbing filters in ganeti(7) manpage

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoRemoved code duplication for calls to _GetKVMVersion
Andrea Spadaccini [Tue, 9 Aug 2011 11:16:49 +0000 (12:16 +0100)]
Removed code duplication for calls to _GetKVMVersion

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix epydoc breakage caused by f8638e288c7a
Michael Hanselmann [Mon, 8 Aug 2011 16:14:10 +0000 (18:14 +0200)]
Fix epydoc breakage caused by f8638e288c7a

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoChanged NET_PORT_CHECK to REQ_NET_PORT_CHECK, to improve consistency
Andrea Spadaccini [Mon, 8 Aug 2011 14:00:04 +0000 (15:00 +0100)]
Changed NET_PORT_CHECK to REQ_NET_PORT_CHECK, to improve consistency

I originally made this change because I needed OPT_NET_PORT_CHECK, and
I am committing it even though I no longer need OPT_NET_PORT_CHECK,
because IMO it improves the consistency of the wrappers' names.

Also, I changed the code of the check to use chained inequality
operators.
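
For illustration, a chained comparison for a port check might look
like the sketch below. The function name and the exact bounds are
assumptions; the commit message does not state the range used:

```python
def is_valid_port(port):
    """Return True if port is an integer in the range 0..65535."""
    # Chained form of: port >= 0 and port <= 65535
    return isinstance(port, int) and 0 <= port <= 65535
```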

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdded check for the ip command at configure time
Andrea Spadaccini [Mon, 8 Aug 2011 14:00:03 +0000 (15:00 +0100)]
Added check for the ip command at configure time

Also, corrected a few places where the ip command was hardcoded.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoDetect globbing patterns as query arguments
Michael Hanselmann [Mon, 8 Aug 2011 15:31:59 +0000 (17:31 +0200)]
Detect globbing patterns as query arguments

Short: this patch enables the use of “gnt-instance list '*.site'”.

Detailed description: This patch changes the command line interface code
to try to deduce the kind of filter from the arguments to a “list”
command. If it's a list of plain names an old-style name filter is used.
If filtering is forced or the single argument is potentially a filter,
it is parsed as a query filter string. Any name looking like a globbing
pattern (e.g. “*.site” or “web?.example.com”) is treated as such.
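
The decision described above can be sketched roughly as follows. The
regular expressions and the function name classify are assumptions made
for illustration; the real heuristics in the command line code may be
stricter, and this sketch does not limit the filter case to a single
argument:

```python
import re

# Assumption: plain names consist of hostname-like characters only.
_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")
# Assumption: a globbing pattern is a name that may also contain "*" or "?".
_GLOB_NAME_RE = re.compile(r"^[A-Za-z0-9._*?-]+$")


def classify(args, force_filter=False):
    """Guess which kind of filter the arguments of a "list" command describe."""
    if force_filter:
        return "filter"
    if all(_NAME_RE.match(arg) for arg in args):
        return "names"      # old-style name filter
    if all(_GLOB_NAME_RE.match(arg) for arg in args):
        return "glob"       # names with globbing characters
    return "filter"         # anything else is parsed as a query filter
```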

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocluster-merge: implement params delta mercifulness
Guido Trotter [Fri, 5 Aug 2011 09:40:45 +0000 (10:40 +0100)]
cluster-merge: implement params delta mercifulness

Sometimes it's good to tell the user about parameter differences but
then proceed anyway. Strictness is still enforced for those parameters
that would break the cluster (volume group name, storage dir if file
storage is enabled).

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocluster-merge: consider file storage enable state
Guido Trotter [Thu, 4 Aug 2011 14:01:04 +0000 (15:01 +0100)]
cluster-merge: consider file storage enable state

There's no point in checking whether the file storage dir in the two
clusters is the same if file storage is not even enabled.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAllow fixing of split instances via relocate
Iustin Pop [Mon, 8 Aug 2011 09:34:41 +0000 (11:34 +0200)]
Allow fixing of split instances via relocate

Currently, the IAllocator code strictly requires that the (set of) groups of
the nodes we're relocating from is equal to the set of groups we're
relocating to.

This, however, makes it impossible to fix split instances, since (by
definition) the secondary of a split instance is not in the same group
as the primary node, whereas after the fix it is.

The patch changes the test from group equality to check that the final
group set (across both primary and secondary nodes) is a subset of the
initial group set (again across both nodes). This means we can't
"extend" the group of nodes but keeping the same or decreasing it is
allowed.
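
As a small sketch of the new test (function and variable names are
hypothetical, not the ones used in the IAllocator code), the check
boils down to a subset comparison on sets of groups:

```python
def groups_ok(initial_nodes, final_nodes, node_to_group):
    """Return True if relocation does not extend the set of node groups."""
    initial = {node_to_group[node] for node in initial_nodes}
    final = {node_to_group[node] for node in final_nodes}
    # Subset instead of equality: shrinking the group set is allowed.
    return final <= initial


node_to_group = {"n1": "g1", "n2": "g2", "n3": "g1", "n4": "g3"}

# Split instance: primary n1 in g1, secondary n2 in g2.
fixed = groups_ok(["n1", "n2"], ["n1", "n3"], node_to_group)
extended = groups_ok(["n1", "n2"], ["n1", "n4"], node_to_group)
```

With an equality test the first relocation would be rejected, since
the final set {g1} differs from the initial set {g1, g2}.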

After this patch, one can finally fix (automatically) split instances
via a gnt-instance replace-disks.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoRevert deprecation of evacuate mode in hail
Iustin Pop [Fri, 5 Aug 2011 16:56:57 +0000 (18:56 +0200)]
Revert deprecation of evacuate mode in hail

As discussed offline, the new node-change mode could be used for
evacuation, but it's not directly useful as it returns a list of
opcodes; therefore, we need to partially revert commits fbe5fcf and
5b53ca7 that removed it (and multi-evacuate, which remains removed).

The new version of relocate is actually just a wrapper over the
tryNodeEvac (which does the node evacuate); we run that and then we do
some extra checks that the nodes we got from that function are
consistent with the instance's new state.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFurther cleanup after multi-evacuate removal
Iustin Pop [Fri, 5 Aug 2011 14:52:44 +0000 (16:52 +0200)]
Further cleanup after multi-evacuate removal

Commit f0edfcf6 removed the parsing of multi-evacuate result, but the
code went from:

  if mode in (multi-evac, relocate):
    …
    if mode == relocate:
      …

to:

  if mode == relocate:
    …
    if mode == relocate:
      …

This patch simply removes the nested if.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoFix bug in IAllocator parsing of Evacuate result
Iustin Pop [Fri, 5 Aug 2011 14:48:49 +0000 (16:48 +0200)]
Fix bug in IAllocator parsing of Evacuate result

Commit 342f9172 added stricter checks for the iallocator result in
evacuate mode, but it does this irrespective of the result
status. When the result has failed and (according to the design) the
list of nodes is empty, this code will trigger the following:

    node1# gnt-instance replace-disks -I hail instance14
    Failure: command execution error:
    Groups of nodes returned by iallocator () differ from original groups (default)

After the patch, the result is:

    node1# gnt-instance replace-disks -I hail instance14
    Failure: prerequisites not met for this operation:
    error type: insufficient_resources, error details:
    Can't compute nodes using iallocator 'hail': Request failed: …

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoImplement globbing operator for filters
Michael Hanselmann [Fri, 5 Aug 2011 13:38:41 +0000 (15:38 +0200)]
Implement globbing operator for filters

The operators “=*” and “!*” do globbing in filters, e.g.:

$ gnt-instance list --no-headers -o name 'name =* "*.site"'
inst1.site.example.com
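
Note that in the example “*.site” matches “inst1.site.example.com”,
so the pattern apparently need not cover the whole name. One way to
get that behaviour is sketched below; the translation rules and the
function name are assumptions inferred from this single example, not
the actual implementation:

```python
import re


def dns_glob_to_regex(pattern):
    """Translate a DNS-name glob into a regex (illustrative rules only)."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append("[^.]*")   # any run of characters within one label
        elif char == "?":
            parts.append("[^.]")    # exactly one character within a label
        else:
            parts.append(re.escape(char))
    # Assumption: further domain labels may follow the matched prefix.
    return re.compile(r"^%s(\..*)?$" % "".join(parts))


site_re = dns_glob_to_regex("*.site")
web_re = dns_glob_to_regex("web?.example.com")
```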

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoZero DRBD metadata before creation
Iustin Pop [Fri, 5 Aug 2011 13:41:28 +0000 (15:41 +0200)]
Zero DRBD metadata before creation

The docstring of the DRBD8 class says:

  … The meta device is checked for valid size and is zeroed on create.

which is not done today, hence we have
http://code.google.com/p/ganeti/issues/detail?id=182:

  node1# mkreiserfs -f /dev/xenvg/t8
  …
  ReiserFS is successfully created on /dev/xenvg/t8.
  node1# drbdmeta --force /dev/drbd256 v08 /dev/xenvg/t8 0 create-md
  md_offset 0
  al_offset 4096
  bm_offset 36864

  Found reiser filesystem

  This would corrupt existing data.
  If you want me to do this, you need to zero out the first part
  of the device (destroy the content).
  You should be very sure that you mean it.
  Operation refused.

I've tested and even just 1MB is enough to wipe the meta, but let's be
safer and pass a 'clean' meta to drbd.

Note: I didn't copy _WipeDevice from backend.py since it seemed more
complex than needed here.
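
A simplified sketch of zeroing the start of a device in place. The
helper name, the chunk size and the demo file are assumptions; the
amount actually wiped by the patch is not stated here:

```python
import os
import tempfile


def zero_prefix(path, length, chunk_size=1024 * 1024):
    """Overwrite the first `length` bytes of `path` with zeros, in place."""
    zeros = b"\0" * chunk_size
    # "r+b" keeps existing content beyond `length` and does not truncate.
    with open(path, "r+b") as dev:
        remaining = length
        while remaining > 0:
            count = min(chunk_size, remaining)
            dev.write(zeros[:count])
            remaining -= count
        dev.flush()
        os.fsync(dev.fileno())


# Demo on a regular file standing in for the meta device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
zero_prefix(tmp.name, 1024, chunk_size=300)
with open(tmp.name, "rb") as check:
    content = check.read()
os.unlink(tmp.name)
```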

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoRemove iallocator's “multi-evacuate” mode
Michael Hanselmann [Fri, 5 Aug 2011 12:27:46 +0000 (14:27 +0200)]
Remove iallocator's “multi-evacuate” mode

It is no longer used and has been deprecated in 2.5.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoconfd.querylib: Remove long-deprecated query mode
Michael Hanselmann [Fri, 5 Aug 2011 12:39:24 +0000 (14:39 +0200)]
confd.querylib: Remove long-deprecated query mode

This was never used by a stable version.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd docstring to cmdlib.TLReplaceDisks._FindFaultyDisks
Michael Hanselmann [Fri, 5 Aug 2011 12:26:50 +0000 (14:26 +0200)]
Add docstring to cmdlib.TLReplaceDisks._FindFaultyDisks

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Fix breakage caused by 9bb69bb52fb9
Michael Hanselmann [Fri, 5 Aug 2011 12:17:07 +0000 (14:17 +0200)]
watcher: Fix breakage caused by 9bb69bb52fb9

The first argument to str.split is the separator, not the maximum number
of splits.
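
The broken call itself is not shown here, so the following is only an
illustration of the argument order, using made-up input:

```python
line = "inst1 running extra data"

# Correct: separator first (None means "any whitespace"), then maxsplit.
name, rest = line.split(None, 1)

# A lone number lands in the separator slot, which is a TypeError.
error = None
try:
    line.split(1)
except TypeError as err:
    error = err
```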

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoLUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result
Michael Hanselmann [Fri, 5 Aug 2011 11:10:53 +0000 (13:10 +0200)]
LUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result

… instead of getting the list of instances once again from the
configuration.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocmdlib: Factorize checking node groups' instances
Michael Hanselmann [Fri, 5 Aug 2011 11:05:38 +0000 (13:05 +0200)]
cmdlib: Factorize checking node groups' instances

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoInclude hooks.rst in version check
Michael Hanselmann [Fri, 29 Jul 2011 08:18:30 +0000 (10:18 +0200)]
Include hooks.rst in version check

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoBump version to 2.5.0~beta1
Michael Hanselmann [Fri, 29 Jul 2011 08:18:17 +0000 (10:18 +0200)]
Bump version to 2.5.0~beta1

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Write per-group instance status, merge into global one
Michael Hanselmann [Fri, 5 Aug 2011 09:57:50 +0000 (11:57 +0200)]
watcher: Write per-group instance status, merge into global one

Each per-group watcher process writes its own instance status file. Once
that's done it tries to acquire an exclusive lock on the global file and
will proceed to read all status files, merging them based on each file's
mtime. If an instance is moved to another group, the newer status will
supersede that of an older file which hasn't yet been updated.
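
The merge rule can be sketched like this, assuming each status file
has already been parsed into a dictionary and paired with its mtime
(the data layout and the function name are illustrative assumptions):

```python
def merge_status(files):
    """Merge (mtime, {instance: status}) pairs; the newest mtime wins."""
    newest = {}
    for mtime, statuses in files:
        for name, status in statuses.items():
            if name not in newest or mtime > newest[name][0]:
                newest[name] = (mtime, status)
    return {name: status for name, (_, status) in newest.items()}


# inst1 moved to another group; the older file still lists it.
merged = merge_status([
    (100, {"inst1": "running", "inst2": "down"}),
    (200, {"inst1": "ERROR_down"}),
])
```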

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocleaner: Remove watcher's instance status file after 21 days
Michael Hanselmann [Thu, 4 Aug 2011 16:31:51 +0000 (18:31 +0200)]
cleaner: Remove watcher's instance status file after 21 days

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoutils.ReadFile: Add pre-read callback
Michael Hanselmann [Thu, 4 Aug 2011 16:38:38 +0000 (18:38 +0200)]
utils.ReadFile: Add pre-read callback

This will be used by the watcher to store the file's fstat(2). It must
be done from the filehandle.
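
A minimal sketch of such a callback, with a hypothetical read_file
helper standing in for utils.ReadFile (the parameter names are
assumptions):

```python
import os
import tempfile


def read_file(path, size=-1, preread=None):
    """Read a file, calling preread(filehandle) before any data is read."""
    with open(path, "rb") as handle:
        if preread is not None:
            preread(handle)
        return handle.read(size)


# Demo: remember the fstat(2) result of the very handle being read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"status")
seen = {}
data = read_file(tmp.name,
                 preread=lambda fh: seen.update(st=os.fstat(fh.fileno())))
os.unlink(tmp.name)
```

Taking the stat from the open handle guarantees it describes the same
file the data came from, even if the path is replaced in between.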

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoMerge branch 'stable-2.4'
René Nussbaumer [Fri, 5 Aug 2011 09:39:37 +0000 (11:39 +0200)]
Merge branch 'stable-2.4'

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoBumping version to 2.4.3 v2.4.3
René Nussbaumer [Thu, 4 Aug 2011 14:21:06 +0000 (16:21 +0200)]
Bumping version to 2.4.3

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFixed a typo in utils/process.py
Agata Murawska [Thu, 4 Aug 2011 14:46:20 +0000 (16:46 +0200)]
Fixed a typo in utils/process.py

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix unittest failure after list_owned changes
Iustin Pop [Thu, 4 Aug 2011 14:20:34 +0000 (16:20 +0200)]
Fix unittest failure after list_owned changes

We just need an object that has a list_owned method.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoRemove 15-second sleep from LUInstanceCreate
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:04:08 +0000 (18:04 +0300)]
Remove 15-second sleep from LUInstanceCreate

Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate already
calls _WaitForSync with oneshot=True, which already performs an internal
wait-loop for disks to start syncing.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd a readability alias
Iustin Pop [Thu, 4 Aug 2011 10:52:40 +0000 (12:52 +0200)]
Add a readability alias

lu.glm.list_owned becomes lu.owned_locks, which is clearer for the
reader.

Also rename three variables (which were before named owned_locks) to
make clearer what they track.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoFix broken object references in docstrings
Michael Hanselmann [Thu, 4 Aug 2011 12:29:09 +0000 (14:29 +0200)]
Fix broken object references in docstrings

The module is called “objects”, not “object”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd “gnt-instance change-group” command
Michael Hanselmann [Wed, 3 Aug 2011 09:46:10 +0000 (11:46 +0200)]
Add “gnt-instance change-group” command

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd opcode to change instance's group
Michael Hanselmann [Wed, 3 Aug 2011 09:45:46 +0000 (11:45 +0200)]
Add opcode to change instance's group

This is quite similar to evacuating a group, but the locking
is different.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFactorize checking instance's node groups
Michael Hanselmann [Thu, 4 Aug 2011 12:21:05 +0000 (14:21 +0200)]
Factorize checking instance's node groups

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoUpdate the NEWS file for 2.4.3
René Nussbaumer [Thu, 4 Aug 2011 11:40:43 +0000 (13:40 +0200)]
Update the NEWS file for 2.4.3

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoganeti-cleaner: Remove old watcher state files
Michael Hanselmann [Fri, 29 Jul 2011 14:35:10 +0000 (16:35 +0200)]
ganeti-cleaner: Remove old watcher state files

Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.
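
The age check amounts to comparing each file's mtime against a cutoff.
The sketch below is in Python purely for illustration; the cleaner
itself may be implemented differently, and the helper name and file
names are made up:

```python
import os
import tempfile
import time

MAX_AGE_DAYS = 21


def remove_stale(directory, prefix, suffix, now=None):
    """Delete matching files whose mtime is older than MAX_AGE_DAYS."""
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_DAYS * 24 * 3600
    removed = []
    for name in sorted(os.listdir(directory)):
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.stat(path).st_mtime < cutoff:
            os.unlink(path)
            removed.append(name)
    return removed


# Demo: one state file last touched 30 days ago, one fresh.
demo_dir = tempfile.mkdtemp()
old = os.path.join(demo_dir, "watcher.old-group.data")
new = os.path.join(demo_dir, "watcher.live-group.data")
for path in (old, new):
    with open(path, "w"):
        pass
thirty_days_ago = time.time() - 30 * 24 * 3600
os.utime(old, (thirty_days_ago, thirty_days_ago))
removed = remove_stale(demo_dir, "watcher.", ".data")
```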

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoRemove WATCHER_STATEFILE constant
Michael Hanselmann [Fri, 29 Jul 2011 13:14:36 +0000 (15:14 +0200)]
Remove WATCHER_STATEFILE constant

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocfgupgrade: Remove old watcher state file
Michael Hanselmann [Fri, 29 Jul 2011 13:15:00 +0000 (15:15 +0200)]
cfgupgrade: Remove old watcher state file

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoganeti-watcher: Split for node groups
Michael Hanselmann [Fri, 29 Jul 2011 13:49:20 +0000 (15:49 +0200)]
ganeti-watcher: Split for node groups

This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.

The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).

Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.

Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.
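
The shared/exclusive scheme can be sketched with flock(2)-style
advisory locks. This assumes a Unix system; the helper names and the
lock file path are hypothetical, and the watcher's actual locking code
is not reproduced here:

```python
import fcntl
import os
import tempfile


def lock_shared(path):
    """Take a shared lock, as each per-group watcher process would."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    return fd


def try_lock_exclusive(path):
    """Try to take the exclusive lock; return None if it is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


lock_path = os.path.join(tempfile.mkdtemp(), "watcher.lock")
watcher1 = lock_shared(lock_path)
watcher2 = lock_shared(lock_path)          # shared locks coexist
blocked = try_lock_exclusive(lock_path)    # fails while watchers run
os.close(watcher1)
os.close(watcher2)
command = try_lock_exclusive(lock_path)    # succeeds once they are gone
```

A single lock file avoids the non-atomic step of locking several state
files one after another.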

The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.

Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoLock potential target nodes for group evacuation
Michael Hanselmann [Wed, 3 Aug 2011 09:44:28 +0000 (11:44 +0200)]
Lock potential target nodes for group evacuation

All potential target nodes should be locked while calculating
a group evacuation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoSmall changes in group evacuation
Michael Hanselmann [Wed, 3 Aug 2011 08:46:51 +0000 (10:46 +0200)]
Small changes in group evacuation

- Use OpPrereqError in CheckPrereq
- Clarify command synopsis

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocmdlib: Factorize getting iallocator
Michael Hanselmann [Wed, 3 Aug 2011 08:43:41 +0000 (10:43 +0200)]
cmdlib: Factorize getting iallocator

The same logic will be used for changing an instance's group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd design document for Ganeti 2.5
Michael Hanselmann [Wed, 3 Aug 2011 15:01:10 +0000 (17:01 +0200)]
Add design document for Ganeti 2.5

Including the designs which were actually implemented.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoPause DRBD sync for OS install if not wait_for_sync
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:03:38 +0000 (18:03 +0300)]
Pause DRBD sync for OS install if not wait_for_sync

When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync
in the background while performing the rest of the installation steps,
including OS installation.

However, OS installation is a very disk-intensive task that interferes badly
with the background I/O caused by DRBD's initial sync. To this end, we pause
the background sync before OS installation and unpause it afterwards, which
yields a significant speed boost for OS installation. The following should be
noted:

a) The user has requested not to wait for sync, i.e. the instance will be
   non-redundant for an unspecified interval anyway and delaying this by a
   couple of minutes is not a big compromise.

b) This approach is also followed during disk wiping.
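
The pause/unpause bracket can be sketched as a context manager. The
callables passed in are placeholders for whatever pauses and resumes
the sync, and resuming on failure is an assumption of this sketch, not
something the commit message states:

```python
from contextlib import contextmanager


@contextmanager
def paused_sync(pause, resume):
    """Pause background sync for the duration of the with-block."""
    pause()
    try:
        yield
    finally:
        # Resume even if the wrapped step (e.g. the OS install) fails.
        resume()


events = []
with paused_sync(lambda: events.append("pause"),
                 lambda: events.append("resume")):
    events.append("install OS")
```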

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: simplify an if check]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix documentation of gnt-instance failover
Iustin Pop [Wed, 3 Aug 2011 15:57:39 +0000 (17:57 +0200)]
Fix documentation of gnt-instance failover

Explain that we only start the instance on the new node if it was
originally running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoSmall doc patch for gnt-node evacuate
Iustin Pop [Wed, 3 Aug 2011 15:08:57 +0000 (17:08 +0200)]
Small doc patch for gnt-node evacuate

Just explain a bit the relation between node evacuate and instance
commands.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix small typo in docstring
Iustin Pop [Wed, 3 Aug 2011 15:23:04 +0000 (17:23 +0200)]
Fix small typo in docstring

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix typo in NEWS
Michael Hanselmann [Wed, 3 Aug 2011 11:44:00 +0000 (13:44 +0200)]
Fix typo in NEWS

“--dry-run” starts with two dashes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoChange the backend.InstanceLogName signature
Iustin Pop [Tue, 2 Aug 2011 08:14:05 +0000 (10:14 +0200)]
Change the backend.InstanceLogName signature

This now uses the component for the transfer (if available), otherwise
(e.g. in installs/renames) nothing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoInstance transfer: export component name to backend
Iustin Pop [Tue, 2 Aug 2011 08:12:05 +0000 (10:12 +0200)]
Instance transfer: export component name to backend

This modifies the RPC layer to also export the component name to the
backend, so that it can be used in log files and messages.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoInstance transfer: add argument for the 'component'
Iustin Pop [Tue, 2 Aug 2011 08:06:34 +0000 (10:06 +0200)]
Instance transfer: add argument for the 'component'

Currently, data transfer is done mainly with just the instance name,
but when we have instances with multiple disks this is not enough to
distinguish between the different transfers being done for the
instance.

Some parts of the code do have knowledge of the part being transferred
(i.e. DiskTransfer.name), but if I understood correctly not all, so I
decided to add a new argument to the respective disk import/disk
export classes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoOptimise use of repeated/looping GetInstanceInfo
Iustin Pop [Wed, 3 Aug 2011 11:48:46 +0000 (13:48 +0200)]
Optimise use of repeated/looping GetInstanceInfo

Similar to the previous patch, this adds a helper function to
eliminate repeated calls into ConfigWriter.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoOptimise use of repeated/looping GetNodeInfo
Iustin Pop [Wed, 3 Aug 2011 11:16:48 +0000 (13:16 +0200)]
Optimise use of repeated/looping GetNodeInfo

This adds a new ConfigWriter.GetMultiNodeInfo function and replaces
multiple/looping calls to GetNodeInfo with it.
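
The benefit of a batched lookup can be sketched with a toy model. The
class below is hypothetical and is not Ganeti's ConfigWriter; it only
assumes that every call into the configuration object pays a fixed
per-call cost (modelled here as one lock acquisition), so a single
multi-node call is cheaper than a loop of single-node calls.

```python
import threading


class ToyConfig:
    """Hypothetical stand-in for a lock-protected configuration store."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)
        self._lock = threading.Lock()
        self.lock_acquisitions = 0  # counts the per-call overhead

    def get_node_info(self, name):
        """Look up one node; takes the lock once per call."""
        with self._lock:
            self.lock_acquisitions += 1
            return self._nodes.get(name)

    def get_multi_node_info(self, names):
        """Look up many nodes; takes the lock once for the whole batch."""
        with self._lock:
            self.lock_acquisitions += 1
            return [(name, self._nodes.get(name)) for name in names]


cfg = ToyConfig({"node1": "info1", "node2": "info2", "node3": "info3"})
names = ["node1", "node2", "node3"]

# Looping variant: one lock acquisition per node.
looped = [(name, cfg.get_node_info(name)) for name in names]
looped_cost = cfg.lock_acquisitions

# Batched variant: one lock acquisition in total.
batched = cfg.get_multi_node_info(names)
batched_cost = cfg.lock_acquisitions - looped_cost
```

Both variants return the same data; only the number of calls into the
store differs.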

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoFix lint errors
Iustin Pop [Wed, 3 Aug 2011 10:59:02 +0000 (12:59 +0200)]
Fix lint errors

It turns out that the only use of the operator module was for
itemgetter, so patch eb62069e should have removed that import too.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agognt-node.rst: Fix a typo
René Nussbaumer [Wed, 3 Aug 2011 12:34:24 +0000 (14:34 +0200)]
gnt-node.rst: Fix a typo

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdd two more compat functions
Iustin Pop [Wed, 3 Aug 2011 09:47:41 +0000 (11:47 +0200)]
Add two more compat functions

operator.itemgetter(0) → fst
operator.itemgetter(1) → snd

snd is not used yet, but it makes sense to add both.
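
As a rough illustration of the mapping above, the two helpers can be
written as plain functions. This is a minimal sketch and not
necessarily how the compat module defines them.

```python
import operator


def fst(seq):
    """Return the first element of a sequence, like itemgetter(0)."""
    return seq[0]


def snd(seq):
    """Return the second element of a sequence, like itemgetter(1)."""
    return seq[1]


# Example data is made up for illustration.
pairs = [("node2", 4), ("node1", 2)]

names = [fst(p) for p in pairs]
by_count = sorted(pairs, key=snd)

# The helpers behave the same as the operator-based versions.
same_as_itemgetter = (
    names == list(map(operator.itemgetter(0), pairs))
    and by_count == sorted(pairs, key=operator.itemgetter(1))
)
```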

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoAdd a flag to burnin to allow specifying VCPU count.
Pedro Macedo [Tue, 2 Aug 2011 15:19:36 +0000 (17:19 +0200)]
Add a flag to burnin to allow specifying VCPU count.

Signed-off-by: Pedro Macedo <pmacedo@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix types passed to IAllocator
Iustin Pop [Tue, 2 Aug 2011 13:01:34 +0000 (15:01 +0200)]
Fix types passed to IAllocator

In IAllocator mode reloc, the parameter reloc_from takes a list; half of
the code already forced this parameter to a list, so we add the other two
cases where it is needed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agohtools: change absolute to relative symlinks
Iustin Pop [Tue, 2 Aug 2011 12:59:00 +0000 (14:59 +0200)]
htools: change absolute to relative symlinks

Currently we use absolute symlinks, but this doesn't work when we
install remotely (because we first install to a local temp dir, then
rsync to remote machines). To fix this, we change to manually-computed
relative paths, which is not ideal, but it works.
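
Why a relative target survives the staging step can be sketched as
follows. The paths are hypothetical, and the sketch uses
posixpath.relpath rather than the manually-computed paths this patch
uses in the build system.

```python
import posixpath


def relative_link_target(link_path, target_path):
    """Return target_path relative to the directory that holds link_path."""
    return posixpath.relpath(target_path, posixpath.dirname(link_path))


# Hypothetical install layout, for illustration only.
link = "/usr/local/sbin/hbal"
target = "/usr/local/lib/ganeti/htools"
rel = relative_link_target(link, target)


def resolve(root, link_path, link_target):
    """Resolve a symlink target as seen from a tree copied under root."""
    link_dir = posixpath.dirname(root + link_path)
    return posixpath.normpath(posixpath.join(link_dir, link_target))


# The relative target follows the tree into a staging directory ...
staged_relative = resolve("/tmp/stage", link, rel)
# ... while the absolute target still points outside it.
staged_absolute = resolve("/tmp/stage", link, target)
```

The relative link resolves inside the staged tree, whereas the absolute
link keeps pointing at the final location, which does not exist yet on
the build machine.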

One possible alternative would be to use hard-links…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agojqueue: Add short delay before detecting job changes
Michael Hanselmann [Tue, 2 Aug 2011 09:48:09 +0000 (11:48 +0200)]
jqueue: Add short delay before detecting job changes

By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).

This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
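
The effect of the delay can be illustrated with a small simulation. It
is a toy model and not the jqueue code: it assumes that each
notification not absorbed by a pending delay causes one re-read of the
job, and the notification times are invented.

```python
def count_checks(change_times, delay):
    """Count job re-reads for a sorted list of notification times.

    After a notification triggers a re-read, further notifications that
    arrive within `delay` seconds are absorbed into that same re-read.
    """
    checks = 0
    window_end = None
    for t in change_times:
        if window_end is None or t > window_end:
            checks += 1
            window_end = t + delay
    return checks


# Invented notification times (seconds) with bursts of quick changes.
times = [0.00, 0.02, 0.05, 0.30, 0.31, 0.90]

without_delay = count_checks(times, 0.0)
with_delay = count_checks(times, 0.1)
```

With these sample times the 100ms window merges each burst into a
single re-read; the real saving depends on how the job's changes are
spaced.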

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd primary/second nodes' group as query fields
Michael Hanselmann [Thu, 28 Jul 2011 11:37:20 +0000 (13:37 +0200)]
Add primary/second nodes' group as query fields

These will be very useful for ganeti-watcher as it needs to retrieve
instances by group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix doclint failures
Iustin Pop [Tue, 2 Aug 2011 06:58:27 +0000 (08:58 +0200)]
Fix doclint failures

Commit 54ca6e4b2 renamed some arguments, but didn't also rename them
in the docstrings.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agowatcher: Separate function for writing instance status file
Michael Hanselmann [Fri, 29 Jul 2011 13:56:05 +0000 (15:56 +0200)]
watcher: Separate function for writing instance status file

For now this will do another query to the master daemon, but with the
split for node groups this issue will go away.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Make RAPI error messages less technical
Michael Hanselmann [Fri, 29 Jul 2011 13:49:55 +0000 (15:49 +0200)]
watcher: Make RAPI error messages less technical

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher.state: Use strings, not objects
Michael Hanselmann [Fri, 29 Jul 2011 13:43:14 +0000 (15:43 +0200)]
watcher.state: Use strings, not objects

Until now the state class would receive instances as objects
(ganeti.watcher.Instance), but this is not necessary. By using strings
the interface is simplified.

This patch also simplifies some code accessing the internal structures,
e.g. setting a key of a dictionary. Some instances of “del dict[key]”
are replaced with “dict.pop(key, None)” to suppress any exceptions if
the key doesn't exist.
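
The difference between the two forms is shown below. The dictionary
contents are made up for illustration and do not reflect the real
watcher state layout.

```python
# Made-up state data keyed by instance name.
state = {"inst1": {"restart_count": 2}}

# "del" raises KeyError when the key is absent.
try:
    del state["inst2"]
except KeyError:
    del_raised = True
else:
    del_raised = False

# pop() with a default returns the default instead of raising.
removed_missing = state.pop("inst2", None)

# For existing keys, pop() removes the entry and returns its value.
removed_existing = state.pop("inst1", None)
```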

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Raise error on unknown hook status
Michael Hanselmann [Fri, 29 Jul 2011 13:20:42 +0000 (15:20 +0200)]
watcher: Raise error on unknown hook status

Also, remove punctuation from one error message.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agowatcher: Reformat constants
Michael Hanselmann [Fri, 29 Jul 2011 13:19:04 +0000 (15:19 +0200)]
watcher: Reformat constants

Make them match the style guide.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd new watcher constants
Michael Hanselmann [Fri, 29 Jul 2011 13:13:58 +0000 (15:13 +0200)]
Add new watcher constants

WATCHER_STATEFILE will be removed at the end of this
patch series.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoFix formatting of frozensets
Stephen Shirley [Fri, 29 Jul 2011 12:15:40 +0000 (14:15 +0200)]
Fix formatting of frozensets

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agocli: Add constant for node group option
Michael Hanselmann [Thu, 28 Jul 2011 09:26:36 +0000 (11:26 +0200)]
cli: Add constant for node group option

ganeti-watcher will use this constant to pass the option to itself for
processing all node groups.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoReplace %r with '%s' in masterd/instance.py
Iustin Pop [Fri, 29 Jul 2011 08:55:44 +0000 (10:55 +0200)]
Replace %r with '%s' in masterd/instance.py

I still don't know why Michael is a fan of %r, but in the meantime
this patch changes:

WARNING: import u'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1

into:

WARNING: import 'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdd "reboot_behavior" hypervisor flag
Stephen Shirley [Mon, 20 Jun 2011 15:52:55 +0000 (17:52 +0200)]
Add "reboot_behavior" hypervisor flag

During instance installations, you do not want the instance to reboot
and start again with the same parameters, as that will most likely
re-start the install process. Therefore, when the instance requests a
reboot it should instead shut down. This flag allows this to be
controlled.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoRemoved non-existing -t option from the gnt-cluster man page
Andrea Spadaccini [Fri, 29 Jul 2011 09:18:55 +0000 (10:18 +0100)]
Removed non-existing -t option from the gnt-cluster man page

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>