Michael Hanselmann [Fri, 12 Aug 2011 13:32:10 +0000 (15:32 +0200)]
Fix exit code of “gnt-cluster verify”
With commit fcad7225e3fc4 LU-generated jobs are used, but the
exit code must still be backwards-compatible.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 12:10:33 +0000 (14:10 +0200)]
Update NEWS for 2.5
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 09:08:53 +0000 (11:08 +0200)]
Small improvements for cluster verify
- Check if BGL is actually owned
- Show group name as feedback
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:48:11 +0000 (14:48 +0200)]
watcher: Use locks when querying for resource information
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:47:47 +0000 (14:47 +0200)]
Allow locking to be used via OpQuery
The original design for query2 specifically excluded locking, but now
it's turned out that it would be a good thing to have in watcher. This
patch adds a new parameter to OpQuery and enables its use in LUQuery. A
missing function is added to LUGroupQuery, a comment clarified in
_NodeQuery and all locks declared as shared acquires in the same LU.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:06:54 +0000 (14:06 +0200)]
Document job results for RAPI where possible
Some opcodes aren't documented yet.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:06:16 +0000 (14:06 +0200)]
opcodes: Add more result checks, add some comments
Some of these will be used by the RAPI documentation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:05:35 +0000 (14:05 +0200)]
sphinx_ext: Allow documenting opcode results
Will be used by RAPI documentation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 12:03:51 +0000 (14:03 +0200)]
ht: Allow adding comment to type descriptions
This will be used to add some more details to type descriptions, e.g. on
opcode parameters or result values. The implementation is very similar
to “WithDesc”.
I chose to use “[…]” after finding “/*…*/” hard to read and spot. At
some point we'll have to introduce proper formatting (e.g. using HTML).
Example with a comment:
List of ((Length 2) and (Item 0 is (NonEmptyString [name of changed
parameter]), item 1 is Anything))
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 10:09:24 +0000 (12:09 +0200)]
Clarify job ID-related type checks, add unittests
Instead of a rather complicated expression only “JobId” is output. Job
ID lists (like generated by “SubmitManyJobs”) are limited to two-item
lists. Unittests are added.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 12 Aug 2011 09:14:07 +0000 (11:14 +0200)]
Change OpClusterVerifyConfig's result, verify results
This patch removes the list of node groups (not used anymore since
commit fcad7225e3fc) from OpClusterVerifyConfig's result and adds result
verification to all OpClusterVerify* opcodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Andrea Spadaccini [Fri, 12 Aug 2011 11:02:18 +0000 (12:02 +0100)]
Fixed error in Makefile.am, changing spaces with tabs
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Fri, 12 Aug 2011 10:08:52 +0000 (11:08 +0100)]
Added the test data files for netutils to Makefile.am
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 11 Aug 2011 16:15:20 +0000 (18:15 +0200)]
Use LU-generated jobs for verifying cluster
This patch moves the logic for verifying the various node groups in a
cluster into the master daemon. Job dependencies are used to ensure the
configuration, which requires the BGL, is verified first.
With this change it will be possible to expose whole-cluster
verification through the remote API without requiring additional client
logic on top of standard features like LU-generated jobs and job
dependencies.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 11 Aug 2011 16:11:17 +0000 (18:11 +0200)]
opcodes: Use variables for verification parameters
Just some cleanup before the 2.5 release.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 11 Aug 2011 14:37:30 +0000 (16:37 +0200)]
mcpu: Specify actual received type on opcode issue
This helped me debug an issue with opcodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 11 Aug 2011 12:10:42 +0000 (14:10 +0200)]
Use resource kind as OpQuery*'s description
This gives a hint as to what's queried. “QUERY(instance)” or
“QUERY(node)” are way better than just “QUERY”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Andrea Spadaccini [Thu, 11 Aug 2011 10:30:47 +0000 (11:30 +0100)]
Added helper functions in netutils and related constants
Added the following functions to netutils:
- IsValidInterface
- GetInterfaceIpAddresses
- _GetIpAddressesFromIpOutput
Added the following static methods to netutils.IPAddress:
- GetAddressFamilyFromVersion
- GetVersionFromAddressFamily
Added unit tests for the new methods in netutils.IPAddress, for the IP
address search regex and for GetInterfaceIpAddresses
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 11 Aug 2011 10:18:59 +0000 (12:18 +0200)]
Fix epydoc error in rlib2.py
I blindly assumed epydoc would use normal reST, but turns out it uses
its own “epytext” in our configuration. Since the latter doesn't support
blockquotes, I just make the paragraph a literal block.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 10 Aug 2011 15:30:26 +0000 (17:30 +0200)]
Fix typo in rlib2's docstring
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Benjamin Lipton <benlipton@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 10 Aug 2011 15:22:49 +0000 (17:22 +0200)]
Documentation fixes and clarification
- In README, refer to “install.rst”, not “install.html”
- In rapi.rst, wrap line longer than 72 characters
- In rlib2.py, update and clarify description of POST vs. PUT
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 14:42:41 +0000 (16:42 +0200)]
gnt-instance: Rename _SHUTDOWN_* to _EXPAND_*
Once upon a time these constants were only used for stopping instances,
but pretty soon they became more useful. Let's rename them.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 14:24:32 +0000 (16:24 +0200)]
List returned fields in RAPI documentation
Also replace console types with constants.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 13:24:04 +0000 (15:24 +0200)]
rlib2: Exclude oplog/opresult from bulk job list
These fields can get rather large. Excluding them from the big bulk list
reduces the amount of data. They are still available via per-job
requests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 11:26:11 +0000 (13:26 +0200)]
rlib: Expose node group tags
Commit 1ffd26739d3 added support for tagging node groups. Also add a
check for exposed fields.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 11:13:49 +0000 (13:13 +0200)]
rapi: Bulk support for jobs
This was requested in issue 181.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Andrea Spadaccini [Tue, 9 Aug 2011 12:31:39 +0000 (13:31 +0100)]
Fixed an error in the documentation of _GetKVMVersion
Fixed an epydoc compilation error that I introduced with last commit.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 9 Aug 2011 12:11:02 +0000 (14:11 +0200)]
Mention globbing filters in ganeti(7) manpage
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Andrea Spadaccini [Tue, 9 Aug 2011 11:16:49 +0000 (12:16 +0100)]
Removed code duplication for calls to _GetKVMVersion
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 8 Aug 2011 16:14:10 +0000 (18:14 +0200)]
Fix epydoc breakage caused by f8638e288c7a
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Mon, 8 Aug 2011 14:00:04 +0000 (15:00 +0100)]
Changed NET_PORT_CHECK to REQ_NET_PORT_CHECK, to improve consistency
I originally made this change because I needed OPT_NET_PORT_CHECK. I no
longer need OPT_NET_PORT_CHECK, but I am committing the rename anyway
because IMO it improves the consistency of the wrapper names.
Also, I changed the code of the check to use chained inequality
operators.
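For illustration, a minimal Python sketch of a chained comparison for a port check. The function name and the 1..65535 bounds are assumptions made for this example, not taken from the actual check:

```python
def is_valid_port(port, low=1, high=65535):
    """Return True if port lies within [low, high].

    The bounds are assumed for illustration only.
    """
    # Chained form; equivalent to: (low <= port) and (port <= high)
    return low <= port <= high
```

The chained form evaluates `port` once and reads like the mathematical notation.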
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Mon, 8 Aug 2011 14:00:03 +0000 (15:00 +0100)]
Added check for the ip command at configure time
Also, corrected a few places where the ip command was hardcoded.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 8 Aug 2011 15:31:59 +0000 (17:31 +0200)]
Detect globbing patterns as query arguments
Short: this patch enables the use of “gnt-instance list '*.site'”.
Detailed description: This patch changes the command line interface code
to try to deduce the kind of filter from the arguments to a “list”
command. If it's a list of plain names an old-style name filter is used.
If filtering is forced or the single argument is potentially a filter,
it is parsed as a query filter string. Any name looking like a globbing
pattern (e.g. “*.site” or “web?.example.com”) is treated as such.
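A rough sketch of the decision described above, in Python. The function name, the returned labels and both character sets are hypothetical; the real command line code may detect filters differently:

```python
# Characters assumed to hint at a query filter string (hypothetical set).
FILTER_CHARS = frozenset('()=<>!~" ')
# Characters assumed to hint at a globbing pattern (hypothetical set).
GLOB_CHARS = frozenset("*?")


def classify(args, force_filter=False):
    """Guess how the arguments of a "list" command should be interpreted.

    Returns "filter" for a query filter string, "glob" if any name looks
    like a globbing pattern, and "names" for a list of plain names.
    """
    if force_filter:
        return "filter"
    if len(args) == 1 and any(ch in FILTER_CHARS for ch in args[0]):
        return "filter"
    if any(ch in GLOB_CHARS for arg in args for ch in arg):
        return "glob"
    return "names"
```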
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Fri, 5 Aug 2011 09:40:45 +0000 (10:40 +0100)]
cluster-merge: implement params delta mercifulness
Sometimes it's good to tell the user about parameter differences but
then proceed anyway. Strictness is still enforced for those parameters
that would break the cluster (volume group name, storage dir if file
storage is enabled).
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 4 Aug 2011 14:01:04 +0000 (15:01 +0100)]
cluster-merge: consider file storage enable state
There's no point in checking whether the file storage dir in the two
clusters is the same if file storage is not even enabled.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 8 Aug 2011 09:34:41 +0000 (11:34 +0200)]
Allow fixing of split instances via relocate
Currently, the IAllocator code requests strictly that the (set of) groups of
the nodes we're relocating from is equal to the set of groups we're
relocating to.
This, however, makes it impossible to fix split instances, since (by
definition) the secondary of a split instance is not in the same group
as the primary node, and after the fix it is in the same group.
The patch changes the test from group equality to check that the final
group set (across both primary and secondary nodes) is a subset of the
initial group set (again across both nodes). This means we can't
"extend" the group of nodes but keeping the same or decreasing it is
allowed.
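The difference between the old and the new test can be sketched with Python sets. The function names and group names are made up for this example and do not mirror the actual code:

```python
def old_check(initial_groups, final_groups):
    """Old behaviour: the group sets must be equal."""
    return set(final_groups) == set(initial_groups)


def new_check(initial_groups, final_groups):
    """New behaviour: the final group set must be a subset of the initial one."""
    return set(final_groups) <= set(initial_groups)


# A split instance: primary in "group1", secondary in "group2".
initial = {"group1", "group2"}
# After the fix both nodes are in "group1".
final = {"group1"}
fix_allowed_before = old_check(initial, final)
fix_allowed_after = new_check(initial, final)
```

With equality the fix is rejected, with the subset test it is accepted, while extending the set (e.g. from `{"group1"}` to `{"group1", "group2"}`) is still refused.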
After this patch, one can finally fix (automatically) split instances
via a gnt-instance replace-disks.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 5 Aug 2011 16:56:57 +0000 (18:56 +0200)]
Revert deprecation of evacuate mode in hail
As discussed offline, the new node-change mode could be used for
evacuation, but it's not directly useful as it returns a list of
opcodes; therefore, we need to partially revert commits fbe5fcf and
5b53ca7 that removed it (and multi-evacuate, which remains removed).
The new version of relocate is actually just a wrapper over the
tryNodeEvac (which does the node evacuate); we run that and then we do
some extra checks that the nodes we got from that function are
consistent with the instance's new state.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 5 Aug 2011 14:52:44 +0000 (16:52 +0200)]
Further cleanup after multi-evacuate removal
Commit f0edfcf6 removed the parsing of multi-evacuate result, but the
code went from:
  if mode in (multi-evac, relocate):
    …
    if mode == relocate:
      …
to:
  if mode == relocate:
    …
    if mode == relocate:
      …
This patch simply removes the nested if.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 5 Aug 2011 14:48:49 +0000 (16:48 +0200)]
Fix bug in IAllocator parsing of Evacuate result
Commit 342f9172 added stricter checks for the iallocator result in
evacuate mode, but it does this irrespective of the result
status. When the result has failed and (according to the design) the
list of nodes is empty, this code will trigger the following:
node1# gnt-instance replace-disks -I hail instance14
Failure: command execution error:
Groups of nodes returned by iallocator () differ from original groups (default)
After the patch, the result is:
node1# gnt-instance replace-disks -I hail instance14
Failure: prerequisites not met for this operation:
error type: insufficient_resources, error details:
Can't compute nodes using iallocator 'hail': Request failed: …
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 13:38:41 +0000 (15:38 +0200)]
Implement globbing operator for filters
The operators “=*” and “!*” do globbing in filters, e.g.:
$ gnt-instance list --no-headers -o name 'name =* "*.site"'
inst1.site.example.com
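A rough Python sketch of how such a glob could be matched. The example output above suggests that the pattern is matched against the leading DNS labels and that trailing labels are allowed; that reading and the helper names are assumptions, not the actual implementation:

```python
import re


def dns_glob_to_regex(pattern):
    """Convert a DNS name glob into a compiled regular expression.

    Assumption: "*" and "?" never match the dot separating name parts,
    and further name parts may follow the matched prefix.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"[^.]*")
        elif ch == "?":
            parts.append(r"[^.]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"^" + "".join(parts) + r"(\..*)?$")


def glob_filter(names, pattern, negate=False):
    """Emulate "=*" (negate=False) and "!*" (negate=True) on a name list."""
    regex = dns_glob_to_regex(pattern)
    return [name for name in names if bool(regex.match(name)) != negate]
```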
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 5 Aug 2011 13:41:28 +0000 (15:41 +0200)]
Zero DRBD metadata before creation
The docstring of the DRBD8 class says:
… The meta device is checked for valid size and is zeroed on create.
which is not done today, hence we have
http://code.google.com/p/ganeti/issues/detail?id=182:
node1# mkreiserfs -f /dev/xenvg/t8
…
ReiserFS is successfully created on /dev/xenvg/t8.
node1# drbdmeta --force /dev/drbd256 v08 /dev/xenvg/t8 0 create-md
md_offset 0
al_offset 4096
bm_offset 36864
Found reiser filesystem
This would corrupt existing data.
If you want me to do this, you need to zero out the first part
of the device (destroy the content).
You should be very sure that you mean it.
Operation refused.
I've tested and even just 1MB is enough to wipe the meta, but let's be
safer and pass a 'clean' meta to drbd.
Note: I didn't copy _WipeDevice from backend.py since it seemed more
complex than needed here.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 12:27:46 +0000 (14:27 +0200)]
Remove iallocator's “multi-evacuate” mode
It is no longer used and has been deprecated in 2.5.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 12:39:24 +0000 (14:39 +0200)]
confd.querylib: Remove long-deprecated query mode
This was never used by a stable version.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 12:26:50 +0000 (14:26 +0200)]
Add docstring to cmdlib.TLReplaceDisks._FindFaultyDisks
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 12:17:07 +0000 (14:17 +0200)]
watcher: Fix breakage caused by 9bb69bb52fb9
The first argument to str.split is the separator, not the maximum number
of splits.
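A short Python illustration of the mix-up. The sample line and helper names are invented for this example and are not the watcher's actual data:

```python
def split_once(line):
    """Split on whitespace at most once: separator first, then maxsplit."""
    return line.split(None, 1)


def misuse_raises(line):
    """Passing the split count as the first argument is a type error."""
    try:
        line.split(1)  # 1 is taken as the separator, not as maxsplit
    except TypeError:
        return True
    return False


name, rest = split_once("instance1 some status text")
```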
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 11:10:53 +0000 (13:10 +0200)]
LUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result
… instead of getting the list of instances once again from the
configuration.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 11:05:38 +0000 (13:05 +0200)]
cmdlib: Factorize checking node groups' instances
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 08:18:30 +0000 (10:18 +0200)]
Include hooks.rst in version check
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 08:18:17 +0000 (10:18 +0200)]
Bump version to 2.5.0~beta1
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 09:57:50 +0000 (11:57 +0200)]
watcher: Write per-group instance status, merge into global one
Each per-group watcher process writes its own instance status file. Once
that's done it tries to acquire an exclusive lock on the global file and
will proceed to read all status files, merging them based on each file's
mtime. If an instance is moved to another group, the newer status will
supersede that of an older file which hasn't yet been updated.
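The merge rule can be sketched as follows. This is a simplified Python model operating on already-parsed data; the function name, the input layout and the status strings are assumptions for illustration:

```python
def merge_instance_status(per_group):
    """Merge per-group status data, preferring entries from newer files.

    per_group is an iterable of (mtime, {instance_name: status}) tuples,
    one per status file (hypothetical layout).
    """
    merged = {}
    for mtime, statuses in per_group:
        for name, status in statuses.items():
            known = merged.get(name)
            # Keep the entry coming from the file with the newest mtime
            if known is None or mtime > known[0]:
                merged[name] = (mtime, status)
    return {name: status for name, (_, status) in merged.items()}
```

An instance listed in two files, e.g. right after moving to another group, ends up with the status from the more recently written file, independent of the order in which the files are read.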
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 4 Aug 2011 16:31:51 +0000 (18:31 +0200)]
cleaner: Remove watcher's instance status file after 21 days
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 4 Aug 2011 16:38:38 +0000 (18:38 +0200)]
utils.ReadFile: Add pre-read callback
This will be used by the watcher to store the file's fstat(2). It must
be done from the filehandle.
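A minimal Python sketch of the idea. The names `read_file` and `read_file_with_stat` are made up here and the signature is an assumption, not the actual utils.ReadFile interface:

```python
import os


def read_file(path, preread=None):
    """Read a file, calling preread(filehandle) before reading, if given."""
    with open(path, "r") as fh:
        if preread is not None:
            preread(fh)
        return fh.read()


def read_file_with_stat(path):
    """Return (contents, stat result), both taken from the same filehandle."""
    captured = []
    data = read_file(
        path, preread=lambda fh: captured.append(os.fstat(fh.fileno())))
    return data, captured[0]
```

Taking the fstat(2) from the open filehandle ensures the stat data belongs to the file actually read, even if the path is replaced in the meantime.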
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Fri, 5 Aug 2011 09:39:37 +0000 (11:39 +0200)]
Merge branch 'stable-2.4'
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 4 Aug 2011 14:21:06 +0000 (16:21 +0200)]
Bumping version to 2.4.3
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Agata Murawska [Thu, 4 Aug 2011 14:46:20 +0000 (16:46 +0200)]
Fixed a typo in utils/process.py
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 4 Aug 2011 14:20:34 +0000 (16:20 +0200)]
Fix unittest failure after list_owned changes
We just need an object that has a list_owned method.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:04:08 +0000 (18:04 +0300)]
Remove 15-second sleep from LUInstanceCreate
Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate already
calls _WaitForSync with oneshot=True, which already performs an internal
wait-loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 4 Aug 2011 10:52:40 +0000 (12:52 +0200)]
Add a readability alias
lu.glm.list_owned becomes lu.owned_locks, which is clearer for the
reader.
Also rename three variables (which were before named owned_locks) to
make clearer what they track.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Thu, 4 Aug 2011 12:29:09 +0000 (14:29 +0200)]
Fix broken object references in docstrings
The module is called “objects”, not “object”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 09:46:10 +0000 (11:46 +0200)]
Add “gnt-instance change-group” command
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 09:45:46 +0000 (11:45 +0200)]
Add opcode to change instance's group
This is quite similar to evacuating a group, but the locking
is different.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 4 Aug 2011 12:21:05 +0000 (14:21 +0200)]
Factorize checking instance's node groups
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 4 Aug 2011 11:40:43 +0000 (13:40 +0200)]
Update the NEWS file for 2.4.3
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 14:35:10 +0000 (16:35 +0200)]
ganeti-cleaner: Remove old watcher state files
Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:14:36 +0000 (15:14 +0200)]
Remove WATCHER_STATEFILE constant
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:15:00 +0000 (15:15 +0200)]
cfgupgrade: Remove old watcher state file
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:49:20 +0000 (15:49 +0200)]
ganeti-watcher: Split for node groups
This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.
The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).
Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.
Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.
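The shared/exclusive scheme on the dedicated lock file can be sketched with flock(2) in Python. This is an illustration for Unix-like systems only; the helper names are hypothetical and the actual watcher code may use different locking helpers:

```python
import fcntl


def lock_shared(fh):
    """Taken by every watcher process; many holders can coexist."""
    fcntl.flock(fh, fcntl.LOCK_SH)


def try_lock_exclusive(fh):
    """Taken by a command that wants to block the watcher.

    Returns False instead of waiting if any other holder exists.
    """
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False
    return True
```

Any number of watcher processes can hold the shared lock at the same time, while an exclusive request succeeds only once all of them have released it (closing the file releases the lock).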
The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.
Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 09:44:28 +0000 (11:44 +0200)]
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculating
a group evacuation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 08:46:51 +0000 (10:46 +0200)]
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq
- Clarify command synopsis
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 08:43:41 +0000 (10:43 +0200)]
cmdlib: Factorize getting iallocator
The same logic will be used for changing an instance's group.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 15:01:10 +0000 (17:01 +0200)]
Add design document for Ganeti 2.5
Including the designs which were actually implemented.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Apollon Oikonomopoulos [Wed, 3 Aug 2011 15:03:38 +0000 (18:03 +0300)]
Pause DRBD sync for OS install if not wait_for_sync
When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync
in the background while performing the rest of the installation steps,
including OS installation.
However, OS installation is a very disk-intensive task that interferes badly
with the background I/O caused by DRBD's initial sync. To this end, we pause
the background sync before OS installation and unpause it afterwards, which
yields a significant speed boost for OS installation. The following should be
noted:
a) The user has requested not to wait for sync, i.e. the instance will be
non-redundant for an unspecified interval anyway and delaying this by a
couple of minutes is not a big compromise.
b) This approach is also followed during disk wiping.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: simplify an if check]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:57:39 +0000 (17:57 +0200)]
Fix documentation of gnt-instance failover
Explain that we only start the instance on the new node if it was
originally running.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:08:57 +0000 (17:08 +0200)]
Small doc patch for gnt-node evacuate
Just explain a bit the relation between node evacuate and instance
commands.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:23:04 +0000 (17:23 +0200)]
Fix small typo in docstring
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 11:44:00 +0000 (13:44 +0200)]
Fix typo in NEWS
“--dry-run” starts with two dashes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:14:05 +0000 (10:14 +0200)]
Change the backend.InstanceLogName signature
This now uses the component for the transfer (if available), otherwise
(e.g. in installs/renames) nothing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:12:05 +0000 (10:12 +0200)]
Instance transfer: export component name to backend
This modifies the RPC layer to export the component name too to the
backend, so that it can be used in log files and messages.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 08:06:34 +0000 (10:06 +0200)]
Instance transfer: add argument for the 'component'
Currently, data transfer is done mainly with just the instance name,
but when we have instances with multiple disks this is not enough to
distinguish between the different transfers being done for the
instance.
Some parts of the code do have knowledge of the part being transferred
(i.e. DiskTransfer.name), but if I understood correctly not all, so I
decided to add a new argument to the respective disk import/disk
export classes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 11:48:46 +0000 (13:48 +0200)]
Optimise use of repeated/looping GetInstanceInfo
Similar to the previous patch, this adds a helper function to
eliminate repeated calls into ConfigWriter.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 11:16:48 +0000 (13:16 +0200)]
Optimise use of repeated/looping GetNodeInfo
This adds a new ConfigWriter.GetMultiNodeInfo function and replaces
multiple/looping calls to GetNodeInfo with it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 10:59:02 +0000 (12:59 +0200)]
Fix lint errors
It turns out that the only use of the operator module was for
itemgetter, so patch eb62069e should have removed that import too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Wed, 3 Aug 2011 12:34:24 +0000 (14:34 +0200)]
gnt-node.rst: Fix a typo
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 09:47:41 +0000 (11:47 +0200)]
Add two more compat functions
operator.itemgetter(0) → fst
operator.itemgetter(1) → snd
snd is not used yet, but it makes sense to add both.
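A minimal Python sketch of what such helpers might look like and how they replace itemgetter. The bodies and the sample data are assumptions, not the actual compat code:

```python
def fst(seq):
    """Return the first element of a sequence (like operator.itemgetter(0))."""
    return seq[0]


def snd(seq):
    """Return the second element of a sequence (like operator.itemgetter(1))."""
    return seq[1]


# Sample data invented for this example
pairs = [("node2", 3), ("node1", 7)]
by_name = sorted(pairs, key=fst)
by_count = sorted(pairs, key=snd, reverse=True)
```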
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Pedro Macedo [Tue, 2 Aug 2011 15:19:36 +0000 (17:19 +0200)]
Add a flag to burnin to allow specifying VCPU count.
Signed-off-by: Pedro Macedo <pmacedo@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 13:01:34 +0000 (15:01 +0200)]
Fix types passed to IAllocator
Iallocator mode reloc, parameter reloc_from takes a list; half of the
code already forced this parameter to list, we add the other two cases
where it is needed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 2 Aug 2011 12:59:00 +0000 (14:59 +0200)]
htools: change absolute to relative symlinks
Currently we use absolute symlinks, but this doesn't work when we
install remotely (due to install first to local temp dir, then rsync
to remote machines). To fix, we change to manually-computed relative
paths, which is not best, but it works.
One possible alternative would be to use hard-links…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 2 Aug 2011 09:48:09 +0000 (11:48 +0200)]
jqueue: Add short delay before detecting job changes
By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).
This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
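The effect of the delay can be modelled with a small Python simulation. This is only a sketch of the coalescing idea on made-up timestamps; the function name is hypothetical and this is not the job queue code:

```python
def count_wakeups(notification_times, delay):
    """Count how often a waiter reports a change.

    After a notification the waiter sleeps for delay seconds; every
    notification arriving within that window is absorbed into the same
    report instead of causing another one.
    """
    wakeups = 0
    window_end = None
    for timestamp in sorted(notification_times):
        if window_end is None or timestamp > window_end:
            wakeups += 1
            window_end = timestamp + delay
    return wakeups


# Invented change notifications for one job file, in seconds
times = [0.00, 0.02, 0.05, 0.30, 0.31, 1.00]
without_delay = count_wakeups(times, 0.0)
with_delay = count_wakeups(times, 0.1)
```

In this toy input the six notifications collapse into three reports once a 100ms window is applied.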
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
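The delay described in the commit above might look roughly like the sketch below. It assumes a threading.Event signals that the job file changed; the function name, parameters and the use of an Event are hypothetical and not taken from Ganeti's jqueue code.

```python
import threading
import time


def wait_for_change(changed, read_state, delay=0.1, timeout=None):
    """Wait for a change notification, then pause briefly before reading.

    changed:    threading.Event set by whatever watches the job file
                (an assumption for this sketch).
    read_state: callable returning the current job state.
    Returns the state, or None if no notification arrived within timeout.
    """
    if not changed.wait(timeout):
        return None
    # Give the job some additional time to change again.
    time.sleep(delay)
    # Clear before reading: notifications that arrived during the delay are
    # covered by the single read below, and a later one sets the event again.
    changed.clear()
    return read_state()
```

Several changes in quick succession then cost one read (and one reply to the client) instead of one per change, at the price of up to delay seconds of extra latency.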
Michael Hanselmann [Thu, 28 Jul 2011 11:37:20 +0000 (13:37 +0200)]
Add primary/secondary nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieve
instances by group.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 2 Aug 2011 06:58:27 +0000 (08:58 +0200)]
Fix doclint failures
Commit 54ca6e4b2 renamed some arguments, but didn't also rename them
in the docstrings.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:56:05 +0000 (15:56 +0200)]
watcher: Separate function for writing instance status file
For now this will do another query to the master daemon, but with the
split for node groups this issue will go away.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:49:55 +0000 (15:49 +0200)]
watcher: Make RAPI error messages less technical
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:43:14 +0000 (15:43 +0200)]
watcher.state: Use strings, not objects
Until now the state class would receive instances as objects
(ganeti.watcher.Instance), but this is not necessary. By using strings
the interface is simplified.
This patch also simplifies some code accessing the internal structures,
e.g. setting a key of a dictionary. Some instances of “del dict[key]”
are replaced with “dict.pop(key, None)” to suppress any exceptions if
the key doesn't exist.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
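The difference between the two dictionary idioms mentioned in the commit above, shown on a made-up dictionary (the keys and values are hypothetical, not the watcher's actual state layout):

```python
# Made-up state data for illustration only.
state = {"inst1": {"restart_count": 2}}

# "del" raises KeyError when the key is missing.
missing = False
try:
    del state["inst2"]
except KeyError:
    missing = True

# pop() with a default does nothing for a missing key.
removed = state.pop("inst2", None)

# For an existing key, pop() removes the entry and returns its value.
existing = state.pop("inst1", None)
```

Using pop() with a default suits cleanup code that only needs the key to be absent afterwards. Where a missing key would indicate a bug, del remains the stricter choice.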
Michael Hanselmann [Fri, 29 Jul 2011 13:20:42 +0000 (15:20 +0200)]
watcher: Raise error on unknown hook status
Also, remove punctuation from one error message.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:19:04 +0000 (15:19 +0200)]
watcher: Reformat constants
Make them match with style guide.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Jul 2011 13:13:58 +0000 (15:13 +0200)]
Add new watcher constants
WATCHER_STATEFILE will be removed at the end of this
patch series.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Stephen Shirley [Fri, 29 Jul 2011 12:15:40 +0000 (14:15 +0200)]
Fix formatting of frozensets
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 28 Jul 2011 09:26:36 +0000 (11:26 +0200)]
cli: Add constant for node group option
ganeti-watcher will use this constant to pass the option to itself for
processing all node groups.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 29 Jul 2011 08:55:44 +0000 (10:55 +0200)]
Replace %r with '%s' in masterd/instance.py
I still don't know why Michael is a fan of %r, but in the meantime
this patch changes:
WARNING: import u'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1
into:
WARNING: import 'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:
Exited with status 1
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Stephen Shirley [Mon, 20 Jun 2011 15:52:55 +0000 (17:52 +0200)]
Add "reboot_behavior" hypervisor flag
During instance installations, you do not want the instance to reboot
and start again with the same parameters, as that will most likely
re-start the install process. Therefore, when the instance requests a
reboot it should instead shut down. This flag allows this to be
controlled.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>