Guido Trotter [Fri, 10 Jun 2011 12:27:12 +0000 (12:27 +0000)]
--select-instances hbal manpage update
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 13 Jun 2011 12:01:08 +0000 (12:01 +0000)]
Check that the selected instances are known
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 13 Jun 2011 11:44:58 +0000 (11:44 +0000)]
Loader.updateMovable: evaluate selected instances
This also adds docstrings for the function arguments and renames exinst
to exinsts, since it's a list and that is the name used in other
functions.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Fri, 10 Jun 2011 13:45:09 +0000 (14:45 +0100)]
Add instance selection list to Loader.mergeData
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Fri, 10 Jun 2011 13:44:30 +0000 (14:44 +0100)]
Add --select-instances hbal flag
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Fri, 10 Jun 2011 13:30:10 +0000 (14:30 +0100)]
Remove double whitespace in help string
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Apollon Oikonomopoulos [Thu, 16 Jun 2011 12:15:30 +0000 (15:15 +0300)]
Add gnt-network design doc
This design covers high level network block definition and pool
management.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 14 Jun 2011 16:27:06 +0000 (18:27 +0200)]
Replace iallocator's mreloc w/ change-group and node-evac
This patch removes all occurrences of the “multi-relocate” iallocator
mode. Commit 25ee7fd845 updated the design document and introduced
separate modes, “change-group” and “node-evacuate”. The constants aren't
removed yet as they're still used by htools.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Stephen Shirley [Mon, 6 Jun 2011 08:59:46 +0000 (10:59 +0200)]
Fix a couple of typos
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 6 Jun 2011 15:18:30 +0000 (17:18 +0200)]
Makefile: Add version check for iallocator.rst
iallocator.rst contains the Ganeti version at the top.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 6 Jun 2011 15:10:43 +0000 (17:10 +0200)]
Update iallocator design for node group-aware operations
A while ago a new ``multi-relocate`` mode was proposed and documented.
As it turned out, the interface had some deficiencies. With this patch
the relocation modes are reduced to two and split into separate
iallocator request modes: node-evacuate and change-group. Some request
and response requirements are clarified in the documentation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 1 Jun 2011 15:44:56 +0000 (17:44 +0200)]
jqueue: Allow loading of archived jobs
Chained jobs need to look at previous jobs, including archived ones. A
nice side-effect of this change is the ability to look at archived jobs
using “gnt-job info <id>” as long as the ID is known.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 31 May 2011 09:54:23 +0000 (11:54 +0200)]
Adding basic abstraction layer for caching
This includes a simple built-in cache implementation and an
interface to a memcache instance.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 9 Jun 2011 10:03:39 +0000 (10:03 +0000)]
Fix _checkRsaPrivateKey for newer key generation
The header of keys generated under Debian sid reads just "BEGIN PRIVATE KEY"
rather than "BEGIN RSA PRIVATE KEY".
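A header check that tolerates both formats could look like the sketch below. The function name is hypothetical (it is not Ganeti's _checkRsaPrivateKey) and it only inspects the PEM header line, without parsing the key material. "BEGIN RSA PRIVATE KEY" is the PKCS#1 header and "BEGIN PRIVATE KEY" the unencrypted PKCS#8 one.

```python
# Hypothetical helper, not Ganeti's actual _checkRsaPrivateKey: decide whether
# a PEM blob starts with a private-key header we are willing to accept.
ACCEPTED_PRIVATE_KEY_HEADERS = (
    "-----BEGIN RSA PRIVATE KEY-----",  # PKCS#1 ("traditional") RSA key
    "-----BEGIN PRIVATE KEY-----",      # PKCS#8 unencrypted private key
)


def looks_like_private_key(pem_text):
    """Return True if the first non-empty line is an accepted key header."""
    for line in pem_text.splitlines():
        line = line.strip()
        if line:
            return line in ACCEPTED_PRIVATE_KEY_HEADERS
    return False  # empty input has no header at all
```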
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 7 Jun 2011 06:48:55 +0000 (08:48 +0200)]
Fix locking issues in LUClusterVerifyGroup
- Use functions in ConfigWriter instead of custom loops
- Calculate nodes only once instance locks are acquired, which removes one
potential race condition
- Don't retrieve lists of all node/instance information without locks
- Additionally move the end of the node time check window to right after
the first RPC call; the second call isn't involved in checking the
node time at all
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 7 Jun 2011 05:17:09 +0000 (07:17 +0200)]
cmdlib: Acquire BGL for LUClusterVerifyConfig
LUClusterVerifyConfig verifies a number of configuration settings. For
doing so, it needs a consistent list of nodes, groups and instances. So
far no locks were acquired at all (except for the BGL in shared mode).
This is a race condition (e.g. if a node group is added in parallel) and
can be fixed by acquiring the BGL in exclusive mode. Since this LU
verifies the cluster-wide configuration, doing so instead of acquiring
individual locks is justified.
Includes one typo fix and one docstring update.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 7 Jun 2011 10:46:05 +0000 (12:46 +0200)]
Export/import instance tags
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 7 Jun 2011 10:45:53 +0000 (12:45 +0200)]
Fix issue with tags on instance creation
Commit 720f56c85a added the ability to specify tags when creating an
instance. The “tags” attribute of an instance object needs to be a set,
but the patch's code saved it as a list, causing breakage in other parts
of Ganeti. This patch changes the code to use TaggableObject.AddTag,
which has a nice side-effect of doing some verification (including max.
number of tags). Instance import was also broken (no “tags” attribute in
options).
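Why a set plus a validating add method is preferable to assigning a plain list can be sketched as follows. TaggableSketch, add_tag, create_with_tags and the MAX_TAGS value are all hypothetical simplifications for illustration, not Ganeti's TaggableObject or its real limits.

```python
class TaggableSketch(object):
    """Hypothetical, simplified stand-in for an object that carries tags."""

    MAX_TAGS = 4096  # assumed limit, for illustration only

    def __init__(self):
        self.tags = set()  # a set: no duplicates, cheap membership tests

    def add_tag(self, tag):
        """Validate a tag, then add it (the idea behind an AddTag-style method)."""
        if not isinstance(tag, str) or not tag:
            raise ValueError("invalid tag: %r" % (tag,))
        if tag not in self.tags and len(self.tags) >= self.MAX_TAGS:
            raise ValueError("too many tags")
        self.tags.add(tag)


def create_with_tags(tag_list):
    """Build an object from user-supplied tags via add_tag, not by assignment."""
    obj = TaggableSketch()
    for tag in tag_list:
        obj.add_tag(tag)
    return obj
```

Going through add_tag keeps the attribute a set (so code elsewhere can rely on set operations) and runs the validation on every user-supplied tag.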
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 3 Jun 2011 09:53:04 +0000 (11:53 +0200)]
Fix incomplete merge
Commit 66bd7445 changed the semantics of _JobProcessor on finished
jobs, and updated the related unittests in the 2.4 branch. It was then
merged to master; however, on master there was an additional test for
this case, which was not updated.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Apollon Oikonomopoulos [Tue, 31 May 2011 12:50:28 +0000 (15:50 +0300)]
Export instance tags to instance hooks
Instance hooks now get an INSTANCE_TAGS environment variable, which contains a
space-delimited list of the affected instance's tags.
Also update the documentation to reflect the change.
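A hook script could recover the tag list with a plain whitespace split, as in the sketch below. The function name is hypothetical, and the GANETI_ prefix on the variable name is an assumption based on how Ganeti usually exposes hook variables; adjust var_name if your hooks see it differently.

```python
import os


def instance_tags_from_env(env=None, var_name="GANETI_INSTANCE_TAGS"):
    """Split the space-delimited tag list passed to an instance hook.

    The default variable name (with its GANETI_ prefix) is an assumption.
    """
    if env is None:
        env = os.environ
    # str.split() with no argument also drops empty fields, so an unset or
    # empty variable yields an empty list.
    return env.get(var_name, "").split()
```

This relies on tags never containing spaces, which the space-delimited format already presumes.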
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Apollon Oikonomopoulos [Tue, 31 May 2011 12:49:58 +0000 (15:49 +0300)]
Add tagging option to gnt-instance create
Add TAG_ADD_OPT option to cli.py and use it in gnt-instance. Modify
cli.GenericInstanceCreate() accordingly.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Apollon Oikonomopoulos [Tue, 31 May 2011 12:49:28 +0000 (15:49 +0300)]
Add tag handling to {Op,LU}InstanceCreate
Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this is
intended for OpTagsSet and thus:
a) is not documented
b) does not carry a default value, making it mandatory
Also pass the tags to the iallocator during instance creation.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 26 May 2011 14:40:50 +0000 (16:40 +0200)]
http.client: Make debug log less noisy
The HTTP client code generates quite a lot of debug log messages. With
this patch they're hidden unless explicitly enabled in the code.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 1 Jun 2011 16:10:06 +0000 (18:10 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
jqueue: Fix potential race condition when cancelling queued jobs
Fix argument order in ReserveLV and ReserveMAC
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 1 Jun 2011 14:51:39 +0000 (16:51 +0200)]
htools: introduce a type alias for JSON objects
This makes the type definitions a bit more readable/simpler.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 30 May 2011 12:13:13 +0000 (14:13 +0200)]
hail: stop using old-style 'nodes' key
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 30 May 2011 11:50:49 +0000 (13:50 +0200)]
hail: add parsing of multi-relocate request
The request is not handled yet; this patch just adds parsing of the
incoming request.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 30 May 2011 08:43:50 +0000 (10:43 +0200)]
hail: add option for displaying the parsed request
This can be used for debugging.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Thu, 26 May 2011 13:24:36 +0000 (15:24 +0200)]
hail: add new data types for the multi-reloc mode
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 31 May 2011 11:14:26 +0000 (13:14 +0200)]
Add --no-instance-moves to the htools live tests
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 31 May 2011 10:38:50 +0000 (12:38 +0200)]
Update hbal manpage for --no-instance-moves
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 31 May 2011 14:57:51 +0000 (16:57 +0200)]
Implement balancing with no instance moves
Note that --no-disk-moves and --no-instance-moves are not incompatible,
but if both are used no solution can possibly exist.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 31 May 2011 10:34:19 +0000 (10:34 +0000)]
Pass the instance moves option in hbal
The option is still ignored, but it is now passed down to the
iteration function.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 30 May 2011 14:10:13 +0000 (16:10 +0200)]
Add --no-instance-moves cli htools option
This option doesn't currently do anything.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 31 May 2011 14:49:45 +0000 (16:49 +0200)]
jqueue: Fix potential race condition when cancelling queued jobs
When a job was cancelled, its status would be changed and the file
written again. Since this was a final status, the job file could be
moved anytime for archival. If the job was still in the queue, however,
it would be processed (not fully, just updating the “end_timestamp”
attribute) and written again. This was bad as it could leave the same
job in two different files.
With this patch the processor is changed to return early for finished
jobs. Cancelling a queued job will finalize it right away. Unittests are
updated.
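The early-return logic described above can be sketched like this. The job class, status strings and function names are hypothetical stand-ins, not the real jqueue code; only the control flow (cancel finalizes immediately, the processor skips finalized jobs and never rewrites their file) is what the sketch is meant to show.

```python
# Hypothetical status names and job object; only the control flow matters.
FINALIZED_STATUSES = frozenset(["success", "error", "canceled"])


class JobSketch(object):
    def __init__(self):
        self.status = "queued"
        self.end_timestamp = None
        self.writes = 0  # how many times the job file was (re)written

    def write(self):
        self.writes += 1


def cancel(job, now):
    """Cancelling a queued job finalizes it right away."""
    if job.status == "queued":
        job.status = "canceled"
        job.end_timestamp = now
        job.write()


def process(job, now):
    """Return early for finished jobs instead of touching them again."""
    if job.status in FINALIZED_STATUSES:
        return False  # file may already be archived; do not write it again
    job.status = "success"  # stand-in for actually running the opcodes
    job.end_timestamp = now
    job.write()
    return True
```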
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 30 May 2011 10:52:22 +0000 (12:52 +0200)]
iallocator: add ht-checking for the request
Currently, we only ht-check the result value from the iallocator, and
we send whatever we happen to check manually in the LUs that call the
iallocator.
This is not good, as we have to duplicate checks in many places, and
we might still miss checks. So we add ht information to the
per-request variables. As the cluster data is built in one place (the
iallocator code itself) and is more consistent, I didn't add checks
for it too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 30 May 2011 11:14:18 +0000 (13:14 +0200)]
iallocator: rename mem_size to memory
Currently, the iallocator in 'allocate' requires mem_size on input
but serialises that as 'memory'. This inconsistency makes it hard to
automatically validate the parameters, hence this patch renames
mem_size to memory.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 30 May 2011 09:56:45 +0000 (11:56 +0200)]
iallocator: change default for target_groups
Per the design doc, the target_groups request key "if present, it must
either be the empty list, or contain a list of group UUIDs". Currently
it defaults to None/null, which is not valid.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 30 May 2011 11:24:39 +0000 (13:24 +0200)]
iallocator: export the hypervisor value
In 'allocate' mode, the documentation specifies that we export the
hypervisor value (“Allocation needs, in addition: … hypervisor, the
hypervisor of this instance”) and we need that on input, however we
don't actually export it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 30 May 2011 10:54:44 +0000 (12:54 +0200)]
iallocator: fix incomplete refactoring
Commit fdbe29ee changed the iallocator modes from 'r'/'w' to
'ro'/'rw', but forgot one check in LUTestAllocator. This patch just
completes the replacements.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 23 May 2011 16:55:50 +0000 (18:55 +0200)]
gnt-node migrate: Use LU-generated jobs
Until now LUNodeMigrate used multiple tasklets to evacuate all primary
instances on a node. In some cases it would acquire all node locks,
which isn't good on big clusters. With upcoming improvements to the LUs
for instance failover and migration, switching to separate jobs looks
like a better option. This patch changes LUNodeMigrate to use
LU-generated jobs.
While working on this patch, I identified a race condition in
LUNodeMigrate.ExpandNames. A node's instances were retrieved without a
lock and no verification was done.
For RAPI, a new feature string is added and can be used to detect
clusters which support more parameters for node migration. The client
is updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Apollon Oikonomopoulos [Mon, 30 May 2011 11:02:25 +0000 (14:02 +0300)]
Fix argument order in ReserveLV and ReserveMAC
ConfigWriter.ReserveLV() and ConfigWriter.ReserveMAC() called
TemporaryReservationManager.Reserve() with the ec_id and resource arguments
swapped. As a result, two reservation attempts for the same resource type
within the same LU would fail, even if the resources requested were different,
e.g.:
$ gnt-instance add -t sharedfile -o debootstrap+default \
--net 0:mac=00:01:02:03:04:00 \
--net 1:mac=00:01:02:03:04:ff \
--disk 0:size=2g test_instance
Failure: prerequisites not met for this operation:
error type: resource_not_unique, error details:
MAC address 00:01:02:03:04:ff already in use in cluster
This patch fixes the argument order in the call to Reserve().
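Why swapping the two arguments produces a false conflict can be shown with a simplified, hypothetical reservation manager (not the real TemporaryReservationManager). With the arguments swapped, the execution context ID is treated as the resource, so the second reservation made by the same LU collides on that ID even though the MAC addresses differ.

```python
class ReservationManagerSketch(object):
    """Simplified, hypothetical stand-in for a temporary reservation manager."""

    def __init__(self):
        self._by_ec_id = {}  # execution context id -> set of reserved resources

    def reserve(self, ec_id, resource):
        """Reserve resource for ec_id; fail if anyone already holds it."""
        reserved = set().union(*self._by_ec_id.values())
        if resource in reserved:
            raise ValueError("duplicate reservation: %r" % (resource,))
        self._by_ec_id.setdefault(ec_id, set()).add(resource)
```

Usage: `reserve("job-1", "00:01:02:03:04:00")` followed by `reserve("job-1", "00:01:02:03:04:ff")` succeeds, while the swapped calls `reserve("00:01:02:03:04:00", "job-1")` and `reserve("00:01:02:03:04:ff", "job-1")` fail on the second call.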
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 30 May 2011 11:02:14 +0000 (13:02 +0200)]
ht: Accept both int and long as integers
This fixes a unittest failure on 32-bit systems. A recently added
unittest for ht.TJobId uses a rather large number (2347625220). On
64-bit systems it is stored as “int”; on 32-bit systems, however, Python
uses “long”. The two types can be intermixed in Python, as the
interpreter takes care of conversions. Without this change, once too
many jobs (2**31) had been processed on a 32-bit system, ht would no
longer accept the job IDs.
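A type check that accepts both representations can be sketched as follows; the names are hypothetical, not ht's real ones. The try/except keeps the sketch runnable on Python 3, where "long" no longer exists and "int" is already arbitrary precision.

```python
try:
    INTEGER_TYPES = (int, long)  # Python 2: machine ints plus arbitrary-precision longs
except NameError:
    INTEGER_TYPES = (int,)  # Python 3: int is already arbitrary precision


def t_int(val):
    """Accept any integer value, whatever its internal representation."""
    return isinstance(val, INTEGER_TYPES)
```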
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Tsachy Shacham [Wed, 18 May 2011 17:00:00 +0000 (19:00 +0200)]
Design doc for CPU pinning
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 27 May 2011 10:49:39 +0000 (12:49 +0200)]
ht: Add checks for anything, regexp, job ID, container items
The check for container items is useful for tuples and/or lists with
non-uniform values. The “anything” check can be used when any value
should be accepted for an item.
The job ID check, which uses the regexp check, will be used for
expressing opcode dependencies on other jobs.
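The combinator style of these checks can be sketched as below. All names are hypothetical, and the job ID rule (a non-negative integer or a string of digits) is an assumption for illustration; this t_items also requires the container length to match the number of checks, which the real check may not do.

```python
import re


def t_any(_val):
    """Accept any value."""
    return True


def t_regex(pattern):
    """Accept strings that match pattern in full."""
    rx = re.compile(pattern)
    return lambda val: isinstance(val, str) and rx.fullmatch(val) is not None


def t_or(*checks):
    """Accept a value if at least one of the checks accepts it."""
    return lambda val: any(check(val) for check in checks)


def t_non_negative_int(val):
    return isinstance(val, int) and not isinstance(val, bool) and val >= 0


# A job ID given either as a number or as a string of digits (assumption).
t_job_id = t_or(t_non_negative_int, t_regex(r"\d+"))


def t_items(item_checks):
    """Check a list/tuple position by position (equal length required here)."""
    def _check(val):
        return (isinstance(val, (list, tuple))
                and len(val) == len(item_checks)
                and all(check(item) for check, item in zip(item_checks, val)))
    return _check
```

For example, `t_items([t_job_id, t_any])` would accept a hypothetical dependency entry such as `[5, None]` or `("17", {"x": 1})`.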
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 26 May 2011 12:57:34 +0000 (14:57 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
TLReplaceDisks: Move assertion checking locks
Conflicts:
lib/cmdlib.py: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 26 May 2011 12:36:50 +0000 (14:36 +0200)]
TLReplaceDisks: Move assertion checking locks
Commit 1bee66f3 added assertions for ensuring only the necessary locks
are kept while replacing disks. One of them makes sure locks have been
released during the operation. Unfortunately the commit added the check
as part of a “finally” branch, which is also run when an exception is
thrown (in which case the locks may not have been released yet). Errors
could be masked by the assertion error. Moving the check out of the
“finally” branch fixes the issue.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 20 May 2011 13:17:46 +0000 (15:17 +0200)]
cli.JobExecutor: Handle empty name, allow adding job IDs
With LU-generated jobs only the ID is known.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 25 May 2011 15:32:56 +0000 (17:32 +0200)]
cli.JobExecutor: Use counter for indexing jobs
If “SubmitPending” was mixed with calls to “QueueJob”, jobs in the
internal structures would get duplicate indices. With this change each
queued job is assigned a unique index, which will be used for sorting
the results.
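The counter-based indexing can be sketched with itertools.count. The class and method names are hypothetical, not cli.JobExecutor's real API; the point is that every job, whether queued or added by ID, draws its index from one shared counter, so indices stay unique and sorting by them restores the order in which jobs were added.

```python
import itertools


class JobExecutorSketch(object):
    """Hypothetical, stripped-down job executor illustrating unique indices."""

    def __init__(self):
        self._counter = itertools.count()  # yields 0, 1, 2, ...
        self.queued = []     # (index, name, ops) not yet submitted
        self.submitted = []  # (index, name, job_id)

    def queue_job(self, name, *ops):
        self.queued.append((next(self._counter), name, ops))

    def add_job_id(self, name, job_id):
        """Track a job submitted elsewhere (only its ID is known)."""
        self.submitted.append((next(self._counter), name, job_id))

    def submit_pending(self):
        for index, name, _ops in self.queued:
            # A real executor would submit _ops here; we fake a job ID.
            self.submitted.append((index, name, "job-%d" % index))
        self.queued = []

    def names_in_order(self):
        ordered = sorted(self.submitted, key=lambda job: job[0])
        return [name for _index, name, _job_id in ordered]
```

Had the index been derived from the length of the pending queue instead, a job queued after a SubmitPending call would reuse index 0.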
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 25 May 2011 14:45:24 +0000 (16:45 +0200)]
Fix bug in LUNodeMigrate
Commit aac4511a added CheckArguments to LUNodeMigrate with a call to
_CheckIAllocatorOrNode. When no default iallocator is defined,
evacuating a node would always fail:
$ gnt-node migrate node123
Migrate instance(s) '...'?
y/[n]/?: y
Failure: prerequisites not met for this operation:
No iallocator or node given and no cluster-wide default iallocator
found; please specify either an iallocator or a node, or set a
cluster-wide default iallocator
This patch adds a new parameter to specify a target node. This doesn't
solve all issues, but will make the most important cases work again in
the meantime. This opcode will receive more work for node group support.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 20 May 2011 13:35:28 +0000 (15:35 +0200)]
config: Add method to get members of nodes' groups
This will be used for locking during node evacuation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 25 May 2011 08:54:34 +0000 (10:54 +0200)]
Yet another attempt to fix builds
It seems that abs_top_srcdir is not a good option, so I tested again
using the same approach as in doc/examples/bash_completion.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Tue, 24 May 2011 16:46:39 +0000 (18:46 +0200)]
Fix build breakage
Sorry, I already had PYTHONPATH exported in my env, and as I said I
wasn't able to test this on buildbot.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 24 May 2011 16:50:13 +0000 (18:50 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
node evac: don't call IAllocator if no instances
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 24 May 2011 09:29:39 +0000 (11:29 +0200)]
node evac: don't call IAllocator if no instances
Currently we generate an empty list only for the '-n node' invocation,
but for iallocator we still call the iallocator (which needs an RPC
call, etc.). By moving the computation of instances outside of the if
block, we can return early from the LU.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 24 May 2011 16:35:56 +0000 (18:35 +0200)]
Merge branch 'devel-2.4'
* devel-2.4:
RPC/Backend: Make UploadFile uid and gid agnostic
Resolve uid/gid upon mainloop run
GetEntResolver: Make it possible to resolve uid/gid to name
utils.algo: Add InvertDict to invert a dict
autotools: Add noded group
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 24 May 2011 09:34:32 +0000 (11:34 +0200)]
gnt-debug: rename allocator to iallocator
I'm always confused by this strange difference, so let's rename the
command to match what it tests.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 16:44:51 +0000 (18:44 +0200)]
Misc other conversions
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 16:44:37 +0000 (18:44 +0200)]
Convert job status strings to constants
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 16:29:06 +0000 (18:29 +0200)]
Convert group policies to constants
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 16:21:13 +0000 (18:21 +0200)]
Replace hardcoded instance states with constants
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 16:18:22 +0000 (18:18 +0200)]
IAllocator.hs: replace a few strings with constants
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 19 May 2011 15:45:01 +0000 (17:45 +0200)]
Implement conversion of Python constants to Haskell
With the merge of the repositories, we can now auto-generate the code
for Haskell constants from the Python code.
Currently this only handles the basic types (strings and
integers). Handling containers such as lists and dictionaries would only
be possible if we used a parser that recognises the element
names. We could extend the convert-constants script if that becomes
necessary; right now I'm looking at just the simple constants such as
iallocator modes, instance states, etc.
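The basic idea (emit a Haskell definition for each string or integer constant, skip everything else) can be sketched as follows. This is not the actual convert-constants script: the FOO_BAR to fooBar naming scheme, the use of Int, and the simple string escaping are assumptions made for illustration.

```python
import re

_CONSTANT_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def haskell_name(name):
    """FOO_BAR_BAZ -> fooBarBaz (naming scheme assumed for this sketch)."""
    parts = name.lower().split("_")
    return parts[0] + "".join(part.capitalize() for part in parts[1:])


def haskell_string(value):
    """Quote a simple string as a Haskell string literal."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def convert_constants(namespace):
    """Emit Haskell definitions for the str and int entries of namespace."""
    lines = []
    for name in sorted(namespace):
        if not _CONSTANT_NAME.fullmatch(name):
            continue  # skip private or lowercase names
        value, hs_name = namespace[name], haskell_name(name)
        if isinstance(value, bool):
            continue  # bool is a subclass of int; not handled here
        if isinstance(value, str):
            lines += ["%s :: String" % hs_name,
                      "%s = %s" % (hs_name, haskell_string(value))]
        elif isinstance(value, int):
            lines += ["%s :: Int" % hs_name, "%s = %d" % (hs_name, value)]
        # lists, dicts and other containers are skipped
    return "\n".join(lines)
```

Usage: `convert_constants(vars(some_constants_module))` would return the generated Haskell source as one string.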
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 19 May 2011 08:37:58 +0000 (10:37 +0200)]
RPC/Backend: Make UploadFile uid and gid agnostic
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Fri, 20 May 2011 12:24:13 +0000 (14:24 +0200)]
Resolve uid/gid upon mainloop run
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Wed, 18 May 2011 12:19:52 +0000 (14:19 +0200)]
GetEntResolver: Make it possible to resolve uid/gid to name
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Fri, 20 May 2011 12:17:29 +0000 (14:17 +0200)]
utils.algo: Add InvertDict to invert a dict
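A dictionary inversion can be sketched in one expression; this is a hypothetical stand-in, not necessarily how utils.algo.InvertDict is written. Given the neighbouring uid/gid commits, one plausible use is turning a name-to-ID map into an ID-to-name map.

```python
def invert_dict(mapping):
    """Return a new dict mapping each value of mapping back to its key.

    Values must be hashable; if two keys share a value, the key seen last wins.
    """
    return dict((value, key) for key, value in mapping.items())
```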
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 19 May 2011 11:41:52 +0000 (13:41 +0200)]
autotools: Add noded group
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 16:12:27 +0000 (18:12 +0200)]
Improve hooks documentation unittest
Also check for the opcode ID.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Fri, 20 May 2011 14:30:21 +0000 (15:30 +0100)]
Update hooks.rst for cluster verify changes
Also update NEWS on this change.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 20 May 2011 14:20:25 +0000 (15:20 +0100)]
Fix a couple of style mistakes
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:18 +0000 (15:22 +0100)]
Cluster verify: accept a --node-group option
This will trigger a ClusterVerifyGroup operation only on the specified
group, skipping other groups as well as cluster-wide verifications.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:17 +0000 (15:22 +0100)]
Cluster verify: check for nodes/instances with no group
Previously, all nodes and instances would *always* be visited/verified. By
driving the verification by node group now, we will miss nodes and
instances that can't be reached from existing node groups, should that rare
and bogus circumstance ever occur.
We safeguard against that by checking for unreachable nodes and instances
explicitly. (These will not be further verified.)
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:16 +0000 (15:22 +0100)]
Cluster verify: fix LV checks for split instances
When sharding by group, if a mirrored instance is split (primary and
secondary) between two groups, its volumes will not be properly checked:
the group of the primary will warn about a missing volume in the secondary,
and the group of the secondary about an unknown volume (in the secondary as
well).
To solve the "missing volumes" bit, we will detect this case and perform an
extra RPC verify call to these split secondaries (querying only for
NV_LVLIST), and introduce the results in the node images appropriately. We
do this detection early in ExpandNames/CheckPrereq, so as to properly lock the
extra nodes.
As for the "unknown volumes" warning in the secondary, we update the volume
mapping with split instances before checking for orphaned volumes.
Finally, we mark nodes as "ghost" only if they really don't exist in the
cluster configuration, which avoids spurious "instance lives in ghost node"
warnings.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:15 +0000 (15:22 +0100)]
Cluster verify: make NV_NODELIST smaller
To cope with increasing cluster sizes, we now make nodes try to contact all
other nodes in their group, and one node from every other group.
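The resulting per-node contact list can be sketched as below, so that each node's list grows with its group size plus the number of groups rather than with the whole cluster. The data layout is assumed, and choosing the alphabetically first node of each other group is an arbitrary illustration; the real selection rule may differ.

```python
def nodes_to_contact(node, groups):
    """Return the nodes `node` should try to reach.

    groups maps a group name to the list of its node names (layout assumed).
    The node contacts every other member of its own group plus one node per
    other group; picking the alphabetically first node is an arbitrary choice.
    """
    own_group = next(name for name, members in groups.items() if node in members)
    targets = [n for n in sorted(groups[own_group]) if n != node]
    for name in sorted(groups):
        if name != own_group and groups[name]:
            targets.append(sorted(groups[name])[0])
    return targets
```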
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:14 +0000 (15:22 +0100)]
Cluster verify: verify hypervisor parameters only once
The list of all hypervisor parameters has to be computed in
LUClusterVerifyGroup, since it needs to be passed to nodes as
NV_HVPARAMS. However, it is better to verify said parameters only once,
in LUClusterVerifyConfig.
For this, we refactor the code that constructs the list of parameters to a
module-level _GetAllHypervisorParameters() function that both LUs can use.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:13 +0000 (15:22 +0100)]
Split LUClusterVerify into LUClusterVerify{Config,Group}
With this change, LUClusterVerifyConfig becomes a "light" LU that only
verifies the global config and other, master-only settings, and the bulk of
node/instance verification is done by LUClusterVerifyGroup, which only acts
on nodes and instances of a given group.
To ensure that `gnt-cluster verify` continues to operate on the whole
cluster, the client creates an OpClusterVerifyGroup job per node group; for
convenience, the list of node groups is returned by LUClusterVerifyConfig.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:12 +0000 (15:22 +0100)]
Cluster verify: factor out error codes and functions
We move all error code definitions, plus the _Error and _ErrorIf helpers,
to a private _VerifyErrors mix-in class that can later be shared by the
two new cluster verify LUs.
(_Error and _ErrorIf code was moved around verbatim, except to disable
"_VerifyError class does not have 'op' or '_feedback_fn' members" errors
from pylint.)
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:11 +0000 (15:22 +0100)]
Cluster verify: make "instance runs in wrong node" node-driven
Previously, the "instance should not be running in this node" error was
computed by verifying, for each instance, whether any node other than its
primary was running it. But this is not a well-suited approach if we were
to shard cluster verification (because, for each instance, we won't have
information whether it's running *outside* the current set of nodes).
By reversing the logic of the check, and asking instead, for each node,
"is it running any instance for which it's not primary", we catch all
occurrences of the problem even if running sharded.
Because of this, we can also detect orphan instances at the same time
(instances that are not known in the cluster config). We warn about them
here too, and drop the later _VerifyOrphanInstances check.
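The reversed, node-driven check can be sketched as follows. Function and argument names are hypothetical; the key property is that it needs runtime data only from the nodes being verified, while the instance-to-primary map comes from the cluster-wide configuration, so it also works when verification is sharded by group.

```python
def check_running_instances(running_by_node, primary_of):
    """Return (wrong_node, orphans) as lists of (node, instance) pairs.

    running_by_node: node -> instances that node reports as running; only
    the nodes being verified need to be present.
    primary_of: instance -> primary node, taken from the cluster config.
    """
    wrong_node, orphans = [], []
    for node in sorted(running_by_node):
        for inst in sorted(running_by_node[node]):
            if inst not in primary_of:
                orphans.append((node, inst))  # unknown to the config
            elif primary_of[inst] != node:
                wrong_node.append((node, inst))  # running off its primary
    return wrong_node, orphans
```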
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Mon, 16 May 2011 15:22:50 +0000 (16:22 +0100)]
Verify an absent vm_capable node for files
If we're not verifying all nodes, adding a node outside the current
group for file checksums helps us make sure checksums are the same across
the whole cluster.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:10 +0000 (15:22 +0100)]
Cluster verify: master must be present for _VerifyFiles
This commit prepares the call to _VerifyFiles for the case when the master
node is not one of the nodes that's being verified (which will be the case
for all node groups but one). We fix it by always passing master info and
checksums to _VerifyFiles, which ensures there's a cluster-wide consistency
check.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:09 +0000 (15:22 +0100)]
Cluster verify: don't assume we're verifying all nodes/instances
This commit fixes a few initial simple cases in which it was assumed that
we're always working over the whole cluster. With this change, we
differentiate between "nodes/instances to verify" and "checks that need
cluster-wide information".
In particular:
- always retrieve hypervisor parameters from all instances
- always specify full node list in NV_NODELIST
- retrieve OOB path from all nodes
- verify DRBD devices against the full set of instances (this ensures
minors get properly verified even if an instance is split between groups)
- look up node groups against the set of all nodes (to avoid tracebacks
in case instances are split between groups)
- determine whether running instances are unknown by checking against the
full list of instances
Behavior is unchanged in all cases when running over the whole
cluster.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 3 May 2011 14:22:08 +0000 (15:22 +0100)]
Cluster verify: gather node/instance list in CheckPrereq
This commit introduces no behavior changes, and is only a minor refactoring
that aids with a cleaner division of future LUClusterVerify work. The
change consists of:
- substitute the {node,instance}{list,info} structures previously created
in Exec() by member variables created in CheckPrereq; and
- mechanically convert all references to the old variables to the new
member variables.
Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both
with the same contents at the moment, is not capricious. We've now made
Exec use the my_* variables pervasively; in future commits, we'll break the
assumption that all nodes and instances are listed there, and it'll become
clear when the all_* variables have to be substituted instead.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 20 May 2011 07:33:29 +0000 (09:33 +0200)]
Merge remote branch 'origin/devel-2.4'
* origin/devel-2.4:
Fix errors in hooks documentation
Clarify a bit the noded man page
Note --no-remember in NEWS
Switch QA over to using instance stop --no-remember
Implement no_remember at RAPI level
Implement no_remember at CLI level
Introduce instance start/stop no_remember attribute
Bump version for the 2.4.2 release
Fix a bug in LUInstanceMove
Abstract ignore_consistency opcode parameter
Preload the string-escape code in noded
Fix error in iallocator documentation reg. disk mode
Try to prevent instance memory changes N+1 failures
Update NEWS file for the 2.4.2 release
Conflicts:
NEWS (trivial)
doc/iallocator.rst (kept our version)
lib/cli.py (trivial)
lib/opcodes.py (removed duplicated work, both branches
introduced the same new variable
PIgnoreConsistency :)
lib/rapi/client.py (trivial)
lib/rapi/rlib2.py (almost trivial)
qa/ganeti-qa.py (below trivial)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 19 May 2011 14:31:19 +0000 (16:31 +0200)]
cli: Replace hardcoded disk templates with constants
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 16:49:46 +0000 (18:49 +0200)]
mcpu: Add missing docstring to _ProcessResult
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 15:56:18 +0000 (17:56 +0200)]
config: Add function to get instances in node group
This will be used for evacuating instances in a node group.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 19 May 2011 10:46:37 +0000 (12:46 +0200)]
iallocator: Stricter check for multi-evac result
Check new secondary nodes' group like it's already done for
multi-relocation requests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 14:10:15 +0000 (16:10 +0200)]
cmdlib: Use ganeti.ht for checking iallocator result
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 16:13:16 +0000 (18:13 +0200)]
Fix errors in hooks documentation
In many cases the opcode ID was incorrect. A unittest for this will
be added in the master branch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 14:09:11 +0000 (16:09 +0200)]
ht: Add strict check for dictionaries
This allows checking specific dictionary items, unlike TDict
or TDictOf.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 16 May 2011 15:44:55 +0000 (17:44 +0200)]
cmdlib: Remove punctuation from error messages
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Stephen Shirley [Tue, 17 May 2011 09:02:12 +0000 (11:02 +0200)]
Various grammar fixes and updates
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 12 May 2011 16:16:55 +0000 (18:16 +0200)]
gnt-debug: New iallocator mode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 12 May 2011 16:16:35 +0000 (18:16 +0200)]
Add new iallocator mode to LUTestAllocator
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 11 May 2011 15:56:54 +0000 (17:56 +0200)]
cmdlib.IAllocator: Add multi-relocate support
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 11 May 2011 15:29:13 +0000 (17:29 +0200)]
Add constants for multi-relocation iallocator mode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 16 May 2011 11:26:44 +0000 (13:26 +0200)]
Clarify a bit the noded man page
"This can be overriden" can be read as either the port we listen on or
the address we bind to. Replace with "The port" for great clarity!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:54:34 +0000 (17:54 +0200)]
Note --no-remember in NEWS
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:38:25 +0000 (17:38 +0200)]
Switch QA over to using instance stop --no-remember
This replaces hardcoded Xen commands and makes the check work for all
hypervisors, instead of duplicating hypervisor functionality in QA
itself.
The timeout has been removed, as gnt-instance stop itself will make
sure the instance is down before returning. We just double-check that
it is indeed down.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>