ganeti-local
13 years ago--select-instances hbal manpage update
Guido Trotter [Fri, 10 Jun 2011 12:27:12 +0000 (12:27 +0000)]
--select-instances hbal manpage update

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCheck that the selected instances are known
Guido Trotter [Mon, 13 Jun 2011 12:01:08 +0000 (12:01 +0000)]
Check that the selected instances are known

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLoader.updateMovable: evaluate selected instances
Guido Trotter [Mon, 13 Jun 2011 11:44:58 +0000 (11:44 +0000)]
Loader.updateMovable: evaluate selected instances

This also adds docstrings for the function arguments and renames exinst
to exinsts, which is how it is called in other functions, since it's a
list.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd instance selection list to Loader.mergeData
Guido Trotter [Fri, 10 Jun 2011 13:45:09 +0000 (14:45 +0100)]
Add instance selection list to Loader.mergeData

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd --select-instances hbal flag
Guido Trotter [Fri, 10 Jun 2011 13:44:30 +0000 (14:44 +0100)]
Add --select-instances hbal flag

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRemove double whitespace in help string
Guido Trotter [Fri, 10 Jun 2011 13:30:10 +0000 (14:30 +0100)]
Remove double whitespace in help string

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd gnt-network design doc
Apollon Oikonomopoulos [Thu, 16 Jun 2011 12:15:30 +0000 (15:15 +0300)]
Add gnt-network design doc

This design covers high level network block definition and pool
management.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoReplace iallocator's mreloc w/ change-group and node-evac
Michael Hanselmann [Tue, 14 Jun 2011 16:27:06 +0000 (18:27 +0200)]
Replace iallocator's mreloc w/ change-group and node-evac

This patch removes all occurrences of the “multi-relocate” iallocator
mode. Commit 25ee7fd845 updated the design document and introduced
separate modes, “change-group” and “node-evacuate”. The constants aren't
removed yet as they're still used by htools.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix a couple of typos
Stephen Shirley [Mon, 6 Jun 2011 08:59:46 +0000 (10:59 +0200)]
Fix a couple of typos

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMakefile: Add version check for iallocator.rst
Michael Hanselmann [Mon, 6 Jun 2011 15:18:30 +0000 (17:18 +0200)]
Makefile: Add version check for iallocator.rst

iallocator.rst contains the Ganeti version at the top.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate iallocator design for node group-aware operations
Michael Hanselmann [Mon, 6 Jun 2011 15:10:43 +0000 (17:10 +0200)]
Update iallocator design for node group-aware operations

A while ago a new ``multi-relocate`` mode was proposed and documented.
As it turned out, the interface had some deficiencies. With this patch
the relocation modes are reduced to two and split into separate
iallocator request modes: node-evacuate and change-group. Some request
and response requirements are clarified in the documentation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue: Allow loading of archived jobs
Michael Hanselmann [Wed, 1 Jun 2011 15:44:56 +0000 (17:44 +0200)]
jqueue: Allow loading of archived jobs

Chained jobs need to look at previous jobs, including archived ones. A
nice side-effect of this change is the ability to look at archived jobs
using “gnt-job info <id>” as long as the ID is known.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdding basic abstraction layer for caching
René Nussbaumer [Tue, 31 May 2011 09:54:23 +0000 (11:54 +0200)]
Adding basic abstraction layer for caching

This includes a simple cache implementation of our own and an
interface to a memcache instance.
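
A minimal sketch of what such an abstraction layer could look like, assuming a small base class plus an in-process backend. The names `CacheBase`, `SimpleCache`, `store`, `get` and `invalidate` are hypothetical and not taken from the patch; a memcache-backed class would implement the same three methods.

```python
import time


class CacheBase:
    """Hypothetical interface every cache backend would implement."""

    def store(self, key, value, ttl=0):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError

    def invalidate(self, key):
        raise NotImplementedError


class SimpleCache(CacheBase):
    """In-process backend: a dict mapping key -> (expiry, value)."""

    def __init__(self, clock=time.time):
        # The clock is injectable so expiry can be exercised without sleeping
        self._clock = clock
        self._data = {}

    def store(self, key, value, ttl=0):
        # ttl == 0 means the entry never expires
        expiry = self._clock() + ttl if ttl else None
        self._data[key] = (expiry, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if expiry is not None and expiry <= self._clock():
            # Drop expired entries lazily, on access
            del self._data[key]
            return None
        return value

    def invalidate(self, key):
        self._data.pop(key, None)
```

Callers would depend only on the `CacheBase` methods, so the backend can be swapped without touching them.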

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix _checkRsaPrivateKey for newer key generation
Guido Trotter [Thu, 9 Jun 2011 10:03:39 +0000 (10:03 +0000)]
Fix _checkRsaPrivateKey for newer key generation

Keys generated under Debian sid just read "BEGIN PRIVATE KEY" rather
than "BEGIN RSA PRIVATE KEY".
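
A rough illustration of accepting both header variants. This is not the code of `_checkRsaPrivateKey`; the function name `looks_like_private_key` and the first-line-only check are assumptions made for the example.

```python
import re

# Matches both PEM headers named above: the traditional
# "BEGIN RSA PRIVATE KEY" and the newer "BEGIN PRIVATE KEY".
_PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (RSA )?PRIVATE KEY-----")


def looks_like_private_key(pem_text):
    """Return True if the first non-empty line is a recognised key header."""
    for line in pem_text.splitlines():
        line = line.strip()
        if line:
            return _PRIVATE_KEY_HEADER.fullmatch(line) is not None
    return False
```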

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix locking issues in LUClusterVerifyGroup
Michael Hanselmann [Tue, 7 Jun 2011 06:48:55 +0000 (08:48 +0200)]
Fix locking issues in LUClusterVerifyGroup

- Use functions in ConfigWriter instead of custom loops
- Calculate nodes only once instance locks are acquired, which removes one
  potential race condition
- Don't retrieve lists of all node/instance information without locks
- Additionally move the end of the node time check window after the
  first RPC call--the second call isn't involved in checking the
  node time at all

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agocmdlib: Acquire BGL for LUClusterVerifyConfig
Michael Hanselmann [Tue, 7 Jun 2011 05:17:09 +0000 (07:17 +0200)]
cmdlib: Acquire BGL for LUClusterVerifyConfig

LUClusterVerifyConfig verifies a number of configuration settings. For
doing so, it needs a consistent list of nodes, groups and instances. So
far no locks were acquired at all (except for the BGL in shared mode).
This is a race condition (e.g. if a node group is added in parallel) and
can be fixed by acquiring the BGL in exclusive mode. Since this LU
verifies the cluster-wide configuration, doing so instead of acquiring
individual locks is justified.

Includes one typo fix and one docstring update.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoExport/import instance tags
Michael Hanselmann [Tue, 7 Jun 2011 10:46:05 +0000 (12:46 +0200)]
Export/import instance tags

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix issue with tags on instance creation
Michael Hanselmann [Tue, 7 Jun 2011 10:45:53 +0000 (12:45 +0200)]
Fix issue with tags on instance creation

Commit 720f56c85a added the ability to specify tags when creating an
instance. The “tags” attribute of an instance object needs to be a set,
but the patch's code saved it as a list, causing breakage in other parts
of Ganeti. This patch changes the code to use TaggableObject.AddTag,
which has a nice side-effect of doing some verification (including max.
number of tags). Instance import was also broken (no “tags” attribute in
options).
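
The difference between assigning a list and going through a validating method can be sketched as follows. The class `Taggable`, the limit `MAX_TAGS` and its value are hypothetical stand-ins, not the actual `TaggableObject` implementation.

```python
MAX_TAGS = 3  # illustrative limit only; the real maximum is not stated here


class TagError(Exception):
    """Raised when a tag is invalid or the limit would be exceeded."""


class Taggable:
    """Hypothetical stand-in for an object carrying a set of tags."""

    def __init__(self):
        self.tags = set()  # always a set, never a list

    def add_tag(self, tag):
        if not isinstance(tag, str) or not tag:
            raise TagError("invalid tag: %r" % (tag,))
        if tag not in self.tags and len(self.tags) >= MAX_TAGS:
            raise TagError("too many tags")
        self.tags.add(tag)


def apply_tags(obj, tags):
    """Add tags one by one so every tag passes through the checks."""
    for tag in tags:
        obj.add_tag(tag)
```

Assigning `obj.tags = list_of_tags` directly would skip both the type guarantee and the checks, which is the kind of breakage described above.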

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix incomplete merge
Iustin Pop [Fri, 3 Jun 2011 09:53:04 +0000 (11:53 +0200)]
Fix incomplete merge

Commit 66bd7445 changed the semantics of _JobProcessor on finished
jobs, and updated the related unittests in the 2.4 branch. It was then
merged to master, however on master there was an additional test for
this case, which was not updated.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoExport instance tags to instance hooks
Apollon Oikonomopoulos [Tue, 31 May 2011 12:50:28 +0000 (15:50 +0300)]
Export instance tags to instance hooks

Instance hooks now get an INSTANCE_TAGS environment variable, which contains a
space-delimited list of the affected instance's tags.

Also update the documentation to reflect the change.
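
A hook script could read the variable roughly like this. The sketch assumes the hook environment prefixes variable names with `GANETI_`, so the name is kept as a parameter; the helper `instance_tags` is made up for the example.

```python
import os


def instance_tags(environ=None, var="GANETI_INSTANCE_TAGS"):
    """Return the instance's tags as a list (empty if the variable is unset).

    The default variable name assumes a GANETI_ prefix; pass var= to override.
    """
    if environ is None:
        environ = os.environ
    # The value is space-delimited, so a plain split() is sufficient
    return environ.get(var, "").split()
```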

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd tagging option to gnt-instance create
Apollon Oikonomopoulos [Tue, 31 May 2011 12:49:58 +0000 (15:49 +0300)]
Add tagging option to gnt-instance create

Add TAG_ADD_OPT option to cli.py and use it in gnt-instance. Modify
cli.GenericInstanceCreate() accordingly.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd tag handling to {Op,LU}InstanceCreate
Apollon Oikonomopoulos [Tue, 31 May 2011 12:49:28 +0000 (15:49 +0300)]
Add tag handling to {Op,LU}InstanceCreate

Add a tag slot to opcodes.OpInstanceCreate. We do not reuse _PTags, as this is
intended for OpTagsSet and thus:

  a) is not documented
  b) does not carry a default value, making it mandatory

Also pass the tags to the iallocator during instance creation.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agohttp.client: Make debug log less noisy
Michael Hanselmann [Thu, 26 May 2011 14:40:50 +0000 (16:40 +0200)]
http.client: Make debug log less noisy

The HTTP client code generates quite a lot of debug log messages. With
this patch they're hidden unless explicitly enabled in the code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.4'
Michael Hanselmann [Wed, 1 Jun 2011 16:10:06 +0000 (18:10 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  jqueue: Fix potential race condition when cancelling queued jobs
  Fix argument order in ReserveLV and ReserveMAC

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agohtools: introduce a type alias for JSON objects
Iustin Pop [Wed, 1 Jun 2011 14:51:39 +0000 (16:51 +0200)]
htools: introduce a type alias for JSON objects

This makes the type definitions a bit more readable/simpler.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agohail: stop using old-style 'nodes' key
Iustin Pop [Mon, 30 May 2011 12:13:13 +0000 (14:13 +0200)]
hail: stop using old-style 'nodes' key

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agohail: add parsing of multi-relocate request
Iustin Pop [Mon, 30 May 2011 11:50:49 +0000 (13:50 +0200)]
hail: add parsing of multi-relocate request

This is not handled yet, this patch just adds parsing of the incoming
request.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agohail: add option for displaying the parsed request
Iustin Pop [Mon, 30 May 2011 08:43:50 +0000 (10:43 +0200)]
hail: add option for displaying the parsed request

This can be used for debugging.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agohail: add new data types for the multi-reloc mode
Iustin Pop [Thu, 26 May 2011 13:24:36 +0000 (15:24 +0200)]
hail: add new data types for the multi-reloc mode

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd --no-instance-moves to the htools live tests
Guido Trotter [Tue, 31 May 2011 11:14:26 +0000 (13:14 +0200)]
Add --no-instance-moves to the htools live tests

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate hbal manpage for --no-instance-moves
Guido Trotter [Tue, 31 May 2011 10:38:50 +0000 (12:38 +0200)]
Update hbal manpage for --no-instance-moves

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoImplement balancing with no instance moves
Guido Trotter [Tue, 31 May 2011 14:57:51 +0000 (16:57 +0200)]
Implement balancing with no instance moves

Note that --no-disk-moves and --no-instance-moves are not incompatible,
but if both are used no solution can possibly exist.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoPass the instance moves option in hbal
Guido Trotter [Tue, 31 May 2011 10:34:19 +0000 (10:34 +0000)]
Pass the instance moves option in hbal

While still being ignored, now it gets passed down to the iteration
function.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd --no-instance-moves cli htools option
Guido Trotter [Mon, 30 May 2011 14:10:13 +0000 (16:10 +0200)]
Add --no-instance-moves cli htools option

This option doesn't currently do anything.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue: Fix potential race condition when cancelling queued jobs
Michael Hanselmann [Tue, 31 May 2011 14:49:45 +0000 (16:49 +0200)]
jqueue: Fix potential race condition when cancelling queued jobs

When a job was cancelled, its status would be changed and the file
written again. Since this was a final status, the job file could be
moved anytime for archival. If the job was still in the queue, however,
it would be processed (not fully, just updating the “end_timestamp”
attribute) and written again. This was bad as it could leave the same
job in two different files.

With this patch the processor is changed to return early for finished
jobs. Cancelling a queued job will finalize it right away. Unittests are
updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoiallocator: add ht-checking for the request
Iustin Pop [Mon, 30 May 2011 10:52:22 +0000 (12:52 +0200)]
iallocator: add ht-checking for the request

Currently, we only ht-check the result value from the iallocator, and
we send whatever we happen to check manually in the LUs that call the
iallocator.

This is not good, as we have to duplicate checks in many places, and
still we might miss checks. So we add ht information to the
per-request variables. As the cluster data is built in one place (the
iallocator code itself) and is more consistent, I didn't add checks
for it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoiallocator: rename mem_size to memory
Iustin Pop [Mon, 30 May 2011 11:14:18 +0000 (13:14 +0200)]
iallocator: rename mem_size to memory

Currently, the iallocator in 'allocate' requires mem_size on input
but serialises that as 'memory'. This inconsistency makes it hard to
automatically validate the parameters, hence this patch renames
mem_size.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoiallocator: change default for target_groups
Iustin Pop [Mon, 30 May 2011 09:56:45 +0000 (11:56 +0200)]
iallocator: change default for target_groups

Per the design doc, the target_groups request key "if present, it must
either be the empty list, or contain a list of group UUIDs". Currently
it defaults to None/null, which is not valid.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoiallocator: export the hypervisor value
Iustin Pop [Mon, 30 May 2011 11:24:39 +0000 (13:24 +0200)]
iallocator: export the hypervisor value

In 'allocate' mode, the documentation specifies that we export the
hypervisor value (“Allocation needs, in addition: … hypervisor, the
hypervisor of this instance”) and we need that on input, however we
don't actually export it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoiallocator: fix incomplete refactoring
Iustin Pop [Mon, 30 May 2011 10:54:44 +0000 (12:54 +0200)]
iallocator: fix incomplete refactoring

Commit fdbe29ee changed the iallocator modes from 'r'/'w' to
'ro'/'rw', but forgot one check in LUTestAllocator. This patch just
completes the replacements.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agognt-node migrate: Use LU-generated jobs
Michael Hanselmann [Mon, 23 May 2011 16:55:50 +0000 (18:55 +0200)]
gnt-node migrate: Use LU-generated jobs

Until now LUNodeMigrate used multiple tasklets to evacuate all primary
instances on a node. In some cases it would acquire all node locks,
which isn't good on big clusters. With upcoming improvements to the LUs
for instance failover and migration, switching to separate jobs looks
like a better option. This patch changes LUNodeMigrate to use
LU-generated jobs.

While working on this patch, I identified a race condition in
LUNodeMigrate.ExpandNames. A node's instances were retrieved without a
lock and no verification was done.

For RAPI, a new feature string is added and can be used to detect
clusters which support more parameters for node migration. The client
is updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix argument order in ReserveLV and ReserveMAC
Apollon Oikonomopoulos [Mon, 30 May 2011 11:02:25 +0000 (14:02 +0300)]
Fix argument order in ReserveLV and ReserveMAC

ConfigWriter.ReserveLV() and ConfigWriter.ReserveMAC() called
TemporaryReservationManager.Reserve() with the ec_id and resource arguments
swapped. As a result, two reservation attempts for the same resource type
within the same LU would fail, even if the resources requested were different,
e.g.:

  $ gnt-instance add -t sharedfile -o debootstrap+default \
       --net 0:mac=00:01:02:03:04:00 \
       --net 1:mac=00:01:02:03:04:ff \
       --disk 0:size=2g  test_instance
  Failure: prerequisites not met for this operation:
  error type: resource_not_unique, error details:
  MAC address 00:01:02:03:04:ff already in use in cluster

This patch fixes the argument order in the call to Reserve().
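
Why swapped arguments produce exactly this failure can be shown with a simplified model of a reservation manager. This is an assumption-laden sketch, not the real `TemporaryReservationManager`: it assumes reservations are stored per execution context ID and that a resource counts as taken if any context holds it.

```python
class ReservationError(Exception):
    """Raised when a resource is already reserved."""


class ReservationManager:
    """Simplified model: execution-context ID -> set of reserved resources."""

    def __init__(self):
        self._by_ec = {}

    def reserved(self, resource):
        return any(resource in held for held in self._by_ec.values())

    def reserve(self, ec_id, resource):
        if self.reserved(resource):
            raise ReservationError("%s already in use" % (resource,))
        self._by_ec.setdefault(ec_id, set()).add(resource)
```

With the arguments swapped, the execution context ID is stored as if it were the resource. The second reservation from the same context then finds that ID already "reserved" and fails, even though the two MAC addresses differ.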

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoht: Accept both int and long as integers
Michael Hanselmann [Mon, 30 May 2011 11:02:14 +0000 (13:02 +0200)]
ht: Accept both int and long as integers

This fixes a unittest failure on 32 bit systems. A recently added
unittest for ht.TJobId uses a rather large number (2347625220). On 64
bit systems it is stored as “int”. On 32 bit systems however, Python
uses “long”. The two types can be intermixed in Python as the
interpreter will take care of conversions. If one processed too many
jobs (2**31) on a 32 bit system, ht would no longer accept the job IDs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoDesign doc for CPU pinning
Tsachy Shacham [Wed, 18 May 2011 17:00:00 +0000 (19:00 +0200)]
Design doc for CPU pinning

Signed-off-by: Tsachy Shacham <tsachy@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoht: Add checks for anything, regexp, job ID, container items
Michael Hanselmann [Fri, 27 May 2011 10:49:39 +0000 (12:49 +0200)]
ht: Add checks for anything, regexp, job ID, container items

The check for container items is useful for tuples and/or lists with
non-uniform values. The “anything” check can be used when any value
should be accepted for an item.

The job ID check, which uses the regexp check, will be used for
expressing opcode dependencies on other jobs.
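
The idea behind these checks can be sketched with plain predicate functions. The names `t_any`, `t_regex`, `t_items` and `t_job_id` are hypothetical, and the digits-only job ID pattern is an assumption; the real `ht` module may define them differently.

```python
import re


def t_any(_value):
    """Accept any value."""
    return True


def t_regex(pattern):
    """Build a check accepting strings that fully match the pattern."""
    compiled = re.compile(pattern)

    def check(value):
        return isinstance(value, str) and compiled.fullmatch(value) is not None

    return check


def t_items(checks):
    """Build a check for lists/tuples whose leading items each satisfy
    the check at the same position (useful for non-uniform containers)."""

    def check(value):
        return (isinstance(value, (list, tuple))
                and len(value) >= len(checks)
                and all(fn(item) for fn, item in zip(checks, value)))

    return check


# Assumed pattern: a job ID given as a string of decimal digits
t_job_id = t_regex(r"[0-9]+")
```

A dependency entry such as `["42", <anything>]` could then be validated with `t_items([t_job_id, t_any])`.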

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.4'
Michael Hanselmann [Thu, 26 May 2011 12:57:34 +0000 (14:57 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  TLReplaceDisks: Move assertion checking locks

Conflicts:
lib/cmdlib.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoTLReplaceDisks: Move assertion checking locks
Michael Hanselmann [Thu, 26 May 2011 12:36:50 +0000 (14:36 +0200)]
TLReplaceDisks: Move assertion checking locks

Commit 1bee66f3 added assertions for ensuring only the necessary locks
are kept while replacing disks. One of them makes sure locks have been
released during the operation. Unfortunately the commit added the check
as part of a “finally” branch, which is also run when an exception is
thrown (in which case the locks may not have been released yet). Errors
could be masked by the assertion error. Moving the check out of the
“finally” branch fixes the issue.
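
The masking effect is easy to reproduce in isolation. In this sketch `operation` and `locks_released` are placeholder callables, not Ganeti functions.

```python
def replace_disks_buggy(operation, locks_released):
    try:
        operation()
    finally:
        # Also runs when operation() raised, at which point the locks
        # may legitimately still be held.
        assert locks_released(), "locks still held"


def replace_disks_fixed(operation, locks_released):
    operation()
    # Only reached on success, so a failure in operation() propagates as-is
    assert locks_released(), "locks still held"
```

If `operation()` raises and the locks are still held, the buggy variant propagates an `AssertionError` in place of the original exception. The fixed variant lets the original exception through.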

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agocli.JobExecutor: Handle empty name, allow adding job IDs
Michael Hanselmann [Fri, 20 May 2011 13:17:46 +0000 (15:17 +0200)]
cli.JobExecutor: Handle empty name, allow adding job IDs

With LU-generated jobs only the ID is known.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocli.JobExecutor: Use counter for indexing jobs
Michael Hanselmann [Wed, 25 May 2011 15:32:56 +0000 (17:32 +0200)]
cli.JobExecutor: Use counter for indexing jobs

If “SubmitPending” was mixed with calls to “QueueJob”, jobs in the
internal structures would get duplicate indices. With this change each
queued job is assigned a unique index, which will be used for sorting
the results.
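
A small model of the counter-based approach, assuming one shared counter for every way a job can enter the executor. The class `JobQueueSketch` and its methods are illustrative and do not mirror the real `cli.JobExecutor` API.

```python
import itertools


class JobQueueSketch:
    """Toy executor: every job gets an index from a single shared counter."""

    def __init__(self):
        self._counter = itertools.count()
        self.queued = []     # (index, name) waiting to be submitted
        self.submitted = []  # (index, name, job_id)

    def queue_job(self, name):
        self.queued.append((next(self._counter), name))

    def add_job_id(self, name, job_id):
        # Jobs known only by ID (e.g. LU-generated) draw from the same counter
        self.submitted.append((next(self._counter), name, job_id))

    def submit_pending(self, submit):
        for idx, name in self.queued:
            self.submitted.append((idx, name, submit(name)))
        self.queued = []

    def indices(self):
        return sorted(idx for idx, _, _ in self.submitted)
```

Because indices are never derived from the current length of a list, interleaving the calls in any order cannot produce duplicates.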

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix bug in LUNodeMigrate
Michael Hanselmann [Wed, 25 May 2011 14:45:24 +0000 (16:45 +0200)]
Fix bug in LUNodeMigrate

Commit aac4511a added CheckArguments to LUNodeMigrate with a call to
_CheckIAllocatorOrNode. When no default iallocator is defined,
evacuating a node would always fail:

$ gnt-node migrate node123
Migrate instance(s) '...'?
y/[n]/?: y
Failure: prerequisites not met for this operation:
No iallocator or node given and no cluster-wide default iallocator
found; please specify either an iallocator or a node, or set a
cluster-wide default iallocator

This patch adds a new parameter to specify a target node. This doesn't
solve all issues, but will make the most important cases work again in
the meantime. This opcode will receive more work for node group support.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoconfig: Add method to get members of nodes' groups
Michael Hanselmann [Fri, 20 May 2011 13:35:28 +0000 (15:35 +0200)]
config: Add method to get members of nodes' groups

This will be used for locking during node evacuation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoYet another attempt to fix builds
Iustin Pop [Wed, 25 May 2011 08:54:34 +0000 (10:54 +0200)]
Yet another attempt to fix builds

It seems that abs_top_srcdir is not a good option, so I tested again
with just using the same as in doc/examples/bash_completion.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix build breakage
Iustin Pop [Tue, 24 May 2011 16:46:39 +0000 (18:46 +0200)]
Fix build breakage

Sorry, I already had PYTHONPATH exported in my env, and as I said I
wasn't able to test this on buildbot.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'devel-2.4'
Michael Hanselmann [Tue, 24 May 2011 16:50:13 +0000 (18:50 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  node evac: don't call IAllocator if no instances

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agonode evac: don't call IAllocator if no instances
Iustin Pop [Tue, 24 May 2011 09:29:39 +0000 (11:29 +0200)]
node evac: don't call IAllocator if no instances

Currently we generate an empty list only for the '-n node' invocation,
but for iallocator we still call the iallocator (which needs an RPC
call, etc.). By moving the computation of instances outside of the if
block, we can return early from the LU.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'devel-2.4'
Michael Hanselmann [Tue, 24 May 2011 16:35:56 +0000 (18:35 +0200)]
Merge branch 'devel-2.4'

* devel-2.4:
  RPC/Backend: Make UploadFile uid and gid agnostic
  Resolve uid/gid upon mainloop run
  GetEntResolver: Make it possible to resolve uid/gid to name
  utils.algo: Add InvertDict to invert a dict
  autotools: Add noded group

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agognt-debug: rename allocator to iallocator
Iustin Pop [Tue, 24 May 2011 09:34:32 +0000 (11:34 +0200)]
gnt-debug: rename allocator to iallocator

I'm always confused by this strange difference, so let's rename the
command to match what it tests.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMisc other conversions
Iustin Pop [Thu, 19 May 2011 16:44:51 +0000 (18:44 +0200)]
Misc other conversions

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoConvert job status strings to constants
Iustin Pop [Thu, 19 May 2011 16:44:37 +0000 (18:44 +0200)]
Convert job status strings to constants

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoConvert group policies to constants
Iustin Pop [Thu, 19 May 2011 16:29:06 +0000 (18:29 +0200)]
Convert group policies to constants

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoReplace instance states hardcoded with constants
Iustin Pop [Thu, 19 May 2011 16:21:13 +0000 (18:21 +0200)]
Replace instance states hardcoded with constants

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoIAllocator.hs: replace a few strings with constants
Iustin Pop [Thu, 19 May 2011 16:18:22 +0000 (18:18 +0200)]
IAllocator.hs: replace a few strings with constants

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoImplement conversion of Python constants to Haskell
Iustin Pop [Thu, 19 May 2011 15:45:01 +0000 (17:45 +0200)]
Implement conversion of Python constants to Haskell

With the merge of the repositories, we can now auto-generate the code
for Haskell constants from the Python code.

Currently this only handles the basic types (strings and
integers). Handling containers such as lists and dictionaries would only
be possible with a parser that recognises the element names. We could
extend the convert-constants script if that becomes necessary; right now
I'm looking at just the simple constants such as iallocator modes,
instance states, etc.
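
The conversion for the simple types can be pictured along these lines. This is a sketch under assumptions, not the convert-constants script itself: the camel-case naming rule, the `String`/`Int` type choices and the sample constant names in the usage note are made up for illustration.

```python
import re

_CONSTANT_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def haskell_name(py_name):
    """Turn FOO_BAR into fooBar (Haskell value names start lower-case)."""
    parts = py_name.lower().split("_")
    return parts[0] + "".join(p.capitalize() for p in parts[1:])


def _hs_string(value):
    """Quote a string, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def convert(constants):
    """Emit Haskell definitions for string and integer constants only."""
    lines = []
    for name in sorted(constants):
        value = constants[name]
        if not _CONSTANT_NAME.fullmatch(name) or isinstance(value, bool):
            continue  # skip private names and booleans
        hs = haskell_name(name)
        if isinstance(value, str):
            lines += ["%s :: String" % hs, "%s = %s" % (hs, _hs_string(value))]
        elif isinstance(value, int):
            lines += ["%s :: Int" % hs, "%s = %d" % (hs, value)]
        # containers (lists, dicts, ...) are silently skipped
    return "\n".join(lines)
```

Calling `convert({"MODE_ALLOC": "allocate", "SOME_LIST": [1, 2]})` would emit a definition for `modeAlloc` only, since the list is not a basic type.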

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRPC/Backend: Make UploadFile uid and gid agnostic
René Nussbaumer [Thu, 19 May 2011 08:37:58 +0000 (10:37 +0200)]
RPC/Backend: Make UploadFile uid and gid agnostic

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoResolve uid/gid upon mainloop run
René Nussbaumer [Fri, 20 May 2011 12:24:13 +0000 (14:24 +0200)]
Resolve uid/gid upon mainloop run

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoGetEntResolver: Make it possible to resolve uid/gid to name
René Nussbaumer [Wed, 18 May 2011 12:19:52 +0000 (14:19 +0200)]
GetEntResolver: Make it possible to resolve uid/gid to name

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoutils.algo: Add InvertDict to invert a dict
René Nussbaumer [Fri, 20 May 2011 12:17:29 +0000 (14:17 +0200)]
utils.algo: Add InvertDict to invert a dict

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoautotools: Add noded group
René Nussbaumer [Thu, 19 May 2011 11:41:52 +0000 (13:41 +0200)]
autotools: Add noded group

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoImprove hooks documentation unittest
Michael Hanselmann [Tue, 17 May 2011 16:12:27 +0000 (18:12 +0200)]
Improve hooks documentation unittest

Also check for the opcode ID.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate hooks.rst for cluster verify changes
Guido Trotter [Fri, 20 May 2011 14:30:21 +0000 (15:30 +0100)]
Update hooks.rst for cluster verify changes

Also update NEWS on this change.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix a couple of style mistakes
Guido Trotter [Fri, 20 May 2011 14:20:25 +0000 (15:20 +0100)]
Fix a couple of style mistakes

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoCluster verify: accept a --node-group option
Adeodato Simo [Tue, 3 May 2011 14:22:18 +0000 (15:22 +0100)]
Cluster verify: accept a --node-group option

This will trigger a ClusterVerifyGroup operation only on the specified
group, skipping other groups as well as cluster-wide verifications.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: check for nodes/instances with no group
Adeodato Simo [Tue, 3 May 2011 14:22:17 +0000 (15:22 +0100)]
Cluster verify: check for nodes/instances with no group

Previously, all nodes and instances would *always* be visited/verified. By
driving the verification by node group now, we will miss nodes and
instances that can't be reached from existing node groups, should that rare
and bogus circumstance ever occur.

We safeguard against that by checking for unreachable nodes and instances
explicitly. (These will not be further verified.)
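
The safeguard amounts to a set difference between everything in the configuration and everything reachable through a node group. A minimal sketch, in which the function name and the shape of the `groups` mapping are assumptions:

```python
def find_ungrouped(all_nodes, all_instances, groups):
    """Return (nodes, instances) that no node group accounts for.

    groups maps a group name to a (nodes, instances) pair.
    """
    seen_nodes = set()
    seen_instances = set()
    for nodes, instances in groups.values():
        seen_nodes |= set(nodes)
        seen_instances |= set(instances)
    return (sorted(set(all_nodes) - seen_nodes),
            sorted(set(all_instances) - seen_instances))
```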

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: fix LV checks for split instances
Adeodato Simo [Tue, 3 May 2011 14:22:16 +0000 (15:22 +0100)]
Cluster verify: fix LV checks for split instances

When sharding by group, if a mirrored instance is split (primary and
secondary) between two groups, its volumes will not be properly checked:
the group of the primary will warn about a missing volume in the secondary,
and the group of the secondary about an unknown volume (in the secondary as
well).

To solve the "missing volumes" bit, we will detect this case and perform an
extra RPC verify call to these split secondaries (querying only for
NV_LVLIST), and introduce the results in the node images appropriately. We
do this detection early in ExpandNames/CheckPrereq, so as to properly lock the
extra nodes.

As for the "unknown volumes" warning in the secondary, we update the volume
mapping with split instances before checking for orphaned volumes.

Finally, we mark nodes as "ghost" only if they really don't exist in the
cluster configuration, which avoids spurious "instance lives in ghost node"
warnings.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: make NV_NODELIST smaller
Adeodato Simo [Tue, 3 May 2011 14:22:15 +0000 (15:22 +0100)]
Cluster verify: make NV_NODELIST smaller

To cope with increasing cluster sizes, we now make nodes try to contact all
other nodes in their group, and one node from every other group.
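
A rough sketch of this selection, assuming a simple mapping of groups to node names. The name `select_nodes_to_contact` is hypothetical, and picking the alphabetically first node of each foreign group is an assumption made for the example; the commit message does not say how that node is chosen.

```python
def select_nodes_to_contact(node_name, node_groups):
    """Return the nodes that node_name should try to contact.

    node_groups: dict mapping group name -> list of node names
    """
    own_group = next(
        group for group, members in node_groups.items() if node_name in members
    )
    # All other nodes in the node's own group.
    targets = [n for n in node_groups[own_group] if n != node_name]
    # Plus one node from every other non-empty group (assumed: first by name).
    for group in sorted(node_groups):
        if group != own_group and node_groups[group]:
            targets.append(sorted(node_groups[group])[0])
    return targets
```

With G groups of N nodes each, every node contacts about N + G nodes instead of N * G.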

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: verify hypervisor parameters only once
Adeodato Simo [Tue, 3 May 2011 14:22:14 +0000 (15:22 +0100)]
Cluster verify: verify hypervisor parameters only once

The list of all hypervisor parameters has to be computed in
LUClusterVerifyGroup, since it needs to be passed to nodes as
NV_HVPARAMS. However, it is better only to verify said parameters once,
out of LUClusterVerifyConfig.

For this, we refactor the code that constructs the list of parameters to a
module-level _GetAllHypervisorParameters() function that both LUs can use.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoSplit LUClusterVerify into LUClusterVerify{Config,Group}
Adeodato Simo [Tue, 3 May 2011 14:22:13 +0000 (15:22 +0100)]
Split LUClusterVerify into LUClusterVerify{Config,Group}

With this change, LUClusterVerifyConfig becomes a "light" LU that only
verifies the global config and other, master-only settings, and the bulk of
node/instance verification is done by LUClusterVerifyGroup, which only acts
on nodes and instances of a given group.

To ensure that `gnt-cluster verify` continues to operate on the whole
cluster, the client creates an OpClusterVerifyGroup job per node group; for
convenience, the list of node groups is returned by LUClusterVerifyConfig.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: factor out error codes and functions
Adeodato Simo [Tue, 3 May 2011 14:22:12 +0000 (15:22 +0100)]
Cluster verify: factor out error codes and functions

We move all error code definitions, plus the _Error and _ErrorIf helpers,
to a private _VerifyErrors mix-in class that can later be shared by the two
new cluster verify LUs.

(_Error and _ErrorIf code was moved around verbatim, except to disable
"_VerifyError class does not have 'op' or '_feedback_fn' members" errors
from pylint.)

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: make "instance runs in wrong node" node-driven
Adeodato Simo [Tue, 3 May 2011 14:22:11 +0000 (15:22 +0100)]
Cluster verify: make "instance runs in wrong node" node-driven

Previously, the "instance should not be running in this node" error was
computed by verifying, for each instance, whether any node other than its
primary was running it. But this is not a well-suited approach if we were
to shard cluster verification (because, for each instance, we won't have
information whether it's running *outside* the current set of nodes).

By reversing the logic of the check, and asking instead, for each node,
"is it running any instance for which it's not primary", we catch all
occurrences of the problem even if running sharded.

Because of this, we can also detect orphan instances at the same time
(instances that are not known in the cluster config). We warn about them
here too, and drop the later _VerifyOrphanInstances check.
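
The reversed, node-driven check can be illustrated with a small sketch. The function name and the dictionary inputs are assumptions for the example and do not reflect the actual data structures in cmdlib.

```python
def verify_running_instances(node_running, instance_primary):
    """Return (wrong_node, orphans) as lists of (node, instance) pairs.

    node_running: dict mapping node name -> list of instances running there
    instance_primary: dict mapping instance name -> primary node (from config)
    """
    wrong_node = []
    orphans = []
    for node in sorted(node_running):
        for inst in sorted(node_running[node]):
            if inst not in instance_primary:
                # Running on a node, but unknown to the cluster config.
                orphans.append((node, inst))
            elif instance_primary[inst] != node:
                # Known instance running on a node that is not its primary.
                wrong_node.append((node, inst))
    return wrong_node, orphans
```

Because the loop runs over nodes, restricting `node_running` to a single group still catches every misplaced instance on those nodes, provided `instance_primary` covers the whole cluster.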

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoVerify an absent vm_capable node for files
Guido Trotter [Mon, 16 May 2011 15:22:50 +0000 (16:22 +0100)]
Verify an absent vm_capable node for files

If we're not verifying all nodes, adding a node outside the current
group for file checksums helps us make sure checksums are the same in
all of the cluster.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: master must be present for _VerifyFiles
Adeodato Simo [Tue, 3 May 2011 14:22:10 +0000 (15:22 +0100)]
Cluster verify: master must be present for _VerifyFiles

This commit prepares the call to _VerifyFiles for the case when the master
node is not one of the nodes that are being verified (which will be the case
for all node groups but one). We fix it by always passing master info and
checksums to _VerifyFiles, which ensures there's a cluster-wide consistency
check.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: don't assume we're verifying all nodes/instances
Adeodato Simo [Tue, 3 May 2011 14:22:09 +0000 (15:22 +0100)]
Cluster verify: don't assume we're verifying all nodes/instances

This commit fixes a few initial simple cases in which it was assumed that
we're always working over the whole cluster. With this change, we
differentiate between "nodes/instances to verify" and "checks that need
cluster-wide information".

In particular:

  - retrieve hypervisor parameters always from all instances
  - always specify full node list in NV_NODELIST
  - retrieve OOB path from all nodes
  - verify DRBD devices against the full set of instances (this ensures
    minors get properly verified even if an instance is split between groups)
  - look up node groups against the set of all nodes (to avoid tracebacks
    in case instances are split between groups)
  - determine whether running instances are unknown by checking against the
    full list of instances

Behavior in all cases stays the same if still running over the whole
cluster.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoCluster verify: gather node/instance list in CheckPrereq
Adeodato Simo [Tue, 3 May 2011 14:22:08 +0000 (15:22 +0100)]
Cluster verify: gather node/instance list in CheckPrereq

This commit introduces no behavior changes, and is only a minor refactoring
that aids with a cleaner division of future LUClusterVerify work. The
change consists in:

  - substitute the {node,instance}{list,info} structures previously created
    in Exec() by member variables created in CheckPrereq; and

  - mechanically convert all references to the old variables to the new
    member variables.

Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both
with the same contents at the moment, is not capricious. We've now made
Exec use the my_* variables pervasively; in future commits, we'll break the
assumption that all nodes and instances are listed there, and it'll become
clear when the all_* variables have to be substituted instead.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge remote branch 'origin/devel-2.4'
Iustin Pop [Fri, 20 May 2011 07:33:29 +0000 (09:33 +0200)]
Merge remote branch 'origin/devel-2.4'

* origin/devel-2.4:
  Fix errors in hooks documentation
  Clarify a bit the noded man page
  Note --no-remember in NEWS
  Switch QA over to using instance stop --no-remember
  Implement no_remember at RAPI level
  Implement no_remember at CLI level
  Introduce instance start/stop no_remember attribute
  Bump version for the 2.4.2 release
  Fix a bug in LUInstanceMove
  Abstract ignore_consistency opcode parameter
  Preload the string-escape code in noded
  Fix error in iallocator documentation reg. disk mode
  Try to prevent instance memory changes N+1 failures
  Update NEWS file for the 2.4.2 release

Conflicts:
        NEWS                (trivial)
        doc/iallocator.rst  (kept our version)
        lib/cli.py          (trivial)
        lib/opcodes.py      (removed duplicated work, both branches
                             introduced the same new variable
                              PIgnoreConsistency :)
        lib/rapi/client.py  (trivial)
        lib/rapi/rlib2.py   (almost trivial)
        qa/ganeti-qa.py     (below trivial)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agocli: Replace hardcoded disk templates with constants
Michael Hanselmann [Thu, 19 May 2011 14:31:19 +0000 (16:31 +0200)]
cli: Replace hardcoded disk templates with constants

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agomcpu: Add missing docstring to _ProcessResult
Michael Hanselmann [Tue, 17 May 2011 16:49:46 +0000 (18:49 +0200)]
mcpu: Add missing docstring to _ProcessResult

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoconfig: Add function to get instances in node group
Michael Hanselmann [Tue, 17 May 2011 15:56:18 +0000 (17:56 +0200)]
config: Add function to get instances in node group

This will be used for evacuating instances in a node group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoiallocator: Stricter check for multi-evac result
Michael Hanselmann [Thu, 19 May 2011 10:46:37 +0000 (12:46 +0200)]
iallocator: Stricter check for multi-evac result

Check new secondary nodes' group like it's already done for
multi-relocation requests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocmdlib: Use ganeti.ht for checking iallocator result
Michael Hanselmann [Tue, 17 May 2011 14:10:15 +0000 (16:10 +0200)]
cmdlib: Use ganeti.ht for checking iallocator result

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix errors in hooks documentation
Michael Hanselmann [Tue, 17 May 2011 16:13:16 +0000 (18:13 +0200)]
Fix errors in hooks documentation

In many cases the opcode ID was incorrect. A unittest for this will
be added in the master branch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoht: Add strict check for dictionaries
Michael Hanselmann [Tue, 17 May 2011 14:09:11 +0000 (16:09 +0200)]
ht: Add strict check for dictionaries

This allows checking specific dictionary items, unlike TDict
or TDictOf.
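
To illustrate the idea of a strict dictionary check, here is a standalone sketch that does not use ganeti.ht. The name `strict_dict` and its `require_all`/`exclusive` parameters are assumptions for this example, not a statement of the module's actual signature.

```python
def strict_dict(require_all, exclusive, items):
    """Build a checker that validates specific dictionary items.

    require_all: if True, every key listed in items must be present
    exclusive: if True, keys not listed in items are rejected
    items: dict mapping key -> predicate applied to that key's value
    """
    def check(value):
        if not isinstance(value, dict):
            return False
        if require_all and not all(key in value for key in items):
            return False
        if exclusive and not all(key in items for key in value):
            return False
        # Each known key present in the value must satisfy its own predicate.
        return all(items[key](value[key]) for key in value if key in items)
    return check
```

Unlike a check that applies one predicate to all keys or all values, each key here carries its own validator.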

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocmdlib: Remove punctuation from error messages
Michael Hanselmann [Mon, 16 May 2011 15:44:55 +0000 (17:44 +0200)]
cmdlib: Remove punctuation from error messages

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoVarious grammar fixes and updates
Stephen Shirley [Tue, 17 May 2011 09:02:12 +0000 (11:02 +0200)]
Various grammar fixes and updates

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agognt-debug: New iallocator mode
Michael Hanselmann [Thu, 12 May 2011 16:16:55 +0000 (18:16 +0200)]
gnt-debug: New iallocator mode

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd new iallocator mode to LUTestAllocator
Michael Hanselmann [Thu, 12 May 2011 16:16:35 +0000 (18:16 +0200)]
Add new iallocator mode to LUTestAllocator

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocmdlib.IAllocator: Add multi-relocate support
Michael Hanselmann [Wed, 11 May 2011 15:56:54 +0000 (17:56 +0200)]
cmdlib.IAllocator: Add multi-relocate support

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd constants for multi-relocation iallocator mode
Michael Hanselmann [Wed, 11 May 2011 15:29:13 +0000 (17:29 +0200)]
Add constants for multi-relocation iallocator mode

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoClarify a bit the noded man page
Iustin Pop [Mon, 16 May 2011 11:26:44 +0000 (13:26 +0200)]
Clarify a bit the noded man page

"This can be overriden" can be read as either the port we listen on or
the address we bind to. Replace with "The port" for great clarity!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoNote --no-remember in NEWS
Iustin Pop [Fri, 13 May 2011 15:54:34 +0000 (17:54 +0200)]
Note --no-remember in NEWS

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoSwitch QA over to using instance stop --no-remember
Iustin Pop [Fri, 13 May 2011 15:38:25 +0000 (17:38 +0200)]
Switch QA over to using instance stop --no-remember

Instead of hardcoded Xen commands. This will make it work for all
hypervisors, instead of duplicating hypervisor functionality in QA
itself.

The timeout has been removed as gnt-instance stop itself will make
sure the instance is down before returning. We just double-check that
it is indeed down.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>