ganeti-local
12 years ago Revert "Rename utils.mlock to utils.cfunc"
Guido Trotter [Wed, 19 Oct 2011 17:29:22 +0000 (18:29 +0100)]
Revert "Rename utils.mlock to utils.cfunc"

The rename is not needed either, since we're not adding more code as of
now.
This reverts commit 57ca011e1cd2681948969724e2646edaac22da28.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "utils.cfunc: Cleanup, more flexibility"
Guido Trotter [Wed, 19 Oct 2011 17:28:42 +0000 (18:28 +0100)]
Revert "utils.cfunc: Cleanup, more flexibility"

We discussed that this is not needed right now, and it breaks existing
functionality and unittests.

This reverts commit 6915fe26da8dce41fc967d761f005390aa956161.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Fix unittest failures with python 2.7
Guido Trotter [Wed, 19 Oct 2011 17:05:22 +0000 (18:05 +0100)]
Fix unittest failures with python 2.7

In python 2.7 the ovf unittests fail because OVFReader expects
ElementTree.parse() of an erroneous document to throw an
xml.parsers.expat.ExpatError while instead it throws an
ElementTree.ParseError.

The solution is to "except" for both errors, with the catch that
ParseError didn't exist before, so we need to define it locally and
get it from the module if it exists, while leaving it set to "None"
(thus catching no exception) if it does not.
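
A minimal Python sketch of the approach described above. The helper
name try_parse is hypothetical and not taken from OVFReader. Unlike the
commit, the sketch drops the missing class from the tuple instead of
leaving None in it, because Python 3 rejects non-exception entries in
an "except" clause.

```python
import xml.parsers.expat
from xml.etree import ElementTree

# ParseError was added to ElementTree in Python 2.7; older versions lack it.
_ParseError = getattr(ElementTree, "ParseError", None)

# Collect whichever of the two exception classes exist in this interpreter.
_XML_ERRORS = tuple(
    exc for exc in (xml.parsers.expat.ExpatError, _ParseError)
    if exc is not None
)


def try_parse(text):
    """Return the root element of 'text', or None if it is malformed."""
    try:
        return ElementTree.fromstring(text)
    except _XML_ERRORS:
        return None
```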

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years ago Merge branch 'devel-2.5'
Guido Trotter [Tue, 18 Oct 2011 15:06:14 +0000 (16:06 +0100)]
Merge branch 'devel-2.5'

* devel-2.5:
  Revert "rapi.client.ModifyNode should PUT rather than POST"
    - also fix the actual call, which was merged as PUT in master by
      mistake.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "rapi.client.ModifyNode should PUT rather than POST"
Guido Trotter [Tue, 18 Oct 2011 15:01:39 +0000 (16:01 +0100)]
Revert "rapi.client.ModifyNode should PUT rather than POST"

This was a mistake on my side because ModifyGroup and ModifyInstance
were PUT, and I was not aware of the discussion and the rationale why
this one had to be POST.

This reverts commit 55ef0cf6497c570aaab9413851435a7ee744222e.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Merge branch 'devel-2.5'
Guido Trotter [Tue, 18 Oct 2011 14:24:13 +0000 (15:24 +0100)]
Merge branch 'devel-2.5'

* devel-2.5:
  Revert "Added SPICE TLS option and related cert paths"
  Revert "Implementation of TLS-protected SPICE connections"
  Revert "Updated man pages with new SPICE TLS options"
  Revert "Add tls_ciphers and use_vdagent options"

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "Added SPICE TLS option and related cert paths"
Guido Trotter [Mon, 17 Oct 2011 15:47:00 +0000 (16:47 +0100)]
Revert "Added SPICE TLS option and related cert paths"

This reverts commit bfe86c763a9ff1b481d799537ff0f0cf6740dfd1.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "Implementation of TLS-protected SPICE connections"
Guido Trotter [Mon, 17 Oct 2011 15:46:41 +0000 (16:46 +0100)]
Revert "Implementation of TLS-protected SPICE connections"

This reverts commit b6267745ede04b3c943bc02e004bdb9347e0f564.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "Updated man pages with new SPICE TLS options"
Guido Trotter [Mon, 17 Oct 2011 15:46:20 +0000 (16:46 +0100)]
Revert "Updated man pages with new SPICE TLS options"

This reverts commit b8a10435271ec4457cdc254e0a6b466b2d3bff24.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Revert "Add tls_ciphers and use_vdagent options"
Guido Trotter [Mon, 17 Oct 2011 15:45:45 +0000 (16:45 +0100)]
Revert "Add tls_ciphers and use_vdagent options"

This reverts commit 3e40b5879fa0070d6dd0e689dcfc31f20198a5a8.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Merge branch 'devel-2.5'
Guido Trotter [Tue, 18 Oct 2011 13:00:25 +0000 (14:00 +0100)]
Merge branch 'devel-2.5'

* devel-2.5:
  rapi.client.ModifyNode should PUT rather than POST
  Fix RAPI node modify client and server calls
  xen: changes to facilitate "xl" support (xen 4.1)
  xen: abstract instance config file naming
  Abstract xen's 'xm' command as a constant
  Fix RAPI documentation build
  rapi: Allow auto-promotion on node role change
  rapi: Add resource for modifying node
  opcodes: Add comment to *SetParams result description

Conflicts:
  lib/rapi/client.py
    - both functions stay, remove one empty line
  lib/rapi/rlib2.py
    - convert new function to 2.6 rapi style
  test/ganeti.rapi.client_unittest.py
    - both tests stay, trivial

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago rapi.client.ModifyNode should PUT rather than POST
Guido Trotter [Tue, 18 Oct 2011 12:32:12 +0000 (13:32 +0100)]
rapi.client.ModifyNode should PUT rather than POST

This was caught (albeit in a sibylline manner) by unittests on master
which are not present in 2.5.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years ago Fix RAPI node modify client and server calls
Guido Trotter [Tue, 18 Oct 2011 08:50:03 +0000 (09:50 +0100)]
Fix RAPI node modify client and server calls

rapi.client.ModifyNode accepts a "group" and not a "node" param.
(this bug is invisible but still not nice)

rlib2.R_2_nodes_name_modify submits the opcode with instance_name rather
than node_name as a param. This would break the call.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago xen: changes to facilitate "xl" support (xen 4.1)
Guido Trotter [Mon, 17 Oct 2011 11:05:00 +0000 (12:05 +0100)]
xen: changes to facilitate "xl" support (xen 4.1)

- Copy the xl config file, in case there's any
- Start instances by config file, not name (also xm compatible)
- Start paused domains with -p and not --paused (also xm compatible)
- Add a fixme for migration (changes are not xm compatible)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago xen: abstract instance config file naming
Guido Trotter [Mon, 17 Oct 2011 11:04:18 +0000 (12:04 +0100)]
xen: abstract instance config file naming

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Abstract xen's 'xm' command as a constant
Guido Trotter [Mon, 17 Oct 2011 10:35:52 +0000 (11:35 +0100)]
Abstract xen's 'xm' command as a constant

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago utils.cfunc: Cleanup, more flexibility
Michael Hanselmann [Thu, 13 Oct 2011 17:04:37 +0000 (19:04 +0200)]
utils.cfunc: Cleanup, more flexibility

- Split code using ctypes directly into a helper class
- Don't load “libc.so.6”, but use handle for main program instead (see
  comment in code)
- Clarify comment on errno with older ctypes versions
- Rename unittest since it can't be used for other functions (modifies
  process environment at runtime)
- Add boolean return value for “Mlockall”
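
A rough Python sketch of two of the points above: looking symbols up
through a handle for the main program rather than "libc.so.6", and a
wrapper with a boolean result. The names mlockall_ok and same_pid are
hypothetical, the flag values are the Linux ones, and this is not the
code from utils.cfunc.

```python
import ctypes
import os

# Linux values for the mlockall(2) flags; other platforms may differ.
MCL_CURRENT = 1
MCL_FUTURE = 2

# Passing None asks for a handle on the main program (dlopen(NULL)), so
# no library file name such as "libc.so.6" has to be hard-coded.
_main = ctypes.CDLL(None, use_errno=True)


def mlockall_ok(flags=MCL_CURRENT | MCL_FUTURE):
    """Lock the process memory; return True on success, False otherwise."""
    return _main.mlockall(flags) == 0


def same_pid():
    """Sanity check: libc's getpid() is reachable through the handle."""
    return _main.getpid() == os.getpid()
```

On failure, ctypes.get_errno() would give the reason, since the handle
was opened with use_errno=True.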

These changes are leftovers from some experiments with ctypes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Rename utils.mlock to utils.cfunc
Michael Hanselmann [Thu, 13 Oct 2011 15:24:48 +0000 (17:24 +0200)]
Rename utils.mlock to utils.cfunc

Renaming so that more code using ctypes could be added to the same file.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Add design doc for virtual(ised) clusters
Iustin Pop [Fri, 7 Oct 2011 18:30:09 +0000 (20:30 +0200)]
Add design doc for virtual(ised) clusters

I am currently able to run a 2-node virtual cluster on my machine,
with a very ad-hoc setup. But the results show clearly that this is
doable, and that given the right tools, setting up such a cluster will
be quite easy.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Document some useful Haskell tips
Iustin Pop [Thu, 13 Oct 2011 12:11:59 +0000 (14:11 +0200)]
Document some useful Haskell tips

This improves devnotes.rst with some tricks for Haskell development,
and additionally it does two Makefile improvements:

- properly document lib/_vcsversion.py as a requirement for
  Constants.hs (but do not require rebuild when updated)
- move HEXTRA at the end of the GHC invocation, so any command line
  options will indeed override the built-in ones (especially -osuf)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Further cleanup in hspace
Iustin Pop [Thu, 13 Oct 2011 10:39:56 +0000 (12:39 +0200)]
Further cleanup in hspace

This moves the checking of results from the allocation functions to a
separate function, so that we have less code duplication. It also does
a bit of simplification in the printing functions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago A bit of cleanup in hspace
Iustin Pop [Mon, 10 Oct 2011 10:49:47 +0000 (12:49 +0200)]
A bit of cleanup in hspace

The node offline/mcpu is identical to hbal's setNodesStatus, so let's
move that to CLI.hs and reuse it in hspace (also, rename it and drop
one 's').

Also, the check for the number of nodes is obsolete, as we compute
that from the disk template.

The patch does a bit of other small cleanups.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Add a type synonym for the allocation function sig
Iustin Pop [Thu, 13 Oct 2011 09:14:27 +0000 (11:14 +0200)]
Add a type synonym for the allocation function sig

Both iterateAlloc and tieredAlloc share the same signature, but it's
neither documented nor exported (needed for refactoring).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago htools: Simplify Luxi query results parsing
Iustin Pop [Wed, 12 Oct 2011 16:05:19 +0000 (18:05 +0200)]
htools: Simplify Luxi query results parsing

The logic is not entirely correct—the new Query interface exports the
field status, and we don't use that yet. But the new code should be
more readable.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years ago Adjust htools code to new Luxi argument format
Iustin Pop [Wed, 12 Oct 2011 11:54:57 +0000 (13:54 +0200)]
Adjust htools code to new Luxi argument format

This partially undoes commit 92678b3, more specifically it removes the
Store data type and the associated code, since all Luxi arguments are
now lists.

Furthermore, since the qfilter field on Query is complex (it's
actually a tree structure), and we don't support it, turn it into a
plain () type, which always gets encoded as JSNull ('null'), so that
we can remove the optional field handling from Luxi (all fields are
always required).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Fix RAPI documentation build
Michael Hanselmann [Thu, 13 Oct 2011 14:58:19 +0000 (16:58 +0200)]
Fix RAPI documentation build

*mumble*

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago rapi: Allow auto-promotion on node role change
Michael Hanselmann [Thu, 13 Oct 2011 12:19:08 +0000 (14:19 +0200)]
rapi: Allow auto-promotion on node role change

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago rapi: Add resource for modifying node
Michael Hanselmann [Thu, 13 Oct 2011 12:11:00 +0000 (14:11 +0200)]
rapi: Add resource for modifying node

A separate patch will add “auto-promote” through
“/2/nodes/[node_name]/role”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago opcodes: Add comment to *SetParams result description
Michael Hanselmann [Thu, 13 Oct 2011 12:10:07 +0000 (14:10 +0200)]
opcodes: Add comment to *SetParams result description

Explicitly say that the second element of the tuple is the new value.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago constants: Verify exported names
Michael Hanselmann [Thu, 13 Oct 2011 12:36:41 +0000 (14:36 +0200)]
constants: Verify exported names

The “constants” module is a bit special in the sense that we don't want
to export random stuff from it. This unittest checks the naming
convention and removes imported modules from the module's namespace.
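
A small Python sketch of such a check. The upper-case naming rule and
the names find_bad_exports and demo are assumptions made for this
example, and the sketch only reports offending names (it does not
remove imported modules as the real unittest does).

```python
import re
import types

# Assumed convention: upper-case names made of letters, digits, underscores.
_VALID_NAME_RE = re.compile(r"^[A-Z][A-Z0-9_]*$")


def find_bad_exports(module):
    """Return the sorted public names that break the naming convention."""
    bad = []
    for name, value in vars(module).items():
        if name.startswith("__") and name.endswith("__"):
            continue  # module metadata such as __name__ or __doc__
        is_module = isinstance(value, types.ModuleType)
        if is_module or not _VALID_NAME_RE.match(name):
            bad.append(name)
    return sorted(bad)


# Build a throw-away module to demonstrate the check.
demo = types.ModuleType("demo_constants")
demo.MAX_NODES = 40
demo.re = re          # an imported module leaking into the namespace
demo.helper = "oops"  # lower-case name
```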

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago http.client: Remove HTTP client pool code
Michael Hanselmann [Thu, 13 Oct 2011 11:01:09 +0000 (13:01 +0200)]
http.client: Remove HTTP client pool code

This patch removes all remnants of the HTTP client pool. Newly added unittests
provide 96% coverage on http.client.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago rpc: Remove thread-local storage with HTTP pool
Michael Hanselmann [Wed, 12 Oct 2011 12:18:01 +0000 (14:18 +0200)]
rpc: Remove thread-local storage with HTTP pool

The HTTP pool is no longer used.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Merge branch 'devel-2.5'
Michael Hanselmann [Wed, 12 Oct 2011 11:39:10 +0000 (13:39 +0200)]
Merge branch 'devel-2.5'

* devel-2.5:
  rpc: Disable HTTP client pool and reduce memory consumption
  Preserve bridge MTU in KVM ifup script
  hail: Fix result for node evacuation
  Fix assertion error on unclean master shutdown

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Tiny optimisation related to filter parsing
Iustin Pop [Wed, 12 Oct 2011 11:25:42 +0000 (13:25 +0200)]
Tiny optimisation related to filter parsing

Currently, we get a luxi Client, then parse the filter, then execute
the query. If parsing the filter fails, we connected to the masterd
needlessly.
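
The reordering can be sketched in Python as follows. The toy parser,
FilterError and run_query are hypothetical stand-ins, not the actual
client code; the point is only that parsing happens before connecting.

```python
class FilterError(Exception):
    """Raised when a filter expression cannot be parsed."""


def parse_filter(text):
    """Toy parser: accept only 'field == value' expressions."""
    parts = text.split("==")
    if len(parts) != 2 or not all(p.strip() for p in parts):
        raise FilterError("cannot parse filter: %r" % text)
    return ["=", parts[0].strip(), parts[1].strip()]


def run_query(filter_text, connect):
    """Parse first, so a bad filter never triggers a connection."""
    qfilter = parse_filter(filter_text)  # may raise before any I/O
    client = connect()                   # only reached for valid filters
    return client(qfilter)
```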

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Merge branch 'stable-2.5' into devel-2.5
Michael Hanselmann [Wed, 12 Oct 2011 11:03:15 +0000 (13:03 +0200)]
Merge branch 'stable-2.5' into devel-2.5

* stable-2.5:
  rpc: Disable HTTP client pool and reduce memory consumption
  hail: Fix result for node evacuation
  Fix assertion error on unclean master shutdown

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Standardise LUXI call argument types
Iustin Pop [Wed, 12 Oct 2011 10:16:32 +0000 (12:16 +0200)]
Standardise LUXI call argument types

Currently, we have 4 types of arguments in LUXI calls:

- most common, a list of values
- a single argument that is sent as a list of one element
- a single argument that is sent by itself
- a dictionary (only Query and QueryFields)

This inconsistency makes it harder not only to auto-generate the
HTools LUXI interface, but also, more generally, to check the arguments
and (if we ever want to do it) to auto-generate the Python LUXI client.

Compare this with the node daemon, which consistently uses a list for
its arguments and, even with far more changes over time, has had no
issues with extending the interface.

In case we want to extend a call, there are two options:

- preferred: add a new call, keep the old one unchanged
- possible: add further parameters to the current argument list
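
A minimal Python sketch of the "always a list" rule. The function name
format_request and the "method"/"args" message keys are assumptions
made for this example, not a description of the actual LUXI client.

```python
import json


def format_request(method, args):
    """Serialise a call, insisting that 'args' is always a list."""
    if not isinstance(args, (list, tuple)):
        raise TypeError("arguments for %s must be a list, got %r" %
                        (method, args))
    return json.dumps({"method": method, "args": list(args)},
                      sort_keys=True)
```

With this shape, a call that takes a single argument still sends a
one-element list, and extending a call means appending to the list.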

The patch against HTools will follow—sending separately as the Python
changes are very clear by themselves.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Rename filter and filter_ to qfilter
Iustin Pop [Wed, 12 Oct 2011 09:07:18 +0000 (11:07 +0200)]
Rename filter and filter_ to qfilter

We currently use 'filter' as the OpCode, QueryRequest and RAPI field
name for representing a query filter. However, since 'filter' is a
built-in function, we actually have to use filter_ throughout the code
in order to not override the built-in function.

This patch simply goes and does a global sed over the code. Due to the
fact that the RAPI interface already exposed this field, we add
compatibility code for now which handles both forms.
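
One possible shape for such compatibility code, as a Python sketch. The
helper name get_qfilter and the choice to reject requests that pass
both names are assumptions, not taken from the RAPI code.

```python
def get_qfilter(body):
    """Read the query filter from a request body, accepting both names."""
    if "qfilter" in body and "filter" in body:
        raise ValueError("pass either 'qfilter' or 'filter', not both")
    # Prefer the new name, fall back to the legacy one.
    return body.get("qfilter", body.get("filter"))
```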

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Merge branch 'devel-2.4' into stable-2.5
Michael Hanselmann [Wed, 12 Oct 2011 11:00:19 +0000 (13:00 +0200)]
Merge branch 'devel-2.4' into stable-2.5

* devel-2.4:
  rpc: Disable HTTP client pool and reduce memory consumption
  Fix assertion error on unclean master shutdown

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago rpc: Disable HTTP client pool and reduce memory consumption
Michael Hanselmann [Wed, 12 Oct 2011 10:37:43 +0000 (12:37 +0200)]
rpc: Disable HTTP client pool and reduce memory consumption

We noticed that “ganeti-masterd” can use large amounts of memory,
especially on large clusters. Measurements showed a single PycURL client
using about 500 kB of heap memory (the actual usage depends on versions,
build options and settings).

The RPC client uses a per-thread HTTP client pool with one client per
node. At this time there are 41 non-main threads (25 for the job queue
and 16 for client requests). This means the HTTP client pools use a lot
of memory (ca. 200 MB for 10 nodes, ca. 1 GB for 50 nodes).
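
The estimate follows from threads x nodes x memory per client. A short
Python check of the arithmetic, using only the figures quoted above
(the function name pool_memory_mb is made up for this example):

```python
KB_PER_CLIENT = 500   # measured heap use of one PycURL client
THREADS = 25 + 16     # job queue threads + client request threads


def pool_memory_mb(nodes):
    """Estimated memory (in MB) held by all per-thread client pools."""
    return THREADS * nodes * KB_PER_CLIENT / 1000.0
```

For 10 nodes this gives 205 MB, and for 50 nodes 1025 MB, which matches
the rounded figures in the text.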

This patch disables the per-thread HTTP client pool. No cleanup of
unused code is done. That will be done in the master branch only.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Haskell support for generic Query in Luxi
Iustin Pop [Wed, 12 Oct 2011 08:08:00 +0000 (10:08 +0200)]
Haskell support for generic Query in Luxi

Until now htools did not have support for generic Query in Luxi. This
patch introduces Query as a supported Luxi operation and replaces
QueryNodes, QueryInstances and QueryGroups with Query.

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago TH simplification for Luxi
Agata Murawska [Tue, 11 Oct 2011 10:52:12 +0000 (12:52 +0200)]
TH simplification for Luxi

This patch simplifies the generation of save constructors for LuxiOp
by always using showJSON over an array of JSValues, instead of having
to pass showJSON in most cases, except the 5-tuple case.

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
[iustin@google.com: fixed a few issues]

12 years ago Dots in docstrings and hlint error fixes for htools
Agata Murawska [Tue, 11 Oct 2011 10:51:03 +0000 (12:51 +0200)]
Dots in docstrings and hlint error fixes for htools

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Add design doc for the resource model changes
Iustin Pop [Tue, 20 Sep 2011 04:39:58 +0000 (13:39 +0900)]
Add design doc for the resource model changes

This is not complete, but is as close as I can get it for now. I
expect people actually implementing the various changes to extend the
design doc.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Preserve bridge MTU in KVM ifup script
Andrea Spadaccini [Tue, 11 Oct 2011 13:39:05 +0000 (14:39 +0100)]
Preserve bridge MTU in KVM ifup script

Closes: #201 - KVM_IFUP does not set bridge-MTU on tap devices
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Remove the oneline output option in hbal
Iustin Pop [Mon, 10 Oct 2011 10:12:50 +0000 (12:12 +0200)]
Remove the oneline output option in hbal

This was, AFAIK, never used, and complicates the output code enough
that it's better to remove it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Rework/split hbal's main function
Iustin Pop [Mon, 10 Oct 2011 09:48:45 +0000 (11:48 +0200)]
Rework/split hbal's main function

This is just moving code around. A subsequent patch will do a bit more
cleanup and changing the output.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years ago Skip application of 'id' in TH code
Iustin Pop [Thu, 6 Oct 2011 14:48:40 +0000 (16:48 +0200)]
Skip application of 'id' in TH code

This is just beautification when dumping splices to stdout, as ghc
will optimise the 'id' away anyway.

Original generated code:

  opToArgs QueryTags kind name = J.showJSON (id kind, id name)

Afterwards:

  opToArgs QueryTags kind name = J.showJSON (kind, name)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years ago Don't send gratuitous ARP if master IP setup fails
Andrea Spadaccini [Fri, 7 Oct 2011 16:03:21 +0000 (17:03 +0100)]
Don't send gratuitous ARP if master IP setup fails

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years ago Document --ignore-errors and --error-codes
Andrea Spadaccini [Thu, 6 Oct 2011 19:28:23 +0000 (20:28 +0100)]
Document --ignore-errors and --error-codes

Update the man page of gnt-cluster to contain the documentation of the
--ignore-errors and --error-codes verify options. Also, include the list
of the error codes and their documentation.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Add error codes documentation
Andrea Spadaccini [Thu, 6 Oct 2011 19:25:54 +0000 (20:25 +0100)]
Add error codes documentation

lib/constants.py
* add to each CV_E* tuple the documentation of the error code
* add the DOCUMENTED_CONSTANTS constant for the doc preprocessor

autotools/docpp
* add a new directive class CONSTANTS_<kind>, that gets data from
  constants.DOCUMENTED_CONSTANTS

lib/cmdlib.py
* modify the code that unpacked the CV_E* tuples to ignore the
  documentation parameter
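
A Python sketch of the tuple layout and of unpacking code that ignores
the new documentation element. CV_EEXAMPLE, its values and format_error
are illustrative only and do not come from lib/constants.py or
lib/cmdlib.py.

```python
# (item type, error code, documentation); the values are illustrative.
CV_EEXAMPLE = ("cluster", "EEXAMPLE", "Example failure, for demonstration")


def format_error(ecode, message):
    """Render an error line, ignoring the documentation element."""
    (item_type, code, _) = ecode
    return "%s:%s: %s" % (code, item_type, message)
```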

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Generalize docpp and sphinx_ext
Andrea Spadaccini [Thu, 6 Oct 2011 19:19:43 +0000 (20:19 +0100)]
Generalize docpp and sphinx_ext

autotools/docpp
* handle generic custom directives in the form <class>_<kind>
* adapt handling of query fields

build/sphinx_ext.py
* add the BuildValuesDoc function to output definitions using the sphinx
  syntax that was already used for query fields
* adapt BuildQueryFields to use BuildValuesDoc

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago hail: Fix result for node evacuation
Michael Hanselmann [Fri, 7 Oct 2011 09:58:09 +0000 (11:58 +0200)]
hail: Fix result for node evacuation

According to the iallocator documentation the “node-evacuate” call needs
to return a list of jobs, not a list of lists of jobs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Use TemplateHaskell to create LUXI operations
Agata Murawska [Tue, 4 Oct 2011 15:29:39 +0000 (17:29 +0200)]
Use TemplateHaskell to create LUXI operations

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Documentation update for ovfconverter
Agata Murawska [Wed, 5 Oct 2011 09:25:13 +0000 (11:25 +0200)]
Documentation update for ovfconverter

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Fixes for ovfconverter + vmware
Agata Murawska [Wed, 5 Oct 2011 09:24:56 +0000 (11:24 +0200)]
Fixes for ovfconverter + vmware

Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Demote to warnings the errors in --ignore-errors
Andrea Spadaccini [Tue, 4 Oct 2011 14:55:48 +0000 (15:55 +0100)]
Demote to warnings the errors in --ignore-errors

Treat the gnt-cluster verify errors identified by the error codes in
--ignore-errors as warnings; just print a warning message for the user.
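
The demotion can be sketched in Python like this. The function names
and the error code strings used in the test are placeholders, and the
rule that a run passes when only ignored codes were found is an
assumption of this sketch rather than something stated above.

```python
def classify(ecode, ignore_errors):
    """Return 'WARNING' for ignored error codes and 'ERROR' otherwise."""
    return "WARNING" if ecode in ignore_errors else "ERROR"


def verify_ok(found, ignore_errors):
    """Assumed rule: a run fails only if a non-ignored code was found."""
    return all(classify(code, ignore_errors) == "WARNING"
               for code in found)
```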

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Add --ignore-errors parameter to cluster verify
Andrea Spadaccini [Tue, 4 Oct 2011 12:26:43 +0000 (13:26 +0100)]
Add --ignore-errors parameter to cluster verify

lib/cli.py
- add IGNORE_ERROR_OPT;

client/gnt_cluster.py
- pass the ignore_errors parameter to the opcodes

lib/opcodes.py
- update OpClusterVerifyConfig, OpClusterVerify and OpClusterVerifyGroup
  to accept the ignore_errors parameter

lib/cmdlib.py
- pass the ignore_errors parameter to the opcodes that need it

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Move cluster verify error codes to constants
Andrea Spadaccini [Tue, 4 Oct 2011 12:26:43 +0000 (13:26 +0100)]
Move cluster verify error codes to constants

- move the cluster verify error codes from cmdlib._VerifyErrors to
  constants;
- add to each of them the CV (Cluster Verify) prefix;
- add the CV_ALL_ECODES and CV_ALL_ECODES_STRINGS constants;
- wrap the lines that exceed 80 characters after changing the error
  code names to the new ones.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years ago Restore backend.GetMasterInfo return values order
Andrea Spadaccini [Wed, 5 Oct 2011 10:10:35 +0000 (11:10 +0100)]
Restore backend.GetMasterInfo return values order

Change 5a8648eb609f7e3a8d7ad7f82e93cfdd467a8fb5 changed the order of the
return values of backend.GetMasterInfo(). This broke the users of the
master_info RPC.

This change restores the original order, and adds a comment in
bootstrap.py about the new value added to the return values of
master_info.
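
Why the order matters for positional callers, as a Python sketch. The
tuple contents and function names below are hypothetical and much
simpler than the real master_info result.

```python
def master_info_reordered():
    # New value inserted in the middle: later positions shift.
    return ("eth0", "new-value", "192.0.2.1")


def master_info_restored():
    # Original order kept, new value appended at the end.
    return ("eth0", "192.0.2.1", "new-value")


def old_caller(result):
    """A caller written before the change; it reads the first two values."""
    (netdev, ip) = result[:2]
    return ip
```

An existing caller keeps working with the restored order, while the
reordered tuple silently hands it the wrong value.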

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Add cluster netmask parameter
Andrea Spadaccini [Mon, 3 Oct 2011 14:23:55 +0000 (15:23 +0100)]
Add cluster netmask parameter

Add the master_netmask cluster parameter, which represents the netmask
of the master IP, encoded as a CIDR suffix.

This parameter can be set via the --master-netmask option of
gnt-cluster init and gnt-cluster modify. The default behaviour is
consistent with the old default (/32 for IPv4 and /128 for IPv6).
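
The defaults correspond to a suffix that covers exactly one address. A
Python 3 sketch using the standard ipaddress module (the function name
default_master_netmask is made up here, and Ganeti itself does not use
this module):

```python
import ipaddress


def default_master_netmask(master_ip):
    """Return the CIDR suffix covering exactly one address (32 or 128)."""
    return ipaddress.ip_address(master_ip).max_prefixlen
```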

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Add ValidateNetmask and GetClass IPAddress methods
Andrea Spadaccini [Mon, 3 Oct 2011 14:22:10 +0000 (15:22 +0100)]
Add ValidateNetmask and GetClass IPAddress methods

Add the following methods to netutils.IPAddress:
* ValidateNetmask
* GetClassFromIpVersion
* GetClassFromIpFamily

Also, add related tests to the test suite.
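
A simplified Python sketch of what such helpers might look like. The
classes and functions below are stand-ins written for this example, not
the netutils code, and rejecting a zero-length netmask is an
assumption.

```python
import socket


class IP4Sketch(object):
    family = socket.AF_INET
    version = 4
    iplen = 32   # number of bits in an address


class IP6Sketch(object):
    family = socket.AF_INET6
    version = 6
    iplen = 128


_CLASSES = (IP4Sketch, IP6Sketch)


def class_from_version(version):
    """Return the address class for IP version 4 or 6."""
    for cls in _CLASSES:
        if cls.version == version:
            return cls
    raise ValueError("unknown IP version: %r" % (version,))


def class_from_family(family):
    """Return the address class for a socket address family."""
    for cls in _CLASSES:
        if cls.family == family:
            return cls
    raise ValueError("unknown address family: %r" % (family,))


def validate_netmask(cls, netmask):
    """True if 'netmask' is an integer CIDR suffix valid for 'cls'."""
    return isinstance(netmask, int) and 0 < netmask <= cls.iplen
```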

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Merge branch 'devel-2.5'
Andrea Spadaccini [Tue, 4 Oct 2011 18:34:27 +0000 (19:34 +0100)]
Merge branch 'devel-2.5'

* devel-2.5:
  cluster-merge: log an info message at node readd
  Bump version to 2.5.0~rc1
  Fix issue when verifying cluster files
  Revert "utils.log: Write error messages to stderr"
  Fix adding nodes after commit 64c7b3831dc
  LUClusterVerifyGroup: Spread SSH checks over more nodes
  Optimise cli.JobExecutor with many pending jobs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years ago Merge branch 'stable-2.5' into devel-2.5
Andrea Spadaccini [Tue, 4 Oct 2011 18:31:46 +0000 (19:31 +0100)]
Merge branch 'stable-2.5' into devel-2.5

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years ago cluster-merge: log an info message at node readd
Guido Trotter [Tue, 4 Oct 2011 18:02:39 +0000 (14:02 -0400)]
cluster-merge: log an info message at node readd

Node readd can take a long time, so it's good to have info messages to
see progress.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

12 years ago Bump version to 2.5.0~rc1 (tag: v2.5.0rc1)
Michael Hanselmann [Tue, 4 Oct 2011 09:29:34 +0000 (11:29 +0200)]
Bump version to 2.5.0~rc1

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Fix Makefile rules for QCHelper.hs
Iustin Pop [Mon, 3 Oct 2011 16:25:24 +0000 (18:25 +0200)]
Fix Makefile rules for QCHelper.hs

Include QCHelper.hs in the distributed files, and also exclude it and
the THH.hs file from coverage reports.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

12 years ago Fix issue when verifying cluster files
Michael Hanselmann [Mon, 3 Oct 2011 14:58:22 +0000 (16:58 +0200)]
Fix issue when verifying cluster files

If a cluster has any non-master-candidate nodes, those don't contain all
files (e.g. config.data). With commit aef59ae764dc (March 31st, 2011)
the logic was changed and subsequently verifying a cluster with non-mc
nodes would complain.

This patch fixes this issue by changing the algorithm. It also adds an
additional check for files which shouldn't exist on a machine. A newly
added unittest is included.
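
A rough Python sketch of a role-aware file check. The data layout, the
function name check_files and the interpretation that files reserved
for master candidates are the ones that "shouldn't exist" elsewhere are
assumptions; the actual algorithm in the patch may differ.

```python
def check_files(expected_everywhere, expected_mc_only, nodes):
    """Return (missing, unexpected) lists of (node, filename) pairs.

    'nodes' maps a node name to (is_master_candidate, set_of_files).
    """
    missing = []
    unexpected = []
    for name, (is_mc, present) in sorted(nodes.items()):
        wanted = set(expected_everywhere)
        if is_mc:
            wanted |= set(expected_mc_only)
        missing.extend((name, f) for f in sorted(wanted - present))
        # Assumed rule: master-candidate files must not exist elsewhere.
        if not is_mc:
            extra = set(expected_mc_only) & present
            unexpected.extend((name, f) for f in sorted(extra))
    return (missing, unexpected)
```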

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Revert "utils.log: Write error messages to stderr"
Michael Hanselmann [Mon, 3 Oct 2011 10:46:27 +0000 (12:46 +0200)]
Revert "utils.log: Write error messages to stderr"

This reverts commit 34aa8b7c4bb6f5e2e788108e024c9cd70bdb3431. Writing
error messages to stderr would also include backtraces, something we
tried to avoid in the past.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Fix adding nodes after commit 64c7b3831dc
Michael Hanselmann [Mon, 3 Oct 2011 10:04:09 +0000 (12:04 +0200)]
Fix adding nodes after commit 64c7b3831dc

Commit 64c7b3831dc changed the RPC call for verifying SSH connections.
Unfortunately this case in adding nodes was missed.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years ago Some TH simplifications
Iustin Pop [Fri, 30 Sep 2011 07:46:49 +0000 (09:46 +0200)]
Some TH simplifications

Now that the basic code works, let's use some aliases for simpler code
and less ))))))))).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years ago A few minor test improvements
Iustin Pop [Wed, 21 Sep 2011 09:43:57 +0000 (18:43 +0900)]
A few minor test improvements

This patch adds a few niceties to the test suite:

- allow matching test groups case-insensitively, and emit a warning
  when a given test group name doesn't match anything
- add a new operator that is similar to assertEqual in Python: it
  tests for equality and emits the two values in case of error

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
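
The group matching described in the patch above can be sketched as
follows. This is only an illustration in Python (the test suite itself
is Haskell); the function name select_test_groups is hypothetical, and
it assumes that "matching" means comparing whole group names while
ignoring case.

```python
import warnings


def select_test_groups(available, requested):
    """Pick test groups by name, ignoring case (hypothetical helper).

    Returns the matching names as spelled in `available` and emits one
    warning for every requested name that matches nothing.
    """
    by_lower = dict((name.lower(), name) for name in available)
    selected = []
    for name in requested:
        match = by_lower.get(name.lower())
        if match is None:
            warnings.warn("test group %r does not match anything" % name)
        else:
            selected.append(match)
    return selected
```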

12 years agoUse TemplateHaskell to decorate tests with names
Iustin Pop [Tue, 20 Sep 2011 08:08:55 +0000 (17:08 +0900)]
Use TemplateHaskell to decorate tests with names

This changes the error message from "Test 4 failed …" to "Test
prop_Loader_mergeData failed", which is much more readable. It also
removes the duplication of test suite names in the test.hs file.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoUse TemplateHaskell to generate opcode serialisation
Iustin Pop [Wed, 21 Sep 2011 09:23:04 +0000 (18:23 +0900)]
Use TemplateHaskell to generate opcode serialisation

This replaces the hand-coded opcode serialisation code with
auto-generation based on TemplateHaskell.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoUse TemplateHaskell to build the opID function
Iustin Pop [Tue, 20 Sep 2011 07:17:53 +0000 (16:17 +0900)]
Use TemplateHaskell to build the opID function

This replaces the hand-coded opID with one automatically generated
from the constructor names, similar to the way Python does it, except
it's done at compilation time as opposed to runtime.

Again, the code line delta does not favour this patch, but it
replaces error-prone manual code with auto-generated code; if we add
support for more opcodes, this will help a lot.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
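
As an illustration of deriving an opcode ID from a constructor name,
here is a minimal Python sketch. The exact naming rule is an assumption
(CamelCase split at lower-to-upper boundaries, joined with underscores
and upper-cased), and name_to_op_id is a hypothetical name, not the
function used in the code base.

```python
import re

# Matches a lowercase letter or digit directly followed by an uppercase
# letter, i.e. a word boundary inside a CamelCase name.
_CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def name_to_op_id(constructor_name):
    """Turn e.g. 'OpTestDelay' into 'OP_TEST_DELAY' (assumed rule)."""
    if not constructor_name.startswith("Op"):
        raise ValueError("not an opcode name: %r" % constructor_name)
    return _CAMEL_BOUNDARY.sub(r"\1_\2", constructor_name).upper()
```

The Haskell version performs the equivalent conversion at compile
time, so a misspelt ID cannot appear at runtime.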

12 years agoUse TemplateHaskell instead of hand-coded instances
Iustin Pop [Tue, 20 Sep 2011 06:43:08 +0000 (15:43 +0900)]
Use TemplateHaskell instead of hand-coded instances

This patch replaces the current hard-coded JSON instances (all alike,
just manual conversion to/from string) with auto-generated code based
on Template Haskell
(http://www.haskell.org/haskellwiki/Template_Haskell).

The line count does not go down: the helper module is well
documented, so overall we gain about 70 lines. However, if we ignore
comments we're in good shape, and any future addition of such data
types will be much simpler and less error-prone.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoRename some helper functions for consistency
Iustin Pop [Tue, 20 Sep 2011 04:53:45 +0000 (13:53 +0900)]
Rename some helper functions for consistency

This changes the names of some helper functions so that future
patches touch less unrelated code. The change replaces shortened
prefixes with the full type name.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoSplit part of Utils.hs into JSON.hs
Iustin Pop [Tue, 20 Sep 2011 14:58:36 +0000 (23:58 +0900)]
Split part of Utils.hs into JSON.hs

Utils is a bit big, let's split the JSON stuff (not all of it) into a
separate module that doesn't have any other dependencies.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoLUClusterVerifyGroup: Spread SSH checks over more nodes
Michael Hanselmann [Fri, 30 Sep 2011 15:48:28 +0000 (17:48 +0200)]
LUClusterVerifyGroup: Spread SSH checks over more nodes

When verifying a group the code would always check SSH to all nodes in
the same group, as well as the first node for every other group. On big
clusters this can cause issues since many nodes will try to connect to
the first node of another group at the same time. This patch changes the
algorithm to choose a different node every time.

A unittest for the selection algorithm is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
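
One way to spread the checks, sketched in Python under stated
assumptions: every node in the verified group gets one target per
foreign group, and targets are handed out round-robin. The function
name and the data layout are hypothetical and not taken from the patch.

```python
import itertools


def select_ssh_check_nodes(own_nodes, other_groups):
    """Map each node of the verified group to one node per other group.

    own_nodes: names of the nodes in the group being verified.
    other_groups: dict mapping group name to a list of node names.
    """
    # One endless round-robin iterator per non-empty foreign group.
    cyclers = [itertools.cycle(sorted(nodes))
               for _, nodes in sorted(other_groups.items()) if nodes]
    result = {}
    for node in sorted(own_nodes):
        # Each call to next() advances the cycle, so consecutive nodes
        # are pointed at different targets in the same foreign group.
        result[node] = [next(cycler) for cycler in cyclers]
    return result
```

With three local nodes and a foreign group of two, the targets
alternate between the two foreign nodes instead of all landing on the
first one.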

12 years agoOptimise cli.JobExecutor with many pending jobs
Iustin Pop [Fri, 30 Sep 2011 14:35:29 +0000 (16:35 +0200)]
Optimise cli.JobExecutor with many pending jobs

In the case we submit many pending jobs (> 100) to the masterd, the
JobExecutor 'spams' the master daemon with status requests for the
status of all the jobs, even though in the end it will only choose a
single job for polling.

This is very sub-optimal, because when the master is busy processing
small/fast jobs, this query forces reading all the jobs from
disk. Restricting the 'window' of jobs that we query from the entire
set to a smaller subset makes a huge difference (masterd only, 0s
delay jobs, all jobs to tmpfs thus no I/O involved):

- submitting/waiting for 500 jobs:
  - before: ~21 s
  - after:   ~5 s
- submitting/waiting for 1K jobs:
  - before: ~76 s
  - after:   ~8 s

This is with a batch of 25 jobs. With a batch of 50 jobs, it goes from
8s to 12s. I think that choosing the 'best' job for nice output only
matters with a small number of jobs, and that for more than that
people will not actually watch the jobs. So changing from 'perfect
job' to 'best job in the first 25' should be OK.

Note that most jobs won't execute as fast as 0 delay, but this is
still a good improvement.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
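
The windowing idea can be sketched as below. This is a simplified
illustration, not the JobExecutor code: choose_job_to_poll and
query_status are hypothetical names, and the status strings and their
preference order are assumptions.

```python
def choose_job_to_poll(pending_job_ids, query_status, window=25):
    """Pick a job to poll, looking only at the first `window` jobs.

    query_status is a callable taking a list of job IDs and returning
    their statuses in the same order (a stand-in for the master query).
    """
    if not pending_job_ids:
        raise ValueError("no pending jobs")
    candidates = pending_job_ids[:window]
    # A single query over a bounded subset instead of all pending jobs.
    statuses = query_status(candidates)
    # Assumed preference: a running job gives the nicest output, then a
    # waiting one; otherwise fall back to the oldest pending job.
    for preferred in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == preferred:
                return job_id
    return candidates[0]
```

The cost of each query is then bounded by the window size rather than
by the total number of pending jobs.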

12 years agoMerge branch 'devel-2.5'
Andrea Spadaccini [Fri, 30 Sep 2011 15:05:52 +0000 (16:05 +0100)]
Merge branch 'devel-2.5'

* devel-2.5:
  Use --yes to deactivate master ip in cluster merge
  Use deactivate-master-ip in cluster-merge
  Add gnt-cluster commands to toggle the master IP
  Split starting and stopping master IP and daemons
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Migration: warn the user about hv version mismatch
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode

Conflicts:
lib/backend.py
   - no real conflicts
lib/constants.py
   - preserve both changes
lib/rapi/rlib2.py
   - keep master
lib/rpc.py
   - no real conflicts
tools/cluster-merge
   - keep devel-2.5

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoMerge branch 'stable-2.5' into devel-2.5
Andrea Spadaccini [Fri, 30 Sep 2011 14:25:48 +0000 (15:25 +0100)]
Merge branch 'stable-2.5' into devel-2.5

* stable-2.5:
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoUse --yes to deactivate master ip in cluster merge
Guido Trotter [Fri, 30 Sep 2011 12:44:46 +0000 (13:44 +0100)]
Use --yes to deactivate master ip in cluster merge

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

12 years agoUse deactivate-master-ip in cluster-merge
Andrea Spadaccini [Thu, 29 Sep 2011 19:22:22 +0000 (20:22 +0100)]
Use deactivate-master-ip in cluster-merge

Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit e87e5afb163a2a51e783914f78120406f6b5f4e0)

12 years agoAdd gnt-cluster commands to toggle the master IP
Andrea Spadaccini [Thu, 29 Sep 2011 18:48:18 +0000 (19:48 +0100)]
Add gnt-cluster commands to toggle the master IP

lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb926117f623a61776b90e992126f953679ca066)

Conflicts:

test/docs_unittest.py
  - kept devel-2.5 version, without the RAPI opcode checks

12 years agoSplit starting and stopping master IP and daemons
Andrea Spadaccini [Thu, 29 Sep 2011 12:05:23 +0000 (13:05 +0100)]
Split starting and stopping master IP and daemons

lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb460cf7e9bda225e4f1c070cd6b4fac1b3f6696)

12 years agoUse deactivate-master-ip in cluster-merge
Andrea Spadaccini [Thu, 29 Sep 2011 19:22:22 +0000 (20:22 +0100)]
Use deactivate-master-ip in cluster-merge

Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoAdd gnt-cluster commands to toggle the master IP
Andrea Spadaccini [Thu, 29 Sep 2011 18:48:18 +0000 (19:48 +0100)]
Add gnt-cluster commands to toggle the master IP

lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agoSplit starting and stopping master IP and daemons
Andrea Spadaccini [Thu, 29 Sep 2011 12:05:23 +0000 (13:05 +0100)]
Split starting and stopping master IP and daemons

lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

12 years agolistrunner: Don't pass arguments if there are none
Michael Hanselmann [Fri, 30 Sep 2011 09:54:20 +0000 (11:54 +0200)]
listrunner: Don't pass arguments if there are none

If no arguments were specified the “exec_args” variable was “None”,
leading to the command being run as “… ./… None”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
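
The failure mode is easy to reproduce: formatting None into a string
yields the literal text "None". A minimal sketch of the guarded
version, assuming the command is built by string formatting
(build_command is a hypothetical name, not the listrunner function):

```python
def build_command(executable, exec_args=None):
    """Build the remote command line, omitting absent arguments."""
    command = "./%s" % executable
    # Only append the arguments when some were actually given;
    # "%s %s" % (command, None) would produce "... None".
    if exec_args:
        command = "%s %s" % (command, exec_args)
    return command
```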

12 years agossh: Quote strings in error message
Michael Hanselmann [Fri, 30 Sep 2011 09:29:50 +0000 (11:29 +0200)]
ssh: Quote strings in error message

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

12 years agoutils.log: Write error messages to stderr
Michael Hanselmann [Fri, 30 Sep 2011 09:28:59 +0000 (11:28 +0200)]
utils.log: Write error messages to stderr

When “gnt-cluster copyfile” failed it would only print “Copy of file …
to node … failed”. A detailed message is written using logging.error.
Writing error messages to stderr can be helpful in figuring out what
went wrong (the messages also go to the log file, but not everyone might
know about it).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agoAdd signal handling doc to hbal man page
Iustin Pop [Fri, 30 Sep 2011 08:30:44 +0000 (10:30 +0200)]
Add signal handling doc to hbal man page

Also remove a bug note, since hbal has been able to execute jobs
directly for a long time now.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdapt non-KVM hypervisors to new migration RPCs
Andrea Spadaccini [Fri, 23 Sep 2011 13:04:05 +0000 (14:04 +0100)]
Adapt non-KVM hypervisors to new migration RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdd memory transfer progress info to migration
Andrea Spadaccini [Fri, 23 Sep 2011 12:51:48 +0000 (13:51 +0100)]
Add memory transfer progress info to migration

* hypervisor/hv_kvm.py
  - parse the memory transfer status

* cmdlib.py
  - represent memory transfer info, if available

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoMake migration RPC non-blocking
Andrea Spadaccini [Thu, 22 Sep 2011 18:19:07 +0000 (19:19 +0100)]
Make migration RPC non-blocking

To add status reporting for the KVM migration, the instance_migrate RPC
must be non-blocking. Moreover, there must be a way to represent the
migration status and a way to fetch it.

* constants.py:
  - add constants representing the migration statuses

* objects.py:
  - add the MigrationStatus object

* hypervisor/hv_base.py
  - change the FinalizeMigration method name to FinalizeMigrationDst
  - add the FinalizeMigrationSource method
  - add the GetMigrationStatus method

* hypervisor/hv_kvm.py
  - change the implementation of MigrateInstance to be non-blocking
    (i.e. do not poll the status of the migration)
  - implement the new methods defined in BaseHypervisor

* backend.py, server/noded.py, rpc.py
  - add methods to call the new hypervisor methods
  - fix documentation of the existing methods to reflect the changes

* cmdlib.py
  - adapt the logic of TLMigrateInstance._ExecMigration to reflect
    the changes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
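
With a non-blocking RPC, the polling moves to the caller. The loop
below is a sketch of that shape only: the status values, the attribute
names on the status object and the wait_for_migration helper are
assumptions for illustration, not the definitions added to
constants.py, objects.py or cmdlib.py.

```python
import time

# Hypothetical status values; the real constants may differ.
MIGRATION_ACTIVE = "active"
MIGRATION_COMPLETED = "completed"
MIGRATION_FAILED = "failed"


class MigrationStatus(object):
    """Minimal stand-in for the status object described above."""

    def __init__(self, status, transferred_ram=None, total_ram=None):
        self.status = status
        self.transferred_ram = transferred_ram
        self.total_ram = total_ram


def wait_for_migration(start_migration, get_status, report,
                       poll_interval=2.0, sleep=time.sleep):
    """Start a migration, then poll until it leaves the active state.

    Returns True if the final status is MIGRATION_COMPLETED.
    """
    start_migration()  # assumed to return immediately
    while True:
        current = get_status()
        if current.status != MIGRATION_ACTIVE:
            return current.status == MIGRATION_COMPLETED
        # Progress is optional; report it only when both figures exist.
        if current.transferred_ram is not None and current.total_ram:
            percent = 100 * current.transferred_ram // current.total_ram
            report("memory transfer: %d%%" % percent)
        sleep(poll_interval)
```

Because the caller owns the loop, it can report progress between polls,
which a blocking RPC could not do.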

12 years agoMove _TimeoutExpired to utils
Andrea Spadaccini [Wed, 28 Sep 2011 14:56:22 +0000 (15:56 +0100)]
Move _TimeoutExpired to utils

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

12 years agoAdd an allocation limit to hspace
Iustin Pop [Fri, 23 Sep 2011 07:55:05 +0000 (16:55 +0900)]
Add an allocation limit to hspace

This is very useful for testing/benchmarking.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoSmall simplification in tryAlloc
Iustin Pop [Fri, 23 Sep 2011 07:32:58 +0000 (16:32 +0900)]
Small simplification in tryAlloc

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoChange how node pairs are generated/used
Iustin Pop [Fri, 23 Sep 2011 06:33:31 +0000 (15:33 +0900)]
Change how node pairs are generated/used

Currently, the node pairs used for allocation are a simple [(primary,
secondary)] list of tuples, as this is how they were used before the
previous patch. However, for that patch, we use them separately per
primary node, and we have to unpack this list right after generation.

Therefore it makes sense to directly generate the list in the correct
form, and remove the split from tryAlloc. This should not be slower
than the previous patch, at least, possibly even faster.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12 years agoParallelise instance allocation/capacity computation
Iustin Pop [Fri, 23 Sep 2011 05:53:35 +0000 (14:53 +0900)]
Parallelise instance allocation/capacity computation

This patch finally enables parallelisation in instance placement.

My original try for enabling this didn't work well, but it took a
while (and liberal use of threadscope) to understand why. The attempt
was to simply `parMap rwhnf` over allocateOnPair, however this is not
good as for a 100-node cluster, this will create roughly 100*100
sparks, which is way too much: each individual spark is too small, and
there are too many sparks. Furthermore, the combining of the
allocateOnPair results was done single-threaded, losing even more
parallelism. So we had O(n²) sparks to run in parallel, each spark of
size O(1), and we combine single-threadedly a list of O(n²) length.

The new algorithm does a two-stage process: we group the list of valid
pairs per primary node, relying on the fact that usually the secondary
nodes are somewhat balanced (it's definitely true for 'blank' cluster
computations). We then run in parallel over all primary nodes, doing
both the individual allocateOnPair calls *and* the concatAllocs
summarisation. This leaves only the summing of the primary group
results together for the main execution thread. The new numbers are:
O(n) sparks, each of size O(n), and we combine single-threadedly a
list of O(n) length.

This translates directly into a reasonable speedup (relative numbers
for allocation of 3 instances on a 120-node cluster):

- original code (non-threaded): 1.00 (baseline)
- first attempt (2 threads):    0.81 (20% slowdown‼)
- new code (non-threaded):      1.00 (no slowdown)
- new code (threaded/1 thread): 1.00
- new code (2 threads):         1.65 (65% faster)

We don't get a 2x speedup, because the GC time increases. Fortunately
the code should scale well to more cores, so on many-core machines we
should get a nice overall speedup. On a different machine with 4
cores, we get 3.29x.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
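
The two-stage structure can be sketched in Python as follows. This
shows only the shape of the algorithm (group by primary, reduce each
group in a worker, combine O(n) partial results in the caller); the
function names and the score callback are hypothetical, and Python
threads will not reproduce the speedups measured for the Haskell code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_primary(pairs):
    """Turn [(primary, secondary)] into {primary: [secondary, ...]}."""
    groups = defaultdict(list)
    for primary, secondary in pairs:
        groups[primary].append(secondary)
    return dict(groups)


def best_in_group(primary, secondaries, score):
    """Stage one: evaluate one primary's pairs and summarise them."""
    return min((score(primary, sec), primary, sec) for sec in secondaries)


def best_allocation(pairs, score, max_workers=2):
    """Return (score, primary, secondary) with the lowest score.

    Raises ValueError when `pairs` is empty.
    """
    groups = group_by_primary(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # O(n) tasks, each doing O(n) work, instead of O(n^2) tiny ones.
        partials = list(pool.map(
            lambda item: best_in_group(item[0], item[1], score),
            groups.items()))
    # Stage two: the caller combines only one result per primary node.
    return min(partials)
```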