Andrea Spadaccini [Mon, 3 Oct 2011 14:23:55 +0000 (15:23 +0100)]
Add cluster netmask parameter
Add the master_netmask cluster parameter, that represents the netmask of
the master IP, encoded as a CIDR suffix.
This parameter can be set via the --master-netmask of gnt-cluster init
and gnt-cluster modify. The default behaviour is to be consistent with
the old default (/32 for IPv4 and /128 for IPv6).
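A minimal sketch of the default described above, using Python's standard ipaddress module rather than Ganeti's netutils. The function names and the rule that a suffix of 0 is rejected are assumptions made for illustration, not the project's code.

```python
import ipaddress

# Defaults named in the commit message: /32 for IPv4, /128 for IPv6.
DEFAULT_NETMASK = {4: 32, 6: 128}


def default_master_netmask(master_ip):
    """Return the default CIDR suffix for the address family of master_ip."""
    return DEFAULT_NETMASK[ipaddress.ip_address(master_ip).version]


def validate_master_netmask(master_ip, netmask):
    """Check that netmask is a CIDR suffix that fits master_ip's family.

    Rejecting 0 is an assumption of this sketch.
    """
    max_len = ipaddress.ip_address(master_ip).max_prefixlen
    return isinstance(netmask, int) and 0 < netmask <= max_len
```

With these helpers, an IPv4 master IP defaults to 32 and accepts any suffix from 1 to 32.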
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Mon, 3 Oct 2011 14:22:10 +0000 (15:22 +0100)]
Add ValidateNetmask and GetClass IPAddress methods
Add the following methods to netutils.IPAddress:
* ValidateNetmask
* GetClassFromIpVersion
* GetClassFromIpFamily
Also, add related tests to the test suite.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Tue, 4 Oct 2011 18:34:27 +0000 (19:34 +0100)]
Merge branch 'devel-2.5'
* devel-2.5:
cluster-merge: log an info message at node readd
Bump version to 2.5.0~rc1
Fix issue when verifying cluster files
Revert "utils.log: Write error messages to stderr"
Fix adding nodes after commit 64c7b3831dc
LUClusterVerifyGroup: Spread SSH checks over more nodes
Optimise cli.JobExecutor with many pending jobs
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Andrea Spadaccini [Tue, 4 Oct 2011 18:31:46 +0000 (19:31 +0100)]
Merge branch 'stable-2.5' into devel-2.5
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Guido Trotter [Tue, 4 Oct 2011 18:02:39 +0000 (14:02 -0400)]
cluster-merge: log an info message at node readd
node readd can take a long time, it's good to have info messages to see
progress.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
Michael Hanselmann [Tue, 4 Oct 2011 09:29:34 +0000 (11:29 +0200)]
Bump version to 2.5.0~rc1
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 3 Oct 2011 16:25:24 +0000 (18:25 +0200)]
Fix Makefile rules for QCHelper.hs
Include QCHelper.hs in the distributed files, and also exclude it and
the THH.hs file from coverage reports.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
Michael Hanselmann [Mon, 3 Oct 2011 14:58:22 +0000 (16:58 +0200)]
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain all
files (e.g. config.data). With commit aef59ae764dc (March 31st, 2011)
the logic was changed and subsequently verifying a cluster with non-mc
nodes would complain.
This patch fixes this issue by changing the algorithm. It also adds an
additional check for files which shouldn't exist on a machine. A newly
added unittest is included.
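The two checks described above (files that are missing, and files that should not exist on a node) can be sketched as a set comparison. This is an illustration only: apart from config.data, the file names are hypothetical, and the real algorithm in the patch may differ.

```python
# Hypothetical file sets: only master candidates hold config.data.
COMMON_FILES = {"known_hosts", "ssconf_cluster_name"}
MC_ONLY_FILES = {"config.data"}


def expected_files(is_master_candidate):
    """Return the set of files a node of the given role should have."""
    if is_master_candidate:
        return COMMON_FILES | MC_ONLY_FILES
    return set(COMMON_FILES)


def verify_node_files(expected, present):
    """Return (missing, unexpected) file names as sorted lists."""
    expected, present = set(expected), set(present)
    return sorted(expected - present), sorted(present - expected)
```

A non-master-candidate node that holds config.data is then reported under "unexpected" instead of every such node being flagged for missing it.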
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 3 Oct 2011 10:46:27 +0000 (12:46 +0200)]
Revert "utils.log: Write error messages to stderr"
This reverts commit 34aa8b7c4bb6f5e2e788108e024c9cd70bdb3431. Writing
error messages to stderr would also include backtraces, something we
tried to avoid in the past.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 3 Oct 2011 10:04:09 +0000 (12:04 +0200)]
Fix adding nodes after commit 64c7b3831dc
Commit 64c7b3831dc changed the RPC call for verifying SSH connections.
Unfortunately this case in adding nodes was missed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 30 Sep 2011 07:46:49 +0000 (09:46 +0200)]
Some TH simplifications
Now that the basic code works, let's use some aliases for simpler code
and less ))))))))).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 21 Sep 2011 09:43:57 +0000 (18:43 +0900)]
A few minor test improvements
This patch adds a few niceties to the test suite:
- allow matching test groups case-insensitively and emit warnings when
we give test group names that don't match anything
- add a new operator that is similar to assertEqual in Python: it
tests for equality and emits the two values in case of error
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Tue, 20 Sep 2011 08:08:55 +0000 (17:08 +0900)]
Use TemplateHaskell to decorate tests with names
This makes the error message change from "Test 4 failed …" to "Test
prop_Loader_mergeData failed", which is much more readable. It also
removes the duplication of test suite names in the test.hs file.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Wed, 21 Sep 2011 09:23:04 +0000 (18:23 +0900)]
Use TemplateHaskell to generate opcode serialisation
This replaces the hand-coded opcode serialisation code with
auto-generation based on TemplateHaskell.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Tue, 20 Sep 2011 07:17:53 +0000 (16:17 +0900)]
Use TemplateHaskell to build the opID function
This replaces the hand-coded opID with one automatically generated
from the constructor names, similar to the way Python does it, except
it's done at compilation time as opposed to runtime.
Again, the code line delta does not favour this patch, but it
replaces error-prone manual code with auto-generated code; in case
we add more opcode support, this will help a lot.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Tue, 20 Sep 2011 06:43:08 +0000 (15:43 +0900)]
Use TemplateHaskell instead of hand-coded instances
This patch replaces the current hard-coded JSON instances (all alike,
just manual conversion to/from string) with auto-generated code based
on Template Haskell
(http://www.haskell.org/haskellwiki/Template_Haskell).
The reduction in code lines is not big: the helper module is well
documented, so overall we gain about 70 lines. However, if
we ignore comments we're in good shape, and any future addition of
such data types will be much simpler and less error-prone.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Tue, 20 Sep 2011 04:53:45 +0000 (13:53 +0900)]
Rename some helper functions for consistency
This changes the names for some helper functions so that future
patches are touching less unrelated code. The change replaces
shortened prefixes with the full type name.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Tue, 20 Sep 2011 14:58:36 +0000 (23:58 +0900)]
Split part of Utils.hs into JSON.hs
Utils is a bit big, let's split the JSON stuff (not all of it) into a
separate module that doesn't have any other dependencies.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Michael Hanselmann [Fri, 30 Sep 2011 15:48:28 +0000 (17:48 +0200)]
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes in
the same group, as well as the first node for every other group. On big
clusters this can cause issues since many nodes will try to connect to
the first node of another group at the same time. This patch changes the
algorithm to choose a different node every time.
A unittest for the selection algorithm is included.
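One way to choose a different remote node every time is a round-robin over each foreign group, sketched below. This is an assumption about the general approach, not the selection algorithm from the patch; the function name and the input shape (a dict of group name to node names) are hypothetical.

```python
import itertools


def select_ssh_check_nodes(own_group, nodes_by_group):
    """Map each node of own_group to one node from every other group.

    Nodes of the other groups are handed out round-robin, so consecutive
    nodes of own_group do not all target the same remote node.
    """
    cyclers = {
        name: itertools.cycle(sorted(nodes))
        for name, nodes in nodes_by_group.items()
        if name != own_group and nodes
    }
    return {
        node: [next(cyclers[name]) for name in sorted(cyclers)]
        for node in sorted(nodes_by_group[own_group])
    }
```

With three nodes in the verified group and two nodes in another group, the remote targets alternate between those two nodes rather than always hitting the first.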
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 30 Sep 2011 14:35:29 +0000 (16:35 +0200)]
Optimise cli.JobExecutor with many pending jobs
In the case we submit many pending jobs (> 100) to the masterd, the
JobExecutor 'spams' the master daemon with requests for the
status of all the jobs, even though in the end it will only choose a
single job for polling.
This is very sub-optimal, because when the master is busy processing
small/fast jobs, this query forces reading all the jobs from
disk. Restricting the 'window' of jobs that we query from the entire
set to a smaller subset makes a huge difference (masterd only, 0s
delay jobs, all jobs to tmpfs thus no I/O involved):
- submitting/waiting for 500 jobs:
- before: ~21 s
- after: ~5 s
- submitting/waiting for 1K jobs:
- before: ~76 s
- after: ~8 s
This is with a batch of 25 jobs. With a batch of 50 jobs, it goes from
8s to 12s. I think that choosing the 'best' job for nice output only
matters with a small number of jobs, and that for more than that
people will not actually watch the jobs. So changing from 'perfect
job' to 'best job in the first 25' should be OK.
Note that most jobs won't execute as fast as 0 delay, but this is
still a good improvement.
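The 'window' idea can be sketched as follows. The query_status callable, the status strings and the preference order are hypothetical stand-ins; only the batch size of 25 comes from the message above.

```python
def choose_job_to_poll(pending_job_ids, query_status, window=25):
    """Pick one job to poll, querying only the first `window` pending jobs.

    query_status is a hypothetical callable that maps a list of job IDs
    to a list of status strings in the same order.
    """
    candidates = pending_job_ids[:window]
    if not candidates:
        return None
    statuses = query_status(candidates)
    # Prefer a running job, then a waiting one, else the first candidate.
    for wanted in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == wanted:
                return job_id
    return candidates[0]
```

The cost of each status query is now bounded by the window size instead of growing with the number of submitted jobs.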
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Fri, 30 Sep 2011 15:05:52 +0000 (16:05 +0100)]
Merge branch 'devel-2.5'
* devel-2.5:
Use --yes to deactivate master ip in cluster merge
Use deactivate-master-ip in cluster-merge
Add gnt-cluster commands to toggle the master IP
Split starting and stopping master IP and daemons
listrunner: Don't pass arguments if there are none
ssh: Quote strings in error message
utils.log: Write error messages to stderr
Add signal handling doc to hbal man page
Migration: warn the user about hv version mismatch
Fix handling of cluster verify hooks
Redistribute the RAPI certificate
QA: Add tests for instance start/stop via RAPI
RAPI: Fix wrong check on instance shutdown
baserlib: Accept empty body in FillOpcode
Conflicts:
lib/backend.py
- no real conflicts
lib/constants.py
- preserve both changes
lib/rapi/rlib2.py
- keep master
lib/rpc.py
- no real conflicts
tools/cluster-merge
- keep devel-2.5
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Andrea Spadaccini [Fri, 30 Sep 2011 14:25:48 +0000 (15:25 +0100)]
Merge branch 'stable-2.5' into devel-2.5
* stable-2.5:
listrunner: Don't pass arguments if there are none
ssh: Quote strings in error message
utils.log: Write error messages to stderr
Add signal handling doc to hbal man page
Fix handling of cluster verify hooks
Redistribute the RAPI certificate
QA: Add tests for instance start/stop via RAPI
RAPI: Fix wrong check on instance shutdown
baserlib: Accept empty body in FillOpcode
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Fri, 30 Sep 2011 12:44:46 +0000 (13:44 +0100)]
Use --yes to deactivate master ip in cluster merge
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
Andrea Spadaccini [Thu, 29 Sep 2011 19:22:22 +0000 (20:22 +0100)]
Use deactivate-master-ip in cluster-merge
Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit e87e5afb163a2a51e783914f78120406f6b5f4e0)
Andrea Spadaccini [Thu, 29 Sep 2011 18:48:18 +0000 (19:48 +0100)]
Add gnt-cluster commands to toggle the master IP
lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands
man/gnt-cluster.rst:
* Document the new commands
lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs
test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb926117f623a61776b90e992126f953679ca066)
Conflicts:
test/docs_unittest.py
- kept devel-2.5 version, without the RAPI opcode checks
Andrea Spadaccini [Thu, 29 Sep 2011 12:05:23 +0000 (13:05 +0100)]
Split starting and stopping master IP and daemons
lib/backend.py
* split StartMaster() in ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() in DeactivateMasterIp() and StopMasterDaemons()
lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs
lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb460cf7e9bda225e4f1c070cd6b4fac1b3f6696)
Andrea Spadaccini [Thu, 29 Sep 2011 19:22:22 +0000 (20:22 +0100)]
Use deactivate-master-ip in cluster-merge
Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Andrea Spadaccini [Thu, 29 Sep 2011 18:48:18 +0000 (19:48 +0100)]
Add gnt-cluster commands to toggle the master IP
lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands
man/gnt-cluster.rst:
* Document the new commands
lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs
test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Andrea Spadaccini [Thu, 29 Sep 2011 12:05:23 +0000 (13:05 +0100)]
Split starting and stopping master IP and daemons
lib/backend.py
* split StartMaster() in ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() in DeactivateMasterIp() and StopMasterDaemons()
lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs
lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 30 Sep 2011 09:54:20 +0000 (11:54 +0200)]
listrunner: Don't pass arguments if there are none
If no arguments were specified the “exec_args” variable was “None”,
leading to the command being run as “… ./… None”.
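A minimal sketch of this class of bug and its fix. The function and script names are hypothetical and do not reproduce the listrunner code.

```python
def build_command(script, exec_args=None):
    """Build the argv list, adding arguments only when some were given."""
    command = ["./" + script]
    if exec_args:
        command.extend(exec_args)
    return command


def build_command_buggy(script, exec_args=None):
    """Format the arguments unconditionally, so None leaks into the command."""
    return "./%s %s" % (script, exec_args)
```

The buggy variant turns a missing argument list into the literal string "None" at the end of the command line.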
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 30 Sep 2011 09:29:50 +0000 (11:29 +0200)]
ssh: Quote strings in error message
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 30 Sep 2011 09:28:59 +0000 (11:28 +0200)]
utils.log: Write error messages to stderr
When “gnt-cluster copyfile” failed it would only print “Copy of file …
to node … failed”. A detailed message is written using logging.error.
Writing error messages to stderr can be helpful in figuring out what
went wrong (the messages also go to the log file, but not everyone might
know about it).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 30 Sep 2011 08:30:44 +0000 (10:30 +0200)]
Add signal handling doc to hbal man page
Also remove a bug note, since hbal has long been able to execute jobs
directly.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Fri, 23 Sep 2011 13:04:05 +0000 (14:04 +0100)]
Adapt non-KVM hypervisors to new migration RPCs
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Fri, 23 Sep 2011 12:51:48 +0000 (13:51 +0100)]
Add memory transfer progress info to migration
* hypervisor/hv_kvm.py
- parse the memory transfer status
* cmdlib.py
- represent memory transfer info, if available
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Thu, 22 Sep 2011 18:19:07 +0000 (19:19 +0100)]
Make migration RPC non-blocking
To add status reporting for the KVM migration, the instance_migrate RPC
must be non-blocking. Moreover, there must be a way to represent the
migration status and a way to fetch it.
* constants.py:
- add constants representing the migration statuses
* objects.py:
- add the MigrationStatus object
* hypervisor/hv_base.py
- change the FinalizeMigration method name to FinalizeMigrationDst
- add the FinalizeMigrationSource method
- add the GetMigrationStatus method
* hypervisor/hv_kvm.py
- change the implementation of MigrateInstance to be non-blocking
(i.e. do not poll the status of the migration)
- implement the new methods defined in BaseHypervisor
* backend.py, server/noded.py, rpc.py
- add methods to call the new hypervisor methods
- fix documentation of the existing methods to reflect the changes
* cmdlib.py
- adapt the logic of TLMigrateInstance._ExecMigration to reflect
the changes
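With a non-blocking migrate call, the caller has to poll for the migration status itself. A rough sketch of such a loop, assuming a hypothetical get_status callable and status strings that are not taken from constants.py:

```python
import time


def wait_for_migration(get_status, poll_interval=1.0, max_polls=100):
    """Poll a non-blocking migration until it leaves the active state.

    get_status is a hypothetical callable returning one of the strings
    "active", "completed", "failed" or "cancelled".
    """
    for _ in range(max_polls):
        status = get_status()
        if status != "active":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("migration still active after %d polls" % max_polls)
```

Because the loop lives in the caller, progress information can be reported between polls.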
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Wed, 28 Sep 2011 14:56:22 +0000 (15:56 +0100)]
Move _TimeoutExpired to utils
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 23 Sep 2011 07:55:05 +0000 (16:55 +0900)]
Add an allocation limit to hspace
This is very useful for testing/benchmarking.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Fri, 23 Sep 2011 07:32:58 +0000 (16:32 +0900)]
Small simplification in tryAlloc
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Fri, 23 Sep 2011 06:33:31 +0000 (15:33 +0900)]
Change how node pairs are generated/used
Currently, the node pairs used for allocation are a simple [(primary,
secondary)] list of tuples, as this is how they were used before the
previous patch. However, for that patch, we use them separately per
primary node, and we have to unpack this list right after generation.
Therefore it makes sense to directly generate the list in the correct
form, and remove the split from tryAlloc. This should not be slower
than the previous patch, at least, possibly even faster.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Fri, 23 Sep 2011 05:53:35 +0000 (14:53 +0900)]
Parallelise instance allocation/capacity computation
This patch finally enables parallelisation in instance placement.
My original try for enabling this didn't work well, but it took a
while (and liberal use of threadscope) to understand why. The attempt
was to simply `parMap rwhnf` over allocateOnPair, however this is not
good as for a 100-node cluster, this will create roughly 100*100
sparks, which is way too much: each individual spark is too small, and
there are too many sparks. Furthermore, the combining of the
allocateOnPair results was done single-threaded, losing even more
parallelism. So we had O(n²) sparks to run in parallel, each spark of
size O(1), and we combine single-threadedly a list of O(n²) length.
The new algorithm does a two-stage process: we group the list of valid
pairs per primary node, relying on the fact that usually the secondary
nodes are somewhat balanced (it's definitely true for 'blank' cluster
computations). We then run in parallel over all primary nodes, doing
both the individual allocateOnPair calls *and* the concatAllocs
summarisation. This leaves only the summing of the primary group
results together for the main execution thread. The new numbers are:
O(n) sparks, each of size O(n), and we combine single-threadedly a
list of O(n) length.
This translates directly into a reasonable speedup (relative numbers
for allocation of 3 instances on a 120-node cluster):
- original code (non-threaded): 1.00 (baseline)
- first attempt (2 threads): 0.81 (20% slowdown‼)
- new code (non-threaded): 1.00 (no slowdown)
- new code (threaded/1 thread): 1.00
- new code (2 threads): 1.65 (65% faster)
We don't get a 2x speedup, because the GC time increases. Fortunately
the code should scale well to more cores, so on many-core machines we
should get a nice overall speedup. On a different machine with 4
cores, we get 3.29x.
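The two-stage structure can be illustrated in Python, although the real code is Haskell and uses sparks. The sketch below only shows the grouping by primary node: one task per primary, each doing both the per-pair work and its own summarisation, with a final combine over O(n) partial results. Python threads will not give the CPU speedup quoted above, and the score callable is a hypothetical stand-in for allocateOnPair plus the comparison of results. A non-empty list of pairs is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def best_for_primary(primary, secondaries, score):
    """Score every pair of one primary and keep the best (lowest) one."""
    return min((score(primary, s), primary, s) for s in secondaries)


def best_allocation(pairs, score, workers=2):
    """Return the lowest-scoring (score, primary, secondary) triple."""
    # Stage 1: group the valid pairs per primary node.
    groups = [
        (primary, [s for _, s in group])
        for primary, group in groupby(sorted(pairs), key=lambda p: p[0])
    ]
    # Stage 2: one task per primary; combine the partial results at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(
            lambda g: best_for_primary(g[0], g[1], score), groups)
        return min(partial)
```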
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Fri, 23 Sep 2011 05:47:05 +0000 (14:47 +0900)]
Abstract comparison of AllocElements
This is moved outside of the concatAllocs as it will be needed in
another place in the future.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Iustin Pop [Fri, 23 Sep 2011 04:23:29 +0000 (13:23 +0900)]
Change type of Cluster.AllocSolution
Originally, this data type was used both by instance allocation (1
result), and by instance relocation (many results, one per
instance). As such, the field 'asSolutions' was a list, and the
various code paths checked whether the length of the list matches the
current mode. This is very ugly, as we can't guarantee this matching
via the type system; hence the FIXME in the code.
However, commit 6804faa removed the instance evacuation code, and thus
we now always use just one allocation solution. Hence we can change
the data type to a simple Maybe type, and get rid of many 'otherwise
barf out' conditions.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Andrea Spadaccini [Tue, 27 Sep 2011 19:22:43 +0000 (20:22 +0100)]
Migration: warn the user about hv version mismatch
* hv_kvm.py, hv_xen.py
- return the hypervisor version (if available) from GetNodeInfo
* cmdlib.py
- if hypervisor version is available during the migration, and the
versions differ, warn the user
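The check described above might look roughly like this. The function name and the message text are hypothetical; None stands for "version not available".

```python
def migration_version_warning(src_version, dst_version):
    """Return a warning string if both versions are known and differ."""
    if src_version is None or dst_version is None:
        return None
    if src_version != dst_version:
        return ("hypervisor version mismatch: source %s, target %s"
                % (src_version, dst_version))
    return None
```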
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 28 Sep 2011 10:38:12 +0000 (12:38 +0200)]
Fix handling of cluster verify hooks
The change to enforce boolean results for cluster verify group opcode
missed the HooksCallBack, which uses a very ugly 1/0
logic. Furthermore, the logic is wrong, since it unconditionally
resets the verify result to true.
The code is changed to simply treat hook failures as failures, and to do
nothing for offline nodes.
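A sketch of the intended logic, assuming a hypothetical result shape in which each node maps to an (offline, failed) pair. The point is that a hook failure can only turn the result false and nothing ever resets it to true.

```python
def verify_result_after_hooks(verify_ok, hook_results):
    """Combine the verify result with per-node hook results.

    hook_results maps node name to (offline, failed); both fields are
    hypothetical stand-ins for the real RPC result.
    """
    for offline, failed in hook_results.values():
        if offline:
            continue  # nothing to do for offline nodes
        if failed:
            verify_ok = False
    return verify_ok
```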
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 27 Sep 2011 14:57:44 +0000 (16:57 +0200)]
http.client: Show pending requests as “owner”
In the context of the lock monitor a “pending” item does not yet own the
requested resource. Since these HTTP requests are already in progress
they should be shown as owners.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 27 Sep 2011 14:43:49 +0000 (16:43 +0200)]
http.client: Add nice name to requests
With this change a node name instead of the IP address can be shown for
pending RPC requests:
Name Pending
rpc/node18.example.com/test_delay thread:Jq1/Job692/TEST_DELAY
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 27 Sep 2011 14:43:45 +0000 (16:43 +0200)]
rpc/http: Show pending RPC requests in lock monitor
Not all requests use an instance of RpcRunner yet and therefore won't
show up (only instances have access to the global Ganeti context).
Currently only the IP address is accessible. Another patch will add a
nicer name for requests.
Example output (gnt-debug locks -o name,pending):
Name Pending
rpc/192.0.2.18/test_delay thread:Jq12/Job683/TEST_DELAY
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 27 Sep 2011 13:15:41 +0000 (15:15 +0200)]
http.client: Factorize code interacting with cURL
This simplifies HttpClientPool.ProcessRequests significantly and will be
handy for showing pending RPC requests in the lock monitor.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Wed, 28 Sep 2011 09:06:06 +0000 (11:06 +0200)]
Redistribute the RAPI certificate
This reverts to the old behaviour in Ganeti 2.4 and before.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Tue, 27 Sep 2011 13:22:36 +0000 (15:22 +0200)]
Adding qemu-img dependency to INSTALL
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 14 Sep 2011 15:17:54 +0000 (17:17 +0200)]
http.client: Reduce performance impact by assertion
Call dict.values once instead of N times.
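In Python 2, dict.values() built a new list on every call, so calling it inside a loop cost O(N) per iteration. The sketch below shows the general hoisting pattern with a hypothetical check; it is not the assertion from http.client.

```python
def all_idle_slow(clients, pending):
    """Call pending.values() once per client (N calls in total)."""
    return all(c not in pending.values() for c in clients)


def all_idle_fast(clients, pending):
    """Call pending.values() once and reuse the result."""
    busy = list(pending.values())
    return all(c not in busy for c in clients)
```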
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Sep 2011 14:07:12 +0000 (16:07 +0200)]
rpc: Overhaul client structure
- Clearly separate node name to IP address resolution into separate
functions
- Simplified code structure (one code path instead of several)
- Fully unittested
- Preparation for more RPC improvements
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Sep 2011 10:11:58 +0000 (12:11 +0200)]
rpc: Make compression function module-global
No need to keep it in the class.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Sep 2011 10:05:55 +0000 (12:05 +0200)]
Keep only one global RPC runner in Ganeti context
Instead of having one RPC runner per mcpu processor this will keep only
one instance as part of the masterd-wide Ganeti context. Upcoming
patches will change the RPC runner to report pending requests to the
lock manager.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Agata Murawska [Mon, 26 Sep 2011 15:00:23 +0000 (17:00 +0200)]
Update INSTALL with ovfconverter requirements
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Thu, 22 Sep 2011 12:17:25 +0000 (14:17 +0200)]
TemporaryFilesManager implementation
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 11:40:37 +0000 (13:40 +0200)]
Export: unittests
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 11:40:01 +0000 (13:40 +0200)]
Export: documentation
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 11:40:24 +0000 (13:40 +0200)]
Export: saving data to ovf file
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 11:17:28 +0000 (13:17 +0200)]
Export: parsing data from config file
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 11:17:02 +0000 (13:17 +0200)]
Export: initial commit - manifest, ova creation etc
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 08:47:44 +0000 (10:47 +0200)]
Import: unittests
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 08:46:53 +0000 (10:46 +0200)]
Import: backend, hypervisor and os
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 08:13:30 +0000 (10:13 +0200)]
Import: networks
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 08:34:06 +0000 (10:34 +0200)]
Import: disk conversion
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Mon, 12 Sep 2011 08:33:42 +0000 (10:33 +0200)]
Import: reading ovf file
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Agata Murawska [Wed, 24 Aug 2011 14:28:41 +0000 (16:28 +0200)]
Initial commit for ovfconverter tool
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 19 Sep 2011 07:36:13 +0000 (16:36 +0900)]
doc: sphinx config file changes
I wanted to just enable another extension (the graphviz one), but then
I went and did a lot of changes:
- replaced ' with " for consistency with our style guide
- imported new settings (commented out) that current python-sphinx
(1.0.7) generates when starting a new project; for the keys that are
different in 0.6 and 1.0+, I left the 0.6 version until we bump our
documented version
- enabled graphviz; needed for a design doc I'm currently working on
- updated copyright years
- changed list style from single-line to multi-line
- added coverage/ dir to exclude_trees
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 19 Sep 2011 07:33:21 +0000 (16:33 +0900)]
doc: re-wrap design-oob to 72 chars
I started with just adding some :term:`SoW` and similar to design-oob,
but then I realised this was 80-chars wrapped, not 72-chars. So I went
and re-wrapped most of it, plus adding the glossary references.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 19 Sep 2011 07:32:35 +0000 (16:32 +0900)]
doc: glossary improvements
These will be used to remove some inline definitions and replace them
with :term:`foo`.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 26 Sep 2011 09:47:30 +0000 (11:47 +0200)]
serializer: Add comment about simplejson vs. built-in json
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 26 Sep 2011 09:19:16 +0000 (11:19 +0200)]
Revert "Fail if dictionary uses invalid keys" and "Support newer “json” module"
This reverts commits fd0351aef246f5d36e641209429e2ec093d325f8 and
9869e771704ada62bab001e729c52a36525ef081. The built-in module is a lot
slower in Python 2.6.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 23 Sep 2011 14:26:10 +0000 (16:26 +0200)]
serializer: Fail if dictionary uses invalid keys
JSON only supports a very restricted set of types for dictionary keys,
among them strings, booleans and “null”. Integers and floats are
converted to strings. Since this can cause a lot of confusion in Python,
this check raises an exception if a caller tries to use such types.
Since the pre-Python 2.6 “simplejson” module doesn't support overriding
the function where the conversion takes place this check can only be
done for the newer “json” module.
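The confusion is easy to reproduce: Python's json module silently turns an integer key into a string, so a round trip changes the key type. The sketch below rejects such keys with a recursive pre-pass over the data. That is a different technique from the one in the patch, which overrides the function inside the encoder where the conversion takes place; the function names are hypothetical.

```python
import json


def check_dict_keys(value):
    """Raise TypeError if any dictionary in value has a non-string key."""
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("invalid dictionary key %r" % (key,))
            check_dict_keys(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            check_dict_keys(item)


def dump_checked(value):
    """Serialise value to JSON after verifying all dictionary keys."""
    check_dict_keys(value)
    return json.dumps(value)
```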
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 23 Sep 2011 14:23:59 +0000 (16:23 +0200)]
serializer: Support newer “json” module
This module is included from Python 2.6 and is based on
simplejson.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Tue, 20 Sep 2011 08:29:20 +0000 (17:29 +0900)]
htools: man page improvements
This patch moves all the backend options into the main htools man
page, and it adds documentation for the -t option, which so far was
not documented w.r.t. the file structure.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 20 Sep 2011 08:29:04 +0000 (17:29 +0900)]
hspace: add short forms for the group policy
This adds shortened versions of the allocation policies, as writing
out the whole name on the command line can become tedious.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Andrea Spadaccini [Wed, 21 Sep 2011 13:38:57 +0000 (14:38 +0100)]
Fix interaction between CPU pinning and KVM migration
CPU pinning requires the KVM hypervisor to start in the paused state in
order to retrieve information, after which it is immediately unpaused.
This does not play well with live migration, as the unpausing was done
before the migration started and so the receiving kvm process left the
migrated instance in the stopped status.
This patch fixes this behavior by not launching the KVM process in the
stopped state while on the receiving side of a migration.
Also, the stopping is now done outside _ExecuteCpuAffinity.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Tsachy Shacham <tsachy@google.com>
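The decision described in the entry above can be sketched as a single predicate. This is only an illustration under assumptions: should_start_paused and its parameters are hypothetical names, not functions from hv_kvm.py.

```python
def should_start_paused(cpu_pinning_requested, incoming_migration):
    """Return True if the KVM process should be launched paused.

    Hypothetical helper: pausing is only needed to apply CPU pinning,
    and it is skipped on the receiving side of a migration so that the
    migrated instance is not left stopped.
    """
    return bool(cpu_pinning_requested and not incoming_migration)
```

Keeping the rule in one place makes it explicit that the receiving side of a migration never starts paused, whatever the pinning settings are.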
Michael Hanselmann [Thu, 22 Sep 2011 10:20:39 +0000 (12:20 +0200)]
QA: Add tests for instance start/stop via RAPI
This would have detected the issue fixed in the previous patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Thu, 22 Sep 2011 10:19:56 +0000 (12:19 +0200)]
RAPI: Fix wrong check on instance shutdown
Commit 7fa310f6d84 (April 1st, 2011) converted the RAPI resource for
shutting down an instance to FillOpcode. Unfortunately it missed the
fact that the shutdown resource gets its parameters as query arguments.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Thu, 8 Sep 2011 11:36:15 +0000 (13:36 +0200)]
baserlib: Accept empty body in FillOpcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit c6e1a3eef05674d637570c39f25a799cec7ba187)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 20 Sep 2011 08:28:49 +0000 (17:28 +0900)]
htools: add a MonadPlus instance for Result
This will make it easier to implement 'choice' parsing of input data,
without resorting to explicit case syntax (case … of Bad _ -> …).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>
Andrea Spadaccini [Tue, 20 Sep 2011 15:50:00 +0000 (16:50 +0100)]
Merge branch 'devel-2.5'
* devel-2.5:
Add tls_ciphers and use_vdagent options
Updated man pages with new SPICE TLS options
Implementation of TLS-protected SPICE connections
Added SPICE TLS option and related cert paths
Fix OS creation's error handling when pausing sync
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 16 Sep 2011 09:38:15 +0000 (11:38 +0200)]
RAPI: Add resource to powercycle node
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Andrea Spadaccini [Wed, 14 Sep 2011 22:00:43 +0000 (23:00 +0100)]
Add tls_ciphers and use_vdagent options
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Tue, 6 Sep 2011 17:37:46 +0000 (18:37 +0100)]
Updated man pages with new SPICE TLS options
man/gnt-cluster.rst:
* documented the --new-spice-certificate, --spice-certificate and
--spice-ca-certificate options of renew-crypto.
man/gnt-instance.rst:
* documented the spice_use_tls KVM hypervisor option.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Tue, 6 Sep 2011 17:14:51 +0000 (18:14 +0100)]
Implementation of TLS-protected SPICE connections
Added support for TLS-protected SPICE connections:
client/gnt_cluster.py, cli.py:
* added three new parameters to renew-crypto (--new-spice-certificate,
--spice-certificate, --spice-ca-certificate) and their validation.
utils/x509.py:
* changed GenerateSelfSignedSslCert so that it now also returns the
generated key and certificate;
* added missing return value in the docstring of
GenerateSelfSignedX509Cert.
lib/bootstrap.py:
* changed the signatures of the relevant functions and implemented
certificate generation/writing.
tools/cfupgrade:
* changed GenerateClusterCrypto invocation to reflect the new signature;
* added SPICE certificate names.
lib/errors.py:
* added the X509CertError class.
lib/hypervisor/hv_kvm.py:
* silenced pylint warning R0915.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Tue, 6 Sep 2011 09:26:56 +0000 (10:26 +0100)]
Added SPICE TLS option and related cert paths
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Faidon Liambotis [Fri, 16 Sep 2011 12:58:41 +0000 (15:58 +0300)]
Fix OS creation's error handling when pausing sync
Commit 41e1e79 introduced a feature whereby DRBD sync is paused during
the OS installation when wait_for_sync is not set.
Doing so, however, broke OS creation's error handling: the result value
from the instance_os_add RPC call was overwritten by that of the
blockdev_pause_resume_sync call before there was a chance for it to
be raised, thus masking possible errors in the OS creation.
Note that the wipe method, from which the pause technique was inspired,
does not suffer from this bug.
Signed-off-by: Faidon Liambotis <faidon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
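The bug pattern described in the entry above can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the Ganeti code: FakeRpcResult, run_buggy and run_fixed are hypothetical names, and the two callables stand in for the instance_os_add and blockdev_pause_resume_sync RPC calls.

```python
class FakeRpcResult:
    """Hypothetical stand-in for an RPC result object."""

    def __init__(self, fail_msg=None):
        self.fail_msg = fail_msg

    def check(self, msg):
        """Raise RuntimeError if the call this result belongs to failed."""
        if self.fail_msg:
            raise RuntimeError("%s: %s" % (msg, self.fail_msg))


def run_buggy(os_add, resume_sync):
    """Reuse one variable, so the OS creation result is never checked."""
    result = os_add()
    result = resume_sync()  # bug: overwrites the OS creation result
    result.check("could not create OS")


def run_fixed(os_add, resume_sync):
    """Keep both results and check each of them."""
    os_result = os_add()
    resume_result = resume_sync()
    resume_result.check("could not resume disk sync")
    os_result.check("could not create OS")
```

With a failing OS creation and a successful resume, run_buggy returns normally while run_fixed raises, which is the masking effect the commit message describes.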
Andrea Spadaccini [Tue, 20 Sep 2011 09:28:01 +0000 (10:28 +0100)]
Fix two pylint errors
- hv_kvm.py: silence F0401, which is raised if pylint does not find the
affinity module
- rlib2.py: change disable-msg to disable
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Andrea Spadaccini [Mon, 19 Sep 2011 13:32:34 +0000 (14:32 +0100)]
Fix backend.MigrateInstance docs
The MigrateInstance function does not return anything, so the relevant
lines are removed from the documentation. Instead, the raised exception
is documented.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Tsachy Shacham [Fri, 16 Sep 2011 10:04:19 +0000 (12:04 +0200)]
hv_kvm: bugfix
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Agata Murawska [Thu, 15 Sep 2011 08:39:49 +0000 (10:39 +0200)]
Import: further doc updates
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Thu, 15 Sep 2011 15:41:59 +0000 (17:41 +0200)]
RAPI: Add resource to recreate instance's disks
This was still missing from RAPI.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 15 Sep 2011 11:16:53 +0000 (13:16 +0200)]
Adding an updated design doc for the caching mechanism
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Tsachy Shacham [Thu, 15 Sep 2011 08:58:44 +0000 (10:58 +0200)]
hv_xen: fix use of CPU pinning constants
… to be consistent with hv_kvm
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Tsachy Shacham [Thu, 15 Sep 2011 08:49:51 +0000 (10:49 +0200)]
hv_kvm: fix hardcoded KVM command string
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Tsachy Shacham [Wed, 14 Sep 2011 15:48:01 +0000 (17:48 +0200)]
hv_kvm: support for CPU pinning
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: fixed some small code and style issues]
Reviewed-by: Iustin Pop <iustin@google.com>
Tsachy Shacham [Thu, 15 Sep 2011 08:32:34 +0000 (10:32 +0200)]
constants: support for CPU pinning under KVM
Signed-off-by: Tsachy Shacham <tsachy@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 15 Sep 2011 13:10:00 +0000 (14:10 +0100)]
Merge branch 'devel-2.5'
* devel-2.5: (33 commits)
htools: remove dead code
hail: don't select the primary as new secondary
hail: add an extra safety check in relocate
Fix RAPI documentation for gnt-instance console
Add SPICE compression and streaming options
Add SPICE support to gnt-instance console
Make KVM use the QXL vga driver with SPICE
Use a loop to check SPICE parameters dependency
import: Fix a logic error due to missing "not"
import: Make sure the disk_dump path is in EXPORT_DIR
Switch other commonprefix to IsBelowDir
utils: Introduce IsBelowDir
Fixed a typo in gnt_cluster.py
Added password for SPICE sessions
Draft implementation of QMP connection
Pylint fixes for autogenerated files
Version bump for 2.5.0~beta3
Makefile: Use $(LN_S) instead of “ln -s”
Fixes to errors/warnings raised by pylint 0.24
PEP8 for QA
...
Conflicts:
Makefile.am
- preserve both changes
lib/rapi/rlib2.py
- keep master version
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>