Iustin Pop [Wed, 1 Dec 2010 13:58:27 +0000 (13:58 +0000)]
Cleanup AllocSolution after AllocElement changes
Since we added the score to AllocElement, we don't need to wrap
AllocElement in yet another tuple, just to attach the cluster score. So
we simplify the AllocSolution type.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 23:53:57 +0000 (23:53 +0000)]
AllocElement: extend with the cluster score
AllocElement, a type used as a result of allocations, holds the status
of the nodes after the allocation. In most cases, we'll compare this
allocation result with others, to see which allocation decision makes
the most sense. This comparison is done via the cluster score.
However, if we later need to redo this computation, as part of other
comparisons, we'd need to evaluate it again, etc. So it's easier to just
compute the score at the place where we compute the node list in the
initial step.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Wed, 1 Dec 2010 00:10:57 +0000 (00:10 +0000)]
Add two utility functions for the Result type
Actually, this just moves the functions from the QC module to Types, and
removes a duplicate entry from Cluster.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 17:35:21 +0000 (17:35 +0000)]
Rework the types used during data loading
This improves on the previous change. Currently, the node and instance
lists shipped around during data loading are (again) association lists.
For instances it's not a big issue, but the node list is rewritten
continuously while we assign instances to nodes, and that is very slow.
The code was originally written for small (10-20 node) clusters, and
today with multinodes… :)
Rewriting to use Node.List/Instance.List makes a bit of a messy patch,
but it allows us to remove some custom functions for assoc. list
processing, and also some custom unittests.
At the end, the load time is almost halved, and we spend time now just
in the string parsing code (which is, as we know, slow…).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 16:31:31 +0000 (16:31 +0000)]
Loader functions: move from assoc lists to maps
When loading big clusters, the association lists become a bit slow, so
we'll replace this with a simple Map String Int; the change is trivial
and can be reverted easily, while it brings a good speedup in the
data loading.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
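A minimal sketch of the idea, assuming the loaders only need a
name-to-index lookup. Only the `Map String Int` type comes from the commit
message; `NameMap`, `mkNameMap` and `lookupName` are hypothetical names
used for illustration.

```haskell
import qualified Data.Map as M

-- | Name-to-index table; replaces an association list [(String, Int)].
-- The alias name is hypothetical.
type NameMap = M.Map String Int

-- | Build the table from a list of names, numbering them from zero
-- (hypothetical helper).
mkNameMap :: [String] -> NameMap
mkNameMap names = M.fromList (zip names [0..])

-- | O(log n) lookup, instead of the O(n) scan that 'lookup' does on a list.
lookupName :: NameMap -> String -> Either String Int
lookupName table name =
  maybe (Left ("Unknown name: " ++ name)) Right (M.lookup name table)
```

Because both representations can be built from the same list of pairs, the
switch stays local to the lookup helpers.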
Iustin Pop [Tue, 30 Nov 2010 16:23:49 +0000 (16:23 +0000)]
Convert some leftovers to NameAssoc
The type alias NameAssoc was introduced a long time ago, but there
are a few not-yet-converted cases. In preparation for changes to that
type, let's make sure we use it consistently.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 15:42:14 +0000 (15:42 +0000)]
hbal: implement handling of multi-group clusters
On a single-group cluster, we proceed as before. On multi-group
clusters, we require selection of the desired group (currently via UUID
only).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 14:53:18 +0000 (14:53 +0000)]
Add Cluster.splitCluster for node groups
This splits the top-level cluster information into the component node
groups. Instances go to the group of their primary node, but otherwise we
don't disallow split instances.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 12:24:09 +0000 (12:24 +0000)]
Add the man html files to gitignore
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 30 Nov 2010 12:20:07 +0000 (12:20 +0000)]
Rework Container.hs and improve test coverage
Since some of the functions we export from Container.hs are 1:1
identical to IntMap, we can just export the originals and remove the
wrappers. This reduces the code we need to unittest.
Furthermore, we add two simple unittests for the two non-trivial
functions that we do have in Container.hs.
And finally, we remove the 'remove' function, since it's not used, and
thus bring code coverage very close to 100%.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 29 Nov 2010 18:37:33 +0000 (18:37 +0000)]
Add new command-line option for group selection
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 29 Nov 2010 18:17:26 +0000 (18:17 +0000)]
Add two functions for checking cluster consistency
For now, we don't support instances allocated across two groups, and we
will reject such clusters. The isClusterConsistent function will return
a list of inconsistent instances, potentially allowing operation without
touching them (but only on the rest).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 29 Nov 2010 17:15:19 +0000 (17:15 +0000)]
Add function for nodes to (nodegroup, nodes) split
Unittests included. The function will be needed for consistency checks
in the algorithms.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
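A sketch of what such a split could look like, assuming each node carries
the UUID of its group. The `Node` record and `splitByGroup` are
hypothetical stand-ins, not the actual htools definitions.

```haskell
import qualified Data.Map as M

-- | Group identifier (stand-in for the UUID type alias).
type GroupID = String

-- | Hypothetical, heavily simplified node record.
data Node = Node
  { nodeName  :: String
  , nodeGroup :: GroupID
  } deriving (Show, Eq)

-- | Split a flat node list into (group, nodes) pairs. Groups come out in
-- ascending key order; nodes keep their input order within each group.
splitByGroup :: [Node] -> [(GroupID, [Node])]
splitByGroup =
  M.toList . M.fromListWith (flip (++)) . map (\n -> (nodeGroup n, [n]))
```

`fromListWith` passes the new value first, so `flip (++)` is what keeps the
original node order inside each group.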
Iustin Pop [Mon, 29 Nov 2010 16:52:34 +0000 (16:52 +0000)]
Add a type alias for UUIDs
This is to potentially allow easier changes later.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Sun, 28 Nov 2010 17:02:24 +0000 (17:02 +0000)]
Also build HTML versions of man pages
Iustin Pop [Mon, 22 Nov 2010 16:53:37 +0000 (16:53 +0000)]
RAPI: read the group UUID from the server
This depends on future support from Ganeti (2.4+).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 22 Nov 2010 16:38:59 +0000 (16:38 +0000)]
IAlloc: read group uuid from the input message
This makes the code incompatible with JSON files from Ganeti pre-2.4.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 22 Nov 2010 15:33:13 +0000 (15:33 +0000)]
Text: read/save the node group UUID
Compatibility with old text files is kept by using the default UUID if
the file (or even some records) don't have a UUID.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 22 Nov 2010 15:15:20 +0000 (16:15 +0100)]
Luxi: read the node uuid from the cluster
This makes the code incompatible with Ganeti pre-2.4.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 22 Nov 2010 15:00:21 +0000 (16:00 +0100)]
Node: add the node group's UUID
This is not used anywhere yet, and the backends are all just adding the
default UUID, not the real one.
The patch also allows displaying the group UUID in the node list.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Mon, 22 Nov 2010 14:58:22 +0000 (15:58 +0100)]
Utils: add a default UUID
This will be used as a placeholder for the cases when we need a UUID
(any UUID), but we don't have one handy.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Iustin Pop [Tue, 23 Nov 2010 14:45:51 +0000 (14:45 +0000)]
Merge branch 'devel-0.2' into master
Iustin Pop [Mon, 22 Nov 2010 18:50:22 +0000 (18:50 +0000)]
Improve the standard deviation computation
This does just two passes, instead of three, over the list. This reduces
the overall runtime well enough (~25%) in some tests, but it's not
reproducible using profiling, so I don't know how much the function
itself is being sped-up.
Note: this is written via `seq`s, and not BangPatterns. Since it's just
one case, adding BangPatterns just for it wasn't a big gain.
Thanks to Lécz Balázs for the impetus to improve this!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
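The two-pass approach can be sketched as follows. This is not the code
from the patch: `stdDev` is a hypothetical name, and the use of the
population (divide by n) formula is an assumption.

```haskell
import Data.List (foldl')

-- | Standard deviation in two passes over the list.
-- Pass 1 computes the sum and the length together; pass 2 computes the
-- sum of squared deviations. Assumes the population formula.
stdDev :: [Double] -> Double
stdDev [] = 0
stdDev xs =
  let -- Force both components with `seq` so that no thunks build up
      -- inside the tuple (foldl' alone only forces the tuple itself).
      step (s, n) x = let s' = s + x
                          n' = n + 1
                      in s' `seq` n' `seq` (s', n')
      (total, count) = foldl' step (0, 0 :: Int) xs
      len  = fromIntegral count
      mean = total / len
      sqDev = foldl' (\acc x -> let d = x - mean in acc + d * d) 0 xs
  in sqrt (sqDev / len)
```

A three-pass version would call `sum`, `length` and then the deviation
fold separately; fusing the first two is where the saved pass comes from.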
Iustin Pop [Mon, 22 Nov 2010 17:15:30 +0000 (17:15 +0000)]
hbal: change handling of signal
Currently, hbal does a one-two signal handling, where the first signal
causes graceful termination, and the second one an immediate one (either
SIGINT or SIGTERM can be used, interchangeably). However, this poses a
timing problem: if two programs want to send a graceful termination
request, they cannot do that without careful coordination.
To fix this, we change the code to handle the signals separately: SIGINT
(^C) sends graceful termination, while SIGTERM sends immediate
termination. This should allow easier controlling of hbal.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
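A minimal sketch of the split handling, assuming the `unix` package
(`System.Posix.Signals`). `installHandlers` and `runJobsets` are
hypothetical names, and the exit code used for SIGTERM is an assumption.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.Exit (ExitCode(..))
import System.Posix.Process (exitImmediately)
import System.Posix.Signals (Handler(Catch), installHandler, sigINT, sigTERM)

-- | Install the two handlers and return the "graceful shutdown" flag.
-- SIGINT only records the request; SIGTERM exits right away.
installHandlers :: IO (IORef Bool)
installHandlers = do
  flag <- newIORef False
  _ <- installHandler sigINT (Catch (writeIORef flag True)) Nothing
  -- Exit code 1 is an assumption made for this sketch.
  _ <- installHandler sigTERM (Catch (exitImmediately (ExitFailure 1))) Nothing
  return flag

-- | Run jobsets in order, checking the flag before each one, and return
-- how many were executed. A jobset that has started is always finished.
runJobsets :: IORef Bool -> [IO ()] -> IO Int
runJobsets flag = go 0
  where
    go n [] = return n
    go n (job:rest) = do
      stop <- readIORef flag
      if stop then return n else job >> go (n + 1) rest
```

With this layout any number of programs can send SIGINT without
coordinating, since repeated requests only set the same flag again.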
Iustin Pop [Fri, 19 Nov 2010 11:11:37 +0000 (12:11 +0100)]
Simu loader: move the loading to non-IO code
While we don't actually have IO code in the Simu loader, we do have the
same interface. So we move the code again to a separate parseData
function which is exported.
Iustin Pop [Fri, 19 Nov 2010 08:23:57 +0000 (09:23 +0100)]
Luxi loader: split parsing from loading
Iustin Pop [Sat, 31 Jul 2010 02:55:25 +0000 (22:55 -0400)]
Rapi loader: split parsing from loading
The change is similar to the text loader change.
Iustin Pop [Sat, 31 Jul 2010 02:46:37 +0000 (22:46 -0400)]
Text loader: split parsing from loadData
This change, which will be followed by similar changes in the other
loaders, splits the parsing of the data from the actual loading from
disk. Since the parsing doesn't usually involve IO actions, we will be
able to better test the parsing. The loading becomes a smaller part of
the code and thus inability to test it has a smaller impact.
Iustin Pop [Thu, 11 Nov 2010 11:01:34 +0000 (12:01 +0100)]
Ignore nodes which are not vm_capable
This breaks compatibility with Ganeti pre-2.3.
Iustin Pop [Tue, 9 Nov 2010 07:51:45 +0000 (08:51 +0100)]
Merge branch 'devel-0.2'
* devel-0.2:
Fix tag exclusion weight
Iustin Pop [Tue, 9 Nov 2010 07:11:05 +0000 (08:11 +0100)]
Fix tag exclusion weight
Currently, the tag exclusion metric has a weight of one, which means
there might be cases where we won't move instances around because it
upsets the cluster metrics. However, we do want to make a higher effort
for cleaning up tag collisions, so we increase the weight to an
empirically-determined value of 2.
Iustin Pop [Tue, 26 Oct 2010 09:34:37 +0000 (11:34 +0200)]
Force UTF-8 locale for pandoc invocation
Pandoc 1.5.x uses the locale information to parse its input files (only
1.5; earlier and later versions always use UTF-8). Hence we need to enforce a
UTF-8 locale for proper parsing of input files.
Iustin Pop [Mon, 25 Oct 2010 15:38:28 +0000 (17:38 +0200)]
Move from hand-written man pages to RST/pandoc
This simplifies the maintenance of the man pages, and unifies the rst-to-*
converter to pandoc.
Iustin Pop [Fri, 10 Sep 2010 15:10:26 +0000 (17:10 +0200)]
Add design for htools/Ganeti 2.3 sync
This is a work in progress, will be modified along with the progress
of Ganeti 2.3.
Iustin Pop [Thu, 7 Oct 2010 13:04:58 +0000 (15:04 +0200)]
Update NEWS file for 0.2.7 release
Iustin Pop [Thu, 7 Oct 2010 12:42:46 +0000 (14:42 +0200)]
Fix some warnings in unittests
Iustin Pop [Wed, 6 Oct 2010 12:57:27 +0000 (14:57 +0200)]
Add a hack for normalized CPU values in hspace
Currently, the key metrics/tiered spec computations show the virtual cpu
count. However, since we do have a maximum ratio Vcpu/Pcpu, we can also
show the “normalized” cpu count, i.e. the equivalent physical cpu count
corresponding to the virtual ones.
Iustin Pop [Wed, 6 Oct 2010 12:56:14 +0000 (14:56 +0200)]
Improve the error message for tiered alloc option
Iustin Pop [Wed, 15 Sep 2010 15:30:24 +0000 (17:30 +0200)]
hbal: implement user-friendly termination requests
Currently, hbal will abort immediately when requested (^C, or SIGINT,
etc.). This is not nice, since then the already started jobs need to be
tracked manually.
This patch adds a signal handler for SIGINT and SIGTERM, which will, the
first time, simply record the shutdown request (and hbal will then exit
once all jobs in the current jobset finish), and at the second request,
will cause an immediate exit.
Iustin Pop [Wed, 1 Sep 2010 18:12:29 +0000 (20:12 +0200)]
Document the gain options in hbal's manpage
Iustin Pop [Wed, 1 Sep 2010 18:07:54 +0000 (20:07 +0200)]
Use the mingain options in the balancing algorithm
Also adds them in hbal.
Iustin Pop [Wed, 1 Sep 2010 16:31:54 +0000 (18:31 +0200)]
Add new CLI options for min gain during balancing
Recent hbal seems to run many steps for small improvements (< 1e-3), so
we should stop early in this case.
We add a new option (-g), that will be used for the minimum gain during
balancing. This check will only become active when the cluster score is
below a threshold (--min-gain-limit), so as to not stop rebalances too
early.
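The stopping rule described above can be sketched as a pure predicate.
`keepBalancing` is a hypothetical name, and comparing the threshold
against the score before the step (rather than after it) is an assumption,
since the message does not say which one is used.

```haskell
-- | Decide whether another balancing step is worthwhile.
--   minGain      : value of the -g option
--   minGainLimit : value of the --min-gain-limit option
--   oldScore     : cluster score before the step
--   newScore     : cluster score after the step
keepBalancing :: Double -> Double -> Double -> Double -> Bool
keepBalancing minGain minGainLimit oldScore newScore =
  let gain = oldScore - newScore
      -- The minimum-gain check is only active below the threshold.
      checkActive = oldScore < minGainLimit
  in gain > 0 && (not checkActive || gain >= minGain)
```

While the score is still above the limit, any improvement is accepted;
once it drops below, small gains end the run.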
Iustin Pop [Thu, 2 Sep 2010 12:45:07 +0000 (14:45 +0200)]
Makefile: make the rst2html converter more strict
This will make the automated builds flag any problems.
Iustin Pop [Mon, 30 Aug 2010 09:10:06 +0000 (11:10 +0200)]
Add some more debugging functions
These are just variations of the standard debug function, provided for
simpler code, since laziness can prevent debug statements from being
evaluated.
Iustin Pop [Thu, 26 Aug 2010 11:55:47 +0000 (13:55 +0200)]
Fix ReplaceSecondary moves for offline nodes
The addition of a new secondary on a node is doing two memory tests:
- in strict mode, reject if we get into N+1 failure
- reject if the new instance memory is greater than the free memory (not
available memory) on the node
The last check is designed to ensure that, irrespective of the other
secondary instances on this node, we are able to failover/migrate the
newly-added instance.
However, we should allow this, if the instance comes from an offline
node, which doesn't offer anything (not even disk replication).
Therefore this patch makes this check conditional on the strict mode.
Iustin Pop [Thu, 26 Aug 2010 11:55:16 +0000 (13:55 +0200)]
Update NEWS file
Iustin Pop [Thu, 26 Aug 2010 11:51:40 +0000 (13:51 +0200)]
Update man pages for the new -S option
Iustin Pop [Thu, 26 Aug 2010 11:28:59 +0000 (13:28 +0200)]
hspace: mark new instances as running
Otherwise the saved cluster state and the in-memory one are wrong.
Iustin Pop [Thu, 26 Aug 2010 11:07:40 +0000 (13:07 +0200)]
Implement cluster state saving in hspace
This also uncovered a few issues with the allocation model (instances
not being marked up, etc.).
Compared to hbal, hspace will generate either one or two files (for both
the standard and the tiered allocation mode), depending on the input
parameters.
Iustin Pop [Thu, 26 Aug 2010 10:58:38 +0000 (12:58 +0200)]
Change iterateAlloc to return the instance list
The Cluster.iterateAlloc and tieredAlloc functions are changed to also
return the updated instance list, since it is needed to have a “full”
cluster view.
Iustin Pop [Thu, 26 Aug 2010 09:49:19 +0000 (11:49 +0200)]
Implement cluster state saving in hbal
Also move the LUXI execution (-X) to the end, after all the output
messages are printed. No good in waiting for the messages for a long
while, especially as they are not up-to-date stats after the job
execution, just an estimation of what the state will be.
Iustin Pop [Wed, 25 Aug 2010 16:47:22 +0000 (18:47 +0200)]
Abstract the cluster serialization from hscan.hs
This is currently hardcoded in an internal function in hscan.hs, and we
move it to Text.hs for later use.
Iustin Pop [Wed, 25 Aug 2010 16:40:20 +0000 (18:40 +0200)]
Add a new option --save-cluster
This option will in the future be used to serialize the cluster state in
hbal and hspace after the rebalance/allocation steps.
Iustin Pop [Wed, 25 Aug 2010 16:04:41 +0000 (18:04 +0200)]
Add unittest for Node text serialization
This checks that the Node text serialization and deserialization
operations are idempotent when combined with each other.
Iustin Pop [Wed, 25 Aug 2010 15:53:53 +0000 (17:53 +0200)]
Switch unittest to custom hostnames
Currently, the hostnames are almost fully arbitrary chars, which breaks
the assumption that nodes/instances will be normal DNS hostnames.
This patch adds some custom generators for these hostnames, that will
allow better testing of text loader serialization/deserialization.
Iustin Pop [Tue, 24 Aug 2010 16:30:05 +0000 (18:30 +0200)]
Move text serialization functions to Text.hs
Currently these are in hscan, and cannot be reused easily.
Iustin Pop [Thu, 29 Jul 2010 04:03:42 +0000 (00:03 -0400)]
Fix a couple of typos in the manpages
Again, thanks to lintian.
Iustin Pop [Tue, 27 Jul 2010 18:44:30 +0000 (14:44 -0400)]
hail: fix error message for failed multi-evac
Currently we show the instance index, but this makes no sense outside
the current running program. Instead, we show the instance name.
Iustin Pop [Mon, 26 Jul 2010 23:49:23 +0000 (19:49 -0400)]
Update NEWS file for the 0.2.6 release
Iustin Pop [Tue, 27 Jul 2010 00:02:29 +0000 (20:02 -0400)]
NEWS: Add double blank lines before headers
This looks better for text-only viewing…
Iustin Pop [Fri, 23 Jul 2010 00:50:49 +0000 (20:50 -0400)]
hscan: return exit code 2 for RAPI failures
If some clusters failed during RAPI collection, exit with exit code 2 so
that tests can detect this failure.
Iustin Pop [Fri, 23 Jul 2010 00:32:41 +0000 (20:32 -0400)]
More enhancements to live-test.sh
Iustin Pop [Thu, 22 Jul 2010 13:57:13 +0000 (09:57 -0400)]
Fix another haddock issue
Iustin Pop [Thu, 22 Jul 2010 03:03:28 +0000 (23:03 -0400)]
Remove an obsolete function and add Utils tests
Iustin Pop [Thu, 22 Jul 2010 00:27:09 +0000 (20:27 -0400)]
Extend the live-test
The (recently-enabled) live test coverage stats found a few low-hanging
fruits in the tests we do…
Iustin Pop [Wed, 21 Jul 2010 23:25:44 +0000 (19:25 -0400)]
Use --union for hpc sum
… which fixes the issue noted in the previous commit (almost a brown
paper bag change).
Iustin Pop [Wed, 21 Jul 2010 22:43:12 +0000 (18:43 -0400)]
Preliminary support for coverage during live-test
While this doesn't work correctly yet (hpc sum seems to only take common
modules, not the sum of modules?), it prepares for gathering coverage
data during live-test (as an alternative to unittest coverage data).
Iustin Pop [Wed, 21 Jul 2010 22:18:44 +0000 (18:18 -0400)]
Add some more imports to QC.hs
This is needed so that in the coverage report we list all modules, even
the ones we don't test at all, such that we get the complete results.
Iustin Pop [Wed, 21 Jul 2010 15:47:25 +0000 (17:47 +0200)]
Change the meaning of the N+1 fail metric
Currently, this metric tracks the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric since it will rarely change during a step (only at the last
instance moving away). Therefore we replace it with the count of
instances living on such nodes, which is much better because:
- moving an instance away while the node is still N+1 failing will still
reflect in the score as an optimization
- moving the last instance causing an N+1 failure will result in a heavy
decrease of this score, thus giving the right bonus to clear this
status
Iustin Pop [Wed, 21 Jul 2010 15:33:00 +0000 (17:33 +0200)]
Introduce per-metric weights
Currently all metrics have the same weight (we just sum them together).
However, for the hard constraints (N+1 failures, offline nodes, etc.)
we should handle the metrics differently based on their meaning. For
example, an instance living on a primary offline node is worse than an
instance having its secondary node offline, which in turn is worse than
an instance having its secondary node failing N+1.
To express this case in our code, we introduce a table of weights for
the metrics, with which we can influence their relative importance.
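As a rough illustration, a weighted score is just the sum of each metric
multiplied by its weight. `clusterScore` is a hypothetical name, and the
weights used in the usage note are made-up values, not the ones from the
patch.

```haskell
-- | Combine per-metric values into a single score using a table of
-- weights. Both lists are expected to be in the same metric order.
clusterScore :: [Double]  -- ^ weights, one per metric
             -> [Double]  -- ^ metric values
             -> Double
clusterScore weights metrics = sum (zipWith (*) weights metrics)
```

For example, `clusterScore [1, 16, 4] [0.5, 2, 0.25]` makes the second
metric dominate the result, which is how a hard constraint can be made to
outweigh the soft balancing metrics.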
Iustin Pop [Wed, 21 Jul 2010 14:15:03 +0000 (16:15 +0200)]
Allow balancing moves to introduce N+1 errors
This patch switches the applyMove function to the extended versions of
Node.addPri and addSec, and passes the override flag based on the state
of the node that we're moving away from.
Iustin Pop [Wed, 21 Jul 2010 13:30:38 +0000 (15:30 +0200)]
Introduce a relaxed add instance mode
In case an instance is living on an offline node, it doesn't make sense
to refuse moving it because that would create N+1 failures; failing N+1
is still much better than not running at all. Similarly, if the
secondary node of an instance is offline, meaning the instance doesn't
have any redundancy, we have a worse case than having a secondary that
is N+1 failing: such a node could not accept the instance as primary, but it
still provides redundancy for it.
To allow this, we rename Node.addPri to addPriEx and introduce an extra
parameter (addPri is a partial application of addPriEx and keeps the
same signature). Node.addSec gets the same treatment.
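The rename can be pictured with the simplified model below. The `Node`
record, the `FailMode` type and the use of a `Bool` for the extra
parameter are assumptions made for this sketch; only the names addPri and
addPriEx and the partial-application relationship come from the message.

```haskell
-- | Reasons for refusing an instance (simplified).
data FailMode = FailMem | FailN1 deriving (Show, Eq)

-- | Hypothetical, heavily simplified node model.
data Node = Node
  { freeMem :: Int  -- ^ currently free memory
  , resvMem :: Int  -- ^ memory that must stay free to pass N+1
  } deriving (Show, Eq)

-- | Add a primary instance needing the given amount of memory.
-- When the first argument is True, the N+1 check is skipped.
addPriEx :: Bool -> Node -> Int -> Either FailMode Node
addPriEx force node mem
  | newFree < 0                          = Left FailMem
  | newFree < resvMem node && not force  = Left FailN1
  | otherwise                            = Right node { freeMem = newFree }
  where newFree = freeMem node - mem

-- | Strict variant: same signature as before the change.
addPri :: Node -> Int -> Either FailMode Node
addPri = addPriEx False
```

Existing callers keep using `addPri` unchanged, while code that moves an
instance away from an offline node can call `addPriEx True`.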
Iustin Pop [Mon, 19 Jul 2010 10:23:07 +0000 (12:23 +0200)]
Remove obsolete Container.maxNameLen
This was only used in one place (hbal), and is made obsolete by the change to
the dual name/alias structure.
Iustin Pop [Mon, 19 Jul 2010 10:22:00 +0000 (12:22 +0200)]
hbal: print short names in steps list
This was a regression from the name handling changes, as we started
using the original names for the solution list (which is not designed
for parsing/feeding back into ganeti).
Iustin Pop [Mon, 19 Jul 2010 10:19:21 +0000 (12:19 +0200)]
Remove an obsolete function
printSolution is no longer used, as we print the solution iteratively
now.
Iustin Pop [Sun, 18 Jul 2010 21:12:25 +0000 (23:12 +0200)]
Allow '+' in node list fields
When the field list is prefixed with a plus sign, this will extend the
default field list, instead of replacing it entirely.
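A small sketch of the parsing rule, assuming the field list is a
comma-separated string. `selectFields` and `splitOnComma` are hypothetical
helpers, not the htools functions.

```haskell
-- | Split a comma-separated string into its fields.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])     -> [field]
  (field, _:rest) -> field : splitOnComma rest

-- | Compute the fields to display from the defaults and the user's
-- specification. A leading '+' extends the defaults instead of
-- replacing them.
selectFields :: [String] -> String -> [String]
selectFields defaults ('+':extra) = defaults ++ splitOnComma extra
selectFields _        spec        = splitOnComma spec
```

So with defaults `["name", "pcnt"]`, the specification `"+peermap,index"`
keeps both defaults and appends the two extra columns.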
Iustin Pop [Sun, 18 Jul 2010 20:59:20 +0000 (22:59 +0200)]
Update the node list fields
This patch renames the pri/sec to pcnt/scnt, and adds the real primary
and secondary instance lists, the peermap and the index of a node as
selectable options.
Iustin Pop [Fri, 16 Jul 2010 20:20:17 +0000 (22:20 +0200)]
Cleanup a node's peer map when possible
If the last secondary instance of a peer is deleted (detected by the new
peer memory value being equal to zero), then the pair (pdx, 0) should be
deleted completely. This is not an optimization per se, but rather cleanup
(the speedup is at most a percent, and only in some corner cases).
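The cleanup rule can be sketched like this. Representing the peer map as
an `IntMap` from peer index to memory is an assumption made for the
example, as is the helper name `updatePeer`.

```haskell
import qualified Data.IntMap as IM

-- | Peer index -> memory needed for that peer's secondary instances.
-- The representation is an assumption of this sketch.
type PeerMap = IM.IntMap Int

-- | Record the new memory value for a peer. A value of zero means the
-- last secondary instance of that peer is gone, so the entry is removed
-- instead of being stored as (pdx, 0).
updatePeer :: Int -> Int -> PeerMap -> PeerMap
updatePeer pdx 0   = IM.delete pdx
updatePeer pdx mem = IM.insert pdx mem
```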
Iustin Pop [Fri, 16 Jul 2010 18:30:51 +0000 (20:30 +0200)]
Fix handling of offline options and short names
This needs to be abstracted in a separate function, but in the meantime
we fix the issue in both places.
Signed-off-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 21 Jun 2010 09:12:41 +0000 (11:12 +0200)]
Fix another haddock special-char issue
Iustin Pop [Mon, 21 Jun 2010 02:58:52 +0000 (04:58 +0200)]
Remove JOB_STATUS_GONE and add unittests
… for the serialization/deserialization of the job and opcode status.
Job status 'gone' was not actually used. It can be reintroduced if
needed.
Iustin Pop [Mon, 21 Jun 2010 02:46:58 +0000 (04:46 +0200)]
Add opcode status constants/type
This mirrors, again, the Ganeti constants, and is added for future use.
Iustin Pop [Mon, 21 Jun 2010 02:32:39 +0000 (04:32 +0200)]
Rename the job status constants
The rename is done such that we match Ganeti's own constants.
Iustin Pop [Sun, 6 Sep 2009 14:23:13 +0000 (16:23 +0200)]
Optimise the Luxi.recvMsg function
Since the current buffer cannot contain (during network reads) an EOM,
we should look for the EOM only in the newly-received string. While
this shouldn't make much difference, in some tests it cuts the recvMsg
total time by around half.
On entering recvMsg, however, we do have to search the old buffer for a
message, since we could have received two Luxi messages on the
last network query; this is however a one-off cost, compared to
continuously looking for the EOM in the old string (at each receive
loop).
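The receive-loop step can be sketched as a pure function, assuming the
invariant stated above (the old buffer contains no EOM). `feedChunk` is a
hypothetical name, and the EOM character is passed as a parameter rather
than hardcoded.

```haskell
-- | Process one newly received chunk.
-- Precondition: the buffer holds no EOM character.
-- Returns Left with the grown buffer if the message is still incomplete,
-- or Right (message, leftover) once an EOM shows up in the new chunk.
feedChunk :: Char    -- ^ end-of-message marker
          -> String  -- ^ data buffered so far (EOM-free)
          -> String  -- ^ newly received data
          -> Either String (String, String)
feedChunk eom buf chunk =
  case break (== eom) chunk of
    (_,   [])     -> Left (buf ++ chunk)       -- only the chunk was scanned
    (pre, _:rest) -> Right (buf ++ pre, rest)  -- rest may hold the next message
```

The leftover part is the reason for the one-off search on entry: it can
already contain a complete second message.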
Iustin Pop [Mon, 7 Jun 2010 22:09:37 +0000 (00:09 +0200)]
Complete the client Luxi implementation
All current Luxi calls are supported after this patch. A bug in
ArchiveJob is also fixed (Ganeti's job IDs are strings).
Iustin Pop [Mon, 7 Jun 2010 21:35:47 +0000 (23:35 +0200)]
Add support for more LUXI calls
While not all are directly useful, having them will open some possibilities
(e.g. polling for job changes in hbal's -X mode, and auto-archiving the
jobs once they are successful).
Iustin Pop [Wed, 2 Jun 2010 21:08:43 +0000 (23:08 +0200)]
Fix some lint errors in the unit tests
Iustin Pop [Wed, 2 Jun 2010 20:27:56 +0000 (22:27 +0200)]
Change the Luxi operations structure
Currently, we define the LuxiOp type as a simple enumeration, and leave
the arguments structure to the users of the Ganeti.Luxi module. This is
suboptimal for a couple of reasons: first, we decouple the operation
type from operation arguments, and that means we don't use the type
system for validation of the arguments; second, the clients themselves
have to know about the JSON encoding of the protocol.
For the above reasons, we change the operation type to contain the
arguments too, and then the entire conversion/serialization is
restricted to the Ganeti.Luxi module. Also, the removal of the JSON
encoding from the clients results in an overall simplification of the
code.
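The design can be illustrated with a toy version of the type. The
constructors' argument lists, the method strings and the hand-written
encoding below are all illustrative assumptions; they are not the real
Luxi wire format, and the real module relies on a JSON library.

```haskell
-- | Operations carry their own typed arguments, instead of being a bare
-- enumeration with arguments supplied separately by each client.
data LuxiOp
  = QueryNodes [String] [String]  -- ^ node names, requested fields
  | ArchiveJob String             -- ^ job ID (a string, not a number)
  deriving (Show, Eq)

-- | Method name for an operation (illustrative strings).
opMethod :: LuxiOp -> String
opMethod (QueryNodes _ _) = "QueryNodes"
opMethod (ArchiveJob _)   = "ArchiveJob"

-- | Rendered arguments. 'show' on plain ASCII strings and lists of them
-- happens to produce JSON-compatible text, which is enough for a sketch.
opArgs :: LuxiOp -> String
opArgs (QueryNodes names fields) = "[" ++ show names ++ "," ++ show fields ++ "]"
opArgs (ArchiveJob jid)          = show jid

-- | The only place that knows about the encoding.
encodeOp :: LuxiOp -> String
encodeOp op = "{\"method\":" ++ show (opMethod op) ++ ",\"args\":" ++ opArgs op ++ "}"
```

A caller can no longer pass a numeric job ID to `ArchiveJob` or forget an
argument: the compiler rejects it before anything is serialized.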
Iustin Pop [Tue, 1 Jun 2010 20:51:10 +0000 (22:51 +0200)]
Fix a warning in Loader tests
Incomplete pattern match…
Iustin Pop [Tue, 1 Jun 2010 17:54:22 +0000 (19:54 +0200)]
Add a few Loader tests
These are not comprehensive, but at least we have a start.
Iustin Pop [Sun, 30 May 2010 18:16:33 +0000 (20:16 +0200)]
Modify the test runner to show test exceptions
QuickCheck's batch driver (at least v1) doesn't show the test aborts,
but simply discards the specific exception and increases the abort
count. This makes it hard to debug the tests, so we modify our own test
wrapper (which so far only tracked total failures) to show any
exceptions.
Iustin Pop [Fri, 28 May 2010 09:13:56 +0000 (11:13 +0200)]
Reduce the warnings during the unittests
Since the unittests are not 'clean' from the p.o.v. of type
declarations, and cannot be made clean in all respects (e.g. orphan
instances), we silence some warnings for the test target, to have a
cleaner output.
Iustin Pop [Thu, 27 May 2010 21:39:58 +0000 (23:39 +0200)]
Improve the test driver
The tests are moved to a separate data structure, and we can select a
subset of tests to run.
Iustin Pop [Thu, 27 May 2010 21:25:21 +0000 (23:25 +0200)]
Introduce OpCode unittests
Iustin Pop [Thu, 27 May 2010 21:00:30 +0000 (23:00 +0200)]
Introduce support for optional keys in JObjects
Some keys are optional in the Ganeti opcodes (e.g. ‘node’ in the
OpReplaceDisks), and as such we need to transform them in a Maybe value,
instead of failing.
The patch reworks a bit fromObj and adds maybeFromObj which parses such
optional values. It then uses it in the opcode reading.
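The difference between the two functions can be sketched with a plain
association list standing in for a JSON object. The signatures below are
simplified assumptions (the real functions work on JSON values), and
`readInt` is a hypothetical converter.

```haskell
-- | Mandatory key: a missing key is an error.
fromObj :: String -> (v -> Either String a) -> [(String, v)] -> Either String a
fromObj key conv obj = case lookup key obj of
  Nothing -> Left ("key '" ++ key ++ "' not found")
  Just v  -> conv v

-- | Optional key: a missing key yields Nothing; a present but malformed
-- value is still an error.
maybeFromObj :: String -> (v -> Either String a) -> [(String, v)]
             -> Either String (Maybe a)
maybeFromObj key conv obj = case lookup key obj of
  Nothing -> Right Nothing
  Just v  -> fmap Just (conv v)

-- | Hypothetical converter used for the example.
readInt :: String -> Either String Int
readInt s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("Cannot read Int from " ++ show s)
```

Keeping the conversion error in the optional case matters: an absent
'node' key is fine, but a 'node' key with a value of the wrong type should
still be reported.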
Iustin Pop [Thu, 27 May 2010 20:37:54 +0000 (22:37 +0200)]
Replace fromJResult with annotateJResult
This patch removes all old uses of fromJResult with the annotated
version, and removes the non-annotated version. All JSON parsing points
should now have annotated errors.
Iustin Pop [Thu, 27 May 2010 20:32:18 +0000 (22:32 +0200)]
Add annotations to loadJSArray
This allows, for example, the RAPI backend to detail which information
(instance or node data) fails to parse.
Iustin Pop [Wed, 26 May 2010 22:54:48 +0000 (00:54 +0200)]
Change fromObj error messages
Currently fromObj doesn't detail what we're trying to read, which can
lead to cryptic messages: "Cannot read Int". The patch changes this
function to annotate the error messages with the key/value we're trying
to convert, by using a new version of fromJResult.
Since the display of the key in tryFromObj is now redundant (it was
already redundant in the 'not found' case), we remove it.
The new version of fromJResult (annotateJResult) simply prepends a
description string to the actual error message.
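A minimal sketch of the annotation step. `JResult` is a local stand-in for
the JSON library's result type, and returning `Either String` is an
assumption made to keep the example self-contained.

```haskell
-- | Stand-in for the JSON library's result type.
data JResult a = JOk a | JError String deriving (Show, Eq)

-- | Convert a parse result, prefixing any error with a description of
-- what was being read.
annotateJResult :: String -> JResult a -> Either String a
annotateJResult _    (JOk v)      = Right v
annotateJResult desc (JError msg) = Left (desc ++ ": " ++ msg)
```

With this, a bare "Cannot read Int" turns into something like
"key 'memory': Cannot read Int", which points at the failing field.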
Iustin Pop [Wed, 26 May 2010 22:11:53 +0000 (00:11 +0200)]
A few more small Node unit-tests
Iustin Pop [Tue, 25 May 2010 17:17:57 +0000 (19:17 +0200)]
Add more unittests
Instance, Node and Text modules have improved coverage.