Michael Hanselmann [Thu, 30 Jul 2009 15:52:49 +0000 (17:52 +0200)]
cmdlib: Add new opcode to migrate node
It migrates all primary instances from the node to their secondaries.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Jul 2009 16:00:31 +0000 (18:00 +0200)]
rapi: Add default parameter to _checkIntVariable
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Jul 2009 15:25:50 +0000 (17:25 +0200)]
cmdlib: Add logging for tasklets
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 30 Jul 2009 15:02:46 +0000 (17:02 +0200)]
cmdlib: Fix tasklets handling if no tasklets are added
If no tasklets are added, self.tasklets evaluates to None. The LU base
class will throw an exception because it thinks the derived class doesn't
implement the right methods.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Thu, 30 Jul 2009 10:46:23 +0000 (12:46 +0200)]
rapi: Add /2/[node_name]/evacuate resource
This can be used to evacuate a node.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Jul 2009 09:40:02 +0000 (11:40 +0200)]
Add information about storage units framework
This updates the 2.1 design document with storage units framework information.
Signed-off-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 29 Jul 2009 12:24:59 +0000 (14:24 +0200)]
Add RPC calls for storage unit list
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 29 Jul 2009 10:49:53 +0000 (12:49 +0200)]
Add first implementation of generic storage unit framework
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 28 Jul 2009 17:29:47 +0000 (19:29 +0200)]
utils: Add functions to calc directory size and free space on filesystem
These will be used by the new storage unit framework.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Fri, 24 Jul 2009 13:31:34 +0000 (15:31 +0200)]
Build HTML from Ganeti 2.1 design
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Sat, 25 Jul 2009 18:04:03 +0000 (20:04 +0200)]
Collapse SSL key checking/overriding for daemons
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 23 Jul 2009 15:46:37 +0000 (16:46 +0100)]
Collapse daemon's main function
With three ganeti daemons, and one or two more coming, the daemon's main
function started becoming too much cut&pasted code. Collapsing most of
it in a daemon.GenericMain function. Some more code could be collapsed
between the two http-based daemons, but since the new daemons won't be
http-based we won't do it right now.
As a bonus a functionality for overriding the network port on the
command line for all network based nodes is added.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 23 Jul 2009 16:23:06 +0000 (17:23 +0100)]
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file,
but actually contain a daemon's name, because the various functions that
work with pidfiles abstract the filename from the daemon name
themselves. Removing the constants and using the actual daemon name
constants in their place.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 16:12:43 +0000 (17:12 +0100)]
Slightly abstract the daemon logfile lookup
The original LOG_<DAEMON_NAME> constants for daemon logfiles are gone.
In their place there is a DAEMONS_LOGFILES dict, indexed by daemon name.
This is a minor change with the objective to uniform most of the
daemon's main() functions code, which is very similar one to the other.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 13:11:54 +0000 (14:11 +0100)]
Move rapi to GetDaemonPort
Currently rapi is the only daemon which accepts a port option, rather
than querying its own port from services, and failing back to the
default if not found. Changing this to conform to what other daemons do.
Also update the ganeti-rapi(8) manpage
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 22 Jul 2009 16:57:26 +0000 (17:57 +0100)]
Change GetNodeDaemonPort to GetDaemonPort in utils
GetNodeDaemonPort is used to lookup the node daemon port in the services
file, and if not found to return the default one. We make it a generic
function, which accepts the daemon name in input, so that it can be used
by confd as well, to lookup its own udp port.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Fri, 24 Jul 2009 12:01:49 +0000 (14:01 +0200)]
Merge branch 'next' into branch-2.1
* next:
lvmstrap: Change diskinfo to use GenerateTable
Get rid of constants.RAPI_ENABLE
Remove references to utils.debug
ganeti-rapi, replace hardcoded exit value
Add the bind-address option to ganeti-rapi
noded: Abstract hard-coded sys.exit value
Add an example "ethers" hook
burnin: move batch init/commit into a decorator
burnin: move instance alive checks to a decorator
burnin: Implement retryable operations
Ignore vim swap files
burnin: fix removal errors hiding real errors
Stephen Shirley [Thu, 23 Jul 2009 17:14:20 +0000 (19:14 +0200)]
lvmstrap: Change diskinfo to use GenerateTable
This way the produced table is formatted nicely.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 13:41:02 +0000 (14:41 +0100)]
Get rid of constants.RAPI_ENABLE
This constant is unused, except in qa. Removing it since it's always True.
This patch also removes the unused qa_rapi.PrintRemoteAPIWarning
function, and removes a comment about temporary constants "until we have
cluster parameters".
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 23 Jul 2009 08:43:57 +0000 (10:43 +0200)]
cmdlib: Add __init__ to Tasklet class
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 23 Jul 2009 08:58:33 +0000 (09:58 +0100)]
Remove references to utils.debug
Various modules set it to True when called in debugging mode, but the
utils module supports no such global.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 07:55:53 +0000 (08:55 +0100)]
ganeti-rapi, replace hardcoded exit value
substitute exit(1) with exit(constants.EXIT_FAILURE).
Also fix a wrongly indented line.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 07:48:14 +0000 (08:48 +0100)]
Add the bind-address option to ganeti-rapi
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 21 Jul 2009 17:24:47 +0000 (19:24 +0200)]
cmdlib: Move LUMigrateInstance functionality to tasklet
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 21 Jul 2009 12:28:28 +0000 (14:28 +0200)]
gnt-node: Use new opcode to evacuate nodes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 22 Jul 2009 17:31:15 +0000 (19:31 +0200)]
Add new opcode to evacuate nodes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 21 Jul 2009 16:17:20 +0000 (18:17 +0200)]
cmdlib: Convert _DiskReplacer to tasklet
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 21 Jul 2009 15:45:45 +0000 (17:45 +0200)]
cmdlib: Function to get all secondary instances on a certain node
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Wed, 22 Jul 2009 17:07:54 +0000 (18:07 +0100)]
noded: Abstract hard-coded sys.exit value
On machines without the ssl file noded exists '5'.
Changing this to constants.EXIT_NOTCLUSTER.
Also utils.GetNodeDaemonPort hasn't risen errors.ConfigurationError for
a while, so removing that try/except block.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 22 Jul 2009 13:57:09 +0000 (15:57 +0200)]
cmdlib: Add tasklet support to logical unit base class
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 22 Jul 2009 11:01:20 +0000 (13:01 +0200)]
cmdlib: Add tasklet base class
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 21 Jul 2009 12:53:40 +0000 (13:53 +0100)]
Add an example "ethers" hook
This hook can be used to update /etc/ethers with instance's mac
addresses. A dhcp server on the nodes can then serve to the instances
their correct address. (This has been tested with dnsmasq's dhcp
implementation)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 16 Jul 2009 14:27:17 +0000 (16:27 +0200)]
ganeti-confd design doc
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 21 Jul 2009 09:55:19 +0000 (11:55 +0200)]
burnin: move batch init/commit into a decorator
Many burnin steps initialize the batch queue at the beginning and commit
it at the end of their operation. This patch moves this code to a
decorator, in order to reduce redundant code.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Tue, 21 Jul 2009 09:41:12 +0000 (11:41 +0200)]
burnin: move instance alive checks to a decorator
Many burn steps to a manual check of instance aliveness, via duplicate
code. This patch moves this code to a decorator.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 21 Jul 2009 08:53:27 +0000 (10:53 +0200)]
burnin: Implement retryable operations
Some burnin steps are idempotent: e.g. reinstalling an instance (from
burning p.o.v.) can be done multiple times without any side-effects that
would affect later burnin steps. As such, failing the whole burnin
process due a reinstall failure is undesirable.
This patch modifies burnin by marking each opcode (in case of individual
execution) and job set retryable or not. Retryable actions will be
retried up to a number of times, after which we give up and return
failure.
One side-effect is that in case of full-failure in retryable job sets we
lose the original exception (but we do log its string format), so we
have a little bit less information in this case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Thu, 16 Jul 2009 15:30:44 +0000 (17:30 +0200)]
Generate a shared HMAC key at cluster init time
This key is shared on all nodes (via cmdlib._RedistributeAncillaryFiles)
and will be used for HMAC authentication of confd messages.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Jul 2009 15:49:55 +0000 (17:49 +0200)]
Fix unittests broken by commit
2bb5c9115f
File "../test/ganeti.hooks_unittest.py", line 239, in setUp
self.lu = FakeLU(FakeProc(), self.op, self.context, None)
File "…/ganeti/cmdlib.py", line 92, in __init__
self.LogStep = processor.LogStep
AttributeError: FakeProc instance has no attribute 'LogStep'
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Jul 2009 15:38:01 +0000 (17:38 +0200)]
cmdlib: Move code doing disk replacements into separate class
This class will be used for a new opcode to evacuate nodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 14 Jul 2009 13:23:36 +0000 (15:23 +0200)]
cmdlib: Pass config and rpc objects directly to IAllocator
Before IAllocator would access them using “self.lu.cfg” and “self.lu.rpc”.
It shouldn't know about the internals of the LU.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Jul 2009 11:26:39 +0000 (13:26 +0200)]
Ignore vim swap files
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 20 Jul 2009 10:29:55 +0000 (12:29 +0200)]
Fix backend import errors from GetHypervisorClass
The merge of commit 360b0dc into branch-2.1 broke import of backend,
since it uses hypervisor.GetHypervisor() which returns an instance of
the hypervisor. Some of the hypervisors create directories at init time,
thus the import of backend failed due this chain if it's not done on a
(proper) ganeti node, such as during unittest time.
This patch adds in hypervisor a GetHypervisorClass() function, which
returns the class not the instance of the hypervisor, and uses that in
_BuildUploadFiles(). The existing GetHypervisor is then changed to use
this function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Sun, 19 Jul 2009 18:34:08 +0000 (20:34 +0200)]
burnin: fix removal errors hiding real errors
A long-standing bug in burnin makes errors during the removal phase
(e.g. because an import has failed, or because the initial creation has
failed) hide the original error.
This patch suppresses removal errors if we are already in ‘has_err’
mode, and otherwise it displays them normally.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 18:26:48 +0000 (20:26 +0200)]
Merge branch 'next' into branch-2.1
Conflicts:
lib/backend.py: non-trivial conflict but easy to solve
Iustin Pop [Sun, 19 Jul 2009 13:27:12 +0000 (15:27 +0200)]
backend: Only build once the list of upload files
The list of upload files is built currently at every UploadFile() call.
This patch moves it to a separate variable which is initialized only
once.
This won't make much difference but I regard it as cleanup.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 16:47:09 +0000 (18:47 +0200)]
Merge commit 'origin/next' into branch-2.1
Conflicts:
lib/cli.py: trivial extra empty line
Iustin Pop [Wed, 10 Jun 2009 15:37:51 +0000 (17:37 +0200)]
Fix gnt-instance reinstall
Commit
55efe6dabe48e5c37dc1ff6099e0bb8afde7a468 "Convert instance
reinstall to multi instance model" actually broke instance reinstall for
single-instance cases. This one-liner fixes it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
b6e243ab010d1df2b6c211b9edc9fe1978e52391)
Iustin Pop [Sun, 19 Jul 2009 14:40:57 +0000 (16:40 +0200)]
Fix a couple of epydoc warnings
It seems epydoc needs fully-qualified references, and doesn't deal with
relative ones (not even in the current module) if there are any
ambiguities.
There are other epydoc warnings, in the rapi docstrings, but those are
left as-is as they're removed in 2.1.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 01:45:45 +0000 (03:45 +0200)]
job queue: fix loss of finalized opcode result
Currently, unclean master daemon shutdown overwrites all of a job's
opcode status and result with error/None. This is incorrect, since the
any already finished opcode(s) should have their status and result
preserved, and only not-yet-processed opcodes should be marked as
‘error’. Cancelling jobs between opcodes does the same (but this is not
allowed currently by the code, so it's not as important as unclean
shutdown).
This patch adds a new _QueuedJob function that only overwrites the
status and result of finalized opcodes, which is then used in job queue
init and in the cancel job functions. The patch also adds some comments
and a new set constants in constants.py highlighting the finalized vs.
non-finalized opcode statuses.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sat, 18 Jul 2009 23:51:04 +0000 (01:51 +0200)]
Switch gnt-debug submit-job to JobExecutor
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor
uses the optimized SubmitManyJobs luxi call and as such should be used
whenever multiple jobs need to be submitted.
This patch converts gnt-debug submit-job to use it and also removes an
extra empty line in the JobExecutor class.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 22 May 2009 12:27:46 +0000 (14:27 +0200)]
Convert instance reinstall to multi instance model
This patch converts ‘gnt-instance reinstall’ from single-instance to
multi-instance model; since this is dangerours, it's required to pass
“--force --force-multiple” to skip the confirmation.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
55efe6dabe48e5c37dc1ff6099e0bb8afde7a468)
Iustin Pop [Fri, 22 May 2009 11:01:35 +0000 (13:01 +0200)]
gnt-instance batch-create: use the job executor
This small patch changed the batch create functionality to use the job
executor instead of single-job submits.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
d4dd4b74a786cd0f31e5fc530f140aaf438c68e7)
Iustin Pop [Fri, 22 May 2009 10:25:31 +0000 (12:25 +0200)]
Modify cli.JobExecutor to use SubmitManyJobs
This patch changes the generic "multiple job executor" to use the many
jobs submit model, which automatically makes all its users use the new
model.
This makes, for example, startup/shutdown of a full cluster much more
logical (all the submitted job IDs are visible fast, and then waiting
for them proceeds normally).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
23b4b983afc9b9e81d558f06e4e0cde53703e575)
Iustin Pop [Thu, 21 May 2009 16:02:42 +0000 (18:02 +0200)]
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patch
adds a new luxi call for multi-job submit; the advantage is that all the
jobs are added in the queue and only after the workers can start
processing them.
This is definitely faster than per-job submit, where the submission of
new jobs competes with the workers processing jobs.
On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
- 100 jobs:
- individual: submit time ~21s, processing time ~21s
- multiple: submit time 7-9s, processing time ~22s
- 250 jobs:
- individual: submit time ~56s, processing time ~57s
run 2: ~54s ~55s
- multiple: submit time ~20s, processing time ~51s
run 2: ~17s ~52s
which shows that we indeed gain on the client side, and maybe even on
the total processing time for a high number of jobs. For just 10 or so I
expect the difference to be just noise.
This will probably require increasing the timeout a little when
submitting too many jobs - 250 jobs at ~20 seconds is close to the
current rw timeout of 60s.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
2971c9132b8b798178921a389b18d893edec06fb)
Iustin Pop [Sun, 19 Jul 2009 02:12:11 +0000 (04:12 +0200)]
job queue: fix interrupted job processing
If a job with more than one opcodes is being processed, and the master
daemon crashes between two opcodes, we have the first N opcodes marked
successful, and the rest marked as queued. This means that the overall
jbo status is queued, and thus on master daemon restart it will be
resent for completion.
However, the RunTask() function in jqueue.py doesn't deal with
partially-completed jobs. This patch makes it simply skip such opcodes.
An alternative option would be to not mark partially-completed jobs as
QUEUED but instead RUNNING, which would result in aborting of the job at
restart time.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 02:01:16 +0000 (04:01 +0200)]
Fix an error path in job queue worker's RunTask
In case the job fails, we try to set the job's run_op_idx to -1.
However, this is a wrong variable, which wasn't detected until the
__slots__ addition. The correct variable is run_op_index.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 17 Jul 2009 15:16:51 +0000 (17:16 +0200)]
Add __slots__ on objects in jqueue
Adding slots to _QueuedOpCode decreases memory usage (of these objects)
by roughly four times. It is a lesser change for _QueuedJobs.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 17 Jul 2009 15:31:56 +0000 (17:31 +0200)]
Merge commit 'origin/next' into branch-2.1
* commit 'origin/next':
ganeti.initd: Pass $*_ARGS to programs when restarting them
Michael Hanselmann [Fri, 17 Jul 2009 15:09:33 +0000 (17:09 +0200)]
ganeti.initd: Pass $*_ARGS to programs when restarting them
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 17 Jul 2009 13:30:56 +0000 (15:30 +0200)]
Merge branch 'next' into branch-2.1
* next:
Optimizie OpCode loading
Yet another fallout from the pylint fixes
Iustin Pop [Fri, 17 Jul 2009 12:54:30 +0000 (14:54 +0200)]
Optimizie OpCode loading
This patch converts the opcode loading to a pre-built map (at import
time) instead of iteration over the globals dict at each call.
Microbenchmarks show that this should be around three times faster, and
burnin still passes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 17 Jul 2009 12:39:45 +0000 (14:39 +0200)]
Yet another fallout from the pylint fixes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Fri, 17 Jul 2009 11:41:21 +0000 (13:41 +0200)]
Merge branch 'next' into branch-2.1
* next:
Fix another issue with hypervisor_name change
Update NEWS and version for 2.0.2 release
Improve the description of node flags in man page
Add enabled hypervisors to TestConfigRunner
Add a few more checks to verify config
Make sure enabled_hypervisors list is valid
Change default stripe count to 1
Use full-stripe size in LVM growth
Remove ConfigWriter.InitConfig
RAPI: implement instance reinstall
Conflicts:
test/ganeti.config_unittest.py
529d13a43907dd3f5ab1814e52098f04c2a43f93 contained a small fix which
was also present in
066f465dbc53dd8ae80442dfe2592602be1ac231
Guido Trotter [Fri, 17 Jul 2009 11:40:17 +0000 (13:40 +0200)]
Merge branch 'master' into next
* master:
Update NEWS and version for 2.0.2 release
Improve the description of node flags in man page
Change default stripe count to 1
Use full-stripe size in LVM growth
RAPI: implement instance reinstall
Iustin Pop [Thu, 16 Jul 2009 16:30:57 +0000 (18:30 +0200)]
Fix another issue with hypervisor_name change
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 16 Jul 2009 13:41:36 +0000 (15:41 +0200)]
Update NEWS and version for 2.0.2 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Raiford Storey [Thu, 16 Jul 2009 16:49:18 +0000 (09:49 -0700)]
Improve the description of node flags in man page
[iustin@google.com: slightly reworded the explanation for offline and
changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 16 Jul 2009 13:48:53 +0000 (15:48 +0200)]
Add enabled hypervisors to TestConfigRunner
This parameter is now mandatory for the cluster config to work.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 16 Jul 2009 12:44:17 +0000 (14:44 +0200)]
Add a few more checks to verify config
- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 16 Jul 2009 12:02:42 +0000 (14:02 +0200)]
Make sure enabled_hypervisors list is valid
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 16:37:49 +0000 (18:37 +0200)]
Get rid of the default_hypervisor slot
Currently we have both a default_hypervisor and an enabled_hypervisors
list. The former is only settable at cluster init time, while the latter
can be changed with cluster modify.
This becomes cumbersome in a few ways: at cluster init time for example
if we pass in a list of enabled hypervisors which doesn't include the
"default" xen-pvm one, we're also forced to pass a default hypervisor,
or an error will be reported. It is also currently possible to disable
the default hypervisor in cluster-modify (with unknown results).
In order to avoid this we get rid of this field altogether, and define
the "first" enabled hypervisor as the default one. This allows ease of
changing which one is the default, and at the same time maintains
coherency.
At configuration upgrade we make sure that the old default is first in
the list, so that 2.0 cluster defaults are preserved.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 12:28:37 +0000 (14:28 +0200)]
design-2.1: Update OS Flavours section
This reflects a discussion we had, according to which the full
"parameters" implementation is too heavy weight for 2.1, and we should
have a partial version for now, and decide again later.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 16 Jul 2009 10:41:55 +0000 (12:41 +0200)]
Change default stripe count to 1
In order not to change the default during a stable series, we modify
configure.ac to default to one stripe, in effect keeping the status quo
(well, minus the LVM Attach() changes).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 14 Jul 2009 12:35:34 +0000 (14:35 +0200)]
cmdlib: Use dict.fromkeys instead of custom loop
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:56:48 +0000 (17:56 +0200)]
Simplify InitConfig and remove SimpleConfigWriter
InitConfig currently creates the cluster config_data, then puts it into
a dict, passes it to SimpleConfigWriter to load it from a dict (which
just reuses the dict value) and then saves it. The SimpleConfigWriter is
then returned, but ignored. With this patch we just write out the
config_data at InitConfig time, and thus can remove SimpleConfigWriter
altogether. The now unused SimpleConfigReader.FromDict is also gone.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:52:59 +0000 (17:52 +0200)]
InitCluster, don't use SimpleConfigWriter
InitConfig returns a SimpleConfigWriter to InitCluster, which then
passes it on to ssh.WriteKnownHostsFile, which extracts a couple of
values from it. One line later the full ConfigWriter is initialized.
By initializing it one line before we can pass the full writer to
ssh.WriteKnownHostsFile, and thus we don't need to care anymore for the
InitConfig returned SimpleConfigWriter
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 16 Jul 2009 10:43:56 +0000 (12:43 +0200)]
Fix python 2.4 compatibility
I got overexcited and forgot we have to remain compatible with python
2.4. With this patch we move from sha256 to sha1 for hmac authenticated
serialized messages, and we handle both newer and older python, by
importing the right module for each.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 13 Jul 2009 12:52:50 +0000 (14:52 +0200)]
Use full-stripe size in LVM growth
LVM has issues when growing stripped volumes, so it's best to specify
the growth in exact multiples of the full stripe size (as precise as
possible). For this we need to do a couple of changes:
- in LVM Attach(), we query additionally the VG extent size and the LV
stripe count; since this makes lvs return a (possibly) multi-line
output, we now split it into lines and only take the last one
- in LVM Grow(), we round up the increase in multiples of the full
stripe size
The patch also sets the correct target size in DRBD growth.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:47:03 +0000 (17:47 +0200)]
Remove ConfigWriter.InitConfig
It's been replaced by a simpler bootstrap.InitConfig function, which
does the same job, and is currently unused.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:15:43 +0000 (17:15 +0200)]
Merge branch 'next' into branch-2.1
* next:
Remove SimpleConfigWriter.SetMasterNode
_GenerateDiskTemplate: use base_index in the name
ganeti-masterd: avoid SimpleConfigReader
cmdlib: Fix typo in LUQueryClusterInfo
Conflicts:
daemons/ganeti-masterd
RPC related conflict
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:05:34 +0000 (17:05 +0200)]
Remove SimpleConfigWriter.SetMasterNode
This function is not used.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 14 Jul 2009 13:42:58 +0000 (15:42 +0200)]
_GenerateDiskTemplate: use base_index in the name
Currently if a disk is added later the base_index is not considered, and
all the disks are called disk0. This patch fixes it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 14 Jul 2009 12:09:21 +0000 (14:09 +0200)]
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
just get the master name we can use that. This is the only usage of
SimpleConfigReader currently, but we're not going to delete the class,
as new usages will come in for ganeti-confd (in 2.1). Using it there,
though, will make the class even more heavy to load, so it makes sense
for this simple usage to be converted.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Mon, 13 Jul 2009 13:40:06 +0000 (15:40 +0200)]
HMAC authenticated json messages
This patch includes HMAC authenticated json messages to the serializer.
The new interface works on any json-encodable data type, and can sign it
with a private key and an optional salt. The same private key must be
used upon message loading to verify the message.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Mon, 13 Jul 2009 13:48:29 +0000 (15:48 +0200)]
rapi: Implement /2/nodes/[node_name]/role resource
This resource can be used to retrieve and set the role of a node.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 13 Jul 2009 13:47:41 +0000 (15:47 +0200)]
rapi: Add generic “force” parameter
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 13 Jul 2009 13:55:56 +0000 (15:55 +0200)]
cmdlib: Fix typo in LUQueryClusterInfo
This was broken by my pylint fixes patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 13 Jul 2009 09:11:41 +0000 (11:11 +0200)]
RAPI: implement instance reinstall
This patch adds instance reinstall to RAPI, with two optional parameters:
- ‘os', in order to change the OS on reinstall
- ‘nostartup’, in order to leave the instance down after reinstall
The call will first shutdown the instance, the reinstall it, and unless
‘nostartup’ has been passed and is equal to 1, it will be started
automatically.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Tue, 7 Jul 2009 13:35:05 +0000 (15:35 +0200)]
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,
ganeti-masterd will be started with the new --no-voting --yes-do-it
options.
This new option is set to True only on masterfailover, when no_voting is
used. This changed the behavior from 2.0, where we didn't start the
master daemon at all, when this option was used.
The manpage is also updated to remove the 2.0 only change.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Wed, 8 Jul 2009 09:28:37 +0000 (11:28 +0200)]
Merge branch 'next' into branch-2.1
* next:
Create a new --no-voting option for masterfailover
ganeti-masterd: allow non-interactive --no-voting
Fix pylint warnings
Add custom pylintrc
bootstrap: Don't leak file descriptor when generating SSL certificate
Fix problem with EAGAIN on socket connection in clients
Fix some typos
Increase maximum accepted size for a DRBD meta dev
Cleanup config data when draining nodes
Fix node readd issues
backend.DemoteFromMC: don't fail for missing files
Allow GetMasterCandidateStats to ignore some nodes
Fix error message for extra files on non MC nodes
Conflicts:
lib/backend.py
Most of the conflicts where in the new rpcs VS pylint fixes
and usually the new rpcs fixed the pylint problems as well
lib/bootstrap.py
Small conflict between masterfailover --no-voting and new rpcs
lib/cmdlib.py
Net parameters conflicted here, kept that version
lib/objects.py
Same problem fixed in two different ways. 'next' version kept
Signed-off-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Wed, 8 Jul 2009 09:27:52 +0000 (11:27 +0200)]
Merge branch 'master' into next
* master:
Create a new --no-voting option for masterfailover
ganeti-masterd: allow non-interactive --no-voting
Guido Trotter [Wed, 8 Jul 2009 08:34:11 +0000 (10:34 +0200)]
Create a new --no-voting option for masterfailover
This allows failing over in certain corner cases, such as a 2 node
cluster with one node down. The man page is also updated to document
this dangerous option and how to recover from this situation.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 7 Jul 2009 13:23:38 +0000 (15:23 +0200)]
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a
--no-voting masterfailover.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 20:42:23 +0000 (22:42 +0200)]
Fix pylint warnings
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 20:41:01 +0000 (22:41 +0200)]
Add custom pylintrc
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 19:54:08 +0000 (21:54 +0200)]
bootstrap: Don't leak file descriptor when generating SSL certificate
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 2 Jul 2009 20:39:20 +0000 (22:39 +0200)]
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would return
EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 1 Jul 2009 21:28:35 +0000 (23:28 +0200)]
Fix some typos
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 1 Jul 2009 09:08:13 +0000 (11:08 +0200)]
Increase maximum accepted size for a DRBD meta dev
With the change to stripped LVs, the actual size of a meta device (which
is small) can be more than we expected (for non-stripped LVs). This
patch increases from 160MB to 1GB the accepted size, and updates the
comment with the rationale behind this change.
Note that we do want even meta devices stripped, since it can increase
metadata update.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Tue, 30 Jun 2009 16:10:16 +0000 (18:10 +0200)]
Cleanup config data when draining nodes
Currently, when draining nodes we reset their master candidate flag, but
we don't instruct them to demote themselves. This leads to “ERROR: file
'/var/lib/ganeti/config.data' should not exist on non master candidates
(and the file is outdated)”.
This patch simply adds a call to node_demote_from_mc in this case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>