ganeti-local
Michael Hanselmann [Thu, 30 Jul 2009 15:52:49 +0000 (17:52 +0200)]
cmdlib: Add new opcode to migrate node

It migrates all primary instances from the node to their secondaries.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Jul 2009 16:00:31 +0000 (18:00 +0200)]
rapi: Add default parameter to _checkIntVariable

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Jul 2009 15:25:50 +0000 (17:25 +0200)]
cmdlib: Add logging for tasklets

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 30 Jul 2009 15:02:46 +0000 (17:02 +0200)]
cmdlib: Fix tasklets handling if no tasklets are added

If no tasklets are added, self.tasklets evaluates to None. The LU base
class will throw an exception because it thinks the derived class doesn't
implement the right methods.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Thu, 30 Jul 2009 10:46:23 +0000 (12:46 +0200)]
rapi: Add /2/[node_name]/evacuate resource

This can be used to evacuate a node.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Jul 2009 09:40:02 +0000 (11:40 +0200)]
Add information about storage units framework

This updates the 2.1 design document with storage units framework information.

Signed-off-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 29 Jul 2009 12:24:59 +0000 (14:24 +0200)]
Add RPC calls for storage unit list

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Jul 2009 10:49:53 +0000 (12:49 +0200)]
Add first implementation of generic storage unit framework

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 28 Jul 2009 17:29:47 +0000 (19:29 +0200)]
utils: Add functions to calc directory size and free space on filesystem

These will be used by the new storage unit framework.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Fri, 24 Jul 2009 13:31:34 +0000 (15:31 +0200)]
Build HTML from Ganeti 2.1 design

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Sat, 25 Jul 2009 18:04:03 +0000 (20:04 +0200)]
Collapse SSL key checking/overriding for daemons

Signed-off-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Thu, 23 Jul 2009 15:46:37 +0000 (16:46 +0100)]
Collapse daemon's main function

With three ganeti daemons, and one or two more coming, the daemon's main
function started becoming too much cut&pasted code. Collapsing most of
it in a daemon.GenericMain function. Some more code could be collapsed
between the two http-based daemons, but since the new daemons won't be
http-based we won't do it right now.

As a bonus a functionality for overriding the network port on the
command line for all network based nodes is added.

Signed-off-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Thu, 23 Jul 2009 16:23:06 +0000 (17:23 +0100)]
Remove <DAEMON>_PID constants

The <DAEMON>_PID constants were created to reference a daemon pid file,
but actually contain a daemon's name, because the various functions that
work with pidfiles abstract the filename from the daemon name
themselves. Removing the constants and using the actual daemon name
constants in their place.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 23 Jul 2009 16:12:43 +0000 (17:12 +0100)]
Slightly abstract the daemon logfile lookup

The original LOG_<DAEMON_NAME> constants for daemon logfiles are gone.
In their place there is a DAEMONS_LOGFILES dict, indexed by daemon name.

This is a minor change intended to make the code of the daemons' main()
functions, which are very similar to one another, more uniform.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
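
As a rough illustration of the change above, a table indexed by daemon name might look like the following sketch. Only the DAEMONS_LOGFILES name comes from the commit message; the daemon names, the log directory, and the file names are assumptions made for the example.

```python
import posixpath

# Assumed values for illustration only; the real ones live in Ganeti's constants.
LOG_DIR = "/var/log/ganeti"
NODED = "ganeti-noded"
RAPI = "ganeti-rapi"
MASTERD = "ganeti-masterd"

# One dict indexed by daemon name instead of one LOG_<DAEMON_NAME> constant each.
DAEMONS_LOGFILES = {
    NODED: posixpath.join(LOG_DIR, "node-daemon.log"),
    RAPI: posixpath.join(LOG_DIR, "rapi-daemon.log"),
    MASTERD: posixpath.join(LOG_DIR, "master-daemon.log"),
}


def get_logfile(daemon_name):
    """Return the logfile path for a daemon (hypothetical helper)."""
    return DAEMONS_LOGFILES[daemon_name]
```

With such a table, a generic main() only needs the daemon name to find its logfile.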

Guido Trotter [Thu, 23 Jul 2009 13:11:54 +0000 (14:11 +0100)]
Move rapi to GetDaemonPort

Currently rapi is the only daemon which accepts a port option, rather
than querying its own port from services, and falling back to the
default if not found. Changing this to conform to what other daemons do.

Also update the ganeti-rapi(8) manpage

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Wed, 22 Jul 2009 16:57:26 +0000 (17:57 +0100)]
Change GetNodeDaemonPort to GetDaemonPort in utils

GetNodeDaemonPort is used to look up the node daemon port in the services
file and, if it is not found, to return the default one. We make it a generic
function, which accepts the daemon name as input, so that it can be used
by confd as well, to look up its own UDP port.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
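
The lookup-with-fallback behaviour described above can be sketched roughly as follows. This is not the actual utils.GetDaemonPort code: the function name, the DEFAULT_PORTS table, and the port numbers in it are assumptions for the example. The services lookup itself uses the standard socket.getservbyname call.

```python
import socket

# Assumed defaults for illustration; the real values live in Ganeti's constants.
DEFAULT_PORTS = {
    "ganeti-noded": ("tcp", 1811),
    "ganeti-confd": ("udp", 1814),
}


def get_daemon_port(daemon_name, lookup=socket.getservbyname):
    """Return the port for daemon_name, preferring the services database."""
    proto, default_port = DEFAULT_PORTS[daemon_name]
    try:
        return lookup(daemon_name, proto)
    except OSError:
        # The service is not listed (e.g. in /etc/services): use the default.
        return default_port
```

The lookup parameter exists only so the fallback can be exercised without editing the local services file.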

Guido Trotter [Fri, 24 Jul 2009 12:01:49 +0000 (14:01 +0200)]
Merge branch 'next' into branch-2.1

* next:
  lvmstrap: Change diskinfo to use GenerateTable
  Get rid of constants.RAPI_ENABLE
  Remove references to utils.debug
  ganeti-rapi, replace hardcoded exit value
  Add the bind-address option to ganeti-rapi
  noded: Abstract hard-coded sys.exit value
  Add an example "ethers" hook
  burnin: move batch init/commit into a decorator
  burnin: move instance alive checks to a decorator
  burnin: Implement retryable operations
  Ignore vim swap files
  burnin: fix removal errors hiding real errors

Stephen Shirley [Thu, 23 Jul 2009 17:14:20 +0000 (19:14 +0200)]
lvmstrap: Change diskinfo to use GenerateTable

This way the produced table is formatted nicely.

Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 23 Jul 2009 13:41:02 +0000 (14:41 +0100)]
Get rid of constants.RAPI_ENABLE

This constant is unused, except in qa. Removing it since it's always True.

This patch also removes the unused qa_rapi.PrintRemoteAPIWarning
function, and removes a comment about temporary constants "until we have
cluster parameters".

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 23 Jul 2009 08:43:57 +0000 (10:43 +0200)]
cmdlib: Add __init__ to Tasklet class

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Thu, 23 Jul 2009 08:58:33 +0000 (09:58 +0100)]
Remove references to utils.debug

Various modules set it to True when called in debugging mode, but the
utils module supports no such global.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 23 Jul 2009 07:55:53 +0000 (08:55 +0100)]
ganeti-rapi, replace hardcoded exit value

Substitute exit(1) with exit(constants.EXIT_FAILURE).
Also fix a wrongly indented line.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 23 Jul 2009 07:48:14 +0000 (08:48 +0100)]
Add the bind-address option to ganeti-rapi

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 21 Jul 2009 17:24:47 +0000 (19:24 +0200)]
cmdlib: Move LUMigrateInstance functionality to tasklet

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Tue, 21 Jul 2009 12:28:28 +0000 (14:28 +0200)]
gnt-node: Use new opcode to evacuate nodes

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 22 Jul 2009 17:31:15 +0000 (19:31 +0200)]
Add new opcode to evacuate nodes

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Tue, 21 Jul 2009 16:17:20 +0000 (18:17 +0200)]
cmdlib: Convert _DiskReplacer to tasklet

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Tue, 21 Jul 2009 15:45:45 +0000 (17:45 +0200)]
cmdlib: Function to get all secondary instances on a certain node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Wed, 22 Jul 2009 17:07:54 +0000 (18:07 +0100)]
noded: Abstract hard-coded sys.exit value

On machines without the ssl file noded exits with '5'.
Changing this to constants.EXIT_NOTCLUSTER.

Also utils.GetNodeDaemonPort hasn't raised errors.ConfigurationError for
a while, so removing that try/except block.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Wed, 22 Jul 2009 13:57:09 +0000 (15:57 +0200)]
cmdlib: Add tasklet support to logical unit base class

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Wed, 22 Jul 2009 11:01:20 +0000 (13:01 +0200)]
cmdlib: Add tasklet base class

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Tue, 21 Jul 2009 12:53:40 +0000 (13:53 +0100)]
Add an example "ethers" hook

This hook can be used to update /etc/ethers with instances' MAC
addresses. A DHCP server on the nodes can then serve the instances
their correct addresses. (This has been tested with dnsmasq's DHCP
implementation.)

Signed-off-by: Guido Trotter <ultrotter@google.com>

Guido Trotter [Thu, 16 Jul 2009 14:27:17 +0000 (16:27 +0200)]
ganeti-confd design doc

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 21 Jul 2009 09:55:19 +0000 (11:55 +0200)]
burnin: move batch init/commit into a decorator

Many burnin steps initialize the batch queue at the beginning and commit
it at the end of their operation. This patch moves this code to a
decorator, in order to reduce redundant code.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
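
The decorator idea described above can be sketched as follows. The names batched, Burner, start_batch, and commit_queue are hypothetical and do not reflect the actual burnin code; the sketch only shows how batch setup and commit can be pulled out of each step.

```python
import functools


def batched(fn):
    """Initialize the batch queue before a step and commit it afterwards."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.start_batch()
        result = fn(self, *args, **kwargs)
        self.commit_queue()
        return result
    return wrapper


class Burner(object):
    """Toy stand-in for the burnin driver (hypothetical)."""

    def __init__(self):
        self.queue = []
        self.committed = []

    def start_batch(self):
        self.queue = []

    def commit_queue(self):
        self.committed.extend(self.queue)
        self.queue = []

    @batched
    def burn_failover(self, instances):
        # The step only queues work; the decorator handles init and commit.
        for name in instances:
            self.queue.append(("failover", name))
```

Each step then contains only the code that differs between steps.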

Iustin Pop [Tue, 21 Jul 2009 09:41:12 +0000 (11:41 +0200)]
burnin: move instance alive checks to a decorator

Many burnin steps do a manual check of instance aliveness, via duplicate
code. This patch moves this code to a decorator.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 21 Jul 2009 08:53:27 +0000 (10:53 +0200)]
burnin: Implement retryable operations

Some burnin steps are idempotent: e.g. reinstalling an instance (from
burnin's p.o.v.) can be done multiple times without any side-effects that
would affect later burnin steps. As such, failing the whole burnin
process due to a reinstall failure is undesirable.

This patch modifies burnin by marking each opcode (in case of individual
execution) and job set retryable or not. Retryable actions will be
retried up to a number of times, after which we give up and return
failure.

One side-effect is that in case of full-failure in retryable job sets we
lose the original exception (but we do log its string format), so we
have a little bit less information in this case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
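
A minimal sketch of the retry behaviour described above, assuming a hypothetical run_retryable helper and a fixed maximum number of tries (the actual burnin code and its retry count are not shown in the commit message). As in the description, the original exception object is dropped on final failure and only its string form is logged and reported.

```python
def run_retryable(fn, retryable, max_tries=3, log=print):
    """Call fn(), retrying on failure if the operation is marked retryable."""
    tries = max_tries if retryable else 1
    last_msg = None
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except Exception as err:
            # Keep only the string form of the error, and log it.
            last_msg = str(err)
            log("attempt %d/%d failed: %s" % (attempt, tries, last_msg))
    raise RuntimeError("giving up after %d attempt(s): %s" % (tries, last_msg))
```

Non-retryable operations get exactly one attempt, so their failures still abort the run immediately.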

Guido Trotter [Thu, 16 Jul 2009 15:30:44 +0000 (17:30 +0200)]
Generate a shared HMAC key at cluster init time

This key is shared on all nodes (via cmdlib._RedistributeAncillaryFiles)
and will be used for HMAC authentication of confd messages.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Jul 2009 15:49:55 +0000 (17:49 +0200)]
Fix unittests broken by commit 2bb5c9115f

File "../test/ganeti.hooks_unittest.py", line 239, in setUp
  self.lu = FakeLU(FakeProc(), self.op, self.context, None)
File "…/ganeti/cmdlib.py", line 92, in __init__
  self.LogStep = processor.LogStep
AttributeError: FakeProc instance has no attribute 'LogStep'

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Jul 2009 15:38:01 +0000 (17:38 +0200)]
cmdlib: Move code doing disk replacements into separate class

This class will be used for a new opcode to evacuate nodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 14 Jul 2009 13:23:36 +0000 (15:23 +0200)]
cmdlib: Pass config and rpc objects directly to IAllocator

Before IAllocator would access them using “self.lu.cfg” and “self.lu.rpc”.
It shouldn't know about the internals of the LU.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Jul 2009 11:26:39 +0000 (13:26 +0200)]
Ignore vim swap files

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Mon, 20 Jul 2009 10:29:55 +0000 (12:29 +0200)]
Fix backend import errors from GetHypervisorClass

The merge of commit 360b0dc into branch-2.1 broke import of backend,
since it uses hypervisor.GetHypervisor() which returns an instance of
the hypervisor. Some of the hypervisors create directories at init time,
thus the import of backend failed due to this chain if it's not done on a
(proper) ganeti node, such as during unittest time.

This patch adds in hypervisor a GetHypervisorClass() function, which
returns the class not the instance of the hypervisor, and uses that in
_BuildUploadFiles(). The existing GetHypervisor is then changed to use
this function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
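
The class-versus-instance split described above can be sketched like this. FakeHypervisor, the registry dict, and the snake_case function names are hypothetical stand-ins; the point is only that looking up the class does not trigger the side effects of __init__.

```python
class FakeHypervisor(object):
    """Stand-in hypervisor whose constructor has side effects (hypothetical)."""
    created = 0

    def __init__(self):
        # In the real code this might create directories on the node.
        FakeHypervisor.created += 1


# Hypothetical registry mapping hypervisor names to classes.
_HYPERVISOR_MAP = {"fake": FakeHypervisor}


def get_hypervisor_class(ht_kind):
    """Return the hypervisor class without instantiating it."""
    if ht_kind not in _HYPERVISOR_MAP:
        raise ValueError("Unknown hypervisor type %r" % ht_kind)
    return _HYPERVISOR_MAP[ht_kind]


def get_hypervisor(ht_kind):
    """Return a hypervisor instance, built on top of the class lookup."""
    return get_hypervisor_class(ht_kind)()
```

Code that runs at import time can use the class lookup, while callers that need a working hypervisor keep using the instance-returning function.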

Iustin Pop [Sun, 19 Jul 2009 18:34:08 +0000 (20:34 +0200)]
burnin: fix removal errors hiding real errors

A long-standing bug in burnin makes errors during the removal phase
(e.g. because an import has failed, or because the initial creation has
failed) hide the original error.

This patch suppresses removal errors if we are already in ‘has_err’
mode, and otherwise it displays them normally.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
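
One way to picture the behaviour described above is the sketch below. The remove_instances helper and its parameters are hypothetical, and logging the suppressed error is an assumption of the example rather than something the commit message states.

```python
def remove_instances(names, remove_fn, has_err, log=print):
    """Remove instances; hide removal errors only if an earlier step failed."""
    for name in names:
        try:
            remove_fn(name)
        except Exception as err:
            if not has_err:
                # No earlier failure: this error is the real one, so show it.
                raise
            # An earlier error already exists; do not let this one hide it.
            log("ignoring removal error for %s: %s" % (name, err))
```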

Iustin Pop [Sun, 19 Jul 2009 18:26:48 +0000 (20:26 +0200)]
Merge branch 'next' into branch-2.1

Conflicts:
lib/backend.py: non-trivial conflict but easy to solve

Iustin Pop [Sun, 19 Jul 2009 13:27:12 +0000 (15:27 +0200)]
backend: Only build once the list of upload files

The list of upload files is built currently at every UploadFile() call.
This patch moves it to a separate variable which is initialized only
once.

This won't make much difference but I regard it as cleanup.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Sun, 19 Jul 2009 16:47:09 +0000 (18:47 +0200)]
Merge commit 'origin/next' into branch-2.1

Conflicts:
lib/cli.py: trivial extra empty line

Iustin Pop [Wed, 10 Jun 2009 15:37:51 +0000 (17:37 +0200)]
Fix gnt-instance reinstall

Commit 55efe6dabe48e5c37dc1ff6099e0bb8afde7a468 "Convert instance
reinstall to multi instance model" actually broke instance reinstall for
single-instance cases. This one-liner fixes it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit b6e243ab010d1df2b6c211b9edc9fe1978e52391)

Iustin Pop [Sun, 19 Jul 2009 14:40:57 +0000 (16:40 +0200)]
Fix a couple of epydoc warnings

It seems epydoc needs fully-qualified references, and doesn't deal with
relative ones (not even in the current module) if there are any
ambiguities.

There are other epydoc warnings, in the rapi docstrings, but those are
left as-is as they're removed in 2.1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Sun, 19 Jul 2009 01:45:45 +0000 (03:45 +0200)]
job queue: fix loss of finalized opcode result

Currently, unclean master daemon shutdown overwrites all of a job's
opcode status and result with error/None. This is incorrect, since
any already finished opcode(s) should have their status and result
preserved, and only not-yet-processed opcodes should be marked as
‘error’. Cancelling jobs between opcodes does the same (but this is not
allowed currently by the code, so it's not as important as unclean
shutdown).

This patch adds a new _QueuedJob function that only overwrites the
status and result of non-finalized opcodes, which is then used in job queue
init and in the cancel job functions. The patch also adds some comments
and a new set of constants in constants.py highlighting the finalized vs.
non-finalized opcode statuses.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
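
The intended behaviour, where already finished opcodes keep their status and result, can be sketched as follows. The status strings, the OPS_FINALIZED set, and the QueuedOpCode and mark_unfinished_ops names are assumptions for the example, not the actual jqueue or constants code.

```python
# Assumed status values for illustration.
OP_STATUS_QUEUED = "queued"
OP_STATUS_RUNNING = "running"
OP_STATUS_CANCELED = "canceled"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"

# Statuses after which an opcode's result must no longer change.
OPS_FINALIZED = frozenset([OP_STATUS_CANCELED,
                           OP_STATUS_SUCCESS,
                           OP_STATUS_ERROR])


class QueuedOpCode(object):
    def __init__(self, status, result=None):
        self.status = status
        self.result = result


def mark_unfinished_ops(ops, status, result):
    """Overwrite status/result only for opcodes that are not finalized."""
    for op in ops:
        if op.status in OPS_FINALIZED:
            continue
        op.status = status
        op.result = result
```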

Iustin Pop [Sat, 18 Jul 2009 23:51:04 +0000 (01:51 +0200)]
Switch gnt-debug submit-job to JobExecutor

Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor
uses the optimized SubmitManyJobs luxi call and as such should be used
whenever multiple jobs need to be submitted.

This patch converts gnt-debug submit-job to use it and also removes an
extra empty line in the JobExecutor class.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Fri, 22 May 2009 12:27:46 +0000 (14:27 +0200)]
Convert instance reinstall to multi instance model

This patch converts ‘gnt-instance reinstall’ from single-instance to
multi-instance model; since this is dangerous, it's required to pass
“--force --force-multiple” to skip the confirmation.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 55efe6dabe48e5c37dc1ff6099e0bb8afde7a468)

Iustin Pop [Fri, 22 May 2009 11:01:35 +0000 (13:01 +0200)]
gnt-instance batch-create: use the job executor

This small patch changed the batch create functionality to use the job
executor instead of single-job submits.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit d4dd4b74a786cd0f31e5fc530f140aaf438c68e7)

Iustin Pop [Fri, 22 May 2009 10:25:31 +0000 (12:25 +0200)]
Modify cli.JobExecutor to use SubmitManyJobs

This patch changes the generic "multiple job executor" to use the many
jobs submit model, which automatically makes all its users use the new
model.

This makes, for example, startup/shutdown of a full cluster much more
logical (all the submitted job IDs are visible fast, and then waiting
for them proceeds normally).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 23b4b983afc9b9e81d558f06e4e0cde53703e575)

Iustin Pop [Thu, 21 May 2009 16:02:42 +0000 (18:02 +0200)]
Add a luxi call for multi-job submit

As a workaround for the job submit timeouts that we have, this patch
adds a new luxi call for multi-job submit; the advantage is that all the
jobs are added in the queue and only afterwards can the workers start
processing them.

This is definitely faster than per-job submit, where the submission of
new jobs competes with the workers processing jobs.

On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
  - 100 jobs:
    - individual: submit time ~21s, processing time ~21s
    - multiple:   submit time 7-9s, processing time ~22s
  - 250 jobs:
    - individual: submit time ~56s, processing time ~57s
                  run 2:      ~54s                  ~55s
    - multiple:   submit time ~20s, processing time ~51s
                  run 2:      ~17s                  ~52s

which shows that we indeed gain on the client side, and maybe even on
the total processing time for a high number of jobs. For just 10 or so I
expect the difference to be just noise.

This will probably require increasing the timeout a little when
submitting too many jobs - 250 jobs at ~20 seconds is close to the
current rw timeout of 60s.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 2971c9132b8b798178921a389b18d893edec06fb)

Iustin Pop [Sun, 19 Jul 2009 02:12:11 +0000 (04:12 +0200)]
job queue: fix interrupted job processing

If a job with more than one opcode is being processed, and the master
daemon crashes between two opcodes, we have the first N opcodes marked
successful, and the rest marked as queued. This means that the overall
job status is queued, and thus on master daemon restart it will be
resent for completion.

However, the RunTask() function in jqueue.py doesn't deal with
partially-completed jobs. This patch makes it simply skip the opcodes
that have already completed.

An alternative option would be to not mark partially-completed jobs as
QUEUED but instead RUNNING, which would result in aborting of the job at
restart time.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
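
A minimal sketch of resuming a partially-completed job, assuming opcodes are plain dicts with hypothetical "status", "input", and "result" keys (the real RunTask() works on job queue objects and is not reproduced here):

```python
def run_job(ops, execute):
    """Run a job's opcodes in order, skipping those that already succeeded."""
    for op in ops:
        if op["status"] == "success":
            # Completed before the interruption: keep its result untouched.
            continue
        op["status"] = "running"
        op["result"] = execute(op["input"])
        op["status"] = "success"
```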

Iustin Pop [Sun, 19 Jul 2009 02:01:16 +0000 (04:01 +0200)]
Fix an error path in job queue worker's RunTask

In case the job fails, we try to set the job's run_op_idx to -1.
However, this is a wrong variable, which wasn't detected until the
__slots__ addition. The correct variable is run_op_index.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Iustin Pop [Fri, 17 Jul 2009 15:16:51 +0000 (17:16 +0200)]
Add __slots__ on objects in jqueue

Adding slots to _QueuedOpCode decreases memory usage (of these objects)
by roughly four times. It is a lesser change for _QueuedJobs.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
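
For readers unfamiliar with __slots__, the sketch below shows the two effects relevant here, using hypothetical classes rather than the real _QueuedOpCode: instances carry no per-object __dict__, which is where the memory saving comes from, and assigning to an attribute that is not declared raises an error instead of silently creating it. The example makes no claim about the exact memory figures.

```python
class OpWithDict(object):
    """Regular class: every instance carries its own attribute dict."""

    def __init__(self, op_input):
        self.input = op_input
        self.status = "queued"
        self.result = None


class OpWithSlots(object):
    """Same attributes, but stored in fixed slots instead of a dict."""
    __slots__ = ["input", "status", "result"]

    def __init__(self, op_input):
        self.input = op_input
        self.status = "queued"
        self.result = None
```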

Michael Hanselmann [Fri, 17 Jul 2009 15:31:56 +0000 (17:31 +0200)]
Merge commit 'origin/next' into branch-2.1

* commit 'origin/next':
  ganeti.initd: Pass $*_ARGS to programs when restarting them

Michael Hanselmann [Fri, 17 Jul 2009 15:09:33 +0000 (17:09 +0200)]
ganeti.initd: Pass $*_ARGS to programs when restarting them

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Fri, 17 Jul 2009 13:30:56 +0000 (15:30 +0200)]
Merge branch 'next' into branch-2.1

* next:
  Optimize OpCode loading
  Yet another fallout from the pylint fixes

Iustin Pop [Fri, 17 Jul 2009 12:54:30 +0000 (14:54 +0200)]
Optimize OpCode loading

This patch converts the opcode loading to a pre-built map (at import
time) instead of iteration over the globals dict at each call.

Microbenchmarks show that this should be around three times faster, and
burnin still passes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
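
The pre-built map can be pictured as below. The class names, the OP_ID values, and the load_opcode helper are hypothetical; the sketch only illustrates replacing a per-call scan with a dict that is built once when the module is imported.

```python
class OpCode(object):
    OP_ID = "OP_ABSTRACT"


class OpTestDelay(OpCode):
    OP_ID = "OP_TEST_DELAY"


class OpVerifyCluster(OpCode):
    OP_ID = "OP_CLUSTER_VERIFY"


# Built once at import time; each lookup is then a single dict access.
OP_MAPPING = dict((cls.OP_ID, cls) for cls in (OpTestDelay, OpVerifyCluster))


def load_opcode(op_id):
    """Instantiate the opcode class registered under op_id."""
    try:
        cls = OP_MAPPING[op_id]
    except KeyError:
        raise ValueError("Unknown opcode ID %r" % op_id)
    return cls()
```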

Iustin Pop [Fri, 17 Jul 2009 12:39:45 +0000 (14:39 +0200)]
Yet another fallout from the pylint fixes

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Guido Trotter [Fri, 17 Jul 2009 11:41:21 +0000 (13:41 +0200)]
Merge branch 'next' into branch-2.1

* next:
  Fix another issue with hypervisor_name change
  Update NEWS and version for 2.0.2 release
  Improve the description of node flags in man page
  Add enabled hypervisors to TestConfigRunner
  Add a few more checks to verify config
  Make sure enabled_hypervisors list is valid
  Change default stripe count to 1
  Use full-stripe size in LVM growth
  Remove ConfigWriter.InitConfig
  RAPI: implement instance reinstall

Conflicts:

  test/ganeti.config_unittest.py
    529d13a43907dd3f5ab1814e52098f04c2a43f93 contained a small fix which
    was also present in 066f465dbc53dd8ae80442dfe2592602be1ac231

Guido Trotter [Fri, 17 Jul 2009 11:40:17 +0000 (13:40 +0200)]
Merge branch 'master' into next

* master:
  Update NEWS and version for 2.0.2 release
  Improve the description of node flags in man page
  Change default stripe count to 1
  Use full-stripe size in LVM growth
  RAPI: implement instance reinstall

Iustin Pop [Thu, 16 Jul 2009 16:30:57 +0000 (18:30 +0200)]
Fix another issue with hypervisor_name change

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Tag: v2.0.2
Iustin Pop [Thu, 16 Jul 2009 13:41:36 +0000 (15:41 +0200)]
Update NEWS and version for 2.0.2 release

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Raiford Storey [Thu, 16 Jul 2009 16:49:18 +0000 (09:49 -0700)]
Improve the description of node flags in man page

[iustin@google.com: slightly reworded the explanation for offline and
changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Thu, 16 Jul 2009 13:48:53 +0000 (15:48 +0200)]
Add enabled hypervisors to TestConfigRunner

This parameter is now mandatory for the cluster config to work.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Guido Trotter [Thu, 16 Jul 2009 12:44:17 +0000 (14:44 +0200)]
Add a few more checks to verify config

- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Thu, 16 Jul 2009 12:02:42 +0000 (14:02 +0200)]
Make sure enabled_hypervisors list is valid

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Guido Trotter [Tue, 14 Jul 2009 16:37:49 +0000 (18:37 +0200)]
Get rid of the default_hypervisor slot

Currently we have both a default_hypervisor and an enabled_hypervisors
list. The former is only settable at cluster init time, while the latter
can be changed with cluster modify.

This becomes cumbersome in a few ways: at cluster init time for example
if we pass in a list of enabled hypervisors which doesn't include the
"default" xen-pvm one, we're also forced to pass a default hypervisor,
or an error will be reported. It is also currently possible to disable
the default hypervisor in cluster-modify (with unknown results).

In order to avoid this we get rid of this field altogether, and define
the "first" enabled hypervisor as the default one. This allows ease of
changing which one is the default, and at the same time maintains
coherency.

At configuration upgrade we make sure that the old default is first in
the list, so that 2.0 cluster defaults are preserved.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
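
The "first enabled hypervisor is the default" rule and the upgrade step can be sketched as follows. Both function names are hypothetical, and adding the old default to the list when it is missing is an assumption of this example, not something the commit message specifies.

```python
def get_default_hypervisor(enabled_hypervisors):
    """The default is simply the first entry of the enabled list."""
    return enabled_hypervisors[0]


def upgrade_enabled_hypervisors(enabled_hypervisors, old_default):
    """Return a new list with the old default moved to the front.

    Assumption: if old_default was not in the list, it is added.
    """
    rest = [h for h in enabled_hypervisors if h != old_default]
    return [old_default] + rest
```

For example, upgrading the list ["kvm", "xen-pvm"] with an old default of "xen-pvm" keeps both hypervisors enabled while preserving the 2.0 default.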

Guido Trotter [Tue, 14 Jul 2009 12:28:37 +0000 (14:28 +0200)]
design-2.1: Update OS Flavours section

This reflects a discussion we had, according to which the full
"parameters" implementation is too heavy weight for 2.1, and we should
have a partial version for now, and decide again later.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Thu, 16 Jul 2009 10:41:55 +0000 (12:41 +0200)]
Change default stripe count to 1

In order not to change the default during a stable series, we modify
configure.ac to default to one stripe, in effect keeping the status quo
(well, minus the LVM Attach() changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Tue, 14 Jul 2009 12:35:34 +0000 (14:35 +0200)]
cmdlib: Use dict.fromkeys instead of custom loop

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoSimplify InitConfig and remove SimpleConfigWriter
Guido Trotter [Tue, 14 Jul 2009 15:56:48 +0000 (17:56 +0200)]
Simplify InitConfig and remove SimpleConfigWriter

InitConfig currently creates the cluster config_data, then puts it into
a dict, passes it to SimpleConfigWriter to load it from a dict (which
just reuses the dict value) and then saves it. The SimpleConfigWriter is
then returned, but ignored. With this patch we just write out the
config_data at InitConfig time, and thus can remove SimpleConfigWriter
altogether. The now unused SimpleConfigReader.FromDict is also gone.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoInitCluster, don't use SimpleConfigWriter
Guido Trotter [Tue, 14 Jul 2009 15:52:59 +0000 (17:52 +0200)]
InitCluster, don't use SimpleConfigWriter

InitConfig returns a SimpleConfigWriter to InitCluster, which then
passes it on to ssh.WriteKnownHostsFile, which extracts a couple of
values from it. One line later the full ConfigWriter is initialized.

By initializing it one line before we can pass the full writer to
ssh.WriteKnownHostsFile, and thus we no longer need to care about the
SimpleConfigWriter returned by InitConfig.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoFix python 2.4 compatibility
Guido Trotter [Thu, 16 Jul 2009 10:43:56 +0000 (12:43 +0200)]
Fix python 2.4 compatibility

I got overexcited and forgot we have to remain compatible with python
2.4. With this patch we move from sha256 to sha1 for hmac authenticated
serialized messages, and we handle both newer and older python, by
importing the right module for each.
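
The import pattern described above can be sketched roughly as follows. The
alias name sha1_impl and the key and message values are assumptions for
illustration, not the names used in the patch.

```python
import hmac

try:
    # Python 2.5 and newer ship hashlib.
    from hashlib import sha1 as sha1_impl
except ImportError:
    # Python 2.4 only has the standalone sha module.
    import sha as sha1_impl

# hmac.new accepts either a constructor or a module with a new() function
# as its digest argument, so both branches can be passed through as is.
digest = hmac.new(b"private-key", b"message", sha1_impl).hexdigest()
print(len(digest))
```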

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoUse full-stripe size in LVM growth
Iustin Pop [Mon, 13 Jul 2009 12:52:50 +0000 (14:52 +0200)]
Use full-stripe size in LVM growth

LVM has issues when growing striped volumes, so it's best to specify
the growth in exact multiples of the full stripe size (as precise as
possible). For this we need to do a couple of changes:
  - in LVM Attach(), we query additionally the VG extent size and the LV
    stripe count; since this makes lvs return a (possibly) multi-line
    output, we now split it into lines and only take the last one
  - in LVM Grow(), we round up the increase to a multiple of the full
    stripe size
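
The rounding step can be sketched as below. This assumes the full stripe
size is the VG extent size multiplied by the LV stripe count, with all sizes
in MiB; the function and parameter names are hypothetical.

```python
def round_up_growth(amount, extent_size, stripe_count):
    """Round a requested growth up to a multiple of the full stripe size.

    Assumption: full stripe size = extent size * stripe count.
    """
    full_stripe = extent_size * stripe_count
    remainder = amount % full_stripe
    if remainder:
        amount += full_stripe - remainder
    return amount


# 4 MiB extents and 3 stripes give a 12 MiB full stripe.
print(round_up_growth(10, 4, 3))
```

A request that is already an exact multiple is left unchanged.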

The patch also sets the correct target size in DRBD growth.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

15 years agoRemove ConfigWriter.InitConfig
Guido Trotter [Tue, 14 Jul 2009 15:47:03 +0000 (17:47 +0200)]
Remove ConfigWriter.InitConfig

It has been replaced by the simpler bootstrap.InitConfig function, which
does the same job; the ConfigWriter method is currently unused.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoMerge branch 'next' into branch-2.1
Guido Trotter [Tue, 14 Jul 2009 15:15:43 +0000 (17:15 +0200)]
Merge branch 'next' into branch-2.1

* next:
  Remove SimpleConfigWriter.SetMasterNode
  _GenerateDiskTemplate: use base_index in the name
  ganeti-masterd: avoid SimpleConfigReader
  cmdlib: Fix typo in LUQueryClusterInfo

Conflicts:

daemons/ganeti-masterd
  RPC related conflict

Signed-off-by: Guido Trotter <ultrotter@google.com>

15 years agoRemove SimpleConfigWriter.SetMasterNode
Guido Trotter [Tue, 14 Jul 2009 15:05:34 +0000 (17:05 +0200)]
Remove SimpleConfigWriter.SetMasterNode

This function is not used.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

15 years ago_GenerateDiskTemplate: use base_index in the name
Guido Trotter [Tue, 14 Jul 2009 13:42:58 +0000 (15:42 +0200)]
_GenerateDiskTemplate: use base_index in the name

Currently if a disk is added later the base_index is not considered, and
all the disks are called disk0. This patch fixes it.
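
A simplified illustration of the fix is shown below. The naming scheme is
hypothetical; the real names generated by _GenerateDiskTemplate may be
formatted differently.

```python
def disk_names(disk_count, base_index=0):
    """Name new disks starting from base_index rather than always from 0."""
    return ["disk%d" % (base_index + i) for i in range(disk_count)]


# Adding one disk to an instance that already has two disks.
print(disk_names(1, base_index=2))
```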

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

15 years agoganeti-masterd: avoid SimpleConfigReader
Guido Trotter [Tue, 14 Jul 2009 12:09:21 +0000 (14:09 +0200)]
ganeti-masterd: avoid SimpleConfigReader

SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
just get the master name we can use that. This is the only usage of
SimpleConfigReader currently, but we're not going to delete the class,
as new usages will come in for ganeti-confd (in 2.1). Using it there,
though, will make the class even more heavy to load, so it makes sense
for this simple usage to be converted.

Signed-off-by: Guido Trotter <ultrotter@google.com>

15 years agoHMAC authenticated json messages
Guido Trotter [Mon, 13 Jul 2009 13:40:06 +0000 (15:40 +0200)]
HMAC authenticated json messages

This patch adds HMAC authenticated json messages to the serializer.
The new interface works on any json-encodable data type, and can sign it
with a private key and an optional salt. The same private key must be
used upon message loading to verify the message.
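
The general idea can be sketched as follows. This is not the interface added
by the patch: the function names, the envelope field names and the choice of
SHA-1 are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _sign(key, salt, payload):
    """Compute the hex HMAC of salt + payload with the private key."""
    data = (salt + payload).encode("utf-8")
    return hmac.new(key, data, hashlib.sha1).hexdigest()


def dump_signed(data, key, salt=""):
    """Serialize data to JSON and wrap it with its HMAC and salt."""
    payload = json.dumps(data, sort_keys=True)
    envelope = {"msg": payload, "salt": salt, "hmac": _sign(key, salt, payload)}
    return json.dumps(envelope)


def load_signed(text, key):
    """Verify the HMAC with the same key, then return (data, salt)."""
    envelope = json.loads(text)
    expected = _sign(key, envelope["salt"], envelope["msg"])
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("HMAC verification failed")
    return json.loads(envelope["msg"]), envelope["salt"]


text = dump_signed({"answer": 42}, b"private-key", salt="s1")
print(load_signed(text, b"private-key"))
```

Loading with a different key raises ValueError, since the recomputed HMAC no
longer matches the one stored in the envelope.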

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

15 years agorapi: Implement /2/nodes/[node_name]/role resource
Michael Hanselmann [Mon, 13 Jul 2009 13:48:29 +0000 (15:48 +0200)]
rapi: Implement /2/nodes/[node_name]/role resource

This resource can be used to retrieve and set the role of a node.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agorapi: Add generic “force” parameter
Michael Hanselmann [Mon, 13 Jul 2009 13:47:41 +0000 (15:47 +0200)]
rapi: Add generic “force” parameter

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agocmdlib: Fix typo in LUQueryClusterInfo
Michael Hanselmann [Mon, 13 Jul 2009 13:55:56 +0000 (15:55 +0200)]
cmdlib: Fix typo in LUQueryClusterInfo

This was broken by my pylint fixes patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoRAPI: implement instance reinstall
Iustin Pop [Mon, 13 Jul 2009 09:11:41 +0000 (11:11 +0200)]
RAPI: implement instance reinstall

This patch adds instance reinstall to RAPI, with two optional parameters:
  - ‘os’, in order to change the OS on reinstall
  - ‘nostartup’, in order to leave the instance down after reinstall

The call will first shut down the instance, then reinstall it, and unless
‘nostartup’ has been passed and is equal to 1, it will be started
automatically.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

15 years agoExtend call_node_start_master rpc with no_voting
Guido Trotter [Tue, 7 Jul 2009 13:35:05 +0000 (15:35 +0200)]
Extend call_node_start_master rpc with no_voting

When the parameter is set to True and start_daemons is also True,
ganeti-masterd will be started with the new --no-voting --yes-do-it
options.
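
The decision described above can be sketched roughly like this. The helper
is hypothetical and only shows how the two flags combine; it does not start
any daemon.

```python
def masterd_args(start_daemons, no_voting):
    """Build the ganeti-masterd command line, or None if not started.

    Hypothetical helper; only the two option names come from the
    commit message above.
    """
    if not start_daemons:
        return None
    args = ["ganeti-masterd"]
    if no_voting:
        args += ["--no-voting", "--yes-do-it"]
    return args


print(masterd_args(True, True))
```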

This new option is set to True only on masterfailover, when no_voting is
used. This changes the behavior from 2.0, where we didn't start the
master daemon at all, when this option was used.

The manpage is also updated to remove the 2.0 only change.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoMerge branch 'next' into branch-2.1
Guido Trotter [Wed, 8 Jul 2009 09:28:37 +0000 (11:28 +0200)]
Merge branch 'next' into branch-2.1

* next:
  Create a new --no-voting option for masterfailover
  ganeti-masterd: allow non-interactive --no-voting
  Fix pylint warnings
  Add custom pylintrc
  bootstrap: Don't leak file descriptor when generating SSL certificate
  Fix problem with EAGAIN on socket connection in clients
  Fix some typos
  Increase maximum accepted size for a DRBD meta dev
  Cleanup config data when draining nodes
  Fix node readd issues
  backend.DemoteFromMC: don't fail for missing files
  Allow GetMasterCandidateStats to ignore some nodes
  Fix error message for extra files on non MC nodes

Conflicts:

lib/backend.py
          Most of the conflicts were in the new rpcs VS pylint fixes
          and usually the new rpcs fixed the pylint problems as well
lib/bootstrap.py
          Small conflict between masterfailover --no-voting and new rpcs
lib/cmdlib.py
          Net parameters conflicted here, kept that version
lib/objects.py
          Same problem fixed in two different ways. 'next' version kept

Signed-off-by: Guido Trotter <ultrotter@google.com>

15 years agoMerge branch 'master' into next
Guido Trotter [Wed, 8 Jul 2009 09:27:52 +0000 (11:27 +0200)]
Merge branch 'master' into next

* master:
  Create a new --no-voting option for masterfailover
  ganeti-masterd: allow non-interactive --no-voting

15 years agoCreate a new --no-voting option for masterfailover
Guido Trotter [Wed, 8 Jul 2009 08:34:11 +0000 (10:34 +0200)]
Create a new --no-voting option for masterfailover

This allows failing over in certain corner cases, such as a 2 node
cluster with one node down. The man page is also updated to document
this dangerous option and how to recover from this situation.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoganeti-masterd: allow non-interactive --no-voting
Guido Trotter [Tue, 7 Jul 2009 13:23:38 +0000 (15:23 +0200)]
ganeti-masterd: allow non-interactive --no-voting

This will be used by ganeti-noded to start ganeti-masterd in a
--no-voting masterfailover.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoFix pylint warnings
Michael Hanselmann [Fri, 3 Jul 2009 20:42:23 +0000 (22:42 +0200)]
Fix pylint warnings

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoAdd custom pylintrc
Michael Hanselmann [Fri, 3 Jul 2009 20:41:01 +0000 (22:41 +0200)]
Add custom pylintrc

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agobootstrap: Don't leak file descriptor when generating SSL certificate
Michael Hanselmann [Fri, 3 Jul 2009 19:54:08 +0000 (21:54 +0200)]
bootstrap: Don't leak file descriptor when generating SSL certificate

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoFix problem with EAGAIN on socket connection in clients
Michael Hanselmann [Thu, 2 Jul 2009 20:39:20 +0000 (22:39 +0200)]
Fix problem with EAGAIN on socket connection in clients

If a user used ^Z to stop the program, poll() in socket.recv would return
EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
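
A minimal sketch of the retry idea follows. It is not the actual
luxi.Transport.Recv code; the function name is hypothetical, and retrying
immediately without a timeout or delay is a simplification.

```python
import errno


def recv_ignoring_eagain(recv, bufsize):
    """Call recv(bufsize), retrying whenever it fails with EAGAIN."""
    while True:
        try:
            return recv(bufsize)
        except OSError as err:
            # Any error other than EAGAIN is still propagated.
            if err.errno != errno.EAGAIN:
                raise
```

With a real socket this would be called as
recv_ignoring_eagain(sock.recv, 4096).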

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoFix some typos
Michael Hanselmann [Wed, 1 Jul 2009 21:28:35 +0000 (23:28 +0200)]
Fix some typos

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

15 years agoIncrease maximum accepted size for a DRBD meta dev
Iustin Pop [Wed, 1 Jul 2009 09:08:13 +0000 (11:08 +0200)]
Increase maximum accepted size for a DRBD meta dev

With the change to striped LVs, the actual size of a meta device (which
is small) can be more than we expected (for non-striped LVs). This
patch increases from 160MB to 1GB the accepted size, and updates the
comment with the rationale behind this change.

Note that we do want even meta devices striped, since it can speed up
metadata updates.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

15 years agoCleanup config data when draining nodes
Iustin Pop [Tue, 30 Jun 2009 16:10:16 +0000 (18:10 +0200)]
Cleanup config data when draining nodes

Currently, when draining nodes we reset their master candidate flag, but
we don't instruct them to demote themselves. This leads to “ERROR: file
'/var/lib/ganeti/config.data' should not exist on non master candidates
(and the file is outdated)”.

This patch simply adds a call to node_demote_from_mc in this case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>