Iustin Pop [Fri, 14 Aug 2009 09:46:51 +0000 (11:46 +0200)]
Fix small typo in gnt-node
The iallocator option is '-I' not '-i'.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 14 Aug 2009 09:39:41 +0000 (11:39 +0200)]
Simplify handling of boolean args in rapi
This patch replaces hardcoded boolean-type args with
bool(_checkIntVariable). There should be no other cases of this left, I
think.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
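As an illustration of the boolean-argument handling in the rapi entry
above, here is a minimal sketch. It assumes query arguments arrive as a
dict mapping names to lists of strings; check_int_variable is a
hypothetical stand-in for _checkIntVariable, not the actual Ganeti
helper.

```python
def check_int_variable(query_args, name, default=0):
    """Return the integer value of a query argument (hypothetical helper).

    query_args maps argument names to lists of string values, as parsed
    from a URL query string; a missing argument yields the default.
    """
    values = query_args.get(name)
    if not values:
        return default
    try:
        return int(values[0])
    except ValueError:
        raise ValueError("Invalid value for parameter '%s'" % name)


def get_bool_arg(query_args, name):
    """Derive a boolean flag from an integer-valued query argument."""
    return bool(check_int_variable(query_args, name))
```

With this shape, every boolean argument goes through the same integer
parsing instead of being compared against hardcoded values.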
Iustin Pop [Fri, 14 Aug 2009 09:07:06 +0000 (11:07 +0200)]
Fix checks in LUSetNodeParms for the master node
There was already a check in the LU for the master node; however, it
wasn't correct. This patch disallows any role changes on the master node
via LUSetNodeParms (and as this LU can't change anything else, it
practically prevents it from touching the master node).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 13 Aug 2009 13:52:30 +0000 (15:52 +0200)]
Improve the example startup script
Currently, the supplied script has two issues:
- it doesn't use start-stop-daemon --start correctly, leading to
messages like "ganeti.errors.GenericError:
/var/run/ganeti/ganeti-rapi.pid contains a live process" in the logs
- it doesn't allow start/stop/restart of a single daemon, which leads
to manual launch, which is bad because we don't reuse the settings
from the defaults file
For the first one, we change from ‘--exec …’ to ‘--startas …’, which is
the actual option used for start, whereas exec is a test (that also
supplies the default to startas). We also add ‘--oknodo’ as per recent
Debian policy changes.
For the second, we do a bigger change; we basically remove the full-path
and pid variables, and construct these two from the daemon name. We then
check whether we are given a daemon name (in which case we act only on
that daemon); otherwise we perform the requested action on all daemons.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 13 Aug 2009 12:37:47 +0000 (14:37 +0200)]
Fix insserv dependencies
(import of a Debian patch)
This patch removes xend from the list of dependencies.
Ganeti doesn't need xend running to start up; it will only need it later
(and only if xen is used as virtualisation technology). It also removes
'Xen' from the description in the init script.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Mon, 8 Jun 2009 13:23:52 +0000 (14:23 +0100)]
Fix a typo in InitCluster
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit
022c3a0b36cb60644b6861ff27ad59202883963c)
Iustin Pop [Wed, 12 Aug 2009 12:22:41 +0000 (14:22 +0200)]
Ignore results from drained nodes in iallocator
Since drained nodes could be (partially or fully) broken in iallocator,
we ignore results from these nodes when building the cluster map in
preparation for sending it to the script.
This is a cheap change for the stable branch; ideally we should not
query them at all.
The patch also fixes a typo in iallocator.rst.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Mon, 10 Aug 2009 16:40:08 +0000 (17:40 +0100)]
Ship the ethers hook
doc/examples/hooks/ethers has been added without being shipped in the
released tarball. Putting a stop to this.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Luca Bigliardi [Mon, 10 Aug 2009 16:41:14 +0000 (17:41 +0100)]
Ethers hook, compatibility with old lockfile
Remove "-l" option since some ancient systems ship a version of lockfile-progs
not supporting it.
Signed-off-by: Luca Bigliardi <shammash@google.com>
Guido Trotter [Mon, 10 Aug 2009 15:30:21 +0000 (16:30 +0100)]
Remove a few unused imports from noded/masterd
Signed-off-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Thu, 14 May 2009 15:00:32 +0000 (17:00 +0200)]
Move HVM's device_model to a hypervisor parameter
This moves yet another hardcoded value to a hypervisor parameter. I
removed the 64/32 difference as it doesn't seem valid to me - it's more
of a local site config rather than arch config.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
09ea8710e47288e73746698c50f328e400c056c9)
Iustin Pop [Thu, 14 May 2009 12:58:32 +0000 (14:58 +0200)]
Implement the KERNEL_PATH parameter for xen-hvm
For the xen-hvm hypervisor, the KERNEL_PATH parameter is needed but
today is hardcoded to a constant in the xen hypervisor library (argh!).
This patch moves this to a hypervisor parameter with the default value
being the current hardcoded path. This will allow cluster/instance
customisation based on the installed xen version.
This should fix Debian bug #528618.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
e2ee1cea7709a2ef82153ec808c4fc3a5bce3ea1)
Guido Trotter [Thu, 28 May 2009 09:25:36 +0000 (10:25 +0100)]
Upgrade be/hv params with default values
From time to time we're adding new be or hv parameters. With this patch
missing parameters get set to the default value when loading the cluster
object. This patch version also considers the case when hv/be params
don't exist at all, and fixes a broken unit test triggered in that
case.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit
c1b42c18b914aa7ea650362ade7489448f71a523)
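The default-filling described in the be/hv entry above can be sketched
as follows. The function names and the sample parameter names are
illustrative assumptions, and the sketch handles a single flat
parameter dict rather than the real per-hypervisor structure.

```python
import copy


def fill_defaults(defaults, custom):
    """Return a copy of defaults overridden by the custom values."""
    result = copy.deepcopy(defaults)
    result.update(custom or {})
    return result


def upgrade_params(stored, defaults):
    """Fill in parameters missing from a stored configuration.

    stored may be None when the configuration predates the parameter
    group entirely; in that case all defaults are used.
    """
    return fill_defaults(defaults, stored)
```

Values already present in the stored configuration win; only missing
keys are taken from the defaults, and the defaults dict itself is not
modified.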
Guido Trotter [Thu, 28 May 2009 09:12:58 +0000 (10:12 +0100)]
Add cluster-init --no-etc-hosts parameter
If --no-etc-hosts is passed in at cluster init time we set a new
parameter in the cluster's object to false, and avoid adding nodes to
the hosts file. The UpgradeConfig function is used to set the value to
True, when upgrading from an old configuration version.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit
b86a6bcd476e420269a24a9b3d6289bebba69442)
Guido Trotter [Thu, 28 May 2009 09:10:11 +0000 (10:10 +0100)]
objects: add configuration upgrade system
Add a very basic configuration update mechanism to objects.
An object can define the UpgradeConfig method, which will be called at
init time, and use it to fill in missing defaults in the configuration.
In the future we may want to make it more complex, for example adding
the config version, but for now a basic solution will do.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit
560428be4f5852813972cd2791f425cf708ca7c6)
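A minimal sketch of the upgrade mechanism described in the entry above.
The classes are simplified stand-ins, not the Ganeti ones, and the
from_dict/upgrade_config names and the modify_etc_hosts attribute are
assumptions made for illustration.

```python
class ConfigObject(object):
    """Simplified stand-in for a configuration object."""
    __slots__ = []

    @classmethod
    def from_dict(cls, data):
        """Build an object from a dict, then let it upgrade itself."""
        obj = cls()
        for name in cls.__slots__:
            setattr(obj, name, data.get(name))
        obj.upgrade_config()
        return obj

    def upgrade_config(self):
        """Hook for subclasses; the default does nothing."""


class Cluster(ConfigObject):
    __slots__ = ["cluster_name", "modify_etc_hosts"]

    def upgrade_config(self):
        # Configurations written before the flag existed keep the old
        # behaviour, so a missing value is treated as True.
        if self.modify_etc_hosts is None:
            self.modify_etc_hosts = True
```

Loading an old configuration without the flag yields True, while an
explicit False written by a newer version is preserved.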
Guido Trotter [Fri, 7 Aug 2009 11:22:16 +0000 (12:22 +0100)]
Merge branch 'master' into next
* master:
Update NEWS and version for 2.0.3 release
devel/upload: revert rsync -p
export: add meaningful exit code
Fix detecting of errors in export
Implement gnt-cluster check-disk-sizes
rpc: add rpc call for getting disk size
bdev: Add function for reading actual disk size
Implement --ignore-size in activate-disks
Add ignore size support in _AssembleInstanceDisks
Add a objects.Disk.UnsetSize() method
bdev: allow ignoring of size in Assemble()
Fix instance import net option
Simplify the devel/upload script
Add a Copy method to object.ConfigObject
Extend call_node_start_master rpc with no_voting
Conflicts:
daemons/ganeti-masterd
s/SimpleConfigReader/SimpleStore/ VS start-master no-voting
(kept both)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 7 Aug 2009 09:23:16 +0000 (11:23 +0200)]
Update NEWS and version for 2.0.3 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 5 Aug 2009 23:02:17 +0000 (00:02 +0100)]
example ethers hook: use lockfile-progs
Rather than writing our own locking routine, use the one implemented by
the lockfile-create program.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Luca Bigliardi [Wed, 5 Aug 2009 17:58:42 +0000 (18:58 +0100)]
ethers hook lock: use logger not echo
Replace the debugging 'echo's with logger calls.
Signed-off-by: Luca Bigliardi <shammash@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Luca Bigliardi [Wed, 5 Aug 2009 17:48:20 +0000 (18:48 +0100)]
ethers hook: reduce the probability of data loss
The hook was exiting immediately if the lock was not acquired; it now
enters a timed loop to have more chances of acquiring the lock.
Signed-off-by: Luca Bigliardi <shammash@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 5 Aug 2009 13:29:16 +0000 (15:29 +0200)]
devel/upload: revert rsync -p
The permissions replication will also change the permissions on the /
and /usr directories, which is bad. This reverts it to the original
behaviour.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 5 Aug 2009 11:02:48 +0000 (13:02 +0200)]
export: add meaningful exit code
Currently ‘gnt-backup export’ always returns exit code zero, even in the
face of complete failure during backup (only failure to stop/start the
instance will cause job failure and thus non-zero exit code). This is
bad, since one cannot script the backup.
This patch adds some simple results from the LU so that the command line
script can return good exit code. It will:
- return zero for full success (snapshot removal errors are ignored
though)
- return one for full failure (finalize export failure or all disks
failure)
- return two for partial failure (some disks backed up, some not)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
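The exit code rules listed in the export entry above can be sketched as
a small function. The function name and the shape of its arguments are
assumptions; the real LU result format may differ.

```python
def export_exit_code(finalize_ok, disk_results):
    """Map export results to an exit code.

    finalize_ok: whether finalizing the export succeeded.
    disk_results: one boolean per disk, True if that disk was backed up.
    """
    if not finalize_ok or not any(disk_results):
        return 1  # full failure
    if not all(disk_results):
        return 2  # partial failure
    return 0  # full success
```

A wrapper script can then distinguish "retry everything" (1) from
"some disks need attention" (2).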
Iustin Pop [Wed, 5 Aug 2009 09:49:56 +0000 (11:49 +0200)]
Fix detecting of errors in export
This should fix issue 61, by explicitly calling bash (which is now a
non-explicit dependency) and setting the pipefail option.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 4 Aug 2009 11:14:13 +0000 (13:14 +0200)]
Implement gnt-cluster check-disk-sizes
This patch adds a new opcode and lu for checking disk sizes. Currently
it does only top-level disk verification, and also doesn't check
primary/secondary node size mismatches (these two are added as TODOs in
the Exec() function of the LU).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 4 Aug 2009 10:43:22 +0000 (12:43 +0200)]
rpc: add rpc call for getting disk size
Note that this exports the disk size as bdev returns it, in bytes. The
value will be converted to MiB in cmdlib.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 26 Jul 2009 15:09:26 +0000 (17:09 +0200)]
bdev: Add function for reading actual disk size
This patch adds a GetActualSize for block devices that returns the
actual disk size. It is done using blockdev (and stat for file storage).
While this could be done via reading /sys/block/N/size, that is not as
simple as running blockdev, as the correspondence between an LV and its
sys entry is not straightforward.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 26 Jul 2009 19:35:10 +0000 (21:35 +0200)]
Implement --ignore-size in activate-disks
This patch modifies OpActivateDisks, LUActivateDisks and gnt-instance
activate-disks to support and pass this option to
_AssembleInstanceDisks.
The patch is quite trivial I think; there should be no issues from it
except if used when not needed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 26 Jul 2009 19:19:15 +0000 (21:19 +0200)]
Add ignore size support in _AssembleInstanceDisks
This patch adds an optional parameter to _AssembleInstanceDisks that
allows ignoring of size information by making a copy of the disk
structure and setting the size to zero.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 26 Jul 2009 19:26:57 +0000 (21:26 +0200)]
Add a objects.Disk.UnsetSize() method
This method recursively resets the size of the disk and its children to
zero.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
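A minimal sketch of the recursive size reset from the entry above,
combined with a throw-away copy so the stored configuration stays
untouched. The Disk class here is a toy model, not the Ganeti object,
and copy_without_size is a hypothetical helper name.

```python
import copy


class Disk(object):
    """Toy disk object with optional child disks."""

    def __init__(self, size, children=None):
        self.size = size
        self.children = children or []

    def unset_size(self):
        """Recursively reset the size of this disk and its children."""
        for child in self.children:
            child.unset_size()
        self.size = 0


def copy_without_size(disk):
    """Return a throw-away copy of disk with all sizes set to zero."""
    clone = copy.deepcopy(disk)
    clone.unset_size()
    return clone
```

The original disk tree keeps its recorded sizes; only the copy handed
to the assemble step carries zeroes.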
Iustin Pop [Sun, 26 Jul 2009 15:27:49 +0000 (17:27 +0200)]
bdev: allow ignoring of size in Assemble()
This patch changes the DRBD8 class (the only one to use the size in
Assemble) to ignore the size in Assemble when a zero size is passed.
This will allow activation of disks even when the size recorded in the
configuration is wrong.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 4 Aug 2009 12:33:37 +0000 (14:33 +0200)]
Fix instance import net option
This is identical to
dc30b0e4 but applied to gnt-backup. Thanks to user
ocaner for catching it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 4 Aug 2009 11:38:42 +0000 (13:38 +0200)]
Simplify the devel/upload script
Instead of multiple uploads to each node, this script copies everything
as needed to the temporary directory, exactly as to be installed in the
destination machine, then runs only one rsync per host.
This is more dangerous (we can break /etc now), but for development
machines is fine.
The patch then also uploads the bash completions and the current name
for the cron job (I think that ganeti-master-cron is a deprecated name,
not that someone actually intends to upload a file named like that). A
flag --no-cron is added to skip uploading the cron file if desired.
The patch also changes rsync to propagate the file permissions.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 26 Jul 2009 19:46:07 +0000 (21:46 +0200)]
Add a Copy method to object.ConfigObject
This small patch adds a simple Copy method that can be used for
'throw-away' copies of objects.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 31 Jul 2009 14:49:26 +0000 (16:49 +0200)]
Add “gnt-job watch” command
This command can be used to follow the output of a job. It's useful
together with the --submit parameter for other commands.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Jul 2009 14:48:27 +0000 (16:48 +0200)]
jqueue: Fix error when WaitForJobChange gets invalid ID
When JobQueue.WaitForJobChange gets an invalid or no longer existing job ID it
tries to return job_info and log_entries, both of which aren't defined yet.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Jul 2009 12:55:45 +0000 (14:55 +0200)]
jqueue: Update message for cancelling running job
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 7 Jul 2009 13:35:05 +0000 (15:35 +0200)]
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,
ganeti-masterd will be started with the new --no-voting --yes-do-it
options.
This new option is set to True only on masterfailover, when no_voting is
used. This changes the behavior from 2.0, where we didn't start the
master daemon at all, when this option was used.
The manpage is also updated to remove the 2.0 only change.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Stephen Shirley [Thu, 23 Jul 2009 17:14:20 +0000 (19:14 +0200)]
lvmstrap: Change diskinfo to use GenerateTable
This way the produced table is formatted nicely.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 13:41:02 +0000 (14:41 +0100)]
Get rid of constants.RAPI_ENABLE
This constant is unused, except in qa. Removing it since it's always True.
This patch also removes the unused qa_rapi.PrintRemoteAPIWarning
function, and removes a comment about temporary constants "until we have
cluster parameters".
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 08:58:33 +0000 (09:58 +0100)]
Remove references to utils.debug
Various modules set it to True when called in debugging mode, but the
utils module supports no such global.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 07:55:53 +0000 (08:55 +0100)]
ganeti-rapi, replace hardcoded exit value
substitute exit(1) with exit(constants.EXIT_FAILURE).
Also fix a wrongly indented line.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 23 Jul 2009 07:48:14 +0000 (08:48 +0100)]
Add the bind-address option to ganeti-rapi
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 22 Jul 2009 17:07:54 +0000 (18:07 +0100)]
noded: Abstract hard-coded sys.exit value
On machines without the ssl file noded exits with '5'.
Changing this to constants.EXIT_NOTCLUSTER.
Also utils.GetNodeDaemonPort hasn't raised errors.ConfigurationError for
a while, so removing that try/except block.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 21 Jul 2009 12:53:40 +0000 (13:53 +0100)]
Add an example "ethers" hook
This hook can be used to update /etc/ethers with the instances' MAC
addresses. A DHCP server on the nodes can then serve the instances
their correct address. (This has been tested with dnsmasq's DHCP
implementation.)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 21 Jul 2009 09:55:19 +0000 (11:55 +0200)]
burnin: move batch init/commit into a decorator
Many burnin steps initialize the batch queue at the beginning and commit
it at the end of their operation. This patch moves this code to a
decorator, in order to reduce redundant code.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
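The batch init/commit decorator described in the entry above might look
roughly like this. The decorator name, the Burner class and its methods
are illustrative assumptions; queued jobs are plain strings here.

```python
import functools


def batched(fn):
    """Initialize the job queue before fn and commit it afterwards."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.start_batch()
        result = fn(self, *args, **kwargs)
        self.commit_queue()
        return result
    return wrapper


class Burner(object):
    """Toy burnin driver used only to exercise the decorator."""

    def __init__(self):
        self.queue = []
        self.committed = []

    def start_batch(self):
        self.queue = []

    def commit_queue(self):
        self.committed.extend(self.queue)
        self.queue = []

    @batched
    def burn_reinstall(self, instances):
        for name in instances:
            self.queue.append("reinstall %s" % name)
```

Each step then contains only its own queueing logic; the surrounding
init and commit calls live in one place.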
Iustin Pop [Tue, 21 Jul 2009 09:41:12 +0000 (11:41 +0200)]
burnin: move instance alive checks to a decorator
Many burnin steps do a manual check of instance aliveness, via duplicate
code. This patch moves this code to a decorator.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 21 Jul 2009 08:53:27 +0000 (10:53 +0200)]
burnin: Implement retryable operations
Some burnin steps are idempotent: e.g. reinstalling an instance (from
burnin p.o.v.) can be done multiple times without any side-effects that
would affect later burnin steps. As such, failing the whole burnin
process due to a reinstall failure is undesirable.
This patch modifies burnin by marking each opcode (in case of individual
execution) and job set retryable or not. Retryable actions will be
retried up to a number of times, after which we give up and return
failure.
One side-effect is that in case of full-failure in retryable job sets we
lose the original exception (but we do log its string format), so we
have a little bit less information in this case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
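A minimal sketch of the retry behaviour described in the entry above.
The helper name and the retry limit are assumptions, and this version
simply re-raises the last exception once the retries are used up.

```python
MAX_RETRIES = 3  # illustrative value, not taken from burnin


def run_with_retries(fn, retryable, max_retries=MAX_RETRIES):
    """Call fn(); on failure, retry up to max_retries times if retryable."""
    attempts_left = max_retries if retryable else 0
    while True:
        try:
            return fn()
        except Exception:
            if attempts_left == 0:
                # Out of retries (or not retryable at all): give up.
                raise
            attempts_left -= 1
```

Non-retryable operations fail on the first error, exactly as before;
retryable ones get max_retries further attempts.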
Michael Hanselmann [Mon, 20 Jul 2009 11:26:39 +0000 (13:26 +0200)]
Ignore vim swap files
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Sun, 19 Jul 2009 18:34:08 +0000 (20:34 +0200)]
burnin: fix removal errors hiding real errors
A long-standing bug in burnin makes errors during the removal phase
(e.g. because an import has failed, or because the initial creation has
failed) hide the original error.
This patch suppresses removal errors if we are already in ‘has_err’
mode, and otherwise it displays them normally.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 13:27:12 +0000 (15:27 +0200)]
backend: Only build once the list of upload files
The list of upload files is built currently at every UploadFile() call.
This patch moves it to a separate variable which is initialized only
once.
This won't make much difference but I regard it as cleanup.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 10 Jun 2009 15:37:51 +0000 (17:37 +0200)]
Fix gnt-instance reinstall
Commit
55efe6dabe48e5c37dc1ff6099e0bb8afde7a468 "Convert instance
reinstall to multi instance model" actually broke instance reinstall for
single-instance cases. This one-liner fixes it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
b6e243ab010d1df2b6c211b9edc9fe1978e52391)
Iustin Pop [Sun, 19 Jul 2009 14:40:57 +0000 (16:40 +0200)]
Fix a couple of epydoc warnings
It seems epydoc needs fully-qualified references, and doesn't deal with
relative ones (not even in the current module) if there are any
ambiguities.
There are other epydoc warnings, in the rapi docstrings, but those are
left as-is as they're removed in 2.1.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Sun, 19 Jul 2009 01:45:45 +0000 (03:45 +0200)]
job queue: fix loss of finalized opcode result
Currently, unclean master daemon shutdown overwrites all of a job's
opcode status and result with error/None. This is incorrect, since
any already finished opcode(s) should have their status and result
preserved, and only not-yet-processed opcodes should be marked as
‘error’. Cancelling jobs between opcodes does the same (but this is not
allowed currently by the code, so it's not as important as unclean
shutdown).
This patch adds a new _QueuedJob function that only overwrites the
status and result of non-finalized opcodes, which is then used in job
queue init and in the cancel job functions. The patch also adds some
comments and a new set of constants in constants.py highlighting the
finalized vs. non-finalized opcode statuses.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
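The intended behaviour from the entry above (keep finished opcodes,
mark only the remaining ones) can be sketched like this. The constant
values, class and function names are illustrative assumptions, and the
status set is reduced to four states.

```python
OP_STATUS_QUEUED = "queued"
OP_STATUS_RUNNING = "running"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"

# Statuses that must never be overwritten once reached.
OPS_FINALIZED = frozenset([OP_STATUS_SUCCESS, OP_STATUS_ERROR])


class QueuedOpCode(object):
    """Toy opcode record holding only status and result."""

    def __init__(self, status, result=None):
        self.status = status
        self.result = result


def mark_unfinished_ops(ops, status, result):
    """Set status and result on every opcode that is not finalized."""
    for op in ops:
        if op.status in OPS_FINALIZED:
            continue
        op.status = status
        op.result = result
```

After an unclean shutdown, calling this with the error status leaves
the results of completed opcodes intact.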
Iustin Pop [Sat, 18 Jul 2009 23:51:04 +0000 (01:51 +0200)]
Switch gnt-debug submit-job to JobExecutor
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor
uses the optimized SubmitManyJobs luxi call and as such should be used
whenever multiple jobs need to be submitted.
This patch converts gnt-debug submit-job to use it and also removes an
extra empty line in the JobExecutor class.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Fri, 22 May 2009 12:27:46 +0000 (14:27 +0200)]
Convert instance reinstall to multi instance model
This patch converts ‘gnt-instance reinstall’ from single-instance to
multi-instance model; since this is dangerous, it's required to pass
“--force --force-multiple” to skip the confirmation.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
55efe6dabe48e5c37dc1ff6099e0bb8afde7a468)
Iustin Pop [Fri, 22 May 2009 11:01:35 +0000 (13:01 +0200)]
gnt-instance batch-create: use the job executor
This small patch changed the batch create functionality to use the job
executor instead of single-job submits.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
d4dd4b74a786cd0f31e5fc530f140aaf438c68e7)
Iustin Pop [Fri, 22 May 2009 10:25:31 +0000 (12:25 +0200)]
Modify cli.JobExecutor to use SubmitManyJobs
This patch changes the generic "multiple job executor" to use the many
jobs submit model, which automatically makes all its users use the new
model.
This makes, for example, startup/shutdown of a full cluster much more
logical (all the submitted job IDs are visible fast, and then waiting
for them proceeds normally).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
23b4b983afc9b9e81d558f06e4e0cde53703e575)
Iustin Pop [Thu, 21 May 2009 16:02:42 +0000 (18:02 +0200)]
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patch
adds a new luxi call for multi-job submit; the advantage is that all the
jobs are added to the queue and only afterwards can the workers start
processing them.
This is definitely faster than per-job submit, where the submission of
new jobs competes with the workers processing jobs.
On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
- 100 jobs:
- individual: submit time ~21s, processing time ~21s
- multiple: submit time 7-9s, processing time ~22s
- 250 jobs:
- individual: submit time ~56s, processing time ~57s
run 2: ~54s ~55s
- multiple: submit time ~20s, processing time ~51s
run 2: ~17s ~52s
which shows that we indeed gain on the client side, and maybe even on
the total processing time for a high number of jobs. For just 10 or so I
expect the difference to be just noise.
This will probably require increasing the timeout a little when
submitting too many jobs - 250 jobs at ~20 seconds is close to the
current rw timeout of 60s.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit
2971c9132b8b798178921a389b18d893edec06fb)
Iustin Pop [Sun, 19 Jul 2009 02:12:11 +0000 (04:12 +0200)]
job queue: fix interrupted job processing
If a job with more than one opcode is being processed, and the master
daemon crashes between two opcodes, we have the first N opcodes marked
successful, and the rest marked as queued. This means that the overall
job status is queued, and thus on master daemon restart it will be
resent for completion.
However, the RunTask() function in jqueue.py doesn't deal with
partially-completed jobs. This patch makes it simply skip the opcodes
that have already completed.
An alternative option would be to not mark partially-completed jobs as
QUEUED but instead RUNNING, which would result in aborting of the job at
restart time.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
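A minimal sketch of resuming a partially-completed job as described in
the entry above. Opcodes are modelled as plain dicts and run_job is a
hypothetical name; the real RunTask() does considerably more.

```python
OP_STATUS_QUEUED = "queued"
OP_STATUS_SUCCESS = "success"


def run_job(ops, execute):
    """Run the opcodes of a job, skipping those already completed.

    ops is a list of dicts with "input", "status" and "result" keys;
    execute is a callable that processes one opcode input.
    """
    for op in ops:
        if op["status"] == OP_STATUS_SUCCESS:
            # Finished before the restart: keep its result untouched.
            continue
        op["result"] = execute(op["input"])
        op["status"] = OP_STATUS_SUCCESS
```

Only the opcodes still marked as queued are executed after the
restart.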
Iustin Pop [Sun, 19 Jul 2009 02:01:16 +0000 (04:01 +0200)]
Fix an error path in job queue worker's RunTask
In case the job fails, we try to set the job's run_op_idx to -1.
However, this is a wrong variable, which wasn't detected until the
__slots__ addition. The correct variable is run_op_index.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
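The way __slots__ exposes a misspelled attribute, as in the entry
above, can be shown with a toy class. QueuedJob and assign_misspelled
are illustrative names only.

```python
class QueuedJob(object):
    """Toy job object restricted to a fixed set of attributes."""
    __slots__ = ["id", "run_op_index"]

    def __init__(self, job_id):
        self.id = job_id
        self.run_op_index = -1


def assign_misspelled(job):
    """Try to set the wrong attribute name; report whether it failed."""
    try:
        job.run_op_idx = -1  # typo: the real attribute is run_op_index
    except AttributeError:
        return True
    return False
```

Without __slots__ the assignment silently creates a new attribute,
which is why the bug went unnoticed until slots were added.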
Iustin Pop [Fri, 17 Jul 2009 15:16:51 +0000 (17:16 +0200)]
Add __slots__ on objects in jqueue
Adding slots to _QueuedOpCode decreases memory usage (of these objects)
by roughly four times. It is a lesser change for _QueuedJobs.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 17 Jul 2009 15:09:33 +0000 (17:09 +0200)]
ganeti.initd: Pass $*_ARGS to programs when restarting them
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 17 Jul 2009 12:54:30 +0000 (14:54 +0200)]
Optimize OpCode loading
This patch converts the opcode loading to a pre-built map (at import
time) instead of iteration over the globals dict at each call.
Microbenchmarks show that this should be around three times faster, and
burnin still passes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
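A minimal sketch of the pre-built map approach from the entry above.
The classes and OP_ID strings are illustrative, and the map is built
from the subclasses of a toy OpCode base class rather than from the
real opcodes module.

```python
class OpCode(object):
    """Toy base class; subclasses set a unique OP_ID."""
    OP_ID = None


class OpDelay(OpCode):
    OP_ID = "OP_TEST_DELAY"


class OpActivateDisks(OpCode):
    OP_ID = "OP_INSTANCE_ACTIVATE_DISKS"


# Built once at import time instead of scanning names on every call.
OP_MAPPING = dict((cls.OP_ID, cls) for cls in OpCode.__subclasses__())


def load_opcode(data):
    """Instantiate the opcode class named by data["OP_ID"]."""
    op_id = data["OP_ID"]
    try:
        cls = OP_MAPPING[op_id]
    except KeyError:
        raise ValueError("Unknown opcode '%s'" % op_id)
    return cls()
```

Each load is then a single dict lookup.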
Iustin Pop [Fri, 17 Jul 2009 12:39:45 +0000 (14:39 +0200)]
Yet another fallout from the pylint fixes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Guido Trotter [Fri, 17 Jul 2009 11:40:17 +0000 (13:40 +0200)]
Merge branch 'master' into next
* master:
Update NEWS and version for 2.0.2 release
Improve the description of node flags in man page
Change default stripe count to 1
Use full-stripe size in LVM growth
RAPI: implement instance reinstall
Iustin Pop [Thu, 16 Jul 2009 16:30:57 +0000 (18:30 +0200)]
Fix another issue with hypervisor_name change
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 16 Jul 2009 13:41:36 +0000 (15:41 +0200)]
Update NEWS and version for 2.0.2 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Raiford Storey [Thu, 16 Jul 2009 16:49:18 +0000 (09:49 -0700)]
Improve the description of node flags in man page
[iustin@google.com: slightly reworded the explanation for offline and
changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 16 Jul 2009 13:48:53 +0000 (15:48 +0200)]
Add enabled hypervisors to TestConfigRunner
This parameter is now mandatory for the cluster config to work.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 16 Jul 2009 12:44:17 +0000 (14:44 +0200)]
Add a few more checks to verify config
- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Thu, 16 Jul 2009 12:02:42 +0000 (14:02 +0200)]
Make sure enabled_hypervisors list is valid
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 16 Jul 2009 10:41:55 +0000 (12:41 +0200)]
Change default stripe count to 1
In order not to change the default during a stable series, we modify
configure.ac to default to one stripe, in effect keeping the status quo
(well, minus the LVM Attach() changes).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 13 Jul 2009 12:52:50 +0000 (14:52 +0200)]
Use full-stripe size in LVM growth
LVM has issues when growing striped volumes, so it's best to specify
the growth in exact multiples of the full stripe size (as precise as
possible). For this we need to do a couple of changes:
- in LVM Attach(), we query additionally the VG extent size and the LV
stripe count; since this makes lvs return a (possibly) multi-line
output, we now split it into lines and only take the last one
- in LVM Grow(), we round up the increase in multiples of the full
stripe size
The patch also sets the correct target size in DRBD growth.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
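The rounding described for LVM Grow() in the entry above amounts to the
following arithmetic. This is a sketch assuming integer sizes in MiB;
the function and parameter names are hypothetical.

```python
def round_up_growth(amount_mib, pe_size_mib, stripes):
    """Round a growth amount up to a multiple of the full stripe size.

    The full stripe size is the VG extent size times the LV stripe
    count.
    """
    full_stripe = pe_size_mib * stripes
    rest = amount_mib % full_stripe
    if rest != 0:
        amount_mib += full_stripe - rest
    return amount_mib
```

For example, with 4 MiB extents and two stripes the full stripe is
8 MiB, so a requested growth of 10 MiB becomes 16 MiB.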
Guido Trotter [Tue, 14 Jul 2009 15:47:03 +0000 (17:47 +0200)]
Remove ConfigWriter.InitConfig
It's been replaced by a simpler bootstrap.InitConfig function, which
does the same job, and is currently unused.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 14 Jul 2009 15:05:34 +0000 (17:05 +0200)]
Remove SimpleConfigWriter.SetMasterNode
This function is not used.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 14 Jul 2009 13:42:58 +0000 (15:42 +0200)]
_GenerateDiskTemplate: use base_index in the name
Currently if a disk is added later the base_index is not considered, and
all the disks are called disk0. This patch fixes it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Tue, 14 Jul 2009 12:09:21 +0000 (14:09 +0200)]
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
just get the master name we can use that. This is the only usage of
SimpleConfigReader currently, but we're not going to delete the class,
as new usages will come in for ganeti-confd (in 2.1). Using it there,
though, will make the class even more heavy to load, so it makes sense
for this simple usage to be converted.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Mon, 13 Jul 2009 13:55:56 +0000 (15:55 +0200)]
cmdlib: Fix typo in LUQueryClusterInfo
This was broken by my pylint fixes patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 13 Jul 2009 09:11:41 +0000 (11:11 +0200)]
RAPI: implement instance reinstall
This patch adds instance reinstall to RAPI, with two optional parameters:
- ‘os’, in order to change the OS on reinstall
- ‘nostartup’, in order to leave the instance down after reinstall
The call will first shut down the instance, then reinstall it, and unless
‘nostartup’ has been passed and is equal to 1, it will be started
automatically.
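The sequence above could be sketched as follows. This only models the order
of the steps; the function name, the step labels and the tuple layout are
hypothetical and are not the RAPI implementation.

```python
def plan_reinstall(instance, os_name=None, nostartup=0):
    """Return the ordered steps for a reinstall request.

    os_name is the optional new OS; nostartup == 1 leaves the instance down.
    """
    steps = [
        ("shutdown", instance, None),
        ("reinstall", instance, os_name),
    ]
    if nostartup != 1:
        # Default behaviour: start the instance again after the reinstall.
        steps.append(("startup", instance, None))
    return steps


for step in plan_reinstall("inst1.example.com", nostartup=1):
    print(step)
```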
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Guido Trotter [Wed, 8 Jul 2009 09:27:52 +0000 (11:27 +0200)]
Merge branch 'master' into next
* master:
Create a new --no-voting option for masterfailover
ganeti-masterd: allow non-interactive --no-voting
Guido Trotter [Wed, 8 Jul 2009 08:34:11 +0000 (10:34 +0200)]
Create a new --no-voting option for masterfailover
This allows failing over in certain corner cases, such as a 2 node
cluster with one node down. The man page is also updated to document
this dangerous option and how to recover from this situation.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Guido Trotter [Tue, 7 Jul 2009 13:23:38 +0000 (15:23 +0200)]
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a
--no-voting masterfailover.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 20:42:23 +0000 (22:42 +0200)]
Fix pylint warnings
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 20:41:01 +0000 (22:41 +0200)]
Add custom pylintrc
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 3 Jul 2009 19:54:08 +0000 (21:54 +0200)]
bootstrap: Don't leak file descriptor when generating SSL certificate
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 2 Jul 2009 20:39:20 +0000 (22:39 +0200)]
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would return
EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
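One way to ignore EAGAIN on receive is sketched below. This is not the code
of luxi.Transport.Recv; the helper name and the retry limit are assumptions
made for the example.

```python
import errno
import socket


def recv_ignoring_eagain(sock, bufsize, max_retries=100):
    """Call sock.recv(), retrying when it fails with EAGAIN."""
    for _ in range(max_retries):
        try:
            return sock.recv(bufsize)
        except socket.error as err:
            # Any other error is a real failure and is passed on.
            if err.errno != errno.EAGAIN:
                raise
    raise socket.error(errno.EAGAIN, "recv still returns EAGAIN")
```

The retry limit only guards against spinning forever; a real client would
more likely rely on its existing timeout handling.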
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 1 Jul 2009 21:28:35 +0000 (23:28 +0200)]
Fix some typos
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 1 Jul 2009 09:08:13 +0000 (11:08 +0200)]
Increase maximum accepted size for a DRBD meta dev
With the change to striped LVs, the actual size of a meta device (which
is small) can be more than we expected (for non-striped LVs). This
patch increases from 160MB to 1GB the accepted size, and updates the
comment with the rationale behind this change.
Note that we do want even meta devices striped, since it can increase
metadata update speed.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Tue, 30 Jun 2009 16:10:16 +0000 (18:10 +0200)]
Cleanup config data when draining nodes
Currently, when draining nodes we reset their master candidate flag, but
we don't instruct them to demote themselves. This leads to “ERROR: file
'/var/lib/ganeti/config.data' should not exist on non master candidates
(and the file is outdated)”.
This patch simply adds a call to node_demote_from_mc in this case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 30 Jun 2009 15:51:26 +0000 (17:51 +0200)]
Fix node readd issues
This patch fixes a few node readd issues.
Currently, the node readd consists of two opcodes:
- OpSetNodeParms, which resets the offline/drained flags
- OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent
for certain cluster configurations. Thus, this patch removes the first
opcode and modifies LUAddNode to deal with this case too.
The patch also modifies the computation of the intended master_candidate
status, and actually sets the readded node to master candidate if
needed. Previously, we didn't modify the existing node at all.
Finally, the patch modifies the bottom of the Exec() function for this
LU to:
- trigger a node update, which in turn redistributes the ssconf files
to all nodes (and thus the new node too)
- if the new node is not a master candidate, then call the
node_demote_from_mc RPC so that old master files are cleared
My testing shows this behaves correctly for various cases.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 30 Jun 2009 15:59:29 +0000 (17:59 +0200)]
backend.DemoteFromMC: don't fail for missing files
If the config file is missing when the DemoteFromMC() function is
called, it will raise a ProgrammerError. Instead of changing the
utils.CreateBackup() function, which is called from multiple places, for now
we only change the DemoteFromMC() function to not call it if the file
does not exist (we rely on the master to prevent race conditions here).
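A minimal sketch of the guard, assuming demotion backs up the config file
and then removes it. The helper name and the ".backup" suffix are
hypothetical, and shutil.copy2 stands in for utils.CreateBackup().

```python
import os
import shutil


def demote_config_file(path):
    """Back up and remove path; do nothing if it does not exist.

    Returns the backup file name, or None when there was nothing to do.
    """
    if not os.path.isfile(path):
        # Missing file: skip the backup instead of raising an error.
        return None
    backup = path + ".backup"
    shutil.copy2(path, backup)
    os.remove(path)
    return backup
```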
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Tue, 30 Jun 2009 13:09:00 +0000 (15:09 +0200)]
Allow GetMasterCandidateStats to ignore some nodes
This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to
ignore some nodes in the calculation, so that we can use it to predict
cluster state without some nodes (which we know we will modify, and thus
we should not rely on their state).
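The idea can be sketched as below. The node representation (a dict of
flags), the return value (current and possible candidate counts) and the
parameter name are assumptions for illustration, not the real
ConfigWriter.GetMasterCandidateStats signature.

```python
def master_candidate_stats(nodes, exceptions=None):
    """Count current and possible master candidates.

    nodes maps node name to a dict with the boolean flags
    'master_candidate', 'offline' and 'drained'. Nodes listed in
    exceptions are left out of both counts.
    """
    skip = set(exceptions or [])
    current = possible = 0
    for name, node in nodes.items():
        if name in skip:
            continue
        if not (node["offline"] or node["drained"]):
            possible += 1
        if node["master_candidate"]:
            current += 1
    return current, possible


nodes = {
    "node1": {"master_candidate": True, "offline": False, "drained": False},
    "node2": {"master_candidate": True, "offline": False, "drained": False},
    "node3": {"master_candidate": False, "offline": True, "drained": False},
}
# Predict the state as if node2 were not part of the cluster.
print(master_candidate_stats(nodes, exceptions=["node2"]))
```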
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Tue, 30 Jun 2009 07:49:57 +0000 (09:49 +0200)]
Fix error message for extra files on non MC nodes
Currently the message for extraneous files on non master candidates is
confusing, to say the least. This makes it hopefully more clear.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Iustin Pop [Mon, 29 Jun 2009 11:02:03 +0000 (13:02 +0200)]
Fix adjustment of candidates in cluster modify
The code for adjusting the candidate pool size was done after the config
update, and this means we triggered the save of the config file without
fixing the candidate pool, which aborts with an error.
The patch just moves it above the config update. The old comment was valid,
but we save the config file in MaintainCandidatePool anyway, so this should
be safe.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Mon, 29 Jun 2009 08:57:27 +0000 (10:57 +0200)]
Add a new node list field
This patch adds a ‘role’ node list field, which shows a one-character
node status. This is a simpler way to see the node status than selecting
all the flags individually.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 23 Jun 2009 11:38:35 +0000 (13:38 +0200)]
Fix HTTP server library handling of credentials
Currently the http library only checks credentials when authentication
is required. This means that any credentials are accepted on the root
resource, for example, which makes problems hard to diagnose - the
user/pw works for all queries, until one tries to do a modification, at
which point it fails.
This patch changes the PreHandleRequest() function to not ignore
credentials when passed, even if we don't require authentication. This
makes the behavior of RAPI more predictable.
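The new rule can be sketched as follows. This is only a model of the
decision, not the PreHandleRequest() code; the function name, its arguments
and the validator callback are hypothetical.

```python
def request_may_proceed(auth_required, credentials, is_valid):
    """Decide whether a request passes the authentication check.

    credentials is None when the client sent none; is_valid is a callable
    that checks supplied credentials.
    """
    if credentials is not None:
        # Credentials that were sent are always verified, even on
        # resources that do not require authentication.
        return bool(is_valid(credentials))
    return not auth_required
```

With this rule a wrong password fails on the first query instead of on the
first modification.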
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 23 Jun 2009 11:07:58 +0000 (13:07 +0200)]
Fix a typo in backend.InstanceReboot docstring
The documentation for the reboot was wrong. This patch fixes it and
updates the docstring with more details.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 17 Jun 2009 13:42:09 +0000 (15:42 +0200)]
Fix handling of 'vcpus' in instance list
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
Unhandled Ganeti error: vcpus
This is due to multiple issues:
- in some corner cases cmdlib.py raises an errors.ParameterError but
this is not handled by cli.py
- LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't handle
it, so instead of failing with unknown parameter, e.g.:
Failure: prerequisites not met for this operation:
Unknown output fields selected: vcpuscd
it raises the ParameterError message
This patch:
- adds handling of 'vcpus' to LUQueryInstances
- adds handling of the ParameterError exception to cli.py
- changes the 'else: raise errors.ParameterError' in the field handling of
LUQueryInstance to an assert, since it's a programmer error if we reached
this step
With this, a future unhandled parameter will show:
gnt-instance list -o+vcpus
Unhandled protocol error while talking to the master daemon:
Caught exception: Declared but unhandled parameter 'vcpus'
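The split between a user error (unknown field) and a programmer error
(declared but unhandled field) can be sketched as below. The field set, the
instance layout and the function name are hypothetical; only the two error
messages are taken from the text above.

```python
SUPPORTED_FIELDS = frozenset(["name", "vcpus"])


def get_field(instance, field):
    """Return one output field of an instance (a plain dict here)."""
    if field not in SUPPORTED_FIELDS:
        # User error: the field was never declared.
        raise ValueError("Unknown output fields selected: %s" % field)
    if field == "name":
        return instance["name"]
    elif field == "vcpus":
        return instance["vcpus"]
    else:
        # Programmer error: declared in SUPPORTED_FIELDS but not handled.
        assert False, "Declared but unhandled parameter '%s'" % field


print(get_field({"name": "inst1", "vcpus": 2}, "vcpus"))
```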
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 17 Jun 2009 11:01:46 +0000 (13:01 +0200)]
Fix checking for valid OS in instance create
The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks
if we got an OS, but not if we got a valid OS. This patch fixes it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Wed, 17 Jun 2009 10:50:39 +0000 (12:50 +0200)]
Show disk size in instance info
The size of the instance's disk was not shown in “gnt-instance info”.
This patch adds it and formats it nicely if possible.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>