Iustin Pop [Wed, 30 Nov 2011 09:33:52 +0000 (10:33 +0100)]
Add UnescapeAndSplit unittest for multi-escapes
This would have caught the bug in the first place. Argh,
hand-generated test cases!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Nikos Skalkotos [Tue, 29 Nov 2011 12:30:46 +0000 (14:30 +0200)]
Fix a bug in command line option parsing code
Fix bug affecting command line options of "keyval" type. Although
escaping commands with \ is supported, it is not applied to the
input recursively.
Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
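The escaping behaviour discussed in the two commits above can be illustrated with a minimal Python sketch. This is not the code of UnescapeAndSplit itself: the function name unescape_and_split, the single-pass scanning approach, and the handling of a trailing or unknown escape are assumptions made for illustration only.

```python
def unescape_and_split(text, sep=","):
    """Split text on sep, honouring backslash escapes (illustrative sketch).

    Assumed rules: "\\" + sep yields a literal separator, two backslashes
    yield one literal backslash, and a trailing or unknown escape is kept
    unchanged instead of raising an error.
    """
    fields = []
    current = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                # Trailing backslash: keep it literally
                current.append("\\")
            elif nxt == sep or nxt == "\\":
                # Escaped separator or escaped backslash
                current.append(nxt)
            else:
                # Unknown escape: keep both characters
                current.append("\\" + nxt)
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

Scanning character by character means that any run of backslashes is handled uniformly, which is the kind of multi-escape input a hand-written test list can easily miss.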
Michael Hanselmann [Thu, 24 Nov 2011 12:02:36 +0000 (13:02 +0100)]
ConfigWriter: Fix epydoc error
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
Michael Hanselmann [Thu, 24 Nov 2011 07:43:04 +0000 (08:43 +0100)]
LUGroupAssignNodes: Fix node membership corruption
Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.
Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):
$ gnt-group add foo
$ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
$ gnt-cluster verify # Fails with KeyError
This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
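The idea of updating the node object and the duplicate per-group member list together, under one lock, can be sketched as follows. The class ClusterConfig and all of its attributes are hypothetical stand-ins, not the real ConfigWriter interface.

```python
import threading


class ClusterConfig:
    """Toy configuration with nodes, groups and a per-group member index."""

    def __init__(self):
        self._lock = threading.Lock()
        self.node_group = {}     # node name -> group name
        self.group_members = {}  # group name -> set of node names (duplicate index)

    def add_group(self, group):
        with self._lock:
            self.group_members.setdefault(group, set())

    def add_node(self, node, group):
        with self._lock:
            self.node_group[node] = group
            self.group_members[group].add(node)

    def assign_nodes(self, mods):
        """Move nodes to new groups; mods is a list of (node, new_group)."""
        with self._lock:
            # Validate everything first so a bad entry changes nothing
            for node, new_group in mods:
                if node not in self.node_group:
                    raise KeyError("unknown node %s" % node)
                if new_group not in self.group_members:
                    raise KeyError("unknown group %s" % new_group)
            # Update the node and both member lists in the same critical section
            for node, new_group in mods:
                old_group = self.node_group[node]
                self.group_members[old_group].discard(node)
                self.group_members[new_group].add(node)
                self.node_group[node] = new_group
```

Because both structures change inside the same critical section, a reader that takes the lock never sees a node whose group disagrees with the member list.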
Vangelis Koukis [Thu, 27 Oct 2011 17:04:20 +0000 (20:04 +0300)]
Ensure unused ports return to the free port pool
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are
returned to the pool of free ports when no longer needed:
* Return the network_port of an instance when it is removed
* Return the port used by a DRBD-based disk when it is removed
Signed-off-by: Vangelis Koukis <vkoukis@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
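A free port pool with explicit release can be sketched as below. The class PortPool and its method names are hypothetical; only the allocate-then-return idea is taken from the commit message above.

```python
class PortPool:
    """Minimal pool of TCP ports that can be allocated and returned."""

    def __init__(self, first, last):
        self._free = set(range(first, last + 1))
        self._used = set()

    def allocate(self):
        if not self._free:
            raise RuntimeError("no free ports left in the pool")
        port = min(self._free)
        self._free.remove(port)
        self._used.add(port)
        return port

    def release(self, port):
        # Called when an instance or a DRBD-based disk is removed
        if port not in self._used:
            raise ValueError("port %d was not allocated from this pool" % port)
        self._used.remove(port)
        self._free.add(port)
```

Without the release step every removed instance or disk would permanently consume one port, and a long-lived cluster would eventually exhaust the range.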
Iustin Pop [Mon, 14 Nov 2011 09:01:06 +0000 (10:01 +0100)]
Re-wrap a paragraph to eliminate a sphinx warning
This just makes sure that the paragraph doesn't contain lines that
start with :, which makes Sphinx (1.0.7) complain.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
René Nussbaumer [Thu, 27 Oct 2011 12:57:10 +0000 (14:57 +0200)]
Update NEWS and increase to 2.4.5
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Wed, 19 Oct 2011 14:51:27 +0000 (16:51 +0200)]
Fix queue archive creation with wrong permissions
On a master failover some of the archive dirs might have wrong
permissions in the non-root model. This is because noded still runs
as root and the job queue is synced that way. This patch fixes this
behaviour by setting the permissions accordingly.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 18 Oct 2011 11:39:34 +0000 (13:39 +0200)]
Update NEWS for unreleased 2.4.5
I need this for another 2.5 release.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 12 Oct 2011 10:37:43 +0000 (12:37 +0200)]
rpc: Disable HTTP client pool and reduce memory consumption
We noticed that “ganeti-masterd” can use large amounts of memory,
especially on large clusters. Measurements showed a single PycURL client
using about 500 kB of heap memory (the actual usage depends on versions,
build options and settings).
The RPC client uses a per-thread HTTP client pool with one client per
node. At this time there are 41 non-main threads (25 for the job queue
and 16 for client requests). This means the HTTP client pools use a lot
of memory (ca. 200 MB for 10 nodes, ca. 1 GB for 50 nodes).
This patch disables the per-thread HTTP client pool. No cleanup of
unused code is done. That will be done in the master branch only.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
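The memory figures quoted above follow from multiplying threads, nodes and per-client heap usage. A small worked calculation, using only the numbers from the commit message (the function name is made up for illustration):

```python
THREADS = 41          # 25 job queue threads + 16 client request threads
CLIENT_HEAP_KB = 500  # approximate heap use of one PycURL client


def pool_memory_kb(nodes, threads=THREADS, per_client_kb=CLIENT_HEAP_KB):
    """Estimate memory used by per-thread pools with one client per node."""
    return threads * nodes * per_client_kb


for nodes in (10, 50):
    print("%d nodes: about %d MB" % (nodes, pool_memory_kb(nodes) // 1000))
```

For 10 nodes this gives 205,000 kB and for 50 nodes 1,025,000 kB, in line with the "ca. 200 MB" and "ca. 1 GB" estimates.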
Michael Hanselmann [Thu, 14 Jul 2011 20:49:34 +0000 (22:49 +0200)]
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its
“end_timestamp” attribute set. Unfortunately it didn't cover a case when
the queue is recovering from an unclean master shutdown.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 45df0793c6bc83001aa545fda95c1ad9a35d732f)
Michael Hanselmann [Fri, 26 Aug 2011 12:21:35 +0000 (14:21 +0200)]
utils: Fix UnescapeAndSplit parsing bug
If a value passed to UnescapeAndSplit ended with a backslash an
exception would be raised:
$ gnt-instance modify -H mem=x\\ inst1.example.com
[…]
e2 = slist.pop(0)
IndexError: pop from empty list
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 23 Aug 2011 09:42:51 +0000 (11:42 +0200)]
Version bump 2.4.4
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 23 Aug 2011 09:21:36 +0000 (11:21 +0200)]
Update NEWS file
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 23 Aug 2011 09:10:36 +0000 (11:10 +0200)]
Documentation fix for importing with --src-dir option
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit b7d7876bd0e9844fab8be28bfa1fd5d563ec7412)
Conflicts:
lib/cmdlib.py (easily fixed)
René Nussbaumer [Tue, 23 Aug 2011 09:04:27 +0000 (11:04 +0200)]
Adding missing test data for commit 7a380ddfc
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 23 Aug 2011 07:23:39 +0000 (09:23 +0200)]
Fix a parsing issue with DRBD 8.3.11 in the Linux Kernel
In the Linux kernel, commit 4b0715f096 introduced a display bug into
/proc/drbd which broke our regex.
The bug was first introduced into Linux 2.6.39-rc1. This bug is still
unfixed as of today.
This patch adapts the regular expression to work around this bug for the
time being.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 18 Aug 2011 11:17:15 +0000 (13:17 +0200)]
ensure-dirs: Fix a bug with queue/archive permissions
While it sets the permission on all files in queue/archive accordingly
it doesn't do so for the created archive directories. This patch fixes
this problem.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
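Setting permissions on the archive directories as well as on the files inside them can be sketched like this. The function name and the mode values 0750 and 0640 are assumptions for illustration; the modes actually used by ensure-dirs are not stated in the commit message.

```python
import os


def fix_archive_permissions(archive_root, dir_mode=0o750, file_mode=0o640):
    """Apply modes to the archive root, its subdirectories and its files."""
    os.chmod(archive_root, dir_mode)
    for dirpath, dirnames, filenames in os.walk(archive_root):
        # Directories need their own mode, not only the files inside them
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), dir_mode)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), file_mode)
```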
Michael Hanselmann [Fri, 5 Aug 2011 12:48:31 +0000 (14:48 +0200)]
Fix typo in NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 5 Aug 2011 12:43:05 +0000 (14:43 +0200)]
doc/admin: s/grub/GRUB/
“GRUB” is an acronym for GRand Unified Bootloader.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Thu, 4 Aug 2011 14:21:06 +0000 (16:21 +0200)]
Bumping version to 2.4.3
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 4 Aug 2011 11:40:43 +0000 (13:40 +0200)]
Update the NEWS file for 2.4.3
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 3 Aug 2011 15:23:04 +0000 (17:23 +0200)]
Fix small typo in docstring
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Wed, 3 Aug 2011 11:44:00 +0000 (13:44 +0200)]
Fix typo in NEWS
“--dry-run” starts with two dashes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Pedro Macedo [Tue, 2 Aug 2011 15:19:36 +0000 (17:19 +0200)]
Add a flag to burnin to allow specifying VCPU count.
Signed-off-by: Pedro Macedo <pmacedo@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Thu, 28 Jul 2011 09:18:08 +0000 (11:18 +0200)]
Add support for cluster/OS parameters in QA
Currently there is no way to QA with (for example) an initrd because
the QA only inits the cluster with the default parameters. This makes
it impossible to QA using anything but the default parameters, which
doesn't always work.
Additionally, we add OS parameters and OS hypervisor parameters, for
completeness and for testing that these commands also work.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Ben Lipton [Mon, 25 Jul 2011 17:22:36 +0000 (13:22 -0400)]
Add OS search path to gnt-cluster info
Otherwise, it's pretty hard to figure it out from the command line.
Signed-off-by: Ben Lipton <benlipton@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 12:35:51 +0000 (14:35 +0200)]
Reopen daemon's stdio on SIGHUP
Before this patch daemons would continue to refer to an old logfile for
their standard I/O if they had been asked to reopen the log (SIGHUP).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 25 Jul 2011 11:08:22 +0000 (13:08 +0200)]
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file is
reopened when a daemon is sent a HUP signal. Due to a bug in the code,
fixed by this patch, the log file would be reopened for every single log
message thereafter.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
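The class of bug described above (a "reopen requested" flag that is never cleared) can be shown with a small handler built on the standard logging module. The class ReopenableFileHandler and its reopen_count attribute are illustrative only and are not the handler from the Ganeti source.

```python
import logging


class ReopenableFileHandler(logging.FileHandler):
    """File handler that reopens its file once after a reopen request."""

    def __init__(self, filename):
        super().__init__(filename)
        self._reopen = False
        self.reopen_count = 0  # only here to make the behaviour observable

    def request_reopen(self):
        # Cheap enough to be called from a SIGHUP handler
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            if self.stream is not None:
                self.stream.flush()
                self.stream.close()
            self.stream = self._open()
            # Clearing the flag is the crucial step; without it the file
            # would be reopened for every following message
            self._reopen = False
            self.reopen_count += 1
        super().emit(record)
```

A daemon would typically call request_reopen() from its SIGHUP handler, so the actual reopening happens on the next log message rather than inside the signal handler.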
Michael Hanselmann [Mon, 25 Jul 2011 10:02:24 +0000 (12:02 +0200)]
Don't leak file descriptors when setting up daemon output
When a daemon's output is configured using “utils.SetupDaemonFDs”, the
function must use dup2(2). Unfortunately the code didn't close the
original file descriptors, leaking them in the process.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
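The descriptor leak described above comes from forgetting that dup2(2) leaves the original descriptor open. A minimal sketch of the corrected pattern, with a hypothetical helper name (this is not the body of utils.SetupDaemonFDs):

```python
import os


def redirect_fd_to_file(path, target_fd):
    """Make target_fd refer to path without leaking the temporary descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.dup2(fd, target_fd)
    finally:
        # dup2 duplicates the descriptor; the original must still be closed
        if fd != target_fd:
            os.close(fd)
```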
Michael Hanselmann [Fri, 22 Jul 2011 09:55:46 +0000 (11:55 +0200)]
Fix aliases in bash completion
Ever since commit 2d48a3a2 aliases were not included in the bash
completion script. This patch also replaces one tab with two spaces.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 22 Jul 2011 08:27:09 +0000 (10:27 +0200)]
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return
the LV name, but to include the volume group name (e.g.
“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume
names in LUNodeQueryvols, stopping instance names from being displayed in
“gnt-node volumes”.
This patch fixes the issue and does some cleanup.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 8 Jul 2011 19:33:16 +0000 (21:33 +0200)]
ht: Add new check for numbers
Places which receive floats can usually also deal with integers, e.g.
OpTestDelay. Tests are added and the new check function is used for the
aforementioned opcode and verifying query results.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 7 Jul 2011 19:10:20 +0000 (21:10 +0200)]
Fix off-by-one bug in job serial generation
Commit 009e73d0 (September 2009) changed the job queue to generate
multiple job serials at once. Ever since it would return one more than
requested.
The “serial” file in the job queue directory is defined to contain the
“last job ID used” (design-2.0). With the change above, the serial file
would always contain the next serial number. The first value returned by
the generating function was the one contained in the file, so during the
switch in 2009 one job may have been overwritten.
This patch changes the code to always return the exact number of
serials, to keep the last used serial on disk and adds an assertion.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
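The intended serial semantics ("the file holds the last job ID used") can be sketched as follows. SerialAllocator is a hypothetical in-memory stand-in for the job queue's serial file handling.

```python
class SerialAllocator:
    """Hands out job serials; last_used mirrors the on-disk serial file."""

    def __init__(self, last_used=0):
        self.last_used = last_used

    def new_serials(self, count):
        if count < 1:
            raise ValueError("count must be at least 1")
        first = self.last_used + 1
        serials = list(range(first, first + count))
        # Return exactly the requested number of serials
        assert len(serials) == count
        # Store the last serial handed out, not the next free one
        self.last_used = serials[-1]
        return serials
```

With last_used at 10, a request for three serials yields 11, 12 and 13 and leaves 13 stored, so the next caller starts at 14 and no serial is handed out twice.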
Iustin Pop [Thu, 30 Jun 2011 15:15:51 +0000 (17:15 +0200)]
Shorten some unbreakable lines in man pages
In order to make the display right on 80-column terminals.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Thu, 30 Jun 2011 15:07:34 +0000 (17:07 +0200)]
Correct some spelling mistakes
New lintian is even smarter:
- overriden → overridden
- allows to → allows one to
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Iustin Pop [Tue, 28 Jun 2011 12:24:45 +0000 (14:24 +0200)]
Fix bug in recreate-disks for DRBD instances
The new functionality in 2.4.2 for recreate-disks to change nodes is
broken for DRBD instances: it simply changes the nodes without caring
for the DRBD minors mapping, which will lead to conflicts in non-empty
clusters.
This patch changes the Exec() method of this LU significantly, to both fix
the DRBD minor usage and make sure that we don't have partial
modification to the instance objects:
- the first half of the method makes all the checks and computes the
needed configuration changes
- the second half then performs the configuration changes and
recreates the disks
This way, instances will either be fully modified or not at all;
whether the disks are successfully recreated is another point, but at
least we'll have the configuration sane.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
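The "check and compute first, modify afterwards" structure described above can be sketched in a few lines. Everything here (the dictionary layout, the function names, the way a free minor is chosen) is a simplified assumption and not the LU's actual code.

```python
def plan_changes(instance, new_nodes, used_minors):
    """Phase 1: validate and compute; nothing is modified here."""
    if len(new_nodes) != len(instance["nodes"]):
        raise ValueError("wrong number of nodes given")
    minors = []
    for node in new_nodes:
        taken = used_minors.get(node, set())
        minor = 0
        while minor in taken:
            minor += 1  # pick the lowest minor not yet used on that node
        minors.append(minor)
    return {"nodes": list(new_nodes), "minors": minors}


def apply_changes(instance, plan, used_minors):
    """Phase 2: apply the precomputed plan to the configuration."""
    instance["nodes"] = plan["nodes"]
    instance["minors"] = plan["minors"]
    for node, minor in zip(plan["nodes"], plan["minors"]):
        used_minors.setdefault(node, set()).add(minor)
```

If phase 1 raises, the instance dictionary is untouched, which mirrors the "fully modified or not at all" goal.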
Iustin Pop [Tue, 28 Jun 2011 12:41:28 +0000 (14:41 +0200)]
Fix a lint warning
Patch db8e5f1c removed the use of feedback_fn, hence pylint warns
now.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Apollon Oikonomopoulos [Wed, 22 Jun 2011 09:03:41 +0000 (12:03 +0300)]
KVM: configure bridged NICs at migration start
Commit 5d9bfd870 moved tap interface handling from KVM to Ganeti, partly
to also solve the problem of routed interfaces getting configured too
early during live migrations, causing network anomalies. In that
direction, configuration of NICs of incoming instances was deferred to
FinalizeMigration time.
However, this causes minor issues with bridged interfaces; KVM sends out
an ARP-like packet upon migration finish, which is lost because the tap
interface is not yet configured. As a consequence, intermediate network
equipment (i.e. switches) does not get notified about the topology
change, until the instance transmits another packet after the bridge has
been configured, or the switch's ARP cache expires.
The proper solution to that is to support different phases in network
configuration (pre/post migration), which also requires separate ifup
scripts. Until then we fall back to configuring bridged interfaces on
incoming instances at migration start, instead of finish.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 27 Jun 2011 10:54:28 +0000 (12:54 +0200)]
Fix RAPI documentation regarding master role
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 10 Jun 2011 14:44:03 +0000 (16:44 +0200)]
Fix bug in drbd8 replace disks on current nodes
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) has
a bug in that it does modify the instance disk temporarily before
changing it back to the same value. However, we don't need to, and
shouldn't do that: what this operation does is simply change the LVM
configuration on the node, but otherwise the instance disks keep the
same configuration as before.
In the current code, this change back-and-forth is fine *unless* we
fail during attaching the new LVs to DRBD; in which case, we're left
with a half-modified disk, which is entirely wrong.
So we change the code in two ways:
- use temporary copies of the disk children in the old_lvs var
- stop updating disk.children
Which means that the instance should not be modified anymore (except
maybe for SetDiskID, which is a legacy and unfortunate decision that
will have to be cleaned up sometime).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Wed, 22 Jun 2011 13:14:27 +0000 (14:14 +0100)]
LUInstanceCreate: use opcodes.RequireFileStorage
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Guido Trotter [Thu, 9 Jun 2011 09:01:30 +0000 (09:01 +0000)]
Don't add ",boot=on" to disks on kvm >= 0.14
Under newer kvm this prevents the vm from starting.
Ah, change!
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Apollon Oikonomopoulos [Wed, 22 Jun 2011 15:41:29 +0000 (18:41 +0300)]
KVM: fix per-instance stored UID value
When using the pool security model, _ExecuteKVMRuntime was storing the
instance's UID using str(uid), which would result in storing the
LockedUid.__repr__() result:
$ cat /var/run/ganeti/kvm-hypervisor/uid/xxxxxxxxxxxxx
<ganeti.uidpool.LockedUid object at 0x1f30610>
This patch restores the intended behaviour, by using LockedUid.AsStr().
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
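The effect described above is ordinary Python behaviour: str() on an object whose class defines no __str__ falls back to the default representation. A small stand-in class shows the difference; it only borrows the names LockedUid and AsStr from the commit message and is not the class from ganeti.uidpool.

```python
class LockedUid:
    """Stand-in with an explicit string accessor and no __str__ method."""

    def __init__(self, uid):
        self._uid = uid

    def AsStr(self):
        return "%s" % self._uid


uid = LockedUid(1234)
wrong = str(uid)    # default representation, e.g. "<... LockedUid object at 0x...>"
right = uid.AsStr() # the numeric UID as a string
```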
Guido Trotter [Fri, 17 Jun 2011 11:44:55 +0000 (14:44 +0300)]
Add one forgotten element to the file disk path
This was left out during the fix/refactoring
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Guido Trotter [Fri, 17 Jun 2011 09:39:43 +0000 (12:39 +0300)]
LUInstanceCreate: fix file storage dir calculation
- Move the calculation at the beginning of CheckPrereq, since it doesn't
modify any state, but still keeps locks
- Only perform the calculation if the actual disk template is filebased
- Error out if there is no defined file storage dir
- Only join the optional --file-storage-dir extra-path if one is passed
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
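The rules listed above can be sketched as a small pure function. The function name, the set of file-based template names and the error type are assumptions for illustration; the sketch covers only the conditions named in the list.

```python
import posixpath

# Assumed names of the file-based disk templates
FILE_BASED_TEMPLATES = frozenset(["file", "sharedfile"])


def compute_file_storage_dir(disk_template, cluster_dir, extra_path=None):
    """Return the storage directory, or None for non-file-based templates."""
    if disk_template not in FILE_BASED_TEMPLATES:
        return None
    if not cluster_dir:
        raise ValueError("cluster has no file storage directory defined")
    parts = [cluster_dir]
    # Join the optional relative extra path only if one was passed
    if extra_path:
        parts.append(extra_path)
    return posixpath.join(*parts)
```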
Guido Trotter [Fri, 17 Jun 2011 09:23:51 +0000 (12:23 +0300)]
Check that filestorage is enabled when requested
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Guido Trotter [Fri, 17 Jun 2011 09:16:58 +0000 (09:16 +0000)]
Remove self.op.file_storage_dir isabs check
As the manpage says, and the code does, self.op.file_storage_dir is an
additional relative path under the cluster file storage dir. As such it
should not be absolute.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 31 May 2011 14:49:45 +0000 (16:49 +0200)]
jqueue: Fix potential race condition when cancelling queued jobs
When a job was cancelled, its status would be changed and the file
written again. Since this was a final status, the job file could be
moved anytime for archival. If the job was still in the queue, however,
it would be processed (not fully, just updating the “end_timestamp”
attribute) and written again. This was bad as it could leave the same
job in two different files.
With this patch the processor is changed to return early for finished
jobs. Cancelling a queued job will finalize it right away. Unittests are
updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
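The two changes described above (finalize a queued job on cancel, and return early from the processor for finished jobs) can be sketched as follows. The Job class, the status strings and the writes counter are hypothetical and exist only to make the behaviour visible.

```python
FINAL_STATUSES = frozenset(["success", "error", "canceled"])


class Job:
    def __init__(self, job_id):
        self.id = job_id
        self.status = "queued"
        self.end_timestamp = None
        self.writes = 0  # counts how often the job file would be written


def cancel_job(job, now):
    """Cancel a queued job and finalize it right away."""
    if job.status != "queued":
        return False
    job.status = "canceled"
    job.end_timestamp = now
    job.writes += 1
    return True


def process_job(job, now):
    """Process a job; finished jobs are left alone and not written again."""
    if job.status in FINAL_STATUSES:
        return False
    job.status = "success"
    job.end_timestamp = now
    job.writes += 1
    return True
```

Because a cancelled job is never written a second time, it cannot reappear in the queue directory after it has been archived.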
Apollon Oikonomopoulos [Mon, 30 May 2011 11:02:25 +0000 (14:02 +0300)]
Fix argument order in ReserveLV and ReserveMAC
ConfigWriter.ReserveLV() and ConfigWriter.ReserveMAC() called
TemporaryReservationManager.Reserve() with the ec_id and resource arguments
swapped. As a result, two reservation attempts for the same resource type
within the same LU would fail, even if the resources requested were different,
e.g.:
$ gnt-instance add -t sharedfile -o debootstrap+default \
--net 0:mac=00:01:02:03:04:00 \
--net 1:mac=00:01:02:03:04:ff \
--disk 0:size=2g test_instance
Failure: prerequisites not met for this operation:
error type: resource_not_unique, error details:
MAC address 00:01:02:03:04:ff already in use in cluster
This patch fixes the argument order in the call to Reserve().
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
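Why swapped arguments produce a false "already in use" error can be seen in a simplified reservation manager. The class below is a sketch under assumptions (lower-case method names, a plain dictionary keyed by execution context ID) and not the real TemporaryReservationManager.

```python
class TemporaryReservationManager:
    """Tracks resources reserved per execution context ID (ec_id)."""

    def __init__(self):
        self._reserved = {}  # ec_id -> set of reserved resources

    def is_reserved(self, resource):
        return any(resource in held for held in self._reserved.values())

    def reserve(self, ec_id, resource):
        if self.is_reserved(resource):
            raise ValueError("%s already in use" % resource)
        self._reserved.setdefault(ec_id, set()).add(resource)
```

With the arguments swapped, the first call stores the ec_id as if it were the resource; the second call within the same LU then asks for that same "resource" again and fails, even though the two MAC addresses differ.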
Michael Hanselmann [Thu, 26 May 2011 12:36:50 +0000 (14:36 +0200)]
TLReplaceDisks: Move assertion checking locks
Commit 1bee66f3 added assertions for ensuring only the necessary locks
are kept while replacing disks. One of them makes sure locks have been
released during the operation. Unfortunately the commit added the check
as part of a “finally” branch, which is also run when an exception is
thrown (in which case the locks may not have been released yet). Errors
could be masked by the assertion error. Moving the check out of the
“finally” branch fixes the issue.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
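How a check inside a "finally" branch can mask the real error is easy to reproduce. The functions below are a generic illustration with made-up names, not the TLReplaceDisks code.

```python
def _do_replace(fail):
    """Pretend to replace disks; returns True once locks were released."""
    if fail:
        raise RuntimeError("attaching new LVs failed")
    return True


def check_in_finally(fail):
    locks_released = False
    try:
        locks_released = _do_replace(fail)
    finally:
        # Runs on failure too, so this error replaces the original one
        if not locks_released:
            raise AssertionError("locks still held")


def check_after_success(fail):
    locks_released = _do_replace(fail)
    # Only reached when the operation succeeded
    if not locks_released:
        raise AssertionError("locks still held")
```

Calling check_in_finally(True) surfaces an AssertionError, while check_after_success(True) lets the original RuntimeError propagate to the caller.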
Iustin Pop [Tue, 24 May 2011 09:29:39 +0000 (11:29 +0200)]
node evac: don't call IAllocator if no instances
Currently we generate an empty list only for the '-n node' invocation,
but for iallocator we still call the iallocator (which needs an RPC
call, etc.). By moving the computation of instances outside of the if
block, we can return early from the LU.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 19 May 2011 08:37:58 +0000 (10:37 +0200)]
RPC/Backend: Make UploadFile uid and gid agnostic
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Fri, 20 May 2011 12:24:13 +0000 (14:24 +0200)]
Resolve uid/gid upon mainloop run
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Wed, 18 May 2011 12:19:52 +0000 (14:19 +0200)]
GetEntResolver: Make it possible to resolve uid/gid to name
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Fri, 20 May 2011 12:17:29 +0000 (14:17 +0200)]
utils.algo: Add InvertDict to invert a dict
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
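Inverting a dictionary, for instance to map resolved uids back to names, can be sketched as below. This is not the body of utils.algo.InvertDict; the lower-case name and the uniqueness check are assumptions added for illustration.

```python
def invert_dict(mapping):
    """Return a new dict with keys and values swapped."""
    inverted = {value: key for key, value in mapping.items()}
    # Duplicate values would silently drop entries, so reject them
    if len(inverted) != len(mapping):
        raise ValueError("values are not unique, cannot invert")
    return inverted
```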
René Nussbaumer [Thu, 19 May 2011 11:41:52 +0000 (13:41 +0200)]
autotools: Add noded group
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 17 May 2011 16:13:16 +0000 (18:13 +0200)]
Fix errors in hooks documentation
In many cases the opcode ID was incorrect. A unittest for this will
be added in the master branch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 16 May 2011 11:26:44 +0000 (13:26 +0200)]
Clarify a bit the noded man page
"This can be overriden" can be read as either the port we listen on or
the address we bind to. Replace with "The port" for great clarity!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:54:34 +0000 (17:54 +0200)]
Note --no-remember in NEWS
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:38:25 +0000 (17:38 +0200)]
Switch QA over to using instance stop --no-remember
Instead of hardcoded Xen commands. This will make it work for all
hypervisors, instead of duplicating hypervisor functionality in QA
itself.
The timeout has been removed as gnt-instance stop itself will make
sure the instance is down before returning. We just double-check that
it is indeed down.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:20:29 +0000 (17:20 +0200)]
Implement no_remember at RAPI level
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:17:44 +0000 (17:17 +0200)]
Implement no_remember at CLI level
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 13 May 2011 15:02:35 +0000 (17:02 +0200)]
Introduce instance start/stop no_remember attribute
This will allow stopping or starting an instance without changing the
remembered state. While this seems counter-intuitive at first (it will
create cluster verify errors), it can help in a few corner cases:
- shutting down an entire cluster for maintenance but without having
to remember state
- doing testing of Ganeti itself
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Thu, 12 May 2011 13:46:46 +0000 (15:46 +0200)]
Bump version for the 2.4.2 release
I think we should stop finding bugs and instead release this :)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 12 May 2011 11:39:47 +0000 (13:39 +0200)]
Fix a bug in LUInstanceMove
The opcode parameter ignore_consistency was used in the LU, but not
actually declared in the OpCode. The patch adds it in the opcode and
the command line client.
ObQuote — Please, please, can I have static typing?
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Thu, 12 May 2011 11:34:23 +0000 (13:34 +0200)]
Abstract ignore_consistency opcode parameter
Two opcodes already use it and we need it for a third, time to add a
constant for it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Thu, 12 May 2011 11:17:47 +0000 (13:17 +0200)]
Preload the string-escape code in noded
This encoding, part of the standard Python installation, is used by
the pickle module (in turn used by subprocess when handling
failures in program execution). Preloading it means that Python will
cache it in memory so that even if the disk goes away or just the
module, we're not going to fail in reporting errors.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 11 May 2011 16:14:30 +0000 (18:14 +0200)]
Fix error in iallocator documentation reg. disk mode
The code uses the disk object's “mode” attribute, which uses the
constants DISK_RDONLY (“ro”) and DISK_RDWR (“rw”).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 11 May 2011 15:54:27 +0000 (17:54 +0200)]
Try to prevent instance memory changes N+1 failures
There are multiple bugs in the code checking for N+1 failures during
instance memory changes, and fixing them needs significant changes. In
the meantime we can at least:
- change the warning message into an error (--force will skip checks)
- only make checks when we increase the memory
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 10 May 2011 16:45:18 +0000 (18:45 +0200)]
Update NEWS file for the 2.4.2 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Marco Casavecchia [Mon, 2 May 2011 08:39:50 +0000 (01:39 -0700)]
Use floppy disk and a second CDROM on KVM
Hi all,
this patch will add 3 new KVM parameters and a new option.
New Parameters:
- floppy_image_path = "" -> Specify the floppy image to load as
floppy disk.
- cdrom2_image_path = "" -> Specify a second cdrom image to load on
the system (note: this is not intended to be used as a boot device. To
boot the system from cdrom you must use the "cdrom_image_path"
parameter as always).
- cdrom_disk_type = "" -> it can be one of the kvm supported types such as
"ide", "scsi", "paravirtual", etc. I introduced this optional parameter to
make it possible to specify a different virtual device for cdroms. It is
useful if you want to install a Windows system.
New option for "boot_device" parameter:
- "floppy": with this value you should be able to boot a KVM
instance from floppy image.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit cc130cc7a60fd5377c032116b0c036ae44639913)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 10 May 2011 15:54:31 +0000 (17:54 +0200)]
Document the selection of instance kernels
A simple doc patch to document how to configure the kernels for the
instances.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Mon, 9 May 2011 13:49:10 +0000 (15:49 +0200)]
Make root_path an optional hypervisor parameter
This will allow us an easy migration to pv-grub, because a set root_path
confused pv-grub.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 9 May 2011 12:09:03 +0000 (14:09 +0200)]
Some man page updates
This adds documentation for both the short and long form of many
options (which was inconsistent before: in some cases only the short
form was used, in others only the long form).
Note that the standard this patch adopts is to document both forms as
such:
{-O|--os-parameters} …
This makes it a bit uglier in complex situations, but the alternatives
considered were not perfect either. Other suggestions (with patches)
welcome.
Additionally, it fixes two doc bugs:
- in gnt-cluster.rst, the --prealloc-wipe-disks section was in the
middle of a paragraph
- in gnt-instance.rst, a list was not typed correctly, thus it was
mangled as a single paragraph
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Marco Casavecchia [Thu, 5 May 2011 09:17:09 +0000 (02:17 -0700)]
Add 2 new variables to the OS scripts environment
Add INSTANCE_PRIMARY_NODE and INSTANCE_SECONDARY_NODES. These new
values are useful for OS scripts that need to know the nodes where
the instance lives, or has lived.
Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: fixed small issue with SECONDARY_NODES]
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 9 May 2011 09:42:25 +0000 (11:42 +0200)]
Add --no-wait-for-sync when converting to drbd
Currently, when converting an instance from plain to DRBD, the
instance is blocked during the entire resync period. This patch adds
the --no-wait-for-sync option so that the operation finishes as soon as the
DRBD sync has started, without waiting for the entire sync. This makes
the instance available much faster.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Sat, 7 May 2011 10:25:18 +0000 (12:25 +0200)]
Recreate instance disks: allow changing nodes
This patch introduces the option of changing an instance's nodes when
doing the disk recreation. The rationale is that currently if an
instance lives on a node that has gone down and is marked offline,
it's not possible to re-create the disks and reinstall the instance on
a different node without hacking the config file.
Additionally, the LU now locks the instance's nodes (which was not
done before), as we most likely allocate new resources on them.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Iustin Pop [Fri, 6 May 2011 09:03:30 +0000 (11:03 +0200)]
Rename instance: only show new name when different
It makes no sense to show messages like:
Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to
'instance18'
So we'll skip the message if the resolved name is identical to the
requested one.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 5 May 2011 13:38:43 +0000 (15:38 +0200)]
Fix race condition in LUGroupAssignNodes
The original code would get all node information and their groups
before acquiring the necessary locks. With this patch the node
information is only retrieved once all locks have been acquired. Groups
are locked optimistically and verified after acquiring the node locks.
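The pattern can be sketched roughly as follows. This is not the LU code: the
function and its callable parameters (lookup_groups, lock_groups, lock_nodes)
are hypothetical stand-ins, used only to show the order of operations.

```python
class GroupsChangedError(Exception):
    """Raised when group membership changed before the locks were held."""


def lock_and_verify(node_names, lookup_groups, lock_groups, lock_nodes):
    """Lock groups optimistically, then verify once node locks are held.

    All three callables are hypothetical: lookup_groups(nodes) returns the
    groups the nodes currently belong to, lock_groups and lock_nodes
    acquire the corresponding locks.
    """
    # No locks are held yet, so this is only a guess and may be stale.
    guessed = frozenset(lookup_groups(node_names))
    lock_groups(guessed)
    lock_nodes(node_names)
    # With the node locks held, membership can no longer change under us.
    actual = frozenset(lookup_groups(node_names))
    if not actual <= guessed:
        raise GroupsChangedError("groups changed: %s" %
                                 ", ".join(sorted(actual - guessed)))
    return actual
```

If verification fails, the caller would release the locks and retry or
abort; which of the two is appropriate is left open here.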
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Wed, 4 May 2011 11:06:12 +0000 (13:06 +0200)]
Re-wrap and fix formatting issues in gnt-instance.rst
This is mostly rewrapping plus fixing a few small issues in
gnt-instance.rst.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Marco Casavecchia [Tue, 3 May 2011 10:16:45 +0000 (12:16 +0200)]
Documentation for the new parameters for KVM
Options added/updated are: cdrom2_image_path, floppy_image_path,
cdrom_disk_type and boot_order.
Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: small formatting update]
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 3 May 2011 15:37:37 +0000 (17:37 +0200)]
cmdlib: Fix typo, s/nick/NIC/
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Mon, 2 May 2011 13:20:43 +0000 (15:20 +0200)]
A small optimisation in cluster verify
This removes (count of instances + count of nodes) lock
acquires/releases.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 2 May 2011 13:00:26 +0000 (15:00 +0200)]
A few docstring fixes
At least one generates an epydoc error :)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 2 May 2011 12:03:00 +0000 (14:03 +0200)]
luxi: do not handle KeyboardInterrupt
With the current code, it's possible to mistake a ^C for a protocol
error:
node1# gnt-job info 221691
[press ^C]
Unhandled protocol error while talking to the master daemon:
Error while deserializing response:
(and note empty error message).
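A minimal sketch of the idea, not the actual luxi code: ProtocolError and
parse_response are hypothetical names. On Python 2.4 and older,
KeyboardInterrupt derives from Exception, so a generic handler swallows it
unless it is re-raised first; on later versions the extra clause is
redundant but harmless.

```python
import json


class ProtocolError(Exception):
    """Hypothetical stand-in for the client's protocol error type."""


def parse_response(raw, loads=json.loads):
    """Deserialize a response without turning ^C into a protocol error."""
    try:
        return loads(raw)
    except KeyboardInterrupt:
        # Let ^C propagate instead of reporting it as a protocol error.
        raise
    except Exception as err:
        raise ProtocolError("Error while deserializing response: %s" % err)
```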
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 2 May 2011 11:55:21 +0000 (13:55 +0200)]
Handle EPIPE errors while writing to the terminal
This handles EPIPE errors in two places: ToStream (to catch logging
done in GenericMain itself) and in GenericMain (to cover also plain
print statements).
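For illustration only, the check in a ToStream-like helper could look like
the sketch below. The name to_stream and the choice to report EPIPE through
the return value are assumptions; the real code may react differently, for
example by exiting.

```python
import errno


def to_stream(stream, text):
    """Write a line to `stream`; return False if the reader went away.

    Hypothetical helper: only EPIPE is treated specially, any other I/O
    error is re-raised unchanged.
    """
    try:
        stream.write(text + "\n")
        stream.flush()
    except IOError as err:
        if err.errno != errno.EPIPE:
            raise
        # The terminal or pipe reader is gone; nothing useful left to do.
        return False
    return True
```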
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Mon, 2 May 2011 09:56:44 +0000 (11:56 +0200)]
Cluster verify: check for missing bridges
Currently cluster verify doesn't check for bridge information; the
only checks are done at instance create and failover/migrate
time. This means a cluster that seems healthy will fail creation jobs.
This patch implements a simple verification that all nodes (in the
entire cluster, so it doesn't work well for multi-group) have all the
required bridges: the default one plus any instance bridge.
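The check amounts to a set difference per node. A small sketch, with a
hypothetical function name and hypothetical data shapes (the real
verification gets its data through the node verify RPC):

```python
def find_missing_bridges(default_bridge, instance_bridges, bridges_by_node):
    """Return {node: [missing bridges]} for nodes lacking a required bridge.

    Hypothetical shapes: instance_bridges is an iterable of bridge names
    used by instance NICs, bridges_by_node maps node name to the bridges
    present on that node.
    """
    # Required set: the default bridge plus any bridge used by an instance.
    required = set(instance_bridges)
    required.add(default_bridge)
    missing = {}
    for node, present in bridges_by_node.items():
        absent = required - set(present)
        if absent:
            missing[node] = sorted(absent)
    return missing
```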
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 29 Apr 2011 12:54:26 +0000 (14:54 +0200)]
TLReplaceDisks: Use implicit loop for dictionary
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Apr 2011 12:43:02 +0000 (14:43 +0200)]
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the
locks of all nodes (only the allocator will decide which node to use).
Unfortunately the unneeded locks were not released during the operation,
causing unnecessary delays for other jobs.
This patch changes the LU to release unneeded locks and adds assertions.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Apr 2011 10:45:47 +0000 (12:45 +0200)]
locking: Export “list_owned” from lock manager
This is analogous to “is_owned” and will be used for assertions.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 29 Apr 2011 10:45:13 +0000 (12:45 +0200)]
gnt-instance: Fix typo in error message
The iallocator parameter is “-I”, not “-i”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 29 Apr 2011 10:24:32 +0000 (12:24 +0200)]
mlock: fail gracefully if libc.so.6 cannot be loaded
This allows noded to continue instead of blowing up if the libc major
number changes.
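A hedged sketch of the graceful failure, using ctypes; the function name
load_libc is hypothetical and the logging call only stands in for whatever
the daemon reports:

```python
import ctypes
import logging


def load_libc(name="libc.so.6"):
    """Return a ctypes handle for libc, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name, use_errno=True)
    except OSError as err:
        # ctypes raises OSError when the shared object cannot be loaded,
        # for example after a libc major number change.
        logging.error("Cannot load %s, skipping memory lock: %s", name, err)
        return None
```

A caller would skip the memory locking step when None is returned, instead
of aborting the daemon.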
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 28 Apr 2011 08:52:37 +0000 (10:52 +0200)]
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the
meta device during the creation of instances and addition of disks via
gnt-instance modify.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 28 Apr 2011 08:40:32 +0000 (10:40 +0200)]
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names,
instead of a single one.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Thu, 28 Apr 2011 09:21:19 +0000 (11:21 +0200)]
Fix WriteFile with unicode data
Unicode is fun, indeed:
>>> len(buffer("abc"))
3
>>> len(buffer(u"abc"))
12
So we can't pass unicode data to buffer(), as the result will be to
write the in-memory (usually UTF-32) representation to disk.
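The session above is Python 2, where buffer() exists. The general remedy is
to encode text explicitly before handing it to a byte-oriented write; below
is a Python 3 flavoured sketch in which the helper name and the UTF-8
default are assumptions, as this message does not name an encoding.

```python
def to_bytes(data, encoding="utf-8"):
    """Return `data` as bytes, encoding text explicitly.

    Assumption: UTF-8 is an acceptable on-disk encoding; the commit
    message does not say which encoding the real code uses.
    """
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)
```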
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Apr 2011 12:23:12 +0000 (14:23 +0200)]
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping
the meta device in the same VG, as opposed to moving it to the data
device VG (note that we don't have a way to create the meta in a
different VG in the first place, but at least we correctly handle a
custom config).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Doug Dumitru [Wed, 27 Apr 2011 09:15:54 +0000 (11:15 +0200)]
Fix for multiple VGs - PlainToDrbd and replace-disks
When converting an instance from 'plain' to 'drbd', the old code would
create the drbd volumes in the default VG and then the renames would
fail. This fix pulls the plain VG names from the existing volumes and
places them into the new disk template.
Running 'replace-disks' has a similar issue with the new disks going
into the wrong VG and then the rename failing.
There might be a similar issue with 'recreate-disks', but I actually
have no idea what recreate-disks does, so did not look into it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Apr 2011 11:45:57 +0000 (13:45 +0200)]
Fix potential data-loss in utils.WriteFile
os.write can do incomplete writes, as long as at least some bytes have
been written (like write(2)):
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
980
>>> os.write(fd, " " * 1300)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OSError: [Errno 28] No space left on device
Note the incomplete write that wrote only 980 bytes, before the
exception.
To work around this, we simply iterate until all data is
written. Unittests could be written by using a parameter instead of
hardcoding os.write and checking for incomplete writes.
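The loop can be sketched as below, including the injectable write function
suggested for unittests. This is a sketch and not the utils.WriteFile code;
the names write_all and write_fn are hypothetical.

```python
import os


def write_all(fd, data, write_fn=os.write):
    """Write all of `data` (bytes) to `fd`, retrying after short writes.

    `write_fn` exists only so tests can inject a function that performs
    incomplete writes; it must behave like os.write.
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write returns the number of bytes actually written, which may
        # be smaller than requested; errors such as ENOSPC still raise.
        written += write_fn(fd, view[written:])
    return written
```

With this shape, a test can pass a write_fn that accepts only a few bytes
per call and check that the data still arrives complete.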
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 27 Apr 2011 10:19:19 +0000 (12:19 +0200)]
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition
between the parameter type and the OS name, changed to "for
lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5
vs. 10, 20, 5, 15": parameters not sorted in display
- "OS variants list lenny-image differs from reference node node1:
vs. default, i386": empty sets are not clearly delimited, changed to
add [] around the sets: "node node1: [] vs. [default, i386]"
- "OS parameters lenny-image differs from reference node node1:
vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly
formatting in the OS parameters list, as we used to just "%s" the
tuple; now it is "reference node node1: [] vs. [dhcp: Whether to
enable (yes) or disable (dhcp)]"
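The formatting described in the list above can be approximated as follows.
The helper names are hypothetical, and whether the API version list also
gets brackets is an assumption; the list only says it is now sorted.

```python
def format_set(values):
    """Format values sorted and delimited, so empty sets show as []."""
    return "[%s]" % ", ".join(str(v) for v in sorted(values))


def format_os_params(params):
    """Format (name, description) tuples as "name: description" entries."""
    return "[%s]" % ", ".join("%s: %s" % (name, desc)
                              for name, desc in sorted(params))
```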
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>