ganeti-local
12 years agoFix aliases in bash completion
Michael Hanselmann [Fri, 22 Jul 2011 09:55:46 +0000 (11:55 +0200)]
Fix aliases in bash completion

Ever since commit 2d48a3a2 aliases were not included in the bash
completion script. This patch also replaces one tab with two spaces.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12 years agognt-node volumes: Fix instance names
Michael Hanselmann [Fri, 22 Jul 2011 08:27:09 +0000 (10:27 +0200)]
gnt-node volumes: Fix instance names

Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return
the LV name, but to include the volume group name (e.g.
“xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume
names in LUNodeQueryvols, stopping instance names from being displayed
in “gnt-node volumes”.

This patch fixes the issue and does some cleanup.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoht: Add new check for numbers
Michael Hanselmann [Fri, 8 Jul 2011 19:33:16 +0000 (21:33 +0200)]
ht: Add new check for numbers

Places which receive floats can usually also deal with integers, e.g.
OpTestDelay. Tests are added and the new check function is used for the
aforementioned opcode and verifying query results.
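
As a rough illustration of such a check (a sketch only: `is_number` is a
hypothetical name, and excluding booleans is an assumption of this
example, not necessarily what the ht module does):

```python
def is_number(value):
    """Return True for integers and floats (hypothetical helper)."""
    # bool is a subclass of int in Python; rejecting it here is an
    # assumption made for this sketch.
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))
```

A caller that used to require a float can accept `is_number(value)`
instead, so that both `2` and `2.0` pass.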

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix off-by-one bug in job serial generation
Michael Hanselmann [Thu, 7 Jul 2011 19:10:20 +0000 (21:10 +0200)]
Fix off-by-one bug in job serial generation

Commit 009e73d0 (September 2009) changed the job queue to generate
multiple job serials at once. Ever since it would return one more than
requested.

The “serial” file in the job queue directory is defined to contain the
“last job ID used” (design-2.0). With the change above, the serial file
would always contain the next serial number. The first value returned by
the generating function was the one contained in the file, so during the
switch in 2009 one job may have been overwritten.

This patch changes the code to always return the exact number of
serials, to keep the last used serial on disk and adds an assertion.
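
The intended behaviour can be sketched as follows (a minimal sketch;
`new_serials` is a hypothetical function, not the job queue's actual
code):

```python
def new_serials(last_used, count):
    """Return (serials, value_to_store) for `count` new job serials.

    `last_used` is the content of the serial file, i.e. the last job
    ID that was handed out.
    """
    assert count > 0
    # The first new serial is one past the last used one.
    serials = list(range(last_used + 1, last_used + count + 1))
    # Exactly `count` serials are returned, never one more.
    assert len(serials) == count
    # The file keeps the last serial used, not the next free one.
    return serials, serials[-1]
```

With a serial file containing 41, asking for three serials yields 42,
43 and 44, and 44 is written back to disk.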

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoShorten some unbreakable lines in man pages
Iustin Pop [Thu, 30 Jun 2011 15:15:51 +0000 (17:15 +0200)]
Shorten some unbreakable lines in man pages

This makes the display correct on 80-column terminals.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoCorrect some spelling mistakes
Iustin Pop [Thu, 30 Jun 2011 15:07:34 +0000 (17:07 +0200)]
Correct some spelling mistakes

New lintian is even smarter:

- overriden → overridden
- allows to → allows one to

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoFix bug in recreate-disks for DRBD instances
Iustin Pop [Tue, 28 Jun 2011 12:24:45 +0000 (14:24 +0200)]
Fix bug in recreate-disks for DRBD instances

The new functionality in 2.4.2 for recreate-disks to change nodes is
broken for DRBD instances: it simply changes the nodes without caring
for the DRBD minors mapping, which will lead to conflicts in non-empty
clusters.

This patch significantly changes the Exec() method of this LU, both to
fix the DRBD minor usage and to make sure that we don't have partial
modifications to the instance objects:

- the first half of the method makes all the checks and computes the
  needed configuration changes
- the second half then performs the configuration changes and
  recreates the disks

This way, instances will either be fully modified or not at all;
whether the disks are successfully recreated is another point, but at
least we'll have the configuration sane.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix a lint warning
Iustin Pop [Tue, 28 Jun 2011 12:41:28 +0000 (14:41 +0200)]
Fix a lint warning

Patch db8e5f1c removed the use of feedback_fn, so pylint now warns
about it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoKVM: configure bridged NICs at migration start
Apollon Oikonomopoulos [Wed, 22 Jun 2011 09:03:41 +0000 (12:03 +0300)]
KVM: configure bridged NICs at migration start

Commit 5d9bfd870 moved tap interface handling from KVM to Ganeti, partly
to also solve the problem of routed interfaces getting configured too
early during live migrations, causing network anomalies. In that
direction, configuration of NICs of incoming instances was deferred to
FinalizeMigration time.

However, this causes minor issues with bridged interfaces; KVM sends out
an ARP-like packet upon migration finish, which is lost because the tap
interface is not yet configured. As a consequence, intermediate network
equipment (i.e. switches) does not get notified about the topology
change, until the instance transmits another packet after the bridge has
been configured, or the switch's ARP cache expires.

The proper solution to that is to support different phases in network
configuration (pre/post migration), which also requires separate ifup
scripts. Until then we fall back to configuring bridged interfaces on
incoming instances at migration start, instead of finish.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix RAPI documentation regarding master role
Iustin Pop [Mon, 27 Jun 2011 10:54:28 +0000 (12:54 +0200)]
Fix RAPI documentation regarding master role

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix bug in drbd8 replace disks on current nodes
Iustin Pop [Fri, 10 Jun 2011 14:44:03 +0000 (16:44 +0200)]
Fix bug in drbd8 replace disks on current nodes

Currently the drbd8 replace-disks on the same node (i.e. -p or -s) has
a bug in that it modifies the instance disk temporarily before
changing it back to the same value. However, we don't need to, and
shouldn't, do that: what this operation does is simply change the LVM
configuration on the node, while the instance disks otherwise keep the
same configuration as before.

In the current code, this change back-and-forth is fine *unless* we
fail during attaching the new LVs to DRBD; in which case, we're left
with a half-modified disk, which is entirely wrong.

So we change the code in two ways:

- use temporary copies of the disk children in the old_lvs var
- stop updating disk.children

This means that the instance should not be modified anymore (except
maybe for SetDiskID, which is a legacy and unfortunate decision that
will have to be cleaned up sometime).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoLUInstanceCreate: use opcodes.RequireFileStorage
Guido Trotter [Wed, 22 Jun 2011 13:14:27 +0000 (14:14 +0100)]
LUInstanceCreate: use opcodes.RequireFileStorage

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoDon't add ",boot=on" to disks on kvm >= 0.14
Guido Trotter [Thu, 9 Jun 2011 09:01:30 +0000 (09:01 +0000)]
Don't add ",boot=on" to disks on kvm >= 0.14

Under newer kvm this prevents the vm from starting.
Ah, change!

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoKVM: fix per-instance stored UID value
Apollon Oikonomopoulos [Wed, 22 Jun 2011 15:41:29 +0000 (18:41 +0300)]
KVM: fix per-instance stored UID value

When using the pool security model, _ExecuteKVMRuntime was storing the
instance's UID using str(uid), which would result in storing the
LockedUid.__repr__() result:

 $ cat /var/run/ganeti/kvm-hypervisor/uid/xxxxxxxxxxxxx
 <ganeti.uidpool.LockedUid object at 0x1f30610>

This patch restores the intended behaviour, by using LockedUid.AsStr().
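
The difference can be reproduced with a stand-in class (a sketch only;
this `LockedUid` is a hypothetical simplification, not the class from
ganeti.uidpool):

```python
class LockedUid(object):
    """Hypothetical stand-in that defines AsStr() but no __str__()."""

    def __init__(self, uid):
        self._uid = uid

    def AsStr(self):
        return "%s" % self._uid


locked = LockedUid(1234)
# Without __str__, str() falls back to the default object repr.
stored_before = str(locked)
# The explicit accessor returns the numeric UID as text.
stored_after = locked.AsStr()
```

Here `stored_before` is a string of the form `<... LockedUid object at
0x...>`, while `stored_after` is `"1234"`.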

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd one forgotten element to the file disk path
Guido Trotter [Fri, 17 Jun 2011 11:44:55 +0000 (14:44 +0300)]
Add one forgotten element to the file disk path

This was left out during the fix/refactoring

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoLUInstanceCreate: fix file storage dir calculation
Guido Trotter [Fri, 17 Jun 2011 09:39:43 +0000 (12:39 +0300)]
LUInstanceCreate: fix file storage dir calculation

- Move the calculation at the beginning of CheckPrereq, since it doesn't
  modify any state, but still keeps locks
- Only perform the calculation if the actual disk template is filebased
- Error out if there is no defined file storage dir
- Only join the optional --file-storage-dir extra-path if one is passed
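
The rules above can be sketched roughly like this (assumptions: the
function and parameter names are hypothetical, the template names
"file" and "sharedfile" are used for illustration, and the instance
name is assumed to be the last path element):

```python
import posixpath


def file_storage_dir(disk_template, cluster_dir, extra_path, instance_name):
    """Return the instance's file storage dir, or None if not file-based."""
    # Only do the calculation for file-based disk templates.
    if disk_template not in ("file", "sharedfile"):
        return None
    # Error out if the cluster has no file storage dir defined.
    if not cluster_dir:
        raise ValueError("Cluster does not define a file storage dir")
    parts = [cluster_dir]
    # The extra path is optional; only join it if one was passed.
    if extra_path:
        parts.append(extra_path)
    parts.append(instance_name)
    return posixpath.join(*parts)
```

Note that an absolute `extra_path` would make `posixpath.join` discard
the cluster directory, which is one reason the extra path is meant to
be relative.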

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoCheck that filestorage is enabled when requested
Guido Trotter [Fri, 17 Jun 2011 09:23:51 +0000 (12:23 +0300)]
Check that filestorage is enabled when requested

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRemove self.op.file_storage_dir isabs check
Guido Trotter [Fri, 17 Jun 2011 09:16:58 +0000 (09:16 +0000)]
Remove self.op.file_storage_dir isabs check

As the manpage says, and the code does, self.op.file_storage_dir is an
additional relative path under the cluster file storage dir. As such it
should not be absolute.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agojqueue: Fix potential race condition when cancelling queued jobs
Michael Hanselmann [Tue, 31 May 2011 14:49:45 +0000 (16:49 +0200)]
jqueue: Fix potential race condition when cancelling queued jobs

When a job was cancelled, its status would be changed and the file
written again. Since this was a final status, the job file could be
moved anytime for archival. If the job was still in the queue, however,
it would be processed (not fully, just updating the “end_timestamp”
attribute) and written again. This was bad as it could leave the same
job in two different files.

With this patch the processor is changed to return early for finished
jobs. Cancelling a queued job will finalize it right away. Unittests are
updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoFix argument order in ReserveLV and ReserveMAC
Apollon Oikonomopoulos [Mon, 30 May 2011 11:02:25 +0000 (14:02 +0300)]
Fix argument order in ReserveLV and ReserveMAC

ConfigWriter.ReserveLV() and ConfigWriter.ReserveMAC() called
TemporaryReservationManager.Reserve() with the ec_id and resource arguments
swapped. As a result, two reservation attempts for the same resource type
within the same LU would fail, even if the resources requested were different,
e.g.:

  $ gnt-instance add -t sharedfile -o debootstrap+default \
       --net 0:mac=00:01:02:03:04:00 \
       --net 1:mac=00:01:02:03:04:ff \
       --disk 0:size=2g  test_instance
  Failure: prerequisites not met for this operation:
  error type: resource_not_unique, error details:
  MAC address 00:01:02:03:04:ff already in use in cluster

This patch fixes the argument order in the call to Reserve().
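
Why swapped arguments produce this error can be shown with a toy
reservation table (a sketch under assumptions: `ReservationTable` and
`try_reserve` are hypothetical and only mimic the idea of reserving a
resource on behalf of an execution context ID):

```python
class ReservationTable(object):
    """Hypothetical stand-in for a temporary reservation manager."""

    def __init__(self):
        self._reserved = {}  # ec_id -> set of reserved resources

    def reserve(self, ec_id, resource):
        # A resource may only be reserved once, whoever holds it.
        for resources in self._reserved.values():
            if resource in resources:
                raise ValueError("%s already in use" % resource)
        self._reserved.setdefault(ec_id, set()).add(resource)


def try_reserve(table, ec_id, resource):
    """Return True if the reservation succeeded, False on a conflict."""
    try:
        table.reserve(ec_id, resource)
    except ValueError:
        return False
    return True


ec_id = "job-42"
macs = ["00:01:02:03:04:00", "00:01:02:03:04:ff"]

# Correct order: two different MACs in the same job are both accepted.
good = ReservationTable()
correct_order = [try_reserve(good, ec_id, mac) for mac in macs]

# Swapped order: the job ID is treated as the resource, so the second
# call looks like a duplicate even though the MACs differ.
bad = ReservationTable()
swapped_order = [try_reserve(bad, mac, ec_id) for mac in macs]
```

With the correct order both reservations succeed; with the swapped
order the second one is rejected.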

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoTLReplaceDisks: Move assertion checking locks
Michael Hanselmann [Thu, 26 May 2011 12:36:50 +0000 (14:36 +0200)]
TLReplaceDisks: Move assertion checking locks

Commit 1bee66f3 added assertions for ensuring only the necessary locks
are kept while replacing disks. One of them makes sure locks have been
released during the operation. Unfortunately the commit added the check
as part of a “finally” branch, which is also run when an exception is
thrown (in which case the locks may not have been released yet). Errors
could be masked by the assertion error. Moving the check out of the
“finally” branch fixes the issue.
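
The masking effect is easy to reproduce in plain Python (a sketch; all
function names here are hypothetical and unrelated to TLReplaceDisks):

```python
def failing_work():
    raise RuntimeError("disk replacement failed")


def locks_released():
    # On the error path the locks may still be held.
    return False


def check_in_finally():
    try:
        failing_work()
    finally:
        # Runs on the error path too, replacing the RuntimeError.
        assert locks_released(), "locks still held"


def check_after_success():
    failing_work()
    # Only reached when the work succeeded.
    assert locks_released(), "locks still held"


def raised_by(fn):
    """Return the name of the exception type that fn() raises."""
    try:
        fn()
    except Exception as err:
        return type(err).__name__
    return None


masked = raised_by(check_in_finally)
visible = raised_by(check_after_success)
```

The caller of `check_in_finally` sees an AssertionError, while the
caller of `check_after_success` sees the original RuntimeError.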

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agonode evac: don't call IAllocator if no instances
Iustin Pop [Tue, 24 May 2011 09:29:39 +0000 (11:29 +0200)]
node evac: don't call IAllocator if no instances

Currently we generate an empty list only for the '-n node' invocation,
but for iallocator we still call the iallocator (which needs an RPC
call, etc.). By moving the computation of instances outside of the if
block, we can return early from the LU.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRPC/Backend: Make UploadFile uid and gid agnostic
René Nussbaumer [Thu, 19 May 2011 08:37:58 +0000 (10:37 +0200)]
RPC/Backend: Make UploadFile uid and gid agnostic

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoResolve uid/gid upon mainloop run
René Nussbaumer [Fri, 20 May 2011 12:24:13 +0000 (14:24 +0200)]
Resolve uid/gid upon mainloop run

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoGetEntResolver: Make it possible to resolve uid/gid to name
René Nussbaumer [Wed, 18 May 2011 12:19:52 +0000 (14:19 +0200)]
GetEntResolver: Make it possible to resolve uid/gid to name

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoutils.algo: Add InvertDict to invert a dict
René Nussbaumer [Fri, 20 May 2011 12:17:29 +0000 (14:17 +0200)]
utils.algo: Add InvertDict to invert a dict
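
A dict inversion of this kind can be sketched as follows (an
illustration only; `invert_dict` is a hypothetical name and may differ
from the actual InvertDict in details):

```python
def invert_dict(mapping):
    """Return a new dict with keys and values swapped.

    Values must be hashable and unique; with duplicate values, later
    entries would silently overwrite earlier ones.
    """
    return dict((value, key) for (key, value) in mapping.items())
```

For example, a name-to-uid mapping becomes a uid-to-name mapping.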

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoautotools: Add noded group
René Nussbaumer [Thu, 19 May 2011 11:41:52 +0000 (13:41 +0200)]
autotools: Add noded group

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix errors in hooks documentation
Michael Hanselmann [Tue, 17 May 2011 16:13:16 +0000 (18:13 +0200)]
Fix errors in hooks documentation

In many cases the opcode ID was incorrect. A unittest for this will
be added in the master branch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoClarify a bit the noded man page
Iustin Pop [Mon, 16 May 2011 11:26:44 +0000 (13:26 +0200)]
Clarify a bit the noded man page

"This can be overriden" can be read as either the port we listen on or
the address we bind to. Replace with "The port" for great clarity!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoNote --no-remember in NEWS
Iustin Pop [Fri, 13 May 2011 15:54:34 +0000 (17:54 +0200)]
Note --no-remember in NEWS

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoSwitch QA over to using instance stop --no-remember
Iustin Pop [Fri, 13 May 2011 15:38:25 +0000 (17:38 +0200)]
Switch QA over to using instance stop --no-remember

Instead of hardcoded Xen commands. This will make it work for all
hypervisors, instead of duplicating hypervisor functionality in QA
itself.

The timeout has been removed as gnt-instance stop itself will make
sure the instance is down before returning. We just double-check that
it is indeed down.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoImplement no_remember at RAPI level
Iustin Pop [Fri, 13 May 2011 15:20:29 +0000 (17:20 +0200)]
Implement no_remember at RAPI level

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoImplement no_remember at CLI level
Iustin Pop [Fri, 13 May 2011 15:17:44 +0000 (17:17 +0200)]
Implement no_remember at CLI level

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoIntroduce instance start/stop no_remember attribute
Iustin Pop [Fri, 13 May 2011 15:02:35 +0000 (17:02 +0200)]
Introduce instance start/stop no_remember attribute

This will allow stopping or starting an instance without changing the
remembered state. While this seems counter-intuitive at first (it will
create cluster verify errors), it can help in a few corner cases:

- shutting down an entire cluster for maintenance but without having
  to remember state
- doing testing of Ganeti itself

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoBump version for the 2.4.2 release v2.4.2
Iustin Pop [Thu, 12 May 2011 13:46:46 +0000 (15:46 +0200)]
Bump version for the 2.4.2 release

I think we should stop finding bugs and instead release this :)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix a bug in LUInstanceMove
Iustin Pop [Thu, 12 May 2011 11:39:47 +0000 (13:39 +0200)]
Fix a bug in LUInstanceMove

The opcode parameter ignore_consistency was used in the LU, but not
actually declared in the OpCode. The patch adds it in the opcode and
the command line client.

ObQuote — Please, please, can I have static typing?

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAbstract ignore_consistency opcode parameter
Iustin Pop [Thu, 12 May 2011 11:34:23 +0000 (13:34 +0200)]
Abstract ignore_consistency opcode parameter

Two opcodes already use it and we need it for a third, time to add a
constant for it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoPreload the string-escape code in noded
Iustin Pop [Thu, 12 May 2011 11:17:47 +0000 (13:17 +0200)]
Preload the string-escape code in noded

This encoding, part of the standard Python installation, is used by
the pickle module (in turn used by subprocess when handling
failures in program execution). Preloading it means that Python will
cache it in memory so that even if the disk goes away or just the
module, we're not going to fail in reporting errors.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix error in iallocator documentation reg. disk mode
Michael Hanselmann [Wed, 11 May 2011 16:14:30 +0000 (18:14 +0200)]
Fix error in iallocator documentation reg. disk mode

The code uses the disk object's “mode” attribute, which uses the
constants DISK_RDONLY (“ro”) and DISK_RDWR (“rw”).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoTry to prevent instance memory changes N+1 failures
Iustin Pop [Wed, 11 May 2011 15:54:27 +0000 (17:54 +0200)]
Try to prevent instance memory changes N+1 failures

There are multiple bugs in the code checking for N+1 failures during
instance memory changes, and fixing them needs significant changes. In
the meantime we can at least:

- change the warning message into an error (--force will skip checks)
- only make checks when we increase the memory

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoUpdate NEWS file for the 2.4.2 release
Iustin Pop [Tue, 10 May 2011 16:45:18 +0000 (18:45 +0200)]
Update NEWS file for the 2.4.2 release

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoUse floppy disk and a second CDROM on KVM
Marco Casavecchia [Mon, 2 May 2011 08:39:50 +0000 (01:39 -0700)]
Use floppy disk and a second CDROM on KVM

Hi all,
this patch will add 3 new KVM parameters and a new option.

New Parameters:
 - floppy_image_path = "" -> Specify the floppy image to load as
floppy disk.
 - cdrom2_image_path = "" -> Specify a second cdrom image to load on
the system (note: this is not intended to be used as a boot device. To
boot the system from cdrom you must use the "cdrom_image_path"
parameter as always).
 - cdrom_disk_type = "" -> it can be one of the kvm supported types
(e.g. "ide", "scsi", "paravirtual"). I introduced this optional
parameter to make it possible to specify a different virtual device for
cdroms. It is useful if you want to install a Windows system.

New option for "boot_device" parameter:
 -  "floppy": with this value you should be able to boot a KVM
instance from floppy image.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit cc130cc7a60fd5377c032116b0c036ae44639913)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoDocument the selection of instance kernels
Iustin Pop [Tue, 10 May 2011 15:54:31 +0000 (17:54 +0200)]
Document the selection of instance kernels

A simple doc patch to document how to configure the kernels for the
instances.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMake root_path an optional hypervisor parameter
René Nussbaumer [Mon, 9 May 2011 13:49:10 +0000 (15:49 +0200)]
Make root_path an optional hypervisor parameter

This will allow us an easy migration to pv-grub, because a set root_path
confuses pv-grub.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoSome man page updates
Iustin Pop [Mon, 9 May 2011 12:09:03 +0000 (14:09 +0200)]
Some man page updates

This adds documentation for both the short and long form of many
options (which was inconsistent before: in some cases only the short
form was used, in others only the long form).

Note that the standard this patch adopts is to document both forms as
such:

  {-O|--os-parameters} …

This makes it a bit uglier in complex situations, but the alternatives
considered were not perfect either. Other suggestions (with patches)
welcome.

Additionally, it fixes two doc bugs:

- in gnt-cluster.rst, the --prealloc-wipe-disks section was in the
  middle of a paragraph
- in gnt-instance.rst, a list was not typed correctly, thus it was
  mangled as a single paragraph

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd 2 new variables to the OS scripts environment
Marco Casavecchia [Thu, 5 May 2011 09:17:09 +0000 (02:17 -0700)]
Add 2 new variables to the OS scripts environment

Add INSTANCE_PRIMARY_NODE and INSTANCE_SECONDARY_NODES. These new
values are useful for OS scripts that need to know the nodes where
the instance lives, or has lived.

Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: fixed small issue with SECONDARY_NODES]
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd --no-wait-for-sync when converting to drbd
Iustin Pop [Mon, 9 May 2011 09:42:25 +0000 (11:42 +0200)]
Add --no-wait-for-sync when converting to drbd

Currently, when converting an instance from plain to DRBD, the
instance is blocked during the entire resync period. This patch adds
the --no-wait-for-sync so that the operation finishes as soon as the
DRBD sync has started, without waiting for the entire sync. This makes
the instance available much faster.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRecreate instance disks: allow changing nodes
Iustin Pop [Sat, 7 May 2011 10:25:18 +0000 (12:25 +0200)]
Recreate instance disks: allow changing nodes

This patch introduces the option of changing an instance's nodes when
doing the disk recreation. The rationale is that currently if an
instance lives on a node that has gone down and is marked offline,
it's not possible to re-create the disks and reinstall the instance on
a different node without hacking the config file.

Additionally, the LU now locks the instance's nodes (which was not
done before), as we most likely allocate new resources on them.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRename instance: only show new name when different
Iustin Pop [Fri, 6 May 2011 09:03:30 +0000 (11:03 +0200)]
Rename instance: only show new name when different

It makes no sense to show messages like:
Fri May  6 02:04:01 2011  - INFO: Resolved given name 'instance18' to
'instance18'

So we'll skip the message if the resolved name is identical to the
requested one.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix race condition in LUGroupAssignNodes
Michael Hanselmann [Thu, 5 May 2011 13:38:43 +0000 (15:38 +0200)]
Fix race condition in LUGroupAssignNodes

The original code would get all node information and their groups
before acquiring the necessary locks. With this patch the node
information is only retrieved once all locks have been acquired. Groups
are locked optimistically and verified after acquiring the node locks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRe-wrap and fix formatting issues in gnt-instance.rst
Iustin Pop [Wed, 4 May 2011 11:06:12 +0000 (13:06 +0200)]
Re-wrap and fix formatting issues in gnt-instance.rst

This is mostly rewrapping plus fixing a few small issues in
gnt-instance.rst.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoDocumentation for the new parameters for KVM
Marco Casavecchia [Tue, 3 May 2011 10:16:45 +0000 (12:16 +0200)]
Documentation for the new parameters for KVM

Options added/updated are: cdrom2_image_path, floppy_image_path,
cdrom_disk_type and boot_order.

Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: small formatting update]
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocmdlib: Fix typo, s/nick/NIC/
Michael Hanselmann [Tue, 3 May 2011 15:37:37 +0000 (17:37 +0200)]
cmdlib: Fix typo, s/nick/NIC/

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoA small optimisation in cluster verify
Iustin Pop [Mon, 2 May 2011 13:20:43 +0000 (15:20 +0200)]
A small optimisation in cluster verify

This removes (count of instances + count of nodes) lock
acquires/releases.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoA few docstring fixes
Iustin Pop [Mon, 2 May 2011 13:00:26 +0000 (15:00 +0200)]
A few docstring fixes

At least one generates an epydoc error :)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoluxi: do not handle KeyboardInterrupt
Iustin Pop [Mon, 2 May 2011 12:03:00 +0000 (14:03 +0200)]
luxi: do not handle KeyboardInterrupt

With the current code, it's possible to mistake a ^C for a protocol
error:

node1# gnt-job info 221691
[press ^C]
Unhandled protocol error while talking to the master daemon:
Error while deserializing response:

(note the empty error message).
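
One way to avoid such a mix-up is to catch only `Exception`, since
KeyboardInterrupt derives from BaseException (a sketch; `ProtocolError`
and `parse_response` are hypothetical names, not the luxi code):

```python
class ProtocolError(Exception):
    """Hypothetical error type for this sketch."""


def parse_response(deserialize, raw):
    try:
        return deserialize(raw)
    except Exception as err:
        # A bare "except:" here would also swallow ^C.
        raise ProtocolError("Error while deserializing response: %s" % err)


def interrupted(_raw):
    raise KeyboardInterrupt()


def broken(_raw):
    raise ValueError("bad data")


def outcome(deserialize):
    """Classify what parse_response() does for a given deserializer."""
    try:
        parse_response(deserialize, "")
    except ProtocolError:
        return "protocol error"
    except KeyboardInterrupt:
        return "interrupted"
    return "ok"


on_interrupt = outcome(interrupted)
on_bad_data = outcome(broken)
```

The interrupt propagates unchanged, while a real deserialization
failure is still reported as a protocol error.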

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoHandle EPIPE errors while writing to the terminal
Iustin Pop [Mon, 2 May 2011 11:55:21 +0000 (13:55 +0200)]
Handle EPIPE errors while writing to the terminal

This handles EPIPE errors in two places: ToStream (to catch logging
done in GenericMain itself) and in GenericMain (to cover also plain
print statements).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoCluster verify: check for missing bridges
Iustin Pop [Mon, 2 May 2011 09:56:44 +0000 (11:56 +0200)]
Cluster verify: check for missing bridges

Currently cluster verify doesn't check for bridge information; the
only checks are done at instance create and failover/migrate
time. This means a cluster that seems healthy will fail creation jobs.

This patch implements a simple verification that all nodes (in the
entire cluster, so doesn't work well for multi-group) have all the
required bridges: the default one plus any instance bridge.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoTLReplaceDisks: Use implicit loop for dictionary
Michael Hanselmann [Fri, 29 Apr 2011 12:54:26 +0000 (14:54 +0200)]
TLReplaceDisks: Use implicit loop for dictionary

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRelease unneeded locks while replacing disks
Michael Hanselmann [Fri, 29 Apr 2011 12:43:02 +0000 (14:43 +0200)]
Release unneeded locks while replacing disks

If an iallocator is used, “gnt-instance replace-disks” would acquire the
locks of all nodes (only the allocator will decide which node to use).
Unfortunately the unneeded locks were not released during the operation,
causing unnecessary delays for other jobs.

This patch changes the LU to release unneeded locks and adds assertions.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agolocking: Export “list_owned” from lock manager
Michael Hanselmann [Fri, 29 Apr 2011 10:45:47 +0000 (12:45 +0200)]
locking: Export “list_owned” from lock manager

This is analogous to “is_owned” and will be used for assertions.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agognt-instance: Fix typo in error message
Michael Hanselmann [Fri, 29 Apr 2011 10:45:13 +0000 (12:45 +0200)]
gnt-instance: Fix typo in error message

The iallocator parameter is “-I”, not “-i”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agomlock: fail gracefully if libc.so.6 cannot be loaded
Iustin Pop [Fri, 29 Apr 2011 10:24:32 +0000 (12:24 +0200)]
mlock: fail gracefully if libc.so.6 cannot be loaded

This allows noded to continue instead of blowing up if the libc major
number changes.
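
The graceful fallback can be sketched with ctypes (a sketch; `load_libc`
is a hypothetical helper and the real mlock code may be structured
differently):

```python
import ctypes


def load_libc(name="libc.so.6"):
    """Return a handle to libc, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # The library is missing or has a different soname; the caller
        # should skip memory locking instead of crashing.
        return None


missing = load_libc("libc-does-not-exist.so.0")
```

A caller would check for `None` and log a warning rather than abort.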

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAllow creating the DRBD metadev in a different VG
Iustin Pop [Thu, 28 Apr 2011 08:52:37 +0000 (10:52 +0200)]
Allow creating the DRBD metadev in a different VG

This is a simple change to allow specifying a different VG for the
meta device during the creation of instances and addition of disks via
gnt-instance modify.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMake _GenerateDRBD8Branch accept different VG names
Iustin Pop [Thu, 28 Apr 2011 08:40:32 +0000 (10:40 +0200)]
Make _GenerateDRBD8Branch accept different VG names

This is a small change to make this function take a list of VG names,
instead of a single one.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix WriteFile with unicode data
Iustin Pop [Thu, 28 Apr 2011 09:21:19 +0000 (11:21 +0200)]
Fix WriteFile with unicode data

Unicode is fun, indeed:

>>> len(buffer("abc"))
3
>>> len(buffer(u"abc"))
12

So we can't pass unicode data to buffer(), as the result will be to
write the in-memory (usually UTF-32) representation to disk.
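
The session above is from Python 2, where buffer() exists. The same
size difference, and one way to avoid it by encoding explicitly, can
be sketched in a version-independent way (assumptions: `to_bytes` is a
hypothetical helper and UTF-8 is chosen for illustration):

```python
def to_bytes(data, encoding="utf-8"):
    """Return `data` as bytes, encoding text explicitly."""
    if isinstance(data, bytes):
        return data
    return data.encode(encoding)


text = u"abc"
# Three characters are three bytes once encoded as UTF-8 ...
encoded_len = len(to_bytes(text))
# ... but twelve bytes in a four-bytes-per-character representation.
utf32_len = len(text.encode("utf-32-le"))
```

Encoding before writing makes the on-disk size independent of how the
interpreter stores text in memory.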

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoReplace disks: keep the meta device in the same VG
Iustin Pop [Wed, 27 Apr 2011 12:23:12 +0000 (14:23 +0200)]
Replace disks: keep the meta device in the same VG

This patch enhances the multi-VG support in replace disks, by keeping
the meta device in the same VG, as opposed to moving it to the data
device VG (note that we don't have a way to create the meta in a
different VG in the first place, but at least we correctly handle a
custom config).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix for multiple VGs - PlainToDrbd and replace-disks
Doug Dumitru [Wed, 27 Apr 2011 09:15:54 +0000 (11:15 +0200)]
Fix for multiple VGs - PlainToDrbd and replace-disks

Converting an instance from 'plain' to 'drbd'.  The old code would
create the drbd volumes in the default VG and then the renames would
fail.  This fix pulls the plain VG names from the existing volumes and
places it into the new disk template.

Running 'replace-disks' has a similar issue with the new disks going
into the wrong VG and then the rename failing.

There might be a similar issue with 'recreate-disks', but I actually
have no idea what recreate-disks does, so did not look into it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix potential data-loss in utils.WriteFile
Iustin Pop [Wed, 27 Apr 2011 11:45:57 +0000 (13:45 +0200)]
Fix potential data-loss in utils.WriteFile

os.write can do incomplete writes, as long as at least some bytes have
been written (like write(2)):

>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)
980
>>> os.write(fd, " " * 1300)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 28] No space left on device

Note the incomplete write that only wrote 980 bytes, before the
exception.

To work around this, we simply iterate until all data is
written. Unittests could be written by using a parameter instead of
hardcoding os.write and checking for incomplete writes.
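
The loop and the suggested test hook could look roughly like this in
Python 3. This is a sketch, not the code from the patch: write_all, its
write_fn parameter and the make_short_writer test double are all
hypothetical names. Only os.write is a real call, and it returns the
number of bytes actually written.

```python
import os


def write_all(fd, data, write_fn=os.write):
    """Write all of `data` (bytes) to `fd`, retrying after short writes.

    `write_fn` is injectable so tests can simulate incomplete writes.
    """
    view = memoryview(data)
    pos = 0
    while pos < len(view):
        # write_fn reports how many bytes it accepted, which may be fewer
        # than requested; errors such as ENOSPC still propagate.
        pos += write_fn(fd, view[pos:])
    return pos


def make_short_writer(limit, sink):
    """Test double that accepts at most `limit` bytes per call."""
    def fake_write(fd, chunk):
        piece = bytes(chunk[:limit])
        sink.append(piece)
        return len(piece)
    return fake_write


chunks = []
total = write_all(-1, b" " * 1300, write_fn=make_short_writer(980, chunks))
print(total, [len(c) for c in chunks])
```

With the fake writer capped at 980 bytes, the 1300-byte payload goes out
as one write of 980 bytes followed by one of 320.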

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoImprove error messages in cluster verify/OS
Iustin Pop [Wed, 27 Apr 2011 10:19:19 +0000 (12:19 +0200)]
Improve error messages in cluster verify/OS

A few issues in the clarity of the error messages are fixed:

- "ERROR: node node3: OS API version lenny-image": no preposition
  between the parameter type and the OS name, changed to "for
  lenny-image"

- "API version lenny-image differs from reference node node1: 10, 5
  vs. 10, 20, 5, 15": parameters not sorted in display

- "OS variants list lenny-image differs from reference node node1:
  vs. default, i386": empty sets are not clearly delimited, changed to
  add [] around the sets: "node node1: [] vs. [default, i386]"

- "OS parameters lenny-image differs from reference node node1:
  vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly
  formatting in the OS parameters list, as we used to just "%s" the
  tuple; now it is "reference node node1: [] vs. [dhcp: Whether to
  enable (yes) or disable (dhcp)]"
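
The formatting rules listed above (sorted values, brackets around each
set, "name: description" for parameters) can be sketched as follows.
The function names are hypothetical and this is not the code from
cluster verify.

```python
def format_set(values):
    """Render a collection as "[a, b, c]", sorted, so "[]" marks an empty set."""
    return "[%s]" % ", ".join(str(v) for v in sorted(values))


def format_params(params):
    """Render (name, description) pairs as "[name: description, ...]"."""
    return "[%s]" % ", ".join("%s: %s" % (name, doc)
                              for name, doc in sorted(params))


print("%s vs. %s" % (format_set([]), format_set({"i386", "default"})))
print("%s vs. %s" % (format_set([10, 5]), format_set([10, 20, 5, 15])))
print(format_params([("dhcp", "Whether to enable (yes) or disable (dhcp)")]))
```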

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoPrevent readding of the master node
Iustin Pop [Wed, 27 Apr 2011 09:42:36 +0000 (11:42 +0200)]
Prevent readding of the master node

This breaks Ganeti in multiple ways. If we don't make the check in
gnt-node itself, then bootstrap.SetupNodeDaemon will restart the
master daemon, making the operation fail:

  node1# gnt-node add --readd node1
  Cannot communicate with the master daemon.
  Is it running and listening for connections?

The check in cmdlib is more of a safety check, as we shouldn't reach
it. If we do (via a bad client), then it will prevent breakage in the
job queue/config handling.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix punctuation in an error message
Iustin Pop [Wed, 27 Apr 2011 09:23:33 +0000 (11:23 +0200)]
Fix punctuation in an error message

IIRC we don't use punctuation at the end of error messages.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agocli: Fix wrong argument kind for groups
Michael Hanselmann [Thu, 21 Apr 2011 11:59:46 +0000 (13:59 +0200)]
cli: Fix wrong argument kind for groups

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoQuote filename in gnt-instance.8
Michael Hanselmann [Thu, 21 Apr 2011 12:00:36 +0000 (14:00 +0200)]
Quote filename in gnt-instance.8

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix typo in LUGroupAssignNodes
Michael Hanselmann [Wed, 20 Apr 2011 12:33:02 +0000 (14:33 +0200)]
Fix typo in LUGroupAssignNodes

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agognt-instance info: automatically request locking
Iustin Pop [Wed, 20 Apr 2011 11:15:23 +0000 (13:15 +0200)]
gnt-instance info: automatically request locking

Commit dae661a4 added support for controlling the locking, but it
didn't modify the gnt-instance info code, which leads to this command
always showing:

Wed Apr 20 04:10:48 2011  - WARNING: Non-static data requested, locks
need to be acquired

We simply change gnt-instance to request locks whenever we don't use
the static mode.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoDocument the dependency on OOB for gnt-node power
Iustin Pop [Wed, 20 Apr 2011 09:29:09 +0000 (11:29 +0200)]
Document the dependency on OOB for gnt-node power

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix master IP activation in failover with no-voting
Iustin Pop [Tue, 19 Apr 2011 16:35:13 +0000 (18:35 +0200)]
Fix master IP activation in failover with no-voting

Thanks to net.for.hub@gmail.com for reporting this. The logic in
masterd.CheckMasterd did an early return in case of no_voting, hence
skipping the master IP activation. We just change the ifs to not
return but simply continue through the function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agodisk wiping: fix bug in chunk size computation
Iustin Pop [Tue, 19 Apr 2011 15:31:09 +0000 (17:31 +0200)]
disk wiping: fix bug in chunk size computation

The current wipe_chunk_size computation is doing min(int_value,
float_value). For small disks (below 10GiB), the actual formula will
result in the float value being chosen. This results in very
interesting behaviour:

Wiping disk 0, offset 102.4, chunk 102.4
Wiping disk 0, offset 204.8, chunk 102.4

Wiping disk 0, offset 921.6, chunk 102.4
Wiping disk 0, offset 1024.0, chunk 1.13686837722e-13

Since these are passed to dd via %d, this will result in the call to
dd specifying offset 1024 and count 0, which will fail.

We just need to enforce conversion to int, in order to not get bitten
by floating point rounding errors.
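
A sketch of the integer-only chunking, under stated assumptions: the
function name wipe_chunks and the parameters max_chunk and percent are
placeholders chosen to reproduce the 102.4 step from the log above for
a 1024 MiB disk. They are not Ganeti's actual constants.

```python
def wipe_chunks(disk_size, max_chunk=1024, percent=10):
    """Yield (offset, size) pairs covering `disk_size`, all integers.

    Parameter names and defaults are assumptions for illustration.
    """
    # Convert to int once, so no fractional offsets can accumulate.
    chunk = max(1, int(min(max_chunk, disk_size / 100.0 * percent)))
    offset = 0
    while offset < disk_size:
        size = min(chunk, disk_size - offset)
        yield offset, size
        offset += size


for offset, size in wipe_chunks(1024):
    print("Wiping disk 0, offset %d, chunk %d" % (offset, size))
```

The last chunk is simply whatever remains, so it is always a positive
integer and dd is never called with a count of 0.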

The patch also reorders some logging messages in order to log the
chunk size.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix bug in watcher
Michael Hanselmann [Tue, 19 Apr 2011 11:38:58 +0000 (13:38 +0200)]
Fix bug in watcher

If “utils.RunParts” were to raise an exception, a log message was
written and the code continued to run. Due to the exception the
“results” variable would not be defined.

Also change the code to log a backtrace (getting an exception is rather
unlikely and having a backtrace is useful) and update one comment.
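
The pattern can be sketched like this. It is an illustration, not the
watcher code: run_hooks and its run_parts argument are hypothetical
stand-ins, and returning an empty list on failure is an assumption.
logging.exception is the standard call that logs the message together
with the current backtrace.

```python
import logging


def run_hooks(run_parts, hooks_dir):
    """Run hooks via `run_parts` and return the results, or [] on failure."""
    try:
        results = run_parts(hooks_dir)
    except Exception:
        # Log with backtrace and stop here, so `results` is never used
        # while undefined.
        logging.exception("Running hooks in %s failed", hooks_dir)
        return []
    return list(results)


def failing_run_parts(path):
    """Stand-in that always fails, to exercise the error path."""
    raise RuntimeError("cannot list %s" % path)


print(run_hooks(failing_run_parts, "/nonexistent/hooks"))
```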

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRelease locks before wiping disks during instance creation
Michael Hanselmann [Wed, 13 Apr 2011 11:53:55 +0000 (13:53 +0200)]
Release locks before wiping disks during instance creation

Ganeti 2.3 introduced an optional feature to overwrite an instance's
disks on creation. Unfortunately the code kept all locks while doing the
wipe, slowing down the creation of multiple instances in parallel.

This patch changes the code to wipe the disks only after releasing the
locks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoutils.WriteFile: Close file before renaming
Michael Hanselmann [Mon, 11 Apr 2011 13:44:43 +0000 (15:44 +0200)]
utils.WriteFile: Close file before renaming

Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154)
reported an “Operation not supported” error when writing instance
exports to a mounted CIFS filesystem. Experimentation showed the error
to only occur when using rename(2) on an opened file. Various references
on the web confirmed this observation. Whether or not the problem occurs
can also depend on the CIFS server implementation. In issue 154 it was
Windows 2008 R2.

While not solving all cases, closing the file before renaming helps
alleviate the issue a bit. Unittests are updated.
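
The ordering (write, close, then rename) can be sketched as below in
Python 3. The function name write_file is hypothetical and the real
utils.WriteFile takes more options than this. Only standard library
calls are used: tempfile.mkstemp, os.fdopen and os.rename.

```python
import os
import tempfile


def write_file(path, data):
    """Write `data` (bytes) to a temporary file, close it, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # The with-block has closed the file; only now is it renamed.
        os.rename(tmp_path, path)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "export.conf")
    write_file(target, b"example data\n")
    print(os.listdir(workdir))
```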

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix distcheck
Michael Hanselmann [Fri, 8 Apr 2011 11:58:48 +0000 (13:58 +0200)]
Fix distcheck

README is not copied to the build tree.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoNicer formatting for group query error
Michael Hanselmann [Fri, 8 Apr 2011 11:29:57 +0000 (13:29 +0200)]
Nicer formatting for group query error

Before this patch the message would look like “Some groups do not exist:
[u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
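
The difference comes down to joining the names instead of formatting
the list object itself. A sketch with a hypothetical function name:

```python
def format_missing_groups(names):
    """Build the error text from a list of group names."""
    # Joining avoids the list repr with its brackets and quotes.
    return "Some groups do not exist: %s" % ", ".join(names)


print(format_missing_groups(["foo", "bar"]))
```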

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agognt-instance.8: Fix wrongly formatted title
Michael Hanselmann [Fri, 8 Apr 2011 11:22:37 +0000 (13:22 +0200)]
gnt-instance.8: Fix wrongly formatted title

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate version in README
Michael Hanselmann [Fri, 8 Apr 2011 10:21:41 +0000 (12:21 +0200)]
Update version in README

Also add a check to Makefile's check-local target.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'stable-2.4' into devel-2.4
Michael Hanselmann [Thu, 7 Apr 2011 09:44:52 +0000 (11:44 +0200)]
Merge branch 'stable-2.4' into devel-2.4

* stable-2.4:
  Add error checking and merging for cluster params
  Clarify --force-join parameter message
  Treat empty oob_program param as default
  Fix bug in instance listing with orphan instances
  Fix bug related to log opening failures
  Bump version for 2.4.1 release
  cfgupgrade: Fix critical bug overwriting RAPI users file

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLUInstanceQueryData: Don't acquire locks unless requested
Michael Hanselmann [Wed, 6 Apr 2011 16:32:31 +0000 (18:32 +0200)]
LUInstanceQueryData: Don't acquire locks unless requested

Until now LUInstanceQueryData always acquired locks for the instance(s)
and nodes involved. In combination with long-running operations this
prevented the use of “gnt-instance info”, even with the “--static”
option. With this patch, locks are only acquired when explicitly
requested in the opcode (like all query operations).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoIncrease the lock timeouts before we block-acquire
Iustin Pop [Mon, 4 Apr 2011 13:59:39 +0000 (15:59 +0200)]
Increase the lock timeouts before we block-acquire

This has been observed to cause problems on real clusters via the
following mechanism:

- a long job (e.g. a replace-disks) is keeping an exclusive lock on an
  instance
- the watcher starts and submits its query instances opcode which
  wants shared locks for all instances
- after about an hour, the watcher job falls back to blocking acquire,
  after having acquired all other locks
- any instance opcode that wants an exclusive lock for an instance
  cannot start until the watcher has finished, even though there's no
  actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout
until lock acquires are sent back to either blocking acquire or
priority increase. The timeout is computed such that we wait ~10 hours
(instead of one) for this to happen, which should be within the
maximum lifetime of a reasonable opcode on a healthy cluster. The
timeout also means that priority increases will happen every half hour.

We also increase the max wait interval to 15 seconds, otherwise we'd
have too many retries with the increased interval.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agodaemon.py: move startup log message before prep_fn
Iustin Pop [Mon, 4 Apr 2011 10:13:44 +0000 (12:13 +0200)]
daemon.py: move startup log message before prep_fn

Before this, the output in the rapi daemon log was:
2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file
at /var/lib/ganeti/rapi/users
2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon
startup

Which is confusing, as it might look like the read of the users file
is part of the previous run. This is because we log the 'daemon
startup' message after the prepare_fn, which can log things on its
own.

The patch simply moves the 'daemon startup' message just before
prepare_fn call.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoDisplay the actual memory values in N+1 failures
Iustin Pop [Mon, 4 Apr 2011 09:33:01 +0000 (11:33 +0200)]
Display the actual memory values in N+1 failures

This changes the display from:
Mon Apr  4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr  4 02:29:46 2011   - ERROR: node node2: not enough memory to
accomodate instance failovers should node node1 fail

To:

Mon Apr  4 02:32:50 2011 * Verifying N+1 Memory redundancy
Mon Apr  4 02:32:50 2011   - ERROR: node node2: not enough memory to
accomodate instance failovers should node node1 fail (33536MiB needed,
27910MiB available)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agossh.VerifyNodeHostname: remove the quiet flag
Iustin Pop [Thu, 31 Mar 2011 16:41:09 +0000 (18:41 +0200)]
ssh.VerifyNodeHostname: remove the quiet flag

This is not needed for this function, and can interfere with debugging
of ssh failures.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd error checking and merging for cluster params
Stephen Shirley [Fri, 25 Feb 2011 15:01:38 +0000 (16:01 +0100)]
Add error checking and merging for cluster params

Set the default stderr logging level to WARNING so the relevant output
can be seen.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRAPI: Document need for Content-type header in requests
Michael Hanselmann [Thu, 24 Mar 2011 14:13:12 +0000 (15:13 +0100)]
RAPI: Document need for Content-type header in requests

This was added to the NEWS file in commit ab221ddf, but never
documented properly.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix output for “gnt-job info”
Michael Hanselmann [Thu, 24 Mar 2011 11:51:31 +0000 (12:51 +0100)]
Fix output for “gnt-job info”

If the result of an opcode was a non-empty dictionary, it
would be impossible to differentiate between input and result:

  Input fields:
    […]
    debug_level: 0
    fields: cluster_name,master_node,volume_group_name
    jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]

Expected output:

  Input fields:
    […]
    debug_level: 0
    fields: cluster_name,master_node,volume_group_name
  Result:
    jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agowatcher: Fix misleading usage output
Michael Hanselmann [Thu, 17 Mar 2011 16:36:57 +0000 (17:36 +0100)]
watcher: Fix misleading usage output

When “ganeti-watcher” is called with an argument, it would hint at
a non-existing “-f” parameter. With this patch the separate usage
string is no longer necessary.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoClarify --force-join parameter message
Stephen Shirley [Thu, 17 Mar 2011 10:05:36 +0000 (11:05 +0100)]
Clarify --force-join parameter message

This isn't only used during cluster merge.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agolocking: Fix race condition in lock monitor
Michael Hanselmann [Mon, 14 Mar 2011 18:09:28 +0000 (19:09 +0100)]
locking: Fix race condition in lock monitor

In some rare cases it can happen that a lock is re-created very soon
after deletion, while the old instance hasn't been destructed yet. In
such a case the code would detect a duplicate name and raise an
exception.

We have seen at least one case where this happened during the creation
of many instances. It is not exactly clear how it came to be, but it
appears to have occurred while different jobs fought for locks with
short timeouts (in the case of instance creation locks are added at this
stage and removed shortly after if not all locks can be acquired).

The issue is fixed by removing the check for duplicate names. To still
guarantee a stable sort order for the lock information as shown by
“gnt-debug locks”, a registration number is recorded for each lock in
the monitor.
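
A sketch of the registration-number idea, assuming a monitor that holds
weak references to its locks. The class and method names are
hypothetical and the real lock monitor does considerably more.

```python
import itertools
import weakref


class Lock:
    """Stand-in for a real lock; only the name matters here."""
    def __init__(self, name):
        self.name = name


class LockMonitor:
    def __init__(self):
        self._counter = itertools.count()
        # Weak keys: an entry disappears once its lock is destructed.
        self._locks = weakref.WeakKeyDictionary()

    def register(self, lock):
        # No duplicate-name check: a re-created lock may briefly coexist
        # with the old instance of the same name.
        self._locks[lock] = next(self._counter)

    def entries(self):
        # Sort by name, then registration number, for a stable order.
        return sorted((lock.name, regnum)
                      for lock, regnum in self._locks.items())


monitor = LockMonitor()
old = Lock("instance1")
new = Lock("instance1")
other = Lock("instance0")
for lock in (old, new, other):
    monitor.register(lock)
print(monitor.entries())
```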

A unittest is included to check for the situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoutils: Export NiceSortKey function
Michael Hanselmann [Thu, 24 Feb 2011 18:20:13 +0000 (19:20 +0100)]
utils: Export NiceSortKey function

The ability to split a string into a list of strings and integers can be
handy elsewhere and is necessary for sorting query results by names.
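
One way to build such a key in Python 3 is sketched below. It is not
the exported function itself and the name nice_sort_key is an
assumption. re.split with a capturing group returns alternating text
and digit runs, so the digit runs always sit at odd positions.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def nice_sort_key(value):
    """Split "node10" into ["node", 10, ""] so numbers compare numerically."""
    parts = _DIGITS.split(value)
    # Odd positions hold the captured digit runs; convert those to int.
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]


names = ["node10", "node2", "node1"]
print(sorted(names))                     # plain string order
print(sorted(names, key=nice_sort_key))  # numeric-aware order
```

Because text and numbers always occupy the same positions in every key,
comparing two keys never mixes strings with integers.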

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit f47941f864cf03264d363aebed530480a64e21dd)

13 years agoRevert "Only merge nodes that are known to not be offline"
Guido Trotter [Fri, 11 Mar 2011 12:59:33 +0000 (12:59 +0000)]
Revert "Only merge nodes that are known to not be offline"

This reverts commit 288f240f62dafa8bd8ba7482c8367adbdf6d96c2.

That commit was buggy at various levels:
  - broke ssh access to the second cluster, making cluster-merge
    unusable (unless ssh keys were previously set up?)
  - filtered away offline nodes from being added to the cluster config
    (wrong, they should be kept, as offline)
  - broke commit-check

The previous commit makes the code work again with what this commit
tried to achieve.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>