Oleksiy Mishchenko [Mon, 7 Jul 2008 07:13:59 +0000 (07:13 +0000)]
QA for instance migration.
Reviewed-by: imsnah
Iustin Pop [Fri, 4 Jul 2008 15:58:19 +0000 (15:58 +0000)]
Fix some issues with the watcher
This patch fixes two bugs:
- the state file is not saved because we use the method for checking
for updated data
- in two places 'Error' was used instead of 'Exception', which breaks
error handling
Additionally:
- the unused 're' import has been removed
- a variable named 'id' which collides with a builtin function has
been renamed
Note that comparing the serialized forms might create false negatives
(due to the dicts being reordered) but that will just cause an extra
write of the file, which is sub-optimal but harmless.
Reviewed-by: ultrotter
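As a rough illustration of the change detection discussed in the watcher fix above, the sketch below keeps the last serialized form and rewrites the file only when the new serialization differs. The class name StateFile and its attributes are hypothetical and not taken from ganeti-watcher; the standard json module with sort_keys=True stands in for Ganeti's own serializer and is used here only to avoid the dict-reordering issue mentioned in the note.

```python
import json


class StateFile:
    """Hypothetical state holder that skips the write when nothing changed."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._orig = None  # serialized form last written to disk
        self.writes = 0    # counter kept only to make the behaviour visible

    def save(self):
        # sort_keys gives a stable serialization, so a reordered dict
        # does not look like a change.
        serialized = json.dumps(self.data, sort_keys=True) + "\n"
        if serialized == self._orig:
            return False
        with open(self.path, "w") as fobj:
            fobj.write(serialized)
        self._orig = serialized
        self.writes += 1
        return True
```

Without a stable key order the comparison can report a change that is not there, which costs one extra write but never loses data.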
Iustin Pop [Fri, 4 Jul 2008 12:01:31 +0000 (12:01 +0000)]
Fix error handling in _CheckNodeFreeMemory
If the remote node is down, the rpc layer will return 'False' for the
node result, and not a dict.
The patch adds extra checks that we have the node and that its result is
a dict.
Reviewed-by: imsnah
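The extra checks described in the _CheckNodeFreeMemory fix above could look roughly like the sketch below. The function name, its return convention (an error string or None) and the message texts are assumptions for illustration; only the 'memory_free' key and the "False instead of a dict" behaviour come from the log itself.

```python
def check_node_free_memory(nodeinfo, node, requested_mb):
    """Return an error string, or None if the node has enough free memory.

    `nodeinfo` maps node names to per-node results; a node that could not
    be contacted is assumed to map to False instead of a dict.
    """
    result = nodeinfo.get(node) if isinstance(nodeinfo, dict) else None
    # First check: we have the node and its result is a dict.
    if not isinstance(result, dict):
        return "Cannot get current information from node '%s'" % node
    free_mem = result.get("memory_free")
    # Second check: the value we need is present and numeric.
    if not isinstance(free_mem, int):
        return "Can't compute free memory on node '%s'" % node
    if requested_mb > free_mem:
        return ("Not enough memory on node '%s': needed %d MiB, "
                "available %d MiB" % (node, requested_mb, free_mem))
    return None
```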
Michael Hanselmann [Tue, 1 Jul 2008 12:17:45 +0000 (12:17 +0000)]
Set locale when building manpages
docbook2man includes the date in the current locale into
the output document. Therefore we need to use the "C"
locale.
Reviewed-by: iustinp
Michael Hanselmann [Tue, 1 Jul 2008 12:17:33 +0000 (12:17 +0000)]
Document additional options in manpages
Reviewed-by: iustinp
Iustin Pop [Tue, 1 Jul 2008 08:00:30 +0000 (08:00 +0000)]
Ganeti version 1.2.5~rc0
Let's do a release candidate; the only other needed things will be
non-functional changes (QA and some documentation updates).
Reviewed-by: imsnah
Iustin Pop [Tue, 1 Jul 2008 07:54:57 +0000 (07:54 +0000)]
Further increase the migration delays
More testing shows that an even bigger timeout is better and allows many
more migrations before xen breaks.
Reviewed-by: imsnah
Oleksiy Mishchenko [Fri, 27 Jun 2008 16:04:08 +0000 (16:04 +0000)]
QA parameter for grow-disk size
Reviewed-by: iustinp
Iustin Pop [Fri, 27 Jun 2008 14:02:58 +0000 (14:02 +0000)]
Instance migration: add delays around migration
It seems that xen can have issues that are triggered by too fast
migrations. For now, until we have more experience with it, we should
add some delays before/after the migration call so that things have
enough time to settle down (e.g. hot plug scripts, etc.).
Note that I don't have solid data that this will fix it, or that indeed
xen is the culprit, but for now this seems the simplest way to try to
mitigate it.
Reviewed-by: ultrotter
Iustin Pop [Fri, 27 Jun 2008 13:46:21 +0000 (13:46 +0000)]
Fix burnin for migration cleanup
I forgot again to add the cleanup parameter to the burnin. This patch
adds it and also runs migration cleanup for the instances.
Reviewed-by: imsnah
Oleksiy Mishchenko [Fri, 27 Jun 2008 13:19:02 +0000 (13:19 +0000)]
QA for gnt-instance disk-grow option.
Reviewed-by: imsnah
Guido Trotter [Fri, 27 Jun 2008 12:07:49 +0000 (12:07 +0000)]
Change fping to TcpPing in two LUs
Two LUs are using RunCmd to call fping, in order to check for an IP
presence on the network. Substituting it with TcpPing will get rid of
it, which makes it not break in the new world order, where the master
cannot fork.
Reviewed-by: iustinp
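For the fping replacement above, an in-process TCP probe avoids forking an external command. The sketch below is only an illustration built on the standard socket module; the name tcp_ping and its parameters are assumptions, and the real utils.TcpPing in Ganeti may take different arguments.

```python
import socket


def tcp_ping(target, port, timeout=10.0):
    """Return True if a TCP connection to (target, port) can be opened."""
    try:
        sock = socket.create_connection((target, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    sock.close()
    return True
```

Unlike an ICMP ping, this only detects an address that has something listening on the probed port.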
Iustin Pop [Fri, 27 Jun 2008 09:14:00 +0000 (09:14 +0000)]
Implement cleanup for broken instance migration
The patch adds a new option for the “gnt-instance migrate” called
‘--cleanup’ that makes this command perform a recovery from a
(possibly) failed migration instead of actually migrating the instance.
The cleanup checks that the instance runs on the correct node (or
updates the config if not), and makes sure the disks are in single-master
mode by reconfiguring the drbd devices.
The patch also fixes a wrong description in an error message for the
migration prerequisite checks.
Reviewed-by: ultrotter
Iustin Pop [Thu, 26 Jun 2008 15:03:02 +0000 (15:03 +0000)]
Rework the migration implementation
The current migration code has many issues related to the
synchronization between nodes in the drbd network reconfiguration part.
As such, a new algorithm is implemented that uses the master as a
synchronization point and implements a cache of bdevs in the backend
to allow the rpc to resume from the previous state.
The code also splits the LU code in a few methods, so that we can reuse
it better when we implement the ‘--recover’ flag.
Reviewed-by: ultrotter
Iustin Pop [Thu, 26 Jun 2008 10:32:45 +0000 (10:32 +0000)]
Improve some corner cases in instance migration
Since the backend.DrbdReconfigNet function is called from the rpc layer,
it's better not to raise exception but to return a meaningful error
message.
The patch adds two try.. except blocks around the ReAttachNet() calls
which are the ones which most often return errors (among the few errors
that we have for now).
The patch also adds two SetDiskID calls in cmdlib's LUMigrateInstance
before the change-to-secondary which are needed depending on what is
saved in the config file.
Reviewed-by: ultrotter
Guido Trotter [Wed, 25 Jun 2008 08:14:41 +0000 (08:14 +0000)]
Add an instance_migratable rpc call
This call will check whether an instance is up on its primary, and that
it has been started with symlinks. We currently have no on-secondary
checks, nor any hypervisor specific call.
Reviewed-by: iustinp
Guido Trotter [Wed, 25 Jun 2008 08:14:27 +0000 (08:14 +0000)]
backend: improve MigrateInstance docstring
Document the arguments accepted by backend.MigrateInstance
Reviewed-by: iustinp
Guido Trotter [Wed, 25 Jun 2008 08:14:17 +0000 (08:14 +0000)]
backend: Fix a few instance related docstrings
{Start,Shutdown,Reboot}Instance take instance objects, not names.
Reviewed-by: iustinp
Guido Trotter [Wed, 25 Jun 2008 08:14:08 +0000 (08:14 +0000)]
Only burnin migrate with drbd8
Currently burnin would try (and fail) with remote_raid1 as well.
Reviewed-by: iustinp
Guido Trotter [Wed, 25 Jun 2008 08:13:57 +0000 (08:13 +0000)]
Add gnt-node migrate
This is the same as gnt-node failover, and is also a cut&paste of its
code (almost). It will be really really useful to quickly empty a
healthy node. I can be persuaded to merge MigrateNode and FailoverNode
in a common codebase, but could also forget about it and submit it if
nobody cares.
Reviewed-by: iustinp
Iustin Pop [Wed, 25 Jun 2008 06:45:52 +0000 (06:45 +0000)]
Cleanup LV status computation
Currently, when seeing if a LV is degraded or not (i.e. virtual volume),
we first attach to the device (which does an lvdisplay), then do a lvs
in order to display the lv_attr. This generates two external commands to
do (almost) the same thing.
This patch changes the Attach() method for LVs to call lvs and display
both the major/minor (needed for attach) and the lv_status (needed for
GetSyncStatus). Thus, later in GetSyncStatus, we don't need to run lvs
again, and instead just return the value computed in Attach().
Reviewed-by: imsnah
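To illustrate the single-call approach in the LV status cleanup above, the sketch below parses one line of lvs output that carries the attribute string and the kernel major/minor together. It assumes the command was run with comma-separated fields in the order lv_attr, lv_kernel_major, lv_kernel_minor; the function name and the returned tuple are hypothetical, and treating a leading 'v' in lv_attr as "virtual volume" follows the wording of the commit message.

```python
def parse_lvs_line(line, sep=","):
    """Parse one 'lv_attr,major,minor' output line.

    Returns (major, minor, degraded), where degraded is True for a
    virtual volume (first lv_attr character 'v').
    """
    fields = line.strip().split(sep)
    if len(fields) != 3:
        raise ValueError("unexpected lvs output: %r" % line)
    attr, major, minor = fields
    if len(attr) < 6:
        raise ValueError("lv_attr too short: %r" % attr)
    return int(major), int(minor), attr[0] == "v"
```

Attach() can store the third value, so GetSyncStatus can return it later without running lvs a second time.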
Oleksiy Mishchenko [Tue, 24 Jun 2008 18:19:56 +0000 (18:19 +0000)]
Add Instances/Nodes List/Bulk QA for RAPI
Reviewed-by: imsnah
Michael Hanselmann [Tue, 24 Jun 2008 14:30:44 +0000 (14:30 +0000)]
backend: Remove symlinks by disk name, not using a wildcard
Reviewed-by: ultrotter
Iustin Pop [Tue, 24 Jun 2008 14:29:18 +0000 (14:29 +0000)]
grow-disk: wait until resync is completed
The patch adds a new ‘--no-wait-for-sync’ parameter to grow-disk similar
to the one in instance add, and changes the default to wait.
This is cleaner as at the moment when the command returns, we either
have a fully synced disk or there is an error.
Reviewed-by: ultrotter
Guido Trotter [Tue, 24 Jun 2008 12:16:22 +0000 (12:16 +0000)]
Burnin OpMigrateInstance
If the new --no-migrate flag is not passed, live-migrate all the
instances to their secondary nodes.
Reviewed-by: iustinp
Guido Trotter [Tue, 24 Jun 2008 12:16:07 +0000 (12:16 +0000)]
Burnin OpGrowDisk
With this patch both the os and the swap disk are grown during
burnin. You can pass an increase size of 0 to skip this operation.
Reviewed-by: iustinp
Iustin Pop [Tue, 24 Jun 2008 11:51:03 +0000 (11:51 +0000)]
grow-disk: refuse to start if the disk is degraded
DRBD refuses to grow the disk if it's not fully synch-ed, for example
it's still processing the result of a previous grow. As such, it's
better to prevent any operation unless all nodes look good.
The patch also adds a simple check for amount > 0.
Reviewed-by: imsnah
Guido Trotter [Tue, 24 Jun 2008 11:37:09 +0000 (11:37 +0000)]
Fix gnt-instance(8) grow-disk example wording
Slightly improve the wording of the grow-disk manpage example, making it
clearer that "amount" is the difference, not the new disk size.
Reviewed-by: iustinp
Iustin Pop [Tue, 24 Jun 2008 10:44:44 +0000 (10:44 +0000)]
Enable TCP keep-alives in the twisted rpc client
Since a dead node will not be detected (in our current implementation)
by the twisted rpc layer, we should enable TCP-level keepalives in order
to at least detect TCP-level problems.
This patch changes the rpc.NodeController class to enable TCP-keepalives
before doing the remote call (so a failure in the connect/login sequence
will not be protected by this, but there twisted - if I understood
correctly - has its own timeouts).
We should always have a TCP-based transport under our object broker, but
I think it's better to check first for support of the setTcpKeepAlive
function and only if it's available call it.
Note that the interval and keepalive behaviour must be tuned at kernel
level (sysctls net.ipv4.tcp_keepalive_* in Linux) and cannot be tuned
from within our code.
Reviewed-by: imsnah
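The "check for support first" approach from the keep-alive commit above can be sketched as follows. The helper names are hypothetical; setTcpKeepAlive is the method named in the commit message, and the second helper shows the equivalent standard socket option for a plain socket. As noted above, the probe interval still comes from the kernel settings.

```python
import socket


def enable_transport_keepalive(transport):
    """Enable keep-alives only if the transport offers setTcpKeepAlive."""
    setter = getattr(transport, "setTcpKeepAlive", None)
    if setter is None:
        # Not a TCP-based transport (or no support): leave it alone.
        return False
    setter(True)
    return True


def enable_socket_keepalive(sock):
    """Same idea for a plain socket, using the SO_KEEPALIVE option."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```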
Iustin Pop [Tue, 24 Jun 2008 09:46:35 +0000 (09:46 +0000)]
xen: remove the config file after migration
Since it's not good to have the instance definition file left on the
source one if the migration succeeded, we remove it after the ‘xm
migrate’ call.
In order to do this using the existing self._RemoveConfigFile, we change
the protocol for this method to take the instance name and not instance
object as parameter, as in the migration we don't have the instance
object (and the only other caller is cheap to modify).
Reviewed-by: imsnah
Oleksiy Mishchenko [Tue, 24 Jun 2008 08:30:07 +0000 (08:30 +0000)]
Fix copy/paste error in RAPI comments
Reviewed-by: imsnah
Iustin Pop [Tue, 24 Jun 2008 05:17:25 +0000 (05:17 +0000)]
Add a tool for cleaning up clusters
This tool (that is not installed, just available in the source tree)
helps with cleaning up clusters to a (as much as possible) pristine
state. This helps in preparing for QA runs. By cleaning I mean removing
all instances, logical volumes, nodes, etc. - so this will cause
complete data-loss if it's run on a normal cluster.
Limitations: it only deals with drbd8 and not md (if any md devices are
in use which use drbd devices, they will not be cleaned up). Also the
logic behind the cleaning up is not bullet-proof.
Reviewed-by: ultrotter
Oleksiy Mishchenko [Mon, 23 Jun 2008 14:23:00 +0000 (14:23 +0000)]
Provide bulk nodes and instances info
Add tags data to node/instance queries.
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 14:09:23 +0000 (14:09 +0000)]
Update man pages with migration information
The patch adds the new ‘migrate’ subcommand to the gnt-instance man
page. Some details about the limitations are also given.
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 14:09:12 +0000 (14:09 +0000)]
Live migration implementation in gnt-instance
This patch adds the subcommand ‘gnt-instance migrate’.
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 14:05:55 +0000 (14:05 +0000)]
Add live failover implementation at LU level
This patch adds the OpCode and the LU to implement live migration.
Together with the recent symlink changes, this should allow live
migration.
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 14:04:16 +0000 (14:04 +0000)]
Add a rpc call for drbd network reconfiguration
This patch adds a rpc call and backend.py implementation over the
low-level bdev.DRBD8.ReAttachNet() call.
The rpc call and reconfiguration is done in such a way that it needs to
be called to two nodes in parallel, and will wait (with one minute
timeout) until the two nodes are reconnected with the new configuration.
Note that this means that if the rpc calls cannot be launched in
parallel to the two nodes, this call will fail.
The patch also adds an extra attribute to the bdev.DRBD8Status class,
and changes the blockdev_close rpc to also pass the instance name to
backend.CloseBlockDevices(), in order to cleanup the symlinks (which are
created in the DrbdReconfigNet function).
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 14:01:03 +0000 (14:01 +0000)]
Implement network reconfiguration for drbd8
This patch implements a simple network reconfiguration method for DRBD8
devices: it currently only allows changing of the multi-master flag.
The method can only be called when the device is attached, and it will
not work for (primary and diskless) devices, as then the shutdown will
fail; the consistency needs to be checked before attempting the
reconfiguration.
Reviewed-by: imsnah
Iustin Pop [Mon, 23 Jun 2008 13:22:12 +0000 (13:22 +0000)]
Fix the _RemoveBlockDevLinks() function
This fixes the removal of the instance symlinks (probably breakage from
the glob changes).
Reviewed-by: imsnah
Guido Trotter [Mon, 23 Jun 2008 12:50:57 +0000 (12:50 +0000)]
Fix the zombie process unittest
The failure is because in high load, the parent gets to run before the
child has the chance to os._exit(), and therefore it is still running
when the parent does the check.
The fix removes the chance of this happening by waiting to receive a SIGCHLD
(but not calling wait()) before trying to test the pid.
Reviewed-by: imsnah
Guido Trotter [Mon, 23 Jun 2008 11:06:56 +0000 (11:06 +0000)]
Create all SUB_RUN_DIRS in ganeti-noded
Rather than just creating BDEV_CACHE_DIR we loop through the
SUB_RUN_DIRS list and create all its children.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 11:06:41 +0000 (11:06 +0000)]
Add a top level RUN_GANETI_DIR constant
This patch creates a base RUN_GANETI_DIR and then moves the other run
dir constants to use that (even if just setting BDEV_CACHE_DIR as equal
to it, rather than putting it deeper, for now).
Also we create a constant list of all the subdirs we need in RUN_DIR to
work properly, which we'll use when creating them in ganeti-noded.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 08:18:25 +0000 (08:18 +0000)]
Fix cut&paste error when removing symlinks
It's just whitespace... isn't it? uhm... :) Anyway, fixing an error made
when reformatting the code for the new "safer" behaviour.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 08:01:36 +0000 (08:01 +0000)]
Remove instance's symlinks
Add _RemoveBlockDevLinks auxiliary function, called when an instance
fails to start and when it is shut down.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 08:01:21 +0000 (08:01 +0000)]
Catch BlockDeviceError when starting instance
_GatherAndLinkBlockDevs used to raise the errors.BlockDeviceError
exception when it failed to create a block device, and with this patch
set it does so also when it fails to create a symlink to it.
With this patch we move the call to this function into a pre-existing
try-except block in the code, and catch the BlockDeviceError exception,
logging a message and returning a failure state if it happens.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 08:01:07 +0000 (08:01 +0000)]
Create symlinks to instances' block devices
Change the _GatherBlockDevs private function, called only one time by
StartInstance, to _GatherAndLinkBlockDevs, and make it transform the
device returned even more by calling the new _SimlinkBlockDev auxiliary
function.
This makes sure that every time an instance is started symlinks to its
block devices are created, and the instance is started off them, rather
than the underlying block devices.
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 08:00:47 +0000 (08:00 +0000)]
symlinks: Add DISK_LINKS_DIR constant
The DISK_LINKS_DIR points to the RUN_DIR/ganeti/instance-disks
directory, which will contain symlinks to the instances' disks. These
provide a stable name across all nodes for them, and permit
live-migration to happen.
Unfortunately RUN_DIR/ganeti/instance-disks happens to be below ganeti
1.2's BDEV_CACHE_DIR, which we will need to address at some point
(possibly in 2.0).
Reviewed-by: iustinp
Iustin Pop [Sun, 22 Jun 2008 10:46:55 +0000 (10:46 +0000)]
Add a ‘tags’ field to instance and node listing
Currently there isn't any easy way to list all nodes or instances and
their tags; you have to query each node in turn, or list all the tags
via something like “gnt-cluster search-tags '.*'”. Of course, this is
not optimal.
The patch adds a new field to “gnt-instance list” and “gnt-node list”
called ‘tags’, that will list the tags of the object in comma-separated
form. This field will be empty if there are no tags (when using a
separator this output can still be parsed by other scripts).
At opcode level, there is a new field called ‘tags’ that returns a
(python) list of the object tags.
Reviewed-by: ultrotter
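The point about the empty ‘tags’ field remaining parseable, from the commit above, can be seen in a small sketch. The function name, the ':' separator and the sample names are made up for illustration and are not the gnt-instance implementation.

```python
def format_row(name, tags, separator=":"):
    """Render 'name<sep>tags', with the tags joined by commas."""
    return separator.join([name, ",".join(tags)])
```

A row for an object without tags still has two fields after splitting on the separator; the second one is simply empty.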
Iustin Pop [Sun, 22 Jun 2008 05:06:57 +0000 (05:06 +0000)]
Update error message in instance failover
Currently both the hypervisor code and the backend code add a prefix of
'Failed to migrate instance' to the actual error output, thus doubling
this message.
The patch removes this string from the backend code.
Reviewed-by: imsnah
Iustin Pop [Sat, 21 Jun 2008 08:14:04 +0000 (08:14 +0000)]
Make testSignal unittest not depend on default shell
This patch changes the code executed when testing the signal handling
of RunCmd. Since sh does not always point to bash (e.g. on Ubuntu,
where it points to /bin/dash) this test might fail due to the returned
exit code being different, so the received signal is not correctly
detected.
Additionally fix the docstring of testSignal.
(This is a backport from trunk)
Reviewed-by: iustinp
From: Manuel Franceschini <manuel.franceschini@gmail.com>
Iustin Pop [Sat, 21 Jun 2008 04:47:15 +0000 (04:47 +0000)]
Extra whitespace removal
This patch just removes two extra lines from bdev.py
Reviewed-by: amishchenko
Iustin Pop [Fri, 20 Jun 2008 10:59:15 +0000 (10:59 +0000)]
Add a rpc call for BlockDev.Close()
This patch adds rpc layer calls (in rpc.py and the equivalent in
ganeti-noded) to close a list of block devices, and the wrapper in
backend.py that takes a list of Disk objects, identifies them and
returns correctly formatted results.
The reason why this very basic call was missing until now from the rpc
layer is that we usually don't care about device closes (though we
should, and will do so in the future) as only drbd has a meaningful
Close() operation; right now we directly do Shutdown().
The patch is clean enough that it's actually independent of the live
migration implementation.
Reviewed-by: imsnah
Guido Trotter [Thu, 19 Jun 2008 15:32:27 +0000 (15:32 +0000)]
Skip N+1 checks if node_info is not complete
If we didn't get answers from all nodes there's no point in trying to
compute N+1 redundancy, as we don't have enough information to do it.
So we just skip the check altogether, and don't print anything (so it's
easier to see the real source of trouble).
Reviewed-by: imsnah
Guido Trotter [Thu, 19 Jun 2008 15:32:04 +0000 (15:32 +0000)]
Simplify hypervisor block_devices structure
The hypervisor doesn't need to be passed the whole block device
structure, so we'll just give it the block device name on the local
node, and the name as seen by the instance. This will make it easier to
manipulate it later without messing with the block devices (eg. by
changing the system name to a symlink to the name itself).
Since the HVM hypervisor changes the "virtual" name a note is added
calling for a redesign that doesn't need this change, as different
hypervisors and emulation types will anyway have different names for
exported devices.
Reviewed-by: iustinp
Iustin Pop [Wed, 18 Jun 2008 14:58:21 +0000 (14:58 +0000)]
Fix bdev unittest when run under distcheck
The path to the filename for drbd8 proc data is not correctly computed
when using distcheck. The patch duplicates it from the other drbd tests.
Reviewed-by: ultrotter
Michael Hanselmann [Wed, 18 Jun 2008 14:20:34 +0000 (14:20 +0000)]
Reduce allowed size and number of tags
We pass cluster tags to the cluster-verify hook in an environment
variable. Due to environment size restrictions, we need to limit
tags more than before. Longer tags can still be removed.
Reviewed-by: ultrotter
Michael Hanselmann [Wed, 18 Jun 2008 14:20:09 +0000 (14:20 +0000)]
Add tags to “gnt-cluster verify” hook environment
Reviewed-by: ultrotter
Iustin Pop [Wed, 18 Jun 2008 10:28:28 +0000 (10:28 +0000)]
Rework the DRBD8 device status computation
Currently, we compute the status of a drbd8 device in GetSyncStatus and
return only the values that we need (and fit in the framework of
GetSyncStatus). However, the full status details are useful (and needed)
in other places, so the patch attempts to improve this situation.
We abstract the status of a device outside in a separate class, that
knows how to parse contents from /proc/drbd and set easily accessible
attributes. We then simplify the GetSyncStatus to use this and return
the values that it needs, and add a separate method that returns the
full status object.
The move to a separate class cleans up a little bit the old
sync-progress computation from GetSyncStatus, but it's still many
regexes.
The patch also adds unittests for a few statuses, and modifies one
BaseDRBD call to accept a custom filename instead of '/proc/drbd' to
ease unittests.
Reviewed-by: imsnah
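A status class of the kind described in the DRBD8 rework above might be shaped like the sketch below. It is not the actual bdev.DRBD8Status code: the class name, the attribute names and the regular expression are assumptions, and the line format (cs:, st: and ds: fields) is assumed to match what DRBD 8.0-era kernels print in /proc/drbd.

```python
import re

# Assumed line shape: " 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---"
_LINE_RE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ds:([^/\s]+)/(\S+)")


class DrbdStatus:
    """Hypothetical parsed view of one device line from /proc/drbd."""

    def __init__(self, line):
        match = _LINE_RE.match(line)
        if not match:
            raise ValueError("cannot parse drbd status line: %r" % line)
        self.minor = int(match.group(1))
        self.cstatus = match.group(2)
        self.lrole, self.rrole = match.group(3), match.group(4)
        self.ldisk, self.rdisk = match.group(5), match.group(6)
        # Easily accessible boolean attributes derived from the fields.
        self.is_connected = self.cstatus == "Connected"
        self.is_primary = self.lrole == "Primary"
        self.is_disk_uptodate = self.ldisk == "UpToDate"
```

Because the parser takes text rather than a fixed path, unit tests can feed it saved samples, which is the same reason the patch lets one BaseDRBD call accept a custom filename.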
Iustin Pop [Tue, 17 Jun 2008 15:07:23 +0000 (15:07 +0000)]
Allow disk objects to set their own physical ID
Currently, the way to customize a DRBD disk from (node name 1, node name
2, port) to (ip1, port, ip2, port) is to use the ConfigWriter method
SetDiskID. However, since this needs a ConfigWriter object, it can be
run only on the master, and therefore disk objects can't be passed to
more than one node unchanged. This, coupled with the rpc layer
limitation that all nodes in a multi-node call receive the same
arguments, prevents any kind of multi-node operation that has disks as an
argument.
This patch takes the SetDiskID method from ConfigWriter and ports it to
the disk object itself, and instead of the full node configuration it
uses a simple {node_name: replication_ip} mapping for all the nodes
involved in the disk tree (currently we only pass primary and secondary
node since we don't support nested drbd devices).
This allows us to send disks to both the primary and secondary nodes at
once and perform synchronized drbd activation on primary/secondary
nodes.
Note that while for the 1.2 branch this will not change old methods, it
is worth investigating and possibly replacing all such calls on the
master with calls on the nodes themselves for the 2.0 branch.
Reviewed-by: ultrotter
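The translation described in the physical ID commit above, from (node name 1, node name 2, port) to (ip1, port, ip2, port) using only a {node_name: replication_ip} mapping, can be sketched as below. The function name and argument names are hypothetical; placing the local node's IP first in the result is an assumption.

```python
def compute_physical_id(logical_id, target_node, nodes_ip):
    """Map (node_a, node_b, port) to (local_ip, port, remote_ip, port).

    `nodes_ip` is the {node_name: replication_ip} mapping; `target_node`
    is the node on which the disk object is being customized.
    """
    node_a, node_b, port = logical_id
    if target_node not in (node_a, node_b):
        raise ValueError("node %s does not hold this disk" % target_node)
    remote = node_b if target_node == node_a else node_a
    return (nodes_ip[target_node], port, nodes_ip[remote], port)
```

Since the mapping is the same for every recipient, one multi-node call can carry identical arguments and each node can derive its own view locally.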
Iustin Pop [Tue, 17 Jun 2008 13:10:55 +0000 (13:10 +0000)]
Fix an error-handling case
There is a mistake in handling grow-disk for an invalid disk. This patch
fixes it.
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:44:59 +0000 (06:44 +0000)]
Manpage updates for the new grow-disk command
The patch documents the steps needed to complete a user-visible grow
(i.e. not only grow-disk, but also filesystem resize is needed, etc.)
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:26:24 +0000 (16:26 +0000)]
Implement gnt-instance grow-disk
This patch exposes at command line level the grow-disk operation.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:25:19 +0000 (16:25 +0000)]
Implement disk grow at LU level
This patch adds a new opcode and LU for growing an instance's disk.
The opcode allows growing only one disk at time, and will throw an error
if the operation fails midway (e.g. on the primary node after it has
been increased on the secondary node). As such, it might actually leave
different sized LVs on different nodes, but this will not create
problems.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:21:21 +0000 (16:21 +0000)]
Add method to update a disk object size
This patch adds a method that implements updating of a disk
(object.Disk) size, together with its children.
While this will not track the exact disk size, it allows at least an
approximate size to be recorded in the configuration (and queried).
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:20:09 +0000 (16:20 +0000)]
Implement block device grow at the rpc layer
This simple patch exposes the block device grow operation at the rpc
layer. It does not increase the protocol version as it has been recently
changed by the live failover rpc call.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 15:08:18 +0000 (15:08 +0000)]
Expose block device grow in backend.py
This patch adds a wrapper over the block device grow operation that
converts the input and output parameters as needed for the rpc layer.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 15:04:01 +0000 (15:04 +0000)]
bdev: implement disk resize for lvm/drbd8
This patch implements disk resize at the bdev level for the LVM and
DRBD8 disk types. It is not implemented for DRBD7 and MD since the way
MD works with its underlying devices makes it harder and this
combination is also deprecated.
The LVM resize operation is tried three times, with different allocation
policies:
- contiguous first, since this is best for allocation purposes (it
won't fragment too much the PV)
- cling, which is supported only by more recent LVM versions, will try
to place the new extents on the same PV as the rest of the LV
- and finally normal, which is the default
Reviewed-by: imsnah
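The three-attempt strategy listed in the resize commit above can be sketched as a simple loop over allocation policies. The function name and the injected run_cmd callable (which takes an argument list and returns an exit code) are assumptions made so the example runs without LVM; the exact lvextend command line used by bdev.py may differ.

```python
def grow_lv(dev_path, amount_mb, run_cmd):
    """Try to extend an LV, falling back through allocation policies.

    Returns the policy that succeeded; raises RuntimeError if none did.
    """
    for policy in ("contiguous", "cling", "normal"):
        cmd = ["lvextend", "--alloc", policy,
               "-L", "+%dm" % amount_mb, dev_path]
        if run_cmd(cmd) == 0:
            return policy
    raise RuntimeError("Can't grow LV %s by %d MiB" % (dev_path, amount_mb))
```

In real use run_cmd could wrap subprocess.call; keeping it as a parameter makes the fallback order easy to test.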
Iustin Pop [Mon, 16 Jun 2008 13:52:43 +0000 (13:52 +0000)]
Add migration support at the rpc layer
This patch adds the migration rpc call and its implementation in the
backend. The patch does not deal with the correct activation of disks.
Because of the new RPC, the protocol version is increased.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 10:29:24 +0000 (10:29 +0000)]
hypervisor: add live migration support
This is just the hypervisor-level migration (e.g. “xm migrate”) not the
whole node coordination work.
Reviewed-by: ultrotter
Michael Hanselmann [Mon, 16 Jun 2008 08:06:04 +0000 (08:06 +0000)]
ganeti-watcher: Replace custom exceptions with ganeti.error.*
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:05:38 +0000 (08:05 +0000)]
ganeti-watcher: Don't write file if data didn't change
This is the safest way to detect changes and the amount of data
is small, so keeping a copy around is cheap enough.
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:05:18 +0000 (08:05 +0000)]
ganeti-watcher: Rename WatcherState.data to WatcherState._data
Cleanup: _data is private and should not be modified from outside
of this class.
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:04:54 +0000 (08:04 +0000)]
Don't log SystemExit exception in ganeti-watcher
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:04:29 +0000 (08:04 +0000)]
Replace watcher state file atomically
- Lock it before renaming
- Code cleanup; close() automatically unlocks it
Reviewed-by: iustinp
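The "lock before renaming, let close() unlock" sequence from the watcher commit above might look like the following sketch. It is a Unix-only illustration built on the standard library, not the utils.WriteFile implementation; the function name and the temporary-file prefix are made up.

```python
import fcntl
import os
import tempfile


def write_file_atomic(path, data):
    """Write `data` to a locked temporary file, then rename it over `path`."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_name = tempfile.mkstemp(dir=dir_name, prefix=".tmp-")
    renamed = False
    try:
        with os.fdopen(fd, "w") as fobj:
            # Take the lock while the file still has its temporary name.
            fcntl.flock(fobj, fcntl.LOCK_EX)
            fobj.write(data)
            fobj.flush()
            os.fsync(fobj.fileno())
            os.rename(tmp_name, path)
            renamed = True
        # Leaving the 'with' block closes the file, which drops the lock.
    finally:
        if not renamed:
            os.unlink(tmp_name)
```

Readers see either the old file or the complete new one, never a partially written state file.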
Michael Hanselmann [Mon, 16 Jun 2008 08:04:02 +0000 (08:04 +0000)]
Write ganeti-watcher status file even if something failed
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:03:32 +0000 (08:03 +0000)]
Add more parameters to utils.WriteFile
- Make closing file optional: Required by ganeti-watcher to keep
file open after writing it. Changes return value of utils.WriteFile
if "close" parameter evaluates to True.
- Pre- and post-write functions: Can be used to lock files. This
will be used by ganeti-watcher to lock the temporary file before
renaming.
Reviewed-by: iustinp
Michael Hanselmann [Mon, 16 Jun 2008 08:03:08 +0000 (08:03 +0000)]
Use ganeti.serializer module in ganeti-watcher
Reviewed-by: ultrotter
Michael Hanselmann [Mon, 16 Jun 2008 08:02:27 +0000 (08:02 +0000)]
Replace custom logging code in watcher with logging module
- Log timestamp for all messages
- Write everything to logfile and optionally to stderr
- Log messages are no longer buffered, allowing a user to see progress
Reviewed-by: ultrotter
Michael Hanselmann [Mon, 16 Jun 2008 08:01:51 +0000 (08:01 +0000)]
Make sure serialized data ends with EOL character
Also fix the regular expression to not remove newlines. The simplejson
module puts whitespace at line endings when using indentation. Remove
unnecessary import of ConfigParser module.
Reviewed-by: ultrotter
Iustin Pop [Sun, 15 Jun 2008 05:21:16 +0000 (05:21 +0000)]
Fix an error message in instance add
There is a mistake in the error message generated when we can't reach a
node for checking for available disk space. Without it, the error
message is:
Failure: prerequisites not met for this operation:
Cannot get current information from node '{u'gnte2.lab.k1024.org':
{'cpu_total': 1, 'memory_free': 480, 'vg_size': 131068, 'memory_total':
504, 'bootid': '2176dd3b-2f96-42f0-8b6e-2873ecaf5f9c', 'memory_dom0':
134, 'vg_free': 130172}, u'gnte1.lab.k1024.org': False}'
instead of the expected:
Failure: prerequisites not met for this operation:
Cannot get current information from node 'gnte2.lab.k1024.org'
Reviewed-by: imsnah
Guido Trotter [Sat, 14 Jun 2008 08:34:27 +0000 (08:34 +0000)]
Activate down instances' disks on replace-disks
When replacing disks or evacuating nodes with instances administratively
down ganeti fails because the instance disks are not active. This patch
activates them, performs the replacement, and shuts them down again.
Changing this also fixes the same issue on gnt-node evacuate.
Reviewed-by: iustinp
Guido Trotter [Sat, 14 Jun 2008 08:34:10 +0000 (08:34 +0000)]
FailoverInstance: change AddInstance with Update
We're not adding a new instance, just making configuration changes to
the one we're working on.
Reviewed-by: imsnah
Guido Trotter [Sat, 14 Jun 2008 08:33:51 +0000 (08:33 +0000)]
Burnin: Use iallocator in import/export
Currently the iallocator option is ignored by burnin at import/export
time even if passed in. With this patch it becomes used. The log message
used by the importer is also changed to reflect this.
This patch also improves import/export on the non-iallocator case:
- The secondary node is not passed anymore on non-mirrored templates
- On mirrored templates the secondary node is logged
Reviewed-by: imsnah
Iustin Pop [Fri, 13 Jun 2008 13:16:27 +0000 (13:16 +0000)]
Fix building of auto-generated rapi docs
This is needed for a clean checkout to pass 'make distcheck'
Reviewed-by: imsnah
Iustin Pop [Fri, 13 Jun 2008 12:59:25 +0000 (12:59 +0000)]
Make final 1.2.4 release
Reviewed-by: imsnah
Guido Trotter [Wed, 11 Jun 2008 09:58:32 +0000 (09:58 +0000)]
Ganeti version 1.2.4~rc2
Third release candidate for the 1.2.4 release.
Reviewed-by: iustinp
Alexander Schreiber [Mon, 9 Jun 2008 12:31:51 +0000 (12:31 +0000)]
Ganeti 1.2.4 upgrade hint for existing HVM clusters
Add an upgrade hint for anybody upgrading existing HVM clusters to Ganeti
1.2.4 to avoid mysteriously non-starting instances after the upgrade.
Reviewed-by: iustinp
Alexander Schreiber [Mon, 9 Jun 2008 10:01:28 +0000 (10:01 +0000)]
Use default values for unset HVM instance attributes.
This patch is intended reduce the update friction for those who used
HVM with the Ganeti 1.2.3 version. Newly introduced HVM instance flags
will be unset for existing HVM instances after the upgrade. Those unset
flags will be treated as set to the previously hardcoded values where
this makes sense (ACPI and PAE flags).
Reviewed-by: iustinp
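The fallback for unset HVM flags described in the commit above can be illustrated with a small lookup helper. The attribute names and the default values in HVM_DEFAULTS are placeholders invented for this sketch; the log does not state the previously hardcoded values, so they should not be read as Ganeti's actual defaults.

```python
# Placeholder names and values, NOT taken from Ganeti.
HVM_DEFAULTS = {"hvm_acpi": True, "hvm_pae": True}


def effective_hvm_flag(instance_attrs, name):
    """Return the instance's flag, or the default if it was never set."""
    value = instance_attrs.get(name)
    if value is None:
        # Instance predates the flag: behave as before the upgrade.
        return HVM_DEFAULTS[name]
    return value
```

An explicit False is kept as is; only a missing (None) value falls back to the default.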
Guido Trotter [Tue, 3 Jun 2008 10:42:14 +0000 (10:42 +0000)]
Ganeti version 1.2.4~rc1
Second release candidate for the 1.2.4 stable release.
Reviewed-by: schreiberal
Iustin Pop [Sat, 31 May 2008 21:59:47 +0000 (21:59 +0000)]
Add check for node memory in instance creation
Currently the check for enough memory is done only on instance start
command and failover command. But we also start an instance in instance
create, therefore we need to check this instead of failing to start in
the hypervisor phase.
The patch adds a check for node memory in the case the creation command
specifies that the instance should be started. It is allowed for the
memory to be less than needed if the instance will not be started, in
order to allow migration and other such cases.
Reviewed-by: imsnah
Michael Hanselmann [Fri, 30 May 2008 10:55:17 +0000 (10:55 +0000)]
Fix two problems in QA scripts
- Failover back to original node in instance failure test
- Exclude secondary node from list of potential nodes in
replace-disks test
Reviewed-by: iustinp
Michael Hanselmann [Fri, 30 May 2008 10:55:03 +0000 (10:55 +0000)]
Add QA tests for “gnt-instance reboot”
Reviewed-by: ultrotter
Michael Hanselmann [Fri, 30 May 2008 10:54:43 +0000 (10:54 +0000)]
Add QA test for “gnt-instance replace-disks”
Reviewed-by: iustinp
Iustin Pop [Fri, 30 May 2008 10:51:22 +0000 (10:51 +0000)]
LURemoveInstance: fix op.ignore_failures usage
Currently: the LURemoveInstance.Exec() method uses the ignore_failures
attribute of the OpRemoveInstance opcode, but it doesn't check for its
existence. The patch adds this attribute to _OP_REQP and to all the
places where this opcode was created.
This attribute is always passed by gnt-instance, but burnin didn't pass
it so it can fail if it enters the 'fail to remove disks' branch of the
method (which is why it was not triggered until now).
Reviewed-by: ultrotter, imsnah
Oleksiy Mishchenko [Fri, 30 May 2008 09:56:58 +0000 (09:56 +0000)]
Fix exception handling for lock timeout in RAPI
Michael Hanselmann [Fri, 30 May 2008 08:18:40 +0000 (08:18 +0000)]
Fix typo in rapi docstring
Reviewed-by: amishchenko
Oleksiy Mishchenko [Thu, 29 May 2008 16:39:10 +0000 (16:39 +0000)]
RAPI Docstrings update.
Reviewed-by: imsnah
Michael Hanselmann [Thu, 29 May 2008 13:25:52 +0000 (13:25 +0000)]
Update gnt-instance and gnt-backup manpages
- Add --iallocator options
- Small text fixes
Reviewed-by: ultrotter
Michael Hanselmann [Thu, 29 May 2008 12:05:21 +0000 (12:05 +0000)]
Add manpage for ganeti-rapi daemon
Reviewed-by: iustinp
Michael Hanselmann [Thu, 29 May 2008 12:05:03 +0000 (12:05 +0000)]
Fix wrong filename in ganeti-watcher manpage
Reviewed-by: iustinp