Iustin Pop [Tue, 3 Feb 2009 14:45:03 +0000 (14:45 +0000)]
Documentation: update the gnt-os manpage
This patch updates the gnt-os man page and the common footer page for
ganeti 2.0.
Reviewed-by: ultrotter
Iustin Pop [Tue, 3 Feb 2009 10:55:30 +0000 (10:55 +0000)]
Small patch for handling errors in node add
This small patch hopefully fixes the handling of ssh verify errors in
node add (note: untested).
Reviewed-by: ultrotter
Iustin Pop [Tue, 3 Feb 2009 10:55:19 +0000 (10:55 +0000)]
ssh: more details on failure
In case we fail without output from the ssh command, we should at least
add the exit code or any other failure reason to the error message, and
log it and the cmdline used to the node daemon log.
Reviewed-by: imsnah
Guido Trotter [Tue, 3 Feb 2009 10:45:12 +0000 (10:45 +0000)]
Give a sane permission to the known_host file
Reviewed-by: iustinp
Iustin Pop [Mon, 2 Feb 2009 14:49:10 +0000 (14:49 +0000)]
A couple of small changes to the OS environment
This patch correctly exports the mode of disks (rw/ro) and also exports
the instance OS.
Reviewed-by: imsnah
Iustin Pop [Mon, 2 Feb 2009 11:23:48 +0000 (11:23 +0000)]
Whitespace change: bad indentation in constants.py
This patch only changes some indentation in constants.py.
Reviewed-by: imsnah
Iustin Pop [Mon, 2 Feb 2009 11:23:40 +0000 (11:23 +0000)]
Return error messages in node add ssh handling
When the rpc call node_add fails, we don't have any error message. This
patch changes the call to return (status, data) so that the user can see
the correct error message.
Reviewed-by: imsnah
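The (status, data) convention described in the node add change above can be sketched as follows. This is only an illustration: node_add_rpc, add_node and the error text are hypothetical, and OpExecError is redefined locally as a stand-in rather than imported from Ganeti.

```python
class OpExecError(Exception):
    """Local stand-in for the error that is shown to the user."""


def node_add_rpc(ssh_ok):
    """Hypothetical backend call returning a (status, data) pair."""
    if not ssh_ok:
        # On failure, 'data' holds the reason (illustrative text only).
        return (False, "ssh verification failed")
    return (True, "node added")


def add_node(ssh_ok):
    """Hypothetical caller that surfaces the failure reason."""
    status, data = node_add_rpc(ssh_ok)
    if not status:
        # 'data' carries the error message all the way to the user.
        raise OpExecError("Cannot add node: %s" % data)
    return data
```

With a bare True/False result the caller could only report that the call failed; with the pair it can also report why.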
Guido Trotter [Sun, 1 Feb 2009 09:48:37 +0000 (09:48 +0000)]
gnt-instance: support no_PARAMETER value
Since parameters get set to False if a no_ is prefixed, don't try to
interpret those boolean values, and pass them unchanged.
Reviewed-by: iustinp
Guido Trotter [Sun, 1 Feb 2009 09:48:23 +0000 (09:48 +0000)]
LUQueryClusterInfo: filter hvparams
We don't need to show hvparams for hypervisors which are not enabled on
the cluster.
Reviewed-by: iustinp
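The hvparams filtering above amounts to a dictionary filter. A minimal sketch, assuming hvparams is a plain dict keyed by hypervisor name; filter_hvparams and the parameter keys used in the usage note are hypothetical:

```python
def filter_hvparams(hvparams, enabled_hypervisors):
    """Keep only the parameters of hypervisors enabled on the cluster."""
    enabled = set(enabled_hypervisors)
    return {name: params
            for name, params in hvparams.items()
            if name in enabled}
```

For example, filter_hvparams({"xen-pvm": {...}, "kvm": {...}}, ["kvm"]) would return only the "kvm" entry.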
Guido Trotter [Thu, 29 Jan 2009 15:51:58 +0000 (15:51 +0000)]
KVM: advise about VNC support on GetShellCommand
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:51:44 +0000 (15:51 +0000)]
KVM: enable VNC if a VNC_BIND_ADDRESS is defined
We'll also enable a tablet usb device, as suggested by the kvm man page.
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:51:29 +0000 (15:51 +0000)]
KVM: Allow the HV_VNC_BIND_ADDRESS parameter
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:51:14 +0000 (15:51 +0000)]
LUAddNode: copy the vnc password file also for KVM
Before we used to copy the file if xen-hvm was enabled on the cluster,
now we'll do that if any enabled hypervisor is in the new HTS_USE_VNC
group.
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:51:00 +0000 (15:51 +0000)]
Add HT_KVM to HTS_REQ_PORT
HT_KVM doesn't technically require a port, but if it has one it can give
vnc displays to instances.
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:50:38 +0000 (15:50 +0000)]
KVM: make the kernel and initrd arguments optional
Under KVM we don't strictly need a kernel and initrd. If some are passed
we'll use them, otherwise the guest OS will need to behave as fully
native, and have its own boot loader and kernel.
The root_path hypervisor parameter becomes mandatory only if a kernel is
specified.
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:47:21 +0000 (15:47 +0000)]
KVM: add the HV_SERIAL_CONSOLE parameter
Up until now a KVM instance was forced to have a serial port.
With this change this is no longer mandatory: by default we'll use one,
but if the HV_SERIAL_CONSOLE parameter is set to False we'll do without.
Reviewed-by: iustinp
Guido Trotter [Thu, 29 Jan 2009 15:47:06 +0000 (15:47 +0000)]
GetShellCommand: get hvparams and beparams
Sometimes the hypervisor will use the instance hv and/or be parameters
to determine the best shell command. This is not possible, though,
currently, as the instance hv/beparams are not filled, so we have to
pass the filled versions separately.
Reviewed-by: iustinp
Iustin Pop [Thu, 29 Jan 2009 15:09:21 +0000 (15:09 +0000)]
Implement software release version checks too
Currently the LUVerifyCluster only reports the protocol version changes,
not software ones. This is useful to know/monitor, so we add this too as
a warning.
Reviewed-by: ultrotter
Iustin Pop [Thu, 29 Jan 2009 15:09:11 +0000 (15:09 +0000)]
gnt-instance list: accept input names
Currently gnt-instance list will refuse to take arguments, and always
return the full list of instances. This patch allows it to pass names to
LUQueryInstances, so that we restrict the input to a given set of
instances.
Reviewed-by: ultrotter
Iustin Pop [Thu, 29 Jan 2009 15:08:57 +0000 (15:08 +0000)]
LUQueryInstances: keep the given order of names
Currently LUQueryInstances keeps the ordering of instances only in some cases,
and in others it will reorder the list. This patch fixes this by more clearly
separating the various cases (names passed or not and locking or not locking),
so that the output list is in the same order as always.
Of course, this disables the sorting when arguments are passed.
Reviewed-by: ultrotter
Iustin Pop [Thu, 29 Jan 2009 15:08:46 +0000 (15:08 +0000)]
locking.LockSet: don't modify input arguments
Currently LockSet.acquire() sorts in place its input argument if it's a
list. This is not good, since callers might depend on a specific
ordering of the input data, and this is a 'hidden' modification.
We fix it by simply using a sorted copy, instead of sorting in place.
Reviewed-by: ultrotter
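The difference between sorting in place and sorting a copy, as in the LockSet change above, can be shown with a small sketch. The two functions are hypothetical stand-ins, not the actual LockSet.acquire() code:

```python
def acquire_in_place(names):
    """Old behaviour (stand-in): sorts the caller's list as a side effect."""
    names.sort()
    return list(names)


def acquire_with_copy(names):
    """New behaviour (stand-in): sorts a copy, leaving the input untouched."""
    return sorted(names)


requested = ["node3", "node1", "node2"]
acquired = acquire_with_copy(requested)
# 'requested' still has the caller's ordering; only 'acquired' is sorted.
```

A caller that relies on the order of 'requested' after the call is only safe with the second variant.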
Iustin Pop [Thu, 29 Jan 2009 15:08:34 +0000 (15:08 +0000)]
Re-wrap some lines to keep them under 80 chars
This non-code change rewraps some lines in locking.py to keep them under
80 chars.
Reviewed-by: ultrotter
Iustin Pop [Thu, 29 Jan 2009 15:08:24 +0000 (15:08 +0000)]
Check that instance exists before confirm. queries
Currently we ask the user for confirmation, and only after (try to)
remove, failover or migrate the instance. This doesn't work nicely if
the instance doesn't exist, so we make a query for the instance before
the prompt, which will throw an error in case it doesn't exist.
Side-note: the way the query works today is not really nice. It would be
better if we could query explicitly for a missing instance name, so that
this is done cleaner (explicit check) instead of side-effect (throw
exception). We do add code for this explicit check, except that today it
won't be used actually.
Reviewed-by: ultrotter
Oleksiy Mishchenko [Thu, 29 Jan 2009 15:03:42 +0000 (15:03 +0000)]
RAPI: tag work
Generalize tag work for instances/nodes/cluster tag management.
Reviewed-by: iustinp
Oleksiy Mishchenko [Thu, 29 Jan 2009 15:03:00 +0000 (15:03 +0000)]
RAPI: rlib1 removal
The resources we still need moved to rlib2.
Reviewed-by: iustinp
Oleksiy Mishchenko [Thu, 29 Jan 2009 15:02:20 +0000 (15:02 +0000)]
RAPI: Implement /2 resource
Reviewed-by: iustinp
Oleksiy Mishchenko [Thu, 29 Jan 2009 14:52:41 +0000 (14:52 +0000)]
RAPI: Deprecate version Rapi version1
It is impossible to keep backward compatibility due to
significant changes in the Ganeti core.
Reviewed-by: iustinp
Iustin Pop [Wed, 28 Jan 2009 19:06:11 +0000 (19:06 +0000)]
Fix gnt-cluster modify -H and offline nodes
Reviewed-by: ultrotter
Iustin Pop [Wed, 28 Jan 2009 19:06:00 +0000 (19:06 +0000)]
Actually mark drives as read-only if so configured
This patch correctly marks the drives as read-only for Xen, and raises
an exception for KVM since it doesn't support read-only drives.
Reviewed-by: ultrotter
Iustin Pop [Wed, 28 Jan 2009 14:46:58 +0000 (14:46 +0000)]
Fix some issues related to job cancelling
This patch fixes two issues with the cancel mechanism:
- cancelled jobs show as such, and not in error state (we mark them as
OP_STATUS_CANCELED and not OP_STATUS_ERROR)
- queued jobs which are cancelled don't raise errors in the master (we
treat OP_STATUS_CANCELED now)
Reviewed-by: imsnah
Guido Trotter [Tue, 27 Jan 2009 16:44:38 +0000 (16:44 +0000)]
Xen: use utils.WriteFile for the instance configs
Also raise HypervisorError rather than OpExecError.
Reviewed-by: iustinp
Guido Trotter [Tue, 27 Jan 2009 16:44:23 +0000 (16:44 +0000)]
Xen: use utils.Readfile to read the VNC password
Also raise HypervisorError rather than OpExecError.
Reviewed-by: iustinp
Iustin Pop [Tue, 27 Jan 2009 15:41:38 +0000 (15:41 +0000)]
Implement disk verify checks in config verify
This patch adds a simple check that the 'mode' attribute of top-level disks is
correct. It does not recurse over children.
The framework could be extended with other checks in the future.
Reviewed-by: imsnah
Iustin Pop [Tue, 27 Jan 2009 15:41:26 +0000 (15:41 +0000)]
Fix the mode attribute of newly-created disks
Currently, only the LUSetInstanceParams correctly sets up the mode
attribute via a manual operation. We remove this and instead do the
correct setting in the generic _GenerateDiskTemplate function, so that
we set the mode correctly for all disk creations.
Reviewed-by: ultrotter
Iustin Pop [Tue, 27 Jan 2009 15:41:15 +0000 (15:41 +0000)]
Rework the multi-instance gnt commands
This patch changes the multi-instance gnt-* commands (gnt-instance
start/stop, gnt-node evacuate/failover) such that the individual
operations are submitted in parallel, ideally improving the speed of the
execution.
The patch does this by abstracting the job set functionality into a new
class in cli.py, that takes care of the job submit, job poll and error
handling.
Reviewed-by: ultrotter
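The submit-then-poll pattern behind the job set abstraction above can be sketched like this. The JobSet class here is a simplified, hypothetical illustration that takes its submit and poll functions as arguments; it is not the class added to cli.py.

```python
class JobSet:
    """Queue several jobs, submit them all, then wait for each one."""

    def __init__(self, submit, poll):
        self._submit = submit  # callable: list of opcodes -> job id
        self._poll = poll      # callable: job id -> result, may raise
        self._queue = []

    def queue(self, name, ops):
        """Remember a job under a display name without submitting it yet."""
        self._queue.append((name, ops))

    def get_results(self):
        """Return a list of (name, success, result or error message)."""
        # Submit everything first, so that the jobs can run in parallel...
        submitted = [(name, self._submit(ops)) for name, ops in self._queue]
        results = []
        # ...and only then wait for each job in turn.
        for name, job_id in submitted:
            try:
                results.append((name, True, self._poll(job_id)))
            except Exception as err:
                # One failed job must not hide the results of the others.
                results.append((name, False, str(err)))
        return results
```

Submitting before polling is what gives the speed-up: polling each job right after submitting it would serialize the operations again.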
Iustin Pop [Tue, 27 Jan 2009 15:41:01 +0000 (15:41 +0000)]
Fix single-job archiving (gnt-job archive)
This is a simple typo from the conversion to multi-job archiving.
Reviewed-by: imsnah
Guido Trotter [Tue, 27 Jan 2009 11:31:38 +0000 (11:31 +0000)]
KVM and Xen: add the HV_ROOT_PATH parameter
This parameter allows a different path to be passed to the instance
kernel. The new parameter is mandatory, and by default has the value of
the old hardcoded value for both kvm and xen.
Beta1 clusters will need to have this parameter added for their
instances to be able to boot.
Reviewed-by: iustinp
Guido Trotter [Tue, 27 Jan 2009 11:31:19 +0000 (11:31 +0000)]
KVM: implement GetShellCommandForConsole
This is a class method, because it calls _InstanceSerial, which is
another class method. The patch changes it to classmethod for all the
hypervisor classes.
Reviewed-by: iustinp
Guido Trotter [Tue, 27 Jan 2009 11:30:57 +0000 (11:30 +0000)]
KVM: classify _Instance{Monitor,Serial,KVMRuntime}
Those methods need nothing from the instantiated class, and just
manipulate strings, and fetch some class global variables, so they can
be classmethods.
Reviewed-by: iustinp
Iustin Pop [Mon, 26 Jan 2009 15:08:02 +0000 (15:08 +0000)]
Release 2.0 beta 1
Even though alpha started at 0, we release beta 1 first as we did for
1.2.
Reviewed-by: imsnah, ultrotter
Iustin Pop [Mon, 26 Jan 2009 12:34:59 +0000 (12:34 +0000)]
Update the NEWS documents for beta1
Also import the NEWS entries from the 1.2 branch which were added since
we created it.
Reviewed-by: ultrotter
Guido Trotter [Fri, 23 Jan 2009 17:02:26 +0000 (17:02 +0000)]
Xen and KVM: correct a typo when checking args
A 'be' was missing from the error string for both xen and kvm,
when the kernel or initrd path was not absolute.
Reviewed-by: imsnah
Iustin Pop [Fri, 23 Jan 2009 13:33:41 +0000 (13:33 +0000)]
Sort the instance names in batcher
In case we submit multiple instances via batcher, it's nicer to have them
sorted.
Reviewed-by: imsnah
Iustin Pop [Fri, 23 Jan 2009 13:33:32 +0000 (13:33 +0000)]
Fix batcher for 2.0-style disks and nics
This patch fixes the gnt-instance batch-create command, and in doing so
also slightly changes two other functions:
- we change utils.ParseUnit so that it accepts integer values also
(both ParseUnit(5) and ParseUnit("5") return the same value)
- a bridge 'None' in LUCreateInstance will be converted to the default
bridge; currently only missing bridges will be accepted to mean the
default one
The main changes to batcher were the change to variable number of disks
and NICs.
The patch also adds a batcher-instances.json example file copied from
the 1.2 branch and properly modified.
Reviewed-by: imsnah, killerfoxi
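Accepting both integers and strings, as described for utils.ParseUnit above, can be done by normalizing the input to a string before parsing. The sketch below is an assumption-laden illustration: parse_unit, its suffix table and its return unit (mebibytes) are hypothetical, and any rounding or error handling of the real function is not modelled.

```python
import re

# Optional suffix: m (mebibytes), g (gibibytes), t (tebibytes).
_UNIT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([mgt]?)\s*$", re.IGNORECASE)
_FACTORS = {"": 1, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(value):
    """Return a size in mebibytes; accepts 5 as well as "5" or "5g"."""
    # str() makes integer input follow the same path as string input.
    match = _UNIT_RE.match(str(value))
    if not match:
        raise ValueError("Unknown size format: %r" % (value,))
    number, suffix = match.groups()
    return int(float(number) * _FACTORS[suffix.lower()])
```

With this, parse_unit(5) and parse_unit("5") take the same code path and so return the same value.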
Iustin Pop [Fri, 23 Jan 2009 12:36:44 +0000 (12:36 +0000)]
Make iallocator work with offline nodes
This patch changes the iallocator framework to work with and properly
export to plugins offline nodes. It does this by only exporting the
static configuration data for those nodes, and not attempting to parse
the runtime data.
The patch also fixes bugs in iallocator related to the RpcResult
conversion, changes the should_run to admin_up attribute name (as per
the internals change), and adds “-I” as a short option for
“--iallocator” in gnt-instance, gnt-backup and burnin.
Reviewed-by: ultrotter
Iustin Pop [Fri, 23 Jan 2009 12:36:28 +0000 (12:36 +0000)]
Remove checking of DRBD metadata for validity
Currently the DRBD code checks that the metadata devices are valid
before creation, initial disk attachment and add children.
However, the process for checking validity requires a free DRBD minor,
and this conflicts with parallel checking.
There are at least three possible solutions:
- serialize all checks, which means we reduce parallelism and need
extra locks
- don't pass a valid minor number, but one like “/dev/drbd256” (which
is invalid); this works for current version of DRBD, but since it's
not guaranteed to remain so it doesn't look nice
- don't do the checking at all, and rely on “drbdsetup ... disk ...”
to fail by itself
The reason for checking metadata was that in 1.2, this was much cheaper
than trying to activate devices (and the subsequent iteration over the
minors). However, in 2.0, they have the same cost, so we can choose
option 3: just remove the explicit checking and rely on drbdsetup and
the kernel to fail.
Since DRBD8._InitMeta still requires a minor number, the two places
where this is run are handled as follows:
- Create: we just use our own (unused currently) minor number
- AddChildren: we keep using FindUnusedMinor, with the caveat that
this function (used by replace-disks -n ...) cannot be yet
parallelized
Reviewed-by: ultrotter
Iustin Pop [Fri, 23 Jan 2009 12:36:18 +0000 (12:36 +0000)]
Rework the execution model in burnin
This patch changes (significantly) the execution model in burnin:
- for all runs, (almost) all instance mods in a single Burn* procedure
are done as part of a job; so for example add disk, stop, remove
disk, start are no longer done as separate jobs but as a single job
consisting of four opcodes
- for parallel runs, all Burn* procedures except the rename (which
uses a single target name) run in parallel; before, only the
creation was done in parallel
- due to the single-job execution and also parallel execution, the
logging messages are no longer happening synchronously with the
execution, so they are more informative than an actual execution log
The end result is that burnin now tests properly multi-opcode jobs and
also tests all opcodes (except rename) for parallel execution.
Note: On a test cluster, parallelization reduces burnin time from 23m to
15m.
Reviewed-by: ultrotter
Iustin Pop [Fri, 23 Jan 2009 12:36:09 +0000 (12:36 +0000)]
Relax the restrictions on temporary DRBD minors
Currently the restrictions are too harsh: there is a time interval
between the moment an instance gets a new disk and the moment it is added
to the configuration in which the restriction is not met. We solve this by
allowing temporary DRBD minors to match existing minors (for the same
instance), such that parallel creations/minor allocations are OK.
The change is done by moving the add of temporary minors to the
minor map after the instance minors are computed, and only considering
them as duplicate if the instance name doesn't match.
Reviewed-by: ultrotter
Iustin Pop [Fri, 23 Jan 2009 12:36:00 +0000 (12:36 +0000)]
Introduce more configuration consistency checks
This patch enhances the duplicate DRBD minors checks (currently just a
few) and adds automatic checks of configuration consistency at
configuration file writing time.
In order to do so and show meaningful error messages, the
_UnlockedComputeDRBDMap function is changed to not raise errors in case
of duplicates, but instead return both the minors map and the duplicate
list, and its callers now raise the error. This allows the VerifyConfig
function to return a complete list of duplicates.
The new checks required some small updates to the unittests for the
config module.
Reviewed-by: ultrotter
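Returning the duplicates instead of raising on the first one, as described for _UnlockedComputeDRBDMap above, can be sketched as follows. The flat (instance, node, minor) input and both function names are hypothetical simplifications; the real configuration objects are richer.

```python
def compute_drbd_map(allocations):
    """Build {node: {minor: instance}} and collect the duplicates.

    'allocations' is a hypothetical list of (instance, node, minor) tuples.
    """
    minors = {}
    duplicates = []
    for instance, node, minor in allocations:
        node_map = minors.setdefault(node, {})
        if minor in node_map and node_map[minor] != instance:
            # Record the clash instead of raising, so that the caller
            # can report every duplicate at once.
            duplicates.append((node, minor, node_map[minor], instance))
        else:
            node_map[minor] = instance
    return minors, duplicates


def verify_config(allocations):
    """Return human-readable errors; an empty list means consistent."""
    _, duplicates = compute_drbd_map(allocations)
    return ["DRBD minor %d on node %s is used by both %s and %s" %
            (minor, node, first, second)
            for node, minor, first, second in duplicates]
```

A caller that wants the old behaviour can still raise when the duplicate list is non-empty, while a verification routine can print the complete list.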
Iustin Pop [Fri, 23 Jan 2009 10:15:55 +0000 (10:15 +0000)]
Fill the 'call' attribute of offline rpc results
When creating ‘fake’ results for offline nodes, we currently don't pass
the call attribute. This complicates debugging, so even though this
should not matter in practice, it's better to fix it.
Reviewed-by: imsnah
Iustin Pop [Fri, 23 Jan 2009 09:13:44 +0000 (09:13 +0000)]
A couple of small fixes to iallocator
This removes some constraints:
- only two disks supported, this is no longer true as the underlying
functions can now compute size for a variable number of disks
- error when the hypervisor was not being passed
- typo error
Reviewed-by: imsnah
Iustin Pop [Thu, 22 Jan 2009 16:39:40 +0000 (16:39 +0000)]
luxi: close and reopen the socket on errors
This is less of an actual issue for regular gnt-* clients, but it's
easily reproducible with burnin and possible with RAPI (depending on how
the program uses luxi.Client(s)).
In case of burnin, if we interrupt the client (^C) while it polls the
job, it will abort and raise an error. After that, burnin issues a
remove instance job, and at this point, we send the submit job (remove)
call but the first thing we read from the socket will be the response to
the previous poll job request, since that was queued already from the
master.
To solve this, whenever we detect an error in Transport.Call(), we close
that transport and re-create a new one, to start anew. The other
alternative would be to introduce a sequence to the protocol, but this
is something that would be design-level change and it's not recommended
at this stage.
Reviewed-by: imsnah
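The close-and-reopen approach described above can be sketched with a fake transport. Both classes are hypothetical illustrations and do not reproduce the luxi module's API; the point is only that a transport in an unknown state is discarded rather than reused.

```python
class FakeTransport:
    """Hypothetical stand-in for a socket-based transport."""

    def __init__(self):
        self.closed = False
        self.fail_next = False

    def call(self, request):
        if self.fail_next:
            raise IOError("interrupted while waiting for the reply")
        return "reply to " + request

    def close(self):
        self.closed = True


class Client:
    """Drops the transport after any error, so stale replies are never read."""

    def __init__(self, transport_factory=FakeTransport):
        self._factory = transport_factory
        self._transport = None

    def call(self, request):
        if self._transport is None:
            self._transport = self._factory()
        try:
            return self._transport.call(request)
        except Exception:
            # The connection state is unknown: discard it and start anew
            # on the next call instead of reading a leftover reply.
            self._transport.close()
            self._transport = None
            raise
```

The alternative mentioned above, a sequence number in the protocol, would let the client match replies to requests on the same connection, at the cost of a protocol change.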
Guido Trotter [Wed, 21 Jan 2009 18:23:59 +0000 (18:23 +0000)]
ShutdownInstance: log instance name, not object
When an instance fails to shut down we currently log its whole object,
rather than just the instance name.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 18:23:44 +0000 (18:23 +0000)]
KVM live migration: handle failure
If the KVM live migration ends up in a 'failed' state it has been
aborted at the kvm level, and the machine is still running locally.
We support also the 'cancelled' state even though there should be no way
of reaching it, without manual intervention.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 18:23:29 +0000 (18:23 +0000)]
KVM: change a few IOError with EnvironmentError
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 18:23:13 +0000 (18:23 +0000)]
KVM: instance migration
The tcp port used for migrating KVM instances is selectable at
./configure time. We use a single port as nodes are locked anyway during
a migration, so no two migrations can happen at the same time to the
same node.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 18:22:54 +0000 (18:22 +0000)]
KVM: add the _InstancePidAlive function
Throughout the kvm code we very often look for the instance pidfile
name, read it, and check if the process is alive. Abstract this into a
private function and use that one instead.
This patch also changes RebootInstance to check whether the instance is
alive before trying to reboot it.
Reviewed-by: iustinp
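The pidfile check that the patch above abstracts can be sketched as follows, assuming a POSIX system where sending signal 0 only tests for the existence of a process. All three function names and the returned tuple are hypothetical, not the actual _InstancePidAlive code.

```python
import errno
import os


def read_pid_file(path):
    """Return the PID stored in 'path', or 0 if it is missing or invalid."""
    try:
        with open(path, "r") as handle:
            return int(handle.read().strip())
    except (EnvironmentError, ValueError):
        return 0


def is_process_alive(pid):
    """Check for a live process by sending it signal 0 (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except OSError as err:
        # EPERM means the process exists but belongs to someone else.
        return err.errno == errno.EPERM
    return True


def instance_pid_alive(pidfile):
    """Return (pidfile, pid, alive) for one instance."""
    pid = read_pid_file(pidfile)
    return (pidfile, pid, is_process_alive(pid))
```

Callers such as a reboot routine can then test the 'alive' element before acting, instead of repeating the read-and-check sequence.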
Guido Trotter [Wed, 21 Jan 2009 18:22:35 +0000 (18:22 +0000)]
KVM: fix RebootInstance
RebootInstance was broken, because it just used to call StartInstance
with wrong parameters. With this patch we still stop the instance, but
use the saved kvm runtime to start it again.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 18:22:17 +0000 (18:22 +0000)]
KVM: retry the instance shutdown command
When we ask the instance to shutdown sometimes the command won't work,
especially if the instance isn't fully booted up. We'll wait for a bit,
and give it a few chances before giving up.
Reviewed-by: iustinp
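A retry loop of the kind described above might look like the sketch below. The function name, the number of attempts and the delay are assumptions, not values taken from the patch; the callables stand in for the real monitor command and liveness check.

```python
import time


def retry_shutdown(send_shutdown, is_alive, attempts=5, delay=1.0,
                   sleep=time.sleep):
    """Ask an instance to shut down a few times before giving up.

    Returns True once the instance is no longer alive, False otherwise.
    """
    for _ in range(attempts):
        if not is_alive():
            return True
        # The command may be ignored if the guest is not fully booted,
        # so send it again on every round.
        send_shutdown()
        sleep(delay)
    return not is_alive()
```

Passing 'sleep' as a parameter keeps the loop testable without real waiting.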
Guido Trotter [Wed, 21 Jan 2009 18:20:46 +0000 (18:20 +0000)]
Xen: implement auxiliary migration functions
These are used, for the xen hypervisor, to copy the xen config file to
the remote node. This breaks migration for instances which have been
migrated, but not restarted, with the old code, for which the config
file was just lost.
Reviewed-by: iustinp
Iustin Pop [Wed, 21 Jan 2009 14:15:39 +0000 (14:15 +0000)]
Automatically release DRBD minors on success
This patch converts the DRBD minors reservation protocol from explicit
release to automatic release on the success paths. On the error paths,
a manual release is still needed.
The patch doesn't bring much by itself, but is needed for a future patch
which enhances the automatic verification of configuration consistency.
Reviewed-by: ultrotter
Iustin Pop [Wed, 21 Jan 2009 14:12:35 +0000 (14:12 +0000)]
Fix some more pylint errors
Two are real errors (invalid names) and one is style error (overriding
name from outer scope).
Reviewed-by: ultrotter
Iustin Pop [Wed, 21 Jan 2009 10:48:23 +0000 (10:48 +0000)]
One more gitignore rule
This was forgotten in the recent “switch to explicit ignore rules”.
Reviewed-by: imsnah
Iustin Pop [Wed, 21 Jan 2009 10:48:15 +0000 (10:48 +0000)]
Log the rpc call name in the RPC errors message
Currently the rpc module logs the error description and target node in
rpc calls logging, as such:
2009-01-21 00:50:01,456: pid=1051/Thread-21 ERROR RPC error from node
node1.example.com: Connection failed (111: Connection
refused)
but this doesn't help to understand which call caused this (here it's an
offline node which should not be contacted at all).
This patch adds the logging of the call too, so cases like the above can
be debugged more easily.
Reviewed-by: imsnah, ultrotter
Iustin Pop [Wed, 21 Jan 2009 10:30:24 +0000 (10:30 +0000)]
Change the instance status attribute to boolean
Due to historic reasons, the “should run or not” attribute of an
instance was denoted by its “status” attribute having a string value of
either ‘up’ or ‘down’. Checking this in code was done via hardcoding
of the strings.
This was long due for a redo, and this patch changes this attribute to
“admin_up” having a boolean value. The patch is in fact shorter than I
expected, and passes burnin.
The patch also fixes an error in BuildInstanceHookEnvByObject where the
instance.os was passed as the status value.
Reviewed-by: ultrotter
Guido Trotter [Wed, 21 Jan 2009 10:03:01 +0000 (10:03 +0000)]
Implement the new live migration backend functions
MigrationInfo, AcceptInstance and AbortMigration are implemented as
hypervisor specific functions, and by default they do nothing (as
they're not always necessary).
This patch also converts hv_base.MigrateInstance docstring to epydoc,
adds a missing @type to the GetInstanceInfo docstring, and removes an
unneeded empty line.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 09:55:22 +0000 (09:55 +0000)]
KVM: save and remove the KVM runtime
At instance startup time we save the kvm runtime, and at stop time we
delete it. This patch also includes a function to load the kvm runtime,
which is unused yet.
Reviewed-by: iustinp
Guido Trotter [Wed, 21 Jan 2009 09:55:08 +0000 (09:55 +0000)]
KVM: split KVM runtime generation and startup
Before we used to generate the kvm command line and then just run it.
With this patch we split the generation from the time it is run,
allowing us to save it and replay it at reboot.
We must take special care about instance nics:
- We can't include them in the saved command line, as they point to
temporary files
- We can't just generate them at exec time, because we would apply
those changes, but not all the other ones, to a running instance,
thus making it inconsistent (for example if an instance had a memory
increased and one more nic, in a soft reboot we would add the nic, but
not the memory)
So we'll just save the instance nic data at the time the kvm runtime
data is generated, and transform it into actual parameters at execution
time.
Reviewed-by: iustinp
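The split between a saved runtime and execution-time NIC arguments, as described above, can be sketched as follows. Everything here is a hypothetical simplification: the function names, the JSON serialization and the dictionary layout are assumptions, and the command line is reduced to a memory option plus the NIC options.

```python
import json


def generate_runtime(memory_mb, nics):
    """Return (command, nic_data); NICs are kept as data, not arguments."""
    command = ["kvm", "-m", str(memory_mb)]
    nic_data = [{"mac": nic["mac"], "bridge": nic["bridge"]} for nic in nics]
    return command, nic_data


def save_runtime(runtime):
    """Serialize the runtime so it can be replayed at reboot."""
    return json.dumps(runtime)


def load_runtime(serialized):
    """Restore a (command, nic_data) pair from its serialized form."""
    command, nic_data = json.loads(serialized)
    return command, nic_data


def build_exec_args(runtime, make_script):
    """Expand the saved NIC data into arguments at execution time."""
    command, nic_data = runtime
    args = list(command)
    for nic in nic_data:
        # The script is a temporary file, valid only for this run, which
        # is why it cannot be part of the saved command line.
        script = make_script(nic)
        args += ["-net", "nic,macaddr=%s" % nic["mac"],
                 "-net", "tap,script=%s" % script]
    return args
```

Because the NIC data is frozen together with the rest of the runtime, a soft reboot replays one consistent snapshot instead of mixing old memory settings with new NICs.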
Guido Trotter [Wed, 21 Jan 2009 09:54:11 +0000 (09:54 +0000)]
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from the
source side. With this patch we also add the option of passing some
information to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included.
Reviewed-by: iustinp
Iustin Pop [Wed, 21 Jan 2009 08:33:01 +0000 (08:33 +0000)]
Update the objects.Disk formatting method
With the addition of minors, this needs to show them too.
Reviewed-by: ultrotter
Guido Trotter [Tue, 20 Jan 2009 18:12:21 +0000 (18:12 +0000)]
KVM: add a _CONF_DIR
Currently we keep pid files and control files. In the conf dir we'll
also keep the data to start the instance anew, and the network
interface scripts. These will then be copied to a separate area (since
_CONF_DIR could be mounted 'noexec') and used to start the instance.
This patch also adds comments to state what the various directories are
used for.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:12:01 +0000 (18:12 +0000)]
KVM: Remove sockets after shutdown
Abstract the monitor and serial socket naming in two functions, and
reuse them to cleanup the files after shutdown.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:11:48 +0000 (18:11 +0000)]
KVM: fix class docstring
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:11:35 +0000 (18:11 +0000)]
Xen: use epydoc in MigrateInstance docstring
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:42 +0000 (17:50 +0000)]
ShutdownInstance: report hypervisor error
When StopInstance raises a HypervisorError, report it in the logged
message to ease with debugging.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:27 +0000 (17:50 +0000)]
ConfigObject docstring, close an open parenthesis
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:08 +0000 (17:50 +0000)]
Fix a typo in luxi's docstring
Reviewed-by: iustinp
Iustin Pop [Tue, 20 Jan 2009 17:19:58 +0000 (17:19 +0000)]
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) the
logical execution thread of jobs. This is due to the fact that we don't
log the thread name (so we lose the association of log messages to jobs)
and we don't log the start/stop of job and opcode execution.
This patch adds a new parameter to utils.SetupLogging that enables
thread name logging, and promotes some log entries from debug to info.
With this applied, it's easier to understand which log messages relate
to which jobs/opcodes.
The patch also moves the "INFO client closed connection" entry to debug
level, since it's not a very informative log entry.
Reviewed-by: ultrotter
Michael Hanselmann [Tue, 20 Jan 2009 16:47:25 +0000 (16:47 +0000)]
.gitignore: Don't exclude whole /autotools/ dir, but only files
This way newly added files will not be excluded by default. Fixes
also a small whitespace error in utils.py.
Reviewed-by: iustinp
Iustin Pop [Tue, 20 Jan 2009 16:26:57 +0000 (16:26 +0000)]
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 16:26:46 +0000 (16:26 +0000)]
Update gitignore rules
As per Michael's comment, gitignore should not ignore a couple of real
files from the autotools/ directory.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:35 +0000 (14:20 +0000)]
Fix adding of disks to an instance
The ConfigWriter.AllocateDRBDMinor requires the instance name, not the
instance object. LUSetInstanceParams is wrongly passing the instance
object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:24 +0000 (14:20 +0000)]
Fix burnin problems when using http checks
The urllib2 module has very bad error handling. This patch changes to urllib
which is simpler, and we derive a custom class from the FancyURLopener. Burnin
is no longer keeping sockets in CLOSE_WAIT state with this patch.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:15 +0000 (14:20 +0000)]
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster
verify: minors which belong to running instances and should be online
but are not, and minors which do not belong to any instance but are in
use.
The patch requires exposing some methods from bdev.DRBD8 and
config.ConfigWriter which were until now private methods.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:03 +0000 (14:20 +0000)]
Fix a couple of epydoc warnings
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 11:18:31 +0000 (11:18 +0000)]
DRBD: check for in-use minor during Create
In order to prevent errors with old, in-use DRBD minors, we check and
abort at create time if our minor is already in use. For this we need to
also modify DRBD8Status to be able to parse cs:Unconfigured devices.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 11:18:20 +0000 (11:18 +0000)]
Add a TailFile function
This patch adds a tail file function, to be used for parsing and returning in
the job log OS installation failures.
Reviewed-by: ultrotter
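A tail function like the one described above can be sketched with a bounded deque. This is a hypothetical illustration; the name tail_file, its signature and the default of 20 lines are assumptions and need not match the TailFile function added by the patch.

```python
import collections


def tail_file(path, lines=20):
    """Return the last 'lines' lines of a text file, without newlines."""
    with open(path, "r") as handle:
        # A bounded deque keeps only the most recent 'lines' entries,
        # so the whole file is never held in memory at once.
        last = collections.deque(handle, maxlen=lines)
    return [line.rstrip("\n") for line in last]
```

The returned list can be appended to a job log entry so that the end of an OS installation log is visible to the user.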
Iustin Pop [Tue, 20 Jan 2009 11:18:10 +0000 (11:18 +0000)]
Unify some unittest functions
This patch adds unified temporary file handling to the
testutils.GanetiTestCase class, which adds easy creation and automated
cleanup of temporary files.
The patch allows a simpler handling in a couple of test cases but
requires all child classes to call the parent setUp and tearDown
methods.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 10:12:00 +0000 (10:12 +0000)]
Some small fixes in cmdlib
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 10:11:48 +0000 (10:11 +0000)]
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully)
relevant log files from the OS create scripts.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 10:11:36 +0000 (10:11 +0000)]
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the
job log (and thus to the user).
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 17:22:32 +0000 (17:22 +0000)]
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style
result, but _CreateBlockDevOnSecondary was forgotten. After the merger
of the two functions, _CreateBlockDevOnSecondary was taken as template
so we checked against old-style values, thus completely breaking error
handling.
Reviewed-by: imsnah
Iustin Pop [Mon, 19 Jan 2009 14:35:03 +0000 (14:35 +0000)]
Move the default MAC prefix to the constants file
Instead of having the default live in the gnt-cluster script, we move it
to the constants file. The patch also fixes a typo in constants.py.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 14:33:13 +0000 (14:33 +0000)]
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] +
list(instance.secondary_nodes) (or similar usage) with the new
instance.all_nodes.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 14:32:21 +0000 (14:32 +0000)]
Fix non-drbd instance creation
Commit 2294 introduced a new instance.all_nodes property, which
unfortunately is working incorrectly for non-drbd instances.
This patch fixes it by making sure the primary node is always added to
the set, even before recursing over (any potential) children.
Reviewed-by: imsnah
Iustin Pop [Mon, 19 Jan 2009 11:10:52 +0000 (11:10 +0000)]
Small simplification in MapLVsByNode
We don't need to pre-create the node entries in lvmap, since they will
be created at recursion time.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:42 +0000 (11:10 +0000)]
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all.
The replace secondary first creates (manually) new LVs to ensure storage
is there, and then it creates the new DRBD. At this point, we need a
non-recursive call so that the LVs are not needlessly re-created.
This patch splits the single device creation into a separate function,
so that LUReplaceDisks can use it.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:29 +0000 (11:10 +0000)]
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we
combine them so as to have less code duplication. This will be needed in
the future as we will need to split the recursive part off.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:19 +0000 (11:10 +0000)]
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node
daemon logs.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:10 +0000 (11:10 +0000)]
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to
the job log, just having True/False return from the disk creation
function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)
to raise an exception on errors, and otherwise the return value is None.
Reviewed-by: ultrotter