Guido Trotter [Tue, 20 Jan 2009 18:12:21 +0000 (18:12 +0000)]
KVM: add a _CONF_DIR
Currently we keep pid files and control files. In the conf dir we'll
also keep the data to start the instance anew, and the network
interface scripts. These will then be copied to a separate area (since
_CONF_DIR could be mounted 'noexec') and used to start the instance.
This patch also adds comments to state what the various directories are
used for.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:12:01 +0000 (18:12 +0000)]
KVM: Remove sockets after shutdown
Abstract the monitor and serial socket naming in two functions, and
reuse them to cleanup the files after shutdown.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:11:48 +0000 (18:11 +0000)]
KVM: fix class docstring
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 18:11:35 +0000 (18:11 +0000)]
Xen: use epydoc in MigrateInstance docstring
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:42 +0000 (17:50 +0000)]
ShutdownInstance: report hypervisor error
When StopInstance raises a HypervisorError, report it in the logged
message to ease debugging.
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:27 +0000 (17:50 +0000)]
ConfigObject docstring, close an open parenthesis
Reviewed-by: iustinp
Guido Trotter [Tue, 20 Jan 2009 17:50:08 +0000 (17:50 +0000)]
Fix a typo in luxi's docstring
Reviewed-by: iustinp
Iustin Pop [Tue, 20 Jan 2009 17:19:58 +0000 (17:19 +0000)]
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) the
logical execution thread of jobs. This is due to the fact that we don't
log the thread name (so we lose the association of log messages to jobs)
and we don't log the start/stop of job and opcode execution.
This patch adds a new parameter to utils.SetupLogging that enables
thread name logging, and promotes some log entries from debug to info.
With this applied, it's easier to understand which log messages relate
to which jobs/opcodes.
The patch also moves the "INFO client closed connection" entry to debug
level, since it's not a very informative log entry.
Reviewed-by: ultrotter
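A minimal sketch of the thread-name logging idea described above, using only
the standard logging module. The function name setup_logging, its
thread_names parameter and the "jobdemo" logger are illustrative assumptions,
not the actual utils.SetupLogging signature:

```python
import io
import logging
import threading


def setup_logging(stream, thread_names=False):
    """Configure a demo logger; optionally prefix records with the thread name."""
    fmt = "%(levelname)s %(message)s"
    if thread_names:
        # %(threadName)s ties each message to the worker that emitted it.
        fmt = "%(threadName)s " + fmt
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    logger = logging.getLogger("jobdemo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def run_job(logger, job_id):
    """Log the start and end of a (fake) job at INFO level."""
    logger.info("starting job %s", job_id)
    logger.info("finished job %s", job_id)
```

With thread names enabled, every message emitted by a worker thread carries
that worker's name, so interleaved job output can be separated again.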
Michael Hanselmann [Tue, 20 Jan 2009 16:47:25 +0000 (16:47 +0000)]
.gitignore: Don't exclude whole /autotools/ dir, but only files
This way newly added files will not be excluded by default. Also fixes
a small whitespace error in utils.py.
Reviewed-by: iustinp
Iustin Pop [Tue, 20 Jan 2009 16:26:57 +0000 (16:26 +0000)]
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 16:26:46 +0000 (16:26 +0000)]
Update gitignore rules
As per Michael's comment, gitignore should not ignore a couple of real
files from the autotools/ directory.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:35 +0000 (14:20 +0000)]
Fix adding of disks to an instance
ConfigWriter.AllocateDRBDMinor requires the instance name, not the
instance object. LUSetInstanceParms wrongly passes the instance
object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:24 +0000 (14:20 +0000)]
Fix burnin problems when using http checks
The urllib2 module has very bad error handling. This patch changes to urllib
which is simpler, and we derive a custom class from the FancyURLopener. Burnin
no longer keeps sockets in CLOSE_WAIT state with this patch.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:15 +0000 (14:20 +0000)]
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster
verify: minors which belong to running instances and should be online
but are not, and minors which do not belong to any instance but are in
use.
The patch requires exposing some methods from bdev.DRBD8 and
config.ConfigWriter which were until now private methods.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 14:20:03 +0000 (14:20 +0000)]
Fix a couple of epydoc warnings
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 11:18:31 +0000 (11:18 +0000)]
DRBD: check for in-use minor during Create
In order to prevent errors with old, in-use DRBD minors, we check and
abort at create time if our minor is already in use. For this we need to
also modify DRBD8Status to be able to parse cs:Unconfigured devices.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 11:18:20 +0000 (11:18 +0000)]
Add a TailFile function
This patch adds a tail file function, to be used for parsing and returning in
the job log OS installation failures.
Reviewed-by: ultrotter
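A tail function of the kind described above could look roughly like this.
The name tail_file, the 4096-byte read window and the default of 20 lines
are assumptions made for this sketch, not details taken from the patch:

```python
import os


def tail_file(path, lines=20):
    """Return up to the last `lines` lines of the file at `path`."""
    if lines <= 0:
        return []
    with open(path, "rb") as fd:
        fd.seek(0, os.SEEK_END)
        size = fd.tell()
        # Read only a bounded chunk from the end, so huge logs stay cheap.
        fd.seek(max(0, size - 4096))
        data = fd.read().decode("utf-8", "replace")
    return data.splitlines()[-lines:]
```

Because only the final chunk is read, the first returned line may be
truncated when the file is larger than the read window.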
Iustin Pop [Tue, 20 Jan 2009 11:18:10 +0000 (11:18 +0000)]
Unify some unittest functions
This patch adds unified temporary file handling to the
testutils.GanetiTestCase class, which adds easy creation and automated
cleanup of temporary files.
The patch allows a simpler handling in a couple of test cases but
requires all child classes to call the parent setUp and tearDown
methods.
Reviewed-by: ultrotter
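The pattern described above (a base test class that tracks temporary files,
with children required to call the parent setUp) can be sketched as follows.
TempFileTestCase and make_temp_file are hypothetical names; the real
testutils.GanetiTestCase API may differ:

```python
import os
import tempfile
import unittest


class TempFileTestCase(unittest.TestCase):
    """Base class that tracks temporary files and removes them in tearDown."""

    def setUp(self):
        self._temp_files = []

    def tearDown(self):
        while self._temp_files:
            path = self._temp_files.pop()
            if os.path.exists(path):
                os.remove(path)

    def make_temp_file(self):
        """Create an empty temporary file and register it for cleanup."""
        fd, path = tempfile.mkstemp()
        os.close(fd)
        self._temp_files.append(path)
        return path


class ExampleTest(TempFileTestCase):
    def setUp(self):
        # Child classes must call the parent setUp, otherwise the
        # tracking list is never created.
        TempFileTestCase.setUp(self)
        self.path = self.make_temp_file()

    def test_file_exists(self):
        self.assertTrue(os.path.isfile(self.path))
```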
Iustin Pop [Tue, 20 Jan 2009 10:12:00 +0000 (10:12 +0000)]
Some small fixes in cmdlib
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 10:11:48 +0000 (10:11 +0000)]
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully)
relevant log files from the OS create scripts.
Reviewed-by: ultrotter
Iustin Pop [Tue, 20 Jan 2009 10:11:36 +0000 (10:11 +0000)]
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the
job log (and thus to the user).
Reviewed-by: ultrotter
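The (status, data) convention used by this and the neighbouring conversions
can be illustrated with a small sketch. The function start_instance and the
hypervisor_start callable are hypothetical stand-ins, not the real backend
code:

```python
def start_instance(name, hypervisor_start):
    """Run `hypervisor_start(name)` and return a (status, data) tuple.

    On failure, `data` carries the reason, so the caller can record it
    in the job log instead of only seeing a bare False.
    """
    try:
        hypervisor_start(name)
    except RuntimeError as err:
        return (False, "Could not start instance %s: %s" % (name, err))
    return (True, "Instance %s started" % name)
```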
Iustin Pop [Mon, 19 Jan 2009 17:22:32 +0000 (17:22 +0000)]
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style
result, but _CreateBlockDevOnSecondary was forgotten. After the merger
of the two functions, _CreateBlockDevOnSecondary was taken as template
so we checked against old-style values, thus completely breaking error
handling.
Reviewed-by: imsnah
Iustin Pop [Mon, 19 Jan 2009 14:35:03 +0000 (14:35 +0000)]
Move the default MAC prefix to the constants file
Instead of having the default live in the gnt-cluster script, we move it
to the constants file. The patch also fixes a typo on constants.py.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 14:33:13 +0000 (14:33 +0000)]
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] +
list(instance.secondary_nodes) (or similar usage) with the new
instance.all_nodes.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 14:32:21 +0000 (14:32 +0000)]
Fix non-drbd instance creation
Commit 2294 introduced a new instance.all_nodes property, which
unfortunately is working incorrectly for non-drbd instances.
This patch fixes it by making sure the primary node is always added to
the set, even before recursing over (any potential) children.
Reviewed-by: imsnah
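A simplified sketch of the fixed behaviour: the primary node goes into the
set first, and only then are the disks (and their children) visited. The
Disk and Instance classes here are stripped-down assumptions, not the real
objects module:

```python
class Disk:
    """Toy disk: `nodes` is non-empty only for DRBD-like disks."""

    def __init__(self, nodes=(), children=()):
        self.nodes = tuple(nodes)
        self.children = list(children)


class Instance:
    def __init__(self, primary_node, disks):
        self.primary_node = primary_node
        self.disks = list(disks)

    @property
    def all_nodes(self):
        # Always start with the primary node, so non-DRBD instances
        # (whose disks name no nodes) still get a non-empty result.
        nodes = {self.primary_node}

        def collect(disk):
            nodes.update(disk.nodes)
            for child in disk.children:
                collect(child)

        for disk in self.disks:
            collect(disk)
        return tuple(sorted(nodes))

    @property
    def secondary_nodes(self):
        # Derived from all_nodes rather than computed separately.
        return tuple(n for n in self.all_nodes if n != self.primary_node)
```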
Iustin Pop [Mon, 19 Jan 2009 11:10:52 +0000 (11:10 +0000)]
Small simplification in MapLVsByNode
We don't need to pre-create the node entries in lvmap, since they will
be created at recursion time.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:42 +0000 (11:10 +0000)]
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all.
The replace secondary first creates (manually) new LVs to ensure storage
is there, and then it creates the new DRBD. At this point, we need a
non-recursive call so that the LVs are not needlessly re-created.
This patch splits the single device creation into a separate function,
so that LUReplaceDisks can use it.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:29 +0000 (11:10 +0000)]
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we
combine them so as to have less code duplication. This will be needed in
the future as we will need to split off the recursive part.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:19 +0000 (11:10 +0000)]
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node
daemon logs.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:10 +0000 (11:10 +0000)]
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to
the job log, just having True/False return from the disk creation
function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)
to raise exception on errors, and otherwise the return value is None.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 11:10:01 +0000 (11:10 +0000)]
Block device creation cleanup
Currently when creating LVM-based instances, we always get the
extremely confusing message "ERROR Can't find LV /dev/xenvg/..." which
is actually expected. This behaviour was introduced before we had
UUID-style LV names, since at that point it was not unexpected to have
such volumes lying around after a failed creation.
Today, it's much more of an error to see existing volumes, and it's
better to abort with a failure. Since bdev.LogicalVolume.Create() method
will raise an error in case it exists, we can remove this check in
backend before creating the device.
The Create methods for DRBD and FileStorage currently don't raise
exception, as behaviour is not very well defined here.
We also change some exception types raised in bdev so that all
exceptions raised by device creation are a subclass of GenericError.
Reviewed-by: ultrotter
Iustin Pop [Mon, 19 Jan 2009 10:43:11 +0000 (10:43 +0000)]
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of a
DRBD disk. This is confusing as it's hard to associate the two in the
output of “lvs” or “gnt-node volumes”.
The patch changes this so that they use the same prefix.
Reviewed-by: ultrotter
Iustin Pop [Fri, 16 Jan 2009 16:24:26 +0000 (16:24 +0000)]
Fix LUExportInstance
Due to deficiencies in our block device implementation, it is a must to
call SetDiskID on disks before passing them to remote nodes. Since in
export/import, we don't touch the disks themselves, this was not needed
before in this function.
However, since the introduction of instance symlinks, the correct ID is
needed here too, and with static minors it is mandatory. Without it,
instance starts fail after migration and/or failover.
Reviewed-by: ultrotter
Iustin Pop [Fri, 16 Jan 2009 13:09:43 +0000 (13:09 +0000)]
burnin: only call self.GrowDisks() if needed
In case we pass --disk-grow 0[,0..] then we should not call GrowDisks as it
prints confusing log lines.
Reviewed-by: imsnah
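The check described above amounts to skipping the grow step when every
requested amount is zero. A minimal sketch, with hypothetical helper names
(the real burnin option parsing is not shown in this log):

```python
def parse_disk_growth(option_value):
    """Parse a comma-separated --disk-grow style value into a list of ints."""
    return [int(part) for part in option_value.split(",")]


def should_grow_disks(growth):
    """Return True only if at least one disk has a non-zero growth amount."""
    return any(amount > 0 for amount in growth)
```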
Iustin Pop [Fri, 16 Jan 2009 11:02:57 +0000 (11:02 +0000)]
Instance: add a new all_nodes property
Since we often need the list of all nodes of an instance, we add a new
"all_nodes" property that returns all nodes of the instance, and we
switch secondary_nodes to a simpler implementation based on this new
function.
Reviewed-by: ultrotter
Iustin Pop [Fri, 16 Jan 2009 10:43:09 +0000 (10:43 +0000)]
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the short
one.
Reviewed-by: imsnah
Iustin Pop [Fri, 16 Jan 2009 10:41:17 +0000 (10:41 +0000)]
burnin: add option to not remove instances
This patch adds a burnin option to keep instances at the end, so that
debugging after a burnin failure is easier.
Also, we reorder the command line parsing and client query so that one
can use ./tools/burnin --help even on non-ganeti machines.
Reviewed-by: ultrotter
Iustin Pop [Thu, 15 Jan 2009 10:00:23 +0000 (10:00 +0000)]
Some docstring updates
This patch rewraps some comments to shorter lengths, changes
double-quotes to single-quotes inside triple-quoted docstrings for
better editor handling.
It also fixes some epydoc errors, namely invalid crossreferences (after
method rename), documentation for nonexistent (removed) parameters, etc.
Reviewed-by: ultrotter
Iustin Pop [Thu, 15 Jan 2009 10:00:12 +0000 (10:00 +0000)]
ganeti-noded: reduce log noise
The source port/addr is currently logged three times for each
connection, and this is unnecessary. We change two log entries to debug,
since they are useful for precise timing, and we keep only one at INFO
level.
Reviewed-by: imsnah
Iustin Pop [Wed, 14 Jan 2009 09:57:06 +0000 (09:57 +0000)]
burnin: update migration to latest log formatting
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:21:35 +0000 (15:21 +0000)]
Forward port of the burnin migration
This is again a copy of the latest 1.2 burnin code related to migration.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:21:22 +0000 (15:21 +0000)]
Forward port the live migration from 1.2 branch
This is a forward port via copy (and not individual patches cherry-pick)
of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the
IdentifyDisks step anymore (the drbd rpc calls are independent now), and
the rpc module improvements.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:20:56 +0000 (15:20 +0000)]
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and therefore
call_blockdev_find) is not modifying the devices anymore, we need to
switch this LU to the new call_drbd_disconnect_net and
call_drbd_attach_net functions.
Due to the authentication needed in 2.0, we need to be more careful with
the activation order. In 1.2, we have the case that the new node was
directly activated with networking information, and could connect to the
primary while it was still connected or WFConnect to the old secondary.
In the new scheme, we:
- create the new drbd in StandAlone mode
- shutdown old secondary (primary becomes WFConnection)
- disconnect primary (and thus it goes into StandAlone)
- connect both primary and new secondary to network using the
call_drbd_attach_net rpc
This should be safer, and is cleaner. This passes burnin.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:20:44 +0000 (15:20 +0000)]
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and its associated
RPCs. In Ganeti 2.0, these functions will be used for two things:
- live migration (as in 1.2)
- and for other network reconfiguration tasks, since DRBD8.Attach()
doesn't do them anymore
Because of the Attach() changes, we can now implement the
AttachNet/DisconnectNet functions as independent entities, and we don't
need the cache anymore.
Note these functions are copies of the latest 1.2 code, and not
cherry-picks of the (many) patches that went into 1.2.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:20:25 +0000 (15:20 +0000)]
backend: rename AttachOrAssemble to Assemble
Since now the Assemble function is different than Attach, we rename this
backend function to show that the intent is to fully assemble the device
(and it's always allowed to modify the device).
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 15:20:12 +0000 (15:20 +0000)]
drbd: change the semantics of Attach vs. Assemble
Currently, both the Attach and Assemble methods for DRBD8 devices will use and
alter the device state. This is suboptimal, and it has been worked
around in 1.2 via a special cache in the node daemon so that we don't
need to call Attach() again in migration, for example.
Since in 2.0 we have static minors, we can change these functions so
that:
- Attach() does not affect the device in any way, and only checks if
the minor is already in use or not
- Assemble() has two logic paths, one for startup from unused minor
(the old Assemble, now renamed _FastAssemble) and one for
re-checking/fixing an in-use minor (the old Attach, now renamed
_SlowAttach)
Basically Attach was renamed to _SlowAttach, Assemble to _FastAssemble,
and we have a new, simple Assemble that calls one or the other based on
the result of the new Attach.
The LUReplaceDisks (with new secondary) is relying on the special
semantics of Attach modifying the device and is broken until the end of
the patch series.
Reviewed-by: ultrotter
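The new split between Attach and Assemble can be sketched as below. This is
a toy model only: the class DrbdDeviceSketch, the shared set standing in for
kernel state and the path_taken attribute are assumptions for illustration,
and the bodies of the two private methods are placeholders:

```python
class DrbdDeviceSketch:
    def __init__(self, minor, in_use_minors):
        self.minor = minor
        self.in_use_minors = in_use_minors  # shared set standing in for kernel state
        self.path_taken = None

    def attach(self):
        """Only report whether the minor is already in use; change nothing."""
        return self.minor in self.in_use_minors

    def _fast_assemble(self):
        # Startup from an unused minor (the old Assemble).
        self.in_use_minors.add(self.minor)
        self.path_taken = "fast_assemble"

    def _slow_attach(self):
        # Re-check/fix an in-use minor (the old Attach).
        self.path_taken = "slow_attach"

    def assemble(self):
        """Pick one of the two paths based on the side-effect-free attach()."""
        if self.attach():
            self._slow_attach()
        else:
            self._fast_assemble()
```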
Iustin Pop [Tue, 13 Jan 2009 15:20:00 +0000 (15:20 +0000)]
bdev: Do not call Assemble() on children
The caller of dev.Assemble() (backend._RecursiveAssembleBD) is doing an
explicit recursion over all the children of the device, with better
error reporting. As such, we don't need this repeated assembly inside
the base BlockDev class.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 14:43:12 +0000 (14:43 +0000)]
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviewed-by: imsnah
Iustin Pop [Tue, 13 Jan 2009 14:14:18 +0000 (14:14 +0000)]
burnin: redo the output formatting
Since we added many more tests in burnin, the output became almost
unreadable. This patch changes the output to an indented one, so that
the different phases and operations of burnin are more easily
understood.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 13:25:58 +0000 (13:25 +0000)]
burnin: move start_stop at the end
Traditionally the start/stop test was the last, so move it back to there
(added as last option in commit 854).
Reviewed-by: amishchenko
Iustin Pop [Tue, 13 Jan 2009 13:25:48 +0000 (13:25 +0000)]
QA: add burnin parameters (parallel, http-check)
This patch adds burnin parameters for --parallel and --http-check
options to the burnin script.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 13:16:41 +0000 (13:16 +0000)]
Increase resync speed to 60MB/s
This is a forward-port of commit 2219 on the 1.2 branch.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 Jan 2009 13:03:44 +0000 (13:03 +0000)]
burnin: introduce instance alive checks
This patch adds instance alive checks after most start operations. The
check is done in a custom way:
- the instance is expected to have an http server up and running
- and it should serve the '/hostname.txt' resource containing the
hostname of the instance
This allows checking that:
- creation is working OK
- start after failover (and in the future migrate) is ok
- rename works correctly
By default, the check is disabled since one needs a custom OS for this
check.
The patch also fixes a wrong variable name from a previous burnin patch.
Reviewed-by: ultrotter
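The alive check described above can be sketched as a fetch of
'/hostname.txt' followed by a comparison with the expected hostname. The
function names, the injectable fetch parameter and the timeout value are
assumptions for this sketch; the burnin code itself is not reproduced here:

```python
import urllib.request


def fetch_url(url, timeout=5):
    """Fetch `url` and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")


def instance_is_alive(hostname, fetch=fetch_url):
    """Return True if the instance serves its own hostname at /hostname.txt."""
    try:
        body = fetch("http://%s/hostname.txt" % hostname)
    except OSError:
        # Connection refused, timeouts and HTTP errors all count as "not alive".
        return False
    return body.strip() == hostname
```

Passing the fetch function as a parameter keeps the comparison logic
testable without a running instance.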
Iustin Pop [Tue, 13 Jan 2009 08:04:40 +0000 (08:04 +0000)]
Small typo in ganeti-watcher
Reviewed-by: imsnah
Iustin Pop [Mon, 12 Jan 2009 16:06:14 +0000 (16:06 +0000)]
Skip offline nodes in gnt-cluster commands
This patch makes gnt-cluster copyfile and command skip the offline
nodes.
Reviewed-by: ultrotter, imsnah
Iustin Pop [Mon, 12 Jan 2009 13:25:27 +0000 (13:25 +0000)]
burnin: Add tests for add/remove disks and NICs
This patch adds testing of add/remove disks and NICs to the burnin.
Reviewed-by: imsnah
Iustin Pop [Mon, 12 Jan 2009 12:42:22 +0000 (12:42 +0000)]
Heavy redo of gnt-instance info output
In 2.0, we have more parameters in drbd's logical_id, and passing the
results over json makes them unicode which looks worse with the default
formatting. As such, a redo of the output is needed.
This patch:
- adds a separate function to format the logical_id of devices
- moves the actual indentation format out of _FormatBlockDevInfo,
which now just generates a list of items
- adds a function _FormatList that recursively formats the list
- formats specially key,value tuples
The result is that the output is nicer, and the code in
_FormatBlockDevInfo somewhat cleaner (as it doesn't deal with spacing
and such issues).
Reviewed-by: ultrotter
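A recursive list formatter in the spirit of the one described above might
look like this. The function name format_list, the two-space indentation
step and the exact treatment of (key, value) tuples are assumptions; the
real _FormatList output may differ:

```python
def format_list(items, indent=0, step=2):
    """Turn a nested list into indented text lines.

    (key, value) tuples are rendered as "key: value"; when the value is
    itself a list, it is rendered as an indented block under "key:".
    """
    lines = []
    pad = " " * indent
    for item in items:
        if isinstance(item, tuple) and len(item) == 2:
            key, value = item
            if isinstance(value, list):
                lines.append("%s%s:" % (pad, key))
                lines.extend(format_list(value, indent + step, step))
            else:
                lines.append("%s%s: %s" % (pad, key, value))
        elif isinstance(item, list):
            lines.extend(format_list(item, indent + step, step))
        else:
            lines.append("%s%s" % (pad, item))
    return lines
```

The producer of the data only builds nested lists and tuples; all spacing
decisions live in this one function.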
Iustin Pop [Mon, 12 Jan 2009 12:42:13 +0000 (12:42 +0000)]
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches):
- we don't correctly check the result type
- rename a variable to prevent a conflict
Reviewed-by: imsnah
Iustin Pop [Mon, 12 Jan 2009 10:27:02 +0000 (10:27 +0000)]
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData is broken
since commit 1642.
Reviewed-by: imsnah
Iustin Pop [Mon, 12 Jan 2009 09:14:50 +0000 (09:14 +0000)]
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed,
for example after an upgrade to update the ssconf_release_version file.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 16:24:05 +0000 (16:24 +0000)]
Add a new ssconf file with the ganeti version
The patch adds a new ssconf file containing the ganeti version.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 15:34:25 +0000 (15:34 +0000)]
Work around a DRBD sync speed race condition
This is a modified forward-port of commit 1544 on the 1.2 branch:
When DRBD is doing its dance to establish a connection with its
peer, it also sends the synchronization speed over the wire. In
some cases setting the sync speed only after setting up both
sides can race with DRBD connecting, hence we set it here before
telling DRBD anything about its peer.
Reviewed-by: iustinp
The modification we make is that we split SetSyncSpeed in two so that we
don't need to modify our minor temporarily, and the fact that we call
this function from within _AssembleNet (right before enabling network),
instead of Assemble()/Attach().
Original-Author: imsnah
Iustin Pop [Fri, 9 Jan 2009 14:58:29 +0000 (14:58 +0000)]
burnin: Add activate/deactivate disks
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 14:58:19 +0000 (14:58 +0000)]
burnin: use the new replace_disks constants
This patch updates burnin to the latest replace disks constant, and
changes the constants' values to be more accurate.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 14:26:53 +0000 (14:26 +0000)]
burnin: do not use offline nodes
This patch makes burnin skip the offline nodes in its built-in node
selection. It also removes an extra line.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 14:26:37 +0000 (14:26 +0000)]
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds a utility
function to ConfigWriter that returns the names of online nodes and uses
it in LUDiagnoseOS to query only the good nodes.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 12:52:28 +0000 (12:52 +0000)]
Silence warning on node list for offline nodes
The warning in node list is meant for nodes that return wrong
information, but for offline nodes this case is normal.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 12:52:17 +0000 (12:52 +0000)]
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard to
work around:
- logging can start logging before we fork (e.g. if we need to emit
messages related to master checking), and thus use FDs which we
can't track nicely
- the queue locks the queue file, and again this fd needs to be kept
open which is hard from the main loop (and this error is currently
hidden by the fact that we don't log it)
Given the above, it's much simpler, in case we will fork later, to close
file descriptors right at the beginning of the program, and in Daemonize
only close/reopen the stdin/out/err fds.
In addition, we also close() the handlers we remove in SetupLogging so
that the cleanup is more thorough.
Reviewed-by: imsnah
Iustin Pop [Fri, 9 Jan 2009 12:22:51 +0000 (12:22 +0000)]
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replace
modes, and the new drbd8 modes were forced into this syntax, with some
complicated rules of transition from one mode to another (if REPLACE_ALL
but not new node passed, switch to REPLACE_SEC, etc.).
This patch cleans up this situation by making a clear separation between
the two main modes:
- replace on current nodes (with the two sub-cases on primary and on
secondary)
- change to a new node (either via manually specified node or via
iallocator)
Reviewed-by: imsnah
Iustin Pop [Thu, 8 Jan 2009 16:39:47 +0000 (16:39 +0000)]
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST and
NV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
Reviewed-by: imsnah
Iustin Pop [Thu, 8 Jan 2009 16:05:30 +0000 (16:05 +0000)]
rpc: Add a method for easy check of remote results
The patch adds a new method to the rpc.RpcResult class called
"RemoteFailMsg" which is useful for the RPC calls which return a
(status, payload) style result.
Reviewed-by: imsnah
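A helper of this kind could be sketched as follows. RpcResultSketch and
remote_fail_msg are illustrative names, and the message texts are invented
for this example; this is not the actual rpc.RpcResult implementation:

```python
class RpcResultSketch(object):
    """Toy RPC result holding either a transport failure or a (status, payload)."""

    def __init__(self, data=None, failed=False):
        self.data = data
        self.failed = failed

    def remote_fail_msg(self):
        """Return an error message string, or None if the call succeeded."""
        if self.failed:
            return "RPC call failed"
        if not isinstance(self.data, (tuple, list)) or len(self.data) != 2:
            return "Invalid result type: %r" % (self.data,)
        status, payload = self.data
        if status:
            return None
        return payload if payload else "No error information"
```

Callers can then write a single check (message is None or not) instead of
unpacking and validating the tuple at every call site.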
Iustin Pop [Thu, 8 Jan 2009 14:16:44 +0000 (14:16 +0000)]
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that
it has been started with symlinks. We currently have no on-secondary
checks, nor any hypervisor specific call.
Reviewed-by: iustinp
The difference from the original patch is that we don't include the
cmdlib changes, since those will come as a copy from the 1.2 cmdlib.py,
and not as individual patches.
Original-Author: ultrotter
Iustin Pop [Thu, 8 Jan 2009 12:03:31 +0000 (12:03 +0000)]
bdev: forward-port ReAttachNet/DisconnectNet
This is a plain copy of the 1.2 ReAttachNet and DisconnectNet methods on
the DRBD8 device, with the logger to logging module changes and the
ReAttachNet method renamed to AttachNet.
These methods are not used anywhere right now, but will be used for
migration and a simpler disk-replace.
The code was originally committed on the 1.2 branch as revision numbers
1165 and 1204.
Originally-Reviewed-by: imsnah, ultrotter
Iustin Pop [Wed, 7 Jan 2009 17:02:37 +0000 (17:02 +0000)]
backend: Remove symlinks by disk name
This is a modified forward-port of commit 1184 on the 1.2 branch:
backend: Remove symlinks by disk name, not using a wildcard
Reviewed-by: ultrotter
The changes to the original patch are related to the docstring style and
iv_name to index switch.
Original-Author: imsnah
Iustin Pop [Wed, 7 Jan 2009 17:02:26 +0000 (17:02 +0000)]
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call for
drbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that
it can remove the symlinks of the instance.
Originally-Reviewed-by: imsnah
Iustin Pop [Wed, 7 Jan 2009 17:02:12 +0000 (17:02 +0000)]
Fix the _RemoveBlockDevLinks() function
This is a forward-port of commit 1163 on the 1.2 branch:
This fixes the removal of the instance symlinks (probably breakage from
the glob changes).
Reviewed-by: imsnah
Iustin Pop [Wed, 7 Jan 2009 17:01:58 +0000 (17:01 +0000)]
Remove instance's symlinks
This is a forward-port of commits 1150 and 1151 on the 1.2 branch:
Add _RemoveBlockDevLinks auxiliary function, called when an instance
fails to start and when it is shut down.
Reviewed-by: iustinp
and:
Fix cut&paste error when removing symlinks
It's just whitespace... isn't it? uhm... :) Anyway, fixing an error made
when reformatting the code for the new "safer" behaviour.
Reviewed-by: iustinp
Original-Author: ultrotter
Iustin Pop [Wed, 7 Jan 2009 17:01:46 +0000 (17:01 +0000)]
Catch BlockDeviceError when starting instance
This is a forward-port of commit 1149 on the 1.2 branch:
_GatherAndLinkBlockDevs used to raise the errors.BlockDeviceError
exception when it failed to create a block device, and with this patch
set it does so also when it fails to create a symlink to it.
With this patch we move the call to this function into a pre-existing
try-except block in the code, and catch the BlockDeviceError exception,
logging a message and returning a failure state if it happens.
Reviewed-by: iustinp
The changes are related to the new hypervisor and logging syntax.
Original-Author: ultrotter
Iustin Pop [Wed, 7 Jan 2009 17:01:36 +0000 (17:01 +0000)]
Create symlinks to instances' block devices
This is a forward-port of commit 1148 on the 1.2 branch:
Change the _GatherBlockDevs private function, called only one time by
StartInstance, to _GatherAndLinkBlockDevs, and make it transform the
device returned even more by calling the new _SimlinkBlockDev auxiliary
function.
This makes sure that every time an instance is started symlinks to its
block devices are created, and the instance is started off them, rather
than the underlying block devices.
Reviewed-by: iustinp
The changes we make to the patch are related to newer function signatures
in 2.0, and to the fact that iv_name is deprecated and we instead use
disk%d based on the disk index.
Original-Author: ultrotter
Iustin Pop [Wed, 7 Jan 2009 17:01:25 +0000 (17:01 +0000)]
Simplify hypervisor block_devices structure
This is a partial forward-port of commit 1136 on the 1.2 branch:
The hypervisor doesn't need to be passed the whole block device
structure, so we'll just give it the block device name on the local
node, and the name as seen by the instance. This will make it easier to
manipulate it later without messing with the block devices (eg. by
changing the system name to a symlink to the name itself).
Since the HVM hypervisor changes the "virtual" name a note is added
calling for a redesign that doesn't need this change, as different
hypervisors and emulation types will anyway have different names for
exported devices.
Reviewed-by: iustinp
The changes in this patch compared to the original are:
- we keep passing the original disk object, not for its iv_name, but
for its physical_id which is needed by the file driver (this could
be fixed maybe)
- we don't use the iv_name anymore, since in 2.0 we already use the
index of the device
Original-Author: ultrotter
Iustin Pop [Wed, 7 Jan 2009 14:38:29 +0000 (14:38 +0000)]
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse the
failure status of the new RpcResult structure, but it didn't fix the
storing of only the result payload. Since RpcResult is not JSON
serializable, LUActivateInstanceDisks is failing.
Reviewed-by: ultrotter
Iustin Pop [Tue, 6 Jan 2009 09:57:53 +0000 (09:57 +0000)]
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
Reviewed-by: imsnah
Michael Hanselmann [Fri, 19 Dec 2008 19:31:17 +0000 (19:31 +0000)]
ganeti.bootstrap: Set permissions on newly uploaded files
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 19:31:04 +0000 (19:31 +0000)]
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 19:30:46 +0000 (19:30 +0000)]
ganeti.bootstrap: Upload remote API certificate to new nodes
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 19:30:31 +0000 (19:30 +0000)]
ganeti.bootstrap: Prepare for remote API certificate
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 19:30:17 +0000 (19:30 +0000)]
ganeti.bootstrap: Write SSL key to temporary file and set permissions
Previously, we set the permissions only after writing the key. This
gave other users on the system a small window during which they could
read the key.
Reviewed-by: amishchenko
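The race described above is avoided when the file already has restrictive
permissions before any key material is written. A sketch under stated
assumptions: write_private_file is a hypothetical name, the 0600 mode and
the final rename into place are choices made for this example, and
os.fchmod limits it to Unix-like systems:

```python
import os
import tempfile


def write_private_file(path, data):
    """Write `data` to `path` without ever exposing it with loose permissions."""
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file accessible only by the current user.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        # Set the final mode explicitly before writing any key material.
        os.fchmod(tmp.fileno(), 0o600)
        tmp.write(data)
    # Move the finished file into place (same directory, so same filesystem).
    os.rename(tmp_path, path)
```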
Michael Hanselmann [Fri, 19 Dec 2008 19:30:05 +0000 (19:30 +0000)]
ganeti.bootstrap: Generate SSL certificate for remote API
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 19:29:50 +0000 (19:29 +0000)]
ganeti.bootstrap: Move SSL certificate generation into separate function
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 12:58:27 +0000 (12:58 +0000)]
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". User
options specify the access permissions of a user (see docstring for
ganeti.http.ReadPasswordFile), for which only "write" is supported
to grant write access. Every other user has read-only access.
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 12:58:10 +0000 (12:58 +0000)]
ganeti-rapi: Introduce per-request context
This will be used to evaluate access permissions to resources.
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 12:57:58 +0000 (12:57 +0000)]
ganeti.http: Function to read password file
Lines in the password file are of the following format:
<username> <password> [options]
Fields are separated by whitespace. Username and password are
mandatory, options are optional and separated by comma (",").
Empty lines and comments ("#") are ignored.
Reviewed-by: amishchenko
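The file format described above can be parsed with a few lines of Python.
This is a sketch, not ganeti.http.ReadPasswordFile itself: the function
name, the returned dictionary shape, and the decisions to treat only whole
lines starting with "#" as comments and to skip lines missing a password
are assumptions:

```python
def parse_password_file(text):
    """Parse "<username> <password> [options]" lines into a dict.

    Returns {username: (password, [option, ...])}.
    """
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are ignored
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # username and password are mandatory
        name, password = parts[0], parts[1]
        options = []
        if len(parts) == 3:
            options = [opt.strip() for opt in parts[2].split(",") if opt.strip()]
        users[name] = (password, options)
    return users
```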
Michael Hanselmann [Fri, 19 Dec 2008 12:57:38 +0000 (12:57 +0000)]
ganeti.http: Add support for private data in HTTP requests
Reviewed-by: amishchenko
Michael Hanselmann [Fri, 19 Dec 2008 12:57:22 +0000 (12:57 +0000)]
ganeti.http: Add support for basic HTTP authentication
As per RFC2617.
Reviewed-by: amishchenko
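For reference, basic authentication per RFC 2617 sends the credentials as
base64("username:password") in the Authorization header. A minimal decoding
sketch follows; parse_basic_auth is a hypothetical helper, not part of
ganeti.http:

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic Authorization header, or None."""
    parts = header_value.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(parts[1].strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both invalid base64 and undecodable bytes.
        return None
    if ":" not in decoded:
        return None
    # The password may itself contain colons; split only on the first one.
    username, password = decoded.split(":", 1)
    return (username, password)
```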
Michael Hanselmann [Fri, 19 Dec 2008 12:57:07 +0000 (12:57 +0000)]
ganeti.http: Prepare authentication for HTTP server
The authentication class will override PreHandleRequest.
Reviewed-by: amishchenko
Michael Hanselmann [Thu, 18 Dec 2008 16:39:04 +0000 (16:39 +0000)]
Job queue: Allow more than one file rename per RPC call
Reviewed-by: ultrotter
Michael Hanselmann [Thu, 18 Dec 2008 16:38:47 +0000 (16:38 +0000)]
ganeti.jqueue: Group job archivals to reduce number of RPC calls
Reducing the actual number of RPC calls will come in another patch.
Reviewed-by: ultrotter
Michael Hanselmann [Thu, 18 Dec 2008 16:38:32 +0000 (16:38 +0000)]
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time,
causing timeouts on the luxi RPC layer. With this change, auto-
archive returns after half of the RPC timeout has passed. The user
will see how many jobs are left unchecked.
Reviewed-by: ultrotter
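The time-bounded archive loop described above can be sketched like this.
The function name, the should_archive callback and the injectable clock are
assumptions for illustration; a caller following the commit message would
pass half of the RPC timeout as `timeout`:

```python
import time


def auto_archive(jobs, should_archive, timeout, clock=time.monotonic):
    """Archive eligible jobs until `timeout` seconds have passed.

    Returns (archived_count, unchecked_count) so the user can see how
    many jobs were left unchecked.
    """
    deadline = clock() + timeout
    archived = 0
    for index, job in enumerate(jobs):
        if clock() > deadline:
            # Out of time: report everything from this job onward as unchecked.
            return (archived, len(jobs) - index)
        if should_archive(job):
            archived += 1
    return (archived, 0)
```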
Michael Hanselmann [Thu, 18 Dec 2008 16:38:09 +0000 (16:38 +0000)]
jqueue: When auto-archiving jobs, calculate job status only once
This is done by passing the job object to _ArchiveJobUnlocked instead
of only the job ID. Also return whether job was actually archived.
Reviewed-by: ultrotter
Michael Hanselmann [Thu, 18 Dec 2008 16:23:26 +0000 (16:23 +0000)]
Use subdirectories for job queue archive
As it turned out, having many files in a single directory can be
very painful. With this patch, only 10'000 files are stored in a
directory for the job queue archive. With 10'000 directories, this
allows for up to 100 million jobs to be archived without having large
numbers of files in a single directory. Not that it is realistic,
anyway.
Reviewed-by: ultrotter
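The arithmetic behind the limits above is 10'000 files per directory times
10'000 directories, giving 100 million jobs. One way to map a job ID to its
archive directory is integer division, sketched below. The function name and
the "job-%d" file naming are assumptions, not taken from the patch:

```python
import os

JOBS_PER_ARCHIVE_DIR = 10000


def archive_path(archive_root, job_id):
    """Return the archive file path for `job_id`, grouped 10'000 per directory."""
    subdir = str(job_id // JOBS_PER_ARCHIVE_DIR)
    return os.path.join(archive_root, subdir, "job-%d" % job_id)
```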