Michael Hanselmann [Mon, 23 Jun 2008 13:00:45 +0000 (13:00 +0000)]
objects: Remove config_version from cluster configuration
Reviewed-by: ultrotter
Michael Hanselmann [Mon, 23 Jun 2008 12:53:15 +0000 (12:53 +0000)]
cfgupgrade: Add main() function
Reviewed-by: iustinp
Michael Hanselmann [Mon, 23 Jun 2008 12:53:04 +0000 (12:53 +0000)]
cfgupgrade: Add logging module
Reviewed-by: iustinp
Guido Trotter [Mon, 23 Jun 2008 12:50:15 +0000 (12:50 +0000)]
Fix the zombie process unittest
The failure occurs because, under high load, the parent gets to run before the
child has the chance to os._exit(), and therefore it is still running
when the parent does the check.
The fix removes the chance of this happening by waiting to receive a SIGCHLD
(but not calling wait()) before trying to test the pid.
Reviewed-by: imsnah
Michael Hanselmann [Mon, 23 Jun 2008 11:30:54 +0000 (11:30 +0000)]
Bump version to 2.0.0~alpha0
We decided to bump the major number to 2 a few weeks ago due to the huge number
of changes going into it.
Reviewed-by: iustinp
Michael Hanselmann [Mon, 23 Jun 2008 11:11:42 +0000 (11:11 +0000)]
Add functions to calculate version number to constants.py
In cfgupgrade, we need to extract parts of and build new version numbers.
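A minimal sketch of what such helpers could look like. The packing scheme (two decimal digits for the minor number, four for the revision) and the names build_version and split_version are assumptions made for illustration; the commit message does not state the encoding.

```python
# Hypothetical packing: major * 1000000 + minor * 10000 + revision.
def build_version(major, minor, revision):
    """Pack the three version parts into a single integer."""
    if major < 0 or not 0 <= minor <= 99 or not 0 <= revision <= 9999:
        raise ValueError("version part out of range")
    return major * 1000000 + minor * 10000 + revision


def split_version(version):
    """Inverse of build_version: return (major, minor, revision)."""
    major, rest = divmod(version, 1000000)
    minor, revision = divmod(rest, 10000)
    return (major, minor, revision)
```

An upgrade tool could then compare only the major and minor parts of a stored version against the running code.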
Reviewed-by: iustinp
Michael Hanselmann [Mon, 23 Jun 2008 09:52:52 +0000 (09:52 +0000)]
utils.WriteFile: Remove optional check_abspath parameter
cfgupgrade will not work with relative paths at all, but rather get them
from constants.py.
Reviewed-by: iustinp
Iustin Pop [Sun, 22 Jun 2008 10:57:52 +0000 (10:57 +0000)]
Add a ‘tags’ field to instance and node listing
Currently there isn't any easy way to list all nodes or instances and
their tags; you have to query each node in turn, or list all the tags
via something like “gnt-cluster search-tags '.*'”. Of course, this is
not optimal.
The patch adds a new field to “gnt-instance list” and “gnt-node list”
called ‘tags’, that will list the tags of the object in comma-separated
form. This field will be empty if there are no tags (when using a
separator this output can still be parsed by other scripts).
At opcode level, there is a new field called ‘tags’ that returns a
(python) list of the object tags.
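As a rough illustration of why an empty field stays parseable, here is a small sketch. The helper names format_tags and format_row are hypothetical and are not the functions used by the listing code.

```python
def format_tags(tags):
    """Join an object's tags with commas; no tags gives an empty field."""
    return ",".join(tags)


def format_row(fields, separator):
    """Join already-formatted fields with the user-chosen separator."""
    return separator.join(fields)


# An instance without tags still yields two fields when split again.
row = format_row(["instance1.example.com", format_tags([])], "|")
fields = row.split("|")
```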
Reviewed-by: ultrotter
Iustin Pop [Sat, 21 Jun 2008 18:49:14 +0000 (18:49 +0000)]
Implement handling of luxi errors in cli.py
Currently the generic handling of ganeti errors in cli.py (GenericMain
and FormatError) only handles the core ganeti errors, and not the client
protocol errors (which live in a separate hierarchy).
This patch adds handling of luxi errors too, and also adds another luxi
error for the case when the master is not running. This gives us a nice:
gnta1:~# gnt-node list
Cannot communicate with the master daemon.
Is it running and listening on '/var/run/ganeti-master.sock'?
error message instead of a traceback.
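The idea can be sketched as follows. The exception classes and the format_error function below are simplified stand-ins written for this example, not the actual luxi or cli.py code; only the message text is taken from the output above.

```python
class ProtocolError(Exception):
    """Stand-in for the base class of the client protocol errors."""


class NoMasterError(ProtocolError):
    """Stand-in for the new error: the master daemon cannot be reached."""


def format_error(err):
    """Turn an exception into a user-facing message instead of a traceback."""
    if isinstance(err, NoMasterError):
        # The socket path is expected as the first exception argument.
        return ("Cannot communicate with the master daemon.\n"
                "Is it running and listening on '%s'?" % err.args[0])
    if isinstance(err, ProtocolError):
        return "Unhandled protocol error: %s" % err
    return "Unhandled exception: %s" % err
```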
Reviewed-by: amishchenko
Iustin Pop [Sat, 21 Jun 2008 11:27:22 +0000 (11:27 +0000)]
Remove twisted checks from configure.ac
Currently we don't use twisted, so we remove the twisted checks from the
configure stage.
Reviewed-by: amishchenko
Iustin Pop [Fri, 20 Jun 2008 11:04:27 +0000 (11:04 +0000)]
Add a rpc call for BlockDev.Close()
This patch adds rpc layer calls (in rpc.py and the equivalent in
ganeti-noded) to close a list of block devices, and the wrapper in
backend.py that takes a list of Disk objects, identifies them and
returns correctly formatted results.
The reason why this very basic call was missing until now from the rpc
layer is that we usually don't care about device closes (though we
should, and will do so in the future) as only drbd has a meaningful
Close() operation; right now we directly do Shutdown().
The patch is clean enough that it's actually independent of the live
migration implementation.
Reviewed-by: imsnah
Michael Hanselmann [Thu, 19 Jun 2008 14:06:28 +0000 (14:06 +0000)]
Check for docbook2{man,pdf,html}
docbook2{man,pdf,html} are mandatory. "configure" aborts if one
of them isn't found.
Reviewed-by: iustinp
Iustin Pop [Thu, 19 Jun 2008 13:37:08 +0000 (13:37 +0000)]
Small typo in gnt-instance manpage
Reviewed-by: manuel.franceschini
Michael Hanselmann [Thu, 19 Jun 2008 12:56:17 +0000 (12:56 +0000)]
Use a single Makefile.am instead of many
This change allows us to use cleaner dependencies between
directories. The build system is basically rewritten in large parts
and may contain bugs.
Reviewed-by: iustinp
Iustin Pop [Wed, 18 Jun 2008 15:09:08 +0000 (15:09 +0000)]
Fix bdev unittest when run under distcheck
The path to the filename for drbd8 proc data is not correctly computed
when using distcheck. The patch duplicates it from the other drbd tests.
Reviewed-by: ultrotter
Iustin Pop [Wed, 18 Jun 2008 15:08:53 +0000 (15:08 +0000)]
Rework the DRBD8 device status computation
Currently, we compute the status of a drbd8 device in GetSyncStatus and
return only the values that we need (and fit in the framework of
GetSyncStatus). However, the full status details are useful (and needed)
in other places, so the patch attempts to improve this situation.
We abstract the status of a device outside in a separate class, that
knows how to parse contents from /proc/drbd and set easily accessible
attributes. We then simplify the GetSyncStatus to use this and return
the values that it needs, and add a separate method that returns the
full status object.
The move to a separate class cleans up the old sync-progress computation
from GetSyncStatus a little, but it still relies on many regexes.
The patch also adds unittests for a few statuses, and modifies one
BaseDRBD call to accept a custom filename instead of '/proc/drbd' to
ease unittests.
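A minimal sketch of such a status class, under stated assumptions: the class name DrbdStatus, its attribute names, and the sample line format (a DRBD 8.0-style “cs:/st:/ds:” line) are illustrative and may not match the real code or every DRBD version.

```python
import re


class DrbdStatus(object):
    """Parse one device line from /proc/drbd into simple attributes."""

    # Assumed line format, e.g.:
    #  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    LINE_RE = re.compile(r"\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)"
                         r"\s+ds:([^/\s]+)/(\S+)")

    def __init__(self, line):
        match = self.LINE_RE.match(line)
        if not match:
            raise ValueError("cannot parse drbd status line: %r" % line)
        (minor, self.cstatus, self.lrole, self.rrole,
         self.ldisk, self.rdisk) = match.groups()
        self.minor = int(minor)
        # Easily accessible flags derived from the raw fields.
        self.is_connected = self.cstatus == "Connected"
        self.is_primary = self.lrole == "Primary"
        self.is_diskless = self.ldisk == "Diskless"
```

A caller such as GetSyncStatus would then read only the attributes it needs, while other callers can inspect the full object.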
Reviewed-by: imsnah
Michael Hanselmann [Wed, 18 Jun 2008 12:32:23 +0000 (12:32 +0000)]
ganeti-watcher: Replace custom exceptions with ganeti.error.*
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:31:53 +0000 (12:31 +0000)]
ganeti-watcher: Don't write file if data didn't change
This is the safest way to detect changes and the amount of data
is small, so keeping a copy around is cheap enough.
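The approach can be sketched like this. The StateFile class and its use of the json module are assumptions for illustration and do not reproduce the watcher's real state class.

```python
import json


class StateFile(object):
    """Keep a copy of the last written data and skip identical writes."""

    def __init__(self, path):
        self.path = path
        self._orig = None  # serialized form of what is on disk, if known

    def save(self, data):
        """Write data to disk; return False if nothing had changed."""
        serialized = json.dumps(data, sort_keys=True)
        if serialized == self._orig:
            return False
        with open(self.path, "w") as handle:
            handle.write(serialized)
        self._orig = serialized
        return True
```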
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:31:34 +0000 (12:31 +0000)]
ganeti-watcher: Rename WatcherState.data to WatcherState._data
Cleanup: _data is private and should not be modified from outside
of this class.
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:31:16 +0000 (12:31 +0000)]
Don't log SystemExit exception in ganeti-watcher
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:31:00 +0000 (12:31 +0000)]
Replace watcher state file atomically
- Lock it before renaming
- Code cleanup; close() automatically unlocks it
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:30:44 +0000 (12:30 +0000)]
Write ganeti-watcher status file even if something failed
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:30:11 +0000 (12:30 +0000)]
Add more parameters to utils.WriteFile
- Make closing file optional: Required by ganeti-watcher to keep
file open after writing it. Changes return value of utils.WriteFile
if "close" parameter evaluates to True.
- Pre- and post-write functions: Can be used to lock files. This
will be used by ganeti-watcher to lock the temporary file before
renaming.
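A simplified sketch of how these parameters fit together. It is not the real utils.WriteFile: the signature, the returned value (the open file descriptor when close is false, None otherwise) and the lock_exclusive helper are assumptions. It is POSIX-only because of fcntl.

```python
import fcntl
import os
import tempfile


def write_file(path, data, prewrite=None, postwrite=None, close=True):
    """Write bytes to a temporary file, then rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp_name = tempfile.mkstemp(prefix=os.path.basename(path) + ".",
                                    dir=directory)
    try:
        if prewrite is not None:
            prewrite(fd)  # e.g. take a lock before any data is written
        os.write(fd, data)
        if postwrite is not None:
            postwrite(fd)
        os.rename(tmp_name, path)  # atomic replacement on POSIX
    except Exception:
        os.close(fd)
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    if close:
        os.close(fd)  # closing the descriptor also drops any flock
        return None
    return fd


def lock_exclusive(fd):
    """Pre-write hook: lock the temporary file before it gets renamed."""
    fcntl.flock(fd, fcntl.LOCK_EX)
```

A caller that wants to keep the lock after the rename passes close=False and closes the returned descriptor itself when done.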
Reviewed-by: iustinp
Michael Hanselmann [Wed, 18 Jun 2008 12:29:52 +0000 (12:29 +0000)]
Use ganeti.serializer module in ganeti-watcher
Reviewed-by: ultrotter
Michael Hanselmann [Wed, 18 Jun 2008 12:29:37 +0000 (12:29 +0000)]
Replace custom logging code in watcher with logging module
- Log timestamp for all messages
- Write everything to logfile and optionally to stderr
- Log messages are no longer buffered, allowing a user to see progress
Reviewed-by: ultrotter
Michael Hanselmann [Wed, 18 Jun 2008 12:29:03 +0000 (12:29 +0000)]
Make sure serialized data ends with EOL character
Also fix the regular expression to not remove newlines. The simplejson
module puts whitespace at line endings when using indentation. Remove
unnecessary import of ConfigParser module.
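A small sketch of the intended behaviour, using the standard json module in place of simplejson. The function names are hypothetical. The point is that the pattern only matches spaces and tabs, so a pattern such as “\s+$”, which would also eat newlines, is avoided.

```python
import json
import re

# Only spaces and tabs before a line end; newlines themselves are kept.
_RE_EOL_WHITESPACE = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_eol_whitespace(text):
    """Remove trailing blanks on every line without touching newlines."""
    return _RE_EOL_WHITESPACE.sub("", text)


def dump(data):
    """Serialize with indentation and guarantee a final EOL character."""
    text = strip_eol_whitespace(json.dumps(data, indent=2))
    if not text.endswith("\n"):
        text += "\n"
    return text
```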
Reviewed-by: ultrotter
Iustin Pop [Tue, 17 Jun 2008 15:08:24 +0000 (15:08 +0000)]
Allow disk objects to set their own physical ID
Currently, the way to customize a DRBD disk from (node name 1, node name
2, port) to (ip1, port, ip2, port) is to use the ConfigWriter method
SetDiskID. However, since this needs a ConfigWriter object, it can be
run only on the master, and therefore disk objects can't be passed to
more than one node unchanged. This, coupled with the rpc layer
limitation that all nodes in a multi-node call receive the same
arguments, prevents any kind of multi-node operation that has disks as an
argument.
This patch takes the SetDiskID method from ConfigWriter and ports it to
the disk object itself, and instead of the full node configuration it
uses a simple {node_name: replication_ip} mapping for all the nodes
involved in the disk tree (currently we only pass primary and secondary
node since we don't support nested drbd devices).
This allows us to send disks to both the primary and secondary nodes at
once and perform synchronized drbd activation on primary/secondary
nodes.
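The conversion itself can be sketched as below. The function name and the error handling are illustrative assumptions; the real method lives on the disk object and may differ in detail.

```python
def physical_id_from_logical(logical_id, nodes_ip):
    """Map (node1, node2, port) to (ip1, port, ip2, port).

    nodes_ip is a plain {node_name: replication_ip} dictionary, so no
    ConfigWriter (and therefore no master node) is needed.
    """
    node_a, node_b, port = logical_id
    try:
        return (nodes_ip[node_a], port, nodes_ip[node_b], port)
    except KeyError as err:
        raise ValueError("no replication IP known for node %s" % err)
```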
Note that while for the 1.2 branch this will not change old methods, it
is worth investigating and possibly moving all such calls from the
master to the nodes themselves for the 2.0 branch.
Reviewed-by: ultrotter
Iustin Pop [Tue, 17 Jun 2008 13:13:14 +0000 (13:13 +0000)]
Fix an error-handling case
There is a mistake in handling grow-disk for an invalid disk. This patch
fixes it.
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:51:18 +0000 (06:51 +0000)]
Manpage updates for the new grow-disk command
The patch documents the steps needed to complete a user-visible grow
(i.e. not only grow-disk, but also filesystem resize is needed, etc.)
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:51:05 +0000 (06:51 +0000)]
Implement gnt-instance grow-disk
This patch exposes at command line level the grow-disk operation.
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:50:51 +0000 (06:50 +0000)]
Implement disk grow at LU level
This patch adds a new opcode and LU for growing an instance's disk.
The opcode allows growing only one disk at a time, and will throw an error
if the operation fails midway (e.g. on the primary node after it has
been increased on the secondary node). As such, it might actually leave
different sized LVs on different nodes, but this will not create
problems.
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:50:33 +0000 (06:50 +0000)]
Add method to update a disk object size
This patch adds a method that implements updating of a disk
(object.Disk) size, together with its children.
While this will not track the exact disk size, it allows at least an
approximate size to be recorded in the configuration (and queried).
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Jun 2008 06:50:19 +0000 (06:50 +0000)]
Implement block device grow at the rpc layer
This simple patch exposes the block device grow operation at the rpc
layer. It does not increase the protocol version as it has been recently
changed by the live failover rpc call.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:06:43 +0000 (16:06 +0000)]
Expose block device grow in backend.py
This patch adds a wrapper over the block device grow operation that
converts the input and output parameters as needed for the rpc layer.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 16:01:03 +0000 (16:01 +0000)]
bdev: implement disk resize for lvm/drbd8
This patch implements disk resize at the bdev level for the LVM and
DRBD8 disk types. It is not implemented for DRBD7 and MD since the way
MD works with its underlying devices makes it harder and this
combination is also deprecated.
The LVM resize operation is tried three times, with different allocation
policies:
- contiguous first, since this is best for allocation purposes (it
won't fragment the PV too much)
- cling, which is supported only by more recent LVM versions, will try
to place the new extents on the same PV as the rest of the LV
- and finally normal, which is the default
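The retry order above can be sketched as follows. The function name and the injected run_cmd callable (which returns True on success) are assumptions made so the example runs without LVM; the exact command line used by bdev.py may differ.

```python
def grow_logical_volume(dev_path, amount_mb, run_cmd):
    """Try lvextend with progressively less strict allocation policies.

    Returns the policy that worked; raises RuntimeError if all fail.
    """
    for policy in ("contiguous", "cling", "normal"):
        cmd = ["lvextend", "--alloc", policy,
               "-L", "+%dm" % amount_mb, dev_path]
        if run_cmd(cmd):
            return policy
    raise RuntimeError("cannot grow %s with any allocation policy" % dev_path)
```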
Reviewed-by: imsnah
Guido Trotter [Mon, 16 Jun 2008 14:32:22 +0000 (14:32 +0000)]
Move SetKey to WritableSimpleStore and use it
Before we used to be able to update SimpleStore by just calling SetKey, this
feature is now moved to an external class, which inherits from it. In this
patch the new WritableSimpleStore class is also put to use, in the LUs that
need it. Rather than making each LU instantiate it, we have a new LogicalUnit
flag REQ_WSSTORE which defaults to False, but when declared to be True asks the
LogicalUnit to be initialized with a writeable version of the SimpleStore.
LUMasterFailover and LURenameCluster are then changed to use it.
InitCluster is also changed to instantiate a WritableSimpleStore, rather
than a normal one.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 13:57:36 +0000 (13:57 +0000)]
Add migration support at the rpc layer
This patch adds the migration rpc call and its implementation in the
backend. The patch does not deal with the correct activation of disks.
Because of the new RPC, the protocol version is increased.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Jun 2008 10:37:26 +0000 (10:37 +0000)]
hypervisor: add live migration support
This is just the hypervisor-level migration (e.g. “xm migrate”) not the
whole node coordination work.
Reviewed-by: ultrotter
Guido Trotter [Sun, 15 Jun 2008 10:55:37 +0000 (10:55 +0000)]
Activate down instances' disks on replace-disks
When replacing disks or evacuating nodes with instances administratively
down ganeti fails because the instance disks are not active. This patch
activates them, performs the replacement, and shuts them down again.
Changing this also fixes the same issue on gnt-node evacuate.
Reviewed-by: iustinp
Guido Trotter [Sun, 15 Jun 2008 10:55:24 +0000 (10:55 +0000)]
FailoverInstance: change AddInstance with Update
We're not adding a new instance, just making configuration changes to
the one we're working on.
Reviewed-by: imsnah
Guido Trotter [Sun, 15 Jun 2008 10:55:09 +0000 (10:55 +0000)]
Burnin: Use iallocator in import/export
Currently the iallocator option is ignored by burnin at import/export
time even if passed in. With this patch it becomes used. The log message
used by the importer is also changed to reflect this.
This patch also improves import/export on the non-iallocator case:
- The secondary node is not passed anymore on non-mirrored templates
- On mirrored templates the secondary node is logged
Reviewed-by: imsnah
Iustin Pop [Sun, 15 Jun 2008 05:22:12 +0000 (05:22 +0000)]
Fix an error message in instance add
There is a mistake in the error message generated when we can't reach a
node for checking for available disk space. Without the fix, the error
message is:
Failure: prerequisites not met for this operation:
Cannot get current information from node '{u'gnte2.lab.k1024.org':
{'cpu_total': 1, 'memory_free': 480, 'vg_size': 131068, 'memory_total':
504, 'bootid': '2176dd3b-2f96-42f0-8b6e-2873ecaf5f9c', 'memory_dom0':
134, 'vg_free': 130172}, u'gnte1.lab.k1024.org': False}'
instead of the expected:
Failure: prerequisites not met for this operation:
Cannot get current information from node 'gnte2.lab.k1024.org'
Reviewed-by: imsnah
Michael Hanselmann [Fri, 13 Jun 2008 14:33:51 +0000 (14:33 +0000)]
Move warning flags from autogen.sh to configure.ac
Reviewed-by: iustinp
Michael Hanselmann [Fri, 13 Jun 2008 12:46:43 +0000 (12:46 +0000)]
Replace logging functions with calls to logging module
- Shorter code
- Reorder arguments to logger.SetupLogging calls to make more sense
Reviewed-by: iustinp
Guido Trotter [Fri, 13 Jun 2008 10:14:31 +0000 (10:14 +0000)]
Fail job on ganeti exceptions
When a Job raises a ganeti exception a message is printed but nothing is
reported in the job itself. It's better to update the job status, thus
notifying the client, possibly polling for the job result, of what went
wrong.
Reviewed-by: iustinp
Guido Trotter [Fri, 13 Jun 2008 10:14:09 +0000 (10:14 +0000)]
Fix a typo in jqueue.py
s/result/op_result/ (this code was never used, so this wasn't caught)
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 13:47:32 +0000 (13:47 +0000)]
Don't use specific versions in autogen.sh
Not all distributions have the same version of aclocal, autoconf
or automake. Users can pass the names of specific executables
via environment variables. Change configure.ac to require at
least autoconf 1.9.
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 13:05:27 +0000 (13:05 +0000)]
Move InitCluster opcode into a single function
This allows us to initialize a new cluster. The code certainly contains
bugs and hooks aren't implemented yet.
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 13:04:57 +0000 (13:04 +0000)]
Move cmdlib._HasValidVG to utils.CheckVolumeGroupSize
This is required for splitting the cluster initialization code.
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 13:04:23 +0000 (13:04 +0000)]
Move {Set,Remove}EtcHostsEntry wrappers to utils.py
This is required for the split of the cluster initialization code.
Reviewed-by: iustinp, ultrotter
Michael Hanselmann [Thu, 12 Jun 2008 10:18:39 +0000 (10:18 +0000)]
Remove REQ_CLUSTER from opcode handling code
It's not needed anymore now that all opcodes require a cluster. Cluster
initialization was the only exception.
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 10:06:35 +0000 (10:06 +0000)]
Remove unreachable code from cli.SubmitOpCode
Reviewed-by: iustinp
Michael Hanselmann [Thu, 12 Jun 2008 10:06:18 +0000 (10:06 +0000)]
Rename master socket to ganeti-master.sock
…/run/master.sock is not specific enough.
Reviewed-by: iustinp
Guido Trotter [Wed, 11 Jun 2008 10:12:25 +0000 (10:12 +0000)]
Remove SimpleStore cache
SimpleStore is instantiated anew most of the times it's used, so having
a cache inside it serves no purpose. Removing it.
Reviewed-by: iustinp
Michael Hanselmann [Fri, 6 Jun 2008 09:32:24 +0000 (09:32 +0000)]
Forward-port: Fix two problems in QA scripts
- Failover back to original node in instance failure test
- Exclude secondary node from list of potential nodes in
replace-disks test
Reviewed-by: iustinp
Michael Hanselmann [Fri, 6 Jun 2008 09:31:54 +0000 (09:31 +0000)]
Forward-port: Add QA tests for “gnt-instance reboot”
Reviewed-by: ultrotter
Michael Hanselmann [Fri, 6 Jun 2008 09:31:07 +0000 (09:31 +0000)]
Forward-port: Add QA test for “gnt-instance replace-disks”
Reviewed-by: iustinp
Michael Hanselmann [Fri, 6 Jun 2008 09:30:33 +0000 (09:30 +0000)]
Forward-port: Update gnt-instance and gnt-backup manpages
- Add --iallocator options
- Small text fixes
Reviewed-by: ultrotter
Michael Hanselmann [Fri, 6 Jun 2008 09:30:01 +0000 (09:30 +0000)]
Forward-port: Fix wrong filename in ganeti-watcher manpage
Reviewed-by: iustinp
Michael Hanselmann [Fri, 6 Jun 2008 09:28:56 +0000 (09:28 +0000)]
Forward-port: Small codestyle fixes for dumb-allocator
Reviewed-by: iustinp
Michael Hanselmann [Fri, 6 Jun 2008 09:28:21 +0000 (09:28 +0000)]
Forward-port: Remove output file if docbook failed
Reviewed-by: ultrotter
Michael Hanselmann [Fri, 6 Jun 2008 09:27:49 +0000 (09:27 +0000)]
Forward-port: Alias Dump/Load functions in ganeti.serializer to DumpJson/LoadJson
The remote API will use JSON for the foreseeable future, so it's better
to put the serialization format in the function name. We can still
use another serialization format for Ganeti's core.
Reviewed-by: amishchenko, schreiberal
Michael Hanselmann [Fri, 6 Jun 2008 09:19:59 +0000 (09:19 +0000)]
Add line-breaks to gnt-instance manpage
Reviewed-by: ultrotter
Iustin Pop [Sat, 31 May 2008 23:52:35 +0000 (23:52 +0000)]
Add check for node memory in instance creation
Currently the check for enough memory is done only on instance start
command and failover command. But we also start an instance in instance
create, therefore we need to check this instead of failing to start in
the hypervisor phase.
The patch adds a check for node memory in the case the creation command
specifies that the instance should be started. It is allowed for the
memory to be less than needed if the instance will not be started, in
order to allow migration and other such cases.
Reviewed-by: imsnah
Iustin Pop [Sat, 31 May 2008 23:51:07 +0000 (23:51 +0000)]
Show cluster hypervisor for gnt-cluster info
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Sat, 31 May 2008 23:47:55 +0000 (23:47 +0000)]
Forward-port: Another for gnt-instance modify & HVM parameters
Another tiny fix. Anybody got a nice brown paper bag I can wear?
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Sat, 31 May 2008 23:45:51 +0000 (23:45 +0000)]
Forward-port: make gnt-modify work with new HVM parameters
This fixes gnt-instance modify so it actually works with the
new HVM parameters for Ganeti 1.2
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Sat, 31 May 2008 23:43:11 +0000 (23:43 +0000)]
Forward-port: show only parameters relevant to the instance
This patch modifies the code for "gnt-instance info .." to only display
instance parameters that actually apply to that instance, i.e. for PVM
instances no HVM parameters are shown and vice versa.
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Sat, 31 May 2008 23:39:23 +0000 (23:39 +0000)]
Forward-port: patch 4/4 extended HVM features for 1.2
This patch documents the extended HVM features.
Author: schreiberal
Reviewed-by: imsnah
Iustin Pop [Sat, 31 May 2008 23:37:52 +0000 (23:37 +0000)]
Forward-port: patch 3/4 extended HVM features for 1.2
This patch adds hypervisor support for the extended HVM features.
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Sat, 31 May 2008 23:14:35 +0000 (23:14 +0000)]
Forward-port: patch 2/4 extended HVM features for 1.2
This patch adds the commandline extensions and the code to store
and display the extended HVM features.
Author: schreiberal
Reviewed-by: iustinp
Iustin Pop [Fri, 30 May 2008 10:55:05 +0000 (10:55 +0000)]
Complete removal of md/drbd 0.7 code
This patch removes the last of the md and drbd 0.7 code. Clusters which
have the old device types will be broken if they have this applied.
Reviewed-by: imsnah
Iustin Pop [Fri, 30 May 2008 10:52:21 +0000 (10:52 +0000)]
LURemoveInstance: fix op.ignore_failures usage
Currently, the LURemoveInstance.Exec() method uses the ignore_failures
attribute of the OpRemoveInstance opcode, but it doesn't check for its
existence. The patch adds this attribute to _OP_REQP and to all the
places where this opcode was created.
This attribute is always passed by gnt-instance, but burnin didn't pass
it so it can fail if it enters the 'fail to remove disks' branch of the
method (which is why it was not triggered until now).
Reviewed-by: ultrotter, imsnah
Iustin Pop [Thu, 29 May 2008 09:59:45 +0000 (09:59 +0000)]
Documentation: cleanup of local/remote_raid1
Since we have removed support for local and remote raid1, update the man
pages and guides to reflect the new situation.
Reviewed-by: imsnah
Guido Trotter [Sat, 24 May 2008 08:59:39 +0000 (08:59 +0000)]
Distribute dumb-allocator in examples
When creating the ganeti tarball the dumb allocator was left out.
Shipping it alongside the other examples.
Reviewed-by: iustinp
Michael Hanselmann [Thu, 15 May 2008 14:38:31 +0000 (14:38 +0000)]
Update command line help and manpages with mandatory options
Reviewed-by: ultrotter
Guido Trotter [Thu, 15 May 2008 14:22:56 +0000 (14:22 +0000)]
document cluster verify --no-nsplus1-mem option
Add this recently added option to the gnt-cluster man page before
releasing 1.2.4.
Reviewed-by: imsnah
Guido Trotter [Thu, 15 May 2008 09:00:10 +0000 (09:00 +0000)]
Fix drbd show parser to handle valueless keywords
It turns out in some cases there can exist keywords without an
associated value exported by drbdsetup show. This patch makes the value
part optional in our parser, so that if it's not present the parsing
result will contain an array with just the keyword in it. This is not a
problem since we check all keyword names before accessing their values,
so we won't mistakenly try to access the value of a valueless keyword.
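To illustrate the resulting data shape only: the sketch below is a deliberately simple splitter and not the real parser, and the sample input in the comment is a made-up fragment rather than verbatim “drbdsetup show” output.

```python
def parse_statements(text):
    """Split 'keyword value;' statements; the value part is optional.

    Example (hypothetical input): "protocol C; allow-two-primaries;"
    """
    statements = []
    for chunk in text.split(";"):
        parts = chunk.split(None, 1)
        if parts:
            statements.append([part.strip() for part in parts])
    return statements


def find_value(statements, keyword):
    """Return the value for keyword, or None if absent or valueless."""
    for stmt in statements:
        if stmt[0] == keyword and len(stmt) > 1:
            return stmt[1]
    return None
```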
Reviewed-by: iustinp
Guido Trotter [Thu, 15 May 2008 09:00:01 +0000 (09:00 +0000)]
Split drbd command creation and execution
Make _AssembleDisk more similar to _AssembleNet by splitting the
generation of the drbdsetup command and its execution. While not
changing anything this makes it easier to manipulate the command just in
certain cases, which in the future we'll need to do.
Reviewed-by: iustinp
Iustin Pop [Tue, 13 May 2008 14:42:18 +0000 (14:42 +0000)]
Small style fixes
[Trunk version]
Reviewed-by: imsnah
Iustin Pop [Tue, 13 May 2008 14:33:12 +0000 (14:33 +0000)]
Implement node daemon connectivity tests
This patch adds to gnt-cluster verify checks for inter-node tcp
communication on the node daemon port, for both the primary and
(if defined) secondary networks.
The output looks like (4-node cluster, one with the secondary interface
down):
* Verifying node node1.example.com
- ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
* Verifying node node2.example.com
- ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
* Verifying node node3.example.com
- ERROR: tcp communication with node 'node1.example.com': failure using the secondary interface(s)
- ERROR: tcp communication with node 'node2.example.com': failure using the secondary interface(s)
- ERROR: tcp communication with node 'node4.example.com': failure using the secondary interface(s)
* Verifying node node4.example.com
- ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
Reviewed-by: imsnah
Michael Hanselmann [Tue, 13 May 2008 14:26:57 +0000 (14:26 +0000)]
Forward-port changes made to readd in 1.2
qa_node.py: Fix typo in message
cmdlib.py: Don't add readded node to node list
ganeti-qa.py: Make sure readd isn't done for master node
Reviewed-by: iustinp
Iustin Pop [Tue, 13 May 2008 13:41:41 +0000 (13:41 +0000)]
CLI: retry: remove command opts/args in "gnt-X"
This new version of the patch removes only the listing of the usage in
the "gnt-X" list, but keeps the strings in since we'll want to enhance
and use them in "gnt-X $cmd --help".
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 May 2008 13:04:17 +0000 (13:04 +0000)]
Revert "CLI: remove command opts/args in "gnt-X""
This reverts commit 976.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 May 2008 12:24:56 +0000 (12:24 +0000)]
CLI: remove command opts/args in "gnt-X"
[Forward-port of the 1.2 branch patch]
This patch removes all the parameters and options from the output
"gnt-X" (i.e. the subcommand list for command). This is done in order to
make the output uniform; currently only some parameters are shown and they
are not always consistent (e.g. required versus important parameters).
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 May 2008 09:48:15 +0000 (09:48 +0000)]
Watcher: do not activate disks for started instances
Currently the watcher first runs the instance startup and then the
boot-id method of disk reactivation. However, regardless of whether
a node has rebooted or not, if we just started an instance, there's
no need for its disks to be activated again, since starting the instance
has already done that (if it is at all possible).
The patch modifies the watcher to remember all started instances and not
run activate-disks for them.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 May 2008 09:48:07 +0000 (09:48 +0000)]
Watcher: do not activate disks for admin_down
Currently the watcher does activate disks (via bootid mechanisms) even
for admin_down instances. This patch logs and skips over these
instances.
Reviewed-by: ultrotter
Iustin Pop [Tue, 13 May 2008 07:32:58 +0000 (07:32 +0000)]
Reduce chance of ssh failures in verify cluster
The cluster verify builds a sorted list of nodes and passes that to all
the nodes (in parallel) for ssh checks. This means that for a cluster
with N nodes, there will be approximately N simultaneous connections to
the first node, then to the second node, etc. This, coupled with the
ssh daemon's “MaxStartups” parameter, can create false alarms about ssh
connectivity.
This patch randomizes the node list in the backend (therefore, each node
should have its own order of ssh-ing to the other nodes) and the chance
of these alarms should be reduced.
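A minimal sketch of the idea, assuming a hypothetical helper name. Each node shuffles its own copy of the list, so the nodes no longer all contact the same peer first.

```python
import random


def shuffled_nodes(node_list, rng=random):
    """Return a randomly ordered copy; the caller's list is left alone."""
    nodes = list(node_list)
    rng.shuffle(nodes)
    return nodes
```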
Reviewed-by: ultrotter
Iustin Pop [Mon, 12 May 2008 09:14:28 +0000 (09:14 +0000)]
bdev: always log command output if it failed
Currently many error handling code paths in bdev.py log only
result.fail_reason (i.e. exit code or signal that killed the command)
but not its output. This makes debugging very hard.
The patch changes all places where we only log fail_reason to also log
result.output.
Reviewed-by: ultrotter
Iustin Pop [Sat, 10 May 2008 08:25:57 +0000 (08:25 +0000)]
DRBD: Fix another bug in diskless activation
DRBD8 requires that we pass ‘--create-device’ to the first command that
wants to activate a new DRBD minor. We do this currently when we run the
“drbdsetup ... disk” command which we run before the network setup.
But if the LVs are missing, we skip the ‘disk’ subcommand and run only
the ‘net’ one, so it might be that the activation fails because the
minor we selected was never created in the first place.
The patch adds the required parameter to the DRBD8._AssembleNet() call.
Since it's a no-op for existing minors, it should not create any
problems (tested and works both with configured and unconfigured
minors).
Reviewed-by: ultrotter
Michael Hanselmann [Fri, 9 May 2008 10:12:08 +0000 (10:12 +0000)]
Remove utils.CheckDaemonAlive and use “xm info” instead
There are a couple of reasons for doing so:
- /proc is not OS independent, it's only supported by Linux (there are
emulations on other systems, but those might differ from the way
Linux represents data).
- Checking a daemon's state doesn't necessarily mean it's usable.
Connecting to the socket using “xm info” is much safer.
- Reduce code size.
Reviewed-by: iustinp
Guido Trotter [Thu, 8 May 2008 19:50:14 +0000 (19:50 +0000)]
Improve DRBD8.Open's docstring a bit more
Reviewed-by: iustinp
Guido Trotter [Thu, 8 May 2008 19:50:04 +0000 (19:50 +0000)]
Fix comment typo in bdev.py
Reviewed-by: iustinp
Iustin Pop [Thu, 8 May 2008 08:21:20 +0000 (08:21 +0000)]
Fix DRBD8 diskless assembling
The algorithm for attaching to existing DRBD devices is not trivial. It
has four alternatives, and there is a bug in the last one when we have
diskless devices.
In the last case (local disk info matches but remote/network configuration
doesn't match), we disconnect from the network and reattach with the
correct info. We do this because a correct local device has higher
priority than the remote device.
However, the test we use (self._MatchesLocal) can succeed in two cases:
- we have a disk and it's the same as the one attached
- we don't have a disk and the drbd is in diskless mode
But this creates problems for the fourth case as when we already have
one diskless DRBD, activating the next one will do:
- _MatchesLocal? yes, because both config data and system have no
disks (with the effect that all diskless devices are identical)
- _MatchesRemote? no, because this disk is configured to its current
remote peer, not to our new one
The fix is trivial, although the algorithm is not: we only allow overriding
the network configuration when the disk information matches and we are
not diskless, by adding the ‘"local_dev" in info’ test.
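The corrected condition for the fourth case can be sketched as a small predicate. The function name and the boolean arguments stand in for the real _MatchesLocal/_MatchesRemote calls and are assumptions for illustration.

```python
def may_override_network(info, matches_local, matches_remote):
    """Decide whether to disconnect and reattach the network part.

    info is the parsed device configuration; a diskless device has no
    "local_dev" key, so it never qualifies, even if matches_local is True.
    """
    return matches_local and not matches_remote and "local_dev" in info
```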
Reviewed-by: ultrotter
Michael Hanselmann [Wed, 7 May 2008 11:12:50 +0000 (11:12 +0000)]
Add unittest for constants
Reviewed-by: iustinp
Michael Hanselmann [Wed, 7 May 2008 11:12:37 +0000 (11:12 +0000)]
Use new ssconf function to check configuration version
Upgrades will be handled in future patches.
Reviewed-by: iustinp
Michael Hanselmann [Tue, 6 May 2008 10:20:49 +0000 (10:20 +0000)]
Use dict instead of if/elif map for hypervisor classes
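The pattern looks roughly like this. The class names, dictionary keys and the get_hypervisor function are hypothetical placeholders, not the actual hypervisor module contents.

```python
class XenPvmHypervisor(object):
    """Placeholder for a real hypervisor implementation."""


class FakeHypervisor(object):
    """Placeholder for a test hypervisor."""


# One table instead of an if/elif chain; adding a type is a one-line change.
_HYPERVISOR_MAP = {
    "xen-pvm": XenPvmHypervisor,
    "fake": FakeHypervisor,
}


def get_hypervisor(name):
    """Instantiate the hypervisor class registered under name."""
    try:
        cls = _HYPERVISOR_MAP[name]
    except KeyError:
        raise ValueError("Unknown hypervisor type '%s'" % name)
    return cls()
```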
Reviewed-by: iustinp
Michael Hanselmann [Tue, 6 May 2008 10:20:27 +0000 (10:20 +0000)]
Rename hypervisor code to lowercase filenames
Reviewed-by: iustinp
Michael Hanselmann [Mon, 5 May 2008 15:27:53 +0000 (15:27 +0000)]
Generate devel/upload during build time from template
- Use variable with prefix instead of grep and sed
- Always run with /bin/bash
Reviewed-by: ultrotter
Iustin Pop [Mon, 5 May 2008 10:03:39 +0000 (10:03 +0000)]
Export the number of cpus to iallocator scripts
Now that we have the number of cpus available from the hypervisors, we
can export this to the iallocator scripts.
Reviewed-by: ultrotter