Guido Trotter [Fri, 24 Apr 2009 16:13:45 +0000 (16:13 +0000)]
Update gnt-instance(8) for info
Add the --all argument, and reword the basic information a bit.
Reviewed-by: iustinp
Guido Trotter [Fri, 24 Apr 2009 16:13:29 +0000 (16:13 +0000)]
gnt-instance info --all
Don't show info for all instances by default, but require --all to be
passed for this time-consuming operation.
Reviewed-by: iustinp
Iustin Pop [Fri, 24 Apr 2009 14:36:17 +0000 (14:36 +0000)]
LUDiagnoseOS: change locking and error handling
Since the “list OSes” call is exported via RAPI, this can be used pretty
easily to DOS the master daemon during long jobs.
The implementation of LUDiagnoseOS makes an RPC call to all nodes; we
lock nodes here in order to prevent node removal.
However, after closer examination, the worst case is:
- we get the list of nodes from the config
- another thread removes a node
- our RPC queries reach the removed node
At this point, if ganeti-noded is stopped or doesn't accept our queries,
the RPC call will return failed, and in the current implementation all
OSes will become invalid.
If we change the ‘failed RPC’ handling to ignore such nodes, this allows
us to both remove locking, and to handle transient RPC failures better
(not invalidating all OSes).
This patch does both these things, with a single drawback: in gnt-os
diagnose, the down nodes do not appear at all. I think this is a small
drawback, and the alternative is to add them with status failed; this
works (3-line patch), but then the output of “list” and “diagnose” will
no longer be consistent. As such, my proposal is to not list the nodes.
Reviewed-by: ultrotter
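The "ignore failed nodes" handling described in the LUDiagnoseOS change above could look roughly like the sketch below. The (failed, os_names) result layout and the merge_os_lists name are assumptions made for illustration; they are not the actual Ganeti RPC result objects.

```python
def merge_os_lists(rpc_results):
    """Combine per-node OS lists, skipping nodes whose RPC call failed.

    rpc_results maps a node name to a (failed, os_names) pair; this layout
    is hypothetical and only stands in for the real RPC results.
    """
    per_os = {}
    for node, (failed, os_names) in rpc_results.items():
        if failed:
            # A down or removed node is ignored instead of invalidating
            # every OS on the cluster.
            continue
        for name in os_names:
            per_os.setdefault(name, []).append(node)
    return per_os


results = {
    "node1": (False, ["debootstrap"]),
    "node2": (True, []),  # e.g. ganeti-noded stopped
    "node3": (False, ["debootstrap"]),
}
merged = merge_os_lists(results)
```

With this approach the failed node simply does not appear in the merged data, which matches the stated drawback for gnt-os diagnose.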
Iustin Pop [Fri, 24 Apr 2009 08:43:09 +0000 (08:43 +0000)]
Fix verify-disks with broken volume groups
When a remote node returns invalid LVM data, we check it, but we don't
stop and continue with the rest of the checks (which require a valid
volume group). This raises an internal error and breaks verify disks.
This seems unchanged for a long while, I don't know why it surfaced just
recently.
Reviewed-by: ultrotter
Iustin Pop [Fri, 24 Apr 2009 08:43:01 +0000 (08:43 +0000)]
Prevent errors in cluster verify when xenvg is broken
When vg_name is not returned at all, we currently abort with an internal
error. This is because we don't catch KeyError.
This patch adds a custom message for this case, and also adds KeyError
to the list of caught exceptions, just for safety.
On the other hand, we could also just remove this piece of code, since
the ["dfree"] value is not used at all.
Reviewed-by: ultrotter
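A minimal sketch of the error handling described above, assuming the node reply is a plain dict. The function name and the messages are invented for illustration; only the ["dfree"] key comes from the commit message.

```python
def free_vg_space(node_info):
    """Return (dfree, error_message) for a node's volume group reply.

    A missing key gets its own message instead of surfacing as an
    internal error (hypothetical helper, not the Ganeti code).
    """
    try:
        return int(node_info["dfree"]), None
    except KeyError:
        return None, "node didn't return volume group data"
    except (ValueError, TypeError):
        return None, "node returned invalid LVM data"
```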
Iustin Pop [Wed, 15 Apr 2009 11:11:12 +0000 (11:11 +0000)]
A bunch of doc and other small fixes
This patch fixes a couple of issues, reported both externally and
internally:
- missing SGML tags (Issue 54), report and patch by superdupont
- wrong variable used in the init.d script, report and patch by
Karsten Keil <karsten-keil@t-online.de>
- man page for gnt-instance reinstall needs clarification (Issue 56)
- gnt-instance man page missing --disks documentation for
replace-disks
- gnt-node modify help output is unclear about the -C/-D/-O input
format, and the man page doesn't document this command at all
- “gnt-node modify -C yes” for offline or drained nodes had wrong
error message
- “gnt-instance reinstall --select-os” has wrong prompt, we only
accept a number for the OS and not the template name
Reviewed-by: ultrotter
Alexander Schreiber [Tue, 14 Apr 2009 16:42:43 +0000 (16:42 +0000)]
Trivial typo fix in error message
Reviewed-by: iustinp
Iustin Pop [Wed, 8 Apr 2009 12:34:49 +0000 (12:34 +0000)]
Release 2.0rc3
Burnin tests were successful, release rc3.
Reviewed-by: imsnah
Iustin Pop [Tue, 7 Apr 2009 11:53:58 +0000 (11:53 +0000)]
Distribute built documentation
This patch changes the way documentation is built in order to distribute
the generated output in the 'dist' archive, and thus no longer
requiring the presence of the docbook/rst toolchains during build time.
This will lower the requirements for installation and also makes the
build time insignificant.
First, we remove the docbook2pdf rules and variables, since we no longer
build this kind of docs. Furthermore, the rst source files are not
(today) processed via replace_vars_sed, so the whole .in rules for doc/
go away.
Next, we change the ".sgml|.rst -> replace_vars_sed -> .in -> processor
-> final file" processing to ".sgml|.rst -> generator -> .in ->
replace_vars_sed -> final file"; this means we first process the file
using the formatter, with the @VARIABLE@ entries in it, and save the
output as .in; this output we distribute, and on the user side, the
replace_vars_sed will use the new configure flags to transform the
(almost final .in form) to the final form, without needing the
toolchain.
In configure.ac we also change from ERROR to WARN for the documentation
generators, and extra tests in Makefile.am check that the programs have
been found.
This was tested with distcheck and works as expected.
Reviewed-by: ultrotter
Iustin Pop [Mon, 6 Apr 2009 08:21:30 +0000 (08:21 +0000)]
Disable synchronous (locking) queries
This patch raises an error in the master daemon in case the user
requests a locking query; accordingly, all clients were modified to send
only lockless queries. This is a short-term fix; for a proper fix the
clients should be modified to submit a job when the user requests a
locking query.
The other approach would be to ignore the flag passed by the client;
this would be worse, as clients wouldn't even get an error.
The possible impact of this is multiple:
- some commands could have been not converted, and thus fail; this
can be remedied easily
- the consistency of commands is lost; e.g. node failover will not
lock the node *while we get the node info*, so we could miss some
data; this is again in the thread of atomic operations which are
missing in the current model of query-and-act from gnt-* scripts
Reviewed-by: imsnah, ultrotter
Iustin Pop [Mon, 6 Apr 2009 08:21:13 +0000 (08:21 +0000)]
Fix the output of watcher on non-master nodes
Currently the watcher spews error messages on non-master nodes. This
cleans it up.
Reviewed-by: imsnah
Iustin Pop [Mon, 6 Apr 2009 08:21:04 +0000 (08:21 +0000)]
Change the watcher to use jobs instead of queries
As per the mailing list discussion, this patch changes the watcher to
use a single job (two opcodes) for getting the cluster state (node list
and instance list); it will then compute the needed actions based on
this data.
The patch also archives this job and the verify-disks job.
Reviewed-by: imsnah
Iustin Pop [Mon, 6 Apr 2009 08:20:53 +0000 (08:20 +0000)]
Fix Xen soft reboot via polling
This patch fixes the Xen soft reboot ("xm reboot") via polling for a specific
time for either changed domain ID or decreased CPU run-time.
This should prevent the race conditions discussed on the mailing list for
reboots.
Reviewed-by: imsnah
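The polling idea from the soft reboot fix above can be sketched as follows. The get_info callable, the retry count and the delay are assumptions for illustration; the real code queries the hypervisor and may use different values.

```python
import time


def wait_for_reboot(get_info, old_id, old_cpu, retries=10, delay=1.0,
                    sleep=time.sleep):
    """Poll until the domain ID changes or the CPU time decreases.

    get_info is a hypothetical callable returning (domain_id, cpu_time),
    or None while the domain is not listed.
    """
    for _ in range(retries):
        info = get_info()
        if info is not None:
            new_id, new_cpu = info
            # Either sign indicates that the domain was restarted.
            if new_id != old_id or new_cpu < old_cpu:
                return True
        sleep(delay)
    return False
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.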
Iustin Pop [Mon, 6 Apr 2009 08:20:40 +0000 (08:20 +0000)]
Add a new ssconf file with the cluster tags
Since the cluster tags are/should be more-or-less static, add them as an
ssconf key, so that querying them is possible without creating a
job/requiring the masterd to be running.
Reviewed-by: imsnah
Iustin Pop [Mon, 6 Apr 2009 08:20:28 +0000 (08:20 +0000)]
Add some more debugging info to masterd
This patch will log data about queries, which are today completely
invisible (at the default log level) in the master log file.
Reviewed-by: imsnah
Iustin Pop [Fri, 27 Mar 2009 15:11:54 +0000 (15:11 +0000)]
Release 2.0rc2
This updates the NEWS file and bumps up the version number.
Reviewed-by: ultrotter
Guido Trotter [Fri, 20 Mar 2009 13:07:22 +0000 (13:07 +0000)]
Fix _NOQUOTE regexp
Allow expressions longer than one character to match.
Reviewed-by: imsnah
Guido Trotter [Fri, 20 Mar 2009 13:06:56 +0000 (13:06 +0000)]
Mainloop: avoid calculating timeout every time
set timeout_needs_update to False after calculating the timeout.
Reviewed-by: imsnah
Guido Trotter [Fri, 20 Mar 2009 13:06:29 +0000 (13:06 +0000)]
Raise on invalid gnt-cluster queue commands
# gnt-cluster queue foo
Failure: prerequisites not met for this operation:
Command 'foo' is not valid.
Reviewed-by: iustinp
Guido Trotter [Thu, 12 Mar 2009 12:08:04 +0000 (12:08 +0000)]
kvm: use the correct vnc bind address
There is a bug in kvm: when binding vnc to a specific address, the
constant 'vnc_bind_address' is passed in, instead of the actual
requested address. This patch fixes it.
Reviewed-by: iustinp
Iustin Pop [Thu, 12 Mar 2009 11:54:35 +0000 (11:54 +0000)]
Add the 2.0-specific node flags to the design doc
This patch adds the newly-introduced node flags to the design document,
as they currently are missing from there.
The patch also reduces the TOC depth to 3, as it was too big.
Reviewed-by: ultrotter
Iustin Pop [Thu, 12 Mar 2009 11:54:22 +0000 (11:54 +0000)]
Fix the --net option to gnt-instance add
Similar to the --disk fixes a while ago, --net is broken too. This patch
fixes it.
Reviewed-by: imsnah
Guido Trotter [Tue, 10 Mar 2009 15:02:43 +0000 (15:02 +0000)]
Xen: Remove one hardcoded constant
s/"vnc_bind_address"/constants.HV_VNC_BIND_ADDRESS/
Reviewed-by: imsnah
Iustin Pop [Mon, 9 Mar 2009 15:12:24 +0000 (15:12 +0000)]
watcher: fix startup sequence locking the master
Currently, the watcher startup sequence does:
- open a luxi client
- get the instance list
- get the node boot ids
- open and lock the status file, and:
- archive jobs
- restart the down instances
- check disks
This, of course, can lead to problems when a node is (genuinely or not)
locked for more than (watcher interval * maximum query clients) time. At
that time, the master is completely unresponsive until the node is
unlocked and all the watchers exit with error due to the state file
being locked by the first instance.
This patch reworks the startup sequence to first open/lock the status
file, and only then open a luxi client. This should prevent the above
case.
Reviewed-by: ultrotter
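The reworked startup order described above (lock the status file first, open the luxi client second) could be sketched like this. The open_client and do_work callables are placeholders, and using flock here is an assumption about the locking primitive.

```python
import fcntl
import os


def run_watcher(state_path, open_client, do_work):
    """Lock the state file first and only then contact the master."""
    fd = os.open(state_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another watcher holds the lock: exit without opening a
            # client, so no query slot on the master is consumed.
            return False
        client = open_client()
        do_work(client)
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A second watcher started while the first one is still running returns early, before it ever talks to the master daemon.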
Iustin Pop [Mon, 9 Mar 2009 15:12:13 +0000 (15:12 +0000)]
Handle ghost instances in temp DRBD map
Currently cluster-verify doesn't handle the (admittedly invalid) case where we
have reservations for instances that were removed in the meantime.
This patch adds a check for this and prevents code errors in cluster-verify in
this case:
* Verifying node node4.example.com (master candidate)
- ERROR: ghost instance \'instance3.example.com\' in temporary DRBD map
Reviewed-by: imsnah
Iustin Pop [Mon, 9 Mar 2009 15:12:01 +0000 (15:12 +0000)]
Fix error handling in replace-disks with new node
Currently the _CreateSingleBlockDev function only raises OpExecError and not
BlockDeviceError. This means that we don't release the instance's temporary
minors properly, and this creates problems later if the instance is removed
without master restart.
We could just use OpExecError, but adding it and leaving
BlockDeviceError in seems safer.
Reviewed-by: imsnah
Iustin Pop [Fri, 6 Mar 2009 14:49:06 +0000 (14:49 +0000)]
Fix serial_no field on instances
The instance objects did not get a serial_no field. This patch adds a
new constant for the field name and uses it for all three cases
(cluster, nodes, instances).
Reviewed-by: imsnah
Guido Trotter [Thu, 5 Mar 2009 15:42:21 +0000 (15:42 +0000)]
Update gnt-cluster(8) for be/hyp parameter syntax
Now it displays:
--hypervisor-parameters hypervisor:hv-param=value [ ,hv-param=value ... ]
--backend-parameters be-param=value [ ,be-param=value ... ]
Sorry for the super-long lines :( Is there a better way to insert spaces
without pushing them to the resulting man page?
Reviewed-by: iustinp
Iustin Pop [Wed, 4 Mar 2009 14:22:52 +0000 (14:22 +0000)]
Complete the cfgupgrade script for 2.0 migrations
This patch makes the cfgupgrade script handle:
- instance changes
- disk changes
- further cluster fixes
- adds configuration checks at the end, in non-dry-run mode
Reviewed-by: ultrotter
Iustin Pop [Wed, 4 Mar 2009 14:20:47 +0000 (14:20 +0000)]
First run at cfgupgrade for 2.0 upgrades
This patch makes cfgupgrade work on empty cluster (i.e. no instances),
up to a point that the config file can be converted from 1.2 to 2.0.
This is not yet complete, though.
Reviewed-by: ultrotter
Iustin Pop [Wed, 4 Mar 2009 10:13:53 +0000 (10:13 +0000)]
Fix bash completion for cluster copyfile/command
“copyfile” takes a file argument, so we enable file-completion for it.
“gnt-cluster command” takes a command, so we enable command completion.
Reviewed-by: imsnah
Iustin Pop [Mon, 2 Mar 2009 14:30:26 +0000 (14:30 +0000)]
Release 2.0rc1
This patch updates the NEWS file and increases the version to 2.0 rc1.
Reviewed-by: ultrotter
Iustin Pop [Mon, 2 Mar 2009 12:19:45 +0000 (12:19 +0000)]
Export tags to cluster verify hooks
This patch exports the cluster and node tags to the cluster verify hook
scripts. The tags are exported as a space-separated list, which allows
easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do
...”) and therefore requires the previous “Don't allow spaces in tag
names” patch.
The patch also fixes a minor line length style problem.
Reviewed-by: ultrotter
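A short sketch of why the space-separated export above depends on tags without spaces. The build_hook_env helper is hypothetical; only the GANETI_CLUSTER_TAGS variable name comes from the commit message.

```python
def build_hook_env(cluster_tags):
    """Export tags as one space-separated variable (hypothetical helper)."""
    for tag in cluster_tags:
        if " " in tag:
            # A space inside a tag would be split into two tags by a hook.
            raise ValueError("tag %r contains a space" % tag)
    return {"GANETI_CLUSTER_TAGS": " ".join(sorted(cluster_tags))}


env = build_hook_env(["rack_a1", "prod"])
# A hook can recover the tags with a plain whitespace split.
tags = env["GANETI_CLUSTER_TAGS"].split()
```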
Iustin Pop [Mon, 2 Mar 2009 12:19:23 +0000 (12:19 +0000)]
Don't allow spaces in tag names
This patch restricts the use of spaces in tags, as this does not allow
nice exporting of tags to environment in hooks. One can use underscores
or dashes instead of spaces.
Reviewed-by: schreiberal
Iustin Pop [Mon, 2 Mar 2009 12:18:42 +0000 (12:18 +0000)]
Update the iallocator documentation
This updates the iallocator documentation to 2.0, bumps up the
iallocator version (and moves a constant to lib/constants.py), and
fixes a style issue in install.rst.
Reviewed-by: ultrotter
Iustin Pop [Mon, 2 Mar 2009 12:18:13 +0000 (12:18 +0000)]
Fix a bug in utils.EnsureDirs
This fixes a bug introduced in rev 2562 and also fixes the indentation.
Reviewed-by: ultrotter
Iustin Pop [Mon, 2 Mar 2009 09:51:15 +0000 (09:51 +0000)]
A doc update and a small indentation fix
This adds a small paragraph about the “master” role of a node, and fixes
a wrong indentation in the bash completion file.
Reviewed-by: imsnah
Guido Trotter [Fri, 27 Feb 2009 17:09:15 +0000 (17:09 +0000)]
Use EnsureDirs in KVM as well.
The KVM hypervisor also has code to ensure that a list of directories exists.
Substitute it with our new utils function.
Reviewed-by: iustinp
Guido Trotter [Fri, 27 Feb 2009 17:08:55 +0000 (17:08 +0000)]
Create runtime dir in bootstrap
Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init
time. This patch creates it in InitCluster just before hv parameter
checking. Since the code to create a list of directories is already repeated
twice in the code, and this would be the third time, we abstract it into
a utils.EnsureDirs function and we call that one from ganeti-noded,
ganeti-masterd and bootstrap.
Reviewed-by: iustinp
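A function like the one described above might look like the following sketch. The (path, mode) argument layout and the exception types are assumptions for illustration, not the actual utils.EnsureDirs code.

```python
import errno
import os


def ensure_dirs(dirs):
    """Create each (path, mode) directory unless it already exists."""
    for path, mode in dirs:
        try:
            os.mkdir(path, mode)
        except OSError as err:
            # An existing path is fine; anything else is a real error.
            if err.errno != errno.EEXIST:
                raise
        if not os.path.isdir(path):
            raise RuntimeError("%s is not a directory" % path)
```

Calling it twice with the same list is harmless, which is what makes it usable from several daemons and from bootstrap.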
Guido Trotter [Fri, 27 Feb 2009 17:08:31 +0000 (17:08 +0000)]
LUVerifyCluster: Handle the "no volume group" case
If we're only file based and our volume group is set to "None" there's
no point in asking nodes for their volume groups, logical volumes, and
drbd devices, and checking those.
Reviewed-by: iustinp
Iustin Pop [Fri, 27 Feb 2009 13:06:31 +0000 (13:06 +0000)]
Convert the RAPI document to restructured text
This patch changes the RAPI document, and the RAPI resources
autogenerated-documentation to restructured text. This meant changing
the autogen tool.
The new fragment can be included via RST directives, and doesn't need
passing through replace-sed-vars. This was also the last sgml document
in doc/, so we remove old makefile rules about it.
Reviewed-by: imsnah
Iustin Pop [Fri, 27 Feb 2009 13:06:15 +0000 (13:06 +0000)]
Fix some epydoc style issues
99% of the epydoc return tags are "@return:", but each of the modified files
had one "@returns:" line. We fix this for consistency.
Reviewed-by: imsnah
Iustin Pop [Fri, 27 Feb 2009 10:38:51 +0000 (10:38 +0000)]
Convert the install document to restructured text.
This switches back to the hardcoding of the version number, as we don't
yet have a wrapper for rst files that passes them through
replace-sed-vars.
Reviewed-by: imsnah
Iustin Pop [Thu, 26 Feb 2009 16:27:45 +0000 (16:27 +0000)]
Fix the Makefile after the bash_completion patch
I've somehow left these two out. Sorry!
Reviewed-by: imsnah
Iustin Pop [Thu, 26 Feb 2009 16:11:11 +0000 (16:11 +0000)]
Add bash-completion rules
This is a not-complete bash completion file for ganeti commands (gnt-*)
and the burnin tool. It is based on previous work by Minghua Ye
<yeminghua@google.com> for Ganeti 1.1, which wasn't used because the
lack of ssconf keys (which allow easy inspection by the shell of the
existing nodes and instances) made it too slow.
The file works as expected, however I realized that our custom (like
comma-separated, or a=b:c,e:f) options are not very nice for
auto-completion. There are a few FIXMEs in the source for that.
The file is not installed at make install time, but it should be put in
the correct place by packages.
Reviewed-by: imsnah
Michael Hanselmann [Thu, 26 Feb 2009 12:32:04 +0000 (12:32 +0000)]
Fix typos in utils.WriteFile's docstring
Reviewed-by: iustinp
Iustin Pop [Wed, 25 Feb 2009 15:03:43 +0000 (15:03 +0000)]
Fix mixed pvm/hvm clusters and instance listing
The current implementation of the combining of the instance lists will
only do this for instances for which all four fields match in both
hypervisors; however, this is broken for the dynamic fields (state,
times) which can change between the invocations of the two different
hypervisors if the instance is busy.
The patch checks only the memory and VCPUs, and makes mixed clusters
work even with 100% CPU instances.
Reviewed-by: imsnah
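The comparison rule from the mixed-cluster fix above, sketched under assumptions: the dictionary layout and the key names are invented for illustration, and raising on a mismatch is just one possible reaction.

```python
def merge_instance_info(per_hypervisor):
    """Merge {instance: info} dicts reported by several hypervisors.

    Only the static fields are compared; "state" and "time" may differ
    between two invocations when the instance is busy.
    """
    merged = {}
    for info_by_name in per_hypervisor:
        for name, info in info_by_name.items():
            if name not in merged:
                merged[name] = info
                continue
            for key in ("memory", "vcpus"):
                if merged[name][key] != info[key]:
                    raise ValueError("instance %s: %s differs" % (name, key))
    return merged
```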
Iustin Pop [Wed, 25 Feb 2009 15:03:24 +0000 (15:03 +0000)]
Fix xen-hvm and KERNEL_ARGS
xen-hvm doesn't have KERNEL_ARGS, and I just changed blindly all old
extra_args to HV_KERNEL_ARGS. This makes xen-hvm work again.
Reviewed-by: imsnah
Iustin Pop [Wed, 25 Feb 2009 12:50:46 +0000 (12:50 +0000)]
Update some version-related constants
Since we are quite close to final RPC and hooks APIs, we update the hooks and
protocol_version constants.
Reviewed-by: imsnah
Iustin Pop [Wed, 25 Feb 2009 11:24:10 +0000 (11:24 +0000)]
Convert the hooks document to restructured text
This also updates the hooks document to 2.0.
Reviewed-by: ultrotter
Iustin Pop [Wed, 25 Feb 2009 11:23:58 +0000 (11:23 +0000)]
Update some hooks settings
While reviewing the hooks document, I realised we are not correctly
exporting the instance properties.
This patch fixes:
- export the disk and disk template in all LUs, not only (hardcoded)
in the instance create
- removes the instance create INSTANCE_ prefix on some non-instance
variables (those are LU-related, not instance-related)
- adds a couple of more variables to other LUs
The hook document will be updated in a separate patch.
Reviewed-by: ultrotter
Iustin Pop [Tue, 24 Feb 2009 15:25:37 +0000 (15:25 +0000)]
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches the
instance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter on
all RPC calls is not generic and we also need to have a persistent value
here.
Reviewed-by: imsnah
Iustin Pop [Tue, 24 Feb 2009 15:25:01 +0000 (15:25 +0000)]
Simplify a little the hypervisor routines
Instead of “instance.hvparams”, we use a shorter “hvp” name to improve
readability.
Reviewed-by: imsnah
Iustin Pop [Tue, 24 Feb 2009 15:24:45 +0000 (15:24 +0000)]
Add definitions for the root_args hypervisor param
This patch adds a new hypervisor parameter for the hypervisors that can
actually start an instance with external kernels.
Reviewed-by: imsnah
Iustin Pop [Tue, 24 Feb 2009 13:57:53 +0000 (13:57 +0000)]
Convert iallocator.sgml to restructured text
This is a no-contents change, this doc will need update to conform to
2.0 message contents (and also the code will need increase to version 2
of the iallocator protocol).
Reviewed-by: imsnah
Iustin Pop [Tue, 24 Feb 2009 13:56:12 +0000 (13:56 +0000)]
Convert the admin guide to restructured text
The RST format holds a little bit less information, as all the <file
class="directory"> and <userinput> tags are gone, however we're not
really losing important context here. And it's way easier to read and
update.
Reviewed-by: imsnah
Guido Trotter [Tue, 24 Feb 2009 12:59:13 +0000 (12:59 +0000)]
gnt-instance info: remove hvattr descriptions
Having hvattr descriptions is only confusing for the user, because even
if they explain better what an attribute is about, they don't help in
deciding what keyword should be used to actually set it. If in the
future we want descriptions they should probably live in constants.py,
and be displayed together with the key, rather than instead of it.
This patch also changes the handling of the vnc console connection
description, making it work with both kvm and xen-hvm.
Reviewed-by: iustinp
Iustin Pop [Tue, 24 Feb 2009 11:23:30 +0000 (11:23 +0000)]
Make gnt-instance info work with offline nodes
This simply makes LUQueryInstanceData return the same information as for
a static query when one or both of the nodes are down.
Reviewed-by: imsnah
Guido Trotter [Fri, 20 Feb 2009 10:45:34 +0000 (10:45 +0000)]
dumb-allocator: avoid allocating on drained nodes
This was forgotten when drained nodes were added.
Reviewed-by: iustinp
Iustin Pop [Fri, 20 Feb 2009 09:56:33 +0000 (09:56 +0000)]
Also generate HTML format for the man pages
This would help in generating online-viewable docs, which could link to
the man pages.
Reviewed-by: imsnah
Iustin Pop [Thu, 19 Feb 2009 15:49:23 +0000 (15:49 +0000)]
Update version numbers to beta2
Note that the RAPI change is in a docstring (i.e. example), not in code.
Reviewed-by: ultrotter
Iustin Pop [Tue, 17 Feb 2009 12:44:02 +0000 (12:44 +0000)]
Show more details for failed xen commands
This patch also logs the output of the xm commands in case of failures;
some corner cases were forgotten in the last redo.
Reviewed-by: imsnah
Iustin Pop [Tue, 17 Feb 2009 12:43:51 +0000 (12:43 +0000)]
Update the install and admin documents
This is not a real update, just a quick pass changing the obvious parts.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 14:50:40 +0000 (14:50 +0000)]
QA: add support for burnin rename
This patch adds support for optionally doing the rename burnin test, and
adds an example to the sample QA file. To disable, either remove or
specify an empty rename target.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 14:50:30 +0000 (14:50 +0000)]
Fix some bugs in reboot
There are two issues fixed in this patch:
- first, the recent RPC changes caused loss of data in hard reboot
type; we weren't reporting any results from the stop/start instance
calls;
- second, in soft or hard reboots, we didn't initialize the disk
physical ID; based on the last state of the instance's disks, this
can create a failure in identifying the disks
After this patch, burnin works again with reboot, and reports errors
correctly.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 14:50:19 +0000 (14:50 +0000)]
Burnin: fix rename
In rename, we must stop different names in the first and second phases,
so we create two different opcodes for this purpose (instead of using
the same one twice, which doesn't work).
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 13:05:45 +0000 (13:05 +0000)]
Update NEWS for beta 2
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 12:17:18 +0000 (12:17 +0000)]
Convert IOErrors for /proc/drbd into our errors
If /proc/drbd can't be opened, this raises an IOError, but all the
error-handling behaviour in backend treats only BlockDeviceErrors. This
creates a plain failure in cluster verify and in other RPC calls.
This patch simply converts EnvironmentErrors into BlockDeviceErrors, and
also changes the RPC result for NV_DRBDLIST and its handling to be able
to show the error. The other RPC calls work by default now, due to the
existing error handling.
Reviewed-by: ultrotter
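The exception conversion described above can be illustrated as follows. BlockDeviceError here is a local stand-in class and read_proc_drbd is a hypothetical name; the real code lives in the Ganeti block device layer.

```python
class BlockDeviceError(Exception):
    """Stand-in for the project's own block device exception."""


def read_proc_drbd(path="/proc/drbd"):
    """Return the lines of the DRBD status file.

    Environment errors are converted so that callers which only handle
    BlockDeviceError still see a proper failure.
    """
    try:
        with open(path) as stat:
            return stat.read().splitlines()
    except EnvironmentError as err:
        raise BlockDeviceError("Can't read %s: %s" % (path, err))
```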
Guido Trotter [Mon, 16 Feb 2009 12:16:59 +0000 (12:16 +0000)]
DEVNOTES: we have no --enable-rapi anymore
Remove it from the suggested development ./configure line
Reviewed-by: iustinp
Guido Trotter [Mon, 16 Feb 2009 12:09:25 +0000 (12:09 +0000)]
Convert default root partition to msdos style
As discussed, with 2.0 the msdos partition style should be the default in the
instance OS, so we're changing the default instance params accordingly.
A followup patch will update the debootstrap os.
Reviewed-by: iustinp
Iustin Pop [Mon, 16 Feb 2009 11:08:18 +0000 (11:08 +0000)]
watcher: fix checking of boot IDs
The recent change (commit 2151) to the watcher to make it handle offline
nodes also saves the offline attribute to the state file, but this is
not needed and also breaks the checking of the boot ID. This patch
simply removes it, restoring the correct behaviour.
Reviewed-by: imsnah
Iustin Pop [Mon, 16 Feb 2009 11:08:10 +0000 (11:08 +0000)]
watcher: autoarchive old jobs
This patch adds auto-archiving of jobs older than 6 hours to the
watcher.
Reviewed-by: imsnah
Iustin Pop [Fri, 13 Feb 2009 16:17:05 +0000 (16:17 +0000)]
RAPI: documentation updates
This patch fixes the version and makes some updates to the RAPI resources
docs.
Reviewed-by: imsnah
Iustin Pop [Fri, 13 Feb 2009 15:54:41 +0000 (15:54 +0000)]
RAPI: fixes related to write mode
This patch fixes many small issues related to write functions:
- update documentation w.r.t. how to add users
- update the instance add function for latest API
- add instance delete
- fix addition of tags
- update some error messages
Reviewed-by: imsnah
Iustin Pop [Fri, 13 Feb 2009 15:35:05 +0000 (15:35 +0000)]
Some small improvements to the fake hypervisor
This patch modifies the fake hypervisor to subtract the memory “used”
by “running” instances from the free memory, so the actual node
information changes based on the running instances.
Also some style changes and fixes are added.
Reviewed-by: ultrotter
Iustin Pop [Fri, 13 Feb 2009 15:34:48 +0000 (15:34 +0000)]
Implement the backward-compatible ‘-s’ disk option
This patch adds back to the instance creation command (gnt-instance add,
gnt-backup import) the ‘-s’ short form option for specifying a
single-disk instance.
Also a small bug in gnt-backup import is fixed.
Reviewed-by: ultrotter
Guido Trotter [Fri, 13 Feb 2009 13:49:06 +0000 (13:49 +0000)]
SetInstanceParams: export nic changes to hooks
Currently we export the old instance "as is" and any nic changes get
lost, so hooks won't know of a different ip, bridge, or mac address.
This patch fixes it by putting the nics in the override dict, if any
changes are done.
Reviewed-by: iustinp
Guido Trotter [Fri, 13 Feb 2009 12:28:14 +0000 (12:28 +0000)]
Remove two fixed FIXME and convert one to TODO
The cli FIXME is not something broken, but rather some better handling
feature we'd rather have, and the two backend FIXME are done (disks have
their read only parameter set, and the error is raised and thus reaches
the master).
Reviewed-by: iustinp
Iustin Pop [Fri, 13 Feb 2009 11:38:26 +0000 (11:38 +0000)]
RAPI: format error messages as JSON
This patch changes the format of the HTTP error messages from text/html, which
is hard to parse from RAPI clients, to JSON which can be automatically parsed.
The error message is an object, which contains always three keys:
- code, an integer with the error code
- message, a short description
- explain, holding (if available) a description of the error
In order to implement this, there is a bit of change to the http server
and executor classes. I've tested and the error handling still works
(but less optimal, no error message) in case the error formatting itself
raises an exception.
Reviewed-by: imsnah
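An error body with the three keys listed above could be produced as in this sketch. The format_error name and the use of null for a missing explanation are assumptions; the commit message only fixes the key names.

```python
import json


def format_error(code, message, explain=None):
    """Serialize an HTTP error as a JSON object with three fixed keys."""
    return json.dumps({
        "code": code,          # integer error code
        "message": message,    # short description
        "explain": explain,    # longer description, if available
    })


body = format_error(404, "Not Found", "Nothing matches the given URI")
```

A client can then parse the body with any JSON library instead of scraping HTML.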
Iustin Pop [Fri, 13 Feb 2009 11:38:08 +0000 (11:38 +0000)]
Make RAPI return 502/504 errors for luxi errors
This changes the RAPI error codes for luxi errors; a timeout error is
now reported properly as 504, while any other luxi error is reported as
502.
It would be good to convert even more errors into proper return codes in
the future.
Reviewed-by: imsnah
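The status mapping described above, as a small sketch. The exception class names are placeholders rather than the real luxi classes, and the fallback to 500 for non-luxi errors is an assumption.

```python
class LuxiError(Exception):
    """Placeholder for a generic luxi failure."""


class LuxiTimeout(LuxiError):
    """Placeholder for a luxi timeout."""


def http_status_for(err):
    """Map an exception to the HTTP status reported by RAPI."""
    if isinstance(err, LuxiTimeout):
        return 504  # Gateway Timeout
    if isinstance(err, LuxiError):
        return 502  # Bad Gateway
    return 500      # assumed default for everything else
```

The timeout check has to come first because the timeout class is a subclass of the generic error in this sketch.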
Iustin Pop [Fri, 13 Feb 2009 11:37:57 +0000 (11:37 +0000)]
Fix ganeti-rapi startup with missing certificate
This patch displays a nicer error message compared to the default
stacktrace.
Reviewed-by: imsnah
Iustin Pop [Thu, 12 Feb 2009 18:13:23 +0000 (18:13 +0000)]
job queue: log the opcode error too
Currently we only log "Error in opcode ...", but we don't log the error itself.
This is not good for debugging.
Reviewed-by: ultrotter
Guido Trotter [Thu, 12 Feb 2009 17:35:43 +0000 (17:35 +0000)]
LUSetInstanceParams: Fix nic handling
CheckArguments:
Use constants.VALUE_NONE rather than hardcoding the string "none"
If we're adding a nic fill the nic_dict with default values
Check if the mac is syntactically valid, if we have one
Don't allow the mac to be 'auto' when modifying a nic
CheckPrereq:
Check that bridge and mac if present in the dict are not None
(before this wasn't handled at all)
Generate the nic mac address here if demanded
Exec:
Do not generate nics and macs
Reviewed-by: iustin
Guido Trotter [Thu, 12 Feb 2009 17:35:28 +0000 (17:35 +0000)]
ConfigWriter.AddInstance check instance mac
There is a race condition in CreateInstance, since the mac address is
generated early and only added to the config (and thus really assured to
be unique) only at this point. Since it's possible that another instance
gets the same mac address in the meantime with this check we'll make the
instance creation fail before modifying the config data and thus having
a wrong in-memory config (which is bad!!).
Note that the same race condition exists, for example, in
SetInstanceParams, and should be fully addressed by a way to revert
config changes if writing them fails!
Reviewed-by: iustin
Guido Trotter [Thu, 12 Feb 2009 17:35:10 +0000 (17:35 +0000)]
Instance Creation: generate nics earlier
We want the real nic to be shown to the hooks and the allocators, so
we'll generate them in CheckPrereq. We also write a comment about the
race condition we generate. This race condition existed even before, so
moving this generation will just lengthen it a bit. A separate patch
mitigates its effects.
This patch also adds an ENDIF comment for a very long if, and removes a
double empty line inside the CheckPrereq function of LUCreateInstance.
Reviewed-by: iustin
Iustin Pop [Thu, 12 Feb 2009 17:09:20 +0000 (17:09 +0000)]
Handle better broken disks
While running burnin:
File "/usr/lib/python2.4/site-packages/ganeti/objects.py", line 497, in __str__
val += ", size=%dm)>" % self.size
TypeError: int argument required
This happened while handling another error, so we lose the original
error information.
So we should try to handle this better.
Reviewed-by: ultrotter
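One way to keep string formatting from failing on a broken disk, as in the traceback above, is to fall back to a generic representation when the size is not an integer. This is only a sketch of the idea with a hypothetical helper, not the actual fix in objects.py.

```python
def format_size(size):
    """Format a disk size without raising on unexpected values."""
    if isinstance(size, int):
        return "size=%dm" % size
    # Broken disks may carry None or other junk; show it verbatim so the
    # original error being handled is not masked by a TypeError.
    return "size='%s'" % (size,)
```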
Iustin Pop [Thu, 12 Feb 2009 17:05:08 +0000 (17:05 +0000)]
Update the command line scripts man pages for 2.0
This patch updates the gnt-* scripts to show the new 2.0 syntax. It's
not guaranteed to be 80% complete.
Reviewed-by: ultrotter
Iustin Pop [Thu, 12 Feb 2009 17:04:45 +0000 (17:04 +0000)]
Some command line scripts fixes
This patch changes the gnt-node and gnt-job list commands to accept
arguments and list only the selected items, which is useful when having
many nodes or jobs.
It also removes the “--units” option from gnt-job list as we don't
actually use it.
Reviewed-by: imsnah
Iustin Pop [Thu, 12 Feb 2009 17:04:32 +0000 (17:04 +0000)]
Do not check 'None' disk IDs for duplicates
In case of 'None' logical or physical IDs, we don't need to check them
for duplicates. This case can happen for DRBD devices in case of newly
added disks, for example.
Reviewed-by: imsnah
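The duplicate check with the 'None' exception described above can be sketched like this; the function name and the flat list input are assumptions for illustration.

```python
def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, ignoring None."""
    seen = set()
    duplicates = []
    for item in ids:
        if item is None:
            # Newly added disks may not have an ID yet.
            continue
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates
```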
Iustin Pop [Thu, 12 Feb 2009 17:04:19 +0000 (17:04 +0000)]
Prevent race condition on MAC addresses
This patch adds a temporary set for MACs that have been requested but
are not yet in the configuration (as part of an instance NIC). The MACs
of an instance are automatically removed from this set when the instance
is updated (or first added to the config).
Reviewed-by: ultrotter
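The temporary reservation set described above might work along these lines. The MacPool class, its method names and the MAC prefix are invented for illustration; in Ganeti the logic is part of the configuration writer.

```python
import random


class MacPool:
    """Track MACs in the config plus MACs requested but not yet stored."""

    def __init__(self, prefix="aa:00:00"):
        self.prefix = prefix
        self.configured = set()
        self.temporary = set()

    def generate(self):
        """Return a MAC that is neither configured nor reserved."""
        while True:
            mac = "%s:%02x:%02x:%02x" % (
                self.prefix, random.randint(0, 255),
                random.randint(0, 255), random.randint(0, 255))
            if mac not in self.configured and mac not in self.temporary:
                self.temporary.add(mac)
                return mac

    def commit(self, macs):
        """Called when the owning instance is added or updated."""
        for mac in macs:
            self.temporary.discard(mac)
            self.configured.add(mac)
```

Between generate() and commit() the address is already protected, which closes the window in which a second instance could receive the same MAC.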
Iustin Pop [Thu, 12 Feb 2009 17:04:07 +0000 (17:04 +0000)]
Always use the same short option for iallocator
This patch changes the scripts so that the short name for the
“--iallocator” option is always ‘-I’.
Reviewed-by: ultrotter
Iustin Pop [Thu, 12 Feb 2009 17:03:58 +0000 (17:03 +0000)]
Some batcher fixes
Currently the batcher hypervisor parameter must be a dict with one
element (e.g. {"xen-hvm": { "acpi": true }}). This is overly complex and
hard to validate correctly; the patch splits it in two:
- one "hypervisor" string parameter, with the name of the hypervisor
- one "hvparams" dictionary, with the hypervisor parameters
The patch also changes the error handling in parsing the definition file
- since this is not a long-running file, we are less concerned with safe
closing of the file, and more with presenting meaningful error
messages.
Reviewed-by: killerfoxi
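To illustrate the split described above, the sketch below converts the old one-element dict into the two new parameters. The conversion helper itself is hypothetical (the patch changes the accepted format and does not necessarily ship a converter).

```python
def split_hypervisor_param(old_value):
    """Turn {"xen-hvm": {...}} into separate name and parameter fields."""
    if not isinstance(old_value, dict) or len(old_value) != 1:
        raise ValueError("expected a dict with exactly one element")
    (name, hvparams), = old_value.items()
    return {"hypervisor": name, "hvparams": dict(hvparams)}


new_style = split_hypervisor_param({"xen-hvm": {"acpi": True}})
```

The new form can be validated field by field: one string check and one dict check.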
Iustin Pop [Thu, 12 Feb 2009 17:03:46 +0000 (17:03 +0000)]
Some small fixes
This patch removes the admin_ram LUQueryInstances field (is broken
anyway) and fixes the VNC address checks in the Xen Hypervisor.
Reviewed-by: imsnah
Iustin Pop [Thu, 12 Feb 2009 17:03:33 +0000 (17:03 +0000)]
Fix LUQueryInstances fields.
The query fields are now regular expressions. We need to quote the dots,
otherwise invalid fields will be accepted but they will lose special
formatting in the cli scripts.
Reviewed-by: imsnah
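The effect of an unquoted dot is easy to reproduce. The field patterns below are illustrative only and are not claimed to be the exact expressions used by LUQueryInstances.

```python
import re


def matches_field(pattern, name):
    """Check a requested field name against a field pattern."""
    return re.match("^(?:%s)$" % pattern, name) is not None


UNQUOTED = r"disk.size/([0-9]+)"   # "." matches any character
QUOTED = r"disk\.size/([0-9]+)"    # "\." matches only a literal dot

accepted_by_mistake = matches_field(UNQUOTED, "diskXsize/0")
rejected = not matches_field(QUOTED, "diskXsize/0")
```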
Guido Trotter [Thu, 12 Feb 2009 09:15:52 +0000 (09:15 +0000)]
Apply the right permissions to /etc/hosts
In the current Ganeti version when modifying /etc/hosts we mistakenly
give it the permissions of the temporary file we create to define its
content, which is by default 0600. This breaks most non-root
applications, and thus must be corrected. This patch forces the mode to
be 0644 (but we might decide to just use the mode of the previous
/etc/hosts, if we want to be more polite towards any eventual
administrative choice). We also add a new assertFileMode() method for
unit tests and actually check in the SetEtcHostsEntry and
RemoveEtcHostsEntry tests that the mode is correct, to be sure not to
reintroduce this bug. Also, a FIXME is added in the original
functions stating that it would be nice to use WriteFile+fn() rather
than reimplementing its functionality again.
Reviewed-by: iustinp
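The permission problem above comes from replacing a file with a temporary file created with mode 0600. A minimal sketch of the corrected pattern follows, with a hypothetical helper name and the fixed 0644 mode taken from the commit message.

```python
import os
import tempfile


def write_file_with_mode(path, data, mode=0o644):
    """Atomically replace path with data and give it an explicit mode."""
    dir_name = os.path.dirname(path) or "."
    # mkstemp creates the file readable by its owner only (0600).
    fd, tmp_name = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as out:
            out.write(data)
        # Set the final mode before the rename makes the file visible.
        os.chmod(tmp_name, mode)
        os.rename(tmp_name, path)
    except Exception:
        os.unlink(tmp_name)
        raise
```

Reusing the mode of the previous file instead of a fixed value would be the "more polite" variant mentioned in the message.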
Iustin Pop [Thu, 12 Feb 2009 07:34:21 +0000 (07:34 +0000)]
Fix RPC result handling in _AssembleInstanceDisks
For (status, data)-style RPC calls, the result data is in the ‘payload’
attribute. This was missed in the conversion patch, with the only side
effect that gnt-instance activate-disks didn't show a nice output
anymore.
Reviewed-by: ultrotter
Iustin Pop [Thu, 12 Feb 2009 07:33:41 +0000 (07:33 +0000)]
Man page updates for the ganeti daemons.
This patch adds new man pages for the master and RAPI daemons, and
updates the node daemon and watcher man pages.
Reviewed-by: ultrotter
Iustin Pop [Thu, 12 Feb 2009 07:32:03 +0000 (07:32 +0000)]
master daemon: allow skipping the voting process
This patch introduces a 'force' mode for the master daemon startup where
the voting process is not done, but the user has to confirm manually the
startup (before forking, of course).
Reviewed-by: imsnah
Iustin Pop [Thu, 12 Feb 2009 07:31:26 +0000 (07:31 +0000)]
Remove a duplicate line in sed_vars
LOCALSTATEDIR is added twice to the sed variables.
Reviewed-by: imsnah
Iustin Pop [Thu, 12 Feb 2009 07:31:04 +0000 (07:31 +0000)]
ConfigWriter: add checks for duplicate disk IDs
This patch adds a safety check for duplicate disk logical/physical IDs,
in order to prevent possible software bugs.
Reviewed-by: imsnah