Michael Hanselmann [Fri, 7 Jan 2011 10:12:05 +0000 (11:12 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
Remove unused import from client.gnt_instance
gnt-instance console: Improve error reporting
Increase timeout for connection on remote import
import-export: Improve timeout error reporting
Conflicts:
lib/cmdlib.py: Trivial
lib/opcodes.py: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 6 Jan 2011 19:34:26 +0000 (20:34 +0100)]
Remove unused import from client.gnt_instance
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
David Knowles [Thu, 6 Jan 2011 22:09:47 +0000 (17:09 -0500)]
Updating hooks documentation with missing environment variables
Signed-off-by: David Knowles <dknowles@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 6 Jan 2011 15:38:31 +0000 (16:38 +0100)]
gnt-instance console: Improve error reporting
If the SSH command fails, this will give a more detailed error
message than before.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 25 Nov 2010 19:47:44 +0000 (20:47 +0100)]
Increase timeout for connection on remote import
The source cluster has to shut down an instance before it can be
exported. Doing so can take a while, but the default connection timeout
is only 60 seconds. Adding the shutdown timeout on the receiving cluster
should help.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit dae91d021ecba089c478163b25dc426abb589351)
Michael Hanselmann [Thu, 6 Jan 2011 16:36:38 +0000 (17:36 +0100)]
import-export: Improve timeout error reporting
When the source cluster takes too long to create a snapshot, the
destination would time out. Unfortunately no good error message was
written unless debug logging was enabled, not even to the log file. This
will be improved with this patch.
Another patch to be backported from master will hopefully avoid this
situation completely.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 6 Jan 2011 13:41:27 +0000 (14:41 +0100)]
List recorded powered state in gnt-node info
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 6 Jan 2011 12:13:23 +0000 (13:13 +0100)]
Support query of node field 'powered'
This field is based on OOB support and is only available if there's OOB
support for that node.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 17:57:10 +0000 (17:57 +0000)]
qa_group.py: reimplement query tests with qa_utils
Now that group queries use query2 infrastructure, update the QA tests to
use the generic functions in qa_utils.py.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Wed, 5 Jan 2011 16:42:59 +0000 (16:42 +0000)]
ganeti.query_unittest.py: add tests for group queries
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 12:04:15 +0000 (12:04 +0000)]
Convert “gnt-group list” to query2
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 12:40:51 +0000 (12:40 +0000)]
cmdlib.py: convert LUQueryGroups to query2
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 11:58:19 +0000 (11:58 +0000)]
cmdlib.py: move _GetQueryImplementation to end of file
_GetQueryImplementation() uses _QUERY_IMPL, which lists all query type
implementations. By moving it to the end of the file, we ensure all such
classes are defined by the time we list them.
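A minimal sketch of the ordering constraint described above. The class
names, the table contents and the lookup function are hypothetical
stand-ins, not the actual contents of cmdlib.py:

```python
# Hypothetical stand-ins for the query implementation classes.
class NodeQuery(object):
    pass


class GroupQuery(object):
    pass


# The table stores the class objects themselves, so it can only be built
# after every class statement above has been executed. Placing it (and the
# function using it) at the end of the module guarantees that.
_QUERY_IMPL = {
    "node": NodeQuery,
    "group": GroupQuery,
}


def get_query_implementation(name):
    """Return the implementation class registered for a resource name."""
    try:
        return _QUERY_IMPL[name]
    except KeyError:
        raise ValueError("Unknown query resource '%s'" % name)
```

Had the table been defined above GroupQuery, importing the module would
fail with a NameError.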
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 11:56:49 +0000 (11:56 +0000)]
query.py: add definitions for node group queries
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 5 Jan 2011 12:01:43 +0000 (12:01 +0000)]
constants.py: define QR_GROUP resource
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Thu, 6 Jan 2011 12:27:40 +0000 (12:27 +0000)]
ganeti.query_unittest.py: test lock fields too
Additionally, change TestQueryFields.testSomeFields() to handle lists of
fields shorter than 20 elements.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 5 Jan 2011 13:19:29 +0000 (14:19 +0100)]
lvmstrap: also test sysfs holders
If a device has entries in its holder directory
(/sys/block/$name/holders), it means that some kernel system "uses"
that device, and hence should not be considered available.
This patch adds a new 'in-use' check based on this sysfs test, and
introduces an overall InUse function for devices.
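A rough sketch of such a holders test, not the code from lvmstrap. The
function name and the sysfs_root parameter are assumptions made for
illustration (the parameter only exists so the check can be tried against
a scratch directory):

```python
import os


def has_sysfs_holders(name, sysfs_root="/sys/block"):
    """Return True if the device's holders directory lists any entry."""
    holders_dir = os.path.join(sysfs_root, name, "holders")
    try:
        return bool(os.listdir(holders_dir))
    except OSError:
        # Missing or unreadable directory: this check alone cannot tell
        # whether the device is in use, so report no holders.
        return False
```

An overall in-use decision would presumably combine this with the other
existing checks; that combination is not shown here.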
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Wed, 5 Jan 2011 12:56:00 +0000 (13:56 +0100)]
lvmstrap: add support for non-partitioned md disks
This patch, originally written by Marc Schmitt <mschmitt@google.com>,
adds support for MD devices (used in a non-partitioned mode). I
abstracted all the original startswith('md') checks into separate
functions, and also moved the supported disk types to a list.
Proper "in-use" detection also needs another check, which will come in
a subsequent patch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Tue, 4 Jan 2011 09:59:05 +0000 (10:59 +0100)]
RPC: mark jobqueue functions as URGENT
Recently, we've seen more and more cases of a specific breakage
pattern in Ganeti: master candidates which are semi-alive (as in, they
respond to ping, they can complete a TCP/SSL handshake, but otherwise
the root filesystem is broken) cause lots of confusion within masterd.
My analysis shows that waiting up to 5 minutes for a reply from such a
broken master candidate is too long, and this long wait breaks other
timeouts (e.g. the Luxi timeout), making standard recovery from this
situation very hard. It's much easier to kill the master daemon,
manually edit the config file and mark the node as regular, then restart
the master daemon.
The proposal is therefore to reduce the timeout for the job queue
functions to TMO_URGENT (1 minute), which should be more balanced
between a working but overloaded node and a broken node.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 6 Jan 2011 10:27:23 +0000 (11:27 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
cfgupgrade: Remove unused “program” variable
cfgupgrade: Check master name, clarify question
Makefile: Merge build-time reST copying
Move doc/upgrade.rst to UPGRADE, copy at build-time
Import upgrade notes into documentation
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Thu, 6 Jan 2011 10:25:32 +0000 (11:25 +0100)]
cfgupgrade: Remove unused “program” variable
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
René Nussbaumer [Thu, 6 Jan 2011 10:06:34 +0000 (11:06 +0100)]
QA: Remove 'oob_program=default' on gnt-cluster modify
On cluster level there's no 'default' because it's the highest cascading
level. Because of this, 'default' is a valid value there and doesn't mean
removing the value from the dict, as it does on the other levels. To
overcome this we just empty the string.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Thu, 6 Jan 2011 10:08:11 +0000 (11:08 +0100)]
Convert “gnt-debug locks” to query2
Locks can now be queried using “Query(what="lock", …)” over LUXI.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 17:41:59 +0000 (18:41 +0100)]
cfgupgrade: Check master name, clarify question
- Check hostname and abort if it doesn't match contents of
“ssconf_master_node”, can be overridden using “--ignore-hostname”
parameter.
- Clarify confirmation question and don't mention instances anymore.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 17:52:29 +0000 (18:52 +0100)]
Makefile: Merge build-time reST copying
No need to copy this snippet around, “make” can work harder for us.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 17:48:29 +0000 (18:48 +0100)]
Move doc/upgrade.rst to UPGRADE, copy at build-time
This will allow distributions to install the file as text documentation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 15:22:33 +0000 (16:22 +0100)]
Import upgrade notes into documentation
This patch formats the upgrade notes currently in the wiki[1] as reST
and adds them to the documentation.
[1] http://code.google.com/p/ganeti/wiki/UpgradeNotes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 16:45:54 +0000 (17:45 +0100)]
Fix OpSetInstanceParams.disk_template check
When moving the opcode parameters I moved two or three checks from an
opcode's CheckArguments function to the type checks. This was one of
them and unfortunately I didn't notice the parameter can be None.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 5 Jan 2011 11:54:15 +0000 (12:54 +0100)]
RAPI: Add resource to grow instance disk
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Adeodato Simo [Tue, 4 Jan 2011 21:01:29 +0000 (21:01 +0000)]
Reword "one of hmgt" as "one of h/m/g/t" for clarity
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Tue, 4 Jan 2011 15:21:32 +0000 (16:21 +0100)]
QA: Adding new cluster verify cases
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 4 Jan 2011 12:34:49 +0000 (13:34 +0100)]
out of band verification in gnt-cluster verify
This adds the verify tests for out of band management
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 4 Jan 2011 10:15:53 +0000 (11:15 +0100)]
Adding additional VerifyNode checks to backend
This adds checks for out of band support. The helpers have to exist and
they have to be executable.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Tue, 4 Jan 2011 17:30:29 +0000 (18:30 +0100)]
RAPI: Add resource to modify cluster
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 4 Jan 2011 16:29:50 +0000 (17:29 +0100)]
baserlib: Add function for filling opcodes
This function makes use of the opcode parameters which now live
directly in the opcode. A number of RAPI resources can now be simplified.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Dec 2010 15:42:49 +0000 (16:42 +0100)]
Improve opcode summary tests
Test full summary instead of just format.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Dec 2010 15:39:53 +0000 (16:39 +0100)]
Migrate code verifying opcode parameters to base class
This allows the function to be used in other places as well.
An optional parameter is added to control whether default
values should be set. Unittests are added, providing full
coverage.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Dec 2010 17:22:05 +0000 (18:22 +0100)]
Improve tests for OP_ID
… by detecting duplicates.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Dec 2010 15:50:43 +0000 (16:50 +0100)]
cmdlib: Remove opcode parameters
Remove the parameter definitions and use those from the opcode classes
instead. Small style changes are also made (empty lines, wrapping).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Dec 2010 15:50:17 +0000 (16:50 +0100)]
opcodes: Add opcode parameter definitions
This is the first step for migrating them from cmdlib. A metaclass is
used to define “__slots__” upon class creation time (not instantiation).
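A minimal sketch of the technique, written for Python 3 (code of that era
would use the Python 2 "__metaclass__" attribute instead). The names
_AutoSlots, BaseOp, OpExample and the layout of OP_PARAMS are assumptions
for illustration, not the definitions in opcodes.py:

```python
class _AutoSlots(type):
    """Metaclass deriving __slots__ from a class-level parameter list."""

    def __new__(mcs, name, bases, attrs):
        params = attrs.get("OP_PARAMS", [])
        # This runs once, when the class statement is executed, and not
        # each time an instance is created.
        attrs["__slots__"] = [pname for (pname, _) in params]
        return type.__new__(mcs, name, bases, attrs)


class BaseOp(metaclass=_AutoSlots):
    OP_PARAMS = []

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class OpExample(BaseOp):
    # (parameter name, default value) pairs; hypothetical layout.
    OP_PARAMS = [("instance_name", None), ("force", False)]
```

Because every class in the hierarchy gets "__slots__", instances have no
"__dict__" and assigning an attribute that is not a declared parameter
raises AttributeError.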
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 29 Dec 2010 17:03:36 +0000 (18:03 +0100)]
query2: Add new field status “offline”
This allows “gnt-node list” to show the difference between nodes marked
offline and nodes with e.g. RPC errors (“(nodata)”). node1 is the
master, node2's node daemon crashed and node3 is marked offline:
$ gnt-node list -o name,offline,dtotal,dfree
Node              Offline DTotal    DFree
node1.example.com N       1.3T      1.3T
node2.example.com N       (nodata)  (nodata)
node3.example.com Y       (offline) (offline)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 30 Dec 2010 17:48:36 +0000 (18:48 +0100)]
QA: Fix out-of-band tests
- Handle situations with no non-master node
- Expand node name to make test work when configuration just has short
names (e.g. “node1” instead of “node1.example.com”)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Tue, 4 Jan 2011 14:32:16 +0000 (15:32 +0100)]
Add unittests for ht module
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 4 Jan 2011 14:40:40 +0000 (15:40 +0100)]
ht.TInt: Exclude boolean values
See inline comment.
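The inline comment is not reproduced in this log. One relevant fact is
that in Python "bool" is a subclass of "int", so a plain isinstance()
check accepts True and False. A sketch of a check excluding them (the
function name is hypothetical and this is not the ht module's code):

```python
def is_int(value):
    """Accept integers but reject booleans, which subclass int."""
    return isinstance(value, int) and not isinstance(value, bool)
```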
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 4 Jan 2011 12:45:04 +0000 (13:45 +0100)]
Cleanup bootstrap.SetupNodeDaemon
- Code formatting
- Use ShellQuote for one argument
- Remove variables no longer used after commit
9294514d
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 31 Dec 2010 12:39:10 +0000 (13:39 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
Fix typo in gnt-instance manpage
jqueue: Fix cancelling while in waitlock in queue
cli: Extend message for LUXI timeouts
Fix timeout handling in LUXI client
Conflicts:
man/gnt-instance.sgml: Trivial merge in gnt-instance.rst
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Dec 2010 12:08:16 +0000 (13:08 +0100)]
Fix build errors with ganeti-listrunner
- Remove non-ASCII character from manpage
- Reformat docstring for epydoc in script
These caused build breakage on some but not all distributions.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 31 Dec 2010 12:11:05 +0000 (13:11 +0100)]
Fix typo in gnt-instance manpage
s/os-name/os-type/. This was reported in issue 133.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 29 Dec 2010 16:42:28 +0000 (17:42 +0100)]
cli: Change “<…>” in query output to “(…)”
This should reduce the amount of damage in case of accidental copy &
paste.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 29 Dec 2010 16:35:52 +0000 (17:35 +0100)]
Initial import of listrunner
This tool was used and worked on internally for quite a long time. We
decided to include it in Ganeti.
Known issues:
- Code doesn't match rest of Ganeti (e.g. using “print” all over the
place, hardcoded calls to sys.exit deep in functions)
- Code duplication from Ganeti library (e.g. PingByTcp/netutils.TcpPing,
GetHosts/utils.ReadFile)
- Using ssh-agent doesn't work with more than one worker (Paramiko keeps
the socket open and the file descriptor is used from different
workers)
- No clear separation between parent and child process in code
- Uses getopt instead of optparse
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 21 Dec 2010 18:10:32 +0000 (19:10 +0100)]
jqueue: Fix cancelling while in waitlock in queue
Since the recent change to leave jobs in the “waitlock” status (commit
5fd6b6947), cancelling a job while it's back in the queue would break.
This patch handles these cases and adds a unittest.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 24 Dec 2010 08:27:15 +0000 (09:27 +0100)]
LUInstanceRename: log result of name resolving
While the LU does return the final name, it's useful to log the actual
DNS resolving process (input and output) in order to help with the
diagnosis of failures.
The patch also fixes the docstring of the Exec() function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Tue, 21 Dec 2010 16:39:42 +0000 (17:39 +0100)]
Fix QA for “list-fields” commands
The list of fields is not only sorted, but sorted in a nice way.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 21 Dec 2010 16:37:25 +0000 (17:37 +0100)]
Remove utils.FormatTimestampWithTZ
Long story short: time.strftime("%Z", time.localtime()) doesn't work,
even though it's documented to be equivalent to time.strftime("%Z").
$ TZ=America/Sao_Paulo python -c 'import time; print time.strftime("%Z"), time.strftime("%Z", time.localtime())'
BRST LMT
References:
http://bugs.python.org/issue762963
https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/564607
http://stackoverflow.com/questions/4367896/issue-with-timezone-with-time-strftime
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 21 Dec 2010 16:34:43 +0000 (17:34 +0100)]
Ensure temp files from RunCmd tests are removed
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Tue, 21 Dec 2010 13:18:39 +0000 (14:18 +0100)]
Allow customisation of the disk index separator
As per issue 124, some Xen versions (or packaging) don't deal nicely
with the colon being part of a disk name. Therefore we add a
configure-time option for customising this.
Note: setting the separator to interesting values like / is not
handled by the code. This being a configure-time option (e.g. to be
set by distribution packagers), we assume the person building the code
knows what they are doing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Michael Hanselmann [Fri, 17 Dec 2010 17:38:40 +0000 (18:38 +0100)]
utils: Timezone fixes and tests
- Update docstrings to explicitly mention Epoch
- Fix timezone bug in FormatTimestampWithTZ, where it would
use GMT/UTC when it should use the local timezone
- Add unittests for time formatting functions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 16 Dec 2010 17:48:10 +0000 (18:48 +0100)]
query: Add wrapper for creating response object
It'll be used for querying locks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 16 Dec 2010 16:38:19 +0000 (17:38 +0100)]
Move QueryFields to query module
Also replace “sorted” with “utils.NiceSort” now that it supports a key
function.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 21:23:13 +0000 (22:23 +0100)]
cli: Extend message for LUXI timeouts
Point out that jobs already submitted continue to run.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 19:20:18 +0000 (20:20 +0100)]
Fix timeout handling in LUXI client
If the socket can't be read in time, it raises “socket.timeout”, for
which there is special handling code. Unfortunately the exception block
was in the wrong order and “socket.error” caught it before.
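The ordering problem can be shown with a small standalone sketch; it is
not the LUXI client code and the function name is made up. Since
"socket.timeout" is a subclass of "socket.error", the more specific
handler has to come first:

```python
import socket


def classify(exc):
    """Raise the given exception and report which handler caught it."""
    try:
        raise exc
    except socket.timeout:
        # Must precede the socket.error handler, otherwise the generic
        # handler below would catch timeouts too.
        return "timeout"
    except socket.error:
        return "error"
```

With the two except clauses swapped, every timeout would be reported as
a generic socket error.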
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 14:34:11 +0000 (15:34 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
Prepare 2.3.1 release
Fix disk status verification in LUClusterVerify
Conflicts:
NEWS: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 14:18:36 +0000 (15:18 +0100)]
Merge branch 'stable-2.3' into devel-2.3
* stable-2.3:
Prepare 2.3.1 release
Fix disk status verification in LUClusterVerify
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 13:43:16 +0000 (14:43 +0100)]
Add QA scripts to checked Python code
pylint is not yet included as the code needs some work for that.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 13:38:53 +0000 (14:38 +0100)]
ganeti-qa: Wrap lines longer than 80 chars
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 20 Dec 2010 13:15:19 +0000 (14:15 +0100)]
Prepare 2.3.1 release
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 16 Dec 2010 14:09:15 +0000 (15:09 +0100)]
Adapt QA for change in behaviour
As we can't test this on master anymore (if we flag the node offline we
would change master role on master) we use the first non-master node we
find in the configuration.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 16 Dec 2010 14:19:06 +0000 (15:19 +0100)]
gnt-node modify: Adding --node-powered=yes|no
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 16 Dec 2010 14:16:30 +0000 (15:16 +0100)]
LUSetNodeParams: Add support for powered state
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 16 Dec 2010 13:55:59 +0000 (14:55 +0100)]
LUSetNodeParams/LUOobCommand respect offline/powered
This patch makes sure we cross verify the state the node is
in with our view:
power off -> Node has to be set offline
modify -O no -> Node has to be powered
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Thu, 16 Dec 2010 13:55:11 +0000 (14:55 +0100)]
gnt-node power: Mark also offline when powering off
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Fri, 17 Dec 2010 16:08:41 +0000 (17:08 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
QA: Run cluster-verify as part of all instance tests
QA: Fix typo and add “not”
ensure-dirs: Speed up when using big queues
Fix gnt-cluster verify with diskless instances
Conflicts:
lib/cmdlib.py: Trivial
qa/ganeti-qa.py: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 17 Dec 2010 14:05:57 +0000 (15:05 +0100)]
utils.NiceSort: Use sorted(), add keyfunc, unittests
This patch changes utils.NiceSort to use the built-in “sorted()” and
gets rid of the intermediate list. Instead of wrapping the items
ourselves, a key function is used. The caller can specify another key
function (useful to sort objects by their name, e.g.
“utils.NiceSort(instances, key=operator.attrgetter("name"))”.
Unittests are provided.
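A sketch of the general approach (natural sorting via a key function
passed to “sorted()”), written for Python 3. This is an illustration
under assumptions, not the implementation in utils; the helper names and
the exact splitting rule are made up:

```python
import re

_DIGITS_RE = re.compile(r"(\d+)")


def _nice_key(text):
    """Split text into alternating non-digit and digit chunks."""
    parts = _DIGITS_RE.split(text)
    # With a capturing group, odd positions always hold the digit runs;
    # converting them to int makes "node2" sort before "node10".
    return [int(p) if i % 2 else p for (i, p) in enumerate(parts)]


def nice_sort(items, key=None):
    """Return a new list sorted in natural order.

    The optional key function extracts the string to sort by.
    """
    if key is None:
        keyfunc = _nice_key
    else:
        keyfunc = lambda item: _nice_key(key(item))
    return sorted(items, key=keyfunc)
```

No intermediate list of wrapped items is needed; “sorted()” calls the key
function once per item.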
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Thu, 16 Dec 2010 14:19:52 +0000 (15:19 +0100)]
QA: Run cluster-verify as part of all instance tests
“gnt-cluster verify” looks at some per-instance information as well, so
it should be run for each instance type QA tests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 15 Dec 2010 19:03:18 +0000 (20:03 +0100)]
QA: Fix typo and add “not”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Iustin Pop [Fri, 17 Dec 2010 14:20:57 +0000 (15:20 +0100)]
ShutdownInstanceDisks: accept offline secondaries
For a secondary node that is offline, we should not consider that the
disk shutdown has failed, as it can never succeed under this cluster
state and (by virtue of the fact that the secondary node is offline)
the disks are already "shutdown".
The patch also fixes a tiny typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Iustin Pop [Fri, 17 Dec 2010 12:20:10 +0000 (13:20 +0100)]
RpcResult: simplify some asserts
data ≫ code, eom.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Michael Hanselmann [Wed, 15 Dec 2010 17:53:34 +0000 (18:53 +0100)]
ensure-dirs: Speed up when using big queues
The “ensure-dirs” script as included in Ganeti 2.3 is very slow when
working with big queues requiring a change of permissions on many or all
files.
$ find /var/lib/ganeti/queue/ | wc -l
52354
Before this change:
$ time /usr/local/lib/ganeti/ensure-dirs -f
real 16m4.739s
While not addressed in this patch, I'd like to record the overall
inefficiency of the “ensure-dirs” script, even after this change:
$ time /usr/local/lib/ganeti/ensure-dirs -f
real 5m57.362s
[…]
$ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.08    5.147090          49    104774           clone
 49.92    5.131094          49    104739           execve
More changes will be needed. Just for comparison, a small Python
snippet changing permissions on all files (“ensure-dirs” changes the
owner too):
$ time python -c 'import os; from ganeti import utils;
[os.chmod(i, 0644) for i in
utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
real 0m0.605s
[…]
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Thu, 16 Dec 2010 12:37:44 +0000 (13:37 +0100)]
LUAddNode: default ndparams to empty dict when not provided
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 14 Dec 2010 16:15:18 +0000 (17:15 +0100)]
QA: Add some basic OOB tests
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Tue, 14 Dec 2010 16:14:55 +0000 (17:14 +0100)]
QA: Allow upload of string data
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adeodato Simo [Wed, 15 Dec 2010 17:40:30 +0000 (17:40 +0000)]
Fix gnt-cluster verify with diskless instances
`gnt-cluster verify` was failing with KeyError if there was any
diskless instance in the cluster. This was because _CollectDiskInfo()
was not including these instances in the returned dictionary, but they
were expected to be present in LUVerifyCluster.Exec().
With this commit, we ensure that the dictionary returned by _CollectDiskInfo
includes entries for diskless instances as well.
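The shape of the fix can be sketched as follows. The function name and
the data layout (instance name mapped to a list of disk identifiers, plus
a dictionary of reported statuses) are assumptions for illustration and
do not reflect the real _CollectDiskInfo() signature:

```python
def collect_disk_status(instances, reported):
    """Build a per-instance dictionary of disk statuses.

    instances: dict mapping instance name to a list of disk identifiers
    reported: dict mapping (instance name, disk) to a status value
    """
    info = {}
    for name, disks in instances.items():
        # Create the entry up front so that instances with an empty disk
        # list still appear in the result.
        info[name] = {}
        for disk in disks:
            info[name][disk] = reported.get((name, disk))
    return info
```

A caller iterating over all instances can then index the result without
hitting KeyError for diskless ones.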
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Miguel Di Ciurcio Filho [Tue, 14 Dec 2010 13:18:29 +0000 (11:18 -0200)]
Fix N+1 error message
The error contained a typo and is slightly cumbersome. It changes from:
- ERROR: node a: not enough memory on to accommodate failovers should peer node
b fail
to:
- ERROR: node a: not enough memory to accomodate instance failovers should node
b fail
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
René Nussbaumer [Wed, 15 Dec 2010 15:18:44 +0000 (16:18 +0100)]
Rename (Op|LU)OutOfBand to (Op|LU)OobCommand
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Wed, 15 Dec 2010 13:45:31 +0000 (14:45 +0100)]
Merge branch 'devel-2.3'
* devel-2.3:
jqueue: Keep jobs in “waitlock” while returning to queue
Improve jqueue unittests
Update manpages to display version 2.3
Conflicts:
man/ganeti-cleaner.sgml: Removed
man/ganeti-confd.sgml: Removed
man/ganeti-masterd.sgml: Removed
man/ganeti-noded.sgml: Removed
man/ganeti-os-interface.sgml: Removed
man/ganeti-rapi.sgml: Removed
man/ganeti-watcher.sgml: Removed
man/ganeti.sgml: Removed
man/gnt-backup.sgml: Removed
man/gnt-cluster.sgml: Removed
man/gnt-debug.sgml: Removed
man/gnt-instance.sgml: Removed
man/gnt-job.sgml: Removed
man/gnt-node.sgml: Removed
man/gnt-os.sgml: Removed
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 14 Dec 2010 16:56:39 +0000 (17:56 +0100)]
jqueue: Keep jobs in “waitlock” while returning to queue
Iustin Pop reported that a job's file is updated many times while it
waits for locks held by other thread(s). After an investigation it was
concluded that the reason was a design decision for job priorities to
return jobs to the “queued” status if they couldn't acquire all locks.
Changing a job's status or priority requires an update to permanent
storage.
In a high-level view this is what happens:
1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a
crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool
Another option originally discussed was to leave the job in the
“waitlock” status. Ignoring priority changes, this is what would happen:
1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool
Now let's assume the lock is released by the other thread:
[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk
As this change reduces the number of writes from two per lock acquire
attempt to two per opcode and one per priority increase (as happens
after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until
the highest priority is reached), here's the patch to implement it.
Unittests are updated.
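The two flows above can be modelled with a toy write counter. This is a
simplified sketch that ignores priority changes; the class and function
names are hypothetical, and the old flow's behaviour on a successful
acquire (a second write when the job starts running) is an assumption,
since the message only lists the failing case:

```python
class Job(object):
    """Toy job tracking its status and the number of disk writes."""

    def __init__(self):
        self.status = "queued"
        self.writes = 0

    def write(self):
        self.writes += 1


def attempt_old(job, got_locks):
    """Old flow: mark waitlock and write on every acquire attempt."""
    job.status = "waitlock"
    job.write()
    # Assumption: success also results in one write for "running".
    job.status = "running" if got_locks else "queued"
    job.write()


def attempt_new(job, got_locks):
    """New flow: stay in waitlock between attempts."""
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        job.write()
    if got_locks:
        assert job.status == "waitlock"
        job.status = "running"
        job.write()


def count_writes(attempt, failed_attempts):
    """Run failed_attempts failing attempts followed by one success."""
    job = Job()
    for _ in range(failed_attempts):
        attempt(job, False)
    attempt(job, True)
    return job.writes
```

In this model, three failed attempts followed by a successful one cost
eight writes in the old flow and two in the new one, matching the "two
per opcode" figure when no priority change occurs.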
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Mon, 13 Dec 2010 17:32:27 +0000 (18:32 +0100)]
Improve jqueue unittests
- Verify job file updates
- Ensure queue lock is released while executing opcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Wed, 15 Dec 2010 10:15:03 +0000 (11:15 +0100)]
Adding gnt-node power * commands
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
René Nussbaumer [Wed, 15 Dec 2010 10:13:40 +0000 (11:13 +0100)]
Do the expanding of the node name in ExpandNames
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann [Tue, 14 Dec 2010 15:07:28 +0000 (16:07 +0100)]
client.gnt_node: Remove unnecessary lambda
Pylint complained that the “lambda may not be necessary”. Turns out it
was right.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Fri, 10 Dec 2010 15:45:20 +0000 (16:45 +0100)]
QA: Extend unittests for query operations, add tests for list-fields
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 8 Dec 2010 18:55:08 +0000 (19:55 +0100)]
Update NEWS for new query infrastructure
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 8 Dec 2010 19:41:43 +0000 (20:41 +0100)]
Convert “gnt-instance list” to query2
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 8 Dec 2010 17:54:50 +0000 (18:54 +0100)]
Convert “gnt-node list” to query2
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Michael Hanselmann [Wed, 8 Dec 2010 17:55:50 +0000 (18:55 +0100)]
cli: Add infrastructure for query2
A new function for formatting the query results is added,
``FormatTable``. This was determined to be easier and safer than
modifying the existing ``GenerateTable`` function while keeping
backwards compatibility for code not yet converted. The new code makes
use of the enhanced information provided by query2.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Miguel Di Ciurcio Filho [Mon, 13 Dec 2010 19:07:34 +0000 (17:07 -0200)]
Update manpages to display version 2.3
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Balazs Lecz [Mon, 13 Dec 2010 19:30:03 +0000 (19:30 +0000)]
iallocator: Export node group allocation policy
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Mon, 13 Dec 2010 14:13:53 +0000 (15:13 +0100)]
Adding --node-powered command line flag
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Mon, 13 Dec 2010 14:55:11 +0000 (15:55 +0100)]
Set powered to True for added nodes
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
René Nussbaumer [Mon, 13 Dec 2010 14:07:39 +0000 (15:07 +0100)]
Set recorded powered state for OOB calls
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>