ganeti-local
Adeodato Simo [Thu, 6 Jan 2011 12:27:40 +0000 (12:27 +0000)]
ganeti.query_unittest.py: test lock fields too

Additionally, change TestQueryFields.testSomeFields() to handle lists of
fields shorter than 20 elements.

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 5 Jan 2011 13:19:29 +0000 (14:19 +0100)]
lvmstrap: also test sysfs holders

If a device has entries in its holder directory
(/sys/block/$name/holders), it means that some kernel subsystem "uses"
that device, and hence it should not be considered available.

This patch adds a new 'in-use' check based on this sysfs test, and
introduces an overall InUse function for devices.
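
A minimal sketch of the holders test described above, written for
Python 3 and not taken from the patch. The function name has_holders and
the sysfs_root parameter are hypothetical; the parameter exists only so
the check can be tried against a scratch directory.

```python
import os


def has_holders(name, sysfs_root="/sys/block"):
    """Return True if <sysfs_root>/<name>/holders has at least one entry.

    Hypothetical helper: a non-empty holders directory is taken to mean
    that something in the kernel (e.g. device-mapper or md) uses the
    device.
    """
    holders_dir = os.path.join(sysfs_root, name, "holders")
    try:
        return len(os.listdir(holders_dir)) > 0
    except OSError:
        # Directory missing or unreadable: report "no holders found".
        return False
```

A caller would combine this with the other in-use checks, treating the
device as unavailable as soon as any one of them reports True.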

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Wed, 5 Jan 2011 12:56:00 +0000 (13:56 +0100)]
lvmstrap: add support for non-partitioned md disks

This patch, originally written by Marc Schmitt <mschmitt@google.com>,
adds support for MD devices (used in a non-partitioned mode). I
abstracted all the original startswith('md') checks into separate
functions, and also moved the supported disk types to a list.

Proper "in-use" detection also needs another check, which will come in
a subsequent patch.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Tue, 4 Jan 2011 09:59:05 +0000 (10:59 +0100)]
RPC: mark jobqueue functions as URGENT

Recently, we've seen more and more cases of a specific breakage
pattern in Ganeti: master candidates which are semi-alive (as in, they
respond to ping, they can complete a TCP/SSL handshake, but otherwise
the root filesystem is broken) cause lots of confusion within masterd.

My analysis shows that waiting up to 5 minutes for a reply from such a
broken master candidate is too long, and this long wait breaks other
timeouts (e.g. the Luxi timeout), making standard recovery from this
situation very hard. It's much easier to kill the master daemon,
manually edit the config file and mark the node as regular, then restart
the master daemon.

The proposal is therefore to reduce the timeout for the job queue
functions to TMO_URGENT (1 minute), which should be more balanced
between a working but overloaded node and a broken node.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 6 Jan 2011 10:27:23 +0000 (11:27 +0100)]
Merge branch 'devel-2.3'

* devel-2.3:
  cfgupgrade: Remove unused “program” variable
  cfgupgrade: Check master name, clarify question
  Makefile: Merge build-time reST copying
  Move doc/upgrade.rst to UPGRADE, copy at build-time
  Import upgrade notes into documentation

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Thu, 6 Jan 2011 10:25:32 +0000 (11:25 +0100)]
cfgupgrade: Remove unused “program” variable

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

René Nussbaumer [Thu, 6 Jan 2011 10:06:34 +0000 (11:06 +0100)]
QA: Remove 'oob_program=default' on gnt-cluster modify

At cluster level there is no 'default' because it is the highest
cascading level. Hence 'default' is a valid value there and does not
mean "remove the value from the dict" as it does on the other levels.
To work around this, we just use an empty string.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Thu, 6 Jan 2011 10:08:11 +0000 (11:08 +0100)]
Convert “gnt-debug locks” to query2

Locks can now be queried using “Query(what="lock", …)” over LUXI.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 17:41:59 +0000 (18:41 +0100)]
cfgupgrade: Check master name, clarify question

- Check hostname and abort if it doesn't match contents of
  “ssconf_master_node”, can be overridden using “--ignore-hostname”
  parameter.
- Clarify confirmation question and don't mention instances anymore.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 17:52:29 +0000 (18:52 +0100)]
Makefile: Merge build-time reST copying

No need to copy this snippet around, “make” can work harder for us.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 17:48:29 +0000 (18:48 +0100)]
Move doc/upgrade.rst to UPGRADE, copy at build-time

This will allow distributions to install the file as text documentation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 15:22:33 +0000 (16:22 +0100)]
Import upgrade notes into documentation

This patch formats the upgrade notes currently in the wiki[1] as reST
and adds them to the documentation.

[1] http://code.google.com/p/ganeti/wiki/UpgradeNotes

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 16:45:54 +0000 (17:45 +0100)]
Fix OpSetInstanceParams.disk_template check

When moving the opcode parameters I moved two or three checks from an
opcode's CheckArguments function to the type checks. This was one of
them and unfortunately I didn't notice the parameter can be None.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 5 Jan 2011 11:54:15 +0000 (12:54 +0100)]
RAPI: Add resource to grow instance disk

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adeodato Simo [Tue, 4 Jan 2011 21:01:29 +0000 (21:01 +0000)]
Reword "one of hmgt" as "one of h/m/g/t" for clarity

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Tue, 4 Jan 2011 15:21:32 +0000 (16:21 +0100)]
QA: Adding new cluster verify cases

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 4 Jan 2011 12:34:49 +0000 (13:34 +0100)]
out of band verification in gnt-cluster verify

This adds the verify tests for out-of-band management.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 4 Jan 2011 10:15:53 +0000 (11:15 +0100)]
Adding additional VerifyNode checks to backend

This adds checks for out of band support. The helpers have to exist and
they have to be executable.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Tue, 4 Jan 2011 17:30:29 +0000 (18:30 +0100)]
RAPI: Add resource to modify cluster

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 4 Jan 2011 16:29:50 +0000 (17:29 +0100)]
baserlib: Add function for filling opcodes

This function makes use of the opcode parameters which now live
directly in the opcode. A number of RAPI resources can now be simplified.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 31 Dec 2010 15:42:49 +0000 (16:42 +0100)]
Improve opcode summary tests

Test full summary instead of just format.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 31 Dec 2010 15:39:53 +0000 (16:39 +0100)]
Migrate code verifying opcode parameters to base class

This allows the function to be used in other places as well.
An optional parameter is added to control whether default
values should be set. Unittests are added, providing full
coverage.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Dec 2010 17:22:05 +0000 (18:22 +0100)]
Improve tests for OP_ID

… by detecting duplicates.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Dec 2010 15:50:43 +0000 (16:50 +0100)]
cmdlib: Remove opcode parameters

Remove the parameter definitions and use those from the opcode classes
instead. Small style changes are also made (empty lines, wrapping).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Dec 2010 15:50:17 +0000 (16:50 +0100)]
opcodes: Add opcode parameter definitions

This is the first step for migrating them from cmdlib. A metaclass is
used to define “__slots__” upon class creation time (not instantiation).
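
To illustrate the idea of filling in “__slots__” when a class is
created, here is a small Python 3 sketch. It is not the code from this
patch: the names AutoSlots, BaseOp, GrowDisk and the OP_PARAMS attribute
are assumptions chosen for the example.

```python
class AutoSlots(type):
    """Metaclass that derives __slots__ from a parameter list.

    Illustrative only; OP_PARAMS is a hypothetical attribute name.
    """

    def __new__(mcs, name, bases, attrs):
        # Runs once per class statement, i.e. at class creation time,
        # not each time an instance is made.
        params = attrs.get("OP_PARAMS", [])
        attrs["__slots__"] = tuple(params)
        return super().__new__(mcs, name, bases, attrs)


class BaseOp(metaclass=AutoSlots):
    OP_PARAMS = []


class GrowDisk(BaseOp):
    OP_PARAMS = ["instance_name", "disk", "amount"]
```

With this in place, assigning to a misspelled attribute on a GrowDisk
instance raises AttributeError instead of silently creating a new
attribute, because no class in the hierarchy provides a “__dict__”.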

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Dec 2010 17:03:36 +0000 (18:03 +0100)]
query2: Add new field status “offline”

This allows “gnt-node list” to show the difference between nodes marked
offline and nodes with e.g. RPC errors (“(nodata)”). In the example
below, node1 is the master, node2's node daemon crashed and node3 is
marked offline:

$ gnt-node list -o name,offline,dtotal,dfree
Node              Offline    DTotal     DFree
node1.example.com N            1.3T      1.3T
node2.example.com N        (nodata)  (nodata)
node3.example.com Y       (offline) (offline)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 30 Dec 2010 17:48:36 +0000 (18:48 +0100)]
QA: Fix out-of-band tests

- Handle situations with no non-master node
- Expand node name to make test work when configuration just has short
  names (e.g. “node1” instead of “node1.example.com”)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Tue, 4 Jan 2011 14:32:16 +0000 (15:32 +0100)]
Add unittests for ht module

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 4 Jan 2011 14:40:40 +0000 (15:40 +0100)]
ht.TInt: Exclude boolean values

See inline comment.
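
The inline comment is not reproduced in this log. One plausible
motivation, stated here as an assumption, is that bool is a subclass of
int in Python, so a plain isinstance check accepts True and False. The
sketch below shows that behaviour; is_int is a hypothetical name, not
the ht module's function.

```python
def is_int(val):
    """Accept integers but reject booleans (hypothetical helper)."""
    # isinstance(True, int) is True because bool subclasses int,
    # so booleans have to be excluded explicitly.
    return isinstance(val, int) and not isinstance(val, bool)
```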

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 4 Jan 2011 12:45:04 +0000 (13:45 +0100)]
Cleanup bootstrap.SetupNodeDaemon

- Code formatting
- Use ShellQuote for one argument
- Remove variables no longer used after commit 9294514d

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Fri, 31 Dec 2010 12:39:10 +0000 (13:39 +0100)]
Merge branch 'devel-2.3'

* devel-2.3:
  Fix typo in gnt-instance manpage
  jqueue: Fix cancelling while in waitlock in queue
  cli: Extend message for LUXI timeouts
  Fix timeout handling in LUXI client

Conflicts:
  man/gnt-instance.sgml: Trivial merge in gnt-instance.rst

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 31 Dec 2010 12:08:16 +0000 (13:08 +0100)]
Fix build errors with ganeti-listrunner

- Remove non-ASCII character from manpage
- Reformat docstring for epydoc in script

These caused build breakage on some but not all distributions.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 31 Dec 2010 12:11:05 +0000 (13:11 +0100)]
Fix typo in gnt-instance manpage

s/os-name/os-type/. This was reported in issue 133.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Dec 2010 16:42:28 +0000 (17:42 +0100)]
cli: Change “<…>” in query output to “(…)”

This should reduce the amount of damage in case of accidental copy &
paste.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 29 Dec 2010 16:35:52 +0000 (17:35 +0100)]
Initial import of listrunner

This tool was used and worked on internally for quite a long time. We
decided to include it in Ganeti.

Known issues:
- Code doesn't match rest of Ganeti (e.g. using “print” all over the
  place, hardcoded calls to sys.exit deep in functions)
- Code duplication from Ganeti library (e.g. PingByTcp/netutils.TcpPing,
  GetHosts/utils.ReadFile)
- Using ssh-agent doesn't work with more than one worker (Paramiko keeps
  the socket open and the file descriptor is used from different
  workers)
  - No clear separation between parent and child process in code
- Uses getopt instead of optparse

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 21 Dec 2010 18:10:32 +0000 (19:10 +0100)]
jqueue: Fix cancelling while in waitlock in queue

Since the recent change to leave jobs in the “waitlock” status (commit
5fd6b6947), cancelling a job while it's back in the queue would break.
This patch handles these cases and adds a unittest.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Fri, 24 Dec 2010 08:27:15 +0000 (09:27 +0100)]
LUInstanceRename: log result of name resolving

While the LU does return the final name, it's useful to log the actual
DNS resolving process (input and output) in order to help diagnose
failures.

The patch also fixes the docstring of the Exec() function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Tue, 21 Dec 2010 16:39:42 +0000 (17:39 +0100)]
Fix QA for “list-fields” commands

The list of fields is not only sorted, but sorted in a nice way.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 21 Dec 2010 16:37:25 +0000 (17:37 +0100)]
Remove utils.FormatTimestampWithTZ

Long story short: time.strftime("%Z", time.localtime()) doesn't work,
even though it's documented to be equivalent to time.strftime("%Z").

$ TZ=America/Sao_Paulo python -c 'import time; print time.strftime("%Z"), time.strftime("%Z", time.localtime())'
BRST LMT

References:
http://bugs.python.org/issue762963
https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/564607
http://stackoverflow.com/questions/4367896/issue-with-timezone-with-time-strftime

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 21 Dec 2010 16:34:43 +0000 (17:34 +0100)]
Ensure temp files from RunCmd tests are removed

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Tue, 21 Dec 2010 13:18:39 +0000 (14:18 +0100)]
Allow customisation of the disk index separator

As per issue 124, some Xen versions (or packaging) don't deal nicely
with the colon being part of a disk name. Therefore we add a
configure-time option for customising this.

Note: setting the separator to interesting values like / is not
handled by the code. This being a configure-time option (e.g. to be
set by distribution packagers), we assume the person building the code
knows what they are doing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Michael Hanselmann [Fri, 17 Dec 2010 17:38:40 +0000 (18:38 +0100)]
utils: Timezone fixes and tests

- Update docstrings to explicitly mention Epoch
- Fix timezone bug in FormatTimestampWithTZ, where it would
  use GMT/UTC when it should use the local timezone
- Add unittests for time formatting functions

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 16 Dec 2010 17:48:10 +0000 (18:48 +0100)]
query: Add wrapper for creating response object

It'll be used for querying locks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 16 Dec 2010 16:38:19 +0000 (17:38 +0100)]
Move QueryFields to query module

Also replace “sorted” with “utils.NiceSort” now that it supports a key
function.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 21:23:13 +0000 (22:23 +0100)]
cli: Extend message for LUXI timeouts

Point out that jobs already submitted continue to run.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 19:20:18 +0000 (20:20 +0100)]
Fix timeout handling in LUXI client

If the socket can't be read in time, it raises “socket.timeout”, for
which there is special handling code. Unfortunately the exception
handlers were in the wrong order and the “socket.error” handler caught
it first.
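
The ordering problem can be reproduced with a few lines of Python. This
is a sketch of the general pattern, not the LUXI client code; the two
function names are hypothetical. It relies on “socket.timeout” being a
subclass of “socket.error”, which holds in both Python 2.6+ and
Python 3.

```python
import socket


def classify_wrong_order(exc):
    """Handlers in the problematic order: the base class comes first."""
    try:
        raise exc
    except socket.error:
        return "generic error"
    except socket.timeout:
        # Never reached: the handler above already matches timeouts.
        return "timeout"


def classify_fixed_order(exc):
    """Handlers with the more specific exception first."""
    try:
        raise exc
    except socket.timeout:
        return "timeout"
    except socket.error:
        return "generic error"
```

Python tries “except” clauses top to bottom and takes the first match,
so the subclass must be listed before its base class.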

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 14:34:11 +0000 (15:34 +0100)]
Merge branch 'devel-2.3'

* devel-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify

Conflicts:
  NEWS: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 14:18:36 +0000 (15:18 +0100)]
Merge branch 'stable-2.3' into devel-2.3

* stable-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 13:43:16 +0000 (14:43 +0100)]
Add QA scripts to checked Python code

pylint is not yet included as the code needs some work for that.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 20 Dec 2010 13:38:53 +0000 (14:38 +0100)]
ganeti-qa: Wrap lines longer than 80 chars

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Tag: v2.3.1
Michael Hanselmann [Mon, 20 Dec 2010 13:15:19 +0000 (14:15 +0100)]
Prepare 2.3.1 release

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Thu, 16 Dec 2010 14:09:15 +0000 (15:09 +0100)]
Adapt QA for change in behaviour

As we can't test this on the master anymore (flagging the node offline
would change the master role on the master), we use the first
non-master node we find in the configuration.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Thu, 16 Dec 2010 14:19:06 +0000 (15:19 +0100)]
gnt-node modify: Adding --node-powered=yes|no

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Thu, 16 Dec 2010 14:16:30 +0000 (15:16 +0100)]
LUSetNodeParams: Add support for powered state

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Thu, 16 Dec 2010 13:55:59 +0000 (14:55 +0100)]
LUSetNodeParams/LUOobCommand respect offline/powered

This patch makes sure we cross verify the state the node is
in with our view:

power off -> Node has to be set offline
modify -O no -> Node has to be powered

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Thu, 16 Dec 2010 13:55:11 +0000 (14:55 +0100)]
gnt-node power: Mark also offline when powering off

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Fri, 17 Dec 2010 16:08:41 +0000 (17:08 +0100)]
Merge branch 'devel-2.3'

* devel-2.3:
  QA: Run cluster-verify as part of all instance tests
  QA: Fix typo and add “not”
  ensure-dirs: Speed up when using big queues
  Fix gnt-cluster verify with diskless instances

Conflicts:
  lib/cmdlib.py: Trivial
  qa/ganeti-qa.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Michael Hanselmann [Fri, 17 Dec 2010 14:05:57 +0000 (15:05 +0100)]
utils.NiceSort: Use sorted(), add keyfunc, unittests

This patch changes utils.NiceSort to use the built-in “sorted()” and
gets rid of the intermediate list. Instead of wrapping the items
ourselves, a key function is used. The caller can specify another key
function (useful to sort objects by their name, e.g.
“utils.NiceSort(instances, key=operator.attrgetter("name"))”).
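
A rough Python 3 sketch of a natural sort built on “sorted()” with a key
function follows. It is not the utils.NiceSort implementation; the name
nice_sort and the digit-splitting approach are assumptions made for
illustration.

```python
import re

# The capturing group keeps the digit runs in the split result.
_DIGITS = re.compile(r"(\d+)")


def nice_sort(items, key=None):
    """Sort strings so that embedded numbers compare numerically.

    Hypothetical stand-in for a "nice" sort; `key` extracts the string
    to sort by from each item.
    """
    def sort_key(item):
        text = item if key is None else key(item)
        parts = _DIGITS.split(text)
        # Splitting on a capturing group alternates text and digits:
        # even indices are text, odd indices are digit runs.
        return [int(p) if i % 2 else p for i, p in enumerate(parts)]

    return sorted(items, key=sort_key)
```

For objects, the caller passes an accessor such as
operator.attrgetter("name"), mirroring the usage quoted in the message.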

Unittests are provided.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Thu, 16 Dec 2010 14:19:52 +0000 (15:19 +0100)]
QA: Run cluster-verify as part of all instance tests

“gnt-cluster verify” looks at some per-instance information as well, so
it should be run in the QA tests for each instance type.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 15 Dec 2010 19:03:18 +0000 (20:03 +0100)]
QA: Fix typo and add “not”

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Iustin Pop [Fri, 17 Dec 2010 14:20:57 +0000 (15:20 +0100)]
ShutdownInstanceDisks: accept offline secondaries

For a secondary node that is offline, we should not consider that the
disk shutdown has failed, as it can never succeed under this cluster
state and (by virtue of the fact that the secondary node is offline)
the disks are already "shutdown".

The patch also fixes a tiny typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Iustin Pop [Fri, 17 Dec 2010 12:20:10 +0000 (13:20 +0100)]
RpcResult: simplify some asserts

data ≫ code, eom.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Michael Hanselmann [Wed, 15 Dec 2010 17:53:34 +0000 (18:53 +0100)]
ensure-dirs: Speed up when using big queues

The “ensure-dirs” script as included in Ganeti 2.3 is very slow when
working with big queues requiring a change of permissions on many or all
files.

$ find /var/lib/ganeti/queue/ | wc -l
52354

Before this change:
$ time /usr/local/lib/ganeti/ensure-dirs -f
real    16m4.739s

While not addressed in this patch, I'd like to record the overall
inefficiency of the “ensure-dirs” script, even after this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    5m57.362s
[…]
$ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.08    5.147090          49    104774           clone
 49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python
snippet changing permissions on all files (“ensure-dirs” changes the
owner too):

$ time python -c 'import os; from ganeti import utils;
[os.chmod(i, 0644) for i in
utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
real    0m0.605s
[…]
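
The strace counts above work out to roughly two clone/execve calls per
queue file, which suggests that process creation dominates the runtime.
As a sketch of the in-process alternative (Python 3, POSIX permissions
assumed), the hypothetical helper below walks a tree and only calls
chmod where the mode differs. It is not the “ensure-dirs” code and does
not change ownership.

```python
import os
import stat


def chmod_tree(root, file_mode=0o644):
    """Set file_mode on regular files below root; return the number changed.

    Hypothetical helper: no external processes are spawned, and files
    that already have the wanted mode are left alone.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                # Skip symlinks; chmod would act on the link target.
                continue
            if stat.S_IMODE(st.st_mode) != file_mode:
                os.chmod(path, file_mode)
                changed += 1
    return changed
```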

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Thu, 16 Dec 2010 12:37:44 +0000 (13:37 +0100)]
LUAddNode: default ndparams to empty dict when not provided

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 14 Dec 2010 16:15:18 +0000 (17:15 +0100)]
QA: Add some basic OOB tests

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

René Nussbaumer [Tue, 14 Dec 2010 16:14:55 +0000 (17:14 +0100)]
QA: Allow upload of string data

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Adeodato Simo [Wed, 15 Dec 2010 17:40:30 +0000 (17:40 +0000)]
Fix gnt-cluster verify with diskless instances

`gnt-cluster verify` was failing with KeyError if there was any
diskless instance in the cluster. This was because _CollectDiskInfo()
was not including these instances in the returned dictionary, but they
were expected to be present in LUVerifyCluster.Exec().

With this commit, we ensure that the dictionary returned by _CollectDiskInfo
includes entries for diskless instances as well.
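
The failure mode can be shown with a simplified model in Python 3. This
is not the _CollectDiskInfo code: the function name, the
include_diskless switch and the data layout (instance name mapped to a
list of disk states) are assumptions made for the example.

```python
def collect_disk_status(instance_disks, include_diskless=True):
    """Build a per-instance mapping of disk states.

    Simplified, hypothetical model. With include_diskless=False an
    instance without disks never gets a key, which is the situation
    that leads to a KeyError in the consumer.
    """
    result = {}
    for name, disks in instance_disks.items():
        if include_diskless:
            # Create the entry up front so diskless instances appear too.
            result[name] = []
        for disk in disks:
            result.setdefault(name, []).append(disk)
    return result
```

A consumer that indexes the result by every known instance name works
only when the entry for a diskless instance exists, even if it is empty.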

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Miguel Di Ciurcio Filho [Tue, 14 Dec 2010 13:18:29 +0000 (11:18 -0200)]
Fix N+1 error message

The error contained a typo and is slightly cumbersome. It changes from:

- ERROR: node a: not enough memory on to accommodate failovers should peer node
  b fail

to:

- ERROR: node a: not enough memory to accomodate instance failovers should node
  b fail

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

René Nussbaumer [Wed, 15 Dec 2010 15:18:44 +0000 (16:18 +0100)]
Rename (Op|LU)OutOfBand to (Op|LU)OobCommand

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Wed, 15 Dec 2010 13:45:31 +0000 (14:45 +0100)]
Merge branch 'devel-2.3'

* devel-2.3:
  jqueue: Keep jobs in “waitlock” while returning to queue
  Improve jqueue unittests
  Update manpages to display version 2.3

Conflicts:
  man/ganeti-cleaner.sgml: Removed
  man/ganeti-confd.sgml: Removed
  man/ganeti-masterd.sgml: Removed
  man/ganeti-noded.sgml: Removed
  man/ganeti-os-interface.sgml: Removed
  man/ganeti-rapi.sgml: Removed
  man/ganeti-watcher.sgml: Removed
  man/ganeti.sgml: Removed
  man/gnt-backup.sgml: Removed
  man/gnt-cluster.sgml: Removed
  man/gnt-debug.sgml: Removed
  man/gnt-instance.sgml: Removed
  man/gnt-job.sgml: Removed
  man/gnt-node.sgml: Removed
  man/gnt-os.sgml: Removed

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Tue, 14 Dec 2010 16:56:39 +0000 (17:56 +0100)]
jqueue: Keep jobs in “waitlock” while returning to queue

Iustin Pop reported that a job's file is updated many times while it
waits for locks held by other thread(s). After an investigation it was
concluded that the reason was a design decision for job priorities to
return jobs to the “queued” status if they couldn't acquire all locks.
Changing a job's status or priority requires an update to permanent
storage.

In a high-level view this is what happens:
1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a
   crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the
“waitlock” status. Ignoring priority changes, this is what would happen:
1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:
[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

This change reduces the number of writes from two per lock acquire
attempt to two per opcode plus one per priority increase (which happens
after 24 acquire attempts, see mcpu._CalculateLockAttemptTimeouts, until
the highest priority is reached). Here's the patch to implement it.
Unittests are updated.
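
The write counts can be checked with a back-of-the-envelope model in
Python. It is a simplification under stated assumptions: one opcode, a
number of failed lock attempts followed by one successful attempt, and
hypothetical function names. It does not reflect the actual jqueue code.

```python
def writes_before(failed_attempts):
    """Old scheme: every attempt writes twice.

    A failed attempt writes "waitlock" and then "queued"; the final
    successful attempt writes "waitlock" and then "running".
    """
    return 2 * (failed_attempts + 1)


def writes_after(failed_attempts, priority_increases=0):
    """New scheme: the job stays in "waitlock" between attempts.

    One write when entering "waitlock", one when switching to
    "running", plus one per priority increase. The number of failed
    attempts no longer matters on its own.
    """
    return 2 + priority_increases
```

For ten failed attempts without a priority change, this model gives 22
writes before the change and 2 after it.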

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Michael Hanselmann [Mon, 13 Dec 2010 17:32:27 +0000 (18:32 +0100)]
Improve jqueue unittests

- Verify job file updates
- Ensure queue lock is released while executing opcode

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Wed, 15 Dec 2010 10:15:03 +0000 (11:15 +0100)]
Adding gnt-node power * commands

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

René Nussbaumer [Wed, 15 Dec 2010 10:13:40 +0000 (11:13 +0100)]
Do the expanding of the node name in ExpandNames

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoclient.gnt_node: Remove unnecessary lambda
Michael Hanselmann [Tue, 14 Dec 2010 15:07:28 +0000 (16:07 +0100)]
client.gnt_node: Remove unnecessary lambda

Pylint complained that the “lambda may not be necessary”. Turns out it
was right.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoQA: Extend unittests for query operations, add tests for list-fields
Michael Hanselmann [Fri, 10 Dec 2010 15:45:20 +0000 (16:45 +0100)]
QA: Extend unittests for query operations, add tests for list-fields

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoUpdate NEWS for new query infrastructure
Michael Hanselmann [Wed, 8 Dec 2010 18:55:08 +0000 (19:55 +0100)]
Update NEWS for new query infrastructure

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoConvert “gnt-instance list” to query2
Michael Hanselmann [Wed, 8 Dec 2010 19:41:43 +0000 (20:41 +0100)]
Convert “gnt-instance list” to query2

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoConvert “gnt-node list” to query2
Michael Hanselmann [Wed, 8 Dec 2010 17:54:50 +0000 (18:54 +0100)]
Convert “gnt-node list” to query2

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agocli: Add infrastructure for query2
Michael Hanselmann [Wed, 8 Dec 2010 17:55:50 +0000 (18:55 +0100)]
cli: Add infrastructure for query2

A new function for formatting the query results is added,
``FormatTable``. This was determined to be easier and safer than
modifying the existing ``GenerateTable`` function while keeping
backwards compatibility for code not yet converted. The new code makes
use of the enhanced information provided by query2.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoUpdate manpages to display version 2.3
Miguel Di Ciurcio Filho [Mon, 13 Dec 2010 19:07:34 +0000 (17:07 -0200)]
Update manpages to display version 2.3

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoiallocator: Export node group allocation policy
Balazs Lecz [Mon, 13 Dec 2010 19:30:03 +0000 (19:30 +0000)]
iallocator: Export node group allocation policy

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdding --node-powered command line flag
René Nussbaumer [Mon, 13 Dec 2010 14:13:53 +0000 (15:13 +0100)]
Adding --node-powered command line flag

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoSet powered to True for added nodes
René Nussbaumer [Mon, 13 Dec 2010 14:55:11 +0000 (15:55 +0100)]
Set powered to True for added nodes

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoSet recorded powered state for OOB calls
René Nussbaumer [Mon, 13 Dec 2010 14:07:39 +0000 (15:07 +0100)]
Set recorded powered state for OOB calls

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd new Node attribute powered
René Nussbaumer [Mon, 13 Dec 2010 13:52:46 +0000 (14:52 +0100)]
Add new Node attribute powered

This is just a state-of-record field and does not necessarily
reflect reality.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd ganeti-kvm-poweroff.initd to .gitignore
Adeodato Simo [Mon, 13 Dec 2010 19:16:20 +0000 (19:16 +0000)]
Add ganeti-kvm-poweroff.initd to .gitignore

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMore QA tests for group operations
Adeodato Simo [Thu, 9 Dec 2010 16:15:58 +0000 (16:15 +0000)]
More QA tests for group operations

This adds QA tests for the SetGroupParams operation, both for CLI and
RAPI. Additionally, it adds tests for add/rename/remove groups via RAPI,
which had not been included in a previous patch series. Finally, it also
tests setting "alloc_policy" (and, for the CLI, "ndparams") at group
creation time.

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd some unit tests for ConfigWriter.AddNodeGroup()
Adeodato Simo [Thu, 9 Dec 2010 16:45:55 +0000 (16:45 +0000)]
Add some unit tests for ConfigWriter.AddNodeGroup()

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoExpose OpSetGroupParams in RAPI and RAPI client
Adeodato Simo [Thu, 9 Dec 2010 15:43:54 +0000 (15:43 +0000)]
Expose OpSetGroupParams in RAPI and RAPI client

This creates the /2/groups/<name>/modify resource; at the moment, only the
"alloc_policy" attribute can be modified.

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd the "alloc_policy" attribute to node groups
Adeodato Simo [Thu, 9 Dec 2010 15:16:23 +0000 (15:16 +0000)]
Add the "alloc_policy" attribute to node groups

This can be set at group creation time and via OpSetGroupParams. The default
is "preferred", and existing node groups from previous Ganeti versions will
get the attribute set to this value.
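
The upgrade behaviour described above could look roughly like the
following sketch. The NodeGroup class and its upgrade_config method
are hypothetical stand-ins, not the real objects module; only the
attribute name and the "preferred" default come from this message.

```python
DEFAULT_ALLOC_POLICY = "preferred"


class NodeGroup:
    """Hypothetical, minimal stand-in for a node group object."""

    def __init__(self, name, alloc_policy=None):
        self.name = name
        # None models a group loaded from a configuration written before
        # the attribute existed.
        self.alloc_policy = alloc_policy

    def upgrade_config(self):
        """Fill in the default policy for groups that lack the attribute."""
        if self.alloc_policy is None:
            self.alloc_policy = DEFAULT_ALLOC_POLICY


old_group = NodeGroup("default")
old_group.upgrade_config()

# "some-other-policy" is a placeholder value, not a documented policy name.
new_group = NodeGroup("rack2", alloc_policy="some-other-policy")
new_group.upgrade_config()
```

After the upgrade step, old_group carries the default while new_group
keeps the value it was created with.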

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd modification of node groups (OpCode/LU/CLI)
Adeodato Simo [Thu, 9 Dec 2010 12:55:17 +0000 (12:55 +0000)]
Add modification of node groups (OpCode/LU/CLI)

With this commit, only modification of the "ndparams" attribute is
supported.

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoIntroduce OpAddGroup.ndparams and expose in CLI
Adeodato Simo [Wed, 8 Dec 2010 19:49:24 +0000 (19:49 +0000)]
Introduce OpAddGroup.ndparams and expose in CLI

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoFix sorting bug in LUQueryGroups
Adeodato Simo [Thu, 9 Dec 2010 14:25:51 +0000 (14:25 +0000)]
Fix sorting bug in LUQueryGroups

In LUQueryGroups.Exec(), NiceSort was being applied to group UUIDs, and
not to group names. We use a temporary name to UUID map to sort the list
of UUIDs by group name instead.
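
A minimal sketch of the idea, assuming group names are unique. The
helper names below are hypothetical, and nice_sort_key only
approximates a natural sort such as NiceSort; it is not the utils
implementation.

```python
import re


def nice_sort_key(name):
    """Split digit runs from text so that 'group10' sorts after 'group2'."""
    # re.split with a capturing group alternates text and digit parts,
    # always starting with a (possibly empty) text part, so keys stay
    # comparable position by position.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def sort_uuids_by_name(groups):
    """Return group UUIDs ordered by group name.

    groups maps UUID -> group name.
    """
    # Temporary name -> UUID map, used only for sorting.
    name_to_uuid = {name: uuid for uuid, name in groups.items()}
    return [name_to_uuid[name]
            for name in sorted(name_to_uuid, key=nice_sort_key)]


groups = {"uuid-c": "group2", "uuid-a": "group10", "uuid-b": "group1"}
by_name = sort_uuids_by_name(groups)
```

Sorting the UUIDs directly would order them as uuid-a, uuid-b, uuid-c,
which has nothing to do with the names shown to the user.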

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agocmdlib: Sort list of fields for QueryFields
Michael Hanselmann [Wed, 8 Dec 2010 19:20:41 +0000 (20:20 +0100)]
cmdlib: Sort list of fields for QueryFields

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoobjects: Add custom de-/serializing code for query responses
Michael Hanselmann [Wed, 8 Dec 2010 17:54:14 +0000 (18:54 +0100)]
objects: Add custom de-/serializing code for query responses

… and use them in cmdlib.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoLUXI: Add Query and QueryFields functions
Michael Hanselmann [Mon, 6 Dec 2010 21:21:14 +0000 (22:21 +0100)]
LUXI: Add Query and QueryFields functions

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoobjects: Add definitions for query requests and responses
Michael Hanselmann [Mon, 6 Dec 2010 21:17:49 +0000 (22:17 +0100)]
objects: Add definitions for query requests and responses

Also update description of QueryFieldDefinition.name.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoqlang: Add function to build simple filter
Michael Hanselmann [Wed, 8 Dec 2010 17:53:07 +0000 (18:53 +0100)]
qlang: Add function to build simple filter

This will be used in clients to build the filters for query2.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoquery: Handle items missing timestamps
Michael Hanselmann [Fri, 10 Dec 2010 17:25:04 +0000 (18:25 +0100)]
query: Handle items missing timestamps

In upgraded configurations, some items might lack the “ctime” and/or
“mtime” values and need to be handled specially.
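
One way to handle such items is sketched below. The UNAVAILABLE
marker, the Item class and get_timestamp are hypothetical names used
for illustration only; the actual query module may represent missing
values differently.

```python
# Hypothetical marker for "this field has no value for this item".
UNAVAILABLE = object()


class Item:
    """Hypothetical config item; ctime/mtime may be absent after an upgrade."""

    def __init__(self, ctime=None, mtime=None):
        self.ctime = ctime
        self.mtime = mtime


def get_timestamp(item, attr):
    """Return the timestamp stored in attr, or UNAVAILABLE if it is missing."""
    value = getattr(item, attr, None)
    if value is None:
        return UNAVAILABLE
    return value


upgraded = Item(mtime=1291999504.0)
ctime = get_timestamp(upgraded, "ctime")
mtime = get_timestamp(upgraded, "mtime")
```

Here ctime is the UNAVAILABLE marker and mtime is the stored value, so
a caller can render the missing field without failing on None.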

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>