ganeti-local
13 years agomove-instance: Pass OS parameters to new instance
Michael Hanselmann [Thu, 29 Jul 2010 15:55:14 +0000 (17:55 +0200)]
move-instance: Pass OS parameters to new instance

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate NEWS file for the first release candidate
Iustin Pop [Fri, 30 Jul 2010 14:15:48 +0000 (10:15 -0400)]
Update NEWS file for the first release candidate

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix a few job archival issues
Iustin Pop [Thu, 29 Jul 2010 23:00:19 +0000 (19:00 -0400)]
Fix a few job archival issues

This patch fixes two issues with job archival. First, the
LoadJobFromDisk can return 'None' for no-such-job, and we shouldn't add
None to the job list; we can't anyway, as this raises an exception:

  node1# gnt-job archive foo
  Unhandled protocol error while talking to the master daemon:
  Caught exception: cannot create weak reference to 'NoneType' object

After fixing this, job archival of missing jobs will just continue
silently, so we modify gnt-job archive to log jobs which were not
archived and to return exit code 1 for any missing jobs.
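
A minimal sketch of the behaviour described above, assuming a hypothetical
load_job callable standing in for LoadJobFromDisk (archive_jobs and load_job
are illustrative names, not the actual Ganeti functions):

```python
def archive_jobs(job_ids, load_job):
    """Split job IDs into loaded jobs and missing IDs.

    load_job is a hypothetical stand-in for LoadJobFromDisk; it is
    assumed to return None when no such job exists.
    """
    jobs = []
    missing = []
    for job_id in job_ids:
        job = load_job(job_id)
        if job is None:
            # Never add None to the job list; remember the ID instead.
            missing.append(job_id)
        else:
            jobs.append(job)
    # Exit code 1 signals that at least one job was not archived.
    exit_code = 1 if missing else 0
    return jobs, missing, exit_code
```

The caller can then log the IDs in the missing list and exit with the
returned code.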

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoburnin: fix handling of empty job sets
Iustin Pop [Thu, 29 Jul 2010 22:37:10 +0000 (18:37 -0400)]
burnin: fix handling of empty job sets

If we call burnin with only existing instances, then it will fail to
create any of them, and thus in the removal phase it won't have anything
to remove. Since calling luxi.SUBMIT_MULTIPLE_JOBS with an empty job set
is an error (and will raise an exception), this creates a very strange
error in burnin (which is unfortunately hidden by ExecJobSet()).

As such, we modify CommitQueue to return immediately if it has an empty
op queue.
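
The early return can be sketched as follows. This is an illustration only:
JobQueueSketch, commit_queue and submit_many are hypothetical names, with
submit_many standing in for the luxi.SUBMIT_MULTIPLE_JOBS call.

```python
class JobQueueSketch:
    """Hypothetical stand-in for burnin's op queue handling."""

    def __init__(self, submit_many):
        # submit_many stands in for luxi.SUBMIT_MULTIPLE_JOBS, which is
        # assumed to reject an empty job set.
        self._submit_many = submit_many
        self.queued_ops = []

    def commit_queue(self):
        """Submit all queued ops, or do nothing if the queue is empty."""
        if not self.queued_ops:
            # Nothing queued: return immediately instead of submitting.
            return []
        try:
            return self._submit_many(self.queued_ops)
        finally:
            # Start with a fresh queue after every submission attempt.
            self.queued_ops = []
```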

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoChange semantics of --force-multi for reinstall
Iustin Pop [Thu, 29 Jul 2010 22:13:58 +0000 (18:13 -0400)]
Change semantics of --force-multi for reinstall

Currently, we require both --force and --force-multiple for skipping the
confirmation on instance reinstalls. After offline conversations, this
has been deemed to be excessive, and this patch changes the meaning of
--force-multiple to be a “stronger” force, and not require both.

So, to skip the prompts:
- single instance reinstallation requires either --force or
  --force-multiple
- multiple instance reinstallation requires --force-multiple
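
The two rules above can be written as a small predicate. The function and
parameter names are illustrative assumptions, not the gnt-instance code:

```python
def needs_confirmation(instance_count, force=False, force_multiple=False):
    """Return True if the reinstall prompt should still be shown."""
    if force_multiple:
        # The "stronger" force skips the prompt in every case.
        return False
    if instance_count == 1 and force:
        # Plain --force is sufficient for a single instance.
        return False
    return True
```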

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoChange handling of non-Ganeti errors in jqueue
Iustin Pop [Thu, 29 Jul 2010 21:14:19 +0000 (17:14 -0400)]
Change handling of non-Ganeti errors in jqueue

Currently, if a job execution raises a Ganeti-specific error (i.e.
subclass of GenericError), then we encode it as (error class, [error
args]). This matches the RAPI documentation.

However, if we get a non-Ganeti error, then we encode it as simply
str(err), a single string. This means that the opresult field is not
according to the RAPI docs, and thus it's hard to reliably parse the
job results.

This patch changes the encoding of a failed job (via failure) to always
be an OpExecError, so that we always encode it properly. For the command
line interface, the behaviour is the same, as any non-Ganeti errors get
re-encoded as OpExecError anyway. For the RAPI clients, it only means
that we always present the same type for results. The actual error value
is the same, since the err.args is either way str(original_error);
compare the original (doesn't contain the ValueError):

  "opresult": [
    "invalid literal for int(): aa"
  ],

with:

  "opresult": [
    [
      "OpExecError",
      [
        "invalid literal for int(): aa"
      ]
    ]
  ],
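
A minimal sketch of the uniform encoding, using stand-in classes in place of
the Ganeti error hierarchy (encode_failure is a hypothetical helper name):

```python
class GenericError(Exception):
    """Stand-in for the Ganeti error base class."""


class OpExecError(GenericError):
    """Stand-in for the Ganeti execution error."""


def encode_failure(err):
    """Encode an exception as (error class name, [error args])."""
    if not isinstance(err, GenericError):
        # Wrap foreign errors so the result always has the same shape.
        err = OpExecError(str(err))
    return (err.__class__.__name__, list(err.args))
```

With this, a ValueError and a native OpExecError both serialise to the same
two-element structure, which is what a RAPI client would parse.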

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoImplement gnt-cluster master-ping
Iustin Pop [Thu, 29 Jul 2010 21:41:24 +0000 (17:41 -0400)]
Implement gnt-cluster master-ping

This can be used from shell scripts to quickly check the status of the
master node before launching a series of jobs (and then having to handle
job failures caused by masterd or other issues).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoInstance migration: remove error on missing link
Iustin Pop [Thu, 29 Jul 2010 18:41:09 +0000 (14:41 -0400)]
Instance migration: remove error on missing link

Since we don't support upgrades from 1.2.4 without restarting the
instance, the 'not restarted since 1.2.5' check/error is
wrong/misleading.

Since the live migration works anyway without the links (it recreates
them during the disk reconfiguration anyway), we remove the check and we
transform it into a warning (to the node daemon log only,
unfortunately).

For 2.3, we'll need to change the symlink creation from instance start
time to disk activation time (but that requires more RPC changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd check for RAPI paths to start with /2
Michael Hanselmann [Thu, 29 Jul 2010 16:08:11 +0000 (18:08 +0200)]
Add check for RAPI paths to start with /2

During a discussion in July 2010 it was decided that we'll stabilize on /2. See
message ID <20100716180012.GA9423@google.com> for reference.
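
Such a check could look roughly like the sketch below. The function name is
hypothetical, and rejecting lookalike prefixes such as /20 is an assumption
that goes beyond what this commit states:

```python
def check_rapi_path(path):
    """Raise ValueError unless path is under the /2 resource tree."""
    # Assumption: "/2" itself and anything below "/2/" is accepted.
    if path != "/2" and not path.startswith("/2/"):
        raise ValueError("RAPI path %r does not start with /2" % (path,))
    return path
```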

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoEnsure assertions are evaluated in tests
Michael Hanselmann [Thu, 29 Jul 2010 13:18:47 +0000 (15:18 +0200)]
Ensure assertions are evaluated in tests

A lot of assertions are used in Ganeti's code. Some unittests even check
whether AssertionError is raised in some cases. Explicitly ensuring
assertions are evaluated makes sure those tests don't fail and
assertions are checked.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRAPI client: The os argument for instance reinstalls is optional
David Knowles [Tue, 20 Jul 2010 21:46:13 +0000 (17:46 -0400)]
RAPI client: The os argument for instance reinstalls is optional

Signed-off-by: David Knowles <dknowles@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoQA: Test instance migration via CLI and RAPI
Michael Hanselmann [Fri, 16 Jul 2010 17:44:06 +0000 (19:44 +0200)]
QA: Test instance migration via CLI and RAPI

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRAPI client: Support migrating instances
Michael Hanselmann [Fri, 16 Jul 2010 17:43:50 +0000 (19:43 +0200)]
RAPI client: Support migrating instances

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRAPI: Support migrating instances
Michael Hanselmann [Fri, 16 Jul 2010 17:43:28 +0000 (19:43 +0200)]
RAPI: Support migrating instances

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Change signature of AddTask function to not use *args
Michael Hanselmann [Sat, 17 Jul 2010 21:04:32 +0000 (23:04 +0200)]
workerpool: Change signature of AddTask function to not use *args

By changing it to a normal parameter, which must be a sequence, we can
start using keyword parameters.

Before this patch all arguments to “AddTask(self, *args)” were passed as
arguments to the worker's “RunTask” method. Priorities, which should be
optional and will be implemented in a future patch, must be passed as a keyword
parameter. This means “*args” can no longer be used as one can't combine *args
and keyword parameters in a clean way:

>>> def f(name=None, *args):
...   print "%r, %r" % (args, name)
...
>>> f("p1", "p2", "p3", name="thename")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 TypeError: f() got multiple values for keyword argument 'name'
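
For comparison, a sketch of the sequence-based signature. PoolSketch is a
hypothetical class, and the priority keyword only illustrates the future
parameter mentioned above:

```python
class PoolSketch:
    """Hypothetical pool that only records the tasks it is given."""

    def __init__(self):
        self.tasks = []

    def add_task(self, args, priority=None):
        """Queue one task; args must be a sequence of RunTask arguments."""
        if not isinstance(args, (list, tuple)):
            raise TypeError("args must be a list or tuple")
        # priority is an assumed optional keyword, unused by this sketch.
        self.tasks.append((priority, tuple(args)))
```

Because the task arguments now travel in a single positional parameter,
keyword parameters can be added without clashing with them.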

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Add two additional assertions
Michael Hanselmann [Sat, 17 Jul 2010 21:00:56 +0000 (23:00 +0200)]
workerpool: Add two additional assertions

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Additional check in BaseWorker.ShouldTerminate
Michael Hanselmann [Sat, 17 Jul 2010 20:58:09 +0000 (22:58 +0200)]
workerpool: Additional check in BaseWorker.ShouldTerminate

Document that it should only be called from within RunTask and
add an assertion for this. This means we can no longer use a
method on the pool and hence remove WorkerPool.ShouldWorkerTerminate.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Remove unused worker method
Michael Hanselmann [Sat, 17 Jul 2010 20:56:18 +0000 (22:56 +0200)]
workerpool: Remove unused worker method

HasRunningTask is never used except for an assertion, where we
don't really need the lock.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Move waiting for new tasks for a worker to the pool
Michael Hanselmann [Sat, 17 Jul 2010 20:32:43 +0000 (22:32 +0200)]
workerpool: Move waiting for new tasks for a worker to the pool

This way fewer private variables of the pool are accessed by the worker.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoworkerpool: Use common function to add tasks
Michael Hanselmann [Sat, 17 Jul 2010 20:25:27 +0000 (22:25 +0200)]
workerpool: Use common function to add tasks

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix install document regarding DRBD usage
Iustin Pop [Wed, 28 Jul 2010 23:39:53 +0000 (19:39 -0400)]
Fix install document regarding DRBD usage

This is related to issue 105.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoUpdate RAPI documentation for the OS changes
Iustin Pop [Wed, 28 Jul 2010 21:17:10 +0000 (17:17 -0400)]
Update RAPI documentation for the OS changes

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRename masterfailover to master-failover
Iustin Pop [Tue, 27 Jul 2010 21:07:20 +0000 (17:07 -0400)]
Rename masterfailover to master-failover

Most (all?) of our commands use dash-separator: replace-disks,
verify-disks, add-tags, etc. “gnt-cluster masterfailover” is an old
exception to this rule.

The patch replaces it with master-failover, adds a compatibility alias,
and updates the documentation for this change.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRAPI: Add os params to instance creation v1
Iustin Pop [Wed, 28 Jul 2010 18:26:04 +0000 (14:26 -0400)]
RAPI: Add os params to instance creation v1

Since the RAPI QA suite doesn't seem to offer easy testing of failed
creations, I didn't add this to the QA. Pointers on how to do it are
welcome.

The patch also changes the 'os' argument to be required, since that is
how the LU expects it, and without it we just fail later instead of
directly at submission time.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agomakefile: fix TAGS building
Iustin Pop [Wed, 28 Jul 2010 18:24:23 +0000 (14:24 -0400)]
makefile: fix TAGS building

“find .” requires that “-path” arguments start with a dot, otherwise
they do not match. Additionally, we include the QA files in the
tags, for easier search while modifying the QA suite.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoImprove handling of lost jobs
Iustin Pop [Wed, 28 Jul 2010 20:08:29 +0000 (16:08 -0400)]
Improve handling of lost jobs

Currently, if the cli.JobExecutor class is being used, and one of the
jobs is archived before it can check its result, it will fail with a
stack trace as _ChooseJob is not prepared to handle this case.

This patch makes JobExecutor work better with lost jobs (it still reports
them as 'failed', but it doesn't break and returns a proper error
message), and modifies the generic FormatError to report the JobLost
exception properly, instead of as "Unhandled Ganeti Exception".

Since JobExecutor is hard to test properly, I only tested this manually,
via a fake invocation.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoluxi: convert permission errors into exception
Iustin Pop [Wed, 28 Jul 2010 15:50:38 +0000 (11:50 -0400)]
luxi: convert permission errors into exception

This patch adds handling of permission errors so that we don't show
tracebacks when a non-root user runs a gnt-* command. Since in the
future we'll have different permissions, we need to handle this in RAPI
too.

It also fixes a typo in RAPI error message and the docstrings of LUXI
errors.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agocmdlib: Return new name from rename operations
Michael Hanselmann [Mon, 19 Jul 2010 14:40:38 +0000 (16:40 +0200)]
cmdlib: Return new name from rename operations

The new name is then displayed by the clients.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

13 years agognt-instance rename: Fix bug and rename params
Manuel Franceschini [Wed, 28 Jul 2010 14:58:19 +0000 (16:58 +0200)]
gnt-instance rename: Fix bug and rename params

This patch fixes a bug when gnt-instance rename was invoked with
--no-name-check. It renames the internal variables to be consistent with
the ones in equivalent instance add code. Furthermore it checks whether
an instance rename is invoked with --no-name-check but without
--no-ip-check and throws an exception if so.
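
The option check amounts to the following sketch. The function name, the
boolean parameters and the use of ValueError are assumptions made for
illustration:

```python
def check_rename_options(name_check=True, ip_check=True):
    """Reject an IP check that is requested without a name check.

    name_check is False when --no-name-check is given, ip_check is
    False when --no-ip-check is given.
    """
    if ip_check and not name_check:
        raise ValueError("Cannot do the IP check without a name check")
```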

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoQA: add tests for the reserved lvs feature
Iustin Pop [Sat, 24 Jul 2010 00:11:00 +0000 (20:11 -0400)]
QA: add tests for the reserved lvs feature

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAdd modification of the reserved logical volumes
Iustin Pop [Fri, 23 Jul 2010 23:28:29 +0000 (19:28 -0400)]
Add modification of the reserved logical volumes

This doesn't allow addition/removal of individual volumes, only
wholesale replace of the entire list. It can be improved later, if we
ever get generic container parameters.

The man page change replaces some tabs with spaces (hence the
whitespace changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAdd printing of reserved_lvs in cluster info
Iustin Pop [Thu, 15 Apr 2010 15:08:40 +0000 (17:08 +0200)]
Add printing of reserved_lvs in cluster info

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoIntroduce a new cluster parameter - reserved_lvs
Iustin Pop [Thu, 15 Apr 2010 15:07:03 +0000 (17:07 +0200)]
Introduce a new cluster parameter - reserved_lvs

This parameter, which is a list of regular expression patterns, will
make cluster verify ignore any such LVs. It will not prevent creation or
removal of such volumes by the backend code.
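
A sketch of how such patterns could be applied during verification. The
function name and the volume names are made up, and matching each pattern
against the whole LV name is an assumption:

```python
import re


def filter_reserved(lv_names, reserved_patterns):
    """Return the LV names that match none of the reserved patterns."""
    compiled = [re.compile(pattern) for pattern in reserved_patterns]
    return [name for name in lv_names
            # Assumption: a pattern must match the entire LV name.
            if not any(rx.fullmatch(name) for rx in compiled)]
```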

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoChange the meaning of call_node_start_master
Iustin Pop [Fri, 23 Jul 2010 19:29:31 +0000 (15:29 -0400)]
Change the meaning of call_node_start_master

Currently, backend.StartMaster (the function behind this RPC call) will
activate the master IP and then, if the start_daemons parameter is true,
it will also activate the master role.

While this works, it has two issues:

- first, it will activate the master IP unconditionally, even if this
  node will not start the master daemon due to missing votes
- second, the activation of the IP is done twice if start_daemons is
  true, because the master daemon does its own activation too

This behaviour seems to be unmodified since Summer 2008, so probably any
rationale on why this is done in two places is forgotten.

The patch changes this so that the function does *either* IP activation or
master role activation but not both. So the IP will be activated only
once (from the master daemon or from LURenameCluster), and it will only
be done if the masterd got enough votes for startup.

I can see only one downside to this change: if masterd won't actually
start (due to missing votes), RAPI will still start, and without the
master IP activated. But this is no worse than before, when both RAPI
was running and the IP was activated.

Note that the behaviour of StopMaster remains the same, as no one else
does the IP removal.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agomasterd: move the IP activation from Exec to Check
Iustin Pop [Fri, 23 Jul 2010 18:12:16 +0000 (14:12 -0400)]
masterd: move the IP activation from Exec to Check

Currently, the master IP activation is done in the Exec function. Since
the original masterd process returns after forking, and Exec is run in
the (grand)child process, this means that after 'ganeti-masterd' has
returned there are still initialization tasks running.

Normally this is not a problem, but in cases where one does quick master
failovers, this creates a race condition which hits the QA scripts
especially hard.

To solve this, and make the startup process cleaner (the system is in
steady state after the command has returned, even though masterd startup
could still fail), we move the IP activation to Check(). This also
allows error messages about the IP activation to be seen on the console.

With this patch enabled, I can no longer reproduce the double-failover
errors, which were occurring before in 4 out of 5 cases.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoMove the UsesRPC decorator from cli to rpc
Iustin Pop [Fri, 23 Jul 2010 17:51:44 +0000 (13:51 -0400)]
Move the UsesRPC decorator from cli to rpc

This is needed because not just the cli scripts need this decorator, but
the master daemon too (and it already duplicated the code once).

In cli.py we just leave a stub, so that we don't have to modify all the
scripts to import rpc.py.

We then change the master daemon code to reuse this decorator, instead
of duplicating it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agowatcher: smarter handling of instance records
Iustin Pop [Fri, 23 Jul 2010 21:41:35 +0000 (17:41 -0400)]
watcher: smarter handling of instance records

This patch implements a few changes to the instance handling. First, old
instances which no longer exist on the cluster are removed from the
state file, to keep things clean.

Second, the instance restart counters are reset every 8 hours, since
some error cases might be transient (e.g. networking issues, or machine
temporarily down), and if the problem takes more than 5 restarts but is
not permanent, watcher will not restart the instance. The value of 8
hours is, I think, both conservative (as not to hammer the cluster too
often with restarts) and fast enough to clear semi-transient problems.

And last, if an instance is not restarted due to exhausted retries, a
warning is now logged; otherwise it's hard to understand why watcher doesn't
want to restart an ERROR_down instance.
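
The counter handling can be sketched like this. The record layout, the
constant names and the function name are hypothetical; only the limit of 5
restarts and the 8 hour reset come from the description above:

```python
MAX_RETRIES = 5               # restarts allowed before watcher gives up
RETRY_EXPIRATION = 8 * 3600   # seconds after which the counter is reset


def should_restart(record, now):
    """Decide whether to restart an instance and update its record.

    record is a dict with 'restart_count' and 'restart_when' (the time
    of the last restart attempt, in seconds).
    """
    expired = now - record["restart_when"] >= RETRY_EXPIRATION
    if record["restart_count"] and expired:
        # The problem may have been transient: start counting afresh.
        record["restart_count"] = 0
    if record["restart_count"] >= MAX_RETRIES:
        # Retries exhausted; the caller is expected to log a warning.
        return False
    record["restart_count"] += 1
    record["restart_when"] = now
    return True
```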

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoUpdate the RAPI node migrate for the 'live' change
Iustin Pop [Thu, 22 Jul 2010 17:54:17 +0000 (13:54 -0400)]
Update the RAPI node migrate for the 'live' change

This patch adds handling of the new 'mode' parameter to the RAPI server,
while keeping compatibility with the old mode. Note that in the old mode
(when 'live' is being passed), the auto-mode doesn't work.
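
A rough sketch of the compatibility handling. The constant values, the
function name and the rejection of requests carrying both parameters are
assumptions, not the actual RAPI server code:

```python
MODE_LIVE = "live"          # assumed string constant
MODE_NONLIVE = "non-live"   # assumed string constant


def migration_mode_from_request(params):
    """Map request parameters to a migration mode, or None for auto."""
    if "live" in params and "mode" in params:
        raise ValueError("Only one of 'live' and 'mode' may be given")
    if "live" in params:
        # Old style: a boolean always selects an explicit mode, which is
        # why auto-mode is unavailable to such clients.
        return MODE_LIVE if params["live"] else MODE_NONLIVE
    # New style: a missing mode leaves the choice to the server.
    return params.get("mode")
```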

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoUpdate the RAPI client for the migration mode
Iustin Pop [Thu, 22 Jul 2010 17:42:01 +0000 (13:42 -0400)]
Update the RAPI client for the migration mode

See the discussion on the previous patch about this. Basically unless we
want to add a new 'feature' marking for the live migration parameter,
there is no simple way to handle this nicely in the client.

Given that the client was/is marked as experimental, this patch simply
replaces live with mode. This means that this client won't work with 2.1
clusters…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoFix burnin and live migration
Iustin Pop [Thu, 22 Jul 2010 17:38:26 +0000 (13:38 -0400)]
Fix burnin and live migration

This is breakage from the original 'live' parameter changes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRename the OpMigrate* parameter 'live' to 'mode'
Iustin Pop [Tue, 20 Jul 2010 16:26:44 +0000 (18:26 +0200)]
Rename the OpMigrate* parameter 'live' to 'mode'

This is needed as now the parameter is no longer boolean, but tri-state.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoRename migration type to migration mode
Iustin Pop [Mon, 19 Jul 2010 14:57:59 +0000 (16:57 +0200)]
Rename migration type to migration mode

This is in preparation for the rename of the opcode 'live' parameter to
'mode'.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoutils: Fix incorrect docstring
Manuel Franceschini [Thu, 22 Jul 2010 16:09:59 +0000 (18:09 +0200)]
utils: Fix incorrect docstring

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.1' into master
Iustin Pop [Thu, 22 Jul 2010 17:21:56 +0000 (13:21 -0400)]
Merge branch 'devel-2.1' into master

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoFix issue when changing the disk template to drbd
Iustin Pop [Thu, 22 Jul 2010 17:00:15 +0000 (13:00 -0400)]
Fix issue when changing the disk template to drbd

If we pass the current primary node, the conversion will fail horribly
with LVM creation errors. Instead, we catch and check for this
condition in CheckPrereq.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoRemove a couple of empty design sections
Guido Trotter [Wed, 21 Jul 2010 15:27:32 +0000 (16:27 +0100)]
Remove a couple of empty design sections

The 2.1 and 2.2 designs contain sections with no actual content, as they
are detailed for each single change. Removing the global empty ones.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

13 years agoDisable 'invalid name' pylint warning for tools/setup-ssh
Manuel Franceschini [Wed, 21 Jul 2010 09:29:40 +0000 (11:29 +0200)]
Disable 'invalid name' pylint warning for tools/setup-ssh

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

13 years agoAlways set commonName in X509 certificates
Manuel Franceschini [Mon, 19 Jul 2010 19:07:57 +0000 (21:07 +0200)]
Always set commonName in X509 certificates

Due to the current switch of the RPC client to PycURL, a bug with newer
versions of libcurl surfaced. When the 'Subject' or 'Issuer' of
'server.pem' were empty, SSL handshake failed.

This patch changes the certificate generation functions such that they
always use "ganeti.example.com" as commonName (CN) for 'Subject' and
'Issuer'.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdding tool to setup SSH on a remote host
René Nussbaumer [Tue, 13 Jul 2010 09:38:36 +0000 (11:38 +0200)]
Adding tool to setup SSH on a remote host

This prepares the remote node to be joined into a cluster

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdding new (optional) dependency to configure.ac
René Nussbaumer [Wed, 14 Jul 2010 09:04:35 +0000 (11:04 +0200)]
Adding new (optional) dependency to configure.ac

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdding constants for setup-ssh
René Nussbaumer [Tue, 13 Jul 2010 14:29:49 +0000 (16:29 +0200)]
Adding constants for setup-ssh

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoChange AddAuthorizedKey to also allow filehandles
René Nussbaumer [Tue, 13 Jul 2010 09:37:53 +0000 (11:37 +0200)]
Change AddAuthorizedKey to also allow filehandles

This is required to use this function over paramiko
sftp file handles.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoUpdate .gitignore for vcs-version
Iustin Pop [Mon, 19 Jul 2010 14:14:37 +0000 (16:14 +0200)]
Update .gitignore for vcs-version

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoRAPI client: Encode empty body to JSON
Michael Hanselmann [Fri, 16 Jul 2010 17:20:04 +0000 (19:20 +0200)]
RAPI client: Encode empty body to JSON

If the body consists of an empty dict, it should also be encoded.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoIntroduce git reference/tag tracking for debugging
Iustin Pop [Fri, 16 Jul 2010 08:18:24 +0000 (10:18 +0200)]
Introduce git reference/tag tracking for debugging

This patch adds a new vcs-version file that is generated via git (and
can be adapted if VCS is changed) and then embedded as VCS_VERSION in
the constants module.

This means two things:
- local modifications without committing to git (or when using a tar.gz
  archive + mods) will not be reflected
- version is fixed at the time of the last make regen-vcs-version (dist time,
  or devel/upload which calls this)

Thus this is more geared at developers rather than end users.

The patch:

- adds rules for generating the vcs-version file
- adds a dist-hook for re-generating the file (if possible) and copying
  the updated version to the distdir
- modifies devel/upload to re-generate the file before upload

The output of --version will look like:
gnt-cluster (ganeti v2.2.0beta0-184-gebca7e6) 2.2.0~beta0

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoFix epydoc warning "Lists must be indented."
Luca Bigliardi [Fri, 16 Jul 2010 15:29:16 +0000 (16:29 +0100)]
Fix epydoc warning "Lists must be indented."

Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoConvert RPC client to PycURL
Michael Hanselmann [Tue, 6 Jul 2010 13:56:49 +0000 (15:56 +0200)]
Convert RPC client to PycURL

Instead of using our custom HTTP client, using PycURL's multi
interface allows us to get rid of the HTTP client threadpool.
The majority of the code is still in the ganeti.http.client
module.

A simple per-thread HTTP client pool gives cURL a chance to
cache and retain as much information as possible (e.g. SSL certs).
Unused HTTP clients (e.g. due to removed nodes) are deleted after
25 requests going through the pool.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoImplement lock names for debugging purposes
Iustin Pop [Mon, 4 May 2009 20:51:04 +0000 (22:51 +0200)]
Implement lock names for debugging purposes

This patch adds lock names to SharedLocks and LockSets, that can be used
later for displaying the actual locks being held/used in places where we
only have the lock, and not the entire context of the locking operation.

Since I realized that the production code doesn't call LockSet with the
proper members= syntax, but directly as positional parameters, I've
converted this (and the arguments to GlobalLockManager) into positional
arguments.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoMerge branch 'devel-2.1'
Guido Trotter [Fri, 16 Jul 2010 13:05:23 +0000 (14:05 +0100)]
Merge branch 'devel-2.1'

* devel-2.1:
  Bump up version to release 2.1.6
  Update NEWS file for 2.1.6

Conflicts:
NEWS
  - merge
configure.ac
  - keep 2.2 version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoBump up version to release 2.1.6 v2.1.6
Guido Trotter [Fri, 16 Jul 2010 11:04:02 +0000 (12:04 +0100)]
Bump up version to release 2.1.6

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUpdate NEWS file for 2.1.6
Guido Trotter [Fri, 16 Jul 2010 11:17:40 +0000 (12:17 +0100)]
Update NEWS file for 2.1.6

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoFix pylint complaints introduced in commit e58f87a958c
Michael Hanselmann [Fri, 16 Jul 2010 00:00:04 +0000 (02:00 +0200)]
Fix pylint complaints introduced in commit e58f87a958c

Due to a small mistake I missed three non-critical pylint complaints for
commit e58f87a958c. They're fixed with this patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Add cpu_mask hypervisor parameter
Balazs Lecz [Fri, 9 Jul 2010 12:30:39 +0000 (13:30 +0100)]
LXC: Add cpu_mask hypervisor parameter

Also implement syntax checking.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd ParseCpuMask() utility function
Balazs Lecz [Mon, 12 Jul 2010 16:54:47 +0000 (17:54 +0100)]
Add ParseCpuMask() utility function

Also adds a generic ParseError exception.
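
The commit message does not spell out the mask syntax. The sketch below
assumes a comma-separated list of CPU IDs and dash-separated ID ranges, and
uses stand-in names rather than the actual utils code:

```python
class ParseError(Exception):
    """Stand-in for the generic parse error."""


def parse_cpu_mask(cpu_mask):
    """Parse a mask such as "0-2,5" into a list of CPU IDs."""
    if not cpu_mask:
        return []
    cpu_list = []
    for range_def in cpu_mask.split(","):
        boundaries = range_def.split("-")
        if len(boundaries) > 2:
            raise ParseError("Invalid CPU ID range: %s" % range_def)
        try:
            lower = int(boundaries[0])
            higher = int(boundaries[-1])
        except ValueError as err:
            raise ParseError("Invalid CPU ID value: %s" % range_def) from err
        if lower > higher:
            raise ParseError("Reversed CPU ID range: %s" % range_def)
        # A single ID is treated as a range of length one.
        cpu_list.extend(range(lower, higher + 1))
    return cpu_list
```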

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd a migration type global hypervisor parameter
Iustin Pop [Thu, 15 Jul 2010 16:05:46 +0000 (18:05 +0200)]
Add a migration type global hypervisor parameter

Since live migration is more stable on some hypervisors than on others
(e.g. Xen-PVM versus Xen-HVM), we introduce a new parameter for the mode we
should use by default (if not overridden by the user, in the opcode).

The meaning of the opcode 'live' field changes from boolean to either
None (use the hypervisor default), or one of the allowed migration
string constants. The live parameter of the TLMigrateInstance is still a
boolean, computed from the opcode field (which is no longer passed to
the TL).
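
The computation of the boolean can be sketched as follows. The constant
names and values and the function name are assumptions:

```python
HT_MIGRATION_LIVE = "live"          # assumed constant value
HT_MIGRATION_NONLIVE = "non-live"   # assumed constant value
HT_MIGRATION_MODES = frozenset([HT_MIGRATION_LIVE, HT_MIGRATION_NONLIVE])


def compute_live_flag(op_live, hv_default):
    """Turn the opcode field into the boolean used by the tasklet."""
    # None in the opcode means "use the hypervisor default".
    mode = hv_default if op_live is None else op_live
    if mode not in HT_MIGRATION_MODES:
        raise ValueError("Unknown migration mode %r" % (mode,))
    return mode == HT_MIGRATION_LIVE
```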

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoAdd test for some aspects of job queue
Michael Hanselmann [Thu, 15 Jul 2010 16:23:17 +0000 (18:23 +0200)]
Add test for some aspects of job queue

This new opcode and gnt-debug sub-command test some aspects of the
job queue, including the status of a job. The bug fixed in commit
2034c70d507 was identified using this test. A future patch will
run this test automatically from the QA scripts.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLUVerifyCluster: update _ValidateNode description
Luca Bigliardi [Thu, 15 Jul 2010 16:13:06 +0000 (17:13 +0100)]
LUVerifyCluster: update _ValidateNode description

Change _ValidateNode description to reflect what the function actually does.

Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoKVM hypervisor: Use utils.ShellWriter for network script
Michael Hanselmann [Wed, 14 Jul 2010 18:15:59 +0000 (20:15 +0200)]
KVM hypervisor: Use utils.ShellWriter for network script

This patch converts hv_kvm to use utils.ShellWriter for writing
the network script. It also adds a few unittests (the first
for any hypervisor modules).
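
To give an idea of what a shell-writing helper of this kind does, here is a minimal sketch. The class name ShellWriterSketch, its method names and the script lines are hypothetical and are not taken from utils.ShellWriter or hv_kvm.

```python
import io


class ShellWriterSketch(object):
    """Write shell script lines to a file-like object with indentation."""

    INDENT = "  "

    def __init__(self, fh):
        self._fh = fh
        self._level = 0

    def inc_indent(self):
        self._level += 1

    def dec_indent(self):
        if self._level == 0:
            raise ValueError("Indentation level is already zero")
        self._level -= 1

    def write(self, txt, *args):
        """Write one line, applying %-formatting only if args are given."""
        line = txt % args if args else txt
        self._fh.write(self.INDENT * self._level + line + "\n")


# Build a tiny script in memory.
buf = io.StringIO()
sw = ShellWriterSketch(buf)
sw.write("#!/bin/sh")
sw.write("if [ -x %s ]; then", "/sbin/ip")
sw.inc_indent()
sw.write("echo found")
sw.dec_indent()
sw.write("fi")
script = buf.getvalue()
```

Writing to a file-like object rather than a path is what makes such a helper easy to unit test.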

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMove ShellWriter class to utils
Michael Hanselmann [Wed, 14 Jul 2010 17:32:23 +0000 (19:32 +0200)]
Move ShellWriter class to utils

Also add unittest.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRename test for utils.IgnoreProcessNotFound
Michael Hanselmann [Wed, 14 Jul 2010 17:32:55 +0000 (19:32 +0200)]
Rename test for utils.IgnoreProcessNotFound

Usually our tests are named “Test…”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

13 years agojqueue: Factorize code waiting for job changes
Michael Hanselmann [Wed, 14 Jul 2010 15:29:56 +0000 (17:29 +0200)]
jqueue: Factorize code waiting for job changes

By splitting the _WaitForJobChangesHelper class into multiple smaller
classes, we gain in several places:

- Simpler code, less interaction between functions and variables
- Easy to unittest (close to 100% coverage)
- Waiting for job changes has no direct knowledge of the queue anymore (it
  no longer references queue functions, especially not private ones)
- Activate inotify only if there was no change at the beginning (and
  checking again right away to avoid race conditions)
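
The last point can be sketched as a check, arm, check again sequence. All names below (wait_for_change and its callable parameters) are hypothetical; the notifier is passed in as plain callables so that the sketch does not depend on any particular inotify binding.

```python
def wait_for_change(read_state, previous, arm_notifier, wait_notified,
                    timeout):
    """Return the new state, or None if nothing changed before the timeout.

    read_state() returns the current job state, arm_notifier() starts
    watching for modifications, and wait_notified(timeout) blocks and
    returns True if a notification arrived.
    """
    # First check: if something already changed, no notifier is needed.
    current = read_state()
    if current != previous:
        return current

    # Nothing changed yet: arm the notifier, then check again right away
    # so that a change made between the first check and arming is seen.
    arm_notifier()
    current = read_state()
    if current != previous:
        return current

    # Block until a notification arrives or the timeout expires.
    if wait_notified(timeout):
        current = read_state()
        if current != previous:
            return current
    return None
```

The second read is what closes the window between the first check and the notifier becoming active.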

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

13 years agoMerge remote branch 'origin/devel-2.1'
Michael Hanselmann [Tue, 13 Jul 2010 19:01:17 +0000 (21:01 +0200)]
Merge remote branch 'origin/devel-2.1'

* origin/devel-2.1:
  RAPI client: Implement old instance creation request format
  rlib2: Use constants for disk and NIC parameters

Conflicts:
test/ganeti.rapi.client_unittest.py: Trivial
test/ganeti.rapi.rlib2_unittest.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoRAPI client: Implement old instance creation request format
Michael Hanselmann [Tue, 13 Jul 2010 17:45:44 +0000 (19:45 +0200)]
RAPI client: Implement old instance creation request format

Commit 8a47b4478 implemented instance creation in the RAPI client,
but it left out support for the old instance creation request format.
This patch now implements the old format as well as possible. This
will only be used when talking to clusters before Ganeti 2.1.3.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agorlib2: Use constants for disk and NIC parameters
Michael Hanselmann [Mon, 12 Jul 2010 20:16:52 +0000 (22:16 +0200)]
rlib2: Use constants for disk and NIC parameters

These constants were added in commit bd061c35, but the parsing code
was not updated. This also fixes a bug where a NIC's MAC address
wasn't used.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoUse reserved documentation IPs and domains
Manuel Franceschini [Mon, 12 Jul 2010 13:44:20 +0000 (15:44 +0200)]
Use reserved documentation IPs and domains

Use RFC 5737 IP addresses and RFC 2606 domain names in all
unittests, docs, qa and docstrings.
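
For reference, a check against these reserved ranges could look like the sketch below. The address blocks and names are those listed in RFC 5737 and RFC 2606; the helper names are made up for this example.

```python
import ipaddress

# Documentation address blocks reserved by RFC 5737.
RFC5737_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]

# Top-level and second-level domain names reserved by RFC 2606.
RFC2606_SUFFIXES = (
    "test", "example", "invalid", "localhost",
    "example.com", "example.net", "example.org",
)


def is_documentation_ip(address):
    """Return True if the IPv4 address lies in an RFC 5737 block."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC5737_NETWORKS)


def is_reserved_domain(name):
    """Return True if the name is, or is below, an RFC 2606 name."""
    name = name.rstrip(".").lower()
    return any(name == suffix or name.endswith("." + suffix)
               for suffix in RFC2606_SUFFIXES)
```

A helper like this could back a lint step that flags real addresses or domains in tests and documentation.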

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoProvide feedback function for all LU methods
Michael Hanselmann [Thu, 8 Jul 2010 15:21:39 +0000 (17:21 +0200)]
Provide feedback function for all LU methods

By exposing mcpu's _Feedback function (now renamed to “Log”) to LUs,
methods like ExpandNames can also write to the job execution log.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agojqueue: Setup inotify before checking for any job changes
Michael Hanselmann [Thu, 8 Jul 2010 15:09:03 +0000 (17:09 +0200)]
jqueue: Setup inotify before checking for any job changes

Since the code waiting for job changes was modified to use inotify,
there has been a race condition between the first check for changes
and setting up inotify. If the job is modified after the check but
before inotify is active, changes are only noticed after the timeout
(29 seconds in most cases) expires.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agocli.SubmitOpCode: Support custom job reporter
Michael Hanselmann [Thu, 8 Jul 2010 15:06:14 +0000 (17:06 +0200)]
cli.SubmitOpCode: Support custom job reporter

This is necessary to reuse SubmitOpCode while adding processing for
custom message types.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd function to format all job log messages
Michael Hanselmann [Thu, 8 Jul 2010 15:04:59 +0000 (17:04 +0200)]
Add function to format all job log messages

Just calling utils.SafeEncode on the log message failed when it
wasn't of the type ELOG_MESSAGE and not a string. Now non-message
log entries are formatted using repr().
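
A minimal sketch of such a formatting function is shown below. The value of ELOG_MESSAGE, the function name format_log_message and the safe_encode_stub helper (a simplified stand-in for utils.SafeEncode) are assumptions for this example.

```python
ELOG_MESSAGE = "message"  # assumed value of the constant


def safe_encode_stub(text):
    """Simplified stand-in for utils.SafeEncode: escape non-ASCII."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


def format_log_message(log_type, log_msg):
    """Return a printable string for any kind of job log entry."""
    if log_type != ELOG_MESSAGE:
        # Non-message entries may be tuples, dicts, etc.; make them text.
        log_msg = repr(log_msg)
    return safe_encode_stub(log_msg)
```

Converting to text before encoding is what keeps structured log entries from breaking the encoder.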

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agobaserlib: Fix feedback function
Michael Hanselmann [Thu, 8 Jul 2010 15:02:14 +0000 (17:02 +0200)]
baserlib: Fix feedback function

The feedback function is called with only one parameter, a tuple
with the message details.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoConfd IPv6 support
Manuel Franceschini [Wed, 30 Jun 2010 09:55:18 +0000 (11:55 +0200)]
Confd IPv6 support

This patch series basically adds a new parameter 'family' to the constructors
of daemon.AsyncUDPSocket and confd.client.ConfdUDPClient. This enables the
users of these two classes to support IPv6.

In ganeti-confd.ConfdAsyncUDPClient a method to check the address families of
all peers is added.

Furthermore it adds unittests for the added functionality.
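
The idea of a 'family' parameter plus a peer family check can be sketched with the standard socket module. The function names below are hypothetical and do not reproduce the actual daemon or confd client classes.

```python
import socket


def address_family(address):
    """Return AF_INET or AF_INET6 for a numeric IP address string."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, address)
            return family
        except OSError:
            continue
    raise ValueError("Not a valid IP address: %r" % (address,))


def common_family(peers):
    """Return the single address family shared by all peers."""
    families = set(address_family(peer) for peer in peers)
    if len(families) != 1:
        raise ValueError("Peers must all share one address family")
    return families.pop()


def make_udp_socket(family=socket.AF_INET):
    """Create a UDP socket for the given address family."""
    return socket.socket(family, socket.SOCK_DGRAM)
```

A client could call common_family() on its peer list first and pass the result to make_udp_socket(), so that a mixed IPv4/IPv6 peer list is rejected early.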

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Fix GetInstanceInfo()
Balazs Lecz [Thu, 8 Jul 2010 17:45:48 +0000 (18:45 +0100)]
LXC: Fix GetInstanceInfo()

Don't try to get cgroups info if instance is not running.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Fix wording of error messages
Balazs Lecz [Thu, 8 Jul 2010 17:18:51 +0000 (18:18 +0100)]
LXC: Fix wording of error messages

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Create per-instance log files
Balazs Lecz [Thu, 8 Jul 2010 17:12:09 +0000 (18:12 +0100)]
LXC: Create per-instance log files

This replaces the single global log file with per-instance logs.
The instance log file is not truncated when the instance is started.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.1'
Iustin Pop [Fri, 9 Jul 2010 13:48:00 +0000 (15:48 +0200)]
Merge branch 'devel-2.1'

* devel-2.1:
  Enable from-repository builds on old distributions

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

13 years agoEnable from-repository builds on old distributions
Iustin Pop [Fri, 9 Jul 2010 13:11:39 +0000 (15:11 +0200)]
Enable from-repository builds on old distributions

… or on distributions which simply have other implementations of man,
that do not support '--warnings'.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

13 years agoIntroduce lib/netutils.py
Manuel Franceschini [Mon, 5 Jul 2010 16:50:39 +0000 (18:50 +0200)]
Introduce lib/netutils.py

This patch moves network utility functions to a dedicated module.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd oper_vcpus instance status field
Balazs Lecz [Wed, 7 Jul 2010 18:02:26 +0000 (18:02 +0000)]
Add oper_vcpus instance status field

This introduces a new instance status field, named "oper_vcpus".
It contains the actual number of VCPUs an instance is using as
seen by the hypervisor.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Fix GetAllInstancesInfo()
Balazs Lecz [Wed, 7 Jul 2010 16:59:06 +0000 (16:59 +0000)]
LXC: Fix GetAllInstancesInfo()

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLUNodeEvacuationStrategy: Use default iallocator
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:05:01 +0000 (15:05 +0300)]
LUNodeEvacuationStrategy: Use default iallocator

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLUCreateInstance: use cluster-wide iallocator
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:04:44 +0000 (15:04 +0300)]
LUCreateInstance: use cluster-wide iallocator

LUCreateInstance uses the cluster-wide default iallocator if no iallocator or
primary node is specified manually.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years ago_CheckIAllocatorOrNode unit tests
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:04:15 +0000 (15:04 +0300)]
_CheckIAllocatorOrNode unit tests

Add unit tests to check the function of _CheckIAllocatorOrNode.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd _CheckIAllocatorOrNode for common iallocator/node checks
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:03:58 +0000 (15:03 +0300)]
Add _CheckIAllocatorOrNode for common iallocator/node checks

_CheckIAllocatorOrNode will be called by LUs wishing to use an instance
allocator or a target node. It performs sanity checks and will modify the LU's
opcode's iallocator slot to use the cluster-wide allocator if
appropriate.
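
The checks described above might be sketched like this. The function signature, the OpPrereqError stand-in, the slot names and the allocator name "hail" used as an example value are assumptions; the actual helper operates on a logical unit and the cluster configuration.

```python
from types import SimpleNamespace


class OpPrereqError(Exception):
    """Stand-in for the prerequisite error an LU would raise."""


def check_iallocator_or_node(op, iallocator_slot, node_slot,
                             default_iallocator):
    """Validate the iallocator/node choice and apply the default."""
    node = getattr(op, node_slot, None)
    iallocator = getattr(op, iallocator_slot, None)
    if node is not None and iallocator is not None:
        raise OpPrereqError("Specify either an iallocator or a node,"
                            " not both")
    if node is None and iallocator is None:
        if not default_iallocator:
            raise OpPrereqError("No iallocator or node given and no"
                                " cluster-wide default iallocator set")
        # Fall back to the cluster-wide default allocator.
        setattr(op, iallocator_slot, default_iallocator)


# Neither value given: the cluster default is filled in.
op = SimpleNamespace(iallocator=None, pnode=None)
check_iallocator_or_node(op, "iallocator", "pnode", "hail")
```

Passing the slot names as strings lets one helper serve opcodes that name their node field differently.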

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoDocument the default instance allocator in gnt-cluster.sgml
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:03:26 +0000 (15:03 +0300)]
Document the default instance allocator in gnt-cluster.sgml

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoAdd default_iallocator cluster parameter
Apollon Oikonomopoulos [Thu, 8 Jul 2010 12:02:55 +0000 (15:02 +0300)]
Add default_iallocator cluster parameter

Add a cluster parameter to hold the iallocator that will be used by default
when required and no alternative (manually-specified iallocator or
manually-specified node(s)) is given.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoLXC: Report actual number of CPUs
Balazs Lecz [Wed, 7 Jul 2010 15:57:00 +0000 (15:57 +0000)]
LXC: Report actual number of CPUs

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMerge branch 'devel-2.1'
Luca Bigliardi [Wed, 7 Jul 2010 14:48:37 +0000 (15:48 +0100)]
Merge branch 'devel-2.1'

Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

13 years agoMlockall: decrease warnings if ctypes module is not present
Luca Bigliardi [Tue, 6 Jul 2010 14:28:58 +0000 (15:28 +0100)]
Mlockall: decrease warnings if ctypes module is not present

The node daemon prints a lot of warnings if the --no-mlock option is not
specified and the ctypes module is not present.

With this patch the warning is printed only at noded startup.

Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

13 years agoAdd a delay in master failover
Iustin Pop [Tue, 6 Jul 2010 12:20:15 +0000 (14:20 +0200)]
Add a delay in master failover

I have seen some very rare errors where (it seems) the address is
still live for a short while after removing it from the old master, thus
the new master will fail in startup/adding its own IP address.

To prevent this, we add a delay/retry before we proceed, if the
IP is still reachable.
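
A delay/retry of this kind can be sketched as a small polling loop. The function name, the number of attempts and the delay are assumptions; the reachability test is passed in as a callable because the commit does not say how the check is done.

```python
import time


def wait_until_unreachable(is_reachable, attempts=5, delay=1.0,
                           sleep=time.sleep):
    """Poll until is_reachable() returns False.

    Returns True once the address is unreachable, or False if it is
    still reachable after the given number of attempts.
    """
    for attempt in range(attempts):
        if not is_reachable():
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The caller can then decide whether to proceed with a warning or abort when the old address never goes away.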

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 425f0f5470c912ff4a615d14c8b924116abe5c92)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>

13 years agoLXC: Use lxc-info to get instance info
Balazs Lecz [Tue, 6 Jul 2010 17:58:09 +0000 (17:58 +0000)]
LXC: Use lxc-info to get instance info

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>