daemons/ganeti-watcher @ 073c31a5

# Date Author Comment
ae8419a2 09/07/2010 01:07 pm Michael Hanselmann

Merge branch 'devel-2.2'

  • devel-2.2:
    cli: Use list of options shared between commands
    jqueue: Use separate function for encoding errors
    Fix some epydoc warnings
    Fix breakage introduced by commit 8044bf655
    Remove “dry_run” from opcodes.OpCreateInstance...
34f06005 09/02/2010 02:04 pm Iustin Pop

Disable the RAPI CA checks in watcher

Since the RAPI certificate is not necessarily self-signed, and we
currently don't have any configuration variable for the real CA file, we
disable the CA checks for now. This fixes the 'restart RAPI every 5
minutes' problem with non-self-signed certs....

b705c7a6 08/18/2010 11:27 am Manuel Franceschini

Support for resolving hostnames to IPv6 addresses

This patch enables IPv6 name resolution by using socket.getaddrinfo
instead of socket.gethostbyname_ex.

It renames the HostInfo class to Hostname and unifies its use throughout
the code. This is achieved by using static calls where no object is...
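A minimal sketch (not Ganeti's Hostname class) of the difference this commit relies on: socket.gethostbyname_ex() returns only IPv4 addresses, while socket.getaddrinfo() with AF_UNSPEC returns entries for every address family the resolver knows about. The function name resolve_all is an illustrative assumption.

```python
import socket


def resolve_all(hostname):
    """Return the sorted, de-duplicated IPv4 and IPv6 addresses of hostname.

    resolve_all is a hypothetical helper for illustration only.
    """
    # AF_UNSPEC asks for all address families (IPv4 and IPv6);
    # SOCK_STREAM keeps one entry per address instead of one per socket type.
    infos = socket.getaddrinfo(hostname, None, socket.AF_UNSPEC,
                               socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the address string for both AF_INET and AF_INET6.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})


if __name__ == "__main__":
    print(resolve_all("127.0.0.1"))
```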

f5116c87 07/26/2010 05:09 pm Iustin Pop

watcher: smarter handling of instance records

This patch implements a few changes to the instance handling. First, old
instances which no longer exist on the cluster are removed from the
state file, to keep things clean.

Second, the instance restart counters are reset every 8 hours, since...

a744b676 07/09/2010 04:37 pm Manuel Franceschini

Introduce lib/netutils.py

This patch moves network utility functions to a dedicated module.

Signed-off-by: Manuel Franceschini
Reviewed-by: Iustin Pop

2a7c3583 07/01/2010 03:13 pm Michael Hanselmann

RAPI client: Switch to pycURL

Currently the RAPI client uses the urllib2 and httplib modules from
Python's standard library. They're used with pyOpenSSL in a very fragile
way, and there are known issues when receiving large responses from a RAPI
server.
...

9769bb78 06/30/2010 01:41 pm Manuel Franceschini

Rename some constants to facilitate IPv6 support

Signed-off-by: Manuel Franceschini
Reviewed-by: Guido Trotter

db147305 06/03/2010 12:39 pm Tom Limoncelli

ganeti-watcher should attempt to fix ganeti-rapi

Update ganeti-watcher so that it tests the master's RAPI port with a
simple test (in this case GetVersion). If it fails, make one attempt
at restarting ganeti-rapi and retest.

- daemons/ganeti-watcher: Test rapi and make one attempt at restarting it....
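The check, restart once, re-check sequence described above can be sketched as follows. This is a hypothetical helper, not the watcher's code: check and restart are assumed callables standing in for the RAPI GetVersion test and the daemon restart.

```python
def ensure_responsive(check, restart):
    """Check a daemon, restart it once if the check fails, then re-check.

    check:   hypothetical callable returning True when the daemon answers
             (for RAPI, e.g. a successful GetVersion call)
    restart: hypothetical callable that restarts the daemon
    Returns True if the daemon is responsive at the end.
    """
    if check():
        return True
    # Exactly one restart attempt; a persistent failure is left for the
    # next watcher run instead of looping here.
    restart()
    return check()


if __name__ == "__main__":
    state = {"up": False}

    def fake_check():
        return state["up"]

    def fake_restart():
        state["up"] = True

    print(ensure_responsive(fake_check, fake_restart))  # True
```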

ebacb943 04/09/2010 10:57 am Iustin Pop

Make watcher request the max coverage

Since the actions are potentially destructive, we should try to get a
consistent view of the cluster, so it's better to get the most coverage
possible.

Signed-off-by: Iustin Pop
Reviewed-by: Guido Trotter

50273051 04/08/2010 06:50 pm Iustin Pop

Watcher: automatic shutdown of orphan resources

This patch changes the watcher so that it maintains (on all nodes) the
list of instances and DRBD devices by shutting down ones that confd
daemons indicate should not be running on this node.

Signed-off-by: Iustin Pop...

10e689d4 03/23/2010 12:21 pm Iustin Pop

Watcher: do not warn for missing hooks dir

If the hooks dir does not exist, do not warn needlessly. This is similar
to commit a9b7e346 (for backend.py).

Signed-off-by: Iustin Pop
Reviewed-by: René Nussbaumer

55c85950 03/23/2010 12:21 pm Iustin Pop

Watcher: fix some doc typos

Signed-off-by: Iustin Pop
Reviewed-by: René Nussbaumer

c4feafe8 03/08/2010 03:48 pm Iustin Pop

Switch from os.path.join to utils.PathJoin

This passes a full burnin with lots of instances, and should be safe as
we mostly join a known root (various constants) to a run-time
variable.

Signed-off-by: Iustin Pop
Reviewed-by: Michael Hanselmann

001b3825 02/26/2010 07:42 pm Michael Hanselmann

watcher: Acquire lock early and give more friendly message

By opening the lock file early, other programs can lock the
state file to prevent ganeti-watcher from restarting daemons.
Using the pause feature is inherently prone to race conditions.

Before a traceback was logged when the lock file couldn't...

2826b361 02/26/2010 03:20 pm Guido Trotter

Move watcher's EnsureDaemon function to utils

This is going to be used from the nbma repository, to ensure that the
nld daemon is running.

Signed-off-by: Guido Trotter
Reviewed-by: Iustin Pop

9e289e36 02/26/2010 02:24 pm Guido Trotter

Add watcher hooks

These hooks are run on all nodes, after the "base" daemons are started.

Signed-off-by: Guido Trotter
Reviewed-by: Michael Hanselmann

f1115454 02/26/2010 02:24 pm Guido Trotter

Abstract starting the node daemons

We're using a separate function for this, as we're going to add some
functionality to this feature.

Signed-off-by: Guido Trotter
Reviewed-by: Iustin Pop

46cf6260 02/26/2010 02:23 pm Guido Trotter

ganeti-watcher: remove unused Indent function

Signed-off-by: Guido Trotter
Reviewed-by: Iustin Pop

a9105b24 02/23/2010 06:31 pm Michael Hanselmann

Catch disk activation errors in watcher

If activating disks fails for some reason, the watcher didn't
catch the exception. With this patch it's caught and logged.

Signed-off-by: Michael Hanselmann
Reviewed-by: Guido Trotter

7369e826 01/28/2010 01:44 pm Guido Trotter

ganeti-watcher: ensure confd is running as well

Ganeti-confd should be running on all 2.1 nodes.

Signed-off-by: Guido Trotter
Reviewed-by: Iustin Pop

30e4e741 01/04/2010 11:20 am Iustin Pop

Fix unused imports or add silences where needed

In some cases pylint doesn't parse the import correctly, so we add
silences; but there are also many cases of unused imports, which we
simply remove.

Signed-off-by: Iustin Pop
Reviewed-by: Olivier Tharan

f93427cd 01/04/2010 11:16 am Iustin Pop

daemons: handle arguments correctly and uniformly

Of all daemons, only rapi aborted when given arguments. None of our
daemons use any arguments, but they accepted them blindly. This is a
very bad experience for the user.

This patch adds checking and exiting in all daemons, in a uniform way....

f4ad2ef0 01/04/2010 11:15 am Iustin Pop

Remove more unused variables

This removes unused variables in the rest of the code (outside lib/).

Signed-off-by: Iustin Pop
Reviewed-by: Olivier Tharan

7260cfbe 01/04/2010 11:15 am Iustin Pop

Add targeted pylint disables

This patch should have only:

- pylint disables
- docstring changes
- whitespace changes

Signed-off-by: Iustin Pop
Reviewed-by: Olivier Tharan

07b8a2b5 01/04/2010 10:42 am Iustin Pop

Fix use of the logging functions

The logging functions expand the arguments themselves, thus it's safer
to let them do it rather than manual string formatting.

Also re-wraps one comment.

Signed-off-by: Iustin Pop
Reviewed-by: Olivier Tharan
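A short illustration of deferred formatting with Python's standard logging module (an assumed example, not code from the watcher): the message template and its arguments are stored on the LogRecord and only combined when the record is rendered, so a value that breaks eager %-formatting, such as a tuple, is still logged correctly. ListHandler and the logger name are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps emitted records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


log = logging.getLogger("watcher-example")
log.setLevel(logging.INFO)
log.propagate = False
handler = ListHandler()
log.addHandler(handler)

node = ("node1", "node2")  # a tuple value

# Deferred formatting: msg and args are stored separately on the record
# and only substituted when the message is rendered.
log.info("Nodes to check: %s", node)

# Eager formatting treats the tuple as the argument list and fails here.
try:
    "Nodes to check: %s" % node
except TypeError as exc:
    print("eager formatting failed:", exc)

record = handler.records[0]
print(record.msg, record.args)  # Nodes to check: %s (('node1', 'node2'),)
print(record.getMessage())      # Nodes to check: ('node1', 'node2')
```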

1f864b60 11/25/2009 04:30 pm Iustin Pop

Remove quotes from CommaJoin and convert to it

This patch removes the quotes from CommaJoin and converts most of the
callers (that I could find) to it. Since CommaJoin does str(i) for i in
param, we can remove these, thus simplifying slightly a few calls....

f154a7a3 11/05/2009 05:36 pm Michael Hanselmann

Add new “daemon-util” script to start/stop Ganeti daemons

Until now, Ganeti started and stopped its own daemons using custom functions.
To start, the daemon was just executed and then sent the appropriate signals to
stop it again. Init scripts would have to pay attention to the PID file and...

6d4e8ec0 09/18/2009 01:53 pm Iustin Pop

Make ganeti-watcher use the standard debug option

Signed-off-by: Iustin Pop
Reviewed-by: Michael Hanselmann

3753b2cb 08/26/2009 07:09 pm Michael Hanselmann

ganeti-watcher: Don't run if paused

Signed-off-by: Michael Hanselmann
Reviewed-by: Iustin Pop

83052f9e 07/24/2009 03:05 pm Guido Trotter

Remove <DAEMON>_PID constants

The <DAEMON>_PID constants were created to reference a daemon pid file,
but actually contain a daemon's name, because the various functions that
work with pidfiles abstract the filename from the daemon name
themselves. Removing the constants and using the actual daemon name...

c4f0219c 05/25/2009 05:07 pm Iustin Pop

watcher: automatically restart noded/rapi

This patch makes the watcher automatically restart the node and rapi
daemons, if they are not running (as per the PID file).

This is not an exhaustive test; a better one would be a TCP connect to the
port, and an even better one a simple protocol ping (e.g. get / for rapi...

24edc6d4 05/25/2009 02:19 pm Iustin Pop

watcher: handle full and drained queue cases

Currently the watcher is broken when the queue is full, thus not
fulfilling its job as a queue cleaner. It also doesn't handle the
drained-queue status nicely.

This patch does a few changes:
- first archive jobs, and only after submit jobs; this fixes the case...

f226f085 05/20/2009 04:14 pm Guido Trotter

Merge branch 'master' into next

Signed-off-by: Guido Trotter

78f44650 05/20/2009 04:00 pm Iustin Pop

watcher: write the instance status to a file

This patch modifies the watcher to keep on-disk a file with the instance
status; this can be used from outside of ganeti to react to instances
being down (when the watcher cannot restart them).

Signed-off-by: Iustin Pop...

7dfb83c2 05/19/2009 02:28 pm Iustin Pop

watcher: try to restart the master if down

Bugs in either our code or in associated libraries can bring the master daemon
down, and this (due to the 2.0 architecture) stops all work on the cluster.

Since the watcher already does periodic checks on the cluster, we modify...

2c404217 04/06/2009 11:21 am Iustin Pop

Fix the output of watcher on non-master nodes

Currently the watcher spews error messages on non-master nodes. This
cleans it up.

Reviewed-by: imsnah

6dfcc47b 04/06/2009 11:21 am Iustin Pop

Change the watcher to use jobs instead of queries

As per the mailing list discussion, this patch changes the watcher to
use a single job (two opcodes) for getting the cluster state (node list
and instance list); it will then compute the needed actions based on...

cc962d58 03/09/2009 05:12 pm Iustin Pop

watcher: fix startup sequence locking the master

Currently, the watcher startup sequence does:
- open a luxi client
- get the instance list
- get the node boot ids
- open and lock the status file, and:
- archive jobs
- restart the down instances...

07813a9e 02/24/2009 05:25 pm Iustin Pop

Remove the extra_args parameter in instance start

This patch removes the extra_args parameter and instead switches the
instance to the HV_KERNEL_ARGS hypervisor option.

This is a big change, but it's a needed cleanup: this extra parameter on
all RPC calls is not generic and we also need to have a persistent value...

3448aa22 02/16/2009 01:08 pm Iustin Pop

watcher: fix checking of boot IDs

The recent change (commit 2151) to the watcher to make it handle offline
nodes also saves the offline attribute to the state file, but this is
not needed and also breaks the checking of the boot ID. This patch
simply removes it, restoring the correct behaviour....

f07521e5 02/16/2009 01:08 pm Iustin Pop

watcher: autoarchive old jobs

This patch adds auto-archiving of jobs older than 6 hours to the
watcher.

Reviewed-by: imsnah
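A sketch of the age test behind auto-archiving, assuming a hypothetical mapping from job ID to end timestamp. It only shows how jobs older than six hours might be selected, not how the watcher actually archives them; skipping unfinished jobs is also an assumption.

```python
import time

# Six hours in seconds, the threshold named in the commit message.
ARCHIVE_AGE = 6 * 60 * 60


def jobs_to_archive(job_end_times, now=None, max_age=ARCHIVE_AGE):
    """Return the sorted IDs of finished jobs older than max_age seconds.

    job_end_times: hypothetical dict of job ID -> end timestamp (seconds
    since the epoch); None marks a job that has not finished yet.
    """
    if now is None:
        now = time.time()
    # Assumption: unfinished jobs are never candidates for archiving.
    return sorted(job_id for job_id, end in job_end_times.items()
                  if end is not None and now - end > max_age)


if __name__ == "__main__":
    jobs = {1: 0, 2: 20000, 3: None, 4: 21601}
    print(jobs_to_archive(jobs, now=43200))  # [1, 2]
```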

ec79568d 02/04/2009 12:30 pm Iustin Pop

Implement lockless query operations

This patch adds the framework for, and enables lockless OpQueryInstances. This
means that instances will be shown in ERROR_up or ERROR_down state, even though
this is not an error (but just an in-progress job).

The framework is implemented as follows:...

4bffa7f7 01/13/2009 10:04 am Iustin Pop

Small typo in ganeti-watcher

Reviewed-by: imsnah

c41eea6e 12/11/2008 07:13 pm Iustin Pop

Fix epydoc format warnings

This patch should fix all outstanding epydoc parsing errors; as such, we
switch epydoc into verbose mode so that any new errors will be visible.

Reviewed-by: imsnah

cbfc4681 12/05/2008 04:58 am Iustin Pop

watcher: handle offline nodes better

This patch changes the LUQueryInstances to show a different state for
offline nodes and also modifies the watcher to understand the offline
state in its checks.

Reviewed-by: ultrotter

82d9caef 10/20/2008 03:50 pm Iustin Pop

Remove the logger.py module

Since now we use only one function from the logger module
(SetupLogging), we move it to utils.py (which is already imported by all
users of this function), and we remove the module.

Reviewed-by: imsnah

2859b87b 10/01/2008 08:36 pm Michael Hanselmann

Convert ganeti-watcher

Use RPC calls instead of ssconf.

Reviewed-by: iustinp

37b77b18 10/01/2008 12:27 pm Iustin Pop

Fix the watcher with down nodes

The watcher didn't handle the down nodes, fix this by ignoring (in
secondary node reboot checks) any node that doesn't return a boot id.

Reviewed-by: imsnah
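The boot-ID check, including this commit's rule of ignoring nodes that return no boot ID, might look like the sketch below. The dictionaries are hypothetical stand-ins for the state file contents and the fresh query results. On Linux, a per-boot identifier is available in /proc/sys/kernel/random/boot_id.

```python
def rebooted_nodes(saved_boot_ids, current_boot_ids):
    """Return the sorted names of nodes whose boot ID changed.

    saved_boot_ids:   hypothetical dict of node name -> boot ID from the
                      previous watcher run (the state file)
    current_boot_ids: hypothetical dict of node name -> boot ID just
                      queried; None marks a node that returned no boot ID
                      (e.g. because it is down)
    """
    changed = []
    for node, boot_id in current_boot_ids.items():
        # Nodes without a boot ID are skipped, not treated as rebooted.
        if boot_id is None:
            continue
        old = saved_boot_ids.get(node)
        # Assumption: a node with no saved boot ID has no baseline yet,
        # so it is not reported as rebooted.
        if old is not None and old != boot_id:
            changed.append(node)
    return sorted(changed)


if __name__ == "__main__":
    saved = {"a": "1", "b": "2", "c": "3"}
    current = {"a": "1", "b": "9", "c": None, "d": "5"}
    print(rebooted_nodes(saved, current))  # ['b']
```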

b7309a0d 10/01/2008 12:27 pm Iustin Pop

Fix the watcher not restarting instance bug

The watcher was using conflicting attributes of the instance:
- it queried the admin_/oper_state, which are booleans
- but it compared those to the status (which is a text field)

The code was changed to query the aggregated 'status' field, as that...

5188ab37 10/01/2008 12:27 pm Iustin Pop

Remove last use of utils.RunCmd from the watcher

The watcher has one last use of ganeti commands as opposed to sending
requests via luxi. The patch changes this to use the cli functions.

The patch also has two other changes:
- fix the docstring for OpVerifyDisks (found out while converting...

e125c67c 08/07/2008 04:03 pm Michael Hanselmann

Use API instead of command line utilities in watcher

Reviewed-by: iustinp

59f187eb 07/30/2008 03:32 pm Iustin Pop

Unify SetupDaemon/SetupLogging

The 'old-style' info, error, debug logs do not make much sense. This
patch unifies the SetupLogging and SetupDaemon functions. As a result,
all the commands log to a 'commands.log' file.

The patch also changes the log setup to keep going if there's an error...

eb0f0ce0 07/10/2008 03:38 pm Michael Hanselmann

Move watcher's LockFile function to utils

Reviewed-by: iustinp

26517d45 07/04/2008 07:01 pm Iustin Pop

Fix some issues with the watcher

This patch fixes two bugs:
- the state file is not saved because we use the method for checking
for updated data
- in two places 'Error' was used instead of 'Exception', which breaks
error handling

Additionally:...

3b316acb 07/03/2008 03:06 pm Iustin Pop

Add custom logging setup for daemons

It's better for daemons if:
- they log only to one log file
- the log level is included
- for debug runs, the filename/line number is included

This patch moves the custom formatter from the watcher to the logging...

7bca53e4 06/18/2008 03:32 pm Michael Hanselmann

ganeti-watcher: Replace custom exceptions with ganeti.error.*

Reviewed-by: iustinp

2fb96d39 06/18/2008 03:31 pm Michael Hanselmann

ganeti-watcher: Don't write file if data didn't change

This is the safest way to detect changes and the amount of data
is small, so keeping a copy around is cheap enough.

Reviewed-by: iustinp

b76f660d 06/18/2008 03:31 pm Michael Hanselmann

ganeti-watcher: Rename WatcherState.data to WatcherState._data

Cleanup: _data is private and should not be modified from outside
of this class.

Reviewed-by: iustinp

1b052f42 06/18/2008 03:31 pm Michael Hanselmann

Don't log SystemExit exception in ganeti-watcher

Reviewed-by: iustinp

fc428e32 06/18/2008 03:31 pm Michael Hanselmann

Replace watcher state file atomically

- Lock it before renaming
- Code cleanup; close() automatically unlocks it

Reviewed-by: iustinp
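A sketch of an atomic, change-only write of a JSON state file, combining this commit with 2fb96d39 above. It assumes the state is a JSON-serializable dict (the state file format became JSON in 5a3103e9) and omits the locking step mentioned here; write_state_if_changed is a hypothetical name.

```python
import json
import os
import tempfile


def write_state_if_changed(path, data, previous):
    """Atomically replace the JSON file at path unless data is unchanged.

    previous: hypothetical copy of the data as last read or written.
    Returns True if the file was rewritten.
    """
    # Comparing against a kept copy avoids needless rewrites.
    if data == previous:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory (same filesystem),
    # so os.replace() is an atomic rename: readers see the old or the new
    # file, never a partial one.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        os.unlink(tmp_path)
        raise
    return True
```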

78f3bd30 06/18/2008 03:30 pm Michael Hanselmann

Write ganeti-watcher status file even if something failed

Reviewed-by: iustinp

67fe61c4 06/18/2008 03:29 pm Michael Hanselmann

Use ganeti.serializer module in ganeti-watcher

Reviewed-by: ultrotter

438b45d4 06/18/2008 03:29 pm Michael Hanselmann

Replace custom logging code in watcher with logging module

- Log timestamp for all messages
- Write everything to logfile and optionally to stderr
- Log messages are no longer buffered, allowing a user to see progress

Reviewed-by: ultrotter

eee1fa2d 05/13/2008 12:48 pm Iustin Pop

Watcher: do not activate disks for started instances

Currently the watcher runs first the instance startup and then the
boot-id method of disk reactivation. However, regardless of whether
a node has rebooted or not, if we just started an instance, there's...

0c0f834d 05/13/2008 12:48 pm Iustin Pop

Watcher: do not activate disks for admin_down

Currently the watcher does activate disks (via bootid mechanisms) even
for admin_down instances. This patch logs and skips over these
instances.

Reviewed-by: ultrotter

d2f311db 12/12/2007 03:13 pm Iustin Pop

Modify ‘ganeti-watcher’ to run verify-disks

This patch modifies the watcher to run the ‘gnt-cluster verify-disks’
command and to log its output (if any).

Reviewed-by: imsnah

f4bc1f2c 12/03/2007 04:03 pm Michael Hanselmann

Various code style fixes for strings.

- When line wrapping is needed, move spaces to the next line.
- Remove embedded line breaks from error messages.

Reviewed-by: schreiberal

7b195d9b 11/13/2007 09:31 pm Michael Hanselmann

Small changes and fixes in ganeti-watcher.

- Use constants for keys.
- Fix bug through which automatic instance restarts wouldn't be limited

Reviewed-by: iustinp

781b2b2b 10/10/2007 04:05 pm Michael Hanselmann

Exit ganeti-watcher cleanly when there's no configuration.

Reviewed-by: iustinp

5a3103e9 10/10/2007 11:12 am Michael Hanselmann

Detect node restarts and reactivate disks.

- Change format of watcher state file to JSON.
- Move log path for watcher script to constants.py.

Reviewed-by: iustinp

89e1fc26 09/21/2007 04:37 pm Iustin Pop

Remove requirement that host names are FQDN

We currently require that hostnames are FQDNs, not short names
(node1.example.com instead of node1). We can allow short names as long
as:
- we always resolve the names as returned by socket.gethostname()
- we rely on having a working resolver...

3ecf6786 08/14/2007 06:17 pm Iustin Pop

Style changes for pep-8 and python-3000 compliance.

This changes the raising of exceptions from:
raise Exception, value
to
raise Exception(value)

as the first form will be removed in python-3000 and the second form is
preferred now.

The changes also involve a few cases of changing from raising standard...

098c0958 07/26/2007 02:40 pm Michael Hanselmann

Comment formatting updates.

Reviewed-by: iustinp

38242904 07/25/2007 11:05 am Iustin Pop

Make the ganeti-watcher exit gracefully if it's not run on the master.

Reviewed-by: imsnah

a8083063 07/16/2007 04:39 pm Iustin Pop

Initial commit.