Statistics
| Branch: | Tag: | Revision:

root / daemons / ganeti-watcher @ 90b54c26

History | View | Annotate | Download (14.4 kB)

# Date Author Comment
7dfb83c2 05/19/2009 02:28 pm Iustin Pop

watcher: try to restart the master if down

Bugs in either our code or in associated libraries can bring the master daemon
down, and this (due to the 2.0 architecture) stops all work on the cluster.

Since the watcher already does periodic checks on the cluster, we modify...

2c404217 04/06/2009 11:21 am Iustin Pop

Fix the output of watcher on non-master nodes

Currently the watcher spews errors message on non-master nodes. This
cleans it up.

Reviewed-by: imsnah

6dfcc47b 04/06/2009 11:21 am Iustin Pop

Change the watcher to use jobs instead of queries

As per the mailing list discussion, this patch changes the watcher to
use a single job (two opcodes) for getting the cluster state (node list
and instance list); it will then compute the needed actions based on...

cc962d58 03/09/2009 05:12 pm Iustin Pop

watcher: fix startup sequence locking the master

Currently, the watcher startup sequence does:
- open a luxi client
- get the instance list
- get the node boot ids
- open and lock the status file, and:
- archive jobs
- restart the down instances...

07813a9e 02/24/2009 05:25 pm Iustin Pop

Remove the extra_args parameter in instance start

This patch removes the extra_args parameter and instead switches the
instance to the HV_KERNEL_ARGS hypervisor option.

This is a big change, but it's a needed cleanup, this extra parameter on
all RPC calls is not generic and we also need to have a persistent value...

3448aa22 02/16/2009 01:08 pm Iustin Pop

watcher: fix checking of boot IDs

The recent change (commit 2151) to the watcher to make it handle offline
nodes also saves the offline attribute to the state file, but this is
not needed and also breaks the checking of the boot ID. This patch
simply removes it, restoring the correct behaviour....

f07521e5 02/16/2009 01:08 pm Iustin Pop

watcher: autoarchive old jobs

This patch adds auto-archiving of jobs older than 6 hours to the
watcher.

Reviewed-by: imsnah

ec79568d 02/04/2009 12:30 pm Iustin Pop

Implement lockless query operations

This patch adds the framework for, and enables lockless OpQueryInstances. This
means that instances will be shown in ERROR_up or ERROR_down state, even though
this is not an error (but just an in-progress job).

The framework is implemented as follows:...

4bffa7f7 01/13/2009 10:04 am Iustin Pop

Small typo in ganeti-watcher

Reviewed-by: imsnah

c41eea6e 12/11/2008 07:13 pm Iustin Pop

Fix epydoc format warnings

This patch should fix all outstanding epydoc parsing errors; as such, we
switch epydoc into verbose mode so that any new errors will be visible.

Reviewed-by: imsnah

cbfc4681 12/05/2008 04:58 am Iustin Pop

watcher: handle offline nodes better

This patch changes the LUQueryInstances to show a different state for
offline nodes and also modifies the watcher to understand the offline
state in its checks.

Reviewed-by: ultrotter

82d9caef 10/20/2008 03:50 pm Iustin Pop

Remove the logger.py module

Since now we use only one function from the logger module
(SetupLogging), we move it to utils.py (which is already imported by all
users of this function), and we remove the module.

Reviewed-by: imsnah

2859b87b 10/01/2008 08:36 pm Michael Hanselmann

Convert ganeti-watcher

Use RPC calls instead of ssconf.

Reviewed-by: iustinp

37b77b18 10/01/2008 12:27 pm Iustin Pop

Fix the watcher with down nodes

The watcher didn't handle the down nodes, fix this by ignoring (in
secondary node reboot checks) any node that doesn't return a boot id.

Reviewed-by: imsnah

b7309a0d 10/01/2008 12:27 pm Iustin Pop

Fix the watcher not restarting instance bug

The watcher was using conflicting attributes of the instance:
- it queried the admin_/oper_state, which are booleans
- but it compared those to the status (which is a text field)

The code was changed to query the aggregated 'status' field, as that...

5188ab37 10/01/2008 12:27 pm Iustin Pop

Remove last use of utils.RunCmd from the watcher

The watcher has one last use of ganeti commands as opposed to sending
requests via luxi. The patch changes this to use the cli functions.

The patch also has two other changes:
- fix the docstring for OpVerifyDisks (found out while converting...

e125c67c 08/07/2008 04:03 pm Michael Hanselmann

Use API instead of command line utilities in watcher

Reviewed-by: iustinp

59f187eb 07/30/2008 03:32 pm Iustin Pop

Unify SetupDaemon/SetupLogging

The 'old-style' info, error, debug logs do not make much sense. This
patch unifies the SetupLogging and SetupDaemon functions. As a result,
all the commands logs to a 'commands.log' file.

The patch also changes the log setup to keep going if there's an error...

eb0f0ce0 07/10/2008 03:38 pm Michael Hanselmann

Move watcher's LockFile function to utils

Reviewed-by: iustinp

26517d45 07/04/2008 07:01 pm Iustin Pop

Fix some issues with the watcher

This patch fixes two bugs:
- the state file is not saved because we use the method for checking
for udpated data
- in two places 'Error' was used instead of 'Exception', which breaks
error handling

Additionally:...

3b316acb 07/03/2008 03:06 pm Iustin Pop

Add custom logging setup for daemons

It's better for daemons if:
- they log only to one log file
- the log level is included
- for debug runs, the filename/line number is included

This patch moves the custom formatter from the watcher to the logging...

7bca53e4 06/18/2008 03:32 pm Michael Hanselmann

ganeti-watcher: Replace custom exceptions with ganeti.error.*

Reviewed-by: iustinp

2fb96d39 06/18/2008 03:31 pm Michael Hanselmann

ganeti-watcher: Don't write file if data didn't change

This is the safest way to detect changes and the amount of data
is small, so keeping a copy around is cheap enough.

Reviewed-by: iustinp

b76f660d 06/18/2008 03:31 pm Michael Hanselmann

ganeti-watcher: Rename WatcherState.data to WatcherState._data

Cleanup: _data is private and should not be modified from outside
of this class.

Reviewed-by: iustinp

1b052f42 06/18/2008 03:31 pm Michael Hanselmann

Don't log SystemExit exception in ganeti-watcher

Reviewed-by: iustinp

fc428e32 06/18/2008 03:31 pm Michael Hanselmann

Replace watcher state file atomically

- Lock it before renaming
- Code cleanup; close() automatically unlocks it

Reviewed-by: iustinp

78f3bd30 06/18/2008 03:30 pm Michael Hanselmann

Write ganeti-watcher status file even if something failed

Reviewed-by: iustinp

67fe61c4 06/18/2008 03:29 pm Michael Hanselmann

Use ganeti.serializer module in ganeti-watcher

Reviewed-by: ultrotter

438b45d4 06/18/2008 03:29 pm Michael Hanselmann

Replace custom logging code in watcher with logging module

- Log timestamp for all messages
- Write everything to logfile and optionally to stderr
- Log messages are no longer buffered, allowing a user to see progress

Reviewed-by: ultrotter

eee1fa2d 05/13/2008 12:48 pm Iustin Pop

Watcher: do not activate disks for started instances

Currently the watcher runs first the instance startup and then the
boot-id method of disk reactivation. However, irrelevant of the fact
that a node has rebooted or not, if we just started an instance, there's...

0c0f834d 05/13/2008 12:48 pm Iustin Pop

Watcher: do not activate disks for admin_down

Currently the watcher does activate disks (via bootid mechanisms) even
for admin_down instances. This patch logs and skips over these
instances.

Reviewed-by: ultrotter

d2f311db 12/12/2007 03:13 pm Iustin Pop

Modify ‘ganeti-watcher’ to run verify-disks

This patch modifies the watcher to run the ‘gnt-cluster verify-disks’
command and to log its output (if any).

Reviewed-by: imsnah

f4bc1f2c 12/03/2007 04:03 pm Michael Hanselmann

Various code style fixes for strings.

- When line wrapping is needed, move spaces to the next line.
- Remove embedded line breaks from error messages.

Reviewed-by: schreiberal

7b195d9b 11/13/2007 09:31 pm Michael Hanselmann

Small changes and fixes in ganeti-watcher.

- Use constants for keys.
- Fix bug through which automatic instance restarts wouldn't be limited

Reviewed-by: iustinp

781b2b2b 10/10/2007 04:05 pm Michael Hanselmann

Exit ganeti-watcher cleanly when there's no configuration.

Reviewed-by: iustinp

5a3103e9 10/10/2007 11:12 am Michael Hanselmann

Detect node restarts and reactivate disks.

- Change format of watcher state file to JSON.
- Move log path for watcher script to constants.py.

Reviewed-by: iustinp

89e1fc26 09/21/2007 04:37 pm Iustin Pop

Remove requirement that host names are FQDN

We currently require that hostnames are FQDN not short names
(node1.example.com instead of node1). We can allow short names as long
as:
- we always resolve the names as returned by socket.gethostname()
- we rely on having a working resolver...

3ecf6786 08/14/2007 06:17 pm Iustin Pop

Style changes for pep-8 and python-3000 compliance.

This changes the raising of exceptions from:
raise Exception, value
to
raise Exception(value)

as the first form will be removed in python-3000 and the second form is
preferred now.

The changes also involve a few cases of changing from raising standard...

098c0958 07/26/2007 02:40 pm Michael Hanselmann

Comment formatting updates.

Reviewed-by: iustinp

38242904 07/25/2007 11:05 am Iustin Pop

Make the ganeti-watcher exit gracefully if it's not run on the master.

Reviewed-by: imsnah

a8083063 07/16/2007 04:39 pm Iustin Pop

Initial commit.