History | View | Annotate | Download (23.8 kB)
Merge branch 'devel-2.2'
Disable the RAPI CA checks in watcher
Since the RAPI certificate is not necessarily self-signed, and wecurrently don't have any configuration variable for the real CA file, wedisable for now the CA checks. This fixes the 'restart RAPI every 5minutes' problem with non-self-signed certs....
Support for resolving hostnames to IPv6 addresses
This patch enables IPv6 name resolution by using socket.getaddrinfoinstead of socket.gethostbyname_ex.
It renames the HostInfo class to Hostname and unifies its use throughoutthe code. This is achieved by using static calls where no object is...
watcher: smarter handling of instance records
This patch implements a few changes to the instance handling. First, oldinstances which no longer exist on the cluster are removed from thestate file, to keep things clean.
Second, the instance restart counters are reset every 8 hours, since...
Introduce lib/netutils.py
This patch moves network utility functions to a dedicated module.
Signed-off-by: Manuel Franceschini <livewire@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
RAPI client: Switch to pycURL
Currently the RAPI client uses the urllib2 and httplib modules fromPython's standard library. They're used with pyOpenSSL in a very fragileway, and there are known issues when receiving large responses from a RAPIserver....
Rename some constants to facilitate IPv6 support
Signed-off-by: Manuel Franceschini <livewire@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
ganeti-watcher should attempt to fix ganeti-rapi
Update ganeti-watcher so that it tests the master's RAPI port with asimple test (in this case GetVersion). If it fails, make one attemptat restarting ganeti-rapi and retest.
- daemons/ganeti-watcher: Test rapi and make one attempt at restarting it....
Make watcher request the max coverage
Since the actions are potentially destructive, we should try to get aconsistent view of the cluster, so it's better to get the most coveragepossible.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Watcher: automatic shutdown of orphan resources
This patch changes the watcher so that it maintains (on all nodes) thelist of instances and DRBD devices by shutting down ones that confddaemons indicate should not be running on this node.
Signed-off-by: Iustin Pop <iustin@google.com>...
Watcher: do not warn for missing hooks dir
If the hooks dir does not exist, do not warn needlessly. This is similarto commit a9b7e346 (for backend.py).
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Watcher: fix some doc typos
Switch from os.path.join to utils.PathJoin
This passes a full burnin with lots of instances, and should be safe aswe mostly to join a known root (various constants) to a run-timevariable.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
watcher: Acquire lock early and give more friendly message
By opening the lock file early, other programs can lock thestate file to prevent ganeti-watcher from restarting daemons.Using the pause feature is inherently prone to race conditions.
Before a traceback was logged when the lock file couldn't...
Move watcher's EnsureDaemon function to utils
This is going to be used from the nbma repository, to ensure that thenld daemon is running.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Add watcher hooks
These hooks are run on all nodes, after the "base" daemons are started.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Abstract starting the node daemons
We're using a separate function for this, as we're going to add somefunctionality to this feature.
ganeti-watcher: remove unused Indent function
Catch disk activation errors in watcher
If activating disks fails for some reason, the watcher didn'tcatch the exception. With this patch it's caught and logged.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
ganeti-watcher: ensure confd is running as well
Ganeti-confd should be running on all 2.1 nodes.
Fix unused imports or add silences where needed
In some cases pylint doesn't parse the import correctly, so we addsilences; but there are also many cases of unused imports, which wesimply remove.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
daemons: handle arguments correctly and uniformly
Of all daemons, only rapi did abort when given argument. None of ourdaemons use any arguments, but they accepted them blindly. This is avery bad experience for the user.
This patch adds checking and exiting in all daemons, in a uniform way....
Remove more unused variables
This removes unused variables in the rest of the code (outside lib/).
Add targeted pylint disables
This patch should have only:
- pylint disables- docstring changes- whitespace changes
Fix use of the logging functions
The logging functions expand the arguments themselves, thus it's saferto let them do it rather than manual string formatting.
Also re-wraps one comment.
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of thecallers (that I could find) to it. Since CommaJoin does str(i) for i inparam, we can remove these, thus simplifying slightly a few calls....
Add new “daemon-util” script to start/stop Ganeti daemons
Until now, Ganeti started and stopped its own daemons using custom functions.To start, the daemon was just executed and then sent the appropriate signals tostop it again. Init scripts would have to pay attention to the PID file and...
Make ganeti-watcher use the standard debug option
ganeti-watcher: Don't run if paused
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file,but actually contain a daemon's name, because the various functions thatwork with pidfiles abstract the filename from the daemon namethemselves. Removing the constants and using the actual daemon name...
watcher: automatically restart noded/rapi
This patch makes the watcher automatically restart the node and rapidaemons, if they are not running (as per the PID file).
This is not an exhaustive test; a better one would be TCP connect to theport, and an even better one a simple protocol ping (e.g. get / for rapi...
watcher: handle full and drained queue cases
Currently the watcher is broken when the queue is full, thus notfulfilling its job as a queue cleaner. It also doesn't handle nicely thequeue drained status.
This patch does a few changes: - first archive jobs, and only after submit jobs; this fixes the case...
Merge branch 'master' into next
Signed-off-by: Guido Trotter <ultrotter@google.com>
watcher: write the instance status to a file
This patch modifies the watcher to keep on-disk a file with the instancestatus; this can be used from outside of ganeti to react to instancesbeing down (when the watcher cannot restart them).
watcher: try to restart the master if down
Bugs in either our code or in associated libraries can bring the master daemondown, and this (due to the 2.0 architecture) stops all work on the cluster.
Since the watcher already does periodic checks on the cluster, we modify...
Fix the output of watcher on non-master nodes
Currently the watcher spews errors message on non-master nodes. Thiscleans it up.
Reviewed-by: imsnah
Change the watcher to use jobs instead of queries
As per the mailing list discussion, this patch changes the watcher touse a single job (two opcodes) for getting the cluster state (node listand instance list); it will then compute the needed actions based on...
watcher: fix startup sequence locking the master
Currently, the watcher startup sequence does: - open a luxi client - get the instance list - get the node boot ids - open and lock the status file, and: - archive jobs - restart the down instances...
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches theinstance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter onall RPC calls is not generic and we also need to have a persistent value...
watcher: fix checking of boot IDs
The recent change (commit 2151) to the watcher to make it handle offlinenodes also saves the offline attribute to the state file, but this isnot needed and also breaks the checking of the boot ID. This patchsimply removes it, restoring the correct behaviour....
watcher: autoarchive old jobs
This patch adds auto-archiving of jobs older than 6 hours to thewatcher.
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. Thismeans that instances will be shown in ERROR_up or ERROR_down state, even thoughthis is not an error (but just an in-progress job).
The framework is implemented as follows:...
Small typo in ganeti-watcher
Fix epydoc format warnings
This patch should fix all outstanding epydoc parsing errors; as such, weswitch epydoc into verbose mode so that any new errors will be visible.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state foroffline nodes and also modifies the watcher to understand the offlinestate in its checks.
Reviewed-by: ultrotter
Remove the logger.py module
Since now we use only one function from the logger module(SetupLogging), we move it to utils.py (which is already imported by allusers of this function), and we remove the module.
Convert ganeti-watcher
Use RPC calls instead of ssconf.
Reviewed-by: iustinp
Fix the watcher with down nodes
The watcher didn't handle the down nodes, fix this by ignoring (insecondary node reboot checks) any node that doesn't return a boot id.
Fix the watcher not restarting instance bug
The watcher was using conflicting attributes of the instance: - it queried the admin_/oper_state, which are booleans - but it compared those to the status (which is a text field)
The code was changed to query the aggregated 'status' field, as that...
Remove last use of utils.RunCmd from the watcher
The watcher has one last use of ganeti commands as opposed to sendingrequests via luxi. The patch changes this to use the cli functions.
The patch also has two other changes: - fix the docstring for OpVerifyDisks (found out while converting...
Use API instead of command line utilities in watcher
Unify SetupDaemon/SetupLogging
The 'old-style' info, error, debug logs do not make much sense. Thispatch unifies the SetupLogging and SetupDaemon functions. As a result,all the commands logs to a 'commands.log' file.
The patch also changes the log setup to keep going if there's an error...
Move watcher's LockFile function to utils
Fix some issues with the watcher
This patch fixes two bugs: - the state file is not saved because we use the method for checking for udpated data - in two places 'Error' was used instead of 'Exception', which breaks error handling
Additionally:...
Add custom logging setup for daemons
It's better for daemons if: - they log only to one log file - the log level is included - for debug runs, the filename/line number is included
This patch moves the custom formatter from the watcher to the logging...
ganeti-watcher: Replace custom exceptions with ganeti.error.*
ganeti-watcher: Don't write file if data didn't change
This is the safest way to detect changes and the amount of datais small, so keeping a copy around is cheap enough.
ganeti-watcher: Rename WatcherState.data to WatcherState._data
Cleanup: _data is private and should not be modified from outsideof this class.
Don't log SystemExit exception in ganeti-watcher
Replace watcher state file atomically
- Lock it before renaming- Code cleanup; close() automatically unlocks it
Write ganeti-watcher status file even if something failed
Use ganeti.serializer module in ganeti-watcher
Replace custom logging code in watcher with logging module
- Log timestamp for all messages- Write everything to logfile and optionally to stderr- Log messages are no longer buffered, allowing a user to see progress
Watcher: do not activate disks for started instances
Currently the watcher runs first the instance startup and then theboot-id method of disk reactivation. However, irrelevant of the factthat a node has rebooted or not, if we just started an instance, there's...
Watcher: do not activate disks for admin_down
Currently the watcher does activate disks (via bootid mechanisms) evenfor admin_down instances. This patch logs and skips over theseinstances.
Modify ‘ganeti-watcher’ to run verify-disks
This patch modifies the watcher to run the ‘gnt-cluster verify-disks’command and to log its output (if any).
Various code style fixes for strings.
- When line wrapping is needed, move spaces to the next line.- Remove embedded line breaks from error messages.
Reviewed-by: schreiberal
Small changes and fixes in ganeti-watcher.
- Use constants for keys.- Fix bug through which automatic instance restarts wouldn't be limited
Exit ganeti-watcher cleanly when there's no configuration.
Detect node restarts and reactivate disks.
- Change format of watcher state file to JSON.- Move log path for watcher script to constants.py.
Remove requirement that host names are FQDN
We currently require that hostnames are FQDN not short names(node1.example.com instead of node1). We can allow short names as longas: - we always resolve the names as returned by socket.gethostname() - we rely on having a working resolver...
Style changes for pep-8 and python-3000 compliance.
This changes the raising of exceptions from: raise Exception, valueto raise Exception(value)
as the first form will be removed in python-3000 and the second form ispreferred now.
The changes also involve a few cases of changing from raising standard...
Comment formatting updates.
Make the ganeti-watcher exit gracefully if it's not run on the master.
Initial commit.