History | View | Annotate | Download (18 kB)
Add a hint to masterd for inconsistent clusters
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Move RunInSeparateProcess to ganeti.utils
This function could be useful in other places and thisway we can easily unittest it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
workerpool: Make worker ID alphanumeric
Having a proper name instead of just a number makes debuggingeasier.
Introduce a Luxi call for GetTags
This changes from submitting jobs to get the tags (in cli scripts) toqueries, which (since the tags query is a cheap one) should be muchfaster.
The tags queries are already done without locks (in the generic querypaths for instances/nodes/cluster), so this shouldn't break tags query...
Further pylint disables, mostly for Unused args
Many of our functions have to follow a given API, and thus we have tokeep a given signature, but pylint doesn't understand this. Therefore,we silence this warning.
The patch does a few other cleanups.
Signed-off-by: Iustin Pop <iustin@google.com>...
daemons: handle arguments correctly and uniformly
Of all daemons, only rapi did abort when given argument. None of ourdaemons use any arguments, but they accepted them blindly. This is avery bad experience for the user.
This patch adds checking and exiting in all daemons, in a uniform way....
Add targeted pylint disables
This patch should have only:
- pylint disables- docstring changes- whitespace changes
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Remove quotes from CommaJoin and convert to it
This patch removes the quotes from CommaJoin and converts most of thecallers (that I could find) to it. Since CommaJoin does str(i) for i inparam, we can remove these, thus simplifying slightly a few calls....
Processor: support a unique execution id
When the processor is executing a job, it can export the execution id toits callers. This is not supported for Queries, as they're not executedin a job.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
config.Add{Node,Instance}: get the ec id
This is ok because adding a node or instance cannot happen in a query.
We get the ec id from the LU and pass it to _EnsureUUID, which willthen for now not use it.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argumentstyle. Any leftovers as one-argument are not breaking anything, justlosing information about the errors.
Remove ‘-u’ from masterd shebang
This is not needed anymore - the original change was more than a yearago when masterd was in its incipient phase.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Remove RpcResult.RemoteFailMsg completely
Keep lock status with every job
This can be useful for debugging locking problems.
Move OpCode processor callbacks into separate class
There are two major arguments for this:- There will be more callbacks (e.g. for lock debugging) and extending the parameter list is a lot of work.- In the jqueue module this allows us to keep per-job or per-opcode variables in...
gnt-cluster watcher: Show more information
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Move the luxi error handling into errors.py
Currently the luxi error handling is hardcoded as special encoding onthe masterd-side and special decoding on the client side. This patchmoves it to errors.py such that other parts of the code can reuse thesame encoding....
Add file to pause watcher for a certain duration
This can be used during maintenance work.
ganeti-masterd: Master voting in separate process
One shouldn't fork a Python process after using threads. Mastervoting is done before forking (utils.Daemonize), but it also usesthreads. Hence it's now called from a separate process.
This patch also fixes the check function to actually exit if...
ganeti-masterd: Add helper to run function in separate process
This will be used to do the master voting.
Style fixes for ganeti-*
Convert ganeti-masterd to @utils.SignalHandled
Merge commit 'origin/next' into branch-2.1
Remove a few unused imports from noded/masterd
Signed-off-by: Guido Trotter <ultrotter@google.com>
Merge branch 'master' into next
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,ganeti-masterd will be started with the new --no-voting --yes-do-itoptions.
This new option is set to True only on masterfailover, when no_voting is...
Collapse daemon's main function
With three ganeti daemons, and one or two more coming, the daemon's mainfunction started becoming too much cut&pasted code. Collapsing most ofit in a daemon.GenericMain function. Some more code could be collapsedbetween the two http-based daemons, but since the new daemons won't be...
Slightly abstract the daemon logfile lookup
The original LOG_<DAEMON_NAME> constants for daemon logfiles are gone.In their place there is a DAEMONS_LOGFILES dict, indexed by daemon name.
This is a minor change with the objective to uniform most of thedaemon's main() functions code, which is very similar one to the other....
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file,but actually contain a daemon's name, because the various functions thatwork with pidfiles abstract the filename from the daemon namethemselves. Removing the constants and using the actual daemon name...
Merge branch 'next' into branch-2.1
Remove references to utils.debug
Various modules set it to True when called in debugging mode, but theutils module supports no such global.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patchadds a new luxi call for multi-job submit; the advantage is that all thejobs are added in the queue and only after the workers can startprocessing them....
Conflicts:
daemons/ganeti-masterd...
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and tojust get the master name we can use that. This is the only usage ofSimpleConfigReader currently, but we're not going to delete the class,as new usages will come in for ganeti-confd (in 2.1). Using it there,...
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a--no-voting masterfailover.
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would returnEAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Convert node_start_master to new style result
This is used in multiple places outside cmdlib.py, so it's a moreinteresting patch.
Fix luxi serialization in ganeti-masterd
Currently, lib/luxi.py used lib/serializer.py for encoding/decodingmessages, but the master daemon uses directly the simplejson module.This is wrong as any non-trivial change to serializer.py will break themaster daemon....
Disable synchronous (locking) queries
This patch raises an error in the master daemon in case the userrequests a locking query; accordingly, all clients were modified to sendonly lockless queries. This is short-term fix, for proper fix theclients should be modified to submit a job when the user request a...
Add some more debugging info to masterd
This patch will log data about queries, which are today completelyinvisible (at the default log level) in the master log file.
Reviewed-by: imsnah
Create runtime dir in bootstrap
Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster inittime. This patch creates it in InitCluster just before hv parameterchecking. Since the code to make list of directories is already repeatedtwice in the code, and this would be the third time, we abstract it into...
master daemon: allow skipping the voting process
This patch introduces a 'force' mode for the master daemon startup wherethe voting process is not done, but the user has to confirm manually thestartup (before forking, of course).
Add one new luxi query: cluster info
This is the last query that RAPI executes via opcodes and is purelystatic (config values only). As such, we can convert it safely to aquery instead of job.
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. Thismeans that instances will be shown in ERROR_up or ERROR_down state, even thoughthis is not an error (but just an in-progress job).
The framework is implemented as follows:...
Fix some more pylint errors
Two are real errors (invalid names) and one is style error (overridingname from outer scope).
Reviewed-by: ultrotter
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) thelogical execution thread of jobs. This is due to the fact that we don'tlog the thread name (so we lose the association of log messages to jobs)...
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard towork around: - logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time,causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The userwill see how many jobs are left unchecked....
Fix epydoc format warnings
This patch should fix all outstanding epydoc parsing errors; as such, weswitch epydoc into verbose mode so that any new errors will be visible.
Fix master failover
The ssconf files were not updated by the master failover. We need topush them, and since we already have RPC initialized, we can use thestandard ConfigWriter to do so - this will take care of both the configfile and the ssconf files....
ganeti-masterd: create RUN_GANETI_DIR as well
Since we're not sure ganeti-noded has started yet, we need to createRUN_GANETI_DIR before SOCKET_DIR as well, with the proper permissions.
Move the MASTER_SOCKET to SOCKET_DIR
Before it was in the abstract linux namespace, where unfortunately wecouldn't easily check from python the credentials of the connectingclients. Now we also have to remove the file on exit and when starting.
ganeti-masterd: create SOCKET_DIR
If SOCKET_DIR doesn't exist we create it in the master daemon, beforetrying to put a socket inside it.
ganeti-masterd: Remove PID file at the end
Removing the PID file should be the last thing done. This patch makessure it's also removed when master.server_cleanup() throws an exception.
Also initialize logging only after writing the PID file.
Reviewed-by: iustinp
Reuse HTTP client pool for RPC
ganeti-masterd: Add initialization and shutdown of RPC pool. It needsto be shutdown before forking.
ganeti.cli: Add decorator function to initialize and shutdown RPC pool.
ganeti.rpc: Add functions to initialize and shutdown RPC pool. Throw...
Convert the job queue rpcs to address-based
The two main multi-node job queue RPC calls (jobqueue_update,jobqueue_rename) are converted to address-based calls, in order to speedup queue changes. For this, we need to change the _nodes attribute onthe jobqueue to be a dict {name: ip}, instead of a set....
Remove the logger.py module
Since now we use only one function from the logger module(SetupLogging), we move it to utils.py (which is already imported by allusers of this function), and we remove the module.
Improvements to the master startup checks
In order to account for future improvements to master failover, we movethe actual data gathering capabilities from ganeti-masterd intobootstrap.py, and we leave only the verification into masterd.
The verification procedure is then changed to retry multiple times (up...
Add an interface for the drain flag changes/query
This adds the set/reset in the jqueue and luxi modules, and a way toquery it in OpQueryConfigValues, and also the comand line interface forit:$ gnt-cluster queue infoThe drain flag is unset$ gnt-cluster queue drain...
Implement transport of ganeti errors across luxi
This patch adds a generic method to identify the ganeti error given itsclass name, and implements this across the luxi protocol.
Convert rpc module to RpcRunner
This big patch changes the call model used in internode-rpc fromstandalong function calls in the rpc module to via a RpcRunner class,that holds all the methods. This can be used in the future to enablesmarter processing in the RPC layer itself (some quick examples are not...
Implement job 'waiting' status
Background: when we have multiple jobs in the queue (more than just afew), many of the jobs (up to the number of threads) will be in state'running', although many of them could be actually blocked, waiting forsome locks. This is not good, as one cannot easily see what is...
Implement job auto-archiving
This patch adds a new luxi call that implements auto-archiving of jobsolder than a certain age (or -1 for all completed jobs), and the gnt-jobcommand that makes use of this (with 'all' for -1).
Convert ganeti-master
Use simpleconfig instead of ssconf.
Add new query to get cluster config values
This can be used to retrieve certain cluster config values fromwithin clients.
OpDumpClusterConfig was not used anywhere, hence I'm just reusingit. The way ConfigWriter.DumpConfig returned the configurationwas not thread-safe, anyway (no deepcopy)....
Implement master startup safety check
This is an initial version of the master startup checks. It's a veryrudimentary change, however in normal usage (an old master was started,the rest of the cluster is functioning normally) it will succeed inpreventing wrong startups....
Make WaitForJobChanges deal with long jobs
This patch alters the WaitForJobChanges luxi-RPC call to have aconfigurable timeout, so that the call behaves nicely with long jobsthat have no update.
We do this by adding a timeout parameter in the RPC call, and returning...
Make sure that client programs get all messages
This is a large patch, but I can't figure out how to split it withoutbreaking stuff. The old way of getting messages by always getting thelast one didn't bring all messages to the client if they were added...
Use Linux-specific way to name master socket
By using this Linux-specific way we don't have to care about removing thesocket file when quitting or starting (after an unclean shutdown). For amore detailed description, see the comment in the patch.
Reviewed-by: schreiberal
Add RPC call to wait for job changes
This way clients can react faster to status or message changes anddon't have to poll anymore.
Add query function for exports
Notify job queue about added/removed nodes
The job queue maintains its own node list and must be notifiedwhen nodes are added/removed.
Implement {Add,Readd,Remove}Node in GanetiContext
By doing this we've a central place which coordinates what needs to bedone when adding or removing nodes. Another patch will add calls intothe job queue.
Two log messages move to config.py.
When removing a node, node_leave_cluster is now called after it has...
jqueue: Don't pass the list of nodes to SubmitJob anymore
The job queue now maintains its own list and is updated whennodes are added or removed from the cluster.
masterd: Move job queue into context object
The job queue must be called from cmdlib when adding or removingnodes to the cluster. Moving it to the context objects makesthis possible.
Implement query for nodes
Implement query for instances
Queries don't create jobs and are more efficient. Log messagesare not yet stored anywhere.
Unify SetupDaemon/SetupLogging
The 'old-style' info, error, debug logs do not make much sense. Thispatch unifies the SetupLogging and SetupDaemon functions. As a result,all the commands logs to a 'commands.log' file.
The patch also changes the log setup to keep going if there's an error...
Rework master startup/shutdown/failover
This (big) patch reworks the master startup/shutdown and the fixes themaster failover.
What does the patch do?
For master start/stop: - remove the old ganeti-master script and its associated man page - moves the ip start/stop directly into the backend.(Start|Stop)Master...
Implement checking for the master role in rapi
This patch moves the CheckMaster function from ganeti-masterd to ssconf(most logical place, it cannot go in utils since we would have recursiveimports between ssconf and utils) and changes ganeti-rapi to also call...
Use constants for the pid file stems
Fix RPC parameters for {Cancel,Archive}Job
They aren't be tuples on the client side.
ganeti-masterd: write and remove pidfile
Distribute the queue serial file after each update
This patch adds distribution of the queue serial file after each writeto it (but before a new job is created and written with that ID, andbefore a response is returned, so we should be safe from crashes in...
Use new signal handler class in master daemon
Fix previous patch using workerpool in masterd
The function to stop a worker pool is TerminateWorkers(), not Shutdown().
Use workerpool in master daemon
Reusing threads instead of starting one for each request is more efficient.
Remove more old job queue code
Apparently I forgot to this code when removing the rest.
Fix double-logging in daemons
Currently, in debug mode, both the logfile handler and the stderrhandler will log debug messages. Since the stderr is redirected to thesame logfile (to catch non-logged errors), it means log entries aredoubled.
The patch adds an extra parameter to the logger.SetupDaemon() function...
Remove the old locking functions
This removes (hopefully) all traces of the old locking functions anduses.
Remove old job queue code
Change masterd/client RPC protocol
- Introduce abstraction class on client side- Use constants for method names- Adopt legacy function SubmitOpCode to use it
Make luxi RPC more flexible
- Use constants for dict entries- Handle exceptions on server side- Rename client function to CallMethod to match server side naming
Instantiate new job queue in master daemon
Add custom logging setup for daemons
It's better for daemons if: - they log only to one log file - the log level is included - for debug runs, the filename/line number is included
This patch moves the custom formatter from the watcher to the logging...