Make sure that client programs get all messages
This is a large patch, but I can't figure out how to split it without breaking stuff. The old way of getting messages by always getting the last one didn't bring all messages to the client if they were added...
Use Linux-specific way to name master socket
By using this Linux-specific way we don't have to care about removing the socket file when quitting or starting (after an unclean shutdown). For a more detailed description, see the comment in the patch.
Reviewed-by: schreiberal
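One Linux-specific mechanism with the property described above is the abstract socket namespace, where a name starting with a NUL byte never appears in the file system. Whether the patch uses exactly this mechanism is an assumption; the socket name and the helper below are hypothetical, and the sketch only runs on Linux.

```python
import socket

# Hypothetical name. The leading NUL byte selects Linux's abstract namespace,
# so no socket file is created and nothing has to be removed on shutdown.
MASTER_SOCKET = "\0example-master"


def open_master_socket(name=MASTER_SOCKET):
    """Bind and listen on an abstract UNIX socket (Linux only)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(name)
    sock.listen(1)
    return sock
```

The kernel releases the name as soon as the last descriptor is closed, so a restart after an unclean shutdown can bind the same name again without any cleanup step.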
Add RPC call to wait for job changes
This way clients can react faster to status or message changes and don't have to poll anymore.
Reviewed-by: ultrotter
Add query function for exports
Reviewed-by: iustinp
noded: Add RPC function to rename job queue files
This will be used to archive jobs.
noded: Add decorator for job queue lock
The lock will also be needed by another function.
Implement queue locking in node daemon
More logging for errors during noded RPC calls
Add job queue RPC functions
jobqueue_update: Uploads a job queue file's content to a node. The most common operation is to upload something that we already have in a string. Unlike in the upload_file function, the file is not read again when distributing changes, but content has to be passed...
Use API instead of command line utilities in watcher
Notify job queue about added/removed nodes
The job queue maintains its own node list and must be notified when nodes are added/removed.
Implement {Add,Readd,Remove}Node in GanetiContext
By doing this we've a central place which coordinates what needs to be done when adding or removing nodes. Another patch will add calls into the job queue.
Two log messages move to config.py.
When removing a node, node_leave_cluster is now called after it has...
jqueue: Don't pass the list of nodes to SubmitJob anymore
The job queue now maintains its own list and is updated when nodes are added or removed from the cluster.
masterd: Move job queue into context object
The job queue must be called from cmdlib when adding or removing nodes to the cluster. Moving it to the context object makes this possible.
Implement query for nodes
Implement query for instances
Queries don't create jobs and are more efficient. Log messages are not yet stored anywhere.
First write operation (add tag) for Ganeti RAPI
Add instance tag handling, improved error logging....oh, yes adopt instance listing for RAPI2!
Unify SetupDaemon/SetupLogging
The 'old-style' info, error, debug logs do not make much sense. This patch unifies the SetupLogging and SetupDaemon functions. As a result, all the commands log to a 'commands.log' file.
The patch also changes the log setup to keep going if there's an error...
Rework master startup/shutdown/failover
This (big) patch reworks the master startup/shutdown and fixes the master failover.
What does the patch do?
For master start/stop:
  - remove the old ganeti-master script and its associated man page
  - moves the ip start/stop directly into the backend.(Start|Stop)Master...
Implement checking for the master role in rapi
This patch moves the CheckMaster function from ganeti-masterd to ssconf (most logical place, it cannot go in utils since we would have recursive imports between ssconf and utils) and changes ganeti-rapi to also call...
Add a new parameter to backend.(Start|Stop)Master
This patch adds a new, unused for now, parameter to the start and stop master operations in backend. The idea behind it is that we need to be able to control whether the IP (de)activation is coupled with daemon...
Use constants for the pid file stems
Reviewed-by: imsnah
Make the rapi daemon create a pidfile
This is needed for controlling it cleanly with start-stop daemon.
Implement signal handling in ganeti-rapi
Move ganeti-rapi core code to daemon
All other daemons have their main code in themselves and not in a module. This patch does the same to ganeti-rapi by moving the code from lib/rapi/RESTHTTPServer.py to daemons/ganeti-rapi.
Fix RPC parameters for {Cancel,Archive}Job
They aren't tuples on the client side.
ganeti-masterd: write and remove pidfile
ganeti-noded: write and remove pid file
Distribute the queue serial file after each update
This patch adds distribution of the queue serial file after each write to it (but before a new job is created and written with that ID, and before a response is returned, so we should be safe from crashes in...
Handle signals in node daemon
This also fixes a TODO added by ultrotter by killing the parent process when QuitGanetiException is raised.
Use new signal handler class in master daemon
Breathe life into RAPI for trunk
Fork ganeti-noded
Create a new ForkingHTTPServer in ganeti-noded by deriving both from NodeDaemonHttpServer and ForkingMixin. This will allow us to process concurrent requests.
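The derivation described above can be sketched with the Python 3 standard library. Here http.server.HTTPServer stands in for NodeDaemonHttpServer, which is an assumption made to keep the example self-contained; it is not the class used in the patch.

```python
import http.server
import socketserver


# The mixin has to come first so that its process_request() overrides the
# one inherited from the plain (single-process) server class.
class ForkingHTTPServer(socketserver.ForkingMixIn, http.server.HTTPServer):
    """HTTP server that forks a child process for each request."""
```

With this class each request is handled in its own child process, so a slow request no longer blocks the ones that follow it.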
Fix previous patch using workerpool in masterd
The function to stop a worker pool is TerminateWorkers(), not Shutdown().
Use workerpool in master daemon
Reusing threads instead of starting one for each request is more efficient.
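The idea of reusing threads can be illustrated with a small pool built on queue.Queue. This is a sketch only; the class and method names are hypothetical and do not mirror the workerpool module's interface.

```python
import queue
import threading


class SimpleWorkerPool:
    """Fixed set of threads that pull tasks from a shared queue."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop this worker
                break
            fn, args = task
            try:
                fn(*args)
            except Exception:
                pass  # a failing task must not kill the worker thread

    def add_task(self, fn, *args):
        self._tasks.put((fn, args))

    def terminate(self):
        # One sentinel per worker, queued after all pending tasks.
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()
```

The threads are created once in the constructor; every later request only costs a queue operation instead of a thread start.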
Use new HTTP server classes in ganeti-noded
Initial copy of RAPI filebase to the trunk
Remove more old job queue code
Apparently I forgot to remove this code when removing the rest.
Move watcher's LockFile function to utils
Fix double-logging in daemons
Currently, in debug mode, both the logfile handler and the stderr handler will log debug messages. Since the stderr is redirected to the same logfile (to catch non-logged errors), it means log entries are doubled.
The patch adds an extra parameter to the logger.SetupDaemon() function...
ganeti-noded logging improvements
The patch adds some more logging to the node daemon:
- log methods at the beginning, not only at the end
- log method parameters (they are very verbose, but useful)
A separate change is to initialize the global variable in the global...
Remove the old locking functions
This removes (hopefully) all traces of the old locking functions and uses.
Remove old job queue code
Change masterd/client RPC protocol
- Introduce abstraction class on client side
- Use constants for method names
- Adapt legacy function SubmitOpCode to use it
Make luxi RPC more flexible
- Use constants for dict entries
- Handle exceptions on server side
- Rename client function to CallMethod to match server side naming
Instantiate new job queue in master daemon
Create all SUB_RUN_DIRS in ganeti-noded
Rather than just creating BDEV_CACHE_DIR we loop through the SUB_RUN_DIRS list and create all its children.
Fix some issues with the watcher
This patch fixes two bugs:
  - the state file is not saved because we use the method for checking for updated data
  - in two places 'Error' was used instead of 'Exception', which breaks error handling
Additionally:...
Add custom logging setup for daemons
It's better for daemons if:
  - they log only to one log file
  - the log level is included
  - for debug runs, the filename/line number is included
This patch moves the custom formatter from the watcher to the logging...
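The three points above can be sketched with the standard logging module. The function name, its parameters and the exact format strings are assumptions made for illustration, not the interface added by the patch.

```python
import logging


def setup_daemon_logging(name, logfile, debug=False):
    """Send all records of one logger to a single file, including the level."""
    fmt = "%(asctime)s %(levelname)s %(message)s"
    if debug:
        # Debug runs also record where the message came from.
        fmt = "%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s"
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter(fmt))

    logger = logging.getLogger(name)
    for old in list(logger.handlers):  # keep exactly one destination
        logger.removeHandler(old)
        old.close()
    logger.addHandler(handler)
    logger.propagate = False  # do not duplicate records via the root logger
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger
```

Removing existing handlers and disabling propagation is what keeps each message in exactly one file.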
ganeti-masterd: Remove unused locking code
Reviewed-by: iustinp, ultrotter
ganeti-masterd: Use logging module
Reviewed-by: ultrotter, iustinp
Context: s/GLM/glm/
Make the GanetiLockManager instance of GanetiContext lowercase
Increase the thread size to 5
Now that we use the locking library to make sure running opcodes cannot step on each other's toes we can have a bigger thread size, and potentially process many opcodes in a parallel manner.
Processor: pass context in and use it.
The processor used to create a new ConfigWriter when it was initialized. We now have one in the context, so we'll just recycle it. First of all we'll pass the context in when creating a new Processor object, then we'll just use context.cfg, which is guaranteed to be initialized, wherever...
ganeti-masterd: init and distribute common context
This patch creates a new GanetiContext class, which is used to hold context common to all ganeti worker threads. As for the GanetiLockingManager class, it is paramount that there is only one such class throughout the execution of Ganeti, so the class checks for that,...
ganeti-noded: Fix handling of QuitGanetiException
- s/GanetiQuitException/QuitGanetiException/
- Look for the arguments in err.args, not err itself
ganeti-noded: quit on QuitGanetiException
According to the usage documented in the QuitGanetiException docstring, if we receive such an exception we'll set the global _EXIT_GANETI_NODED variable to True, and then return either a valid value or an error message to the user. This will be the last request we serve, though,...
ganeti-noded: serve not quite forever
Rather than calling httpd.serve_forever() in ganeti-noded we'll call httpd.handle_request() but just while a global variable, which we'll call _EXIT_GANETI_NODED, remains false.
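The loop described above can be sketched with Python 3's http.server. The handler, the "/quit" path and the serve() helper are hypothetical; only the flag name and the handle_request() loop come from the description.

```python
import http.server

_EXIT_GANETI_NODED = False


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        global _EXIT_GANETI_NODED
        if self.path == "/quit":  # hypothetical trigger for shutting down
            _EXIT_GANETI_NODED = True
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet


def serve(httpd):
    # Serve one request at a time until the flag is set; the request that
    # sets the flag is still answered before the loop ends.
    while not _EXIT_GANETI_NODED:
        httpd.handle_request()
    httpd.server_close()
```

Because the flag is only checked between requests, the server finishes the current response and then stops accepting new ones.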
Handle any exception in ganeti-masterd
If an uncaught exception is thrown currently it destroys the calling thread. This patch changes the behaviour to failing the current job, logging a message, but trying to keep the daemon up.
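A minimal sketch of that behaviour follows, assuming a job is a plain dict and that the status strings are "success" and "error". Both are assumptions for illustration, not the job representation used in the patch.

```python
import logging


def run_job(job, execute):
    """Run one job; on any exception, fail the job instead of the thread."""
    try:
        job["result"] = execute(job)
        job["status"] = "success"
    except Exception as err:
        # Log the traceback, record the failure in the job, keep running.
        logging.exception("Job %s failed", job.get("id"))
        job["status"] = "error"
        job["result"] = str(err)
    return job
```

The worker thread returns normally in both cases, so it stays available for the next job.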
Add a rpc call for BlockDev.Close()
This patch adds rpc layer calls (in rpc.py and the equivalent in ganeti-noded) to close a list of block devices, and the wrapper in backend.py that takes a list of Disk objects, identifies them and returns correctly formatted results....
Use a single Makefile.am instead of many
This change allows us to use cleaner dependencies between directories. The build system is basically rewritten in large parts and may contain bugs.
ganeti-watcher: Replace custom exceptions with ganeti.error.*
ganeti-watcher: Don't write file if data didn't change
This is the safest way to detect changes and the amount of data is small, so keeping a copy around is cheap enough.
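Keeping a copy of the serialized data and comparing before writing could look like the sketch below. The class name, the JSON encoding and the file layout are assumptions, not the watcher's actual state format.

```python
import json


class StateFile:
    """Remember the serialized state as loaded; write only if it differs."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self._orig = f.read()
        except FileNotFoundError:
            self._orig = None
        self.data = json.loads(self._orig) if self._orig else {}

    def save(self):
        serialized = json.dumps(self.data, sort_keys=True)
        if serialized == self._orig:
            return False  # nothing changed, skip the write
        with open(self.path, "w") as f:
            f.write(serialized)
        self._orig = serialized
        return True
```

Comparing the serialized form catches every change, including ones made deep inside nested structures, without any explicit "dirty" tracking.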
ganeti-watcher: Rename WatcherState.data to WatcherState._data
Cleanup: _data is private and should not be modified from outside of this class.
Don't log SystemExit exception in ganeti-watcher
Replace watcher state file atomically
- Lock it before renaming
- Code cleanup; close() automatically unlocks it
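The two points above can be sketched as follows, assuming a temporary file in the same directory and an flock()-style lock. The function name is hypothetical and the watcher's real locking call may differ.

```python
import fcntl
import os
import tempfile


def replace_file_atomically(path, data):
    """Write data to a locked temporary file, then rename it over path."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as tmp:
            fcntl.flock(tmp, fcntl.LOCK_EX)  # lock before renaming
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
            os.rename(tmp_path, path)  # atomic within one file system
        # Leaving the "with" block closes the file, which drops the lock.
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers see either the old file or the complete new one, never a partially written state.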
Write ganeti-watcher status file even if something failed
Use ganeti.serializer module in ganeti-watcher
Replace custom logging code in watcher with logging module
- Log timestamp for all messages
- Write everything to logfile and optionally to stderr
- Log messages are no longer buffered, allowing a user to see progress
Implement block device grow at the rpc layer
This simple patch exposes the block device grow operation at the rpc layer. It does not increase the protocol version as it has been recently changed by the live failover rpc call.
Add migration support at the rpc layer
This patch adds the migration rpc call and its implementation in the backend. The patch does not deal with the correct activation of disks.
Because of the new RPC, the protocol version is increased.
Replace logging functions with calls to logging module
- Shorter code
- Reorder arguments to logger.SetupLogging calls to make more sense
Fail job on ganeti exceptions
When a Job raises a ganeti exception a message is printed but nothing is reported in the job itself. It's better to update the job status, thus notifying the client, possibly polling for the job result, of what went wrong.
Watcher: do not activate disks for started instances
Currently the watcher runs first the instance startup and then the boot-id method of disk reactivation. However, regardless of whether a node has rebooted or not, if we just started an instance, there's...
Watcher: do not activate disks for admin_down
Currently the watcher does activate disks (via bootid mechanisms) even for admin_down instances. This patch logs and skips over these instances.
ganeti-masterd: Some docstrings work
- Add a docstring to IOServer's constructor
- Add argument description to PoolWorker's and JobRunner's ones
Disable forking in the master daemon
This patch adds a mechanism to disable utils.RunCmd in selected programs. This is needed in the master daemon unless we confirm threading doesn't pose any problems.
This makes cluster init fail, but creating new trunk clusters is anyway...
Move the 'cmd' lock from cli.py to ganeti-masterd
This patch removes the lock and the lock options from cli.py and moves them to the master.
Later during development we can remove it completely, but for now it's good to protect any other tool that uses the lock directly....
Convert cli.SubmitOpCode to use the master
This patch converts the cli.py SubmitOpCode method to use the unix protocol and thus execute the opcodes via the master.
The patch allows a partial burnin to work with the master. Currently the query opcodes, since they are executed via the SubmitOpCode, are...
Move iallocator script execution to ganeti-noded
Currently the iallocator execution takes place in the master, which is a violation of the current architecture, and will create problems with a threaded master daemon.
This patch moves the execution to the backend, similar to the hooks...
Add per-opcode results to job processing
This patch changes the definition of a job and introduces per-opcode results.
First, the result and status fields of a job are condensed into a single 'status' attribute. Then, we introduce an opcode status and one result...
Implement forking/master role checking in masterd
This patch adds checks for the master role and daemonize support to ganeti-masterd.
The patch modifies the startup/shutdown of the server because:
  - we want bind()/listen() to the master socket to occur before forking...
ganeti-noded directory functions for file backend
Add a simple gnt-job script
This patch adds a very basic gnt-job script that allows job querying. This goes on top of the previous master daemon patches.
Currently, because of the not-changed cmd lock, you can't query the jobs as long as a job is running - you have to rm the cmd lock and then you...
Move the daemonize function to utils.py
Currently, in ganeti-noded we have the createDaemon function. Since we'll need the same in other daemons, we move this function to utils.py
With the move, a few changes were also made:
  - change the name to Daemonize()...
Initial tests with ganeti-masterd
This patch adds a very in-progress master daemon. This needs to be launched manually, does not background itself, but can be used for opcode execution.
Also parts of this code should be moved to luxi.py.
Reduce log noise for the new http-based rpc
This patch just removes an extraneous \n from the log message making it nicer to view.
Make ganeti-noded create BDEV_CACHE_DIR automatically
Currently in order to deal with tmpfs /var/run, we create the BDEV_CACHE_DIR in the init script. However, that does not cover all the cases, and it's not a proper place to deal with it: for example, dealing...
Modify utils.TcpPing to make source address optional
This patch modifies TcpPing and its callers to make the source address selection optional. Usually, the kernel will know better what source address to use, just in some cases we want to enforce a given...
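An optional source address can be sketched with plain sockets as below. The function name, parameter order and defaults are hypothetical and are not the signature of utils.TcpPing.

```python
import socket


def tcp_ping(target, port, timeout=10, source=None):
    """Return True if a TCP connection to (target, port) succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        if source is not None:
            # Only force the source address when the caller asks for it;
            # otherwise the kernel picks one based on its routing table.
            sock.bind((source, 0))
        sock.connect((target, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

Callers that do not care simply omit the source argument, while callers that must test a specific interface pass its address.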
Break trunk by removing twisted
This patch switches from the twisted usage for inter-node protocol to simple BaseHTTPServer/httplib. The patch has more deletions because we use no authentication, no encryption at all.
As such, this is just for trunk, and only for testing. What it brings is...
Add a test opcode that sleeps for a given duration
This can be used for testing purposes.
Reviewed-by: ultrotter,imsnah
Modify ‘ganeti-watcher’ to run verify-disks
This patch modifies the watcher to run the ‘gnt-cluster verify-disks’ command and to log its output (if any).
Various code style fixes for strings.
- When line wrapping is needed, move spaces to the next line.
- Remove embedded line breaks from error messages.
Make utils.RunCmd log failures when using debug
This patch adds logging of command failures to the debug log in case the user either started the command (gnt-*) or the node daemon with the debug flag.
Small changes and fixes in ganeti-watcher.
- Use constants for keys.
- Fix bug through which automatic instance restarts wouldn't be limited
Convert os_get to use OS rather than InvalidOS
In order to do this, for simplicity we leave the OSFromDisk function as-is and we convert the eventual exception to an OS object in ganeti-noded. The unmangling gets simplified and so does the code for checking whether the OS is...
Simplify diagnose mangling/unmangling functions
The functions in ganeti-noded and rpc.py still deal with the fact that an InvalidOS error could be returned by DiagnoseOS. As this is not the case anymore, simplify their code for the current behavior.
Reviewed-By: iustinp
Implement device to instance mapping cache
Currently, troubleshooting DRBD problems involves a manual process of going backwards from the DRBD device to the instance that owns it.
This patch adds a weak (i.e. not guaranteed to be correct or up-to-date) cache of device to instance. The cache should be, in normal operation,...
Implement block device renaming
This patch adds code for renaming a device; more precisely, for changing the unique_id of the device. This means:
  - logical volumes, rename the volume
  - drbd8, change the remote peer
This is needed to be able to replace disks for drbd8....
Modify two mirror-device related rpc calls
The two calls mirror_addchild and mirror_removechild take only one child for addition/removal. While this is enough for our md usage, for local disk replacement in drbd8, we need to be able to specify both the data...