Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches theinstance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter onall RPC calls is not generic and we also need to have a persistent value...
watcher: fix checking of boot IDs
The recent change (commit 2151) to the watcher to make it handle offlinenodes also saves the offline attribute to the state file, but this isnot needed and also breaks the checking of the boot ID. This patchsimply removes it, restoring the correct behaviour....
watcher: autoarchive old jobs
This patch adds auto-archiving of jobs older than 6 hours to thewatcher.
Reviewed-by: imsnah
RAPI: fixes related to write mode
This patch fixes many small issues related to write functions: - update documentations w.r.t. how to add users - update the instance add function for latest API - add instance delete - fix addition of tags - update some error messages...
RAPI: format error messages as JSON
This patch changes the format of the HTTP error messages from text/html, whichis hard to parse from RAPI clients, to JSON which can be automatically parsed.
The error message is an object, which contains always three keys:...
Make RAPI return 502/504 errors for luxi errors
This changes the RAPI error codes for luxi errors; a timeout error isnow reported properly as 504, while any other luxi error is reported as502.
It would be good to convert even more errors into proper return codes in...
Fix ganeti-rapi startup with missing certificate
This patch displays a nicer error message compared to the defaultstacktrace.
master daemon: allow skipping the voting process
This patch introduces a 'force' mode for the master daemon startup wherethe voting process is not done, but the user has to confirm manually thestartup (before forking, of course).
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include statusinformation and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to includestatus information and renames the backend method to match the RPC callname.
Reviewed-by: ultrotter
ganeti-noded: Create LOCK_DIR if missing
We need this directory for locks, so if for any reason it's not therewe'll create it. The permissions are the standard /var/lock permissions.
Reviewed-by: iustinp
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actuallyRPC procedures and are called from ganeti-noded are not corresponding tothe RPC names. This makes it hard to actually see which functions are...
rpc.call_blockdev_find: convert to (status, data)
This patch converts the call_blockdev_find - which searches for blockdevices and returns their status - to the (status, data) format. We alsomodify the backend function name to match the rpc call.
Fix handling OS errors in AddOSToInstance
This patch fixes the error handling in the add OS to instance functionwith regard to invalid OSes. Previously, we didn't handle any sucherrors, with the end result that the user would have to look in the nodedaemon log....
rapi: fix SSL mode and use SSL by default
This patch fixes the SSL mode (by actually constructing SSL parametersfrom the command line options) and enables SSL by default; the old “-S”option which enabled SSL is now changed to “--no-ssl”. The certificate...
rapi: fix authentication and queries
For queries, we don't want to require authentication. We fix this by adding anoverride GetAuthRealm in the rapi daemon.
We also fix a method name.
Add one new luxi query: cluster info
This is the last query that RAPI executes via opcodes and is purelystatic (config values only). As such, we can convert it safely to aquery instead of job.
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. Thismeans that instances will be shown in ERROR_up or ERROR_down state, even thoughthis is not an error (but just an in-progress job).
The framework is implemented as follows:...
Fix some more pylint errors
Two are real errors (invalid names) and one is style error (overridingname from outer scope).
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from thesource side. With this patch we also add the option of passing someinformation to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) thelogical execution thread of jobs. This is due to the fact that we don'tlog the thread name (so we lose the association of log messages to jobs)...
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and their associatedRPCs. In Ganeti 2.0, these functions will be used for two things: - live migration (as in 1.2) - and for other network reconfiguration tasks, since DRBD8.Attach()...
Small typo in ganeti-watcher
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard towork around: - logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that it has been started with symlinks. We currently have no on-secondary checks, nor any hypervisor specific call....
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call fordrbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". Useroptions specify the access permissions of a user (see docstring forganeti.http.ReadPasswordFile), for which only "write" is supportedto grant write access. Every other user has read-only access....
ganeti-rapi: Introduce per-request context
This will be used to evaluate access permissions to resources.
Reviewed-by: amishchenko
Job queue: Allow more than one file rename per RPC call
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time,causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The userwill see how many jobs are left unchecked....
Fix epydoc format warnings
This patch should fix all outstanding epydoc parsing errors; as such, weswitch epydoc into verbose mode so that any new errors will be visible.
Cleanup the config file on demotion from candidate
This patch adds a simple rpc which makes a backup of the config file andthen removes it. This is done so that cluster verify doesn't complainimmediately after demoting a node.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state foroffline nodes and also modifies the watcher to understand the offlinestate in its checks.
ganeti-rapi: Convert to new HTTP server
ganeti-noded: Migrate to new HTTP server
Rename all HTTP classes to camel case
It should be consistent.
Fix master failover
The ssconf files were not updated by the master failover. We need topush them, and since we already have RPC initialized, we can use thestandard ConfigWriter to do so - this will take care of both the configfile and the ssconf files....
ganeti-masterd: create RUN_GANETI_DIR as well
Since we're not sure ganeti-noded has started yet, we need to createRUN_GANETI_DIR before SOCKET_DIR as well, with the proper permissions.
convert run dir mode to constant
ganeti-noded used to create all directories under /var/run with anhard-coded mode. convert it to a constant.
Move the MASTER_SOCKET to SOCKET_DIR
Before it was in the abstract linux namespace, where unfortunately wecouldn't easily check from python the credentials of the connectingclients. Now we also have to remove the file on exit and when starting.
ganeti-masterd: create SOCKET_DIR
If SOCKET_DIR doesn't exist we create it in the master daemon, beforetrying to put a socket inside it.
Pass ssconf values from master to node
Instead of parsing the configuration on the node, we pass the ssconfvalues from the master.
Use SSL for master/node RPC
This patch enables SSL between masterd and noded.
Get rid of node daemon password
With the new SSL client certificate stuff it's no longer needed.
ganeti-masterd: Remove PID file at the end
Removing the PID file should be the last thing done. This patch makessure it's also removed when master.server_cleanup() throws an exception.
Also initialize logging only after writing the PID file.
Reuse HTTP client pool for RPC
ganeti-masterd: Add initialization and shutdown of RPC pool. It needsto be shutdown before forking.
ganeti.cli: Add decorator function to initialize and shutdown RPC pool.
ganeti.rpc: Add functions to initialize and shutdown RPC pool. Throw...
Add RPC call to update ssconf files
Abstract runtime creation of dirs into a function
Currently the dir creation in ganeti-noded is in the main function. Thisis not nice: we move it into a separate function and also add creationof the OS_LOG_DIR (with different permissions, but in the same way)....
Document HttpServer.__init__
At the same time, simplify the interface a bit by not using a tuple.
Reviewed-by: killerfoxi, ultrotter
Export the disk index in the import/export scripts
We want to export the disk index as some OSes will only want to exportthe first disk (or the second one, etc.), even if we have multipledisks.
The patch also updates the backend.ExportSnapshot docstring....
Convert ImportOSIntoInstance to OS API 10
- Change ImportOSIntoInstance not to get any "os_disk" and "swap_disk" arguments but to accept multiple target images to import, and to return a list of booleans with the result of each import- Change the relevant rpc call and the only caller to conform...
Pass request headers in to RAPI handlers.
Convert the job queue rpcs to address-based
The two main multi-node job queue RPC calls (jobqueue_update,jobqueue_rename) are converted to address-based calls, in order to speedup queue changes. For this, we need to change the _nodes attribute onthe jobqueue to be a dict {name: ip}, instead of a set....
Remove the logger.py module
Since now we use only one function from the logger module(SetupLogging), we move it to utils.py (which is already imported by allusers of this function), and we remove the module.
Cleanup os_add/rename rpc for OS API 10
- remove now unused osdev and swapdev arguments from backend, noded, rpc, cmdlib- convert docstrings to epydoc
ETag passing support.
rapi: Convert to new HTTP server class
Requests are no longer logged to a separate file.
Improvements to the master startup checks
In order to account for future improvements to master failover, we movethe actual data gathering capabilities from ganeti-masterd intobootstrap.py, and we leave only the verification into masterd.
The verification procedure is then changed to retry multiple times (up...
Add an interface for the drain flag changes/query
This adds the set/reset in the jqueue and luxi modules, and a way toquery it in OpQueryConfigValues, and also the comand line interface forit:$ gnt-cluster queue infoThe drain flag is unset$ gnt-cluster queue drain...
Add a rpc call for changing the drain flag
A new multi-node call is added that sets/resets the drain flag.
Implement transport of ganeti errors across luxi
This patch adds a generic method to identify the ganeti error given itsclass name, and implements this across the luxi protocol.
rapi: Whitespace fixes
Export the hypervisor.ValidateParameters over RPC
The newly-added node-specific ValidateParams hypervisor method isexported over RPC, using the semi-standard (success, message) returnvalue. Multi-node call, so that we call on both primary and secondary at...
Fix a few rpc-related errors
This fixes: - whitespace change, double lines between methods - duplication of call_upload_file, introduced by mistake in rev 1795 and which went undetected because of the many changes in that ref (only diff -b shows it clearly)...
Abstract checking own address into a function
Currently, we check if we have a given ip address (i.e. it's alive onone of our interfaces) but manually calling TcpPing(source=localhost).This works, but having it spread all over the code makes it hard to...
Convert ganeti-noded to new HTTP server class
Convert rpc module to RpcRunner
This big patch changes the call model used in internode-rpc fromstandalong function calls in the rpc module to via a RpcRunner class,that holds all the methods. This can be used in the future to enablesmarter processing in the RPC layer itself (some quick examples are not...
Move the hypervisor attribute to the instances
This (big) patch moves the hypervisor type from the cluster to theinstance level; the cluster attribute remains as the default hypervisor,and will be renamed accordingly in a next patch. The cluster also gains...
rpc.call_instance_migrate: pass the whole instance
Currently the call_instance_migrate call only passes the instance name;we need to pass the whole object for the hypervisor_type changes (allthe other individual instance rpc calls already pass the instance...
Implement job 'waiting' status
Background: when we have multiple jobs in the queue (more than just afew), many of the jobs (up to the number of threads) will be in state'running', although many of them could be actually blocked, waiting forsome locks. This is not good, as one cannot easily see what is...
Implement job auto-archiving
This patch adds a new luxi call that implements auto-archiving of jobsolder than a certain age (or -1 for all completed jobs), and the gnt-jobcommand that makes use of this (with 'all' for -1).
backend.py change to get cluster name from master
Currently there are three function in backend that need the cluster namein order to instantiate an SshRunner. The patch changes these to get thecluster name from the master in the rpc call; once the multi-hypervisor...
Convert ganeti-master
Use simpleconfig instead of ssconf.
Convert ganeti-watcher
Use RPC calls instead of ssconf.
Convert ganeti-noded
Replace ssconf with utility functions.
Add new query to get cluster config values
This can be used to retrieve certain cluster config values fromwithin clients.
OpDumpClusterConfig was not used anywhere, hence I'm just reusingit. The way ConfigWriter.DumpConfig returned the configurationwas not thread-safe, anyway (no deepcopy)....
Fix the watcher with down nodes
The watcher didn't handle the down nodes, fix this by ignoring (insecondary node reboot checks) any node that doesn't return a boot id.
Fix the watcher not restarting instance bug
The watcher was using conflicting attributes of the instance: - it queried the admin_/oper_state, which are booleans - but it compared those to the status (which is a text field)
The code was changed to query the aggregated 'status' field, as that...
Remove last use of utils.RunCmd from the watcher
The watcher has one last use of ganeti commands as opposed to sendingrequests via luxi. The patch changes this to use the cli functions.
The patch also has two other changes: - fix the docstring for OpVerifyDisks (found out while converting...
ganeti-noded: Add constant for queue lock timeout
Implement master startup safety check
This is an initial version of the master startup checks. It's a veryrudimentary change, however in normal usage (an old master was started,the rest of the cluster is functioning normally) it will succeed inpreventing wrong startups....
Export backend.GetMasterInfo over the rpc layer
We create a multi-node call so that querying all nodes for agreementwill be fast.
Use lock timeout for queue updates in ganeti-noded
This helps to prevent complete deadlocks.
noded: Get job queue lock while purging queue content
Only one process should modify the queue at the same time.
Make WaitForJobChanges deal with long jobs
This patch alters the WaitForJobChanges luxi-RPC call to have aconfigurable timeout, so that the call behaves nicely with long jobsthat have no update.
We do this by adding a timeout parameter in the RPC call, and returning...
Make sure that client programs get all messages
This is a large patch, but I can't figure out how to split it withoutbreaking stuff. The old way of getting messages by always getting thelast one didn't bring all messages to the client if they were added...
Use Linux-specific way to name master socket
By using this Linux-specific way we don't have to care about removing thesocket file when quitting or starting (after an unclean shutdown). For amore detailed description, see the comment in the patch.
Reviewed-by: schreiberal
Add RPC call to wait for job changes
This way clients can react faster to status or message changes anddon't have to poll anymore.
Add query function for exports
noded: Add RPC function to rename job queue files
This will be used to archive jobs.
noded: Add decorator for job queue lock
The lock will also be needed by another function.
Implement queue locking in node daemon
More logging for errors during noded RPC calls
Add job queue RPC functions
jobqueue_update: Uploads a job queue file's content to a node. Themost common operation is to upload something that we already havein a string. Unlike in the upload_file function, the file is notread again when distributing changes, but content has to be passed...
Use API instead of command line utilities in watcher
Notify job queue about added/removed nodes
The job queue maintains its own node list and must be notifiedwhen nodes are added/removed.
Implement {Add,Readd,Remove}Node in GanetiContext
By doing this we've a central place which coordinates what needs to bedone when adding or removing nodes. Another patch will add calls intothe job queue.
Two log messages move to config.py.
When removing a node, node_leave_cluster is now called after it has...
jqueue: Don't pass the list of nodes to SubmitJob anymore
The job queue now maintains its own list and is updated whennodes are added or removed from the cluster.
masterd: Move job queue into context object
The job queue must be called from cmdlib when adding or removingnodes to the cluster. Moving it to the context objects makesthis possible.