Add RPC calls for storage unit list
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Collapse SSL key checking/overriding for daemons
Signed-off-by: Guido Trotter <ultrotter@google.com>
Collapse daemon's main function
With three ganeti daemons, and one or two more coming, the daemon's mainfunction started becoming too much cut&pasted code. Collapsing most ofit in a daemon.GenericMain function. Some more code could be collapsedbetween the two http-based daemons, but since the new daemons won't be...
Slightly abstract the daemon logfile lookup
The original LOG_<DAEMON_NAME> constants for daemon logfiles are gone.In their place there is a DAEMONS_LOGFILES dict, indexed by daemon name.
This is a minor change with the objective to uniform most of thedaemon's main() functions code, which is very similar one to the other....
Remove <DAEMON>_PID constants
The <DAEMON>_PID constants were created to reference a daemon pid file,but actually contain a daemon's name, because the various functions thatwork with pidfiles abstract the filename from the daemon namethemselves. Removing the constants and using the actual daemon name...
Move rapi to GetDaemonPort
Currently rapi is the only daemon which accepts a port option, ratherthan querying its own port from services, and failing back to thedefault if not found. Changing this to conform to what other daemons do.
Also update the ganeti-rapi(8) manpage...
Change GetNodeDaemonPort to GetDaemonPort in utils
GetNodeDaemonPort is used to lookup the node daemon port in the servicesfile, and if not found to return the default one. We make it a genericfunction, which accepts the daemon name in input, so that it can be used...
Merge branch 'next' into branch-2.1
Remove references to utils.debug
Various modules set it to True when called in debugging mode, but theutils module supports no such global.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
ganeti-rapi, replace hardcoded exit value
substitute exit(1) with exit(constants.EXIT_FAILURE).Also fix a wrongly indented line.
Add the bind-address option to ganeti-rapi
noded: Abstract hard-coded sys.exit value
On machines without the ssl file noded exists '5'.Changing this to constants.EXIT_NOTCLUSTER.
Also utils.GetNodeDaemonPort hasn't risen errors.ConfigurationError fora while, so removing that try/except block....
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patchadds a new luxi call for multi-job submit; the advantage is that all thejobs are added in the queue and only after the workers can startprocessing them....
Conflicts:
daemons/ganeti-masterd...
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and tojust get the master name we can use that. This is the only usage ofSimpleConfigReader currently, but we're not going to delete the class,as new usages will come in for ganeti-confd (in 2.1). Using it there,...
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True,ganeti-masterd will be started with the new --no-voting --yes-do-itoptions.
This new option is set to True only on masterfailover, when no_voting is...
Merge branch 'master' into next
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a--no-voting masterfailover.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would returnEAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Rename the volume_list RPC call to lv_list
There are volume-related rpc calls. This patch renames the ‘volume_list’call to ‘lv_list’ to make more clear its purpose.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Simplify the RPC result framework in backend.py
Since now all functions fail via _Fail, the return True, … is redundantas all normal return paths have it, and thus the True value can be addedin the ganeti-noded handler.
This means that all functions can now forget about the special result...
Implement result-type restriction in ganeti-noded
Since all rpc calls were converted, we can now: - enforce result type to (status, data) - convert all unhandled exceptions to (False, str(err))
This makes sure that all unhandled errors are reported to rpc users....
Big rewrite of the OS-related functions
Currently the OSes have a special, customized error handling: the OSobject can represent either a valid OS, or an invalid OS. The associatedfunction, instead of raising other exception or failing, create customOS objects representing failed OSes....
Convert the jobqueue rpc to new style result
This patch converts the job queue rpc calls to the new style result.It's done in a single patch as there are helper function (in both jqueueand backend) that are used by multiple rpcs and need synchronizedchange....
Convert os_diagnose rpc to new style result
This also removes custom post-processing from rpc.py; since this callhas only one user, it was simple to move it back to the caller.
Convert call_version rpc to new style result
This also cleans up its single use in cmdlib.py.
Conver node_leave_cluster rpc to new style result
This patch converts this rpc call to the new style result, and alsochanges in the process the meaning of the QuitGanetiException'sarguments and the node daemon rpc call exception handler.
The problem with the exception handler is that we used a two-stage one,...
Convert node_start_master to new style result
This is used in multiple places outside cmdlib.py, so it's a moreinteresting patch.
Convert node_has_ip_address rpc to new style
This should actually have a function in backend, but it's fine for now.
Convert instance_list rpc to new style result
Since backend.GetInstanceList() is used both as RPC endpoint and asinternal function, it can't return (status, value). Instead it returnsonly valid instance info, and failures are denoted by exceptions; and...
Convert volume_list rpc to new style result
This is a big change, because we need to cleanup its users too.
The call and thus LUVerifyDisks LU used to differentiate between failureat node level and failure at LV level, by returning different types inthe RPC result. This is way too complicated for our needs....
Convert export_info rpc to new style result
This also removes some code from ganeti-noded and rpc.py, which shouldnot do such processing of data (and be simply glue code). (Oralternatively they could, if we had better infrastructure).
Signed-off-by: Iustin Pop <iustin@google.com>...
rpc: Add a simple failure reporting framework
This patch adds a simple failure reporting tool, similar to bdev's_ThrowError. In backend, we move towards the new-style RPC results (oftype (status, payload)) and thus functions which use this style can very...
Add a node powercycle command
This (somewhat big) patch adds support for remotely rebooting the nodesvia whatever support the hypervisor has for such a concept.
For KVM/fake (and containers in the future) this just uses sysrq plus a‘reboot’ call if the sysrq method failed. For Xen, it first tries the...
watcher: automatically restart noded/rapi
This patch makes the watcher automatically restart the node and rapidaemons, if they are not running (as per the PID file).
This is not an exhaustive test; a better one would be TCP connect to theport, and an even better one a simple protocol ping (e.g. get / for rapi...
watcher: handle full and drained queue cases
Currently the watcher is broken when the queue is full, thus notfulfilling its job as a queue cleaner. It also doesn't handle nicely thequeue drained status.
This patch does a few changes: - first archive jobs, and only after submit jobs; this fixes the case...
Merge branch 'master' into branch-2.1
watcher: write the instance status to a file
This patch modifies the watcher to keep on-disk a file with the instancestatus; this can be used from outside of ganeti to react to instancesbeing down (when the watcher cannot restart them).
Merge commit 'origin/next' into branch-2.1
watcher: try to restart the master if down
Bugs in either our code or in associated libraries can bring the master daemondown, and this (due to the 2.0 architecture) stops all work on the cluster.
Since the watcher already does periodic checks on the cluster, we modify...
Inform the OS create script of reinstalls
Sometimes reinstalls are slightly different than new installs. Forexample certain partitions may need to be preserved accross reinstalls.In order to do that on a per-os basis we pass in the INSTANCE_REINSTALLvariable to inform the create script about when a reinstall is...
ganeti-noded: add bind address option
This allows ganeti-noded to bind only on one interface rather than allthe ones on the machine. The default behaviour doesn't change.
Fix luxi serialization in ganeti-masterd
Currently, lib/luxi.py used lib/serializer.py for encoding/decodingmessages, but the master daemon uses directly the simplejson module.This is wrong as any non-trivial change to serializer.py will break themaster daemon....
Disable synchronous (locking) queries
This patch raises an error in the master daemon in case the userrequests a locking query; accordingly, all clients were modified to sendonly lockless queries. This is short-term fix, for proper fix theclients should be modified to submit a job when the user request a...
Fix the output of watcher on non-master nodes
Currently the watcher spews errors message on non-master nodes. Thiscleans it up.
Reviewed-by: imsnah
Change the watcher to use jobs instead of queries
As per the mailing list discussion, this patch changes the watcher touse a single job (two opcodes) for getting the cluster state (node listand instance list); it will then compute the needed actions based on...
Add some more debugging info to masterd
This patch will log data about queries, which are today completelyinvisible (at the default log level) in the master log file.
watcher: fix startup sequence locking the master
Currently, the watcher startup sequence does: - open a luxi client - get the instance list - get the node boot ids - open and lock the status file, and: - archive jobs - restart the down instances...
Create runtime dir in bootstrap
Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster inittime. This patch creates it in InitCluster just before hv parameterchecking. Since the code to make list of directories is already repeatedtwice in the code, and this would be the third time, we abstract it into...
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches theinstance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter onall RPC calls is not generic and we also need to have a persistent value...
watcher: fix checking of boot IDs
The recent change (commit 2151) to the watcher to make it handle offlinenodes also saves the offline attribute to the state file, but this isnot needed and also breaks the checking of the boot ID. This patchsimply removes it, restoring the correct behaviour....
watcher: autoarchive old jobs
This patch adds auto-archiving of jobs older than 6 hours to thewatcher.
RAPI: fixes related to write mode
This patch fixes many small issues related to write functions: - update documentations w.r.t. how to add users - update the instance add function for latest API - add instance delete - fix addition of tags - update some error messages...
RAPI: format error messages as JSON
This patch changes the format of the HTTP error messages from text/html, whichis hard to parse from RAPI clients, to JSON which can be automatically parsed.
The error message is an object, which contains always three keys:...
Make RAPI return 502/504 errors for luxi errors
This changes the RAPI error codes for luxi errors; a timeout error isnow reported properly as 504, while any other luxi error is reported as502.
It would be good to convert even more errors into proper return codes in...
Fix ganeti-rapi startup with missing certificate
This patch displays a nicer error message compared to the defaultstacktrace.
master daemon: allow skipping the voting process
This patch introduces a 'force' mode for the master daemon startup wherethe voting process is not done, but the user has to confirm manually thestartup (before forking, of course).
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include statusinformation and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to includestatus information and renames the backend method to match the RPC callname.
Reviewed-by: ultrotter
ganeti-noded: Create LOCK_DIR if missing
We need this directory for locks, so if for any reason it's not therewe'll create it. The permissions are the standard /var/lock permissions.
Reviewed-by: iustinp
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actuallyRPC procedures and are called from ganeti-noded are not corresponding tothe RPC names. This makes it hard to actually see which functions are...
rpc.call_blockdev_find: convert to (status, data)
This patch converts the call_blockdev_find - which searches for blockdevices and returns their status - to the (status, data) format. We alsomodify the backend function name to match the rpc call.
Fix handling OS errors in AddOSToInstance
This patch fixes the error handling in the add OS to instance functionwith regard to invalid OSes. Previously, we didn't handle any sucherrors, with the end result that the user would have to look in the nodedaemon log....
rapi: fix SSL mode and use SSL by default
This patch fixes the SSL mode (by actually constructing SSL parametersfrom the command line options) and enables SSL by default; the old “-S”option which enabled SSL is now changed to “--no-ssl”. The certificate...
rapi: fix authentication and queries
For queries, we don't want to require authentication. We fix this by adding anoverride GetAuthRealm in the rapi daemon.
We also fix a method name.
Add one new luxi query: cluster info
This is the last query that RAPI executes via opcodes and is purelystatic (config values only). As such, we can convert it safely to aquery instead of job.
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. Thismeans that instances will be shown in ERROR_up or ERROR_down state, even thoughthis is not an error (but just an in-progress job).
The framework is implemented as follows:...
Fix some more pylint errors
Two are real errors (invalid names) and one is style error (overridingname from outer scope).
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from thesource side. With this patch we also add the option of passing someinformation to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Update the logging output of job processing
(this is related to the master daemon log)
Currently it's not possible to follow (in the non-debug runs) thelogical execution thread of jobs. This is due to the fact that we don'tlog the thread name (so we lose the association of log messages to jobs)...
Forward-port DrbdNetReconfig
This is a modified forward-port of DrbdNetReconfig and their associatedRPCs. In Ganeti 2.0, these functions will be used for two things: - live migration (as in 1.2) - and for other network reconfiguration tasks, since DRBD8.Attach()...
Small typo in ganeti-watcher
Rework the daemonization sequence
The current fork+close fds sequence has deficiencies which are hard towork around: - logging can start logging before we fork (e.g. if we need to emit messages related to master checking), and thus use FDs which we...
Add an instance_migratable rpc call
This is a forward-port of commit 1194 on the 1.2 branch:
This call will check whether an instance is up on its primary, and that it has been started with symlinks. We currently have no on-secondary checks, nor any hypervisor specific call....
Pass instance name to rpc call blockdev_close
This is an extract of commit 1166 on the 1.2 branch (Add a rpc call fordrbd network reconfiguration), but only the blockdev_close part.
The patch changes the blockdev_close call to take the instance so that...
Fix some pylint-detected issues
Two bad indentation cases and a missing variable.
ganeti-rapi: Implement HTTP authentication
Passwords are stored in "$localstatedir/lib/ganeti/rapi_users". Useroptions specify the access permissions of a user (see docstring forganeti.http.ReadPasswordFile), for which only "write" is supportedto grant write access. Every other user has read-only access....
ganeti-rapi: Introduce per-request context
This will be used to evaluate access permissions to resources.
Reviewed-by: amishchenko
Job queue: Allow more than one file rename per RPC call
Prevent RPC timeout on auto-archiving jobs
With a large job queue, auto-archiving jobs can take a very long time,causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The userwill see how many jobs are left unchecked....
Fix epydoc format warnings
This patch should fix all outstanding epydoc parsing errors; as such, weswitch epydoc into verbose mode so that any new errors will be visible.
Cleanup the config file on demotion from candidate
This patch adds a simple rpc which makes a backup of the config file andthen removes it. This is done so that cluster verify doesn't complainimmediately after demoting a node.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state foroffline nodes and also modifies the watcher to understand the offlinestate in its checks.
ganeti-rapi: Convert to new HTTP server
ganeti-noded: Migrate to new HTTP server
Rename all HTTP classes to camel case
It should be consistent.
Fix master failover
The ssconf files were not updated by the master failover. We need topush them, and since we already have RPC initialized, we can use thestandard ConfigWriter to do so - this will take care of both the configfile and the ssconf files....
ganeti-masterd: create RUN_GANETI_DIR as well
Since we're not sure ganeti-noded has started yet, we need to createRUN_GANETI_DIR before SOCKET_DIR as well, with the proper permissions.
convert run dir mode to constant
ganeti-noded used to create all directories under /var/run with anhard-coded mode. convert it to a constant.
Move the MASTER_SOCKET to SOCKET_DIR
Before it was in the abstract linux namespace, where unfortunately wecouldn't easily check from python the credentials of the connectingclients. Now we also have to remove the file on exit and when starting.
ganeti-masterd: create SOCKET_DIR
If SOCKET_DIR doesn't exist we create it in the master daemon, beforetrying to put a socket inside it.
Pass ssconf values from master to node
Instead of parsing the configuration on the node, we pass the ssconfvalues from the master.
Use SSL for master/node RPC
This patch enables SSL between masterd and noded.
Get rid of node daemon password
With the new SSL client certificate stuff it's no longer needed.
ganeti-masterd: Remove PID file at the end
Removing the PID file should be the last thing done. This patch makessure it's also removed when master.server_cleanup() throws an exception.
Also initialize logging only after writing the PID file.
Reuse HTTP client pool for RPC
ganeti-masterd: Add initialization and shutdown of RPC pool. It needsto be shutdown before forking.
ganeti.cli: Add decorator function to initialize and shutdown RPC pool.
ganeti.rpc: Add functions to initialize and shutdown RPC pool. Throw...