Design document for KVM daemon
Design document for KVM daemon which is needed by the instance shutdown detection for KVM.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Improve the point-free section of the style guide
Distinguish declaring functions in the point-free style and using a very similar technique to avoid parentheses (which isn't technically point-free).
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Document 2.11 to 2.10 specific downgrade tasks
While the recommended way of downgrading from version 2.11 to 2.10 is ``gnt-cluster upgrade --to 2.10``, manual downgrade is supported. So, the version-specific steps need to be documented in the UPGRADE notes....
Remove certification on 2.11 to 2.10 downgrade
While version 2.10 ignores any leftover client certificates, their presence will prevent the cluster from working after an upgrade back to version 2.11 again. So we have to remove them right at the downgrade.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Add support for version-specific downgrade tasks
Upgrading can have no specific knowledge about additional tasks besides upgrading the configuration, as upgrades need to be able to go to any future version (within the same major version). Downgrading, however, is version specific and always...
design: version-specific downgrade actions
Some new features, like client-specific ssl certificates, require additional steps at downgrade, so add this to the design. Two things should be noted.
- There cannot be explicit version-specific upgrade actions; upgrades...
Document support for automatic downgrades
The recommended way of downgrading a cluster from 2.11 onwards is to use the ``gnt-cluster upgrade`` command. Document this in the section on downgrades.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Clean up epydoc comments
Add missing colons, and improve descriptions of parameters.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Use options for turning functionality on/off
Two command-line options are added: one for confirming that the test has been started intentionally, and one for showing the method invocation output, which is useful, but not always needed.
Signed-off-by: Hrvoje Ribicic <riba@google.com>...
Add job cancellation workload
To examine if jobs can be cancelled correctly, provide workload related to this as well.
Add cluster parameter change workload
One of the few leftover unused RAPI methods is the cluster modify method. This patch tests it by setting and unsetting various safe values.
Make an instance move workload that works in 2.6
The instance move workload present before this patch works on 2.11, but fails on 2.6. The 2.11 workload will still be useful should any later version of Ganeti use it as reference, but a 2.6 workload has to be...
Add instance move workload
Through the use of functions provided by the rapi QA, all the requests related to instance moves can be exercised.
Make the move-instance tool more fault tolerant
The move-instance tool raises an exception when used with a cluster running an earlier version of Ganeti. As the tool is meant to perform inter-cluster moves, this situation could be encountered outside the...
Allow the skipping of checks for inter-cluster move test
The inter-cluster instance move test is very interesting for the RAPI compatibility tests, as it uses many RAPI requests that are otherwise hard to exercise. It uses no command-line functionality apart from...
Make the finish function return the error status explicitly
The earlier version of the Finish function assumed that checking if the value of the response is None would suffice to check if any errors have occurred. This is not the case, and this patch adjusts the function to expose...
Add migration and failover workload
This patch introduces additional calls adding migration and failover RAPI operations, moving a DRBD-disk template instance between nodes.
Add tracking of used client methods
As a helper or a warning to anyone extending the RAPI client, the client wrapper now warns of unused methods or method arguments.
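One way such tracking could work is sketched below: a thin wrapper records every client method that is looked up and can later report the ones that were never touched. The names `UsageTrackingClient`, `unused_methods` and `FakeClient` are hypothetical and are not taken from the Ganeti QA code; tracking of method arguments is left out of this sketch.

```python
class FakeClient:
    """Stand-in for a RAPI client; the method names are illustrative only."""

    def GetInfo(self):
        return "info"

    def GetNodes(self):
        return ["node1"]


class UsageTrackingClient:
    """Wrap a client and remember which of its public methods were accessed."""

    def __init__(self, client):
        self._client = client
        self._used = set()

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself,
        # so every forwarded lookup passes through here.
        attr = getattr(self._client, name)
        if callable(attr):
            self._used.add(name)
        return attr

    def unused_methods(self):
        """Return the sorted names of public methods never looked up."""
        public = {
            name for name in dir(self._client)
            if not name.startswith("_") and callable(getattr(self._client, name))
        }
        return sorted(public - self._used)
```

Note that this sketch counts a method as used as soon as it is looked up, not when it is actually called; a stricter variant would wrap the returned callable.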
Add network workload
This patch exercises the network RAPI commands.
Add miniature query filtering workload
As query filtering was not a part of the previous workloads, this patch adds a single example of its use.
Add per-resource query workload
The query requests are done to receive data about a certain resource type. With tests for all the resources barring networks in place, the query workloads can be added at the point where the existence of enough resources in the system can be confirmed, making the results of the...
Add group-related workload
This patch further extends the RAPI workload by exercising all the group-related functionality.
Add node-related workload
This patch further expands the workload by performing various node operations.
Add warning about the RecreateInstanceDisks invocation
A test relying on RAPI alone cannot exercise the RecreateInstanceDisks functionality properly - simply because it cannot damage an instance to the point where its disks would be missing and in need of...
Add various single instance operations
To further expand the number of RAPI methods in the workload, the single instance operations are added in this patch. An instance is created, deleted, shut down, restarted, reinstalled, renamed, and has its disks activated and deactivated and grown....
Add tag method testing
This patch adds a generic way to test tagging of various entities via RAPI. More tags testing will be added as other entity tests are added.
Add helper function that waits for jobs to finish
Some RAPI calls result in the creation of a long-running job, returning a job id to be used to extract the results later. To reduce the amount of boilerplate, introduce a simple function to do the waiting....
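A waiting helper of this kind can be sketched as a simple polling loop. The function name `wait_for_job`, the `get_status` callback and the status strings below are assumptions made for illustration; they are not the helper introduced by this commit.

```python
import time

# Assumed set of states after which a job no longer changes.
TERMINAL_STATES = frozenset(["success", "error", "canceled"])


def wait_for_job(get_status, job_id, poll_interval=0.01, timeout=5.0):
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is a caller-supplied function (hypothetical here) that
    returns the current status string of the job.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job %s still %r after %.1fs"
                               % (job_id, status, timeout))
        time.sleep(poll_interval)
```

Callers then submit a request, keep the returned job id, and pass it to the helper instead of repeating the loop at every call site.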
Add simple retrieval operations to workload
This patch expands the RAPI workload with simple Get* commands.
Add the first version of the RAPI workload script
The RAPI workload script supplies work for the RAPI compatibility tests. The initial version does very little, but can be expanded as needed.
Make the qa_rapi setup method return the RAPI client
Move RAPI secret lookup to qa_rapi
The RAPI secret lookup is a helper function used by the Ganeti QA to retrieve the RAPI password of an already setup cluster. As this could be useful to other utilities performing QA, move it to the qa_rapi module.
Add code style document to documentation
The Ganeti code style has been stored on the project wiki at:
https://code.google.com/p/ganeti/wiki/StyleGuide
https://code.google.com/p/ganeti/wiki/HaskellStyleGuide
This commit combines the two pages into an .rst file with minimal...
Correct exception when ssconf file does not exist
After an upgrade to 2.11, the ssconf file for the master certificates might not exist. Based on the non-existence, noded falls back to a compatibility mode regarding dealing with SSL certificates. The check for the ssconf file...
Also downgrade gluster parameters
Support for gluster was added only in version 2.11. So, when downgrading to the 2.10 branch, these parameters need to be removed.
Create client certificate for normal nodes
The vcluster QA revealed a bug in the SSL certificate handling code, where certificates were only created when the node is a master-candidate. However, every node should have a certificate, but only the digests of the...
Also consider filter fields for deciding if using live data
If the query fields don't require live data, we use the shortcut and don't request live data. However, we cannot take this shortcut if the fields the filter depends on require live data.
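The decision described above can be sketched as a check over the union of requested fields and filter fields. The function name `needs_live_data` and the field names in the usage note are hypothetical; the actual query code is not shown in this log.

```python
def needs_live_data(requested_fields, filter_fields, live_fields):
    """Return True if any requested or filter field depends on live data.

    live_fields is the (assumed) set of field names that can only be
    answered by asking the nodes rather than reading the configuration.
    """
    relevant = set(requested_fields) | set(filter_fields)
    return any(field in live_fields for field in relevant)
```

For example, with `live_fields = {"oper_state"}`, a query for `["name"]` filtered on `["oper_state"]` must still request live data, even though the output fields alone would allow the shortcut.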
Catch exceptions when calling curses.setupterm() in QA
If it's running on a non-standard terminal, such as rxvt-unicode-256color, the call fails with an exception. Instead, catch the exception and proceed without coloring warnings/errors.
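A minimal sketch of this pattern, assuming the failure surfaces as `curses.error` (the exception the standard `curses` module raises when the terminal description cannot be loaded). The function name `colors_available` and its injectable `setupterm` parameter are hypothetical and exist only to keep the example testable.

```python
import curses


def colors_available(setupterm=curses.setupterm):
    """Return True if terminal setup succeeded, False if it raised.

    On failure the caller is expected to print warnings and errors
    without color instead of aborting.
    """
    try:
        setupterm()
    except curses.error:
        # Unknown or unsupported terminal type: fall back to plain output.
        return False
    return True
```

The caller would then emit color escape sequences only when the function returns True.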
Signed-off-by: Petr Pudlak <pudlak@google.com>...
Increase job queue polling interval
Now that all jobs are monitored with inotify, increase the polling interval.
After detecting a finished job, schedule again
In order to obtain a higher throughput of jobs, schedule new jobs as soon as a job was detected to have finished.
Attach a watcher for jobs
Add a function that can serve as an event handler for inotify updating a job in the job queue if the corresponding job file changes. Also attach it to all jobs selected to be run.
JQScheduler: always pass JobWithStat
When attaching inotifies to jobs, we need to preserve it through potential requeuing actions. Also, this information is needed for cleaning up.
Cleanup inotifies
When cleaning up finished jobs, remove the inotify attached to them, if any.
Add an optional inotify to jobs in the scheduler
This provides the infrastructure to monitor running jobs by inotify, and hence update the queue promptly upon job changes.
Make luxid handle SetDrainFlag
Make luxid also handle queries to drain the job queue.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add RPC for setting the queue drain flag
As luxid is also responsible for handling requests to drain the job queue, we need the corresponding RPC in Haskell as well.
Fix sign in drain_flag request
The drain flag is set, if the queue is not open.
Eliminate installation modes in OS reinstalls doc
Eliminate installation modes in OS reinstalls design doc and instead allow disk images and OS scripts to be combined, with an optional virtualized environment.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Reinstantiate inotify after a lost file
When watching a file, reinstantiate the inotify if notified of an event that removes the watch. Such events are likely to happen, as our usual way to "modify" a file is to atomically replace it by another one.
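The reason a watch gets lost can be illustrated without any inotify binding: an atomic replacement leaves the path in place but points it at a different inode, while a watch follows the inode it was set on. The sketch below only demonstrates that inode change; the helper name `atomic_replace` is hypothetical, and re-adding the watch itself is not shown.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Atomically replace path with new content; return the new inode number."""
    directory = os.path.dirname(path) or "."
    # Write the new content to a temporary file in the same directory,
    # then rename it over the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)
    return os.stat(path).st_ino
```

Comparing `os.stat(path).st_ino` before and after the call shows that the path now refers to a new file, which is why a watcher has to set up a fresh watch on the path after such an event.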
Improve debug-logging for watch file
Also log, at debug level only, when a change of a watched file was observed, but the change did not result in any change of derived value.
Improve debugging by logging inotify events
At debug level, not only log that an inotify triggered, but also log the actual event.
Update design doc to match implementation
This patch contains some minor changes in the design doc to make sure the details match the implementation.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Update UPGRADE notes
Adds to the UPGRADE notes that a renewal of the node certificates is necessary.
Update NEWS wrt to client RPC certificates
This updates the NEWS file regarding the changes in RPC communication.
Verify client certificates
This patch adds a step to 'gnt-cluster verify' to verify the existence and validity of the nodes' client certificates. Since this is a crucial point of the security concept, the verification is very detailed with expressive error messages and well tested by unit tests....
Verify incoming RPCs against candidate map
From this patch on, incoming RPC calls are checked against the map of valid master candidate certificates. If no map is present, the cluster is assumed to be in bootstrap/upgrade mode and compares the incoming call...
Handle promoting/demoting nodes wrt to client certificates
This patch makes Ganeti correctly handle the client certificates when nodes get promoted to master candidates or demoted to normal nodes.
Extend RPC call to create SSL certificates
So far the RPC call 'node_crypto_tokens' did only retrieve the certificate digest of an existing certificate. This call is now enhanced to also create a new certificate and return the respective digest. This will be used in various...
Create client SSL certificates on cluster init
This patch makes Ganeti create a client SSL certificate for the master node on cluster initialization. Note that some of the code in this patch is later moved into an LU to serve requirements for crypto renewal and updates, but for this...
Store candidate certificates in ssconf
This patch enables Ganeti to store the candidate certificate map in ssconf. A utility function to read it is provided as well.
Handle client certificates on node add/remove
This patch adds the certificate of a newly added or readded master candidate node to the map of master candidate certificates. It removes a master candidate node's certificate digest from the candidate certificate map if the node is...
Add certificate for master node
On cluster initialization, the master node's SSL certificate digest is added to the list of master candidate certificates.
Add candidate certificate map to configuration
At the end of this patch series, incoming RPC calls are legitimized against a map of master candidate nodes' SSL certificate digests. This patch adds the map itself to the cluster's configuration.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Retrieve a node's certificate digest
In various cluster operations, the master node needs to retrieve the digest of a node's SSL certificate. For this purpose, we add an RPC call to retrieve the digest. The function is designed in a general way to make it possible...
Utility functions to manipulate the candidate map
This patch adds a couple of utility functions to manipulate the map of master candidate SSL certificate digests.
Remove superfluous imports
This removes some superfluous imports from the X509 (SSL) unittests.
Fix types for queries in QA
Due to the actual implementation of the '?' operator in our query language, it happily accepted essentially any value that was not 0 or False as being true. However, it was always only specified to work on boolean values. Therefore, our QA shouldn't test for this unspecified...
Merge branch 'stable-2.10' into master
Replace errors re-export in luxi.py with proper imports
Instead of re-exporting errors in luxi.py, import rpc/errors.py in the modules that use them.
break line longer than 80 chars
luxi.py: Fix pylint warning about unused imports
Reexport exception classes more explicitly for pylint's convenience.
Signed-off-by: Santi Raffa <rsanti@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
rpc: Fix one more py-apidoc warnings
hsqueeze: Also test for tagging
hsqueeze is required to tag nodes before powering them down. Also test for this behavior.
hsqueeze: tag nodes before offlining them
hsqueeze is supposed to tag nodes before powering them down, so that it can later recognize which nodes can be activated. When showing the commands to execute, also add the tagging commands.
Add an hsqueeze test for drbd instances
In this example, there are two drbd instances, rendering a total of four nodes ineligible for being offlined. Additionally, the master may not be offlined either, leaving a single candidate.
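The arithmetic of this example can be checked with a small Python sketch. It assumes a six-node cluster (the node count is inferred from "four ineligible, plus the master, leaving one"); the node names and the function `offline_candidates` are hypothetical, and the sketch is not hsqueeze's actual selection logic.

```python
def offline_candidates(nodes, master, drbd_instances):
    """Return nodes that are neither the master nor used by a drbd instance.

    drbd_instances is a list of (primary, secondary) node pairs.
    """
    blocked = {master}
    for primary, secondary in drbd_instances:
        blocked.update((primary, secondary))
    return [node for node in nodes if node not in blocked]


nodes = ["node1", "node2", "node3", "node4", "node5", "node6"]
drbd = [("node2", "node3"), ("node4", "node5")]
candidates = offline_candidates(nodes, "node1", drbd)
```

With two drbd instances on disjoint node pairs and `node1` as master, only `node6` remains eligible.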
hsqueeze: only consider nodes that are not secondaries
If an instance has a secondary node, it cannot be easily moved to every node (in the same node group), as otherwise no node would be distinguished as secondary. As hsqueeze should only consider nodes where moving the instances away...
rpc: Fix py-apidoc warnings
The previous commits shuffled code around using import renames as glue. apidoc ignores import renames, however, and chokes on some now invalid link targets.
This commit fixes the issue.
Signed-off-by: Santi Raffa <rsanti@google.com>...
Separate the LUXI protocol version from the generic client
This allows other daemons and their clients (such as WconfD) to use a different versioning sequence of their protocols.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Rename CallLuxiMethod to CallRPCMethod
Also update error messages and testing code to refer to RPC instead of LUXI.
Split Luxi Client into a generic and a specific part
The generic part will be reused in WConfd.
Move Transport from luxi.py to a separate module
Also create a new module for RPC errors. This allows it to be reused for other clients as well.
Add a Python directory for RPC code to keep it at one place
Move rpc.py to rpc/node.py and modify imports in existing code.
Gluster: announce in NEWS
Add the relevant line to NEWS
Signed-off-by: Santi Raffa <rsanti@google.com>
Signed-off-by: Thomas Thrainer <thomasth@google.com>
Reviewed-by: Thomas Thrainer <thomasth@google.com>
Gluster: add the Shared File storage type
The shared file and gluster disk templates should not report their disk space information like file does, because they do not behave the same.
If a cluster pulls from the same, shared source of storage then it is...
Gluster: add userspace access support
Add support for the QEMU gluster: protocol. Also change the access mode routines so they check the access parameter for all templates.
Signed-off-by: Santi Raffa <rsanti@google.com>
Signed-off-by: Thomas Thrainer <thomasth@google.com>...
Gluster: mount automatically
Add parameters to the Gluster disk template so Gluster can manage the mount point autonomously.
Gluster: use ssconf value for mountpoint directory
Gluster still does not mount anything autonomously, but this commit changes where Gluster expects its mountpoint to be.
ssconf: Add Gluster mount directory
This commit adds the gluster storage directory to ssconf (without actually using its value just yet).
Gluster: add GlusterVolume class
This commit teaches Gluster what a volume is and how to use it.
Gluster: minimal implementation
Add Gluster to Ganeti by essentially cloning the shared file behaviour everywhere in the code base.
netutils: Add ValidatePortNumber method
This method accepts a port number and checks that it is in fact valid.
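A check of this kind might look like the sketch below. It assumes "valid" means an integer in the range 1 to 65535; the function name `validate_port_number`, its return value and its error messages are illustrative and do not reproduce the netutils method added by this commit.

```python
def validate_port_number(port):
    """Return port as an int if it is a usable TCP/UDP port number.

    Raises ValueError for values that are not integers or that fall
    outside the assumed valid range 1..65535.
    """
    try:
        number = int(port)
    except (TypeError, ValueError):
        raise ValueError("Invalid port value: %r" % (port,)) from None
    if not 0 < number < 2 ** 16:
        raise ValueError("Port number out of range: %d" % number)
    return number
```

Accepting strings as well as integers keeps the check usable for values read from command-line options or configuration files.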
FileStorage: extract file logic to a FileDeviceHelper object
This will allow code reuse for Gluster through composition, rather than inheritance.
FileStorage: move to filestorage.py
Move the FileStorage class into its own file, together with its helper functions.
PathJoin: improve error message when given one argument
PathJoin fails with an unclear message if only one argument is passed to it. Calling PathJoin("/foo") causes this exception:
Error: path joining resulted in different prefix (/foo != /foo)
However, /foo and /foo obviously share prefixes: what this function...
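To illustrate the kind of check involved, here is a sketch of a prefix-checking join that reports the single-argument case with its own message. It is not Ganeti's PathJoin: the name `safe_path_join`, the error texts and the POSIX-style path handling are assumptions made for this example.

```python
import os.path


def safe_path_join(root, *parts):
    """Join parts below root, refusing results that escape root.

    Assumes POSIX-style paths. root must be absolute and normalized.
    """
    if not parts:
        # Report the real problem instead of a confusing prefix mismatch.
        raise ValueError("Nothing to join: only %r was given" % (root,))
    if not (os.path.isabs(root) and os.path.normpath(root) == root):
        raise ValueError("Root %r must be absolute and normalized" % (root,))
    result = os.path.normpath(os.path.join(root, *parts))
    prefix = root.rstrip("/") + "/"
    if not result.startswith(prefix):
        raise ValueError("Joined path %r is not below %r" % (result, root))
    return result
```

Here `safe_path_join("/foo", "bar")` yields `/foo/bar`, while `safe_path_join("/foo", "../etc")` and `safe_path_join("/foo")` each raise a `ValueError` that names the actual cause.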
ComputeLDParams: do not spell out disk templates
A large part of the complexity in this function is due to the need to translate from "template-specific" parameter names to "template-agnostic" parameter names. This logic is complex and having complex code for complex logic is okay....
bdev: Fix position of DEV_MAP
This rather important dictionary from constants to classes was hiding between function definitions. The dict cannot go to the top of the file as the classes haven't been defined there yet, so it's been pushed to the bottom of the file....
gnt-cluster verify: demote orphan volume error to warning
Ganeti checks for orphan volumes by making sure that it knows about all volumes on disk; any additional orphan volume, even if created by the administrator, causes a failure in gnt-cluster verify. Given that...
For the commandline, switch to query socket by default
As luxid now understands all the requests used by the command-line tools, switch the default luxi socket for those to be the socket of luxid.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Implement fields query for instance
Support the query for the fields available for instances.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Remove the hvsGlobals from instance query fields
...to be consistent with the python implementation.
Add nic.vlans to the query fields
In commit 3293332 this was only done for the Haskell side; do so for python as well, to have both views consistent.