Consider old client cert only when available
This fixes a bug which occurred only after upgrading from 2.10 to 2.11. During the cluster renew-crypto operation, Ganeti tries to include the old certificate in the candidate map while it is providing new certificates. This failed when there was no certificate...
Fix return of 'Validate'
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Add reason for job pickup to the trail
Add a new entry in the reason trail when a job is picked up by MasterD from the hard drive, after LuxiD put it there.
Note that the signature of NameToReasonSrc is changed in an incompatible way, although it's a public method, because in this commit we also change its only...
Make the AddReason method public
It will need to be accessed from outside the class too in one of the next commits.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Merge branch 'origin/stable-2.10' into stable-2.11
Signed-off-by: Hrvoje Ribicic <riba@google.com>...
Merge branch 'origin/stable-2.9' into stable-2.10
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Make gnt-debug locks display fake job locks properly
When a job is dependent on other jobs, a fake lock is created whose pending entry contains a list of job ids waiting on the job. gnt-debug locks did not expect the job ids to be ints, crashing when encountering...
Make NiceSort treat integers well
NiceSort is invoked on arrays that may contain strings, but in other situations can contain ints as well. As this surprisingly makes sense, add a tiny modification to make NiceSort work in these conditions.
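The idea can be illustrated with a minimal sketch. It is not Ganeti's NiceSort implementation; the names nice_sort_key and nice_sort are hypothetical, and the only assumption taken from the message is that integers should be accepted alongside strings. Here every value is passed through str() before it is split into text and digit runs.

```python
import re

# Splits a string into alternating text and digit runs.
_CHUNK_RE = re.compile(r"(\d+)")


def nice_sort_key(value):
    """Build a natural-sort key (hypothetical helper, not Ganeti code).

    Integers are converted with str() first, so mixed input is accepted.
    """
    key = []
    # With one capture group, odd indices of the split are the digit runs.
    for index, chunk in enumerate(_CHUNK_RE.split(str(value))):
        if not chunk:
            continue
        if index % 2:
            key.append((0, int(chunk), ""))   # numeric run, compared as int
        else:
            key.append((1, 0, chunk))         # text run, compared as str
    return key


def nice_sort(values):
    """Sort strings and ints so that 'node2' comes before 'node10'."""
    return sorted(values, key=nice_sort_key)


print(nice_sort(["node10", "node2", 7, "node1"]))
```

Tagging each chunk as numeric or textual keeps the key comparable in Python 3, where an int cannot be ordered against a str directly.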
Merge branch 'stable-2.10' into stable-2.11
Merge branch 'stable-2.9' into stable-2.10
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Merge branch 'stable-2.8' into stable-2.9
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Jose A. Lopes <jabolopes@google.com>
Fix expression describing optional parameters
The NIC's network and vlan are also newly added, hence need to be considered optional to remain backwards compatible.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Let the instance's tuple of nodes start with the primary
Before, the tuple of nodes of an instance was created from a set, listing the nodes in alphabetical order. This patch ensures that the primary node is always the first one in the list.
Signed-off-by: Petr Pudlak <pudlak@google.com>...
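As an illustration only, the ordering described above could look like the following sketch. The function ordered_instance_nodes and its arguments are hypothetical and do not come from the Ganeti code base.

```python
def ordered_instance_nodes(primary, secondaries):
    """Return the instance's nodes as a tuple, primary node first.

    Hypothetical helper: the remaining nodes are deduplicated and
    sorted so that the result is deterministic.
    """
    rest = sorted(set(secondaries) - {primary})
    return (primary,) + tuple(rest)


# A plain sorted set would put 'node1' first; here the primary leads.
print(ordered_instance_nodes("node3", ["node2", "node1"]))
```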
Check the existence of system users and groups at bootstrap
Before, if any of these were missing, the creation of a cluster failed and the cluster remained in an inconsistent state, without the possibility to destroy it or to re-create it (#603).
This patch calls 'GetEnts' during bootstrap, which tries to read all...
Conflicts: lib/cmdlib/instance.py: manually apply 0973f9ed on...
Improve job status assert affected by race condition
In the sliver of time between choosing a waiting job to be executed and trying to acquire locks for its execution, the status of the job can be changed to canceling. An assert checking the job status neglected to...
Export and import Disk/NIC name
Names of Disks/NICs were not exported during backup until now. Use the exported info during gnt-backup import.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Fix backup import in case NIC is inside a network
Network UUID is written in .ini file during backup export but is not used by _ReadExportParams(). This patch fixes it.
Please note that in case a network is given, link and mode should not be included in NIC options....
Override get() method of ConfigParser
During backup import/export SafeConfigParser() is used to save/restore instance's configuration. There is a possibility, if an export is done with a different Ganeti version, for a specific value not to be saved during export (e.g. the NIC/Disk name) but still...
Smooth renewal of client certificates
This patch fixes another chicken-and-egg problem which occurred when the node certificates get renewed. When renewing a node certificate, the previous certificate has to be used to update the configuration. To address...
Use node UUID as client certificate serial number
It turns out that some implementations of OpenSSL are more pedantic in checking the certificates than others. In this particular case, the SSL connection could not be established when the serial number of the certificates...
Revert "Disabling client certificate usage"
This reverts commit 45f75526b848, which was introduced to temporarily disable the implementation of SSL client certificates. As this patch series fixes the reason for the disabling, we are rolling back the patch....
Remove the HTOOLS configuration variable
.. and update the code that uses it.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Gracefully handle queries for non-existing nodes
When adding a node, Ganeti checks whether the node is already part of the cluster by querying for the node name. However, as queries are meant to return all nodes with the given name, it might well return the empty list when a new node is to be...
Workaround for monitor bug related to greeting msg
QMP may return multiple greeting messages upon connection. This is reported on qemu-devel. The fix is a one-liner, but until it gets released this is a quick and dirty workaround that flushes the client's buffer after getting the first...
hotplug: Verify if a command succeeded or not
Just after issuing _CallHotplugCommands() we invoke _VerifyHotplugCommand() which parses `info pci` result and searches for given PCI slot and device id.
If we previously had removed a device but it is still there...
hotplug: Call each qemu command with its own socat
Previously we issued one socat command with two "\n" separated actions (e.g. netdev_add ...\ndevice_add...)
After having observed strange monitor behavior [1], splitting those commands and introducing a sleep time in between may reduce...
Fix specification of TIDiskParams
Commit 580b1fdd incorrectly assumes that disk parameters are just the standard ones, whereas the man page explicitly states that additional parameters can be passed as well, if they make sense for the chosen storage type. Fix this....
Make BlockDev subclasses adhere to the interface for Create
In commit 702c3270 two new parameters were added to the Create function of BlockDev. Make subclasses also adhere to this specification.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
Make the LUInstanceCreate return node names, not UUIDs
The LUInstanceCreate returned names instead of UUIDs in 2.6. Along the way, the names were internally replaced with UUIDs, and the abstraction leaked. This patch fixes the issue.
Make BlockDev subclasses adhere to new interface
In commit 702c3270 two new parameters were added to the constructor of BlockDev. Make the subclasses accept these additional parameters as well.
Make disk.name and disk.uuid available in bdev
Until now Disk name and uuid were not available on bdev level. In case of ExtStorage, this info is useful, and may be for other templates in the future too.
This patch treats the name and uuid object slots just like the size...
upgrade: start daemons after ensure-dirs
On upgrading a cluster, we can only rely on daemons starting up cleanly if all needed directories are generated first. So ensure-dirs needs to be run first.
Gracefully handle degraded instances in verification
The current code assumes that every instance either is of type diskless or has at least one disk. However, with the option to remove individual disks, degraded 0-disk non-diskless instances can occur. While such instances usually are not useful, Ganeti...
Be aware of the degraded case when cleaning up an instance
In the case of a degraded file-based instance, the file storage directory for that instance cannot be obtained by looking at the first disk. Use the standard location, computed from first principles, in this case....
Preserve disk basename on instance rename
For file-based instances, upon rename, the directory containing the instance disks is moved. Therefore, the basename needs to be preserved in this case. Fix this. Note that so far, this worked by accident as before 94e252a3 file names used to be...
Add QA tests for RAPI multi-instance allocation
The instance multi-allocation had no tests to detect its breakage, and this patch fixes that.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
Fix multi-allocation RAPI method
The OpInstanceMultiAlloc that the instances-multi-alloc RAPI method uses accepts a list of OpInstanceCreate opcodes rather than a list of dictionaries as provided by the method. This patch correctly constructs the opcodes, allowing the RAPI call to work as expected....
Assign unique filenames to filebased disks
With the new format for cmdline arguments, the user is able to add a disk to an instance at a specific index. But filebased disks' filenames have the form "{0}/disk{1}" where '{0}' is the file_storage_dir and '{1}' is the index of the disk. So if an instance has 3 disks and we...
Add missing import
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Disabling client certificate usage
This patch temporarily disables the usage of the client SSL certificates. The handling of RPC connections had a conceptual flaw, because the certificates lack a proper signature. For this, Ganeti needs to implement a CA,...
Fix 'hvparams' of '_InstanceStartupMemory' on hypervisors
Most hypervisors were calling '_InstanceStartupMemory' but not passing the 'hvparams' keyword argument. Actually, it is not necessary to pass this argument given that it is an attribute in the instance...
Run drbdsetup syncer only on network attach
As late as DRBD 8.3.11, the drbdsetup syncer command has a bug causing nodes to hang from time to time, requiring manual intervention to fix. The use of the command cannot be avoided, but the incidence of use can...
Add correct locking of master node to gnt-debug delay
The gnt-debug delay command required locks for all nodes except the master - this patch fixes the issue by adding master to the locks whenever needed.
Add job id type assert to jqueue.py
While the changes introduced in previous patches should stop any job id parameters reaching the queue as strings, add an assertion here to catch any strings making it through.
Add job id transformation/check to Luxi Python client
This patch adds checks to the Luxi client, making sure that job ids are converted from strings to ints before being passed on, or that an error is reported.
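A minimal sketch of such a conversion follows. The helper to_job_id is hypothetical and not taken from the Luxi client; in particular, the exception types and the rejection of negative values are assumptions made for the example.

```python
def to_job_id(value):
    """Convert a job id given as int or numeric string to an int.

    Hypothetical helper: raises TypeError for unsupported types and
    ValueError for strings that are not non-negative integers.
    """
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("job id must be an int or a string: %r" % (value,))
    try:
        job_id = int(value)
    except ValueError:
        raise ValueError("invalid job id: %r" % (value,)) from None
    if job_id < 0:
        raise ValueError("job id must not be negative: %r" % (value,))
    return job_id


print(to_job_id("42"), to_job_id(7))
```

Converting at the client boundary means that code further down, such as the job queue, can assert that it only ever sees integers.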
query: fix detection of master in _GetNodeRole()
Commit 1c3231aa changed the invocation of _GetNodeRole() to pass the master node by UUID and not by name, but didn't change the implementation to compare the nodes by name. As a result, the master node (which is also a master candidate) would always fall through to the...
Move vcluster-related constants to Constants.hs
...as, in that way, they will also be available in Haskell, where job replication happens as well.
Include target node in hooks nodes for migration
In case of DRBD, hooks run on both primary (source) and secondary (target) nodes. To get the same behavior for DTS_EXT_MIRROR, where we do not have secondary node, we should explicitly add target node to hooks nodes during instance migration/failover....
Make max_running_jobs queryable
As we have introduced a new cluster parameter, it should be also visible when querying about the cluster configuration.
Add a command-line parameter for max_running_jobs
...so that this opcode parameter can become available for 'gnt-cluster modify'.
Add opcode parameter for the maximal number of running jobs
This parameter of OpClusterSetParams will allow setting the maximal number of jobs to be run simultaneously.
Add parameter max_running_jobs to the cluster configuration
This cluster-wide parameter will determine the maximum number of non-finalized jobs that should be in a non-queued state at the same time.
Simplify 'GetMasterInfo' RPC
RPC 'GetMasterInfo' returns several fields, namely, 'master_netdev', 'master_ip', 'master_netmask', 'master_node', and 'primary_ip_family', of which only the 'master_node' is actually used.
Add certificate of auto-promoted master candidates to map
When a normal node is auto-promoted to be a master candidate, its SSL client certificate digest needs to be added to the map of candidate certificates as well.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Hook KVM hypervisor with KVM daemon shutdown files
User shutdown hypervisor parameter
Add user shutdown parameter for KVM. Based on this parameter, decide what information to report for a KVM instance, for example, distinguish between 'ADMIN_down' and 'USER_down'.
Signed-off-by: Jose A. Lopes <jabolopes@google.com>...
Add helper function to tell if a daemon is alive
Add helper function 'utils.IsDaemonAlive' to tell if a daemon is alive by name. This function will be necessary for the KVM hypervisor to determine if the KVM daemon is running and otherwise start it.
Fix docstring for 'AsyncStreamServer'
Signed-off-by: Jose A. Lopes <jabolopes@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Remove deprecated _ERROR_DATA_KEY in QMP
Commit de253f14 of QEMU repo "BREAKS QMP's compatibility for the error response" as it removes "data" key from qmp error response messages. To this end, we only log "class" and "desc" values of the message.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>...
Add utility to compare versions
This will be needed, e.g., for post-upgrade tasks, as they have to decide whether a feature was not yet present at the version started from.
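To make the use case concrete, here is a small sketch of such a comparison. It is not the utility added by this commit; parse_version and is_before are hypothetical names, and the sketch assumes plain dotted numeric versions such as "2.10.3" with no suffixes.

```python
def parse_version(text):
    """Parse a dotted version string into a tuple of ints (assumed format)."""
    return tuple(int(part) for part in text.split("."))


def is_before(started_from, feature_version):
    """Tell whether the version started from predates a feature's version."""
    return parse_version(started_from) < parse_version(feature_version)


# Comparing the raw strings would order "2.9" after "2.10";
# comparing the parsed tuples gives the intended result.
print(is_before("2.9", "2.10"))
print(is_before("2.11.0", "2.11"))
```

A post-upgrade task could use a check of this kind to run a migration step only when the cluster was upgraded from a version that lacked the feature.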
Merge branch 'stable-2.10' into master
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Run postupgrade hook after upgrade
To allow for necessary last-moment adaptations of the new cluster, we run the post-upgrade hook of the target version, providing the version we originally started from.
Provide path to post-upgrade
Also add the current version to the intent-to-upgrade file
Our design states that the intent-to-upgrade file contains "the current version of ganeti, the version to change to, and the process ID". Make the implementation fit with that design.
admin.rst: update and reword disk template section
The disk template section was not updated for Gluster. This commit also refactors the section slightly by unifying the different remarks about /etc/ganeti/file-storage-paths.
sphinx_ext is also changed in order to not hardcode too much...
Remove certification on 2.11 to 2.10 downgrade
While version 2.10 ignores any leftover client certificates, their presence will prevent the cluster from working after an upgrade back to version 2.11 again. So we have to remove them right at the downgrade.
Add support for version-specific downgrade tasks
Upgrading can have no specific knowledge about additional tasks besides upgrading the configuration, as upgrades need to be able to go to any future version (within the same major version). Downgrading, however, is version specific and always...
Improve backwards compatibility of Issue 649 fix
Commit e6e4ff4cf8d0100f331f94f7a27aa1e03a5d0e7d fixed Issue 649 by switching the separator for usb_devices from comma to space. That solved the problem with the command line, but RAPI was able to work with commas too, so, for backwards...
Correct exception when ssconf file does not exist
After an upgrade to 2.11, the ssconf file for the master certificates might not exist. Based on the non-existence, noded falls back to a compatibility mode regarding dealing with SSL certificates. The check for the ssconf file...
Create client certificate for normal nodes
The vcluster QA revealed a bug in the SSL certificate handling code, where certificates were only created when the node is a master-candidate. However, every node should have a certificate, but only the digests of the...
Change usb_devices separator to whitespace
The usb_devices parameter was using comma as a list separator, but this cannot work because comma is already used as the hypervisor parameter separator.
Change it to use whitespace as a separator, in accordance with what is already done...
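The conflict between the two separators can be shown with a short sketch. The parsing code and the device strings below are hypothetical and only illustrate the problem; they are not Ganeti's option parser. The sketch assumes hypervisor parameters arrive as a comma-separated "key=value" list.

```python
def parse_hvparams(spec):
    """Split a comma-separated 'key=value' list into a dict (assumed format)."""
    params = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return params


def split_usb_devices(value):
    """Split the usb_devices value on whitespace into individual devices."""
    return value.split()


# With whitespace between devices, the comma split keeps the list intact.
params = parse_hvparams("usb_devices=host:1234:5678 host:0a0b:0c0d,acpi=true")
print(split_usb_devices(params["usb_devices"]))

# With a comma between devices, the second device is cut off from its key.
broken = parse_hvparams("usb_devices=host:1234:5678,host:0a0b:0c0d,acpi=true")
print(broken["usb_devices"])
```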
Verify client certificates
This patch adds a step to 'gnt-cluster verify' to verify the existence and validity of the nodes' client certificates. Since this is a crucial point of the security concept, the verification is very detailed with expressive error messages and well tested by unit tests....
Verify incoming RPCs against candidate map
From this patch on, incoming RPC calls are checked against the map of valid master candidate certificates. If no map is present, the cluster is assumed to be in bootstrap/upgrade mode and compares the incoming call...
Handle promoting/demoting nodes wrt client certificates
This patch makes Ganeti correctly handle the client certificates when nodes get promoted to master candidates or demoted to normal nodes.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>
Extend RPC call to create SSL certificates
So far the RPC call 'node_crypto_tokens' did only retrieve the certificate digest of an existing certificate. This call is now enhanced to also create a new certificate and return the respective digest. This will be used in various...
Create client SSL certificates on cluster init
This patch makes Ganeti create a client SSL certificate for the master node on cluster initialization. Note that some of the code in this patch is later moved into an LU to serve requirements for crypto renewal and updates, but for this...
Store candidate certificates in ssconf
This patch enables Ganeti to store the candidate certificate map in ssconf. A utility function to read it is provided as well.
Handle client certificates on node add/remove
This patch adds the certificate of a newly added or readded master candidate node to the map of master candidate certificates. It removes a master candidate node's certificate digest from the candidate certificate map if the node is...
Add certificate for master node
On cluster initialization, the master node's SSL certificate digest is added to the list of master candidate certificates.
Add candidate certificate map to configuration
At the end of this patch series, incoming RPC calls are legitimized against a map of master candidate nodes' SSL certificate digests. This patch adds the map itself to the cluster's configuration.
Retrieve a node's certificate digest
In various cluster operations, the master node needs to retrieve the digest of a node's SSL certificate. For this purpose, we add an RPC call to retrieve the digest. The function is designed in a general way to make it possible...
Utility functions to manipulate the candidate map
This patch adds a couple of utility functions to manipulate the map of master candidate SSL certificate digests.
Ensure that all the hypervisors exist in the config file
All the hypervisors are supposed to exist in the config file, but it might not be so after upgrades from old versions. This patch ensures that all the missing hypervisors are added with their default values to the config file....
Replace errors re-export in luxi.py with proper imports
Instead of re-exporting errors in luxi.py, import rpc/errors.py in the modules that use them.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
luxi.py: Fix pylint warning about unused imports
Reexport exception classes more explicitly for pylint's convenience.
Signed-off-by: Santi Raffa <rsanti@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
rpc: Fix one more py-apidoc warning
rpc: Fix py-apidoc warnings
The previous commits shuffled code around using import renames as glue. apidoc ignores import renames, however, and chokes on some now invalid link targets.
This commit fixes the issue.
Signed-off-by: Santi Raffa <rsanti@google.com>...
Separate the LUXI protocol version from the generic client
This allows other daemons and their clients (such as WconfD) to use a different versioning sequence of their protocols.
Rename CallLuxiMethod to CallRPCMethod
Also update error messages and testing code to refer to RPC instead of LUXI.
Split Luxi Client into a generic and a specific part
The generic part will be reused in WConfd.
Move Transport from luxi.py to a separate module
Also create a new module for RPC errors. This allows it to be reused for other clients as well.
Add a Python directory for RPC code to keep it at one place
Move rpc.py to rpc/node.py and modify imports in existing code.