Add a new instance query flag ‘disk_usage’
This patch adds a new instance query flag called disk_usage that retrieves the overall space used by an instance on each of its nodes. This can be used when balancing the cluster or checking N+1 status.
The flag is also exported in RAPI. Note the flag is currently broken for...
Uniformize some function names in backend.py
Currently, the names of the functions in backend.py that are actually RPC procedures and are called from ganeti-noded do not correspond to the RPC names. This makes it hard to actually see which functions are...
rpc.call_blockdev_find: convert to (status, data)
This patch converts call_blockdev_find, which searches for block devices and returns their status, to the (status, data) format. We also modify the backend function name to match the rpc call.
Reviewed-by: ultrotter
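The (status, data) convention used by this and several later conversions can be illustrated with a minimal sketch. The function name blockdev_find, the KNOWN_DEVICES table and the error text are hypothetical placeholders, not the actual backend code; the only point shown is that a failure carries its message in the second element instead of a bare False.

```python
# Hypothetical sketch of the (status, data) return convention.
# KNOWN_DEVICES and blockdev_find are illustrative names only.
KNOWN_DEVICES = {"xenvg/disk0": {"sync_percent": 100.0, "degraded": False}}


def blockdev_find(dev_name):
    """Return (True, payload) on success or (False, error message) on failure."""
    try:
        return True, KNOWN_DEVICES[dev_name]
    except KeyError:
        return False, "Can't find device %s" % dev_name


# The caller unpacks the pair and can show the message to the user.
status, data = blockdev_find("xenvg/missing")
if not status:
    print("Failure:", data)
```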
Export the cpu nodes and sockets from Xen
This is a hand-picked forward patch of commit 1755 on the 1.2 branch (hand-picked since the trees diverged too much since then):
The patch changed the xen hypervisor to compute the number of cpu sockets/nodes and enabled the command line and the RAPI to show this...
cmdlib: simplify some rpc error handling cases
By using the RemoteFailMsg() or the payload field of RpcResult, we can simplify a few functions in cmdlib.
LUCreateInstance: only set running flag at the end
In lockless queries, it's better if we see the instance in ADMIN_down rather than ERROR_down during the time it's installed. As such, we change the LU to only mark the instance 'up' at the time we are ready to...
Enable lockless node queries
Similar to the instance list, this patch enables lockless node queries. “gnt-node list” now accepts the “--sync” flag, which enables locking; the default is lockless.
Reviewed-by: imsnah
Implement lockless query operations
This patch adds the framework for, and enables lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job).
The framework is implemented as follows:...
An attempt at fixing some encoding issues
This patch unifies the hardcoded re-encoding attempts into a single function in utils.py. This function is used to take either a unicode or str object and convert it to an ASCII-only str object which can be safely...
Small patch for handling errors in node add
This small patch hopefully fixes the handling of ssh verify errors in node add (note: untested).
Return error messages in node add ssh handling
When the rpc call node_add fails, we don't have any error message. This patch changes the call to return (status, data) so that the user can see the correct error message.
LUQueryClusterInfo: filter hvparams
We don't need to show hvparams for hypervisors which are not enabled on the cluster.
Reviewed-by: iustinp
LUAddNode: copy the vnc password file also for KVM
Before, we used to copy the file if xen-hvm was enabled on the cluster; now we'll do that if any enabled hypervisor is in the new HTS_USE_VNC group.
GetShellCommand: get hvparams and beparams
Sometimes the hypervisor will use the instance hv and/or be parameters to determine the best shell command. This is not possible, though, currently, as the instance hv/beparams are not filled, so we have to pass the filled versions separately....
Implement software release version checks too
Currently the LUVerifyCluster only reports the protocol version changes, not software ones. This is useful to know/monitor, so we add this too as a warning.
LUQueryInstances: keep the given order of names
Currently LUQueryInstances keeps the ordering of instances only in some cases, and in others it will reorder the list. This patch fixes this by more clearly separating the various cases (names passed or not and locking or not locking),...
Fix gnt-cluster modify -H and offline nodes
Fix the mode attribute of newly-created disks
Currently, only the LUSetInstanceParams correctly sets up the mode attribute via a manual operation. We remove this and instead do the correct setting in the generic _GenerateDiskTemplate function, so that we set the mode correctly for all disk creations....
Fix batcher for 2.0-style disks and nics
This patch fixes the gnt-instance batch-create command, and in doing so also slightly changes two other functions:
- we change utils.ParseUnit so that it accepts integer values also (both ParseUnit(5) and ParseUnit("5") return the same value)...
Make iallocator work with offline nodes
This patch changes the iallocator framework to work with and properly export to plugins offline nodes. It does this by only exporting the static configuration data for those nodes, and not attempting to parse the runtime data....
Remove checking of DRBD metadata for validity
Currently the DRBD code checks that the metadata devices are valid before creation, initial disk attachment and add children.
However, the process for checking validity requires a free DRBD minor, and this conflicts with parallel checking....
A couple of small fixes to iallocator
This removes some constraints:
- only two disks supported; this is no longer true, as the underlying functions can now compute size for a variable number of disks
- error when the hypervisor was not being passed...
Automatically release DRBD minors on success
This patch converts the DRBD minors reservation protocol from explicit release to automatic release on the success paths. On the error paths, a manual release is still needed.
The patch doesn't bring much by itself, but is needed for a future patch...
Change the instance status attribute to boolean
Due to historic reasons, the “should run or not” attribute of an instance was denoted by its “status” attribute having a string value of either ‘up’ or ‘down’. Checking this in code was done via hardcoding...
Add calls in the intra-node migration protocol
Currently the hypervisor is expected to do all the migration from the source side. With this patch we also add the option of passing some information to the target side, and starting some operation there.
As a bonus, a function to cleanup any started operation is included....
Convert RenameInstance to (status, data)
This allows the rename failures to show the output of OS scripts.
Fix adding of disks to an instance
The ConfigWriter.AllocateDRBDMinor requires the instance name, not the instance object. The LUSetInstanceParams wrongly passes the instance object, which can cause breakage.
The patch also adds asserts to check for this mismatch in ConfigWriter....
Make cluster-verify check the drbd minors space
This patch adds support for verification of drbd minors space in cluster verify: minors which belong to running instances and should be online but are not, and minors which do not belong to any instance but are in...
Some small fixes in cmdlib
Convert AddOSToInstance to (status, data)
This allows the install and reinstall instance to return (hopefully) relevant log files from the OS create scripts.
Convert the start instance rpc to (status, data)
This will record the failure cause in starting up the instance in the job log (and thus to the user).
Fix handling of failures in create instance disks
Commit 2302 only modified _CreateBlockDevOnPrimary to the new style result, but _CreateBlockDevOnSecondary was forgotten. After the merger of the two functions, _CreateBlockDevOnSecondary was taken as template...
Use instance.all_nodes instead of hand-building it
This patch replaces a few obvious uses of [instance.primary_node] + list(instance.secondary_nodes) (or similar usage) with the new instance.all_nodes.
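A minimal sketch of the idea, using a hypothetical stand-in class (the real instance object has many more fields and may implement the attribute differently): the hand-built expression is wrapped once in an all_nodes property so callers no longer repeat it.

```python
class Instance:
    """Hypothetical stand-in for the instance object; only node fields are modelled."""

    def __init__(self, primary_node, secondary_nodes):
        self.primary_node = primary_node
        self.secondary_nodes = tuple(secondary_nodes)

    @property
    def all_nodes(self):
        """Primary node first, followed by the secondaries."""
        return [self.primary_node] + list(self.secondary_nodes)


inst = Instance("node1.example.com", ["node2.example.com"])
# Before: [inst.primary_node] + list(inst.secondary_nodes)
# After:
nodes = inst.all_nodes
```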
Split the block device creation in two parts
Some callers of _CreateBlockDev need recursive behaviour, but not all. The replace secondary first creates (manually) new LVs to ensure storage is there, and then it creates the new DRBD. At this point, we need a...
Combine the two _CreateBlockDevOnXXX functions
Since only two boolean parameters differ between these two functions, we combine them so as to have less code duplication. This will be needed in the future as we will need to split off the recursive part....
Switch call_blockdev_create call to (status, data)
This allows errors to be visible at the user level instead of just node daemon logs.
Small change in the instance disk creation path
For future propagation of error messages from backend to cmdlib and to the job log, just having True/False return from the disk creation function is not enough.
This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)...
Use the same root for both _data and _meta LVs
Currently we use a different UUID for the _data and _meta volumes of a DRBD disk. This is confusing as it's hard to associate the two in the output of “lvs” or “gnt-node volumes”.
The patch changes this so that they use the same prefix....
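As a rough illustration, the shared prefix could be generated once and reused for both volumes. The helper name and the exact name layout below are assumptions made for this sketch, not the naming scheme confirmed by the patch.

```python
import uuid


def generate_drbd_lv_names(disk_index):
    """Return (data_name, meta_name) sharing one unique prefix.

    Hypothetical helper: the '<uuid>.disk<N>' layout is an assumption.
    """
    prefix = "%s.disk%d" % (uuid.uuid4(), disk_index)
    return prefix + "_data", prefix + "_meta"


data_lv, meta_lv = generate_drbd_lv_names(0)
```

With a common prefix, sorting the output of “lvs” places the two volumes of a disk next to each other.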
Fix LUExportInstance
Due to deficiencies in our block device implementation, it is a must to call SetDiskID on disks before passing them to remote nodes. Since in export/import, we don't touch the disks themselves, this was not needed before in this function....
Fix gnt-backup export with short names
We need to pass the fully-qualified node to _CheckNodeOnline, not the short one.
Forward port the live migration from 1.2 branch
This is forward port via copy (and not individual patches cherry-pick) of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Port replace disk/change node to the new DRBD RPCs
In replace disks to new secondary, since Attach (and therefore call_blockdev_find) is not modifying the devices anymore, we need to switch this LU to the new call_drbd_disconnect_net and call_drbd_attach_net functions....
Fix modification of instance memory
... as found by the QA script - bug was introduced by me in commit 2117.
Reviewed-by: imsnah
Fix some errors in instance modify --disk remove
The RpcResult introduction still left some bugs (after multiple patches):
- we don't correctly check the result type
- rename a variable to prevent a conflict
Fix an error handling case in instance info
The checking for invalid instance names in LUQueryInstanceData has been broken since commit 1642.
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file.
Fix gnt-os for offline nodes
We shouldn't query offline nodes in gnt-os. This patch adds a utility function to ConfigWriter that returns the names of online nodes and uses it in LUDiagnoseOS to query only the good nodes.
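The filtering described here amounts to selecting the node names whose offline flag is not set. The sketch below uses a hypothetical Node tuple and function name; the actual ConfigWriter method is not shown in this log.

```python
from collections import namedtuple

# Hypothetical, reduced node record; the real node object has more fields.
Node = namedtuple("Node", ["name", "offline"])


def get_online_node_names(nodes):
    """Return the names of all nodes that are not marked offline."""
    return [node.name for node in nodes if not node.offline]


cluster_nodes = [
    Node("node1", False),
    Node("node2", True),   # offline, must not be queried
    Node("node3", False),
]
online = get_online_node_names(cluster_nodes)
```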
Cleanup replace-disks modes and options
In 1.2, due to the md+drbd7 legacy, we had a complex choice of replace modes, and the new drbd8 modes were forced into this syntax, with some complicated rules of transition from one mode to another (if REPLACE_ALL...
Fix cluster verify/node net test for offline nodes
For offline nodes, we shouldn't add them to the NV_NODELIST and NV_NODENETTEST tests since they most likely won't succeed.
The patch makes gnt-cluster verify happy again in such cases.
_AssembleInstanceDisks: fix rpcresult handling
Commit 2117 changed _AssembleInstanceDisks to correctly parse the failure status of the new RpcResult structure, but it didn't fix the storing of only the result payload. Since RpcResult is not JSON serializable, LUActivateInstanceDisks is failing....
ganeti.cmdlib: Check remote API certificate on "gnt-cluster verify"
Reviewed-by: amishchenko
LUConnectConsole: fix primary_node online check
The primary node is part of the instance, not of the opcode.
cleanup: fix IAllocator hypervisor usage
Two problems: the iallocator.hypervisor wasn't initialized to None in the constructor, so pylint doesn't realize it's initialized later with setattr.
Second, 'hypervisor' is a module, so we shouldn't use it as a variable....
cleanup: LUReplaceDisks unused vars
And a small whitespace fix.
cleanup: do not hide upper-scope name
hypervisor is a module, so we shouldn't use it as an argument.
cleanup: fix use of _CheckNodeOnline
A few cases of wrong variable name.
cleanup: LUAddNode, LUSetNodeParams unused variable
This is a leftover from the abstraction of AdjustCandidatePool, and it also requires the config lock, so it's better to remove it.
cleanup: LURenameCluster wrong variable name
Warn for instances living on offline nodes
The patch also changes the result to error for non-reachable secondary nodes (as for primary nodes).
Fix _AdjustCandidatePool
Currently the ConfigWriter.MaintainCandidatePool returns node names, and _AdjustCandidatePool uses them as such, but then it passes these to context.ReaddNode which in turn passes them to jqueue.JobQueue.AddNode which uses them as objects.Node instances....
gnt-node modify: add the offline attribute
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute.
Setting a node into offline mode automatically demotes it from the master role.
Make cluster verify understand offline nodes
This patch changes cluster verify to not alert on offline nodes, but instead just show a note at the end with the number of such nodes.
It also removes warnings in verify-disks and hooks about failures to make rpc calls to such nodes....
cmdlib: check node stats in prereqs
This patch adds checks for offline nodes in most instance LUs so that we can work with offline secondaries, but not with offline primaries. Some cases (like grow disk, which needs both sides up) do not allow offline nodes at all....
Add two utility functions to cmdlib
These will be used for parameter checking and node status checking.
Add function to compute the master candidates
Since some nodes can be offline, we can't just take the length of the node list as the maximum possible number of master candidates.
The patch adds a utility function to correctly compute this value and replaces hardcoded computations with the use of this function. It then...
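The computation described above can be sketched as follows. The function name, the returned pair and the dictionary-based node records are assumptions for illustration; only the rule "offline nodes do not count towards the maximum" comes from the description.

```python
def compute_candidate_stats(nodes):
    """Return (current candidates, maximum possible candidates).

    Hypothetical helper: each node is a dict with boolean 'offline'
    and 'master_candidate' keys. Offline nodes are excluded from both counts.
    """
    online = [node for node in nodes if not node["offline"]]
    current = sum(1 for node in online if node["master_candidate"])
    return current, len(online)


nodes = [
    {"name": "node1", "offline": False, "master_candidate": True},
    {"name": "node2", "offline": True, "master_candidate": False},
    {"name": "node3", "offline": False, "master_candidate": False},
]
current, maximum = compute_candidate_stats(nodes)
```

Here len(nodes) would give 3, while only two nodes can actually become candidates.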
Cleanup the config file on demotion from candidate
This patch adds a simple rpc which makes a backup of the config file and then removes it. This is done so that cluster verify doesn't complain immediately after demoting a node.
watcher: handle offline nodes better
This patch changes the LUQueryInstances to show a different state for offline nodes and also modifies the watcher to understand the offline state in its checks.
node list: add the offline field
Add a new node parameter 'offline'
This patch adds a new node parameter called offline that will be used to mark nodes which should not be touched by commands.
We also add this flag at cluster init, node add, and export it to iallocator scripts.
LURemoveNode, promote nodes to master candidates
If, after the node removal, there are not enough master candidates, we'll try to promote more nodes.
LUQueryExports: fix rpcresult handling
call_export_list is a multi-node call, so we need to go through the results, extract the good ones, and return a failure value for the bad ones.
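A minimal sketch of this per-node handling, assuming a reduced result type with a failed flag and a data payload. FakeRpcResult and the use of False as the failure value are assumptions for illustration, not the actual RpcResult class.

```python
from collections import namedtuple

# Hypothetical, reduced stand-in for a per-node rpc result.
FakeRpcResult = namedtuple("FakeRpcResult", ["failed", "data"])


def summarize_export_list(results):
    """Map each node to its export list, or to False if its call failed."""
    summary = {}
    for node, result in results.items():
        summary[node] = False if result.failed else result.data
    return summary


results = {
    "node1": FakeRpcResult(failed=False, data=["instance1.example.com"]),
    "node2": FakeRpcResult(failed=True, data=None),
}
summary = summarize_export_list(results)
```

One failed node therefore does not discard the answers from the others.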
LUAddNode: Auto-make master candidates
When a node is added, if there are not enough master candidates, we'll automatically promote it.
LUAddNode: Check the correct result
This is a typo in the conversion to RpcResult
A few fixes related to master candidates
This patch:
- fixes cluster verify when all nodes are master candidates, but the candidate_pool_size is higher
- warns when the master node is not marked as candidate
- disables setting master node to regular node...
Fix cluster rename and known_hosts
This patch rewrites and distributes ganeti's known_hosts file in case of a cluster rename.
We also fix a problem in the node add (from where I copied the known_hosts file distribution).
Fix gnt-cluster verify w.r.t. rpc changes
This partially reorganizes the cluster verify LU:
- introduce constants for the node verify rpc call
- move from additional rpc calls to a single rpc call, the call_node_info, which gathers all data needed...
Fix cluster rename
With the recent configwriter/ssconf changes, cluster rename becomes trivial. This patch gets rid of the code and just updates the cluster object.
Convert rpc results to a custom type
For a long time we had the problem that both RPC-layer errors and results from the remote node share the same "valuespace". This is because we shouldn't raise an exception when only one node failed (and lose the results from the other nodes)....
Use the new utils.CheckBEParams function
Where we used/forgot to validate beparams we now use the new common function.
Handle default/none values in hv/be params
When a value is set to constants.VALUE_DEFAULT we have to remove it from the specific instance dict, as this way it will be populated from the cluster before. If instead it's specified as constants.VALUE_NONE we'll...
ImportExport: make src_node and src_path optional
If src_node is not there we'll default to using the currently exported instance name as src_path. Also, if src_path is not absolute we'll look for it in EXPORT_DIR.
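The relative-path rule can be sketched as below. The EXPORT_DIR value and the function name are assumptions for illustration (the real constant lives in the Ganeti constants module and its value is not given here); posixpath is used so the sketch behaves the same on any platform.

```python
import posixpath

# Assumed value for illustration only.
EXPORT_DIR = "/srv/ganeti/export"


def resolve_src_path(src_path):
    """Return src_path unchanged if absolute, else look for it under EXPORT_DIR."""
    if posixpath.isabs(src_path):
        return src_path
    return posixpath.join(EXPORT_DIR, src_path)


relative = resolve_src_path("instance1.example.com")
absolute = resolve_src_path("/mnt/backup/instance1.example.com")
```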
LUCreateInstance: handle import without src_node
If we get called with no source node we'll treat src_path as an instance name exported in EXPORT_DIR in one of the nodes and look for it with the export_list rpc call.
LUCreateInstance: keep src node lock on import
Currently the node lock also guards against removing the import at the wrong time, so if we're importing an instance image we want to keep the source node locked. In the future we might want to put export locks at a...
Adjust cluster-verify to check for candidate role
Currently cluster verify checks all nodes for the same set of files, even if the nodes are not master candidates.
This patch adds back checking of ssconf files for consistency and splits the checksum check into different error reporting messages based on...
Prevent demotion from candidate based on pool size
In gnt-cluster modify we prevent demotion from the candidate role if there are not enough master candidates left.
Add cluster candidate pool size parameter
This patch adds a new cluster parameter "candidate_pool_size" which tracks the desired size of the list of nodes with the master_candidate flag set.
Add a gnt-node modify operation
This patch adds the OpCode, LogicalUnit and gnt-node command for modifying node parameters, more specifically the master candidate flag for a node.
Add master/master_candidate fields to node list
This patch adds listing of the master_candidate field (as Y/N) and of the master role (again Y/N) for nodes.
Fix errors when the node info RPC is incomplete
[Forward-port from the 1.2 branch]
If ganeti starts before xend, the node information will not have all the fields filled in. The patch changes this so that missing keys will be treated as unknown (this applies to other cases as well, not only xend not...
Fix gnt-backup export
This patch fixes a bug in disk calculation for gnt-backup export, which completely broke one-disk instance export.
The patch also corrects some error messages and style issues.
Fix a message in LUExportInstance
We never verified the node name before, so this is most likely not a non-retrieve but a wrong name case.
Fix instance creation
This patch fixes the diskless and drbd/file based instances. Sorry :(
Implement support for multi devices changes
This big patch adds support for:
- changing NIC/disks in the multi-device model
- adding/removing NICs
- adding/removing disks
The patch is big and not very nice; the error checking paths are not very clear....
Slight change to the LU initialization code
This patch adds support for a separate LU.CheckArguments() method which should do syntactic checks without holding locks and without polluting the ExpandNames which is a lock-related function. See for example the...
Fix a bug in LUSetInstanceParams
The wrong names were reused in a copy-paste.
Show disk access mode in gnt-instance info
The mode parameter needs to be exported and shown in the info output.
Change _GenerateDiskTemplate iv_name generation
Currently the _GenerateDiskTemplate assumes it does initial creation of disks (i.e. it starts with index 0).
For dynamic disk adds, we need to pass an additional offset. This patch adds this offset and modifies its sole current caller....
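The effect of the offset can be sketched with a small helper. The function name, the base_index parameter name and the "disk/N" naming pattern are assumptions for illustration, not taken from the patch.

```python
def generate_iv_names(disk_count, base_index=0):
    """Return per-disk names, numbering from base_index instead of always 0.

    Hypothetical helper; the 'disk/%d' pattern is an assumption.
    """
    return ["disk/%d" % (base_index + i) for i in range(disk_count)]


# Initial creation of two disks starts at index 0.
initial = generate_iv_names(2)
# Adding one disk to an instance that already has two disks.
added = generate_iv_names(1, base_index=2)
```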
LUCreateInstance: Fix import mac AUTO mode
Previously on import LUCreateInstance used to recycle the mac if the instance name was the same as the one used at export time. Now we do the same, but apply the setting separately for each nic.
LUCreateInstance unlock all nodes mid-way
When creating a new instance, after saving the instance data to the config file and creating the disks, but before waiting for sync and installing the OS, we release the node locks, to allow for more instance creations to proceed in...
IAllocator: subtract down instances from free mem
Currently free_memory just reports the amount of free ram, as seen by the hypervisor. We adjust this amount by subtracting the memory for any instance which is down, and the difference for any instance which is configured to have...