Set the default metavg to be equal to the cluster name
The default metavg was always set to be the constant value "xenvg". This is OK for most cases, but if the cluster is initialized with a default name through the --vg-name option, the metavg should change...
Move HooksMaster out of the mcpu module
We need to do this, so that backend.py doesn't need to import mcpu, and thus indirectly cmdlib. This reduces the size of the node daemon by about half, which is very important as it is pinned in memory.
This solves Issue 419....
Use KB as the unit for LVM PE size
LogicalVolume.Attach in bdev.py calls "lvs" with a unit of megabytes; the value is then converted to an integer, which results in 0 for small sizes.
This patch makes Ganeti use KB for the unit instead of MB....
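A minimal sketch of the truncation described above. The function name and the sample values are hypothetical, and the real "lvs" output format is not reproduced here; the point is only that converting a fractional megabyte figure to an integer loses small sizes, while the same size in kilobytes survives.

```python
def parse_size(raw):
    """Convert a numeric size string to an integer, truncating fractions."""
    return int(float(raw))


# Hypothetical 512 KB extent, reported once in megabytes and once in kilobytes.
size_in_mb = parse_size("0.50")    # the fractional part is lost
size_in_kb = parse_size("512.00")  # the size is preserved
```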
Properly update iv_name of disks while changing templates
Transforming the disk of an instance from DRBD to plain did not properly update the iv_name of disks, leaving it set to None.
This commit modifies _ConvertDrbdToPlain to properly set the iv_name variables....
Check minimum size of networks on creation
When creating a network, so far no size constraints were checked. We now limit the size of a network to a /30 or bigger, although technically, the ipaddr library supports even /32 networks.
Signed-off-by: Helga Velroyen <helgav@google.com>...
Limit the size of networks to /16
This patch introduces an upper limit to the size of the networks that can be created.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
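A minimal sketch of the two size limits described above (no smaller than a /30, no larger than a /16). It uses Python's standard "ipaddress" module rather than the ipaddr library mentioned in the commit, assumes IPv4 networks, and the constant and function names are hypothetical.

```python
import ipaddress

SMALLEST_NETWORK_PREFIX = 30  # a /30 is the smallest network accepted
LARGEST_NETWORK_PREFIX = 16   # a /16 is the largest network accepted


def check_network_size(cidr):
    """Return the parsed network, or raise ValueError if its size is out of range."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen > SMALLEST_NETWORK_PREFIX:
        raise ValueError("Network %s is smaller than a /%d"
                         % (net, SMALLEST_NETWORK_PREFIX))
    if net.prefixlen < LARGEST_NETWORK_PREFIX:
        raise ValueError("Network %s is larger than a /%d"
                         % (net, LARGEST_NETWORK_PREFIX))
    return net
```

Note that a larger prefix length means a smaller network, which is why the two comparisons point in opposite directions.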
Fix job queue directory permission problems
If split users are used, the queue directory could only be accessed by masterd, but also confd needs to be able to read it, e.g. when it is queried as part of "gnt-job list".
This commit fixes the permissions in such a way to allow proper access rights....
The disk size of a diskless instance is 0, not None
For diskless instances it is still reasonable to sum up the disk usage of all the (zero) disks, resulting in the empty sum. This uniformity also has the advantage that iallocators (like hail) do not have to do...
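A minimal sketch of the "empty sum" argument above. The function name and the dictionary layout of a disk are hypothetical; the point is that summing over zero disks naturally yields 0, so no special None case is needed.

```python
def total_disk_size(disks):
    """Sum the sizes of all disks; an empty list of disks sums to 0."""
    return sum(disk["size"] for disk in disks)


diskless_total = total_disk_size([])
two_disk_total = total_disk_size([{"size": 1024}, {"size": 512}])
```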
Postpone non-urgent TODO from 2.7 to 2.9
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Restrict instance move to templates based on local files
Moving an instance is done by copying over the disks. Restrict this to disk templates that are copyable. This avoids accidental use on, e.g., the sharedfile template.
Signed-off-by: Klaus Aehlig <aehlig@google.com>...
Introduce a constant for the copyable disk templates
This list contains the disk templates suitable for moving an instance by copying the files. A requirement is that they're not accessed externally or shared between nodes; in particular, sharedfile is not suitable....
Do not _RemoveDisks after failed _CreateDisks
Now that _CreateDisks cleans up after itself in case of failure, do not clean up at call sites, as there we have to overapproximate, thus potentially causing data loss.
Make _CreateDisk clean up partially created disks on failure
_CreateDisk used to just throw an exception if _CreateBlockDev failed, leaving the caller in the state that some disks were created, without precise knowledge which. Usually, the clean up then overapproximated...
Fix typo in an error message
Update "FIXME" string in RAPI
We are not ready for this change yet. Let's push it to 2.8.
rapi client: add target_node to migrate instance
This allows migrating to any node, as it is already possible for failover, when instances are externally mirrored.
Signed-off-by: Daniel Krambrock <danielk_lists@z9d.de>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Make diskless instances externally mirrored
This addresses Issue 237.
Mirroring no disk is a no-op. As such we can treat them like mirrored instances, since the data they need (none) will be present on all nodes.
This is definitely enough to failover or migrate instances with a manual...
Fix migrate/failover -n for ext mirror storage
This fixes issue 396.
- Fix a wrong comment that mentions drbd8 when actually the code acts only on externally mirrored instances.
- Fix a wrong assert that requires failover/migrate to acquire the NAL on externally mirrored instances: this is the case only when a...
Unit tests for objects.InstancePolicy + a fix
Tests for:
  objects.InstancePolicy.CheckParameterSyntax()
  objects.InstancePolicy.CheckDiskTemplates()
  objects.InstancePolicy.CheckISpecSyntax()
Instance policies with an empty disk-template list are now reported....
Unit tests for objects.FillIPolicy() + small fix
IPOLICY_DEFAULTS is now a legal policy (the disk-templates entry was a set instead of a list, before).
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix upgrade of policy in objects.Cluster
Unknown elements were silently removed on startup. This means that a software upgrade could result in lost configuration information if cfgupgrade wasn't run promptly.
Added unit test for Cluster.UpgradeConfig() to cover this case....
Fix instance policy checks for default back-end parameters
Policy violations of back-end parameters that used the cluster default value were not reported in cluster-verify.
Fix restoring default instance specs in group policies
"default" was not accepted as a valid input value for instance specs in group policies, due to a bug introduced in 2cc673a3e (and released with 2.6.0). Added QA for this and another similar case.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>...
Fix policy check for disk templates
Instance disk template is checked against the policy, and diskless instances aren't checked for the number of disks.
Fix merge 8e09e801 that resulted in duplicated code
A fragment in LUInstanceCreate.CheckPrereq() removed in commit ba147ff8 was reintroduced in merge 8e09e801 due to a change in df28c49b.
GanetiRapiClient: fix the no_remember option
There was a typo which prevented the correct option from being passed to RAPI.
Signed-off-by: Daniel Krambrock <danielk_lists@z9d.de>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix build/sphinx_ext.py with tuple defaults for op params
When an OpCode's parameter has a tuple as default value, this code will break:
buf.write("defaults to ``%s``" % default)
The patch fixes this and other potential cases by always passing a tuple to '%'....
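A minimal sketch of the formatting problem described above. The two function names are hypothetical; only the format string is taken from the commit message. When the right-hand side of '%' is itself a tuple, Python unpacks it as the argument list, so a one-element wrapper tuple is needed to format the value as a whole.

```python
def describe_default_broken(default):
    """Fails with TypeError when 'default' is a tuple such as (1, 2)."""
    return "defaults to ``%s``" % default


def describe_default(default):
    """Always wrap the value in a one-element tuple before formatting."""
    return "defaults to ``%s``" % (default,)
```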
Fix handling of disabled (shared) file storage
The vcluster changes broke the disabling of file storage; we can work around this by (manually) skipping the virtualisation of file storage paths if they are not enabled.
Note that tests/QA are still broken with disabled file storage; this...
Fix LUTestAllocator with instance alloc
This is similar to commit 8775e62a; the addition of node_whitelist broke this LU as well.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
Allow rpc.MakeLegacyNodeInfo to parse non-LVM results
'MakeLegacyNodeInfo' is not the best place for this, but we'd have to duplicate it if we wanted a LVM-less version, so the easiest is to add an optional parameter that allows it to accept/skip LVM-less results....
Allow iallocator to work without LVM storage
Currently, the iallocator interface requires LVM storage, due to the way it computes the node storage information.
By changing the code to understand that GetVGName() can return None, and by setting the disk_total/disk_free node parameters to the value...
Fix networks in _PrepareNicModifications()
Passing --net 0:add,ip=5.5.5.5 failed due to a reference to an uninitialized variable (new_net_obj). Reorder the checks and add some comments for readability.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>...
Remove early returns in network LUs
Remove any early returns in LUNetworkDisconnect/LUNetworkConnect and replace them with if-else statements.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add networks to _AllIDs()
Network config objects have UUIDs and thus should be included in _AllUUIDObjects().
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix HooksDict() in case of no tags
In this method self.tags might be None and cannot be used in join(). Use GetTags instead.
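A minimal sketch of the failure mode described above, not the Ganeti implementation. The class, the method names and the NETWORK_TAGS key are hypothetical stand-ins; the idea is that an accessor which normalises None to an empty collection makes the join() call safe.

```python
class Network:
    """Hypothetical object whose 'tags' attribute may be None."""

    def __init__(self, name, tags=None):
        self.name = name
        self.tags = tags

    def get_tags(self):
        """Return the tags as a set, treating None as no tags."""
        return set(self.tags or ())

    def hooks_dict(self, prefix=""):
        """Build an environment dict; joining self.tags directly would
        raise TypeError when it is None."""
        return {
            prefix + "NETWORK_NAME": self.name,
            prefix + "NETWORK_TAGS": " ".join(sorted(self.get_tags())),
        }
```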
Fix locking in LUNetworkConnect()
Locks for group instances are acquired only if conflicts are checked. To this end we must call _CheckNodeGroupInstances() only then; otherwise this check will always fail (owned_instances will be []).
Fix networks in LUInstanceSetParams()
Params passed in _CreateNewNic() are not yet evaluated and include the value passed by user for the network. A lookup must be done first in order to find the corresponding network UUID which should be stored in the newly created NIC object....
Locking fixes regarding Issue 324
LUNetworkConnect/LUNetworkDisconnect, in case locking is used, might lock instances that exist in the requested node group. The acquired locks should be checked if they are correct at the beginning of CheckPrereq() via _CheckNodeGroupInstances()....
Fix small typo in a docstring
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>
Minor fixes regarding nic.network change
Make LookupNetwork() return None in case target is None. This fixes Issue 380. Rapi passes network=None and the lookup should not fail.
Make network client aware of new nic.network. gnt-network info shows the IPs of each instance inside the network. It parses nic.networks...
Fix issue 378
In case a NIC is not inside a network, netinfo is None. Thus netinfo["name"] fails.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Helga Velroyen <helgav@google.com>
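A minimal sketch of the guard that the failure above calls for. The function name is hypothetical and the actual fix is not shown in the commit message; the sketch only illustrates that subscripting None raises TypeError and that an explicit check avoids it.

```python
def nic_network_name(netinfo):
    """Return the network name, or None for a NIC outside any network."""
    if netinfo is None:
        return None
    return netinfo["name"]
```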
Remove useless code in backend for network hooks
In backend, NICs arrive with netinfo filled. If nic.network is not None, nic.netinfo is not None either. Thus all the info is derived from HooksDict() and nic.network must not be checked.
Show network name and not uuid in instance info
This was the case before, too. Now it is a bit trickier because nic.network is a uuid. Info must derive from nic.netinfo.
Implement network locking in Instance queries
This is needed in case more info than each nic's network uuid is to be returned. We need to lock networks to get valid data. For now only the name is returned as an extra field. All others can be added with trivial effort....
Changes in query to support nic.network as uuid
Queries now return the network uuid as well as its name. Here we only use info provided by the LUInstanceQueryData context.
Modify query LUs to support nic.network as uuid
Make _InstanceQuery gather all network info related to instance's NICs and in case of NETQ_INST in _NetworkQuery get all network uuids directly from nic.network.
Add GetInstanceNetworks() config method
This will be needed for Instance Queries. It walks through the instance's NICs and returns a list of network uuids that the NICs are attached to.
cmdlib changes to support nic.network as uuid
Refactor Instance related LUs to support nic.network as a uuid. This removes all the unnecessary invocations to LookupNetwork().
Make network config methods take uuid as argument
This will be needed in the following patches where nic.network will refer to network's uuid and not name.
Rename lib/objectutils to outils.py
Back when this was introduced, I mentioned that it heavily breaks tab completion (ob<TAB> doesn't work anymore), but at that moment I didn't have a suggestion what to name it. I think outils is good and short enough, and doesn't conflict with anything else, so here it goes....
Fix wrong type in a docstring of the RAPI subsystem
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Finish the remote→restricted commands rename
The documentation still points to /etc/ganeti/remote-commands, although the code is already using restricted-command. Update the documentation and a few docstrings accordingly.
Signed-off-by: Iustin Pop <iustin@google.com>...
Force conflicts check in LUNetworkDisconnect
Until now, if one disconnects a network with --no-conflicts-check and then removes it, there is a possibility to leave instances with NICs referencing non-existing networks. This causes network queries, instance removal and modification to fail....
If _UnlockedLookupNetwork() fails raise error
Make _UnlockedLookupNetwork() raise OpPrereqError (instead of returning None) in case it does not find the requested network. Remove useless and duplicate code such as:
if net_uuid is None: raise...
This is a cherry-pick of commit 1cce2c4....
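A minimal sketch of the behaviour change described above, not the Ganeti code. OpPrereqError here is a local stand-in class, and lookup_network and its dictionary argument are hypothetical. Raising inside the lookup means callers no longer need their own "is None" check.

```python
class OpPrereqError(Exception):
    """Local stand-in for the Ganeti exception of the same name."""


def lookup_network(networks_by_name, target):
    """Return the UUID for 'target', raising if the network is unknown."""
    net_uuid = networks_by_name.get(target)
    if net_uuid is None:
        raise OpPrereqError("Network '%s' not found" % target)
    return net_uuid
```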
Change default xen root path to /dev/xvda1
All recent-enough versions of Linux see the Xen paravirtual device as /dev/xvd*.
This doesn't break old installations, as the default is only used on new clusters.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fix rbd showmapped output parsing
'rbd showmapped' output formatting differs between older and newer versions of the ceph tools. Try to use json output formatting if available (currently available only in the ceph master branch). For bobtail, argonaut and older...
Improve reporting on errors.AddressPoolError exceptions
This patch improves the error messages given when a “errors.AddressPoolError” exception is caught. Includes some small style fixes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Remove network_type slot (Issue 363)
This slot was not used by Ganeti so the same info can be provided via tags. In order not to break configuration data we add a FromDict() method in Network config object that removes the deprecated network_type (if found) and then invoke...
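A minimal sketch of the compatibility idea described above: drop a deprecated key from serialized data before constructing the object. The class layout and the from_dict name are hypothetical; only the key name "network_type" comes from the commit message, and what the real method invokes afterwards is not shown in the source.

```python
class Network:
    """Hypothetical config object that no longer has a network_type slot."""

    def __init__(self, name, network):
        self.name = name
        self.network = network

    @classmethod
    def from_dict(cls, data):
        """Build an object from old or new serialized data."""
        data = dict(data)               # work on a copy, leave the input intact
        data.pop("network_type", None)  # silently drop the deprecated key
        return cls(**data)
```

Without the pop() call, old data carrying the extra key would fail with an unexpected-keyword TypeError in this sketch.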
Remove family and size from network objects
This info is not used by Ganeti and therefore is removed.
Make use of HooksDict() for networks
This can be used in hypervisor code as well. For consistency export *NETWORK_NAME and not *NETWORK throughout the code.
Moved uniformity check for exclusive_storage flag
Cluster-verify used to check that the value of exclusive_storage is uniform within node groups. Now, it's impossible to change the flag for a single node, so that check has been removed and an equivalent one has been added...
"exclusive_storage" cannot be changed on single nodes
There's never been support for a configuration where nodes in the same node group have different values of the exclusive_storage flag. This patch disables the possibility to change the flag for individual nodes....
Upgrades made on loading the configuration are always saved
Before, only some upgrades were written back to the configuration file. A little refactoring of _UpgradeConfig() has been done to write unit tests.
Show correct daemon name on Luxi connect errors
Since now confd also serves a Luxi endpoint, the current message in cli.FormatError is misleading when actually failing to connect to it. The patch adds a somewhat hackish way to show the right daemon name....
ConfigData: run UpgradeConfig on network objects
Although this does nothing for now, running it is safe, and consistent with how other objects behave.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
OS environment: add network information
1) Move the hooks environment dict generator inside the object. This also adds missing values such as network family and uuid.
2) Use the same generator both for the os environment and for the instance hooks.
3) Update manpage and hooks documentation....
Make gnt-os list work with no OSes
When absolutely no OSes are present on the cluster, the result of OpOsDiagnose is an empty list. This is currently handled in gnt-os as an error condition, probably due to how OpOsDiagnose used to return errors in the past....
baserlib: Fix two mistakes in docstring
The method names were wrong due to copy & paste.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix Haskell log file naming after virtual cluster changes
Commit 3329f4de changed the Haskell log file from constants to functions, but introduced a bug: it now uses the daemon name instead of the correct log file, which means "ganeti-confd.log" instead of...
Switch KVM to multi-error verify results
This uses the new _FormatVerifyResults helper function to return multiple errors.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Make LXC fail verification if cgroup is not mounted
Since LXC now relies on cgroup memory limits to enforce memory, let's make hypervisor verification (and thus cluster-verify) return errors when the cgroup filesystem is not mounted.
Add a helper function for hypervisor verification
This will allow easier multi-error results from hypervisors; right now, we only report the first error, which is not nice.
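A minimal sketch of what a multi-error verification helper could look like. The function names, the separator and the two example checks are assumptions made for illustration; the actual helper's signature and output format are not given in the commit message.

```python
def format_verify_results(msgs):
    """Join all collected problems into one string, or return None if clean."""
    if msgs:
        return "; ".join(msgs)
    return None


def verify(cgroup_mounted, tools_found):
    """Collect every problem instead of stopping at the first one."""
    msgs = []
    if not cgroup_mounted:
        msgs.append("cgroup filesystem is not mounted")
    if not tools_found:
        msgs.append("required tools not found")
    return format_verify_results(msgs)
```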
hv_lxc: fix whitespace errors
The latest lxc patches included a few whitespace style errors that make lint fail. This patch fixes those.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
LXC: add support for the memory controller
Add support for the memory resource controller, useful to actually place memory limits on instances.
Support is still optional, in case the kernel doesn't have it compiled in, or in the case of Debian has it dependent on a kernel command-line...
LXC: adapt hv for newer lxc userspace tools
Currently hv_lxc depends on the behavior and output of older LXC tools, which have since changed, making it unable to function in current distributions (e.g. Debian wheezy).
Adapt the tools and expectation for the output and make it into a...
Disable live-RPC queries under split query
Currently, the node listing RPC is very slow due to missing parallelisation. For the 2.7 release, we reset these back to masterd, hoping to revert them by the time 2.8 is ready.
There are a number of queries that I've left pointing to confd, as...
hv_kvm: Original error message, keyword parameter
- Include original error message when creating TAP interface failed
- Pass keyword parameter as such
kvm: fix bug while fetching -device list
_GetKVMOutput expects the command to succeed, but unfortunately on some versions of kvm "-device ?" will output a correct list of devices, while exiting with an error code.
To fix this we accept failure in that case (note that this doesn't...
hv_xen: Remove config after shutdown was successful
If stopping an instance failed, the configuration would already be gone and other operations depending on it (e.g. migration) would no longer work. With this patch the configuration file is only removed once the...
hv_*: Always return from Verify, style fixes
Change all “Verify” methods in hypervisor abstractions to explicitly return None if no problem was detected. Remove punctuation from error messages. Update docstrings with “@return” and fix some small mistakes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
_VerifyErrors()._Error() and _ErrorIf() are now consistent
_Error() didn't contain the logic for demoting errors to warnings and for marking an operation as failed. Now _ErrorIf() is just a minimal wrapper for _Error().
Unit tests included.
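A minimal sketch of the wrapper relationship described above, not the Ganeti class. The class and method names, the "ignored codes" mechanism used for demotion and the message format are all assumptions; the point is that the conditional variant adds nothing beyond the condition test, so both paths share the same demotion and failure-marking logic.

```python
class VerifyErrors:
    """Hypothetical error collector with demotion to warnings."""

    def __init__(self, ignored_codes=()):
        self.ignored_codes = set(ignored_codes)
        self.bad = False      # set once a real (non-demoted) error is seen
        self.messages = []

    def error(self, code, msg):
        """Record a problem, demoting it to a warning if its code is ignored."""
        if code in self.ignored_codes:
            level = "WARNING"
        else:
            level = "ERROR"
            self.bad = True
        self.messages.append("%s:%s: %s" % (level, code, msg))

    def error_if(self, cond, code, msg):
        """Minimal wrapper: all the logic lives in error()."""
        if cond:
            self.error(code, msg)
```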
Handle the result of QueryGroups() correctly
If no group is given for the “gnt-network connect”/“… disconnect” commands, the client uses the result of “QueryGroups()” which is a list of lists. Use “itertools.chain()” to handle the return value correctly....
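A minimal sketch of flattening a list of lists with itertools.chain(), as mentioned above. The function name and the sample rows are hypothetical; the exact shape of the real query result beyond "a list of lists" is not shown in the commit message.

```python
import itertools


def flatten_rows(rows):
    """Flatten one level of nesting: [["a"], ["b"]] becomes ["a", "b"]."""
    return list(itertools.chain(*rows))


group_names = flatten_rows([["default"], ["group2"]])
```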
hv_xen: Compose file name outside error handling
In _ReadConfigFile, the filename should be prepared outside the try/except block. Fixes bad code formatting, too.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
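A minimal sketch of the pattern described above: the file name is composed before the try block, so only the I/O is guarded and the name is available for the error message. The function name, its parameters and the exception raised are hypothetical stand-ins.

```python
import os


def read_config_file(cfg_dir, instance_name):
    """Read an instance configuration file, reporting the path on failure."""
    # Composed outside the try block: building the path is not an I/O error.
    filename = os.path.join(cfg_dir, instance_name)
    try:
        with open(filename) as fh:
            return fh.read()
    except EnvironmentError as err:
        raise RuntimeError("Failed to load configuration file %s: %s"
                           % (filename, err))
```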
hv_base: Remove empty constructor
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
Add test for backend._GetBlockDevSymlinkPath
Add a unit test for the trivial “_GetBlockDevSymlinkPath” function in backend (small changes in the function were required).
Fix format string of KVM output
This fixes a missing 's' in the format string and the wrong quotes. Those bugs were introduced in commit 6e043e60.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
First part of confd timer changes
This patch changes the resolution of the timers: the watcher timer goes from 60s to 17s, and the polling-mode timer goes from 2 seconds to 250ms. The code changes a bit more due to the changes in the units of the various constants....
Fix type of 'node_whitelist' request parameter
If opportunistic_locking is used, then 'node_whitelist' parameter passed to the allocator is set to the LU's owned node locks. However, LU owned_locks has type of 'set' while IReqInstanceAlloc expects type of...
hv_xen: Add test for CPU pinning configuration
Add a unittest for a function formatting CPU pinning information for Xen's configuration.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
kvm: deduplicate 'get output' code
We had the same code twice, and were about to add a third time. Better to collapse it into just one function.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
kvm: extract a regexp matching out of a for loop
kvm: remove last version-based feature detection
This was left behind because it required a different kvm invocation. Now that we can add new ones cheaply (two constants) it's easy to get rid of it. Differently than in other cases we support old version which...
Make Xen config path a build-time option
Stop hardcoding the path in “hv_xen.py”.
burnin: Don't keep hypervisor class around
Just determine whether it can migrate and keep that value instead of the full hypervisor class.
Run pre-migrate hooks on primary node too
Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Check if KVM machine version is supported
If the machine version is passed as an hv param, a check is made on the target node whether this version is included in the supported ones derived from the "kvm -M ?" command.
Verify that templates are compatible with exclusive storage
cluster-verify reports instances with disk templates not compatible with exclusive storage but that are running on nodes with the exclusive storage flag set.
Moved checks within LUClusterVerifyGroup
Almost all instance-specific checks have been moved from the Exec method to the _VerifyInstance method. This cleans up Exec, which was becoming too big even for pylint…
bdev.GetPVInfo() returns list of LVs
This will be used for checks related to exclusive storage.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
cluster-verify checks that PVs are not shared
When exclusive_storage is set, cluster-verify complains if PVs are shared among unrelated LVs.