gnt-network: Fix import for textwrap
The style guide says to use “import foo”-style imports, not “from foo import bar” unless it's a Ganeti module. There are some places with exceptions, but this one certainly isn't warranted. Also fix the import order....
gnt-network add: "--network" is required
Also do some minor code re-formatting.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cmdlib: Replace period with colon in error message
query: Factorize code for getting statistics value
This was not only copied for the networking fields in commit 306bed0e, but commit cfcea7ef fixed wrongly ordered parameters and didn't fix the original. Either way, this patch merges the two cases again. The newly...
gnt-network add: Network mode bash completion
This makes entering the command easier.
OpNetworkConnect: Check for network mode
Improve descriptions of network query fields
They should be in the same style as other descriptions.
Replace frozenset with compat.UniqueFrozenset
This is not a trivial s/frozenset/compat.UniqueFrozenset/, but rather only replaces “frozenset” where appropriate. Most of the places are “static” information that doesn't change after the module has been loaded....
Enable job queries via confd in gnt-node and RAPI
This patch enables split queries for jobs for gnt-node and RAPI access (only for job listing, not job waiting).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Remove some unused Python code
This patch removes code which is no longer used due to refactoring:
- http.InitSsl, last usage removed in commit 33231500 (“Convert RPC client to PycURL”)
- rapi.baserlib.MakeParamsDict, last usage removed in commit 4e5a68f8...
Add exclusive_storage node parameter
Unit tests updated and expanded with an inheritance check.
The flag has no effect yet.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Allow shutting down offline instance
If an instance is offline we definitely shouldn't start it up. But shutting it down, should it be up by mistake, is not "that" bad. Still, we only allow it with --force, as it still performs an action on an instance we shouldn't touch. This should make everybody happy....
Allow running instances to be put online
If an instance is running (e.g. ERROR_up) and at the same time offline, there's no way to either shut it down or reonline it. This allows onlining it. Offlining is still disabled for running instances.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fixes and improvements to comments
Some fixes, added more information in a few points, removed a stale (5+ year old) TODO comment.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix error during cluster initialization due to hv_kvm
Commit 141d148 was a bit too enthusiastic. The three parameters added to the list of parameters to be checked default to a value not evaluating to false, leading to a failure on cluster initialization....
Make LUNetworkAdd pass _VerifyLocks()
LEVEL_NODE_ALLOC should be acquired too if LEVEL_NODE is ALL_SET.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add test for SPICE parameter list, add missing ones
“_SPICE_ADDITIONAL_PARAMS” is supposed to be the full list of SPICE-related KVM hypervisor parameters with the exception of “HV_KVM_SPICE_BIND”. The new test checks if all parameters starting with “HV_KVM_SPICE_*” are included. Three previously missing parameters are...
hv_kvm: List of SPICE parameters should be module-global
This list is static at runtime and doesn't need to be recreated every time.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
_DeclareLocksForMigration: Fix non-DRBD locking issue
When non-DRBD disks are used for an instance, “lu.needed_locks[locking.LEVEL_NODE]” is set to “locking.ALL_SET” (which is None). The assertion will then fail as None evaluates to False.
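The failure mode described here can be reproduced in isolation. The sketch below relies only on what the message states, namely that ALL_SET is None; the function names are hypothetical and this is not the code from _DeclareLocksForMigration.

```python
# Sentinel meaning "all locks at this level"; per the commit message it is None.
ALL_SET = None


def check_truthy(needed_locks):
  """Naive check: fails for ALL_SET because None is falsy."""
  assert needed_locks, "no node locks declared"


def check_all_set_aware(needed_locks):
  """Accepts ALL_SET explicitly before testing for a non-empty list."""
  assert needed_locks is ALL_SET or needed_locks, "no node locks declared"


def naive_check_fails(value):
  """Return True if the naive assertion rejects the given value."""
  try:
    check_truthy(value)
  except AssertionError:
    return True
  return False
```

Comparing against the sentinel with "is" keeps an empty list (no locks at all) distinguishable from "all locks".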
Reported by Constantinos Venetsanopoulos....
LUOobCommand: Always declare NAL in shared mode
Doing so avoids triggering an assertion in mcpu which cross-checks the node lock and node allocation lock acquisition mode.
Fix two logging messages in TLReplaceDisks
Commit f0f8d060 (“Show old primary/secondary node on disk replacement”) added two wrong uses of feedback_fn, which results in log entries such as these:
"log": [ [ 7, [1351258326, 466214], "message", "Replacing disk(s) 0 for instance 'instance1.example.com'"...
Add optional formatting for OP_DSC_FIELD
For some opcodes, the output is not "stable", and depends on the exact input values; this makes it harder to check consistency against Haskell code.
To compensate for this, we add a way to override the formatting of the...
Fix a small but quite nasty typo
Introduced in commit d4752785.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Very very very basic openvswitch support
This is a "better than nothing" support, just for kvm and just joining the machine to the openvswitch bridge with the right command.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
Pass check_ip and not hostname.ip to _ComputeNics
This should be done because in the case of --no-name-check there is no 'hostname.ip' attribute, causing an execution error. 'check_ip' is always set (in CheckArguments) even if --no-name-check is passed in the command line....
Read watcher pause using RPC, not directly
The master daemon should not directly read files written by the node daemon. This patch adds a new RPC to read the watcher pause file and changes the master code to use it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Set watcher pause on all nodes
Instead of just setting the watcher pause file, which isn't replicated, RPC is used to set it on all nodes (where possible). This means that after an outage of the master node and a following master-failover, the watcher will still be paused....
Add RPC for setting watcher pause
The watcher pause file should be set/unset on all nodes at once, not only the master node. For that a new RPC is needed.
jqueue: Improve inotify error reporting
This addresses issue 218. When the number of inotify watches is exhausted, for example by being set too low from the beginning or by other programs, waiting for a job to change would just report a lost job (e.g. “Error checking job status: Job with id 7817 lost”)....
Improve test for tools.ensure_dirs
- Add more checks, some of them are deliberately redundant
- Descriptive error messages
- Add comment describing order to “tools.ensure_dirs”
- Avoid copying a list in an assertion in “tools.ensure_dirs”
uidpool: Remove roman number support
Doing so simplifies the code a bit; the feature never had a practical use.
Remove checks wrt IDISK_PARAMS from OpCode level
Change the "--disks" option validation to just check the format of the dict and not check whether the keys are included in the IDISK_PARAMS constant at OpCode level. This allows the passing of arbitrary parameters at the CLI, which will then be logically...
Move the path of the DRBD status file to the Constants file
It will be needed by the DRBD data collector, that will be added shortly.
Signed-off-by: Michele Tartara <mtartara@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix ordering of entries in tools.ensure_dirs
Commit ebd437a added two new entries to tools.ensure_dirs, but did so in the wrong order. Patch forthcoming to improve the unittest's error message.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>
Set owner on watcher pause and queue drain files
If the files were created by a different user, e.g. due to a switch from running masterd as root to running it as a dedicated user, they couldn't be modified/removed anymore.
lib/tools/ensure_dirs.py: Code formatting
Wrap lines in a consistent manner (uid/gid on the second line) if wrapping is necessary at all. “git diff --color-words” shows no difference at all as only whitespace changed.
Replicate queue drain flag across all master candidates
Until now, the flag was unset on a master failover unless the “$localstatedir/lib/ganeti/queue/drain” file existed.
Add utility function to create frozenset with unique values
When used instead of a plain call to “frozenset”, this would have avoided the issue fixed in commit e2dd6ec. The new function is located in the “compat” module as it will be used at module load time in most...
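A minimal sketch of the idea, assuming only what the message says (a frozenset constructor that rejects duplicate input values). The name unique_frozenset and the exact error raised are illustrative assumptions, not the actual function in the “compat” module.

```python
def unique_frozenset(seq):
  """Build a frozenset, raising ValueError if seq contains duplicates."""
  if isinstance(seq, (set, frozenset)):
    # Sets cannot contain duplicates by construction.
    return frozenset(seq)

  items = list(seq)
  result = frozenset(items)

  # A plain frozenset() call would silently collapse repeated values.
  if len(result) != len(items):
    raise ValueError("Duplicate values found")

  return result
```

Because such a check runs once at module load time, a constant listed twice by mistake is reported immediately instead of going unnoticed.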
constants: Remove duplicate DRBD barrier option
Change value for ECODE_TEMP_NORES
Unfortunately there was a bug in commit 518a45e whereby ECODE_TEMP_NORES had the same value as ECODE_NORES, leading to failures in a Haskell test. Of course this would also have affected other users of the constant.
masterd: Remove duplicate code
Improve error message for when adding inotify watcher fails
Explicitly mention the fs.inotify.max_user_watches sysctl value.
Add error code for temporary lack of resources
When an instance creation uses opportunistic locks, the iallocator might not be able to find an allocation solution if not enough node locks (or a suboptimal subset thereof) were acquired. As per the design document...
Export error codes from RAPI client module
Until now the error codes were not available from the RAPI client module. A newly added unit test ensures all error codes are contained in “ECODE_ALL”, as well as ensuring consistency between the RAPI client and...
cmdlib: Use locked nodes as node whitelist
Also actually start using opportunistic locks (if requested).
cmdlib: Opportunistic locking on instance creation
Adds a new parameter to “OpInstanceCreate” and “OpInstanceMultiAlloc” to use opportunistic locks.
cmdlib: Node whitelist support for allocation request
Forward the node whitelist to the iallocator plugin.
mcpu: Verify node allocation lock mode
Add verification code to mcpu to check an LU's locks. Two whitelists are provided to exclude LUs from the two tests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
_ExportQuery: Use node allocation lock
Block instance allocations when all node locks will be acquired.
LUBackupExport: Use node allocation lock
LUBackupRemove: Use node allocation lock
LUInstanceMultiAlloc: Use node allocation lock
Avoid conflicts between instance allocations.
LUInstanceRecreateDisks: Use node allocation lock
LUNodeSetParams: Use node allocation lock
LUNodeQueryvols: Use node allocation lock
LUOobCommand: Use node allocation lock
If no node names are given, all node locks are acquired.
Add tool to clean up node
Sometimes a node is not removed properly from a cluster (especially during development). This new tool stops all daemons and removes (after making copies) the most critical files.
Add tool to configure node daemon
The design for this is in “doc/design-node-add.rst”. The tool receives a JSON data structure on stdin and configures the node's daemon after verifying the received values.
Switch from scp/ssh to node daemon setup utility
This patch does away with many calls to scp and, by means of using “tools/node-daemon-setup”, verifies most of the values before writing them to files.
Locking related fixes for networks
Use GetNetwork() only when having already acquired the lock, i.e. in CheckPrereq().
In LUNetworkConnect/Disconnect do not include Network info in Hooks environment, so that network locking can be avoided if conflicts are not checked....
jqueue: Don't modify input opcode when changing priority
Commit 4679547 implemented the ability to change job's priority after it was submitted. The code contained a bug whereby it would modify the input data for an opcode, something the job queue shouldn't do (logical...
Use new util function for mac_prefix validation
Use new NormalizeAndValidateThreeOctetMacPrefix() util function in LUNetworkAdd/LUNetworkSetParams to validate network's MAC prefix. Additionally, move the check in CheckArguments() in the case of LUNetworkAdd....
LUClusterRedistConf: Use node allocation lock
All node locks are acquired.
LUClusterRepairDiskSizes: Use node allocation lock
This opcode acquires all node resource locks, which conflicts with instance allocations.
LUGroupVerifyDisks: Use node allocation lock
See comment in code.
LUClusterVerifyGroup: Use node allocation lock
LUInstanceReplaceDisks: Acquire node allocation lock
If the lock was acquired in the first place (only when an iallocator is used), it is released as soon as possible.
LUInstanceChangeGroup: Acquire node allocation lock
Changing instances' groups shouldn't conflict with instance allocations.
Acquire node allocation lock during node query
If locking is used (usually by ganeti-watcher), node allocations must be temporarily blocked.
iallocator: Add node whitelist
In the future instance creations might have a lock on all nodes as was the case until the implementation of opportunistic locking. Nodes for which the lock is not held will be shown to the iallocator plugin as if they were marked offline....
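The whitelist behaviour described here can be sketched as a small transformation over node data. The data layout (a dict mapping node names to dicts with an "offline" flag) and the function name are assumptions made for illustration; the real iallocator request format is not shown in this log.

```python
def apply_node_whitelist(nodes, whitelist):
  """Return a copy of nodes where non-whitelisted nodes appear offline.

  nodes maps node names to dicts with an "offline" flag (assumed layout).
  A whitelist of None means that no filtering is applied.
  """
  result = {}
  for name, info in nodes.items():
    entry = dict(info)  # Copy so that the caller's data is not modified
    if whitelist is not None and name not in whitelist:
      entry["offline"] = True
    result[name] = entry
  return result
```

Treating None as "no whitelist" keeps the case where all node locks are held identical to the behaviour without a whitelist.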
Allow ignoring successful commands in "gnt-cluster command"
In some cases it is useful to ignore the output of and avoid mentioning successful commands. One would be when looking for a certain string in a file:
$ gnt-cluster command egrep -q '^testing$' /etc/......
errors: Show error descriptions in API documentation
Comments with a colon after the hash sign (“#:”) show up in the epydoc output.
Fix locking mistake introduced in commit 5cc1f88
The node resource locks were not set correctly on instance import.
Add safety check on job dependency/TIsLength
If TIsLength is applied to a non-container item, it will fail (type error) due to invalid application of len(). Since this can happen on user-supplied data, we add an explicit TList/TTuple check (the TTuple test is a new one)....
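A minimal sketch of such a guarded length check, assuming a checker is simply a function returning a boolean. The name is_length is hypothetical and the snippet does not reproduce the actual “ht” module.

```python
def is_length(size):
  """Return a checker that accepts only lists/tuples of the given length."""
  def check(value):
    # Test the container type first so that len() is never called on
    # something like an integer, which would raise TypeError.
    return isinstance(value, (list, tuple)) and len(value) == size
  return check
```

With the type test in front, user-supplied data of the wrong type is rejected as invalid instead of raising an exception inside the checker.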
LUClusterSetParams: Use node allocation lock
All resources are acquired and opportunistic instance creations would fail. Also add a TODO.
LUInstanceCreate: Acquire node allocation lock
Opportunistic locks are not yet used. This patch changes LUInstanceCreate to acquire the node allocation lock to avoid conflicts with other opcodes acquiring many node locks.
Acquire node allocation lock for failover/migration
See code for an explanatory comment. The lock is released as soon as possible.
Use GetMultiInstanceInfo in LUNetwork* opcodes
LUNetworkConnect/Disconnect looks up a nodegroup's instances for conflicting IPs. To do so, use GetNodeGroupInstances() and GetMultiInstanceInfo().
Additionally, check if the correct locks were acquired.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>...
utils.text: Function to verify MAC address prefix
The network management code needs to verify a MAC address prefix. Instead of (ab)using NormalizeAndValidateMac, clean code should be used. Unit tests for NormalizeAndValidateMac are updated and new ones for...
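As an illustration of what a dedicated prefix check could look like, here is a sketch that assumes a prefix consists of three colon-separated hexadecimal octets and is normalized to lower case. Both assumptions and the function name are mine; the behaviour of the real utility function is not described in this log.

```python
import re

# Three colon-separated pairs of hex digits, e.g. "aa:00:11" (assumed format)
_MAC_PREFIX_RE = re.compile(r"\A[0-9a-f]{2}(?::[0-9a-f]{2}){2}\Z", re.I)


def normalize_mac_prefix(prefix):
  """Validate a three-octet MAC prefix and return it in lower case."""
  if not _MAC_PREFIX_RE.match(prefix):
    raise ValueError("Invalid MAC address prefix '%s'" % prefix)
  return prefix.lower()
```

Anchoring the pattern with \A and \Z rejects a full six-octet MAC address, which a check written for complete addresses would handle differently.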
Factorize code for checking node daemon certificate
This code is going to be used by a new utility for setting up the node daemon. Unit tests are updated/added.
Additionally, the certificate and key stored in “server.pem” are verified, too.
Support opportunistic locks in mcpu/LUs
Similar to “share_locks”, a new dictionary containing booleans for each locking level is added to “cmdlib.LogicalUnit”. Logical units wanting to make use of opportunistic locks will be able to configure this dictionary accordingly....
Add opportunistic locking to GanetiLockManager
Just forwarding the parameter, nothing more.
locking: Implement opportunistic locking in LockSet
This patch adds a new parameter to “LockSet.acquire” named “opportunistic”. When enabled the lockset will try to acquire as many locks as possible, but it won't wait for them (with the exception of the lockset-internal lock in case the whole set is acquired). This is...
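The core idea of opportunistic acquisition can be sketched with plain “threading.Lock” objects. This is not the LockSet implementation (which supports shared locks and a lockset-internal lock, among other things); the function name and the dict-of-locks layout are assumptions for illustration.

```python
import threading


def acquire_opportunistic(locks):
  """Try to acquire every lock without blocking.

  locks maps names to threading.Lock objects. Returns the sorted list of
  names whose lock was acquired; locks held elsewhere are skipped.
  """
  acquired = []
  for name in sorted(locks):
    # A non-blocking acquire returns False immediately if the lock is held.
    if locks[name].acquire(False):
      acquired.append(name)
  return acquired
```

The caller has to cope with receiving only a subset of the requested locks and must release exactly the locks named in the returned list.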
Add ssconf function to read all files
Configuring a node daemon on a newly added node will need all ssconf values.
bootstrap.RunNodeSetupCmd: Add IPv6 support
Commit 224ff0f modified the node SSH setup to use the system's SSH client. Before that Paramiko was used. It's not entirely clear whether the latter ever supported IPv6 properly, but with this patch “bootstrap.RunNodeSetupCmd” is changed to use it if configured. The code...
Factorize running node setup command
Part of the code used for running “prepare-node-join” can be re-used for running a tool to configure the node daemon.
ssconf: Add dry-run support for writing files
A new utility for configuring the node daemon will support a dry-run mode. This patch adds the necessary functionality to “ssconf.SimpleStore” and provides comprehensive tests for “SimpleStore.WriteFiles”. To enable the latter, a testing-only parameter...
ssconf: Add function to verify keys
The new utility for configuring the node daemon will have to check whether it received valid ssconf names.
LUNetworkAdd: Log warning when needed
In case conflicts are checked, log warnings if nodes' IPs cannot be reserved.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Use constants.IP4_VERSION in LUNetworkAdd
Fix locking in networks
Ensure that locks are held only if needed.
Add conflicts_check in OpNetworkAdd. This is needed if we want to check whether nodes/master IPs are included in network.
Depending on conflicts_check value, we have to hold node/instance locks...
Rename OpTestAllocator.allocator to iallocator
This makes the OpCode more consistent with the other opcodes. The downside is incompatibility when upgrading from 2.6, but since this is a test opcode it shouldn't be problematic.
Signed-off-by: Iustin Pop <iustin@google.com>...
Add a helper for the "iallocator" opcode field
This field is used, with only the description changed, in about 10 opcodes, so unifying it makes things simpler for future potential changes to the field type.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add Group, OS and Backup opcodes
This also corrects a docstring in OpBackupExport on the Python side.
Fix empty list as default value in OpInstanceMultiAlloc
Commit 12e62af5 (“Adding the new opcode for multi-allocation”) introduced a "bad" default value; while porting this to Haskell, I realised this is wrong.
Fix breakage introduced in commit a8b3b09
The order of the calls to “ctx.use_privatekey” and “ctx.use_certificate” was wrong, leading to an exception being thrown.
Factorize SSL context setup for certificate check
This code will also be used by the node daemon setup utility.
Introduce ht.TMaybeValueNone and ht.TValueNone
TValueNone checks if a value is "none" and TMaybeValueNone is a wrapper of TOr(TValueNone, x). This is used by OpNetworkSetParam in order to reset a network value (e.g. mac_prefix, gateway, etc.)
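The relationship between the two checkers can be sketched as follows, assuming checkers are plain functions returning booleans and that "none" is the literal string value being tested. The lower-case names and the VALUE_NONE constant are stand-ins for illustration, not the actual “ht” module code.

```python
VALUE_NONE = "none"  # Assumed marker meaning "reset this value"


def t_or(*checks):
  """Combine checkers; the result accepts a value if any checker does."""
  def check(value):
    return any(fn(value) for fn in checks)
  return check


def t_value_none(value):
  """Accept only the "none" marker."""
  return value == VALUE_NONE


def t_maybe_value_none(inner):
  """Accept either the "none" marker or anything the inner checker accepts."""
  return t_or(t_value_none, inner)
```

A parameter such as a gateway can then accept either a regular value or the marker that resets it.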
opcodes: Replace manual loop with map
Also remove a superfluous empty line in test file.
Fix type descriptions in RAPI documentation
This patch adds descriptors to the “_CheckCIDR*” functions in opcodes and improves the descriptions generated by “ht.TInstanceOf”, thereby indirectly fixing bad type descriptions in the RAPI documentation.
Before this patch:...