Move _TimeoutExpired to utils
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit f8326fcaac87958241d78526e5868d23d78ac286)
EPO: Pass the no_remember parameter to preserve state
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Workaround changed LVM behaviour
The vgreduce command has changed behaviour from when we initially wrote the code (2.02.02 versus 2.02.66, 4 years delta):
- if there are LVs which will be impacted, it requires --force
- otherwise refuses to proceed, but it still returns exit code 0...
Accept both PUT and POST in noded
This is a partial cherry-pick from 7530364ddbe949bc34fc26f25ba3f5d921beb021 on master:
Currently, noded requires PUT, even though the semantics of the RPC calls do not match a PUT. We change the code to accept both PUT and POST,...
Merge branch 'stable-2.5' into devel-2.5
Fix type check for OpQuery.filter
Just using ht.TListOf as a type check doesn't work correctly. The function must be called with the expected item type. In this specific case TListOf was always called with the filter as a value, and the result of that call evaluated to true. Since filters can be quite...
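The pitfall described above can be illustrated with a minimal sketch. The TString and TListOf helpers below are simplified stand-ins written for this example, not Ganeti's actual ht module, and the item type used is only an assumption for illustration:

```python
def TString(value):
    """Return True if value is a string."""
    return isinstance(value, str)


def TListOf(item_check):
    """Build a checker accepting lists whose items all pass item_check."""
    def check(value):
        return isinstance(value, list) and all(item_check(v) for v in value)
    return check


# Wrong: TListOf itself is used as the check. Calling it with the value
# returns a function object, which is always truthy, so anything passes.
broken_check = TListOf

# Right: call TListOf with the expected item type first.
fixed_check = TListOf(TString)

print(bool(broken_check(12345)))   # truthy even though 12345 is not a list
print(fixed_check(["=", "name"]))  # a list of strings is accepted
print(fixed_check(12345))          # a non-list is rejected
```

Because the broken variant never inspects its argument, it silently accepts every value; only the constructed checker performs a real test.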
Fix explanation of gnt-node evacuate --primaries-only
Furthermore, correct the --help display on evacuate.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix cluster verification issues on multi-group clusters
This patch attempts to fix a number of issues with “gnt-cluster verify” in the presence of multiple node groups and DRBD8 instances split over nodes in more than one group.
- Look up instances in a group only by their primary node (otherwise...
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Migrate: don't check for free memory on cleanup
Cleanup just updates the config with the correct location of the instance, or informs of its down status, but never starts it. As such there's no point in checking for enough free memory. Actually this check...
Revert "cli: Disable abbreviation matching for options"
This reverts commit 232aab3f4f602a19f1226e85c3a3ecb245d60af4. We shouldn't change the parsing of command line options in the middle of the 2.5.x series.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Fix a bug in command line option parsing code
Fix bug affecting command line options of "keyval" type. Although escaping commands with \ is supported, it is not applied to the input recursively.
Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>...
cli: Disable abbreviation matching for options
Python's “optparse” module does option name prefix matching by default. Since this can lead to confusing behaviour, e.g. by specifying “--force” for a command which only has a “--force-multi” option, this patch...
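The prefix matching mentioned above can be reproduced with the standard library alone. The sketch below also shows one possible way to require exact option names; it overrides “_match_long_opt”, a private method of CPython's optparse, and whether the patch itself took this route is an assumption, since the message is truncated. The ExactOptionParser class and the build helper are hypothetical names used only here.

```python
import optparse


class ExactOptionParser(optparse.OptionParser):
    """OptionParser variant that only accepts full long option names."""

    def _match_long_opt(self, opt):
        # optparse normally resolves unambiguous prefixes at this point;
        # this override accepts exact names only.
        if opt in self._long_opt:
            return opt
        raise optparse.BadOptionError(opt)

    def error(self, msg):
        # Raise instead of exiting so the behaviour is easy to observe.
        raise ValueError(msg)


def build(parser_cls):
    """Create a parser of the given class with a single long option."""
    parser = parser_cls()
    parser.add_option("--force-multi", action="store_true", default=False)
    return parser


# Default behaviour: "--force" is silently expanded to "--force-multi".
opts, _ = build(optparse.OptionParser).parse_args(["--force"])
print(opts.force_multi)

# With exact matching the abbreviated form is rejected.
try:
    build(ExactOptionParser).parse_args(["--force"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```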
Merge branch 'devel-2.4' into stable-2.5
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
KVM: support version reported by 1.0
This of course was working for all the rcs, but broke with 1.0 itself.
In addition:
- split between running kvm --version and parsing its output
- unittest parsing for various known --help outputs
- updated NEWS file...
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
jqueue: Factorize checking job processor's result
This allows for more unittesting.
jqueue: Fix epylint errors introduced in 37d76f1e4
jqueue: Fix deadlock between job queue and dependency manager
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue...
locking: Add “__repr__” to SharedLock and PipeCondition
These help when debugging.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
daemon.GenericMain: Don't generate backtrace on conflicting daemons
Instead, print a nicer error message. This should fix issue 200.
utils.io.WritePidFile: Improve error reporting
If the PID file is already locked by another process, try to read the content and report it as part of the error message.
utils.ListVisibleFiles: Hide “lost+found” directories
If a “lost+found” directory is found at a filesystem's root path it is ignored. In all other cases directory entries named “lost+found” are treated normally. Unittests are included. Fixes issue 153.
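The rule above can be sketched as follows. This is an illustrative helper, not the code in utils: the function name, the “is_fs_root” parameter, and the assumption that entries starting with a dot count as hidden are all introduced for this example.

```python
import os


def list_visible_files(path, is_fs_root=os.path.ismount):
    """Hypothetical helper listing the non-hidden entries of a directory.

    Entries starting with a dot are skipped (an assumption), and
    "lost+found" is skipped only when path is a filesystem's root, which
    is approximated here by checking whether path is a mount point.
    """
    at_fs_root = is_fs_root(path)
    return sorted(
        name for name in os.listdir(path)
        if not name.startswith(".")
        and not (at_fs_root and name == "lost+found")
    )
```

Passing the root check as a parameter keeps the function easy to unit test without having to mount a filesystem.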
ConfigWriter: Fix epydoc error
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1ab56ef36d082b614d3d0ab13f3e14a85)
LUGroupAssignNodes: Fix node membership corruption
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes” the...
Fix pylint warning on unreachable code
Commit c50452c3186 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code.
LUNodeEvacuate: Disallow migrating all instances at once
There is a design issue in the iallocator interface which prevents us from doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Separate OpNodeEvacuate.mode from iallocator
Until now the iallocator constants for node evacuation (IALLOCATOR_NEVAC_*) were also used for the opcode. However, it turned out this was due to a misunderstanding and is incorrect. This patch adds new constants (with the same values) and changes the affected places....
LUNodeEvacuate: Locking fixes
When evacuating a node, only an assertion without informative text was used to check if the necessary node locks had been acquired. This was on top of evaluating the list of nodes without having a node group lock, so this was changed as well....
Fix error when removing node
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
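One way the “too many values to unpack” error in the node removal fix above can arise is shown below. The dictionary is a made-up stand-in for the mapping returned by ConfigWriter.GetAllInstancesInfo; the original failing code is not quoted in the message, so the loop is an assumption about its general shape.

```python
# Hypothetical stand-in for the result of GetAllInstancesInfo:
# instance name -> instance data.
instances = {
    "instance1.example.com": {"primary_node": "node1"},
    "instance2.example.com": {"primary_node": "node2"},
}

# Treating the dictionary like a list of pairs iterates over its keys,
# so Python tries to unpack each name string character by character.
try:
    for name, info in instances:
        pass
    error = None
except ValueError as err:
    error = str(err)
print(error)

# Iterating over items() yields the intended (name, data) pairs.
primaries = {name: info["primary_node"] for name, info in instances.items()}
print(primaries)
```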
Merge branch 'devel-2.4' into devel-2.5
LUInstanceCreate: Release unused node locks
After iallocator ran we can release any unused node locks. Since they must be in exclusive mode this should improve parallelization during instance creation.
Document OpNodeMigrate's result for RAPI
- Commit b7a1c8161 changed the LU to generate jobs
- Mention documented results in NEWS
Ensure unused ports return to the free port pool
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are returned to the pool of free ports when no longer needed:
Fix newer pylint's E0611 error in compat.py
These are triggered by our "stay-compatible" approach.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fail if node/group evacuation can't evacuate instances
If an instance can't be evacuated, only a message would be printed. With this change the operation always aborts. Newly added unittests check for this behaviour.
LUInstanceRename: Compare name with name
… instead of object with name.
LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
Instances are modified if their disk size doesn't match.
Update synopsis for “gnt-cluster repair-disk-sizes”
Mention that instances can be passed on the CLI when “--help” is used.
Move hooks PATH environment variable to constants
Move the contents of the PATH environment variable for hooks to constants, and use its value in the code and in the hooks documentation.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Check the results of master IP RPCs
A failed gnt-cluster (de)activate-master-ip would not produce any output to the user. This patch adds code that checks the results of the RPCs and raises an exception if appropriate.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>...
Add master IP turnup and turndown hooks
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Add RunLocalHooks decorator
Add the RunLocalHooks decorator, that allows the execution of hooks locally. Also, add a RunLocalHooks method to HooksRunner, to adapt the signature of HooksRunner.RunHooks to the one expected by HooksMaster, and also to check that the hooks are being executed locally....
Generalize HooksMaster
- remove any dependence on Logical Units from the HooksMaster;
- add a new function parameter to the constructor, a function that is expected to convert the results of the hooks execution into a format understood by the HooksMaster;...
jqueue: Allow zero jobs to be submitted at once
If cmdlib.LUNodeMigrate was called for a node without primary instances it would try to submit an empty list of jobs. This was never visible via CLI as there we check the list of primary instances first.
Move RenameFile to the new functions
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
ensure_dirs: Move some useful functions into utils.
With this change we can easily reuse this functionality where it makes sense in other parts of Ganeti.
Use JoinDisjointDicts in mcpu
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add the JoinDisjointDicts function to utils.algo
Add a function that joins two dictionaries, enforcing the constraint that the two key sets should be disjoint. Also, add unit tests for this function.
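A minimal sketch of such a join is given below. It is not the code added to utils.algo: the function name, the choice of ValueError, and the error text are assumptions made for this example, and the real function may enforce the constraint differently (for instance with an assertion).

```python
def join_disjoint_dicts(dict_a, dict_b):
    """Return a new dict holding the entries of both inputs.

    Raises ValueError if the two dictionaries share any key.
    """
    common = set(dict_a) & set(dict_b)
    if common:
        raise ValueError(
            "Duplicate keys: %s" % ", ".join(sorted(map(str, common))))
    result = dict(dict_a)
    result.update(dict_b)
    return result


print(join_disjoint_dicts({"a": 1}, {"b": 2}))
```

Returning a new dictionary leaves both inputs unmodified, which keeps the helper safe to use on shared data.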
Fix queue archive creation with wrong permissions
On a master failover some of the archive dirs might have wrong permissions in the non-root model. This is due to the nature of noded still running as root and the job queue is synced that way. This patch will fix this behaviour by setting the permissions accordingly....
Allow per-hypervisor optional files
Rather than just allowing files for all nodes to be optional, we allow optional files to be per-category. The way this works is that they must be included in both lists (the new code checks for this).
The code also removes a duplicate assert (present both in verify and...
Add hypervisors ancillary files list
These lists will be used to declare some of the files not necessarily needed on all nodes. The files selected are files without which the various hypervisors can still work, but that when they are present should be synchronized across the cluster (or node group)....
xen: abstract a few hardcoded strings as constants
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Ensure permission on the job queue version file
OpGroupVerifyDisks: Fix wrong result type declaration
If an instance had actually a missing disk, the type check would fail.
RAPI: Make node evacuation actually work
Commit e1f23243 changed the LU and opcode for node evacuation to receive a “mode” parameter (among other things). Commit de40437a changed the RAPI code accordingly, but did so for an earlier version of the first patch. Obviously this couldn't work, so here's the fix....
Revert "rapi.client.ModifyNode should PUT rather than POST"
This was a mistake on my side because ModifyGroup and ModifyInstance were PUT, and I was not aware of the discussion and the rationale why this one had to be POST.
This reverts commit 55ef0cf6497c570aaab9413851435a7ee744222e....
Revert "Added SPICE TLS option and related cert paths"
This reverts commit bfe86c763a9ff1b481d799537ff0f0cf6740dfd1. This commit will be re-added on master.
Revert "Implementation of TLS-protected SPICE connections"
This reverts commit b6267745ede04b3c943bc02e004bdb9347e0f564. This commit will be re-added on master.
Revert "Add tls_ciphers and use_vdagent options"
This reverts commit 3e40b5879fa0070d6dd0e689dcfc31f20198a5a8. This commit will be re-added on master.
rapi.client.ModifyNode should PUT rather than POST
This was caught (albeit in a sibylline manner) by unittests on master which are not present in 2.5.
Fix RAPI node modify client and server calls
rapi.client.ModifyNode accepts a "group" and not a "node" param. (This bug is invisible but still not nice.)
rlib2.R_2_nodes_name_modify submits the opcode with instance_name rather than node_name as a param. This would break the call....
xen: changes to facilitate "xl" support (xen 4.1)
- Copy the xl config file, in case there's any
- Start instances by config file, not name (also xm compatible)
- Start paused domains with p and not --paused (also xm compatible)
Add a fixme for migration (changes are not xm compatible)...
xen: abstract instance config file naming
Abstract xen's 'xm' command as a constant
RAPI: Fix resource for replacing disks
Commit d1c172deb4f inadvertently changed the “/2/instances/[instance_name]/replace-disks” resource to use body parameters. There were no QA tests and the issue wasn't noticed.
This patch re-introduces support for query parameters and adds a QA...
rapi: Allow auto-promotion on node role change
rapi: Add resource for modifying node
A separate patch will add “auto-promote” through “/2/nodes/[node_name]/role”.
opcodes: Add comment to *SetParams result description
Explicitly say that the second element of the tuple is the new value.
rpc: Disable HTTP client pool and reduce memory consumption
We noticed that “ganeti-masterd” can use large amounts of memory, especially on large clusters. Measurements showed a single PycURL client using about 500 kB of heap memory (the actual usage depends on versions,...
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain all files (e.g. config.data). With commit aef59ae764dc (March 31st, 2011) the logic was changed and subsequently verifying a cluster with non-mc nodes would complain....
Revert "utils.log: Write error messages to stderr"
This reverts commit 34aa8b7c4bb6f5e2e788108e024c9cd70bdb3431. Writing error messages to stderr would also include backtraces, something we tried to avoid in the past.
Fix adding nodes after commit 64c7b3831dc
Commit 64c7b3831dc changed the RPC call for verifying SSH connections. Unfortunately this case in adding nodes was missed.
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes in the same group, as well as the first node for every other group. On big clusters this can cause issues since many nodes will try to connect to...
Optimise cli.JobExecutor with many pending jobs
In the case we submit many pending jobs (> 100) to the masterd, the JobExecutor 'spams' the master daemon with requests for the status of all the jobs, even though in the end it will only choose a single job for polling....
Add gnt-cluster commands to toggle the master IP
Split starting and stopping master IP and daemons
ssh: Quote strings in error message
utils.log: Write error messages to stderr
When “gnt-cluster copyfile” failed it would only print “Copy of file … to node … failed”. A detailed message is written using logging.error. Writing error messages to stderr can be helpful in figuring out what went wrong (the messages also go to the log file, but not everyone might...
Migration: warn the user about hv version mismatch
Fix handling of cluster verify hooks
The change to enforce boolean results for cluster verify group opcode missed the HooksCallBack, which uses a very ugly 1/0 logic. Furthermore, the logic is wrong, since it unconditionally resets the verify result to true....
Redistribute the RAPI certificate
This reverts to the old behaviour in Ganeti 2.4 and before.
RAPI: Fix wrong check on instance shutdown
Commit 7fa310f6d84 (April 1st, 2011) converted the RAPI resource for shutting down an instance to FillOpCode. Unfortunately it missed the fact that the shutdown resource gets its parameters as query arguments.
baserlib: Accept empty body in FillOpcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit c6e1a3eef05674d637570c39f25a799cec7ba187)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Add tls_ciphers and use_vdagent options
Implementation of TLS-protected SPICE connections
Added support for TLS-protected SPICE connections:
Added SPICE TLS option and related cert paths