mcpu: Adjust lock acquire strategy
The changes to job queue processing require some changes on this class' interface. LockAttemptTimeoutStrategy might move to another place, but that'll be done in a later patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
mcpu.Processor: Raise exception on lock acquire timeout
Right now the timeout is not passed by any caller, making the code effectively go back to blocking acquires. Since the timeout is always None, no caller needs to be changed in this patch.
This change also means that any LUXI query handled by ganeti-masterd...
Remove mcpu's ReportLocks callback
This is no longer needed with the new lock monitor. One callback is kept to check for cancelled jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add test for some aspects of job queue
This new opcode and gnt-debug sub-command test some aspects of the job queue, including the status of a job. The bug fixed in commit 2034c70d507 was identified using this test. A future patch will run this test automatically from the QA scripts....
Provide feedback function for all LU methods
By exposing mcpu's _Feedback function (now renamed to “Log”) to LUs, methods like ExpandNames can also write to the job execution log.
Remove the obsolete EvacuateNode OpCode/LU
All code has been switched to the new-style LU… time for cleanup.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add opcode to prepare export
To prepare a remote export, the X509 key and certificate need to be generated. A handshake value is also returned for an easier check whether both clusters share the same cluster domain secret.
Add LUNodeEvacuationStrategy
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
mcpu: Log lock status with sorted names
Reading and comparing sorted lists is easier when debugging locking problems.
Add targeted pylint disables
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Processor: support a unique execution id
When the processor is executing a job, it can export the execution id to its callers. This is not supported for Queries, as they're not executed in a job.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add config.DropECReservations
For now this function does nothing, but it gets called by mcpu when the execution of an LU is done, making sure any pending reservations are dropped.
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argument style. Any leftovers as one-argument are not breaking anything, just losing information about the errors.
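A minimal sketch of why one-argument leftovers keep working but lose information. The exception class, the error code constant and the helper below are stand-ins invented for illustration, not the definitions from the Ganeti errors module:

```python
class OpPrereqErrorSketch(Exception):
    """Stand-in for a prerequisite error; not the real Ganeti class."""


# Hypothetical error code, made up for this example.
ECODE_INVAL_SKETCH = "wrong_input"


def split_error(err):
    """Return (message, code); code is None for one-argument errors."""
    message = err.args[0] if err.args else ""
    code = err.args[1] if len(err.args) > 1 else None
    return message, code


# Two-argument style: the caller can tell what kind of error this was.
new_style = OpPrereqErrorSketch("Invalid node name", ECODE_INVAL_SKETCH)
# One-argument style: still raises fine, but the error class is lost.
old_style = OpPrereqErrorSketch("Invalid node name")
```

Both objects can be raised and caught the same way; only the second element of args differs.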
Signed-off-by: Iustin Pop <iustin@google.com>...
mcpu: Use new timeout class for timeout
locking, mcpu: Ensure timeout is always >= 0.0
mcpu: Make sure added locks are released on errors
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
mcpu: Change lock attempt timeout calculation
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance.
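A rough sketch of the "returns a new instance" idea, assuming the pre-calculated timeouts are held in a tuple and each attempt is an immutable view onto it. The class and method names are hypothetical and do not reproduce the actual _LockTimeoutStrategy interface:

```python
class TimeoutStrategySketch:
    """Immutable cursor over a pre-calculated list of timeouts."""

    def __init__(self, timeouts, index=0):
        self._timeouts = tuple(timeouts)
        self._index = index

    def timeout(self):
        """Timeout for this attempt, or None once the list is exhausted."""
        if self._index < len(self._timeouts):
            return self._timeouts[self._index]
        return None  # None stands for "do a blocking acquire"

    def next_attempt(self):
        """Return a new object for the next attempt; self is unchanged."""
        return TimeoutStrategySketch(self._timeouts, self._index + 1)


first = TimeoutStrategySketch([0.1, 0.5])
second = first.next_attempt()
```

Because next_attempt() never mutates the object it is called on, a caller holding first can still read its timeout after second has been created.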
Code and docstring style fixes
Found using pylint and epydoc.
mcpu: Improve lock reporting with timeouts
mcpu: Implement lock timeouts
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made.
Lock status reporting will be improved in a separate patch....
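The numbers above can be sketched as follows. The commit message does not say how the timeout grows between the two bounds, so the geometric growth used here is an assumption, and the function name is made up for illustration:

```python
import random

MIN_TIMEOUT = 0.1    # seconds, lower bound from the description
MAX_TIMEOUT = 10.0   # seconds, upper bound from the description
MAX_ATTEMPTS = 10    # attempts with a timeout before blocking
VARIATION = 0.05     # +/- 5% random variation


def lock_attempt_timeouts(rng=random):
    """Yield one timeout per attempt, then None for the blocking acquire."""
    ratio = MAX_TIMEOUT / MIN_TIMEOUT
    for attempt in range(MAX_ATTEMPTS):
        # Assumed curve: geometric growth from MIN_TIMEOUT to MAX_TIMEOUT.
        base = MIN_TIMEOUT * ratio ** (attempt / (MAX_ATTEMPTS - 1))
        # The random variation keeps concurrent jobs from retrying in step.
        yield base * (1.0 + rng.uniform(-VARIATION, VARIATION))
    yield None  # no timeout left: fall back to a blocking acquire


timeouts = list(lock_attempt_timeouts())
```

The list holds ten float timeouts followed by a single None.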
mcpu: Remove unused exclusive_BGL attribute
Remove RpcResult.RemoteFailMsg completely
Keep lock status with every job
This can be useful for debugging locking problems.
Move OpCode processor callbacks into separate class
There are two major arguments for this:
- There will be more callbacks (e.g. for lock debugging) and extending the parameter list is a lot of work.
- In the jqueue module this allows us to keep per-job or per-opcode variables in...
mcpu: formatting/indenting fix
Small fix for a mistake done by bad editor settings.
Signed-off-by: Luca Bigliardi <shammash@google.com>
HooksMaster: fix RunPhase logging
In case of complete failure, results is empty; return immediately (tnx unittests).
Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
HooksMaster: logging hooks in RunPhase
Extend RunPhase so it will log hooks results in POST phase.
node-remove post on removed node
Run post phase of node-remove on the removed node as well.
HooksMaster: document raised exception
HooksAbort is raised, but not documented.
HooksMaster: list of nodes override
Allow the caller of HooksMaster.RunPhase() to specify an alternative list of nodes.
Add OPMoveInstance and LUMoveInstance
This patch adds a basic version of LUMoveInstance. It doesn't yet support iallocator-mode and it's implemented in old-style (non-TL) mode.
Remove extra argument from HooksMaster class
The mcpu.py:HooksMaster class needs to have a proc attribute/argument to init in order to call its LogWarning method. However, this is available from the 'lu' attribute, so we can remove this dependency.
Add opcode to repair storage volumes
Implement instance recreate-disks
This can be used for a 'plain' type instance when the underlying storage went away, to recreate the storage (and reinstall) instead of removing the instance and readding it.
Merge commit 'origin/next' into branch-2.1
Post cluster initialization LU
Add an 'empty' logical unit to run hooks after cluster initialization.
Merge branch 'master' into next
Implement gnt-cluster check-disk-sizes
This patch adds a new opcode and lu for checking disk sizes. Currently it does only top-level disk verification, and also doesn't check primary/secondary node size mismatches (these two are added as TODOs in the Exec() function of the LU)....
cmdlib: Add opcode to modify storage unit fields
Add new opcode to list physical volumes
cmdlib: Add new opcode to migrate node
It migrates all primary instances from the node to their secondaries.
Add new opcode to evacuate nodes
Merge branch 'next' into branch-2.1
Fix pylint warnings
Fix some typos
LU execution: implement dry-run framework
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class, and which if...
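A minimal sketch of the dry-run idea, assuming the early exit happens after the prerequisite checks and before the actual execution (the truncated message does not confirm the exact point). All class, attribute and function names here are hypothetical:

```python
class LogicalUnitSketch:
    """Stand-in LU that records which workflow steps were run."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run
        # Special value returned on dry runs; subclasses could override it.
        self.dry_run_result = None
        self.steps = []

    def check_prereq(self):
        self.steps.append("check_prereq")

    def exec_lu(self):
        self.steps.append("exec")
        return "changed"


def run_lu(lu):
    """Run the LU workflow, leaving early when dry_run is set."""
    lu.check_prereq()
    if lu.dry_run:
        return lu.dry_run_result  # nothing was modified
    return lu.exec_lu()


real = LogicalUnitSketch()
dry = LogicalUnitSketch(dry_run=True)
real_result = run_lu(real)
dry_result = run_lu(dry)
```

In this sketch a dry run still validates its input, which is the part that is useful to exercise without touching the cluster.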
Fix various pylint warnings
There were multiple issues:
- copy-paste resulted in wrong indentation
- wrong function name
- missing spaces around assignment
- overriding built-in names (type, dir) or already defined ones (errors, hypervisor)
Convert hooks_runner rpc to new style result
This also converts (and fixes) unittests and mock objects to deal with this change, and the custom hook verifier in cmdlib.LUClusterVerify.
Add a node powercycle command
This (somewhat big) patch adds support for remotely rebooting the nodes via whatever support the hypervisor has for such a concept.
For KVM/fake (and containers in the future) this just uses sysrq plus a ‘reboot’ call if the sysrq method failed. For Xen, it first tries the...
An attempt at fixing some encoding issues
This patch unifies the hardcoded re-encoding attempts into a single function in utils.py. This function is used to take either a unicode or str object and convert it to an ASCII-only str object which can be safely...
Forward port the live migration from 1.2 branch
This is a forward port via copy (rather than cherry-picking individual patches) of the latest code on the 1.2 branch related to the migration.
The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and...
Introduce a very simple LU to force config updates
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file.
Reviewed-by: imsnah
Make cluster verify understand offline nodes
This patch changes cluster verify to not alert on offline nodes, but instead just show a note at the end with the number of such nodes.
It also removes warnings in verify-disks and hooks about failures to make rpc calls to such nodes....
Convert rpc results to a custom type
For a long time we had the problem that both RPC-layer errors and results from the remote node share the same "valuespace". This is because we shouldn't raise an exception when only one node failed (and lose the results from the other nodes)....
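A sketch of the general technique, assuming a small per-node result object that keeps the failure message separate from the payload. The type, its fields and the helper functions are invented for illustration and are not the custom result type this patch introduces:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class NodeResultSketch:
    """Per-node outcome: the error lives outside the payload's value space."""
    node: str
    payload: Any = None
    fail_msg: Optional[str] = None

    @property
    def failed(self) -> bool:
        return self.fail_msg is not None


def call_all(nodes, fn):
    """Call fn on every node; one failing node does not hide the others."""
    results = {}
    for node in nodes:
        try:
            results[node] = NodeResultSketch(node, payload=fn(node))
        except Exception as err:
            results[node] = NodeResultSketch(node, fail_msg=str(err))
    return results


def fake_version_call(node):
    """Pretend RPC used only for this example."""
    if node == "node2":
        raise ConnectionError("node2 unreachable")
    return "2.1"


results = call_all(["node1", "node2", "node3"], fake_version_call)
```

A payload of False or None from a healthy node can no longer be mistaken for a transport error, since callers check failed instead of inspecting the value.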
Add a gnt-node modify operation
This patch adds the OpCode, LogicalUnit and gnt-node command for modifying node parameters, more specifically the master candidate flag for a node.
Documentation updates for mcpu.py
This is the only change needed to make mcpu epydoc-compliant.
Reviewed-by: ultrotter
Improve the mcpu.Processor logging routines
As discussed previously, many of the routines in cmdlib.py are using logging functions as a carry-over from 1.2 (when these also showed the message on stderr/to the user), instead of actually warning the user....
Convert mcpu.py to use the logging module
Convert rpc module to RpcRunner
This big patch changes the call model used in internode-rpc from standalone function calls in the rpc module to calls via an RpcRunner class that holds all the methods. This can be used in the future to enable smarter processing in the RPC layer itself (some quick examples are not...
Implement job 'waiting' status
Background: when we have multiple jobs in the queue (more than just a few), many of the jobs (up to the number of threads) will be in state 'running', although many of them could be actually blocked, waiting for some locks. This is not good, as one cannot easily see what is...
Don't pass sstore to LUs anymore
sstore is no longer used in LUs.
Reviewed-by: iustinp
Convert mcpu.py
Replacing ssconf with configuration.
Add new query to get cluster config values
This can be used to retrieve certain cluster config values from within clients.
OpDumpClusterConfig was not used anywhere, hence I'm just reusing it. The way ConfigWriter.DumpConfig returned the configuration was not thread-safe, anyway (no deepcopy)....
Implement adding/removal of locks by declaration
With this patch LUs can declare locks to be added when they start and/or removed after they finish. For now locks can only be added in the acquired state, and removed if owned, and added locks default to be...
Use is_owned to determine whether to unlock
Now that is_owned is public we don't need to play games at the end of an LU. If we're still owning anything we just release it.
Processor: remove ChainOpCode
This function was incompatible with the new locking system, and its usage has been removed from the code. For now LUs share code by calling common module-private functions in cmdlib.py; in the future they will use tasklets (when those will be implemented)....
Fix issue when acquiring empty lock sets
By design if an empty list of locks is acquired from a set, no locks are acquired, and thus release() cannot be called on the set. On the other hand if None is passed instead of the list, the whole set is acquired,...
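The difference between an empty list and None can be illustrated with a toy lock set. This is a sketch under the assumption that release() refuses to run when nothing is owned; the class is invented for the example and does no real locking:

```python
class LockSetSketch:
    """Toy lock set that only tracks which names are owned."""

    def __init__(self, names):
        self._all = set(names)
        self._owned = set()

    def acquire(self, names):
        """None means the whole set; an empty list means nothing at all."""
        wanted = set(self._all) if names is None else set(names)
        unknown = wanted - self._all
        if unknown:
            raise KeyError(sorted(unknown))
        self._owned |= wanted
        return wanted

    def is_owned(self):
        return bool(self._owned)

    def release(self):
        if not self._owned:
            raise RuntimeError("release() called without owning any lock")
        self._owned.clear()


def release_if_owned(lockset):
    """Only release when something is actually held."""
    if lockset.is_owned():
        lockset.release()
        return True
    return False


empty_case = LockSetSketch(["node1", "node2"])
empty_case.acquire([])       # acquires nothing
whole_case = LockSetSketch(["node1", "node2"])
whole_case.acquire(None)     # acquires every lock in the set
```

Guarding the release with an ownership check is what lets a caller treat both cases uniformly.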
Processor: lock all levels even if one is missing
If a locking level wasn't specified locking used to stop. This means that if one, for example, didn't specify anything at the LEVEL_INSTANCE level, no locks at the LEVEL_NODE level were acquired either. With this...
ChainOpCode is still BGL-only
Prevent mistakes with an assert.
Fix pylint-detected issues
This is mostly:
- whitespace fix (space at EOL in some files, not all, broken indentation, etc)
- variable names overriding others (one is a real bug in there)
- too-long-lines
- cleanup of most unused imports (not all)...
Make sharing locks possible
LUs can declare which locks they need by populating the self.needed_locks dictionary, but those locks are always acquired as exclusive. Make it possible to acquire shared locks as well, by declaring a particular level as shared in the self.share_locks...
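A sketch of how two such dictionaries could be combined into an acquisition plan. The level constants, their numeric values and the 0/1 encoding of "shared" are assumptions made for the example; only the dictionary names come from the message above:

```python
# Hypothetical level identifiers; lower numbers are locked first.
LEVEL_INSTANCE_SKETCH = 0
LEVEL_NODE_SKETCH = 1


def plan_acquisitions(needed_locks, share_locks):
    """Return (level, names, shared) tuples in locking order."""
    plan = []
    for level in sorted(needed_locks):
        # Levels absent from share_locks default to exclusive.
        shared = bool(share_locks.get(level, 0))
        plan.append((level, sorted(needed_locks[level]), shared))
    return plan


needed_locks = {
    LEVEL_NODE_SKETCH: ["node2", "node1"],
    LEVEL_INSTANCE_SKETCH: ["instance1"],
}
# Only the node level is declared as shared.
share_locks = {LEVEL_NODE_SKETCH: 1}

plan = plan_acquisitions(needed_locks, share_locks)
```

Here the instance lock stays exclusive while the node locks are taken in shared mode, so other holders of shared node locks are not blocked.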
Add LogicalUnit.DeclareLocks
This additional LogicalUnit function is optional to implement, but lets you change your locking needs for one level just before locking it, but after the previous levels have been already locked. It is useful for example to calculate what nodes to lock after locking an instance....
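The "instance first, then its nodes" case can be sketched like this. The level names, the class and the driver function are hypothetical, and acquiring a lock is reduced to recording the names:

```python
# Hypothetical locking order: instances before nodes.
LEVELS_SKETCH = ["instance", "node"]


class MoveInstanceSketch:
    """Stand-in LU that only learns its node locks after locking the instance."""

    def __init__(self, instance_name, nodes_by_instance):
        self.needed_locks = {"instance": [instance_name], "node": []}
        self.acquired = {}
        self._nodes_by_instance = nodes_by_instance

    def declare_locks(self, level):
        """Called just before each level is locked."""
        if level == "node":
            # Safe to look up: the instance level is already locked.
            instance = self.acquired["instance"][0]
            self.needed_locks["node"] = list(self._nodes_by_instance[instance])


def lock_levels(lu):
    """Walk the levels in order, asking the LU for its needs at each one."""
    order = []
    for level in LEVELS_SKETCH:
        lu.declare_locks(level)
        # A level without entries is skipped over, not treated as the end.
        names = list(lu.needed_locks.get(level, []))
        lu.acquired[level] = names
        order.append((level, names))
    return order


lu = MoveInstanceSketch("inst1", {"inst1": ["node1", "node2"]})
order = lock_levels(lu)
```

The node list is computed only once the instance lock is held, so it cannot change between the lookup and the node-level acquire.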
Rework master startup/shutdown/failover
This (big) patch reworks the master startup/shutdown and fixes the master failover.
What does the patch do?
For master start/stop:
- remove the old ganeti-master script and its associated man page
- moves the ip start/stop directly into the backend.(Start|Stop)Master...
Invert nodes/instances locking order
An implementation mistake from the original design caused nodes to be locked before instances, rather than after. This patch inverts the level numbering, changing also the relevant unittests and the recursive locking function starting point....
First version of user feedback fixes
This patch contains a raw version for fixing feedback_fn.
The new mechanism works as follows:
- instead of a per-Processor feedback_fn, there's one for each ExecOpCode, so that feedback for different opcodes go via possibly...
Processor: Acquire locks before executing an LU
If we're running in a "new style" LU we may need some locks, as required by the ExpandNames function, to be able to run. We'll walk up the lock levels present in the needed_locks dictionary and acquire them, then run...
LogicalUnit: add ExpandNames function
New concurrent LUs will need to call ExpandNames so that any names passed in by the user are canonicalized, and can be used by hooks, locking and other parts of the code. This was done in CheckPrereq before, but it's now split out, as it's needed for locking, which in...
Processor: Move LU execution to its own method
This makes the try...finally code simpler, and helps adding a more complex locking structure before the actual execution. It also fixes a concurrency bug caused by the fact that write_count was read before acquiring the BGL, and thus spurious config update hooks run could have...
Pass context to LUs
Rather than passing a ConfigWriter to the LUs we'll pass the whole context, from which a ConfigWriter can be extracted, but we can also access the GanetiLockManager. This also fixes the places where a FakeLU is created.
Context: s/GLM/glm/
Make the GanetiLockManager instance of GanetiContext lowercase
Processor: acquire the BGL for LUs requiring it
If a LU required the BGL (all LUs do, right now, by default) we'll acquire it in the Processor before starting them. For LUs that don't we'll still acquire it, but in a shared fashion, so that they cannot run...
Processor: pass context in and use it.
The processor used to create a new ConfigWriter when it was initialized. We now have one in the context, so we'll just recycle it. First of all we'll pass the context in when creating a new Processor object, then we'll just use context.cfg, which is guaranteed to be initialized, wherever...
Fix sstore handling in Processor
- no need to keep the sstore as an object member, remove it
- don't reinitialize sstore only if self.cfg is None
This is not an issue, as the Processor is recycled for every opcode, but in general we know that (a) we might need a different type of...
Fix gnt-cluster “command” and “copyfile”
Since the disabling of forking in the master daemon, the two ssh-based subcommands were not working anymore. However, there is no need at all for the commands to be run from the master daemon (permissions to read the cluster private ssh key notwithstanding), they can be run directly...
Implement disk grow at LU level
This patch adds a new opcode and LU for growing an instance's disk.
The opcode allows growing only one disk at a time, and will throw an error if the operation fails midway (e.g. on the primary node after it has been increased on the secondary node). As such, it might actually leave...
Move SetKey to WritableSimpleStore and use it
Before we used to be able to update SimpleStore by just calling SetKey; this feature is now moved to an external class, which inherits from it. In this patch the new WritableSimpleStore class is also put to use, in the LUs that...
Move InitCluster opcode into a single function
This allows us to initialize a new cluster. The code certainly contains bugs and hooks aren't implemented yet.
Remove REQ_CLUSTER from opcode handling code
It's not needed anymore now that all opcodes require a cluster. Cluster initialization was the only exception.
Add a LU Hooks notification function
Previously LUs could be failed by pre-hooks, and post-hooks just had effects by themselves. This patch allows a LU to define the HooksCallBack function if it wants to know about its hooks' results and alter its results in response....
HooksMaster: Make RunPhase return the rpc output
Right now the hooks output is propagated from the nodes all the way up to HooksMaster.RunPhase, which uses it for debugging PRE hooks, but then silently discards them. We'll now propagate it up to the Processor.ExecOpCode function,...
Add gnt-backup remove functionality
This patch also fixes the LUExportInstance Prereq docstring.
Allocator framework, 1st part: allocator input generation
In preparation for the introduction of automatic instance allocator, this patch adds an allocator simulation opcode, that based on the input parameters, will return either the input message to the allocator...
parms->params Refactoring
- Substitute all occurrences of name 'parms' with 'params'
- Small codestyle fix
Map OpSetClusterParams to corresponding LU
Change the order of config updates in some LUs
In the start and stop instance LUs, the configuration update is done right at the end. This means that if, for example, the instance shutdown succeeds, but the drive deactivation fails, the next run of the watcher...
Remove the add/remove mirror operations
These two operations are related to md/drbd7 code (remote_raid1). Remove them as part of the md/drbd7 removal.
Codestyle fixes: adding a few empty lines
Fixes small spelling mistakes and comments
Add a test opcode that sleeps for a given duration
This can be used for testing purposes.
Reviewed-by: ultrotter,imsnah