Move HooksMaster out of the mcpu module
We need to do this so that backend.py doesn't need to import mcpu, and thus indirectly cmdlib. This reduces the size of the node daemon by about half, which is very important as it is pinned in memory.
This solves Issue 419....
Merge branch 'devel-2.6'
Replace frozenset with compat.UniqueFrozenset
This is not a trivial s/frozenset/compat.UniqueFrozenset/, but rather only replaces “frozenset” where appropriate. Most of the places are “static” information that doesn't change after the module has been loaded....
Stop verifying opcode results in dry_run mode
Commit 1ce03fb1 (“Add ht-based result checks to opcodes”) introduced infrastructure for checking opcode results, and subsequent commits improved the list of opcodes which do declare a result; however, this was not tested for dry-run mode operation....
mcpu: Verify node allocation lock mode
Add verification code to mcpu to check an LU's locks. Two whitelists are provided to exclude LUs from the two tests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
Support opportunistic locks in mcpu/LUs
Similar to “share_locks”, a new dictionary containing booleans for each locking level is added to “cmdlib.LogicalUnit”. Logical units wanting to make use of opportunistic locks will be able to configure this dictionary accordingly....
mcpu: Start locking at correct level
Commit 8716b1d added a new lock level, LEVEL_NODE_ALLOC. It is ahead of LEVEL_INSTANCE. The latter was hardcoded in mcpu to be locked right after the BGL, effectively ignoring LEVEL_NODE_ALLOC.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
jqueue/mcpu: Determine priority using callback
Instead of being given the priority for acquiring locks by means of a parameter, mcpu will now call back. This is in preparation for implementing a command to change a job's priority on the fly and allows changing it while locks are being acquired (taking effect on the next...
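The callback idea described above can be illustrated with a minimal sketch. It is not the mcpu code: `acquire_with_priority`, `try_acquire` and `priority_fn` are hypothetical names, and re-reading the priority before every acquire attempt is an assumption made for the illustration.

```python
def acquire_with_priority(try_acquire, priority_fn, timeouts):
    """Try to acquire locks, asking the caller for the priority each time.

    All names here are hypothetical. ``priority_fn`` is the callback; it is
    called before every attempt, so a priority change made while waiting is
    picked up by the following attempt (an assumption of this sketch).
    """
    for timeout in timeouts:
        priority = priority_fn()
        if try_acquire(timeout=timeout, priority=priority):
            return priority
    raise TimeoutError("could not acquire locks within the given attempts")


# Usage: the job's priority changes between the first and second attempt.
priorities = iter([0, -10])
seen = []


def fake_try_acquire(timeout, priority):
    seen.append(priority)
    return len(seen) == 2  # succeed on the second attempt


result = acquire_with_priority(
    fake_try_acquire, lambda: next(priorities), [0.1, 0.2, 0.3])
```

Passing a function rather than a value is what lets the caller change the priority without restarting the acquire loop.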
Improve handling of lock exceptions
There are two issues with lock exceptions right now:
- first, we don't log the original error; this is fine for now (locking.py always returns the same error here), but in general is brittle: if locking.py would start returning more information, we'd...
Improve logging of AssertionErrors
Currently, when we have an assertion error raised from cmdlib, it looks like this:
[cluster] root@node4:~# gnt-instance grow-disk instance1 0 1G
Failure: command execution error:
This is very very confusing. This patch adds a bit of traceback...
Migrate lib/mcpu.py from constants to pathutils
File system paths moved from constants to pathutils.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Bump pep8 version to 1.2
Debian Wheezy will ship with this version, and it has many improved checks compared to 0.6, so let's:
- bump version in the docs
- silence some new checks that are wrong due to our indent=2 instead of 4
- fix lots of errors in the code where the indentation was wrong by 1...
Stop acquiring BGL for LUXI queries
Short description: This fixes an issue whereby masterd would become unresponsive on the LUXI socket, leading to client timeouts. While made worse in 2.5, the underlying issue was already present in 2.4.
Longer description: Until now all LUXI queries would acquire the BGL...
Fix error in opcode result processing
LUXI queries are processed without callbacks (see server.masterd.ClientOps._Query). With commit 07923a3c the logic for checking an opcode's result for jobs to submit was changed and subsequently raised an exception (“'NoneType' object has no attribute...
Copy debug level, priority and set comment for LU-generated opcodes
Before this patch, a node evacuation submitted with high priority would only compute the solution at that priority, but the actual evacuation ran at normal priority.
utils.text: Add function to truncate string
The function adds an ellipsis if the string was actually truncated. Also start using it in mcpu for result checks (where the message is also slightly changed to use a colon).
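A truncation helper of the kind described above might look like the following sketch. The name `truncate`, its signature and the choice of "..." as the marker are assumptions for illustration, not the function added to utils.text.

```python
def truncate(text, length):
    """Return ``text`` cut to at most ``length`` characters.

    Hypothetical helper: if the text had to be cut, the last three
    characters of the result are "..." so the reader can tell.
    """
    marker = "..."
    if length < len(marker):
        raise ValueError("length must leave room for the ellipsis")
    if len(text) <= length:
        # Short enough already, return unchanged.
        return text
    return text[:length - len(marker)] + marker


short = truncate("abcdef", 10)
cut = truncate("abcdefghij", 6)
```

Counting the marker against the limit keeps the result within `length`, which matters when the text is embedded in a bounded error message.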
mcpu: Make the op result exception more verbose
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Merge branch 'devel-2.5'
Move hooks PATH environment variable to constants
Move the contents of the PATH environment variable for hooks to constants, and use its value in the code and in the hooks documentation.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Generalize HooksMaster
- remove any dependence on Logical Units from the HooksMaster;
- add a new function parameter to the constructor, a function that is expected to convert the results of the hooks execution in a format understood by the HooksMaster;...
Use JoinDisjointDicts in mcpu
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Keep only one global RPC runner in Ganeti context
Instead of having one RPC runner per mcpu processor this will keep only one instance as part of the masterd-wide Ganeti context. Upcoming patches will change the RPC runner to report pending requests to the...
DeprecationWarning fixes for pylint
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to a lot of DeprecationWarning being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst)....
mcpu: Specify actual received type on opcode issue
This helped me debug an issue with opcodes.
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results using ht-based functions. Checks are added for two opcodes.
mcpu: Add missing docstring to _ProcessResult
cmdlib: Remove acquired_locks attribute from LUs
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the...
Merge branch 'devel-2.4'
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via the following mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Split BuildHooksEnv of LUs
Commit dd7f677623 added another call to BuildHooksEnv to provide post-phase status variables. Since BuildHooksEnv also built the node lists, that meant they have to be built twice. First a rather strict check was used, but it turned out to be more tricky. Commit b423c51336...
Remove restrictive hook node list check
Commit dd7f67762 added a restrictive check for the node lists returned by BuildHooksEnv, leading to errors with some LUs, one of which was fixed in commit 0dfa2c227. As it turns out, other LUs have similar issues, some not easy to fix. This patch disables the restrictive check...
HooksMaster: Add more assertions for variable names
Also replace explicit loop with dict.update.
hooks: Provide variables with post-opcode values
When a hook is called, it is provided with a number of variables describing the status of the instance/node/etc. before the operation. Some opcodes provide extra variables to see modified values from hooks,...
mcpu: Tidy HooksMaster a bit
- Dictionary indentation
- Add empty lines for readability
- Simplify conditional code
Fix LU processor's GetECId
The exception was never actually raised.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
mcpu: Automatically build the DISPATCH_TABLE
While reviewing dato's interdiff for the OpAssignGroupNodes, I realised that we can do better. This patch replaces the hand-built DISPATCH_TABLE with one built from the opcode.OP_MAPPING dict.
Signed-off-by: Iustin Pop <iustin@google.com>...
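Deriving a dispatch table from an opcode mapping, as this commit describes, can be sketched as below. This is an illustration only: the stand-in classes, the mapping keys, `build_dispatch_table`, and the assumption that every "Op..." class has a matching "LU..." class with the same suffix are not taken from the actual patch.

```python
# Stand-in opcode and logical unit classes (the real ones live elsewhere).
class OpQueryGroups:
    pass


class OpOobCommand:
    pass


class LUQueryGroups:
    pass


class LUOobCommand:
    pass


# Hypothetical mapping of opcode IDs to opcode classes.
OP_MAPPING = {
    "OP_QUERY_GROUPS": OpQueryGroups,
    "OP_OOB_COMMAND": OpOobCommand,
}

# Hypothetical namespace holding the logical units, keyed by class name.
LU_NAMESPACE = {cls.__name__: cls for cls in (LUQueryGroups, LUOobCommand)}


def build_dispatch_table(op_mapping, lu_namespace):
    """Map each opcode class to the LU class sharing its name suffix."""
    table = {}
    for op_class in op_mapping.values():
        lu_name = "LU" + op_class.__name__[len("Op"):]
        # A missing LU raises KeyError here instead of failing at dispatch.
        table[op_class] = lu_namespace[lu_name]
    return table


DISPATCH_TABLE = build_dispatch_table(OP_MAPPING, LU_NAMESPACE)
```

With a naming convention like this, adding an opcode no longer requires a second, hand-maintained registration step.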
Rename (Op|LU)OutOfBand to (Op|LU)OobCommand
Add modification of node groups (OpCode/LU/CLI)
With this commit, only modification of the "ndparams" attribute is supported.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Group operations: OpCode and LU for renaming a group
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Group operations: OpCode and LU for removing a group
Group operations: OpCode and LU for adding a group
Adding new OpCode for OOB
Register OpCode and Logical Unit in mcpu.py
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Querying node groups: LU/Opcode
This adds opcodes.OpQueryGroups and cmdlib.LUQueryGroups.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add OpQuery opcode
Move locking.RunningTimeout to utils
As we need this functionality in places other than just locking, it makes sense to move it to utils rather than keeping it in locking.
mcpu: Raise directly in _AcquireLocks
Removes code duplication.
mcpu: Implement priority for lock acquiring
Until now the priority for lock acquires couldn't be passed when running opcodes.
mcpu: Adjust lock acquire strategy
The changes to job queue processing require some changes on this class' interface. LockAttemptTimeoutStrategy might move to another place, but that'll be done in a later patch.
mcpu.Processor: Raise exception on lock acquire timeout
Right now the timeout is not passed by any caller, making the code effectively go back to blocking acquires. Since the timeout is always None, no caller needs to be changed in this patch.
This change also means that any LUXI query handled by ganeti-masterd...
Remove mcpu's ReportLocks callback
This is no longer needed with the new lock monitor. One callback is kept to check for cancelled jobs.
Add test for some aspects of job queue
This new opcode and gnt-debug sub-command test some aspects of the job queue, including the status of a job. The bug fixed in commit 2034c70d507 was identified using this test. A future patch will run this test automatically from the QA scripts....
Provide feedback function for all LU methods
By exposing mcpu's _Feedback function (now renamed to “Log”) to LUs, methods like ExpandNames can also write to the job execution log.
Remove the obsolete EvacuateNode OpCode/LU
All code has been switched to the new-style LU… time for cleanup.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add opcode to prepare export
To prepare a remote export, the X509 key and certificate need to be generated. A handshake value is also returned for an easier check whether both clusters share the same cluster domain secret.
Add LUNodeEvacuationStrategy
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
mcpu: Log lock status with sorted names
Reading and comparing sorted lists is easier when debugging locking problems.
Add targeted pylint disables
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Remove many 'Unused variable' warnings
Note there are some cases left which need extra cleanup.
Processor: support a unique execution id
When the processor is executing a job, it can export the execution id to its callers. This is not supported for Queries, as they're not executed in a job.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add config.DropECReservations
For now this function does nothing, but it gets called by mcpu when the execution of an LU is done, making sure any pending reservations are dropped.
Convert the rest of the OpPrereqError users
This finishes the conversion of OpPrereqError creation to two-argument style. Any leftovers as one-argument are not breaking anything, just losing information about the errors.
mcpu: Use new timeout class for timeout
locking, mcpu: Ensure timeout is always >= 0.0
mcpu: Make sure added locks are released on errors
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
mcpu: Change lock attempt timeout calculation
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance.
Code and docstring style fixes
Found using pylint and epydoc.
mcpu: Improve lock reporting with timeouts
mcpu: Implement lock timeouts
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made.
Lock status reporting will be improved in a separate patch....
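The timeout behaviour described in this commit message can be sketched as follows. Only the bounds (~0.1 to ~10.0 seconds), the ±5% variation and the fallback to a blocking acquire after 10 attempts come from the message; the geometric growth between the bounds and the names `lock_attempt_timeouts` and `next_timeout` are assumptions of this sketch, not the mcpu implementation.

```python
import random


def lock_attempt_timeouts(attempts=10, shortest=0.1, longest=10.0,
                          variation=0.05, rng=random):
    """Pre-calculate one timeout (in seconds) per lock acquire attempt.

    The growth curve is an assumption: timeouts grow geometrically from
    ``shortest`` to ``longest``. Each value is then varied by up to
    +/- ``variation`` so that competing jobs do not retry in lockstep.
    """
    if attempts < 2:
        raise ValueError("need at least two attempts")
    timeouts = []
    for i in range(attempts):
        base = shortest * (longest / shortest) ** (i / (attempts - 1))
        timeouts.append(base * (1.0 + rng.uniform(-variation, variation)))
    return timeouts


def next_timeout(timeouts, attempt):
    """Return the timeout for ``attempt`` (0-based), or None to block."""
    if attempt < len(timeouts):
        return timeouts[attempt]
    # All timed attempts are used up: fall back to a blocking acquire.
    return None


timeouts = lock_attempt_timeouts(rng=random.Random(0))
```

Here `next_timeout(timeouts, 10)` returns `None`, which stands for the blocking acquire made once the ten timed attempts are exhausted.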
mcpu: Remove unused exclusive_BGL attribute
Remove RpcResult.RemoteFailMsg completely
Keep lock status with every job
This can be useful for debugging locking problems.
Move OpCode processor callbacks into separate class
There are two major arguments for this:
- There will be more callbacks (e.g. for lock debugging) and extending the parameter list is a lot of work.
- In the jqueue module this allows us to keep per-job or per-opcode variables in...
mcpu: formatting/indenting fix
Small fix for a mistake done by bad editor settings.
Signed-off-by: Luca Bigliardi <shammash@google.com>
HooksMaster: fix RunPhase logging
In case of complete failure results is empty, return immediately(tnx unittests).
Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
HooksMaster: logging hooks in RunPhase
Extend RunPhase so it will log hooks results in POST phase.
node-remove post on removed node
Run post phase of node-remove on the removed node as well.
HooksMaster: document raised exception
HooksAbort is raised, but not documented.
HooksMaster: list of nodes override
Allow the caller of HooksMaster.RunPhase() to specify an alternative list of nodes.
Add OPMoveInstance and LUMoveInstance
This patch adds a basic version of LUMoveInstance. It doesn't yet support iallocator-mode and it's implemented in old-style (non-TL) mode.
Remove extra argument from HooksMaster class
The mcpu.py:HooksMaster class needs to have a proc attribute/argument to init in order to call its LogWarning method. However, this is available from the 'lu' attribute, so we can remove this dependency.
Add opcode to repair storage volumes
Implement instance recreate-disks
This can be used for a 'plain' type instance when the underlying storage went away, to recreate the storage (and reinstall) instead of removing the instance and readding it.
Merge commit 'origin/next' into branch-2.1
Post cluster initialization LU
Add an 'empty' logical unit to run hooks after cluster initialization.
Merge branch 'master' into next
Implement gnt-cluster check-disk-sizes
This patch adds a new opcode and lu for checking disk sizes. Currently it does only top-level disk verification, and also doesn't check primary/secondary node size mismatches (these two are added as TODOs in the Exec() function of the LU)....
cmdlib: Add opcode to modify storage unit fields
Add new opcode to list physical volumes
cmdlib: Add new opcode to migrate node
It migrates all primary instances from the node to their secondaries.
Add new opcode to evacuate nodes
Merge branch 'next' into branch-2.1
Fix pylint warnings
Fix some typos
LU execution: implement dry-run framework
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class, and which if...
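The early-exit idea can be sketched as below. This is not the code from the patch: the attribute name `dry_run_result`, its default value, the `run_lu` helper and the choice to exit right after the prerequisite check are all assumptions for illustration.

```python
class LogicalUnit:
    """Minimal stand-in for the parent LU class (hypothetical)."""

    def __init__(self):
        # Special value returned instead of the real result in dry-run mode.
        self.dry_run_result = True
        self.executed = False

    def CheckPrereq(self):
        """Validate the request; raises if it cannot be carried out."""

    def Exec(self):
        """Perform the real, state-changing work."""
        self.executed = True
        return "real result"


def run_lu(lu, dry_run):
    """Run the checks, then either exit early or execute the LU."""
    lu.CheckPrereq()
    if dry_run:
        # Assumed exit point: checks have run, nothing has been modified.
        return lu.dry_run_result
    return lu.Exec()


dry_lu = LogicalUnit()
dry_result = run_lu(dry_lu, dry_run=True)
real_lu = LogicalUnit()
real_result = run_lu(real_lu, dry_run=False)
```

Keeping the special value on the LU object lets an individual LU override it, for example to report what it would have done.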
Fix various pylint warnings
There were multiple issues:
- copy-paste resulted in wrong indentation
- wrong function name
- missing spaces around assignment
- overriding built-in names (type, dir) or already defined ones (errors, hypervisor)
Convert hooks_runner rpc to new style result
This also converts (and fixes) unittests and mock objects to deal with this change, and the custom hook verifier in cmdlib.LUClusterVerify.