RAPI: Add support for tagging node groups
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
gnt-group: Add commands for tagging groups
masterd: Add support for tagging node groups
TLMigrateInstance: remove 10s sleeps
TLMigrateInstance._ExecMigration contains two 10-second sleeps between individual migration steps.
Apart from prolonging the migration duration by 20s, the second sleep causes FinalizeMigration to be called 10 seconds after the real...
gnt-group list: Query filter support
gnt-node list: Query filter support
Update manpage, quote field names.
gnt-instance list: Query filter support
cli: Add support for parsing query filters
cli: Add option to force names to be treated as filter
opcodes: Change parameter type definition for query filter
The old definition wouldn't accept integers.
cli: Error reporting for query filter parsing
qlang: Add function to distinguish filters from names
qlang: Add parser for query filter language
With this parser, command line utilities will be able to provide filters through query2 in a simplistic language. Example filters:
name "node3.example.com" master or (name "node4.example.com") be/memory == 128 and name =~ /^web/i...
Add instance query field for OS parameters
These were not available as a query field before. Update unittests and description text for the other “..params” fields.
Fix shared_file_storage_dir on upgrades
If the cluster was upgraded from 2.4 or earlier, this key won't exist (it's only set to a correct value on cluster init), so we need to properly set it to a null string (disabled).
Signed-off-by: Iustin Pop <iustin@google.com>...
Prevent ssconf values from having non-string values
For whatever reason, my test cluster managed to acquire shared_file_storage_dir with a None value, instead of an empty string. This is not flagged in masterd itself, but the node daemon will fail in writing the value to disk, as it calls len() on the...
cli: Replace hardcoded strings with constants
Merge branch 'devel-2.4'
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
gnt-instance migrate: Adding --allow-failover option
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
TLMigrateInstance: Merge failover code, allow fallback
As the failover code for checking is almost identical, it's an easy task to switch it over to TLMigrateInstance. This allows us to fall back to failover if migrate fails the prereq check for some reason....
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via the following mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which...
utils: Add function generating regex for DNS name globbing
The intent of this function is to be able to provide a globbing operator for query filters. One should be able to say, for example, something to the effect of “gnt-instance shutdown '*.site'”.
Also rename a variable in MatchNameComponent....
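A minimal sketch of how a DNS name glob could be turned into a regular expression. The function name dns_name_glob_pattern is hypothetical, and the rule that wildcards never cross a dot is an assumption made for illustration; neither is taken from the patch itself.

```python
import re


def dns_name_glob_pattern(pattern):
    """Translate a glob into an anchored regex string (hypothetical helper).

    Assumption: '*' matches any run of characters within one DNS label and
    '?' matches a single character; neither matches the '.' separator.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append("[^.]*")
        elif char == "?":
            parts.append("[^.]")
        else:
            # Everything else (including '.') is matched literally
            parts.append(re.escape(char))
    return "^" + "".join(parts) + "$"


site_regex = re.compile(dns_name_glob_pattern("*.site"))
print(bool(site_regex.match("web1.site")))
```

Under these assumptions '*.site' matches "web1.site" but not "web1.example.site", because the asterisk stays within a single label.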
Verify file consistency using centrally computed list
Until now “gnt-cluster verify” (LUClusterVerify) would compute its own list of files to check for consistency. This list was not complete and certain inconsistencies were missed.
With this patch the code is changed to use the list of files used by...
cmdlib: Factorize computation of ancillary files
… and change the logic in _RedistributeAncillaryFiles. Virtually the same list of files will be used to verify the files' consistency.
qlang: Remove OP_GLOB operator
It'll be implemented using OP_REGEXP by the parser.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
query: Add implementation of regex match operator
So far this operator was not implemented. This patch adds an additional value preparation function to the function table for binary operators, used to compile the regular expression. Unittests are included....
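The idea of a per-operator value preparation step can be sketched roughly as follows. The table layout, the operator spellings and the name compile_binary_filter are assumptions for illustration, not the structures used in the query module.

```python
import operator
import re

# Hypothetical function table: operator -> (value preparation, match function).
# The preparation function runs once when the filter is compiled.
_BINARY_OPS = {
    "==": (None, operator.eq),
    "=~": (re.compile, lambda value, regex: regex.search(value) is not None),
}


def compile_binary_filter(op, field, value):
    """Return a callable that checks one item (a dict) against the filter."""
    (prep_fn, match_fn) = _BINARY_OPS[op]
    if prep_fn is not None:
        # For "=~" this compiles the regular expression a single time
        value = prep_fn(value)
    return lambda item: match_fn(item[field], value)


is_web = compile_binary_filter("=~", "name", r"^web")
print(is_web({"name": "web1.example.com"}))
```

Preparing the value up front means the regular expression is compiled once per filter rather than once per item being checked.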
cmdlib: Fix mistake made in commit 75c7520f0
Commit 75c7520f0 used the wrong constant. I double-checked all other changes made in the commit.
cmdlib: Replace hardcoded values with constants
daemon.py: move startup log message before prep_fn
Before this, the output in the rapi daemon log was:
2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon...
Display the actual memory values in N+1 failures
This changes the display from:
Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
RAPI: Convert instance shutdown to the new FillOpCode
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
ssh.VerifyNodeHostname: remove the quiet flag
This is not needed for this function, and can interfere with debugging of ssh failures.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add a simple wrapper over utils.Retry
The new wrapper makes moving legacy code to utils.Retry or adding retries in existing code simpler.
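To illustrate what a thin retry wrapper of this kind might look like, here is a self-contained sketch. It does not use utils.Retry; the name simple_retry and all of its parameters are assumptions made for this example.

```python
import time


def simple_retry(expected, fn, delay, timeout, args=(),
                 sleep_fn=time.sleep, time_fn=time.monotonic):
    """Call fn(*args) until it returns `expected` or `timeout` seconds pass.

    Hypothetical helper: the last result is returned either way, so legacy
    code that checks a return value can keep doing so.
    """
    deadline = time_fn() + timeout
    while True:
        result = fn(*args)
        if result == expected or time_fn() >= deadline:
            return result
        sleep_fn(delay)


attempts = []


def flaky_check():
    # Succeeds on the third call
    attempts.append(1)
    return len(attempts) >= 3


print(simple_retry(True, flaky_check, 0.01, 5))
```

Passing the sleep and time functions as parameters is a choice made here so the loop can be exercised in tests without real waiting.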
Automatically enable hail if enabled and found
Expose whether htools was enabled to Python code
This exports whether htools was enabled at configure-time, and adds a constant for our reference iallocator.
test.ganeti.process_unittest: Fix race condition
There was a race condition on heavily loaded test systems that caused the timeout unittests to fail randomly, as the signal handler was not yet set up but the timeout had already hit.
Therefore we introduce a workaround to wait until a program reached a...
RAPI client: Remove support for version 0 instance creation requests
RAPI server: Drop support for instance creation format 0
Ganeti 2.1.3, released in June 2010, added support for a new, extensible instance creation request format, called version 1. This patch removes support for the old and undocumented version 0 format....
Improved GanetiRapiClient docstrings
- Added @rtype and/or @return where missing
- Fixed @param for Query() filter_ parameter (colon was missing)
Signed-off-by: Simeon Miteff <simeon.miteff@gmail.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Relax instance ERROR on admin_down on offline node
This fixes an issue where a stopped instance is reported as ERROR in cluster verify if it lives on an offline node. As the instance is down, this shouldn't happen.
Signed-off-by: René Nussbaumer <rn@google.com>...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
watcher: improve logging a bit
Add some debug logging to detail why we don't run some steps.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix output for “gnt-job info”
If the result of an opcode was a non-empty dictionary, it would be impossible to differentiate between input and result:
Input fields: […] debug_level: 0 fields: cluster_name,master_node,volume_group_name jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]...
Rewrite of ensure-dirs in python
I provided unittests to test the important pieces of the infrastructure. The one remaining function (RecursiveEnsure) is not easy to unittest but also not critical if it fails to operate correctly.
Add opcode summary to SubmitManyJobs errors
Requested-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
RAPI client: Tidy and test WaitForJobCompletion
- Use constants
- Don't sleep if no delay is given
- Mark function as deprecated: it uses polling instead of waiting for changes (but the latter needs authentication); it can still be used
- Add unittests...
RAPI client: Add job status constants
RAPI client: Job IDs are strings
Split BuildHooksEnv of LUs
Commit dd7f677623 added another call to BuildHooksEnv to provide post-phase status variables. Since BuildHooksEnv also built the node lists, that meant they have to be built twice. First a rather strict check was used, but it turned out to be more tricky. Commit b423c51336...
RAPI client: fix epydoc formatting
Add a helper function to the RAPI client
This adds a new method WaitForJobCompletion that can be used for clients that are not interested in the entire job log, just in its completion status.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
Remove restrictive hook node list check
Commit dd7f67762 added a restrictive check for the node lists returned by BuildHooksEnv, leading to errors with some LUs, one of which was fixed in commit 0dfa2c227. As it turns out, other LUs have similar issues, some not easy to fix. This patch disables the restrictive check...
watcher: Fix misleading usage output
When “ganeti-watcher” is called with an argument, it would hint at a non-existing “-f” parameter. With this patch the separate usage string is no longer necessary.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Fix hook node list when adding node
This broke QA (and everyone trying to add a node) by complaining about different node lists.
Clarify --force-join parameter message
This isn't only used during cluster merge.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
hooks: Provide variables with post-opcode values
When a hook is called, it is provided with a number of variables describing the status of the instance/node/etc. before the operation. Some opcodes provide extra variables to see modified values from hooks,...
HooksMaster: Add more assertions for variable names
Also replace explicit loop with dict.update.
mcpu: Tidy HooksMaster a bit
- Dictionary indentation
- Add empty lines for readability
- Simplify conditional code
cmdlib: Factorize running post-phase hook
locking: Fix race condition in lock monitor
In some rare cases it can happen that a lock is re-created very soon after deletion, while the old instance hasn't been destructed yet. In such a case the code would detect a duplicate name and raise an exception....
qlang: Remove unused import
RAPI: Add support for querying resources
- Access is only permitted for authenticated clients (queries can return sensitive data)
- Filters can be specified when sending a PUT request
- Updates RAPI client, documentation and tests
Add support for query resources in RAPI URIs
masterd: Simplify code for field queries
Instead of going via cmdlib and using special cases for different resources, the list of fields is used directly.
constants: Rename QR_OP_*, add QR_VIA_RAPI
Commit 28b71a76 added a list of resources which can be queried using LUXI. Unfortunately the variable was named “QR_OP_LUXI”, which can be confusing. This patch renames “QR_OP_QUERY” to “QR_VIA_OP”, “QR_OP_LUXI”...
qlang: Remove unused ReadSimpleFilter
utils: Export NiceSortKey function
The ability to split a string into a list of strings and integers can be handy elsewhere and is necessary for sorting query results by names.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
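Splitting a name into strings and integers for sorting can be sketched as below. The name nice_sort_key and the regular expression are illustrative assumptions and are not copied from the utils module.

```python
import re

_DIGITS_RE = re.compile(r"(\d+)")


def nice_sort_key(name):
    """Split a name into alternating text and integer parts.

    re.split() with a capturing group puts the digit runs at the odd
    indexes, so those are converted to integers and compared numerically.
    """
    parts = _DIGITS_RE.split(name)
    return [int(part) if index % 2 else part
            for (index, part) in enumerate(parts)]


print(sorted(["node10", "node2", "node1"], key=nice_sort_key))
```

With such a key "node2" sorts before "node10", whereas a plain string sort would place "node10" first.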
TLReplaceDisks: Add check if disks are activated
Previously we failed later with a rather useless error message. This patch fixes this and tells the user to activate-disks if replace-disks needs activated disks, rather than abort with a cryptic error...
LUOsDiagnose: Move legacy behaviour into filter
The behaviour of LUOsDiagnose needs special treatment. Commit d22dfef7 changed it to not return hidden, blacklisted or invalid OSes if the respective field is not requested. This behaviour needs to be preserved...
Convert OsDiagnose to query
qlang: Add some more documentation for filters
It's not perfect, but at least some more.
query: Add conversion wrapper
Allows converting the value of a column before returning it. Useful for sorting while still using one of the other generic functions for retrieving the value.
Fix epydoc warning about unknown reference
config: Wrap MatchNameComponent, reduce lock duration
- Remove duplication by merging two MatchNameComponent into a wrapper
- Reduce lock duration by getting list of names under lock and then matching names without the lock
- Also, ExpandNodeName's docstring is fixed....
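The lock-duration change described above can be sketched as: copy the list of names while holding the lock, then do the matching on the copy after the lock is released. The class, the method names and the simplified matching rule below are assumptions for illustration, not the code in the config module.

```python
import threading


def match_name_component(key, names):
    """Return the name equal to key, or the unique name starting with
    key + '.'; None if nothing or more than one name matches.
    (Simplified stand-in for the real matching logic.)"""
    if key in names:
        return key
    candidates = [name for name in names if name.startswith(key + ".")]
    return candidates[0] if len(candidates) == 1 else None


class Config:
    """Hypothetical configuration holder guarded by a single lock."""

    def __init__(self, node_names):
        self._lock = threading.Lock()
        self._node_names = list(node_names)

    def _get_node_list(self):
        # Only the copy happens while the lock is held
        with self._lock:
            return list(self._node_names)

    def expand_node_name(self, short_name):
        names = self._get_node_list()
        # The lock is already released; matching works on the copy
        return match_name_component(short_name, names)


cfg = Config(["node1.example.com", "node2.example.com"])
print(cfg.expand_node_name("node1"))
```

The trade-off is that the match runs against a snapshot, so a name added or removed after the copy is taken is not seen by that call.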
opcodes: Document OpQueryFields' parameters
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default from the cmdline, which causes problems for cluster-merge. This patch means that the following now works: gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as an index without checking.
Fix bug related to log opening failures
If opening the log file fails, then we shouldn't attempt to use that variable.
Instance failover: fix bug for INT_MIRROR cases
Patches db366d9a and aac4511a added support for EXT_MIRROR instances, but inadvertently introduced a bug: for INT_MIRROR cases, we don't need (actually we can't support) either an iallocator or a target node....
gnt-cluster epo: Adding --power-delay flag
gnt-node power: Adding --power-delay flag
cli.py: Adding POWER_DELAY_OPT
The command line option --power-delay sets the time waited between power ons.
OpOobCommand: Adding power on delay
This delays the invocation of the power on of the next node. So if you power on a bunch of nodes it will not blow the fuse.
OpOobCommand: Document all fields
gnt-cluster epo: Adding --shutdown-timeout
This adds the --shutdown-timeout flag to gnt-cluster epo to specify the shutdown timeout for instance shutdown.
Rename DTS_NET_MIRROR to DTS_INT_MIRROR
DTS_INT_MIRROR better contrasts DTS_EXT_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: updated patch for changed context]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
KVM: use cache=none for shared disk templates
Disable host cache for externally mirrored disks to avoid cache incoherency. Without this, migrations between the same two nodes may end up in disk corruption.
This is a runtime override of cluster defaults, mostly a workaround....
Shared storage instance failover
Modify LUFailoverInstance to enable shared storage instances to failover. Shared storage instance failover requires either a target node or an iallocator to determine the target node. If none is given, the cluster default...
Shared storage node migration
Modify LUNodeMigrate to provide node migration for nodes with instances using shared storage. gnt-node migrate has to be passed an iallocator for migration of shared storage instances to be performed. When using a shared storage...
Shared storage instance migration
Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations for instances with DTS_EXT_MIRROR disk templates.
Migrations of shared storage instances require either a target node, or an iallocator to determine the target node. If none is given, the cluster default...
CLI changes to facilitate shared storage migration/failover
Add DST_NODE_OPT to cli.py to use for directly specifying the target node during migration/failover.
gnt-instance failover/migrate also get passed an iallocator option.
gnt-node failover/migrate get only a target_node option....
Migration and failover: add iallocator and target_node slots
Add iallocator and target_node slots to OpMigrateInstance and OpFailoverInstance to facilitate shared-storage-backed instance mobility. Add iallocator slot to OpMigrateNode (no explicit target_node in this case)....
IAllocator changes to work with shared storage
Make cmdlib.IAllocator shared-storage-aware. IAllocator requires secondary nodes only on DTS_NET_MIRROR disk templates and requires no secondaries for DTS_EXT_MIRROR templates.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Shared block storage support
This patch introduces basic shared block storage support.
It introduces a new storage backend, bdev.PersistentBlockDevice, to use as a backend for shared block storage. The new bdev requires a new BLOCKDEV_DRIVER_MANUAL constant with the value "manual" and uses it as...
Add bdev_sizes RPC call
The bdev_sizes multi-node RPC call returns the sizes of the requested block devices on the desired nodes. Its intended use is to verify the existence of a block device on a given node for shared block storage support.
Block device paths are expected to lie under constants.BLOCKDEV_DIR...
QA: Improve tests for gnt-os
- Test OS lists via command line and RAPI
- Test “gnt-os diagnose” and “gnt-os info”
Log log-file reopening
This makes the log files get a record notifying of the reopen, so as to force creation of the log files soon after rotation.
Merge branch 'stable-2.4'