jqueue: Update worker thread name to include opcode summary
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor...
Use the new dry-run mode in cmdlib
This will hopefully detect potential LVM (or any other storage, when they implement it) issues before committing changes just on some nodes.
Unfortunately, due to the dry_run opcode handling, we can't integrate this into the usual handling (as we need to activate the disks before...
Implement grow dry-run at RPC level
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Implement dryrun mode for BlockDev.Grow()
This is always called with False from backend for now.
cmdlib: Sort nodes for OOB commands
Also reorder the methods to match all other LUs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cmdlib: Use helper for expanding nodes for OOB commands
cmdlib: Expand instances using helper for repairing disks
Also change the way “share_locks” is filled.
Fix bug introduced in commit 0d5a0b96
When removing “acquired_locks” in commit 0d5a0b96, I didn't remember that it does not contain the Big Ganeti Lock.
Fix lock release in TLMigrateInstance
Commit 52f33103 introduced lock release factorization, replacing manual lock release with utility functions. However, it broke TLMigrateInstance due to a typo (passing the Tasklet to ReleaseLocks instead of the parent LU). We fix this by passing the LU to...
cmdlib: Remove acquired_locks attribute from LUs
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the...
cmdlib: Use local alias for lock manager
Saves some typing and we'll use it more often in the future.
opcodes: Add function for compact summary
Depending on the opcode and its parameters, the existing “Summary” function can give a rather long summary. For displaying the summary in logs and in the lock monitor, it should be shorter. Hence this new function is added to just use the opcode ID with common prefixes...
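A minimal sketch of the idea, not the actual Ganeti code: derive a compact summary from the opcode ID by dropping the “OP_” prefix and abbreviating a common prefix. The function name and the prefix table below are assumptions made for illustration.

```python
# Hypothetical prefix table; the prefixes really used by Ganeti are not
# shown in this log.
_SUMMARY_PREFIXES = {
    "CLUSTER_": "C_",
    "GROUP_": "G_",
    "NODE_": "N_",
    "INSTANCE_": "I_",
}


def tiny_summary(op_id):
    """Return a short summary for an opcode ID like "OP_INSTANCE_STARTUP"."""
    # Drop the generic "OP_" prefix shared by all opcode IDs.
    name = op_id[3:] if op_id.startswith("OP_") else op_id
    # Abbreviate the first matching common prefix, if any.
    for prefix, short in _SUMMARY_PREFIXES.items():
        if name.startswith(prefix):
            return short + name[len(prefix):]
    return name
```

Such a string is short enough to be embedded in a thread name or shown in a lock monitor column.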
Show locksets in lock monitor
When all locks contained in a set are acquired, the lockset's internal lock is acquired with the same mode. With this patch the internal lock will show up on the lock monitor, named e.g. “instances/[lockset]”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
locking: Make parameter to condition's wait() positional
It is always used in the locking code. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
SharedLock: Avoid acquires from sneaking in while notifying
In some rare cases new shared acquires could sneak in through the condition cached in “__pending_shared” while the code was still notifying acquires. This was only working because such a condition...
Fix instance failover/migration w.r.t TLMigrateInstance
Commit 1c6e5787 removed the iallocator and target_node keyword parameters from TLMigrateInstance, but I didn't update their use in LUInstanceFailover and (not fully) in LUInstanceMigrate.
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix DTS_EXT_MIRROR migration
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances, however it broke migration for DTS_EXT_MIRROR instances, by moving iallocator and node checks from LUInstanceMigrate to TLMigrateInstance. This has the side-effect...
Use node group locking for replacing disks
This is one of the first opcodes to make use of node group locking. To get an instance's node groups, the instance's nodes need to be looked at. Due to a previous design decision nodes are locked after the group,...
config: Add function to determine instance's groups
This will be used for locking only the necessary node group(s) for per-instance operations.
TLMigrateInstance: Fix live migration breakage
Commit 77fcff4 unintentionally incorporated code from TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator, presumably during a rebase from earlier versions of the patch to the 2.5 codebase. As a...
cmdlib: Update error messages, remove some punctuation
- Clarify some error messages
- Remove unnecessary punctuation
- Merge two if conditions in one place
Use floppy disk and a second CDROM on KVM
Hi all, this patch will add 3 new KVM parameters and a new option.
New Parameters:
- floppy_image_path = "" -> Specify the floppy image to load as floppy disk.
- cdrom2_image_path = "" -> Specify a second cdrom image to load on...
cmdlib: Factorize lock releasing
There will be more lock releasing with upcoming changes, so this will centralize the logic behind it (what locks to keep, which variables to update, etc.).
Merge branch 'devel-2.4'
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
locking: Export “list_owned” from lock manager
This is analogous to “is_owned” and will be used for assertions.
gnt-instance: Fix typo in error message
The iallocator parameter is “-I”, not “-i”.
mlock: fail gracefully if libc.so.6 cannot be loaded
This allows noded to continue instead of blowing up if the libc major number changes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
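The graceful failure described for the mlock change can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch; the helper name load_libc is hypothetical.

```python
import ctypes
import logging


def load_libc(name="libc.so.6"):
    """Return a ctypes handle for libc, or None if it cannot be loaded."""
    try:
        return ctypes.cdll.LoadLibrary(name)
    except OSError as err:
        # Log and carry on instead of letting the daemon die at startup.
        logging.error("Cannot load %s, memory locking is disabled: %s",
                      name, err)
        return None
```

A caller would skip the memory locking step when None is returned and continue starting up.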
cmdlib: Drop SSH runner from LU base class
It is no longer used.
cmdlib.py: fix indentation in _VerifyNode
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
TLMigrateInstance: Fix confusing text
Commit d5cafd31 changed this error message, swapping the text parts in the process.
LUInstanceRename: Amend comment about lock
Also add an assertion.
iallocator: Relocation nodes must be in same group
Quoting from iallocator.rst: “[…] ``relocate`` request is used when an existing instance needs to be moved within its node group […]”.
Fix 'unused import' lint error
Sorry!
SetEtcHostsEntry: maintain existing ordering
Currently RemoveEtcHostsEntry keeps the ordering, but SetEtcHostsEntry does not, as it always writes the new entry at the end of the file. I personally dislike this as it "uglifies" my custom host files, so this patch makes it update the record in place, so to say, instead of...
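A rough sketch of an in-place update that keeps the existing ordering. This is not the Ganeti implementation; the function name, the signature and the list-of-lines representation are assumptions for illustration.

```python
def set_hosts_entry(lines, ip, hostname, aliases=()):
    """Return hosts-file lines with the entry for hostname updated in place.

    An existing line naming the host is replaced at its current position;
    the entry is appended only if the host was not listed before.
    """
    new_line = "\t".join([ip, hostname] + list(aliases))
    result = []
    replaced = False
    for line in lines:
        # Ignore comments when looking for host names on a line.
        fields = line.split("#", 1)[0].split()
        if len(fields) > 1 and hostname in fields[1:]:
            if not replaced:
                result.append(new_line)
                replaced = True
            # Any further line naming the same host is dropped.
            continue
        result.append(line)
    if not replaced:
        result.append(new_line)
    return result
```

The result can be joined and written in one go, so unrelated entries and comments keep their position.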
Convert utils.nodesetup to utils.WriteFile(data=…)
It makes no sense to iteratively write the new etc/hosts file, as we can pre-compute the desired contents (neither the old nor the new versions are safe against concurrent changes anyway).
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Fix WriteFile with unicode data
Unicode is fun, indeed:
  >>> len(buffer("abc"))
  3
  >>> len(buffer(u"abc"))
  12
So we can't pass unicode data to buffer(), as the result would be writing the in-memory (usually UTF-32) representation to disk.
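The usual remedy is to encode text explicitly before it reaches the low-level write. The sketch below uses Python 3 (where buffer() no longer exists) and a hypothetical helper name; the encoding chosen by the actual fix is not stated in this log, so UTF-8 is an assumption.

```python
def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding text with an explicit encoding."""
    if isinstance(data, str):
        # Encode explicitly so the on-disk format never depends on the
        # interpreter's internal string representation.
        return data.encode(encoding)
    return data
```

With this, the number of bytes written depends only on the chosen encoding, not on how the interpreter stores strings in memory.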
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
  node1# gnt-node add --readd node1
  Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
Fix potential data-loss in utils.WriteFile
os.write can do incomplete writes, as long as at least some bytes have been written (like write(2)):
  >>> os.write(fd, " " * 1300)
  1300
  >>> os.write(fd, " " * 1300)...
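Because os.write may return after writing only part of the data, a robust writer has to loop until everything is out. A minimal Python 3 sketch with a hypothetical helper name, not the code from the patch:

```python
import os


def write_all(fd, data):
    """Write all of data (bytes) to fd, retrying after partial writes."""
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        # os.write returns the number of bytes actually written, which
        # may be less than what was requested.
        offset += os.write(fd, view[offset:])
    return offset
```

The memoryview avoids copying the remaining data on every retry.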
RAPI: Add support for tagging node groups
masterd: Add support for tagging node groups
gnt-group: Add commands for tagging groups
cli: Fix wrong argument kind for groups
TLMigrateInstance: remove 10s sleeps
TLMigrateInstance._ExecMigration contains two 10-second sleeps between individual migration steps.
Apart from prolonging the migration duration by 20s, the second sleep causes FinalizeMigration to be called 10 seconds after the real...
Fix typo in LUGroupAssignNodes
gnt-instance info: automatically request locking
Commit dae661a4 added support for controlling the locking, but it didn't modify the gnt-instance info code, which leads to this command always showing:
Wed Apr 20 04:10:48 2011 - WARNING: Non-static data requested, locks...
Fix master IP activation in failover with no-voting
Thanks to net.for.hub@gmail.com for reporting this. The logic in masterd.CheckMasterd did an early return in case of no_voting, hence skipping the master IP activation. We just change the ifs to not return but simply continue through the function....
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:...
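The pitfall can be sketched as follows. The constant names and values are assumptions chosen so that the 10GiB threshold mentioned above falls out of the arithmetic; they are not confirmed by this log, and the function is an illustration rather than the patched code.

```python
# Assumed values for illustration only.
MAX_WIPE_CHUNK = 1024          # MiB, an integer upper bound
MIN_WIPE_CHUNK_PERCENT = 10    # percent of the disk size


def wipe_chunk_size(disk_size_mib):
    """Return the wipe chunk size in MiB, always as an integer."""
    percent_chunk = disk_size_mib * MIN_WIPE_CHUNK_PERCENT / 100.0
    # min() of an int and a float returns the float when it is smaller,
    # so the result is converted explicitly.
    return int(min(MAX_WIPE_CHUNK, percent_chunk))
```

Under these assumptions any disk smaller than 10240 MiB takes the percentage branch, which is exactly where the float value would otherwise leak out.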
gnt-group list: Query filter support
gnt-node list: Query filter support
Update manpage, quote field names.
gnt-instance list: Query filter support
Fix bug in watcher
If “utils.RunParts” were to raise an exception, a log message was written and the code continued to run. Due to the exception the “results” variable would not be defined.
Also change the code to log a backtrace (getting an exception is rather...
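The shape of such a fix can be sketched like this. The function and parameter names are hypothetical (run_parts stands in for “utils.RunParts”), and returning an empty list on failure is an assumption about the desired behaviour.

```python
import logging


def run_hooks(run_parts, hooks_dir):
    """Run all hooks in hooks_dir and return their results.

    run_parts is a callable standing in for utils.RunParts.
    """
    try:
        results = run_parts(hooks_dir)
    except Exception:
        # logging.exception records the message together with a backtrace.
        logging.exception("RunParts %s failed", hooks_dir)
        # Return early so "results" is never used while undefined.
        return []
    return results
```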
opcodes: Change parameter type definition for query filter
The old definition wouldn't accept integers.
cli: Add option to force names to be treated as filter
cli: Add support for parsing query filters
qlang: Add function to distinguish filters from names
cli: Error reporting for query filter parsing
qlang: Add parser for query filter language
With this parser, command line utilities will be able to provide filters through query2 in a simplistic language. Example filters:
name "node3.example.com" master or (name "node4.example.com") be/memory == 128 and name =~ /^web/i...
Add instance query field for OS parameters
These were not available as a query field before. Update unittests and description text for the other “..params” fields.
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel....
Fix shared_file_storage_dir on upgrades
If the cluster was upgraded from 2.4 or earlier, this key won't exist (it's only set to a correct value on cluster init), so we need to properly set it to a null string (disabled).
Prevent ssconf values from having non-string values
For whatever reason, my test cluster managed to acquire shared_file_storage_dir with a None value, instead of empty string. This is not flagged in masterd itself, but the node daemon will fail in writing the value to disk, as it calls len() on the...
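A check of this kind could look roughly like the sketch below (Python 3, hypothetical function name; where exactly Ganeti performs the check is not shown in this log).

```python
def check_ssconf_values(values):
    """Raise TypeError if any ssconf value is not a string."""
    for name, value in sorted(values.items()):
        if not isinstance(value, str):
            raise TypeError("ssconf key %r has non-string value %r" %
                            (name, value))
    return values
```

Failing early on the master side gives a clearer error than a len() failure later in the node daemon.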
cli: Replace hardcoded strings with constants
utils.WriteFile: Close file before renaming
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154) reported an “Operation not supported” error when writing instance exports to a mounted CIFS filesystem. Experimentation showed the error to only occur when using rename(2) on an opened file. Various references...
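The ordering that matters here, closing the temporary file before renaming it, can be sketched as follows. This is a simplified Python 3 illustration with a hypothetical function name, not the real utils.WriteFile.

```python
import os
import tempfile


def write_file_atomic(path, data):
    """Write data (bytes) to path via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The file is closed at this point; only now is it renamed.
        os.rename(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything went wrong.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Keeping the temporary file in the target directory ensures that the rename stays within one filesystem.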
Nicer formatting for group query error
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
gnt-instance migrate: Adding --allow-failover option
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
TLMigrateInstance: Merge failover code, allow fallback
As the failover code for checking is almost identical, it's an easy task to switch it over to TLMigrateInstance. This allows us to fall back to failover if migrate fails the prereq check for some reason....
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via the following mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which...
utils: Add function generating regex for DNS name globbing
The intent of this function is to be able to provide a globbing operator for query filters. One should be able to say, for example, something to the effect of “gnt-instance shutdown '*.site'”.
Also rename a variable in MatchNameComponent....
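A minimal sketch of how a glob pattern for DNS names can be turned into a regular expression. The function name is hypothetical, and the rule that wildcards never match across a dot is an assumption of this sketch, not something stated in the commit message.

```python
import re


def dns_glob_to_regex(pattern):
    """Return an anchored regex string for a DNS name glob pattern."""
    parts = []
    for char in pattern:
        if char == "*":
            # Any number of characters within a single name component.
            parts.append("[^.]*")
        elif char == "?":
            # Exactly one character within a single name component.
            parts.append("[^.]")
        else:
            parts.append(re.escape(char))
    return "^" + "".join(parts) + "$"
```

With this rule “*.site” matches “web1.site” but not “a.b.site”, because the asterisk stays within one name component.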
Verify file consistency using centrally computed list
Until now “gnt-cluster verify” (LUClusterVerify) would compute its own list of files to check for consistency. This list was not complete and certain inconsistencies were missed.
With this patch the code is changed to use the list of files used by...
cmdlib: Factorize computation of ancillary files
… and change the logic in _RedistributeAncillaryFiles. Virtually the same list of files will be used to verify the files' consistency.
qlang: Remove OP_GLOB operator
It'll be implemented using OP_REGEXP by the parser.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
query: Add implementation of regex match operator
So far this operator was not implemented. This patch adds an additional value preparation function to the function table for binary operators, used to compile the regular expression. Unittests are included....
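The idea of a per-operator value preparation step can be sketched as below. The table layout, the operator spellings and the function names are assumptions for illustration, not the structure used in Ganeti's query module.

```python
import re

# Hypothetical operator table: operator -> (prepare value, test function).
BINARY_OPS = {
    "==": (None, lambda lhs, rhs: lhs == rhs),
    "=~": (re.compile, lambda lhs, regex: regex.search(lhs) is not None),
}


def compile_binary(op, field, value):
    """Return a predicate applying a binary operator to item[field]."""
    prepare, test = BINARY_OPS[op]
    # The value is prepared once, so a regular expression is compiled
    # when the filter is built and not again for every item.
    prepared = prepare(value) if prepare else value
    return lambda item: test(item[field], prepared)
```

For example, compile_binary("=~", "name", "^web") returns a predicate that is true for {"name": "web1"} and false for {"name": "db1"}.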
cmdlib: Fix mistake made in commit 75c7520f0
Commit 75c7520f0 used the wrong constant. I double-checked all other changes made in the commit.
cmdlib: Replace hardcoded values with constants
daemon.py: move startup log message before prep_fn
Before this, the output in the rapi daemon log was:
  2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
  2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon...
Display the actual memory values in N+1 failures
This changes the display from:
  Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
RAPI: Convert instance shutdown to the new FillOpCode
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
ssh.VerifyNodeHostname: remove the quiet flag
This is not needed for this function, and can interfere with debugging of ssh failures.
Add a simple wrapper over utils.Retry
The new wrapper makes moving legacy code to utils.Retry or adding retries in existing code simpler.
Expose whether htools was enabled to Python code
This exports whether htools was enabled at configure-time, and adds a constant for our reference iallocator.
Automatically enable hail if enabled and found
test.ganeti.process_unittest: Fix race condition
There was a race condition on heavily loaded test systems that caused the timeout unittests to fail randomly, as the signal handler was not yet set up when the timeout had already hit.
Therefore we introduce a workaround to wait until a program reached a...
RAPI client: Remove support for version 0 instance creation requests
RAPI server: Drop support for instance creation format 0
Ganeti 2.1.3, released in June 2010, added support for a new, extensible instance creation request format, called version 1. This patch removes support for the old and undocumented version 0 format....
Improved GanetiRapiClient docstrings
- Added @rtype and/or @return where missing
- Fixed @param for Query() filter_ parameter (colon was missing)
Signed-off-by: Simeon Miteff <simeon.miteff@gmail.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Relax instance ERROR on admin_down on offline node
This fixes an issue where a stopped instance is reported as ERROR in cluster verify if it lives on an offline node. As the instance is down, this shouldn't happen.
Signed-off-by: René Nussbaumer <rn@google.com>...
Implement submitting jobs from logical units
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
watcher: improve logging a bit
Add some debug logging to detail why we don't run some steps.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>