locking: Fix lock deletion with timeout
While working on another SharedLock fix I realized timeouts on lockdeletion don't work very well if the timeout actually expires. Thispatch fixes the issue and adds a new unittest.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Move _TimeoutExpired to utils
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit f8326fcaac87958241d78526e5868d23d78ac286)
Workaround changed LVM behaviour
The vgreduce command has changed behaviour from when we initiallywrote the code (2.02.02 versus 2.02.66, 4 years delta):
- if there are LVs which will be impacted, it requires --force- otherwise refuses to proceed, but it still returns exit code 0...
Add UnescapeAndSplit unittest for multi-escapes
This would have caught the bug in the first place. Argh,hand-generated test cases!
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Merge branch 'stable-2.5' into devel-2.5
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
KVM: support version reported by 1.0
This of course was working for all the rcs, but broke with 1.0 itself.
In addition: - split between running kvm --version and parsing its output - unittest parsing for various known --help outputs - updated NEWS file...
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
jqueue: Factorize checking job processor's result
This allows for more unittesting.
jqueue unittest: Rename simple fake-job class
Also add a parameter for priority, to be used in an upcomingpatch.
jqueue: Fix deadlock between job queue and dependency manager
When an opcode is about to be processed its dependencies areevaluated using “_JobDependencyManager.CheckAndRegister”. Dueto its nature that function requires a lock on the manager'sinternal structures. All of this happens while the job queue...
utils.io.WritePidFile: Improve error reporting
If the PID file is already locked by another process, try to readthe content and report it as part of the error message.
utils.ListVisibleFiles: Hide “lost+found” directories
If a “lost+found” directory is found at a filesystem's root path it isignored. In all other cases directory entries named “lost+found” aretreated normally. Unittests are included. Fixes issue 153.
Fix race condition in test for *FileID functions
In this test the “file ID” of a temporary file is compared against thefile ID gathered via an open file descriptor to the same file. Forreasons unknown to me utime(2) is called in-between to update theinode's a- and mtime. Depending on the file system's timestamp...
LUGroupAssignNodes: Fix node membership corruption
Note: This bug only manifests itself in Ganeti 2.5, but since theproblematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes” the...
Separate OpNodeEvacuate.mode from iallocator
Until now the iallocator constants for node evacuation(IALLOCATOR_NEVAC_*) were also used for the opcode. However, it turnedout this was due to a misunderstanding and is incorrect. This patch addsnew constants (with the same values) and changes the affected places....
Fail if node/group evacuation can't evacuate instances
If an instance can't be evacuated, only a message would be printed. Withthis change the operation always aborts. Newly added unittests check forthis behaviour.
Generalize HooksMaster
- remove any dependence on Logical Units from the HooksMaster;- add a new function parameter to the constructor, a function that is expected to convert the results of the hooks execution in a format understood by the HooksMaster;...
Fix wrong headers and licences
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Stephen Shirley <diamond@google.com>
Fix parameters of RpcResult in hooks unit tests
In FakeHooksRpcSuccess, the data parameter of the RpcResult constructorwas not enclosed in a tuple. While this does not make the test fail, itmust be fixed.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>...
Fix a too long line.
That's what you get for not running make lint :(
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Move RenameFile to the new functions
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
ensure_dirs: Move some useful functions into utils.
With this change we can easily reuse this functionality where it makessense on other parts of Ganeti.
Add the JoinDisjointDicts function to utils.algo
Add a function that joins two dictionaries, enforcing the constraintthat the two key sets should be disjoint. Also, add unit tests for thisfunction.
Allow per-hypervisor optional files
Rather than just allowing files for all nodes to be optional, we allowoptional files to be per-category. The way this works is that they mustbe included in both lists (the new code checks for this).
The code also removes a duplicate assert (present both in verify and...
RAPI: Make node evacuation actually work
Commit e1f23243 changed te LU and opcode for node evacuation to receivea “mode” parameter (among other things). Commit de40437a changed theRAPI code accordingly, but did so for an earlier version of the firstpatch. Obviously this couldn't work, so here's the fix....
RAPI: Fix resource for replacing disks
Commit d1c172deb4f inadvertently changes the“/2/instances/[instance_name]/replace-disks” resource to use bodyparameters. There were no QA tests and the issue wasn't noticed.
This patch re-introduces support for query parameters and adds a QA...
rapi: Add resource for modifying node
A separate patch will add “auto-promote” through“/2/nodes/[node_name]/role”.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain allfiles (e.g. config.data). With commit aef59ae764dc (March 31st, 2011)the logic was changed and subsequently verifying a cluster with non-mcnodes would complain....
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes inthe same group, as well as the first node for every other group. On bigclusters this can cause issues since many nodes will try to connect to...
Add SPICE support to gnt-instance console
Also update related unit tests.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
utils: Introduce IsBelowDir
This is mainly a wrapper to overcome the limitation of commonprefixwhich makes a string by string comparisation and reports the commonprefix in both strings. However this is bad for directory handling.
IsBelowDir works around this limitation and should be used in favour of...
Draft implementation of QMP connection
Basic implementation of the QMP connection and related tests.
Merge branch 'devel-2.4' into devel-2.5
Conflicts: NEWS (trivial) configure.ac (trivial) daemons/ensure-dirs.in (deleted)
utils: Fix UnescapeAndSplit parsing bug
If a value passed to UnescapeAndSplit ended with a backslash anexception would be raised:
$ gnt-instance modify -H mem=x\\ inst1.example.com[…] e2 = slist.pop(0)IndexError: pop from empty list
Adding missing test data for commit 7a380ddfc
Fix a parsing issue with DRBD 8.3.11 in the Linux Kernel
In the Linux kernel commit 4b0715f096 introduced a display bug into/proc/drbd which broke our regex.
The bug was first introduced into Linux 2.6.39-rc1. This bug is stillunfixed as of today.
This patch adapt the regular expression to workaround this bug for the...
ensure-dirs: Check mode and owner before changing
This avoids many calls to chmod(2) and chown(2), and thereby ctimeupdates.
Since I had to update the unittests anyway I untangled the code a bit,split it into more separate functions and added some more tests....
ensure-dirs: Refine error handling on stat(2)
The “_stat_fn” function is renamed to “_lstat_fn” to reflect itsfunction. The try/except block just wraps calling lstat(2) and nothingelse.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Clarify job ID-related type checks, add unittests
Instead of a rather complicated expression only “JobId” is output. JobID lists (like generated by “SubmitManyJobs”) are limited to two-itemlists. Unittests are added.
Added helper functions in netutils and related constants
Added the following functions to netutils:- IsValidInterface- GetInterfaceIpAddresses- _GetIpAddressesFromIpOutput
Added the following static methods to netutils.IPAddress:- GetAddressFamilyFromVersion...
rlib: Expose node group tags
Commit 1ffd26739d3 added support for tagging node groups. Also add acheck for exposed fields.
Detect globbing patterns as query arguments
Short: this patch enables the use of “gnt-instance list '*.site'”.
Detailed description: This patch changes the command line interface codeto try to deduce the kind of filter from the arguments to a “list”command. If it's a list of plain names an old-style name filter is used....
Implement globbing operator for filters
The operators “=*” and “!*” do globbing in filters, e.g.:
$ gnt-instance list --no-headers -o name 'name =* "*.site"'inst1.site.example.com
Bump version to 2.5.0~beta1
utils.ReadFile: Add pre-read callback
This will be used by the watcher to store the file's fstat(2). It mustbe done from the filehandle.
cleaner: Remove watcher's instance status file after 21 days
Fix unittest failure after list_owned changes
We just need an object that has a list_owned method.
ganeti-cleaner: Remove old watcher state files
Watcher state files can stay around if node groups are removed. Withthis patch they're removed after 21 days.
Add primary/second nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieveinstances by group.
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results usinght-based functions. Checks are added for two opcodes.
Merge branch 'devel-2.4'
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file isreopened when a daemon is sent a HUP signal. Due to a bug in the code,fixed by this patch, the log file would be reopened for every single logmessage thereafter....
Implement instance failover via RAPI
No idea why this was missed before.
Make lock monitor more versatile
With this change it'll be possible to register other lock informationproviders. One usecase for this are job dependencies, which can be shownin the output of “gnt-debug locks”, too.
The lock monitor is changed to accept more than one return value from...
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pendingName Pendingjob/890 job:891,892job/892 job:894
Rename *_STATUS_WAITLOCK to …_WAITING
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs.
Fix locking issue with job dependencies
When jobs waiting for a dependency are notified, they're re-added to thequeue. This would require owning the queue lock in exclusive mode, butsince the function doing so is called from within the job/opcodeprocessor, it only holds the lock in shared mode....
jqueue: Implement submitting multiple jobs with dependencies
With this change users of the “SubmitManyJobs” interface can userelative job dependencies. Relative job IDs in dependencies are resolvedbefore handing the job off to the workerpool.
jqueue: Add “writable” flag to memory objects
Basically only one instance of the job, the one being processed,should be serialized to disk and replicated to other nodes. Withthis flag assertions can be added in various places.
Implement chained jobs
An overview is available in the design document for this change,doc/design-chained-jobs.rst.
When a job enters the job processor, the current opcode's dependenciesare evaluated. If a referenced job has not yet reached the desired...
Adding a wrapper around connecting to kvm console
The wrapper will connect to the console, and check in the background ifthe instance is paused, unpausing it as necessary.
Signed-off-by: Stephen Shirley <diamond@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
cli.GetOnlineNodes: Support node group filter, use query2
This patc changes cli.GetOnlineNodes to use query2, which does thefiltering in the master daemon, and adds a new parameter to filter bynode group.
Unittests were added for the old implementation and then adopted to...
ht: Add new check for numbers
Places which receive floats can usually also deal with integers, e.g.OpTestDelay. Tests are added and the new check function is used for theaforementioned opcode and verifying query results.
Change RAPI for new node evacuation opcode
The change is not backwards compatible, see the updated NEWS file.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Adding basic abstraction layer for caching
This includes an own simple cache implementation and aninterface to a memcache instance.
Fix _checkRsaPrivateKey for newer key generation
Keys generated under debian sid just read "BEGIN PRIVATE KEY" ratherthan "BEGIN RSA PRIVATE KEY".
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix incomplete merge
Commit 66bd7445 changed the semantics of _JobProcessor on finishedjobs, and updated the related unittests in the 2.4 branch. It was thenmerged to master, however on master there was an additional test forthis case, which was not updated....
jqueue: Fix potential race condition when cancelling queued jobs
When a job was cancelled, its status would be changed and the filewritten again. Since this was a final status, the job file could bemoved anytime for archival. If the job was still in the queue, however,...
gnt-node migrate: Use LU-generated jobs
Until now LUNodeMigrate used multiple tasklets to evacuate all primaryinstances on a node. In some cases it would acquire all node locks,which isn't good on big clusters. With upcoming improvements to the LUsfor instance failover and migration, switching to separate jobs looks...
ht: Add checks for anything, regexp, job ID, container items
The check for container items is useful for tuples and/or lists withnon-uniform values. The “anything” check can be used when any valueshould be accepted for an item.
The job ID check, which uses the regexp check, will be used for...
GetEntResolver: Make it possible to resolve uid/gid to name
utils.algo: Add InvertDict to invert a dict
Improve hooks documentation unittest
Also check for the opcode ID.
Split LUClusterVerify into LUClusterVerify{Config,Group}
With this change, LUClusterVerifyConfig becomes a "light" LU that onlyverifies the global config and other, master-only settings, and the bulk ofnode/instance verification is done by LUClusterVerifyGroup, which only acts...
ht: Add strict check for dictionaries
This allows checking specific dictionary items, unlike TDictor TDictOf.
SharedLock: Implement downgrade from exclusive to shared mode
If a job needs to modify a resource and then wait for a result, it mustacquire the resource lock in exclusive mode. In some cases it would bepossible to only have a shared lock for waiting. Until now it was not...
Re-indent test/mocks.py using two spaces
No idea where those four spaces came from, but they must've been therefor a while.
cmdlib: Remove acquired_locks attribute from LUs
The “acquired_locks” attribute in LUs is used to keep a list of acquiredlocks at each lock level. This information is already known in the lockmanager, which also happens to be the authoritative source. Removing the...
opcodes: Add function for compact summary
Depending on the opcode and its parameters, the existing “Summary”function can give a rater long summary. For displaying the summary inlogs and in the lock monitor, it should be shorter. Hence this newfunction is added to just use the opcode ID with common prefixes...
locking: Make parameter to condition's wait() positional
It is always used in the locking code. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
SetEtcHostsEntry: maintain existing ordering
Currently RemoveEtcHostsEntry keeps the ordering, but SetEtcHostsEntrynot, as it will always write the new entry at the end of file. Ipersonally dislike this as it "uglifies" my custom host files, so thispatch makes it update the record instead in-place so to say instead of...
Fix WriteFile with unicode data
Unicode is fun, indeed:
len(buffer("abc"))
3
len(buffer(u"abc"))
12
So we can't pass unicode data to buffer(), as the result will be towrite the in-memory (usually UTF-32) representation to disk.
Signed-off-by: Iustin Pop <iustin@google.com>...
RAPI: Add support for tagging node groups
qlang: Add function to distinguish filters from names
qlang: Add parser for query filter language
With this parser, command line utilities will be able to provide filtersthrough query2 in a simplistic language. Example filters:
name "node3.example.com" master or (name "node4.example.com") be/memory == 128 and name =~ /^web/i...
Add instance query field for OS parameters
These were not available as a query field before. Update unittestsand description text for the other “..params” fields.
utils.WriteFile: Close file before renaming
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154)reported an “Operation not supported” error when writing instanceexports to a mounted CIFS filesystem. Experimentation showed the errorto only occur when using rename(2) on an opened file. Various references...
Merge branch 'stable-2.4' into devel-2.4
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via thefollowing mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance- the watcher starts and submits its query instances opcode which...
utils: Add function generating regex for DNS name globbing
The intent of this function is to be able to provide a globbing operatoror query filters. One should be able to say, for example, something tothe effect of “gnt-instance shutdown '*.site'”.
Also rename a variable in MatchNameComponent....
query: Add implementation of regex match operator
So far this operator was not implemented. This patch adds an additionalvalue preparation function to the function table for binary operators,used to compile the regular expression. Unittests are included....