TLReplaceDisks: Use implicit loop for dictionary
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
locking: Export “list_owned” from lock manager
This is analogous to “is_owned” and will be used for assertions.
gnt-instance: Fix typo in error message
The iallocator parameter is “-I”, not “-i”.
mlock: fail gracefully if libc.so.6 cannot be loaded
This allows noded to continue instead of blowing up if the libc major number changes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Signed-off-by: Iustin Pop <iustin@google.com>...
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Fix WriteFile with unicode data
Unicode is fun, indeed:
len(buffer("abc"))
3
len(buffer(u"abc"))
12
So we can't pass unicode data to buffer(), as the result will be to write the in-memory (usually UTF-32) representation to disk.
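The session above is Python 2, where buffer() exposes the interpreter's internal representation of a unicode object. As a rough illustration of the same idea in modern Python, the sketch below encodes text explicitly before it reaches a byte-oriented write. The helper name to_bytes and the UTF-8 default are assumptions made for this example, not part of the original patch.

```python
def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding text explicitly.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    if isinstance(data, str):
        # Text is encoded deliberately, so the on-disk form never depends
        # on how the interpreter stores strings in memory.
        return data.encode(encoding)
    # bytes, bytearray and memoryview are passed through as bytes.
    return bytes(data)


payload = to_bytes(u"abc")
```

With this approach the length of the payload matches the encoded form of the text, not its in-memory layout.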
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
node1# gnt-node add --readd node1
Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
Fix potential data-loss in utils.WriteFile
os.write can do incomplete writes, as long as at least some bytes have been written (like write(2)):
os.write(fd, " " * 1300)
1300
os.write(fd, " " * 1300)...
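A common way to guard against short writes is to loop until every byte has been handed to the kernel. The following is a minimal sketch of that pattern, not the actual utils.WriteFile change; the function name write_all is made up for the example.

```python
import os


def write_all(fd, data):
    """Write all of data to fd, retrying after short writes.

    Hypothetical helper illustrating the retry loop.
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write returns how many bytes were accepted, which may be
        # fewer than requested; continue from that offset.
        written += os.write(fd, view[written:])
    return written
```

The return value of each os.write call is the only reliable indication of progress, so the loop advances by exactly that amount.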
cli: Fix wrong argument kind for groups
Quote filename in gnt-instance.8
Fix typo in LUGroupAssignNodes
gnt-instance info: automatically request locking
Commit dae661a4 added support for controlling the locking, but it didn't modify the gnt-instance info code, which leads to this command always showing:
Wed Apr 20 04:10:48 2011 - WARNING: Non-static data requested, locks...
Document the dependency on OOB for gnt-node power
Fix master IP activation in failover with no-voting
Thanks to net.for.hub@gmail.com for reporting this. The logic in masterd.CheckMasterd did an early return in case of no_voting, hence skipping the master IP activation. We just change the ifs to not return but simply continue through the function....
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:...
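To illustrate how min() can hand back a float, here is a small sketch under stated assumptions. The constant names and values (a 1024 MiB cap and a 10 percent share) are chosen only so that the crossover falls at 10 GiB as the message describes; they are not taken from the patch, and the int() coercion is one possible fix, not necessarily the one applied.

```python
# Assumed values for illustration only.
MAX_WIPE_CHUNK_MIB = 1024
MIN_WIPE_CHUNK_PERCENT = 10


def wipe_chunk_size(disk_size_mib):
    """Return an integer chunk size in MiB for a disk of the given size."""
    by_percent = disk_size_mib / 100.0 * MIN_WIPE_CHUNK_PERCENT
    # min() returns whichever operand is smaller, keeping its type, so a
    # small disk would yield a float here; coerce the result to int.
    return int(min(MAX_WIPE_CHUNK_MIB, by_percent))
```

For disks at or above the crossover the integer cap wins; below it the percentage-based value wins and is truncated to a whole number of MiB.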
Fix bug in watcher
If “utils.RunParts” were to raise an exception, a log message was written and the code continued to run. Due to the exception the “results” variable would not be defined.
Also change the code to log a backtrace (getting an exception is rather...
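The failure mode described here, a name left unbound after a caught exception, can be sketched as follows. This is only an illustration of the pattern: run_parts is passed in as a stand-in callable, and the function name run_hooks and the empty-list fallback are assumptions, not the watcher's actual code.

```python
import logging


def run_hooks(run_parts, hooks_dir):
    """Run hooks and return their results, or an empty list on failure.

    run_parts is a stand-in callable; this is not the watcher's real API.
    """
    try:
        results = run_parts(hooks_dir)
    except Exception:
        # logging.exception records the message together with a backtrace.
        logging.exception("RunParts %s failed", hooks_dir)
        # Return early so nothing below touches the unbound "results".
        return []
    return results
```

Returning (or assigning a default) inside the except branch ensures later code never reads a variable that was never set.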
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel....
utils.WriteFile: Close file before renaming
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154) reported an “Operation not supported” error when writing instance exports to a mounted CIFS filesystem. Experimentation showed the error to only occur when using rename(2) on an opened file. Various references...
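The ordering this commit describes (close first, then rename) can be sketched like this. It is a minimal illustration, not the utils.WriteFile implementation; the function name write_then_rename and the use of tempfile.mkstemp are assumptions for the example.

```python
import os
import tempfile


def write_then_rename(path, data):
    """Write data to a temporary file, close it, then rename it into place.

    Hypothetical helper; only the close-before-rename ordering matters here.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # The file is closed at this point, so rename(2) never operates
        # on an open file.
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the destination directory keeps the rename on a single filesystem.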
Fix distcheck
README is not copied to the build tree.
Nicer formatting for group query error
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
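The difference comes down to interpolating a list directly versus joining its items first. A minimal sketch, with a made-up function name:

```python
def format_missing_groups(names):
    """Build the error text with names joined, not shown as a list repr."""
    # Interpolating the list itself would print brackets and quotes;
    # joining the items yields plain comma-separated names.
    return "Some groups do not exist: %s" % ", ".join(names)


message = format_missing_groups(["foo", "bar"])
```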
gnt-instance.8: Fix wrongly formatted title
Update version in README
Also add a check to Makefile's check-local target.
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via the following mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which...
daemon.py: move startup log message before prep_fn
Before this, the output in the rapi daemon log was:
2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon...
Display the actual memory values in N+1 failures
This changes the display from:
Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
ssh.VerifyNodeHostname: remove the quiet flag
This is not needed for this function, and can interfere with debugging of ssh failures.
Add error checking and merging for cluster params
Set the default stderr logging level to WARNING so the relevant output can be seen.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
RAPI: Document need for Content-type header in requests
This was added to the NEWS file in commit ab221ddf, but never documented properly.
Fix output for “gnt-job info”
If the result of an opcode was a non-empty dictionary, it would be impossible to differentiate between input and result:
Input fields: […] debug_level: 0 fields: cluster_name,master_node,volume_group_name jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]...
watcher: Fix misleading usage output
When “ganeti-watcher” is called with an argument, it would hint at a non-existing “-f” parameter. With this patch the separate usage string is no longer necessary.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Clarify --force-join parameter message
This isn't only used during cluster merge.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
locking: Fix race condition in lock monitor
In some rare cases it can happen that a lock is re-created very soon after deletion, while the old instance hasn't been destructed yet. In such a case the code would detect a duplicate name and raise an exception....
utils: Export NiceSortKey function
The ability to split a string into a list of strings and integers can be handy elsewhere and is necessary for sorting query results by names.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
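Splitting a name into alternating text and integer parts is a common way to build a natural sort key. The sketch below shows the general technique only; it is not Ganeti's NiceSortKey, and the function name nice_sort_key is chosen for the example.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def nice_sort_key(name):
    """Split name into alternating text and integer parts for sorting.

    Illustrative sketch of a natural sort key, not Ganeti's implementation.
    """
    parts = _DIGITS.split(name)
    # With a capturing group, re.split puts the digit runs at odd indices
    # and the surrounding text (possibly empty) at even indices, so the
    # same position always holds the same type across keys.
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]


ordered = sorted(["node10", "node2", "node1"], key=nice_sort_key)
```

Because digit runs compare as integers, node2 sorts before node10, which a plain string comparison would not do.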
Revert "Only merge nodes that are known to not be offline"
This reverts commit 288f240f62dafa8bd8ba7482c8367adbdf6d96c2.
That commit was buggy at various levels:
- broke ssh access to the second cluster, making cluster-merge unusable (unless ssh keys were previously set up?)...
cluster-merge: only operate on online nodes
The node list in MergerData is used only to:
- stop ganeti on the nodes
- readd the nodes to the cluster
As such, offline nodes should be skipped from it.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Only merge nodes that are known to not be offline
Otherwise the readd will fail, breaking the merge.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default from the cmdline, which causes problems for cluster-merge. This patch means that the following now works:
gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as an index without checking.
Fix bug related to log opening failures
If opening the log file fails, then we shouldn't attempt to use that variable.
Bump version for 2.4.1 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
cfgupgrade: Fix critical bug overwriting RAPI users file
The cfgupgrade tool was designed to be idempotent, meaning it could be run several times and still produce the correct result. Ganeti 2.4 moved the file containing the RAPI users to a separate directory...
Release 2.4.0
NEWS update and version bump.
Merge branch 'devel-2.3' into devel-2.4
Small improvement to the ganeti man page
Also specifies the comma-escaping feature.
Merge branch 'devel-2.2' into devel-2.3
Fix LUClusterRepairDiskSizes and rpc result usage
This LU was introduced before the RPC result conversion from .data to .payload, and it has managed to keep the old-style usage (how? it's the only LU that does so). Fix by changing to payload, and add some...
Fix RPC mismatch in blockdev_getsize[s]
Commit 92fd2250 added consistency checks in the RPC layer, which broke the call_blockdev_getsizes RPC call (declared with 's' at the end in rpc.py, without 's' in the node daemon).
The immediate fix is to correct the rpc function name, the long term...
RAPI: fix evacuate node resource
PollJob returns the whole op_results, hence a list of opcode results.
Merge remote branch 'stable-2.4' into devel-2.4
Fix typo in kvm-ifup script
Reported-by: Bas Tichelaar <bas@30loops.net>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
NEWS: Replace smartquotes, start lines with uppercase
- Sphinx converts ASCII quotes ("") to smartquotes (“”) automatically
- Sentences or list items start with an uppercase letter
- Changed description of non-verbose “gnt-* list” output slightly
Fix LU processor's GetECId
The exception was never actually raised.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
Update NEWS and release 2.4.0 rc3
Merge branch 'devel-2.4' into stable-2.4
Fix potential data-loss bug in disk wipe routines
For the 2.4 release, we only add the missing RPC calls. However, this needs to be fixed properly, by preventing usage of mis-configured disks.
Also add a bit more logging so that it's directly clear on which node...
1-char comment typo fix
Expand some acronyms, add to glossary
query_unittest: Fix argument to set()
Commit e431074f introduced an uncaught bug, which this patch fixes. set() expects a list or iterable to work on, so it split the provided instance name into a set of characters. This caused exp_status never to be set and therefore not caught in one assert rule...
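The pitfall is easy to reproduce: a string is itself iterable, so passing it straight to set() yields its characters. A short sketch, with an instance name made up for the example:

```python
# Hypothetical instance name, used only for illustration.
name = "inst1.example.com"

# A string is iterable, so set() collects its individual characters.
as_chars = set(name)

# Wrapping the name in a list gives a one-element set holding the name.
as_names = set([name])
```

A membership test for the full name succeeds only against the second set.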
Fix title of query field containing instance name
Update news and bump version for 2.4.0 rc2
Fix pylint warnings
- 1 80-char line infraction
- 4 changes in how arguments are passed to logging functions
- 3 pylint disable-msg's because cluster-merge needs to access ganeti config internals
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
TestRapiInstanceRename use instance name
Currently the QA rename job wrongly passed the whole info dict to the client.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Change the list formatting to use 'special' chars
And also enable verbose display via the, well, verbose option. Man page and tests are updated, and the formatting is moved from 4 if statements to a data structure.
Add support for merging node groups
Add option to rename groups on conflict
Fix minor docstring typo
Add QA rapi test for instance reinstall
This tests at least the basic case, unfortunately there is no way to check all possibilities using the provided rapi client, as that will use the new method unless the cluster doesn't support it.
RAPI: remove required parameters for reinstall
Before c744425f354f1bef2d0d7d306e2d00c494d67d2b instance reinstall accepted the "os" and "nostartup" optional query parameters. With that commit it was changed to allow "os", "start" and "osparams" via body rather than encoded in the URL. Unfortunately that commit introduced a...
NodeQuery: don't query non-vm_capable nodes
Because non-vm_capable nodes most likely don't have a hypervisor configured and/or storage, so the call will fail anyway.
NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
Since we don't have the data per design, UNAVAIL is appropriate here, while NODATA is not.
The patch also adds a comment: if we extend the live fields list to contain other data in the future, we need to reevaluate this solution....
Fix HV/OS parameter validation on non-vm nodes
Currently, there is at least one LU that does wrong validation of HV parameters (against all nodes, LUClusterSetParams). It's possible to fix this case, but I went and modified the base functions to filter out non-vm_capable nodes so all callers are protected....
Remove superfluous redundant requirement
The condition is already covered by the previous requirement.
Don't remove master_candidate flag from merged nodes
Prevents lots of spurious warnings like:
2011-02-10 17:00:22,776: CRITICAL Configuration data is not consistent: Not enough master candidates: actual 3, target 4
Signed-off-by: Stephen Shirley <diamond@google.com>...
Use a consistent ECID base
ECID was being calculated completely differently in __MergeNodeGroups() and _MergeConfig()
listrunner: convert from getopt to optparse
The “-A” (use agent) option was not documented, and instead of adding a manual listing, I converted it to optparse like the other CLI tools.
Note that I cleaned up the usage and help texts a bit.
listrunner: fix agent usage
By delaying the agent key query until after the fork, we prevent the problem of simultaneous access to the agent.
Tested that it works against 80 hosts in parallel without error; the current version breaks already at 20 hosts....
Revert "Disable the cluster-merge tool for the moment"
This reverts commit c0711f2cb989facd60430ab18c5b0e59a1f279ac.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix cluster-merging by not stopping noded
cli.RunWhileClusterStopped() stops noded on all of the nodes in the original cluster. This prevents /etc/hosts updates on the master, and config redistribution doesn't reach the other nodes in the original cluster. As all we want to do is merge while the master is stopped,...
Fix bug in iallocator data structures build
Commit a1cef11c fixed non-vm_capable nodes export, but inadvertently broke offline nodes. The update of the dict only needs to happen for online nodes, in the 'if' block.
Without this patch, offline nodes keep the data from the last node...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give the same message:
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: instance lives on offline node(s) node3...
Minor reordering to match param order
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows N+1 warnings for offline nodes having any redundant instances since the memory data that we have for those nodes is zero, so any instance will trigger the warning....
Handle gnt-instance shutdown --all for empty clusters
The current code gives:
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Selection filter does not match any instances
Use gnt-node add --force-join to add foreign nodes
Add --force-join option to gnt-node add
This is needed so cluster-merge can add nodes from other clusters.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Fix iterating over node groups
Current line tries to unpack dict incorrectly
Update NEWS file for the 2.4.0 rc1 release
Also bump up the version.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Disable the cluster-merge tool for the moment
Hopefully this can be fixed before the final 2.4 release…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>