Revert "Disable the cluster-merge tool for the moment"
This reverts commit c0711f2cb989facd60430ab18c5b0e59a1f279ac.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix cluster-merging by not stopping noded
cli.RunWhileClusterStopped() stops noded on all of the nodes in the original cluster. This prevents /etc/hosts updates on the master, and config redistribution doesn't reach the other nodes in the original cluster. As all we want to do is merge while the master is stopped,...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give the same message:
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: instance lives on offline node(s) node3...
Minor reordering to match param order
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows N+1 warnings for offline nodes having any redundant instances since the memory data that we have for those nodes is zero, so any instance will trigger the warning....
Handle gnt-instance shutdown --all for empty clusters
The current code gives:
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Selection filter does not match any instances
Signed-off-by: Stephen Shirley <diamond@google.com>...
Use gnt-node add --force-join to add foreign nodes
Add --force-join option to gnt-node add
This is needed so cluster-merge can add nodes from other clusters.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Fix iterating over node groups
Current line tries to unpack dict incorrectly
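A common way this kind of bug arises in Python is iterating over a dict directly while expecting key/value pairs. The sketch below is illustrative only: the `node_groups` mapping and both function names are hypothetical and not taken from the Ganeti source.

```python
# Hypothetical stand-in for a mapping of node group UUID -> group data.
node_groups = {
    "uuid-1": {"name": "default"},
    "uuid-2": {"name": "rack2"},
}


def group_names_broken(groups):
    # Iterating a dict yields only its keys, so the tuple unpacking is
    # applied to each key string and fails.
    return [group["name"] for (_, group) in groups]


def group_names_fixed(groups):
    # .items() yields (key, value) pairs, which is what the loop expects.
    return sorted(group["name"] for (_, group) in groups.items())
```

With six-character keys such as the ones above, the broken variant raises ValueError because each key cannot be unpacked into two names.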
Update NEWS file for the 2.4.0 rc1 release
Also bump up the version.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Disable the cluster-merge tool for the moment
Hopefully this can be fixed before the final 2.4 release…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>
Bump up intra-cluster import connect timeout
Currently, the export timeout is 10 times 20 seconds, but the import is only 30 seconds. I'm raising this to 60 seconds with two goals in mind:
- when debugging manually, this allows for easier synchronisation of...
Import-export: fix logging of daemon output
In case of failures, the recent daemon output is logged as %r on a list of unicode strings, which results in the (ugly):
Thu Feb 3 05:13:34 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: [u' DUMP: Date of this level 0 dump: Thu Feb 3 05:13:18 2011', u' DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output', u' DUMP: Label: none', u' DUMP: Writing 10 Kilobyte records', u' DUMP: mapping (Pass I) [regular files]', u' DUMP: mapping (Pass II) [directories]', u' DUMP: estimated 54301 blocks.', u' DUMP: Volume 1 started with block 1 at: Thu Feb 3 05:13:19 2011', u' DUMP: dumping (Pass III) [directories]', u' DUMP: dumping (Pass IV) [regular files]', u'socat: E SSL_write(): Connection reset by peer', u"dd: dd: writing `standard output': Broken pipe", u' DUMP: Broken pipe', u' DUMP: The ENTIRE dump is aborted.'])...
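The difference between logging a list with %r and logging its lines joined can be sketched as follows. This is only one possible approach, since the commit message is truncated before it describes the actual fix; the function names are hypothetical. Note that on Python 3 the `u'...'` prefixes no longer appear, but the bracketed, quoted list is just as hard to read.

```python
def format_with_repr(lines):
    # Old style: the whole list is rendered through %r on a single line.
    return "recent output: %r" % (lines,)


def format_joined(lines):
    # One output line per log line, indented and without quoting noise.
    body = "\n".join("  " + line.rstrip() for line in lines)
    return "recent output:\n" + body


if __name__ == "__main__":
    sample = [" DUMP: Broken pipe", " DUMP: The ENTIRE dump is aborted."]
    print(format_with_repr(sample))
    print(format_joined(sample))
```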
Fix handling of ^C in the CLI scripts
This adds a message and nice handling of ^C, especially useful for ``gnt-job watch``.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
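A minimal sketch of handling ^C in a command-line entry point, assuming the usual Python approach of catching KeyboardInterrupt. The wrapper name `run_cli`, the message text and the exit code are all hypothetical and not the ones used by the Ganeti CLI.

```python
import sys


def run_cli(main_fn, stderr=sys.stderr):
    """Run main_fn and turn ^C into a short message and an exit code."""
    try:
        return main_fn()
    except KeyboardInterrupt:
        # Hypothetical message and exit code, chosen for illustration.
        stderr.write("Interrupted by user\n")
        return 1


if __name__ == "__main__":
    sys.exit(run_cli(lambda: 0))
```

Catching the exception at the outermost level avoids a traceback while leaving the rest of the command code unchanged.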
Merge branch 'devel-2.3' into devel-2.4
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
backend: Disable compression in export info file
The new import/export infrastructure in Ganeti 2.2 and up handles compression differently. It no longer writes compressed files to the destination. Unfortunately changing this behaviour would be non-trivial,...
utils.SetupLogging: Return function to reopen log file
This function can be used from a SIGHUP handler to reopen log files. Initial, simple unittests are included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reopen log files upon SIGHUP in daemons
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
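The pattern of a logging setup function that returns a "reopen" callable, wired to SIGHUP, might look roughly like this. It is a sketch under assumptions: `setup_logging` and `install_sighup_reopen` are hypothetical names and do not mirror the real signature of utils.SetupLogging; the sketch simply swaps in a fresh FileHandler on each reopen.

```python
import logging
import signal


def setup_logging(logfile, program):
    """Configure file logging; return a function that reopens the file."""
    logger = logging.getLogger(program)
    state = {"handler": None}

    def reopen():
        # Open the path again (it may have been rotated away) and
        # replace the previous handler, if there was one.
        new = logging.FileHandler(logfile)
        new.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(new)
        old = state["handler"]
        if old is not None:
            logger.removeHandler(old)
            old.close()
        state["handler"] = new

    reopen()
    return reopen


def install_sighup_reopen(reopen_fn):
    """Call reopen_fn whenever the process receives SIGHUP (POSIX only)."""
    def _handle_sighup(signum, frame):
        reopen_fn()
    signal.signal(signal.SIGHUP, _handle_sighup)
```

A log rotation tool can then rename the file and send SIGHUP; subsequent messages go to a newly created file at the original path.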
utils.SetupLogging: Make program a mandatory argument
It's passed in by most users (daemons, CLI scripts) and for the others (burnin, watcher) it certainly doesn't hurt, especially when using syslog.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
utils.log: Restrict I/O error handling coverage
The I/O error will occur while opening the file, not while adding and configuring the handler.
utils.log: Split formatter building into separate function
burner: Trivial code cleanup
- Use constant for exit value
- Configure logging from main function, not from class' “__init__”
burnin: Reuse existing function for debug value
Instead of using its own, burnin can use cli.SetGenericOpcodeOpts.
Merge node groups from other cluster
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Enforce that new node groups have unique names
Add _UnlockedLookupNodeGroup()
This allows calling of _UnlockedLookupNodeGroup() from within AddNodeGroup()
Fix grammar of var naming
flatten is the verb, flattened is the adjective.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Minor grammar fix in QuitGanetiException docstring
cluster-merge should refuse to merge own cluster
Also fix type of Merger.cluster_name from list to string. This would have triggered an error in sshRunner if cluster keys were in use.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Introduce re-openable log record handler
This patch adds a new log handler class based on the standard library's BaseRotatingHandler. This new class allows the log file to be re-opened, e.g. upon receiving a SIGHUP signal. The latter will be implemented in...
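A handler built on the standard library's `logging.handlers.BaseRotatingHandler` can treat "re-open requested" as its rollover condition. The class below is a minimal sketch of that idea; the class and method names are hypothetical and it is not the code from this patch.

```python
import logging
import logging.handlers


class ReopenableLogHandler(logging.handlers.BaseRotatingHandler):
    """File handler that re-opens its log file on request (sketch)."""

    def __init__(self, filename):
        logging.handlers.BaseRotatingHandler.__init__(self, filename, "a")
        self._reopen = False

    def shouldRollover(self, record):
        # "Roll over" only when a re-open was explicitly requested.
        return self._reopen

    def doRollover(self):
        # Close the current stream and open the same path again.
        if self.stream is not None:
            self.stream.close()
        self.stream = self._open()
        self._reopen = False

    def RequestReopen(self):
        """Ask for the file to be re-opened before the next record."""
        self._reopen = True
```

Because the re-open happens inside `emit`, which the base class runs under the handler lock, the request itself only needs to set a flag.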
Re-create instance disk symlinks on activate
This patch implements recreation of instance disk symlinks when the activate-disks operation is run. Until now, it was not possible to re-create these symlinks without stopping and starting or migrating an instance as the RPC call where this is done was in instance startup...
Add RAPI resource for instance console
Export console information as query field
This makes it possible to get the console information via a LUXI query.
manpage: gnt-group remove cannot remove last group
ConfigWriter: simplify _UnlockedVerifyConfig
This just adds a 'cluster' local variable for reducing duplication.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
ConfigWriter: add checks for be/nd/nic params
This adds checking (in the configuration) for invalid be, nd and nic params. The code is a bit tricky as nd params are at cluster, nodegroup and node level, nicparams are at cluster and nic level, whereas beparams are at cluster and instance level....
Add e1000 nic support for HVM
Closes issue: 130
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Prevent removal of last node group
- Add check in ConfigWriter to prevent last node group from being removed
- Tidy up error message a bit
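The kind of guard described here can be sketched as below, assuming node groups are kept in a dict keyed by UUID. `ConfigError`, `remove_node_group` and the message texts are hypothetical names for illustration, not the ConfigWriter API.

```python
class ConfigError(Exception):
    """Hypothetical error type for rejected configuration changes."""


def remove_node_group(nodegroups, group_uuid):
    """Remove a group from the uuid -> group mapping, keeping at least one."""
    if group_uuid not in nodegroups:
        raise ConfigError("Unknown node group '%s'" % group_uuid)
    if len(nodegroups) == 1:
        # Removing the only group would leave nodes without a group.
        raise ConfigError("Cannot remove the last node group '%s'" % group_uuid)
    del nodegroups[group_uuid]
```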
Fix instance list for instances running multiple times
If for some reason (e.g. failed migration) one instance is running on multiple nodes the output can become inconsistent. To get that error and make it consistent between runs we make the call on the secondary...
Small QA fixes: groups via RAPI, cluster OOB
Add “cluster-oob” to sample configuration file. Don't run RAPI group tests if disabled.
cluster verify: add hvparams verification
Currently, the validity of the hypervisor parameters is only checked at init/modification time, and not in the cluster verify. This is bad, as it can lead to inconsistent state that is only detected when the next modification (which can be unrelated) is made, leading to...
Remove dumb-allocator
- Remove the actual code
- Remove mentions of it from iallocator.rst, and use hail instead
- Also remove mentions of "etch-image" and use "debootstrap+default"
- Mention htools as the reference implementation in iallocator.rst
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Open other clusters' config in foreign mode
Add (unused) arg to _OfflineClusterMerge
cli._RunWhileClusterStoppedHelper.Call passes (self, *args) to functions called via cli.RunWhileClusterStoppedHelper(). The code in cluster-merge was broken by commit d8aab233.
Fix unittest breakage on Python 2.4/2.5
Commit 70b0d2a29 broke unittests on Python 2.4 and 2.5. Turns out that Python 2.6 and above allow classes to be passed as custom test runners, whereas earlier versions don't.
Ensure all resources are used by RAPI client
Check for duplicate RAPI URIs and handlers
RAPI client: Wrap /2/redistribute-config resource
RAPI client: De-/activating instance disks
Add unittest for RAPI client's ModifyInstance
Watcher: Fix endless repair tries for broken secondary
In cases where the secondary was offline and not evacuated, the watcher tried to activate-disks in an endless manner, but this is useless, as the secondary is offline and therefore not responding to this approach....
Verify disks: increase parallelism and other fixes
The recent work on multi-VG support has converted LUClusterVerifyDisks into doing serialised calls to each node, as each node can have different VGs. This is suboptimal, especially for big clusters, where...
gnt-cluster verify-disks: fix VG name
Recent multi-VG work already exports the missing LV names as vg/lv, not simply lv. So the query and addition of the VG name in gnt-cluster verify-disks is redundant, and even wrong for non-default-VG instances.
Signed-off-by: Iustin Pop <iustin@google.com>...
Deactivate disks: allow skipping hypervisor checks
In some cases (e.g. the hypervisor not running at all), we might want to force disk deactivation, skipping the hypervisor checks. I believe this is not a good thing to do all the time, so this patch adds the...
Wait for master to become available on initialization
This is analogous to the existing check for a responsive node daemon.
Start all daemons on cluster initialization
At least ganeti-confd was not started. It got started a few minutes later by ganeti-watcher. Also move one pylint disable to the effective line.
Clarify job processing order in admin guide
The fact that jobs don't necessarily execute in order has been a source of some confusion. Hopefully this update will clarify things.
Improve option descriptions
Also replace hardcoded “xenvg” with constant.
Remove two unused variables
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Show hidden/blacklisted OSes in cluster info
Since we can blacklist/hide non-existing OSes (for preseeding), we cannot easily query the OSes themselves for this status. Hence we export the entire lists in cluster info (which should be cheaper than gnt-os diagnose)....
Further man page updates for OS parameters
Also replace one UTF-8 char with the ASCII equivalent, as not all Pandoc versions support it.
Change the Makefile to use bash as SHELL
This is because we want, whenever we use sequences of commands, to set pipefail, otherwise detecting build failures is difficult.
Add documentation for OS parameters
Fix LUOSDiagnose and non-vm_capable nodes
This skips non-vm_capable nodes in the OS diagnose search, since such OSes will not be used anyway on those nodes.
Rephrasing two error messages for auto promotion
Using auto_promote or auto-promote can lead to confusion on using the user facing interfaces. While auto-promote is fine for CLI it's not for RAPI and vice-versa. This patch should eliminate this confusion....
lvmstrap: fix logic bug for partition reread
The if structure in CheckReread is broken, and makes partitions reread be full of race issues (esp. after updating them).
Also fix a small message bug.
storage: Check that mapper is either used or None
This is a followup patch to the one moving GetAllocatable out to module level.
Fix bug in “gnt-node list-storage”
LVM PV storage units would always show as allocatable, even when they weren't. For some reason I have not been able to determine, the function parsing the attributes (“_GetAllocatable”) was not even called and the list opcode simply returned the attribute string as the value (e.g....
Fix payload check for out-of-band health
This logic error was not detected before as health has not been implemented on the cli and therefore no QA code existed for that.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix premature abort of LUOobCommand due to result.Raise
This is a bug I recognized while doing tests on gnt-node health. A leftover result.Raise line causes premature abort of LUOobCommand on the first node failing the RPC call. This is not expected behaviour for...
ht: Add TMaybeDict check
This replaces a number of equal “ht.TOr(ht.TDict, ht.TNone)” checks.
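The idea of naming a frequently repeated combined check can be illustrated with simplified stand-ins for the type-check combinators. The definitions below only mimic the names from the commit message; they are assumptions made for illustration and not the real `ht` module code.

```python
def TOr(*checks):
    """Return a check that passes if any of the given checks passes."""
    def fn(val):
        return any(check(val) for check in checks)
    return fn


def TDict(val):
    return isinstance(val, dict)


def TNone(val):
    return val is None


# Defined once and reused, instead of spelling out TOr(TDict, TNone)
# at every place that accepts "a dict or nothing".
TMaybeDict = TOr(TDict, TNone)
```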
QA: replace ping with fping
This allows a lot of simplification in the TestIcmpPing, as fping can take multiple arguments so we no longer need to create many commands joined with &&.
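The simplification can be sketched by comparing how the two command lines are built. The helper names are hypothetical, and the exact ping options used by the QA code are not shown in this log, so `-c 1` is only an assumption.

```python
import shlex


def ping_commands_chained(hosts):
    # One ping invocation per host, chained so that any failure
    # makes the whole shell command fail.
    return " && ".join("ping -c 1 %s" % shlex.quote(h) for h in hosts)


def fping_command(hosts):
    # fping accepts several targets in a single invocation.
    return ["fping"] + list(hosts)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(ping_commands_chained(nodes))
    print(" ".join(fping_command(nodes)))
```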
Modify LUOobCommand to support multiple nodes
This will change the result of this LU to a query-like result: a list of tuples with information about the state of the data.
It also includes the modification to the commands calling this opcode.
Signed-off-by: René Nussbaumer <rn@google.com>...
Merge branch 'devel-2.4'
Rename QRFS_* to RS_*
This patch renames QRFS_* to RS_* fields so they can be used in other places (i.e. LUs) without confusion, as this was initially meant for query operations.
Another fix for LUClusterVerifyDisks
The LVM queries should only be done for vm_capable nodes. In order to do this, we also add a new ConfigWriter method to abstract that query.
QA: also run gnt-cluster verify-disks
The bug recently reported by Apollon Oikonomopoulos was missed because we don't test this command at all.
Fix disk adoption breakage
Disk adoption is currently broken by 84d7e26b, which added multiple LVM volume group support. This patch fixes the calls to rpc.call_vg_list, which are multi-node calls but were handled as single-node calls in 84d7e26b.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Fix typo in query2 design document
Improve import/export timeout settings
With this patch, the exporting node will retry to connect a few times. The receiving node will make use of the master's increased timeout (see previous patch).
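Retrying a connection a bounded number of times can be sketched generically as below. This is not how the import/export daemon implements it; `connect_with_retries`, its parameters and their defaults are hypothetical.

```python
import time


def connect_with_retries(connect_fn, retries=3, delay=1.0, sleep_fn=time.sleep):
    """Call connect_fn until it succeeds or retries attempts have failed."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            return connect_fn()
        except OSError as err:
            last_err = err
            # Wait before the next attempt, but not after the last one.
            if attempt < retries:
                sleep_fn(delay)
    raise last_err
```

Passing `sleep_fn` explicitly keeps the helper easy to test without real waiting.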
Increase remote import/export timeout
It's been shown that 60 seconds may not be enough to establish a connection.
lvmstrap: fix very old contact information
A memory from the past: this was left over from before the 1.2 release or so…
lvmstrap: add more excluded FS types
Also moves the list of excluded types to the top level and makes it a frozenset.
lvmstrap: add an explicit test for mounted devices
Recent kernels/userland report a mounted filesystem as follows:
root@node4:~# fuser -avm /dev/sda5
 USER PID ACCESS COMMAND
/dev/sda5: root kernel mount /srv/ganeti...
lvmstrap: add explicit test for swap backends
Similar to mounted filesystems, recent kernel/userland report swap backends:
root@node4:~# fuser -avm /dev/sda6
 USER PID ACCESS COMMAND
/dev/sda6: root kernel swap /dev/sda6...
lvmstrap: ignore small-sized partitions
This patch changes lvmstrap to ignore “small” partitions. Currently extended partitions are reported as unused with a size of 1024 (bytes), and this confuses lvmstrap. Since a very small partition won't help anyway (below hundred of PE size is not helpful), let's...
lvmstrap: abstract a little the sysfs paths
lvmstrap: add PV-on-partition support
This is a not-so-nice change, adding support for partitions to be used as PVs.
The not-nice part is that partitions live in a separate place in sysfs, whereas in dev they live at the same level as disks. We work around this via a new SysfsName function that computes the correct...
Improve documentation for QRFS_UNAVAIL
IMHO this should have been named QRFS_NA or QRFS_UNSUPPORTED, but UNAVAIL is good enough.
query: Add alias support in _PrepareFieldList
Instance query: replace duplicates with aliases
Adding a basic oob helper as an example
This is just a plain stupid and simple out-of-band helper without anything fancy. It uses plain ssh to power off / power cycle the machine, does not support power on. It supports power status using fping to check if the host replies....
Fix disk count check in LUSetInstanceParams
LUSetInstanceParams checked instance.nics (and not instance.disks) against constants.MAX_DISKS.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
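The corrected check compares the number of disks, not NICs, against the disk limit. The sketch below uses a hypothetical `Instance` class, a hypothetical function name and an illustrative `MAX_DISKS` value; it is not the LUSetInstanceParams code.

```python
MAX_DISKS = 16  # illustrative value, standing in for constants.MAX_DISKS


class Instance(object):
    """Hypothetical minimal instance object with disks and NICs."""

    def __init__(self, disks, nics):
        self.disks = disks
        self.nics = nics


def check_disk_add(instance, max_disks=MAX_DISKS):
    """Raise ValueError if no further disk can be added to the instance."""
    # The buggy variant used len(instance.nics) here, so the limit
    # depended on the number of NICs rather than the number of disks.
    if len(instance.disks) >= max_disks:
        raise ValueError("Instance has too many disks (%d), cannot add more"
                         % len(instance.disks))
```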
Document iallocator change (alloc_policy)
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add check-news to EXTRA_DIST
This was missing from commit 7385c51d.
Add script checking release dates in NEWS
This will detect human errors when setting a release date in NEWS.
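One class of human error such a script can catch is a weekday that does not match the date. The sketch below assumes release lines of the form `*(Released Fri, 4 Feb 2011)*`; that format, the regular expression and the function name are assumptions, not the actual check-news script. It also relies on the default C locale for English day and month abbreviations.

```python
import datetime
import re

# Assumed line format: *(Released <weekday>, <day> <month> <year>)*
RELEASED_RE = re.compile(
    r"^\*\(Released (?P<day>[A-Z][a-z]{2}),\s+"
    r"(?P<date>\d{1,2} [A-Z][a-z]{2} \d{4})\)\*$")


def check_release_dates(lines):
    """Return a list of error strings for inconsistent release dates."""
    errors = []
    for lineno, line in enumerate(lines, 1):
        m = RELEASED_RE.match(line.strip())
        if not m:
            continue
        try:
            parsed = datetime.datetime.strptime(m.group("date"), "%d %b %Y")
        except ValueError:
            errors.append("Line %d: invalid date %r" % (lineno, m.group("date")))
            continue
        actual = parsed.strftime("%a")
        if actual != m.group("day"):
            errors.append("Line %d: %s is a %s, not a %s"
                          % (lineno, m.group("date"), actual, m.group("day")))
    return errors
```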
query: use the actual types for BE/HV parameters
This patch exposes the VTYPE kind of BE/HV parameters, instead of returning QFT_OTHER. The current situation makes a query like:
gnt-instance list -o name,be/memory,oper_ram
very strange looking.
query: return UNAVAIL for "wrong" HV parameters
If a HV parameter is required that does not apply for an instance, currently the code returns None. This is bad, as it means we cannot switch to the actual HV parameter types and validate correctly this field....