Add shared file storage design doc
Add doc/design-shared-storage.rst to document the proposed changes and update Makefile.am accordingly.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Core shared file storage support
This patch introduces core file storage support, consisting of the following:
A configure-time switch for enabling/disabling shared file storage support and controlling the shared file storage location: --with-shared-file-storage-dir=. Shared file storage configuration is then...
Shared file storage initialization code
Add shared file storage handling during cluster initialization.
Add DTS_MIRRORED frozenset
Use DTS_MIRRORED to indicate mirrored disk templates that allow migrations/failover.
DTS_MIRRORED is the union of DTS_EXT_MIRROR and DTS_NET_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>...
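The relationship between the three sets can be sketched as follows. The template names are placeholders chosen for illustration and are not necessarily the values in Ganeti's constants module; allows_migration() is a hypothetical helper.

```python
# Placeholder template names (assumptions for illustration only).
DTS_NET_MIRROR = frozenset(["drbd"])
DTS_EXT_MIRROR = frozenset(["sharedfile", "blockdev"])

# DTS_MIRRORED is the union of the two sets, as the commit message states.
DTS_MIRRORED = DTS_EXT_MIRROR | DTS_NET_MIRROR


def allows_migration(disk_template):
    """Return True if the template is mirrored (hypothetical helper)."""
    return disk_template in DTS_MIRRORED
```

A caller would then test membership in DTS_MIRRORED instead of checking each mirror type separately.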
Add bdev_sizes RPC call
The bdev_sizes multi-node RPC call returns the sizes of the requested block devices on the desired nodes. Its intended use is to verify the existence of a block device on a given node for shared block storage support.
Block device paths are expected to lie under constants.BLOCKDEV_DIR...
Add error checking and merging for cluster params
Set the default stderr logging level to WARNING so the relevant output can be seen.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Clarify --force-join parameter message
This isn't only used during cluster merge.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default from the cmdline, which causes problems for cluster-merge. This patch means that the following now works: gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as an index without checking.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
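The kind of check described above can be sketched like this. The function name and data layout are hypothetical and do not reflect the actual Ganeti code.

```python
def merge_live_data(known_instances, node_report):
    """Attach live data to known instances; return names of orphans.

    known_instances: dict mapping instance name -> record dict
    node_report: dict mapping instance name -> live data from one node
    (both layouts are assumptions made for this sketch)
    """
    orphans = []
    for name, live in node_report.items():
        record = known_instances.get(name)
        if record is None:
            # The node reported an instance the configuration does not know.
            orphans.append(name)
            continue
        record["live"] = live
    return orphans
```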
Fix bug related to log opening failures
If opening the log file fails, then we shouldn't attempt to use that variable.
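A minimal sketch of the pattern, using the standard logging module. add_file_handler() is a hypothetical helper, not the function touched by this patch.

```python
import logging


def add_file_handler(logger, path):
    """Attach a file handler to logger; return it, or None on failure."""
    try:
        handler = logging.FileHandler(path)
    except (IOError, OSError) as err:
        # 'handler' was never bound here, so it must not be referenced.
        logger.warning("Cannot open log file %s: %s", path, err)
        return None
    logger.addHandler(handler)
    return handler
```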
Bump version for 2.4.1 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
cfgupgrade: Fix critical bug overwriting RAPI users file
The cfgupgrade tool was designed to be idempotent, meaning it could be run several times and still produce the correct result. Ganeti 2.4 moved the file containing the RAPI users to a separate directory...
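To illustrate what idempotence means for a file migration step, here is a generic sketch. It is not the actual cfgupgrade fix; migrate_file() and its arguments are hypothetical.

```python
import os
import shutil


def migrate_file(old_path, new_path):
    """Move old_path to new_path; safe to call repeatedly.

    Returns True only if a move was performed.
    """
    if not os.path.exists(old_path):
        return False  # already migrated, or nothing to migrate
    if os.path.exists(new_path):
        return False  # never overwrite an existing destination
    new_dir = os.path.dirname(new_path)
    if new_dir and not os.path.isdir(new_dir):
        os.makedirs(new_dir)
    shutil.move(old_path, new_path)
    return True
```

Running the function a second time leaves the destination untouched, which is the property the commit message describes.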
Release 2.4.0
NEWS update and version bump.
Merge branch 'devel-2.3' into devel-2.4
Small improvement to the ganeti man page
Also specifies the comma-escaping feature.
Merge branch 'devel-2.2' into devel-2.3
Fix LUClusterRepairDiskSizes and rpc result usage
This LU was introduced before the RPC result conversion from .data to .payload, and it has managed to keep the old-style usage (how? it's the only LU that does so). Fix by changing to payload, and add some...
Fix RPC mismatch in blockdev_getsize[s]
Commit 92fd2250 added consistency checks in the RPC layer, which broke the call_blockdev_getsizes RPC call (declared with 's' at the end in rpc.py, without 's' in the node daemon).
The immediate fix is to correct the rpc function name, the long term...
RAPI: fix evacuate node resource
PollJob returns the whole op_results, hence a list of opcode results.
Merge remote branch 'stable-2.4' into devel-2.4
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fix typo in kvm-ifup script
Reported-by: Bas Tichelaar <bas@30loops.net>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
NEWS: Replace smartquotes, start lines with uppercase
- Sphinx converts ASCII quotes ("") to smartquotes (“”) automatically
- Sentences or list items start with an uppercase letter
- Changed description of non-verbose “gnt-* list” output slightly
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Fix LU processor's GetECId
The exception was never actually raised.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
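The commit message does not show the code. One common way for an exception to be "never actually raised" is constructing it without the raise keyword; the sketch below assumes that was the case here, and all names in it are stand-ins.

```python
class ProgrammerError(Exception):
    """Stand-in exception class for this sketch."""


def get_ec_id_buggy(ec_id):
    if not ec_id:
        # The exception object is built and then discarded.
        ProgrammerError("No execution context ID set")
    return ec_id


def get_ec_id_fixed(ec_id):
    if not ec_id:
        raise ProgrammerError("No execution context ID set")
    return ec_id
```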
Update NEWS and release 2.4.0 rc3
Merge branch 'devel-2.4' into stable-2.4
Fix potential data-loss bug in disk wipe routines
For the 2.4 release, we only add the missing RPC calls. However, this needs to be fixed properly, by preventing usage of mis-configured disks.
Also add a bit more logging so that it's directly clear on which node...
1-char comment typo fix
Expand some acronyms, add to glossary
query_unittest: Fix argument to set()
Commit e431074f introduced an uncaught bug, which this patch fixes. set() expects a list or other iterable to work on, so it split the provided instance name into a set of characters. This caused the exp_status never to be set and therefore not caught in one assert rule...
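The behaviour of set() described above is easy to reproduce. The instance name is a made-up example.

```python
instance_name = "inst1"  # hypothetical instance name

# Passing the string itself iterates over its characters.
wrong = set(instance_name)

# Wrapping the name in a list yields a one-element set.
right = set([instance_name])
```

With the first form, a membership test for the full name silently fails, which is why the expected status was never assigned.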
Fix title of query field containing instance name
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Update news and bump version for 2.4.0 rc2
Fix pylint warnings
- 1 80-char line infraction
- 4 changes in how arguments are passed to logging functions
- 3 pylint disable-msg's because cluster-merge needs to access ganeti config internals
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>...
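The message does not say which logging calls changed. Pylint usually objects to formatting the message before it reaches the logging function, so the changes were probably of the following kind. This is an assumption, and the function names are illustrative.

```python
import logging


def log_merge_eager(logger, cluster):
    # Formats the string even if the message is later filtered out;
    # pylint typically warns about this style.
    logger.info("Merging cluster %s" % cluster)


def log_merge_lazy(logger, cluster):
    # Passes the argument separately; logging formats it only if needed.
    logger.info("Merging cluster %s", cluster)
```

Both calls produce the same log text; only the moment of formatting differs.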
TestRapiInstanceRename use instance name
Currently the QA rename job wrongly passed the whole info dict to the client.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Change the list formatting to use 'special' chars
And also enable verbose display via the, well, verbose option. Man page and tests are updated, and the formatting is moved from 4 if statements to a data structure.
Signed-off-by: Iustin Pop <iustin@google.com>...
Add support for merging node groups
Add option to rename groups on conflict
Fix minor docstring typo
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add QA rapi test for instance reinstall
This tests at least the basic case. Unfortunately, there is no way to check all possibilities using the provided RAPI client, as that will use the new method unless the cluster doesn't support it.
RAPI: remove required parameters for reinstall
Before c744425f354f1bef2d0d7d306e2d00c494d67d2b instance reinstall accepted the "os" and "nostartup" optional query parameters. With that commit it was changed to allow "os", "start" and "osparams" via body rather than encoded in the URL. Unfortunately that commit introduced a...
NodeQuery: don't query non-vm_capable nodes
Non-vm_capable nodes most likely don't have a hypervisor and/or storage configured, so the call will fail anyway.
NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
Since we don't have the data per design, UNAVAIL is appropriate here, while NODATA is not.
The patch also adds a comment: if we extend the live fields list to contain other data in the future, we need to reevaluate this solution....
Fix HV/OS parameter validation on non-vm nodes
Currently, there is at least one LU that does wrong validation of HV parameters (against all nodes, LUClusterSetParams). It's possible to fix this case, but I went and modified the base functions to filter out non-vm_capable nodes so all callers are protected....
Remove superfluous redundant requirement
The condition is already covered by the previous requirement.
Don't remove master_candidate flag from merged nodes
Prevents lots of spurious warnings like:
2011-02-10 17:00:22,776: CRITICAL Configuration data is not consistent:
Not enough master candidates: actual 3, target 4
Signed-off-by: Stephen Shirley <diamond@google.com>...
Use a consistent ECID base
ECID was being calculated completely differently in __MergeNodeGroups() and _MergeConfig()
listrunner: convert from getopt to optparse
The “-A” (use agent) option was not documented, and instead of adding a manual listing, I converted it to optparse like the other CLI tools.
Note that I cleaned up the usage and help texts a bit.
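With optparse, every declared option appears in the generated help output, so options cannot go undocumented the way they can with a hand-written getopt usage text. A minimal sketch follows; the usage string, help wording and destination name are assumptions, and only the “-A” flag comes from the commit message.

```python
import optparse


def build_parser():
    """Build an option parser with a self-documenting -A flag (sketch)."""
    parser = optparse.OptionParser(usage="%prog [options] <command>")
    parser.add_option("-A", dest="use_agent", action="store_true",
                      default=False,
                      help="use the SSH agent for authentication")
    return parser
```

Calling build_parser().parse_args(["-A", "uptime"]) returns an options object with use_agent set and the remaining arguments as a list.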
listrunner: fix agent usage
By delaying the agent key query until after the fork, we prevent the problem of simultaneous access to the agent.
Tested that it works against 80 hosts in parallel without error; the current version breaks already at 20 hosts....
Revert "Disable the cluster-merge tool for the moment"
This reverts commit c0711f2cb989facd60430ab18c5b0e59a1f279ac.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix cluster-merging by not stopping noded
cli.RunWhileClusterStopped() stops noded on all of the nodes in the original cluster. This prevents /etc/hosts updates on the master, and config redistribution doesn't reach the other nodes in the original cluster. As all we want to do is merge while the master is stopped,...
Fix bug in iallocator data structures build
Commit a1cef11c fixed non-vm_capable nodes export, but inadvertently broke offline nodes. The update of the dict only needs to happen for online nodes, in the 'if' block.
Without this patch, offline nodes keep the data from the last node...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give the same message:
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: instance lives on offline node(s) node3...
Minor reordering to match param order
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
- ERROR: instance instance14: instance lives on offline node(s) node3
- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows N+1 warnings for offline nodes having any redundant instances since the memory data that we have for those nodes is zero, so any instance will trigger the warning....
Handle gnt-instance shutdown --all for empty clusters
The current code gives:
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
Selection filter does not match any instances
Use gnt-node add --force-join to add foreign nodes
Add --force-join option to gnt-node add
This is needed so cluster-merge can add nodes from other clusters.
Signed-off-by: Stephen Shirley <diamond@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Fix iterating over node groups
Current line tries to unpack dict incorrectly
Update NEWS file for the 2.4.0 rc1 release
Also bump up the version.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Disable the cluster-merge tool for the moment
Hopefully this can be fixed before the final 2.4 release…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Stephen Shirley <diamond@google.com>
Bump up intra-cluster import connect timeout
Currently, the export timeout is 10 times 20 seconds, but the import is only 30 seconds. I'm raising this to 60 seconds with two goals in mind:
- when debugging manually, this allows for easier synchronisation of...
Import-export: fix logging of daemon output
In case of failures, the recent daemon output is logged as %r on a list of unicode strings, which results in the (ugly):
Thu Feb 3 05:13:34 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: [u' DUMP: Date of this level 0 dump: Thu Feb 3 05:13:18 2011', u' DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output', u' DUMP: Label: none', u' DUMP: Writing 10 Kilobyte records', u' DUMP: mapping (Pass I) [regular files]', u' DUMP: mapping (Pass II) [directories]', u' DUMP: estimated 54301 blocks.', u' DUMP: Volume 1 started with block 1 at: Thu Feb 3 05:13:19 2011', u' DUMP: dumping (Pass III) [directories]', u' DUMP: dumping (Pass IV) [regular files]', u'socat: E SSL_write(): Connection reset by peer', u"dd: dd: writing `standard output': Broken pipe", u' DUMP: Broken pipe', u' DUMP: The ENTIRE dump is aborted.'])...
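One way to avoid the list repr in such a message is to join the lines before formatting. This is a sketch of that idea and not necessarily what the patch does; describe_failure() is a hypothetical helper.

```python
def describe_failure(status, recent_lines):
    """Build a failure message with the recent output as plain text."""
    # Join the lines instead of formatting the list itself with %r.
    output = "\n".join(recent_lines)
    return "Exited with status %s (recent output: %s)" % (status, output)
```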
Fix handling of ^C in the CLI scripts
This adds a message and nice handling of ^C, especially useful for ``gnt-job watch``.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
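A generic sketch of this kind of handling in a CLI entry point. The wrapper name, the message text and the exit code (130, the shell convention for termination by SIGINT) are assumptions, not taken from the patch.

```python
import sys


def run_cli(main, argv):
    """Run main(argv), turning ^C into a short message and an exit code."""
    try:
        return main(argv)
    except KeyboardInterrupt:
        # Print a short notice instead of a traceback.
        sys.stderr.write("Aborted by user (^C)\n")
        return 130  # assumed exit code: 128 + SIGINT (2)
```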
backend: Disable compression in export info file
The new import/export infrastructure in Ganeti 2.2 and up handles compression differently. It no longer writes compressed files to the destination. Unfortunately changing this behaviour would be non-trivial,...
utils.SetupLogging: Return function to reopen log file
This function can be used from a SIGHUP handler to reopen log files. Initial, simple unittests are included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reopen log files upon SIGHUP in daemons
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
utils.SetupLogging: Make program a mandatory argument
It's passed in by most users (daemons, CLI scripts) and for the others (burnin, watcher) it certainly doesn't hurt, especially when using syslog.
utils.log: Restrict I/O error handling coverage
The I/O error will occur while opening the file, not while adding and configuring the handler.
utils.log: Split formatter building into separate function
burner: Trivial code cleanup
- Use constant for exit value
- Configure logging from main function, not from class' “__init__”
burnin: Reuse existing function for debug value
Instead of using its own, burnin can use cli.SetGenericOpcodeOpts.
Merge node groups from other cluster
Enforce that new node groups have unique names
Add _UnlockedLookupNodeGroup()
This allows calling of _UnlockedLookupNodeGroup() from within AddNodeGroup()
Fix grammar of var naming
flatten is the verb, flattened is the adjective.
Minor grammar fix in QuitGanetiException docstring
cluster-merge should refuse to merge own cluster
Also fix type of Merger.cluster_name from list to string. This would have triggered an error in sshRunner if cluster keys were in use.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Introduce re-openable log record handler
This patch adds a new log handler class based on the standard library's BaseRotatingHandler. This new class allows the log file to be re-opened, e.g. upon receiving a SIGHUP signal. The latter will be implemented in...
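A minimal sketch of such a handler, written for Python 3. The class and method names are hypothetical and the actual Ganeti class may differ. It reuses the rollover hooks of BaseRotatingHandler, so the file is reopened on the next emitted record rather than inside a signal handler.

```python
import logging
import logging.handlers


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler that reopens its file on request (sketch)."""

    def __init__(self, filename):
        logging.handlers.BaseRotatingHandler.__init__(self, filename, "a")
        self._reopen = False

    def shouldRollover(self, record):
        # Called by emit() before each record is written.
        return self._reopen

    def doRollover(self):
        # Close the old stream and open the configured path again.
        if self.stream is not None:
            self.stream.flush()
            self.stream.close()
        self.stream = self._open()
        self._reopen = False

    def request_reopen(self):
        """Mark the file for reopening, e.g. from a SIGHUP handler."""
        self._reopen = True
```

A daemon could register a SIGHUP handler that only calls request_reopen(), which keeps the work done in signal context to a single flag assignment.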
Re-create instance disk symlinks on activate
This patch implements recreation of instance disk symlinks when the activate-disks operation is run. Until now, it was not possible to re-create these symlinks without stopping and starting or migrating an instance as the RPC call where this is done was in instance startup...
Add RAPI resource for instance console
Export console information as query field
This makes it possible to get the console information via a LUXI query.
manpage: gnt-group remove cannot remove last group
ConfigWriter: simplify _UnlockedVerifyConfig
This just adds a 'cluster' local variable for reducing duplication.
ConfigWriter: add checks for be/nd/nic params
This adds checking (in the configuration) for invalid be, nd and nic params. The code is a bit tricky as nd params are at cluster, nodegroup and node level, nicparams are at cluster and nic level, whereas beparams are at cluster and instance level....
Add e1000 nic support for HVM
Closes issue: 130
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Prevent removal of last node group
- Add check in ConfigWriter to prevent last node group from being removed
- Tidy up error message a bit
Fix instance list for instances running multiple times
If for some reason (e.g. failed migration) one instance is running on multiple nodes the output can become inconsistent. To get that error and make it consistent between runs we make the call on the secondary...
Small QA fixes: groups via RAPI, cluster OOB
Add “cluster-oob” to sample configuration file. Don't run RAPI group tests if disabled.
cluster verify: add hvparams verification
Currently, the validity of the hypervisor parameters is only checked at init/modification time, and not in the cluster verify. This is bad, as it can lead to inconsistent state that is only detected when the next modification (which can be unrelated) is made, leading to...
Remove dumb-allocator
- Remove the actual code
- Remove mentions of it from iallocator.rst, and use hail instead
- Also remove mentions of "etch-image" and use "debootstrap+default"
- Mention htools as the reference implementation in iallocator.rst
Open other clusters' config in foreign mode
Add (unused) arg to _OfflineClusterMerge
cli._RunWhileClusterStoppedHelper.Call passes (self, *args) to functions called via cli.RunWhileClusterStoppedHelper(). The code in cluster-merge was broken by commit d8aab233.
Fix unittest breakage on Python 2.4/2.5
Commit 70b0d2a29 broke unittests on Python 2.4 and 2.5. Turns out that Python 2.6 and above allow classes to be passed as custom test runners, whereas earlier versions don't.
Ensure all resources are used by RAPI client
Check for duplicate RAPI URIs and handlers