kvm: check that the ISO image is there if it's a URL
Perform a simple urllib2 check on ISO images specified as URL before instancestart, so as to work around qemu bug #597575 [1].
[1] https://bugs.launchpad.net/qemu/+bug/597575
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Remove 20-second sleeps from _ExecMigration
TLMigrateInstance._ExecMigration included two 10-second sleeps for unknownreasons.
This patch removes them.
KVM: reduce 'info migrate' polling period to 1s
Shared storage instance failover
Modify LUFailoverInstance to enable shared storage instances to failover.Shared storage instance failover requires either a target node or aniallocator to determine the target node. If none is given, the cluster default...
KVM: use cache=none for shared disk templates
Disable host cache for externally mirrored disks to avoid cache incoherency.Without this, migrations between the same two nodes may end up in diskcorruption.
This is a runtime override of cluster defaults, mostly a workaround....
Rename DTS_NET_MIRROR to DTS_INT_MIRROR
DTS_INT_MIRROR better contrasts DTS_EXT_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>[iustin@google.com: updated patch for changed context]Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Instance failover: fix bug for INT_MIRROR cases
Patches db366d9a and aac4511a added support for EXT_MIRROR instances,but inadvertently introduced a bug: for INT_MIRROR cases, we don'tneed (actually we can't support) neither an iallocator nor a targetnode....
Allow disk adoption during disk addition
It is now possible to allow adopting a disk during gnt-instance modify time, asfollows:
gnt-instance modify --disk add:adopt=/path/to/disk (blockdev) orgnt-instance modify --disk add:adopt=<lvname> (plain)
We do the same checks as during instance creation....
Temporary workaround for hail to work with shared storage
Fix parts of shared storage migration
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances, howeverit broke migration for DTS_EXT_MIRROR instances, by moving iallocator and nodechecks from LUInstanceMigrate to TLMigrateInstance. This has the side-effect...
Add completion suggestion for more parameters
Add suggestions for disk-, nic-, and backend-parameter completion.
Also alter autotools/build-bash-completion to ignore the new suggestion typesfor the moment.
Ignore parameter completion for bash completion
Named parameters are only handled by zsh completion for the time being.
Allow KVM to boot from HTTP
New versions of KVM support booting from HTTP-hosted ISOimages, via libcurl. This patch adds a proper check toallow defining either a sane, absolute path or an HTTPURL as an iso image path.
Remove "format=raw" from the cdrom device options when iso_image...
Prevent ssconf values from having non-string values
For whatever reason, my test cluster managed to acquireshared_file_storage_dir with a None value, instead of emptystring. This is not flagged in masterd itself, but the node daemonwill fail in writing the value to disk, as it calls len() on the...
Fix shared_file_storage_dir on upgrades
If the cluster was upgraded from 2.4 or earlier, this key won't exist(it's only set to a correct value on cluster init), so we need toproperly set it to a null string (disabled).
Signed-off-by: Iustin Pop <iustin@google.com>...
Core shared file storage support
This patch introduces core file storage support, consisting of the following:
A configure-time switch for enabling/disabling shared file storagesupport and controlling the shared file storage location:--with-shared-file-storage-dir=. Shared file storage configuration is then...
Shared file storage initialization code
Add shared file storage handling during cluster initialization.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Add DTS_MIRRORED frozenset
Use DTS_MIRRORED to indicate mirrored disk templates that allowmigrations/failover.
DTS_MIRRORED is the union of DTS_EXT_MIRROR and DTS_NET_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>Reviewed-by: Michael Hanselmann <hansmi@google.com>...
Add bdev_sizes RPC call
The bdev_sizes multi-node RPC call returns the sizes of the requestedblock devices on the desired nodes. Its intended use is to verify theexistence of a block device on a given node for shared block storagesupport.
Block device paths are expected to lie under constants.BLOCKDEV_DIR...
Shared block storage support
This patch introduces basic shared block storage support.
It introduces a new storage backend, bdev.PersistentBlockDevice, touse as a backend for shared block storage. The new bdev requires a newBLOCKDEV_DRIVER_MANUAL constant with the value "manual" and uses it as...
IAllocator changes to work with shared storage
Make cmdlib.IAllocator shared-storage-aware. IAllocator requires secondarynodes only on DTS_NET_MIRROR disk templates and requires no secondaries forDTS_EXT_MIRROR templates.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Migration and failover: add iallocator and target_node slots
Add iallocator and target_node slots to OpMigrateInstance andOpFailoverInstance to facilitate shared-storage-backed instance mobility. Addiallocator slot to OpMigrateNode (no explicit target_node in this case)....
CLI changes to facilitate shared storage migration/failover
Add DST_NODE_OPT to cli.py to use for directly specifying the target nodeduring migration/failover.
gnt-instance failover/migrate also get passed an iallocator option.
gnt-node failover/migrate get only a target_node option....
Shared storage instance migration
Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations forinstances with DTS_EXT_MIRROR disk templates.
Migrations of shared storage instances require either a target node, or aniallocator to determine the target node. If none is given, the cluster default...
Shared storage node migration
Modify LUNodeMigrate to provide node migration for nodes with instances usingshared storage. gnt-node migrate has to be passed an iallocator for migrationof shared storage instances to be performed. When using a shared storage...
Clarify --force-join parameter message
This isn't only used during cluster merge.
Signed-off-by: Stephen Shirley <diamond@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Treat empty oob_program param as default
There is currently no way to reset oob_program back to its default fromthe cmdline, which causes problems for cluster-merge. This patch meansthat the following now works: gnt-cluster modify --node-parameters oob_program=...
Fix bug in instance listing with orphan instances
Nodes can return unknown instances, so we shouldn't use the name as anindex without checking.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix bug related to log opening failures
If opening the log file fails, then we shouldn't attempt to use thatvariable.
Merge branch 'devel-2.3' into devel-2.4
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Merge branch 'devel-2.2' into devel-2.3
Fix LUClusterRepairDiskSizes and rpc result usage
This LU was introduced before the RPC result conversion from .data to.payload, and it has managed to keep the old-style usage (how? it'sthe only LU that does so). Fix by changing to payload, and add some...
Fix RPC mismatch in blockdev_getsize[s]
Commit 92fd2250 added consistency checks in the RPC layer, which brokethe call_blockdev_getsizes RPC call (declared with 's' at the end inrpc.py, without 's' in the node daemon).
The immediate fix is to correct the rpc function name, the long term...
RAPI: fix evacuate node resource
PollJob returns the whole op_results, hence a list of opcode results.
Merge remote branch 'stable-2.4' into devel-2.4
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fix LU processor's GetECId
The exception was never actually raised.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Adeodato Simo <dato@google.com>
Merge branch 'devel-2.4' into stable-2.4
Fix potential data-loss bug in disk wipe routines
For the 2.4 release, we only add the missing RPC calls. However, thisneeds to be fixed properly, by preventing usage of mis-configureddisks.
Also add a bit more logging so that it's directly clear on which node...
1-char comment typo fix
Expand some acronyms, add to glossary
Fix title of query field containing instance name
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Change the list formatting to a 'special' chars
And also enable verbose display via the, well, verbose option. Manpage and tests are updated, and the formatting is moved from 4 ifstatements to a data structure.
Fix minor docstring typo
Signed-off-by: Stephen Shirley <diamond@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
RAPI: remove required parameters for reinstall
Before c744425f354f1bef2d0d7d306e2d00c494d67d2b instance reinstallaccepted the "os" and "nostartup" optional query parameters. With thatcommit it was changed to allow "os" "start" and "osparams" via bodyrather than encoded in the URL. Unfortunately that commit introduced a...
NodeQuery: don't query non-vm_capable nodes
Because non-vm_capable nodes most likely don't have a hypervisorconfigured and/or storage, so the call will fail anyway.
NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
Since we don't have the data per design, UNAVAIL is appropriate here,while NODATA is not.
The patch also adds a comment: if we extend the live fields list tocontain other data in the future, we need to reevaluate this solution....
Fix HV/OS parameter validation on non-vm nodes
Currently, there is at least one LU that does wrong validation of HVparameters (against all nodes, LUClusterSetParams). It's possible tofix this case, but I went and modified the base functions to filterout non-vm_capable nodes so all callers are protected....
Fix bug in iallocator data structures build
Commit a1cef11c fixed non-vm_capable nodes export, but brokeinadvertently offline nodes. The update of the dict only needs tohappen for online nodes, in the 'if' block.
Without this patch, offline nodes keep the data from the last node...
Fix error msg for instances on offline nodes
Currently, for both primary and secondary offline nodes, we give thesame message:- ERROR: instance instance14: instance lives on offline node(s) node3- ERROR: instance instance15: instance lives on offline node(s) node3...
cluster verify and instance disks on offline nodes
Currently, cluster-verify says:
- ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline- ERROR: instance instance14: instance lives on offline node(s) node3- ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline...
Cluster verify and N+1 warnings for offline nodes
Currently, cluster verify shows warnings N+1 warnings for offlinenodes having any redundant instances since the memory data that wehave for those nodes is zero, so any instance will trigger thewarning....
Handle gnt-instance shutdown --all for empty clusters
The current code gives:Failure: prerequisites not met for this operation:error type: wrong_input, error details:Selection filter does not match any instances
Signed-off-by: Stephen Shirley <diamond@google.com>...
Add --force-join option to gnt-node add
This is needed so cluster-merge can add nodes from other clusters.
Signed-off-by: Stephen Shirley <diamond@google.com>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>...
Bump up intra-cluster import connect timeout
Currently, the export timeout is 10 times 20 seconds, but the importis only 30 seconds. I'm raising this to 60 seconds with two goals inmind:
- when debugging manually, this allows for easier synchronisation of...
Import-export: fix logging of daemon output
In case of failures, the recent daemon output is logged as %r on alist of unicode strings, which results in the (ugly):
Thu Feb 3 05:13:34 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: [u' DUMP: Date of this level 0 dump: Thu Feb 3 05:13:18 2011', u' DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output', u' DUMP: Label: none', u' DUMP: Writing 10 Kilobyte records', u' DUMP: mapping (Pass I) [regular files]', u' DUMP: mapping (Pass II) [directories]', u' DUMP: estimated 54301 blocks.', u' DUMP: Volume 1 started with block 1 at: Thu Feb 3 05:13:19 2011', u' DUMP: dumping (Pass III) [directories]', u' DUMP: dumping (Pass IV) [regular files]', u'socat: E SSL_write(): Connection reset by peer', u"dd: dd: writing `standard output': Broken pipe", u' DUMP: Broken pipe', u' DUMP: The ENTIRE dump is aborted.'])...
Fix handling of ^C in the CLI scripts
This adds a message and nice handling of ^C, especially useful for``gnt-job watch``.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
backend: Disable compression in export info file
The new import/export infrastructure in Ganeti 2.2 and up handlescompression differently. It no longer writes compressed files to thedestination. Unfortunately changing this behaviour would be non-trivial,...
utils.SetupLogging: Return function to reopen log file
This function can be used from a SIGHUP handler to reopen log files.Initial, simple unittests are included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Reopen log files upon SIGHUP in daemons
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
utils.SetupLogging: Make program a mandatory argument
It's passed in by most users (daemons, CLI scripts) and for the others(burnin, watcher) it certainly doesn't hurt, especially when usingsyslog.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
utils.log: Restrict I/O error handling coverage
The I/O error will occur while opening the file, not while addingand configuring the handler.
utils.log: Split formatter building into separate function
Enforce that new node groups have unique names
Signed-off-by: Stephen Shirley <diamond@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Add _UnlockedLookupNodeGroup()
This allows calling of _UnlockedLookupNodeGroup() from withinAddNodeGroup()
Minor grammar fix in QuitGanetiException docstring
Introduce re-openable log record handler
This patch adds a new log handler class based on the standard library'sBaseRotatingHandler. This new class allows the log file to be re-opened,e.g. upon receiving a SIGHUP signal. The latter will be implemented in...
Re-create instance disk symlinks on activate
This patch implements recreation of instance disk symlinks when theactivate-disks operation is run. Until now, it was not possible tore-create these symlinks without stopping and starting or migrating aninstance as the RPC call where this is done was in instance startup...
Add RAPI resource for instance console
Export console information as query field
This makes it possible to get the console information via a LUXI query.
ConfigWriter: simplify _UnlockedVerifyConfig
This just adds a 'cluster' local variable for reducing duplication.
ConfigWriter: add checks for be/nd/nic params
This adds checking (in the configuration) for invalid be, nd and nicparams. The code is a bit tricky as nd params are at cluster,nodegroup and node level, nicparams are at cluster and nic level,whereas beparams are at cluster and instance level....
Add e1000 nic support for HVM
Closes issue: 130
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Prevent removal of last node group
- Add check in ConfigWriter to prevent last node group from being removed- Tidy up error message a bit
Signed-off-by: Stephen Shirley <diamond@google.com>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix instance list for instances running multiple times
If for some reason (e.g. failed migration) one instance is runningon multiple nodes the output can become inconsistent. To get that errorand make it consistent between runs we make the call on the secondary...
cluster verify: add hvparams verification
Currently, the validity of the hypervisor parameters is only checkedat init/modification time, and not in the cluster verify. This is bad,as it can lead to inconsistent state that is only detected when thenext modification (which can be unrelated) is made, leading to...
RAPI client: Wrap /2/redistribute-config resource
RAPI client: De-/activating instance disks
Watcher: Fix endless repair tries for broken secondary
In cases where secondary was offline and not evacuated watcher triedto activate-disks in an endless manner, but this is useless, as thesecondary is offline and therefore not responding to this approach....
Verify disks: increase parallelism and other fixes
The recent work on multi-VG support has converted LUClusterVerifyDisksinto doing serialised calls to each node, as each node can havedifferent VGs. This is suboptimal, especially for big clusters, where...
gnt-cluster verify-disks: fix VG name
Recent multi-VG work already exports the missing LV names as vg/lv,not simply lv. So the query and addition of the VG name in gnt-clusterverify-disks is redundant, and even wrong for non-default-VGinstances.
Deactivate disks: allow skipping hypervisor checks
In some cases (e.g. the hypervisor not running at all), we might wantto force disk deactivation, skipping the hypervisor checks. I believethis is not a good thing to do all the time, so this patch adds the...
Wait for master to become available on initialization
This is analogue to the existing check for a responsive node daemon.
Start all daemons on cluster initialization
At least ganeti-confd was not started. It got started a few minuteslater by ganeti-watcher. Also move one pylint disable to the effectiveline.
Improve option descriptions
Also replace hardcoded “xenvg” with constant.
Remove two unused variables
Show hidden/blacklisted OSes in cluster info
Since we can blacklist/hide non-existing OSes (for preseeding), wecannot query easily the OSes themselves for this status. Hence weexport the entire lists in cluster info (which should be cheaper thangnt-os diagnose)....
Fix LUOSDiagnose and non-vm_capable nodes
This skips non-vm_capable nodes in the OS diagnose search, since suchOSes will not be used anyway on those nodes.
Rephrasing two error messages for auto promotion
Using auto_promote or auto-promote can lead to confusion on using theuser facing interfaces. While auto-promote is fine for CLI it's not forRAPI and vice-versa. This patch should eliminate this confusion....
storage: Check that mapper is either used or None
This is a followup patch to the one moving GetAllocatable out tomodule level.
Fix bug in “gnt-node list-storage”
LVM PV storage units would always show as allocatable, even when theyweren't. For some reason I have not been able to determine, the functionparsing the attributes (“_GetAllocatable”) was not even called and thelist opcode simply returned the attribute string as the value (e.g....
Fix payload check for out-of-band health
This logic error was not detected before as health has not beenimplemented on the cli and therefore no QA code existed for that.
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fix premature abort of LUOobCommand due to result.Raise
This is a bug I recognized while doing tests on gnt-node health. A leftover result.Raise line causes premature abort of LUOobCommand on thefirst node failing the RPC call. This is not expected behaviour for...
ht: Add TMaybeDict check
This replaces a number of equal “ht.TOr(ht.TDict, ht.TNone)” checks.
Modify LUOobCommand to support multiple nodes
This will change the result of this LU to a query like result. A list oftuples with information about the state of the data.
It also includes the modification to the commands calling this opcode.
Signed-off-by: René Nussbaumer <rn@google.com>...
Merge branch 'devel-2.4'