Add network and IP pool config methods and ssconf
Add helper methods to ganeti.config.ConfigWriter to handle Network addition and removal, IP address reservation and commitment to the address pools.
Also add the “networks” ssconf key to make network names/UUIDs available to all...
query: add network query primitives
Add NetworkQueryData data container for network-related queries. Also add a network query field generator and helper functions to generate usage statistics for IP address pools.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
bootstrap: initialize network configuration to an empty dict
Add network-related LUs
Add Logical Units for adding networks and querying network config objects.
LUGroupSetParams: handle network map/unmap
Modify LUGroupSetParams to accept network addition/removal. Networks are connected to nodegroups' links.
Add network-related opcodes
Add opcodes for network addition and query.
Add network query LUXI method
Add gnt-network client script
gnt-network manipulates network objects. Required parameters have been added to ganeti.cli.
Instance addition and removal: manage IP addresses
Prototype implementation of IP address management during instance addition/removal.
Fix cyclic dependency between iallocator and pools
The IP pool management infrastructure currently needs to know an instance's node to fill in IP addresses. For this to happen, the iallocator must run before we generate IP addresses, yet the iallocator itself needs the instance's...
Fix typo in gnt-network list
ToStdOut -> ToStdout
More fixes regarding ip pool mgmt and instance creation
Add lib/ippool.py: IP pool management primitives
This patch introduces primitive structures for IP pool management. The core of the IP Pool management framework is the ippool.IPv4Network class, which encodes an IPv4 subnet configuration with its IP address pool. The pool functionality...
Add config objects and methods for IP pools
This patch introduces the following changes to lib.config and lib.objects to facilitate IP pool management.
- Add a "networks" key to objects.cluster to hold link -> subnet relations. Each link is assumed to be associated with at most one IPv4 and at most one...
LUCreateInstance: Retrieve IPs from an IP pool
A new ip keyword is added ("pool") to signify that LUCreateInstance should get the instance's IP from an IP pool (rather than manually or by DNS resolution).
IP and link checks are re-ordered so that a NIC's link is available at the time...
lib.objects: Add network object definition
Add a network object definition, with the appropriate slots, to lib.objects.
Also add a “networks” top-level configuration slot to host network definitions and a “networks” slot to the NodeGroup objects to hold network ←→ node-group...
Add ganeti.network: address pool management tools
lib.network contains AddressPool, a helper class for ganeti.objects.Network. AddressPool wraps a Network object and provides all IPv4 Address Pool management logic, manipulating the respective fields of the Network object....
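As a rough illustration of the address pool idea described above, the following sketch keeps one reservation flag per address of an IPv4 subnet. It is not Ganeti's AddressPool: the class name SimplePool and its methods are hypothetical, and it relies only on Python's standard ipaddress module.

```python
import ipaddress


class SimplePool:
    """Hypothetical IPv4 pool: one reservation flag per address in the subnet."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        self.reserved = [False] * self.network.num_addresses
        # Assumption: network and broadcast addresses are never handed out.
        self.reserved[0] = True
        self.reserved[-1] = True

    def _index(self, address):
        addr = ipaddress.ip_address(address)
        if addr not in self.network:
            raise ValueError("%s is not in %s" % (addr, self.network))
        return int(addr) - int(self.network.network_address)

    def reserve(self, address):
        idx = self._index(address)
        if self.reserved[idx]:
            raise ValueError("%s is already reserved" % address)
        self.reserved[idx] = True

    def release(self, address):
        self.reserved[self._index(address)] = False

    def first_free(self):
        """Return the first unreserved address as a string, or None if full."""
        for idx, taken in enumerate(self.reserved):
            if not taken:
                return str(self.network.network_address + idx)
        return None
```

A caller would typically ask for first_free(), reserve() the result, and release() it again when the instance owning the address is removed.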
KVM: configure bridged NICs at migration start
Commit 5d9bfd870 moved tap interface handling from KVM to Ganeti, partly to also solve the problem of routed interfaces getting configured too early during live migrations, causing network anomalies. In that direction, configuration of...
Remove duplicate CDROM_IMAGE_PATH KVM hv param def
Introduced during merge, because of fba7f91. Remove the second definition, as it broke KVM HTTP CDROM boot.
Fix argument order in ReserveLV and ReserveMAC
ConfigWriter.ReserveLV() and ConfigWriter.ReserveMAC() called TemporaryReservationManager.Reserve() with the ec_id and resource arguments swapped. As a result, two reservation attempts for the same resource type...
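To see why argument order matters in a reservation manager of this shape, here is a simplified stand-in. The class TemporaryReservations, its method names and the sample identifiers are hypothetical and are not taken from the Ganeti sources; the sketch only shows what a swap does in a manager that maps an execution context id to the set of resources it holds.

```python
class ReservationError(Exception):
    """Raised when a resource is already reserved."""


class TemporaryReservations:
    """Simplified stand-in: maps an execution context id to its resources."""

    def __init__(self):
        self._by_ec_id = {}

    def reserved(self, resource):
        # A resource counts as reserved if any context holds it.
        return any(resource in held for held in self._by_ec_id.values())

    def reserve(self, ec_id, resource):
        if self.reserved(resource):
            raise ReservationError("Duplicate reservation for %r" % (resource,))
        self._by_ec_id.setdefault(ec_id, set()).add(resource)
```

With the intended order, a second reserve("job-2", "xenvg/disk0") raises ReservationError. With the arguments swapped, the resource name becomes the dictionary key and the context ids become the stored values, so in this sketch the duplicate check never fires.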
Merge remote branch 'origin/stable-2.4' into grnet-2.4
Conflicts: lib/hypervisor/hv_kvm.py lib/opcodes.py - Merge
TLMigrateInstance: do not migrate to self
Check that the instance is not being migrated to its current primary node during CheckPrereq. Otherwise migration is aborted because the instance is already running and cleaned-up, which causes the running instance to be killed....
Preload the string-escape code in noded
This encoding, part of the standard Python installation, is used by the pickle module (in turn used by subprocess when handling failures in program execution). Preloading it means that Python will cache it in memory so that even if the disk goes away or just the...
Abstract ignore_consistency opcode parameter
Two opcodes already use it and we need it for a third, time to add a constant for it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix a bug in LUInstanceMove
The opcode parameter ignore_consistency was used in the LU, but not actually declared in the OpCode. The patch adds it in the opcode and the command line client.
ObQuote — Please, please, can I have static typing?
Signed-off-by: Iustin Pop <iustin@google.com>...
Try to prevent instance memory changes N+1 failures
There are multiple bugs in the code checking for N+1 failures in instance memory changes, and it needs significant changes; in the meantime we can at least:
- change the warning message into an error (--force will skip checks)...
Use floppy disk and a second CDROM on KVM
Hi all, this patch will add 3 new KVM parameters and a new option.
New Parameters:
- floppy_image_path = "" -> Specify the floppy image to load as floppy disk.
- cdrom2_image_path = "" -> Specify a second cdrom image to load on...
Remove self._migrater use from TLMigrateInstance
Comply with changes introduced in f1ea1be, as we forgot to completely remove self._migrater.target_node from TLMigrateInstance.
Make root_path an optional hypervisor parameter
This will allow us an easy migration to pv-grub, because a set root_path confused pv-grub.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add 2 new variables to the OS scripts environment
Add INSTANCE_PRIMARY_NODE and INSTANCE_SECONDARY_NODES. These new values are useful for OS scripts that need to know the nodes where the instance lives, or has lived.
Add --no-wait-for-sync when converting to drbd
Currently, when converting an instance from plain to DRBD, the instance is blocked during the entire resync period. This patch adds the --no-wait-for-sync so that the operation finishes as soon as the DRBD sync has started, without waiting for the entire sync. This makes...
Recreate instance disks: allow changing nodes
This patch introduces the option of changing an instance's nodes when doing the disk recreation. The rationale is that currently if an instance lives on a node that has gone down and is marked offline, it's not possible to re-create the disks and reinstall the instance on...
Rename instance: only show new name when different
It makes no sense to show messages like:
Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to 'instance18'
So we'll skip the message if the resolved name is identical to the requested one....
Fix race condition in LUGroupAssignNodes
The original code would get all node information and their groups before acquiring the necessary locks. With this patch the node information is only retrieved once all locks have been acquired. Groups are locked optimistically and verified after acquiring the node locks....
TLMigrateInstance: restore live migration behavior
Commit b9187ba2 erroneously incorporated parts of the code of TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator. As a result, all migrations performed without the use of an iallocator would end-up...
kvm: check that the ISO image is there if it's a URL
Perform a simple urllib2 check on ISO images specified as URL before instance start, so as to work around qemu bug #597575 [1].
[1] https://bugs.launchpad.net/qemu/+bug/597575
cmdlib: Fix typo, s/nick/NIC/
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
A small optimisation in cluster verify
This removes (count of instances + count of nodes) lock acquires/releases.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
A few docstring fixes
At least one generates an epydoc error :)
luxi: do not handle KeyboardInterrupt
With the current code, it's possible to mistake a ^C for a protocol error:
node1# gnt-job info 221691
[press ^C]
Unhandled protocol error while talking to the master daemon:
Error while deserializing response:
(and note empty error message)....
Handle EPIPE errors while writing to the terminal
This handles EPIPE errors in two places: ToStream (to catch logging done in GenericMain itself) and in GenericMain (to cover also plain print statements).
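A minimal sketch of the first of those two places, assuming a stream writer that silently drops output once the reader has gone away. The function name to_stream mirrors the ToStream mentioned above but is a hypothetical stand-in written for Python 3, not the Ganeti code.

```python
import errno


def to_stream(stream, text):
    """Write text plus a newline to stream, ignoring a closed pipe (EPIPE)."""
    try:
        stream.write(text + "\n")
        stream.flush()
    except OSError as err:
        # Only a broken pipe is swallowed; any other I/O error propagates.
        if err.errno != errno.EPIPE:
            raise
```

This covers the common case of piping command output into a pager or "head" that exits early.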
Cluster verify: check for missing bridges
Currently cluster verify doesn't check for bridge information; the only checks are done at instance create and failover/migrate time. This means a cluster that seems healthy will fail creation jobs.
This patch implements a simple verification that all nodes (in the...
TLReplaceDisks: Use implicit loop for dictionary
Release unneeded locks while replacing disks
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation,...
locking: Export “list_owned” from lock manager
This is analogous to “is_owned” and will be used for assertions.
gnt-instance: Fix typo in error message
The iallocator parameter is “-I”, not “-i”.
mlock: fail gracefully if libc.so.6 cannot be loaded
This allows noded to continue instead of blowing up if the libc major number changes.
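The graceful failure can be sketched with ctypes as follows. The helper name load_libc and the log wording are assumptions made for illustration; only ctypes.CDLL and the OSError it raises for a missing library are standard behaviour.

```python
import ctypes
import logging


def load_libc(name="libc.so.6"):
    """Return a handle to the C library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Memory locking becomes unavailable, but the daemon keeps running.
        logging.error("Cannot load %s, memory locking is skipped", name)
        return None
```

A caller then checks for None before attempting any mlockall-style call on the returned handle.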
Allow creating the DRBD metadev in a different VG
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Make _GenerateDRBD8Branch accept different VG names
This is a small change to make this function take a list of VG names, instead of a single one.
Fix WriteFile with unicode data
Unicode is fun, indeed:
>>> len(buffer("abc"))
3
>>> len(buffer(u"abc"))
12
So we can't pass unicode data to buffer(), as the result will be to write the in-memory (usually UTF-32) representation to disk.
Fix for multiple VGs - PlainToDrbd and replace-disks
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and...
Replace disks: keep the meta device in the same VG
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a...
Fix punctuation in an error message
IIRC we don't use punctuation at the end of error messages.
Prevent readding of the master node
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
node1# gnt-node add --readd node1
Cannot communicate with the master daemon....
Improve error messages in cluster verify/OS
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5...
Fix potential data-loss in utils.WriteFile
os.write can do incomplete writes, as long as at least some bytes have been written (like write(2)):
>>> os.write(fd, " " * 1300)
1300
>>> os.write(fd, " " * 1300)...
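The usual remedy for short writes is to loop until every byte has been handed to the kernel. The helper below is a sketch of that technique, not the actual utils.WriteFile change; the name write_all is hypothetical.

```python
import os


def write_all(fd, data):
    """Write all of data (bytes) to fd, retrying after short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write returns how many bytes were actually accepted.
        written += os.write(fd, view[written:])
    return written
```

Slicing a memoryview avoids copying the remaining data on every retry.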
cli: Fix wrong argument kind for groups
Fix typo in LUGroupAssignNodes
gnt-instance info: automatically request locking
Commit dae661a4 added support for controlling the locking, but it didn't modify the gnt-instance info code, which leads to this command always showing:
Wed Apr 20 04:10:48 2011 - WARNING: Non-static data requested, locks...
Remove 20-second sleeps from _ExecMigration
TLMigrateInstance._ExecMigration included two 10-second sleeps for unknown reasons.
This patch removes them.
KVM: reduce 'info migrate' polling period to 1s
Shared storage instance failover
Modify LUFailoverInstance to enable shared storage instances to failover. Shared storage instance failover requires either a target node or an iallocator to determine the target node. If none is given, the cluster default...
KVM: use cache=none for shared disk templates
Disable host cache for externally mirrored disks to avoid cache incoherency. Without this, migrations between the same two nodes may end up in disk corruption.
This is a runtime override of cluster defaults, mostly a workaround....
Rename DTS_NET_MIRROR to DTS_INT_MIRROR
DTS_INT_MIRROR better contrasts DTS_EXT_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: updated patch for changed context]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Instance failover: fix bug for INT_MIRROR cases
Patches db366d9a and aac4511a added support for EXT_MIRROR instances, but inadvertently introduced a bug: for INT_MIRROR cases, we don't need (actually we can't support) either an iallocator or a target node....
Allow disk adoption during disk addition
It is now possible to allow adopting a disk during gnt-instance modify time, as follows:
gnt-instance modify --disk add:adopt=/path/to/disk (blockdev) or
gnt-instance modify --disk add:adopt=<lvname> (plain)
We do the same checks as during instance creation....
Temporary workaround for hail to work with shared storage
Fix parts of shared storage migration
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances, however it broke migration for DTS_EXT_MIRROR instances, by moving iallocator and node checks from LUInstanceMigrate to TLMigrateInstance. This has the side-effect...
Add completion suggestion for more parameters
Add suggestions for disk-, nic-, and backend-parameter completion.
Also alter autotools/build-bash-completion to ignore the new suggestion types for the moment.
Ignore parameter completion for bash completion
Named parameters are only handled by zsh completion for the time being.
Allow KVM to boot from HTTP
New versions of KVM support booting from HTTP-hosted ISO images, via libcurl. This patch adds a proper check to allow defining either a sane, absolute path or an HTTP URL as an iso image path.
Remove "format=raw" from the cdrom device options when iso_image...
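One way such a check could look is sketched below. The function name, the regular expression and the choice to accept only the plain "http://" scheme are assumptions made for illustration; the real validation in the KVM hypervisor code may differ.

```python
import os.path
import re

# Hypothetical pattern: "http://" followed by at least one non-space character.
_HTTP_URL_RE = re.compile(r"^http://\S+$")


def is_valid_iso_location(value):
    """Accept either an absolute filesystem path or an HTTP URL."""
    return bool(_HTTP_URL_RE.match(value)) or os.path.isabs(value)
```

For example, "/srv/iso/debian.iso" and "http://example.com/debian.iso" pass, while "relative/debian.iso" is rejected.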
Prevent ssconf values from having non-string values
For whatever reason, my test cluster managed to acquire shared_file_storage_dir with a None value, instead of empty string. This is not flagged in masterd itself, but the node daemon will fail in writing the value to disk, as it calls len() on the...
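A guard of the kind this commit describes might look like the sketch below, which rejects non-string values before they reach the node daemon. The function name and the error wording are hypothetical and not taken from the patch.

```python
def check_ssconf_values(values):
    """Raise TypeError if any ssconf value is not a string."""
    bad = [key for key, val in sorted(values.items())
           if not isinstance(val, str)]
    if bad:
        raise TypeError("ssconf keys with non-string values: %s"
                        % ", ".join(bad))
    return values
```

Failing early in the master makes the offending key visible in the error instead of surfacing later as a len() failure on None.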
Fix shared_file_storage_dir on upgrades
If the cluster was upgraded from 2.4 or earlier, this key won't exist (it's only set to a correct value on cluster init), so we need to properly set it to a null string (disabled).
Core shared file storage support
This patch introduces core file storage support, consisting of the following:
A configure-time switch for enabling/disabling shared file storage support and controlling the shared file storage location: --with-shared-file-storage-dir=. Shared file storage configuration is then...
Shared file storage initialization code
Add shared file storage handling during cluster initialization.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add DTS_MIRRORED frozenset
Use DTS_MIRRORED to indicate mirrored disk templates that allow migrations/failover.
DTS_MIRRORED is the union of DTS_EXT_MIRROR and DTS_NET_MIRROR.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>...
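The union described in this commit can be sketched with plain frozensets. The template names used as members here are placeholders chosen for illustration, not the values from ganeti.constants; supports_migration is likewise a hypothetical helper.

```python
# Placeholder members: the real sets live in ganeti.constants.
DTS_NET_MIRROR = frozenset(["drbd"])
DTS_EXT_MIRROR = frozenset(["sharedfile", "blockdev"])

# Mirrored templates are the union of both kinds.
DTS_MIRRORED = DTS_NET_MIRROR | DTS_EXT_MIRROR


def supports_migration(disk_template):
    """Return True if the disk template allows migration/failover."""
    return disk_template in DTS_MIRRORED
```

Callers can then test a single set instead of checking both mirror kinds separately.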
Add bdev_sizes RPC call
The bdev_sizes multi-node RPC call returns the sizes of the requested block devices on the desired nodes. Its intended use is to verify the existence of a block device on a given node for shared block storage support.
Block device paths are expected to lie under constants.BLOCKDEV_DIR...
Shared block storage support
This patch introduces basic shared block storage support.
It introduces a new storage backend, bdev.PersistentBlockDevice, to use as a backend for shared block storage. The new bdev requires a new BLOCKDEV_DRIVER_MANUAL constant with the value "manual" and uses it as...
IAllocator changes to work with shared storage
Make cmdlib.IAllocator shared-storage-aware. IAllocator requires secondary nodes only on DTS_NET_MIRROR disk templates and requires no secondaries for DTS_EXT_MIRROR templates.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Migration and failover: add iallocator and target_node slots
Add iallocator and target_node slots to OpMigrateInstance and OpFailoverInstance to facilitate shared-storage-backed instance mobility. Add iallocator slot to OpMigrateNode (no explicit target_node in this case)....
CLI changes to facilitate shared storage migration/failover
Add DST_NODE_OPT to cli.py to use for directly specifying the target node during migration/failover.
gnt-instance failover/migrate also get passed an iallocator option.
gnt-node failover/migrate get only a target_node option....
Shared storage instance migration
Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations for instances with DTS_EXT_MIRROR disk templates.
Migrations of shared storage instances require either a target node, or an iallocator to determine the target node. If none is given, the cluster default...
Shared storage node migration
Modify LUNodeMigrate to provide node migration for nodes with instances using shared storage. gnt-node migrate has to be passed an iallocator for migration of shared storage instances to be performed. When using a shared storage...
Fix master IP activation in failover with no-voting
Thanks to net.for.hub@gmail.com for reporting this. The logic in masterd.CheckMasterd did an early return in case of no_voting, hence skipping the master IP activation. We just change the ifs to not return but simply continue through the function....
disk wiping: fix bug in chunk size computation
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:...
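A sketch of an integer-safe version of such a computation is shown below. The two constants and their values are assumptions, picked so that the cap is reached at 10GiB as the text above implies; they are not copied from ganeti.constants.

```python
MAX_WIPE_CHUNK = 1024        # MiB, assumed upper bound per wipe request
MIN_WIPE_CHUNK_PERCENT = 10  # assumed share of the disk wiped per request


def wipe_chunk_size(disk_size_mib):
    """Return the wipe chunk size in MiB, always as an integer."""
    percent_chunk = disk_size_mib / 100.0 * MIN_WIPE_CHUNK_PERCENT
    # Convert after taking the minimum, so a float can never leak out.
    return int(min(MAX_WIPE_CHUNK, percent_chunk))
```

Without the int() conversion, a disk smaller than 10GiB would yield the float branch of min().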
Fix bug in watcher
If “utils.RunParts” were to raise an exception, a log message was written and the code continued to run. Due to the exception the “results” variable would not be defined.
Also change the code to log a backtrace (getting an exception is rather...
Release locks before wiping disks during instance creation
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel....
utils.WriteFile: Close file before renaming
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154) reported an “Operation not supported” error when writing instance exports to a mounted CIFS filesystem. Experimentation showed the error to only occur when using rename(2) on an opened file. Various references...
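The ordering this commit is about, closing the temporary file before renaming it into place, can be sketched as follows. The helper name write_file_atomically is hypothetical and the code is a simplified illustration rather than the real utils.WriteFile.

```python
import os
import tempfile


def write_file_atomically(path, data):
    """Write data (bytes) to a temporary file, close it, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The descriptor is closed at this point, before rename(2) runs.
        os.rename(tmp_path, path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the target directory keeps the rename on a single filesystem.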
Nicer formatting for group query error
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
Merge branch 'stable-2.4' into devel-2.4
LUInstanceQueryData: Don't acquire locks unless requested
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static”...
Increase the lock timeouts before we block-acquire
This has been observed to cause problems on real clusters via the following mechanism:
- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which...
daemon.py: move startup log message before prep_fn
Before this, the output in the rapi daemon log was:
2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon...
Display the actual memory values in N+1 failures
This changes the display from:
Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail...
ssh.VerifyNodeHostname: remove the quiet flag
This is not needed for this function, and can interfere with debugging of ssh failures.
Fix output for “gnt-job info”
If the result of an opcode was a non-empty dictionary, it would be impossible to differentiate between input and result:
Input fields: […] debug_level: 0 fields: cluster_name,master_node,volume_group_name jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]...
watcher: Fix misleading usage output
When “ganeti-watcher” is called with an argument, it would hint at a non-existing “-f” parameter. With this patch the separate usage string is no longer necessary.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Clarify --force-join parameter message
This isn't only used during cluster merge.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
locking: Fix race condition in lock monitor
In some rare cases it can happen that a lock is re-created very soon after deletion, while the old instance hasn't been destructed yet. In such a case the code would detect a duplicate name and raise an exception....
utils: Export NiceSortKey function
The ability to split a string into a list of strings and integers can be handy elsewhere and is necessary for sorting query results by names.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
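The splitting described above can be sketched with a regular expression that separates digit runs from the rest of the string. This is an illustrative Python 3 version under the hypothetical name nice_sort_key, not the exported utils.NiceSortKey itself.

```python
import re

_DIGITS_RE = re.compile(r"(\d+)")


def nice_sort_key(value):
    """Split value into alternating text and integer parts for sorting."""
    parts = _DIGITS_RE.split(value)
    # With one capturing group, odd positions hold the digit runs.
    return [int(part) if idx % 2 else part
            for idx, part in enumerate(parts)]
```

Used as a sort key, this orders "node2" before "node10", which plain string comparison does not. Because text and numbers always sit at alternating positions, two keys never compare a string against an integer.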