Merge remote-tracking branch 'kwolf/for-anthony' into staging
qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow2: Ignore reserved bits in refcount table entries
qcow2: Ignore reserved bits in check_refcounts
Also don't infer the cluster type directly from the L2 entries, but use qcow2_get_cluster_type() to keep everything in a single place.
qcow2: Version 3 images
This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and check feature bits.
It still misses support for zero clusters, so this is not a fully...
qcow2: Support reading zero clusters
This adds support for reading zero clusters in version 3 images.
qcow2: Support for feature table header extension
Instead of printing an ugly bitmask, qemu can now print a more helpful string even for yet unknown features.
qcow2: Ignore reserved bits in L1/L2 entries
This changes the still existing places that assume that the only flags are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out reserved bits.
It does not convert bdrv_check yet.
qcow2: Refactor qcow2_free_any_clusters
Zero clusters will add another cluster type. Refactor the open-coded cluster type detection into a switch of QCOW2_CLUSTER_* options so that the detection is in a single place. This makes it easier to add new cluster types....
qcow2: Simplify count_cow_clusters
count_cow_clusters() tries to reuse existing functions, and all it achieves is to make things much more complicated than they really are: Everything needs COW, unless it's a normal cluster with refcount 1.
This patch implements the obvious way of doing this, and by using...
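The rule stated above ("everything needs COW, unless it's a normal cluster with refcount 1") can be sketched as follows. This is only an illustration and not the code from the patch: the `cluster_info` struct, the enum values and both function names are hypothetical, and the sketch assumes the cluster type and refcount have already been looked up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cluster types, loosely modelled on the QCOW2_CLUSTER_* names. */
enum cluster_type {
    CLUSTER_UNALLOCATED,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
};

/* Hypothetical, already-decoded view of one cluster. */
struct cluster_info {
    enum cluster_type type;
    uint16_t refcount;
};

/* Everything needs COW, unless it is a normal cluster with refcount 1. */
static bool cluster_needs_cow(const struct cluster_info *c)
{
    return !(c->type == CLUSTER_NORMAL && c->refcount == 1);
}

/* Count how many clusters at the start of the array need COW. */
static int count_leading_cow_clusters(const struct cluster_info *c,
                                      int nb_clusters)
{
    int i;

    assert(nb_clusters >= 0);
    for (i = 0; i < nb_clusters; i++) {
        if (!cluster_needs_cow(&c[i])) {
            break;
        }
    }
    return i;
}
```

Stopping at the first cluster that can be written in place is an assumption of this sketch; the truncated message does not say how the real function delimits the range.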
qcow2: Save disk size in snapshot header
This allows different snapshots of an image to have different sizes, which is a requirement for enabling image resizing even with images that have internal snapshots.
We don't do the actual support for it now, but make sure that the...
qcow2: Ignore reserved bits in get_cluster_offset
With this change, reading from a qcow2 image ignores all reserved bits that are set in an L1 or L2 table entry.
Now get_cluster_offset() assigns *cluster_offset only the offset without any other flags. The cluster type is no longer encoded in the offset,...
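As a rough sketch of what "only the offset without any other flags" means: the flag names below echo QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED from the messages above, but the bit positions, the offset mask value and the helper names are assumptions made for this example rather than text taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of a standard L2 entry (illustrative values). */
#define OFLAG_COPIED      (1ULL << 63)
#define OFLAG_COMPRESSED  (1ULL << 62)
#define L2E_OFFSET_MASK   0x00fffffffffffe00ULL

enum cluster_type {
    CLUSTER_UNALLOCATED,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
};

/* The cluster type is derived separately instead of being read off the offset. */
static enum cluster_type l2_entry_type(uint64_t l2_entry)
{
    if (l2_entry & OFLAG_COMPRESSED) {
        return CLUSTER_COMPRESSED;
    }
    if (!(l2_entry & L2E_OFFSET_MASK)) {
        return CLUSTER_UNALLOCATED;
    }
    return CLUSTER_NORMAL;
}

/* Host offset of a standard (non-compressed) cluster; flags and reserved
 * bits are masked out. Compressed entries use a different layout. */
static uint64_t l2_entry_offset(uint64_t l2_entry)
{
    assert(!(l2_entry & OFLAG_COMPRESSED));
    return l2_entry & L2E_OFFSET_MASK;
}
```

With this split, a reserved bit that happens to be set in an entry changes neither the returned offset nor the detected type.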
qcow2: Ignore reserved bits in count_contiguous_clusters()
Until now, count_contiguous_clusters() had an argument that allowed specifying flags that should be ignored in the comparison, i.e. that are allowed to change between contiguous clusters.
This patch changes the function so that it ignores all flags by default...
qcow2: Fail write_compressed when overwriting data
qcow2_alloc_compressed_cluster_offset() already fails if the copied flag is set, because qcow2_write_compressed() doesn't perform COW as it would have to do to allow this.
However, what we really want to check here is whether the cluster is...
qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
Refcount block allocation and refcount table growth rely on s->free_cluster_index pointing to somewhere after the current allocation. Change qcow2_alloc_cluster_at() to fulfill this assumption....
aio: remove process_queue callback and qemu_aio_process_queue
Both unused after the previous patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nbd: consistently check for <0 or >=0
This prepares for the following patch, which changes -1 return values to negative errno.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nbd: consistently return negative errno values
In the next patch we need to look at the return code of nbd_wr_sync. To avoid percolating the socket_error() ugliness all around, let's handle errors by returning negative errno values.
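The convention described here can be illustrated with a small POSIX-only sketch. `write_all()` is a hypothetical helper, not nbd_wr_sync itself, and it sidesteps the socket_error() portability concern by using plain `write(2)` and `errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer. Returns the number of bytes written on success,
 * or a negative errno value on failure, so callers can test "ret < 0". */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, retry */
            }
            return -errno;      /* e.g. -EBADF, -EPIPE */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Because the error code travels in the return value, a caller several layers up can still tell which error occurred without consulting `errno` again.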
nbd: do not block in nbd_wr_sync if no data at all is available
Right now, nbd_wr_sync will hang if no data at all is available on the socket and the other side is not going to provide any. Relax this by making it loop only for writes or partial reads. This fixes a race...
nbd: avoid out of bounds access to recv_coroutine array
This can happen with a buggy or malicious server.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
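A minimal sketch of the kind of check that prevents such an out-of-bounds access. It assumes, purely for illustration, that the handle sent back by the server is used directly as an index into the array; the `client` struct and `find_waiter()` are hypothetical names, and the real driver may derive the index differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NBD_REQUESTS 16

/* Hypothetical client state: one waiting coroutine slot per request. */
struct client {
    void *recv_coroutine[MAX_NBD_REQUESTS];
};

/* Map a handle from a server reply to the waiting coroutine.
 * Returns NULL if the handle cannot belong to any in-flight request. */
static void *find_waiter(struct client *c, uint64_t handle)
{
    assert(c != NULL);
    if (handle >= MAX_NBD_REQUESTS) {
        return NULL;            /* buggy or malicious server */
    }
    return c->recv_coroutine[handle];
}
```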
qcow2: Fix error handling in qcow2_alloc_cluster_offset
If do_alloc_cluster_offset() fails, the error handling code tried to remove the request from the in-flight queue, to which it wasn't added yet, resulting in a NULL pointer dereference.
m->nb_clusters really only becomes != 0 when the request is in the list....
qcow2: Fix return value of alloc_refcount_block
Someone forgot something in commit 29c1a730... Documenting the right return value is not enough, you also need to actually return it in the code.
This bug sometimes causes error return values even when everything has...
block: Fix spelling in comment (ineffcient -> inefficient)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qed: remove incoming live migration blocker
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qed: honor BDRV_O_INCOMING for incoming live migration
From original commit with Patchwork-id: 31108 by Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
"The QED image format includes a file header bit to mark images dirty. QED normally checks dirty images on open and fixes inconsistent...
qed: add bdrv_invalidate_cache to be called after incoming live migration
The QED image is reopened to flush metadata and check consistency.
block stream: close unused files and update ->backing_hd
Close the now unused images that were part of the previous backing file chain and adjust ->backing_hd, backing_filename and backing_format properly.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=801449...
qed: track dirty flag status
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sheepdog: implement SD_OP_FLUSH_VDI operation
The flush operation is supposed to flush the write-back cache of the sheepdog cluster.
By issuing a flush operation, we can assure the guest that its data has reached the sheepdog cluster storage.
Cc: Kevin Wolf <kwolf@redhat.com>...
sheepdog: fix send req helpers
We should return if reading of the header fails.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>...
block/vpc: write checksum back to footer after check
After the validation check, the 'checksum' is not written back to the footer, which leaves it as zero.
This results in errors while loading it under Microsoft's Hyper-V environment, and also errors from utilities like...
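The failure mode described above (zeroing the checksum field in order to validate it, then forgetting to restore it) can be sketched like this. The footer size, the checksum field offset and the "one's complement of the byte sum" rule are assumptions based on the published VHD format rather than on the patch text, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOOTER_SIZE      512   /* assumed VHD footer size */
#define CHECKSUM_OFFSET  64    /* assumed offset of the 4-byte checksum */

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* One's complement of the sum of all bytes in the buffer. */
static uint32_t footer_checksum(const uint8_t *buf, size_t size)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < size; i++) {
        sum += buf[i];
    }
    return ~sum;
}

/* Validate the footer and leave the stored checksum intact afterwards. */
static bool check_footer(uint8_t footer[FOOTER_SIZE])
{
    uint32_t stored = load_be32(footer + CHECKSUM_OFFSET);
    uint32_t computed;

    memset(footer + CHECKSUM_OFFSET, 0, 4);   /* field counts as zero */
    computed = footer_checksum(footer, FOOTER_SIZE);
    store_be32(footer + CHECKSUM_OFFSET, stored);   /* write it back */
    return stored == computed;
}
```

Without the final `store_be32()` call, any later write of the in-memory footer would carry a zero checksum, which matches the symptom in the message.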
vdi: basic conversion to coroutines
Even a basic conversion changing the bdrv_aio_readv/bdrv_aio_writev calls to bdrv_co_readv/bdrv_co_writev, and callbacks to goto statements can eliminate a lot of code. This is because error handling is simplified and indirections through bottom halves can go away....
vdi: move end-of-I/O handling at the end
The next step is to take code that only triggers after the first operation, and move it to the end of vdi_aio_read_cb and vdi_aio_write_cb.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>...
vdi: merge aio_read_cb and aio_write_cb into callers
Now inline the former AIO callbacks into vdi_co_readv and vdi_co_writev. While many cleanups are possible, the code now really looks synchronous.
vdi: move aiocb fields to locals
Most of the AIOCB really holds local variables that need to persist across callback invocation. It can go away now.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vdi: leave bounce buffering to block layer
vdi.c really works as if it implemented bdrv_read and bdrv_write. However, because only vector I/O is supported by the asynchronous callbacks, it went through extra pain to bounce-buffer the I/O. This can be handled...
vdi: do not create useless iovecs
Reads and writes to the underlying file can also occur with the simple non-vectored I/O interfaces.
vdi: change goto to loop
Finally reindent all code and change goto statements to a loop.
block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running. This causes the L2 caches to become very confused or, alternatively, could cause a segfault when the streaming coroutine is reentered after closing its block device. The fix is to cancel streaming jobs when closing their...
block: set job->speed in block_set_speed
There is no need to do this in every implementation of set_speed (even though there is only one right now).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>...
qed: image fragmentation statistics
qcow2: Remove unused parameter in get_cluster_table()
Since everything goes through the cache, callers don't use the L2 table offset any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
block: push recursive flushing up from drivers
block/curl: Replace usleep by g_usleep
The function usleep is not available for all supported platforms: at least some versions of MinGW don't support it.
usleep was also declared obsolete by POSIX.1-2001.
The function g_usleep is part of glib2.0, so it is available for...
qcow2: Factor out count_cow_clusters
qcow2: Add qcow2_alloc_clusters_at()
This function allows allocating clusters at a given offset in the image file. This is useful if you want to allocate the second part of an area that must be contiguous.
qcow2: Reduce number of I/O requests
If the first part of a write request is allocated, but the second isn't and it can be allocated so that the resulting area is contiguous, handle it at once. This is a common case for sequential writes.
After this patch, alloc_cluster_offset() only checks if the clusters are...
qed: do not evict in-use L2 table cache entries
The L2 table cache reduces QED metadata reads that would be required when translating LBAs to offsets into the image file. Since requests execute in parallel it is possible to share an L2 table between multiple...
qcow2: Add some tracing
qcow2: Add error messages in qcow2_truncate
qemu-img resize has some limitations with qcow2, but the user is only told that "this image format does not support resize". Quite confusing, so add some more detailed error_report() calls and change "this image...
block/vmdk: Fix warning from splint (comparison of unsigned value)
l1_entry_sectors will never be less than 0.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
qcow2: Fix build with DEBUG_EXT enabled
qcow2: Fix offset in qcow2_read_extensions
The spec says that the length of extensions is padded to 8 bytes, not the offset. Currently this is the same because the header size is a multiple of 8, so this is only about compatibility with future changes to the header size....
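The difference between padding the length and padding the offset can be shown with a short calculation. The 8-byte extension header (a 4-byte magic plus a 4-byte length) is an assumption taken from the qcow2 format description, and `next_ext_offset()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the per-extension header: 4-byte magic + 4-byte length. */
#define EXT_HEADER_SIZE 8u

/* Offset of the next header extension, given the offset of the current
 * one and its payload length. Only the length is rounded up to 8 bytes. */
static uint64_t next_ext_offset(uint64_t offset, uint32_t len)
{
    uint64_t padded_len = ((uint64_t)len + 7) & ~(uint64_t)7;

    assert(offset <= UINT64_MAX - EXT_HEADER_SIZE - padded_len);
    return offset + EXT_HEADER_SIZE + padded_len;
}
```

For a start offset that is a multiple of 8, both approaches agree, for example 104 with a 5-byte payload gives 120. For a hypothetical start offset of 100, padding the length gives 116, whereas rounding the resulting offset up would give 120.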
qcow2: Reject too large header extensions
Image files that make qemu-img info read several gigabytes into the unknown header extensions list are bad. Just fail opening the image if an extension claims to be larger than the header extension area.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>...
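A check of this kind might look like the sketch below. The function and parameter names are hypothetical; `area_end` stands for wherever the header extension area ends in a given image, which the message does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Reject an extension whose claimed payload length does not fit between
 * its data offset and the end of the header extension area.
 * Returns 0 if it fits, -EINVAL otherwise. */
static int check_ext_len(uint64_t data_offset, uint64_t area_end,
                         uint32_t len)
{
    /* Caller invariant: we are still inside the extension area. */
    assert(data_offset <= area_end);

    if (len > area_end - data_offset) {
        return -EINVAL;
    }
    return 0;
}
```

Comparing against the remaining space, instead of computing `data_offset + len`, keeps the check free of integer overflow.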
block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
vpc: Add support for Fixed Disk type
The Virtual Hard Disk Image Format Specification allows for three types of hard disk formats, Fixed, Dynamic, and Differencing. Qemu currently only supports Dynamic disks. This patch adds support for the Fixed Disk format....
vpc: Round up image size during fixed image creation
The geometry calculation algorithm from the VHD spec rounds the image size down if it doesn't exactly match a geometry. During image conversion, this causes the image to be truncated. For dynamic images,...
qcow2: Update whole header at once
In order to switch the backing file, qcow2 issued multiple write requests that each changed only a part of the image header. Any failure after the first one would leave the header in a corrupted state. With this patch, the whole header is written at once, so we can't fail in the...
qcow2: Keep unknown header extension when rewriting header
If we want header extensions to work as compatible extensions, we can't destroy yet unknown header extensions when rewriting the header (e.g. for changing the backing file). Save all unknown header extensions in a...
sheepdog: fix co_recv coroutine context
The co_recv coroutine has two things that will try to enter it:
1. The select(2) read callback on the sheepdog socket.
2. The aio_add_request() blocking operations, including a coroutine mutex.
This patch fixes it by setting NULL to co_recv before sending data....
qed: replace is_write with flags field
Per-request attributes like read/write are currently implemented as bool fields in the QEDAIOCB struct. This becomes unwieldy as the number of attributes grows. For example, the qed_aio_setup() function would have...
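The general idea of folding per-request bools into one flags field can be sketched as follows. The struct, flag names and helper are hypothetical stand-ins and do not reproduce the QEDAIOCB definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request attribute bits. */
enum {
    AIOCB_WRITE = 0x0001,   /* request is a write (otherwise a read) */
    AIOCB_ZERO  = 0x0002,   /* request writes zeroes */
};

/* Hypothetical request struct: one flags word instead of several bools. */
struct aiocb {
    int flags;
};

static bool aiocb_is_write(const struct aiocb *acb)
{
    assert(acb != NULL);
    return (acb->flags & AIOCB_WRITE) != 0;
}
```

A setup function then takes a single `flags` argument, so adding a new attribute means adding a bit instead of changing every caller's argument list.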
qed: add .bdrv_co_write_zeroes() support
Zero writes are a dedicated interface for writing regions of zeroes into the image file. If clusters are not yet allocated it is possible to use an efficient metadata representation which keeps the image file compact...
iSCSI: add configuration variables for iSCSI
This patch adds configuration variables for iSCSI to set the initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target, and the username and password for CHAP authentication....
block: add support for partial streaming
Add support for streaming data from an intermediate section of the image chain (see patch and documentation for details).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>...
block/vdi: Zero unused parts when allocating a new block (fix #919242)
The new block was filled with zero when it was allocated by g_malloc0, but when it was reused later and only partially used, data from the previously allocated block were still present and written to the new...
qcow: Return real error code in qcow_open
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow: Use bdrv functions to replace file operation
Since the common file operation functions lack error detection and use many more I/O syscalls, change them to the bdrv series of functions and reduce the number of I/O requests.
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>...
block: add image streaming block job
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block: rate-limit streaming operations
This patch implements rate-limiting for image streaming. If we've exceeded the bandwidth quota for a 100 ms time slice we sleep the coroutine until the next slice begins.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>...
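The slice-based scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the patch: the struct and function names are hypothetical, the current time is passed in explicitly so the logic is easy to test, and the caller is expected to sleep for the returned delay and then retry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_NS 100000000LL    /* 100 ms time slice, in nanoseconds */

/* Hypothetical rate limiter state. */
struct rate_limit {
    int64_t slice_end_ns;       /* when the current slice ends */
    uint64_t slice_quota;       /* units allowed per slice */
    uint64_t dispatched;        /* units already used in this slice */
};

static void rate_limit_init(struct rate_limit *rl, uint64_t quota,
                            int64_t now_ns)
{
    assert(rl != NULL);
    rl->slice_end_ns = now_ns + SLICE_NS;
    rl->slice_quota = quota;
    rl->dispatched = 0;
}

/* Returns 0 if n units may be dispatched now (and accounts for them),
 * otherwise the number of nanoseconds to sleep until the next slice. */
static int64_t rate_limit_delay(struct rate_limit *rl, uint64_t n,
                                int64_t now_ns)
{
    assert(rl != NULL);
    if (now_ns >= rl->slice_end_ns) {
        /* A new slice has begun: reset the accounting. */
        rl->slice_end_ns = now_ns + SLICE_NS;
        rl->dispatched = 0;
    }
    if (rl->dispatched + n > rl->slice_quota) {
        return rl->slice_end_ns - now_ns;
    }
    rl->dispatched += n;
    return 0;
}
```

In this sketch a request larger than the whole quota would never be admitted, so a caller would have to keep individual requests at or below the per-slice quota.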
rbd: wire up snapshot removal and rollback functionality
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block: replace unchecked strdup/malloc/calloc with glib
Most of the codebase has been converted to use glib memory allocation functions. There are still a few instances of malloc/calloc in the block layer and qemu-io. Replace them, especially since they do not...
prepare for future GPLv2+ relicensing
All files under GPLv2 will get GPLv2+ changes starting tomorrow. event_notifier.c and exec-obsolete.h were only ever touched by Red Hat employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>...
vvfat: avoid leaking file descriptor in commit_one_file()
Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
move corking functions to osdep.c
nbd: switch to asynchronous operation
nbd: split requests
qemu-nbd has a limit of slightly less than 1M per request. Work around this in the nbd block driver.
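One way to picture the workaround is a loop that cuts a large request into chunks that stay under the limit. The chunk size of 2040 sectors (just under 1M with 512-byte sectors) is an assumed value for this example, and the callback type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-request cap: 2040 * 512 bytes is slightly less than 1M. */
#define MAX_SECTORS 2040

/* Hypothetical callback that issues one request; returns < 0 on error. */
typedef int (*chunk_fn)(int64_t sector_num, int nb_sectors, void *opaque);

/* Issue a request of any size as a series of requests of at most
 * MAX_SECTORS sectors each. Stops at the first failing chunk. */
static int split_request(int64_t sector_num, int nb_sectors,
                         chunk_fn fn, void *opaque)
{
    assert(nb_sectors >= 0);
    while (nb_sectors > MAX_SECTORS) {
        int ret = fn(sector_num, MAX_SECTORS, opaque);
        if (ret < 0) {
            return ret;
        }
        sector_num += MAX_SECTORS;
        nb_sectors -= MAX_SECTORS;
    }
    return fn(sector_num, nb_sectors, opaque);
}

/* Example callback: counts how many chunks were issued. */
static int count_chunk(int64_t sector_num, int nb_sectors, void *opaque)
{
    (void)sector_num;
    (void)nb_sectors;
    *(int *)opaque += 1;
    return 0;
}
```

With these values, a 5000-sector request is issued as three chunks of 2040, 2040 and 920 sectors.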
nbd: allow multiple in-flight requests
Allow sending up to 16 requests, and drive the replies to the coroutine that did the request. The code is written to be exactly the same as before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex and state)....
nbd: add support for NBD_CMD_FLAG_FUA
nbd: add support for NBD_CMD_FLUSH
nbd: add support for NBD_CMD_TRIM
sheepdog: move coroutine send/recv function to generic code
Outside coroutines, avoid busy waiting on EAGAIN by temporarily making the socket blocking.
The API of qemu_recvv/qemu_sendv is slightly different from do_readv/do_writev because they do not handle coroutines. It...
qcow2: Allow >4 GB VM state
This is a compatible extension to the snapshot header format that allows saving a 64 bit VM state size.
block/cow: Return real error code
block: qemu_aio_get does not return NULL
Initially done with the following semantic patch:
@ rule1 @
expression E;
statement S;
@@
  E = qemu_aio_get (...);
(
- if (E == NULL) { ... }
|
- if (E) { <... S ...> }
)
which however missed occurrences in linux-aio.c and posix-aio-compat.c....
block/qcow2.c: call qcow2_free_snapshots in the function of qcow2_close
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rbd: always set out parameter in qemu_rbd_snap_list
The caller expects psn_tab to be NULL when there are no snapshots or an error occurs. This results in calling g_free on an invalid address.
Reported-by: Oliver Francke <Oliver@filoo.de>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>...
block: bdrv_aio_* do not return NULL
@ rule1 @
expression E;
statement S;
@@
  E =
(
  bdrv_aio_readv
| bdrv_aio_writev
| bdrv_aio_flush
| bdrv_aio_discard
| bdrv_aio_ioctl
)
  (...);
(
- if (E == NULL) { ... }...
fix typo: delete redundant semicolon
Double semicolons should be single.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
coroutine: add qemu_co_queue_restart_all()
It's common to wake up all waiting coroutines. Introduce the qemu_co_queue_restart_all() function to do this instead of looping over qemu_co_queue_next() in every caller.
cow: use bdrv_co_is_allocated()
Now that bdrv_co_is_allocated() is available we can use it instead of the synchronous bdrv_is_allocated() interface. This is a follow-up that Kevin Wolf <kwolf@redhat.com> pointed out after applying the series that introduces bdrv_co_is_allocated()....
qed: convert to .bdrv_co_is_allocated()
The bdrv_qed_is_allocated() function is a synchronous wrapper around qed_find_cluster(), which performs the cluster lookup. In order to convert the synchronous function to a coroutine function we yield instead of using qemu_aio_wait(). Note that QED's cache is already safe...
block: convert qcow2, qcow, and vmdk to .bdrv_co_is_allocated()
The qcow2, qcow, and vmdk block drivers are based on coroutines. They have a coroutine mutex which protects internal state. We can convert the .bdrv_is_allocated() function to .bdrv_co_is_allocated() by holding the mutex...
vvfat: convert to .bdrv_co_is_allocated()
It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vvfat_is_allocated() does not block.
vdi: convert to .bdrv_co_is_allocated()
It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vdi_is_allocated() does not block.
cow: convert to .bdrv_co_is_allocated()
The cow block driver does not keep internal state for cluster lookups. This means it is safe to perform cluster lookups in coroutine context without risk of race conditions that corrupt internal state.
qcow2: Fix order of refcount updates in qcow2_snapshot_goto
The refcount updates must be moved so that in the worst case we can get cluster leaks, but refcounts may never be too low.
qcow2: Fix order in qcow2_snapshot_delete
First the snapshot must be deleted and only then the refcounts can be decreased.
qcow2: Fix error path in qcow2_snapshot_load_tmp
If the bdrv_read() of the snapshot's L1 table fails, return the right error code and make sure that the old L1 table is still loaded and we don't break the BlockDriverState completely.