== General ==

A qcow2 image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.


== Header ==

The first cluster of a qcow2 image contains the file header:

    Byte  0 -  3:   magic
                    QCOW magic string ("QFI\xfb")

          4 -  7:   version
                    Version number (valid values are 2 and 3)

          8 - 15:   backing_file_offset
                    Offset into the image file at which the backing file name
                    is stored (NB: The string is not null terminated). 0 if the
                    image doesn't have a backing file.

         16 - 19:   backing_file_size
                    Length of the backing file name in bytes. Must not be
                    longer than 1023 bytes. Undefined if the image doesn't have
                    a backing file.

         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster size).
                    Must not be less than 9 (i.e. 512 byte clusters).

                    Note: qemu as of today has an implementation limit of 2 MB
                    as the maximum cluster size and won't be able to open images
                    with larger cluster sizes.

         24 - 31:   size
                    Virtual disk size in bytes

         32 - 35:   crypt_method
                    0 for no encryption
                    1 for AES encryption

         36 - 39:   l1_size
                    Number of entries in the active L1 table

         40 - 47:   l1_table_offset
                    Offset into the image file at which the active L1 table
                    starts. Must be aligned to a cluster boundary.

         48 - 55:   refcount_table_offset
                    Offset into the image file at which the refcount table
                    starts. Must be aligned to a cluster boundary.

         56 - 59:   refcount_table_clusters
                    Number of clusters that the refcount table occupies

         60 - 63:   nb_snapshots
                    Number of snapshots contained in the image

         64 - 71:   snapshots_offset
                    Offset into the image file at which the snapshot table
                    starts. Must be aligned to a cluster boundary.

If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.

         72 -  79:  incompatible_features
                    Bitmask of incompatible features. An implementation must
                    fail to open an image if an unknown bit is set.

                    Bits 0-63:  Reserved (set to 0)

         80 -  87:  compatible_features
                    Bitmask of compatible features. An implementation can
                    safely ignore any unknown bits that are set.

                    Bits 0-63:  Reserved (set to 0)

         88 -  95:  autoclear_features
                    Bitmask of auto-clear features. An implementation may only
                    write to an image with unknown auto-clear features if it
                    clears the respective bits from this field first.

                    Bits 0-63:  Reserved (set to 0)

         96 -  99:  refcount_order
                    Describes the width of a reference count block entry (width
                    in bits = 1 << refcount_order). For version 2 images, the
                    order is always assumed to be 4 (i.e. the width is 16 bits).

        100 - 103:  header_length
                    Length of the header structure in bytes. For version 2
                    images, the length is always assumed to be 72 bytes.
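
The field layout above maps directly onto fixed-width big-endian integers. The
following Python sketch shows one way to decode it. The names parse_header,
V2_FORMAT, V3_FORMAT and the derived cluster_size key are illustrative
assumptions, not part of the format, and only the checks stated in the field
descriptions are performed.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

# Bytes 0-71, common to versions 2 and 3 (">" selects big-endian, no padding).
V2_FORMAT = ">4sIQIIQIIQQIIQ"
# Bytes 72-103, present only when the version is 3 or higher.
V3_FORMAT = ">QQQII"

V2_FIELDS = ("magic", "version", "backing_file_offset", "backing_file_size",
             "cluster_bits", "size", "crypt_method", "l1_size",
             "l1_table_offset", "refcount_table_offset",
             "refcount_table_clusters", "nb_snapshots", "snapshots_offset")
V3_FIELDS = ("incompatible_features", "compatible_features",
             "autoclear_features", "refcount_order", "header_length")


def parse_header(buf):
    """Decode the fixed header fields from the first bytes of an image."""
    v2_size = struct.calcsize(V2_FORMAT)  # 72 bytes
    hdr = dict(zip(V2_FIELDS, struct.unpack_from(V2_FORMAT, buf, 0)))
    if hdr["magic"] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if hdr["version"] not in (2, 3):
        raise ValueError("unsupported version")
    if hdr["version"] >= 3:
        hdr.update(zip(V3_FIELDS, struct.unpack_from(V3_FORMAT, buf, v2_size)))
    else:
        # Values the text prescribes for version 2 images.
        hdr.update(dict.fromkeys(V3_FIELDS[:3], 0),
                   refcount_order=4, header_length=72)
    if hdr["cluster_bits"] < 9:
        raise ValueError("cluster_bits must not be less than 9")
    hdr["cluster_size"] = 1 << hdr["cluster_bits"]  # derived, not stored
    return hdr
```

Reading the first 104 bytes of an image and passing them to parse_header is
enough for both versions, since a version 2 header simply ends at byte 71.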

Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

    Byte  0 -  3:   Header extension type:
                        0x00000000 - End of the header extension area
                        0xE2792ACA - Backing file format name
                        0x6803f857 - Feature name table
                        other      - Unknown header extension, can be safely
                                     ignored

          4 -  7:   Length of the header extension data

          8 -  n:   Header extension data

          n -  m:   Padding to round up the header extension size to the next
                    multiple of 8.

Unless stated otherwise, each header extension type shall appear at most once
in the same image.

The remaining space between the end of the header extension area and the end of
the first cluster can be used for the backing file name. It is not allowed to
store other data here, so that an implementation can safely modify the header
and add extensions without harming data of compatible features that it
doesn't support. Compatible features that need space for additional data can
use a header extension.
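
Walking the header extension area amounts to reading a type and a length,
skipping the padded data, and stopping at type 0. A minimal Python sketch
follows. The function name iter_header_extensions and its arguments are
hypothetical, and buf is assumed to hold at least the first cluster of the
image.

```python
import struct


def iter_header_extensions(buf, start):
    """Yield (type, data) pairs from the header extension area.

    `start` is the byte offset directly after the image header, i.e. the
    header_length value (72 for version 2 images).
    """
    pos = start
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        if ext_type == 0:  # end of the header extension area
            return
        data_start = pos + 8
        if data_start + length > len(buf):
            raise ValueError("header extension exceeds the buffer")
        yield ext_type, bytes(buf[data_start:data_start + length])
        # The data is padded so the next extension starts on a multiple of 8.
        pos = data_start + ((length + 7) & ~7)
```

Unknown extension types are yielded like any other, so a caller can ignore
them as the text allows.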


== Feature name table ==

The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.

The number of entries in the feature name table is determined by the length of
the header extension data. Each entry looks like this:

    Byte       0:   Type of feature (select feature bitmap)
                        0: Incompatible feature
                        1: Compatible feature
                        2: Autoclear feature

               1:   Bit number within the selected feature bitmap (valid
                    values: 0-63)

          2 - 47:   Feature name (padded with zeros, but not necessarily null
                    terminated if it has full length)
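
Since every entry is 48 bytes long, the table can be split by simple slicing.
The sketch below is one possible reading of the layout. The function name
parse_feature_name_table is hypothetical, and decoding names as ASCII is an
assumption, because the text does not name a character encoding.

```python
def parse_feature_name_table(data):
    """Split feature name table extension data into 48-byte entries."""
    kinds = {0: "incompatible", 1: "compatible", 2: "autoclear"}
    entries = []
    # Ignore a trailing partial entry, if any.
    for pos in range(0, len(data) - len(data) % 48, 48):
        kind, bit = data[pos], data[pos + 1]
        # The name is zero-padded, but a full-length (46-byte) name has no
        # terminator, so strip the padding instead of searching for a NUL.
        raw_name = data[pos + 2:pos + 48].rstrip(b"\0")
        # ASCII is an assumption; undecodable bytes are replaced.
        name = raw_name.decode("ascii", "replace")
        entries.append((kinds.get(kind, "unknown"), bit, name))
    return entries
```

An application that refuses to open an image because of an unknown
incompatible bit could look the bit up in this list to build its error message.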


== Host cluster management ==

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

The refcounts are managed in a two-level table. The first level is called
refcount table and has a variable size (which is stored in the header). The
refcount table can cover multiple clusters, however it needs to be contiguous
in the image file.

It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.

Given an offset into the image file, the refcount of its cluster can be obtained
as follows:

    refcount_bits = 1 << refcount_order
    refcount_block_entries = (cluster_size * 8 / refcount_bits)

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];

Refcount table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 63:    Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.

Refcount block entry (x = refcount_bits - 1, where refcount_bits is
1 << refcount_order):

    Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
                    sub-byte width, note that bit 0 means the least significant
                    bit in this context.
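
The lookup and the two entry layouts can be combined into a short Python
sketch. It takes the entry width from refcount_order (width in bits =
1 << refcount_order, as given in the header description), which for the
version 2 default of 16 bits yields cluster_size / 2 entries per block. The
function name get_refcount is hypothetical. Holding the whole image in memory
and packing sub-byte entries from the least significant bit upwards are
assumptions made for illustration, and the table index is not checked against
refcount_table_clusters.

```python
def get_refcount(image, offset, cluster_bits, refcount_order,
                 refcount_table_offset):
    """Return the refcount of the host cluster that contains `offset`.

    `image` is the whole image file as a bytes-like object.
    """
    cluster_size = 1 << cluster_bits
    refcount_bits = 1 << refcount_order
    entries_per_block = cluster_size * 8 // refcount_bits

    cluster_index = offset // cluster_size
    block_index = cluster_index % entries_per_block
    table_index = cluster_index // entries_per_block

    # Refcount table entries are 64 bits wide; bits 0-8 are reserved.
    entry_pos = refcount_table_offset + table_index * 8
    entry = int.from_bytes(image[entry_pos:entry_pos + 8], "big")
    block_offset = entry & ~0x1FF
    if block_offset == 0:
        return 0  # block not allocated: all refcounts it manages are 0

    if refcount_bits >= 8:
        width = refcount_bits // 8
        pos = block_offset + block_index * width
        return int.from_bytes(image[pos:pos + width], "big")

    # Sub-byte widths (assumed packing): entry 0 sits in the lowest bits.
    per_byte = 8 // refcount_bits
    byte = image[block_offset + block_index // per_byte]
    shift = (block_index % per_byte) * refcount_bits
    return (byte >> shift) & ((1 << refcount_bits) - 1)
```

A return value of 0 therefore covers both a free cluster and a cluster whose
refcount block was never allocated.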


== Cluster mapping ==

Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. These are called the L1 and L2 tables.

The L1 table has a variable size (stored in the header) and may use multiple
clusters, however it must be contiguous in the image file. L2 tables are
exactly one cluster in size.

Given an offset into the virtual disk, the offset into the image file can be
obtained as follows:

    l2_entries = (cluster_size / sizeof(uint64_t))

    l2_index = (offset / cluster_size) % l2_entries
    l1_index = (offset / cluster_size) / l2_entries

    l2_table = load_cluster(l1_table[l1_index]);
    cluster_offset = l2_table[l2_index];

    return cluster_offset + (offset % cluster_size)
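
As a worked example of the index arithmetic above, the following Python sketch
splits a guest offset into its three components. The function name
split_guest_offset is hypothetical, and the sketch stops before the table
lookups themselves.

```python
def split_guest_offset(offset, cluster_bits):
    """Return (l1_index, l2_index, offset_in_cluster) for a guest offset."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # each L2 entry is a 64-bit value
    cluster_index = offset // cluster_size
    return (cluster_index // l2_entries,
            cluster_index % l2_entries,
            offset % cluster_size)


# With 64 KiB clusters, an L2 table has 8192 entries and maps 512 MiB.
print(split_guest_offset(0x20045678, 16))  # (1, 4, 22136)
```

Guest offset 0x20045678 lies in guest cluster 0x2004 = 8196, which is entry 4
of the second L2 table, and 0x5678 = 22136 bytes into that cluster.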

L1 table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
                    table starts. Must be aligned to a cluster boundary. If the
                    offset is 0, the L2 table and all clusters described by this
                    L2 table are unallocated.

        56 - 62:    Reserved (set to 0)

             63:    0 for an L2 table that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in the active L1 table.

L2 table entry:

    Bit  0 - 61:    Cluster descriptor

             62:    0 for standard clusters
                    1 for compressed clusters

             63:    0 for a cluster that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in L2 tables that are reachable from the active L1
                    table.

Standard Cluster Descriptor:

    Bit       0:    If set to 1, the cluster reads as all zeros. The host
                    cluster offset can be used to describe a preallocation,
                    but it won't be used for reading data from this cluster,
                    nor is data read from the backing file if the cluster is
                    unallocated.

                    With version 2, this is always 0.

         1 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
                    cluster boundary. If the offset is 0, the cluster is
                    unallocated.

        56 - 61:    Reserved (set to 0)
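
The bit layouts of L1 entries and of L2 entries with a standard cluster
descriptor translate into a handful of masks. The Python sketch below is an
illustration only. All constant and function names are hypothetical, and
compressed clusters are rejected rather than decoded.

```python
OFFSET_MASK = ((1 << 56) - 1) & ~0x1FF  # bits 9-55
COPIED = 1 << 63      # refcount is exactly one
COMPRESSED = 1 << 62  # L2 entries only
ZERO = 1 << 0         # standard cluster descriptor, always 0 in version 2


def decode_l1_entry(entry):
    """Split an L1 table entry into the L2 table offset and bit 63."""
    return {"l2_offset": entry & OFFSET_MASK, "copied": bool(entry & COPIED)}


def decode_standard_l2_entry(entry):
    """Decode an L2 entry that carries a standard cluster descriptor."""
    if entry & COMPRESSED:
        raise ValueError("compressed cluster: different descriptor layout")
    host_offset = entry & OFFSET_MASK
    return {
        "host_offset": host_offset,
        "allocated": host_offset != 0,
        "reads_as_zero": bool(entry & ZERO),
        "copied": bool(entry & COPIED),
    }
```

An entry with only bit 0 set decodes as unallocated but reading as zeros, which
is the case where the backing file must not be consulted.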


Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

    Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
                    cluster boundary!

         x - 61:    Compressed size of the images in sectors of 512 bytes

If a cluster is unallocated, read requests shall read the data from the backing
file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
no backing file or the backing file is smaller than the image, they shall read
zeros for all parts that are not covered by the backing file.


== Snapshots ==

qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters is
exposed to the guest.

When creating a snapshot, the L1 table should be copied and the refcount of all
L2 tables and clusters reachable from this L1 table must be increased, so that
a write causes a COW and isn't visible in other snapshots.

When loading a snapshot, bit 63 of all entries in the new active L1 table and
all L2 tables referenced by it must be reconstructed from the refcount table
as it doesn't need to be accurate in inactive L1 tables.

A directory of all snapshots is stored in the snapshot table, a contiguous area
in the image file, whose starting offset and length are given by the header
fields snapshots_offset and nb_snapshots. The entries of the snapshot table
have variable length, depending on the length of ID, name and extra data.

Snapshot table entry:

    Byte 0 -  7:    Offset into the image file at which the L1 table for the
                    snapshot starts. Must be aligned to a cluster boundary.

         8 - 11:    Number of entries in the L1 table of the snapshot

        12 - 13:    Length of the unique ID string describing the snapshot

        14 - 15:    Length of the name of the snapshot

        16 - 19:    Time at which the snapshot was taken in seconds since the
                    Epoch

        20 - 23:    Subsecond part of the time at which the snapshot was taken
                    in nanoseconds

        24 - 31:    Time that the guest was running until the snapshot was
                    taken in nanoseconds

        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
                    If there is VM state, it starts at the first cluster
                    described by the first L1 table entry that doesn't describe
                    a regular guest cluster (i.e. VM state is stored like guest
                    disk content, except that it is stored at offsets that are
                    larger than the virtual disk presented to the guest)

        36 - 39:    Size of extra data in the table entry (used for future
                    extensions of the format)

        variable:   Extra data for future extensions. Unknown fields must be
                    ignored. Currently defined are (offset relative to snapshot
                    table entry):

                    Byte 40 - 47:   Size of the VM state in bytes. 0 if no VM
                                    state is saved. If this field is present,
                                    the 32-bit value in bytes 32-35 is ignored.

                    Byte 48 - 55:   Virtual disk size of the snapshot in bytes

                    Version 3 images must include extra data at least up to
                    byte 55.

        variable:   Unique ID string for the snapshot (not null terminated)

        variable:   Name of the snapshot (not null terminated)
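
A single snapshot table entry can be decoded as a 40-byte fixed part followed
by the three variable-length parts in the order listed above. The Python sketch
below assumes exactly that order. The names FIXED and parse_snapshot_entry are
hypothetical. The sketch decodes one entry only, because the text above does
not state how consecutive entries are aligned.

```python
import struct

FIXED = ">QIHHIIQII"  # bytes 0-39 of a snapshot table entry


def parse_snapshot_entry(buf, pos=0):
    """Decode one snapshot table entry starting at `pos` in `buf`."""
    (l1_offset, l1_size, id_len, name_len, sec, nsec, vm_clock_ns,
     vm_state_size, extra_size) = struct.unpack_from(FIXED, buf, pos)
    extra_start = pos + struct.calcsize(FIXED)  # byte 40 of the entry
    extra = bytes(buf[extra_start:extra_start + extra_size])

    disk_size = None
    if extra_size >= 8:
        # Bytes 40-47: 64-bit VM state size, replaces the 32-bit field.
        vm_state_size = struct.unpack_from(">Q", extra, 0)[0]
    if extra_size >= 16:
        # Bytes 48-55: virtual disk size of the snapshot.
        disk_size = struct.unpack_from(">Q", extra, 8)[0]

    # Any further extra data is unknown and skipped, as the text requires.
    id_start = extra_start + extra_size
    name_start = id_start + id_len
    return {
        "l1_table_offset": l1_offset,
        "l1_size": l1_size,
        "id": bytes(buf[id_start:name_start]),
        "name": bytes(buf[name_start:name_start + name_len]),
        "time": (sec, nsec),
        "vm_clock_ns": vm_clock_ns,
        "vm_state_size": vm_state_size,
        "disk_size": disk_size,
    }
```

The ID and name are returned as raw bytes because neither string is null
terminated and no encoding is specified.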