root / docs / specs / qcow2.txt @ 03feae73
1 | 03feae73 | Kevin Wolf | == General == |
---|---|---|---|
2 | 03feae73 | Kevin Wolf | |
3 | 03feae73 | Kevin Wolf | A qcow2 image file is organized in units of constant size, which are called |
4 | 03feae73 | Kevin Wolf | (host) clusters. A cluster is the unit in which all allocations are done, |
5 | 03feae73 | Kevin Wolf | both for actual guest data and for image metadata. |
6 | 03feae73 | Kevin Wolf | |
7 | 03feae73 | Kevin Wolf | Likewise, the virtual disk as seen by the guest is divided into (guest) |
8 | 03feae73 | Kevin Wolf | clusters of the same size. |
9 | 03feae73 | Kevin Wolf | |
10 | 03feae73 | Kevin Wolf | All numbers in qcow2 are stored in Big Endian byte order. |
11 | 03feae73 | Kevin Wolf | |
12 | 03feae73 | Kevin Wolf | |
13 | 03feae73 | Kevin Wolf | == Header == |
14 | 03feae73 | Kevin Wolf | |
15 | 03feae73 | Kevin Wolf | The first cluster of a qcow2 image contains the file header: |
16 | 03feae73 | Kevin Wolf | |
17 | 03feae73 | Kevin Wolf | Byte 0 - 3: magic |
18 | 03feae73 | Kevin Wolf | QCOW magic string ("QFI\xfb") |
19 | 03feae73 | Kevin Wolf | |
20 | 03feae73 | Kevin Wolf | 4 - 7: version |
21 | 03feae73 | Kevin Wolf | Version number (only valid value is 2) |
22 | 03feae73 | Kevin Wolf | |
23 | 03feae73 | Kevin Wolf | 8 - 15: backing_file_offset |
24 | 03feae73 | Kevin Wolf | Offset into the image file at which the backing file name |
25 | 03feae73 | Kevin Wolf | is stored (NB: The string is not null terminated). 0 if the |
26 | 03feae73 | Kevin Wolf | image doesn't have a backing file. |
27 | 03feae73 | Kevin Wolf | |
28 | 03feae73 | Kevin Wolf | 16 - 19: backing_file_size |
29 | 03feae73 | Kevin Wolf | Length of the backing file name in bytes. Must not be |
30 | 03feae73 | Kevin Wolf | longer than 1023 bytes. Undefined if the image doesn't have |
31 | 03feae73 | Kevin Wolf | a backing file. |
32 | 03feae73 | Kevin Wolf | |
33 | 03feae73 | Kevin Wolf | 20 - 23: cluster_bits |
34 | 03feae73 | Kevin Wolf | Number of bits that are used for addressing an offset |
35 | 03feae73 | Kevin Wolf | within a cluster (1 << cluster_bits is the cluster size). |
36 | 03feae73 | Kevin Wolf | Must not be less than 9 (i.e. 512 byte clusters). |
37 | 03feae73 | Kevin Wolf | |
38 | 03feae73 | Kevin Wolf | Note: qemu as of today has an implementation limit of 2 MB |
39 | 03feae73 | Kevin Wolf | as the maximum cluster size and won't be able to open images |
40 | 03feae73 | Kevin Wolf | with larger cluster sizes. |
41 | 03feae73 | Kevin Wolf | |
42 | 03feae73 | Kevin Wolf | 24 - 31: size |
43 | 03feae73 | Kevin Wolf | Virtual disk size in bytes |
44 | 03feae73 | Kevin Wolf | |
45 | 03feae73 | Kevin Wolf | 32 - 35: crypt_method |
46 | 03feae73 | Kevin Wolf | 0 for no encryption |
47 | 03feae73 | Kevin Wolf | 1 for AES encryption |
48 | 03feae73 | Kevin Wolf | |
49 | 03feae73 | Kevin Wolf | 36 - 39: l1_size |
50 | 03feae73 | Kevin Wolf | Number of entries in the active L1 table |
51 | 03feae73 | Kevin Wolf | |
52 | 03feae73 | Kevin Wolf | 40 - 47: l1_table_offset |
53 | 03feae73 | Kevin Wolf | Offset into the image file at which the active L1 table |
54 | 03feae73 | Kevin Wolf | starts. Must be aligned to a cluster boundary. |
55 | 03feae73 | Kevin Wolf | |
56 | 03feae73 | Kevin Wolf | 48 - 55: refcount_table_offset |
57 | 03feae73 | Kevin Wolf | Offset into the image file at which the refcount table |
58 | 03feae73 | Kevin Wolf | starts. Must be aligned to a cluster boundary. |
59 | 03feae73 | Kevin Wolf | |
60 | 03feae73 | Kevin Wolf | 56 - 59: refcount_table_clusters |
61 | 03feae73 | Kevin Wolf | Number of clusters that the refcount table occupies |
62 | 03feae73 | Kevin Wolf | |
63 | 03feae73 | Kevin Wolf | 60 - 63: nb_snapshots |
64 | 03feae73 | Kevin Wolf | Number of snapshots contained in the image |
65 | 03feae73 | Kevin Wolf | |
66 | 03feae73 | Kevin Wolf | 64 - 71: snapshots_offset |
67 | 03feae73 | Kevin Wolf | Offset into the image file at which the snapshot table |
68 | 03feae73 | Kevin Wolf | starts. Must be aligned to a cluster boundary. |
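
The header layout above maps directly onto a fixed 72-byte big-endian record. The following is a minimal sketch of reading it, not a reference implementation; the names `HEADER_FORMAT`, `Qcow2Header` and `parse_header` are illustrative and do not come from this specification.

```python
import struct
from collections import namedtuple

# ">" selects big-endian with no padding. Field order follows the table above:
# 4s magic, I version, Q backing_file_offset, I backing_file_size,
# I cluster_bits, Q size, I crypt_method, I l1_size, Q l1_table_offset,
# Q refcount_table_offset, I refcount_table_clusters, I nb_snapshots,
# Q snapshots_offset.
HEADER_FORMAT = ">4sIQIIQIIQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 72 bytes

Qcow2Header = namedtuple("Qcow2Header", [
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
])


def parse_header(data):
    """Decode the first 72 bytes of an image and apply the checks stated above."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is truncated")
    header = Qcow2Header._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    if header.version != 2:
        raise ValueError("unsupported version")
    if header.cluster_bits < 9:
        raise ValueError("cluster_bits must not be less than 9")
    return header
```

The cluster size then follows as `1 << header.cluster_bits`.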
69 | 03feae73 | Kevin Wolf | |
70 | 03feae73 | Kevin Wolf | Directly after the image header, optional sections called header extensions can |
71 | 03feae73 | Kevin Wolf | be stored. Each extension has a structure like the following: |
72 | 03feae73 | Kevin Wolf | |
73 | 03feae73 | Kevin Wolf | Byte 0 - 3: Header extension type: |
74 | 03feae73 | Kevin Wolf | 0x00000000 - End of the header extension area |
75 | 03feae73 | Kevin Wolf | 0xE2792ACA - Backing file format name |
76 | 03feae73 | Kevin Wolf | other - Unknown header extension, can be safely |
77 | 03feae73 | Kevin Wolf | ignored |
78 | 03feae73 | Kevin Wolf | |
79 | 03feae73 | Kevin Wolf | 4 - 7: Length of the header extension data |
80 | 03feae73 | Kevin Wolf | |
81 | 03feae73 | Kevin Wolf | 8 - n: Header extension data |
82 | 03feae73 | Kevin Wolf | |
83 | 03feae73 | Kevin Wolf | n - m: Padding to round up the header extension size to the next |
84 | 03feae73 | Kevin Wolf | multiple of 8. |
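
As an illustration of the extension layout above, the sketch below walks the header extension area until it reaches the end marker. It assumes that the area starts at byte 72 (directly after the header fields listed earlier) and that the whole first cluster is available in `data`; the function and constant names are hypothetical.

```python
import struct

EXT_END = 0x00000000             # end of the header extension area
EXT_BACKING_FORMAT = 0xE2792ACA  # backing file format name


def parse_header_extensions(data, offset=72):
    """Return a list of (type, payload) tuples for all header extensions."""
    extensions = []
    while True:
        # Each extension starts with a 4-byte type and a 4-byte data length.
        # struct.error is raised if the buffer ends before the end marker.
        ext_type, length = struct.unpack_from(">II", data, offset)
        if ext_type == EXT_END:
            break
        start = offset + 8
        extensions.append((ext_type, bytes(data[start:start + length])))
        # Skip the data plus the padding up to the next multiple of 8.
        offset = start + ((length + 7) & ~7)
    return extensions
```

Unknown extension types are simply returned alongside the known ones; a caller can ignore any type it does not recognize.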
85 | 03feae73 | Kevin Wolf | |
86 | 03feae73 | Kevin Wolf | The remaining space between the end of the header extension area and the end of |
87 | 03feae73 | Kevin Wolf | the first cluster can be used for other data. Usually, the backing file name is |
88 | 03feae73 | Kevin Wolf | stored there. |
89 | 03feae73 | Kevin Wolf | |
90 | 03feae73 | Kevin Wolf | |
91 | 03feae73 | Kevin Wolf | == Host cluster management == |
92 | 03feae73 | Kevin Wolf | |
93 | 03feae73 | Kevin Wolf | qcow2 manages the allocation of host clusters by maintaining a reference count |
94 | 03feae73 | Kevin Wolf | for each host cluster. A refcount of 0 means that the cluster is free, 1 means |
95 | 03feae73 | Kevin Wolf | that it is used, and >= 2 means that it is used and any write access must |
96 | 03feae73 | Kevin Wolf | perform a COW (copy on write) operation. |
97 | 03feae73 | Kevin Wolf | |
98 | 03feae73 | Kevin Wolf | The refcounts are managed in a two-level table. The first level is called |
99 | 03feae73 | Kevin Wolf | refcount table and has a variable size (which is stored in the header). The |
100 | 03feae73 | Kevin Wolf | refcount table can cover multiple clusters, however it needs to be contiguous |
101 | 03feae73 | Kevin Wolf | in the image file. |
102 | 03feae73 | Kevin Wolf | |
103 | 03feae73 | Kevin Wolf | It contains pointers to the second level structures which are called refcount |
104 | 03feae73 | Kevin Wolf | blocks and are exactly one cluster in size. |
105 | 03feae73 | Kevin Wolf | |
106 | 03feae73 | Kevin Wolf | Given an offset into the image file, the refcount of its cluster can be obtained |
107 | 03feae73 | Kevin Wolf | as follows: |
108 | 03feae73 | Kevin Wolf | |
109 | 03feae73 | Kevin Wolf | refcount_block_entries = (cluster_size / sizeof(uint16_t)) |
110 | 03feae73 | Kevin Wolf | |
111 | 03feae73 | Kevin Wolf | refcount_block_index = (offset / cluster_size) % refcount_block_entries |
112 | 03feae73 | Kevin Wolf | refcount_table_index = (offset / cluster_size) / refcount_block_entries |
113 | 03feae73 | Kevin Wolf | |
114 | 03feae73 | Kevin Wolf | refcount_block = load_cluster(refcount_table[refcount_table_index]); |
115 | 03feae73 | Kevin Wolf | return refcount_block[refcount_block_index]; |
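
The lookup can be sketched in runnable form as follows. This is only an illustration under some assumptions: the whole image is held in memory as `image`, the divisor is the number of 16-bit entries per refcount block, and no bounds check against refcount_table_clusters is made. The function name is hypothetical.

```python
import struct


def get_refcount(image, offset, cluster_bits, refcount_table_offset):
    """Return the refcount of the host cluster that contains `offset`."""
    cluster_size = 1 << cluster_bits
    refcount_block_entries = cluster_size // 2  # entries are 16 bits wide

    cluster_index = offset // cluster_size
    refcount_block_index = cluster_index % refcount_block_entries
    refcount_table_index = cluster_index // refcount_block_entries

    # Refcount table entries are 64-bit big-endian values.
    (table_entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * refcount_table_index)
    block_offset = table_entry & ~0x1FF  # bits 0-8 are reserved
    if block_offset == 0:
        # Refcount block not allocated: all refcounts it covers are 0.
        return 0

    (refcount,) = struct.unpack_from(
        ">H", image, block_offset + 2 * refcount_block_index)
    return refcount
```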
116 | 03feae73 | Kevin Wolf | |
117 | 03feae73 | Kevin Wolf | Refcount table entry: |
118 | 03feae73 | Kevin Wolf | |
119 | 03feae73 | Kevin Wolf | Bit 0 - 8: Reserved (set to 0) |
120 | 03feae73 | Kevin Wolf | |
121 | 03feae73 | Kevin Wolf | 9 - 63: Bits 9-63 of the offset into the image file at which the |
122 | 03feae73 | Kevin Wolf | refcount block starts. Must be aligned to a cluster |
123 | 03feae73 | Kevin Wolf | boundary. |
124 | 03feae73 | Kevin Wolf | |
125 | 03feae73 | Kevin Wolf | If this is 0, the corresponding refcount block has not yet |
126 | 03feae73 | Kevin Wolf | been allocated. All refcounts managed by this refcount block |
127 | 03feae73 | Kevin Wolf | are 0. |
128 | 03feae73 | Kevin Wolf | |
129 | 03feae73 | Kevin Wolf | Refcount block entry: |
130 | 03feae73 | Kevin Wolf | |
131 | 03feae73 | Kevin Wolf | Bit 0 - 15: Reference count of the cluster |
132 | 03feae73 | Kevin Wolf | |
133 | 03feae73 | Kevin Wolf | |
134 | 03feae73 | Kevin Wolf | == Cluster mapping == |
135 | 03feae73 | Kevin Wolf | |
136 | 03feae73 | Kevin Wolf | Just as for refcounts, qcow2 uses a two-level structure for the mapping of |
137 | 03feae73 | Kevin Wolf | guest clusters to host clusters. They are called L1 and L2 tables. |
138 | 03feae73 | Kevin Wolf | |
139 | 03feae73 | Kevin Wolf | The L1 table has a variable size (stored in the header) and may use multiple |
140 | 03feae73 | Kevin Wolf | clusters, however it must be contiguous in the image file. L2 tables are |
141 | 03feae73 | Kevin Wolf | exactly one cluster in size. |
142 | 03feae73 | Kevin Wolf | |
143 | 03feae73 | Kevin Wolf | Given an offset into the virtual disk, the offset into the image file can be |
144 | 03feae73 | Kevin Wolf | obtained as follows: |
145 | 03feae73 | Kevin Wolf | |
146 | 03feae73 | Kevin Wolf | l2_entries = (cluster_size / sizeof(uint64_t)) |
147 | 03feae73 | Kevin Wolf | |
148 | 03feae73 | Kevin Wolf | l2_index = (offset / cluster_size) % l2_entries |
149 | 03feae73 | Kevin Wolf | l1_index = (offset / cluster_size) / l2_entries |
150 | 03feae73 | Kevin Wolf | |
151 | 03feae73 | Kevin Wolf | l2_table = load_cluster(l1_table[l1_index]); |
152 | 03feae73 | Kevin Wolf | cluster_offset = l2_table[l2_index]; |
153 | 03feae73 | Kevin Wolf | |
154 | 03feae73 | Kevin Wolf | return cluster_offset + (offset % cluster_size) |
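
A runnable sketch of this translation is given below. It assumes the image is held in memory as `image`, masks the table entries down to their offset bits (bits 9-55), and returns None for unallocated clusters. Compressed clusters are not handled. All names are illustrative.

```python
import struct

OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1 or L2 entry
COMPRESSED_FLAG = 1 << 62


def guest_to_host(image, guest_offset, cluster_bits, l1_table_offset, l1_size):
    """Map an offset in the virtual disk to an offset in the image file."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # entries are 64 bits wide

    cluster_index = guest_offset // cluster_size
    l2_index = cluster_index % l2_entries
    l1_index = cluster_index // l2_entries
    if l1_index >= l1_size:
        return None  # beyond the range covered by the L1 table

    (l1_entry,) = struct.unpack_from(">Q", image, l1_table_offset + 8 * l1_index)
    l2_table_offset = l1_entry & OFFSET_MASK
    if l2_table_offset == 0:
        return None  # L2 table and all its clusters are unallocated

    (l2_entry,) = struct.unpack_from(">Q", image, l2_table_offset + 8 * l2_index)
    if l2_entry & COMPRESSED_FLAG:
        raise NotImplementedError("compressed clusters are not handled here")
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return None  # cluster is unallocated

    return cluster_offset + (guest_offset % cluster_size)
```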
155 | 03feae73 | Kevin Wolf | |
156 | 03feae73 | Kevin Wolf | L1 table entry: |
157 | 03feae73 | Kevin Wolf | |
158 | 03feae73 | Kevin Wolf | Bit 0 - 8: Reserved (set to 0) |
159 | 03feae73 | Kevin Wolf | |
160 | 03feae73 | Kevin Wolf | 9 - 55: Bits 9-55 of the offset into the image file at which the L2 |
161 | 03feae73 | Kevin Wolf | table starts. Must be aligned to a cluster boundary. If the |
162 | 03feae73 | Kevin Wolf | offset is 0, the L2 table and all clusters described by this |
163 | 03feae73 | Kevin Wolf | L2 table are unallocated. |
164 | 03feae73 | Kevin Wolf | |
165 | 03feae73 | Kevin Wolf | 56 - 62: Reserved (set to 0) |
166 | 03feae73 | Kevin Wolf | |
167 | 03feae73 | Kevin Wolf | 63: 0 for an L2 table that is unused or requires COW, 1 if its |
168 | 03feae73 | Kevin Wolf | refcount is exactly one. This information is only accurate |
169 | 03feae73 | Kevin Wolf | in the active L1 table. |
170 | 03feae73 | Kevin Wolf | |
171 | 03feae73 | Kevin Wolf | L2 table entry (for normal clusters): |
172 | 03feae73 | Kevin Wolf | |
173 | 03feae73 | Kevin Wolf | Bit 0 - 8: Reserved (set to 0) |
174 | 03feae73 | Kevin Wolf | |
175 | 03feae73 | Kevin Wolf | 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a |
176 | 03feae73 | Kevin Wolf | cluster boundary. If the offset is 0, the cluster is |
177 | 03feae73 | Kevin Wolf | unallocated. |
178 | 03feae73 | Kevin Wolf | |
179 | 03feae73 | Kevin Wolf | 56 - 61: Reserved (set to 0) |
180 | 03feae73 | Kevin Wolf | |
181 | 03feae73 | Kevin Wolf | 62: 0 (this cluster is not compressed) |
182 | 03feae73 | Kevin Wolf | |
183 | 03feae73 | Kevin Wolf | 63: 0 for a cluster that is unused or requires COW, 1 if its |
184 | 03feae73 | Kevin Wolf | refcount is exactly one. This information is only accurate |
185 | 03feae73 | Kevin Wolf | in L2 tables that are reachable from the active L1 |
186 | 03feae73 | Kevin Wolf | table. |
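
The bit layout of a normal L2 entry can be decoded with a few masks, as in the sketch below. Bit 0 is taken to be the least significant bit. The names `L2Entry`, `copied` and `decode_l2_entry` are hypothetical, and `host_offset` is only meaningful when the compressed bit is 0.

```python
from collections import namedtuple

L2Entry = namedtuple("L2Entry", ["host_offset", "compressed", "copied"])


def decode_l2_entry(entry):
    """Split a 64-bit L2 table entry into its documented fields."""
    return L2Entry(
        host_offset=entry & 0x00FFFFFFFFFFFE00,  # bits 9-55
        compressed=bool(entry & (1 << 62)),      # bit 62
        copied=bool(entry & (1 << 63)),          # bit 63: refcount is exactly one
    )
```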
187 | 03feae73 | Kevin Wolf | |
188 | 03feae73 | Kevin Wolf | L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)): |
189 | 03feae73 | Kevin Wolf | |
190 | 03feae73 | Kevin Wolf | Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a |
191 | 03feae73 | Kevin Wolf | cluster boundary! |
192 | 03feae73 | Kevin Wolf | |
193 | 03feae73 | Kevin Wolf | x - 61: Compressed size of the images in sectors of 512 bytes |
194 | 03feae73 | Kevin Wolf | |
195 | 03feae73 | Kevin Wolf | 62: 1 (this cluster is compressed using zlib) |
196 | 03feae73 | Kevin Wolf | |
197 | 03feae73 | Kevin Wolf | 63: 0 for a cluster that is unused or requires COW, 1 if its |
198 | 03feae73 | Kevin Wolf | refcount is exactly one. This information is only accurate |
199 | 03feae73 | Kevin Wolf | in L2 tables that are reachable from the active L1 |
200 | 03feae73 | Kevin Wolf | table. |
201 | 03feae73 | Kevin Wolf | |
202 | 03feae73 | Kevin Wolf | If a cluster is unallocated, read requests shall read the data from the backing |
203 | 03feae73 | Kevin Wolf | file. If there is no backing file or the backing file is smaller than the image, |
204 | 03feae73 | Kevin Wolf | they shall read zeros for all parts that are not covered by the backing file. |
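
The fallback rule for unallocated clusters can be sketched as follows. This assumes the backing file content is available as a bytes object (or None if there is no backing file); the function name is hypothetical.

```python
def read_unallocated(backing, offset, length):
    """Read `length` bytes at guest `offset` for an unallocated cluster."""
    if backing is None:
        return bytes(length)  # no backing file: all zeros
    # Slicing past the end of the backing data yields a short (or empty) result.
    data = bytes(backing[offset:offset + length])
    # Zero-fill whatever the backing file does not cover.
    return data + bytes(length - len(data))
```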
205 | 03feae73 | Kevin Wolf | |
206 | 03feae73 | Kevin Wolf | |
207 | 03feae73 | Kevin Wolf | == Snapshots == |
208 | 03feae73 | Kevin Wolf | |
209 | 03feae73 | Kevin Wolf | qcow2 supports internal snapshots. Their basic principle of operation is to |
210 | 03feae73 | Kevin Wolf | switch the active L1 table, so that a different set of host clusters is |
211 | 03feae73 | Kevin Wolf | exposed to the guest. |
212 | 03feae73 | Kevin Wolf | |
213 | 03feae73 | Kevin Wolf | When creating a snapshot, the L1 table should be copied and the refcount of all |
214 | 03feae73 | Kevin Wolf | L2 tables and clusters reachable from this L1 table must be increased, so that |
215 | 03feae73 | Kevin Wolf | a write causes a COW and isn't visible in other snapshots. |
216 | 03feae73 | Kevin Wolf | |
217 | 03feae73 | Kevin Wolf | When loading a snapshot, bit 63 of all entries in the new active L1 table and |
218 | 03feae73 | Kevin Wolf | all L2 tables referenced by it must be reconstructed from the refcount table |
219 | 03feae73 | Kevin Wolf | as it doesn't need to be accurate in inactive L1 tables. |
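
Reconstructing bit 63 for a single L1 or L2 entry could look like the sketch below. It assumes the caller has already looked up the refcount of the cluster the entry points to; the names are illustrative only.

```python
COPIED_FLAG = 1 << 63
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55


def with_reconstructed_flag(entry, refcount):
    """Return `entry` with bit 63 set only if its refcount is exactly one."""
    entry &= ~COPIED_FLAG  # drop the possibly stale flag
    if entry & OFFSET_MASK and refcount == 1:
        entry |= COPIED_FLAG
    return entry
```

Entries with an offset of 0 are unused, so their flag stays 0 regardless of the refcount passed in.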
220 | 03feae73 | Kevin Wolf | |
221 | 03feae73 | Kevin Wolf | A directory of all snapshots is stored in the snapshot table, a contiguous area |
222 | 03feae73 | Kevin Wolf | in the image file, whose starting offset and length are given by the header |
223 | 03feae73 | Kevin Wolf | fields snapshots_offset and nb_snapshots. The entries of the snapshot table |
224 | 03feae73 | Kevin Wolf | have variable length, depending on the length of ID, name and extra data. |
225 | 03feae73 | Kevin Wolf | |
226 | 03feae73 | Kevin Wolf | Snapshot table entry: |
227 | 03feae73 | Kevin Wolf | |
228 | 03feae73 | Kevin Wolf | Byte 0 - 7: Offset into the image file at which the L1 table for the |
229 | 03feae73 | Kevin Wolf | snapshot starts. Must be aligned to a cluster boundary. |
230 | 03feae73 | Kevin Wolf | |
231 | 03feae73 | Kevin Wolf | 8 - 11: Number of entries in the L1 table of the snapshot |
232 | 03feae73 | Kevin Wolf | |
233 | 03feae73 | Kevin Wolf | 12 - 13: Length of the unique ID string describing the snapshot |
234 | 03feae73 | Kevin Wolf | |
235 | 03feae73 | Kevin Wolf | 14 - 15: Length of the name of the snapshot |
236 | 03feae73 | Kevin Wolf | |
237 | 03feae73 | Kevin Wolf | 16 - 19: Time at which the snapshot was taken in seconds since the |
238 | 03feae73 | Kevin Wolf | Epoch |
239 | 03feae73 | Kevin Wolf | |
240 | 03feae73 | Kevin Wolf | 20 - 23: Subsecond part of the time at which the snapshot was taken |
241 | 03feae73 | Kevin Wolf | in nanoseconds |
242 | 03feae73 | Kevin Wolf | |
243 | 03feae73 | Kevin Wolf | 24 - 31: Time that the guest was running until the snapshot was |
244 | 03feae73 | Kevin Wolf | taken in nanoseconds |
245 | 03feae73 | Kevin Wolf | |
246 | 03feae73 | Kevin Wolf | 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. |
247 | 03feae73 | Kevin Wolf | If there is VM state, it starts at the first cluster |
248 | 03feae73 | Kevin Wolf | described by the first L1 table entry that doesn't describe a |
249 | 03feae73 | Kevin Wolf | regular guest cluster (i.e. VM state is stored like guest |
250 | 03feae73 | Kevin Wolf | disk content, except that it is stored at offsets that are |
251 | 03feae73 | Kevin Wolf | larger than the virtual disk presented to the guest) |
252 | 03feae73 | Kevin Wolf | |
253 | 03feae73 | Kevin Wolf | 36 - 39: Size of extra data in the table entry (used for future |
254 | 03feae73 | Kevin Wolf | extensions of the format) |
255 | 03feae73 | Kevin Wolf | |
256 | 03feae73 | Kevin Wolf | variable: Extra data for future extensions. Must be ignored. |
257 | 03feae73 | Kevin Wolf | |
258 | 03feae73 | Kevin Wolf | variable: Unique ID string for the snapshot (not null terminated) |
259 | 03feae73 | Kevin Wolf | |
260 | 03feae73 | Kevin Wolf | variable: Name of the snapshot (not null terminated) |
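
A single snapshot table entry can be decoded as sketched below: a 40-byte fixed part followed by the three variable-length parts in the order listed above. The sketch makes no assumption about alignment between consecutive entries, and it returns the ID and name as raw bytes because no character encoding is specified here. All names are hypothetical.

```python
import struct
from collections import namedtuple

# Q l1 offset, I l1 size, H id length, H name length, I seconds,
# I nanoseconds, Q guest run time, I VM state size, I extra data size.
SNAPSHOT_FIXED_FORMAT = ">QIHHIIQII"
SNAPSHOT_FIXED_SIZE = struct.calcsize(SNAPSHOT_FIXED_FORMAT)  # 40 bytes

Snapshot = namedtuple("Snapshot", [
    "l1_table_offset", "l1_size", "date_sec", "date_nsec",
    "vm_clock_nsec", "vm_state_size", "id", "name",
])


def parse_snapshot_entry(data, offset=0):
    """Decode one entry; return (Snapshot, offset just past the entry)."""
    (l1_table_offset, l1_size, id_len, name_len, date_sec, date_nsec,
     vm_clock_nsec, vm_state_size, extra_size) = struct.unpack_from(
        SNAPSHOT_FIXED_FORMAT, data, offset)

    # Extra data comes first and must be ignored.
    pos = offset + SNAPSHOT_FIXED_SIZE + extra_size
    snapshot_id = bytes(data[pos:pos + id_len])
    pos += id_len
    name = bytes(data[pos:pos + name_len])
    pos += name_len

    snapshot = Snapshot(l1_table_offset, l1_size, date_sec, date_nsec,
                        vm_clock_nsec, vm_state_size, snapshot_id, name)
    return snapshot, pos
```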