== General ==

A qcow2 image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.


== Header ==

The first cluster of a qcow2 image contains the file header:

    Byte  0 -  3:   magic
                    QCOW magic string ("QFI\xfb")

          4 -  7:   version
                    Version number (only valid value is 2)

          8 - 15:   backing_file_offset
                    Offset into the image file at which the backing file name
                    is stored (NB: The string is not null terminated). 0 if the
                    image doesn't have a backing file.

         16 - 19:   backing_file_size
                    Length of the backing file name in bytes. Must not be
                    longer than 1023 bytes. Undefined if the image doesn't have
                    a backing file.

         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster size).
                    Must not be less than 9 (i.e. 512 byte clusters).

                    Note: qemu as of today has an implementation limit of 2 MB
                    as the maximum cluster size and won't be able to open images
                    with larger cluster sizes.

         24 - 31:   size
                    Virtual disk size in bytes

         32 - 35:   crypt_method
                    0 for no encryption
                    1 for AES encryption

         36 - 39:   l1_size
                    Number of entries in the active L1 table

         40 - 47:   l1_table_offset
                    Offset into the image file at which the active L1 table
                    starts. Must be aligned to a cluster boundary.

         48 - 55:   refcount_table_offset
                    Offset into the image file at which the refcount table
                    starts. Must be aligned to a cluster boundary.

         56 - 59:   refcount_table_clusters
                    Number of clusters that the refcount table occupies

         60 - 63:   nb_snapshots
                    Number of snapshots contained in the image

         64 - 71:   snapshots_offset
                    Offset into the image file at which the snapshot table
                    starts. Must be aligned to a cluster boundary.
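
As an illustration only, the header layout above can be decoded with Python's
struct module. This is a minimal sketch, not qemu's implementation: the names
HEADER_FORMAT, Qcow2Header, parse_header and cluster_size are hypothetical, and
only the checks stated in the field descriptions (magic, version, minimum
cluster_bits) are performed.

```python
import struct
from collections import namedtuple

# Field order and widths follow the header table; ">" selects big endian
# without implicit padding, so the format covers exactly bytes 0-71.
HEADER_FORMAT = ">4sIQIIQIIQQIIQ"

Qcow2Header = namedtuple("Qcow2Header", [
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
])


def parse_header(data):
    """Decode and sanity-check the first 72 bytes of an image."""
    header = Qcow2Header._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    if header.version != 2:
        raise ValueError("unsupported version")
    if header.cluster_bits < 9:
        raise ValueError("cluster_bits must be at least 9")
    return header


def cluster_size(header):
    """Cluster size in bytes, i.e. 1 << cluster_bits."""
    return 1 << header.cluster_bits
```

With cluster_bits = 16, cluster_size() yields 65536 bytes.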

Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

    Byte  0 -  3:   Header extension type:
                        0x00000000 - End of the header extension area
                        0xE2792ACA - Backing file format name
                        other      - Unknown header extension, can be safely
                                     ignored

          4 -  7:   Length of the header extension data

          8 -  n:   Header extension data

          n -  m:   Padding to round up the header extension size to the next
                    multiple of 8.

The remaining space between the end of the header extension area and the end of
the first cluster can be used for other data. Usually, the backing file name is
stored there.
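
The header extension area described above could be walked roughly as follows.
This is a sketch under assumptions: the first cluster is available as a bytes
object, the extension area starts right after the 72-byte header, and the
names HEADER_SIZE, EXT_END, EXT_BACKING_FORMAT and read_header_extensions are
made up for the example.

```python
import struct

HEADER_SIZE = 72                 # the fixed header occupies bytes 0-71
EXT_END = 0x00000000             # end of the header extension area
EXT_BACKING_FORMAT = 0xE2792ACA  # backing file format name


def read_header_extensions(data, start=HEADER_SIZE):
    """Return a list of (type, payload) tuples up to the end marker."""
    extensions = []
    pos = start
    while True:
        if pos + 8 > len(data):
            raise ValueError("header extension area is truncated")
        ext_type, length = struct.unpack_from(">II", data, pos)
        if ext_type == EXT_END:
            break
        payload = data[pos + 8:pos + 8 + length]
        extensions.append((ext_type, payload))
        # The data is padded so the next extension starts on a multiple of 8.
        pos += 8 + ((length + 7) & ~7)
    return extensions
```

Unknown extension types are returned like any other, so a caller can simply
skip the ones it does not recognise.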


== Host cluster management ==

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

The refcounts are managed in a two-level table. The first level is called the
refcount table and has a variable size (which is stored in the header). The
refcount table can cover multiple clusters; however, it needs to be contiguous
in the image file.

It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.

Given an offset into the image file, the refcount of its cluster can be
obtained as follows:

    refcount_block_entries = (cluster_size / sizeof(uint16_t))

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];

Refcount table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 63:    Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.

Refcount block entry:

    Bit  0 - 15:    Reference count of the cluster
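
The lookup and the two entry formats above can be combined into a short
Python sketch. It assumes the whole image file is held in memory as bytes and
divides by the number of entries per refcount block for both indices; the
function name get_refcount is hypothetical, and no bounds check against the
size of the refcount table is made.

```python
import struct


def get_refcount(image, offset, cluster_bits, refcount_table_offset):
    """Return the refcount of the host cluster that contains `offset`."""
    cluster_size = 1 << cluster_bits
    refcount_block_entries = cluster_size // 2   # 16-bit entries

    cluster_index = offset // cluster_size
    refcount_block_index = cluster_index % refcount_block_entries
    refcount_table_index = cluster_index // refcount_block_entries

    # Refcount table entries are 64 bits wide; bits 0-8 are reserved.
    (entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * refcount_table_index)
    block_offset = entry & ~0x1FF
    if block_offset == 0:
        # Refcount block not allocated: all refcounts it manages are 0.
        return 0

    (refcount,) = struct.unpack_from(
        ">H", image, block_offset + 2 * refcount_block_index)
    return refcount
```

A result of 0 means the cluster is free, 1 means it is in use, and anything
larger means a write to it has to perform COW first.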


== Cluster mapping ==

Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. They are called the L1 and L2 tables.

The L1 table has a variable size (stored in the header) and may use multiple
clusters; however, it must be contiguous in the image file. L2 tables are
exactly one cluster in size.

Given an offset into the virtual disk, the offset into the image file can be
obtained as follows:

    l2_entries = (cluster_size / sizeof(uint64_t))

    l2_index = (offset / cluster_size) % l2_entries
    l1_index = (offset / cluster_size) / l2_entries

    l2_table = load_cluster(l1_table[l1_index]);
    cluster_offset = l2_table[l2_index];

    return cluster_offset + (offset % cluster_size)
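
As a worked example of the index arithmetic above, the following sketch
splits a guest offset into its L1 index, L2 index and offset within the
cluster. The function name split_guest_offset and the sample values are
chosen for illustration only.

```python
def split_guest_offset(offset, cluster_bits):
    """Return (l1_index, l2_index, offset_in_cluster) for a guest offset."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8      # sizeof(uint64_t) == 8
    cluster_index = offset // cluster_size
    return (cluster_index // l2_entries,
            cluster_index % l2_entries,
            offset % cluster_size)


# With 64 KiB clusters (cluster_bits = 16) an L2 table holds 8192 entries,
# so one L1 entry covers 8192 * 64 KiB = 512 MiB of guest address space.
print(split_guest_offset(0x23456789, 16))   # prints (1, 837, 26505)
```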

L1 table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
                    table starts. Must be aligned to a cluster boundary. If the
                    offset is 0, the L2 table and all clusters described by this
                    L2 table are unallocated.

        56 - 62:    Reserved (set to 0)

             63:    0 for an L2 table that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in the active L1 table.

L2 table entry (for normal clusters):

    Bit  0 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
                    cluster boundary. If the offset is 0, the cluster is
                    unallocated.

        56 - 61:    Reserved (set to 0)

             62:    0 (this cluster is not compressed)

             63:    0 for a cluster that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in L2 tables that are reachable from the active L1
                    table.
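
A possible way to pick apart a normal L2 entry is sketched below. The
constant and function names are hypothetical. Since L1 entries use the same
positions for the offset (bits 9-55) and the flag in bit 63, the same masks
would apply to them, minus the compression check.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00    # bits 9-55
FLAG_COMPRESSED = 1 << 62
FLAG_REFCOUNT_ONE = 1 << 63         # "refcount is exactly one"


def decode_l2_entry(entry):
    """Decode a normal (uncompressed) L2 table entry into a dict."""
    if entry & FLAG_COMPRESSED:
        raise ValueError("entry describes a compressed cluster")
    host_offset = entry & OFFSET_MASK
    return {
        "allocated": host_offset != 0,
        "host_offset": host_offset,
        "refcount_is_one": bool(entry & FLAG_REFCOUNT_ONE),
    }
```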

L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)):

    Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
                    cluster boundary!

       x+1 - 61:    Compressed size of the images in sectors of 512 bytes

             62:    1 (this cluster is compressed using zlib)

             63:    0 for a cluster that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in L2 tables that are reachable from the active L1
                    table.
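
The split point x depends on the cluster size, which makes the compressed
layout easy to get wrong. The sketch below reads x as 62 - (cluster_bits - 8)
and returns the two raw fields exactly as laid out above; it does not
interpret the size field any further. The function name is hypothetical.

```python
def decode_compressed_l2_entry(entry, cluster_bits):
    """Return (host_offset, size_field) of a compressed L2 table entry."""
    if not entry & (1 << 62):
        raise ValueError("entry does not describe a compressed cluster")
    x = 62 - (cluster_bits - 8)
    host_offset = entry & ((1 << (x + 1)) - 1)      # bits 0 - x
    size_bits = 61 - x                              # width of bits x+1 - 61
    size_field = (entry >> (x + 1)) & ((1 << size_bits) - 1)
    return host_offset, size_field
```

For cluster_bits = 16 this gives x = 54, so the offset uses bits 0-54 and the
size field the seven bits 55-61.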

If a cluster is unallocated, read requests shall read the data from the backing
file. If there is no backing file or the backing file is smaller than the image,
they shall read zeros for all parts that are not covered by the backing file.
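
The read rule for unallocated clusters can be sketched as below. For brevity
the backing file is assumed to be available as a bytes object (or None when
the image has no backing file); read_unallocated is a hypothetical helper,
not part of any real API.

```python
def read_unallocated(backing, offset, length):
    """Return `length` bytes for an unallocated guest range."""
    if backing is None:
        return bytes(length)            # no backing file: all zeros
    data = backing[offset:offset + length]
    # Zero-fill whatever lies beyond the end of a shorter backing file.
    return data + bytes(length - len(data))
```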


== Snapshots ==

qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters is
exposed to the guest.

When creating a snapshot, the L1 table should be copied and the refcount of all
L2 tables and clusters reachable from this L1 table must be increased, so that
a write causes a COW and isn't visible in other snapshots.

When loading a snapshot, bit 63 of all entries in the new active L1 table and
all L2 tables referenced by it must be reconstructed from the refcount table
as it doesn't need to be accurate in inactive L1 tables.
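
Reconstructing bit 63 when a snapshot is loaded might look like the following
sketch. It covers L1 entries and normal L2 entries only (compressed entries
use a different offset layout), and it takes the refcount lookup as a
callable argument. All names are hypothetical.

```python
FLAG_REFCOUNT_ONE = 1 << 63
OFFSET_MASK = 0x00FFFFFFFFFFFE00    # bits 9-55 of an L1 or normal L2 entry


def rebuild_bit63(entries, get_refcount):
    """Return a copy of `entries` with bit 63 set iff the refcount is 1.

    `get_refcount` maps a host offset to the refcount of its cluster.
    """
    rebuilt = []
    for entry in entries:
        entry &= ~FLAG_REFCOUNT_ONE         # drop the stale flag
        offset = entry & OFFSET_MASK
        if offset != 0 and get_refcount(offset) == 1:
            entry |= FLAG_REFCOUNT_ONE
        rebuilt.append(entry)
    return rebuilt
```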

A directory of all snapshots is stored in the snapshot table, a contiguous area
in the image file, whose starting offset and length are given by the header
fields snapshots_offset and nb_snapshots. The entries of the snapshot table
have variable length, depending on the length of ID, name and extra data.

Snapshot table entry:

    Byte 0 -  7:    Offset into the image file at which the L1 table for the
                    snapshot starts. Must be aligned to a cluster boundary.

         8 - 11:    Number of entries in the L1 table of the snapshot

        12 - 13:    Length of the unique ID string describing the snapshot

        14 - 15:    Length of the name of the snapshot

        16 - 19:    Time at which the snapshot was taken in seconds since the
                    Epoch

        20 - 23:    Subsecond part of the time at which the snapshot was taken
                    in nanoseconds

        24 - 31:    Time that the guest was running until the snapshot was
                    taken in nanoseconds

        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
                    If there is VM state, it starts at the first cluster
                    described by the first L1 table entry that doesn't describe
                    a regular guest cluster (i.e. VM state is stored like guest
                    disk content, except that it is stored at offsets that are
                    larger than the virtual disk presented to the guest)

        36 - 39:    Size of extra data in the table entry (used for future
                    extensions of the format)

        variable:   Extra data for future extensions. Must be ignored.

        variable:   Unique ID string for the snapshot (not null terminated)

        variable:   Name of the snapshot (not null terminated)
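
A single snapshot table entry as laid out above could be parsed like this.
The sketch assumes the entry is available in a bytes object; the dictionary
keys and the name parse_snapshot_entry are invented for the example. The text
above does not say how consecutive entries are aligned, so the sketch handles
one entry only and reports where it ends.

```python
import struct

SNAPSHOT_FIXED = ">QIHHIIQII"   # fixed part of an entry, bytes 0-39


def parse_snapshot_entry(data, pos=0):
    """Parse one snapshot table entry; return (entry_dict, end_position)."""
    (l1_table_offset, l1_size, id_len, name_len, date_sec, date_nsec,
     vm_clock_nsec, vm_state_size, extra_size) = struct.unpack_from(
        SNAPSHOT_FIXED, data, pos)
    pos += struct.calcsize(SNAPSHOT_FIXED)
    pos += extra_size               # extra data must be ignored
    id_str = data[pos:pos + id_len]
    pos += id_len
    name = data[pos:pos + name_len]
    pos += name_len
    entry = {
        "l1_table_offset": l1_table_offset,
        "l1_size": l1_size,
        "date_sec": date_sec,
        "date_nsec": date_nsec,
        "vm_clock_nsec": vm_clock_nsec,
        "vm_state_size": vm_state_size,
        "id": id_str,               # not null terminated
        "name": name,               # not null terminated
    }
    return entry, pos
```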