== General ==

A qcow2 image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.
    
    
== Header ==

The first cluster of a qcow2 image contains the file header:

    Byte  0 -  3:   magic
                    QCOW magic string ("QFI\xfb")

          4 -  7:   version
                    Version number (valid values are 2 and 3)

          8 - 15:   backing_file_offset
                    Offset into the image file at which the backing file name
                    is stored (NB: The string is not null terminated). 0 if the
                    image doesn't have a backing file.

         16 - 19:   backing_file_size
                    Length of the backing file name in bytes. Must not be
                    longer than 1023 bytes. Undefined if the image doesn't have
                    a backing file.

         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster size).
                    Must not be less than 9 (i.e. 512 byte clusters).

                    Note: qemu as of today has an implementation limit of 2 MB
                    as the maximum cluster size and won't be able to open images
                    with larger cluster sizes.

         24 - 31:   size
                    Virtual disk size in bytes

         32 - 35:   crypt_method
                    0 for no encryption
                    1 for AES encryption

         36 - 39:   l1_size
                    Number of entries in the active L1 table

         40 - 47:   l1_table_offset
                    Offset into the image file at which the active L1 table
                    starts. Must be aligned to a cluster boundary.

         48 - 55:   refcount_table_offset
                    Offset into the image file at which the refcount table
                    starts. Must be aligned to a cluster boundary.

         56 - 59:   refcount_table_clusters
                    Number of clusters that the refcount table occupies

         60 - 63:   nb_snapshots
                    Number of snapshots contained in the image

         64 - 71:   snapshots_offset
                    Offset into the image file at which the snapshot table
                    starts. Must be aligned to a cluster boundary.
    
If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.

         72 -  79:  incompatible_features
                    Bitmask of incompatible features. An implementation must
                    fail to open an image if an unknown bit is set.

                    Bits 0-63:  Reserved (set to 0)

         80 -  87:  compatible_features
                    Bitmask of compatible features. An implementation can
                    safely ignore any unknown bits that are set.

                    Bits 0-63:  Reserved (set to 0)

         88 -  95:  autoclear_features
                    Bitmask of auto-clear features. An implementation may only
                    write to an image with unknown auto-clear features if it
                    clears the respective bits from this field first.

                    Bits 0-63:  Reserved (set to 0)

         96 -  99:  refcount_order
                    Describes the width of a reference count block entry (width
                    in bits: refcount_bits = 1 << refcount_order). For version 2
                    images, the order is always assumed to be 4 (i.e. the width
                    is 16 bits).

        100 - 103:  header_length
                    Length of the header structure in bytes. For version 2
                    images, the length is always assumed to be 72 bytes.
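The header layout above can be decoded with Python's struct module, where ">" selects big-endian byte order and standard sizes with no padding, so the version 2 fields take exactly 72 bytes and the version 3 additions another 32. This is a minimal sketch, not a reference parser: the names Qcow2Header, parse_header and backing_file_name are hypothetical, the backing file name is assumed to be UTF-8, and because this revision defines no feature bits, any set incompatible bit is treated as unknown.

```python
import struct
from dataclasses import dataclass
from typing import Optional

QCOW2_MAGIC = b"QFI\xfb"
V2_FIELDS = ">4sIQIIQIIQQIIQ"   # bytes 0-71, big endian, no padding
V3_FIELDS = ">QQQII"            # bytes 72-103
KNOWN_INCOMPATIBLE = 0          # assumption: no incompatible bits defined yet

@dataclass
class Qcow2Header:
    version: int
    backing_file_offset: int
    backing_file_size: int
    cluster_bits: int
    size: int
    crypt_method: int
    l1_size: int
    l1_table_offset: int
    refcount_table_offset: int
    refcount_table_clusters: int
    nb_snapshots: int
    snapshots_offset: int
    incompatible_features: int = 0
    compatible_features: int = 0
    autoclear_features: int = 0
    refcount_order: int = 4     # version 2 default (16-bit refcounts)
    header_length: int = 72     # version 2 default

    @property
    def cluster_size(self) -> int:
        return 1 << self.cluster_bits

def parse_header(buf: bytes) -> Qcow2Header:
    fields = struct.unpack_from(V2_FIELDS, buf, 0)
    if fields[0] != QCOW2_MAGIC:
        raise ValueError("bad magic")
    if fields[1] not in (2, 3):
        raise ValueError(f"unsupported version {fields[1]}")
    hdr = Qcow2Header(*fields[1:])
    if hdr.cluster_bits < 9:
        raise ValueError("cluster_bits must be at least 9")
    if hdr.version >= 3:
        (hdr.incompatible_features, hdr.compatible_features,
         hdr.autoclear_features, hdr.refcount_order,
         hdr.header_length) = struct.unpack_from(V3_FIELDS, buf, 72)
        # Refuse to open the image if any unknown incompatible bit is set.
        if hdr.incompatible_features & ~KNOWN_INCOMPATIBLE:
            raise ValueError("unknown incompatible feature bit set")
    return hdr

def backing_file_name(image: bytes, hdr: Qcow2Header) -> Optional[str]:
    if hdr.backing_file_offset == 0:
        return None                      # image has no backing file
    if hdr.backing_file_size > 1023:
        raise ValueError("backing file name too long")
    start = hdr.backing_file_offset
    raw = image[start:start + hdr.backing_file_size]   # not null terminated
    return raw.decode("utf-8")           # assumption: UTF-8 encoded name
```

Callers would typically read the first cluster, call parse_header, and then use hdr.cluster_size and the table offsets to locate the remaining metadata.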
    
Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

    Byte  0 -  3:   Header extension type:
                        0x00000000 - End of the header extension area
                        0xE2792ACA - Backing file format name
                        0x6803f857 - Feature name table
                        other      - Unknown header extension, can be safely
                                     ignored

          4 -  7:   Length of the header extension data

          8 -  n:   Header extension data

          n -  m:   Padding to round up the header extension size to the next
                    multiple of 8.

Unless stated otherwise, each header extension type shall appear at most once
in the same image.
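A minimal sketch of walking the header extension area, assuming buf holds the start of the image (at least the first cluster) and header_length comes from the header (72 for version 2 images). The function and constant names are hypothetical; each data length is rounded up to a multiple of 8 to find the next extension.

```python
import struct
from typing import Iterator, Tuple

EXT_END = 0x00000000
EXT_BACKING_FORMAT = 0xE2792ACA
EXT_FEATURE_TABLE = 0x6803F857

def iter_header_extensions(buf: bytes, header_length: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, data) for each header extension after the fixed header."""
    offset = header_length
    while offset + 8 <= len(buf):
        ext_type, ext_len = struct.unpack_from(">II", buf, offset)
        if ext_type == EXT_END:
            return                                   # end of the extension area
        data_start = offset + 8
        data = buf[data_start:data_start + ext_len]
        if len(data) != ext_len:
            raise ValueError("header extension runs past the end of the buffer")
        yield ext_type, data
        # Skip the data plus padding up to the next multiple of 8 bytes.
        offset = data_start + ((ext_len + 7) & ~7)
    raise ValueError("end-of-extensions marker not found")
```

A reader would act on EXT_BACKING_FORMAT and EXT_FEATURE_TABLE entries and skip any other type, which the format permits.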
121

    
The remaining space between the end of the header extension area and the end of
the first cluster can be used for the backing file name. It is not allowed to
store other data here, so that an implementation can safely modify the header
and add extensions without harming data of compatible features that it
doesn't support. Compatible features that need space for additional data can
use a header extension.
    
    
== Feature name table ==

The feature name table is an optional header extension that contains the names
of features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.

The number of entries in the feature name table is determined by the length of
the header extension data. Each entry looks like this:

    Byte       0:   Type of feature (select feature bitmap)
                        0: Incompatible feature
                        1: Compatible feature
                        2: Autoclear feature

               1:   Bit number within the selected feature bitmap (valid
                    values: 0-63)

          2 - 47:   Feature name (padded with zeros, but not necessarily null
                    terminated if it has full length)
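Each feature name table entry is 48 bytes (1 + 1 + 46), so decoding the extension payload, and turning set bits into a readable error message, could be sketched as follows. The function names are hypothetical, and decoding names as UTF-8 is an assumption; the table itself only specifies zero padding.

```python
import struct
from typing import Dict, List, Tuple

FEATURE_TYPES = {0: "incompatible", 1: "compatible", 2: "autoclear"}

def parse_feature_name_table(data: bytes) -> Dict[Tuple[str, int], str]:
    """Map (feature type, bit number) to a feature name; entries are 48 bytes."""
    if len(data) % 48:
        raise ValueError("length is not a multiple of the 48-byte entry size")
    names = {}
    for off in range(0, len(data), 48):
        ftype, bit, raw = struct.unpack_from(">BB46s", data, off)
        # Zero padded, but a full-length name has no terminating zero byte.
        name = raw.split(b"\0", 1)[0].decode("utf-8", "replace")
        names[(FEATURE_TYPES.get(ftype, str(ftype)), bit)] = name
    return names

def describe_set_bits(bitmap: int, ftype: str,
                      names: Dict[Tuple[str, int], str]) -> List[str]:
    """Human-readable names for every set bit, falling back to the bit number."""
    return [names.get((ftype, bit), f"unknown {ftype} feature bit {bit}")
            for bit in range(64) if bitmap >> bit & 1]
```

An implementation refusing to open an image could pass the unknown part of incompatible_features to describe_set_bits to build its error message.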
    
    
== Host cluster management ==

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

The refcounts are managed in a two-level table. The first level is called the
refcount table and has a variable size (which is stored in the header). The
refcount table can span multiple clusters, but it must be contiguous in the
image file.

It contains pointers to the second level structures, which are called refcount
blocks and are exactly one cluster in size.
    
Given an offset into the image file, the refcount of its cluster can be obtained
as follows:

    refcount_bits = 1 << refcount_order
    refcount_block_entries = (cluster_size * 8 / refcount_bits)

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];
    
Refcount table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 63:    Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.

Refcount block entry (x = refcount_bits - 1, where refcount_bits = 1 << refcount_order):

    Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
                    sub-byte width, note that bit 0 means the least significant
                    bit in this context.
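Putting the lookup and the two entry formats together, the following Python sketch reads a refcount directly from an in-memory image. It is illustrative only: the function names are hypothetical, bounds checks against refcount_table_clusters are omitted, and for sub-byte widths it assumes, in line with QEMU's implementation, that entry 0 occupies the least significant bits of the first byte.

```python
import struct

def read_refcount_entry(block: bytes, index: int, refcount_bits: int) -> int:
    """Return entry `index` of a refcount block whose entries are refcount_bits wide."""
    if refcount_bits >= 8:
        width = refcount_bits // 8
        start = index * width
        return int.from_bytes(block[start:start + width], "big")   # big-endian integer
    # Sub-byte widths: assume entry 0 sits in the least significant bits of byte 0.
    bit_pos = index * refcount_bits
    return (block[bit_pos // 8] >> (bit_pos % 8)) & ((1 << refcount_bits) - 1)

def get_refcount(image: bytes, refcount_table_offset: int, cluster_bits: int,
                 refcount_order: int, offset: int) -> int:
    """Refcount of the host cluster containing image-file `offset`."""
    cluster_size = 1 << cluster_bits
    refcount_bits = 1 << refcount_order
    refcount_block_entries = cluster_size * 8 // refcount_bits
    refcount_block_index = (offset // cluster_size) % refcount_block_entries
    refcount_table_index = (offset // cluster_size) // refcount_block_entries

    (table_entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * refcount_table_index)
    block_offset = table_entry & ~0x1FF          # bits 0-8 are reserved
    if block_offset == 0:
        return 0                                 # block not allocated: all refcounts are 0
    block = image[block_offset:block_offset + cluster_size]
    return read_refcount_entry(block, refcount_block_index, refcount_bits)
```

With the default refcount_order of 4, each 64 KB refcount block holds 32768 entries and therefore covers 2 GB of image file.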
    
    
== Cluster mapping ==

Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. These are called the L1 and L2 tables.

The L1 table has a variable size (stored in the header) and may span multiple
clusters, but it must be contiguous in the image file. L2 tables are exactly
one cluster in size.

Given an offset into the virtual disk, the offset into the image file can be
obtained as follows:

    l2_entries = (cluster_size / sizeof(uint64_t))

    l2_index = (offset / cluster_size) % l2_entries
    l1_index = (offset / cluster_size) / l2_entries

    l2_table = load_cluster(l1_table[l1_index]);
    cluster_offset = l2_table[l2_index];

    return cluster_offset + (offset % cluster_size)
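The pseudocode glosses over the flag bits stored alongside the offsets. A runnable Python version, sketched under the assumption that the whole image is available as a bytes object, masks L1 and L2 entries down to bits 9-55 before following them. The names are illustrative, and compressed and zero-flagged clusters are deliberately rejected rather than decoded.

```python
import struct
from typing import Optional

OFFSET_MASK = 0x00FFFFFFFFFFFE00    # bits 9-55 of L1 and standard L2 entries
COMPRESSED = 1 << 62                # L2 entry bit 62
ZERO_FLAG = 1                       # bit 0 of a standard cluster descriptor

def guest_to_host(image: bytes, l1_table_offset: int, l1_size: int,
                  cluster_bits: int, offset: int) -> Optional[int]:
    """Translate a guest offset; return None if the cluster is unallocated."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8               # one uint64_t per entry
    l2_index = (offset // cluster_size) % l2_entries
    l1_index = (offset // cluster_size) // l2_entries
    if l1_index >= l1_size:
        raise ValueError("offset is not covered by the L1 table")

    (l1_entry,) = struct.unpack_from(">Q", image, l1_table_offset + 8 * l1_index)
    l2_table_offset = l1_entry & OFFSET_MASK     # drop flag bit 63
    if l2_table_offset == 0:
        return None                              # no L2 table: unallocated
    (l2_entry,) = struct.unpack_from(">Q", image, l2_table_offset + 8 * l2_index)
    if l2_entry & COMPRESSED or l2_entry & ZERO_FLAG:
        raise NotImplementedError("compressed and zero clusters are not handled here")
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return None                              # cluster unallocated
    return cluster_offset + offset % cluster_size
```

With 64 KB clusters an L2 table has 8192 entries, so each L1 entry maps 512 MB of the virtual disk.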
    
L1 table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
                    table starts. Must be aligned to a cluster boundary. If the
                    offset is 0, the L2 table and all clusters described by this
                    L2 table are unallocated.

        56 - 62:    Reserved (set to 0)

             63:    0 for an L2 table that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in the active L1 table.

L2 table entry:

    Bit  0 -  61:   Cluster descriptor

              62:   0 for standard clusters
                    1 for compressed clusters

              63:   0 for a cluster that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in L2 tables that are reachable from the active L1
                    table.
    
Standard Cluster Descriptor:

    Bit       0:    If set to 1, the cluster reads as all zeros. The host
                    cluster offset can be used to describe a preallocation,
                    but it won't be used for reading data from this cluster,
                    nor is data read from the backing file if the cluster is
                    unallocated.

                    With version 2, this is always 0.

         1 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
                    cluster boundary. If the offset is 0, the cluster is
                    unallocated.

        56 - 61:    Reserved (set to 0)

Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

    Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
                    cluster boundary!

       x+1 - 61:    Compressed size of the images in sectors of 512 bytes
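A sketch of classifying a raw 64-bit L2 entry according to the descriptors above. The compressed case is tested first, because in a compressed descriptor bit 0 belongs to the host offset rather than being the zero flag. L2Entry and decode_l2_entry are hypothetical names, and the compressed size field is returned raw rather than converted to bytes.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55

@dataclass
class L2Entry:
    kind: str                               # "unallocated", "zero", "standard" or "compressed"
    host_offset: Optional[int] = None
    compressed_size_field: Optional[int] = None
    copied: bool = False                    # bit 63: refcount is exactly one

def decode_l2_entry(entry: int, cluster_bits: int) -> L2Entry:
    copied = bool(entry >> 63 & 1)
    if entry >> 62 & 1:
        # Compressed descriptor: host offset in bits 0..x, size in bits x+1..61.
        x = 62 - (cluster_bits - 8)
        host_offset = entry & ((1 << (x + 1)) - 1)
        size_field = (entry >> (x + 1)) & ((1 << (61 - x)) - 1)
        return L2Entry("compressed", host_offset, size_field, copied)
    host_offset = entry & STANDARD_OFFSET_MASK
    if entry & 1:
        # Reads as zeros; a nonzero offset only describes a preallocation.
        return L2Entry("zero", host_offset or None, None, copied)
    if host_offset == 0:
        return L2Entry("unallocated", None, None, copied)
    return L2Entry("standard", host_offset, None, copied)
```

With 64 KB clusters (cluster_bits = 16), x = 54, so the host offset of a compressed cluster occupies bits 0-54 and the size field bits 55-61.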
    
If a cluster is unallocated, read requests shall read the data from the backing
file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
no backing file or the backing file is smaller than the image, they shall read
zeros for all parts that are not covered by the backing file.
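The fallback rule for unallocated clusters reduces to a short helper. In this sketch, backing is a hypothetical stand-in for the backing file's contents (None when there is no backing file), and read_unallocated is an illustrative name. Clusters with the zero flag set must not go through it, since they read as zeros without consulting the backing file.

```python
from typing import Optional

def read_unallocated(backing: Optional[bytes], guest_offset: int, length: int) -> bytes:
    """Bytes for an unallocated guest range: backing data where covered, zeros elsewhere."""
    if backing is None:
        return bytes(length)                     # no backing file: all zeros
    covered = backing[guest_offset:guest_offset + length]   # may be short or empty
    return covered + bytes(length - len(covered))           # zero-fill past its end
```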
    
    
== Snapshots ==

qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters is
exposed to the guest.

When creating a snapshot, the L1 table should be copied and the refcount of all
L2 tables and clusters reachable from this L1 table must be increased, so that
a write causes a COW and isn't visible in other snapshots.
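The refcount updates for snapshot creation can be modelled in memory. This is a toy sketch, not QEMU's code: l2_tables maps an L2 table's host offset to a list of its entries, refcounts maps host cluster offsets to counts, compressed clusters are assumed absent, and writing the new L1 table and snapshot table entry to disk is left out. Because every affected refcount rises to at least 2, the sketch also clears bit 63 in the active tables, which follows from that bit's definition.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of L1 and standard L2 entries
COPIED = 1 << 63                   # "refcount is exactly one" flag

def create_snapshot(active_l1, l2_tables, refcounts):
    """Bump refcounts for everything reachable from active_l1 and return
    the snapshot's copy of the L1 table (toy in-memory model)."""
    for i, l1_entry in enumerate(active_l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue                                   # no L2 table allocated
        refcounts[l2_offset] = refcounts.get(l2_offset, 0) + 1
        l2 = l2_tables[l2_offset]
        for j, l2_entry in enumerate(l2):
            data_offset = l2_entry & OFFSET_MASK
            if data_offset == 0:
                continue                               # unallocated cluster
            refcounts[data_offset] = refcounts.get(data_offset, 0) + 1
            l2[j] = l2_entry & ~COPIED                 # shared now: writes must COW
        active_l1[i] = l1_entry & ~COPIED
    return list(active_l1)
```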
    
When loading a snapshot, bit 63 of all entries in the new active L1 table and
all L2 tables referenced by it must be reconstructed from the refcount table,
because this bit doesn't need to be accurate in inactive L1 tables.
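Reconstructing bit 63 after loading a snapshot amounts to setting it exactly where the refcount is 1. The same toy in-memory model is assumed (dicts for L2 tables and refcounts), rebuild_copied_flags is a hypothetical name, and compressed L2 entries are simply left untouched.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 of L1 and standard L2 entries
COMPRESSED = 1 << 62
COPIED = 1 << 63                   # set iff the refcount is exactly one

def rebuild_copied_flags(l1, l2_tables, refcounts):
    """Recompute bit 63 in the new active L1 table and the L2 tables it references."""
    for i, l1_entry in enumerate(l1):
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue                                   # no L2 table allocated
        flag = COPIED if refcounts.get(l2_offset, 0) == 1 else 0
        l1[i] = (l1_entry & ~COPIED) | flag
        l2 = l2_tables[l2_offset]
        for j, l2_entry in enumerate(l2):
            if l2_entry & COMPRESSED:
                continue                               # not modelled here
            data_offset = l2_entry & OFFSET_MASK
            if data_offset == 0:
                continue                               # unallocated cluster
            flag = COPIED if refcounts.get(data_offset, 0) == 1 else 0
            l2[j] = (l2_entry & ~COPIED) | flag
```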
    
A directory of all snapshots is stored in the snapshot table, a contiguous area
in the image file, whose starting offset and number of entries are given by the
header fields snapshots_offset and nb_snapshots. The entries of the snapshot
table have variable length, depending on the length of ID, name and extra data.

Snapshot table entry:

    Byte 0 -  7:    Offset into the image file at which the L1 table for the
                    snapshot starts. Must be aligned to a cluster boundary.

         8 - 11:    Number of entries in the L1 table of the snapshot

        12 - 13:    Length of the unique ID string describing the snapshot

        14 - 15:    Length of the name of the snapshot

        16 - 19:    Time at which the snapshot was taken in seconds since the
                    Epoch

        20 - 23:    Subsecond part of the time at which the snapshot was taken
                    in nanoseconds

        24 - 31:    Time that the guest was running until the snapshot was
                    taken in nanoseconds

        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
                    If there is VM state, it starts at the first cluster
                    described by the first L1 table entry that doesn't describe
                    a regular guest cluster (i.e. VM state is stored like guest
                    disk content, except that it is stored at offsets that are
                    larger than the virtual disk presented to the guest)

        36 - 39:    Size of extra data in the table entry (used for future
                    extensions of the format)

        variable:   Extra data for future extensions. Unknown fields must be
                    ignored. Currently defined are (offset relative to snapshot
                    table entry):

                    Byte 40 - 47:   Size of the VM state in bytes. 0 if no VM
                                    state is saved. If this field is present,
                                    the 32-bit value in bytes 32-35 is ignored.

                    Byte 48 - 55:   Virtual disk size of the snapshot in bytes

                    Version 3 images must include extra data at least up to
                    byte 55.

        variable:   Unique ID string for the snapshot (not null terminated)

        variable:   Name of the snapshot (not null terminated)
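A sketch of decoding one snapshot table entry, plus the arithmetic for where VM state begins (the first L1 entry whose range lies wholly past the virtual disk). The names are hypothetical, the ID and name are assumed to be UTF-8, and because this text does not describe any padding between entries, the returned end offset includes none.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

SNAPSHOT_FIXED = ">QIHHIIQII"   # bytes 0-39 of an entry, big endian

@dataclass
class SnapshotEntry:
    l1_table_offset: int
    l1_size: int
    date_sec: int
    date_nsec: int
    vm_clock_nsec: int
    vm_state_size: int
    disk_size: Optional[int]      # None if the extra data is too short
    snapshot_id: str
    name: str

def parse_snapshot_entry(buf: bytes, offset: int) -> Tuple[SnapshotEntry, int]:
    """Parse the entry at `offset`; return it and the offset just past it."""
    (l1_off, l1_size, id_len, name_len, sec, nsec, vm_clock,
     vm_state, extra_len) = struct.unpack_from(SNAPSHOT_FIXED, buf, offset)
    pos = offset + struct.calcsize(SNAPSHOT_FIXED)     # 40
    extra = buf[pos:pos + extra_len]
    pos += extra_len
    disk_size = None
    if len(extra) >= 8:
        (vm_state,) = struct.unpack_from(">Q", extra, 0)    # overrides bytes 32-35
    if len(extra) >= 16:
        (disk_size,) = struct.unpack_from(">Q", extra, 8)   # bytes 48-55
    # Unknown extra data beyond byte 55 is ignored.
    snapshot_id = buf[pos:pos + id_len].decode("utf-8")     # assumption: UTF-8
    pos += id_len
    name = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    return SnapshotEntry(l1_off, l1_size, sec, nsec, vm_clock, vm_state,
                         disk_size, snapshot_id, name), pos

def vm_state_guest_offset(disk_size: int, cluster_bits: int) -> int:
    """Guest offset of the first cluster of the first L1 entry past the disk."""
    cluster_size = 1 << cluster_bits
    bytes_per_l1_entry = cluster_size * (cluster_size // 8)
    first_free_l1_index = -(-disk_size // bytes_per_l1_entry)   # ceiling division
    return first_free_l1_index * bytes_per_l1_entry
```

For example, with 64 KB clusters one L1 entry covers 64 KB * 8192 = 512 MB of guest address space, so for a 1 GB disk the VM state starts at guest offset 1 GB.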