XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the
total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the whole changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to calculate the update, the previous contents of the memory pages
must be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size, the better the chances that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
Cache size can be changed before and during migration.

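The text only says that old pages live in a hash table accessed by their
address. One simple way to build such a table is direct mapping, where a page's
slot is its page number masked by the number of slots; this also suggests why a
power-of-2 cache size is convenient. The following is a minimal sketch under
assumptions of our own: the 4 KiB page size, the `page_cache` type and the
`cache_*` functions are hypothetical and are not QEMU's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_PAGE_SIZE 4096 /* assumed page size */

/* Hypothetical direct-mapped page cache (illustration only). */
typedef struct {
    uint64_t *addrs;   /* page-aligned address held in each slot */
    uint8_t *data;     /* CACHE_PAGE_SIZE bytes of old content per slot */
    uint8_t *valid;    /* nonzero if the slot holds a page */
    size_t num_slots;  /* power of 2, so the slot index is a cheap mask */
} page_cache;

static void cache_free(page_cache *c)
{
    free(c->addrs);
    free(c->data);
    free(c->valid);
    c->addrs = NULL;
    c->data = NULL;
    c->valid = NULL;
}

/* Returns 0 on success, -1 if cache_bytes is not a power-of-2 number of
 * pages or allocation fails. */
static int cache_init(page_cache *c, size_t cache_bytes)
{
    c->addrs = NULL;
    c->data = NULL;
    c->valid = NULL;
    c->num_slots = cache_bytes / CACHE_PAGE_SIZE;
    if (c->num_slots == 0 || (c->num_slots & (c->num_slots - 1)) != 0) {
        return -1;
    }
    c->addrs = calloc(c->num_slots, sizeof(*c->addrs));
    c->data = calloc(c->num_slots, CACHE_PAGE_SIZE);
    c->valid = calloc(c->num_slots, 1);
    if (c->addrs == NULL || c->data == NULL || c->valid == NULL) {
        cache_free(c);
        return -1;
    }
    return 0;
}

/* Slot from the page number; pages num_slots pages apart collide. */
static size_t cache_slot(const page_cache *c, uint64_t addr)
{
    return (size_t)((addr / CACHE_PAGE_SIZE) & (c->num_slots - 1));
}

/* Old content of the page at addr, or NULL on a cache miss. */
static const uint8_t *cache_lookup(const page_cache *c, uint64_t addr)
{
    size_t s = cache_slot(c, addr);
    if (c->valid[s] && c->addrs[s] == addr) {
        return c->data + s * CACHE_PAGE_SIZE;
    }
    return NULL;
}

/* Store a page's current content, evicting whatever shared its slot. */
static void cache_insert(page_cache *c, uint64_t addr, const uint8_t *page)
{
    size_t s = cache_slot(c, addr);
    assert(page != NULL);
    c->addrs[s] = addr;
    c->valid[s] = 1;
    memcpy(c->data + s * CACHE_PAGE_SIZE, page, CACHE_PAGE_SIZE);
}
```

With a 64 MB cache and 4 KiB pages this layout has 16384 slots, and two pages
whose page numbers differ by a multiple of 16384 evict each other. That is one
concrete way in which a cache that is too small turns into a high miss rate.
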
Format
=======

The compression format performs an XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
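
The run lengths use plain ULEB128: seven value bits per byte, least significant
group first, with the high bit set on every byte except the last. The generic
codec below is written for illustration (the function names are hypothetical);
it shows, for example, why a 1001-byte zero run is sent as the two bytes
`e9 07`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v as ULEB128 into out (5 bytes always suffice for 32 bits);
 * returns the number of bytes written. */
static size_t uleb128_encode(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    do {
        uint8_t byte = v & 0x7f;   /* low 7 bits of the value */
        v >>= 7;
        if (v != 0) {
            byte |= 0x80;          /* high bit set: more bytes follow */
        }
        out[n++] = byte;
    } while (v != 0);
    return n;
}

/* Read a ULEB128 value from at most len bytes of in. Returns the number of
 * bytes consumed, or 0 if the input is truncated or exceeds 32 bits. */
static size_t uleb128_decode(const uint8_t *in, size_t len, uint32_t *v)
{
    uint32_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = in[i];
        if (shift > 28 || (shift == 28 && (byte & 0x7f) > 0x0f)) {
            return 0;              /* would not fit in uint32_t */
        }
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *v = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                      /* ran out of input */
}
```

Since no run can be longer than a page, a 4 KiB page needs at most two length
bytes per run: two ULEB128 bytes cover every value up to 16383.
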

There can be more than one valid encoding; the sender may send a longer encoding
if doing so reduces the computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

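Read from the receiving side, the grammar is a loop: skip a zero run (those
bytes keep their old value), then copy a non-zero run of new bytes. Because a
page always ends with an nzrun, unchanged bytes after the last change need no
encoding at all. The decoder below is a minimal sketch with hypothetical names,
not QEMU's implementation; its test feeds it two different byte strings that
describe the same delta, one of them longer, to illustrate that more than one
encoding is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small ULEB128 reader (run lengths never exceed a page, so 4 bytes are
 * plenty). Returns the number of bytes consumed, or 0 on error. */
static size_t read_uleb128(const uint8_t *in, size_t len, uint32_t *v)
{
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 4; i++) {
        result |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *v = result;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Hypothetical decoder for the grammar above. page holds the receiver's
 * existing (old) copy and is updated in place. Returns 0 on success, or -1
 * if src is malformed or would write past the end of the page.
 */
static int xbzrle_decode_sketch(const uint8_t *src, size_t slen,
                                uint8_t *page, size_t page_len)
{
    size_t i = 0, d = 0;   /* i: position in src, d: position in page */
    assert(src != NULL && page != NULL);
    while (i < slen) {
        uint32_t zrun, nzrun;
        size_t n;

        /* zrun: skip bytes that did not change */
        n = read_uleb128(src + i, slen - i, &zrun);
        if (n == 0 || zrun > page_len - d) {
            return -1;
        }
        i += n;
        d += zrun;

        /* nzrun: a non-zero length followed by that many new bytes */
        n = read_uleb128(src + i, slen - i, &nzrun);
        if (n == 0 || nzrun == 0) {
            return -1;
        }
        i += n;
        if (nzrun > page_len - d || nzrun > slen - i) {
            return -1;
        }
        memcpy(page + d, src + i, nzrun);
        d += nzrun;
        i += nzrun;
    }
    return 0;
}
```

For an 8-byte page that changes from all zeros to `01 00 00 00 02 00 00 00`,
both `00 01 01 03 01 02` (two short non-zero runs) and the longer
`00 05 01 00 00 00 02` (one run that also carries three unchanged bytes) decode
to the same result. An encoder that compares several bytes at a time might
produce the second form.
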
On the sender side, XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64 MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

This work was originally based on research results published at
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Svard, Hudzia, Tordsson and Elmroth.
Additionally, the XBRLE delta encoder described there was further improved by
using XBZRLE instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
=======

old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
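
The encoded buffer can be reproduced with a byte-at-a-time encoder sketch
(hypothetical function names, not QEMU's encoder). It splits runs exactly where
the XOR of old and new bytes switches between zero and non-zero, giving
e9 07 (zero run of 1001), 0f plus 15 new bytes, 03 (zero run of 3),
01 67, 01 (zero run of 1) and 01 69. The final 3074-byte zero run is not
encoded at all, which matches the 24-byte result above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append v as ULEB128; returns bytes written, or 0 if room is too small. */
static size_t put_uleb128(uint8_t *dst, size_t room, uint32_t v)
{
    size_t n = 0;
    do {
        if (n == room) {
            return 0;
        }
        dst[n] = (uint8_t)(v & 0x7f);
        v >>= 7;
        if (v != 0) {
            dst[n] |= 0x80;   /* continuation bit */
        }
        n++;
    } while (v != 0);
    return n;
}

/*
 * Hypothetical byte-at-a-time XBZRLE encoder. Returns the encoded length,
 * 0 if the pages are identical, or -1 if the encoding would not fit in dlen.
 */
static int xbzrle_encode_sketch(const uint8_t *old_page,
                                const uint8_t *new_page, size_t len,
                                uint8_t *dst, size_t dlen)
{
    size_t i = 0, d = 0;
    assert(old_page != NULL && new_page != NULL && dst != NULL);
    while (i < len) {
        size_t zrun_start = i, nzrun_start, n;

        /* zero run: old XOR new == 0 */
        while (i < len && (old_page[i] ^ new_page[i]) == 0) {
            i++;
        }
        if (i == len) {
            break;            /* a trailing zero run is not encoded */
        }

        /* non-zero run: old XOR new != 0 */
        nzrun_start = i;
        while (i < len && (old_page[i] ^ new_page[i]) != 0) {
            i++;
        }

        n = put_uleb128(dst + d, dlen - d, (uint32_t)(nzrun_start - zrun_start));
        if (n == 0) {
            return -1;
        }
        d += n;
        n = put_uleb128(dst + d, dlen - d, (uint32_t)(i - nzrun_start));
        if (n == 0 || i - nzrun_start > dlen - d - n) {
            return -1;
        }
        d += n;
        memcpy(dst + d, new_page + nzrun_start, i - nzrun_start);
        d += i - nzrun_start;
    }
    return (int)d;
}
```
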

Usage
=====
1. Verify that the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on the source only). The cache size is in
   MBytes and should be a power of 2; the default value is 64 MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow: L

xbzrle cache miss: the number of cache misses to date; a high cache miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows in the encoding, where the delta could
not be compressed. This can happen if the changes in the pages are too large or
there are many short changes; for example, changing every second byte (half a
page).
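
The "every second byte" case can be worked out by hand. In a 4 KiB page with
every second byte changed there are 2048 one-byte non-zero runs, each preceded
by a one-byte zero run (or a zero-length one at the very start), so every
change costs 1 (zrun length) + 1 (nzrun length) + 1 (new byte) = 3 bytes, for
2048 x 3 = 6144 bytes, half again as large as the page itself. The sketch below
only measures the encoded size; the helper names are hypothetical, and treating
"larger than the page" as the overflow threshold is an assumption made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes ULEB128 needs for v. */
static size_t uleb128_size(size_t v)
{
    size_t n = 1;
    while (v >>= 7) {
        n++;
    }
    return n;
}

/* Hypothetical helper: size in bytes of the XBZRLE encoding of the delta
 * between old_page and new_page, with the trailing zero run omitted. */
static size_t xbzrle_encoded_size(const uint8_t *old_page,
                                  const uint8_t *new_page, size_t len)
{
    size_t i = 0, size = 0;
    assert(old_page != NULL && new_page != NULL);
    while (i < len) {
        size_t start = i;
        while (i < len && old_page[i] == new_page[i]) {
            i++;                          /* zero run */
        }
        if (i == len) {
            break;                        /* trailing zero run: not sent */
        }
        size_t zrun = i - start;
        start = i;
        while (i < len && old_page[i] != new_page[i]) {
            i++;                          /* non-zero run */
        }
        size_t nzrun = i - start;
        size += uleb128_size(zrun) + uleb128_size(nzrun) + nzrun;
    }
    return size;
}
```

A sender using such a measure would fall back to sending the full page whenever
the result exceeds the page size.
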

Testing: Testing indicated that live migration with XBZRLE was completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* 16 MiB buffer: 4096 blocks of 4096 bytes */
        unsigned char *buf = calloc(4096, 4096);
        if (buf == NULL) {
            return 1;
        }
        while (1) {
            int i;
            /* Increment one byte every 1 KiB (four bytes per 4 KiB page):
             * a sparse update pattern that keeps dirtying every page. */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
            fflush(stdout);
        }
    }