Revision dda5336e
b/docs/migration.txt | ||
---|---|---|
1 | 1 |
= Migration = |
2 | 2 |
|
3 | 3 |
QEMU has code to load/save the state of the guest that it is running. |
4 |
This are two complementary operations. Saving the state just does
|
|
4 |
These are two complementary operations. Saving the state just does
|
|
5 | 5 |
that, saves the state for each device that the guest is running. |
6 | 6 |
Restoring a guest is just the opposite operation: we need to load the |
7 | 7 |
state of each device. |
8 | 8 |
|
9 |
For this to work, QEMU has to be launch with the same arguments the |
|
9 |
For this to work, QEMU has to be launched with the same arguments the
|
|
10 | 10 |
two times. I.e. it can only restore the state in one guest that has |
11 | 11 |
the same devices that the one it was saved (this last requirement can |
12 |
be relaxed a bit, but for now we can consider that configuration have
|
|
12 |
be relaxed a bit, but for now we can consider that configuration has
|
|
13 | 13 |
to be exactly the same). |
14 | 14 |
|
15 | 15 |
Once that we are able to save/restore a guest, a new functionality is |
16 | 16 |
requested: migration. This means that QEMU is able to start in one |
17 |
machine and being "migrated" to other machine. I.e. being moved to |
|
18 |
other machine. |
|
17 |
machine and being "migrated" to another machine. I.e. being moved to
|
|
18 |
another machine.
|
|
19 | 19 |
|
20 | 20 |
Next was the "live migration" functionality. This is important |
21 | 21 |
because some guests run with a lot of state (specially RAM), and it |
... | ... | |
24 | 24 |
transferred. Only while the last part of the state is transferred has |
25 | 25 |
the guest to be stopped. Typically the time that the guest is |
26 | 26 |
unresponsive during live migration is the low hundred of milliseconds |
27 |
(notice that this depends on lot of things). |
|
27 |
(notice that this depends on a lot of things).
|
|
28 | 28 |
|
29 | 29 |
=== Types of migration === |
30 | 30 |
|
... | ... | |
35 | 35 |
- unix migration: do the migration using unix sockets |
36 | 36 |
- exec migration: do the migration using the stdin/stdout through a process. |
37 | 37 |
- fd migration: do the migration using an file descriptor that is |
38 |
passed to QEMU. QEMU don't cares how this file descriptor is opened.
|
|
38 |
passed to QEMU. QEMU doesn't care how this file descriptor is opened.
|
|
39 | 39 |
|
40 |
All this four migration protocols use the same infrastructure to
|
|
40 |
All these four migration protocols use the same infrastructure to
|
|
41 | 41 |
save/restore state devices. This infrastructure is shared with the |
42 | 42 |
savevm/loadvm functionality. |
43 | 43 |
|
... | ... | |
49 | 49 |
=== What is the common infrastructure === |
50 | 50 |
|
51 | 51 |
QEMU uses a QEMUFile abstraction to be able to do migration. Any type |
52 |
of migration that what to use QEMU infrastructure has to create a
|
|
52 |
of migration that wants to use QEMU infrastructure has to create a
|
|
53 | 53 |
QEMUFile with: |
54 | 54 |
|
55 | 55 |
QEMUFile *qemu_fopen_ops(void *opaque, |
56 |
QEMUFilePutBufferFunc *put_buffer,
|
|
56 |
QEMUFilePutBufferFunc *put_buffer,
|
|
57 | 57 |
QEMUFileGetBufferFunc *get_buffer, |
58 | 58 |
QEMUFileCloseFunc *close, |
59 | 59 |
QEMUFileRateLimit *rate_limit, |
60 | 60 |
QEMUFileSetRateLimit *set_rate_limit, |
61 |
QEMUFileGetRateLimit *get_rate_limit);
|
|
61 |
QEMUFileGetRateLimit *get_rate_limit);
|
|
62 | 62 |
|
63 | 63 |
The functions have the following functionality: |
64 | 64 |
|
65 | 65 |
This function writes a chunk of data to a file at the given position. |
66 |
The pos argument can be ignored if the file is only being used for
|
|
66 |
The pos argument can be ignored if the file is only used for |
|
67 | 67 |
streaming. The handler should try to write all of the data it can. |
68 | 68 |
|
69 | 69 |
typedef int (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, |
... | ... | |
76 | 76 |
typedef int (QEMUFileGetBufferFunc)(void *opaque, uint8_t *buf, |
77 | 77 |
int64_t pos, int size); |
78 | 78 |
|
79 |
Close a file and return an error code |
|
79 |
Close a file and return an error code.
|
|
80 | 80 |
|
81 | 81 |
typedef int (QEMUFileCloseFunc)(void *opaque); |
82 | 82 |
|
83 |
Called to determine if the file has exceeded it's bandwidth allocation. The
|
|
83 |
Called to determine if the file has exceeded its bandwidth allocation. The |
|
84 | 84 |
bandwidth capping is a soft limit, not a hard limit. |
85 | 85 |
|
86 | 86 |
typedef int (QEMUFileRateLimit)(void *opaque); |
87 | 87 |
|
88 | 88 |
Called to change the current bandwidth allocation. This function must return |
89 | 89 |
the new actual bandwidth. It should be new_rate if everything goes OK, and |
90 |
the old rate otherwise |
|
90 |
the old rate otherwise.
|
|
91 | 91 |
|
92 | 92 |
typedef size_t (QEMUFileSetRateLimit)(void *opaque, size_t new_rate); |
93 | 93 |
typedef size_t (QEMUFileGetRateLimit)(void *opaque); |
... | ... | |
111 | 111 |
of fields. Some times, due to bugs or new functionality, we need to |
112 | 112 |
change the state to store more/different information. We use the |
113 | 113 |
version to identify each time that we do a change. Each version is |
114 |
associated with a series of fields saved. The save_state always save |
|
115 |
the state as the newer version. But load_state some times is able to
|
|
114 |
associated with a series of fields saved. The save_state always saves
|
|
115 |
the state as the newer version. But load_state sometimes is able to |
|
116 | 116 |
load state from an older version. |
117 | 117 |
|
118 | 118 |
=== Legacy way === |
... | ... | |
135 | 135 |
|
136 | 136 |
The important functions for the device state format are the save_state |
137 | 137 |
and load_state. Notice that load_state receives a version_id |
138 |
parameter to know what state format is receiving. save_state don't |
|
139 |
have a version_id parameter because it uses always the latest version.
|
|
138 |
parameter to know what state format is receiving. save_state doesn't
|
|
139 |
have a version_id parameter because it always uses the latest version.
|
|
140 | 140 |
|
141 | 141 |
=== VMState === |
142 | 142 |
|
143 | 143 |
The legacy way of saving/loading state of the device had the problem |
144 |
that we have to maintain in sync two functions. If we did one change
|
|
145 |
in one of them and not on the other, we got a failed migration.
|
|
144 |
that we have to maintain two functions in sync. If we did one change
|
|
145 |
in one of them and not in the other, we would get a failed migration.
|
|
146 | 146 |
|
147 | 147 |
VMState changed the way that state is saved/loaded. Instead of using |
148 | 148 |
a function to save the state and another to load it, it was changed to |
... | ... | |
173 | 173 |
|
174 | 174 |
vmstate_register(NULL, 0, &vmstate_kbd, s); |
175 | 175 |
|
176 |
Note: talk about how vmstate <-> qdev interact, and what the instance id's mean.
|
|
176 |
Note: talk about how vmstate <-> qdev interact, and what the instance ids mean. |
|
177 | 177 |
|
178 | 178 |
You can search for VMSTATE_* macros for lots of types used in QEMU in |
179 | 179 |
hw/hw.h. |
... | ... | |
182 | 182 |
|
183 | 183 |
You can see that there are several version fields: |
184 | 184 |
|
185 |
- version_id: the maximum version_id supported by VMState for that device |
|
185 |
- version_id: the maximum version_id supported by VMState for that device.
|
|
186 | 186 |
- minimum_version_id: the minimum version_id that VMState is able to understand |
187 | 187 |
for that device. |
188 | 188 |
- minimum_version_id_old: For devices that were not able to port to vmstate, we can |
... | ... | |
195 | 195 |
|
196 | 196 |
=== Massaging functions === |
197 | 197 |
|
198 |
Some times, it is not enough to be able to save the state directly
|
|
198 |
Sometimes, it is not enough to be able to save the state directly |
|
199 | 199 |
from one structure, we need to fill the correct values there. One |
200 | 200 |
example is when we are using kvm. Before saving the cpu state, we |
201 | 201 |
need to ask kvm to copy to QEMU the state that it is using. And the |
... | ... | |
227 | 227 |
add anything to the state to fix a bug, we have to disable migration |
228 | 228 |
to older versions that don't have that bug-fix (i.e. a new field). |
229 | 229 |
|
230 |
But some time, that bug-fix is only needed sometimes, not always. For
|
|
230 |
But sometimes, that bug-fix is only needed sometimes, not always. For
|
|
231 | 231 |
instance, if the device is in the middle of a DMA operation, it is |
232 | 232 |
using a specific functionality, .... |
233 | 233 |
|
234 | 234 |
It is impossible to create a way to make migration from any version to |
235 |
any other version to work. But we can do better that only allowing
|
|
235 |
any other version to work. But we can do better than only allowing
|
|
236 | 236 |
migration from older versions no newer ones. For that fields that are |
237 |
only needed sometimes, we add the idea of subsections. a subsection
|
|
237 |
only needed sometimes, we add the idea of subsections. A subsection
|
|
238 | 238 |
is "like" a device vmstate, but with a particularity, it has a Boolean |
239 | 239 |
function that tells if that values are needed to be sent or not. If |
240 | 240 |
this functions returns false, the subsection is not sent. |
... | ... | |
266 | 266 |
.fields = (VMStateField []) { |
267 | 267 |
VMSTATE_INT32(req_nb_sectors, IDEState), |
268 | 268 |
VMSTATE_VARRAY_INT32(io_buffer, IDEState, io_buffer_total_len, 1, |
269 |
vmstate_info_uint8, uint8_t),
|
|
269 |
vmstate_info_uint8, uint8_t),
|
|
270 | 270 |
VMSTATE_INT32(cur_io_buffer_offset, IDEState), |
271 | 271 |
VMSTATE_INT32(cur_io_buffer_len, IDEState), |
272 | 272 |
VMSTATE_UINT8(end_transfer_fn_idx, IDEState), |