The memory API
==============

The memory API models the memory and I/O buses and controllers of a QEMU
machine. It attempts to allow modelling of:

- ordinary RAM
- memory-mapped I/O (MMIO)
- memory controllers that can dynamically reroute physical memory regions
  to different destinations

The memory model provides support for:

- tracking RAM changes by the guest
- setting up coalesced memory for KVM
- setting up ioeventfd regions for KVM

Memory is modelled as a tree (really an acyclic graph) of MemoryRegion objects.
The root of the tree is memory as seen from the CPU's viewpoint (the system
bus). Nodes in the tree represent other buses, memory controllers, and
memory regions that have been rerouted. Leaves are RAM and MMIO regions.

Types of regions
----------------

There are four types of memory regions (all represented by a single C type
MemoryRegion):

- RAM: a RAM region is simply a range of host memory that can be made available
  to the guest.

- MMIO: a range of guest memory that is implemented by host callbacks;
  each read or write causes a callback to be called on the host.

- container: a container simply includes other memory regions, each at
  a different offset. Containers are useful for grouping several regions
  into one unit. For example, a PCI BAR may be composed of a RAM region
  and an MMIO region.

  A container's subregions are usually non-overlapping. In some cases it is
  useful to have overlapping regions; for example a memory controller that
  can overlay a subregion of RAM with MMIO or ROM, or a PCI controller
  that does not prevent cards from claiming overlapping BARs.

- alias: a subsection of another region. Aliases allow a region to be
  split apart into discontiguous regions. Examples of uses are memory banks
  used when the guest address space is smaller than the amount of RAM
  addressed, or a memory controller that splits main memory to expose a "PCI
  hole". Aliases may point to any type of region, including other aliases,
  but an alias may not point back to itself, directly or indirectly.

Region names
------------

Regions are assigned names by the constructor. For most regions these are
only used for debugging purposes, but RAM regions also use the name to identify
live migration sections. This means that RAM region names need to have ABI
stability.

Region lifecycle
----------------

A region is created by one of the constructor functions (memory_region_init*())
and destroyed by the destructor (memory_region_destroy()). In between,
a region can be added to an address space by using memory_region_add_subregion()
and removed using memory_region_del_subregion(). Region attributes may be
changed at any point; they take effect once the region becomes exposed to the
guest.

Overlapping regions and priority
--------------------------------

Usually, regions may not overlap each other; a memory address decodes into
exactly one target. In some cases it is useful to allow regions to overlap,
and sometimes to control which of the overlapping regions is visible to the
guest. This is done with memory_region_add_subregion_overlap(), which
allows the region to overlap any other region in the same container, and
specifies a priority that allows the core to decide which of two regions at
the same address is visible (highest wins).

Visibility
----------

The memory core uses the following rules to select a memory region when the
guest accesses an address:

- all direct subregions of the root region are matched against the address, in
  descending priority order
  - if the address lies outside the region offset/size, the subregion is
    discarded
  - if the subregion is a leaf (RAM or MMIO), the search terminates
  - if the subregion is a container, the same algorithm is used within the
    subregion (after the address is adjusted by the subregion offset)
  - if the subregion is an alias, the search continues at the alias target
    (after the address is adjusted by the subregion offset and alias offset)
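The rules above can be sketched as a small recursive lookup. This is a toy
model only: ToyRegion, ToyKind and toy_resolve() are hypothetical names made up
for illustration and are not QEMU's MemoryRegion implementation. The sketch
also assumes that when a matching container or alias has nothing mapped at the
address, the search falls through to the next lower-priority sibling, which the
rules above do not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; NOT QEMU's real MemoryRegion. */
typedef enum { TOY_RAM, TOY_MMIO, TOY_CONTAINER, TOY_ALIAS } ToyKind;

typedef struct ToyRegion ToyRegion;
struct ToyRegion {
    const char *name;
    ToyKind kind;
    uint64_t offset;        /* offset within the parent container */
    uint64_t size;          /* size in bytes */
    int priority;           /* higher wins among overlapping siblings */
    ToyRegion **subs;       /* container only: direct subregions */
    size_t nsubs;
    ToyRegion *target;      /* alias only: region the alias points at */
    uint64_t alias_offset;  /* alias only: start offset within the target */
};

/*
 * Resolve addr (relative to the start of mr) to a leaf region.
 * Returns NULL if nothing is mapped there; on success *leaf_off holds the
 * offset of the access within the returned leaf.
 */
ToyRegion *toy_resolve(ToyRegion *mr, uint64_t addr, uint64_t *leaf_off)
{
    assert(mr != NULL && leaf_off != NULL);

    if (addr >= mr->size) {
        return NULL;                    /* outside the region: discard */
    }
    if (mr->kind == TOY_RAM || mr->kind == TOY_MMIO) {
        *leaf_off = addr;               /* leaf: the search terminates */
        return mr;
    }
    if (mr->kind == TOY_ALIAS) {
        /* continue at the alias target, adjusted by the alias offset */
        return toy_resolve(mr->target, addr + mr->alias_offset, leaf_off);
    }

    /*
     * Container: consider every subregion that covers addr and keep the
     * hit with the highest priority.  This gives the same result as
     * walking the subregions in descending priority order.
     */
    ToyRegion *best = NULL;
    int best_prio = 0;
    uint64_t best_off = 0;
    for (size_t i = 0; i < mr->nsubs; i++) {
        ToyRegion *sub = mr->subs[i];
        uint64_t off = 0;
        if (addr < sub->offset || addr - sub->offset >= sub->size) {
            continue;                   /* address outside offset/size */
        }
        /* adjust the address by the subregion offset before descending */
        ToyRegion *hit = toy_resolve(sub, addr - sub->offset, &off);
        if (hit != NULL && (best == NULL || sub->priority > best_prio)) {
            best = hit;
            best_prio = sub->priority;
            best_off = off;
        }
    }
    if (best != NULL) {
        *leaf_off = best_off;
    }
    return best;
}
```

Calling toy_resolve() on the root container with a guest address yields the
leaf that would service the access, together with the offset inside that leaf.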

Example memory map
------------------

system_memory: container@0-2^48-1
 |
 +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff)
 |
 +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff)
 |
 +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff)
 |      (prio 1)
 |
 +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff)

pci (0-2^32-1)
 |
 +--- vga-area: container@0xa0000-0xbffff
 |      |
 |      +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff)
 |      |
 |      +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff)
 |
 +---- vram: ram@0xe1000000-0xe1ffffff
 |
 +---- vga-mmio: mmio@0xe2000000-0xe200ffff

ram: ram@0x00000000-0xffffffff

This is a (simplified) PC memory map. The 4GB RAM block is mapped into the
system address space via two aliases: "lomem" is a 1:1 mapping of the first
3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the
so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with
4GB of memory.

The memory controller diverts addresses in the range 640K-768K to the PCI
address space. This is modelled using the "vga-window" alias, mapped at a
higher priority so it obscures the RAM at the same addresses. The vga window
can be removed by programming the memory controller; this is modelled by
removing the alias and exposing the RAM underneath.

The pci address space is not a direct child of the system address space, since
we only want parts of it to be visible (we accomplish this using aliases).
It has two subregions: vga-area models the legacy vga window and is occupied
by two 32K memory banks pointing at two sections of the framebuffer.
In addition the vram is mapped as a BAR at address 0xe1000000, and an additional
BAR containing MMIO registers is mapped after it.

Note that if the guest maps a BAR outside the PCI hole, it would not be
visible as the pci-hole alias clips it to a 0.5GB range.
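The clipping effect can be illustrated by intersecting a BAR's range with the
window covered by the pci-hole alias. This is a sketch with made-up names
(ToyRange, toy_clip_to_pci_hole); it only considers the pci-hole alias from the
example map and leaves the separate vga-window alias out of the picture.

```c
#include <assert.h>
#include <stdint.h>

/* Window of the pci-hole alias in the example map (end is exclusive). */
#define TOY_HOLE_START 0xe0000000ULL
#define TOY_HOLE_END   0x100000000ULL

/* Hypothetical helper type: a size of 0 means "nothing visible". */
typedef struct {
    uint64_t start;
    uint64_t size;
} ToyRange;

/* Return the part of a BAR that the guest can see through the PCI hole. */
ToyRange toy_clip_to_pci_hole(uint64_t bar_base, uint64_t bar_size)
{
    ToyRange visible = { 0, 0 };

    /* the sketch assumes base + size does not wrap around */
    assert(bar_size <= UINT64_MAX - bar_base);

    uint64_t bar_end = bar_base + bar_size;   /* exclusive end of the BAR */
    uint64_t lo = bar_base > TOY_HOLE_START ? bar_base : TOY_HOLE_START;
    uint64_t hi = bar_end < TOY_HOLE_END ? bar_end : TOY_HOLE_END;

    if (lo < hi) {                            /* non-empty intersection */
        visible.start = lo;
        visible.size = hi - lo;
    }
    return visible;
}
```

With the vram BAR from the map (base 0xe1000000, size 16MB) the whole BAR is
visible; a BAR placed entirely below 0xe0000000 yields an empty range.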

Attributes
----------

Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd)
can be changed during the region lifecycle. They take effect once the region
is made visible (which can be immediately, later, or never).

MMIO Operations
---------------

MMIO regions are provided with ->read() and ->write() callbacks; in addition
various constraints can be supplied to control how these callbacks are called:

- .valid.min_access_size, .valid.max_access_size define the access sizes
  (in bytes) which the device accepts; accesses outside this range will
  have device and bus specific behaviour (ignored, or machine check)
- .valid.aligned specifies that the device only accepts naturally aligned
  accesses. Unaligned accesses invoke device and bus specific behaviour.
- .impl.min_access_size, .impl.max_access_size define the access sizes
  (in bytes) supported by the *implementation*; other access sizes will be
  emulated using the ones available. For example a 4-byte write will be
  emulated using four 1-byte writes, if .impl.max_access_size = 1.
- .impl.unaligned specifies that the *implementation* supports unaligned
  accesses; if false, unaligned accesses will be emulated by two aligned
  accesses.
- .old_portio and .old_mmio can be used to ease porting from code using
  cpu_register_io_memory() and register_ioport(). They should not be used
  in new code.