Revision 0a6b7b78 tcg/README
--- a/tcg/README
+++ b/tcg/README
@@ -16,14 +16,18 @@
 
 A TCG "function" corresponds to a QEMU Translated Block (TB).
 
-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.
 
-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions are defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
+
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). They are defined before the
+functions are defined. A TCG global can be a memory location (e.g. a QEMU
+CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
+or a memory location which is stored in a register outside QEMU TBs
+(not implemented yet).
 
 A TCG "basic block" corresponds to a list of instructions terminated
 by a branch instruction.
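
As an illustration of a global that lives in a memory location, the C sketch below models a guest register reached through the CPU state pointer: the pointer plays the role of the fixed-register global, and each register global is just a fixed offset into the state structure. The names `cpu_state`, `mem_global`, `reg_global`, `global_load` and `global_store` are hypothetical and are not QEMU's API; in TCG itself the register allocator may also keep such a global in a host register between accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NB_REGS 8

/* Hypothetical guest CPU state, reached through one "env" pointer. */
struct cpu_state {
    uint32_t regs[NB_REGS];
    uint32_t cc;                 /* e.g. condition codes */
};

/* A memory-backed global: a name plus an offset from the state pointer. */
struct mem_global {
    const char *name;
    size_t offset;
};

/* Build the global for guest register i. */
static struct mem_global reg_global(const char *name, int i)
{
    assert(i >= 0 && i < NB_REGS);
    struct mem_global g = {
        name, offsetof(struct cpu_state, regs) + (size_t)i * sizeof(uint32_t)
    };
    return g;
}

/* Read the global: load from env + offset. */
static uint32_t global_load(const struct cpu_state *env, struct mem_global g)
{
    uint32_t v;
    memcpy(&v, (const char *)env + g.offset, sizeof v);
    return v;
}

/* Write the global back to its memory location. */
static void global_store(struct cpu_state *env, struct mem_global g, uint32_t v)
{
    memcpy((char *)env + g.offset, &v, sizeof v);
}
```

Because every register global is only an offset, all of them can be declared once, before any TB is translated, which matches the rule that globals are defined before the functions.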
@@ -32,11 +36,11 @@
 
 3.1) Introduction
 
-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.
 
 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
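
One way to picture the strong typing in plain C is to wrap each width in its own struct, so that passing a 64 bit variable to a 32 bit operation fails to compile. The sketch below does that and makes the pointer type an alias of one of the two widths. The types `var_i32`, `var_i64`, `var_ptr` are hypothetical, and the sketch uses the word size of the machine compiling it (`UINTPTR_MAX`) as a stand-in for the TCG target word size.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct struct types: mixing widths is a compile-time error. */
typedef struct { uint32_t v; } var_i32;
typedef struct { uint64_t v; } var_i64;

/* Pointers alias the 32 bit or 64 bit type depending on word size. */
#if UINTPTR_MAX == UINT64_MAX
typedef var_i64 var_ptr;
#else
typedef var_i32 var_ptr;
#endif

/* add_i32 t0, t1, t2: the output type is part of the operation name. */
static void add_i32(var_i32 *t0, var_i32 t1, var_i32 t2)
{
    t0->v = t1.v + t2.v;         /* wraps modulo 2^32 */
}

/* add_i64 t0, t1, t2 */
static void add_i64(var_i64 *t0, var_i64 t1, var_i64 t2)
{
    t0->v = t1.v + t2.v;         /* wraps modulo 2^64 */
}
```

With these types, `add_i32(&r, a64, b64)` is rejected by the compiler rather than silently truncated, which is the property the strong typing is meant to give.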
@@ -44,14 +48,12 @@
 The notable exception is the call instruction which has a variable
 number of outputs and inputs.
 
-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.
 
 add_i32 t0, t1, t2 (t0 <- t1 + t2)
 
-sub_i64 t2, t3, $4 (t2 <- t3 - 4)
-
 3.2) Assumptions
 
 * Basic blocks
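
The operand order of the textual form can be checked with a toy interpreter. The C sketch below reads lines such as `add_i32 t0, t1, t2`, treats the first operand as the output, and accepts either a variable `tN` or a `$`-prefixed constant as input. Everything here (`temps`, `var_index`, `read_operand`, `exec_line`, the `sub_i32` opcode) is hypothetical; in particular, real TCG opcodes fix which operands are constants, whereas this toy accepts a constant in any input slot for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_TEMPS 8
static uint32_t temps[NB_TEMPS];

/* Index of a variable operand "tN", or -1 if s is not a valid variable. */
static int var_index(const char *s)
{
    if (s[0] != 't')
        return -1;
    int i = atoi(s + 1);
    return (i >= 0 && i < NB_TEMPS) ? i : -1;
}

/* An input is a variable "tN" or a constant prefixed with '$'. */
static int read_operand(const char *s, uint32_t *val)
{
    if (s[0] == '$') {
        *val = (uint32_t)strtoul(s + 1, NULL, 0);
        return 0;
    }
    int i = var_index(s);
    if (i < 0)
        return -1;
    *val = temps[i];
    return 0;
}

/* Execute "op out, in1, in2": output first, then the inputs. */
static int exec_line(const char *line)
{
    char op[16], out[16], in1[16], in2[16];
    uint32_t a, b;

    assert(line != NULL);
    if (sscanf(line, "%15s %15[^,], %15[^,], %15s", op, out, in1, in2) != 4)
        return -1;
    int dst = var_index(out);    /* the output must be a variable */
    if (dst < 0 || read_operand(in1, &a) < 0 || read_operand(in2, &b) < 0)
        return -1;
    if (strcmp(op, "add_i32") == 0)
        temps[dst] = a + b;
    else if (strcmp(op, "sub_i32") == 0)
        temps[dst] = a - b;
    else
        return -1;
    return 0;
}
```

A line whose first operand is a constant, such as `add_i32 $1, t1, t2`, is rejected, since only variables can be outputs.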
@@ -62,9 +64,8 @@
 - Basic blocks start after the end of a previous basic block, at a
 set_label instruction or after a legacy dyngen operation.
 
-After the end of a basic block, temporaries are destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.
 
 * Floating point types are not supported yet
 
@@ -100,7 +101,7 @@
 is suppressed.
 
 - A liveness analysis is done at the basic block level. The
-information is used to suppress moves from a dead temporary to
+information is used to suppress moves from a dead variable to
 another one. It is also used to remove instructions which compute
 dead results. The latter is especially useful for condition code
 optimization in QEMU.
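
A liveness analysis over one basic block is naturally a single backward pass: start from the variables live at the end of the block (for example the globals), and drop any instruction whose output is not live at that point. The C sketch below shows that idea under two stated assumptions: every instruction is pure (no stores, calls or other side effects, which a real pass must never remove), and move suppression is not modelled. The types and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_VARS 8
#define NO_VAR (-1)

struct insn {
    int out;                     /* output variable index */
    int in[2];                   /* input variable indices, NO_VAR if unused */
    bool dead;                   /* set when the result is never used */
};

/* live[] holds the variables live at the end of the block on entry and
   the variables live at the start of the block on return. */
static int remove_dead_insns(struct insn *insns, int n, bool live[NB_VARS])
{
    int removed = 0;
    for (int i = n - 1; i >= 0; i--) {
        struct insn *op = &insns[i];
        assert(op->out >= 0 && op->out < NB_VARS);
        if (!live[op->out]) {
            op->dead = true;     /* dead result: drop the instruction */
            removed++;
            continue;
        }
        op->dead = false;
        live[op->out] = false;   /* the definition kills the output ... */
        for (int j = 0; j < 2; j++)
            if (op->in[j] != NO_VAR)
                live[op->in[j]] = true; /* ... and its inputs become live */
    }
    return removed;
}
```

Killing the output before marking the inputs live keeps an instruction like `t0 = t0 + t1` correct, because `t0` is still live above it.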
@@ -113,47 +114,6 @@
 
 only the last instruction is kept.
 
-- A macro system is supported (may get closer to function inlining
-some day). It is useful if the liveness analysis is likely to prove
-that some results of a computation are indeed not useful. With the
-macro system, the user can provide several alternative
-implementations which are used depending on which results are used. It is
-especially useful for condition code optimization in QEMU.
-
-Here is an example:
-
-macro_2 t0, t1, $1
-mov_i32 t0, $0x1234
-
-The macro identified by the ID "$1" normally returns the values t0
-and t1. Suppose its implementation is:
-
-macro_start
-brcond_i32 t2, $0, $TCG_COND_EQ, $1
-mov_i32 t0, $2
-br $2
-set_label $1
-mov_i32 t0, $3
-set_label $2
-add_i32 t1, t3, t4
-macro_end
-
-If t0 is not used after the macro, the user can provide a simpler
-implementation:
-
-macro_start
-add_i32 t1, t3, t4
-macro_end
-
-TCG automatically chooses the right implementation depending on
-which macro outputs are used after it.
-
-Note that if TCG did more expensive optimizations, macros would be
-less useful. In the previous example a macro is useful because the
-liveness analysis is done on each basic block separately. Hence TCG
-cannot remove the code computing 't0' even if it is not used after
-the first macro implementation.
-
 3.4) Instruction Reference
 
 ********* Function call
@@ -241,6 +201,10 @@
 
 t0=t1^t2
 
+* not_i32/i64 t0, t1
+
+t0=~t1
+
 ********* Shifts
 
 * shl_i32/i64 t0, t1, t2
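
The not_i32/i64 operation added in this hunk is the bitwise complement. The C sketch below states its semantics for both widths and checks the identity ~x == x ^ all-ones, which relates it to xor_i32/i64 just above it. The function names are hypothetical, and nothing here claims that a particular TCG backend lowers `not` to an `xor`.

```c
#include <assert.h>
#include <stdint.h>

/* not_i32 t0, t1  ->  t0 = ~t1 */
static uint32_t not_i32(uint32_t t1) { return ~t1; }

/* not_i64 t0, t1  ->  t0 = ~t1 */
static uint64_t not_i64(uint64_t t1) { return ~t1; }

/* Same result expressed with xor and an all-ones constant. */
static uint32_t not_i32_via_xor(uint32_t t1) { return t1 ^ UINT32_MAX; }
```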
@@ -428,3 +392,34 @@
 the generated code.
 
 The exception model is the same as the dyngen one.
+
+6) Recommended coding rules for best performance
+
+- Use globals to represent the parts of the QEMU CPU state which are
+often modified, e.g. the integer registers and the condition
+codes. TCG will be able to use host registers to store them.
+
+- Avoid globals stored in fixed registers. They must be used only to
+store the pointer to the CPU state and possibly to store a pointer
+to a register window. The other uses are to ensure backward
+compatibility with dyngen during the porting of a new target to TCG.
+
+- Use temporaries. Use local temporaries only when really needed,
+e.g. when you need to use a value after a jump. Local temporaries
+introduce a performance hit in the current TCG implementation: their
+content is saved to memory at the end of each basic block.
+
+- Free temporaries and local temporaries when they are no longer used
+(tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+should free it after it is used. Freeing temporaries does not yield
+better generated code, but it reduces the memory usage of TCG and
+improves the speed of translation.
+
+- Don't hesitate to use helpers for complicated or seldom used target
+instructions. There is little performance advantage in using TCG to
+implement target instructions taking more than about twenty TCG
+instructions.
+
+- Use the 'discard' instruction if you know that TCG won't be able to
+prove that a given global is "dead" at a given program point. The
+x86 target uses it to improve the condition codes optimisation.
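
The benefit of freeing temporaries can be shown with a toy slot allocator: when each translated instruction frees its temporary, the next allocation reuses the same slot, so the number of temporaries the translator tracks stays small. The pool below (`temp_pool`, `temp_new`, `temp_free`) is a hypothetical stand-in written for this sketch; it is not QEMU's allocator, and it does not use the real tcg_temp_free() or tcg_const_x() functions mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 16

/* Hypothetical translator-side pool of temporaries. */
struct temp_pool {
    bool in_use[MAX_TEMPS];
    int high_water;              /* number of distinct slots ever needed */
};

/* Allocate the first free slot, reusing freed ones; -1 if exhausted. */
static int temp_new(struct temp_pool *p)
{
    for (int i = 0; i < MAX_TEMPS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            if (i + 1 > p->high_water)
                p->high_water = i + 1;
            return i;
        }
    }
    return -1;
}

/* Release a slot so that a later temp_new() can reuse it. */
static void temp_free(struct temp_pool *p, int t)
{
    assert(t >= 0 && t < MAX_TEMPS && p->in_use[t]);
    p->in_use[t] = false;
}
```

A translator that allocates and frees one temporary per instruction keeps a high-water mark of 1, whereas one that never frees grows by one slot per allocation until the pool runs out.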