 A TCG "function" corresponds to a QEMU Translated Block (TB).

-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.

-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions are defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
+
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). They are defined before the
+functions are defined. A TCG global can be a memory location (e.g. a
+QEMU CPU register), a fixed host register (e.g. the QEMU CPU state
+pointer) or a memory location which is stored in a register outside
+QEMU TBs (not implemented yet).

 A TCG "basic block" corresponds to a list of instructions terminated
 by a branch instruction.
...

 3.1) Introduction

-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.

 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
...
 The notable exception is the call instruction which has a variable
 number of outputs and inputs.

-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.

 add_i32 t0, t1, t2 (t0 <- t1 + t2)

-sub_i64 t2, t3, $4 (t2 <- t3 - 4)
-
 3.2) Assumptions

 * Basic blocks
...
 - Basic blocks start after the end of a previous basic block, at a
 set_label instruction or after a legacy dyngen operation.

-After the end of a basic block, temporaries are destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.

 * Floating point types are not supported yet

...
 is suppressed.

 - A liveness analysis is done at the basic block level. The
-information is used to suppress moves from a dead temporary to
+information is used to suppress moves from a dead variable to
 another one. It is also used to remove instructions which compute
 dead results. The latter is especially useful for condition code
 optimization in QEMU.
...
 only the last instruction is kept.

-- A macro system is supported (may get closer to function inlining
-some day). It is useful if the liveness analysis is likely to prove
-that some results of a computation are indeed not useful. With the
-macro system, the user can provide several alternative
-implementations which are used depending on the used results. It is
-especially useful for condition code optimization in QEMU.
-
-Here is an example:
-
-macro_2 t0, t1, $1
-mov_i32 t0, $0x1234
-
-The macro identified by the ID "$1" normally returns the values t0
-and t1. Suppose its implementation is:
-
-macro_start
-brcond_i32 t2, $0, $TCG_COND_EQ, $1
-mov_i32 t0, $2
-br $2
-set_label $1
-mov_i32 t0, $3
-set_label $2
-add_i32 t1, t3, t4
-macro_end
-
-If t0 is not used after the macro, the user can provide a simpler
-implementation:
-
-macro_start
-add_i32 t1, t2, t4
-macro_end
-
-TCG automatically chooses the right implementation depending on
-which macro outputs are used after it.
-
-Note that if TCG did more expensive optimizations, macros would be
-less useful. In the previous example a macro is useful because the
-liveness analysis is done on each basic block separately. Hence TCG
-cannot remove the code computing 't0' even if it is not used after
-the first macro implementation.
-
 3.4) Instruction Reference

 ********* Function call
...

 t0=t1^t2

+* not_i32/i64 t0, t1
+
+t0=~t1
+
 ********* Shifts

 * shl_i32/i64 t0, t1, t2
...
 the generated code.

 The exception model is the same as the dyngen one.
+
+6) Recommended coding rules for best performance
+
+- Use globals to represent the parts of the QEMU CPU state which are
+often modified, e.g. the integer registers and the condition
+codes. TCG will be able to use host registers to store them.
+
+- Avoid globals stored in fixed registers. They must be used only to
+store the pointer to the CPU state and possibly to store a pointer
+to a register window. The other uses are to ensure backward
+compatibility with dyngen while porting a new target to TCG.
+
+- Use temporaries. Use local temporaries only when really needed,
+e.g. when you need to use a value after a jump. Local temporaries
+introduce a performance hit in the current TCG implementation: their
+content is saved to memory at the end of each basic block.
+
+- Free temporaries and local temporaries when they are no longer used
+(tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+should free it after it is used. Freeing temporaries does not yield
+better generated code, but it reduces the memory usage of TCG and
+improves the speed of the translation.
+
+- Don't hesitate to use helpers for complicated or seldom used target
+instructions. There is little performance advantage in using TCG to
+implement target instructions taking more than about twenty TCG
+instructions.
+
+- Use the 'discard' instruction if you know that TCG won't be able to
+prove that a given global is "dead" at a given program point. The
+x86 target uses it to improve the condition codes optimisation.