Revision 0a6b7b78 tcg/README

b/tcg/README
16 16

  
17 17
A TCG "function" corresponds to a QEMU Translated Block (TB).
18 18

  
19
A TCG "temporary" is a variable only live in a given
20
function. Temporaries are allocated explicitly in each function.
19
A TCG "temporary" is a variable only live in a basic
20
block. Temporaries are allocated explicitly in each function.
21 21

  
22
A TCG "global" is a variable which is live in all the functions. They
23
are defined before the functions are defined. A TCG global can be a memory
24
location (e.g. a QEMU CPU register), a fixed host register (e.g. the
25
QEMU CPU state pointer) or a memory location which is stored in a
26
register outside QEMU TBs (not implemented yet).
22
A TCG "local temporary" is a variable only live in a function. Local
23
temporaries are allocated explicitly in each function.
24

  
25
A TCG "global" is a variable which is live in all the functions
26
(equivalent of a C global variable). They are defined before the
27
functions are defined. A TCG global can be a memory location (e.g. a QEMU
28
CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
29
or a memory location which is stored in a register outside QEMU TBs
30
(not implemented yet).
27 31

  
28 32
A TCG "basic block" corresponds to a list of instructions terminated
29 33
by a branch instruction. 
......
32 36

  
33 37
3.1) Introduction
34 38

  
35
TCG instructions operate on variables which are temporaries or
36
globals. TCG instructions and variables are strongly typed. Two types
37
are supported: 32 bit integers and 64 bit integers. Pointers are
38
defined as an alias to 32 bit or 64 bit integers depending on the TCG
39
target word size.
39
TCG instructions operate on variables which are temporaries, local
40
temporaries or globals. TCG instructions and variables are strongly
41
typed. Two types are supported: 32 bit integers and 64 bit
42
integers. Pointers are defined as an alias to 32 bit or 64 bit
43
integers depending on the TCG target word size.
40 44

  
41 45
Each instruction has a fixed number of output variable operands, input
42 46
variable operands and always constant operands.
......
44 48
The notable exception is the call instruction which has a variable
45 49
number of outputs and inputs.
46 50

  
47
In the textual form, output operands come first, followed by input
48
operands, followed by constant operands. The output type is included
49
in the instruction name. Constants are prefixed with a '$'.
51
In the textual form, output operands usually come first, followed by
52
input operands, followed by constant operands. The output type is
53
included in the instruction name. Constants are prefixed with a '$'.
50 54

  
51 55
add_i32 t0, t1, t2  (t0 <- t1 + t2)
52 56

  
53
sub_i64 t2, t3, $4  (t2 <- t3 - 4)
54

  
55 57
3.2) Assumptions
56 58

  
57 59
* Basic blocks
......
62 64
- Basic blocks start after the end of a previous basic block, at a
63 65
  set_label instruction or after a legacy dyngen operation.
64 66

  
65
After the end of a basic block, temporaries are destroyed and globals
66
are stored at their initial storage (register or memory place
67
depending on their declarations).
67
After the end of a basic block, the content of temporaries is
68
destroyed, but local temporaries and globals are preserved.
68 69

  
69 70
* Floating point types are not supported yet
70 71

  
......
100 101
  is suppressed.
101 102

  
102 103
- A liveness analysis is done at the basic block level. The
103
  information is used to suppress moves from a dead temporary to
104
  information is used to suppress moves from a dead variable to
104 105
  another one. It is also used to remove instructions which compute
105 106
  dead results. The latter is especially useful for condition code
106 107
  optimization in QEMU.
......
113 114

  
114 115
  only the last instruction is kept.
115 116

  
116
- A macro system is supported (may get closer to function inlining
117
  some day). It is useful if the liveness analysis is likely to prove
118
  that some results of a computation are indeed not useful. With the
119
  macro system, the user can provide several alternative
120
  implementations which are used depending on the used results. It is
121
  especially useful for condition code optimization in QEMU.
122

  
123
  Here is an example:
124

  
125
  macro_2 t0, t1, $1
126
  mov_i32 t0, $0x1234
127

  
128
  The macro identified by the ID "$1" normally returns the values t0
129
  and t1. Suppose its implementation is:
130

  
131
  macro_start
132
  brcond_i32  t2, $0, $TCG_COND_EQ, $1
133
  mov_i32 t0, $2
134
  br $2
135
  set_label $1
136
  mov_i32 t0, $3
137
  set_label $2
138
  add_i32 t1, t3, t4
139
  macro_end
140
  
141
  If t0 is not used after the macro, the user can provide a simpler
142
  implementation:
143

  
144
  macro_start
145
  add_i32 t1, t2, t4
146
  macro_end
147

  
148
  TCG automatically chooses the right implementation depending on
149
  which macro outputs are used after it.
150

  
151
  Note that if TCG did more expensive optimizations, macros would be
152
  less useful. In the previous example a macro is useful because the
153
  liveness analysis is done on each basic block separately. Hence TCG
154
  cannot remove the code computing 't0' even if it is not used after
155
  the first macro implementation.
156

  
157 117
3.4) Instruction Reference
158 118

  
159 119
********* Function call
......
241 201

  
242 202
t0=t1^t2
243 203

  
204
* not_i32/i64 t0, t1
205

  
206
t0=~t1
207

  
244 208
********* Shifts
245 209

  
246 210
* shl_i32/i64 t0, t1, t2
......
428 392
the generated code.
429 393

  
430 394
The exception model is the same as the dyngen one.
395

  
396
6) Recommended coding rules for best performance
397

  
398
- Use globals to represent the parts of the QEMU CPU state which are
399
  often modified, e.g. the integer registers and the condition
400
  codes. TCG will be able to use host registers to store them.
401

  
402
- Avoid globals stored in fixed registers. They must be used only to
403
  store the pointer to the CPU state and possibly to store a pointer
404
  to a register window. The other uses are to ensure backward
405
  compatibility with dyngen during the porting of a new target to TCG.
406

  
407
- Use temporaries. Use local temporaries only when really needed,
408
  e.g. when you need to use a value after a jump. Local temporaries
409
  introduce a performance hit in the current TCG implementation: their
410
  content is saved to memory at the end of each basic block.
411

  
412
- Free temporaries and local temporaries when they are no longer used
413
  (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
414
  should free it after it is used. Freeing temporaries does not yield
415
  better generated code, but it reduces the memory usage of TCG and
416
  speeds up the translation.
417

  
418
- Don't hesitate to use helpers for complicated or seldom used target
419
  instructions. There is little performance advantage in using TCG to
420
  implement target instructions taking more than about twenty TCG
421
  instructions.
422

  
423
- Use the 'discard' instruction if you know that TCG won't be able to
424
  prove that a given global is "dead" at a given program point. The
425
  x86 target uses it to improve the condition codes optimisation.
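
As a rough illustration of the per-basic-block liveness analysis and of the
'discard' rule above, here is a minimal sketch in C. It is a toy model, not
QEMU code: ToyOp, toy_liveness and toy_demo are hypothetical names, and the
model assumes that globals are live at the end of a basic block while
temporaries are dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): variables 0..NB_GLOBALS-1 are globals,
   the remaining ones are temporaries. */
enum { NB_GLOBALS = 2, NB_VARS = 6 };

typedef enum { OP_ADD, OP_MOV, OP_DISCARD } ToyOpc;

typedef struct {
    ToyOpc opc;
    int out;     /* output variable (the discarded one for OP_DISCARD) */
    int in[2];   /* input variables, -1 when unused */
    bool dead;   /* set by the liveness pass */
} ToyOp;

/* Backward liveness pass over one basic block.
   Marks ops whose result is never used and returns how many were marked. */
int toy_liveness(ToyOp *ops, int n)
{
    bool live[NB_VARS];
    int removed = 0;

    /* At the end of the block, globals are live and temporaries are dead. */
    for (int v = 0; v < NB_VARS; v++) {
        live[v] = (v < NB_GLOBALS);
    }
    for (int i = n - 1; i >= 0; i--) {
        ToyOp *op = &ops[i];
        assert(op->out >= 0 && op->out < NB_VARS);
        if (op->opc == OP_DISCARD) {
            /* The user asserts that this variable is dead here. */
            live[op->out] = false;
            continue;
        }
        if (!live[op->out]) {
            /* The result is never read: the op can be dropped. */
            op->dead = true;
            removed++;
            continue;
        }
        /* The op is kept: its output is redefined, its inputs become live. */
        live[op->out] = false;
        for (int k = 0; k < 2; k++) {
            if (op->in[k] >= 0) {
                live[op->in[k]] = true;
            }
        }
    }
    return removed;
}

/* Example block, with g0 standing for a condition code global:
     add t2, t3, t4 ; mov g0, t2 ; add g1, t3, t4 ; [discard g0] */
int toy_demo(bool with_discard)
{
    ToyOp ops[] = {
        { OP_ADD,     2, { 3, 4 },   false },
        { OP_MOV,     0, { 2, -1 },  false },
        { OP_ADD,     1, { 3, 4 },   false },
        { OP_DISCARD, 0, { -1, -1 }, false },
    };
    return toy_liveness(ops, with_discard ? 4 : 3);
}
```

In this model, toy_demo(false) removes nothing because g0 is assumed live at
the end of the block, whereas toy_demo(true) lets the pass drop both the move
to g0 and the addition that feeds it.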
