Statistics
| Branch: | Revision:

root / tcg / README @ cf2be984

History | View | Annotate | Download (11.6 kB)

1
Tiny Code Generator - Fabrice Bellard.
2

    
3
1) Introduction
4

    
5
TCG (Tiny Code Generator) began as a generic backend for a C
6
compiler. It was simplified to be used in QEMU. It also has its roots
7
in the QOP code generator written by Paul Brook. 
8

    
9
2) Definitions
10

    
11
The TCG "target" is the architecture for which we generate the
12
code. It is of course not the same as the "target" of QEMU which is
13
the emulated architecture. As TCG started as a generic C backend used
14
for cross compiling, it is assumed that the TCG target is different
15
from the host, although it is never the case for QEMU.
16

    
17
A TCG "function" corresponds to a QEMU Translated Block (TB).
18

    
19
A TCG "temporary" is a variable only live in a given
20
function. Temporaries are allocated explicitly in each function.
21

    
22
A TCG "global" is a variable which is live in all the functions. They
23
are defined before the functions defined. A TCG global can be a memory
24
location (e.g. a QEMU CPU register), a fixed host register (e.g. the
25
QEMU CPU state pointer) or a memory location which is stored in a
26
register outside QEMU TBs (not implemented yet).
27

    
28
A TCG "basic block" corresponds to a list of instructions terminated
29
by a branch instruction. 
30

    
31
3) Intermediate representation
32

    
33
3.1) Introduction
34

    
35
TCG instructions operate on variables which are temporaries or
36
globals. TCG instructions and variables are strongly typed. Two types
37
are supported: 32 bit integers and 64 bit integers. Pointers are
38
defined as an alias to 32 bit or 64 bit integers depending on the TCG
39
target word size.
40

    
41
Each instruction has a fixed number of output variable operands, input
42
variable operands and always constant operands.
43

    
44
The notable exception is the call instruction which has a variable
45
number of outputs and inputs.
46

    
47
In the textual form, output operands come first, followed by input
48
operands, followed by constant operands. The output type is included
49
in the instruction name. Constants are prefixed with a '$'.
50

    
51
add_i32 t0, t1, t2  (t0 <- t1 + t2)
52

    
53
sub_i64 t2, t3, $4  (t2 <- t3 - 4)
54

    
55
3.2) Assumptions
56

    
57
* Basic blocks
58

    
59
- Basic blocks end after branches (e.g. brcond_i32 instruction),
60
  goto_tb and exit_tb instructions.
61
- Basic blocks end before legacy dyngen operations.
62
- Basic blocks start after the end of a previous basic block, at a
63
  set_label instruction or after a legacy dyngen operation.
64

    
65
After the end of a basic block, temporaries at destroyed and globals
66
are stored at their initial storage (register or memory place
67
depending on their declarations).
68

    
69
* Floating point types are not supported yet
70

    
71
* Pointers: depending on the TCG target, pointer size is 32 bit or 64
72
  bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
73
  TCG_TYPE_I64.
74

    
75
* Helpers:
76

    
77
Using the tcg_gen_helper_x_y it is possible to call any function
78
taking i32, i64 or pointer types types. Before calling an helper, all
79
globals are stored at their canonical location and it is assumed that
80
the function can modify them. In the future, function modifiers will
81
be allowed to tell that the helper does not read or write some globals.
82

    
83
On some TCG targets (e.g. x86), several calling conventions are
84
supported.
85

    
86
* Branches:
87

    
88
Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
89
explicit address. Conditional branches can only jump to labels.
90

    
91
3.3) Code Optimizations
92

    
93
When generating instructions, you can count on at least the following
94
optimizations:
95

    
96
- Single instructions are simplified, e.g.
97

    
98
   and_i32 t0, t0, $0xffffffff
99
    
100
  is suppressed.
101

    
102
- A liveness analysis is done at the basic block level. The
103
  information is used to suppress moves from a dead temporary to
104
  another one. It is also used to remove instructions which compute
105
  dead results. The later is especially useful for condition code
106
  optimization in QEMU.
107

    
108
  In the following example:
109

    
110
  add_i32 t0, t1, t2
111
  add_i32 t0, t0, $1
112
  mov_i32 t0, $1
113

    
114
  only the last instruction is kept.
115

    
116
- A macro system is supported (may get closer to function inlining
117
  some day). It is useful if the liveness analysis is likely to prove
118
  that some results of a computation are indeed not useful. With the
119
  macro system, the user can provide several alternative
120
  implementations which are used depending on the used results. It is
121
  especially useful for condition code optimization in QEMU.
122

    
123
  Here is an example:
124

    
125
  macro_2 t0, t1, $1
126
  mov_i32 t0, $0x1234
127

    
128
  The macro identified by the ID "$1" normally returns the values t0
129
  and t1. Suppose its implementation is:
130

    
131
  macro_start
132
  brcond_i32  t2, $0, $TCG_COND_EQ, $1
133
  mov_i32 t0, $2
134
  br $2
135
  set_label $1
136
  mov_i32 t0, $3
137
  set_label $2
138
  add_i32 t1, t3, t4
139
  macro_end
140
  
141
  If t0 is not used after the macro, the user can provide a simpler
142
  implementation:
143

    
144
  macro_start
145
  add_i32 t1, t2, t4
146
  macro_end
147

    
148
  TCG automatically chooses the right implementation depending on
149
  which macro outputs are used after it.
150

    
151
  Note that if TCG did more expensive optimizations, macros would be
152
  less useful. In the previous example a macro is useful because the
153
  liveness analysis is done on each basic block separately. Hence TCG
154
  cannot remove the code computing 't0' even if it is not used after
155
  the first macro implementation.
156

    
157
3.4) Instruction Reference
158

    
159
********* Function call
160

    
161
* call <ret> <params> ptr
162

    
163
call function 'ptr' (pointer type)
164

    
165
<ret> optional 32 bit or 64 bit return value
166
<params> optional 32 bit or 64 bit parameters
167

    
168
********* Jumps/Labels
169

    
170
* jmp t0
171

    
172
Absolute jump to address t0 (pointer type).
173

    
174
* set_label $label
175

    
176
Define label 'label' at the current program point.
177

    
178
* br $label
179

    
180
Jump to label.
181

    
182
* brcond_i32/i64 cond, t0, t1, label
183

    
184
Conditional jump if t0 cond t1 is true. cond can be:
185
    TCG_COND_EQ
186
    TCG_COND_NE
187
    TCG_COND_LT /* signed */
188
    TCG_COND_GE /* signed */
189
    TCG_COND_LE /* signed */
190
    TCG_COND_GT /* signed */
191
    TCG_COND_LTU /* unsigned */
192
    TCG_COND_GEU /* unsigned */
193
    TCG_COND_LEU /* unsigned */
194
    TCG_COND_GTU /* unsigned */
195

    
196
********* Arithmetic
197

    
198
* add_i32/i64 t0, t1, t2
199

    
200
t0=t1+t2
201

    
202
* sub_i32/i64 t0, t1, t2
203

    
204
t0=t1-t2
205

    
206
* mul_i32/i64 t0, t1, t2
207

    
208
t0=t1*t2
209

    
210
* div_i32/i64 t0, t1, t2
211

    
212
t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
213

    
214
* divu_i32/i64 t0, t1, t2
215

    
216
t0=t1/t2 (unsigned). Undefined behavior if division by zero.
217

    
218
* rem_i32/i64 t0, t1, t2
219

    
220
t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
221

    
222
* remu_i32/i64 t0, t1, t2
223

    
224
t0=t1%t2 (unsigned). Undefined behavior if division by zero.
225

    
226
********* Logical
227

    
228
* and_i32/i64 t0, t1, t2
229

    
230
t0=t1&t2
231

    
232
* or_i32/i64 t0, t1, t2
233

    
234
t0=t1|t2
235

    
236
* xor_i32/i64 t0, t1, t2
237

    
238
t0=t1^t2
239

    
240
********* Shifts
241

    
242
* shl_i32/i64 t0, t1, t2
243

    
244
t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
245

    
246
* shr_i32/i64 t0, t1, t2
247

    
248
t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
249

    
250
* sar_i32/i64 t0, t1, t2
251

    
252
t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
253

    
254
********* Misc
255

    
256
* mov_i32/i64 t0, t1
257

    
258
t0 = t1
259

    
260
Move t1 to t0 (both operands must have the same type).
261

    
262
* ext8s_i32/i64 t0, t1
263
ext16s_i32/i64 t0, t1
264
ext32s_i64 t0, t1
265

    
266
8, 16 or 32 bit sign extension (both operands must have the same type)
267

    
268
* bswap16_i32 t0, t1
269

    
270
16 bit byte swap on a 32 bit value. The two high order bytes must be set
271
to zero.
272

    
273
* bswap_i32 t0, t1
274

    
275
32 bit byte swap
276

    
277
* bswap_i64 t0, t1
278

    
279
64 bit byte swap
280

    
281
* discard_i32/i64 t0
282

    
283
Indicate that the value of t0 won't be used later. It is useful to
284
force dead code elimination.
285

    
286
********* Type conversions
287

    
288
* ext_i32_i64 t0, t1
289
Convert t1 (32 bit) to t0 (64 bit) and does sign extension
290

    
291
* extu_i32_i64 t0, t1
292
Convert t1 (32 bit) to t0 (64 bit) and does zero extension
293

    
294
* trunc_i64_i32 t0, t1
295
Truncate t1 (64 bit) to t0 (32 bit)
296

    
297
********* Load/Store
298

    
299
* ld_i32/i64 t0, t1, offset
300
ld8s_i32/i64 t0, t1, offset
301
ld8u_i32/i64 t0, t1, offset
302
ld16s_i32/i64 t0, t1, offset
303
ld16u_i32/i64 t0, t1, offset
304
ld32s_i64 t0, t1, offset
305
ld32u_i64 t0, t1, offset
306

    
307
t0 = read(t1 + offset)
308
Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
309
offset must be a constant.
310

    
311
* st_i32/i64 t0, t1, offset
312
st8_i32/i64 t0, t1, offset
313
st16_i32/i64 t0, t1, offset
314
st32_i64 t0, t1, offset
315

    
316
write(t0, t1 + offset)
317
Write 8, 16, 32 or 64 bits to host memory.
318

    
319
********* QEMU specific operations
320

    
321
* tb_exit t0
322

    
323
Exit the current TB and return the value t0 (word type).
324

    
325
* goto_tb index
326

    
327
Exit the current TB and jump to the TB index 'index' (constant) if the
328
current TB was linked to this TB. Otherwise execute the next
329
instructions.
330

    
331
* qemu_ld_i32/i64 t0, t1, flags
332
qemu_ld8u_i32/i64 t0, t1, flags
333
qemu_ld8s_i32/i64 t0, t1, flags
334
qemu_ld16u_i32/i64 t0, t1, flags
335
qemu_ld16s_i32/i64 t0, t1, flags
336
qemu_ld32u_i64 t0, t1, flags
337
qemu_ld32s_i64 t0, t1, flags
338

    
339
Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
340
address type. 'flags' contains the QEMU memory index (selects user or
341
kernel access) for example.
342

    
343
* qemu_st_i32/i64 t0, t1, flags
344
qemu_st8_i32/i64 t0, t1, flags
345
qemu_st16_i32/i64 t0, t1, flags
346
qemu_st32_i64 t0, t1, flags
347

    
348
Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
349
address type. 'flags' contains the QEMU memory index (selects user or
350
kernel access) for example.
351

    
352
Note 1: Some shortcuts are defined when the last operand is known to be
353
a constant (e.g. addi for add, movi for mov).
354

    
355
Note 2: When using TCG, the opcodes must never be generated directly
356
as some of them may not be available as "real" opcodes. Always use the
357
function tcg_gen_xxx(args).
358

    
359
4) Backend
360

    
361
tcg-target.h contains the target specific definitions. tcg-target.c
362
contains the target specific code.
363

    
364
4.1) Assumptions
365

    
366
The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
367
64 bit. It is expected that the pointer has the same size as the word.
368

    
369
On a 32 bit target, all 64 bit operations are converted to 32 bits. A
370
few specific operations must be implemented to allow it (see add2_i32,
371
sub2_i32, brcond2_i32).
372

    
373
Floating point operations are not supported in this version. A
374
previous incarnation of the code generator had full support of them,
375
but it is better to concentrate on integer operations first.
376

    
377
On a 64 bit target, no assumption is made in TCG about the storage of
378
the 32 bit values in 64 bit registers.
379

    
380
4.2) Constraints
381

    
382
GCC like constraints are used to define the constraints of every
383
instruction. Memory constraints are not supported in this
384
version. Aliases are specified in the input operands as for GCC.
385

    
386
A target can define specific register or constant constraints. If an
387
operation uses a constant input constraint which does not allow all
388
constants, it must also accept registers in order to have a fallback.
389

    
390
The movi_i32 and movi_i64 operations must accept any constants.
391

    
392
The mov_i32 and mov_i64 operations must accept any registers of the
393
same type.
394

    
395
The ld/st instructions must accept signed 32 bit constant offsets. It
396
can be implemented by reserving a specific register to compute the
397
address if the offset is too big.
398

    
399
The ld/st instructions must accept any destination (ld) or source (st)
400
register.
401

    
402
4.3) Function call assumptions
403

    
404
- The only supported types for parameters and return value are: 32 and
405
  64 bit integers and pointer.
406
- The stack grows downwards.
407
- The first N parameters are passed in registers.
408
- The next parameters are passed on the stack by storing them as words.
409
- Some registers are clobbered during the call. 
410
- The function can return 0 or 1 value in registers. On a 32 bit
411
  target, functions must be able to return 2 values in registers for
412
  64 bit return type.
413

    
414
5) Migration from dyngen to TCG
415

    
416
TCG is backward compatible with QEMU "dyngen" operations. It means
417
that TCG instructions can be freely mixed with dyngen operations. It
418
is expected that QEMU targets will be progressively fully converted to
419
TCG. Once a target is fully converted to TCG, it will be possible
420
to apply more optimizations because more registers will be free for
421
the generated code.
422

    
423
The exception model is the same as the dyngen one.