Statistics
| Branch: | Revision:

root / tcg / README @ 390efc54

History | View | Annotate | Download (11.7 kB)

1
Tiny Code Generator - Fabrice Bellard.
2

    
3
1) Introduction
4

    
5
TCG (Tiny Code Generator) began as a generic backend for a C
6
compiler. It was simplified to be used in QEMU. It also has its roots
7
in the QOP code generator written by Paul Brook. 
8

    
9
2) Definitions
10

    
11
The TCG "target" is the architecture for which we generate the
12
code. It is of course not the same as the "target" of QEMU which is
13
the emulated architecture. As TCG started as a generic C backend used
14
for cross compiling, it is assumed that the TCG target is different
15
from the host, although it is never the case for QEMU.
16

    
17
A TCG "function" corresponds to a QEMU Translated Block (TB).
18

    
19
A TCG "temporary" is a variable only live in a given
20
function. Temporaries are allocated explicitly in each function.
21

    
22
A TCG "global" is a variable which is live in all the functions. They
23
are defined before the functions defined. A TCG global can be a memory
24
location (e.g. a QEMU CPU register), a fixed host register (e.g. the
25
QEMU CPU state pointer) or a memory location which is stored in a
26
register outside QEMU TBs (not implemented yet).
27

    
28
A TCG "basic block" corresponds to a list of instructions terminated
29
by a branch instruction. 
30

    
31
3) Intermediate representation
32

    
33
3.1) Introduction
34

    
35
TCG instructions operate on variables which are temporaries or
36
globals. TCG instructions and variables are strongly typed. Two types
37
are supported: 32 bit integers and 64 bit integers. Pointers are
38
defined as an alias to 32 bit or 64 bit integers depending on the TCG
39
target word size.
40

    
41
Each instruction has a fixed number of output variable operands, input
42
variable operands and always constant operands.
43

    
44
The notable exception is the call instruction which has a variable
45
number of outputs and inputs.
46

    
47
In the textual form, output operands come first, followed by input
48
operands, followed by constant operands. The output type is included
49
in the instruction name. Constants are prefixed with a '$'.
50

    
51
add_i32 t0, t1, t2  (t0 <- t1 + t2)
52

    
53
sub_i64 t2, t3, $4  (t2 <- t3 - 4)
54

    
55
3.2) Assumptions
56

    
57
* Basic blocks
58

    
59
- Basic blocks end after branches (e.g. brcond_i32 instruction),
60
  goto_tb and exit_tb instructions.
61
- Basic blocks end before legacy dyngen operations.
62
- Basic blocks start after the end of a previous basic block, at a
63
  set_label instruction or after a legacy dyngen operation.
64

    
65
After the end of a basic block, temporaries at destroyed and globals
66
are stored at their initial storage (register or memory place
67
depending on their declarations).
68

    
69
* Floating point types are not supported yet
70

    
71
* Pointers: depending on the TCG target, pointer size is 32 bit or 64
72
  bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
73
  TCG_TYPE_I64.
74

    
75
* Helpers:
76

    
77
Using the tcg_gen_helper_x_y it is possible to call any function
78
taking i32, i64 or pointer types types. Before calling an helper, all
79
globals are stored at their canonical location and it is assumed that
80
the function can modify them. In the future, function modifiers will
81
be allowed to tell that the helper does not read or write some globals.
82

    
83
On some TCG targets (e.g. x86), several calling conventions are
84
supported.
85

    
86
* Branches:
87

    
88
Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
89
explicit address. Conditional branches can only jump to labels.
90

    
91
3.3) Code Optimizations
92

    
93
When generating instructions, you can count on at least the following
94
optimizations:
95

    
96
- Single instructions are simplified, e.g.
97

    
98
   and_i32 t0, t0, $0xffffffff
99
    
100
  is suppressed.
101

    
102
- A liveness analysis is done at the basic block level. The
103
  information is used to suppress moves from a dead temporary to
104
  another one. It is also used to remove instructions which compute
105
  dead results. The later is especially useful for condition code
106
  optimization in QEMU.
107

    
108
  In the following example:
109

    
110
  add_i32 t0, t1, t2
111
  add_i32 t0, t0, $1
112
  mov_i32 t0, $1
113

    
114
  only the last instruction is kept.
115

    
116
- A macro system is supported (may get closer to function inlining
117
  some day). It is useful if the liveness analysis is likely to prove
118
  that some results of a computation are indeed not useful. With the
119
  macro system, the user can provide several alternative
120
  implementations which are used depending on the used results. It is
121
  especially useful for condition code optimization in QEMU.
122

    
123
  Here is an example:
124

    
125
  macro_2 t0, t1, $1
126
  mov_i32 t0, $0x1234
127

    
128
  The macro identified by the ID "$1" normally returns the values t0
129
  and t1. Suppose its implementation is:
130

    
131
  macro_start
132
  brcond_i32  t2, $0, $TCG_COND_EQ, $1
133
  mov_i32 t0, $2
134
  br $2
135
  set_label $1
136
  mov_i32 t0, $3
137
  set_label $2
138
  add_i32 t1, t3, t4
139
  macro_end
140
  
141
  If t0 is not used after the macro, the user can provide a simpler
142
  implementation:
143

    
144
  macro_start
145
  add_i32 t1, t2, t4
146
  macro_end
147

    
148
  TCG automatically chooses the right implementation depending on
149
  which macro outputs are used after it.
150

    
151
  Note that if TCG did more expensive optimizations, macros would be
152
  less useful. In the previous example a macro is useful because the
153
  liveness analysis is done on each basic block separately. Hence TCG
154
  cannot remove the code computing 't0' even if it is not used after
155
  the first macro implementation.
156

    
157
3.4) Instruction Reference
158

    
159
********* Function call
160

    
161
* call <ret> <params> ptr
162

    
163
call function 'ptr' (pointer type)
164

    
165
<ret> optional 32 bit or 64 bit return value
166
<params> optional 32 bit or 64 bit parameters
167

    
168
********* Jumps/Labels
169

    
170
* jmp t0
171

    
172
Absolute jump to address t0 (pointer type).
173

    
174
* set_label $label
175

    
176
Define label 'label' at the current program point.
177

    
178
* br $label
179

    
180
Jump to label.
181

    
182
* brcond_i32/i64 cond, t0, t1, label
183

    
184
Conditional jump if t0 cond t1 is true. cond can be:
185
    TCG_COND_EQ
186
    TCG_COND_NE
187
    TCG_COND_LT /* signed */
188
    TCG_COND_GE /* signed */
189
    TCG_COND_LE /* signed */
190
    TCG_COND_GT /* signed */
191
    TCG_COND_LTU /* unsigned */
192
    TCG_COND_GEU /* unsigned */
193
    TCG_COND_LEU /* unsigned */
194
    TCG_COND_GTU /* unsigned */
195

    
196
********* Arithmetic
197

    
198
* add_i32/i64 t0, t1, t2
199

    
200
t0=t1+t2
201

    
202
* sub_i32/i64 t0, t1, t2
203

    
204
t0=t1-t2
205

    
206
* neg_i32/i64 t0, t1
207

    
208
t0=-t1 (two's complement)
209

    
210
* mul_i32/i64 t0, t1, t2
211

    
212
t0=t1*t2
213

    
214
* div_i32/i64 t0, t1, t2
215

    
216
t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
217

    
218
* divu_i32/i64 t0, t1, t2
219

    
220
t0=t1/t2 (unsigned). Undefined behavior if division by zero.
221

    
222
* rem_i32/i64 t0, t1, t2
223

    
224
t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
225

    
226
* remu_i32/i64 t0, t1, t2
227

    
228
t0=t1%t2 (unsigned). Undefined behavior if division by zero.
229

    
230
********* Logical
231

    
232
* and_i32/i64 t0, t1, t2
233

    
234
t0=t1&t2
235

    
236
* or_i32/i64 t0, t1, t2
237

    
238
t0=t1|t2
239

    
240
* xor_i32/i64 t0, t1, t2
241

    
242
t0=t1^t2
243

    
244
********* Shifts
245

    
246
* shl_i32/i64 t0, t1, t2
247

    
248
t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
249

    
250
* shr_i32/i64 t0, t1, t2
251

    
252
t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
253

    
254
* sar_i32/i64 t0, t1, t2
255

    
256
t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
257

    
258
********* Misc
259

    
260
* mov_i32/i64 t0, t1
261

    
262
t0 = t1
263

    
264
Move t1 to t0 (both operands must have the same type).
265

    
266
* ext8s_i32/i64 t0, t1
267
ext8u_i32/i64 t0, t1
268
ext16s_i32/i64 t0, t1
269
ext16u_i32/i64 t0, t1
270
ext32s_i64 t0, t1
271
ext32u_i64 t0, t1
272

    
273
8, 16 or 32 bit sign/zero extension (both operands must have the same type)
274

    
275
* bswap16_i32 t0, t1
276

    
277
16 bit byte swap on a 32 bit value. The two high order bytes must be set
278
to zero.
279

    
280
* bswap_i32 t0, t1
281

    
282
32 bit byte swap
283

    
284
* bswap_i64 t0, t1
285

    
286
64 bit byte swap
287

    
288
* discard_i32/i64 t0
289

    
290
Indicate that the value of t0 won't be used later. It is useful to
291
force dead code elimination.
292

    
293
********* Type conversions
294

    
295
* ext_i32_i64 t0, t1
296
Convert t1 (32 bit) to t0 (64 bit) and does sign extension
297

    
298
* extu_i32_i64 t0, t1
299
Convert t1 (32 bit) to t0 (64 bit) and does zero extension
300

    
301
* trunc_i64_i32 t0, t1
302
Truncate t1 (64 bit) to t0 (32 bit)
303

    
304
********* Load/Store
305

    
306
* ld_i32/i64 t0, t1, offset
307
ld8s_i32/i64 t0, t1, offset
308
ld8u_i32/i64 t0, t1, offset
309
ld16s_i32/i64 t0, t1, offset
310
ld16u_i32/i64 t0, t1, offset
311
ld32s_i64 t0, t1, offset
312
ld32u_i64 t0, t1, offset
313

    
314
t0 = read(t1 + offset)
315
Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
316
offset must be a constant.
317

    
318
* st_i32/i64 t0, t1, offset
319
st8_i32/i64 t0, t1, offset
320
st16_i32/i64 t0, t1, offset
321
st32_i64 t0, t1, offset
322

    
323
write(t0, t1 + offset)
324
Write 8, 16, 32 or 64 bits to host memory.
325

    
326
********* QEMU specific operations
327

    
328
* tb_exit t0
329

    
330
Exit the current TB and return the value t0 (word type).
331

    
332
* goto_tb index
333

    
334
Exit the current TB and jump to the TB index 'index' (constant) if the
335
current TB was linked to this TB. Otherwise execute the next
336
instructions.
337

    
338
* qemu_ld_i32/i64 t0, t1, flags
339
qemu_ld8u_i32/i64 t0, t1, flags
340
qemu_ld8s_i32/i64 t0, t1, flags
341
qemu_ld16u_i32/i64 t0, t1, flags
342
qemu_ld16s_i32/i64 t0, t1, flags
343
qemu_ld32u_i64 t0, t1, flags
344
qemu_ld32s_i64 t0, t1, flags
345

    
346
Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
347
address type. 'flags' contains the QEMU memory index (selects user or
348
kernel access) for example.
349

    
350
* qemu_st_i32/i64 t0, t1, flags
351
qemu_st8_i32/i64 t0, t1, flags
352
qemu_st16_i32/i64 t0, t1, flags
353
qemu_st32_i64 t0, t1, flags
354

    
355
Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
356
address type. 'flags' contains the QEMU memory index (selects user or
357
kernel access) for example.
358

    
359
Note 1: Some shortcuts are defined when the last operand is known to be
360
a constant (e.g. addi for add, movi for mov).
361

    
362
Note 2: When using TCG, the opcodes must never be generated directly
363
as some of them may not be available as "real" opcodes. Always use the
364
function tcg_gen_xxx(args).
365

    
366
4) Backend
367

    
368
tcg-target.h contains the target specific definitions. tcg-target.c
369
contains the target specific code.
370

    
371
4.1) Assumptions
372

    
373
The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
374
64 bit. It is expected that the pointer has the same size as the word.
375

    
376
On a 32 bit target, all 64 bit operations are converted to 32 bits. A
377
few specific operations must be implemented to allow it (see add2_i32,
378
sub2_i32, brcond2_i32).
379

    
380
Floating point operations are not supported in this version. A
381
previous incarnation of the code generator had full support of them,
382
but it is better to concentrate on integer operations first.
383

    
384
On a 64 bit target, no assumption is made in TCG about the storage of
385
the 32 bit values in 64 bit registers.
386

    
387
4.2) Constraints
388

    
389
GCC like constraints are used to define the constraints of every
390
instruction. Memory constraints are not supported in this
391
version. Aliases are specified in the input operands as for GCC.
392

    
393
A target can define specific register or constant constraints. If an
394
operation uses a constant input constraint which does not allow all
395
constants, it must also accept registers in order to have a fallback.
396

    
397
The movi_i32 and movi_i64 operations must accept any constants.
398

    
399
The mov_i32 and mov_i64 operations must accept any registers of the
400
same type.
401

    
402
The ld/st instructions must accept signed 32 bit constant offsets. It
403
can be implemented by reserving a specific register to compute the
404
address if the offset is too big.
405

    
406
The ld/st instructions must accept any destination (ld) or source (st)
407
register.
408

    
409
4.3) Function call assumptions
410

    
411
- The only supported types for parameters and return value are: 32 and
412
  64 bit integers and pointer.
413
- The stack grows downwards.
414
- The first N parameters are passed in registers.
415
- The next parameters are passed on the stack by storing them as words.
416
- Some registers are clobbered during the call. 
417
- The function can return 0 or 1 value in registers. On a 32 bit
418
  target, functions must be able to return 2 values in registers for
419
  64 bit return type.
420

    
421
5) Migration from dyngen to TCG
422

    
423
TCG is backward compatible with QEMU "dyngen" operations. It means
424
that TCG instructions can be freely mixed with dyngen operations. It
425
is expected that QEMU targets will be progressively fully converted to
426
TCG. Once a target is fully converted to TCG, it will be possible
427
to apply more optimizations because more registers will be free for
428
the generated code.
429

    
430
The exception model is the same as the dyngen one.