Merge git://github.com/hw-claudio/qemu-aarch64-queue into tcg-next
tcg/aarch64: Implement tlb lookup fast path
Supports CONFIG_QEMU_LDST_OPTIMIZATION
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
tcg-i386: Use QEMU_BUILD_BUG_ON instead of assert for frame size
We can check the condition at compile time, rather than run time.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
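As an illustration of the compile-time versus run-time distinction, here is a minimal sketch. EXAMPLE_BUILD_BUG_ON is a hypothetical stand-in built on C11 _Static_assert, not QEMU's actual macro definition, and the frame size and alignment values are made up for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for QEMU_BUILD_BUG_ON, built on C11
   _Static_assert: compilation fails when cond is true. */
#define EXAMPLE_BUILD_BUG_ON(cond) \
    _Static_assert(!(cond), "build-time check failed: " #cond)

/* Illustrative values only, not the real tcg-i386 frame layout. */
enum { EXAMPLE_FRAME_SIZE = 128, EXAMPLE_STACK_ALIGN = 16 };

/* Checked by the compiler; no code is emitted for this line. */
EXAMPLE_BUILD_BUG_ON(EXAMPLE_FRAME_SIZE % EXAMPLE_STACK_ALIGN != 0);

/* The run-time form this replaces: evaluated on every call. */
static int frame_size_checked_at_runtime(void)
{
    assert(EXAMPLE_FRAME_SIZE % EXAMPLE_STACK_ALIGN == 0);
    return EXAMPLE_FRAME_SIZE;
}
```

Changing EXAMPLE_FRAME_SIZE to a value that is not a multiple of 16 makes the build fail instead of deferring the failure to execution.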
tcg-arm: Implement tcg_register_jit
Allows unwinding past the code_gen_buffer.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg: Fix high_pc fields in .debug_info
I don't think the debugger actually looks at this for anything, using the correct .debug_frame contents, but might as well get it all correct.
tcg: Move the CIE and FDE header definitions to common code
These will necessarily be the same layout for all hosts. This limits the amount of boilerplate required to implement jit debug for a host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>...
tcg-arm: Use AT_PLATFORM to detect the host ISA
With this we can generate armv7 insns even when the OS compiles for a lower common denominator. The macros are arranged so that when we do compile for a given ISA, all of the runtime checks for that ISA are...
tcg-arm: Simplify logic in detecting the ARM ISA in use
GCC 4.8 defines a handy __ARM_ARCH symbol that we can use, which will make us nicely forward compatible with ARMv8 AArch32.
tcg-arm: Rename use_armv5_instructions to use_armvt5_instructions
As it really controls the availability of a thumb interworking instruction on armv5t.
tcg: Allow non-constant control macros
This allows TCG_TARGET_HAS_* to be a variable rather than a constant, which allows easier support for differing ISA levels for the host.
tcg: Simplify logic using TCG_OPF_NOT_PRESENT
Expand the definition of "not present" to include "should not be present". This means we can simplify the logic surrounding the generic tcg opcodes for which the host backend ought not be providing definitions....
tcg-arm: Make use of conditional availability of opcodes for divide
We can now detect and use divide instructions at runtime, rather than having to restrict their availability to compile-time.
tcg-arm: Don't implement rem
tcg-ppc: Don't implement rem
tcg-ppc64: Don't implement rem
tcg: Split rem requirement from div requirement
There are several hosts with only a "div" insn. Remainder is computed manually from the quotient and inputs. We can do this generically.
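The generic expansion amounts to a divide, a multiply and a subtract. A minimal sketch in plain C follows; rem_from_div is a hypothetical name for illustration, not a TCG function.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder computed from the quotient and the inputs, as a host
   with only a divide instruction would do it.  The caller must
   avoid b == 0 and the INT32_MIN / -1 overflow case. */
static int32_t rem_from_div(int32_t a, int32_t b)
{
    assert(b != 0);
    int32_t q = a / b;   /* the host "div" insn */
    return a - q * b;    /* multiply, then subtract */
}
```

With C's truncating division the result takes the sign of the dividend, so rem_from_div(-7, 3) yields -1.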
tcg/aarch64: implement ldst 12bit scaled uimm offset
implement the 12bit scaled unsigned immediate offset variant of LDR/STR. This improves code size by avoiding the movi + ldst_r for naturally aligned offsets in range.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>...
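For reference, "naturally aligned offsets in range" can be expressed as the check below. This is a sketch of the encoding constraint only, and fits_scaled_uimm12 is a hypothetical helper name rather than a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* size_log2 is 0 for byte, 1 for halfword, 2 for word and 3 for
   doubleword accesses.  The offset is encodable when it is
   non-negative, a multiple of the access size, and the scaled
   value fits in 12 bits. */
static bool fits_scaled_uimm12(int64_t offset, unsigned size_log2)
{
    if (offset < 0) {
        return false;
    }
    if (offset & ((INT64_C(1) << size_log2) - 1)) {
        return false;               /* not naturally aligned */
    }
    return (offset >> size_log2) <= 0xfff;
}
```

For a doubleword access this accepts offsets 0, 8, ..., 32760 and rejects everything else, which then falls back to the movi + ldst_r sequence.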
tcg-ppc64: bswap64 rotates output 32 bits
If our input and output is in the same register, bswap64 tries to undo a rotate of the input. This just ends up rotating the output.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-ppc64: Fix add2_i64
add2_i64 was adding the lower double word to the upper double word of each input. Fix this so we add the lower double words, then the upper double words with carry propagation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>...
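The intended semantics of a double-word add with carry propagation can be sketched in portable C. The U128 struct and add2_u64 are hypothetical names for this example.

```c
#include <stdint.h>

typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128;

/* Add the lower double words first, then the upper double words
   plus the carry out of the low addition. */
static U128 add2_u64(uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh)
{
    U128 r;
    r.lo = al + bl;
    uint64_t carry = r.lo < al;   /* unsigned wraparound means carry */
    r.hi = ah + bh + carry;
    return r;
}
```

The buggy form described above would instead compute al + ah and bl + bh, mixing the low and high halves of each input.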
tcg-ppc64: rotr_i32 rotates wrong amount
rotr_i32 calculates the amount to left shift and puts it into a temporary, but then doesn't use it when doing the shift.
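A right rotate by n is a left rotate by 32 - n, which is the amount the temporary is meant to hold. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Rotate left by n bits; n is reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate right implemented with the left rotate above: the
   computed amount (32 - n) must be the one actually used. */
static uint32_t rotr32_via_rotl(uint32_t x, unsigned n)
{
    unsigned left_amount = (32 - n) & 31;
    return rotl32(x, left_amount);
}
```

Passing n instead of left_amount to rotl32 reproduces the class of bug described here: the value is rotated, but by the wrong amount.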
tcg-ppc64: Fix RLDCL opcode
The rldcl instruction doesn't have an sh field, so the minor opcode is shifted 1 bit. We were using the XO30 macro which shifted the minor opcode 2 bits.
Remove XO30 and add MD30 and MDS30 macros which match the Power ISA categories....
Merge remote-tracking branch 'pmaydell/tcg-aarch64.next' into staging
tcg/aarch64: implement byte swap operations
implement the optional byte swap operations with the dedicated aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A33.9050003@huawei.com...
tcg/aarch64: implement sign/zero extend operations
implement the optional sign/zero extend operations with the dedicated aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A58.40502@huawei.com...
tcg/aarch64: implement user mode qemu ld/st
also put aarch64 in the list of archs that do not need an ldscript.
Signed-off-by: Jani Kokkoken <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg/aarch64: implement new TCG target for aarch64
add preliminary support for TCG target aarch64.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>...
tcg/aarch64: improve arith shifted regs operations
for arith operations, add SUBS, ANDS, ADDS and add a shift parameter so that all arith instructions can make use of shifted registers.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg/aarch64: implement AND/TEST immediate pattern
add functions to AND/TEST registers with immediate patterns.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A0C.3090303@huawei.com...
tcg: Remove redundant tcg_target_init checks
We've got a compile-time check for the condition in exec/cpu-defs.h.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg/optimize: fix setcond2 optimization
When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg-arm: Use movi32 in exit_tb
Avoid the mini constant pool for armv7, and avoid replicating the test for pre-v7.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
tcg-arm: Fix 64-bit tlb load for pre-v6
Found by inspection, since the effect of the bug was simply to send all memory ops through the slow path.
tcg-arm: Split out tcg_out_tlb_read
Share code between qemu_ld and qemu_st to process the tlb.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-arm: Improve scheduling of tcg_out_tlb_read
The schedule was fully serial, with no possibility for dual issue. The old schedule had a minimal issue of 7 cycles; the new schedule has a minimal issue of 5 cycles.
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-arm: Delete the 'S' constraint
After the previous patch, 's' and 'S' are the same.
tcg-arm: Use movi32 + blx for calls on v7
Work better with branch prediction when we have movw+movt, as the size of the code is the same. Perhaps re-evaluate when we have a proper constant pool.
tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
Move the slow path out of line, as the TODOs mention. This allows the fast path to be unconditional, which can speed up the fast path as well, depending on the core.
tcg-arm: Remove long jump from tcg_out_goto_label
Branches within a TB will always be within 16MB.
tcg-arm: Implement deposit for armv7
We have BFI and BFC available for implementing it.
tcg-arm: Implement division instructions
An armv7 extension implements division, present on Cortex A15.
tcg-arm: Use TCG_REG_TMP name for the tcg temporary
Don't hard-code R8.
tcg-arm: Use R12 for the tcg temporary
R12 is call clobbered, while R8 is call saved. This change gives tcg one more call saved register for real data.
tcg-arm: Cleanup multiply subroutines
Make the code more readable by only having one copy of the magic numbers, swapping registers as needed prior to that. Speed the compiler by not applying the rd == rn avoidance for v6 or later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>...
tcg-arm: Cleanup most primitive load store subroutines
Use even more primitive helper functions to avoid lots of duplicated code.
tcg-arm: Handle negated constant arguments to and/sub
This greatly improves code generation for addition of small negative constants.
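The idea can be sketched as follows: an ARM data-processing immediate is an 8-bit value rotated right by an even amount, so a small negative constant is not encodable for ADD, but its negation is encodable for SUB. The names is_arm_imm, AddChoice and choose_add_insn are hypothetical and do not mirror the backend's actual helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when v is an 8-bit value rotated right by an even amount.
   Rotating v left by the same amount recovers the 8-bit value. */
static bool is_arm_imm(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        uint32_t r = rot ? (v << rot) | (v >> (32 - rot)) : v;
        if (r <= 0xff) {
            return true;
        }
    }
    return false;
}

typedef enum { USE_ADD, USE_SUB, NEEDS_REGISTER } AddChoice;

/* Pick the instruction for "dst = src + c". */
static AddChoice choose_add_insn(int32_t c)
{
    if (is_arm_imm((uint32_t)c)) {
        return USE_ADD;             /* add dst, src, #c  */
    }
    if (is_arm_imm(-(uint32_t)c)) {
        return USE_SUB;             /* sub dst, src, #-c */
    }
    return NEEDS_REGISTER;          /* load the constant first */
}
```

For c = -4 the first test fails (0xfffffffc is not encodable) and the second succeeds, so a single SUB with #4 is enough.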
tcg-arm: Allow constant first argument to sub
This allows the generation of RSB instructions.
tcg-arm: Use tcg_out_dat_rIN for compares
This allows us to emit CMN instructions.
tcg-arm: Handle constant arguments to add2/sub2
We get to re-use the _rIN and _rIK subroutines to handle the various combinations of add vs sub. Fold the << 21 into the opcode enum values so that we can explicitly add TO_CPSR as desired.
tcg-arm: Improve constant generation
Try fully rotated arguments to mov and mvn before trying movt or full decomposition. Begin decomposition with mvn when it looks like it'll help. Examples include
: mov r9, #0x00000fa0
: orr r9, r9, #0x000ee000...
tcg-arm: Use bic to implement and with constant
This greatly improves the code we can produce for deposit without armv7 support.
tcg: Log the contents of the prologue with -d out_asm
This makes it easier to verify changes to the code generating the prologue.
[Aurelien: change the format from %i to %zu]
tcg-arm: Fix local stack frame
We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that any helper with more than 4 arguments would clobber the saved regs. Realizing that we're supposed to have this memory pre-allocated means we can clean up the tcg_out_arg functions, which were trying to do...
tcg: fix deposit_i64 op on 32-bit targets
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 + deposit_i32, care should be taken to not overwrite the low part of the second argument before the deposit when it is the same as the destination.
This fixes the shld instruction in qemu-system-x86_64, which in turns...
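For readers unfamiliar with the operation, the result a deposit is expected to produce can be sketched as a reference function in C. This shows the semantics only, not the 32-bit expansion fixed here, and deposit64_ref is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the len-bit field of value starting at bit pos with the
   low len bits of field; all other bits of value are preserved. */
static uint64_t deposit64_ref(uint64_t value, unsigned pos, unsigned len,
                              uint64_t field)
{
    assert(len >= 1 && len <= 64 && pos <= 64 - len);
    uint64_t mask = (~UINT64_C(0) >> (64 - len)) << pos;
    return (value & ~mask) | ((field << pos) & mask);
}
```

Note that value is read after the mask is built; an expansion that writes part of the destination before reading the aliased input loses exactly the bits this function preserves.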
tcg-ppc64: Handle deposit of zero
The TCG optimizer does great work when inserting constants, being able to fold the open-coded deposit expansion to just an AND or an OR. Avoid a bit of the regression caused by having the deposit opcode by expanding deposit of zero as an AND....
tcg-ppc64: Implement movcond
tcg-ppc64: Use getauxval for ISA detection
Glibc 2.16 includes an easy way to get feature bits previously buried in /proc or the program startup auxiliary vector. Use it.
tcg-ppc64: Implement add2/sub2_i64
tcg-ppc64: Implement mulu2/muls2_i64
tcg-ppc64: Cleanup i32 constants to tcg_out_cmp
Nothing else in the call chain ensures that these constants don't have garbage in the high bits.
tcg-ppc64: Use MFOCRF instead of MFCR
It takes half the cycles to read one CR register instead of all 8. This is a backward compatible addition to the ISA, so chips prior to Power 2.00 spec will simply continue to read the entire CR register.
tcg-ppc64: Use ISEL for setcond
There are a few simple special cases that should be handled first. Break these out to subroutines to avoid code duplication.
tcg-ppc64: Implement deposit
tcg-ppc64: Use I constraint for mul
The mul_i32 pattern was loading non-16-bit constants into a register, when we can get the middle-end to do that for us. The mul_i64 pattern was not considering that MULLI takes 64-bit inputs.
tcg-ppc64: Use TCGType throughout compares
The optimization/bug being fixed is that tcg_out_cmp was not applying the right type to loading a constant, in the case it can't be implemented directly. Rather than recomputing the TCGType enum from the arch64 bool,...
tcg-ppc64: Implement bswap64
tcg-ppc64: Implement compound logicals
Mostly copied from the ppc32 port.
tcg-ppc64: Handle constant inputs for some compound logicals
Since we have special code to handle and/or/xor with a constant, apply the same to andc/orc/eqv with a constant.
tcg-ppc64: Implement bswap16 and bswap32
tcg-ppc64: Implement rotates
tcg-ppc64: Streamline qemu_ld/st insn selection
Using a table to look up insns of the right width and sign. Include support for the Power 2.06 LDBRX and STDBRX insns.
tcg-ppc64: Use automatic implementation of ext32u_i64
The enhancements to and immediate obviate this.
tcg-ppc64: Improve and_i32 with constant
Use RLWINM
tcg-ppc64: Improve and_i64 with constant
Use RLDICL and RLDICR.
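With a zero rotate count, RLDICL clears high-order bits and RLDICR clears low-order bits, so each can implement an AND whose constant is a contiguous run of ones anchored at one end of the register. The tests below are a sketch of how such constants could be recognized; is_low_mask and is_high_mask are hypothetical names, and the patch may detect these masks differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ones in the low bits only, e.g. 0x0000ffff: candidate for
   RLDICL with a zero rotate (clear the bits to the left). */
static bool is_low_mask(uint64_t c)
{
    return c != 0 && (c & (c + 1)) == 0;
}

/* Ones in the high bits only, e.g. 0xffff000000000000: candidate
   for RLDICR with a zero rotate (clear the bits to the right). */
static bool is_high_mask(uint64_t c)
{
    return c != 0 && (~c & (~c + 1)) == 0;
}
```

A constant such as 0xff00 satisfies neither test and needs a different sequence.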
tcg-ppc64: Tidy or and xor patterns.
Handle constants in common code; we'll want to reuse that later.
tcg-ppc64: Allow constant first argument to sub
Using SUBFIC for 16-bit signed constants.
tcg-ppc64: Improve constant add and sub ops.
Improve constant addition -- previously we'd emit useless addi with 0. Use new constraints to force the driver to pull full 64-bit constants into a register.
tcg-ppc64: Rearrange integer constant constraints
We'll need a zero, and Z makes more sense for that. Make sure we have a full complement of signed and unsigned 16 and 32-bit tests.
tcg-ppc64: Cleanup tcg_out_movi
The test for using movi32 was sub-optimal for TCG_TYPE_I32, comparing a signed 32-bit quantity against an unsigned 32-bit quantity.
When possible, use addi+oris for 32-bit unsigned constants. Otherwise, standardize on addi+oris+ori instead of addis+ori+rldicl....
tcg-ppc64: Fix setcond_i32
We weren't ignoring the high 32 bits during a NE comparison.
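On a 64-bit host a 32-bit TCG value lives in a 64-bit register whose upper half is not guaranteed to be clean. A sketch of an NE comparison that ignores the high 32 bits, using a hypothetical function name:

```c
#include <stdint.h>

/* ra and rb model 64-bit host registers holding 32-bit values with
   possibly stale upper halves.  Only the low 32 bits of the
   difference decide the result. */
static int setcond_ne_i32(uint64_t ra, uint64_t rb)
{
    return (uint32_t)(ra ^ rb) != 0;
}
```

Comparing the full 64-bit registers would report "not equal" for 0xdeadbeef00000001 and 0x1 even though the 32-bit values match.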
tcg-ppc64: Introduce and use TAI and SAI
tcg-ppc64: Introduce and use tcg_out_shri64
tcg-ppc64: Introduce and use tcg_out_shli64
tcg-ppc64: Introduce and use tcg_out_ext32u
tcg-ppc64: Introduce and use tcg_out_rlw
tcg-ppc64: Use TCGReg everywhere
Merge branch 'tci' of git://qemu.weilnetz.de/qemu
tci: Use 32-bit signed offsets to loads/stores
Since the change to tcg_exit_req, the first insn of every TB is a load with a negative offset from env.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
tci: Delete unused tb_ret_addr
tci: Make tcg temporaries local to tcg_qemu_tb_exec
We're moving away from the temporaries stored in env. Make sure we can differentiate between temp stores and possibly bogus stores for extra call arguments. Move TCG_AREG0 and TCG_REG_CALL_STACK out of the way...
tcg-s390: Fix merge error in tgen_brcond
When the TCG condition codes were re-organized last year, we failed to update all of the "old-style" tests for unsigned.
tcg-s390: Remove constraint letters for and
Since we have a free temporary and can always just load the constant, we ought to do so, rather than spending the same effort constraining the const.
tcg-s390: Use risbgz for andi
This is immediately usable by the tlb lookup code.
tcg-s390: Cleanup argument shuffling fixme in softmmu code
tcg-s390: Use load-address for addition
Since we're always in 64-bit mode, load address performs a full 64-bit add. Use that for 3-address addition, as well as for larger constant addends when we lack extended-immediates facility.
tcg-s390: Use all 20 bits of the offset in tcg_out_mem
This can save one insn, if the constant has any bits in 32-63 set, but no bits in 21-31 set. It never results in more insns.
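One way to use the whole signed 20-bit displacement is to split an out-of-range offset into a low part that goes into the memory instruction and a high part that is loaded into a temporary. The following is a sketch under that assumption; SplitOffset and split_offset20 are hypothetical names, and offsets close to the int64_t limits are not handled.

```c
#include <stdint.h>

typedef struct {
    int64_t high;   /* loaded into a temporary register */
    int32_t low;    /* signed 20-bit displacement in the insn */
} SplitOffset;

/* Sign-extend the low 20 bits of ofs and put the rest in high, so
   that high + low == ofs and the low 20 bits of high are zero. */
static SplitOffset split_offset20(int64_t ofs)
{
    SplitOffset r;
    r.low = (int32_t)(((ofs & 0xfffff) ^ 0x80000) - 0x80000);
    r.high = ofs - r.low;
    return r;
}
```

Because the low 20 bits of high are always zero, the constant left to load has fewer significant bits, which is where a shorter load sequence can come from.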
tcg-s390: Remove useless preprocessor conditions
We only support 64-bit code generation for s390x. Don't clutter the code with ifdefs that suggest otherwise.
tcg-s390: Implement add2/sub2 opcodes
tcg-s390: Implement mulu2_i64 opcode
tcg-s390: Implement movcond opcodes
tcg-s390: Implement deposit opcodes