tcg-ppc64: More use of TAI and SAI helper macros
Finish conversion of all memory operations.
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-ppc64: Use TCG_REG_Rn constants
Instead of bare N, for clarity. The only (intentional) exception made is for insns that encode R|0, i.e. when R0 encoded into the insn is interpreted as zero, not the contents of the register.
tcg-ppc64: Use tcg_out64
tcg-ppc: Avoid code for nop move
While these are rare from code that's been through the optimizer, it's not uncommon within the tcg backend.
tcg-ppc: Cleanup tcg_out_qemu_ld/st_slow_path
Coding style fixes. Use TCGReg enumeration values instead of raw numbers. Don't needlessly pull the whole TCGLabelQemuLdst struct into local variables. Less conditional compilation.
No functional changes....
tcg-ppc: Use conditional branch and link to slow path
Saves one insn per slow path. Note that we can no longer use a tail call into the store helper.
tcg-ppc: Fix and cleanup tcg_out_tlb_check
The fix is that sparc has so many mmu modes that the last one overflowed the 16-bit signed offset we assumed would fit. Handle this, and check the new assumption at compile time.
Load the tlb addend earlier for the fast path....
tcg-ppc64: Reformat tcg-target.c
Whitespace and brace changes only.
tcg-ppc: use new return-argument ld/st helpers
These use a 32-bit load-of-immediate to save a mflr+addi+mtlr sequence. Tested with a Windows 98 guest (pretty much the most recent thing I could run on my PPC machine) and kvm-unit-tests's sieve.flat. The speed up for sieve.flat is as high as 10% for qemu-system-i386, 25%...
tcg-ppc: fix qemu_ld/qemu_st for AIX ABI
For the AIX ABI, the function pointer and small area pointer need to be loaded in the trampoline. The trampoline instead is called with a normal BL instruction.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>...
tcg-sparc: Fix parenthesis warning
error: suggest parentheses around comparison in operand of ‘&’ [-Werror=parentheses]
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Merge remote-tracking branch 'mjt/trivial-patches' into staging
Merge branch 'tcg-next' of git://github.com/rth7680/qemu
tcg/mips: detect available host instructions at runtime
Now that TCG supports enabling and disabling ops at runtime, it's possible to detect the available host instructions at runtime, and enable the corresponding ops accordingly.
Unfortunately it's not easy to probe for available instructions on...
tcg/mips: inline bswap16/bswap32 ops
Use an inline version for the bswap16 and bswap32 ops to avoid testing for MIPS32R2 instructions availability, as these ops are only available in that case.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
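The semantics being inlined can be sketched in C. Without the MIPS32R2 byte-manipulation instructions (WSBH with a rotate), a backend would have to open-code the equivalent shift-and-mask sequence; hypothetical helper names, not the backend's emit code:

```c
#include <stdint.h>

/* C model of the bswap16/bswap32 op semantics. */
static uint16_t bswap16_c(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t bswap32_c(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}
```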
tcg/mips: only enable ext8s/ext16s ops on MIPS32R2
On MIPS, ext8s and ext16s ops are implemented with a dedicated instruction only on MIPS32R2; otherwise the same kind of implementation as at TCG level (shift left followed by shift right) is used.
Change that by only implementing the ext8s and ext16s ops on MIPS32R2 so...
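The generic shift-pair expansion referred to above, as a C sketch (the shift left places the sign bit at bit 31, the arithmetic shift right replicates it; hypothetical helper names):

```c
#include <stdint.h>

/* Sign-extension via a shift pair, the generic TCG expansion used
 * when no dedicated SEB/SEH-style instruction is available. */
static int32_t ext8s_c(int32_t x)
{
    return (int32_t)((uint32_t)x << 24) >> 24;
}

static int32_t ext16s_c(int32_t x)
{
    return (int32_t)((uint32_t)x << 16) >> 16;
}
```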
tcg: Introduce zero and sign-extended versions of load helpers
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-i386: Make use of zero-extended memory helper routines
For 8 and 16-bit unsigned loads, rely on the zero-extension from the helper and use a smaller 32-bit move insn.
tcg: Change tcg_gen_exit_tb argument to uintptr_t
And update all users.
tcg: Change tcg_out_ld/st offset to intptr_t
tcg: Use appropriate types in tcg_reg_alloc_call
tcg: Fix jit debug for x32
tcg-i386: Use intptr_t appropriately
tcg-i386: Adjust tcg_out_tlb_load for x32
tcg-i386: Don't perform GETPC adjustment in TCG code
Since we now perform it inside the helper, no need to do it here. This also lets us perform a tail-call from the store slow path to the helper.
exec: Split softmmu_defs.h
The _cmmu helpers can be moved to exec-all.h. The helpers that are used from TCG will shortly need access to tcg_target_long, so move their declarations into tcg.h.
This requires minor include adjustments to all TCG backends....
tcg: Add muluh and mulsh opcodes
Use them in places where mulu2 and muls2 are used. Optimize mulx2 with a dead low part to mulxh.
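The semantics of the new opcodes: the high half of a widening multiply. A C model using the GCC/Clang __int128 extension; on real hosts these are single instructions (e.g. mulhdu/mulhd on ppc64), and the helper names here are illustrative only:

```c
#include <stdint.h>

/* muluh: high 64 bits of an unsigned 64x64 -> 128-bit multiply. */
static uint64_t muluh64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* mulsh: high 64 bits of a signed 64x64 -> 128-bit multiply. */
static int64_t mulsh64(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}
```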
tcg-mips: Implement mulsh, muluh
With the optimization in tcg_liveness_analysis, we can avoid the MFLO when it is unused.
tcg-ppc64: Implement muluh, mulsh
Using these instead of mulu2 and muls2 lets us avoid having to do argument overlap analysis in the backend. Normal register allocation will DTRT.
tcg: Constant fold div, rem
tcg: Change flush_icache_range arguments to uintptr_t
tcg: Change tcg_qemu_tb_exec return to uintptr_t
tcg: Allow TCG_TARGET_REG_BITS to be specified independently
There are several hosts for which it would be useful to use the available 64-bit registers in a 32-bit pointer environment.
tcg: Define TCG_TYPE_PTR properly
tcg: Define TCG_ptr properly
tcg: Change frame pointer offsets to intptr_t
tcg: Change memory offsets to intptr_t
tcg: Change relocation offsets to intptr_t
tcg: Use uintptr_t in TCGHelperInfo
tci: Remove function tcg_out64 (fix broken build)
Commit ac26eb69a311396668809eadbf7ff4e623447d4c added tcg_out64 to tcg/tcg.c. tcg/tci/tcg-target.c already had a nearly identical implementation, which is now removed to fix a compiler error.
Signed-off-by: Stefan Weil <sw@weilnetz.de>...
tcg-i386: Use new return-argument ld/st helpers
Discontinue the jump-around-jump-to-jump scheme, trading it for a single immediate move instruction. The two extra jumps always consume 7 bytes, whereas the immediate move is either 5 or 7 bytes depending on where the...
tcg: Tidy generated code for tcg_outN
Aliasing was forcing s->code_ptr to be re-read after the store. Keep the pointer in a local variable to help the compiler.
tcg-i386: Add and use tcg_out64
No point in splitting the write into 32-bit pieces.
tcg-i386: Try pc-relative lea for constant formation
Use a 7 byte lea before the ultimate 10 byte movq.
tcg-i386: Tidy qemu_ld/st slow path
Use existing stack space for arguments; don't push/pop. Use fewer ifdefs and more C ifs.
tcg/mips: fix invalid op definition errors
tcg/mips/tcg-target.h defines various operations conditionally, depending upon the isa revision; however, these operations are included in mips_op_defs[] unconditionally, resulting in the following runtime errors if CONFIG_DEBUG_TCG is defined:...
tci: Fix broken build (compiler warning caused by redefined macro BIT)
The definition of macro BIT in tci/tcg-target.c now conflicts with the definition of the same macro in the included qemu/bitops.h.
This conflict was triggered by a recent change in the include chain of...
Merge git://github.com/hw-claudio/qemu-aarch64-queue into tcg-next
tcg/aarch64: Implement tlb lookup fast path
Supports CONFIG_QEMU_LDST_OPTIMIZATION
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
tcg-i386: Use QEMU_BUILD_BUG_ON instead of assert for frame size
We can check the condition at compile time, rather than run time.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-arm: Implement tcg_register_jit
Allows unwinding past the code_gen_buffer.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg: Fix high_pc fields in .debug_info
I don't think the debugger actually looks at this for anything, using the correct .debug_frame contents instead, but we might as well get it all correct.
tcg: Move the CIE and FDE header definitions to common code
These will necessarily be the same layout for all hosts. This limits the amount of boilerplate required to implement jit debug for a host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>...
tcg-arm: Use AT_PLATFORM to detect the host ISA
With this we can generate armv7 insns even when the OS compiles for a lower common denominator. The macros are arranged so that when we do compile for a given ISA, all of the runtime checks for that ISA are...
tcg-arm: Simplify logic in detecting the ARM ISA in use
GCC 4.8 defines a handy __ARM_ARCH symbol that we can use, which will make us nicely forward compatible with ARMv8 AArch32.
tcg-arm: Rename use_armv5_instructions to use_armv5t_instructions
As it really controls the availability of a thumb interworking instruction on armv5t.
tcg: Allow non-constant control macros
This allows TCG_TARGET_HAS_* to be a variable rather than a constant, which allows easier support for differing ISA levels for the host.
tcg: Simplify logic using TCG_OPF_NOT_PRESENT
Expand the definition of "not present" to include "should not be present". This means we can simplify the logic surrounding the generic tcg opcodes for which the host backend ought not be providing definitions....
tcg-arm: Make use of conditional availability of opcodes for divide
We can now detect and use divide instructions at runtime, rather than having to restrict their availability to compile-time.
tcg-arm: Don't implement rem
tcg-ppc: Don't implement rem
tcg-ppc64: Don't implement rem
tcg: Split rem requirement from div requirement
There are several hosts with only a "div" insn. Remainder is computed manually from the quotient and inputs. We can do this generically.
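The generic expansion is the identity r = a - (a / b) * b, which C's truncating division also obeys. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Remainder reconstructed from a quotient-only host:
 * one div, one mul, one sub. */
static int32_t rem_from_div(int32_t a, int32_t b)
{
    int32_t q = a / b;      /* the host's div instruction */
    return a - q * b;       /* mul + sub recover the remainder */
}
```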
tcg/aarch64: implement ldst 12bit scaled uimm offset
implement the 12bit scaled unsigned immediate offset variant of LDR/STR. This improves code size by avoiding the movi + ldst_r for naturally aligned offsets in range.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>...
tcg-ppc64: bswap64 rotates output 32 bits
If our input and output is in the same register, bswap64 tries to undo a rotate of the input. This just ends up rotating the output.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg-ppc64: Fix add2_i64
add2_i64 was adding the lower double word to the upper double word of each input. Fix this so we add the lower double words, then the upper double words with carry propagation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>...
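The corrected pairing, as a C model of the intended semantics: low halves add first, and the carry out of that add propagates into the high-half add (the bug mixed low and high words of the inputs). Hypothetical helper, not the backend code:

```c
#include <stdint.h>

/* Double-word add: *rh:*rl = ah:al + bh:bl, with carry propagation
 * from the low add into the high add. */
static void add2_u64(uint64_t *rl, uint64_t *rh,
                     uint64_t al, uint64_t ah,
                     uint64_t bl, uint64_t bh)
{
    uint64_t lo = al + bl;
    uint64_t carry = lo < al;    /* unsigned wrap detects carry-out */
    *rl = lo;
    *rh = ah + bh + carry;
}
```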
tcg-ppc64: rotr_i32 rotates wrong amount
rotr_i32 calculates the amount to left shift and puts it into a temporary, but then doesn't use it when doing the shift.
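A right rotation expressed through the complementary left shift, which is what the backend computes into the temporary (and, per the bug, then failed to use). A C sketch of the intended semantics:

```c
#include <stdint.h>

/* rotr(x, n) == (x >> n) | (x << (32 - n)), guarding n == 0 to
 * avoid the undefined shift by 32. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}
```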
tcg-ppc64: Fix RLDCL opcode
The rldcl instruction doesn't have an sh field, so the minor opcode is shifted 1 bit. We were using the XO30 macro, which shifted the minor opcode 2 bits.
Remove XO30 and add MD30 and MDS30 macros which match the Power ISA categories....
Merge remote-tracking branch 'pmaydell/tcg-aarch64.next' into staging
tcg/aarch64: implement byte swap operations
implement the optional byte swap operations with the dedicated aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A33.9050003@huawei.com...
tcg/aarch64: implement sign/zero extend operations
implement the optional sign/zero extend operations with the dedicated aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A58.40502@huawei.com...
tcg/aarch64: implement user mode qemu ld/st
also put aarch64 in the list of archs that do not need an ldscript.
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg/aarch64: implement new TCG target for aarch64
add preliminary support for TCG target aarch64.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>...
tcg/aarch64: improve arith shifted regs operations
for arith operations, add SUBS, ANDS, ADDS and add a shift parameter so that all arith instructions can make use of shifted registers.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg/aarch64: implement AND/TEST immediate pattern
add functions to AND/TEST registers with immediate patterns.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A0C.3090303@huawei.com...
tcg: Remove redundant tcg_target_init checks
We've got a compile-time check for the condition in exec/cpu-defs.h.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg/optimize: fix setcond2 optimization
When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>...
tcg-arm: Use movi32 in exit_tb
Avoid the mini constant pool for armv7, and avoid replicating the test for pre-v7.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
tcg-arm: Fix 64-bit tlb load for pre-v6
Found by inspection, since the effect of the bug was simply to send all memory ops through the slow path.
tcg-arm: Split out tcg_out_tlb_read
Share code between qemu_ld and qemu_st to process the tlb.
tcg-arm: Improve scheduling of tcg_out_tlb_read
The schedule was fully serial, with no possibility for dual issue. The old schedule had a minimal issue of 7 cycles; the new schedule has a minimal issue of 5 cycles.
tcg-arm: Delete the 'S' constraint
After the previous patch, 's' and 'S' are the same.
tcg-arm: Use movi32 + blx for calls on v7
Works better with branch prediction when we have movw+movt, as the size of the code is the same. Perhaps re-evaluate when we have a proper constant pool.
tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
Move the slow path out of line, as the TODOs mention. This allows the fast path to be unconditional, which can speed up the fast path as well, depending on the core.
tcg-arm: Remove long jump from tcg_out_goto_label
Branches within a TB will always be within 16MB.
tcg-arm: Implement deposit for armv7
We have BFI and BFC available for implementing it.
tcg-arm: Implement division instructions
An armv7 extension implements division, present on Cortex A15.
tcg-arm: Use TCG_REG_TMP name for the tcg temporary
Don't hard-code R8.
tcg-arm: Use R12 for the tcg temporary
R12 is call clobbered, while R8 is call saved. This change gives tcg one more call saved register for real data.
tcg-arm: Cleanup multiply subroutines
Make the code more readable by only having one copy of the magic numbers, swapping registers as needed prior to that. Speed the compiler by not applying the rd == rn avoidance for v6 or later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>...
tcg-arm: Cleanup most primitive load store subroutines
Use even more primitive helper functions to avoid lots of duplicated code.
tcg-arm: Handle negated constant arguments to and/sub
This greatly improves code generation for addition of small negative constants.
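The rewrite this enables, stated as plain arithmetic: adding a small negative constant equals subtracting its negation, so the backend can emit "sub r, r, #c" when "add r, r, #-c" has no encodable immediate. A minimal C sketch with hypothetical helper names:

```c
#include <stdint.h>

/* What the guest op asks for: r + c with c negative. */
static uint32_t add_neg_const(uint32_t r, int32_t c)
{
    return r + (uint32_t)c;
}

/* What the backend emits instead: subtract the (encodable)
 * positive negation. */
static uint32_t sub_pos_const(uint32_t r, uint32_t c)
{
    return r - c;
}
```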
tcg-arm: Allow constant first argument to sub
This allows the generation of RSB instructions.
tcg-arm: Use tcg_out_dat_rIN for compares
This allows us to emit CMN instructions.
tcg-arm: Handle constant arguments to add2/sub2
We get to re-use the _rIN and _rIK subroutines to handle the various combinations of add vs sub. Fold the << 21 into the opcode enum values so that we can explicitly add TO_CPSR as desired.
tcg-arm: Improve constant generation
Try fully rotated arguments to mov and mvn before trying movt or full decomposition. Begin decomposition with mvn when it looks like it'll help. Examples include
    mov   r9, #0x00000fa0
    orr   r9, r9, #0x000ee000
    ...
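The selection order can be sketched in C. encodable() models the ARM data-processing immediate (an 8-bit value rotated right by an even amount); both function names are hypothetical stand-ins, not the backend's actual routines:

```c
#include <stdint.h>
#include <stdbool.h>

/* True if imm fits an ARM rotated-immediate encoding. */
static bool encodable(uint32_t imm)
{
    for (int rot = 0; rot < 32; rot += 2) {
        /* rotate left by rot to undo a rotate-right encoding */
        uint32_t v = (imm << rot) | (imm >> ((32 - rot) & 31));
        if (v <= 0xff) {
            return true;
        }
    }
    return false;
}

/* 0 = single mov, 1 = single mvn, 2 = full decomposition. */
static int movi_strategy(uint32_t c)
{
    if (encodable(c)) {
        return 0;           /* mov with rotated immediate */
    }
    if (encodable(~c)) {
        return 1;           /* mvn with inverted immediate */
    }
    return 2;               /* movw/movt or mov/mvn plus orr/bic */
}
```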
tcg-arm: Use bic to implement and with constant
This greatly improves the code we can produce for deposit without armv7 support.
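The trick rests on bic computing a & ~b: a mask whose complement is an encodable immediate can be applied as "bic r, r, #~mask" instead of loading the mask into a register first. A C model of the identity:

```c
#include <stdint.h>

/* bic semantics: clear in a every bit set in b. */
static uint32_t bic(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* e.g. r & 0xffff00ff: the mask is not a rotated immediate, but its
 * complement 0x0000ff00 is, so the backend can emit
 * bic(r, 0x0000ff00) with a single instruction. */
```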
tcg: Log the contents of the prologue with -d out_asm
This makes it easier to verify changes to the code generating the prologue.
[Aurelien: change the format from %i to %zu]
tcg-arm: Fix local stack frame
We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that any helper with more than 4 arguments would clobber the saved regs. Realizing that we're supposed to have this memory pre-allocated means we can clean up the tcg_out_arg functions, which were trying to do...
tcg: fix deposit_i64 op on 32-bit targets
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 + deposit_i32 pair, care should be taken to not overwrite the low part of the second argument before the deposit when it is the same as the destination.
This fixes the shld instruction in qemu-system-x86_64, which in turns...
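Reference semantics of the deposit op, for context: insert the low len bits of src into dest at bit position pos. The 32-bit hazard above arises when dest aliases the second argument and the mov of the low half runs before deposit_i32 has read it. This C sketch models only the semantics, not the emitted ops:

```c
#include <stdint.h>

/* deposit(dest, src, pos, len): replace bits [pos, pos+len) of dest
 * with the low len bits of src. */
static uint64_t deposit64(uint64_t dest, uint64_t src,
                          unsigned pos, unsigned len)
{
    uint64_t mask = (len < 64 ? (1ull << len) : 0) - 1;
    return (dest & ~(mask << pos)) | ((src & mask) << pos);
}
```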