* Implements hardware acceleration for SHA256 instructions.
* Adds SHA256 instructions introduced in ARMv8 to A32 frontend.
* Implements polyfill for processors that do not support hardware
accelerated SHA instructions.
Inlines implementation of exclusive instructions into JITted code,
improving performance of applications relying heavily on these
instructions.
We also fastmem these instructions for additional speed, with
support for appropriate recompilation on fastmem failure.
An unsafe optimization to disable the intercore global_monitor is also
provided, should one wish to rely solely on cmpxchg semantics for
safety.
See also: merryhime/dynarmic#664
4e6848d A32/ir_emitter: Bugfix: ExceptionRaised was producing incorrect PC
41ba9fd value: Move ImmediateToU64() to be a part of Value's interface
c6a6271 reg_alloc: Emit AVX instructions where able
aedd32a abi: Emit AVX instructions where able
f2d9337 a64_exclusive_monitor: Loosen memory ordering requirements
7ca709d travis: Make macOS builds use Xcode 10
14dd45e Fix VShift terminology
88554c4 emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16
ab4e316 emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64
0ea84f3 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32
c77a2c5 emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()
e9441fd emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()
0e9c33c emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()
8f85274 emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
be05e75 Merge pull request #397 from VelocityRa/dec-shift-fix
bc328fc decoders: Cast to correctly-sized type before shifting
9c3d2d1 a64_emit_x64: Lowercase PAGE_SIZE
f538d29 emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed
1603a6e emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE
2e1ccaf emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8
555bfda emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16
71c2589 externals: Update Xbyak to 5.73
1ec1b2f Squashed 'externals/xbyak/' changes from 1de435ed..42462ef9
550d662 load_store_exclusive: Define s == t state to be Constraint_NONE
0b69381 A64/translate: Allow for unpredictable behaviour to be defined
6d236d4 system: Implement MRS CNTFRQ_EL0
6cbb6fb A32/testenv: Add missing headers
6729328 externals: Update xbyak to v5.67
1812bd2 Squashed 'externals/xbyak/' changes from 2794cde7..671fc805
9a95802 externals: Document subtrees
714a840 A64: Implement SQ{ADD, SUB}, and UQ{ADD, SUB}'s vector variants
8cab459 A64: Implement UQADD/UQSUB's scalar variants
18a8151 ir: Add opcodes for unsigned saturating add and subtract
a5660ee x64/reg_alloc: Use type alias for array returned by GetArgumentInfo()
29489b5 ir/value: Use type alias CoprocessorInfo for std::array<u8, 8>
e23ba26 status_register_access: Add support for bits 0 and 1 of mask to MSR
55190bd fuzz_with_unicorn: Split utility functions into fuzz_util
23b049d A32/translate/load_store: Correct detection of writeback
7ec9f15 A32/translate: Add TranslateSingleInstruction
efeecb4 A32/ir_emitter: Bug fix: IREmitter::ExceptionRaised using incorrect opcode
08d1d19 A32/decoders: Split instruction list into include file
2d929cc tests: Refactor unicorn_emu to allow for A32 unicorn
f672368 microinstruction: Improve assert messages
7ebff50 emit_x64_vector: EmitVectorNarrow16: AVX512 implementation
edce230 emit_x64_vector: EmitVectorNarrow32: prefer pblendw to loading constant
4f96c63 emit_x64_vector_floating_point: Simplify FPVector{Min,Max}
e15fdfe emit_x64_vector_floating_point: Simplify Get*Vector functions
734a00b emit_x64_floating_point: Remove EmitProcessNaNs
fd45191 devirtualize: Replace DEVIRT macro with function template
67ba5d0 fuzz_with_unicorn: Remove FCVT_float from ignore list
66e6dd1 a32_emit_x64: std::move A32::UserConfig in the constructor
b4890b6 emit_x64_floating_point: Use EmitPostProcessNaNs in EmitFPMulX
18b2943 emit_x64_floating_point: Remove unnecessary DenormalsAreZero from EmitFPSingleToDouble and EmitFPDoubleToSingle
df1f81f emit_x64_floating_point: Simplify EmitFP{Min,Max}{,Numeric}{32,64}
21fb1c3 emit_x64_floating_point: Reduce NaN processing overhead
f5c9f0f A64: Implement FMULX, scalar single/double variant
8f47773 IR: Implement FPMulX IR instruction
79e6440 fuzz_with_unicorn: Randomize SP
33c80e3 fuzz_with_unicorn: Randomize PC
8d41024 testenv: Make code_mem mobile
a9fae0e emit_x64_vector: Vectorize 32-bit variants of paired min/max
8926a92 emit_x64_vector: Improve code emission of VectorGetElement* for index == 0
e20bd38 reg_alloc: Do a UseScratch if a Use destination is too small
a19fa0e fuzz_with_unicorn: Randomize FPCR.AHP and FPCR.FZ16
775f368 emit_x64_floating_point: AVX implementation of ForceToDefaultNaN
71018a1 emit_x64_vector_floating_point: Prefer blendvp{s,d} to vblendvp{s,d} where possible
137f4b3 backend_x64: Remove all use of xmm0
e73d67a emit_x64_vector_floating_point: AVX implementation of ForceToDefaultNaN
43cca54 emit_x64_vector_floating_point: Reduce codesize of ForceToDefaultNaN
5dc40f4 emit_x64_vector_floating_point: Reduce codesize of EmitTwoOpVectorOperation
07622ee emit_x64_vector_floating_point: Correct FMA in FTZ mode
621c85b emit_x64_floating_point: DenormalsAreZero is redundant as hardware already does DAZ
3d0ebaa emit_x64_floating_point: FlushToZero is redundant as hardware already does FTZ
f626ff8 backend_x64: Fix FPVectorMulAdd and FPMulAdd NaN handling with denormals
adeb9d9 a32/fuzz_arm: Disable vfp tests
19ea70d fuzz_with_unicorn: Randomize FPCR.FZ
895db36 backend_x64: Fix bugs when FPCR.FZ=1
d7e2de2 fuzz_with_unicorn: Extract RandomFpcr function
c858d6c fp/info: Deduplicate functions
5b88ec2 emit_x64_floating_point: Deduplicate EmitFPMulAdd implementation