eden/.appveyor
Uses arithmetic that compilers can more readily recognize as a byte swap
and optimize. For example, rather than shifting the halves of the value,
swapping them, and combining them, we can swap the bytes in place.
For instance, with the original swap32 code on x86-64, clang 8.0 would generate:
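The contrast between the two styles can be sketched as follows. This is a minimal illustration, not the exact code from the commit: the names `swap32_halves` and `swap32_in_place` are hypothetical, and the old-style version is an assumption reconstructed from the description above (byte-swap each 16-bit half, then exchange the halves).

```cpp
#include <cstdint>

// Hypothetical helper: swap the two bytes of a 16-bit value.
constexpr std::uint16_t swap16(std::uint16_t data) {
    return static_cast<std::uint16_t>((data >> 8) | (data << 8));
}

// Old style (assumed shape): byte-swap each 16-bit half, then exchange the halves.
constexpr std::uint32_t swap32_halves(std::uint32_t data) {
    return (static_cast<std::uint32_t>(swap16(static_cast<std::uint16_t>(data))) << 16) |
           swap16(static_cast<std::uint16_t>(data >> 16));
}

// New style (assumed shape): mask each byte and move it straight to its final position.
constexpr std::uint32_t swap32_in_place(std::uint32_t data) {
    return ((data & 0xFF000000U) >> 24) | ((data & 0x00FF0000U) >> 8) |
           ((data & 0x0000FF00U) << 8) | ((data & 0x000000FFU) << 24);
}

// Both forms compute the same result; only the shape of the arithmetic differs.
static_assert(swap32_halves(0x11223344U) == 0x44332211U, "halves-based swap");
static_assert(swap32_in_place(0x11223344U) == 0x44332211U, "in-place swap");
```

The mask-and-shift form spells out where every byte ends up in a single expression, which is presumably what makes it easier for a compiler's pattern matcher to collapse into one bswap.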
mov ecx, edi
rol cx, 8
shl ecx, 16
shr edi, 16
rol di, 8
movzx eax, di
or eax, ecx
ret
while GCC 8.3 would generate the ideal:
mov eax, edi
bswap eax
ret
Now both generate the same optimal output.
MSVC used to generate the following with the old code:
mov eax, ecx
rol cx, 8
shr eax, 16
rol ax, 8
movzx ecx, cx
movzx eax, ax
shl ecx, 16
or eax, ecx
ret 0
Now MSVC also generates a result similar to, and just as optimal as, that of clang/GCC:
bswap ecx
mov eax, ecx
ret 0
====
In the swap64 case, for the original code, clang 8.0 would generate:
mov eax, edi
bswap eax
shl rax, 32
shr rdi, 32
bswap edi
or rax, rdi
ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
mov rax, rdi
bswap rax
ret
Now clang generates the optimal sequence for this fallback as well.
This is a case where MSVC unfortunately falls short: despite the new
code, it still generates a doozy of an output:
mov r8, rcx
mov r9, rcx
mov rax, 71776119061217280
mov rdx, r8
and r9, rax
and edx, 65280
mov rax, rcx
shr rax, 16
or r9, rax
mov rax, rcx
shr r9, 16
mov rcx, 280375465082880
and rax, rcx
mov rcx, 1095216660480
or r9, rax
mov rax, r8
and rax, rcx
shr r9, 16
or r9, rax
mov rcx, r8
mov rax, r8
shr r9, 8
shl rax, 16
and ecx, 16711680
or rdx, rax
mov eax, -16777216
and rax, r8
shl rdx, 16
or rdx, rcx
shl rdx, 16
or rax, rdx
shl rax, 8
or rax, r9
ret 0
which is pretty unfortunate.
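For reference, a 64-bit in-place swap of the kind discussed above might look like the sketch below. The name `swap64_in_place` is hypothetical and the exact expression in the commit may differ; the decimal constants in the MSVC listing (for example, 71776119061217280 is 0x00FF000000000000) are consistent with this mask-and-shift shape.

```cpp
#include <cstdint>

// Hypothetical sketch: mask each of the eight bytes and move it to its mirrored position.
constexpr std::uint64_t swap64_in_place(std::uint64_t data) {
    return ((data & 0xFF00000000000000ULL) >> 56) |
           ((data & 0x00FF000000000000ULL) >> 40) |
           ((data & 0x0000FF0000000000ULL) >> 24) |
           ((data & 0x000000FF00000000ULL) >> 8) |
           ((data & 0x00000000FF000000ULL) << 8) |
           ((data & 0x0000000000FF0000ULL) << 24) |
           ((data & 0x000000000000FF00ULL) << 40) |
           ((data & 0x00000000000000FFULL) << 56);
}

// Sanity check: the byte order is fully reversed.
static_assert(swap64_in_place(0x0102030405060708ULL) == 0x0807060504030201ULL,
              "64-bit in-place swap");
```

Since the message calls this path a fallback, a build on MSVC could presumably sidestep the long sequence by using the `_byteswap_uint64` intrinsic from `<stdlib.h>` instead; whether that is done here is not stated.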
| | | 7 years ago |
|---|---|---|
| .. | | |
| UtilityFunctions.ps1 | Implement Citra pull 3043 | 8 years ago |