eden/.appveyor
Uses arithmetic that compilers can more easily recognize as a byte swap
and optimize. For example, rather than shifting the halves of the value
and then swapping and combining them, we can swap the bytes in place.
For the original swap32 code on x86-64, clang 8.0 would generate:
    mov ecx, edi
    rol cx, 8
    shl ecx, 16
    shr edi, 16
    rol di, 8
    movzx eax, di
    or eax, ecx
    ret
while GCC 8.3 would generate the ideal:
    mov eax, edi
    bswap eax
    ret
Now both generate the same optimal output.
MSVC used to generate the following with the old code:
    mov eax, ecx
    rol cx, 8
    shr eax, 16
    rol ax, 8
    movzx ecx, cx
    movzx eax, ax
    shl ecx, 16
    or eax, ecx
    ret 0
Now MSVC also generates a result that is similar to, and just as optimal as, the clang/GCC output:
    bswap ecx
    mov eax, ecx
    ret 0
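The two swap32 styles described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names (swap16_halves, swap32_halves, swap32_in_place) are hypothetical, and the exact expressions in the real source may differ. The first form swaps each 16-bit half and then exchanges the halves, which matches the rol/shl/shr pattern in the old listings; the second moves every byte straight to its final position, a shape compilers tend to recognize as a single bswap.

```cpp
#include <cstdint>

// Hypothetical "before" form: byte-swap a 16-bit value.
constexpr std::uint16_t swap16_halves(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

// Hypothetical "before" form: swap each 16-bit half, then exchange the halves.
constexpr std::uint32_t swap32_halves(std::uint32_t v) {
    return (static_cast<std::uint32_t>(
                swap16_halves(static_cast<std::uint16_t>(v))) << 16) |
           swap16_halves(static_cast<std::uint16_t>(v >> 16));
}

// Hypothetical "after" form: mask each byte and shift it directly into place.
constexpr std::uint32_t swap32_in_place(std::uint32_t v) {
    return ((v & 0xFF000000u) >> 24) | ((v & 0x00FF0000u) >> 8) |
           ((v & 0x0000FF00u) << 8) | ((v & 0x000000FFu) << 24);
}

// Both forms compute the same result; only the generated code differs.
static_assert(swap32_halves(0x11223344u) == 0x44332211u, "halves form");
static_assert(swap32_in_place(0x11223344u) == 0x44332211u, "in-place form");
```

Whether a given compiler folds either form into bswap depends on the compiler and version, as the listings above show.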
====
In the swap64 case, for the original code, clang 8.0 would generate:
    mov eax, edi
    bswap eax
    shl rax, 32
    shr rdi, 32
    bswap edi
    or rax, rdi
    ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
    mov rax, rdi
    bswap rax
    ret
Now clang also generates the optimal sequence for this fallback.
This is a case where MSVC unfortunately falls short: despite the new
code, it still generates a doozy of an output:
    mov r8, rcx
    mov r9, rcx
    mov rax, 71776119061217280
    mov rdx, r8
    and r9, rax
    and edx, 65280
    mov rax, rcx
    shr rax, 16
    or r9, rax
    mov rax, rcx
    shr r9, 16
    mov rcx, 280375465082880
    and rax, rcx
    mov rcx, 1095216660480
    or r9, rax
    mov rax, r8
    and rax, rcx
    shr r9, 16
    or r9, rax
    mov rcx, r8
    mov rax, r8
    shr r9, 8
    shl rax, 16
    and ecx, 16711680
    or rdx, rax
    mov eax, -16777216
    and rax, r8
    shl rdx, 16
    or rdx, rcx
    shl rdx, 16
    or rax, rdx
    shl rax, 8
    or rax, r9
    ret 0
which is pretty unfortunate.
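For reference, an in-place 64-bit swap can be sketched as below. This is an assumption about the general shape of the fallback, not the commit's actual code, and the name swap64_in_place is hypothetical. The decimal constants in the MSVC listing above are byte masks of this kind (for example, 71776119061217280 is 0x00FF000000000000 and 65280 is 0xFF00), which suggests, but does not confirm, that the new code masks and shifts each byte individually.

```cpp
#include <cstdint>

// Hypothetical in-place form: mask each of the eight bytes and shift it
// directly to its mirrored position.
constexpr std::uint64_t swap64_in_place(std::uint64_t v) {
    return ((v & 0xFF00000000000000ULL) >> 56) |
           ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0x0000FF0000000000ULL) >> 24) |
           ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x00000000FF000000ULL) << 8) |
           ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x000000000000FF00ULL) << 40) |
           ((v & 0x00000000000000FFULL) << 56);
}

// The byte order is fully reversed, and swapping twice is the identity.
static_assert(swap64_in_place(0x0102030405060708ULL) == 0x0807060504030201ULL,
              "bytes reversed");
static_assert(swap64_in_place(swap64_in_place(0x0102030405060708ULL)) ==
                  0x0102030405060708ULL,
              "round trip");
```

On compilers that do not pattern-match this form, a compiler-specific byte-swap intrinsic is the usual alternative; this sketch only covers the portable fallback.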
| | | 7 years ago |
|---|---|---|
| .. | | |
| UtilityFunctions.ps1 | Implement Citra pull 3043 | 8 years ago |