The NVIDIA SM86 (Maxwell) GPU - Instruction set
AL2P ALD AST ATOM ATOMS B2R BAR BFE BFI BPT BRA BRK BRX CAL CCTL CCTLL CONT CS2R CSET CSETP DADD DEPBAR DFMA DMNMX DMUL DSET DSETP EXIT F2F F2I FADD FCHK FCMP FFMA FLO FMNMX FMUL FSET FSETP FSWZADD GETCRSPTR GETLMEMBASE HADD2 HFMA2 HMUL2 HSET2 HSETP2 I2F I2I IADD IADD3 ICMP IDE IDP IMAD IMADSP IMNMX IMUL IPA ISBERD ISCADD ISET ISETP JCAL JMP JMX KIL LD LDC LDG LDL LDS LEA LEPC LONGJMP LOP LOP3 MEMBAR MOV MUFU NOP OUT P2R PBK PCNT PEXIT PIXLD PLONGJMP POPC PRET PRMT PSET PSETP R2B R2P RAM RED RET RRO RTT S2R SAM SEL SETCRSPTR SETLMEMBASE SHF SHFL SHL SHR SSY ST STG STL STP STS SUATOM SULD SURED SUST SYNC TEX TLD TLD4 TMML TXA TXD TXQ VABSDIFF VABSDIFF4 VADD VMAD VMNMX VOTE VSET VSETP VSHL VSHR XMAD
NOTE: Regenerate TOC with cat docs/gpu/README.md | grep '#' | cut -d '#' -f 2 | tr -d ' ' | awk '{print "["$1"](#"$1")"}'.
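For environments without the shell tools, the same pipeline can be sketched in Python. This is only an approximation of the command in the note: the function name `toc_entries` is hypothetical, and it assumes each heading is written with a single leading `#` (with `##`, both the shell pipeline and this sketch would yield an empty name).

```python
def toc_entries(lines):
    """Mirror: grep '#' | cut -d '#' -f 2 | tr -d ' ' | awk '{print "["$1"](#"$1")"}'."""
    entries = []
    for line in lines:
        if "#" not in line:                      # grep '#'
            continue
        field = line.rstrip("\n").split("#")[1]  # cut -d '#' -f 2
        field = field.replace(" ", "")           # tr -d ' '
        words = field.split()
        name = words[0] if words else ""         # awk's $1
        entries.append(f"[{name}](#{name})")
    return entries


if __name__ == "__main__":
    sample = ["# AL2P\n", "1110 1111 1010 0---\n", "# ALD\n"]
    print(" ".join(toc_entries(sample)))
```

To run it on the real file, pass the lines of `docs/gpu/README.md` (for example via `open(path).readlines()`).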
The numbers (in binary) represent the opcodes; - signifies "don't care".
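As an illustration of this notation, the sketch below converts a pattern into a (mask, value) pair and tests a 16-bit opcode field against it. The helper names `parse_pattern` and `matches` are hypothetical, and the sketch assumes the opcode field has already been extracted from the instruction word as a 16-bit integer.

```python
def parse_pattern(pattern):
    """Turn a pattern such as "1110 1111 1010 0---" into (mask, value).

    mask has a 1 for every fixed bit; value holds the required bits,
    with "don't care" positions set to 0.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 16 or set(bits) - set("01-"):
        raise ValueError(f"expected 16 characters of 0, 1 or -: {pattern!r}")
    mask = 0
    value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch != "-":
            mask |= 1
            value |= int(ch)
    return mask, value


def matches(opcode, pattern):
    """Return True if the 16-bit opcode agrees with every fixed bit."""
    mask, value = parse_pattern(pattern)
    return (opcode & mask) == value


if __name__ == "__main__":
    al2p = "1110 1111 1010 0---"
    print(matches(0b1110111110100101, al2p))  # low three bits are ignored
    print(matches(0b1110111110101000, al2p))  # a fixed bit differs
```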
AL2P
1110 1111 1010 0---
ALD
1110 1111 1101 1---
AST
1110 1111 1111 0---
ATOM
- ATOM_cas:
1110 1110 1111 ----
- ATOM:
1110 1101 ---- ----
Atomic operation.
- INC and DEC do nothing for U32/S32/U64.
- ADD, INC and DEC do nothing for S64.
- Only ADD does something for F32.
- Only ADD, MIN and MAX do something for F16x2.
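The four notes above can be written down as a small lookup. This is a sketch only: the function name `atom_has_effect` and the string spellings of operations and types are hypothetical, and for the integer types it assumes that any operation not named in the notes does have an effect.

```python
def atom_has_effect(op, data_type):
    """Return False for ATOM op/type pairs that the notes say do nothing."""
    op = op.upper()
    data_type = data_type.upper()
    if data_type in ("U32", "S32", "U64"):
        # INC and DEC do nothing; other operations are assumed to work.
        return op not in ("INC", "DEC")
    if data_type == "S64":
        # ADD, INC and DEC do nothing; other operations are assumed to work.
        return op not in ("ADD", "INC", "DEC")
    if data_type == "F32":
        return op == "ADD"
    if data_type == "F16X2":
        return op in ("ADD", "MIN", "MAX")
    raise ValueError(f"type not covered by the notes: {data_type}")


if __name__ == "__main__":
    for op, ty in [("ADD", "F32"), ("MIN", "F32"), ("INC", "U32"), ("MAX", "F16x2")]:
        print(op, ty, atom_has_effect(op, ty))
```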
ATOMS
- ATOMS_cas:
1110 1110 ---- ----
- ATOMS:
1110 1100 ---- ----
B2R
1111 0000 1011 1---
BAR
1111 0000 1010 1---
BFE
- BFE_reg:
0101 1100 0000 0---
- BFE_cbuf:
0100 1100 0000 0---
- BFE_imm:
0011 100- 0000 0---
Bit Field Extract.
BFI
- BFI_reg:
0101 1011 1111 0---
- BFI_rc:
0101 0011 1111 0---
- BFI_cr:
0100 1011 1111 0---
- BFI_imm:
0011 011- 1111 0---
Bit Field Insert.
BPT
1110 0011 1010 ----
Breakpoint trap.
BRA
1110 0010 0100 ----
Relative branch.
BRK
1110 0011 0100 ----
Break.
BRX
1110 0010 0101 ----
CAL
1110 0010 0110 ----
CCTL
1110 1111 011- ----
Cache Control.
CCTLL
1110 1111 100- ----
Cache Control.
CONT
1110 0011 0101 ----
Continue.
CS2R
0101 0000 1100 1---
Move Special Register to Register.
CSET
0101 0000 1001 1---
Test Condition Code And Set.
CSETP
0101 0000 1010 0---
Test Condition Code and Set Predicate.
DADD
- DADD_reg:
0101 1100 0111 0---
- DADD_cbuf:
0100 1100 0111 0---
- DADD_imm:
0011 100- 0111 0---
DEPBAR
1111 0000 1111 0---
DFMA
- DFMA_reg:
0101 1011 0111 ----
- DFMA_rc:
0101 0011 0111 ----
- DFMA_cr:
0100 1011 0111 ----
- DFMA_imm:
0011 011- 0111 ----
FP64 Fused Multiply Add.
DMNMX
- DMNMX_reg:
0101 1100 0101 0---
- DMNMX_cbuf:
0100 1100 0101 0---
- DMNMX_imm:
0011 100- 0101 0---
FP64 Minimum/Maximum.
DMUL
- DMUL_reg:
0101 1100 1000 0---
- DMUL_cbuf:
0100 1100 1000 0---
- DMUL_imm:
0011 100- 1000 0---
FP64 Multiply.
DSET
- DSET_reg:
0101 1001 0--- ----
- DSET_cbuf:
0100 1001 0--- ----
- DSET_imm:
0011 001- 0--- ----
FP64 Compare And Set.
DSETP
- DSETP_reg:
0101 1011 1000 ----
- DSETP_cbuf:
0100 1011 1000 ----
- DSETP_imm:
0011 011- 1000 ----
FP64 Compare And Set Predicate.
EXIT
1110 0011 0000 ----
F2F
- F2F_reg:
0101 1100 1010 1---
- F2F_cbuf:
0100 1100 1010 1---
- F2F_imm:
0011 100- 1010 1---
F2I
- F2I_reg:
0101 1100 1011 0---
- F2I_cbuf:
0100 1100 1011 0---
- F2I_imm:
0011 100- 1011 0---
FADD
- FADD_reg:
0101 1100 0101 1---
- FADD_cbuf:
0100 1100 0101 1---
- FADD_imm:
0011 100- 0101 1---
- FADD32I:
0000 10-- ---- ----
FP32 Add.
FCHK
- FCHK_reg:
0101 1100 1000 1---
- FCHK_cbuf:
0100 1100 1000 1---
- FCHK_imm:
0011 100- 1000 1---
Single Precision FP Divide Range Check.
FCMP
- FCMP_reg:
0101 1011 1010 ----
- FCMP_rc:
0101 0011 1010 ----
- FCMP_cr:
0100 1011 1010 ----
- FCMP_imm:
0011 011- 1010 ----
FP32 Compare to Zero and Select Source.
FFMA
- FFMA_reg:
0101 1001 1--- ----
- FFMA_rc:
0101 0001 1--- ----
- FFMA_cr:
0100 1001 1--- ----
- FFMA_imm:
0011 001- 1--- ----
- FFMA32I:
0000 11-- ---- ----
FP32 Fused Multiply and Add.
FLO
- FLO_reg:
0101 1100 0011 0---
- FLO_cbuf:
0100 1100 0011 0---
- FLO_imm:
0011 100- 0011 0---
FMNMX
- FMNMX_reg:
0101 1100 0110 0---
- FMNMX_cbuf:
0100 1100 0110 0---
- FMNMX_imm:
0011 100- 0110 0---
FP32 Minimum/Maximum.
FMUL
- FMUL_reg:
0101 1100 0110 1---
- FMUL_cbuf:
0100 1100 0110 1---
- FMUL_imm:
0011 100- 0110 1---
- FMUL32I:
0001 1110 ---- ----
FP32 Multiply.
FSET
- FSET_reg:
0101 1000 ---- ----
- FSET_cbuf:
0100 1000 ---- ----
- FSET_imm:
0011 000- ---- ----
FP32 Compare And Set.
FSETP
- FSETP_reg:
0101 1011 1011 ----
- FSETP_cbuf:
0100 1011 1011 ----
- FSETP_imm:
0011 011- 1011 ----
FP32 Compare And Set Predicate.
FSWZADD
0101 0000 1111 1---
FP32 Add used for FSWZ emulation.
GETCRSPTR
1110 0010 1100 ----
GETLMEMBASE
1110 0010 1101 ----
HADD2
- HADD2_reg:
0101 1101 0001 0---
- HADD2_cbuf:
0111 101- 1--- ----
- HADD2_imm:
0111 101- 0--- ----
- HADD2_32I:
0010 110- ---- ----
FP16 Add.
HFMA2
- HFMA2_reg:
0101 1101 0000 0---
- HFMA2_rc:
0110 0--- 1--- ----
- HFMA2_cr:
0111 0--- 1--- ----
- HFMA2_imm:
0111 0--- 0--- ----
- HFMA2_32I:
0010 100- ---- ----
FP16 Fused Multiply Add.
HMUL2
- HMUL2_reg:
0101 1101 0000 1---
- HMUL2_cbuf:
0111 100- 1--- ----
- HMUL2_imm:
0111 100- 0--- ----
- HMUL2_32I:
0010 101- ---- ----
FP16 Multiply.
HSET2
- HSET2_reg:
0101 1101 0001 1---
- HSET2_cbuf:
0111 110- 1--- ----
- HSET2_imm:
0111 110- 0--- ----
FP16 Compare And Set.
HSETP2
- HSETP2_reg:
0101 1101 0010 0---
- HSETP2_cbuf:
0111 111- 1--- ----
- HSETP2_imm:
0111 111- 0--- ----
FP16 Compare And Set Predicate.
I2F
- I2F_reg:
0101 1100 1011 1---
- I2F_cbuf:
0100 1100 1011 1---
- I2F_imm:
0011 100- 1011 1---
I2I
- I2I_reg:
0101 1100 1110 0---
- I2I_cbuf:
0100 1100 1110 0---
- I2I_imm:
0011 100- 1110 0---
IADD
- IADD_reg:
0101 1100 0001 0---
- IADD_cbuf:
0100 1100 0001 0---
- IADD_imm:
0011 100- 0001 0---
Integer Addition.
IADD3
- IADD3_reg:
0101 1100 1100 ----
- IADD3_cbuf:
0100 1100 1100 ----
- IADD3_imm:
0011 100- 1100 ----
- IADD32I:
0001 110- ---- ----
3-input Integer Addition.
ICMP
- ICMP_reg:
0101 1011 0100 ----
- ICMP_rc:
0101 0011 0100 ----
- ICMP_cr:
0100 1011 0100 ----
- ICMP_imm:
0011 011- 0100 ----
Integer Compare to Zero and Select Source.
IDE
1110 0011 1001 ----
IDP
- IDP_reg:
0101 0011 1111 1---
- IDP_imm:
0101 0011 1101 1---
IMAD
- IMAD_reg:
0101 1010 0--- ----
- IMAD_rc:
0101 0010 0--- ----
- IMAD_cr:
0100 1010 0--- ----
- IMAD_imm:
0011 010- 0--- ----
- IMAD32I:
1000 00-- ---- ----
Integer Multiply And Add.
IMADSP
- IMADSP_reg:
0101 1010 1--- ----
- IMADSP_rc:
0101 0010 1--- ----
- IMADSP_cr:
0100 1010 1--- ----
- IMADSP_imm:
0011 010- 1--- ----
Extracted Integer Multiply And Add.
IMNMX
- IMNMX_reg:
0101 1100 0010 0---
- IMNMX_cbuf:
0100 1100 0010 0---
- IMNMX_imm:
0011 100- 0010 0---
Integer Minimum/Maximum.
IMUL
- IMUL_reg:
0101 1100 0011 1---
- IMUL_cbuf:
0100 1100 0011 1---
- IMUL_imm:
0011 100- 0011 1---
- IMUL32I:
0001 1111 ---- ----
Integer Multiply.
IPA
1110 0000 ---- ----
ISBERD
1110 1111 1101 0---
In-Stage-Buffer Entry Read.
ISCADD
- ISCADD_reg:
0101 1100 0001 1---
- ISCADD_cbuf:
0100 1100 0001 1---
- ISCADD_imm:
0011 100- 0001 1---
- ISCADD32I:
0001 01-- ---- ----
Scaled Integer Addition.
ISET
- ISET_reg:
0101 1011 0101 ----
- ISET_cbuf:
0100 1011 0101 ----
- ISET_imm:
0011 011- 0101 ----
Integer Compare And Set.
ISETP
- ISETP_reg:
0101 1011 0110 ----
- ISETP_cbuf:
0100 1011 0110 ----
- ISETP_imm:
0011 011- 0110 ----
Integer Compare And Set Predicate.
JCAL
1110 0010 0010 ----
Absolute Call.
JMP
1110 0010 0001 ----
Absolute Jump.
JMX
1110 0010 0000 ----
Absolute Jump Indirect.
KIL
1110 0011 0011 ----
LD
100- ---- ---- ----
Load from generic Memory.
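Some patterns in this list overlap: for example, any opcode matching IMAD32I (`1000 00-- ---- ----`) also matches LD (`100- ---- ---- ----`). The document does not say how such overlaps are resolved. The sketch below assumes one plausible rule, namely that the pattern with the most fixed bits wins. The names `compile_table` and `decode` are hypothetical, and the table holds only a handful of entries copied from this list.

```python
# A few patterns copied from this document; the full list is much longer.
PATTERNS = {
    "LD": "100- ---- ---- ----",
    "ST": "101- ---- ---- ----",
    "IMAD32I": "1000 00-- ---- ----",
    "LDG": "1110 1110 1101 0---",
    "ATOMS_cas": "1110 1110 ---- ----",
}


def compile_table(patterns):
    """Build (mask, value, name) entries, most specific pattern first."""
    table = []
    for name, pattern in patterns.items():
        bits = pattern.replace(" ", "")
        mask = int("".join("0" if c == "-" else "1" for c in bits), 2)
        value = int(bits.replace("-", "0"), 2)
        table.append((mask, value, name))
    # Assumption: more fixed bits means higher priority.
    table.sort(key=lambda entry: bin(entry[0]).count("1"), reverse=True)
    return table


def decode(opcode, table):
    """Return the first (most specific) matching name, or None."""
    for mask, value, name in table:
        if (opcode & mask) == value:
            return name
    return None


if __name__ == "__main__":
    table = compile_table(PATTERNS)
    for opcode in (0x8000, 0x9000, 0xEED0, 0xEE00):
        print(f"{opcode:016b} -> {decode(opcode, table)}")
```

Sorting once up front keeps `decode` a simple linear scan; a real decoder would likely want a faster lookup, but that is outside the scope of this sketch.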
LDC
1110 1111 1001 0---
Load Constant.
LDG
1110 1110 1101 0---
Load from Global Memory.
LDL
1110 1111 0100 0---
Load within Local Memory Window.
LDS
1110 1111 0100 1---
Load within Shared Memory Window.
LEA
- LEA_hi_reg:
0101 1011 1101 1---
- LEA_hi_cbuf:
0001 10-- ---- ----
- LEA_lo_reg:
0101 1011 1101 0---
- LEA_lo_cbuf:
0100 1011 1101 ----
- LEA_lo_imm:
0011 011- 1101 0---
LEPC
0101 0000 1101 0---
LONGJMP
1110 0011 0001 ----
LOP
- LOP_reg:
0101 1100 0100 0---
- LOP_cbuf:
0100 1100 0100 0---
- LOP_imm:
0011 100- 0100 0---
LOP3
- LOP3_reg:
0101 1011 1110 0---
- LOP3_cbuf:
0000 001- ---- ----
- LOP3_imm:
0011 11-- ---- ----
- LOP32I:
0000 01-- ---- ----
MEMBAR
1110 1111 1001 1---
Memory Barrier.
MOV
- MOV_reg:
0101 1100 1001 1---
- MOV_cbuf:
0100 1100 1001 1---
- MOV_imm:
0011 100- 1001 1---
- MOV32I:
0000 0001 0000 ----
MUFU
0101 0000 1000 0---
Multi Function Operation.
NOP
0101 0000 1011 0---
No operation.
OUT
- OUT_reg:
1111 1011 1110 0---
- OUT_cbuf:
1110 1011 1110 0---
- OUT_imm:
1111 011- 1110 0---
P2R
- P2R_reg:
0101 1100 1110 1---
- P2R_cbuf:
0100 1100 1110 1---
- P2R_imm:
0011 1000 1110 1---
Move Predicate Register To Register.
PBK
1110 0010 1010 ----
Pre-break.
PCNT
1110 0010 1011 ----
Pre-continue.
PEXIT
1110 0010 0011 ----
Pre-exit.
PIXLD
1110 1111 1110 1---
PLONGJMP
1110 0010 1000 ----
Pre-long jump.
POPC
- POPC_reg:
0101 1100 0000 1---
- POPC_cbuf:
0100 1100 0000 1---
- POPC_imm:
0011 100- 0000 1---
Population/Bit count.
PRET
1110 0010 0111 ----
Pre-return from subroutine. Pushes the return address to the CRS stack.
PRMT
- PRMT_reg:
0101 1011 1100 ----
- PRMT_rc:
0101 0011 1100 ----
- PRMT_cr:
0100 1011 1100 ----
- PRMT_imm:
0011 011- 1100 ----
PSET
0101 0000 1000 1---
Combine Predicates and Set.
PSETP
0101 0000 1001 0---
Combine Predicates and Set Predicate.
R2B
1111 0000 1100 0---
Move Register to Barrier.
R2P
- R2P_reg:
0101 1100 1111 0---
- R2P_cbuf:
0100 1100 1111 0---
- R2P_imm:
0011 100- 1111 0---
Move Register To Predicate/CC Register.
RAM
1110 0011 1000 ----
RED
1110 1011 1111 1---
Reduction Operation on Generic Memory.
RET
1110 0011 0010 ----
Return.
RRO
- RRO_reg:
0101 1100 1001 0---
- RRO_cbuf:
0100 1100 1001 0---
- RRO_imm:
0011 100- 1001 0---
RTT
1110 0011 0110 ----
S2R
1111 0000 1100 1---
SAM
1110 0011 0111 ----
SEL
- SEL_reg:
0101 1100 1010 0---
- SEL_cbuf:
0100 1100 1010 0---
- SEL_imm:
0011 100- 1010 0---
SETCRSPTR
1110 0010 1110 ----
SETLMEMBASE
1110 0010 1111 ----
SHF
- SHF_l_reg:
0101 1011 1111 1---
- SHF_l_imm:
0011 011- 1111 1---
- SHF_r_reg:
0101 1100 1111 1---
- SHF_r_imm:
0011 100- 1111 1---
SHFL
1110 1111 0001 0---
SHL
- SHL_reg:
0101 1100 0100 1---
- SHL_cbuf:
0100 1100 0100 1---
- SHL_imm:
0011 100- 0100 1---
SHR
- SHR_reg:
0101 1100 0010 1---
- SHR_cbuf:
0100 1100 0010 1---
- SHR_imm:
0011 100- 0010 1---
SSY
1110 0010 1001 ----
Set Synchronization Point.
ST
101- ---- ---- ----
Store to generic Memory.
STG
1110 1110 1101 1---
Store to global Memory.
STL
1110 1111 0101 0---
Store within Local or Shared Window.
STP
1110 1110 1010 0---
Store to generic Memory and Predicate.
STS
1110 1111 0101 1---
Store within Local or Shared Window.
SUATOM
- SUATOM:
1110 1010 0--- ----
- SUATOM_cas:
1110 1010 1--- ----
Atomic Op on Surface Memory.
SULD
1110 1011 000- ----
Surface Load.
SURED
1110 1011 010- ----
Reduction Op on Surface Memory.
SUST
1110 1011 001- ----
Surface Store.
SYNC
1111 0000 1111 1---
TEX
- TEX:
1100 0--- ---- ----
- TEX_b:
1101 1110 10-- ----
- TEXS:
1101 -00- ---- ----
Texture Fetch with scalar/non-vec4 source/destinations.
TLD
- TLD:
1101 1100 ---- ----
- TLD_b:
1101 1101 ---- ----
- TLDS:
1101 -01- ---- ----
Texture Load with scalar/non-vec4 source/destinations.
TLD4
- TLD4:
1100 10-- ---- ----
- TLD4_b:
1101 1110 11-- ----
- TLD4S:
1101 1111 -0-- ----
Texture Load 4 with scalar/non-vec4 source/destinations.
TMML
- TMML:
1101 1111 0101 1---
- TMML_b:
1101 1111 0110 0---
Texture MipMap Level.
TXA
1101 1111 0100 0---
TXD
- TXD:
1101 1110 00-- ----
- TXD_b:
1101 1110 01-- ----
Texture Fetch With Derivatives.
TXQ
- TXQ:
1101 1111 0100 1---
- TXQ_b:
1101 1111 0101 0---
Texture Query.
VABSDIFF
0101 0100 ---- ----
VABSDIFF4
0101 0000 0--- ----
VADD
0010 00-- ---- ----
VMAD
0101 1111 ---- ----
VMNMX
0011 101- ---- ----
VOTE
- VOTE:
0101 0000 1101 1---
- VOTE_vtg:
0101 0000 1110 0---
Vote Across SIMD Thread Group.
VSET
0100 000- ---- ----
VSETP
0101 0000 1111 0---
VSHL
0101 0111 ---- ----
VSHR
0101 0110 ---- ----
XMAD
- XMAD_reg:
0101 1011 00-- ----
- XMAD_rc:
0101 0001 0--- ----
- XMAD_cr:
0100 111- ---- ----
- XMAD_imm:
0011 011- 00-- ----
Integer Short Multiply Add.