assembly source #1
; The block_size=4 special case, assuming reads past the end get garbage but don't fault.
; Started writing this before realizing that was just an example.
rwhash4:
    ; input:  uint8_t *RSI, unsigned length ECX
    ; output: store into 4x uint16_t pointed to by RDI
    pxor      mm0, mm0              ; accumulator
    pxor      mm6, mm6              ; constant zero, used to unpack bytes to words
.loop:
    sub       ecx, 4                ; count down from the end
    jb        .tail                 ; craptastic while(){} loop structure
    movd      mm1, [rsi+rcx]        ; load 4 bytes
    punpcklbw mm1, mm6              ; zero-extend to 4 words (emulate pmovzxbw).
                                    ; The data has to be the destination operand: unpacking
                                    ; a zeroed register with memory puts each byte in the
                                    ; high half of its word instead.
    paddb     mm0, mm1              ; avoid polluting the high byte of each element
    jmp       .loop
.tail:
    ; TODO: last 0..3 bytes, maybe 4-byte load and shift to zero them?
    ; possibly bzhi?

    ; final processing on accumulator mm0
    pcmpeqw   mm7, mm7              ; all-ones (comparing a register with itself)
    psrlw     mm7, 16-5             ; set1_epi16(0x1f) mask
    pshufw    mm1, mm0, 0b00111001  ; rotate right by 1 word element
    pshufw    mm2, mm0, 0b01001110  ; rotate by 2 word elements
    pand      mm1, mm7
    paddb     mm0, mm1
    psrlw     mm7, 4                ; mask is now set1_epi16(0x1)
    pand      mm2, mm7
    paddb     mm0, mm2
    ; each 16-bit element is one of the 4 "block" elements.
    ; TODO: multiplicative inverse with pmulhw to implement % 36
    ; https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi
    movq      [rdi], mm0            ; array of 4 integer values, not their ASCII characters.

    ; pshufw mm1, mm0, 0b00000001   ; 2-byte rotate of low 4.
    ; movd eax, mm0                 ; or try scalar
    ; mov ecx, eax
    ; ror eax, 8
    ; and eax, 0x1f1f1f1f
    ; add ecx, eax                  ; nope, and SWAR add requires a lot of masking. https://www.chessprogramming.org/SIMD_and_SWAR_Techniques
    ; movd back to mm1?
    ; some versions of ARM and/or MIPS IIRC have SIMD byte-element adds within general-purpose registers; would be great for this.
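For checking the vector code, a scalar model of what the comments in the listing describe may help: four byte-wide lane sums, then each lane gets the next lane masked with 0x1f and the lane after that masked with 0x1, all added with byte wraparound. This is only a sketch under assumptions. The name `rwhash4_ref` and its argument order are hypothetical, the leading `len % 4` bytes are skipped because the listing leaves the tail as a TODO, and the `% 36` step is omitted for the same reason.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the intended rwhash4 computation.
 * Not taken from the original listing; it mirrors the listing's comments. */
void rwhash4_ref(uint16_t out[4], const uint8_t *src, size_t len)
{
    uint8_t sum[4] = {0, 0, 0, 0};

    /* The asm walks 4-byte groups backward from the end, so groups are
     * aligned to the end of the buffer. The first len % 4 bytes are the
     * tail the listing does not handle yet; they are skipped here too. */
    for (size_t i = len % 4; i < len; i += 4) {
        for (int j = 0; j < 4; j++) {
            sum[j] = (uint8_t)(sum[j] + src[i + j]);   /* paddb: wraps mod 256 */
        }
    }

    /* Final mixing: lane j += (lane j+1 & 0x1f) + (lane j+2 & 0x1),
     * both taken from the sums before mixing, with byte wraparound. */
    for (int j = 0; j < 4; j++) {
        unsigned v = sum[j]
                   + (sum[(j + 1) % 4] & 0x1fu)
                   + (sum[(j + 2) % 4] & 0x01u);
        out[j] = (uint16_t)(v & 0xffu);
    }
}
```

Feeding the same buffer to this function and to the assembled routine, with a length that is a multiple of 4, gives a quick way to compare the four stored words.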