/* ========================================================================

   meow_intrinsics.h
   (C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)

   See https://mollyrocket.com/meowhash for details.

   This is the default way to define all of the types and operations that
   meow_hash.h needs. However, if you've got your _own_ equivalent type
   definitions and intrinsics, you can _omit_ this header file and just
   #define/typedef all the Meow ops to map to your own ops, keeping things
   nice and uniform in your codebase.

   ======================================================================== */

#if !defined(MEOW_HASH_INTRINSICS_H)

//
// NOTE(casey): Try to guess the source file for compiler intrinsics
//
#if _MSC_VER

#if _M_AMD64 || _M_IX86
#include <intrin.h>
#elif _M_ARM64
#include <arm64_neon.h>
#endif

#else

#if __x86_64__ || __i386__
#include <x86intrin.h>
#elif __aarch64__
#include <arm_neon.h>
#endif

#endif

//
// NOTE(casey): Set #define's to their defaults
//
#if !defined(MEOW_HASH_INTEL) || !defined(MEOW_HASH_ARMV8)
#if __x86_64__ || __i386__ || _M_AMD64 || _M_IX86
#define MEOW_HASH_INTEL 1
#elif __aarch64__ || _M_ARM64
#define MEOW_HASH_ARMV8 1
#else
#error Cannot determine architecture to use!
#endif
#endif

#if !defined(MEOW_HASH_AVX512)
#define MEOW_HASH_AVX512 0
#endif

//
// NOTE(casey): Define basic types
//
#define meow_u8 char unsigned
#define meow_u16 short unsigned
#define meow_u32 int unsigned
#define meow_u64 long long unsigned

//
// NOTE(casey): Operations for x64 processors
//
#if MEOW_HASH_INTEL

#define meow_u128 __m128i
#define meow_aes_128 __m128i
#define meow_u256 __m256i
#define meow_aes_256 __m256i
#define meow_u512 __m512i
#define meow_aes_512 __m512i

#define Meow128_AreEqual(A, B) (_mm_movemask_epi8(_mm_cmpeq_epi8((A), (B))) == 0xFFFF)
#define Meow128_AESDEC(Prior, Xor) _mm_aesdec_si128((Prior), (Xor))
#define Meow128_AESDEC_Mem(Prior, Xor) _mm_aesdec_si128((Prior), _mm_loadu_si128((meow_u128 *)(Xor)))
#define Meow128_AESDEC_Finalize(A) (A)
#define Meow128_Set64x2(Low64, High64) _mm_set_epi64x((High64), (Low64))
#define Meow128_Set64x2_State(Low64, High64) Meow128_Set64x2(Low64, High64)

// TODO(casey): Not sure if this should actually be Meow128_Zero(A) ((A) = _mm_setzero_si128()), maybe
#define Meow128_Zero() _mm_setzero_si128()
#define Meow128_ZeroState() Meow128_Zero()

#define Meow256_AESDEC(Prior, XOr) _mm256_aesdec_epi128((Prior), (XOr))
#define Meow256_AESDEC_Mem(Prior, XOr) _mm256_aesdec_epi128((Prior), *(meow_u256 *)(XOr))
#define Meow256_Store(Value, Ptr) _mm256_store_si256((meow_u256 *)(Ptr), (Value));
#define Meow256_Zero() _mm256_setzero_si256()

#define Meow512_AESDEC(Prior, XOr) _mm512_aesdec_epi128((Prior), (XOr))
#define Meow512_AESDEC_Mem(Prior, XOr) _mm512_aesdec_epi128((Prior), *(meow_u512 *)(XOr))
#define Meow512_Store(Value, Ptr) _mm512_store_si512((meow_u512 *)(Ptr), (Value));
#define Meow512_Zero() _mm512_setzero_si512()

//
// NOTE(casey): Operations for ARM processors
//
#elif MEOW_HASH_ARMV8

#define meow_u128 uint8x16_t

// NOTE(mmozeiko): AES opcodes on ARMv8 work a bit differently than on Intel
// On Intel the "x = AESDEC(x, m)" does following:
//   x = InvMixColumns(SubBytes(ShiftRows(x))) ^ m
// But on ARMv8 the "x = AESDEC(x, m)" does following:
//   x = SubBytes(ShiftRows(x ^ m))
// Thus on ARMv8 it requires extra InvMixColumns call and delay on Xor operation.
// On iteration N it needs to use m[N-1] as input, and remember m[N] for next iteration.
// This structure will store memory operand in member B which will be used in
// next AESDEC opcode. Remember to do one more XOR(A,B) when finishing AES
// operations in a loop.
typedef struct {
    meow_u128 A;
    meow_u128 B;
} meow_aes_128;

static int
Meow128_AreEqual(meow_u128 A, meow_u128 B)
{
    uint8x16_t Powers = {
        1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128,
    };

    uint8x16_t Input = vceqq_u8(A, B);
    uint64x2_t Mask = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vandq_u8(Input, Powers))));

    meow_u16 Output;
    vst1q_lane_u8((meow_u8*)&Output + 0, vreinterpretq_u8_u64(Mask), 0);
    vst1q_lane_u8((meow_u8*)&Output + 1, vreinterpretq_u8_u64(Mask), 8);
    return Output == 0xFFFF;
}

static meow_aes_128
Meow128_AESDEC(meow_aes_128 Prior, meow_u128 Xor)
{
    meow_aes_128 R;
    R.A = vaesimcq_u8(vaesdq_u8(Prior.A, Prior.B));
    R.B = Xor;
    return(R);
}

static meow_aes_128
Meow128_AESDEC_Mem(meow_aes_128 Prior, void *Xor)
{
    meow_aes_128 R;
    R.A = vaesimcq_u8(vaesdq_u8(Prior.A, Prior.B));
    R.B = vld1q_u8((meow_u8*)Xor);
    return(R);
}

static meow_u128
Meow128_AESDEC_Finalize(meow_aes_128 Value)
{
    meow_u128 R = veorq_u8(Value.A, Value.B);
    return(R);
}

static meow_u128
Meow128_Zero()
{
    meow_u128 R = vdupq_n_u8(0);
    return(R);
}

static meow_aes_128
Meow128_ZeroState()
{
    meow_aes_128 R;
    R.A = R.B = vdupq_n_u8(0);
    return(R);
}

static meow_u128
Meow128_Set64x2(meow_u64 Low64, meow_u64 High64)
{
    meow_u128 R = vreinterpretq_u8_u64(vcombine_u64(vcreate_u64(Low64), vcreate_u64(High64)));
    return(R);
}

static meow_aes_128
Meow128_Set64x2_State(meow_u64 Low64, meow_u64 High64)
{
    meow_aes_128 R;
    R.A = Meow128_Set64x2(Low64, High64);
    R.B = Meow128_Zero();
    return(R);
}

#endif

#if MEOW_HASH_IACA
// NOTE(casey): Define this if you'd like to analyze Meow hash with IACA
#include <iacaMarks.h>
#define MEOW_ANALYSIS_START IACA_VC64_START
#define MEOW_ANALYSIS_END IACA_VC64_END
#else
#define MEOW_ANALYSIS_START
#define MEOW_ANALYSIS_END
#endif

struct meow_hash_state;
typedef meow_u128 meow_hash_implementation(meow_u64 Seed, meow_u64 Len, void *Source);
typedef void meow_absorb_implementation(struct meow_hash_state *State, meow_u64 Len, void *Source);

#define MEOW_HASH_INTRINSICS_H
#endif

/* ========================================================================

   Meow - A Fast Non-cryptographic Hash for Large Data Sizes
   (C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)

   See https://mollyrocket.com/meowhash for details.

   ========================================================================

   zlib License

   (C) Copyright 2018 Molly Rocket, Inc.

   This software is provided 'as-is', without any express or implied
   warranty. In no event will the authors be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:

   1. The origin of this software must not be misrepresented; you must not
      claim that you wrote the original software. If you use this software
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.
   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original software.
   3. This notice may not be removed or altered from any source distribution.

   ========================================================================

   FAQ

   Q: What is it?
   A: Meow is a 128-bit non-cryptographic hash that operates at high speeds
      on x64 and ARM processors that provide AES instructions. It is
      designed to be truncatable to 64 and 32-bit hash values and still
      retain good collision resistance.

   Q: What is it GOOD for?
   A: Quickly hashing large amounts of data for comparison purposes such as
      block deduplication or change detection.

   Q: What is it BAD for?
   A: Anything security-related. It should be assumed that it provides
      no protection from adversaries whatsoever.

   Q: Why is it called the "Meow hash"?
   A: It is named after a character in Meow the Infinite
      (https://meowtheinfinite.com)

   Q: Who wrote it?
   A: CASEY MURATORI (https://caseymuratori.com) wrote the original
      implementation for use in processing large-footprint assets for
      the game 1935 (https://molly1935.com).

      After the initial version, the hash was refined via collaboration
      with several great programmers who contributed suggestions and
      modifications:

      WON CHUN (https://twitter.com/won3d) provided the much-improved
      end-of-buffer handling, which significantly improved Meow's
      performance on small unaligned hashes.

      MARTINS MOZEIKO (https://matrins.ninja) ported Meow to ARM and
      added the proper preprocessor dressing for clean compilation on a
      variety of compiler configurations.

      FABIAN GIESEN (https://fgiesen.wordpress.com) provided support
      for getting the benchmarking working properly across a number of
      platforms.

      ARAS PRANCKEVICIUS (https://aras-p.info) provided the allocation
      shim for compilation on Mac OS X.

   ========================================================================

   USAGE

   For a complete working example, see meow_example.cpp.

   In order to use the Meow hash, you must have x64 intrinsics defined.
   This requires including platform- and compiler-specific header files.
   If you know how to do this yourself, you are welcome to define the
   types that Meow needs.
   However, if you just want Meow to define its own stuff using its best
   guesses, you can include meow_intrinsics.h:

   #include "meow_intrinsics.h"

   Once you have intrinsics defined, either by meow_intrinsics.h or by
   defining all the necessary types and intrinsics yourself, you can
   include meow_hash.h and call a MeowHash implementation:

   #include "meow_hash.h"

   // Always available
   meow_u128 MeowHash1(meow_u64 Seed, meow_u64 Len, void *Source);

   // Available only when compiling with AVX-512 extensions
   meow_u128 MeowHash2(meow_u64 Seed, meow_u64 Len, void *Source);
   meow_u128 MeowHash4(meow_u64 Seed, meow_u64 Len, void *Source);

   MeowHash1 is 128-bit wide AES-NI. MeowHash2 is 256-bit wide VAES.
   MeowHash4 is 512-bit wide VAES. As of the initial publication of
   Meow hash, no consumer CPUs exist which support VAES, so the latter
   two are for future use and internal x64 vendor testing.

   Calling MeowHash* with a seed, length, and source pointer computes the
   hash and returns a 128-bit value which contains the full 128-bit hash.
   You can use this value directly using your compiler's intrinsics, or
   you can use some helper functions Meow defines:

   // NOTE(casey): Check if two Meow hashes are the same
   // (returns zero if they aren't, non-zero if they are)
   int MeowHashesAreEqual(meow_u128 A, meow_u128 B);

   // NOTE(casey): Truncate a Meow hash to 64 bits
   meow_u64 MeowU64From(meow_u128 Hash);

   // NOTE(casey): Truncate a Meow hash to 32 bits
   meow_u32 MeowU32From(meow_u128 Hash);

   Since no currently available CPUs can run MeowHash2 or MeowHash4,
   it is not recommended that you include them in your code, because
   they literally _cannot_ be tested. Once CPUs are available that
   can run them, you can include them and use a probing function
   to see if they can be used at startup, as shown in meow_example.cpp.

   **** VERY IMPORTANT X64 COMPILATION NOTES ****

   On x64, Meow uses the AESDEC instruction, which comes in two flavors:
   SSE (aesdec) and AVX (vaesdec).
   If you are compiling _with_ AVX support, your compiler will probably
   emit the AVX variant, which means your code WILL NOT RUN on computers
   that do not have AVX. If you need to deploy this hash on computers
   that do not have AVX, you must take care to TURN OFF support for AVX
   in your compiler for the file that includes the Meow hash!

   **** IF YOU DON'T KNOW WHAT AES-NI, VAES, VEX, ETC. ARE... ****

   ... then you probably shouldn't be using Meow hash for another few
   years. Right now, there are only certain CPUs that support AES
   instructions, both on x64 and ARM. If you don't know what you're
   doing, you may accidentally ship code that doesn't run on some of
   the CPUs on your target platforms.

   Meow hash is designed more for use on the production side, and less
   on the distributed-to-clients side, so an abundance of caution is
   advised for people who don't really know about CPU architectures
   but were tempted to deploy Meow hash to end users without fully
   understanding the consequences.

   ======================================================================== */

//
// NOTE(casey): This version is EXPERIMENTAL. The Meow hash is still
// undergoing testing and finalization.
//
// **** EXPECT HASHES/APIs TO CHANGE UNTIL THE VERSION NUMBER HITS 1.0. ****
//
// You have been warned.
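//
// NOTE: The helper below is a hypothetical sketch and is not part of the
// original Meow hash sources (it is also not the probe from meow_example.cpp).
// The notes above warn that shipping Meow to a CPU without AES instructions
// will fail at run time, so one possible startup check on x86 is to read
// CPUID leaf 1 and test ECX bit 25 (the AES-NI feature flag). It assumes GCC
// or Clang, whose <cpuid.h> provides __get_cpuid(); the name
// MeowExample_HasAESNI is made up for illustration. On any other compiler or
// architecture it returns 0, which should be read as "unknown", not "absent".
//

```c
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define MEOW_EXAMPLE_CAN_PROBE 1
#else
#define MEOW_EXAMPLE_CAN_PROBE 0
#endif

// Hypothetical helper: returns 1 if CPUID reports AES-NI, 0 otherwise
// (including when the probe is not available on this compiler/target).
static int
MeowExample_HasAESNI(void)
{
    int Result = 0;

#if MEOW_EXAMPLE_CAN_PROBE
    unsigned int EAX = 0, EBX = 0, ECX = 0, EDX = 0;

    // __get_cpuid returns 0 when the requested leaf is not supported
    if(__get_cpuid(1, &EAX, &EBX, &ECX, &EDX))
    {
        // CPUID.01H:ECX bit 25 is the AES-NI feature flag
        Result = (int)((ECX >> 25) & 1u);
    }
#endif

    return(Result);
}
```

// A caller could consult MeowExample_HasAESNI() once at startup and fall back
// to a different hash when it returns 0, rather than calling MeowHash1 blindly.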
//

#define MEOW_HASH_VERSION 3
#define MEOW_HASH_VERSION_NAME "0.3/snowshoe"

#define MEOW_HASH_BLOCK_SIZE_SHIFT 8

//
// NOTE(casey): Smaller hash extraction
//

static meow_u64
MeowU64From(meow_u128 Hash)
{
    // TODO(casey): It is probably worth it to use the cvt intrinsics here
    // TODO(mmozeiko): use vgetq_lane_u64 on ARMv8
    meow_u64 Result = *(meow_u64 *)&Hash;
    return(Result);
}

static meow_u32
MeowU32From(meow_u128 Hash)
{
    // TODO(casey): It is probably worth it to use the cvt intrinsics here
    // TODO(mmozeiko): use vgetq_lane_u32 on ARMv8
    meow_u32 Result = *(meow_u32 *)&Hash;
    return(Result);
}

//
// NOTE(casey): "Fast" comparison (using SSE or NEON)
//

static int
MeowHashesAreEqual(meow_u128 A, meow_u128 B)
{
    int Result = Meow128_AreEqual(A, B);
    return(Result);
}

//
// NOTE(casey): 128-wide AES-NI Meow (maximum of 16 bytes/clock single threaded)
//

static meow_u128
MeowHash1(meow_u64 Seed, meow_u64 TotalLengthInBytes, void *SourceInit)
{
    //
    // NOTE(casey): Initialize all 16 streams to 0
    //

    meow_aes_128 S0 = Meow128_ZeroState();
    meow_aes_128 S1 = Meow128_ZeroState();
    meow_aes_128 S2 = Meow128_ZeroState();
    meow_aes_128 S3 = Meow128_ZeroState();
    meow_aes_128 S4 = Meow128_ZeroState();
    meow_aes_128 S5 = Meow128_ZeroState();
    meow_aes_128 S6 = Meow128_ZeroState();
    meow_aes_128 S7 = Meow128_ZeroState();
    meow_aes_128 S8 = Meow128_ZeroState();
    meow_aes_128 S9 = Meow128_ZeroState();
    meow_aes_128 SA = Meow128_ZeroState();
    meow_aes_128 SB = Meow128_ZeroState();
    meow_aes_128 SC = Meow128_ZeroState();
    meow_aes_128 SD = Meow128_ZeroState();
    meow_aes_128 SE = Meow128_ZeroState();
    meow_aes_128 SF = Meow128_ZeroState();

    //
    // NOTE(casey): Handle as many full 256-byte blocks as possible (16 cycles per block)
    //

    meow_u8 *Source = (meow_u8 *)SourceInit;
    meow_u64 Len = TotalLengthInBytes;
    meow_u64 BlockCount = (Len >> MEOW_HASH_BLOCK_SIZE_SHIFT);
    Len -= (BlockCount << MEOW_HASH_BLOCK_SIZE_SHIFT);
    while(BlockCount--)
    {
        S0 = Meow128_AESDEC_Mem(S0, Source);
        S1 = Meow128_AESDEC_Mem(S1, Source + 16);
        S2 = Meow128_AESDEC_Mem(S2, Source + 32);
        S3 = Meow128_AESDEC_Mem(S3, Source + 48);
        S4 = Meow128_AESDEC_Mem(S4, Source + 64);
        S5 = Meow128_AESDEC_Mem(S5, Source + 80);
        S6 = Meow128_AESDEC_Mem(S6, Source + 96);
        S7 = Meow128_AESDEC_Mem(S7, Source + 112);
        S8 = Meow128_AESDEC_Mem(S8, Source + 128);
        S9 = Meow128_AESDEC_Mem(S9, Source + 144);
        SA = Meow128_AESDEC_Mem(SA, Source + 160);
        SB = Meow128_AESDEC_Mem(SB, Source + 176);
        SC = Meow128_AESDEC_Mem(SC, Source + 192);
        SD = Meow128_AESDEC_Mem(SD, Source + 208);
        SE = Meow128_AESDEC_Mem(SE, Source + 224);
        SF = Meow128_AESDEC_Mem(SF, Source + 240);

        Source += (1 << MEOW_HASH_BLOCK_SIZE_SHIFT);
    }

    //
    // NOTE(casey): Handle as many full 128-bit lanes as possible (15 cycles at length 15)
    //

    switch(Len >> 4)
    {
        case 15: SE = Meow128_AESDEC_Mem(SE, Source + 224);
        case 14: SD = Meow128_AESDEC_Mem(SD, Source + 208);
        case 13: SC = Meow128_AESDEC_Mem(SC, Source + 192);
        case 12: SB = Meow128_AESDEC_Mem(SB, Source + 176);
        case 11: SA = Meow128_AESDEC_Mem(SA, Source + 160);
        case 10: S9 = Meow128_AESDEC_Mem(S9, Source + 144);
        case 9: S8 = Meow128_AESDEC_Mem(S8, Source + 128);
        case 8: S7 = Meow128_AESDEC_Mem(S7, Source + 112);
        case 7: S6 = Meow128_AESDEC_Mem(S6, Source + 96);
        case 6: S5 = Meow128_AESDEC_Mem(S5, Source + 80);
        case 5: S4 = Meow128_AESDEC_Mem(S4, Source + 64);
        case 4: S3 = Meow128_AESDEC_Mem(S3, Source + 48);
        case 3: S2 = Meow128_AESDEC_Mem(S2, Source + 32);
        case 2: S1 = Meow128_AESDEC_Mem(S1, Source + 16);
        case 1: S0 = Meow128_AESDEC_Mem(S0, Source);
        default:;
    }
    Source += (Len & 0xF0);

    //
    // NOTE(casey): Start as much of the mixdown as we can before handling the overhang
    //

    // TODO(casey): There needs to be a solid idea behind the mixing vector here.
    // Before Meow v1, we need some definitive analysis of what it should be.
meow_u128 Mixer = Meow128_Set64x2(Seed - TotalLengthInBytes, Seed + TotalLengthInBytes + 1); S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S8)); S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S9)); S2 = Meow128_AESDEC(S2, Meow128_AESDEC_Finalize(SA)); S3 = Meow128_AESDEC(S3, Meow128_AESDEC_Finalize(SB)); S4 = Meow128_AESDEC(S4, Meow128_AESDEC_Finalize(SC)); S5 = Meow128_AESDEC(S5, Meow128_AESDEC_Finalize(SD)); S6 = Meow128_AESDEC(S6, Meow128_AESDEC_Finalize(SE)); S0 = Meow128_AESDEC(S0, Mixer); S1 = Meow128_AESDEC(S1, Mixer); S2 = Meow128_AESDEC(S2, Mixer); S3 = Meow128_AESDEC(S3, Mixer); S4 = Meow128_AESDEC(S4, Mixer); S5 = Meow128_AESDEC(S5, Mixer); S6 = Meow128_AESDEC(S6, Mixer); // // NOTE(casey): Deal with individual bytes // if(Len & 0xF) { // NOTE(casey): Scalar partial load construction appears courtesy of Won "Hash Daddy" Chun. // It allows the partial bytes to be handled by the scalar pipe "in the shadow" of the // vector pipe. meow_u64 Has8 = (Len & 8); meow_u64 Has4 = (Len & 4); meow_u64 Lo = 0; meow_u64 Hi = 0; if(Has8) { Lo = *(meow_u64 *)Source; } if(Has4) { Hi = *(meow_u32 *)(Source + Has8); } switch (Len & 3) { case 3: Hi |= (meow_u64)(*(Source + Has8 + Has4 + 2)) << 48; case 2: Hi |= (meow_u64)(*(Source + Has8 + Has4 + 1)) << 40; case 1: Hi |= (meow_u64)(*(Source + Has8 + Has4)) << 32; case 0:; } meow_aes_128 PartialState = Meow128_Set64x2_State(Hi, Lo); SF = Meow128_AESDEC(PartialState, Meow128_AESDEC_Finalize(SF)); } // // NOTE(casey): Finish the part of the mixdown that is dependent on SF // and then do the tree reduction (starting the tree reduction early // doesn't seem to save anything) // S7 = Meow128_AESDEC(S7, Meow128_AESDEC_Finalize(SF)); S7 = Meow128_AESDEC(S7, Mixer); S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S4)); S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S5)); S2 = Meow128_AESDEC(S2, Meow128_AESDEC_Finalize(S6)); S3 = Meow128_AESDEC(S3, Meow128_AESDEC_Finalize(S7)); S0 = Meow128_AESDEC(S0, Mixer); S1 = 
         Meow128_AESDEC(S1, Mixer);
    S2 = Meow128_AESDEC(S2, Mixer);
    S3 = Meow128_AESDEC(S3, Mixer);

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S2));
    S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S3));

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S1));
    S0 = Meow128_AESDEC(S0, Mixer);

    meow_u128 Result = Meow128_AESDEC_Finalize(S0);
    return(Result);
}

#if MEOW_HASH_AVX512
//
// NOTE(casey): 256-wide VAES Meow (maximum of 32 bytes/clock single threaded)
//
static meow_u128
MeowHash2(meow_u64 Seed, meow_u64 TotalLengthInBytes, void *SourceInit)
{
    meow_u256 S01 = Meow256_Zero();
    meow_u256 S23 = Meow256_Zero();
    meow_u256 S45 = Meow256_Zero();
    meow_u256 S67 = Meow256_Zero();
    meow_u256 S89 = Meow256_Zero();
    meow_u256 SAB = Meow256_Zero();
    meow_u256 SCD = Meow256_Zero();
    meow_u256 SEF = Meow256_Zero();

    //
    // NOTE(casey): Handle as many full 256-byte blocks as possible (4 cycles per block)
    //
    meow_u8 *Source = (meow_u8 *)SourceInit;
    meow_u64 Len = TotalLengthInBytes;
    meow_u64 BlockCount = (Len >> MEOW_HASH_BLOCK_SIZE_SHIFT);
    Len -= (BlockCount << MEOW_HASH_BLOCK_SIZE_SHIFT);
    while(BlockCount--)
    {
        S01 = Meow256_AESDEC_Mem(Source, S01);
        S23 = Meow256_AESDEC_Mem(Source + 32, S23);
        S45 = Meow256_AESDEC_Mem(Source + 64, S45);
        S67 = Meow256_AESDEC_Mem(Source + 96, S67);
        S89 = Meow256_AESDEC_Mem(Source + 128, S89);
        SAB = Meow256_AESDEC_Mem(Source + 160, SAB);
        SCD = Meow256_AESDEC_Mem(Source + 192, SCD);
        SEF = Meow256_AESDEC_Mem(Source + 224, SEF);

        Source += (1 << MEOW_HASH_BLOCK_SIZE_SHIFT);
    }

    //
    // NOTE(casey): Handle as many full 32-byte blocks as possible
    //
    switch(Len >> 5)
    {
        case 7: SCD = Meow256_AESDEC_Mem(Source + 192, SCD);
        case 6: SAB = Meow256_AESDEC_Mem(Source + 160, SAB);
        case 5: S89 = Meow256_AESDEC_Mem(Source + 128, S89);
        case 4: S67 = Meow256_AESDEC_Mem(Source + 96, S67);
        case 3: S45 = Meow256_AESDEC_Mem(Source + 64, S45);
        case 2: S23 = Meow256_AESDEC_Mem(Source + 32, S23);
        case 1: S01 = Meow256_AESDEC_Mem(Source, S01);
        default:;
    }

    meow_u256 Partial =
        Meow256_PartialLoad(Source + (Len & 0xE0), Len & 0x1F);

    // TODO(casey): To make the hashes equivalent, we would need to shuffle up the
    // highest 128 here to be inline with the high 128 of the 256 Partial, because
    // that's how the 128-bit works...
    SEF = Meow256_AESDEC_Mem(Partial, SEF);

    S01 = Meow256_AESDEC(S01, S89);
    S23 = Meow256_AESDEC(S23, SAB);
    S45 = Meow256_AESDEC(S45, SCD);
    S67 = Meow256_AESDEC(S67, SEF);

    S01 = Meow256_AESDEC(S01, Mixer4);
    S23 = Meow256_AESDEC(S23, Mixer4);
    S45 = Meow256_AESDEC(S45, Mixer4);
    S67 = Meow256_AESDEC(S67, Mixer4);

    S01 = Meow256_AESDEC(S01, S45);
    S23 = Meow256_AESDEC(S23, S67);

    S01 = Meow256_AESDEC(S01, S23);

    meow_u128 S0 = Meow128FromLow(S01);
    meow_u128 S1 = Meow128FromHIgh(S01);

    S0 = Meow128_AESDEC(S0, S1);
    S0 = Meow128_AESDEC(S0, Mixer1);

    return(S0);
}

//
// NOTE(casey): 512-wide VAES Meow (maximum of 64 bytes/clock single threaded)
//
static meow_u128
MeowHash4(meow_u64 Seed, meow_u64 TotalLengthInBytes, void *SourceInit)
{
    //
    // NOTE(casey): Initialize all 16 streams to 0
    //
    meow_u512 S0123 = Meow512_Zero();
    meow_u512 S4567 = Meow512_Zero();
    meow_u512 S89AB = Meow512_Zero();
    meow_u512 SCDEF = Meow512_Zero();

    //
    // NOTE(casey): Handle as many full 256-byte blocks as possible (4 cycles per block)
    //
    meow_u8 *Source = (meow_u8 *)SourceInit;
    meow_u64 Len = TotalLengthInBytes;
    meow_u64 BlockCount = (Len >> MEOW_HASH_BLOCK_SIZE_SHIFT);
    Len -= (BlockCount << MEOW_HASH_BLOCK_SIZE_SHIFT);
    while(BlockCount--)
    {
        S0123 = Meow512_AESDEC_Mem(Source, S0123);
        S4567 = Meow512_AESDEC_Mem(Source + 64, S4567);
        S89AB = Meow512_AESDEC_Mem(Source + 128, S89AB);
        SCDEF = Meow512_AESDEC_Mem(Source + 192, SCDEF);

        Source += (1 << MEOW_HASH_BLOCK_SIZE_SHIFT);
    }

    //
    // NOTE(casey): Handle as many full 64-byte blocks as possible
    //
    switch(Len >> 6)
    {
        case 3: S89AB = Meow512_AESDEC_Mem(Source + 128, S89AB);
        case 2: S4567 = Meow512_AESDEC_Mem(Source + 64, S4567);
        case 1: S0123 = Meow512_AESDEC_Mem(Source, S0123);
        default:;
    }

    meow_u512 Partial
        = Meow512_PartialLoad(Source + (Len & 0xC0), Len & 0x3F);

    // TODO(casey): To make the hashes equivalent, we would need to shuffle up the
    // highest 128 here to be inline with the high 128 of the 512 Partial, because
    // that's how the 128-bit works...
    SCDEF = Meow512_AESDEC(Partial, SCDEF);

    S0123 = Meow512_AESDEC(S0123, S89AB);
    S4567 = Meow512_AESDEC(S4567, SCDEF);

    S0123 = Meow512_AESDEC(S0123, Mixer4);
    S4567 = Meow512_AESDEC(S4567, Mixer4);

    S0123 = Meow512_AESDEC(S0123, S4567);

    meow_u256 S01 = Meow256FromLow(S0123);
    meow_u256 S23 = Meow256FromHigh(S0123);

    S01 = Meow256_AESDEC(S01, S23);

    meow_u128 S0 = Meow128FromLow(S01);
    meow_u128 S1 = Meow128FromHIgh(S01);

    S0 = Meow128_AESDEC(S0, S1);
    S0 = Meow128_AESDEC(S0, Mixer1);

    return(S0);
}
#endif

//
// NOTE(casey): Streaming construction (optional)
//
typedef struct meow_hash_state
{
    union
    {
        struct
        {
            meow_aes_128 S0;
            meow_aes_128 S1;
            meow_aes_128 S2;
            meow_aes_128 S3;
            meow_aes_128 S4;
            meow_aes_128 S5;
            meow_aes_128 S6;
            meow_aes_128 S7;
            meow_aes_128 S8;
            meow_aes_128 S9;
            meow_aes_128 SA;
            meow_aes_128 SB;
            meow_aes_128 SC;
            meow_aes_128 SD;
            meow_aes_128 SE;
            meow_aes_128 SF;
        };

#if MEOW_HASH_AVX512
        struct
        {
            meow_aes_256 S01;
            meow_aes_256 S23;
            meow_aes_256 S45;
            meow_aes_256 S67;
            meow_aes_256 S89;
            meow_aes_256 SAB;
            meow_aes_256 SCD;
            meow_aes_256 SEF;
        };

        struct
        {
            meow_aes_512 S0123;
            meow_aes_512 S4567;
            meow_aes_512 S89AB;
            meow_aes_512 SCDEF;
        };
#endif
    };

    meow_u64 TotalLengthInBytes;

    meow_u8 Buffer[1 << MEOW_HASH_BLOCK_SIZE_SHIFT];
    int unsigned BufferLen;
} meow_hash_state;

static void
MeowHashBegin(meow_hash_state *State)
{
    //
    // NOTE(casey): Initialize all 16 streams to 0
    //
    State->S0 = Meow128_ZeroState();
    State->S1 = Meow128_ZeroState();
    State->S2 = Meow128_ZeroState();
    State->S3 = Meow128_ZeroState();
    State->S4 = Meow128_ZeroState();
    State->S5 = Meow128_ZeroState();
    State->S6 = Meow128_ZeroState();
    State->S7 = Meow128_ZeroState();
    State->S8 = Meow128_ZeroState();
    State->S9 = Meow128_ZeroState();
    State->SA = Meow128_ZeroState();
    State->SB = Meow128_ZeroState();
    State->SC = Meow128_ZeroState();
    State->SD = Meow128_ZeroState();
    State->SE = Meow128_ZeroState();
    State->SF = Meow128_ZeroState();

    State->TotalLengthInBytes = 0;
    State->BufferLen = 0;
}

static void
MeowHashAbsorbBlocks1(meow_hash_state *State, meow_u64 BlockCount, meow_u8 *Source)
{
    meow_aes_128 S0 = State->S0;
    meow_aes_128 S1 = State->S1;
    meow_aes_128 S2 = State->S2;
    meow_aes_128 S3 = State->S3;
    meow_aes_128 S4 = State->S4;
    meow_aes_128 S5 = State->S5;
    meow_aes_128 S6 = State->S6;
    meow_aes_128 S7 = State->S7;
    meow_aes_128 S8 = State->S8;
    meow_aes_128 S9 = State->S9;
    meow_aes_128 SA = State->SA;
    meow_aes_128 SB = State->SB;
    meow_aes_128 SC = State->SC;
    meow_aes_128 SD = State->SD;
    meow_aes_128 SE = State->SE;
    meow_aes_128 SF = State->SF;

    while(BlockCount--)
    {
        S0 = Meow128_AESDEC_Mem(S0, Source);
        S1 = Meow128_AESDEC_Mem(S1, Source + 16);
        S2 = Meow128_AESDEC_Mem(S2, Source + 32);
        S3 = Meow128_AESDEC_Mem(S3, Source + 48);
        S4 = Meow128_AESDEC_Mem(S4, Source + 64);
        S5 = Meow128_AESDEC_Mem(S5, Source + 80);
        S6 = Meow128_AESDEC_Mem(S6, Source + 96);
        S7 = Meow128_AESDEC_Mem(S7, Source + 112);
        S8 = Meow128_AESDEC_Mem(S8, Source + 128);
        S9 = Meow128_AESDEC_Mem(S9, Source + 144);
        SA = Meow128_AESDEC_Mem(SA, Source + 160);
        SB = Meow128_AESDEC_Mem(SB, Source + 176);
        SC = Meow128_AESDEC_Mem(SC, Source + 192);
        SD = Meow128_AESDEC_Mem(SD, Source + 208);
        SE = Meow128_AESDEC_Mem(SE, Source + 224);
        SF = Meow128_AESDEC_Mem(SF, Source + 240);

        Source += (1 << MEOW_HASH_BLOCK_SIZE_SHIFT);
    }

    State->S0 = S0;
    State->S1 = S1;
    State->S2 = S2;
    State->S3 = S3;
    State->S4 = S4;
    State->S5 = S5;
    State->S6 = S6;
    State->S7 = S7;
    State->S8 = S8;
    State->S9 = S9;
    State->SA = SA;
    State->SB = SB;
    State->SC = SC;
    State->SD = SD;
    State->SE = SE;
    State->SF = SF;
}

static void
MeowHashAbsorb1(meow_hash_state *State, meow_u64 Len, void *SourceInit)
{
    State->TotalLengthInBytes += Len;
    meow_u8 *Source = (meow_u8 *)SourceInit;

    // NOTE(casey): Handle any buffered residual
    if(State->BufferLen)
    {
        int unsigned Fill = (sizeof(State->Buffer) - State->BufferLen);
        if(Fill > Len)
        {
            Fill = (int unsigned)Len;
        }

        Len -= Fill;
        while(Fill--)
        {
            State->Buffer[State->BufferLen++] = *Source++;
        }

        if(State->BufferLen == sizeof(State->Buffer))
        {
            MeowHashAbsorbBlocks1(State, 1, State->Buffer);
            State->BufferLen = 0;
        }
    }

    // NOTE(casey): Handle any full blocks
    meow_u64 BlockCount = (Len >> MEOW_HASH_BLOCK_SIZE_SHIFT);
    meow_u64 Advance = (BlockCount << MEOW_HASH_BLOCK_SIZE_SHIFT);
    MeowHashAbsorbBlocks1(State, BlockCount, Source);

    Len -= Advance;
    Source += Advance;

    // NOTE(casey): Store residual
    while(Len--)
    {
        State->Buffer[State->BufferLen++] = *Source++;
    }
}

static meow_u128
MeowHashEnd(meow_hash_state *State, meow_u64 Seed)
{
    meow_aes_128 S0 = State->S0;
    meow_aes_128 S1 = State->S1;
    meow_aes_128 S2 = State->S2;
    meow_aes_128 S3 = State->S3;
    meow_aes_128 S4 = State->S4;
    meow_aes_128 S5 = State->S5;
    meow_aes_128 S6 = State->S6;
    meow_aes_128 S7 = State->S7;
    meow_aes_128 S8 = State->S8;
    meow_aes_128 S9 = State->S9;
    meow_aes_128 SA = State->SA;
    meow_aes_128 SB = State->SB;
    meow_aes_128 SC = State->SC;
    meow_aes_128 SD = State->SD;
    meow_aes_128 SE = State->SE;
    meow_aes_128 SF = State->SF;

    meow_u8 *Source = State->Buffer;
    int unsigned Len = State->BufferLen;

    switch(Len >> 4)
    {
        case 15: SE = Meow128_AESDEC_Mem(SE, Source + 224);
        case 14: SD = Meow128_AESDEC_Mem(SD, Source + 208);
        case 13: SC = Meow128_AESDEC_Mem(SC, Source + 192);
        case 12: SB = Meow128_AESDEC_Mem(SB, Source + 176);
        case 11: SA = Meow128_AESDEC_Mem(SA, Source + 160);
        case 10: S9 = Meow128_AESDEC_Mem(S9, Source + 144);
        case 9: S8 = Meow128_AESDEC_Mem(S8, Source + 128);
        case 8: S7 = Meow128_AESDEC_Mem(S7, Source + 112);
        case 7: S6 = Meow128_AESDEC_Mem(S6, Source + 96);
        case 6: S5 = Meow128_AESDEC_Mem(S5, Source + 80);
        case 5: S4 = Meow128_AESDEC_Mem(S4, Source + 64);
        case 4: S3 = Meow128_AESDEC_Mem(S3, Source + 48);
        case 3: S2 = Meow128_AESDEC_Mem(S2, Source + 32);
        case 2: S1 = Meow128_AESDEC_Mem(S1,
                                        Source + 16);
        case 1: S0 = Meow128_AESDEC_Mem(S0, Source);
        default:;
    }
    Source += (Len & 0xF0);

    //
    // NOTE(casey): Deal with individual bytes
    //
    if(Len & 0xF)
    {
        // NOTE(casey): Scalar partial load construction appears courtesy of Won "Hash Daddy" Chun.
        // It allows the partial bytes to be handled by the scalar pipe "in the shadow" of the
        // vector pipe.
        meow_u64 Has8 = (Len & 8);
        meow_u64 Has4 = (Len & 4);
        meow_u64 Lo = 0;
        meow_u64 Hi = 0;
        if(Has8)
        {
            Lo = *(meow_u64 *)Source;
        }
        if(Has4)
        {
            Hi = *(meow_u32 *)(Source + Has8);
        }
        switch (Len & 3)
        {
            case 3: Hi |= (meow_u64)(*(Source + Has8 + Has4 + 2)) << 48;
            case 2: Hi |= (meow_u64)(*(Source + Has8 + Has4 + 1)) << 40;
            case 1: Hi |= (meow_u64)(*(Source + Has8 + Has4)) << 32;
            case 0:;
        }

        meow_aes_128 PartialState = Meow128_Set64x2_State(Hi, Lo);
        SF = Meow128_AESDEC(PartialState, Meow128_AESDEC_Finalize(SF));
    }

    meow_u128 Mixer = Meow128_Set64x2(Seed - State->TotalLengthInBytes,
                                      Seed + State->TotalLengthInBytes + 1);

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S8));
    S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S9));
    S2 = Meow128_AESDEC(S2, Meow128_AESDEC_Finalize(SA));
    S3 = Meow128_AESDEC(S3, Meow128_AESDEC_Finalize(SB));
    S4 = Meow128_AESDEC(S4, Meow128_AESDEC_Finalize(SC));
    S5 = Meow128_AESDEC(S5, Meow128_AESDEC_Finalize(SD));
    S6 = Meow128_AESDEC(S6, Meow128_AESDEC_Finalize(SE));
    S7 = Meow128_AESDEC(S7, Meow128_AESDEC_Finalize(SF));

    S0 = Meow128_AESDEC(S0, Mixer);
    S1 = Meow128_AESDEC(S1, Mixer);
    S2 = Meow128_AESDEC(S2, Mixer);
    S3 = Meow128_AESDEC(S3, Mixer);
    S4 = Meow128_AESDEC(S4, Mixer);
    S5 = Meow128_AESDEC(S5, Mixer);
    S6 = Meow128_AESDEC(S6, Mixer);
    S7 = Meow128_AESDEC(S7, Mixer);

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S4));
    S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S5));
    S2 = Meow128_AESDEC(S2, Meow128_AESDEC_Finalize(S6));
    S3 = Meow128_AESDEC(S3, Meow128_AESDEC_Finalize(S7));

    S0 = Meow128_AESDEC(S0, Mixer);
    S1 = Meow128_AESDEC(S1, Mixer);
    S2 = Meow128_AESDEC(S2, Mixer);
    S3 = Meow128_AESDEC(S3,
                            Mixer);

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S2));
    S1 = Meow128_AESDEC(S1, Meow128_AESDEC_Finalize(S3));

    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S1));
    S0 = Meow128_AESDEC(S0, Mixer);

    meow_u128 Result = Meow128_AESDEC_Finalize(S0);
    return(Result);
}

/* ========================================================================

   meow_example.cpp - basic usage example of the Meow hash
   (C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)

   See https://mollyrocket.com/meowhash for details.

   ======================================================================== */

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>

//
// NOTE(casey): Step 1 - include an intrinsics header, then include meow_hash.h
//
// Meow relies on definitions for non-standard types (meow_u128, etc.) and
// intrinsics for various platforms. You can either include the supplied meow_intrinsics.h
// file that will define these for you with its best guesses for your platform, or for
// more control, you can define them all yourself to map to your own stuff.
//

//
// NOTE(casey): Step 2 (optional) - for future-proofing, detect which Meow hash the CPU can run
//
// This is COMPLETELY OPTIONAL - you can instead just always call MeowHash1 to use
// the 128-bit-wide version exclusively.
//
static meow_hash_implementation *MeowHash = MeowHash1;
int MeowHashSpecializeForCPU(void)
{
    int Result = 0;

#if MEOW_HASH_AVX512
    try
    {
        char Garbage[64];
        MeowHash4(0, sizeof(Garbage), Garbage);
        MeowHash = MeowHash4;
        Result = 512;
    }
    catch(...)
#endif
    {
#if MEOW_HASH_AVX512
        try
        {
            char Garbage[64];
            MeowHash2(0, sizeof(Garbage), Garbage);
            MeowHash = MeowHash2;
            Result = 256;
        }
        catch(...)
#endif
        {
            MeowHash = MeowHash1;
            Result = 128;
        }
    }

    return(Result);
}

//
// NOTE(casey): Step 3 - use the Meow hash in a variety of ways!
//
// Example functions below:
//    PrintHash - how to print a Meow hash to stdout, from highest-order 32-bits to lowest
//    HashTestBuffer - how to have Meow hash a buffer of data
//    HashOneFile - have Meow hash the contents of a file
//    CompareTwoFiles - have Meow hash the contents of two files, and check for equivalence
//

//
// NOTE(casey): entire_file / ReadEntireFile / FreeEntireFile are simple helpers
// for loading a file into memory. They are defined at the end of this file.
//
struct entire_file
{
    size_t Size;
    void *Contents;
};
static entire_file ReadEntireFile(char *Filename);
static void FreeEntireFile(entire_file *File);

static void
PrintHash(meow_u128 Hash)
{
    meow_u32 *HashU32 = (meow_u32 *)&Hash;
    printf(" %08X-%08X-%08X-%08X\n",
           HashU32[3], HashU32[2], HashU32[1], HashU32[0]);
}

static void
HashTestBuffer(void)
{
    // NOTE(casey): Make a buffer with repeating numbers.
    int Size = 16000;
    char *Buffer = (char *)malloc(Size);
    for(int Index = 0; Index < Size; ++Index)
    {
        Buffer[Index] = (char)Index;
    }

    // NOTE(casey): Ask Meow for the hash
    meow_u128 Hash = MeowHash(0, Size, Buffer);

    // NOTE(casey): Extract example smaller hash sizes you might want:
    long long unsigned Hash64 = MeowU64From(Hash);
    int unsigned Hash32 = MeowU32From(Hash);

    // NOTE(casey): Print the hash
    printf(" Hash of a test buffer:\n");
    PrintHash(Hash);

    free(Buffer);
}

static void
HashOneFile(char *FilenameA)
{
    // NOTE(casey): Load the file
    entire_file A = ReadEntireFile(FilenameA);
    if(A.Contents)
    {
        // NOTE(casey): Ask Meow for the hash
        meow_u128 HashA = MeowHash(0, A.Size, A.Contents);

        // NOTE(casey): Print the hash
        printf(" Hash of \"%s\":\n", FilenameA);
        PrintHash(HashA);
    }

    FreeEntireFile(&A);
}

static void
CompareTwoFiles(char *FilenameA, char *FilenameB)
{
    // NOTE(casey): Load both files
    entire_file A = ReadEntireFile(FilenameA);
    entire_file B = ReadEntireFile(FilenameB);
    if(A.Contents && B.Contents)
    {
        // NOTE(casey): Hash both files
        meow_u128 HashA = MeowHash(0, A.Size, A.Contents);
        meow_u128 HashB = MeowHash(0, B.Size, B.Contents);

        // NOTE(casey): Check for match
        int HashesMatch = MeowHashesAreEqual(HashA, HashB);
        int FilesMatch = ((A.Size == B.Size) && (memcmp(A.Contents, B.Contents, A.Size) == 0));

        // NOTE(casey): Print the result
        if(HashesMatch && FilesMatch)
        {
            printf("Files \"%s\" and \"%s\" are the same:\n", FilenameA, FilenameB);
            PrintHash(HashA);
        }
        else if(FilesMatch)
        {
            printf("MEOW HASH FAILURE: Files match but hashes don't!\n");
            printf(" Hash of \"%s\":\n", FilenameA);
            PrintHash(HashA);
            printf(" Hash of \"%s\":\n", FilenameB);
            PrintHash(HashB);
        }
        else if(HashesMatch)
        {
            printf("MEOW HASH FAILURE: Hashes match but files don't!\n");
            printf(" Hash of both \"%s\" and \"%s\":\n", FilenameA, FilenameB);
            PrintHash(HashA);
        }
        else
        {
            printf("Files \"%s\" and \"%s\" are different:\n", FilenameA, FilenameB);
            printf(" Hash of \"%s\":\n", FilenameA);
            PrintHash(HashA);
            printf(" Hash of \"%s\":\n", FilenameB);
            PrintHash(HashB);
        }
    }

    FreeEntireFile(&A);
    FreeEntireFile(&B);
}

//
// NOTE(casey): That's it! Everything else below here is just boilerplate for starting up
// and loading files with the C runtime library.
//

int
main(int ArgCount, char **Args)
{
    // NOTE(casey): Print the banner
    printf("meow_example %s - basic usage example of the Meow hash\n", MEOW_HASH_VERSION_NAME);
    printf("(C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)\n");
    printf("See https://mollyrocket.com/meowhash for details.\n");
    printf("\n");

    // NOTE(casey): Detect which MeowHash to call - do this only once, at startup.
    int BitWidth = MeowHashSpecializeForCPU();
    printf("Using %d-bit Meow implementation\n", BitWidth);

    // NOTE(casey): Look at our arguments to decide which example to run
    if(ArgCount < 2)
    {
        HashTestBuffer();
    }
    else if(ArgCount == 2)
    {
        HashOneFile(Args[1]);
    }
    else if(ArgCount == 3)
    {
        CompareTwoFiles(Args[1], Args[2]);
    }
    else
    {
        printf("Usage:\n");
        printf("%s - hash a test buffer\n", Args[0]);
        printf("%s [filename] - hash the contents of [filename]\n", Args[0]);
        printf("%s [filename0] [filename1] - hash the contents of [filename0] and [filename1] and compare them\n", Args[0]);
    }

    return(0);
}

static entire_file
ReadEntireFile(char *Filename)
{
    entire_file Result = {};

    FILE *File = fopen(Filename, "rb");
    if(File)
    {
        fseek(File, 0, SEEK_END);
        Result.Size = ftell(File);
        fseek(File, 0, SEEK_SET);

        Result.Contents = malloc(Result.Size);
        if(Result.Contents)
        {
            if(Result.Size)
            {
                fread(Result.Contents, Result.Size, 1, File);
            }
        }
        else
        {
            Result.Contents = 0;
            Result.Size = 0;
        }

        fclose(File);
    }
    else
    {
        printf("ERROR: Unable to load \"%s\"\n", Filename);
    }

    return(Result);
}

static void
FreeEntireFile(entire_file *File)
{
    if(File->Contents)
    {
        free(File->Contents);
        File->Contents = 0;
    }

    File->Size = 0;
}
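The scalar partial load that MeowHash1 and MeowHashEnd use for the trailing 0 to 15 bytes can be studied on its own. The sketch below is not part of Meow: partial_lanes, PartialLoad, LoadLE32 and LoadLE64 are hypothetical names chosen for illustration. It assumes the same lane layout as the code above (8 bytes into Lo when bit 3 of Len is set, 4 bytes into the low half of Hi when bit 2 is set, and the last 1 to 3 bytes at bit offsets 32, 40 and 48 of Hi). The byte-wise loads stand in for the pointer-cast loads in the original, so the result does not depend on alignment or host endianness; on a little-endian machine the two should agree.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not a Meow function): little-endian 32-bit load built
// from individual bytes, returned widened to 64 bits.
static uint64_t LoadLE32(const uint8_t *P)
{
    return (uint64_t)P[0] | ((uint64_t)P[1] << 8) |
           ((uint64_t)P[2] << 16) | ((uint64_t)P[3] << 24);
}

// Hypothetical helper: little-endian 64-bit load built from two 32-bit loads.
static uint64_t LoadLE64(const uint8_t *P)
{
    return LoadLE32(P) | (LoadLE32(P + 4) << 32);
}

// Hypothetical result type: the two 64-bit halves of the partial 128-bit lane.
typedef struct partial_lanes
{
    uint64_t Lo;
    uint64_t Hi;
} partial_lanes;

// Packs Len (0..15) trailing bytes into two 64-bit lanes without reading
// past Source + Len.
static partial_lanes PartialLoad(const uint8_t *Source, size_t Len)
{
    assert(Len < 16);

    size_t Has8 = (Len & 8); // 8 if an 8-byte chunk is present, else 0
    size_t Has4 = (Len & 4); // 4 if a 4-byte chunk is present, else 0

    partial_lanes Result = {0, 0};
    if(Has8)
    {
        Result.Lo = LoadLE64(Source);
    }
    if(Has4)
    {
        Result.Hi = LoadLE32(Source + Has8);
    }

    // The remaining 0..3 bytes land above the 32-bit chunk in Hi.
    const uint8_t *Tail = Source + Has8 + Has4;
    switch(Len & 3)
    {
        case 3: Result.Hi |= (uint64_t)Tail[2] << 48; // fall through
        case 2: Result.Hi |= (uint64_t)Tail[1] << 40; // fall through
        case 1: Result.Hi |= (uint64_t)Tail[0] << 32; // fall through
        default: break;
    }

    return Result;
}
```

Note that a 4-byte chunk always goes into Hi, even when no 8-byte chunk is present, so a 5-byte tail leaves Lo at zero. That matches the original code, where Lo is only written under Has8.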