#include <stddef.h>
#include <string.h>

typedef unsigned char uuid_t[16];

// unused for SSE2, only for scalar or SSSE3 or higher
__attribute__((used))  // just silences warning; doesn't stop GCC from optimizing away
_Alignas(16) static char const hexdigits_lower[16] = "0123456789abcdef";
__attribute__((used))
_Alignas(16) static char const hexdigits_upper[16] = "0123456789ABCDEF";

#if 1
#define FMT_ARG_UPPER hexdigits_upper
#define FMT_ARG_LOWER hexdigits_lower

static inline char hexdigit(unsigned c, unsigned alpha_base)
{
    unsigned cdigit = c + '0';
    return c < 10 ? cdigit : cdigit + (alpha_base-'0'-10);
}

// restrict on fmt doesn't help; we'd need to make uuid and/or buf restrict
// to let the compiler know that *p++ = ... can't modify uuid bytes.
//__attribute__((noinline))   // just for experimentation.
__attribute__((regparm(3)))   // efficient tailcall on 32-bit x86
static void uuid_fmt(const uuid_t uuid, char *buf, char const fmt[restrict])
{   // args ordered to match the caller
    char *p = buf;
    unsigned alpha_base = fmt[10];
    (void)alpha_base;
    for (int i = 0; i < 16; i+=1) {
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            *p++ = '-';
        }
        size_t tmp = uuid[i];  // help GCC not redo zero-extension, and not reload
        // current GCC wastes an instruction with a type narrower than a pointer
#if 1
        *p++ = fmt[tmp >> 4];
        *p++ = fmt[tmp & 15];
#else
        *p++ = hexdigit(tmp >> 4, alpha_base);
        *p++ = hexdigit(tmp & 15, alpha_base);
#endif

#if 1  // unrolling by 2 is quite a bit faster
        tmp = uuid[i+1];  // unroll by 2, minimum gap between special i values
        // reducing the overhead of checking for the special cases
        // *p++ = hexdigit(tmp >> 4, alpha_base);
        // *p++ = hexdigit(tmp & 15, alpha_base);
        *p++ = fmt[tmp >> 4];
        *p++ = fmt[tmp & 15];
        i++;  // fold this into the loop proper once we decide on an unroll factor
#endif
    }
    *p = '\0';
}
#endif

#ifdef __SSE2__
#include <immintrin.h>
static inline void store_low4(char *p, __m128i v)
{
    //_mm_storeu_si32(p, v);  // movd.  Apparently not all compilers have this
    // Hopefully this is still strict-aliasing safe...
    _mm_store_ss((float*)p, _mm_castsi128_ps(v));
}
#endif

#if defined(__SSE2__) && !defined(__SSSE3__) && 1
// input already in printing order (big endian)
static __m128i tohex_16digits_sse2(__m128i uu8, __m128i alpha_base)
{
    const __m128i ascii_zero = _mm_set1_epi8('0');
    const __m128i vec9 = _mm_set1_epi8(9);
    //const __m128i vec_af_add = _mm_set1_epi8( 'a'-('0'+10) );
    //const __m128i vec_af_add = _mm_sub_epi8(ascii0, vec9);  // happens to be the right number
#undef FMT_ARG_UPPER
#undef FMT_ARG_LOWER
#define FMT_ARG_UPPER _mm_set1_epi8( 'A'-('0'+10) )
#define FMT_ARG_LOWER _mm_set1_epi8( 'a'-('0'+10) )

    __m128i high_nibbles = _mm_srli_epi32(uu8, 4);
    __m128i unpacked_nibbles = _mm_unpacklo_epi8(high_nibbles, uu8);  // high nibble first printing order
    const __m128i nibble_mask = _mm_set1_epi8(0x0f);
    unpacked_nibbles = _mm_and_si128(unpacked_nibbles, nibble_mask);

    __m128i gt9 = _mm_cmpgt_epi8(unpacked_nibbles, vec9);
    __m128i adjust_gt9 = _mm_and_si128(gt9, alpha_base);  // 0 or af_add
    __m128i ascii = _mm_add_epi8(unpacked_nibbles, ascii_zero);
    ascii = _mm_add_epi8(ascii, adjust_gt9);
    return ascii;
}

#undef uuid_fmt
#define uuid_fmt uuid_fmt_sse2
static __attribute__((regparm(3)))   // efficient tailcall on 32-bit x86
void uuid_fmt_sse2(const uuid_t uuid, char *buf, __m128i alpha_base)
{
    //__m128i uu_first8 = _mm_loadu_si64(uuid);
    __m128i uu_first8 = _mm_loadl_epi64((const __m128i*)uuid);
    __m128i hex_first16 = tohex_16digits_sse2(uu_first8, alpha_base);
    __m128i hex_last16 = tohex_16digits_sse2(_mm_loadl_epi64((const __m128i*)&uuid[8]), alpha_base);

    // 8x-4x-4x-4x-12x
    // A  B  C  D  E     arbitrary chunk names
    _mm_storel_epi64((__m128i*)buf, hex_first16);
    _mm_storeu_si128((__m128i*)(buf + 36-16), hex_last16);  // last 12x hex digits, plus stuff to be overwritten

    // store 4-byte pieces using some overlapping stores instead of shuffles / blends
    store_low4(buf+19, hex_last16);
    //__m128i hex_4_4 = _mm_unpackhi_epi64(hex_first16, hex_first16);  // GCC9 shoots itself in the foot and also uses psrldq
    __m128i hex_4_4 = _mm_srli_si128(hex_first16, 8);  // high half -> low
    _mm_storel_epi64((__m128i*)(buf+10), hex_4_4);  // ends with middle 4x
    store_low4(buf+9, hex_4_4);
    buf[8] = '-';  // fill in the dashes
    buf[13] = '-';
    buf[18] = '-';
    buf[23] = '-';
    buf[36] = '\0';
}
#endif

#ifdef __SSSE3__
// input already in printing order (big endian)
// not yet masked to clear high garbage
static __m128i nibbles_to_hex_ssse3(__m128i unpacked_nibbles, __m128i hexLUT)
{
    const __m128i nibble_mask = _mm_set1_epi8(0x0f);
    __m128i masked_nibbles = _mm_and_si128(unpacked_nibbles, nibble_mask);
    // shift/interleave already done by caller so it can use unpacklo/hi
#undef FMT_ARG_UPPER
#undef FMT_ARG_LOWER
#define FMT_ARG_UPPER _mm_load_si128((const __m128i*)hexdigits_upper)
#define FMT_ARG_LOWER _mm_load_si128((const __m128i*)hexdigits_lower)
    return _mm_shuffle_epi8(hexLUT, masked_nibbles);
}

#undef uuid_fmt
#define uuid_fmt uuid_fmt_ssse3
static __attribute__((regparm(3)))
void uuid_fmt_ssse3(const uuid_t uuid, char *buf, __m128i hexLUT)
{
    //__m128i uu_first8 = _mm_loadl_epi64((__m128i*)uuid);
    __m128i uu = _mm_loadu_si128((__m128i*)uuid);
    __m128i high_nibbles = _mm_srli_epi32(uu, 4);
    // unpack to high nibble first printing order
    __m128i hex_first16 = nibbles_to_hex_ssse3(_mm_unpacklo_epi8(high_nibbles, uu), hexLUT);
    __m128i hex_last16 = nibbles_to_hex_ssse3(_mm_unpackhi_epi8(high_nibbles, uu), hexLUT);

    // 8x-4x-4x-4x-12x
    // A  B  C  D  E     arbitrary chunk names
    _mm_storel_epi64((__m128i*)(buf+0), hex_first16);
    _mm_storeu_si128((__m128i*)(buf+20), hex_last16);  // ending with last 12 hex digits lined up with end of buf

    __m128i middle = _mm_alignr_epi8(hex_last16, hex_first16, 7);  // BCDx
    const __m128i mid_shuffle = _mm_setr_epi8(-1, 1,2,3,4, -1, 5,6,7,8, -1, 9,10,11,12, -1);
    middle = _mm_shuffle_epi8(middle, mid_shuffle);
    const __m128i dashes = _mm_setr_epi8('-', 0,0,0,0, '-', 0,0,0,0, '-', 0,0,0,0, '-');
    middle = _mm_or_si128(middle, dashes);
    _mm_storeu_si128((__m128i*)(buf+8), middle);
    buf[36] = '\0';
}

#if 0
// seems to need more movdqa register-copy instructions
// and a cache miss on the static data would leave more work incomplete
//
static void uuid_fmt_ssse3_unpack_after_xlat(const uuid_t uuid, char *buf, char const fmt[restrict])
{
    const __m128i hexLUT = _mm_load_si128((const __m128i*)fmt);
    __m128i uu = _mm_loadu_si128((__m128i*)uuid);
    __m128i high_nibbles = _mm_srli_epi32(uu, 4);
    const __m128i nibble_mask = _mm_set1_epi8(0x0f);
    __m128i low_nibbles = _mm_and_si128(uu, nibble_mask);
    high_nibbles = _mm_and_si128(high_nibbles, nibble_mask);
    low_nibbles = _mm_shuffle_epi8(hexLUT, low_nibbles);
    high_nibbles = _mm_shuffle_epi8(hexLUT, high_nibbles);

    __m128i hex_first16 = _mm_unpacklo_epi8(high_nibbles, low_nibbles);
    __m128i hex_last16 = _mm_unpackhi_epi8(high_nibbles, low_nibbles);
    // unpack with high nibble first printing order
    // __m128i hex_first16 = nibbles_to_hex_ssse3(_mm_unpacklo_epi8(high_nibbles, uu), hexLUT);
    // __m128i hex_last16 = nibbles_to_hex_ssse3(_mm_unpackhi_epi8(high_nibbles, uu), hexLUT);

    // 8x-4x-4x-4x-12x
    // A  B  C  D  E     arbitrary chunk names
    _mm_storel_epi64((__m128i*)buf, hex_first16);
    _mm_storeu_si128((__m128i*)(buf+20), hex_last16);  // ending with last 12 hex digits lined up with end of buf

    __m128i middle = _mm_alignr_epi8(hex_last16, hex_first16, 7);  // BCDx
    const __m128i mid_shuffle = _mm_setr_epi8(-1, 1,2,3,4, -1, 5,6,7,8, -1, 9,10,11,12, -1);
    middle = _mm_shuffle_epi8(middle, mid_shuffle);
    const __m128i dashes = _mm_setr_epi8('-', 0,0,0,0, '-', 0,0,0,0, '-', 0,0,0,0, '-');
    middle = _mm_or_si128(middle, dashes);
    _mm_storeu_si128((__m128i*)(buf+8), middle);
    buf[36] = '\0';
}
#endif

#ifdef __AVX512VBMI__
// UNTESTED, and macros for wrappers not done.
// I can't perf test this, I don't have an IceLake or CannonLake.
//static __attribute__((regparm(3)))
void uuid_fmt_avx512vbmi(const uuid_t uuid, char *buf, __m256i hexLUT)
{
    __m128i uu = _mm_loadu_si128((__m128i*)uuid);
    __m256i uu_dwords =_mm256_cvtepu32_epi64(uu);  // or maybe 128-bit broadcast would work?

/*
#undef FMT_ARG_UPPER
#undef FMT_ARG_LOWER
#define FMT_ARG_UPPER _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)hexdigits_upper))
#define FMT_ARG_LOWER _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)hexdigits_lower))
*/
    // stupid compiler expands the static data instead of using a bcast load.  Unless we hide the const behind a pointer
    // hexLUT = _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)hexp));

    // high nibbles first, but low bytes first: already in printing order
    static const _Alignas(8) unsigned char msc[] = {0x4, 0x0, 0xc, 8, 0x14, 0x10, 0x1c, 0x18};
    __m256i multishift_control = _mm256_broadcastq_epi64(_mm_loadl_epi64((const __m128i*)msc));
    // __m256i multishift_control = _mm256_set1_epi64x(msc_int);
    __m256i nibbles = _mm256_multishift_epi64_epi8(multishift_control, uu_dwords);

    // vpermb ymm uses 5 index bits.  To avoid masking away high garbage, replicate the LUT to upper lanes.
    __m256i hex = _mm256_permutexvar_epi8(nibbles, hexLUT);

    buf[36] = '\0';
    __m128i hex_last16 = _mm256_extracti128_si256(hex, 1);
    _mm_storeu_si128((__m128i*)(buf+20), hex_last16);  // ending with last 12 hex digits lined up with end of buf

    __m256i dash = _mm256_set1_epi8('-');  // TODO: hand-hold compiler into 4B bcast load for this constant, too
    __mmask32 dashmask = ~0b00000000100001000010000100000000;
    __m256i shuf = _mm256_setr_epi8(0,1,2,3,4,5,6,7, 0, 8,9,10,11, 0, 12,13,14,15, 0,
                                    16,17,18,19, 0, 20,21,22,23,24,25,26,27);
    // 8x-4x-4x-4x-12x
    __m256i hexdashes = _mm256_mask_permutexvar_epi8(dash, dashmask, shuf, hex);  // merge-mask into dashes
    _mm256_storeu_si256((__m256i*)buf, hexdashes);  // first 32 bytes, with dashes, partially overlapping the last 16 (and even the last 12)
}
#endif

/*
normal:
  load
  shift
  2x unpack
  2x and
  2x pshufb

AVX2 variable-shift after broadcast load: nope
  2x bcast load
  2x shiftv
  2x and
  2x pshufb LUT
  2x pshufb interleave
*/

#endif

#if 1
__attribute__((noinline))   // for testing
void uuid_unparse_upper(const uuid_t uu, char *out)
{
    uuid_fmt(uu, out, FMT_ARG_UPPER);
}

__attribute__((noinline))
void uuid_unparse_lower(const uuid_t uu, char *out)
{
    uuid_fmt(uu, out, FMT_ARG_LOWER);
}

#ifdef UUID_UNPARSE_DEFAULT_UPPER
//#define FMT_ARG_DEFAULT FMT_ARG_UPPER
__attribute__((noinline))
void uuid_unparse(const uuid_t uu, char *out) __attribute__((alias("uuid_unparse_upper")));
#else
//#define FMT_ARG_DEFAULT FMT_ARG_LOWER
__attribute__((noinline))
void uuid_unparse(const uuid_t uu, char *out) __attribute__((alias("uuid_unparse_lower")));
#endif
#endif

#if 0  // original version.
//__attribute__((noinline))
static void uuid_fmt_orig(char *buf, const uuid_t uuid, char const fmt[36])
{
    char *p = buf;
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            *p++ = '-';
        }
        *p++ = fmt[uuid[i] >> 4];
        *p++ = fmt[uuid[i] & 15];
    }
    *p = '\0';
}

void uuid_unparse_upper_orig(const uuid_t uu, char *out)
{
    uuid_fmt_orig(out, uu, hexdigits_upper);
}

void uuid_unparse_lower_orig(const uuid_t uu, char *out)
{
    uuid_fmt_orig(out, uu, hexdigits_lower);
}
#endif

#include <string.h>
#include <stdio.h>
int main(void)
{
    uuid_t uu = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    __attribute__((aligned(64))) char outbuf[50];
    memset(outbuf, 'Z', sizeof(outbuf));
    for (int i=0; i<100000000 ; i++) {
        uuid_unparse(uu, outbuf);
    }
    printf("%.48s\n", outbuf);
}

/* TIMES for 100000000 iterations:
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 SSSE3: 1.21 (clang 1.08) (slowshuffle, should prob. require SSE4 for dynamic dispatching, avoids slowshuffle CPUs)
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 SSE2 version: 1.00 (clang 0.880) clang9.0.1
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 -U__SSE2__ scalar version: unrolled x2, LUT: 2.88
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 scalar version: unrolled x2, CMOV: 6.50
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 scalar version: rolled up, LUT: 6.40
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O2 scalar version: rolled up, CMOV: 7.64
 *
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 SSSE3 version: 0.175
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 SSE2 version: 0.30
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 -mno-sse2 scalar version: unrolled x2, LUT: 1.69
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 scalar version: unrolled x2, CMOV: 2.19 (half each cmova 2 uops / cmovb 1 uop)
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 scalar version: rolled up, LUT: 2.43
 * Skylake @4.1GHz gcc9.3 -march=core2 -O2 scalar version: rolled up, CMOV: 2.64
 * -O3 typically fully unrolls the scalar loops.  (clang does that even at -O2 if you manually unroll by 2)
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 SSSE3: 1.21: (slowshuffle, should prob. require SSE4 for dynamic dispatching)
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 SSE2 version: 0.965
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 -U__SSE2__ scalar version: unrolled x2, LUT: 2.37
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 scalar version: unrolled x2, CMOV: 4.90
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 scalar version: unrolled x2, mixed: 2.72
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 scalar version: LUT: 2.35   // probably all fully unrolled
 * Conroe @2.4GHz running gcc9.3 -march=core2 -O3 scalar version: CMOV: 8.40
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 SSSE3 version: 0.173
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 SSE2 version: 0.273
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 -U__SSE2__ scalar version: unrolled x2, LUT: 0.95
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 scalar version: unrolled x2, CMOV: 2.10 (half each cmova 2 uops / cmovb 1 uop)
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 scalar version: unrolled x2, mixed: 1.34
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 scalar version: LUT: 0.95   // probably all fully unrolled
 * Skylake @4.1GHz gcc9.3 -march=core2 -O3 scalar version: rolled up, CMOV: 2.54
 */
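
The source above only prints the formatted buffer once after the timing loop, so a wrong shuffle or store offset in one of the SIMD paths would have to be spotted by eye. A minimal sketch of a byte-for-byte check follows, assuming a plain table-lookup formatter as the reference. The names `uuid_fmt_reference` and `uuid_text_matches_reference` are hypothetical helpers and are not part of the original source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reference formatter (not from the original source):
 * writes 16 UUID bytes as 8-4-4-4-12 hex text plus a terminating NUL,
 * 37 bytes in total, using a simple digit table. */
void uuid_fmt_reference(const unsigned char uuid[16], char out[37], int upper)
{
    static const char lower_digits[] = "0123456789abcdef";
    static const char upper_digits[] = "0123456789ABCDEF";
    const char *digits = upper ? upper_digits : lower_digits;
    char *p = out;

    assert(uuid != NULL && out != NULL);
    for (int i = 0; i < 16; i++) {
        /* Dashes go before bytes 4, 6, 8 and 10, as in the scalar loop above. */
        if (i == 4 || i == 6 || i == 8 || i == 10)
            *p++ = '-';
        *p++ = digits[uuid[i] >> 4];   /* high nibble first */
        *p++ = digits[uuid[i] & 15];
    }
    *p = '\0';
}

/* Hypothetical checker: returns 1 if the first 37 bytes of `text`
 * (36 characters plus the NUL) equal the reference output, else 0. */
int uuid_text_matches_reference(const unsigned char uuid[16], const char *text, int upper)
{
    char expected[37];
    uuid_fmt_reference(uuid, expected, upper);
    return memcmp(expected, text, sizeof expected) == 0;
}
```

One way to use it would be to call `uuid_text_matches_reference(uu, outbuf, 0)` in `main` after the loop, passing `1` instead when `UUID_UNPARSE_DEFAULT_UPPER` is defined. Comparing all 37 bytes also covers the terminator, which each vector path stores separately at `buf[36]`.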