Source code
/*
This is a benchmark of a collection of matrix multiplication algorithms.
Algorithms are kept as simple as possible. No structs are passed as arguments.
No "clever" "generic" matrix macros are used.

Different compilers, multiplied by different platforms, multiplied by the
selection of data types, yield a complex picture of benchmarking results.

Although here is a strong hint for you: the simplest algorithm is the fastest.
Keep in mind the compiler has the easiest job optimizing the simplest code.

Use this file to recompile and re-measure whenever selecting the right matrix
multiplication algorithm.

(c) 2021-2022 by dbj at dbj dot org -- https://dbj.org/license_dbj/
*/

// DBJ_BENCHMARKING undefined (or 0) means testing
// use testing to prove the validity of the algorithms
#define DBJ_BENCHMARKING 1
#define DBJ_ON_GODBOLT 1

#ifdef _MSC_VER
#pragma region common trash
#endif

/* NDEBUG == RELEASE */
#include <assert.h>
#include <stddef.h> /* size_t, NULL */
#include <stdio.h>  /* fprintf, printf, perror */
#include <stdlib.h> /* malloc, calloc, free, exit */
#include <string.h> /* memset */

#if (defined(__clang__) || defined(__GNUC__))
#define DBJ_CLANGNUC 1
#else
#define DBJ_CLANGNUC 0
#endif

#if DBJ_CLANGNUC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wfloat-equal"
#ifdef __clang__
#pragma clang diagnostic ignored "-Wlanguage-extension-token"
#endif // __clang__
#endif // DBJ_CLANGNUC

#if !DBJ_ON_GODBOLT
#include "build_time_stamp.inc" // DBJ_BUILD_TIMESTAMP
#if DBJ_BENCHMARKING
#include "ubench.h/ubench.h"
#else
#include "utest.h/utest.h"
#endif // DBJ_BENCHMARKING
#else // on godbolt
#if DBJ_BENCHMARKING
#include "https://raw.githubusercontent.com/sheredom/ubench.h/master/ubench.h"
#else // testing
#include "https://raw.githubusercontent.com/sheredom/utest.h/master/utest.h"
#endif // DBJ_BENCHMARKING
#define DBJ_BUILD_TIMESTAMP __DATE__ " " __TIME__
#endif // DBJ_ON_GODBOLT

#define DBJ_VT_RESET "\033[0m"
#define DBJ_VT_BOLD "\033[01m"
#define DBJ_VT_GREEN DBJ_VT_BOLD "\033[32m"
// #define DBJ_VT_RED "\033[31m"

#if DBJ_CLANGNUC
#define DBJ_CTOR __attribute__((constructor))
#define DBJ_DTOR __attribute__((destructor))
#else
#define DBJ_CTOR
#define DBJ_DTOR
#endif

#ifdef NDEBUG
#define NOMEM_POLICY(BOOLEXP_) ((void)(BOOLEXP_))
#else // ! NDEBUG == DEBUG
#define NOMEM_POLICY(BOOLEXP_) \
  do { \
    if (!(BOOLEXP_)) { \
      perror(__FILE__ ", Could not allocate memory!"); \
      exit(-1); \
    } \
  } while (0)
#endif // ! NDEBUG

// for when we are sure ARR is an array (not a pointer)
#define DBJ_CNT(ARR) (sizeof(ARR) / sizeof((ARR)[0]))

#undef MALLOC_WITH_POLICY
#define MALLOC_WITH_POLICY(PTR_, SIZE_) \
  do { \
    PTR_ = malloc(SIZE_); \
    NOMEM_POLICY(PTR_); \
  } while (0)

#undef CALLOC_WITH_POLICY
#define CALLOC_WITH_POLICY(PTR_, R_, C_, SIZE_) \
  do { \
    PTR_ = calloc((R_) * (C_), SIZE_); \
    NOMEM_POLICY(PTR_); \
  } while (0)

#define DBJ_FREE(P_) \
  do { \
    if (P_) { \
      free(P_); \
      P_ = NULL; \
    } \
  } while (0)

#undef DBJ_API
#define DBJ_API static

#ifdef _MSC_VER
#pragma endregion // common trash
#pragma region common data
#endif

/////////////////////////////////////////////////////////////////////////
// dimensions
#if DBJ_BENCHMARKING
// NOTE: the matrices are fixed-size arrays inside one prototype struct
// (see app_start), thus be careful with sizes.
// UBENCH repeats execution so matrix size is not the prevailing factor;
// keep them small-ish.
// The products are parenthesized so the macros expand safely
// inside larger expressions.
#define DBJ_MX_A_ROWS 0xF
#define DBJ_MX_A_COLS (0xF * 2)
#define DBJ_MX_B_ROWS DBJ_MX_A_COLS
#define DBJ_MX_B_COLS (0xF * 2)
#else // testing
/*
In case of testing we use this constellation of matrices,
to check the correctness of the algorithms:

 | 1 2 |   | 5 6 |   | 19 22 |
 |     | x |     | = |       |
 | 3 4 |   | 7 8 |   | 43 50 |
*/
#define DBJ_MX_A_ROWS 2
#define DBJ_MX_A_COLS 2
#define DBJ_MX_B_ROWS DBJ_MX_A_COLS
#define DBJ_MX_B_COLS 2
#endif // testing

#define DBJ_MX_R_ROWS DBJ_MX_A_ROWS
#define DBJ_MX_R_COLS DBJ_MX_B_COLS

static_assert(DBJ_MX_A_COLS == DBJ_MX_B_ROWS, "DBJ_MX_A_COLS != DBJ_MX_B_ROWS");
static_assert(DBJ_MX_A_ROWS == DBJ_MX_R_ROWS, "DBJ_MX_A_ROWS != DBJ_MX_R_ROWS");
static_assert(DBJ_MX_B_COLS == DBJ_MX_R_COLS, "DBJ_MX_B_COLS != DBJ_MX_R_COLS");

typedef double dbj_matrix_data_type;
#define dbj_matrix_data_type_name "double"

// NOTE: these are compile time typedefs.
// We can create them here as long as the dimensions are constants,
// i.e. as long as we do not use Variably Modified Types (VMT).
// Pointers to a whole matrix, declared [rows][cols] to match the
// arrays in app_data_struct below.
typedef dbj_matrix_data_type (*dbj_mx_a_pointer)[DBJ_MX_A_ROWS][DBJ_MX_A_COLS];
typedef dbj_matrix_data_type (*dbj_mx_b_pointer)[DBJ_MX_B_ROWS][DBJ_MX_B_COLS];
typedef dbj_matrix_data_type (*dbj_mx_r_pointer)[DBJ_MX_R_ROWS][DBJ_MX_R_COLS];
// Pointers to a single matrix row
typedef dbj_matrix_data_type (*dbj_mx_a_row)[DBJ_MX_A_COLS];
typedef dbj_matrix_data_type (*dbj_mx_b_row)[DBJ_MX_B_COLS];
typedef dbj_matrix_data_type (*dbj_mx_r_row)[DBJ_MX_R_COLS];

#ifdef _MSC_VER
#pragma endregion // common data
#pragma region matrix functions and various matmuls
#endif

#if DBJ_BENCHMARKING
// fill the matrix with the values 0, 1, 2, ... in row-major order
DBJ_API void* matrix_arr_init(const unsigned rows_a, const unsigned cols_a,
                              dbj_matrix_data_type a[static rows_a][cols_a]) {
  for (unsigned i = 0; i < rows_a; i++) {
    for (unsigned j = 0; j < cols_a; j++) {
      a[i][j] = (dbj_matrix_data_type)(i * cols_a + j);
    }
  }
  return a;
}
#endif // DBJ_BENCHMARKING

#define dbj_matrix_size_bytes(rows_, cols_, type_) \
  ((rows_) * (cols_) * sizeof(type_))

// t (cols_m x rows_m) becomes the transpose of m (rows_m x cols_m)
DBJ_API void dbj_matrix_transpose(
    const unsigned rows_m, const unsigned cols_m,
    const dbj_matrix_data_type m[static rows_m][cols_m],
    dbj_matrix_data_type t[static cols_m][rows_m]) {
  for (size_t i = 0; i < rows_m; i++) {
    for (size_t j = 0; j < cols_m; j++) {
      t[j][i] = m[i][j];
    }
  }
}

// plain dot product of two vectors of length n
DBJ_API dbj_matrix_data_type sdot_1(int n,
                                    const dbj_matrix_data_type x[static n],
                                    const dbj_matrix_data_type y[static n]) {
  dbj_matrix_data_type s = (dbj_matrix_data_type)0;
  for (int i = 0; i < n; ++i) s += x[i] * y[i];
  return s;
}

// dot product unrolled by 8, with 8 independent partial sums
DBJ_API dbj_matrix_data_type sdot_8(int n,
                                    const dbj_matrix_data_type x[static n],
                                    const dbj_matrix_data_type y[static n]) {
  int i, n8 = n >> 3 << 3; // n rounded down to a multiple of 8
  dbj_matrix_data_type s = (dbj_matrix_data_type)0,
                       t[8] = {(dbj_matrix_data_type)0};
  for (i = 0; i < n8; i += 8) {
    t[0] += x[i + 0] * y[i + 0];
    t[1] += x[i + 1] * y[i + 1];
    t[2] += x[i + 2] * y[i + 2];
    t[3] += x[i + 3] * y[i + 3];
    t[4] += x[i + 4] * y[i + 4];
    t[5] += x[i + 5] * y[i + 5];
    t[6] += x[i + 6] * y[i + 6];
    t[7] += x[i + 7] * y[i + 7];
  }
  // the remaining (n % 8) elements
  for (s = (dbj_matrix_data_type)0; i < n; ++i) s += x[i] * y[i];
  s += t[0] + t[1] + t[2] + t[3] + t[4] + t[5] + t[6] + t[7];
  return s;
}

// the most "by the book" C matrix multiplication function
// author has added the static keyword for sizes
// this is using VLA/VMT features
// the key fact might be that this matrix multiplication is so
// "severely optimized" by compilers that there is no point investing
// in finding faster algorithms, including SSE/AVX usage
DBJ_API void the_most_by_the_book_matrix_mult(
    size_t a_rows, size_t a_cols, size_t b_cols,
    dbj_matrix_data_type A[static a_rows][a_cols],
    dbj_matrix_data_type B[static a_cols][b_cols],
    dbj_matrix_data_type C[static a_rows][b_cols]) {
  for (size_t i = 0; i < a_rows; ++i) {
    for (size_t j = 0; j < b_cols; ++j) {
      C[i][j] = 0.0;
      for (size_t l = 0; l < a_cols; ++l) {
        C[i][j] += A[i][l] * B[l][j];
      }
    }
  }
}

/*
use 1D array as matrix type + index calculation of "matrix" [row][col]
this is in here because it is curiously and persistently the fastest matmul
*/
DBJ_API dbj_matrix_data_type* matmul_mx_as_array(
    const size_t a_rows, const size_t a_cols, const size_t b_cols,
    dbj_matrix_data_type* a, dbj_matrix_data_type* b,
    dbj_matrix_data_type* c) {
  /*
  the matmul dimensional requirements
    A columns == B rows
    R rows    == A rows
    R columns == B columns
  B and R are both b_cols wide, so b_cols is their row stride.
  (Indexing them with a_rows is only correct when a_rows == b_cols,
  which does not hold for the benchmark dimensions.)
  */
  for (size_t i = 0; i < a_rows; i++) {
    for (size_t k = 0; k < b_cols; k++) {
      dbj_matrix_data_type sum = (dbj_matrix_data_type)0.0;
      for (size_t j = 0; j < a_cols /* same as b rows */; j++) {
        sum += a[i * a_cols + j] * b[j * b_cols + k];
      }
      c[i * b_cols + k] = sum;
    }
  }
  return c;
}

/* ---------------------------------------------------------------------------- */
DBJ_API dbj_matrix_data_type* matmul_mx_as_array_another(
    const size_t a_rows, const size_t a_cols, const size_t b_cols,
    dbj_matrix_data_type* a, dbj_matrix_data_type* b,
    dbj_matrix_data_type* c, dbj_matrix_data_type* bT) {
  // orienteering
  // const unsigned b_rows = a_cols;
  // const unsigned bt_rows = b_cols;
  // const unsigned bt_cols = b_rows ;
  dbj_matrix_data_type* bTR = bT;
  dbj_matrix_transpose(a_cols, b_cols, (void*)b, (void*)bTR);
  for (unsigned i = 0; i < a_rows; i++) {
    for (unsigned k = 0; k < b_cols; k++) {
      dbj_matrix_data_type sum = 0.0;
      for (unsigned j = 0; j < a_cols; j++) {
        // bT is b_cols x a_cols, so its row stride is a_cols
        sum += a[i * a_cols + j] * bTR[k * a_cols + j];
      }
      c[i * b_cols + k] = sum;
    }
  }
  return c;
}

// using: gcc -s -O3 -lm -Wall -DNDEBUG
// this one is a winner
// this is VMT based
DBJ_API void* matmul_transpose_sdot(
    const unsigned a_rows, const unsigned a_cols, const unsigned b_cols,
    dbj_matrix_data_type a[static a_rows][a_cols],
    dbj_matrix_data_type b[static a_cols][b_cols],
    dbj_matrix_data_type m[static a_rows][b_cols],
    // allocated space for transposed b
    dbj_matrix_data_type bT[static b_cols][a_cols]) {
  // orienteering
  // const unsigned b_rows = a_cols;
  // const unsigned bt_rows = b_cols;
  // const unsigned bt_cols = b_rows ;
  // pointer to bT row
  dbj_matrix_data_type(*bTR)[a_cols] = bT;
  dbj_matrix_transpose(a_cols, b_cols, (void*)b, (void*)bTR);
  for (unsigned i = 0; i < a_rows; ++i)
    for (unsigned j = 0; j < b_cols; ++j)
      m[i][j] = sdot_8(a_cols, a[i], bTR[j]);
  return m;
}

DBJ_API void* matmul_transpose_sdot_another(
    const unsigned a_rows, const unsigned a_cols, const unsigned b_cols,
    dbj_matrix_data_type a[static a_rows][a_cols],
    dbj_matrix_data_type b[static a_cols][b_cols],
    dbj_matrix_data_type m[static a_rows][b_cols],
    // allocated space for transposed b
    dbj_matrix_data_type bT[static b_cols][a_cols]) {
  // orienteering
  // const unsigned b_rows = a_cols;
  // const unsigned bt_rows = b_cols;
  // const unsigned bt_cols = b_rows ;
  // pointer to bT row
  dbj_matrix_data_type(*bTR)[a_cols] = bT;
  dbj_matrix_transpose(a_cols, b_cols, (void*)b, (void*)bTR);
  for (unsigned i = 0; i < a_rows; ++i)
    for (unsigned j = 0; j < b_cols; ++j)
      m[i][j] = sdot_1(a_cols, a[i], bTR[j]);
  return m;
}

#ifdef _MSC_VER
#pragma endregion // matrix functions and various matmuls
#pragma region common for testing or benchmarking
#endif

// ubench functions have no parameters
// thus we use common data aka globals
typedef struct app_data_struct {
  const unsigned rows_a;
  const unsigned cols_a;
  const unsigned rows_b;
  const unsigned cols_b;
  const unsigned rows_r;
  const unsigned cols_r;
  // transposed B dimension
  const unsigned rows_bT;
  const unsigned cols_bT;
  // the matrices
  dbj_matrix_data_type a[DBJ_MX_A_ROWS][DBJ_MX_A_COLS];
  dbj_matrix_data_type b[DBJ_MX_B_ROWS][DBJ_MX_B_COLS];
  // transposed b
  dbj_matrix_data_type bT[DBJ_MX_B_COLS][DBJ_MX_B_ROWS];
  // the result; its size is a rows * b cols
  dbj_matrix_data_type r[DBJ_MX_R_ROWS][DBJ_MX_R_COLS];
} app_data_type;

// app_data is the global pointer to the app data
#define reset_test_result() \
  do { \
    dbj_matrix_data_type(*rap)[DBJ_MX_R_ROWS * DBJ_MX_R_COLS] = \
        (void*)app_data->r; \
    memset(rap, 0, \
           sizeof(dbj_matrix_data_type[DBJ_MX_R_ROWS * DBJ_MX_R_COLS])); \
  } while (0)

DBJ_API app_data_type* app_data = 0;

DBJ_API void app_start(void) {
  // CAUTION: if you declare large dimensions this static instance
  // will take a lot of memory. Or just fail.
  static app_data_type app_data_prototype = {
    .rows_a = DBJ_MX_A_ROWS,
    .cols_a = DBJ_MX_A_COLS,
    .rows_b = DBJ_MX_B_ROWS,
    .cols_b = DBJ_MX_B_COLS,
    // transposed b dimension
    .rows_bT = DBJ_MX_B_COLS,
    .cols_bT = DBJ_MX_B_ROWS,
    /* the result */
    .rows_r = DBJ_MX_A_ROWS,
    .cols_r = DBJ_MX_B_COLS,
#if !DBJ_BENCHMARKING
    // testing
    .a = {{1, 2}, {3, 4}},
    .b = {{5, 6}, {7, 8}},
    .bT = {{0, 0}, {0, 0}},
    .r = {{0, 0}, {0, 0}},
#endif // !DBJ_BENCHMARKING
  };

  app_data = &app_data_prototype;

#undef DBJ_APP_KIND
#if DBJ_BENCHMARKING
#define DBJ_APP_KIND "BENCHMARKING"
  matrix_arr_init(app_data->rows_a, app_data->cols_a, app_data->a);
  matrix_arr_init(app_data->rows_b, app_data->cols_b, app_data->b);
  // r and bT were zeroed when app_data_prototype was made
#else // TESTING
#define DBJ_APP_KIND "TESTING"
  /*
   | 1 2 |   | 5 6 |   | 19 22 |
   |     | x |     | = |       |
   | 3 4 |   | 7 8 |   | 43 50 |
  */
  assert(app_data->rows_a * app_data->cols_a == 4);
  assert(app_data->rows_b * app_data->cols_b == 4);
  assert(app_data->rows_r * app_data->cols_r == 4);
#endif // DBJ_BENCHMARKING

  fprintf(stderr,
          DBJ_VT_RESET "\n\n" DBJ_VT_GREEN DBJ_APP_KIND "\n\n"
          "Various matrix multiplication algorithms benchmarking/testing"
          "\n(c) 2021-2022 by dbj at dbj dot org, "
          "https://dbj.org/license_dbj \nTimestamp: %s",
          DBJ_BUILD_TIMESTAMP);

// the dimensions are unsigned, hence %u
#define SHOWMX_(N_, R_, C_, DS_, MS_) \
  fprintf(stderr, "\n%3s: %2u * %2u * sizeof(%s) == %4.2f KB", \
          N_, R_, C_, DS_, MS_)

#define SHOWMX(N_, RF_, CF_) \
  do { \
    const float size_ = \
        dbj_matrix_size_bytes(app_data->RF_, app_data->CF_, \
                              dbj_matrix_data_type) / \
        1024.0f; \
    SHOWMX_(N_, app_data->RF_, app_data->CF_, dbj_matrix_data_type_name, \
            size_); \
  } while (0)

  SHOWMX("A", rows_a, cols_a);
  SHOWMX("B", rows_b, cols_b);
  SHOWMX("BT", rows_bT, cols_bT);
  SHOWMX("R", rows_r, cols_r);

#undef SHOWMX_
#undef SHOWMX
#undef DBJ_APP_KIND
} // app init finishes just here

DBJ_API void app_end(void) {
  // app_data is the address of the hidden static
  // thus no freeing
  // DBJ_FREE(app_data);
  printf(" " DBJ_VT_RESET " ");
}

/////////////////////////////////////////////////////////////////////////
#if DBJ_BENCHMARKING
// result reset and checking are done in the UTESTs, see below

UBENCH(matmul, matmul_transpose_sdot_another) {
  matmul_transpose_sdot_another(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS,
                                app_data->a, app_data->b, app_data->r,
                                app_data->bT);
}

UBENCH(matmul, matmul_transpose_sdot) {
  matmul_transpose_sdot(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS,
                        app_data->a, app_data->b, app_data->r, app_data->bT);
}

UBENCH(matmul, matmul_mx_as_array_another) {
  matmul_mx_as_array_another(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS,
                             (void*)app_data->a, (void*)app_data->b,
                             (void*)app_data->r, (void*)app_data->bT);
}

UBENCH(matmul, matmul_mx_as_array) {
  matmul_mx_as_array(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS,
                     (void*)app_data->a, (void*)app_data->b,
                     (void*)app_data->r);
}

UBENCH(matmul, the_most_by_the_book_matrix_mult) {
  the_most_by_the_book_matrix_mult(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS,
                                   app_data->a, app_data->b,
app_data->r); } #else // testing ///////////////////////////////////////////////////// /* * ! 1 2 | | 5 6 | | 19 22 | * | | x | | = | | * | 3 4 | | 7 8 | | 43 50 | #define check_test_input() \ do { \ EXPECT_EQ(app_data->a[0][0], (dbj_matrix_data_type)1); \ EXPECT_EQ(app_data->a[0][1], (dbj_matrix_data_type)2); \ EXPECT_EQ(app_data->a[1][0], (dbj_matrix_data_type)3); \ EXPECT_EQ(app_data->a[1][1], (dbj_matrix_data_type)4); \ \ EXPECT_EQ(app_data->b[0][0], (dbj_matrix_data_type)5); \ EXPECT_EQ(app_data->b[0][1], (dbj_matrix_data_type)6); \ EXPECT_EQ(app_data->b[1][0], (dbj_matrix_data_type)7); \ EXPECT_EQ(app_data->b[1][1], (dbj_matrix_data_type)8); \ } while (0) */ #define check_test_result() \ do { \ EXPECT_EQ(app_data->r[0][0], (dbj_matrix_data_type)19); \ EXPECT_EQ(app_data->r[0][1], (dbj_matrix_data_type)22); \ EXPECT_EQ(app_data->r[1][0], (dbj_matrix_data_type)43); \ EXPECT_EQ(app_data->r[1][1], (dbj_matrix_data_type)50); \ } while (0) UTEST(matmul, matmul_transpose_sdot_another) { reset_test_result(); matmul_transpose_sdot_another(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS, app_data->a, app_data->b, app_data->r, app_data->bT); check_test_result(); } UTEST(matmul, matmul_transpose_sdot) { reset_test_result(); matmul_transpose_sdot(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS, app_data->a, app_data->b, app_data->r, app_data->bT); check_test_result(); } UTEST(matmul, matmul_mx_as_array_another) { reset_test_result(); matmul_mx_as_array_another(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS, (void*)app_data->a, (void*)app_data->b, (void*)app_data->r, (void*)app_data->bT); check_test_result(); } UTEST(matmul, matmul_mx_as_array) { reset_test_result(); matmul_mx_as_array(DBJ_MX_A_ROWS, DBJ_MX_A_COLS, DBJ_MX_B_COLS, (void*)app_data->a, (void*)app_data->b, (void*)app_data->r); check_test_result(); } UTEST(matmul, the_most_by_the_book_matrix_mult) { reset_test_result(); the_most_by_the_book_matrix_mult(DBJ_MX_A_ROWS, DBJ_MX_B_ROWS, DBJ_MX_A_COLS, app_data->a, app_data->b, 
app_data->r); check_test_result(); } #undef check_test_result #endif // testing #ifdef _MSC_VER #pragma region common main #endif #if DBJ_BENCHMARKING UBENCH_STATE(); #else // testing UTEST_STATE(); #endif // ! DBJ_BENCHMARKING int main(int argc, const char* const argv[]) { #if defined(_WIN32) // VT100 ESC codes kick-start system(" "); #endif app_start(); #if DBJ_BENCHMARKING return ubench_main(argc, argv); #else // ! DBJ_BENCHMARKING return utest_main(argc, argv); #endif // ! DBJ_BENCHMARKING app_end(); } #ifdef _MSC_VER #pragma endregion // common main #endif #ifdef _MSC_VER #pragma endregion // common for testing or benchmarking #endif #if DBJ_CLANGNUC #pragma GCC diagnostic pop #endif // DBJ_CLANGNUC