Thanks for using Compiler Explorer
c++ source #1
#include <cinttypes>
#include <cstdio>
#include <climits>

typedef int32_t v4si __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef int64_t v2si __attribute__((vector_size(16))) __attribute__((aligned(16)));

#if __x86_64__
#include <emmintrin.h>
#include <immintrin.h>

__m128i multiply32_low_alternative(__m128i a, __m128i b) {
    auto alow = _mm_shuffle_epi32(a, _MM_SHUFFLE(1,1,0,0));
    auto amask = _mm_cmpgt_epi32(_mm_setzero_si128(), alow );
    auto blow = _mm_shuffle_epi32(b, _MM_SHUFFLE(1,1,0,0));
    auto bmask = _mm_cmpgt_epi32(_mm_setzero_si128(), blow );
    auto maskAb = _mm_mul_epu32(amask,blow);
    auto maskBa = _mm_mul_epu32(bmask,alow);
    auto maskMultiplyAdd = _mm_slli_epi64(_mm_add_epi64(maskAb, maskBa), 32);
    auto regMultiply = _mm_mul_epu32(alow,blow);
    return _mm_add_epi64(regMultiply, maskMultiplyAdd);
}

__m128i multiply32_high_alternative(__m128i a, __m128i b) {
    auto alow = _mm_shuffle_epi32(a, _MM_SHUFFLE(3,3,2,2));
    auto amask = _mm_cmpgt_epi32(_mm_setzero_si128(), alow );
    auto blow = _mm_shuffle_epi32(b, _MM_SHUFFLE(3,3,2,2));
    auto bmask = _mm_cmpgt_epi32(_mm_setzero_si128(), blow );
    auto maskAb = _mm_mul_epu32(amask,blow);
    auto maskBa = _mm_mul_epu32(bmask,alow);
    auto maskMultiplyAdd = _mm_slli_epi64(_mm_add_epi64(maskAb, maskBa), 32);
    auto regMultiply = _mm_mul_epu32(alow,blow);
    return _mm_add_epi64(regMultiply, maskMultiplyAdd);
}

__m128i multiply32_low_scalar(__m128i a, __m128i b) {
    volatile v4si aSpill = (v4si) a;
    volatile v4si bSpill = (v4si) b;
    int64_t ab0 = int64_t(aSpill[0])*int64_t(bSpill[0]);
    int64_t ab1 = int64_t(aSpill[1])*int64_t(bSpill[1]);
    v2si out{ab0, ab1};
    return out;
}

__m128i multiply32_high_scalar(__m128i a, __m128i b) {
    volatile v4si aSpill = (v4si) a;
    volatile v4si bSpill = (v4si) b;
    int64_t ab2 = int64_t(aSpill[2])*int64_t(bSpill[2]);
    int64_t ab3 = int64_t(aSpill[3])*int64_t(bSpill[3]);
    v2si out{ab2, ab3};
    return out;
}

__m128i multiply32_low_s(__m128i a, __m128i b) {
    auto alow = _mm_shuffle_epi32(a, _MM_SHUFFLE(1,1,0,0));
    auto amask = _mm_cmpgt_epi32(_mm_setzero_si128(), alow );
    auto blow = _mm_shuffle_epi32(b, _MM_SHUFFLE(1,1,0,0));
    auto bmask = _mm_cmpgt_epi32(_mm_setzero_si128(), blow );
    alow = _mm_xor_si128(alow,amask);
    blow = _mm_xor_si128(blow,bmask);
    alow = _mm_sub_epi32(alow,amask);
    blow = _mm_sub_epi32(blow,bmask);
    auto absProd = _mm_mul_epu32(alow, blow);
    auto signMask = _mm_xor_si128(amask,bmask);
    auto out = _mm_sub_epi64(_mm_xor_si128(absProd, signMask),signMask);
    return out;
}

__m128i multiply32_high_s(__m128i a, __m128i b) {
    auto alow = _mm_shuffle_epi32(a, _MM_SHUFFLE(3,3,2,2));
    auto amask = _mm_cmpgt_epi32(_mm_setzero_si128(), alow );
    auto blow = _mm_shuffle_epi32(b, _MM_SHUFFLE(3,3,2,2));
    auto bmask = _mm_cmpgt_epi32(_mm_setzero_si128(), blow );
    alow = _mm_xor_si128(alow,amask);
    blow = _mm_xor_si128(blow,bmask);
    alow = _mm_sub_epi32(alow,amask);
    blow = _mm_sub_epi32(blow,bmask);
    auto absProd = _mm_mul_epu32(alow, blow);
    auto signMask = _mm_xor_si128(amask,bmask);
    auto out = _mm_sub_epi64(_mm_xor_si128(absProd, signMask),signMask);
    return out;
}

__attribute__((target("sse4.1")))
static inline __m128i multiply32_low_s_sse4(__m128i a, __m128i b) {
    auto ahigh = _mm_shuffle_epi32(a, _MM_SHUFFLE(1,1,0,0));
    auto bhigh = _mm_shuffle_epi32(b, _MM_SHUFFLE(1,1,0,0));
    auto out = _mm_mul_epi32(ahigh,bhigh);
    return out;
}

__attribute__((target("sse4.1")))
static inline __m128i multiply32_high_s_sse4(__m128i a, __m128i b) {
    auto ahigh = _mm_shuffle_epi32(a, _MM_SHUFFLE(3,3,2,2));
    auto bhigh = _mm_shuffle_epi32(b, _MM_SHUFFLE(3,3,2,2));
    auto out = _mm_mul_epi32(ahigh,bhigh);
    return out;
}

void runLoop_multiply32_alternative(__m128i *pA, __m128i *pB, __m128i *pOut) {
    for (int i = 0; i < 10000; ++i) {
        auto lowAB = multiply32_low_alternative(_mm_loadu_si128(pA+i), _mm_loadu_si128(pB+i));
        auto highAB = multiply32_high_alternative(_mm_loadu_si128(pA+i), _mm_loadu_si128(pB+i));
        pOut[2*i] = _mm_add_epi64(lowAB,lowAB);
        pOut[2*i+1] = _mm_add_epi64(highAB,highAB);
    }
}

void runLoop_multiply32_original(__m128i *pA, __m128i *pB, __m128i *pOut) {
    for (int i = 0; i < 10000; ++i) {
        auto lowAB = multiply32_low_s(_mm_loadu_si128(pA+i), _mm_loadu_si128(pB+i));
        auto highAB = multiply32_high_s(_mm_loadu_si128(pA+i), _mm_loadu_si128(pB+i));
        pOut[2*i] = _mm_add_epi64(lowAB,lowAB);
        pOut[2*i+1] = _mm_add_epi64(highAB,highAB);
    }
}

void runLoop_multiply32_original_xnnpack_unaligned(__m128i *pA, __m128i *pMultiplier, __m128i *pRounding, __m128i *pOut) {
    for (int i = 0; i < 1024; i+= 16) {
        auto multiplier = _mm_loadu_si128(pMultiplier+i);
        auto rounding = _mm_loadu_si128(pRounding+i);
        for (int j = 0; j < 8; ++j) {
            auto lowAB = _mm_add_epi64(multiply32_low_s(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            auto highAB = _mm_add_epi64(multiply32_high_s(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            _mm_storeu_si128(pOut+i+2*j, _mm_add_epi64(lowAB,lowAB));
            _mm_storeu_si128(pOut+i+2*j+1, _mm_add_epi64(highAB,highAB));
        }
    }
}

void runLoop_multiply32_original_xnnpack_sse41_unaligned(__m128i *pA, __m128i *pMultiplier, __m128i *pRounding, __m128i *pOut) {
    for (int i = 0; i < 1024; i+= 16) {
        auto multiplier = _mm_loadu_si128(pMultiplier+i);
        auto rounding = _mm_loadu_si128(pRounding+i);
        for (int j = 0; j < 8; ++j) {
            auto lowAB = _mm_add_epi64(multiply32_low_s_sse4(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            auto highAB = _mm_add_epi64(multiply32_high_s_sse4(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            _mm_storeu_si128(pOut+i+2*j, _mm_add_epi64(lowAB,lowAB));
            _mm_storeu_si128(pOut+i+2*j+1, _mm_add_epi64(highAB,highAB));
        }
    }
}

void runLoop_multiply32_original_xnnpack_aligned(__m128i *pA, __m128i *pMultiplier, __m128i *pRounding, __m128i *pOut) {
    for (int i = 0; i < 1024; i+= 16) {
        auto multiplier = pMultiplier[i];
        auto rounding = pRounding[i];
        for (int j = 0; j < 8; ++j) {
            auto lowAB = _mm_add_epi64(multiply32_low_s(pA[i+2*j], multiplier), rounding);
            auto highAB = _mm_add_epi64(multiply32_high_s(pA[i+2*j], multiplier), rounding);
            pOut[i+2*j]= _mm_add_epi64(lowAB,lowAB);
            pOut[i+2*j+1]= _mm_add_epi64(highAB,highAB);
        }
    }
}

void runLoop_multiply32_original_xnnpack_scalar_unaligned(__m128i *pA, __m128i *pMultiplier, __m128i *pRounding, __m128i *pOut) {
    for (int i = 0; i < 1024; i+= 16) {
        auto multiplier = _mm_loadu_si128(pMultiplier+i);
        auto rounding = _mm_loadu_si128(pRounding+i);
        for (int j = 0; j < 8; ++j) {
            auto lowAB = _mm_add_epi64(multiply32_low_scalar(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            auto highAB = _mm_add_epi64(multiply32_high_scalar(_mm_loadu_si128(pA+i+2*j), multiplier), rounding);
            _mm_storeu_si128(pOut+i+2*j, _mm_add_epi64(lowAB,lowAB));
            _mm_storeu_si128(pOut+i+2*j+1, _mm_add_epi64(highAB,highAB));
        }
    }
}

int main(int argc, char* argv[]) {
    v4si a{-5,50000,-5000,50000};
    v4si b{10,-100,-5000,500000000};
    // auto c = multiply32_low_s(a,b);
    // auto d = multiply32_high_s(a,b);
    auto c = multiply32_low_alternative(a,b);
    auto d = multiply32_high_alternative(a,b);
    printf("%lld %lld\n", c[0],c[1]);
    printf("%lld %lld\n", d[0],d[1]);
}

#elif __arm__
#endif
analysis source #2
.LBB7_1: # =>This Inner Loop Header: Depth=1
    movdqu xmm1, xmmword ptr [rsi + rax]
    movdqu xmm8, xmmword ptr [rdx + rax]
    pshufd xmm4, xmm1, 80 # xmm4 = xmm1[0,0,1,1]
    movdqa xmm3, xmm4
    psrad xmm3, 31
    pshufd xmm2, xmm1, 250 # xmm2 = xmm1[2,2,3,3]
    movdqa xmm1, xmm2
    psrad xmm1, 31
    movdqu xmm6, xmmword ptr [rdi + rax]
    pshufd xmm7, xmm6, 80 # xmm7 = xmm6[0,0,1,1]
    movdqa xmm0, xmm7
    psrad xmm0, 31
    pmuludq xmm0, xmm4
    movdqa xmm5, xmm7
    pmuludq xmm5, xmm3
    paddq xmm5, xmm0
    psllq xmm5, 32
    pmuludq xmm7, xmm4
    paddq xmm7, xmm8
    paddq xmm7, xmm5
    pshufd xmm0, xmm6, 250 # xmm0 = xmm6[2,2,3,3]
    movdqa xmm5, xmm0
    psrad xmm5, 31
    pmuludq xmm5, xmm2
    movdqa xmm6, xmm0
    pmuludq xmm6, xmm1
    paddq xmm6, xmm5
    psllq xmm6, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm6
    paddq xmm7, xmm7
    movdqa xmmword ptr [rcx + rax], xmm7
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 16], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 32]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 32], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 48], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 64]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 64], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 80], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 96]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 96], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 112], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 128]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 128], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 144], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 160]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 160], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 176], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 192]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    movdqa xmm7, xmm5
    pmuludq xmm7, xmm3
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm7
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm6, xmm0
    psrad xmm6, 31
    pmuludq xmm6, xmm2
    movdqa xmm7, xmm0
    pmuludq xmm7, xmm1
    paddq xmm7, xmm6
    psllq xmm7, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm7
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 192], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 208], xmm0
    movdqu xmm0, xmmword ptr [rdi + rax + 224]
    pshufd xmm5, xmm0, 80 # xmm5 = xmm0[0,0,1,1]
    movdqa xmm6, xmm5
    psrad xmm6, 31
    pmuludq xmm6, xmm4
    pmuludq xmm3, xmm5
    paddq xmm3, xmm6
    psllq xmm3, 32
    pmuludq xmm5, xmm4
    paddq xmm5, xmm8
    paddq xmm5, xmm3
    pshufd xmm0, xmm0, 250 # xmm0 = xmm0[2,2,3,3]
    movdqa xmm3, xmm0
    psrad xmm3, 31
    pmuludq xmm3, xmm2
    pmuludq xmm1, xmm0
    paddq xmm1, xmm3
    psllq xmm1, 32
    pmuludq xmm0, xmm2
    paddq xmm0, xmm8
    paddq xmm0, xmm1
    paddq xmm5, xmm5
    movdqa xmmword ptr [rcx + rax + 224], xmm5
    paddq xmm0, xmm0
    movdqa xmmword ptr [rcx + rax + 240], xmm0
    add rax, 256
    add r8, 16
    cmp r8, 1008
    jb .LBB7_1
analysis source #3
.LBB6_1:                                # =>This Inner Loop Header: Depth=1
        movdqu  xmm1, xmmword ptr [rsi + rax]
        movdqu  xmm3, xmmword ptr [rdx + rax]
        pshufd  xmm4, xmm1, 80          # xmm4 = xmm1[0,0,1,1]
        movdqa  xmm8, xmm4
        psrad   xmm8, 31
        paddd   xmm4, xmm8
        pxor    xmm4, xmm8
        pshufd  xmm2, xmm1, 250         # xmm2 = xmm1[2,2,3,3]
        movdqa  xmm1, xmm2
        psrad   xmm1, 31
        paddd   xmm2, xmm1
        pxor    xmm2, xmm1
        movdqu  xmm5, xmmword ptr [rdi + rax]
        pshufd  xmm6, xmm5, 80          # xmm6 = xmm5[0,0,1,1]
        movdqa  xmm7, xmm6
        psrad   xmm7, 31
        paddd   xmm6, xmm7
        pxor    xmm6, xmm7
        pmuludq xmm6, xmm4
        pxor    xmm7, xmm8
        pxor    xmm6, xmm7
        movdqa  xmm0, xmm3
        psubq   xmm0, xmm7
        paddq   xmm0, xmm6
        pshufd  xmm5, xmm5, 250         # xmm5 = xmm5[2,2,3,3]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm2
        pxor    xmm6, xmm1
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        paddq   xmm0, xmm0
        movdqu  xmmword ptr [rcx + rax], xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 16], xmm7
        movdqu  xmm0, xmmword ptr [rdi + rax + 32]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 32], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 48], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 64]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 64], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 80], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 96]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 96], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 112], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 128]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 128], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 144], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 160]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 160], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 176], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 192]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm7, xmm3
        psubq   xmm7, xmm6
        paddq   xmm7, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        movdqa  xmm6, xmm3
        psubq   xmm6, xmm5
        paddq   xmm6, xmm0
        paddq   xmm7, xmm7
        movdqu  xmmword ptr [rcx + rax + 192], xmm7
        paddq   xmm6, xmm6
        movdqu  xmmword ptr [rcx + rax + 208], xmm6
        movdqu  xmm0, xmmword ptr [rdi + rax + 224]
        pshufd  xmm5, xmm0, 80          # xmm5 = xmm0[0,0,1,1]
        movdqa  xmm6, xmm5
        psrad   xmm6, 31
        paddd   xmm5, xmm6
        pxor    xmm5, xmm6
        pmuludq xmm5, xmm4
        pxor    xmm6, xmm8
        pxor    xmm5, xmm6
        movdqa  xmm4, xmm3
        psubq   xmm4, xmm6
        paddq   xmm4, xmm5
        pshufd  xmm0, xmm0, 250         # xmm0 = xmm0[2,2,3,3]
        movdqa  xmm5, xmm0
        psrad   xmm5, 31
        paddd   xmm0, xmm5
        pxor    xmm0, xmm5
        pmuludq xmm0, xmm2
        pxor    xmm5, xmm1
        pxor    xmm0, xmm5
        psubq   xmm3, xmm5
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 224], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 240], xmm3
        add     r8, 16
        add     rax, 256
        cmp     r8, 1008
        jb      .LBB6_1
analysis source #4
.LBB7_1:                                # =>This Inner Loop Header: Depth=1
        movdqu  xmm1, xmmword ptr [rsi + rax]
        movdqu  xmm0, xmmword ptr [rdx + rax]
        pshufd  xmm2, xmm1, 80          # xmm2 = xmm1[0,0,1,1]
        pshufd  xmm1, xmm1, 250         # xmm1 = xmm1[2,2,3,3]
        movdqu  xmm3, xmmword ptr [rdi + rax]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 16], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 32]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 32], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 48], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 64]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 64], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 80], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 96]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 96], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 112], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 128]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 128], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 144], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 160]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 160], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 176], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 192]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm3, xmm3, 250         # xmm3 = xmm3[2,2,3,3]
        pmuldq  xmm3, xmm1
        paddq   xmm3, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 192], xmm4
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 208], xmm3
        movdqu  xmm3, xmmword ptr [rdi + rax + 224]
        pmovzxdq xmm4, xmm3             # xmm4 = xmm3[0],zero,xmm3[1],zero
        pmuldq  xmm4, xmm2
        paddq   xmm4, xmm0
        pshufd  xmm2, xmm3, 250         # xmm2 = xmm3[2,2,3,3]
        pmuldq  xmm2, xmm1
        paddq   xmm2, xmm0
        paddq   xmm4, xmm4
        movdqu  xmmword ptr [rcx + rax + 224], xmm4
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 240], xmm2
        add     r8, 16
        add     rax, 256
        cmp     r8, 1008
        jb      .LBB7_1
analysis source #5
.LBB12_1:                               # =>This Inner Loop Header: Depth=1
        movdqu  xmm1, xmmword ptr [rsi + rax]
        movdqu  xmm0, xmmword ptr [rdx + rax]
        movd    ebx, xmm1
        movsxd  r14, ebx
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebx, xmm2
        movsxd  r11, ebx
        pshufd  xmm2, xmm1, 78          # xmm2 = xmm1[2,3,0,1]
        movd    ebx, xmm2
        movsxd  r10, ebx
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebx, xmm1
        movsxd  r9, ebx
        movdqu  xmm1, xmmword ptr [rdi + rax]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 16], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 32]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 32], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 48], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 64]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 64], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 80], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 96]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 96], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 112], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 128]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 128], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 144], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 160]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 160], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 176], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 192]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebx, xmm3
        movsxd  rbx, ebx
        imul    rbx, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebp, xmm1
        movsxd  rbp, ebp
        imul    rbp, r9
        movq    xmm1, rbp
        movq    xmm3, rbx
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 192], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 208], xmm3
        movdqu  xmm1, xmmword ptr [rdi + rax + 224]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r14
        pshufd  xmm2, xmm1, 229         # xmm2 = xmm1[1,1,2,3]
        movd    ebp, xmm2
        movsxd  rbp, ebp
        imul    rbp, r11
        movq    xmm2, rbx
        movq    xmm3, rbp
        punpcklqdq xmm2, xmm3           # xmm2 = xmm2[0],xmm3[0]
        paddq   xmm2, xmm0
        pshufd  xmm3, xmm1, 78          # xmm3 = xmm1[2,3,0,1]
        movd    ebp, xmm3
        movsxd  rbp, ebp
        imul    rbp, r10
        pshufd  xmm1, xmm1, 231         # xmm1 = xmm1[3,1,2,3]
        movd    ebx, xmm1
        movsxd  rbx, ebx
        imul    rbx, r9
        movq    xmm1, rbx
        movq    xmm3, rbp
        punpcklqdq xmm3, xmm1           # xmm3 = xmm3[0],xmm1[0]
        paddq   xmm3, xmm0
        paddq   xmm2, xmm2
        movdqu  xmmword ptr [rcx + rax + 224], xmm2
        paddq   xmm3, xmm3
        movdqu  xmmword ptr [rcx + rax + 240], xmm3
        add     r8, 16
        add     rax, 256
        cmp     r8, 1008
        jb      .LBB12_1