Source code
;procedure MulMatrixCR(A, B: TMatrix4; var R: TMatrix4);
;asm
;  .NOFRAME

    //-------------------------------------------
    // Load matrix A by rows
    //-------------------------------------------
    movups   xmm0, xmmword ptr [A]        // A[00..03]
    movups   xmm1, xmmword ptr [A + 16]   // A[04..07]
    movups   xmm2, xmmword ptr [A + 32]   // A[08..11]
    movups   xmm3, xmmword ptr [A + 48]   // A[12..15]

    //-------------------------------------------
    // Load matrix B by rows
    //-------------------------------------------
    movups   xmm4, xmmword ptr [B]        // B [03] [02] [01] [00]
    movups   xmm5, xmmword ptr [B + 16]   // B [07] [06] [05] [04]
    movups   xmm6, xmmword ptr [B + 32]   // B [11] [10] [09] [08]
    movups   xmm7, xmmword ptr [B + 48]   // B [15] [14] [13] [12]

    //-------------------------------------------
    // Transpose matrix B
    //-------------------------------------------
    // Original layout        Transposed layout
    // [03] [02] [01] [00] -> [12] [08] [04] [00]
    // [07] [06] [05] [04] -> [13] [09] [05] [01]
    // [11] [10] [09] [08] -> [14] [10] [06] [02]
    // [15] [14] [13] [12] -> [15] [11] [07] [03]

    // Interleave the low and high halves of adjacent rows
    movaps   xmm8, xmm6
    unpcklps xmm6, xmm7                   // [13] [09] [12] [08]
    unpckhps xmm8, xmm7                   // [15] [11] [14] [10]

    movaps   xmm7, xmm4
    unpcklps xmm4, xmm5                   // [05] [01] [04] [00]
    unpckhps xmm7, xmm5                   // [07] [03] [06] [02]

    movaps   xmm5, xmm4
    unpcklpd xmm4, xmm6                   // [12] [08] [04] [00]
    unpckhpd xmm5, xmm6                   // [13] [09] [05] [01]

    movaps   xmm6, xmm7
    unpcklpd xmm6, xmm8                   // [14] [10] [06] [02]
    unpckhpd xmm7, xmm8                   // [15] [11] [07] [03]

    //------------------------
    // Multiply row [0]
    //------------------------
    // Duplicate the first row of A
    movaps   xmm9,  xmm0
    movaps   xmm10, xmm0
    movaps   xmm11, xmm0

    // Multiply the first row of A by columns 0..3 of B
    mulps    xmm0,  xmm4
    mulps    xmm9,  xmm5
    mulps    xmm10, xmm6
    mulps    xmm11, xmm7

    // Sum each product vector horizontally (every lane ends up holding the sum)
    haddps   xmm0,  xmm0
    haddps   xmm0,  xmm0
    haddps   xmm9,  xmm9
    haddps   xmm9,  xmm9
    haddps   xmm10, xmm10
    haddps   xmm10, xmm10
    haddps   xmm11, xmm11
    haddps   xmm11, xmm11

    // Gather the four dot products into result row 0
    movlhps  xmm0, xmm10                  // lanes 2..3 <- column-2 sum
    insertps xmm0, xmm9,  01010000b       // lane 1 <- column-1 sum
    insertps xmm0, xmm11, 11110000b       // lane 3 <- column-3 sum

    //------------------------
    // Multiply row [1]
    //------------------------
    // Duplicate the second row of A
    movaps   xmm9,  xmm1
    movaps   xmm10, xmm1
    movaps   xmm11, xmm1

    // Multiply the second row of A by columns 0..3 of B
    mulps    xmm1,  xmm4
    mulps    xmm9,  xmm5
    mulps    xmm10, xmm6
    mulps    xmm11, xmm7

    // Sum each product vector horizontally
    haddps   xmm1,  xmm1
    haddps   xmm1,  xmm1
    haddps   xmm9,  xmm9
    haddps   xmm9,  xmm9
    haddps   xmm10, xmm10
    haddps   xmm10, xmm10
    haddps   xmm11, xmm11
    haddps   xmm11, xmm11

    // Gather the four dot products into result row 1
    movlhps  xmm1, xmm10
    insertps xmm1, xmm9,  01010000b
    insertps xmm1, xmm11, 11110000b

    //------------------------
    // Multiply row [2]
    //------------------------
    // Duplicate the third row of A
    movaps   xmm9,  xmm2
    movaps   xmm10, xmm2
    movaps   xmm11, xmm2

    // Multiply the third row of A by columns 0..3 of B
    mulps    xmm2,  xmm4
    mulps    xmm9,  xmm5
    mulps    xmm10, xmm6
    mulps    xmm11, xmm7

    // Sum each product vector horizontally
    haddps   xmm2,  xmm2
    haddps   xmm2,  xmm2
    haddps   xmm9,  xmm9
    haddps   xmm9,  xmm9
    haddps   xmm10, xmm10
    haddps   xmm10, xmm10
    haddps   xmm11, xmm11
    haddps   xmm11, xmm11

    // Gather the four dot products into result row 2
    movlhps  xmm2, xmm10
    insertps xmm2, xmm9,  01010000b
    insertps xmm2, xmm11, 11110000b

    //------------------------
    // Multiply row [3]
    //------------------------
    // Duplicate the fourth row of A
    movaps   xmm9,  xmm3
    movaps   xmm10, xmm3
    movaps   xmm11, xmm3

    // Multiply the fourth row of A by columns 0..3 of B
    mulps    xmm3,  xmm4
    mulps    xmm9,  xmm5
    mulps    xmm10, xmm6
    mulps    xmm11, xmm7

    // Sum each product vector horizontally
    haddps   xmm3,  xmm3
    haddps   xmm3,  xmm3
    haddps   xmm9,  xmm9
    haddps   xmm9,  xmm9
    haddps   xmm10, xmm10
    haddps   xmm10, xmm10
    haddps   xmm11, xmm11
    haddps   xmm11, xmm11

    // Gather the four dot products into result row 3
    movlhps  xmm3, xmm10
    insertps xmm3, xmm9,  01010000b
    insertps xmm3, xmm11, 11110000b

    // Store the result
    movups   xmmword ptr [R],      xmm0
    movups   xmmword ptr [R + 16], xmm1
    movups   xmmword ptr [R + 32], xmm2
    movups   xmmword ptr [R + 48], xmm3
;end;
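For cross-checking the SSE routine above, a scalar reference can be handy. The sketch below is in C rather than Delphi and is not part of the original listing. It assumes that `TMatrix4` is 16 single-precision floats stored row-major, which is what the offsets 0, 16, 32, 48 and the row/column comments suggest. The names `Matrix4` and `mul_matrix_ref` are hypothetical stand-ins. It computes each element of R as the dot product of a row of A and a column of B, which is the same quantity the `mulps`/`haddps` sequence produces.

```c
/* Hypothetical stand-in for TMatrix4: 16 floats, row-major (assumed layout). */
typedef struct {
    float m[16];
} Matrix4;

/* Scalar reference: R[i][j] = sum over k of A[i][k] * B[k][j]. */
static void mul_matrix_ref(const Matrix4 *a, const Matrix4 *b, Matrix4 *r)
{
    Matrix4 tmp; /* temporary, so r may alias a or b */

    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                /* row i of A times column j of B */
                sum += a->m[i * 4 + k] * b->m[k * 4 + j];
            }
            tmp.m[i * 4 + j] = sum;
        }
    }
    *r = tmp;
}
```

When comparing against the assembly, allow a small tolerance on general inputs: `haddps` adds the four products pairwise, while this loop adds them left to right, so rounding can differ in the last bits.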