Thanks for using Compiler Explorer
fxor:                          ; long double rdi[2]
    mov    ecx, [rdi+0 + 8]    ; exponent and sign, padded with zeros, not high garbage (custom ABI)
    sub    ecx, [rdi+16 + 8]
    movd   xmm7, ecx
    ;pabsd xmm7, xmm7          ; but we'd also need to swap which ldouble to shift vs. which to xor with
    ;jcc some pointer adjustment, or LEA rax, [rdi+16] earlier and jb over an xchg here?
    movq   xmm0, [rdi+16]      ; zero-extending load of just the mantissa, including the explicit leading 1 bit
    psrlq  xmm0, xmm7          ; x2 mantissa >> (e1-e2), saturating to 0 on counts >= 64
    xorps  xmm0, [rdi+0]
        ; FIXME: leaves the mantissa potentially unnormalized,
        ; which the hardware won't renormalize for us (tested on Skylake).
    ret
    ; fixable with lzcnt / shl mantissa / subtract the lzcnt from the exponent.
    ; full cancellation of the mantissa needs to change the exponent to produce 0.0
    ; maybe easier to manually saturate the shift count in integer regs
    ;   so we can branch on flags set by the mantissa xor
    ; I think non-normals only happen with equal exponents,
    ; where we didn't shift out any (would-be significant) bits.
; AArch64 has saturating integer shifts, but it's harder to justify operating on 80-bit floats on AArch64

align 16
global _start
_start:
    sub   rsp, 32
;    vmovaps ymm0, [args]
;    vmovaps [rsp], ymm0
    ; take a pointer instead of stack values, because [rsp] addressing modes need a SIB byte
    lea   rdi, [rel args]
    fld   tword [rdi]
    fld   tword [rdi+16]
    call  fxor
    vmovaps [rsp], xmm0
    fldz
    fld   tword [rsp]      ; hardware does *not* renormalize for us on an 80-bit float with the high bit of the mantissa cleared
    faddp
    times 4 fadd st0

    xor   edi, edi
    mov   eax, 231         ; Linux __NR_exit_group
    syscall

section .rodata
default rel
align 16
args:
    dt 3.141592653589793
    times 3 dw 0           ; align 16 with 0s, not NOPs
    dt 2.141592653589793
    times 3 dw 0           ; zero padding so fxor's 32-bit exponent load sees zeros, not garbage (custom ABI)

;result: from info reg $st0 in GDB
;   st0  <invalid float value> (raw 0x40004000000000000000)     ; after fld
;   st0  -nan(0xc000000000000000) (raw 0xffffc000000000000000)  ; after faddp with 0.0
; hardware doesn't like it either (Skylake CPU)

%if 0
;fxor:
;    vpminuq xmm2, xmm1, xmm0   ; AVX-512F+VL
;    vpmaxuq xmm3, xmm1, xmm0   ; the larger magnitude one. Keep it, right shift the other's mantissa
                                ; nope, doesn't XOR the leading 1 in the mantissa
    vmaxpd                      ; AVX1 / SSE2, IDK what I was thinking with AVX-512 integer
    vminpd
;    vpsubq xmm2, xmm1, xmm0
    vpsubd xmm2, xmm1, xmm0
    psrlq  xmm2, 52
    psllq  xmm1, 12
    psrlq  xmm1, 12             ; clear the sign/exponent field of
%endif
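The fxor routine can be modelled in portable C to check its bit arithmetic without an x87. This is a minimal sketch, not the author's code: the f80 struct and fxor_model are hypothetical names, and it assumes the zero-padded layout the asm comments describe (a 64-bit mantissa with an explicit leading 1, followed by a 16-bit sign/exponent word that is read as a zero-extended 32-bit value). Like the asm, it keeps x1's sign and exponent and does not renormalize.

```c
#include <stdint.h>

/* Hypothetical portable model of an x87 80-bit long double as laid out in
   memory: 64-bit significand (bit 63 is the explicit leading 1) followed by
   a 16-bit word holding the sign (bit 15) and biased exponent (bits 0..14). */
typedef struct {
    uint64_t mant;
    uint16_t se;
} f80;

/* Mirrors the asm fxor as written:
   - count = e1 - e2 is computed in 32 bits (like sub ecx, [...]), with the
     sign bit included in the subtracted word, as in the asm; a negative
     difference wraps to a huge unsigned count.
   - psrlq yields 0 for any count >= 64, so x2 then contributes nothing.
   - x2's shifted mantissa is XORed into x1's mantissa, and x1's
     sign/exponent word is kept unchanged.
   No renormalization is done, matching the FIXME in the asm. */
f80 fxor_model(f80 x1, f80 x2) {
    uint32_t count = (uint32_t)x1.se - (uint32_t)x2.se;
    uint64_t shifted = (count >= 64) ? 0 : (x2.mant >> count);
    f80 r = { x1.mant ^ shifted, x1.se };
    return r;
}
```

With the illustrative inputs {0xC90FDAA22168C235, 0x4000} and {0x890FDAA22168C235, 0x4000}, where the second mantissa is the first minus 2^62 (the same relationship as 3.14159... and 2.14159..., which share an exponent), the model returns mantissa 0x4000000000000000 with exponent word 0x4000. That is the raw 0x40004000000000000000 GDB reported as an invalid value, because bit 63 of the mantissa is clear.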
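The FIXME comments suggest repairing the unnormalized result with lzcnt and a left shift. A sketch of that renormalization step in C, under the same assumed memory layout, follows. The names f80, clz64 and f80_renormalize are hypothetical, and clz64 is a portable loop standing in for lzcnt. It shifts the mantissa left until bit 63 is set, lowers the exponent by the same amount, produces a signed 0.0 on full cancellation, and falls back to a denormal (exponent field 0) if the exponent would drop below 1.

```c
#include <stdint.h>

typedef struct {
    uint64_t mant;   /* significand; bit 63 should be the explicit leading 1 */
    uint16_t se;     /* bit 15: sign, bits 0..14: biased exponent */
} f80;

/* Portable stand-in for lzcnt; the caller guarantees v != 0. */
static unsigned clz64(uint64_t v) {
    unsigned n = 0;
    while (!(v & 0x8000000000000000ULL)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Renormalize a possibly unnormal 80-bit value (e.g. the output of fxor). */
f80 f80_renormalize(f80 x) {
    uint16_t sign = x.se & 0x8000u;
    uint16_t exp  = x.se & 0x7FFFu;
    f80 r;

    /* This sketch leaves exponent-0 (denormal) and 0x7FFF (inf/NaN) encodings alone. */
    if (exp == 0 || exp == 0x7FFF)
        return x;

    /* Full cancellation: the exponent must also be cleared to encode +/-0.0. */
    if (x.mant == 0) {
        r.mant = 0;
        r.se = sign;
        return r;
    }

    unsigned n = clz64(x.mant);
    if (n < exp) {
        /* Still normal: move the leading 1 up to bit 63 and lower the exponent to match. */
        r.mant = x.mant << n;
        r.se = (uint16_t)(sign | (exp - n));
    } else {
        /* Exponent would drop below 1: shift only as far as exponent 1 allows,
           then encode as a denormal (exponent field 0 has the same scale as exponent 1). */
        r.mant = x.mant << (exp - 1);
        r.se = sign;
    }
    return r;
}
```

Applied to the raw result 0x40004000000000000000, this yields 0x3FFF8000000000000000, which is 1.0: the unnormal value 0.5 * 2^1 rewritten with an explicit leading 1. An asm version would use lzcnt (or bsr) on the mantissa and shl by that count instead of the loop.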