Source code
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

typedef struct dbcF_params_f {
    ptrdiff_t num_elements;
    const float *src_real, *src_imag;
    ptrdiff_t src_real_stride, src_imag_stride;
    float *dst_real, *dst_imag;
    ptrdiff_t dst_real_stride, dst_imag_stride;
    float scale;
} dbcf_params_f;

extern int dbc_fft_fc(
    ptrdiff_t num_elements,
    const float *src_real, const float *src_imag,
    float *dst_real, float *dst_imag,
    float scale);
extern int dbc_ifft_fc(
    ptrdiff_t num_elements,
    const float *src_real, const float *src_imag,
    float *dst_real, float *dst_imag,
    float scale);
extern int dbc_fft_fi(
    ptrdiff_t num_elements,
    const float *src, float *dst,
    float scale);
extern int dbc_ifft_fi(
    ptrdiff_t num_elements,
    const float *src, float *dst,
    float scale);
extern int dbc_fft_fs(
    ptrdiff_t num_elements,
    const float *src_real, const float *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    float *dst_real, float *dst_imag,
    ptrdiff_t dst_real_stride, ptrdiff_t dst_imag_stride,
    float scale);
extern int dbc_ifft_fs(
    ptrdiff_t num_elements,
    const float *src_real, const float *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    float *dst_real, float *dst_imag,
    ptrdiff_t dst_real_stride, ptrdiff_t dst_imag_stride,
    float scale);
extern int dbc_fft_fp(dbcf_params_f *params);
extern int dbc_ifft_fp(dbcf_params_f *params);

typedef struct dbcF_params_d {
    ptrdiff_t num_elements;
    const double *src_real, *src_imag;
    ptrdiff_t src_real_stride, src_imag_stride;
    double *dst_real, *dst_imag;
    ptrdiff_t dst_real_stride, dst_imag_stride;
    double scale;
} dbcf_params_d;

extern int dbc_fft_dc(
    ptrdiff_t num_elements,
    const double *src_real, const double *src_imag,
    double *dst_real, double *dst_imag,
    double scale);
extern int dbc_ifft_dc(
    ptrdiff_t num_elements,
    const double *src_real, const double *src_imag,
    double *dst_real, double *dst_imag,
    double scale);
extern int dbc_fft_di(
    ptrdiff_t num_elements,
    const double *src, double *dst,
    double scale);
extern int dbc_ifft_di(
    ptrdiff_t num_elements,
    const double *src, double *dst,
    double scale);
extern int dbc_fft_ds(
    ptrdiff_t num_elements,
    const double *src_real, const double *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    double *dst_real, double *dst_imag,
    ptrdiff_t dst_real_stride, ptrdiff_t dst_imag_stride,
    double scale);
extern int dbc_ifft_ds(
    ptrdiff_t num_elements,
    const double *src_real, const double *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    double *dst_real, double *dst_imag,
    ptrdiff_t dst_real_stride, ptrdiff_t dst_imag_stride,
    double scale);
extern int dbc_fft_dp(dbcf_params_d *params);
extern int dbc_ifft_dp(dbcf_params_d *params);

typedef struct dbcF_params_l {
    ptrdiff_t num_elements;
    const long double *src_real, *src_imag;
    ptrdiff_t src_real_stride, src_imag_stride;
    long double *dst_real, *dst_imag;
    ptrdiff_t dst_real_stride, dst_imag_stride;
    long double scale;
} dbcf_params_l;

extern int dbc_fft_lc(
    ptrdiff_t num_elements,
    const long double *src_real, const long double *src_imag,
    long double *dst_real, long double *dst_imag,
    long double scale);
extern int dbc_ifft_lc(
    ptrdiff_t num_elements,
    const long double *src_real, const long double *src_imag,
    long double *dst_real, long double *dst_imag,
    long double scale);
extern int dbc_fft_li(
    ptrdiff_t num_elements,
    const long double *src, long double *dst,
    long double scale);
extern int dbc_ifft_li(
    ptrdiff_t num_elements,
    const long double *src, long double *dst,
    long double scale);
extern int dbc_fft_ls(
    ptrdiff_t num_elements,
    const long double *src_real, const long double *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    long double *dst_real, long double *dst_imag,
    ptrdiff_t dst_real_stride, ptrdiff_t dst_imag_stride,
    long double scale);
extern int dbc_ifft_ls(
    ptrdiff_t num_elements,
    const long double *src_real, const long double *src_imag,
    ptrdiff_t src_real_stride, ptrdiff_t src_imag_stride,
    long double *dst_real, long double *dst_imag,
    ptrdiff_t
dst_real_stride,ptrdiff_t dst_imag_stride, long double scale); extern int dbc_fft_lp(dbcf_params_l *params); extern int dbc_ifft_lp(dbcf_params_l *params); typedef int dbcF_static_assert_tmp_buf_size[(10)>=2 ?1:-1]; typedef int dbcF_static_assert_Q [((((10)>>1)<6?((10)>>1):6))>=1&&(2*((((10)>>1)<6?((10)>>1):6))<=(10))?1:-1]; static unsigned char dbcF_bitreverse_table[512]= { 0,0,0,0 +1,0,0 +2*1,0 +1,0 +1 +2*1,0,0 +2*2*1,0 +2*1,0 +2*1 +2*2*1,0 +1,0 +1 +2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*1,0,0 +2*2*2*1,0 +2*2*1,0 +2*2*1 +2*2*2*1,0 +2*1,0 +2*1 +2*2*2*1,0 +2*1 +2*2*1,0 +2*1 +2*2*1 +2*2*2*1,0 +1,0 +1 +2*2*2*1,0 +1 +2*2*1,0 +1 +2*2*1 +2*2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*2*1,0 +1 +2*1 +2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1,0,0 +2*2*2*2*1,0 +2*2*2*1,0 +2*2*2*1 +2*2*2*2*1,0 +2*2*1,0 +2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1,0 +2*1 +2*2*2*2*1,0 +2*1 +2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1,0 +1 +2*2*2*2*1,0 +1 +2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*1,0 +1 +2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0,0 +2*2*2*2*2*1,0 +2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1,0 +2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1,0 +2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 
+2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1,0 +1 +2*2*2*2*2*1,0 +1 +2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0,0 +2*2*2*2*2*2*1,0 +2*2*2*2*2*1,0 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1,0 +2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1,0 +2*1 +2*2*2*2*2*2*1,0 +2*1 
+2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1,0 +1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*1,0 +1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1,0 +1 +2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 
+2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0,0 +2*2*2*2*2*2*2*1,0 +2*2*2*2*2*2*1,0 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*2*1,0 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1 
+2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1,0 +2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1,0 +2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 
+2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 
+2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1,0 +1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*1,0 +1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1,0 +1 +2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 
+2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1,0 +1 +2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 
+2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 
+2*2*2*2*2*2*1,0 +1 +2*1 +2*2*1 +2*2*2*1 +2*2*2*2*1 +2*2*2*2*2*1 +2*2*2*2*2*2*1 +2*2*2*2*2*2*2*1 };
static ptrdiff_t dbcF_bitreverse(ptrdiff_t i,ptrdiff_t bits) { if(bits<=8) return (dbcF_bitreverse_table+(((ptrdiff_t)1)<<(bits)))[i]; return ((ptrdiff_t)((dbcF_bitreverse_table+256)[i&255])<<(bits-8))^dbcF_bitreverse(i>>8,bits-8); }
static int dbcF_has_cpuid(void) { return 1; }
static void dbcF_cpuid(int level,int sublevel,unsigned *eax,unsigned *ebx,unsigned *ecx,unsigned *edx) { unsigned a,b,c,d; __asm__ __volatile__( "cpuid\n\t" :"=a"(a),"=b"(b),"=c"(c),"=d"(d) :"0"(level),"2"(sublevel)); *eax=a; *ebx=b; *ecx=c; *edx=d; }
static unsigned dbcF_xgetbv(unsigned level) { unsigned ret=0; unsigned eax,edx; __asm__ __volatile__("xgetbv\n\t":"=a"(eax),"=d"(edx):"c"(level)); ret=eax; return ret; }
static int dbcF_detect_simd(void)
{
 int ret=0;
 ret|=1 |2;
 if(dbcF_has_cpuid())
 {
  unsigned eax,ebx,ecx,edx;
  unsigned maxlevel;
  dbcF_cpuid(0,0,&maxlevel,&ebx,&ecx,&edx);
  if(maxlevel>0)
  {
   dbcF_cpuid(1,0,&eax,&ebx,&ecx,&edx);
   /* CPUID.1:EDX bit 26 reports SSE2. */
   if(edx&0x04000000u) ret|=1 |2;
   /* CPUID.1:ECX bits 27 (OSXSAVE) and 28 (AVX) must both be set; XGETBV faults when the OS has not enabled XSAVE. */
   if((ecx&0x18000000u)==0x18000000u)
   {
    unsigned xcr0=dbcF_xgetbv(0);
    /* XCR0 bits 1 and 2: the OS saves XMM and YMM state. */
    if((xcr0&0x6)==0x6)
    {
     ret|=4 |8;
     if(maxlevel>=7)
     {
      dbcF_cpuid(7,0,&eax,&ebx,&ecx,&edx);
      /* XCR0 bits 5 to 7: the OS saves opmask and ZMM state; CPUID.7:EBX bit 16 reports AVX-512F. */
      if((xcr0&0xE6)==0xE6) { if(ebx&0x00010000u) ret|=16|32; }
     }
    }
   }
  }
 }
 return ret;
}
static int dbcf_detect_simd(void) { static int cached= -1; if(cached==-1) cached= dbcF_detect_simd(); return cached; }
typedef float dbcf_simd4f __attribute__((vector_size(16)));
typedef double dbcf_simd2d __attribute__((vector_size(16)));
typedef float dbcf_simd8f __attribute__((vector_size(32)));
typedef double dbcf_simd4d __attribute__((vector_size(32)));
typedef float dbcf_simd16f __attribute__((vector_size(64)));
typedef double dbcf_simd8d __attribute__((vector_size(64)));
static dbcf_simd4f dbcF_load4f(const void *p) {dbcf_simd4f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;}
static void dbcF_store4f(dbcf_simd4f v,void *p)
{__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} static dbcf_simd4f dbcF_load4f_aligned(const void *p) {dbcf_simd4f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,16),sizeof(ret));return ret;} static void dbcF_store4f_aligned(dbcf_simd4f v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,16),&v,sizeof(v));} static dbcf_simd4f dbcF_set4f(float v0,float v1,float v2,float v3) {return (__extension__ (dbcf_simd4f){v0,v1,v2,v3});} static dbcf_simd4f dbcF_fill4f(float v) {return dbcF_set4f(v,v,v,v);} static dbcf_simd4f dbcF_add4f(dbcf_simd4f l,dbcf_simd4f r) {return l+r;} static dbcf_simd4f dbcF_sub4f(dbcf_simd4f l,dbcf_simd4f r) {return l-r;} static dbcf_simd4f dbcF_mul4f(dbcf_simd4f l,dbcf_simd4f r) {return l*r;} static dbcf_simd2d dbcF_load2d(const void *p) {dbcf_simd2d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;} static void dbcF_store2d(dbcf_simd2d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} static dbcf_simd2d dbcF_load2d_aligned(const void *p) {dbcf_simd2d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,16),sizeof(ret));return ret;} static void dbcF_store2d_aligned(dbcf_simd2d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,16),&v,sizeof(v));} static dbcf_simd2d dbcF_set2d(double v0,double v1) {return (__extension__ (dbcf_simd2d){v0,v1});} static dbcf_simd2d dbcF_fill2d(double v) {return dbcF_set2d(v,v);} static dbcf_simd2d dbcF_add2d(dbcf_simd2d l,dbcf_simd2d r) {return l+r;} static dbcf_simd2d dbcF_sub2d(dbcf_simd2d l,dbcf_simd2d r) {return l-r;} static dbcf_simd2d dbcF_mul2d(dbcf_simd2d l,dbcf_simd2d r) {return l*r;} __attribute__((target("avx"))) static dbcf_simd8f dbcF_load8f(const void *p) {dbcf_simd8f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;} __attribute__((target("avx"))) static void dbcF_store8f(dbcf_simd8f v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} __attribute__((target("avx"))) 
static dbcf_simd8f dbcF_load8f_aligned(const void *p) {dbcf_simd8f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,32),sizeof(ret));return ret;} __attribute__((target("avx"))) static void dbcF_store8f_aligned(dbcf_simd8f v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,32),&v,sizeof(v));} __attribute__((target("avx"))) static dbcf_simd8f dbcF_set8f(float v0,float v1,float v2,float v3,float v4,float v5,float v6,float v7) {return (__extension__ (dbcf_simd8f){v0,v1,v2,v3,v4,v5,v6,v7});} __attribute__((target("avx"))) static dbcf_simd8f dbcF_fill8f(float v) {return dbcF_set8f(v,v,v,v,v,v,v,v);} __attribute__((target("avx"))) static dbcf_simd8f dbcF_add8f(dbcf_simd8f l,dbcf_simd8f r) {return l+r;} __attribute__((target("avx"))) static dbcf_simd8f dbcF_sub8f(dbcf_simd8f l,dbcf_simd8f r) {return l-r;} __attribute__((target("avx"))) static dbcf_simd8f dbcF_mul8f(dbcf_simd8f l,dbcf_simd8f r) {return l*r;} __attribute__((target("avx"))) static dbcf_simd4d dbcF_load4d(const void *p) {dbcf_simd4d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;} __attribute__((target("avx"))) static void dbcF_store4d(dbcf_simd4d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} __attribute__((target("avx"))) static dbcf_simd4d dbcF_load4d_aligned(const void *p) {dbcf_simd4d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,32),sizeof(ret));return ret;} __attribute__((target("avx"))) static void dbcF_store4d_aligned(dbcf_simd4d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,32),&v,sizeof(v));} __attribute__((target("avx"))) static dbcf_simd4d dbcF_set4d(double v0,double v1,double v2,double v3) {return (__extension__ (dbcf_simd4d){v0,v1,v2,v3});} __attribute__((target("avx"))) static dbcf_simd4d dbcF_fill4d(double v) {return dbcF_set4d(v,v,v,v);} __attribute__((target("avx"))) static dbcf_simd4d dbcF_add4d(dbcf_simd4d l,dbcf_simd4d r) {return l+r;} __attribute__((target("avx"))) static dbcf_simd4d 
dbcF_sub4d(dbcf_simd4d l,dbcf_simd4d r) {return l-r;} __attribute__((target("avx"))) static dbcf_simd4d dbcF_mul4d(dbcf_simd4d l,dbcf_simd4d r) {return l*r;} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_load16f(const void *p) {dbcf_simd16f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;} __attribute__((target("avx512f"))) static void dbcF_store16f(dbcf_simd16f v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_load16f_aligned(const void *p) {dbcf_simd16f ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,64),sizeof(ret));return ret;} __attribute__((target("avx512f"))) static void dbcF_store16f_aligned(dbcf_simd16f v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,64),&v,sizeof(v));} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_set16f(float v0,float v1,float v2,float v3,float v4,float v5,float v6,float v7,float v8,float v9,float v10,float v11,float v12,float v13,float v14,float v15) {return (__extension__ (dbcf_simd16f){v0,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15});} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_fill16f(float v) {return dbcF_set16f(v,v,v,v,v,v,v,v,v,v,v,v,v,v,v,v);} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_add16f(dbcf_simd16f l,dbcf_simd16f r) {return l+r;} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_sub16f(dbcf_simd16f l,dbcf_simd16f r) {return l-r;} __attribute__((target("avx512f"))) static dbcf_simd16f dbcF_mul16f(dbcf_simd16f l,dbcf_simd16f r) {return l*r;} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_load8d(const void *p) {dbcf_simd8d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,1),sizeof(ret));return ret;} __attribute__((target("avx512f"))) static void dbcF_store8d(dbcf_simd8d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,1),&v,sizeof(v));} __attribute__((target("avx512f"))) static dbcf_simd8d 
dbcF_load8d_aligned(const void *p) {dbcf_simd8d ret;__builtin_memcpy(&ret,__builtin_assume_aligned(p,64),sizeof(ret));return ret;} __attribute__((target("avx512f"))) static void dbcF_store8d_aligned(dbcf_simd8d v,void *p) {__builtin_memcpy(__builtin_assume_aligned(p,64),&v,sizeof(v));} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_set8d(double v0,double v1,double v2,double v3,double v4,double v5,double v6,double v7) {return (__extension__ (dbcf_simd8d){v0,v1,v2,v3,v4,v5,v6,v7});} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_fill8d(double v) {return dbcF_set8d(v,v,v,v,v,v,v,v);} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_add8d(dbcf_simd8d l,dbcf_simd8d r) {return l+r;} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_sub8d(dbcf_simd8d l,dbcf_simd8d r) {return l-r;} __attribute__((target("avx512f"))) static dbcf_simd8d dbcF_mul8d(dbcf_simd8d l,dbcf_simd8d r) {return l*r;} static void dbcF_cexpm1_f(ptrdiff_t log2n,float *real,float *imag); static void dbcF_cexp_f(ptrdiff_t log2n,float *real,float *imag); static void dbcF_fft8_f(float *real,float *imag,ptrdiff_t real_stride,ptrdiff_t imag_stride,int inverse,float c); static void dbcF_cexpm1_d(ptrdiff_t log2n,double *real,double *imag); static void dbcF_cexp_d(ptrdiff_t log2n,double *real,double *imag); static void dbcF_fft8_d(double *real,double *imag,ptrdiff_t real_stride,ptrdiff_t imag_stride,int inverse,double c); static void dbcF_butterfly_block_4f_uu( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4f CC=dbcF_fill4f(C),SS=dbcF_fill4f(S); for(i=0;i<b;i+=4) { dbcf_simd4f TR=dbcF_load4f(tr+i),TI=dbcF_load4f(ti+i); dbcf_simd4f c=dbcF_sub4f(dbcF_mul4f(CC,TR),dbcF_mul4f(SS,TI)),s=dbcF_add4f(dbcF_mul4f(SS,TR),dbcF_mul4f(CC,TI)); dbcf_simd4f xl=dbcF_load4f(LR+i),yl=dbcF_load4f(LI+i); 
dbcf_simd4f xr=dbcF_load4f(HR+i),yr=dbcF_load4f(HI+i); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+i);dbcF_store4f(dbcF_add4f(yl,y),LI+i); dbcF_store4f(dbcF_sub4f(xl,x),HR+i);dbcF_store4f(dbcF_sub4f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4f_uu(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4f_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_block_4f_au( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4f CC=dbcF_fill4f(C),SS=dbcF_fill4f(S); for(i=0;i<b;i+=4) { dbcf_simd4f TR=dbcF_load4f_aligned(tr+i),TI=dbcF_load4f_aligned(ti+i); dbcf_simd4f c=dbcF_sub4f(dbcF_mul4f(CC,TR),dbcF_mul4f(SS,TI)),s=dbcF_add4f(dbcF_mul4f(SS,TR),dbcF_mul4f(CC,TI)); dbcf_simd4f xl=dbcF_load4f(LR+i),yl=dbcF_load4f(LI+i); dbcf_simd4f xr=dbcF_load4f(HR+i),yr=dbcF_load4f(HI+i); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+i);dbcF_store4f(dbcF_add4f(yl,y),LI+i); dbcF_store4f(dbcF_sub4f(xl,x),HR+i);dbcF_store4f(dbcF_sub4f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4f_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4f_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_block_4f_ua( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4f 
CC=dbcF_fill4f(C),SS=dbcF_fill4f(S); for(i=0;i<b;i+=4) { dbcf_simd4f TR=dbcF_load4f(tr+i),TI=dbcF_load4f(ti+i); dbcf_simd4f c=dbcF_sub4f(dbcF_mul4f(CC,TR),dbcF_mul4f(SS,TI)),s=dbcF_add4f(dbcF_mul4f(SS,TR),dbcF_mul4f(CC,TI)); dbcf_simd4f xl=dbcF_load4f_aligned(LR+i),yl=dbcF_load4f_aligned(LI+i); dbcf_simd4f xr=dbcF_load4f_aligned(HR+i),yr=dbcF_load4f_aligned(HI+i); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+i);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+i); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+i);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4f_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4f_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_block_4f_aa( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4f CC=dbcF_fill4f(C),SS=dbcF_fill4f(S); for(i=0;i<b;i+=4) { dbcf_simd4f TR=dbcF_load4f_aligned(tr+i),TI=dbcF_load4f_aligned(ti+i); dbcf_simd4f c=dbcF_sub4f(dbcF_mul4f(CC,TR),dbcF_mul4f(SS,TI)),s=dbcF_add4f(dbcF_mul4f(SS,TR),dbcF_mul4f(CC,TI)); dbcf_simd4f xl=dbcF_load4f_aligned(LR+i),yl=dbcF_load4f_aligned(LI+i); dbcf_simd4f xr=dbcF_load4f_aligned(HR+i),yr=dbcF_load4f_aligned(HI+i); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+i);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+i); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+i);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4f_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S 
,inverse,tr,ti); dbcF_butterfly_block_4f_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_pass_4f_uu( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4f c=dbcF_load4f(tr+d),s=dbcF_load4f(ti+d); dbcf_simd4f xl=dbcF_load4f(LR+d),yl=dbcF_load4f(LI+d); dbcf_simd4f xr=dbcF_load4f(HR+d),yr=dbcF_load4f(HI+d); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+d);dbcF_store4f(dbcF_add4f(yl,y),LI+d); dbcF_store4f(dbcF_sub4f(xl,x),HR+d);dbcF_store4f(dbcF_sub4f(yl,y),HI+d); d+=4; c=dbcF_load4f(tr+d);s=dbcF_load4f(ti+d); xl=dbcF_load4f(LR+d);yl=dbcF_load4f(LI+d); xr=dbcF_load4f(HR+d);yr=dbcF_load4f(HI+d); x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr));y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+d);dbcF_store4f(dbcF_add4f(yl,y),LI+d); dbcF_store4f(dbcF_sub4f(xl,x),HR+d);dbcF_store4f(dbcF_sub4f(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4f C=dbcF_load4f(tr),S=dbcF_load4f(ti); dbcf_simd4f xl=dbcF_load4f(LR),yl=dbcF_load4f(LI); dbcf_simd4f xr=dbcF_load4f(HR),yr=dbcF_load4f(HI); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(C,xr),dbcF_mul4f(S,yr)),y=dbcF_add4f(dbcF_mul4f(S,xr),dbcF_mul4f(C,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR);dbcF_store4f(dbcF_add4f(yl,y),LI); dbcF_store4f(dbcF_sub4f(xl,x),HR);dbcF_store4f(dbcF_sub4f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4f_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_4f_au( ptrdiff_t log2n, 
ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4f c=dbcF_load4f_aligned(tr+d),s=dbcF_load4f_aligned(ti+d); dbcf_simd4f xl=dbcF_load4f(LR+d),yl=dbcF_load4f(LI+d); dbcf_simd4f xr=dbcF_load4f(HR+d),yr=dbcF_load4f(HI+d); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+d);dbcF_store4f(dbcF_add4f(yl,y),LI+d); dbcF_store4f(dbcF_sub4f(xl,x),HR+d);dbcF_store4f(dbcF_sub4f(yl,y),HI+d); d+=4; c=dbcF_load4f_aligned(tr+d);s=dbcF_load4f_aligned(ti+d); xl=dbcF_load4f(LR+d);yl=dbcF_load4f(LI+d); xr=dbcF_load4f(HR+d);yr=dbcF_load4f(HI+d); x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr));y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR+d);dbcF_store4f(dbcF_add4f(yl,y),LI+d); dbcF_store4f(dbcF_sub4f(xl,x),HR+d);dbcF_store4f(dbcF_sub4f(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4f C=dbcF_load4f_aligned(tr),S=dbcF_load4f_aligned(ti); dbcf_simd4f xl=dbcF_load4f(LR),yl=dbcF_load4f(LI); dbcf_simd4f xr=dbcF_load4f(HR),yr=dbcF_load4f(HI); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(C,xr),dbcF_mul4f(S,yr)),y=dbcF_add4f(dbcF_mul4f(S,xr),dbcF_mul4f(C,yr)); dbcF_store4f(dbcF_add4f(xl,x),LR);dbcF_store4f(dbcF_add4f(yl,y),LI); dbcF_store4f(dbcF_sub4f(xl,x),HR);dbcF_store4f(dbcF_sub4f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4f_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_4f_ua( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t 
n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4f c=dbcF_load4f(tr+d),s=dbcF_load4f(ti+d); dbcf_simd4f xl=dbcF_load4f_aligned(LR+d),yl=dbcF_load4f_aligned(LI+d); dbcf_simd4f xr=dbcF_load4f_aligned(HR+d),yr=dbcF_load4f_aligned(HI+d); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+d);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+d); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+d);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+d); d+=4; c=dbcF_load4f(tr+d);s=dbcF_load4f(ti+d); xl=dbcF_load4f_aligned(LR+d);yl=dbcF_load4f_aligned(LI+d); xr=dbcF_load4f_aligned(HR+d);yr=dbcF_load4f_aligned(HI+d); x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr));y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+d);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+d); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+d);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4f C=dbcF_load4f(tr),S=dbcF_load4f(ti); dbcf_simd4f xl=dbcF_load4f_aligned(LR),yl=dbcF_load4f_aligned(LI); dbcf_simd4f xr=dbcF_load4f_aligned(HR),yr=dbcF_load4f_aligned(HI); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(C,xr),dbcF_mul4f(S,yr)),y=dbcF_add4f(dbcF_mul4f(S,xr),dbcF_mul4f(C,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4f_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_4f_aa( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float 
*ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4f c=dbcF_load4f_aligned(tr+d),s=dbcF_load4f_aligned(ti+d); dbcf_simd4f xl=dbcF_load4f_aligned(LR+d),yl=dbcF_load4f_aligned(LI+d); dbcf_simd4f xr=dbcF_load4f_aligned(HR+d),yr=dbcF_load4f_aligned(HI+d); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr)),y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+d);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+d); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+d);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+d); d+=4; c=dbcF_load4f_aligned(tr+d);s=dbcF_load4f_aligned(ti+d); xl=dbcF_load4f_aligned(LR+d);yl=dbcF_load4f_aligned(LI+d); xr=dbcF_load4f_aligned(HR+d);yr=dbcF_load4f_aligned(HI+d); x=dbcF_sub4f(dbcF_mul4f(c,xr),dbcF_mul4f(s,yr));y=dbcF_add4f(dbcF_mul4f(s,xr),dbcF_mul4f(c,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR+d);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI+d); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR+d);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4f C=dbcF_load4f_aligned(tr),S=dbcF_load4f_aligned(ti); dbcf_simd4f xl=dbcF_load4f_aligned(LR),yl=dbcF_load4f_aligned(LI); dbcf_simd4f xr=dbcF_load4f_aligned(HR),yr=dbcF_load4f_aligned(HI); dbcf_simd4f x=dbcF_sub4f(dbcF_mul4f(C,xr),dbcF_mul4f(S,yr)),y=dbcF_add4f(dbcF_mul4f(S,xr),dbcF_mul4f(C,yr)); dbcF_store4f_aligned(dbcF_add4f(xl,x),LR);dbcF_store4f_aligned(dbcF_add4f(yl,y),LI); dbcF_store4f_aligned(dbcF_sub4f(xl,x),HR);dbcF_store4f_aligned(dbcF_sub4f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4f_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_compute_twiddles_4f_u(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float 
*imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=4) { dbcf_simd4f X=dbcF_fill4f(x); dbcf_simd4f Y=dbcF_fill4f(y); for(j=0;j<k;j+=4) { dbcf_simd4f R=dbcF_load4f(real+j); dbcf_simd4f I=dbcF_load4f(imag+j); dbcF_store4f(dbcF_add4f(dbcF_sub4f(dbcF_mul4f(X,R),dbcF_mul4f(Y,I)),dbcF_add4f(X,R)),real+k+j); dbcF_store4f(dbcF_add4f(dbcF_add4f(dbcF_mul4f(Y,R),dbcF_mul4f(X,I)),dbcF_add4f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } static void dbcF_compute_twiddles_4f_a(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=4) { dbcf_simd4f X=dbcF_fill4f(x); dbcf_simd4f Y=dbcF_fill4f(y); for(j=0;j<k;j+=4) { dbcf_simd4f R=dbcF_load4f_aligned(real+j); dbcf_simd4f I=dbcF_load4f_aligned(imag+j); dbcF_store4f_aligned(dbcF_add4f(dbcF_sub4f(dbcF_mul4f(X,R),dbcF_mul4f(Y,I)),dbcF_add4f(X,R)),real+k+j); dbcF_store4f_aligned(dbcF_add4f(dbcF_add4f(dbcF_mul4f(Y,R),dbcF_mul4f(X,I)),dbcF_add4f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } static void dbcF_fft8_4f(float *real,float *imag,int inverse) {dbcF_fft8_f(real,imag,1,1,inverse,0.70710678118654752438f);} static void dbcF_butterfly_block_2d_uu( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd2d 
CC=dbcF_fill2d(C),SS=dbcF_fill2d(S); for(i=0;i<b;i+=2) { dbcf_simd2d TR=dbcF_load2d(tr+i),TI=dbcF_load2d(ti+i); dbcf_simd2d c=dbcF_sub2d(dbcF_mul2d(CC,TR),dbcF_mul2d(SS,TI)),s=dbcF_add2d(dbcF_mul2d(SS,TR),dbcF_mul2d(CC,TI)); dbcf_simd2d xl=dbcF_load2d(LR+i),yl=dbcF_load2d(LI+i); dbcf_simd2d xr=dbcF_load2d(HR+i),yr=dbcF_load2d(HI+i); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+i);dbcF_store2d(dbcF_add2d(yl,y),LI+i); dbcF_store2d(dbcF_sub2d(xl,x),HR+i);dbcF_store2d(dbcF_sub2d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_2d_uu(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_2d_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_block_2d_au( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd2d CC=dbcF_fill2d(C),SS=dbcF_fill2d(S); for(i=0;i<b;i+=2) { dbcf_simd2d TR=dbcF_load2d_aligned(tr+i),TI=dbcF_load2d_aligned(ti+i); dbcf_simd2d c=dbcF_sub2d(dbcF_mul2d(CC,TR),dbcF_mul2d(SS,TI)),s=dbcF_add2d(dbcF_mul2d(SS,TR),dbcF_mul2d(CC,TI)); dbcf_simd2d xl=dbcF_load2d(LR+i),yl=dbcF_load2d(LI+i); dbcf_simd2d xr=dbcF_load2d(HR+i),yr=dbcF_load2d(HI+i); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+i);dbcF_store2d(dbcF_add2d(yl,y),LI+i); dbcF_store2d(dbcF_sub2d(xl,x),HR+i);dbcF_store2d(dbcF_sub2d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_2d_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_2d_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void 
dbcF_butterfly_block_2d_ua( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd2d CC=dbcF_fill2d(C),SS=dbcF_fill2d(S); for(i=0;i<b;i+=2) { dbcf_simd2d TR=dbcF_load2d(tr+i),TI=dbcF_load2d(ti+i); dbcf_simd2d c=dbcF_sub2d(dbcF_mul2d(CC,TR),dbcF_mul2d(SS,TI)),s=dbcF_add2d(dbcF_mul2d(SS,TR),dbcF_mul2d(CC,TI)); dbcf_simd2d xl=dbcF_load2d_aligned(LR+i),yl=dbcF_load2d_aligned(LI+i); dbcf_simd2d xr=dbcF_load2d_aligned(HR+i),yr=dbcF_load2d_aligned(HI+i); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+i);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+i); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+i);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_2d_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_2d_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_block_2d_aa( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd2d CC=dbcF_fill2d(C),SS=dbcF_fill2d(S); for(i=0;i<b;i+=2) { dbcf_simd2d TR=dbcF_load2d_aligned(tr+i),TI=dbcF_load2d_aligned(ti+i); dbcf_simd2d c=dbcF_sub2d(dbcF_mul2d(CC,TR),dbcF_mul2d(SS,TI)),s=dbcF_add2d(dbcF_mul2d(SS,TR),dbcF_mul2d(CC,TI)); dbcf_simd2d xl=dbcF_load2d_aligned(LR+i),yl=dbcF_load2d_aligned(LI+i); dbcf_simd2d xr=dbcF_load2d_aligned(HR+i),yr=dbcF_load2d_aligned(HI+i); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); 
dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+i);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+i); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+i);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_2d_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_2d_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_pass_2d_uu( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>2) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd2d c=dbcF_load2d(tr+d),s=dbcF_load2d(ti+d); dbcf_simd2d xl=dbcF_load2d(LR+d),yl=dbcF_load2d(LI+d); dbcf_simd2d xr=dbcF_load2d(HR+d),yr=dbcF_load2d(HI+d); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+d);dbcF_store2d(dbcF_add2d(yl,y),LI+d); dbcF_store2d(dbcF_sub2d(xl,x),HR+d);dbcF_store2d(dbcF_sub2d(yl,y),HI+d); d+=2; c=dbcF_load2d(tr+d);s=dbcF_load2d(ti+d); xl=dbcF_load2d(LR+d);yl=dbcF_load2d(LI+d); xr=dbcF_load2d(HR+d);yr=dbcF_load2d(HI+d); x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr));y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+d);dbcF_store2d(dbcF_add2d(yl,y),LI+d); dbcF_store2d(dbcF_sub2d(xl,x),HR+d);dbcF_store2d(dbcF_sub2d(yl,y),HI+d); d+=2; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd2d C=dbcF_load2d(tr),S=dbcF_load2d(ti); dbcf_simd2d xl=dbcF_load2d(LR),yl=dbcF_load2d(LI); dbcf_simd2d xr=dbcF_load2d(HR),yr=dbcF_load2d(HI); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(C,xr),dbcF_mul2d(S,yr)),y=dbcF_add2d(dbcF_mul2d(S,xr),dbcF_mul2d(C,yr)); 
dbcF_store2d(dbcF_add2d(xl,x),LR);dbcF_store2d(dbcF_add2d(yl,y),LI); dbcF_store2d(dbcF_sub2d(xl,x),HR);dbcF_store2d(dbcF_sub2d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_2d_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_2d_au( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>2) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd2d c=dbcF_load2d_aligned(tr+d),s=dbcF_load2d_aligned(ti+d); dbcf_simd2d xl=dbcF_load2d(LR+d),yl=dbcF_load2d(LI+d); dbcf_simd2d xr=dbcF_load2d(HR+d),yr=dbcF_load2d(HI+d); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+d);dbcF_store2d(dbcF_add2d(yl,y),LI+d); dbcF_store2d(dbcF_sub2d(xl,x),HR+d);dbcF_store2d(dbcF_sub2d(yl,y),HI+d); d+=2; c=dbcF_load2d_aligned(tr+d);s=dbcF_load2d_aligned(ti+d); xl=dbcF_load2d(LR+d);yl=dbcF_load2d(LI+d); xr=dbcF_load2d(HR+d);yr=dbcF_load2d(HI+d); x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr));y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR+d);dbcF_store2d(dbcF_add2d(yl,y),LI+d); dbcF_store2d(dbcF_sub2d(xl,x),HR+d);dbcF_store2d(dbcF_sub2d(yl,y),HI+d); d+=2; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd2d C=dbcF_load2d_aligned(tr),S=dbcF_load2d_aligned(ti); dbcf_simd2d xl=dbcF_load2d(LR),yl=dbcF_load2d(LI); dbcf_simd2d xr=dbcF_load2d(HR),yr=dbcF_load2d(HI); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(C,xr),dbcF_mul2d(S,yr)),y=dbcF_add2d(dbcF_mul2d(S,xr),dbcF_mul2d(C,yr)); dbcF_store2d(dbcF_add2d(xl,x),LR);dbcF_store2d(dbcF_add2d(yl,y),LI); 
dbcF_store2d(dbcF_sub2d(xl,x),HR);dbcF_store2d(dbcF_sub2d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_2d_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_2d_ua( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>2) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd2d c=dbcF_load2d(tr+d),s=dbcF_load2d(ti+d); dbcf_simd2d xl=dbcF_load2d_aligned(LR+d),yl=dbcF_load2d_aligned(LI+d); dbcf_simd2d xr=dbcF_load2d_aligned(HR+d),yr=dbcF_load2d_aligned(HI+d); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+d);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+d); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+d);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+d); d+=2; c=dbcF_load2d(tr+d);s=dbcF_load2d(ti+d); xl=dbcF_load2d_aligned(LR+d);yl=dbcF_load2d_aligned(LI+d); xr=dbcF_load2d_aligned(HR+d);yr=dbcF_load2d_aligned(HI+d); x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr));y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+d);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+d); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+d);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+d); d+=2; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd2d C=dbcF_load2d(tr),S=dbcF_load2d(ti); dbcf_simd2d xl=dbcF_load2d_aligned(LR),yl=dbcF_load2d_aligned(LI); dbcf_simd2d xr=dbcF_load2d_aligned(HR),yr=dbcF_load2d_aligned(HI); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(C,xr),dbcF_mul2d(S,yr)),y=dbcF_add2d(dbcF_mul2d(S,xr),dbcF_mul2d(C,yr)); 
dbcF_store2d_aligned(dbcF_add2d(xl,x),LR);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_2d_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_butterfly_pass_2d_aa( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>2) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd2d c=dbcF_load2d_aligned(tr+d),s=dbcF_load2d_aligned(ti+d); dbcf_simd2d xl=dbcF_load2d_aligned(LR+d),yl=dbcF_load2d_aligned(LI+d); dbcf_simd2d xr=dbcF_load2d_aligned(HR+d),yr=dbcF_load2d_aligned(HI+d); dbcf_simd2d x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr)),y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+d);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+d); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+d);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+d); d+=2; c=dbcF_load2d_aligned(tr+d);s=dbcF_load2d_aligned(ti+d); xl=dbcF_load2d_aligned(LR+d);yl=dbcF_load2d_aligned(LI+d); xr=dbcF_load2d_aligned(HR+d);yr=dbcF_load2d_aligned(HI+d); x=dbcF_sub2d(dbcF_mul2d(c,xr),dbcF_mul2d(s,yr));y=dbcF_add2d(dbcF_mul2d(s,xr),dbcF_mul2d(c,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR+d);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI+d); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR+d);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI+d); d+=2; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd2d C=dbcF_load2d_aligned(tr),S=dbcF_load2d_aligned(ti); dbcf_simd2d xl=dbcF_load2d_aligned(LR),yl=dbcF_load2d_aligned(LI); dbcf_simd2d xr=dbcF_load2d_aligned(HR),yr=dbcF_load2d_aligned(HI); dbcf_simd2d 
x=dbcF_sub2d(dbcF_mul2d(C,xr),dbcF_mul2d(S,yr)),y=dbcF_add2d(dbcF_mul2d(S,xr),dbcF_mul2d(C,yr)); dbcF_store2d_aligned(dbcF_add2d(xl,x),LR);dbcF_store2d_aligned(dbcF_add2d(yl,y),LI); dbcF_store2d_aligned(dbcF_sub2d(xl,x),HR);dbcF_store2d_aligned(dbcF_sub2d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_2d_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } static void dbcF_compute_twiddles_2d_u(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=2) { dbcf_simd2d X=dbcF_fill2d(x); dbcf_simd2d Y=dbcF_fill2d(y); for(j=0;j<k;j+=2) { dbcf_simd2d R=dbcF_load2d(real+j); dbcf_simd2d I=dbcF_load2d(imag+j); dbcF_store2d(dbcF_add2d(dbcF_sub2d(dbcF_mul2d(X,R),dbcF_mul2d(Y,I)),dbcF_add2d(X,R)),real+k+j); dbcF_store2d(dbcF_add2d(dbcF_add2d(dbcF_mul2d(Y,R),dbcF_mul2d(X,I)),dbcF_add2d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } static void dbcF_compute_twiddles_2d_a(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=2) { dbcf_simd2d X=dbcF_fill2d(x); dbcf_simd2d Y=dbcF_fill2d(y); for(j=0;j<k;j+=2) { dbcf_simd2d R=dbcF_load2d_aligned(real+j); dbcf_simd2d I=dbcF_load2d_aligned(imag+j); dbcF_store2d_aligned(dbcF_add2d(dbcF_sub2d(dbcF_mul2d(X,R),dbcF_mul2d(Y,I)),dbcF_add2d(X,R)),real+k+j); dbcF_store2d_aligned(dbcF_add2d(dbcF_add2d(dbcF_mul2d(Y,R),dbcF_mul2d(X,I)),dbcF_add2d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); 
imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } static void dbcF_fft8_2d(double *real,double *imag,int inverse) {dbcF_fft8_d(real,imag,1,1,inverse,0.70710678118654752438e0);} __attribute__((target("avx"))) static void dbcF_butterfly_block_8f_uu( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8f CC=dbcF_fill8f(C),SS=dbcF_fill8f(S); for(i=0;i<b;i+=8) { dbcf_simd8f TR=dbcF_load8f(tr+i),TI=dbcF_load8f(ti+i); dbcf_simd8f c=dbcF_sub8f(dbcF_mul8f(CC,TR),dbcF_mul8f(SS,TI)),s=dbcF_add8f(dbcF_mul8f(SS,TR),dbcF_mul8f(CC,TI)); dbcf_simd8f xl=dbcF_load8f(LR+i),yl=dbcF_load8f(LI+i); dbcf_simd8f xr=dbcF_load8f(HR+i),yr=dbcF_load8f(HI+i); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR+i);dbcF_store8f(dbcF_add8f(yl,y),LI+i); dbcF_store8f(dbcF_sub8f(xl,x),HR+i);dbcF_store8f(dbcF_sub8f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8f_uu(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8f_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_8f_au( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8f CC=dbcF_fill8f(C),SS=dbcF_fill8f(S); for(i=0;i<b;i+=8) { dbcf_simd8f TR=dbcF_load8f_aligned(tr+i),TI=dbcF_load8f_aligned(ti+i); dbcf_simd8f c=dbcF_sub8f(dbcF_mul8f(CC,TR),dbcF_mul8f(SS,TI)),s=dbcF_add8f(dbcF_mul8f(SS,TR),dbcF_mul8f(CC,TI)); dbcf_simd8f xl=dbcF_load8f(LR+i),yl=dbcF_load8f(LI+i); 
dbcf_simd8f xr=dbcF_load8f(HR+i),yr=dbcF_load8f(HI+i); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR+i);dbcF_store8f(dbcF_add8f(yl,y),LI+i); dbcF_store8f(dbcF_sub8f(xl,x),HR+i);dbcF_store8f(dbcF_sub8f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8f_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8f_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_8f_ua( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8f CC=dbcF_fill8f(C),SS=dbcF_fill8f(S); for(i=0;i<b;i+=8) { dbcf_simd8f TR=dbcF_load8f(tr+i),TI=dbcF_load8f(ti+i); dbcf_simd8f c=dbcF_sub8f(dbcF_mul8f(CC,TR),dbcF_mul8f(SS,TI)),s=dbcF_add8f(dbcF_mul8f(SS,TR),dbcF_mul8f(CC,TI)); dbcf_simd8f xl=dbcF_load8f_aligned(LR+i),yl=dbcF_load8f_aligned(LI+i); dbcf_simd8f xr=dbcF_load8f_aligned(HR+i),yr=dbcF_load8f_aligned(HI+i); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+i);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+i); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+i);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8f_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8f_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_8f_aa( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t 
b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8f CC=dbcF_fill8f(C),SS=dbcF_fill8f(S); for(i=0;i<b;i+=8) { dbcf_simd8f TR=dbcF_load8f_aligned(tr+i),TI=dbcF_load8f_aligned(ti+i); dbcf_simd8f c=dbcF_sub8f(dbcF_mul8f(CC,TR),dbcF_mul8f(SS,TI)),s=dbcF_add8f(dbcF_mul8f(SS,TR),dbcF_mul8f(CC,TI)); dbcf_simd8f xl=dbcF_load8f_aligned(LR+i),yl=dbcF_load8f_aligned(LI+i); dbcf_simd8f xr=dbcF_load8f_aligned(HR+i),yr=dbcF_load8f_aligned(HI+i); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+i);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+i); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+i);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8f_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8f_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_8f_uu( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8f c=dbcF_load8f(tr+d),s=dbcF_load8f(ti+d); dbcf_simd8f xl=dbcF_load8f(LR+d),yl=dbcF_load8f(LI+d); dbcf_simd8f xr=dbcF_load8f(HR+d),yr=dbcF_load8f(HI+d); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR+d);dbcF_store8f(dbcF_add8f(yl,y),LI+d); dbcF_store8f(dbcF_sub8f(xl,x),HR+d);dbcF_store8f(dbcF_sub8f(yl,y),HI+d); d+=8; c=dbcF_load8f(tr+d);s=dbcF_load8f(ti+d); xl=dbcF_load8f(LR+d);yl=dbcF_load8f(LI+d); xr=dbcF_load8f(HR+d);yr=dbcF_load8f(HI+d); 
x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr));y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR+d);dbcF_store8f(dbcF_add8f(yl,y),LI+d); dbcF_store8f(dbcF_sub8f(xl,x),HR+d);dbcF_store8f(dbcF_sub8f(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8f C=dbcF_load8f(tr),S=dbcF_load8f(ti); dbcf_simd8f xl=dbcF_load8f(LR),yl=dbcF_load8f(LI); dbcf_simd8f xr=dbcF_load8f(HR),yr=dbcF_load8f(HI); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(C,xr),dbcF_mul8f(S,yr)),y=dbcF_add8f(dbcF_mul8f(S,xr),dbcF_mul8f(C,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR);dbcF_store8f(dbcF_add8f(yl,y),LI); dbcF_store8f(dbcF_sub8f(xl,x),HR);dbcF_store8f(dbcF_sub8f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8f_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_8f_au( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8f c=dbcF_load8f_aligned(tr+d),s=dbcF_load8f_aligned(ti+d); dbcf_simd8f xl=dbcF_load8f(LR+d),yl=dbcF_load8f(LI+d); dbcf_simd8f xr=dbcF_load8f(HR+d),yr=dbcF_load8f(HI+d); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR+d);dbcF_store8f(dbcF_add8f(yl,y),LI+d); dbcF_store8f(dbcF_sub8f(xl,x),HR+d);dbcF_store8f(dbcF_sub8f(yl,y),HI+d); d+=8; c=dbcF_load8f_aligned(tr+d);s=dbcF_load8f_aligned(ti+d); xl=dbcF_load8f(LR+d);yl=dbcF_load8f(LI+d); xr=dbcF_load8f(HR+d);yr=dbcF_load8f(HI+d); x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr));y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); 
dbcF_store8f(dbcF_add8f(xl,x),LR+d);dbcF_store8f(dbcF_add8f(yl,y),LI+d); dbcF_store8f(dbcF_sub8f(xl,x),HR+d);dbcF_store8f(dbcF_sub8f(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8f C=dbcF_load8f_aligned(tr),S=dbcF_load8f_aligned(ti); dbcf_simd8f xl=dbcF_load8f(LR),yl=dbcF_load8f(LI); dbcf_simd8f xr=dbcF_load8f(HR),yr=dbcF_load8f(HI); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(C,xr),dbcF_mul8f(S,yr)),y=dbcF_add8f(dbcF_mul8f(S,xr),dbcF_mul8f(C,yr)); dbcF_store8f(dbcF_add8f(xl,x),LR);dbcF_store8f(dbcF_add8f(yl,y),LI); dbcF_store8f(dbcF_sub8f(xl,x),HR);dbcF_store8f(dbcF_sub8f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8f_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_8f_ua( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8f c=dbcF_load8f(tr+d),s=dbcF_load8f(ti+d); dbcf_simd8f xl=dbcF_load8f_aligned(LR+d),yl=dbcF_load8f_aligned(LI+d); dbcf_simd8f xr=dbcF_load8f_aligned(HR+d),yr=dbcF_load8f_aligned(HI+d); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+d);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+d); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+d);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+d); d+=8; c=dbcF_load8f(tr+d);s=dbcF_load8f(ti+d); xl=dbcF_load8f_aligned(LR+d);yl=dbcF_load8f_aligned(LI+d); xr=dbcF_load8f_aligned(HR+d);yr=dbcF_load8f_aligned(HI+d); x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr));y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); 
dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+d);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+d); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+d);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8f C=dbcF_load8f(tr),S=dbcF_load8f(ti); dbcf_simd8f xl=dbcF_load8f_aligned(LR),yl=dbcF_load8f_aligned(LI); dbcf_simd8f xr=dbcF_load8f_aligned(HR),yr=dbcF_load8f_aligned(HI); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(C,xr),dbcF_mul8f(S,yr)),y=dbcF_add8f(dbcF_mul8f(S,xr),dbcF_mul8f(C,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8f_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_8f_aa( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8f c=dbcF_load8f_aligned(tr+d),s=dbcF_load8f_aligned(ti+d); dbcf_simd8f xl=dbcF_load8f_aligned(LR+d),yl=dbcF_load8f_aligned(LI+d); dbcf_simd8f xr=dbcF_load8f_aligned(HR+d),yr=dbcF_load8f_aligned(HI+d); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr)),y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+d);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+d); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+d);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+d); d+=8; c=dbcF_load8f_aligned(tr+d);s=dbcF_load8f_aligned(ti+d); xl=dbcF_load8f_aligned(LR+d);yl=dbcF_load8f_aligned(LI+d); xr=dbcF_load8f_aligned(HR+d);yr=dbcF_load8f_aligned(HI+d); 
x=dbcF_sub8f(dbcF_mul8f(c,xr),dbcF_mul8f(s,yr));y=dbcF_add8f(dbcF_mul8f(s,xr),dbcF_mul8f(c,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR+d);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI+d); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR+d);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8f C=dbcF_load8f_aligned(tr),S=dbcF_load8f_aligned(ti); dbcf_simd8f xl=dbcF_load8f_aligned(LR),yl=dbcF_load8f_aligned(LI); dbcf_simd8f xr=dbcF_load8f_aligned(HR),yr=dbcF_load8f_aligned(HI); dbcf_simd8f x=dbcF_sub8f(dbcF_mul8f(C,xr),dbcF_mul8f(S,yr)),y=dbcF_add8f(dbcF_mul8f(S,xr),dbcF_mul8f(C,yr)); dbcF_store8f_aligned(dbcF_add8f(xl,x),LR);dbcF_store8f_aligned(dbcF_add8f(yl,y),LI); dbcF_store8f_aligned(dbcF_sub8f(xl,x),HR);dbcF_store8f_aligned(dbcF_sub8f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8f_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_compute_twiddles_8f_u(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=8) { dbcf_simd8f X=dbcF_fill8f(x); dbcf_simd8f Y=dbcF_fill8f(y); for(j=0;j<k;j+=8) { dbcf_simd8f R=dbcF_load8f(real+j); dbcf_simd8f I=dbcF_load8f(imag+j); dbcF_store8f(dbcF_add8f(dbcF_sub8f(dbcF_mul8f(X,R),dbcF_mul8f(Y,I)),dbcF_add8f(X,R)),real+k+j); dbcF_store8f(dbcF_add8f(dbcF_add8f(dbcF_mul8f(Y,R),dbcF_mul8f(X,I)),dbcF_add8f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } __attribute__((target("avx"))) static void dbcF_compute_twiddles_8f_a(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; 
imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=8) { dbcf_simd8f X=dbcF_fill8f(x); dbcf_simd8f Y=dbcF_fill8f(y); for(j=0;j<k;j+=8) { dbcf_simd8f R=dbcF_load8f_aligned(real+j); dbcf_simd8f I=dbcF_load8f_aligned(imag+j); dbcF_store8f_aligned(dbcF_add8f(dbcF_sub8f(dbcF_mul8f(X,R),dbcF_mul8f(Y,I)),dbcF_add8f(X,R)),real+k+j); dbcF_store8f_aligned(dbcF_add8f(dbcF_add8f(dbcF_mul8f(Y,R),dbcF_mul8f(X,I)),dbcF_add8f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } __attribute__((target("avx"))) static void dbcF_fft8_8f(float *real,float *imag,int inverse) {dbcF_fft8_f(real,imag,1,1,inverse,0.70710678118654752438f);} __attribute__((target("avx"))) static void dbcF_butterfly_block_4d_uu( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4d CC=dbcF_fill4d(C),SS=dbcF_fill4d(S); for(i=0;i<b;i+=4) { dbcf_simd4d TR=dbcF_load4d(tr+i),TI=dbcF_load4d(ti+i); dbcf_simd4d c=dbcF_sub4d(dbcF_mul4d(CC,TR),dbcF_mul4d(SS,TI)),s=dbcF_add4d(dbcF_mul4d(SS,TR),dbcF_mul4d(CC,TI)); dbcf_simd4d xl=dbcF_load4d(LR+i),yl=dbcF_load4d(LI+i); dbcf_simd4d xr=dbcF_load4d(HR+i),yr=dbcF_load4d(HI+i); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+i);dbcF_store4d(dbcF_add4d(yl,y),LI+i); dbcF_store4d(dbcF_sub4d(xl,x),HR+i);dbcF_store4d(dbcF_sub4d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4d_uu(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); 
dbcF_butterfly_block_4d_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_4d_au( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4d CC=dbcF_fill4d(C),SS=dbcF_fill4d(S); for(i=0;i<b;i+=4) { dbcf_simd4d TR=dbcF_load4d_aligned(tr+i),TI=dbcF_load4d_aligned(ti+i); dbcf_simd4d c=dbcF_sub4d(dbcF_mul4d(CC,TR),dbcF_mul4d(SS,TI)),s=dbcF_add4d(dbcF_mul4d(SS,TR),dbcF_mul4d(CC,TI)); dbcf_simd4d xl=dbcF_load4d(LR+i),yl=dbcF_load4d(LI+i); dbcf_simd4d xr=dbcF_load4d(HR+i),yr=dbcF_load4d(HI+i); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+i);dbcF_store4d(dbcF_add4d(yl,y),LI+i); dbcF_store4d(dbcF_sub4d(xl,x),HR+i);dbcF_store4d(dbcF_sub4d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4d_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4d_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_4d_ua( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4d CC=dbcF_fill4d(C),SS=dbcF_fill4d(S); for(i=0;i<b;i+=4) { dbcf_simd4d TR=dbcF_load4d(tr+i),TI=dbcF_load4d(ti+i); dbcf_simd4d c=dbcF_sub4d(dbcF_mul4d(CC,TR),dbcF_mul4d(SS,TI)),s=dbcF_add4d(dbcF_mul4d(SS,TR),dbcF_mul4d(CC,TI)); dbcf_simd4d xl=dbcF_load4d_aligned(LR+i),yl=dbcF_load4d_aligned(LI+i); dbcf_simd4d xr=dbcF_load4d_aligned(HR+i),yr=dbcF_load4d_aligned(HI+i); dbcf_simd4d 
x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+i);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+i); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+i);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4d_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4d_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_block_4d_aa( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd4d CC=dbcF_fill4d(C),SS=dbcF_fill4d(S); for(i=0;i<b;i+=4) { dbcf_simd4d TR=dbcF_load4d_aligned(tr+i),TI=dbcF_load4d_aligned(ti+i); dbcf_simd4d c=dbcF_sub4d(dbcF_mul4d(CC,TR),dbcF_mul4d(SS,TI)),s=dbcF_add4d(dbcF_mul4d(SS,TR),dbcF_mul4d(CC,TI)); dbcf_simd4d xl=dbcF_load4d_aligned(LR+i),yl=dbcF_load4d_aligned(LI+i); dbcf_simd4d xr=dbcF_load4d_aligned(HR+i),yr=dbcF_load4d_aligned(HI+i); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+i);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+i); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+i);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_4d_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_4d_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_4d_uu( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t 
n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4d c=dbcF_load4d(tr+d),s=dbcF_load4d(ti+d); dbcf_simd4d xl=dbcF_load4d(LR+d),yl=dbcF_load4d(LI+d); dbcf_simd4d xr=dbcF_load4d(HR+d),yr=dbcF_load4d(HI+d); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+d);dbcF_store4d(dbcF_add4d(yl,y),LI+d); dbcF_store4d(dbcF_sub4d(xl,x),HR+d);dbcF_store4d(dbcF_sub4d(yl,y),HI+d); d+=4; c=dbcF_load4d(tr+d);s=dbcF_load4d(ti+d); xl=dbcF_load4d(LR+d);yl=dbcF_load4d(LI+d); xr=dbcF_load4d(HR+d);yr=dbcF_load4d(HI+d); x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr));y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+d);dbcF_store4d(dbcF_add4d(yl,y),LI+d); dbcF_store4d(dbcF_sub4d(xl,x),HR+d);dbcF_store4d(dbcF_sub4d(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4d C=dbcF_load4d(tr),S=dbcF_load4d(ti); dbcf_simd4d xl=dbcF_load4d(LR),yl=dbcF_load4d(LI); dbcf_simd4d xr=dbcF_load4d(HR),yr=dbcF_load4d(HI); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(C,xr),dbcF_mul4d(S,yr)),y=dbcF_add4d(dbcF_mul4d(S,xr),dbcF_mul4d(C,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR);dbcF_store4d(dbcF_add4d(yl,y),LI); dbcF_store4d(dbcF_sub4d(xl,x),HR);dbcF_store4d(dbcF_sub4d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4d_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_4d_au( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double 
*LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4d c=dbcF_load4d_aligned(tr+d),s=dbcF_load4d_aligned(ti+d); dbcf_simd4d xl=dbcF_load4d(LR+d),yl=dbcF_load4d(LI+d); dbcf_simd4d xr=dbcF_load4d(HR+d),yr=dbcF_load4d(HI+d); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+d);dbcF_store4d(dbcF_add4d(yl,y),LI+d); dbcF_store4d(dbcF_sub4d(xl,x),HR+d);dbcF_store4d(dbcF_sub4d(yl,y),HI+d); d+=4; c=dbcF_load4d_aligned(tr+d);s=dbcF_load4d_aligned(ti+d); xl=dbcF_load4d(LR+d);yl=dbcF_load4d(LI+d); xr=dbcF_load4d(HR+d);yr=dbcF_load4d(HI+d); x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr));y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR+d);dbcF_store4d(dbcF_add4d(yl,y),LI+d); dbcF_store4d(dbcF_sub4d(xl,x),HR+d);dbcF_store4d(dbcF_sub4d(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4d C=dbcF_load4d_aligned(tr),S=dbcF_load4d_aligned(ti); dbcf_simd4d xl=dbcF_load4d(LR),yl=dbcF_load4d(LI); dbcf_simd4d xr=dbcF_load4d(HR),yr=dbcF_load4d(HI); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(C,xr),dbcF_mul4d(S,yr)),y=dbcF_add4d(dbcF_mul4d(S,xr),dbcF_mul4d(C,yr)); dbcF_store4d(dbcF_add4d(xl,x),LR);dbcF_store4d(dbcF_add4d(yl,y),LI); dbcF_store4d(dbcF_sub4d(xl,x),HR);dbcF_store4d(dbcF_sub4d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4d_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_4d_ua( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t 
d; for(d=0;d<h;) { dbcf_simd4d c=dbcF_load4d(tr+d),s=dbcF_load4d(ti+d); dbcf_simd4d xl=dbcF_load4d_aligned(LR+d),yl=dbcF_load4d_aligned(LI+d); dbcf_simd4d xr=dbcF_load4d_aligned(HR+d),yr=dbcF_load4d_aligned(HI+d); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+d);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+d); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+d);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+d); d+=4; c=dbcF_load4d(tr+d);s=dbcF_load4d(ti+d); xl=dbcF_load4d_aligned(LR+d);yl=dbcF_load4d_aligned(LI+d); xr=dbcF_load4d_aligned(HR+d);yr=dbcF_load4d_aligned(HI+d); x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr));y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+d);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+d); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+d);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4d C=dbcF_load4d(tr),S=dbcF_load4d(ti); dbcf_simd4d xl=dbcF_load4d_aligned(LR),yl=dbcF_load4d_aligned(LI); dbcf_simd4d xr=dbcF_load4d_aligned(HR),yr=dbcF_load4d_aligned(HI); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(C,xr),dbcF_mul4d(S,yr)),y=dbcF_add4d(dbcF_mul4d(S,xr),dbcF_mul4d(C,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4d_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_butterfly_pass_4d_aa( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; 
/* LR/LI address the low half and HR/HI the high half of each length-n block. If log2n-1<=log2t the twiddles are loaded straight from tr/ti; otherwise butterfly_block combines tr/ti with a per-block rotation (C,S). */
if(log2n-1<=log2t) { if(h>4) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd4d c=dbcF_load4d_aligned(tr+d),s=dbcF_load4d_aligned(ti+d); dbcf_simd4d xl=dbcF_load4d_aligned(LR+d),yl=dbcF_load4d_aligned(LI+d); dbcf_simd4d xr=dbcF_load4d_aligned(HR+d),yr=dbcF_load4d_aligned(HI+d); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr)),y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+d);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+d); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+d);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+d); d+=4; c=dbcF_load4d_aligned(tr+d);s=dbcF_load4d_aligned(ti+d); xl=dbcF_load4d_aligned(LR+d);yl=dbcF_load4d_aligned(LI+d); xr=dbcF_load4d_aligned(HR+d);yr=dbcF_load4d_aligned(HI+d); x=dbcF_sub4d(dbcF_mul4d(c,xr),dbcF_mul4d(s,yr));y=dbcF_add4d(dbcF_mul4d(s,xr),dbcF_mul4d(c,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR+d);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI+d); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR+d);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI+d); d+=4; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd4d C=dbcF_load4d_aligned(tr),S=dbcF_load4d_aligned(ti); dbcf_simd4d xl=dbcF_load4d_aligned(LR),yl=dbcF_load4d_aligned(LI); dbcf_simd4d xr=dbcF_load4d_aligned(HR),yr=dbcF_load4d_aligned(HI); dbcf_simd4d x=dbcF_sub4d(dbcF_mul4d(C,xr),dbcF_mul4d(S,yr)),y=dbcF_add4d(dbcF_mul4d(S,xr),dbcF_mul4d(C,yr)); dbcF_store4d_aligned(dbcF_add4d(xl,x),LR);dbcF_store4d_aligned(dbcF_add4d(yl,y),LI); dbcF_store4d_aligned(dbcF_sub4d(xl,x),HR);dbcF_store4d_aligned(dbcF_sub4d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_4d_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx"))) static void dbcF_compute_twiddles_4d_u(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; 
dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=4) { dbcf_simd4d X=dbcF_fill4d(x); dbcf_simd4d Y=dbcF_fill4d(y); for(j=0;j<k;j+=4) { dbcf_simd4d R=dbcF_load4d(real+j); dbcf_simd4d I=dbcF_load4d(imag+j); dbcF_store4d(dbcF_add4d(dbcF_sub4d(dbcF_mul4d(X,R),dbcF_mul4d(Y,I)),dbcF_add4d(X,R)),real+k+j); dbcF_store4d(dbcF_add4d(dbcF_add4d(dbcF_mul4d(Y,R),dbcF_mul4d(X,I)),dbcF_add4d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } __attribute__((target("avx"))) static void dbcF_compute_twiddles_4d_a(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=4) { dbcf_simd4d X=dbcF_fill4d(x); dbcf_simd4d Y=dbcF_fill4d(y); for(j=0;j<k;j+=4) { dbcf_simd4d R=dbcF_load4d_aligned(real+j); dbcf_simd4d I=dbcF_load4d_aligned(imag+j); dbcF_store4d_aligned(dbcF_add4d(dbcF_sub4d(dbcF_mul4d(X,R),dbcF_mul4d(Y,I)),dbcF_add4d(X,R)),real+k+j); dbcF_store4d_aligned(dbcF_add4d(dbcF_add4d(dbcF_mul4d(Y,R),dbcF_mul4d(X,I)),dbcF_add4d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } __attribute__((target("avx"))) static void dbcF_fft8_4d(double *real,double *imag,int inverse) {dbcF_fft8_d(real,imag,1,1,inverse,0.70710678118654752438e0);} __attribute__((target("avx512f"))) static void dbcF_butterfly_block_16f_uu( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd16f CC=dbcF_fill16f(C),SS=dbcF_fill16f(S); for(i=0;i<b;i+=16) 
{ dbcf_simd16f TR=dbcF_load16f(tr+i),TI=dbcF_load16f(ti+i); dbcf_simd16f c=dbcF_sub16f(dbcF_mul16f(CC,TR),dbcF_mul16f(SS,TI)),s=dbcF_add16f(dbcF_mul16f(SS,TR),dbcF_mul16f(CC,TI)); dbcf_simd16f xl=dbcF_load16f(LR+i),yl=dbcF_load16f(LI+i); dbcf_simd16f xr=dbcF_load16f(HR+i),yr=dbcF_load16f(HI+i); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+i);dbcF_store16f(dbcF_add16f(yl,y),LI+i); dbcF_store16f(dbcF_sub16f(xl,x),HR+i);dbcF_store16f(dbcF_sub16f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_16f_uu(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_16f_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_16f_au( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd16f CC=dbcF_fill16f(C),SS=dbcF_fill16f(S); for(i=0;i<b;i+=16) { dbcf_simd16f TR=dbcF_load16f_aligned(tr+i),TI=dbcF_load16f_aligned(ti+i); dbcf_simd16f c=dbcF_sub16f(dbcF_mul16f(CC,TR),dbcF_mul16f(SS,TI)),s=dbcF_add16f(dbcF_mul16f(SS,TR),dbcF_mul16f(CC,TI)); dbcf_simd16f xl=dbcF_load16f(LR+i),yl=dbcF_load16f(LI+i); dbcf_simd16f xr=dbcF_load16f(HR+i),yr=dbcF_load16f(HI+i); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+i);dbcF_store16f(dbcF_add16f(yl,y),LI+i); dbcF_store16f(dbcF_sub16f(xl,x),HR+i);dbcF_store16f(dbcF_sub16f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_16f_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); 
/* Upper half of the block (offset h): same recursion, with the twiddle (C,S) replaced by the complex product (C+iS)(X+iY). */
dbcF_butterfly_block_16f_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_16f_ua( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd16f CC=dbcF_fill16f(C),SS=dbcF_fill16f(S); for(i=0;i<b;i+=16) { dbcf_simd16f TR=dbcF_load16f(tr+i),TI=dbcF_load16f(ti+i); dbcf_simd16f c=dbcF_sub16f(dbcF_mul16f(CC,TR),dbcF_mul16f(SS,TI)),s=dbcF_add16f(dbcF_mul16f(SS,TR),dbcF_mul16f(CC,TI)); dbcf_simd16f xl=dbcF_load16f_aligned(LR+i),yl=dbcF_load16f_aligned(LI+i); dbcf_simd16f xr=dbcF_load16f_aligned(HR+i),yr=dbcF_load16f_aligned(HI+i); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+i);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+i); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+i);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_16f_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_16f_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_16f_aa( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd16f CC=dbcF_fill16f(C),SS=dbcF_fill16f(S); for(i=0;i<b;i+=16) { dbcf_simd16f TR=dbcF_load16f_aligned(tr+i),TI=dbcF_load16f_aligned(ti+i); dbcf_simd16f c=dbcF_sub16f(dbcF_mul16f(CC,TR),dbcF_mul16f(SS,TI)),s=dbcF_add16f(dbcF_mul16f(SS,TR),dbcF_mul16f(CC,TI)); dbcf_simd16f xl=dbcF_load16f_aligned(LR+i),yl=dbcF_load16f_aligned(LI+i); 
dbcf_simd16f xr=dbcF_load16f_aligned(HR+i),yr=dbcF_load16f_aligned(HI+i); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+i);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+i); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+i);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+i); } } else { float X,Y; dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_16f_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_16f_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_16f_uu( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>16) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd16f c=dbcF_load16f(tr+d),s=dbcF_load16f(ti+d); dbcf_simd16f xl=dbcF_load16f(LR+d),yl=dbcF_load16f(LI+d); dbcf_simd16f xr=dbcF_load16f(HR+d),yr=dbcF_load16f(HI+d); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+d);dbcF_store16f(dbcF_add16f(yl,y),LI+d); dbcF_store16f(dbcF_sub16f(xl,x),HR+d);dbcF_store16f(dbcF_sub16f(yl,y),HI+d); d+=16; c=dbcF_load16f(tr+d);s=dbcF_load16f(ti+d); xl=dbcF_load16f(LR+d);yl=dbcF_load16f(LI+d); xr=dbcF_load16f(HR+d);yr=dbcF_load16f(HI+d); x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr));y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+d);dbcF_store16f(dbcF_add16f(yl,y),LI+d); dbcF_store16f(dbcF_sub16f(xl,x),HR+d);dbcF_store16f(dbcF_sub16f(yl,y),HI+d); d+=16; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd16f 
C=dbcF_load16f(tr),S=dbcF_load16f(ti); dbcf_simd16f xl=dbcF_load16f(LR),yl=dbcF_load16f(LI); dbcf_simd16f xr=dbcF_load16f(HR),yr=dbcF_load16f(HI); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(C,xr),dbcF_mul16f(S,yr)),y=dbcF_add16f(dbcF_mul16f(S,xr),dbcF_mul16f(C,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR);dbcF_store16f(dbcF_add16f(yl,y),LI); dbcF_store16f(dbcF_sub16f(xl,x),HR);dbcF_store16f(dbcF_sub16f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_16f_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_16f_au( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>16) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd16f c=dbcF_load16f_aligned(tr+d),s=dbcF_load16f_aligned(ti+d); dbcf_simd16f xl=dbcF_load16f(LR+d),yl=dbcF_load16f(LI+d); dbcf_simd16f xr=dbcF_load16f(HR+d),yr=dbcF_load16f(HI+d); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+d);dbcF_store16f(dbcF_add16f(yl,y),LI+d); dbcF_store16f(dbcF_sub16f(xl,x),HR+d);dbcF_store16f(dbcF_sub16f(yl,y),HI+d); d+=16; c=dbcF_load16f_aligned(tr+d);s=dbcF_load16f_aligned(ti+d); xl=dbcF_load16f(LR+d);yl=dbcF_load16f(LI+d); xr=dbcF_load16f(HR+d);yr=dbcF_load16f(HI+d); x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr));y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR+d);dbcF_store16f(dbcF_add16f(yl,y),LI+d); dbcF_store16f(dbcF_sub16f(xl,x),HR+d);dbcF_store16f(dbcF_sub16f(yl,y),HI+d); d+=16; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd16f 
C=dbcF_load16f_aligned(tr),S=dbcF_load16f_aligned(ti); dbcf_simd16f xl=dbcF_load16f(LR),yl=dbcF_load16f(LI); dbcf_simd16f xr=dbcF_load16f(HR),yr=dbcF_load16f(HI); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(C,xr),dbcF_mul16f(S,yr)),y=dbcF_add16f(dbcF_mul16f(S,xr),dbcF_mul16f(C,yr)); dbcF_store16f(dbcF_add16f(xl,x),LR);dbcF_store16f(dbcF_add16f(yl,y),LI); dbcF_store16f(dbcF_sub16f(xl,x),HR);dbcF_store16f(dbcF_sub16f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_16f_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_16f_ua( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>16) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd16f c=dbcF_load16f(tr+d),s=dbcF_load16f(ti+d); dbcf_simd16f xl=dbcF_load16f_aligned(LR+d),yl=dbcF_load16f_aligned(LI+d); dbcf_simd16f xr=dbcF_load16f_aligned(HR+d),yr=dbcF_load16f_aligned(HI+d); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+d);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+d); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+d);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+d); d+=16; c=dbcF_load16f(tr+d);s=dbcF_load16f(ti+d); xl=dbcF_load16f_aligned(LR+d);yl=dbcF_load16f_aligned(LI+d); xr=dbcF_load16f_aligned(HR+d);yr=dbcF_load16f_aligned(HI+d); x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr));y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+d);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+d); 
dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+d);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+d); d+=16; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd16f C=dbcF_load16f(tr),S=dbcF_load16f(ti); dbcf_simd16f xl=dbcF_load16f_aligned(LR),yl=dbcF_load16f_aligned(LI); dbcf_simd16f xr=dbcF_load16f_aligned(HR),yr=dbcF_load16f_aligned(HI); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(C,xr),dbcF_mul16f(S,yr)),y=dbcF_add16f(dbcF_mul16f(S,xr),dbcF_mul16f(C,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_16f_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_16f_aa( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real; float *HR=real+h; float *LI=imag; float *HI=imag+h; if(log2n-1<=log2t) { if(h>16) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd16f c=dbcF_load16f_aligned(tr+d),s=dbcF_load16f_aligned(ti+d); dbcf_simd16f xl=dbcF_load16f_aligned(LR+d),yl=dbcF_load16f_aligned(LI+d); dbcf_simd16f xr=dbcF_load16f_aligned(HR+d),yr=dbcF_load16f_aligned(HI+d); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr)),y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+d);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+d); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+d);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+d); d+=16; c=dbcF_load16f_aligned(tr+d);s=dbcF_load16f_aligned(ti+d); xl=dbcF_load16f_aligned(LR+d);yl=dbcF_load16f_aligned(LI+d); xr=dbcF_load16f_aligned(HR+d);yr=dbcF_load16f_aligned(HI+d); 
x=dbcF_sub16f(dbcF_mul16f(c,xr),dbcF_mul16f(s,yr));y=dbcF_add16f(dbcF_mul16f(s,xr),dbcF_mul16f(c,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR+d);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI+d); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR+d);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI+d); d+=16; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd16f C=dbcF_load16f_aligned(tr),S=dbcF_load16f_aligned(ti); dbcf_simd16f xl=dbcF_load16f_aligned(LR),yl=dbcF_load16f_aligned(LI); dbcf_simd16f xr=dbcF_load16f_aligned(HR),yr=dbcF_load16f_aligned(HI); dbcf_simd16f x=dbcF_sub16f(dbcF_mul16f(C,xr),dbcF_mul16f(S,yr)),y=dbcF_add16f(dbcF_mul16f(S,xr),dbcF_mul16f(C,yr)); dbcF_store16f_aligned(dbcF_add16f(xl,x),LR);dbcF_store16f_aligned(dbcF_add16f(yl,y),LI); dbcF_store16f_aligned(dbcF_sub16f(xl,x),HR);dbcF_store16f_aligned(dbcF_sub16f(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_16f_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_compute_twiddles_16f_u(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=16) { dbcf_simd16f X=dbcF_fill16f(x); dbcf_simd16f Y=dbcF_fill16f(y); for(j=0;j<k;j+=16) { dbcf_simd16f R=dbcF_load16f(real+j); dbcf_simd16f I=dbcF_load16f(imag+j); dbcF_store16f(dbcF_add16f(dbcF_sub16f(dbcF_mul16f(X,R),dbcF_mul16f(Y,I)),dbcF_add16f(X,R)),real+k+j); dbcF_store16f(dbcF_add16f(dbcF_add16f(dbcF_mul16f(Y,R),dbcF_mul16f(X,I)),dbcF_add16f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } __attribute__((target("avx512f"))) static void dbcF_compute_twiddles_16f_a(ptrdiff_t log2n,ptrdiff_t 
log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=16) { dbcf_simd16f X=dbcF_fill16f(x); dbcf_simd16f Y=dbcF_fill16f(y); for(j=0;j<k;j+=16) { dbcf_simd16f R=dbcF_load16f_aligned(real+j); dbcf_simd16f I=dbcF_load16f_aligned(imag+j); dbcF_store16f_aligned(dbcF_add16f(dbcF_sub16f(dbcF_mul16f(X,R),dbcF_mul16f(Y,I)),dbcF_add16f(X,R)),real+k+j); dbcF_store16f_aligned(dbcF_add16f(dbcF_add16f(dbcF_mul16f(Y,R),dbcF_mul16f(X,I)),dbcF_add16f(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } __attribute__((target("avx512f"))) static void dbcF_fft8_16f(float *real,float *imag,int inverse) {dbcF_fft8_f(real,imag,1,1,inverse,0.70710678118654752438f);} __attribute__((target("avx512f"))) static void dbcF_butterfly_block_8d_uu( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8d CC=dbcF_fill8d(C),SS=dbcF_fill8d(S); for(i=0;i<b;i+=8) { dbcf_simd8d TR=dbcF_load8d(tr+i),TI=dbcF_load8d(ti+i); dbcf_simd8d c=dbcF_sub8d(dbcF_mul8d(CC,TR),dbcF_mul8d(SS,TI)),s=dbcF_add8d(dbcF_mul8d(SS,TR),dbcF_mul8d(CC,TI)); dbcf_simd8d xl=dbcF_load8d(LR+i),yl=dbcF_load8d(LI+i); dbcf_simd8d xr=dbcF_load8d(HR+i),yr=dbcF_load8d(HI+i); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+i);dbcF_store8d(dbcF_add8d(yl,y),LI+i); dbcF_store8d(dbcF_sub8d(xl,x),HR+i);dbcF_store8d(dbcF_sub8d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8d_uu(log2n,log2b-1,LR ,LI ,HR ,HI 
,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8d_uu(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_8d_au( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8d CC=dbcF_fill8d(C),SS=dbcF_fill8d(S); for(i=0;i<b;i+=8) { dbcf_simd8d TR=dbcF_load8d_aligned(tr+i),TI=dbcF_load8d_aligned(ti+i); dbcf_simd8d c=dbcF_sub8d(dbcF_mul8d(CC,TR),dbcF_mul8d(SS,TI)),s=dbcF_add8d(dbcF_mul8d(SS,TR),dbcF_mul8d(CC,TI)); dbcf_simd8d xl=dbcF_load8d(LR+i),yl=dbcF_load8d(LI+i); dbcf_simd8d xr=dbcF_load8d(HR+i),yr=dbcF_load8d(HI+i); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+i);dbcF_store8d(dbcF_add8d(yl,y),LI+i); dbcF_store8d(dbcF_sub8d(xl,x),HR+i);dbcF_store8d(dbcF_sub8d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8d_au(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8d_au(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_8d_ua( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8d CC=dbcF_fill8d(C),SS=dbcF_fill8d(S); for(i=0;i<b;i+=8) { dbcf_simd8d TR=dbcF_load8d(tr+i),TI=dbcF_load8d(ti+i); dbcf_simd8d c=dbcF_sub8d(dbcF_mul8d(CC,TR),dbcF_mul8d(SS,TI)),s=dbcF_add8d(dbcF_mul8d(SS,TR),dbcF_mul8d(CC,TI)); dbcf_simd8d xl=dbcF_load8d_aligned(LR+i),yl=dbcF_load8d_aligned(LI+i); dbcf_simd8d xr=dbcF_load8d_aligned(HR+i),yr=dbcF_load8d_aligned(HI+i); dbcf_simd8d 
x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+i);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+i); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+i);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8d_ua(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8d_ua(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_block_8d_aa( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; if(log2b<=((10)-1)) { ptrdiff_t i; dbcf_simd8d CC=dbcF_fill8d(C),SS=dbcF_fill8d(S); for(i=0;i<b;i+=8) { dbcf_simd8d TR=dbcF_load8d_aligned(tr+i),TI=dbcF_load8d_aligned(ti+i); dbcf_simd8d c=dbcF_sub8d(dbcF_mul8d(CC,TR),dbcF_mul8d(SS,TI)),s=dbcF_add8d(dbcF_mul8d(SS,TR),dbcF_mul8d(CC,TI)); dbcf_simd8d xl=dbcF_load8d_aligned(LR+i),yl=dbcF_load8d_aligned(LI+i); dbcf_simd8d xr=dbcF_load8d_aligned(HR+i),yr=dbcF_load8d_aligned(HI+i); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+i);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+i); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+i);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+i); } } else { double X,Y; dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_8d_aa(log2n,log2b-1,LR ,LI ,HR ,HI ,C ,S ,inverse,tr,ti); dbcF_butterfly_block_8d_aa(log2n,log2b-1,LR+h,LI+h,HR+h,HI+h,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_8d_uu( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t 
n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8d c=dbcF_load8d(tr+d),s=dbcF_load8d(ti+d); dbcf_simd8d xl=dbcF_load8d(LR+d),yl=dbcF_load8d(LI+d); dbcf_simd8d xr=dbcF_load8d(HR+d),yr=dbcF_load8d(HI+d); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+d);dbcF_store8d(dbcF_add8d(yl,y),LI+d); dbcF_store8d(dbcF_sub8d(xl,x),HR+d);dbcF_store8d(dbcF_sub8d(yl,y),HI+d); d+=8; c=dbcF_load8d(tr+d);s=dbcF_load8d(ti+d); xl=dbcF_load8d(LR+d);yl=dbcF_load8d(LI+d); xr=dbcF_load8d(HR+d);yr=dbcF_load8d(HI+d); x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr));y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+d);dbcF_store8d(dbcF_add8d(yl,y),LI+d); dbcF_store8d(dbcF_sub8d(xl,x),HR+d);dbcF_store8d(dbcF_sub8d(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8d C=dbcF_load8d(tr),S=dbcF_load8d(ti); dbcf_simd8d xl=dbcF_load8d(LR),yl=dbcF_load8d(LI); dbcf_simd8d xr=dbcF_load8d(HR),yr=dbcF_load8d(HI); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(C,xr),dbcF_mul8d(S,yr)),y=dbcF_add8d(dbcF_mul8d(S,xr),dbcF_mul8d(C,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR);dbcF_store8d(dbcF_add8d(yl,y),LI); dbcF_store8d(dbcF_sub8d(xl,x),HR);dbcF_store8d(dbcF_sub8d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8d_uu(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_8d_au( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double 
*LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8d c=dbcF_load8d_aligned(tr+d),s=dbcF_load8d_aligned(ti+d); dbcf_simd8d xl=dbcF_load8d(LR+d),yl=dbcF_load8d(LI+d); dbcf_simd8d xr=dbcF_load8d(HR+d),yr=dbcF_load8d(HI+d); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+d);dbcF_store8d(dbcF_add8d(yl,y),LI+d); dbcF_store8d(dbcF_sub8d(xl,x),HR+d);dbcF_store8d(dbcF_sub8d(yl,y),HI+d); d+=8; c=dbcF_load8d_aligned(tr+d);s=dbcF_load8d_aligned(ti+d); xl=dbcF_load8d(LR+d);yl=dbcF_load8d(LI+d); xr=dbcF_load8d(HR+d);yr=dbcF_load8d(HI+d); x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr));y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR+d);dbcF_store8d(dbcF_add8d(yl,y),LI+d); dbcF_store8d(dbcF_sub8d(xl,x),HR+d);dbcF_store8d(dbcF_sub8d(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8d C=dbcF_load8d_aligned(tr),S=dbcF_load8d_aligned(ti); dbcf_simd8d xl=dbcF_load8d(LR),yl=dbcF_load8d(LI); dbcf_simd8d xr=dbcF_load8d(HR),yr=dbcF_load8d(HI); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(C,xr),dbcF_mul8d(S,yr)),y=dbcF_add8d(dbcF_mul8d(S,xr),dbcF_mul8d(C,yr)); dbcF_store8d(dbcF_add8d(xl,x),LR);dbcF_store8d(dbcF_add8d(yl,y),LI); dbcF_store8d(dbcF_sub8d(xl,x),HR);dbcF_store8d(dbcF_sub8d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8d_au(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_8d_ua( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double *HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { 
ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8d c=dbcF_load8d(tr+d),s=dbcF_load8d(ti+d); dbcf_simd8d xl=dbcF_load8d_aligned(LR+d),yl=dbcF_load8d_aligned(LI+d); dbcf_simd8d xr=dbcF_load8d_aligned(HR+d),yr=dbcF_load8d_aligned(HI+d); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+d);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+d); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+d);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+d); d+=8; c=dbcF_load8d(tr+d);s=dbcF_load8d(ti+d); xl=dbcF_load8d_aligned(LR+d);yl=dbcF_load8d_aligned(LI+d); xr=dbcF_load8d_aligned(HR+d);yr=dbcF_load8d_aligned(HI+d); x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr));y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+d);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+d); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+d);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8d C=dbcF_load8d(tr),S=dbcF_load8d(ti); dbcf_simd8d xl=dbcF_load8d_aligned(LR),yl=dbcF_load8d_aligned(LI); dbcf_simd8d xr=dbcF_load8d_aligned(HR),yr=dbcF_load8d_aligned(HI); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(C,xr),dbcF_mul8d(S,yr)),y=dbcF_add8d(dbcF_mul8d(S,xr),dbcF_mul8d(C,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8d_ua(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_butterfly_pass_8d_aa( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real; double *HR=real+h; double *LI=imag; double 
*HI=imag+h; if(log2n-1<=log2t) { if(h>8) for(i=0;i<c;++i) { ptrdiff_t d; for(d=0;d<h;) { dbcf_simd8d c=dbcF_load8d_aligned(tr+d),s=dbcF_load8d_aligned(ti+d); dbcf_simd8d xl=dbcF_load8d_aligned(LR+d),yl=dbcF_load8d_aligned(LI+d); dbcf_simd8d xr=dbcF_load8d_aligned(HR+d),yr=dbcF_load8d_aligned(HI+d); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr)),y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+d);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+d); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+d);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+d); d+=8; c=dbcF_load8d_aligned(tr+d);s=dbcF_load8d_aligned(ti+d); xl=dbcF_load8d_aligned(LR+d);yl=dbcF_load8d_aligned(LI+d); xr=dbcF_load8d_aligned(HR+d);yr=dbcF_load8d_aligned(HI+d); x=dbcF_sub8d(dbcF_mul8d(c,xr),dbcF_mul8d(s,yr));y=dbcF_add8d(dbcF_mul8d(s,xr),dbcF_mul8d(c,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR+d);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI+d); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR+d);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI+d); d+=8; } LR+=n;LI+=n; HR+=n;HI+=n; } else for(i=0;i<c;++i) { dbcf_simd8d C=dbcF_load8d_aligned(tr),S=dbcF_load8d_aligned(ti); dbcf_simd8d xl=dbcF_load8d_aligned(LR),yl=dbcF_load8d_aligned(LI); dbcf_simd8d xr=dbcF_load8d_aligned(HR),yr=dbcF_load8d_aligned(HI); dbcf_simd8d x=dbcF_sub8d(dbcF_mul8d(C,xr),dbcF_mul8d(S,yr)),y=dbcF_add8d(dbcF_mul8d(S,xr),dbcF_mul8d(C,yr)); dbcF_store8d_aligned(dbcF_add8d(xl,x),LR);dbcF_store8d_aligned(dbcF_add8d(yl,y),LI); dbcF_store8d_aligned(dbcF_sub8d(xl,x),HR);dbcF_store8d_aligned(dbcF_sub8d(yl,y),HI); LR+=n;LI+=n; HR+=n;HI+=n; } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_8d_aa(log2n,log2n-1,LR,LI,HR,HI,1.0f,0.0f,inverse,tr,ti); LR+=n;LI+=n; HR+=n;HI+=n; } } } __attribute__((target("avx512f"))) static void dbcF_compute_twiddles_8d_u(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); 
double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=8) { dbcf_simd8d X=dbcF_fill8d(x); dbcf_simd8d Y=dbcF_fill8d(y); for(j=0;j<k;j+=8) { dbcf_simd8d R=dbcF_load8d(real+j); dbcf_simd8d I=dbcF_load8d(imag+j); dbcF_store8d(dbcF_add8d(dbcF_sub8d(dbcF_mul8d(X,R),dbcF_mul8d(Y,I)),dbcF_add8d(X,R)),real+k+j); dbcF_store8d(dbcF_add8d(dbcF_add8d(dbcF_mul8d(Y,R),dbcF_mul8d(X,I)),dbcF_add8d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } __attribute__((target("avx512f"))) static void dbcF_compute_twiddles_8d_a(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=0.0e0; imag[0]=0.0e0; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; if(k>=8) { dbcf_simd8d X=dbcF_fill8d(x); dbcf_simd8d Y=dbcF_fill8d(y); for(j=0;j<k;j+=8) { dbcf_simd8d R=dbcF_load8d_aligned(real+j); dbcf_simd8d I=dbcF_load8d_aligned(imag+j); dbcF_store8d_aligned(dbcF_add8d(dbcF_sub8d(dbcF_mul8d(X,R),dbcF_mul8d(Y,I)),dbcF_add8d(X,R)),real+k+j); dbcF_store8d_aligned(dbcF_add8d(dbcF_add8d(dbcF_mul8d(Y,R),dbcF_mul8d(X,I)),dbcF_add8d(Y,I)),imag+k+j); } } else for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0e0 +real[i]; } __attribute__((target("avx512f"))) static void dbcF_fft8_8d(double *real,double *imag,int inverse) {dbcF_fft8_d(real,imag,1,1,inverse,0.70710678118654752438e0);} static ptrdiff_t dbcF_butterfly_pass_optimized_float( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, int inverse, ptrdiff_t log2t, float *tr,float *ti, int simd_flags) { if(simd_flags&16) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(16*sizeof(float))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(16*sizeof(float))-1))==0); int 
alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(16*sizeof(float))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(16*sizeof(float))-1))==0); if(((16<<2)>>log2n)<=1&&((16<<1)>>log2t)<=1) { if(alignt) dbcF_compute_twiddles_16f_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_16f_u(log2n,log2t,tr,ti,inverse); switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_16f_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_16f_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: dbcF_butterfly_pass_16f_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_16f_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; if(simd_flags&4) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(8*sizeof(float))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(8*sizeof(float))-1))==0); int alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(8*sizeof(float))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(8*sizeof(float))-1))==0); if(((8<<2)>>log2n)<=1&&((8<<1)>>log2t)<=1) { if(alignt) dbcF_compute_twiddles_8f_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_8f_u(log2n,log2t,tr,ti,inverse); switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_8f_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_8f_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: dbcF_butterfly_pass_8f_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_8f_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; if(simd_flags&1) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(4*sizeof(float))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(4*sizeof(float))-1))==0); int alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(4*sizeof(float))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(4*sizeof(float))-1))==0); if(((4<<2)>>log2n)<=1&&((4<<1)>>log2t)<=1) { if(alignt) dbcF_compute_twiddles_4f_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_4f_u(log2n,log2t,tr,ti,inverse); 
switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_4f_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_4f_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: dbcF_butterfly_pass_4f_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_4f_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; return 0; } static ptrdiff_t dbcF_butterfly_multipass_optimized_float( ptrdiff_t log2n, ptrdiff_t log2c, ptrdiff_t depth, float *real,float *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, float *tr,float *ti) { ptrdiff_t ret=0; int simd_flags; if(real_stride!=1||imag_stride!=1) return 0; simd_flags=dbcf_detect_simd(); if(!(simd_flags&(1|4|16))) return 0; if(((10)-1)<3) return 0; if(depth==log2n&&depth>=3) { ptrdiff_t j,m=(((ptrdiff_t)1)<<(log2n+log2c-3)); if(simd_flags&16) {for(j=0;j<m;++j) {dbcF_fft8_16f(real+8*j,imag+8*j,inverse);} goto ok;} if(simd_flags&4) {for(j=0;j<m;++j) {dbcF_fft8_8f (real+8*j,imag+8*j,inverse);} goto ok;} if(simd_flags&1) {for(j=0;j<m;++j) {dbcF_fft8_4f (real+8*j,imag+8*j,inverse);} goto ok;} return 0; ok: depth-=3; ret=3; } if(log2n-depth+1>3) { ptrdiff_t log2d; for(log2d=log2n-depth+1;log2d<=log2n;++log2d) { ptrdiff_t log2t=(log2d-1<((10)-1)?log2d-1:((10)-1)); if(dbcF_butterfly_pass_optimized_float(log2d,log2c+log2n-log2d,real,imag,inverse,log2t,tr,ti,simd_flags)) ++ret; else break; } return ret; } return 0; } static ptrdiff_t dbcF_butterfly_pass_optimized_double( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, int inverse, ptrdiff_t log2t, double *tr,double *ti, int simd_flags) { if(simd_flags&32) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(8*sizeof(double))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(8*sizeof(double))-1))==0); int alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(8*sizeof(double))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(8*sizeof(double))-1))==0); if(((8<<2)>>log2n)<=1&&((8<<1)>>log2t)<=1) { if(alignt) 
dbcF_compute_twiddles_8d_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_8d_u(log2n,log2t,tr,ti,inverse); switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_8d_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_8d_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: dbcF_butterfly_pass_8d_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_8d_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; if(simd_flags&8) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(4*sizeof(double))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(4*sizeof(double))-1))==0); int alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(4*sizeof(double))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(4*sizeof(double))-1))==0); if(((4<<2)>>log2n)<=1&&((4<<1)>>log2t)<=1) { if(alignt) dbcF_compute_twiddles_4d_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_4d_u(log2n,log2t,tr,ti,inverse); switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_4d_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_4d_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: dbcF_butterfly_pass_4d_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_4d_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; if(simd_flags&2) { int alignt=((((ptrdiff_t)(tr))&((ptrdiff_t)(2*sizeof(double))-1))==0)&&((((ptrdiff_t)(ti))&((ptrdiff_t)(2*sizeof(double))-1))==0); int alignd=((((ptrdiff_t)(real))&((ptrdiff_t)(2*sizeof(double))-1))==0)&&((((ptrdiff_t)(imag))&((ptrdiff_t)(2*sizeof(double))-1))==0); if(((2<<2)>>log2n)<=1&&((2<<1)>>log2t)<=1) { if(alignt) dbcF_compute_twiddles_2d_a(log2n,log2t,tr,ti,inverse); else dbcF_compute_twiddles_2d_u(log2n,log2t,tr,ti,inverse); switch(2*alignd+alignt) { case 0: dbcF_butterfly_pass_2d_uu(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 1: dbcF_butterfly_pass_2d_au(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 2: 
dbcF_butterfly_pass_2d_ua(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; case 3: dbcF_butterfly_pass_2d_aa(log2n,log2c,real,imag,inverse,log2t,tr,ti); break; } return 1; } }; return 0; } static ptrdiff_t dbcF_butterfly_multipass_optimized_double( ptrdiff_t log2n, ptrdiff_t log2c, ptrdiff_t depth, double *real,double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, double *tr,double *ti) { ptrdiff_t ret=0; int simd_flags; if(real_stride!=1||imag_stride!=1) return 0; simd_flags=dbcf_detect_simd(); if(!(simd_flags&(2|8))) return 0; if(((10)-1)<2) return 0; if(depth==log2n&&depth>=3) { ptrdiff_t j,m=(((ptrdiff_t)1)<<(log2n+log2c-3)); if(simd_flags&32) {for(j=0;j<m;++j) {dbcF_fft8_8d(real+8*j,imag+8*j,inverse);} goto ok;} if(simd_flags&8) {for(j=0;j<m;++j) {dbcF_fft8_4d(real+8*j,imag+8*j,inverse);} goto ok;} if(simd_flags&2) {for(j=0;j<m;++j) {dbcF_fft8_2d(real+8*j,imag+8*j,inverse);} goto ok;} ok: depth-=3; ret=3; } if(log2n-depth+1>3) { ptrdiff_t log2d; for(log2d=log2n-depth+1;log2d<=log2n;++log2d) { ptrdiff_t log2t=(log2d-1<((10)-1)?log2d-1:((10)-1)); if(dbcF_butterfly_pass_optimized_double(log2d,log2c+log2n-log2d,real,imag,inverse,log2t,tr,ti,simd_flags)) ++ret; else break; } return ret; } return 0; } static void dbcF_init() { (void)dbcf_detect_simd(); } static void dbcF_cexpm1_f(ptrdiff_t log2n,float *real,float *imag) { static float table[][2]={ { 0.0e0f ,0.0e0f }, {-2.0e0f ,0.0e0f }, {-1.0e0f ,1.0e0f }, {-2.928932188134524755991556378951509607151e-1f,7.071067811865475244008443621048490392848e-1f}, {-7.612046748871324387181681060321171317758e-2f,3.826834323650897717284599840303988667613e-1f}, {-1.921471959676955087381776386576096302606e-2f,1.950903220161282678482848684770222409276e-1f}, {-4.815273327803113755163046890520078424525e-3f,9.801714032956060199419556388864184586113e-2f}, {-1.204543794827607285228395240899305556796e-3f,4.906767432741801425495497694268265831474e-2f}, 
{-3.011813037957798842343503338278031499389e-4f,2.454122852291228803173452945928292506546e-2f}, {-7.529816085545907835350880361677564939353e-5f,1.227153828571992607940826195100321214037e-2f}, {-1.882471739885734300956227143228382608274e-5f,6.135884649154475359640234590372580917057e-3f}, {-4.706190423828488419874299880100447012366e-6f,3.067956762965976270145365490919842518944e-3f}, {-1.176548298090070974289828473980951732077e-6f,1.533980186284765612303697150264079079954e-3f}, {-2.941371177808397717822612343228837361006e-7f,7.669903187427045269385683579485766431409e-4f}, {-7.353428214885526851929261214305179884431e-8f,3.834951875713955890724616811813812633950e-4f}, {-1.838357070619165308459709028549492394875e-8f,1.917475973107033074399095619890009334688e-4f}, {-4.595892687109028066860393851041105696810e-9f,9.587379909597734587051721097647635118706e-5f} }; if(log2n<(ptrdiff_t)(sizeof(table)/(sizeof(table[0])))) { *real=table[log2n][0]; *imag=table[log2n][1]; } else { ptrdiff_t n=((ptrdiff_t)1)<<log2n; const float C1=1.0e0f; const float C2=5.0e-1f; const float C3=1.666666666666666666666666666666666666666e-1f; const float C4=4.166666666666666666666666666666666666666e-2f; const float C5=8.333333333333333333333333333333333333333e-3f; const float C6=1.388888888888888888888888888888888888888e-3f; const float C7=1.984126984126984126984126984126984126984e-4f; const float C8=2.480158730158730158730158730158730158730e-5f; float x=6.283185307179586476925286766559005768f/(float)n; float x2=x*x; *real=-x2*(C2-x2*(C4-x2*(C6-x2*C8))); *imag=x*(C1-x2*(C3-x2*(C5-x2*C7))); } } static void dbcF_cexp_f(ptrdiff_t log2n,float *real,float *imag) { dbcF_cexpm1_f(log2n,real,imag); *real=1.0f +*real; } static void dbcF_compute_twiddles_f(ptrdiff_t log2n,ptrdiff_t log2b,float *real,float *imag,int inverse) { ptrdiff_t i; real[0]=0.0f; imag[0]=0.0f; for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); float x,y; dbcF_cexpm1_f(log2n-i,&x,&y); if(!inverse) y=-y; for(j=0;j<k;++j) { 
real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0f +real[i]; } static void dbcF_cexpm1_npot_f(ptrdiff_t p,ptrdiff_t q,float *real,float *imag) { ptrdiff_t i; float C=1.0f,S=1.0f; float x=6.283185307179586476925286766559005768f*(float)p/(float)q,x2=x*x; float I=32.0f; for(i=32;i>=0;--i) { float J=(2.0f*I+3.0f),K=I+I+3.0f; J=J*J; C=1.0f -x2*C/(J+K); S=1.0f -x2*S/(J-K); I=I-1.0f; } C=-C*0.5f*x2; S=S*x; *real=C; *imag=S; } static void dbcF_compute_twiddles_npot_f(ptrdiff_t n,float *real,float *imag,int inverse) { ptrdiff_t i,j,k,m=n>>1,h=(m+2)>>1; if(n<1) return; real[0]=0.0f; imag[0]=0.0f; for(i=1;i<h;i*=2) { float X,Y; dbcF_cexpm1_npot_f(i,n,&X,&Y); if(!inverse) Y=-Y; j=(h<i*2?h-i:i); for(k=0;k<j;++k) { real[i+k]=(X*real[k]-Y*imag[k])+(X+real[k]); imag[i+k]=(Y*real[k]+X*imag[k])+(Y+imag[k]); } } for(i=0;i<h;++i) real[i]=1.0f +real[i]; for(i=h;i<m;++i) { real[i]=-real[m-i]; imag[i]= imag[m-i]; } for(i=0;i<m;++i) { real[m+i]=-real[i]; imag[m+i]=-imag[i]; } } static void dbcF_bitreversal_swap_f(ptrdiff_t log2n,float *src,ptrdiff_t src_stride,float *dst,ptrdiff_t dst_stride) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); float x=src[i*src_stride]; float y=dst[j*dst_stride]; src[i*src_stride]=y; dst[j*dst_stride]=x; } } else { dbcF_bitreversal_swap_f(log2n-1,src ,2*src_stride,dst ,dst_stride); dbcF_bitreversal_swap_f(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride); } } static void dbcF_bitreversal_permutation_f(ptrdiff_t log2n,const float *src,ptrdiff_t src_stride,float *dst,ptrdiff_t dst_stride,float *tmp) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(src_stride==0) { float x=src[0]; for(i=0;i<n;++i) dst[i*dst_stride]=x; } else if(src==dst) { if(log2n<=8) { const unsigned char 
*idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); if(i<j) { float x=dst[i*dst_stride]; float y=dst[j*dst_stride]; dst[i*dst_stride]=y; dst[j*dst_stride]=x; } } } else if(log2n<=2*((((10)>>1)<6?((10)>>1):6))+2||log2n<=16) { dbcF_bitreversal_swap_f(log2n-2,dst+dst_stride,2*dst_stride,dst+h*dst_stride,2*dst_stride); dbcF_bitreversal_permutation_f(log2n-2,dst ,2*dst_stride,dst ,2*dst_stride,tmp); dbcF_bitreversal_permutation_f(log2n-2,dst+(h+1)*dst_stride,2*dst_stride,dst+(h+1)*dst_stride,2*dst_stride,tmp); } else { ptrdiff_t a,b,c,log2m=log2n-2*((((10)>>1)<6?((10)>>1):6)); ptrdiff_t m=(((ptrdiff_t)1)<<(log2m)); ptrdiff_t pow2q=(((ptrdiff_t)1)<<((((10)>>1)<6?((10)>>1):6))); for(b=0;b<m;++b) { ptrdiff_t ib=dbcF_bitreverse(b,log2m); if(ib<b) continue; for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]; for(c=0;c<pow2q;++c) { ptrdiff_t ic=dbcF_bitreverse(c,((((10)>>1)<6?((10)>>1):6))); for(a=0;a<pow2q;++a) { ptrdiff_t ia=dbcF_bitreverse(a,((((10)>>1)<6?((10)>>1):6))); float t; i=(ic<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(ib<<((((10)>>1)<6?((10)>>1):6)))^ia; t=dst[i*dst_stride]; dst[i*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=t; } } if(b!=ib) for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; } } } else { if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); dst[j*dst_stride]=src[i*src_stride]; } } else if(log2n<=16) { dbcF_bitreversal_permutation_f(log2n-1,src ,2*src_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_f(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride,tmp); } else { for(i=0;i<h;++i) { dst[ i 
*dst_stride]=src[(2*i )*src_stride]; dst[(i+h)*dst_stride]=src[(2*i+1)*src_stride]; } dbcF_bitreversal_permutation_f(log2n-1,dst ,dst_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_f(log2n-1,dst+h*dst_stride,dst_stride,dst+h*dst_stride,dst_stride,tmp); } } } static void dbcF_fft8_f( float *real,float *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse,float c) { float r0,r1,r2,r3,r4,r5,r6,r7; float i0,i1,i2,i3,i4,i5,i6,i7; float R0,R1,R2,R3,R4,R5,R6,R7; float I0,I1,I2,I3,I4,I5,I6,I7; float p5,m5,p7,m7; r0=real[0*real_stride];i0=imag[0*imag_stride]; r1=real[1*real_stride];i1=imag[1*imag_stride]; r2=real[2*real_stride];i2=imag[2*imag_stride]; r3=real[3*real_stride];i3=imag[3*imag_stride]; r4=real[4*real_stride];i4=imag[4*imag_stride]; r5=real[5*real_stride];i5=imag[5*imag_stride]; r6=real[6*real_stride];i6=imag[6*imag_stride]; r7=real[7*real_stride];i7=imag[7*imag_stride]; R0=r0+r1;R1=r0-r1;I0=i0+i1;I1=i0-i1; R2=r2+r3;R3=r2-r3;I2=i2+i3;I3=i2-i3; R4=r4+r5;R5=r4-r5;I4=i4+i5;I5=i4-i5; R6=r6+r7;R7=r6-r7;I6=i6+i7;I7=i6-i7; if(!inverse) { r0=R0+R2;i0=I0+I2; r1=R1+I3;i1=I1-R3; r2=R0-R2;i2=I0-I2; r3=R1-I3;i3=I1+R3; r4=R4+R6;i4=I4+I6; r5=R5+I7;i5=I5-R7; r6=R4-R6;i6=I4-I6; r7=R5-I7;i7=I5+R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; real[1*real_stride]=r1+p5;imag[1*imag_stride]=i1-m5; real[2*real_stride]=r2+i6;imag[2*imag_stride]=i2-r6; real[3*real_stride]=r3-m7;imag[3*imag_stride]=i3-p7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-p5;imag[5*imag_stride]=i1+m5; real[6*real_stride]=r2-i6;imag[6*imag_stride]=i2+r6; real[7*real_stride]=r3+m7;imag[7*imag_stride]=i3+p7; } else { r0=R0+R2;i0=I0+I2; r1=R1-I3;i1=I1+R3; r2=R0-R2;i2=I0-I2; r3=R1+I3;i3=I1-R3; r4=R4+R6;i4=I4+I6; r5=R5-I7;i5=I5+R7; r6=R4-R6;i6=I4-I6; r7=R5+I7;i7=I5-R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; 
real[1*real_stride]=r1+m5;imag[1*imag_stride]=i1+p5; real[2*real_stride]=r2-i6;imag[2*imag_stride]=i2+r6; real[3*real_stride]=r3-p7;imag[3*imag_stride]=i3+m7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-m5;imag[5*imag_stride]=i1-p5; real[6*real_stride]=r2+i6;imag[6*imag_stride]=i2-r6; real[7*real_stride]=r3+p7;imag[7*imag_stride]=i3-m7; } } static void dbcF_butterfly_block_f( ptrdiff_t log2n, ptrdiff_t log2b, float *LR,float *LI, float *HR,float *HI, ptrdiff_t real_stride,ptrdiff_t imag_stride, float C,float S, int inverse, const float *tr,const float *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; float X,Y; if(log2b<=((10)-1)) { ptrdiff_t i,j=0,k=0; for(i=0;i<b;++i) { float c=C*tr[i]-S*ti[i]; float s=S*tr[i]+C*ti[i]; float xl=LR[j],yl=LI[k]; float xr=HR[j],yr=HI[k]; float x=c*xr-s*yr; float y=s*xr+c*yr; LR[j]=xl+x; LI[k]=yl+y; HR[j]=xl-x; HI[k]=yl-y; j+=real_stride; k+=imag_stride; } } else { dbcF_cexp_f(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_f(log2n,log2b-1,LR ,LI ,HR ,HI ,real_stride,imag_stride,C ,S ,inverse,tr,ti); dbcF_butterfly_block_f(log2n,log2b-1,LR+h*real_stride,LI+h*imag_stride,HR+h*real_stride,HI+h*imag_stride,real_stride,imag_stride,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_pass_f( ptrdiff_t log2n, ptrdiff_t log2c, float *real,float *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, ptrdiff_t log2t, const float *tr,const float *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; float *LR=real,*HR=real+h*real_stride; float *LI=imag,*HI=imag+h*imag_stride; if(log2n==0) return; if(log2n-1<=log2t) { if(h>1) { for(i=0;i<c;++i) { ptrdiff_t d,j=0,k=0; for(d=0;d<h;d+=2) { float C,S,xl,yl,xr,yr,x,y; C=tr[d];S=ti[d]; xl=LR[j];yl=LI[k]; xr=HR[j];yr=HI[k]; x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; C=tr[d+1];S=ti[d+1]; xl=LR[j];yl=LI[k]; xr=HR[j];yr=HI[k]; 
x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; } LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } else { for(i=0;i<c;++i) { float xl,yl,xr,yr; xl=LR[0];yl=LI[0]; xr=HR[0];yr=HI[0]; LR[0]=xl+xr;LI[0]=yl+yr; HR[0]=xl-xr;HI[0]=yl-yr; LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_f(log2n,log2n-1,LR,LI,HR,HI,real_stride,imag_stride,1.0f,0.0f,inverse,tr,ti); LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } static void dbcF_butterfly_multipass_f( ptrdiff_t log2n, ptrdiff_t log2c, ptrdiff_t depth, float *real,float *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, float *tr,float *ti) { while(depth>0) { ptrdiff_t log2d,log2t; ptrdiff_t d=dbcF_butterfly_multipass_optimized_float( log2n,log2c,depth, real,imag, real_stride,imag_stride, inverse, tr,ti); if(d>0) {depth-=d;continue;} if(depth==log2n&&depth>=3) { ptrdiff_t j,m=(((ptrdiff_t)1)<<(log2n+log2c-3)); dbcF_cexp_f(3,tr,ti); for(j=0;j<m;++j) dbcF_fft8_f(real+8*real_stride*j,imag+8*imag_stride*j,real_stride,imag_stride,inverse,tr[0]); depth-=3; continue; } log2d=log2n-depth+1; log2t=(log2d-1<((10)-1)?log2d-1:((10)-1)); dbcF_compute_twiddles_f(log2d,log2t,tr,ti,inverse); dbcF_butterfly_pass_f( log2d, log2c+log2n-log2d, real,imag, real_stride,imag_stride, inverse, log2t, tr,ti); depth-=1; } } static void dbcF_butterfly_f( ptrdiff_t log2n, float *real,float *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, float *tmp) { float *tr=tmp; float *ti=tmp+(((ptrdiff_t)1)<<(((10)-1))); if(log2n>12) { dbcF_butterfly_f( log2n-1, real,imag, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_f( log2n-1, real+(((ptrdiff_t)1)<<(log2n-1))*real_stride,imag+(((ptrdiff_t)1)<<(log2n-1))*imag_stride, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_multipass_f( log2n,0,1, real,imag, real_stride,imag_stride, 
inverse, tr,ti); } else { dbcF_butterfly_multipass_f( log2n,0,log2n, real,imag, real_stride,imag_stride, inverse, tr,ti); } } static void dbcF_deinterleave_f(float *dst,ptrdiff_t log2n,float *tmp) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(n<=2) return; if(n<=(((ptrdiff_t)1)<<(10))) { ptrdiff_t i,h=n>>1; float *real=tmp,*imag=tmp+h; for(i=0;i<h;++i) { real[i]=dst[2*i+0]; imag[i]=dst[2*i+1]; } for(i=0;i<n;++i) dst[i]=tmp[i]; return; } dbcF_bitreversal_permutation_f(log2n ,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_f(log2n-1,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_f(log2n-1,dst+h,1,dst+h,1,tmp); } static void dbcF_interleave_f(float *dst,ptrdiff_t log2n,float *tmp) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(n<=2) return; if(n<=(((ptrdiff_t)1)<<(10))) { ptrdiff_t i,h=n>>1; const float *real=dst,*imag=dst+h; for(i=0;i<h;++i) { tmp[2*i+0]=real[i]; tmp[2*i+1]=imag[i]; } for(i=0;i<n;++i) dst[i]=tmp[i]; return; } dbcF_bitreversal_permutation_f(log2n-1,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_f(log2n-1,dst+h,1,dst+h,1,tmp); dbcF_bitreversal_permutation_f(log2n ,dst ,1,dst ,1,tmp); } static int dbcF_fft_pot_f( ptrdiff_t num_elements, const float *src_real,const float *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, float *dst_real, float *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, float scale) { __attribute__((aligned(32))) float dummy[2]={0.0f,0.0f}; __attribute__((aligned(32))) float tmp[(((ptrdiff_t)1)<<(10))]; ptrdiff_t n=num_elements; ptrdiff_t log2n=(ptrdiff_t)-1; ptrdiff_t i; int needs_deinterleave=(dst_real_stride==2&&dst_imag_stride==2&&dst_imag==dst_real+1&&num_elements>16); while(n) {n>>=1;++log2n;} if(!src_real) {src_real=dummy ;src_real_stride=0;} if(!src_imag) {src_imag=dummy+1;src_imag_stride=0;} dbcF_bitreversal_permutation_f(log2n,src_real,src_real_stride,dst_real,dst_real_stride,tmp); 
dbcF_bitreversal_permutation_f(log2n,src_imag,src_imag_stride,dst_imag,dst_imag_stride,tmp); if(needs_deinterleave) { dbcF_deinterleave_f(dst_real,log2n+1,tmp); dbcF_butterfly_f( log2n, dst_real,dst_real+num_elements, 1,1, inverse,tmp); } else dbcF_butterfly_f( log2n, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse,tmp); if(needs_deinterleave) dbcF_interleave_f(dst_real,log2n+1,tmp); /* scaling must honour the output strides, otherwise strided/interleaved results are scaled at the wrong indices */ if(scale!=1.0f) for(i=0;i<num_elements;++i) { dst_real[i*dst_real_stride]=dst_real[i*dst_real_stride]*scale; dst_imag[i*dst_imag_stride]=dst_imag[i*dst_imag_stride]*scale; } return 0; } static int dbcF_fft_npot_f( ptrdiff_t n, const float *src_real,const float *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, float *dst_real, float *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, float scale) { unsigned char *buf,*mem=0; float *ar,*ai,*br,*bi,*tr,*ti; float M=1.0f; float dummy[2]={0.0f,0.0f}; ptrdiff_t i,j,log2m=0,m; ptrdiff_t alignment=64; /* a null source means all-zero input, as in dbcF_fft_pot_f */ if(!src_real) {src_real=dummy ;src_real_stride=0;} if(!src_imag) {src_imag=dummy+1;src_imag_stride=0;} while((((ptrdiff_t)1)<<(log2m))<2*n-1) {++log2m;M=M+M;} m=(((ptrdiff_t)1)<<(log2m)); if(!(mem=(unsigned char*)malloc((size_t)((4*m+4*n)*(ptrdiff_t)sizeof(float)+alignment)))) return (-2); buf=mem; if(alignment) { ptrdiff_t offset=((ptrdiff_t)buf)&(alignment-1); if(offset) buf+=alignment-offset; } ar=(float*)buf+0*m; ai=(float*)buf+1*m; br=(float*)buf+2*m; bi=(float*)buf+3*m; tr=(float*)buf+4*m+0*n; ti=(float*)buf+4*m+2*n; dbcF_compute_twiddles_npot_f(2*n,tr,ti,inverse); for(i=0,j=0;i<n;++i) { float c=tr[j],s=ti[j]; float x=src_real[i*src_real_stride],y=src_imag[i*src_imag_stride]; ar[i]=x*c-y*s; ai[i]=x*s+y*c; br[i]= c; bi[i]=-s; if(i>0) { br[m-i]= c; bi[m-i]=-s; } j+=(2*i+1); if(j>=2*n) j-=2*n; } for(i=n;i<m;++i) { ar[i]=0.0f; ai[i]=0.0f; } for(i=n;i<=m-n;++i) { br[i]=0.0f; bi[i]=0.0f; } dbcF_fft_pot_f(m,ar,ai,1,1,ar,ai,1,1,0,1.0f/M); dbcF_fft_pot_f(m,br,bi,1,1,br,bi,1,1,0,1.0f); for(i=0;i<m;++i) { float c=br[i],s=bi[i],x=ar[i],y=ai[i]; ar[i]=c*x-s*y; ai[i]=c*y+s*x; } dbcF_fft_pot_f(m,ar,ai,1,1,ar,ai,1,1,1,scale); for(i=0,j=0;i<n;++i) { float c=tr[j],s=ti[j],x=ar[i],y=ai[i]; 
dst_real[i*dst_real_stride]=c*x-s*y; dst_imag[i*dst_imag_stride]=c*y+s*x; j+=(2*i+1); if(j>=2*n) j-=2*n; } free(mem); return 0; } static int dbcF_fft_f( ptrdiff_t num_elements, const float *src_real,const float *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, float *dst_real, float *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, float scale) { dbcF_init(); if(num_elements<1) return 0; if(src_real==dst_real&&src_real_stride!=dst_real_stride) return (-1); if(src_imag==dst_imag&&src_imag_stride!=dst_imag_stride) return (-1); if(src_imag==dst_real) return (-1); if(src_real==dst_imag) return (-1); if(num_elements&(num_elements-1)) return dbcF_fft_npot_f( num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); return dbcF_fft_pot_f( num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); } extern int dbc_fft_fc( ptrdiff_t num_elements, const float *src_real,const float *src_imag, float *dst_real, float *dst_imag, float scale) { return dbcF_fft_f(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 0, scale); } extern int dbc_ifft_fc( ptrdiff_t num_elements, const float *src_real,const float *src_imag, float *dst_real, float *dst_imag, float scale) { return dbcF_fft_f(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 1, scale); } /* interleaved output always has stride 2, even when src is null */ extern int dbc_fft_fi( ptrdiff_t num_elements, const float *src, float *dst, float scale) { return dbcF_fft_f(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, 2,2, 0, scale); } extern int dbc_ifft_fi( ptrdiff_t num_elements, const float *src, float *dst, float scale) { return dbcF_fft_f(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, 2,2, 1, scale); } extern int dbc_fft_fs( ptrdiff_t num_elements, const float *src_real,const float *src_imag, ptrdiff_t 
src_real_stride,ptrdiff_t src_imag_stride, float *dst_real, float *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, float scale) { return dbcF_fft_f(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 0, scale); } extern int dbc_ifft_fs( ptrdiff_t num_elements, const float *src_real,const float *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, float *dst_real, float *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, float scale) { return dbcF_fft_f(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 1, scale); } extern int dbc_fft_fp(dbcf_params_f *params) { return dbcF_fft_f(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 0, params->scale); } extern int dbc_ifft_fp(dbcf_params_f *params) { return dbcF_fft_f(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 1, params->scale); } static void dbcF_cexpm1_d(ptrdiff_t log2n,double *real,double *imag) { static double table[][2]={ { (0.0e0) ,(0.0e0) }, {-(2.0e0) ,(0.0e0) }, {-(1.0e0) ,(1.0e0) }, {-(2.928932188134524755991556378951509607151e-1),(7.071067811865475244008443621048490392848e-1)}, {-(7.612046748871324387181681060321171317758e-2),(3.826834323650897717284599840303988667613e-1)}, {-(1.921471959676955087381776386576096302606e-2),(1.950903220161282678482848684770222409276e-1)}, {-(4.815273327803113755163046890520078424525e-3),(9.801714032956060199419556388864184586113e-2)}, {-(1.204543794827607285228395240899305556796e-3),(4.906767432741801425495497694268265831474e-2)}, {-(3.011813037957798842343503338278031499389e-4),(2.454122852291228803173452945928292506546e-2)}, 
{-(7.529816085545907835350880361677564939353e-5),(1.227153828571992607940826195100321214037e-2)}, {-(1.882471739885734300956227143228382608274e-5),(6.135884649154475359640234590372580917057e-3)}, {-(4.706190423828488419874299880100447012366e-6),(3.067956762965976270145365490919842518944e-3)}, {-(1.176548298090070974289828473980951732077e-6),(1.533980186284765612303697150264079079954e-3)}, {-(2.941371177808397717822612343228837361006e-7),(7.669903187427045269385683579485766431409e-4)}, {-(7.353428214885526851929261214305179884431e-8),(3.834951875713955890724616811813812633950e-4)}, {-(1.838357070619165308459709028549492394875e-8),(1.917475973107033074399095619890009334688e-4)}, {-(4.595892687109028066860393851041105696810e-9),(9.587379909597734587051721097647635118706e-5)} }; if(log2n<(ptrdiff_t)(sizeof(table)/(sizeof(table[0])))) { *real=table[log2n][0]; *imag=table[log2n][1]; } else { ptrdiff_t n=((ptrdiff_t)1)<<log2n; const double C1=(1.0e0); const double C2=(5.0e-1); const double C3=(1.666666666666666666666666666666666666666e-1); const double C4=(4.166666666666666666666666666666666666666e-2); const double C5=(8.333333333333333333333333333333333333333e-3); const double C6=(1.388888888888888888888888888888888888888e-3); const double C7=(1.984126984126984126984126984126984126984e-4); const double C8=(2.480158730158730158730158730158730158730e-5); double x=(6.283185307179586476925286766559005768)/(double)n; double x2=x*x; *real=-x2*(C2-x2*(C4-x2*(C6-x2*C8))); *imag=x*(C1-x2*(C3-x2*(C5-x2*C7))); } } static void dbcF_cexp_d(ptrdiff_t log2n,double *real,double *imag) { dbcF_cexpm1_d(log2n,real,imag); *real=(1.0)+*real; } static void dbcF_compute_twiddles_d(ptrdiff_t log2n,ptrdiff_t log2b,double *real,double *imag,int inverse) { ptrdiff_t i; real[0]=(0.0); imag[0]=(0.0); for(i=0;i<log2b;++i) { ptrdiff_t j,k=(((ptrdiff_t)1)<<(i)); double x,y; dbcF_cexpm1_d(log2n-i,&x,&y); if(!inverse) y=-y; for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); 
imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=(1.0)+real[i]; } static void dbcF_cexpm1_npot_d(ptrdiff_t p,ptrdiff_t q,double *real,double *imag) { ptrdiff_t i; double C=(1.0),S=(1.0); double x=(6.283185307179586476925286766559005768)*(double)p/(double)q,x2=x*x; double I=(32.0); for(i=32;i>=0;--i) { double J=((2.0)*I+(3.0)),K=I+I+(3.0); J=J*J; C=(1.0)-x2*C/(J+K); S=(1.0)-x2*S/(J-K); I=I-(1.0); } C=-C*(0.5)*x2; S=S*x; *real=C; *imag=S; } static void dbcF_compute_twiddles_npot_d(ptrdiff_t n,double *real,double *imag,int inverse) { ptrdiff_t i,j,k,m=n>>1,h=(m+2)>>1; if(n<1) return; real[0]=(0.0); imag[0]=(0.0); for(i=1;i<h;i*=2) { double X,Y; dbcF_cexpm1_npot_d(i,n,&X,&Y); if(!inverse) Y=-Y; j=(h<i*2?h-i:i); for(k=0;k<j;++k) { real[i+k]=(X*real[k]-Y*imag[k])+(X+real[k]); imag[i+k]=(Y*real[k]+X*imag[k])+(Y+imag[k]); } } for(i=0;i<h;++i) real[i]=(1.0)+real[i]; for(i=h;i<m;++i) { real[i]=-real[m-i]; imag[i]= imag[m-i]; } for(i=0;i<m;++i) { real[m+i]=-real[i]; imag[m+i]=-imag[i]; } } static void dbcF_bitreversal_swap_d(ptrdiff_t log2n,double *src,ptrdiff_t src_stride,double *dst,ptrdiff_t dst_stride) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); double x=src[i*src_stride]; double y=dst[j*dst_stride]; src[i*src_stride]=y; dst[j*dst_stride]=x; } } else { dbcF_bitreversal_swap_d(log2n-1,src ,2*src_stride,dst ,dst_stride); dbcF_bitreversal_swap_d(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride); } } static void dbcF_bitreversal_permutation_d(ptrdiff_t log2n,const double *src,ptrdiff_t src_stride,double *dst,ptrdiff_t dst_stride,double *tmp) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(src_stride==0) { double x=src[0]; for(i=0;i<n;++i) dst[i*dst_stride]=x; } else if(src==dst) { if(log2n<=8) { const unsigned char 
*idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); if(i<j) { double x=dst[i*dst_stride]; double y=dst[j*dst_stride]; dst[i*dst_stride]=y; dst[j*dst_stride]=x; } } } else if(log2n<=2*((((10)>>1)<6?((10)>>1):6))+2||log2n<=16) { dbcF_bitreversal_swap_d(log2n-2,dst+dst_stride,2*dst_stride,dst+h*dst_stride,2*dst_stride); dbcF_bitreversal_permutation_d(log2n-2,dst ,2*dst_stride,dst ,2*dst_stride,tmp); dbcF_bitreversal_permutation_d(log2n-2,dst+(h+1)*dst_stride,2*dst_stride,dst+(h+1)*dst_stride,2*dst_stride,tmp); } else { ptrdiff_t a,b,c,log2m=log2n-2*((((10)>>1)<6?((10)>>1):6)); ptrdiff_t m=(((ptrdiff_t)1)<<(log2m)); ptrdiff_t pow2q=(((ptrdiff_t)1)<<((((10)>>1)<6?((10)>>1):6))); for(b=0;b<m;++b) { ptrdiff_t ib=dbcF_bitreverse(b,log2m); if(ib<b) continue; for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]; for(c=0;c<pow2q;++c) { ptrdiff_t ic=dbcF_bitreverse(c,((((10)>>1)<6?((10)>>1):6))); for(a=0;a<pow2q;++a) { ptrdiff_t ia=dbcF_bitreverse(a,((((10)>>1)<6?((10)>>1):6))); double t; i=(ic<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(ib<<((((10)>>1)<6?((10)>>1):6)))^ia; t=dst[i*dst_stride]; dst[i*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=t; } } if(b!=ib) for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; } } } else { if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); dst[j*dst_stride]=src[i*src_stride]; } } else if(log2n<=16) { dbcF_bitreversal_permutation_d(log2n-1,src ,2*src_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_d(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride,tmp); } else { for(i=0;i<h;++i) { dst[ i 
*dst_stride]=src[(2*i )*src_stride]; dst[(i+h)*dst_stride]=src[(2*i+1)*src_stride]; } dbcF_bitreversal_permutation_d(log2n-1,dst ,dst_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_d(log2n-1,dst+h*dst_stride,dst_stride,dst+h*dst_stride,dst_stride,tmp); } } } static void dbcF_fft8_d( double *real,double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse,double c) { double r0,r1,r2,r3,r4,r5,r6,r7; double i0,i1,i2,i3,i4,i5,i6,i7; double R0,R1,R2,R3,R4,R5,R6,R7; double I0,I1,I2,I3,I4,I5,I6,I7; double p5,m5,p7,m7; r0=real[0*real_stride];i0=imag[0*imag_stride]; r1=real[1*real_stride];i1=imag[1*imag_stride]; r2=real[2*real_stride];i2=imag[2*imag_stride]; r3=real[3*real_stride];i3=imag[3*imag_stride]; r4=real[4*real_stride];i4=imag[4*imag_stride]; r5=real[5*real_stride];i5=imag[5*imag_stride]; r6=real[6*real_stride];i6=imag[6*imag_stride]; r7=real[7*real_stride];i7=imag[7*imag_stride]; R0=r0+r1;R1=r0-r1;I0=i0+i1;I1=i0-i1; R2=r2+r3;R3=r2-r3;I2=i2+i3;I3=i2-i3; R4=r4+r5;R5=r4-r5;I4=i4+i5;I5=i4-i5; R6=r6+r7;R7=r6-r7;I6=i6+i7;I7=i6-i7; if(!inverse) { r0=R0+R2;i0=I0+I2; r1=R1+I3;i1=I1-R3; r2=R0-R2;i2=I0-I2; r3=R1-I3;i3=I1+R3; r4=R4+R6;i4=I4+I6; r5=R5+I7;i5=I5-R7; r6=R4-R6;i6=I4-I6; r7=R5-I7;i7=I5+R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; real[1*real_stride]=r1+p5;imag[1*imag_stride]=i1-m5; real[2*real_stride]=r2+i6;imag[2*imag_stride]=i2-r6; real[3*real_stride]=r3-m7;imag[3*imag_stride]=i3-p7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-p5;imag[5*imag_stride]=i1+m5; real[6*real_stride]=r2-i6;imag[6*imag_stride]=i2+r6; real[7*real_stride]=r3+m7;imag[7*imag_stride]=i3+p7; } else { r0=R0+R2;i0=I0+I2; r1=R1-I3;i1=I1+R3; r2=R0-R2;i2=I0-I2; r3=R1+I3;i3=I1-R3; r4=R4+R6;i4=I4+I6; r5=R5-I7;i5=I5+R7; r6=R4-R6;i6=I4-I6; r7=R5+I7;i7=I5-R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; 
real[1*real_stride]=r1+m5;imag[1*imag_stride]=i1+p5; real[2*real_stride]=r2-i6;imag[2*imag_stride]=i2+r6; real[3*real_stride]=r3-p7;imag[3*imag_stride]=i3+m7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-m5;imag[5*imag_stride]=i1-p5; real[6*real_stride]=r2+i6;imag[6*imag_stride]=i2-r6; real[7*real_stride]=r3+p7;imag[7*imag_stride]=i3-m7; } } static void dbcF_butterfly_block_d( ptrdiff_t log2n, ptrdiff_t log2b, double *LR,double *LI, double *HR,double *HI, ptrdiff_t real_stride,ptrdiff_t imag_stride, double C,double S, int inverse, const double *tr,const double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; double X,Y; if(log2b<=((10)-1)) { ptrdiff_t i,j=0,k=0; for(i=0;i<b;++i) { double c=C*tr[i]-S*ti[i]; double s=S*tr[i]+C*ti[i]; double xl=LR[j],yl=LI[k]; double xr=HR[j],yr=HI[k]; double x=c*xr-s*yr; double y=s*xr+c*yr; LR[j]=xl+x; LI[k]=yl+y; HR[j]=xl-x; HI[k]=yl-y; j+=real_stride; k+=imag_stride; } } else { dbcF_cexp_d(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_d(log2n,log2b-1,LR ,LI ,HR ,HI ,real_stride,imag_stride,C ,S ,inverse,tr,ti); dbcF_butterfly_block_d(log2n,log2b-1,LR+h*real_stride,LI+h*imag_stride,HR+h*real_stride,HI+h*imag_stride,real_stride,imag_stride,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_pass_d( ptrdiff_t log2n, ptrdiff_t log2c, double *real,double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, ptrdiff_t log2t, const double *tr,const double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; double *LR=real,*HR=real+h*real_stride; double *LI=imag,*HI=imag+h*imag_stride; if(log2n==0) return; if(log2n-1<=log2t) { if(h>1) { for(i=0;i<c;++i) { ptrdiff_t d,j=0,k=0; for(d=0;d<h;d+=2) { double C,S,xl,yl,xr,yr,x,y; C=tr[d];S=ti[d]; xl=LR[j];yl=LI[k]; xr=HR[j];yr=HI[k]; x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; C=tr[d+1];S=ti[d+1]; xl=LR[j];yl=LI[k]; 
xr=HR[j];yr=HI[k]; x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; } LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } else { for(i=0;i<c;++i) { double xl,yl,xr,yr; xl=LR[0];yl=LI[0]; xr=HR[0];yr=HI[0]; LR[0]=xl+xr;LI[0]=yl+yr; HR[0]=xl-xr;HI[0]=yl-yr; LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_d(log2n,log2n-1,LR,LI,HR,HI,real_stride,imag_stride,(1.0),(0.0),inverse,tr,ti); LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } static void dbcF_butterfly_multipass_d( ptrdiff_t log2n, ptrdiff_t log2c, ptrdiff_t depth, double *real,double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, double *tr,double *ti) { while(depth>0) { ptrdiff_t log2d,log2t; ptrdiff_t d=dbcF_butterfly_multipass_optimized_double( log2n,log2c,depth, real,imag, real_stride,imag_stride, inverse, tr,ti); if(d>0) {depth-=d;continue;} if(depth==log2n&&depth>=3) { ptrdiff_t j,m=(((ptrdiff_t)1)<<(log2n+log2c-3)); dbcF_cexp_d(3,tr,ti); for(j=0;j<m;++j) dbcF_fft8_d(real+8*real_stride*j,imag+8*imag_stride*j,real_stride,imag_stride,inverse,tr[0]); depth-=3; continue; } log2d=log2n-depth+1; log2t=(log2d-1<((10)-1)?log2d-1:((10)-1)); dbcF_compute_twiddles_d(log2d,log2t,tr,ti,inverse); dbcF_butterfly_pass_d( log2d, log2c+log2n-log2d, real,imag, real_stride,imag_stride, inverse, log2t, tr,ti); depth-=1; } } static void dbcF_butterfly_d( ptrdiff_t log2n, double *real,double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, double *tmp) { double *tr=tmp; double *ti=tmp+(((ptrdiff_t)1)<<(((10)-1))); if(log2n>12) { dbcF_butterfly_d( log2n-1, real,imag, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_d( log2n-1, real+(((ptrdiff_t)1)<<(log2n-1))*real_stride,imag+(((ptrdiff_t)1)<<(log2n-1))*imag_stride, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_multipass_d( log2n,0,1, real,imag, 
real_stride,imag_stride, inverse, tr,ti); } else { dbcF_butterfly_multipass_d( log2n,0,log2n, real,imag, real_stride,imag_stride, inverse, tr,ti); } } static void dbcF_deinterleave_d(double *dst,ptrdiff_t log2n,double *tmp) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(n<=2) return; if(n<=(((ptrdiff_t)1)<<(10))) { ptrdiff_t i,h=n>>1; double *real=tmp,*imag=tmp+h; for(i=0;i<h;++i) { real[i]=dst[2*i+0]; imag[i]=dst[2*i+1]; } for(i=0;i<n;++i) dst[i]=tmp[i]; return; } dbcF_bitreversal_permutation_d(log2n ,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_d(log2n-1,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_d(log2n-1,dst+h,1,dst+h,1,tmp); } static void dbcF_interleave_d(double *dst,ptrdiff_t log2n,double *tmp) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(n<=2) return; if(n<=(((ptrdiff_t)1)<<(10))) { ptrdiff_t i,h=n>>1; const double *real=dst,*imag=dst+h; for(i=0;i<h;++i) { tmp[2*i+0]=real[i]; tmp[2*i+1]=imag[i]; } for(i=0;i<n;++i) dst[i]=tmp[i]; return; } dbcF_bitreversal_permutation_d(log2n-1,dst ,1,dst ,1,tmp); dbcF_bitreversal_permutation_d(log2n-1,dst+h,1,dst+h,1,tmp); dbcF_bitreversal_permutation_d(log2n ,dst ,1,dst ,1,tmp); } static int dbcF_fft_pot_d( ptrdiff_t num_elements, const double *src_real,const double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, double *dst_real, double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, double scale) { __attribute__((aligned(32))) double dummy[2]={(0.0),(0.0)}; __attribute__((aligned(32))) double tmp[(((ptrdiff_t)1)<<(10))]; ptrdiff_t n=num_elements; ptrdiff_t log2n=(ptrdiff_t)-1; ptrdiff_t i; int needs_deinterleave=(dst_real_stride==2&&dst_imag_stride==2&&dst_imag==dst_real+1&&num_elements>16); while(n) {n>>=1;++log2n;} if(!src_real) {src_real=dummy ;src_real_stride=0;} if(!src_imag) {src_imag=dummy+1;src_imag_stride=0;} dbcF_bitreversal_permutation_d(log2n,src_real,src_real_stride,dst_real,dst_real_stride,tmp); 
dbcF_bitreversal_permutation_d(log2n,src_imag,src_imag_stride,dst_imag,dst_imag_stride,tmp); if(needs_deinterleave) { dbcF_deinterleave_d(dst_real,log2n+1,tmp); dbcF_butterfly_d( log2n, dst_real,dst_real+num_elements, 1,1, inverse,tmp); } else dbcF_butterfly_d( log2n, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse,tmp); if(needs_deinterleave) dbcF_interleave_d(dst_real,log2n+1,tmp); /* scale each output element at its strided position */ if(scale!=(1.0)) for(i=0;i<num_elements;++i) { dst_real[i*dst_real_stride]=dst_real[i*dst_real_stride]*scale; dst_imag[i*dst_imag_stride]=dst_imag[i*dst_imag_stride]*scale; } return 0; } static int dbcF_fft_npot_d( ptrdiff_t n, const double *src_real,const double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, double *dst_real, double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, double scale) { unsigned char *buf,*mem=0; double *ar,*ai,*br,*bi,*tr,*ti; double M=(1.0); ptrdiff_t i,j,log2m=0,m; ptrdiff_t alignment=64; while((((ptrdiff_t)1)<<(log2m))<2*n-1) {++log2m;M=M+M;} m=(((ptrdiff_t)1)<<(log2m)); if(!(mem=(unsigned char*)malloc((size_t)((4*m+4*n)*(ptrdiff_t)sizeof(double)+alignment)))) return (-2); buf=mem; if(alignment) { ptrdiff_t offset=((ptrdiff_t)buf)&(alignment-1); if(offset) buf+=alignment-offset; } ar=(double*)buf+0*m; ai=(double*)buf+1*m; br=(double*)buf+2*m; bi=(double*)buf+3*m; tr=(double*)buf+4*m+0*n; ti=(double*)buf+4*m+2*n; dbcF_compute_twiddles_npot_d(2*n,tr,ti,inverse); for(i=0,j=0;i<n;++i) { double c=tr[j],s=ti[j]; /* a NULL source component reads as zeros, matching the power-of-two path */ double x=(src_real?src_real[i*src_real_stride]:(0.0)),y=(src_imag?src_imag[i*src_imag_stride]:(0.0)); ar[i]=x*c-y*s; ai[i]=x*s+y*c; br[i]= c; bi[i]=-s; if(i>0) { br[m-i]= c; bi[m-i]=-s; } j+=(2*i+1); if(j>=2*n) j-=2*n; } for(i=n;i<m;++i) { ar[i]=(0.0); ai[i]=(0.0); } for(i=n;i<=m-n;++i) { br[i]=(0.0); bi[i]=(0.0); } dbcF_fft_pot_d(m,ar,ai,1,1,ar,ai,1,1,0,(1.0)/M); dbcF_fft_pot_d(m,br,bi,1,1,br,bi,1,1,0,(1.0)); for(i=0;i<m;++i) { double c=br[i],s=bi[i],x=ar[i],y=ai[i]; ar[i]=c*x-s*y; ai[i]=c*y+s*x; } dbcF_fft_pot_d(m,ar,ai,1,1,ar,ai,1,1,1,scale); for(i=0,j=0;i<n;++i) { double 
c=tr[j],s=ti[j],x=ar[i],y=ai[i]; dst_real[i*dst_real_stride]=c*x-s*y; dst_imag[i*dst_imag_stride]=c*y+s*x; j+=(2*i+1); if(j>=2*n) j-=2*n; } free(mem); return 0; } static int dbcF_fft_d( ptrdiff_t num_elements, const double *src_real,const double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, double *dst_real, double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, double scale) { dbcF_init(); if(num_elements<1) return 0; if(src_real==dst_real&&src_real_stride!=dst_real_stride) return (-1); if(src_imag==dst_imag&&src_imag_stride!=dst_imag_stride) return (-1); if(src_imag==dst_real) return (-1); if(src_real==dst_imag) return (-1); if(num_elements&(num_elements-1)) return dbcF_fft_npot_d( num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); return dbcF_fft_pot_d( num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); } extern int dbc_fft_dc( ptrdiff_t num_elements, const double *src_real,const double *src_imag, double *dst_real, double *dst_imag, double scale) { return dbcF_fft_d(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 0, scale); } extern int dbc_ifft_dc( ptrdiff_t num_elements, const double *src_real,const double *src_imag, double *dst_real, double *dst_imag, double scale) { return dbcF_fft_d(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 1, scale); } /* interleaved output always uses stride 2, even when src is NULL */ extern int dbc_fft_di( ptrdiff_t num_elements, const double *src, double *dst, double scale) { return dbcF_fft_d(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, 2,2, 0, scale); } extern int dbc_ifft_di( ptrdiff_t num_elements, const double *src, double *dst, double scale) { return dbcF_fft_d(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, 2,2, 1, scale); } extern int dbc_fft_ds( ptrdiff_t num_elements, const 
double *src_real,const double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, double *dst_real, double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, double scale) { return dbcF_fft_d(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 0, scale); } extern int dbc_ifft_ds( ptrdiff_t num_elements, const double *src_real,const double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, double *dst_real, double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, double scale) { return dbcF_fft_d(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 1, scale); } extern int dbc_fft_dp(dbcf_params_d *params) { return dbcF_fft_d(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 0, params->scale); } extern int dbc_ifft_dp(dbcf_params_d *params) { return dbcF_fft_d(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 1, params->scale); } static void dbcF_cexpm1_l(ptrdiff_t log2n,long double *real,long double *imag) { static long double table[][2]={ { 0.0e0l ,0.0e0l }, {-2.0e0l ,0.0e0l }, {-1.0e0l ,1.0e0l }, {-2.928932188134524755991556378951509607151e-1l,7.071067811865475244008443621048490392848e-1l}, {-7.612046748871324387181681060321171317758e-2l,3.826834323650897717284599840303988667613e-1l}, {-1.921471959676955087381776386576096302606e-2l,1.950903220161282678482848684770222409276e-1l}, {-4.815273327803113755163046890520078424525e-3l,9.801714032956060199419556388864184586113e-2l}, {-1.204543794827607285228395240899305556796e-3l,4.906767432741801425495497694268265831474e-2l}, 
{-3.011813037957798842343503338278031499389e-4l,2.454122852291228803173452945928292506546e-2l}, {-7.529816085545907835350880361677564939353e-5l,1.227153828571992607940826195100321214037e-2l}, {-1.882471739885734300956227143228382608274e-5l,6.135884649154475359640234590372580917057e-3l}, {-4.706190423828488419874299880100447012366e-6l,3.067956762965976270145365490919842518944e-3l}, {-1.176548298090070974289828473980951732077e-6l,1.533980186284765612303697150264079079954e-3l}, {-2.941371177808397717822612343228837361006e-7l,7.669903187427045269385683579485766431409e-4l}, {-7.353428214885526851929261214305179884431e-8l,3.834951875713955890724616811813812633950e-4l}, {-1.838357070619165308459709028549492394875e-8l,1.917475973107033074399095619890009334688e-4l}, {-4.595892687109028066860393851041105696810e-9l,9.587379909597734587051721097647635118706e-5l} }; if(log2n<(ptrdiff_t)(sizeof(table)/(sizeof(table[0])))) { *real=table[log2n][0]; *imag=table[log2n][1]; } else { ptrdiff_t n=((ptrdiff_t)1)<<log2n; const long double C1=1.0e0l; const long double C2=5.0e-1l; const long double C3=1.666666666666666666666666666666666666666e-1l; const long double C4=4.166666666666666666666666666666666666666e-2l; const long double C5=8.333333333333333333333333333333333333333e-3l; const long double C6=1.388888888888888888888888888888888888888e-3l; const long double C7=1.984126984126984126984126984126984126984e-4l; const long double C8=2.480158730158730158730158730158730158730e-5l; long double x=6.283185307179586476925286766559005768l/(long double)n; long double x2=x*x; *real=-x2*(C2-x2*(C4-x2*(C6-x2*C8))); *imag=x*(C1-x2*(C3-x2*(C5-x2*C7))); } } static void dbcF_cexp_l(ptrdiff_t log2n,long double *real,long double *imag) { dbcF_cexpm1_l(log2n,real,imag); *real=1.0l +*real; } static void dbcF_compute_twiddles_l(ptrdiff_t log2n,ptrdiff_t log2b,long double *real,long double *imag,int inverse) { ptrdiff_t i; real[0]=0.0l; imag[0]=0.0l; for(i=0;i<log2b;++i) { ptrdiff_t 
j,k=(((ptrdiff_t)1)<<(i)); long double x,y; dbcF_cexpm1_l(log2n-i,&x,&y); if(!inverse) y=-y; for(j=0;j<k;++j) { real[k+j]=(x*real[j]-y*imag[j])+(x+real[j]); imag[k+j]=(y*real[j]+x*imag[j])+(y+imag[j]); } } for(i=0;i<(((ptrdiff_t)1)<<(log2b));++i) real[i]=1.0l +real[i]; } static void dbcF_cexpm1_npot_l(ptrdiff_t p,ptrdiff_t q,long double *real,long double *imag) { ptrdiff_t i; long double C=1.0l,S=1.0l; long double x=6.283185307179586476925286766559005768l*(long double)p/(long double)q,x2=x*x; long double I=32.0l; for(i=32;i>=0;--i) { long double J=(2.0l*I+3.0l),K=I+I+3.0l; J=J*J; C=1.0l -x2*C/(J+K); S=1.0l -x2*S/(J-K); I=I-1.0l; } C=-C*0.5l*x2; S=S*x; *real=C; *imag=S; } static void dbcF_compute_twiddles_npot_l(ptrdiff_t n,long double *real,long double *imag,int inverse) { ptrdiff_t i,j,k,m=n>>1,h=(m+2)>>1; if(n<1) return; real[0]=0.0l; imag[0]=0.0l; for(i=1;i<h;i*=2) { long double X,Y; dbcF_cexpm1_npot_l(i,n,&X,&Y); if(!inverse) Y=-Y; j=(h<i*2?h-i:i); for(k=0;k<j;++k) { real[i+k]=(X*real[k]-Y*imag[k])+(X+real[k]); imag[i+k]=(Y*real[k]+X*imag[k])+(Y+imag[k]); } } for(i=0;i<h;++i) real[i]=1.0l +real[i]; for(i=h;i<m;++i) { real[i]=-real[m-i]; imag[i]= imag[m-i]; } for(i=0;i<m;++i) { real[m+i]=-real[i]; imag[m+i]=-imag[i]; } } static void dbcF_bitreversal_swap_l(ptrdiff_t log2n,long double *src,ptrdiff_t src_stride,long double *dst,ptrdiff_t dst_stride) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); long double x=src[i*src_stride]; long double y=dst[j*dst_stride]; src[i*src_stride]=y; dst[j*dst_stride]=x; } } else { dbcF_bitreversal_swap_l(log2n-1,src ,2*src_stride,dst ,dst_stride); dbcF_bitreversal_swap_l(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride); } } static void dbcF_bitreversal_permutation_l(ptrdiff_t log2n,const long double *src,ptrdiff_t src_stride,long double *dst,ptrdiff_t dst_stride,long double 
*tmp) { ptrdiff_t i,n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; if(src_stride==0) { long double x=src[0]; for(i=0;i<n;++i) dst[i*dst_stride]=x; } else if(src==dst) { if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); if(i<j) { long double x=dst[i*dst_stride]; long double y=dst[j*dst_stride]; dst[i*dst_stride]=y; dst[j*dst_stride]=x; } } } else if(log2n<=2*((((10)>>1)<6?((10)>>1):6))+2||log2n<=16) { dbcF_bitreversal_swap_l(log2n-2,dst+dst_stride,2*dst_stride,dst+h*dst_stride,2*dst_stride); dbcF_bitreversal_permutation_l(log2n-2,dst ,2*dst_stride,dst ,2*dst_stride,tmp); dbcF_bitreversal_permutation_l(log2n-2,dst+(h+1)*dst_stride,2*dst_stride,dst+(h+1)*dst_stride,2*dst_stride,tmp); } else { ptrdiff_t a,b,c,log2m=log2n-2*((((10)>>1)<6?((10)>>1):6)); ptrdiff_t m=(((ptrdiff_t)1)<<(log2m)); ptrdiff_t pow2q=(((ptrdiff_t)1)<<((((10)>>1)<6?((10)>>1):6))); for(b=0;b<m;++b) { ptrdiff_t ib=dbcF_bitreverse(b,log2m); if(ib<b) continue; for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]; for(c=0;c<pow2q;++c) { ptrdiff_t ic=dbcF_bitreverse(c,((((10)>>1)<6?((10)>>1):6))); for(a=0;a<pow2q;++a) { ptrdiff_t ia=dbcF_bitreverse(a,((((10)>>1)<6?((10)>>1):6))); long double t; i=(ic<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(ib<<((((10)>>1)<6?((10)>>1):6)))^ia; t=dst[i*dst_stride]; dst[i*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]=t; } } if(b!=ib) for(a=0;a<pow2q;++a) for(c=0;c<pow2q;++c) dst[((a<<(log2n-((((10)>>1)<6?((10)>>1):6))))^(b<<((((10)>>1)<6?((10)>>1):6)))^c)*dst_stride]=tmp[(a<<((((10)>>1)<6?((10)>>1):6)))^c]; } } } else { if(log2n<=8) { const unsigned char *idx=dbcF_bitreverse_table+(((ptrdiff_t)1)<<(log2n)); for(i=0;i<n;++i) { ptrdiff_t j=(ptrdiff_t)(idx[i]); dst[j*dst_stride]=src[i*src_stride]; } } else if(log2n<=16) { 
dbcF_bitreversal_permutation_l(log2n-1,src ,2*src_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_l(log2n-1,src+src_stride,2*src_stride,dst+h*dst_stride,dst_stride,tmp); } else { for(i=0;i<h;++i) { dst[ i *dst_stride]=src[(2*i )*src_stride]; dst[(i+h)*dst_stride]=src[(2*i+1)*src_stride]; } dbcF_bitreversal_permutation_l(log2n-1,dst ,dst_stride,dst ,dst_stride,tmp); dbcF_bitreversal_permutation_l(log2n-1,dst+h*dst_stride,dst_stride,dst+h*dst_stride,dst_stride,tmp); } } } static void dbcF_fft8_l( long double *real,long double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse,long double c) { long double r0,r1,r2,r3,r4,r5,r6,r7; long double i0,i1,i2,i3,i4,i5,i6,i7; long double R0,R1,R2,R3,R4,R5,R6,R7; long double I0,I1,I2,I3,I4,I5,I6,I7; long double p5,m5,p7,m7; r0=real[0*real_stride];i0=imag[0*imag_stride]; r1=real[1*real_stride];i1=imag[1*imag_stride]; r2=real[2*real_stride];i2=imag[2*imag_stride]; r3=real[3*real_stride];i3=imag[3*imag_stride]; r4=real[4*real_stride];i4=imag[4*imag_stride]; r5=real[5*real_stride];i5=imag[5*imag_stride]; r6=real[6*real_stride];i6=imag[6*imag_stride]; r7=real[7*real_stride];i7=imag[7*imag_stride]; R0=r0+r1;R1=r0-r1;I0=i0+i1;I1=i0-i1; R2=r2+r3;R3=r2-r3;I2=i2+i3;I3=i2-i3; R4=r4+r5;R5=r4-r5;I4=i4+i5;I5=i4-i5; R6=r6+r7;R7=r6-r7;I6=i6+i7;I7=i6-i7; if(!inverse) { r0=R0+R2;i0=I0+I2; r1=R1+I3;i1=I1-R3; r2=R0-R2;i2=I0-I2; r3=R1-I3;i3=I1+R3; r4=R4+R6;i4=I4+I6; r5=R5+I7;i5=I5-R7; r6=R4-R6;i6=I4-I6; r7=R5-I7;i7=I5+R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; real[1*real_stride]=r1+p5;imag[1*imag_stride]=i1-m5; real[2*real_stride]=r2+i6;imag[2*imag_stride]=i2-r6; real[3*real_stride]=r3-m7;imag[3*imag_stride]=i3-p7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-p5;imag[5*imag_stride]=i1+m5; real[6*real_stride]=r2-i6;imag[6*imag_stride]=i2+r6; real[7*real_stride]=r3+m7;imag[7*imag_stride]=i3+p7; } else { r0=R0+R2;i0=I0+I2; 
r1=R1-I3;i1=I1+R3; r2=R0-R2;i2=I0-I2; r3=R1+I3;i3=I1-R3; r4=R4+R6;i4=I4+I6; r5=R5-I7;i5=I5+R7; r6=R4-R6;i6=I4-I6; r7=R5+I7;i7=I5-R7; p5=c*(r5+i5);m5=c*(r5-i5); p7=c*(r7+i7);m7=c*(r7-i7); real[0*real_stride]=r0+r4;imag[0*imag_stride]=i0+i4; real[1*real_stride]=r1+m5;imag[1*imag_stride]=i1+p5; real[2*real_stride]=r2-i6;imag[2*imag_stride]=i2+r6; real[3*real_stride]=r3-p7;imag[3*imag_stride]=i3+m7; real[4*real_stride]=r0-r4;imag[4*imag_stride]=i0-i4; real[5*real_stride]=r1-m5;imag[5*imag_stride]=i1-p5; real[6*real_stride]=r2+i6;imag[6*imag_stride]=i2-r6; real[7*real_stride]=r3+p7;imag[7*imag_stride]=i3-m7; } } static void dbcF_butterfly_block_l( ptrdiff_t log2n, ptrdiff_t log2b, long double *LR,long double *LI, long double *HR,long double *HI, ptrdiff_t real_stride,ptrdiff_t imag_stride, long double C,long double S, int inverse, const long double *tr,const long double *ti) { ptrdiff_t b=(((ptrdiff_t)1)<<(log2b)),h=b>>1; long double X,Y; if(log2b<=((10)-1)) { ptrdiff_t i,j=0,k=0; for(i=0;i<b;++i) { long double c=C*tr[i]-S*ti[i]; long double s=S*tr[i]+C*ti[i]; long double xl=LR[j],yl=LI[k]; long double xr=HR[j],yr=HI[k]; long double x=c*xr-s*yr; long double y=s*xr+c*yr; LR[j]=xl+x; LI[k]=yl+y; HR[j]=xl-x; HI[k]=yl-y; j+=real_stride; k+=imag_stride; } } else { dbcF_cexp_l(log2n-log2b+1,&X,&Y); if(!inverse) Y=-Y; dbcF_butterfly_block_l(log2n,log2b-1,LR ,LI ,HR ,HI ,real_stride,imag_stride,C ,S ,inverse,tr,ti); dbcF_butterfly_block_l(log2n,log2b-1,LR+h*real_stride,LI+h*imag_stride,HR+h*real_stride,HI+h*imag_stride,real_stride,imag_stride,C*X-S*Y,S*X+C*Y,inverse,tr,ti); } } static void dbcF_butterfly_pass_l( ptrdiff_t log2n, ptrdiff_t log2c, long double *real,long double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, ptrdiff_t log2t, const long double *tr,const long double *ti) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)),h=n>>1; ptrdiff_t c=(((ptrdiff_t)1)<<(log2c)); ptrdiff_t i; long double *LR=real,*HR=real+h*real_stride; long double 
*LI=imag,*HI=imag+h*imag_stride; if(log2n==0) return; if(log2n-1<=log2t) { if(h>1) { for(i=0;i<c;++i) { ptrdiff_t d,j=0,k=0; for(d=0;d<h;d+=2) { long double C,S,xl,yl,xr,yr,x,y; C=tr[d];S=ti[d]; xl=LR[j];yl=LI[k]; xr=HR[j];yr=HI[k]; x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; C=tr[d+1];S=ti[d+1]; xl=LR[j];yl=LI[k]; xr=HR[j];yr=HI[k]; x=C*xr-S*yr;y=S*xr+C*yr; LR[j]=xl+x;LI[k]=yl+y; HR[j]=xl-x;HI[k]=yl-y; j+=real_stride;k+=imag_stride; } LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } else { for(i=0;i<c;++i) { long double xl,yl,xr,yr; xl=LR[0];yl=LI[0]; xr=HR[0];yr=HI[0]; LR[0]=xl+xr;LI[0]=yl+yr; HR[0]=xl-xr;HI[0]=yl-yr; LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } else { for(i=0;i<c;++i) { dbcF_butterfly_block_l(log2n,log2n-1,LR,LI,HR,HI,real_stride,imag_stride,1.0l,0.0l,inverse,tr,ti); LR+=n*real_stride;LI+=n*imag_stride; HR+=n*real_stride;HI+=n*imag_stride; } } } static void dbcF_butterfly_multipass_l( ptrdiff_t log2n, ptrdiff_t log2c, ptrdiff_t depth, long double *real,long double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, long double *tr,long double *ti) { while(depth>0) { ptrdiff_t log2d,log2t; if(depth==log2n&&depth>=3) { ptrdiff_t j,m=(((ptrdiff_t)1)<<(log2n+log2c-3)); dbcF_cexp_l(3,tr,ti); for(j=0;j<m;++j) dbcF_fft8_l(real+8*real_stride*j,imag+8*imag_stride*j,real_stride,imag_stride,inverse,tr[0]); depth-=3; continue; } log2d=log2n-depth+1; log2t=(log2d-1<((10)-1)?log2d-1:((10)-1)); dbcF_compute_twiddles_l(log2d,log2t,tr,ti,inverse); dbcF_butterfly_pass_l( log2d, log2c+log2n-log2d, real,imag, real_stride,imag_stride, inverse, log2t, tr,ti); depth-=1; } } static void dbcF_butterfly_l( ptrdiff_t log2n, long double *real,long double *imag, ptrdiff_t real_stride,ptrdiff_t imag_stride, int inverse, long double *tmp) { long double *tr=tmp; long double *ti=tmp+(((ptrdiff_t)1)<<(((10)-1))); if(log2n>12) { dbcF_butterfly_l( 
log2n-1, real,imag, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_l( log2n-1, real+(((ptrdiff_t)1)<<(log2n-1))*real_stride,imag+(((ptrdiff_t)1)<<(log2n-1))*imag_stride, real_stride,imag_stride, inverse, tmp); dbcF_butterfly_multipass_l( log2n,0,1, real,imag, real_stride,imag_stride, inverse, tr,ti); } else { dbcF_butterfly_multipass_l( log2n,0,log2n, real,imag, real_stride,imag_stride, inverse, tr,ti); } } static int dbcF_fft_pot_l( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, long double *dst_real, long double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, long double scale) { __attribute__((aligned(32))) long double dummy[2]={0.0l,0.0l}; __attribute__((aligned(32))) long double tmp[(((ptrdiff_t)1)<<(10))]; ptrdiff_t n=num_elements; ptrdiff_t log2n=(ptrdiff_t)-1; ptrdiff_t i; while(n) {n>>=1;++log2n;} if(!src_real) {src_real=dummy ;src_real_stride=0;} if(!src_imag) {src_imag=dummy+1;src_imag_stride=0;} dbcF_bitreversal_permutation_l(log2n,src_real,src_real_stride,dst_real,dst_real_stride,tmp); dbcF_bitreversal_permutation_l(log2n,src_imag,src_imag_stride,dst_imag,dst_imag_stride,tmp); dbcF_butterfly_l( log2n, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse,tmp); if(scale!=1.0l) for(i=0;i<num_elements;++i) { dst_real[i]=dst_real[i]*scale; dst_imag[i]=dst_imag[i]*scale; } return 0; } static int dbcF_fft_npot_l( ptrdiff_t n, const long double *src_real,const long double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, long double *dst_real, long double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, long double scale) { unsigned char *buf,*mem=0; long double *ar,*ai,*br,*bi,*tr,*ti; long double M=1.0l; ptrdiff_t i,j,log2m=0,m; ptrdiff_t alignment=0; while((((ptrdiff_t)1)<<(log2m))<2*n-1) {++log2m;M=M+M;} m=(((ptrdiff_t)1)<<(log2m)); if(!(mem=(unsigned 
char*)malloc((size_t)((4*m+4*n)*(ptrdiff_t)sizeof(long double)+alignment)))) return (-2); buf=mem; if(alignment) { ptrdiff_t offset=((ptrdiff_t)buf)&(alignment-1); if(offset) buf+=alignment-offset; } ar=(long double*)buf+0*m; ai=(long double*)buf+1*m; br=(long double*)buf+2*m; bi=(long double*)buf+3*m; tr=(long double*)buf+4*m+0*n; ti=(long double*)buf+4*m+2*n; dbcF_compute_twiddles_npot_l(2*n,tr,ti,inverse); for(i=0,j=0;i<n;++i) { long double c=tr[j],s=ti[j]; long double x=src_real[i*src_real_stride],y=src_imag[i*src_imag_stride]; ar[i]=x*c-y*s; ai[i]=x*s+y*c; br[i]= c; bi[i]=-s; if(i>0) { br[m-i]= c; bi[m-i]=-s; } j+=(2*i+1); if(j>=2*n) j-=2*n; } for(i=n;i<m;++i) { ar[i]=0.0l; ai[i]=0.0l; } for(i=n;i<=m-n;++i) { br[i]=0.0l; bi[i]=0.0l; } dbcF_fft_pot_l(m,ar,ai,1,1,ar,ai,1,1,0,1.0l/M); dbcF_fft_pot_l(m,br,bi,1,1,br,bi,1,1,0,1.0l); for(i=0;i<m;++i) { long double c=br[i],s=bi[i],x=ar[i],y=ai[i]; ar[i]=c*x-s*y; ai[i]=c*y+s*x; } dbcF_fft_pot_l(m,ar,ai,1,1,ar,ai,1,1,1,scale); for(i=0,j=0;i<n;++i) { long double c=tr[j],s=ti[j],x=ar[i],y=ai[i]; dst_real[i*dst_real_stride]=c*x-s*y; dst_imag[i*dst_imag_stride]=c*y+s*x; j+=(2*i+1); if(j>=2*n) j-=2*n; } free(mem); return 0; } static int dbcF_fft_l( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, long double *dst_real, long double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, int inverse, long double scale) { dbcF_init(); if(num_elements<1) return 0; if(src_real==dst_real&&src_real_stride!=dst_real_stride) return (-1); if(src_imag==dst_imag&&src_imag_stride!=dst_imag_stride) return (-1); if(src_imag==dst_real) return (-1); if(src_real==dst_imag) return (-1); if(num_elements&(num_elements-1)) return dbcF_fft_npot_l( num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); return dbcF_fft_pot_l( num_elements, src_real,src_imag, 
src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, inverse, scale); } extern int dbc_fft_lc( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, long double *dst_real, long double *dst_imag, long double scale) { return dbcF_fft_l(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 0, scale); } extern int dbc_ifft_lc( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, long double *dst_real, long double *dst_imag, long double scale) { return dbcF_fft_l(num_elements, src_real,src_imag, 1,1, dst_real,dst_imag, 1,1, 1, scale); } extern int dbc_fft_li( ptrdiff_t num_elements, const long double *src, long double *dst, long double scale) { return dbcF_fft_l(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, (src?2:0),(src?2:0), 0, scale); } extern int dbc_ifft_li( ptrdiff_t num_elements, const long double *src, long double *dst, long double scale) { return dbcF_fft_l(num_elements, src,(src?src+1:src), (src?2:0),(src?2:0), dst,dst+1, (src?2:0),(src?2:0), 1, scale); } extern int dbc_fft_ls( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, long double *dst_real, long double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, long double scale) { return dbcF_fft_l(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 0, scale); } extern int dbc_ifft_ls( ptrdiff_t num_elements, const long double *src_real,const long double *src_imag, ptrdiff_t src_real_stride,ptrdiff_t src_imag_stride, long double *dst_real, long double *dst_imag, ptrdiff_t dst_real_stride,ptrdiff_t dst_imag_stride, long double scale) { return dbcF_fft_l(num_elements, src_real,src_imag, src_real_stride,src_imag_stride, dst_real,dst_imag, dst_real_stride,dst_imag_stride, 1, scale); } extern int dbc_fft_lp(dbcf_params_l *params) { return 
dbcF_fft_l(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 0, params->scale); } extern int dbc_ifft_lp(dbcf_params_l *params) { return dbcF_fft_l(params->num_elements, params->src_real,params->src_imag, params->src_real_stride,params->src_imag_stride, params->dst_real,params->dst_imag, params->dst_real_stride,params->dst_imag_stride, 1, params->scale); } static double get_cpu_time() { return (double)clock()/(double)CLOCKS_PER_SEC; } typedef struct RNG {unsigned a,b,c,d;} RNG; static unsigned RNG_generate(RNG *x) { unsigned e = x->a - (((x->b)<<(27))|((x->b)>>(32-(27)))); x->a = x->b ^ (((x->c)<<(17))|((x->c)>>(32-(17)))); x->b = x->c + x->d; x->c = x->d + e; x->d = e + x->a; return x->d; } static void RNG_init(RNG *x,unsigned seed) { unsigned i; x->a = 0xf1ea5eed, x->b = x->c = x->d = seed; for (i=0; i<20; ++i) (void)RNG_generate(x); } static int use_mflops=1; __attribute__((aligned(64))) union Data { float buf_f[((1<<24)+64)/sizeof(float)]; double buf_d[((1<<24)+64)/sizeof(double)]; long double buf_l[((1<<24)+64)/sizeof(long double)]; struct Mixed { float bf[((1<<24)+64)/8/sizeof(float)]; double bd[((1<<24)+64)/8/sizeof(float)]; long double bl[((1<<24)+64)/8/sizeof(float)]; } mixed; } data; static ptrdiff_t bitreverse_table[((1<<24)+64)/sizeof(float)]; static ptrdiff_t bitreverse_bruteforce(ptrdiff_t i,ptrdiff_t k) { ptrdiff_t ret=0; ptrdiff_t j=0; for(j=0;j<k;++j) { ret=(ret<<1)^(i&1); i=i>>1; } return ret; } void ft_bruteforce_f( ptrdiff_t n, const float *src_real,const float *src_imag, float *dst_real, float *dst_imag, int inverse, float scale) { ptrdiff_t i,j; float pi=((float)(4.0))*atanf(((float)(1.0))); for(i=0;i<n;++i) { dst_real[i]=((float)(0.0)); dst_imag[i]=((float)(0.0)); } for(i=0;i<n;++i) { float x=((float)(0.0)); float y=((float)(0.0)); float w=((float)(i))/((float)(n)); for(j=0;j<n;++j) { float 
a=((float)(2.0))*pi*(w*((float)(j))); float c=cosf(a); float s=sinf(a); if(!inverse) s=-s; x=x+src_real[j]*c-src_imag[j]*s; y=y+src_real[j]*s+src_imag[j]*c; } dst_real[i]=x; dst_imag[i]=y; } for(i=0;i<n;++i) { dst_real[i]=dst_real[i]*scale; dst_imag[i]=dst_imag[i]*scale; } } static void generate_f(ptrdiff_t seed,ptrdiff_t n,float *real,float *imag) { ptrdiff_t i,j; ptrdiff_t MAX=n; RNG rng; RNG_init(&rng,(unsigned)seed); for(i=0;i<MAX;++i) { const double m=1.0/4294967296.0; float x=((float)(0.0)); float y=((float)(0.0)); for(j=0;j<4;++j) x=(x*((float)(m))+((float)(((double)RNG_generate(&rng)*m-0.5)))); for(j=0;j<4;++j) y=(y*((float)(m))+((float)(((double)RNG_generate(&rng)*m-0.5)))); real[i]=((float)(x)); imag[i]=((float)(y)); } } static double test_time_f(ptrdiff_t n,const float *src_real,const float *src_imag,float *dst_real,float *dst_imag,int interleaved) { ptrdiff_t i; double t=get_cpu_time(); ptrdiff_t m=(((ptrdiff_t)1)<<(21))/n; if(sizeof(float)>=16) m/=8; if(n&(n-1)) m/=10; if(m==0) m=1; if(interleaved) for(i=0;i<m;++i) dbc_fft_fs(n,src_real,src_real+1,2,2,dst_real,dst_real+1,2,2,((float)(1.0))); else for(i=0;i<m;++i) dbc_fft_fs(n,src_real,src_imag ,1,1,dst_real,dst_imag ,1,1,((float)(1.0))); t=get_cpu_time()-t; return t/(double)m; } static void get_norms_f(ptrdiff_t n,const float *xr,const float *xi,const float *yr,const float *yi,double *RMS,double *Linf) { ptrdiff_t i; double e2=0.0,einf=0.0; for(i=0;i<n;++i) { double dr=0.0; double di=0.0; double d2=0.0; dr=((double)((xi[i]-yi[i]))); di=((double)((xr[i]-yr[i]))); d2=dr*dr+di*di; if(d2>einf) einf=d2; e2+=d2; } einf=sqrt(einf); e2/=(double)n; e2=sqrt(e2); *RMS=e2; *Linf=einf; } static void bitreverseal_permutation_bruteforce_f(ptrdiff_t log2n,float *src,ptrdiff_t src_stride,float *dst,ptrdiff_t dst_stride) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)); ptrdiff_t i; for(i=0;i<n;++i) { ptrdiff_t j=bitreverse_table[i]; dst[j*dst_stride]=src[i*src_stride]; } } static void test_bitreversal_permutation_f() { 
ptrdiff_t i; float tmp[65536]; float *buf0=data.buf_f; printf(" N |out-of-place| inplace| bruteforce\n"); printf("----------+------------+------------+------------\n"); for(i=0;((((1<<24)+64)/sizeof(float))>>i)>1;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)),m=(((ptrdiff_t)1)<<(21))/n; ptrdiff_t j; float *buf1=buf0+n; double t; if(m<1) m=1; for(j=0;j<n;++j) buf0[j]=((float)(j)); for(j=0;j<n;++j) bitreverse_table[j]=bitreverse_bruteforce(j,i); t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_f(i,buf0,1,buf1,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("%10.0f|%12.2f",(double)n,1e9*t/(double)n); for(j=0;j<n;++j) if(buf1[j]!=((float)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_f(i,buf0,1,buf0,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f",1e9*t/(double)n); for(j=0;j<n;++j) buf0[j]=((float)(j)); dbcF_bitreversal_permutation_f(i,buf0,1,buf0,1,tmp); for(j=0;j<n;++j) if(buf0[j]!=((float)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} for(j=0;j<n;++j) buf0[j]=((float)(j)); t=get_cpu_time(); for(j=0;j<m;++j) bitreverseal_permutation_bruteforce_f(i,buf0,1,buf1,1); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f\n",1e9*t/(double)n); } } void test_fft_f(ptrdiff_t maxn) { ptrdiff_t i,j,a,b; ptrdiff_t MAX=((1<<24)+64)/sizeof(float)/8; float *buf=data.buf_f; if(maxn<MAX) MAX=maxn; printf(" | %5.5s | FFT-bruteforce | X-IFFT(FFT(X)) \n",(use_mflops?"Speed":"Time")); printf(" N | SoA | AoS | RMS | Linf | RMS | Linf \n"); printf("----------+-------+-------+-----------+-----------+-----------+-----------\n"); for(i=0;MAX>>i;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)); ptrdiff_t m=5*n*i; double RMS,Linf; double t; if(m==0) m=1; generate_f(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)(((ptrdiff_t)1)<<(i))); for(j=0;j<2;++j) { t=test_time_f(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=(double)m/(1.0e+9*t); else t=1.0e+9*t/(double)m; printf("%7.3f|",t); } dbc_fft_fs 
(n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((float)(1.0))); dbc_ifft_fs(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((float)(1.0))/((float)(n))); if(i<=10) { ft_bruteforce_f(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((float)(1.0))); get_norms_f(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_f(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } for(a=5,b=8;a<MAX;a+=b,b+=a) { ptrdiff_t n=a; double m=5.0*(double)n*log((double)n)/log(2.0); double RMS,Linf; double t; generate_f(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)n); for(j=0;j<2;++j) { t=test_time_f(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=m/(1.0e+9*t); else t=1.0e+9*t/m; printf("%7.3f|",t); } dbc_fft_fs (n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((float)(1.0))); dbc_ifft_fs(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((float)(1.0))/((float)(n))); if(a<=1024) { ft_bruteforce_f(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((float)(1.0))); get_norms_f(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_f(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } } void ft_bruteforce_d( ptrdiff_t n, const double *src_real,const double *src_imag, double *dst_real, double *dst_imag, int inverse, double scale) { ptrdiff_t i,j; double pi=((double)(4.0))*atan(((double)(1.0))); for(i=0;i<n;++i) { dst_real[i]=((double)(0.0)); dst_imag[i]=((double)(0.0)); } for(i=0;i<n;++i) { double x=((double)(0.0)); double y=((double)(0.0)); double w=((double)(i))/((double)(n)); for(j=0;j<n;++j) { double a=((double)(2.0))*pi*(w*((double)(j))); double c=cos(a); double s=sin(a); if(!inverse) s=-s; x=x+src_real[j]*c-src_imag[j]*s; y=y+src_real[j]*s+src_imag[j]*c; } dst_real[i]=x; dst_imag[i]=y; } for(i=0;i<n;++i) { 
dst_real[i]=dst_real[i]*scale; dst_imag[i]=dst_imag[i]*scale; } } static void generate_d(ptrdiff_t seed,ptrdiff_t n,double *real,double *imag) { ptrdiff_t i,j; ptrdiff_t MAX=n; RNG rng; RNG_init(&rng,(unsigned)seed); for(i=0;i<MAX;++i) { const double m=1.0/4294967296.0; double x=((double)(0.0)); double y=((double)(0.0)); for(j=0;j<4;++j) x=(x*((double)(m))+((double)(((double)RNG_generate(&rng)*m-0.5)))); for(j=0;j<4;++j) y=(y*((double)(m))+((double)(((double)RNG_generate(&rng)*m-0.5)))); real[i]=((double)(x)); imag[i]=((double)(y)); } } static double test_time_d(ptrdiff_t n,const double *src_real,const double *src_imag,double *dst_real,double *dst_imag,int interleaved) { ptrdiff_t i; double t=get_cpu_time(); ptrdiff_t m=(((ptrdiff_t)1)<<(21))/n; if(sizeof(double)>=16) m/=8; if(n&(n-1)) m/=10; if(m==0) m=1; if(interleaved) for(i=0;i<m;++i) dbc_fft_ds(n,src_real,src_real+1,2,2,dst_real,dst_real+1,2,2,((double)(1.0))); else for(i=0;i<m;++i) dbc_fft_ds(n,src_real,src_imag ,1,1,dst_real,dst_imag ,1,1,((double)(1.0))); t=get_cpu_time()-t; return t/(double)m; } static void get_norms_d(ptrdiff_t n,const double *xr,const double *xi,const double *yr,const double *yi,double *RMS,double *Linf) { ptrdiff_t i; double e2=0.0,einf=0.0; for(i=0;i<n;++i) { double dr=0.0; double di=0.0; double d2=0.0; dr=((double)((xi[i]-yi[i]))); di=((double)((xr[i]-yr[i]))); d2=dr*dr+di*di; if(d2>einf) einf=d2; e2+=d2; } einf=sqrt(einf); e2/=(double)n; e2=sqrt(e2); *RMS=e2; *Linf=einf; } static void bitreverseal_permutation_bruteforce_d(ptrdiff_t log2n,double *src,ptrdiff_t src_stride,double *dst,ptrdiff_t dst_stride) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)); ptrdiff_t i; for(i=0;i<n;++i) { ptrdiff_t j=bitreverse_table[i]; dst[j*dst_stride]=src[i*src_stride]; } } static void test_bitreversal_permutation_d() { ptrdiff_t i; double tmp[65536]; double *buf0=data.buf_d; printf(" N |out-of-place| inplace| bruteforce\n"); printf("----------+------------+------------+------------\n"); 
for(i=0;((((1<<24)+64)/sizeof(double))>>i)>1;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)),m=(((ptrdiff_t)1)<<(21))/n; ptrdiff_t j; double *buf1=buf0+n; double t; if(m<1) m=1; for(j=0;j<n;++j) buf0[j]=((double)(j)); for(j=0;j<n;++j) bitreverse_table[j]=bitreverse_bruteforce(j,i); t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_d(i,buf0,1,buf1,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("%10.0f|%12.2f",(double)n,1e9*t/(double)n); for(j=0;j<n;++j) if(buf1[j]!=((double)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_d(i,buf0,1,buf0,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f",1e9*t/(double)n); for(j=0;j<n;++j) buf0[j]=((double)(j)); dbcF_bitreversal_permutation_d(i,buf0,1,buf0,1,tmp); for(j=0;j<n;++j) if(buf0[j]!=((double)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} for(j=0;j<n;++j) buf0[j]=((double)(j)); t=get_cpu_time(); for(j=0;j<m;++j) bitreverseal_permutation_bruteforce_d(i,buf0,1,buf1,1); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f\n",1e9*t/(double)n); } } void test_fft_d(ptrdiff_t maxn) { ptrdiff_t i,j,a,b; ptrdiff_t MAX=((1<<24)+64)/sizeof(double)/8; double *buf=data.buf_d; if(maxn<MAX) MAX=maxn; printf(" | %5.5s | FFT-bruteforce | X-IFFT(FFT(X)) \n",(use_mflops?"Speed":"Time")); printf(" N | SoA | AoS | RMS | Linf | RMS | Linf \n"); printf("----------+-------+-------+-----------+-----------+-----------+-----------\n"); for(i=0;MAX>>i;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)); ptrdiff_t m=5*n*i; double RMS,Linf; double t; if(m==0) m=1; generate_d(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)(((ptrdiff_t)1)<<(i))); for(j=0;j<2;++j) { t=test_time_d(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=(double)m/(1.0e+9*t); else t=1.0e+9*t/(double)m; printf("%7.3f|",t); } dbc_fft_ds (n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((double)(1.0))); dbc_ifft_ds(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((double)(1.0))/((double)(n))); if(i<=10) { 
ft_bruteforce_d(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((double)(1.0))); get_norms_d(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_d(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } for(a=5,b=8;a<MAX;a+=b,b+=a) { ptrdiff_t n=a; double m=5.0*(double)n*log((double)n)/log(2.0); double RMS,Linf; double t; generate_d(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)n); for(j=0;j<2;++j) { t=test_time_d(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=m/(1.0e+9*t); else t=1.0e+9*t/m; printf("%7.3f|",t); } dbc_fft_ds (n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((double)(1.0))); dbc_ifft_ds(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((double)(1.0))/((double)(n))); if(a<=1024) { ft_bruteforce_d(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((double)(1.0))); get_norms_d(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_d(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } } void ft_bruteforce_l( ptrdiff_t n, const long double *src_real,const long double *src_imag, long double *dst_real, long double *dst_imag, int inverse, long double scale) { ptrdiff_t i,j; long double pi=((long double)(4.0))*atanl(((long double)(1.0))); for(i=0;i<n;++i) { dst_real[i]=((long double)(0.0)); dst_imag[i]=((long double)(0.0)); } for(i=0;i<n;++i) { long double x=((long double)(0.0)); long double y=((long double)(0.0)); long double w=((long double)(i))/((long double)(n)); for(j=0;j<n;++j) { long double a=((long double)(2.0))*pi*(w*((long double)(j))); long double c=cosl(a); long double s=sinl(a); if(!inverse) s=-s; x=x+src_real[j]*c-src_imag[j]*s; y=y+src_real[j]*s+src_imag[j]*c; } dst_real[i]=x; dst_imag[i]=y; } for(i=0;i<n;++i) { dst_real[i]=dst_real[i]*scale; 
dst_imag[i]=dst_imag[i]*scale; } } static void generate_l(ptrdiff_t seed,ptrdiff_t n,long double *real,long double *imag) { ptrdiff_t i,j; ptrdiff_t MAX=n; RNG rng; RNG_init(&rng,(unsigned)seed); for(i=0;i<MAX;++i) { const double m=1.0/4294967296.0; long double x=((long double)(0.0)); long double y=((long double)(0.0)); for(j=0;j<4;++j) x=(x*((long double)(m))+((long double)(((double)RNG_generate(&rng)*m-0.5)))); for(j=0;j<4;++j) y=(y*((long double)(m))+((long double)(((double)RNG_generate(&rng)*m-0.5)))); real[i]=((long double)(x)); imag[i]=((long double)(y)); } } static double test_time_l(ptrdiff_t n,const long double *src_real,const long double *src_imag,long double *dst_real,long double *dst_imag,int interleaved) { ptrdiff_t i; double t=get_cpu_time(); ptrdiff_t m=(((ptrdiff_t)1)<<(21))/n; if(sizeof(long double)>=16) m/=8; if(n&(n-1)) m/=10; if(m==0) m=1; if(interleaved) for(i=0;i<m;++i) dbc_fft_ls(n,src_real,src_real+1,2,2,dst_real,dst_real+1,2,2,((long double)(1.0))); else for(i=0;i<m;++i) dbc_fft_ls(n,src_real,src_imag ,1,1,dst_real,dst_imag ,1,1,((long double)(1.0))); t=get_cpu_time()-t; return t/(double)m; } static void get_norms_l(ptrdiff_t n,const long double *xr,const long double *xi,const long double *yr,const long double *yi,double *RMS,double *Linf) { ptrdiff_t i; double e2=0.0,einf=0.0; for(i=0;i<n;++i) { double dr=0.0; double di=0.0; double d2=0.0; dr=((double)((xi[i]-yi[i]))); di=((double)((xr[i]-yr[i]))); d2=dr*dr+di*di; if(d2>einf) einf=d2; e2+=d2; } einf=sqrt(einf); e2/=(double)n; e2=sqrt(e2); *RMS=e2; *Linf=einf; } static void bitreverseal_permutation_bruteforce_l(ptrdiff_t log2n,long double *src,ptrdiff_t src_stride,long double *dst,ptrdiff_t dst_stride) { ptrdiff_t n=(((ptrdiff_t)1)<<(log2n)); ptrdiff_t i; for(i=0;i<n;++i) { ptrdiff_t j=bitreverse_table[i]; dst[j*dst_stride]=src[i*src_stride]; } } static void test_bitreversal_permutation_l() { ptrdiff_t i; long double tmp[65536]; long double *buf0=data.buf_l; printf(" N |out-of-place| 
inplace| bruteforce\n"); printf("----------+------------+------------+------------\n"); for(i=0;((((1<<24)+64)/sizeof(long double))>>i)>1;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)),m=(((ptrdiff_t)1)<<(21))/n; ptrdiff_t j; long double *buf1=buf0+n; double t; if(m<1) m=1; for(j=0;j<n;++j) buf0[j]=((long double)(j)); for(j=0;j<n;++j) bitreverse_table[j]=bitreverse_bruteforce(j,i); t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_l(i,buf0,1,buf1,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("%10.0f|%12.2f",(double)n,1e9*t/(double)n); for(j=0;j<n;++j) if(buf1[j]!=((long double)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} t=get_cpu_time(); for(j=0;j<m;++j) dbcF_bitreversal_permutation_l(i,buf0,1,buf0,1,tmp); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f",1e9*t/(double)n); for(j=0;j<n;++j) buf0[j]=((long double)(j)); dbcF_bitreversal_permutation_l(i,buf0,1,buf0,1,tmp); for(j=0;j<n;++j) if(buf0[j]!=((long double)(bitreverse_table[j]))) {printf(" FAIL!\n");return;} for(j=0;j<n;++j) buf0[j]=((long double)(j)); t=get_cpu_time(); for(j=0;j<m;++j) bitreverseal_permutation_bruteforce_l(i,buf0,1,buf1,1); t=get_cpu_time()-t; t/=(double)m; printf("|%12.2f\n",1e9*t/(double)n); } } void test_fft_l(ptrdiff_t maxn) { ptrdiff_t i,j,a,b; ptrdiff_t MAX=((1<<24)+64)/sizeof(long double)/8; long double *buf=data.buf_l; if(maxn<MAX) MAX=maxn; printf(" | %5.5s | FFT-bruteforce | X-IFFT(FFT(X)) \n",(use_mflops?"Speed":"Time")); printf(" N | SoA | AoS | RMS | Linf | RMS | Linf \n"); printf("----------+-------+-------+-----------+-----------+-----------+-----------\n"); for(i=0;MAX>>i;++i) { ptrdiff_t n=(((ptrdiff_t)1)<<(i)); ptrdiff_t m=5*n*i; double RMS,Linf; double t; if(m==0) m=1; generate_l(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)(((ptrdiff_t)1)<<(i))); for(j=0;j<2;++j) { t=test_time_l(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=(double)m/(1.0e+9*t); else t=1.0e+9*t/(double)m; printf("%7.3f|",t); } dbc_fft_ls 
(n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((long double)(1.0))); dbc_ifft_ls(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((long double)(1.0))/((long double)(n))); if(i<=10) { ft_bruteforce_l(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((long double)(1.0))); get_norms_l(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_l(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } for(a=5,b=8;a<MAX;a+=b,b+=a) { ptrdiff_t n=a; double m=5.0*(double)n*log((double)n)/log(2.0); double RMS,Linf; double t; generate_l(37,n,buf+0*n,buf+1*n); printf("%10.0f|",(double)n); for(j=0;j<2;++j) { t=test_time_l(n,buf+0*n,buf+1*n,buf+4*n,buf+5*n,(int)j); if(use_mflops) t=m/(1.0e+9*t); else t=1.0e+9*t/m; printf("%7.3f|",t); } dbc_fft_ls (n,buf+0*n,buf+1*n,1,1,buf+4*n,buf+5*n,1,1,((long double)(1.0))); dbc_ifft_ls(n,buf+4*n,buf+5*n,1,1,buf+6*n,buf+7*n,1,1,((long double)(1.0))/((long double)(n))); if(a<=1024) { ft_bruteforce_l(n,buf+0*n,buf+1*n,buf+2*n,buf+3*n,0,((long double)(1.0))); get_norms_l(n,buf+2*n,buf+3*n,buf+4*n,buf+5*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e|",Linf); } else printf(" %-10s| %-10s|","-","-"); get_norms_l(n,buf+0*n,buf+1*n,buf+6*n,buf+7*n,&RMS,&Linf); printf(" %-10.3e|",RMS); printf(" %-10.3e",Linf); printf("\n"); } } static void test_accuracy(ptrdiff_t maxn) { ptrdiff_t i,q,k,n; ptrdiff_t MAX=((1<<24)+64)/sizeof(float)/16; if(maxn<MAX) MAX=maxn; for(q=0;q<2;++q) { ptrdiff_t a=5,b=8; for(n=(q?a:1);n<=MAX;a+=b,b+=a,n=(q?a:n*2)) { long double Ef=0.0,Ed=0.0; double m; float *bf=data.mixed.bf; double *bd=data.mixed.bd; long double *bl=data.mixed.bl; if(n<2) m=1.0; else m=log((double)n)/log(2.0); for(k=0;k<20;++k) { long double r,e; r=0.0L; e=0.0L; generate_f(n*37+k*17,n,bf,bf+n); for(i=0;i<n;++i) { bl[i ]=bf[i ]; bl[n+i]=bf[n+i]; } dbc_fft_fc(n,bf,bf+n,bf,bf+n,1.0f); dbc_fft_lc(n,bl,bl+n,bl,bl+n,1.0L); 
for(i=0;i<n;++i) { r+=bl[i ]*bl[i ]; r+=bl[n+i]*bl[n+i]; e+=(bf[i ]-bl[i ])*(bf[i ]-bl[i ]); e+=(bf[n+i]-bl[n+i])*(bf[n+i]-bl[n+i]); } e/=r; if(e>Ef) Ef=e; r=0.0L; e=0.0L; generate_d(n*37+k*17,n,bd,bd+n); for(i=0;i<n;++i) { bl[i ]=bd[i ]; bl[n+i]=bd[n+i]; } dbc_fft_dc(n,bd,bd+n,bd,bd+n,1.0 ); dbc_fft_lc(n,bl,bl+n,bl,bl+n,1.0L); for(i=0;i<n;++i) { r+=bl[i ]*bl[i ]; r+=bl[n+i]*bl[n+i]; e+=(bd[i ]-bl[i ])*(bd[i ]-bl[i ]); e+=(bd[n+i]-bl[n+i])*(bd[n+i]-bl[n+i]); } e/=r; if(e>Ed) Ed=e; } Ef=sqrtl(Ef); Ed=sqrtl(Ed); printf("%10.0f| %10.3e | %7.3f | %10.3e | %7.3f\n",(double)n,(double)Ef,(double)Ef*pow(2.0,23.0)/m,(double)Ed,(double)Ed*pow(2.0,52.0)/m); } } } int main() { int simd_flags; static const char *types[5]={"float","double","long double","__float128",""}; dbc_fft_fi(0,0,0,0.0f); simd_flags=dbcf_detect_simd(); printf("Detected SIMD: \n"); printf(" |x2 |x4 |x8 |x16\n"); printf("float: "); printf("| %s ",(0 ?"+":"-")); printf("| %s ",(simd_flags&1 ?"+":"-")); printf("| %s ",(simd_flags&4 ?"+":"-")); printf("| %s ",(simd_flags&16?"+":"-")); printf("\n"); printf("double: "); printf("| %s ",(simd_flags&2 ?"+":"-")); printf("| %s ",(simd_flags&8 ?"+":"-")); printf("| %s ",(simd_flags&32 ?"+":"-")); printf("| %s ",(0 ?"+":"-")); printf("\n"); printf("\n"); if(0) { printf("Testing permutation speed.\n"); printf("Time is in ns/element.\n"); printf(" %s:\n",types[0]); test_bitreversal_permutation_f(); printf(" %s:\n",types[1]); test_bitreversal_permutation_d(); printf(" %s:\n",types[2]); test_bitreversal_permutation_l(); printf("\n"); } if(1) { printf("Testing dbc_fft.\n"); printf("Elements of X are in [-0.5;+0.5], uniformly distributed.\n"); if(use_mflops) printf("Speed is in Cooley-Tukey gigaflops (CTGs): 5*N*log2(N)/(time in ns),\nsimilar to FFTW benchmarks.\n"); else printf("Time is in ns/(5*N*log2(N)), similar to FFTW benchmarks.\n"); printf("SoA is separate real/imag, AoS is interleaved real/imag.\n"); printf(" %s:\n",types[0]); 
test_fft_f(((1<<24)+64)/sizeof(float)/8); printf(" %s:\n",types[1]); test_fft_d(((1<<24)+64)/sizeof(double)/8); printf(" %s:\n",types[2]); test_fft_l(((1<<24)+64)/sizeof(long double)/8); printf("\n"); } if(1) { printf("Testing accuracy.\n"); printf("Values reported are:\n"); printf(" Err=RMS(error)/RMS(output),\n"); printf(" Rel=Err/(E*log2(N)),\n"); printf("where E=ULP(1), i.e. 2^{-23} for float and 2^{-52} for double.\n"); if(sizeof(long double)<=sizeof(float)) printf("WARNING: accuracy figures for float are likely wrong.\n"); if(sizeof(long double)<=sizeof(double)) printf("WARNING: accuracy figures for double are likely wrong.\n"); printf(" | float | double \n"); printf(" N | Err | Rel | Err | Rel \n"); printf("----------+------------+---------+------------+---------\n"); test_accuracy(((1<<24)+64)/sizeof(float)/16/32); printf("\n"); } return 0; }