The four 128-bit NEON pipelines thus on
paper match the current throughput capabilities of desktop cores from AMD and Intel,
albeit with smaller vectors. Floating-point throughput here is 1:1 with the
pipeline count, meaning Firestorm can do 4 FADDs and 4 FMULs per cycle, with
latencies of 3 and 4 cycles respectively. That’s quadruple the per-cycle throughput of Intel
CPUs and previous AMD CPUs, and still double that of the recent Zen 3, albeit while
running at a lower frequency. This might be one reason why Apple does so well in browser
benchmarks (JavaScript numbers are floating-point doubles).
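
To put the figures above in concrete terms, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted in the text (four 128-bit pipelines, 3-cycle FADD and 4-cycle FMUL latency); the function names are made up for illustration, and the "in flight" estimate simply multiplies latency by pipeline count, which is an idealisation that ignores scheduling and memory effects.

```python
# Figures taken from the text above.
VECTOR_BITS = 128      # width of each NEON pipeline
PIPELINES = 4          # FP/SIMD pipelines in Firestorm
FADD_LATENCY = 3       # cycles
FMUL_LATENCY = 4       # cycles


def lanes(element_bits: int) -> int:
    """Number of elements that fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


def elements_per_cycle(element_bits: int, pipelines: int = PIPELINES) -> int:
    """Peak vector elements processed per cycle for one instruction type."""
    return pipelines * lanes(element_bits)


def independent_ops_for_peak(latency_cycles: int, pipelines: int = PIPELINES) -> int:
    """Independent instructions that must be in flight to keep every pipeline
    busy on every cycle (latency x throughput, an idealised estimate)."""
    return latency_cycles * pipelines


if __name__ == "__main__":
    print("double-precision elements per cycle:", elements_per_cycle(64))
    print("single-precision elements per cycle:", elements_per_cycle(32))
    print("independent FADDs needed for peak:", independent_ops_for_peak(FADD_LATENCY))
    print("independent FMULs needed for peak:", independent_ops_for_peak(FMUL_LATENCY))
```

Note that scalar code, such as typical JavaScript arithmetic on doubles, uses one element per instruction, so the relevant figure there is the 4 instructions per cycle rather than the vector element count.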