Re: [Jack-Devel] AMD Bulldozer CPUs, shared FPU and Intel Hyper-threading
Chris Caudle wrote:
> The issue with Bulldozer (and Piledriver) is that two integer cores share
> a single floating point core, so the scheduler thinks it can schedule two
> cores, but depending on whether the processes being scheduled are mostly
> integer or mostly floating point, it might work well or there might be
> contention for the single FP core.
AMD says (in the Software Optimization Guide for Family 15h processors):
| The AMD Family 15h processor floating point unit (FPU) was designed to
| provide four times the raw FADD and FMUL bandwidth as the original AMD
| Opteron and Athlon 64 processors.  It achieves this by means of two
| 128-bit fused multiply-accumulate (FMAC) units which are supported by
| a 128-bit high-bandwidth load-store system. [...]
| The FPU can receive up to four ops per cycle. These ops can only be
| from one thread, but the thread may change every cycle. Likewise the
| FPU is four wide, capable of issue, execution and completion of four
| ops each cycle. Once received by the FPU, ops from multiple threads
| can be executed.
So even if you're running two FP-heavy threads that interfere with each
other in the worst possible way, each thread still gets half of that
fourfold bandwidth: the theoretical per-thread performance will "only"
be double that of those earlier CPUs ...
In practice, the shared FPU matters only if you use 256-bit AVX
instructions, because each of those is split into two 128-bit operations
and must be executed in two steps.
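The arithmetic behind those two statements can be written down as a rough
back-of-the-envelope sketch (Python). The constants come from the manual
excerpt; the function names are made up for illustration, and the model
assumes that every thread saturates the shared FPU all the time, which real
code rarely does. It is not a measurement.

```python
# Relative numbers only; 1.0 = FADD/FMUL bandwidth of an original
# Opteron / Athlon 64 core (the baseline used in the AMD manual).
FPU_SPEEDUP_VS_K8 = 4.0   # "four times the raw FADD and FMUL bandwidth"
FMAC_WIDTH_BITS = 128     # each of the two FMAC units is 128 bits wide


def worst_case_share(threads_on_module):
    """Per-thread FP bandwidth if all threads keep the shared FPU busy.

    Hypothetical helper: assumes the FPU is split evenly between threads.
    """
    return FPU_SPEEDUP_VS_K8 / threads_on_module


def steps_per_instruction(vector_bits):
    """Number of 128-bit passes one vector instruction needs.

    Hypothetical helper: ceiling division by the FMAC width.
    """
    return -(-vector_bits // FMAC_WIDTH_BITS)


if __name__ == "__main__":
    print("one thread per module :", worst_case_share(1))
    print("two threads per module:", worst_case_share(2))
    print("128-bit SSE steps     :", steps_per_instruction(128))
    print("256-bit AVX steps     :", steps_per_instruction(256))
```

With two threads on one module the worst-case share comes out at 2.0, i.e.
still twice the old per-core bandwidth, and only the 256-bit case needs two
passes through a 128-bit unit.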
Regards,
Clemens