In this paper the CERN openlab Platform Competence Centre (PCC) reports on a set of benchmark results obtained by the team when comparing a 32-core, quad socket, “Sandy Bridge-EP” server with a 16-core, dual socket, “Sandy Bridge-EP” server and a 40-core, quad socket server using Intel’s previous microarchitecture, the “Westmere-EX”. The Intel marketing names for the corresponding processors are the “Xeon E5-4600 processor series”, “Xeon E5-2600 processor series” and “Xeon E7-4800 processor series”, respectively. Multiple benchmarks were used to get a good understanding of the performance of each processor and the corresponding system. We used both industry-standard benchmarks, such as SPEC CPU2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events.
In summary, the PCC team found that the quad socket Sandy Bridge-EP offers frequency-scaled performance equivalent to that of the quad socket Westmere-EX, even though the Westmere-EX processor comes with 25% more cores and 20% larger L3 caches. The team also found that Turbo Boost Technology has been improved, and that vectorized applications get an additional performance boost from the new AVX instructions. The performance of the quad socket Sandy Bridge-EP E5-4650 processor evaluated in this report closely matches that of the dual socket Sandy Bridge-EP E5-2680. This was to be expected, as they share most characteristics: the Xeon E5-4650 is the quad socket version of the Xeon E5-2680.
Intel has substantially improved the thermal characteristics of Sandy Bridge, and this was reflected not only in the measurements of idle power but also in the measurements of a fully loaded system. Computer centers that are power constrained will, without doubt, appreciate the improvements in this important domain.
In the HEP-SPEC06 tests the quad socket Sandy Bridge-EP system performed at the same level as the quad socket Westmere-EX when frequency scaled (within 0.5% of each other). Given that the Westmere-EX has 25% more cores, this amounts to a 25% performance increase per core when both servers are fully loaded. If we instead consider the performance per core with the number of threads matching the number of physical cores on each system, the figure increases to 31%. This is because the gain from SMT is 27.7% for the Westmere-EX but only 21.3% for the quad socket Sandy Bridge-EP.
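The per-core figures above follow from simple arithmetic, which the short Python sketch below works through. It is an illustration, not the procedure the team used: the core counts and SMT gains are taken from this summary, while the function names and the `throughput_ratio` parameter are hypothetical. The sketch assumes that the frequency-scaled HEP-SPEC06 throughput of the two fully loaded servers is exactly equal, whereas the measured values agree to within 0.5%.

```python
# Illustrative arithmetic only; the names below are not from the paper.
WESTMERE_EX_CORES = 40        # quad socket Westmere-EX
SANDY_BRIDGE_EP_CORES = 32    # quad socket Sandy Bridge-EP

SMT_GAIN_WESTMERE_EX = 0.277      # throughput gained by enabling SMT
SMT_GAIN_SANDY_BRIDGE_EP = 0.213


def per_core_gain_fully_loaded(throughput_ratio=1.0):
    """Per-core advantage of Sandy Bridge-EP with all hardware threads busy.

    throughput_ratio is the (assumed) frequency-scaled system throughput of
    Sandy Bridge-EP divided by that of Westmere-EX; 1.0 means equal.
    """
    sandy_bridge = throughput_ratio / SANDY_BRIDGE_EP_CORES
    westmere = 1.0 / WESTMERE_EX_CORES
    return sandy_bridge / westmere - 1.0


def per_core_gain_without_smt(throughput_ratio=1.0):
    """Per-core advantage with one thread per physical core.

    The SMT contribution is divided out of each system's fully loaded
    throughput before normalising by the number of cores.
    """
    sandy_bridge = (throughput_ratio / (1.0 + SMT_GAIN_SANDY_BRIDGE_EP)
                    / SANDY_BRIDGE_EP_CORES)
    westmere = 1.0 / (1.0 + SMT_GAIN_WESTMERE_EX) / WESTMERE_EX_CORES
    return sandy_bridge / westmere - 1.0


if __name__ == "__main__":
    print(f"fully loaded: {per_core_gain_fully_loaded():.1%}")
    print(f"without SMT:  {per_core_gain_without_smt():.1%}")
```

Under the equal-throughput assumption the first figure is 25% and the second comes out slightly above 31%. The exact value shifts by a few tenths of a percentage point within the 0.5% spread between the two measurements, which is consistent with the 31% quoted above.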
When the PCC team tested weak scaling using the Multithreaded Geant4 benchmark, they found that the quad socket Sandy Bridge-EP server, with all SMT cores in use, performed 2% better than the quad socket Westmere-EX server. The SMT benefit was 24.8%.
When running the Parallel Maximum Likelihood fitting benchmark, which has a fixed problem size and therefore enforces strong scaling behaviour, the PCC team obtained a 23% performance increase per core when using SSE on both generations of processors.
In conclusion, the PCC team confirms that the quad socket Sandy Bridge-EP processor is a significant improvement in terms of performance per core over the previous Westmere-EX generation, which is Intel’s flagship platform when it comes to expandable architectures. With this new generation Intel has set expectations at a high level, and we are keen to see whether the pace of improvement can be sustained for the 22 nm processors, namely the Ivy Bridge-EP and Ivy Bridge-EX processors planned for 2013 and Haswell for 2014.