Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST - Posted in CPUs
Larger L2, but no increase in latency?
When Prescott first got a 2MB L2 cache, we noticed that along with the larger L2 came a 17% increase in access latency. The result was a mixed bag: some applications benefited from the larger cache, while others were hampered by the higher L2 latency. Overall, the two effects balanced each other out, and Prescott 2M generally offered no real performance improvement over the 1MB version.
With Presler, each core also gets an upgraded 2MB cache, compared to the 1MB L2 cache found in Smithfield. The upgrade is similar to what we saw with Prescott, so we assumed that, along with the larger L2 per core, Presler also picked up a higher L2 latency than Smithfield.
To confirm, we ran ScienceMark 2.0 and Cachemem:
CPU | Cachemem L2 Latency (128KB block, 64-byte stride) | ScienceMark L2 Latency (64-byte stride) |
AMD Athlon 64 X2 4800+ | 17 cycles | 17 cycles |
Intel Smithfield 2.8GHz | 27 cycles | 27 cycles |
Intel Presler 2.8GHz | 27 cycles | 27 cycles |
Intel Prescott 2M | 27 cycles | 27 cycles |
Intel Prescott 1M | 23 cycles | 23 cycles |
What we found was extremely interesting: Presler does have the same 27-cycle L2 latency as Prescott 2M, but so does Smithfield. We had simply taken for granted that Smithfield was nothing more than two Prescott 1M cores put together, but this data shows that Smithfield actually had the same higher latency L2 cache as Prescott 2M.
Although we were expecting Presler to give us a higher latency L2 than Smithfield, it looks like Smithfield had the higher latency L2 to begin with. This means that, at the same clock speed, Presler will be at least as fast as Smithfield, if not faster. Normally, we take for granted that a new core means better performance, but Intel has let us down in the past; luckily, that is not the case this time.
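As a quick sanity check on the numbers above, the short Python sketch below recomputes the 17% latency increase from the table's cycle counts and converts Presler's 27 cycles into nanoseconds at its 2.8GHz clock. The function and variable names are our own for illustration, and the nanosecond figure assumes the L2 latency is counted in core clock cycles.

```python
# Cycle counts copied from the latency table above.
L2_LATENCY_CYCLES = {
    "Intel Prescott 1M": 23,
    "Intel Prescott 2M": 27,
    "Intel Smithfield 2.8GHz": 27,
    "Intel Presler 2.8GHz": 27,
}


def percent_increase(old_cycles, new_cycles):
    """Relative latency increase from old to new, in percent."""
    return (new_cycles - old_cycles) / old_cycles * 100.0


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (assumes core-clock cycles)."""
    return cycles / clock_ghz


increase = percent_increase(
    L2_LATENCY_CYCLES["Intel Prescott 1M"],
    L2_LATENCY_CYCLES["Intel Prescott 2M"],
)
presler_ns = cycles_to_ns(L2_LATENCY_CYCLES["Intel Presler 2.8GHz"], 2.8)

print(f"Prescott 1M -> 2M: {increase:.1f}% more cycles")  # 17.4%
print(f"Presler L2 at 2.8GHz: {presler_ns:.2f} ns")       # 9.64 ns
```

Since Smithfield and Presler both sit at 27 cycles and the same 2.8GHz clock, the same conversion gives an identical absolute latency for the two chips.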
84 Comments
Betwon - Saturday, December 31, 2005 - link
NO. Don't you think that future versions of the patch will be written by Intel?
Viditor - Saturday, December 31, 2005 - link
Doubtful (but who knows)...I can't see Intel spending 100s of millions with every developer (or even 1 developer) for the long term, just to keep tweaking their patches. It's just not a very smart long term strategy (and Intel is quite smart).
Betwon - Saturday, December 31, 2005 - link
You are just guessing. We find that good-quality code can provide better performance for both AMD and Intel.
Intel can often benefit more, because the performance potential of Intel is high.
Right now, you cannot find another SMP game that improves the fps of an SMP CPU this much.
If you find one, please tell us.
No one has found one.
Viditor - Saturday, December 31, 2005 - link
Now it's you who's guessing...
Betwon - Saturday, December 31, 2005 - link
NO. It is true.
Viditor - Saturday, December 31, 2005 - link
OK...prove it!
Betwon - Saturday, December 31, 2005 - link
For example: we saw a test (from AnandTech).
With good-quality code, AMD becomes faster than before, but Intel becomes much faster than before.
They use Intel's compiler.
Betwon - Saturday, December 31, 2005 - link
When not using Intel's compiler, AMD becomes slow.
Viditor - Saturday, December 31, 2005 - link
I know you've often quoted from the spec.org site...
I suggest you revisit there and look at the difference between AMD systems using Intel compilers and the PathScale or Sun compilers. In general, the Spec scores for AMD improve by as much as 30% when not using an Intel compiler...especially in FP.
http://www.swallowtail.org/naughty-intel.html
defter - Saturday, December 31, 2005 - link
This is not true, for example:
FX-57, Intel compiler, SpecInt base 1862: http://www.spec.org/osg/cpu2000/results/res2005q2/...
FX-57, Pathscale compiler, SpecInt base 1745: http://www.spec.org/osg/cpu2000/results/res2005q2/...
Opteron 2.8GHz, Intel compiler, SpecInt base 1837: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz, Sun compiler, SpecInt base 1660: http://www.spec.org/osg/cpu2000/results/res2005q4/...
In SpecFP, the Intel compiler produces slightly slower results, but the difference isn't 30%:
Opteron 2.8GHz (HP hardware), Intel compiler, SpecFP base 1805: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (HP hardware), Pathscale compiler, SpecFP base 2052: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (Sun hardware), Sun compiler, SpecFP base 2132: http://www.spec.org/osg/cpu2000/results/res2005q4/...
So let's see:
Intel vs Sun compiler:
- Intel compiler is 10.7% faster in SpecInt
- Sun compiler is 18.1% faster in SpecFP
Intel vs Pathscale compiler:
- Intel compiler is 6.7% faster in SpecInt
- Pathscale compiler is 13.7% faster in SpecFP
It is quite surprising that Intel's compiler gives the best results for AMD's processors in many situations.
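The percentages in the comment above can be rechecked with a short Python sketch. The scores are copied from the comment as quoted; the dictionary layout and helper names are only illustrative, and the Sun SpecFP result was run on different hardware from the Intel one, so that comparison is not purely compiler against compiler.

```python
# SPEC CPU2000 base scores as quoted in the comment above.
SCORES = {
    ("FX-57", "SpecInt"): {"Intel": 1862, "Pathscale": 1745},
    ("Opteron 2.8GHz", "SpecInt"): {"Intel": 1837, "Sun": 1660},
    # Note: the Sun SpecFP score comes from Sun hardware, the others from HP.
    ("Opteron 2.8GHz", "SpecFP"): {"Intel": 1805, "Pathscale": 2052, "Sun": 2132},
}


def lead_percent(higher, lower):
    """How much higher one score is than the other, in percent."""
    return (higher / lower - 1.0) * 100.0


def compare_with_intel(key, other):
    """Return (winning compiler, percent lead) for Intel vs. `other`."""
    intel = SCORES[key]["Intel"]
    rival = SCORES[key][other]
    if intel >= rival:
        return "Intel", lead_percent(intel, rival)
    return other, lead_percent(rival, intel)


comparisons = [
    (("Opteron 2.8GHz", "SpecInt"), "Sun"),
    (("Opteron 2.8GHz", "SpecFP"), "Sun"),
    (("FX-57", "SpecInt"), "Pathscale"),
    (("Opteron 2.8GHz", "SpecFP"), "Pathscale"),
]

for key, other in comparisons:
    winner, lead = compare_with_intel(key, other)
    cpu, suite = key
    print(f"{cpu} {suite}, Intel vs {other}: {winner} leads by {lead:.1f}%")
```

The computed leads come out to 10.7%, 18.1%, 6.7% and 13.7%, matching the figures quoted in the comment.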