The Test
Unfortunately the platform that we were testing on would only allow us to run our DDR2-800 at 4-5-4 timings, instead of the 3-3-3 that's possible with this memory on socket 775. That hurts performance a bit, but the real world difference between 4-5-4 and 3-3-3 isn't going to be more than a few percentage points.
The only DDR2-800 we had on hand was in the form of 1GB modules, so we had to use a pair of 1GB DDR-400 which ran at 2-3-2, instead of the 2-2-2 we normally run with our smaller 512MB modules. Once again, the difference in performance isn't tremendous, but we wanted to explain why the timings were different than what we've used in the past.
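To put those timings in perspective, here is a rough sketch that converts cycle counts to nanoseconds. It assumes the numbers in each timing set are counted in memory-clock cycles, with CAS latency first, and that the memory clock is half the rated data rate (which holds for both DDR and DDR2). The function and variable names are ours, purely for illustration.

```python
def cycles_to_ns(cycles: int, data_rate_mt_s: int) -> float:
    """Convert a timing given in memory-clock cycles to nanoseconds.

    DDR and DDR2 transfer data twice per clock, so the clock in MHz
    is half the rated data rate in MT/s (DDR2-800 -> 400 MHz).
    """
    clock_mhz = data_rate_mt_s / 2
    return cycles * 1000.0 / clock_mhz


# Timing sets mentioned in the text: (data rate in MT/s, timings in cycles).
CONFIGS = {
    "DDR2-800 4-5-4": (800, (4, 5, 4)),
    "DDR2-800 3-3-3": (800, (3, 3, 3)),
    "DDR-400 2-3-2": (400, (2, 3, 2)),
    "DDR-400 2-2-2": (400, (2, 2, 2)),
}


def timings_ns(name: str) -> tuple:
    """Return the timings of a named configuration in nanoseconds."""
    rate, timings = CONFIGS[name]
    return tuple(cycles_to_ns(c, rate) for c in timings)


if __name__ == "__main__":
    for name in CONFIGS:
        print(name, "->", timings_ns(name), "ns")
```

By this arithmetic, the first timing works out to 10 ns for both DDR2-800 at 4-5-4 and DDR-400 at 2-3-2, while DDR2-800 at 3-3-3 would come in at 7.5 ns.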
Both our Socket-AM2 and Socket-939 Athlon 64 X2 processors ran at the same clock speed with the same cache sizes, so the results should give us a clear indication of whether or not AM2 is faster than an equivalent 939 configuration.
CPU: | AMD Athlon 64 X2 Socket-AM2; AMD Athlon 64 X2 Socket-939 |
Motherboard: | ASUS A8N32-SLI (Socket-939); Unnamed MCP55 Socket-AM2 Motherboard |
Chipset: | NVIDIA nForce4 SLI x16; NVIDIA MCP55 |
Chipset Drivers: | nForce4 6.70 |
Hard Disk: | Seagate 7200.9 300GB SATA |
Memory: | OCZ PC8000 DDR2-800 4-5-4-15 (1GB x 2); OCZ DDR-400 2-3-2 (1GB x 2) |
Video Card: | NVIDIA GeForce 7800 GTX |
Video Drivers: | NVIDIA ForceWare 84.21 |
Desktop Resolution: | 1280 x 1024 - 32-bit @ 60Hz |
OS: | Windows XP Professional SP2 |
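For reference, the theoretical peak bandwidth of the two memory configurations can be estimated with the back-of-the-envelope sketch below. It assumes a 64-bit memory channel and dual-channel operation on both platforms; these are peak figures, not measured results, and the helper name is hypothetical.

```python
def peak_bandwidth_gb_s(data_rate_mt_s: int, channels: int = 2, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    data_rate_mt_s: rated transfers per second in millions (DDR2-800 -> 800).
    channels:       number of memory channels (assumed 2 here).
    bus_bits:       width of one channel in bits (assumed 64 here).
    """
    bytes_per_transfer = bus_bits // 8
    # MT/s * bytes gives MB/s; divide by 1000 for GB/s.
    return data_rate_mt_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    print("DDR2-800, dual channel:", peak_bandwidth_gb_s(800), "GB/s")
    print("DDR-400, dual channel: ", peak_bandwidth_gb_s(400), "GB/s")
    # For comparison only: a single DDR2-800 channel.
    print("DDR2-800, single channel:", peak_bandwidth_gb_s(800, channels=1), "GB/s")
```

Under these assumptions, the DDR2-800 setup has twice the peak bandwidth of the DDR-400 setup on paper, and a single DDR2-800 channel matches dual-channel DDR-400.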
107 Comments
mino - Tuesday, April 11, 2006 - link
1) 3-cycle L1 on K7/K8 is the fastest required; it follows from the internal structure of the scheduler and the pipeline that a 2-cycle cache would do almost no good. Also, they would have to reduce L1 size to 32k+32k, which would hurt. It simply does not make sense to change L1 at all, maybe on K8L, but IMHO 128k+128k would help much more than 2-cycle latency.
2) 17-cycle L2 is PRETTY GOOD for 1M L2 with exclusive structure!!! IMHO it is possible to do 16-cycle, maybe 15, but nowhere near Dothan's 10-cycle. Also remember lower-latency L2 has scaling problems (that's why Intel made Prescott's L2 slower than NW's).
3) Concerning the memory subsystem (caches + memory) on single-socket K8/K8L, the biggest issues are the robustness (the number of on-the-fly accesses to memory) and the latency of the memory controller. Solving this is not a trivial thing. IMHO adding a 2-4M L3 with ~50-cycle random access would do.
4) In the >4 sockets front all they need is effective caching of MOESI snoops.
You are also forgetting that K7/K8 is mostly a KISS architecture. It is just very well balanced, so it has good performance in the end. However, make one wrong change and you are screwed.
KISS == Keep It Simple Silly
About the "weak" SIMD implementation on AMD, don't fool yourselves, guys. The only x86 architecture faster than K8 on SSE/SSE2 is Netburst, aka SIMD-by-Intel.
About Conroe, it has ALUs and FPUs twice as wide as those of PIII/K7/K8, which means it has huge resources at its disposal to calculate SIMD.
The same goes for K8L 2 quarters later. That said, the K7/K8 core has far more FP power than the P6 architecture. On FP, Conroe and K8 are about equal, but K8L will wipe the floor with K8 and Conroe on FP. Conroe will wipe K8 on INT and still be faster than K8L by a decent margin.
Overall, we are in for another PIII vs. K7 battle, with a single very important change: AMD has a platform, which it did not have back in the K7 vs. PIII days.
fitten - Thursday, April 13, 2006 - link
I find the K8L a somewhat odd strategy. I guess they are targeting the Itanium market because Opterons already have a good part of the HPC market. Given that the HPC people are the ones that really care about FPU performance and that they are still a fairly small market segment, it seems an odd target. Integer performance rules the roost for servers... web, database, and just about everything else you can think of other than number crunching simulations and the like. Desktop uses for the FPU are few, like games and some mathematical stuff. Intel is focusing on integer performance at least as much as FPU with Conroe (Conroe gets a good dose of both), which makes sense to me since so much of the work done on computers, both desktops and servers, is dominated by integer operations. K8L speculation says only FPU horsepower will be added... just doesn't seem like a sound decision to me.
Zoomer - Monday, April 10, 2006 - link
Hey Anand, could you take out 1 of the two modules and do a quick test on that?
With doubled (in theory) bandwidth with DDR2, wouldn't the dual-channel mem controller be even more redundant? Perhaps we'll see a new 754-ish socket? :)
Furen - Monday, April 10, 2006 - link
I don't believe we will. Even S1 will be dual-channel, and this is what would have benefited the most from being single-channel (since the pin count would be much lower, the package could be much smaller).
BaronMatrix - Monday, April 10, 2006 - link
Looking at the intensive timing and bus speed tweaks USING the SAME RAM in the latest XE955 article, I would have expected the same kind of thing here. Anand doesn't look at lower-speed, lower-latency settings for whatever chip he used. That RAM will do 3-2-2 at 667. Obviously AMD is more sensitive to latency.
ChristTheGreat - Monday, April 10, 2006 - link
AMD is sensitive to latencies because of the memory controller. I'm sure that 3-2-2-9 DDR2 from OCZ would give much more performance on AMD.
Again, this is only a CPU that they use to test, so it's not the true CPU. They wouldn't give us the performance it gives before its launch. That's like killing yourself right now if the performance is poor....
I saw an article saying that AMD could be working on DDR2 latencies. You think that 4-4-4-12 are good timings? 12 = tRAS
"tRAS is the time required before (or delay needed) between the active and precharge commands. In other words, how long the memory must wait before the next memory access can begin."
In fact, you have better frequencies, but worse timings.... What you need is higher frequencies and lower timings.
So we will have to wait until Socket AM2 launches to know the true performance of AM2.
defter - Monday, April 10, 2006 - link
4-4-4-12 are good timings, even for DDR2-667. It isn't easy to find reasonably priced DDR2-667 that works at those timings with standard voltage.
Some people forget that 99% of consumers won't be using super expensive overvolted 3-3-3-10 DDR2-800 memory just to get a few percent of extra performance. And if you compare an AMD CPU + super fast DDR2-800 against an Intel CPU (which runs fine on DDR2-667 because of the FSB limitation), then you need to take into account the higher price of memory on the AMD system.
Wesley Fink - Monday, April 10, 2006 - link
We are continuing to test the AM2 on different AM2 boards. On another motherboard we could run at 3-3-3 DDR2-800 with the OCZ PC2-8000 memory. Latency was a bit lower and bandwidth a bit higher, but nothing really changed from Anand's conclusions. We have also been running DDR2-667 and DDR2-533 tests with this new super fast OCZ memory and cheaper mainstream DDR2 memory, and we will be sharing those results as soon as testing is complete.
cornfedone - Monday, April 10, 2006 - link
The crap the mobo companies have been shoving out the doors the past couple of years is pure garbage, as any number of hardware review sites have confirmed. It looks like the AM2 mobos might be more half-baked crap. Until you can test the shipping CPUs on a quality mobo that allows proper memory timing, it's difficult to know what AMD's AM2 CPUs will or won't deliver. If I had a dollar for every bogus claim Intel has made, I'd be a billionaire, so I wouldn't hold my breath that Conroe will perform as Intel claims.