Power Consumption: Who is the king?

Intel promised us better performance per watt, lower energy consumed per instruction, and an overall serious reduction in power consumption with Conroe and its Core 2 line of processors. Compared to its NetBurst predecessors, the Core 2 lineup consumes significantly lower power - but what about compared to AMD?

This is one area that AMD is not standing still in, and just days before Intel's launch AMD managed to get us a couple of its long awaited Energy Efficient Athlon 64 X2 processors that are manufactured to target much lower TDPs than its other X2 processors. AMD sent us its Athlon 64 X2 4600+ Energy Efficient processor which carries a 65W TDP compared to 89W for the regular 4600+. The more interesting CPU is its Athlon 64 X2 3800+ Energy Efficient Small Form Factor CPU, which features an extremely low 35W rating. We've also included the 89W Athlon 64 X2s in this comparison, as well as the 125W Athlon 64 FX-62.

Cool 'n Quiet and EIST were enabled for AMD and Intel platforms respectively; power consumption was measured at the wall outlet. We used an ASUS M2NPV-VM for our AM2 platform and ASUS' P5W DH Deluxe for our Core 2 platform, but remember that power consumption will be higher with a SLI chipset on either platform. We used a single GeForce 7900 GTX, but since our power consumption tests were all done at the Windows desktop 3D performance/power consumption never came into play.

We took two power measurements: peak at idle and peak under load while performing our Windows Media Encoder 9 test.

System Level Power Consumption at Idle

Taking into consideration the fact that we were unable to compare two more similar chipsets (we will take a look back at that once retail Intel nForce 5 products hit the shelves), these power numbers heavily favor Intel. The releative power savings over the Extreme Edition 965 show just how big the jump is, and the ~15% idle power advantage our lower power AMD motherboard has over the Intel solution isn't a huge issue, especially when considering the performance advantage for the realtively small power investment.

System Level Power Consumption under Full Load

When looking at load power, we can very clearly see that AMD is no longer the performance per watt king. While the Energy Efficient (EE) line of X2 processors is clearly very good at dropping load power (especially in the case of the 3800+), not even these chips can compete with the efficiency of the Core 2 line while encoding with WME9. The bottom line is that Intel just gets it done faster while pulling fewer watts (e.g. Performance/Watt on the X6800 is 0.3575 vs. 0.2757 on the X2 3800+ EE SFF).

In fact, in a complete turn around from what we've seen in the past, the highest end Core 2 processor is actually the most efficient (performance per Watt) processor in the lineup for WME9. This time, those who take the plunge on a high priced processor will not be stuck with brute force and a huge electric bill.

FSB Bottlenecks: Is 1333MHz Necessary? Application Performance using SYSMark 2004 SE
Comments Locked

202 Comments

View All Comments

  • coldpower27 - Friday, July 14, 2006 - link

    Are there supposed to be there as they aren't functioning in Firefox 1.5.0.4
  • coldpower27 - Friday, July 14, 2006 - link

    You guys fixed it awesome.
  • Orbs - Friday, July 14, 2006 - link

    On "The Test" page (I think page 2), you write:

    please look back at the following articles:

    But then there are no links to the articles.

    Anyway, Anand, great report! Very detailed with tons of benchmarks using a very interesting gaming configuration, and this review was the second one I read (so it was up pretty quickly). Thanks for not saccrificing quality just to get it online first, and again, great article.

    Makes me want a Conroe!
  • Calin - Friday, July 14, 2006 - link

    Great article, and thanks for a well done job. Conroe is everything Intel marketing machine shown it to be.
  • stepz - Friday, July 14, 2006 - link

    The Core 2 doesn't have smaller emoty latency than K8. You're seeing the new advanced prefetcher in action. But don't just believe me, check with the SM2.0 author.
  • Anand Lal Shimpi - Friday, July 14, 2006 - link

    That's what Intel's explanation implied as well, when they are working well the prefetchers remove the need for an on-die memory controller so long as you have an unsaturated FSB. Inevitably there will be cases where AMD is still faster (from a pure latency perspective), but it's tough to say how frequently that will happen.

    Take care,
    Anand
  • stepz - Friday, July 14, 2006 - link

    Excuse me. You state "Intel's Core 2 processors now offer even quicker memory access than AMD's Athlon 64 X2, without resorting to an on-die memory controller.". That is COMPLETELY wrong and misleading. (see: http://www.aceshardware.com/forums/read_post.jsp?i...">http://www.aceshardware.com/forums/read_post.jsp?i... )

    It would be really nice from journalistic integrity point of view and all that, if you posted a correction or atleast silently changed the article to not be spreading incorrect information.


    Oh... and you really should have smelt something fishy when a memory controller suddenly halves its latency by changing the requestor.
  • stepz - Friday, July 14, 2006 - link

    To clarify. Yes the prefetching and espescially the speculative memory op reordering does wonders for realworld performance. But then let the real-world performance results speak for themselves. But please don't use broken synthetic tests. The advancements help to hide latency from applications that do real work. They don't reduce the actual latency of memory ops that that test was supposed to test. Given that the prefetcher figures out the access pattern of the latency test, the test is utterly meaningless in any context. The test doesn't do anything close to realworld, so if its main purpose is broken, it is utterly useless.
  • JarredWalton - Friday, July 14, 2006 - link

    Modified comments from a similar thread further down:

    Given that the prefetcher figures out the access pattern of the latency test, the test is utterly meaningless in any context."

    That's only true if the prefetcher can't figure out access patterns for all other applications as well, and from the results I'm pretty sure it can. You have to remember, even with the memory latency of approximately 35 ns, that delay means the CPU now has about 100 cycles to go and find other stuff to do. At an instruction fetch rate of 4 instructions per cycle, that's a lot of untapped power. So, while it waits on main memory access one, it can be scanning the next accesses that are likely to take place and start queuing them up and priming the RAM. The net result is that you may never actually be able to measure latency higher than 35-40 ns or whatever.

    The way I think of it is this: pipeline issues aside, a large portion of what allowed Athlon 64 to outperform NetBurst was reduced memory latency. Remember, Pentium 4 was easily able to outperform Athlon XP in the majority of benchmarks -- it just did so at higher clock speeds. (Don't *even* try to tell me that the Athlon XP 3200+ was as fast as a Pentium 4 3.2 GHz! LOL. The Athlon 64 3200+ on the other hand....) AMD boosted performance by about 25% by adding an integrated memory controller. Now Intel is faster at similar clock speeds, and although the 4-wide architectural design helps, not to mention 4MB shared L2, they almost certainly wouldn't be able to improve performance without improving memory latency -- not just in theory, but in actual practice. Looking at the benchmarks, I have to think that our memory latency scores are generally representative of what applications see.

    If you have to engineer a synthetic application specifically to fool the advanced prefetcher and op reordering, what's the point? To demonstrate a "worst case" scenario that doesn't actually occur in practical use? In the end, memory latency is only one part of CPU/platform design. The Athlon FX-62 is 61.6% faster than the Pentium XE 965 in terms of latency, but that doesn't translate into a real world performance difference of anywhere near 60%. The X6800 is 19.3% faster in memory latency tests, and it comes out 10-35% faster in real world benchmarks, so again there's not an exact correlation. Latency is important to look at, but so is memory bandwidth and the rest of the architecture.

    The proof is in the pudding, and right now the Core 2 pudding tastes very good. Nice design, Intel.
  • coldpower27 - Friday, July 14, 2006 - link

    But why are you posting the Manchester core's die size?

    What about the Socket AM2 Windsor 2x512KB model which has a die size of 183mm2?

Log in

Don't have an account? Sign up now