Sunday, February 21, 2010

Parallel PowerPoint: Why the Power Law is Important


The notion of "parallel PowerPoint" is a poster child for the uselessness of multicore on client computers. Rendering a slide two, four, eight times faster is a joke. Nobody needs it.

Running PowerPoint 4, 16, 64 times longer on my laptop battery, though, that's useful. I am purely sick of carrying around a 3 lb. power supply with my expensively light, carbide case, 3 lb. laptop.

This is why the Parallel Power Law – if the hardware is designed right, the processor power can drop with the square of the number of processors – is important. I'd like to suggest that it is a, possibly the, killer app for parallelism.

There are two keys here, both of which I discussed in the previous post, The Parallel Power Law. One key is reduced power, of course: Double the processors and halve the clock rate, with the associated reduced voltage, and the power drops by 4X (see that post). That decrease doesn't happen, though, unless the other key is present: You don't increase performance, as parallel mavens have been attempting for eons; instead, you keep it constant. Many systems, particularly laptops and graphics units, now reduce clock speed under heat or power constraints; but they don't maintain performance while doing so.
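To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. It assumes the classic dynamic-power model (power proportional to frequency times capacitance times voltage squared) and, crucially, that supply voltage can still scale down along with frequency; the function name and numbers are purely illustrative, not measurements.

def dynamic_power(cores, freq, capacitance, voltage):
    # Total dynamic (switching) power of `cores` identical cores,
    # using the classic P ~ f * C * V^2 model.
    return cores * freq * capacitance * voltage ** 2

base = dynamic_power(cores=1, freq=1.0, capacitance=1.0, voltage=1.0)

# Double the cores, halve the clock and (ideally) the voltage:
# aggregate throughput stays the same, but power drops by 4X.
scaled = dynamic_power(cores=2, freq=0.5, capacitance=1.0, voltage=0.5)

print(base / scaled)  # -> 4.0

The whole trick lives in that voltage assumption; if voltage can't follow frequency down, the gain shrinks accordingly.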

The issue is this: All the important everyday programs out there run with perfectly adequate performance today, and have done so for nigh unto a decade. Email, browsers, spreadsheets, word processors, and yes, PowerPoint – all of those simply don't have a performance issue; they're fast enough now. Multicore in those systems is a pure waste: It's not needed, and they don't use it. Yes, multiple cores get exercised by the multitasking transparently provided by operating systems, but that really doesn't do anything useful; with a few exceptions, primarily games, programs are nearly always performance-gated by disk, network, or other functions rather than by processing speed. One can point to exceptions – spreadsheets in financial circles where you press F9 (recalculate) and go get coffee, or lunch, or a night's sleep – but they are scarcely used in volume.

And volume is what defines a killer app: It must be something used widely enough to soak up the output of fabs. Client applications are the only ones with that characteristic.

What I'm trying to point to here appears to be a new paradigm for the use of parallelism: Use it not to achieve more performance, but to drastically lower power consumption without reducing performance. It applies to both traditional clients, and to the entire new class of mobile apps running on smart phones / iPads / iGoggles that is now the fastest-expanding frontier.

For sure, the pure quadratic gains (or more; see the comments on that post) won't be realized, because there are many other uses of power this does not affect, like displays, memory, disks, etc. But a substantial fraction of the power used is still in the processor, so dropping its contribution by a lot will certainly help.

Can this become a new killer-app bandwagon? It's physically possible. But I doubt it, because the people with the expertise to do it – the parallel establishment – are too heavily invested in the paradigm of parallelism for higher performance, busily working out how to get to exascale computation, with conjoined petascale department systems, and so on.

Some areas definitely need that level of computation; weather and climate simulation, for example. But cases like that, while useful and even laudable, are increasingly remote from the volumes needed to keep the industry running as it has in the past. Parallelism for lower power is the only case I've seen that truly addresses that necessarily broad market.

10 comments:

Unknown said...

I think you overstate somewhat the level of performance we have.

The computer I use most of the time has a 1.2 GHz Core 2 Duo processor. Compared to machines a decade ago, it's blisteringly fast, no doubt about that.

For many tasks today, especially web browsing and video playback, it's desperately slow.

For stuff like Word, sure; there's a huge I/O bottleneck in front of the keyboard, and that's been the case for two decades or more, no doubt about it.

But web browsing--especially with complex Flash or HTML 5 sites, and H.264 video that's sadly all too often decoded purely in software--and straight video playback still put quite a burden on the processor.

And god forbid I try to play a game on this thing. Even Flash games like Tower Defense can make this puny processor cry.

So whilst all the tasks we did a decade ago are still done today, and generally see little benefit from increased performance, I think there are enough *new* tasks that there's still a good reason to push performance higher. The performance-demanding niche is still there, and IMO has remained fairly constant.


Compiling software is not much fun either, but I agree that's something of a niche task.

As such, I'm not altogether surprised that they're still pushing for higher performance.

I think it's also worth noting that Intel's last few "multicore" and multithreaded processors (Conroe, Penryn, Nehalem) have also offered best-in-class single-threaded performance. There's a reason for that: it's the only reliable way to make the processor faster, because the other thing about these commonplace workloads is that they're still overwhelmingly single-threaded computationally (I/O causes lots of multithreading, but in non-computational code).

Trading off single-threaded performance for greater parallelism is still a bad idea for typical workloads.

Fazal Majid said...

Amdahl's law would make this moot. I believe the CPU itself only accounts for about 30% of a laptop's power draw (the screen, hard drive, video card, WiFi, gigabit ethernet, and even RAM account for the rest). Even if parallelism drove CPU power consumption all the way to zero, that would at best gain a 50% increase in battery life, probably not enough to justify the huge effort required.
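To put rough numbers on that (a quick illustrative sketch; the 30% CPU share is the figure above, everything else is an assumption):

cpu_fraction = 0.30   # assumed share of total platform power drawn by the CPU
cpu_reduction = 1.0   # 1.0 = CPU power eliminated entirely (the best possible case)

# Remaining draw as a fraction of the original total.
new_draw = (1 - cpu_fraction) + cpu_fraction * (1 - cpu_reduction)

# Battery life scales inversely with average draw.
battery_life_gain = 1 / new_draw

print(f"{battery_life_gain:.2f}x")  # -> 1.43x, i.e. well under a 50% longer runtime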

Andrew Richards said...

I can't see that these tasks are easily parallelizable at all. There are so many plug-ins and DLLs and COM objects that I think the parallelism would cross lots of module boundaries, which would make it practically impossible to parallelize. I think that, for this reason, we're likely to see x86 processors get smaller, lower power, and not go beyond 2 cores for most people. It's (as DrPizza says) new tasks, especially graphics, that are going to be running on higher-performance parallel processors in the future. Those tasks parallelize better, and gain from extra performance. It's the reason Intel has to have a high-performance GPU: because for most people, 2 increasingly small x86 cores plus a constantly improving GPU/media processor for 3D graphics, video, and audio processing is going to be the best power/performance option. I think what we're seeing is that the future of Moore's Law for processing is GPU-like devices (for consumers and desktops), while x86 multicore is for servers.

Greg Pfister said...

A friend who wishes to be anonymous commented in email to me, essentially agreeing with DrPizza above:

"You said [current systems are adequate to run office applications]. I disagree. Some of the latest email/collaboration packages for corporate consumption can bring a dual core laptop to its knees. They're bloatware and sluggish and would still be that way if the laptop got 2x faster."

My response:

No finite amount of computing can satisfy arbitrarily inefficient software. I could write a mail program that computes a fractal, in Java, every time it gets an input keystroke. It would be slow. Should I then say the hardware is inadequate?

New and existing software can no longer be designed on the assumption that it really will be run on hardware two or four times faster in the near future. Those days are gone. Some software development houses haven't gotten that message yet.

Tomas said...

Sort of true. How many households have multiple CPUs already? How long until everyone has a server and a few tablets in each house?

Greg Pfister said...

Tom, what you say is true; there are many computers - hence many cores - in every house. I have one in a light switch (and in the oven, microwave, dishwasher, washer, dryer, etc.). And let's not even try to count what's in cars.

But I'm talking here about multiple CPUs (cores) in a single computer, not multiple computers.

PowerPoint said...

This has been really helpful! Thank you so much! I have a presentation for next week and this is just perfect! Thanks a lot for sharing this with us! More power to you and to your site!

Anonymous said...

You're kidding, right? Black text on a dark blue background? You might be saying something important... but I ain't reading it.

Greg Pfister said...

Eh what? It's supposed to be black text on a slightly grey background. That's what I get. Anybody else seeing that?

Greg

Yale Zhang said...

Greg, I'm afraid to say your main proposal of trading MHz for less power isn't possible these days.

Namely, you assume good old Dennard scaling,

power before: frequency * capacitance * voltage^2

power after: 2 * (frequency/2) * capacitance * (voltage/2)^2 = (frequency * capacitance * voltage^2) / 4

This scaling ended about 10 years ago, because reducing the voltage also requires making the transistor gate thinner to achieve a lower threshold voltage, which increases the leakage current (it grows exponentially as gate thickness shrinks, I believe).

These days, it's not uncommon for CMOS static power to be 40% of the chip power. Of course, power gating reduces the problem, but for practical purposes, the supply voltage has remained constant to keep leakage power manageable. You can still reduce frequency, but that will decrease performance, obviously.
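A rough sketch of that arithmetic, with purely illustrative numbers (a static share of about 40%, as above; the model and values are assumptions, not measurements):

def chip_power(cores, freq, capacitance, voltage, leak_per_core):
    # Dynamic (switching) power plus static (leakage) power.
    dynamic = cores * freq * capacitance * voltage ** 2
    static = cores * leak_per_core  # leakage scales with transistor count, not clock
    return dynamic + static

# Baseline: 1 core at full clock, with static power ~40% of the total.
before = chip_power(cores=1, freq=1.0, capacitance=1.0, voltage=1.0, leak_per_core=0.67)

# Post-Dennard: voltage stays put, so only the frequency can drop.
after = chip_power(cores=2, freq=0.5, capacitance=1.0, voltage=1.0, leak_per_core=0.67)

print(before)  # ~1.67
print(after)   # ~2.34 -- same throughput, but total power goes up, not down by 4X

With the voltage pinned, doubling the cores and halving the clock leaves dynamic power unchanged and doubles the leakage, which is why the quadratic win in the post depends on voltage continuing to scale.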
