Tuesday, September 8, 2009

Japan, Inc., vs. Intel – Not Really

Fujitsu, Toshiba, Panasonic, Renesas Technology, NEC, Hitachi and Canon, with Japan's Ministry of Economy, Trade and Industry supplying 3-4 billion yen, are pooling resources to build a new "super CPU" for consumer electronics by the end of 2012, according to an article in Forbes. It's being publicized as "taking on Intel."

The design is based on the work of Hironori Kasahara, professor of computer science at Waseda University, and is allegedly extremely power-efficient. It even "runs on solar cells that will use less than 70% of the power consumed by normal ones." Man, I hate silly marketing talk, especially when subject to translation.

El Reg also picked up on this development.

Why a new CPU design? To jump to the conclusion: I don't know. I don't see it. Not clear what's really going on here.

Digging around for info runs into an almost impenetrable wall of academic publisher copyrights. I did find a downloadable paper from back in 2006 and what looks like a conference poster session exhibit, and a friend got me a copy of a more recent paper that gave a few more clues.

The main advances here appear to be in Kasahara's OSCAR compiler, which produces a hierarchical coarse-grain task graph that is statically scheduled on multiprocessors by the compiler itself. The lowest levels appear to target all the way down to an accelerator. I'm not enough of a compiler expert to judge this, but fine, I'll agree it works. A compiler doesn't require a new CPU design.
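To make "statically scheduled by the compiler" concrete, here is a minimal sketch of greedy list scheduling of a task graph onto a fixed number of cores. To be clear, this is a generic textbook technique, not OSCAR's actual algorithm, which I haven't seen in detail. The task names, durations, and the `static_schedule` function are all made up for illustration, and it assumes task times are known ahead of time.

```python
from collections import defaultdict

def static_schedule(durations, deps, num_cores):
    """Greedy list scheduling of a task DAG (illustrative, not OSCAR's method).

    durations: {task: run time}, assumed known at compile time
    deps:      {task: [tasks that must finish first]}
    Returns {task: (core, start, end)}.
    """
    # Build child lists and in-degrees so tasks are visited in topological order.
    indegree = {t: 0 for t in durations}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
            indegree[task] += 1

    ready = sorted(t for t, d in indegree.items() if d == 0)
    core_free = [0.0] * num_cores   # time at which each core becomes idle
    schedule = {}

    while ready:
        task = ready.pop(0)
        # A task cannot start before all of its predecessors have finished.
        earliest = max((schedule[p][2] for p in deps.get(task, [])), default=0.0)
        # Pick the core on which the task can start soonest.
        core = min(range(num_cores), key=lambda c: max(core_free[c], earliest))
        start = max(core_free[core], earliest)
        end = start + durations[task]
        schedule[task] = (core, start, end)
        core_free[core] = end
        # Release any successors whose predecessors are now all scheduled.
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return schedule

# Hypothetical four-task graph: C needs A; D needs B and C.
durations = {"A": 2, "B": 3, "C": 1, "D": 2}
deps = {"C": ["A"], "D": ["B", "C"]}
schedule = static_schedule(durations, deps, num_cores=2)
makespan = max(end for _, _, end in schedule.values())
```

The point is only that the whole placement is decided before the program runs, with no runtime scheduler involved, which is why nothing here calls for new hardware.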

The multicore system targeted – and of course there's no guarantee this is what the funded project will ultimately produce – seems to be a conventional cache-coherent MP integrated with a Rapport Kilocore-style reconfigurable 2D array of few-bit (don't know how many, likely 2 or 4) ALUs and multipliers. Some inter-processor block data transfer and straightforward synchronization registers are there, too. Use of the accelerator can produce the usual kinds of accelerator speedups, like 24X over one core for MP3 encoding.

Except for their specific accelerator, this is fairly standard stuff for the embedded market. So far, I don't see anything justifying the huge cost of developing a new architecture, and, more importantly, producing the never-ending stream of software support it requires: compilers, SDKs, development systems, simulators, etc.

One feature that does not appear standard is the power control. Apparently individual cores can have their frequency and voltage changed independently. For example, one core can run full tilt while another is at half-speed and a third is quarter-speed. Embedded systems today, like IBM/Freescale PowerPC and ARM licensees, typically just provide on and off, with several variants of "off" using less power the longer it takes to turn on again.

All the scheduling, synchronization, and power control are under the control of the compiler. This is particularly useful when subtask times are knowable and you have a deadline that doesn't demand flat-out performance. In those circumstances, the compiler can arrange execution to optimize power. For example, 60% less energy is needed to run a computational fluid dynamics benchmark (Applu) and 87% less for mpeg2encode. As a purely automatic result, this is pretty good. It didn't, in this case, use the accelerator.
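A rough sketch of why slack before a deadline turns into energy savings, using the full / half / quarter speed levels mentioned above. The energy model is my assumption, not something taken from the papers: voltage scales linearly with frequency and leakage is ignored, so the energy for a fixed amount of work goes as the square of the relative frequency. The function names and the example numbers are hypothetical.

```python
def pick_frequency(time_at_full_speed, deadline, levels=(1.0, 0.5, 0.25)):
    """Return the slowest relative frequency that still meets the deadline."""
    feasible = [f for f in levels if time_at_full_speed / f <= deadline]
    if not feasible:
        raise ValueError("deadline cannot be met even at full speed")
    return min(feasible)

def relative_energy(freq):
    """Energy for a fixed amount of work, relative to running at full speed.

    Assumed model: dynamic energy per cycle ~ V^2 and V ~ f, so energy ~ f^2.
    Leakage/static power is ignored; real chips will do worse than this.
    """
    return freq ** 2

# Hypothetical task: 10 ms of work at full speed, 25 ms until the deadline.
freq = pick_frequency(10.0, 25.0)     # half speed finishes in 20 ms; quarter speed would take 40 ms
saving = 1.0 - relative_energy(freq)  # fraction of energy saved under the assumed model
```

With known subtask times, a compiler can make this choice per core and per task at compile time. The measured 60% and 87% figures come from the real system and are not something this toy model reproduces.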

Enough for a new architecture? I wouldn't think so. I don't see why they wouldn't, for example, license ARM or PowerPC and thereby get a huge leg up on the support software. Something else is driving this, and I'm not sure what. The Intel reference is, of course, just silly; this is instead competing with the very wide variety of embedded system chips. Of course, those have volumes 1000s of times larger than desktops and servers, so any perceived advantage has a huge multiplier.

Oh, and there’s no way this can be the basis of a new general-purpose desktop or server system. All the synch and power control under compiler control, which is key to the OSCAR compiler operation, has to be directly accessible in user mode for the compiler to play with. This is standard in embedded systems that run only one application, forever (like your car transmission), but necessarily anathema in a “general-purpose” system.
