Why a console’s CPU architecture is almost irrelevant.

Posted by | Posted in Game Development | Posted on 22-05-2013

In the aftermath of last night’s Xbox One announcement, this is probably the least technically literate article I’ve read so far: http://www.engadget.com/2013/05/21/x86-architecture-vs-nintendo/

Why would the CPU architecture make any difference? Answer: it doesn’t. OK, it might make things slightly different, a tiny bit different, but frankly there are much bigger differences between even the Xbox One and PS4 than a little thing like the CPU architecture.

With something like the CPU architecture we abstract it once, occasionally fix up a few issues over the development of the title, spend some time on it during optimisation and… that’s it.
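To make that concrete, here’s a minimal sketch of what “abstract it once” tends to look like, wrapping an architecture-specific spin-wait hint; the macros and the helper name are illustrative assumptions, not any engine’s real code:

    // One small wrapper hides the CPU difference; the rest of the codebase
    // just calls CpuRelax() and never mentions the architecture again.
    #if defined(_M_X64) || defined(__x86_64__)
        #include <immintrin.h>
        // x86-64 (Xbox One / PS4 class hardware): SSE2 spin-wait hint.
        inline void CpuRelax() { _mm_pause(); }
    #elif defined(__ppc64__) || defined(__ppc__)
        // PowerPC (Xbox 360 / PS3 class hardware): lower the hardware
        // thread priority with the conventional "or 27,27,27" hint.
        inline void CpuRelax() { __asm__ volatile("or 27,27,27"); }
    #else
        // Unknown CPU: a correct, if unhelpful, fallback.
        inline void CpuRelax() {}
    #endif

Write that wrapper once during bring-up and the “architecture difference” is, to a first approximation, done.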

It’s just a tiny cost compared to the vast hours spent dealing with the rendering pipeline differences, the initial system-level bring-up and porting of our libraries, platform-specific TCR/TRC compliance and testing, balancing system resources and usage, or managing the different build systems and compilers. You still need to do all of that for ANY new console, even when they all use the same underlying architecture, because you’re rarely targeting the ASM level directly.

If anything doomed the Wii U it was holding off its launch until the next-gen consoles were due so soon. If they’d launched it earlier they’d have had several YEARS to establish it, to iterate on the hardware to bring costs down and to resolve any problems with the OS software. Instead they look like an underpowered, last-gen console with poor software support and a gimmick.

Andy

Why no desktop 16-core CPU?

Posted by | Posted in Game Development, Pioneer | Posted on 06-03-2013

It’s a question I keep coming up against as I do multi-threading work: where are all of the desktop 16-core CPUs?

You can get what AMD call a 16-core CPU in the Opteron 6200, but it’s built on Bulldozer or Piledriver, where each “dual-core” module is more like 1.5 cores because the two cores share an FPU. Or to put it another way: 16-core INTEGER and 8-core FLOATING POINT.

Not long ago we went from single-core to dual-core, to true dual-core (both cores on the same die), to quad-core… and then we stopped.

I guess the argument could be that it’s not worth it for most people? Or that Hyper-Threading gives you 8 hardware threads, with up to a 30% performance boost if you can use them well.

This all misses the point though, and I’d have been quite happy if AMD had continued adding cores to its K10 architecture: keeping up with the process node advances (die shrinks), updating, optimising and just piling on the cores. They have done that to some degree, because K10h, as used in the Phenom II, did make it into the early APUs in a low-power, die-shrunk version. It lacked any kind of L3 cache, though it did gain some extensions, updates and improvements, so it just about holds its own against a Phenom II with the same number of cores.

Those chips were APUs though, with an on-chip GPU for mobile use. So they clocked slower, and fully 50% of the die was spent on the GPU. You could get versions without the GPU, called Athlon II, but they still lacked the L3 cache and the GPU was still there, just disabled and powered off. There was no 8-core Athlon II with an L3 cache, even a small one, even though it was probably possible.

We’re down at 22nm plus 3D transistors with Intel CPUs, whilst AMD are still rolling out on 32nm, but are we really stuck at 4 cores?

No, there are higher core counts, as you can see, but they’re for servers rather than our mere mortal desktop machines. So the work isn’t going into getting more performance out of heavily threaded code; instead it’s going into GPGPU languages like CUDA, DirectCompute and OpenCL, utilising the GPU to do work you’d normally have just hammered out on the CPU. There’s real benefit to doing things on your GPU, and eventually I think AMD’s “APU” strategy might pay off if they can reduce the latency between the CPU and GPU for compute languages, but traditional multi-threading seems to have been ignored. It’s not even an option to get more cores on a desktop CPU, and I think that’s a shame as there are a lot of workloads that will happily scale to 16 or 32 threads without the need to move them to the GPU.
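As a sketch of the kind of workload I mean (assumed names, nothing from a real codebase), here’s an embarrassingly parallel loop split across however many hardware threads the CPU exposes; give it 16 cores and it will happily use them:

    // Divide the data into one chunk per hardware thread and process the
    // chunks in parallel; scaling comes from core count, no GPU involved.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    static void ProcessRange(std::vector<float>& data, size_t begin, size_t end)
    {
        for (size_t i = begin; i < end; ++i)
            data[i] = data[i] * 2.0f + 1.0f; // stand-in for real per-element work
    }

    void ProcessAll(std::vector<float>& data)
    {
        const size_t threads = std::max(1u, std::thread::hardware_concurrency());
        const size_t chunk = (data.size() + threads - 1) / threads;

        std::vector<std::thread> workers;
        for (size_t t = 0; t < threads; ++t)
        {
            const size_t begin = t * chunk;
            const size_t end = std::min(begin + chunk, data.size());
            if (begin < end)
                workers.emplace_back(ProcessRange, std::ref(data), begin, end);
        }
        for (std::thread& w : workers)
            w.join();
    }

Nothing in there cares whether the chip has 4 cores or 16; the scaling comes for free, which is exactly why the absence of desktop 16-core parts stings.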

It would have been interesting if AMD had chosen to do it, to keep scaling the cores at least as an option, but then I think there’s a metric fucktonne of things that AMD could have done to stay relevant which they’ve manifestly failed to do, so here’s a short list:

  • big.LITTLE – as in the ARM design, where a group (typically 4) of large, fast, powerful cores is paired with small, low-power cores that take over when the workload gets light, to save power.
  • Unlocked multipliers and clocks on CPUs sold with a very limited warranty at the same price as the regular locked ones – the Black Edition chips are good but not enough.
  • Change the chip packaging format, as Intel and IBM have both separately proposed, for better thermal, power and mounting design – if you’re in 2nd place you innovate to survive.
  • Speaking of IBM – form an alliance; use their resources and manufacturing to get access to better process node shrinks.
  • …and speaking of process node shrinks – you can’t compete with Intel on them, but as soon as AMD were free of GlobalFoundries why didn’t they run off to TSMC (or IBM) and aggressively chase whatever was available?

There must be a tonne more besides, but in short I miss the AMD of old. We were speaking about it at work the other day and it always produces the same head-shaking response, where you just can’t believe that today’s AMD is the same one that gave us the K6 and K8 architectures. The AMD that bludgeoned Intel’s Pentium 4 misstep into a cocked hat and happily overclocked its pants off.

Today’s AMD seems to be one that releases products based on “vision” rather than “getting-the-fucking-job-done-well”. I’d even settle for “getting the job done well enough”, done with more cores, but instead if you buy AMD now you’re probably getting a Piledriver chip at the high end, which is finally a little bit faster in some situations than the K10h architecture… they could have done an 8-core K10h in approximately the same die area, minus the GPU, and I’d rather have had that because all my GPU needs are serviced by a separate card in a PCI-e slot.

I’d buy that chip; fuck, I’d buy a 16-core version without hesitation. A die-shrunk 8/16-core Phenom, L3 cache intact, with the improvements they rolled into the Stars (K10h successor) architecture, no GPU, on a quad-channel memory bus and clocked at about 3GHz. That could pummel my Core i7 into the floor with the loads I have in mind.

It’s not to be though; that AMD is dead, and for some reason so are desktop 8/16-core CPUs, it seems.