Why no desktop 16-core CPU?

Posted in Game Development, Pioneer | Posted on 06-03-2013

It’s a question I keep coming up against as I do multi-threading work: where are all of the desktop 16-core CPUs?

You can get what AMD call a 16-core CPU in the Opteron 6200 and 6300 series, but those are built on Bulldozer and Piledriver respectively, which are more like 1.5 cores per “dual core” module because each pair of cores shares a single FPU. Or to put it another way: 16-core INTEGER and 8-core FLOATING POINT.

Not long ago we went from single-core to dual-core, to true dual-core (both cores on the same die), to quad-core… and then we stopped.

I guess the argument could be that it’s not worth it for most people? Or that hyperthreading gives you 8 hardware threads with up to a 30% performance boost if you can use them well.
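To put rough numbers on those two claims, here is a back-of-envelope sketch in Python. The function names are made up for illustration, and the only inputs are the figures quoted above: about 1.5 cores per Bulldozer-style “dual core” module, and up to 30% from hyperthreading. Treat both as best-case rules of thumb rather than measurements.

```python
def module_core_equivalents(modules, cores_per_module=1.5):
    """Rough 'real core' count for a CPU built from shared-FPU modules.

    cores_per_module=1.5 is the rule of thumb quoted in the post,
    not a measured figure.
    """
    return modules * cores_per_module


def hyperthreaded_core_equivalents(physical_cores, ht_boost=0.30):
    """Rough 'real core' count for a hyperthreaded CPU.

    ht_boost=0.30 is the 'up to 30%' best case quoted in the post;
    many workloads see far less.
    """
    return physical_cores * (1.0 + ht_boost)


if __name__ == "__main__":
    # A "16-core" Opteron is 8 modules of 2 integer cores + 1 shared FPU.
    print("16-core Opteron:", module_core_equivalents(8), "core equivalents")
    # A quad-core with hyperthreading exposes 8 hardware threads.
    print("Quad-core + HT:", hyperthreaded_core_equivalents(4), "core equivalents")
```

On those assumptions the “16-core” part behaves more like 12 cores, and a hyperthreaded quad-core more like 5 and a bit, which is still a long way short of 8 or 16 real cores.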

This all misses the point though. I’d have been quite happy if AMD had continued adding cores to its K10 architecture: keeping up with the process node advances (die shrinking), updating, optimising and just piling on the cores. They have done that to some degree, because K10h, as used in the Phenom II, did make it into the early APUs in a low-power, die-shrunk version. It lacked any kind of L3 cache, though it did have some extensions, updates and improvements, so it just about holds its own against a Phenom II with a similar number of cores.

Those chips were APUs though, with an on-chip GPU for mobile use, so they clocked slower and fully 50% of the die was spent on the GPU. You could get versions without the GPU, called Athlon II, but they still lacked the L3 cache and the GPU was still there, just disabled and powered off. There was no 8-core Athlon II with an L3 cache, not even a small one, even though it was probably possible.

We’re down at 22nm + 3D transistors with Intel CPUs whilst AMD are still rolling out on 32nm, but are we really stuck at 4 cores?

No, there are higher core counts, as you can see, but they’re for servers rather than our mere mortal desktop machines. So the work isn’t going into getting more performance out of heavily threading things; instead it’s going into GPGPU languages like CUDA, DirectCompute and OpenCL, utilising the GPU to do work you’d normally have just hammered out on the CPU. There’s real benefit to doing things on your GPU, and eventually I think AMD’s “APU” strategy might pay off if, for example, they can reduce the CPU<->GPU latency for compute languages, but traditional multi-threading seems to have been ignored. It’s not even an option to get more cores on a desktop CPU, and I think that’s a shame, as there are a lot of workloads that will happily scale to 16 or 32 threads without the need to move them to the GPU.
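How far a workload “happily scales” depends on how much of it can run in parallel. As a rough illustration, the sketch below uses Amdahl’s law, the standard textbook model (it isn’t something measured for this post), and the 95% parallel fraction is a purely hypothetical figure picked for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core according to Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    cores: number of cores the parallel part is spread across.
    Ignores memory bandwidth, cache effects and scheduling overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.95 is a hypothetical parallel fraction, chosen only for illustration.
    for cores in (4, 8, 16, 32):
        print(f"{cores:2d} cores: {amdahl_speedup(0.95, cores):.2f}x")
```

Even in this idealised model, a job that is 95% parallel keeps getting faster well beyond 4 cores, which is the sort of workload I have in mind.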

It would have been interesting if AMD had chosen to do it, to keep scaling the cores at least as an option, but then I think there’s a metric fucktonne of things that AMD could have done to stay relevant which they’ve manifestly failed to do, so here’s a short list:

  • big.LITTLE – as in the ARM design where you have a group (4 typically) of large, fast, powerful CPUs and then you have one tiny little low power CPU that is used when the workload gets light to save power.
  • Unlocked multipliers and clocks on CPUs sold with a very limited warranty at the same price as the regular locked one – the Black Edition chips are good but not enough.
  • Change the chip packaging format as Intel and IBM have both separately proposed for better thermal, power and mounting design – if you’re in 2nd place you innovate to survive.
  • Speaking of IBM – form an alliance, use their resources and manufacturing to get access to better process node shrinks.
  • …and of Process Node shrinks – you can’t compete with Intel on them but as soon as AMD were free of GlobalFoundries why didn’t they run off to TSMC (or IBM) and aggressively chase what was available?

There must be a tonne more besides, but in short I miss the AMD of old. We were speaking about it at work the other day and it always produces the same head-shaking response, where you just can’t believe that today’s AMD is the same one that gave us the K6 and K8 architectures. The AMD that bludgeoned Intel’s Pentium 4 misstep into a cocked hat and happily overclocked its pants off.

Today’s AMD seems to be one that releases products based on “vision” rather than “getting-the-fucking-job-done-well”. I’d even settle for “getting the job done well enough” and doing it with more cores, but instead if you buy AMD now you’re probably getting a Piledriver chip at the high end, which is finally a little bit faster in some situations than the K10h architecture… They could have done an 8-core K10h in approximately the same die area, minus the GPU, and I’d rather have had that because all my GPU needs are serviced by a separate card in a PCI-e slot.

I’d buy that chip; fuck, I’d buy a 16-core version without hesitation. A die-shrunk 8/16-core Phenom, L3 cache intact, with the improvements they rolled into the Stars (K10h successor) architecture, no GPU, on a quad-channel memory bus and clocked at about 3GHz. That could pummel my Core i7 into the floor with the loads I have in mind.

It’s not to be though: that AMD is dead and, for some reason, so are desktop 8/16-core CPUs, it seems.

Comments posted (7)

  1. Personally I’m not sure there’s a market for more cores on the desktop just yet. Didn’t AMD bring out a 6-core CPU when Intel just had a 4-core (albeit with 8 hyperthreads)? And pretty much every consumer-level test ran worse on it? Virtually nothing commonly used on the desktop except dedicated multimedia apps like video rendering, image processing, or multitrack audio really gets much use out of that much parallelism… and many of the people who use those apps are on OSX and thus Intel anyway.

    Desktop users generally have more than enough power for the apps they’re running (probably with 2 or 3 cores idle 90% of the time anyway) and investment in desktop CPUs is certain to be dropping given that desktop sales are in decline. It’s not like Intel seem to be doing much these days either; a look at PCSpecialist.co.uk seems to show that the top Intel desktop chip they offer is barely any different to the one I bought 15 months ago.

    Anyway, writing AMD off at the moment would seem to be premature given that they seem to have a foothold in next-gen consoles. They aren’t the impressive company they used to be, and I really wish they had the slightest semblance of a clue when it comes to writing decent graphics drivers, but they’re obviously impressing certain hardware developers so something must be going right.

  2. Clock for clock the Phenom II X6 was better than an X4 (quad), but of course it wasn’t clocked as high so it fared worse; then again, the dual cores often outperformed the quads for the same reason.

    Honestly there isn’t much of a compelling reason for anything beyond dual core, and single core would suffice for 95% of regular desktop users; however, that wasn’t my question.

    I definitely can use 8 / 16 even 32 cores quite happily and regularly. I don’t however have that option… at all and that’s what I find odd.

    I spent a bit of time looking at the various die sizes, the area, that each CPU takes up last night whilst writing that post. It’s interesting because if anything it seems like the chips are getting smaller, and are clocked lower, sometimes with less cache than they were a few years ago.

    In AMD’s case it’s because they’re trying to squeeze every last penny. I mean, it’s crazy: you can buy a dual core AMD E350-D attached to a motherboard for £49.99 from Overclockers.co.uk today! Stick 8GB of RAM on that for another £35 and you’ve got a really, really solid home PC.

    We did stop at quad core though, aside from AMD’s limited dalliance with hexa-core, but I cannot understand why when there’s a sizeable market who, like me, could easily max out almost any number of cores.

    The chips wouldn’t be any bigger, in fact likely slightly smaller, than the current APU (CPU+GPU) configurations. The cores would be the same. I just find it odd that the option isn’t available.

    As for AMD being in the next-gen consoles, I’m not sure that it’s enough. MS / Sony aren’t buying the chips, they’ve just licensed the design and are probably getting GlobalFoundries to fab them. AMD won’t be making as much from it. Hopefully it’ll be enough to keep them going though. As for why, I think it’s just the combination of their willingness to make a deal like that and their laptop designs – low-power, low-clockspeed – although the GPU side is in flux. Not sure how they got so lucky; maybe it’s partly because they’re in such financial straits?

  3. Do you think people like you are a sizeable market? I would be surprised if there are more than 1000 people worldwide who are doing stuff that is heavily concurrent to the point of making good use of >4 cores, with algorithms that are ill-suited to GPU, and need it to run on single desktop machines rather than servers, server clusters, or dedicated workstations. It’s certainly not a use case that currently comes up in consumer or business apps, and ultimately that’s what desktop hardware is aimed at, right?

  4. Oh yeah I’m sure, take Sony, they provided every person with an 8 or 16-core machine. We looked them up, dual-socket Xeon workstations starting at £3,500 but rising quickly. One machine per-person minimum.
    Take your initial list, a mixture of image editing, rendering and audio work, then add in compiling, building/processing data sets and anything else from a myriad of fields.

    That’s just stuff that we’re all doing all day, every day right now that would benefit from it. Then think about how quickly that came about. How rapidly did we go from single threaded to maxing out 4 cores? It took a few years but we are there. We might not be home users but most of us also aren’t spending >£3500 on dual-socket Xeon workstations with 24-hour response and replacement agreements from HP ;)

    Also, what I’m suggesting wouldn’t cost more than the chips AMD are releasing right now. They’d take up the same die area, and be made on the same process nodes. They’d just remove the GPU and replace it with 4 cores. The most expensive Athlon II on Overclockers is £59.99 for a quad core, so that’s a lot of pricing headroom they’ve got for adding cores. From what I worked out last night they could fit just shy of 16 cores using the older Stars architecture into the space of 8 cores (8 INT, 4 FPUs) for the Piledriver-based FX. Maybe they’d have to charge a bit more but I don’t think it’s just me that could utilise those cores.

    When we hit quad core people weren’t sure we’d be able to use all of those effectively. Even with dual-core people weren’t sure we’d get decent usage out of them. In truth I don’t think there’s an easy-to-define limit to what my day-to-day work could utilise, and I don’t think I’m alone in that, nor is our field.

    We might not use it 90% of the time, but when we’re light on usage we’re often really light, as in the cores idle and down clock a lot of the time. However when we do want to use it then we’re often bottlenecked by it and the machines are thrashing the quad-core CPUs with every core pegged for long periods of time.

  5. Lack of demand. Companies like Apple have pushed for low power devices, and now that everyone else is jumping on the phone and tablet market, the demands for power have just carried on that way.

  6. You might say it would not cost more, but what really costs is all the machine tooling and R&D, rather than the chips themselves. The actual manufacture once tooling and research are done is commonly a relatively small part of the cost that needs recovering.

  7. 1st comment: Lack of demand; but as I said, there is demand. Not everyone is after dual core low power devices. The market for high-end CPUs is thriving so whilst things are pushing the power down into mobile CPUs it’s also bumping up against the limits of quad cores and power hungry machines at the top end which is still a vast market across a wide range of areas.

    I also pointed out that we didn’t take long to push those quad core chips to their limits at all, and if given more cores we’d use them all pretty soon too. At least at the high end. I never suggested giving 8 cores to people who just want to read their email.

    2nd comment: R&D: the thing is that AMD nearly bankrupted themselves spending on R&D for Bulldozer and Piledriver, which the market has emphatically shown that it doesn’t want, whilst they had to keep developing Stars (K10h) simultaneously. It was money they spent anyway. In fact they spent more, because they went the APU (CPU + GPU) route, which is why I made the argument that I did. More cores = lower R&D cost because they already had the design, they didn’t have to integrate the GPU and they didn’t have to build an entirely separate architecture which spectacularly flopped.

    In fact it was the Stars architecture that kept them afloat whilst Bulldozer very nearly buried them.

    Not that it appears to have saved them anyway: http://arstechnica.com/business/2013/03/amd-sells-its-austin-hq-for-164-million-to-raise-some-quick-cash/

    Plus they’re getting bugger all from Sony or MS for the use of their cores or GPUs since they’ve licensed them rather than making them and selling them.

    It bothers me because I like AMD, I think that the market needs them too. They do seem screwed at the moment though.

    My original point still seems to stand though: low-power dual-core machines are fine for most people, but there’s a thriving high-end market that could do with more cores without going to all the overhead and expense of server hardware. I.e. we’ve got the £200 to £600 range covered, and the £3000 to £5000 range with servers. Why is no-one really covering that £600 to £3000 range?

    You can spend that much on a PC, with quad-core but it won’t perform much better than something around the £600 mark in terms of processing power, maybe 1% or 2%. So it just seems like a missed opportunity, and as I keep hammering on, we really can use them all.

    If I had it to spend I’d go for a dual-socket Xeon setup with 16-cores and 32-hardware threads. I won’t though because that’s the £3000 to £5000 range :(