A new year!?!

Posted by | Posted in Game Development, Life, Pioneer | Posted on 04-01-2014

Seems I haven’t updated since September.
Well I got a new job, I’m now working on Unreal Engine 4 via my employer Pitbull Studios.

Still doing Pioneer stuff, I’ll post a review of work done and work to do some time soon-ish.

2 weeks already?

Posted by | Posted in Game Development, Life, Pioneer | Posted on 13-09-2013

Wow time really flies when you’re … um, reading books and going to the gym more often?

Ok so I have been doing some programming on Pioneer; fixed up the solutions & projects for VS 2012 & 2013 Preview (I could do VS2010 too but am not touching 2008!) to deal with latest changes, fixed the GLEW changes to get it all building and compiling and then took up the Oculus Rift integration again – that’s a bit hairy that stuff since I’m having to modify a LOT of stuff in Pioneers rendering process.

Anyway the result so far is the following;

  • distortion shader working,
  • head tracking working,
  • rendering two cameras to a framebuffer working.


So all done huh? Nope, not by a long shot.

There’s a lot of other stuff that needs to be done to get a 3D projection + FOV + view offsets for each eye and a few other bits before we actually get a working 3D view and I haven’t worked through all of that yet since I’ve been busy with time consuming other stuff.

Unexpectedly, probably only to me, some of it is just stuff that didn’t used to take much time at all but is now quite time consuming.

Take going to the gym, this is what it used to work out like:

  • Leave work & walk to gym (11 mins)
  • Get changed (3-ish mins)
  • Exercise (30 mins)
  • Shower and get dressed (10-ish mins)
  • Hurry back to work stopping at Tescos (15 mins)

I think that in truth that used to usually take about 1 hour and 15 minutes, sometimes longer if we’d been doing leg work and I couldn’t walk as fast on the way back. Now however I’m coming in from Beeston which entails an extra train journey, and because I don’t choose when the trains run I have to catch them based on which one gets me there _before_ my session with my personal trainer. This means that I’m getting in quite a bit earlier, but that I’m travelling for much longer and due to the scheduling granularity of the trains… well, I left the house at 11:30am and got home today at 15:00pm just to do a 30 minute session. I could have gotten home earlier today, but decided that since I’d be missing one train anyway I got a haircut (I’m male, so if it takes more than 10 minutes to cut my hair then something is wrong). So I could have shaved 30 minutes off that (Train schedule) but even so it’s around the 3 hour mark to go to the gym just once.

Everything has scaled similarly though, popping out to get milk was to the nearby Tesco Express and less than 10 minutes but now involves walking across Beeston to the big 24 hour Tescos (or further for Sainsburys) because that’s the nearest one that does the Lactofree stuff. I’ve also been getting to the doctors (knee injury and anxiety attack stuff), dentists (keep flossing!!!) and the talking therapy place for an assessment (I’m not mad, might want to work on my confidence in a couple of months – quitting was the right thing to do apparently!).

At first I was worried about how this was going ot impact on two of my stated goals from the previous post:

  1. Work through some UDK tutorials – the new job is on Unreal Engine 4 so I think getting a footing in the tech’ will be good for me,
  2. Work through Frank D Luna’s D3D 11 tutorials but convert them to OpenGL 4.0 – Been meaning to do this for a while, it’s a good ramp up and parallels these SlimDX posts,

…but screw that, I’ve really really really (emphasis!!!) needed this downtime to get myself sorted and simply running into yet more challenging stuff and then freaking out about that wasn’t going to help anyone least of all me.

My plans haven’t changed, I still want to dive into the UDK stuff – I have it all installed and a selection of tutorials ready, I’ve just pushed the start of it back until this weekend or Monday. That takes the pressure off me for doing the Pioneer Oculus Rift integration which is acting like a pressure valve and gentle reentry into doing some coding again for fun.

Now though I’m going to go and tidy up downstairs before Danni gets home. She’s started at a secondment to a new primary school which is in Special Measures. It’s taking a lot of her time and mental capacity, everyone there is under tremendous pressure, so I’m trying to suck less at the housework. Of course I come from a position of sucking _utterly_ to begin with but I am getting less awful :)



To shard, or not to shard?

Posted by | Posted in Game Development, Pioneer | Posted on 01-07-2013

To shard, or not to shard? Is the wrong question.

I came across a bit hard on sharding in my last and have rightly been called on it a couple of times so I’m just going to quickly clarify.

When You Should Have a Shard:

You will be have various “shard“s your game at least two in fact.

Read the rest of this entry »

Space MMO – Pioneer-alike

Posted by | Posted in Game Development, Pioneer | Posted on 29-06-2013

Space game MMOs… A brain dump:

I’ve been inspired to brain dump by a post on the Paragon forum.

To quote:

Will multiplayer be instanced or MMO style?
If MMO will players be able to transverse to other servers/galaxy’s with their ships,equipment and credits?

Now, leaving aside the question of what they might mean behind instancing, lets instead think about how a space game like Paragon, not Pioneer note, Paragon might choose to implement an MMO gameplay model.

NB: Paragon is a sort of daughter project to Pioneer. It’s forked from the Pioneer codebase and occasionally pulls in work we do. However Paragon has it’s own developers, development and game specific code, scripting, storyline and art. I’ll be discussing these things from being a Pioneer developers point of view.

Read the rest of this entry »

Google+ Comments reposted…

Posted by | Posted in Game Development | Posted on 22-05-2013

To avoid repeating some of the things I’ve been saying on Google+:

“Overall I’m feeling a little “meh” about the XboxOne. It seems as though it can now do all the things that I’m currently using the PS3 and Xbox360 for… but I already have both of those. In fact whilst the PS3 is on literally every single day I haven’t turned on the Xbox360 in a few months because the PS3 does everything that I could want it to do for the foreseeable future.

All of the extra stuff (skype/facebook/not-always-but-sort-on/video-in+out/etc) is all stuff that I either actively don’t want or don’t like.”


“Yes, it seems as though the new XboxOne will do a lot of the things that they’ve belatedly allowed the Xbox360 to do after the PS3 did them first (NetFlix, LoveFilm, etc).

Only it will also come with a boatload of things that Iactively don’t want it to do (Always on, Kinect+, Facebook, Skype, etc) that they could have enabled on the Xbox360 anyway but haven’t.

Meanwhile they’ve forgotten that the reason I’d have bought an XboxOne would be for the games and instead are releasing a console much weaker than the PS4.

If Halo4 had been good enough then I might have bought one, I’ve been a big fan of the Xbox since developing for the first one (that’s not at all confusing…) was such a pleasure compared to the PS2. The Xbox360 continued that by being much nicer to make games for than the PS3.

Seems to have all gone wrong this time around.”

Pretty much sums up how I feel about it all.


Why a consoles CPU architecture is almost irrelevant.

Posted by | Posted in Game Development | Posted on 22-05-2013

In the aftermath of last nights XboxOne announcement this is probably the least technically literate article I’ve read so far: http://www.engadget.com/2013/05/21/x86-architecture-vs-nintendo/

Why would the CPU architecture make any difference? Answer: It doesn’t. Ok it might make things slightly different, a tiny bit different but frankly there are much bigger differences between even the XboxOne and PS4 than a little thing like the CPU architecture.

With something like that we abstract it once, occasionally fix up a few issues over the development of the title, spend some time during optimisation and… that’s it.

It’s just a tiny cost compared to the vast hours spent dealing with the rendering pipeline differences. The initial system level bringup and porting of our libraries. Dealing with platform specific TCR/TRC compliance and testing. Balancing system resources and usage. Or managing the different build systems and compilers. You still need to do that for ANY new console even when they all use the same underlying architecture because you’re rarely targetting the ASM level directly.

If anything doomed the Wii U it was holding off it’s launch until the next-gen consoles were due so soon. If they’d launched it earlier they’d have had several YEARS to establish it, to iterate on the hardware to bring costs down and resolve any problems with the OS software. Instead they look like an underpowered, last-gen console with poor software support and a gimmick


Part 3: Many hands make for light work.

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 19-05-2013

Well this episode has taken longer than planned to get written, or event started for that matter. So lets not delay as this is part 3 of me attempting to explain the terrain system used in Pioneer Space Sim.

If you want to recap and point out poor grammar or spelling mistakes then go ahead and read Parts “1: Pioneer’ing Terrain” and “2: Now with… no feeling in my arms due to all the typing!”.

How things change:

Since Part 2 there have actually been some developments which make this edition even more relevant. At the end of it I mentioned that I’d be covering my work making the terrain generation multi-threaded using a job based system. Well that has now been merged into master and is available in the latest downloads. It’s got some bugs fixed since then and hopefully this will only get truer in the future ;)

Read the rest of this entry »

Part 2: Now with… no feeling in my arms due to all the typing!

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 20-04-2013

I should have made this clear from the outset but these posts aren’t a description of the “best way” to do anything, they’re more akin to documenting how we currently ARE doing things.

There’s a great deal of information out there on how these things can work but with Pioneer you can grab the source code and actually see it in progress. There’s great value in getting hold of something, testing it, debugging it, making changes and watching it blow up in your face ;)


In Part 1 I gave a very basic description but at no point did I attempt to explain things at the code level… and I’m not going to get too close to it during this series but I’ve got to get at least a little more familiar because some parts just don’t make any sense at first.

What does this do and why does it do it?:

One of the first things you see with the “GeoSphere” code leading up to, and including, the Alpha 33 release is that it’s entirely contained in just two files which are several thousand lines long and full of some nasty complex looking code. This is an unfortunate side effect of the way it’s evolved. At one point it might have been reasonably clean but as with all things they get added too and so the complexity grows along with the sheer amount of code in a single file. Eventually you just have to prune it back but with these situations you first have to understand what’s going on before you can do that pruning and splitting up of files. The complexity means that no-one understands the system and deciphering it becomes the large part of the burden so the situation is rarely resolved.

So lets tackle the big pieces of the complexity first.


The GeoSphere class is the main unit the rest of the code will interact with, it’s the part visible to the outside. It contains instances of the other main classes and does some general orchestration of them. It can be thought of as being split into two main parts however due to it’s use to static methods and members:

  1. static methods & members that affect ALL GeoSphere instances.
  2. non-static methods & member that are about a specific instance of GeoSphere.

The static part handles the creation and initialisation of stuff that affects all of the GeoSphere instances, the GeoPatchContext which holds information about the size and detail of the GeoPatch’s for example. It also kicks off the thread which manages updating the level-of-detail calculations I mention in Part 1.

The per-instance stuff is fairly basic:

  • m_terrain pointer, which is a “Terrain” pointer that will handle all of the heavy lifting logic for generating height and colour information for the patches.
  • 6 GeoPatch pointers, these are the 6 faces of the cube we will turn into a sphere, these look after themselves most of the time, we just have to call methods(/function) one them.
  • Constructor, destructor, Render, GetHeight, GetColor and BuildFirstPatches methods.

GetHeight and GetColor are simply helper methods, they just pass some information into a method of m_terrain and return the data, nothing more. Likewise the constructor and destructor are just setup/cleanup so lets ignore them.

The two methods of interest here are Render and BuildFirstPatches.

The Render method is called from the main thread as all of our rendering has to happen there currently. However, Render does more than just render, yeah I know, horrible eh but it’s what we’ve got.

At the top of the method it mostly does some setup of materials, one’s subsequently used by ALL of the patches that will be drawn. Then we get to a call to BuildFirstPatches. This is checked every single time we call Render but it only does anything on the first call and it’s lazy evaluation done lazily. It could be quite convoluted to create the GeoSphere and then immediately call BuildFirstPatches so instead of resolving this complexity it just checks to see if the root node of the patches have been built every frame and if it hasn’t it calls this method. The most stars/planets/moons I’ve yet to see in a system was about ~50 so it’s not too much overhead but it’s still icky.

After this check we configure basic lighting and then we start calling the Render method on each of the root GeoPatch instances (the ones created by BuildFirstPatches). I’ll describe them more later but they essentially do the quadtree traversal finding the nodes worth rendering.

Once we’re finished there we release out materials and then do some nasty management of the multithreaded updating logic, and finally we store the current camera position that will be used in the updating and level-of-detail (LOD) calculating thread. This is actually another reason why BuildFirstPatches is done lazily within the Render call, it’s because you can’t properly update the LOD until you have a camera position, but you don’t have one until you’ve tried to render. By putting the BuildFirstPatches call in Render you almost guarantee that you will have a valid camera position by the time that the LOD thread gets around to updating your GeoSphere. Hacky but it’s worked all this time.


If GeoSphere were the potatoes then this is the meat. GeoPatchContext might take up the top 1/3rd of the file but that stuff is actually fluff, it’s bulky but it’s not doing a lot of active logic, just setting things up for GeoPatch to use later, and they’re mostly rendering related but they’re not important until we get to GeoPatch rendering.

GeoPatch is defined entirely within the CPP file for GeoSphere, I don’t mind this technique overly much because C++ can be rather limited when you’re trying to hide an implementation but this is crazy. The GeoSphere implementation is only ~500 lines of code, and it starts at line 1053! GeoPatch on the other hand takes 670 lines in the middle of the file. Ugly and confusing for people new too it. Also confusing is that the quadtree, patch data, and some threading handling information are all muddled together.

We have as the 3 parts that should be more distinct:

  1. kids (*4), parent and edge friends (*4) – this is the quadtree information.
  2. m_kidsLock pointer to a mutex – this is the threading rubbish, it’s used to stop you from updating the patch when you’re trying to render it and vice versa.
  3. everything else is the patch data or at least can be considered that way.

To add to this confusing class data layout is that some of it will be called and used in both the main thread and the LOD update thread with access gated by the m_kidsLock mutex. Threading is almost never simple but this lack of clarity makes it hard to understand both when and where some members will be updated.

Next we have a bunch of functions that stump most people, I’ll list them and then explain their purpose:

  • GetEdgeMinusOneVerticesFlipped
  • GetEdgeIdxOf
  • FixEdgeNormals
  • GetChildIdx
  • FixEdgeFromParentInterpolated
  • MakeCornerNormal (templated for extra fun)
  • FixCornerNormalsByEdge
  • GenerateEdgeNormalsAndColors
  • OnEdgeFriendChanged (needed)
  • NotifyEdgeFriendSplit (needed)
  • NotifyEdgeFriendDeleted (needed)
  • GetEdgeFriendForKid (needed)

All of these serve one purpose, to confuse the hell out of everyone reading the code… no ok that’s being mean, the last 4 really are useful… still mean? Yeah a bit, I’ll have to explain.

Generating terrain can be a computationally expensive process. The way we do it will be a couple of future articles but the very very short version is that we run an expensive set of mathematical noise functions many many times for every single vertex that we want to generate the position for. Doing this for a single vertex is expensive but we do it 10’s of thousands of times even for a low detail planet. Most of the code, and ALL of that complexity above, is an attempt to avoid generating as much of it by copying it from other patches that have already been generated.

The reason that the last four methods are, mostly, needed is because they’re both the way that we keep track of who our neighouring nodes are and how we kick off the data copying process.

The thing to take from this is not that these are a good idea, and I’m not going to explain how they work either, the important thing is that they’re OPTIONAL. They’re costly too, a lot of time is spent maintaining all of this and all of the data copying is pretty bad for performance for other reasons. Not least that it often just has to generate the data anyway because a neighbour can’t be found at the correct level. They also make it much harder to make everything into smaller tasks that can better use multi-core CPUs. All things considered I’d rather not have them.

What all this mean is that GeoPatch has that lot, and only another handful of interesting methods which I’m going to go into a little detail with now:

  • determineIndexBuffer – I will forever regret that this doesn’t match our method naming convention, i.e: that lowercase ‘d’. Ignoring that though; this is what uses the result of the method “Init” from GeoPatchContext, most of it anyway. “Init” went along for many lines of code calculating 16 index buffers, it was a lot of code and earlier I said it didn’t do very much. Well in truth what it did was set up 16 index buffers so that here, in this method, we could do a very fast and simple bit of OR’ing and come up with an index into that list of 16 index buffers. Specifically we take our edgeFriends and we see which ones of them we actually have, and we find the perfect index buffer that uses low-resolution edges when we don’t have a neighbour on that edge, and hi-resolution edges when we do. There are 16 possible combinations, although in truth we only ever have a subset of those used on our terrain. It’s quick and easy to calculate them though, and this is fast way to index those.
  • UpdateVBOs & _UpdateVBOs – even I’m not entirely sure where these two are called from, which thread and under what circumstances. This is because of those many edge copying methods, I suspect that UpdateVBOs can be called from either depending on circumstances, but it’s usually called from the LOD thread. _UpdateVBOs on the other hand should only EVER be called from the MAIN thread as it interacts with OpenGL by updating or creating the vertex buffer object (VBO) that is used to render the patch on screen so if it’s ever called from another thread it will almost certainly crash, or worse it’ll keep running and fail silently :(
  • GetSpherePoint – gets a point on a sphere, specifically it performs a bilinear interpolation between the four corner points of a patch. At the lowest detail level then, that means the corners of a cubes face, i.e: this turns our cube into a sphere – or one faces of it into a curved patch on the surface of the sphere – part of the magic happens here. Once it’s done the bilinear bit it “normalises” it, turns it into a vector of unit length (Off topic; I hate wikipedias vector descriptions, they penalise those striving to learn some maths and strokes the egos of those who already do) so that when you sum the parts of the vector they equal 1. GenerateMesh makes heavy use of GetSpherePoint because it creates the patch vertices.
  • GenerateMesh – ignore the centroid for now, focus on the first couple of nested for loops because they generated ALL of the heights for every vertex of a patch. This *IS* the terrain creation right here in two for loops.
    • The call to GetHeight is the expensive call to the generation process itself but it’s interchangable with any other. We have many versions of this call they all do things using different maths and could even use other ways of generating terrain so we’ll deal with them another day. No the important bit is that if you strip all of GeoPatch down to it’s absolute bare minimum then this function and GetSpherePoint is almost all you’d have left with just these two for loops.
    • We get the height using a double precision floating point variable because it will be in the range 0.0 to 1.0 yet it has to be very very precise and if we used a single precision (32-bit) float then we’d lose detail and get visible banding on the terrain. There are ways and means of avoiding this but doubles are simple and they’re fast enough most of the time.
    • We take that height and add 1.0 to it, then we take this new value and use it to scale the vertex position from GetSpherePoint and that’s it, we have a terrain height that will be of at least unit length (1.0) up to a maximum of length 2.0 which would make for some very high mountains.
    • The second pass is now where the trouble starts but can be summed up with the word: “normals” – the calculation of which uses 4 vertex positions, at least two of those will be within this patches data, but the other two will be in our neighbouring patches. That’s why these for loops don’t cover the entire patches data. Instead they cover the central part, the rest of the data, the edges, will be calculated by the mess of edge copying methods I skipped describing above.
    • The colour is also calculated in this 2nd pass, it requires the height and the normals so it’s also done in the edge copying. You might notice if you’re reading through the code that the height is stored in the colour in pass 1, then used in pass 2. I covered why this was and why it was bad in Part 1 – as I say, this is a description of how it is, not how it should be!
  • Render – an actual blend of quadtree and patch all in one here. This is depth first traversal & rendering of a quadtree in action! Render MUST be called in the MAIN thread only because it’s dealing with OpenGL. First we see if we have any child nodes, if we do we just forward on the values that we passed into the function. That’s the depth first traversal part, because in a tree each node will do the same thing, over and over until it reaches a leaf node, the one without children. When we do the rendering is straight forward:
    • Lock the node – so it can’t be updated in the LOD thread,
    • optionally update the vertex buffer object (VBO),
    • test to see if we’re visible and return without doing anything if we’re not,
    • push the current matrix and then translate the view according the difference between the camera position and the clipCentroid (which is the centre of the four corner vertices),
      • this helps us deal with a problem called “jitter” (/”jittering”) caused by the GPU only using 32-bit floats which aren’t precise enough to represent the terrain. What we do is move the rendering of the patch in such a way that 32-bit’s are precise enough for us by offsetting it from the camera position. We’re effectively moving it closer to the camera before applying the position to avoid the jittering! Patches further away still jitter like crazy, but it doesn’t matter because they’re small and far away!
    • update some information we store about how many triangle we draw,
    • setup our buffers, there’s determineIndexBuffer from earlier,
    • do the actual drawing,
    • now release our buffers and pop the matrix so that this patches matrix offset doesn’t affect the next patches to be rendered.
    • and we’re done with Render :)
  • LODUpdate – This is called only from the LOD thread, the MAIN thread never goes near it, I’ll break it down just like Render above but this is where we decide to increase or decrease the detail of a GeoPatch by splitting or merging them:
    • Slightly different to the above we lock the “m_abortLock” to see if the thread is trying to quit – it helps us exit faster is the idea but it’s implementation specific, ignore it.
    • canSplit – leading question eh? We iterate through our neighbours and perform some tests like: Do we have them all? Are they less detailed than us or higher in the quadtree than us? If we pass that then we check that we’re not already too deep to split and finally we apply some maths to decide if our edges length is more than the distance from the camera to the centroid we calculated in GenerateMesh. This is a crude approximation of determining how big the patch will be on the players screen. You can find much better ones if you Google for geomorphing. There’s an additional check to see if we have a parent because someone decided we should always split in that instance… not sure why.
    • canMerge – apparently yes… I don’t know why this is here, we could avoid an if test later on, I hope the optimiser gets rid of it!
    • canSplit branch – if we really really canSplit then, in summary: we first see if we have children, if we then we just pass on the call to LODUpdate otherwise; we create 4 child nodes, we set ourselves as their parent, we setup the relationship between each of them and the neighbours that we know about (if any) then we call GenerateMesh on each of them, we GenerateEdgeNormalsAndColors using all that complicated shit above, we UpdateVBOs which sets the flag that says that we really need to call _UpdateVBOs back in the MAIN thread and finally we pass on the call to LODUpdate to our newly created child nodes.
    • canMerge branch – if we cannot split then we’ll go this way, always (stupid code). That doesn’t mean we can actually merge though, not if have no child nodes. If we do then we’ll happily destroy them, and in their destructors they’ll destroy their kids and so on and so forth. When I first starting reading this code it took me a while to realise why the tree isn’t unbalanced and constantly destroying and creating node. It’s because we split aggressively. We only take the merge branch when there’s no way of splitting.
    • A word on locking: in both the canSplit and canMerge branches we Lock but at different times. In canMerge land there’s not much to do, just destroy stuff so we lock straight away. In canSplit land though we can delay the mutex locking until later, we allocate the new child nodes to temporaries, then call the expensive GenerateMesh method, only once we’ve done that do we lock the mutex and copy the temporaries into the correct child node pointers. They have to be there for to correctly notify their new neighbours that they exist and to generate the edge normals and colours. At least this way the mutex isn’t locked for as long, because that would delay the rendering call back in the MAIN thread if it was.
  • GeoPatch constructor/destructor – a quick note about these two, the generate some data used throughout the rest so are worth reading and keeping in mind. The destructor also does some of the edge friend management by letting it’s friends know when it’s destroyed but otherwise they’re pretty simple.

Wow, 3202 words in so far everyone… everyone? Anyone? Ahem, anyway.

The end?:

That’s it for the whistle-stop tour for now. I know I know, there’s still slightly more than a third of the CPP file that I haven’t even started to cover but that’s fine because I’m not going too. At all.

It’s mostly too specific to some of Pioneers quirks, some of it is just structural so it’s run once to set things up and then never used again but mostly I’m ignoring it because it doesn’t deal with what you need to understand.

As I’ve described above the existing terrain generation is in two parts; one in the main thread with some setup and then the rendering, the second is in it’s own thread. The two cross over at places and prevent crashes and other problems by locking mutexes. This works but it causes performance problems of it’s own. There’s a great deal of data copying stuff that can be avoided by simply generating more than you need, that means you don’t need to track relationships so much and that simplifies the code massively. That might have performance issues of it’s own of course since you are doing extra work but it simplifies the code so very much and I’ve already covered the reasons for doing that in Part 1.

Next in the series I’ll probably cover some of the ways we generate the heights for the terrain… no, no actually I’m not doing that at all. Yikes that’s terrifying stuff. No instead I’ll discuss what I’ve been working on to make it take advantage of multithreading and many cores. In that article I’ll also cover a side project called GLSLPlanet which moves some of the work onto the GPU, and why I’m not doing that in Pioneer just yet even though it’s based on the same code I’ve just described.

See you next time,


Part 1: Pioneer’ing Terrain

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 17-04-2013

Yes yes I went for the pun-tastic title :)

This is going to be a bit more technical than usual, not massively, don’t stop reading for fear of equations or pages of code. No this is just going to be a technical description of some parts of Pioneers terrain rendering system because it’s quite odd, not great, but it’s effective enough. I’m basically going to brain dump a bit so I might come back and edit this in the future to clean it up, it will be something of a living document.

In the future I’ll refer back to this post to describe the ways that the terrain rendering and generation will be changing.

I should also point out that I’m not the original author of the terrain rendering or generation used in Pioneer. It’s just an area that I’ve been poking around in a lot lately and that I have a lot of changes that I want to make to that area. These changes should make generation of the terrain faster and more flexible, add control that we currently lack. They should also accelerate rendering and use a lot less memory as well as opening up new visual and game effects in the distant future.

The basic idea:

The terrain itself is a form of Quadrilateralised Spherical Cube which is a fancy way of saying that you’ve got 6 flat planes oriented to face in the six outward directions of a cube, then you deform them to be a sphere. This causes some distortions but it also comes with a lot of rather useful benefits.

For a start we can use a quadtree to define the surface of each (originally) square face of the (former) cube. That’s handy because there’s lot of simple terrain rendering algorithms that can use such an arrangement. In this case we’re using something similar to “Chunked LOD“. I say “similar” because the details differ, but you can read a better explanation on the “making worlds of sphere and cubes” blog post on Acko since it’s similar in concept.

So you know that:

  • we’re using 6 quadtrees, 1 per cube face.
  • each quadtree is deformed to fit onto a sphere.
  • that magically this is helpful…

Well when we want to render the terrain we start at the top of each of the quadtrees, i.e; the lowest detail level, and simply ask it if it has any children. Because it’s a quadtree it will have either 4 or none, hence the name “quad” in case you ignored that wikipedia like. When we reach one of these quadtree nodes that has no children we render whatever geometry it has because that must by definition be the highest level-of-detail available. This is a very simple system, you repeat this for each of the faces of the cube/sphere and you’ve very quickly rendered a convincing sphere onscreen.

Implicit in the above statement however is the idea that some of the quadtrees “nodes” won’t have children whereas others will. To do this we pass through the players camera position to an update method that goes through the quadtree, much like the rendering does, and at each node it does some maths to work out if a node is in 1 of 3 states: good enough, whether it should “split” and create some child nodes, or if it has child nodes and doesn’t need them anymore so can “merge” them.

  • Good enough: nothing happens, we have the perfect amount of detail. In the code we’ll take the “merge” branch but if we have no child node then nothing changes,
  • Split: Ah, this nodes not detailed enough so we can either create 4 child nodes and populate them with more detailed terrain, or if we already have them then we descend into those node and repeat the update process,
  • Merge: The flipside to being good enough is that we’re good enough and so if we have any child nodes then they’re superfluous and we can merge them, or in Pioneers case, just delete and erase them from existence.

In the current Pioneer Alpha 33 this all happens in a single separate thread with a few locks/mutexes preventing rendering from happening at the same time as updating. This creates a lot of waiting around for either updating the quadtree or waiting for rendering to complete. It was the simplest way to improve things at the time it was created though because it meant that it didn’t stall the main thread whenever the, very expensive, terrain generation needed to be done. Doing it this way keeps everything looking like a single-threaded process and this is inherently simpler to write and understand. That means there should be less bugs and less problems.

Unfortunately Pioneer tries to do some odd things that exploit the fact it’s storing the terrain in a quadtree. This unfortunate stuff is done in the name of performance in that it tries to avoid regenerating some of the terrain data by keeping close track of who it’s neighbours are in the quadtree and then copying as much data as it can. All of this neighbour tracking and data copying is very expensive both in performance terms and complexity. Overall it was this that took me the longest to understand rather than the actual terrain generation or rendering parts!

This neighbourly management is one of the first things that is going to change, hopefully by the next release date.

You see the neighbouring node stuff is complex, it also imposes some form on the quadtree structure that we don’t need and that means things cannot be easily generated in little pieces or across many CPU cores. Ideally we’d like the update logic to happen, it decides to Split or Merge some nodes. Then it asks a system to go away and make it the data it needs. Those requests disappear off into the aether and a short while later new terrain node data returns ready to be put into the quadtree. Whether that’s done on a super-powerful GPU, farmed out across a distributed network, or done on a differet core of the same CPU we don’t know or care. Removing the neighbour management, or at least the data copying portion, means breaking up the generation of these quadtree terrain nodes gets much simpler and easier to distribute across many CPU cores.

Quadtree specifics and quick/easy changes:

A quadtree can be used to store any kind of information, it doesn’t even have to be spatial or geometric like we are in Pioneer. However that is our use case and so we store a tonne of data in each of the quadtrees nodes. Now by a tonne of stuff I really mean a metric-fucktonne of stuff! We store in terrifying order:

  • A ref counted pointer to the context data that holds the terrain generation methods,
  • the double precision vector of the corner vertices,
  • 3 * double precision vector arrays to the vertex, normal and colour data,
  • a GLuint for a nodes vertex buffer object number,
  • 4 pointers to its child nodes (it’s a quadtree afterall),
  • another pointer too this nodes parent,
  • 4 more pointers to it’s possible neighbours (whether or not it has any),
  • a pointer to the parent object called a geosphere, which holds the top level quadtree roots,
  • double precision rough length of an edge (from corner to corner along an edge),
  • 2 * double precision vectors for the centroid and clipCentroid of the node used for updating and rendering calculations,
  • double clipRadius,
  • depth of the node within the quadtree,
  • a mutex for locking access to the child nodes during updating or rendering,
  • double “distMult” which is calculated and used in the constructor and NEVER AGAIN.

Ok so that did get a little non-geek hostile there (it’s about to get worse!) but what it boils down too is that there’s a lot of shit in every single node. Some of it is terrifying, absolutely pant wettingly funnily bad to have there. Lets take the worst of the offenders, and before I get any comments pointing out the others, yes I am aware of all of them:

3 * double precision vector arrays to the vertex, normal and colour data – These are bad for a few reasons but the obvious ones are that even if we need to store this data then we’re storing it in the wrong format for AT LEAST two of the pieces of data:

  • Normals – these can be efficiently packed down into some very compact formats, in reality you could get down to a single 32-bit integer with two of the coordinates packed into 16-bits each and then recalculate the 3rd component when need, probably in the vertex or pixel shader. In this case we can quickly halve the memory usage however by simple making it a float vector instead of a double.
  • Colours – oh-my-god, you don’t EVER need double precision for colour, in this case we have 192-bits for colour… 16-bits would probably be good enough, for perfect colour we might need 24-bits but NOT 192-bits. In this instance a single 32-bit integer is our best bet for quickly encoding the colour saving us a staggering 160-bits PER VERTEX and we’ve got 8-bits to spare if we need them for something (alpha channel?) in future.
  • So why was the colour using 64-bit vector3 representation? Because the heightmap generation was sneakily storing the height in the red channel during the terrain generation so that it could use it later…
  • …this was so it could retrieve the height when calculating the per vertex colour. Why not just store the height directly in another array? This adds back another 64-bits so we’ve only saved 92-bits per vertex.
  • Of course we’re only storing all of this temporarily anyway, until we can generate the new vertex buffer for the patch (nuuurgh, I’ll rant about this later) which then turns it into floating point vector data. Since this is the case however we can in fact discard the vertex position data completely, all 192-bits of it, and then rebuild it from the height data that we’re now storing instead.
  • Total so far: Normals = 50% = 192-bits to 96-bits, Colours = 1/6th = 192-bits to 32-bits, Vertex = 1/3rd = 192-bits to 64-bits, so 576-bits to 192-bits per vertex.

That reduction has brought my peak memory consumption from ~700MiB when sat on the surface of the New Hope start position down to ~280MiB, and as you can see from the above there’s still more to be saved from a variety of areas. That’s because those arrays in particular are allocated to contain all of the visible data for each node. The rest is a bit nasty but it’s nothing in terms of memory usage by comparison because even with a few thousand nodes in the quadtree(s) they’re only a tiny fraction of all the data stored.

Hmm, that’s gone bit rambl-o-matic because I’m bloody tired.

Well maybe someone might find it interesting, I’ll continue with this stuff… soon! :)


Further in the series: Part 2 is up now.

Why no desktop 16-core CPU?

Posted by | Posted in Game Development, Pioneer | Posted on 06-03-2013

It’s a question I keep coming up against as I do multi-threading work but where are all of the desktop 16-core CPUs?

You can get what AMD call a 16-core CPU in the Opteron 6200 but it’s built on Bulldozer or Piledriver which are more like 1.5 cores per “dual core” thanks to sharing their FPU capabilities. Or to put it another way, 16-core INTEGER and 8-core FLOATING POINT.

Not long ago we went from single-core to dual-core, to true dual-core (both cores on the same die), to quad-core… and then we stopped.

I guess the argument could be that it’s not worth it for most people? Or that hyperthreading gives you 8 hardware threads with upto 30% performance boost if you can use them well.

This all misses the point though and I’d have been quite happy if AMD had continued adding cores to it’s K10 architecture, keeping up with the process node advances (die shrinking), updating, optimising and just piling on the cores. They have done that to some degree because K10h, as used in the Phenom 2, did make it into the early APU’s in a low power die-shrunk version. It lacked any kind of L3 cache though it did have some extensions, updates and improvements so that it just about holds it’s own against a similar number of core Phenom 2.

Those chips were APU’s though, with an on chip GPU for mobile use. So they clocked slower and fully 50% of the die was spent on the GPU. You could get versions without the GPU called Athlon II but they still lacked the L3 cache and the GPU was there, just disabled and powered off. There was no 8-core Athlon II with an L3, even a small one even though it was probably possible.

We’re down at 22nm + 3D transitors with Intel CPUs whilst AMD are still rolling out on 32nm but are we really stuck at 4-cores?

No, there are higher core counts as you can see but they’re for servers rather than our mere mortal desktop machines. So the work isn’t going into getting more performance out of heavily threading things, instead it’s in GPGPU languages like CUDA, DirectCompute and OpenCL. Utilising the GPU to do work you’d normally have just hammered out on the CPU. There’s real benefit to doing things on your GPU, and eventually I think AMDs “APU” strategy might pay off if they can reduce the latency between the CPU<->GPU for compute languages for example but traditional multi-threading seems to have been ignored. It’s not even an option to get more cores on a desktop CPU and I think that’s a shame as there’s a lot of workloads that will happily scale to 16 or 32 threads without the need to move them to GPU.

It would have been interesting if AMD had chosen to do it, to keep scaling the cores at least as an option but then I think there’s a metric fucktonne of things that AMD could have done to stay relevant which they’ve manifestly failed to do so here’s a short list:

  • big.LITTLE – as in the ARM design where you have a group (4 typically) of large, fast, powerful CPUs and then you have one tiny little low power CPU that is used when the workload gets light to save power.
  • Unlocked multipliers and clocks on CPUs sold with a very limited warranty at the same price as the regular locked one – the Black Edition chips are good but not enough.
  • Change the chip packaging format as Intel and IBM have both separately proposed for better thermal, power and mounting design – if you’re in 2nd place you innovate to survive.
  • Speaking of IBM – form an alliance, use their resources and manufacturing to get access to better process node shrinks.
  • …and of Process Node shrinks – you can’t compete with Intel on them but as soon as AMD were free of Global Foundaries why didn’t they run off to TSMC (or IBM) and aggressively chase what there was available?

There must be a tonne more as well besides, but in short I miss the AMD of old. We were speaking about it at work the other day and it always produces the same head-shaking response where you just can’t believe that todays AMD is the same one that gave us the K6 and k8 architectures. The AMD that bludgeoned Intels Pentium 4 misstep into a cocked hat and happily overclocked it pants off.

Todays AMD seems to be one that releases products based on “vision” rather than”getting-the-fucking-job-done-well“. I’d even settle for: “getting the job done well enough” and do it with more cores but instead if you buy AMD now you’re probably getting a Piledriver chip at the high-end which is finally a little bit faster in some situations than the k10h architecture… they could have done an 8-core k10h in approximately the same die area but minus the GPU and I’d rather have had that because all my GPU needs are serviced by a separate card in a PCI-e slot.

I’d buy that chip, fuck I’d buy a 16-core version without hesitation. A die shrunk 8/16-core Phenom, L3 cache intact, the improvements they rolled into the Stars (k10h successor) architecture, no GPU, on a quad channel memory bus and clocked at about 3Ghz. That could pummel my Core i7 into the floor with the loads I have in mind.

It’s not to be though, that AMD is dead and for some reason so are desktop 8/16-core CPUs it seems.