Space MMO – Pioneer-alike

Posted by | Posted in Game Development, Pioneer | Posted on 29-06-2013

Space game MMOs… A brain dump:

I’ve been inspired to brain dump by a post on the Paragon forum.

To quote:

Will multiplayer be instanced or MMO style?
If MMO will players be able to transverse to other servers/galaxy’s with their ships,equipment and credits?

Now, leaving aside the question of what they might mean by instancing, let’s instead think about how a space game like Paragon (note: Paragon, not Pioneer) might choose to implement an MMO gameplay model.

NB: Paragon is a sort of daughter project to Pioneer. It’s forked from the Pioneer codebase and occasionally pulls in work we do. However, Paragon has its own developers, development and game-specific code, scripting, storyline and art. I’ll be discussing these things from a Pioneer developer’s point of view.


Part 3: Many hands make for light work.

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 19-05-2013

Well this episode has taken longer than planned to get written, or even started for that matter. So let’s not delay: this is part 3 of me attempting to explain the terrain system used in Pioneer Space Sim.

If you want to recap and point out poor grammar or spelling mistakes then go ahead and read Parts “1: Pioneer’ing Terrain” and “2: Now with… no feeling in my arms due to all the typing!”.

How things change:

Since Part 2 there have actually been some developments which make this edition even more relevant. At the end of it I mentioned that I’d be covering my work making the terrain generation multi-threaded using a job based system. Well that has now been merged into master and is available in the latest downloads. It’s had some bugs fixed since then and hopefully this will only get truer in the future ;)


Part 2: Now with… no feeling in my arms due to all the typing!

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 20-04-2013

I should have made this clear from the outset but these posts aren’t a description of the “best way” to do anything, they’re more akin to documenting how we currently ARE doing things.

There’s a great deal of information out there on how these things can work but with Pioneer you can grab the source code and actually see it in progress. There’s great value in getting hold of something, testing it, debugging it, making changes and watching it blow up in your face ;)


In Part 1 I gave a very basic description but at no point did I attempt to explain things at the code level… and I’m not going to get too close to it during this series but I’ve got to get at least a little more familiar because some parts just don’t make any sense at first.

What does this do and why does it do it?:

One of the first things you see with the “GeoSphere” code leading up to, and including, the Alpha 33 release is that it’s entirely contained in just two files which are several thousand lines long and full of some nasty complex looking code. This is an unfortunate side effect of the way it’s evolved. At one point it might have been reasonably clean but, as with all things, it got added to and so the complexity grew along with the sheer amount of code in a single file. Eventually you just have to prune it back, but first you have to understand what’s going on before you can do that pruning and splitting up of files. The complexity means that no-one understands the system, and deciphering it becomes the larger part of the burden, so the situation is rarely resolved.

So let’s tackle the big pieces of the complexity first.


The GeoSphere class is the main unit the rest of the code will interact with, it’s the part visible to the outside. It contains instances of the other main classes and does some general orchestration of them. It can be thought of as being split into two main parts, however, due to its use of static methods and members:

  1. static methods & members that affect ALL GeoSphere instances.
  2. non-static methods & members that are about a specific instance of GeoSphere.

The static part handles the creation and initialisation of stuff that affects all of the GeoSphere instances, the GeoPatchContext which holds information about the size and detail of the GeoPatches for example. It also kicks off the thread which manages updating the level-of-detail calculations I mentioned in Part 1.

The per-instance stuff is fairly basic:

  • m_terrain pointer, which is a “Terrain” pointer that will handle all of the heavy lifting logic for generating height and colour information for the patches.
  • 6 GeoPatch pointers, these are the 6 faces of the cube we will turn into a sphere. These look after themselves most of the time, we just have to call methods (functions) on them.
  • Constructor, destructor, Render, GetHeight, GetColor and BuildFirstPatches methods.

GetHeight and GetColor are simply helper methods, they just pass some information into a method of m_terrain and return the data, nothing more. Likewise the constructor and destructor are just setup/cleanup so let’s ignore them.

The two methods of interest here are Render and BuildFirstPatches.

The Render method is called from the main thread as all of our rendering has to happen there currently. However, Render does more than just render, yeah I know, horrible eh but it’s what we’ve got.

At the top of the method it mostly does some setup of materials, ones subsequently used by ALL of the patches that will be drawn. Then we get to a call to BuildFirstPatches. This is checked every single time we call Render but it only does anything on the first call; it’s lazy evaluation done lazily. It could be quite convoluted to create the GeoSphere and then immediately call BuildFirstPatches, so instead of resolving this complexity it just checks every frame to see if the root nodes of the patches have been built, and if they haven’t it calls this method. The most stars/planets/moons I’ve seen in a system so far is about 50, so it’s not too much overhead, but it’s still icky.

After this check we configure basic lighting and then we start calling the Render method on each of the root GeoPatch instances (the ones created by BuildFirstPatches). I’ll describe them more later but they essentially do the quadtree traversal finding the nodes worth rendering.

Once we’re finished there we release our materials and then do some nasty management of the multithreaded updating logic, and finally we store the current camera position that will be used in the updating and level-of-detail (LOD) calculating thread. This is actually another reason why BuildFirstPatches is done lazily within the Render call: you can’t properly update the LOD until you have a camera position, but you don’t have one until you’ve tried to render. By putting the BuildFirstPatches call in Render you almost guarantee that you will have a valid camera position by the time that the LOD thread gets around to updating your GeoSphere. Hacky but it’s worked all this time.
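To make the lazy build a bit more concrete, here’s a minimal sketch of the pattern, not the actual Pioneer code. The Sphere, Patch and Vec3 types and all of their members are made up for illustration, and the sketch ignores the locking a real version needs when sharing the camera position with the LOD thread.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; none of these are Pioneer's real types.
struct Vec3 { double x, y, z; };
struct Patch { int face; };

class Sphere {
public:
    // Called every frame from the main thread.
    void Render(const Vec3 &cameraPos) {
        // Lazy build: a cheap null check per frame, real work only once.
        if (!m_roots[0]) BuildFirstPatches();
        assert(m_roots[0] != nullptr);

        // ... set up lighting and draw the six root patches here ...

        // Remember the camera position for the LOD update.
        // (A real version must synchronise this with the LOD thread.)
        m_cameraPos = cameraPos;
        m_hasCameraPos = true;
    }

    bool HasCameraPos() const { return m_hasCameraPos; }
    const Vec3 &CameraPos() const { return m_cameraPos; }
    int BuildCount() const { return m_buildCount; }

private:
    void BuildFirstPatches() {
        for (int face = 0; face < 6; ++face) m_roots[face].reset(new Patch{face});
        ++m_buildCount;
    }

    std::unique_ptr<Patch> m_roots[6];
    Vec3 m_cameraPos{0.0, 0.0, 0.0};
    bool m_hasCameraPos = false;
    int m_buildCount = 0;
};
```

The point of the sketch is only that the per-frame cost is a single pointer test, and that a camera position exists as soon as anything has been rendered once.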


If GeoSphere were the potatoes then GeoPatch is the meat. GeoPatchContext might take up the top 1/3rd of the file but that stuff is actually fluff; it’s bulky but it’s not doing a lot of active logic, just setting things up for GeoPatch to use later. Those things are mostly rendering related, and they’re not important until we get to GeoPatch rendering.

GeoPatch is defined entirely within the CPP file for GeoSphere. I don’t mind this technique overly much because C++ can be rather limited when you’re trying to hide an implementation, but this is crazy. The GeoSphere implementation is only ~500 lines of code, and it starts at line 1053! GeoPatch on the other hand takes 670 lines in the middle of the file. Ugly and confusing for people new to it. Also confusing is that the quadtree, patch data, and some threading handling information are all muddled together.

We have as the 3 parts that should be more distinct:

  1. kids (*4), parent and edge friends (*4) – this is the quadtree information.
  2. m_kidsLock pointer to a mutex – this is the threading rubbish, it’s used to stop you from updating the patch when you’re trying to render it and vice versa.
  3. everything else is the patch data or at least can be considered that way.

Adding to this confusing class data layout, some of it will be called and used in both the main thread and the LOD update thread, with access gated by the m_kidsLock mutex. Threading is almost never simple but this lack of clarity makes it hard to understand both when and where some members will be updated.
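As a rough illustration of how those three parts could be kept visibly apart, here’s a small sketch. It is not the real GeoPatch: the Node type, its members and the Split and CountLeaves methods are all invented for this example, and the “expensive generation” is just a placeholder.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical node; the member names only loosely echo the real GeoPatch.
struct Node {
    // 1. Quadtree information.
    std::unique_ptr<Node> kids[4];
    Node *parent = nullptr;
    Node *edgeFriends[4] = {nullptr, nullptr, nullptr, nullptr};

    // 2. Threading: guards the kids array.
    std::mutex kidsLock;

    // 3. Patch data.
    std::vector<double> heights;

    // Update side: do the slow work on temporaries, lock only to publish.
    void Split() {
        std::unique_ptr<Node> fresh[4];
        for (auto &kid : fresh) {
            kid.reset(new Node);
            kid->parent = this;
            kid->heights.assign(16, 0.0);  // stands in for expensive generation
        }
        std::lock_guard<std::mutex> guard(kidsLock);
        assert(!kids[0]);  // only split a leaf
        for (int i = 0; i < 4; ++i) kids[i] = std::move(fresh[i]);
    }

    // Render side: hold the lock while walking the kids.
    int CountLeaves() {
        std::lock_guard<std::mutex> guard(kidsLock);
        if (!kids[0]) return 1;
        int leaves = 0;
        for (auto &kid : kids) leaves += kid->CountLeaves();
        return leaves;
    }
};
```

Grouping the members like this doesn’t remove the cross-thread access, it just makes it easier to see which data the mutex is actually protecting.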

Next we have a bunch of functions that stump most people, I’ll list them and then explain their purpose:

  • GetEdgeMinusOneVerticesFlipped
  • GetEdgeIdxOf
  • FixEdgeNormals
  • GetChildIdx
  • FixEdgeFromParentInterpolated
  • MakeCornerNormal (templated for extra fun)
  • FixCornerNormalsByEdge
  • GenerateEdgeNormalsAndColors
  • OnEdgeFriendChanged (needed)
  • NotifyEdgeFriendSplit (needed)
  • NotifyEdgeFriendDeleted (needed)
  • GetEdgeFriendForKid (needed)

All of these serve one purpose, to confuse the hell out of everyone reading the code… no ok that’s being mean, the last 4 really are useful… still mean? Yeah a bit, I’ll have to explain.

Generating terrain can be a computationally expensive process. The way we do it will be a couple of future articles but the very very short version is that we run an expensive set of mathematical noise functions many many times for every single vertex that we want to generate the position for. Doing this for a single vertex is expensive but we do it tens of thousands of times even for a low detail planet. Most of the code, and ALL of that complexity above, is an attempt to avoid generating as much of it as possible by copying it from other patches that have already been generated.

The reason that the last four methods are, mostly, needed is because they’re both the way that we keep track of who our neighbouring nodes are and how we kick off the data copying process.

The thing to take from this is not that these are a good idea, and I’m not going to explain how they work either, the important thing is that they’re OPTIONAL. They’re costly too, a lot of time is spent maintaining all of this and all of the data copying is pretty bad for performance for other reasons. Not least that it often just has to generate the data anyway because a neighbour can’t be found at the correct level. They also make it much harder to make everything into smaller tasks that can better use multi-core CPUs. All things considered I’d rather not have them.

What all this means is that GeoPatch has that lot, and only another handful of interesting methods which I’m going to go into a little detail with now:

  • determineIndexBuffer – I will forever regret that this doesn’t match our method naming convention, i.e. that lowercase ‘d’. Ignoring that though; this is what uses the result of the method “Init” from GeoPatchContext, most of it anyway. “Init” went along for many lines of code calculating 16 index buffers, it was a lot of code and earlier I said it didn’t do very much. Well in truth what it did was set up 16 index buffers so that here, in this method, we could do a very fast and simple bit of OR’ing and come up with an index into that list of 16 index buffers. Specifically we take our edgeFriends and we see which ones of them we actually have, and we find the perfect index buffer that uses low-resolution edges when we don’t have a neighbour on that edge, and hi-resolution edges when we do. There are 16 possible combinations, although in truth we only ever have a subset of those used on our terrain. It’s quick and easy to calculate them though, and this is a fast way to index those.
  • UpdateVBOs & _UpdateVBOs – even I’m not entirely sure where these two are called from, which thread and under what circumstances. This is because of those many edge copying methods, I suspect that UpdateVBOs can be called from either depending on circumstances, but it’s usually called from the LOD thread. _UpdateVBOs on the other hand should only EVER be called from the MAIN thread as it interacts with OpenGL by updating or creating the vertex buffer object (VBO) that is used to render the patch on screen so if it’s ever called from another thread it will almost certainly crash, or worse it’ll keep running and fail silently :(
  • GetSpherePoint – gets a point on a sphere, specifically it performs a bilinear interpolation between the four corner points of a patch. At the lowest detail level that means the corners of a cube’s face, i.e. this turns our cube into a sphere – or one face of it into a curved patch on the surface of the sphere – part of the magic happens here. Once it’s done the bilinear bit it “normalises” the result, turning it into a vector of unit length (off topic: I hate Wikipedia’s vector descriptions, they penalise those striving to learn some maths and stroke the egos of those who already know it), so that the squares of its components sum to 1. GenerateMesh makes heavy use of GetSpherePoint because it creates the patch vertices.
  • GenerateMesh – ignore the centroid for now, focus on the first couple of nested for loops because they generate ALL of the heights for every vertex of a patch. This *IS* the terrain creation right here in two for loops.
    • The call to GetHeight is the expensive call to the generation process itself but it’s interchangeable with any other. We have many versions of this call; they all do things using different maths and could even use other ways of generating terrain, so we’ll deal with them another day. No, the important bit is that if you strip all of GeoPatch down to its absolute bare minimum then this function and GetSpherePoint are almost all you’d have left, with just these two for loops.
    • We get the height using a double precision floating point variable because it will be in the range 0.0 to 1.0 yet it has to be very very precise and if we used a single precision (32-bit) float then we’d lose detail and get visible banding on the terrain. There are ways and means of avoiding this but doubles are simple and they’re fast enough most of the time.
    • We take that height and add 1.0 to it, then we take this new value and use it to scale the vertex position from GetSpherePoint and that’s it, we have a terrain height that will be of at least unit length (1.0) up to a maximum of length 2.0 which would make for some very high mountains.
    • The second pass is now where the trouble starts but can be summed up with the word: “normals” – the calculation of which uses 4 vertex positions, at least two of those will be within this patch’s data, but the other two will be in our neighbouring patches. That’s why these for loops don’t cover the entire patch’s data. Instead they cover the central part; the rest of the data, the edges, will be calculated by the mess of edge copying methods I skipped describing above.
    • The colour is also calculated in this 2nd pass, it requires the height and the normals so it’s also done in the edge copying. You might notice if you’re reading through the code that the height is stored in the colour in pass 1, then used in pass 2. I covered why this was and why it was bad in Part 1 – as I say, this is a description of how it is, not how it should be!
  • Render – an actual blend of quadtree and patch all in one here. This is depth first traversal & rendering of a quadtree in action! Render MUST be called in the MAIN thread only because it’s dealing with OpenGL. First we see if we have any child nodes, if we do we just forward on the values that were passed into the function. That’s the depth first traversal part, because in a tree each node will do the same thing, over and over until it reaches a leaf node, the one without children. When we do, the rendering is straightforward:
    • Lock the node – so it can’t be updated in the LOD thread,
    • optionally update the vertex buffer object (VBO),
    • test to see if we’re visible and return without doing anything if we’re not,
    • push the current matrix and then translate the view according to the difference between the camera position and the clipCentroid (which is the centre of the four corner vertices),
      • this helps us deal with a problem called “jitter” (or “jittering”) caused by the GPU only using 32-bit floats which aren’t precise enough to represent the terrain. What we do is move the rendering of the patch in such a way that 32 bits are precise enough for us by offsetting it from the camera position. We’re effectively moving it closer to the camera before applying the position to avoid the jittering! Patches further away still jitter like crazy, but it doesn’t matter because they’re small and far away!
    • update some information we store about how many triangles we draw,
    • setup our buffers, there’s determineIndexBuffer from earlier,
    • do the actual drawing,
    • now release our buffers and pop the matrix so that this patch’s matrix offset doesn’t affect the next patches to be rendered.
    • and we’re done with Render :)
  • LODUpdate – This is called only from the LOD thread, the MAIN thread never goes near it, I’ll break it down just like Render above but this is where we decide to increase or decrease the detail of a GeoPatch by splitting or merging them:
    • Slightly different to the above we lock the “m_abortLock” to see if the thread is trying to quit – it helps us exit faster is the idea but it’s implementation specific, ignore it.
    • canSplit – leading question eh? We iterate through our neighbours and perform some tests like: Do we have them all? Are they less detailed than us or higher in the quadtree than us? If we pass that then we check that we’re not already too deep to split and finally we apply some maths to decide if our edge’s length is more than the distance from the camera to the centroid we calculated in GenerateMesh. This is a crude approximation of determining how big the patch will be on the player’s screen. You can find much better ones if you Google for geomorphing. There’s an additional check to see if we have a parent because someone decided we should always split in that instance… not sure why.
    • canMerge – apparently yes… I don’t know why this is here, we could avoid an if test later on, I hope the optimiser gets rid of it!
    • canSplit branch – if we really really canSplit then, in summary: we first see if we have children, if we do then we just pass on the call to LODUpdate; otherwise we create 4 child nodes, we set ourselves as their parent, we set up the relationship between each of them and the neighbours that we know about (if any), then we call GenerateMesh on each of them, we GenerateEdgeNormalsAndColors using all that complicated shit above, we UpdateVBOs which sets the flag that says that we really need to call _UpdateVBOs back in the MAIN thread, and finally we pass on the call to LODUpdate to our newly created child nodes.
    • canMerge branch – if we cannot split then we’ll go this way, always (stupid code). That doesn’t mean we can actually merge though, not if we have no child nodes. If we do then we’ll happily destroy them, and in their destructors they’ll destroy their kids and so on and so forth. When I first started reading this code it took me a while to realise why the tree isn’t unbalanced and constantly destroying and creating nodes. It’s because we split aggressively. We only take the merge branch when there’s no way of splitting.
    • A word on locking: in both the canSplit and canMerge branches we Lock but at different times. In canMerge land there’s not much to do, just destroy stuff, so we lock straight away. In canSplit land though we can delay the mutex locking until later: we allocate the new child nodes to temporaries, then call the expensive GenerateMesh method, and only once we’ve done that do we lock the mutex and copy the temporaries into the correct child node pointers. They have to be there to correctly notify their new neighbours that they exist and to generate the edge normals and colours. At least this way the mutex isn’t locked for as long, because that would delay the rendering call back in the MAIN thread if it was.
  • GeoPatch constructor/destructor – a quick note about these two, they generate some data used throughout the rest so are worth reading and keeping in mind. The destructor also does some of the edge friend management by letting its friends know when it’s destroyed but otherwise they’re pretty simple.
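Going back to determineIndexBuffer for a moment, the OR’ing trick is easier to see in code. This is a minimal sketch rather than Pioneer’s implementation: the Patch struct is a stand-in, and which bit belongs to which edge is purely my assumption here.

```cpp
#include <cassert>

// Hypothetical patch with up to four edge neighbours ("edge friends").
struct Patch {
    const Patch *edgeFriends[4];
};

// One bit per edge: set when a neighbour exists on that edge, so the
// edge can be drawn at high resolution. The result is always 0..15 and
// can be used directly as an index into 16 prebuilt index buffers.
inline int DetermineIndexBuffer(const Patch &p) {
    int index = 0;
    for (int edge = 0; edge < 4; ++edge) {
        if (p.edgeFriends[edge]) index |= (1 << edge);
    }
    assert(index >= 0 && index < 16);
    return index;
}
```

Four yes/no questions give 2^4 = 16 combinations, which is where the 16 index buffers come from.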

Wow, 3202 words in so far everyone… everyone? Anyone? Ahem, anyway.

The end?:

That’s it for the whistle-stop tour for now. I know I know, there’s still slightly more than a third of the CPP file that I haven’t even started to cover but that’s fine because I’m not going to. At all.

It’s mostly too specific to some of Pioneer’s quirks, some of it is just structural so it’s run once to set things up and then never used again, but mostly I’m ignoring it because it doesn’t deal with what you need to understand.

As I’ve described above the existing terrain generation is in two parts; one in the main thread with some setup and then the rendering, the second is in its own thread. The two cross over at places and prevent crashes and other problems by locking mutexes. This works but it causes performance problems of its own. There’s a great deal of data copying stuff that can be avoided by simply generating more than you need, that means you don’t need to track relationships so much and that simplifies the code massively. That might have performance issues of its own of course since you are doing extra work but it simplifies the code so very much and I’ve already covered the reasons for doing that in Part 1.
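To show what “generating more than you need” could look like, here’s a small sketch. It is only an illustration under some loud assumptions: the patch is a flat heightfield rather than a piece of a sphere, Height is a trivial stand-in for the expensive noise functions, and BuildNormals is a name I’ve made up. The idea is to generate one extra ring of height samples around the patch so that every visible vertex has all four of the samples its normal needs, with no neighbouring patch involved.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical stand-in for the expensive terrain noise function.
inline double Height(double x, double y) { return 0.5 * x + 0.25 * y; }

// Builds normals for an n*n patch by generating an (n+2)*(n+2) grid of
// heights: one extra ring of samples, so no neighbouring patch is needed.
inline std::vector<Vec3> BuildNormals(int n, double spacing) {
    assert(n > 0 && spacing > 0.0);
    const int w = n + 2;
    std::vector<double> h(w * w);
    for (int y = 0; y < w; ++y)
        for (int x = 0; x < w; ++x)
            h[y * w + x] = Height((x - 1) * spacing, (y - 1) * spacing);

    std::vector<Vec3> normals;
    normals.reserve(n * n);
    for (int y = 1; y <= n; ++y) {
        for (int x = 1; x <= n; ++x) {
            // Central differences using the four surrounding samples.
            const double dx = (h[y * w + x + 1] - h[y * w + x - 1]) / (2.0 * spacing);
            const double dy = (h[(y + 1) * w + x] - h[(y - 1) * w + x]) / (2.0 * spacing);
            const double len = std::sqrt(dx * dx + dy * dy + 1.0);
            normals.push_back(Vec3{-dx / len, -dy / len, 1.0 / len});
        }
    }
    return normals;
}
```

The cost is (n+2)^2 height samples instead of n^2, which is the extra work mentioned above, in exchange for each patch being self-contained.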

Next in the series I’ll probably cover some of the ways we generate the heights for the terrain… no, no actually I’m not doing that at all. Yikes that’s terrifying stuff. No instead I’ll discuss what I’ve been working on to make it take advantage of multithreading and many cores. In that article I’ll also cover a side project called GLSLPlanet which moves some of the work onto the GPU, and why I’m not doing that in Pioneer just yet even though it’s based on the same code I’ve just described.

See you next time,


Part 1: Pioneer’ing Terrain

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 17-04-2013

Yes yes I went for the pun-tastic title :)

This is going to be a bit more technical than usual, not massively, don’t stop reading for fear of equations or pages of code. No this is just going to be a technical description of some parts of Pioneer’s terrain rendering system because it’s quite odd, not great, but it’s effective enough. I’m basically going to brain dump a bit so I might come back and edit this in the future to clean it up, it will be something of a living document.

In the future I’ll refer back to this post to describe the ways that the terrain rendering and generation will be changing.

I should also point out that I’m not the original author of the terrain rendering or generation used in Pioneer. It’s just an area that I’ve been poking around in a lot lately and I have a lot of changes that I want to make to it. These changes should make generation of the terrain faster and more flexible, and add control that we currently lack. They should also accelerate rendering and use a lot less memory as well as opening up new visual and game effects in the distant future.

The basic idea:

The terrain itself is a form of Quadrilateralised Spherical Cube which is a fancy way of saying that you’ve got 6 flat planes oriented to face in the six outward directions of a cube, then you deform them to be a sphere. This causes some distortions but it also comes with a lot of rather useful benefits.
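Here’s a minimal sketch of one way to do that deformation: interpolate across a flat cube face, then normalise the result so it lands on the unit sphere. It follows the general idea rather than Pioneer’s exact code, and Vec3, Lerp, Normalised and SpherePoint are names I’ve made up for the example.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Linear interpolation between two points, t in 0..1.
inline Vec3 Lerp(const Vec3 &a, const Vec3 &b, double t) {
    return Vec3{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Scales a vector to unit length, which pushes it onto the unit sphere.
inline Vec3 Normalised(const Vec3 &v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0);
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Corners are given in order around the face: c[0] -> c[1] -> c[2] -> c[3].
// u and v each run from 0 to 1 across the face.
inline Vec3 SpherePoint(const Vec3 c[4], double u, double v) {
    const Vec3 bottom = Lerp(c[0], c[1], u);
    const Vec3 top = Lerp(c[3], c[2], u);
    return Normalised(Lerp(bottom, top, v));
}
```

The distortion mentioned above comes from the normalise step: points near the middle of a face move very little, while points near the corners get pulled in a lot, so an evenly spaced grid on the cube is not evenly spaced on the sphere.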

For a start we can use a quadtree to define the surface of each (originally) square face of the (former) cube. That’s handy because there are lots of simple terrain rendering algorithms that can use such an arrangement. In this case we’re using something similar to “Chunked LOD”. I say “similar” because the details differ, but you can read a better explanation on the “making worlds of sphere and cubes” blog post on Acko since it’s similar in concept.

So you know that:

  • we’re using 6 quadtrees, 1 per cube face.
  • each quadtree is deformed to fit onto a sphere.
  • that magically this is helpful…

Well when we want to render the terrain we start at the top of each of the quadtrees, i.e. the lowest detail level, and simply ask it if it has any children. Because it’s a quadtree it will have either 4 or none, hence the name “quad” in case you ignored that Wikipedia link. When we reach one of these quadtree nodes that has no children we render whatever geometry it has because that must by definition be the highest level-of-detail available. This is a very simple system, you repeat this for each of the faces of the cube/sphere and you’ve very quickly rendered a convincing sphere onscreen.

Implicit in the above statement however is the idea that some of the quadtree’s “nodes” won’t have children whereas others will. To decide this we pass the player’s camera position to an update method that goes through the quadtree, much like the rendering does, and at each node it does some maths to work out which of 3 states the node is in: good enough, should “split” and create some child nodes, or has child nodes it doesn’t need anymore and so can “merge” them.

  • Good enough: nothing happens, we have the perfect amount of detail. In the code we’ll take the “merge” branch but if we have no child node then nothing changes,
  • Split: Ah, this node’s not detailed enough so we can either create 4 child nodes and populate them with more detailed terrain, or if we already have them then we descend into those nodes and repeat the update process,
  • Merge: The flipside to being good enough is that we’re good enough and so if we have any child nodes then they’re superfluous and we can merge them, or in Pioneer’s case, just delete and erase them from existence.
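The three states above can be sketched as a tiny update function. This is a simplified illustration, not the real update code: the Node type sits on a flat plane instead of a sphere, kMaxDepth is an arbitrary limit I picked, and the split test (edge longer than the distance to the camera) is the crude one, with nothing smarter added.

```cpp
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical node on a flat plane (z = 0), to keep the maths short.
struct Node {
    double cx, cy;    // centre of this square patch
    double edgeLen;   // length of one edge
    int depth;
    std::unique_ptr<Node> kids[4];
    Node(double x, double y, double e, int d) : cx(x), cy(y), edgeLen(e), depth(d) {}
};

const int kMaxDepth = 3;  // assumed limit, not Pioneer's value

// Crude test: split while the edge is longer than the camera distance.
inline bool CanSplit(const Node &n, double camX, double camY, double camZ) {
    const double dx = camX - n.cx, dy = camY - n.cy;
    const double dist = std::sqrt(dx * dx + dy * dy + camZ * camZ);
    return n.depth < kMaxDepth && n.edgeLen > dist;
}

inline void Update(Node &n, double camX, double camY, double camZ) {
    assert(n.edgeLen > 0.0);
    if (CanSplit(n, camX, camY, camZ)) {
        if (!n.kids[0]) {
            const double q = n.edgeLen / 4.0;  // offset to each child's centre
            const double sx[4] = {-q, q, -q, q};
            const double sy[4] = {-q, -q, q, q};
            for (int i = 0; i < 4; ++i)
                n.kids[i].reset(new Node(n.cx + sx[i], n.cy + sy[i], n.edgeLen / 2.0, n.depth + 1));
        }
        for (auto &kid : n.kids) Update(*kid, camX, camY, camZ);
    } else {
        // "Merge" branch: also taken when we are simply good enough,
        // in which case there are no kids and nothing changes.
        for (auto &kid : n.kids) kid.reset();
    }
}

inline int CountLeaves(const Node &n) {
    if (!n.kids[0]) return 1;
    int leaves = 0;
    for (const auto &kid : n.kids) leaves += CountLeaves(*kid);
    return leaves;
}
```

Calling Update with a distant camera leaves a single root node, moving the camera close splits the nodes nearest to it, and moving away again deletes the children in one go.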

In the current Pioneer Alpha 33 this all happens in a single separate thread with a few locks/mutexes preventing rendering from happening at the same time as updating. This creates a lot of waiting around for either updating the quadtree or waiting for rendering to complete. It was the simplest way to improve things at the time it was created though because it meant that it didn’t stall the main thread whenever the, very expensive, terrain generation needed to be done. Doing it this way keeps everything looking like a single-threaded process and this is inherently simpler to write and understand. That means there should be fewer bugs and fewer problems.

Unfortunately Pioneer tries to do some odd things that exploit the fact it’s storing the terrain in a quadtree. This unfortunate stuff is done in the name of performance in that it tries to avoid regenerating some of the terrain data by keeping close track of who its neighbours are in the quadtree and then copying as much data as it can. All of this neighbour tracking and data copying is very expensive both in performance terms and complexity. Overall it was this that took me the longest to understand rather than the actual terrain generation or rendering parts!

This neighbourly management is one of the first things that is going to change, hopefully by the next release date.

You see the neighbouring node stuff is complex, it also imposes some form on the quadtree structure that we don’t need and that means things cannot be easily generated in little pieces or across many CPU cores. Ideally we’d like the update logic to happen, it decides to Split or Merge some nodes. Then it asks a system to go away and make it the data it needs. Those requests disappear off into the aether and a short while later new terrain node data returns ready to be put into the quadtree. Whether that’s done on a super-powerful GPU, farmed out across a distributed network, or done on a different core of the same CPU we don’t know or care. Removing the neighbour management, or at least the data copying portion, means breaking up the generation of these quadtree terrain nodes gets much simpler and easier to distribute across many CPU cores.
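One way to picture that “ask a system and get data back later” idea is with futures from the C++ standard library. This is just a sketch of the shape of it, not what Pioneer does: PatchRequest, PatchResult, GeneratePatch and RequestPatch are all hypothetical, and the generator fills in zeros where the real noise functions would run.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical request/result pair for one quadtree node's terrain data.
struct PatchRequest { int nodeId; int resolution; };
struct PatchResult { int nodeId; std::vector<double> heights; };

// Stand-in generator; a real one would run the expensive noise functions.
inline PatchResult GeneratePatch(PatchRequest req) {
    assert(req.resolution > 0);
    PatchResult out{req.nodeId, std::vector<double>(req.resolution * req.resolution, 0.0)};
    return out;
}

// The update logic only ever sees a future. Where the work actually runs
// is hidden behind this one function; here it is simply another thread.
inline std::future<PatchResult> RequestPatch(PatchRequest req) {
    return std::async(std::launch::async, GeneratePatch, req);
}
```

The update code can hold on to the future and only attach the new data to the quadtree once it’s ready. This only works cleanly if generating a patch needs nothing from its neighbours, which is the whole argument for dropping the data copying.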

Quadtree specifics and quick/easy changes:

A quadtree can be used to store any kind of information, it doesn’t even have to be spatial or geometric like ours is in Pioneer. However that is our use case and so we store a tonne of data in each of the quadtree’s nodes. Now by a tonne of stuff I really mean a metric-fucktonne of stuff! We store in terrifying order:

  • A ref counted pointer to the context data that holds the terrain generation methods,
  • the double precision vector of the corner vertices,
  • 3 * double precision vector arrays to the vertex, normal and colour data,
  • a GLuint for a node’s vertex buffer object number,
  • 4 pointers to its child nodes (it’s a quadtree after all),
  • another pointer to this node’s parent,
  • 4 more pointers to its possible neighbours (whether or not it has any),
  • a pointer to the parent object called a geosphere, which holds the top level quadtree roots,
  • double precision rough length of an edge (from corner to corner along an edge),
  • 2 * double precision vectors for the centroid and clipCentroid of the node used for updating and rendering calculations,
  • double clipRadius,
  • depth of the node within the quadtree,
  • a mutex for locking access to the child nodes during updating or rendering,
  • double “distMult” which is calculated and used in the constructor and NEVER AGAIN.

Ok so that did get a little non-geek hostile there (it’s about to get worse!) but what it boils down to is that there’s a lot of shit in every single node. Some of it is terrifying, absolutely pant wettingly funnily bad to have there. Let’s take the worst of the offenders, and before I get any comments pointing out the others, yes I am aware of all of them:

3 * double precision vector arrays to the vertex, normal and colour data – These are bad for a few reasons but the obvious ones are that even if we need to store this data then we’re storing it in the wrong format for AT LEAST two of the pieces of data:

  • Normals – these can be efficiently packed down into some very compact formats, in reality you could get down to a single 32-bit integer with two of the coordinates packed into 16-bits each and then recalculate the 3rd component when needed, probably in the vertex or pixel shader. In this case however we can quickly halve the memory usage by simply making it a float vector instead of a double.
  • Colours – oh-my-god, you don’t EVER need double precision for colour, in this case we have 192-bits for colour… 16-bits would probably be good enough, for perfect colour we might need 24-bits but NOT 192-bits. In this instance a single 32-bit integer is our best bet for quickly encoding the colour saving us a staggering 160-bits PER VERTEX and we’ve got 8-bits to spare if we need them for something (alpha channel?) in future.
  • So why was the colour using 64-bit vector3 representation? Because the heightmap generation was sneakily storing the height in the red channel during the terrain generation so that it could use it later…
  • …this was so it could retrieve the height when calculating the per vertex colour. Why not just store the height directly in another array? This adds back another 64-bits so we’ve only saved 96-bits per vertex.
  • Of course we’re only storing all of this temporarily anyway, until we can generate the new vertex buffer for the patch (nuuurgh, I’ll rant about this later) which then turns it into floating point vector data. Since this is the case however we can in fact discard the vertex position data completely, all 192-bits of it, and then rebuild it from the height data that we’re now storing instead.
  • Total so far: Normals = 50% = 192-bits to 96-bits, Colours = 1/6th = 192-bits to 32-bits, Vertex = 1/3rd = 192-bits to 64-bits, so 576-bits to 192-bits per vertex.
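To make the packing ideas in the list above a bit more concrete, here is a minimal sketch. The function names (`PackColour`, `PackNormalXY`, `UnpackNormalXY`) are made up for illustration and are not Pioneer's code. It also assumes the normal's third component is never negative (true in a patch-local frame where z points away from the surface), because only x and y are stored and the sign of z is lost.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: clamp a channel to [0, 1] and quantise it to 8 bits.
inline std::uint32_t ToByte(double c)
{
    const double clamped = std::min(1.0, std::max(0.0, c));
    return static_cast<std::uint32_t>(std::lround(clamped * 255.0));
}

// Pack an RGB colour into one 32-bit integer (R in the low byte),
// leaving the top 8 bits for alpha or anything else needed later.
inline std::uint32_t PackColour(double r, double g, double b, std::uint8_t a = 255)
{
    return ToByte(r) | (ToByte(g) << 8) | (ToByte(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Hypothetical helper: map [-1, 1] onto the full 16-bit unsigned range.
inline std::uint32_t ToU16(double c)
{
    const double clamped = std::min(1.0, std::max(-1.0, c));
    return static_cast<std::uint32_t>(std::lround((clamped * 0.5 + 0.5) * 65535.0));
}

// Store only x and y of a unit-length normal, 16 bits each.
inline std::uint32_t PackNormalXY(double x, double y)
{
    return ToU16(x) | (ToU16(y) << 16);
}

struct UnpackedNormal { float x, y, z; };

// Rebuild the normal; z comes from x*x + y*y + z*z = 1.
// ASSUMPTION: z >= 0, since its sign was never stored.
inline UnpackedNormal UnpackNormalXY(std::uint32_t packed)
{
    const float x = (packed & 0xFFFFu) / 65535.0f * 2.0f - 1.0f;
    const float y = (packed >> 16) / 65535.0f * 2.0f - 1.0f;
    const float z = std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
    return {x, y, z};
}
```

In practice the unpack step would more likely live in the vertex or pixel shader as mentioned above; it is written as plain C++ here only so the round trip is easy to check.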

That reduction has brought my peak memory consumption from ~700MiB when sat on the surface of the New Hope start position down to ~280MiB, and as you can see from the above there’s still more to be saved from a variety of areas. That’s because those arrays in particular are allocated to contain all of the visible data for each node. The rest is a bit nasty but it’s nothing in terms of memory usage by comparison because even with a few thousand nodes in the quadtree(s) they’re only a tiny fraction of all the data stored.
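The per-vertex arithmetic can be sanity checked with a couple of throwaway structs. This is only a sketch: the struct names are invented, and the real data lives in separate arrays rather than one struct per vertex, but the bit counts come out the same on mainstream compilers.

```cpp
#include <cstddef>
#include <cstdint>

struct Vec3d { double x, y, z; }; // 3 * 64 = 192 bits
struct Vec3f { float x, y, z; };  // 3 * 32 = 96 bits

// Hypothetical "before" layout: everything as double precision vectors.
struct VertexBefore {
    Vec3d position;
    Vec3d normal;
    Vec3d colour;
};

// Hypothetical "after" layout: height only, float normal, packed colour.
struct VertexAfter {
    double height;        // position is rebuilt from this later
    Vec3f normal;
    std::uint32_t colour; // 8 bits per channel plus 8 spare
};

template <typename T>
constexpr std::size_t BitsPerVertex()
{
    return sizeof(T) * 8;
}

// Bytes needed to hold vertexCount vertices of the given layout.
template <typename T>
constexpr std::size_t ArrayBytes(std::size_t vertexCount)
{
    return vertexCount * sizeof(T);
}

static_assert(sizeof(VertexBefore) == 72, "expected 576 bits per vertex");
static_assert(sizeof(VertexAfter) == 24, "expected 192 bits per vertex");
```

So those arrays shrink to a third of their old size. The overall process figure doesn't drop by quite that ratio because the arrays aren't the only thing in memory.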

Hmm, that’s gone a bit rambl-o-matic because I’m bloody tired.

Well maybe someone might find it interesting, I’ll continue with this stuff… soon! :)


Further in the series: Part 2 is up now.

Why no desktop 16-core CPU?

Posted by | Posted in Game Development, Pioneer | Posted on 06-03-2013

It’s a question I keep coming up against as I do multi-threading work: where are all of the desktop 16-core CPUs?

You can get what AMD call a 16-core CPU in the Opteron 6200 and 6300 series but they’re built on Bulldozer and Piledriver respectively, which are more like 1.5 cores per “dual core” module thanks to each pair of cores sharing their FPU capabilities. Or to put it another way, 16-core INTEGER and 8-core FLOATING POINT.

Not long ago we went from single-core to dual-core, to true dual-core (both cores on the same die), to quad-core… and then we stopped.

I guess the argument could be that it’s not worth it for most people? Or that hyperthreading gives you 8 hardware threads with up to a 30% performance boost if you can use them well.

This all misses the point though and I’d have been quite happy if AMD had continued adding cores to its K10 architecture, keeping up with the process node advances (die shrinking), updating, optimising and just piling on the cores. They have done that to some degree because K10h, as used in the Phenom 2, did make it into the early APUs in a low power die-shrunk version. It lacked any kind of L3 cache, though it did have some extensions, updates and improvements so that it just about holds its own against a Phenom 2 with a similar number of cores.

Those chips were APUs though, with an on-chip GPU for mobile use. So they clocked slower and fully 50% of the die was spent on the GPU. You could get versions without the GPU called Athlon II but they still lacked the L3 cache and the GPU was there, just disabled and powered off. There was no 8-core Athlon II with an L3, even a small one, even though it was probably possible.

We’re down at 22nm + 3D transistors with Intel CPUs whilst AMD are still rolling out on 32nm but are we really stuck at 4-cores?

No, there are higher core counts as you can see but they’re for servers rather than our mere mortal desktop machines. So the work isn’t going into getting more performance out of heavily threading things, instead it’s in GPGPU languages like CUDA, DirectCompute and OpenCL, utilising the GPU to do work you’d normally have just hammered out on the CPU. There’s real benefit to doing things on your GPU, and eventually I think AMD’s “APU” strategy might pay off if they can reduce the latency between the CPU<->GPU for compute languages for example, but traditional multi-threading seems to have been ignored. It’s not even an option to get more cores on a desktop CPU and I think that’s a shame as there are a lot of workloads that will happily scale to 16 or 32 threads without the need to move them to the GPU.
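As a rough illustration of the kind of workload I mean, here is a sketch of splitting an embarrassingly parallel loop across however many hardware threads the machine reports. `ParallelFor` is a name made up for this example, it isn't Pioneer's job system or any library's API, and it assumes each chunk of work is independent of the others.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Splits [0, itemCount) into contiguous chunks and runs work(begin, end)
// on one thread per chunk. Pass threadCount = 0 to use whatever the
// hardware reports (falling back to 1 if it reports nothing).
// ASSUMPTION: chunks don't touch each other's data, so no locking is needed.
inline void ParallelFor(std::size_t itemCount, unsigned threadCount,
                        const std::function<void(std::size_t, std::size_t)> &work)
{
    if (threadCount == 0)
        threadCount = std::max(1u, std::thread::hardware_concurrency());

    // Round up so every item lands in some chunk.
    const std::size_t chunk = (itemCount + threadCount - 1) / threadCount;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(itemCount, t * chunk);
        const std::size_t end = std::min(itemCount, begin + chunk);
        if (begin == end)
            break; // fewer items than threads, nothing left to hand out
        threads.emplace_back(work, begin, end);
    }

    for (std::thread &th : threads)
        th.join();
}
```

Spawning threads per call like this is the crude version; a real system would keep a pool of workers around. The point is only that code shaped like this gets faster with every core you give it, right up until memory bandwidth gets in the way.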

It would have been interesting if AMD had chosen to do it, to keep scaling the cores at least as an option, but then I think there’s a metric fucktonne of things that AMD could have done to stay relevant which they’ve manifestly failed to do, so here’s a short list:

  • big.LITTLE – as in the ARM design where you have a group (4 typically) of large, fast, powerful CPUs and then you have one tiny little low power CPU that is used when the workload gets light to save power.
  • Unlocked multipliers and clocks on CPUs sold with a very limited warranty at the same price as the regular locked one – the Black Edition chips are good but not enough.
  • Change the chip packaging format as Intel and IBM have both separately proposed for better thermal, power and mounting design – if you’re in 2nd place you innovate to survive.
  • Speaking of IBM – form an alliance, use their resources and manufacturing to get access to better process node shrinks.
  • …and of Process Node shrinks – you can’t compete with Intel on them but as soon as AMD were free of GlobalFoundries why didn’t they run off to TSMC (or IBM) and aggressively chase what was available?

There must be a tonne more as well besides, but in short I miss the AMD of old. We were speaking about it at work the other day and it always produces the same head-shaking response where you just can’t believe that today’s AMD is the same one that gave us the K6 and K8 architectures. The AMD that bludgeoned Intel’s Pentium 4 misstep into a cocked hat and happily overclocked its pants off.

Today’s AMD seems to be one that releases products based on “vision” rather than “getting-the-fucking-job-done-well”. I’d even settle for: “getting the job done well enough” and do it with more cores but instead if you buy AMD now you’re probably getting a Piledriver chip at the high-end which is finally a little bit faster in some situations than the k10h architecture… they could have done an 8-core k10h in approximately the same die area but minus the GPU and I’d rather have had that because all my GPU needs are serviced by a separate card in a PCI-e slot.

I’d buy that chip, fuck I’d buy a 16-core version without hesitation. A die shrunk 8/16-core Phenom, L3 cache intact, the improvements they rolled into the Stars (k10h successor) architecture, no GPU, on a quad channel memory bus and clocked at about 3GHz. That could pummel my Core i7 into the floor with the loads I have in mind.

It’s not to be though, that AMD is dead and for some reason so are desktop 8/16-core CPUs it seems.

What I do when I’m programming.

Posted by | Posted in Game Development, Life, Lua, Pioneer | Posted on 02-02-2013

Today, as with the last 6 weekends in a fucking row, I am ill, whoop, and it being the weekend the doctors are off. It’s not an “emergency” so I can’t get seen until Monday, by which time I will be ok again and they won’t know what to do… fucking fabulous.

Instead of posting about that, or how you can’t buy a laptop with a decent resolution screen these days unless it’s made by Apple, I’ve decided to describe how I approach programming a feature in Pioneer. I happen to be working on a relatively bite size one at the moment so it’s fair game.

Read the rest of this entry »

Ah Lua, how do I loathe thee… I mean Love, yeah Love…

Posted by | Posted in Game Development, Lua, Pioneer | Posted on 25-11-2012

I’ve never understood the love given to scripting languages embedded in a game engine.

I’m going to take Lua in Pioneer, or in anything else for that matter, but it’s Pioneer’s that sparked this off. You have a system written in C++, you expose it to Lua with C++ side functions that get presented to Lua scripts, you then program in Lua.

You are still programming, it’s just another programming language. Lua is not King, neither is C++, they’re both just programming languages.

Now, inevitably, the next step occurs: Everything has to be done in Lua.

What was a convenience, or a way of rapid prototyping, or a way of scripting light data handling routines, or for displaying data in a GUI is now doing heavy lifting in the engine at about 1/35th to 1/50th the speed it was being done in the traditionally compiled code.

Of course by this time only experienced programmers can actually write or modify the scripts because to make Lua useful you’ve extended it with home grown libraries & since the purpose of Lua is usually to make designers and non-coders lives easier it has fundamentally failed in this regard by this stage.

Whole systems are exposed from C++ meaning that you’re maintaining code twice, except that you’ve exposed the worst bits of C++ via the woolly, type-unsafe Lua where the most advanced editor has all the sophistication of “Notepad.exe”.

Lua is not king, Lua quickly becomes a ball ache most of the time because it grows out of its usefulness, rapidly doubles the amount of work required to maintain engines, and slaughters anywhere it’s used in a performance critical subsystem.

I say this as someone who has programmed using it at several companies and Loves Lua for scripting. I just don’t think it’s anything other than a helper and best if it’s regularly pruned to reduce what it’s used for.

Some things should be moved out of Lua in Pioneer entirely and into some form of structured data generated by a tool. All the LMR stuff is obvious, ship definitions, spacestation configuration info, and ANYTHING to do with vectors/matrices/quaternions.

Other stuff is perfect Lua fodder: missions, trade pricing, defining factions, the GUI and probably a few others.

It’s just so annoying writing something in one language, then everyone wanting it in Lua too. Fuck off. It’s written already. Why have it in yet another language? It’ll be doing the same thing! Only then it’ll be in a language that I can muddle by in compared to C/C++ which I’ve been doing for 18 years (33 now, 15 when I started). What bloody good will that do? Will it mean more people can use it? No. There’s already a load of people who can write in C++ on the project who don’t know/use Lua. If anything it will reduce the number of people who can use it to only those who know/use Lua!

Does anyone really think that something has been done in Lua that couldn’t have been done in the C++ side? No. It does mean however that there’s a shitload of C++ code, then a shitload of C++ interface code, and then a shitload of Lua code to make the C++ do what would have taken at least one shitload less of interface code to just do directly in C++.

You know what? If you find yourself embedding Lua to make your life easier and to get away from C++ then Lua isn’t the answer.

C# is.

September, still unemployed but happy update!

Posted by | Posted in Game Development, Life | Posted on 03-09-2012

So it’s been 2 months since I finished my last contract, well, 2 months and 3 days. It’s also now 3 months (+3 days) since I was last paid! Neither of these things is great news but I’m doing surprisingly well right now.

A quick recap just to get my brain in order:

*Sept->Dec (2011)* – Sony London (Soho) studio, worked on the Harry Potter Book of Spells using their new WonderBook platform, top secret at the time and I can still only mention that I worked on it. That was ok, frustrating at times as they were really hunting around to find the best bits so a lot of stuff fell on the cutting room floor. At the end they offered me another contract on more money beginning when I returned from sailing.

*Dec->Dec (2011)* – Sailed across the Atlantic with my Dad in a slightly broken boat! You can see the photos if you look in my photo archives. Scary, a bit dangerous but not too much – certainly seemed and felt dangerous at the time in places. Good experience, I can’t believe this is already 9 months ago.

*Jan (2012)* – Sony didn’t contact me before/over Xmas – total panic as I needed a job and thought I had everything lined up. Thankfully Rik had been trying to contact me about doing some mobile development for Android & iOS for his new company AppCrowd. This was interesting, I’d be able to work from home a lot and the pay was good enough, nowhere near Sony but not in London and working from home lots meant it was about equal. I took the job with AppCrowd.
Of course Sony contacted me a couple of days later and it was all down to a slight snafu with HR etc, it’s always HR that screw things up, but I’d made my mind up and stuck with the AppCrowd decision.

*Jan->Jun 29th (2012)* – The AppCrowd contract came to an end after a 1 month extension. In some ways I was glad because the publisher we were working for were becoming a real nightmare. It was like they simply had no clue about how to do game development despite being in the business for years before. I think they were just too used to being able to make big changes very easily in the older J2ME games. When the games get bigger and the resources required balloon you need to plan out those changes before work gets done on them or you end up paying multiple times for a single task.

*Jun->now (2012)* – the publisher decided to withhold payment for work done so they could negotiate a better deal, that meant AppCrowd don’t have the money to pay me for the extra month of work I did in June…

So as you can imagine things are a bit tight money wise right now but they couldn’t be better on other fronts.

Danni is awesome, we went on holiday for a few days and did lots of Shakespeare related stuff (saw a play, visited museums etc), then she was away for 2 weeks travelling around Europe but I got her back this weekend :) she’s gone back to work (teaching) today which is a bit of a bummer as I’d have liked to have her around for a bit longer all to myself!

I have a new niece (see my photos) called “Poppy Isabella Lydiatt” who is, as babies often are, quite adorable. My other niece Amelia Joy is being a good older sister, she especially likes the fact that all of the baby’s things are pink because this means that she can be the boy and have everything blue… I’m buying her a Transformers toy :)

Also I’m actually getting interested in programming again rather than just doing it because I feel I must to keep my skills in shape, or for programming tests or indeed for work. Mostly this has meant doing things to which I’ve just been submitting various little patches covering some basic icon scaling and keeping the VC2010 project compiling etc. However I have also been working on the Factions code that I started months ago and it’s coming along nicely now.
In other coding news I have been working on the Syndicate level viewer and adding pathfinding to it and the GWEN UI etc but that’s gone on hold once more :)

…so there, end of braindump!

Of ripped assets and other painful metaphors

Posted by | Posted in Game Development | Posted on 07-08-2012

Time to speak of work and I haven’t been slacking off or wasting my time!

Last night I finally integrated the Gwen UI system with “the project” so it’s now even getting a bit of polish and usable UI rather than bizarrely assigned keypresses. There are some notes that I’d make about Gwen before you pick it as your UI of choice.

  1. Firstly I’m using it with SFML 2.0 pulled direct from the SFML git repository. That means I’ve had to make some changes just to get it to compile with it. These are documented on a number of different sites so I’ll probably upload a patch somewhere and try to document them on here in a separate post later.
  2. The Gwen solution for VS2010 is a classic example of an uber-solution. All of the projects are inside it for all possible libraries that it can build so if you don’t have Allegro or DirectX 2D available or in your global paths then you’re going to get lots of compile errors. A better approach might have been to have a number of different projects for each implementation but never mind.
  3. This is very much an ingame pseudo-serious UI. It’s not for a HUD – though I’m sure I could twist it to do so, it’s providing MFC-alike buttons, text windows, scroll bars and things like that. In this regard it’s exactly what I’m looking for and my impression of its abilities will probably change as I learn to use it.

So far it all looks very simple but powerful and I wish I’d actually gone through those minor hassles before now. I can think of a few older projects I’ll be integrating it with post-haste because they’ll look a lot more professional once I have!

Speaking of “the project” this is the one I’ve been reluctant to show screenshots of because I’m currently using ripped isometric graphics from a rather popular isometric Bullfrog game. It’s silly of me because eventually I think only the presentation will be isometric, the graphics themselves will be 3D assets, but I’m having to learn a lot of things that people back in the Amiga days just seemed to be aware of. I may be trivialising the work of those pioneers with that sentence but I don’t half feel stupid trying to work some of it out.

Ideally there’d be some set of these 3D assets lying around, I could integrate them now and modify the renderer to get everything displaying again. It’s a lot of work to look like you’re standing still but it’s a necessary part of the handover process from pseudo 2.5D to true 3D assets. Nothing is ideal though and my art skills have never been anything more than complete cack, therefore I’m stuck with 2.5D and ripped assets for the time being. I can still get on with the system, audio and gameplay though, oh and the ingame editor UI which is what the Gwen integration is for.

Broomy was talking seriously about quitting and going indie but needing a year long plan, so we spent most of an evening working out whether we can take “the project” further than my whimsical hacking. He’s actually more clued up about the whole thing than I am and would prefer to start smaller or use middleware like Unity or UDK which is very sensible but does have a learning curve we’d both have to get over. There’s also the issue of what else to make, we’re both talented enough to make some pretty cool little games, it’s more a question of: “What cool little game can two guys make in less than 3 months to start earning a tiny bit of money?” – this is actually quite trivial to answer if you consider the available markets, i.e: PC, Android or iOS. That answer is, none. Or less pessimistically: If 10,000 two man teams made a game in less than 3 months only one of those teams would break even or better.

Currently I find myself in the position of trying this by default since I don’t have a job, go me.

I’ve worked on and off on “the project” since January so it’s not exactly new but finding time and sustaining motivation is really tricky. It began as nothing more than a Syndicate level viewer. I just wanted to scroll around the maps a bit for nostalgia. Then I started exporting them in a more flexible and editable format, then came the different palettes, then dissecting the level construction, then the level collision etc. Now it’s becoming… well “the project”. The Syndicate-level-viewer origins became a bit of a hindrance a little while ago but only in a few minor ways, that’ll all fade away soon. Mostly though what I’m doing is just fun now, something I enjoy going back to work on each day.

Working on this has another benefit, it’s reminded me that I need to focus more, to work on just one thing at a time: Do one thing at a time and do it well.

My weak C++ is overkill.

Posted by | Posted in Game Development, Life | Posted on 30-07-2012

In the space of the last two weeks I’ve done a few programming tests, travelled down to Oxford for an interview + test and failed multiple times.

The feedback has been consistently inconsistent. Having failed at one test due to my approach being “overkill” and focusing too much on the technical, I decided that it needed a bit more work and implemented another approach that was a little more bare bones. Then when the Oxford based studio wanted an example of some code I reused that “overkill” code for this purpose. This time apparently my C++ was “too weak”.

At this point, and over £90 down on travel expenses which will never be reimbursed, I’m feeling a little out of sorts with the whole process. Previously I’ve been head hunted by other companies, I’m still one of the first people that our old CEO approaches when he needs a coder and whilst everyone has something they don’t like about the way I code they still tell me that they’d happily work with me again.

So what’s going on?

Maybe I just don’t fit that eminently employable mold that everyone seems to be getting squeezed into lately? No, well “yes” but it’s not quite that simple. I don’t have a great range of demos to show people, or a large volume of finished projects. Mostly my spare time coding is learning about a single thing that doesn’t really add up to what you’d call “a demo”. The titles I’ve worked on are usually my demos but recently EVERYONE has insisted on seeing production quality code. By “production quality” I think they’re actually meaning some kind of aspirational coding quality that I’ve yet to see in real game code but we all know what they want anyway. I don’t have the code for a lot of projects, and even if I did it’s under NDA as far as I’m concerned. I don’t go around showing people the code from other companies’ projects because it’s not mine to show.

Of course there’s always the “coding test”; the ultimate independent arbiter of a programmer’s ability! There’s no better test than seeing how they solve the old point-is-inside-a-polygon with pen and paper to really tell you what kind of a programmer is sat before you! Maybe instead it’ll be something almost 20 or 30 lines long with a couple of functions just to see how they cope, or my new personal favourite; the “whilst skiing with your new co-workers which route do you take” semi-psychological question. Yay! Shame that “bury their corpses in the mountain snow for giving me this stupid test” wasn’t one of the possible answers.

What a fucking ball ache, plus a massive waste of time. They’re like any bloody test, they tell you how good the person sat in front of you is at that test and that test only. Not what their coding style is, not how quickly or well they can change that style to match your company’s Byzantine preference, it doesn’t tell you how they learn, adapt or take criticism of the approach they’ve used. It doesn’t help you see why they have those recommendations on LinkedIn (Did they ask for them or were they given? Who are those people to you, friends or just co-workers?), what they’d like to learn more about or where they’re weak.

My personal experience is that I seem to be getting filtered out at a lot of these tests, even for companies where their whole game is less complicated than a single feature I’ve worked on for other titles. Apparently all those years of experience don’t matter because they don’t like the style I used to answer an arbitrary test question on a sub-subject that I haven’t needed to look at since the second year of University over 10 years ago… I still answered it and in the last interview I even got praised that I’d taken the correct approach to solving it!

What seems to happen is that I fall foul of these tiny tests that stretch some irrelevant scrap of knowledge or practice and that’s it, test over, interview failed. For the bigger tests, the tests you can do sat at home, I’m either going to too much effort (wtf!?!) or I’m just not hearing back from places, at all.

These places aren’t Valve, Sony, Microsoft, BioWare etc. No, I’m falling flat on my face with over a decade and a half of programming practice and 9 years in the Games Industry for companies making mobile phone games who have development teams of over 140 people. We did MotoGP 10/11 for Xbox360 and PS3 (and an internal PC version) with less than 50 and I wrote major pieces of core functionality and gameplay for those damned games.

Does that make me the best coder I know? Good grief no, I’m average, sometimes I’m better than the next guy, sometimes I’m worse, frequently it depends on the task at hand. If you want to wait on finding that super-coder-from-the-year-3001 then just say so but don’t expect to hire him for as little as you’re offering me.

The upshot?

I’m tired, I feel a little beaten up by this application and interviewing process. I wish I could hit pause, get a cuppa, tell everyone to fuck off… and quietly turn 33 years old on Saturday 4th Aug (I’d like a career change this year please!), returning renewed, ready … and tell everyone to _really_ fuck off because I’d rather go Indie than work for most of these places that I’m applying to. Sadly this on-again/off-again relationship I’ve had with contracts and work post-Monumental-Games-Ltd has meant that I have absolutely nothing left of savings, and since I haven’t even been paid for my last bit of contracting(!) I am pretty screwed this month too.

This isn’t a very satisfactory ending to this post because this isn’t some story with a conclusion, this is just my life recently.