Part 2: Now with… no feeling in my arms due to all the typing!

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 20-04-2013

I should have made this clear from the outset but these posts aren’t a description of the “best way” to do anything, they’re more akin to documenting how we currently ARE doing things.

There’s a great deal of information out there on how these things can work but with Pioneer you can grab the source code and actually see it in progress. There’s great value in getting hold of something, testing it, debugging it, making changes and watching it blow up in your face ;)

Previously:

In Part 1 I gave a very basic description but at no point did I attempt to explain things at the code level… and I’m not going to get too close to it during this series but I’ve got to get at least a little more familiar because some parts just don’t make any sense at first.

What does this do and why does it do it?:

One of the first things you see with the “GeoSphere” code leading up to, and including, the Alpha 33 release is that it’s entirely contained in just two files which are several thousand lines long and full of some nasty complex looking code. This is an unfortunate side effect of the way it’s evolved. At one point it might have been reasonably clean but as with all things they get added too and so the complexity grows along with the sheer amount of code in a single file. Eventually you just have to prune it back but with these situations you first have to understand what’s going on before you can do that pruning and splitting up of files. The complexity means that no-one understands the system and deciphering it becomes the large part of the burden so the situation is rarely resolved.

So lets tackle the big pieces of the complexity first.

GeoSphere:

The GeoSphere class is the main unit the rest of the code will interact with, it’s the part visible to the outside. It contains instances of the other main classes and does some general orchestration of them. It can be thought of as being split into two main parts however due to it’s use to static methods and members:

  1. static methods & members that affect ALL GeoSphere instances.
  2. non-static methods & member that are about a specific instance of GeoSphere.

The static part handles the creation and initialisation of stuff that affects all of the GeoSphere instances, the GeoPatchContext which holds information about the size and detail of the GeoPatch’s for example. It also kicks off the thread which manages updating the level-of-detail calculations I mention in Part 1.

The per-instance stuff is fairly basic:

  • m_terrain pointer, which is a “Terrain” pointer that will handle all of the heavy lifting logic for generating height and colour information for the patches.
  • 6 GeoPatch pointers, these are the 6 faces of the cube we will turn into a sphere, these look after themselves most of the time, we just have to call methods(/function) one them.
  • Constructor, destructor, Render, GetHeight, GetColor and BuildFirstPatches methods.

GetHeight and GetColor are simply helper methods, they just pass some information into a method of m_terrain and return the data, nothing more. Likewise the constructor and destructor are just setup/cleanup so lets ignore them.

The two methods of interest here are Render and BuildFirstPatches.

The Render method is called from the main thread as all of our rendering has to happen there currently. However, Render does more than just render, yeah I know, horrible eh but it’s what we’ve got.

At the top of the method it mostly does some setup of materials, one’s subsequently used by ALL of the patches that will be drawn. Then we get to a call to BuildFirstPatches. This is checked every single time we call Render but it only does anything on the first call and it’s lazy evaluation done lazily. It could be quite convoluted to create the GeoSphere and then immediately call BuildFirstPatches so instead of resolving this complexity it just checks to see if the root node of the patches have been built every frame and if it hasn’t it calls this method. The most stars/planets/moons I’ve yet to see in a system was about ~50 so it’s not too much overhead but it’s still icky.

After this check we configure basic lighting and then we start calling the Render method on each of the root GeoPatch instances (the ones created by BuildFirstPatches). I’ll describe them more later but they essentially do the quadtree traversal finding the nodes worth rendering.

Once we’re finished there we release out materials and then do some nasty management of the multithreaded updating logic, and finally we store the current camera position that will be used in the updating and level-of-detail (LOD) calculating thread. This is actually another reason why BuildFirstPatches is done lazily within the Render call, it’s because you can’t properly update the LOD until you have a camera position, but you don’t have one until you’ve tried to render. By putting the BuildFirstPatches call in Render you almost guarantee that you will have a valid camera position by the time that the LOD thread gets around to updating your GeoSphere. Hacky but it’s worked all this time.

GeoPatch:

If GeoSphere were the potatoes then this is the meat. GeoPatchContext might take up the top 1/3rd of the file but that stuff is actually fluff, it’s bulky but it’s not doing a lot of active logic, just setting things up for GeoPatch to use later, and they’re mostly rendering related but they’re not important until we get to GeoPatch rendering.

GeoPatch is defined entirely within the CPP file for GeoSphere, I don’t mind this technique overly much because C++ can be rather limited when you’re trying to hide an implementation but this is crazy. The GeoSphere implementation is only ~500 lines of code, and it starts at line 1053! GeoPatch on the other hand takes 670 lines in the middle of the file. Ugly and confusing for people new too it. Also confusing is that the quadtree, patch data, and some threading handling information are all muddled together.

We have as the 3 parts that should be more distinct:

  1. kids (*4), parent and edge friends (*4) – this is the quadtree information.
  2. m_kidsLock pointer to a mutex – this is the threading rubbish, it’s used to stop you from updating the patch when you’re trying to render it and vice versa.
  3. everything else is the patch data or at least can be considered that way.

To add to this confusing class data layout is that some of it will be called and used in both the main thread and the LOD update thread with access gated by the m_kidsLock mutex. Threading is almost never simple but this lack of clarity makes it hard to understand both when and where some members will be updated.

Next we have a bunch of functions that stump most people, I’ll list them and then explain their purpose:

  • GetEdgeMinusOneVerticesFlipped
  • GetEdgeIdxOf
  • FixEdgeNormals
  • GetChildIdx
  • FixEdgeFromParentInterpolated
  • MakeCornerNormal (templated for extra fun)
  • FixCornerNormalsByEdge
  • GenerateEdgeNormalsAndColors
  • OnEdgeFriendChanged (needed)
  • NotifyEdgeFriendSplit (needed)
  • NotifyEdgeFriendDeleted (needed)
  • GetEdgeFriendForKid (needed)

All of these serve one purpose, to confuse the hell out of everyone reading the code… no ok that’s being mean, the last 4 really are useful… still mean? Yeah a bit, I’ll have to explain.

Generating terrain can be a computationally expensive process. The way we do it will be a couple of future articles but the very very short version is that we run an expensive set of mathematical noise functions many many times for every single vertex that we want to generate the position for. Doing this for a single vertex is expensive but we do it 10’s of thousands of times even for a low detail planet. Most of the code, and ALL of that complexity above, is an attempt to avoid generating as much of it by copying it from other patches that have already been generated.

The reason that the last four methods are, mostly, needed is because they’re both the way that we keep track of who our neighouring nodes are and how we kick off the data copying process.

The thing to take from this is not that these are a good idea, and I’m not going to explain how they work either, the important thing is that they’re OPTIONAL. They’re costly too, a lot of time is spent maintaining all of this and all of the data copying is pretty bad for performance for other reasons. Not least that it often just has to generate the data anyway because a neighbour can’t be found at the correct level. They also make it much harder to make everything into smaller tasks that can better use multi-core CPUs. All things considered I’d rather not have them.

What all this mean is that GeoPatch has that lot, and only another handful of interesting methods which I’m going to go into a little detail with now:

  • determineIndexBuffer – I will forever regret that this doesn’t match our method naming convention, i.e: that lowercase ‘d’. Ignoring that though; this is what uses the result of the method “Init” from GeoPatchContext, most of it anyway. “Init” went along for many lines of code calculating 16 index buffers, it was a lot of code and earlier I said it didn’t do very much. Well in truth what it did was set up 16 index buffers so that here, in this method, we could do a very fast and simple bit of OR’ing and come up with an index into that list of 16 index buffers. Specifically we take our edgeFriends and we see which ones of them we actually have, and we find the perfect index buffer that uses low-resolution edges when we don’t have a neighbour on that edge, and hi-resolution edges when we do. There are 16 possible combinations, although in truth we only ever have a subset of those used on our terrain. It’s quick and easy to calculate them though, and this is fast way to index those.
  • UpdateVBOs & _UpdateVBOs – even I’m not entirely sure where these two are called from, which thread and under what circumstances. This is because of those many edge copying methods, I suspect that UpdateVBOs can be called from either depending on circumstances, but it’s usually called from the LOD thread. _UpdateVBOs on the other hand should only EVER be called from the MAIN thread as it interacts with OpenGL by updating or creating the vertex buffer object (VBO) that is used to render the patch on screen so if it’s ever called from another thread it will almost certainly crash, or worse it’ll keep running and fail silently :(
  • GetSpherePoint – gets a point on a sphere, specifically it performs a bilinear interpolation between the four corner points of a patch. At the lowest detail level then, that means the corners of a cubes face, i.e: this turns our cube into a sphere – or one faces of it into a curved patch on the surface of the sphere – part of the magic happens here. Once it’s done the bilinear bit it “normalises” it, turns it into a vector of unit length (Off topic; I hate wikipedias vector descriptions, they penalise those striving to learn some maths and strokes the egos of those who already do) so that when you sum the parts of the vector they equal 1. GenerateMesh makes heavy use of GetSpherePoint because it creates the patch vertices.
  • GenerateMesh – ignore the centroid for now, focus on the first couple of nested for loops because they generated ALL of the heights for every vertex of a patch. This *IS* the terrain creation right here in two for loops.
    • The call to GetHeight is the expensive call to the generation process itself but it’s interchangable with any other. We have many versions of this call they all do things using different maths and could even use other ways of generating terrain so we’ll deal with them another day. No the important bit is that if you strip all of GeoPatch down to it’s absolute bare minimum then this function and GetSpherePoint is almost all you’d have left with just these two for loops.
    • We get the height using a double precision floating point variable because it will be in the range 0.0 to 1.0 yet it has to be very very precise and if we used a single precision (32-bit) float then we’d lose detail and get visible banding on the terrain. There are ways and means of avoiding this but doubles are simple and they’re fast enough most of the time.
    • We take that height and add 1.0 to it, then we take this new value and use it to scale the vertex position from GetSpherePoint and that’s it, we have a terrain height that will be of at least unit length (1.0) up to a maximum of length 2.0 which would make for some very high mountains.
    • The second pass is now where the trouble starts but can be summed up with the word: “normals” – the calculation of which uses 4 vertex positions, at least two of those will be within this patches data, but the other two will be in our neighbouring patches. That’s why these for loops don’t cover the entire patches data. Instead they cover the central part, the rest of the data, the edges, will be calculated by the mess of edge copying methods I skipped describing above.
    • The colour is also calculated in this 2nd pass, it requires the height and the normals so it’s also done in the edge copying. You might notice if you’re reading through the code that the height is stored in the colour in pass 1, then used in pass 2. I covered why this was and why it was bad in Part 1 – as I say, this is a description of how it is, not how it should be!
  • Render – an actual blend of quadtree and patch all in one here. This is depth first traversal & rendering of a quadtree in action! Render MUST be called in the MAIN thread only because it’s dealing with OpenGL. First we see if we have any child nodes, if we do we just forward on the values that we passed into the function. That’s the depth first traversal part, because in a tree each node will do the same thing, over and over until it reaches a leaf node, the one without children. When we do the rendering is straight forward:
    • Lock the node – so it can’t be updated in the LOD thread,
    • optionally update the vertex buffer object (VBO),
    • test to see if we’re visible and return without doing anything if we’re not,
    • push the current matrix and then translate the view according the difference between the camera position and the clipCentroid (which is the centre of the four corner vertices),
      • this helps us deal with a problem called “jitter” (/”jittering”) caused by the GPU only using 32-bit floats which aren’t precise enough to represent the terrain. What we do is move the rendering of the patch in such a way that 32-bit’s are precise enough for us by offsetting it from the camera position. We’re effectively moving it closer to the camera before applying the position to avoid the jittering! Patches further away still jitter like crazy, but it doesn’t matter because they’re small and far away!
    • update some information we store about how many triangle we draw,
    • setup our buffers, there’s determineIndexBuffer from earlier,
    • do the actual drawing,
    • now release our buffers and pop the matrix so that this patches matrix offset doesn’t affect the next patches to be rendered.
    • and we’re done with Render :)
  • LODUpdate – This is called only from the LOD thread, the MAIN thread never goes near it, I’ll break it down just like Render above but this is where we decide to increase or decrease the detail of a GeoPatch by splitting or merging them:
    • Slightly different to the above we lock the “m_abortLock” to see if the thread is trying to quit – it helps us exit faster is the idea but it’s implementation specific, ignore it.
    • canSplit – leading question eh? We iterate through our neighbours and perform some tests like: Do we have them all? Are they less detailed than us or higher in the quadtree than us? If we pass that then we check that we’re not already too deep to split and finally we apply some maths to decide if our edges length is more than the distance from the camera to the centroid we calculated in GenerateMesh. This is a crude approximation of determining how big the patch will be on the players screen. You can find much better ones if you Google for geomorphing. There’s an additional check to see if we have a parent because someone decided we should always split in that instance… not sure why.
    • canMerge – apparently yes… I don’t know why this is here, we could avoid an if test later on, I hope the optimiser gets rid of it!
    • canSplit branch – if we really really canSplit then, in summary: we first see if we have children, if we then we just pass on the call to LODUpdate otherwise; we create 4 child nodes, we set ourselves as their parent, we setup the relationship between each of them and the neighbours that we know about (if any) then we call GenerateMesh on each of them, we GenerateEdgeNormalsAndColors using all that complicated shit above, we UpdateVBOs which sets the flag that says that we really need to call _UpdateVBOs back in the MAIN thread and finally we pass on the call to LODUpdate to our newly created child nodes.
    • canMerge branch – if we cannot split then we’ll go this way, always (stupid code). That doesn’t mean we can actually merge though, not if have no child nodes. If we do then we’ll happily destroy them, and in their destructors they’ll destroy their kids and so on and so forth. When I first starting reading this code it took me a while to realise why the tree isn’t unbalanced and constantly destroying and creating node. It’s because we split aggressively. We only take the merge branch when there’s no way of splitting.
    • A word on locking: in both the canSplit and canMerge branches we Lock but at different times. In canMerge land there’s not much to do, just destroy stuff so we lock straight away. In canSplit land though we can delay the mutex locking until later, we allocate the new child nodes to temporaries, then call the expensive GenerateMesh method, only once we’ve done that do we lock the mutex and copy the temporaries into the correct child node pointers. They have to be there for to correctly notify their new neighbours that they exist and to generate the edge normals and colours. At least this way the mutex isn’t locked for as long, because that would delay the rendering call back in the MAIN thread if it was.
  • GeoPatch constructor/destructor – a quick note about these two, the generate some data used throughout the rest so are worth reading and keeping in mind. The destructor also does some of the edge friend management by letting it’s friends know when it’s destroyed but otherwise they’re pretty simple.

Wow, 3202 words in so far everyone… everyone? Anyone? Ahem, anyway.

The end?:

That’s it for the whistle-stop tour for now. I know I know, there’s still slightly more than a third of the CPP file that I haven’t even started to cover but that’s fine because I’m not going too. At all.

It’s mostly too specific to some of Pioneers quirks, some of it is just structural so it’s run once to set things up and then never used again but mostly I’m ignoring it because it doesn’t deal with what you need to understand.

As I’ve described above the existing terrain generation is in two parts; one in the main thread with some setup and then the rendering, the second is in it’s own thread. The two cross over at places and prevent crashes and other problems by locking mutexes. This works but it causes performance problems of it’s own. There’s a great deal of data copying stuff that can be avoided by simply generating more than you need, that means you don’t need to track relationships so much and that simplifies the code massively. That might have performance issues of it’s own of course since you are doing extra work but it simplifies the code so very much and I’ve already covered the reasons for doing that in Part 1.

Next in the series I’ll probably cover some of the ways we generate the heights for the terrain… no, no actually I’m not doing that at all. Yikes that’s terrifying stuff. No instead I’ll discuss what I’ve been working on to make it take advantage of multithreading and many cores. In that article I’ll also cover a side project called GLSLPlanet which moves some of the work onto the GPU, and why I’m not doing that in Pioneer just yet even though it’s based on the same code I’ve just described.

See you next time,

Andy

Comments posted (2))

  1. I’d rather hear about how you managed to burn steamed veg.

  2. Then you’re going to love parts 3 and 4 :P