Part 1: Pioneer’ing Terrain

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 17-04-2013

Yes yes I went for the pun-tastic title :)

This is going to be a bit more technical than usual, not massively, don’t stop reading for fear of equations or pages of code. No this is just going to be a technical description of some parts of Pioneers terrain rendering system because it’s quite odd, not great, but it’s effective enough. I’m basically going to brain dump a bit so I might come back and edit this in the future to clean it up, it will be something of a living document.

In the future I’ll refer back to this post to describe the ways that the terrain rendering and generation will be changing.

I should also point out that I’m not the original author of the terrain rendering or generation used in Pioneer. It’s just an area that I’ve been poking around in a lot lately and that I have a lot of changes that I want to make to that area. These changes should make generation of the terrain faster and more flexible, add control that we currently lack. They should also accelerate rendering and use a lot less memory as well as opening up new visual and game effects in the distant future.

The basic idea:

The terrain itself is a form of Quadrilateralised Spherical Cube which is a fancy way of saying that you’ve got 6 flat planes oriented to face in the six outward directions of a cube, then you deform them to be a sphere. This causes some distortions but it also comes with a lot of rather useful benefits.

For a start we can use a quadtree to define the surface of each (originally) square face of the (former) cube. That’s handy because there’s lot of simple terrain rendering algorithms that can use such an arrangement. In this case we’re using something similar to “Chunked LOD“. I say “similar” because the details differ, but you can read a better explanation on the “making worlds of sphere and cubes” blog post on Acko since it’s similar in concept.

So you know that:

  • we’re using 6 quadtrees, 1 per cube face.
  • each quadtree is deformed to fit onto a sphere.
  • that magically this is helpful…

Well when we want to render the terrain we start at the top of each of the quadtrees, i.e; the lowest detail level, and simply ask it if it has any children. Because it’s a quadtree it will have either 4 or none, hence the name “quad” in case you ignored that wikipedia like. When we reach one of these quadtree nodes that has no children we render whatever geometry it has because that must by definition be the highest level-of-detail available. This is a very simple system, you repeat this for each of the faces of the cube/sphere and you’ve very quickly rendered a convincing sphere onscreen.

Implicit in the above statement however is the idea that some of the quadtrees “nodes” won’t have children whereas others will. To do this we pass through the players camera position to an update method that goes through the quadtree, much like the rendering does, and at each node it does some maths to work out if a node is in 1 of 3 states: good enough, whether it should “split” and create some child nodes, or if it has child nodes and doesn’t need them anymore so can “merge” them.

  • Good enough: nothing happens, we have the perfect amount of detail. In the code we’ll take the “merge” branch but if we have no child node then nothing changes,
  • Split: Ah, this nodes not detailed enough so we can either create 4 child nodes and populate them with more detailed terrain, or if we already have them then we descend into those node and repeat the update process,
  • Merge: The flipside to being good enough is that we’re good enough and so if we have any child nodes then they’re superfluous and we can merge them, or in Pioneers case, just delete and erase them from existence.

In the current Pioneer Alpha 33 this all happens in a single separate thread with a few locks/mutexes preventing rendering from happening at the same time as updating. This creates a lot of waiting around for either updating the quadtree or waiting for rendering to complete. It was the simplest way to improve things at the time it was created though because it meant that it didn’t stall the main thread whenever the, very expensive, terrain generation needed to be done. Doing it this way keeps everything looking like a single-threaded process and this is inherently simpler to write and understand. That means there should be less bugs and less problems.

Unfortunately Pioneer tries to do some odd things that exploit the fact it’s storing the terrain in a quadtree. This unfortunate stuff is done in the name of performance in that it tries to avoid regenerating some of the terrain data by keeping close track of who it’s neighbours are in the quadtree and then copying as much data as it can. All of this neighbour tracking and data copying is very expensive both in performance terms and complexity. Overall it was this that took me the longest to understand rather than the actual terrain generation or rendering parts!

This neighbourly management is one of the first things that is going to change, hopefully by the next release date.

You see the neighbouring node stuff is complex, it also imposes some form on the quadtree structure that we don’t need and that means things cannot be easily generated in little pieces or across many CPU cores. Ideally we’d like the update logic to happen, it decides to Split or Merge some nodes. Then it asks a system to go away and make it the data it needs. Those requests disappear off into the aether and a short while later new terrain node data returns ready to be put into the quadtree. Whether that’s done on a super-powerful GPU, farmed out across a distributed network, or done on a differet core of the same CPU we don’t know or care. Removing the neighbour management, or at least the data copying portion, means breaking up the generation of these quadtree terrain nodes gets much simpler and easier to distribute across many CPU cores.

Quadtree specifics and quick/easy changes:

A quadtree can be used to store any kind of information, it doesn’t even have to be spatial or geometric like we are in Pioneer. However that is our use case and so we store a tonne of data in each of the quadtrees nodes. Now by a tonne of stuff I really mean a metric-fucktonne of stuff! We store in terrifying order:

  • A ref counted pointer to the context data that holds the terrain generation methods,
  • the double precision vector of the corner vertices,
  • 3 * double precision vector arrays to the vertex, normal and colour data,
  • a GLuint for a nodes vertex buffer object number,
  • 4 pointers to its child nodes (it’s a quadtree afterall),
  • another pointer too this nodes parent,
  • 4 more pointers to it’s possible neighbours (whether or not it has any),
  • a pointer to the parent object called a geosphere, which holds the top level quadtree roots,
  • double precision rough length of an edge (from corner to corner along an edge),
  • 2 * double precision vectors for the centroid and clipCentroid of the node used for updating and rendering calculations,
  • double clipRadius,
  • depth of the node within the quadtree,
  • a mutex for locking access to the child nodes during updating or rendering,
  • double “distMult” which is calculated and used in the constructor and NEVER AGAIN.

Ok so that did get a little non-geek hostile there (it’s about to get worse!) but what it boils down too is that there’s a lot of shit in every single node. Some of it is terrifying, absolutely pant wettingly funnily bad to have there. Lets take the worst of the offenders, and before I get any comments pointing out the others, yes I am aware of all of them:

3 * double precision vector arrays to the vertex, normal and colour data – These are bad for a few reasons but the obvious ones are that even if we need to store this data then we’re storing it in the wrong format for AT LEAST two of the pieces of data:

  • Normals – these can be efficiently packed down into some very compact formats, in reality you could get down to a single 32-bit integer with two of the coordinates packed into 16-bits each and then recalculate the 3rd component when need, probably in the vertex or pixel shader. In this case we can quickly halve the memory usage however by simple making it a float vector instead of a double.
  • Colours – oh-my-god, you don’t EVER need double precision for colour, in this case we have 192-bits for colour… 16-bits would probably be good enough, for perfect colour we might need 24-bits but NOT 192-bits. In this instance a single 32-bit integer is our best bet for quickly encoding the colour saving us a staggering 160-bits PER VERTEX and we’ve got 8-bits to spare if we need them for something (alpha channel?) in future.
  • So why was the colour using 64-bit vector3 representation? Because the heightmap generation was sneakily storing the height in the red channel during the terrain generation so that it could use it later…
  • …this was so it could retrieve the height when calculating the per vertex colour. Why not just store the height directly in another array? This adds back another 64-bits so we’ve only saved 92-bits per vertex.
  • Of course we’re only storing all of this temporarily anyway, until we can generate the new vertex buffer for the patch (nuuurgh, I’ll rant about this later) which then turns it into floating point vector data. Since this is the case however we can in fact discard the vertex position data completely, all 192-bits of it, and then rebuild it from the height data that we’re now storing instead.
  • Total so far: Normals = 50% = 192-bits to 96-bits, Colours = 1/6th = 192-bits to 32-bits, Vertex = 1/3rd = 192-bits to 64-bits, so 576-bits to 192-bits per vertex.

That reduction has brought my peak memory consumption from ~700MiB when sat on the surface of the New Hope start position down to ~280MiB, and as you can see from the above there’s still more to be saved from a variety of areas. That’s because those arrays in particular are allocated to contain all of the visible data for each node. The rest is a bit nasty but it’s nothing in terms of memory usage by comparison because even with a few thousand nodes in the quadtree(s) they’re only a tiny fraction of all the data stored.

Hmm, that’s gone bit rambl-o-matic because I’m bloody tired.

Well maybe someone might find it interesting, I’ll continue with this stuff… soon! :)

Andy

Further in the series: Part 2 is up now.

Comments posted (2))

  1. Now I’m not wondering anymore why I never looked at this part of the code. It is SCARY. Bloody interesting but scary!

  2. It’s not too bad after a while, at first it’s a wall of code sensation but then you start to see the different pieces and how they’ve evolved. I’ll try and cover it a bit more in Part 2 :)