Pioneer Clouds

Posted by | Posted in environment, Game Development, Pioneer | Posted on 03-06-2016

Some time ago I tried slapping a texture of the Earth’s clouds onto a sphere around Pioneer’s planets. It didn’t look awful, but it was a static texture meant for the Earth alone, and even with a selection of high-resolution textures to pick from it would start to look repetitive uncomfortably quickly.

As usual in a thread like that I immediately got asked why it wasn’t better, why I wasn’t doing X, Y or Z, etc., but the idea is still bubbling away in my head. The thread itself was really just to gather resources and ideas rather than to track anything.

Now I have a small gallery of test images based on some noise-based shaders which update over time but look rubbish.

See, from the ground:

[Image: from the ground]

From orbit:

[Image: from orbit]

To get better looking clouds, ones that look like they’re generated by the planet’s surface and properties, you need to analyse that planet’s geography and its energy inputs/sinks.

There are a couple of interesting sites, like Stainless and Dungeon League, which really dig into and iterate through ideas to produce some very interesting results.

Heat & Friction

2D normalised wind direction

I’m fairly sure that I can already combine them to get some interesting results, but rather than using them raw like this I hope to do some more processing to generate evaporation, rainfall (precipitation) and rain-shadow style maps that will look even more compelling.

Right now I am basing the number of “bands” of weather on the Earth’s global atmospheric circulation, which will not hold true for other worlds, so some way of determining the number of “cells” to divide a planet’s circulation into will be necessary. Take Jupiter as an example: it has numerous “cells” which all have different gases being mixed into them, giving it that distinctive layered look. Depending on the planet’s size, rotational velocity, atmospheric density, viscosity, heat absorption, angle of inclination to the nearest star(s), etc. there could be anything from one to hundreds of cells dividing up the atmosphere.

That would be something amazing to see, different on each and every world with corresponding weather systems and biomes.

That still won’t be the end however as all of this so far is excruciatingly slow to generate, requiring as it does a decent level of heightmap detail.

Generating those heightmaps is the most expensive single process in Pioneer, which is why the development of the multi-threaded processing made such a big difference.

Some simplifications, optimisation and cleverness will be required before this can ever make it into the game proper.

PS: In parting I should also add that working on this came about because I needed to generate some of this data for the new JSON driven terrain generation system, and because of the XFrontier forum thread which just looked too damned good to pass up.

Part 3: Many hands make for light work.

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 19-05-2013

Well this episode has taken longer than planned to get written, or even started for that matter. So let’s not delay, as this is part 3 of me attempting to explain the terrain system used in Pioneer Space Sim.

If you want to recap and point out poor grammar or spelling mistakes then go ahead and read Parts “1: Pioneer’ing Terrain” and “2: Now with… no feeling in my arms due to all the typing!”.

How things change:

Since Part 2 there have actually been some developments which make this edition even more relevant. At the end of it I mentioned that I’d be covering my work making the terrain generation multi-threaded using a job based system. Well that has now been merged into master and is available in the latest downloads. It’s got some bugs fixed since then and hopefully this will only get truer in the future ;)

Part 2: Now with… no feeling in my arms due to all the typing!

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 20-04-2013

I should have made this clear from the outset but these posts aren’t a description of the “best way” to do anything, they’re more akin to documenting how we currently ARE doing things.

There’s a great deal of information out there on how these things can work but with Pioneer you can grab the source code and actually see it in progress. There’s great value in getting hold of something, testing it, debugging it, making changes and watching it blow up in your face ;)

Previously:

In Part 1 I gave a very basic description but at no point did I attempt to explain things at the code level… and I’m not going to get too close to it during this series but I’ve got to get at least a little more familiar because some parts just don’t make any sense at first.

What does this do and why does it do it?:

One of the first things you see with the “GeoSphere” code leading up to, and including, the Alpha 33 release is that it’s entirely contained in just two files which are several thousand lines long and full of some nasty, complex looking code. This is an unfortunate side effect of the way it’s evolved. At one point it might have been reasonably clean, but as with all things it gets added to, and so the complexity grows along with the sheer amount of code in a single file. Eventually you just have to prune it back, but in these situations you first have to understand what’s going on before you can do that pruning and splitting up of files. The complexity means that no-one understands the system, and deciphering it becomes the larger part of the burden, so the situation is rarely resolved.

So let’s tackle the big pieces of the complexity first.

GeoSphere:

The GeoSphere class is the main unit the rest of the code will interact with; it’s the part visible to the outside. It contains instances of the other main classes and does some general orchestration of them. It can be thought of as being split into two main parts, however, due to its use of static methods and members:

  1. static methods & members that affect ALL GeoSphere instances.
  2. non-static methods & members that are about a specific instance of GeoSphere.

The static part handles the creation and initialisation of stuff that affects all of the GeoSphere instances, the GeoPatchContext which holds information about the size and detail of the GeoPatch’s for example. It also kicks off the thread which manages updating the level-of-detail calculations I mention in Part 1.

The per-instance stuff is fairly basic:

  • m_terrain pointer, which is a “Terrain” pointer that will handle all of the heavy lifting logic for generating height and colour information for the patches.
  • 6 GeoPatch pointers, these are the 6 faces of the cube we will turn into a sphere, these look after themselves most of the time, we just have to call methods (/functions) on them.
  • Constructor, destructor, Render, GetHeight, GetColor and BuildFirstPatches methods.

GetHeight and GetColor are simply helper methods, they just pass some information into a method of m_terrain and return the data, nothing more. Likewise the constructor and destructor are just setup/cleanup so let’s ignore them.

The two methods of interest here are Render and BuildFirstPatches.

The Render method is called from the main thread as all of our rendering has to happen there currently. However, Render does more than just render, yeah I know, horrible eh but it’s what we’ve got.

At the top of the method it mostly does some setup of materials, ones subsequently used by ALL of the patches that will be drawn. Then we get to a call to BuildFirstPatches. This is checked every single time we call Render but it only does anything on the first call; it’s lazy evaluation done lazily. It could be quite convoluted to create the GeoSphere and then immediately call BuildFirstPatches, so instead of resolving this complexity it just checks every frame to see if the root nodes of the patches have been built and, if they haven’t, it calls this method. The most stars/planets/moons I’ve seen in a system so far was about 50 so it’s not too much overhead, but it’s still icky.

After this check we configure basic lighting and then we start calling the Render method on each of the root GeoPatch instances (the ones created by BuildFirstPatches). I’ll describe them more later but they essentially do the quadtree traversal finding the nodes worth rendering.

Once we’re finished there we release our materials and then do some nasty management of the multithreaded updating logic, and finally we store the current camera position that will be used in the updating and level-of-detail (LOD) calculating thread. This is actually another reason why BuildFirstPatches is done lazily within the Render call: you can’t properly update the LOD until you have a camera position, but you don’t have one until you’ve tried to render. By putting the BuildFirstPatches call in Render you almost guarantee that you will have a valid camera position by the time that the LOD thread gets around to updating your GeoSphere. Hacky but it’s worked all this time.
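The lazy BuildFirstPatches check and the stored camera position can be boiled down to something like the following. This is only a rough sketch and not the actual Pioneer code: GeoSphereSketch, GeoPatchSketch, Vec3d and the counters are hypothetical stand-ins, and all of the material, lighting and threading work is left out.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real classes carry far more state.
struct Vec3d { double x, y, z; };

struct GeoPatchSketch {
    void Render(const Vec3d & /*campos*/) { /* the draw calls would go here */ }
};

class GeoSphereSketch {
public:
    void Render(const Vec3d &campos) {
        // Lazy initialisation: checked on every call, does work only on the first.
        if (!m_patches[0]) BuildFirstPatches();
        assert(m_patches[0] != nullptr);

        for (auto &patch : m_patches) patch->Render(campos);

        // Remember where the camera was so an LOD update has a valid position to use.
        m_lastCamPos = campos;
        m_hasCamPos = true;
    }

    bool HasCameraPosition() const { return m_hasCamPos; }
    int BuildCount() const { return m_buildCount; }

private:
    void BuildFirstPatches() {
        for (auto &patch : m_patches) patch = std::make_unique<GeoPatchSketch>();
        ++m_buildCount;
    }

    std::array<std::unique_ptr<GeoPatchSketch>, 6> m_patches; // one root per cube face
    Vec3d m_lastCamPos{0.0, 0.0, 0.0};
    bool m_hasCamPos = false;
    int m_buildCount = 0;
};
```

Calling Render twice on the same GeoSphereSketch builds the six root patches once, and a camera position only exists after that first call, which is the ordering the paragraph above relies on.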

GeoPatch:

If GeoSphere were the potatoes then this is the meat. GeoPatchContext might take up the top 1/3rd of the file but that stuff is actually fluff, it’s bulky but it’s not doing a lot of active logic, just setting things up for GeoPatch to use later, and they’re mostly rendering related but they’re not important until we get to GeoPatch rendering.

GeoPatch is defined entirely within the CPP file for GeoSphere. I don’t mind this technique overly much because C++ can be rather limited when you’re trying to hide an implementation, but this is crazy. The GeoSphere implementation is only ~500 lines of code, and it starts at line 1053! GeoPatch on the other hand takes 670 lines in the middle of the file. Ugly and confusing for people new to it. Also confusing is that the quadtree, patch data, and some thread handling information are all muddled together.

We have as the 3 parts that should be more distinct:

  1. kids (*4), parent and edge friends (*4) – this is the quadtree information.
  2. m_kidsLock pointer to a mutex – this is the threading rubbish, it’s used to stop you from updating the patch when you’re trying to render it and vice versa.
  3. everything else is the patch data or at least can be considered that way.

To add to this confusing class data layout is that some of it will be called and used in both the main thread and the LOD update thread with access gated by the m_kidsLock mutex. Threading is almost never simple but this lack of clarity makes it hard to understand both when and where some members will be updated.

Next we have a bunch of functions that stump most people, I’ll list them and then explain their purpose:

  • GetEdgeMinusOneVerticesFlipped
  • GetEdgeIdxOf
  • FixEdgeNormals
  • GetChildIdx
  • FixEdgeFromParentInterpolated
  • MakeCornerNormal (templated for extra fun)
  • FixCornerNormalsByEdge
  • GenerateEdgeNormalsAndColors
  • OnEdgeFriendChanged (needed)
  • NotifyEdgeFriendSplit (needed)
  • NotifyEdgeFriendDeleted (needed)
  • GetEdgeFriendForKid (needed)

All of these serve one purpose, to confuse the hell out of everyone reading the code… no ok that’s being mean, the last 4 really are useful… still mean? Yeah a bit, I’ll have to explain.

Generating terrain can be a computationally expensive process. The way we do it will fill a couple of future articles, but the very very short version is that we run an expensive set of mathematical noise functions many many times for every single vertex that we want to generate the position for. Doing this for a single vertex is expensive, but we do it tens of thousands of times even for a low detail planet. Most of the code, and ALL of that complexity above, is an attempt to avoid generating as much of it as possible by copying it from other patches that have already been generated.

The reason that the last four methods are, mostly, needed is because they’re both the way that we keep track of who our neighbouring nodes are and how we kick off the data copying process.
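To give a feel for where the cost comes from, here is a minimal sketch of that per-vertex loop. Everything in it is an assumption made for illustration: GetHeight here is a cheap sine-based placeholder rather than Pioneer’s noise functions, SpherePoint only handles one cube face, and Vec3d is a throwaway struct. The parts taken from this series are the shape of the loops, a height in the range 0.0 to 1.0, and scaling the unit-length point by 1.0 plus that height.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3d { double x, y, z; };

// Placeholder for the expensive noise evaluation; returns a height in [0, 1).
inline double GetHeight(const Vec3d &p) {
    double sum = 0.0, amplitude = 0.5, frequency = 1.0;
    for (int octave = 0; octave < 8; ++octave) {
        sum += amplitude * (0.5 + 0.5 * std::sin(frequency * (p.x + 2.0 * p.y + 3.0 * p.z)));
        amplitude *= 0.5;
        frequency *= 2.0;
    }
    return sum; // amplitudes sum to 1 - 2^-8, so this never reaches 1
}

// Point on the unit sphere for (u, v) in [0, 1] across the +Z face of a cube.
inline Vec3d SpherePoint(double u, double v) {
    const Vec3d p{2.0 * u - 1.0, 2.0 * v - 1.0, 1.0};
    const double len = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
    return Vec3d{p.x / len, p.y / len, p.z / len};
}

// One height evaluation per vertex: edgeLen * edgeLen calls to GetHeight per patch.
inline std::vector<Vec3d> GenerateMesh(std::size_t edgeLen) {
    assert(edgeLen >= 2);
    std::vector<Vec3d> vertices;
    vertices.reserve(edgeLen * edgeLen);
    const double step = 1.0 / static_cast<double>(edgeLen - 1);
    for (std::size_t y = 0; y < edgeLen; ++y) {
        for (std::size_t x = 0; x < edgeLen; ++x) {
            const Vec3d p = SpherePoint(static_cast<double>(x) * step,
                                        static_cast<double>(y) * step);
            const double scale = 1.0 + GetHeight(p); // between 1.0 and 2.0
            vertices.push_back(Vec3d{p.x * scale, p.y * scale, p.z * scale});
        }
    }
    return vertices;
}
```

Even this toy version runs eight sine evaluations per vertex, and every split of a patch creates four new patches that each need the full grid evaluated again, which is why avoiding or offloading this work matters so much.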

The thing to take from this is not that these are a good idea, and I’m not going to explain how they work either, the important thing is that they’re OPTIONAL. They’re costly too, a lot of time is spent maintaining all of this and all of the data copying is pretty bad for performance for other reasons. Not least that it often just has to generate the data anyway because a neighbour can’t be found at the correct level. They also make it much harder to make everything into smaller tasks that can better use multi-core CPUs. All things considered I’d rather not have them.

What all this means is that GeoPatch has that lot, and only another handful of interesting methods which I’m going to go into a little detail with now:

  • determineIndexBuffer – I will forever regret that this doesn’t match our method naming convention, i.e: that lowercase ‘d’. Ignoring that though; this is what uses the result of the method “Init” from GeoPatchContext, most of it anyway. “Init” went along for many lines of code calculating 16 index buffers; it was a lot of code and earlier I said it didn’t do very much. Well in truth what it did was set up 16 index buffers so that here, in this method, we could do a very fast and simple bit of OR’ing and come up with an index into that list of 16 index buffers. Specifically we take our edgeFriends and we see which ones of them we actually have, and we find the perfect index buffer that uses low-resolution edges when we don’t have a neighbour on that edge, and hi-resolution edges when we do. There are 16 possible combinations (4 edges, each either with or without a neighbour), although in truth we only ever have a subset of those used on our terrain. It’s quick and easy to calculate them though, and this is a fast way to index those.
  • UpdateVBOs & _UpdateVBOs – even I’m not entirely sure where these two are called from, which thread and under what circumstances. This is because of those many edge copying methods, I suspect that UpdateVBOs can be called from either depending on circumstances, but it’s usually called from the LOD thread. _UpdateVBOs on the other hand should only EVER be called from the MAIN thread as it interacts with OpenGL by updating or creating the vertex buffer object (VBO) that is used to render the patch on screen so if it’s ever called from another thread it will almost certainly crash, or worse it’ll keep running and fail silently :(
  • GetSpherePoint – gets a point on a sphere, specifically it performs a bilinear interpolation between the four corner points of a patch. At the lowest detail level, then, that means the corners of a cube’s face, i.e: this turns our cube into a sphere – or one face of it into a curved patch on the surface of the sphere – part of the magic happens here. Once it’s done the bilinear bit it “normalises” it, turning it into a vector of unit length (Off topic; I hate Wikipedia’s vector descriptions, they penalise those striving to learn some maths and stroke the egos of those who already know it) so that its length, the square root of the sum of its squared components, equals 1. GenerateMesh makes heavy use of GetSpherePoint because it creates the patch vertices.
  • GenerateMesh – ignore the centroid for now, focus on the first couple of nested for loops because they generate ALL of the heights for every vertex of a patch. This *IS* the terrain creation right here in two for loops.
    • The call to GetHeight is the expensive call to the generation process itself but it’s interchangeable with any other. We have many versions of this call; they all do things using different maths and could even use other ways of generating terrain, so we’ll deal with them another day. No, the important bit is that if you strip all of GeoPatch down to its absolute bare minimum then this function and GetSpherePoint are almost all you’d have left, with just these two for loops.
    • We get the height using a double precision floating point variable because it will be in the range 0.0 to 1.0 yet it has to be very very precise and if we used a single precision (32-bit) float then we’d lose detail and get visible banding on the terrain. There are ways and means of avoiding this but doubles are simple and they’re fast enough most of the time.
    • We take that height and add 1.0 to it, then we take this new value and use it to scale the vertex position from GetSpherePoint and that’s it, we have a terrain height that will be of at least unit length (1.0) up to a maximum of length 2.0 which would make for some very high mountains.
    • The second pass is now where the trouble starts but can be summed up with the word: “normals” – the calculation of which uses 4 vertex positions, at least two of those will be within this patch’s data, but the other two may be in our neighbouring patches. That’s why these for loops don’t cover the entire patch’s data. Instead they cover the central part; the rest of the data, the edges, will be calculated by the mess of edge copying methods I skipped describing above.
    • The colour is also calculated in this 2nd pass, it requires the height and the normals so it’s also done in the edge copying. You might notice if you’re reading through the code that the height is stored in the colour in pass 1, then used in pass 2. I covered why this was and why it was bad in Part 1 – as I say, this is a description of how it is, not how it should be!
  • Render – an actual blend of quadtree and patch all in one here. This is depth first traversal & rendering of a quadtree in action! Render MUST be called in the MAIN thread only because it’s dealing with OpenGL. First we see if we have any child nodes; if we do we just forward on the values that were passed into the function. That’s the depth first traversal part, because in a tree each node will do the same thing, over and over until it reaches a leaf node, the one without children. When we do reach one the rendering is straightforward:
    • Lock the node – so it can’t be updated in the LOD thread,
    • optionally update the vertex buffer object (VBO),
    • test to see if we’re visible and return without doing anything if we’re not,
    • push the current matrix and then translate the view according to the difference between the camera position and the clipCentroid (which is the centre of the four corner vertices),
      • this helps us deal with a problem called “jitter” (/”jittering”) caused by the GPU only using 32-bit floats which aren’t precise enough to represent the terrain. What we do is move the rendering of the patch in such a way that 32 bits are precise enough for us by offsetting it from the camera position. We’re effectively moving it closer to the camera before applying the position to avoid the jittering! Patches further away still jitter like crazy, but it doesn’t matter because they’re small and far away!
    • update some information we store about how many triangles we draw,
    • setup our buffers, there’s determineIndexBuffer from earlier,
    • do the actual drawing,
    • now release our buffers and pop the matrix so that this patch’s matrix offset doesn’t affect the next patches to be rendered.
    • and we’re done with Render :)
  • LODUpdate – This is called only from the LOD thread, the MAIN thread never goes near it, I’ll break it down just like Render above but this is where we decide to increase or decrease the detail of a GeoPatch by splitting or merging them:
    • Slightly different to the above we lock the “m_abortLock” to see if the thread is trying to quit – the idea is that it helps us exit faster but it’s implementation specific, ignore it.
    • canSplit – leading question eh? We iterate through our neighbours and perform some tests like: Do we have them all? Are they less detailed than us or higher in the quadtree than us? If we pass that then we check that we’re not already too deep to split and finally we apply some maths to decide if our edge’s length is more than the distance from the camera to the centroid we calculated in GenerateMesh. This is a crude approximation of determining how big the patch will be on the player’s screen. You can find much better ones if you Google for geomorphing. There’s an additional check to see if we have a parent because someone decided we should always split in that instance… not sure why.
    • canMerge – apparently yes… I don’t know why this is here, we could avoid an if test later on, I hope the optimiser gets rid of it!
    • canSplit branch – if we really really canSplit then, in summary: we first see if we have children, if we do then we just pass on the call to LODUpdate, otherwise: we create 4 child nodes, we set ourselves as their parent, we set up the relationship between each of them and the neighbours that we know about (if any), then we call GenerateMesh on each of them, we GenerateEdgeNormalsAndColors using all that complicated shit above, we UpdateVBOs which sets the flag that says that we really need to call _UpdateVBOs back in the MAIN thread, and finally we pass on the call to LODUpdate to our newly created child nodes.
    • canMerge branch – if we cannot split then we’ll go this way, always (stupid code). That doesn’t mean we can actually merge though, not if we have no child nodes. If we do then we’ll happily destroy them, and in their destructors they’ll destroy their kids and so on and so forth. When I first started reading this code it took me a while to realise why the tree isn’t unbalanced and constantly destroying and creating nodes. It’s because we split aggressively. We only take the merge branch when there’s no way of splitting.
    • A word on locking: in both the canSplit and canMerge branches we Lock but at different times. In canMerge land there’s not much to do, just destroy stuff, so we lock straight away. In canSplit land though we can delay the mutex locking until later: we allocate the new child nodes to temporaries, then call the expensive GenerateMesh method, and only once we’ve done that do we lock the mutex and copy the temporaries into the correct child node pointers. They have to be there to correctly notify their new neighbours that they exist and to generate the edge normals and colours. At least this way the mutex isn’t locked for as long, because that would delay the rendering call back in the MAIN thread if it was.
  • GeoPatch constructor/destructor – a quick note about these two, they generate some data used throughout the rest so are worth reading and keeping in mind. The destructor also does some of the edge friend management by letting its friends know when it’s destroyed but otherwise they’re pretty simple.
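The OR’ing trick described for determineIndexBuffer in the list above can be sketched in a few lines. This is an illustration rather than the real method: the function signature, the use of plain const void pointers for the edge friends, and which bit belongs to which edge are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumEdges = 4;

// Builds an index in the range 0..15 by OR'ing together one bit per edge.
// A set bit means "we have a neighbour on this edge", so the matching
// pre-built index buffer would use the hi-resolution version of that edge.
inline unsigned DetermineIndexBuffer(const std::array<const void *, kNumEdges> &edgeFriends) {
    unsigned index = 0;
    for (std::size_t edge = 0; edge < kNumEdges; ++edge) {
        if (edgeFriends[edge] != nullptr) index |= (1u << edge);
    }
    assert(index < 16u); // 4 edges, 2 states each: 16 possible index buffers
    return index;
}
```

The appeal of the design is that all of the expensive work of building the 16 index buffers happens once up front, and picking the right one per patch per frame is a handful of bit operations.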

Wow, 3202 words in so far everyone… everyone? Anyone? Ahem, anyway.

The end?:

That’s it for the whistle-stop tour for now. I know I know, there’s still slightly more than a third of the CPP file that I haven’t even started to cover but that’s fine because I’m not going to. At all.

It’s mostly too specific to some of Pioneer’s quirks, some of it is just structural so it’s run once to set things up and then never used again, but mostly I’m ignoring it because it doesn’t deal with what you need to understand.

As I’ve described above, the existing terrain generation is in two parts: one in the main thread with some setup and then the rendering, the second in its own thread. The two cross over at places and prevent crashes and other problems by locking mutexes. This works but it causes performance problems of its own. There’s a great deal of data copying that can be avoided by simply generating more than you need; that means you don’t need to track relationships so much, and that simplifies the code massively. That might have performance issues of its own of course, since you are doing extra work, but it simplifies the code so very much and I’ve already covered the reasons for doing that in Part 1.

Next in the series I’ll probably cover some of the ways we generate the heights for the terrain… no, no actually I’m not doing that at all. Yikes that’s terrifying stuff. No instead I’ll discuss what I’ve been working on to make it take advantage of multithreading and many cores. In that article I’ll also cover a side project called GLSLPlanet which moves some of the work onto the GPU, and why I’m not doing that in Pioneer just yet even though it’s based on the same code I’ve just described.

See you next time,

Andy

Part 1: Pioneer’ing Terrain

Posted by | Posted in Game Development, GLSLPlanet, Pioneer | Posted on 17-04-2013

Yes yes I went for the pun-tastic title :)

This is going to be a bit more technical than usual, not massively, don’t stop reading for fear of equations or pages of code. No, this is just going to be a technical description of some parts of Pioneer’s terrain rendering system because it’s quite odd, not great, but it’s effective enough. I’m basically going to brain dump a bit so I might come back and edit this in the future to clean it up, it will be something of a living document.

In the future I’ll refer back to this post to describe the ways that the terrain rendering and generation will be changing.

I should also point out that I’m not the original author of the terrain rendering or generation used in Pioneer. It’s just an area that I’ve been poking around in a lot lately and that I have a lot of changes that I want to make to that area. These changes should make generation of the terrain faster and more flexible, add control that we currently lack. They should also accelerate rendering and use a lot less memory as well as opening up new visual and game effects in the distant future.

The basic idea:

The terrain itself is a form of Quadrilateralised Spherical Cube which is a fancy way of saying that you’ve got 6 flat planes oriented to face in the six outward directions of a cube, then you deform them to be a sphere. This causes some distortions but it also comes with a lot of rather useful benefits.
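As a rough illustration of that deformation, here is a minimal sketch that takes a point on one flat face of the cube and pushes it out onto the unit sphere by normalising it. The Vec3d struct, the helper names and the choice of the +Z face with corners at ±1 are assumptions made for the example; the real code works on all six faces and on sub-patches of them.

```cpp
#include <cassert>
#include <cmath>

struct Vec3d { double x, y, z; };

inline Vec3d Lerp(const Vec3d &a, const Vec3d &b, double t) {
    return Vec3d{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

inline double Length(const Vec3d &v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

inline Vec3d Normalised(const Vec3d &v) {
    const double len = Length(v);
    assert(len > 0.0); // a cube face never passes through the origin
    return Vec3d{v.x / len, v.y / len, v.z / len};
}

// Corners of the +Z face of a cube spanning [-1, 1] on each axis.
const Vec3d kFaceCorners[4] = {
    {-1.0, -1.0, 1.0}, {1.0, -1.0, 1.0}, {1.0, 1.0, 1.0}, {-1.0, 1.0, 1.0}};

// (u, v) in [0, 1] picks a point on the flat face by bilinear interpolation
// of the four corners; normalising it moves that point onto the unit sphere.
inline Vec3d SpherePoint(double u, double v) {
    const Vec3d bottom = Lerp(kFaceCorners[0], kFaceCorners[1], u);
    const Vec3d top = Lerp(kFaceCorners[3], kFaceCorners[2], u);
    return Normalised(Lerp(bottom, top, v));
}
```

The centre of the face barely moves while the corners get pulled in the most, which is where the distortion mentioned above comes from: an even grid on the cube face ends up more tightly packed near the corners of the sphere patch.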

For a start we can use a quadtree to define the surface of each (originally) square face of the (former) cube. That’s handy because there are a lot of simple terrain rendering algorithms that can use such an arrangement. In this case we’re using something similar to “Chunked LOD”. I say “similar” because the details differ, but you can read a better explanation on the “making worlds of sphere and cubes” blog post on Acko since it’s similar in concept.

So you know that:

  • we’re using 6 quadtrees, 1 per cube face.
  • each quadtree is deformed to fit onto a sphere.
  • that magically this is helpful…

Well when we want to render the terrain we start at the top of each of the quadtrees, i.e. the lowest detail level, and simply ask it if it has any children. Because it’s a quadtree it will have either 4 or none, hence the name “quad” in case you ignored that Wikipedia link. When we reach one of these quadtree nodes that has no children we render whatever geometry it has because that must by definition be the highest level-of-detail available. This is a very simple system: you repeat this for each of the faces of the cube/sphere and you’ve very quickly rendered a convincing sphere onscreen.

Implicit in the above statement however is the idea that some of the quadtree’s “nodes” won’t have children whereas others will. To do this we pass the player’s camera position through to an update method that goes through the quadtree, much like the rendering does, and at each node it does some maths to work out if a node is in 1 of 3 states: good enough, whether it should “split” and create some child nodes, or if it has child nodes and doesn’t need them anymore so can “merge” them.
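That traversal is short enough to sketch in full. The QuadNode struct, the Split helper and the use of a vector of ids in place of real draw calls are all made up for the example; only the shape of the recursion reflects what is described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct QuadNode {
    int id = 0;
    std::array<std::unique_ptr<QuadNode>, 4> kids; // all four set, or all four null

    bool HasKids() const { return kids[0] != nullptr; }
};

// Depth-first walk: nodes with children forward the call, leaves "draw" themselves.
inline void Render(const QuadNode &node, std::vector<int> &drawn) {
    if (node.HasKids()) {
        for (const auto &kid : node.kids) {
            assert(kid != nullptr);
            Render(*kid, drawn);
        }
        return;
    }
    drawn.push_back(node.id); // stand-in for the actual draw call
}

// Gives a node four children with consecutive ids, purely to build test trees.
inline void Split(QuadNode &node, int firstChildId) {
    for (std::size_t i = 0; i < node.kids.size(); ++i) {
        node.kids[i] = std::make_unique<QuadNode>();
        node.kids[i]->id = firstChildId + static_cast<int>(i);
    }
}
```

Note that a node with children never draws its own geometry; only the leaves end up on screen, so the drawn set always tiles the face exactly once.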

  • Good enough: nothing happens, we have the perfect amount of detail. In the code we’ll take the “merge” branch but if we have no child nodes then nothing changes,
  • Split: Ah, this node’s not detailed enough so we can either create 4 child nodes and populate them with more detailed terrain, or if we already have them then we descend into those nodes and repeat the update process,
  • Merge: The flipside to being good enough is that we’re good enough and so if we have any child nodes then they’re superfluous and we can merge them, or in Pioneer’s case, just delete and erase them from existence.
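The three states above collapse into two branches in code, which can be sketched as follows. This is a flattened toy version under several assumptions: nodes cover squares on a flat plane rather than patches on a sphere, the split test simply compares the node’s edge length with its distance to the camera, and LodNode, Camera, kMaxDepth and CountLeaves are names invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>

struct Camera { double x, y, height; };

struct LodNode {
    double x0 = 0.0, y0 = 0.0, size = 1.0; // square region covered by this node
    int depth = 0;
    std::array<std::unique_ptr<LodNode>, 4> kids; // all four set, or all four null

    bool HasKids() const { return kids[0] != nullptr; }
};

constexpr int kMaxDepth = 4; // hypothetical depth limit

// Crude test: split while the node's edge is longer than its distance to the camera.
inline bool CanSplit(const LodNode &n, const Camera &cam) {
    if (n.depth >= kMaxDepth) return false;
    const double dx = cam.x - (n.x0 + n.size * 0.5);
    const double dy = cam.y - (n.y0 + n.size * 0.5);
    const double dist = std::sqrt(dx * dx + dy * dy + cam.height * cam.height);
    return n.size > dist;
}

inline void LODUpdate(LodNode &n, const Camera &cam) {
    if (CanSplit(n, cam)) {
        if (!n.HasKids()) { // Split: create the four children
            const double half = n.size * 0.5;
            for (std::size_t i = 0; i < 4; ++i) {
                auto kid = std::make_unique<LodNode>();
                kid->x0 = n.x0 + static_cast<double>(i % 2) * half;
                kid->y0 = n.y0 + static_cast<double>(i / 2) * half;
                kid->size = half;
                kid->depth = n.depth + 1;
                n.kids[i] = std::move(kid);
            }
        }
        for (auto &kid : n.kids) LODUpdate(*kid, cam); // descend and repeat
    } else {
        // "Good enough" and "merge" share this branch: with no kids nothing changes.
        for (auto &kid : n.kids) kid.reset();
    }
}

inline int CountLeaves(const LodNode &n) {
    if (!n.HasKids()) return 1;
    int total = 0;
    for (const auto &kid : n.kids) total += CountLeaves(*kid);
    return total;
}
```

With the camera close to one corner of a 16 unit wide root the tree refines only around that corner, and moving the camera far away collapses everything back to the single root node on the next update.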

In the current Pioneer Alpha 33 this all happens in a single separate thread with a few locks/mutexes preventing rendering from happening at the same time as updating. This creates a lot of waiting around, either for the quadtree update to finish or for rendering to complete. It was the simplest way to improve things at the time it was created though, because it meant that it didn’t stall the main thread whenever the, very expensive, terrain generation needed to be done. Doing it this way keeps everything looking like a single-threaded process and this is inherently simpler to write and understand. That means there should be fewer bugs and fewer problems.

Unfortunately Pioneer tries to do some odd things that exploit the fact it’s storing the terrain in a quadtree. This unfortunate stuff is done in the name of performance in that it tries to avoid regenerating some of the terrain data by keeping close track of who its neighbours are in the quadtree and then copying as much data as it can. All of this neighbour tracking and data copying is very expensive both in performance terms and complexity. Overall it was this that took me the longest to understand rather than the actual terrain generation or rendering parts!

This neighbourly management is one of the first things that is going to change, hopefully by the next release date.

You see the neighbouring node stuff is complex, it also imposes some form on the quadtree structure that we don’t need and that means things cannot be easily generated in little pieces or across many CPU cores. Ideally we’d like the update logic to run, decide to Split or Merge some nodes, and then ask a system to go away and make the data it needs. Those requests disappear off into the aether and a short while later new terrain node data returns ready to be put into the quadtree. Whether that’s done on a super-powerful GPU, farmed out across a distributed network, or done on a different core of the same CPU we don’t know or care. Removing the neighbour management, or at least the data copying portion, means breaking up the generation of these quadtree terrain nodes gets much simpler and easier to distribute across many CPU cores.
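The “request goes off, data comes back later” idea can be sketched with nothing more than the standard library. To be clear, this is not how Pioneer does it: std::async stands in for whatever job system ends up doing the work, and PatchRequest, PatchResult, FakeHeight and RequestPatch are hypothetical names with a placeholder height function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical request: which patch to build and at what resolution.
struct PatchRequest {
    int patchId;
    std::size_t edgeLen; // vertices along one edge
};

struct PatchResult {
    int patchId;
    std::vector<double> heights; // edgeLen * edgeLen values in [0, 1]
};

// Placeholder height function; the real one is the expensive noise evaluation.
inline double FakeHeight(std::size_t x, std::size_t y) {
    return 0.5 + 0.5 * std::sin(0.1 * static_cast<double>(x)) *
                     std::cos(0.1 * static_cast<double>(y));
}

// Needs nothing but the request itself: no neighbours, no shared quadtree state.
inline PatchResult BuildPatch(PatchRequest req) {
    assert(req.edgeLen > 0);
    PatchResult result{req.patchId, {}};
    result.heights.reserve(req.edgeLen * req.edgeLen);
    for (std::size_t y = 0; y < req.edgeLen; ++y) {
        for (std::size_t x = 0; x < req.edgeLen; ++x) {
            result.heights.push_back(FakeHeight(x, y));
        }
    }
    return result;
}

// The update logic keeps the future and collects the finished patch later on.
inline std::future<PatchResult> RequestPatch(PatchRequest req) {
    return std::async(std::launch::async, BuildPatch, req);
}
```

The update logic would hold on to the returned futures and, on a later pass, move any finished PatchResult into the quadtree. Because BuildPatch depends only on its own request, four child patches can be generated at the same time without any locking between them, which is exactly what the neighbour copying makes difficult.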

Quadtree specifics and quick/easy changes:

A quadtree can be used to store any kind of information, it doesn’t even have to be spatial or geometric like ours is in Pioneer. However that is our use case and so we store a tonne of data in each of the quadtree’s nodes. Now by a tonne of stuff I really mean a metric-fucktonne of stuff! We store in terrifying order:

  • A ref counted pointer to the context data that holds the terrain generation methods,
  • the double precision vector of the corner vertices,
  • 3 * double precision vector arrays to the vertex, normal and colour data,
  • a GLuint for a node’s vertex buffer object number,
  • 4 pointers to its child nodes (it’s a quadtree after all),
  • another pointer to this node’s parent,
  • 4 more pointers to its possible neighbours (whether or not it has any),
  • a pointer to the parent object called a geosphere, which holds the top level quadtree roots,
  • double precision rough length of an edge (from corner to corner along an edge),
  • 2 * double precision vectors for the centroid and clipCentroid of the node used for updating and rendering calculations,
  • double clipRadius,
  • depth of the node within the quadtree,
  • a mutex for locking access to the child nodes during updating or rendering,
  • double “distMult” which is calculated and used in the constructor and NEVER AGAIN.

Ok so that did get a little non-geek hostile there (it’s about to get worse!) but what it boils down to is that there’s a lot of shit in every single node. Some of it is terrifying, absolutely pant wettingly funnily bad to have there. Let’s take the worst of the offenders, and before I get any comments pointing out the others, yes I am aware of all of them:

3 * double precision vector arrays to the vertex, normal and colour data – These are bad for a few reasons but the obvious ones are that even if we need to store this data then we’re storing it in the wrong format for AT LEAST two of the pieces of data:

  • Normals – these can be efficiently packed down into some very compact formats, in reality you could get down to a single 32-bit integer with two of the coordinates packed into 16-bits each and then recalculate the 3rd component when needed, probably in the vertex or pixel shader. In this case however we can quickly halve the memory usage by simply making it a float vector instead of a double.
  • Colours – oh-my-god, you don’t EVER need double precision for colour, in this case we have 192-bits for colour… 16-bits would probably be good enough, for perfect colour we might need 24-bits but NOT 192-bits. In this instance a single 32-bit integer is our best bet for quickly encoding the colour saving us a staggering 160-bits PER VERTEX and we’ve got 8-bits to spare if we need them for something (alpha channel?) in future.
  • So why was the colour using 64-bit vector3 representation? Because the heightmap generation was sneakily storing the height in the red channel during the terrain generation so that it could use it later…
  • …this was so it could retrieve the height when calculating the per vertex colour. Why not just store the height directly in another array? This adds back another 64-bits so we’ve only saved 96-bits per vertex.
  • Of course we’re only storing all of this temporarily anyway, until we can generate the new vertex buffer for the patch (nuuurgh, I’ll rant about this later) which then turns it into floating point vector data. Since this is the case however we can in fact discard the vertex position data completely, all 192-bits of it, and then rebuild it from the height data that we’re now storing instead.
  • Total so far: Normals = 50% = 192-bits to 96-bits, Colours = 1/6th = 192-bits to 32-bits, Vertex = 1/3rd = 192-bits to 64-bits, so 576-bits to 192-bits per vertex.
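To make the arithmetic in the list above concrete, here is a small sketch of packing a colour into a single 32-bit integer alongside the before and after bit counts. The function names, the channel order (red in the lowest byte, alpha in the highest) and the rounding are assumptions for the example, not what Pioneer actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-vertex storage before: three double precision vec3s (position, normal, colour).
constexpr unsigned kBitsBefore = 3 * 3 * 64;
// After: float vec3 normal (96) + packed colour (32) + double height (64).
constexpr unsigned kBitsAfter = 3 * 32 + 32 + 64;
static_assert(kBitsBefore == 576, "576 bits per vertex before");
static_assert(kBitsAfter == 192, "192 bits per vertex after");

// Packs channels given in [0, 1] into 8 bits each; out-of-range input is clamped.
inline std::uint32_t PackColour(double r, double g, double b, double a = 1.0) {
    const auto toByte = [](double c) {
        const double clamped = std::min(1.0, std::max(0.0, c));
        return static_cast<std::uint32_t>(clamped * 255.0 + 0.5);
    };
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16) | (toByte(a) << 24);
}

// Reads one 8-bit channel back out: 0 = red, 1 = green, 2 = blue, 3 = alpha.
inline std::uint32_t UnpackChannel(std::uint32_t packed, unsigned channel) {
    assert(channel < 4u);
    return (packed >> (8u * channel)) & 0xFFu;
}
```

Going from 576 to 192 bits is a third of the original per-vertex footprint, and 8 bits per channel is what ends up on screen anyway, so nothing visible is lost by packing the colour.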

That reduction has brought my peak memory consumption from ~700MiB when sat on the surface of the New Hope start position down to ~280MiB, and as you can see from the above there’s still more to be saved from a variety of areas. That’s because those arrays in particular are allocated to contain all of the visible data for each node. The rest is a bit nasty but it’s nothing in terms of memory usage by comparison because even with a few thousand nodes in the quadtree(s) they’re only a tiny fraction of all the data stored.

Hmm, that’s gone a bit rambl-o-matic because I’m bloody tired.

Well maybe someone might find it interesting, I’ll continue with this stuff… soon! :)

Andy

Further in the series: Part 2 is up now.