Part 3: Many hands make for light work.
Posted by Game Development, GLSLPlanet, Pioneer | Posted on 19-05-2013
| Posted inWell this episode has taken longer than planned to get written, or event started for that matter. So lets not delay as this is part 3 of me attempting to explain the terrain system used in Pioneer Space Sim.
If you want to recap and point out poor grammar or spelling mistakes then go ahead and read Parts “1: Pioneer’ing Terrain” and “2: Now with… no feeling in my arms due to all the typing!”.
How things change:
Since Part 2 there have actually been some developments which make this edition even more relevant. At the end of it I mentioned that I’d be covering my work making the terrain generation multi-threaded using a job based system. Well that has now been merged into master and is available in the latest downloads. It’s got some bugs fixed since then and hopefully this will only get truer in the future ;)
Jobs, not the Steve kind:
So when I talk about “jobs” what am I on about? Well a given job is supposed to be, from the outside, a mystery…
Imagine you’re stood between two conveyor belts, and boxes are going past you on one of them. Up from you on the conveyor belt someone puts boxes onto the belt, a box comes along to you, you take the box from the belt and press a big red button on the top of it, then you put it onto the other empty conveyor belt. Each box is a job, pressing the button does “something”, and placing it onto the empty conveyor belt means that it’s been processed.
Each of those boxes was self-contained, whatever that big red button was wired upto doesn’t matter, all that matters was that as it came along to you it had the correct interface (the button) and you pressed it and then placed it onto the other belt which means that it’s done it’s work.
We can treat a lot of things this way if we’re clever, in this case I chose the terrain generation because it’s quite a nice self-contained process. Or it can be anyway. Each job is a collection of all the data required for some process to run, placed into a container – the “box” – and sent off as one big lump to be processed. With the terrain system it took a little bit more teasing apart what data and structures were required than I had expected but I managed to create a container that nicely separated it anyway.
Actually, I made 2 called “SinglePatchJob” and “QuadPatchJob”, but we’ll get to them in a moment.
Voting, for none of the above:
What exactly makes an easy, not a good, candidate for job based processing?
What you’re looking for in a job-able task is something that you do a lot, I mean an awful lot, like thousands of times but where each instance is neatly self contained.
Some data comes in, it processes that data, then some data comes out. No state, no accessing member variables or lots of other methods, and definitely no accessing other objects or instances in memory. You’re looking for something that deals exclusively with a simple lump of data and returns an equally simple lump of data. Simple in terms of containment rather than what it describes.
That’s a pretty wide net to cast, it describes almost any function for example (NB: function, not a method of a class).
Want that one:
I already knew of a number of areas in the codebase that made potentially good candidates but my increasing familiarity with the terrain system made it fairly obvious to me. There’s one major step that eats a huge amount of CPU time, it depends on a limited set of data, and when it’s complete it has created another lump of data. Of course it never used to look like that before I made this change but I could see the shape of what I needed within that complex system.
In the old system the terrain generation called the “GenerateMesh” method and then went about copying or generating lots of terrain, colours and all manor of horrors. That expensive call though was mostly to “GenerateMesh” as that’s where it did most of the work and all of the copying of data that followed was just noise in the system. With a little effort I isolated “GenerateMesh” and removed most of the neighbour management complexity.
Of course removing the neighbour management brought up some issues. For example: generating colours and normals for the edges or corners required information that was outside of that created in “GenerateMesh“. The terrain generation used to solve this by asking for the information from a neighbouring node, or generating it temporarily then discarding it. To solve this I decided the best answer was just to generate a lot of wasted data, or rather, an edge of extra data around the data that I really wanted.
This extra data probably requires a picture to explain… only I suck at those so use your imagination, it’s good practice… oh it’s good practice for _me_, well here we go then:
Oh yeah, check that out!
- Red squares: the generated data I actually need,
- grey squares: the extra data that we generate so we can calculate the terrain normals,
- yellow-ish squares: COMPLETELY WASTED, we don’t use them at all! Oooh the excess!
And this is the reason we need that extra data:
Generating a surface normal requires (currently) us to sample the heights surrounding each other point on the grid. That would break down at the edges of the data that we want to generate the normals for if we only had the central (red squares) data. However, we’ve generated the buffer data (grey squares) as well. That grey data we are only interested in using, we don’t need to generate normals for it so that fact that we can’t doesn’t matter. It being there allows us to make the important data completely self-contained to the “GenerateMesh” method and we don’t have to communicate with other nodes anymore to get the data that we need. That lets us make the code a lot simpler too.
All of this comes with some obvious costs. First of all we are now generating a lot of extra data that we only need temporarily. In fact we’re generating some of the same data that 4 other nodes will also generate for their own edges… and they’re generating some we will! Because we’re isolating all of this though we can’t share that data with anything else.
On the low resolution grid above that adds up to quite a lot of overhead: 64+(4*8)+4 = 100 heights generated instead of just the core 64 we actually need. Thankfully the higher the resolution of your grid the smaller a percentage of it is represented by that overdraw. So whilst the above low resolution grid has 1/3rd of it data wasted a higher resolution 33×33 grid has just 11% which is still more than we’d like but it’s not going to kill the performance.
Now that we’re generating this border we don’t have to look in the neighbouring nodes and all of the data can be generated from what’s passed into the “GenerateMesh” method itself which means it’s now suitable for turning into a job.
Right Job for the Tool:
Earlier I mentioned that I’d actually written 2 jobs called “SinglePatchJob” and “QuadPatchJob” but why did I write two? Because our quad tree based terrain is a bit of an oddball in that it has renderable geometry at the root node. That is to say that it has a single patch of terrain at it’s base and that then has the usual four child nodes and so on down the tree.
I’m a big fan of making only the change that needs to be made so this wasn’t something I felt like altering therefore “SinglePatchJob” is the one that’s issued for each of the 6 root patches, and “QuadPatchJob” is issued when generating any child nodes as it creates all 4 in a single task. I decided to group the child patch generation into one job that did all 4 patches rather than issue many SinglePatchJob’s for each child because it avoids having to handle accumulating them later on. It is possible that might be a better method, perhaps something to test in the future.
Each of the job types takes a few parameters in it’s construction, allocates memory for the buffers that will be hold the final data and for the temporary buffers. The job is then added to the job queue and a flag is set on the GeoPatch which requested the job to indicate that it’s waiting for data to be generated. That flag means that it shouldn’t be deleted or request the data again.
All of that is done from the main thread in the “Update” method of a GeoPatch. In fact one of the biggest changes was that updating of the terrain has moved out of it’s own (single) thread and back into the main thread via an “Update” method on the TerrainBody class.
When a job is completed it creates another data structure to return the patch data then calls a class static method on GeoSphere. That goes through all of the registered instances of GeoSphere and looks for the matching address that the patch wants to return it’s data too. When it’s found the correct instance it adds the data to a vector for later processing.
When the GeoSphere::Update() method gets called it will flush the vector containing any returned patch data and pass it onto the GeoPatch tree. Finally our GeoPatch actually receives the data it requested and can now construct it’s child nodes.
A lot of effort, how much gain?:
So if that’s all there is too it why have I gone to all this effort?
The answer is quite simple: Any of those jobs can now be processed on any available thread.
This means that on my Intel i7 for example I currently spawn 7 threads which means that I can have 7 jobs being processed simultaneously in theory. In practice it doesn’t work quite like that since I only have 4 physical CPU cores and hyperthreading makes it look like I have 8 however the performance increase is still dramatic.
In future we’re hoping to make some more improvements which will mean that even dual-core machines benefit from this, currently they only spawn a single thread for processing jobs by default.
So that’s pretty much it, not such a deep dive in because I’ll leave that for part 4.
One last thing:
I mentioned at the end of Part 2 about my side project GLSLPlanet, I’m not currently integrating it’s way of working into Pioneer due to the way that our terrain noise system works.
However the idea itself is a good one, using the GPU to generate the terrain data can be fantastically fast even on relatively low end GPU hardware. It does mean you must be careful about precision and our current noise system isn’t right for porting over to it at the moment. What it would require to work in Pioneer is almost the same as the job based approach described above. Separation of the job from the rest of the terrain code into an independent unit of work.
That means that by laying the groundwork for the job system we’ve also made the first steps required to get away from our current terrain generation system and move onto something more modern. A system that can leverage more of the computers resources whatever they happen to be.
In Part 4 I’ll go into some of the more technical and implementation specific details of the terrain job system, how it’s affected the way we create and delete GeoPatch instances, what problems it has and various other details.
Thanks for reading.