Space MMO – Pioneer-alike
Posted by Game Development, Pioneer | Posted on 29-06-2013
| Posted inSpace game MMOs… A brain dump:
I’ve been inspired to brain dump by a post on the Paragon forum.
To quote:
“Will multiplayer be instanced or MMO style?
If MMO will players be able to transverse to other servers/galaxy’s with their ships,equipment and credits?“
Now, leaving aside the question of what they might mean behind instancing, lets instead think about how a space game like Paragon, not Pioneer note, Paragon might choose to implement an MMO gameplay model.
NB: Paragon is a sort of daughter project to Pioneer. It’s forked from the Pioneer codebase and occasionally pulls in work we do. However Paragon has it’s own developers, development and game specific code, scripting, storyline and art. I’ll be discussing these things from being a Pioneer developers point of view.
Part 3: Many hands make for light work.
Posted by Game Development, GLSLPlanet, Pioneer | Posted on 19-05-2013
| Posted inWell this episode has taken longer than planned to get written, or event started for that matter. So lets not delay as this is part 3 of me attempting to explain the terrain system used in Pioneer Space Sim.
If you want to recap and point out poor grammar or spelling mistakes then go ahead and read Parts “1: Pioneer’ing Terrain” and “2: Now with… no feeling in my arms due to all the typing!”.
How things change:
Since Part 2 there have actually been some developments which make this edition even more relevant. At the end of it I mentioned that I’d be covering my work making the terrain generation multi-threaded using a job based system. Well that has now been merged into master and is available in the latest downloads. It’s got some bugs fixed since then and hopefully this will only get truer in the future ;)
Part 2: Now with… no feeling in my arms due to all the typing!
Posted by Game Development, GLSLPlanet, Pioneer | Posted on 20-04-2013
| Posted inI should have made this clear from the outset but these posts aren’t a description of the “best way” to do anything, they’re more akin to documenting how we currently ARE doing things.
There’s a great deal of information out there on how these things can work but with Pioneer you can grab the source code and actually see it in progress. There’s great value in getting hold of something, testing it, debugging it, making changes and watching it blow up in your face ;)
Previously:
In Part 1 I gave a very basic description but at no point did I attempt to explain things at the code level… and I’m not going to get too close to it during this series but I’ve got to get at least a little more familiar because some parts just don’t make any sense at first.
What does this do and why does it do it?:
One of the first things you see with the “GeoSphere” code leading up to, and including, the Alpha 33 release is that it’s entirely contained in just two files which are several thousand lines long and full of some nasty complex looking code. This is an unfortunate side effect of the way it’s evolved. At one point it might have been reasonably clean but as with all things they get added too and so the complexity grows along with the sheer amount of code in a single file. Eventually you just have to prune it back but with these situations you first have to understand what’s going on before you can do that pruning and splitting up of files. The complexity means that no-one understands the system and deciphering it becomes the large part of the burden so the situation is rarely resolved.
So lets tackle the big pieces of the complexity first.
GeoSphere:
The GeoSphere class is the main unit the rest of the code will interact with, it’s the part visible to the outside. It contains instances of the other main classes and does some general orchestration of them. It can be thought of as being split into two main parts however due to it’s use to static methods and members:
- static methods & members that affect ALL GeoSphere instances.
- non-static methods & member that are about a specific instance of GeoSphere.
The static part handles the creation and initialisation of stuff that affects all of the GeoSphere instances, the GeoPatchContext which holds information about the size and detail of the GeoPatch’s for example. It also kicks off the thread which manages updating the level-of-detail calculations I mention in Part 1.
The per-instance stuff is fairly basic:
- m_terrain pointer, which is a “Terrain” pointer that will handle all of the heavy lifting logic for generating height and colour information for the patches.
- 6 GeoPatch pointers, these are the 6 faces of the cube we will turn into a sphere, these look after themselves most of the time, we just have to call methods(/function) one them.
- Constructor, destructor, Render, GetHeight, GetColor and BuildFirstPatches methods.
GetHeight and GetColor are simply helper methods, they just pass some information into a method of m_terrain and return the data, nothing more. Likewise the constructor and destructor are just setup/cleanup so lets ignore them.
The two methods of interest here are Render and BuildFirstPatches.
The Render method is called from the main thread as all of our rendering has to happen there currently. However, Render does more than just render, yeah I know, horrible eh but it’s what we’ve got.
At the top of the method it mostly does some setup of materials, one’s subsequently used by ALL of the patches that will be drawn. Then we get to a call to BuildFirstPatches. This is checked every single time we call Render but it only does anything on the first call and it’s lazy evaluation done lazily. It could be quite convoluted to create the GeoSphere and then immediately call BuildFirstPatches so instead of resolving this complexity it just checks to see if the root node of the patches have been built every frame and if it hasn’t it calls this method. The most stars/planets/moons I’ve yet to see in a system was about ~50 so it’s not too much overhead but it’s still icky.
After this check we configure basic lighting and then we start calling the Render method on each of the root GeoPatch instances (the ones created by BuildFirstPatches). I’ll describe them more later but they essentially do the quadtree traversal finding the nodes worth rendering.
Once we’re finished there we release out materials and then do some nasty management of the multithreaded updating logic, and finally we store the current camera position that will be used in the updating and level-of-detail (LOD) calculating thread. This is actually another reason why BuildFirstPatches is done lazily within the Render call, it’s because you can’t properly update the LOD until you have a camera position, but you don’t have one until you’ve tried to render. By putting the BuildFirstPatches call in Render you almost guarantee that you will have a valid camera position by the time that the LOD thread gets around to updating your GeoSphere. Hacky but it’s worked all this time.
GeoPatch:
If GeoSphere were the potatoes then this is the meat. GeoPatchContext might take up the top 1/3rd of the file but that stuff is actually fluff, it’s bulky but it’s not doing a lot of active logic, just setting things up for GeoPatch to use later, and they’re mostly rendering related but they’re not important until we get to GeoPatch rendering.
GeoPatch is defined entirely within the CPP file for GeoSphere, I don’t mind this technique overly much because C++ can be rather limited when you’re trying to hide an implementation but this is crazy. The GeoSphere implementation is only ~500 lines of code, and it starts at line 1053! GeoPatch on the other hand takes 670 lines in the middle of the file. Ugly and confusing for people new too it. Also confusing is that the quadtree, patch data, and some threading handling information are all muddled together.
We have as the 3 parts that should be more distinct:
- kids (*4), parent and edge friends (*4) – this is the quadtree information.
- m_kidsLock pointer to a mutex – this is the threading rubbish, it’s used to stop you from updating the patch when you’re trying to render it and vice versa.
- everything else is the patch data or at least can be considered that way.
To add to this confusing class data layout is that some of it will be called and used in both the main thread and the LOD update thread with access gated by the m_kidsLock mutex. Threading is almost never simple but this lack of clarity makes it hard to understand both when and where some members will be updated.
Next we have a bunch of functions that stump most people, I’ll list them and then explain their purpose:
- GetEdgeMinusOneVerticesFlipped
- GetEdgeIdxOf
- FixEdgeNormals
- GetChildIdx
- FixEdgeFromParentInterpolated
- MakeCornerNormal (templated for extra fun)
- FixCornerNormalsByEdge
- GenerateEdgeNormalsAndColors
- OnEdgeFriendChanged (needed)
- NotifyEdgeFriendSplit (needed)
- NotifyEdgeFriendDeleted (needed)
- GetEdgeFriendForKid (needed)
All of these serve one purpose, to confuse the hell out of everyone reading the code… no ok that’s being mean, the last 4 really are useful… still mean? Yeah a bit, I’ll have to explain.
Generating terrain can be a computationally expensive process. The way we do it will be a couple of future articles but the very very short version is that we run an expensive set of mathematical noise functions many many times for every single vertex that we want to generate the position for. Doing this for a single vertex is expensive but we do it 10’s of thousands of times even for a low detail planet. Most of the code, and ALL of that complexity above, is an attempt to avoid generating as much of it by copying it from other patches that have already been generated.
The reason that the last four methods are, mostly, needed is because they’re both the way that we keep track of who our neighouring nodes are and how we kick off the data copying process.
The thing to take from this is not that these are a good idea, and I’m not going to explain how they work either, the important thing is that they’re OPTIONAL. They’re costly too, a lot of time is spent maintaining all of this and all of the data copying is pretty bad for performance for other reasons. Not least that it often just has to generate the data anyway because a neighbour can’t be found at the correct level. They also make it much harder to make everything into smaller tasks that can better use multi-core CPUs. All things considered I’d rather not have them.
What all this mean is that GeoPatch has that lot, and only another handful of interesting methods which I’m going to go into a little detail with now:
- determineIndexBuffer – I will forever regret that this doesn’t match our method naming convention, i.e: that lowercase ‘d’. Ignoring that though; this is what uses the result of the method “Init” from GeoPatchContext, most of it anyway. “Init” went along for many lines of code calculating 16 index buffers, it was a lot of code and earlier I said it didn’t do very much. Well in truth what it did was set up 16 index buffers so that here, in this method, we could do a very fast and simple bit of OR’ing and come up with an index into that list of 16 index buffers. Specifically we take our edgeFriends and we see which ones of them we actually have, and we find the perfect index buffer that uses low-resolution edges when we don’t have a neighbour on that edge, and hi-resolution edges when we do. There are 16 possible combinations, although in truth we only ever have a subset of those used on our terrain. It’s quick and easy to calculate them though, and this is fast way to index those.
- UpdateVBOs & _UpdateVBOs – even I’m not entirely sure where these two are called from, which thread and under what circumstances. This is because of those many edge copying methods, I suspect that UpdateVBOs can be called from either depending on circumstances, but it’s usually called from the LOD thread. _UpdateVBOs on the other hand should only EVER be called from the MAIN thread as it interacts with OpenGL by updating or creating the vertex buffer object (VBO) that is used to render the patch on screen so if it’s ever called from another thread it will almost certainly crash, or worse it’ll keep running and fail silently :(
- GetSpherePoint – gets a point on a sphere, specifically it performs a bilinear interpolation between the four corner points of a patch. At the lowest detail level then, that means the corners of a cubes face, i.e: this turns our cube into a sphere – or one faces of it into a curved patch on the surface of the sphere – part of the magic happens here. Once it’s done the bilinear bit it “normalises” it, turns it into a vector of unit length (Off topic; I hate wikipedias vector descriptions, they penalise those striving to learn some maths and strokes the egos of those who already do) so that when you sum the parts of the vector they equal 1. GenerateMesh makes heavy use of GetSpherePoint because it creates the patch vertices.
- GenerateMesh – ignore the centroid for now, focus on the first couple of nested for loops because they generated ALL of the heights for every vertex of a patch. This *IS* the terrain creation right here in two for loops.
- The call to GetHeight is the expensive call to the generation process itself but it’s interchangable with any other. We have many versions of this call they all do things using different maths and could even use other ways of generating terrain so we’ll deal with them another day. No the important bit is that if you strip all of GeoPatch down to it’s absolute bare minimum then this function and GetSpherePoint is almost all you’d have left with just these two for loops.
- We get the height using a double precision floating point variable because it will be in the range 0.0 to 1.0 yet it has to be very very precise and if we used a single precision (32-bit) float then we’d lose detail and get visible banding on the terrain. There are ways and means of avoiding this but doubles are simple and they’re fast enough most of the time.
- We take that height and add 1.0 to it, then we take this new value and use it to scale the vertex position from GetSpherePoint and that’s it, we have a terrain height that will be of at least unit length (1.0) up to a maximum of length 2.0 which would make for some very high mountains.
- The second pass is now where the trouble starts but can be summed up with the word: “normals” – the calculation of which uses 4 vertex positions, at least two of those will be within this patches data, but the other two will be in our neighbouring patches. That’s why these for loops don’t cover the entire patches data. Instead they cover the central part, the rest of the data, the edges, will be calculated by the mess of edge copying methods I skipped describing above.
- The colour is also calculated in this 2nd pass, it requires the height and the normals so it’s also done in the edge copying. You might notice if you’re reading through the code that the height is stored in the colour in pass 1, then used in pass 2. I covered why this was and why it was bad in Part 1 – as I say, this is a description of how it is, not how it should be!
- Render – an actual blend of quadtree and patch all in one here. This is depth first traversal & rendering of a quadtree in action! Render MUST be called in the MAIN thread only because it’s dealing with OpenGL. First we see if we have any child nodes, if we do we just forward on the values that we passed into the function. That’s the depth first traversal part, because in a tree each node will do the same thing, over and over until it reaches a leaf node, the one without children. When we do the rendering is straight forward:
- Lock the node – so it can’t be updated in the LOD thread,
- optionally update the vertex buffer object (VBO),
- test to see if we’re visible and return without doing anything if we’re not,
- push the current matrix and then translate the view according the difference between the camera position and the clipCentroid (which is the centre of the four corner vertices),
- this helps us deal with a problem called “jitter” (/”jittering”) caused by the GPU only using 32-bit floats which aren’t precise enough to represent the terrain. What we do is move the rendering of the patch in such a way that 32-bit’s are precise enough for us by offsetting it from the camera position. We’re effectively moving it closer to the camera before applying the position to avoid the jittering! Patches further away still jitter like crazy, but it doesn’t matter because they’re small and far away!
- update some information we store about how many triangle we draw,
- setup our buffers, there’s determineIndexBuffer from earlier,
- do the actual drawing,
- now release our buffers and pop the matrix so that this patches matrix offset doesn’t affect the next patches to be rendered.
- and we’re done with Render :)
- LODUpdate – This is called only from the LOD thread, the MAIN thread never goes near it, I’ll break it down just like Render above but this is where we decide to increase or decrease the detail of a GeoPatch by splitting or merging them:
- Slightly different to the above we lock the “m_abortLock” to see if the thread is trying to quit – it helps us exit faster is the idea but it’s implementation specific, ignore it.
- canSplit – leading question eh? We iterate through our neighbours and perform some tests like: Do we have them all? Are they less detailed than us or higher in the quadtree than us? If we pass that then we check that we’re not already too deep to split and finally we apply some maths to decide if our edges length is more than the distance from the camera to the centroid we calculated in GenerateMesh. This is a crude approximation of determining how big the patch will be on the players screen. You can find much better ones if you Google for geomorphing. There’s an additional check to see if we have a parent because someone decided we should always split in that instance… not sure why.
- canMerge – apparently yes… I don’t know why this is here, we could avoid an if test later on, I hope the optimiser gets rid of it!
- canSplit branch – if we really really canSplit then, in summary: we first see if we have children, if we then we just pass on the call to LODUpdate otherwise; we create 4 child nodes, we set ourselves as their parent, we setup the relationship between each of them and the neighbours that we know about (if any) then we call GenerateMesh on each of them, we GenerateEdgeNormalsAndColors using all that complicated shit above, we UpdateVBOs which sets the flag that says that we really need to call _UpdateVBOs back in the MAIN thread and finally we pass on the call to LODUpdate to our newly created child nodes.
- canMerge branch – if we cannot split then we’ll go this way, always (stupid code). That doesn’t mean we can actually merge though, not if have no child nodes. If we do then we’ll happily destroy them, and in their destructors they’ll destroy their kids and so on and so forth. When I first starting reading this code it took me a while to realise why the tree isn’t unbalanced and constantly destroying and creating node. It’s because we split aggressively. We only take the merge branch when there’s no way of splitting.
- A word on locking: in both the canSplit and canMerge branches we Lock but at different times. In canMerge land there’s not much to do, just destroy stuff so we lock straight away. In canSplit land though we can delay the mutex locking until later, we allocate the new child nodes to temporaries, then call the expensive GenerateMesh method, only once we’ve done that do we lock the mutex and copy the temporaries into the correct child node pointers. They have to be there for to correctly notify their new neighbours that they exist and to generate the edge normals and colours. At least this way the mutex isn’t locked for as long, because that would delay the rendering call back in the MAIN thread if it was.
- GeoPatch constructor/destructor – a quick note about these two, the generate some data used throughout the rest so are worth reading and keeping in mind. The destructor also does some of the edge friend management by letting it’s friends know when it’s destroyed but otherwise they’re pretty simple.
Wow, 3202 words in so far everyone… everyone? Anyone? Ahem, anyway.
The end?:
That’s it for the whistle-stop tour for now. I know I know, there’s still slightly more than a third of the CPP file that I haven’t even started to cover but that’s fine because I’m not going too. At all.
It’s mostly too specific to some of Pioneers quirks, some of it is just structural so it’s run once to set things up and then never used again but mostly I’m ignoring it because it doesn’t deal with what you need to understand.
As I’ve described above the existing terrain generation is in two parts; one in the main thread with some setup and then the rendering, the second is in it’s own thread. The two cross over at places and prevent crashes and other problems by locking mutexes. This works but it causes performance problems of it’s own. There’s a great deal of data copying stuff that can be avoided by simply generating more than you need, that means you don’t need to track relationships so much and that simplifies the code massively. That might have performance issues of it’s own of course since you are doing extra work but it simplifies the code so very much and I’ve already covered the reasons for doing that in Part 1.
Next in the series I’ll probably cover some of the ways we generate the heights for the terrain… no, no actually I’m not doing that at all. Yikes that’s terrifying stuff. No instead I’ll discuss what I’ve been working on to make it take advantage of multithreading and many cores. In that article I’ll also cover a side project called GLSLPlanet which moves some of the work onto the GPU, and why I’m not doing that in Pioneer just yet even though it’s based on the same code I’ve just described.
See you next time,
Andy
Part 1: Pioneer’ing Terrain
Posted by Game Development, GLSLPlanet, Pioneer | Posted on 17-04-2013
| Posted inYes yes I went for the pun-tastic title :)
This is going to be a bit more technical than usual, not massively, don’t stop reading for fear of equations or pages of code. No this is just going to be a technical description of some parts of Pioneers terrain rendering system because it’s quite odd, not great, but it’s effective enough. I’m basically going to brain dump a bit so I might come back and edit this in the future to clean it up, it will be something of a living document.
In the future I’ll refer back to this post to describe the ways that the terrain rendering and generation will be changing.
I should also point out that I’m not the original author of the terrain rendering or generation used in Pioneer. It’s just an area that I’ve been poking around in a lot lately and that I have a lot of changes that I want to make to that area. These changes should make generation of the terrain faster and more flexible, add control that we currently lack. They should also accelerate rendering and use a lot less memory as well as opening up new visual and game effects in the distant future.
The basic idea:
The terrain itself is a form of Quadrilateralised Spherical Cube which is a fancy way of saying that you’ve got 6 flat planes oriented to face in the six outward directions of a cube, then you deform them to be a sphere. This causes some distortions but it also comes with a lot of rather useful benefits.
For a start we can use a quadtree to define the surface of each (originally) square face of the (former) cube. That’s handy because there’s lot of simple terrain rendering algorithms that can use such an arrangement. In this case we’re using something similar to “Chunked LOD“. I say “similar” because the details differ, but you can read a better explanation on the “making worlds of sphere and cubes” blog post on Acko since it’s similar in concept.
So you know that:
- we’re using 6 quadtrees, 1 per cube face.
- each quadtree is deformed to fit onto a sphere.
- that magically this is helpful…
Well when we want to render the terrain we start at the top of each of the quadtrees, i.e; the lowest detail level, and simply ask it if it has any children. Because it’s a quadtree it will have either 4 or none, hence the name “quad” in case you ignored that wikipedia like. When we reach one of these quadtree nodes that has no children we render whatever geometry it has because that must by definition be the highest level-of-detail available. This is a very simple system, you repeat this for each of the faces of the cube/sphere and you’ve very quickly rendered a convincing sphere onscreen.
Implicit in the above statement however is the idea that some of the quadtrees “nodes” won’t have children whereas others will. To do this we pass through the players camera position to an update method that goes through the quadtree, much like the rendering does, and at each node it does some maths to work out if a node is in 1 of 3 states: good enough, whether it should “split” and create some child nodes, or if it has child nodes and doesn’t need them anymore so can “merge” them.
- Good enough: nothing happens, we have the perfect amount of detail. In the code we’ll take the “merge” branch but if we have no child node then nothing changes,
- Split: Ah, this nodes not detailed enough so we can either create 4 child nodes and populate them with more detailed terrain, or if we already have them then we descend into those node and repeat the update process,
- Merge: The flipside to being good enough is that we’re good enough and so if we have any child nodes then they’re superfluous and we can merge them, or in Pioneers case, just delete and erase them from existence.
In the current Pioneer Alpha 33 this all happens in a single separate thread with a few locks/mutexes preventing rendering from happening at the same time as updating. This creates a lot of waiting around for either updating the quadtree or waiting for rendering to complete. It was the simplest way to improve things at the time it was created though because it meant that it didn’t stall the main thread whenever the, very expensive, terrain generation needed to be done. Doing it this way keeps everything looking like a single-threaded process and this is inherently simpler to write and understand. That means there should be less bugs and less problems.
Unfortunately Pioneer tries to do some odd things that exploit the fact it’s storing the terrain in a quadtree. This unfortunate stuff is done in the name of performance in that it tries to avoid regenerating some of the terrain data by keeping close track of who it’s neighbours are in the quadtree and then copying as much data as it can. All of this neighbour tracking and data copying is very expensive both in performance terms and complexity. Overall it was this that took me the longest to understand rather than the actual terrain generation or rendering parts!
This neighbourly management is one of the first things that is going to change, hopefully by the next release date.
You see the neighbouring node stuff is complex, it also imposes some form on the quadtree structure that we don’t need and that means things cannot be easily generated in little pieces or across many CPU cores. Ideally we’d like the update logic to happen, it decides to Split or Merge some nodes. Then it asks a system to go away and make it the data it needs. Those requests disappear off into the aether and a short while later new terrain node data returns ready to be put into the quadtree. Whether that’s done on a super-powerful GPU, farmed out across a distributed network, or done on a differet core of the same CPU we don’t know or care. Removing the neighbour management, or at least the data copying portion, means breaking up the generation of these quadtree terrain nodes gets much simpler and easier to distribute across many CPU cores.
Quadtree specifics and quick/easy changes:
A quadtree can be used to store any kind of information, it doesn’t even have to be spatial or geometric like we are in Pioneer. However that is our use case and so we store a tonne of data in each of the quadtrees nodes. Now by a tonne of stuff I really mean a metric-fucktonne of stuff! We store in terrifying order:
- A ref counted pointer to the context data that holds the terrain generation methods,
- the double precision vector of the corner vertices,
- 3 * double precision vector arrays to the vertex, normal and colour data,
- a GLuint for a nodes vertex buffer object number,
- 4 pointers to its child nodes (it’s a quadtree afterall),
- another pointer too this nodes parent,
- 4 more pointers to it’s possible neighbours (whether or not it has any),
- a pointer to the parent object called a geosphere, which holds the top level quadtree roots,
- double precision rough length of an edge (from corner to corner along an edge),
- 2 * double precision vectors for the centroid and clipCentroid of the node used for updating and rendering calculations,
- double clipRadius,
- depth of the node within the quadtree,
- a mutex for locking access to the child nodes during updating or rendering,
- double “distMult” which is calculated and used in the constructor and NEVER AGAIN.
Ok so that did get a little non-geek hostile there (it’s about to get worse!) but what it boils down too is that there’s a lot of shit in every single node. Some of it is terrifying, absolutely pant wettingly funnily bad to have there. Lets take the worst of the offenders, and before I get any comments pointing out the others, yes I am aware of all of them:
3 * double precision vector arrays to the vertex, normal and colour data – These are bad for a few reasons but the obvious ones are that even if we need to store this data then we’re storing it in the wrong format for AT LEAST two of the pieces of data:
- Normals – these can be efficiently packed down into some very compact formats, in reality you could get down to a single 32-bit integer with two of the coordinates packed into 16-bits each and then recalculate the 3rd component when need, probably in the vertex or pixel shader. In this case we can quickly halve the memory usage however by simple making it a float vector instead of a double.
- Colours – oh-my-god, you don’t EVER need double precision for colour, in this case we have 192-bits for colour… 16-bits would probably be good enough, for perfect colour we might need 24-bits but NOT 192-bits. In this instance a single 32-bit integer is our best bet for quickly encoding the colour saving us a staggering 160-bits PER VERTEX and we’ve got 8-bits to spare if we need them for something (alpha channel?) in future.
- So why was the colour using 64-bit vector3 representation? Because the heightmap generation was sneakily storing the height in the red channel during the terrain generation so that it could use it later…
- …this was so it could retrieve the height when calculating the per vertex colour. Why not just store the height directly in another array? This adds back another 64-bits so we’ve only saved 92-bits per vertex.
- Of course we’re only storing all of this temporarily anyway, until we can generate the new vertex buffer for the patch (nuuurgh, I’ll rant about this later) which then turns it into floating point vector data. Since this is the case however we can in fact discard the vertex position data completely, all 192-bits of it, and then rebuild it from the height data that we’re now storing instead.
- Total so far: Normals = 50% = 192-bits to 96-bits, Colours = 1/6th = 192-bits to 32-bits, Vertex = 1/3rd = 192-bits to 64-bits, so 576-bits to 192-bits per vertex.
That reduction has brought my peak memory consumption from ~700MiB when sat on the surface of the New Hope start position down to ~280MiB, and as you can see from the above there’s still more to be saved from a variety of areas. That’s because those arrays in particular are allocated to contain all of the visible data for each node. The rest is a bit nasty but it’s nothing in terms of memory usage by comparison because even with a few thousand nodes in the quadtree(s) they’re only a tiny fraction of all the data stored.
Hmm, that’s gone bit rambl-o-matic because I’m bloody tired.
Well maybe someone might find it interesting, I’ll continue with this stuff… soon! :)
Andy
Further in the series: Part 2 is up now.
Why no desktop 16-core CPU?
Posted by Game Development, Pioneer | Posted on 06-03-2013
| Posted inIt’s a question I keep coming up against as I do multi-threading work but where are all of the desktop 16-core CPUs?
You can get what AMD call a 16-core CPU in the Opteron 6200 but it’s built on Bulldozer or Piledriver which are more like 1.5 cores per “dual core” thanks to sharing their FPU capabilities. Or to put it another way, 16-core INTEGER and 8-core FLOATING POINT.
Not long ago we went from single-core to dual-core, to true dual-core (both cores on the same die), to quad-core… and then we stopped.
I guess the argument could be that it’s not worth it for most people? Or that hyperthreading gives you 8 hardware threads with upto 30% performance boost if you can use them well.
This all misses the point though and I’d have been quite happy if AMD had continued adding cores to it’s K10 architecture, keeping up with the process node advances (die shrinking), updating, optimising and just piling on the cores. They have done that to some degree because K10h, as used in the Phenom 2, did make it into the early APU’s in a low power die-shrunk version. It lacked any kind of L3 cache though it did have some extensions, updates and improvements so that it just about holds it’s own against a similar number of core Phenom 2.
Those chips were APU’s though, with an on chip GPU for mobile use. So they clocked slower and fully 50% of the die was spent on the GPU. You could get versions without the GPU called Athlon II but they still lacked the L3 cache and the GPU was there, just disabled and powered off. There was no 8-core Athlon II with an L3, even a small one even though it was probably possible.
We’re down at 22nm + 3D transitors with Intel CPUs whilst AMD are still rolling out on 32nm but are we really stuck at 4-cores?
No, there are higher core counts as you can see but they’re for servers rather than our mere mortal desktop machines. So the work isn’t going into getting more performance out of heavily threading things, instead it’s in GPGPU languages like CUDA, DirectCompute and OpenCL. Utilising the GPU to do work you’d normally have just hammered out on the CPU. There’s real benefit to doing things on your GPU, and eventually I think AMDs “APU” strategy might pay off if they can reduce the latency between the CPU<->GPU for compute languages for example but traditional multi-threading seems to have been ignored. It’s not even an option to get more cores on a desktop CPU and I think that’s a shame as there’s a lot of workloads that will happily scale to 16 or 32 threads without the need to move them to GPU.
It would have been interesting if AMD had chosen to do it, to keep scaling the cores at least as an option but then I think there’s a metric fucktonne of things that AMD could have done to stay relevant which they’ve manifestly failed to do so here’s a short list:
- big.LITTLE – as in the ARM design where you have a group (4 typically) of large, fast, powerful CPUs and then you have one tiny little low power CPU that is used when the workload gets light to save power.
- Unlocked multipliers and clocks on CPUs sold with a very limited warranty at the same price as the regular locked one – the Black Edition chips are good but not enough.
- Change the chip packaging format as Intel and IBM have both separately proposed for better thermal, power and mounting design – if you’re in 2nd place you innovate to survive.
- Speaking of IBM – form an alliance, use their resources and manufacturing to get access to better process node shrinks.
- …and of Process Node shrinks – you can’t compete with Intel on them but as soon as AMD were free of Global Foundaries why didn’t they run off to TSMC (or IBM) and aggressively chase what there was available?
There must be a tonne more as well besides, but in short I miss the AMD of old. We were speaking about it at work the other day and it always produces the same head-shaking response where you just can’t believe that todays AMD is the same one that gave us the K6 and k8 architectures. The AMD that bludgeoned Intels Pentium 4 misstep into a cocked hat and happily overclocked it pants off.
Todays AMD seems to be one that releases products based on “vision” rather than”getting-the-fucking-job-done-well“. I’d even settle for: “getting the job done well enough” and do it with more cores but instead if you buy AMD now you’re probably getting a Piledriver chip at the high-end which is finally a little bit faster in some situations than the k10h architecture… they could have done an 8-core k10h in approximately the same die area but minus the GPU and I’d rather have had that because all my GPU needs are serviced by a separate card in a PCI-e slot.
I’d buy that chip, fuck I’d buy a 16-core version without hesitation. A die shrunk 8/16-core Phenom, L3 cache intact, the improvements they rolled into the Stars (k10h successor) architecture, no GPU, on a quad channel memory bus and clocked at about 3Ghz. That could pummel my Core i7 into the floor with the loads I have in mind.
It’s not to be though, that AMD is dead and for some reason so are desktop 8/16-core CPUs it seems.
What I do when I’m programming.
Posted by Game Development, Life, Lua, Pioneer | Posted on 02-02-2013
| Posted inToday, as with the last 6 weekends in a fucking row I am ill, whoop, and being the weekend the doctors are off. It’s not an “emergency” so I can’t get seen until Monday, by which time I will be ok again and they won’t know what to do… fucking fabulous.
Instead of posting about that, or how you can’t buy a laptop with a decent resolution screen these days unless it’s made by Apple, I’ve decided to describe how I approach programming a feature in Pioneer. I happen to be working on a relatively bite size one at the moment so it’s fair game.
Ah Lua, how do I loathe thee… I mean Love, yeah Love…
Posted by Game Development, Lua, Pioneer | Posted on 25-11-2012
| Posted inI’ve never understood the love given to scripting languages embedded in a game engine.
I’m going to take Lua in Pioneer, or in anything else for that matter but it’s Pioneers that sparked this off. You have a system written in C++, you expose it to Lua with C++ side functions that get presented to Lua scripts, you then program in Lua.
You are still programming, it’s just another programming language. Lua is not King, neither is C++, they’re both just programming languages.
Now, inevitably, the next step occurs: Everything has to be done in Lua.
What was a convenience, or a way of rapid prototyping, or a way of scripting light data handling routines, or for displaying data in a GUI is now doing heavy lifting in the engine at about 1/35th to 1/50th the speed it was being done in the traditionally compiled code.
Of course by this time only experienced programmers can actually write or modify the scripts because to make Lua useful you’ve extended it with home grown libraries & since the purpose of Lua is usually to make designers and non-coders lives easier it has fundamentally failed in this regard by this stage.
Whole systems are exposed from C++ meaning that you’re maintaining code twice except that you’ve exposed the worst bits of C++ via the wooley type unsafe Lua where the most advanced editor has all the sophistication of “Notepad.exe”.
Lua is not king, Lua quickly becomes a ball ache most of the time because it grows out of it’s usefulness, rapidly doubles the amount of work required to maintain engines, and slaughters anywhere it’s used in a performance critical subsystem.
I say this as someone who has programmed using it at several companies and Loves Lua for scripting. I just don’t think it’s anything other than a helper and best if it’s regularly pruned to reduce what it’s used for.
Some things should be moved out of Lua in Pioneer entirely and into some form of structured data generated by a tool. All the LMR stuff is obvious, ship definitions, spacestation configuration info, and ANYTHING to do with vectors/matrices/quaternions.
Other stuff is perfect Lua fodder: missions, trade pricing, defining factions, the GUI and probably a few others.
It’s just so annoying writing something in one language, then everyone wanting it in Lua too. Fuck off. It’s written already. Why have it in yet another language? It’ll be doing the same thing! Only then it’ll be in a language that I can muddle by in compared to C/C++ which I’ve been doing for 18 years (33 now, 15 when I started). What bloody good will that do? Will it mean more people can use it? No. There’s already a load of people who can write in C++ on the project who don’t know/use Lua. If anything it will reduce the number of people who can use it to only those who know/use Lua!
Does anyone really think that something has been done in Lua that couldn’t have been done in the C++ side? No. It does mean however that there’s a shitlod of C++ code, then a shitload of C++ interface code, and then a shitload of Lua code to make the C++ do what would have taken at least one shitload less of interface code to just do directly in C++.
You know what? If you find yourself embedding Lua to make your life easier and to get away from C++ then Lua isn’t the answer.
C# is.
This whole “working for a living”
Posted by Game Development, GLSLPlanet, Life, Pioneer | Posted on 28-10-2012
| Posted inWe as a species have fucked up, I can prove it too, just listen… or read; We have to work to live, this is clearly a cock up since as anyone can tell you our working lives are a clusterfuck. No-one really manages anything, your boss gets paid more than you for seemingly doing a lot less which works right from the bottom up meaning that the person at the top is essentially paid to play golf… badly and what you’re actually doing is, for most of us, a worthless crock of pointless shit that if anything actually detracts from the sum total worth of our species.
On an unrelated note I’ve just finished my 2nd week at my new job. Too pessimistic?
Try again: It’s not actually too bad, the people are nice… hmm, no that’s a bad start, I mean the people are always nice. Andys top tip is; if you ever start a job where the people aren’t nice, explain to the not nice persons boss that they’re reason you’re not taking the job and then walk out, possibly whilst also flipping the not-nice-person the bird. Starving to death on the streets is ok kids…
Ok take 3; The new job is ok, it has exactly the problems I expected, MFC, no real direction at the moment, MFC, the settling in, MFC, being told that “It’s not like those little projects you’ve worked on” … like 2 MMOs, numerous console titles etc. Piddly little things. Had to smile nicely, nod to that and just… ignore it, MFC, and of course I’ve already been asked to things that would be a piece of cake… if we weren’t using MFC. I think the worst things are that it looks like I’m going to be focused on one specific area for quite a while and that the office politics permeates everything. There’s a certain amateurish approach that makes things harder, ironically some individuals think it’s actually very professional, that leaves gaps in the development process wide enough to drive a lorry through. No-one seems in charge of the tech’ so there’s no direction that it’s heading in. I can’t see it as a long term career place at the moment but we’ll see how I settle down in the coming weeks. There’s a trip to Frankfurt to meet the team over there coming up too so I’m hoping that something about all this reaches out and grabs my attention. At the moment and mere 2 weeks in, nothing has really done that.
In more awesome news though, which isn’t job related, someone else has started hacking away at the Faction stuff I added to Pioneer in Alpha 27 :)
I haven’t had time in the last month to do much so seeing someone else come in and extend it is very good news! I think Pioneer is where I’m going to continue to find the reward side of programming and work is just going to be 9-to-5 from now on. Although I have done some more on the slightly separate GLSLPlanet code that I extracted and am working on – it now renders a limited set of terrain tiles at once, I need to do some work on that really and I think I need to prevent the root node from having any tiles of it’s own which would make the sub-tile grouping for processing a lot cleaner.
One thing that I am happy about with the new job is that I’m allowed to work on all of this and possibly on my own company, although there are several clauses covering IP and any title I’d release of my own. I have assured them that I won’t be releasing any multi-million pound First Person Shooters whilst I work for them…
Not much more to tell, so far have avoided doing anything for Halloween. I’ve tried to join in before but I just don’t feel anything for it, the dressing up has never amused me that much and I just generally don’t get it so, meh.
Adios!