Porting: SIMD

The work on making the code portable continues.

Way back when I first got an i7 CPU, I wrote a global illumination ray tracer that made really nice pictures while pushing the CPU to its limits. The math code for the ray tracer used the CPU’s SIMD instructions to perform the math operations more quickly, and I got very used to coding with them. The performance gain using them was fairly large.

For Banished I wanted to get started writing code and the game quickly, so I tended to use things I knew very well: the compiler, modeling tools, algorithms, and APIs. For math code (which is used by pretty much anything in the game that moves, animates, or is displayed in 3D) I chose to use the SIMD intrinsics.

SIMD stands for Single Instruction, Multiple Data, meaning each instruction operates on more than one value at a time.

For example the single instruction:

c = _mm_add_ps(a, b);

would be the same as doing 4 additions:

c.x = a.x + b.x;
c.y = a.y + b.y;
c.z = a.z + b.z;
c.w = a.w + b.w;

If all goes well, I should be able to compile on OSX and Linux using the same SIMD instructions, as I’ll still be targeting Intel/AMD CPUs. But just in case it doesn’t work or compile, or more importantly, if I port to a platform without those intrinsics, I’d like the code to be ready.

In writing the SIMD code, I didn’t keep any sort of portable C++ version around, which is not so good. Having the plain C++ code as reference material is nice, especially for something that’s been optimized to be fast and is no longer easily readable, or for some equation that was hard to derive when the scrap of paper got thrown away…

The porting process is pretty easy. Most SIMD code just performs 4 operations at once, so in plain C++ code, all the operations are generally written out explicitly.

The C++ code I wrote is now much easier to look at. I’ve also left the code open enough that if another platform has its own set of SIMD instructions, I can write code using them on that platform, or I can just use the C++ code.
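As a rough sketch of what that split can look like (this is not the actual Banished math library), the block below picks an SSE path or a plain C++ path at compile time. The names Vec4, Add, AddPortable, CheckAdd, and MATH_USE_SSE are made up for illustration, and detecting SSE through predefined compiler macros is an assumption about the build setup.

```cpp
#include <cassert>

// Assumption: SSE availability is detected from predefined compiler macros.
// __SSE__ is set by GCC/Clang; _M_X64 and _M_IX86_FP are set by MSVC.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MATH_USE_SSE 1
#else
#define MATH_USE_SSE 0
#endif

// Hypothetical 4-component vector; the real library's types are not shown here.
struct Vec4 { float v[4]; };

// Plain C++ version: the four additions written out explicitly.
inline Vec4 AddPortable(const Vec4& a, const Vec4& b)
{
    Vec4 c;
    c.v[0] = a.v[0] + b.v[0];
    c.v[1] = a.v[1] + b.v[1];
    c.v[2] = a.v[2] + b.v[2];
    c.v[3] = a.v[3] + b.v[3];
    return c;
}

// Uses the SIMD path when it is available at compile time,
// otherwise falls back to the plain C++ version.
inline Vec4 Add(const Vec4& a, const Vec4& b)
{
#if MATH_USE_SSE
    Vec4 c;
    // Unaligned loads/stores, since Vec4 is not forced to 16-byte alignment.
    _mm_storeu_ps(c.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
    return c;
#else
    return AddPortable(a, b);
#endif
}

// Small self-check: both paths should agree component by component.
inline void CheckAdd()
{
    const Vec4 a = { { 1.0f, 2.0f, 3.0f, 4.0f } };
    const Vec4 b = { { 10.0f, 20.0f, 30.0f, 40.0f } };
    const Vec4 fast = Add(a, b);
    const Vec4 ref = AddPortable(a, b);
    for (int i = 0; i < 4; ++i)
        assert(fast.v[i] == ref.v[i]);
}
```

Because the choice is made by the preprocessor, callers just use Add() and there is no runtime branch. Keeping AddPortable around on every platform also gives the SIMD path something readable to be checked against.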

The worst part of porting the math code like this is typos. After all, there aren’t too many different ways to write a matrix multiply, matrix inverse, or quaternion interpolate. So any errors tend to be errors in typing. You write .x, .y, .z, or .w (or similar) so many times that they tend to just blur together. You look at the code so many times knowing what it should be, that you don’t see the glaring error.

So after writing the C++ code, everything appeared to work correctly, except I noticed some deer would twitch during their animations.

It took me more time debugging to find the two errors that caused this than the time it took to port the entirety of the math library to generic C++.

In one place I had mistakenly written

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (b._v._3 * b._v._3);

instead of

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (a._v._3 * b._v._3);

And in the other location

Vector3((_r._0 * a._r._0) + (_r._1 * a._r._0) + (_r._2 * a._u._0),
	(_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1),
	(_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

instead of

Vector3((_r._0 * a._r._0) + (_r._1 * a._f._0) + (_r._2 * a._u._0),
	(_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1),
	(_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

Hopefully you can spot the errors faster than I did – it’s easier without additional math code above and below. I guess my brain saw what I expected to see, rather than what it really saw.

I’m actually fairly amazed the game displayed mostly correctly with these errors. The first error was part of Quaternion::Interpolate, which is used on every object displayed in game every frame, as well as during animation. The second error was in the code to multiply one orthonormal basis by another, which is also used very heavily.

And for those of you thinking I should have unit tests for this kind of thing: yeah, math is definitely a system that can be unit tested. But eh, unit tests, that’s a discussion for another day.
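For what it’s worth, here is a minimal sketch of the kind of test that would have caught the first typo. The Vec4 struct and the Dot, DotWithTypo, and TestDot names are hypothetical stand-ins, not the real library code. The main point is that the test inputs need to differ from each other, because with a equal to b the typo produces exactly the right answer.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's 4-component type.
struct Vec4 { float _0, _1, _2, _3; };

// Correct 4-component dot product.
inline float Dot(const Vec4& a, const Vec4& b)
{
    return (a._0 * b._0) + (a._1 * b._1) + (a._2 * b._2) + (a._3 * b._3);
}

// Same function with the typo from the post: b._3 * b._3 in the last term.
inline float DotWithTypo(const Vec4& a, const Vec4& b)
{
    return (a._0 * b._0) + (a._1 * b._1) + (a._2 * b._2) + (b._3 * b._3);
}

inline void TestDot()
{
    // Small whole numbers keep the float arithmetic exact.
    const Vec4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
    const Vec4 b = { 5.0f, 6.0f, 7.0f, 8.0f };

    // 1*5 + 2*6 + 3*7 + 4*8 = 70
    assert(Dot(a, b) == 70.0f);

    // The typo version computes 5 + 12 + 21 + 64 = 102, so these inputs expose it.
    assert(DotWithTypo(a, b) != Dot(a, b));

    // With identical inputs the typo is invisible, which is why
    // a test needs distinct values in every component.
    assert(DotWithTypo(a, a) == Dot(a, a));
}
```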

Anyway, lots of debugging later, the math code is now portable and, as far as I can tell, the game works properly with identical output.

Next up is an OpenGL version of the renderer, which I’ve got half done… more to come later.


    January 16, 2015 5:37 pm

    Ooh men, took me several minutes to find the difference.

    First: last bracket, b instead of a
    Second: first line, middle brackets, r instead of f

    January 16, 2015 5:40 pm

    Very interesting to read Luke.
    Unit testing… Very helpful except if you make the same typo in the unit test code… lol.

    It’s really great to read what you are doing code wise (and I understood everything of it, yay).
    Good luck with porting. I hope you’ll have fewer typos this time; even two is very annoying.

    January 16, 2015 6:34 pm

    As a software engineer, I’d love to read a postmortem once you’re done porting. Things that you would have done differently from the start knowing what you know now and how you would go about starting your next project.

    I’m very impressed with what you’ve been able to accomplish as a solo developer. It’s very inspiring.

    Andrew Haining
    January 16, 2015 6:47 pm

    Another point worth noting is that automatic vectorisation and a lot of other modern compiler and CPU optimisations make explicit intrinsics less performant than they once were.

    I’m weirdly going through a very similar process with my maths library, which is the same age, right now, and I’ve had similar typo bugs with my code. Something that helped me was to introduce the new version alongside the old version, assert equivalence, and then replace one function at a time. It’s probably not much use to you now though :S

    January 16, 2015 7:21 pm

    Thanks for the reading. Hope you keep writing your exp.

    January 16, 2015 7:36 pm

    If we purchased the windows version on Steam will we be able to download the OSX version once it goes live?

    January 16, 2015 7:42 pm

    Thank you so much for writing these. As a software engineering student I love reading these posts. Really insightful.

    January 17, 2015 5:06 am

    While reading this, I was wondering: how do you manage to switch between SIMD/non-SIMD code depending on the system it’s running on? I mean, if you have to run a math function several thousand times in a loop, using conditional branches/virtual tables/any other runtime mechanism will hit performance a lot, right? So how do you “access” your SIMD/non-SIMD functions?

    January 17, 2015 5:24 am

    Use Ctrl+F to find the difference, just sayin :)

    Michael Loftis
    January 17, 2015 1:14 pm

    @Nathan: you typically make the decision on what to use a bit earlier than deep inside a tight loop. But even still, modern processors (even the ones w/o SIMD) are really pretty good at the indirect calls/virtual tables that C++ generates. There are also other tricks that can be done, such as load-time linking/resolving or runtime replacement (yup, self-modifying code) of calls (à la DTrace) and the like.

    Hell of a SW eng project man!

    Jon Harper
    January 17, 2015 2:59 pm

    @Nathan: runtime platform check. Load a library with the correct platform target based on the check (SIMD library for SIMD-capable systems, and vice-versa).

    January 17, 2015 6:09 pm

    I agree with Eddy. Although I am just learning to become a software engineer, I’d like to know what you would do differently afterwards. I hope to see the Mac version soon, as I played the game for a couple of hours on my Mac with WineBottler. But a native version always runs smoother.

    January 17, 2015 6:09 pm

    @Post above was made by me, I was too tired 😀

    January 17, 2015 7:17 pm

    Hey, I am having a lot of issues with loading the game. It was working last night just fine after hours of fighting with it, now I’m back to fighting with it again. I have the latest patch, tried fixing the registry videointerface value, deleting the Menu.msc file, I’m at a loss. I run it at 64 bit and was running it last night with that. If I could get someone to e-mail me or something that would be so fantastic thanks!

    January 19, 2015 6:38 pm

    You are a breath of fresh air in developer transparency.

    In my children’s LEGO speak :

    “You are the Master Builder!”

    January 20, 2015 6:37 am

    I’d love to hear your thoughts on unit testing. I’ve yet to see it used in a game studio, but maybe I haven’t been at the right places in that regard.

    January 21, 2015 8:33 am

    Hi, like Eddy and Dick I’m hungry for your experience. My studies/work are about programming, so I’m really enjoying the more technical posts. Which doesn’t prevent me from reading everything you wrote on this blog!

    Jon Harper
    January 24, 2015 12:11 pm

    @Nathan: Michael explained it better than I did, but here’s a dummy code example:

    class BaseMathClass {
    public:
        virtual ~BaseMathClass() {}
        // abstract virtual functions here
        virtual int func_1(int a, int b) = 0;
        virtual double func_2(double a, double b) = 0;
    };

    class SimdMathClass : public BaseMathClass {
    public:
        // implement functions here (SIMD versions)
        int func_1(int a, int b) override;
        double func_2(double a, double b) override;
    };

    class NonSimdMathClass : public BaseMathClass {
    public:
        // implement functions here (plain C++ versions)
        int func_1(int a, int b) override;
        double func_2(double a, double b) override;
    };

    BaseMathClass *math_lib = 0;

    // Call once at startup, after the runtime platform check.
    bool setup_math_lib(bool use_simd)
    {
        if (use_simd)
            math_lib = new SimdMathClass;
        else
            math_lib = new NonSimdMathClass;
        return math_lib != 0;
    }

    At runtime, the class inheriting BaseMathClass is loaded but is transparent to calls to math_lib. No checks are needed after calling setup_math_lib().

    There are many, many other ways of doing this, such as a class that loads a library and creates function pointers to the library’s math functions.

    January 26, 2015 3:54 pm

    A lot of nice technical things. Love the game, but in my opinion you should work on an update
    to improve the game and bring more basic ideas to it.
    The only interesting mod is CC, but the latest version isn’t stable. But no mod can really change the basic ideas. E.g. there is a lot of different food and fields and so on, but it doesn’t matter which I use.
    Sorry for my English, but I think you know what I mean.