Passing values…

A few months ago I had some interesting performance problems with OpenGL on OSX. I identified the problem and made some workarounds so development could continue. This week I’ve properly fixed the issue, and I want to record it here so that I and others can avoid this mistake.

So here’s a scene, rendering on OSX, at an abysmal frame rate of 14 on a MacBook Pro. That’s right. 14. I’ve got the game paused so there isn’t any time spent on updates, this is just drawing.


If I move the camera to a different location, the frame rate is 126. That’s a difference of 63 or so milliseconds per frame. Ouch.
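The arithmetic behind that number, as a small C++ sketch (the function name is made up for illustration): frame time in milliseconds is 1000 divided by the frame rate.

```cpp
#include <cassert>

// Convert a frame rate (frames per second) into the time spent on one frame (milliseconds).
inline double FrameTimeMs(double framesPerSecond)
{
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}
```

FrameTimeMs(14.0) is about 71.4 ms and FrameTimeMs(126.0) is about 7.9 ms, so the slow view costs roughly 63.5 ms more per frame.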


So after much debugging I determined that rendering animated models was causing the slow down. The image of just trees doesn’t have any deer or people moving around. And if I remove the people from my original test scene, the frame rate is over 100.


Since rendering houses and trees differs only slightly from rendering animated models, I disabled the shader code that animates the models, and the frame rate went back up to normal. This looks funny, and runs fast.


So here’s the basic code that handles animation in GLSL. It looks pretty standard and is simple code. This isn’t the entire shader, just enough to get an idea of how the animation part works.

struct BoneConstants
{
    mat4x4 transforms[64];
};

uniform BoneConstants bc;

in vec3 inputPosition;
in vec4 inputWeight;
in ivec4 inputIndex;

// Transform the position by up to four bones and blend the results by weight.
vec3 SkinPosition(vec3 position, ivec4 index, vec4 weight, BoneConstants bones)
{
    return ((bones.transforms[index.x] * vec4(position, 1.0)) * weight.x +
            (bones.transforms[index.y] * vec4(position, 1.0)) * weight.y +
            (bones.transforms[index.z] * vec4(position, 1.0)) * weight.z +
            (bones.transforms[index.w] * vec4(position, 1.0)) * weight.w).xyz;
}

void main()
{
    vec3 position = SkinPosition(inputPosition, inputIndex, inputWeight, bc);
    gl_Position = (gc.worldToProjection * (tc.transform * vec4(position, 1.0)));
}

What this code does is transform the position of a vertex by up to four bones in the model’s structure. It then blends the four results, weighting each by how much influence that bone has on the vertex.
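For reference, here’s roughly the same blend written as plain C++ on the CPU. This is only an illustrative sketch, not the engine’s code: Vec3, Mat4, TransformPoint and Translation are hypothetical helpers, and the matrix is assumed to be stored column-major like GLSL’s mat4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// 4x4 matrix stored column-major (assumption: same layout as a GLSL mat4).
struct Mat4 { float m[16]; };

// Computes (a * vec4(p, 1.0)).xyz.
inline Vec3 TransformPoint(const Mat4& a, const Vec3& p)
{
    return Vec3{
        a.m[0] * p.x + a.m[4] * p.y + a.m[8]  * p.z + a.m[12],
        a.m[1] * p.x + a.m[5] * p.y + a.m[9]  * p.z + a.m[13],
        a.m[2] * p.x + a.m[6] * p.y + a.m[10] * p.z + a.m[14]
    };
}

// Hypothetical helper: a matrix that only translates by (x, y, z).
inline Mat4 Translation(float x, float y, float z)
{
    Mat4 t = {{ 1, 0, 0, 0,   0, 1, 0, 0,   0, 0, 1, 0,   x, y, z, 1 }};
    return t;
}

// Transform the position by each of the four bones, then sum the results
// scaled by each bone's weight. The weights are expected to sum to 1.
inline Vec3 SkinPosition(const Vec3& position,
                         const std::array<int, 4>& index,
                         const std::array<float, 4>& weight,
                         const std::array<Mat4, 64>& bones)
{
    Vec3 result = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; ++i)
    {
        assert(index[i] >= 0 && index[i] < 64);
        const Vec3 moved = TransformPoint(bones[static_cast<std::size_t>(index[i])], position);
        result.x += moved.x * weight[i];
        result.y += moved.y * weight[i];
        result.z += moved.z * weight[i];
    }
    return result;
}
```

With bone 0 left as identity and bone 1 translating by 2 units along x, a vertex at (1, 1, 1) weighted half and half between them ends up at (2, 1, 1), halfway between the two transformed positions.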

I stared at this code for a while (more than a while actually), and after messing about a bit, it finally dawned on me what’s wrong with it. Face Palm.

To fix it, instead of calling a function to animate the models, I manually inlined the code. And my frame rate returned to normal, with animated characters.

void main()
{
    // Same skinning math, written inline so bc is never passed to a function.
    vec3 position =
        ((bc.transforms[inputIndex.x] * vec4(inputPosition, 1.0)) * inputWeight.x +
         (bc.transforms[inputIndex.y] * vec4(inputPosition, 1.0)) * inputWeight.y +
         (bc.transforms[inputIndex.z] * vec4(inputPosition, 1.0)) * inputWeight.z +
         (bc.transforms[inputIndex.w] * vec4(inputPosition, 1.0)) * inputWeight.w).xyz;
    gl_Position = (gc.worldToProjection * (tc.transforms[gl_InstanceID] * vec4(position, 1.0)));
}

Wow. So what’s going on there?

There are two ways to pass parameters to a function: by value or by reference.

When you pass a parameter by value, a copy of the variable is made so that any changes to the variable in the function don’t affect its value in the calling function.

When you pass a parameter by reference any modifications to the variable change it directly. No copy is made.

In my case with animation, the entire array of bone transformations is being copied, because it’s being passed by value. That’s 64 matrices for every vertex. My suspicion is that the program running on the GPU doesn’t have enough registers to hold this copy, so the GLSL compiler generates code that copies the array bit by bit and then runs that code over and over to evaluate the final result. What should be just a few matrix multiplies, scales, and adds becomes many, many copies and conditionals. This possibly results in different execution paths per GPU thread, causing even more slowdown.
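To put a number on the size of that copy, here’s a C++ analogy. It’s only an analogy: C++ references aren’t the same thing as GLSL parameter qualifiers, the function names are invented for the example, and what a GPU compiler really does with the copy is exactly the guesswork described above.

```cpp
#include <cassert>
#include <cstddef>

struct Mat4 { float m[16]; };                   // 16 floats per matrix
struct BoneConstants { Mat4 transforms[64]; };  // 64 matrices, as in the shader

// 64 matrices * 16 floats each, with no padding between them.
static_assert(sizeof(BoneConstants) == 64 * 16 * sizeof(float),
              "BoneConstants is expected to be tightly packed");

// By value: unless the optimizer removes it, every call starts by copying
// the whole struct (4096 bytes with 4-byte floats) into the parameter.
inline float ReadByValue(BoneConstants bones, int bone)
{
    assert(bone >= 0 && bone < 64);
    return bones.transforms[bone].m[0];
}

// By const reference: the function reads the caller's data in place.
inline float ReadByReference(const BoneConstants& bones, int bone)
{
    assert(bone >= 0 && bone < 64);
    return bones.transforms[bone].m[0];
}
```

Both functions return the same value. The only difference is whether 4 KB gets copied first, and in a vertex shader that cost would be paid once per vertex.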

My first attempt before manually inlining this code was actually to pass the array by reference, but the OpenGL compiler yelled at me that you can’t pass a uniform by reference.

On Windows and Linux, I suspect the compiler is smart enough to see that the function doesn’t modify the array, and optimizes the copy away. (Or my GTX 980 and 290X are just too fast for me to notice the slowdown…)

Most people reference the global list of uniform bone transformations directly and never run into this issue. But since my custom shader language that generates GLSL doesn’t have a concept of globals, everything is passed to functions if it’s needed. Arghghghg.

So what’s the real fix?

I don’t want to have to manually repeat code in shaders, that’s just bad programming practice. Luckily, I control the compiler for my own shading language, so I can get it to generate different code.

So I just recently added an ‘inline’ keyword for functions. The code gets inlined automatically and any value passed by reference isn’t copied when the GLSL is generated.

Previously my skinning function looked (in SRSL, not GLSL) like this:

float3 SkinPosition(float3 position, int4 index, float4 weight, BoneConstants bc) {...}

And now it looks like this:

inline float3 SkinPosition(float3 position, inout int4 index, inout float4 weight, 
	inout BoneConstants bc) {...}

No more repeated skinning code everywhere.

Getting my compiler to inline the code is pretty easy. However, as most shader languages don’t feature a goto or label statement to jump over remaining code, it’s hard (if not impossible) to inline a certain class of functions. So my inline feature doesn’t handle inlining when returning from complex flow control. This really isn’t an issue for shaders, as the programs tend to be straightforward and not have many loops or conditionals.

So long story short, don’t pass uniform arrays and large structs to a function by value in GLSL.


Quick Fixes

I just uploaded some quick fixes for bugs introduced with the last build. 1.0.6 is live on Steam. If you need to redownload it from Humble you can log in and grab it, or you can use this tool: should have an updated build shortly.

There’s a new modkit, available here:, though there shouldn’t be any changes to it from 1.0.5.

Changes in this build:

  • Fixed a crash that occurred when clicking on the town hall if a translation mod was in use that was built with 1.0.4. Missing text data will now be blank.
  • Fixed a bug that caused orchards and pastures to not drop items inside their boundaries as was intended.


Today 1.0.5 has been released! If you play on Steam, you should get an auto update. On Humble Store, or if you bought direct, you can download the new version by logging into your Humble account, or using this tool: If you bought on, you’ll have to log in and download it. GOG might take a few more hours to update to 1.0.5.

There’s a new modkit, available here:

If you want, you can patch from 1.0.4 to 1.0.5 manually (the non-Steam version). You can use this patch here: Just unzip it into the directory where Banished is installed.

If anything seems amiss with the new version, contact me.

Changes in 1.0.5:

  • UTF8 is now used instead of UCS2.
  • Resource files can be in UTF8, UCS2, or UTF16, big or little endian. They’ll be converted to UTF8 on load.
  • Memory usage allowance has been increased to 1 gigabyte, which should allow for larger mods.
  • All materials now use the custom shading language SRSL instead of HLSL.
    • Any mods with custom materials will need to be modified to point to the new shaders and/or use SRSL.
  • Math library can now be compiled without the need for SIMD instructions.
  • OpenGL is now supported (but isn’t currently being released with the PC version).
  • Data compilation is now in a separate DLL – CompileWin.dll – this can be swapped out for other platforms (consoles, Mac, Linux, etc.)
  • Shader compiler is now in its own DLL. Video DX9/DX11/GL DLLs are no longer required for compiling shaders.
  • Added safety code to check for invalid and dangling pointers – this should make catching hard-to-find and rare issues easier.
  • Sped up mod details dialog for massive mods that have tens of thousands of files included. This should make looking at conflicts and uploading to Steam Workshop easier.
  • Beta Mods and Mods newer than the currently released version can no longer be uploaded to Steam Workshop.
  • Nvidia and AMD GPUs in laptops should now be auto selected for use, instead of an Intel Integrated card.
  • Textile limit is now available for modders to use.
    • Cropfields, Fishing, Forester, Hunters, Orchards, and Pastures now have a configurable resource limit.
    • Livestock has a resource limit for the byproduct they make (eggs, wool, milk, etc.). Note that if a byproduct isn’t created because of the resource limit, the icon won’t appear above the building.
    • Added textile to the Status Bar, Resource Limit window, and Town Hall UI.
    • Added graphs for textiles to Town Hall UI.
  • Fixed a bug that caused fonts from 1.0.4 to not load in 1.0.5. A UCS2-to-UTF8 conversion wasn’t made properly.
  • Fixed a bug that caused dropped resources (from citizen death/task cancelation) to drop in invalid places.
  • Fixed a bug that caused orchards to cause invalid data access and/or data corruption if a citizen tried to harvest a tree, but the tree died before he got there.
  • Fixed a bug that caused potential memory corruption when cutting down an orchard’s trees.
  • Fixed a bug that caused a crash if game startup failed before memory allocation was available or was corrupt. It now properly displays an error.
  • Added a better error message if the game runs out of memory due to too many mods loaded.
  • Fixed a bug that caused a crash when loading old mods that had custom materials. The game will no longer crash, however objects with those materials will not display. To fix this issue, mods should be updated to the newest mod kit version and update the materials.

Thoughts on Linux

I wanted to share a few things about my experience getting Banished working on Linux. I haven’t seriously used Linux to develop software since around 1997-1999, where I just used vi and a Makefile.

Getting Started

Getting Linux going should be easy, right? Download ISO, burn disc, install in new machine. But then the installer fails to make partitions on my brand new SSD! That was ok, nothing is ever easy – so I decided to set up the partitions myself using gparted. Does it work? No. After two days I figured out that for some reason the install wouldn’t work with the drive plugged into easy-to-access SATA5. Plugged the drive into SATA0 underneath the video card, and we’re running!

I like to have multiple machines that can compile the code, which ensures all required files are checked into source control, and that multiple GPUs render correctly. I had an extra laptop with a 600 series Nvidia GPU, so I got Linux going on that, but had another few days of trying to get the Intel/Nvidia Optimus setup working under Linux. I’m still failing on that, it seems overly complicated, but I can compile on it – just not run. Bah.

Development Environment

There are a zillion IDEs available under Linux. I read about a ton of them and their associated debugging options. There are too many to actually try them all in a reasonable amount of time. It almost made me want to just go back to my roots and use vi and a Makefile. But I don’t really want a Makefile, and I don’t want to use CMake. Especially since the Mac and Windows IDEs just have a project and do their own dependency checking.

So I ended up going with SlickEdit. SlickEdit has no need of a Makefile, just add files to the project. It’s relatively cheap for a single user license, and has key bindings that match Visual Studio. If nothing else, it let me jump right in and not fight too much learning a new IDE.

Graphics Drivers

My Linux machine is an 8 core AMD with an AMD R9 graphics card. Installing the graphics driver was no problem, and getting OpenGL working wasn’t too bad once I read through the GLX documentation. I did go through a few days of random crashes that would freeze the entire X11 desktop (always on a GL context related call), my only recourse being Alt-SysRq-B. Boo.

Turns out I forgot to implement a certain platform-specific locking mechanism that keeps my background loading thread from executing graphics calls at the same time as the main rendering thread, which caused some corruption of the lists of resources. Not so bad, but the entire system halting made for very slow debugging.
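The general shape of that kind of fix, sketched in portable C++ with std::mutex rather than whatever the platform layer really uses. ResourceList and LoadFromTwoThreads are hypothetical names, and a plain vector of ids stands in for the renderer’s real resource lists.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a list of GPU resources shared by two threads.
class ResourceList
{
public:
    // Only one thread at a time may modify the list.
    void Add(int id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.push_back(id);
    }

    std::size_t Count() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> ids_;
};

// A background "loader" thread and the calling thread both register
// resources at the same time. Returns how many ended up in the list.
inline std::size_t LoadFromTwoThreads(int perThread)
{
    assert(perThread >= 0);
    ResourceList list;
    std::thread loader([&list, perThread] {
        for (int i = 0; i < perThread; ++i)
            list.Add(i);
    });
    for (int i = 0; i < perThread; ++i)
        list.Add(perThread + i);
    loader.join();
    return list.Count();
}
```

Without the lock, the two push_back calls can interleave and corrupt the vector. With it, every resource is registered exactly once.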


So when I told fellow programmers I was going to deal with straight X windows and GLX, they told me I was crazy and would regret it. But really, all I need is to create a window, and render OpenGL into it, and get some keyboard/mouse input. So that’s what I did. However when searching for answers to questions that weren’t made quite clear in the documentation, often people that had the same questions I did received answers like ‘Use SDL’, or ‘Use GTK or Qt’. That’s amazingly frustrating and just adds noise to the internet.

The only real problem I had with X11 is that the includes #define a lot of symbols that conflicted with identifiers in my code. I guess that goes with the territory of using a C only library that was written long long ago. I ended up having to forward some X11 symbols for use in my headers, and only include the actual header where needed.
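A minimal sketch of that forwarding trick, assuming the usual Xlib declaration `typedef struct _XDisplay Display;`. The PlatformWindow class is hypothetical. The idea is that engine headers only ever see an opaque pointer type, so macros that the X11 headers define (such as None, Status and Bool) never leak into code that includes them.

```cpp
#include <cassert>

// Forward declaration intended to match Xlib's own typedef, so this header
// never has to include <X11/Xlib.h>. Assumption: it must stay in sync with
// the X11 headers installed on the build machine.
struct _XDisplay;
typedef struct _XDisplay Display;

// Hypothetical engine-side window wrapper. Only the .cpp file that really
// talks to X11 would include the X11 headers.
class PlatformWindow
{
public:
    bool IsOpen() const { return display_ != nullptr; }

    void Attach(Display* display)
    {
        assert(display != nullptr);
        display_ = display;
    }

private:
    Display* display_ = nullptr;  // opaque here; the full type lives in Xlib
};
```

Repeating an identical typedef is legal in C++, so a source file that includes both this header and the real Xlib header should still compile.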

When all was said and done, the X11 code I had to write was far, far smaller than any 3rd party library that handled the same things, has no extra dependencies, and compiles fast.


There seem to be a lot of options for outputting audio on Linux, but all signs pointed toward ALSA for my purposes. Since I have my own mixer that outputs PCM data, all that was required was to play a single stream of audio, which is pretty easy.

The only major frustration here is that the method I like to use (and do use, in similar form, on Windows and OSX) is to get a callback every N milliseconds and then fill in an audio buffer. However, under Ubuntu the callback mechanism of ALSA isn’t implemented (apparently unsafe?). So I ended up writing my own audio thread that polls for the audio buffer to become available, fills in the audio data, and then goes back to waiting.

Porting everything else

Most of the remaining code was easy. It’s shared with OSX, so threading, memory management, critical sections, file I/O, etc. was already written and worked without an issue, except for a few date and timer functions.

Overall I liked the Linux experience of porting much better than OSX. I didn’t have to use another language, and I get to control the overall game loop and when events get processed. Then again, I already had a lot of compatible code ready from doing the OSX port, so there could be some bias there.

Going forward

Linux as a development environment is pretty good. But.

One of the more frustrating things about doing these ports is that currently the data is only compiled on Windows, then has to be packaged into a single file, and then copied over to the target machine. This is fine when things are said and done code-wise, but during development a lot of debugging requires changing data.

For example, let’s say I just want to debug vertex skinning of animated characters because it’s not working. This requires a change to the vertex shaders that will be used. So I turn to the Windows machine, change the shader, compile all resources, build a pack file, copy the pack files to a shared location, and then start debugging on Linux or OSX. I go back and forth doing this 20 times before fixing the issue. It’s slow.

I really need to get my full toolchain working on both OSX and Linux – so that regardless of what machine I have with me I can fully build my game and work normally and faster. Once I do, the game will recognize that a resource has changed, and compile it just as it’s loaded.

But this requires me changing a bunch of Windows-only compilation functions. This includes loading images, building compressed textures, building fonts, loading and converting audio data, and setting up FBX to work on other platforms. There’s probably more. It’s not insurmountable, but it’s a good chunk of code. So it’ll probably wait until the next project.


Three Month Update

The last three months have been full of holidays, travel, Fallout 4, and The Witness. And a bunch of coding. I may elaborate more on the details of what’s been going on with the code lately, but for now here’s a quick update.


Despite some annoyances getting Linux set up and a proper development environment going, the Linux build is pretty feature complete, and fully playable. I’ve been developing under Ubuntu 14.04. Overall my Linux experience has been really good – compared with the last time I seriously developed on Linux, circa 1999. If nothing else the Linux build shares a lot of code with the OSX version – at least those things that are POSIX compliant, and OpenGL.

I’ve tried to keep dependencies down to a minimum. If your system has X11, ALSA for audio, and an OpenGL 3.2 driver, it should be good. I still have to get Steam integration compiled in, as well as test it on SteamOS. Plus figure out how to package it up for install.


Banished is fully playable on Mac, although I have a little finishing up to do. Things like fullscreen/windowed toggle, changing some video options, and supporting custom cursors. It also needs the Steam integration, but that’s just a different compile with some other options.

While working on Mac has been okay, I’m not entirely thrilled with the way I have the code set up. It needs some refactoring to better separate the Objective-C stuff and keep some of my namespaces from being polluted by including things they shouldn’t.

I had some additional OpenGL performance annoyances, with shaders generating really bad code that took a while to track down. Maybe one day I’ll use Metal and it will be less facepalmy.


The OpenGL renderer is pretty fast now. I had some crazy bugs and mistakes that were only revealed by getting it running on three different platforms. It’s nearly up to the performance of the DX11 version.

Strangely the Windows version still has some things wrong with it in certain edge cases. I guess that’s easy to have happen since Windows is using a slightly modified version of the code due to being able to switch renderers at runtime. At some point I’ll figure out the diffs between the two sets of code.


A lot of my time was spent trying to figure out how to get identical sound output on all platforms. I looked at a lot of sound libraries, didn’t like something about all of them, so ended up writing my own.

Audio in the engine is now all ogg/vorbis, instead of the Windows specific WMA/XMA. Using some nice public domain software, the ogg is decoded as needed in a background thread, and I wrote my own mixer to blend the sounds. So now per platform, all I need to do is output 5 milliseconds of PCM sound data at a time when it’s needed, and everything sounds the same.
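To give a feel for what “output 5 milliseconds of PCM at a time” involves, here’s a bare-bones sketch of a mixer block. It isn’t the engine’s mixer: the function names, the mono 16-bit format, and the simple sum-and-clamp blend are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of sample frames in a block of the given length.
inline std::size_t FramesPerBlock(int sampleRate, int milliseconds)
{
    assert(sampleRate > 0 && milliseconds > 0);
    return static_cast<std::size_t>(sampleRate) * static_cast<std::size_t>(milliseconds) / 1000;
}

// Mix any number of mono 16-bit sources into one output block.
// Samples are summed in 32 bits and then clamped to the 16-bit range so
// loud overlapping sounds saturate instead of wrapping around.
// A source that runs out early contributes silence for the rest of the block.
inline std::vector<std::int16_t> MixBlock(
    const std::vector<std::vector<std::int16_t>>& sources, std::size_t frames)
{
    std::vector<std::int16_t> out(frames, 0);
    for (std::size_t i = 0; i < frames; ++i)
    {
        std::int32_t sum = 0;
        for (const auto& source : sources)
        {
            if (i < source.size())
                sum += source[i];
        }
        sum = std::max<std::int32_t>(-32768, std::min<std::int32_t>(32767, sum));
        out[i] = static_cast<std::int16_t>(sum);
    }
    return out;
}
```

At 48 kHz a 5 millisecond block is 240 frames, so the per-platform code only has to ask for a block that size and hand the result to the audio device.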

Yes, I know many people will think this is crazy. But it was a great learning experience for audio related topics. It’s also cool in that the per-platform audio code is now about 100 lines per platform, and everything else is common code. As my audio needs grow, I can just extend the mixer. Sweet.

Version 1.0.5

I still need to release 1.0.5. There are a few reported bugs I need to fix first. I guess I’ve just been distracted by porting – plus working on the two code bases at once is slightly annoying. 1.0.6 will be the first version with OSX/Linux support, so I’ll be glad when I’m only working in a single code base instead of two.

Edit: Looks like the comment section is broken… looking into it.