Thoughts on Linux

I wanted to share a few things about my experience getting Banished working on Linux. I haven't seriously used Linux to develop software since around 1997-1999, when I just used vi and a Makefile.

Getting Started

Getting Linux going should be easy, right? Download the ISO, burn a disc, install on the new machine. But then the installer fails to make partitions on my brand new SSD! That was OK, nothing is ever easy – so I decided to set up the partitions myself using gparted. Does it work? No. After two days I figured out that for some reason the install wouldn't work with the drive plugged into the easy-to-access SATA5 port. Plugged the drive into SATA0 underneath the video card, and we're running!

I like to have multiple machines that can compile the code, which ensures all required files are checked into source control and that multiple GPUs render correctly. I had an extra laptop with a 600-series Nvidia GPU, so I got Linux going on that, but then spent another few days trying to get the Intel/Nvidia Optimus setup working under Linux. I'm still failing at that; it seems overly complicated, but I can compile on it – just not run. Bah.

Development Environment

There are a zillion IDEs available under Linux. I read about a ton of them and their associated debugging options. There are too many to actually try them all in a reasonable amount of time. It almost made me want to just go back to my roots and use vi and a makefile. But I don't really want a makefile, and I don't want to use CMake. Especially since the Mac and Windows IDEs just have a project and do their own dependency checking.

So I ended up going with SlickEdit. SlickEdit has no need of a makefile; just add files to the project. It's relatively cheap for a single-user license, and has key bindings that match Visual Studio. If nothing else, it let me jump right in and not fight too much learning a new IDE.

Graphics Drivers

My Linux machine is an 8-core AMD with an AMD R9 graphics card. Installing the graphics driver was no problem, and getting OpenGL working wasn't too bad once I read through the GLX documentation. I did go through a few days of random crashes that would freeze the entire X11 desktop (always on a GL context related call), my only recourse being Alt-SysRq-B. Boo.

It turns out I forgot to implement a certain platform-specific locking mechanism that kept my background loading thread from executing graphics calls at the same time as the main rendering thread, causing some data corruption of lists of resources. Not so bad, but the entire system halting made for very slow debugging.
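The fix is essentially a single lock that both threads take around their graphics work. Here's a minimal sketch of the idea, assuming a std::mutex and a shared GL context already made current on the loader thread – illustrative names, not the engine's actual code:

    #include <GL/gl.h>
    #include <mutex>

    // One lock serializing GL access between the render thread and the
    // background loading thread (the engine wraps this in its own platform layer).
    static std::mutex s_contextLock;

    // Background loader: create GL resources only while holding the lock.
    void CreateTexture_LoaderThread(const void* pixels, int width, int height, unsigned int* outId)
    {
        std::lock_guard<std::mutex> lock(s_contextLock);
        glGenTextures(1, outId);
        glBindTexture(GL_TEXTURE_2D, *outId);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    }

    // Render thread: take the same lock around per-frame GL work so the two
    // threads never touch the context (or shared resource lists) at once.
    void RenderFrame()
    {
        std::lock_guard<std::mutex> lock(s_contextLock);
        // ... issue draw calls, swap buffers ...
    }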

X11

So when I told fellow programmers I was going to deal with straight X windows and GLX, they told me I was crazy and would regret it. But really, all I need is to create a window, render OpenGL into it, and get some keyboard/mouse input. So that's what I did. However, when searching for answers to questions that weren't made quite clear in the documentation, people who had the same questions I did often received answers like 'Use SDL', or 'Use GTK or Qt'. That's amazingly frustrating and just adds noise to the internet.
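For a sense of scale, here's roughly what that boils down to – a bare-bones sketch, not the engine's actual code, with error handling omitted and the 3.2 core context requested through the GLX_ARB_create_context extension:

    #include <X11/Xlib.h>
    #include <GL/glx.h>
    #include <GL/glxext.h>   // GLX_ARB_create_context tokens, if glx.h doesn't pull them in

    int main()
    {
        Display* display = XOpenDisplay(nullptr);

        // Pick a double-buffered RGBA framebuffer config and its visual.
        const int visualAttribs[] = {
            GLX_X_RENDERABLE, True, GLX_RENDER_TYPE, GLX_RGBA_BIT,
            GLX_DOUBLEBUFFER, True, GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8,
            GLX_BLUE_SIZE, 8, GLX_DEPTH_SIZE, 24, None
        };
        int configCount = 0;
        GLXFBConfig* configs = glXChooseFBConfig(display, DefaultScreen(display),
                                                 visualAttribs, &configCount);
        XVisualInfo* visual = glXGetVisualFromFBConfig(display, configs[0]);

        // Create the window with an event mask for keyboard/mouse input.
        XSetWindowAttributes swa = {};
        swa.colormap = XCreateColormap(display, RootWindow(display, visual->screen),
                                       visual->visual, AllocNone);
        swa.event_mask = KeyPressMask | KeyReleaseMask | ButtonPressMask |
                         ButtonReleaseMask | PointerMotionMask | StructureNotifyMask;
        Window window = XCreateWindow(display, RootWindow(display, visual->screen),
                                      0, 0, 1280, 720, 0, visual->depth, InputOutput,
                                      visual->visual, CWColormap | CWEventMask, &swa);
        XStoreName(display, window, "GLX window");
        XMapWindow(display, window);

        // Ask for a 3.2 core profile context via the ARB extension.
        typedef GLXContext (*CreateContextFn)(Display*, GLXFBConfig, GLXContext, Bool, const int*);
        CreateContextFn createContext = (CreateContextFn)
            glXGetProcAddressARB((const GLubyte*)"glXCreateContextAttribsARB");
        const int contextAttribs[] = {
            GLX_CONTEXT_MAJOR_VERSION_ARB, 3, GLX_CONTEXT_MINOR_VERSION_ARB, 2,
            GLX_CONTEXT_PROFILE_MASK_ARB, GLX_CONTEXT_CORE_PROFILE_BIT_ARB, None
        };
        GLXContext context = createContext(display, configs[0], nullptr, True, contextAttribs);
        glXMakeCurrent(display, window, context);

        // Event loop: pull keyboard/mouse events, render, swap.
        for (;;)
        {
            while (XPending(display))
            {
                XEvent event;
                XNextEvent(display, &event);
                // handle KeyPress / ButtonPress / MotionNotify ...
            }
            // ... render with OpenGL ...
            glXSwapBuffers(display, window);
        }
    }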

The only real problem I had with X11 is that the includes #define a lot of symbols that conflicted with identifiers in my code. I guess that goes with the territory of using a C-only library that was written long, long ago. I ended up having to forward declare some X11 symbols for use in my headers, and only include the actual header where needed.
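For example, X.h #defines None and Always, and Xlib.h #defines Status, True, and False, any of which can collide with enum values or identifiers elsewhere. The workaround looks something like this sketch (the class is hypothetical; the typedefs mirror what Xlib.h declares on a typical 64-bit Linux, so double-check them against your headers):

    // --- PlatformWindow.h (hypothetical) ---
    // Forward declarations instead of #include <X11/Xlib.h>, keeping the X11
    // macros out of every file that includes this header.
    struct _XDisplay;
    typedef struct _XDisplay Display;   // matches Xlib's own typedef
    typedef unsigned long Window;       // Window is an XID (unsigned long on 64-bit Linux)

    class PlatformWindow
    {
    public:
        void Create(int width, int height);

    private:
        Display* _display = nullptr;
        Window _window = 0;
    };

    // --- PlatformWindow.cpp ---
    // Only the implementation file pulls in the real headers and their #defines.
    #include <X11/Xlib.h>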

When all was said and done, the X11 code I had to write was far, far smaller than any 3rd party library that handles the same things, has no extra dependencies, and compiles fast.

Audio

There seem to be a lot of options for outputting audio on Linux, but all signs pointed toward ALSA for my purposes. Since I have my own mixer that outputs PCM data, all that was required was to play a single stream of audio, which is pretty easy.

The only major frustration here is that the method I like to use (and use similar methods on Windows and OSX) is to get a callback every N milliseconds and then fill in an audio buffer. However, under Ubuntu the callback mechanism of ALSA isn't implemented (apparently it's unsafe?). So I ended up writing my own audio thread that polls for the audio buffer to become available, fills in the audio data, and then goes back to waiting.
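The shape of that thread, as a sketch (assuming 16-bit stereo at 44.1kHz; error handling and underrun recovery omitted, and MixerFill stands in for whatever pulls PCM from the mixer):

    #include <alsa/asoundlib.h>
    #include <cstdint>

    // Hypothetical hook into the engine's mixer: fills 'frames' frames of
    // interleaved 16-bit stereo PCM.
    void MixerFill(int16_t* samples, int frames);

    void AudioThread(volatile bool& running)
    {
        snd_pcm_t* pcm = nullptr;
        snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);

        // 16-bit interleaved stereo at 44.1kHz with ~50ms of device buffering.
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 1, 50000);

        const snd_pcm_uframes_t chunkFrames = 220;   // roughly 5ms at 44.1kHz
        int16_t buffer[220 * 2];

        while (running)
        {
            // Sleep until the device can take more data (or time out after 100ms).
            snd_pcm_wait(pcm, 100);

            while (snd_pcm_avail_update(pcm) >= (snd_pcm_sframes_t)chunkFrames)
            {
                MixerFill(buffer, (int)chunkFrames);
                snd_pcm_writei(pcm, buffer, chunkFrames);
            }
        }
        snd_pcm_close(pcm);
    }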

Porting everything else

Most of the remaining code was easy. It's shared with OSX, so threading, memory management, critical sections, file I/O, etc. were already written and worked without an issue, except for a few date and timer functions.

Overall I liked the Linux porting experience much better than OSX. I didn't have to use another language, and I get to control the overall game loop and when events get processed. Then again, I already had a lot of compatible code ready from doing the OSX port, so there could be some bias there.

Going forward

Linux as a development environment is pretty good. But.

One of the more frustrating things about doing these ports is that currently the data is only compiled on Windows, then has to be packaged into a single file, and then copied over to the target machine. This is fine when things are said and done code-wise, but during development a lot of debugging requires changing data.

For example, let's say I just want to debug vertex skinning of animated characters because it's not working. This requires a change to the vertex shaders that will be used. So I turn to the Windows machine, change the shader, compile all resources, build a pack file, copy the pack files to a shared location, and then start debugging on Linux or OSX. I go back and forth doing this 20 times before fixing the issue. It's slow.

I really need to get my full toolchain working on both OSX and Linux – so that regardless of what machine I have with me I can fully build my game and work normally and faster. Once I do, the game will recognize that a resource has changed and compile it just as it's loaded.
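The change detection itself is the easy part; something along these lines would do (a sketch of the idea with hypothetical names, not the actual toolchain code):

    #include <sys/stat.h>
    #include <string>

    // Rebuild a resource when its source asset is newer than the compiled copy.
    bool NeedsRecompile(const std::string& sourcePath, const std::string& compiledPath)
    {
        struct stat source = {};
        struct stat compiled = {};
        if (stat(sourcePath.c_str(), &source) != 0)
            return false;                           // no source here, use the packed data
        if (stat(compiledPath.c_str(), &compiled) != 0)
            return true;                            // never compiled on this machine
        return source.st_mtime > compiled.st_mtime; // source edited since last compile
    }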

But this requires changing a bunch of Windows-only compilation functions. This includes loading images, building compressed textures, building fonts, loading and converting audio data, and setting up FBX to work on other platforms. There's probably more. It's not insurmountable, but it's a good chunk of code. So it'll probably wait until the next project.


Three Month Update

The last three months have been full of holidays, travel, Fallout 4, and The Witness. And a bunch of coding. I may elaborate more on the details of what's been going on with the code lately, but for now here's a quick update.

Linux

Despite some annoyances getting Linux set up and a proper development environment going, the Linux build is pretty feature complete, and fully playable. I've been developing under Ubuntu 14.04. Overall my Linux experience has been really good – compared with the last time I seriously developed on Linux, circa 1999. If nothing else, the Linux build shares a lot of code with the OSX version – at least the things that are POSIX compliant, and OpenGL.

I've tried to keep dependencies down to a minimum. If your system has X11, ALSA for audio, and an OpenGL 3.2 driver, it should be good. I still have to get Steam integration compiled in, as well as test it on SteamOS. Plus figure out how to package it up for install.

OSX

Banished is fully playable on Mac, although I have a little finishing up to do. Things like fullscreen/windowed toggle, changing some video options, and supporting custom cursors. It also needs the Steam integration, but that's just a different compile with some other options.

While working on Mac has been okay, I'm not entirely thrilled with the way I have the code set up. It needs some refactoring to better separate the Objective-C stuff and keep some of my namespaces from being polluted by including things they shouldn't.

I had some additional OpenGL performance annoyances, with shaders generating really bad code that took a while to track down. Maybe one day I’ll use Metal and it will be less facepalmy.

OpenGL

The OpenGL renderer is pretty fast now. I had some crazy bugs and mistakes that were only revealed by getting it running on three different platforms. It’s nearly up to the performance of the DX11 version.

Strangely, the Windows version still has some things wrong with it in certain edge cases. I guess that's easy to have happen since Windows uses a slightly modified version of the code, due to being able to switch renderers at runtime. At some point I'll figure out the diffs between the two sets of code.

Audio

A lot of my time was spent trying to figure out how to get identical sound output on all platforms. I looked at a lot of sound libraries, didn’t like something about all of them, so ended up writing my own.

Audio in the engine is now all Ogg Vorbis, instead of the Windows-specific WMA/XMA. Using some nice public domain software, the Ogg data is decoded as needed in a background thread, and I wrote my own mixer to blend the sounds. So now, per platform, all I need to do is output 5 milliseconds of PCM sound data at a time when it's needed, and everything sounds the same.

Yes, I know many people will think this is crazy. But it was a great learning experience for audio related topics. It's also cool in that the per-platform audio code is now only about 100 lines, and everything else is common code. As my audio needs grow, I can just extend the mixer. Sweet.
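To give an idea of how thin that per-platform layer is, the boundary looks roughly like this (illustrative names, not the engine's actual interface):

    #include <cstdint>

    // Common code: decodes ogg in a background thread, blends active sounds,
    // and writes interleaved 16-bit PCM on demand.
    class Mixer
    {
    public:
        void Fill(int16_t* samples, int frameCount);
    };

    // Each platform backend (ALSA on Linux, plus the OSX and Windows equivalents)
    // then only has to:
    //   1. open a stereo PCM output,
    //   2. call Mixer::Fill() whenever the next ~5ms chunk is needed,
    //   3. close the device on shutdown.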

Version 1.0.5

I still need to release 1.0.5. There are a few bugs that got reported that I need to fix first. I guess I've just been distracted by porting – plus working on the two code bases at once is slightly annoying. 1.0.6 will be the first version with OSX/Linux support, so I'll be glad when I'm only working in a single code base instead of two.

Edit: Looks like the comment section is broken… looking into it.


Beta 1.0.5 version 2

There’s a new beta available for 1.0.5. This is version 151214.

Thanks very much to everyone who has been testing and sent bug reports and feedback.

Important!

If you have a mod that has fonts in it and you built your mod using Beta 1.0.5 build 151026, this version is going to crash when loading it. In fact, loading fonts from 1.0.4 with the previous build would also crash. This stopped almost all translations from working.

Change List

Here are the changes to this build:

  • Fixed a bug that caused fonts from 1.0.4 to not load in 1.0.5. A UCS2-to-UTF8 conversion wasn't made properly.
  • Fixed a bug that caused dropped resources (from citizen death/task cancellation) to drop in invalid places.
  • Fixed a bug that caused orchards to cause invalid data access and/or data corruption if a citizen tried to harvest a tree, but the tree died before he got there.
  • Fixed a bug that caused potential memory corruption when cutting down an orchard's trees.
  • Fixed a bug that caused a crash if game startup failed before memory allocation was available or was corrupt. It now properly displays an error.
  • Added a better error message if the game runs out of memory due to too many mods being loaded.
  • Fixed a bug that caused a crash when loading old mods that had custom materials. The game will no longer crash; however, objects with those materials will not display. To fix this issue, mods should be updated to the newest mod kit version and the materials updated.

How to get the build

If you are using Steam, go into your game library and right-click on Banished. Select Properties, and then in the window that opens, select the BETAs tab. Select the drop-down and pick Beta Test for 1.0.5.

If you don’t use Steam, you can download the patch here: BanishedPatch_1.0.4_To_1.0.5.151214.Beta.zip. Note that you need to apply the patch to version 1.0.4. Previous versions of the game won’t work with this patch. Once downloaded, just unzip the archive into the folder where you have Banished installed. This is usually C:\Program Files\Shining Rock Software\Banished\.

If you’re into modding, you can get the beta mod kit here: BanishedKit_1.0.5.151214.Beta.zip.

When will this build not be a Beta?

I’ve fixed all the bugs that were reported to me, so if there aren’t any serious bug reports in about a week, I’ll push this build live to everyone.

As before, if you find a problem, I’d like to hear about it. You can submit bugs on the forum in the new beta sub forum. Or through the regular Support methods.


Graphics Drivers

Gah. So if you saw the last post I made about OSX, you may remember it was running at 1 FPS.

I spent a lot of time thinking about this issue and quite a bit of time trying to code solutions. Despite OpenGL being a 'cross platform' library, at this point I'm pretty sure each platform that uses it is going to have to be tailored to that platform's specific graphics drivers.

Here’s my debugging method. (This is going to sound elegant as I type this out, but there was a lot of stumbling and double and triple checking things…)

One Frame Per Second

So I'm sitting there looking at the game chug along at 1 FPS, and thinking: the loading screens run fast, but the title screen runs miserably. The loading screens have 1-3 draw calls per frame, whereas the title screen has hundreds, if not thousands. Something per draw call must be going slow.

Sure enough, if I don’t make any draw calls, things run fast, but this is mostly useless, since I can’t see anything.

A few thoughts enter my mind.

Hypothesis

  1. The graphics driver is defaulting to software rendering or software transformations.
  2. I'm doing something that's not OpenGL 3.2 compliant, or doing something causing OpenGL errors.
  3. The GPU is waiting on the CPU (or vice versa) for something.

The first idea just shouldn't be possible, as I selected a pixel format (an OpenGL thing that specifies what kind of rendering you'll be doing) on OSX requiring hardware acceleration and no software fallback. But I'll double check.

The second idea is somewhat likely, but I worked very hard to make the Windows renderer OpenGL 3.2 compliant and it doesn’t show any errors. But I’ll check anyway since it’s a different driver and different GPU using the same code.

Third idea? Let’s hope it’s not that.

Testing

How do you check something like this? There are some sorta-OK GPU debugging tools available for OSX, so I downloaded them and started them up. After a little documentation reading, I got them working. You can set some OpenGL breakpoints which will stop the program and give a bit of information if there's an error or if you encounter software rendering.

[Screenshot: OpenGL breakpoints set in the debugging tool]

Of course nothing is easy. No OpenGL errors, no software rendering. This immediately discounted ideas #1 and #2. So it’s probably #3. Something is syncing the CPU and GPU. Blah.

Next I looked at what OpenGL calls were being made and how long they were taking.

[Screenshot: OpenGL call trace showing slow draw calls]

Ah ha! You'll notice the highlighted lines (which are draw calls), and that OpenGL calls are taking up a crazy 98% of the frame.

Looking closely at individual calls, the huge time differences can be seen between glDraw calls and other API calls…

[Screenshot: a single slow draw call in the trace]

Having written low-level code for consoles that don't really have a driver has given me a good understanding of what goes on when the CPU sends commands to the GPU, and what can cause a stall. Generally it happens either when the CPU is writing to dynamic resources that the GPU is currently using, or when the CPU is waiting for the GPU to finish some rendering so it can access a rendered or computed result.

I only have 3 places in code that might cause this. The first one I looked at is updating vertex and index data used for dynamic rendering – which is used for particle systems, UI, and other things that change frame to frame.

The (abbreviated) code looks like this:

    GLbitfield flags = GL_MAP_WRITE_BIT;
    if (_currentOffset + bytes > _bufferBytes)
    {
        // at the end of the buffer, invalidate it and start writing at the beginning...
        flags |= GL_MAP_INVALIDATE_BUFFER_BIT;
        _currentOffset = 0;
    }
    else
    {
        // there's still room, write past what the GPU is using and notify that there's no
        // need to stall on this write.
        flags |= GL_MAP_UNSYNCHRONIZED_BIT;
    }
        
    glBindBuffer(GL_ARRAY_BUFFER, _objectId);
    void* data = glMapBufferRange(GL_ARRAY_BUFFER, _currentOffset, bytes, flags);    

    // write some data ....

    glUnmapBuffer(GL_ARRAY_BUFFER);

    // draw some stuff with the data at _currentOffset.

    _currentOffset += bytes;

It's set up so that generally you're just writing more data while the GPU can use data earlier in the buffer as it's needed. Occasionally, when you run out of room, you let the driver know you're going to overwrite the buffer. (This can be better with multiple buffers, but I didn't want to overcomplicate this example code.)
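For what it's worth, the multiple-buffer variant mentioned above would look something like this sketch – cycling through a few buffers so the one being written is not the one the GPU is still reading (illustrative only, and it assumes there are enough buffers to cover the frames in flight):

    static const int kBufferCount = 3;
    GLuint _objectIds[kBufferCount];   // created up front with glGenBuffers/glBufferData
    GLintptr _currentOffset = 0;
    int _currentBuffer = 0;

    void* MapForWrite(GLsizeiptr bytes, GLsizeiptr bufferBytes)
    {
        if (_currentOffset + bytes > bufferBytes)
        {
            // Instead of invalidating, rotate to the next buffer and start over.
            _currentBuffer = (_currentBuffer + 1) % kBufferCount;
            _currentOffset = 0;
        }

        // With enough buffers in the rotation, the write never touches data the
        // GPU is still reading, so the unsynchronized flag is always safe.
        glBindBuffer(GL_ARRAY_BUFFER, _objectIds[_currentBuffer]);
        return glMapBufferRange(GL_ARRAY_BUFFER, _currentOffset, bytes,
                                GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
        // The caller unmaps, draws from _currentOffset, then advances it by 'bytes'.
    }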

This didn't seem to be the problem, as nearly every draw call was slow. Drawing that used fully static data was slow too. Static data is set up with code that looks like this.

    glGenBuffers(1, &_objectId);
    glBindBuffer(GL_ARRAY_BUFFER, _objectId);
    glBufferData(GL_ARRAY_BUFFER, bytes, data, GL_STATIC_DRAW);       

That data isn’t ever touched again, and hopefully the GPU takes the hint that it can reside in GPU memory so no problem there.

But then I noticed that not every draw call was slow. Using the OpenGL Profiler trace I could see that sequential draw calls without any changes to any render state in-between did not stall.

[Screenshot: sequential draw calls with no state changes between them, running fast]

Hmmmm….

What's the most common thing that changes between draw calls? If it's not the material on the object, it's the location where that object is drawn. Its transformation: position and orientation. Transformations are generally stored in a very fast (and fairly small) section of GPU memory meant just for this purpose. It's also where the camera location, object color, and other variable properties are stored. We call this data 'uniforms'. Or in my engine, 'constants'.

In OpenGL 3.2 I used uniform buffer objects, since they most closely match my engine architecture and that of DX10/11. DX9 fits the concept as well, since you can specify the location of all uniforms. Seems like a good fit.
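For context, the pre-configuration mentioned below is a one-time setup that ties the shader's uniform block and the buffer to the same binding point – roughly this sketch (block name, sizes, and variables are illustrative; programId is the linked GLSL program):

    // Create the uniform buffer once.
    GLuint bufferId = 0;
    glGenBuffers(1, &bufferId);
    glBindBuffer(GL_UNIFORM_BUFFER, bufferId);
    glBufferData(GL_UNIFORM_BUFFER, 64 * 1024, nullptr, GL_DYNAMIC_DRAW);

    // Once per shader program: map the named block to a binding point...
    const GLuint bindingPoint = 0;
    GLuint blockIndex = glGetUniformBlockIndex(programId, "PerObjectConstants");
    glUniformBlockBinding(programId, blockIndex, bindingPoint);

    // ...and attach the buffer (or a range of it) to that binding point.
    glBindBufferBase(GL_UNIFORM_BUFFER, bindingPoint, bufferId);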

After some pre-configuration, sending uniforms to the GPU for vertex and pixel programs to use is really easy. It looks like this:

void ConstantBuffer::Bind(Context& context, void* data, int32 offsetBytes, int32 bytes)
{
    glBindBuffer(GL_UNIFORM_BUFFER, _objectId);
    glBufferSubData(GL_UNIFORM_BUFFER, offsetBytes, bytes, data);
}

To my knowledge this should be crazy fast. On some hardware (way down at the command stream level) this data is part of the command buffer, and it updates constants just before the vertex and pixel shaders are invoked. Worst case, if it's actually a separate buffer the GPU uses, and/or the driver supports reading this data back on the CPU, the driver needs to copy it off somewhere until the GPU needs it, so the last set values can be read back by the CPU without any stall…

But you never know….

I read the OpenGL docs again, and sure enough glBufferSubData can cause a stall: any rendering that still references the buffer has to finish consuming the previous values before the update can happen.

“Consider using multiple buffer objects to avoid stalling the rendering pipeline during data store updates. If any rendering in the pipeline makes reference to data in the buffer object being updated by glBufferSubData, especially from the specific region being updated, that rendering must drain from the pipeline before the data store can be updated.”

Really? Why? Setting uniforms HAS to be fast. You do it almost as often as issuing draw commands!!! This has been true since vertex shader 1.0. (Yeah I know, this doesn’t have to be quite true for some of the newest GPUs and APIs)

So for kicks, since there’s more than one way to modify buffer data in OpenGL, I changed the ConstantBuffer update to:

void ConstantBuffer::Bind(Context& context, void* data, int32 offsetBytes, int32 bytes)
{
    glBindBuffer(GL_UNIFORM_BUFFER, _objectId);
    void* destData = glMapBufferRange(GL_UNIFORM_BUFFER, offsetBytes, bytes, GL_MAP_WRITE_BIT);
    memcpy(destData, data, bytes);
    glUnmapBuffer(GL_UNIFORM_BUFFER);
}

And while in my mind there really shouldn't be any difference, the statistics on OpenGL commands change to this:

[Screenshot: OpenGL statistics with the stall moved to glMapBufferRange]

Huh, there's all that wait time again, but it's moved to setting uniforms. Now I'm getting somewhere. I figure I'm just not using the API correctly when setting uniforms.

Experimentation

So I tried a bunch of different things.

I tried having a single large uniform buffer using the GL_MAP_INVALIDATE_BUFFER_BIT / GL_MAP_UNSYNCHRONIZED_BIT and glBindBufferRange() so that no constants were overwritten. This was slower. And yes, you can get slower than 1 FPS.

I tried having a uniform buffer per draw call so they were never overwritten, except between frames. This was slower, using either glMapBuffer or glBufferSubData.

I tried changing the buffer creation flags. No change.

I read about other coders running through their entire scene, collecting uniforms, updating a uniform buffer once at the beginning of the frame, and then running through the scene again just to make draw calls. This is stupid and slow.

I wished I could use a newer version of OpenGL to try some other options, but I’m using 3.2 for maximum compatibility.

Eureka!

Then I got a sinking feeling in my stomach. I knew the answer (actually was pretty sure…) but I didn’t want to code it. Ugh.

Back before OpenGL 3.0 / DirectX 10, there weren’t any uniform buffers. Uniforms were just loose data that you set one at a time using functions like glUniformMatrix4fv and glUniform4fv.

What isn't great about the old way is that every time you change vertex and pixel programs, you need to reapply all the changed uniforms that the next GPU program uses. OpenGL 3.2 doesn't let the shader pick where uniforms go in memory, so you always have to look the locations up, and the location of each uniform variable can change from shader to shader.

With uniform buffers, if you set some values once and they don't change for the entire frame, there's nothing else to do.

So I went about changing the engine to use the old old way.

  1. First I had to change all the shaders to not use uniform buffers. Luckily I have the shader compiler, so this was a few lines of code instead of hand-editing hundreds of shaders.
  2. Then I sat around for a few minutes while all the shaders regenerated and recompiled.
  3. Next I had to record, per vertex/pixel program combination, which uniforms were used and where they needed to be uploaded. This was a non-trivial amount of code to write (a sketch of the idea follows this list).
  4. Then, any time a shader changed, I had to change the code to dirty all uniforms so they'd be reapplied.
  5. Then I had to write a new uniform binding function.
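Here's a sketch of what step 3 involves: after linking each program, enumerate its active uniforms once and record where each one has to be uploaded. The GL enumeration calls are standard; UploadInfo, the offset lookup, and the use of std::vector are illustrative stand-ins for the engine's own versions.

    #include <vector>

    struct UploadInfo
    {
        GLint  _index;    // location from glGetUniformLocation
        GLenum _type;     // GL_FLOAT_MAT4, GL_FLOAT_VEC4, ...
        GLint  _size;     // array element count
        int    _offset;   // offset into the engine's constant data, in vec4s
    };

    // Hypothetical: where the engine keeps this named value in its constant data.
    int LookupConstantOffset(const char* name);

    std::vector<UploadInfo> RecordUploadInfo(GLuint programId)
    {
        std::vector<UploadInfo> upload;
        GLint count = 0;
        glGetProgramiv(programId, GL_ACTIVE_UNIFORMS, &count);

        for (GLint i = 0; i < count; ++i)
        {
            char name[256];
            GLint size = 0;
            GLenum type = 0;
            glGetActiveUniform(programId, (GLuint)i, sizeof(name), nullptr, &size, &type, name);

            UploadInfo info;
            info._index = glGetUniformLocation(programId, name);
            info._type = type;
            info._size = size;
            info._offset = LookupConstantOffset(name);
            upload.push_back(info);
        }
        return upload;
    }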

Here's the new constant binding function. Pretty messy memory-wise, and many more calls to the GL API per frame.

void ConstantBuffer::Bind(Context& context, void* data, int32 offsetBytes, int32 /*bytes*/)
{
    _Assert(offsetBytes == 0, "can't upload with non-zero offset");
        
    const VideoProgram* program = context.GetVideoProgram();
    const Collection::Array& upload = program->GetUploadInfo(context.GetDetailLevel(), _ordinal);
        
    for (int32 i = 0; i < upload.GetSize(); ++i)
    {
        const VideoProgram::UploadInfo& uploadInfo = upload[i];
        switch (uploadInfo._type)
        {
            case GL_FLOAT_MAT4:
                glUniformMatrix4fv(uploadInfo._index, uploadInfo._size, 
                                   false, (float*)data + (uploadInfo._offset * 4));
                break;
            case GL_FLOAT_VEC4:
                glUniform4fv(uploadInfo._index, uploadInfo._size, 
                            (float*)data + uploadInfo._offset * 4);
                break;
        }
    }
}

Success

Finally I watched the game run at 60 FPS. So now the statistics are nicer. And only 5% CPU time spent in OpenGL. Woot.

[Screenshot: OpenGL statistics after the fix]

Graphics Drivers

OK, so the driver is optimized to set loose constants very quickly, but when they're presented as a block it just stalls waiting for the GPU to finish? I don't get it. The Windows drivers seem to handle uniform buffers properly. I understand writing the driver to the OpenGL spec - but geez, this makes uniform buffers mostly useless. It's known to be a uniform buffer, the calling code is updating it, it's marked as DYNAMIC_WRITE, so why isn't it doing exactly the same thing as my manual setting of each uniform value???? Arhghghghg.

I'm sure someone has a good answer as to how to update uniform buffers on Mac OSX, but I couldn't find it. Or maybe the answer is upgrading, or not using them? But this was debugging hours I didn't need to spend. Actually I take that back. Tracking down issues like this is pretty satisfying...

So I can just keep the code the way that works on Mac, but uniform buffers are so much more elegant. Plus what if Linux runs faster with uniform buffers instead of loose uniforms? Or if Windows does? Then I have to generate two different OpenGL shaders, and have different code per platform to get the same data to the GPU. Now I'm not so worried that the Windows OpenGL implementation was slightly different from OSX, because I can see the implementations are going to be driver dependent anyway...

OpenGL is cross platform? Sorta. Yikes.


Quick Update…

Things are progressing on both the Mac port and the current Beta.

OSX

I did figure out the slowdown with OSX running at 1 FPS. Apparently there are certain functions in OpenGL 3.2 that are just unusable in the OSX driver because they cause a full GPU pipeline flush and the CPU just waits around doing nothing while it happens.

Really this just means that the OSX version of the OpenGL renderer now diverges from the PC (and possibly Linux) version, which is okay; it just means maintaining the renderer over time is slightly more annoying, and that my shader compiler now outputs different vertex and pixel shaders for OpenGL depending on the target platform. Gah!

I’ll probably write more on this later after testing a few more GPU/OS configurations to make sure the PC version doesn’t have to change.

Beta Version

As for the beta, there will most likely be an update soon to fix a few issues. There are two very common bugs that people are reporting.

The first is for mods that had custom materials built. In the developer build, those materials would just fail to draw anything, but in a release build the material validity check was skipped (supposedly for performance reasons), and as soon as those materials ended up on screen, a crash would occur. That was an easy fix.

The other issue occurs when cutting down trees in an orchard. There’s a bug where the game tries to access the cut down tree after it’s removed, causing a potential crash. Also an easy fix.

Windows 10

What's stopping me from updating the beta right now is that there are bugs I have to dig into, but can't. Unfortunately Windows 10 (I think) is causing me issues, and I can't currently use many of the crash dumps that people have sent.

What's happening is that when the game detects the errors I added additional checking for, it forces an exception so that a proper debug crash dump can be output. The instruction at the top of the call stack happens to be in a system DLL when this occurs. If the crash dump is generated on Windows 10, my Windows 7 machine doesn't have debug information for (or even a copy of) the newer updated DLL, and therefore can't generate a proper stack frame to begin walking the stack to see where the error occurred.

Basically this means I can’t read these crash dumps, and I just need to upgrade my development machines to Windows 10.

I've been avoiding this because I hate doing OS reinstalls. This is mostly because there's a lot of software to install to get up and compiling and developing, and I have a lot of computers to update. I end up half-working on other machines but mostly looking at progress bars. There's also a big chunk of time spent making sure there's nothing local to the hard drives that isn't already on the server or NAS before they get wiped.

What I'll probably end up with is my main desktop and new laptop on Windows 10, a desktop machine that dual-boots Windows 7 and Linux, and my old development laptop becoming a Linux laptop.

Time to wait on HDD formats and progress bars…

Edit: Thanks for the tip about using the MS symbol server… it works well. Goes to show there's always something new you haven't used or known about after 20-some years of programming. Windows 10 is still a good idea though, as I've had a few Windows 10-specific bug reports…
