Shading Languages

So for better or worse, I’ve designed my own shading language and written a compiler that parses it, reports syntax and semantic errors, and, if there are no errors, outputs the program in several other shading languages. This way, I can write a shader once and target multiple platforms. Plus, now I own the Dragon Book!

In line with GLSL, HLSL, PSSL, and the others, I’m calling my language SRSL for Shining Rock Shading Language. I’m seriously debating calling it SRSLY, because that’s funny, and owls are cool. I just need to decide what the Y stands for. Or not.

Defining the entire grammar and all the extensive language rules is probably beyond the scope of this post, so I’ll just give an overview here with lots of examples.

First some basics:

The body text of the shaders is very C-like and should be easy to pick up coming from other shading languages. SRSL has all the intrinsic functions you’d expect from HLSL or GLSL: dot, cross, reflect, clamp, sin, cos, step, lerp, saturate, etc.

You’ll notice that all types in the examples are HLSL style – float, float2, float3, float4, int, int2, uint, uint2, float4x4, etc. I prefer this over the GL-style vec, ivec, uvec, and mat4. I think the HLSL style conveys more and better information about the built-in types.

None of these examples show it, but while loops, do loops, for loops, and conditionals are all available in the language. Switch statements, however, are not implemented.

One difference from C-like languages is how variables are cast between types. First, there is no automatic conversion between int types and float types, as there is in HLSL. Second, casting uses a different syntax. If you had an int4 and wanted to cast it to float4, the C-style cast would be:

(float4)variable;

But SRSL uses a postfix operator that reads left to right:

variable.cast(float4);

Another difference is the declaration of arrays. In SRSL, arrays are defined with the array size as part of the type, like so:

// two arrays of matrix
float4x4[64] transforms, bones;

Whereas the C style declaration would be:

float4x4 transforms[64], bones[64];

The language has no global variables. Everything is passed to the shader entry point, and if a function needs something, it has to be passed as a parameter. I’m pretty sure I’ve rarely or never used globals in shader programming, so I didn’t see an immediate need for the language to have them. (Obviously in other languages vertex attributes, uniforms, and textures can be globals, but that’s not what I’m talking about here.)

The only use case I can think of for globals is constants you want to define once, such as pi, or magic numbers used in more than one place. But rather than add another language feature, functions are used to return constant values. Because shader compilers inline aggressively, the call always goes away and just becomes a constant in the assembly.

For example:

float pi() { return 3.14159265359; }

Instead of something like:

const float pi = 3.14159265359;

So now I’ll get right to it – let’s take a look at a shader. Here’s the body of the simplest shader used in Banished.

program Debug
{
	stream(DebugVertex) uniforms(VSConstants) 
	vertexshader vs
	{
		float3 worldpos = tc.transforms[0] * float4(input.position, 1.0);
		output.position = gc.worldToProjection * float4(worldpos, 1.0);
		output.color = input.color;
	}
	stream(Interpolant) 
	pixelshader ps
	{
		output.color = input.color;
	}
	stream(PixelOutput)
}

My first goal for the language was to treat the vertex shader, geometry shader, pixel shader, etc. as a single program – they’re usually written as pairs or triplets (or however many stages are in use), so the language should treat them as such. By default, it should be hard to write a valid program where the parts of the pipeline (vertex, geometry, pixel) aren’t in sync, though writing a lone pipeline stage should still be possible if needed.

The way the language sees the graphics pipeline is as a stream of data, followed by some code that outputs another stream of data, then more code runs, and another stream of data is output.

If I ever add compute-style shaders to the language, they probably won’t follow this paradigm – compute shaders aren’t necessarily going to be used to put pixels on the screen. But since it’s my own language, I can always add specific syntax for whatever I need.

Anyhow, streams of data are defined by a struct. Each member of the struct can have an attribute that binds it to a specific hardware resource. Attributes aren’t required, except in a few cases – such as screen-space position, instance index, vertex index, depth output, etc. All the : 0, : 1, : 2 assignments below are optional, and the compiler will assign them if they aren’t specified.

// debug vertex describes the input data from the host program
struct DebugVertex
{
	float3 position : 0;  // bound to vertex attribute 0
	float4 color : 1;     // bound to vertex attribute 1
}

// The interpolant passes data from stage to stage
struct Interpolant
{
	float4 position : clipposition;  // special attribute for vs outputs
	float4 color : 0;  // interpolated attribute
}

// output color to render target. For multiple render targets, 
// multiple outputs can be defined, as well as depth output.
struct PixelOutput
{
	float4 color : 0;  // single color output
}

program Debug
{
	stream(DebugVertex) // stream definition (vertex attributes)
	vertexshader vs { ... }
	stream(Interpolant) // stream definition (passed from vs to ps)
	pixelshader ps { ... }
	stream(PixelOutput) // stream definition (output pixel colors)
}

For any shader, the stream defined above it is assigned to an automatically defined variable named ‘input’, and the stream after it is defined as ‘output’.

Oftentimes you want multiple pixel shaders per vertex shader, or vice versa. In that case, you can define multiple shaders between streams. I’d use this for alpha test, different texture blending, skinning or morphing vertices, etc. As long as the surrounding streams are the same, you can write something like this:

program Debug
{
	stream(Interpolant) 
	pixelshader ps1 { ... }
	pixelshader ps2 { ... }
	pixelshader ps3 { ... }
	stream(PixelOutput) 
}

This also shows how you can write just a pixel shader, without the vertex shader preceding it.

Another design goal I had was to remove repetitive code in the shaders. Repetition tends to creep in when you have a lot of similar shaders with small differences – an extra rim lighting highlight, or skinning a model. These are the same as a base shader with only a few lines added.

So the language allows you to insert shader code into a previously defined shader. In the next example, the shader ‘psalpha’ derives from ‘ps’ – all the code from the body of ‘ps’ is used, and then the clip instruction is appended at the bottom. This is very common when a version of a shader is needed that discards pixels based on the alpha channel.

program Normal
{
	stream(Interpolant) uniforms(PSConstants) textures(OpaqueTexture)
	pixelshader ps
	{
		float shadow = GetShadowValue(shadowMap, input.shadowProjection, pc.texelSize.x);
		float3 ao = sample(aoMap, input.texcoord.zw).xxx;
		float4 color = sample(diffuseMap, input.texcoord.xy);
		
		output.color.xyz = ComputeLighting(input.lightfog, color.xyz, shadow, 
                        float3(1.0, 1.0, 1.0), ao, pc.lightColor, pc.ambientColor, pc.fogColor);
		output.color.w = 0.0;
	}
	pixelshader psalpha : ps
	{
		// discard pixels when diffuse alpha is less than threshold
		clip(color.w - pc.alphaRef);
	}
	stream(PixelOutput)
}

Not only can you append code to the end of a shader, but you can insert it somewhere in the middle using a label.

Below is a shader that computes the position of a vertex for use in shadow mapping. Note the label keyword. At that location, any vertex can be modified in local space if code is inserted there.

program Depth
{
	stream(ModelDepthVertex) uniforms(VSConstants) 
	vertexshader vs
	{
		// get transform from list of instances
		float4x4 localToWorld = tc.transforms[input.instance];
		
		// decompress position from fixed point to float
		float3 position = input.position.xyz.cast(float3) * (1.0 / 512.0);
		
		label positionmod; // insertion point for vertex modification
		
		// apply local scale
		position *= localToWorld.row3.xyz;

		// transform to world, then to screenspace
		float3 worldPosition = localToWorld.cast(float3x4) * float4(position, 1.0);
		output.position = gc.worldToProjection * float4(worldPosition, 1.0);
	}
	stream(DepthInterpolant) 
}

When a skinned model needs to be rendered into the shadow map, just the skinning of the vertex can be inserted at the label positionmod. Note that the input stream for this shader is different, but as long as it contains all the inputs used by the parent shader, it will compile just fine.

program DepthSkin
{
	stream(ModelDepthSkinVertex) uniforms(VSConstants) 
	vertexshader vs : Depth.vs(positionmod)
	{
		position = SkinPosition(position, input.index, input.weight, bc);
	}
	stream(DepthInterpolant) 
}

At this point, you may be wondering about this fancy code insertion language feature, and why I’m not just using macros or functions to do the same thing.

With functions, each shader would have a long list of function calls, with many parameters, and many declarations for out parameters that are used by different parts of the shader. In my experience shaders are very volatile during development – they change all the time as features get added and removed, or new ideas are tested. Function signatures change frequently. If a function signature changes, I’d rather not spend the time to change 50 or 100 shaders to update the calling parameters. It’s easier to just have all the code inline and allow variables from one shader to be accessed without issue in another.

At least, that’s the idea – It’s worked well for reducing code size for Banished, and hopefully will do so for future projects as well.

Macros are something I’m not interested in implementing in the language; however, there’s a simple preprocessor in the language’s tokenizer, with simple #if, #ifn, #define, #else, #end, and #include directives. It allows for different compilation based on target and features, and for sharing common functions and structs.

You might see something like this in the shader, to disable computation of shadow mapping coordinates at the lowest shader detail levels.

#ifn DETAIL0
// only include shadow computation when detail level isn't 0
output.shadowProjection = lc.shadowProjection[0] * float4(worldPosition, 1.0);
#end

There is no requirement for preprocessor tokens to be the first item on a line, so you might also see something like this, with a conditional compile inline. DirectX 9 has no instance index available as a shader input, so it has to be faked somehow. In Banished, it’s currently done like this:

// get transform from list of instances
float4x4 localToWorld = tc.transforms[#if DX9 input.position.w #else input.instance #end];

Functions can be defined outside of a program block for shared functionality, and look like typical C-style functions:

float3 SkinPosition(float3 position, int4 index, float4 weight, BoneConstants bc)
{
	return ((bc.transforms[index.x] * float4(position, 1.0) * weight.x) +
		(bc.transforms[index.y] * float4(position, 1.0) * weight.y) +
		(bc.transforms[index.z] * float4(position, 1.0) * weight.z) +
		(bc.transforms[index.w] * float4(position, 1.0) * weight.w)).xyz;
}

Shaders use more than just vertex inputs – there are also uniform constants and textures that need to be passed to the shader. In designing the language, I wanted the use of constants and textures and their bindings to registers to look exactly like binding stream variables to hardware registers.

If you look back at the first example I presented, the vertex shader uses several constants, namely tc and gc. These are defined like this:

// vertex constants that can be accessed anytime and don't change per render
struct GlobalConstants
{
	float4x4 worldToProjection;  // used to transform to screenspace
	float4x4 worldToCamera;      // used to transform to cameraspace
	float4 cameraPosition;       // camera location
	float4 time;                 // current time in seconds
	float4 fog;                  // values for computing linear fog
	float4 fogColor;             // fog color
}

// list of instance transforms, changes per draw
struct TransformConstants
{
	float4x4[128] transforms;
}

// VSConstants is a list of all constant buffers available to the shader. 
// If used as constants input, this struct can only contain fields of other
// user defined structs
struct VSConstants
{
	GlobalConstants gc : 0;   // bound to constant buffer 0
	TransformConstants tc : 3;  // bound to constant buffer 3
}	

When you want to use a set of vertex constants in a shader program it’s referenced like this:

stream(Vertex) uniforms(VSConstants)
vertexshader vs { ... }
stream(Interpolant) uniforms(PSConstants)
pixelshader ps { ... }

The idea here is that there’s no need for a whole lot of loose global uniform constants (or constant buffers) like in HLSL and GLSL. The host program only provides certain constants – they are generally known to the shader program and are available all the time. This way they are explicitly defined, and once set up, it’s hard to make a mistake, such as using a uniform constant meant for a pixel shader in a vertex shader.

For cases where the constants are different – say, for drawing a specific type of geometry – a different set of constants can be specified, ensuring that only the available constants are actually used by the shader.

Textures are defined in a similar manner. The texture struct can only contain texture types.

struct PSTextures
{
	texture2d diffuse : 0;  // here the attribute defines which index
	texture2d snow : 1;     // the texture / sampler is bound to.
	texture2d ao : 3;
	shadow2d shadow : 5;
}

stream(Interpolant) uniforms(PSConstants) textures(PSTextures)
pixelshader ps 
{
        ...
        float4 color = sample(diffuse, input.texcoord);
        color *= sampleproj(shadow, input.shadowProjection); 
        ...
}

I’m going to digress a little bit here, and you’ll see some of my thought process in designing this language. The language is still new and may need some tweaking – this is one of those places.

If you’ve been paying close attention to the examples, you’ll notice a glaring inconsistency between uniforms and textures on one hand, and shader inputs and outputs on the other. Shader inputs and outputs are automatically defined variables of the input and output type – input.position, input.texcoord, output.position, output.color, etc.

Textures and uniforms are currently used without a name – the variables inside the struct are simply declared as locals to the shader. This is okay, but I’ve been trying to decide if I should make it consistent with the other shader inputs.

Currently, uniforms and textures are accessed like this:

float4 position = gc.worldToProjection * input.position;
float4 color = sample(diffuse, input.texcoord);

But I’ve been thinking about changing it to

float4 position = uniform.gc.worldToProjection * input.position;
float4 color = sample(textures.diffuse, input.texcoord);

I like this change for a few reasons. First, it’s consistent with the way streams are handled, and second, it stops you from inadvertently polluting the local variable namespace with names you might otherwise want to use. One day I might add a new texture to a struct, and its name could clash with an existing local in a shader – requiring a name change to one item or the other.

On the flip side, streams could have the input. and output. prefixes dropped as well, but too often I want to use the same names in both structs (position, texcoord, color, etc), so prefixing them with input. and output. is better in my opinion.

In the case of textures, I might want a texture named diffuse and a variable named diffuse to represent the resulting color when the texture is sampled.

float4 diffuse = sample(textures.diffuse, input.texcoord);

That’s nice and fairly clear as to what the variable holds.

The real downside here is for uniforms. Having to write something like ‘uniform.gc.worldToProjection’ all over the place may be overly verbose; however, it’s absolutely clear what’s going on. I can think of a few ways to reduce the length, such as allowing a user-specified name when declaring uniforms and textures, like this:

stream(Interpolant) uniforms(PSConstants, u) textures(ModelTextures, t)
...
float4 position = u.gc.worldToProjection * input.position;
float4 color = sample(t.diffuse, input.texcoord);

On the other hand, I could scope the textures with a variable and leave uniforms alone. Really, this is just sugar on the language. It works fine as is, and I’ll probably make a decision one way or the other the more I use it.

Changing Banished to use the new language (once the compiler was written and debugged) has been fairly painless, and the reduction in code redundancy is very good. (I’ve actually found several bugs in the original shader code by doing the conversion. Whoops!)

Banished has also been a good test bed for a variety of shaders – I think it would be hard to design something like this without a real world test case.

Everything is pretty much done, but I’m sure the compiler still has bugs in it that I’ll find as I write more shaders. There are also missing features I’d like to add at some point. Depending on what else I’m working on, I may not add them until I need them.

There’s no texture array type yet, and there aren’t sampling functions to specify which mip to use (though a new texture type and sampling function are fairly easy to add). There are no multidimensional arrays, but I can’t remember the last time I even used one in C++. Geometry shader support isn’t finished. And there’s no tessellation shader as of yet.

Phew. Don’t fool yourself, compilers and languages are big projects.

So that’s SRSL (or SRSLY…) in a nutshell. It works, I can draw stuff with it, and it’s cross-platform ready. Now I can finally finish the OpenGL graphics renderer. Woot.

Porting: OpenGL and Shaders

Getting a new graphics API running in an existing game engine is somewhat painful – there’s always some kind of issue or architectural oversight that doesn’t meld well with new APIs. But the game engine for Banished is written to abstract all the graphics functionality behind a common API, so porting it isn’t as awful as it could be. The game can already switch between DirectX 9 and 11 at runtime, so adding OpenGL to the mix shouldn’t be too bad.

However, there’s a critical mass of code required to get even the simplest graphics to display.

Vertex Buffers need to be implemented to store mesh data, as do Index Buffers. Vertex Layouts need to be defined to tell the GPU how to read that mesh data. Constant Buffers are needed to send uniform constant data to the GPU for use in pixel and vertex shaders. Render State needs an implementation – it tells the GPU how to blend new geometry into the scene (transparent, solid, etc), how to update depth and stencil buffers, and how each triangle gets rasterized. Vertex Shaders are needed to transform mesh data into something that’s a triangle on screen, and Pixel Shaders are needed to read textures, perform lighting, and output the final color for each pixel.

All that just to get a simple triangle on the screen.

Once that triangle is displayed, textures need to be supported – there are 2D Textures, Cubemaps, compressed formats, mipmaps, copying of textures, and scaling of images – lots of code there. Then Render Targets need to be implemented to be able to render to off-screen buffers and render shadow maps.

Only then will the renderer be complete. Once I get all that code done, the game should just render properly with OpenGL. It’s all common code at the level of drawing a building, the terrain, UI, or anything else.

It’s only about 80K of code, but getting it all right takes some debugging time.

OpenGL has a few items that DirectX doesn’t – like Vertex Array Objects and Shader Programs that pre-link vertex and pixel shaders. Not a big deal, but these are things my engine wasn’t designed around, so there are a few hoops to jump through.

All this implementation is fairly mechanical – with a good OpenGL reference none of it is terrible – but I’ve been struggling with how to implement one particular item: vertex and pixel shaders.

At its simplest, a shader is a short program that tells the GPU how to deal with mesh data, or how to color a pixel. Each API (DirectX 9, 11, OpenGL) tends to have its own way of specifying these programs, so for each API, a slightly different version of the same program is required.

For example, here’s a simple vertex program written for DirectX 9, HLSL for Shader Model 2.0/3.0. (I didn’t actually compile any of these… so forgive minor errors)

uniform float4x4 g_localToView;

void main(in float4 iposition : POSITION, in float4 icolor : COLOR0, 
          in float2 itexture : TEXCOORD0, out float4 oposition : POSITION, 
          out float4 ocolor : COLOR0, out float2 otexcoord : TEXCOORD0)
{
    oposition = mul(g_localToView, iposition);
    otexcoord = itexture;
    ocolor = icolor;
}

Here’s the same vertex program, written for DirectX 11 – HLSL 4/5 – fairly similar, but there are some differences. There are more differences in more complex cases, such as texture sampling and dealing with input and output registers.

cbuffer globals
{
    float4x4 g_localToView;
}

void main(in float4 iposition : POSITION, in float4 icolor : COLOR0, 
          in float2 itexture : TEXCOORD0, out float4 oposition : SV_Position, 
          out float4 ocolor : COLOR0, out float2 otexcoord : TEXCOORD0)
{
    oposition = mul(g_localToView, iposition);
    otexcoord = itexture;
    ocolor = icolor;
}

Now, the OpenGL GLSL 3.3 version:

uniform Globals
{
    mat4 localToView;
} globals;

in vec4 iposition;
in vec4 icolor;
in vec2 itexture;

out vec4 ocolor;
out vec2 otexcoord;

void main(void)
{
    gl_Position = globals.localToView * iposition;
    otexcoord = itexture;
    ocolor = icolor;
}

Currently, the game engine supports just DirectX 9 and 11 – and those shaders are close enough that it’s possible to switch between them using some generated code for the constants and some macros to account for language differences. It’s probably possible to do the same with GLSL, but I’m not keen to do so.

I’m generally not a coder that worries about future proofing code – I write what I need now, and refactor when needed. But I already refactored my shaders when I added DirectX 11 code, and I’m doing it again now for OpenGL’s GLSL language.

If one day I want to include support for other hardware with programmable GPUs, more versions of these programs are required. Banished already has 50-60 different shaders, so for each new platform supported, the same programs have to be implemented over and over. There are also other shading languages out there that I’m not currently targeting but may in the future – there are differences for OpenGL ES, OpenGL 2, OpenGL 4, Vulkan, Xbox 360, Xbox One, PS3, PS4, AMD Mantle, and DirectX 12. There are probably more I don’t know about.

There are a few solutions to this:

Option 1. Just deal with multiple versions.

  • This is really not ideal. What’s frustrating is that during development these shaders are often in flux – and updating many programs that do the same thing is error-prone and a waste of time.

    This would be like writing Banished in C on OSX and C++ on Windows. It’s not as much code, but you get the idea. There’s not much point in implementing the same thing different ways when the result is identical.

    Worse is that since I use the PC for development, I’d probably get lazy, only update the PC shaders, and then when I want to test on another platform, would have to spend all day updating shaders in other languages.

Option 2. Use macros, find/replace, and partially generated code to account for language differences.

  • This is the option currently in use. It works, but it’s not future-proof. The more languages that are supported, the more macros are used, and the messier the code gets. If you’ve spent time modding Banished and looked at the shaders, you’ll notice all the crap around the actual body text of the shaders just to make them compatible with DX9/11.

    There’s also a lot of #ifdef/#else/#endif code sections to account for differences between the two. It’s messy.

    I can get this to work for OpenGL, but it still requires at least one more refactoring of all the shaders.

Option 3. Use an existing tool, like HLSL2GLSL

  • This is a good option. There are some tools to convert from HLSL to GLSL, but they don’t support as many languages as I’d like. It could work for the OpenGL version of Banished, but that’s about it.

    I could probably modify the tool to output other languages, but I really don’t like messing with other people’s code. (Yup, I’m one of those coders…)

    Additionally, I’d prefer the language I use not be pure HLSL of some specific version. I’d like a language that will be updated to support new hardware features. Nvidia provided the Cg language for a while, which targeted many platforms, but its development has been discontinued, so it’s a no-go for future-proofing.

Option 4. Design a new shading language that can be read in and then output any language.

  • To me, this is the best option, but I’ve been trying to decide whether it’s worth the effort. I’d basically have to write a compiler. It’s not a small undertaking, but it has advantages.

    There are things I never really liked about HLSL, and I’m finding I don’t like GLSL that much either. Both languages are very C like, but they don’t need to be – I’d rather see a language that is designed around the hardware rather than looking like the code that everything else is written in with special syntax just to bind certain variables to hardware features.

    My own language would give the freedom to implement just the features I need, and add features as they are needed. I’ve been thinking about my ideal shading language for a while (being a graphics programmer and all) but never really saw a need for it.

    With my own language, my engine would become API agnostic. I’ve already got my own graphics layer that hides the platform-specific details, so having a shading language to match makes sense.

    Plus, never having written a compiler, it’s one of those programming tasks that my fingers are begging to implement.

    To do this, a fairly feature complete compiler with proper parsing, semantic checking, type checks, symbol tables, full error messages, and more would be required. It’s a big project.

Option 5 – OpenGL Only

  • I could just limit the engine to using OpenGL. However this is more restrictive than I’d like.

    It would support Windows, OSX, and Linux, but in the future it would limit me to only working on OpenGL systems. Granted, it’s a pretty large set of systems, but I don’t see any reason to do this since I already have DirectX implementations working and a framework for supporting multiple graphics APIs and hardware.

    I’d like to make console games again one day, and OpenGL tends to either not be available on them, or not be the base graphics library that gives full GPU control.

Option 6 – Node Graph or Alternate Representation

  • Shaders could be built using an alternate representation, like the node graph material editor in Unreal and some other game engines. The idea is to use small blocks each with their own snippet of code, and link them together visually. For each target language, a different code snippet would be used per node.

    This type of system can be a great tool for prototyping features and allows non-coders to generate shaders. It’s fairly hard to generate invalid code. It also allows for multiple types of outputs to any shading language.

    However, this type of tool takes a fair bit of code to implement, and quite a bit of UI. There’s still a significant code generation step, plus hand coding of each node in the graph for each language. In an environment without a large number of artists, it may also be overkill in terms of toolset.

    The game engine that Banished runs on only has text inputs without any visual tools, so adding this type of tool probably doesn’t make a whole lot of sense.

Decision?

I’m writing a compiler. It’s the most flexible solution, so that’s what I’ve been working on for the past two months (well, a few other things too, including some other tools, and going to GDC).

I’ve been thinking about this language issue for a long time, designing and changing the syntax, and making sure I can extend it to geometry shaders (and other hardware features) at some point. I’ve also been looking at the feature set that all current shading languages are providing and making sure I can provide the full feature set I need for the games I make and the hardware I’m targeting.

While it’s been a slower road to getting done, I’m pretty sure at this point that my own language – one that does just what it needs and has a clean syntax – is the best thing.

I still have to do one more refactor of all the shaders in the game – but that’s required anyway to make OpenGL work, regardless of the approach. If I end up porting Banished to some other platform with a different graphics API, I’ll be able to just write a new output for the compiler. The next game I make will be portable to any system.

At this point I’ve got most of the compiler written. It reads in my syntax and can output GLSL 3.3, HLSL 2/3, and HLSL 4/5, and I’ve converted enough shaders that the UI, terrain, and buildings show up in game.

As a side motive, I’m excited to be writing a compiler – it makes me start thinking that the next game I make could possibly have a scripting language for game code – but that’s a ways off.

Porting: SIMD

The work on making the code portable continues.

Way back when I first got an i7 CPU, I wrote a global illumination ray tracer that made really nice pictures while pushing the CPU to its limits. The math code for the ray tracer used the CPU’s SIMD instructions to perform the math operations more quickly, and I got very used to coding with them. The performance gain using them was fairly large.

For Banished I wanted to get started writing code and get the game done quickly – so I tended to use things I knew very well: everything from the choice of compiler and modeling tools to algorithms and APIs. For math code (which is used by pretty much anything in the game that moves, animates, or is displayed in 3D) I chose to use the SIMD intrinsics.

SIMD stands for Single Instruction, Multiple Data – meaning each instruction operates on more than one value at a time.

For example the single instruction:

c = _mm_add_ps(a, b);

would be the same as doing 4 additions:

c.x = a.x + b.x;
c.y = a.y + b.y;
c.z = a.z + b.z;
c.w = a.w + b.w;

If all goes well, I should be able to compile on OSX and Linux using the same SIMD instructions, as I’ll still be targeting Intel/AMD CPUs. But just in case it doesn’t work or compile, or more importantly, if I port to a platform without those intrinsics, I’d like the code to be ready.

In writing the SIMD code, I didn’t keep any sort of portable C++ version around, which is not so good. Having plain C++ code as reference material is nice – especially for something that’s been optimized to be fast and is no longer easily readable, or for some equation that was hard to derive when the scrap of paper it was worked out on got thrown away…

The porting process is pretty easy. Most SIMD code just performs 4 operations at once, so in plain C++ code, all the operations are generally written out explicitly.

The C++ code I wrote is now much easier to look at. But I’ve also left the code open enough that if another platform has its own set of differing SIMD instructions, I can write code using them on that platform if desired, or just use the C++ version.
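
As a minimal sketch of what that dual-path setup might look like – Vector4, Add, and the USE_SSE define are hypothetical names here, not the engine’s actual code:

#include <xmmintrin.h>

// Hypothetical portable vector type. The SSE path and the plain C++
// path produce identical results, and the C++ path doubles as readable
// reference code.
struct Vector4
{
#if defined(USE_SSE)
    __m128 _v;
#else
    float _x, _y, _z, _w;
#endif
};

inline Vector4 Add(Vector4 a, Vector4 b)
{
    Vector4 c;
#if defined(USE_SSE)
    c._v = _mm_add_ps(a._v, b._v);  // one instruction, four additions
#else
    c._x = a._x + b._x;             // each operation written out explicitly
    c._y = a._y + b._y;
    c._z = a._z + b._z;
    c._w = a._w + b._w;
#endif
    return c;
}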

The worst part of porting math code like this is typos. After all, there aren’t too many different ways to write a matrix multiply, matrix inverse, or quaternion interpolation, so any errors tend to be errors in typing. You write .x, .y, .z, or .w (or similar) so many times that they tend to just blur together. You look at the code so many times knowing what it should be that you don’t see the glaring error.

So after writing the C++ code, everything appeared to work correctly, except I noticed some deer would twitch during their animations.

It took me more time debugging to find the two errors that caused this than the time it took to port the entirety of the math library to generic C++.

In one place I had mistakenly written

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (b._v._3 * b._v._3);

instead of

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (a._v._3 * b._v._3);

And in the other location

Vector3((_r._0 * a._r._0) + (_r._1 * a._r._0) + (_r._2 * a._u._0),
	(_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1),
	(_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

instead of

Vector3((_r._0 * a._r._0) + (_r._1 * a._f._0) + (_r._2 * a._u._0),
	(_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1),
	(_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

Hopefully you can spot the errors faster than I did – it’s easier without additional math code above and below. I guess my brain saw what it expected to see, rather than what was really there.

I’m actually fairly amazed the game displayed mostly correctly with these errors – the first error was part of Quaternion::Interpolate, which is used on every object displayed in game every frame, as well as during animation. The second error was in the code that multiplies one orthonormal basis by another. Also used very heavily.

And for those of you thinking I should have unit tests for this kind of thing – yeah, math is definitely a system that can be unit tested. But eh, unit tests – that’s a discussion for another day.

Anyway, lots of debugging later, the math code is now portable, and as far as I can tell the game works properly, with identical output.

Next up is an OpenGL version of the renderer, which I’ve got half done… more to come later.

Porting: UTF-8

With the mod kit and Steam Workshop out there, I’ve been working on porting the game to OSX and Linux, cleaning up code, and writing some new code. I’ll still be fixing bugs and making small changes, but my focus now is going to be on ports and building prototypes for new games.

I’ve built a new machine for developing on Linux and bought some Macs, so I’m all set with hardware. It’s been a while since I used makefiles, other IDEs, gcc/clang, or did any sort of *nix development, but I have done it, so I’m not starting from ground zero.

But before I can actually go about working on the new hardware and compiling things, there are a few issues in the Banished code base that need fixing. I planned on porting the code base one day, so things are nicely set up as common code and platform-specific code, but there are still some issues I didn’t properly account for.

There are code portability issues. I want to make the code as portable as possible – there’s a chance I’ll be making games for more than just Windows, Mac, and Linux one day, so I might as well try to fix them now.

The first issue is with text.

When I wrote the code, I assumed it would one day support more than just English. Back when I made console games I never worked on any of the text code, so all I really knew was that the system used two bytes per character in a string of text, and this was fine for all the languages the game was translated into – generally EFIGS and maybe Japanese. Since the Windows API takes wchar_t for all filenames and text, that’s what I used, and I happily and naively coded away using wide strings. This probably would have been okay if the game stayed on Windows.

But it’s sort of a mistake for cross-platform code. I didn’t even use it correctly. The released game doesn’t actually use UTF-16 – it’s just UCS-2 – so there are some languages with characters that are unrepresentable.

Not only that, but API calls on other systems generally don’t take wchar_t* – they take char* instead. Certainly I could write conversion functions and convert UCS-2 to UTF-8 as needed, but that’s not really ideal.

The final problem is that the size of wchar_t on Windows is 2 bytes, but this isn’t guaranteed to be the case on other platforms. The size issue shouldn’t really matter – but it’s possible that somewhere I multiplied a string length by 2 instead of multiplying by sizeof(wchar_t). That would cause problems.
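
As a hypothetical example of the kind of bug that hides in code like this:

// Buggy: assumes wchar_t is 2 bytes. That's true on Windows, but wchar_t
// is typically 4 bytes on Linux and OSX, so this copies half the string.
memcpy(dest, src, length * 2);

// Portable: let the compiler supply the element size.
memcpy(dest, src, length * sizeof(wchar_t));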

I’ve known about these issues for a while, but I generally code something that works first, and don’t refactor until I have to. And now I have to.

So my big fix recently was to remove the use of wchar_t and use char instead (and make sure there are no multiplies by 2). Not only that, but all strings need to use UTF-8 properly – when printing text, or reading text from resource files, the right thing needs to be done to decode and show the right character.

The first step of doing this is easy. You can find and replace a bunch of stuff using a text editor.

  • wchar_t becomes char
  • L"string" becomes "string"
  • wsprintf(buffer, "%ls", param) becomes sprintf(buffer, "%s", param)
  • wcscpy, wcslen, wcscat become strcpy, strlen, strcat

Next I fixed my String class to properly use char. No big deal.

All the serialization code that was previously written to serialize wchar_t became serialization of char. This conflicted with the serialization of single-byte signed values, which were already typed as char.

For example

void Serialize(char v);             // serialize a signed 8 bit value
void Serialize(unsigned char v);    // serialize a unsigned 8 bit value
void Serialize(wchar_t v);          // serialize a character

Became

void Serialize(signed char v);     // this looks ambiguous to me, because I think
void Serialize(unsigned char v);   // of 'char' as signed, even though it's
void Serialize(char v);            // compiler dependent and compiles ok...

This prompted me to redeclare all integers with typedefs and make a distinction between a character and an 8-bit integer.

  • signed int became int32
  • unsigned int became uint32
  • signed short became int16
  • unsigned short became uint16
  • signed char became int8
  • unsigned char became uint8
  • char stays the same, and is only used for characters and strings.

int64 and uint64 were already typedef’d. These typedefs are made in platform code, so per-platform types can be declared using the right sizes, regardless of their names on each platform.
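
A minimal sketch of what such a platform header might look like – the native types chosen here are assumptions that hold on typical 32/64-bit Windows, Linux, and OSX compilers, and each platform would get its own version:

// Hypothetical platform header. Fixed-size names map to whatever native
// type has the right size on this particular compiler.
typedef signed char        int8;
typedef unsigned char      uint8;
typedef signed short       int16;
typedef unsigned short     uint16;
typedef signed int         int32;
typedef unsigned int       uint32;
typedef signed long long   int64;
typedef unsigned long long uint64;

// char is deliberately left alone - it's reserved for characters
// and UTF-8 strings.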

So the overloads then became

void Serialize(int8 v);     // serialize a signed 8 bit value
void Serialize(uint8 v);    // serialize an unsigned 8 bit value
void Serialize(char v);     // serialize a character

This took care of the type conflict, since the compiler treats ‘signed char’ as a different type than ‘char’. It also clarifies what the types are used for in the code. If I see char in code, it means a character, and char* means a null-terminated string. If I see int8 or uint8, it’s a number stored in 8 bits. This makes things a bit more clear.

This was also a big find and replace – except for char, where I had to go through and determine whether each use was an int8 or actually an ASCII character. This wasn’t too hard though, as most previous uses of text used wchar_t, so most chars became int8. It’ll probably take me a few weeks to get used to typing int32 instead of just int.

One issue that came up with the type name change was the existing source data. If you’ve been modding the game, you might see this in the data:

int _value = 400;

with the new code, it would become

int32 _value = 400;

While I could version all the text data, I don’t really want to break mods that people have already set up and make them do the find and replace on their own, so there’s some versioning code that reads the old type name. For the next game I’ll get rid of that versioning code, but it’s going to stay in Banished for now.

Next came the hard part. Input data could be in any format – generally text files would be in ASCII if I created them, or UTF-8 if I used a symbol like the Euro sign. But mod creators and translated strings could use UTF-8, UTF-8 with a byte order mark, UTF-16 big-endian, or UTF-16 little-endian. Really, it shouldn’t matter what a mod creator uses as a text format – I just want the game to load it and do the right thing.

While there are libraries for dealing with this, I tend to write my own code, since I don’t like different code styles mixed into my code, type conflicts, or dealing with crazy code licenses. So, 400 or so lines of code and lots of debugging later, I had two great functions. One detects the character encoding using byte order marks and by checking for valid UTF-8. The other converts text from any one encoding to another. There are also support functions for decoding strings one character at a time.
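
As a rough sketch, detection by byte order mark might look something like this – the function and enum names are hypothetical, and a real version would also scan BOM-less input for valid UTF-8 sequences rather than just assuming ASCII:

enum TextEncoding { Ascii, Utf8, Utf16LittleEndian, Utf16BigEndian };

// Detect text encoding from the first bytes of a file.
TextEncoding DetectEncoding(const unsigned char* data, int size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Utf8;              // UTF-8 byte order mark
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Utf16LittleEndian; // FF FE
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Utf16BigEndian;    // FE FF
    return Ascii;                 // no BOM - would need UTF-8 validation here
}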

After that it was a simple matter to convert all text files to UTF8 as they are opened by the game engine.

Once all the strings in memory were UTF-8, they had to be decoded when used for output – the font rendering code now decodes UTF-8 into actual characters, consuming 1 to 4 bytes of the string at a time, before looking up each glyph in the font texture.
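
A sketch of what that per-character decode might look like – hypothetical code again, with no error handling for malformed sequences:

// Decode one code point from a UTF-8 string, advancing the pointer
// by 1 to 4 bytes.
unsigned int DecodeUtf8(const char*& s)
{
    unsigned char c = (unsigned char)*s++;
    if (c < 0x80)                       // 1 byte:  0xxxxxxx
        return c;
    if ((c & 0xE0) == 0xC0)             // 2 bytes: 110xxxxx 10xxxxxx
        return ((c & 0x1F) << 6) | (*s++ & 0x3F);
    if ((c & 0xF0) == 0xE0)             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        unsigned int r = (unsigned int)(c & 0x0F) << 12;
        r |= (*s++ & 0x3F) << 6;
        r |= (*s++ & 0x3F);
        return r;
    }
    unsigned int r = (unsigned int)(c & 0x07) << 18;  // 4 bytes: 11110xxx ...
    r |= (*s++ & 0x3F) << 12;
    r |= (*s++ & 0x3F) << 6;
    r |= (*s++ & 0x3F);
    return r;
}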

The Windows API can be compiled to use either wide strings or not, but I left it as wide and made a WideString class to deal with the conversion to and from the internal UTF-8 String format. The WideString class only exists in Windows compilations, and is only used in files that would be rewritten per platform anyway.

After all that, the game compiled and ran just fine, but it wouldn’t load old data or save games. This is bad. I can’t go breaking save games when a new version comes out. And I’d like to keep all existing mods working.

So then I had to version the old strings that are stored in saves and data on disk. Strings are just written as an int32 length, followed by all the bytes of data – so new strings set the high bit on the length to mark them as the new version. I doubt the game will ever have a single string of text 2 gigabytes long, so this will be okay. If an old string is detected – that is, the high bit is unset – it’s converted from UCS-2 to UTF-8 on load, and the game happily continues loading older data.
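
A sketch of how that load path might work – Stream, String, and ConvertUcs2ToUtf8 are hypothetical stand-ins for the engine’s actual serialization types:

// Hypothetical string deserialization with high-bit versioning.
void ReadString(Stream& stream, String& out)
{
    uint32 length = stream.ReadUInt32();
    if (length & 0x80000000)
    {
        // New format: high bit set, UTF-8 bytes follow.
        length &= 0x7FFFFFFF;
        out.Resize(length);
        stream.Read(out.Data(), length);
    }
    else
    {
        // Old format: UCS-2 characters, converted to UTF-8 on load.
        uint16* wide = new uint16[length];
        stream.Read(wide, length * sizeof(uint16));
        out = ConvertUcs2ToUtf8(wide, length);
        delete[] wide;
    }
}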

Again this versioning code will hopefully go away when I make another game, since it won’t be needed.

So now the only wchar_t that exists is in platform code on Windows that won’t be compiled on OSX or Linux, and the game properly supports various character encodings.

Text encodings are one of those things I never really want to think about – and now that I’ve spent a while dealing with it, hopefully I never have to deal with it again. Phew.

Update to Steam 1.0.4

A new version of the game has been released on Steam that changes the way mods are uploaded to Workshop. Adding a mod to Workshop now requires the original compiled source data to be available for uploading. This ensures that only mod authors can upload their content.

Mods must be rebuilt with Mod Kit 141123 before this will work. Non-Steam versions should still function properly regardless of whether mods are built with the newest mod kit or the previous one.

Changes for 1.0.4 Build 141123

  • Uploading mods to Steam Workshop now requires the user to have, on disk, the original compiled data from before packaging (generally .crs files). The data must be in the same location it was built in, and must match the files in the package. For example, if the mod was built in C:\BanishedKit\mymod\bin\, then all files created during mod compilation must remain in that directory for the mod to be added to Steam Workshop.

    Without the original data, the Add to Workshop and Update on Workshop buttons are unavailable. Current mods need to be rebuilt with Banished Kit 141123 before they can be updated.

    No other changes to the game or mod kit have been made, so non-Steam users can continue using build 141103.