Small Ideas, Big Changes

As I’ve been working lately, I’ve been very aware that seemingly simple things I wanted to have in a new game have huge implications for the difficulty of making the game and engine.

The first thing I wanted to change from the previous game was the ability for the camera to look up toward the horizon. In Banished the camera always looked down, and the higher the camera got, the more limited its ability to look up became.

This was great for performance – it limited the number of objects drawn, and no graphics models needed simpler level of detail (LOD) versions. The terrain was simple. I didn’t have to draw a fancy sky, and nothing had to be drawn past the mountains at the edge of the map. The drawback was that you couldn’t ever see the whole settlement once it got large. You also couldn’t move around very quickly.

Now with that simple change – looking up and zooming farther out – there’s more work to do. All models need some sort of LOD to keep the triangle count down and keep highly detailed objects from aliasing in the distance. The terrain now has LOD as well. Having LOD on the terrain makes texturing it and placing decals on it harder, and brings in other problems like objects disappearing or floating because the terrain is less detailed in the distance. I also have to draw all the way to the horizon: ocean or mountains, the sky, the sun.

Drawing shadows also takes more time. Instead of a single shadow map based on what’s visible, drawing to the horizon takes multiple shadow maps, various tricks for distant shadows, and more complex pixel shaders. This also increases CPU load, since four shadow maps are drawn instead of one.
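To give an idea of the cascade setup, split distances are usually chosen by blending a logarithmic distribution with a uniform one. This is just a minimal sketch – the cascade count, far plane, and lambda here are illustrative values, not the engine’s actual numbers:

#include <cmath>
#include <vector>

// Split the view range [nearZ, farZ] into cascades. 'lambda' blends a
// logarithmic distribution (tight near the camera) with a uniform one.
std::vector<float> ComputeCascadeSplits(float nearZ, float farZ,
                                        int cascadeCount, float lambda)
{
    std::vector<float> splits(cascadeCount + 1);
    splits[0] = nearZ;
    for (int i = 1; i <= cascadeCount; ++i)
    {
        float t = float(i) / float(cascadeCount);
        float logSplit = nearZ * std::pow(farZ / nearZ, t);  // logarithmic
        float uniSplit = nearZ + (farZ - nearZ) * t;         // uniform
        splits[i] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;  // e.g. ComputeCascadeSplits(1.0f, 4000.0f, 4, 0.75f)
}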

The renderer also has to change to handle this. While it works just fine, it’s slow. Instead of dealing with, say, 4000 visible objects in view, my current test scenes are dealing with 40000. So the renderer has to be changed to cull and draw 10 times the number of objects in the same amount of time as before. Granted, GPUs are a bit faster than before, but CPU speed hasn’t increased the same way.
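The culling itself doesn’t have to be clever to handle those counts – a flat array of bounding spheres tested against the six frustum planes goes a long way before any fancier structure is needed. A rough sketch of the idea (the data layout and plane convention are assumptions, not the actual renderer):

#include <vector>

struct Plane  { float nx, ny, nz, d; };   // nx*x + ny*y + nz*z + d >= 0 means inside
struct Sphere { float x, y, z, radius; };

// Gather indices of spheres at least partially inside the frustum.
void CullSpheres(const Plane frustum[6], const std::vector<Sphere>& bounds,
                 std::vector<int>& visibleOut)
{
    visibleOut.clear();
    for (int i = 0; i < (int)bounds.size(); ++i)
    {
        const Sphere& s = bounds[i];
        bool inside = true;
        for (int p = 0; p < 6; ++p)
        {
            float dist = frustum[p].nx * s.x + frustum[p].ny * s.y +
                         frustum[p].nz * s.z + frustum[p].d;
            if (dist < -s.radius) { inside = false; break; }  // fully outside this plane
        }
        if (inside)
            visibleOut.push_back(i);
    }
}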

After the camera change, the next change I thought wouldn’t be so bad was arbitrary layout of buildings. No grids. Grids were great in Banished because they made lots of things simple. Pathfinding was on a grid, placement of objects was on the grid, everything used the grid.

But I want to see organic layouts. I’d like farmed land to follow terrain and butt up against a river. I’d like buildings and roads to be placed in whatever natural way they end up being.

But with the grid gone, lots of things are more complicated. Placing objects now requires checking an arbitrary shape against everything else already placed. This requires a spatial subdivision structure to be fast. It’s also harder to visually convey the overlap and how far you might have to move something to get it to fit.
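The broad phase can be something as simple as a uniform grid keyed by cell: a new footprint only gets compared against objects in the cells it touches, and the exact shape-vs-shape test (not shown) only runs on those few candidates. A sketch, with illustrative cell size and key packing:

#include <cmath>
#include <unordered_map>
#include <vector>

struct Aabb { float minX, minY, maxX, maxY; };

class PlacementGrid
{
public:
    explicit PlacementGrid(float cellSize) : _cellSize(cellSize) {}

    void Insert(int objectId, const Aabb& box)
    {
        ForEachCell(box, [&](long long key) { _cells[key].push_back(objectId); });
    }

    // Collect ids of nearby placed objects; ids may repeat if an object
    // spans several cells, so callers should tolerate duplicates.
    void Query(const Aabb& box, std::vector<int>& out) const
    {
        ForEachCell(box, [&](long long key) {
            auto it = _cells.find(key);
            if (it != _cells.end())
                out.insert(out.end(), it->second.begin(), it->second.end());
        });
    }

private:
    template <typename Fn>
    void ForEachCell(const Aabb& box, Fn fn) const
    {
        int x0 = (int)std::floor(box.minX / _cellSize), x1 = (int)std::floor(box.maxX / _cellSize);
        int y0 = (int)std::floor(box.minY / _cellSize), y1 = (int)std::floor(box.maxY / _cellSize);
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                fn(((long long)y << 32) ^ (long long)(unsigned int)x);
    }

    float _cellSize;
    std::unordered_map<long long, std::vector<int>> _cells;
};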

It also adds a little friction to placing buildings. With a grid, you might place 2 buildings and leave space in between for something else – because you know its size. With arbitrary sizes, rotations, and shapes this can get harder, and potentially frustrating. When you do want to line up objects, it’s not as easy either.

Because things are arbitrary, I’m no longer limited to square regions – players can create any shape for an area – which is great, but makes the user interface more complicated. At the same time it can add friction to the experience, because you have to spend time laying out areas if you don’t want the results of a quick click-and-drag.

I’ve been working on making placing arbitrary objects and regions as painless as possible, but it’s challenging.

As I wrote about last month, pathfinding has also gotten more complicated. However, with the complexity comes flexibility. I have pathing set up such that characters can always walk between buildings, reducing the chance of getting stuck and decreasing search time while running the A-Star algorithm. Search time is also decreased because there’s much more open area.

Thankfully, graphics doesn’t get any more complicated – it was written to handle things in any orientation from the start.

These two seemingly simple changes have caused my game engine to get slightly more complicated. What’s hardest is that there are even more tiny details than I’ve written about – ones I keep running up against that were unknowable until I ran into them. I’ve spent more time making the features work than expected.

In the end, these changes are good – even if I went back to building a grid-based game with a limited camera. It’s just surprising to see how far-reaching a simple design decision can be.


Recurring Nightmares

Almost every project I’ve worked on has a bit of code that always causes chronic problems. Some code gets written and it seems to work well in development and testing. Then someone on the team creates real-world data that causes it to malfunction.

So then the programmers fix the issue, and everyone continues on their merry way, until some different real-world data causes it to malfunction again.

This cycle continues until the game ships. I’m not sure this problem comes from poor development – it’s just that code can be so complex that no one can envision all the ways it will be used or all the ways it can break.

I’ve even seen this happen across multiple projects that reuse a game engine. A feature might work flawlessly in one game and then, unchanged, break on the next game in development because it’s being used differently or with different requirements.

I’ve seen this happen across all fields. Graphics, physics, collision, AI, audio, animation, controller input, even movie playback!

Sometimes, once the true real-world requirements of a system are known, a full rewrite fixes the chronic bugs. Or eventually all the edge cases are discovered. Sometimes not.

Unfortunately I’ve got an issue like this now. A while back I decided I wanted to support arbitrary placement of objects. Any shape, any rotation, with potential intersections, added to and removed from the play field dynamically. I ended up implementing an algorithm from a paper called Fully Dynamic Constrained Delaunay Triangulations. It’s a bit complicated, but it was fun to implement and works really well. The paper covers all the cases needed to be robust, but there are really tricky issues that come up.

Right now it handles pretty much whatever I throw at it. It makes nice big triangular maps of pathable space that I can run A-Star on for pathfinding, and the map can be analyzed quickly to allow different sized units.

Here’s some pretty pictures of it, and some test paths that have a non-zero unit radius.

So every few weeks, the random level generator ends up placing objects in configurations that break my code. Arghghgh. Sometimes it breaks building a map. Sometimes it breaks when quitting and a specific object is removed. The worst is when it breaks when I place an object manually, because it’s really hard to find and recreate the exact placement that caused the issue.

And so I stop whatever task I was doing and start debugging the triangulation code. This is not easy – some maps have 20,000 objects and 175,000 triangles in the pathing mesh. I end up doing a mix of visual and code debugging. By drawing the pathing mesh at the point the error occurs, as well as before and after, and stepping slowly through the code, I can figure out what’s causing the bug. It usually takes me several hours to determine the problem. Sometimes more. Finding a quality fix is typically hard. So I take a break, sleep the night, and generally have a good idea for a solution in the morning. Implement and test.

Then I wonder, “Okay, what was I doing before I ran into this bug??”

For this triangulation system, the culprit is floating point math. Every time. The algorithm is good. I haven’t had to change the major details of it a single time since I got the initial implementation working. But because math on a computer is an approximation using a fixed number of bits, math that works out on paper does not always behave the same way on a computer.

For example, one of my issues dealt with computing the circumcircle of a triangle. The algorithm just didn’t work at far distances from the origin until I wrote the math three different ways to find the most numerically stable and accurate implementation. On paper, all three methods should have produced exactly the same result!
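To make that concrete, here’s roughly what the difference looks like – the same circumcenter math written naively, and written relative to one vertex so large coordinates cancel out before anything gets squared. This is a simplified sketch, not the code from the triangulation:

struct Vec2 { double x, y; };

// Naive: works on paper, loses precision when the triangle is far from the origin.
Vec2 CircumcenterNaive(Vec2 a, Vec2 b, Vec2 c)
{
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    double ux = ((a.x*a.x + a.y*a.y) * (b.y - c.y) +
                 (b.x*b.x + b.y*b.y) * (c.y - a.y) +
                 (c.x*c.x + c.y*c.y) * (a.y - b.y)) / d;
    double uy = ((a.x*a.x + a.y*a.y) * (c.x - b.x) +
                 (b.x*b.x + b.y*b.y) * (a.x - c.x) +
                 (c.x*c.x + c.y*c.y) * (b.x - a.x)) / d;
    return { ux, uy };
}

// More stable: do the work relative to vertex 'a', then translate back.
Vec2 CircumcenterRelative(Vec2 a, Vec2 b, Vec2 c)
{
    double bx = b.x - a.x, by = b.y - a.y;
    double cx = c.x - a.x, cy = c.y - a.y;
    double d = 2.0 * (bx * cy - by * cx);
    double ux = (cy * (bx*bx + by*by) - by * (cx*cx + cy*cy)) / d;
    double uy = (bx * (cx*cx + cy*cy) - cx * (bx*bx + by*by)) / d;
    return { a.x + ux, a.y + uy };
}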

Another issue arose because I was testing a large circle against a point which lay exactly on its perimeter, but the test failed because of a lack of numerical precision. I’ve also had failures due to nearly degenerate triangles. And other crazy things that are hard to describe concisely.

I’m pretty sure some of the worst bugs to fix properly in my programming career are due to floating point imprecision. We have a long history, and we are not friends.

When I started making games professionally the fix would be to test using an epsilon. For example instead of
if (x == 0.0) { ... }
I would write something like
if (abs(x) < 0.00001) { ... }

In the right case this can be good. But in most cases it’s very bad. Without the right epsilon, and without knowing what x is and always will be, you are potentially creating false positives in addition to fixing the original problem. I avoid this whenever possible.
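For reference, the slightly-less-bad version of the epsilon trick scales the tolerance with the magnitude of the inputs – though picking the tolerances is still guesswork, which is exactly why I avoid it when a geometric test can decide the answer instead. A generic sketch:

#include <algorithm>
#include <cmath>

// Relative comparison: the tolerance grows with the size of the values, so it
// behaves the same near 1.0 and near 1,000,000. The epsilons are placeholders,
// not values tuned for any particular use.
bool NearlyEqual(double a, double b, double relEps = 1e-12, double absEps = 1e-12)
{
    double diff = std::fabs(a - b);
    if (diff <= absEps)   // handles values very close to zero
        return true;
    double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= scale * relEps;
}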

My go-to solution now is to use geometric analysis to determine an answer that needs high precision. Can I make an inference of the result using vertices, edges, and faces, and their relation to each other? Can I write the algorithm to be tolerant of values slightly under or over the desired one? If not, can I rewrite the math such that I’m never using values orders of magnitude away from each other?

Having fixed so many small cases – about one a month – I do consider going back to a simple grid for pathfinding. But I purposefully chose this route as the most flexible. I do wonder if I’ll ever get this piece of code to be fully stable. At least it works today, and it generates lovely paths for units to follow.

Only time will tell if I get all the kinks worked out – at least until the next project that uses it in a different way.


Art Test

I’ve been tired of looking at grey flat land with test objects, so I spent a bit of time doing some art tests. It’s hard for me to have any feelings about a game without visuals. Yes, the gameplay can be in place, and it might be fun, but for me it misses something undefinable without the visuals – partly charm, warmth, completeness, but something more as well.

At the same time, since I’m still very much in a prototype phase of development, I don’t want to spend days on each art asset, or hours making shaders.

So I arrived at a prototype art style that is certainly subject to change, but for now gives me something more concrete to build on where I can add detail later.

It looks like this:

So now I can quickly model general shapes, play with color and shading, without spending a ton of time getting high resolution assets built.

A bunch of other things had to be done to make this happen. I had to start a proper terrain generation system specific to the game, one that incorporates gameplay features instead of just producing terrain. This was fairly quick to implement due to previous work.

I also had to rewrite the shadowing system I had from Banished – previously the camera couldn’t look up to the horizon, but now I’m allowing a much more flexible camera, so I need to render to the horizon and handle shadows in any configuration. Not hard, but a bit of a change from a single shadow map to cascaded shadow maps.

Also due to the camera changes I now have some new rendering systems to handle more objects and need to create LOD objects for things far away – another thing that I didn’t have in Banished. And another reason to have a quick prototype art style. Each asset now needs multiple models for close, mid, and far view distances.
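Picking which of those models to draw is the easy part – a distance check against per-asset thresholds is enough. Something like this, where the three-level split and numbers are placeholders:

struct LodSet
{
    static const int kLevels = 3;
    float maxDistance[kLevels];   // e.g. { 50.0f, 200.0f, 4000.0f }
    int   modelId[kLevels];       // model to draw for each level: close, mid, far
};

int SelectLod(const LodSet& lods, float distanceToCamera)
{
    for (int i = 0; i < LodSet::kLevels; ++i)
    {
        if (distanceToCamera <= lods.maxDistance[i])
            return lods.modelId[i];
    }
    return lods.modelId[LodSet::kLevels - 1];  // clamp to the farthest LOD
}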

I’m allowing a large zoom out distance, so you can look over a fairly large area. Here’s the same terrain as the previous image from far away.

Obviously it still needs an ocean and sky – but usually I’m testing close up in the middle of the island looking down, so it’s not an immediate need.

I may end up with something in-between this style and Banished’s style, or something completely different. Or I may stick with it – I’ve got plans for far more assets than the previous game required, so faster art creation might be a really good thing.


Game Code Design

I don’t quite know how to write game code properly. It’s a bit of a mystery because I’m still new at it.

My experience working professionally on games for the last 17 or so years had mostly been writing systems. My main job used to be writing graphics engines for consoles, but I also wrote collision, physics, and tool chains for content creators. And because part of my job was getting the entire game engine running on consoles, I had exposure to the audio, networking, i/o, asset management, and input systems. I like systems. They all have definable expected results.

Despite the complexity of any one of the major systems in a game engine, it mostly takes some data, does some stuff to it, and outputs a result. The systems make the artwork appear on screen, make the physics behave correctly and perform well, make the audio sound just so. Given requirements for what you want a system to do, it’s usually easy(ish) to design and implement. Especially when there are clear lines of separation between systems.

Before starting my own games, I never saw the game code. It was someone else’s job to implement it. Sure I knew the basics of what was in the code – state machines, entities, various channels of communication, and scripts that made things happen. And there was also AI code, responding to input, user interface code, and more. And it was all very tightly tied together. And the games tended to be iterative – after initial implementation, features got tweaked until the game shipped.

How do you organize that sort of code and make it easy to refactor? And is it possible to keep the entire game design in mind when implementing things initially?

I’d never done it before, so for Banished, the answer was no. The design of my last game was iterative until near completion, so I just kept adding things to the code base as needed without an overall plan. Things could have been implemented in a cleaner and more productive way, had I been able to stand back and look at the whole picture. Sure I occasionally refactored things when it got hard or messy to continue forward, but a full rearchitect of the game level code wasn’t in the cards.

So now that I get to mostly start over, I’m trying to take a more systems-level approach to the game code. And while my design isn’t fully written out and all the little details aren’t set, I know the overall shape and size of the game’s features. Luckily, since the new project I’m working on is similar to Banished in that there’s indirect control over some people, I can take a lot of what I learned and apply it.

Entities

I had the concept of entities in Banished. In other game engines these are known as objects, pawns, actors, things, etc. It’s basically just something that does something. It could be the player character, a torch, a chest with treasure, a tree, or a monster manager that is spawning zombies. But by itself, it does nothing. In my implementation, I add components to entities which add functionality – a model to display, audio to play, an AI and movement controller, and many other things. Basically each component gets a chance to do something on creation, update, and removal.

I’ve kept this for my new project as it’s one of the things I got right – but I’m extending it further. In Banished there were a lot of things that weren’t entities – the terrain, the sunlight, camera, object selections, object placement, player toolbars, the map data, clock, menus, minimap, and the weather system. And they all required extra manual code (and sometimes repeated code!) to use them in a bunch of places. (If you’re clever you’ll notice those are the things that are global to the game, and there’s only one of them at a time.)

In my new project things like that are now entities as well, since they only do things on creation, update and removal. Having a unified system for all game objects also makes writing other things easier, like save games, since everything fits into the same mold. I also had separate game loop code for loading screens vs the main menu vs the main game. Now it’s just one game loop that can do anything based on the entities used.
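The shape of it is roughly this – an entity is just a container of components, and each component gets its chance at creation, update, and removal. The names here are illustrative, not my actual classes:

#include <memory>
#include <vector>

class Component
{
public:
    virtual ~Component() {}
    virtual void OnCreate() {}
    virtual void OnUpdate(float dt) {}
    virtual void OnRemove() {}
};

class Entity
{
public:
    void AddComponent(std::unique_ptr<Component> c)
    {
        c->OnCreate();
        _components.push_back(std::move(c));
    }

    void Update(float dt)
    {
        for (auto& c : _components)
            c->OnUpdate(dt);
    }

    ~Entity()
    {
        for (auto& c : _components)
            c->OnRemove();
    }

private:
    std::vector<std::unique_ptr<Component>> _components;
};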

Hardcoded Data

Early on in Banished’s development I got things working too quickly. Things like professions and types of raw material were coded in C++, rather than configured as data. As you can imagine, late in the game adding a new profession or item type was painful! I had to touch many source files and make sure everything worked and nothing broke. I eventually made professions configurable through data, but the item types were so ingrained in the code that changing them to data instead of code was a task I didn’t want to take on.

This is not a mistake I’ll make again – any game concept is now made generic and configurable. My rule of thumb is that if I find myself creating variable names from nouns or descriptors (Pot, Clay, Bronze, Edible, etc.), it’s probably something that should be data, not code.
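In practice that means item types and similar concepts become records looked up by name or tag rather than enums compiled into the executable. A rough sketch, with illustrative fields:

#include <string>
#include <unordered_map>
#include <vector>

// An item type is data, not a C++ enum value. New types come from files,
// and gameplay code refers to them only by name or tag.
struct ItemType
{
    std::string name;               // "Clay", "Bronze", "Pot", ...
    std::vector<std::string> tags;  // "Edible", "Fuel", ...
    float weight = 0.0f;
    int value = 0;
};

class ItemDatabase
{
public:
    void Register(const ItemType& type) { _types[type.name] = type; }

    const ItemType* Find(const std::string& name) const
    {
        auto it = _types.find(name);
        return it != _types.end() ? &it->second : nullptr;
    }

private:
    std::unordered_map<std::string, ItemType> _types;
};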

Hierarchical State Machines

State machines in general are useful. The idea is that some object is in some state, and while in that state it only does certain things. Some event may occur that causes a transition to another state, which has its own things it does. And so on. But they caused me a lot of headache in Banished. With a hierarchical state machine, you can override a state with something new. So let’s say you have an entity that is a Box of Treasure. It’s normally in the closed state, and when you interact with it, it transitions to the opening state, which plays an animation, and when that’s done, there’s a transition to the open state, and it gives you some gold. Now let’s say I want a Trapped Treasure Box. I only have to override the open state, and instead of giving treasure, I write code to shoot an arrow at the player. I didn’t have to rewrite code for the closed state or the opening state.
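Stripped way down, the override idea looks something like this – plain virtual functions standing in for the state machine’s states, with only the open state replaced in the trapped version. Purely illustrative:

#include <cstdio>

class TreasureBox
{
public:
    virtual ~TreasureBox() {}

    void Interact()   // closed -> opening -> open
    {
        OnOpening();
        OnOpen();
    }

protected:
    virtual void OnOpening() { std::printf("play opening animation\n"); }
    virtual void OnOpen()    { std::printf("give the player some gold\n"); }
};

class TrappedTreasureBox : public TreasureBox
{
protected:
    // Override only the open state; closed/opening behavior is reused.
    void OnOpen() override { std::printf("shoot an arrow at the player\n"); }
};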

In Banished this worked well for adding components to buildings. Each component added could implement a state machine, or override various states. There were lots of these – partial state machines for gathering resources, building, handing out jobs, being on fire, being diseased, being destroyed, etc. The problem was that not all of these were on each building or field, so I had to make them work regardless of which were present. It made things exponentially harder to write – should the parent state be called? Should it not? If there’s no transition function between states, should it call the transition function in parent states? Does it work if the components are in different orders? This was a huge source of bugs that took a long time to work out.

In my new code, I’ve got state machines being used, but I’m building them more carefully, and avoiding hierarchies, especially deep ones. Once state machines get too large, they’re hard to manage and think about.

Character AI

Ah, this was a mess in Banished. Some hand-written code made overall decisions about what to do. It was prone to breaking. What to do if a character is hungry, diseased, and his house is on fire all at the same time? Which has priority? Does it depend on what everyone else is doing? Have I handled all the permutations? I’d add a new concept like being cold or being happy, and that would break something (like not starving) that had been working, because the decision process changed or I didn’t add the proper checks in the right places. Once the decision was made, a list of actions for the character to carry out would be generated – sometimes by a different chunk of code, like the global general work list, or a building state machine that was handing out jobs. Something like – walk to storage, pick up logs, walk back to workplace, drop logs, etc. Which might be interrupted at any time because the AI decided to do something else important. The code was spread out over too many places and was a bit prone to breaking.

This time I’m implementing a system that’s a bit more configurable and unified. What I’m building now is an overall priority system that weights each character’s needs. Things like food, water, warmth, sleep, shelter, companions, possessions, daily schedule, working, helping others, needs of the village, emergencies, and special events will be weighted based on the current situation, and the best one will be selected.
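The selection part is simple once each need can score itself against the current situation – highest score wins. A sketch of the idea, where the Need interface and scoring are assumptions rather than the real system:

#include <string>
#include <vector>

struct Situation;  // whatever world/character state is needed for scoring

struct Need
{
    std::string name;                    // "Food", "Warmth", "Sleep", ...
    float (*Score)(const Situation& s);  // 0 = ignore, higher = more urgent
};

const Need* SelectNeed(const std::vector<Need>& needs, const Situation& s)
{
    const Need* best = nullptr;
    float bestScore = 0.0f;
    for (const Need& n : needs)
    {
        float score = n.Score(s);
        if (score > bestScore)
        {
            bestScore = score;
            best = &n;
        }
    }
    return best;  // null if nothing is pressing right now
}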

For each of those, a behavior tree will drive how each character achieves the need, allowing many ways to solve an issue. For example, there should be many ways to find food. Is it in my inventory? Is prepared food available nearby? Is preserved food available in my home? Can I ask my neighbors for some? Do I have to get it from storage? Can I ask a hunter to prioritize getting food? Should I walk out into the woods looking for mushrooms and berries myself? Can’t find any after a while? Hmm, time to leave the village for a better one that has food.

This sort of decision making and planning will be mostly data driven, so that adding new behaviors requires little to no code. At least I can hope so.

User Interfaces

I’m pretty happy with the way the user interface in Banished turned out. My only issue with it was that there was a lot of code to make it work. Something like 20% of the game code for Banished was UI. If I designed a UI with a button on it, I had to write some UI code to find the button by name, configure it to receive an event, and then, when the event occurred, call some function on an entity.

So I’ve rewritten the UI code to remove the need for the in-between code that manages the UI widgets – I can just create a UI layout with a button widget that binds itself directly to the function on the entity. The intermediate control code isn’t needed. This works for all sorts of widgets and values, as well as text and sprites that appear on the UI. While this won’t reduce UI code to nothing, it should help reduce the amount of code to manage.
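The effect is that a button carries its own callable, bound at creation time, so there’s no hand-written glue routing the click. Something like this, with made-up widget and entity names:

#include <functional>
#include <string>

class Button
{
public:
    Button(std::string label, std::function<void()> onClick)
        : _label(std::move(label)), _onClick(std::move(onClick)) {}

    void Click() { if (_onClick) _onClick(); }   // invoked by the UI system

private:
    std::string _label;
    std::function<void()> _onClick;
};

// Usage: a layout (in practice loaded from data) binds the button to an
// entity method at creation time.
struct Sawmill { void TogglePaused() { /* ... */ } };

Button MakePauseButton(Sawmill& mill)
{
    return Button("Pause", [&mill]() { mill.TogglePaused(); });
}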

Another UI change I’ve made is that I’ve separated the way the UI looks from the way it behaves. This way I can easily restyle the UI and also create widgets and layouts in code and have them styled the same way as everything else.

Performance

For the new project, I’m planning farther ahead to deal with performance issues and use more CPU cores. When I started my game engine, I consciously chose to limit multithreading to keep things simple – after all, I was working solo and wanted to get initial implementations running quickly. Some things did end up in different threads, like fine-grained pathfinding, but everything else – updating entities, drawing, coarse pathfinding, searching for locations, etc. – was done sequentially.

This needs to change this time around to make a more scalable game.

While a lot of other current engines are running entity updates in parallel, I’m not choosing this route. Threading updates where multiple entities depend on each other is hard. Really hard. The goal is not to use any locks or thread synchronization. That requires really breaking up updates into small chunks, limited or no direct access to other entity data, sending messages to other entities, and waiting for responses. This makes the code hard to design, hard to debug, and hard to modify later if you forget what’s going on.

I’m also not sure that updating the entities will be the bottleneck this time around. Entities are now just decision makers and controllers for engine level systems, and most of them don’t update every frame. It will require profiling once it’s in place to know for sure, but I know other things are going to show up with significant time use on the profiler first.

I’m preparing to move all the heavy lifting into systems that can be easily parallelized. All animation, character movement, particle systems, pathfinding, ray casts, spatial searches, spatial subdivision updates, and more, are fully separate systems and can run on different CPUs easily without dependencies. If the AI needs to wait on a search for nearby objects, or how to get from A to B, or some other expensive operation, it can just idle a bit until the result comes back from a lower-level system.
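The AI side of that looks roughly like requesting a path and polling for the result. Here it’s sketched with std::async and a future standing in for whatever the job system actually does:

#include <chrono>
#include <future>
#include <vector>

struct Point { float x, y; };
using Path = std::vector<Point>;

// Stand-in for the real navmesh search; expensive, runs off the AI thread.
Path FindPath(Point start, Point goal) { return { start, goal }; }

struct PathRequest
{
    std::future<Path> result;

    bool Ready() const
    {
        return result.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }
};

PathRequest RequestPath(Point start, Point goal)
{
    return { std::async(std::launch::async, FindPath, start, goal) };
}

// The follow-path action checks Ready() each update, playing an idle
// animation until the result arrives, then starts walking.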

Additionally, the entire rendering pipeline can run start to finish on a different thread if it runs a frame behind the updates. As I don’t plan on making games requiring quick twitch input response, I don’t believe this will ever be an issue or even be noticed. I’ve done this before on console engines, and it can free up a ton of frame time, making it available for updates. If required, I can parallelize culling and command buffer generation within the rendering, but I’m not sure there will be huge gains there unless I’m also supporting DX12 and/or Vulkan.

The Plan

Anyway, that’s the plan for this time around – I’m sure that these changes to the way I structure the game code will help make the code easier to use and update, make a more extensible game engine, and teach me new things as I go along. But I’m also sure that at about 80% complete, the code is going to start to get messy again in the push to finish, as I implement all the small items I didn’t consider ahead of time. Which, obviously, I’ll fix in the game after this one.


This is what game programming is like.

Earlier in the week I reached a point where I was adding the first characters to the game. When I start some new feature I generally type out a description of the things I think will be needed, and modify the document as I work to remember changes I need to make and to keep track of random things I think of that are related. I often get interrupted or it’s the end of the day and I want to remember what I was doing when I resume.

So the first few lines of the document read like this:

  • – get character model setup, initial component setup.
  • – make a basic walk animation, idle animation
  • – write a simple wander. Pick random x, y path to it.
  • – need an AI action to Follow Paths.
  • – need to get height above terrain and set it.
  • – …

I went merrily on my way doing this, and since the last game had the AI follow path action, I figured I’d be done in no time. Especially since I have my dynamic navigation mesh working and can already find paths through it. Alright!

Well.

I coded for about 13 minutes, then I really started thinking about it, and realized there were some issues. The last game ignored a lot of things. The animation drives the forward movement of the character, so I don’t actually know how far the character has moved until that happens, and the AI update happens way before animation. There could be 3 or 4 animations blending that cause the forward movement. And I have to know the forward movement before I can query the height above terrain so that the character stands in the right place. There’s also the problem that I have to evaluate the path for the direction to walk in and make sure the character turns to face that direction somewhere. Ok, more on the list.

  • – need an AI action to Follow Paths.
    • – write a hierarchy controller to set x, y movement, get height from collision, and set orientation.
    • – controller will update right after animation so move offset is known.
    • – maybe call it CharacterComponent and CharacterController – add to Entity namespace? Easily reusable for all moving entities.
    • – set height over terrain after getting animation offset and moving!
    • – maybe put path following into the CharacterController too. That moves it into hierarchy update, so it can be parallelized later!
    • – keep call to path finding in the action, so it can play idle animations, then walk if pathfinding is delayed due to waiting for the result from another thread.

Ok, a few more minutes of coding and I think of some additional problems. What if the path gets blocked halfway through walking there? What if an item gets placed on top of the character and they are stuck in the navigation mesh? Or placed at the destination and the character can’t get there anymore?

I briefly consider just doing a collision check when the character moves forward to handle these cases, but for 100’s of characters that will be slow, and won’t actually do anything in 99.99% of cases.

So I make a pot of tea and sit down to eat some lunch. And think. Then. More on the list.

  • – need an AI action to Follow Paths.
    • – write a hierarchy controller to set x, y movement, get height from collision, and set orientation.
    • – controller will update right after animation so move offset is known.
    • – maybe called CharacterComponent and CharacterController in Entity namespace? Easily reusable for all moving entities.
    • – set height over terrain after getting animation offset and moving!
    • – maybe put path following into the CharacterController too. That moves it into hierarchy update, so it can be parallelized later!
    • – keep call to path finding in the action, so it can play idle animations, then walk if pathfinding is delayed due to waiting for the result from another thread.
    • – want to know when paths are interrupted
      • – bad case, character is now in an area they shouldn’t be – have to path out to safe location before proceeding.
        • – could search collision mesh for places near edges that are safe and towards target?
        • – could manually add ‘safe points’ to mesh?
      • – bad case, destination is no longer reachable. Cancel task and replan.
      • – good case, can repath from current location to destination and continue
      • – best case, path was modified behind the character and no repath is required.
      • – these can all be handled in the AI follow path action, just need to know when they occur.
      • – So need an active path manager / path cache.
        • – when navigation mesh changes – (add/remove) – need to test active paths to see if they are affected.
        • – this is a walk across the navigation triangulation…? or just use the dirty rectangle as that’s already known for agent radius computation…?
        • – higher level needs some sort of handle to the path that can be released when done.
        • – this is actually a cool idea, common paths that are used frequently can be cached for a long time and will auto regenerate if invalidated. Been looking for a solution to this.
        • – Path cache is also a good fit for all of path finding being put in its own thread. Keep thread safety in mind!
        • – Cache makes it easy to draw all active paths for debugging purposes alongside the navigation mesh.
        • – need some spacial subdivision structure to put paths in so only paths in the area of change are tested and invalidated.
          • – Consider consolidating all spacial subdivisions? There are a lot – graphics, collision, nav mesh, audio, etc. And now paths.
      • – add a way to add/remove things from path scene en mass – so less recomputation happens when major events occur, otherwise agent radius and path invalidating will happen for each add/remove to the navigation mesh.

Wow that’s a lot of stuff. And this is possibly where I’ve gone too far. Everything here is new and probably required, except:

  • – Consider consolidating all spacial subdivisions? There are a lot – graphics, collision, nav mesh, audio, etc. And now paths.

Changing my spatial subdivision structures is a major refactor – it would touch lots of major systems – and I’m just trying to get path following to be robust and fast, and the two are really unrelated. I’m pretty sure I’m about to Shave a Yak if I go down that road. So for peace of mind I check the memory usage of the subdivision structures, and it’s silly low. Alright, no reason for that to be considered again.

Now I work through the list, getting simple things working, then adding all the details until path following works and characters can handle events like getting stuck, tasks canceled, and repathing around new obstacles. None of this is that hard or will take that long (maybe), it just needs to be done right. Phew.
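For the path cache itself, the shape I’m after is simple: paths are owned through handles, each remembers its bounds, and a change to the navigation mesh only invalidates the paths whose bounds touch the dirty region. A rough sketch, with illustrative names:

#include <memory>
#include <vector>

struct Rect  { float minX, minY, maxX, maxY; };
struct Point { float x, y; };

struct CachedPath
{
    std::vector<Point> points;
    Rect bounds;
    bool valid = true;
};

using PathHandle = std::shared_ptr<CachedPath>;

class PathCache
{
public:
    PathHandle Add(std::vector<Point> points, Rect bounds)
    {
        auto p = std::make_shared<CachedPath>();
        p->points = std::move(points);
        p->bounds = bounds;
        _paths.push_back(p);
        return p;   // released by the owner when done
    }

    // Called when the navigation mesh changes inside 'dirty'.
    void Invalidate(const Rect& dirty)
    {
        for (auto& p : _paths)
        {
            if (p->valid && Overlaps(p->bounds, dirty))
                p->valid = false;   // the follower notices and repaths
        }
    }

private:
    static bool Overlaps(const Rect& a, const Rect& b)
    {
        return a.minX <= b.maxX && b.minX <= a.maxX &&
               a.minY <= b.maxY && b.minY <= a.maxY;
    }

    std::vector<PathHandle> _paths;
};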

This is what developing the game will be like until the core game loop and engine code solidify. Even after they do, new features that seem so simple on the surface can have far-reaching design and implementation problems when you get into them.

So what was next on that list again…?
