Thursday, December 3, 2009

Optimization - Or when your application eats up 150+ megs of memory generating maps...

So, yesterday I finished stairways and generation of multiple level maps. It's somewhat boring, but it works. I turned the map generator up to generator 5 levels and built a release build, to get an idea of performance. There was a second pause before the game started, a bit more than I'd like to see. I turned it up to 50 maps, and ran into a 30 second pause that the task manager claimed ate up 150 megs of ram.

Now before you get out pitchforks and tell me that all the managed code stuff I talked about in the previous article is the cause of it, I want to make two quick points.
  • Much of that memory would be reclaimed the minute another program asked for it, since it was a lot of garbage the garbage collected was waiting for next collection on.
  • The mistakes I made would have been just as slow in C++, RAII would have reclaimed the memory right away (good), but all the allocations I made would have been just as slow (bad).
I won't bore you with the normal two or three quotes about premature optimization and difficulty in predicting where to optimize, you have already heard them before. What I will say is that every time I run into performance, every single time the cause is not where I'd guess.

A month ago when I was just starting the map generator, I was seeing poor performance moving around. I figured it had to be in my game engine somewhere, maybe in the other actors movement. It turned out to be a part of the GUI that was walking every single tile in my map to draw, even if off screen. Since it was in my GUI loop, just the act of walking all the tiles didn't scale to 200x200 sized maps.

To get back at the story at hand, I figured the way I was generating maps was too slow, that I'd need to rip it out and start over. I made the decision to measure my performance and see what was slow. I'd like to tangent to point out the CLR Profiler and EQATEC Profiler, which are free and awesome. So, it turned out that the subroutine that placed chests and monsters was taking 55% of the entire generation time, and using a crazy amount of memory.

The way the chest and monster placement works for my cave generator is simple: from the generated map, get a clear point that isn't too close to the stairs and use it. Repeat for number of chests and monsters. The problem was in the get a clear point. My naive algorithms walked the entire map and created a list of free points. Then it randomized it, and started grabbing from the front of the list until it found one wasn't too close. I figured it'd be more effective than randomly generating points and checking them in maps with a small percentage of empty tiles. You might see the problem here already, I'm walked all the tiles to painstakingly generate a list, randomize it, and then throw it away when done. When you do this 20+ times per level, then 50 levels in a row, it adds up.

Once I cached the list, the generation time dropped in half, and so did memory. I found another similar issue in the core map generator, see here for details. I'm tracking one more down in the stitch map generator right now.

The "lesson" from this story, if there is one, is that I would have never guessed that GetClearPoint was the culprit. A lot of people claim they want fast code, and might try to optimize where they think it's going to be slow, and do nothing but add bugs and obscure coding constructs. Write the simplest code first (with reason, no bubble sorts), then measure and see what's slowing you down. Once you know that, you can whip out the black magic optimizations there.