flipCode - Harmless Algorithms

A Hybrid Approach to Visibility
by Edward Kmett (harmless@mich.com)

This column will discuss an example of a hybrid visibility algorithm from a few-year-old test engine of mine, which used approaches from the previous columns. Providing the terminology to give the explanation below was the main thrust of the original Harmless Algorithms columns, but I quite simply never found the time to finish. My career has taken me far away from 3d graphics. Well, since a number of people have been nagging me about it and it has been over a year since my last update, here goes.

Hybrid Visibility:

Goals
Octree
BSP Cache
Point Lights
Rendering
Summary

Goals Top

First of all, lets state our goals for this particular engine. I want to manage a million polygon scene with some highly dynamic areas and some very stable portions. I also want to minimize level compilation time to facilitate rapid prototyping by the art staff. I need fast collision detection so I can afford to run physics. The engine is geared for a primarily indoor scene, so I can avoid dealing with horizon issues, or failing that always fall back on a far clip plane. I also want to experiment with shadow casting point lights, because of the ambience I want to give the world.
With these goals in mind, I'll really just dive into the choices made to support this engine, rather than the reasoning behind the different choices made. Some of them will seem rather counter-intuitive, and they very well may be wrong for the type of engine the typical reader of this column may wish to build. Some of the choices were made simply because of some feature that I wanted to support that I could not support through a conventional approach (i.e. shadow casting point lights) and may induce a global suboptimization in search of a single goal. All of game engine design is about trade-offs, and I do not mean to try to prove that the approach below is the perfect approach, merely that it is an approach.

Octree Top

The first piece is to use an Octree to divide space up into manageable pieces. The scene preprocessor just grabs polygons and dumps them into Octree buckets, recursing until a maximum depth is reached or a reasonable number of polygons per Octree node (around 50) is reached.
Now, I could stop here, clip the polygons to the Octree and traverse my scene using a neighbor traversal of the Octree structure, by inserting checking each of the polygons in the node against a C-Buffer and rendering the ones that pass, then inserting all of the nodes into the C-Buffer before checking the faces of the tree against the c-buffer and traversing outwards. The problem is this does not provide me with intra-node polygon culling (No polygon within the Octree node can occlude any other polygon within the Octree node), and I have no way to achieve front-to-back sorting within the node, and my collision detection times are terrible. This would be fine, if I didn't want to deal with collision or shadow casting point lights.

BSP Cache Top

So, instead I decide to go with miniature BSPs inside of the Octree nodes, and to really play devil's advocate, I don't even precompile them. Before you panic, please consider that these individual BSPs have very manageable polygon counts (usually 50 or less), and so the nonlinearities of the BSP compilation process are minimized. 70,000³ polgyons for a 70,000 polygon scene in a Quake-style engine vs 50³ performed 1400 times (2000 allowing for shared polygons) differs by several orders of magnitude. ~350 trillion operations vs. 250 million operations is quite a spread, especially when you consider that the 250 million operation version can be handled in chunks of 125,000 operations. Please note that these numbers are worst case scenarios. Also note that this is for only a 70,000 polygon scene. As the scene approaches a million polygons, these numbers become far more ridiculous. (1e18 vs 2.5e9)
The polygon soup is tagged with what Octree nodes they belong to and are accessible through lists from the Octree nodes themselves, so when I need to compile a node into the mini-BSP cache, I just pass in each of the polygons known to be in the node to the BSP compiler. The BSP compiler does not bother balancing the tree because the computational overhead of balancing is worse than the pathological cases an unbalanced tree can induce. The compiler generates clipped polygon fragments which are used for visibility determination, but keeps pointers from the fragments to the original polygons. The miniBSP cache is maintained on a LRU (least recently used) basis. You could gain a global speed-up by removing the cache and precompiling the BSPs, but you would lose the ability to perform gross geometry changes by changing the polygons or large-scale objects that occupy a set of nodes and expiring their LRU cache entries.
When performing a front-to-back traversal of the node, you walk the BSP inserting fragments into your visibility system (Either C-Buffer or Beam Tree have their own relative merits, depending on display resolution and if you are using anti-aliasing), but if you insert a fragment you render the whole polygon (skipping it of course if you have already inserted another fragment from this same polygon). Rendering the scene could still be done ignoring the BSP, using the approach above if you wish - and may be considerably faster if the BSP fragments too heavily, but the presence of a front-to-back structure is useful for collision detection and complex lighting. If the BSP is too fragmented then the cache hit of walking the polygons in the Octree node twice to render then insert into the fine occlusion culling algorithm is less than the overhead of walking the structure.
The real purpose of the BSP is to provide something to lean on for collision detection. BSP collision detection algorithms are among the fastest in the industry and are very elegant and pleasant to code.
Other uses for the BSP are real-time CSG, which is useful if you really want to blow holes in walls. CSG of a convex solid against a BSP is actually a surprisingly straight forward process. The problem is the number of polygons will increase dramatically over time if you leave this unchecked, and so it really needs good in-game support and reasoning to be practical.
I should stress that characters are not part of the BSP. Their moving polygon count renders them far too complex to reasonably insert into such a structure.

Point Lights Top

Now we have reasonable collision detection, and a solid scene traversal which will prevent us from seeing things which aren't there, with safety valves for their more pathological cases. Let there be light.
Note that thus far there hasn't been much of a preprocessing step. Dumping polygons into Octree buckets can be done, worst case, as the engine loads. This is a good thing for rapid prototyping. However, we all know that you can't get away with radiosity under such conditions. So, if you want radiosity, you'll have to preprocess it. On the other hand, this will doom any attempt at dynamic geometry in the areas in which you pre-light, because it will not properly adapt to environmental changes.
My answer was to go goth and move to harsher lighting. The Beam Tree algorithm from the earlier columns can be used not only for visibility but to determine the visible polygon fragments in a scene from a given point. These fragments happen to directly correspond to the fragments that would be lit from a light source at that point.
What is required is a front-to-back traversable scene structure and a Beam Tree per light. If you provide linear fall off for the light source you can terminate within a known distance of the source light, providing caps on the time the lighting algorithm will consume. Also, you can 'render' lights on a cached basis, because the polygon fragments lit by a given point light do not change from frame to frame unless the geometry is moving. If you limit their use to areas of low dynamic geometry and/or limited CSG, then you'll do fine.

Rendering Top

First of all, I'm going to need to stand more of the conventional rendering process on its ear. Instead of rendering textures first, I'll render light. This way I can use additive textures to add multiple lights to the same surface. The cost of this is that this engine cannot use detail textures. If you want ambient light you can render it by rendering an ambient light polygon as a base. Otherwise, rendering a black starting polygon will do. Then render the polygon fragments for each light source affecting this surface. (Cached as a list in the polygon). Any bumpmapping type pass can also be performed here. Finally render the actual texture for the polygon using a multiplicative pass.
If you want to move back towards a conventional rendering pipeline you could in theory render the point lights to a lightmap and apply the lightmap as a multiplicative post-process. However this will mean considerably more texture memory usage and thrashing of the texture memory will now be guaranteed as will the overhead of performing software lighting on the texture. Using the approach above, texture memory remains stable and available for use by more textures.
With some tweaking you can use the lighting algorithm to project light through stained glass and you can provide some cheap volumetric lighting effects by using the edges in the Beam Tree to provide you edges for volumetric light streamers. They are not physically accurate, but they can look pretty good anyways. You can mix and match with the standard bag of tricks, like putting billboard coronas around the light when looking right at it and/or a lens flare if you want to be tacky, etc.

Summary Top

My primary goal for this column was to show the ways that a number of innocuous algorithms that in-and-of-themselves are not very useful can be strung together to achieve goals that none of them can achieve alone. For example, a BSP could not handle the dynamic geometry or even the sheer size of the world. An Octree could not provide decent collision detection or support for the shadow casting point lights. A Beam Tree is not always the best algorithm for visible surface determination, so we provide the C-Buffer as a fallback.

Harmless
February 3, 2001

You can contact the author at the following address: harmless@mich.com
You can also check out his web site at: http://www.kmett.com/

This document is � 2001 Edward Kmett and may not be reproduced in any way without explicit permission from the author (Edward Kmett). All Rights Reserved. Any and all trademarks used belong to their respective owners. Best viewed at a high resolution. The views expressed in this document are the views of the author and NOT neccesarily of anyone else associated with flipCode.