
Submitted by Jacco Bikker, posted on October 18, 2000

Image Description, by Jacco Bikker

Here we have a couple of screenshots from the visibility algorithm that I am developing here at Davilex. The algorithm is a combination of an adaptive octree for spatial subdivision and a c-buffer for occlusion testing. About the shots: The first one (top left) shows the scene, rendered normally. The one in the top-right corner shows the c-buffer: 'Red' means occluded. Note that I don't draw every single polygon in this buffer, just the large occluders. Also note that I normally use a lower-resolution c-buffer, but that doesn't look as good for an IOTD. :) The green lines are the boundaries of the adaptive octree nodes, the blue lines (hard to see) are the bounding boxes of the detail meshes. The two images at the bottom show the scene rendered in wireframe mode, with and without culling. This is a conservative culling algorithm; as soon as any part of a mesh is visible, it's drawn completely. This ensures minimal culling costs, resulting in optimal performance on today's hardware. For software rendering, you might want to try a slightly less conservative algorithm.
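A minimal sketch of the conservative test described above (illustrative Python, not the Davilex code; all names are invented): a boolean coverage buffer stands in for the c-buffer, and a mesh is drawn in full as soon as any pixel of its screen-space bounding rectangle is uncovered.

```python
# Illustrative only: coverage[y][x] is True when that pixel is occluded.
# A mesh is drawn in full as soon as any pixel of its screen-space
# bounding rectangle is uncovered -- that is what makes the scheme
# conservative: we may draw too much, never too little.

def is_any_pixel_visible(coverage, rect):
    """coverage: 2D list of booleans (True = occluded).
    rect: (x0, y0, x1, y1), inclusive bounding rectangle in screen space."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        row = coverage[y]
        for x in range(x0, x1 + 1):
            if not row[x]:
                return True  # one visible pixel is enough: draw the mesh
    return False
```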

Jacco Bikker, a.k.a. 'The Phantom',
Davilex Software



Message Center / Reader Comments:
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

October 18, 2000, 02:58 AM

Hey, cool. Could you go into any particulars about your octree though? Like, what do you do if meshes end up in multiple leaves, or if you have overlapping meshes which can't be put into separate leaves? Also, is this visibility engine dynamic, or must scenes be static?

Also, how are you determining whether an occluder is large or not? The sorts of heuristics involved usually seem kinda arbitrary to me. :)


October 18, 2000, 03:34 AM

Good questions. :) About the octree: It's an adaptive octree, so I can move the splitting planes around a bit to get a good subdivision. The only 'options' are splitting-plane positions that 'touch' a bounding box of one of the meshes in the node. Each option is assigned a score, and the best score wins. The score combines mesh balancing, spatial balancing and the number of splits generated. Splits are terrible, so they have a big impact on the score.
About splitting meshes: The meshes from the original scene (a NetImmerse scenegraph) are NOT stored in the octree, and thus NOT split. This is because the vis algo serves as a plugin for NetImmerse; I simply turn parts of the scene graph on and off with my algo. I do store polygons in the octree; these are the polygons from the occluder meshes, in the future hopefully in a simplified form. All other meshes only have a pointer in the octree to the original mesh in the scenegraph.
Scenes can be dynamic; you can of course easily move the dynamic meshes around (the ones with just a pointer in the octree), and the other meshes can be moved by recalculating a part of the octree. That's the reason for picking the adaptive octree, and not the kd-tree; I think I can recalc the octree faster at run-time. Of course, you don't want to move huge amounts of houses each frame, :) but you can blast a hole in an occluder, if you wish.
Finally: Determining if an occluder is large or not is specified in 3DSMax, at the moment, by hand. In the ideal situation, the artists would only have to distinguish between 'mostly dynamic' and 'mostly static'; the compiler would then try to construct a simplified occluder mesh for all 'mostly static' meshes. I am still thinking about that.
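The split-plane scoring Jacco describes could be sketched along a single axis like this (illustrative Python; the weights and names are invented, his actual scoring function is not public):

```python
# Sketch of adaptive-octree split scoring: candidate planes 'touch' a face
# of some mesh's bounding box; each candidate is scored on mesh balance,
# spatial balance and the number of boxes it would split. Weights are
# invented for illustration; splits are penalized heavily.

def best_split(extents, node_min, node_max,
               w_mesh=1.0, w_space=1.0, w_split=8.0):
    """extents: per-mesh (lo, hi) bounding-box extents along one axis.
    Returns (score, plane) for the lowest-scoring candidate, or None."""
    candidates = set()
    for lo, hi in extents:
        if node_min < lo < node_max:
            candidates.add(lo)
        if node_min < hi < node_max:
            candidates.add(hi)
    best = None
    for plane in candidates:
        left = sum(1 for lo, hi in extents if hi <= plane)
        right = sum(1 for lo, hi in extents if lo >= plane)
        splits = len(extents) - left - right        # boxes straddling the plane
        mesh_balance = abs(left - right)            # mesh balancing
        space_balance = abs((plane - node_min) - (node_max - plane))
        score = w_mesh * mesh_balance + w_space * space_balance + w_split * splits
        if best is None or score < best[0]:
            best = (score, plane)
    return best
```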

Jaap Suter

October 18, 2000, 03:48 AM

Wow, Awesome!!


See you next Monday.

Japio De Amigo


October 18, 2000, 04:53 AM

If you render the c-buffer at a lower resolution, couldn't you potentially run into situations where a pixel is deemed "occluded" by the c-buffer, when it's not?

Consider a c-buffer that has half as many scanlines as the screen. In that case, one c-buffer pixel would represent a vertical column of two screen-space pixels. When rendering to the c-buffer, won't there be cases when you'll fill in a pixel from the c-buffer (i.e. two screen-space pixels), when the same rendering to the screen would only fill in one screen-space pixel? This really should only be a concern along the edges of occlusion polygons.

If (when) this happens, you'll have a screen-space pixel deemed "occluded" by the c-buffer, when it's really not. This would then cause a polygon to not be rendered when it should be (the opposite of "conservative").

Take it to the extreme to see the effect clearly: use a c-buffer of 40x30 to represent a 640x480 screen. In this case, each c-buffer pixel represents a 16x16 grid of screen-space pixels. Of course, there's no need to decrease the horizontal resolution of a c-buffer, so this is really a vertical-resolution issue.

Extend this to hardware rendering... In order to match the screen-space resolution, you may find yourself rendering your occluders into the c-buffer (and performing scanline conversion on non-occluder polygons to test for visibility) using a rather high-resolution c-buffer (1280x1024 or even 1600x1200). This is why Unreal slows down more than Quake as you increase the resolution.

Ned Greene's paper ("Hierarchical Z-Buffer Visibility") covers this topic and solves the problem by using a hierarchical z-buffer as opposed to a lower-resolution z-buffer.

Also, by only using a c-buffer aren't you required to have a perfect front-to-back sort order? How do you accomplish this without BSP trees? Newell-Newell-Sancha?

To end on a good note, couldn't you try to use a series of c-buffers to accomplish the hierarchical z-buffer effect? I wonder if it would help.

Finally, when rendering your non-occluders to test for visibility, it seems to me that it's a "search for a visible pixel." Couldn't you apply quick searching techniques to speed things up? For example, "check" each non-occluder in the c-buffer by scan converting it in multiple passes. For example, check the even scanlines first to cover the polygon's area quickly. If you find an open pixel near the bottom of the polygon, you would have found it sooner than scan-converting the whole thing. Extend this to 4, 8 or even 16 passes (in effect, performing a "blind hierarchical search.") There should be a lot of ways to speed up this search.
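The interleaved search suggested here could be sketched like this (illustrative Python; all names are invented): the candidate's spans are tested against the coverage buffer in several sweeps over the scanlines, so an uncovered pixel far from the top is found without first testing every scanline above it.

```python
# Sketch of the multi-pass "blind hierarchical search": visit scanlines in
# `passes` interleaved sweeps (pass 0: y0, y0+p, ...; pass 1: y0+1, ...)
# and stop as soon as any span contains a visible pixel.

def any_span_visible(spans_by_y, span_is_visible, passes=4):
    """spans_by_y: dict mapping scanline y -> (x0, x1) span of the mesh.
    span_is_visible(y, x0, x1): the per-span coverage-buffer test."""
    if not spans_by_y:
        return False
    ys = sorted(spans_by_y)
    y0, y1 = ys[0], ys[-1]
    for offset in range(passes):
        for y in range(y0 + offset, y1 + 1, passes):
            if y in spans_by_y:
                x0, x1 = spans_by_y[y]
                if span_is_visible(y, x0, x1):
                    return True  # early out: mesh is (partially) visible
    return False
```

With 2 passes, a visible pixel on an even scanline near the bottom is found after testing only half the scanlines above it.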

- MidNight


October 18, 2000, 05:12 AM


Lots of things to reply to. :)
About the lower resolution c-buffer: Jaap and I talked about this problem. Jaap mentioned the theoretical case of a 2x2 c-buffer: In that case, it is extremely easy to see that the visibility information is 'not accurate'. There are two ways to handle this problem.
1- Jaap suggested using a c-buffer with a 'reasonable' vertical resolution, e.g. half the horizontal resolution of the final image. This would still cause some 'popping', but your eyes wouldn't have much of a problem believing that it is actually correct. Of course, this is not 'conservative'. Currently, this is the situation, however.
2- While reading your post, I thought of something else: I never check individual polygons against the c-buffer; I always check bounding boxes. That means that I can simply enlarge the bounding box to compensate for the inaccuracy of the c-buffer.
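That enlargement could look roughly like this (a sketch with invented names; cell sizes are whatever the c-buffer's downsampling factor is): growing the screen-space box by one c-buffer cell on each side makes the coarse test err on the side of "visible" rather than "occluded".

```python
# Sketch of compensating for a low-resolution c-buffer: grow the bounding
# rectangle by one c-buffer cell in each direction (clamped to the screen)
# before the occlusion test, so partial cell coverage along the edges can
# no longer cause a false "occluded" result.

def grow_rect_for_cbuffer(rect, cell_w, cell_h, screen_w, screen_h):
    """rect: (x0, y0, x1, y1), inclusive, in screen pixels.
    cell_w, cell_h: screen pixels covered by one c-buffer cell."""
    x0, y0, x1, y1 = rect
    return (max(0, x0 - cell_w),
            max(0, y0 - cell_h),
            min(screen_w - 1, x1 + cell_w),
            min(screen_h - 1, y1 + cell_h))
```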

About the perfect sorting order: This is not needed. When an octree node is 'processed', it automatically means that all detail meshes in the node are set to 'visible', even if they are behind an occluder polygon in the same node (that should have been prevented by a better subdivision, of course, but let's assume it happens). That means that a detail mesh can only be occluded by a different node. The occlusion caused by the occluder polygons in a node is always the same, no matter in what order you draw the occluder polygons in that node. Therefore, I need no perfect sorting. In fact, I would gain speed if I would first insert the largest polygons, and then the smaller ones.

I DO need splitting of occluder polygons against octree node boundaries, of course, because I need perfect sorting of occluder polygons on the octree node level. This makes moving 'static' geometry more expensive. This could be solved by using an s-buffer. Right now, I consider the s-buffer a good solution for scenes that require that all objects must be moveable. For all other scenes, I believe the c-buffer is faster.

About the search for the visible pixel: That's a very good idea. I'm not sure how much time checking actually takes (it's a cheap process, code-wise, but still), so I'm installing VTune right now, but it might indeed be a good idea to do it the way you described. After all, visibility testing is done for every octree node that has a (partially) visible parent node, plus every detail mesh bounding box (or sphere; ellipsoids are easy to check against the c-buffer too :).

Thanks for your post,
- Jacco.

Joakim Hårsman

October 18, 2000, 09:39 AM

What kind of speed gains do you get from this scheme? I would have guessed a c-buffer wouldn't speed things up much, especially on systems with slow processors and fast graphics hardware, but you obviously get a speedup. Any figures you have would be really interesting.



October 18, 2000, 10:24 AM


This is a very hard question to answer. I use a PIII/500 with a GeForce, so it's quite difficult to build a scene that is faster than raw polygon dumping with any kind of visibility algorithm. I mean, hierarchical frustum culling of octree nodes is also done by my algorithm, so what's left inside the frustum AND occluded must be so complex that it is faster to cull it than to draw it with the GeForce. I could of course simply place a thousand spheres of 1000 polygons each behind a brick wall, but that would give good performance figures with any occlusion scheme.
The scene you see in the shots has quite a few occluders, most of them consisting of a lot of polygons (for example, each of the pillars is about 50 polygons, oriented vertically, which makes them a nightmare for the c-buffer, if I hadn't used a little trick), and each occluder occludes only one or a few meshes. This particular scene now runs faster WITH the algo than without. But, as I said, that doesn't mean it's 'good'.
The typical scene that we intend to use this algo for will benefit greatly from the algo, though: We expect to use it in scenes with a couple of extremely simple but extremely large occluders (buildings), occluding lots of highly complex detail meshes. Such a scene, rendered on a relatively slow accelerator (VooDoo1 & 2) will benefit greatly from this algo.
About the speed of the c-buffer: Since I don't use the c-buffer for drawing pixels, I can use a greatly simplified version of the algorithm. A span that is inserted 'behind' 10 other spans for example does not need to be split in separate pieces in order to be able to render each piece; I can simply delete the existing spans and store the new one instead. The most expensive part of the algorithm is therefore the code that finds the correct spot in the linked list of the c-buffer to insert a new span. Maybe I can improve this a bit by using a small tree, but I'm not sure if the more complex implementation of that will really run faster.
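The simplified insertion described here might look roughly like this (a sketch with invented names, assuming one sorted list of occluded (x0, x1) spans per scanline; since the buffer is only used for coverage and never for drawing, overlapping spans are swallowed and merged instead of split):

```python
# Sketch of coverage-only span insertion: the new span absorbs every span
# it overlaps or touches, so no span ever needs to be split into pieces.
# Finding the insertion spot is the linear scan Jacco calls the most
# expensive part (a skiplist or small tree could replace it).

def insert_span(spans, x0, x1):
    """spans: sorted, non-overlapping list of (x0, x1) occluded spans.
    Insert [x0, x1], merging any spans it overlaps or is adjacent to.
    Returns the updated list."""
    out = []
    i = 0
    # copy spans entirely to the left of the new one
    while i < len(spans) and spans[i][1] < x0 - 1:
        out.append(spans[i])
        i += 1
    # swallow every overlapping/adjacent span into the new one
    while i < len(spans) and spans[i][0] <= x1 + 1:
        x0 = min(x0, spans[i][0])
        x1 = max(x1, spans[i][1])
        i += 1
    out.append((x0, x1))
    out.extend(spans[i:])
    return out
```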

- Jacco.


October 18, 2000, 10:57 AM

Use a skiplist to track your spans

If the # of spans is very small in a particular node, a linked list is fastest, so just skip the random # generation and put it in the lowest linked list only until you get a reasonable # of spans.

Once you have many spans, the skiplist rocks. I have glanced at a few collision papers where they were tracking object extents ( spans ) and they saw a good speedup using skiplists.
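For anyone who, like Jacco, hasn't met skiplists: a minimal sketch (illustrative Python; a real span version would store (x0, x1) extents and tune MAX_LEVEL to the expected span count):

```python
# Minimal skiplist sketch: a sorted linked list with extra "express lane"
# levels; each inserted node gets a random height, so search, insert and
# delete are O(log n) on average instead of the linked list's O(n).
import random

class SkipList:
    MAX_LEVEL = 8

    def __init__(self):
        self.head = {"key": None, "next": [None] * self.MAX_LEVEL}

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node["next"][lvl] and node["next"][lvl]["key"] < key:
                node = node["next"][lvl]
            update[lvl] = node               # last node before `key` per level
        new = {"key": key, "next": [None] * self._random_level()}
        for lvl in range(len(new["next"])):
            new["next"][lvl] = update[lvl]["next"][lvl]
            update[lvl]["next"][lvl] = new

    def __contains__(self, key):
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node["next"][lvl] and node["next"][lvl]["key"] < key:
                node = node["next"][lvl]
        node = node["next"][0]
        return node is not None and node["key"] == key

    def items(self):
        """Yield keys in sorted order (level-0 walk)."""
        node = self.head["next"][0]
        while node:
            yield node["key"]
            node = node["next"][0]
```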


October 18, 2000, 11:24 AM


Thanks a lot, I'll check that out. I have never heard of skiplists, but it sounds like a good optimization.

- Jacco.


October 18, 2000, 12:51 PM

Well, I didn't mean to imply that the meshes were stored in the octree, just that they were assigned to leaves of the tree, which seemed like a standard way to do it (and simply turning on or off the parts of the scenegraph based on visibility seemed intuitive to me).

Thanks for your answer, in any case. Now I've just got to digest it... :)

R. van Wijnen

October 18, 2000, 02:23 PM

Ziet er waanzinnig uit. {That's Dutch for "looks amazing"}

Gegroet vanuit een regenachtig Holland. {Greetings from a rainy Holland}



October 18, 2000, 02:47 PM

Don't count out the octree occlusion!!! I have about 35k polys in the world for a game demo I'm working on, and after the octree / view frustum culling I end up with around 2-5k. I get about 2-4x the speed of drawing all of the polys this way (I don't have an exact figure). A lot of the time, I agree, the card can draw triangles faster than you can cull them, but if you can do gross culling (like with an octree or some other bounding volume) then it's worth the effort.
The screens are pretty sweet... I love the shots that let you see the techniques 'in action.' Keep at it.

zed zeek

October 18, 2000, 05:33 PM

ain't graphic cards becoming much faster relatively than cpus + ive read this trend will continue
eg bullshit figures (1995 cpu-100mhz card 200kpolys vs 2000 cpu 1ghz card 20mil polys)
ive noticed this meself ,
hey great idea to do a better cull. type type type finished
excelland im sending 30%less polys to be drawn WTF me framerate has decresed


October 18, 2000, 05:47 PM

I don't know a lot about the theory behind c-buffers, but don't you need the transformed, screen-space coordinates of the triangles to update the c-buffer? If so, it means you probably need to transform the vertices yourself, don't you? How could the engine benefit from T&L then?


Arne Rosenfeldt

October 18, 2000, 06:41 PM

If someone invents a really good culling algorithm,
the graphics card producers will pour it into hardware
(so it's as fast as drawTriangle)
faster than I would into software ;-) "OpenGL extension"


Marco Al

October 18, 2000, 07:23 PM

The hardware would have to have a scenegraph to work with though, otherwise the best you could do is pass a bounding box along with a set of triangles and let the hardware do the bounding box check (feedback is evil).

BTW mr. Rosenfeldt, did you ever get anywhere with your culling idea? Did you see the paper Conservative Visibility Preprocessing using Extended Projections? Reminded me of some of the things you were trying to accomplish.

zed zeek

October 18, 2000, 10:26 PM

did anyone understand that cause reading it now i certainly dont


October 19, 2000, 03:08 AM

Ehm, Zeek,

I'm not sure I don't understand it. In fact, now that I think of it, I'm pretty sure I don't understand it. :)

Are you against or pro culling? :)

- Jacco.


October 19, 2000, 03:16 AM

I hear this question a lot lately, and here's my opinion on the subject:
1. Just because NVidia comes up with another feature that potentially speeds up rendering doesn't mean everyone should use it. Using T&L rules out a lot of algorithms (i.e., all screen-space algos). Or, to put it differently: Using T&L improves the speed of all world-space algos. That does not mean that everyone should start using world-space algos; it just means that everyone who does just got a free speed increase.
2. I can use HW T&L for detail meshes, just not for the occluders and the bounding boxes of detail meshes. Detail meshes will never be sent to the c-buffer, nor tested against it. That means I cannot use hardware T&L for about 1% of my scene. I don't think that will decrease the speed of my engine too much.

But I really want to stress that you should not blindly follow other people or cardmakers (subtle insult intended :). Doing that brought us an endless stream of Quake viewers, an endless stream of stencil buffer demos, and it will soon bring us an endless stream of demos with exactly 8 colored lights, rendered really quickly, if you have a GeForce. :)

- Jacco.

Kurt Miller

October 19, 2000, 04:36 AM

I think these two lines of his were a simulation of sorts:

"hey great idea to do a better cull. type type type finished
excelland im sending 30%less polys to be drawn WTF me framerate has decresed "

Meaning, as I interpret it, that Zed put in a lot of work on a culling algorithm that let him send along 30% less polys, only to find it actually slowed down the engine. The point being it would've been faster in his case to blast off all the data to the card and let it do its thing, versus spending precious cpu time figuring out how to send less.

In a talk that Carmack gave earlier this year (mp3 here), he emphasizes the fact that "graphics cards are getting faster than the cpu is." It's a good listen that's relevant to this discussion because, as you know, Quake3 uses a PVS versus any "fancy" dynamic culling algorithms, and the game is quite fast despite the fact that it's often rendering a lot more than is really necessary to complete the scene.

I realize, Jacco, that you're probably the last person who wants to hear someone say "Carmack says...", but I bring this up simply because he has indeed shipped a very fast hardware-accelerated game. Seeing as how that's the goal of many developers these days, I'd argue that his experiences are at least worth a listen. That certainly doesn't imply you should necessarily take his advice, just perhaps keep some of these things in mind. I think it's great that you're working on a new dynamic visibility system. Keep it up...

Kurt Miller

October 19, 2000, 04:40 AM

I realize Jacco that you're probably the last person who wants to hear someone say "Carmack says...",

Just to clarify that comment, I meant because you're always encouraging innovation :)

Kurt Miller

October 19, 2000, 04:55 AM

If someone invents a really good culling algorithm, the graphics card producers will pour it into hardware (so it's as fast as drawTriangle) faster than I would into software ;-)

I tried talking to a few people briefly about this in the past, wondering why it hasn't already been done. Several points were brought up, but the one that I heard over and over from people is the fact that in many visibility schemes, access to the full world database (or huge portions of it) is necessary. This could very well be a problem for a hardware visibility algorithm, especially since people would make their worlds much more complex if they had a fast hardware algo. Not to mention the extra texture usage :).

Btw, wasn't GigaPixel doing a kind of hardware vis based on some form of sophisticated tile-based architecture? Anyone know whatever happened to them after 3dfx bought 'em up?

Jaap Suter

October 19, 2000, 07:42 AM

All we need is a hardwired:

bool IsPolygonVisible( vertexList );

method and the possibilities will become endless.

Jaap Suter


October 19, 2000, 07:52 AM

That would be impossible without sorting your polygons first.
IsPolygonVisible could return true until some other polygon further in the list would happen to occlude it.

I would like to see hard-wired:

bool IsOccludedByOccluder(occluder, vertexList);

funny name :)


October 19, 2000, 07:57 AM

Unless the card has access to all vertex data in the scene; in that case, IsPolygonVisible would be feasible. But that would mean that the polygon would have to be checked against every other polygon in the scene.

Just nitpicking here, Jaap. :)

Marco Al

October 19, 2000, 08:06 AM

Oh joy, rendering in lock-step with the 3D card, what an improvement. Non-immediate-mode rendering becomes impossible, in other words tiling/chunking/bucket rendering, which pretty much means deferred shading is out the window too (not everyone might be sorry to see them go, but I still say they have great potential... as I've said quite often, if tiling is good enough for Pixar, it's good enough for me). Although it wouldn't be restricted to tiling; ATI's MAXX cards wouldn't like it much either.

There are perfectly good screen-space occlusion culling algorithms available which are just not feasible because of the software/hardware divide. The alternative would be to use a scenegraph interface and let hardware handle the whole thing :)


Arne Rosenfeldt

October 19, 2000, 08:10 AM

Hello Marco AL

I will read the paper tonight.

The problem is, this is a hobby, so I can wait for the ultimate idea.
(I study physics mainly)
I am here to get the experience from people who do a lot more testing with real code than I do.

I still believe in bounding boxes.
If you do not have them, what enables you to go top-down, which is needed for culling?
Particle systems, automata and results from physical simulations
do not provide bounding boxes :-(

I still do not see why a BeamTree has a much greater cost than view frustum culling. You do not have to start at the root of the tree for every polygon.

When people start to model single bricks in the wall, there will be no large occluders anymore. So I still prefer a coverage buffer (organized as a beam tree).


Arne Rosenfeldt

October 19, 2000, 08:20 AM

I have no good idea, though, how to avoid starting at the BeamTree root.

Jaap Suter

October 19, 2000, 08:54 AM

I disagree, Willem.

Your hardwired IsOccludedByOccluder has nothing to do with graphics hardware, since it has nothing to do with the display or texture memory. Besides, it would be way too tied to the scenegraph. You are basically saying that it is cool to hardwire every algorithm, which is true of course, but not very feasible.

You are right about the front-to-back sorting of my method. But that doesn't matter. Imagine having the ability to test the visibility of octree nodes in hardware???? That would be so incredibly freakin' awesome. My method needs access to the z-buffer, and since that is located on the graphics hardware, it can be put in there right away. Of course it would need some nasty bus changes, because obtaining feedback from the graphics card sucks right now. But the method itself would be very handy.

Jaap Suter

Marco Al

October 19, 2000, 09:23 AM

Oops I think I mixed up your talk about combining coverage for siblings with another post. Still an interesting paper though :)

This thread contains 35 messages.