
Submitted by Brebion flavien, posted on May 18, 2001

Image Description, by Brebion flavien

These are screenshots from my 3D MMORPG engine. They were taken on a P2-400 with a GeForce. I tried to show a few interesting effects, such as true volumetric clouds and detail objects (grass) on the ground. The performance is rather low due to many factors: no LOD (this will be my next big task), true 3D meshes for everything (including the landscape), on-the-fly computed soft shadows (look at the mountains and the house; the sun can be moved), and, above all, the fact that everything is dynamic (vertex buffers are refilled every frame). The application is CPU limited.

The sky and grass are animated. I tried not to overuse the lens flare effect, which can be disabled by the user anyway. There is also the usual stuff (particles, 3D sound, procedural terrain textures, detail textures, collision detection, frustum culling, a 3DS importer, shaders, state groups, etc.). I am sorry for the poor quality of the models/colors/textures used; I'm definitely not an artist.

I am seeking a job in the game industry in France. If you are interested, or know someone who might be, feel free to drop me a mail. I'll release a public demo in a few days.


F. Brebion



Message Center / Reader Comments:
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
Falstaff Fakir

May 19, 2001, 11:17 AM

If you want to see lens flares done by reading the z-buffer in OpenGL, go download the "Serious Sam" tech demo. According to the developers, this is the technique they use. It looks great and runs really smoothly on my Duron 800/GF2 MX.


May 19, 2001, 11:59 AM

Actually, I don't think 768 MB is enough. I have 256 MB, but after playing for an hour the memory usage was 900-1000 MB; I thought that was impossible.

David Olsson

May 19, 2001, 12:59 PM

"My understanding is that LOD will _increase_ CPU load (In theory not including DX8 PMeshes)."

Completely application specific. If you do a lot of per-vertex work, you might gain a lot by having fewer vertices.

Fabian Giesen

May 19, 2001, 01:54 PM

They are not.

In fact, I once had exactly this problem: I tried to sync something to music and it didn't work because the drivers buffered it for several frames. Worse, they would run through those few frames very fast, then render them all "at once", then buffer again, so the framerate was oscillating wildly.

To fix it I had to lock the backbuffer, then unlock it directly after each frame was completed, so that the drivers had to flush the buffers (this workaround was recommended by one of the DirectX developers). This is definitely *not* something I'm making up, and I believe NVidia mentioned in some recent presentation that they had fixed this problem, so it seems it was there after all.

And Mark: The NVidia Detonator drivers sometimes allocate 30MB+ for buffers, what do you think they do with it? :)

Just my $0.02

Mark Friedenbach

May 19, 2001, 02:12 PM

The NVidia Detonator drivers sometimes allocate 30MB+ for buffers, what do you think they do with it?

Then NVidia's insane. Who the hell wrote those drivers? They should spend more time rendering and less time memcpy()ing your vertex arrays...

Well, just for the record, I have never used an NVidia card for development. I know that the 3dfx and ATI drivers rarely, if ever, buffer more than 1 MB of commands at a time.

Brebion flavien

May 19, 2001, 02:30 PM

> have you thought about disabling alpha blending and using alpha testing instead?

Yes, I tried that, but it looked worse and it was really tricky to make it work with transparency (for distant grass to "fade in"). There were some small artifacts too, and it still required the distance sort (for border pixels), so I preferred the alpha-blending technique. Fillrate is not really the problem here.

About using splines: I could give it a try, but I'm using a precomputed table for the grass animation loop, so I'm not sure it would help me do fewer calculations.

> If you're CPU limited, how will LOD improve FPS?

It is CPU limited now, but I have tons of ideas to improve that, and I hope to reduce the overhead to a minimum. By the way, LOD is still needed if you want a far clipping plane many kilometers away. The test scene shown in the shots is actually quite small (around 200x200 m), which is not acceptable for a MMORPG. I'm not sure what kind of LOD I'll be using, but it won't be ROAM. One of my requirements is that the terrain mesh be treated exactly like any other object, so I can't use a terrain-specific LOD scheme. I think I'm going to use a set of ~10 different levels of detail for each object (computed at run-time) and use geomorphing to avoid popping.

F. Brebion


May 19, 2001, 02:48 PM

Actually that's exactly what I meant:

" somehow they have to be syncronised, by you!" :)

My reasoning was that it's impossible to let the GPU draw the 1000th frame while the FPU is still computing the 500th frame, or the other way around...

Right after the buffers are flushed, it should be very fast to read the z-buffer, since there are no other operations to wait for, like Mark said. So the next frame will be rendered after this nanosecond :)




May 19, 2001, 02:58 PM

It's the other way around. The CPU will be working on, say, the nth frame while the video card is drawing frame n-1 or n-2. There isn't much disparity, so it doesn't make the gameplay look jerky or anything like that, and it keeps the CPU from having to wait for the card to finish drawing a frame before moving along with its own work, thus saving CPU time.


Mike Taylor

May 19, 2001, 03:53 PM

At the risk of sounding like a DirtyPunk clone, this sounds like a good job for VIPM buffers. I would hope that you aren't planning on doing a massive amount of dynamic terrain if it is going to be a MMORPG engine, and VIPM works on damn near any type of mesh, so the tools are very portable. And it is about the most hardware friendly algo out there for current hardware. You may have to do some hybriding at the very far distance, like I do, but a patch based VIPM LOD scheme sounds like exactly what you need.

-Mike Taylor


May 19, 2001, 07:59 PM

I just implemented four ReadPixels calls per frame in my OpenGL terrain engine, and there is no measurable performance hit.


May 19, 2001, 11:49 PM

Reading one pixel out of the Z buffer once you've rendered is not very slow at all, especially compared to the other alternatives. I happen to know for a fact that that's what several commercial games do, too.


May 20, 2001, 08:13 AM

what? blaze only using 256mb? could this be true? ;)

u9 runs perfectly on my celeron2 600@750 with 256mb and a gf2mx on win98 (win2k kills it), don't know what your issue is...


May 20, 2001, 09:44 AM

Bwahaha. What isn't an issue?

I gave up after the patch didn't fix enough of the problems. It really just wasn't worth trying to play it through.

Patrick Lahey

May 20, 2001, 09:58 AM

Nice work!

You may want to consider using a different sorting method. Quicksort does not take advantage of frame-to-frame coherence and has a worst-case complexity of O(N*N).

Take a peek here:


Kasper Fauerby

May 20, 2001, 10:05 AM

Randomized quicksort has an expected time complexity of O(n*log(n)), though, and it's easy to implement: instead of choosing a fixed index as the pivot, you pick one randomly.

IIRC, radix sort runs in linear time but has some restrictions which make it less suited for sorting numbers with a large span.
(I haven't read the article you referred to though; this is just what I remember from a CS class some years back ;)

Joakim Hårsman

May 20, 2001, 10:19 AM

If you check out the paper on real time fur rendering by Hugues Hoppe et al, that's pretty similar to what you're describing. That technique would probably produce really good looking grass, but sadly it eats fillrate like there's no tomorrow.


May 20, 2001, 01:33 PM

Not that I don't like this one (I do, and I'd like to see it in motion), but isn't it about time for a new IOTD?


May 20, 2001, 02:56 PM

I fail to see how a randomized quicksort can be quicker than a normal quicksort, let alone a hybrid quicksort (one that switches to insertion sort once the partitions are small enough). My 4-pass radix sort is faster than my quicksort, and a good friend of mine has made his 4-pass radix sort faster still. I have a good idea for making mine faster again :-)

To quote him: "Anyway, I've mostly grown tired of looking at sorting algorithms. Radix sort is too damn efficient and takes the fun out of it."

I've got a piece on my web-site. It's a bit sh*t and has a dire case of "needs finishing", which I hope to remedy soon.

David Massey

May 20, 2001, 03:13 PM

Isn't theory great? It's funny how theory can sometimes give the illusion that something is better than it actually is in practice. Case in point, any comparison based sort compared to a radix sort.


May 20, 2001, 04:33 PM

"As if by magic, the shop-keeper appeared..."


May 20, 2001, 07:57 PM

You can never read from the card. Ever. Period. It's evil. It's wrong. It's synchronisation. Synchronisation is bad.

Sure, Serious Sam uses this technique, and it looks *fantastic* (I mean really good, especially in that courtyard where the lens flare flickers through the palm tree). Sure, it's a very elegant solution to the problem. Sure, I never noticed any performance problems.


Just because something runs fine on your PC doesn't mean it will be fine for everyone. Even if it doesn't slow things down now, you're creating another bottleneck. Imagine scratching your head when your blistering-fast new code doesn't improve the FPS much at all.

Different hardware works very differently, e.g. 3dfx's on-card command FIFO vs. NVidia's AGP DMA buffers, or a conventional depth buffer vs. the Kyro's tile-based rendering. You really don't want to mess around with the card at this low a level. It assumes too much.

This is a collision detection problem, and if your engine can't cast one ray per frame you're in deep trouble.

Damn. To me, this sounds exactly like the silly old 30/60 FPS thing.

Just Say No to Synchronisation.

Sorry. Peace :)


May 20, 2001, 08:15 PM

Nice IOTD.

Volumetric clouds sound interesting.
Is this like volumetric fog? Or is it a way to get a nice animated sky, perhaps?


May 20, 2001, 10:16 PM


When I find out who's removing my posts, I'm going to sic Primey on them.



May 21, 2001, 02:42 AM

First of all: peace to you too :)

Suppose a bright student develops a close-to-perfect visibility algorithm for his software engine that draws some polygons, reads data from the z-buffer, decides which polygons to draw next, reads from z-buffer again... This would be terribly slow for hardware accelerated engines because the CPU always stalls while the card is drawing stuff. Conclusion: hardware acceleration is bad. Period :)

Back to our lens flare again. At a certain moment every frame, the CPU has to wait for the GPU, or the other way around. When this happens, it should be almost free to read from the z-buffer since all instruction buffers are emptied.

If nVidia suddenly decides to make reads from the card much slower to speed up all other operations, then all those great games that sometimes read from the card will be dead, and games that don't do this will sell double. Conclusion: nVidia will not make it slow as long as there are developers that use it. nVidia has to make it possible to read fast; the developer just needs to use it in the correct way...

Suppose you have a 'jungle' game and you want to know if the enemy can see you through the leaves. Most methods I can think of read back data from the card. You never know when someone will need to read back data for some cool effect in the future.

There's only one method to know it for sure: take your cutting-edge landscape engine and implement this. I'm that stupid student with his software engine so don't ask me to test it ;)



Kurt Miller

May 21, 2001, 02:51 AM

Please stop posting in huge fonts for one thing. Aside from MB, I believe I've received more complaints and alerts from people about your posts than anyone else on these forums...

Kasper Fauerby

May 21, 2001, 04:07 AM

Well, the short version:
Quicksort is a divide-and-conquer algorithm, which means that it sub-divides the problem until it is trivial to solve and then stitches the pieces back together. Quicksort does this by dividing the data up until there is only one (or two) elements in each chunk, which can be solved trivially. The dividing and stitching can be done in linear time and is responsible for the 'n' in the time complexity. The number of times it must subdivide (the depth of the recursion tree) is the other factor.

In normal quicksort you always pick the value at the last index as the pivot. A perfect pivot divides the data into two equally large sets, which leads to a tree depth of log(n).
On an already sorted list, normal quicksort will always pick the worst possible pivot (namely the largest number in the data), and the two halves will not be of equal size. This leads to a tree depth of n and a time complexity of O(n*n).

If you choose the pivot randomly, then you can show (if you are a math-head) that with very high probability you select a 'good' pivot that splits the data almost equally. Some more math reveals that the expected depth becomes log(n), and thus the expected time complexity is O(n*log(n)).

However, I just read the article about the Radix sort and I must admit that it sounds promising. With a constant of 4 it outperforms the quicksort algorithms at very small data-sets ;)

Brebion flavien

May 21, 2001, 06:59 AM

I have never counted how many grass polys are used, but there are probably around 2000 grass objects (4 tris each) being drawn per frame.

As for the volumetric clouds, they're not at all like volumetric fog. Volumetric fog is hard to animate and requires a lot of CPU resources anyway. I'm using impostors for my clouds: each impostor is physically simulated with many thousands of particles and rendered to a texture. That way I spend almost no time on them at rendering time, and the clouds can be animated.

F. Brebion


May 21, 2001, 12:50 PM

"Suppose a bright student develops a close-to-perfect visibility algorithm for his software engine that draws some polygons, reads data from the z-buffer, decides which polygons to draw next, reads from z-buffer again... This would be terribly slow for hardware accelerated engines because the CPU always stalls while the card is drawing stuff. Conclusion: hardware acceleration is bad. Period :)"

No (IMHO). We have to write code that works best on current hardware. If your nifty z-buffer-reading visibility algo runs slower than just rendering all the polys, then I'm afraid that rendering all the polys is better. And the hardware isn't bad, because it still improved performance. :)

btw I agree with a_j_harvey


May 21, 2001, 02:30 PM



PS: Shut up BlindSheep.


May 21, 2001, 05:43 PM

Very cool, when will we be able to see that demo?

This thread contains 95 messages.