
Submitted by Tim C. Schröder, posted on August 18, 2001

Image Description, by Tim C. Schröder

Well, I thought it was time to share some new pics from my current engine. Basically it is a GeForce 3 optimized FPS engine, with 100% dynamic lighting as the main feature. All the shots you see here have a preprocessing time of 0ms, except the 500ms for the octree during level compilation ;-) I tried to get the lighting model working on GF1/GF2 cards, but I failed. Not possible. At least it screams on the GeForce 3. Well, I guess you want to see the feature list, so here it goes:
  • Vertex.. ahh... SMARTSHADER(TM) for basically every triangle on the screen
  • Pixel shaders for the entire lighting
  • DOT3 diffuse + specular per-pixel lighting on every surface (Well, not on the skybox...)
  • Per-pixel normalization cubemap or pixel shader normalization for every surface
  • Tangent space setup done by vertex shaders
  • DOT3 self-shadowing
  • PPA for every surface
  • Realtime general shadow solution, everything shadows on everything including on itself
  • Colored lights
  • Blinking, flickering and pulsating lights through a shader definition file
  • Lights can be assigned to splines
  • Detail texturing
  • Hyper texturing
  • Advanced vertex buffer optimization code to guarantee best T&L performance
  • Light flares + coronas, implemented through vertex shaders
  • Ellipsoid based collision detection / handling
  • Realtime in game light editor, modify every aspect of the lighting without any reloading
  • Configuration system allows you to change basically everything without any code rebuild
  • Basically any damn 3D feature in the world. If it is not supported yet, it will be in the future
  • The engine is incredibly CPU limited at the moment; this is my main problem. Rendering brute force is sometimes even faster than performing HSR. I can render 4 quad texture passes without much drop in performance. The CPU code isn't sooo unoptimized, but the GF3 is a card that handles everything you throw at it and just screams for more, so my crappy 700MHz machine can't keep up.
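The tangent-space DOT3 lighting listed above can be sketched in plain C++ (an illustrative sketch only, not code from this engine; all struct and function names here are invented). The vertex shader would rotate the light vector into tangent space with the per-vertex TBN basis, and the combiner stage would clamp the dot product:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Rotate the world-space light vector into tangent space using the
// per-vertex TBN basis (tangent, bitangent, normal) -- the "tangent space
// setup done by vertex shaders" step from the feature list.
Vec3 toTangentSpace(Vec3 l, Vec3 t, Vec3 b, Vec3 n) {
    return { dot(l, t), dot(l, b), dot(l, n) };
}

// DOT3 diffuse term: N is the tangent-space normal fetched from the normal
// map, L the tangent-space light direction; clamped to zero like the
// register combiner's unsigned output.
float dot3Diffuse(Vec3 normalMapSample, Vec3 lightTS) {
    return std::max(0.0f, dot(normalize(normalMapSample), normalize(lightTS)));
}
```

A surface lit head-on gives 1, a surface facing away clamps to 0, which is why the self-shadowing term in the list is needed on top of plain DOT3.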

    If you are interested in a discussion about realtime lighting algorithms, you know my mail. Note: a discussion is a conversation between two similarly skilled people who learn from each other. So please no "Teach me how to do this!!!" mails. I'm quite busy with my work and writing my engine, so really no time for tutorials / explanations, sorry ;-(

    Anyway, comments welcome.

    Tim C. Schröder



    Message Center / Reader Comments:
    Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

    August 19, 2001, 09:26 AM

    Wow, seems like I'm not the only one who believes in the power of modern CPUs ;)

    "I can't wait to have a programmable pipeline for CPUs (would be very nice for mp3 encoding, mpeg4 encoding and such, too...)"

    Yeah, that would be a big step forward. BTW, the Intel Itanium has some very interesting features that make this almost possible. Unfortunately it's not meant for home PCs right now. The cool thing is that this technology is not only useful for speeding up 3D graphics in software, but also for things like the music and video encoding you mentioned and all other multimedia applications...

    Fabian Giesen

    August 19, 2001, 09:29 AM

    People have been developing *game* engines for years, not 3d engines.

    David Olsson

    August 19, 2001, 09:41 AM

    D3D is extremely slow; I don't think there is a single line of asm code in there, and almost no optimization. So it's certainly not fair to compare against that code.

    How many of your triangles are visible at the same time?
    I don't think there are many of them.

    IMHO a software engine with a hierarchical zbuffer and front-to-back traversal (easy with an octree) could probably compare with your engine. But you need to cull nodes of the octree against the zbuffer, not just triangles.
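The front-to-back octree traversal this refers to can be sketched in a few lines of C++ (an illustrative sketch, not code from any engine in this thread; names are invented). The only engine-specific part is visiting a node's eight children nearest-first, so occlusion tests against the zbuffer (or c-buffer) can reject whole subtrees:

```cpp
#include <algorithm>
#include <array>

struct Vec3 { float x, y, z; };

// Center of child octant i of a cube node with center c and half-size h.
// Bit 0 of i selects +x/-x, bit 1 selects +y/-y, bit 2 selects +z/-z.
Vec3 childCenter(Vec3 c, float h, int i) {
    float q = h * 0.5f;
    return { c.x + ((i & 1) ? q : -q),
             c.y + ((i & 2) ? q : -q),
             c.z + ((i & 4) ? q : -q) };
}

// Order the eight children of an octree node nearest-first as seen from the
// eye, so a front-to-back traversal can feed a hierarchical zbuffer and
// cull far nodes once the screen area they cover is already occluded.
std::array<int, 8> frontToBackOrder(Vec3 eye, Vec3 center, float halfSize) {
    std::array<int, 8> order = {0, 1, 2, 3, 4, 5, 6, 7};
    auto dist2 = [&](int i) {
        Vec3 p = childCenter(center, halfSize, i);
        float dx = p.x - eye.x, dy = p.y - eye.y, dz = p.z - eye.z;
        return dx*dx + dy*dy + dz*dz;
    };
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return dist2(a) < dist2(b); });
    return order;
}
```

Sorting by child-center distance is a simple approximation of the exact octant-ordering trick, but it is enough to guarantee no child is drawn before one that can occlude it.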

    Obviously you need a lot of code and it's certainly not easy, but that doesn't mean it isn't possible.

    However there are some scenes where a software engine doesn't stand a chance, namely scenes with a lot of visible triangles.

    The GeForce3 is impressive but not that impressive. For a pure graphics application it's probably faster to let the vertex shaders be done in software. At least that's what I heard while reading the DX mailing list. (I don't anymore.)

    David Olsson

    August 19, 2001, 09:58 AM

    I'd love to see that in real time, any binary I can have ?

    A little suggestion: render your scene to 320x200 and then apply a filter and smooth it up to 640x480. It will probably make it look a lot better and it can be done really fast.

    The big big plus for raytracing is that you can have really really big worlds, since raytracing is basically logarithmic in the number of objects with good data structures. Best of all is probably the freedom you get.

    My own approach is something in between software and pure hardware.
    It will hopefully be something very different.


    August 19, 2001, 10:14 AM

    Yeah, keep on dreaming. Beating a GF3 with a CPU, not in a million years...



    August 19, 2001, 10:16 AM

    Yes, very true. It's just that you can never ever reach the speed of a dedicated HW renderer with a general purpose CPU; this was all I wanted to say.



    August 19, 2001, 10:20 AM

    Well, I don't have the current shadow model in the screenshot, and the shot that is shown here indeed has the problems you describe. The current engine build handles attenuation and shadows 100% correctly, without artifacts.

    Actually, you made some very smart observations. You can just render a shadow for every light, using light volumes as you say. Surely it is expensive, but the "wrong way" has too many limitations and destroys the immersion in my opinion.



    August 19, 2001, 10:25 AM

    Man, you talk some serious crap. A GF3 beats every multiprocessor system into the ground. Even every $50,000 workstation. While you spend years writing your renderer, I have all this working now.

    I'm pretty sure that even on a 2GHz P4 system you can't even reach the quality of a TNT1 card. CPUs don't even reach a tiny tiny fraction of the speed a modern GPU has.

    I'll challenge you: write a SW renderer that reaches the quality and speed of a TNT1 on any computer for $5,000. I doubt that you can even reach the quality of an incredibly outdated accelerator on the best CPU out there; even the idea sounds silly. But go on, waste your time and prove me wrong...



    August 19, 2001, 10:29 AM

    Never say never...

    Imagine this situation: a few billion spheres (or any big number too high to draw with brute force) placed randomly around the camera, pretty close together. It's obvious that only a few hundred of them can be visible at the same time, right?

    In software, you can render them front to back with a c-buffer, and once every pixel has been drawn, you can stop drawing very quickly. I'm not saying this is impossible with a GeForce, but it will be a lot harder to implement, no? You will have to use a lot of slow tricks for things that are trivial with a software engine.
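The c-buffer early-out described here can be sketched for a single scanline in C++ (an illustrative sketch under simplified assumptions: a real c-buffer stores covered spans rather than per-pixel flags, and the class name is invented):

```cpp
#include <vector>

// Minimal one-scanline coverage buffer: instead of a per-pixel depth test,
// track which pixels are already covered. Drawing strictly front to back,
// each pixel is written at most once, and rendering can stop as soon as
// the whole scanline (or screen) is covered -- the early-out above.
class CoverageScanline {
    std::vector<bool> covered;
    int remaining;
public:
    explicit CoverageScanline(int width)
        : covered(width, false), remaining(width) {}

    // Draw span [x0, x1); returns how many pixels were newly filled.
    int drawSpan(int x0, int x1) {
        int filled = 0;
        for (int x = x0; x < x1 && remaining > 0; ++x)
            if (!covered[x]) { covered[x] = true; ++filled; --remaining; }
        return filled;
    }

    // Once full, every remaining (farther) object on this line is occluded.
    bool full() const { return remaining == 0; }
};
```

With billions of spheres sorted front to back, traversal stops as soon as `full()` reports true, so the cost depends on visible surface, not scene size.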

    Also, a GeForce3 will never beat a software engine at ray-tracing, simply because it can't do this!

    I'm waiting for the card that gives you just as much freedom as a software engine...

    Patrick Lahey

    August 19, 2001, 10:38 AM

    This paper (and this objection) has come up before. If we restrict ourselves to non-degenerate triangular discretizations (i.e. all triangles have finite area) of non-intersecting closed surfaces the silhouette approach is equivalent to the single triangle approach.

    That is to say, the closed shadow volume constructed from:

    -all frontfacing surface triangles (the front cap)
    -all translated (and scaled for local lighting) versions of all non-frontfacing triangles (the back cap)
    -the extruded potential silhouette which connects the edges of the front and back caps

    is equivalent (at least away from the backcap) to the shadow volume constructed from each frontfacing triangle.

    The only catch is getting the orientation of the silhouette triangles correct, but this is easy. The orientation is the same as the orientation would be for that edge if we were using the "each triangle casts a shadow volume" approach. This is why the approach works: we have that orientation information.

    The paper you cite dealt with ambiguities that arise when you only have a surface point. Such ambiguities do not arise when you deal with non-degenerate triangle discretizations, where you deal with oriented edges instead of points.

    So why is the silhouette approach equivalent to the single triangle approach (for this class of surfaces)? The proof has essentially the same character as the proof of Stokes' theorem from elementary calculus. Think about the orientation of each edge of each frontfacing triangle of the surface and you will quickly come to the conclusion that the contributions from all triangle edges cancel except those on a potential silhouette.
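The edge-cancellation argument above translates directly into code. Below is an illustrative C++ sketch (not from this thread; all names invented): emit every directed edge of every light-facing triangle, let opposite directed edges cancel, and what survives is the oriented potential silhouette.

```cpp
#include <array>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// For a closed, consistently wound mesh: edges shared by two light-facing
// triangles appear once in each direction and cancel (the Stokes-style
// argument above); the surviving directed edges form the potential
// silhouette, already oriented correctly for extrusion.
std::vector<std::pair<int,int>> silhouetteEdges(
        const std::vector<Vec3>& verts,
        const std::vector<std::array<int,3>>& tris,
        Vec3 lightPos) {
    std::map<std::pair<int,int>, int> count;
    for (const auto& t : tris) {
        Vec3 n = cross(sub(verts[t[1]], verts[t[0]]),
                       sub(verts[t[2]], verts[t[0]]));
        if (dot(n, sub(lightPos, verts[t[0]])) <= 0.0f)
            continue;                       // skip non-light-facing triangles
        for (int e = 0; e < 3; ++e) {
            int a = t[e], b = t[(e+1)%3];
            if (count[{b, a}] > 0) --count[{b, a}];  // reversed edge cancels
            else                   ++count[{a, b}];
        }
    }
    std::vector<std::pair<int,int>> sil;
    for (const auto& [edge, c] : count)
        for (int i = 0; i < c; ++i) sil.push_back(edge);
    return sil;
}
```

For a tetrahedron with the light over its slanted face, exactly that face's three edges survive the cancellation.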


    August 19, 2001, 10:56 AM

    Yes, for a few years we probably won't be able to process multiple pixels in one clock cycle on a CPU the way a GeForce3 does, but on the other hand this also means you're limited to what your hardware can do. Programmable CPU pipelines would bring them a lot closer together than you think! Although a GeForce3 is already capable of doing some funky stuff, there will always be things it can't do. Luckily things are changing now with pixel shaders and such, but a CPU is still unbeatable in versatility and can do things your GeForce3 can't do efficiently...

    If someone develops a whole new brilliant algorithm (or old, like a c-buffer), it always has to be implemented in software first because it takes a few years to support it in hardware. So here software is ahead of hardware. There are also algorithms that are very hard to put on silicon, like ray-tracing.

    Now let's stop this discussion, my friend, I shouldn't have started it. I like your work a lot and I hope you like software engines just as much. I'm sorry if you feel offended, but it has nothing to do with your fantastic engine. Now I can get back to studying for my exams ;P


    August 19, 2001, 11:19 AM

    AFAIK, there are things that today's CPUs can do that GPUs can't, and vice versa. A processor can be good at something while still being bad at something else.

    - wolverian


    August 19, 2001, 11:37 AM

    *cough* Perfect? *cough*

    A) Phong isn't perfect
    B) I don't see you calculating perfect integrals for area lights in a hemisphere
    C) I seem to recall there being a whole issue of range, and the fact that black is, well, null.

    David Olsson

    August 19, 2001, 11:43 AM

    I totally agree with you Nick.

    Besides, if I were to write a software engine I wouldn't use triangles as primitives. I would use something like pixel-sized spheres, cubes or whatever, together with a hierarchical zbuffer. That would be a killer.


    August 19, 2001, 11:51 AM

    The DX8 software emulation is C code: not assembler, not MMX, not 3DNow, not SSE, not even fast C code.

    "...Microsoft® Direct3D® features are implemented in software; however, the reference rasterizer does make use of special CPU instructions whenever it can..."


    - Michael

    David Olsson

    August 19, 2001, 11:53 AM

    You don't seem to understand that you can do some things with a CPU that you can't even dream of doing with a GF3. You don't have to do things the same way with a software engine as with a GF3. Brute force isn't always faster.


    August 19, 2001, 12:03 PM

    Degenerate being every 3d model scan, stitched model or model created out of raw polygons from intersecting brushes? Degenerate being every quake level I can remember?

    You also forgot the case of zero volume models which isn't a degenerate case. For example an open ended cylinder with a dent in one side, meshed both inside and out creates a problem.


    August 19, 2001, 12:05 PM

    SreeD software engine, presentation of AIBO
    The Viewpoint website, other products

    IIRC it doesn't even use SSE for those tens of thousands of polygons...


    August 19, 2001, 12:07 PM

    Also, the software vertex shader pipelines are ultra optimized and written by Intel and AMD themselves. And they still suck soooooooooooo much compared to the GF3 HW pipeline.

    Face it, SW rendering is 50x slower than a decent gfx card; whatever you do, whatever you say, this remains a fact. The quality of today's realtime SW gfx just approaches a 5 year old Voodoo1. "SW rendering kicks my GF3's ass" -- yeah, sounds realistic when you consider that it is barely on the level of 5 year old HW rendering...



    August 19, 2001, 12:11 PM


    Is this a joke? I can't take even one line of your post seriously.
    OK, one thing that I believe is that the reference rasterizer is written in C. (But hey, that's only for reference.)

    It will take at least another decade before you can make a 3d engine with pure raytracing with more than a few hundred polygons, or without using simple primitives like balls and boxes. And it will take only a few years before everyone has a GF3 or similar.

    You say that the GF3 has bugs. I really wouldn't call hardware limitations bugs, though some of them might be just as annoying.

    Btw, 128 bits for colors and alpha for one pixel? What's the point? Your graphics card and screen still cannot push more than 32 bits.

    Joakim Hårsman

    August 19, 2001, 12:15 PM

    But when you do multipass stuff, the limited precision shows. I think the GeForce uses higher colour precision than 32-bit internally.


    August 19, 2001, 12:30 PM

    Wow, looks pretty great.

    Per-pixel lighting and bumpmapping everywhere. But those shadows from that older version of your engine would make those other three images (of the newer one) look even better. Actually I think it would look pretty much the same as Doom3 will look (except that you are using a Quake2 map ;-) Support for soft shadows would be a plus of course (probably hard to do with shadow volumes, at least without pixel shaders).

    About shadow volumes I have to say that whether or not they always work depends on the way you generate them. If you just extrude backfacing polys you will encounter some aliasing problems with badly tessellated meshes. Check the NVIDIA developer pages for a demo about this.

    PS. Now that you are on D3D I hope even more that you restart updating your page.

    Fabian Giesen

    August 19, 2001, 12:37 PM

    Not true, there are examples of realtime raytracing already.

    Check out:

    heaven seven for an example of scanline based raytracing or nature suxx / nature still suxx which are two 100% software rendered raytracing demos that deliver quite a decent performance for what they do, I'd say :) (while the scenes aren't exactly what I'd call complex, they're not trivial either).

    Note that all three of those don't use SSE or 3dnow or something similar.

    Mike Armstrong

    August 19, 2001, 12:42 PM

    Am I correct in saying that the current model you are using is:

    pass 1: render all ambient based polys ( also get scene in zbuffer )
    for each light
    Render volumes created by this light to stencil
    render the whole scene where the stencil value == 0
    here also factor in the per pixel dp3 calculations
    clear the stencil buffer and repeat

    obviously here the stencil volume polys would still need to be "infinite", although it would allow multiple lights with "correct" per-pixel lighting.

    Additionally, which method of setting the stencil buffer are you using: incrementing on zpass or incrementing on zfail? If you are using the former, do you find you require an amazingly close front clip plane
    so you don't have to cap the volumes? This always caused so much flickering for me as the camera passed through volume planes, so I adopted the second method at the expense of rendering more.

    hope that all makes sense.
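The zpass-vs-zfail question above can be simulated per pixel in plain C++ (an illustrative sketch, not from the thread; the structs and names are invented). For one pixel's view ray we know the depths of the shadow-volume faces it crosses and the scene depth, so we can count the stencil value both ways:

```cpp
#include <vector>

// One shadow-volume face crossed by the ray through a pixel:
// its depth along the ray and whether it faces the viewer.
struct VolumeFace { float depth; bool frontFacing; };

// Z-pass: count volume faces that PASS the depth test against the scene.
// Pixel is in shadow iff the result is nonzero.
int stencilZPass(const std::vector<VolumeFace>& faces, float sceneDepth) {
    int s = 0;
    for (const auto& f : faces)
        if (f.depth < sceneDepth) s += f.frontFacing ? 1 : -1;
    return s;
}

// Z-fail: count faces that FAIL the depth test, with the increments
// flipped; this stays correct even when the near plane clips away the
// volume's front faces (the flicker described above).
int stencilZFail(const std::vector<VolumeFace>& faces, float sceneDepth) {
    int s = 0;
    for (const auto& f : faces)
        if (f.depth >= sceneDepth) s += f.frontFacing ? -1 : 1;
    return s;
}
```

With the eye outside the volume, both counts agree; if the front face is clipped away, only the z-fail count still flags the pixel as shadowed, which is exactly why switching to the second method removes the flicker at the cost of rendering the caps.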


    ps. ahh, the next gen of games awash with moving shadows -- now we just have to find the gameplay to reinforce it :>

    Patrick Lahey

    August 19, 2001, 12:43 PM

    You said:

    Degenerate being every 3d model scan, stitched model or model created out of raw polygons from intersecting brushes? Degenerate being every quake level I can remember?

    You also forgot the case of zero volume models which isn't a degenerate case. For example an open ended cylinder with a dent in one side, meshed both inside and out creates a problem.

    To which I say:

    You seem a tad aggressive here... My point was to debunk a common misconception about potential-silhouette-based shadow volume construction. For the stated class of surfaces (which need not be convex -- the commonly stated restriction) silhouette construction is effectively equivalent to triangle-based construction.

    I didn't say anything about how general the stated class of surfaces are. Certainly 3d model scan data will require careful preprocessing before it can be used for shadow volumes. Intersecting brushes are not a problem as long as each brush is individually closed. Quake does not make these kind of restrictions on its geometry so it is not surprising that the levels would violate it. If quake levels are your only source of data (i.e. you are a hobby programmer with no in-house artists or in-house tools) then silhouette based shadow volumes are probably not of use to you.

    As far as zero volume surfaces are concerned, I did not forget them.
    As long as the inner and outer surfaces share common edges the shadow volume constructed using the silhouette method will be equivalent to the single triangle construction (which was my only point).

    If the stated class of surfaces is too restrictive for your application, you should probably consider other algorithms. I have no emotional stake in which shadowing algorithm you (or anyone else) selects.


    August 19, 2001, 12:51 PM

    Yep, I have seen those. The point you missed is that these all use very primitive geometry to build somewhat more complex stuff. There's actually nothing more than cubes, balls, cylinders etc. You really just can't raytrace models like game characters with today's CPUs. Take for example mental ray, which is considered a fast renderer plugin for 3dsmax. It takes at least a couple of seconds to render a simple mesh with about 500 faces.

    Btw. Those are really great demos/intros.


    August 19, 2001, 12:56 PM

    Sorry, I didn't mean to sound aggressive (*Conor wishes he had the power to use tone and facial expressions over the internet*) -- I was just constructively debunking your debunk.

    Zero volume meshes don't have the property, particularly the cylinder with one bent side I mentioned -- I worked this out on paper again just to make sure.


    August 19, 2001, 01:06 PM

    Bugs your GF3 has:

    Or, let's call them limitations. However, I hate them. Stupid limitations they are.

    Cubemap rendering is not at all fast
    => no shadow mapping for point lights, so the shadow map extension is not very useful. In a software rasterizer you can write rasterization into cubemaps without any overdraw.

    Very imprecise register combiner path (8-bit [-1,1] range)
    => just using the 2 combiners on the GF2MX I already get terrible artifacts in the speculars (I get high exponents with multiplications, power 32 and power 64 with 2-pass lighting -- I think I know how to use my combiners, OK? And the range limitation is terrible. I could do nice stuff like real reflection vectors dotted with the light vectors in the combiner, per pixel, on a GF2, if I didn't have this limitation. Not very nice.)

    Normally you would use floats for this on today's CPUs
    => see the point above: do it with floats and you have no problem. On the other hand, yes, you need much more memory than with 32 bits for the 4 components. But the normals are very imprecise otherwise, and the HILO normals you can't use in the combiners -- they would be nice (even if only 32-bit in the combiner anyway).

    Only triangles
    => no comment ;) You can create optimized cel renderers, render spheres, NURBS, whatever, or do raytracing. You can do very much stuff -- voxels etc. -- none of it on a GPU today.

    Terrible at rendering volumetric objects (texture3d)
    => kill your GF3: render a huge volumetric object. Oh, and if you use some blending to make it transparent, like the enemies in Final Fantasy: The Movie or the funny big head with the bullet in it in the James Bond film, for example -- then you see the damn imprecise color format you have.

    Stupid architecture for per-pixel lighting
    => very many features which would be easy were "forgotten". The combiners are made in a way that you need hundreds of them, etc. etc. What I hate most: it should be possible to do at least diffuse lighting for all 8 OpenGL light sources per pixel with a bumpmap, but you have too few vector inputs into the combiners -- just 4 textures and 2 colors. How about VECTORS, working like the pass_through in the texture shader? That would be nice. There is a very nice distance attenuation possible with pass_through, but I shouldn't really need to burn a texture stage for it. And since it's in hardware, it would not be a big problem to parallelize, say, 16 vectors for per-pixel interpolation. I don't even ask for automatic per-pixel normalization of them (not yet ;))

    32-bit color -- I use 128-bit myself (or 96 without alpha ;))
    => calculate everything with floats; simple on today's CPUs with SSE, for example -- a multiplication of your colors is no problem there, just one operation (clock cycles I don't know, sorry). I really like that nvidia wants to support 64-bit colors next, but once they support 128-bit colors with 4 floats we can get the really nice stuff and lose the bugs of RGB with [0,1] range (adding, for example: add orange and orange and you get? Yellow...). And then, why RGB? There are other color spaces out there, sometimes much better for combining stuff correctly.

    => if you do your software rasterizer well, you only need to call your pixel program once per pixel, not 6 textures in 3 passes etc. I mean, there is much too much drawn on this screen, even with octree culling. Read the document about software portal rendering and you'll possibly see the power of direct access to everything: cull every unnecessary pixel.

    => big problem: use a c-buffer, or a hierarchical one, or whatever. Software renderers know much faster depth-test methods than a depth buffer; only in hardware you can't choose. And using a depth buffer or not makes a speed difference (OK, on the GF3 it's better now with the early depth testing -- which just shows that the current pixel pipeline takes some time, even if you say it does not ;))

    Framebuffer access much too high-level, with far too few features
    => much nice stuff is possible in the combiners, but if you need 2 or more passes you get problems, and sometimes you have to use many more passes than you want. No separate handling of alpha and RGB like in the combiners. Not much is possible there -- how about extending the pixel shader to cover it?

    Btw, I really like the GF3; it has much more power than everything else yet, and I like that they chose this path. But I just hate the "the GF3 is perfect, fuck your CPU" line. I mean, Tim, this picture is great, but your software rasterizer argument is rubbish: a single triangle is not hard to get on screen, and sure it is slow if you code it naively in C++.

    Oh, and NEVER say I talk crap, OK? Never.

    I think I know a lot about the GF3 and GF2, something about other GPUs, and I know a lot about CPUs, too.

    How about this: combining software and hardware? THEN you see how messed up this hardware is. The direct access is simply not there, whatever you want to do. There would be a way to combine rasterization, raytracing and global illumination, and I think it could be nicely fast on a 2GHz Athlon plus a GF3, but there you see that the limitations mentioned above are big bottlenecks, and this shows that it really IS limited. You can use it for rendering Quake with lots of pixel detail, but genuinely new stuff you can't really do. It's no great feat to render a Quake level with bumpmapping; you can get all the information for it on the net. But doing nice software stuff means that YOU have to do something -- and, I don't know, Tim, but possibly you just won't be able to? I'd rather have a CPU without a GPU than vice versa.
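The "orange + orange = yellow" complaint from the 32-bit color point above is easy to demonstrate in C++ (an illustrative sketch; the helper names are invented). With clamped 8-bit channels the red:green ratio that defines the hue is destroyed, while float channels keep it:

```cpp
#include <algorithm>
#include <cstdint>

struct ColorF { float r, g, b; };

// Saturating 8-bit add, the way a [0,1]-clamped framebuffer blend behaves.
uint8_t addClamped8(uint8_t a, uint8_t b) {
    int s = int(a) + int(b);
    return uint8_t(std::min(s, 255));
}

// Float add: no clamp, so channel ratios (the hue) survive until a final
// tonemap/scale step.
ColorF addF(ColorF a, ColorF b) { return { a.r + b.r, a.g + b.g, a.b + b.b }; }
```

Orange is roughly (255, 128, 0). Clamped, orange + orange gives (255, 255, 0) -- yellow, since both channels hit the ceiling and the 2:1 red:green ratio becomes 1:1. In floats, (1.0, 0.5, 0) + (1.0, 0.5, 0) = (2.0, 1.0, 0), still orange after halving.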


    August 19, 2001, 01:26 PM

    Hm, absurd. You don't need to believe me, but if you want I can give you the exe of my raytracer (if I can find a version -- it's been a while since I did it; it took me 2 days (two afternoons while not in school)).

    Simple raytracing is damn simple to realize. No culling there, no real stuff -- spheres and planes (2 things your GF3 has trouble with: infinite planes are simply not possible, and spheres only with far too much data; I need 128 bits for a sphere ;))
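The sphere primitive really is tiny -- center plus radius, four floats, the 128 bits mentioned -- and intersecting it with a ray is just a quadratic. An illustrative C++ sketch (not the poster's raytracer; names are invented):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Four floats = 128 bits: the whole primitive.
struct Sphere { Vec3 center; float radius; };

// Returns the nearest positive hit distance along the (normalized) ray
// direction, or -1 if the ray misses the sphere.
float intersect(Vec3 origin, Vec3 dir, const Sphere& s) {
    Vec3 oc = sub(origin, s.center);
    float b = dot(oc, dir);                       // half the quadratic's b
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;                // ray misses
    float t = -b - std::sqrt(disc);               // nearer root
    if (t < 0.0f) t = -b + std::sqrt(disc);       // origin inside sphere
    return t >= 0.0f ? t : -1.0f;
}
```

A ray from (0,0,-5) toward +z hits a unit sphere at the origin at distance 4; reflections are just this test repeated with the bounced ray, which is why "infinite reflections" come almost for free in a software tracer.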

    Once I get my 1.4GHz Athlon (one or two months remaining) I will start coding a real raytracer, not just a can_i_do_it_at_all? test demo. And yes, infinite reflections are possible -- the frame rate dropped to one quarter, but it's nice to see anyway.

    Once I have my Athlon I will also think about combining my rasterizer and raytracer to get really fast stuff -- possibly a half-cube rasterizer or something like that; I don't know which I will try first ;) But after all, raytracing is no longer a future dream. Oh, and think about Outcast -- do you know it? The voxel game. It was designed for Intel and AMD processors up to 500MHz (or so). With an update for today's up-to-2GHz CPUs you could get double the resolution, meaning you could play it at 1024 or so, or smaller with more stuff.

    Oh, and no, your GF3 does not have that many features. It has per-pixel lighting / bumpmapping / environment bumpmap effects -- that is what you do. Before, we used vertex lighting and lightmaps and such, because that was what GPUs supported; now per-pixel lighting. It looks much nicer, but it's just a new feature you now use. Try to do something really different and you're blocked by the hardware. This is fact. GPUs don't let you be very creative, unless you are really good.

    The stuff I liked most was the automatic billboard generation with 8 instructions in the vertex shader, or the automatic particle dot generation on the GeForce: no texture, but nice transparent round fade-out particles with one general combiner. This is stuff which takes some brain (even if not much ;)) to create, and you can do very much with it, even on a GF2. Oh, and try to create particles without textures, and you can get twice the count onto your screen -- great for fire effects, no?

    I can't wait for my own GF3 to see my demos in hardware. Per-pixel lighting surely IS great, the GF3 surely IS great, and everyone should have one.

    But just don't say you wrote the perfect engine.

    You wrote an engine for the best GPU out there, and you render a few 1000 triangles. Wow ;) No, it's great, really. Can't wait to do something like that, too.

    Oh, and about normalizing in the combiner instead of using the cubemap: don't do it. If you know the math you know that your vectors get badly degenerate, and in the end the place your light illuminates on the triangles is not where your light actually is. Learn the math and try to do CORRECT PHONG -- this is difficult on the GF3, yeah, even there.

    Ideas to try on your fabulous GPU:
    transparent stuff with bumpmaps on it for refraction effects
    reflection maps
    -- then you get into trouble implementing it fast.

    And your GPU is supposed to do such stuff.

    How about displacement stuff, like heat haze over fire and such? There are many effects not really possible yet -- at least, not with your GPU ;)


    August 19, 2001, 01:33 PM

    I don't want to criticize you, but I just wanted to note that higher lightmap resolution does not always look better. The smooth look of low-res lightmaps, and the possibility of putting up really many lights illuminating the same room, can make static lightmaps look better than the realtime dynamic things.
    I really would not say that the realtime lights look better than static lights. It's just that... they are dynamic. They can move and stuff. That is their good side ^^

    Also, as a side note: did you try your scheme with FX like alpha blended stuff, envmapped stuff, antialiasing or other things?

    I'm not (yet) convinced of the 100% dynamic lighting thing. Perhaps you should put up some more screenshots showing your new shadow system, which you mentioned several times, to show that your dynamic lights can really look as good as static lights.
    (Side note: light colors do not always have to have a saturation of 100%, and not every scene needs light colors from every single corner of the HSB spectrum...)

    This thread contains 140 messages.