
Submitted by Jacco Bikker, posted on April 29, 2005

Image Description, by Jacco Bikker

A while back I sent in a ray tracing IOTD showing the Stanford bunny, rendered at high speed. I've been busy since I made that demo, and these shots show the current state of the art. The top shot represents the maximum image quality: There are textures, adaptive super sampling (for edge anti-aliasing), a bloom filter causing a subtle glow and of course the reflections. Sadly, all this eye candy comes at a cost. The lower shot shows a model that performs very well: The number of rays per second is no less than 3 million on a 1.7 Pentium-M - on a P4 @ 3.2Ghz this would be about 6 million rays per second, which is better than the SaarCOR FPGA ray tracing chip.

Over the past months, many things have improved: The overall speed of the ray tracer has increased considerably due to some stiff competition from tbp (the odd French dude), there's a complete tool chain now to get from downloaded content to ray traced images (via the .obj file format), and the functionality has been extended considerably (textures, reflections, HDRI, networked rendering etc.).

There will be more good stuff, I'll keep you all informed. Greets - Jacco.



Message Center / Reader Comments:
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

April 29, 2005, 10:25 AM

What is the goal of this raytracer?
Is it image quality or speed?
And can it handle any kind of animations?


April 29, 2005, 10:58 AM

Impressive work Jacco!

The number of rays per second is no less than 3 million on a 1.7 Pentium-M - on a P4 @ 3.2Ghz this would be about 6 million rays per second...

I'm afraid that's not true. My software renderer runs at nearly the same speed on a Pentium M 1.4 GHz and a Pentium 4 2.4 GHz. Anyway, I hope I'm wrong for your raytracer!

Rui Martins

April 29, 2005, 11:02 AM

I don't know if it's my eyes or what, but those columns, especially the ones on the right, seem to be out of focus or something like that!

Could it be because of the bloom filter?

Jacco Bikker

April 29, 2005, 11:14 AM

Nick: I know you can't really compare MHz, but it's actually an educated guess: I share experiences (and demos) with tbp, who has a 2.2GHz P4. On his machine, my demos run 60% faster than on my machine. Also, I tried earlier versions on a colleague's P4/3.2, and got roughly doubled performance. I don't know why you don't see the same results; perhaps my code is less dependent on the cache?

BTW sorry I lied to you earlier when I said that there would be little room for speed improvement over the Bunny demo. I got about 50% extra speed since that demo.

Rui: Yes, the current implementation of the bloom filter causes the image to look slightly blurred. The filter is a 17x17 filter of the rendered image, and the final image is a blend between this blurred image and the original frame buffer (20%/80%, if I am correct).
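The bloom step described above (blur the rendered image with a 17x17 filter, then blend the result with the original frame buffer) can be sketched as follows. This is only an illustration under assumptions: the post does not say which kernel is used, so a plain box filter stands in for it, the 20% weight is assumed to belong to the blurred image, and the names `box_blur` and `bloom` are hypothetical.

```python
def box_blur(image, radius=8):
    """Blur a 2D list of floats with a (2*radius+1)^2 box; radius=8 gives 17x17.

    Assumption: a box kernel. The original post only states the filter size.
    """
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            # Average only the taps that fall inside the image, so pixels
            # near the border use fewer samples.
            for ky in range(max(0, y - radius), min(height, y + radius + 1)):
                for kx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ky][kx]
                    count += 1
            blurred[y][x] = total / count
    return blurred


def bloom(image, blur_weight=0.2, radius=8):
    """Blend the blurred image with the original frame buffer.

    Assumption: 20% blurred plus 80% original, following the order in the post.
    """
    blurred = box_blur(image, radius)
    return [
        [blur_weight * b + (1.0 - blur_weight) * o for b, o in zip(brow, orow)]
        for brow, orow in zip(blurred, image)
    ]
```

Because every output pixel contains 20% of its blurred neighbourhood, the whole frame softens slightly, which matches the out-of-focus look Rui noticed.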

Scali: I'm striving for both image quality and speed. Or: thanks to the speed I can add features that don't take ages to render (like the adaptive super sampling). I would also like to work on photon mapping (still), but I'll also keep working on better speed. Animation is not supported at the moment (kd-trees suck for that), but I might give it a try later. I could for example use a cheaper construction method, or even a plain octree. That would hurt performance severely, but tree construction is almost instant, so the time per animation frame would probably drop.

And of course, lights and the camera are independent of the kd-tree, so these can be animated without extra cost.


April 29, 2005, 12:16 PM

NetBurst is very fast with SSE; that could be the reason for the almost linear scaling.


April 29, 2005, 12:19 PM

"which is better than the SaarCOR FPGA ray tracing chip"

hehe no doubt, btw there has been talk of dedicated ray tracing and physics chips making it mainstream, but I think they will be a little late to market, once dual cores hit there will be little use for these as you'll have a 2nd proc just sitting there waiting to be fed...

oh ya, awesome pics.


April 29, 2005, 12:28 PM

Come on. Even dual core processors can't handle complex raytracing scenes in the near future...


April 29, 2005, 03:03 PM

Really impressive! By the way; did I set a new standard with that legocar? ;))

Jacco Bikker

April 29, 2005, 03:26 PM

Did you model that? Nice. :) It is ideal for testing purposes; few polygons, so the kd-tree builds fast, minimal texturing, and of course it looks really nice. A colleague used to test his software rasterizer with it, so I just took it from his machine.


April 29, 2005, 03:28 PM

No, that model was actually used for a different reason altogether.

Some time ago when I carefully set my first steps into the 3d realm I tried creating a software renderer of my own, using that exact same model.

Jacco found it funny (he has an evil kind of humor) to feed the same model into his raytracer and get faster framerates ;^)


Jacco Bikker

April 29, 2005, 03:40 PM

Ehm... Yeah. That could indeed have been a factor also. ;)


April 29, 2005, 04:56 PM

indeed, i didn't render it, but i keep on seeing it more and more often in "our" field, but it is just some coincidence apparently ;)

(like these)


April 29, 2005, 05:13 PM

Wow, Like many said, impressive.

What has been improved since the Rabbit to get such a performance gain?


April 29, 2005, 05:37 PM

looks pretty cool (and fast)! those example scenes are cool too, i hope the lights move around and that :)

would it be possible for you to write about simd subdivision traversal (grids, kd-trees...)? i'm really interested in issues such as diverging ray paths: how do you effectively detect this, and is there anything better you can do than spawn the rays separately using the non-simd traversal?

aside: i'm working on (what ought to be) a very cache efficient, dualcore/multicpu aware renderer, and it would be nice to fully exploit those simd units. i'm using double precision though (with good reason i feel), does sse3 provide for that?

Dan Royer

April 30, 2005, 12:17 AM

You may have different bottlenecks

Christian Sigg

April 30, 2005, 12:50 AM

"a little late to market"

I think you clearly have to distinguish between physics and ray tracing here.

Physics has a wide field of possible algorithms and acceleration data structures. The Ageia PhysX chip is a rather simple SIMD multicore chip. I would guess that once you have enough performance available, your physics algorithms will be too complex for it. It might be less hassle to implement those algorithms on a second Pentium core at comparable speed (bandwidth overhead to PCI card, SSE on CPU). At the moment, the Cell chip seems to be the best architecture for things like that. You have a general core for complex data structures and fast, multi parallel SIMD for your (particle/bone) shaders.

Ray tracing requires a fixed instruction set for one specific algorithm and can be implemented on an FPGA. Dedicated hardware is just faster than general purpose cores (SaarCOR or the new FPGA implementation run much slower, the latter is not called SaarCOR as far as I know). And several vendors make a good living from selling rasterization chips because graphics became such an important aspect of a (gamer's) PC's everyday workload. I'm just wondering if those companies are thinking about implementing dedicated RT hardware in the near future. I would guess that NVIDIA will come up with irregular framebuffer sampling first to compute shadow maps without sampling artifacts. Reflection and refraction rays will have to wait.


Christian Sigg

April 30, 2005, 01:08 AM

Hi Jacco,

Very nice and inspiring work.

I was wondering why the number of rays per second in the upper shot is just a third of the lower shot. Is it because the upper scene is 8 times more complex (tree traversal complexity logarithmic to scene complexity)? I guess blooming is not that expensive, is it? Do you also count secondary- and super sampling rays?



April 30, 2005, 04:31 AM

Looking great!

Will there be a part 8 to the raytracing series anytime soon?

Jacco Bikker

April 30, 2005, 04:48 AM

Christian: There are several factors that influence the number of rays per second. First of all, rays per second is the number of rays cast (primary & secondary), divided by the rendering time, so adding the bloom filter impacts this figure (quite severely; I added the filter two days ago and it's far from optimal, takes about 0.1s for a 512x384 image).
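To make the effect of a fixed post-processing cost on this figure concrete, here is a small worked example. The formula (rays cast divided by total rendering time), the 512x384 resolution and the 0.1 s bloom cost come from the post; the tracing rate of 3 million rays per second and the restriction to primary rays only are assumptions chosen purely for illustration, and `rays_per_second` is a hypothetical helper name.

```python
def rays_per_second(rays_cast, trace_seconds, post_seconds=0.0):
    """Rays cast divided by total rendering time, including post-processing."""
    return rays_cast / (trace_seconds + post_seconds)


# One primary ray per pixel at 512x384 (assumption: no secondary rays).
primary_rays = 512 * 384

# Hypothetical tracing time, chosen so that tracing alone runs at 3M rays/s.
trace_seconds = primary_rays / 3_000_000

without_bloom = rays_per_second(primary_rays, trace_seconds)
with_bloom = rays_per_second(primary_rays, trace_seconds, post_seconds=0.1)

print(f"tracing only:      {without_bloom:,.0f} rays/s")
print(f"with 0.1 s filter: {with_bloom:,.0f} rays/s")
```

Under these assumptions the 0.1 s filter takes longer than tracing the frame itself, so the reported figure falls well below half of the tracing-only rate even though no ray became slower.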

Other factors:
- The depth of the kd-tree;
- The size of the triangles (smaller triangles mean that I can use the packet traversal code for fewer pixels, and the mono ray tracer is only half the speed);
- Edge anti-aliasing: These rays typically can't use the packet tracer since a packet will almost always hit more than one triangle;
- Texturing: The bilinear filter is applied to 4 rays at once if possible, but still it's slower than not doing texturing, obviously;
- Reflections: These rays have more expensive setup, and also tend to diverge more rapidly than primary rays;
- Screen coverage: The lego car doesn't cover the entire screen, which means that many rays don't need shading / texturing at all. Besides, these rays travel through less dense areas, so they hit fewer tree nodes (and no triangles, obviously).

There's more: The pillars of the cloister create multiple complex areas. Rays that miss a pillar by a few pixels still travel through complex areas of the kd-tree, without hitting anything.

Summed, this results in the above speed difference. Basically, the car is the optimal case, the cloister is pretty much worst case.

I'm not using sse3, just sse2. Frankly I don't even know what sse3 adds; I'll investigate. BTW why do you use double precision? So far I have never encountered any problems that boiled down to precision. And another question: How fast are you? :)

Most of the improvements since the bunny demo are just plain optimizations of the existing code: Const correctness, proper inlining, hot and cold data, cache alignment, reordering code and so on. There are some improvements to the kd-tree construction code: Empty space cut-off, adjusting the split plane by + or - epsilon to fix accuracy problems, and some smaller tweaks. The kd-tree tweaks resulted in about 10% improvement, a long list of tweaks to the code resulted in another 30-40%.
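The post gives no details on how the empty space cut-off or the epsilon adjustment are implemented, so the following is only a guess at the general idea, working along a single axis. The function name, the 20% threshold, the epsilon value and the decision to combine both tweaks in one routine are all assumptions.

```python
def empty_space_split(node_min, node_max, prims_min, prims_max,
                      min_empty_fraction=0.2, epsilon=1e-4):
    """Suggest a split plane that cuts empty space off one end of a node.

    node_min/node_max: extent of the kd-tree node along one axis.
    prims_min/prims_max: extent of the primitives inside that node.
    Returns (split_position, empty_side), or None if neither end has enough
    empty space to be worth a split.
    """
    extent = node_max - node_min
    if extent <= 0.0:
        return None
    gap_low = prims_min - node_min    # empty space below the geometry
    gap_high = node_max - prims_max   # empty space above the geometry
    if gap_low >= gap_high and gap_low / extent >= min_empty_fraction:
        # Nudge the plane away from the geometry by epsilon so that no
        # primitive lies exactly on the plane.
        return (prims_min - epsilon, "low")
    if gap_high / extent >= min_empty_fraction:
        return (prims_max + epsilon, "high")
    return None
```

A child node that contains no primitives becomes an empty leaf, so rays crossing that region do no intersection work there.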

A part 8 of the raytracing series is unlikely. There's a huge gap between the theory of article 7 and what I have now. I'll make it up in a different way. Wait and see.


April 30, 2005, 04:50 AM

[account locked, using a decoy]
Just a little correction, from the odd french dude (cough).
I don't own a P4 (holy jumping Jesus, that would be so embarrassing) but a 2GHz K8, more precisely an Opteron 146.
And while that ratio between our boxes is more or less ok for that kind of application, it tells nothing about P4.


April 30, 2005, 05:06 AM

[posted on behalf of tbp]
Divergence is easily handled for a kd-tree, as Wald noted it's just a matter of checking direction signs.
So, and i think Jacco does the same, it goes this way:
. generate a ray packet (simd)
. check for divergence (simd)
. take the unlikely branch that unpacks each of the 4 sub-rays into proper mono-rays if the packet is not coherent (simd swizzle and then back to scalars)

That last part is a bit ugly, but you have to pay for it at some point.
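The three steps above can be sketched in scalar form as follows. This is not tbp's or Jacco's code: the function names are hypothetical, the two tracers are passed in as callbacks, and plain Python tuples stand in for SIMD registers. The only idea taken from the post is that a packet counts as coherent when all of its rays share the same direction signs on every axis.

```python
def sign_bits(direction):
    """Pack the signs of (dx, dy, dz) into 3 bits; a set bit means negative."""
    dx, dy, dz = direction
    return int(dx < 0.0) | (int(dy < 0.0) << 1) | (int(dz < 0.0) << 2)


def is_coherent(packet):
    """True if every ray in the packet has the same per-axis direction signs.

    An SSE version could read the same information from the sign bits of
    each packed component, for example with _mm_movemask_ps.
    """
    first = sign_bits(packet[0])
    return all(sign_bits(d) == first for d in packet[1:])


def trace(packet, trace_packet, trace_mono):
    """Use the packet tracer when possible, else fall back to single rays."""
    if is_coherent(packet):
        return trace_packet(packet)
    # Unlikely branch: unpack the packet and trace each ray on its own.
    return [trace_mono(d) for d in packet]
```

With matching signs, all rays in the packet visit the two children of a kd-tree node in the same order, which is what makes a shared traversal possible.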

BTW you don't need SSE3 to tinker with double precision, just SSE2 (the whole set of operations is mirrored between floats & doubles).


April 30, 2005, 05:09 AM

For the matter at hand, SSE3 adds a bunch of horizontal ops.

And i did encounter precision issues; Jacco you should know better.
Still, 32bits of precision ought to be enough if you're (extra) careful.


April 30, 2005, 05:20 AM

Jacco Bikker wrote: - Texturing: The bilinear filter is applied to 4 rays at once if possible, but still it's slower than not doing texturing, obviously;

Is bilinear filtering the only filter that your raytracer supports?
It seems that more advanced filters (trilinear, anisotropic, etc) in raytracers are rare, especially in speed-oriented renderers. It seems that people often just throw more rays at it... supersampling will also work as texture filtering...
But I was just wondering if you have found a more efficient way.

(I am just wondering when there will be a raytracer that can do animation and good quality texturing quickly, so realtime raytracing will start to become interesting as an alternative to triangle rasterizing...).


April 30, 2005, 05:48 AM

Better filtering asks for too much memory bandwidth, a scarce resource for modern CPUs.
It's not really practical to go beyond bilerp.
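For reference, a bilinear lookup already reads four texels per sample, and trilinear filtering would read eight (four from each of two mip levels), which is where the extra bandwidth goes. The sketch below is a generic scalar version with clamped edges, not code from either renderer; `bilinear_sample` and its texel-space coordinate convention are assumptions.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a 2D list of floats at texel-space coordinates (u, v).

    Integer coordinates hit texel centres exactly; lookups outside the
    texture are clamped to the nearest edge texel.
    """
    height, width = len(texture), len(texture[0])
    x0 = min(max(int(math.floor(u)), 0), width - 1)
    y0 = min(max(int(math.floor(v)), 0), height - 1)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    # Fractional position between the two neighbouring texels on each axis.
    fx = min(max(u - x0, 0.0), 1.0)
    fy = min(max(v - y0, 0.0), 1.0)
    # Four texel reads: two per row, blended horizontally, then vertically.
    top = texture[y0][x0] * (1.0 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1.0 - fx) + texture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```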

And animation isn't an issue as long as you can bear lowering your space partitioning efficiency (that's where the tradeoff stands).


April 30, 2005, 05:49 AM

The recent trend of doing full-scene bloom is bugging me. The blooming should, imo, only occur where there's any overflow of the original 8 bits. Thus if you subtract 1.0 (with zero-clamping) from the framebuffer and blend the blurred results of that (or even just do a scaled add or a "soft add", src + dst*(1-src)), areas which aren't overflowing won't look blurred and the final image won't look so damn "over done".
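A minimal sketch of that suggestion, assuming a floating point frame buffer where 1.0 is the brightest displayable value. The blur is passed in as a callback so that any kernel can be used; treating the blurred overflow as src and the clamped frame as dst in the soft add is an assumption, as are the function names.

```python
def bright_pass(value):
    """Keep only the part of a pixel that overflows 1.0 (zero-clamped)."""
    return max(value - 1.0, 0.0)


def soft_add(src, dst):
    """The 'soft add' from the post: src + dst * (1 - src), inputs in [0, 1]."""
    return src + dst * (1.0 - src)


def selective_bloom(frame, blur):
    """Bloom only around overflowing pixels.

    frame: 2D list of floats, values above 1.0 are overflow.
    blur:  callable that takes and returns a 2D list of floats.
    """
    overflow = [[bright_pass(p) for p in row] for row in frame]
    glow = blur(overflow)
    # Clamp both inputs to [0, 1] so the soft add stays in displayable range.
    return [
        [soft_add(min(g, 1.0), min(p, 1.0)) for g, p in zip(grow, prow)]
        for grow, prow in zip(glow, frame)
    ]
```

Where nothing overflows, the glow term is zero and the soft add returns the original pixel unchanged, so those areas are not blurred at all.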

This is like when colored lighting first came out, everything was colored all of a sudden! =)


April 30, 2005, 05:54 AM

Exactly, that's the gimmick of the day.
Where are the lens flares?


April 30, 2005, 07:05 AM

Fool! Jacco Bikker doesn't have any bottlenecks! :)


April 30, 2005, 10:14 AM

*cough* Demo *cough*


April 30, 2005, 11:01 AM

muchos gracies :)

my new renderer is working with a grid at the moment though, so i'm thinking of a similar way to do it that minimises the "packet reshaping"... i'll probably just start with a really big packet of coherent rays (eg all antialiasing eye rays) and allow it to be split a fixed number of times before calling the single ray routine.
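The splitting scheme described here (start with one big packet, allow a fixed number of splits, then fall back to the single ray routine) could look roughly like the sketch below. The post does not say how coherence would be tested for a grid, so the direction sign test mentioned earlier in the thread stands in for it; the function names and the default budget of two splits are likewise assumptions.

```python
def same_signs(rays):
    """Stand-in coherence test: all directions share their per-axis signs."""
    def signs(direction):
        return tuple(c < 0.0 for c in direction)
    first = signs(rays[0])
    return all(signs(d) == first for d in rays[1:])


def trace_with_splits(rays, trace_packet, trace_single, splits_left=2):
    """Trace rays as a packet if coherent, else split up to splits_left times.

    trace_packet(rays) must return one result per ray; trace_single(ray)
    returns a single result. Results come back in the original ray order.
    """
    if len(rays) == 1:
        return [trace_single(rays[0])]
    if same_signs(rays):
        return trace_packet(rays)
    if splits_left == 0:
        # Split budget used up: give up on packets for this group.
        return [trace_single(r) for r in rays]
    mid = len(rays) // 2
    return (
        trace_with_splits(rays[:mid], trace_packet, trace_single, splits_left - 1)
        + trace_with_splits(rays[mid:], trace_packet, trace_single, splits_left - 1)
    )
```

The split budget bounds how much time is spent reshaping packets before the single ray routine takes over.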

alternatively i can make a really compact kd-tree instead (or perhaps just additionally? i have good reasons for using grids) just for the coherent rays.


April 30, 2005, 11:13 AM

Yep. That lego buggy is fast becoming the new Utah teapot. Have you seen the gallery recently? ;-)

This thread contains 160 messages.