 
 


Submitted by Jacco Bikker, posted on January 05, 2005




Image Description, by Jacco Bikker



These are some images from my latest (as yet unnamed) ray tracer. Everyone has seen the bunny a thousand times by now, so I'm throwing in a real-time demo as well. :-) As you can see, so far I have been focusing on raw performance. The ray tracer now fully implements Ingo Wald's packet tracing: whenever possible, four rays are cast simultaneously and traversed through a kd-tree together. Using packets makes the ray tracer very suitable for vectorization with SSE/SSE2 instructions. Despite some overhead, this approach doubles the speed compared to a regular (single-ray) ray tracer.
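The post shows no code, so the following is only a minimal sketch of why packets suit SSE, assuming a structure-of-arrays layout. The type `RayPacket4` and the function `packet_split_distance` are hypothetical names, not taken from the demo. Storing each component for all four rays side by side means one 4-wide operation (for example an SSE multiply) can advance the same computation for every ray in the packet, here the distance to a kd-tree split plane:

```c
#include <assert.h>

/* Hypothetical 4-ray packet in structure-of-arrays (SoA) layout: each
   component is stored for all four rays contiguously, so one 4-wide SSE
   register can hold e.g. the x origins of the whole packet. */
typedef struct {
    float ox[4], oy[4], oz[4];    /* ray origins, one array per component */
    float rdx[4], rdy[4], rdz[4]; /* reciprocal directions 1/d, precomputed */
} RayPacket4;

/* Distance along each ray to the plane "axis == split" (0 = x, 1 = y, 2 = z).
   The four iterations are independent and identical, which is exactly the
   shape that maps onto one SSE subtract and one SSE multiply. */
void packet_split_distance(const RayPacket4 *p, int axis, float split, float t_out[4])
{
    assert(axis >= 0 && axis <= 2);
    const float *o  = axis == 0 ? p->ox  : axis == 1 ? p->oy  : p->oz;
    const float *rd = axis == 0 ? p->rdx : axis == 1 ? p->rdy : p->rdz;
    for (int i = 0; i < 4; ++i)
        t_out[i] = (split - o[i]) * rd[i];  /* same operation in every lane */
}
```

Written as a plain loop, this is easy to check against a scalar tracer; a hand-vectorized version would replace the loop body with SSE intrinsics operating on all four lanes at once.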

About the images: The top left image shows the famous Stanford Bunny model (69k triangles). There are two (dynamic) light sources in this scene, and the animation is rendered at 512x384 pixels. On my 1.7 GHz Pentium M laptop this runs at about 5 frames per second. The image to the right of it shows a dump of the matching kd-tree.
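As a rough back-of-the-envelope sketch (these are not figures from the post), the resolution and frame rate above translate into the following ray budget. The shadow-ray term is an upper bound that assumes every primary ray hits the bunny and casts one shadow ray per light:

```latex
\begin{aligned}
512 \times 384 &= 196\,608 \ \text{pixels per frame} \\
196\,608 \times 5 \ \text{frames/s} &= 983\,040 \ \text{primary rays/s} \\
983\,040 \times (1 + 2) &= 2\,949\,120 \ \text{rays/s (at most, with 2 shadow rays per pixel)}
\end{aligned}
```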

The two lower images show the Happy Buddha model, which consists of no less than 1089k triangles, and its matching kd-tree. This model renders slightly slower; on my system it runs at about 4 frames per second.

This brings up a very interesting characteristic of ray tracing: the technology scales very well. Going from 69k to 1089k triangles only adds a few extra levels to the kd-tree, so the slowdown is far from linear in the number of triangles. In addition, ray tracing performance scales almost linearly with processing power and with the number of available processors. This means that, given a sufficiently complex scene, it's possible to outperform high-end graphics cards using commodity hardware.
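A worked sketch of the depth argument, assuming a roughly balanced kd-tree with a similar number of triangles per leaf (real kd-trees built with cost heuristics are not balanced, so this is only indicative):

```latex
\begin{aligned}
\log_2(69\,000) &\approx 16.1, \qquad \log_2(1\,089\,000) \approx 20.1 \\
\frac{1\,089\,000}{69\,000} &\approx 15.8 \approx 2^{4}
\;\Rightarrow\; \text{about 4 extra levels for roughly } 16\times \text{ the triangles}
\end{aligned}
```

So about 16 times more triangles costs roughly four extra levels of traversal per ray. The ratio of depths (about 1.25) happens to be close to the 5 versus 4 fps reported above, but traversal depth is only one of several costs, so this should not be read as a prediction.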

Also interesting is the fact that the ray tracing algorithm is very simple. 750 lines of code give me 'z-buffering' with virtually unlimited accuracy, visibility determination, self-shadowing, recursive reflections and refractions, hard and/or soft shadows, per-pixel lighting, HDR effects and so on.
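In a ray tracer, the 'z-buffer' is simply the nearest intersection along each ray. The sketch below shows that idea with a brute-force loop over triangles. It uses the well-known Möller-Trumbore ray/triangle test, which is an assumption here (the post does not say which test the demo uses), and a real tracer would only test the triangles in the kd-tree leaves along the ray instead of all of them. `nearest_hit` and `intersect_triangle` are illustrative names:

```c
#include <assert.h>
#include <float.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v0, v1, v2; } Triangle;

static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Moller-Trumbore ray/triangle test: returns 1 and writes the distance to *t
   if the ray o + t*d hits the triangle at some t > eps, else returns 0. */
int intersect_triangle(Vec3 o, Vec3 d, const Triangle *tri, float *t)
{
    const float eps = 1e-6f;
    Vec3 e1 = sub(tri->v1, tri->v0), e2 = sub(tri->v2, tri->v0);
    Vec3 p = cross(d, e2);
    float det = dot(e1, p);
    if (det > -eps && det < eps) return 0;   /* ray parallel to the triangle */
    float inv = 1.0f / det;
    Vec3 s = sub(o, tri->v0);
    float u = dot(s, p) * inv;               /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f) return 0;
    Vec3 q = cross(s, e1);
    float v = dot(d, q) * inv;               /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f) return 0;
    *t = dot(e2, q) * inv;
    return *t > eps;
}

/* Visibility: keep only the closest hit along the ray.
   Returns the index of the nearest triangle, or -1 if nothing is hit. */
int nearest_hit(Vec3 o, Vec3 d, const Triangle *tris, int n, float *t_nearest)
{
    int best = -1;
    float best_t = FLT_MAX;
    assert(n == 0 || tris != 0);
    for (int i = 0; i < n; ++i) {
        float t;
        if (intersect_triangle(o, d, &tris[i], &t) && t < best_t) {
            best_t = t;
            best = i;
        }
    }
    *t_nearest = best_t;
    return best;
}
```

Because the depth comparison is done per ray in floating point against the exact hit distance, there is no fixed-resolution depth buffer to run out of precision, which is one way to read "virtually unlimited accuracy".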

I'm currently working with Thierry Berger-Perrin to produce a more interesting demo, perhaps similar to the famous RealStorm benchmark. In the meantime, you can download the bunny demo (3.8MB).

Greets
Jacco Bikker



 
Reader Comments:
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Bramz

January 05, 2005, 12:16 PM

very nice! reminds me I have to continue my own ray tracer :D

here it runs @ +/- 8 fps (P4 @ 2.5GHz)

Bramz

 
Axel

January 05, 2005, 12:22 PM

8.5 fps on a 2 GHz Athlon 3000+. Very impressive :)

I hope HW manufacturers finally realize that raytracing is the future *sigh*
If you have enough raw power then you can throw almost as much geometry at a raytracer as you want.

 
NCCAChris

January 05, 2005, 12:24 PM

Nice job Jacco!

btw have u finished with the raytracing series or is it just a lull period :)

*edit* - although it doesn't want to work atm - goes straight to crashsville

 
Samuel

January 05, 2005, 12:32 PM

Very nice : ) All I have is a 2.6 GHz Celeron (P4-based, supposedly) with a GeForce FX 5200 card, and it runs at 7.3-8.9 fps. I can definitely see this being the wave of the future, after a few more technology jumps.

 
Jacco Bikker

January 05, 2005, 12:52 PM

I forgot to mention that the demo requires SSE2. It thus requires a P4 or a recent AMD processor; the Athlon XP isn't going to cut it. Sorry...

Articles: I'm not sure. I'll probably do one or two more, but I don't know when. I want to describe the packet tracer that I implemented, and I think a decent tutorial on SSE is badly needed. There's virtually no real info on the web apart from the obligatory matrix * vector stuff.

 
Axel

January 05, 2005, 12:54 PM

Why does it need SSE2? I thought SSE2 only extends SSE with 64-bit instructions?

 
Rui Martins

January 05, 2005, 01:01 PM

I wonder how long it will take to turn those 750 lines of code into some kind of shader/ray tracer assembly, so that everyone can have a ray tracing board at home.

There are a few already, but they are still expensive and somewhat limited in functionality.

When technology gets this far, it will become harder to make a scene look good, because we can't cheat with fake implementations; we really will have to produce tons of models for our worlds.

Just imagine: what we do with a simple skybox today would have to be done with huge transparent models of clouds (possibly blobs or many spheres).

The side effect I like most is that we will probably get rid of textures!
At least in the 2D sense. More like procedural materials or similar.

Setting up a scene will become more like dressing a Hollywood movie set, because we will have to place the lights in specific spots to get the expected output. That is a lot harder to do than is usually expected.

There are even cases where a fake or edited image looks more real than the real photograph.

I believe that scene management will be the buzzword in the days to come.

I suppose there will be a long transition period due to the high cost of the transition in terms of manpower and know-how: the so-called "lock-in" effect.

 
stoo

January 05, 2005, 01:07 PM

This is totally cool! I've been implementing my own kd-tree after reading your tutorials (and the linked Wald paper). I have some way to go to match these speeds though :).

I'd be very interested to see how you do the packet tracing stuff - would that work for tracing arbitrary rays for stuff like e.g. ambient occlusion?

Anyway, here are the demo specs:
P4 3.2ghz, 1gb ram - ~10.5fps

Excellent!

stoo

 
MadHed

January 05, 2005, 01:26 PM

WOW! This is totally amazing.

My Specs:
AMD Athlon64 3400+ (2400Mhz)
512 MB DDR Ram

Runs at about 11 fps

Have you tried rendering the scene with complexer materials, for example
a reflecting bunny? I wonder what speed it would run at.

Both thumbs up for the demo and an excellent tutorial series... :)

 
El Pinto Grande

January 05, 2005, 02:12 PM

"Why does it need SSE2? I thought SSE2 only extends SSE with 64-bit instructions?"
Indeed, SSE2 brings 64-bit (double-precision) float support, but also a whole range of logical ops and conversions (i.e. int <-> float32/64).
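To illustrate the conversion point with a small, hedged example: SSE1 could only convert two lanes at a time between floats and MMX integer registers, whereas SSE2 converts four floats to four 32-bit integers in one instruction (`_mm_cvttps_epi32`, which truncates toward zero). The function name `cvt4_trunc` is hypothetical, and the scalar fallback is only there so the sketch also compiles on non-SSE2 targets:

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE1 ones */
#endif

/* Convert four floats to four 32-bit ints, truncating toward zero. */
void cvt4_trunc(const float in[4], int out[4])
{
    assert(in != 0 && out != 0);
#if defined(__SSE2__)
    __m128 v = _mm_loadu_ps(in);               /* load 4 floats (unaligned OK) */
    __m128i r = _mm_cvttps_epi32(v);           /* cvttps2dq: 4 lanes at once */
    _mm_storeu_si128((__m128i *)out, r);       /* store 4 x int32 */
#else
    for (int i = 0; i < 4; ++i)
        out[i] = (int)in[i];                   /* C casts also truncate */
#endif
}
```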

"Have you tried rendering the scene with complexer materials, for example
a reflecting bunny? I wonder what speed it would run at."
That demo shows what kind of raw ray/s budget you can get with coherent rays vs a kd-tree on current cpu; there's many ways to waste them :)

 
Alex Herz

January 05, 2005, 02:13 PM

Very nice shots. Hardware raytracing might be closer than you think;
here is real-time raytracing on FPGAs:

http://www.saarcor.de/

Alex

 
Lennox

January 05, 2005, 02:33 PM

Amazing once again. I for one can't wait till RTRT becomes a reality. This is one huge leap forward. Haven't been this impressed since the RealStorm demos.

P4 3ghz Hyper Threading
1G DDR Ram
Scores after startup
Lowest : 8.4
highest : 10.4
avg : 9.6

P4 1700 Mhz
512MB DDR Ram
avg : 5.2

 
Kurt Miller

January 05, 2005, 02:39 PM

Awesome job as always. Runs at about 8-9fps on my p4 2.8ghz.

 
jon100

January 05, 2005, 04:26 PM

This is really amazing, Jacco, congratulations!
I've read (and almost understood) all of your amazing raytracing tutorials, and I was waiting for a few more, so I'm sad to hear that you'll maybe only provide 2 extra ones... I guess this is taking a lot of time, and perhaps you'd be sad to see all your hard work taken by some company making money with it; I totally respect that.
I would've loved to read every single detail and explanation of the process that got you where you are today. It is common to see pretty pictures of the progress of some new raytracer, but extremely rare, if not unique, to read such clear and well-explained articles and have the satisfaction of understanding almost every single step.
But I guess you've already provided more than enough for anybody to start with great knowledge and follow their own path, trying to match the quality you're showing in this IOTD :)
Anyway, I hope you'll continue to blow our minds with some IOTDs as your raytracer progresses, and I can't wait to see what you're preparing next.

Congratulations and a huge thank you for the articles!

PS: Please Kurt, could you add a comment system (forum) to the articles so everybody can contribute and maybe help other users with that particular article? Comments are available for IOTDs and news, so why not for articles :(

 
Victor Widell

January 05, 2005, 04:34 PM

"what we do with a simple skybox today, would have to be done with huge transparent models of clouds"

Not true, as long as you stay on the ground... Actually, a nice HDR skybox would be perfect to light any outdoor scene (using global illumination).

 
bsh

January 05, 2005, 04:57 PM

Nice job. This runs under WINE on my Linux (1.8GHz, 640MB, 2.6 kernel) box at 5-6 f/s.

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Pentium(R) 4 CPU 1.80GHz
stepping : 2
cpu MHz : 1794.965
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3555.32

 
stoo

January 05, 2005, 05:21 PM

Aye, they still use sky domes and simple 2D particles in films all the time. There's always room for cheating :).

 
Ivan Kolev

January 05, 2005, 05:28 PM

Jacco, is the program multi-threaded? Should we expect a performance increase on dual-CPU machines? I have a dual Xeon 2.4 GHz with hyperthreading (4 virtual CPUs) and the average CPU usage is about 35%. FPS is about 7.

 
El Pinto Grande

January 05, 2005, 05:31 PM

I'll sneak in and answer that: no it's not multi-threaded as far as rendering is concerned.
But it would be quite easy to parallelize it a bit (like any raytracer), at least for primary rays.
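As a minimal sketch of that parallelization, assuming POSIX threads and a hypothetical per-pixel function `trace_pixel` standing in for the real tracer: because primary rays only depend on their pixel, each thread can render its own set of rows with no locking. Rows are interleaved between threads here, which is one simple way to balance the load when some image regions are more expensive than others:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for tracing one primary ray; a real tracer would
   walk the kd-tree here. Like a primary ray, it depends only on (x, y). */
static uint32_t trace_pixel(int x, int y)
{
    return (uint32_t)(x * 73856093u) ^ (uint32_t)(y * 19349663u);
}

typedef struct {
    uint32_t *fb;             /* shared frame buffer */
    int width, height;
    int first_row, row_step;  /* this thread renders first_row, first_row + step, ... */
} Job;

static void *render_rows(void *arg)
{
    Job *job = arg;
    for (int y = job->first_row; y < job->height; y += job->row_step)
        for (int x = 0; x < job->width; ++x)
            job->fb[y * job->width + x] = trace_pixel(x, y);
    return NULL;
}

/* Render with nthreads workers. Each thread writes a disjoint set of rows,
   so no synchronization is needed besides the final joins.
   Returns 0 on success, -1 on bad arguments or thread creation failure. */
int render_parallel(uint32_t *fb, int width, int height, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    Job jobs[MAX_THREADS];
    assert(fb != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    for (int i = 0; i < nthreads; ++i) {
        jobs[i] = (Job){ fb, width, height, i, nthreads };
        if (pthread_create(&tid[i], NULL, render_rows, &jobs[i]) != 0) {
            for (int j = 0; j < i; ++j) pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int i = 0; i < nthreads; ++i) pthread_join(tid[i], NULL);
    return 0;
}
```

The same split (independent image regions per worker) is what would carry over to the LAN idea discussed later in the thread, with network messages instead of threads.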

 
Nick

January 05, 2005, 05:31 PM

Jacco Bikker wrote: ...and I think a decent tutorial on sse is badly needed. There's virtually no real info on the web apart from the obligatory matrix * vector stuff.

I have been using http://tommesani.com/Docs.html for quite a while as a quick reference. In rare cases I just look at the Intel references to get the full story.

What kind of article do you intend to write on SSE? The instructions are pretty straightforward except for a few exceptions. :-)

 
Axel

January 05, 2005, 05:46 PM

I know, but imagine what they could do with a 200 million transistor chip at 400Mhz...

 
Nick

January 05, 2005, 05:48 PM

Awesome results. Jacco, what kind of performance improvements do you still expect to see? I mean, are the used algorithms and the code tuning already close to optimal, or haven't we even seen a fraction of the possibilities yet?

 
El Pinto Grande

January 05, 2005, 05:53 PM

Nick wrote: The instructions are pretty straightforward except for a few exceptions. :-)


They are straightforward until you try to make something useful out of them.
As always, the devil is in the details: e.g. how to deal with NaNs, branchless code/conditional moves, mixing float/integer vectors...

And anyone trying to use them will have to reinvent the wheel for that and for all the most basic C-lib-like functions: e.g. abs, negate... you name it.

And then how to be cache/mem efficient and then... ad vitam.

The map isn't the territory, and the doc you've pointed at doesn't address those problems at all :)

 
Jacco Bikker

January 05, 2005, 06:29 PM

Guys, thanks for the extremely kind words. To answer some questions:

Ray packets can be used for arbitrary rays, but the gain is small when the rays are not coherent. A packet is traced through the kd-tree as a whole, so every node that one ray visits is visited by all rays. For primary rays this is fine, especially at high resolutions. For shadow rays it's also fine. For reflections, rays start to diverge, and the gain drops. For the demo, I have implemented both a mono ray tracer and a packet ray tracer: the mono tracer kicks in when the primary rays didn't hit the same primitive, or when the rays can't be traced as a packet for other reasons (e.g., the signs of the directions are not the same).
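A hedged sketch of the two packet decisions described above, loosely following Wald-style packet traversal; the demo's actual code is not shown, and `packet_signs_agree` and `packet_classify` are hypothetical names. First, check that all four directions share their signs, so the near and far children of every node are ordered the same way for every ray. Then, at each inner node, skip a child only if no ray in the packet needs it:

```c
#include <assert.h>

typedef enum { VISIT_NEAR, VISIT_FAR, VISIT_BOTH } Visit;

/* Returns 1 if all four directions have the same sign on every axis,
   the precondition for tracing them as one packet; 0 otherwise. */
int packet_signs_agree(const float dx[4], const float dy[4], const float dz[4])
{
    const float *d[3] = { dx, dy, dz };
    for (int a = 0; a < 3; ++a) {
        int neg = d[a][0] < 0.0f;
        for (int i = 1; i < 4; ++i)
            if ((d[a][i] < 0.0f) != neg) return 0;
    }
    return 1;
}

/* Decision at an inner kd-node. Ray i is active on [tmin[i], tmax[i]] and
   reaches the split plane at tsplit[i]; "near" and "far" are chosen by the
   shared direction sign. The packet descends as a whole, so a child is
   skipped only if no ray needs it. (Inactive rays are not handled here.) */
Visit packet_classify(const float tmin[4], const float tmax[4], const float tsplit[4])
{
    int need_near = 0, need_far = 0;
    for (int i = 0; i < 4; ++i) {
        if (tsplit[i] > tmin[i]) need_near = 1;  /* interval starts before the plane */
        if (tsplit[i] < tmax[i]) need_far = 1;   /* interval ends beyond the plane */
    }
    assert(need_near || need_far || tmin[0] >= tmax[0]);
    if (need_near && need_far) return VISIT_BOTH;
    return need_far ? VISIT_FAR : VISIT_NEAR;
}
```

Because the whole packet enters a child as soon as one ray needs it, an incoherent packet ends up visiting the union of all nodes its rays touch, which is why the gain drops for divergent secondary rays.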

More complex materials: I'm working on reflections right now; refraction and texture mapping should follow. I took out as much as possible to ease the initial implementation of the proof-of-concept demo. Having both a mono and a packet tracer doesn't help, by the way; I have to code everything twice, usually in both a plain C and an SSE implementation. Give me some time. :)
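For reference, the standard mirror-reflection direction is r = d - 2(d·n)n for a unit normal n; whether the demo computes it exactly this way is not stated. The small sketch below, with a hypothetical `reflect` helper, also hints at why reflected packets diverge: two parallel rays hitting differently oriented normals can end up with direction components of opposite sign, which breaks the same-sign packet condition mentioned earlier in this post:

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Mirror reflection of direction d about the unit normal n:
   r = d - 2 (d . n) n. n must be normalized for this to be a reflection. */
Vec3 reflect(Vec3 d, Vec3 n)
{
    float k = 2.0f * (d.x * n.x + d.y * n.y + d.z * n.z);
    Vec3 r = { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
    return r;
}
```

For example, a ray going straight down, d = (0, -1, 0), reflected off normals (0.6, 0.8, 0) and (-0.6, 0.8, 0) produces x components of opposite sign, so those two rays could no longer share a packet.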

Articles: I'll try harder to write some things down. The above discussion is extremely encouraging.

Multi-threading: That would be easy to add, but I don't have a system to test it on. Rays are very much independent of each other, so using two processors is straightforward to implement. Adding some basic network code should allow processing on multiple machines connected by LAN. That's an option I'm definitely going to investigate, as a couple of machines would already push this into the realm of real-time graphics.

Room for improvement: I'm working with Pierre Berger-Perrin to compile the code using the latest (experimental) gcc compiler. The results so far look promising, Pierre beat the Intel compiler this afternoon and we are expecting improvements of about 10-15% over the current demo. There's always room for improvement in the code itself, I'm not yet using prefetching for example and I'm definitely not an sse king. :) But looking at the scores that OpenRT achieves I'm not too far from optimal so don't expect huge improvements.

More later. :)

 
Nick

January 05, 2005, 06:34 PM

El Pinto Grande wrote: As always, the devil is in the details: e.g. how to deal with NaNs, branchless code/conditional moves, mixing float/integer vectors...

That's not really SSE-specific.

El Pinto Grande wrote: And anyone trying to use them will have to reinvent the wheel for that and for all the most basic C-lib-like functions: e.g. abs, negate... you name it. And then how to be cache/mem efficient and then... ad vitam.

Ok, let's face it, this -is- assembly programming, so you're going to reinvent the wheel anyway. abs and negate are trivial for anyone who understands binary numbers. And cache efficiency is also something any programmer who wishes to optimize his code with assembly should already be aware of.
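To make the "trivial" part concrete, here is a sketch assuming IEEE-754 single-precision floats, where bit 31 is the sign: abs clears that bit and negate flips it. In SSE these are a single `andps` or `xorps` against a constant mask applied to all four lanes; the version below handles one lane with integer operations, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a uint32 and back. memcpy avoids
   strict-aliasing problems and is typically compiled to a plain move. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* abs: clear the sign bit (SSE equivalent: andps with 0x7FFFFFFF per lane). */
float abs_bits(float f) { return u2f(f2u(f) & 0x7FFFFFFFu); }

/* negate: flip the sign bit (SSE equivalent: xorps with 0x80000000 per lane). */
float neg_bits(float f) { return u2f(f2u(f) ^ 0x80000000u); }
```

Unlike `0.0f - x`, flipping the sign bit also turns +0.0 into -0.0, which is one of those small details that matters when results are compared bit for bit.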
El Pinto Grande wrote: The map isn't the territory, and the doc you've pointed at doesn't address those problems at all :)

I suggest writing a complete assembly programming tutorial then. It makes no sense to explain it to somebody who is already experienced at writing assembly code, and we can't start with SSE code for the people who've never written a single line of assembly code.

Just my humble opinion really... I wanted to write a tutorial about it myself a while ago, but I always got stuck in the first paragraph because it's really hard to make any assumptions about the reader's prior knowledge. Also, I found out that optimizations that work well in one situation make no difference at all in another, or even slow things down. I believe the reason there are so few SSE tutorials is that it's really hard to reach the right goals. On the other hand, anyone who really wants to learn it will reach his goals one way or another.

OK, allow me to illustrate that a little more. Suppose we have a tutorial about SSE. Then lots of people will just copy-paste the code and expect things to work. Unfortunately, using that code correctly will be nearly as complex as writing it yourself. For someone with limited or no assembly experience, it can produce nasty bugs or just not deliver the hoped-for performance increase. On the other hand, someone with adequate assembly knowledge who's serious about SSE optimizations won't learn much from such a tutorial, because all in all it's just a few tricky instructions.

Anyway, I'm open for any opinion about this. I certainly didn't mean to discourage anyone and I wish Jacco all the best if he writes an article like this! Maybe he can surprise me with that as well... ;-)

 
Nick

January 05, 2005, 06:59 PM

Jacco Bikker wrote: I have to code everything twice, usually in both a plain C and an SSE implementation. Give me some time. :)

SoftWire. ;-)

It is now possible to write things like "a = b * c - d" and let SoftWire generate optimized SSE code from that. So it's very close to writing C code. And you don't have to write everything twice. Just use if-else statements and this translates to dynamic conditional compilation. You can read more about it here: http://sw-shader.sourceforge.net/optimization.html. If you want to support any kind of shader scripts, dynamic compilation will also be the only way to go...
Jacco Bikker wrote: Room for improvement: I'm working with Pierre Berger-Perrin to compile the code using the latest (experimental) gcc compiler. The results so far look promising, Pierre beat the Intel compiler this afternoon and we are expecting improvements of about 10-15% over the current demo. There's always room for improvement in the code itself, I'm not yet using prefetching for example and I'm definitely not an sse king. :) But looking at the scores that OpenRT achieves I'm not too far from optimal so don't expect huge improvements.

That's a bit disappointing, although it's about what I expected...

It confirms the way I think about this technology. Raytracing isn't going to replace rasterization any time soon, at least not for the applications that are typically done with rasterization. Although it takes some effort (basically hacks) to implement things like visibility determination, shadows, reflections, LOD, etc. for a rasterizer, the results are still much faster than raytracing. The idea behind raytracing is elegant, but it doesn't really allow the use of 'hacks' to make complex operations any faster.

Anyway, I believe raytracing and rasterization will converge. Raytracing and variants are often used for radiosity effects, but such scenes are mostly rendered with a rasterizer. On the other hand raytracing can use some of the 'hacks' from rasterization like reflection maps to reach better performance.

 
Jacco Bikker

January 05, 2005, 07:12 PM

I agree that ray tracing probably isn't going to take over from rasterization. However, I think the only reason for this is that the transition would be too complex: you can't sell people RT cards without games, and you can't code a game without the cards.

The bunny demo uses *only* the processor. If I compare what a cpu can do in terms of rasterization, compared to what the latest cards do, and then apply that to ray tracing then things are more in the right perspective: If NVidia would produce a chip that beats my code 50 or 100 times in terms of performance, I have no doubt that the coolest shots would soon come from ray traced games instead of rasterized games. And not just that: Graphics engines would no longer be the core of the game. Ray tracing is too simple for that.

 
Nick

January 05, 2005, 08:29 PM

Jacco Bikker wrote: The bunny demo uses *only* the processor. If I compare what a cpu can do in terms of rasterization, compared to what the latest cards do, and then apply that to ray tracing then things are more in the right perspective: If NVidia would produce a chip that beats my code 50 or 100 times in terms of performance...

I'm no hardware expert but I'm afraid it's not that simple. There are still hard limits that are not easy to solve. First we have the memory bandwidth limitation. Raytracing also tends to have non-linear memory accesses, so caches and prediction logic would take up a huge fraction of the chip space (very much like existing CPUs really). The lighting calculations per pixel (or per ray) will also be very comparable to what current GPUs are capable of.
Jacco Bikker wrote: ...I have no doubt that the coolest shots would soon come from ray traced games instead of rasterized games.

There's little that a pure raytracer can do that can't be done convincingly with a rasterizer. Shadows can be done with shadow maps (technically very close to raytracing actually), reflections with reflection maps, per-pixel lighting with shaders, etc. Raytracing only becomes really special when rays are split into a dozen new rays for every hit. But we know that makes things extremely slow.
Jacco Bikker wrote: And not just that: Graphics engines would no longer be the core of the game. Ray tracing is too simple for that.

Rasterization was also simple for flat shading...

I don't think graphics engines will just disappear. Think about 3D Studio MAX's raytracing: do you think it's 'too simple'? One huge problem with raytracing is that dynamic geometry doesn't work well with the optimizations. A kd-tree works extremely well, as you've proven very nicely, but it only works for static geometry.

No bad things about your achievements! I'd just like to point out that I believe raytracing will only be best for specific applications. At best raytracing and rasterization techniques converge to combine the best of both...

 
zed zeek

January 05, 2005, 11:32 PM

nice one,
from 8.5 - 10.0 fps on a 2.0 GHz Athlon64.
I didn't see anyone mention this, but I do see the occasional missed pixel (i.e. a single black pixel on the bunny), appearing in various places on the bunny.

 
Søren Madsen

January 06, 2005, 01:17 AM

Good job!

Your post about the experimental GCC compiler stirred my curiosity. Did you say "beat the Intel compiler"? Amazing! Could you provide more details on your performance results using the GCC compiler?

 
This thread contains 64 messages.
 
 