Christian: There are several factors that influence the number of rays per second. First of all, rays per second is the number of rays cast (primary & secondary), divided by the rendering time, so adding the bloom filter impacts this figure (quite severely; I added the filter two days ago and it's far from optimal, takes about 0.1s for a 512x384 image).
Other factors:
- The depth of the kd-tree;
- The size of the triangles (smaller triangles mean that I can use the packet traversal code for fewer pixels, and the mono ray tracer runs at only half the speed);
- Edge anti-aliasing: These rays typically can't use the packet tracer, since a packet will almost always hit more than one triangle;
- Texturing: The bilinear filter is applied to 4 rays at once if possible, but it's still slower than not texturing, obviously;
- Reflections: These rays have a more expensive setup, and also tend to diverge more rapidly than primary rays;
- Screen coverage: The lego car doesn't cover the entire screen, which means that many rays don't need shading / texturing at all. Besides, these rays travel through less dense areas, so they hit fewer tree nodes (and no triangles, obviously).
There's more: The pillars of the cloister create multiple complex areas. Rays that miss a pillar by a few pixels still travel through complex areas of the kd-tree without hitting anything.
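To make the packet / mono point concrete, here is a minimal sketch of the kind of test that decides whether four rays can stay together. It assumes a packet of 4 rays and a per-ray triangle ID with -1 for a miss; `PacketHits` and `packetIsCoherent` are names made up for this illustration, not the actual code.

```cpp
#include <array>

// Hypothetical: ID of the triangle hit by each of the 4 rays in a packet,
// or -1 when the ray hits nothing.
using PacketHits = std::array<int, 4>;

// True when all four rays agree on the same triangle (or all miss).
// Only then can the packet path be kept; otherwise fall back to mono rays.
inline bool packetIsCoherent(const PacketHits& hit) {
    return hit[0] == hit[1] && hit[1] == hit[2] && hit[2] == hit[3];
}
```

With small triangles or along edges, the four IDs differ far more often, so more pixels end up on the slower mono path.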
Summed, this results in the above speed difference. Basically, the car is the optimal case, the cloister is pretty much worst case.
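For reference, the rays-per-second figure as defined above can be sketched like this. The struct and field names are hypothetical, and splitting the time into a tracing part and a post-processing part (such as the bloom filter) is only an assumption to show why the filter lowers the number.

```cpp
#include <cstdint>

// Hypothetical per-frame counters; names are illustrative only.
struct FrameStats {
    std::uint64_t primaryRays;    // rays cast from the camera
    std::uint64_t secondaryRays;  // reflection, shadow, anti-aliasing rays
    double traceSeconds;          // time spent tracing and shading
    double postSeconds;           // time spent in post filters (e.g. bloom)
};

// Rays cast (primary + secondary) divided by the total rendering time.
// Returns 0 when no time was measured, to avoid dividing by zero.
inline double raysPerSecond(const FrameStats& s) {
    const double total = s.traceSeconds + s.postSeconds;
    if (total <= 0.0) return 0.0;
    return static_cast<double>(s.primaryRays + s.secondaryRays) / total;
}
```

Since the filter adds time but no rays, the same frame reports fewer rays per second with the filter enabled.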
Lycium: I'm not using sse3, just sse2. Frankly I don't even know what sse3 adds, I'll investigate. BTW why do you use double precision? So far I've never encountered any problems that boiled down to precision. And another question: How fast are you? :)
Rover: Most of the improvements are just plain optimizations of the existing code: Const correctness, proper inlining, hot and cold data, cache alignment, reordering code and so on. There are some improvements to the kd-tree construction code: Empty space cut-off, adjusting the split plane by + or - epsilon to fix accuracy problems, and some smaller tweaks. The kd-tree tweaks resulted in about 10% improvement, a long list of tweaks to the code resulted in another 30-40%.
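As an illustration of the "hot and cold data" and "cache alignment" items, here is a rough sketch. It assumes 64-byte cache lines, and both structs and all their fields are invented for the example; the real layout is different.

```cpp
#include <cstdint>

// Hypothetical cold data: only needed once a hit has been confirmed.
struct TriangleShading {
    float normals[3][3];
    float uvs[3][2];
    std::int32_t materialId;
};

// Hypothetical hot data: read for every intersection test, so it is kept
// small and aligned to an assumed 64-byte cache line.
struct alignas(64) TriangleHot {
    float v0[3];                 // one vertex
    float edge1[3];              // v1 - v0
    float edge2[3];              // v2 - v0
    std::uint32_t shadingIndex;  // index into a separate TriangleShading array
};

static_assert(sizeof(TriangleHot) == 64, "hot data should fill exactly one line");
```

The idea is that traversal only pulls the hot struct into the cache, and the shading data is fetched through `shadingIndex` for the rays that actually hit.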
MightyMouse: This is unlikely. There's a huge gap between the theory of article 7 and what I have now. I'll make it up in a different way. Wait and see.