

Submitted by Jari Komppa, posted on November 02, 2004




Image Description, by Jari Komppa



Yep, it's that time of year again. Text mode demo competition 7. Deadline 12.12.04. More info on the contest (such as rules, etc) can be found at http://tAAt.fi/tmdc/

The invitation demo uses FMOD for audio, LibCaCa for the console 'graphics' output, and CFL3 to contain the data files.

Because I've been even busier than usual (having started studying in the evenings), the demo was made in a hurry (basically 2-3 days), including improvements to my textmode rendering engine. Most of the improvements were prompted by LibCaCa's output quality, which is surprisingly good.

All 3d in the demo is rendered with a gouraud filler and a blur filter; the scenes have surprisingly high polygon counts =). As a neat technical detail, the blur filter uses a 2-pass "any sized box blur filter" trick - the blur effect takes exactly as much cpu time regardless of the blurring strength.

Cheers,
Sol


 
Message Center / Reader Comments:
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Daniel Talagrand

November 02, 2004, 08:49 AM

One of the best textmode demos out there, with some of the most impressive rendering quality I've seen. Still, the tmdc5 invitation demo remains my favorite (mainly due to the great blend with Teque's music).

Anyways, who knows, I might even try participating this year for a change.

On a side note, is LibCaCa an evolution of Sol's TextFX console graphics library?

 
Faelenor

November 02, 2004, 09:26 AM

Nice demo!
That's impressive to see what can be done in text mode!

And libcaca... In French it means libshit ;) Hehe!

 
Moe

November 02, 2004, 12:59 PM

Sometimes I realize I have no clue about all the beautiful things that can be done. Nice!

 
Jari Komppa

November 02, 2004, 01:05 PM

LibCaCa is not my doing. The project's page can be found at:

http://sam.zoy.org/projects/libcaca/

Some have asked why I used LibCaCa although I already have my own library that does something similar.. well.. first off, I wanted to try something new for a change, and second, libcaca is simply better =) Almost too good, to be exact..

Anyway, I had to make some changes to the released version to make it usable for this purpose; the changes are included with the invitation.

 
Faelenor

November 02, 2004, 09:05 PM

I looked at libcaca's homepage and it seems that the author is a French guy! So the name was intended to mean that. A more exact translation would be:
libexcrement... Weird!

 
Jari Komppa

November 02, 2004, 11:33 PM

More weird than making a textmode graphics library? =)

 
Per Vognsen

November 02, 2004, 11:42 PM

As a neat technical detail, the blur filter uses a 2-pass "any sized box blur filter" trick - the blur effect takes exactly as much cpu time regardless of the blurring strength.


This got me thinking. You do the filtering in two passes as you say: the first pass blurs across rows, the second pass blurs across columns. This is pretty standard stuff so far. The trick is how to support large blur radii at essentially no cost.

You could keep a "running sum" and update it each iteration. So let's say you are blurring a pixel in a row. You have the sum of the 2N + 1 pixels around the pixel in that row, where N is the radius. Given this sum, you blur the pixel by just dividing the sum by 2N + 1. At each iteration, you update the running sum by subtracting off the pixel that's leaving the "window" at the left and adding in the pixel that's entering the "window" at the right.

You lose some accuracy due to round-off, but it likely won't be too noticeable. The performance of the algorithm isn't entirely radius-independent, since to blur the first pixel of each row/column you need to get the running sum initialized. This will require anywhere between N + 1 and 2N + 1 steps, depending exactly on how you handle filtering at the edges (clamping to zero, mirroring or wrapping, for instance). At any rate, this will be negligible as long as the radius is relatively small compared to the size of the image.

Is this close at all to how you do it? Or did you come up with something much more clever?
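
A minimal sketch in C of the running-sum idea described above (illustrative code only, not Per's or Jari's actual implementation; here the window is clamped to the row edges, so the divisor shrinks there):

  /* One horizontal pass of a running-sum box blur of radius r over a
     row of w 8-bit pixels. The window [x-r, x+r] is clamped to the row,
     so the divisor shrinks near the edges. */
  void box_blur_row(const unsigned char *src, unsigned char *dst, int w, int r)
  {
      int sum = 0, x;

      /* Initialize the window sum for x = 0: pixels 0..r, clamped. */
      for (x = 0; x <= r && x < w; x++)
          sum += src[x];

      for (x = 0; x < w; x++) {
          int lo = x - r < 0 ? 0 : x - r;
          int hi = x + r > w - 1 ? w - 1 : x + r;
          dst[x] = (unsigned char)(sum / (hi - lo + 1));

          /* Slide the window right: drop the pixel leaving on the left,
             add the pixel entering on the right. */
          if (x - r >= 0)    sum -= src[x - r];
          if (x + r + 1 < w) sum += src[x + r + 1];
      }
  }

The vertical pass is the same loop walked down each column; run both and you have the radius-independent blur.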

 
Jari Komppa

November 05, 2004, 02:11 PM

I had hoped someone would pick that up. I have to admit that I did not come up with it (Lance / Aggression pointed it out to me, but I assume this is an old trick), but it's pretty neat for software rendering nevertheless.

Anyway, let's assume we have a silly situation with only values of 1 in the framebuffer, as follows..

  1  1  1 ...
  1  1  1 ...
  1  1  1 ...
  ...


In the first pass, we sum up the values along horizontal and vertical lines (yes, this is done for R, G and B separately, and yes, we need more than 8 bits per component):

  1  2  3 ...
  2  4  6 ...
  3  6  9 ...
  ...


Note that for each cell we basically have val = val + val_left + val_up - val_upleft (the last term corrects for the area that val_left and val_up both count; equivalently, sum along each row first and then along each column).

In the second pass, we use the above values to calculate the blur. To do this, we calculate, for each pixel:

  /* y1 and y2 are row offsets, i.e. already multiplied by the pitch */
  value  = buf[y2 + x2] - buf[y1 + x2] - buf[y2 + x1] + buf[y1 + x1];
  value /= (x2 - x1) * (y2 - y1);


The reason we add and subtract values the way we do is easiest to see in this situation:

  aaabbb...
  aaabbb...
  aaabbb...
  cccddd...
  cccded...
  cccddd...
  .........
  .........

where the whole area is the framebuffer, 'e' is the pixel we're calculating, and 'd' (plus 'e') is the area we want to average. First we take the value at y2,x2, which is actually the sum of all pixels marked with a, b, c, d or e in our picture. Next, we subtract y1,x2 from it, in practice subtracting the sum of everything marked with a or b. Next, subtract y2,x1, which maps to a and c. Last, we add y1,x1 back, because we subtracted the 'a' area twice.

Simple. =) After this, we just pick the grid size and get different amounts of blur with the same operations.. of course cache misses etc. affect the actual speed, but the cpu cost is basically the same.

One catch is what to do at the borders. I ended up making the filter kernel smaller at the edges, so they (in theory) get blurred less. It's not noticeable, at least in text mode. Note that the divisor changes too.
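
Putting the pieces together, here is a minimal single-channel sketch of the summed-area approach described above (illustrative names and layout, not the demo's actual code; borders are clamped as described):

  #include <stdint.h>

  /* Build a summed-area table: sat[y*w+x] = sum of src over the
     rectangle (0,0)..(x,y) inclusive. Needs wider integers than
     the source channel. */
  void build_sat(const unsigned char *src, uint32_t *sat, int w, int h)
  {
      for (int y = 0; y < h; y++)
          for (int x = 0; x < w; x++) {
              uint32_t v = src[y * w + x];
              if (x > 0)          v += sat[y * w + x - 1];
              if (y > 0)          v += sat[(y - 1) * w + x];
              if (x > 0 && y > 0) v -= sat[(y - 1) * w + x - 1];
              sat[y * w + x] = v;
          }
  }

  /* Box blur of radius r via four table lookups per pixel; the cost
     is the same no matter how large r is. The rectangle (and the
     divisor) shrink at the borders. */
  void sat_blur(const unsigned char *src, unsigned char *dst,
                uint32_t *sat, int w, int h, int r)
  {
      build_sat(src, sat, w, h);
      for (int y = 0; y < h; y++)
          for (int x = 0; x < w; x++) {
              int x1 = x - r < 0 ? 0 : x - r;
              int y1 = y - r < 0 ? 0 : y - r;
              int x2 = x + r > w - 1 ? w - 1 : x + r;
              int y2 = y + r > h - 1 ? h - 1 : y + r;
              uint32_t sum = sat[y2 * w + x2];
              if (x1 > 0)           sum -= sat[y2 * w + x1 - 1];
              if (y1 > 0)           sum -= sat[(y1 - 1) * w + x2];
              if (x1 > 0 && y1 > 0) sum += sat[(y1 - 1) * w + x1 - 1];
              dst[y * w + x] = (unsigned char)
                  (sum / (uint32_t)((x2 - x1 + 1) * (y2 - y1 + 1)));
          }
  }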

Too bad this trick doesn't work too well with hardware rendering..

 
Luke Hodorowicz

November 05, 2004, 02:48 PM

Jari: Haven't tried it myself, but supposedly it can be done in hardware...

http://www.opengl.org/resources/tutorials/gdc2003/GDC03_SummedAreaTables.ppt

-Luke

 
Per Vognsen

November 05, 2004, 03:16 PM

Jari Komppa wrote: Simple. =) After this, we just pick the grid size and get different amounts of blur with the same operations..


Pretty cool, so we came up with pretty different approaches! I actually suspect my algorithm will be somewhat faster due to separating the filtering into two passes where each pass reads exclusively from either rows or columns. This makes prefetching almost trivial since you only need to worry about one "span" at a time.

 
Jari Komppa

November 05, 2004, 03:31 PM

Thanks for the link. If nothing else, it does prove that it's an old trick - check the references =)

 
Fabian 'ryg' Giesen

November 06, 2004, 09:33 AM

Some notes on that trick (I've been using it for blurring in all my texture generators for quite some time now ;):

1. I find the variant Per mentioned generally nicer in practice, because it needs less memory (O(N) vs. O(N^2)) and works with fewer bits in your accumulators. The latter may seem irrelevant at first, but it's not; I use 16 bits per color channel for texture generation (a good idea if you plan to generate normal and displacement maps), so with a 32-bit accumulator you can get overflow after more than 65536 pixels, or a 256x256 bitmap - oops. Plus the smaller memory requirements make it feasible to implement things in a way that basically everything happens in the cache, which is a nice speed boost.

2. The horizontal blur step is usually very fast. The vertical one tends to be *much* slower, because you're reading one pixel from line 0, then one pixel from line 1, etc., which is a cache nightmare. One solution is to do the vertical pass in "stripes" more than one pixel wide - how much more depends on the number of bytes in a pixel and in a cache line. This improves the speed notably.

3. Fast blurs are nice, but being limited to a box filter isn't. Fortunately, you aren't: you can blur multiple times in succession. When you do one pass, you're essentially doing a convolution with a box filter kernel. Now, convolution is associative, so "((((signal * box1) * box2) * ...) * boxN)" is the same as "signal * (box1 * box2 * ... * boxN)", with * indicating convolution. As some experimenting will convince you (I'll spare you the proofs :), a box filter kernel applied to a box filter kernel yields a triangle kernel, and another application gives a quadratic b-spline, which looks pretty close to a gaussian (further applications result in b-spline basis functions of higher degree, which are still closer to a gaussian; again, I'll skip the details here). It has the advantage that it actually drops off to zero after a while, unlike a true gaussian, which has infinite support, so in practice this is nicer to use than the "real thing", because you don't need to do any windowing.
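
To see point 3 numerically, here's a tiny stand-alone C program (purely illustrative) that convolves a normalized box kernel with itself: once gives a triangle, twice gives the bell-shaped quadratic b-spline:

  #include <stdio.h>

  /* Discrete convolution of kernels a (length la) and b (length lb)
     into out (length la + lb - 1). */
  int convolve(const double *a, int la, const double *b, int lb, double *out)
  {
      int len = la + lb - 1;
      for (int i = 0; i < len; i++) out[i] = 0.0;
      for (int i = 0; i < la; i++)
          for (int j = 0; j < lb; j++)
              out[i + j] += a[i] * b[j];
      return len;
  }

  int main(void)
  {
      double box[5] = { 0.2, 0.2, 0.2, 0.2, 0.2 }; /* width-5 box, sums to 1 */
      double tri[16], bsp[16];

      int lt = convolve(box, 5, box, 5, tri);  /* box * box -> triangle */
      int lq = convolve(tri, lt, box, 5, bsp); /* once more -> quadratic b-spline */

      for (int i = 0; i < lt; i++) printf("%.3f ", tri[i]); /* ramps up, peaks, ramps down */
      printf("\n");
      for (int i = 0; i < lq; i++) printf("%.3f ", bsp[i]); /* bell-shaped, finite support */
      printf("\n");
      return 0;
  }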

4. The last point leads to a relatively easy implementation of a very fast "gaussian" blur (some experiments I've conducted led me to the conclusion that Photoshop also uses the 3-repeated-box-filters approximation, so this is the same "gaussian" as in Photoshop's "gaussian blur"), if you take care of one detail: don't just do a "full" blur 3 times in succession; do the 3 passes directly for every row/column you process, while it's still in cache!
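
A sketch of that structure (my own illustration, not ryg's code, reusing the box_blur_row helper from the running-sum sketch earlier in the thread):

  #include <string.h>

  void box_blur_row(const unsigned char *src, unsigned char *dst, int w, int r);

  /* Blur each row three times while it is still hot in the cache,
     approximating a gaussian, before moving on to the next row. */
  void gaussian_ish_rows(unsigned char *img, int w, int h, int r)
  {
      unsigned char tmp[4096]; /* scratch; assumes w <= 4096 */
      for (int y = 0; y < h; y++) {
          unsigned char *row = img + y * w;
          box_blur_row(row, tmp, w, r); /* box      */
          box_blur_row(tmp, row, w, r); /* triangle */
          box_blur_row(row, tmp, w, r); /* quadratic b-spline */
          memcpy(row, tmp, (size_t)w);
      }
  }

The column pass is the same idea run on stripes (or on the transposed image, as noted below).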

An interesting phenomenon I've encountered is that replacing the vertical pass altogether by transposing the image, then doing a horizontal pass, then transposing again, was faster than the separate vertical pass in "stripes". Obviously, transposing is quite memory bandwidth-heavy, so you'll only want to do it twice, and not for all 3 passes! Still, I find this quite interesting - but it's nice news, since it means I only need code for horizontal blurs together with a very short loop that transposes the image - less work than having two separate steps with different code.

---

For hardware, you cannot implement this technique directly. However, for box filters of width N, you can still do better than summing N weighted copies of the image. Assume for a moment that N is a power of 2.

In the first pass, compute the average of each pixel and the pixel just to the right.

Then, do a second pass, with the results from the first pass as input; compute the average of each pixel and the pixel *2* steps to the right. The first pixel contains the value (intensity[x+0]+intensity[x+1])/2, the second contains (intensity[x+2]+intensity[x+3])/2, so you get (intensity[x+0]+intensity[x+1]+intensity[x+2]+intensity[x+3])/4.

Repeat as needed. This takes exactly log2(N) steps for a width N horizontal blur, if N is a power of two; extending that method to one that computes the exact result of blurring with a radius N box kernel for non-power-of-2 N is quite hard to do with hardware rendering, but in practice I've found you don't need to; just offset the texture coordinates smoothly and make use of bilinear filtering. You won't get any visible popping and it will still look blurred - who cares that it's not quite right. Also, the method I've used for explanation shifts the image to the right by N/2 pixels; "centering" is again just a matter of adjusting the texture coordinates appropriately, which again will not be quite correct if they become fractional in the process, and which I'll again ignore :)
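
The doubling scheme is easiest to see on the CPU; here's a small stand-alone illustration (my own sketch, not shader code) that feeds an impulse through three ping-pong passes and prints the effective width-8 box kernel:

  #include <stdio.h>

  #define W 16

  int main(void)
  {
      float a[W] = { 0 }, b[W];
      float *src = a, *dst = b;
      a[8] = 1.0f; /* an impulse, so the output shows the kernel */

      /* Pass k averages pixel x with pixel x + 2^k to the right;
         after 3 passes every pixel is the average of 8 inputs.
         Out-of-range taps are clamped here; on the GPU this would
         be the texture addressing mode. */
      for (int k = 0; k < 3; k++) {
          int step = 1 << k;
          for (int x = 0; x < W; x++) {
              int x2 = x + step < W ? x + step : W - 1;
              dst[x] = 0.5f * (src[x] + src[x2]);
          }
          float *t = src; src = dst; dst = t;
      }

      for (int x = 0; x < W; x++)
          printf("%.4f ", src[x]); /* eight taps of 0.1250, off-center as noted */
      printf("\n");
      return 0;
  }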

To do a real 2D kernel requires another log2(N) passes in the vertical direction. This can add up to quite a lot of passes. Another not-quite-right-but-OK approximation is to sum 4 copies of the image in each step - one offset to the top-left, one to the top-right, one to the bottom-left, and one to the bottom-right. This doesn't really result in a proper 2D box filter at all, but it still looks blurred, so we'll happily take the reduced number of passes and decide to completely forget about proper filtering :).

In the first pass, you can normally get some more "bang for the buck" by actively using bilinear filtering to weight more pixels in. For bloom, doing one or two such low-radius passes afterwards does a nice job of removing the very linear gradients that box filters tend to produce. Obviously, this only pays off if you're doing a box filter with a relatively large width in the first place, which is probably not a good idea anyway; but still, this makes for a cool demo effect :)

 
Per Vognsen

November 06, 2004, 12:47 PM

Fabian 'ryg' Giesen wrote: 3. Fast blurs are nice, but being limited to a box filter isn't. Fortunately, you aren't: you can blur multiple times in succession. When you do one pass, you're essentially doing a convolution with a box filter kernel. Now, convolution is associative, so "((((signal * box1) * box2) * ...) * boxN)" is the same as "signal * (box1 * box2 * ... * boxN)", with * indicating convolution. As some experimenting will convince you (I'll spare you the proofs :), a box filter kernel applied to a box filter kernel yields a triangle kernel, and another application gives a quadratic b-spline, which looks pretty close to a gaussian (further applications result in b-spline basis functions of higher degree, which are still closer to a gaussian; again, I'll skip the details here). It has the advantage that it actually drops off to zero after a while, unlike a true gaussian, which has infinite support, so in practice this is nicer to use than the "real thing", because you don't need to do any windowing.


This is very, very cool and I feel stupid for not thinking about it. If you ponder this result for a second, you come to realize that what you are describing is basically a version of the central limit theorem from probability theory: if you have a collection of independent uniformly-distributed random variables X_1, X_2, ..., then the average (1/N)(X_1 + X_2 + ... + X_N) has a normal distribution in the limit N -> infinity. The probability density function (PDF) of the normal distribution is the gaussian, and the PDF of each X_i is the box function. To compute the PDF of X_1 + X_2 you convolve the PDFs of X_1 and X_2. This makes sense because to calculate the probability of getting a certain sum, you have to consider all the possible ways of getting that sum from X_1 and X_2, so you end up with a convolution of the two densities. Anyway, thanks a lot for mentioning that trick!
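
In cleaner notation, the standard identity behind this (stated here for reference):

  f_{X_1+X_2}(s) = \int_{-\infty}^{\infty} f_{X_1}(t)\, f_{X_2}(s-t)\, dt,
  \qquad
  f_{X_1+\cdots+X_N} = f_{X_1} * f_{X_2} * \cdots * f_{X_N}

so summing N independent box-distributed variables convolves N box kernels, and the central limit theorem says the (rescaled) result tends to the gaussian density.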

 
Fabian 'ryg' Giesen

November 06, 2004, 01:59 PM

Yep, that is the easiest way to prove it :)

 
This thread contains 14 messages.
 
 