

Submitted by Maxim Stepin, posted on April 18, 2002




Image Description, by Maxim Stepin



One day I realized that implementing texture magnification with a simple bilinear filter is just not good enough. Very often, a texture contains several color-uniform areas with apparent borders between them, and these borders get blurred along with the rest of the texture. Just start any modern FPS game, move a character very close to a wall with some sign on it, and if the texture resolution is not high enough, you'll experience a "bad eyes" effect: everything looks too blurry, and you don't see any sharp borders anymore.

In order to solve this problem, I developed my own "smart" texture filter. It keeps the borders between color-uniform areas sharp regardless of the texture magnification level and, at the same time, keeps the interiors of those areas as smooth as a bilinear filter does.

The idea is to use an independent interpolation function for each group of four adjacent texels. I developed a set of such functions; most of them describe how border(s) intersect the interval between these texels' centers. To determine which function to use for each group of four texels, some additional information needs to be stored along with the texture. In my demo, I used 8 additional bits per texel for that purpose. Because these interpolation functions describe the texture at a sub-texel level, at the preprocessing (analysis) stage I had to use a much bigger version of the same texture. For example, for this demo I analyzed a 2048x1024 image to get the final 512x256 texture with the correct interpolation function information.
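A minimal C sketch of the storage layout described above, assuming a big 0x00RRGGBB source image whose sides are multiples of 4 (so 2048x1024 becomes 512x256). Averaging each 4x4 block into the small texel's color is an assumption, and the actual analysis that picks the interpolation function is not described in the post, so `classify_block_hypothetical` is a hypothetical stub that always returns code 0 (bilinear). The 8-bit code is packed into the top byte, matching the alpha-channel storage mentioned later in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the undescribed analysis step: it would inspect
   the 4x4 block of the big image and choose an interpolation function.
   Here it always returns 0, i.e. plain bilinear. */
static uint8_t classify_block_hypothetical(const uint32_t *big, int big_w,
                                           int bx, int by)
{
    (void)big; (void)big_w; (void)bx; (void)by;
    return 0;
}

/* Reduce a 0x00RRGGBB image by 4 in each direction. Each output texel keeps
   the average color of its 4x4 source block in the low 24 bits and the
   8-bit function code in the top (alpha) byte. */
static void reduce_by_4(const uint32_t *big, int big_w, int big_h, uint32_t *out)
{
    assert(big_w % 4 == 0 && big_h % 4 == 0);
    int out_w = big_w / 4, out_h = big_h / 4;
    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            unsigned r = 0, g = 0, b = 0;
            for (int j = 0; j < 4; j++) {
                for (int i = 0; i < 4; i++) {
                    uint32_t p = big[(4 * y + j) * big_w + (4 * x + i)];
                    r += (p >> 16) & 0xFF;
                    g += (p >> 8) & 0xFF;
                    b += p & 0xFF;
                }
            }
            uint8_t code = classify_block_hypothetical(big, big_w, 4 * x, 4 * y);
            out[y * out_w + x] = ((uint32_t)code << 24) | ((r / 16) << 16)
                               | ((g / 16) << 8) | (b / 16);
        }
    }
}
```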

You can find the demo and full source code at www.HiEnd3D.com/demos.html. I used software rendering for obvious reasons, so please don't expect very high speed. The artwork is inspired by M.C. Escher's drawings.

Feel free to ask any questions. Comments and suggestions are very welcome.

Maxim Stepin.


Image of the Day Gallery
www.flipcode.com

 
Reader Comments:
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
savage

April 19, 2002, 05:48 AM

The demo looks excellent!
I also get a crash when the demo exits.
I am using Win2000 with ATI M4.

Is there any chance of you releasing a non-assembler version of your source code?


BTW, M.C. Escher is my favorite artist. Well ahead of his time.


 
rakta

April 19, 2002, 06:10 AM

excellent work on the code!

But I'm curious whether a better solution to this problem is the introduction of vector-based art used as textures. The marriage of pixelized imagery and vector-based 3D geometry works beautifully, but issues like these show its weaknesses.

I wonder if our graphic artists will one day be doing their work in Shockwave Flash or something :)

 
punkonion

April 19, 2002, 06:45 AM

To fant,

check out the alpha channel of the TGA, and BEHOLD! No fake.

 
MaxSt

April 19, 2002, 06:57 AM

You know, that's exactly how I implemented it!


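.DATA
; Jump table with 256 entries, one per value of the 8-bit per-texel code.
; Codes 0..223 select Type0..Type13 in runs of 16; codes 224..255 use Type0.
InterpFuncTable label dword
dd 16 DUP (offset Type0)    ; codes   0..15
dd 16 DUP (offset Type1)    ; codes  16..31
dd 16 DUP (offset Type2)    ; codes  32..47
dd 16 DUP (offset Type3)    ; codes  48..63
dd 16 DUP (offset Type4)    ; codes  64..79
dd 16 DUP (offset Type5)    ; codes  80..95
dd 16 DUP (offset Type6)    ; codes  96..111
dd 16 DUP (offset Type7)    ; codes 112..127
dd 16 DUP (offset Type8)    ; codes 128..143
dd 16 DUP (offset Type9)    ; codes 144..159
dd 16 DUP (offset Type10)   ; codes 160..175
dd 16 DUP (offset Type11)   ; codes 176..191
dd 16 DUP (offset Type12)   ; codes 192..207
dd 16 DUP (offset Type13)   ; codes 208..223
dd 32 DUP (offset Type0)    ; codes 224..255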





Type0 is bilinear. The other 13 are border-like.
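For readers more at home in C, here is a sketch of the same dispatch: a 256-entry table of function pointers with the same layout as InterpFuncTable. Only Type0 (bilinear) is known from the thread; the 13 border-like functions are not described, so `border_placeholder` is a hypothetical nearest-texel stand-in, not the author's code. The 8-bit fractions du, dv (0..255) are also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* quad = {c0 top-left, c1 top-right, c2 bottom-left, c3 bottom-right},
   texels are 0x00RRGGBB, du and dv are 8-bit fractions in 0..255 (assumed). */
typedef uint32_t (*interp_fn)(const uint32_t quad[4], unsigned du, unsigned dv);

/* Type0: plain bilinear, computed per 8-bit channel. */
static uint32_t type0_bilinear(const uint32_t q[4], unsigned du, unsigned dv)
{
    uint32_t out = 0;
    for (int shift = 0; shift <= 16; shift += 8) {
        unsigned c0 = (q[0] >> shift) & 0xFF, c1 = (q[1] >> shift) & 0xFF;
        unsigned c2 = (q[2] >> shift) & 0xFF, c3 = (q[3] >> shift) & 0xFF;
        unsigned top = c0 * (256 - du) + c1 * du;   /* scaled by 256 */
        unsigned bot = c2 * (256 - du) + c3 * du;   /* scaled by 256 */
        unsigned c = (top * (256 - dv) + bot * dv) >> 16;
        out |= (uint32_t)c << shift;
    }
    return out;
}

/* Hypothetical stand-in for Type1..Type13, whose real border-aware formulas
   are not given in this thread: it simply returns the nearest texel. */
static uint32_t border_placeholder(const uint32_t q[4], unsigned du, unsigned dv)
{
    return q[(dv >= 128 ? 2 : 0) + (du >= 128 ? 1 : 0)];
}

static uint8_t type_of_code[256];
static interp_fn interp_table[256];

/* Same layout as InterpFuncTable: runs of 16 codes for Type0..Type13,
   then codes 224..255 fall back to Type0. */
static void build_interp_table(void)
{
    for (int code = 0; code < 256; code++) {
        type_of_code[code] = (uint8_t)(code < 224 ? code / 16 : 0);
        interp_table[code] = type_of_code[code] == 0 ? type0_bilinear
                                                     : border_placeholder;
    }
}

/* Per-pixel dispatch: one indirect call through the table. */
static uint32_t sample_quad(uint8_t code, const uint32_t q[4],
                            unsigned du, unsigned dv)
{
    return interp_table[code](q, du, dv);
}
```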

 
MaxSt

April 19, 2002, 07:08 AM

I tried the Shortcut-S demo yesterday - looks very, very nice.
Great analyzing method.

 
ttthorstennn

April 19, 2002, 08:23 AM

You should sue him anyway! That seems to be the standard company policy of today...

 
Ready4Dis

April 19, 2002, 08:36 AM

Awesome texturing program, but one gripe: I can't break 100 fps. It appears to lock in at 99.9 fps while zoomed out a bit, runs at around 75-80 fps under normal circumstances, and drops to around 40 fps up close!

Oh, and it also crashes on exit (like others with the same OS as mine) ...

Athlon XP 1600
GeForce2 MX
128 megs DDR
Windows XP Professional

Billy - BillyB@mrsnj.com

 
Nick

April 19, 2002, 09:40 AM

Great demo, but I liked the old one from '99 with texture dithering better ;) Could you add it to your site please?

Some tips to make it faster: Don't use fixed point for your matrix and vertex math! MulDiv has a big call overhead and doesn't get inlined. Don't transform your normals; do the lighting in object space. I think you used some kind of phong lighting with a 1MB lookup table? Don't do this, it's very bad for the cache and it's a lot faster to just calculate light values at the vertices and interpolate linearly. Also, don't use lookup tables for trigonometric functions. Don't use a scanline clipper; that's much slower than a 3D polygon clipper for this number of polygons.

Your bilinear filtering assembly code could be about twice as fast. Take a close look at the bilinear filtering formula and you'll be able to simplify it to three multiplications, plus you won't need to calculate 1 - du and 1 - dv any more. By storing u and v in separate registers you can calculate the weight factors in just two instructions. You can also add support for variable texture sizes (mipmapping) without any extra instructions. Free ebp and esp by using lots of static data. Compute two pixels simultaneously to reduce dependencies; the uop count for the loop is more than 40, so a Pentium can't do efficient dynamic execution.

Also use a good profiler to find the real bottlenecks instead of uselessly optimising things like your Rotate function. Mail me after you've improved all this ;)

Don't get me wrong. This is meant as constructive criticism. Your demo looks really good; I just needed to point out things that could be improved. The reputation of software engines is already bad enough ;)

Cheers,

Nick

 
SteelBrain

April 19, 2002, 10:23 AM

"hack" the code (actually, you only need to xchg some filenames...)

 
Unix Plumber

April 19, 2002, 10:26 AM

Perhaps the problem with zooming out can be fixed by using a "standard" interpolation and rendering method. This would make the lines blurry and blended with the other colors, but it would look better and more real at farther distances.

 
Olio

April 19, 2002, 10:40 AM

Why just "danger" signs? :)
Wouldn't any text look good with this filter?

 
MaxSt

April 19, 2002, 10:54 AM

Here is the old version: www.hiend3d.com/bin/SmartFlt_1999.zip

>> Don't use fixed point for your matrix and vertex math!

The model is only 1000 triangles or so, so it doesn't matter. Besides, my fixed point math is more precise.

>> I think you used some kind of phong lighting with a 1MB lookup table? Don't do this, it's very bad for the cache and it's a lot faster to just calculate light values at the vertices and interpolate linearly.

But I like Phong!
Maybe it's a little slower, but I wanted to create eye candy, not just another super-fast Gouraud thing.

"trigonometric functions", "scanline clipper" - all this stuff is outside the inner loop, and my profiler shows the inner loop is the most expensive part, so this stuff is not so important.

>> Your bilinear filtering assembly code could be about twice as fast.

That sounds interesting.

>> Take a close look at the bilinear filtering formula and you'll be able to simplify it to three multiplications

I'm sorry, but 3 muls - I don't think that's possible. The only idea I could think of is:
color = r0r1r2r3 * Cr + g0g1g2g3 * Cg + b0b1b2b3 * Cb
But arranging the data in that order would take more than a few punpck's...

>> By storing u and v in separate registers you can calculate the weight factors in just two instructions.

But calculating the weight factors involves multiplications, right?
And I can't do multiplication in two instructions...

Could you explain your ideas, please? 'Cause I feel kinda confused.
My bilinear code was mostly based on Intel bilinear code - I always thought those guys were good in optimization. :)

>> Also use a good profiler to find the real bottlenecks instead of uselessly optimising things like your Rotate function.

VTune is too expensive, sorry. :)
The Rotate() function was copy-pasted from one of my old '96 programs.
I was just too lazy to write four lines of C code. :) It has nothing to do with optimization.

Maxim Stepin.

 
Mike Nelson

April 19, 2002, 10:57 AM

OMG that is Cool. Beautiful screenshot.

 
MaxSt

April 19, 2002, 11:22 AM

To rIO.sK:
Yes, it's kind of similar to marching squares, but I don't have a potential function, and as you can see from the picture on my web page, only the first 7 of my 14 functions are of that kind. The other 7 are more complex. I'm kind of converting each 4x4 texel group from the original big texture into a tricky vector format.
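For comparison, textbook marching squares thresholds a potential function at the four corners of a cell and packs the inside/outside results into a 4-bit case index (16 cases), which then selects how the boundary crosses the cell. The demo's method explicitly does not use a potential function, so this sketch only illustrates the technique it is being compared with; the `iso` level and the corner order are assumptions.

```c
#include <assert.h>

/* A corner counts as inside when its potential reaches the iso level. */
static int corner_inside(float potential, float iso)
{
    return potential >= iso;
}

/* Corner order (assumed): c0 top-left, c1 top-right, c2 bottom-left,
   c3 bottom-right, matching the c0..c3 naming used later in this thread.
   Cases 0 and 15 mean no boundary crosses the cell. */
static int marching_squares_case(int c0, int c1, int c2, int c3)
{
    return (c0 ? 1 : 0) | (c1 ? 2 : 0) | (c2 ? 4 : 0) | (c3 ? 8 : 0);
}
```

For example, a cell whose left corners are inside and right corners are outside gets case 1 | 4 = 5, a vertical border through the cell.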

To rakta:
I think texture artists still like to have control over individual texels, so they prefer raster editors like Photoshop over vector editors. Besides, for me it's easier to analyze a big _raster_ image than a vector metafile.

To Ready4Dis:
It's just the FPS counter. It is very simple - only 3 digits. :)
Your 40 fps at close-up sounds very good. :) I'm getting around 11 fps there on my P3-600. Maybe your DDR memory does the magic. Interesting.

Thanks everybody for great feedback.
Maybe I should write an article about the method.

Maxim Stepin.

 
dEViNiTY

April 19, 2002, 11:37 AM

yowwwzz!! ever considered working at nvidia? :)
i can't wait to see the next quake/unreal using a similar technology.
it runs quite smooth here too (happy me!)
keep it up

 
Ready4Dis

April 19, 2002, 11:50 AM

Yeah, my system is pretty good, and I'm assuming the ddr memory is helping a lot seeing as something like this heavily relies on system memory :D

Billy - BillyB@mrsnj.com

 
MaxSt

April 19, 2002, 02:46 PM

>> ever considered working at nvidia? :)

Are you... NVidia CEO ? :D :D

 
Nick

April 19, 2002, 03:02 PM

Thanks for the old version! :)

>> Besides, my fixed point math is more precise.

A float has 'only' 24 bits of mantissa precision, but the eight exponent bits are not lost, you know! Your 32-bit fixed-point numbers can have a lot of unused bits at the MSB side, while a float 'always' uses the full 24-bit mantissa. Also, when you multiply or divide a float the same relative precision remains, while a fixed-point number loses precision. If you scale your world by a factor of 1000, floats won't have any problem, but fixed-point numbers might overflow or underflow. So use floats; they're not slower at all, and your code looks a lot better ;)
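A small C sketch of the precision and range argument, assuming 16.16 fixed point in a 32-bit integer (the demo's actual fixed-point format is not stated). Squaring 0.001 underflows to zero in 16.16, whose resolution is 1/65536, while the float product keeps roughly 1e-6; squaring 1000 no longer fits in 32 bits, which `fixed_mul_checked` reports instead of silently wrapping.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed16;   /* assumed 16.16 format: 16 integer, 16 fraction bits */

static fixed16 to_fixed(double x)
{
    return (fixed16)(x * 65536.0);   /* truncates toward zero */
}

/* Multiply two 16.16 numbers through a 64-bit intermediate.
   Returns 1 and stores the result, or 0 if it does not fit in 32 bits.
   (Right-shifting a negative product is arithmetic on common compilers.) */
static int fixed_mul_checked(fixed16 a, fixed16 b, fixed16 *out)
{
    int64_t p = ((int64_t)a * b) >> 16;
    if (p > INT32_MAX || p < INT32_MIN)
        return 0;
    *out = (fixed16)p;
    return 1;
}
```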

>> But I like Phong!

Ok, in that case, keep the LUT :) But try making it a lot smaller. I even think that a table of 32x32 will look the same, but it won't cause a cache miss every time like your 1 MB LUT. Look, you are storing 8-bit values in a gigantic table, most neighboring values would probably be the same!
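A sketch of the smaller table, under loudly stated assumptions: the demo's 1 MB table layout is not described here, so this hypothetical table is indexed by a quantized diffuse term (N.L) and a quantized specular term (R.V), both assumed to lie in [0, 1]. At 32x32 bytes it takes 1 KB, small enough to stay in the L1 cache alongside part of the texture.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_N 32   /* 32 x 32 bytes = 1 KB instead of 1 MB */

/* Hypothetical layout: phong_lut[y][x], x = quantized N.L, y = quantized R.V. */
static uint8_t phong_lut[LUT_N][LUT_N];

static void build_phong_lut(float ambient, float kd, float ks, int shininess)
{
    for (int y = 0; y < LUT_N; y++) {
        for (int x = 0; x < LUT_N; x++) {
            float ndotl = x / (float)(LUT_N - 1);
            float rdotv = y / (float)(LUT_N - 1);
            float spec = 1.0f;
            for (int k = 0; k < shininess; k++)   /* rdotv ^ shininess */
                spec *= rdotv;
            float i = ambient + kd * ndotl + ks * spec;
            if (i > 1.0f)
                i = 1.0f;
            phong_lut[y][x] = (uint8_t)(i * 255.0f + 0.5f);
        }
    }
}

/* Round both terms to the nearest table entry, clamping to the table edges. */
static uint8_t phong_lookup(float ndotl, float rdotv)
{
    int x = (int)(ndotl * (LUT_N - 1) + 0.5f);
    int y = (int)(rdotv * (LUT_N - 1) + 0.5f);
    if (x < 0) x = 0;
    if (x > LUT_N - 1) x = LUT_N - 1;
    if (y < 0) y = 0;
    if (y > LUT_N - 1) y = LUT_N - 1;
    return phong_lut[y][x];
}
```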

>> Maybe it's a little slower, but I wanted to create eye candy, not just another super-fast Gouraud thing.

When you have enough triangles (like this case), gouraud looks just the same as phong if you use the phong formula at the vertices.
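A sketch of the Gouraud alternative described here: the lighting formula (Phong included) is evaluated only at the two ends of a span, and each pixel in between then costs one addition. The function name and the float intensities are illustrative choices; a full rasterizer would do the same stepping down the triangle edges first.

```c
#include <assert.h>

/* Fill one horizontal span: i_left and i_right are lighting values already
   computed (for example with the Phong formula) at the span's end points;
   every pixel in between is a linear interpolation, one add per pixel. */
static void gouraud_span(float i_left, float i_right, int width, float *out)
{
    assert(width >= 1);
    float step = width > 1 ? (i_right - i_left) / (float)(width - 1) : 0.0f;
    float i = i_left;
    for (int x = 0; x < width; x++) {
        out[x] = i;
        i += step;
    }
}
```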

>> I'm sorry, but 3 muls - I don't think that's possible.

Then you obviously don't know where the formula for bilinear filtering came from. I'll show you two ways to get a formula with only three multiplications. First, analytically. We start from the formula you copied from Intel:

c = c0*(1-du)*(1-dv)+c1*du*(1-dv)+c2*(1-du)*dv+c3*du*dv

Expand it and you get:

c = c0-c0*dv-c0*du+c0*du*dv+c1*du-c1*du*dv+c2*dv-c2*dv*du+c3*du*dv

Collect du and dv:

c = ((c0-c1-c2+c3)*du-c0+c2)*dv+(c1-c0)*du+c0

Hmm, yucky, but it's already three multiplications, isn't it? When we use some temporary variables we can simplify it a bit:

c20 = (c2-c0)*dv+c0
c31 = (c3-c1)*dv+c1
c = (c31-c20)*du+c20

That's only three multiplications and six additions. I don't think it can be any shorter. Now let me explain the meaning. Draw a square with the colors c0...c3 first. If we interpolate the color along the left side with dv, we get c20 in the formula above; if we do the same for the right side we get c31. Now interpolate across c20 to c31 using dv and we get c! ASCII art:

c0         c1
   +---+-----+
   |   |dv   |
c20+---c-----+c31
   | du      |
   +---------+
c2         c3
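The derivation is easy to check numerically. A C sketch comparing the four-weight formula with the three-multiplication form, in floating point for clarity rather than speed:

```c
#include <assert.h>

/* Four weighted terms, as in the original formula. */
static float bilerp_weights(float c0, float c1, float c2, float c3,
                            float du, float dv)
{
    return c0 * (1 - du) * (1 - dv) + c1 * du * (1 - dv)
         + c2 * (1 - du) * dv + c3 * du * dv;
}

/* Same value with three multiplications: down the left edge, down the
   right edge, then across between the two edge results. */
static float bilerp_3mul(float c0, float c1, float c2, float c3,
                         float du, float dv)
{
    float c20 = (c2 - c0) * dv + c0;   /* left edge,  c0 -> c2   */
    float c31 = (c3 - c1) * dv + c1;   /* right edge, c1 -> c3   */
    return (c31 - c20) * du + c20;     /* across,     c20 -> c31 */
}
```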




Also notice that we don't have to calculate (1-du) and (1-dv) any more, and we don't need any multiplications for the weight factors. So if you count every color component (including alpha) separately, we've reduced from 20 to 12 multiplications, or from five to three MMX multiplications. Not to mention the additions we've saved. Du and dv can be calculated in just a few clock cycles if you store u and v in separate registers.
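A sketch of what separate u and v registers can buy, assuming unsigned 16.16 fixed-point texture coordinates and power-of-two texture sizes (both assumptions, not taken from the demo): the texel coordinates and the 8-bit weights du, dv come out of shifts and masks alone. Passing a different log2 size per texture, or per mip level, is one way to handle variable texture sizes with the same instructions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t x, y;    /* texel coordinates, already wrapped */
    unsigned du, dv;  /* 8-bit bilinear weights, 0..255 */
} texel_addr;

/* u, v: unsigned 16.16 fixed-point coordinates (assumed format).
   log2_w, log2_h: texture size as powers of two, e.g. 9 and 8 for 512x256. */
static texel_addr split_uv(uint32_t u, uint32_t v,
                           unsigned log2_w, unsigned log2_h)
{
    assert(log2_w < 16 && log2_h < 16);
    texel_addr a;
    a.x  = (u >> 16) & ((1u << log2_w) - 1u);  /* integer part, wrapped */
    a.y  = (v >> 16) & ((1u << log2_h) - 1u);
    a.du = (u >> 8) & 0xFFu;                   /* top 8 bits of the fraction */
    a.dv = (v >> 8) & 0xFFu;
    return a;
}
```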

>> I always thought those guys were good in optimization. :)

Their implementation is as optimised as can be, but they didn't think long enough about the algorithms and formulas...

 
Nick

April 19, 2002, 03:51 PM

Typo:

>> Now interpolate across c20 to c31 using dv and we get c!

I meant using du of course.

Just a little warning before you try implementing the formula with three multiplications: you will be subtracting colors with unsigned components. So be careful about sign bits and integer ranges, and mail me if you have a solution for it that doesn't take extra instructions (I do have such a solution, but it's a hack).
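One straightforward way around the unsigned-subtraction problem, sketched in plain C rather than MMX (so it is not the zero-extra-instruction trick mentioned above): do the subtraction in signed ints. Because a*256 + (b - a)*w equals a*(256 - w) + b*w, the value being shifted is never negative. The 8-bit weights in 0..255 are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The trap: with unsigned arithmetic, 10 - 200 wraps around. */
static uint32_t naive_unsigned_diff(uint32_t a, uint32_t b)
{
    return a - b;
}

/* Linear interpolation of one 8-bit channel, weight w in 0..255 (256 = 1.0).
   a*256 + (b - a)*w == a*(256 - w) + b*w, which is never negative for
   a, b in 0..255, so the right shift always acts on a non-negative int. */
static int lerp8(int a, int b, int w)
{
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255 && w >= 0 && w <= 255);
    return (a * 256 + (b - a) * w) >> 8;
}

/* Three multiplications per channel: left edge, right edge, then across. */
static uint8_t bilerp8_3mul(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3,
                            unsigned du, unsigned dv)
{
    int c20 = lerp8(c0, c2, (int)dv);
    int c31 = lerp8(c1, c3, (int)dv);
    return (uint8_t)lerp8(c20, c31, (int)du);
}
```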

 
MaxSt

April 19, 2002, 06:41 PM

Sorry, my brain doesn't work very well on Fridays... :)
I'll try this over the weekend. I'll e-mail you then.

 
Politik

April 19, 2002, 07:18 PM

This is scaled down from a much bigger texture, right? And you store an extra 8 bits per texel (in the alpha channel or wherever)? So wouldn't it be fairer to compare the performance of your algorithm to an equally large (byte-wise) texture, rather than to one that uses only 24 of the 32 bits, as you currently do?

 
Ready4Dis

April 19, 2002, 09:15 PM

Yes, I suppose it'd be fairer to compare his 512x256x4 image to a 24-bit image of the same byte size:

512x256x4 = 524288 total bytes!
w*h*3 = 524288
w*h = 174762.667
w = 2h, so 2h^2 = 174762.667
h^2 = 87381.333
h = sqrt(87381.333) = 295.6
w = 2h = 2(295.6) = 591.2

592x295x3 = 523920
593x295x3 = 524805

So, somewhere in between those 2 numbers :D

592/295 = 2.00678 (approximately 2)
512/256 = 2.00000

So these would have the same proportions, and you could compare the image quality of the two... but performance-wise, the one whose sides are powers of 2 will do better, because you can use shifting instead of multiplying (or lookup tables).

So, as you can see, the fact that one has power-of-2 sides and the other doesn't would void the performance results (I'm assuming he's using shifting; if not, it may actually be comparable in performance and image quality).
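The power-of-two point in code: with a width of 512 = 1 << 9 the row offset is a shift, while a width like 592 needs a real multiply (or a lookup table of row offsets). Wrapping coordinates with a mask such as `x & (width - 1)` likewise only works for power-of-two sizes. A minimal sketch with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* Any width: one multiplication per row offset. */
static uint32_t texel_index_mul(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}

/* Power-of-two width (512 = 1 << 9): a shift replaces the multiplication. */
static uint32_t texel_index_shift(uint32_t x, uint32_t y, unsigned log2_width)
{
    return (y << log2_width) + x;
}
```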

Billy - BillyB@mrsnj.com

 
MaxSt

April 19, 2002, 10:17 PM

I just updated the demo - made a "blind" fix for the crash-on-exit bug.
Could somebody with Win2k or WinXP try it and see whether the bug was actually fixed?
Thanks.

Maxim.

 
triebdata

April 20, 2002, 03:34 PM

Check your mail. There is a complete description of what the problem is/was and how to fix it easily without using ExitProcess.

I hope you have received it this time.

 
MaxSt

April 21, 2002, 01:19 PM

OK, the crash-on-exit bug is really fixed now.

Forgot to mention some "easter eggs" - buttons "0" (zero) and "N". :)
Cheers!

Maxim.

 
Ready4Dis

April 22, 2002, 07:33 PM

I found 'N'... actually my 6 month old son was playing with keys and found it :D Looks freaky with normals :D

Billy - BillyB@mrsnj.com

 
MaxSt

April 25, 2002, 10:25 AM

I have a working implementation of the formula you suggested, Nick.
But I cannot reach you by e-mail for some reason.

In short - performance is pretty much the same.

Maxim Stepin.

 
Nick

April 26, 2002, 10:19 AM

>> But I cannot reach you by e-mail for some reason.

Thanks for mentioning! I've recently changed from capens.com to capens.net. I've tested it again and nicolas@capens.net works. Alternatives are nicolas_capens@hotmail.com, nicolas.capens@pandora.be (50 MB) or nicolas.capens@rug.ac.be :)

>> In short - performance is pretty much the same.

Yeah, your inner loop only takes about 30% of processor time on my laptop PII 300. So even if you optimised the bilinear filtering to the maximum, you would still only get about a 10% performance increase. You definitely need to optimise the lighting part too.
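The estimate above is an application of Amdahl's law. A minimal sketch, where `p` is the fraction of frame time spent in the part being optimized and `s` is that part's speedup: with p = 0.3 and s = 2 the whole frame gets about 1.18x faster, and even an infinitely fast inner loop caps out at 1/0.7, about 1.43x. How close this lands to the 10% figure depends on how much of that 30% is the bilinear filter itself.

```c
#include <assert.h>

/* Amdahl's law: speeding up a fraction p of the run time by a factor s
   makes the whole run 1 / ((1 - p) + p / s) times faster. */
static double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```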

Could you try to use a much smaller phong map? Now it's 1 MB so it doesn't fit into the cache and the processor always has to wait 7 or more clock cycles to get the data from RAM. If you make it 64x64 or even 32x32 it should be a lot better and also a part of the texture map can stay in the cache.

Also try a 3D polygon clipper, that should be a lot faster than the scanline clipping you're doing now.
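A 3D polygon clipper is commonly written as a Sutherland-Hodgman pass per frustum plane, run once on the polygon's vertices before projection instead of clipping every scanline. This is a swapped-in standard technique, not code from the demo. A minimal C sketch of one pass against a near plane z = znear, assuming camera space with z increasing into the screen:

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

/* One Sutherland-Hodgman pass: clip a convex polygon with n vertices against
   the plane z = znear, keeping the side z >= znear. `out` must have room for
   n + 1 vertices. Returns the number of output vertices (0 if fully clipped). */
static int clip_near(const vec3 *in, int n, float znear, vec3 *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        vec3 a = in[i], b = in[(i + 1) % n];
        int a_in = a.z >= znear, b_in = b.z >= znear;
        if (a_in)
            out[m++] = a;                      /* keep inside vertex */
        if (a_in != b_in) {                    /* edge crosses the plane */
            float t = (znear - a.z) / (b.z - a.z);
            vec3 p = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), znear };
            out[m++] = p;
        }
    }
    return m;
}
```

The other five frustum planes are handled the same way, feeding each pass's output into the next.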

 
This thread contains 118 messages.
 
 