3D Theory & Graphics / division vs multiplication, silicon area, latency, energy
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Arne Rosenfeldt

April 22, 2005, 08:29 AM

Am I right,
__that division is slower than multiplication,
__because
____after every subtraction (= quotient bit)
____one has to wait for the carry bit
______to be carried through all 32, 64, 60, or 120 bits of the dividend?

 
Wernaeh

April 22, 2005, 02:09 PM

What kind of multiplication do you mean ? Integer or float ?

Heh, and btw, what's up with those underscores before your post's lines ? ;o)

Cheers,
- Wernaeh

 
criznach

April 22, 2005, 03:23 PM

Thank you. The sleepless nights have come to an end. :P

 
Julio Gorgé

April 22, 2005, 03:52 PM

It all depends on the ALU design. Traditionally, division has been considered slower in most systems. Nowadays, I do not think that is still true. I would not worry about it unless I were coding for a really slow CPU or an embedded system.

 
Fabian 'ryg' Giesen

April 22, 2005, 04:24 PM

Well, on X86 CPUs, it definitely is true (both integer and float).

 
MOA

April 22, 2005, 04:48 PM

Division is certainly slower on today's systems. Try to avoid it where possible.

Note that you can always replace a division by a multiplication, using the inverse.

i.e.:

100 / n == 100 * (1/n)

(This only makes sense when 'n' is a constant [and the compiler will probably do this for you already]. When using look-up tables, for example, you should of course store 1/n instead of n yourself.)

 
Scali

April 22, 2005, 05:26 PM

The compiler will probably not do this when using floating point...
x/y won't give the exact same result as x*(1/y) with floats... you lose some precision in the intermediate 1/y result...
So by replacing the division with a multiplication, the compiler could break your code, and therefore it should never do this.
With integers it's different. You use a fixed-point reciprocal (2^n)/y and calc (x*((2^n)/y)) / (2^n) == (x*((2^n)/y)) >> n.

By choosing n large enough, you can guarantee that you get the same bits as the integer division would, i.e. the exact result. AMD made a simple tool to generate the code for you (source code for a Windows app here: http://modseven.de/pastebin.php?id=161), and any half-decent compiler has this optimization built in these days.

 
Arne Rosenfeldt

April 23, 2005, 07:52 AM

>What kind of multiplication do you mean ? Integer or float ?
So what is the difference?
For float, the mantissas are multiplied like ints
and the exponents are added.

>Heh, and btw, what's up with those underscores before your post's lines ? ;o)
Lame indent technique.

>It all depends on the ALU design.
But there are physical limitations.
Multiplication can work in parallel,
division cannot.

For division you need a lot of space for storing intermediate results while the carry bit wanders around.

For multiplication you can put 4 units into this space and run them at ten times the speed.

>Note that you can always replace a division by a multiply, by using the inverse.
This thread is related to perspective-correct texture mapping. No constants there.

 
Victor Widell

April 23, 2005, 08:23 PM



Arne Rosenfeldt wrote:
> This thread is related to perspective-correct texture mapping. No constants there.


I think Quake 1 used an inverted z-buffer (storing 1/z) to avoid divisions later in the shading process.

 
Arne Rosenfeldt

April 24, 2005, 02:39 AM

Oh, I made a mistake:

>Note that you can always replace a division by a multiply, by using the inverse.


Yes, for each pixel

z = 1/"1/z"

u0 = "u0/z" * z
v0 = "v0/z" * z
u1 = "u1/z" * z
v1 = "v1/z" * z

r = "r/z" * z
g = "g/z" * z
b = "b/z" * z

zbuffer=z


So with some shaders, the 1/z isn't the main problem anymore today.
But anyhow, I was thinking of a recursive span subdivision:

For each recursion:
Interpolate z linearly
Extrapolate z linearly
We get an upper and a lower bound on z
Therefore a lot of significant bits of the quotient are already known

 
Fabian 'ryg' Giesen

April 24, 2005, 10:12 AM

First, multiplication is not completely parallelizable either (at least not with multiplication algorithms you'd use for relatively small numbers, such as

 
Wim Heirman

April 24, 2005, 10:44 AM

As Scali correctly noted, the above is not true. The US Army even had Patriot missiles fail because their programmers incorrectly assumed that n/10 == n*0.1...
You can read more about that here: The Patriot Missile Failure http://www.ima.umn.edu/~arnold/disasters/patriot.html

 
Arne Rosenfeldt

April 24, 2005, 12:42 PM

>completely parallelizable
Nobody is perfect.
>pipeline the process
It comes naturally for multiplication because all datapaths are of similar speed.
One could theoretically run the data through without a pipeline (without lots of latches) and instead spend 2 wait states.

>However, there are no decisions based on the intermediate results involved, so the computations for those types of dividers can be pipelined relatively well (though the data flow gets rather hairy).

The Pentium bug

>In practice, no one does this because division is a lot less frequent than multiplication in normal applications.
But now that everybody has learned to parallelize at a higher level,
wouldn't it be wise to return to the shift-and-subtract method,
just in parallel?

>assuming multiplication/addition time is a constant (or can be hidden by pipelining), grows logarithmically in your word size.
Cool, division is faster than add ;-)

 
Arne Rosenfeldt

April 25, 2005, 02:43 AM

Some more:
>wider-than-word-width multipliers
would allow solving z' = z^2 with Newton's method, leading to 1/z without division

Texture supersampling allows for subpixel-precise constant-z mapping.

I guess today everybody uses bilinear interpolation for each tile.

The greater number of divisions in
http://en.wikipedia.org/wiki/Reyes_rendering
http://en.wikipedia.org/wiki/Ray_tracing
prevents their success.

But I hope someday fur is just rendered as in "Monsters, Inc.".

>cool, division is faster than add ;-)
Add scales linearly with word width if you want to use the carry or zero bit.

I guess I have to read
http://historical.ncstrl.org/litesite-data/stan/CSL-TR-93-554.pdf

 
Arne Rosenfeldt

April 25, 2005, 02:58 AM

>http://historical.ncstrl.org/litesite-data/stan/CSL-TR-93-554.pdf
>5.1 Newton-Raphson

So in reality, for all kinds of renderers:

use 1/z_{n-1}
to init Newton-Raphson
and get 1/z_{n}


If you walk your scene graph, this will be fast.

Note, this is a free-fall feeling:
rounding errors (as with every float)
improper rounding (says the PDF)
nondeterministic results, because of the init and epsilon >> LSB

1/z is as difficult as sin :-(

 
Scali

April 25, 2005, 06:19 AM

Why not the tried-and-tested method of scanline subdivision?
In the simple version, you just generate a 1/z for every 16th pixel and calc the gradients, which you then linearly interpolate along those 16 pixels while you start the next 1/z on the FPU. Like the Quake 1 renderer... You get the perspective virtually for free then.

In a slightly more complex version, you estimate the amount of perspective correction required, rather than using the fixed value of 16 pixels. This can give you visually perfect results while not wasting time calcing perspective at every pixel, which you can't see anyway... and in practice it should be almost as fast as linear gradients.

 
Arne Rosenfeldt

April 25, 2005, 12:22 PM

------linear
But why does the PlayStation 1 look so horrible?
Most triangles are smaller than 16 pixels.

OK, for hi-res (1280*960*antialias) linear interpolation works better.
And hardware will probably use not 16*1-pixel scanlines, but 8*8 boxes.

And so you get rid of the multiplications also (this u = "u/z" * z stuff).

With high tessellation we could go back to plain affine texture mapping.

-------------- synergy

But all division methods really get faster
if you have a good starting value.
Some get quadratically faster, so even when you do them twice as often, you are twice as fast.

---------------Reyes

Maybe we could divide space into voxels
and interpolate linearly within.

The pixel shaders eat three times the space on the chips that the vertex shaders do. I think it should be vice versa.

BSP-based shadow volumes are a no-no with this.

 
Scali

April 25, 2005, 01:50 PM

Erm, is it me, or is your reply completely unrelated to what I said?

 
Arne Rosenfeldt

April 26, 2005, 07:46 AM

You: "for every 16th pixel"
Me: "Most triangles are smaller than 16 pixels"

You: "linearly interpolate"
Me: "playstation 1", which always interpolated linearly

You: "16 pixels"
Me: "8*8 boxes" = 64 pixels = faster

You: "Like the Quake 1 renderer"
Me: like Descent = 4*1 or Terminal Velocity = 8*8

You: "FPU"
Me: "hardware" I want to foresee HW development, because I am slow at implementing

You: "In a slightly more complex version you will estimate the amount of perspective correction required, rather than using the fixed value of 16 pixels"
Me: "playstation 1", where triangles at sharp angles get smaller in this direction

You: "Why not the tried-and-tested method"
Me: "But all division methods really get faster" So this should theoretically be better than the tried-and-tested method.
Me: "OK for hires (1280*960*Antialias) linear interpolation works better." But why does it look so good? Does it have to do with how our eyes work?





 
Scali

April 26, 2005, 08:24 AM

You: "for every 16th pixel" Me: "Most triangles are smaller than 16 pixels"


Then you have bigger problems than perspective, I'd say.
Anyway, scanline subdivision can work for any span of pixels, of course. And it could also be implemented vertically, but that may be more trouble than it's worth in some cases.

You: "linearly interpolate" Me: "playstation 1", which always interpolated linearly


The PlayStation had bigger problems than linear texture mapping... like no subpixel/texel correction, no texture filtering, not that many colours, etc.
Quake 1/2 in software mode look much better than the PSX.
They are indistinguishable from proper perspective texture mapping. That's scanline subdivision.

You: "In a slightly more complex version you will estimate the amount of perspective correction required, rather than using the fixed value of 16 pixels" Me: "playstation 1", where triangles at sharp angles get smaller in this direction


Yes, but what are you trying to say with that? Triangles at sharp angles always get smaller... It has nothing to do with the PSX. That's just how perspective works.

 
Nick

April 26, 2005, 01:07 PM

With SSE, you can get four 1/x approximations per clock cycle. These have a precision of 12 mantissa bits, which can easily be increased to ~24 bits with a Newton-Raphson iteration.

The hardware implementation is based on a small lookup table. To compute 1/x you only have to negate the exponent of the floating-point number and look up the reciprocal of the mantissa in the table. The two nearest values are read and linearly interpolated. An alternative is to partially convert the (12-bit) table to logic.

 
Arne Rosenfeldt

April 27, 2005, 02:25 PM

>but what are you trying to say with that?
is in reply to
>In a slightly more complex version you will estimate the amount of perspective correction required, rather than using the fixed value of 16 pixels

Either you use a fixed pixel block size like Quake,
or you use a fixed texel block size like the PSX (planes are tessellated).

What else? Logarithmic stuff, like quadtree tessellation, comes to mind.

And I have seen artefacts with 8x8 and with 16x16 pixel-block software renderers. I guess Quake is just dark enough / has undefined texture art, so you and I cannot see them.

 
Scali

April 27, 2005, 03:11 PM

I just use a simple estimation formula that takes the error at the middle of the polygon, given linear interpolation.
It will then guess the proper subdivision for the polygon, which is between 2 and 64 pixels in practice.
You can tune the guess so that it will always be conservative, which means there are never any visual errors.

 
Arne Rosenfeldt

April 28, 2005, 12:30 PM

OK, thank you

 
This thread contains 24 messages.
 
 