Not logged in, Join Here! or Log In Below:  
News Articles Search    

 Home / 3D Theory & Graphics / Fast Inverse Squareroot in Fixed Point Account Manager
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

May 31, 2005, 02:12 AM

Hi Guyz,

I'm still trying to speed up the vector.normalize() function I use in my 3D engine for the ARM processor.

I bumped into a very interesting article on how first calculating the inverse squareroot can dramatically speed up normalization because you then don't have to divide every element of your vector with the length, but you can multiply them. This is obviously faster since ARM doesn't have a divide instruction but it does have a fast multiplication.

the article:

The only problem here is that I don't understand the initial premise of the article. It states:

"My idea is that you can do 1/sqrt(k) faster than this.
I assume that k is normalized so. 0.25

Fabian 'ryg' Giesen

May 31, 2005, 05:32 AM

That is not normalized as in "has length one", but as in "is in a given value range".

Since sqrt(a*a*x) = abs(a)*sqrt(x), you can "pull out" constant factors to reduce the range you have to approximate sqrt in a lot. The trick is doing this with factors where it's easy to do.
You use powers of four (=even powers of two) for that, because multiplying/dividing by them is just bit shifting.


May 31, 2005, 07:50 AM

I see....well, allmost ;^)

I'm not sure how one would 'normalize' a number given the explanation you provided.

for instance..say I have an int32 with the fixed point at 8.
then to get it in the range 0.25


May 31, 2005, 09:17 AM

Thinking a bit more about this...

to get a value X inbetween A and B one would typically divide X by (B-A) and then add A

you said: sqrt(a*a*x) = abs(a)*sqrt(x)
but this formula only handles the scaling part... but to get it in the range 1>k>+0.25 I'd also have to add 0.25 to it...

Is there also a clever trick for this too?


May 31, 2005, 10:05 AM


Jacco explained it to me... I understand now.
Thanks for the help!


Fabian 'ryg' Giesen

May 31, 2005, 10:10 AM

You can scale any positive real number by a power of 4 so that it lies in the interval [0.25,1).

Assume you're working with 32 bit integers that represent "0.32" fixed point numbers. All you need to do is shift left by two (=multiply by 4) until the value is >=0x40000000. That is the "pre-scaling" part. You count how many such shifts you needed (this will obviously always be less than 16, in the case of 32 bit numbers), call that number N. Then you calculate the square root. To "post-scale" (that is, undo the prescaling), you now have to divide by sqrt(4)^N=2^N, which is just a shift right by N bits.

In practice you can do a bit better than shifting up to 15 times to determine N (and normalize the input), because you don't have to do a "linear search" for the right scale factor; you can determine N with a simple divide-and-conquer-technique, assuming that x != 0:

  2.   int N = 0, t = x;
  3.   if(t & 0xffff0000) t >>= 16; else N += 8;
  4.   if(t & 0x0000ff00) t >>=  8; else N += 4;
  5.   if(t & 0x000000f0) t >>=  4; else N += 2;
  6.   if((t & 0x0000000c) == 0) N++;

Understanding that algorithm is left as an excercise to the reader :) - but seriously, this is a lot harder to write down than to understand; the general idea is that we want to count pairs of zero bits starting from the most significant bit of our number. If the first 16 bits are zero, we have 8 such pairs in the most significant 16 bits, plus whatever leading "zero packs" we have in the least significant 16 bits. If they're not zero, then the number of leading "zero packs" is the same as in the number (x >> 16), so in both cases we need to determine the number of leading zeroes of a 16-bit number next, which can be reduced exactly the same way.

You then need to shift x left by 2N bits before the sqrt computation and right by N bits afterwards. With fixed point it's a bit trickier; normally the easiest solution is just treating each fixed point number as if it was 0.32 then compensating for the actual number of fractional bits together with the "post-adjustment" shift.

This thread contains 6 messages.
Hosting by Solid Eight Studios, maker of PhotoTangler Collage Maker.