



This section of the archives stores flipcode's complete Developer Toolbox collection, featuring a variety of mini-articles and source code contributions from our readers.

 

  Automatic CPU Detection
  Submitted by



Many math libraries provide several versions of the same function, each optimized for a specific processor. For example, a square root approximation using SSE takes only a few clock cycles, compared to dozens or even hundreds of clock cycles for a version that does not use SSE. But that benefit is lost if we have to check for processor support on every call.

Here I present a method that not only eliminates this inefficiency, but also does so automatically:

#include "CPUID.hpp"
#include <math.h>

float fsqrtInit(float x);   // Forward declaration

static float (*fsqrt)(float) = fsqrtInit;   // Function pointer, initially set to the initializer

float sqrtSSE(float x)   // Fast SSE implementation (approximation)
{
    __asm   // MSVC-style inline assembly
    {
        movss   xmm0, x
        rsqrtss xmm0, xmm0   // xmm0 ~ 1 / sqrt(x)
        rcpss   xmm0, xmm0   // xmm0 ~ 1 / (1 / sqrt(x)) = sqrt(x)
        movss   x, xmm0
    }

    return x;
}

float fsqrtInit(float x)   // Initialization function
{
    if(CPUID::supportsSSE())
    {
        fsqrt = sqrtSSE;
    }
    else
    {
        fsqrt = sqrtf;
    }

    return fsqrt(x);   // Forward the first call to the selected function
}



The trick is to use a function pointer, here fsqrt ("fast sqrt"). When SSE support is detected, we make it point to our SSE implementation; otherwise we point it to the regular sqrtf function. We could do this in an initialization function that is called at the beginning of our application, but that is cumbersome and easy to forget. My solution is to initialize the fsqrt function pointer with the initialization function itself!

So when fsqrt is called the first time, it actually calls the initialization function and then the correct square root function. The second time it is called, it does not check for SSE support any more, but immediately calls the best square root function. So the initialization is done exactly once and you never have to worry about it.
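To make the "exactly once" behavior easy to observe, here is a minimal portable sketch of the same pattern. It is not the code above: detectFastPath() is a hypothetical stand-in for CPUID::supportsSSE(), sqrtFast() is only a placeholder for an optimized routine, and the counter g_detectCalls exists purely to show how often detection runs.

```cpp
#include <cmath>

// Hypothetical stand-in for CPUID::supportsSSE(); counts its own invocations.
int g_detectCalls = 0;

bool detectFastPath()
{
    ++g_detectCalls;
    return false;   // Assumption for this sketch: the fast path is unavailable
}

float sqrtFast(float x)       // Placeholder for an optimized implementation
{
    return std::sqrt(x);
}

float sqrtFallback(float x)   // Regular implementation
{
    return std::sqrt(x);
}

float fastSqrtInit(float x);  // Forward declaration

// The pointer starts out pointing at the initializer itself
float (*fastSqrt)(float) = fastSqrtInit;

float fastSqrtInit(float x)
{
    // Runs only on the first call: select an implementation, then forward to it
    fastSqrt = detectFastPath() ? sqrtFast : sqrtFallback;
    return fastSqrt(x);
}
```

After any number of calls to fastSqrt, g_detectCalls is 1 and the pointer no longer refers to fastSqrtInit. The first call is not synchronized in this sketch, so it assumes the pointer is first used from a single thread.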

The obvious drawback is that calls through the function pointer cannot be inlined. This isn't too bad, because a function call is highly optimized on modern processors and much faster than a mispredicted jump. Furthermore, if you need top performance, you shouldn't use this trick for such small functions, but optimize the whole algorithm instead. As with all optimizations, only apply it if and when it matters.
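One way to read "optimize the whole algorithm" is to move the function pointer up a level, so that a single indirect call covers an entire array and the per-element work sits inside an ordinary loop. The following is only a sketch under that assumption: hasFastPath() is a hypothetical stand-in for CPUID::supportsSSE(), and sqrtArrayFast() is a placeholder where an SSE loop would go.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for CPUID::supportsSSE()
bool hasFastPath()
{
    return false;   // Assumption for this sketch: the fast path is unavailable
}

// Regular version: a plain loop over the whole array
void sqrtArrayGeneric(float* data, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        data[i] = std::sqrt(data[i]);
    }
}

// Placeholder for a loop written with SSE instructions
void sqrtArrayFast(float* data, std::size_t n)
{
    sqrtArrayGeneric(data, n);
}

void sqrtArrayInit(float* data, std::size_t n);   // Forward declaration

// Same self-initializing pointer, but one call now processes n elements
void (*sqrtArray)(float*, std::size_t) = sqrtArrayInit;

void sqrtArrayInit(float* data, std::size_t n)
{
    sqrtArray = hasFastPath() ? sqrtArrayFast : sqrtArrayGeneric;
    sqrtArray(data, n);
}
```

With this layout the cost of the indirect call is paid once per array rather than once per element, and the compiler is free to inline or vectorize the loop body.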

Regards,


Nicolas "Nick" Capens


 

Copyright 1999-2008 (C) FLIPCODE.COM and/or the original content author(s). All rights reserved.