Not logged in, Join Here! or Log In Below:  
 
News Articles Search    
 

 Home / General Programming / SIMD/batch optimized datastructure. Account Manager
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
Victor Widell

February 10, 2005, 07:54 PM

Please give me your thoughts on this:

I had this stupid idea of trying to design a vector class speciffically for SIMD processing and batching. (Even though I have no real-life experience with ASM.) This seems to be the future of computing, considering the plans for the Cell chip.

My idea was basically to use an array of vectors, and apply all operations on those arrays instead of the induvidual vectors, and the data will be stored xxxx, yyyy, zzzz instead of xyz, xyz, xyz, xyz.

The class will basically look like this:

  1.  
  2. class cVectorArray{
  3.   public:
  4.     cVectorArray(int Size):Size(Size), x(new float[Size]), y(new float[Size]), z(new float[Size]){};
  5.     ~cVectorArray{
  6.       delete[] x;
  7.       delete[] y;
  8.       delete[] z;
  9.     };
  10.  
  11.   cVectorArray operator+(cVectorArray Input){
  12.     cVectorArray Return(MIN(Size, Input.Size));
  13.  
  14.     int i;
  15.  
  16.     i=0;
  17.     for(; i<Size; i+=4)
  18.       asm{ some asm SIMD code to fill Return.x. }
  19.    
  20.     i=0;
  21.     for(; i<Size; i+=4)
  22.       asm{ some asm SIMD code to fill Return.y. }
  23.    
  24.     i=0;
  25.     for(; i<Size; i+=4)
  26.       asm{ some asm SIMD code to fill Return.z. }
  27.    
  28.     return Return;
  29.   };
  30.  
  31.   private:
  32.     float* x;
  33.     float* y;
  34.     float* z;
  35.  
  36.     int Size;
  37. };
  38.  



There should of course be some interface functions to push/pop etc. But you get the concept.

Pros:
* You automatically get efficient SIMD code.
* You get linear access.
* "Simple" to implement.
* The overhead of function calling is negligable. You could even use inheritence and provide fallback implementations with function pointers in runtime.
* For very large data, one could even use multithreading.

Cons:
* Few APIs support the packing format. (?)
* You might need to convert the data back to conventional packing.
* It is impossible to do this kind of massive SIMD in some situations.


Now, my question is: Is this just another dead end, or could it actually be useful?

 
Dr. Necessiter

February 10, 2005, 09:09 PM

This can be useful, and has been done for some time. However, first think about your application. How are you using vectors? What are you using them for. If you're using a hardware graphics board, then it may turn out that it is actually quite rare to need to perform some operation in software on a big batch of vectors.

Unfortunately, what is good for SIMD is usually not so good for higher-level architectual organization. And in many cases, clarity and maintainability should probably win out over raw performance.

 
Victor Widell

February 11, 2005, 05:47 AM

"and has been done for some time"

I suspect Photoshop does it like this, since it has a dynamic number of channels and the file format looks like this too.

 
eulogy

February 11, 2005, 08:38 AM

Wouldn't separating x, y and z kill the cache when you're going to access these vectors afterwords, since they're in completely different memory areas?

SSE can operate on 4 floats at a time, so why not:

x1 y1 z1 w1
+ x2 y2 z2 w1

or am I missing something here? (I haven't coded ANY simd asm)

 
eulogy

February 11, 2005, 09:09 AM

(sorry about the double post)

Wouldn't separating x, y and z kill the cache when you're going to access these vectors afterwords, since they're in completely different memory areas?

SSE can operate on 4 floats at a time, so why not:

x1 y1 z1 w1
+ x2 y2 z2 w1

or am I missing something here? (I haven't coded ANY simd asm)

 
Dr. Necessiter

February 11, 2005, 09:21 AM

>Wouldn't separating x, y and z kill the cache when you're going to access these vectors afterwords, since they're in completely different memory areas?<

Exactly. It boils down to the specific application. If you are doing most of your work on vectors in big batches, then the SIMD may be a win. However, if you are typically accessing the vectors "individually", then the swizzling required to feed and extract data from SIMD may be more overhead than it is worth.

As Victor pointed out, some types of applications are well suited to this (signal processing, etc). I'd guess that most hardware accelerated games are not.

When I think about the architecture of most games, it seems that higher-level optimizations are put in place to prevent the need to operate on big batches of vectors. For example, we wouldn't brute-force dot product a vector against all the world's surface normals. If we did, this may be a good argument for SIMD. Instead, however, we partition the world and do hierarchical tests to reduce the amount of work required.

 
Victor Widell

February 11, 2005, 07:19 PM

"Wouldn't separating x, y and z kill the cache when you're going to access these vectors afterwords, since they're in completely different memory areas?"

This class is designed for batching. I imagine a Size of at least a few hundred vectors. In this case the separation is no problem since it works with all x-s first (wich are sequential), then all y-s, etc.

You could even put all data sequentially, like xxxxyyyyzzzz, but it would mean pretty slow resizing.


"so why not:

x1 y1 z1 w1
+ x2 y2 z2 w1"

Because most more complex formulas don't work well with interleaved data. Think of normalizations and dot products.


I think (my) idea would be quite suitable for areas that cannot easily be hardware accelrated; like tesselation, image processing, physics(?).

Also raytracing is tedious to implement in current gpu:s (allthoug perfectly possible). However, it seems to me that the problems with a vector array would be the same as storing vectors in a texture.


 
Anthony Rufrano

February 14, 2005, 02:26 PM

You could just do:

  1.  
  2. _asm {
  3.     mov                     edi, v1
  4.     mov                     esi, v2
  5.     movups                  xmm0, [edi]
  6.     movups                  xmm1, [esi]
  7.  


That aligns the struct for you.

 
This thread contains 8 messages.
 
 
Hosting by Solid Eight Studios, maker of PhotoTangler Collage Maker.