Aligned data to 16 bytes for SIMD
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
codehunter

May 12, 2005, 05:48 AM

Hi all,
I've tried to optimise my matrix class with SIMD, but I get an access violation when accessing _L1. The MSDN documentation says that a __m128 variable will always be aligned to 16 bytes, but I've noticed that _L1 is not aligned to 16 bytes, so I think that's the cause of the error.

Here is the data declaration of my matrix:

  class cMatrix {
          public:
                  cMatrix() {}
                  union
                  {
                          struct
                          {
                                  __m128 _L1, _L2, _L3, _L4;
                          };
                          struct
                          {
                                  float   _11, _12, _13, _14;
                                  float   _21, _22, _23, _24;
                                  float   _31, _32, _33, _34;
                                  float   _41, _42, _43, _44;
                          };
                  };
  };


thx in advance

 
Chris

May 12, 2005, 06:14 AM

_L1 is aligned to 16 bytes WITHIN THE CLASS (offset 0 is aligned to 16 bytes).
You need to make sure that the instance of cMatrix is aligned to 16 bytes too.
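
To make the distinction concrete, here is a minimal sketch (not the original poster's code). It assumes a C++11 compiler and uses the standard alignas keyword as a portable stand-in for __declspec(align(16)). The names is_aligned, PlainMatrix, AlignedMatrix and the two check functions are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: true if the address p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// The rows start at offset 0, but the type only asks for float alignment,
// so an instance may legally live at any address that suits a float.
struct PlainMatrix {
    float m[16];
};

// Same layout, but now the type itself demands a 16-byte aligned address.
struct alignas(16) AlignedMatrix {
    float m[16];
};

// Builds a PlainMatrix 4 bytes past a 16-byte boundary and reports whether
// its first row is 16-byte aligned. Offset 0 inside the class does not help
// when the instance itself starts at the wrong address.
inline bool misplaced_matrix_is_aligned() {
    alignas(16) static unsigned char storage[sizeof(PlainMatrix) + 16];
    PlainMatrix* mat = new (storage + 4) PlainMatrix();
    return is_aligned(mat->m, 16);
}

// A local AlignedMatrix has to be placed on a 16-byte boundary.
inline bool local_aligned_matrix_is_aligned() {
    AlignedMatrix mat = {};
    return is_aligned(mat.m, 16);
}
```

The first check returns false and the second returns true: the offset of the rows is the same in both types, only the address of the instance differs.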

 
codehunter

May 12, 2005, 06:27 AM

I've already tried that like this:

  __declspec( align( 16 ) ) class cMatrix
  {
  ...
  };


But that doesn't change the alignment. Is there another way?

 
Chris

May 12, 2005, 06:35 AM

Do you new/malloc the cMatrix instance and work with pointers? new/malloc don't respect __declspec(align), I think.
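
Whether plain new/malloc happen to return 16-byte aligned memory depends on the runtime, so it is safer not to rely on it. Below is a minimal sketch of one common technique for getting aligned heap memory out of plain malloc: over-allocate, round the address up, and remember the original pointer. This is an illustration only, not a claim about how _aligned_malloc or _mm_malloc are implemented; aligned_alloc_sketch and aligned_free_sketch are hypothetical names, and size overflow is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocate `size` bytes at an address that is a
// multiple of `alignment` (must be a power of two). Returns nullptr on
// failure.
inline void* aligned_alloc_sketch(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Room for the payload, worst-case padding, and one stored pointer.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr) return nullptr;
    // Leave space for the stored pointer, then round up to the alignment.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    // Stash the pointer malloc returned just below the aligned block.
    unsigned char* user = reinterpret_cast<unsigned char*>(aligned);
    std::memcpy(user - sizeof(void*), &raw, sizeof(void*));
    return user;
}

// Hypothetical counterpart: recover the original pointer and free it.
inline void aligned_free_sketch(void* p) {
    if (p == nullptr) return;
    void* raw = nullptr;
    std::memcpy(&raw, static_cast<unsigned char*>(p) - sizeof(void*), sizeof(void*));
    std::free(raw);
}
```

A matrix could then be constructed in such a block with placement new and released with an explicit destructor call followed by aligned_free_sketch.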

 
codehunter

May 12, 2005, 06:40 AM

No, the matrix that goes wrong is not dynamically allocated.

I've tested __declspec(align) on a couple of floats and ints, and they are also not aligned.

I use VS 2003, and there is an option to align structs to 16 bytes, but it still doesn't work :(

 
Nick

May 12, 2005, 07:33 AM

Strange, it should work with __declspec(align) if it's just a local variable...

The compiler uses the ebx register as an aligned stack pointer. So if you mess with that register, it can cause trouble (the compiler gives a warning about this). Anyway, you could try making the matrices static. That should always work and it's actually recommended since you avoid the aligned stack setup.

 
dummy

May 12, 2005, 09:00 AM

In my experience, msvc2k3 just silently ignores that kind of alignment clause.
Instead, align things when instantiating, i.e.: cMatrix __declspec(align(16)) CmAtRix.
Side note: _MM_ALIGN16 is slightly more portable, and _mm_malloc + placement new are kinda handy.

Anyway, msvc2k3 isn't too hot, to say the least, when dealing with intrinsics.
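
The idea of attaching the alignment to the variable rather than to the type can be sketched as follows. This assumes a C++11 compiler and uses alignas in place of __declspec(align(16)) or _MM_ALIGN16; Matrix4 and instance_level_alignment_works are hypothetical names for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// The type carries no alignment request of its own.
struct Matrix4 {
    float m[16];
};

// Hypothetical check: align individual variables at their declaration and
// report whether their addresses really are multiples of 16.
inline bool instance_level_alignment_works() {
    // On common platforms the bare type only needs float alignment.
    assert(alignof(Matrix4) < 16);

    alignas(16) Matrix4 world = {};            // alignment on a local
    alignas(16) static Matrix4 identity = {};  // alignment on a static

    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(&world);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(&identity);
    return a % 16 == 0 && b % 16 == 0;
}
```

The drawback of this style is that every declaration has to repeat the request, so one forgotten variable is enough to bring the access violation back.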

 
Chris

May 12, 2005, 09:54 AM

Well, if even Nick doesn't know what's up, I'm probably of absolutely no use here ...

 
Nick

May 12, 2005, 11:00 AM

I tried this code, and it works flawlessly:

  __declspec(align(16)) struct Vector
  {
          float x;
          float y;
          float z;
  };

  int main()
  {
          Vector a = {0, 1, 2};
          Vector b = {3, 4, 5};
          Vector c = {6, 7, 8};

          __asm
          {
                  movaps xmm0, a    // aligned load, needs a 16-byte aligned address
                  addps xmm0, b
                  movaps c, xmm0    // aligned store
          }

          return 0;
  }

Could you give it a try? If it doesn't work, it must be your compiler settings...

 
codehunter

May 12, 2005, 02:51 PM

Yes, as long as you define them globally it works, but when it's a member of a class it fails.

 
Nick

May 12, 2005, 05:00 PM

Ah yes, within classes the alignment is not affected by __declspec(align).

However, you should be able to use #pragma pack(16): http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/pragm_22.asp

 
codehunter

May 14, 2005, 09:39 AM

I had no time to respond any sooner.
I have already tried #pragma pack(16) and that also doesn't work.
I've made a separate test project and there I never get that error, so I think it's something in my engine that causes the wrong alignment. But I have no clue what that could be :(

 
River

May 14, 2005, 11:03 AM

Ha... Your message makes me think: do you have any #pragma pack around? Maybe it is overriding __declspec(align)? #pragma pack should always be used with extreme care and in regions as small as possible.

 
Chris

May 14, 2005, 11:26 AM

Doesn't #pragma pack do something else? I thought #pragma pack was for controlling the in-structure or in-class alignment of members, not the alignment of the struct or class on the stack?
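
A small sketch of what #pragma pack does to member offsets, assuming a compiler that supports the push/pop form of the pragma (MSVC, GCC and Clang do). The struct names and check_pack_layout are hypothetical. The point it tries to show is that packing only puts an upper limit on member alignment: pack(1) removes padding, while pack(16) does not raise anything to 16 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Default packing: `row` is padded out to the alignment of float.
struct DefaultNode {
    char  flag;
    float row[4];
};

// pack(1) caps member alignment at 1 byte, so the padding disappears.
#pragma pack(push, 1)
struct PackedNode {
    char  flag;
    float row[4];
};
#pragma pack(pop)

// pack(16) is only an upper limit: float members keep float alignment.
#pragma pack(push, 16)
struct Pack16Node {
    char  flag;
    float row[4];
};
#pragma pack(pop)

// Hypothetical self-check of the three layouts.
inline void check_pack_layout() {
    assert(offsetof(DefaultNode, row) == alignof(float));
    assert(offsetof(PackedNode, row) == 1);
    // Same layout and same type alignment as without any pragma.
    assert(offsetof(Pack16Node, row) == offsetof(DefaultNode, row));
    assert(alignof(Pack16Node) == alignof(DefaultNode));
}
```

So a stray pack(1) or pack(4) somewhere in a header can break 16-byte alignment, but pack(16) on its own cannot create it.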

 
Nick

May 14, 2005, 11:30 AM

Chris wrote: Doesn't #pragma pack do something else? I thought #pragma pack was for controlling the in-structure or in-class alignment of members, not the alignment of the struct or class on the stack?

Yes, so when his matrix member is 16-byte aligned within the class, and the class itself is 16-byte aligned with __declspec(align(16)) or _aligned_malloc, he should be able to access the matrix rows on 16-byte boundaries.
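
The two conditions (member offset a multiple of 16, and instance address a multiple of 16) can be checked at compile time and at run time. The sketch below assumes standard C++11, where an alignment requirement on a member propagates to the enclosing class; it makes no claim about what VS 2003 does. Row, Matrix, SceneNode and the helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One matrix row; alignas plays the role of __declspec(align(16)).
struct alignas(16) Row {
    float v[4];
};

struct Matrix {
    Row rows[4];
};

// Hypothetical scene node with a small member in front of the matrix.
struct SceneNode {
    int    id;
    Matrix world;
};

// Condition 1: the member sits at a multiple of 16 inside the class.
static_assert(offsetof(SceneNode, world) % 16 == 0, "matrix offset");
// Condition 2: the class itself must be placed on a 16-byte boundary.
static_assert(alignof(SceneNode) == 16, "node alignment");

// Row address = instance address + member offset + i * sizeof(Row).
inline bool all_rows_aligned(const SceneNode& node) {
    for (int i = 0; i < 4; ++i) {
        const void* row = &node.world.rows[i];
        if (reinterpret_cast<std::uintptr_t>(row) % 16 != 0) return false;
    }
    return true;
}

// Hypothetical self-check on a stack instance.
inline void check_local_node() {
    SceneNode node = {};
    assert(all_rows_aligned(node));
}
```

If either static_assert fails, or the run-time check fails for an instance that was allocated by hand, aligned loads and stores on the rows will fault.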

 
Christian Schüler

May 14, 2005, 05:13 PM


On a side note, the STL that comes with VC7 cannot handle aligned objects in containers.
Try creating a std::vector of your aligned vector. It gives compiler errors all over the place :-(

 
Nick

May 14, 2005, 05:43 PM

Why would you ever want to use STL and SIMD together? That's like ice and fire. :-P

STL containers use dynamic allocation, but not the aligned variants, so they can't guarantee 16-byte alignment. If you need some sort of dynamic buffers, you can always use _aligned_realloc.

Most SIMD processing is done on arrays allocated with _aligned_malloc, or the data is read and written using movups. Constants and such can be stored in static 16-byte aligned locations, using __declspec(align(16)).
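
For the movups route, here is a minimal sketch using the SSE intrinsics _mm_loadu_ps and _mm_storeu_ps, which are the unaligned counterparts of _mm_load_ps and _mm_store_ps. The function name add4_unaligned and the SKETCH_HAS_SSE macro are hypothetical, and the scalar fallback is only there so the sketch also builds on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3, with no
// alignment requirement on any of the three pointers.
inline void add4_unaligned(const float* a, const float* b, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if SKETCH_HAS_SSE
    __m128 va = _mm_loadu_ps(a);             // unaligned load (movups)
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // unaligned store
#else
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];                // scalar fallback
    }
#endif
}
```

This trades some speed for safety: it works on a plain float[4] wherever it lives, which is why it combines well with containers that give no alignment guarantee.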

 
dougc

May 14, 2005, 05:59 PM

We've got a few classes that use __m128s and __declspec(align(16)) (#pragma pack isn't needed). All of them work on the stack w/o any problems.

Hmm, do you have any virtual functions in your matrix class that might throw off the alignment? (Though it should just align properly anyway and waste 12 bytes.) Also, what kind of CPU do you have?

You might check your project settings and make sure you don't have anything odd turned on (the default ones should work fine though).

The regular new/delete should return pointers that are 16-byte aligned (at least they did for me, but you may need to make your own versions to make sure). You can also use __alignof(T) to make generic containers. Hopefully some of this will be standardized soon (?), as right now the different syntax per compiler is annoying.

Another thing to note is that while you can pass a __m128 by value, you can't pass a class w/ one inside it (at least not that I could find), even if you only have one (mVec4 class). This is a bit annoying b/c VC2k3 will actually pass them in the xmm# registers, which can save the pointer deref that you get when you pass by const ref/pointer, and some aliasing issues I believe. (Anyone else find a way around this?)

-doug

 
dougc

May 14, 2005, 06:02 PM

There are very useful reasons for this, including using maps/sets for geometry processing (which causes problems if you have a vert w/ a __m128 inside it). STL still pukes w/ this, so you're left w/ writing a custom allocator or your own containers. Actually, STL has some pass-by-value functions that are called, which keep any class w/ __m128s from working under .net 2k3.
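
For reference, this is roughly what the custom allocator route looks like with a C++11 standard library. It is a sketch under that assumption and says nothing about the VC7 STL discussed here, which has the additional pass-by-value problem. Aligned16Allocator, Vec4 and vector_storage_is_aligned are hypothetical names; the sketch uses _aligned_malloc on Windows and posix_memalign elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc / _aligned_free
#endif

// Hypothetical minimal allocator that hands out 16-byte aligned blocks.
template <typename T>
struct Aligned16Allocator {
    typedef T value_type;

    Aligned16Allocator() {}
    template <typename U>
    Aligned16Allocator(const Aligned16Allocator<U>&) {}

    T* allocate(std::size_t n) {
        void* p = 0;
#if defined(_WIN32)
        p = _aligned_malloc(n * sizeof(T), 16);
#else
        if (posix_memalign(&p, 16, n * sizeof(T)) != 0) p = 0;
#endif
        if (p == 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) {
#if defined(_WIN32)
        _aligned_free(p);
#else
        std::free(p);
#endif
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return false; }

// Stand-in for a vertex type that would hold a __m128.
struct Vec4 { float x, y, z, w; };

// Hypothetical check: the vector's buffer starts on a 16-byte boundary.
inline bool vector_storage_is_aligned() {
    std::vector<Vec4, Aligned16Allocator<Vec4> > verts(100);
    assert(!verts.empty());
    return reinterpret_cast<std::uintptr_t>(&verts[0]) % 16 == 0;
}
```

Because sizeof(Vec4) is 16, every element of the vector is then 16-byte aligned as well, not just the first one.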

 
Nick

May 14, 2005, 06:54 PM

Writing your own containers is the only serious option anyway. SIMD is used for performance reasons. That doesn't mix with the STL.

If performance isn't of highest priority, you can still use float[4], STL, and movups.

 
dougc

May 14, 2005, 10:13 PM

STL is fine for lots of things was my point. Exporters, lighting utils, editors, etc. So there's no reason the two can't go together. It's just annoying that they don't as I'm sure lots of people wanting to make SIMD classes would rather not write their own containers. I doubt we would have rewritten them if they worked.

 
codehunter

May 15, 2005, 07:18 AM

"do you have any pragma pack around?"
No, it's the only #pragma pack I use.

"you might check your project settings and make sure you don't have anything odd turned on "
I think I use the default options. I only enabled SSE and 16-byte alignment, but those things don't make any difference.

"Also, what kind of CPU do you have?"
I've got an AMD 2000 XP.

"Hum, do you have any virtual functions in your matrix class that might throw off the alignment?"
Nope, no virtual functions in my matrix class.





 
codehunter

May 15, 2005, 07:41 AM

Mmm, found something interesting. I've got a scenegraph library, and in it a scenenode class which has the matrix that goes wrong. Now that I've added that .lib file to the test project I made (where the matrices were working), the matrices there are also wrongly aligned. I still don't use any class from the scenegraph library, or have any include from that lib, in my test project.

 
This thread contains 23 messages.
 
 