SSE2 instructions ?

Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
devil20

March 24, 2005, 05:29 AM

Hi
I am not very experienced with SSE2 instructions, but I am trying to learn how to use them, so I have some questions.

1) In which cases will I get a Previlledge Instruction error?
2) If there is a divide by zero or a multiply by zero, will I get a Previlledge Instruction error?
3) If a constructor does not initialize a value, so it holds garbage, and I then try to assign it, what is the result?

I made a simple vector3d class that uses SSE2 instructions, but when I assign a value to x, y, z I get a Previlledge Instruction error. Any idea why?

If you want to see my code, please let me know and I'll post it.

devil20

 
Navreet Gill

March 24, 2005, 07:35 AM

Post code

 
Chris

March 24, 2005, 10:20 AM

1) I'm not picky and make spelling mistakes myself, but it's really "Privileged Instruction". I saw you repeating this typo. Anyway, in SSE code you typically get this when an instruction accesses data that is not properly aligned.

2) Divide by zero should trigger the appropriate exception (there's an extra one for this), not Privileged Instruction Faults.
Multiply by zero is perfectly valid, what did you smoke ?

3) Constructors have nothing to do with SSE. Any variable you do not explicitly initialize is filled with bogus values, which you cannot predict. When you use such a value as a floating-point value (no matter whether that is SSE or the ordinary FPU), it might contain a totally invalid representation, and you might get any exception (div by zero, denormal, overflow, underflow, etc.) depending on what the CPU thinks the bogus value means.
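
Point 2) can be tried out without any SSE code. The sketch below is only an illustration and assumes the default floating-point environment, in which floating-point exceptions are masked (the usual setting on mainstream x86 toolchains). The helper name is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: divides through volatile operands so the compiler
// cannot fold the division away at compile time.
inline float divide_at_runtime(float num, float den) {
    volatile float n = num;
    volatile float d = den;
    return n / d;
}

// Hypothetical helper: same idea for multiplication.
inline float multiply_at_runtime(float a, float b) {
    volatile float x = a;
    volatile float y = b;
    return x * y;
}
```

With masked exceptions, divide_at_runtime(1.0f, 0.0f) returns infinity and divide_at_runtime(0.0f, 0.0f) returns a NaN, and neither call faults. multiply_at_runtime(5.0f, 0.0f) is plain arithmetic and returns 0. A trap only happens if the program has unmasked the corresponding exception.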


All this is explained in great detail in the Intel developer documents, which are freely available; search the www.intel.com site.

 
devil20

March 24, 2005, 10:11 PM

Hi
Yeah, it's a spelling mistake :) sorry about that.
OK, you're right. I read some documents on the Intel site and I think the problem is that my data is not aligned. But I don't know much about aligned data. Can you explain it to me with a small example?

Here's the code I am trying to use. It is not 100% tested, because the errors still keep me from debugging it. If you find any errors, please let me know.

    #include <cassert>
    #include <xmmintrin.h>  // __m128 and the _mm_* intrinsics
    #include <fvec.h>       // F32vec4 / F32vec1 vector classes

    class SPVector3
    {
    public:

        // vec must sit at a 16-byte aligned address; x, y, z alias its
        // three low lanes and the fourth lane is unused.
        union {
            __m128 vec;
            struct {
                float   x,y,z;
            };
        };

        //! default constructor
        SPVector3();

        //! construct from a raw SSE value
        SPVector3(const __m128 &m);

        SPVector3(const F32vec4 &m);

        SPVector3(const float x, const float y, const float z);

        //! load x, y, z from a float[3] (no alignment needed); the fourth lane becomes 0
        SPVector3(const float *arr) : vec(_mm_loadl_pi(_mm_movelh_ps(_mm_load_ss(arr+2),_mm_load_ss(arr+2)), (const __m64*)arr)) {}

        operator __m128() const { return vec; }

        operator F32vec4() const { return vec; }

        SPVector3& operator = (const F32vec4 &a) { vec = a; return *this; }
        SPVector3& operator = (const __m128 &a) { vec = a; return *this; }

        //! operators

        // Accessing elements:
        float& operator () (int i) {
            assert((0<=i) && (i<=2));
            return *(((float *)&vec) + i);
        }
        float& operator [] (int i) {
            assert((0<=i) && (i<=2));
            return *(((float *)&vec) + i);
        }
        // const access returns a const reference so a const vector stays read-only
        const float& operator [] (int i) const {
            assert((0<=i) && (i<=2));
            return *(((const float *)&vec) + i);
        }

        SPVector3& operator = (const SPVector3& a) { vec = a.vec; return *this; }

        //! addition
        SPVector3  operator + (const SPVector3& other) const ;

        //! addition and assignment
        SPVector3& operator += (const SPVector3& B);

        //! subtraction and assignment
        SPVector3& operator -= (const SPVector3& B) ;

        //! subtraction
        SPVector3  operator - (const SPVector3& other) const;

        //! multiplication of the vector by a float
        SPVector3  operator * (const float s) const ;

        //! component-wise multiplication with another vector
        SPVector3 operator * (const SPVector3& A) const ;

        //! Vector multiplication by float. [Vec] = [Vec]*s.
        SPVector3& operator *= (const float s);

        //! component-wise multiplication and assignment
        SPVector3& operator *= (const SPVector3& A);

        //! component-wise division
        SPVector3 operator / (const SPVector3& other) const;

        //! component-wise division and assignment
        SPVector3& operator /= (const SPVector3& other);

        //! division by float
        SPVector3 operator / (const float v) const ;

        //! division by float and assignment
        SPVector3& operator /= (const float v);

        //! cross product of two vectors
        SPVector3 cross(const SPVector3& p) const;

        //! Returns the dot product with another vector.
        float dot(const SPVector3& other) const;

        //! Returns length of the vector.
        //! we use a 32-bit float (the f32 typedef) because of speed
        f32 getLength() const ;

        //! Returns squared length of the vector.
        //! This is useful because it is much faster than
        //! getLength().
        f32 getLengthSQ() const ;

        //! normalizes this vector in place and returns a copy of it
        SPVector3 normalize();

    }; //! end of class


    //! default constructor
    inline SPVector3::SPVector3()
        : vec(_mm_setzero_ps())   // zero all four lanes, including the unused one
    {
    }

    //! construct from a raw SSE value
    inline SPVector3::SPVector3(const __m128 &m)
        : vec(m)
    {
    }

    inline SPVector3::SPVector3(const F32vec4 &m)
        : vec(m)
    {
    }

    // F32vec4 takes its lanes from highest to lowest, so x lands in lane 0
    inline SPVector3::SPVector3(const float x, const float y, const float z)
        : vec(F32vec4(0.0f,z,y,x))
    {
    }

    //! addition
    inline SPVector3  SPVector3::operator + (const SPVector3& other) const
    {
        return SPVector3(_mm_add_ps(vec,other.vec));
    }

    //! addition and assignment
    inline SPVector3& SPVector3::operator += (const SPVector3& B)
    {
        vec = _mm_add_ps(vec, B.vec);
        return *this;
    }

    //! subtraction
    inline SPVector3  SPVector3::operator - (const SPVector3& other) const
    {
        return SPVector3(_mm_sub_ps(vec,other.vec));
    }

    //! subtraction and assignment
    inline SPVector3& SPVector3::operator -= (const SPVector3& B)
    {
        vec = _mm_sub_ps(vec, B.vec);
        return *this;
    }

    //! multiplication of the vector by a float
    inline SPVector3  SPVector3::operator * (const float s) const
    {
        return SPVector3(x*s,y*s,z*s);
    }

    //! component-wise multiplication with another vector
    inline SPVector3 SPVector3::operator * (const SPVector3& A) const
    {
        return SPVector3(_mm_mul_ps(vec,A.vec));
    }

    //! component-wise multiplication and assignment
    inline SPVector3& SPVector3::operator *= (const SPVector3& A)
    {
        vec = _mm_mul_ps(vec, A.vec);
        return *this;
    }

    //! Vector multiplication by float. [Vec] = [Vec]*s.
    inline SPVector3& SPVector3::operator *= (const float s)
    {
        vec = vec * F32vec4(s);
        return *this;
    }

    //! component-wise division
    //! (the unused fourth lane is divided too: 0/0 gives NaN when both are zero)
    inline SPVector3 SPVector3::operator / (const SPVector3& other) const
    {
        return SPVector3(_mm_div_ps(vec,other.vec));
    }

    //! division by float
    inline SPVector3 SPVector3::operator / (const float v) const
    {
        return SPVector3(x/v,y/v,z/v);
    }

    //! component-wise division and assignment
    inline  SPVector3& SPVector3::operator /= (const SPVector3& other)
    {
        vec = _mm_div_ps(vec,other.vec);
        return *this;
    }

    //! division by float and assignment
    inline SPVector3& SPVector3::operator /= (const float v)
    {
        float i = 1.0f/v;
        x *= i;
        y *= i;
        z *= i;
        return *this;
    }

    //! cross product of two vectors
    inline SPVector3 SPVector3::cross(const SPVector3& p) const
    {
        F32vec4 l1, l2, m1, m2;
        l1 = _mm_shuffle_ps(vec,vec, _MM_SHUFFLE(3,1,0,2));       // (z, x, y)
        l2 = _mm_shuffle_ps(p.vec,p.vec, _MM_SHUFFLE(3,0,2,1));   // (p.y, p.z, p.x)
        m2 = l1*l2;
        l1 = _mm_shuffle_ps(vec,vec, _MM_SHUFFLE(3,0,2,1));       // (y, z, x)
        l2 = _mm_shuffle_ps(p.vec,p.vec, _MM_SHUFFLE(3,1,0,2));   // (p.z, p.x, p.y)
        m1 = l1*l2;
        return m1-m2;
    }

    //! Returns the dot product with another vector.
    inline float SPVector3::dot(const SPVector3& other) const
    {
        F32vec4 r = _mm_mul_ps(vec,other.vec);
        // low lane = r[1] + (r[2] + r[0]); the fourth lane is left out
        F32vec1 t = _mm_add_ss(_mm_shuffle_ps(r,r,1), _mm_add_ps(_mm_movehl_ps(r,r),r));
        return *(float *)&t;
    }

    //! Returns length of the vector.
    //! we use a 32-bit float (the f32 typedef) because of speed
    inline f32 SPVector3::getLength() const
    {
        F32vec4 r = _mm_mul_ps(vec,vec);
        F32vec1 t = _mm_add_ss(_mm_shuffle_ps(r,r,1), _mm_add_ss(_mm_movehl_ps(r,r),r));
        t = sqrt(t);
        return *(float *)&t;
    }

    //! Returns squared length of the vector.
    //! This is useful because it is much faster than
    //! getLength().
    inline f32 SPVector3::getLengthSQ() const
    {
        return x*x + y*y + z*z;
    }

    //! normalizes this vector in place and returns a copy of it
    inline SPVector3 SPVector3::normalize()
    {
        F32vec4 r = _mm_mul_ps(vec,vec);
        F32vec1 t = _mm_add_ss(_mm_shuffle_ps(r,r,1), _mm_add_ss(_mm_movehl_ps(r,r),r));
        t = rsqrt_nr(t);                                    // 1/length
        vec = _mm_mul_ps(vec, _mm_shuffle_ps(t,t,0x00));    // broadcast and scale
        return *this;
    }

    // the matching #pragma pack(push) is not part of the posted excerpt
    #pragma pack(pop)



I am getting an error in the assignment operator when I try to push_back some vertices. I think it's because of data alignment. Can anyone suggest a good beginner's tutorial on aligned data and SSE2 instructions?

devil20
 
Chris

March 25, 2005, 06:52 AM

There's not much to be said about alignment.
When one says "this requires X to have N-byte alignment" one means that X's (byte) address is a multiple of N.

As a rule of thumb, data types such as int, float, double, etc. should always have an alignment equal to their size, e.g. ints aligned to 4-byte boundaries, doubles to 8-byte boundaries.

So if a double X has address 0x0080010c, that's unaligned, because it's a multiple of 4 only. Of course totally odd addresses like 0x0080010d are also unaligned. 0x00800110 would be aligned, or 0x00800108.
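
The rule "the address is a multiple of N" can be checked directly in code. This is a small sketch with made-up helper names, not something from the thread; it just applies the definition to the addresses mentioned here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `address` is a multiple of `alignment`.
inline bool address_is_aligned(std::uintptr_t address, std::size_t alignment) {
    return address % alignment == 0;
}

// Hypothetical helper: the same check for a real pointer.
inline bool pointer_is_aligned(const void* p, std::size_t alignment) {
    return address_is_aligned(reinterpret_cast<std::uintptr_t>(p), alignment);
}
```

For a double (N = 8), address_is_aligned(0x0080010c, 8) is false while address_is_aligned(0x00800110, 8) and address_is_aligned(0x00800108, 8) are true. For a 16-byte type, only 0x00800110 of those three passes.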

Normally on the x86 platform one sticks to that rule for performance reasons only, the code would work just as well without alignment. There are other architectures that always require aligned data accesses.

But in the case of SSE, things are different. The aligned SSE loads and stores, and SSE instructions that take a memory operand directly, fault on nonaligned addresses; only the explicit unaligned moves such as movups accept them. That was a new thing to care about for some of us.

In your case, the __m128 needs 128-bit (or 16-byte) alignment, so addresses need to be a multiple of 16. You can check the last hexadecimal digit: it has to be 0.
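
When 16-byte alignment cannot be guaranteed, SSE also has explicit unaligned load and store instructions (movups, exposed as the _mm_loadu_ps and _mm_storeu_ps intrinsics). A minimal sketch, with a made-up function name, of adding two groups of four floats at arbitrary addresses:

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x64 only)

// Adds two groups of four floats that may live at any address.
// _mm_loadu_ps / _mm_storeu_ps are the unaligned counterparts of
// _mm_load_ps / _mm_store_ps, which require 16-byte aligned pointers.
inline void add4_any_address(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

This avoids the fault on unaligned data but does not make the data aligned; code that keeps __m128 members inside a class still needs the class instances themselves to be aligned.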

---

Where exactly is push_back in that code ?

Do you mean, you use std::vector, and push SPVector3s onto it ? Then you need to provide a special allocator that makes sure all elements are properly aligned. Ordinary new or malloc won't take care of that for you; they usually guarantee only 4- or 8-byte alignment.
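
A "special allocator" for std::vector could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: Vec4 is a stand-in for SPVector3, Aligned16Allocator is a made-up name, and the allocator is written against the C++11 allocator model. It over-allocates with malloc and rounds the address up to a multiple of 16, so it does not depend on any compiler-specific aligned allocation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-in for SPVector3: any type that needs 16-byte alignment.
struct alignas(16) Vec4 { float x, y, z, w; };

// Minimal allocator sketch (hypothetical name). It over-allocates, rounds the
// address up to a multiple of 16 and remembers the pointer malloc returned
// just below the aligned block so deallocate can free it.
template <class T>
struct Aligned16Allocator {
    typedef T value_type;

    Aligned16Allocator() {}
    template <class U> Aligned16Allocator(const Aligned16Allocator<U>&) {}

    T* allocate(std::size_t n) {
        const std::size_t align = 16;
        void* raw = std::malloc(n * sizeof(T) + align + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;   // stash for deallocate
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) {
        std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

template <class T, class U>
bool operator==(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return false; }

// Pushes `count` elements and reports whether each sits on a 16-byte boundary.
inline bool all_elements_aligned(std::size_t count) {
    std::vector<Vec4, Aligned16Allocator<Vec4> > v;
    for (std::size_t i = 0; i < count; ++i) {
        Vec4 e = { float(i), 0.0f, 0.0f, 0.0f };
        v.push_back(e);
    }
    for (std::size_t i = 0; i < v.size(); ++i)
        if (reinterpret_cast<std::uintptr_t>(&v[i]) % 16 != 0) return false;
    return true;
}
```

Because sizeof(Vec4) is a multiple of 16, aligning the start of the buffer is enough to align every element, including after the vector reallocates during push_back.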

 
devil20

March 25, 2005, 08:13 AM

OK, I understand, and I'm getting closer to the problem. I think it's a problem with memory allocation by operator new.

Well, I'm doing this:

    // mgr is a SceneManager*, pos and lookat are SPVector3
    CameraNode* cam = new CameraNode(mgr, pos, lookat);

The CameraNode class has some SPVector3 member variables in its public section, and their SPVector3 constructor runs whenever a CameraNode is created. The constructor succeeds 8 times, and from the 9th time on it gives me a Privileged Instruction error.

So should I use _mm_malloc instead of the new operator? because if i am right new operator has 8 byte aligned data so from 9th time i am getting error Am i right ?? (If I'm wrong, please forgive me; I'm not very experienced, I'm just learning.)

devil20
 
Chris

March 25, 2005, 10:04 AM

> because if i am right new operator has 8 byte aligned data so from 9th time i
> am getting error Am i right ??

That statement makes no sense. You need to look at the CameraNode structure. How large is it ? That's the amount of memory new allocates, not only a single byte.

Also, subsequent calls of new do not guarantee that consecutive memory locations are allocated; in fact the returned locations might look really random. It depends on the memory management strategy that is used internally.

Alignment has nothing to do with the number of allocations you do.

But let's return to the problem:
Say CameraNode looks like this:

    class CameraNode
    {
    public:
      int m_foo;
      double m_bar;
      SPVector3 m_vec;
    };

Now it's *at least* 4+8+16 = 28 bytes large (the compiler might not pack the fields tightly; it may insert padding bytes between them). You're facing three problems:

1. You don't control exactly where the fields land in memory.
Fields in the same access section are stored in declaration order, but the compiler decides how much padding goes between them, and it may reorder fields across access specifiers.

2. You must guarantee that m_vec is properly aligned WITHIN CameraNode, i.e. its offset within the class must be a multiple of 16.

3. Even if you could guarantee that m_vec starts at an offset which is a multiple of 16, do you know that the CameraNode instance itself does ?
For example, let's say m_vec is at offset 16 within the CameraNode, but the instance is allocated at an address which is not a multiple of 16. Then m_vec isn't aligned either.



1. and 2. can be solved using compiler-specifics, for example __declspec(align(16)) in Microsoft C++.

3. can be solved using special memory allocation routines, or by overloading the respective operators new and new[]. Microsoft C++ provides _aligned_malloc.
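
Points 1. and 2. can be checked at compile time. The sketch below uses alignas(16), the standard C++11 way to request the alignment that __declspec(align(16)) requests in Microsoft C++. Vec4 and CameraNodeLayout are stand-ins invented for this example; they only mirror the shape of the CameraNode above.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// Stand-in for SPVector3; alignas(16) plays the role of __declspec(align(16)).
struct alignas(16) Vec4 { float x, y, z, w; };

// Hypothetical mirror of the CameraNode shown above.
struct CameraNodeLayout {
    int    m_foo;
    double m_bar;
    Vec4   m_vec;
};

// The member's alignment requirement propagates to the enclosing class ...
static_assert(alignof(CameraNodeLayout) % 16 == 0, "class must be 16-byte aligned");
// ... so the compiler pads in front of m_vec until its offset is a multiple of 16 ...
static_assert(offsetof(CameraNodeLayout, m_vec) % 16 == 0, "m_vec offset must be a multiple of 16");
// ... and pads the end so that arrays of the class stay aligned too.
static_assert(sizeof(CameraNodeLayout) % 16 == 0, "size must be a multiple of 16");
```

These checks only cover the layout inside the class. Whether a heap-allocated instance starts on a 16-byte boundary is still up to the allocation routine, which is point 3.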



So basically you aren't taking care of any of this yet. Your public SPVector3 field may have an unaligned offset, and your operator new may return an unaligned address.

_mm_malloc solves the alignment issue, but you cannot replace an operator new by a call to any malloc function, because mallocs never call the constructor.
You need to overload CameraNode's operators new and new[].
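
Overloading the operators inside the class could look like the sketch below. It is a sketch under assumptions: alloc16 and free16 are made-up portable stand-ins for _aligned_malloc(bytes, 16) and _aligned_free, Vec4 stands in for SPVector3, and CameraNodeSketch is a hypothetical class, not the real CameraNode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for _aligned_malloc(bytes, 16): over-allocate, round up
// to a multiple of 16, keep the pointer malloc returned just below the block.
inline void* alloc16(std::size_t bytes) {
    const std::size_t align = 16;
    void* raw = std::malloc(bytes + align + sizeof(void*));
    if (!raw) throw std::bad_alloc();
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical stand-in for _aligned_free.
inline void free16(void* p) {
    if (p) std::free(static_cast<void**>(p)[-1]);
}

struct alignas(16) Vec4 { float x, y, z, w; };   // stand-in for SPVector3

class CameraNodeSketch {
public:
    int    m_foo;
    double m_bar;
    Vec4   m_vec;

    CameraNodeSketch() : m_foo(0), m_bar(0.0), m_vec() {}

    // The new-expression calls these for the memory and then still runs the
    // constructor, which a bare malloc-style call would not do.
    static void* operator new(std::size_t bytes)   { return alloc16(bytes); }
    static void  operator delete(void* p)          { free16(p); }
    static void* operator new[](std::size_t bytes) { return alloc16(bytes); }
    static void  operator delete[](void* p)        { free16(p); }
};

// Allocates one instance with new and reports whether m_vec is 16-byte aligned.
inline bool heap_camera_is_aligned() {
    CameraNodeSketch* cam = new CameraNodeSketch();
    bool ok = reinterpret_cast<std::uintptr_t>(&cam->m_vec) % 16 == 0;
    delete cam;
    return ok;
}
```

Every operator new needs its matching operator delete, because memory obtained from an aligned allocation routine must not be handed to the ordinary free.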

 
devil20

March 31, 2005, 10:46 PM

Hi Chris

Sorry for the delay; I wasn't able to read your reply earlier. OK, back to the original problem. As I said, it's a problem with the new operator. When I allocate with the new operator, the data is not aligned on a 16-byte boundary, and I don't know why.

But I tried my own new operator with _aligned_malloc, and that works. However, I have a lot of classes that use the vector and the new operator, so does each class need its own new operator?

    // _aligned_malloc / _aligned_free come from <malloc.h> in Microsoft C++.
    void * operator new (size_t cb)
    {
      // __m128 members need a 16-byte aligned block
      return _aligned_malloc(cb, 16);
    }

    // memory from _aligned_malloc must be released with _aligned_free
    void operator delete (void* p) { _aligned_free(p); }

This operator fixes my problem, but then the problem moves to another class, like

    CSceneManger* Manger = new CSceneManger(driver);

A lot of classes use the new operator like this, so do I have to write such an operator for each and every class?
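
One possible way to avoid repeating the operators, sketched here under assumptions rather than taken from the thread, is to put them into a small base class and let every class that contains a vector inherit from it. Aligned16Base, Vec4, SceneManagerSketch and CameraNodeSketch are all made-up names, and the allocation code is a portable stand-in for _aligned_malloc(bytes, 16) and _aligned_free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical base class: derived classes inherit 16-byte aligned new/delete.
struct Aligned16Base {
    static void* operator new(std::size_t bytes) {
        const std::size_t align = 16;
        // Over-allocate, round up to a multiple of 16 and keep the pointer
        // malloc returned just below the aligned block.
        void* raw = std::malloc(bytes + align + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }
    static void operator delete(void* p) {
        if (p) std::free(static_cast<void**>(p)[-1]);
    }
};

struct alignas(16) Vec4 { float x, y, z, w; };   // stand-in for SPVector3

// Two unrelated classes pick up the aligned operators through inheritance.
class SceneManagerSketch : public Aligned16Base {
public:
    Vec4 m_origin;
    int  m_id;
};

class CameraNodeSketch : public Aligned16Base {
public:
    int    m_foo;
    double m_bar;
    Vec4   m_vec;
};

// Allocates one T with new and reports whether it starts on a 16-byte boundary.
template <class T>
bool heap_object_is_aligned() {
    T* obj = new T();
    bool ok = reinterpret_cast<std::uintptr_t>(obj) % 16 == 0;
    delete obj;
    return ok;
}
```

The operators are written once and each class only adds ": public Aligned16Base". The sketch covers single objects only; arrays created with new[] would need operator new[] and operator delete[] in the base class as well.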

devil20

 
This thread contains 8 messages.
 
 