Monday, April 11, 2011

Vector Cross Product using SSE Code

A common operation for two 3D vectors is the cross product:
|a.x|   |b.x|   | a.y * b.z - a.z * b.y |
|a.y| X |b.y| = | a.z * b.x - a.x * b.z |
|a.z|   |b.z|   | a.x * b.y - a.y * b.x |
Executing this operation using scalar instructions requires 6 multiplications and three subtractions. When using vectorized SSE code, the same operation can be performed using 2 multiplications, one subtraction and 4 shuffle operations:
inline __m128 CrossProduct(__m128 a, __m128 b)
{
  return _mm_sub_ps(
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))), 
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
  );
}
Both registers a and b contain three floats (x, y and z) where the highest float of the 128-bit register is unused. The values can be loaded using the LoadFloat3 function or SSE set methods such as _mm_setr_ps(x, y, z, 0).

No comments:

Post a Comment