FastC++: Coding Cpp Efficiently: Vector Cross Product using SSE Code

Monday, April 11, 2011

Vector Cross Product using SSE Code

A common operation for two 3D vectors is the cross product:

|a.x|   |b.x|   | a.y * b.z - a.z * b.y |
|a.y| X |b.y| = | a.z * b.x - a.x * b.z |
|a.z|   |b.z|   | a.x * b.y - a.y * b.x |

Executing this operation using scalar instructions requires six multiplications and three subtractions. When using vectorized SSE code, the same operation can be performed using two packed multiplications, one packed subtraction and four shuffle operations:

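#include <xmmintrin.h>  // SSE intrinsics

// Computes a x b in the x, y, z lanes; the highest lane carries no meaning.
inline __m128 CrossProduct(__m128 a, __m128 b)
{
  return _mm_sub_ps(
    // (a.y, a.z, a.x) * (b.z, b.x, b.y)
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))),
    // (a.z, a.x, a.y) * (b.y, b.z, b.x)
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
  );
}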

Both registers a and b hold three floats (x, y and z) in their lower three lanes; the highest float of each 128-bit register is unused. The values can be loaded using the LoadFloat3 function or SSE set intrinsics such as _mm_setr_ps(x, y, z, 0).
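To make the loading step concrete, here is a minimal sketch of a round trip from plain floats through the SSE routine and back. The Float3 struct and the helper names CrossScalar and CrossSse are made up for this illustration and are not part of the post; the sketch only assumes the _mm_setr_ps loading approach mentioned above and uses _mm_storeu_ps to read the result back.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical plain 3D vector type, used only for this illustration.
struct Float3 { float x, y, z; };

// The four-shuffle cross product from the post, repeated so the sketch compiles on its own.
inline __m128 CrossProduct(__m128 a, __m128 b)
{
  return _mm_sub_ps(
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))),
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
  );
}

// Scalar reference: six multiplications and three subtractions.
inline Float3 CrossScalar(Float3 a, Float3 b)
{
  return { a.y * b.z - a.z * b.y,
           a.z * b.x - a.x * b.z,
           a.x * b.y - a.y * b.x };
}

// Load both inputs with _mm_setr_ps (highest lane set to 0), run the SSE
// version, and copy the lower three lanes of the result back out.
inline Float3 CrossSse(Float3 a, Float3 b)
{
  __m128 va = _mm_setr_ps(a.x, a.y, a.z, 0.0f);
  __m128 vb = _mm_setr_ps(b.x, b.y, b.z, 0.0f);
  float out[4];
  _mm_storeu_ps(out, CrossProduct(va, vb));
  return { out[0], out[1], out[2] };
}
```

Calling CrossSse({1, 0, 0}, {0, 1, 0}) should give the same components as CrossScalar for the same inputs, which is a quick way to sanity-check the shuffle masks. In real code the vectors would normally stay in __m128 registers between operations; converting on every call, as this sketch does, would likely cancel out most of the benefit.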

5 comments:

Anonymous, August 12, 2013 at 1:02 PM
I was looking for a SIMD cross product implementation and found your nice post. I eventually discovered that you can do it with only 3 shuffle instructions:

inline __m128 CrossProduct(__m128 a, __m128 b)
{
  __m128 result = _mm_sub_ps(
    _mm_mul_ps(b, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1))),
    _mm_mul_ps(a, _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
  );
  return _mm_shuffle_ps(result, result, _MM_SHUFFLE(3, 0, 2, 1));
}
Waldemar Bancewicz, November 11, 2013 at 9:53 PM
This code produces (1, 0, 0) X (0, 1, 0) = (0, 0, -1), so it does not follow the right-handed convention.
Anonymous, July 20, 2014 at 6:26 PM
@Waldemar just swap the two operands.
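
Following the suggestion to swap the operands, a right-handed variant of the three-shuffle version might look like the sketch below. The name CrossProduct3Shuffle and the temporaries a_yzx, b_yzx and c_zxy are made up for this illustration; the only change from the commenter's code is the order of the two products inside the subtraction.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Three-shuffle variant with the two products swapped so the result is a x b.
inline __m128 CrossProduct3Shuffle(__m128 a, __m128 b)
{
  __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));  // (a.y, a.z, a.x)
  __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));  // (b.y, b.z, b.x)
  // Lanes now hold (a.x*b.y - a.y*b.x, a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z),
  // which is the cross product c = a x b in (c.z, c.x, c.y) order.
  __m128 c_zxy = _mm_sub_ps(_mm_mul_ps(a, b_yzx), _mm_mul_ps(b, a_yzx));
  // Rotate the lanes back into (c.x, c.y, c.z) order.
  return _mm_shuffle_ps(c_zxy, c_zxy, _MM_SHUFFLE(3, 0, 2, 1));
}
```

With this ordering, (1, 0, 0) X (0, 1, 0) comes out as (0, 0, 1), matching the formula at the top of the post. Whether saving one shuffle is measurably faster depends on the target CPU and the surrounding code, so it is worth profiling before switching.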