Monday, April 11, 2011

Vector Cross Product using SSE Code

A common operation for two 3D vectors is the cross product:
|a.x|   |b.x|   | a.y * b.z - a.z * b.y |
|a.y| X |b.y| = | a.z * b.x - a.x * b.z |
|a.z|   |b.z|   | a.x * b.y - a.y * b.x |
Executing this operation using scalar instructions requires 6 multiplications and three subtractions. When using vectorized SSE code, the same operation can be performed using 2 multiplications, one subtraction and 4 shuffle operations:
inline __m128 CrossProduct(__m128 a, __m128 b)
  return _mm_sub_ps(
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))), 
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
Both registers a and b contain three floats (x, y and z) where the highest float of the 128-bit register is unused. The values can be loaded using the LoadFloat3 function or SSE set methods such as _mm_setr_ps(x, y, z, 0).


  1. Was looking for a simd cross product implementation and found your nice post. I eventually discovered that you can do it with only 3 shuffle instructions:

    inline __m128 CrossProduct( __m128 a, __m128 b )
    __m128 result = _mm_sub_ps(
    _mm_mul_ps(b, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1))),
    _mm_mul_ps(a, _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
    return _mm_shuffle_ps(result, result, _MM_SHUFFLE(3, 0, 2, 1 ));

    1. I don't know what the other anonymous guy who came on here was smoking, but his code does not produce a proper 3D cross product... like at all... Here's what the original code would have done with two vectors, A, and B:

      (A3, A0, A2, A1) * (B3, B1, B0, B2) - (A3, A1, A0, A2) * (B3, B0, B2, B1)

      Since the multiplications go straight through, slot by slot, the resulting two vector is:

      (0, (A0B1 - A1B0), (A2B0 - A0B2), (A1B2 - A2B1))

      Here's what the code the anonymous person posted does:

      (B0, B1, B2, B3) * (A3, A0, A2, A1) - (A0, A1, A2, A3) * (B3, B0, B2, B1)

      Which results in the vector:

      ((B2A2 - A2B2), (B0A3 - A0B3), (B3A1 - A3B1), (B1A0 - A1B0))

      These are clearly not the same (to anyone who understands anything about... well... numbers... or variables... so anyone who would ever be interested in this sort of thing, anyway :P)

  2. This code produces (1, 0, 0) X (0, 1, 0) = (0, 0, -1) so we don't have a right-handed coordinate system.