Monday, April 11, 2011

Vector Cross Product using SSE Code

A common operation for two 3D vectors is the cross product:
|a.x|   |b.x|   | a.y * b.z - a.z * b.y |
|a.y| X |b.y| = | a.z * b.x - a.x * b.z |
|a.z|   |b.z|   | a.x * b.y - a.y * b.x |
Executing this operation using scalar instructions requires 6 multiplications and three subtractions. When using vectorized SSE code, the same operation can be performed using 2 multiplications, one subtraction and 4 shuffle operations:
inline __m128 CrossProduct(__m128 a, __m128 b)
  return _mm_sub_ps(
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))), 
    _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
Both registers a and b contain three floats (x, y and z) where the highest float of the 128-bit register is unused. The values can be loaded using the LoadFloat3 function or SSE set methods such as _mm_setr_ps(x, y, z, 0).


  1. Was looking for a simd cross product implementation and found your nice post. I eventually discovered that you can do it with only 3 shuffle instructions:

    inline __m128 CrossProduct( __m128 a, __m128 b )
    __m128 result = _mm_sub_ps(
    _mm_mul_ps(b, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1))),
    _mm_mul_ps(a, _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
    return _mm_shuffle_ps(result, result, _MM_SHUFFLE(3, 0, 2, 1 ));

    1. I don't know what the other anonymous guy who came on here was smoking, but his code does not produce a proper 3D cross product... like at all... Here's what the original code would have done with two vectors, A, and B:

      (A3, A0, A2, A1) * (B3, B1, B0, B2) - (A3, A1, A0, A2) * (B3, B0, B2, B1)

      Since the multiplications go straight through, slot by slot, the resulting two vector is:

      (0, (A0B1 - A1B0), (A2B0 - A0B2), (A1B2 - A2B1))

      Here's what the code the anonymous person posted does:

      (B0, B1, B2, B3) * (A3, A0, A2, A1) - (A0, A1, A2, A3) * (B3, B0, B2, B1)

      Which results in the vector:

      ((B2A2 - A2B2), (B0A3 - A0B3), (B3A1 - A3B1), (B1A0 - A1B0))

      These are clearly not the same (to anyone who understands anything about... well... numbers... or variables... so anyone who would ever be interested in this sort of thing, anyway :P)

    2. No, it is correct. You got your shuffles wrong: The parameters of the _MM_SHUFFLE macro, from LEFT to RIGHT, select the source words for the result's HIGHEST to LOWEST word.

      Since you use low to high notation (B0, B1, B2, B3), you also have to write (A1, A2, A0, A3) (i.e.: low word comes first, high word comes last)

      More info on the shuffle macro can be found here:

  2. This code produces (1, 0, 0) X (0, 1, 0) = (0, 0, -1) so we don't have a right-handed coordinate system.