A common operation for two 3D vectors is the

cross product:

|a.x| |b.x| | a.y * b.z - a.z * b.y |
|a.y| X |b.y| = | a.z * b.x - a.x * b.z |
|a.z| |b.z| | a.x * b.y - a.y * b.x |

Executing this operation using scalar instructions requires 6 multiplications and three subtractions. When using vectorized SSE code, the same operation can be performed using 2 multiplications, one subtraction and 4 shuffle operations:

inline __m128 CrossProduct(__m128 a, __m128 b)
{
return _mm_sub_ps(
_mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))),
_mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
);
}

Both registers

`a`

and

`b`

contain three floats (x, y and z) where the highest float of the 128-bit register is unused. The values can be loaded using the

LoadFloat3 function or SSE set methods such as

`_mm_setr_ps(x, y, z, 0)`

.

Was looking for a simd cross product implementation and found your nice post. I eventually discovered that you can do it with only 3 shuffle instructions:

ReplyDeleteinline __m128 CrossProduct( __m128 a, __m128 b )

{

__m128 result = _mm_sub_ps(

_mm_mul_ps(b, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1))),

_mm_mul_ps(a, _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))

);

return _mm_shuffle_ps(result, result, _MM_SHUFFLE(3, 0, 2, 1 ));

}

I don't know what the other anonymous guy who came on here was smoking, but his code does not produce a proper 3D cross product... like at all... Here's what the original code would have done with two vectors, A, and B:

Delete(A3, A0, A2, A1) * (B3, B1, B0, B2) - (A3, A1, A0, A2) * (B3, B0, B2, B1)

Since the multiplications go straight through, slot by slot, the resulting two vector is:

(0, (A0B1 - A1B0), (A2B0 - A0B2), (A1B2 - A2B1))

Here's what the code the anonymous person posted does:

(B0, B1, B2, B3) * (A3, A0, A2, A1) - (A0, A1, A2, A3) * (B3, B0, B2, B1)

Which results in the vector:

((B2A2 - A2B2), (B0A3 - A0B3), (B3A1 - A3B1), (B1A0 - A1B0))

These are clearly not the same (to anyone who understands anything about... well... numbers... or variables... so anyone who would ever be interested in this sort of thing, anyway :P)

This code produces (1, 0, 0) X (0, 1, 0) = (0, 0, -1) so we don't have a right-handed coordinate system.

ReplyDelete