A common operation for two 3D vectors is the cross product:
| a.x |   | b.x |   | a.y * b.z - a.z * b.y |
| a.y | x | b.y | = | a.z * b.x - a.x * b.z |
| a.z |   | b.z |   | a.x * b.y - a.y * b.x |
Executing this operation using scalar instructions requires six multiplications and three subtractions. With vectorized SSE code, the same operation can be performed using two multiplications, one subtraction and four shuffle operations:
inline __m128 CrossProduct(__m128 a, __m128 b)
{
    // (a.y, a.z, a.x) * (b.z, b.x, b.y) - (a.z, a.x, a.y) * (b.y, b.z, b.x)
    return _mm_sub_ps(
        _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))),
        _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
    );
}
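The shuffle masks are probably the least obvious part of this function. _MM_SHUFFLE lists the source lane indices from the highest result lane down to the lowest, so _MM_SHUFFLE(3, 0, 2, 1) turns (x, y, z, w) into (y, z, x, w), and _MM_SHUFFLE(3, 1, 0, 2) turns it into (z, x, y, w). The following is only a small sketch to make that visible; the helper names ToArray, ShuffleYZX, ShuffleZXY and CheckShuffles are made up for this illustration:

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <array>
#include <cassert>

// Hypothetical helper: copies the four lanes into an array, lowest lane first.
inline std::array<float, 4> ToArray(__m128 v)
{
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), v);
    return out;
}

// Hypothetical helper: (x, y, z, w) -> (y, z, x, w)
inline __m128 ShuffleYZX(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 0, 2, 1));
}

// Hypothetical helper: (x, y, z, w) -> (z, x, y, w)
inline __m128 ShuffleZXY(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 1, 0, 2));
}

// Checks both permutations on the register (1, 2, 3, 4).
inline void CheckShuffles()
{
    const __m128 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);

    const std::array<float, 4> yzx = ToArray(ShuffleYZX(v));
    assert(yzx[0] == 2.0f && yzx[1] == 3.0f && yzx[2] == 1.0f && yzx[3] == 4.0f);

    const std::array<float, 4> zxy = ToArray(ShuffleZXY(v));
    assert(zxy[0] == 3.0f && zxy[1] == 1.0f && zxy[2] == 2.0f && zxy[3] == 4.0f);
}
```

In both masks the leading 3 keeps the unused fourth lane in place, so only x, y and z are rotated.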
Both registers a and b contain three floats (x, y and z); the highest float of each 128-bit register is unused. The values can be loaded using the LoadFloat3 function or SSE set intrinsics such as _mm_setr_ps(x, y, z, 0).
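As a rough usage sketch, the function can be wrapped so that it takes and returns plain floats, and then compared against the scalar formula. The Float3 struct and the CrossProductScalar and CrossProductSSE wrappers below are assumptions made for this example only; loading goes through _mm_setr_ps rather than LoadFloat3, and the result is read back with _mm_storeu_ps:

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>

// The SSE cross product from above, repeated so the example compiles on its own.
inline __m128 CrossProduct(__m128 a, __m128 b)
{
    return _mm_sub_ps(
        _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2))),
        _mm_mul_ps(_mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)), _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)))
    );
}

// Hypothetical plain vector type, used only for this example.
struct Float3 { float x, y, z; };

// Scalar reference: six multiplications and three subtractions.
inline Float3 CrossProductScalar(Float3 a, Float3 b)
{
    return Float3{ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

// Hypothetical wrapper: load into registers, compute, store the lower three lanes.
inline Float3 CrossProductSSE(Float3 a, Float3 b)
{
    const __m128 va = _mm_setr_ps(a.x, a.y, a.z, 0.0f);  // fourth lane unused
    const __m128 vb = _mm_setr_ps(b.x, b.y, b.z, 0.0f);

    float out[4];
    _mm_storeu_ps(out, CrossProduct(va, vb));
    return Float3{ out[0], out[1], out[2] };
}

// Compares the SSE result with the scalar reference for one pair of vectors.
inline void CheckCrossProduct()
{
    const Float3 a{ 1.0f, 2.0f, 3.0f };
    const Float3 b{ 4.0f, 5.0f, 6.0f };

    const Float3 simd = CrossProductSSE(a, b);
    const Float3 ref  = CrossProductScalar(a, b);  // (-3, 6, -3)

    assert(simd.x == ref.x && simd.y == ref.y && simd.z == ref.z);
}
```

Converting to and from registers for every call would likely cancel out most of the benefit, so in practice the vectors would stay in __m128 form across several operations.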