The IEEE 754 single-precision floating-point format defines the memory layout used by the C++ float datatype on virtually all platforms. It consists of a 1-bit sign, an 8-bit exponent and 23 bits that store the fractional part of the value.
float x = [sign (1 bit) | exponent (8 bits) | fraction (23 bits)]
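A minimal sketch of reading those three fields, assuming float is a 32-bit IEEE 754 binary32 value: copy the raw bits into a 32-bit integer with std::memcpy (which avoids aliasing problems), then shift and mask. FloatBits and decompose are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE 754 binary32 value.
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Hypothetical helper type holding the three bit fields of a float.
struct FloatBits {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits
};

// Hypothetical helper: split a float into sign, exponent and fraction.
FloatBits decompose(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);  // copy the raw bits without type punning
    return FloatBits{
        u >> 31,            // top bit
        (u >> 23) & 0xFFu,  // next 8 bits
        u & 0x7FFFFFu       // low 23 bits
    };
}
```

For example, -2.5f is -1.25 × 2^1, so decompose(-2.5f) should give sign 1, biased exponent 128 (1 + 127) and fraction 0x200000 (0.25 × 2^23).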
We can use this knowledge of the memory layout to change the sign of floating-point values without any floating-point arithmetic.
For example, calculating the absolute value of a floating-point number is equivalent to setting its sign bit to zero. With SSE we can do this for four float values simultaneously by using a bit mask and bitwise logical operations:
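#include <emmintrin.h> // SSE2: _mm_set1_epi32, _mm_castsi128_ps
static const __m128 SIGNMASK =
    _mm_castsi128_ps(_mm_set1_epi32(0x80000000)); // only the sign bit is set in each lane
__m128 val = /* some value */;
__m128 absval = _mm_andnot_ps(SIGNMASK, val); // absval = abs(val), computed as ~SIGNMASK & val
//...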
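To see the absolute-value trick end to end, here is a sketch that wraps it in a hypothetical helper abs4 operating on a plain float[4] array, assuming an x86 target with SSE2. It uses unaligned loads and stores so the arrays need no 16-byte alignment. Note that _mm_andnot_ps(a, b) computes (~a) & b, so the mask has to be the first argument.

```cpp
#include <emmintrin.h>  // SSE2; also provides the SSE float intrinsics

// Hypothetical helper: absolute value of four floats by clearing each sign bit.
void abs4(const float in[4], float out[4]) {
    // 0x80000000 in every lane: only the sign bit set.
    const __m128 signmask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    __m128 val = _mm_loadu_ps(in);                 // unaligned load of 4 floats
    __m128 absval = _mm_andnot_ps(signmask, val);  // (~signmask) & val
    _mm_storeu_ps(out, absval);                    // unaligned store
}
```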
In a similar way, we can negate floating-point numbers by simply flipping their highest (sign) bit:
__m128 val = /* some value */;
__m128 minusval = _mm_xor_ps(val, SIGNMASK); // minusval = -val
//...
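A sketch of the negation in the same style, assuming x86 with SSE2; neg4 and neg4_swapped are hypothetical names for illustration. Because bitwise XOR is commutative, _mm_xor_ps(val, mask) and _mm_xor_ps(mask, val) produce identical bits; operand order only matters for non-commutative operations such as _mm_andnot_ps.

```cpp
#include <emmintrin.h>  // SSE2

// Hypothetical helper: negate four floats by flipping each sign bit with XOR.
void neg4(const float in[4], float out[4]) {
    const __m128 signmask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    __m128 val = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_xor_ps(val, signmask));  // val ^ signmask
}

// Same operation with the operands swapped; XOR is commutative, so the result is identical.
void neg4_swapped(const float in[4], float out[4]) {
    const __m128 signmask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    _mm_storeu_ps(out, _mm_xor_ps(signmask, _mm_loadu_ps(in)));  // signmask ^ val
}
```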
Just wanted to point out that there's a zero missing in your mask :)
Should be: static const __m128 SIGNMASK = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
Thank you, that was a typo; it's now corrected.
Thank you, exactly what I was looking for.
But actually you have to use _mm_xor_ps(SIGNMASK, val) because the SSE instruction negates the first entry and not the second.
Just to point out: another, slightly more efficient (yet mathematically unpleasant) way of generating SIGNMASK is _mm_set1_ps(-0.0f), as in _mm_xor_ps(val, _mm_set1_ps(-0.0f)). I don't know about the portability, but I've seen many references to this technique elsewhere.
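A sketch of why the -0.0f trick works, assuming IEEE 754 binary32 floats and a compiler that preserves signed zeros (options such as GCC's -ffast-math allow optimizations that ignore the sign of zero, so behavior under such flags is not guaranteed here). bits_of and neg4_negzero are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2

// Hypothetical helper: raw bit pattern of a 32-bit float.
std::uint32_t bits_of(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: negate four floats using -0.0f as the sign mask,
// since -0.0f has only the sign bit set.
void neg4_negzero(const float in[4], float out[4]) {
    const __m128 signmask = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(out, _mm_xor_ps(_mm_loadu_ps(in), signmask));
}
```

On an IEEE 754 implementation, bits_of(-0.0f) is 0x80000000, the same pattern produced by _mm_set1_epi32(0x80000000), which is why the two masks are interchangeable.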
If you compare the assembly outputs, this should be faster (three instructions):
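#include <emmintrin.h> // SSE2 intrinsics
__m128 vec = _mm_load_ps1(&f); // broadcast the float f into all four lanes
vec = _mm_and_ps(vec, _mm_castsi128_ps(_mm_set_epi32(0, 0, 0, 0x7FFFFFFF))); // clear the sign bit of lane 0 (other lanes become zero)
f = _mm_cvtss_f32(vec); // extract lane 0 as a float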
Although Clang's `fabsf` does it in two, which I think is minimal; it was probably hand-coded assembly.
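Outside SIMD code, the same clear-the-sign-bit idea can be written in portable scalar C++ by copying the bits through a std::uint32_t. This is a sketch assuming a 32-bit IEEE 754 float; abs_bits is a hypothetical name, and which instruction sequence a given compiler emits for it has not been checked here.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: scalar absolute value by clearing the sign bit.
// Assumes float is a 32-bit IEEE 754 value.
float abs_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // float bits -> integer
    u &= 0x7FFFFFFFu;               // keep exponent and fraction, drop the sign
    std::memcpy(&f, &u, sizeof f);  // integer -> float bits
    return f;
}
```

In C++20, std::bit_cast from <bit> can replace the two memcpy calls.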