The IEEE 754 floating point format defines the memory layout for the C++ `float` datatype on most platforms. It consists of a 1-bit sign, an 8-bit exponent and 23 bits that store the fractional part of the value.

float x = [sign (1 bit) | exponent (8 bit) | fraction (23 bit)]
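To make the layout concrete, here is a minimal scalar sketch that splits a `float` into its three fields. The names `FloatBits` and `decompose` are made up for illustration, and the code assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for the three bit fields of a 32-bit float.
struct FloatBits {
    std::uint32_t sign;      // 1 bit (bit 31)
    std::uint32_t exponent;  // 8 bits (bits 30..23), stored with a bias of 127
    std::uint32_t fraction;  // 23 bits (bits 22..0)
};

// Hypothetical helper: split a float into sign, exponent and fraction.
// Assumption: float is a 32-bit IEEE 754 single-precision value.
inline FloatBits decompose(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy out the raw bit pattern
    return FloatBits{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For instance, `decompose(-1.0f)` gives a sign of 1, a biased exponent of 127 and a fraction of 0.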

Knowing this memory layout, we can change the sign of floating point values without any floating point arithmetic.

For example, calculating the absolute value of a floating point number is equivalent to setting the sign bit to zero. In SSE we can do this for four float values simultaneously by using a binary mask and logical operations:

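#include <emmintrin.h> // SSE2: needed for _mm_set1_epi32 and _mm_castsi128_ps

// only the sign bit (bit 31) is set in each of the four lanes
static const __m128 SIGNMASK =
    _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
__m128 val = /* some value */;
// _mm_andnot_ps(a, b) computes (~a) & b, so the mask has to come first
__m128 absval = _mm_andnot_ps(SIGNMASK, val); // absval = abs(val)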
//...
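As a self-contained version of the snippet above, the following sketch wraps the mask and the `_mm_andnot_ps` call in two small functions. The names `abs_ps` and `abs4` are hypothetical, and an x86 target with SSE2 is assumed.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical helper: absolute value of four packed floats.
inline __m128 abs_ps(__m128 val) {
    // Only bit 31 (the sign bit) is set in each 32-bit lane.
    const __m128 sign_mask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    // _mm_andnot_ps(a, b) computes (~a) & b, so the mask must be the
    // first argument: every bit except the sign bit is kept.
    return _mm_andnot_ps(sign_mask, val);
}

// Hypothetical convenience wrapper for plain arrays (unaligned load/store).
inline void abs4(const float in[4], float out[4]) {
    _mm_storeu_ps(out, abs_ps(_mm_loadu_ps(in)));
}
```

Swapping the two arguments of `_mm_andnot_ps` would instead compute `(~val) & SIGNMASK`, which is not the absolute value, so the operand order matters here.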

In a similar way we can negate floating point numbers by simply flipping their highest bit, the sign bit, with an XOR:

__m128 val = /* some value */;
__m128 minusval = _mm_xor_ps(val, SIGNMASK); // minusval = -val
//...
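The negation can be sketched the same way as a complete function. Again, `negate_ps` and `negate4` are illustrative names rather than part of any library, and SSE2 is assumed.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical helper: negate four packed floats by flipping their sign bits.
inline __m128 negate_ps(__m128 val) {
    // Only bit 31 (the sign bit) is set in each 32-bit lane.
    const __m128 sign_mask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    // XOR with 1 flips a bit, XOR with 0 leaves it alone. XOR is
    // commutative, so the argument order makes no difference here.
    return _mm_xor_ps(val, sign_mask);
}

// Hypothetical convenience wrapper for plain arrays (unaligned load/store).
inline void negate4(const float in[4], float out[4]) {
    _mm_storeu_ps(out, negate_ps(_mm_loadu_ps(in)));
}
```

Note that this also turns `0.0f` into `-0.0f`, which compares equal to zero but has its sign bit set.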

Just wanted to point out that there's a zero missing in your mask :)

Should be: static const __m128 SIGNMASK = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));

thank you, that was a typo - it's now corrected

Thank you, exactly what I was looking for.

But actually you have to use _mm_xor_ps(SIGNMASK, val) because the SSE instruction negates the first entry and not the second.

Just to point out that another, slightly more efficient (yet mathematically unpleasant) way of generating SIGNMASK is _mm_set1_ps(-0.0f), as in _mm_xor_ps(val, _mm_set1_ps(-0.0f)). I don't know about the portability, but I've seen lots of references to this technique elsewhere.
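The reason this works is that, in IEEE 754, negative zero has only the sign bit set. A small scalar check, with `float_bits` as a made-up helper name and a 32-bit IEEE 754 `float` assumed:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the raw bit pattern of a float.
// Assumption: float is a 32-bit IEEE 754 single-precision value.
inline std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// True when -0.0f has exactly the same bits as the integer mask 0x80000000.
inline bool negative_zero_is_sign_mask() {
    return float_bits(-0.0f) == 0x80000000u;
}
```

On a platform where `negative_zero_is_sign_mask()` returns true, `_mm_set1_ps(-0.0f)` and `_mm_castsi128_ps(_mm_set1_epi32(0x80000000))` produce the same mask.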

If you compare the assembly outputs, this should be faster (three instructions):

__m128 vec = _mm_load_ps1(&f);

vec = _mm_and_ps(vec, _mm_castsi128_ps(_mm_set_epi32(0, 0, 0, 0x7FFFFFFF))); // clear the sign bit of the lowest lane

f = _mm_cvtss_f32(vec);

Clang's `fabsf` does it in two, though, which I think is minimal. It was probably hand-coded assembly.