Monday, March 28, 2011

Changing the sign of float values using SSE code

On virtually all current platforms the C++ float datatype uses the IEEE 754 single-precision format, which defines its memory layout. It consists of a 1-bit sign, an 8-bit exponent and 23 bits that store the fractional part of the value.
float x = [sign (1 bit) | exponent (8 bit) | fraction (23 bit)]
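To make the layout concrete, here is a minimal sketch that splits a float into its three fields. The names FloatBits and decompose are hypothetical helpers introduced only for this illustration, and the code assumes that float is a 32-bit IEEE 754 value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type for illustration; not part of the original post.
struct FloatBits {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits
};

// Copies the bytes of x into an integer and extracts the three fields.
// Assumption: float is a 32-bit IEEE 754 single-precision value.
inline FloatBits decompose(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to reinterpret the bytes
    return FloatBits{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For example, decompose(-1.0f) yields sign 1, exponent 127 and fraction 0, and decompose(1.0f) differs from it only in the sign field.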
We can use this knowledge about the memory layout to change the sign of floating point values without any floating point arithmetic. For example, calculating the absolute value of a floating point number is equivalent to setting the sign bit to zero. In SSE we can do this for four float values simultaneously by using a bit mask and bitwise logical operations:
static const __m128 SIGNMASK = 
               _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
__m128 val = /* some value */;
// _mm_andnot_ps(a, b) computes (~a) & b, so the mask has to be the first operand
__m128 absval = _mm_andnot_ps(SIGNMASK, val); // absval = abs(val)
//...
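The snippet above can be wrapped into a small self-contained function, sketched here. The name abs4 and the use of unaligned loads and stores are assumptions made for this example, not part of the original code.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical wrapper for illustration: writes |in[i]| to out[i] for i = 0..3.
inline void abs4(const float* in, float* out) {
    // Only the sign bit is set in each of the four 32-bit lanes.
    const __m128 signmask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    __m128 val = _mm_loadu_ps(in);  // unaligned load, so any float pointer works
    // (~signmask) & val clears the sign bit and leaves all other bits untouched.
    _mm_storeu_ps(out, _mm_andnot_ps(signmask, val));
}
```

Calling abs4 on the values -1.5f, 2.0f, -0.0f and 3.25f stores 1.5f, 2.0f, 0.0f and 3.25f.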
In a similar way we can negate floating point numbers by simply flipping their highest bit, the sign bit:
__m128 val = /* some value */;
__m128 minusval = _mm_xor_ps(val, SIGNMASK); // minusval = -val
//...
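As a sketch of how the XOR trick might be applied to a whole array, the following function negates n floats in place, four at a time, and handles the leftover elements with scalar code. The name negate_array and the unaligned loads and stores are assumptions made for this example.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical helper for illustration: negates data[0..n-1] in place.
inline void negate_array(float* data, std::size_t n) {
    const __m128 signmask =
        _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    std::size_t i = 0;
    // Main loop: flip the sign bit of four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 val = _mm_loadu_ps(data + i);
        // XOR is commutative, so the operand order does not matter here.
        _mm_storeu_ps(data + i, _mm_xor_ps(val, signmask));
    }
    // Tail: fewer than four elements remain, negate them one by one.
    for (; i < n; ++i) {
        data[i] = -data[i];
    }
}
```

Unlike _mm_andnot_ps, which inverts only its first operand, _mm_xor_ps treats both operands the same way, so _mm_xor_ps(val, signmask) and _mm_xor_ps(signmask, val) give the same result.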

5 comments:

  1. Just wanted to point out that there's a zero missing in your mask :)
    Should be: static const __m128 SIGNMASK = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));

  2. thank you, that was a typo - it's now corrected

  3. Thank you, exactly what I was looking for.
    But actually you have to use _mm_xor_ps(SIGNMASK, val) because the SSE instruction negates the first entry and not the second.

  4. Just to point out that another, slightly more efficient (yet mathematically unpleasant) way of generating SIGNMASK is _mm_set1_ps(-0.0f), as in _mm_xor_ps(val, _mm_set1_ps(-0.0f)). I don't know about the portability, but I've seen lots of references to this technique elsewhere.

  5. If you compare the assembly outputs, this should be faster (three instructions):

    __m128 vec = _mm_load_ps1(&f);
    vec = _mm_and_ps(vec, _mm_castsi128_ps(_mm_set_epi32(0,0,0,~(1u<<31))) ); // 1u avoids shifting into the sign bit of a signed int
    f = _mm_cvtss_f32(vec);

    Clang's `fabsf` does it in two, though, which I think is minimal. It was probably hand-coded assembly.
