Monday, March 28, 2011

Changing the sign of float values using SSE code

The IEEE 754 single-precision format defines the memory layout of the C++ float datatype on practically all current platforms. It consists of a 1-bit sign, an 8-bit exponent and a 23-bit fraction.
float x = [sign (1 bit) | exponent (8 bit) | fraction (23 bit)]
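To make the layout concrete, here is a small sketch that splits a float into these three fields. The names FloatFields and decompose are made up for this illustration, and the code assumes that float is a 32-bit IEEE 754 value on the target platform.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type for illustration: the three binary32 fields.
struct FloatFields {
    std::uint32_t sign;      // 1 bit (bit 31)
    std::uint32_t exponent;  // 8 bits (bits 30..23), stored with a bias of 127
    std::uint32_t fraction;  // 23 bits (bits 22..0)
};

// Split a float into its fields. Assumption: float is IEEE 754 binary32.
inline FloatFields decompose(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy the bit pattern without aliasing issues
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For example, decompose(-1.0f) yields sign 1, exponent 127 and fraction 0, because -1.0f is stored as 0xBF800000.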
We can use this knowledge about the memory layout to change the sign of floating point values without any floating point arithmetic. For example, calculating the absolute value of a floating point number is equivalent to setting the sign bit to zero. In SSE we can do this for four float values simultaneously by using a binary mask and logical operations:
static const __m128 SIGNMASK = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
__m128 val = /* some value */;
__m128 absval = _mm_andnot_ps(SIGNMASK, val); // absval = abs(val)
In a similar way we can negate floating point numbers by simply flipping their sign bit, which is the highest bit:
__m128 val = /* some value */;
__m128 minusval = _mm_xor_ps(val, SIGNMASK); // minusval = -val
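Both operations can be wrapped into small functions that work on four floats in memory. This is only a sketch: sign_mask, abs4 and negate4 are names invented for this example, and the mask is built inside a function instead of as a global constant. It assumes an x86 target with SSE2 enabled.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; also provides the SSE float intrinsics

// Hypothetical helper: 0x80000000 in every lane (the same bits as -0.0f).
inline __m128 sign_mask() {
    return _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
}

// out[i] = |in[i]| for i = 0..3. The pointers do not need to be 16-byte aligned.
inline void abs4(const float* in, float* out) {
    __m128 val = _mm_loadu_ps(in);
    // _mm_andnot_ps(a, b) computes (~a) & b, so the mask must be the first operand.
    _mm_storeu_ps(out, _mm_andnot_ps(sign_mask(), val));
}

// out[i] = -in[i] for i = 0..3.
inline void negate4(const float* in, float* out) {
    __m128 val = _mm_loadu_ps(in);
    // XOR is commutative, so the operand order does not matter here.
    _mm_storeu_ps(out, _mm_xor_ps(val, sign_mask()));
}
```

Note the difference between the two intrinsics: _mm_andnot_ps inverts its first operand, so swapping the arguments changes the result, while _mm_xor_ps gives the same result in either order.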


  1. Just wanted to point out that there's a zero missing in your mask :)
    Should be: static const __m128 SIGNMASK = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));

  2. thank you, that was a typo - it's now corrected

  3. Thank you, exactly what I was looking for.
    But actually you have to use _mm_xor_ps(SIGNMASK, val) because the sse instruction negates the first entry and not the second.

  4. Just to point out that another, slightly more efficient (yet mathematically unpleasant) way of generating SIGNMASK is _mm_set1_ps(-0.0f), as in _mm_xor_ps(val, _mm_set1_ps(-0.0f)). I don't know about the portability, but I've seen lots of references to this technique elsewhere.

  5. If you compare the assembly outputs, this should be faster (three instructions):

    __m128 vec = _mm_load_ps1(&f);
    vec = _mm_and_ps(vec, _mm_castsi128_ps(_mm_set_epi32(0, 0, 0, 0x7FFFFFFF)));
    f = _mm_cvtss_f32(vec);

    Although, Clang's `fabsf` does it in two, which I think is minimal. Probably, it was handcoded assembly.