Showing posts with label loops. Show all posts

Sunday, March 10, 2013

Efficient Processing of Arrays using SSE/SIMD and C++ Functors

This post is about how to write beautiful SIMD/SSE code that is easy to debug and maintain, yet still compiles to optimal code with zero overhead. It builds upon existing posts such as the Template based vector library and How to unroll a loop in C++. Here we present a pattern that is based entirely on functors and is similar to array processing in the Standard Template Library (STL).
Given one or more arrays of values (such as float, double, or int), we want to process the arrays element-wise. For example, we want to scale the values of an input array and add them to an output array. This is exactly the behavior of the ?AXPY routine of the BLAS linear algebra library:
Y = A*X + Y
Note that this particular function is just a running example (we will use the single-precision data type float, so our function of interest is called saxpy). The underlying concept applies to any element-wise array function.
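As a rough illustration of the functor idea, here is a minimal scalar sketch of saxpy. The names SaxpyOp and transform_inplace are hypothetical and chosen only for this example; they are not taken from the original post, and no SSE is used yet.

```cpp
#include <cstddef>

// Hypothetical functor: computes a*x + y for a single element.
struct SaxpyOp {
    float a;
    float operator()(float x, float y) const { return a * x + y; }
};

// Hypothetical generic driver: applies op element-wise and writes the
// result back into y, in the spirit of std::transform.
template <typename Op>
void transform_inplace(std::size_t n, const float* x, float* y, Op op) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = op(x[i], y[i]);
    }
}

// saxpy expressed through the functor: y = a*x + y.
inline void saxpy(std::size_t n, float a, const float* x, float* y) {
    transform_inplace(n, x, y, SaxpyOp{a});
}
```

Because the operation is passed as a template argument, the compiler can usually inline the functor call, so the loop body should reduce to a plain multiply and add. Swapping in a different functor gives a different element-wise function without touching the loop.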

Friday, December 30, 2011

Template based vector library with SSE support

This article explains how to set up a vector math library that performs mathematical operations on arbitrarily sized float arrays or vectors. It can handle aligned and unaligned pointers with minimal code overhead but optimal runtime performance. This is achieved through template functions and compile-time arguments.
This tutorial is based on a simple code example: adding two arrays and storing the result in a third array. Step by step, we will introduce SSE intrinsics, loop unrolling, and functor-based concepts that make it possible to build a library with different operations.
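The array addition described above could be sketched roughly as follows. This is an assumption-laden illustration, not the library's actual code: the function name add_arrays, the Aligned template parameter, and the SKETCH_HAS_SSE macro are all hypothetical. The SSE path is guarded so the code still compiles on targets without SSE.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical sketch: out[i] = a[i] + b[i].
// Aligned is a compile-time argument; when true, all three pointers
// must be 16-byte aligned.
template <bool Aligned>
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // Process four floats per iteration with SSE.
    for (; i + 4 <= n; i += 4) {
        __m128 va = Aligned ? _mm_load_ps(a + i) : _mm_loadu_ps(a + i);
        __m128 vb = Aligned ? _mm_load_ps(b + i) : _mm_loadu_ps(b + i);
        __m128 vr = _mm_add_ps(va, vb);
        if (Aligned) {
            _mm_store_ps(out + i, vr);
        } else {
            _mm_storeu_ps(out + i, vr);
        }
    }
#endif
    // Scalar loop for the remaining elements (or all of them without SSE).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Since Aligned is known at compile time, the compiler can discard the branch that is not taken, so choosing between aligned and unaligned access should cost nothing at runtime. A call such as add_arrays<false>(a, b, out, n) is the safe choice when the alignment of the pointers is unknown.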

Tuesday, April 5, 2011

How to Unroll a For-Loop in C++

In this article we will show you how to manually unroll a loop in C++. Loop unrolling has performance advantages because it reduces the overhead of checking and advancing the loop counter at each iteration. Also, when using vectorized mathematical operations such as SSE instructions, it is possible to perform several iterations of the same loop in parallel. Let's first focus on the general concept of loop unrolling with a simple example:
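The post's own example is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of manual unrolling by a factor of four, assuming a simple in-place scaling of a float array; the function names scale_simple and scale_unrolled4 are hypothetical.

```cpp
#include <cstddef>

// Plain loop: one counter check and one increment per element.
void scale_simple(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Unrolled by four: one counter check and one increment per four elements.
void scale_unrolled4(float* data, std::size_t n, float factor) {
    const std::size_t n4 = n - (n % 4);  // largest multiple of 4 that is <= n
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        data[i]     *= factor;
        data[i + 1] *= factor;
        data[i + 2] *= factor;
        data[i + 3] *= factor;
    }
    // Handle the up to three leftover elements.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Both functions produce the same result. Optimizing compilers often unroll such loops on their own, so whether the manual version is faster is worth measuring rather than assuming.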

Friday, March 25, 2011

Fast iteration over STL vector elements

The STL class std::vector is a great container for managing dynamic arrays. Unfortunately, it introduces a slight overhead when iterating through it using an index. For example, the following code is not very efficient:
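The code from the original post is not included in this excerpt. The sketch below only illustrates the two iteration styles being contrasted, index-based access versus a raw pointer walk; the function names sum_indexed and sum_pointer are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Index-based iteration: v.size() and v[i] are evaluated on every pass.
float sum_indexed(const std::vector<float>& v) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum;
}

// Pointer-based iteration: the begin and end pointers are fetched once
// and the loop only advances a raw pointer.
float sum_pointer(const std::vector<float>& v) {
    float sum = 0.0f;
    const float* p = v.data();
    const float* end = p + v.size();
    for (; p != end; ++p) {
        sum += *p;
    }
    return sum;
}
```

Whether the indexed version is actually slower depends on the compiler and its optimization settings; with optimizations enabled, both loops may well compile to the same machine code, so a benchmark on the target platform is the reliable way to decide.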