Tutorial: Cilk Plus Reducers

Previous: Keywords Up: Overview Next: Task Parallelism Tools

Attempting to modify a shared variable from multiple parallel threads simultaneously is called a race. The standard way to deal with races is to use locks or mutexes to serialize access to the variable. For example, consider the following code which generates a list of the letters 'a' to 'z':

void locked_list_test()
{
    mutex m;
    std::list<char>letters;

    // Build the list in parallel
    cilk_for(char ch = 'a'; ch <= 'z'; ch++)
    {
        simulated_work();

        m.lock();
        letters.push_back(ch);
        m.unlock();
    }

    // Show the resulting list
    std::cout << "Letters from locked list: ";
    for(std::list<char>::iterator i = letters.begin(); i != letters.end(); i++)
    {
        std::cout << " " << *i;
    }
    std::cout << std::endl;
}

Since STL lists are not thread-safe, we must use a mutex to serialize access to the list "letters."

While the mutex guarantees that the access is thread-safe, it doesn't make any guarantees about ordering, so the resulting list will be jumbled and diffent on every run with more than 1 worker. Running the locked_list_test() routine will result in output something like:

Letters from locked list:  y g n d t a w x e z q j o h b u f v c k i r p l m s

Here's the function rewritten using a reducer_list.  The code that has been modified is highlighted:

void reducer_list_test()
{
    cilk::reducer< cilk::op_list_append<char> > letters_reducer;

    // Build the list in parallel
    cilk_for(char ch = 'a'; ch <= 'z'; ch++)
    {
        simulated_work();
        letters_reducer->push_back(ch);
    }

    // Fetch the result of the reducer as a standard STL list
    const std::list<char> &letters = letters_reducer.get_value();

    // Show the resulting list
    std::cout << "Letters from reducer_list:";
    for(std::list<char>::const_iterator i = letters.begin(); i != letters.end(); i++)
    {
        std::cout << " " << *i;
    }
    std::cout << std::endl;
}

Running the reducer_list_test() routine will result in the following output regardless of how many workers there are:

Letters from reducer_list: a b c d e f g h i j k l m n o p q r s t u v w x y z

Click to download the  cilk-reducers-demo.cpp sample code.

Cilk Plus reducers provide a number of useful properties:

  • Each strand has a private view of the reducer, so we don't need to use mutexes to serialize access to the reducer. The views are combined by the Cilk runtime by calling the reduce() function of the reducer's monoid when views sync.
  • The reduce() function is called so that the strands are combined in the order that would have occurred if the program were run with one worker.

Unlike some other parallel frameworks, Intel Cilk Plus reducers are not limited to loops.  Here's a version of the Fibonacci number calculation that's been reworked to use reducers:

cilk::reducer< cilk::op_add<int> > fib_sum(0);

void fib_with_reducer_internal(int n)
{
    if (n < 2)
    {
        *fib_sum += n;
    }
    else
    {
        cilk_spawn fib_with_reducer_internal(n-1);
        fib_with_reducer_internal(n-2);
        cilk_sync;
    }
}

The Cilk Plus Reducer Library

Intel Cilk Plus ships with a library of reducers you can use in your C and C++ code. The reducers are shipped as a series of include files which you can study and make the basis of your own reducers.  The include files comments are in doxygen format so you can generate your own copy of the reducer documentation in HTML or other formats.

Lists
reducer_list_append Creates a list by adding elements to the back.
reducer_list_prepend Creates a list by adding elements to the front.
Min/Max
reducer_max Calculates the maximum value of a set of values.
reducer_max_index Calculates the maximum value and index of that value of a set of values.
reducer_min Calculates the minimum value of a set of values.
reducer_min_index Calculates the minimum value and index of that value of a set of values.
Math Operators
reducer_opadd Calculates the sum of a set of values.
Bitwise Operators
reducer_opand Calculates the binary AND of a set of values.
reducer_opor Calculate the binary OR of a set of values.
reducer_opxor Calculate the binary XOR of a set of values.
String Operators
reducer_string Accumulates a string using append operations.
reducer_wstring Accumulates a "wide" string using append operations.
Files
reducer_ostream An output stream that can be written in parallel.
Previous: Keywords Up: Overview Next: Task Parallelism Tools