Cilk Plus Tutorial

    Next: Cilk Plus Keywords

Introducing Intel® Cilk Plus:
Extensions to simplify task and data parallelism

Intel® Cilk Plus adds simple language extensions to the C and C++ languages to express task and data parallelism. These language extensions are powerful, yet easy to apply and use in a wide range of applications.

Intel Cilk Plus includes the following features and benefits:

Feature                 Benefit
Keywords                Simple, powerful expression of task parallelism:
                          • cilk_for - Parallelizes for loops
                          • cilk_spawn - Specifies that a function can execute in parallel with the remainder of the calling function
                          • cilk_sync - Specifies that all spawned calls in a function must complete before execution continues
Reducers                Eliminate contention for shared variables among tasks by automatically creating views of them as needed and "reducing" them in a lock-free manner.
Array Notation          Data parallelism for arrays or sections of arrays.
SIMD-Enabled Functions  Define functions that can be vectorized when called from within an array notation expression or a #pragma simd loop.
#pragma simd            Specifies that a loop is to be vectorized.

Serial Semantics

A deterministic Intel Cilk Plus application has serial semantics. That is, the result of a parallel run is the same as if the program had executed serially. Serial semantics makes it easier to reason about the parallel application. In addition, developers can use familiar tools to debug the application.

Cilk Keywords

Intel Cilk Plus adds three keywords to C and C++ to allow developers to express opportunities for parallelism.

  • cilk_spawn - Specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This is an expression of an opportunity for parallelism, not a command that mandates parallelism. The Intel Cilk Plus runtime will choose whether to run the function in parallel with its caller.
  • cilk_sync - Specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
  • cilk_for - Allows iterations of the loop body to be executed in parallel.

As stated above, the cilk_spawn and cilk_for keywords express opportunities for parallelism. The Intel Cilk Plus runtime, which implements task parallelism with an efficient work-stealing scheduler, determines which portions of your application actually run in parallel.
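The three keywords can be sketched with a small example. The functions fib and square_all are hypothetical names chosen for illustration. The sketch assumes that a Cilk Plus-enabled compiler defines the __cilk macro; when it is not defined, the keywords are erased with macros so the same source compiles as an ordinary serial program.

```c
/* Assumption: __cilk is defined by Cilk Plus-enabled compilers.
   Otherwise the keywords are erased, leaving the serial form of the program. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

/* Hypothetical example: recursive Fibonacci. */
long fib(int n)
{
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1); /* may run in parallel with the next call */
    long y = fib(n - 2);            /* executed by the caller in the meantime */
    cilk_sync;                      /* x is safe to read only after the sync */
    return x + y;
}

/* Hypothetical example: each iteration touches a different element,
   so the iterations may execute in parallel. */
void square_all(long *v, int n)
{
    cilk_for (int i = 0; i < n; ++i)
        v[i] = v[i] * v[i];
}
```

Calling fib(10) returns 55 whether or not the spawned call actually runs in parallel, which is the serial-semantics property described earlier.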


Reducers

Intel Cilk Plus includes reducers to help make parallel programming easier. Traditional parallel programs use locks to protect shared variables, which can be problematic. Incorrect lock use can result in deadlocks. Contention for locked regions of code can slow a program down. Locks can prevent races, but they cannot enforce ordering, so results can be non-deterministic. Reducers provide a lock-free mechanism that allows parallel code to use private "views" of a variable, which are merged at the next sync. The merge is done in an ordered manner to maintain the serial semantics of the Intel Cilk Plus application.
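As a rough illustration, a sum over a parallel loop might use the C reducer macros from <cilk/reducer_opadd.h>. The function sum_to is a hypothetical name, and the sketch assumes that __cilk is defined when Cilk Plus is enabled. The plain loop in the other branch shows the serial computation that the reducer version is meant to match.

```c
#ifdef __cilk
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#endif

/* Hypothetical example: sum the integers 1..n. */
long sum_to(int n)
{
#ifdef __cilk
    /* Declare and register an addition reducer with initial value 0. */
    CILK_C_REDUCER_OPADD(total, long, 0);
    CILK_C_REGISTER_REDUCER(total);
    cilk_for (int i = 1; i <= n; ++i)
        REDUCER_VIEW(total) += i;      /* each strand updates its own view */
    long result = REDUCER_VIEW(total); /* views are merged once the loop completes */
    CILK_C_UNREGISTER_REDUCER(total);
    return result;
#else
    /* Serial equivalent, used when Cilk Plus is not available. */
    long total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
#endif
}
```

No lock appears in either branch: the parallel version avoids the race on total by giving each strand a private view instead of serializing access.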

Task Parallelism Tools

The Intel Cilk Plus SDK contains race detection and scalability analysis tools for Cilk-style parallelized binaries. The Cilk tools support code compiled with both the C/C++ compiler from the Intel® Parallel Studio XE tool suites and the GCC "cilkplus" branch C/C++ compiler.

Array Notation

Intel Cilk Plus includes a set of notations that allow users to express high-level operations on entire arrays or sections of arrays. These notations help the compiler vectorize the application effectively. Intel Cilk Plus allows C/C++ operations to be applied to multiple array elements in parallel, and also provides a set of built-in functions that can be used to perform vectorized shifts, rotates, and reductions.
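A minimal sketch of array sections and a built-in reduction follows. The functions add_arrays and sum_array are hypothetical names, and the sketch assumes that __cilk is defined when Cilk Plus is enabled. A section is written array[start:length], and the loops in the other branch show the element-wise meaning of each expression.

```c
/* Hypothetical example: element-wise addition of two arrays. */
void add_arrays(float *out, const float *a, const float *b, int n)
{
#ifdef __cilk
    out[0:n] = a[0:n] + b[0:n];      /* one statement covers all n elements */
#else
    for (int i = 0; i < n; ++i)      /* equivalent scalar loop */
        out[i] = a[i] + b[i];
#endif
}

/* Hypothetical example: sum of an array section. */
float sum_array(const float *a, int n)
{
#ifdef __cilk
    return __sec_reduce_add(a[0:n]); /* built-in addition reduction */
#else
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
#endif
}
```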

SIMD-Enabled Functions

A SIMD-enabled function is a regular function that can be invoked either on scalar arguments or on array elements in parallel. SIMD-enabled functions are most useful when combined with array notation or #pragma simd.
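The sketch below marks a function as SIMD-enabled and calls it both ways. The names SIMD_ENABLED, scale_add, and apply are hypothetical. The sketch assumes the GCC-style spelling __attribute__((vector)) and that __cilk is defined when Cilk Plus is enabled; the Intel compiler on Windows uses a __declspec spelling instead.

```c
/* Assumption: GCC-style attribute spelling; erased when Cilk Plus is off. */
#ifdef __cilk
#define SIMD_ENABLED __attribute__((vector))
#else
#define SIMD_ENABLED
#endif

/* Hypothetical SIMD-enabled function: written as ordinary scalar code. */
SIMD_ENABLED float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* Hypothetical caller: applies scale_add to every pair of elements. */
void apply(float *out, const float *a, const float *b, int n)
{
#ifdef __cilk
    out[0:n] = scale_add(a[0:n], b[0:n]); /* invoked on array elements */
#else
    for (int i = 0; i < n; ++i)           /* equivalent scalar calls */
        out[i] = scale_add(a[i], b[i]);
#endif
}
```

The same function remains callable on scalars, for example scale_add(1.0f, 2.0f).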

#pragma simd

This pragma gives the compiler permission to vectorize a loop even in cases where auto-vectorization might fail. It is the simplest way to manually apply vectorization.
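As an illustration, consider a loop over pointers that the compiler cannot prove to be non-overlapping, a common reason for auto-vectorization to fail. The function saxpy below is a hypothetical example, and the pragma is guarded on the __cilk macro, which is assumed to be defined when Cilk Plus is enabled. Because the pragma grants permission rather than proving safety, the programmer is responsible for ensuring that x and y do not overlap.

```c
/* Hypothetical example: y = a * x + y, element by element.
   The caller must guarantee that x and y do not overlap. */
void saxpy(float a, const float *x, float *y, int n)
{
#ifdef __cilk
#pragma simd
#endif
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```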

