FAQ

General Questions

What is Intel® Cilk™ Plus?

Intel® Cilk™ Plus is an extension to the C and C++ languages to support task and data parallelism. Unlike other threading packages, Intel Cilk Plus is not just a library.   It is a language extension that is implemented by the compiler and the Intel Cilk Plus runtime, allowing lower overhead than library-only solutions.

Where do I download Intel Cilk Plus?

The commercial version of Intel Cilk Plus is available in binary form, as part of the Intel® Parallel Studio XE suites.

The open source version of Intel Cilk Plus is available in source form, both as the “cilkplus” branch of the GCC v4.8 C/C++ compiler and from the Download page.

The “cilkplus” branch includes all sources necessary to build the Intel Cilk Plus runtime and the C and C++ compilers required to compile the Intel Cilk Plus extension for Linux* or OS X*.

The sources available from the Download page include all sources necessary to build the Intel Cilk Plus runtime for Linux* or OS X*. You’ll need a compiler that implements the Intel Cilk Plus extension: either ICC or the GCC v4.8 “cilkplus” branch.

Where do you see Intel Cilk Plus going in the future?

Intel Cilk Plus will go where our customers take it – we’ll study the input we receive from developers about their experience using it in their applications. We’ll encourage experimentation and look to fold the best ideas back into Intel Cilk Plus regularly, with at least one commercial release a year. In addition, we are proposing Intel Cilk Plus as a feature for a future version of the C++ standard.

How did Intel Cilk Plus come to be?

Intel Cilk Plus is a combination of two streams of development in task and data parallelism.

The task parallelism in Intel Cilk Plus is based on 15 years of research at MIT and other academic institutions, culminating in Cilk-5.4.6, a source-to-source translator that converts Cilk code to C, and then compiles the resulting C source.

Cilk Arts, Inc. licensed the Cilk technology from MIT and used it as the basis of the Cilk++ language. The Cilk++ language also introduced reducers, which allow applications to update non-local variables without locks and provide a deterministic result, regardless of the scheduling of the application. Cilk Arts released compilers for Windows and Linux that extended C++ with the Cilk keywords.

Intel has worked for many years on better ways to expose the vector units provided by modern CPUs. The vectorization features of Intel Cilk Plus were inspired by similar features in programming languages such as Fortran and MATLAB*. When Intel acquired Cilk Arts in 2009, it recognized that the combination of data and task parallelism was greater than the sum of the individual pieces and merged the technologies to create Intel Cilk Plus. Intel Cilk Plus was initially released in the Intel® C++ Composer XE 2011 compiler.

What are the Key Features of Intel Cilk Plus?

 

  • Keywords – Language additions to C/C++ to specify task parallelism in an application
  • Reducers – A mechanism to eliminate contention for shared variables in a lock-free manner
  • #pragma simd – A directive telling the compiler that a loop contains data parallelism and should be vectorized
  • Array Notations – Language additions to C/C++ to specify data parallelism for arrays or sections of arrays
  • Elemental Functions – An annotation specifying that a function can be called with arrays or array sections as arguments, as well as scalar values

 

What variants of Cilk are there? How can I tell which I have?
  • MIT Cilk – The original research project from MIT, culminating in Cilk-5.4.6. MIT Cilk was implemented as a source-to-source translator that converts Cilk code to C and then compiles the resulting C source.
    • Only supported C code with Cilk keywords
    • All parallel functions had to be marked with a cilk keyword
    • Cilk functions had to be spawned, not called
  • Cilk++ – Compilers developed by Cilk Arts, Inc. Cilk Arts licensed the Cilk technology from MIT.
    • Only supported C++ code
    • Used a non-standard calling convention, meaning you had to use a cilk::context to call Cilk functions from C or C++
    • Cilk files used the .cilk extension
    • Released by Intel as unsupported software through the WhatIf site
  • Intel Cilk Plus – Fully integrated into the Intel C/C++ compiler.
    • Supports both C and C++
    • Uses standard calling conventions
    • Includes both task and data parallelism
Why do C and C++ need a language extension for parallelism?

Modern CPUs provide vector units to support data parallelism, as well as multiple cores in each CPU. Unfortunately, C and C++, like other popular programming languages, were not designed to express either kind of parallelism. C and C++ developers are left with two choices: program at a low level of abstraction using native threads, or restructure their applications to fit a parallel library. Intel Cilk Plus provides a third alternative: a higher level of abstraction for expressing both task and data parallelism without significant code restructuring.

Abstraction is an important tool for developers. Using native threads and doing your own explicit thread management is like programming in assembly language for parallelism. Programming for parallelism using native threads is tedious, error-prone, and not portable. Applications written using native threads are also seldom as scalable as they could be, since achieving high levels of scalability requires significant expertise, programmer effort, and machine-specific tuning.

Intel Cilk Plus supports a higher level of abstraction by providing constructs for expressing potential parallelism in an application, not mandating it in the compiled code or runtime.   This higher level of abstraction separates the specification of parallelism from scheduling: the programmer is free to focus only on what code is allowed to execute in parallel, not what the underlying system should do to execute that code efficiently.

By providing an abstraction for parallelism, Cilk Plus allows developers to express their parallel algorithms in a way that is more natural than with many existing parallel libraries. Existing tools for writing parallel programs, such as TBB and OpenMP, require developers to restructure their application to break it into tasks that can be run in parallel. This restructuring can obscure the original algorithm, making it harder for the programmer to reason about and maintain the application. Because the Intel Cilk Plus keywords have serial semantics, developers can often express the parallelism inherent in their algorithm by adding only a few keywords to the original serial code.
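As a sketch of what adding only a few keywords can look like, the function below is an ordinary recursive Fibonacci with one cilk_spawn and one cilk_sync added. The name fib and the fallback macros are illustrative assumptions, not taken from this FAQ. The fallback assumes the compiler defines __cilk when the extension is enabled; on any other compiler the keywords are defined away and the same source builds as plain serial C++.

```cpp
#include <cassert>

// Use the real keywords when the Cilk Plus extension is enabled (assumed to
// be signalled by the __cilk macro); otherwise define them away so the file
// compiles as ordinary serial C++.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

// Recursive Fibonacci. Only the two Cilk keywords differ from the serial code.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) {
        return n;
    }
    long x = cilk_spawn fib(n - 1);  // may run in parallel with the call below
    long y = fib(n - 2);             // continues in the current strand
    cilk_sync;                       // wait for the spawned call before using x
    return x + y;
}
```

With or without the extension, fib(10) returns 55; the keywords only grant permission to run the two recursive calls in parallel.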

Finally, Intel Cilk Plus allows you to write composable code.  That is, you can write a library using Intel Cilk Plus and incorporate that into other applications using Intel Cilk Plus without having to worry about an explosion of threads.  Intel Cilk Plus uses a randomized work-stealing scheduler to perform dynamic load-balancing which allows the scheduler to efficiently handle nested parallelism in both application and library without oversubscribing the machine with OS threads.

Have you thought about asking that Intel Cilk Plus be part of the C++ standard?

Intel has presented Cilk Plus as a proposal to the C++ standards committee, and we will continue shepherding it through the standardization process. The next C++ standard is expected by 2017.

Why is maintaining serial semantics an advantage?
  • The equivalent serial C or C++ program can be debugged and analyzed using existing development tools.
  • Serial semantics makes testing and quality assurance easier. A properly written Cilk Plus application will deliver the same output every time, making it easier to verify your results.
What do you mean by "dynamic load-balancing"?

A simple way to understand dynamic load-balancing is to compare it to the alternative, namely static load-balancing. Suppose you wanted to execute a parallel loop over n items on a machine with 4 cores. A static scheduler (i.e., a scheduler that uses static load-balancing) might typically divide the n items into 4 equal pieces, one for each core, and have each core process n/4 items. The scheduler in this example is static, since the scheduling decision is made completely up front, before the loop starts.

In contrast, a dynamic scheduler (one that performs dynamic load-balancing) makes scheduling decisions at runtime, as the work is executing. For example, in Cilk, when a parallel loop executes, the work of all n items conceptually starts on a single initial worker thread. This work is load-balanced (i.e., distributed) to other worker threads only as those worker threads become idle and steal work.
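The difference can be illustrated with a small simulation. This is a simplified model under stated assumptions, not the Cilk work-stealing scheduler: the static version splits the items into equal contiguous blocks up front, and the dynamic version hands each item, in order, to whichever core becomes idle first. The function names and the cost vector are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Finish time when the items are divided into `cores` equal contiguous
// blocks before the loop starts (static load-balancing).
long static_finish_time(const std::vector<long>& cost, std::size_t cores) {
    assert(cores > 0);
    const std::size_t n = cost.size();
    long worst = 0;
    for (std::size_t c = 0; c < cores; ++c) {
        const std::size_t begin = c * n / cores;
        const std::size_t end = (c + 1) * n / cores;
        long busy = 0;
        for (std::size_t i = begin; i < end; ++i) {
            busy += cost[i];
        }
        worst = std::max(worst, busy);  // the slowest core determines the finish
    }
    return worst;
}

// Finish time when each item is given, at runtime, to the core that becomes
// idle first (a simple stand-in for dynamic load-balancing).
long dynamic_finish_time(const std::vector<long>& cost, std::size_t cores) {
    assert(cores > 0);
    std::vector<long> busy_until(cores, 0);
    for (std::size_t i = 0; i < cost.size(); ++i) {
        std::vector<long>::iterator idle =
            std::min_element(busy_until.begin(), busy_until.end());
        *idle += cost[i];
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}
```

For item costs {8, 8, 1, 1, 1, 1, 1, 1} on 4 cores, the static split puts both expensive items on the same core and finishes at time 16, while the dynamic assignment finishes at time 8. When every item costs the same, both approaches finish at the same time.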

Why do I want dynamic load-balancing? Isn't static scheduling good enough?

Dynamic load-balancing offers three important advantages over static scheduling.   It provides

  1. Better scalability for parallel applications with irregular parallelism.

    Dynamic load-balancing can significantly outperform static load-balancing in programs whose parallelism is difficult or impossible to predict ahead of time.  For example, in a parallel loop where the work of each loop iteration varies significantly, it can be difficult for a static schedule to divide work evenly among multiple cores, especially if the work of each loop iteration depends on the data-input for the loop.

  2. More robust performance for applications running in multiprogrammed environments.

    Dynamic load-balancing provides more robust performance in multiprogrammed environments because the scheduler can adapt as other programs enter and leave the system and effectively change the number of cores available to the program. Consider a program that is written using static scheduling, and tuned expecting to use 4 cores. While that program is executing, if another new program arrives and starts using one of the 4 cores, then the careful tuning of the static schedule is wrecked, since effectively now only 3 cores are available to the original program. This limitation of static scheduling can manifest even on systems where you only expect to run one program at a time, since the OS itself can periodically choose to occupy a core for its own use.

  3. Improved composability when developing parallel libraries because it allows one to write processor-oblivious code, i.e., code that is independent of the number of threads/cores/processors it should execute on.

    Dynamic load-balancing simplifies the development of composable parallel libraries because it frees the user of a library from worrying about how many cores are required to execute a particular library function.

    Imagine that you are trying to code a parallel library function f, and you are trying to use static scheduling and OS-native threads. How many threads should you use inside f? The optimal answer may depend not only on the input to f, but also on how the library user is calling the function, how many parallel instances of f the user is creating, and what else the user may wish to execute in parallel at the same time. To avoid oversubscribing a machine with many more threads than cores, the library user often needs to manage how many instances of f are created and how many threads to use in each instance.

    By writing a library using Cilk Plus, which has dynamic load-balancing, you can rely on the Cilk Plus runtime to dynamically determine how to schedule each function instance, instead of burdening your library user with thread management. The work-stealing scheduler in Cilk will adaptively load-balance work between multiple parallel instances of f.
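The composability point can be sketched with a hypothetical library function playing the role of f, called from a parallel loop in the application. Neither function chooses a thread count. The names scale_in_place and scale_all are illustrative assumptions, and the fallback macro (assuming the compiler defines __cilk when the extension is enabled) lets the same source build as serial C++ elsewhere.

```cpp
#include <cstddef>
#include <vector>

// Use the real cilk_for when the Cilk Plus extension is enabled (assumed to
// be signalled by the __cilk macro); otherwise fall back to a plain for loop.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Hypothetical library function "f": exposes parallelism over one vector
// without deciding how many threads to use.
void scale_in_place(std::vector<double>& v, double factor) {
    cilk_for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
    }
}

// Application code: runs many instances of f in parallel. The nested
// parallelism is left to the runtime to balance; no thread counts appear.
void scale_all(std::vector<std::vector<double> >& rows, double factor) {
    cilk_for (std::size_t r = 0; r < rows.size(); ++r) {
        scale_in_place(rows[r], factor);
    }
}
```

The caller does not need to know whether scale_in_place is parallel inside, and the library author does not need to know how many instances the caller runs at once.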

How will Intel Cilk Plus help my software run on future processors?

Intel Cilk Plus will detect the number of cores on the hardware platform and make the necessary adjustments to allow your software to adapt. All you have to do is make sure that your application exposes sufficient parallelism to take advantage of the additional cores. Use of future vector features will require only a recompilation.
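If a program wants to see how many workers it can expect, a minimal sketch looks like the following. It assumes the runtime API header cilk/cilk_api.h and its __cilkrts_get_nworkers() call are available when the extension is enabled (signalled here by the __cilk macro). The wrapper name worker_count and the standard-library fallback are illustrative assumptions for compilers without the extension.

```cpp
#include <thread>

#ifdef __cilk
#include <cilk/cilk_api.h>
#endif

// Number of workers the program can expect to use. Application code normally
// does not need this value; it is shown only to make the adaptation visible.
unsigned worker_count() {
#ifdef __cilk
    // Ask the Cilk Plus runtime how many workers it has set up.
    return static_cast<unsigned>(__cilkrts_get_nworkers());
#else
    // Fallback: ask the standard library, which may report 0 if unknown.
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
#endif
}
```

The parallel code itself stays the same on a 4-core or a 64-core machine; only the value reported here changes.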

What hardware platforms are supported? What OSes?

Intel currently supports Intel Cilk Plus on the IA-32 and Intel® 64 hardware platforms, and on the Windows*, Linux* and OS X* operating systems. We look forward to working with the GCC community to port Intel Cilk Plus to other architectures and operating systems.