Using Cilk Plus

Do I have to use Intel’s compilers?

Intel Cilk Plus is an extension to the C and C++ languages. At this time it is supported by the Intel® C++ Composer XE (2011 or later) compiler for Windows*, Linux* and OS X*, as well as by the “cilkplus” branch of GCC 4.8 and the “cilkplus” branch of LLVM/Clang. Cilk Plus is also supported by the GCC 4.9 distribution.

Does Intel Cilk Plus support non-Intel hardware, and/or operating systems other than Windows*/Linux*/OS X*?

Not at this time. However, we look forward to working with the GCC community to port Intel Cilk Plus to other architectures and operating systems.

I only have a Dual Core, not a Quad Core, processor… will Intel Cilk Plus help me?

Absolutely! Intel Cilk Plus lets you exploit the vector units in your dual-core CPU. In addition, the Intel Cilk Plus runtime automatically detects the number of cores and attempts to distribute your application’s work among them. A Cilk Plus application developed on 2 cores will automatically take advantage of more cores without recompilation, provided that the application exposes enough parallelism. Conversely, an application developed on a system with many cores will run correctly, if more slowly, on a system with fewer cores.

Is there a version of Intel Cilk Plus that provides statically linked libraries?

Intel Cilk Plus is not provided as a statically linked library, for the following reasons:

Most libraries operate locally. For example, an Intel® MKL FFT transforms an array. It is irrelevant how many copies of the FFT there are. Multiple copies and versions can coexist without difficulty. But some libraries control program-wide resources, such as memory and processors. For example, garbage collectors control memory allocation across a program. Analogously, Intel Cilk Plus controls the scheduling of tasks across a program. To do their job effectively, each of these must be a singleton; that is, have a sole instance that can coordinate activities across the entire program. Allowing k instances of the Intel Cilk Plus scheduler in a single program would create k times as many software threads as there are hardware threads. The program would operate inefficiently, because the machine would be oversubscribed by a factor of k, causing more context switching, cache contention, and memory consumption. Furthermore, Intel Cilk Plus’s efficient support for nested parallelism would be negated when nested parallelism arose from nested invocations of distinct schedulers.

The most practical way to create a program-wide singleton is a dynamic shared library that contains it. Of course, if the schedulers could cooperate, we would not need a singleton. But that cooperation requires a central agent through which to communicate; that is, a singleton!

Our decision to omit a statically linkable version of Intel Cilk Plus was strongly influenced by our OpenMP experience. Like Intel Cilk Plus, OpenMP also tries to schedule work across a whole program. A static version of the OpenMP run-time library was once provided, and it was a constant source of problems arising from duplicate schedulers. We think it best not to repeat that history. As indirect support for this reasoning, note that Microsoft Visual C++* provides OpenMP support only through dynamic libraries.

I write software <of a particular nature>. Is Intel Cilk Plus the right tool for me?

It depends on your application’s profile. The Intel Cilk Plus tasking model is strictly “fork and join”: all tasks spawned within a function must complete before that function returns. If you require “fire and forget” tasks, use some other mechanism, such as a separate thread, to run them.

Intel Cilk Plus is not intended to replace I/O threads, GUI threads, or general-purpose Windows* threads. It is best suited to computational tasks that do not frequently wait for I/O or other events in order to proceed.

Intel Cilk Plus does not provide synchronization primitives. What should I use?

In most cases you should try to program without synchronization primitives. Synchronization primitives can cause the following problems:

  • Deadlocks can occur if you are inconsistent about the order in which you acquire locks.
  • Performance problems can result from contention between worker threads for access to a locked region of code.
  • Even if you do everything right, you can end up with ordering issues since there’s no guarantee that your workers will execute in the same order each run.

Intel Cilk Plus provides reducers to solve many of these problems in a lock-free manner.

If you must use synchronization primitives, you can use those provided by Intel Threading Building Blocks (TBB), the synchronization APIs provided by your operating system, or the primitives provided by your version of C or C++, such as those in C++11 (for example, std::mutex and std::atomic).

Intel Cilk Plus does not provide a scalable memory manager. What library should I use?

There are many memory allocators that provide scalable support for multithreaded applications. Intel provides tbbmalloc as part of Intel Threading Building Blocks (TBB). To use the TBB scalable memory allocator, link with tbbmalloc_debug.{dll,so,dylib} for debug builds or tbbmalloc.{dll,so,dylib} for release builds.

If you only want to use the TBB scalable memory allocator without using the threading portions of TBB, you don’t have to initialize the TBB task scheduler.

What tools can help me write Intel Cilk Plus programs?

First, your application must be a correct serial program. Either build the serialization of your program (the program with cilk_spawn and cilk_sync removed and cilk_for replaced by for), or run it using a single worker. This lets you use all of your familiar development tools, such as your debugger, and tools such as Valgrind to check for memory leaks and memory overruns.

The first step in writing a parallel application is determining where your application spends most of its time. Use performance analysis tools to identify these hotspots, and then concentrate on parallelizing them. The Intel Cilk Plus Software Development Kit supplies additional tools to help you:

  • The Intel Cilk screen race detector (Cilkscreen) monitors the execution of an Intel Cilk Plus program. Cilkscreen reports all possible data races exposed by the test run, for any possible schedule.
  • The Intel Cilk view scalability analyzer (Cilkview) helps you understand the parallel performance of your Intel Cilk Plus program. Cilkview reports statistics about the parallelism your program exposes and predicts how its performance will scale on multiprocessor systems.

The Intel Cilk Plus SDK is available as a free supplemental download from the Cilk Tools section of the download page.

Do Intel® Thread Checker and Thread Profiler support Intel Cilk Plus?

Yes. Applications threaded with Cilk Plus can be analyzed with both Intel® Thread Checker and Intel® Thread Profiler.