Why Use Intel® Cilk™ Plus?

Why Use it?

Intel® Cilk™ Plus offers a quick, easy way to harness the power of both multicore and vector processing.


What is it?

Intel Cilk Plus is an extension to the C and C++ languages to support data and task parallelism.
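The sketch below shows the three task-parallel keywords in a tiny program. It assumes the compiler defines the `__cilk` macro when Cilk Plus is enabled (our understanding is that GCC's `-fcilkplus` and the Intel compiler do). Otherwise, the keywords are replaced by their serial elision, so the same file also builds with an ordinary C++ compiler. The function `squares` is a hypothetical example, not part of any Cilk Plus API.

```cpp
#include <cassert>
#include <vector>

#if defined(__cilk)
#include <cilk/cilk.h>
#else
// Serial elision (assumed fallback): each keyword becomes its serial equivalent.
#define cilk_spawn            // spawned call simply runs inline
#define cilk_sync             // nothing to wait for in serial code
#define cilk_for for          // parallel loop becomes an ordinary loop
#endif

// Task parallelism: the two recursive calls may run in parallel.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  // may be stolen and run on another worker
    long y = fib(n - 2);             // runs in the current strand
    cilk_sync;                       // wait for the spawned call before reading x
    return x + y;
}

// Data parallelism: iterations of a cilk_for may run in parallel,
// so each iteration writes only its own element.
std::vector<long> squares(int n) {
    std::vector<long> out(n);
    cilk_for (int i = 0; i < n; ++i) {
        out[i] = static_cast<long>(i) * i;
    }
    return out;
}
```

Because the spawned calls and loop iterations are independent, `fib(20)` returns 6765 and `squares(10)[9]` is 81 whether or not Cilk Plus is enabled.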


Primary Features

High Performance:

  • An efficient work-stealing scheduler provides nearly optimal scheduling of parallel tasks
  • Vector support puts the SIMD units already in your processors to work
  • Hyperobjects (such as reducers) let parallel strands update shared variables without locks
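As an illustration of hyperobjects, the following sketch sums values inside a `cilk_for` with `cilk::reducer_opadd`, using the classic interface (`+=` on the reducer and `get_value()`). Check the header shipped with your reducer library for the exact interface. The `__cilk` test and the small `reducer_opadd` stand-in in the `#else` branch are assumptions made so the sketch also compiles without Cilk Plus; the stand-in is hypothetical and purely serial. `sum_of_squares` is a hypothetical name.

```cpp
#include <cassert>

#if defined(__cilk)
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#else
#define cilk_for for
// Hypothetical serial stand-in for the small part of cilk::reducer_opadd used below.
namespace cilk {
template <typename T>
class reducer_opadd {
    T value_;
public:
    reducer_opadd() : value_() {}
    reducer_opadd& operator+=(const T& x) { value_ += x; return *this; }
    T get_value() const { return value_; }
};
}  // namespace cilk
#endif

// Each worker accumulates into its own view of the reducer; the views are
// combined with + when strands join, so no lock is needed around the update.
long sum_of_squares(int n) {
    cilk::reducer_opadd<long> sum;
    cilk_for (int i = 0; i < n; ++i) {
        sum += static_cast<long>(i) * i;
    }
    return sum.get_value();
}
```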

Easy to Learn:

  • Only 3 new keywords (cilk_spawn, cilk_sync, and cilk_for) to implement task parallelism
  • Serial semantics make understanding and debugging the parallel program easier
  • Array Notations provide a natural way to express data parallelism
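A minimal array-notation sketch follows. In Cilk Plus, a section is written `[start:length]`, and `__sec_reduce_add` reduces a section to a single value. The `__cilk` guard and the serial loop in the `#else` branch are assumptions added so the example also builds with compilers that lack the extension; `add_and_sum` is a hypothetical function.

```cpp
#include <cassert>

// Adds two arrays element-wise into c and returns the sum of c.
float add_and_sum(const float* a, const float* b, float* c, int n) {
#if defined(__cilk)
    c[0:n] = a[0:n] + b[0:n];         // whole-section operation, [start:length]
    return __sec_reduce_add(c[0:n]);  // built-in reduction over the section
#else
    // Equivalent serial loop for compilers without array notation.
    float total = 0.0f;
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
        total += c[i];
    }
    return total;
#endif
}
```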

Easy to Use:

  • Automatic load balancing provides good behavior in multi-programmed environments
  • Existing algorithms can be adapted for parallelism with minimal modification
  • Supports both C and C++ programmers
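To illustrate adapting an existing algorithm, the sketch below takes an ordinary recursive quicksort and parallelizes it by adding `cilk_spawn` to one recursive call and a `cilk_sync` before returning. The three-way partition and the name `parallel_quicksort` are our own choices, not taken from the Cilk Plus documentation, and the `__cilk` fallback again assumes the compiler signals Cilk Plus support with that macro.

```cpp
#include <algorithm>
#include <cassert>

#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn  // serial elision: spawned call runs inline
#define cilk_sync   // serial elision: nothing to wait for
#endif

// Sorts [begin, end) in ascending order.
void parallel_quicksort(int* begin, int* end) {
    if (end - begin < 2) return;
    int pivot = begin[(end - begin) / 2];
    // Three-way partition: [begin, mid1) < pivot, [mid1, mid2) == pivot, [mid2, end) > pivot.
    // The pivot itself lands in the middle range, so both recursive ranges shrink.
    int* mid1 = std::partition(begin, end, [pivot](int x) { return x < pivot; });
    int* mid2 = std::partition(mid1, end, [pivot](int x) { return x == pivot; });
    cilk_spawn parallel_quicksort(begin, mid1);  // left part may run on another worker
    parallel_quicksort(mid2, end);               // right part runs in this strand
    cilk_sync;                                   // both halves are sorted after this point
}
```

Deleting the two keywords gives back the original serial quicksort, which is the serial-semantics property listed above.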

Learn more


Threading with Intel® Cilk™ Plus

Intel® Cilk™ Plus: Part of the Intel C++ Compiler

Using Intel® Cilk™ Plus can improve application performance dramatically. In this example, the speed-up was 14×. How much faster could your code be?

Vectorization and Intel® Cilk™ Plus

SIMD, SSE4, and AVX: what do they mean, and why are they important to a software developer?

Parallelization with Intel® Cilk™ Plus, Part 2

David Mackay introduces basic concepts of threading with Intel® Cilk™ Plus. Cilk Plus provides a clean and elegant interface to parallel operations.

Intel Tools for Threaded Parallelism at Supercomputing

In this video, Intel's Ronald Green demonstrates the company's tools for threaded parallelism. Recorded at SC11 in Seattle.

Threading with Intel® Cilk™ Plus

Arch Robison, Cilk Plus architect, joins David Mackay to answer some basic questions for users.

Latest Posts

News
Forums
September 9, 2013: Version 1.2 of the Intel Cilk Plus Language Extension Spec is now available

In response to questions from customers, implementors, forum contributors, and developers, the Intel Cilk Plus Language Extension Specification has been revised.  The new specification (version 1.2) contains numerous corrections and clarifications.  No new features were added, but the existing features are much more precisely described.  A redlined copy showing changes from the 1.1 spec is also available.

http://www.cilkplus.org/download#open-specification

August 21, 2013: The Intel Cilk Plus SDK now includes support for Mac OS

Build 3566 of the Intel Cilk Plus SDK was released to the Cilk Plus website on Tuesday, 20-Aug-2013. It introduces support for Mac OS, in addition to the traditional support for Windows* and Linux*.

http://www.cilkplus.org/download

August 8, 2013: GCC 4.8 binaries now available

Binary versions of the GNU Compiler Collection (GCC) C and C++ 4.8 compilers with the cilkplus extension are now available from the Download page.  The binaries support the x86-32 and x86-64 architectures on Ubuntu* Linux*. The source for these compilers is available at http://gcc.gnu.org/svn/gcc/branches/cilkplus-4_8-branch  .

http://www.cilkplus.org/download#gcc-development-branch


Hello,

I have code that is structured like this:

float A[3], X[M], Y[M], Z[M], OUTX[N], OUTY[N], OUTZ[N];

for (i = 0; i < N; i++) {
  // Use other arrays and i as an index to these arrays to initialize A[0], A[1], A[2]
  for (j = 0; j < M; j++) {
    // Calculate new values for A[0], A[1], A[2]
    // using more arrays where i and/or j are used as indexes

    X[j] += A[0];
    Y[j] += A[1];
    Z[j] += A[2];
  }
  OUTX[i] = A[0];
  OUTY[i] = A[1];
  OUTZ[i] = A[2];
}

I have successfully parallelized the outer loop using OpenMP, making the array A private and adding the atomic directive before the updates to the elements of X, Y and Z (using critical was actually worse). But now I would like to try this code out using Cilk Plus.

Although I have read all the documentation about reducers and reduction operations in Cilk Plus, I still cannot formulate in my mind how the above code could be implemented in Cilk Plus. I would like to replace the outer loop with a cilk_for and have some way to make the reductions for the elements in arrays X, Y and Z. Could anyone direct me towards the right solution?

Thank you,

Ioannis E. Venetis

I'm having difficulty running a simple test case using cilk_spawn.  I'm compiling under gcc 4.9.0 20130520.

The following fib2010.cpp example executes in 0.028s without Cilk and takes 0.376s with Cilk, as long as I set the number of workers to 1. If I change the number of workers to any number greater than one, I get a segmentation fault.

// fib2010.1.cpp
//

#include <iostream>
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>

int fib(int n)
{
    if (n < 2)
        return n;

    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}

int main(int argc, char* argv[])
{
    std::cout << "No of workers = " << __cilkrts_get_nworkers() << std::endl;
    int n = 32;
    std::cout << "fib(" << n << ") = " << fib(n) << std::endl;
}

 

The hardware is a dual-core AMD Opteron 8220.

Hi,

I'm having difficulty comparing cilk_for with cilk_spawn. The following cilk_spawn code executes as I expect for command-line arguments like 1000000 30:

// Recursive Implementation of Map
// r_map.3.cpp

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <cmath>
#include <cilk/cilk.h>

const double pi = 3.14159265;

template<typename T>
class AddSin {
	T* a;
	T* b;
  public:
    AddSin(T* a_, T* b_) : a(a_), b(b_) {}
    void operator()(int i) { a[i] = b[i] + std::sin(pi * (double) i / 180.) + std::cos(pi * (double) i / 180.) + std::tan(pi * (double) i / 180.); }
};

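template <typename Func>
void r_map(int low, int high, int grain, Func f) {
	// Base case: the range is small enough (or cannot be split further), so run it serially.
	if (high - low <= grain || high - low < 2)
		for (int i = low; i < high; i++)
			f(i);
	else {
		int mid = low + (high - low) / 2;
		cilk_spawn r_map(low, mid, grain, f);  // lower half may run on another worker
		r_map(mid, high, grain, f);            // upper half runs in the current strand
		cilk_sync;                             // wait for the lower half to finish
	}
}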

int main(int argc, char** argv) {
	if (argc != 3) {
		std::cerr << "Incorrect number of arguments\n";
		return 1;
	}
	int n = std::atoi(argv[1]);
	int g = std::atoi(argv[2]);
	int* a = new int[n];
	int* b = new int[n];
	for (int i = 0; i < n; i++) {
		a[i] = b[i] = 1;
	}
	clock_t cs = clock();
	r_map(0, n, g, AddSin<int>(a, b));
	clock_t ce = clock();
	std::cout << (ce - cs) / (double)CLOCKS_PER_SEC << std::endl;  // processor time in seconds
	delete [] a;
	delete [] b;
}

If I replace the body of r_map with a simple cilk_for loop and set the CILK_NWORKERS environment variable to more than 1, this code generates segmentation faults once my command-line arguments exceed 36000 30:

// Recursive Implementation of Map
// r_map.2.cpp

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <cmath>
#include <cilk/cilk.h>

const double pi = 3.14159265;

template<typename T>
class AddSin {
	T* a;
	T* b;
  public:
    AddSin(T* a_, T* b_) : a(a_), b(b_) {}
    void operator()(int i) { a[i] = b[i] + std::sin(pi * (double) i / 180.) + std::cos(pi * (double) i / 180.) + std::tan(pi * (double) i / 180.); }
};

template <typename Func>
void r_map(int low, int high, int grain, Func f) {
	cilk_for (int i = low; i < high; i++)
		f(i);
}

int main(int argc, char** argv) {
	if (argc != 3) {
		std::cerr << "Incorrect number of arguments\n";
		return 1;
	}
	int n = std::atoi(argv[1]);
	int g = std::atoi(argv[2]);
	int* a = new int[n];
	int* b = new int[n];
	for (int i = 0; i < n; i++) {
		a[i] = b[i] = 1;
	}
	clock_t cs = clock();
	r_map(0, n, g, AddSin<int>(a, b));
	clock_t ce = clock();
	std::cout << (ce - cs) / (double)CLOCKS_PER_SEC << std::endl;  // processor time in seconds
	delete [] a;
	delete [] b;
}

I'm compiling using GCC 4.9.0 20130520.

Can you explain why cilk_spawn works while cilk_for does not?

Intel Cilk Plus is written with three new keywords and requires fewer source code modifications than Intel® Threading Building Blocks but is really powerful. Cilk Plus also comes with an array language extension to C/C++, providing array section notations for SIMD vector parallelism and parallel function maps for multi-threading.
Brian Reynolds
Brian Reynolds Research

Documentation

Reference Manual
FAQs
API and Reducer Library
Open Specifications

Resources

Tutorial
Intel Cilk Plus Forum
Cilk Tools
Code Samples
Sample Applications
Contributed Code
Experimental Software
GCC Binaries
Papers and Presentations

 

Structured Parallel Programming:
Patterns for Efficient Computation
