Intel® Cilk™ Plus – an extension to the C and C++ languages to support data and task parallelism – is being deprecated in the 2018 release of Intel® Software Development Tools. It will remain in deprecation mode in the Intel® C++ Compiler for an extended period of two years. It is highly recommended that you start migrating to standard parallelization models such as OpenMP* and Intel Threading Building Blocks (Intel TBB). For more information see Migrate Your Application to use OpenMP* or Intel TBB Instead of Intel® Cilk™ Plus. Research into Cilk technology continues at MIT's Cilk Hub.
Intel® Cilk™ Plus is the easiest, quickest way to harness the power of both multicore and vector processing.
Intel Cilk Plus is an extension to the C and C++ languages to support data and task parallelism.
High Performance:
Easy to Learn:
Easy to Use:
Build 4501 of the Intel Cilk Plus SDK was released on Monday, 01-23-2017. This version includes 64-bit Cilk screen/view binaries for Linux* and OS X* operating systems.
A new version of the open-source Intel Cilk Plus runtime library (build 4467) is now available for download. This version contains a build fix for Cygwin* and SPARC* support, both submitted by the community.
Build 4421 of the Intel Cilk Plus SDK was released on Wednesday, 11-11-2015. This version supports the latest versions of Linux*, OS X*, and Windows* operating systems.
Hello,
I have code that is structured like this:
    float A[3], X[M], Y[M], Z[M], OUTX[N], OUTY[N], OUTZ[N];

    for (i = 0; i < N; i++) {
        // Use other arrays and i as an index to these arrays to initialize A[0], A[1], A[2]
        for (j = 0; j < M; j++) {
            // Calculate new values for A[0], A[1], A[2]
            // using more arrays where i and/or j are used as indexes
            X[j] += A[0];
            Y[j] += A[1];
            Z[j] += A[2];
        }
        OUTX[i] = A[0];
        OUTY[i] = A[1];
        OUTZ[i] = A[2];
    }
I have successfully parallelized the outer loop using OpenMP, making the array A private and adding the atomic directive before the updates to the elements of X, Y and Z (using critical was actually worse). But now I would like to try this code out using Cilk Plus.
Although I have read all the documentation about reducers and reduction operations in Cilk Plus, I still cannot formulate in my mind how the above code could be implemented in Cilk Plus. I would like to replace the outer loop with a cilk_for and have some way to make the reductions for the elements in arrays X, Y and Z. Could anyone direct me towards the right solution?
Thank you,
Ioannis E. Venetis
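One way to picture a reduction over the elements of X, Y and Z is the privatize-then-merge pattern: each chunk of outer iterations accumulates into its own private copy of the array, and the copies are summed afterwards. The sketch below shows that pattern for X only, in plain C++ with the chunks processed serially so it builds without Cilk support. The names accumulate_chunked and contribution are hypothetical, and contribution stands in for whatever the original loop computes into A[0]. Treating the loop over chunks as the place a cilk_for would go is an assumption, not a tested Cilk Plus solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-(i, j) value the original computes into A[0].
inline float contribution(std::size_t i, std::size_t j) {
    return static_cast<float>(i + j);
}

// Adds contribution(i, j) into X[j] for every i in [0, n), using one private
// accumulator array per chunk of the outer loop and merging them at the end.
std::vector<float> accumulate_chunked(std::size_t n, std::size_t m,
                                      std::size_t chunks) {
    assert(chunks > 0);
    std::vector<std::vector<float>> partial(chunks,
                                            std::vector<float>(m, 0.0f));

    // Each chunk writes only to partial[c], so the chunks are independent of
    // one another (assumption: this is the loop a cilk_for would replace).
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;
        for (std::size_t i = begin; i < end; ++i) {
            for (std::size_t j = 0; j < m; ++j) {
                partial[c][j] += contribution(i, j);
            }
        }
    }

    // Merge step: sum the private copies into the shared result.
    std::vector<float> X(m, 0.0f);
    for (std::size_t c = 0; c < chunks; ++c) {
        for (std::size_t j = 0; j < m; ++j) {
            X[j] += partial[c][j];
        }
    }
    return X;
}
```

Calling accumulate_chunked(N, M, chunks) with chunks set near the number of workers keeps the extra memory to chunks * M floats. Because floating-point addition is not associative, the merged sums can differ in the last bits from the strictly serial order.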
I'm having difficulty running a simple test case using cilk_spawn. I'm compiling under gcc 4.9.0 20130520.
The following fib2010.cpp example executes in 0.028s without Cilk and takes 0.376s with Cilk, as long as I set the number of workers to 1. If I change the number of workers to any number greater than 1, I get a segmentation fault.
    // fib2010.1.cpp
    //
    #include <iostream>
    #include <cilk/cilk.h>
    #include <cilk/cilk_api.h>

    int fib(int n) {
        if (n < 2) return n;
        int x = cilk_spawn fib(n-1);
        int y = fib(n-2);
        cilk_sync;
        return x + y;
    }

    int main(int argc, char* argv[]) {
        std::cout << "No of workers = " << __cilkrts_get_nworkers() << std::endl;
        int n = 32;
        std::cout << "fib(" << n << ") = " << fib(n) << std::endl;
    }
The hardware is Dual Core AMD Opteron 8220.
Hi,
I'm having difficulty comparing cilk_for with cilk_spawn. The following cilk_spawn code executes as I expect for command line arguments like 1000000 30
    // Recursive Implementation of Map
    // r_map.3.cpp
    #include <iostream>
    #include <iomanip>
    #include <cstdlib>
    #include <ctime>
    #include <cmath>
    #include <cilk/cilk.h>

    const double pi = 3.14159265;

    template<typename T>
    class AddSin {
        T* a;
        T* b;
    public:
        AddSin(T* a_, T* b_) : a(a_), b(b_) {}
        void operator()(int i) {
            a[i] = b[i] + std::sin(pi * (double) i / 180.)
                        + std::cos(pi * (double) i / 180.)
                        + std::tan(pi * (double) i / 180.);
        }
    };

    template <typename Func>
    void r_map(int low, int high, int grain, Func f) {
        if (high - low <= grain)
            for (int i = low; i < high; i++)
                f(i);
        else {
            int mid = low + (high - low) / 2;
            cilk_spawn r_map(low, mid, grain, f);
        }
    }

    int main(int argc, char** argv) {
        if (argc != 3) {
            std::cerr << "Incorrect number of arguments\n";
            return 1;
        }
        int n = std::atoi(argv[1]);
        int g = std::atoi(argv[2]);

        int* a = new int[n];
        int* b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = b[i] = 1;
        }

        clock_t cs = clock();
        r_map(0, n, g, AddSin<int>(a, b));
        clock_t ce = clock();
        std::cout << ce - cs / (double)CLOCKS_PER_SEC << std::endl;

        delete [] a;
        delete [] b;
    }
If I replace the body of r_map with a simple cilk_for loop and set the environment variable for the number of workers to more than 1, this code generates segmentation faults once my command line arguments exceed 36000 30
    // Recursive Implementation of Map
    // r_map.2.cpp
    #include <iostream>
    #include <iomanip>
    #include <cstdlib>
    #include <ctime>
    #include <cmath>
    #include <cilk/cilk.h>

    const double pi = 3.14159265;

    template<typename T>
    class AddSin {
        T* a;
        T* b;
    public:
        AddSin(T* a_, T* b_) : a(a_), b(b_) {}
        void operator()(int i) {
            a[i] = b[i] + std::sin(pi * (double) i / 180.)
                        + std::cos(pi * (double) i / 180.)
                        + std::tan(pi * (double) i / 180.);
        }
    };

    template <typename Func>
    void r_map(int low, int high, int grain, Func f) {
        cilk_for (int i = low; i < high; i++)
            f(i);
    }

    int main(int argc, char** argv) {
        if (argc != 3) {
            std::cerr << "Incorrect number of arguments\n";
            return 1;
        }
        int n = std::atoi(argv[1]);
        int g = std::atoi(argv[2]);

        int* a = new int[n];
        int* b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = b[i] = 1;
        }

        clock_t cs = clock();
        r_map(0, n, g, AddSin<int>(a, b));
        clock_t ce = clock();
        std::cout << ce - cs / (double)CLOCKS_PER_SEC << std::endl;

        delete [] a;
        delete [] b;
    }
I'm compiling using GCC 4.9.0 20130520.
Can you explain why cilk_spawn works while cilk_for does not?
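When comparing the two versions, note that a divide-and-conquer map normally recurses into both halves of the range, whereas the r_map.3.cpp listing descends only into [low, mid). Also, the expression ce - cs / (double)CLOCKS_PER_SEC divides only cs, because division binds tighter than subtraction. The sketch below is plain serial C++, so it builds without Cilk support, and comments mark where cilk_spawn and cilk_sync would presumably go. The names map_range and elapsed_seconds are hypothetical and not part of the original code. It does not attempt to explain the reported segmentation fault.

```cpp
#include <cassert>
#include <ctime>
#include <vector>

// Hypothetical helper: applies f to every index in [low, high) by recursive
// halving, switching to a plain loop once the range is at most `grain` long.
template <typename Func>
void map_range(int low, int high, int grain, Func f) {
    assert(grain > 0);
    if (high - low <= grain) {
        for (int i = low; i < high; i++) f(i);
    } else {
        int mid = low + (high - low) / 2;
        map_range(low, mid, grain, f);   // in Cilk Plus: cilk_spawn on this call
        map_range(mid, high, grain, f);  // upper half, run by the caller itself
        // in Cilk Plus: cilk_sync here (also implicit at function exit)
    }
}

// Elapsed CPU time in seconds between two clock() readings. The parentheses
// make the subtraction happen before the division.
double elapsed_seconds(std::clock_t start, std::clock_t end) {
    return (end - start) / (double)CLOCKS_PER_SEC;
}
```

With this shape, every index in [0, n) is visited exactly once regardless of the grain size, which makes timings comparable with the cilk_for version.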