Piper: Experimental Language Support for Pipeline Parallelism in Intel® Cilk™ Plus

We have implemented Piper, an experimental prototype of Intel® Cilk™ Plus that provides library headers and runtime support for pipe-while loops.   Pipe-while loops are a new parallel loop construct described in a recent paper on On-the-fly pipeline parallelism, which was published in July 2013 in collaboration with researchers at MIT.    The Piper prototype enables users to write parallel applications that use pipe-while loops and experiment with this new language construct.    To download the Piper prototype, visit the Experimental Software section on the Download page.

 

Pipe-While Loops

A pipe-while loop is a generalization of an ordinary while loop that allows for pipeline parallelism between iterations.   Based on this research, we propose a new pipe-while loop construct for Intel Cilk Plus, which is used as follows:
  1. A programmer specifies a pipe-while loop using the cilk_pipe_while keyword.  Iterations of the pipe-while loop may potentially execute in parallel, subject to a few additional constraints.
  2. Each iteration of a pipe-while loop is divided into stages using the cilk_stage and cilk_stage_wait keywords.
  3. A statement cilk_stage(s) in an iteration advances execution of the iteration to stage s.  
  4. A statement cilk_stage_wait(s) in an iteration advances execution of the iteration to stage s, but the stage s in the iteration can begin executing only after the previous iteration has finished its stage s.   Thus, a cilk_stage_wait creates a dependency on the corresponding stage from the previous iteration.

Existing Intel Cilk Plus constructs, that is, cilk_forcilk_spawn, and cilk_sync, are designed to express only parallel computations with fork-join dependencies.   Pipe-while loops extend the capabilities of Intel Cilk Plus by enabling programmers to also write parallel computations with a pipeline grid dependency structure.   For more details on the proposed pipe-while language construct, see the Piper reference guide.

Examples

One simple use-case for a pipe-while loop is for constructing simple static pipelines, similar to the parallel pipeline construct in Intel® Threading Building Blocks (TBB).   For example, the following pipe-while loop in Figure 1 illustrates a simple 5-stage "SPSPS" pipeline.   Stages 0, 2, and 4 are serial stages, or S-stages, meaning that each that stage of each iteration is processed in order.   In contrast, stages 1 and 3 are parallel stages, or P-stages, meaning that there is no dependency between corresponding stages of different iterations.

 
bool done = false;
int iter_counter = 0;
cilk_pipe_while(!done) { // Each iteration starts executing in Stage 0.
    int i = iter_counter++;
    done = stage0(i);

    cilk_stage(1);       // Advance to Stage 1 (parallel stage)
    stage1(i);

    cilk_stage_wait(2);  // Advance to Stage 2 (serial stage)
    stage2(i);

    cilk_stage(3);       // Advance to Stage 3 (parallel stage)
    stage3(i);
  
    cilk_stage_wait(4);  // Advance to Stage 4 (serial stage)
    stage4(i);
}
Figure 1:  A pipe-while loop for a simple 5-stage SPSPS pipeline.   The cilk_pipe_while keyword indicates the loop is a pipe-while.   In this example, the cilk_stage keyword is used to start each parallel stage.  The cilk_stage_wait keyword is used to start each serial stage except stage 0, which is always serial and starts when the iteration begins.
 

Figure 2 shows the example dag (directed acyclic graph) for the SPSPS pipeline for Figure 1.   Note that the pipeline loop in Figure 1 can be sped up by a parallel execution if the average work of all the parallel stages is much larger than the average work of the serial stages.     

 
Pipeline dag for SPSPS pipeline from Figure 1. 

Figure 2:  Pipeline dag for SPSPS pipeline from Figure 1.   Each column of circles in the grid represents a single iteration, and each circle represents a single stage in that iteration.

Pipe-while loops are also able to express more complex pipelines.  For example, since the number of stages in an iteration of a pipe-while loop can be data-dependent, a pipe-while loop can be used to encode the dependencies of a dynamic program on an n by m grid, as shown in Figure 3.  This example is described in greater detail in the Piper reference guide.

 

Pipeline dag for a dynamic program computed on an n by m grid.

Figure 3:  Pipeline dag for a dynamic program computed on an n by m grid.   Each cell [i, j] in the grid can be computed based on the result of its neighboring cells [i-1, j], [i, j-1], and [i-1, j-1].  Since the last dependency on [i-1, j-1] is implicitly satisfied by the other two dependencies, diagonal edges are unnecessary in the dag.

Moreover, a pipe-while loop iteration can "skip" over stages, and the choice of whether an iteration executes a cilk_stage or cilk_stage_wait for a given stage can be made dynamically at runtime.   This flexibility enables one to encode  a complex pipeline dag needed by an application such as an x264 video encoder, shown in Figure 4.  For more details, see the full paper describing pipe-while loops.

 

Pipeline dag for an X264 encoder.

Figure 4:  Pipeline dag for an x264 encoder.    Each iteration processes a frame in the video.  The first stage in each iteration decides whether the frame being processed is classified as an “I” or “P” frame.

 

Piper Prototype

We have implemented Piper, an experimental prototype of Intel® Cilk™ Plus that provides a library headers and runtime support for pipe-while loops.    Unfortunately, supporting pipe-while loops in the language requires compiler support to handle the new keywords.   Since compiler support is not yet available, Piper instead provides a header file with preprocessor macros that users can include to code pipe-while loops.   For example, to implement the pipe-while loop described in Figure 1 for an SPSPS pipeline, programmers can write the code shown in Figure 5 using Piper.

 

bool done = false;
int iter_counter = 0;
CILK_PIPE_WHILE_BEGIN(!done) { // Each iteration starts in Stage 0.
    int i = iter_counter++;

    done = stage0(i);

    CILK_STAGE(1);      // Advance to Stage 1 (parallel stage)
    stage1(i);

    CILK_STAGE_WAIT(2); // Advance to Stage 2 (serial stage)
    stage2(i);

    CILK_STAGE(3);      // Advance to Stage 3 (parallel stage)
    stage3(i);

    CILK_STAGE_WAIT(4)// Advance to Stage 4 (serial stage)
    stage4(i);

CILK_PIPE_WHILE_END();
Figure 5:  The equivalent implementation using Piper of the example proposed pipe-while loop shown in Figure 1.
 

For more information about downloading, installing, and using the Piper prototype, see the visit the Experimental Software section on the Download page.