Thursday, February 20, 2014

ParaSail Work Stealing speeds up

We are now releasing revision 5.2 of the ParaSail compiler and interpreter, which incorporates a new implementation of work stealing.  Links to both binaries and sources can be found at:

in the Documentation and Download section of the web page.

Here is the new entry in the release notes for revision 5.2:

* [rev 5.2] Re-implementation of work stealing to reduce contention between processors.  Each server now has a private double-ended queue (deque) of pico-threads, along with the shared triplet of deques which has existed before.  This produced another two times speedup (in addition to the rev 4.9 improvements), thereby speeding up execution by four times or more since rev 4.8. The main procedures for each language (ParaSail, Sparkel, etc.) have been refactored to share more common code. Allow a command-line flag "-servers nnn" to specify the number of heavy-weight server threads to use.  Default is 6. Command can also be given interactively, as "servers nnn". Interactively it only works to increase the number; there is no way to shut down servers that already have been activated.

Wednesday, February 19, 2014

Worklist Algorithms and Parallel Iterators

ParaSail has some very general parallel iterator constructs.  One of the more unusual involves the use of a continue statement to initiate a new iteration of an outer loop statement, as illustrated in this breadth-first search of a graph:

        var Seen : Array<Atomic<Boolean>, Indexed_By => Node_Id> :=
          [for I in 1 .. |DGraph.G| => Create(#false)];

        for Next_Set => Root_Set loop  // Start with the initial set of roots
            for N in Next_Set concurrent loop  // Look at each node in set
                if not Value(Seen[N]) then
                    Set_Value(Seen[N], #true);
                    if Is_Desired_Node(DGraph.G[N]) then // Found a node
                        return N;
                    end if;
                    // Continue outer loop with successors of this node
                    continue loop Outer with Next_Set => DGraph.G[N].Succs;
                end if;
            end loop;
        end loop Outer;
        // No node found
        return null;

It has been a bit hard to explain what it means for an inner concurrent loop to continue an outer loop.   More recently, we have developed an alternative explanation.  The basic idea is to appeal to the notion of a worklist algorithm.  Various (sequential) algorithms are organized around the use of a list of work items to be processed, with the overall algorithm continuing until the worklist is empty, or, for a search, until an item of interest is found.  For example, the work-list approach to data flow analysis algorithms is described here:
Here is another worklist-based algorithm, for solving a constraint satisfaction problem using the Arc Consistency approach:
Finally, here is a breadth-first graph search algorithm, which uses a work queue instead of a work list:
So one way of explaining this kind of ParaSail iterator is as a built-in work-list-based iterator, where the loop keeps going until there are no more work items to process, and continue adds a new work-item to the worklist.  This approach makes it clear that such iterators can work even if there is only one physical processor available, and also suggests that the work items are not full threads of control, but rather light-weight representations of some work to be done.  This terminology of work items and work list or work queue also matches well with the terminology of the underlying work stealing scheduling mechanism in use.

We used to refer to such ParaSail's parallel iterators as a bag of threads, but now it seems easier to describe each iterator as having its own list of work items (i.e. a worklist) and the loop keeps going until the worklist is empty, or until we encounter a return or exit (aka break) statement.

One interesting detail at this point is that for a breadth-first search, you want the work list/queue to be processed in a FIFO manner, while for a depth-first search, you want to process the work list/queue as a stack, in a LIFO manner.  The work-stealing scheduler we use does some of both, using FIFO when stealing, and LIFO when the work-items are served by the same core that generated them.  This argues for perhaps giving more control to the programmer when using an explicit continue statement, to be able to specify FIFO or LIFO.  However, it is not entirely clear if that added complexity is worthwhile.