Parallel Programming

by Richard Carr, published at http://www.blackwasp.co.uk/ParallelProgramming.aspx

This is the first in a series of articles introducing the parallel programming techniques that are available in the C# programming language and the .NET framework version 4.0. The first part describes some of the concepts of parallel programming.


Blocking

To avoid synchronisation problems you may decide to use locks. A task can request a lock to prevent other tasks from entering a section of code or accessing a shared state variable while it holds that lock. This technique can be used to synchronise threads and prevent race conditions. When a task requests a lock that has already been granted to another task, the requesting task stops executing and waits for the lock to be released. The stopped task is said to be blocked. Usually the blocked task will eventually obtain the lock and continue working as normal. However, if there is excessive blocking, some processors may become idle because they are starved of work, which reduces performance.
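As a minimal sketch of the idea, assuming C# 4.0 or later, the hypothetical LockDemo class below starts several tasks that all increment one shared counter. The lock statement lets only one task at a time run the increment; any other task that reaches it is blocked until the lock is released. The class and method names are illustrative only.

```csharp
using System;
using System.Threading.Tasks;

public static class LockDemo
{
    // Starts taskCount tasks; each adds 1 to a shared counter
    // incrementsPerTask times, guarding the update with a lock.
    public static int IncrementWithLock(int taskCount, int incrementsPerTask)
    {
        object counterLock = new object();
        int counter = 0;
        Task[] tasks = new Task[taskCount];

        for (int t = 0; t < taskCount; t++)
        {
            tasks[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < incrementsPerTask; i++)
                {
                    // Only one task can hold counterLock at a time.
                    // A task that finds it taken is blocked here.
                    lock (counterLock)
                    {
                        counter++;
                    }
                }
            });
        }

        // Wait for every task before reading the final value.
        Task.WaitAll(tasks);
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementWithLock(4, 100000)); // 400000
    }
}
```

Because counter++ is a read-modify-write operation, removing the lock would allow increments from different tasks to interleave, and the total would usually come out lower than expected.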

Deadlocking

Deadlocking is an extreme form of blocking involving two or more tasks. In the simplest case, two tasks each hold a lock that the other needs, so each is blocked by the other. Because each task will not continue until the other has released its lock, the deadlock cannot be broken and both tasks may remain blocked forever.
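The sketch below, using illustrative names of our own choosing, reproduces the simplest deadlock described above: two tasks take the same two locks in opposite orders. A Barrier ensures that both tasks hold their first lock before either asks for its second. So that the example cannot hang forever, the second request uses Monitor.TryEnter with a timeout instead of a plain lock; each task gives up and reports failure rather than waiting indefinitely.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DeadlockDemo
{
    // Returns true if neither task could obtain its second lock,
    // i.e. the two tasks would have deadlocked without the timeout.
    public static bool TasksWouldDeadlock(int timeoutMs)
    {
        object lockA = new object();
        object lockB = new object();

        // The barrier makes both tasks hold their first lock
        // before either one asks for its second lock.
        using (Barrier bothHoldFirstLock = new Barrier(2))
        {
            // LongRunning hints that each task should get its own thread,
            // so both can reach the barrier even on a machine with few cores.
            Task<bool> first = Task.Factory.StartNew(
                () => TakeTwoLocks(lockA, lockB, bothHoldFirstLock, timeoutMs),
                TaskCreationOptions.LongRunning);
            Task<bool> second = Task.Factory.StartNew(
                () => TakeTwoLocks(lockB, lockA, bothHoldFirstLock, timeoutMs),
                TaskCreationOptions.LongRunning);

            return !first.Result && !second.Result;
        }
    }

    private static bool TakeTwoLocks(object outer, object inner, Barrier barrier, int timeoutMs)
    {
        lock (outer)
        {
            barrier.SignalAndWait();

            // A plain lock (inner) here would block forever, because the
            // other task holds 'inner' while waiting for 'outer'.
            // TryEnter gives up after timeoutMs instead.
            if (Monitor.TryEnter(inner, timeoutMs))
            {
                Monitor.Exit(inner);
                return true;
            }
            return false;
        }
    }

    public static void Main()
    {
        // Prints True when both second-lock requests time out.
        Console.WriteLine(TasksWouldDeadlock(1000));
    }
}
```

Having every task acquire locks in the same order (for example, always lockA before lockB) is one common way to prevent this situation from arising.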

Parallel Programming in .NET 4.0

Over the course of this tutorial we will look at the parallel programming classes provided by the .NET framework. These are grouped into two libraries: the Task Parallel Library (TPL) and Parallel LINQ (PLINQ), the parallel version of Language-Integrated Query.

Task Parallel Library

The Task Parallel Library provides parallelism based upon both data and task decomposition. Data parallelism is simplified by parallel versions of the for and foreach loops, the Parallel.For and Parallel.ForEach methods, which automatically decompose the data and distribute the iterations across all available processor cores.
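A short sketch of both loop styles follows, assuming .NET 4.0 or later; the class and method names are hypothetical. Parallel.For replaces a for loop over an index range, and Parallel.ForEach replaces a foreach loop over a sequence. In both cases the loop body is supplied as a lambda expression and the framework decides how to split the iterations between cores.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DataParallelDemo
{
    // Returns a new array holding the square of each input value.
    // Parallel.For splits the index range into chunks and runs the
    // chunks on the available cores.
    public static int[] SquareAll(int[] values)
    {
        int[] results = new int[values.Length];
        Parallel.For(0, values.Length, i =>
        {
            // Each iteration writes only its own element, so no lock is needed.
            results[i] = values[i] * values[i];
        });
        return results;
    }

    // Parallel.ForEach works the same way over any IEnumerable<T>.
    public static long TotalLength(string[] words)
    {
        long total = 0;
        Parallel.ForEach(words, word =>
        {
            // Several iterations may update total at once, so use an atomic add.
            Interlocked.Add(ref total, word.Length);
        });
        return total;
    }

    public static void Main()
    {
        int[] squares = SquareAll(new[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", squares));                      // 1, 4, 9, 16
        Console.WriteLine(TotalLength(new[] { "parallel", "for", "each" })); // 15
    }
}
```

Note that iterations may run in any order, so the loop bodies are written so that the result does not depend on that order.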

Task parallelism is provided by new classes that allow tasks to be defined using lambda expressions. You can create tasks and let the .NET framework determine when they will execute and which of the available processors will perform the work.
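The following minimal sketch, with hypothetical names, defines two unrelated calculations as tasks using lambda expressions. It uses Task.Factory.StartNew, which is available in .NET 4.0; the default scheduler, not the calling code, chooses when each task runs and on which thread.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskParallelDemo
{
    // Runs two unrelated calculations as separate tasks.
    public static long[] RunTwoTasks()
    {
        // Each lambda defines a task. StartNew queues it to the default
        // scheduler, which picks when and on which thread it runs.
        Task<long> sumTask = Task.Factory.StartNew(() =>
        {
            long sum = 0;
            for (int i = 1; i <= 1000; i++) sum += i;
            return sum;
        });

        Task<long> sumOfSquaresTask = Task.Factory.StartNew(() =>
        {
            long sum = 0;
            for (int i = 1; i <= 100; i++) sum += (long)i * i;
            return sum;
        });

        // Reading Result waits until that task has completed.
        return new[] { sumTask.Result, sumOfSquaresTask.Result };
    }

    public static void Main()
    {
        long[] results = RunTwoTasks();
        Console.WriteLine(results[0]); // 500500
        Console.WriteLine(results[1]); // 338350
    }
}
```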

The TPL provides an imperative form of parallel programming. Whether you decide to use data decomposition or task decomposition, your code defines exactly how your algorithm works.

Parallel LINQ

The parallel LINQ approach is declarative rather than imperative, in the same way as the sequential version of LINQ. This approach to parallelism is higher level than that provided by the TPL. It allows you to use the standard query operators, which you should already be familiar with, whilst automatically assigning work to be carried out simultaneously by the available processors.
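To illustrate the declarative style, here is a small sketch with a hypothetical method name. The query is ordinary LINQ; calling AsParallel on the source turns it into a PLINQ query, and AsOrdered asks PLINQ to keep results in source order, which it otherwise does not guarantee. The code states what to compute, not how to split the work.

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    // Squares of the even numbers from 1 to upTo, computed in parallel.
    public static int[] EvenSquares(int upTo)
    {
        return Enumerable.Range(1, upTo)
            .AsParallel()           // switch to PLINQ
            .AsOrdered()            // preserve source order in the results
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvenSquares(10))); // 4, 16, 36, 64, 100
    }
}
```

Removing the AsParallel and AsOrdered calls gives the equivalent sequential LINQ query, which shows how little the code changes between the two.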

Benefits

The new parallel programming functionality in the .NET framework provides several benefits that make it preferable to standard multi-threading. When manually creating threads, you may create too many, leading to excessive context switching that harms performance. You may also create too few, leaving processors idle. These are some of the key problems that the new classes aim to address.

Both the TPL and PLINQ provide automatic data decomposition. Although you can control decomposition yourself, the standard behaviour is usually sufficient, and it adapts as the work progresses. After the work has been decomposed and allocated, the activity of each processor is continually monitored. If the work assigned to one processor turns out to be more time-consuming than that of another, a work-stealing algorithm transfers work from the busy processor to the under-utilised one.
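Work stealing happens inside the framework and needs no code of its own. Taking control of decomposition, however, can be sketched with Partitioner.Create from System.Collections.Concurrent, which splits an integer range into fixed-size sub-ranges that Parallel.ForEach then hands out to workers. The method name and the range size used below are illustrative assumptions, not recommended values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionDemo
{
    // Sums the integers 0..count-1 using explicit range partitioning.
    public static long SumWithRanges(int count, int rangeSize)
    {
        long total = 0;

        // Partitioner.Create(from, to, rangeSize) yields Tuple<int, int>
        // ranges [Item1, Item2) of at most rangeSize elements each.
        var ranges = Partitioner.Create(0, count, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            // Sum one range locally, without any synchronisation.
            long localSum = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                localSum += i;
            }
            // One atomic add per range rather than one per element.
            Interlocked.Add(ref total, localSum);
        });

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithRanges(1000000, 10000)); // 499999500000
    }
}
```

Larger ranges reduce scheduling overhead per element, while smaller ranges give the framework more opportunity to rebalance work between cores.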

It is important to understand that the new libraries provide potential parallelism. With standard multi-threading, a new thread starts its work as soon as you launch it, which might not be the most efficient use of the available processors. The parallel libraries may run work on additional threads if processor cores are available. If they are not, tasks may be postponed until a core becomes free or until the result of the operation is actually needed.

Finally, the new libraries mean that you need not worry about the number of cores available today or on future computers. All of the available cores will be used as required. If the code is executed on a single-processor machine, it will run mostly sequentially. The parallel libraries introduce a little overhead, so parallel code running on a single-core machine will run more slowly than purely sequential code. However, this impact is minor compared with the benefits gained.
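A rough way to see this, sketched here with hypothetical names, is to run the same Parallel.For loop twice: once with default options, and once with ParallelOptions.MaxDegreeOfParallelism set to 1, which loosely approximates a single-core machine by letting only one iteration run at a time. The loop body never refers to the number of cores, yet both runs produce the same answer.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CoreCountDemo
{
    // Counts the multiples of 3 in 0..limit-1. The loop body never
    // mentions the number of cores; the scheduler decides how many to use.
    public static int CountMultiplesOfThree(int limit, ParallelOptions options)
    {
        int count = 0;
        Parallel.For(0, limit, options, i =>
        {
            if (i % 3 == 0) Interlocked.Increment(ref count);
        });
        return count;
    }

    public static void Main()
    {
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);

        // Default options: the scheduler uses as many cores as it sees fit.
        int parallel = CountMultiplesOfThree(300000, new ParallelOptions());

        // At most one iteration at a time, so the loop runs essentially sequentially.
        int oneAtATime = CountMultiplesOfThree(300000,
            new ParallelOptions { MaxDegreeOfParallelism = 1 });

        Console.WriteLine(parallel + " " + oneAtATime); // 100000 100000
    }
}
```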


8 August 2011