Rob Cook



Parallel programming in C#


2021-01-25

Parallel programming is not the same as asynchronous programming. Parallelism is about doing several things at the same time; asynchrony is about avoiding blocking a thread. Asynchronous code can run entirely on one thread, whereas parallel code runs on multiple threads.

Parallel programming consists of three main steps: partitioning, executing, and collating results. There are different types of parallel constructs in .NET that do some or all of these steps for you.

PLINQ

Any LINQ query can be turned into a parallel one by calling AsParallel() on the input sequence. Subsequent LINQ calls are bound to parallel versions of those operations. Not all operations parallelise well. In particular, try to avoid the indexed versions of Select, SelectMany, and ElementAt. Grouping and set operations can also be expensive to perform.

PLINQ partitions and collates data for you; all you need to do is specify the query to run. Note that your query may still run sequentially, in full or in part, because the framework tries to pick the fastest execution strategy.


// Searching for words in a really big list.

var query = reallyBigListOfWords
    .AsParallel()
    .Where(word => searchTerms.Contains(word));

foreach (var foundWord in query)
{
    Console.WriteLine($"Found word {foundWord}.");
}
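
If profiling shows that the framework's decision to stay sequential is costing you, PLINQ exposes a few knobs to override it. A sketch, reusing the hypothetical word search from above:

// Force parallel execution, cap the number of worker threads,
// and preserve the order of the source sequence (at some extra cost).
var tunedQuery = reallyBigListOfWords
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .WithDegreeOfParallelism(4)
    .AsOrdered()
    .Where(word => searchTerms.Contains(word));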

Parallel ForEach

The Parallel class offers a ForEach method, which can be used to parallelise a foreach loop.


// Sequential loop.
foreach (var image in images)
{
    var processed = Resize(image);
    processed = DoExpensiveTransform(processed);
    Save(processed);
}

// Parallel loop.
Parallel.ForEach(images, (image) =>
{
    image = Resize(image);
    image = DoExpensiveTransform(image);
    Save(image);
});
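
The Parallel methods also take a ParallelOptions argument, which is handy when you want to cap concurrency rather than let the framework use every core, for example to avoid saturating a disk. A minimal sketch, reusing the hypothetical image functions:

var options = new ParallelOptions
{
    // Never run more than four operations at once.
    MaxDegreeOfParallelism = 4
};

Parallel.ForEach(images, options, (image) =>
{
    image = Resize(image);
    image = DoExpensiveTransform(image);
    Save(image);
});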

Note that unlike PLINQ, the Parallel methods do not collate results for you. If you need to gather results, add to a collection in a thread-safe manner (more on this later).


var processedImages = new List<Image>();
var processedImagesLock = new object();

Parallel.ForEach(images, (image) =>
{
    image = Resize(image);
    image = DoExpensiveTransform(image);
    
    lock (processedImagesLock)
    {
        // Adding to the shared list must happen under the lock.
        processedImages.Add(image);
    }
});

// Do something with the processedImages collection.
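
If that lock becomes a bottleneck, Parallel.ForEach has an overload that gives each worker thread its own local state and collates once per thread rather than once per item. A sketch of the pattern, with the same hypothetical functions:

Parallel.ForEach(
    images,
    // Each worker thread starts with its own private list.
    () => new List<Image>(),
    // The body adds to the thread-local list; no lock needed here.
    (image, loopState, localImages) =>
    {
        localImages.Add(DoExpensiveTransform(Resize(image)));
        return localImages;
    },
    // Each thread merges its list into the shared one exactly once.
    localImages =>
    {
        lock (processedImagesLock)
        {
            processedImages.AddRange(localImages);
        }
    });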

Tasks

At a lower level of abstraction we have tasks. A task is itself an abstraction of a thread, albeit one where the details of thread scheduling and execution are left up to the framework.

With tasks we have to partition and collate the work ourselves.


var images = new[] { "landscape.jpg", ... };
var results = new List<Result>();
var resultsLock = new object();
var workers = new List<Task>();

for (var i = 0; i < images.Length; i++)
{
    // Capture the loop variable so each closure gets its own copy.
    var index = i;
    
    var worker = Task.Run(() =>
    {
        var result = DoExpensiveThing(images[index]);
        
        lock (resultsLock)
        {
            results.Add(result);
        }
    });
    
    workers.Add(worker);
}

Task.WaitAll(workers.ToArray());

// Do something with the results.
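
A lock-free alternative is to use Task<TResult> and read each task's Result once they have all finished, letting the task itself carry its result back. A sketch, keeping the hypothetical DoExpensiveThing:

var workers = images
    .Select(image => Task.Run(() => DoExpensiveThing(image)))
    .ToArray();

Task.WaitAll(workers);

// Every task holds its own result, so no shared list or lock is needed.
var results = workers.Select(worker => worker.Result).ToList();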

Concurrent collections

When threads share data, they usually resort to locking to keep that data consistent. .NET provides a set of concurrent collections, in the System.Collections.Concurrent namespace, that are designed for high-concurrency situations and don't require explicit locking.

There are four such collections:

ConcurrentStack<T>
ConcurrentQueue<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentBag<T>

Stacks, queues, and dictionaries behave like their normal equivalents (albeit thread-safe); the bag stores an unordered collection of objects.

If you only have a small number of threads sharing a data structure, these collections can perform slower than using a lock around a regular collection.
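
As an illustration, the locking image example from earlier could drop the explicit lock by collecting into a ConcurrentBag, assuming the order of results doesn't matter:

using System.Collections.Concurrent;

var processedImages = new ConcurrentBag<Image>();

Parallel.ForEach(images, (image) =>
{
    image = Resize(image);
    image = DoExpensiveTransform(image);
    
    // ConcurrentBag<T>.Add is thread-safe, so no explicit lock is required.
    processedImages.Add(image);
});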

Things to avoid

Accessing the console will cause contention amongst the tasks that are running. If possible avoid writing to the console from inside a parallel operation, or at least minimise the amount of writing done.

An alternative approach would be to write messages to an instance of BlockingCollection<string>, and process these from another task that writes to the console.
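
A sketch of that producer/consumer pattern, with a single task owning the console:

var messages = new BlockingCollection<string>();

// One consumer task drains the collection and does all the console writing.
var logger = Task.Run(() =>
{
    // GetConsumingEnumerable blocks until a message arrives and finishes
    // once CompleteAdding has been called and the collection is empty.
    foreach (var message in messages.GetConsumingEnumerable())
    {
        Console.WriteLine(message);
    }
});

Parallel.ForEach(images, (image) =>
{
    Save(DoExpensiveTransform(Resize(image)));
    messages.Add($"Processed {image}.");
});

messages.CompleteAdding();
logger.Wait();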

Avoid nesting parallel loops. The number of tasks that can actually run in parallel is bound by the number of cores available on the host machine, so stick to parallelising the outer loop only.

If using locks around traditional data structures, be careful not to call into these structures too frequently. Much like accessing the console, frequent acquisition of locks from within a parallel operation will cause contention.

end