.NET 3.5+

Custom LINQ Operators, Deferred Execution and Exceptions

by Richard Carr, published at http://www.blackwasp.co.uk/LinqDeferredExceptions.aspx

The Language-Integrated Query (LINQ) standard query operators that return enumerable sequences use deferred execution but validate any source sequences immediately. Custom LINQ operators should exhibit the same behaviour.

Deferred Execution

Language-Integrated Query (LINQ) uses deferred execution for operations that return enumerable sequences of values. This allows a query to be constructed in several stages, without the query being executed until its results are accessed. This can also improve performance and memory usage, as an entire sequence does not need to be stored in memory. The full set of results may never be generated if only the first few values from the query's results are used.
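As a rough illustration of the behaviour described above, the following sketch builds a query with the standard Select operator and then reads only part of it with Take. The DeferredExecutionDemo class and its ProcessedValues helper are hypothetical names introduced for this example; they are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical helper: records each source value the query actually processes.
    public static List<int> ProcessedValues(int take)
    {
        var processed = new List<int>();
        int[] numbers = { 1, 2, 3, 4, 5 };

        // Building the query runs nothing; the lambda has not been called yet.
        IEnumerable<int> doubled = numbers.Select(n =>
        {
            processed.Add(n);
            return n * 2;
        });

        // Enumerating through Take pulls only as many items as requested.
        foreach (int value in doubled.Take(take))
        {
            Console.WriteLine(value);
        }

        return processed;
    }

    public static void Main()
    {
        List<int> processed = ProcessedValues(2);
        Console.WriteLine("Source items processed: " + processed.Count);
    }
}
```

With a take value of two, only the first two source items should pass through the lambda, so the remaining three values are never projected.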

When you create your own custom query operators, you can achieve deferred execution easily by using C# iterators. However, the execution of all code in a method that contains the yield keyword is deferred, even code that does not relate directly to generating the final sequence. This means that parameter validation does not occur until the first result is requested. If, like the standard query operators, your custom extension method throws an exception when an input sequence is null, that exception may be thrown at an unexpected time.

We can demonstrate with a simple extension method. The code below defines a LINQ-style operator that converts a collection of any type to a new sequence containing string representations of the original values.

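using System;
using System.Collections.Generic;

public static class IEnumerableExtensions
{
    public static IEnumerable<string> ToStringSequence<T>(this IEnumerable<T> source)
    {
        // Because this method contains yield, even this check is deferred
        // until the returned sequence is first enumerated.
        if (source == null) throw new ArgumentNullException("source");

        foreach (T item in source)
            yield return item.ToString();
    }
}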

The code below uses the above operator to convert a null source collection. If you step through the code in a debugger you will see that the ArgumentNullException is not thrown until the final line of code. Ideally the code should fail on the line that runs ToStringSequence.

int[] source = null; 
var strings = source.ToStringSequence();

Console.WriteLine("An exception should be thrown before now");
Console.WriteLine(strings.Count());
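For comparison, the sketch below passes the same null source to the standard Select operator, which is documented to throw ArgumentNullException for a null source. The StandardOperatorDemo class and its SelectThrowsImmediately helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StandardOperatorDemo
{
    // Hypothetical helper: reports whether Select rejects a null source
    // at the call itself, before anything enumerates the result.
    public static bool SelectThrowsImmediately()
    {
        int[] source = null;
        try
        {
            IEnumerable<string> strings = source.Select(n => n.ToString());
            return false; // Not reached if Select validates its source eagerly.
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SelectThrowsImmediately()); // True
    }
}
```

The result is never enumerated here, so the exception can only come from the call to Select itself. This is the behaviour a custom operator should match.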

Checking Arguments with Deferred Execution

We can bring the behaviour in line with the standard query operators by splitting the method into two parts. The first is the public method that consumers of our operator call. This method validates the input parameter. If the value is valid, it calls a second, private method that contains the yield statement. Only the execution of the private method is deferred.

The updated code is shown below. Now when you run the program you should see that the exception occurs immediately when the ToStringSequence method is called. If you run the code with a valid source sequence in the debugger, you can see that the overall process still uses deferred execution, only starting to generate the sequence when the Count operator is used.

// Public operator: contains no yield, so the null check runs as soon as it is called.
public static IEnumerable<string> ToStringSequence<T>(this IEnumerable<T> source)
{
    if (source == null) throw new ArgumentNullException("source");
    return DeferredToStringSequence(source);
}

// Private iterator: its body does not run until the result is enumerated.
private static IEnumerable<string> DeferredToStringSequence<T>(IEnumerable<T> source)
{
    foreach (T item in source)
        yield return item.ToString();
}
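The following sketch places the two methods in a complete program to check both behaviours: the null source is rejected at the call, and a valid source is still read lazily. The SplitOperatorDemo class and its two helper methods are hypothetical names added for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExtensions
{
    public static IEnumerable<string> ToStringSequence<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return DeferredToStringSequence(source);
    }

    private static IEnumerable<string> DeferredToStringSequence<T>(IEnumerable<T> source)
    {
        foreach (T item in source)
            yield return item.ToString();
    }
}

public static class SplitOperatorDemo
{
    // Hypothetical helper: true if the null source is rejected at the call site.
    public static bool NullSourceThrowsAtCall()
    {
        int[] source = null;
        try
        {
            source.ToStringSequence(); // The result is never enumerated.
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // Hypothetical helper: shows that the sequence is still generated lazily.
    public static int CountAfterLateAdd()
    {
        var source = new List<int> { 1, 2, 3 };
        IEnumerable<string> strings = source.ToStringSequence();
        source.Add(4); // Added after the query was built, before it runs.
        return strings.Count();
    }

    public static void Main()
    {
        Console.WriteLine(NullSourceThrowsAtCall()); // True
        Console.WriteLine(CountAfterLateAdd());      // 4
    }
}
```

The count of four shows that the source list is not read until Count enumerates the query, so the item added after the call to ToStringSequence is included.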

1 May 2011