.NET 3.5+

LINQ Aggregation

by Richard Carr, published at http://www.blackwasp.co.uk/LinqAggregation.aspx

The seventh part of the LINQ to Objects tutorial looks at aggregate functions. These are standard query operators that can be applied to sequences and groups, performing a calculation that combines all of the values in a collection.


Max and Min

The Max and Min standard query operators return the largest and smallest values in a sequence respectively. As with Sum, these operators can be used without parameters for numeric sequences or with a Func delegate parameter that returns the values to be compared.

var groupMaxMin = stock.GroupBy(s => s.Category, (c, i) => new
    {
        Category = c,
        Max = i.Max(s => s.Price),
        Min = i.Min(s => s.Price)
    });

/* RESULTS

{ Category = Fruit, Max = 0.35, Min = 0.29 }
{ Category = Vegetable, Max = 0.49, Min = 0.29 }
{ Category = Dairy, Max = 1.12, Min = 1.12 }

*/
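As a minimal sketch of the two call styles described above, the following program applies Max and Min to a plain numeric array without parameters and to a string array with a selector. The class name, method names and sample data are hypothetical and are not part of the tutorial's stock collection.

```csharp
using System;
using System.Linq;

public static class MaxMinSketch
{
    // Largest value in a numeric sequence; no selector is needed.
    public static int Largest(int[] values)
    {
        return values.Max();
    }

    // Length of the shortest string; the selector returns the value to compare.
    public static int ShortestLength(string[] words)
    {
        return words.Min(w => w.Length);
    }

    public static void Main()
    {
        int[] quantities = { 12, 7, 30, 4 };            // hypothetical data
        string[] names = { "Apple", "Fig", "Banana" };  // hypothetical data

        Console.WriteLine(Largest(quantities));     // 30
        Console.WriteLine(quantities.Min());        // 4
        Console.WriteLine(ShortestLength(names));   // 3
    }
}
```

Note that calling Max or Min on an empty sequence of a non-nullable value type such as int throws an InvalidOperationException.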

Average

The last of the simple aggregate functions is provided by the Average method. This operator finds the mean of the values in a sequence. In the example below, a selector function is used to obtain the price for each item. The average price for each category is then returned.

var groupAverages = stock.GroupBy(
    s => s.Category, (c, i) => new { Category = c, Average = i.Average(s => s.Price) });

/* RESULTS (Rounded)

{ Category = Fruit, Average = 0.31 }
{ Category = Vegetable, Average = 0.36 }
{ Category = Dairy, Average = 1.12 }

*/
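The results above are shown rounded. One way to obtain such figures is sketched below, assuming decimal prices; the class name, method names and values are hypothetical. The sketch also shows that averaging integers yields a double rather than an integer.

```csharp
using System;
using System.Linq;

public static class AverageSketch
{
    // Average of an int sequence is returned as a double.
    public static double MeanOfInts(int[] values)
    {
        return values.Average();
    }

    // Average of a decimal sequence is a decimal; round it to two places.
    public static decimal RoundedMean(decimal[] prices)
    {
        return Math.Round(prices.Average(), 2);
    }

    public static void Main()
    {
        double mean = MeanOfInts(new[] { 1, 2 });                          // value is 1.5
        decimal rounded = RoundedMean(new[] { 0.29m, 0.30m, 0.35m });      // value is 0.31

        Console.WriteLine(mean);
        Console.WriteLine(rounded);
    }
}
```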

Custom Aggregation Functions

In addition to the aggregation functions described above, you can create your own calculations using the Aggregate operator. This operator is similar to the other extension methods, as it performs a calculation across all of the values in a sequence. However, it allows you to specify a custom function, usually as a lambda expression. This function is called multiple times, passing each value from the sequence in turn. This allows you to perform any aggregation that could be achieved with simple looping.

The most basic version of the Aggregate method has one parameter. This accepts a function that has two parameters and returns a value. The parameters and the return value are all of the same type as the items in the collection being processed.

The first call to the function provides the first two values in the sequence. The first parameter holds the first value and the second receives the second. On subsequent calls the first parameter is an accumulator, containing the result of the previous calculation. The second parameter receives the next value in the sequence. This means that if you have a collection containing four items, the function will be called three times.

We can demonstrate using a simple aggregation function. In the code below a series of four integers is aggregated. During each call the accumulator is multiplied by ten before adding the next value in the sequence. The first call passes the first and second items in the collection, these being 1 and 2, and therefore returns 12. The second call receives the result of the previous operation (12) and the next item in the sequence (3) and returns 123. The final call receives the previous result (123) and the last item in the collection (4), giving the final answer of 1234.

var value = new int[] { 1, 2, 3, 4 };
var aggregate = value.Aggregate((acc, next) => acc * 10 + next); // 1234
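To make the calling sequence concrete, the sketch below writes the same calculation as an explicit loop and counts how often the lambda is invoked. It illustrates the behaviour described above and is not the framework's actual implementation; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateLoopSketch
{
    // Loop equivalent of the single-parameter Aggregate call.
    public static int CombineWithLoop(IEnumerable<int> source)
    {
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            int acc = e.Current;             // the first item becomes the accumulator
            while (e.MoveNext())
                acc = acc * 10 + e.Current;  // same calculation as the lambda
            return acc;
        }
    }

    // Counts how many times Aggregate invokes the function.
    public static int CountCalls(int[] values)
    {
        int calls = 0;
        values.Aggregate((acc, next) => { calls++; return acc * 10 + next; });
        return calls;
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3, 4 };
        Console.WriteLine(CombineWithLoop(values));  // 1234
        Console.WriteLine(CountCalls(values));       // 3
    }
}
```

Like the loop, the single-parameter Aggregate overload throws an InvalidOperationException when the sequence is empty, because there is no first item to use as the accumulator.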

The single-parameter overload never passes the first item as the second argument of the function, because that item is used as the initial accumulator. When the function must be executed for every item in the collection, you can provide a seed value for the accumulator as the first argument of the Aggregate method. In the next example the source data contains four strings. The aggregation initialises the accumulator with a fifth string and then concatenates each item from the array.

var value = new string[] { "A", "B", "C", "D" };
var aggregate = value.Aggregate("Z", (acc, next) => acc + "," + next);

// "Z,A,B,C,D"
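A seed also has two further effects that the sketch below illustrates: the accumulator may be of a different type from the items in the sequence, and an empty sequence simply returns the seed. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class SeededAggregateSketch
{
    // The seed is an int, so the accumulator is an int even though the items are strings.
    public static int TotalLength(string[] words)
    {
        return words.Aggregate(0, (acc, next) => acc + next.Length);
    }

    // Same concatenation as the example above, seeded with "Z".
    public static string JoinWithSeed(string[] words)
    {
        return words.Aggregate("Z", (acc, next) => acc + "," + next);
    }

    public static void Main()
    {
        string[] letters = { "A", "B", "C", "D" };

        Console.WriteLine(TotalLength(letters));           // 4
        Console.WriteLine(JoinWithSeed(letters));          // Z,A,B,C,D
        Console.WriteLine(JoinWithSeed(new string[0]));    // Z
    }
}
```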

A third overload of the Aggregate method exists. This includes a parameter to initialise the accumulator, a function to perform the calculation and a second delegate that provides a result selector. The result selector is executed once against the aggregated value, after every item has been processed. This is demonstrated below by including a result selector function that transforms the result to lower case.

var value = new string[] { "A", "B", "C", "D" };
var aggregate = value.Aggregate("Z", (acc, next) => acc + "," + next, s => s.ToLower());

// aggregate = "z,a,b,c,d"
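The result selector is most useful when the accumulator is a working object that differs from the final result. The following sketch, with a hypothetical class and method name, accumulates into a StringBuilder and converts it to a lower case string only once at the end. It uses ToLowerInvariant rather than ToLower so that the output does not depend on the current culture, which is a choice made for this sketch rather than something the tutorial prescribes.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ResultSelectorSketch
{
    public static string JoinLower(string[] words)
    {
        return words.Aggregate(
            new StringBuilder(),                           // seed: an empty builder
            (sb, next) => sb.Length == 0
                ? sb.Append(next)                          // first item: no separator
                : sb.Append(",").Append(next),             // later items: comma first
            sb => sb.ToString().ToLowerInvariant());       // result selector runs once
    }

    public static void Main()
    {
        Console.WriteLine(JoinLower(new[] { "A", "B", "C", "D" }));  // a,b,c,d
    }
}
```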

Using Aggregates with Query Expression Syntax

Standard and custom aggregation functions can be used when working with query expression syntax. There are no specific clauses to perform these calculations. Instead, the query is used to generate a sequence and the standard query operator is applied. In the following example the query finds stock items with a price greater than 0.3. The Count method is then used to obtain the number of such items.

var count =
    (from s in stock
     where s.Price > 0.3
     select s).Count();     // count = 3

When using the group clause, the groups should be assigned to a new range variable using the into clause. Each value of this variable is a single group, exposing the grouping value through its Key property and the group's items as a sequence. You can then use the select clause to project the data and include calls to the aggregation operators within the select. This technique is used to find the number of stock items of each category in the sample below:

var groupCounts =
    from s in stock
    group s by s.Category into category
    select new
    {
        Category = category.Key,
        Count = category.Count()
    };

/* RESULTS

{ Category = Fruit, Count = 3 }
{ Category = Vegetable, Count = 3 }
{ Category = Dairy, Count = 1 }

*/
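The grouped query can be tried outside the tutorial's sample project with a sketch such as the one below. The StockItem class, its property types and the data are assumptions made for this illustration and may differ from the definitions given earlier in the tutorial.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the tutorial's stock item type.
public class StockItem
{
    public string Name { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
}

public static class QuerySyntaxSketch
{
    // Groups with query expression syntax, then counts the items in each group.
    public static Dictionary<string, int> CountByCategory(IEnumerable<StockItem> stock)
    {
        var groupCounts =
            from s in stock
            group s by s.Category into category
            select new { Category = category.Key, Count = category.Count() };

        return groupCounts.ToDictionary(g => g.Category, g => g.Count);
    }

    public static void Main()
    {
        var stock = new List<StockItem>
        {
            new StockItem { Name = "Apple",  Category = "Fruit",     Price = 0.30m },
            new StockItem { Name = "Banana", Category = "Fruit",     Price = 0.35m },
            new StockItem { Name = "Carrot", Category = "Vegetable", Price = 0.29m }
        };

        Dictionary<string, int> counts = CountByCategory(stock);
        Console.WriteLine(counts["Fruit"]);      // 2
        Console.WriteLine(counts["Vegetable"]);  // 1
    }
}
```

Returning a dictionary is only a convenience for inspecting the results here; the anonymous type projection used in the sample above works equally well when the results are consumed within the same method.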


12 August 2010