BlackWaspTM


LINQ
.NET 3.5+

A LINQ Style Mode Operator

Language Integrated Query (LINQ) includes the Average operator that can be used to calculate the mean value of a sequence. This article implements a LINQ operator that determines the mode, which is the most common value or group of values.

Mode

In previous articles we have seen the Average standard query operator, which calculates the mean of a sequence of values by summing them and dividing by the number of elements present. We have also implemented a Median operator, which places the sequence in order and selects the middle value, or calculates the mean of the two middle values in a sequence with an even number of elements.

Another way to find an average of a set of values is to use the mode. The mode of a sequence is the value that appears most frequently. Unlike the mean and median, the mode need not be a single value: when two or more values share the highest frequency, each of them is a mode. For example, the sequence 1, 2, 2, 3, 4, 5, 5 is bimodal. The values 2 and 5, which both occur twice, are the set's modes.
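To make the definition concrete before any LINQ is involved, the following is a minimal sketch that tallies the values by hand and keeps those with the highest tally. The class name ModeByHand and the method FindModes are hypothetical names chosen for this illustration; they are not part of the operator built in this article.

```csharp
using System;
using System.Collections.Generic;

public static class ModeByHand
{
    // Hypothetical helper: counts each value, then keeps the values
    // whose count equals the highest count found.
    public static List<int> FindModes(IEnumerable<int> values)
    {
        var counts = new Dictionary<int, int>();
        foreach (int value in values)
        {
            int current;
            counts.TryGetValue(value, out current);   // current is 0 for an unseen value
            counts[value] = current + 1;
        }

        int highest = 0;
        foreach (int count in counts.Values)
        {
            if (count > highest) highest = count;
        }

        var modes = new List<int>();
        foreach (KeyValuePair<int, int> pair in counts)
        {
            if (pair.Value == highest) modes.Add(pair.Key);
        }
        return modes;
    }
}
```

For the sequence 1, 2, 2, 3, 4, 5, 5 the tallies are 1, 2, 1, 1 and 2, so the returned list holds the two values 2 and 5. An empty input produces an empty list.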

In this article we will create a method with a syntax similar to the existing Language-Integrated Query (LINQ) standard query operators. It will calculate the mode, or modes, for a sequence and return them as a new sequence.

Creating the Class

To begin, we need to create a static class to contain the extension method. Create a new project and add a class named "ModeExtensions". Modify the class's definition as follows:

public static class ModeExtensions
{
}

Creating the Method

The custom Mode operator will have various overloaded versions, allowing it to work with a sequence of any type, using a custom equality comparer and including a projection function that obtains the values from which to generate the mode. We'll implement this, the most complex overload, first. Later we'll add simpler versions, each calling the original.

Let's start by creating the signature for the Mode method:

public static IEnumerable<R> Mode<T, R>(
    this IEnumerable<T> source, Func<T, R> selector, IEqualityComparer<R> comparer)
{
}

This may look quite complicated at first glance but it's actually quite simple. We start by declaring that we will be returning an IEnumerable<R>. This is a sequence containing the mode values. The type parameter 'R' is the result type, which may differ from the element type of the input sequence once the projection has been applied.

The first parameter, source, is the input sequence. This is declared as IEnumerable<T>, where 'T' is the type of the elements in the sequence. The selector function will be applied to each item from the source sequence before the modes are calculated. It performs the conversion from 'T' to 'R', if those types differ. Finally, the comparer parameter allows you to provide an alternative comparer to the default. For example, if you are calculating the mode for a series of strings, you may decide to use a case-insensitive comparer. Later we will create overloads that do not include a projection function or a comparer parameter.

When the method is called, it is possible that a comparer will not be provided and that this argument will be null. In this case, we will use the default comparer. To detect this possibility and obtain the comparer that will be used, add the following line to the method:

var actualComparer = comparer ?? EqualityComparer<R>.Default;

Calculating the Mode

We can now process the entire sequence, determining which elements are equal using our comparer, and create a set of unique values and the number of times that they appear within the sequence. We don't want to apply this to the input sequence directly. Instead we will apply the selector function to each element to obtain the potential modes. This will allow us to extract a single property from an object or perform a calculation before determining the mode.

We can perform all of the above using the standard GroupBy operator, using the overload that accepts a key selector, a result selector and a custom equality comparer. The following line creates an array of anonymous-type objects, each holding the key, which is generated by the selector function, and the count for that group. Converting the groups to an array ensures that the Mode method will work correctly with sequences that may only be enumerated once.

var grouped = source.GroupBy(
    selector, (k, s) => new { Key = k, Count = s.Count() }, actualComparer).ToArray();
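The behaviour of this GroupBy overload is easier to see on concrete data. The sketch below applies the same overload to a list of strings with a case-insensitive comparer. The class GroupCountDemo and its CountWords method are hypothetical names used only for this illustration, and the result selector builds a string rather than an anonymous type so that the results can be returned from the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCountDemo
{
    // Hypothetical helper: returns one "key=count" string per group.
    public static string[] CountWords(IEnumerable<string> words)
    {
        return words.GroupBy(
                w => w,                             // key selector: the word itself
                (k, s) => k + "=" + s.Count(),      // result selector: key and group size
                StringComparer.OrdinalIgnoreCase)   // "Apple" and "apple" fall into one group
            .ToArray();                             // enumerate the source exactly once
    }
}
```

Calling CountWords with "Apple", "pear", "apple", "PEAR", "apple" gives two entries, "Apple=3" and "pear=2". Each group takes its key from the first element that produced it, and the groups appear in the order in which their first elements were encountered.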

At this point we have two possible outcomes. The first is that the set of results is empty because the input sequence had no elements. In this case we want to return an empty sequence of the 'R' type.

if (grouped.Length == 0)
    return Enumerable.Empty<R>();

If there are values in the grouped sequence, we need to extract the modes. These are the items that have the highest frequencies and are found using the query below. Here the Where operator filters the grouped items, only returning the values with the highest frequency, determined by calling the Max method. The Select operator is used to return only the keys and not the counts.

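// Find the highest frequency once, rather than once per group inside the filter.
var maxCount = grouped.Max(x => x.Count);
return grouped.Where(g => g.Count == maxCount).Select(g => g.Key);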
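Putting the fragments from this section together gives a method along the following lines. This is a sketch assembled from the steps above rather than a definitive listing: it finds the highest count once before filtering, and the ModeDemo class with its three methods is a hypothetical addition that only shows how the operator might be called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeExtensions
{
    public static IEnumerable<R> Mode<T, R>(
        this IEnumerable<T> source, Func<T, R> selector, IEqualityComparer<R> comparer)
    {
        // Fall back to the default comparer when none is supplied.
        var actualComparer = comparer ?? EqualityComparer<R>.Default;

        // One entry per distinct projected value, with its frequency.
        var grouped = source.GroupBy(
            selector, (k, s) => new { Key = k, Count = s.Count() }, actualComparer).ToArray();

        if (grouped.Length == 0)
            return Enumerable.Empty<R>();

        // Keep only the keys whose frequency equals the highest frequency.
        var maxCount = grouped.Max(x => x.Count);
        return grouped.Where(g => g.Count == maxCount).Select(g => g.Key);
    }
}

// Hypothetical usage examples, not part of the operator itself.
public static class ModeDemo
{
    public static int[] NumberModes()
    {
        var numbers = new[] { 1, 2, 2, 3, 4, 5, 5 };
        return numbers.Mode(n => n, null).ToArray();
    }

    public static string[] WordModes()
    {
        var words = new[] { "Apple", "pear", "apple", "Pear", "fig" };
        return words.Mode(w => w, StringComparer.OrdinalIgnoreCase).ToArray();
    }

    public static int[] LengthModes()
    {
        var words = new[] { "one", "two", "three", "four", "five" };
        return words.Mode(w => w.Length, null).ToArray();
    }
}
```

NumberModes returns 2 and 5, matching the bimodal example given earlier. WordModes treats "Apple" and "apple" as equal, so it returns "Apple" and "pear", each of which occurs twice. LengthModes shows the projection at work: the selector converts each string to its length, so the modes are the lengths 3 and 4 rather than any of the words.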
18 May 2011