BlackWaspTM

This web site uses cookies. By using the site you accept the cookie policy.This message is for compliance with the UK ICO law.

LINQ
.NET 3.5+

LINQ Outer Joins

One commonly used feature of Language-Integrated Query (LINQ) is the facility to combine two sequences of related data using joins. The standard join operation provides an inner join but with a minor modification can be changed to give a left outer join.

Inner and Outer Joins

When you use the LINQ join clause in the query expression syntax to combine two sets of related information, you perform an inner join. This means that you provide an expression that determines for each item in the first sequence, the matching items in the second. Where there is a match, one item is added to the resultant sequence. Where there are several matches for a single item, several results are generated. If an item in the first sequence has no matching items in the second, that item is excluded from the results.

An inner join is useful in many situations but there are some circumstances where the omission of non-matching results is undesirable. For example, imagine that you have a set of data containing product categories and another containing stock items. If you wish to generate a list of categories and their products, an inner join presents the possibility that an empty product category will be missing from the results. If you wish to include empty categories you need to perform an outer join.

An outer join includes items in the resultant sequence that were present in only one of the input collections. LINQ allows you to create left outer joins using a slight modification to the standard join syntax. This allows you to select all of the items from the first sequence, which appears on the left of the statement, even if they have no matching values in the second. If there are no matches, one result is generated using a combination of data from the first collection and provided default values. If there are multiple matching items, the behaviour is unchanged; multiple results are generated.

Example Data

To demonstrate the difference between an inner join and a left outer join we need some sample classes and data. The first class we will use represents a category of stock. The category includes a name and the name of the major category that it belongs to.

public class StockCategory
{
    public string Name { get; set; }
    public string MajorCategory { get; set; }

    public StockCategory(string name, string majorCategory)
    {
        Name = name;
        MajorCategory = majorCategory;
    }
}

The second class is used to represent stock items. Each includes a name and the name of the category that it belong to.

public class StockItem
{
    public string Name { get; set; }
    public string Category { get; set; }

    public StockItem(string name, string category)
    {
        Name = name;
        Category = category;
    }
}

With the classes created we can create some sample stock items and categories. Note that the categories list includes the item, "Ices", which has no related stock items.

var stock = new List<StockItem>
{
    new StockItem("Apple", "Fruit"),
    new StockItem("Banana", "Fruit"),
    new StockItem("Orange", "Fruit"),
    new StockItem("Cabbage", "Vegetable"),
    new StockItem("Carrot", "Vegetable"),
    new StockItem("Lettuce", "Vegetable"),
    new StockItem("Milk", "Dairy")
};

var categories = new List<StockCategory>
{
    new StockCategory("Dairy", "Chilled"),
    new StockCategory("Ices", "Frozen"),
    new StockCategory("Fruit", "Fresh"),
    new StockCategory("Vegetable", "Fresh")
};

Performing an Inner Join

To begin, let's look at an inner join of the stock items and category data, so that we can compare the syntax with that of a left outer join. Inner joins are explained in the LINQ Joins article, so I will not describe them in detail here. The query below simply combines the categories and stock sequences, generating a result when the category's Name property matches the Category property of a StockItem object. The results are anonymous types containing the major and minor categories and the stock item name.

var joined =
    from c in categories
    join s in stock
    on c.Name equals s.Category
    select new
    {
        MajorCategory = c.MajorCategory,
        MinorCategory = c.Name,
        Name = s.Name
    };

/* RESULTS

{ MajorCategory = Chilled, MinorCategory = Dairy, Name = Milk }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Apple }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Banana }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Orange }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Cabbage }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Carrot }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Lettuce }

*/

When you run the code you should find that the results are as shown in the comments above. Note that the "Ices" category is not present because there are no stock items in that category.

Converting the Inner Join to an Outer Join

To change our query to perform an outer join so that it returns the empty categories we only need to make some minor modifications. The first is to convert the standard join to a group join by adding the into clause, thereby pushing the results into a temporary variable. Next we add a second from clause that gets data from the temporary variable with a call to DefaultIfEmpty. This standard query operator means that the stock items in each grouped result will either be a sequence, if stock items are present, or null, if the category has no matching stock items. Finally we project the results, making sure to check for nulls in the temporary data and provide default values where they are present.

The modified query below performs a left outer join. In the main part the results are identical to the inner join. The only difference is in the second result, which shows the empty category with a default name for the stock item of "<No items>".

var joined =
    from c in categories
    join s in stock
    on c.Name equals s.Category into tempJoin
    from j in tempJoin.DefaultIfEmpty()
    select new
    {
        MajorCategory = c.MajorCategory,
        MinorCategory = c.Name,
        Name = j == null ? "<No items>" : j.Name
    };

/* RESULTS

{ MajorCategory = Chilled, MinorCategory = Dairy, Name = Milk }
{ MajorCategory = Frozen, MinorCategory = Ices, Name = <No items> }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Apple }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Banana }
{ MajorCategory = Fresh, MinorCategory = Fruit, Name = Orange }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Cabbage }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Carrot }
{ MajorCategory = Fresh, MinorCategory = Vegetable, Name = Lettuce }

*/
27 October 2011