SQL 2005+

Transact-SQL Joins

by Richard Carr, published at http://www.blackwasp.co.uk/SQLJoins.aspx

The nineteenth part of the SQL Server Programming Fundamentals tutorial returns to the topic of querying. In this article, we examine joins using Transact-SQL (T-SQL). Joins allow data from several tables to be combined in a single set of results.

Previous: SQL Server Foreign Keys

Download Script

Joins

In the previous two articles we have examined the process of normalising a database and creating foreign key relationships between tables that are logically linked. When you are working with a normalised relational database, it is essential that you can perform queries that gather data from several linked tables but return the matching data in a single set of results. This is achieved with the use of joins.

A join is a clause within a SELECT statement that specifies that the results will be obtained from two tables. It usually includes a join predicate, which is a conditional statement that determines exactly which rows from each of the tables will be joined by the query. Usually the join will be based upon a foreign key relationship and will only return combined results from the two tables when the key values in both tables match. However, it is possible to join tables based upon non-key values or even perform a cross join that has no predicate and returns all possible combinations of values from the two tables.

Usually normalisation results in a database with many related tables. You may therefore want to join more than two tables for a single set of results. In this case, you can include multiple joins in a SELECT statement. The first join combines the first two tables. The second join combines the results of the first join with a third table, and so on.

JoBS Database

The examples in this article use the JoBS tutorial database. If you do not have a copy of the JoBS database available, download the script using the link at the top of this page. You can execute the script to create and populate a new database.

Performing Queries with Joins

Inner Joins

Possibly the most common form of join that you will use is the inner join. With this type of join, the two tables are combined based upon a join predicate. Wherever a row in one table matches a row in the other, the two rows are combined and added to the outputted results. If a row in either table matches several in the other, each combination will be included in the results. If a row in either table does not match any in the other table, it will be excluded from the results altogether.

The join clause, second table name and join predicate are included in the SELECT statement immediately after the name of the first table. The join uses the INNER JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT statement is as follows:

SELECT columns FROM table-1 INNER JOIN table-2 ON predicate

As an example, we can join the Jobs and Engineers tables. The following query returns a list of jobs and the details of the engineer that performed the work. In this case, the predicate specifies that the tables will be joined only where the EngineerId in both tables is a match. These columns are used in a foreign key relationship definition for the two tables, although this is not a requirement for the join to be executed. However, if non-key columns are frequently used in join operations, you should consider adding appropriate indexes to improve query performance.

SELECT * FROM Jobs INNER JOIN Engineers ON Jobs.EngineerId = Engineers.EngineerId

The query returns four results as each of the four jobs in the database has an engineer associated with it. However, not all of the engineers have been assigned work so the twenty-one that have not undertaken a job are not included in the results.

The example query returns every column from both tables. For larger tables, or when using multiple joins, this may include a lot of data that you do not require. In addition to making the results more difficult to read, it can also lengthen the processing time for the query and increase the amount of network traffic generated. You should therefore only return the columns that you require. This can be achieved by specifying a column list as usual. However, if the two tables include columns with the same name, the ambiguity can cause an error. Try executing the following query:

SELECT
    EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

This query fails because the EngineerId column appears in both tables. You must therefore specify which table's EngineerId column you require by prefixing it with the table name and a full-stop (period) character. The following query resolves the problem:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As with the column selection, you must specify the table name for any columns with ambiguous names. For example:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId
WHERE
    Jobs.EngineerId = 4
OR
    Jobs.EngineerId = 8
ORDER BY
    Jobs.EngineerId

Next: Page 2

28 December 2008