![SQL Server](icons/Sql.png)
SQL 2005+SQL Server Common Table Expressions
Avoiding temporary tables and cursors for queries can lead to scripts that are easier to understand and maintain, and that provide better performance than the cursor-based alternative. One option for complex queries is the use of common table expressions.
Adding Column Names to the Common Table Expression
In the first example the column names of the CTE data were generated by the column names of the query within the CTE. This is only possible when every column in the query has a name. In other situations you can declare the names for the columns within the WITH clause. To do so, you simply add a comma-separated list of the names within in parentheses between the table source name and the AS clause. The syntax is as follows:
WITH table-source-name (param-1, param-2, .., param-X, ) AS (query) query
Using this syntax we can recreate the original query as shown below. Note that the aggregate column in the CTE query no longer needs to be named. If you tried to execute a similar script without the column name or name list you would receive an error.
WITH AggregatedSales (Salesperson, TotalSales)
AS
(
SELECT
Salesperson,
SUM(Value)
FROM
Sales
GROUP BY
Salesperson
)
SELECT
convert(VARCHAR, S.SaleDate, 111) AS SaleDate,
S.Salesperson,
S.Value,
A.TotalSales
FROM
Sales S
INNER JOIN
AggregatedSales A
ON
S.Salesperson = A.Salesperson
/* RESULTS
SaleDate Salesperson Value TotalSales
------------ ----------- --------- ----------
2012/06/11 Bob 1295.00 1295.00
2012/06/16 Jim 1495.00 2390.00
2012/06/22 Jim 895.00 2390.00
2012/06/26 Mel 395.00 395.00
2012/06/21 Nic 1095.00 1095.00
2012/06/24 Pat 195.00 195.00
2012/06/16 Sam 95.00 95.00
*/
Recursive Common Table Expressions
A very useful feature of common table expressions is the ability to create recursive queries. These retrieve data from the database repeatedly, with each successive iteration using data from the previous selection to control the query. Recursive CTEs use the same syntax as we have already seen. The difference is in the query used to populate the virtual table. This has two parts, the anchor member and the recursive member.
A recursive CTE's anchor member is a query that retrieves data from the database. It performs a SELECT statement that obtains information that will be added to the final table source and that forms the starting point for the recursive element of the CTE. The recursive member is another SELECT statement. This retrieves information from the CTE itself and can obtain further information from tables if necessary. If the recursive member generates any results, they are added to the table source. The first time the recursive member runs, any data requested from the CTE is based upon the results of the anchor member. Subsequent executions use the data from the previous iteration of the recursive part of the CTE. If at any time the recursive member yields no results, the CTE terminates and the main query runs.
The anchor member and the recursive member are combined using a UNION ALL clause, as shown in the syntax below:
WITH table-source-name (column-list) AS
(
anchor-member
UNION ALL
recursive-member
)
query
Flattening Hierarchical Data with a Recursive Common Table Expression
Recursive CTEs are useful for flattening hierarchical data into simple rows of data. For example, in our test data we have a set of salespeople, each with a name and a manager's name. Some salespeople have no manager, others have no subordinates, and some have both. This gives a hierarchical structure that can be difficult to query without CTEs.
The CTE shown below generates a list of salespeople and a count of the number of managers that each has. The anchor member, which is the first SELECT statement, gets the list of top-level salespeople. These have no manager so the Manager column value is NULL. The Name and Manager columns are retrieved from the table, along with a third column that will be used to return the number of managers. For these top-level employees this defaults to zero.
The recursive member is the second SELECT statement. This joins the Salespeople table to the CTE, which is named "Team", with an inner join. The join is based upon the salesperson's Manager column matching the Name retrieved in the previous iteration. It therefore retrieves all of the salespeople that are direct subordinates of the previously retrieved salespeople. The number of managers is set to one higher than the previous value, meaning that every step down the hierarchy increases the number of managers by one, as we would expect.
Once the CTE is populated, the query that uses it retrieves all of the salespeople from the flattened hierarchy, aggregating their sales data. The information is sorted by the number of managers to show the higher level salespeople first. Try running the script to verify the results.
WITH Team (Salesperson, Manager, NumManagers) AS
(
-- Anchor member
SELECT
Name,
Manager,
0
FROM
SalesPeople
WHERE
Manager IS NULL
UNION ALL
-- Recursive member
SELECT
S.Name,
S.Manager,
1 + T.NumManagers
FROM
SalesPeople S
INNER JOIN
Team T
ON
S.Manager = T.Salesperson
)
SELECT
T.SalesPerson,
T.Manager,
T.NumManagers,
sum(isnull(S.Value,0)) AS TotalSales
FROM
Team T
LEFT JOIN
Sales S
ON
T.Salesperson = S.Salesperson
GROUP BY
T.Salesperson,
T.Manager,
T.NumManagers
ORDER BY
NumManagers,
Salesperson
/* RESULTS
SalesPerson Manager NumManagers TotalSales
----------- ------- ----------- ----------
Bob NULL 0 1295.00
Jim Bob 1 2390.00
Sam Bob 1 95.00
Mel Jim 2 395.00
Nic Jim 2 1095.00
Ant Mel 3 0.00
Pat Mel 3 195.00
*/
23 June 2012