BlackWasp™


Regular Expressions
.NET 1.1+

Regular Expression Grouping

The eighth part of the Regular Expressions in .NET tutorial examines grouping constructs and their use in the .NET regular expressions engine. Grouping allows a regular expression to include multiple subexpressions.

Named Captured Groups

When you are matching subexpressions and extracting the results, you can often make the pattern easier to understand by using named groups, rather than the numbered ones seen above. Adding a name to the pattern means you can retrieve the information by name instead of number. This is particularly useful if you are generating the regular expression in code and the number of groups may vary.

To name a group, add a question mark immediately after the opening parenthesis, followed by the name within angle brackets, giving the form (?<name>subexpression). For example, the code below names the groups that match the URL within the anchor and the displayed contents, then uses those names when outputting the information.

// Requires: using System; using System.Text.RegularExpressions;
string input = "For more information use the "
                + "<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a> "
                + "or check the list of "
                + "<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently "
                + "asked questions</a>.";

string pattern = "(<a href=')(?<url>.*?)('>)(?<text>.*?)(</a>)";

foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("Match: '{0}'", match.Value);
    Console.WriteLine("URL: '{0}'", match.Groups["url"]);   // Matches extracted by name
    Console.WriteLine("Text: '{0}'", match.Groups["text"]);
    Console.WriteLine();
}

/* OUTPUT

Match: '<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a>'
URL: 'http://www.blackwasp.co.uk/Contact.aspx'
Text: 'contact form'

Match: '<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently asked questions</a>'
URL: 'http://www.blackwasp.co.uk/FAQ.aspx'
Text: 'frequently asked questions'

*/
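One detail worth knowing when a pattern mixes named and unnamed groups is that the .NET engine numbers the unnamed groups first and the named groups afterwards. The following is a minimal sketch that reuses the pattern above to list each group's number and name. It also shows that asking for a name that is not in the pattern returns an unsuccessful, empty group rather than throwing an exception. The class name, method name, shortened input string and the "missing" group name are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupInspection
{
    public const string Pattern = "(<a href=')(?<url>.*?)('>)(?<text>.*?)(</a>)";

    // Builds one "number = name" line for every group in the pattern.
    // Unnamed groups use their number as their name.
    public static string[] DescribeGroups(string pattern)
    {
        Regex regex = new Regex(pattern);
        string[] names = regex.GetGroupNames();
        string[] lines = new string[names.Length];
        for (int i = 0; i < names.Length; i++)
        {
            lines[i] = regex.GroupNumberFromName(names[i]) + " = " + names[i];
        }
        return lines;
    }

    public static void Main()
    {
        // The three unnamed groups take numbers 1 to 3, so "url" is 4 and "text" is 5.
        foreach (string line in DescribeGroups(Pattern))
        {
            Console.WriteLine(line);
        }

        // Hypothetical input, shortened from the sample above.
        Match match = Regex.Match("<a href='http://example.com/'>example</a>", Pattern);
        Console.WriteLine(match.Groups["url"].Value);        // By name
        Console.WriteLine(match.Groups[4].Value);            // Same group, by number
        Console.WriteLine(match.Groups["missing"].Success);  // False: no such group
    }
}
```

Because the numbers of named groups depend on how many unnamed groups the pattern contains, retrieving them by name is the safer choice when the pattern is generated in code.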

Non-Capturing Groups

If you are using group constructs only to build a pattern and you don't need the subexpression match, either from the Groups collection or elsewhere in the regular expression, it is best practice to use a non-capturing group. This type of group is matched in the same manner but is not given a number or name, is excluded from the Groups collection and does not incur the additional overhead of a standard, capturing group.

To disable capturing of a group, add a question mark and a colon immediately after the opening parenthesis, giving the form (?:subexpression). The sample code below is updated so that only the URL and hyperlink content elements are captured. The results from the groups are, therefore, extracted from indexes 1 and 2 of the Groups collection property. Index 0 still holds the entire match.

// Requires: using System; using System.Text.RegularExpressions;
string input = "For more information use the "
                + "<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a> "
                + "or check the list of "
                + "<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently "
                + "asked questions</a>.";

string pattern = "(?:<a href=')(.*?)(?:'>)(.*?)(?:</a>)";

foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("Match: '{0}'", match.Value);
    Console.WriteLine("URL: '{0}'", match.Groups[1]);
    Console.WriteLine("Text: '{0}'", match.Groups[2]);
    Console.WriteLine();
}

/* OUTPUT

Match: '<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a>'
URL: 'http://www.blackwasp.co.uk/Contact.aspx'
Text: 'contact form'

Match: '<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently asked questions</a>'
URL: 'http://www.blackwasp.co.uk/FAQ.aspx'
Text: 'frequently asked questions'

*/
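To see the effect on the Groups collection, the sketch below counts the groups reported for a fully capturing version of the pattern and for the non-capturing version above. It also tries RegexOptions.ExplicitCapture, an engine option that treats unnamed parentheses as non-capturing so that only named groups are captured. Using that option here is a suggestion rather than part of the original sample, and the class name, method name and input string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountComparison
{
    // Hypothetical input containing a single anchor.
    const string Input = "<a href='http://example.com/'>example</a>";

    // Number of entries in the Groups collection, including group 0 (the whole match).
    public static int CountGroups(string pattern, RegexOptions options)
    {
        return Regex.Match(Input, pattern, options).Groups.Count;
    }

    public static void Main()
    {
        string capturing = "(<a href=')(.*?)('>)(.*?)(</a>)";
        string nonCapturing = "(?:<a href=')(.*?)(?:'>)(.*?)(?:</a>)";
        string named = "(<a href=')(?<url>.*?)('>)(?<text>.*?)(</a>)";

        // Five capturing groups plus group 0.
        Console.WriteLine(CountGroups(capturing, RegexOptions.None));          // 6

        // Only the two remaining capturing groups plus group 0.
        Console.WriteLine(CountGroups(nonCapturing, RegexOptions.None));       // 3

        // Unnamed parentheses no longer capture; "url" and "text" plus group 0.
        Console.WriteLine(CountGroups(named, RegexOptions.ExplicitCapture));   // 3
    }
}
```

With ExplicitCapture the named pattern can keep its plain parentheses for readability while still avoiding the unwanted captures.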

There are many other ways in which you can use numbered and named groups. We'll see some of the options in later instalments of the tutorial.

10 October 2015