BlackWaspTM


Regular Expressions
.NET 1.1+

Regular Expression Conditional Matching

The eleventh part of the Regular Expressions in .NET tutorial looks at the conditional matching construct. This allows a regular expression to include an 'if-then-else' style condition.

Conditional Matching

The conditional matching construct provided by the regular expressions engine allows you to match text in different ways, according to an initial test. The construct has three parts. The first defines the condition and captures nothing. It is followed by two alternative sub-patterns, separated by a vertical bar (|), that match and return text. If the condition matches successfully, the first sub-pattern is used. If not, the second is used.

The syntax for a conditional match is shown below:

(?(expression)then|else)

The expression is the part that determines which of the following two elements should be matched. If the expression successfully matches text, the 'then' pattern is matched. If not, the 'else' pattern is matched. The expression is a zero-length assertion, much like a lookahead or lookbehind. This means it does not consume any characters and is not included in the matched text, so the 'then' or 'else' pattern starts matching at the same position at which the expression was tested.
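As a minimal sketch of the syntax, the following program uses a condition that tests for an opening square bracket. The class name ConditionalDemo, the helper FindTokens and the sample input are hypothetical and chosen only for illustration. Because the condition consumes nothing, the 'then' pattern has to match the bracket itself, so the bracket still appears in the matched text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Hypothetical helper: returns the text of every match of the conditional pattern.
    public static List<string> FindTokens(string input)
    {
        // Condition: is the next character '['? (zero-length, consumes nothing)
        // Then:      match a bracketed word, including both brackets.
        // Else:      match a bare word.
        const string pattern = @"(?(\[)\[\w+\]|\w+)";

        var tokens = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            tokens.Add(match.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Sample input is an assumption for this sketch.
        foreach (string token in FindTokens("[alpha] beta"))
        {
            Console.WriteLine(token);
        }
    }
}
```

For the sample input, the first match is the bracketed word with its brackets and the second is the bare word.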

Consider the following program. The input string represents part of a simple initialisation file containing categories and settings, where each setting includes a key and a value. The regular expression extracts the categories and the settings.

// Requires: using System; and using System.Text.RegularExpressions;
string input = "[WindowPosition]\n"
             + "x=200\n"
             + "y=150\n"
             + "\n"
             + "[WindowSize]\n"
             + "height=200\n"
             + "width=150\n";

string groupPattern = @"(?:^\[)(?<Group>.+)(?:\]$)";
string valuePattern = @"(?:^)(?<Key>.+)=(?<Value>.+)(?:$)";

string pattern = string.Format(@"(?(^\[){0}|{1})", groupPattern, valuePattern);

Console.WriteLine("Full Pattern: {0}\n", pattern);

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
{
    // Only the branch that matched populates its named groups.
    if (match.Groups["Group"].Success)
        Console.WriteLine("Group: {0}", match.Groups["Group"].Value);
    else
        Console.WriteLine("Value: '{0}' is set to {1}", match.Groups["Key"].Value, match.Groups["Value"].Value);
}

/* OUTPUT

Full Pattern: (?(^\[)(?:^\[)(?<Group>.+)(?:\]$)|(?:^)(?<Key>.+)=(?<Value>.+)(?:$))

Group: WindowPosition
Value: 'x' is set to 200
Value: 'y' is set to 150
Group: WindowSize
Value: 'height' is set to 200
Value: 'width' is set to 150

*/

To make the above code more readable, the regular expression is built in several stages. The first pattern, held in the groupPattern variable, extracts categories from the input string. It finds names held in square brackets, using non-capturing groups to identify the brackets and a named group for the category name.

The valuePattern string holds a pattern that extracts the key and value of a setting from a line containing two items separated by an equals sign (=), using two named groups. This is combined with the groupPattern value into the final regular expression in the pattern variable. This adds the conditional element, which specifies that the first pattern should be used if the line starts with an opening square bracket and the second pattern should be matched otherwise. Because the condition consumes no characters, groupPattern has to match the opening bracket again.

The complete regular expression pattern is shown below. You can see that conditional regular expressions can become complex and difficult to read very quickly. The problem increases when more conditions are used. Wherever possible, it is better to use alternative, simpler approaches.

(?(^\[)(?:^\[)(?<Group>.+)(?:\]$)|(?:^)(?<Key>.+)=(?<Value>.+)(?:$))
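One possible simpler approach, sketched here as an assumption rather than as the tutorial's own recommendation, is plain alternation with no conditional. The class name IniAlternationDemo and the helper Describe are hypothetical. The sketch is not strictly identical to the conditional version: for example, a malformed line such as "[a=b" would be treated as a setting by the second alternative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IniAlternationDemo
{
    // Hypothetical helper: describes each category or setting line found in the input.
    public static List<string> Describe(string input)
    {
        // First alternative: a category name in square brackets.
        // Second alternative: a key and a value separated by an equals sign.
        const string pattern = @"^\[(?<Group>.+)\]$|^(?<Key>.+)=(?<Value>.+)$";

        var lines = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
        {
            if (match.Groups["Group"].Success)
            {
                lines.Add(string.Format("Group: {0}", match.Groups["Group"].Value));
            }
            else
            {
                lines.Add(string.Format("Value: '{0}' is set to {1}",
                    match.Groups["Key"].Value, match.Groups["Value"].Value));
            }
        }
        return lines;
    }

    public static void Main()
    {
        string input = "[WindowSize]\n"
                     + "height=200\n";

        foreach (string line in Describe(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Referring to the groups by name rather than by number keeps the loop independent of the order in which the groups appear in the pattern.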
14 November 2015