BlackWasp™
Regular Expressions
.NET 1.1+

Regular Expression Character Escapes

The third part of the Regular Expressions in .NET tutorial starts to look at the special characters that can be used to build regular expression patterns for matching. This article describes character escapes, which allow matching of special items such as control characters.

Character Escapes

In the previous article in this series we looked at the basic matching methods provided by the .NET regular expression engine. We saw how to match literal strings only, rather than patterns that perform fuzzy searches. At the end of the article, we touched upon the escape character (\), which is used to prefix a character that you wish to match literally but that has a special meaning in the regular expression language.

In this article we'll look at the character escapes supported by the regular expression engine. These allow you to find literal characters in different ways. For example, you can match control codes, non-printing characters and any ASCII or Unicode symbol.

Escape Character

All character escapes are defined in a regular expression using a backslash (\) followed by one or more letters, numbers or symbols. If the characters after the backslash form a recognised character escape, the engine matches the character that the escape represents. Otherwise, a symbol that follows the backslash is matched as a literal character. This does not extend to letters: an unrecognised escaped letter, such as \q, is rejected by the .NET engine with an ArgumentException. Because the backslash is itself a symbol, you can match a backslash with a double backslash (\\), as shown below:

string input = @"To match a \, you must use two backslashes (\\)";

foreach (Match match in Regex.Matches(input, @"\\"))
{
    Console.WriteLine("Matched at index {0}", match.Index);
}

/* OUTPUT
 
Matched at index 11
Matched at index 44
Matched at index 45
 
*/
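The same rule applies to the other characters that have a special meaning. The following sketch wraps the snippets in a hypothetical EscapeDemo class so that it can run on its own. It compares an unescaped full stop, which matches any character, with an escaped one, shows the standard Regex.Escape method inserting the backslashes for you, and shows that an unrecognised escaped letter is rejected rather than matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Counts the matches of a pattern in the input text.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = "Version 1.1 or 1-1";

        // Unescaped, the full stop matches any character (except \n),
        // so "1.1" matches both "1.1" and "1-1".
        Console.WriteLine(CountMatches(input, @"1.1"));    // 2

        // Escaped with a backslash, the full stop is matched literally.
        Console.WriteLine(CountMatches(input, @"1\.1"));   // 1

        // Regex.Escape inserts the backslashes for you.
        Console.WriteLine(Regex.Escape("1.1"));           // 1\.1

        // An escaped letter that is not a recognised escape is rejected.
        try
        {
            Regex.IsMatch(input, @"\q");
        }
        catch (ArgumentException)
        {
            Console.WriteLine(@"\q is not a valid escape");
        }
    }
}
```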

Matching Tabs

Text files can include tab characters, which are used to align text, particularly in files that contain lists. Character escapes can find both horizontal and vertical tabs: \t matches a horizontal tab and \v matches a vertical tab.

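The sample below finds the two horizontal tabs in the source string. The tabs are inserted into the string with \t, because C# string literals and regular expressions use the same escape sequence for a horizontal tab. The pattern is a verbatim string (@"\t"), so its \t is passed to the regular expression engine unchanged and interpreted there.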

string input = "Tab\t\tTab";

foreach (Match match in Regex.Matches(input, @"\t"))
{
    Console.WriteLine("Matched at index {0}", match.Index);
}

/* OUTPUT
 
Matched at index 3
Matched at index 4
 
*/
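Vertical tabs are rare in modern text, but they are matched in exactly the same way. This minimal sketch, using a hypothetical TabDemo class and MatchIndexes helper, finds one tab of each kind in a single string. Note that in .NET, \v matches only the vertical tab character (U+000B), and C# string literals use the same \v escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabDemo
{
    // Returns the index of every match of the pattern in the input.
    public static int[] MatchIndexes(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        int[] indexes = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            indexes[i] = matches[i].Index;
        }
        return indexes;
    }

    public static void Main()
    {
        // One horizontal tab (\t) and one vertical tab (\v).
        string input = "Col1\tCol2\vNext";

        // Prints "Horizontal tab at index 4".
        foreach (int index in MatchIndexes(input, @"\t"))
            Console.WriteLine("Horizontal tab at index {0}", index);

        // Prints "Vertical tab at index 9".
        foreach (int index in MatchIndexes(input, @"\v"))
            Console.WriteLine("Vertical tab at index {0}", index);
    }
}
```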

Matching Non-Printing Characters

Several non-printing characters can be matched as literals using character escapes. Three commonly used ones are \r, which matches a carriage return, \n, which matches a line feed (new line) character, and \f, which matches a form feed.

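The code below looks for \r\n to find the adjacent carriage return and line feed between the two words in the source text. The line break is written explicitly as \r\n in the input string; a verbatim string containing a physical line break would only include the carriage return if the source file itself were saved with Windows line endings.

string input = "Hello,\r\nworld";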

foreach (Match match in Regex.Matches(input, @"\r\n"))
{
    Console.WriteLine("Matched at index {0}", match.Index);
}

/* OUTPUT
 
Matched at index 6
 
*/
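As a further illustration, this sketch uses \f to split text into pages at a form feed and compares \n with \r\n on a string that mixes Windows and Unix line endings. The NonPrintingDemo class, its Sample string and its CountMatches helper are hypothetical names chosen for the example. Searching for \n alone finds every line feed, whereas \r\n only finds the Windows-style pairs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonPrintingDemo
{
    // Two "pages" separated by a form feed. The first page uses a Windows
    // line ending (\r\n); the second uses a Unix line ending (\n).
    public const string Sample = "Page one\r\nLine two\fPage two\nLine two";

    // Counts the matches of a pattern in the input text.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // \f finds the form feed, so splitting on it separates the pages.
        string[] pages = Regex.Split(Sample, @"\f");
        Console.WriteLine("Pages: {0}", pages.Length);                        // Pages: 2

        // \n finds every line feed, whichever line ending is used.
        Console.WriteLine("Line feeds: {0}", CountMatches(Sample, @"\n"));    // Line feeds: 2

        // \r\n only finds the Windows-style pair.
        Console.WriteLine("CR/LF pairs: {0}", CountMatches(Sample, @"\r\n")); // CR/LF pairs: 1
    }
}
```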

Matching Control Characters

In addition to the non-printing characters mentioned above, you can match other control characters. This can be useful when processing keyboard or terminal input, which may include control codes entered by the user.

Three character escapes are provided for specific control characters. \a matches the bell control code (U+0007) and \e matches Escape (U+001B). \b matches the backspace character (U+0008), but only inside a character class, written as [\b]; elsewhere in a pattern, \b is an anchor that matches a word boundary. Other control codes can be matched using the \c character escape, which must be followed by the letter that represents the control code. For example, to match Ctrl-C (U+0003), you would use the character escape sequence \cC.
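The sketch below, built around a hypothetical ControlCharDemo class and Contains helper, checks a string containing Ctrl-C, bell, Escape and backspace characters. It also contrasts [\b], which matches the backspace, with a bare \b, which is a word-boundary anchor and finds nothing here because the string contains no word characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ControlCharDemo
{
    // Returns true if the pattern matches anywhere in the input.
    public static bool Contains(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // Ctrl-C (U+0003), bell (U+0007), Escape (U+001B), backspace (U+0008).
        string input = "\u0003\u0007\u001B\u0008";

        Console.WriteLine(Contains(input, @"\cC"));   // True (Ctrl-C)
        Console.WriteLine(Contains(input, @"\a"));    // True (bell)
        Console.WriteLine(Contains(input, @"\e"));    // True (Escape)

        // Inside a character class, \b is the backspace character.
        Console.WriteLine(Contains(input, @"[\b]"));  // True

        // Outside a class, \b is a word-boundary anchor. With no word
        // characters in the input there is no boundary, so nothing matches.
        Console.WriteLine(Contains(input, @"\b"));    // False
    }
}
```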

8 September 2015