BlackWasp™


Regular Expressions
.NET 1.1+

Regular Expression Anchors

The fifth part of the Regular Expressions in .NET tutorial continues to look at the characters that make up a regular expression. This article explains how anchors can be used to match patterns based upon their positions within a string or line of text.

Start of String Anchor

If you only wish to find matches at the start of the text, you can use the start of string anchor, "\A". This is demonstrated in the code below. Note that "\A" is not affected by the Multiline option. Although the option is set, only the match at the very start of the string is returned.

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

foreach (Match match in Regex.Matches(input, @"\A[Tt]his", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'This' at index 0
            
*/
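
For comparison, the following sketch runs the start of line anchor, "^", and the start of string anchor, "\A", against the same text with the Multiline option set. The shortened input string and the variable names are illustrative assumptions, not part of the original sample. The line breaks are written as explicit "\n" escapes so that the indexes do not depend on how the source file is saved, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input; "\n" escapes keep the indexes independent of file line endings.
string input = "This is line one.\nthis is line two.\nNot this one.";

// With Multiline, ^ matches at the start of every line.
MatchCollection lineMatches = Regex.Matches(input, @"^[Tt]his", RegexOptions.Multiline);

// \A ignores Multiline and matches at the start of the string only.
MatchCollection stringMatches = Regex.Matches(input, @"\A[Tt]his", RegexOptions.Multiline);

Console.WriteLine("^ found {0} matches, \\A found {1}", lineMatches.Count, stringMatches.Count);

/* OUTPUT

^ found 2 matches, \A found 1

*/
```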

End of String Anchor

To match text at the end of the source text only, you can use the anchor, "\Z". The match must be at the end of the string, or immediately before a final line feed character. As with the end of line anchor, text created with the .NET framework usually ends its lines with a carriage return and line feed pair, so if the text may end with a line break you should allow for an optional carriage return ("\r?") before the anchor. The sample string below has no trailing line break, so the anchor can be used on its own:

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

foreach (Match match in Regex.Matches(input, @"string\.\Z", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'string.' at index 163
            
*/
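
The optional carriage return matters when the text ends with a Windows-style line break. The sketch below is one possible way to handle it, not the only one: it places "\r?" inside a lookahead so that the carriage return is checked for but not included in the matched value. The input string, the variable names and the use of a lookahead are assumptions made for illustration, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input that ends with a carriage return and line feed pair.
string input = "And this bit is at the end of the string.\r\n";

// Without \r?, the carriage return sits between the text and the final line feed,
// so \Z cannot match directly after "string.".
bool strictMatch = Regex.IsMatch(input, @"string\.\Z");

// The lookahead permits an optional carriage return before \Z without capturing it.
Match tolerantMatch = Regex.Match(input, @"string\.(?=\r?\Z)");

Console.WriteLine("Without \\r?: {0}", strictMatch);
Console.WriteLine("With \\r?: matched '{0}' at index {1}", tolerantMatch.Value, tolerantMatch.Index);

/* OUTPUT

Without \r?: False
With \r?: matched 'string.' at index 34

*/
```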

If you use the lower case version of the anchor, "\z", the match must be at the very end of the string. If the text is followed by a final line feed, the pattern will not be matched.

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

foreach (Match match in Regex.Matches(input, @"string\.\z", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'string.' at index 163
            
*/
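
The difference between the two anchors only shows when the text ends with a line feed. As a minimal sketch, the code below applies both to a string with a trailing "\n". The input string and variable names are illustrative assumptions, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input with a single trailing line feed.
string input = "the end of the string.\n";

// \Z accepts the position immediately before a final line feed.
Match upperMatch = Regex.Match(input, @"string\.\Z");

// \z accepts the absolute end of the string only, so the trailing line feed blocks it.
Match lowerMatch = Regex.Match(input, @"string\.\z");

Console.WriteLine("\\Z matched: {0} (index {1})", upperMatch.Success, upperMatch.Index);
Console.WriteLine("\\z matched: {0}", lowerMatch.Success);

/* OUTPUT

\Z matched: True (index 15)
\z matched: False

*/
```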

Word Boundary Anchor

A commonly used item is the word boundary anchor (\b). This matches the position where a word starts or ends. Words are defined as groups of contiguous word characters, which include letters, numeric digits and underscores.

The following code matches the text, "Can" or "can" where it appears at the start of a word. This means that the can in "toucan" and the second can in "cancan" are not matched.

string input = "Can the cantankerous toucan dance the cancan?";

foreach (Match match in Regex.Matches(input, @"\b[Cc]an"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'Can' at index 0
Matched 'can' at index 8
Matched 'can' at index 38
            
*/
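
Placing the anchor on both sides of a pattern restricts matches to whole words. The sketch below uses this to show that digits and underscores count as word characters, whereas a hyphen does not. The input string and variable names are illustrative assumptions, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: "can" followed by a space, an underscore, a digit and a hyphen.
string input = "can can_opener can2 can-do";

// \b on both sides requires a non-word character (or the string edge) around "can".
MatchCollection wholeWords = Regex.Matches(input, @"\bcan\b");

foreach (Match match in wholeWords)
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT

Matched 'can' at index 0
Matched 'can' at index 20

*/
```

The occurrences in "can_opener" and "can2" are skipped because the underscore and the digit continue the word, so there is no boundary after the "n".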

Non-Word Boundary Anchor

The non-word boundary anchor is the opposite of the word boundary anchor. It ensures that the matching position is either between two word characters or two non-word characters. The anchor is specified using "\B".

Try running the following code. This matches the text, "Can" or "can" where it is not found at the start of a word.

string input = "Can the cantankerous toucan dance the cancan?";

foreach (Match match in Regex.Matches(input, @"\B[Cc]an"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'can' at index 24
Matched 'can' at index 41
            
*/
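
The anchor also matches between two non-word characters, which the example above does not exercise. As a minimal sketch, the code below looks for a hyphen that has no word character on either side. The input string and variable names are illustrative assumptions, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: one free-standing hyphen and one hyphen inside "c-d".
string input = "a - b and c-d";

// \B on both sides requires that neither edge of the hyphen is a word boundary.
MatchCollection looseHyphens = Regex.Matches(input, @"\B-\B");

foreach (Match match in looseHyphens)
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT

Matched '-' at index 2

*/
```

The hyphen in "c-d" is not matched because it sits between a word character and a non-word character on each side, which makes both positions word boundaries.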
19 September 2015