BlackWaspTM

Regular Expressions
.NET 1.1+

Regular Expression Anchors

The fifth part of the Regular Expressions in .NET tutorial continues to look at the characters that make up a regular expression. This article explains how anchors can be used to match patterns based upon their positions within a string or line of text.

Anchors

When working with regular expressions, you can use an anchor to state that a match must occur in a specific position. Anchors can be used to ensure that matches are found at the start or end of the source string, or at the start or end of a line in multi-line text. You can also look for patterns that occur at the start or end of a word, or immediately following the previous match.

The anchors in the regular expression match a position only. They consume no characters from the source text and do not appear in the final results.
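
One way to see that an anchor consumes nothing is to match a pattern that contains only an anchor. The sketch below is illustrative rather than part of the original tutorial: the class and method names are invented, and it uses the caret (^), the start of line anchor described in the next section. The match succeeds, but its length is zero and its value is an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions for illustration.
public static class ZeroWidthAnchorDemo
{
    // Matches a pattern that consists of nothing but the start anchor.
    public static Match MatchAnchorOnly(string input)
    {
        return Regex.Match(input, "^");
    }

    public static void Main()
    {
        Match match = MatchAnchorOnly("Hello");

        Console.WriteLine(match.Success);       // True
        Console.WriteLine(match.Index);         // 0
        Console.WriteLine(match.Length);        // 0
        Console.WriteLine(match.Value == "");   // True
    }
}
```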

Start of Line Anchor

The first anchor that we'll consider matches the start of the string or the beginning of a line of text. A new line is deemed to start immediately after a line feed character (\n). To require that a match begins at one of these positions, you include a caret (^) in the regular expression.

Let's demonstrate with a sample. First, consider the code below, which matches the text, "This" or "this". No anchors are included in the regular expression so four items are found.

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

// Requires: using System; using System.Text.RegularExpressions;
foreach (Match match in Regex.Matches(input, "[Tt]his"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'This' at index 0
Matched 'This' at index 41
Matched 'this' at index 87
Matched 'this' at index 133
            
*/

To find only the words that appear at the start of a line, you can include the anchor before the text to match, as shown below. Note the use of RegexOptions.Multiline in the call to the Matches method. This tells the regular expressions engine that the caret should match at the start of every line, not just at the start of the string. Without the option, only a match at the very start of the text will be found.

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

foreach (Match match in Regex.Matches(input, "^[Tt]his", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'This' at index 0
Matched 'This' at index 41
            
*/
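
To see the effect of the option in isolation, the same pattern can be run with and without RegexOptions.Multiline. The following is a minimal sketch, not code from the original article: the class name, the helper method and the shortened input are assumptions. The input uses explicit \n escapes so that the result does not depend on how the source file stores its line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions for illustration.
public static class MultilineOptionDemo
{
    // Counts the matches of "^[Tt]his" using the supplied options.
    public static int CountLineStarts(string input, RegexOptions options)
    {
        return Regex.Matches(input, "^[Tt]his", options).Count;
    }

    public static void Main()
    {
        // Three lines separated by line feeds only.
        string input = "This is line one.\nThis is line two.\nAnd this is line three.";

        // Without Multiline, ^ matches only at the start of the string.
        Console.WriteLine(CountLineStarts(input, RegexOptions.None));      // 1

        // With Multiline, ^ also matches after each line feed.
        Console.WriteLine(CountLineStarts(input, RegexOptions.Multiline)); // 2
    }
}
```

The third line contributes no match in either case, because its "this" does not sit at the start of the line.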

End of Line Anchor

You can search for text at the end of the string or the end of a line using the end of line anchor ($). With this anchor, the rest of the pattern must be matched immediately before a line feed character (\n) or at the very end of the string.

Try executing the following code:

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

// The full stop is escaped so that it matches a literal "." and not any character.
foreach (Match match in Regex.Matches(input, @"string\.$", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   
Matched 'string.' at index 163
            
*/

The above example looks for the text, "string.". An end of line anchor specifies that only matches at the end of the string or at the end of a line should be found. However, although there are three occurrences of the text, only the final one is matched. This is because the verbatim string literal keeps the line breaks of the source file, which on Windows are typically a combined carriage return and line feed (\r\n), but the regular expressions engine looks only for the line feed. The carriage return sits between the full stop and the line feed, so the text is not immediately before the end of the line.
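
The effect of the line ending can be checked directly by building the input with explicit escape sequences instead of relying on the source file. This is a sketch under stated assumptions: the class name, the helper method and the sample sentences are invented for illustration, and the full stop in the pattern is escaped so that it matches only a literal ".".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions for illustration.
public static class LineEndingDemo
{
    // Counts occurrences of "string." that sit directly before a line end.
    public static int CountAtLineEnd(string input)
    {
        return Regex.Matches(input, @"string\.$", RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        string windows = "First string.\r\nSecond string.\r\nThird string.";
        string unix = "First string.\nSecond string.\nThird string.";

        // The \r before each \n blocks the first two matches.
        Console.WriteLine(CountAtLineEnd(windows)); // 1

        // With line feeds alone, every line end matches.
        Console.WriteLine(CountAtLineEnd(unix));    // 3
    }
}
```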

To match text that uses \r\n line endings with the end of line anchor, you need to include the carriage return in the pattern. In other text, lines may end with a line feed alone, so the carriage return may not be present. To create a pattern that will match correctly with or without the carriage return, include the text, "\r?" immediately before the anchor. The question mark indicates that the pattern should match \r zero or one times. It is one of the quantifiers, which we will see later in the tutorial.

Update the previous sample, as shown below, and run it again to see the results. NB: The call to Replace is included to prevent the matched carriage returns from being included in the output.

string input = @"This bit is at the start of the string.
This bit is at the start of a line.
However, this bit is in the middle of the string.
And this bit is at the end of the string.";

// The full stop is escaped so that it matches a literal "." and not any character.
foreach (Match match in Regex.Matches(input, @"string\.\r?$", RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}"
        , match.Value.Replace("\r", ""), match.Index);
}

/* OUTPUT
   
Matched 'string.' at index 32
Matched 'string.' at index 120
Matched 'string.' at index 163
            
*/
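
The reason for the call to Replace becomes clearer when the length of each match is inspected. The sketch below is illustrative only: the class name, the helper method and the sample text are assumptions, and the full stop in the pattern is escaped to match a literal ".". Where a carriage return is present it becomes part of the match, so that match is one character longer than the visible text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are assumptions for illustration.
public static class OptionalCarriageReturnDemo
{
    // Returns the length of each match of "string." at a line end,
    // allowing for an optional carriage return before the line feed.
    public static int[] MatchLengths(string input)
    {
        MatchCollection matches =
            Regex.Matches(input, @"string\.\r?$", RegexOptions.Multiline);

        int[] lengths = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            lengths[i] = matches[i].Length;
        }
        return lengths;
    }

    public static void Main()
    {
        string windows = "First string.\r\nLast string.";
        string unix = "First string.\nLast string.";

        // The first match includes the \r, so it is 8 characters long.
        foreach (int length in MatchLengths(windows))
        {
            Console.WriteLine(length); // 8, then 7
        }

        // No carriage returns are present, so both matches are 7 characters.
        foreach (int length in MatchLengths(unix))
        {
            Console.WriteLine(length); // 7, then 7
        }
    }
}
```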
19 September 2015