BlackWaspTM


Regular Expressions
.NET 1.1+

Regular Expression Options

The fifteenth part of the Regular Expressions in .NET tutorial continues to look at the options that can be applied to regular expressions when matching and substituting text. This article looks at the use of the RegexOptions enumeration.

CultureInvariant Option

Performing a case-insensitive match or substitution can have unexpected effects. By default, the regular expression engine uses the casing rules of the user's current culture. This means that the engine is aware of the links between lower case and upper case letters in the user's language. However, if you are matching against text that has a fixed culture, such as keywords in a programming language, this can give incorrect results, either missed matches or unexpected ones, when the user's culture has different casing conventions.

One well-known example of a cultural casing issue is the Turkish I problem. Where English cultures see 'I' as the capitalised version of 'i', the Turkish culture does not. Instead, Turkish has four forms of the letter: the dotless 'ı' is the lower case version of 'I', and the dotted 'i' capitalises to 'İ'. If you are matching terms such as "if" or "IDisposable" case-insensitively, a Turkish user's culture may cause matches to fail.

You can avoid some of these problems by using the CultureInvariant option. This stops the regular expression engine from using the user's cultural conventions and instead applies the casing rules of the invariant culture, which do not change with the user's settings.
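The following sketch compares a case-insensitive match with and without CultureInvariant while the current culture is temporarily set to Turkish. It assumes the tr-TR culture is available on the machine; on systems running in globalization-invariant mode the culture may be rejected or may behave like the invariant culture. The class and method names are illustrative only. Under Turkish casing rules the first line is expected to print False, because the 'i' in the pattern does not correspond to the 'I' in the input, while the CultureInvariant line prints True.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureInvariantDemo
{
    // Case-insensitive match that uses the current culture's casing rules.
    public static bool MatchCultureSensitive(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);

    // Case-insensitive match that always uses the invariant culture's casing rules.
    public static bool MatchInvariant(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static void Main()
    {
        CultureInfo original = CultureInfo.CurrentCulture;
        try
        {
            // Assumption: tr-TR is installed. In globalization-invariant mode this
            // may throw CultureNotFoundException or fall back to invariant casing.
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");

            Console.WriteLine("IgnoreCase only:         {0}", MatchCultureSensitive("IF", "if"));
            Console.WriteLine("IgnoreCase + Invariant:  {0}", MatchInvariant("IF", "if"));
        }
        catch (CultureNotFoundException)
        {
            Console.WriteLine("The tr-TR culture is not available on this system.");
        }
        finally
        {
            // Restore the user's culture so later code is unaffected.
            CultureInfo.CurrentCulture = original;
        }
    }
}
```

Restoring the original culture in the finally block keeps the experiment from leaking into any code that runs afterwards on the same thread.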

ECMAScript Option

ECMAScript is an internationally standardised scripting language definition that is implemented by languages such as JavaScript and ActionScript. Its specification includes regular expressions. However, the behaviour of ECMAScript regular expressions is not identical to that of .NET framework regular expressions. If you need to mimic the ECMAScript behaviour, apply the ECMAScript option from the RegexOptions enumeration.

The option modifies several matching behaviours. For example, Unicode-aware character classes are not used, so shorthand classes such as \w, \s and \d match only ASCII characters; \w becomes equivalent to [a-zA-Z_0-9]. Additionally, not all other options can be combined with the ECMAScript value. When it is included, you may only add the IgnoreCase, Multiline and Compiled options. If you use any other value, the operation throws an ArgumentOutOfRangeException.
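The sketch below illustrates both points: the \w class accepts the Unicode letter 'é' by default but not when ECMAScript is applied, and combining ECMAScript with an unsupported option such as Singleline is rejected by the Regex constructor. The helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaScriptDemo
{
    // Default .NET behaviour: \w includes Unicode letters such as 'é'.
    public static bool IsWordCharDefault(string s) =>
        Regex.IsMatch(s, @"^\w$");

    // ECMAScript behaviour: \w is limited to [a-zA-Z_0-9].
    public static bool IsWordCharEcma(string s) =>
        Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);

    // Returns true if the Regex constructor rejects the option combination.
    public static bool RejectsOptions(RegexOptions options)
    {
        try
        {
            new Regex("abc", options);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWordCharDefault("\u00E9"));  // True
        Console.WriteLine(IsWordCharEcma("\u00E9"));     // False

        // Permitted combination.
        Console.WriteLine(RejectsOptions(RegexOptions.ECMAScript | RegexOptions.IgnoreCase
            | RegexOptions.Multiline | RegexOptions.Compiled));  // False

        // Singleline cannot be combined with ECMAScript.
        Console.WriteLine(RejectsOptions(RegexOptions.ECMAScript | RegexOptions.Singleline)); // True
    }
}
```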

RightToLeft Option

The RightToLeft option reverses the direction in which the input is searched. Instead of finding the leftmost match first, the engine starts at the end of the input and finds the rightmost match before those that appear further to the left. The pattern itself is not reversed, so "banana" still matches the text "banana" and not "ananab".

Try running the following sample code to demonstrate. The results show that the matches are found from right to left. Note that the RightToLeft option is combined with the IgnoreCase option using the bitwise OR operator (|), as RegexOptions is a flags enumeration.

using System;
using System.Text.RegularExpressions;

// Three case variants of the same word.
string input = "Banana, banana, BANANA!";

foreach (Match match in Regex.Matches(
    input, "banana", RegexOptions.IgnoreCase | RegexOptions.RightToLeft))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT

Matched 'BANANA' at index 16
Matched 'banana' at index 8
Matched 'Banana' at index 0

*/
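Because the engine works leftwards from the end of the input, the substrings it finds can differ from a left-to-right scan, not just their order. The sketch below shows this with a two-digit pattern applied to an odd number of digits, and also uses Regex.Match with RightToLeft to pick out the last number in a string. The class name, helper method and sample inputs are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    // Returns every match as "value@index", in the order the engine finds them.
    public static string Describe(string input, string pattern, RegexOptions options)
    {
        List<string> parts = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern, options))
        {
            parts.Add(match.Value + "@" + match.Index);
        }
        return string.Join(", ", parts);
    }

    public static void Main()
    {
        Console.WriteLine(Describe("12345", @"\d{2}", RegexOptions.None));        // 12@0, 34@2
        Console.WriteLine(Describe("12345", @"\d{2}", RegexOptions.RightToLeft)); // 45@3, 23@1

        // With RightToLeft, Regex.Match returns the last occurrence in the input.
        Match last = Regex.Match("Order 12, item 345, qty 6", @"\d+", RegexOptions.RightToLeft);
        Console.WriteLine("{0} at index {1}", last.Value, last.Index);            // 6 at index 24
    }
}
```

Note that the matched values are still read left to right ("45", not "54"); only the search order changes.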

Compiled Option

Normally, when you match a regular expression, the pattern is converted to a set of custom opcodes that are interpreted each time the expression is used. If you are repeatedly using the same pattern, this interpretation adds overhead that can lower the performance of the operations.

To improve the speed of matching, you can apply the Compiled option. When a Regex object is created with this option, or the first time a static Regex method is called with it, the pattern is compiled into Microsoft intermediate language (MSIL), which the JIT compiler then converts to native code. This is a trade-off: compilation is slower than conversion to custom opcodes, but the resulting expression executes more quickly.
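The trade-off can be observed with a rough timing sketch such as the one below, which measures construction time separately from total time for an interpreted and a compiled instance of the same pattern. The pattern, input text and iteration count are hypothetical, and the timings depend heavily on the runtime version, the pattern and the input; a careful benchmark would also warm up the JIT compiler and repeat each measurement, which this sketch does not do.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompiledTimingDemo
{
    // Hypothetical pattern and input, chosen only for illustration.
    public const string Pattern = @"\b[A-Z]{2}-\d{4}\b";
    public const string Input = "Orders: AB-1234, CD-5678, EF-9012 and some filler text.";

    // Applies the regex repeatedly and returns the total number of matches found.
    public static int CountMatches(Regex regex, int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
        {
            total += regex.Matches(Input).Count;
        }
        return total;
    }

    // Times construction and then construction plus repeated matching.
    static void Time(string label, RegexOptions options, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Regex regex = new Regex(Pattern, options);
        long constructionMs = watch.ElapsedMilliseconds;
        int matches = CountMatches(regex, iterations);
        watch.Stop();
        Console.WriteLine("{0}: construction {1} ms, total {2} ms, {3} matches",
            label, constructionMs, watch.ElapsedMilliseconds, matches);
    }

    public static void Main()
    {
        const int iterations = 100000;
        Time("Interpreted", RegexOptions.None, iterations);
        Time("Compiled", RegexOptions.Compiled, iterations);
    }
}
```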

You should only compile regular expressions that will be used many times, as the initialisation process takes longer than for patterns that are interpreted. If you are using the static methods of the Regex class, the compiled expressions are cached so that they can be reused; the number of cached expressions is limited by the Regex.CacheSize property. If you are using instance methods, it is important to note that the compiled version is lost when the instance goes out of scope, so keep and reuse a single Regex object rather than creating a new one for each match.
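One common way to follow this advice with instance methods is to hold the compiled expression in a static readonly field, so it is built once and shared by every call. The sketch below assumes a hypothetical product code format of two upper case letters, a hyphen and four digits; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProductCodeValidator
{
    // Hypothetical format: two upper case letters, a hyphen and four digits.
    // Created once and reused, so the cost of compiling to MSIL is paid a single time.
    private static readonly Regex ProductCode =
        new Regex(@"^[A-Z]{2}-\d{4}$", RegexOptions.Compiled);

    public static bool IsValid(string code) => ProductCode.IsMatch(code);

    public static void Main()
    {
        // Number of patterns the static Regex methods keep cached (15 by default).
        Console.WriteLine("Static method cache size: {0}", Regex.CacheSize);

        foreach (string code in new[] { "AB-1234", "ab-1234", "AB-12345" })
        {
            Console.WriteLine("{0}: {1}", code, IsValid(code));
        }
    }
}
```

Because the field is static, the compiled expression lives for the lifetime of the application domain rather than being discarded when a local Regex variable goes out of scope.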

18 December 2015