Splitting Strings with Regular Expressions
The seventeenth part of the Regular Expressions in .NET tutorial looks at the Split method. Similar to the string class's Split method, this member splits a string into an array of items, based upon a delimiter.
Earlier in the regular expressions tutorial we saw two key methods. Matches finds the parts of an input string that match a pattern and returns them. Replace performs substitutions, letting you find matches and replace them with alternative text. In this article we'll begin looking at some other members of the Regex class, starting with the Split method.
As with the Split method that is provided by the string class, the regular expression version takes an input string and divides it into an array of shorter strings based upon the position of delimiters. For the string class, the delimiters are fixed characters or strings. With the Regex class, the delimiters are found using a regular expression.
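The difference between the two approaches can be seen in a minimal sketch such as the one below. The class name, method names and sample input are illustrative only and are not part of the original tutorial. The string class version splits on each comma or semicolon, leaving any following space in the results, whereas the pattern used with Regex.Split treats the delimiter and any white space that follows it as a single match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this comparison.
public static class SplitComparison
{
    // string.Split treats each listed character as a fixed delimiter.
    public static string[] WithStringSplit(string input)
    {
        return input.Split(new[] { ',', ';' });
    }

    // Regex.Split uses a pattern: a comma or semicolon, then optional white space.
    public static string[] WithRegexSplit(string input)
    {
        return Regex.Split(input, @"[,;]\s*");
    }

    public static void Main()
    {
        string input = "red, green;blue";
        Console.WriteLine(string.Join("|", WithStringSplit(input)));  // red| green|blue
        Console.WriteLine(string.Join("|", WithRegexSplit(input)));   // red|green|blue
    }
}
```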
The simplest way to call Split is with the static method that accepts two string parameters. The first contains the input string and the second holds the regular expression to match. Each time a match for the pattern is located, it marks the end of a substring that will be included in the resultant array. If the pattern is matched at the start or the end of the input string, or if the pattern is matched twice in succession, the array will include one or more empty strings. The final string in the returned array contains the text that follows the last match.
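As a small illustration of how empty strings arise, the following sketch splits on a single comma. The class name, method name and input are assumptions made for the example. The input starts with a delimiter and contains two delimiters in succession, so two of the four resulting elements are empty.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original tutorial.
public static class EmptyEntriesDemo
{
    public static string[] SplitOnCommas(string input)
    {
        return Regex.Split(input, ",");
    }

    public static void Main()
    {
        // A leading delimiter, then two delimiters in succession.
        string[] parts = SplitOnCommas(",a,,b");
        foreach (string part in parts)
            Console.WriteLine("'" + part + "'");
        // Prints '', 'a', '' and 'b' on separate lines.
    }
}
```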
To demonstrate, try running the following sample code. The regular expression in the splitOn variable matches one or more characters from a character class containing white space, full stops, commas and semicolons. This splits the input text into an array of individual words.
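// Requires: using System; and using System.Text.RegularExpressions;
string input = "Next day he was drunk, and he went to Judge Thatcher's and bullyragged "
    + "him, and tried to make him give up the money; but he couldn't, and then "
    + "he swore he'd make the law force him.";
// One or more white space characters, full stops, commas or semicolons.
string splitOn = @"[\s.,;]+";
// Regex.Split returns an array of strings.
string[] words = Regex.Split(input, splitOn);
foreach (var word in words)
    Console.WriteLine("'" + word + "'");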
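Because the sample text ends with a full stop, the pattern matches at the very end of the input and the last element of the array is an empty string. One possible way to discard such entries, sketched here with LINQ, is to filter the array after splitting. The WordSplitter class and NonEmptyWords method are hypothetical names chosen for the example, and this is only one of several ways to achieve the same result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original tutorial.
public static class WordSplitter
{
    // Splits on white space and punctuation, then drops empty entries.
    public static string[] NonEmptyWords(string input)
    {
        return Regex.Split(input, @"[\s.,;]+")
                    .Where(word => word.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string word in NonEmptyWords("he'd make the law force him."))
            Console.WriteLine("'" + word + "'");
    }
}
```

Filtering after the split keeps the pattern unchanged, which may be preferable when the same expression is used elsewhere.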
30 December 2015