BlackWaspTM

This web site uses cookies. By using the site you accept the cookie policy.This message is for compliance with the UK ICO law.

Input / Output
.NET 2.0+

Reading CSV Data

CSV files are useful for transferring information between systems that cannot be connected by another method. Reading the data from CSV files, or strings containing CSV information, is made easy with standard .NET framework classes.

Comma-Separated Values

When developing software that must import information from legacy systems there are many ways in which that information can be read. Depending upon the source of the data you might be limited in your options. One data transfer option that is supported by a large number of systems is the plain text file. Using such a file for data means that you can communicate between disparate software applications that execute on many different types of hardware or operating system.

When transferring data using plain text, the sender and receiver must agree upon a format. For tabular data, which may be thought of as being a set of rows and columns, one common style is the comma-separated values (CSV) file. Comma-separated data usually consists of a set of rows of data separated by carriage return or line feed characters. Each line includes of a number of columns delimited by commas or other special characters such as bar symbols (|) or tabs.

Many people create their own CSV file parsing libraries using simple string-processing functions or regular expressions. At first, this seems like a simple proposition. However, there are several pitfalls. Firstly, you need to consider how to approach data that contains commas. Usually such information is surrounded by quotes within a CSV file. This presents the problem that some data will contain quote symbols, which are usually indicated by two quotes in the data. In addition, some CSV formats will include blank lines or lines containing comments, either of which should be ignored.

Luckily, the .NET framework includes a class that can be used to read CSV data from files, streams or strings. The class is included in the Microsoft.VisualBasic assembly. This suggests that it is targeted at Visual Basic developers. However, it can be used from other .NET languages, such as C#.

In this article we'll see how to read CSV files using the TextFieldParser class. To begin, create a new console application and add a reference to the Microsoft.VisualBasic assembly. TextFieldParser is defined in the Microsoft.VisualBasic.FileIO namespace, and we'll also be using the Sytem.IO namespace, so ensure you include the following two using directives.

using System.IO;
using Microsoft.VisualBasic.FileIO;

Reading a CSV File

We'll need a sample CSV file to work with. Create a new file using Notepad, or a similar text file editor, and save it as "test.txt". In the example code I've stored the file in the c:\temp folder. If you decide to save it elsewhere you will need to update the paths in the samples.

Col1,Col2,Col3

1,2,3
A,B,C
.,",",""""

The sample file includes several elements of CSV data that make it difficult to create your own parser. The second line is blank so should be ignored. The final line contains three single-character values, these being a full stop (period), a comma and a quote. The comma has been quoted so that it is not seen as a delimiter. The quote symbol has been replaced with two quotes and then has been quoted. This means that a single quote character is represented by four quotes in the file.

Opening the File

To read a CSV file directly from disk you can pass the path of the file to a TextFieldParser constructor. You supply the path as a string, as shown below:

TextFieldParser tfp = new TextFieldParser(@"C:\temp\test.txt")

Configuring the Reader

The TextFieldParser class can be used with several plain text formats. Before extracting the data from the file you need to tell the new instance that it is working with a file containing data columns delimited using commas. You can actually use more than one separator so the class includes a property that holds an array of delimiter strings. The code below shows how you can set the Delimiters property to specify that you only want to use commas, using an array with only one element.

tfp.Delimiters = new string[] { "," };

Reading CSV Lines

With the TextFieldParser now configured you can use the GetFields method to read a single row from the CSV file. The row data is automatically split according to the delimiter configuration and the individual fields are returned in an array in the correct order. The TextFieldParser is intelligent enough to ignore blank lines and correctly process quoted data.

When the CSV rows are exhausted, calling GetFields yields a null result. Although you can test for null, your code is probably more readable if you instead check the EndOfData property. This returns true if the source file contains no more data rows.

We can combine all of the above information to create a simple program that reads the CSV file and displays the data in a tabular format. This is shown below. Note that the instantiation of the TextFieldParser is contained within a using statement. The class uses unmanaged resources when reading files and implements IDisposable, so it is good practice to ensure that you call its Dispose method when it is no longer required.

static void Main()
{
    using (TextFieldParser tfp = new TextFieldParser(@"C:\temp\test.txt"))
    {
        tfp.Delimiters = new string[] { "," };
        while (!tfp.EndOfData)
        {
            string[] fields = tfp.ReadFields();
            ShowFields(fields);
        }
    }
}

private static void ShowFields(string[] fields)
{
    foreach (var part in fields)
    {
        Console.Write("{{{0}}}\t", part);
    }
    Console.WriteLine();
}

/* OUTPUT

{Col1}  {Col2}  {Col3}
{1}     {2}     {3}
{A}     {B}     {C}
{.}     {,}     {"}

*/

NB: You can see from the results in the above comment that the blank line is skipped and the quoted characters are correctly extracted.

4 December 2012