Understand Reading From A File Using C#

Working with files is a necessary skill in any programming language. Luckily, the .NET Framework gives us many options for reading files through the System.IO namespace. There are almost too many options, and for newcomers to .NET it can be confusing to know which one to choose.

In this post, we’ll explore the many ways we can read a file and also benchmark each approach to understand what exactly it is doing.

Using System.IO.File

The .NET Framework provides a static class called File under the System.IO namespace. This class is what we can use to open, read, create, and update any file on disk. It offers several ways to read a file:

Open & ReadLine
ReadLines
ReadAllText
ReadAllBytes
ReadAllLines

We’ll explore each of these methods, their performance characteristics, and when we may want to use each one.

Open & StreamReader ReadLine

The first approach is the most tedious of our strategies but gives us more control over reading the contents of a file.

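public void ReadLine()
{
    // File is a string constant holding the path (defined in the benchmark class below).
    using var stream = System.IO.File.Open(File, FileMode.Open);
    using var reader = new StreamReader(stream);

    // Reads only the first line of the file.
    var line = reader.ReadLine();
}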

The first step is to open the file and retrieve a Stream. In this case, we will be dealing with a FileStream. Next, we use a StreamReader to decode the bytes coming from our FileStream into text. Finally, we can call ReadLine to get the first line of our file. Using a stream reader gives us control over when to stop reading our file. Because we read only as much as we need, this approach can save CPU cycles and memory.
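To illustrate that control, here is a minimal sketch that keeps calling ReadLine in a loop and stops as soon as it sees a marker line. The class name ReadLineExample, the method ReadUntil, and the marker parameter are hypothetical names chosen for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineExample
{
    // Hypothetical helper: collects lines until one equal to `marker` is found.
    public static List<string> ReadUntil(string path, string marker)
    {
        var lines = new List<string>();

        using var stream = File.Open(path, FileMode.Open, FileAccess.Read);
        using var reader = new StreamReader(stream);

        string line;
        // ReadLine returns null once the end of the stream is reached.
        while ((line = reader.ReadLine()) != null)
        {
            if (line == marker)
                break; // stop early; later lines are never turned into strings

            lines.Add(line);
        }

        return lines;
    }
}
```

A call such as ReadLineExample.ReadUntil("example.txt", "END") would return every line that appears before the first line equal to "END".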

ReadLines

The next approach to reading a file is by using File.ReadLines to create an iterator pointing to our file’s data.

public void ReadLines()
{
    var file = System.IO.File.ReadLines(File);
}

If we decompile the ReadLines method, we can see what is happening within the .NET Framework. The first stop in our expedition leads us to the following method.

public static IEnumerable<string> ReadLines(string path)
{
    if (path == null)
        throw new ArgumentNullException(nameof(path));
    if (path.Length == 0)
        throw new ArgumentException(SR.Argument_EmptyPath, nameof(path));

    return ReadLinesIterator.CreateIterator(path, Encoding.UTF8);
}

There is an internal class called ReadLinesIterator that returns an iterator. When we follow the CreateIterator call to the class definition, we see the following relevant code.

public override bool MoveNext()
{
    if (this._reader != null)
    {
        this.current = _reader.ReadLine();
        if (this.current != null)
            return true;

        // To maintain 4.0 behavior we Dispose 
        // after reading to the end of the reader.
        Dispose();
    }

    return false;
}

It’s our friend, ReadLine! Note that the iterator only reads lines from the file as we enumerate it. Creating the iterator is not entirely free (in this implementation it opens the file and sets up a StreamReader), but given our code above, no lines are read or turned into strings until we consume the iterator in a loop of some kind.

This approach to reading files can reduce the noise in our previous code, but has the same benefits as the ReadLine approach because it is the ReadLine approach.
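Because lines are produced on demand, we can stop enumerating early and leave the rest of the file untouched. The following is a small sketch of that idea using LINQ; ReadLinesExample and FirstLines are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReadLinesExample
{
    // Hypothetical helper: returns at most `count` lines from the start of the file.
    public static List<string> FirstLines(string path, int count)
    {
        // Lines are read one at a time as the sequence is enumerated.
        IEnumerable<string> lines = File.ReadLines(path);

        // Take ends the enumeration after `count` lines; ToList disposes the
        // enumerator when it finishes, which closes the underlying reader.
        return lines.Take(count).ToList();
    }
}
```

The same pattern works with a plain foreach loop and a break statement if we would rather avoid LINQ.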

ReadAllText

The ReadAllText method is for us folks who don’t have time to fiddle around with streams and want the entire contents of a file NOW. The documentation says as much.

“Opens a text file, reads all the text in the file, and then closes the file.” (.NET Framework documentation)

The resulting value is a string of the entire contents of a file. As we may have guessed, this can cause some memory pressure when dealing with larger files. Here is the internal implementation of ReadAllText.

private static string InternalReadAllText(string path, Encoding encoding)
{
    Debug.Assert(path != null);
    Debug.Assert(encoding != null);
    Debug.Assert(path.Length > 0);

    using (StreamReader sr = new StreamReader(path, encoding, detectEncodingFromByteOrderMarks: true))
        return sr.ReadToEnd();
}

As we can see, a call to StreamReader.ReadToEnd is made, which returns the entire contents of the file.
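For completeness, here is a minimal usage sketch of ReadAllText. It loads the whole file into one string and then counts how often a term appears. ReadAllTextExample and CountOccurrences are hypothetical names, and the example assumes the file is small enough to fit comfortably in memory.

```csharp
using System;
using System.IO;

public static class ReadAllTextExample
{
    // Hypothetical helper: counts non-overlapping occurrences of `term` in the file.
    public static int CountOccurrences(string path, string term)
    {
        if (string.IsNullOrEmpty(term))
            throw new ArgumentException("Search term must not be empty.", nameof(term));

        // The entire file is held in memory as a single string.
        string text = File.ReadAllText(path);

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(term, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += term.Length; // continue searching after this match
        }

        return count;
    }
}
```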

ReadAllBytes

Not all files will contain string-friendly values. In some cases, we may be dealing with a proprietary binary format. That is where our next method is helpful. The ReadAllBytes method, like ReadAllText, will take all the binary elements of a file and read them into memory.

public void ReadAllBytes()
{
    byte[] bytes = System.IO.File.ReadAllBytes(File);
}

What occurs within the .NET Framework is interesting.

public static byte[] ReadAllBytes(string path)
{
    // bufferSize == 1 used to avoid unnecessary buffer in FileStream
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1))
    {
        long fileLength = fs.Length;
        if (fileLength > int.MaxValue)
        {
            throw new IOException(SR.IO_FileTooLong2GB);
        }
        else if (fileLength == 0)
        {
#if !MS_IO_REDIST
            // Some file systems (e.g. procfs on Linux) return 0 for length even when there's content.
            // Thus we need to assume 0 doesn't mean empty.
            return ReadAllBytesUnknownLength(fs);
#endif
        }

        int index = 0;
        int count = (int)fileLength;
        byte[] bytes = new byte[count];
        while (count > 0)
        {
            int n = fs.Read(bytes, index, count);
            if (n == 0)
                throw Error.GetEndOfFile();
            index += n;
            count -= n;
        }
        return bytes;
    }
}

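The implementation for ReadAllBytes uses the Read method of FileStream to read all the bytes into memory; no StreamReader is involved, since there is no text to decode. Note the two-gigabyte limit as well: a file longer than int.MaxValue bytes causes an IOException.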
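When a file is too large to load in one piece, the same FileStream.Read loop can be applied to a small, reusable buffer instead. Below is a minimal sketch of that idea; ChunkedReadExample, CountByte, and the default buffer size of 81920 bytes are assumptions made for this illustration rather than anything ReadAllBytes itself does.

```csharp
using System.IO;

public static class ChunkedReadExample
{
    // Hypothetical helper: counts how many bytes in the file equal `target`,
    // holding only one buffer's worth of data in memory at a time.
    public static long CountByte(string path, byte target, int bufferSize = 81920)
    {
        long matches = 0;
        var buffer = new byte[bufferSize];

        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);

        int read;
        // Read returns the number of bytes placed in the buffer, or 0 at end of file.
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == target)
                    matches++;
            }
        }

        return matches;
    }
}
```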

ReadAllLines

Similar to ReadAllText, ReadAllLines will read each line and return the file in the form of an array of strings. This method can be helpful for parsing log files or filtering data file formats such as comma-separated values (CSV).

public void ReadAllLines()
{
    string[] lines = System.IO.File.ReadAllLines(File);
}

The implementation of ReadAllLines utilizes ReadLine.

private static string[] InternalReadAllLines(string path, Encoding encoding)
{
    Debug.Assert(path != null);
    Debug.Assert(encoding != null);
    Debug.Assert(path.Length != 0);

    string line;
    List<string> lines = new List<string>();

    using (StreamReader sr = new StreamReader(path, encoding))
        while ((line = sr.ReadLine()) != null)
            lines.Add(line);

    return lines.ToArray();
}

The downside to this approach is that it can be expensive in terms of memory. Buffering the entire contents of a file in memory can lead to memory exhaustion, not to mention expensive garbage collections.
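If we only need some of the rows, one way to ease that memory pressure is to filter while streaming, so that only the matching lines are kept. The sketch below does this for a simple CSV file with File.ReadLines. CsvFilterExample and RowsWhere are hypothetical names, and the naive Split(',') assumes that fields never contain quoted commas.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvFilterExample
{
    // Hypothetical helper: returns the rows whose column at `columnIndex` equals `value`.
    // Assumption: a simple CSV layout with no quoted or escaped commas.
    public static List<string> RowsWhere(string path, int columnIndex, string value)
    {
        return File.ReadLines(path)
            .Where(row =>
            {
                string[] fields = row.Split(',');
                return columnIndex >= 0
                    && columnIndex < fields.Length
                    && fields[columnIndex] == value;
            })
            .ToList(); // only the matching rows are buffered
    }
}
```

ReadAllLines remains the simpler choice when we need random access to every line, for example to index into the array.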

Benchmarks

Let’s run each of our read methods through a benchmark test using BenchmarkDotNet.

using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace ReadingRainbow
{
    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner
                .Run<Reading>();
        }
    }

    [MemoryDiagnoser]
    [SimpleJob(RuntimeMoniker.NetCoreApp31)]
    public class Reading
    {
        private const string File = "example.txt";
        
        [Benchmark]
        public void ReadLine()
        {
            using var stream = System.IO.File.Open(File, FileMode.Open);
            using var reader = new StreamReader(stream);

            var line = reader.ReadLine();
        }

        [Benchmark]
        public void ReadLines()
        {
            var file = System.IO.File.ReadLines(File);
        }

        [Benchmark]
        public void ReadAllText()
        {
            var text = System.IO.File.ReadAllText(File);
        }

        [Benchmark]
        public void ReadAllBytes()
        {
            var bytes = System.IO.File.ReadAllBytes(File);
        }

        [Benchmark]
        public void ReadAllLines()
        {
            string[] lines = System.IO.File.ReadAllLines(File);
        }
    }
}

The results are fascinating.

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ReadLine	65.01 us	1.161 us	1.029 us	4.1504	-	-	8.64 KB
ReadLines	50.99 us	0.759 us	0.673 us	2.2583	0.7324	-	4.63 KB
ReadAllText	62.87 us	0.780 us	0.729 us	18.1274	-	-	37.27 KB
ReadAllBytes	53.13 us	1.023 us	1.137 us	3.6621	-	-	7.54 KB
ReadAllLines	77.94 us	0.971 us	0.861 us	15.3809	-	-	31.59 KB

We can see that ReadAllText and ReadAllLines allocate the most memory, because they convert the bytes of the whole document into strings, while the ReadAllBytes benchmark takes up about as many bytes as the file itself. The smallest allocation belongs to ReadLines, but that’s because we never iterated over the result; likewise, the ReadLine benchmark reads only the first line.
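For a fairer comparison, the two line-by-line approaches would need to consume the whole file. Here is a sketch of what such benchmarks might look like. ReadingIterated and its method names are hypothetical, the file is assumed to be the same example.txt as above, and no results are shown because these benchmarks were not part of the run above.

```csharp
using System.IO;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class ReadingIterated
{
    // Assumption: the same sample file used by the benchmarks above.
    private const string FilePath = "example.txt";

    [Benchmark]
    public int ReadLineLoop()
    {
        using var stream = File.Open(FilePath, FileMode.Open);
        using var reader = new StreamReader(stream);

        int count = 0;
        // Read every line until ReadLine signals the end of the file with null.
        while (reader.ReadLine() != null)
            count++;

        // Returning the count keeps the work from being optimized away.
        return count;
    }

    [Benchmark]
    public int ReadLinesLoop()
    {
        int count = 0;
        // Enumerating the iterator is what actually reads the lines.
        foreach (var _ in File.ReadLines(FilePath))
            count++;

        return count;
    }
}
```

These could be run with BenchmarkRunner.Run<ReadingIterated>() in the same way as the Reading class.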

Conclusion

We’ve gone through the main methods on System.IO.File that can read from disk. What the text-based methods have in common is the use of StreamReader, while ReadAllBytes reads directly from a FileStream. They are all helpful to use in our applications, but each behaves slightly differently. For those concerned about getting their work done quickly, the ReadAll* methods get the job done. Folks with more critical memory constraints may want to stick with using StreamReader directly.

I hope you found this post interesting, and please leave a comment.