Working with files is a necessary skill in any programming language. Luckily, the .NET Framework gives us many ways to read them through the System.IO namespace. There are almost too many options, and for newcomers to .NET it can be confusing which one to choose.
In this post, we’ll explore the many ways we can read a file and also benchmark each approach to understand what exactly it is doing.
Using System.IO.File
The .NET Framework provides a static class called File under the System.IO namespace. This class is what we can use to open, read, create, and update any file on disk. That said, we have many methods we can use to read a file:
- Open & ReadLine
- ReadLines
- ReadAllText
- ReadAllBytes
- ReadAllLines
We’ll explore each of these methods, their performance characteristics, and when we may want to use each one.
Open & StreamReader ReadLine
The first approach is the most tedious of our strategies but gives us more control over reading the contents of a file.
public void ReadLine()
{
    // File is the path constant declared on the benchmark class in the Benchmarks section.
    using var stream = System.IO.File.Open(File, FileMode.Open);
    using var reader = new StreamReader(stream);
    var line = reader.ReadLine();
}
The first step is to open the file and retrieve a Stream. In this case, we will be dealing with a FileStream. Next, we use a StreamReader to decode the bytes coming from our FileStream into text. Finally, we call ReadLine to get the first line of our file. Using a stream reader lets us decide when to stop reading. Because we read only as much of the file as we need, this approach can save both CPU cycles and memory.
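To make the idea of stopping early concrete, here is a minimal sketch that reads lines until a caller-supplied condition matches. The EarlyStopReader class, the ReadUntil method, and the stopWhen parameter are hypothetical names invented for illustration; only File.Open, StreamReader, and ReadLine come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EarlyStopReader
{
    // Hypothetical helper: collects lines until stopWhen returns true for one of them.
    // The matching line and everything after it are never added to the result.
    public static List<string> ReadUntil(string path, Func<string, bool> stopWhen)
    {
        var lines = new List<string>();
        using var stream = File.Open(path, FileMode.Open, FileAccess.Read);
        using var reader = new StreamReader(stream);

        string line;
        // ReadLine returns null once the end of the file is reached.
        while ((line = reader.ReadLine()) != null)
        {
            if (stopWhen(line))
                break;
            lines.Add(line);
        }
        return lines;
    }
}
```

A call such as EarlyStopReader.ReadUntil(path, l => l.Length == 0) would return everything before the first blank line, without turning the rest of the file into strings.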
ReadLines
The next approach to reading a file is by using File.ReadLines to create an iterator pointing to our file’s data.
public void ReadLines()
{
    var file = System.IO.File.ReadLines(File);
}
If we decompile the ReadLines method, we can see what is happening within the .NET Framework. The first stop in our expedition leads us to the following method.
public static IEnumerable<string> ReadLines(string path)
{
    if (path == null)
        throw new ArgumentNullException(nameof(path));
    if (path.Length == 0)
        throw new ArgumentException(SR.Argument_EmptyPath, nameof(path));
    return ReadLinesIterator.CreateIterator(path, Encoding.UTF8);
}
There is an internal class called ReadLinesIterator that returns an iterator. When we follow the CreateIterator call to the class definition, we see the following relevant code.
public override bool MoveNext()
{
    if (this._reader != null)
    {
        this.current = _reader.ReadLine();
        if (this.current != null)
            return true;
        // To maintain 4.0 behavior we Dispose
        // after reading to the end of the reader.
        Dispose();
    }
    return false;
}
It’s our friend, ReadLine! Note that the iterator does not read any lines until we begin to enumerate it. Calling File.ReadLines still opens the file and creates a StreamReader up front, which accounts for the small cost we’ll see in the benchmark, but no line is read and no string is allocated until we consume the iterator in a loop of some kind.
This approach reduces the noise of our previous code while keeping the same benefits as the ReadLine approach, because it is the ReadLine approach.
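As a rough illustration of consuming the iterator only as far as needed, the sketch below combines File.ReadLines with LINQ’s Take. The LazyLines class and FirstLines method are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Linq;

public static class LazyLines
{
    // Hypothetical helper: returns at most `count` lines from the start of the file.
    // Take stops pulling from the iterator once it has enough lines, and ToArray
    // disposes the enumerator afterwards, which closes the underlying reader.
    public static string[] FirstLines(string path, int count)
    {
        return File.ReadLines(path).Take(count).ToArray();
    }
}
```

The underlying StreamReader reads from disk in buffered chunks, so somewhat more than the requested lines may be fetched, but the remainder of the file is never turned into strings.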
ReadAllText
The ReadAllText method is for us folks who don’t have time to fiddle around with streams and want the entire contents of a file NOW. The documentation says as much.
"Opens a text file, reads all the text in the file, and then closes the file." (.NET Framework documentation)
The resulting value is a string holding the entire contents of the file. As we may have guessed, this can cause some memory pressure when dealing with larger files. Here is the internal implementation of ReadAllText.
private static string InternalReadAllText(string path, Encoding encoding)
{
    Debug.Assert(path != null);
    Debug.Assert(encoding != null);
    Debug.Assert(path.Length > 0);
    using (StreamReader sr = new StreamReader(path, encoding, detectEncodingFromByteOrderMarks: true))
        return sr.ReadToEnd();
}
As we can see, a call to StreamReader.ReadToEnd is made, which returns the entire contents of the file.
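Since the whole file ends up in a single string, one possible precaution is to check the file’s size before reading it. The sketch below is an assumption about how such a guard might look, not something the framework does for us; SizeGuard, ReadSmallFile, and maxBytes are hypothetical names.

```csharp
using System;
using System.IO;

public static class SizeGuard
{
    // Hypothetical helper: refuses to load files larger than maxBytes into one string.
    public static string ReadSmallFile(string path, long maxBytes)
    {
        long length = new FileInfo(path).Length;
        if (length > maxBytes)
            throw new InvalidOperationException(
                $"File is {length} bytes, which exceeds the limit of {maxBytes} bytes.");

        return File.ReadAllText(path);
    }
}
```

The limit is measured in bytes on disk; the resulting string will often occupy more memory than that, because .NET strings store their characters as UTF-16.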
ReadAllBytes
Not all files contain string-friendly values. In some cases, we may be dealing with a proprietary binary format. That is where our next method is helpful. The ReadAllBytes method, like ReadAllText, reads the entire file into memory, but returns the raw bytes rather than a string.
public void ReadAllBytes()
{
    byte[] bytes = System.IO.File.ReadAllBytes(File);
}
What occurs within the .NET Framework is interesting.
public static byte[] ReadAllBytes(string path)
{
    // bufferSize == 1 used to avoid unnecessary buffer in FileStream
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1))
    {
        long fileLength = fs.Length;
        if (fileLength > int.MaxValue)
        {
            throw new IOException(SR.IO_FileTooLong2GB);
        }
        else if (fileLength == 0)
        {
#if !MS_IO_REDIST
            // Some file systems (e.g. procfs on Linux) return 0 for length even when there's content.
            // Thus we need to assume 0 doesn't mean empty.
            return ReadAllBytesUnknownLength(fs);
#endif
        }
        int index = 0;
        int count = (int)fileLength;
        byte[] bytes = new byte[count];
        while (count > 0)
        {
            int n = fs.Read(bytes, index, count);
            if (n == 0)
                throw Error.GetEndOfFile();
            index += n;
            count -= n;
        }
        return bytes;
    }
}
The implementation of ReadAllBytes calls the Read method of FileStream in a loop until every byte of the file is in memory. No StreamReader is involved, because there is no text to decode. Note the two-gigabyte limit as well.
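For binary files that are too large to hold in a single array, the same FileStream.Read loop can be applied to a small reusable buffer instead. The following is a minimal sketch under that assumption; ChunkedReader, CountByte, and the default buffer size are hypothetical choices for illustration.

```csharp
using System.IO;

public static class ChunkedReader
{
    // Hypothetical helper: counts how often `target` occurs in the file,
    // holding only bufferSize bytes in memory at any time.
    public static long CountByte(string path, byte target, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        long matches = 0;

        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        int n;
        // Read returns the number of bytes actually read, or 0 at the end of the file.
        while ((n = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
            {
                if (buffer[i] == target)
                    matches++;
            }
        }
        return matches;
    }
}
```

Because the buffer is reused for every chunk, memory use stays flat regardless of file size, and the two-gigabyte array limit never comes into play.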
ReadAllLines
Similar to ReadAllText, ReadAllLines reads the whole file, but returns it as an array with one string per line. This method can be helpful for parsing log files or filtering data formats such as comma-separated values (CSV).
public void ReadAllLines()
{
    string[] lines = System.IO.File.ReadAllLines(File);
}
The implementation of ReadAllLines utilizes ReadLine.
private static string[] InternalReadAllLines(string path, Encoding encoding)
{
    Debug.Assert(path != null);
    Debug.Assert(encoding != null);
    Debug.Assert(path.Length != 0);
    string line;
    List<string> lines = new List<string>();
    using (StreamReader sr = new StreamReader(path, encoding))
        while ((line = sr.ReadLine()) != null)
            lines.Add(line);
    return lines.ToArray();
}
The downside to this approach is that it can be expensive in terms of memory. Buffering the entire contents of a file, first in a list and then in an array, can lead to memory exhaustion for large files, not to mention expensive garbage collections.
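When a task such as filtering a log only needs to look at each line once, the buffering can be avoided by streaming with File.ReadLines instead. The sketch below shows one way this might look; LogFilter and CountMatching are hypothetical names, and the ordinal, case-sensitive comparison is an arbitrary choice for the example.

```csharp
using System;
using System.IO;

public static class LogFilter
{
    // Hypothetical helper: counts lines containing `term` without keeping
    // the lines in memory after they have been inspected.
    public static int CountMatching(string path, string term)
    {
        int count = 0;
        foreach (var line in File.ReadLines(path))
        {
            if (line.IndexOf(term, StringComparison.Ordinal) >= 0)
                count++;
        }
        return count;
    }
}
```

ReadAllLines remains the simpler choice when we need random access to lines by index or have to pass over the data more than once.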
Benchmarks
Let’s run each of our read methods through a benchmark test using BenchmarkDotNet.
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace ReadingRainbow
{
    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner
                .Run<Reading>();
        }
    }

    [MemoryDiagnoser]
    [SimpleJob(RuntimeMoniker.NetCoreApp31)]
    public class Reading
    {
        private const string File = "example.txt";

        [Benchmark]
        public void ReadLine()
        {
            using var stream = System.IO.File.Open(File, FileMode.Open);
            using var reader = new StreamReader(stream);
            var line = reader.ReadLine();
        }

        [Benchmark]
        public void ReadLines()
        {
            var file = System.IO.File.ReadLines(File);
        }

        [Benchmark]
        public void ReadAllText()
        {
            var text = System.IO.File.ReadAllText(File);
        }

        [Benchmark]
        public void ReadAllBytes()
        {
            var bytes = System.IO.File.ReadAllBytes(File);
        }

        [Benchmark]
        public void ReadAllLines()
        {
            string[] lines = System.IO.File.ReadAllLines(File);
        }
    }
}
The results are fascinating.
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
ReadLine | 65.01 us | 1.161 us | 1.029 us | 4.1504 | - | - | 8.64 KB |
ReadLines | 50.99 us | 0.759 us | 0.673 us | 2.2583 | 0.7324 | - | 4.63 KB |
ReadAllText | 62.87 us | 0.780 us | 0.729 us | 18.1274 | - | - | 37.27 KB |
ReadAllBytes | 53.13 us | 1.023 us | 1.137 us | 3.6621 | - | - | 7.54 KB |
ReadAllLines | 77.94 us | 0.971 us | 0.861 us | 15.3809 | - | - | 31.59 KB |
We can see that ReadAllText and ReadAllLines, the benchmarks that convert every byte of the document into strings, allocate the most memory, while the ReadAllBytes benchmark allocates roughly as many bytes as the file itself contains. Keep in mind that the ReadLine benchmark reads only the first line of the file. Ultimately, the most memory-conscious result belongs to ReadLines, but that’s because we never iterated over the result.
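For a fairer comparison, a benchmark could consume the iterator in full. The following sketch shows how such a benchmark might be written; the ReadingConsumed class and ReadLinesConsumed method are hypothetical additions that were not part of the measured run, so no results are available for them. It assumes the same BenchmarkDotNet package and example.txt file as the code above.

```csharp
using System.IO;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class ReadingConsumed
{
    // Assumed to be the same file used by the original benchmarks.
    private const string FilePath = "example.txt";

    [Benchmark]
    public int ReadLinesConsumed()
    {
        int count = 0;
        // Enumerating forces every line to be read and allocated as a string.
        foreach (var line in File.ReadLines(FilePath))
            count++;

        // Returning the count keeps the loop from being optimized away.
        return count;
    }
}
```

Running it would be a matter of calling BenchmarkRunner.Run<ReadingConsumed>() from Main, in the same way the Reading class is run.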
Conclusion
We’ve gone through all the methods on System.IO.File that can read from disk. What the text-based methods have in common is the use of StreamReader, while ReadAllBytes reads from a FileStream directly. They are all helpful in our applications, but each behaves slightly differently. For those concerned about getting their work done quickly, the ReadAll* methods get the job done. Folks with tighter memory constraints may want to stick with ReadLines or with using StreamReader directly.
I hope you found this post interesting, and please leave a comment.