Modern development has an abundance of processing power, network bandwidth, and disk space. Given our current fortunes, we should spend these resources like there's no tomorrow, right? Well, no! We should be mindful of our resource utilization and how it affects our application's overall run profile.
This post shows how we can use the compression algorithms in the System.IO.Compression namespace to compress and decompress a string value. Compressing values can result in a significant byte reduction, especially for text.
What Is Compression?
Compression in physics is a reduction in size due to forces pushing in on a mass. In data compression, it means encoding data into a smaller representation, either without any information loss (lossless) or with a loss that is ideally imperceptible (lossy). Data compression uses algorithms to encode existing information in as few bits as possible. Different algorithms have different levels of effectiveness, and they typically trade compression ratio against the time or CPU processing required to achieve it. In computer science, this is known as the space-time trade-off.
Developers should evaluate the following factors when choosing a data compression algorithm:
- Time: How long does it take to compress my particular data?
- Space: How much space do I save when compressing data?
- Lossy: Does compression cause a loss of data? A degree of information loss is normally acceptable for audio and video.
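To make the time and space factors concrete, here is a minimal sketch that measures both for GZip at two compression levels. The sample payload and the `Measure` helper are hypothetical, and actual numbers depend on your data, hardware, and .NET version, so no output is shown.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// hypothetical sample payload: repetitive text compresses well
var data = Encoding.UTF8.GetBytes(
    string.Concat(Enumerable.Repeat("The quick brown fox jumps over the lazy dog. ", 2_000)));

// compresses the input once with GZip and reports elapsed time (time) and compressed size (space)
(TimeSpan Elapsed, long Size) Measure(byte[] input, CompressionLevel level)
{
    var stopwatch = Stopwatch.StartNew();
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, level, leaveOpen: true))
    {
        gzip.Write(input, 0, input.Length);
    } // disposing the GZipStream writes the final block and footer
    stopwatch.Stop();
    return (stopwatch.Elapsed, output.Length);
}

var fastest = Measure(data, CompressionLevel.Fastest);
var optimal = Measure(data, CompressionLevel.Optimal);

Console.WriteLine($"input:   {data.Length:N0} bytes");
Console.WriteLine($"Fastest: {fastest.Size:N0} bytes in {fastest.Elapsed.TotalMilliseconds:F2} ms");
Console.WriteLine($"Optimal: {optimal.Size:N0} bytes in {optimal.Elapsed.TotalMilliseconds:F2} ms");
```

A single timing like this includes JIT warm-up, so for decisions that matter, a benchmarking tool such as BenchmarkDotNet gives more trustworthy numbers.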
What data compression algorithms are available to .NET developers?
.NET Data Compression Algorithms
When using .NET 5, developers have access to the System.IO.Compression namespace, which includes DeflateStream along with the two compression algorithms this post focuses on: GZip and Brotli.
Gzip is a lossless data compression format based on the DEFLATE algorithm. It includes a CRC-32 checksum for detecting data corruption. Linux users are likely familiar with the .gz extension, as it's commonly used in the Unix world. Gzip is designed for uncompressed data; compressing already-compressed data with Gzip may actually make it larger than it was.
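A quick way to see that last point is to GZip data that has no redundancy left to remove. This sketch uses seeded random bytes as a stand-in for already-compressed data, which is an assumption: real compressed files are not random, but they are similarly high in entropy.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// seeded random bytes stand in for already-compressed, high-entropy data
var random = new Random(42);
var alreadyCompressed = new byte[10_000];
random.NextBytes(alreadyCompressed);

// GZips a byte array, disposing the GZipStream so the footer is written before reading
byte[] Gzip(byte[] input)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(input, 0, input.Length);
    }
    return output.ToArray();
}

var recompressed = Gzip(alreadyCompressed);

// with nothing left to squeeze out, header, footer, and block overhead make the output larger
Console.WriteLine($"before: {alreadyCompressed.Length:N0} bytes, after: {recompressed.Length:N0} bytes");
```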
Brotli is another lossless data compression algorithm, developed at Google, and is best suited for text compression. As you may have guessed, that makes Brotli ideal for web content delivery, which primarily deals in HTML, JavaScript, and CSS. Brotli is often considered the successor to gzip, and most major web browsers support it. It also typically achieves better compression ratios than gzip, particularly on text.
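To compare the two on web-style text in your own environment, a sketch like the following compresses the same HTML-like string with both algorithms at CompressionLevel.Optimal. The markup and the `Compress` helper are made up for illustration. Which algorithm wins, and by how much, depends on the input and the runtime, so the sketch prints sizes rather than declaring a winner.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// hypothetical HTML-like payload with the kind of repetition typical of web content
var html = string.Concat(Enumerable.Range(0, 500)
    .Select(i => $"<li class=\"item\"><a href=\"/products/{i}\">Product {i}</a></li>\n"));
var bytes = Encoding.UTF8.GetBytes(html);

// compresses with whichever stream the factory creates; disposing it finalizes the output
byte[] Compress(byte[] input, Func<Stream, Stream> create)
{
    using var output = new MemoryStream();
    using (var stream = create(output))
    {
        stream.Write(input, 0, input.Length);
    }
    return output.ToArray();
}

var gzip = Compress(bytes, s => new GZipStream(s, CompressionLevel.Optimal, leaveOpen: true));
var brotli = Compress(bytes, s => new BrotliStream(s, CompressionLevel.Optimal, leaveOpen: true));

Console.WriteLine($"original: {bytes.Length:N0} bytes");
Console.WriteLine($"gzip:     {gzip.Length:N0} bytes");
Console.WriteLine($"brotli:   {brotli.Length:N0} bytes");
```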
Using Compression in C#
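Luckily, .NET developers have access to both of the data compression algorithms mentioned above in the form of GZipStream and BrotliStream. Both classes have nearly identical APIs: each wraps an underlying stream and takes a CompressionLevel when compressing or a CompressionMode when decompressing.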
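using System;
using System.IO;
using System.IO.Compression;
using System.Text;

var value = "hello world";
var level = CompressionLevel.Fastest;
var bytes = Encoding.Unicode.GetBytes(value);
await using var input = new MemoryStream(bytes);
await using var output = new MemoryStream();
// replace GZipStream with BrotliStream to use Brotli instead
await using (var stream = new GZipStream(output, level, leaveOpen: true))
{
    await input.CopyToAsync(stream);
} // disposing the compression stream writes any remaining data before we read it
var result = output.ToArray();
var resultString = Convert.ToBase64String(result);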
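The snippet above only covers compression. Because both classes share the same constructor shapes, a single pair of helpers can round-trip data through either one by accepting a factory. This is a minimal sketch with hypothetical helper names (`Compress` and `Decompress`); like the snippet above, it disposes the compression stream before reading the output.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// compresses data with the stream the factory creates
byte[] Compress(byte[] data, Func<Stream, Stream> create)
{
    using var output = new MemoryStream();
    using (var stream = create(output))
    {
        stream.Write(data, 0, data.Length);
    } // dispose first so the compressor writes its final block
    return output.ToArray();
}

// decompresses data by reading everything out of the stream the factory creates
byte[] Decompress(byte[] data, Func<Stream, Stream> create)
{
    using var input = new MemoryStream(data);
    using var stream = create(input);
    using var output = new MemoryStream();
    stream.CopyTo(output);
    return output.ToArray();
}

var original = Encoding.Unicode.GetBytes("hello world");

// same code path for both algorithms; only the factories differ
var gzipped = Compress(original, s => new GZipStream(s, CompressionLevel.Fastest, leaveOpen: true));
var gunzipped = Decompress(gzipped, s => new GZipStream(s, CompressionMode.Decompress));

var brotlied = Compress(original, s => new BrotliStream(s, CompressionLevel.Fastest, leaveOpen: true));
var unbrotlied = Decompress(brotlied, s => new BrotliStream(s, CompressionMode.Decompress));

Console.WriteLine(Encoding.Unicode.GetString(gunzipped));  // hello world
Console.WriteLine(Encoding.Unicode.GetString(unbrotlied)); // hello world
```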
We can also create extension methods to make these compression algorithms easier to use in our codebase.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class Compression
{
    public static async Task<CompressionResult> ToGzipAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
    {
        var bytes = Encoding.Unicode.GetBytes(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        // dispose the compression stream before reading the output so all of its data is written
        await using (var stream = new GZipStream(output, level, leaveOpen: true))
        {
            await input.CopyToAsync(stream);
        }
        var result = output.ToArray();
        return new CompressionResult(
            new CompressionValue(value, bytes.Length),
            new CompressionValue(Convert.ToBase64String(result), result.Length),
            level,
            "Gzip");
    }

    public static async Task<CompressionResult> ToBrotliAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
    {
        var bytes = Encoding.Unicode.GetBytes(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        // same pattern: finish the BrotliStream before reading the output
        await using (var stream = new BrotliStream(output, level, leaveOpen: true))
        {
            await input.CopyToAsync(stream);
        }
        var result = output.ToArray();
        return new CompressionResult(
            new CompressionValue(value, bytes.Length),
            new CompressionValue(Convert.ToBase64String(result), result.Length),
            level,
            "Brotli");
    }

    public static async Task<string> FromGzipAsync(this string value)
    {
        var bytes = Convert.FromBase64String(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        await using var stream = new GZipStream(input, CompressionMode.Decompress);
        await stream.CopyToAsync(output);
        return Encoding.Unicode.GetString(output.ToArray());
    }

    public static async Task<string> FromBrotliAsync(this string value)
    {
        var bytes = Convert.FromBase64String(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        await using var stream = new BrotliStream(input, CompressionMode.Decompress);
        await stream.CopyToAsync(output);
        return Encoding.Unicode.GetString(output.ToArray());
    }
}

public record CompressionResult(
    CompressionValue Original,
    CompressionValue Result,
    CompressionLevel Level,
    string Kind)
{
    // bytes saved by compression
    public int Difference =>
        Original.Size - Result.Size;

    // saved bytes as a fraction of the original size (formats as a percentage with "P")
    public decimal Percent =>
        Math.Abs(Difference / (decimal)Original.Size);
}

public record CompressionValue(
    string Value,
    int Size);
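One detail worth keeping in mind when reading the results: these helpers encode strings with Encoding.Unicode, which in .NET is UTF-16 and uses two bytes for every ASCII character. The "before" sizes therefore count two bytes per character of English text, so they are not directly comparable to compressing the same text as UTF-8. A quick check with the "hello world" value from earlier:

```csharp
using System;
using System.Text;

var value = "hello world";

// Encoding.Unicode is UTF-16 (little-endian): two bytes per ASCII character
var utf16 = Encoding.Unicode.GetBytes(value);
// UTF-8 uses a single byte for each ASCII character
var utf8 = Encoding.UTF8.GetBytes(value);

Console.WriteLine($"UTF-16: {utf16.Length} bytes"); // UTF-16: 22 bytes
Console.WriteLine($"UTF-8:  {utf8.Length} bytes");  // UTF-8:  11 bytes
```

Switching the helpers to Encoding.UTF8 is a reasonable option for mostly-ASCII text, as long as the compression and decompression helpers use the same encoding.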
We can now use them to compress any string.
using System.IO;
using Spectre.Console; // Table and AnsiConsole come from the Spectre.Console package

var comedyOfErrors = await File.ReadAllTextAsync("the-comedy-of-errors.txt");
var compressions = new[]
{
    await comedyOfErrors.ToGzipAsync(),
    await comedyOfErrors.ToBrotliAsync()
};

var table = new Table()
    .MarkdownBorder()
    .Title("compression in bytes")
    .ShowHeaders()
    .AddColumns("kind", "level", "before", "after", "difference", "% reduction");

foreach (var result in compressions)
{
    table.AddRow(
        result.Kind,
        result.Level.ToString(),
        result.Original.Size.ToString("N0"),
        result.Result.Size.ToString("N0"),
        result.Difference.ToString("N0"),
        result.Percent.ToString("P")
    );
}

AnsiConsole.Render(table);
kind | level | before | after | difference | % reduction |
---|---|---|---|---|---|
Gzip | Fastest | 186,500 | 30,310 | 156,190 | 83.75 % |
Brotli | Fastest | 186,500 | 49,424 | 137,076 | 73.50 % |
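In this example, I load Shakespeare's play The Comedy of Errors from a text file and compress it. What's interesting is that Gzip appears to compress better than Brotli in this case; as the update at the end of this post explains, these first measurements were affected by a bug in how the compression streams were flushed.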
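The "% reduction" column comes from the Percent property on CompressionResult, which divides the difference by the original size. For the Gzip row:

$$
\text{reduction} = \frac{\text{before} - \text{after}}{\text{before}} = \frac{186{,}500 - 30{,}310}{186{,}500} = \frac{156{,}190}{186{,}500} \approx 0.8375 = 83.75\%
$$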
Using BrotliEncoder Instead
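The System.IO.Compression namespace also has a BrotliEncoder struct that we can use to compress the bytes of a string. Its static TryCompress method works with ReadOnlySpan<byte> and Span<byte>, and existing arrays convert to these span types either explicitly or implicitly. On .NET Core 2.1 and later, including .NET 5, the span types are built in; the System.Memory NuGet package is only needed to get them on older targets such as .NET Framework.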
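// compression
var source = Encoding.Unicode.GetBytes(comedyOfErrors);
// size the buffer for the worst case so TryCompress cannot run out of room
var memory = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];
var encoded = BrotliEncoder.TryCompress(
    source,
    memory,
    out var encodedBytes
);
if (!encoded) throw new InvalidOperationException("Brotli compression failed.");
Console.WriteLine($"compress bytes: {encodedBytes}");
// decompression: decode only the bytes that were written, into a buffer sized to the original data
var target = new byte[source.Length];
var decoded = BrotliDecoder.TryDecompress(memory.AsSpan(0, encodedBytes), target, out var decodedBytes);
if (!decoded) throw new InvalidOperationException("Brotli decompression failed.");
Console.WriteLine($"decompress bytes: {decodedBytes}");
var value = Encoding.Unicode.GetString(target, 0, decodedBytes);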
Interestingly enough, BrotliEncoder produces a more efficient result of 33,090 bytes, compared with 49,424 bytes when using BrotliStream at CompressionLevel.Fastest.
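A likely explanation is the quality setting. CompressionLevel.Fastest appears to map to a low Brotli quality, while the TryCompress overload without quality arguments uses the encoder's default quality. That fits the updated results later in this post, where BrotliStream at Optimal (32,833 bytes) and the encoder sample (32,832 bytes) land within a byte of each other. BrotliEncoder also has a TryCompress overload that accepts explicit quality (0 to 11) and window (10 to 24) values, which makes the effect easy to check. The sample text and the window value of 22 below are assumptions for illustration, and sizes will vary, so none are shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Text;

// hypothetical sample text; substitute your own data
var text = string.Concat(Enumerable.Repeat("To be, or not to be, that is the question. ", 1_000));
var source = Encoding.Unicode.GetBytes(text);
var sizes = new Dictionary<int, int>();

foreach (var quality in new[] { 1, 11 })
{
    // GetMaxCompressedLength returns a worst-case size, so the buffer is always large enough
    var destination = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];

    // overload with explicit quality and window; 22 is an assumed window size
    if (!BrotliEncoder.TryCompress(source, destination, out var written, quality, 22))
        throw new InvalidOperationException($"compression failed at quality {quality}");

    // decompress only the bytes that were written and check that the original length comes back
    var decoded = new byte[source.Length];
    if (!BrotliDecoder.TryDecompress(destination.AsSpan(0, written), decoded, out var read) || read != source.Length)
        throw new InvalidOperationException($"decompression failed at quality {quality}");

    sizes[quality] = written;
    Console.WriteLine($"quality {quality,2}: {written:N0} bytes");
}
```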
To try out this code, you can clone my GitHub repository.
Update & Bug Fix - Thanks Anthony Francisco!
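A community member, Anthony Francisco, noticed that I wasn't getting optimal results out of my Brotli compression. In his words: "When CompressionLevel.Optimal is being used, and in general, the destination compression stream should be flushed before trying to extract the bytes from the underlying stream." Disposing the compression stream goes a step further than flushing: it also writes the final block (and, for GZip, the trailing checksum), so disposing it before reading the bytes is the safest option.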
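To see what flushing and disposing each contribute, this sketch records the length of the underlying MemoryStream after writing, after flushing, and after disposing a GZipStream. The payload is hypothetical and exact byte counts vary by runtime, so the sketch only checks that the length grows once the stream is disposed and that the finished bytes round-trip.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// hypothetical payload
var data = Encoding.Unicode.GetBytes(string.Concat(Enumerable.Repeat("Antipholus of Syracuse. ", 500)));

using var output = new MemoryStream();
var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true);

gzip.Write(data, 0, data.Length);
var afterWrite = output.Length;   // data may still be buffered inside the compressor

gzip.Flush();
var afterFlush = output.Length;   // buffered data pushed out, but the stream is not finished

gzip.Dispose();
var afterDispose = output.Length; // final block and GZip footer (CRC-32 and length) written

Console.WriteLine($"after write:   {afterWrite:N0} bytes");
Console.WriteLine($"after flush:   {afterFlush:N0} bytes");
Console.WriteLine($"after dispose: {afterDispose:N0} bytes");

// the finished bytes decompress back to the original data
using var decompressor = new GZipStream(new MemoryStream(output.ToArray()), CompressionMode.Decompress);
using var roundTrip = new MemoryStream();
decompressor.CopyTo(roundTrip);
Console.WriteLine(roundTrip.ToArray().SequenceEqual(data)); // True
```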
private static async Task<CompressionResult> ToCompressedStringAsync(
    string value,
    CompressionLevel level,
    string algorithm,
    Func<Stream, Stream> createCompressionStream)
{
    var bytes = Encoding.Unicode.GetBytes(value);
    await using var input = new MemoryStream(bytes);
    await using var output = new MemoryStream();
    await using (var stream = createCompressionStream(output))
    {
        await input.CopyToAsync(stream);
        // flush the compression stream (not the output stream) to push out buffered data;
        // disposing it at the end of this block also writes the final block
        await stream.FlushAsync();
    }
    // MemoryStream.ToArray still works after the stream has been closed
    var result = output.ToArray();
    return new CompressionResult(
        new(value, bytes.Length),
        new(Convert.ToBase64String(result), result.Length),
        level,
        algorithm);
}
The complete updated code looks like this:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class Compression
{
    private static async Task<CompressionResult> ToCompressedStringAsync(
        string value,
        CompressionLevel level,
        string algorithm,
        Func<Stream, Stream> createCompressionStream)
    {
        var bytes = Encoding.Unicode.GetBytes(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        // dispose the compression stream before reading the output so its final block is written
        await using (var stream = createCompressionStream(output))
        {
            await input.CopyToAsync(stream);
            await stream.FlushAsync();
        }
        // MemoryStream.ToArray still works after the stream has been closed
        var result = output.ToArray();
        return new CompressionResult(
            new(value, bytes.Length),
            new(Convert.ToBase64String(result), result.Length),
            level,
            algorithm);
    }

    public static async Task<CompressionResult> ToGzipAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
        => await ToCompressedStringAsync(value, level, "GZip", s => new GZipStream(s, level));

    public static async Task<CompressionResult> ToBrotliAsync(this string value, CompressionLevel level = CompressionLevel.Fastest)
        => await ToCompressedStringAsync(value, level, "Brotli", s => new BrotliStream(s, level));

    private static async Task<string> FromCompressedStringAsync(string value, Func<Stream, Stream> createDecompressionStream)
    {
        var bytes = Convert.FromBase64String(value);
        await using var input = new MemoryStream(bytes);
        await using var output = new MemoryStream();
        await using var stream = createDecompressionStream(input);
        await stream.CopyToAsync(output);
        return Encoding.Unicode.GetString(output.ToArray());
    }

    public static async Task<string> FromGzipAsync(this string value)
        => await FromCompressedStringAsync(value, s => new GZipStream(s, CompressionMode.Decompress));

    public static async Task<string> FromBrotliAsync(this string value)
        => await FromCompressedStringAsync(value, s => new BrotliStream(s, CompressionMode.Decompress));
}

public record CompressionResult(
    CompressionValue Original,
    CompressionValue Result,
    CompressionLevel Level,
    string Kind)
{
    // bytes saved by compression
    public int Difference =>
        Original.Size - Result.Size;

    // saved bytes as a fraction of the original size (formats as a percentage with "P")
    public decimal Percent =>
        Math.Abs(Difference / (decimal)Original.Size);
}

public record CompressionValue(
    string Value,
    int Size);
After these changes, the results are more accurate:
kind | level | before | after | difference | % reduction |
---|---|---|---|---|---|
GZip | Fastest | 180,098 | 52,272 | 127,826 | 70.976% |
GZip | Optimal | 180,098 | 41,175 | 138,923 | 77.137% |
Brotli | Fastest | 180,098 | 48,408 | 131,690 | 73.121% |
Brotli | Optimal | 180,098 | 32,833 | 147,265 | 81.769% |
compress bytes: 32832
decompress bytes: 180098
It's still very impressive to see a size reduction above 70% in every case. Thanks again to Anthony for pointing out the bug in the original code. The code in the repository has been updated.
Conclusion
Data compression is an integral part of modern software development; in most cases, it is a low-level feature of a web server or framework. We only need to enable it to get the benefits of smaller payloads and reduced bandwidth usage. That said, it is nice to know we can use the System.IO.Compression namespace to manually compress any data we choose. And as always, be sure to flush (or, better yet, dispose) your compression streams before reading their output.
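For the web server case, enabling compression in ASP.NET Core typically means registering the response compression middleware. The following is a minimal sketch that assumes an ASP.NET Core 6 or later web project with implicit usings enabled (it uses the minimal hosting APIs, which are newer than the .NET 5 target discussed above); the endpoint and the chosen levels are illustrative.

```csharp
using System.IO.Compression;
using Microsoft.AspNetCore.ResponseCompression;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddResponseCompression(options =>
{
    // compression over HTTPS is off by default; review the BREACH/CRIME guidance before enabling it
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});

// the providers use the same CompressionLevel enum as System.IO.Compression
builder.Services.Configure<BrotliCompressionProviderOptions>(o => o.Level = CompressionLevel.Fastest);
builder.Services.Configure<GzipCompressionProviderOptions>(o => o.Level = CompressionLevel.Fastest);

var app = builder.Build();
app.UseResponseCompression();

app.MapGet("/", () => "hello world");
app.Run();
```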
I hope you found this post helpful, and thank you for reading.