In a previous post, I talked about handling file uploads with ASP.NET. This time around, we will see how we can protect the integrity of our file uploads by verifying the files are what the user says they are. As developers, we are always walking the tightrope between giving our users the functionality they need and protecting them from other nefarious users. While we hope that our users are good members of the ecosystems we build, we shouldn’t assume they are.

In this post, we’ll see how we can peek into a file’s bytes to verify its exact format instead of trusting the file extension at the time of upload. The technique helps us deflect attacks in which a malicious file is simply renamed with a harmless-looking extension.

File Signatures & Header Bytes

Many file formats begin with a distinctive sequence of bytes, often called a file signature or magic number, that identifies the file type. The length of this sequence varies from format to format, but every file of a given format shares it. For example, an MPEG video file typically starts with one of two four-byte sequences: 00 00 01 BA or 00 00 01 B3. Knowing this characteristic of MPEG files, we can check whether a file is an MPEG video without ever looking at its name or extension.
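To make the idea concrete, here is a minimal C# sketch that checks a buffer of leading bytes against the two MPEG signatures above. The MpegSniffer class and LooksLikeMpeg method are hypothetical names for illustration, and the sketch assumes we already have the first few bytes of the file in memory.

```csharp
using System.Linq;

public static class MpegSniffer
{
    // The two MPEG signatures mentioned above.
    private static readonly byte[][] MpegSignatures =
    {
        new byte[] { 0x00, 0x00, 0x01, 0xBA },
        new byte[] { 0x00, 0x00, 0x01, 0xB3 }
    };

    // Returns true when the buffer starts with either MPEG signature.
    // The file name and extension are never consulted.
    public static bool LooksLikeMpeg(byte[] header)
    {
        return MpegSignatures.Any(signature =>
            header.Length >= signature.Length &&
            header.Take(signature.Length).SequenceEqual(signature));
    }
}
```

A file named `holiday.mpg` that actually starts with the PNG bytes 89 50 4E 47 fails this check, no matter what its extension claims.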

File types can be identified by a leading byte signature.

We can look up the byte signatures we need in this signature database, or we can do the work ourselves by sniffing the headers of files whose type we already know and deducing the bytes they share.
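Deducing a signature ourselves boils down to finding the longest run of leading bytes shared by several files we already trust. A rough sketch, assuming a hypothetical SignatureDeducer helper and a small sample of known-good files of one format:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SignatureDeducer
{
    // Returns the longest run of leading bytes shared by every sample.
    public static byte[] CommonPrefix(IReadOnlyList<byte[]> samples)
    {
        if (samples.Count == 0)
            return Array.Empty<byte>();

        var shortest = samples.Min(s => s.Length);
        var length = 0;

        // Walk forward while every sample agrees on the byte at this offset.
        while (length < shortest &&
               samples.All(s => s[length] == samples[0][length]))
        {
            length++;
        }

        return samples[0].Take(length).ToArray();
    }

    // Reads up to `count` leading bytes from each known-good file on disk.
    public static byte[] CommonPrefixOfFiles(IEnumerable<string> paths, int count = 16)
    {
        var headers = paths.Select(path =>
        {
            using var stream = File.OpenRead(path);
            var buffer = new byte[count];
            var total = 0;
            int read;
            // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
            while (total < count && (read = stream.Read(buffer, total, count - total)) > 0)
                total += read;
            return buffer.Take(total).ToArray();
        }).ToList();

        return CommonPrefix(headers);
    }
}
```

The result is only as good as the samples. A mix of JFIF and Exif JPEGs, for example, only shares FF D8 FF, because the fourth byte differs between the two variants.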

Signature Verification In C#

TL;DR: You can download the sample project from GitHub and run it to see file signature verification in action.

We want a structure that lets us quickly drop new file types into our solutions. To get there, we can create a FileType base class that holds the verification logic.

using System.Collections.Generic;
using System.IO;
using System.Linq;

public abstract class FileType
{
    protected string Description { get; set; }
    protected string Name { get; set; }

    private List<string> Extensions { get; }
        = new List<string>();

    private List<byte[]> Signatures { get; }
        = new List<byte[]>();
    
    public int SignatureLength => Signatures.Max(m => m.Length);

    protected FileType AddSignatures(params byte[][] bytes)
    {
        Signatures.AddRange(bytes);
        return this;
    }

    protected FileType AddExtensions(params string[] extensions)
    {
        Extensions.AddRange(extensions);
        return this;
    }

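    public FileTypeVerifyResult Verify(Stream stream)
    {
        // Rewind so every check starts at the first byte (requires a seekable stream),
        // then read only as many bytes as the longest signature needs.
        // The reader is not disposed, because disposing a BinaryReader also closes the stream.
        stream.Position = 0;
        var reader = new BinaryReader(stream);
        var headerBytes = reader.ReadBytes(SignatureLength);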

        return new FileTypeVerifyResult
        {
            Name = Name,
            Description = Description,
            IsVerified = Signatures.Any(signature =>
                headerBytes.Take(signature.Length)
                    .SequenceEqual(signature)
            )
        };
    }
}

public class FileTypeVerifyResult
{
    public string Name { get; set; }
    public string Description { get; set; }
    public bool IsVerified { get; set; }
}

The heart of this class lies in the Verify method. It reads as many bytes as the longest signature requires and then checks whether the start of the incoming stream matches any of the known signatures for this particular file type. From this base class, we can derive new file types.
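To see why reading just SignatureLength bytes covers every signature, we can pull the comparison out on its own. This sketch uses hypothetical SignatureMatch helpers that mirror the logic in Verify, and it runs against a MemoryStream to show that any seekable stream works, not only files on disk.

```csharp
using System.IO;
using System.Linq;

public static class SignatureMatch
{
    // Same test Verify performs: compare only the first signature.Length bytes.
    // Extra header bytes are ignored, so reading the longest signature's length
    // is enough for every shorter signature too.
    public static bool Matches(byte[] header, byte[] signature) =>
        header.Take(signature.Length).SequenceEqual(signature);

    // Mirrors Verify: rewind, read up to `length` bytes, test every signature.
    public static bool StreamMatchesAny(Stream stream, int length, params byte[][] signatures)
    {
        stream.Position = 0;
        // ReadBytes returns fewer bytes than requested when the stream ends early.
        var header = new BinaryReader(stream).ReadBytes(length);
        return signatures.Any(signature => Matches(header, signature));
    }
}
```

Because Take never throws on a short sequence, a file smaller than a signature simply fails to match instead of causing an exception.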

public sealed class Jpeg : FileType
{
    public Jpeg()
    {
        Name = "JPEG";
        Description = "JPEG IMAGE";
        AddExtensions("jpeg", "jpg");
        AddSignatures(
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 }, // JFIF
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE1 }, // Exif, common in camera photos
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE2 },
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE3 }
        );
    }
}

public sealed class Mp3 : FileType
{
    public Mp3()
    {
        Name = "MP3";
        Description = "MP3 Audio File";
        AddExtensions("mp3");
        AddSignatures(
            // 49 44 33 is the ASCII text "ID3", so this matches MP3 files
            // that begin with an ID3v2 tag.
            new byte[] { 0x49, 0x44, 0x33 }
        );
    }
}

public sealed class Png : FileType
{
    public Png()
    {
        Name = "PNG";
        Description = "PNG Image";
        AddExtensions("png");
        AddSignatures(
            new byte[] {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}
        );
    }
}
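Adding a format follows the same pattern. As an example, GIF files start with the ASCII text GIF87a or GIF89a. The sketch below registers both as a hypothetical Gif type; so that it compiles on its own, it bundles a trimmed copy of the FileType base class whose Verify returns a plain bool rather than a FileTypeVerifyResult.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Trimmed copy of the FileType base class (no extensions, bool result),
// included only so this block compiles on its own.
public abstract class FileType
{
    protected string Name { get; set; }
    private List<byte[]> Signatures { get; } = new List<byte[]>();
    public int SignatureLength => Signatures.Max(m => m.Length);

    protected FileType AddSignatures(params byte[][] bytes)
    {
        Signatures.AddRange(bytes);
        return this;
    }

    public bool Verify(Stream stream)
    {
        stream.Position = 0;
        var header = new BinaryReader(stream).ReadBytes(SignatureLength);
        return Signatures.Any(s => header.Take(s.Length).SequenceEqual(s));
    }
}

// GIF files begin with the ASCII text "GIF87a" or "GIF89a".
public sealed class Gif : FileType
{
    public Gif()
    {
        Name = "GIF";
        AddSignatures(
            new byte[] { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }, // GIF87a
            new byte[] { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }  // GIF89a
        );
    }
}
```

With the original base class, we would also set Description, call AddExtensions("gif"), and add a new Gif() entry to the verifier’s list of types.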

For our solution, it would be nice to have a single entry point for verifying file types. Since we don’t know what the file is ahead of time, we may need to try multiple FileType instances before we find a match. For that, we can create a static FileTypeVerifier class.

using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileTypeVerifier
{
    private static readonly FileTypeVerifyResult Unknown = new FileTypeVerifyResult
    {
        Name = "Unknown",
        Description = "Unknown File Type",
        IsVerified = false
    };
    
    static FileTypeVerifier()
    {
        Types = new List<FileType>
            {
                new Jpeg(),
                new Png(),
                new Mp3()
            }
            .OrderByDescending(x => x.SignatureLength)
            .ToList();
    }

    private static IEnumerable<FileType> Types { get; set; }

    public static FileTypeVerifyResult What(string path)
    {
        using var file = File.OpenRead(path);
        FileTypeVerifyResult result = null;

        foreach (var fileType in Types)
        {
            result = fileType.Verify(file);
            if (result.IsVerified)
                break;
        }

        return result?.IsVerified == true
               ? result
               : Unknown;
    }
}

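We register the file types we’ve already created in the static constructor of our FileTypeVerifier class and sort them by signature length, longest first. Signatures vary in length and may overlap, so checking the longest, most specific signatures first keeps a shorter signature from claiming a file that a longer one would identify more precisely.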
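A small, self-contained sketch shows why the ordering matters. The Generic entry is a hypothetical, deliberately short signature that overlaps the JPEG one; OrderingDemo and its methods are illustrative names, not part of the sample project.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    // "Generic" is a hypothetical two-byte signature that is a prefix of the JPEG one.
    public static readonly (string Name, byte[] Signature)[] Candidates =
    {
        ("Generic", new byte[] { 0xFF, 0xD8 }),
        ("JPEG",    new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 })
    };

    // First match wins, just like the loop in FileTypeVerifier.What.
    public static string FirstMatch(byte[] header, IEnumerable<(string Name, byte[] Signature)> candidates)
    {
        foreach (var (name, signature) in candidates)
        {
            if (header.Take(signature.Length).SequenceEqual(signature))
                return name;
        }
        return "Unknown";
    }

    // Registration order: the short signature is checked first.
    public static string Unordered(byte[] header) => FirstMatch(header, Candidates);

    // Longest signature first, mirroring OrderByDescending(x => x.SignatureLength).
    public static string Ordered(byte[] header) =>
        FirstMatch(header, Candidates.OrderByDescending(c => c.Signature.Length));
}
```

With the list in registration order, a JFIF header (FF D8 FF E0 ...) is claimed by Generic; sorting by signature length hands it to JPEG instead, while a header that only matches FF D8 still falls through to Generic.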

All we need to do now is get some assets and run our sample application.

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        var assets = new[]
        {
            "grapes.jpg",
            "music.mp3",
            "pin.png",
            "jetbrains.svg"
        };

        Console.WriteLine("\nFile Verification Results\n");
        // Identify the file by bytes
        foreach (var asset in assets)
        {
            var path = Path.Combine("./assets", asset);
            var result = FileTypeVerifier.What(path);
            Console.WriteLine($"{asset} is a {result.Name} ({result.Description}).");
        }
    }
}

Running this sample code yields the following result.

File Verification Results

grapes.jpg is a JPEG (JPEG IMAGE).
music.mp3 is a MP3 (MP3 Audio File).
pin.png is a PNG (PNG Image).
jetbrains.svg is a Unknown (Unknown File Type).

Process finished with exit code 0.

Awesome! It worked!

We can use this approach to verify file uploads from our users. There are a few caveats to keep in mind, though. We are dealing with streams and byte arrays, and anything we buffer in memory adds to our servers’ memory usage, which can climb quickly when many large uploads arrive at once. One way to reduce memory pressure on our primary servers is to defer validation to a distributed network of workers. We can accept files immediately but not make them accessible until they have passed the verification process.
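One way to picture “accept now, verify later” is a queue between the upload endpoint and a worker. The sketch below uses an in-process System.Threading.Channels queue as a stand-in for a distributed network of workers, and it only checks for the PNG signature to stay short. DeferredVerifier, EnqueueAsync, RunWorkerAsync, and Status are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical pipeline: uploads are accepted right away, queued, and only
// marked available after a background worker has checked their signature.
public sealed class DeferredVerifier
{
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    private readonly Channel<string> _queue = Channel.CreateUnbounded<string>();

    // Path -> true (verified) or false (rejected). A missing key means still pending.
    public ConcurrentDictionary<string, bool> Status { get; } = new ConcurrentDictionary<string, bool>();

    // Called by the upload endpoint after the file has been stored.
    public ValueTask EnqueueAsync(string path) => _queue.Writer.WriteAsync(path);

    // Signals that no more uploads will be queued.
    public void Complete() => _queue.Writer.Complete();

    // Worker loop: drains the queue and records a verdict for each file.
    public async Task RunWorkerAsync()
    {
        await foreach (var path in _queue.Reader.ReadAllAsync())
        {
            Status[path] = await IsPngAsync(path);
        }
    }

    private static async Task<bool> IsPngAsync(string path)
    {
        await using var stream = File.OpenRead(path);
        var header = new byte[PngSignature.Length];
        var total = 0;
        int read;
        // Read until the header buffer is full or the file ends.
        while (total < header.Length &&
               (read = await stream.ReadAsync(header, total, header.Length - total)) > 0)
            total += read;
        return total == header.Length && header.SequenceEqual(PngSignature);
    }
}
```

In production, the queue would more likely be a message broker and the worker a separate service, but the flow is the same: nothing is served until the worker has recorded a verdict.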

To add more file signatures, we only need to find the file formats our apps support and plug them in. We may also want to consider other sources of streams. FileTypeVerifier.What assumes every file is on disk, but in a web environment the upload arrives through the IFormFile abstraction, which ASP.NET Core buffers in memory or, for larger files, in a temporary file on disk. As always, we should experiment and adapt this solution to our needs.
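As a starting point for other stream sources, here is a hedged sketch of a helper that reads the header once from any Stream, including non-seekable ones, and tests every signature against that single buffer. StreamSniffer, ReadHeader, and MatchesAny are hypothetical names; in an ASP.NET Core action, the stream could come from IFormFile.OpenReadStream().

```csharp
using System.IO;
using System.Linq;

public static class StreamSniffer
{
    // Reads at most `count` leading bytes. It never touches Position, so it also
    // works on non-seekable streams; it stops at `count` bytes or end of stream.
    public static byte[] ReadHeader(Stream stream, int count)
    {
        var buffer = new byte[count];
        var total = 0;
        int read;
        while (total < count &&
               (read = stream.Read(buffer, total, count - total)) > 0)
        {
            total += read;
        }
        return total == count ? buffer : buffer.Take(total).ToArray();
    }

    // Reads the header once and tests every signature against it, instead of
    // rewinding the stream for each file type. Expects at least one signature.
    public static bool MatchesAny(Stream stream, params byte[][] signatures)
    {
        var header = ReadHeader(stream, signatures.Max(s => s.Length));
        return signatures.Any(s => header.Take(s.Length).SequenceEqual(s));
    }
}
```

The check reads only as many bytes as the longest signature, so it adds very little on top of whatever the framework has already buffered. On a non-seekable stream those header bytes are consumed, though, so a caller that also needs to save the upload would have to keep them.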

Cheers!