How to Write a .NET Markdig Extension for Markdown Processing

Markdown is a powerful writing format with simplicity at its core. It's no surprise that it is so popular, since it helps authors focus on the art of writing rather than the aesthetics of their work. While there is a standard specification for the language, extensions to the Markdown syntax can enhance the authoring experience in niche contexts. Examples include LaTeX support for mathematics, improved media support for online services like YouTube, and diagram support via Mermaid for all aspiring software architects.

With Markdown, if you can write it, you can parse and translate it to a desired output. In this post, we'll explore the Markdig library by author Alexandre Mutel, which describes itself as a fast, powerful, CommonMark-compliant Markdown processor for .NET. Most importantly, it's also extensible!

What is Markdown and Markdig?

For folks new to Markdown, it is a text-based format that lets writers focus on the structure and content of their work rather than its aesthetics.

Aesthetics include font choices, colors, font sizing, and overall layout concerns. While aesthetics can help tell a more immersive story, they can hinder the writing process if introduced too early.

Markdown focuses on familiar structural elements of writing and lets you express them with simple symbols placed within plain text. These include headers, links, emphasis, lists, and more. Markdown gets its name because, typically, the format is converted into markup, most often HTML. That said, Markdown can have many targets, including PDFs, presentation slides, and much more. Your imagination is the limit.

As the introduction mentions, Markdig is a .NET library aimed at helping developers process and transform Markdown files. It works out of the box for most needs but also has extensibility options.

Writing a Markdig Inline Parser Extension

There are three parts to writing an extension for Markdig: the Markdown syntax, the processing pipeline, and the syntax parser. We'll walk through all three parts and why they are essential. Let's start by describing the intent of our extension.

Given the following syntax, we want to parse any matching token and replace it with a link to that GitHub user's profile.

this is a link to [github:khalidabuhakmeh]  
and [github:maartenba]

with a resulting output of HTML.

<p>this is a link to <a href="https://github.com/khalidabuhakmeh">khalidabuhakmeh</a>
and <a href="https://github.com/maartenba">maartenba</a></p>

To parse this Markdown with Markdig, you must first install the Markdig NuGet package.

dotnet add package Markdig

Next, we’ll need to set up a MarkdownPipeline using a MarkdownPipelineBuilder.

using System.Text.RegularExpressions;
using Markdig;
using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Renderers;
using Markdig.Syntax.Inlines;

var pipeline = new MarkdownPipelineBuilder()
    .Use<GitHubUserProfileExtension>()
    .Build();

var html = Markdown
    .ToHtml("""
            this is a link to [github:khalidabuhakmeh]
            and [github:maartenba]
            """, pipeline);

Console.WriteLine(html);

The pipeline is a series of syntax parsers that run over the Markdown document and turn each piece of syntax into its final output.

You’ll notice the mention of GitHubUserProfileExtension when building the pipeline. This is our new extension. Let’s take a look at the implementation.

public class GitHubUserProfileExtension : IMarkdownExtension
{
    public void Setup(MarkdownPipelineBuilder pipeline)
    {
        if (!pipeline.InlineParsers.Contains<GitHubUserProfileParser>())
        {
            pipeline.InlineParsers.Insert(0, new GitHubUserProfileParser());
        }
    }

    public void Setup(MarkdownPipeline pipeline, IMarkdownRenderer renderer)
    {
    }
}

The extension class is where we add the GitHubUserProfileParser to the collection of InlineParsers. Parsers take incoming Markdown syntax and turn it into its final result. The second Setup overload is where an extension can configure the renderer; this extension doesn't need to, so it stays empty.

In my case, I insert the new parser at the beginning of the collection. Parsers execute in the order they are registered. Since I'm reusing the opening bracket of Markdown's link syntax, I want to ensure I can process the token before any other parser does. If your parser operates on unique syntax, you can add the parser anywhere in the collection.
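To see what "the beginning of the collection" means in practice, here is a minimal sketch that prints the inline parsers a default MarkdownPipelineBuilder starts with, in registration order. It assumes the Markdig package is installed and relies on InlineParsers behaving like an ordinary list; the exact parsers and their order may differ between Markdig versions.

```csharp
using System;
using Markdig;
using Markdig.Parsers.Inlines;

var builder = new MarkdownPipelineBuilder();

// Print the built-in inline parsers in the order they are registered.
for (var i = 0; i < builder.InlineParsers.Count; i++)
{
    Console.WriteLine($"{i}: {builder.InlineParsers[i].GetType().Name}");
}

// LinkInlineParser is the built-in parser for Markdown links, which also open with '['.
// A custom parser inserted at index 0 sits ahead of it in the list.
var hasLinkParser = builder.InlineParsers.Contains<LinkInlineParser>();
Console.WriteLine($"Contains LinkInlineParser: {hasLinkParser}");
```

Running this against your installed version is a quick way to decide whether index 0 is really needed or whether a later position would do.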

Now, let’s get to the parser.

public partial class GitHubUserProfileParser : InlineParser
{
    public GitHubUserProfileParser()
    {
        OpeningCharacters = new[] { '[' };
    }
    
    public override bool Match(InlineProcessor processor, ref StringSlice slice)
    {
        var precedingCharacter = slice.PeekCharExtra(-1);
        if (!precedingCharacter.IsWhiteSpaceOrZero())
        {
            return false;
        }
        
        var regex = GithubTagRegex();
        var match = regex.Match(slice.ToString());
        
        if (!match.Success)
        {
            return false;
        }
        
        var username = match.Groups["username"].Value;
        var literal = $"<a href=\"https://github.com/{username}\">{username}</a>";
        
        processor.Inline = new HtmlInline(literal)
        {
            Span =
            {
                Start = processor.GetSourcePosition(slice.Start, out var line, out var column)
            },
            Line = line,
            Column = column,
            IsClosed = true
        };
        processor.Inline.Span.End = processor.Inline.Span.Start + match.Length - 1;
        slice.Start += match.Length;
        return true;
    }

    // Anchor with ^ so the match must begin at the slice's current position, not at a later token.
    [GeneratedRegex(@"^\[github:(?<username>\w+)]")]
    private static partial Regex GithubTagRegex();
}

There are a few crucial elements to a parser, but none more critical than the OpeningCharacters collection. These characters are what trigger entry into the Match method. Without setting this value, your parser will be responsible for parsing all the text. I made this mistake, and Alexandre Mutel was kind enough to point it out.

Next, in the Match method, we get a reference to a StringSlice, which allows us to check if we've matched our expected token. You can look forward and backward in the provided string or, as in my case, use a source-generated Regex to match the token.
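One thing worth checking when you run a regex over the slice text is that the pattern can only match at the current position. The sketch below uses plain System.Text.RegularExpressions and a made-up input string to show the difference between an unanchored and an anchored pattern when the parser is triggered on a '[' that belongs to an ordinary link.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up slice text: the parser was triggered on the first '[', which is a normal link.
const string remaining = "[docs](https://example.com) by [github:maartenba]";

var unanchored = new Regex(@"\[github:(?<username>\w+)]");
var anchored = new Regex(@"^\[github:(?<username>\w+)]");

// The unanchored pattern scans ahead and finds the later token,
// even though the text at the current position is not a GitHub tag.
var loose = unanchored.Match(remaining);
Console.WriteLine($"unanchored: success={loose.Success}, index={loose.Index}");

// The anchored pattern only succeeds when the tag starts at position 0.
var strict = anchored.Match(remaining);
Console.WriteLine($"anchored:   success={strict.Success}");

// When the slice really does start with the tag, the anchored pattern matches.
var atTag = anchored.Match("[github:maartenba] wrote this");
Console.WriteLine($"at tag:     username={atTag.Groups["username"].Value}");
```

Because the parser advances the slice by match.Length from the current position, a match found further along would consume the wrong characters, so anchoring the pattern is the safer choice.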

Once matched, you can create an inline representation of the value to be replaced. I want to replace the token with an anchor tag pointing to a user’s GitHub profile. You also need to calculate where the token begins and ends, and you can do that using the GetSourcePosition method.

I also make sure that I set IsClosed to true. This lets other parsers know that I’ve handled this token and that they should not attempt to modify the token. This depends on your use case, but for this one, this is the end of processing for this token.
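As an alternative to emitting raw HTML, a parser can build a link node and let Markdig's own renderer produce the markup. The following is a sketch of that variant, not the approach used in this post: GitHubLinkInlineParser is a hypothetical name, and the code assumes LinkInline and LiteralInline from Markdig.Syntax.Inlines behave as they do in recent Markdig versions.

```csharp
using System;
using System.Text.RegularExpressions;
using Markdig;
using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Syntax.Inlines;

var builder = new MarkdownPipelineBuilder();
builder.InlineParsers.Insert(0, new GitHubLinkInlineParser());
var pipeline = builder.Build();

Console.Write(Markdown.ToHtml("ping [github:maartenba] about this", pipeline));

// Hypothetical variant of the post's parser: it emits a LinkInline node
// instead of a raw HtmlInline, leaving the HTML to Markdig's link renderer.
public class GitHubLinkInlineParser : InlineParser
{
    // Anchored so the tag must start at the current slice position.
    private static readonly Regex Tag =
        new(@"^\[github:(?<username>\w+)]", RegexOptions.Compiled);

    public GitHubLinkInlineParser() => OpeningCharacters = new[] { '[' };

    public override bool Match(InlineProcessor processor, ref StringSlice slice)
    {
        // Same guard as the original parser: only trigger after whitespace or at the start.
        if (!slice.PeekCharExtra(-1).IsWhiteSpaceOrZero()) return false;

        var match = Tag.Match(slice.ToString());
        if (!match.Success) return false;

        var username = match.Groups["username"].Value;
        var start = processor.GetSourcePosition(slice.Start, out var line, out var column);

        var link = new LinkInline
        {
            Url = $"https://github.com/{username}",
            Line = line,
            Column = column,
            IsClosed = true // closed, so following inlines are not nested inside the link
        };
        link.Span.Start = start;
        link.Span.End = start + match.Length - 1;
        link.AppendChild(new LiteralInline(username)); // the visible link text

        processor.Inline = link;
        slice.Start += match.Length; // move past the token to avoid reprocessing it
        return true;
    }
}
```

The trade-off is a little more code in exchange for a real link node in the syntax tree, which other extensions and renderers can then treat like any other link.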

Finally, we set the slice.Start position after the token so we don’t keep processing it. If you’re experiencing an exception with a “The parser is in an invalid infinite loop” message, you’ve likely forgotten to move the starting position past the current token.

It's that easy! Here's the complete sample.

using System.Text.RegularExpressions;
using Markdig;
using Markdig.Helpers;
using Markdig.Parsers;
using Markdig.Renderers;
using Markdig.Syntax.Inlines;

var pipeline = new MarkdownPipelineBuilder()
    .Use<GitHubUserProfileExtension>()
    .Build();

var html = Markdown
    .ToHtml("""
            this is a link to [github:khalidabuhakmeh]
            and [github:maartenba]
            """, pipeline);

Console.WriteLine(html);

public class GitHubUserProfileExtension : IMarkdownExtension
{
    public void Setup(MarkdownPipelineBuilder pipeline)
    {
        if (!pipeline.InlineParsers.Contains<GitHubUserProfileParser>())
        {
            pipeline.InlineParsers.Insert(0, new GitHubUserProfileParser());
        }
    }

    public void Setup(MarkdownPipeline pipeline, IMarkdownRenderer renderer)
    {
    }
}

public partial class GitHubUserProfileParser : InlineParser
{
    public GitHubUserProfileParser()
    {
        OpeningCharacters = new[] { '[' };
    }
    
    public override bool Match(InlineProcessor processor, ref StringSlice slice)
    {
        var precedingCharacter = slice.PeekCharExtra(-1);
        if (!precedingCharacter.IsWhiteSpaceOrZero())
        {
            return false;
        }
        
        var regex = GithubTagRegex();
        var match = regex.Match(slice.ToString());
        
        if (!match.Success)
        {
            return false;
        }
        
        var username = match.Groups["username"].Value;
        var literal = $"<a href=\"https://github.com/{username}\">{username}</a>";
        
        processor.Inline = new HtmlInline(literal)
        {
            Span =
            {
                Start = processor.GetSourcePosition(slice.Start, out var line, out var column)
            },
            Line = line,
            Column = column,
            IsClosed = true
        };
        processor.Inline.Span.End = processor.Inline.Span.Start + match.Length - 1;
        slice.Start += match.Length;
        return true;
    }

    // Anchor with ^ so the match must begin at the slice's current position, not at a later token.
    [GeneratedRegex(@"^\[github:(?<username>\w+)]")]
    private static partial Regex GithubTagRegex();
}

Conclusion

I love Markdown, and Markdig helps .NET developers embrace the wonders of the specification. The extensibility of Markdig also allows .NET developers to go beyond the CommonMark specification and build unique Markdown flavors for their specific purposes. As you've seen, it only takes a few classes to extend the functionality of a Markdown document beyond the already impressive extensions included with Markdig.

I hope you enjoyed this post, and thanks to Alexandre Mutel and other Markdig contributors for their fantastic work. Thanks for reading and sharing my posts with friends and colleagues.

About Khalid Abuhakmeh
