There are the folks who write YAML, and then there are the folks required to parse it. Let me assure you that the former is more natural than the latter. YAML, which stands for YAML Ain’t Markup Language, is a tricky syntax which promotes itself as a human-friendly data serialization standard for all programming languages.
What makes YAML tricky? The first audience for YAML is humans, not computers. The syntax aims to support constructs that can be hell to program for, with my favorite example being values that can either be a single value or an array across documents. The ambiguity of the syntax has spurred on multiple “YAML sucks” factions all over the world.
While the syntax can cause headaches, I still like YAML. YAML serves its purpose well in simple scenarios. This blog contains all post metadata in a terse front matter block.
In this post, you’ll see how we can use YAML.NET and Markdig to parse a blog post’s front matter.
Getting Started
We’ll first need to install two packages: YAML.NET (published on NuGet as YamlDotNet) and Markdig.
$> dotnet add package YamlDotNet
$> dotnet add package Markdig
We will use Markdig to parse our blog posts, although we could also split our files on the standard convention of --- separators and forgo Markdig altogether. For this post, we’ll use it to handle slight discrepancies that might creep in between files.
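For comparison, here is a minimal sketch of the Markdig-free route, assuming every post opens with a --- line on its very first line and closes the block with another --- line. ExtractFrontMatter is a hypothetical helper written for this illustration; it is not part of Markdig or YAML.NET.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the text between the opening and closing
// --- lines, or an empty string when the document has no complete
// front matter block at the top.
static string ExtractFrontMatter(string markdown)
{
    using var reader = new StringReader(markdown);

    // Front matter has to open on the very first line.
    var first = reader.ReadLine();
    if (first == null || first.Trim() != "---")
        return string.Empty;

    var yaml = new StringBuilder();
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // The closing separator ends the block.
        if (line.Trim() == "---")
            return yaml.ToString();

        yaml.Append(line).Append('\n');
    }

    // No closing separator: treat the file as having no front matter.
    return string.Empty;
}

var sample = "---\nlayout: post\ntags: dotnet\n---\n# Hello\n";
var frontMatter = ExtractFrontMatter(sample);
Console.Write(frontMatter);
```

This works for well-formed files, but it is strict about where the separators sit, which is the kind of discrepancy a real parser absorbs for us.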
The Front Matter
Jekyll, a blog engine written in Ruby, powers this blog. It has a flexible front matter format that can adapt to the needs of the author and the theme applied to the blog. This blog post’s front matter looks something like this.
---
layout: post
title: "Parse Markdown Front Matter With C#"
tags: dotnet
image: /assets/images/posts/misc/parse-frontmatter.jpg
image_credit_name: Ben Sweet
image_credit_url: https://unsplash.com/@benjaminsweet
image_alt: Man standing in dark room blue light
---
We represent all the values as string values, including the tags (a particular quirk of a paging plugin).
The C# Code
The first step we need to take is to create a class that represents our blog’s front matter. Note that other blogs will likely use different keys, so your class may not match this example.
using System;
using System.Collections.Generic;
using System.Linq;
using YamlDotNet.Serialization;

public class BlogFrontMatter
{
    [YamlMember(Alias = "tags")]
    public string Tags { get; set; }

    [YamlMember(Alias = "title")]
    public string Title { get; set; }

    [YamlMember(Alias = "image")]
    public string Image { get; set; }

    [YamlMember(Alias = "image_credit_name")]
    public string ImageCreditName { get; set; }

    [YamlMember(Alias = "image_credit_url")]
    public string ImageCreditUrl { get; set; }

    [YamlMember(Alias = "image_alt")]
    public string ImageAlt { get; set; }

    [YamlMember(Alias = "redirect_from")]
    public string[] RedirectFrom { get; set; }

    // Tags arrive as one comma-separated string; split them on demand.
    // Returns null when the post has no tags.
    [YamlIgnore]
    public IList<string> GetTags => Tags?
        .Split(",", StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.Trim())
        .ToArray();
}
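The split-and-trim logic in GetTags can be tried in isolation. The sketch below uses a made-up tags value and shows one edge case: a whitespace-only entry survives RemoveEmptyEntries and only becomes empty after the trim. On .NET 5 or later (an assumption about your target framework), StringSplitOptions.TrimEntries trims first and then drops the empty entry.

```csharp
using System;
using System.Linq;

// Made-up value standing in for what the deserializer would put in Tags.
var tags = "dotnet, , yaml";

// Same pipeline as GetTags: the " " entry is not empty when the split
// happens, so it survives and is trimmed down to "" afterwards.
var viaSelect = tags
    .Split(",", StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.Trim())
    .ToArray();

// .NET 5+ only: entries are trimmed before empty ones are removed.
var viaTrimEntries = tags.Split(
    ',',
    StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

Console.WriteLine(viaSelect.Length);      // 3
Console.WriteLine(viaTrimEntries.Length); // 2
```

For a blog with hand-written tags this difference rarely matters, but it is worth knowing before the tags feed a lookup or a URL.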
Once we have our front matter class, we can create an extension method that parses a Markdown string into Markdig’s MarkdownDocument class. The document is where we pull our YAML front matter from, in the form of a YamlFrontMatterBlock.
using System.Linq;
using Markdig;
using Markdig.Extensions.Yaml;
using Markdig.Syntax;
using YamlDotNet.Serialization;

public static class MarkdownExtensions
{
    private static readonly IDeserializer YamlDeserializer =
        new DeserializerBuilder()
            .IgnoreUnmatchedProperties()
            .Build();

    private static readonly MarkdownPipeline Pipeline
        = new MarkdownPipelineBuilder()
            .UseYamlFrontMatter()
            .Build();

    public static T GetFrontMatter<T>(this string markdown)
    {
        var document = Markdown.Parse(markdown, Pipeline);
        var block = document
            .Descendants<YamlFrontMatterBlock>()
            .FirstOrDefault();

        // no front matter in this file
        if (block == null)
            return default;

        var lines =
            block
                // this is not a mistake
                // we have to call .Lines 2x
                .Lines // StringLineGroup
                .Lines // StringLine[]
                .OrderBy(x => x.Line)
                .Select(x => x.ToString())
                // skip blank lines and the --- separator lines only,
                // so a value that happens to contain --- stays intact
                .Where(x => !string.IsNullOrWhiteSpace(x) && x.Trim() != "---");

        var yaml = string.Join("\n", lines);

        return YamlDeserializer.Deserialize<T>(yaml);
    }
}
We need to remove the --- separators from our YAML block, or else YAML.NET will throw an exception. Another critical element of our code is the inclusion of Markdig’s YAML front matter extension.
private static readonly MarkdownPipeline Pipeline
    = new MarkdownPipelineBuilder()
        // YAML Front Matter extension registered
        .UseYamlFrontMatter()
        .Build();
Without the extension, we won’t see the YAML block in our parsed document.
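A quick way to convince ourselves is to parse the same document with and without the extension. This is a small sketch that assumes the Markdig package is installed; the sample markdown is made up for the illustration.

```csharp
using System;
using System.Linq;
using Markdig;
using Markdig.Extensions.Yaml;
using Markdig.Syntax;

// Made-up post used only for this comparison.
var markdown = "---\ntitle: Hello\n---\n# Heading\n";

// One pipeline without the extension, one with it.
var plain = new MarkdownPipelineBuilder().Build();
var withYaml = new MarkdownPipelineBuilder()
    .UseYamlFrontMatter()
    .Build();

// Look for a front matter block in each parsed document.
var foundPlain = Markdown.Parse(markdown, plain)
    .Descendants<YamlFrontMatterBlock>()
    .Any();
var foundWithYaml = Markdown.Parse(markdown, withYaml)
    .Descendants<YamlFrontMatterBlock>()
    .Any();

Console.WriteLine($"without extension: {foundPlain}, with extension: {foundWithYaml}");
```

Without the extension, the --- lines are treated as ordinary Markdown, so there is no front matter block to find.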
Finally, we can call our extension method after reading our files from disk.
var directory = Directory
    .GetFiles("/Users/khalidabuhakmeh/Projects/blog/_posts/");

var posts = directory
    .Select(File.ReadAllText)
    .Select(md => md.GetFrontMatter<BlogFrontMatter>())
    .ToList();

Console.WriteLine(posts.Count);
Running our code, we see the results of our front matter parser.
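One thing to watch for: Directory.GetFiles returns every file in the folder, not just posts. The sketch below shows one way to limit the read to Markdown files with a search pattern. It builds a throwaway folder under the temp directory (a hypothetical stand-in for the real _posts path) so it can run anywhere, and the file names are invented for the illustration.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical scratch folder standing in for the real _posts directory.
var postsPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(postsPath);
File.WriteAllText(
    Path.Combine(postsPath, "2020-01-01-hello.md"),
    "---\ntitle: Hello\n---\nBody");
File.WriteAllText(Path.Combine(postsPath, "notes.txt"), "not a post");

// The search pattern keeps non-Markdown files out of the pipeline.
var markdownFiles = Directory
    .EnumerateFiles(postsPath, "*.md")
    .Select(File.ReadAllText)
    .ToList();

Console.WriteLine(markdownFiles.Count); // 1

// Clean up the scratch folder.
Directory.Delete(postsPath, recursive: true);
```

In the real pipeline, each string would then go through GetFrontMatter, and posts without a front matter block come back as null, so a null check before using the results is a sensible extra step.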
I hope this helped! Enjoy!