Part of software development is researching new tools and finding ways to integrate them with what you know and love. Given today’s technology landscape, having multiple products power your solution in new and exciting ways is not uncommon. While your choices are plenty, most solutions typically have three distinct elements: a user interface, a backend, and a data storage mechanism. The data storage mechanism is the most important, as its features can typically dictate design choices later in the development process.

Today’s post will look at a particular storage approach called Time Series while utilizing technologies like PostgreSQL and its TimescaleDB extension, all integrated with Entity Framework Core.

What’s Time Series?

Most databases work with “current state” management, where each entry is an aggregate of user actions. Any particular record is a side effect of user interactions in an ever-evolving system. While “current state” management is the most popular approach used by developers, it is also known to cause a loss of insight into a business process. You might see the current data but have no clue how it came to be that way.

A storage approach that attempts to resolve the loss of user actions is referred to as Event Sourcing. Event Sourcing stores a user’s actions as events, building projections of the current state by processing all events up to a point. It’s a bit more work but allows you to reprocess user events into new projections for insights you may not have considered when building a feature. When building event sourcing systems, your focus will be on logical chronology, but without a strict time interval. Of course, events can happen when they happen, and that’s ok. Still, this approach thinks of events as ultimately serving the need to build projections for deriving the state.

A time series is yet another approach, but it’s more concerned with events occurring within a particular time interval. As you may have guessed, the time dimension is essential to this storage approach. In mathematical terms, a time series is a sequence of data points listed in time order, typically recorded at successive, discrete points in time. Given the ubiquity of Internet of Things (IoT) devices, time series databases can help ingest large amounts of data while allowing developers to glean insights based on seconds, hours, days, and more. As such, each entry’s most essential data element is its timestamp.

Advantages to Time Series databases include:

  • Efficient ingestion of data to produce insights
  • Efficient data management once data has fallen out of retention policies
  • An expressive syntax designed for interval-based data like last, first, avg, and more

Use cases for a Time Series database include:

  • Infrastructure monitoring
  • Anomaly detection and alert messaging
  • Forecasting and predictions
  • General systems analysis

In general, if you’re dealing with large amounts of data where a timestamp is an essential piece of your data model, you may want to consider a time series database.

Using PostgreSQL and TimescaleDB

TimescaleDB is an open-source time series database developed by Timescale Inc. It is a solution built to extend the capabilities of the popular PostgreSQL database, with additional SQL functions and table structures that support storing your time series data to improve storage efficiency, indexing performance, and data analysis at a large scale.

For .NET users, PostgreSQL support has significantly improved, with libraries like Npgsql doing a lot of the ADO.NET work for us. As a bonus, if you’re an Entity Framework Core user, you’ll find that PostgreSQL is a first-class supported provider that supports most of the significant features in the EF Core toolbox.

The combination of PostgreSQL, TimescaleDB, and EF Core is possible and quite pleasant once it’s all set up, but it takes some effort.

Docker and TimescaleDB Instance

The easiest way to start with TimescaleDB is to use the official docker image to create a container instance.

docker pull timescale/timescaledb:latest-pg14
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:latest-pg14

You can connect to the instance of PostgreSQL with any database tool, but I prefer the built-in tools found in JetBrains Rider.
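
If you just want a quick sanity check from the terminal, you can also drop into psql inside the container and confirm the TimescaleDB extension is installed; the container name and postgres user below come from the docker run command above.

docker exec -it timescaledb psql -U postgres -c "\dx timescaledb"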

What is TimescaleDB’s Hypertable?

Hypertables are essential to TimescaleDB, and understanding them will help you optimize your applications. A hypertable is a specialized table built around the concept of interval chunks: rows are grouped by their timestamp into chunks that each cover a fixed time interval (seven days by default in current versions), which keeps indexing and queries efficient. TimescaleDB creates chunks automatically as you insert new data, so it’s mostly an invisible process to you, the user.

Another important note about Hypertables is that they typically do not contain a primary key. Instead, a hypertable stores data, and you apply meaning to the data using aggregate functions. The lack of a key contrasts with the ideology of EF Core and the “current state” approach, which typically requires a key on every entity; we’ll see how to overcome this later.

You can also apply indexes to improve query performance. In general, all your database administration tools and techniques will still work on a hypertable. That’s good news for most teams looking for more value from a new dependency rather than a burden.

Setting Up Your Solution

If interested, I’ve included a working TimescaleDB and EF Core solution on my GitHub. Still, I’ll also discuss common issues you may encounter when adopting TimescaleDB in your .NET applications.

Before we start, create a Console Application with the EF Core CLI tooling installed. You can follow the CLI instructions below if you’re unfamiliar with how to do that.

dotnet new console -o HelloTimescale && cd HelloTimescale
dotnet new tool-manifest
dotnet tool install dotnet-ef

You’ll also need the following NuGet packages, which you can install with the CLI commands shown after this list:

  • Microsoft.EntityFrameworkCore.Design
  • Npgsql.EntityFrameworkCore.PostgreSQL
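
Both packages can be added from the project directory with the dotnet CLI; pick the versions that match your EF Core release.

dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL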

The Hypertable Models

I mentioned in the previous section that Hypertable entries are keyless. Keyless entities are possible with EF Core, but they come with caveats:

  • Never tracked for changes; therefore, you can never insert, update, or delete when using the DbContext.
  • Only support a subset of navigation mapping capabilities.
  • Can’t map between two keyless entities.
  • Can be mapped to a defining query that acts as a data source, similar to a DbSet.

In the sample project, you’ll notice two keyless entities: Stock and IntervalResult. First, let’s focus on Stock, as it’s the more interesting of the two models. Then, we’ll create a Stocks table and a Companies table to mimic TimescaleDB’s test data.

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using Microsoft.EntityFrameworkCore;

[Keyless]
public class Stock
{
    public DateTimeOffset Time { get; set; }

    [ForeignKey( /* navigation property */ nameof(Company))]
    public string Symbol { get; set; } = "";

    public decimal? Price { get; set; }
    public int? DayVolume { get; set; }
    public Company Company { get; set; } = default!;
}

public class Company
{
    [Key] public string Symbol { get; set; } = "";
    public string Name { get; set; } = "";
}

Once you create these entities, you’ll need to add them to the DbContext class.

public DbSet<Stock> Stocks { get; set; } = default!;
public DbSet<Company> Companies { get; set; } = default!;
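
The DbContext also needs to know how to reach your PostgreSQL instance. Here’s a minimal sketch using the Npgsql provider’s UseNpgsql method; the connection string is an assumption based on the Docker container from earlier, and the sample may wire this up differently (for example, through dependency injection).

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) =>
    // Hypothetical connection string matching the docker run command from earlier
    optionsBuilder.UseNpgsql("Host=localhost;Port=5432;Username=postgres;Password=password;Database=postgres");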

Remember, you can only use Stocks in a read capacity. Attempting to insert data through the DbContext will result in an error. I recommend using ADO.NET to insert data or developing another ingress mechanism.
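
Since the Stocks set is read-only through EF Core, here is a rough sketch of inserting a row with Npgsql directly once the migrations later in this post have created the tables. The connection string and values are placeholders, and it assumes a matching Companies row already exists for the foreign key.

using Npgsql;

var connectionString = "Host=localhost;Port=5432;Username=postgres;Password=password;Database=postgres";

using var connection = new NpgsqlConnection(connectionString);
connection.Open();

// Plain ADO.NET insert into the hypertable; EF Core change tracking is not involved
using var command = new NpgsqlCommand(
    """INSERT INTO "Stocks" ("Time", "Symbol", "Price", "DayVolume") VALUES (@time, @symbol, @price, @volume)""",
    connection);

command.Parameters.AddWithValue("@time", DateTimeOffset.UtcNow);
command.Parameters.AddWithValue("@symbol", "MSFT");
command.Parameters.AddWithValue("@price", 256.65m);
command.Parameters.AddWithValue("@volume", 1000);

command.ExecuteNonQuery();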

The next step is to add our new migration to create these tables in our database instance. In the project’s root directory, type the following command.

dotnet ef migrations add Initial

The command will add a new migration to your Migrations folder. While EF Core generates most of what we need, we still have to modify the migration to turn our Stocks table into a hypertable. Add the following lines to the end of the Up method in the new migration.

// Convert Stocks Table to Hypertable
// language=sql
migrationBuilder.Sql(
    "SELECT create_hypertable( '\"Stocks\"', 'Time');\n" +
    "CREATE INDEX ix_symbol_time ON \"Stocks\" (\"Symbol\", \"Time\" DESC)"
);

You can also add an index at this point to improve future query performance. Adding an index is essential since we don’t have any primary keys, and therefore no unique index exists at the start.
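
As an aside, if the default chunk interval doesn’t suit your ingestion rate, create_hypertable accepts an optional chunk_time_interval argument. Here’s a sketch of what that variation of the migration call could look like; the one-day interval is only an example.

// Hypothetical variation: create the hypertable with one-day chunks instead of the default
migrationBuilder.Sql(
    "SELECT create_hypertable( '\"Stocks\"', 'Time', chunk_time_interval => INTERVAL '1 day');"
);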

Great! Now we can run interval queries with our Stocks table.

var sql = """
SELECT * FROM "Stocks"
WHERE "Time" > now() - INTERVAL '1 week'
AND "Symbol" = 'MSFT'
""" ;

var trades = db.Stocks.FromSqlRaw(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");
""" ;

var trades = db.Stocks.FromSqlInterpolated(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");

The previous code counts the number of trades for the MSFT stock within the last week. You may also notice the INTERVAL keyword in the SQL; that’s standard PostgreSQL, and TimescaleDB builds on it with additional helper functions that make time-based logic much more straightforward.

This previous code works, but it also means we have a lot of SQL in our code base. What about a different approach?

EF Core Database Functions

EF Core has another feature called Database Functions. It allows you to map a method on your DbContext directly to a database function. It also has the benefit of feeling more natural in a C# codebase. Let’s look at the usage first, as it makes understanding the implementation easier.

// UTC Only
// Read this first https://www.npgsql.org/doc/types/datetime.html
var date = new DateTime(2022, 06, 29, 0, 0, 0, DateTimeKind.Utc);
var top = db
    .GetWeeklyResults(date)
    .First(x => x.Symbol == "MSFT");

Console.WriteLine($"{top.Name} ({top.Symbol}): {top.Start:C} - {top.End:C} ~{top.Average:C}");

You’ll notice that we now have a GetWeeklyResults(date) method on our DbContext. So let’s see how to add it. The first step is to create an IntervalResult type.

[Keyless]
public record IntervalResult(
    string Symbol,
    string Name,
    decimal Start,
    decimal End,
    decimal Average);

We follow that up by adding our method to our DbContext.

public IQueryable<IntervalResult> GetWeeklyResults(DateTime value)
{
    if (value.Kind != DateTimeKind.Utc) {
        // https://www.npgsql.org/doc/types/datetime.html
        throw new ArgumentException("DateTime.Kind must be of UTC to convert to timestamp with time zone");
    }
    
    return FromExpression(() => GetWeeklyResults(value));
}

The last step in the DbContext code is to add a configuration to our OnModelCreating method.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Map the GetWeeklyResults method on the DbContext to the database function
    modelBuilder
        .HasDbFunction(typeof(StocksDbContext).GetMethod(nameof(GetWeeklyResults), new[] { typeof(DateTime) })!)
        // the results map to the keyless IntervalResult entity; don't worry if it
        // also shows up as a table in the migration snapshot
        .HasName("get_weekly_results")
        .IsBuiltIn(false);
}

Now, let’s add our function to our database via an EF Core migration.

dotnet ef migrations add AddGetWeeklyResultsFunction

When EF Core adds the migration to your project, empty the file’s Up and Down methods and replace them with the following.

public partial class AddGetWeeklyResultsFunction : Migration
{
    /// <inheritdoc />
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        var function = $"""
        create or replace FUNCTION get_weekly_results("value" timestamp with time zone)
            returns Table
                    (
                        "Symbol"  text,
                        "Name"    text,
                        "Start"   numeric,
                        "End"     numeric,
                        "Average" numeric
                    )
            LANGUAGE SQL
        as
        $func$
        SELECT srt."Symbol",
               C."Name",
               first("Price", "Time") as "Start",
               last("Price", "Time")  as "End",
               avg("Price")           as "Average"
        FROM "Stocks" srt
                 inner join "Companies" C on C."Symbol" = srt."Symbol"
        WHERE "Time" > "value" - INTERVAL '1 week' 
        AND  "Time" <= "value" 
        GROUP BY srt."Symbol", "Name"
        ORDER BY "End" DESC;
        $func$;
        """;

        migrationBuilder.Sql(function);
    }

    /// <inheritdoc />
    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.Sql("drop function get_weekly_results(timestamp with time zone);");
    }
}
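
If you’re building the project from scratch rather than cloning the sample, remember to apply both migrations to the running container before querying.

dotnet ef database update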

For the long-term maintainability of your projects, I recommend this approach. It keeps the functions easier to manage while still giving you a method on your DbContext to do parameter checking before calling the database. In contrast, floating SQL strings scattered across your project make a solution harder to maintain. That said, it is your solution, so pick the approach that suits you best.

Let’s have a look at the final solution.

using Microsoft.EntityFrameworkCore;
using TimescaleSample.Models;

var db = new StocksDbContext();

var sql = """
SELECT * FROM "Stocks"
WHERE "Time" > now() - INTERVAL '1 week'
AND "Symbol" = 'MSFT'
""" ;

var trades = db.Stocks.FromSqlRaw(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");

// UTC Only
// Read this and cry https://www.npgsql.org/doc/types/datetime.html
var date = new DateTime(2022, 06, 29, 0, 0, 0, DateTimeKind.Utc);
var top = db
    .GetWeeklyResults(date)
    .First(x => x.Symbol == "MSFT");

Console.WriteLine($"{top.Name} ({top.Symbol}): {top.Start:C} - {top.End:C} ~{top.Average:C}");

Running our application, we can see the following output.

50892 trades of MSFT in the last week
Microsoft (MSFT): $266.39 - $256.65 ~$259.77

Awesome!

Thoughts and Conclusion

The power of TimescaleDB comes from its deployment as an extension to your existing PostgreSQL infrastructure. So you don’t have to adopt a completely different infrastructure dependency to get the most out of your time series data. You can also use existing tools to manage your database instance.

The limitations of EF Core can be a bit cumbersome but not impossible to overcome. You can also take the integration further by mapping more of TimescaleDB’s helper functions into EF Core, which I did not show in this post.

Ultimately, I think this is a neat little sample to show how far EF Core and the community have come to support technologies outside the norm. I hope you enjoyed this post, and feel free to share any thoughts you have by following me on Twitter at @buhakmeh.

Remember, you can get this sample code by going to my GitHub repository and cloning the sample.
