Part of software development is researching new tools and finding ways to integrate them with what you know and love. Given today’s technology landscape, having multiple products power your solution in new and exciting ways is not uncommon. While your choices are plenty, most solutions typically have three distinct elements: a user interface, a backend, and a data storage mechanism. The data storage mechanism is arguably the most important, as its features typically dictate design choices later in the development process.
Today’s post will look at a particular storage approach called Time Series while utilizing technologies like PostgreSQL and its TimescaleDB extension, all integrated with Entity Framework Core.
What’s Time Series?
Most databases work with “current state” management, where each entry is an aggregate of user actions. Any particular record is a side effect of user interactions in an ever-evolving system. While “current state” management is the most popular approach used by developers, it is also known to cause a loss of insight into a business process. You might see the current data but have no clue how it came to be that way.
A storage approach that attempts to resolve the loss of user actions is referred to as Event Sourcing. Event Sourcing stores a user’s actions as events, building projections of the current state by processing all events up to a point. It’s a bit more work but allows you to reprocess user events into new projections for insights you may not have considered when building a feature. When building event sourcing systems, your focus will be on logical chronology, but without a strict time interval. Of course, events can happen when they happen, and that’s ok. Still, this approach thinks of events as ultimately serving the need to build projections for deriving the state.
A time series is yet another approach, but it’s more concerned with events occurring within a particular time interval. As you may have guessed, the time dimension is essential to this storage approach. If you’re into mathematics, a time series is a set of data points listed in time order, usually captured at discrete intervals. Given the ubiquity of Internet of Things (IoT) devices, time series databases can help ingest large amounts of data while allowing developers to glance at insights based on seconds, hours, days, and more. As such, each entry’s most essential data element is its timestamp.
Advantages to Time Series databases include:
- Efficient ingestion of data to produce insights
- Efficient data management once data has fallen out of retention policies
- An expressive syntax designed for interval-based data, with functions like last, first, avg, and more
Use cases for a Time Series database include:
- Infrastructure monitoring
- Anomaly detection and alert messaging
- Forecasting and predictions
- General systems analysis
In general, if you’re dealing with large amounts of data where a timestamp is an essential piece of your data model, you may want to consider a time series database.
Using PostgreSQL and TimescaleDB
TimescaleDB is an open-source time series database developed by Timescale Inc. It extends the capabilities of the popular PostgreSQL database with additional SQL functions and table structures that improve storage efficiency, indexing performance, and data analysis for time series data at scale.
For .NET users, PostgreSQL support has significantly improved, with libraries like Npgsql doing a lot of the ADO.NET work for us. As a bonus, if you’re an Entity Framework Core user, you’ll find that PostgreSQL is a first-class supported provider that supports most of the significant features in the EF Core toolbox.
The combination of PostgreSQL, TimescaleDB, and EF Core is possible and quite pleasant once it’s all set up, but it takes some effort.
Docker and TimescaleDB Instance
The easiest way to start with TimescaleDB is to use the official Docker image to create a container instance.
docker pull timescale/timescaledb-ha:pg14-latest
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg14-latest
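If you want to confirm the extension is available before wiring anything up in .NET, you can query pg_extension from inside the running container. A quick check, assuming the container name and password from the run command above:
docker exec -it timescaledb psql -U postgres -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb';"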
You can connect to the instance of PostgreSQL with any database tool, but I prefer the built-in tools found in JetBrains Rider.
What is TimescaleDB’s Hypertable?
Hypertables are essential to TimescaleDB, and understanding them will help you optimize your applications. A hypertable is a specialized table built around the concept of interval chunks. The default chunk interval is typically 1 day, and the timestamp groups values for efficient indexing and queries. TimescaleDB creates chunks automatically when you insert new data, so it’s mostly an invisible process to you, the user.
Another important note about hypertables is that they typically do not contain a primary key. Instead, a hypertable stores data, and you apply meaning to the data using aggregate functions. The lack of a key contrasts with the ideology of EF Core and the “current state” approach, which typically expects a key on every entity; we’ll see how to overcome this later.
You can also apply indexes to improve query performance. In general, all your database administration tools and techniques will still work on a hypertable. That’s good news for most teams looking for more value from a new dependency rather than a burden.
Setting Up Your Solution
If you’re interested, I’ve included a working TimescaleDB and EF Core solution on my GitHub. Still, I’ll also discuss common issues you may encounter when adopting TimescaleDB in your .NET applications.
Before we start, create a Console Application with the EF Core CLI tooling installed. You can follow the CLI instructions below if you’re unfamiliar with how to do that.
dotnet new console -o HelloTimescale && cd HelloTimescale
dotnet new tool-manifest
dotnet tool install dotnet-ef
You’ll also need the NuGet packages of:
- Microsoft.EntityFrameworkCore.Design
- Npgsql.EntityFrameworkCore.PostgreSQL
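With the packages installed, you’ll need a DbContext that points at the container we started earlier. The full class lives in the sample repository, but a minimal sketch might look like the following; the connection string values are an assumption based on the docker run command above, not a definitive implementation.
using Microsoft.EntityFrameworkCore;

public class StocksDbContext : DbContext
{
    // DbSet properties and the database function mapping are added in the sections below

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // assumed connection details; they match the docker run command shown earlier
        optionsBuilder.UseNpgsql(
            "Host=localhost;Port=5432;Database=postgres;Username=postgres;Password=password");
    }
}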
The Hypertable Models
I mentioned in the previous section that Hypertable entries are keyless. Keyless entities are possible with EF Core, but they come with caveats:
- Never tracked for changes; therefore, you can never insert, update, or delete when using the DbContext.
- Only support a subset of navigation mapping capabilities.
- Can’t map between two keyless entities
- Can be mapped to a defining query that acts as a data source, similar to a DbSet.
In the sample project, you’ll notice that we have two keyless entities: Stock and IntervalResult. First, let’s focus on Stock, as it is the more interesting of the two models. Then, we’ll create a Stocks table and a Companies table to mimic TimescaleDB’s test data.
[Keyless]
public class Stock
{
    public DateTimeOffset Time { get; set; }

    [ForeignKey( /* navigation property */ nameof(Company))]
    public string Symbol { get; set; } = "";

    public decimal? Price { get; set; }
    public int? DayVolume { get; set; }

    public Company Company { get; set; } = default!;
}

public class Company
{
    [Key] public string Symbol { get; set; } = "";
    public string Name { get; set; } = "";
}
Once you create these entities, you’ll need to add them to the DbContext class.
public DbSet<Stock> Stocks { get; set; } = default!;
public DbSet<Company> Companies { get; set; } = default!;
Remember, you can only use Stocks in a read capacity. Attempting to insert data through the DbContext will result in an error. I recommend using ADO.NET to insert data or developing another ingress mechanism.
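Since EF Core won’t insert into the keyless Stocks set, a plain ADO.NET command through Npgsql is one option. Here’s a minimal sketch, assuming the same connection details as the DbContext above; the inserted values are made up purely for illustration.
using Npgsql;

// assumed connection details; they match the docker run command shown earlier
const string connectionString =
    "Host=localhost;Port=5432;Database=postgres;Username=postgres;Password=password";

using var connection = new NpgsqlConnection(connectionString);
connection.Open();

// parameterized insert into the Stocks hypertable; the symbol must already exist
// in the Companies table because of the foreign key
using var command = new NpgsqlCommand(
    "INSERT INTO \"Stocks\" (\"Time\", \"Symbol\", \"Price\", \"DayVolume\") VALUES (@time, @symbol, @price, @volume)",
    connection);

command.Parameters.AddWithValue("time", DateTimeOffset.UtcNow);
command.Parameters.AddWithValue("symbol", "MSFT");
command.Parameters.AddWithValue("price", 259.58m);
command.Parameters.AddWithValue("volume", 1_000);

command.ExecuteNonQuery();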
The next step is to add our new migration to create these tables in our database instance. In the project’s root directory, type the following command.
dotnet ef migrations add Initial
The command will add a new migration to your Migrations folder. While EF Core generates most of what we need, we need to modify our migration to turn our Stocks table into a hypertable. Add the following lines to the end of the Up method in the new migration.
// Convert Stocks Table to Hypertable
// language=sql
migrationBuilder.Sql(
"SELECT create_hypertable( '\"Stocks\"', 'Time');\n" +
"CREATE INDEX ix_symbol_time ON \"Stocks\" (\"Symbol\", \"Time\" DESC)"
);
You can additionally add an index to improve future query performance at this time. Adding an index is essential since we don’t have any primary keys; thus, there is no unique index at the start.
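If the default one-day chunk interval doesn’t suit your ingestion rate, you can adjust it in the same Up method. This is only a sketch and not part of the original sample; set_chunk_time_interval is TimescaleDB’s function for changing the interval used for future chunks, and seven days is an arbitrary example value.
// Optional: widen the chunk interval for the Stocks hypertable (example value)
migrationBuilder.Sql("SELECT set_chunk_time_interval('\"Stocks\"', INTERVAL '7 days');");
Once the Up method looks the way you want, apply the migration to the running container (assuming you’re using the CLI tooling rather than calling Database.Migrate() at startup).
dotnet ef database update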
Great! Now we can run interval queries with our Stocks table.
var sql = """
    SELECT * FROM "Stocks"
    WHERE "Time" > now() - INTERVAL '1 week'
    AND "Symbol" = 'MSFT'
    """;

var trades = db.Stocks.FromSqlRaw(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");
""" ;
var trades = db.Stocks.FromSqlInterpolated(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");
The previous code counts the number of trades for the MSFT stock within the last week. You may also notice the INTERVAL keyword in the SQL; PostgreSQL’s interval syntax, combined with the helper functions TimescaleDB adds, makes time-based logic much more straightforward.
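If the symbol comes from user input, you may prefer FromSqlInterpolated, which turns interpolated values into database parameters instead of splicing them into the SQL string. Here’s a quick sketch of the same query with a hypothetical symbol variable:
var symbol = "MSFT"; // hypothetical user input
var tradesForSymbol = db.Stocks
    .FromSqlInterpolated($"""
        SELECT * FROM "Stocks"
        WHERE "Time" > now() - INTERVAL '1 week'
        AND "Symbol" = {symbol}
        """)
    .Count();
Console.WriteLine($"{tradesForSymbol} trades of {symbol} in the last week");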
This previous code works, but it also means we have a lot of SQL in our code base. What about a different approach?
EF Core Database Functions
EF Core has another feature called Database Functions. It allows you to map a method on your DbContext directly to a database function. It also has the benefit of feeling more natural in a C# codebase. Let’s look at the usage first, as it makes understanding the implementation easier.
// UTC Only
// Read this first https://www.npgsql.org/doc/types/datetime.html
var date = new DateTime(2022, 06, 29, 0, 0, 0, DateTimeKind.Utc);

var top = db
    .GetWeeklyResults(date)
    .First(x => x.Symbol == "MSFT");

Console.WriteLine($"{top.Name} ({top.Symbol}): {top.Start:C} - {top.End:C} ~{top.Average:C}");
You’ll notice that we now have a GetWeeklyResults(date) method on our DbContext. So let’s see how to add it. The first step is to create an IntervalResult type.
[Keyless]
public record IntervalResult(
string Symbol,
string Name,
decimal Start,
decimal End,
decimal Average);
We follow that up by adding our method to our DbContext. Note that the record’s property names match the columns the database function returns, which is how EF Core materializes the results.
public IQueryable<IntervalResult> GetWeeklyResults(DateTime value)
{
if (value.Kind != DateTimeKind.Utc) {
// https://www.npgsql.org/doc/types/datetime.html
throw new ArgumentException("DateTime.Kind must be of UTC to convert to timestamp with time zone");
}
return FromExpression(() => GetWeeklyResults(value));
}
And our last step as part of the DbContext code is to add a configuration to our OnModelCreating method.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // map GetWeeklyResults to the get_weekly_results database function;
    // IsBuiltIn(false) marks it as a user-defined function rather than a built-in one
    modelBuilder
        .HasDbFunction(typeof(StocksDbContext).GetMethod(nameof(GetWeeklyResults), new[] { typeof(DateTime) })!)
        .HasName("get_weekly_results")
        .IsBuiltIn(false);
}
Now, let’s add our function to our database via an EF Core migration.
dotnet ef migrations add AddGetWeeklyResultsFunction
When EF Core adds the migration to your project, empty the file’s Up and Down methods and replace them with the following.
public partial class AddGetWeeklyResultsFunction : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
var function = $"""
create or replace FUNCTION get_weekly_results("value" timestamp with time zone)
returns Table
(
"Symbol" text,
"Name" text,
"Start" numeric,
"End" numeric,
"Average" numeric
)
LANGUAGE SQL
as
$func$
SELECT srt."Symbol",
C."Name",
first("Price", "Time") as "Start",
last("Price", "Time") as "End",
avg("Price") as "Average"
FROM "Stocks" srt
inner join "Companies" C on C."Symbol" = srt."Symbol"
WHERE "Time" > "value" - INTERVAL '1 week'
AND "Time" <= "value"
GROUP BY srt."Symbol", "Name"
ORDER BY "End" DESC;
$func$;
""";
migrationBuilder.Sql(function);
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.Sql("drop function get_weekly_results(timestamp with time zone);");
}
}
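As with the first migration, apply it so the function exists in the database before calling GetWeeklyResults.
dotnet ef database update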
For the long-term maintainability of your projects, I recommend this approach. It’s easier to manage the functions while also having a method on your DbContext to do parameter checking before calling the database. In contrast, maintaining your solution becomes more difficult when floating SQL is scattered across your project. That said, it is your solution, so you can pick the most suitable approach.
Let’s have a look at the final solution.
using Microsoft.EntityFrameworkCore;
using TimescaleSample.Models;
var db = new StocksDbContext();
var sql = """
    SELECT * FROM "Stocks"
    WHERE "Time" > now() - INTERVAL '1 week'
    AND "Symbol" = 'MSFT'
    """;

var trades = db.Stocks.FromSqlRaw(sql).Count();
Console.WriteLine($"{trades} trades of MSFT in the last week");
// UTC Only
// Read this and cry https://www.npgsql.org/doc/types/datetime.html
var date = new DateTime(2022, 06, 29, 0, 0, 0, DateTimeKind.Utc);
var top = db
.GetWeeklyResults(date)
.First(x => x.Symbol == "MSFT");
Console.WriteLine($"{top.Name} ({top.Symbol}): {top.Start:C} - {top.End:C} ~{top.Average:C}");
Running our application, we can see the following output.
50892 trades of MSFT in the last week
Microsoft (MSFT): $266.39 - $256.65 ~$259.77
Awesome!
Thoughts and Conclusion
The power of TimescaleDB comes from its deployment as an extension to your existing PostgreSQL infrastructure. So you don’t have to adopt a completely different infrastructure dependency to get the most out of your time series data. You can also use existing tools to manage your database instance.
The limitations of EF Core can be a bit cumbersome but not impossible to overcome. You can also enhance the integration with more of TimescaleDB’s helper functions than I showed in this post, such as retention policies.
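For example, retention policies pair well with the migration approach from earlier. A minimal sketch, assuming you want TimescaleDB to drop Stocks chunks older than a year (add_retention_policy is TimescaleDB’s function for this; the interval is only an example):
// inside a migration's Up method: drop Stocks chunks older than one year (example interval)
migrationBuilder.Sql("SELECT add_retention_policy('\"Stocks\"', INTERVAL '1 year');");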
Ultimately, I think this is a neat little sample to show how far EF Core and the community have come to support technologies outside the norm. I hope you enjoyed this post, and feel free to share any thoughts you have by following me on Twitter at @buhakmeh.
Remember, you can get this sample code by going to my GitHub repository and cloning the sample.