I’ve recently been scanning the C# subreddits to better understand my fellow C# developers. The questions come from a range of individuals trying to solve an array of issues. The topics that interest me most are approach-style questions, which generally start with “How would I…”. These are open-ended questions with an infinite set of possible outcomes, which makes them fun to explore.
In this post, we’ll explore the question of how to store, read, and return generally static data from a web API.
Let’s get started!
The Original Question
Reddit user numgrippss asks the following question:
I have a client and they want a new feature. There are currently multiple API’s we can use, but it’s too expensive (like 2k a year). He asked me to create the API, which is very easy, because once a year, a few new rows should be added. The problem is how should I insert a lot of data in the database (like 10k rows of static data). –numgrippss
Let’s break down some concerns our developer has here:
- Cost is a significant consideration; the developer specifically calls the existing APIs “too expensive.”
- The data is accessed via API.
- Data is relatively small, with 10,000 records.
- Data changes infrequently, only once a year.
- Data changes are minimal: just a few rows.
So what would I do in this scenario?
The Recommendation
Generally, when dealing with a low record count that is relatively static, I suggest developers think about utilizing in-memory collections. The source we use to load the data into memory is almost immaterial in this case: we can load from a flat-file format such as comma-separated values (CSV) or JSON, or even from a database. Storing data in human-readable files alongside code offers the potential for source-control management, which brings versioning and change tracking, features that normally have to be planned for in a traditional RDBMS.
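As a quick illustration of the flat-file option, here’s a minimal sketch that reads a JSON file into a list once at startup. The `Product` shape and the `products.json` file name are placeholder assumptions for this sketch, separate from the entity we build later in the walkthrough:

```csharp
// A minimal sketch: deserialize a JSON file into memory once, then reuse the list.
using System.Text.Json;

public record Product(int Id, string Name, decimal Price);

public static class StaticData
{
    public static IReadOnlyList<Product> LoadProducts(string path = "products.json")
    {
        var json = File.ReadAllText(path);
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<List<Product>>(json, options) ?? new();
    }
}
```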
Users of the API will also see a substantial speed benefit, since no network operations are needed to retrieve the data.
We’ll walk through the scenario described above, build a sample that shows the benefits of in-memory collections, and discuss the possible downsides of this approach.
Testing Our Solutions
I’ve set up an ASP.NET Core project with three endpoints:
- Root path
- In Memory response
- SQL response
In this example, we’ll use Entity Framework Core and SQLite to power the SQL endpoint. For our in-memory endpoint, we’ll load the SQLite table’s contents into memory once, at application startup. For the sake of fairness, we’ll use the same entities for both endpoints.
The DbContext
Since we’ll be storing our data in an SQLite database, we’ll need a `DbContext` class. We’ll also use EF Core to seed 10,000 records as part of our initial migration. I’m using the Bogus library to randomly generate the `Product` instances.
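Something like the following sketch; the exact `Product` properties and the Bogus rules here are illustrative assumptions rather than a prescribed schema:

```csharp
// A sketch of the DbContext; EF Core's HasData bakes the seeded rows into the migration.
using Bogus;
using Microsoft.EntityFrameworkCore;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
    public decimal Price { get; set; }
}

public class ProductContext : DbContext
{
    public ProductContext(DbContextOptions<ProductContext> options)
        : base(options) { }

    public DbSet<Product> Products => Set<Product>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // A fixed seed keeps the generated data stable across migration runs
        Randomizer.Seed = new Random(1);

        var id = 1;
        var faker = new Faker<Product>()
            .RuleFor(p => p.Id, _ => id++) // HasData requires explicit keys
            .RuleFor(p => p.Name, f => f.Commerce.ProductName())
            .RuleFor(p => p.Price, f => f.Random.Decimal(1m, 100m));

        // Seed 10,000 rows as part of the model, captured by the initial migration
        modelBuilder.Entity<Product>().HasData(faker.Generate(10_000));
    }
}
```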
Interestingly, EF Core generates 10,000 records at the time we create our migration. That means the values themselves get stored in our migration file. Here’s a peek at the initial migration for this database.
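The generated file looks roughly like this (truncated and illustrative; the actual names and prices come from Bogus):

```csharp
// Illustrative excerpt of the generated initial migration
using Microsoft.EntityFrameworkCore.Migrations;

public partial class InitialCreate : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "Products",
            columns: table => new
            {
                Id = table.Column<int>(type: "INTEGER", nullable: false)
                    .Annotation("Sqlite:Autoincrement", true),
                Name = table.Column<string>(type: "TEXT", nullable: false),
                Price = table.Column<decimal>(type: "TEXT", nullable: false)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Products", x => x.Id);
            });

        migrationBuilder.InsertData(
            table: "Products",
            columns: new[] { "Id", "Name", "Price" },
            values: new object[] { 1, "Handcrafted Granite Chair", 42.87m });

        // ...followed by thousands more InsertData calls, one per seeded row
    }
}
```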
While we’re here, let’s create an `InMemory` wrapper to hold the same data in memory.
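A sketch of that wrapper (the random-selection method is an assumption of this sketch):

```csharp
using Microsoft.EntityFrameworkCore;

// Loads every Product into memory once, then serves random instances
// without ever touching the database again.
public class InMemory
{
    private readonly List<Product> products;
    private readonly Random random = new();

    public InMemory(ProductContext context) =>
        products = context.Products.AsNoTracking().ToList();

    public Product GetRandomProduct() =>
        products[random.Next(products.Count)];
}
```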
Since we’re dealing with an ASP.NET Core application, we’ll need to configure our services. We’ll pay the cost of loading our data into memory at application startup and add a singleton instance to ASP.NET Core’s services collection.
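Here’s what that might look like with the minimal hosting model; the connection string and the trick of resolving the singleton before the app starts serving are assumptions of this sketch:

```csharp
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDbContext<ProductContext>(
    options => options.UseSqlite("Data Source=products.db"));

// The factory runs once; we force that first resolve below so the load
// cost is paid at startup rather than on the first request.
builder.Services.AddSingleton(provider =>
{
    using var scope = provider.CreateScope();
    var context = scope.ServiceProvider.GetRequiredService<ProductContext>();
    return new InMemory(context);
});

var app = builder.Build();

// Resolve the singleton now, loading all 10,000 rows into memory up front
_ = app.Services.GetRequiredService<InMemory>();
```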
Finally, let’s map our endpoints.
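A sketch of the mapping; the route names and the random-row strategy for the SQL endpoint are assumptions:

```csharp
app.MapGet("/", () => "Hello World!");

// Serve a random Product straight from the in-memory singleton
app.MapGet("/in-memory", (InMemory inMemory) => inMemory.GetRandomProduct());

// Serve a random Product from SQLite via EF Core, with change tracking off
app.MapGet("/sql", async (ProductContext context) =>
{
    var count = await context.Products.CountAsync();
    var skip = Random.Shared.Next(count);
    return await context.Products
        .AsNoTracking()            // read-only query, so skip change tracking
        .OrderBy(p => p.Id)        // a stable order makes Skip deterministic
        .Skip(skip)
        .FirstAsync();
});

app.Run();
```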
As we can see, both the in-memory and SQL endpoints return a `Product` at random. The only difference is the way we retrieve the `Product` instances. The EF Core query will utilize the LINQ interface, and to give it the best opportunity to be performant, we’ll turn object tracking off.
Performance Considerations
When thinking about performance in an ASP.NET Core application, the two most important metrics are memory usage and response time. Let’s start with memory usage.
Memory Utilization
As one might guess, storing objects in memory can come with an overhead cost. Each situation is unique, and developers should evaluate what performance profile they want out of their particular use case. In this example, let’s look at the memory allocated at startup for `InMemory` and its 10,000 `Product` instances.
Using JetBrains Rider’s Dynamic Program Analysis, we can see that our instance is around 45 megabytes.
As mentioned before, this may be fine for some folks and a deal-breaker for others. Before we make any decision, let’s look at the implications of our next performance metric, response times.
Response Times
Memory utilization is an app maintainer’s problem; users are ultimately more concerned about response times. Let’s see how storing 10,000 records in memory fares against retrieving a single row from SQL.
Let’s start by doing a non-scientific test against our server application. Here is the unit test.
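Something along these lines (the xUnit setup, route paths, and `Stopwatch` timing are assumptions of this sketch; `WebApplicationFactory<Program>` also assumes a `public partial class Program` declaration in the API project):

```csharp
using System.Diagnostics;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;
using Xunit.Abstractions;

public class EndpointTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> factory;
    private readonly ITestOutputHelper output;

    public EndpointTests(WebApplicationFactory<Program> factory, ITestOutputHelper output)
    {
        this.factory = factory;
        this.output = output;
    }

    [Fact]
    public async Task SqliteTest()
    {
        var client = factory.CreateClient();

        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync("/sql");
        stopwatch.Stop();

        response.EnsureSuccessStatusCode();
        output.WriteLine($"/sql responded in {stopwatch.ElapsedMilliseconds} ms");
    }

    [Fact]
    public async Task InMemoryTest()
    {
        var client = factory.CreateClient();

        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync("/in-memory");
        stopwatch.Stop();

        response.EnsureSuccessStatusCode();
        output.WriteLine($"/in-memory responded in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```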
When we run the `SqliteTest`, we see a response in 180 ms. That value is a best-case scenario, as the `TestServer` does not actually initiate a network-level HTTP request.
How fast does the `InMemoryTest` respond?
Wow, the `InMemoryTest` responds in 80 ms. In my experience, a 100 ms improvement is significant.
Let’s get more scientific by running some load tests against each endpoint using K6. Below you’ll find the script for testing our in-memory endpoint. The SQL endpoint test is identical.
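Here’s a representative sketch of that script; the URL, virtual-user count, and duration are assumptions:

```javascript
// k6 load test for the in-memory endpoint; swap the path to "/sql"
// for the database version.
import http from 'k6/http';

export const options = {
  vus: 50,          // 50 concurrent virtual users
  duration: '30s',  // sustain the load for 30 seconds
};

export default function () {
  http.get('https://localhost:5001/in-memory');
}
```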
We’ll run the SQL endpoint first under our load test.
We can see that the average `iteration_duration` is 21.04 ms. The results are substantially better than our unit tests, as we’re averaging over a much larger sample of requests against a warmed-up API instance.
Let’s run the in-memory benchmark and see what we get.
Not a surprise regarding response times: we see an average `iteration_duration` of 1.97 ms. That’s about a tenth of the time it takes to read from the database sitting on disk. The difference could grow even larger depending on network topology, the power of the database instance, and the amount of data retrieved per query. We’ll also note that the throughput of our API increased significantly, from 34 MB received to 358 MB received. We got roughly 10x the previous throughput simply by removing the call to our database.
Conclusion
This proposed solution is only one of potentially infinite solutions out there. Each developer’s circumstance will come with different requirements, and the developer should consider all options when deploying the solution for their client. Additionally, the needs of our clients change. In the case of our Reddit user, the data may currently be static but could become increasingly dynamic due to unforeseen circumstances.
In this post, we explored registering an in-memory collection in ASP.NET Core’s service collection and its impact on memory and response-time performance. We also saw how to get a rough idea of the relative performance of any two approaches by utilizing the `WebApplicationFactory` test fixture, and we ran load tests with K6 to get response-time benchmarks.
If you’re interested in seeing the complete code for this post, then head over to my GitHub repository.
I hope you enjoyed this post. Let me know in the comments what approach you’d recommend: what would you prioritize, and how would you plan for future changes?
If you have a question you’d like advice on, please send me an email or message on Twitter.