I’ve recently been scanning the C# subreddits to better understand my fellow C# developers. The questions come from a range of individuals trying to solve a wide array of issues. The topics that interest me most are approach-style questions, which generally start with “How would I…”. These are open-ended questions with an infinite set of possible answers, which makes them fun to explore.
In this post, we’ll explore the question of how to store, read, and return generally static data from a web API.
Let’s get started!
The Original Question
Reddit user numgrippss asks the following question:
I have a client and they want a new feature. There are currently multiple API’s we can use, but it’s too expensive (like 2k a year). He asked me to create the API, which is very easy, because once a year, a few new rows should be added. The problem is how should I insert a lot of data in the database (like 10k rows of static data). –numgrippss
Let’s break down some concerns our developer has here:
- Cost is a significant consideration; the developer explicitly describes the existing APIs as “too expensive.”
- The data is accessed via API.
- Data is relatively small, with 10,000 records.
- Data changes infrequently. Once a year.
- Data changes are minimal, a few rows.
So what would I do in this scenario?
The Recommendation
Generally, when dealing with a low record count that is relatively static, I suggest developers think about utilizing in-memory collections. The data source we use to load data into memory is almost unimportant in this case, as we can load from a flat-file format such as comma-separated value files, JSON formatted files, or even a database. Storing data in human-readable files alongside code offers the potential for source-control management, which brings versioning and change tracking, features that normally have to be planned for in a traditional RDBMS.
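To make the flat-file option concrete, here’s a minimal sketch that loads such a file into a read-only collection at startup using System.Text.Json. The file name products.json is hypothetical, and Product stands in for whatever entity you’re serving (we’ll define a concrete one later in this post):

using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Read the file once at startup and deserialize it into an in-memory list.
// "products.json" is a hypothetical data file shipped alongside the app.
var json = File.ReadAllText("products.json");
var products = JsonSerializer.Deserialize<List<Product>>(json) ?? new List<Product>();
IReadOnlyList<Product> catalog = products.AsReadOnly();

Because the file lives next to the code, the yearly data update becomes an ordinary commit, reviewed and versioned like any other change.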
Users of the API will also see a substantial speed benefit, since no network operations are needed to retrieve the data.
We’ll walk through the scenario described above, build a sample that shows the benefits of in-memory collections, and discuss the possible downsides of the approach.
Testing Our Solutions
I’ve set up an ASP.NET Core project with three endpoints:
- Root path
- In Memory response
- SQL response
In this example, we’ll use Entity Framework Core and SQLite to power the SQL endpoint. For our in-memory endpoint, we’ll load the SQLite table’s contents into memory once, at application startup. For the sake of fairness, we’ll use the same entities for both endpoints.
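If you’re following along at home, the sample assumes a few NuGet packages. These are the standard package IDs:

dotnet add package Microsoft.EntityFrameworkCore.Sqlite
dotnet add package Bogus
dotnet add package Microsoft.AspNetCore.Mvc.Testing

The last package powers the WebApplicationFactory test fixture we’ll use later in the post.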
The DbContext
Since we’ll be storing our data in an SQLite database, we’ll need a DbContext class. We’ll also use EF Core to seed 10,000 records as part of our initial migration. I’m using the Bogus library to randomly generate the Product instances.
public class Database : DbContext
{
    public DbSet<Product> Products { get; set; }

    public Database(DbContextOptions<Database> options)
        : base(options)
    { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Generate 10,000 products with sequential ids
        var productIds = 1;
        var products = new Faker<Product>()
            .RuleFor(m => m.Id, () => productIds++)
            .RuleFor(m => m.Name, f => f.Commerce.ProductName())
            .RuleFor(m => m.Department, f => f.Commerce.Department())
            .RuleFor(m => m.Category, f => f.Commerce.Categories(1).First())
            .RuleFor(m => m.Color, f => f.Commerce.Color())
            .RuleFor(m => m.Price, f => f.Finance.Amount(1))
            .RuleFor(m => m.Description, f => f.Commerce.ProductDescription())
            .Generate(10_000);

        // Seed the database with the generated products
        modelBuilder.Entity<Product>()
            .HasData(products);
    }
}
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Category { get; set; }
    public string Color { get; set; }
    public decimal Price { get; set; }
    public string Description { get; set; }
}
Interestingly, EF Core generates 10,000 records at the time we create our migration. That means the values themselves get stored in our migration file. Here’s a peek at the initial migration for this database.
migrationBuilder.InsertData(
    table: "Products",
    columns: new[] { "Id", "Category", "Color", "Department", "Description", "Name", "Price" },
    values: new object[] { 1, "Electronics", "indigo", "Toys", "Boston's most advanced compression wear technology increases muscle oxygenation, stabilizes active muscles", "Generic Soft Chair", 177.08m });

// ... thousands of additional rows omitted ...

migrationBuilder.InsertData(
    table: "Products",
    columns: new[] { "Id", "Category", "Color", "Department", "Description", "Name", "Price" },
    values: new object[] { 6664, "Tools", "plum", "Industrial & Home", "The Apollotech B340 is an affordable wireless mouse with reliable connectivity, 12 months battery life and modern design", "Handcrafted Steel Shirt", 226.21m });
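For reference, a migration like this one comes from the standard EF Core CLI (the migration name InitialCreate below is illustrative):

dotnet ef migrations add InitialCreate
dotnet ef database update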
While we’re here, let’s create an InMemory wrapper to hold the same data in memory.
public class InMemory
{
    public InMemory(List<Product> products)
    {
        Products = products.AsReadOnly();
    }

    public IReadOnlyList<Product> Products { get; }
}
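Exposing the data as IReadOnlyList<Product> rather than the mutable list is deliberate: this instance will be registered as a singleton and shared across every concurrent request, and a collection that can’t be modified after construction is safe to read from multiple threads without locking.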
Since we’re dealing with an ASP.NET Core application, we’ll need to configure our services. We’ll pay the cost of loading our data into memory at application startup and add a singleton instance to ASP.NET Core’s services collection.
public void ConfigureServices(IServiceCollection services)
{
    var connectionString = Configuration.GetConnectionString("Database");

    services
        .AddEntityFrameworkSqlite()
        .AddDbContext<Database>(o => o.UseSqlite(connectionString));

    // Pay the cost of loading all products into memory once, at startup
    using var db = new Database(
        new DbContextOptionsBuilder<Database>()
            .UseSqlite(connectionString)
            .Options
    );

    var inMemory = new InMemory(db.Products.AsNoTracking().ToList());
    services.AddSingleton(inMemory);
}
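As an aside, if you’d rather not construct the DbContext by hand inside ConfigureServices, a variation (my sketch, not part of the original sample) registers the singleton with a factory so the data loads on first resolution instead of at startup:

// Defer loading until the first time InMemory is resolved from the container.
services.AddSingleton(provider =>
{
    using var scope = provider.CreateScope();
    var db = scope.ServiceProvider.GetRequiredService<Database>();
    return new InMemory(db.Products.AsNoTracking().ToList());
});

The trade-off is that the first request pays the loading cost rather than the application’s startup.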
Finally, let’s map our endpoints.
app.UseEndpoints(endpoints =>
{
    endpoints.MapGet("/", async context =>
    {
        await context.Response.WriteAsync(
            @"<html>
                <body>
                    <h1>To Query Or Cache?</h1>
                    <ul>
                        <li><a href='/api/in-memory'>In Memory</a></li>
                        <li><a href='/api/sql'>Sql</a></li>
                    </ul>
                </body>
            </html>");
    });

    endpoints.MapGet("/api/in-memory", async context =>
    {
        var random = new Random();
        var db = context.RequestServices.GetRequiredService<InMemory>();
        // Next's upper bound is exclusive, so 10_001 covers ids 1 through 10,000
        var id = random.Next(1, 10_001);
        var product = db.Products.First(p => p.Id == id);

        await context.Response.WriteAsJsonAsync(product);
    });

    endpoints.MapGet("/api/sql", async context =>
    {
        var random = new Random();
        var db = context.RequestServices.GetRequiredService<Database>();
        // Next's upper bound is exclusive, so 10_001 covers ids 1 through 10,000
        var id = random.Next(1, 10_001);
        var product = await db
            .Products
            .AsNoTracking()
            .FirstAsync(p => p.Id == id);

        await context.Response.WriteAsJsonAsync(product);
    });
});
As we can see, both the in-memory and SQL endpoints return a Product at random. The only difference is the way we retrieve the Product instances. The EF Core query will utilize the LINQ interface, and to give it the best opportunity to be performant, we’ll turn object tracking off.
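One refinement worth considering, as an aside that isn’t in the sample code: since every lookup is by Id, scanning the list with First is O(n). Building a Dictionary<int, Product> once in the InMemory constructor would make each lookup O(1). A sketch of that variation:

using System.Collections.Generic;
using System.Linq;

public class InMemory
{
    // Index built once at startup for constant-time lookups by Id.
    private readonly Dictionary<int, Product> byId;

    public InMemory(List<Product> products)
    {
        Products = products.AsReadOnly();
        byId = products.ToDictionary(p => p.Id);
    }

    public IReadOnlyList<Product> Products { get; }

    public Product FindById(int id) =>
        byId.TryGetValue(id, out var product) ? product : null;
}

With only 10,000 items the linear scan is already quick, but the dictionary keeps lookup time flat if the collection ever grows.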
Performance Considerations
When thinking about performance in an ASP.NET Core application, the two most important metrics are memory and response times. Let’s first start with memory usage.
Memory Utilization
As one might guess, storing objects in memory can come with an overhead cost. Each situation is unique, and developers should evaluate what performance profile they want out of their particular use case. In this example, let’s look at the memory allocated at startup for InMemory and its 10,000 Product instances.
Using JetBrains Rider’s Dynamic Program Analysis, we can see that our instance is around 45 megabytes.
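As a rough sanity check, 45 megabytes across 10,000 records works out to about 4.5 KB per Product, which accounts for the five string properties (the descriptions being the largest) plus object and collection overhead.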
As mentioned before, this may be fine for some folks and a deal-breaker for others. Before we make any decision, let’s look at the implications of our next performance metric, response times.
Response Times
Memory utilization is an app maintainer’s problem, and users are ultimately more concerned about response times. Let’s see how storing 10,000 records in memory fares against retrieving a single row from SQL.
Let’s start by doing a non-scientific test against our server application. Here is the unit test.
public class ApiTests
    : IClassFixture<WebApplicationFactory<Startup>>
{
    private readonly ITestOutputHelper output;
    private readonly WebApplicationFactory<Startup> factory;

    public ApiTests(ITestOutputHelper output, WebApplicationFactory<Startup> factory)
    {
        this.output = output;

        var projectDir = Directory.GetCurrentDirectory();
        var testSettings = Path.Combine(projectDir, "appsettings.json");

        this.factory = factory.WithWebHostBuilder(builder =>
        {
            builder.ConfigureAppConfiguration((context, conf) =>
            {
                conf.AddJsonFile(testSettings);
            });
        });
    }

    [Fact]
    public async Task InMemoryTest()
    {
        var client = factory.CreateClient();
        HttpResponseMessage response;

        using (new Timing("in-memory", output))
        {
            response = await client.GetAsync("/api/in-memory");
        }

        response.EnsureSuccessStatusCode();

        var product = await response.Content.ReadFromJsonAsync<Product>();
        Assert.NotEqual(0, product?.Id);
        output.WriteLine($"Product Name: {product?.Name}");
    }

    [Fact]
    public async Task SqliteTest()
    {
        var client = factory.CreateClient();
        HttpResponseMessage response;

        using (new Timing("sql", output))
        {
            response = await client.GetAsync("/api/sql");
        }

        response.EnsureSuccessStatusCode();

        var product = await response.Content.ReadFromJsonAsync<Product>();
        Assert.NotEqual(0, product?.Id);
        output.WriteLine($"Product Name: {product?.Name}");
    }

    public class Timing : IDisposable
    {
        private readonly string description;
        private readonly ITestOutputHelper output;
        private readonly Stopwatch stopwatch;

        public Timing(string description, ITestOutputHelper output)
        {
            this.description = description;
            this.output = output;
            stopwatch = Stopwatch.StartNew();
        }

        public void Dispose()
        {
            // Report the elapsed time for the block wrapped by this Timing
            stopwatch.Stop();
            output.WriteLine($"{description}: {stopwatch.Elapsed.TotalMilliseconds} ms.");
        }
    }
}
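The tests use xUnit (hence the [Fact] attributes and ITestOutputHelper) together with the Microsoft.AspNetCore.Mvc.Testing package, which provides WebApplicationFactory. They run like any other test project:

dotnet test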
When we run the SqliteTest, we see a response in 180 ms. That value is a best-case scenario, as the TestServer does not actually initiate an HTTP request over the network.

How fast does the InMemoryTest respond?

Wow, the InMemoryTest responds in 80 ms. In my experience, shaving 100 ms off a response time is a significant improvement.
Let’s get more scientific by running some load tests against each endpoint using K6. Below you’ll find the script for testing our in-memory endpoint. The SQL endpoint test is identical, apart from the URL.
import { check } from 'k6';
import http from 'k6/http';

export let options = {
    vus: 50,
    duration: '30s'
};

export default function () {
    let res = http.get('https://localhost:5001/api/in-memory');
    check(res, {
        'is status 200': (r) => r.status === 200
    });
}
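With k6 installed, each script runs from the command line against the already-running application:

k6 run Tests/sql.js
k6 run Tests/in-memory.js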
We’ll run the SQL endpoint first under our load test.
execution: local
script: Tests/sql.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
* default: 50 looping VUs for 30s (gracefulStop: 30s)
running (0m30.2s), 00/50 VUs, 71329 complete and 0 interrupted iterations
default ✓ [======================================] 50 VUs 30s
✓ is status 200
checks.....................: 100.00% ✓ 71329 ✗ 0
data_received..............: 34 MB 1.1 MB/s
data_sent..................: 8.3 MB 275 kB/s
http_req_blocked...........: avg=410.1µs min=0s med=2µs max=676.44ms p(90)=3µs p(95)=4µs
http_req_connecting........: avg=1.59µs min=0s med=0s max=3.21ms p(90)=0s p(95)=0s
http_req_duration..........: avg=20.55ms min=1.11ms med=10.24ms max=1.19s p(90)=21.78ms p(95)=119.16ms
http_req_receiving.........: avg=38.25µs min=15µs med=31µs max=7.05ms p(90)=55µs p(95)=67µs
http_req_sending...........: avg=11.48µs min=4µs med=10µs max=1.56ms p(90)=15µs p(95)=22µs
http_req_tls_handshaking...: avg=405.27µs min=0s med=0s max=672.97ms p(90)=0s p(95)=0s
http_req_waiting...........: avg=20.5ms min=1.07ms med=10.19ms max=1.19s p(90)=21.72ms p(95)=119.12ms
http_reqs..................: 71329 2364.169154/s
iteration_duration.........: avg=21.04ms min=1.17ms med=10.32ms max=1.19s p(90)=21.89ms p(95)=122.63ms
iterations.................: 71329 2364.169154/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
We can see that the average iteration_duration is 21.04 ms. The results are substantially better than our unit test numbers because we’re now averaging over tens of thousands of requests against a warmed-up API instance, rather than timing a single cold call.
Let’s run the in-memory benchmark and see what we get.
execution: local
script: Tests/in-memory.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
* default: 50 looping VUs for 30s (gracefulStop: 30s)
running (0m30.0s), 00/50 VUs, 755563 complete and 0 interrupted iterations
default ✓ [======================================] 50 VUs 30s
✓ is status 200
checks.....................: 100.00% ✓ 755563 ✗ 0
data_received..............: 358 MB 12 MB/s
data_sent..................: 92 MB 3.1 MB/s
http_req_blocked...........: avg=25.82µs min=0s med=2µs max=525.24ms p(90)=3µs p(95)=3µs
http_req_connecting........: avg=70ns min=0s med=0s max=1.5ms p(90)=0s p(95)=0s
http_req_duration..........: avg=1.85ms min=136µs med=1.58ms max=110.95ms p(90)=2.92ms p(95)=3.67ms
http_req_receiving.........: avg=39.02µs min=11µs med=22µs max=95.35ms p(90)=39µs p(95)=58µs
http_req_sending...........: avg=12.05µs min=4µs med=8µs max=95.78ms p(90)=12µs p(95)=18µs
http_req_tls_handshaking...: avg=23.33µs min=0s med=0s max=523.02ms p(90)=0s p(95)=0s
http_req_waiting...........: avg=1.8ms min=111µs med=1.54ms max=102.26ms p(90)=2.86ms p(95)=3.59ms
http_reqs..................: 755563 25158.147645/s
iteration_duration.........: avg=1.97ms min=188.08µs med=1.66ms max=527.42ms p(90)=3.04ms p(95)=3.85ms
iterations.................: 755563 25158.147645/s
vus........................: 50 min=50 max=50
vus_max....................: 50 min=50 max=50
No surprise regarding response times: we see an average iteration_duration of 1.97 ms. That’s roughly a tenth of the time it takes to read from the database, which sits on disk. The difference could grow even larger depending on network topology, the power of the database instance, and the amount of data retrieved per query. We’ll also note that the throughput of our API increased significantly, from 34 MB received to 358 MB, and from 71,329 iterations to 755,563 over the same 30 seconds. We got roughly 10x the previous throughput by removing the call to our database.
Conclusion
This proposed solution is only one of many possible approaches. Each developer’s circumstances will come with different requirements, and the developer should consider all options before deploying a solution for their client. Additionally, the needs of our clients change. In the case of our Reddit user, the data may currently be static but could become increasingly dynamic due to unforeseen circumstances.
In this post, we explored registering an in-memory collection in ASP.NET Core’s service collection and its impact on memory usage and response times. We also saw how we could get a rough idea of the relative performance of any two approaches by utilizing the WebApplicationFactory test fixture, and we did some load testing using K6 to get response-time benchmarks.
If you’re interested in seeing the complete code for this post, then head over to my GitHub repository.
I hope you enjoyed this post, and let me know in the comments what approach you’d recommend. What would you prioritize, and how would you plan for future changes?
If you have a question you’d like advice on, please send me an email or message on Twitter.