I’ve recently been scanning the C# subreddits to better understand my fellow C# developers. The questions come from a range of individuals trying to solve an array of issues. The topics that interest me most are approach-style questions, which generally start with “How would I…”. These are open-ended questions with an infinite set of possible outcomes, which makes them fun to explore.
In this post, we’ll explore the question of how to store, read, and return generally static data from a web API.
Let’s get started!
The Original Question
Reddit user numgrippss asks the following question:
I have a client and they want a new feature. There are currently multiple API’s we can use, but it’s too expensive (like 2k a year). He asked me to create the API, which is very easy, because once a year, a few new rows should be added. The problem is how should I insert a lot of data in the database (like 10k rows of static data). –numgrippss
Let’s break down some concerns our developer has here:
- Cost is a significant consideration; the developer specifically calls the existing APIs “too expensive.”
- The data is accessed via API.
- Data is relatively small, with 10,000 records.
- Data changes infrequently, only once a year.
- Data changes are minimal: just a few rows.
So what would I do in this scenario?
The Recommendation
Generally, when dealing with a low record count that is relatively static, I suggest developers think about utilizing in-memory collections. The source we use to load the data into memory is almost immaterial in this case: we can load from a flat-file format such as comma-separated values (CSV) or JSON, or even from a database. Storing data in human-readable files alongside code offers the potential for source-control management, which brings versioning and change tracking, features that normally have to be planned for in a traditional RDBMS.
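As a quick illustration of the flat-file option, here’s a minimal sketch that reads a JSON file into a list once at startup. The `Product` shape and the `products.json` file name are placeholder assumptions for this sketch, separate from the entity we build later in the walkthrough:

```csharp
// A minimal sketch: deserialize a JSON file into memory once, then reuse the list.
using System.Text.Json;

public record Product(int Id, string Name, decimal Price);

public static class StaticData
{
    public static IReadOnlyList<Product> LoadProducts(string path = "products.json")
    {
        var json = File.ReadAllText(path);
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<List<Product>>(json, options) ?? new();
    }
}
```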
Users of the API will also see a substantial speed benefit, since no network operations are needed to retrieve the data.
We’ll walk through the scenario described above, build a sample that shows the benefits of in-memory collections, and discuss the possible downsides of this approach.
Testing Our Solutions
I’ve set up an ASP.NET Core project with three endpoints:
- Root path
- In Memory response
- SQL response
In this example, we’ll use Entity Framework Core and SQLite to power the SQL endpoint. For our in-memory endpoint, we’ll load the SQLite table’s contents into memory once, at application startup. For the sake of fairness, we’ll use the same entities for both endpoints.
The DbContext
Since we’ll be storing our data in an SQLite database, we’ll need a `DbContext` class. We’ll also use EF Core to seed 10,000 records as part of our initial migration. I’m using the Bogus library to randomly generate the `Product` instances.
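Something like the following sketch; the exact `Product` properties and the Bogus rules here are illustrative assumptions rather than a prescribed schema:

```csharp
// A sketch of the DbContext; EF Core's HasData bakes the seeded rows into the migration.
using Bogus;
using Microsoft.EntityFrameworkCore;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
    public decimal Price { get; set; }
}

public class ProductContext : DbContext
{
    public ProductContext(DbContextOptions<ProductContext> options)
        : base(options) { }

    public DbSet<Product> Products => Set<Product>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // A fixed seed keeps the generated data stable across migration runs
        Randomizer.Seed = new Random(1);

        var id = 1;
        var faker = new Faker<Product>()
            .RuleFor(p => p.Id, _ => id++) // HasData requires explicit keys
            .RuleFor(p => p.Name, f => f.Commerce.ProductName())
            .RuleFor(p => p.Price, f => f.Random.Decimal(1m, 100m));

        // Seed 10,000 rows as part of the model, captured by the initial migration
        modelBuilder.Entity<Product>().HasData(faker.Generate(10_000));
    }
}
```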
Interestingly, EF Core generates 10,000 records at the time we create our migration. That means the values themselves get stored in our migration file. Here’s a peek at the initial migration for this database.
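The generated file looks roughly like this (truncated and illustrative; the actual names and prices come from Bogus):

```csharp
// Illustrative excerpt of the generated initial migration
using Microsoft.EntityFrameworkCore.Migrations;

public partial class InitialCreate : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "Products",
            columns: table => new
            {
                Id = table.Column<int>(type: "INTEGER", nullable: false)
                    .Annotation("Sqlite:Autoincrement", true),
                Name = table.Column<string>(type: "TEXT", nullable: false),
                Price = table.Column<decimal>(type: "TEXT", nullable: false)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Products", x => x.Id);
            });

        migrationBuilder.InsertData(
            table: "Products",
            columns: new[] { "Id", "Name", "Price" },
            values: new object[] { 1, "Handcrafted Granite Chair", 42.87m });

        // ...followed by thousands more InsertData calls, one per seeded row
    }
}
```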
While we’re here, let’s create an `InMemory` wrapper to hold the same data in memory.
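A sketch of that wrapper (the random-selection method is an assumption of this sketch):

```csharp
using Microsoft.EntityFrameworkCore;

// Loads every Product into memory once, then serves random instances
// without ever touching the database again.
public class InMemory
{
    private readonly List<Product> products;
    private readonly Random random = new();

    public InMemory(ProductContext context) =>
        products = context.Products.AsNoTracking().ToList();

    public Product GetRandomProduct() =>
        products[random.Next(products.Count)];
}
```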
Since we’re dealing with an ASP.NET Core application, we’ll need to configure our services. We’ll pay the cost of loading our data into memory at application startup and add a singleton instance to ASP.NET Core’s services collection.
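Here’s what that might look like with the minimal hosting model; the connection string and the trick of resolving the singleton before the app starts serving are assumptions of this sketch:

```csharp
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDbContext<ProductContext>(
    options => options.UseSqlite("Data Source=products.db"));

// The factory runs once; we force that first resolve below so the load
// cost is paid at startup rather than on the first request.
builder.Services.AddSingleton(provider =>
{
    using var scope = provider.CreateScope();
    var context = scope.ServiceProvider.GetRequiredService<ProductContext>();
    return new InMemory(context);
});

var app = builder.Build();

// Resolve the singleton now, loading all 10,000 rows into memory up front
_ = app.Services.GetRequiredService<InMemory>();
```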
Finally, let’s map our endpoints.
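A sketch of the mapping; the route names and the random-row strategy for the SQL endpoint are assumptions:

```csharp
app.MapGet("/", () => "Hello World!");

// Serve a random Product straight from the in-memory singleton
app.MapGet("/in-memory", (InMemory inMemory) => inMemory.GetRandomProduct());

// Serve a random Product from SQLite via EF Core, with change tracking off
app.MapGet("/sql", async (ProductContext context) =>
{
    var count = await context.Products.CountAsync();
    var skip = Random.Shared.Next(count);
    return await context.Products
        .AsNoTracking()            // read-only query, so skip change tracking
        .OrderBy(p => p.Id)        // a stable order makes Skip deterministic
        .Skip(skip)
        .FirstAsync();
});

app.Run();
```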
As we can see, both the in-memory and SQL endpoints return a `Product` at random. The only difference is the way we retrieve the `Product` instances. The EF Core query will utilize the LINQ interface, and to give it the best opportunity to be performant, we’ll turn object tracking off.
Performance Considerations
When thinking about performance in an ASP.NET Core application, the two most important metrics are memory usage and response time. Let’s start with memory usage.
Memory Utilization
As one might guess, storing objects in memory can come with an overhead cost. Each situation is unique, and developers should evaluate what performance profile they want out of their particular use case. In this example, let’s look at the memory allocated at startup for `InMemory` and its 10,000 `Product` instances.
Using JetBrains Rider’s Dynamic Program Analysis, we can see that our instance is around 45 megabytes.
As mentioned before, this may be fine for some folks and a deal-breaker for others. Before we make any decision, let’s look at the implications of our next performance metric, response times.
Response Times
Memory utilization is an app maintainer’s problem; users are ultimately more concerned about response times. Let’s see how storing 10,000 records in memory fares against retrieving a single row from SQL.
Let’s start by doing a non-scientific test against our server application. Here is the unit test.
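Something along these lines (the xUnit setup, route paths, and `Stopwatch` timing are assumptions of this sketch; `WebApplicationFactory<Program>` also assumes a `public partial class Program` declaration in the API project):

```csharp
using System.Diagnostics;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;
using Xunit.Abstractions;

public class EndpointTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> factory;
    private readonly ITestOutputHelper output;

    public EndpointTests(WebApplicationFactory<Program> factory, ITestOutputHelper output)
    {
        this.factory = factory;
        this.output = output;
    }

    [Fact]
    public async Task SqliteTest()
    {
        var client = factory.CreateClient();

        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync("/sql");
        stopwatch.Stop();

        response.EnsureSuccessStatusCode();
        output.WriteLine($"/sql responded in {stopwatch.ElapsedMilliseconds} ms");
    }

    [Fact]
    public async Task InMemoryTest()
    {
        var client = factory.CreateClient();

        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync("/in-memory");
        stopwatch.Stop();

        response.EnsureSuccessStatusCode();
        output.WriteLine($"/in-memory responded in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```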
When we run the `SqliteTest`, we see a response in 180 ms. That value is a best-case scenario, as the `TestServer` does not actually initiate a network-level HTTP request.
How fast does the `InMemoryTest` respond?
Wow, the `InMemoryTest` responds in 80 ms. In my experience, a 100 ms improvement is significant.
Let’s get more scientific by running some load tests against each endpoint using K6. Below you’ll find the script for testing our in-memory endpoint. The SQL endpoint test is identical.
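Here’s a representative sketch of that script; the URL, virtual-user count, and duration are assumptions:

```javascript
// k6 load test for the in-memory endpoint; swap the path to "/sql"
// for the database version.
import http from 'k6/http';

export const options = {
  vus: 50,          // 50 concurrent virtual users
  duration: '30s',  // sustain the load for 30 seconds
};

export default function () {
  http.get('https://localhost:5001/in-memory');
}
```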
We’ll run the SQL endpoint first under our load test.
We can see that the average `iteration_duration` is 21.04 ms. The results are substantially better than our unit tests, as we’re averaging over a much larger sample of requests against a warmed-up API instance.
Let’s run the in-memory benchmark and see what we get.
Not a surprise regarding response times: we see an average `iteration_duration` of 1.97 ms. That’s about a tenth of the time it takes to read from the database sitting on disk. The difference could grow even larger depending on network topology, the power of the database instance, and the amount of data retrieved per query. We’ll also note that the throughput of our API increased significantly, from 34 MB received to 358 MB received. We got roughly 10x the previous throughput simply by removing the call to our database.
Conclusion
This proposed solution is only one of potentially infinite solutions out there. Each developer’s circumstance will come with different requirements, and the developer should consider all options when deploying the solution for their client. Additionally, the needs of our clients change. In the case of our Reddit user, the data may currently be static but could become increasingly dynamic due to unforeseen circumstances.
In this post, we explored registering an in-memory collection in ASP.NET Core’s service collection and its impact on memory and response-time performance. We also saw how to get a rough idea of the relative performance of any two approaches by utilizing the `WebApplicationFactory` test fixture, and we ran load tests with K6 to get response-time benchmarks.
If you’re interested in seeing the complete code for this post, then head over to my GitHub repository.
I hope you enjoyed this post. Let me know in the comments what approach you’d recommend: what would you prioritize, and how would you plan for future changes?
If you have a question you’d like advice on, please send me an email or message on Twitter.