Sometimes you need to break a large number of items into smaller chunks. I regularly chunk data when bulk importing records into a database, and I find that fine-tuning the chunk size can improve data-loading performance compared to forcing all of the data in at once. In this short yet fun post, I will write a helper type that takes any total count and chunk size and produces a data structure that you can use in loops to load data.
The Use Case
As I mentioned in the intro, we sometimes need to produce a specific count of records, processed in reasonably-sized chunks. Here is what I want the consuming code to look like:
var totalItems = 10_000;
var chunkSize = 234;

var chunks = new Chunks(totalItems, chunkSize);

Console.WriteLine($"Max Chunk Size: {chunks.MaxSize}\n" +
                  $"Min Chunk Size: {chunks.MinSize}\n" +
                  $"Total Chunks : {chunks.TotalChunks}");

// total items: 10_000 and chunk size is 234
foreach (var (index, count) in chunks)
{
    // index: this chunk's position (0 through TotalChunks - 1)
    // count: the number of items in this chunk
    // do something here
}
My most common use case for this is with Entity Framework Core and Bogus (a data generator), utilizing both libraries to load fake data into a database. Running the code above, I will get properly-sized chunks produced by my Chunks record.
Max Chunk Size: 234
Min Chunk Size: 172
Total Chunks : 43
If we do the math, we’ll see that (42 * 234) + 172 = 10,000. Woot! What does the solution look like for the Chunks record?
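Before we get to the implementation, here's a quick sanity check (a sketch reusing the chunks instance from the snippet above) that the chunk counts really do add up to the original total:

// every item lands in exactly one chunk, so the counts sum to the total
Console.WriteLine(chunks.Sum(chunk => chunk.Count)); // 10000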
The Solution
The solution is straightforward when we use the Chunk method recently added to System.Linq, combined with a couple of record types that make everything easier for the consuming code.
using System.Collections;

public record Chunks : IEnumerable<Chunks.Chunk>
{
    public Chunks(int totalCount, int chunkSize)
    {
        // Chunk splits the range into groups of at most chunkSize;
        // the final group holds whatever items remain.
        Items = Enumerable
            .Range(0, totalCount)
            .Chunk(chunkSize)
            .Select((value, index) => new Chunk(index, value.Length))
            .ToList();
    }

    private List<Chunk> Items { get; }

    public int MaxSize => Items.Max(x => x.Count);
    public int MinSize => Items.Min(x => x.Count);
    public int TotalChunks => Items.Count;

    public IEnumerator<Chunk> GetEnumerator()
        => Items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();

    public record Chunk(int Index, int Count);
}
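If the Chunk method is new to you, here's a quick standalone look at what it does (a minimal sketch; Chunk shipped with .NET 6, so you'll need that version or later):

// Chunk splits a sequence into arrays of at most the given size;
// the final array holds whatever remains.
var chunked = Enumerable.Range(0, 10).Chunk(3);

foreach (var chunk in chunked)
{
    Console.WriteLine($"[{string.Join(", ", chunk)}]");
}

// [0, 1, 2]
// [3, 4, 5]
// [6, 7, 8]
// [9]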
Let’s combine the above code with Bogus to generate some Person records.
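The snippets here never show the Person entity itself, so here's a minimal sketch of what it might look like; the property names match the Bogus rules below, but the exact shape is my assumption:

// A minimal sketch of the Person entity (an assumption; the original
// snippets don't define it). Property names match the Bogus rules below.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
    public string Hobby { get; set; } = string.Empty;
    public int Age { get; set; }
}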
var generator = new Faker<Person>()
    //.RuleFor(m => m.Id, (f, _) => f.IndexFaker)
    .RuleFor(m => m.Name, f => f.Name.FullName())
    .RuleFor(m => m.Hobby, f => f.Commerce.Department())
    .RuleFor(m => m.Age, f => f.Finance.Random.Number(16, 89));

var chunks = new Chunks(1_000, 100);

logger.LogInformation("{ChunkCount} of Chunks To Initialize", chunks.TotalChunks);

foreach (var (index, count) in chunks)
{
    logger.LogInformation("#{Index}: Generating {Count} rows of People", index, count);

    var records = generator.Generate(count);
    database.People.AddRange(records);
    database.SaveChanges();

    // reset the change tracker so tracked entities don't pile up between chunks
    database.ChangeTracker.Clear();
}
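A quick note on the loop: calling SaveChanges and then ChangeTracker.Clear after each chunk keeps EF Core's change tracker small. Without the clear, tracked entities accumulate across iterations and slow down every subsequent save, which would defeat the purpose of chunking in the first place.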
As you can see, we can also take advantage of record deconstruction to break down a Chunk instance into its parts, Index and Count.
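Deconstruction also works outside of a foreach; here's a minimal illustration using the Chunk record from the solution:

// positional records get a generated Deconstruct method for free
var (index, count) = new Chunks.Chunk(0, 234);
Console.WriteLine($"chunk {index} holds {count} items"); // chunk 0 holds 234 items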
The benefit of this approach is that it makes your code much easier to read, since the instance of Chunks you get back contains all the metadata you may need when working with chunked data.
I hope you found this post a fun, quick read, and let me know if you use it in your projects by sending me a tweet on Twitter at @buhakmeh. As always, thanks for reading.