As a developer, I want to build the best software my users can have. More often than not, a great search experience is at the forefront of that user experience. In this post, I’ll show you how to build a search experience within your ASP.NET Core applications. After reading this post, you should be able to help your users find the information they need, even when they aren’t sure what they want.
While there are great third-party search providers, I believe understanding how search works will give you the knowledge necessary to fine-tune your user experience. This post will guide you through the thought process of developing a search experience. By the end, you should feel confident in making the best choice for your search needs, even if that choice is to trust a third-party provider.
Goals
I am a well-traveled human being, thanks to my lovely wife Nicole, and I often get asked where folks should visit. In this post, let's focus on building a search experience around the capital cities of the world.
Given a few characters, we should be able to locate a capital city anywhere on the planet and display a map around that location. Our main goal is to show relevant city locations in our UI for the user to review. That is our goal for the users, but what about our technical goals?
We want to load data from our CSV dataset into our search engine. Once the data is loaded, we need to index the data to be searchable. Finally, we need to provide a simple UI to empower our users. Our goals as developers are pretty straightforward; let's have a sneak peek at the final user experience.
Technology Stack
As you may have guessed from the title of this post, we are going to be primarily working with ASP.NET Core. All code samples from this point on will be in C# and potentially be utilizing C# 8 features. I have listed the requirements for this project below. If you want to follow along, please be sure that your local development environment has the following dependencies.
- ASP.NET Core 2.2+: Download the latest from https://dot.net
- Elasticsearch 7.3.1: Download the most recent from https://www.elastic.co/ or use your favorite package manager to install it.
- Kibana 7.3.1: Kibana makes life easier. While Elasticsearch doesn’t require a UI and you can interact with the service via its REST API, I highly recommend installing Kibana. (If you prefer containers, a Docker option is sketched after this list.)
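If you would rather not install Elasticsearch and Kibana directly, a minimal Docker setup for local development might look like the following. This is a sketch for local use only (single-node mode, no production hardening), and host.docker.internal assumes Docker Desktop:

docker run -d --name elasticsearch -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.3.1

docker run -d --name kibana -p 5601:5601 \
  -e "ELASTICSEARCH_HOSTS=http://host.docker.internal:9200" \
  docker.elastic.co/kibana/kibana:7.3.1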
We’ll also need NuGet packages to accomplish our development tasks.
- NEST: NEST is the best way to interact with Elasticsearch from C#.
- CsvHelper: We will be loading our dataset from a local CSV file, and CsvHelper is the best package I’ve found for making this possible.
If you want to bypass the work of writing the demo and play with the final code, you are welcome to clone it from the public GitHub repository I have provided.
Basic Overview Of Elasticsearch
I think of Elasticsearch as the best way to provide a full-text search experience to users, and it can be one of the most powerful tools in a modern developer's repertoire. You may also hear this technology described as part of the ELK (Elasticsearch, Logstash, and Kibana) stack. Elastic pitches its technology as follows:
Reliably and securely take data from any source, in any format, then search, analyze, and visualize it in real-time.–Elastic
As a developer, you’ll need to understand the essential parts of Elasticsearch to get the best search experience. Let’s break down the parts you need to think about and what you’ll be seeing in the upcoming code samples.
What Is An Elasticsearch Index
Elasticsearch contains many internal data repositories. Each repository is known as an index. These indexes contain data that is processed and stored in a manner that makes for efficient searches. As developers, we create these indexes and refer to them by name. In this post, we'll be creating an index aptly named capitals.
Analyzing Data In Elasticsearch
I mentioned in the previous section that data in an Elasticsearch index is “processed.” Processing data in Elasticsearch is known as analyzing. Analysis involves looking at data and deciding how to separate parts of it into tokens.
Let’s take a look at parts of our index we will be creating. The first is a geolocation property. We want to let our index know this field contains geo-coordinates.
"location" : {
"type" : "geo_point"
}
Another interesting example in our index will be telling the index to analyze a field using our custom autocomplete analyzer. Don't worry about the specifics right now; the details of our analyzer will be explained later in the post.
"names" : {
"type" : "text",
"analyzer" : "autocomplete",
"search_analyzer" : "autocomplete_search"
}
Analyzing differently during index time and search time is one of my favorite features of Elasticsearch. It allows you to reduce the amount of work necessary to tokenize search terms. You can also test any analyzer using Kibana.
GET capitals/_analyze
{
  "analyzer": "autocomplete",
  "text": "united kingdom"
}
Here are the results of our custom analyzer.
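With the edge n-gram settings we define later in this post (letters only, lengths one through twenty), every prefix of each word becomes its own token. Abbreviated to just the token values, the response contains:

u
un
uni
unit
unite
united
k
ki
kin
king
kingd
kingdo
kingdom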
ASP.NET Core Setup
As mentioned above, we’ll be using ASP.NET Core Razor Pages to provide a simple search experience. Our application will take a user’s input and send it to the server, which will then communicate with Elasticsearch. Any matches from our search will be displayed to the user.
Note, if you want to run this sample locally, please clone the repo from my GitHub account.
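The repository handles service registration in Startup. A sketch of what that registration might look like, assuming a local Elasticsearch node on the default port (the lifetimes and URL here are my assumptions; the repo is the source of truth):

public void ConfigureServices(IServiceCollection services)
{
    // a single shared ElasticClient is the recommended NEST usage;
    // it is thread-safe and caches serialization metadata
    var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
        .DefaultIndex(CapitalCities.IndexName);
    services.AddSingleton(new ElasticClient(settings));

    // the startup task that loads the CSV into the index
    services.AddScoped<CapitalCities>();

    services.AddMvc().SetCompatibilityVersion(CompatibilityVersion.Version_2_2);
}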
Loading a CSV into An Elasticsearch Index
Before we can analyze any data, we need to read data from a source. In this example, we will load our capital cities from a comma-separated values file. Below is an example.
city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
Pristina,Pristina,42.6666,21.1724,Kosovo,XK,XKS,Prishtinë,primary,,1901760068
Longyearbyen,Longyearbyen,78.2167,15.6333,Svalbard,XR,XSV,,primary,,1930654114
ASP.NET Core has made it easier to write startup tasks directly in our Program.cs file. In this case, I wrote a small service that will read our file and bulk insert the data into our index.
public static async Task Main(string[] args)
{
    // setup host
    var host = CreateWebHostBuilder(args).Build();

    // load records from Csv to Elasticsearch
    using (var scope = host.Services.CreateScope())
    {
        var loader = scope.ServiceProvider.GetRequiredService<CapitalCities>();
        await loader.RunAsync();
    }

    // change our run to async
    await host.RunAsync();
}
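The CreateWebHostBuilder call above is the stock ASP.NET Core 2.2 template method, included here for completeness:

public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        .UseStartup<Startup>();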
The implementation of CapitalCities reads the data, creates the index definition, and bulk inserts the data.
public class CapitalCities
{
    public const string IndexName = "capitals";
    private ElasticClient client;

    public CapitalCities(ElasticClient client)
    {
        this.client = client;
    }

    public async Task RunAsync()
    {
        // if the index exists, let's delete it
        // you probably don't want to do this kind of
        // index management in a production environment
        var index = await client.Indices.ExistsAsync(IndexName);
        if (index.Exists)
        {
            await client.Indices.DeleteAsync(IndexName);
        }

        // let's create the index
        var createResult =
            await client.Indices.CreateAsync(IndexName, c => c
                .Settings(s => s
                    .Analysis(a => a
                        // our custom search analyzer
                        .AddSearchAnalyzer()
                    )
                )
                .Map<CapitalSearchDocument>(m => m.AutoMap())
            );

        // let's load the data
        var file = File.Open("capital_cities.csv", FileMode.Open);
        using (var csv = new CsvReader(new StreamReader(file)))
        {
            // describes the csv file's columns
            csv.Configuration.RegisterClassMap<CapitalCitiesMapping>();

            var records = csv
                .GetRecords<CapitalCityRecord>()
                .Select(record => new CapitalSearchDocument(record))
                .ToList();

            // we are pushing all the data in at once
            var bulkResult =
                await client
                    .BulkAsync(b => b
                        .Index(IndexName)
                        .CreateMany(records)
                    );
        }
    }
}
Note that the service deletes the existing index, then recreates it every time. Recreating indexes is acceptable for small datasets, but I would recommend evaluating index management for your specific needs.
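The post doesn't show CapitalCityRecord or CapitalCitiesMapping, but given the CSV header above, they might look roughly like this. The property names are inferred from the rest of the post, and I only map the columns the demo actually uses; the repository has the authoritative versions.

public class CapitalCityRecord
{
    public string Id { get; set; }
    public string City { get; set; }
    public string CityAscii { get; set; }
    public string Country { get; set; }
    public decimal Latitude { get; set; }
    public decimal Longitude { get; set; }
}

// maps CSV columns to record properties by header name
public sealed class CapitalCitiesMapping : ClassMap<CapitalCityRecord>
{
    public CapitalCitiesMapping()
    {
        Map(m => m.Id).Name("id");
        Map(m => m.City).Name("city");
        Map(m => m.CityAscii).Name("city_ascii");
        Map(m => m.Country).Name("country");
        Map(m => m.Latitude).Name("lat");
        Map(m => m.Longitude).Name("lng");
    }
}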
The Index Definition
If you looked through the code of CapitalCities, you might have seen a call to AddSearchAnalyzer. This method is an encapsulation of our search analyzer.
public static class Indices
{
    public const string IndexAnalyzerName = "autocomplete";
    public const string SearchAnalyzerName = "autocomplete_search";

    /// <summary>
    /// I've moved this into an extension method
    /// for reuse and a clearer understanding of the
    /// custom analyzer we are writing
    /// </summary>
    /// <param name="analysis"></param>
    /// <returns></returns>
    public static IAnalysis AddSearchAnalyzer(this AnalysisDescriptor analysis)
    {
        const string lowercase = nameof(lowercase);

        // https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
        // names aren't really important, they are just keys
        return
            analysis
                .Analyzers(a => a
                    .Custom(IndexAnalyzerName, c => c
                        .Tokenizer(IndexAnalyzerName)
                        .Filters(lowercase)
                    )
                    .Custom(SearchAnalyzerName, c => c
                        .Tokenizer(lowercase)
                    )
                )
                .Tokenizers(t => t
                    .EdgeNGram(IndexAnalyzerName, e => e
                        .MinGram(1)
                        .MaxGram(20)
                        .TokenChars(TokenChar.Letter)
                    )
                );
    }
}
The EdgeNGram tokenizer is the most important part of our autocomplete analyzer. The tokenizer sweeps from left to right, adding one character at a time to produce the next token. Let's look at an example of the word london. The tokens created would be as follows:
l
lo
lon
lond
londo
london
The tokenization allows our users to search on fragments efficiently. Each token created will point back to the document we index. Speaking of which, let’s take a look at the record we are indexing.
The Capital City Search Document
I highly recommend thinking about how users may want to search your data. Understanding what a user may or may not do allows you to preprocess your data in ways that make for better experiences.
In our search example, a user may know the capital city name, or they may know the country name. It makes sense to analyze both pieces of data. Let’s look at some preprocessing in our class.
public class CapitalSearchDocument
{
    public CapitalSearchDocument()
    {
    }

    public CapitalSearchDocument(CapitalCityRecord record)
    {
        Id = record.Id;

        // we want to do some work in setting
        // up the values that will be analyzed,
        // thinking about what the user might
        // type into our search input
        Names = new[]
            {
                record.City,
                record.CityAscii,
                record.Country,
            }
            .Union(record.CityAscii.Split(' '))
            .Union(record.Country.Split(' '))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        City = record.City;
        Country = record.Country;

        // Elasticsearch supports GeoPoints as arrays,
        // in [longitude, latitude] order
        Location = new[] { record.Longitude, record.Latitude };
        Data = record;
    }

    public string Id { get; set; }

    // We want to index the many variations
    // of a capital city, so we store the strings
    // in an array.
    //
    // We also want to index and search differently
    [Text(
        Analyzer = Indices.IndexAnalyzerName,
        SearchAnalyzer = Indices.SearchAnalyzerName
    )]
    public string[] Names { get; set; }

    // we want to filter by country
    [Keyword]
    public string Country { get; set; }

    [Keyword]
    public string City { get; set; }

    // the untouched source record, stored but not indexed
    [Object(Enabled = false)]
    public CapitalCityRecord Data { get; set; }

    // store location
    [GeoPoint]
    public decimal[] Location { get; set; }
}
Elasticsearch supports array fields, and I highly recommend using them. They let you create variations of your data and give users several avenues into their search criteria. In the CapitalSearchDocument, the Names property is a preprocessed set of values built from the city name, the parts of the city name, and the country name. For a city like London, that yields London, United Kingdom, United, and Kingdom as searchable values. These variations give our users a broad target when searching the data set.
Data Objects In Our Index
You may notice that the index has a CapitalCityRecord property named Data. This property holds an unmanipulated version of the data, kept separate from the information I want to index. Note the use of the Object attribute: I am telling Elasticsearch to store this property as an object, and the Enabled flag set to false tells Elasticsearch not to index any of its child properties. I highly recommend this practice, as it makes thinking about search and thinking about display two different exercises.
Searching In ASP.NET Core Razor Pages
At this point, we should have an analyzed index full of capital cities. We need to expose a UI that lets us enter a search term and return the results to our users. For the sake of brevity, I will show you the most important part of the Razor page, the OnGet method.
public void OnGet()
{
    if (!string.IsNullOrWhiteSpace(Term))
    {
        Search =
            client.Search<CapitalSearchDocument>(s => s
                .Query(q => q
                    .Match(m => m
                        .Field(f => f.Names)
                        .Query(Term)
                        .Fuzziness(Fuzziness.EditDistance(1))
                    )
                )
                .Take(10)
            );
    }
}
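For context, here is roughly what the surrounding page model might look like. The GET binding of Term and the exact property shapes are my assumptions; check the repository for the real implementation.

public class IndexModel : PageModel
{
    private readonly ElasticClient client;

    public IndexModel(ElasticClient client) => this.client = client;

    // bound from the query string, e.g. /?term=lond
    [BindProperty(SupportsGet = true)]
    public string Term { get; set; }

    // the Razor markup renders a card for each document in Search.Documents
    public ISearchResponse<CapitalSearchDocument> Search { get; set; }

    public void OnGet() { /* as shown above */ }
}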
Given a search term, we use NEST to send a query to our capitals index. The query above targets our Names field, which was analyzed at index time with our autocomplete analyzer; the search term itself is analyzed with the autocomplete_search analyzer, which reduces the number of tokens generated from the input. I have limited the results to a count of 10, but you may want to change that for your use case.
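For the curious, the NEST query above serializes to roughly the following JSON on the wire (shown here with a hypothetical search term of lond; NEST camel-cases property names by default):

GET capitals/_search
{
  "size": 10,
  "query": {
    "match": {
      "names": {
        "query": "lond",
        "fuzziness": 1
      }
    }
  }
}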
Fuzzy Searching Our Records
You may have noticed the use of Fuzziness in our search. Fuzziness is a trick to help our users get relevant results even when their input isn't exact. In our case, we use an edit distance fuzziness of 1. The Elasticsearch documentation describes edit distance as:
An edit distance is the number of one-character changes needed to turn one term into another. –Elasticsearch
By adding this feature, we can help folks who may be prone to misspellings. A user who types lndon, for instance, is still only one edit away from london and will see the result they intended.
The Results
The UI utilizes Bootstrap to create a card layout for every search hit. You can see the results of our UI below.
I also ended up using MapQuest's Static Map API, feeding it the latitude and longitude values to show an informative satellite map.
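Building the map image is just string formatting. Here is a hypothetical sketch; the endpoint and parameter names follow MapQuest's Static Map v5 documentation, and apiKey is a key you would request from MapQuest yourself:

// a hypothetical helper; confirm parameter names against MapQuest's docs
string MapUrl(CapitalSearchDocument doc, string apiKey) =>
    "https://www.mapquestapi.com/staticmap/v5/map" +
    $"?key={apiKey}" +
    $"&center={doc.Data.Latitude},{doc.Data.Longitude}" +
    "&type=sat&zoom=12&size=600,400";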
Conclusion
I had a lot of fun building this demo. ASP.NET Core's use of IWebHostBuilder makes running start-up tasks simple. The .NET OSS community has a goldmine of helpful resources, and Elasticsearch is a superpowered asset to any project. I recommend downloading the project from GitHub and playing around with it. It's small, but I feel it's a good representation of what it takes to put a search feature in your applications.
If you have any questions, please don’t hesitate to reach out to me on Twitter (@buhakmeh) or through this site.