Relational databases are the undisputed kings of the database world. Their versatility, battle-hardened reliability, and ubiquity make them a comfortable choice for many developers. That doesn’t mean, as developers, we shouldn’t evaluate other databases that may be better suited to our problem. There are many database options to choose from, and with over a decade on the market, Neo4j has shown there is a real place for NoSQL graph databases.
In this post, we’ll walk through a Neo4j example where we find the shortest path between locations utilizing the default Neo4j UI followed by accessing the same data using C#.
Basics
Before we dive in, we should understand the basic vocabulary of a graph database. Graph databases have three core elements:
- Nodes
- Relationships
- Properties
A node is a particular point of interest in our graph. For example, a retail data model may express its graph with nodes that include offices, shipping warehouses, and stores. Relationships are the edges that connect our nodes. Following our example, there may be several shipping relationships between our warehouses and stores. Finally, properties can exist on both nodes and relationships. A store node may have an address, while its relationship with a shipping location may include a shipping agreement. Together, these three concepts are what we work with when we work with graphs.
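To make the three terms concrete, here is a minimal in-memory sketch in C#. The `Node` and `Relationship` classes, the `SHIPS_TO` type, and the property values are hypothetical names invented for illustration; they are not part of Neo4j or its driver, which stores these concepts natively.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: a labeled point of interest with properties.
class Node
{
    public string Label { get; }
    public Dictionary<string, object> Properties { get; } = new Dictionary<string, object>();
    public Node(string label) => Label = label;
}

// Hypothetical type for illustration: a typed, directed edge that can also carry properties.
class Relationship
{
    public Node Source { get; }
    public Node Target { get; }
    public string Type { get; }
    public Dictionary<string, object> Properties { get; } = new Dictionary<string, object>();

    public Relationship(Node source, string type, Node target)
    {
        Source = source;
        Type = type;
        Target = target;
    }
}

static class GraphVocabularyDemo
{
    // Builds one warehouse, one store, and the shipping relationship between them.
    public static Relationship Build()
    {
        var warehouse = new Node("Warehouse");
        warehouse.Properties["name"] = "Warehouse 1";

        var store = new Node("Store");
        store.Properties["address"] = "123 Example Street";

        var shipsTo = new Relationship(warehouse, "SHIPS_TO", store);
        shipsTo.Properties["agreement"] = "Standard";
        return shipsTo;
    }

    static void Main()
    {
        var r = Build();
        // Print the relationship in a Cypher-like arrow notation.
        Console.WriteLine($"({r.Source.Label})-[:{r.Type}]->({r.Target.Label})");
    }
}
```

The printed arrow notation mirrors how Cypher writes the same idea: nodes in parentheses, relationships in square brackets.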
As we may have noticed, a graph database’s vocabulary is not that dissimilar from the one used in a traditional relational database. Unlike traditional relational databases, Neo4j is built to accommodate evolving business requirements and embraces the ad-hoc nature of the real world.
"The greatest weakness of relational databases is that their schema is too inflexible." (Neo4j beginner guide)
The greatest strength of a graph database comes from our ability to model our problems directly as graphs. The impedance mismatch of tables and denormalization is gone. We can go straight from the whiteboard diagram to a data model.
We can learn more from reading Neo4j’s beginner guide.
Docker Compose Setup
It is highly recommended that we run Docker locally. It makes starting with Neo4j effortless and cuts out having to deal with the Java Development Kit (JDK). This post includes a docker-compose configuration that can get us started quickly.
version: '3.4'
services:
  neo4j:
    image: "neo4j"
    container_name: "neo4j"
    ports:
      - 7474:7474
      - 7687:7687
    restart: always
    environment:
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes"
      NEO4J_AUTH: none
Add the YAML configuration to a docker-compose.yml file and run the following from a command window.
> docker-compose up -d
We should now have the latest stable version of Neo4j running. At the time of writing, that is 4.0.1.
Once our Docker container is up and running, we can access the Neo4j UI by going to http://localhost:7474/browser/.
Problem
Let’s start by defining the problem we’re solving.
Given a set of locations, we want to find the shortest path from one to another.
We know the following criteria:
- We want to traverse from Location A to Location I.
- Nodes have a name and a relationship with at least one other node.
- Each relationship has a distance property.
Understanding our problem parameters, we can use Neo4j’s query language, Cypher, to create our scenarios.
CREATE (LocationA:Location { name: "Location A" })
CREATE (LocationB:Location { name: "Location B" })
CREATE (LocationC:Location { name: "Location C" })
CREATE (LocationD:Location { name: "Location D" })
CREATE (LocationE:Location { name: "Location E" })
CREATE (LocationF:Location { name: "Location F" })
CREATE (LocationG:Location { name: "Location G" })
CREATE (LocationH:Location { name: "Location H" })
CREATE (LocationI:Location { name: "Location I" })
CREATE
  (LocationA)-[:CONNECTED_TO { distance: 5 }]->(LocationB),
  (LocationB)-[:CONNECTED_TO { distance: 6 }]->(LocationC),
  (LocationC)-[:CONNECTED_TO { distance: 4 }]->(LocationI),
  (LocationA)-[:CONNECTED_TO { distance: 3 }]->(LocationD),
  (LocationD)-[:CONNECTED_TO { distance: 4 }]->(LocationE),
  (LocationE)-[:CONNECTED_TO { distance: 5 }]->(LocationI),
  (LocationA)-[:CONNECTED_TO { distance: 2 }]->(LocationF),
  (LocationF)-[:CONNECTED_TO { distance: 3 }]->(LocationG),
  (LocationG)-[:CONNECTED_TO { distance: 2 }]->(LocationH),
  (LocationH)-[:CONNECTED_TO { distance: 1 }]->(LocationI)
After executing the Cypher above in the Neo4j UI, we can run the following query to return all the newly created locations.
MATCH (n) RETURN n
Great! We now know how all of our locations and their relationships are set up. Let’s try to find the distance between the locations.
Solution
We want to find the shortest distance from Location A to Location I. Since all relationships are already defined, we can ask Neo4j to traverse our path and sum up the distances along the way to the destination.
MATCH (from:Location { name:"Location A" }), (to:Location { name: "Location I"}) , path = (from)-[:CONNECTED_TO*]->(to)
RETURN path AS shortestPath,
reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1
The resulting graph should look like this.
The shortest path is a total distance of 8 traversing through locations A, F, G, H, and I.
If we reverse the ordering of our query, we can get the longest path.
MATCH (from:Location { name:"Location A" }), (to:Location { name: "Location I"}) , path = (from)-[:CONNECTED_TO*]->(to)
RETURN path AS longestPath,
reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
ORDER BY totalDistance DESC
LIMIT 1
We can see the longest path has a total distance of 15 going through locations A, B, C, and I. Note that even though the shortest path has more nodes, it is still less costly to traverse it because of the total distance.
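To check these totals by hand, the following sketch walks the same graph in plain C# without touching Neo4j. It follows the idea of the queries above (enumerate every directed path from A to I, sum the distances, then sort) but it is only an illustration, not how Neo4j evaluates the query internally. The `PathEnumeration` class and `AllPaths` method are names made up for this sketch, and the locations are shortened to their letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PathEnumeration
{
    // Same directed relationships and distances as the CREATE statement in this post.
    static readonly (string From, string To, int Distance)[] Edges =
    {
        ("A", "B", 5), ("B", "C", 6), ("C", "I", 4),
        ("A", "D", 3), ("D", "E", 4), ("E", "I", 5),
        ("A", "F", 2), ("F", "G", 3), ("G", "H", 2), ("H", "I", 1),
    };

    // Depth-first walk that yields every directed path from 'current' to 'target'
    // together with its summed distance. Assumption: the graph has no cycles
    // (true for the sample data), so no cycle detection is done here.
    public static IEnumerable<(List<string> Nodes, int Total)> AllPaths(
        string current, string target, List<string> visited = null, int total = 0)
    {
        // Copy the path so far and append the current node.
        visited = new List<string>(visited ?? new List<string>()) { current };

        if (current == target && visited.Count > 1)
        {
            yield return (visited, total);
            yield break; // stop at the target; it has no outgoing edges in the sample
        }

        foreach (var edge in Edges.Where(e => e.From == current))
            foreach (var path in AllPaths(edge.To, target, visited, total + edge.Distance))
                yield return path;
    }

    static void Main()
    {
        // Ascending order puts the shortest path first and the longest last.
        foreach (var p in AllPaths("A", "I").OrderBy(p => p.Total))
            Console.WriteLine($"{string.Join(" -> ", p.Nodes)} = {p.Total}");
    }
}
```

Running it lists three paths: A, F, G, H, I with a total of 8; A, D, E, I with a total of 12; and A, B, C, I with a total of 15. Enumerating every path is fine for nine locations, but the number of paths can grow quickly on larger graphs.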
Access Neo4j From C#
It is excellent that we can use the native UI of Neo4j to explore and manipulate our data. But before we know it, we’ll need to access our data from code. In this case, we’ll use the official Neo4j driver for C#.
> dotnet add package Neo4j.Driver
Our code requires we create a driver instance, and then utilize sessions to execute our Cypher queries.
using System;
using System.Threading.Tasks;
using Neo4j.Driver;

namespace GraphDistance
{
    class Program
    {
        static async Task Main()
        {
            using var driver = GraphDatabase.Driver(
                "neo4j://localhost:7687",
                AuthTokens.None
            );
            var session = driver.AsyncSession(
                db => db.WithDatabase("neo4j")
            );

            // Let's Query All Our Nodes
            var namesQuery = "MATCH (n) RETURN n.name AS name";
            var cursor = await session.RunAsync(namesQuery);
            var all =
                await cursor.ToListAsync(x => x["name"].As<string>());
            foreach (var name in all)
            {
                Console.WriteLine(name);
            }

            // Let's Get Our Shortest Distance
            var shortestQuery =
                @"MATCH (from:Location { name:""Location A"" }), (to:Location { name: ""Location I""}) , path = (from)-[:CONNECTED_TO*]->(to)
                RETURN path AS shortestPath,
                reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
                ORDER BY totalDistance ASC
                LIMIT 1";
            cursor = await session.RunAsync(shortestQuery);
            var distance =
                await cursor.SingleAsync(x => x["totalDistance"].As<int>());
            Console.WriteLine($"\nThe shortest path's distance is {distance} units.");

            await session.CloseAsync();
        }
    }
}
When we execute our sample, we get the following output.
Location A
Location B
Location C
Location D
Location E
Location F
Location G
Location H
Location I
The shortest path's distance is 8 units.
Cool! We were able to interact with our Neo4j instance from C# with relative ease.
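In real code, the start and end locations usually come from user input rather than being hard-coded in the query text. A hedged sketch of the same shortest-distance query using Cypher parameters follows. The parameter names `$fromName` and `$toName` and the `ParameterizedShortestPath` class are choices made for this example; it also assumes the same local, authentication-free Neo4j instance and sample data used in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Neo4j.Driver;

static class ParameterizedShortestPath
{
    // $fromName and $toName are Cypher parameters; the names are arbitrary for this sketch.
    public const string Query = @"
        MATCH (from:Location { name: $fromName }), (to:Location { name: $toName }),
              path = (from)-[:CONNECTED_TO*]->(to)
        RETURN reduce(distance = 0, r IN relationships(path) | distance + r.distance) AS totalDistance
        ORDER BY totalDistance ASC
        LIMIT 1";

    // Builds the parameter dictionary that is sent alongside the query text.
    public static Dictionary<string, object> Parameters(string fromName, string toName) =>
        new Dictionary<string, object> { ["fromName"] = fromName, ["toName"] = toName };

    static async Task Main()
    {
        // Assumes the docker-compose setup from this post (no authentication).
        using var driver = GraphDatabase.Driver("neo4j://localhost:7687", AuthTokens.None);
        var session = driver.AsyncSession(db => db.WithDatabase("neo4j"));
        try
        {
            var cursor = await session.RunAsync(Query, Parameters("Location A", "Location I"));
            var distance = await cursor.SingleAsync(x => x["totalDistance"].As<int>());
            Console.WriteLine($"The shortest path's distance is {distance} units.");
        }
        finally
        {
            // Close the session even if the query throws.
            await session.CloseAsync();
        }
    }
}
```

Passing values as parameters keeps them out of the query string, which avoids quoting problems and lets the same query text be reused for any pair of locations.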
Conclusion
We’ve only scratched the surface of what a graph database is capable of doing. We have seen that it fits intuitively into many business domains and can help solve otherwise complex problems. Cypher is a powerful query language, and as the example in this post shows, we can express complex requests with minimal syntax. Not only is Neo4j useful as a stand-alone technology, but it also integrates with some of our favorite languages.
I hope you found this post helpful, and please leave a comment below.