Use Neo4j To Find The Shortest Path

Relational databases are the undisputed king of database paradigms. Their versatility, battle-hardened reliability, and ubiquity make them a comfortable choice for many developers. That doesn’t mean we, as developers, shouldn’t evaluate other databases that may be better suited to our problem. There are many options to choose from, and with over a decade on the market, Neo4j has shown there is a real place for NoSQL graph databases.

In this post, we’ll walk through a Neo4j example where we find the shortest path between locations, first using the default Neo4j UI and then accessing the same data from C#.

Basics

Before we dive in, we should understand the basic vocabulary of a graph database. Graph databases have three core elements:

Nodes
Relationships
Properties

A node is a particular point of interest in our graph. For example, a retail data model may express its graph with nodes that include offices, shipping warehouses, and stores. Relationships are the edges that connect our nodes. Following our example, there may be several shipping relationships between our warehouses and stores. Finally, properties can exist on both nodes and relationships. A store node may have an address, while its relationship with a shipping location may include a shipping agreement. All three concepts encompass what it means to work with graphs.
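To make the vocabulary concrete, here is a minimal C# sketch of the three concepts. The `Node`, `Relationship`, and `RetailExample` types, the `SHIPS_TO` relationship type, and the property values are all hypothetical and exist only for illustration; they are not part of Neo4j or its driver.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory types mirroring the three graph concepts.
// They only illustrate the vocabulary and are not part of any Neo4j API.
public sealed class Node
{
    public string Label { get; }
    public Dictionary<string, object> Properties { get; } = new Dictionary<string, object>();

    public Node(string label) => Label = label;
}

public sealed class Relationship
{
    public Node From { get; }
    public Node To { get; }
    public string Type { get; }
    public Dictionary<string, object> Properties { get; } = new Dictionary<string, object>();

    public Relationship(Node source, string type, Node target)
    {
        From = source;
        Type = type;
        To = target;
    }
}

public static class RetailExample
{
    // Builds one warehouse-to-store shipping relationship with made-up values.
    public static Relationship Build()
    {
        var warehouse = new Node("Warehouse");
        var store = new Node("Store");
        store.Properties["address"] = "1 Example Street";   // property on a node

        var ships = new Relationship(warehouse, "SHIPS_TO", store);
        ships.Properties["agreement"] = "weekly delivery";  // property on a relationship
        return ships;
    }
}
```

Both the store node and the shipping relationship carry their own properties, which is the part that has no direct equivalent in a plain foreign-key column.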

As we may have noticed, a graph database’s vocabulary is not that dissimilar from that of a traditional relational database. Unlike traditional relational databases, however, Neo4j is built to adapt to evolving business requirements and embraces the ad-hoc nature of the real world.

The greatest weakness of relational databases is that their schema is too inflexible. –Neo4j

The greatest strength of a graph database comes from our ability to model our problems directly as graphs. The impedance mismatch of tables and denormalization is gone. We can go straight from the whiteboard diagram to a data model.

We can learn more from reading Neo4j’s beginner guide.

Docker Compose Setup

It is highly recommended that we run Neo4j locally in Docker. It makes starting with Neo4j effortless and cuts out having to deal with the Java Development Kit (JDK). This post includes a docker-compose configuration that can get us started quickly.

version: '3.4'

services:
  neo4j:
    image: "neo4j"
    container_name: "neo4j"
    ports:
      - 7474:7474
      - 7687:7687
    restart: always
    environment:
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes"
      NEO4J_AUTH: none

Add the YAML configuration to a docker-compose.yml file and run the following from a command window.

> docker-compose up -d

We should now have the latest stable version of Neo4j running. This post uses 4.0.1.

Once our Docker container is up and running, we can access the Neo4j UI by going to http://localhost:7474/browser/.

Problem

Let’s start by defining the problem we’re solving.

Given a set of locations, we want to find the shortest path from one to another.

We know the following criteria:

We want to traverse from Location A to Location I.
Nodes have a name and a relationship with at least one other node.
Each relationship has a distance property.

Understanding our problem parameters, we can use Neo4j’s query language, Cypher, to create our scenarios.

CREATE (LocationA:Location { name: "Location A" })
CREATE (LocationB:Location { name: "Location B" })
CREATE (LocationC:Location { name: "Location C" })
CREATE (LocationD:Location { name: "Location D" })
CREATE (LocationE:Location { name: "Location E" })
CREATE (LocationF:Location { name: "Location F" })
CREATE (LocationG:Location { name: "Location G" })
CREATE (LocationH:Location { name: "Location H" })
CREATE (LocationI:Location { name: "Location I" })

CREATE
    (LocationA)-[:CONNECTED_TO { distance: 5 }]->(LocationB),
    (LocationB)-[:CONNECTED_TO { distance: 6 }]->(LocationC),
    (LocationC)-[:CONNECTED_TO { distance: 4 }]->(LocationI),
    (LocationA)-[:CONNECTED_TO { distance: 3 }]->(LocationD),
    (LocationD)-[:CONNECTED_TO { distance: 4 }]->(LocationE),
    (LocationE)-[:CONNECTED_TO { distance: 5 }]->(LocationI),
    (LocationA)-[:CONNECTED_TO { distance: 2 }]->(LocationF),
    (LocationF)-[:CONNECTED_TO { distance: 3 }]->(LocationG),
    (LocationG)-[:CONNECTED_TO { distance: 2 }]->(LocationH),
    (LocationH)-[:CONNECTED_TO { distance: 1 }]->(LocationI)

[Image: create Cypher]

We can execute the Cypher above in the Neo4j UI, then run the following query to return all the newly created locations.

MATCH (n) RETURN n

[Image: all the nodes]

Great! We now know how all of our locations and their relationships are set up. Let’s try to find the shortest distance between two of them.

Solution

We want to find the shortest distance from Location A to Location I. Since all relationships are already defined, we can ask Neo4j to traverse every path between the two locations, sum up the distances along each one, and return the path with the smallest total.

MATCH (from:Location { name:"Location A" }), (to:Location { name: "Location I"}) , path = (from)-[:CONNECTED_TO*]->(to)
RETURN path AS shortestPath,
    reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
    ORDER BY totalDistance ASC
    LIMIT 1

The resulting graph should look like this.

[Image: shortest path nodes]

The shortest path has a total distance of 8 (2 + 3 + 2 + 1) and traverses locations A, F, G, H, and I.

[Image: shortest path results table]

If we reverse the ordering of our query, we can get the longest path.

MATCH (from:Location { name:"Location A" }), (to:Location { name: "Location I"}) , path = (from)-[:CONNECTED_TO*]->(to)
RETURN path AS longestPath,
    reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
    ORDER BY totalDistance DESC
    LIMIT 1

We can see the longest path has a total distance of 15 (5 + 6 + 4), going through locations A, B, C, and I. Note that even though the shortest path visits more nodes, it is still less costly to traverse because its total distance is smaller.

[Image: longest path nodes]

[Image: longest path results table]
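The totals reported by both queries can be double-checked outside the database. The following is a minimal C# sketch, not a description of how Neo4j evaluates the query. It copies the distances from the CREATE statement into a hypothetical `PathSketch` class, enumerates every directed path between two locations, and picks the smallest or largest total, mirroring the `ORDER BY ... LIMIT 1` step. It assumes the graph has no cycles, which holds for the sample data, and it shortens the location names to single letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that re-implements the path search in memory.
public static class PathSketch
{
    // Directed edges copied from the CREATE statement: (from, to, distance).
    private static readonly (string From, string To, int Distance)[] Edges =
    {
        ("A", "B", 5), ("B", "C", 6), ("C", "I", 4),
        ("A", "D", 3), ("D", "E", 4), ("E", "I", 5),
        ("A", "F", 2), ("F", "G", 3), ("G", "H", 2), ("H", "I", 1),
    };

    // Enumerates every directed path from start to end, summing distances along
    // the way. Assumes an acyclic graph; a cycle would make this recurse forever.
    public static IEnumerable<(List<string> Nodes, int TotalDistance)> AllPaths(string start, string end)
    {
        if (start == end)
        {
            // Base case of the recursion: we have arrived at the destination.
            yield return (new List<string> { end }, 0);
            yield break;
        }

        foreach (var edge in Edges.Where(e => e.From == start))
        {
            foreach (var rest in AllPaths(edge.To, end))
            {
                var nodes = new List<string> { start };
                nodes.AddRange(rest.Nodes);
                yield return (nodes, edge.Distance + rest.TotalDistance);
            }
        }
    }

    // Mirrors ORDER BY totalDistance ASC LIMIT 1.
    public static (List<string> Nodes, int TotalDistance) Shortest(string start, string end) =>
        AllPaths(start, end).OrderBy(p => p.TotalDistance).First();

    // Mirrors ORDER BY totalDistance DESC LIMIT 1.
    public static (List<string> Nodes, int TotalDistance) Longest(string start, string end) =>
        AllPaths(start, end).OrderByDescending(p => p.TotalDistance).First();
}
```

Calling `PathSketch.Shortest("A", "I")` yields the path A, F, G, H, I with a total of 8, and `PathSketch.Longest("A", "I")` yields A, B, C, I with a total of 15, matching the query results. The remaining path, A, D, E, I, totals 12.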

Access Neo4j From C#

It is excellent that we can use the native UI of Neo4j to explore and manipulate our data. But before we know it, we’ll need to access our data from code. In this case, we’ll use the official Neo4j driver for C#.

> dotnet add package Neo4j.Driver

Our code needs to create a driver instance and then use sessions to execute our Cypher queries.

using System;
using System.Threading.Tasks;
using Neo4j.Driver;

namespace GraphDistance
{
    class Program
    {
        static async Task Main()
        {
            using var driver = GraphDatabase.Driver(
                "neo4j://localhost:7687",
                AuthTokens.None
            );
            var session = driver.AsyncSession(
                db => db.WithDatabase("neo4j")
            );

            // Let's Query All Our Nodes
            
            var namesQuery = "MATCH (n) RETURN n.name AS name";
            var cursor = await session.RunAsync(namesQuery);

            var all =
                await cursor.ToListAsync(x => x["name"].As<string>()); 
            
            foreach (var name in all)
            {
                Console.WriteLine(name);
            }
            
            // Let's Get Our Shortest Distance
            var shortestQuery = 
            @"MATCH (from:Location { name:""Location A"" }), (to:Location { name: ""Location I""}) , path = (from)-[:CONNECTED_TO*]->(to)
            RETURN path AS shortestPath,
                reduce(distance = 0, r in relationships(path) | distance+r.distance) AS totalDistance
            ORDER BY totalDistance ASC
            LIMIT 1";

            cursor = await session.RunAsync(shortestQuery);
            var distance =
                await cursor.SingleAsync(x => x["totalDistance"].As<int>());
            
            Console.WriteLine($"\nThe shortest path's distance is {distance} units.");

            await session.CloseAsync();
        }
    }
}

When we execute our sample, we get the following output.

Location A
Location B
Location C
Location D
Location E
Location F
Location G
Location H
Location I

The shortest path's distance is 8 units.

Cool! We were able to interact with our Neo4j instance from C# with relative ease.
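The queries in the listing hard-code the location names. One way to make them reusable is to pass the names as Cypher parameters. The sketch below only prepares the query text and the parameter map. The `ShortestPathQuery` class and the parameter names `$fromName` and `$toName` are hypothetical choices for this example, not something the driver prescribes.

```csharp
using System.Collections.Generic;

// Hypothetical helper holding a parameterized version of the shortest-path query.
public static class ShortestPathQuery
{
    // $fromName and $toName are Cypher parameters (names chosen for this sketch).
    public const string Text = @"
MATCH (from:Location { name: $fromName }), (to:Location { name: $toName }),
      path = (from)-[:CONNECTED_TO*]->(to)
RETURN path AS shortestPath,
    reduce(distance = 0, r IN relationships(path) | distance + r.distance) AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1";

    // Builds the parameter map; keys match the parameter names without the leading '$'.
    public static Dictionary<string, object> Parameters(string fromName, string toName) =>
        new Dictionary<string, object>
        {
            ["fromName"] = fromName,
            ["toName"] = toName,
        };
}
```

With the `session` from the listing, this could be run as `await session.RunAsync(ShortestPathQuery.Text, ShortestPathQuery.Parameters("Location A", "Location I"))`, assuming the driver version in use offers a `RunAsync` overload that accepts a parameter dictionary. Keeping values out of the query string also avoids the doubled quotes that the verbatim string otherwise needs.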

Conclusion

We’ve only scratched the surface of what a graph database is capable of doing. We have seen that it fits intuitively into many business domains and can help solve otherwise complex problems. Cypher is a powerful query language, and as shown by the example in this post, we can express complex requests with minimal syntax. Not only is Neo4j useful as a stand-alone technology, but it also integrates with some of our favorite languages.

I hope you found this post helpful, and please leave a comment below.