Concepts
This is a step-by-step guide to the concepts behind graph pattern matching. It starts with the simple building blocks of graph patterns: node patterns and relationship patterns. It then shows how those are composed into path patterns that match fixed-length paths, variable-length paths and paths that have cycles in them.
The model data in the examples below are based on the UK national rail network, using publicly available datasets.
Node patterns
Every graph pattern contains at least one node pattern. The simplest graph pattern is a single, empty node pattern:
MATCH ()
The empty node pattern matches every node in a property graph. In order to obtain a reference to the nodes matched, a variable needs to be declared in the node pattern:
MATCH (n)
With this reference, node properties can be accessed:
MATCH (n)
RETURN n.name
Adding a label expression to the node pattern means only nodes with labels that match will be returned.
The following matches nodes that have the Stop
label:
MATCH (n:Stop)
The following more complex label expression matches all nodes that are either a TrainStation
and a BusStation
or StationGroup
:
MATCH (n:(TrainStation&BusStation)|StationGroup)
A map of property names and values can be used to match on node properties based on equality with the specified values.
The following matches nodes that have their mode
property equal to Rail
:
MATCH (n { mode: 'Rail' })
More general predicates can be expressed with a WHERE
clause.
The following matches nodes whose name property starts with Preston
:
MATCH (n:Station WHERE n.name STARTS WITH 'Preston')
See the node patterns reference section for more details.
Relationship patterns
The simplest possible relationship pattern is a pair of dashes:
--
This pattern matches a relationship with any direction and does not filter on any relationship type or property.
Unlike a node pattern, a relationship pattern cannot be used in a MATCH
clause without node patterns at both ends.
See path patterns for more details.
In order to obtain a reference to the relationships matched by the pattern, a relationship variable needs to be declared in the pattern by adding the variable name in square brackets in between the dashes:
-[r]-
To match a specific direction, add <
or >
to the left or right hand side respectively:
-[r]->
To match on a relationship type, add the type name after a colon:
-[:CALLS_AT]->
Similar to node patterns, a map of property names and values can be added to filter on properties of the relationship based on equality with the specified values:
-[{ distance: 0.24, duration: 'PT4M' }]->
A WHERE
clause can be used for more general predicates:
-[r WHERE time() + duration(r.duration) < time('22:00') ]->
See the relationship patterns reference section for more details.
Path patterns
Any valid path starts and ends with a node, with relationships between each node (if there is more than one node). Path patterns have the same restrictions, and for all valid path patterns the following are true:
-
They have at least one node pattern.
-
They begin and end with a node pattern.
-
They alternate between nodes and relationships.
These are all valid path patterns:
()
(s)--(e)
(:Station)--()<--(m WHERE m.departs > time('12:00'))-->()-[:NEXT]->(n)
These are invalid path patterns:
-->
()-->
()-->-->()
Path pattern matching
This section contains an example of matching a path pattern to paths in a property graph.
It uses the following graph:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (pmr:Station {name: 'Peckham Rye'}),
(dmk:Station {name: 'Denmark Hill'}),
(vic:Station {name: 'London Victoria'}),
(clp:Station {name: 'Clapham High Street'}),
(eph:Station {name: 'Elephant & Castle'}),
(vic)<-[:CALLS_AT]-(s1:Stop {departs: time('11:55')}),
(dmk)<-[:CALLS_AT]-(s2:Stop {departs: time('11:44')})-[:NEXT]->(s1),
(pmr)<-[:CALLS_AT]-(s3:Stop {departs: time('11:40')})-[:NEXT]->(s2),
(clp)<-[:CALLS_AT]-(s4:Stop {departs: time('11:41')}),
(dmk)<-[:CALLS_AT]-(s5:Stop {departs: time('11:37')})-[:NEXT]->(s4),
(pmr)<-[:CALLS_AT]-(s6:Stop {departs: time('11:33')})-[:NEXT]->(s5),
(eph)<-[:CALLS_AT]-(s7:Stop {departs: time('11:54')}),
(dmk)<-[:CALLS_AT]-(s8:Stop {departs: time('11:47')})-[:NEXT]->(s7),
(pmr)<-[:CALLS_AT]-(s9:Stop {departs: time('11:44')})-[:NEXT]->(s8)
The graph contains a number of train Stations
and Stops
.
A Stop
represents the arrival and departure of a train that CALLS_AT
a Station
.
Each Stop
forms part of a sequence of Stops
connected by relationships with the type NEXT
, representing the order of calling points made by a train service.
The graph shows three chains of Stops
that represent different train services.
Each of these services calls at the Station
with the name Denmark Hill
.
To return all Stops
that call at the Station
Denmark Hill
, the following motif is used (the term motif is used to describe the pattern looked for in the graph):
In this case, three paths in the graph match the structure of the motif (plus the predicate anchoring to the Station
Denmark Hill
):
In order to return the name of each Stop
that calls at a Station
, declare a variable in the Stop
node pattern.
The results will then have a row containing the departs value of each Stop
for each match shown above:
MATCH (s:Stop)-[:CALLS_AT]->(:Station {name: 'Denmark Hill'})
RETURN s.departs AS departureTime
departureTime |
---|
|
|
|
Rows: 3 |
Equijoins
An equijoin is an operation on paths that requires more than one of the nodes or relationships of the paths to be the same. The equality between the nodes or relationships is specified by declaring the same variable in multiple node patterns or relationship patterns. An equijoin allows cycles to be specified in a path pattern. See graph patterns for more complex patterns.
This section uses the following graph:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (bhi:Station {name: 'Birmingham International'}),
(cov:Station {name: 'Coventry'}),
(eus:Station {name: 'London Euston'}),
(bhi)<-[:CALLS_AT]-(s1:Stop {departs: time('12:03')}),
(cov)<-[:CALLS_AT]-(s2:Stop {departs: time('11:33')})-[:NEXT]->(s1),
(eus)<-[:CALLS_AT]-(s3:Stop {departs: time('15:54')}),
(cov)<-[:CALLS_AT]-(s4:Stop {departs: time('14:45')})-[:NEXT]->(s3),
(cov)<-[:CALLS_AT]-(s5:Stop {departs: time('09:34')}),
(eus)<-[:CALLS_AT]-(s6:Stop {departs: time('08:40')})-[:NEXT]->(s5)
To illustrate how equijoins work, we will use the problem of finding a round trip between two Stations
.
In this example scenario, a passenger starts their outbound journey at London Euston
Station
and ends at Coventry
Station
.
The return journey will be the reverse order of those Stations
.
The graph has three different services, two of which would compose the desired round trip, and a third which would send the passenger to Birmingham International
.
The solution is the following path with a cycle:
If unique properties exist on the node where the cycle "join" occurs in the path, then it is possible to repeat the node pattern with a predicate matching on the unique property.
The following motif demonstrates how that can be achieved, repeating a Station
node pattern with the name London Euston
:
The path pattern equivalent is:
(:Station {name: 'London Euston'})<-[:CALLS_AT]-(:Stop)-[:NEXT]->(:Stop)
-[:CALLS_AT]->(:Station {name: 'Coventry'})<-[:CALLS_AT]-(:Stop)
-[:NEXT]->(:Stop)-[:CALLS_AT]->(:Station {name: 'London Euston'})
There may be cases where a unique predicate is not available.
In this case, an equijoin can be used to define the desired cycle by using a repeated node variable.
In the current example, if you declare the same node variable n
in both the first and last node patterns, then the node patterns must match the same node:
Putting this path pattern with an equijoin in a query, the times of the outbound and return journeys can be returned:
MATCH (n:Station {name: 'London Euston'})<-[:CALLS_AT]-(s1:Stop)
-[:NEXT]->(s2:Stop)-[:CALLS_AT]->(:Station {name: 'Coventry'})
<-[:CALLS_AT]-(s3:Stop)-[:NEXT]->(s4:Stop)-[:CALLS_AT]->(n)
RETURN s1.departs+'-'+s2.departs AS outbound,
s3.departs+'-'+s4.departs AS `return`
outbound | return |
---|---|
|
|
Rows: 1 |
Quantified path patterns
This feature was introduced in Neo4j 5.9.
All the path patterns discussed so far have had a fixed length. This section considers how to match paths of varying length by using quantified path patterns, allowing you to search for paths whose lengths are unknown or within a specific range.
Quantified path patterns can be useful when, for example, searching for all nodes that can be reached from an anchor node, finding all paths connecting two nodes, or when traversing a hierarchy that may have differing depths.
This example uses a new graph:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (pmr:Station {name: 'Peckham Rye'}),
(dmk:Station {name: 'Denmark Hill'}),
(clp:Station {name: 'Clapham High Street'}),
(wwr:Station {name: 'Wandsworth Road'}),
(clj:Station {name: 'Clapham Junction'}),
(s1:Stop {arrives: time('17:19'), departs: time('17:20')}),
(s2:Stop {arrives: time('17:12'), departs: time('17:13')}),
(s3:Stop {arrives: time('17:10'), departs: time('17:11')}),
(s4:Stop {arrives: time('17:06'), departs: time('17:07')}),
(s5:Stop {arrives: time('16:58'), departs: time('17:01')}),
(s6:Stop {arrives: time('17:17'), departs: time('17:20')}),
(s7:Stop {arrives: time('17:08'), departs: time('17:10')}),
(clj)<-[:CALLS_AT]-(s1), (wwr)<-[:CALLS_AT]-(s2),
(clp)<-[:CALLS_AT]-(s3), (dmk)<-[:CALLS_AT]-(s4),
(pmr)<-[:CALLS_AT]-(s5), (clj)<-[:CALLS_AT]-(s6),
(dmk)<-[:CALLS_AT]-(s7),
(s5)-[:NEXT {distance: 1.2}]->(s4),(s4)-[:NEXT {distance: 0.34}]->(s3),
(s3)-[:NEXT {distance: 0.76}]->(s2), (s2)-[:NEXT {distance: 0.3}]->(s1),
(s7)-[:NEXT {distance: 1.4}]->(s6)
Each Stop
on a service CALLS_AT
one Station
. Each Stop
has the properties arrives
and departs
that give the times the train is at the Station
.
Following the NEXT
relationship of a Stop
will give the next Stop
of the service.
For this example, a path pattern is constructed to match each of the services that allow passengers to travel from Denmark Hill
to Clapham Junction
.
The following shows the two paths that the path pattern should match:
The following motif represents a fixed-length path pattern that matches the service that departs from Denmark Hill
station at 17:07
:
To match the second train service, leaving Denmark Hill
at 17:10
, a shorter path pattern is needed:
Translating the motifs into Cypher®, and adding predicates to match the origin and destination Stations
, yields the following two path patterns respectively:
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
-[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
To return both solutions in the same query using these fixed-length path patterns, a UNION of two MATCH
statements would be needed.
For example, the following query returns the departure
of the two services:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(a:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
UNION
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
-[:NEXT]->(a:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
departureTime | arrivalTime |
---|---|
|
|
|
|
Rows: 2 |
The problem with this solution is that not only is it verbose, it can only be used where the lengths of the target paths are known in advance.
Quantified path patterns solve this problem by extracting repeating parts of a path pattern into parentheses and applying a quantifier.
That quantifier specifies a range of possible repetitions of the extracted pattern to match on.
For the current example, the first step is identifying the repeating pattern, which in this case is the sequence of alternating Stop
nodes and NEXT
relationships, representing one segment of a Service
:
(:Stop)-[:NEXT]->(:Stop)
The shortest path has one instance of this pattern, the longest three.
So the quantifier applied to the wrapper parentheses is the range one to three, expressed as {1,3}
:
((:Stop)-[:NEXT]->(:Stop)){1,3}
This also includes repetitions of two, but in this case this repetition will not return matches. To understand the semantics of this pattern, it helps to work through the expansion of the repetitions. Here are the three repetitions specified by the quantifier, combined into a union of path patterns:
(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)
The union operator (|
) here is used for illustration only; using it this way is not part of Cypher syntax.
Where two node patterns are next to each other in the expansion above, they must necessarily match the same node: the next segment of a Service
starts where the previous segment ends.
As such they can be rewritten as a single node pattern with any filtering condition combined conjunctively.
In this example this is trivial, because the filtering applied to those nodes is just the label Stop
:
With this, the union of path patterns simplifies to:
(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)
The segments of the original path pattern that connect the Stations
to the Stops
can also be rewritten.
Here is what those segments look like when concatenated with the first repetition:
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
(:Stop)-[:NEXT]->(:Stop)
(:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
The original MATCH
clause now has the following three parts:
Translating the union of fixed-length path patterns into a quantified path pattern results in a pattern that will return the correct paths.
The following query adds a RETURN
clause that yields the departure and arrival times of the two services:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,3}
(a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
departureTime | arrivalTime |
---|---|
|
|
|
|
Rows: 2 |
Quantified relationships
Quantified relationships allow some simple quantified path patterns to be re-written in a more succinct way.
Continuing with the example of Stations
and Stops
from the previous section, consider the following query:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,10}
(m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
If the relationship NEXT
only connects Stop
nodes, the :Stop
label expressions can be removed:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
(()-[:NEXT]->()){1,10}
(m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
When the quantified path pattern has one relationship pattern, it can be abbreviated to a quantified relationship. A quantified relationship is a relationship pattern with a postfix quantifier. Below is the previous query rewritten with a quantified relationship:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-
(n:Stop)-[:NEXT]->{1,10}(m:Stop)-[:CALLS_AT]->
(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
The scope of the quantifier {1,10}
is the relationship pattern -[:NEXT]->
and not the node patterns abutting it.
More generally, where a path pattern contained in a quantified path pattern has the following form:
(() <relationship pattern> ()) <quantifier>
then it can be re-written as follows:
<relationship pattern> <quantifier>
Prior to the introduction of quantified path patterns and quantified relationships in Neo4j 5.9, the only method in Cypher to match paths of a variable length was through variable-length relationships. This syntax is still available. It is very similar to the syntax for quantified relationships, with the following differences:
For more information, see the reference section on variable-length relationships. |
Group variables
This section uses the example of Stations
and Stops
used in the previous section, but with an additional property distance
added to the NEXT
relationships:
As the name suggests, this property represents the distance between two Stops
.
To return the total distance for each service connecting a pair of Stations
, a variable referencing each of the relationships traversed is needed.
Similarly, to extract the departs
and arrives
properties of each Stop
, variables referencing each of the nodes traversed is required.
In this example of matching services between Denmark Hill
and Clapham Junction
, the variables l
and m
are declared to match the Stops
and r
is declared to match the relationships.
The variable origin only matches the first Stop
in the path:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(origin)
((l)-[r:NEXT]->(m)){1,3}
()-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
Variables that are declared inside quantified path patterns are known as group variables.
They are so called because, when referred outside of the quantified path pattern, they are lists of the nodes or relationships they are bound to in the match.
To understand how to think about the way group variables are bound to nodes or relationships, it helps to expand the quantified path pattern, and observe how the different variables match to the elements of the overall matched path.
Here the three different expansions for each value in the range given by the quantifier {1,3}
:
(l1)-[r1:NEXT]->(m1) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2)(l3)-[r3:NEXT]->(m3)
The subscript of each variable indicates which instance of the path pattern repetition they belong to.
The following diagram shows the variable bindings of the path pattern with three repetitions, which matches the service that departs Denmark Hill
at 17:07
.
It traces the node or relationship that each indexed variable is bound to.
Note that the index increases from right to left as the path starts at Denmark Hill
:
For this matched path, the group variables have the following bindings:
l => [n2, n3, n4]
r => [r2, r3, r4]
m => [n3, n4, n5]
The second solution is the following path:
The following table shows the bindings for both matches, including the variable origin.
In contrast to the group variables, origin
is a singleton variable due to being declared outside the quantification.
Singleton variables bind at most to one node or relationship.
origin | l | r | m |
---|---|---|---|
|
|
|
|
|
|
|
|
Returning to the original goal, which was to return the sequence of depart times for the Stops
and the total distance of each service, the final query exploits the compatibility of group variables with list comprehensions and list functions such as reduce():
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-(origin)
((l)-[r:NEXT]->(m)){1,3}
()-[:CALLS_AT]->(:Station {name: 'Clapham Junction'})
RETURN origin.departs + [stop in m | stop.departs] AS departureTimes,
reduce(acc = 0.0, next in r | round(acc + next.distance, 2)) AS totalDistance
departureTimes | totalDistance |
---|---|
|
|
|
|
Rows: 2 |
Shortest path
This section uses the following graph:
To recreate it, run the following query against an empty Neo4j database:
CREATE (asc:Station {name:'Ashchurch'}),
(bmv:Station {name:'Bromsgrove'}),
(cnm:Station {name:'Cheltenham Spa'}),
(dtw:Station {name:'Droitwich Spa'}),
(hby:Station {name:'Hartlebury'}),
(psh:Station {name:'Pershore'}),
(wop:Station {name:'Worcestershire Parkway Ll'}),
(wof:Station {name:'Worcester Foregate Street'}),
(wos:Station {name:'Worcester Shrub Hill'})
SET asc.location = point({longitude: -2.10876, latitude: 51.9989}),
bmv.location = point({longitude: -2.04978, latitude: 52.3206}),
cnm.location = point({longitude: -2.09962, latitude: 51.8974}),
dtw.location = point({longitude: -2.15836, latitude: 52.2682}),
hby.location = point({longitude: -2.22112, latitude: 52.33449}),
psh.location = point({longitude: -2.07154, latitude: 52.13055}),
wop.location = point({longitude: -2.16003, latitude: 52.15605}),
wof.location = point({longitude: -2.2216, latitude: 52.19514}),
wos.location = point({longitude: -2.20941, latitude: 52.19473})
CREATE (asc)-[:LINK {distance: 7.25}]->(cnm),
(asc)-[:LINK {distance: 11.29}]->(wop),
(asc)-[:LINK {distance: 14.75}]->(wos),
(bmv)-[:LINK {distance: 31.14}]->(cnm),
(bmv)-[:LINK {distance: 6.16}]->(dtw),
(bmv)-[:LINK {distance: 12.6}]->(wop),
(dtw)-[:LINK {distance: 5.64}]->(hby),
(dtw)-[:LINK {distance: 6.03}]->(wof),
(dtw)-[:LINK {distance: 5.76}]->(wos),
(psh)-[:LINK {distance: 4.16}]->(wop),
(wop)-[:LINK {distance: 3.71}]->(wos),
(wof)-[:LINK {distance: 0.65}]->(wos)
Single shortest path
The shortestPath
function returns the path between two nodes with the fewest number of relationships.
If more than one shortest path exists, then one is picked non-deterministically.
For example, the following returns the shortest path between Hartlebury
and Cheltenham Spa
:
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
Rows: 1 |
The path pattern passed to the shortestPath
function defines the pattern that the shortest path must conform to.
It needs to be a variable-length relationship with a single relationship pattern.
For more information, see the reference section on variable-length relationships.
Single shortest path with predicates
If the MATCH
clause of the shortestPath
function includes a WHERE
clause, a shortest path that satisfies the WHERE
clause conditions is returned if one exists.
This is different to first finding the shortest path using the path pattern and then applying the WHERE clause condition, which could potentially return no results.
For example, the following query returns the shortest path, with the condition that the distance between stations is never 20
miles or more:
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE all(link in relationships(p) WHERE link.distance < 20) AND
hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
Rows: 1 |
If the evaluation of the WHERE
clause conditions is forced to happen after the shortestPath
function returns a solution, for example by moving the WHERE
clause so it comes after a WITH
clause, the shortest path found will include LINK
relationships with distance
greater than 20
, which will subsequently get filtered out:
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
WITH p
WHERE all(link in relationships(p) WHERE link.distance < 20)
RETURN [n in nodes(p) | n.name] AS stops
(no changes, no records)
When the predicate of the WHERE
clause can be checked during the search for the shortest path as in the previous example, then solutions can be found efficiently.
If, however, the predicate requires evaluation of the whole path before being checked, then a more exhaustive search may need to be done first.
This can have a significant impact on performance.
For example, the following query requires the whole path to determine the total distance between the endpoints:
MATCH p = shortestPath((hby)-[link:LINK*]-(cnm))
WHERE reduce(acc = 0, l in link | acc + l.distance) > 50 AND
hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
Rows: 1 |
On a large, highly connected graph, this can be very time consuming. See the section on shortest path planning for more information.
All shortest paths
The allShortestPaths
function finds all paths between two nodes that have the minimum number of relationships.
For example, the following returns the two shortest paths between Hartlebury
and Pershore
:
MATCH p = allShortestPaths((hby)-[link:LINK*]-(psh))
WHERE hby.name = 'Hartlebury' AND psh.name = 'Pershore'
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
Rows: 2 |
Predicates in quantified path patterns
One of the pitfalls of quantified path patterns is that, depending on the graph, they can end up matching very large numbers of paths, resulting in a slow query performance. This is especially true when searching for paths with a large maximum length or when the pattern is too general. However, by using inline predicates that specify precisely which nodes and relationships should be included in the results, unwanted results will be pruned as the graph is traversed.
Here are some examples of the types of constraints you can impose on quantified path pattern traversals:
-
Nodes must have certain combinations of labels. For example, all nodes must be an
Employee
, but not aContractor
. -
Relationships must have certain types. For example, all relationships in the path must be of type
EMPLOYED_BY
. -
Nodes or relationships must have properties satisfying some condition. For example, all relationships must have property
distance
>
10
.
The same example used in the shortest path section above is used here to illustrate the use of inline predicates. In that section, the shortest path in terms of number of hops was found. Here the example is developed to find the shortest path by physical distance and compared to the result from the shortestPath function.
The total distance from Hartlebury
to Cheltenham Spa
following the path yielded by shortestPath
is given by the following query:
MATCH (hby:Station {name: 'Hartlebury'}),
(cnm:Station {name: 'Cheltenham Spa'})
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
AS distance
distance |
---|
|
Rows: 1 |
Whether this is the shortest path by distance can be checked by looking at each path between the two end Stations
and returning the first result after ordering by distance
:
MATCH (hby:Station {name: 'Hartlebury'}),
(cnm:Station {name: 'Cheltenham Spa'})
MATCH p = (hby)-[:LINK]-+(cnm)
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
AS distance
ORDER BY distance LIMIT 1
distance |
---|
|
Rows: 1 |
This shows that there is a route with a shorter distance than the one with fewer Stations
.
For a small dataset, this query will be fast, but the execution time will increase exponentially with the graph size.
For a real dataset, such as the entire rail network of the UK, it will be unacceptably long.
One approach to avoid the exponential explosion in paths is to put a finite upper bound to the quantified path pattern. This works fine where the solution is known to lie within some range of hops. But in cases where this is not known, one alternative would be to make the pattern more specific by, for example, adding node labels, or by specifying a relationship direction. Another alternative would be to add an inline predicate to the quantified path pattern.
In this example, an inline predicate can be added that exploits the geospatial property location of the Stations
: for each pair of Stations
on the path, the second Station
will be closer to the endpoint (not always true, but is assumed here to keep the example simple).
To compose the predicate, the point.distance() function is used to compare the distance of the left-hand and the right-hand Station
to the destination Cheltenham Spa
:
MATCH (hby:Station {name: 'Hartlebury'}),
(cnm:Station {name: 'Cheltenham Spa'})
MATCH p = (hby)
((a)-[:LINK]-(b) WHERE point.distance(a.location, cnm.location) >
point.distance(b.location, cnm.location))+ (cnm)
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
AS distance
ORDER BY distance
distance |
---|
|
|
|
|
Rows: 4 |
This query shows that there are only four paths solving the query (a number that remains constant even if the data from the rest of the UK railway network was included). Using inline predicates or making quantified path patterns more specific where possible can greatly improve query performance.
Graph patterns
In addition to the single path patterns discussed so far, multiple path patterns can be combined in a comma-separated list to form a graph pattern. In a graph pattern, each path pattern is matched separately, and where node variables are repeated in the separate path patterns, the solutions are reduced via equijoins. If there are no equijoins between the path patterns, the result is a Cartesian product between the separate solutions.
The benefit of joining multiple path patterns in this way is that it allows the specification of more complex patterns than the linear paths allowed by a single path pattern.
To illustrate this, another example drawn from the railway model will be used.
In this example, a passenger is traveling from Starbeck
to Huddersfield
, changing trains at Leeds
.
To get to Leeds
from Starbeck
, the passenger can take a direct service that stops at all stations on the way. However, there is an opportunity to change at one of the stations (Harrogate
) on the way to Leeds
, and catch an express train, which may enable the passenger to catch an earlier train leaving from Leeds
, reducing the overall journey time.
This section uses the following graph, showing the train services the passenger can use:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (hgt:Station {name: 'Harrogate'}), (lds:Station {name: 'Leeds'}),
(sbe:Station {name: 'Starbeck'}), (hbp:Station {name: 'Hornbeam Park'}),
(wet:Station {name: 'Weeton'}), (hrs:Station {name: 'Horsforth'}),
(hdy:Station {name: 'Headingley'}), (buy:Station {name: 'Burley Park'}),
(pnl:Station {name: 'Pannal'}), (hud:Station {name: 'Huddersfield'}),
(s9:Stop {arrives: time('11:53')}),
(s8:Stop {arrives: time('11:44'), departs: time('11:45')}),
(s7:Stop {arrives: time('11:40'), departs: time('11:43')}),
(s6:Stop {arrives: time('11:38'), departs: time('11:39')}),
(s5:Stop {arrives: time('11:29'), departs: time('11:30')}),
(s4:Stop {arrives: time('11:24'), departs: time('11:25')}),
(s3:Stop {arrives: time('11:19'), departs: time('11:20')}),
(s2:Stop {arrives: time('11:16'), departs: time('11:17')}),
(s1:Stop {departs: time('11:11')}), (s21:Stop {arrives: time('11:25')}),
(s211:Stop {departs: time('11:00')}), (s10:Stop {arrives: time('11:45')}),
(s101:Stop {departs: time('11:20')}), (s11:Stop {arrives: time('12:05')}),
(s111:Stop {departs: time('11:40')}), (s12:Stop {arrives: time('12:07')}),
(s121:Stop {departs: time('11:50')}), (s13:Stop {arrives: time('12:37')}),
(s131:Stop {departs: time('12:20')}),
(lds)<-[:CALLS_AT]-(s9), (buy)<-[:CALLS_AT]-(s8)-[:NEXT]->(s9),
(hdy)<-[:CALLS_AT]-(s7)-[:NEXT]->(s8), (hrs)<-[:CALLS_AT]-(s6)-[:NEXT]->(s7),
(wet)<-[:CALLS_AT]-(s5)-[:NEXT]->(s6), (pnl)<-[:CALLS_AT]-(s4)-[:NEXT]->(s5),
(hbp)<-[:CALLS_AT]-(s3)-[:NEXT]->(s4), (hgt)<-[:CALLS_AT]-(s2)-[:NEXT]->(s3),
(sbe)<-[:CALLS_AT]-(s1)-[:NEXT]->(s2), (lds)<-[:CALLS_AT]-(s21), (hgt)<-[:CALLS_AT]-(s211)-[:NEXT]->(s21), (lds)<-[:CALLS_AT]-(s10), (hgt)<-[:CALLS_AT]-(s101)-[:NEXT]->(s10), (lds)<-[:CALLS_AT]-(s11), (hgt)<-[:CALLS_AT]-(s111)-[:NEXT]->(s11), (hud)<-[:CALLS_AT]-(s12), (lds)<-[:CALLS_AT]-(s121)-[:NEXT]->(s12), (hud)<-[:CALLS_AT]-(s13), (lds)<-[:CALLS_AT]-(s131)-[:NEXT]->(s13)
The solution to the problem assembles a set of path patterns matching the following three parts: the stopping service; the express service; and the final leg of the journey from Leeds
to Huddersfield
.
Each changeover, from stopping to express service and from express to onward service, has to respect the fact that the arrival time of a previous leg has to be before the departure time of the next leg.
This will be encoded in a single WHERE
clause.
The following visualizes the three legs with different colors, and identifies the node variables used to create the equijoins and anchoring:
For the stopping service, it is assumed that the station the passenger needs to change at is unknown.
As a result, the pattern needs to match a variable number of stops before and after the Stop
b
, the Stop
that calls at the changeover station l
.
This is achieved by placing the quantified relationship -[:NEXT]->*
on each side of node b
.
The ends of the path also needs to be anchored at a Stop
departing from Starbeck
at 11:11
, as well as at a Stop
calling at Leeds
:
(:Station {name: 'Starbeck'})<-[:CALLS_AT]-
(a:Stop {departs: time('11:11')})-[:NEXT]->*(b)-[:NEXT]->*
(c:Stop)-[:CALLS_AT]->(lds:Station {name: 'Leeds'})
For the express service, the ends of the path are anchored at the stop b
and Leeds
station, which lds
will be bound to by the first leg.
Although in this particular case there are only two stops on the service, a more general pattern that can match any number of stops is used:
(b)-[:CALLS_AT]->(l:Station)<-[:CALLS_AT]-(m:Stop)-[:NEXT]->*
(n:Stop)-[:CALLS_AT]->(lds)
Note that as Cypher only allows a relationship to be traversed once in a given match for a graph pattern, the first and second legs are guaranteed to be different train services.
(See relationship uniqueness for more details.)
Similarly, another quantified relationship that bridges the stops calling at Leeds
station and Huddersfield
station is added:
(lds)<-[:CALLS_AT]-(x:Stop)-[:NEXT]->*(y:Stop)-[:CALLS_AT]->
(:Station {name: 'Huddersfield'})
The other node variables are for the WHERE
clause or for returning data.
Putting this together, the resulting query returns the earliest arrival time achieved by switching to an express service:
MATCH (:Station {name: 'Starbeck'})<-[:CALLS_AT]-
(a:Stop {departs: time('11:11')})-[:NEXT]->*(b)-[:NEXT]->*
(c:Stop)-[:CALLS_AT]->(lds:Station {name: 'Leeds'}),
(b)-[:CALLS_AT]->(l:Station)<-[:CALLS_AT]-(m:Stop)-[:NEXT]->*
(n:Stop)-[:CALLS_AT]->(lds),
(lds)<-[:CALLS_AT]-(x:Stop)-[:NEXT]->*(y:Stop)-[:CALLS_AT]->
(:Station {name: 'Huddersfield'})
WHERE b.arrives < m.departs AND n.arrives < x.departs
RETURN a.departs AS departs,
l.name AS changeAt,
m.departs AS changeDeparts,
y.arrives AS arrives
ORDER BY y.arrives LIMIT 1
departs | changeAt | changeDeparts | arrives |
---|---|---|---|
|
|
|
|
Rows: 1 |
Was this page helpful?