node_similarity

Abstract

To quantify how similar two nodes in a graph are, we compute a numerical node similarity score for the pair. Many node similarity measures exist; this module currently implements the following:

  • cosine similarity
  • Jaccard similarity
  • overlap similarity

The Jaccard similarity is computed using the following formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The overlap similarity is computed using the following formula:

$$\mathrm{overlap}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$

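The cosine similarity measures how similar two nodes are based on a node property. The property must be a vector, and the similarity is computed using the following formula, where $\mathbf{a}$ and $\mathbf{b}$ are the property vectors of the two nodes:

$$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}$$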
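The standard cosine similarity of two vectors can be sketched in a few lines of Python. This is only an illustration, not the module's C++ implementation: the function name `cosine_similarity`, the sample `score` vectors, and the choice to return 0.0 for a zero vector are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "score" property values of two nodes.
score_u = [1.0, 2.0, 3.0]
score_v = [2.0, 4.0, 6.0]

# The vectors point in the same direction, so the result is close to 1.0.
print(cosine_similarity(score_u, score_v))
```

Because the measure depends only on the angle between the vectors, scaling one vector does not change the result.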

For the Jaccard and overlap similarity, set A represents all outgoing neighbors of one node and set B represents all outgoing neighbors of the other node. In both formulas, the numerator is the cardinality of the intersection of A and B (in other words, the number of common neighbors). The denominators differ, but each is built from the cardinalities of A and B.
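To make the set-based measures concrete, here is a minimal Python sketch that applies the usual Jaccard and overlap definitions to two sets of outgoing neighbors. The function names, the sample neighbor sets, and the convention of returning 0.0 when a denominator would be zero are assumptions for illustration; the module itself may handle empty neighborhoods differently.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of neighbors."""
    union = a | b
    if not union:
        # Assumption for this sketch: two empty neighborhoods score 0.0.
        return 0.0
    return len(a & b) / len(union)


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|) for two sets of neighbors."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Assumption for this sketch: an empty neighborhood scores 0.0.
        return 0.0
    return len(a & b) / smaller


# Hypothetical outgoing neighbors of two nodes u and v.
neighbors_u = {"a", "b", "c"}
neighbors_v = {"b", "c", "d", "e"}

print(jaccard(neighbors_u, neighbors_v))  # 2 common neighbors out of 5 distinct ones
print(overlap(neighbors_u, neighbors_v))  # 2 common neighbors over min(3, 4)
```

Both measures share the same numerator, so the overlap score is never smaller than the Jaccard score for the same pair of sets.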

For each similarity measure there are two functions: one calculates the similarity between all pairs of nodes in the graph, and the other, the pairwise function, calculates similarities between two sets of nodes passed in as arguments.
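The difference between the two variants can be sketched in Python on a toy graph. Everything here is hypothetical: the `out_neighbors` dictionary, the helper names, and in particular the assumption that the pairwise variant pairs the two lists element by element (first with first, second with second) and that the all-pairs variant reports each unordered pair once. The source does not spell out either detail.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical directed graph: node id -> set of outgoing neighbors.
out_neighbors = {
    1: {2, 3},
    2: {3},
    3: {1},
    4: {2, 3},
}


def all_pairs(graph):
    """Similarity for every unordered pair of nodes in the graph."""
    return [
        (u, v, jaccard(graph[u], graph[v]))
        for u, v in combinations(sorted(graph), 2)
    ]


def pairwise(graph, src_nodes, dst_nodes):
    """Similarity for src_nodes[i] and dst_nodes[i] (assumed pairing)."""
    if len(src_nodes) != len(dst_nodes):
        raise ValueError("both lists must have the same length")
    return [
        (u, v, jaccard(graph[u], graph[v]))
        for u, v in zip(src_nodes, dst_nodes)
    ]


print(len(all_pairs(out_neighbors)))        # number of unordered pairs among 4 nodes
print(pairwise(out_neighbors, [1, 2], [4, 3]))
```

The pairwise form is useful when only a handful of specific node pairs matter, since it avoids scoring every pair in the graph.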

Trait            Value
Module type      algorithm
Implementation   C++
Graph direction  directed
Edge weights     unweighted
Parallelism      sequential

Procedures

Info: If you want to execute this algorithm on graph projections, subgraphs or portions of the graph, be sure to check out the guide on How to run a MAGE module on subgraphs.

cosine()

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The cosine similarity between the first and the second node.

Usage:

CALL node_similarity.cosine() YIELD node1, node2, similarity
RETURN node1, node2, similarity;

cosine_pairwise(property, src_nodes, dst_nodes)

Input:

  • property: str ➡ The property on which the cosine similarity is calculated. If the property is not a vector, an error is thrown.
  • src_nodes: List[Vertex] ➡ The first set of nodes.
  • dst_nodes: List[Vertex] ➡ The second set of nodes.

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The cosine similarity between the first and the second node.

Usage:

MATCH (m)
WHERE m.id > 2
WITH COLLECT(m) AS nodes1
MATCH (n)
WHERE n.id < 8
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.cosine_pairwise("score", nodes1, nodes2) YIELD node1, node2, similarity
RETURN node1, node2, similarity;

jaccard()

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The Jaccard similarity between the first and the second node.

Usage:

CALL node_similarity.jaccard() YIELD node1, node2, similarity
RETURN node1, node2, similarity;

jaccard_pairwise(src_nodes, dst_nodes)

Input:

  • src_nodes: List[Vertex] ➡ The first set of nodes.
  • dst_nodes: List[Vertex] ➡ The second set of nodes.

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The Jaccard similarity between the first and the second node.

Usage:

MATCH (m)
WHERE m.id > 2
WITH COLLECT(m) AS nodes1
MATCH (n)
WHERE n.id < 8
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.jaccard_pairwise(nodes1, nodes2) YIELD node1, node2, similarity
RETURN node1, node2, similarity;

overlap()

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The overlap similarity between the first and the second node.

Usage:

CALL node_similarity.overlap() YIELD node1, node2, similarity
RETURN node1, node2, similarity;

overlap_pairwise(src_nodes, dst_nodes)

Input:

  • src_nodes: List[Vertex] ➡ The first set of nodes.
  • dst_nodes: List[Vertex] ➡ The second set of nodes.

Output:

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The overlap similarity between the first and the second node.

Usage:

MATCH (m)
WHERE m.id > 2
WITH COLLECT(m) AS nodes1
MATCH (n)
WHERE n.id < 8
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.overlap_pairwise(nodes1, nodes2) YIELD node1, node2, similarity
RETURN node1, node2, similarity;

Example - cosine pairwise similarity

Example - Jaccard pairwise similarity

Example - overlap similarity