panther_similarity

panther_similarity(G, source, k=5, path_length=5, c=0.5, delta=0.1, eps=None)

Returns the Panther similarity of nodes in the graph G to the node source.

Panther is a similarity metric based on the idea that “two objects are considered to be similar if they frequently appear on the same paths” [1].
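The path co-occurrence idea can be illustrated with a toy sketch. This is not the library's implementation: the helper names `sample_path` and `cooccurrence_similarity`, the adjacency-dict graph, and the use of plain random walks as "paths" are all assumptions made for illustration only.

```python
import random
from collections import Counter


def sample_path(adj, length, rng):
    """Random walk visiting at most `length` nodes, from a uniformly random start."""
    node = rng.choice(sorted(adj))
    path = [node]
    for _ in range(length - 1):
        neighbors = adj[node]
        if not neighbors:
            break
        node = rng.choice(neighbors)
        path.append(node)
    return path


def cooccurrence_similarity(adj, source, num_paths=2000, length=5, seed=0):
    """Fraction of sampled paths on which each node appears together with `source`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_paths):
        nodes_on_path = set(sample_path(adj, length, rng))
        if source in nodes_on_path:
            counts.update(nodes_on_path - {source})
    return {node: hits / num_paths for node, hits in counts.items()}


# Hypothetical toy graph: a star with hub 0 joined to leaves 1..4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
scores = cooccurrence_similarity(adj, source=1)
```

In a star graph, every walk that reaches a leaf also passes through the hub, so the hub should share at least as many sampled paths with leaf 1 as any other leaf does.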

Parameters:
G : NetworkX graph

A NetworkX graph

source : node

Source node for which to find the top k most similar other nodes

k : int (default = 5)

The number of most similar nodes to return

path_length : int (default = 5)

The length of each randomly generated path (T in [1])

c : float (default = 0.5)

A universal positive constant used to scale the number of random paths to sample.

delta : float (default = 0.1)

The probability that the similarity \(S\) is not an \(\epsilon\)-approximation to \((R, \phi)\), where \(R\) is the number of random paths and \(\phi\) is the probability that an element sampled from a set \(A \subseteq D\), where \(D\) is the domain.

eps : float or None (default = None)

The error bound. Per [1], a good value is \(\sqrt{1/|E|}\), where \(|E|\) is the number of edges in G. If no value is provided, this recommended value is computed and used.
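As a rough sketch of how c, delta, eps and path_length might interact, the snippet below computes the default error bound and a number of random paths from a sample-size bound of the form \((c/\epsilon^2)(\log_2\binom{T}{2} + 1 + \ln(1/\delta))\). The helper name `panther_sample_size` is hypothetical, and the assumption that the library uses exactly this expression is unverified here; treat it as an illustration of the scaling only.

```python
import math


def panther_sample_size(num_edges, path_length=5, c=0.5, delta=0.1, eps=None):
    """Hypothetical helper: number of random paths implied by the bound above."""
    if eps is None:
        # Recommended default from the parameter description: sqrt(1 / |E|).
        eps = math.sqrt(1.0 / num_edges)
    # Assumed complexity term: log2 of the number of node pairs on a path, plus 1.
    complexity_term = math.log2(math.comb(path_length, 2)) + 1
    return int((c / eps**2) * (complexity_term + math.log(1.0 / delta)))


# A star graph with 10 leaves has 10 edges.
default_paths = panther_sample_size(num_edges=10)
tighter_paths = panther_sample_size(num_edges=10, eps=0.1)
```

Under this assumed bound, halving eps roughly quadruples the number of sampled paths, while lowering delta increases it only logarithmically.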

Returns:
similarity : dictionary

Dictionary mapping nodes to similarity scores (as floats). Note: the source node itself (i.e., its self-similarity) is not included in the returned dictionary.

References

[1]

Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. (2015). Panther: Fast top-k similarity search on large networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2015-August, pp. 1445–1454). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783267

Examples

>>> import networkx as nx
>>> G = nx.star_graph(10)  # hub node 0 joined to leaves 1..10
>>> sim = nx.panther_similarity(G, 0)
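One possible way to inspect the result is to rank the returned dictionary by score. The variable name `ranked` is illustrative. Because the paths are sampled randomly, the exact scores are expected to vary between runs, so no output is shown.

```python
import networkx as nx

G = nx.star_graph(10)  # hub node 0 joined to leaves 1..10
sim = nx.panther_similarity(G, 0, k=5)

# Rank the returned nodes from most to least similar to the source.
ranked = sorted(sim.items(), key=lambda item: item[1], reverse=True)
for node, score in ranked:
    print(f"node {node}: {score:.3f}")
```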