pagerank_online
Online PageRank is a streaming algorithm made for calculating PageRank in a graph streaming scenario. Incremental- local changes are introduced in the algorithm to prevent users from recalculating PageRank values each time a change occurs in the graph (something is added or deleted).
To make it as fast as possible, the online algorithm is only the approximation of PageRank but carrying the same information - the likelihood of random walk ending in a particular vertex. The work is based on "Fast Incremental and Personalized PageRank" 1, where authors are deeply focused on providing the streaming experience of a highly popular graph algorithm.
Approximating PageRank is done simply by exploring random walks and calculating
the frequency of a node within all walks. R
walks are sampled by using a
random walk with a stopping probability of eps
.Therefore, on average, walks
would have a length of 1/eps
. Approximative PageRank is based on the formula
below:
RankApprox(v) = X_v / (n * R / eps)
Where X_v
is the number of walks where the node v
appears. The theorem
written in the paper explains that RankApprox(v) is sharply concentrated around
its expectation, which is Rank(v).
Usage
Online PageRank should be used in a specific way. To set the parameters, the
user should call a set()
procedure. This function also sets the context of a
streaming algorithm. get()
function only returns the resulting values stored
in a cache. Therefore, if you try to get values before first calling set()
,
the run will fail with a proper message.
To make the incremental flow, you should set the proper trigger. For that, we'll
use the update()
function:
CREATE TRIGGER pagerank_trigger
(BEFORE | AFTER) COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
SET node.rank = rank;
Finally, the reset()
function resets the context and enables the user to start
new runs.
1 Fast Incremental and Personalized PageRank, Bahman Bahmani et al.
Trait | Value |
---|---|
Module type | algorithm |
Implementation | C++ |
Graph direction | directed |
Edge weights | unweighted |
Parallelism | sequential |
Procedures
If you want to execute this algorithm on graph projections, subgraphs or portions of the graph, be sure to check out the guide on How to run a MAGE module on subgraphs.
set(walks_per_node, walk_stop_epsilon)
Input:
walks_per_node: integer (default=10)
➡ Number of sampled walks per node.walk_stop_epsilon: double (default=0.1)
➡ The probability of stopping when deriving the random walk. On average, it will create walks of length1 / walk_stop_epsilon
.
Output:
node
➡ Node in the graph, for which PageRank is calculated.rank
➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.
Usage:
CALL pagerank_online.set(100, 0.2)
YIELD node, rank;
get(max_iterations, damping_factor, stop_epsilon)
* This should be used if the trigger has been set or a set
function has
been called before adding changes to the graph.
Output:
node
➡ Node in the graph, for which PageRank is calculated.rank
➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.
Usage:
CALL pagerank_online.get()
YIELD node, rank;
update(created_vertices, created_edges, deleted_vertices, deleted_edges)
Input:
created_vertices
➡ Vertices that were created in the last transaction.created_edges
➡ Edges created in a period from the last transaction.deleted_vertices
➡ Vertices deleted from the last transaction.deleted_edges
➡ Edges deleted from the last transaction.
Output:
node
➡ Node in the graph, for which PageRank is calculated.rank
➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.
Usage:
CREATE TRIGGER pagerank_trigger
(BEFORE | AFTER) COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
SET node.rank = rank;
Example
- Step 1: Input graph
- Step 2: Set parameters and trigger
- Step 3: Load commands
- Step 4: Running command
- Step 5: Results
CALL pagerank_online.set(100, 0.2) YIELD *;
CREATE TRIGGER pagerank_trigger
BEFORE COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
SET node.rank = rank;
MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b);
MERGE (a:Node {id: 4}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b);
MATCH (node)
RETURN node.id AS node_id, node.rank AS rank;
+-----------+-----------+
| node_id | rank |
+-----------+-----------+
| 0 | 0.225225 |
| 1 | 0.225225 |
| 2 | 0.225225 |
| 3 | 0.0675676 |
| 4 | 0.0765766 |
| 5 | 0.0585586 |
| 6 | 0.121622 |
+-----------+-----------+