Dynamic PageRank
Description
In the domain of estimating the importance of graph nodes, PageRank is arguably the most popular tool. Today, the most popular search engine in the world, Google, owes its popularity solely to this algorithm, developed in the early days by its founders.
If we present nodes as pages and directed edges between them as links the PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.
The need for its dynamic implementation arose at the moment when nodes and edges arrive in a short period of time. A large number of changes would result in either inconsistent information upon arrival or restarting the algorithm over the entire graph each time the graph changes. Since such changes occur frequently, the dynamic implementation allows the previously processed state to be preserved, and new changes are updated in such a way that only the neighborhood of the arriving entity is processed at a constant time. This saves time and allows us to have updated and accurate information about the new values of the PageRank.
There are also some disadvantages of this approach, and that is that such an approach does not guarantee an explicitly correct solution but its approximation. Such a trade-off is common in computer science but allows fast execution and guarantees that at a large scale such an approximation approaches the correct result.
Valuable work explaining how to quickly calculate these values was developed by Bahmani et. al., engineers from Stanford and Twitter. The paper is worth reading at Fast Incremental and Personalized PageRank 📖.
Materials
Implementation
Dynamic PageRank is implemented as part of the MAGE project. Be sure to check it out in the link above. ☝️
Blog posts
An obvious and trivial solution to the previous problem would be: whenever a new data point or connection comes into our platform, simply recalculate the influence measurement with the PageRank algorithm. This is very simple, and it works. However, the story changes over time when your business becomes more popular. You begin to have more work and therefore datapoints keep coming at a faster pace, accumulated data keeps rising in size and your algorithm fails to deliver valuable information efficiently.
PageRank is the perfect tool with which we can find the most Christmassy person. The person that is going to win is the one that got retweeted by other Christmassy people. As data comes at high speed, fortunately, MAGE has upgraded the good old PageRank in its garage and prepared a dynamic version - Dynamic Pagerank.
Use cases
Although PageRank can be used in a variety of ways, the need for its dynamic implementation lies in systems that use temporal graphs.
One of the most dynamic environments is definitely social networks. Every moment, events arrive that contain the creation of new users, posts, messages, and the graph itself grows at an enormous rate. Therefore, the need for an incremental measure of influence is high in companies such as: Meta, Twitter or Pinterest.
Importance analysis can be highlighted as crucial at the moment when internet traffic is analyzed. If we have some kind of internal, supervised system, PageRank can find out in which nodes the most information flows, and where to place the most important elements of the system resistant to large amounts of traffic.