written 5.6 years ago by |
Page rank is a function that assigns a real number to each page in the web (or at least to that portion of the web that has been crawled and its links discovered)
The intent is that the higher the page rank of a page the more important it is.
Think that a web is a directed graph, where pages are the nodes and there is an arc form page p1 to page p2.
For e.g. Consider graph
- The page rank for above graph for pages A, B and C can be calculated as R = M . r --------- (1)
Where M is transition matrix.
- Above equation is valid when following conditions are met.
1. Graph is strongly connected.
2. There are no dead ends.
- While calculating page rank there are two problems:
1. Dead ends
a page which do not have any link outside is called a dead end.
To get page rank dead ends are removed recursively and page rank is calculated.
2. spider Trap
it is a set of nodes with no dead ends but no Ares out.
They cause page rank calculations to place all the page rank within the spider traps.
To avoid this problem we modify calculation of page rank by allowing each random surfer a small probability of teleporting to a random page, rather than following an out link from their current page.
The equation (1) gets modified as :
S = B M r + ( 1 - $ \beta) e/n$
Where $\beta$ is damping or teleportal factor.
The value of $\beta$ is in the range 0.8 to 0.9
The term BMr represents the case where, with probability b, the random surfer decides to follow an out link from their present page.
The term $(1 - \beta) e/n$ is a vector. Here $(1-\beta)/n$ represents the introduction, with probability $(1-\beta)$ of a new random surfer at a random page.
The teleportation factor takes care of dead ends also by introducing the term $(1-\beta)/n$
Thus the sum of components in vector r will never reach to zero because of teleportation.