Class PageRankParallelGaussSeidel

• public class PageRankParallelGaussSeidel
extends PageRank
Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.

Note: this is the implementation of choice to be used when computing PageRank. It uses less memory (one vector of doubles plus one vector of integers) and, experimentally, converges faster than any other implementation. Moreover, it scales linearly with the number of cores.

Warning: Since we need to enumerate the predecessors a node, you must pass to the constructor the transpose of the graph.

Technically, the iteration performed by this class is not a Gauß–Seidel iteration: we simply start a number of threads, and each thread updates a value using a Gauß–Seidel-like rule. As a result, each update uses some old and some new values: in other words, the regular splitting

M − N = I − α (P + uTd)
of the matrix associated to each update is always different (in a Gauß–Seidel iteration, M is upper triangular, and N is strictly lower triangular). Nonetheless, it is easy to check that M is still (up to permutation) upper triangular and invertible, independently of the specific update sequence.

Note that the step() method is not available: due to the need for some synchronization logic, only stepUntil(StoppingCriterion) is available.

The normDelta() method returns the following values:

• if a suitable norm vector has been set, an upper bound on the error (the ℓ distance from the rank to be computed);
• otherwise, an upper bound to the ℓ1 norm of the error, obtained multiplying by α / (1 − α) the ℓ1 norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).

To be able to set a norm vector, you need to set the pseudoRank flag and use PowerSeries (setting the Markovian flag) to compute a suitable vector. To do so, you must provide an α and use the PowerSeries.MAX_RATIO_STOPPING_CRITERION. If the computation terminates without errors with maximum ratio σ, the resulting vector can be used with this class to compute pseudoranks for all α < 1 / σ (strictness is essential). Note that the ℓ1 norm of the error bounds the ℓ.

With respect to the description of the exact algorithm in PageRankGaussSeidel, we operate a simplification that is essentially in obtaining a coherent update without incurring in too much synchronization: the rank associated with dangling nodes is computed at the end of each computation, and used unchanged throughout the whole iteration. This corresponds to permuting the array so that dangling nodes come out last.

Author:
Sebastiano Vigna
PageRankGaussSeidel, PageRank, SpectralRanking
• Field Detail

• outdegree

public int[] outdegree
The outdegree of each node (initialized after the first computation).
• pseudoRank

public boolean pseudoRank
If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank.
• Constructor Detail

• PageRankParallelGaussSeidel

public PageRankParallelGaussSeidel​(ImmutableGraph transpose,
Logger logger)
Creates a new instance.
Parameters:
transpose - the transpose of the graph on which to compute PageRank.
requestedThreads - the requested number of threads (0 for Runtime.availableProcessors()).
logger - a logger that will be passed to super().
• PageRankParallelGaussSeidel

public PageRankParallelGaussSeidel​(ImmutableGraph transpose)
Creates a new instance.
Parameters:
transpose - the transpose of the graph on which to compute PageRank.
• PageRankParallelGaussSeidel

public PageRankParallelGaussSeidel​(ImmutableGraph transpose,
Logger logger)
Creates a new instance.
Parameters:
transpose - the transpose of the graph on which to compute PageRank.
logger - a logger that will be passed to super().
• Method Detail

• normVector

public void normVector​(String normVectorFilename,
double sigma)
throws IOException
Sets the norm vector.
Parameters:
normVectorFilename - a file containing a norm vector as a list of doubles in DataInput format, or null for no norm vector.
sigma - the value for which the provided norm vector is suitable.
Throws:
IOException
• normVector

public void normVector​(double[] normVector,
double sigma)
Sets the norm vector.
Parameters:
normVector - the new norm vector.
sigma - the value for which the provided norm vector is suitable.
• normDelta

public double normDelta()
Return the following values: if a suitable norm vector has been set, an upper bound on the error (the ℓ distance from the rank to be computed); otherwise, an upper bound to the ℓ1 norm of the error, obtained multiplying by α / (1 − α) the ℓ1 norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).
Overrides:
normDelta in class SpectralRanking
Returns:
an upper bound on the error.
• clear

public void clear()
Description copied from class: SpectralRanking
Clears all data and releases resources by nulling SpectralRanking.rank (i.e., results we no longer be available). Please extend this method to handle additional attributes.
Overrides:
clear in class SpectralRanking