Class PageRankParallelGaussSeidel


public class PageRankParallelGaussSeidel
extends PageRank
Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.

Note: this is the implementation of choice to be used when computing PageRank. It uses less memory (one vector of doubles plus one vector of integers) and, experimentally, converges faster than any other implementation. Moreover, it scales linearly with the number of cores.

Warning: Since we need to enumerate the predecessors a node, you must pass to the constructor the transpose of the graph.

Technically, the iteration performed by this class is not a Gauß–Seidel iteration: we simply start a number of threads, and each thread updates a value using a Gauß–Seidel-like rule. As a result, each update uses some old and some new values: in other words, the regular splitting

M − N = I − α (P + uTd)
of the matrix associated to each update is always different (in a Gauß–Seidel iteration, M is upper triangular, and N is strictly lower triangular). Nonetheless, it is easy to check that M is still (up to permutation) upper triangular and invertible, independently of the specific update sequence.

Note that the step() method is not available: due to the need for some synchronization logic, only stepUntil(StoppingCriterion) is available.

The normDelta() method returns the following values:

  • if a suitable norm vector has been set, an upper bound on the error (the ℓ distance from the rank to be computed);
  • otherwise, an upper bound to the ℓ1 norm of the error, obtained multiplying by α / (1 − α) the ℓ1 norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).

To be able to set a norm vector, you need to set the pseudoRank flag and use PowerSeries (setting the Markovian flag) to compute a suitable vector. To do so, you must provide an α and use the PowerSeries.MAX_RATIO_STOPPING_CRITERION. If the computation terminates without errors with maximum ratio σ, the resulting vector can be used with this class to compute pseudoranks for all α < 1 / σ (strictness is essential). Note that the ℓ1 norm of the error bounds the ℓ.

With respect to the description of the exact algorithm in PageRankGaussSeidel, we operate a simplification that is essentially in obtaining a coherent update without incurring in too much synchronization: the rank associated with dangling nodes is computed at the end of each computation, and used unchanged throughout the whole iteration. This corresponds to permuting the array so that dangling nodes come out last.

Author:
Sebastiano Vigna
See Also:
PageRankGaussSeidel, PageRank, SpectralRanking
  • Field Details

    • outdegree

      public int[] outdegree
      The outdegree of each node (initialized after the first computation).
    • pseudoRank

      public boolean pseudoRank
      If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank.
  • Constructor Details

    • PageRankParallelGaussSeidel

      public PageRankParallelGaussSeidel​(ImmutableGraph transpose, int requestedThreads, Logger logger)
      Creates a new instance.
      Parameters:
      transpose - the transpose of the graph on which to compute PageRank.
      requestedThreads - the requested number of threads (0 for Runtime.availableProcessors()).
      logger - a logger that will be passed to super().
    • PageRankParallelGaussSeidel

      public PageRankParallelGaussSeidel​(ImmutableGraph transpose)
      Creates a new instance.
      Parameters:
      transpose - the transpose of the graph on which to compute PageRank.
    • PageRankParallelGaussSeidel

      public PageRankParallelGaussSeidel​(ImmutableGraph transpose, Logger logger)
      Creates a new instance.
      Parameters:
      transpose - the transpose of the graph on which to compute PageRank.
      logger - a logger that will be passed to super().
  • Method Details