Class PageRankParallelGaussSeidel


  • public class PageRankParallelGaussSeidel
    extends PageRank
    Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.

    Note: this is the implementation of choice to be used when computing PageRank. It uses less memory (one vector of doubles plus one vector of integers) and, experimentally, converges faster than any other implementation. Moreover, it scales linearly with the number of cores.

    Warning: Since we need to enumerate the predecessors a node, you must pass to the constructor the transpose of the graph.

    Technically, the iteration performed by this class is not a Gauß–Seidel iteration: we simply start a number of threads, and each thread updates a value using a Gauß–Seidel-like rule. As a result, each update uses some old and some new values: in other words, the regular splitting

    M − N = I − α (P + uTd)
    of the matrix associated to each update is always different (in a Gauß–Seidel iteration, M is upper triangular, and N is strictly lower triangular). Nonetheless, it is easy to check that M is still (up to permutation) upper triangular and invertible, independently of the specific update sequence.

    Note that the step() method is not available: due to the need for some synchronization logic, only stepUntil(StoppingCriterion) is available.

    The normDelta() method returns the following values:

    • if a suitable norm vector has been set, an upper bound on the error (the ℓ distance from the rank to be computed);
    • otherwise, an upper bound to the ℓ1 norm of the error, obtained multiplying by α / (1 − α) the ℓ1 norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).

    To be able to set a norm vector, you need to set the pseudoRank flag and use PowerSeries (setting the Markovian flag) to compute a suitable vector. To do so, you must provide an α and use the PowerSeries.MAX_RATIO_STOPPING_CRITERION. If the computation terminates without errors with maximum ratio σ, the resulting vector can be used with this class to compute pseudoranks for all α < 1 / σ (strictness is essential). Note that the ℓ1 norm of the error bounds the ℓ.

    With respect to the description of the exact algorithm in PageRankGaussSeidel, we operate a simplification that is essentially in obtaining a coherent update without incurring in too much synchronization: the rank associated with dangling nodes is computed at the end of each computation, and used unchanged throughout the whole iteration. This corresponds to permuting the array so that dangling nodes come out last.

    Author:
    Sebastiano Vigna
    See Also:
    PageRankGaussSeidel, PageRank, SpectralRanking
    • Field Detail

      • outdegree

        public int[] outdegree
        The outdegree of each node (initialized after the first computation).
      • pseudoRank

        public boolean pseudoRank
        If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank.
    • Constructor Detail

      • PageRankParallelGaussSeidel

        public PageRankParallelGaussSeidel​(ImmutableGraph transpose,
                                           int requestedThreads,
                                           Logger logger)
        Creates a new instance.
        Parameters:
        transpose - the transpose of the graph on which to compute PageRank.
        requestedThreads - the requested number of threads (0 for Runtime.availableProcessors()).
        logger - a logger that will be passed to super().
      • PageRankParallelGaussSeidel

        public PageRankParallelGaussSeidel​(ImmutableGraph transpose)
        Creates a new instance.
        Parameters:
        transpose - the transpose of the graph on which to compute PageRank.
      • PageRankParallelGaussSeidel

        public PageRankParallelGaussSeidel​(ImmutableGraph transpose,
                                           Logger logger)
        Creates a new instance.
        Parameters:
        transpose - the transpose of the graph on which to compute PageRank.
        logger - a logger that will be passed to super().
    • Method Detail

      • normVector

        public void normVector​(String normVectorFilename,
                               double sigma)
                        throws IOException
        Sets the norm vector.
        Parameters:
        normVectorFilename - a file containing a norm vector as a list of doubles in DataInput format, or null for no norm vector.
        sigma - the value for which the provided norm vector is suitable.
        Throws:
        IOException
      • normVector

        public void normVector​(double[] normVector,
                               double sigma)
        Sets the norm vector.
        Parameters:
        normVector - the new norm vector.
        sigma - the value for which the provided norm vector is suitable.
      • normDelta

        public double normDelta()
        Return the following values: if a suitable norm vector has been set, an upper bound on the error (the ℓ distance from the rank to be computed); otherwise, an upper bound to the ℓ1 norm of the error, obtained multiplying by α / (1 − α) the ℓ1 norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).
        Overrides:
        normDelta in class SpectralRanking
        Returns:
        an upper bound on the error.
      • clear

        public void clear()
        Description copied from class: SpectralRanking
        Clears all data and releases resources by nulling SpectralRanking.rank (i.e., results we no longer be available). Please extend this method to handle additional attributes.
        Overrides:
        clear in class SpectralRanking