Class SpectralRanking

java.lang.Object
it.unimi.dsi.law.rank.SpectralRanking
Direct Known Subclasses:
DominantEigenvectorParallelPowerMethod, KatzParallelGaussSeidel, LeftSingularVectorParallelPowerMethod, PageRank, PowerSeries

public abstract class SpectralRanking
extends Object
A base abstract class defining methods and attributes supporting computations of graph spectral rankings such as the dominant eigenvector, PageRank or Katz's index. For some elaboration on the relationships between these rankings, see “Spectral Ranking” by Sebastiano Vigna.

The usage pattern for concrete subclasses is as follows: first create an instance specifying the graph over which the spectral ranking should be computed. Then, modify public available attributes to fine-tune the ranking algorithm. Finally,

  • either call the init() method, which initializes the state, and then repeatedly call the step() method; every call will compute the next approximation; the current approximation is contained in the rank attribute;
  • or call the stepUntil(SpectralRanking.StoppingCriterion) method, which calls init() and then iterates the step() method until a certain stopping criterion is met; a SpectralRanking.StoppingCriterion is a class that decides whether the iteration can be stopped. The SpectralRanking class provides two ready-to-use implementations of stopping criteria:
    • one that decides to stop depending on whether the value returned by normDelta() (if implemented) is smaller than a certain threshold.
    • another that decides to stop on the basis of the number of iterations.

    Moreover, this class provides two static methods that compose stopping criteria in a conjunctive or disjunctive way.

At any time, the user may re-initialize the computation, by calling the init() method, or call clear() to get rid of the large arrays that the implementing subclasses usually manage. In the latter case, the arrays are rebuilt on the next call to init().

Choosing a threshold

The stopping threshold used by SpectralRanking.NormStoppingCriterion should be set so to obtain a reasonable number of significant digits. In some cases this requires to adapt the threshold to the graph: for instance, PageRank is a stochastic vector, so its entries tend to be very small (of order 1/n, where n is the number of nodes in the graph). You should be wary about digits that are not significant, as they can lead to very biased results when comparing using Kendall's τ rankings with a significant amount of ties (see “Traps and pitfalls of topic-biased PageRank”, by Paolo Boldi, Roberto Posenato, Massimo Santini, and Sebastiano Vigna, WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph, volume 4936 of Lecture Notes in Computer Science, pages 107−116, Springer, 2008).

Note that, depending on the implementation, normDelta() might return a bound on the norm of the difference between the current iterate and the target ranking (e.g., PageRankPowerSeries and PageRankParallelPowerSeries, or any Gauß–Seidel implementation using a w-norm), a tentative estimate of the same bound (e.g., DominantEigenvectorParallelPowerMethod) or simply the norm of the difference between two successive iterates. Precision up to the chosen threshold is guaranteed only in the first case.

Author:
Sebastiano Vigna
  • Field Details

    • DEFAULT_THRESHOLD

      public static final double DEFAULT_THRESHOLD
      Default threshold (note that this value is used as a default by main methods).
      See Also:
      Constant Field Values
    • DEFAULT_MAX_ITER

      public static final int DEFAULT_MAX_ITER
      Default maximum number of iterations (note that this value is used as a default by main methods).
      See Also:
      Constant Field Values
    • DEFAULT_NORM

      public static final Norm DEFAULT_NORM
      The default norm (Norm.L_INFINITY).
    • STOCHASTIC_TOLERANCE

      protected static final double STOCHASTIC_TOLERANCE
      The admitted tolerance in the verification that a vector is a stochastic one. A stochastic vector is nonnegative and has ℓ1 norm equal to 1 ± STOCHASTIC_TOLERANCE.
      See Also:
      Constant Field Values
    • graph

      public final ImmutableGraph graph
      The graph.
    • n

      public final int n
      The number of nodes of graph, cached.
    • logger

      public final Logger logger
      A logger defined by the implementing subclasses.
    • rank

      public double[] rank
      The current rank vector.
    • iteration

      public int iteration
      The current step (0 after initialization).
  • Constructor Details

    • SpectralRanking

      public SpectralRanking​(ImmutableGraph graph, Logger logger)
      Creates a new instance.
      Parameters:
      graph - the graph.
      logger - a logger.
  • Method Details

    • and

      Composes two stopping criteria, producing a single stopping criterion (the computation stops iff both conditions become true; lazy boolean evaluation is applied).
      Parameters:
      stop1 - a stopping criterion.
      stop2 - a stopping criterion.
      Returns:
      a criterion that decides to stop as soon as both criteria are satisfied.
    • or

      Composes two stopping criteria, producing a single stopping criterion (the computation stops iff either condition becomes true; lazy boolean evaluation is applied).
      Parameters:
      stop1 - a stopping criterion.
      stop2 - a stopping criterion.
      Returns:
      a criterion that decides to stop as soon as one of the two criteria is satisfied.
    • isStochastic

      protected static boolean isStochastic​(DoubleList v)
      Commodity method checking whether a vector is stochastic (nonnegative entries summing up to one within STOCHASTIC_TOLERANCE).

      This method uses Kahan's summation algorithm.

      Parameters:
      v - the vector to check.
      Returns:
      true if the vector is stochastic.
    • buildProperties

      public Properties buildProperties​(String graphBasename)
      Returns a Properties object that contains all parameters used by the computation.

      Implementing subclasses should extends this method by calling super() and setting additional properties on the resulting Properties.

      Parameters:
      graphBasename - basename of the graph
      Returns:
      a properties object that represent all the parameters used to calculate the ranking.
    • init

      public void init() throws IOException
      Initializes the rank vector, zeroes iteration and logs basic data. Please extend this method to handle additional attributes.
      Throws:
      IOException
    • step

      public abstract void step() throws IOException
      Performs one computation step.
      Throws:
      IOException
    • normDelta

      public double normDelta()
      Returns the norm of an estimation of the distance to the limit of the iterative process: depending on the implementation, this can be an actual bound or, for example, just the difference between the last two approximations.

      This method must be implemented by concrete subclasses if you want to use SpectralRanking.NormStoppingCriterion.

      Returns:
      the norm of an estimation of the distance to the limit.
      Throws:
      IllegalStateException - if called before the first iteration.
      UnsupportedOperationException - if it is not possible to compute a norm.
    • stepUntil

      public void stepUntil​(SpectralRanking.StoppingCriterion stoppingCriterion) throws IOException
      Calls init() and steps until a given stopping criterion is met. The criterion is checked a posteriori (i.e., after each step); this means that at least one step is performed.
      Parameters:
      stoppingCriterion - the stopping criterion to be used.
      Throws:
      IOException
    • clear

      public void clear()
      Clears all data and releases resources by nulling rank (i.e., results we no longer be available). Please extend this method to handle additional attributes.
    • approximateNormVector

      protected byte[] approximateNormVector​(DoubleIterator doubleIterator)
      Returns a compact logarithmic approximation of a norm vector.
      Parameters:
      doubleIterator - an iterator enumerating a norm vector.
      Returns:
      an array of bytes containing the opposite of a lower bound on the binary logarithm of the doubles returned by the iterator.