es.yrbcn.graph.weighted
Class WeightedPageRank

java.lang.Object
  extended by es.yrbcn.graph.weighted.WeightedPageRank
Direct Known Subclasses:
WeightedPageRankPowerMethod

public abstract class WeightedPageRank
extends Object

An abstract base class defining methods and attributes that support PageRank (or similar) computations. Features include a settable preference vector, a settable damping factor, programmable stopping criteria, step-by-step execution, and reusability.

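Users of this class should first create an instance specifying the graph over which PageRank should be computed. After doing this, the user may change the data used to compute PageRank (by manually setting the attributes alpha, preference, start), and then proceed in one of the following ways: call stepUntil(WeightedPageRank.StoppingCriterion), which calls init() and then iterates until the given stopping criterion is met; or call init() and then call step() repeatedly, for step-by-step execution.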

At any time, the user may re-initialize the computation, by calling the init() method, or (s)he may call the clear() method that gets rid of the large arrays that the implementing classes usually manage. In the latter case, the arrays are rebuilt on the next call to init().

Formulae and preferences

There are two main formulae for PageRank in the literature. The first one, which we shall call weakly preferential, patches all dangling nodes by adding a uniform transition towards all other nodes. The second one, which we shall call strongly preferential, patches all dangling nodes by adding transitions weighted according to the preference vector v. We can consider the two formulae together, letting u be a vector that is uniform in the weak case and coincides with v in the strong case.

If we denote with P the normalised adjacency matrix of the graph, with d the characteristic vector of dangling nodes, and with α the damping factor, the general equation covering both cases is

x' = x' (αP + α d u' + (1 - α) 1 v')

where a prime denotes transposition (so x' is a row vector) and 1 is the vector of all ones.

By default, weakly preferential PageRank is computed; strongly preferential PageRank computation is enforced by setting stronglyPreferential to true. In the init() method the variable preferentialAdjustment is set to null iff weakly preferential PageRank should be computed, or to preference if strongly preferential PageRank should be computed.
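
The general formula can be illustrated with a small dense power iteration. The following is only a sketch under stated assumptions, not the code of this class: the class name PreferentialPageRankSketch, its methods, the 3-node graph and the damping value 0.85 are all hypothetical, and plain arrays stand in for the graph and vector types used by the real API.

```java
/** Dense, illustrative power iteration; not the library's implementation. */
final class PreferentialPageRankSketch {

    /** Returns the uniform stochastic vector of length n. */
    static double[] uniform(int n) {
        double[] v = new double[n];
        java.util.Arrays.fill(v, 1.0 / n);
        return v;
    }

    /** Computes the product x' (αP + α d u' + (1-α) 1 v') for a row vector x'. */
    static double[] step(double[][] p, boolean[] dangling, double[] x,
                         double[] u, double[] v, double alpha) {
        int n = x.length;
        double[] next = new double[n];
        double danglingMass = 0.0; // x'd: rank currently held by dangling nodes
        double totalMass = 0.0;    // x'1: equals 1 when x is stochastic
        for (int i = 0; i < n; i++) {
            totalMass += x[i];
            if (dangling[i]) danglingMass += x[i];
            // Contribution of x' αP (rows of P for dangling nodes are all zero).
            for (int j = 0; j < n; j++) next[j] += alpha * x[i] * p[i][j];
        }
        for (int j = 0; j < n; j++) {
            next[j] += alpha * danglingMass * u[j];     // x' α d u'
            next[j] += (1 - alpha) * totalMass * v[j];  // x' (1-α) 1 v'
        }
        return next;
    }

    /** Iterates from the uniform vector; u is uniform (weak) or equal to v (strong). */
    static double[] pageRank(double[][] p, boolean[] dangling, double[] v,
                             boolean stronglyPreferential, double alpha, int iterations) {
        int n = v.length;
        double[] u = stronglyPreferential ? v : uniform(n);
        double[] x = uniform(n);
        for (int k = 0; k < iterations; k++) x = step(p, dangling, x, u, v, alpha);
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> {1, 2} with equal weights, 1 -> 2, node 2 is dangling.
        double[][] p = { { 0, 0.5, 0.5 }, { 0, 0, 1 }, { 0, 0, 0 } };
        boolean[] dangling = { false, false, true };
        double[] v = { 1, 0, 0 }; // preference concentrated on node 0
        System.out.println(java.util.Arrays.toString(pageRank(p, dangling, v, false, 0.85, 100)));
        System.out.println(java.util.Arrays.toString(pageRank(p, dangling, v, true, 0.85, 100)));
    }
}
```

With a non-uniform preference vector the two variants differ only in where the rank of dangling nodes is redistributed: uniformly in the weak case, according to v in the strong case.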


Nested Class Summary
static class WeightedPageRank.IterationNumberStoppingCriterion
          A stopping criterion that stops whenever the number of iterations exceeds a given bound.
static class WeightedPageRank.Norm
          Possible norms, with an implementation.
static class WeightedPageRank.NormDeltaStoppingCriterion
          A stopping criterion that evaluates the norm of the difference between the last two iterates, and stops if this value is smaller than a given threshold.
static interface WeightedPageRank.StoppingCriterion
          A stopping criterion is a strategy that decides when a PageRank computation should be stopped.
 
Field Summary
 double alpha
          The alpha (damping) factor.
protected  BitSet buckets
          If not null, the set of buckets of g.
static double DEFAULT_ALPHA
          The default damping factor.
static int DEFAULT_MAX_ITER
          Default maximum number of iterations.
static double DEFAULT_THRESHOLD
          The default precision.
protected  ArcLabelledImmutableGraph g
          The underlying graph.
 int iterationNumber
          The current step number (0 after initialization).
protected  org.apache.log4j.Logger logger
          A logger defined by the concrete classes.
 WeightedPageRank.Norm norm
          The current norm; WeightedPageRank.Norm.L1 is the default value.
protected  int numNodes
          The number of nodes of the underlying graph.
 DoubleList preference
          The preference vector to be used (or null if the uniform preference vector should be used).
 DoubleList preferentialAdjustment
          The vector used for preferential adjustment (u in the above general formula); it coincides with the preference vector if strongly preferential PageRank is desired, and is null otherwise.
 double[] rank
          The current rank vector.
 DoubleList start
          The starting vector to be used at the beginning of the PageRank algorithm (or null if the uniform starting vector should be used).
static double STOCHASTIC_TOLERANCE
          The tolerance allowed when verifying that a vector is stochastic.
 boolean stronglyPreferential
          Decides whether we use the strongly or weakly preferential algorithm.
 float[] sumoutweight
          The total out-weight of nodes.
 
Constructor Summary
WeightedPageRank(ArcLabelledImmutableGraph g, org.apache.log4j.Logger logger)
          Creates a new instance with uniform start vector and uniform preference vector.
 
Method Summary
static WeightedPageRank.StoppingCriterion and(WeightedPageRank.StoppingCriterion stop1, WeightedPageRank.StoppingCriterion stop2)
          Composes two stopping criteria, producing a single stopping criterion (the computation stops iff both conditions become true; lazy boolean evaluation is applied).
 it.unimi.dsi.util.Properties buildProperties(String graphBasename, String preferenceFilename, String startFilename)
          Returns a Properties object that contains all the parameters used by the computation.
 void clear()
          Clears all data.
 void init()
          Initializes the variables for PageRank computation.
protected static boolean isStochastic(DoubleList v)
          Checks whether the parameter is a stochastic vector: nonnegative entries and an L1 norm of 1.0 ± STOCHASTIC_TOLERANCE.
 double normDelta()
          Returns a norm of the difference with the previous step rank vector.
static WeightedPageRank.StoppingCriterion or(WeightedPageRank.StoppingCriterion stop1, WeightedPageRank.StoppingCriterion stop2)
          Composes two stopping criteria, producing a single stopping criterion (the computation stops iff either condition becomes true; lazy boolean evaluation is applied).
abstract  void step()
          Performs one computation step.
 void stepUntil(WeightedPageRank.StoppingCriterion stoppingCriterion)
          Calls init() and steps until a given stopping criterion is met.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_ITER

public static final int DEFAULT_MAX_ITER
Default maximum number of iterations.

See Also:
Constant Field Values

DEFAULT_THRESHOLD

public static final double DEFAULT_THRESHOLD
The default precision. We use double-precision operands, so the rounding error, which is bounded by machine epsilon, cannot be pushed below roughly 1E-16. The actual approximation error of the rank values depends on the stopping criterion adopted. For example, if the stopping criterion uses the L1 norm and the approximation error is assumed to be uniformly distributed over the nodes, DEFAULT_THRESHOLD = 1E-6 yields rank values whose individual error is at most 1E-6/numNodes.

See Also:
Constant Field Values

STOCHASTIC_TOLERANCE

public static final double STOCHASTIC_TOLERANCE
The tolerance allowed when verifying that a vector is stochastic. A vector is accepted as stochastic if its L1 norm is equal to 1 ± STOCHASTIC_TOLERANCE.

See Also:
Constant Field Values

DEFAULT_ALPHA

public static final double DEFAULT_ALPHA
The default damping factor.

See Also:
Constant Field Values

alpha

public double alpha
The alpha (damping) factor. In the random surfer interpretation, this is the probability that the surfer will follow a link in the current page.


preference

public DoubleList preference
The preference vector to be used (or null if the uniform preference vector should be used).


start

public DoubleList start
The starting vector to be used at the beginning of the PageRank algorithm (or null if the uniform starting vector should be used).


preferentialAdjustment

public DoubleList preferentialAdjustment
The vector used for preferential adjustment (u in the above general formula); it coincides with the preference vector if strongly preferential PageRank is desired, and is null otherwise.


g

protected final ArcLabelledImmutableGraph g
The underlying graph.


numNodes

protected final int numNodes
The number of nodes of the underlying graph.


buckets

protected BitSet buckets
If not null, the set of buckets of g.


rank

public double[] rank
The current rank vector.


sumoutweight

public float[] sumoutweight
The total out-weight of nodes.


iterationNumber

public int iterationNumber
The current step number (0 after initialization).


stronglyPreferential

public boolean stronglyPreferential
Decides whether we use the strongly or weakly preferential algorithm.


norm

public WeightedPageRank.Norm norm
The current norm; WeightedPageRank.Norm.L1 is the default value.


logger

protected final org.apache.log4j.Logger logger
A logger defined by the concrete classes.

Constructor Detail

WeightedPageRank

public WeightedPageRank(ArcLabelledImmutableGraph g,
                        org.apache.log4j.Logger logger)
Creates a new instance with uniform start vector and uniform preference vector.

Parameters:
g - the graph.
logger - a logger.
Method Detail

and

public static WeightedPageRank.StoppingCriterion and(WeightedPageRank.StoppingCriterion stop1,
                                                     WeightedPageRank.StoppingCriterion stop2)
Composes two stopping criteria, producing a single stopping criterion (the computation stops iff both conditions become true; lazy boolean evaluation is applied).

Parameters:
stop1 - a stopping criterion.
stop2 - a stopping criterion.
Returns:
a criterion that decides to stop as soon as both criteria are satisfied.

or

public static WeightedPageRank.StoppingCriterion or(WeightedPageRank.StoppingCriterion stop1,
                                                    WeightedPageRank.StoppingCriterion stop2)
Composes two stopping criteria, producing a single stopping criterion (the computation stops iff either condition becomes true; lazy boolean evaluation is applied).

Parameters:
stop1 - a stopping criterion.
stop2 - a stopping criterion.
Returns:
a criterion that decides to stop as soon as one of the two criteria is satisfied.
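
The composition performed by and and or can be sketched as follows. This is an illustration only: the Criterion interface, its shouldStop(int, double) signature and the two factory methods are hypothetical stand-ins, since the actual signature of WeightedPageRank.StoppingCriterion is not shown on this page. The short-circuit operators && and || provide the lazy boolean evaluation mentioned above.

```java
/** Illustrative composition of stopping criteria; names and signatures are assumptions. */
final class StoppingCriteriaSketch {

    /** Hypothetical stand-in for a stopping criterion. */
    interface Criterion {
        boolean shouldStop(int iterationNumber, double normDelta);
    }

    /** Stops once the given number of iterations is reached (exact comparison is an assumption). */
    static Criterion maxIterations(int bound) {
        return (iteration, delta) -> iteration >= bound;
    }

    /** Stops when the norm of the last difference falls below the threshold. */
    static Criterion normBelow(double threshold) {
        return (iteration, delta) -> delta < threshold;
    }

    /** Stops iff both criteria agree; the second is not evaluated if the first says no. */
    static Criterion and(Criterion stop1, Criterion stop2) {
        return (iteration, delta) -> stop1.shouldStop(iteration, delta) && stop2.shouldStop(iteration, delta);
    }

    /** Stops iff either criterion agrees; the second is not evaluated if the first says yes. */
    static Criterion or(Criterion stop1, Criterion stop2) {
        return (iteration, delta) -> stop1.shouldStop(iteration, delta) || stop2.shouldStop(iteration, delta);
    }

    public static void main(String[] args) {
        Criterion either = or(maxIterations(10), normBelow(1e-6));
        System.out.println(either.shouldStop(3, 1e-2));  // false: neither bound reached
        System.out.println(either.shouldStop(3, 1e-9));  // true: norm is small enough
        System.out.println(either.shouldStop(10, 1e-2)); // true: iteration bound reached
    }
}
```

A common pattern is to combine an iteration bound and a norm threshold with or, so that a computation that converges slowly still terminates.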

isStochastic

protected static boolean isStochastic(DoubleList v)
Checks whether the parameter is a stochastic vector: nonnegative entries and an L1 norm of 1.0 ± STOCHASTIC_TOLERANCE.

Parameters:
v - vector to check.
Returns:
true if the vector is a stochastic one.
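
The check described above might look like the following sketch. It operates on a plain double[] rather than a DoubleList, and the TOLERANCE value is a placeholder, since the actual value of STOCHASTIC_TOLERANCE is not given on this page.

```java
/** Illustrative stochasticity check; not the library's implementation. */
final class StochasticCheckSketch {

    /** Placeholder tolerance; the real STOCHASTIC_TOLERANCE value is an assumption here. */
    static final double TOLERANCE = 1E-6;

    /** Returns true if all entries are nonnegative and they sum to 1 within TOLERANCE. */
    static boolean isStochastic(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            if (x < 0) return false; // a negative entry rules the vector out immediately
            sum += x;                // entries are nonnegative, so this is the L1 norm
        }
        return Math.abs(sum - 1.0) <= TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(isStochastic(new double[] { 0.25, 0.25, 0.5 })); // true
        System.out.println(isStochastic(new double[] { 0.5, 0.6 }));        // false: sums to 1.1
        System.out.println(isStochastic(new double[] { 1.5, -0.5 }));       // false: negative entry
    }
}
```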

buildProperties

public it.unimi.dsi.util.Properties buildProperties(String graphBasename,
                                                    String preferenceFilename,
                                                    String startFilename)
Returns a Properties object that contains all the parameters used by the computation.

Parameters:
graphBasename - file name of the graph
preferenceFilename - file name of preference vector. It can be null.
startFilename - file name of the start vector, if any.
Returns:
a properties object that represents all the parameters used to calculate the rank.

init

public void init()
          throws IOException
Initializes the variables for PageRank computation.

This method initialises the starting vector.

Throws:
IOException

step

public abstract void step()
                   throws IOException
Performs one computation step.

Throws:
IOException

normDelta

public double normDelta()
Returns a norm of the difference with the previous step rank vector. The kind of norm is usually established in the constructor.

Returns:
a norm of the difference with the previous step rank vector.
Throws:
IllegalStateException - if called before the first iteration.
UnsupportedOperationException - if it is not possible to compute a norm.
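
As an illustration of what a norm of the difference between two successive iterates means, here is a minimal sketch. Only the L1 norm is named on this page; the L2 and INFINITY constants, the enum layout and the method name of are assumptions made for the example.

```java
/** Illustrative norms of the difference between two rank vectors; not the library's code. */
final class NormDeltaSketch {

    enum Norm {
        /** Sum of absolute differences. */
        L1 {
            double of(double[] current, double[] previous) {
                double sum = 0.0;
                for (int i = 0; i < current.length; i++) sum += Math.abs(current[i] - previous[i]);
                return sum;
            }
        },
        /** Euclidean length of the difference (assumed constant name). */
        L2 {
            double of(double[] current, double[] previous) {
                double sumOfSquares = 0.0;
                for (int i = 0; i < current.length; i++) {
                    double diff = current[i] - previous[i];
                    sumOfSquares += diff * diff;
                }
                return Math.sqrt(sumOfSquares);
            }
        },
        /** Largest absolute difference (assumed constant name). */
        INFINITY {
            double of(double[] current, double[] previous) {
                double max = 0.0;
                for (int i = 0; i < current.length; i++) max = Math.max(max, Math.abs(current[i] - previous[i]));
                return max;
            }
        };

        abstract double of(double[] current, double[] previous);
    }

    public static void main(String[] args) {
        double[] previous = { 0.4, 0.4, 0.2 };
        double[] current = { 0.5, 0.3, 0.2 };
        for (Norm norm : Norm.values())
            System.out.println(norm + " delta = " + norm.of(current, previous));
    }
}
```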

stepUntil

public void stepUntil(WeightedPageRank.StoppingCriterion stoppingCriterion)
               throws IOException
Calls init() and steps until a given stopping criterion is met. The criterion is checked a posteriori (i.e., after each step); this means that at least one step is performed.

Parameters:
stoppingCriterion - the stopping criterion to be used.
Throws:
IOException - if an exception occurs during computation.
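
The a posteriori check corresponds to a do-while loop. The sketch below is a hypothetical, simplified model of that control flow: the class StepUntilSketch, its Criterion interface and the placeholder step() that only increments the counter are assumptions, not the actual implementation.

```java
/** Illustrative model of the stepUntil control flow; not the library's implementation. */
final class StepUntilSketch {

    /** Hypothetical stand-in for a stopping criterion. */
    interface Criterion {
        boolean shouldStop(StepUntilSketch computation);
    }

    int iterationNumber;

    /** Resets the computation; 0 steps have been performed afterwards. */
    void init() {
        iterationNumber = 0;
    }

    /** Placeholder step: a real implementation would also update the rank vector. */
    void step() {
        iterationNumber++;
    }

    /** Re-initializes, then steps; the criterion is checked only after each step. */
    void stepUntil(Criterion criterion) {
        init();
        do {
            step();
        } while (!criterion.shouldStop(this));
    }

    public static void main(String[] args) {
        StepUntilSketch computation = new StepUntilSketch();
        computation.stepUntil(c -> true);                   // criterion already satisfied
        System.out.println(computation.iterationNumber);    // prints 1: one step is still performed
        computation.stepUntil(c -> c.iterationNumber >= 5); // init() resets the counter first
        System.out.println(computation.iterationNumber);    // prints 5
    }
}
```

Because stepUntil calls init() itself, any state from a previous run is discarded before the first step.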

clear

public void clear()
Clears all data. After calling this method, data about the last PageRank computation are cleared, and you should call init() again before computing PageRank again.