Class WeightedTau

java.lang.Object
it.unimi.dsi.law.big.stat.CorrelationIndex
it.unimi.dsi.law.big.stat.WeightedTau

public class WeightedTau
extends CorrelationIndex
Computes the weighted τ between two big score vectors. More precisely, this class computes the formula given by Sebastiano Vigna in “A weighted correlation index for rankings with ties”, Proc. of the 24th International World Wide Web Conference, pages 1166-1176, 2015, ACM Press, using the algorithm therein described (see details below).

Given two score vectors for a list of items, this class provides a method to compute the weighted τ efficiently using an ExchangeWeigher.

Instances of this class are immutable. At creation time you can specify a weigher that turns rank indices into weights, and whether to combine weights additively or multiplicatively. Ready-made weighers include HYPERBOLIC_WEIGHER, which is the weigher of choice; alternatives include LOGARITHMIC_WEIGHER and QUADRATIC_WEIGHER. Additional methods inherited from CorrelationIndex make it possible to compute the weighted τ directly between two files, to bound the number of significant digits, or to reverse the standard association between scores and ranks (by default, a larger score corresponds to a higher rank, i.e., to a smaller rank index; the largest score gets rank 0).

The weighted τ is defined as follows: consider a rank function ρ (returning natural numbers or ∞) that provides a ground truth—it tells us which elements are more or less important. Consider also a weight function w(−, −) associating with each pair of ranks a nonnegative real number. We define the rank-weighted τ by

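$$\langle r, s \rangle_{\rho,w} = \sum_{i<j} \operatorname{sgn}(r_i - r_j)\,\operatorname{sgn}(s_i - s_j)\,w(\rho(i), \rho(j))$$
$$\lVert r \rVert_{\rho,w} = \langle r, r \rangle_{\rho,w}^{1/2}$$
$$\tau_{\rho,w}(r, s) = \frac{\langle r, s \rangle_{\rho,w}}{\lVert r \rVert_{\rho,w}\,\lVert s \rVert_{\rho,w}}.$$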
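The definition above can be evaluated directly by looping over all pairs. The following is a minimal sketch of that brute-force evaluation, not the efficient algorithm this class uses; the class name `NaiveRankWeightedTau` and the `RankWeight` interface are hypothetical names introduced only for illustration, and plain `double[]`/`long[]` arrays stand in for big arrays.

```java
/** Brute-force, quadratic-time evaluation of the rank-weighted tau (illustration only). */
final class NaiveRankWeightedTau {
    /** Hypothetical interface: the weight w(a, b) of an exchange between two ranks. */
    interface RankWeight {
        double of(long rankA, long rankB);
    }

    /** Weighted inner product: sum over i < j of sgn(r_i - r_j) sgn(s_i - s_j) w(rho(i), rho(j)). */
    static double inner(double[] r, double[] s, long[] rho, RankWeight w) {
        double sum = 0;
        for (int i = 0; i < r.length; i++)
            for (int j = i + 1; j < r.length; j++)
                sum += Math.signum(r[i] - r[j]) * Math.signum(s[i] - s[j]) * w.of(rho[i], rho[j]);
        return sum;
    }

    /** tau_{rho,w}(r, s); the result is NaN if either vector is constant (zero norm). */
    static double tau(double[] r, double[] s, long[] rho, RankWeight w) {
        return inner(r, s, rho, w) / Math.sqrt(inner(r, r, rho, w) * inner(s, s, rho, w));
    }

    public static void main(String[] args) {
        double[] r = { 3, 2, 1 };
        double[] s = { 3, 1, 2 };
        long[] rho = { 0, 1, 2 }; // ground-truth ranks: item 0 is the most important
        // Assumed weight function for this example: hyperbolic weigher, combined additively.
        RankWeight w = (a, b) -> 1.0 / (a + 1) + 1.0 / (b + 1);
        System.out.println(tau(r, s, rho, w)); // mathematically 6/11
    }
}
```

In the example, the only discordant pair is the one involving the two least important items, so it is penalized less than a discordance at the top would be.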

The weight function can be specified by giving a weigher f (e.g., HYPERBOLIC_WEIGHER) and a combination strategy, which can be additive or multiplicative. The weight of the exchange between i and j is then f(i) ● f(j), where ● is the chosen combinator.
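As a small illustration of the two combination strategies, the sketch below builds the exchange weight from a weigher f. It uses the JDK's `LongToDoubleFunction` as a stand-in for the weigher type, and `ExchangeWeightSketch` and `exchangeWeight` are hypothetical names, not part of this class.

```java
import java.util.function.LongToDoubleFunction;

final class ExchangeWeightSketch {
    /** Weight of the exchange between ranks i and j: f(i) + f(j), or f(i) * f(j) if multiplicative. */
    static double exchangeWeight(LongToDoubleFunction f, long i, long j, boolean multiplicative) {
        final double fi = f.applyAsDouble(i);
        final double fj = f.applyAsDouble(j);
        return multiplicative ? fi * fj : fi + fj;
    }

    public static void main(String[] args) {
        LongToDoubleFunction hyperbolic = x -> 1.0 / (x + 1);
        System.out.println(exchangeWeight(hyperbolic, 0, 1, false)); // 1 + 1/2, prints 1.5
        System.out.println(exchangeWeight(hyperbolic, 0, 1, true));  // 1 * 1/2, prints 0.5
    }
}
```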

Now, consider the rank function $\rho_{r,s}$ induced by the lexicographical order by r and s. We define

$$\tau_w = \frac{\tau_{\rho_{r,s},w} + \tau_{\rho_{s,r},w}}{2}.$$

In particular, the (additive) hyperbolic τ is defined by the weight function h(i) = 1 / (i + 1) combined additively:

$$\tau_h = \frac{\tau_{\rho_{r,s},h} + \tau_{\rho_{s,r},h}}{2}.$$
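The symmetrization can be sketched as follows: compute the rank induced by (r, s), the rank induced by (s, r), evaluate the rank-weighted τ for each, and average. This is a brute-force illustration under stated assumptions, not this class's implementation: all names are hypothetical, weights are combined additively, larger scores get smaller rank indices, and items tied in both vectors receive consecutive ranks in index order.

```java
import java.util.Arrays;
import java.util.function.LongToDoubleFunction;

final class SymmetrizedTauSketch {
    /** Rank induced by decreasing primary score, with ties broken by decreasing secondary score. */
    static long[] lexicographicRank(double[] primary, double[] secondary) {
        Integer[] order = new Integer[primary.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Stable sort: items tied in both vectors keep their index order (an assumption of this sketch).
        Arrays.sort(order, (a, b) -> {
            int c = Double.compare(primary[b], primary[a]);
            return c != 0 ? c : Double.compare(secondary[b], secondary[a]);
        });
        long[] rank = new long[order.length];
        for (int pos = 0; pos < order.length; pos++) rank[order[pos]] = pos;
        return rank;
    }

    /** tau_{rho,w}(r, s) with additive weights f(rho(i)) + f(rho(j)), in quadratic time. */
    static double tau(double[] r, double[] s, long[] rho, LongToDoubleFunction f) {
        double rs = 0, rr = 0, ss = 0;
        for (int i = 0; i < r.length; i++)
            for (int j = i + 1; j < r.length; j++) {
                double w = f.applyAsDouble(rho[i]) + f.applyAsDouble(rho[j]);
                double dr = Math.signum(r[i] - r[j]);
                double ds = Math.signum(s[i] - s[j]);
                rs += dr * ds * w;
                rr += dr * dr * w;
                ss += ds * ds * w;
            }
        return rs / Math.sqrt(rr * ss); // NaN if either vector is constant
    }

    /** Average of the two rank-weighted taus, using rho_{r,s} and rho_{s,r} as ground truth. */
    static double symmetrizedTau(double[] r, double[] s, LongToDoubleFunction f) {
        return (tau(r, s, lexicographicRank(r, s), f) + tau(r, s, lexicographicRank(s, r), f)) / 2;
    }

    public static void main(String[] args) {
        double[] r = { 3, 2, 1 };
        double[] s = { 3, 1, 2 };
        LongToDoubleFunction h = i -> 1.0 / (i + 1); // hyperbolic weigher
        System.out.println(symmetrizedTau(r, s, h));
    }
}
```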

The methods inherited from CorrelationIndex compute the formula above using the provided weigher and combination method. A ready-made instance HYPERBOLIC can be used to compute the additive hyperbolic τ. An ad hoc method can instead compute $\tau_{\rho,w}$.

A main method is provided for command-line usage.

  • Field Details

    • HYPERBOLIC_WEIGHER

      public static final Long2DoubleFunction HYPERBOLIC_WEIGHER
      A hyperbolic weigher (the default one). Rank x has weight 1 / (x + 1).
    • QUADRATIC_WEIGHER

      public static final Long2DoubleFunction QUADRATIC_WEIGHER
      A quadratic weigher. Rank x has weight 1 / (x + 1)^2.
    • LOGARITHMIC_WEIGHER

      public static final Long2DoubleFunction LOGARITHMIC_WEIGHER
      A logarithmic weigher. Rank x has weight 1 / ln(x + e).
    • ZERO_WEIGHER

      public static final Long2DoubleFunction ZERO_WEIGHER
      A constant zero weigher.
    • HYPERBOLIC

      public static final WeightedTau HYPERBOLIC
      A singleton instance of the symmetric hyperbolic additive τ.
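For reference, the formulas of the ready-made weighers can be written as one-line functions. This is only a sketch of the documented formulas: it uses the JDK's `LongToDoubleFunction` rather than the `Long2DoubleFunction` type of the actual fields, and the class name `WeigherSketch` is hypothetical.

```java
import java.util.function.LongToDoubleFunction;

final class WeigherSketch {
    /** Rank x has weight 1 / (x + 1). */
    static final LongToDoubleFunction HYPERBOLIC = x -> 1.0 / (x + 1);
    /** Rank x has weight 1 / (x + 1)^2. */
    static final LongToDoubleFunction QUADRATIC = x -> 1.0 / ((x + 1.0) * (x + 1.0));
    /** Rank x has weight 1 / ln(x + e). */
    static final LongToDoubleFunction LOGARITHMIC = x -> 1.0 / Math.log(x + Math.E);
    /** Every rank has weight 0. */
    static final LongToDoubleFunction ZERO = x -> 0.0;

    public static void main(String[] args) {
        // Print the weight each weigher assigns to a few ranks.
        for (long x : new long[] { 0, 1, 9, 99 })
            System.out.println(x + "\t" + HYPERBOLIC.applyAsDouble(x) + "\t"
                + QUADRATIC.applyAsDouble(x) + "\t" + LOGARITHMIC.applyAsDouble(x));
    }
}
```

All three nonzero weighers assign weight 1 to rank 0 and decrease with the rank; the logarithmic one decays most slowly and the quadratic one most quickly.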
  • Constructor Details

    • WeightedTau

      public WeightedTau()
      Create an additive hyperbolic τ.
    • WeightedTau

      public WeightedTau​(Long2DoubleFunction hyperbolicWeigher)
      Create an additive weighted τ using the specified weigher.
      Parameters:
      hyperbolicWeigher - a weigher.
    • WeightedTau

      public WeightedTau​(Long2DoubleFunction hyperbolicWeigher, boolean multiplicative)
      Create an additive or multiplicative weighted τ using the specified weigher and combination strategy.
      Parameters:
      hyperbolicWeigher - a weigher.
      multiplicative - if true, weights are combined multiplicatively, rather than additively.
  • Method Details

    • compute

      public double compute​(double[][] v0, double[][] v1)
      Computes the symmetrized weighted τ between two score vectors.
      Specified by:
      compute in class CorrelationIndex
      Parameters:
      v0 - the first score big vector.
      v1 - the second score big vector.
      Returns:
      the symmetric weighted τ.
    • compute

      public double compute​(double[][] v0, double[][] v1, long[][] rank)
      Computes the weighted τ between two score big vectors, given a reference rank.

      Note that this method must be called with some care. More precisely, the two score arguments (v0 and v1) should be built on the fly in the method call and not stored in variables, because the first argument array will be nulled during the execution of this method to free some memory: if the array is still referenced elsewhere, the garbage collector will not be able to collect it.

      Parameters:
      v0 - the first score big vector.
      v1 - the second score big vector.
      rank - the “ground truth” ranking used to weight exchanges, or null to use the ranking induced lexicographically by v1 and v0 as ground truth.
      Returns:
      the weighted τ.
    • main

      public static void main​(String[] arg) throws NumberFormatException, IOException, JSAPException
      Throws:
      NumberFormatException
      IOException
      JSAPException