Class KendallTau

java.lang.Object
it.unimi.dsi.law.big.stat.CorrelationIndex
it.unimi.dsi.law.big.stat.KendallTau

public class KendallTau
extends CorrelationIndex
Computes Kendall's τ between two score big vectors. More precisely, this class computes the formula given by Kendall in “The treatment of ties in ranking problems”, Biometrika 33:239−251, 1945.

Note that in the literature the 1945 definition is often called τb, and τ is reserved for the original coefficient (“A new measure of rank correlation”, Biometrika 30:81−93, 1938). But this distinction is pointless, as the 1938 paper defines τ only for rankings with no ties, and the generalisation in the 1945 paper reduces exactly to the original definition if there are no ties.

Given two score vectors for a list of items, this class provides a method to compute Kendall's τ efficiently using an ExchangeCounter.

This class is a singleton: methods must be invoked on INSTANCE. Additional methods inherited from CorrelationIndex make it possible to compute the score between two files directly, or to bound the number of significant digits.

More precisely, given r_i and s_i (i = 0, 1, …, n − 1), we say that a pair (i, j), i < j, is

  • concordant iff r_i − r_j and s_i − s_j are both non-zero and have the same sign;
  • discordant iff r_i − r_j and s_i − s_j are both non-zero and have opposite signs;
  • an r-tie iff r_i − r_j = 0;
  • an s-tie iff s_i − s_j = 0;
  • a joint tie iff r_i − r_j = 0 and s_i − s_j = 0.

Let C, D, T_r, T_s, J be the number of concordant pairs, discordant pairs, r-ties, s-ties and joint ties, respectively, and N = n(n − 1)/2. Of course C + D + T_r + T_s − J = N. Kendall's τ is now

τ = (C − D) / [(N − T_r)(N − T_s)]^(1/2)
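
The definitions above can be checked on small inputs by enumerating all pairs directly. The following is only an illustrative sketch: it runs in quadratic time on plain arrays, it is not how this class computes τ (which relies on an ExchangeCounter and big arrays), and the names NaiveKendallTau and tau are hypothetical and not part of this library.

```java
public final class NaiveKendallTau {
    private NaiveKendallTau() {}

    /**
     * Computes tau by enumerating every pair (i, j) with i < j and
     * classifying it according to the definitions given in the text.
     * Hypothetical helper for illustration; quadratic in the vector length.
     */
    public static double tau(final double[] r, final double[] s) {
        if (r.length != s.length) throw new IllegalArgumentException("Vectors must have the same length");
        final int n = r.length;
        long c = 0, d = 0, tr = 0, ts = 0, joint = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                final double dr = r[i] - r[j];
                final double ds = s[i] - s[j];
                if (dr == 0) tr++;                  // r-tie
                if (ds == 0) ts++;                  // s-tie
                if (dr == 0 && ds == 0) joint++;    // joint tie (counted in both tr and ts)
                if (dr != 0 && ds != 0) {
                    if ((dr > 0) == (ds > 0)) c++;  // same sign: concordant
                    else d++;                       // opposite signs: discordant
                }
            }
        }
        final long pairs = (long) n * (n - 1) / 2;  // N = n(n - 1)/2
        // Sanity check of the identity C + D + T_r + T_s - J = N.
        if (c + d + tr + ts - joint != pairs) throw new IllegalStateException("Pair counts are inconsistent");
        // If one vector is constant the denominator is zero and the result is NaN;
        // how the library itself treats that case is not specified here.
        return (c - d) / Math.sqrt((double) (pairs - tr) * (pairs - ts));
    }

    public static void main(final String[] args) {
        // r = (1, 1, 2), s = (1, 2, 2): one concordant pair, one r-tie, one s-tie.
        System.out.println(tau(new double[] { 1, 1, 2 }, new double[] { 1, 2, 2 }));
    }
}
```

For r = (1, 1, 2) and s = (1, 2, 2) the pairs give C = 1, D = 0, T_r = 1, T_s = 1, J = 0 and N = 3, so τ = 1 / (2 · 2)^(1/2) = 0.5. With no ties the denominator reduces to N, which matches the 1938 definition.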

A main method is provided for command-line usage.

  • Field Details

    • INSTANCE

      public static final KendallTau INSTANCE
      The singleton instance of this class.
  • Method Details

    • compute

      public double compute​(double[][] v0, double[][] v1)
      Computes Kendall's τ between two score vectors.

      Note that this method must be called with some care. More precisely, the two arguments should be built on the fly in the method call, and not stored in variables, as the first argument array will be set to null during the execution of this method to free some memory: if the array is referenced elsewhere, the garbage collector will not be able to collect it.

      Specified by:
      compute in class CorrelationIndex
      Parameters:
      v0 - the first score big vector.
      v1 - the second score big vector.
      Returns:
      Kendall's τ.
    • compute

      public double compute​(int[][] v0, int[][] v1)
      Computes Kendall's τ between two integer score vectors.

      Note that this method must be called with some care. More precisely, the two arguments should be built on the fly in the method call, and not stored in variables, as the first argument array will be set to null during the execution of this method to free some memory: if the array is referenced elsewhere, the garbage collector will not be able to collect it.

      Parameters:
      v0 - the first score integer big vector.
      v1 - the second score integer big vector.
      Returns:
      Kendall's τ.
    • main

      public static void main​(String[] arg) throws NumberFormatException, IOException, JSAPException
      Throws:
      NumberFormatException
      IOException
      JSAPException