Class KendallTau
public class KendallTau extends CorrelationIndex
Note that in the literature the 1945 definition is often called τb, and τ is reserved for the original coefficient (“A new measure of rank correlation”, Biometrika 30:81−93, 1938). But this distinction is pointless, as the 1938 paper defines τ only for rankings with no ties, and the generalisation in the 1945 paper reduces exactly to the original definition if there are no ties.
Given two score vectors for a list of items, this class provides a method to compute Kendall's τ efficiently using an ExchangeCounter.
This class is a singleton: methods must be invoked on INSTANCE. Additional methods inherited from CorrelationIndex make it possible to compute the score between two files directly, or to bound the number of significant digits.
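The efficient computation mentioned above relies on counting exchanges. The library's ExchangeCounter is not reproduced here. As an illustration only, the following sketch uses the textbook merge-sort technique for counting the pairs that are out of order in an array, which is the usual way to avoid examining all n(n − 1)/2 pairs. The class and method names (ExchangeCountSketch, countExchanges) are hypothetical and are not part of the library.

```java
import java.util.Arrays;

// Hypothetical illustration; not the library's ExchangeCounter.
class ExchangeCountSketch {

    /** Counts the pairs (i, j), i < j, with a[i] > a[j], in O(n log n) time. */
    static long countExchanges(double[] a) {
        // Work on a copy so the caller's array is left untouched.
        double[] work = Arrays.copyOf(a, a.length);
        return sortAndCount(work, new double[a.length], 0, a.length);
    }

    // Sorts a[from..to) by merge sort and returns the number of out-of-order pairs.
    private static long sortAndCount(double[] a, double[] tmp, int from, int to) {
        if (to - from < 2) return 0;
        int mid = (from + to) >>> 1;
        long count = sortAndCount(a, tmp, from, mid) + sortAndCount(a, tmp, mid, to);
        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            if (a[i] <= a[j]) {
                tmp[k++] = a[i++];
            } else {
                // a[j] is smaller than every element still waiting in the left half.
                count += mid - i;
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < to) tmp[k++] = a[j++];
        System.arraycopy(tmp, from, a, from, to - from);
        return count;
    }

    public static void main(String[] args) {
        // The out-of-order pairs in {3, 1, 2} are (3, 1) and (3, 2).
        System.out.println(countExchanges(new double[] { 3, 1, 2 }));
    }
}
```

When there are no ties, sorting the items by the first score vector and counting the exchanges needed to sort the second one gives the number of discordant pairs. Handling ties requires additional bookkeeping that this sketch omits.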
More precisely, given ri and si (i = 0, 1, …, n − 1), we say that a pair (i, j), i<j, is
- concordant iff ri − rj and si − sj are both non-zero and have the same sign;
- discordant iff ri − rj and si − sj are both non-zero and have opposite signs;
- an r-tie iff ri − rj = 0;
- an s-tie iff si − sj = 0;
- a joint tie iff ri − rj = 0 and si − sj = 0.
Let C, D, Tr, Ts, J be the number of concordant pairs, discordant pairs, r-ties, s-ties and joint ties, respectively, and N = n(n − 1)/2. Of course C + D + Tr + Ts − J = N. Kendall's τ is now
τ = (C − D) / [(N − Tr)(N − Ts)]^{1/2}
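The definition above can be checked against a direct quadratic-time computation. The following is a minimal reference sketch that counts every pair explicitly. It is not the algorithm used by this class, and the names KendallTauSketch and tau are hypothetical. It assumes plain double[] inputs of equal length rather than big arrays.

```java
// Hypothetical reference implementation of the definition; O(n^2), for small inputs only.
class KendallTauSketch {

    /** Computes (C - D) / sqrt((N - Tr)(N - Ts)) by examining every pair i < j. */
    static double tau(double[] r, double[] s) {
        if (r.length != s.length) throw new IllegalArgumentException("Vectors differ in length");
        final int n = r.length;
        long concordant = 0, discordant = 0, rTies = 0, sTies = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                final double dr = r[i] - r[j];
                final double ds = s[i] - s[j];
                if (dr == 0) rTies++; // r-tie (a joint tie counts here and below)
                if (ds == 0) sTies++; // s-tie
                if (dr != 0 && ds != 0) {
                    if ((dr > 0) == (ds > 0)) concordant++;
                    else discordant++;
                }
            }
        }
        final long pairs = (long) n * (n - 1) / 2; // N
        // The result is NaN when either vector is constant (zero denominator);
        // this sketch makes no claim about how the library treats that case.
        return (concordant - discordant) / Math.sqrt((double) (pairs - rTies) * (pairs - sTies));
    }

    public static void main(String[] args) {
        // Here C = 4, D = 0, Tr = 1, Ts = 1, N = 6, so the result is 4 / 5.
        System.out.println(tau(new double[] { 1, 2, 2, 3 }, new double[] { 1, 2, 3, 3 }));
    }
}
```

With no ties, Tr = Ts = 0 and the denominator is simply N, so the value reduces to (C − D)/N.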
A main method is provided for command-line usage.
Field Summary
Modifier and Type    Field       Description
static KendallTau    INSTANCE    The singleton instance of this class.
Method Summary
Methods inherited from class it.unimi.dsi.law.big.stat.CorrelationIndex
compute, compute, compute, computeDoubles, computeDoubles, computeFloats, computeFloats, computeInts, computeInts, computeLongs, computeLongs, loadAsDoubles
Field Details
INSTANCE
The singleton instance of this class.
Method Details
compute
public double compute(double[][] v0, double[][] v1)
Computes Kendall's τ between two score vectors.
Note that this method must be called with some care: the two arguments should be built on the fly in the method call and not stored in variables, because the first argument array will be set to null during the execution of this method to free some memory. If the array is referenced elsewhere, the garbage collector will not be able to collect it.
Specified by:
compute in class CorrelationIndex
Parameters:
v0 - the first score big vector.
v1 - the second score big vector.
Returns:
Kendall's τ.
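The reason for building the arguments in the call itself follows from Java passing references by value: setting a parameter to null inside a method clears only that method's copy of the reference. The sketch below illustrates this with a hypothetical stand-in method (consume) that is not part of the library. The helper build and the array sizes are likewise assumptions made for the example.

```java
// Hypothetical illustration of why arguments should be built in the call itself.
class NullingSketch {

    // Stand-in for a method that drops its reference to the first argument early.
    static int consume(double[][] v0, double[][] v1) {
        final int segments = v0.length;
        v0 = null; // clears only this method's local reference, not the caller's
        return segments + v1.length;
    }

    // Hypothetical helper that builds a fresh array on demand.
    static double[][] build(int n) {
        return new double[1][n];
    }

    /** Returns true if the caller's variable still refers to the array after the call. */
    static boolean callerStillHoldsArray() {
        double[][] kept = build(1024);
        consume(kept, build(1024));
        // The array stays reachable through 'kept', so it cannot be collected.
        return kept != null;
    }

    public static void main(String[] args) {
        // Built on the fly: once consume() nulls its parameter, no reference remains.
        consume(build(1024), build(1024));
        System.out.println(callerStillHoldsArray());
    }
}
```

In the second call pattern the array remains reachable through the caller's variable, so nulling the parameter frees nothing until the caller also drops its reference.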
compute
public double compute(int[][] v0, int[][] v1)
Computes Kendall's τ between two integer score vectors.
Note that this method must be called with some care: the two arguments should be built on the fly in the method call and not stored in variables, because the first argument array will be set to null during the execution of this method to free some memory. If the array is referenced elsewhere, the garbage collector will not be able to collect it.
Parameters:
v0 - the first score integer big vector.
v1 - the second score integer big vector.
Returns:
Kendall's τ.
main