Class CorrelationIndex

java.lang.Object
it.unimi.dsi.law.big.stat.CorrelationIndex
Direct Known Subclasses:
KendallTau, WeightedTau

public abstract class CorrelationIndex
extends Object
An abstract class providing basic infrastructure for all classes computing some correlation index between two score big vectors, such as KendallTau.

Implementing classes only need to implement compute(double[][], double[][]) to get a wealth of support methods, including loading data in different formats and parsing file types.
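The division of labour described above can be illustrated with a stand-alone sketch. The class names `ScoreIndexSketch` and `NaiveTauSketch` are hypothetical and do not belong to the library; the quadratic tau-a computation is only a placeholder and is not claimed to be the algorithm used by KendallTau.

```java
/** Hypothetical miniature of the pattern: one abstract method, helpers built on top of it. */
abstract class ScoreIndexSketch {
    /** The single method a concrete index has to supply. */
    abstract double compute(double[][] v0, double[][] v1);

    /** A convenience overload of the kind a base class can offer for free. */
    double compute(double[] a, double[] b) {
        return compute(new double[][] { a }, new double[][] { b });
    }
}

/** Naive O(n^2) tau-a, for illustration only (assumes at least two elements). */
final class NaiveTauSketch extends ScoreIndexSketch {
    @Override
    double compute(double[][] v0, double[][] v1) {
        // Flattening defeats the purpose of big vectors; it just keeps the sketch short.
        double[] a = flatten(v0), b = flatten(v1);
        if (a.length != b.length) throw new IllegalArgumentException("Length mismatch");
        long concordant = 0, discordant = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++) {
                double s = Math.signum(a[i] - a[j]) * Math.signum(b[i] - b[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        long pairs = (long) a.length * (a.length - 1) / 2;
        return (double) (concordant - discordant) / pairs;
    }

    private static double[] flatten(double[][] v) {
        int n = 0;
        for (double[] segment : v) n += segment.length;
        double[] flat = new double[n];
        int pos = 0;
        for (double[] segment : v) {
            System.arraycopy(segment, 0, flat, pos, segment.length);
            pos += segment.length;
        }
        return flat;
    }

    public static void main(String[] args) {
        System.out.println(new NaiveTauSketch().compute(new double[] { 1, 2, 3 }, new double[] { 3, 2, 1 }));
    }
}
```

The subclass contains only the index-specific logic; everything else is inherited from the base class.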

  • Constructor Details

    • CorrelationIndex

      protected CorrelationIndex()
  • Method Details

    • compute

      public abstract double compute(double[][] v0, double[][] v1)
      Computes the correlation between two score big vectors.

      Note that this method must be called with some care if you're tight on memory. More precisely, the two arguments should be built on the fly in the method call and not stored in variables, as some of the argument arrays might be null'd during the execution of this method to free memory: if an array is referenced elsewhere, the garbage collector will not be able to collect it.

      Parameters:
      v0 - the first score big vector.
      v1 - the second score big vector; in asymmetric correlation indices, this should be the reference score.
      Returns:
      the correlation.
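The memory note above can be made concrete with a small stand-alone sketch. Everything in it is hypothetical: `meanDifference` merely stands in for an implementation of compute that drops its references early, and `constant` stands in for a loader.

```java
import java.util.Arrays;

final class OnTheFlyArguments {
    /** Hypothetical stand-in for a compute implementation that releases its inputs early. */
    static double meanDifference(double[][] v0, double[][] v1) {
        double sum0 = 0;
        long n = 0;
        for (double[] segment : v0) {
            for (double x : segment) sum0 += x;
            n += segment.length;
        }
        v0 = null; // drops this method's reference only; a caller's variable would still pin the array
        double sum1 = 0;
        for (double[] segment : v1) for (double x : segment) sum1 += x;
        v1 = null;
        return (sum0 - sum1) / n;
    }

    /** Hypothetical loader: builds a segmented vector filled with a single value. */
    static double[][] constant(int segments, int segmentLength, double value) {
        double[][] v = new double[segments][segmentLength];
        for (double[] segment : v) Arrays.fill(segment, value);
        return v;
    }

    public static void main(String[] args) {
        // Preferred: both vectors exist only as arguments of the call.
        System.out.println(meanDifference(constant(4, 1024, 3.0), constant(4, 1024, 1.0)));

        // Works, but v0 and v1 stay reachable from this frame for the whole call.
        double[][] v0 = constant(4, 1024, 3.0), v1 = constant(4, 1024, 1.0);
        System.out.println(meanDifference(v0, v1));
    }
}
```

Both calls return the same value; they differ only in whether the first vector can be reclaimed while the second is still being processed.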
    • computeDoubles

      public double computeDoubles(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the binary file of doubles containing the first score big vector.
      f1 - the binary file of doubles containing the second score big vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeDoubles

      public double computeDoubles(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score big vectors.
      Parameters:
      f0 - the binary file of doubles containing the first score big vector.
      f1 - the binary file of doubles containing the second score big vector.
      reverse - whether to reverse the ranking induced by the score big vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
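As an illustration of what reversing "by loading opposite values" means, the following sketch writes a few doubles and reads them back negated. It assumes the binary layout is the one produced by java.io.DataOutputStream (big-endian, 8 bytes per value), which this page does not state; the class `ReversedDoubles` is hypothetical and uses a plain double[] rather than a big vector.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ReversedDoubles {
    /** Writes the scores in DataOutput format (an assumption about the expected layout). */
    static void write(Path file, double... scores) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (double s : scores) out.writeDouble(s);
        }
    }

    /** Reads the scores back, negating each one when reverse is true. */
    static double[] read(Path file, boolean reverse) throws IOException {
        double[] v = new double[(int) (Files.size(file) / Double.BYTES)];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            for (int i = 0; i < v.length; i++) {
                double x = in.readDouble();
                v[i] = reverse ? -x : x; // the opposite value inverts every pairwise comparison
            }
        }
        return v;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("scores", ".doubles");
        write(file, 0.5, 2.0, 1.0);
        System.out.println(Arrays.toString(read(file, false)));
        System.out.println(Arrays.toString(read(file, true)));
        Files.delete(file);
    }
}
```

Since x < y exactly when -x > -y, negating every score reverses the induced ranking without touching the file.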
    • computeFloats

      public double computeFloats(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the binary file of floats containing the first score big vector.
      f1 - the binary file of floats containing the second score big vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeFloats

      public double computeFloats(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score big vectors.
      Parameters:
      f0 - the binary file of floats containing the first score big vector.
      f1 - the binary file of floats containing the second score big vector.
      reverse - whether to reverse the ranking induced by the score big vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeInts

      public double computeInts(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the binary file of integers containing the first score big vector.
      f1 - the binary file of integers containing the second score big vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeInts

      public double computeInts(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score big vectors.
      Parameters:
      f0 - the binary file of integers containing the first score big vector.
      f1 - the binary file of integers containing the second score big vector.
      reverse - whether to reverse the ranking induced by the score big vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeLongs

      public double computeLongs(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the binary file of longs containing the first score big vector.
      f1 - the binary file of longs containing the second score big vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeLongs

      public double computeLongs(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score big vectors.
      Parameters:
      f0 - the binary file of longs containing the first score big vector.
      f1 - the binary file of longs containing the second score big vector.
      reverse - whether to reverse the ranking induced by the score big vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • compute

      public double compute(CharSequence f0, CharSequence f1, Class<?> inputType) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the file containing the first score big vector.
      f1 - the file containing the second score big vector.
      inputType - the input type.
      Returns:
      the correlation.
      Throws:
      IOException
    • compute

      public double compute(CharSequence f0, Class<?> inputType0, CharSequence f1, Class<?> inputType1) throws IOException
      Computes the correlation between two score big vectors.
      Parameters:
      f0 - the file containing the first score big vector.
      inputType0 - the input type of the first score big vector.
      f1 - the file containing the second score big vector.
      inputType1 - the input type of the second score big vector.
      Returns:
      the correlation.
      Throws:
      IOException
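The sketch below shows one way a Class<?> token could select how each file is decoded, so that two files of different types end up in the same double representation. It is not the library's loader: the class `TypedLoading` is hypothetical, binary files are assumed to be in java.io.DataOutput format, and text files are assumed to hold one value per line.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class TypedLoading {
    /** Loads a file as doubles, decoding it according to inputType. */
    static double[] load(Path file, Class<?> inputType) throws IOException {
        if (inputType == String.class) { // text: one value per line (assumed)
            List<String> lines = Files.readAllLines(file, StandardCharsets.US_ASCII);
            double[] v = new double[lines.size()];
            for (int i = 0; i < v.length; i++) v[i] = Double.parseDouble(lines.get(i).trim());
            return v;
        }
        int width; // bytes per element in the binary file
        if (inputType == Double.class || inputType == Long.class) width = 8;
        else if (inputType == Float.class || inputType == Integer.class) width = 4;
        else throw new IllegalArgumentException("Unsupported input type: " + inputType);

        double[] v = new double[(int) (Files.size(file) / width)];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            for (int i = 0; i < v.length; i++) {
                if (inputType == Double.class) v[i] = in.readDouble();
                else if (inputType == Float.class) v[i] = in.readFloat();
                else if (inputType == Integer.class) v[i] = in.readInt();
                else v[i] = in.readLong();
            }
        }
        return v;
    }
}
```

With such a loader, a file of integers passed with Integer.class and a text file passed with String.class can be compared directly once both are loaded.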
    • compute

      public double compute(CharSequence f0, Class<?> inputType0, CharSequence f1, Class<?> inputType1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score big vectors.
      Parameters:
      f0 - the file containing the first score big vector.
      inputType0 - the input type of the first score big vector.
      f1 - the file containing the second score big vector.
      inputType1 - the input type of the second score big vector.
      reverse - whether to reverse the ranking induced by the score big vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • loadAsDoubles

      public static double[][] loadAsDoubles(CharSequence f, Class<?> inputType, boolean reverse) throws IOException
      Loads a big vector of doubles, either in binary or textual form.
      Parameters:
      f - a filename.
      inputType - the input type, expressed as a class: Double, Float, Integer, Long or String to denote a text file.
      reverse - whether to reverse the ranking induced by the score big vector by loading opposite values.
      Returns:
      a big array of doubles obtained by reading f.
      Throws:
      IllegalArgumentException - if reverse is true, the type is integer or long, and Integer.MIN_VALUE or Long.MIN_VALUE, respectively, appears in the file, as its opposite cannot be represented in the same type.
      IOException
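The reason behind the IllegalArgumentException can be checked directly: in two's complement arithmetic the negation of the minimum value wraps around to itself. The guard below is a minimal sketch of such a check; the class `OppositeValues` and its methods are hypothetical and not part of the library.

```java
final class OppositeValues {
    /** Returns -x as a double, rejecting the one int whose opposite is not an int. */
    static double opposite(int x) {
        if (x == Integer.MIN_VALUE) throw new IllegalArgumentException("Cannot take the opposite of Integer.MIN_VALUE");
        return -x;
    }

    /** Same guard for long values. */
    static double opposite(long x) {
        if (x == Long.MIN_VALUE) throw new IllegalArgumentException("Cannot take the opposite of Long.MIN_VALUE");
        return -x;
    }

    public static void main(String[] args) {
        // Negating the minimum value overflows and yields the same value again.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE);
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE);
        System.out.println(opposite(7));
    }
}
```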