Class CorrelationIndex

java.lang.Object
it.unimi.dsi.law.stat.CorrelationIndex
Direct Known Subclasses:
AveragePrecisionCorrelation, KendallTau, WeightedTau

public abstract class CorrelationIndex
extends Object
An abstract class providing basic infrastructure for all classes computing some correlation index between two score vectors, such as KendallTau, WeightedTau and AveragePrecisionCorrelation.

Implementing classes need only implement compute(double[], double[]) to gain a wealth of support methods, including loading data in different formats and parsing file types.
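The division of labour described above can be sketched without the library on the classpath. What follows is a minimal, self-contained illustration and not the library's code: ScoreCorrelation is a hypothetical stand-in for CorrelationIndex, SignAgreement is a toy index invented for this example (naive quadratic pair counting, with no tie correction), and the file layout (a plain sequence of big-endian doubles, as written by java.io.DataOutputStream) is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical stand-in for CorrelationIndex: subclasses supply only compute().
abstract class ScoreCorrelation {
    abstract double compute(double[] v0, double[] v1);

    // Support method inherited by every subclass: loads two files and delegates.
    double computeDoubles(String f0, String f1) throws IOException {
        return compute(loadDoubles(f0), loadDoubles(f1));
    }

    // Assumption: the file is a plain sequence of 8-byte big-endian doubles.
    static double[] loadDoubles(String name) throws IOException {
        File file = new File(name);
        double[] v = new double[(int) (file.length() / Double.BYTES)];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < v.length; i++) v[i] = in.readDouble();
        }
        return v;
    }
}

// Toy index: (concordant pairs - discordant pairs) / total pairs, a value in [-1, 1].
class SignAgreement extends ScoreCorrelation {
    @Override
    double compute(double[] v0, double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("Length mismatch");
        long concordant = 0, discordant = 0;
        for (int i = 0; i < v0.length; i++)
            for (int j = i + 1; j < v0.length; j++) {
                double s = Math.signum(v0[i] - v0[j]) * Math.signum(v1[i] - v1[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        long pairs = (long) v0.length * (v0.length - 1) / 2;
        return pairs == 0 ? 0 : (double) (concordant - discordant) / pairs;
    }
}
```

The point of the pattern is that SignAgreement contains no I/O at all, yet it inherits computeDoubles from its parent.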

  • Constructor Details

    • CorrelationIndex

      protected CorrelationIndex()
  • Method Details

    • compute

      public abstract double compute​(double[] v0, double[] v1)
      Computes the correlation between two score vectors.

      Note that this method must be called with some care if you are tight on memory. More precisely, the two arguments should be built on the fly in the method call and not stored in variables, as some of the argument arrays might be set to null during the execution of this method to free memory: if an array is referenced elsewhere, the garbage collector will not be able to collect it.

      Parameters:
      v0 - the first score vector.
      v1 - the second score vector; in asymmetric correlation indices, this should be the reference score.
      Returns:
      the correlation.
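
      The calling convention recommended in the memory note can be illustrated with a small sketch. Everything in it is hypothetical: compute here is a trivial stand-in (a dot product) for a real implementation, and load stands in for whatever builds a score vector. Only the shape of the two call sites matters.

```java
import java.util.Arrays;

class OnTheFlyArguments {
    // Hypothetical stand-in for an implementation of compute(double[], double[]).
    static double compute(double[] v0, double[] v1) {
        double sum = 0;
        for (int i = 0; i < v0.length; i++) sum += v0[i] * v1[i];
        // An implementation may drop its own references once the inputs are no longer needed.
        v0 = null;
        v1 = null;
        return sum;
    }

    // Hypothetical loader: builds a vector of n copies of value.
    static double[] load(int n, double value) {
        double[] v = new double[n];
        Arrays.fill(v, value);
        return v;
    }

    static double memoryFriendly() {
        // The arrays are created inside the call: no caller-side variable refers to them.
        return compute(load(4, 1.0), load(4, 2.0));
    }

    static double memoryHungry() {
        double[] a = load(4, 1.0);
        double[] b = load(4, 2.0);
        // The caller's variables a and b may keep both arrays reachable during the call,
        // so dropping the references inside compute() frees nothing.
        return compute(a, b);
    }
}
```

      Both call sites return the same value; they differ only in whether the caller holds a second reference to each array while compute runs.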
    • computeDoubles

      public double computeDoubles​(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score vectors.
      Parameters:
      f0 - the binary file of doubles containing the first score vector.
      f1 - the binary file of doubles containing the second score vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeDoubles

      public double computeDoubles​(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score vectors.
      Parameters:
      f0 - the binary file of doubles containing the first score vector.
      f1 - the binary file of doubles containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
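
      To see why loading opposite values reverses the induced ranking, consider the following self-contained sketch. The class and method names are invented for the illustration and are not part of the library.

```java
import java.util.Arrays;

class ReverseByOpposite {
    // Returns a copy of v with every score replaced by its opposite.
    static double[] opposite(double[] v) {
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++) r[i] = -v[i];
        return r;
    }

    // Indices sorted by decreasing score, that is, the ranking induced by v.
    static Integer[] ranking(double[] v) {
        Integer[] idx = new Integer[v.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(v[b], v[a]));
        return idx;
    }

    public static void main(String[] args) {
        double[] v = {0.2, 0.9, 0.5};
        System.out.println(Arrays.toString(ranking(v)));           // [1, 2, 0]
        System.out.println(Arrays.toString(ranking(opposite(v)))); // [0, 2, 1]
    }
}
```

      The second ranking is the first one read backwards, which is the effect the reverse flag is documented to have.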
    • computeDoubles

      public double computeDoubles​(CharSequence f0, CharSequence f1, int digits) throws IOException
      Computes the correlation between two score vectors with a given precision.
      Parameters:
      f0 - the binary file of doubles containing the first score vector.
      f1 - the binary file of doubles containing the second score vector.
      digits - the number of digits to be preserved when computing the correlation.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • computeDoubles

      public double computeDoubles​(CharSequence f0, CharSequence f1, boolean reverse, int digits) throws IOException
      Computes the correlation between two (possibly reversed) score vectors with a given precision.
      Parameters:
      f0 - the binary file of doubles containing the first score vector.
      f1 - the binary file of doubles containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      digits - the number of digits to be preserved when computing the correlation.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • computeFloats

      public double computeFloats​(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score vectors.
      Parameters:
      f0 - the binary file of floats containing the first score vector.
      f1 - the binary file of floats containing the second score vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeFloats

      public double computeFloats​(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score vectors.
      Parameters:
      f0 - the binary file of floats containing the first score vector.
      f1 - the binary file of floats containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeFloats

      public double computeFloats​(CharSequence f0, CharSequence f1, int digits) throws IOException
      Computes the correlation between two score vectors with a given precision.
      Parameters:
      f0 - the binary file of floats containing the first score vector.
      f1 - the binary file of floats containing the second score vector.
      digits - the number of digits to be preserved when computing the correlation.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • computeFloats

      public double computeFloats​(CharSequence f0, CharSequence f1, boolean reverse, int digits) throws IOException
      Computes the correlation between two (possibly reversed) score vectors with a given precision.
      Parameters:
      f0 - the binary file of floats containing the first score vector.
      f1 - the binary file of floats containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      digits - the number of digits to be preserved when computing the correlation.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • computeInts

      public double computeInts​(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score vectors.
      Parameters:
      f0 - the binary file of integers containing the first score vector.
      f1 - the binary file of integers containing the second score vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeInts

      public double computeInts​(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score vectors.
      Parameters:
      f0 - the binary file of integers containing the first score vector.
      f1 - the binary file of integers containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeLongs

      public double computeLongs​(CharSequence f0, CharSequence f1) throws IOException
      Computes the correlation between two score vectors.
      Parameters:
      f0 - the binary file of longs containing the first score vector.
      f1 - the binary file of longs containing the second score vector.
      Returns:
      the correlation.
      Throws:
      IOException
    • computeLongs

      public double computeLongs​(CharSequence f0, CharSequence f1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score vectors.
      Parameters:
      f0 - the binary file of longs containing the first score vector.
      f1 - the binary file of longs containing the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      Returns:
      the correlation.
      Throws:
      IOException
    • compute

      public double compute​(CharSequence f0, Class<?> inputType0, CharSequence f1, Class<?> inputType1, boolean reverse, int digits) throws IOException
      Computes the correlation between two (possibly reversed) score vectors with a given precision.
      Parameters:
      f0 - the file containing the first score vector.
      inputType0 - the input type of the first score vector.
      f1 - the file containing the second score vector.
      inputType1 - the input type of the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values.
      digits - the number of digits to be preserved when computing the correlation. they are assumed to be in binary format.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • compute

      public double compute​(CharSequence f0, Class<?> inputType0, CharSequence f1, Class<?> inputType1, boolean reverse) throws IOException
      Computes the correlation between two (possibly reversed) score vectors.
      Parameters:
      f0 - the file containing the first score vector.
      inputType0 - the input type of the first score vector.
      f1 - the file containing the second score vector.
      inputType1 - the input type of the second score vector.
      reverse - whether to reverse the ranking induced by the score vectors by loading opposite values. they are assumed to be in binary format.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • compute

      public double compute​(CharSequence f0, Class<?> inputType0, CharSequence f1, Class<?> inputType1, int digits) throws IOException
      Computes the correlation between two score vectors with a given precision.
      Parameters:
      f0 - the file containing the first score vector.
      inputType0 - the input type of the first score vector.
      f1 - the file containing the second score vector.
      inputType1 - the input type of the second score vector.
      digits - the number of digits to be preserved when computing the correlation. they are assumed to be in binary format.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • compute

      public double compute​(CharSequence f0, CharSequence f1, Class<?> inputType) throws IOException
      Computes the correlation between two score vectors.
      Parameters:
      f0 - the file containing the first score vector.
      f1 - the file containing the second score vector.
      inputType - the input type.
      Returns:
      the correlation.
      Throws:
      IOException
      See Also:
      Precision.truncate(double[], int)
    • loadAsDoubles

      public static double[] loadAsDoubles​(CharSequence f, Class<?> inputType, boolean reverse) throws IOException
      Loads a vector of doubles, either in binary or textual form.
      Parameters:
      f - a filename.
      inputType - the input type, expressed as a class: Double, Float, Integer, Long or String to denote a text file.
      reverse - whether to reverse the ranking induced by the score vector by loading opposite values.
      Returns:
      an array of doubles obtained by reading f.
      Throws:
      IllegalArgumentException - if reverse is true, the type is Integer or Long, and Integer.MIN_VALUE or Long.MIN_VALUE, respectively, appears in the file, as its opposite cannot be taken.
      IOException
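
      The IllegalArgumentException case comes from two's-complement arithmetic: the opposite of Integer.MIN_VALUE (and of Long.MIN_VALUE) is not representable in the same type, so negation silently returns the value unchanged. The sketch below mirrors the documented check with a hypothetical helper; it is not the library's loading code.

```java
class OppositeOverflow {
    // Hypothetical helper: takes the opposite of an int score, refusing the one
    // value whose opposite does not fit in an int.
    static double oppositeAsDouble(int x) {
        if (x == Integer.MIN_VALUE)
            throw new IllegalArgumentException("Cannot take the opposite of Integer.MIN_VALUE");
        return -x;
    }

    public static void main(String[] args) {
        // Negation overflows and yields the same value again.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE);       // true
        System.out.println(oppositeAsDouble(5));                     // -5.0
    }
}
```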
    • parseInputTypes

      public static Class<?>[] parseInputTypes​(JSAPResult jsapResult)
      Convenience method that extracts from a JSAPResult instance the file type information provided by the user, or supplies the default (doubles in binary form). The parameter type is examined for either a single type or two types separated by a colon. The types can be double, float, int, long or text. If the parameter is not specified, the type Double is returned for both files.
      Parameters:
      jsapResult - the result of the parsing of a command line.
      Returns:
      an array containing two classes, representing the type of the files to be loaded (using loadAsDoubles(CharSequence, Class, boolean)'s conventions).
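
      The type syntax described above can be sketched independently of JSAP. TypeSpec and its methods are hypothetical names, the sketch takes the raw option string instead of a JSAPResult, and it assumes that a single type applies to both files; the actual method may differ in these details.

```java
class TypeSpec {
    // Maps one type name to the class used to denote it.
    static Class<?> parseOne(String name) {
        switch (name) {
            case "double": return Double.class;
            case "float":  return Float.class;
            case "int":    return Integer.class;
            case "long":   return Long.class;
            case "text":   return String.class;
            default: throw new IllegalArgumentException("Unknown type: " + name);
        }
    }

    // spec is null (option not specified), a single type, or two types separated by a colon.
    static Class<?>[] parse(String spec) {
        if (spec == null) return new Class<?>[] { Double.class, Double.class };
        int colon = spec.indexOf(':');
        if (colon < 0) {
            Class<?> c = parseOne(spec);
            return new Class<?>[] { c, c }; // assumption: one type covers both files
        }
        return new Class<?>[] {
            parseOne(spec.substring(0, colon)),
            parseOne(spec.substring(colon + 1))
        };
    }
}
```

      For instance, under these assumptions TypeSpec.parse("float:text") yields Float for the first file and String for the second.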