Class Util

java.lang.Object
it.unimi.dsi.law.warc.util.Util

public class Util
extends Object
Static utility methods.
  • Field Details

    • CASE_INSENSITIVE_STRING_HASH_STRATEGY

      public static final Hash.Strategy<String> CASE_INSENSITIVE_STRING_HASH_STRATEGY
      The strategy used to decide whether two header names are the same: we require that they are equal up to case.
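
      As an illustration of "equal up to case", here is a minimal sketch. The nested Strategy interface is a hypothetical stand-in that mirrors the hashCode/equals pair of fastutil's Hash.Strategy, and hashing the lowercased name is an assumption, not necessarily what this class does.

```java
import java.util.Locale;

class CaseInsensitiveStrategySketch {
    /** Hypothetical stand-in for fastutil's Hash.Strategy<K>. */
    interface Strategy<K> {
        int hashCode(K o);
        boolean equals(K a, K b);
    }

    static final Strategy<String> CASE_INSENSITIVE = new Strategy<String>() {
        @Override
        public int hashCode(String s) {
            // Names that differ only by case must hash identically
            // (adequate for ASCII header names).
            return s == null ? 0 : s.toLowerCase(Locale.ROOT).hashCode();
        }

        @Override
        public boolean equals(String a, String b) {
            if (a == null || b == null) return a == b;
            return a.equalsIgnoreCase(b);
        }
    };

    public static void main(String[] args) {
        // prints true
        System.out.println(CASE_INSENSITIVE.equals("Content-Length", "content-length"));
    }
}
```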
  • Method Details

    • getASCIIBytes

      public static byte[] getASCIIBytes​(String s)
      Returns the given ASCII string as a byte array; characters are filtered through the 0x7F (binary 1111111) mask.
      Parameters:
      s - a string.
      Returns:
      s as a byte array.
    • getASCIIBytes

      public static byte[] getASCIIBytes​(MutableString s)
      Returns the given ASCII mutable string as a byte array; characters are filtered through the 0x7F (binary 1111111) mask.
      Parameters:
      s - a mutable string.
      Returns:
      s as a byte array.
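
      The masking described for both overloads can be sketched as follows. This is not the library's code: the class and method names are hypothetical, and the sketch accepts any CharSequence for brevity.

```java
import java.util.Arrays;

class AsciiBytesSketch {
    /** Converts each character to one byte, keeping only the low 7 bits. */
    static byte[] toAsciiBytes(CharSequence s) {
        byte[] result = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            result[i] = (byte) (s.charAt(i) & 0x7F);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [87, 65, 82, 67]
        System.out.println(Arrays.toString(toAsciiBytes("WARC")));
    }
}
```

      Note that a non-ASCII character is not rejected but silently mapped into the ASCII range by the mask.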
    • getString

      public static String getString​(byte[] array)
      Returns the given byte array as an ASCII string.
    • getString

      public static String getString​(byte[] array, int offset, int length)
      Returns the given byte array as an ASCII string.
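
      A minimal sketch of the two overloads using only the standard library. The names are hypothetical, and decoding with StandardCharsets.US_ASCII is an assumption about the intended behavior.

```java
import java.nio.charset.StandardCharsets;

class AsciiStringSketch {
    /** Decodes length bytes starting at offset as ASCII. */
    static String asciiString(byte[] array, int offset, int length) {
        return new String(array, offset, length, StandardCharsets.US_ASCII);
    }

    /** Decodes the whole array as ASCII. */
    static String asciiString(byte[] array) {
        return asciiString(array, 0, array.length);
    }

    public static void main(String[] args) {
        byte[] bytes = {87, 65, 82, 67};
        System.out.println(asciiString(bytes));       // prints WARC
        System.out.println(asciiString(bytes, 1, 2)); // prints AR
    }
}
```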
    • log10

      public static int log10​(int x)
      Returns ⌊ log10(x) ⌋.
      Parameters:
      x - an integer.
      Returns:
      ⌊ log10(x) ⌋, or -1 if x is smaller than or equal to zero.
    • log10

      public static int log10​(long x)
      Returns ⌊ log10(x) ⌋.
      Parameters:
      x - a long.
      Returns:
      ⌊ log10(x) ⌋, or -1 if x is smaller than or equal to zero.
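
      The documented contract of both log10 overloads can be reproduced with pure integer arithmetic, which avoids floating-point rounding near powers of ten. The following is a sketch under that assumption, not the library's implementation.

```java
class Log10Sketch {
    /** Returns floor(log10(x)), or -1 if x <= 0. */
    static int log10(long x) {
        if (x <= 0) return -1;
        int result = 0;
        while (x >= 10) {
            x /= 10;
            result++;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(log10(999));  // prints 2
        System.out.println(log10(1000)); // prints 3
        System.out.println(log10(0));    // prints -1
    }
}
```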
    • digits

      public static int digits​(int x)
      Returns the number of decimal digits that are necessary to represent the argument.
      Parameters:
      x - a nonnegative integer.
      Returns:
      the number of decimal digits that are necessary to represent x.
    • digits

      public static int digits​(long x)
      Returns the number of decimal digits that are necessary to represent the argument.
      Parameters:
      x - a nonnegative long.
      Returns:
      the number of decimal digits that are necessary to represent x.
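
      A short sketch of the digit count for nonnegative arguments. Rejecting negative input with an exception is an assumption of this example; the documentation above does not say what the library does in that case.

```java
class DigitsSketch {
    /** Number of decimal digits needed to write a nonnegative x. */
    static int digits(long x) {
        if (x < 0) throw new IllegalArgumentException("negative argument: " + x); // assumed behavior
        int count = 1; // zero still needs one digit
        while (x >= 10) {
            x /= 10;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(digits(0));     // prints 1
        System.out.println(digits(12345)); // prints 5
    }
}
```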
    • parseCommaSeparatedProperty

      public static String[] parseCommaSeparatedProperty​(String s) throws IOException
      Parses the given string as a comma-separated list of items and returns the items as an array, possibly after resolving an indirection. More precisely, s is tokenized as a comma-separated list, and each item is trimmed of all leading and trailing spaces. If the resulting character sequence does not start with @, it is interpreted literally. Otherwise, the @ is stripped away and the remaining part is interpreted as a URL or as a filename (depending on whether it is a valid URL or not); the corresponding URL or file is then read (ISO-8859-1 encoded) and interpreted as a list of items, one per line, and these items are returned (literally) after trimming all leading and trailing spaces. Lines that start with a # are ignored.
      Parameters:
      s - the property to be parsed.
      Returns:
      the array of items (as explained above).
      Throws:
      IOException - if an exception is thrown while reading indirect items.
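
      The parsing rules can be sketched with the standard library alone. This is an approximation under stated assumptions, not the library's code: the class and helper names are hypothetical, String.trim() strips all characters up to U+0020 rather than only spaces, and the URL-versus-file decision is approximated by falling back to a file whenever the text is not an absolute, well-formed URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CommaSeparatedPropertySketch {
    static String[] parse(String s) throws IOException {
        List<String> items = new ArrayList<>();
        for (String token : s.split(",")) {
            String item = token.trim();
            if (!item.startsWith("@")) {
                items.add(item); // literal item
                continue;
            }
            // Indirection: read one item per line from a URL or file.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(open(item.substring(1)), StandardCharsets.ISO_8859_1))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("#")) continue; // comment line
                    items.add(line.trim());
                }
            }
        }
        return items.toArray(new String[0]);
    }

    /** Hypothetical helper: tries the target as a URL, then as a filename. */
    static InputStream open(String target) throws IOException {
        try {
            return URI.create(target).toURL().openStream();
        } catch (IllegalArgumentException | MalformedURLException notAUrl) {
            return Files.newInputStream(Paths.get(target));
        }
    }

    public static void main(String[] args) throws IOException {
        // prints [a, b, c]
        System.out.println(java.util.Arrays.toString(parse(" a , b,c ")));
    }
}
```

      With a file items.txt containing one item per line, a property value such as "a, @items.txt" would yield a followed by the items of the file.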
    • consume

      public static void consume​(InputStream in, long howMany) throws IOException
      Consumes a given number of bytes from a stream.
      Parameters:
      in - the stream.
      howMany - the number of bytes to consume; fewer bytes may be consumed if the end of the stream is reached.
      Throws:
      IOException
    • consume

      public static void consume​(InputStream in) throws IOException
      Consumes all the bytes of a stream.
      Parameters:
      in - the stream.
      Throws:
      IOException
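
      A sketch of consuming a bounded number of bytes by reading into a scratch buffer. The names and the buffer size are assumptions of this example; unlike the void method documented above, the sketch returns the number of bytes actually consumed so that the early-EOF case is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ConsumeSketch {
    /** Reads and discards up to howMany bytes; stops early at end of stream. */
    static long consume(InputStream in, long howMany) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = howMany;
        while (remaining > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read < 0) break; // end of stream reached
            remaining -= read;
        }
        return howMany - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(consume(in, 4));   // prints 4
        System.out.println(consume(in, 100)); // prints 6
    }
}
```

      Consuming the whole stream corresponds to calling the sketch with Long.MAX_VALUE.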
    • readHeaderLine

      public static String readHeaderLine​(InputStream inputStream, Charset charset) throws IOException
      Reads a line from an (unchunked) input stream and returns it as a string decoded with the given charset. Reading stops when the "\n" terminator is encountered. If the stream ends before the line terminator is found, the last part of the line is still returned. If no input data is available, null is returned.
      Parameters:
      inputStream - the stream to read from.
      charset - the charset used to decode the stream.
      Returns:
      the read line.
      Throws:
      IOException - if an I/O problem occurs
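
      The reading rules above can be sketched as follows. The names are hypothetical, and dropping the terminator (including a preceding "\r") from the returned string is an assumption; the documentation does not state whether the terminator is kept.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HeaderLineSketch {
    static String readLine(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        boolean sawData = false;
        int b;
        while ((b = in.read()) >= 0) {
            sawData = true;
            if (b == '\n') break; // terminator found
            buffer.write(b);
        }
        if (!sawData) return null; // no input data available
        byte[] bytes = buffer.toByteArray();
        int length = bytes.length;
        if (length > 0 && bytes[length - 1] == '\r') length--; // assumed: strip CR
        return new String(bytes, 0, length, charset);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "Host: example.org\r\nNext".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readLine(in, StandardCharsets.US_ASCII)); // prints Host: example.org
        System.out.println(readLine(in, StandardCharsets.US_ASCII)); // prints Next
        System.out.println(readLine(in, StandardCharsets.US_ASCII)); // prints null
    }
}
```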
    • readStatusLine

      public static org.apache.http.StatusLine readStatusLine​(MeasurableInputStream is, Charset charset) throws IOException
      Throws:
      IOException
    • readANVLHeaders

      public static void readANVLHeaders​(MeasurableInputStream is, Map<String,​String> map, Charset charset) throws IOException, WarcRecord.FormatException
      Parses headers from the given stream. Headers with the same name are not combined.
      Parameters:
      is - the stream to read headers from
      map - the map where the headers will be saved
      charset - the charset to use for reading the data
      Throws:
      IOException - if an IO error occurs while reading from the stream
      WarcRecord.FormatException
    • writeANVLHeaders

      public static void writeANVLHeaders​(OutputStream out, Map<String,​String> map, Charset charset)
      Writes a (name, value) map as an ANVL segment in a given stream.
      Parameters:
      out - the stream.
      map - the map.
      charset - the charset of the headers.
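
      For orientation, a sketch of serializing a (name, value) map. The exact layout is an assumption: one "name: value" line per entry terminated by CRLF, following the WARC named-field convention; whether the library also writes a closing blank line is not stated above, so the sketch omits it. Unlike the documented signature, the sketch declares IOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class AnvlWriteSketch {
    /** Writes each entry as "name: value" followed by CRLF (assumed layout). */
    static void write(OutputStream out, Map<String, String> map, Charset charset) throws IOException {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            String line = entry.getKey() + ": " + entry.getValue() + "\r\n";
            out.write(line.getBytes(charset));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("WARC-Type", "response");
        headers.put("Content-Length", "42");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, headers, StandardCharsets.ISO_8859_1);
        System.out.print(new String(out.toByteArray(), StandardCharsets.ISO_8859_1));
    }
}
```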
    • toHexString

      public static String toHexString​(byte[] a)
      Returns a string representing a digest in hexadecimal.
      Parameters:
      a - a digest, as a byte array.
      Returns:
      a string hexadecimal representation of a.
    • fromHexString

      public static byte[] fromHexString​(String s)
      Returns a byte array corresponding to the given number.
      Parameters:
      s - the number, as a String.
      Returns:
      the byte array.
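
      A round-trip sketch for the two hexadecimal conversions. The names are hypothetical, and the choices of lowercase digits, two digits per byte, and rejecting odd-length input are assumptions that may differ from the library.

```java
class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encodes each byte as two lowercase hexadecimal digits. */
    static String toHex(byte[] a) {
        StringBuilder sb = new StringBuilder(a.length * 2);
        for (byte b : a) {
            sb.append(DIGITS[(b >> 4) & 0xF]).append(DIGITS[b & 0xF]);
        }
        return sb.toString();
    }

    /** Decodes a string of hexadecimal digit pairs into bytes. */
    static byte[] fromHex(String s) {
        if (s.length() % 2 != 0) throw new IllegalArgumentException("odd number of digits");
        byte[] result = new byte[s.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hexadecimal digit");
            result[i] = (byte) ((hi << 4) | lo);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] digest = {0x00, (byte) 0xFF, 0x1A};
        System.out.println(toHex(digest)); // prints 00ff1a
        System.out.println(java.util.Arrays.equals(digest, fromHex("00ff1a"))); // prints true
    }
}
```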
    • createHierarchicalTempFile

      public static File createHierarchicalTempFile​(File baseDirectory, int pathElements, String prefix, String suffix) throws IOException
      Creates a temporary file with a random hierarchical path.

      A random hierarchical path of n path elements is a sequence of n directories of two hexadecimal digits each, followed by a filename created by File.createTempFile(String, String, File).

      This method creates an empty file having a random hierarchical path of the specified number of path elements under a given base directory, creating all needed directories along the hierarchical path (whereas the base directory is expected to already exist).

      Parameters:
      baseDirectory - the base directory (it must exist).
      pathElements - the number of path elements (filename excluded), must be in [0,8]
      prefix - will be passed to File.createTempFile(String, String, File)
      suffix - will be passed to File.createTempFile(String, String, File)
      Returns:
      the temporary file.
      Throws:
      IOException
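
      The construction described above can be sketched with the standard library. The names, the use of ThreadLocalRandom, and the exception thrown for an out-of-range pathElements are assumptions of this example, not the library's implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.ThreadLocalRandom;

class HierarchicalTempFileSketch {
    static File create(File baseDirectory, int pathElements, String prefix, String suffix) throws IOException {
        if (pathElements < 0 || pathElements > 8) {
            throw new IllegalArgumentException("pathElements must be in [0,8]: " + pathElements);
        }
        File dir = baseDirectory;
        for (int i = 0; i < pathElements; i++) {
            // Each path element is a directory named by two hexadecimal digits.
            dir = new File(dir, String.format("%02x", ThreadLocalRandom.current().nextInt(256)));
        }
        // Create the missing directories; tolerate a concurrent creation.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create directory " + dir);
        }
        return File.createTempFile(prefix, suffix, dir);
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("sketch").toFile();
        System.out.println(create(base, 3, "tmp", ".dat"));
    }
}
```

      The printed path has the form base/xx/xx/xx/tmpNNN.dat, where each xx is a random pair of hexadecimal digits.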
    • fixURL

      public static void fixURL​(MutableString url)
      Fixes a given URL so that it is BURL-parsable.
      Parameters:
      url - a URL, possibly with bad characters in its path.
    • ensureDirectory

      public static boolean ensureDirectory​(File dir)
      Checks whether the given File exists and is a directory; if it does not exist, creates the directory (and its parent).
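
      A minimal sketch of this check. The meaning of the boolean result is not documented above; the sketch assumes that true means the directory is available after the call.

```java
import java.io.File;

class EnsureDirectorySketch {
    /** Returns true if dir is (or has just become) a directory; assumed semantics. */
    static boolean ensureDirectory(File dir) {
        if (dir.exists()) return dir.isDirectory();
        return dir.mkdirs(); // also creates missing parents
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "ensure-directory-sketch/a");
        System.out.println(ensureDirectory(dir));
    }
}
```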