Package it.unimi.dsi.law.warc.util
Class Util
java.lang.Object
it.unimi.dsi.law.warc.util.Util
public class Util extends Object
Static utility methods.
-
Field Summary
Modifier and type, field, and description:
static Hash.Strategy<String> CASE_INSENSITIVE_STRING_HASH_STRATEGY
    The strategy used to decide whether two header names are the same: we require that they are equal up to case.
-
Method Summary
Modifier and type, method, and description:
static void consume(InputStream in)
    Consumes all the bytes of a stream.
static void consume(InputStream in, long howMany)
    Consumes a given number of bytes from a stream.
static File createHierarchicalTempFile(File baseDirectory, int pathElements, String prefix, String suffix)
    Creates a temporary file with a random hierarchical path.
static int digits(int x)
    Returns the number of decimal digits that are necessary to represent the argument.
static int digits(long x)
    Returns the number of decimal digits that are necessary to represent the argument.
static boolean ensureDirectory(File dir)
    Checks whether the given File exists and is a directory; if it does not exist, creates the directory (and its parent).
static void fixURL(MutableString url)
    Fixes a given URL so that it is BURL-parsable.
static byte[] fromHexString(String s)
    Returns a byte array corresponding to the given number.
static byte[] getASCIIBytes(MutableString s)
    Returns the given ASCII mutable string as a byte array; characters are filtered through the 1111111 (= 0x7F) mask.
static byte[] getASCIIBytes(String s)
    Returns the given ASCII string as a byte array; characters are filtered through the 1111111 (= 0x7F) mask.
static String getString(byte[] array)
    Returns the given byte array as an ASCII string.
static String getString(byte[] array, int offset, int length)
    Returns the given byte array as an ASCII string.
static int log10(int x)
    Returns ⌊log10(x)⌋.
static int log10(long x)
    Returns ⌊log10(x)⌋.
static String[] parseCommaSeparatedProperty(String s)
    The given string is parsed as a comma-separated list of items, and the items are returned in the form of an array, possibly after resolving an indirection.
static void readANVLHeaders(MeasurableInputStream is, Map<String,String> map, Charset charset)
    Parses headers from the given stream.
static String readHeaderLine(InputStream inputStream, Charset charset)
    Reads a line from an (unchunked) input stream.
static org.apache.http.StatusLine readStatusLine(MeasurableInputStream is, Charset charset)
static String toHexString(byte[] a)
    Returns a string representing a digest in hexadecimal.
static void writeANVLHeaders(OutputStream out, Map<String,String> map, Charset charset)
    Writes a (name, value) map as an ANVL segment in a given stream.
-
Field Details
-
CASE_INSENSITIVE_STRING_HASH_STRATEGY
The strategy used to decide whether two header names are the same: we require that they are equal up to case.
-
-
Method Details
-
getASCIIBytes
Returns the given ASCII string as a byte array; characters are filtered through the 1111111 (= 0x7F) mask.
Parameters:
s - a string.
Returns:
s as a byte array.
-
getASCIIBytes
Returns the given ASCII mutable string as a byte array; characters are filtered through the 1111111 (= 0x7F) mask.
Parameters:
s - a mutable string.
Returns:
s as a byte array.
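The 0x7F filtering described for both getASCIIBytes variants can be illustrated with a minimal sketch. The class AsciiBytesSketch and the method toAsciiBytes are hypothetical names introduced here, and the code is not the library's implementation; it only shows what keeping the low seven bits of each character does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class AsciiBytesSketch {
    /** Returns one byte per character, keeping only the low 7 bits of each character. */
    static byte[] toAsciiBytes(CharSequence s) {
        byte[] result = new byte[s.length()];
        for (int i = 0; i < result.length; i++) {
            // The mask 0x7F (binary 1111111) clears every bit above the ASCII range.
            result[i] = (byte) (s.charAt(i) & 0x7F);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = toAsciiBytes("WARC/1.0");
        // Pure ASCII input survives the mask unchanged.
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

A consequence of masking is that non-ASCII characters are not rejected but silently mapped into the ASCII range, so callers that need to detect them would have to check before converting.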
-
getString
Returns the given byte array as an ASCII string.
-
getString
Returns the given byte array as an ASCII string.
-
log10
public static int log10(int x)
Returns ⌊log10(x)⌋.
Parameters:
x - an integer.
Returns:
⌊log10(x)⌋, or -1 if x is smaller than or equal to zero.
-
log10
public static int log10(long x)
Returns ⌊log10(x)⌋.
Parameters:
x - an integer.
Returns:
⌊log10(x)⌋, or -1 if x is smaller than or equal to zero.
-
digits
public static int digits(int x)
Returns the number of decimal digits that are necessary to represent the argument.
Parameters:
x - a nonnegative integer.
Returns:
the number of decimal digits that are necessary to represent x.
-
digits
public static int digits(long x)
Returns the number of decimal digits that are necessary to represent the argument.
Parameters:
x - a nonnegative long.
Returns:
the number of decimal digits that are necessary to represent x.
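The relationship between log10 and digits can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: the class name DecimalDigitsSketch is hypothetical, a simple division loop stands in for whatever method the library uses, and treating zero as needing one digit is an assumption based on the usual decimal notation.

```java
// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class DecimalDigitsSketch {
    /** Returns floor(log10(x)), or -1 if x is smaller than or equal to zero. */
    static int log10(long x) {
        if (x <= 0) return -1;
        int result = 0;
        // Each division by ten removes one decimal digit.
        while (x >= 10) {
            x /= 10;
            result++;
        }
        return result;
    }

    /** Returns the number of decimal digits needed to write a nonnegative x (assumed to be 1 for zero). */
    static int digits(long x) {
        return x == 0 ? 1 : log10(x) + 1;
    }

    public static void main(String[] args) {
        System.out.println(log10(999) + " " + digits(999));
    }
}
```

For positive x the two functions differ by exactly one, which is why a single loop serves both.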
-
parseCommaSeparatedProperty
The given string is parsed as a comma-separated list of items, and the items are returned in the form of an array, possibly after resolving an indirection.
More precisely, s is tokenized as a comma-separated list, and each item in the list is trimmed of all leading and trailing spaces. Then, if the remaining character sequence does not start with @, it is interpreted literally; otherwise, the @ is stripped away and the remaining part is interpreted as a URL or as a filename (depending on whether it is a valid URL or not). The corresponding URL or file is then read (ISO-8859-1 encoded) and interpreted as a list of items, one per line; the items are returned (literally) after trimming all leading and trailing spaces. Lines that start with a # are ignored.
Parameters:
s - the property to be parsed.
Returns:
the array of items (as explained above).
Throws:
IOException - if an exception is thrown while reading indirect items.
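The tokenization and @ indirection described above can be sketched roughly as below. The class CommaSeparatedPropertySketch and its methods are hypothetical, and several details are assumptions not stated in the documentation: a target counts as a valid URL when java.net.URL accepts it, the # test is applied to the untrimmed line, and empty lines are kept as empty items.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class CommaSeparatedPropertySketch {
    static String[] parse(String s) throws IOException {
        List<String> items = new ArrayList<>();
        for (String token : s.split(",")) {
            String item = token.trim();
            if (!item.startsWith("@")) {
                items.add(item); // literal item
                continue;
            }
            // Indirection: the text after '@' names a URL or a file listing one item per line.
            String target = item.substring(1);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(open(target), StandardCharsets.ISO_8859_1))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("#")) continue; // comment line
                    items.add(line.trim());
                }
            }
        }
        return items.toArray(new String[0]);
    }

    /** Opens the target as a URL if it parses as one, otherwise as a file name (an assumption). */
    private static InputStream open(String target) throws IOException {
        try {
            return new URL(target).openStream();
        } catch (MalformedURLException notAUrl) {
            return Files.newInputStream(new File(target).toPath());
        }
    }
}
```

With this sketch, a property such as "a, @list.txt, b" would expand to a, then the non-comment lines of list.txt, then b, in that order.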
-
consume
Consumes a given number of bytes from a stream.
Parameters:
in - the stream.
howMany - the number of bytes to read; fewer bytes may be read if end of file is reached.
Throws:
IOException
-
consume
Consumes all the bytes of a stream.
Parameters:
in - the stream.
Throws:
IOException
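Consuming a stream amounts to reading into a scratch buffer and discarding the data. The sketch below shows one way to do it; the class ConsumeSketch is hypothetical, and unlike the documented methods (which return void) it returns the number of bytes actually consumed, purely so that the end-of-file case is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class ConsumeSketch {
    /** Reads and discards up to howMany bytes; returns how many were actually consumed. */
    static long consume(InputStream in, long howMany) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = howMany;
        while (remaining > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read == -1) break; // end of file reached before howMany bytes
            remaining -= read;
        }
        return howMany - remaining;
    }

    /** Reads and discards everything up to end of file. */
    static long consumeAll(InputStream in) throws IOException {
        return consume(in, Long.MAX_VALUE);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(consume(in, 4) + " then " + consumeAll(in));
    }
}
```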
-
readHeaderLine
Reads a line from an (unchunked) input stream. Reading stops when the "\n" terminator is encountered. If the stream ends before the line terminator is found, the last part of the string is still returned. If no input data is available, null is returned.
Parameters:
inputStream - the stream to read from.
charset - the charset used to decode the stream.
Returns:
the read line.
Throws:
IOException - if an I/O problem occurs
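The three cases described for readHeaderLine (terminated line, unterminated last line, no data) can be sketched as follows. The class HeaderLineSketch is hypothetical, and the documentation does not say whether the terminator is part of the returned string; this sketch assumes that the "\n" and a preceding "\r" are stripped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class HeaderLineSketch {
    /** Reads bytes up to the next '\n' and decodes them; returns null if the stream is already at end of file. */
    static String readLine(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        boolean sawAnyByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAnyByte = true;
            if (b == '\n') break; // line terminator found
            buffer.write(b);
        }
        if (!sawAnyByte) return null; // no input data available
        byte[] bytes = buffer.toByteArray();
        int length = bytes.length;
        // Assumption: a carriage return before the terminator is not part of the line.
        if (length > 0 && bytes[length - 1] == '\r') length--;
        return new String(bytes, 0, length, charset);
    }
}
```

Reading byte by byte keeps the stream positioned exactly after the terminator, which matters when a header block is followed by binary content.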
-
readStatusLine
public static org.apache.http.StatusLine readStatusLine(MeasurableInputStream is, Charset charset) throws IOException
Throws:
IOException
-
readANVLHeaders
public static void readANVLHeaders(MeasurableInputStream is, Map<String,String> map, Charset charset) throws IOException, WarcRecord.FormatException
Parses headers from the given stream. Headers with the same name are not combined.
Parameters:
is - the stream to read headers from
map - the map where the headers will be saved
charset - the charset to use for reading the data
Throws:
IOException - if an IO error occurs while reading from the stream
WarcRecord.FormatException
-
writeANVLHeaders
Writes a (name, value) map as an ANVL segment in a given stream.
Parameters:
out - the stream.
map - the map.
charset - the charset of the headers.
-
toHexString
Returns a string representing a digest in hexadecimal.
Parameters:
a - a digest, as a byte array.
Returns:
a string hexadecimal representation of a.
-
fromHexString
Returns a byte array corresponding to the given number.
Parameters:
s - the number, as a String.
Returns:
the byte array.
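The round trip between a digest and its hexadecimal form can be sketched as below. The class HexSketch is hypothetical and the code is not the library's implementation; in particular, the use of lowercase digits, exactly two digits per byte, and an exception on malformed input are assumptions.

```java
// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Encodes each byte as two lowercase hexadecimal digits. */
    static String toHex(byte[] a) {
        StringBuilder sb = new StringBuilder(a.length * 2);
        for (byte b : a) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Decodes a string of hexadecimal digits (two per byte) back into bytes. */
    static byte[] fromHex(String s) {
        if (s.length() % 2 != 0) throw new IllegalArgumentException("odd number of hex digits");
        byte[] result = new byte[s.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hex digit");
            result[i] = (byte) ((hi << 4) | lo);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toHex(fromHex("cafe")));
    }
}
```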
-
createHierarchicalTempFile
public static File createHierarchicalTempFile(File baseDirectory, int pathElements, String prefix, String suffix) throws IOException
Creates a temporary file with a random hierarchical path.
A random hierarchical path of n path elements is a sequence of n directories of two hexadecimal digits each, followed by a filename created by File.createTempFile(String, String, File).
This method creates an empty file having a random hierarchical path of the specified number of path elements under a given base directory, creating all needed directories along the hierarchical path (whereas the base directory is expected to already exist).
Parameters:
baseDirectory - the base directory (it must exist).
pathElements - the number of path elements (filename excluded), must be in [0,8]
prefix - will be passed to File.createTempFile(String, String, File)
suffix - will be passed to File.createTempFile(String, String, File)
Returns:
the temporary file.
Throws:
IOException
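A random hierarchical path as defined above can be built along these lines. This is a sketch under assumptions, not the library's code: the class HierarchicalTempFileSketch is hypothetical, each directory name is drawn independently with ThreadLocalRandom, and the exceptions thrown for a missing base directory or an out-of-range pathElements are illustrative choices.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class HierarchicalTempFileSketch {
    static File create(File baseDirectory, int pathElements, String prefix, String suffix) throws IOException {
        if (!baseDirectory.isDirectory()) {
            throw new IOException(baseDirectory + " is not an existing directory");
        }
        if (pathElements < 0 || pathElements > 8) {
            throw new IllegalArgumentException("pathElements must be in [0,8]: " + pathElements);
        }
        File dir = baseDirectory;
        for (int i = 0; i < pathElements; i++) {
            // Each path element is a directory named with two hexadecimal digits.
            dir = new File(dir, String.format("%02x", ThreadLocalRandom.current().nextInt(256)));
        }
        // Create the missing directories; tolerate another thread creating them first.
        if (!dir.isDirectory() && !dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create directory " + dir);
        }
        return File.createTempFile(prefix, suffix, dir);
    }
}
```

Spreading files over two-digit directories keeps any single directory from accumulating a very large number of entries, which is the usual motivation for this layout.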
-
fixURL
Fixes a given URL so that it is BURL-parsable.
Parameters:
url - a URL, possibly with bad characters in its path.
-
ensureDirectory
Checks whether the given File exists and is a directory; if it does not exist, creates the directory (and its parent).
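The check-then-create behaviour can be sketched in a few lines. The class EnsureDirectorySketch is hypothetical, and the meaning of the boolean result is an assumption, since the documentation does not state it: here true means that a directory exists at the given path when the method returns.

```java
import java.io.File;

// Hypothetical illustration, not part of it.unimi.dsi.law.warc.util.
final class EnsureDirectorySketch {
    /** Returns true if dir is a directory on return (assumed meaning of the result). */
    static boolean ensureDirectory(File dir) {
        // An existing path is acceptable only if it is a directory.
        if (dir.exists()) return dir.isDirectory();
        // Otherwise create the directory together with any missing parents.
        return dir.mkdirs();
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "ensure-sketch/a");
        System.out.println(ensureDirectory(dir));
    }
}
```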
-