Class BURL

java.lang.Object
it.unimi.dsi.law.bubing.util.BURL

@Deprecated
public final class BURL
extends Object
Deprecated.
Static methods to manipulate normalized, canonical URLs.
  • Field Details

    • FORBIDDEN_CHARS

      public static final char[] FORBIDDEN_CHARS
      Deprecated.
      Characters that will cause a URI spec to be rejected.
    • BAD_CHAR

      public static final char[] BAD_CHAR
      Deprecated.
      A list of bad characters. It includes the backslash, replaced by the slash, and illegal characters such as spaces and braces, which are replaced by the equivalent percent escape. Square brackets are percent-escaped, too, albeit legal in some circumstances, as they appear frequently in paths.
    • BAD_CHAR_SUBSTITUTE

      public static final String[] BAD_CHAR_SUBSTITUTE
      Deprecated.
      Substitutes for bad characters.
  • Method Details

    • parse

      public static URI parse​(String spec)
      Deprecated.
      Creates a new BUbiNG URL from a string specification if possible, or returns null otherwise.
      Parameters:
      spec - the string specification for a URL.
      Returns:
      a BUbiNG URL corresponding to spec, without the fragment (if any), or null if spec is malformed.
      See Also:
      parse(MutableString)
    • parse

      public static URI parse​(MutableString spec)
      Deprecated.
      Creates a new BUbiNG URL from a mutable string specification if possible, or returns null otherwise.

      The conditions for this method not returning null are as follows:

      • spec, once trimmed, must not contain characters in FORBIDDEN_CHARS;
      • once characters in BAD_CHAR have been substituted with the corresponding strings in BAD_CHAR_SUBSTITUTE, and percent signs not followed by two hexadecimal digits have been substituted by %25, spec must not throw an exception when made into a URI;
      • the URI instance so obtained must not be opaque;
      • the URI instance so obtained, if absolute, must have a non-null authority.
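
      The conditions listed above can be approximated with plain java.net.URI. What follows is a minimal sketch, not BURL's actual implementation: the class ParseConditionsSketch and its methods are hypothetical names, and the FORBIDDEN_CHARS and BAD_CHAR handling is left out because the contents of those arrays are not listed on this page. Only the stray-percent escaping and the opacity and authority checks are illustrated.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class; it is not part of BUbiNG.
final class ParseConditionsSketch {
    private ParseConditionsSketch() {}

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Replaces every '%' that is not followed by two hexadecimal digits with "%25". */
    static String escapeStrayPercents(String spec) {
        StringBuilder sb = new StringBuilder(spec.length() + 8);
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            boolean validEscape = c == '%'
                    && i + 2 < spec.length()
                    && isHex(spec.charAt(i + 1))
                    && isHex(spec.charAt(i + 2));
            if (c == '%' && !validEscape) {
                sb.append("%25");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the parsed URI, or null if one of the sketched conditions fails. */
    static URI parseOrNull(String spec) {
        final URI uri;
        try {
            uri = new URI(escapeStrayPercents(spec.trim()));
        } catch (URISyntaxException e) {
            return null; // the specification is not a syntactically valid URI
        }
        if (uri.isOpaque()) {
            return null; // for example mailto:someone@example.com
        }
        if (uri.isAbsolute() && uri.getRawAuthority() == null) {
            return null; // for example http:/path
        }
        return uri;
    }
}
```

      In this sketch, "mailto:someone@example.com" and "http:/path" both yield null, while a relative specification such as "/relative/path" is accepted because the authority check applies only to absolute URIs.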

      For efficiency, this method modifies the provided specification, and in particular it makes it loose. Caveat emptor.

      Fragments are removed (for a web crawler fragments are just noise). Normalization is applied for you. Scheme and host name are downcased. If the URL has no host name, it is guaranteed that the path is non-null and non-empty (by adding a slash, if necessary). If the host name ends with a dot, it is removed.
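
      The normalization steps in the paragraph above can be sketched with java.net.URI alone. This is an illustration under stated assumptions, not the code BURL runs: NormalizationSketch and its methods are hypothetical names, "normalization" is assumed to mean URI.normalize(), the input is assumed to be an absolute, hierarchical URI with an authority, and the slash is added whenever the path is empty, which may differ from the exact condition BURL applies.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration class; it is not part of BUbiNG.
final class NormalizationSketch {
    private NormalizationSketch() {}

    /** Lowercases the host and drops a trailing dot, leaving user info and port untouched. */
    static String normalizeAuthority(String authority) {
        int at = authority.lastIndexOf('@');
        String userInfo = at < 0 ? "" : authority.substring(0, at + 1); // keeps the '@'
        String hostPort = authority.substring(at + 1);
        int colon = hostPort.lastIndexOf(':');
        if (colon < hostPort.lastIndexOf(']')) {
            colon = -1; // the colon belongs to an IPv6 literal, not to a port
        }
        String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
        String port = colon < 0 ? "" : hostPort.substring(colon); // keeps the ':'
        host = host.toLowerCase(Locale.ROOT);
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return userInfo + host + port;
    }

    /** Assumes an absolute, hierarchical URI with a non-null authority. */
    static URI normalize(URI uri) {
        URI n = uri.normalize(); // resolves "." and ".." path segments
        StringBuilder sb = new StringBuilder();
        sb.append(n.getScheme().toLowerCase(Locale.ROOT)).append("://");
        sb.append(normalizeAuthority(n.getRawAuthority()));
        String path = n.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (n.getRawQuery() != null) {
            sb.append('?').append(n.getRawQuery());
        }
        // The fragment is deliberately not copied.
        return URI.create(sb.toString());
    }
}
```

      The result is rebuilt as a string from the raw components rather than through the multi-argument URI constructors, because those constructors quote percent signs again and would alter existing escapes.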

      Parameters:
      spec - the string specification for a URL; it can be modified by this method, and in particular it will always be made loose.
      Returns:
      a BUbiNG URL corresponding to spec, without the fragment (if any), or null if spec is malformed.
      See Also:
      parse(String)
    • fromNormalizedByteArray

      public static URI fromNormalizedByteArray​(byte[] normalized)
      Deprecated.
      Creates a new BUbiNG URL from a normalized ASCII string represented by a byte array.

      The string represented by the argument will not go through parse(MutableString). URI.create(String) will be used instead.
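
      A rough equivalent of this conversion, using only the JDK, might look as follows. ByteArrayUriSketch and its method names are hypothetical, and the assumption that the byte array holds the ASCII bytes of URI.toString() is an illustration choice, not something this page states.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it is not part of BUbiNG.
final class ByteArrayUriSketch {
    private ByteArrayUriSketch() {}

    /** Encodes the string form of an already normalized URI as ASCII bytes (assumed format). */
    static byte[] toAsciiBytes(URI uri) {
        return uri.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Decodes the bytes as ASCII and hands the string to URI.create(String), with no further checks. */
    static URI fromAsciiBytes(byte[] normalized) {
        return URI.create(new String(normalized, StandardCharsets.US_ASCII));
    }
}
```

      Because URI.create(String) is used, a byte array that does not decode to a valid URI string results in an IllegalArgumentException rather than a null return value.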

      Parameters:
      normalized - a normalized URI string represented by a byte array.
      Returns:
      the corresponding BUbiNG URL.
      Throws:
      IllegalArgumentException - if normalized does not parse correctly.
    • fromNormalizedSchemeAuthorityAndPathQuery

      public static URI fromNormalizedSchemeAuthorityAndPathQuery​(String schemeAuthority, byte[] normalizedPathQuery)
      Deprecated.
      Creates a new BUbiNG URL from a normalized ASCII string representing scheme and authority and a byte-array representation of a normalized ASCII path and query.

      This method is intended to combine the results of schemeAndAuthority(URI)/schemeAndAuthority(byte[]) and pathAndQueryAsByteArray(byte[])/pathAndQueryAsByteArray(URI).
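
      The split-and-recombine round trip can be sketched with the JDK alone. SplitAndJoinSketch and its methods are hypothetical, and the "scheme://authority" layout of the first part is an assumption made for this illustration.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it is not part of BUbiNG.
final class SplitAndJoinSketch {
    private SplitAndJoinSketch() {}

    /** Assumed layout: scheme, then "://", then the raw authority. */
    static String schemeAndAuthority(URI uri) {
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    /** Raw path followed by '?' and the raw query, if a query is present. */
    static byte[] pathAndQuery(URI uri) {
        String query = uri.getRawQuery();
        String joined = query == null ? uri.getRawPath() : uri.getRawPath() + '?' + query;
        return joined.getBytes(StandardCharsets.US_ASCII);
    }

    /** Concatenates the two parts and parses the result with URI.create(String). */
    static URI join(String schemeAuthority, byte[] normalizedPathQuery) {
        return URI.create(schemeAuthority + new String(normalizedPathQuery, StandardCharsets.US_ASCII));
    }
}
```

      Storing the scheme and authority once and keeping only the path and query per URL is one plausible reason for such a split, since many URLs of a crawl share the same scheme and authority.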

      Parameters:
      schemeAuthority - an ASCII string representing scheme and authority.
      normalizedPathQuery - the byte-array representation of a normalized ASCII path and query.
      Returns:
      the corresponding BUbiNG URL.
      Throws:
      IllegalArgumentException - if the two parts, concatenated, do not parse correctly.
    • toByteArray

      public static byte[] toByteArray​(URI url)
      Deprecated.
      Returns an ASCII byte-array representation of a BUbiNG URL.
      Parameters:
      url - a BUbiNG URL.
      Returns:
      an ASCII byte-array representation of url.
    • toString

      public static String toString​(byte[] url)
      Deprecated.
      Returns the string represented by the ASCII byte-array representation of a BUbiNG URL.
      Parameters:
      url - a byte-array representation of a BUbiNG URL.
      Returns:
      the string representation of url.
    • pathAndQueryAsByteArray

      public static byte[] pathAndQueryAsByteArray​(URI url)
      Deprecated.
      Returns an ASCII byte-array representation of the raw path and raw query of a BUbiNG URL.
      Parameters:
      url - a BUbiNG URL.
      Returns:
      an ASCII byte-array representation of the raw path and raw query of a BUbiNG URL.
    • pathAndQuery

      public static String pathAndQuery​(URI url)
      Deprecated.
      Returns the concatenated raw path and raw query of a BUbiNG URL.
      Parameters:
      url - a BUbiNG URL.
      Returns:
      the concatenated raw path and raw query of url.
    • schemeAndAuthority

      public static String schemeAndAuthority​(URI url)
      Deprecated.
      Returns the concatenated URI.getScheme() and raw authority of a BUbiNG URL.
      Parameters:
      url - a BUbiNG URL.
      Returns:
      the concatenated URI.getScheme() and raw authority of url.
    • host

      public static String host​(byte[] url)
      Deprecated.
      Extracts the host of an absolute BUbiNG URL in its byte-array representation.
      Parameters:
      url - a byte-array representation of a BUbiNG URL.
      Returns:
      the host of url.
    • pathAndQueryAsByteArray

      public static byte[] pathAndQueryAsByteArray​(byte[] url)
      Deprecated.
      Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
      Parameters:
      url - a byte-array representation of a BUbiNG URL.
      Returns:
      the path and query in byte-array representation.
    • hostFromSchemeAndAuthority

      public static String hostFromSchemeAndAuthority​(String schemeAuthority)
      Deprecated.
      Extracts the host part from a scheme and authority by removing the scheme, the user info and the port number.
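
      The three removals named above (scheme, user info, port) can be sketched as plain string operations. HostExtractionSketch.extractHost is a hypothetical helper, and the "scheme://authority" input layout is an assumption of this illustration; the actual method may handle edge cases differently.

```java
// Hypothetical illustration class; it is not part of BUbiNG.
final class HostExtractionSketch {
    private HostExtractionSketch() {}

    /** Strips the scheme, the user info and the port from a "scheme://authority" string. */
    static String extractHost(String schemeAuthority) {
        // Remove the scheme and the "://" separator, if present.
        int start = schemeAuthority.indexOf("://");
        String authority = start < 0 ? schemeAuthority : schemeAuthority.substring(start + 3);
        // Remove the user info, which ends at the last '@'.
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }
        // Remove the port, unless the last colon lies inside an IPv6 literal such as [::1].
        int colon = authority.lastIndexOf(':');
        if (colon >= 0 && colon > authority.lastIndexOf(']')) {
            authority = authority.substring(0, colon);
        }
        return authority;
    }
}
```

      With this sketch, "http://user:pw@example.com:8080" is reduced to "example.com", and "http://[::1]:80" to "[::1]".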
      Parameters:
      schemeAuthority - a scheme and authority.
      Returns:
      the host part.
    • schemeAndAuthority

      public static String schemeAndAuthority​(byte[] url)
      Deprecated.
      Extracts the scheme and authority of an absolute BUbiNG URL in its byte-array representation.
      Parameters:
      url - an absolute BUbiNG URL.
      Returns:
      the scheme and authority of url.
    • memoryUsageOf

      public static int memoryUsageOf​(byte[] array)
      Deprecated.
      Returns the memory usage associated with a byte array.

      This method is useful in establishing the memory footprint of URLs in byte-array representation.

      Parameters:
      array - a byte array.
      Returns:
      its memory usage in bytes.