Package it.unimi.dsi.law.warc.parser
Interface Parser
- All Superinterfaces:
Cloneable, Filter<Response>, com.google.common.base.Predicate<Response>, Predicate<Response>
- All Known Implementing Classes:
BinaryParser, HTMLParser
public interface Parser extends Filter<Response>, Cloneable
A generic parser for responses. It provides link extraction through a Parser.LinkReceiver callback and optional digesting.
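The contract described above (links pushed to a callback as they are discovered, plus an optional digest as the return value) can be illustrated with a self-contained sketch. All types here are hypothetical, simplified stand-ins: `SimpleLinkReceiver` has a single `link(URI)` method and `ToyHtmlParser` takes the page body as a `String` instead of a `Response`. Neither reflects the actual signatures of `Parser.LinkReceiver` or `HTMLParser`, and the regex-based link extraction and MD5 digest are illustrative choices only.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParserSketch {
    /** Hypothetical, simplified stand-in for Parser.LinkReceiver: one callback per discovered URL. */
    interface SimpleLinkReceiver {
        void link(URI uri);
    }

    /** Toy parser: pushes href targets to the receiver and optionally returns an MD5 digest of the body. */
    static class ToyHtmlParser {
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
        private final boolean digesting;

        ToyHtmlParser(boolean digesting) {
            this.digesting = digesting;
        }

        /** Returns the body digest, or null when digesting is off (mirroring the contract of parse). */
        byte[] parse(URI base, String body, SimpleLinkReceiver receiver) {
            Matcher m = HREF.matcher(body);
            while (m.find()) {
                try {
                    // Resolve relative links against the page URI before handing them to the callback.
                    receiver.link(base.resolve(m.group(1)));
                } catch (IllegalArgumentException malformed) {
                    // Skip hrefs that are not valid URI syntax.
                }
            }
            if (!digesting) return null;
            try {
                return MessageDigest.getInstance("MD5").digest(body.getBytes(StandardCharsets.UTF_8));
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 is required on every Java platform", e);
            }
        }
    }

    public static void main(String[] args) {
        List<URI> found = new ArrayList<>();
        byte[] digest = new ToyHtmlParser(true).parse(URI.create("http://example.com/dir/"),
                "<a href=\"a.html\">A</a> <a HREF=\"/b.html\">B</a>", found::add);
        System.out.println(found);         // [http://example.com/dir/a.html, http://example.com/b.html]
        System.out.println(digest.length); // 16
    }
}
```

Pushing links through a callback lets the caller decide what to do with each URL (queue it, count it, or drop it) without the parser building an intermediate collection.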
Nested Class Summary
Nested Classes
- static interface Parser.LinkReceiver: A class that can receive URLs discovered during parsing.
Field Summary
Fields
- static Parser.LinkReceiver NULL_LINK_RECEIVER: A no-op implementation of Parser.LinkReceiver.
Fields inherited from interface it.unimi.dsi.law.warc.filters.Filter
FILTER_PACKAGE_NAME
Method Summary
Methods
- String guessedCharset(): Returns a guessed charset for the document, or null if the charset could not be guessed.
- byte[] parse(Response response, Parser.LinkReceiver linkReceiver): Parses a response.
Methods inherited from interface com.google.common.base.Predicate
apply, equals, test
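Because Parser is a Predicate<Response>, an implementation can be handed to any code that expects a predicate, for example Stream.filter (assuming, as the superinterface list suggests, that the unqualified Predicate is java.util.function.Predicate). This page does not say what apply or test return for a parser; the sketch below assumes, hypothetically, that they answer "can this parser handle the response?", and picks the first parser that accepts it. `FakeResponse`, `PredicateParser`, `byContentTypePrefix` and `select` are all illustrative names, not part of the library.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class ParserSelection {
    /** Hypothetical stand-in for a crawled response: only its content type matters here. */
    static final class FakeResponse {
        final String contentType;

        FakeResponse(String contentType) {
            this.contentType = contentType;
        }
    }

    /** Hypothetical parser that is also a predicate; test() is assumed to mean "can handle this response". */
    interface PredicateParser extends Predicate<FakeResponse> {
        String name();
    }

    /** Builds a parser that accepts responses whose content type starts with the given prefix. */
    static PredicateParser byContentTypePrefix(String name, String prefix) {
        return new PredicateParser() {
            @Override public String name() { return name; }
            @Override public boolean test(FakeResponse r) { return r.contentType.startsWith(prefix); }
        };
    }

    /** Returns the first parser whose predicate accepts the response, if any. */
    static Optional<PredicateParser> select(List<PredicateParser> parsers, FakeResponse response) {
        return parsers.stream().filter(p -> p.test(response)).findFirst();
    }

    public static void main(String[] args) {
        List<PredicateParser> parsers = List.of(
                byContentTypePrefix("html", "text/html"),
                byContentTypePrefix("fallback", "")); // the empty prefix accepts everything
        System.out.println(select(parsers, new FakeResponse("text/html; charset=UTF-8"))
                .map(PredicateParser::name).orElse("none")); // html
        System.out.println(select(parsers, new FakeResponse("image/png"))
                .map(PredicateParser::name).orElse("none")); // fallback
    }
}
```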
Field Details
- NULL_LINK_RECEIVER
static final Parser.LinkReceiver NULL_LINK_RECEIVER
A no-op implementation of Parser.LinkReceiver.
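A no-op receiver such as NULL_LINK_RECEIVER is an instance of the Null Object pattern: a caller that only wants the digest can pass it instead of null, presumably so that parsers need not check for a missing receiver. The sketch below shows the idea with hypothetical, simplified types: `Receiver` and `NULL_RECEIVER` stand in for the real interface and constant, and the toy `parse` (which treats whitespace-separated http(s) tokens as links and digests with SHA-256) is an assumption, since this page does not describe the real receiver methods or digest algorithm.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullReceiverSketch {
    /** Hypothetical, simplified stand-in for Parser.LinkReceiver. */
    interface Receiver {
        void link(URI uri);
    }

    /** Null object in the spirit of NULL_LINK_RECEIVER: every reported link is silently dropped. */
    static final Receiver NULL_RECEIVER = uri -> { };

    /** Toy parse: reports whitespace-separated http(s) tokens as links and returns a SHA-256 digest of the body. */
    static byte[] parse(String body, Receiver receiver) {
        for (String token : body.split("\\s+")) {
            if (token.startsWith("http://") || token.startsWith("https://")) {
                receiver.link(URI.create(token)); // throws IllegalArgumentException on malformed tokens
            }
        }
        try {
            return MessageDigest.getInstance("SHA-256").digest(body.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String body = "see http://example.com/a and https://example.org/b";
        // Digest only: pass the shared no-op receiver instead of null or a throwaway lambda.
        byte[] digest = parse(body, NULL_RECEIVER);
        List<URI> links = new ArrayList<>();
        byte[] again = parse(body, links::add);
        System.out.println(links.size());                 // 2
        System.out.println(Arrays.equals(digest, again)); // true
    }
}
```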
Method Details
- parse
byte[] parse(Response response, Parser.LinkReceiver linkReceiver) throws IOException
Parses a response.
- Parameters:
response - a response to parse.
linkReceiver - a link receiver.
- Returns:
a byte digest for the page, or null if no digest has been computed.
- Throws:
IOException
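Because the return value of parse may be null, callers have to handle the no-digest case explicitly. A minimal caller-side sketch, independent of the real library: `DigestHandling`, `toHex` and `seenBefore` are hypothetical helpers, and using the digest for duplicate detection is an illustrative use, not something this page prescribes. Note that Java arrays use identity-based equals and hashCode, so a byte[] digest has to be converted (here to a hex string) before it can serve as a set or map key.

```java
import java.util.HashSet;
import java.util.Set;

class DigestHandling {
    /** Hex-encodes a digest; returns null when parse(...) computed none. */
    static String toHex(byte[] digest) {
        if (digest == null) return null;
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b)); // %x on a Byte prints its unsigned value
        }
        return sb.toString();
    }

    /** Records a digest; returns true if an equal digest was seen before. Null digests never count as duplicates. */
    static boolean seenBefore(Set<String> seen, byte[] digest) {
        String key = toHex(digest); // compare by content, not by array identity
        return key != null && !seen.add(key);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(seenBefore(seen, new byte[] {1, 2})); // false
        System.out.println(seenBefore(seen, new byte[] {1, 2})); // true
        System.out.println(seenBefore(seen, null));              // false
    }
}
```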
- guessedCharset
String guessedCharset()
Returns a guessed charset for the document, or null if the charset could not be guessed.
- Returns:
a charset, or null.
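Since guessedCharset() returns a charset name that may be null, a caller decoding the page typically needs a fallback. A minimal sketch, assuming the caller picks its own default (ISO-8859-1 here is an arbitrary choice for the example) and that the guessed name, which may originate from page content, can be unsupported or syntactically illegal; `CharsetFallback` and `resolve` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

class CharsetFallback {
    /** Turns a possibly-null charset name into a usable Charset, falling back to a default. */
    static Charset resolve(String guessed, Charset fallback) {
        if (guessed == null) return fallback; // the parser could not guess a charset
        try {
            return Charset.isSupported(guessed) ? Charset.forName(guessed) : fallback;
        } catch (IllegalCharsetNameException e) {
            return fallback; // the name is not even syntactically legal
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("utf-8", StandardCharsets.ISO_8859_1));            // UTF-8
        System.out.println(resolve(null, StandardCharsets.ISO_8859_1));               // ISO-8859-1
        System.out.println(resolve("no such charset!", StandardCharsets.ISO_8859_1)); // ISO-8859-1
    }
}
```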