Class Digester
- All Implemented Interfaces:
Callback
public class Digester extends Object implements Callback
The page is somewhat simplified before being passed (as a sequence of bytes obtained
by breaking each character into the upper and lower byte) to MessageDigest.update(byte[]).
All start/end tags are case-normalized, and all their content (except for the
element-type name) is removed. An exception is made for the SRC attribute of
FRAME and IFRAME elements, as these attributes are necessary to
correctly distinguish framed pages without alternative text. They are resolved
relative to the URL associated with the page.
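
The tag simplification described above can be illustrated with a minimal sketch. It is not the class's actual code: the class name TagNormalizerSketch, its method names, the choice of lower case for normalization, and the textual form of the emitted tag are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the library: shows one way to simplify tags.
class TagNormalizerSketch {
    /** Keeps only the element name; for FRAME/IFRAME also keeps SRC, resolved against base. */
    static String startTag(String name, Map<String, String> attributes, URI base) {
        // Assumption: normalization means lower-casing the element-type name.
        String normalized = name.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder("<").append(normalized);
        if (normalized.equals("frame") || normalized.equals("iframe")) {
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                if (e.getKey().equalsIgnoreCase("src")) {
                    // Resolve the (possibly relative) SRC value against the page URL.
                    sb.append(" src=\"").append(base.resolve(e.getValue().trim())).append('"');
                }
            }
        }
        return sb.append('>').toString();
    }

    /** End tags carry no attributes, so only the name is normalized. */
    static String endTag(String name) {
        return "</" + name.toLowerCase(Locale.ROOT) + ">";
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/a/index.html");
        System.out.println(startTag("IFRAME", Map.of("SRC", "b.html", "width", "10"), base));
        System.out.println(startTag("A", Map.of("href", "x.html"), base));
    }
}
```

With this normalization, two framesets that differ only in the frames they load still produce different simplified text, while differences in other attributes are ignored.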
To avoid clashes between digests coming from different sites, you can optionally set a URL
whose authority will be used to update the digest before the actual page text is added.
You can set the URL with url(URI). A good idea is to use
the host name (or even the authority).
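
The byte splitting and the site-specific prefix can be sketched with the standard MessageDigest API alone. This is only an illustration under stated assumptions: the class DigestSketch and its methods are hypothetical, the upper-byte-first order is assumed, and feeding the authority through the same byte splitting is a guess at how the prefix might be applied.

```java
import java.net.URI;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the library.
class DigestSketch {
    /** Breaks each char into two bytes: upper byte first, then lower byte (assumed order). */
    static byte[] toBytes(char[] data, int offset, int length) {
        byte[] out = new byte[2 * length];
        for (int i = 0; i < length; i++) {
            char c = data[offset + i];
            out[2 * i] = (byte) (c >>> 8);
            out[2 * i + 1] = (byte) c;
        }
        return out;
    }

    /** Updates the digest with the URI authority (if any), then with the text. */
    static byte[] digest(String algorithm, URI uri, String text) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // Wrapped so that the sketch has no checked exceptions.
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        if (uri != null && uri.getAuthority() != null) {
            char[] authority = uri.getAuthority().toCharArray();
            md.update(toBytes(authority, 0, authority.length));
        }
        char[] chars = text.toCharArray();
        md.update(toBytes(chars, 0, chars.length));
        return md.digest();
    }

    public static void main(String[] args) {
        byte[] a = digest("MD5", URI.create("http://a.example/"), "same text");
        byte[] b = digest("MD5", URI.create("http://b.example/"), "same text");
        // Identical text from different sites is expected to yield different digests.
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```

Because only the authority is fed to the digest, two pages with the same text on the same site still collide, which is the intended behavior for duplicate detection within a site.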
-
Field Summary
Fields inherited from interface it.unimi.dsi.parser.callback.Callback
EMPTY_CALLBACK_ARRAY
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- boolean cdata(Element element, char[] data, int offset, int length)
- boolean characters(char[] data, int offset, int length, boolean flowBroken)
- void configure(BulletParser parser)
- byte[] digest(): Returns the digest computed.
- void endDocument()
- boolean endElement(Element element)
- void startDocument()
- boolean startElement(Element element, Map<Attribute,MutableString> attributes)
- void url(URI uri): Sets the URI that will be used to tune the next digest.
-
Constructor Details
-
Digester
Creates a new callback using the given message digest.
- Parameters:
algorithm - a message digest algorithm (to be passed to MessageDigest.getInstance(java.lang.String)).
- Throws:
NoSuchAlgorithmException
-
-
Method Details
-
configure
-
digest
public byte[] digest()
Returns the digest computed.
- Returns:
the digest computed.
-
url
Sets the URI that will be used to tune the next digest.
- Parameters:
uri - a URI, or null for no URL.
-
startDocument
public void startDocument()
- Specified by:
startDocument in interface Callback
-
startElement
- Specified by:
startElement in interface Callback
-
endElement
- Specified by:
endElement in interface Callback
-
characters
public boolean characters(char[] data, int offset, int length, boolean flowBroken)
- Specified by:
characters in interface Callback
-
cdata
-
endDocument
public void endDocument()
- Specified by:
endDocument in interface Callback
-