Class Digester
All Implemented Interfaces: Callback
public class Digester extends Object implements Callback
The page is somewhat simplified before being passed (as a sequence of bytes obtained
by breaking each character into the upper and lower byte) to MessageDigest.update(byte[]).
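The character-to-byte step described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name CharBytesSketch, the helper toBytes, and the choice of emitting the upper byte before the lower byte are all assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CharBytesSketch {
    /**
     * Splits each char in data[offset .. offset + length) into two bytes.
     * Assumption: the upper byte is emitted first; the real order is not documented here.
     */
    public static byte[] toBytes(char[] data, int offset, int length) {
        byte[] out = new byte[length * 2];
        for (int i = 0; i < length; i++) {
            char c = data[offset + i];
            out[2 * i] = (byte) (c >>> 8); // upper byte
            out[2 * i + 1] = (byte) c;     // lower byte
        }
        return out;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        char[] text = "Hello".toCharArray();
        // Feed the two-bytes-per-char representation to the digest.
        md.update(toBytes(text, 0, text.length));
        System.out.println(md.digest().length + " digest bytes");
    }
}
```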
All start/end tags are case-normalized, and all their content (except for the
element-type name) is removed. An exception is made for the SRC attribute of FRAME and
IFRAME elements, as it is needed to correctly distinguish framed pages without
alternative text. These attribute values are resolved with respect to the URL associated
with the page.
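Resolving a SRC value against the page URL might look like the sketch below, which relies only on java.net.URI.resolve(String). The class name FrameSrcSketch, the helper resolveSrc, and the example URLs are hypothetical; how Digester performs the resolution internally is not stated here.

```java
import java.net.URI;

class FrameSrcSketch {
    /** Resolves a (possibly relative) SRC value against the URI of the page containing it. */
    public static String resolveSrc(URI pageUri, String src) {
        return pageUri.resolve(src).toString();
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/a/index.html");
        // A relative SRC is interpreted relative to the page's directory.
        System.out.println(resolveSrc(page, "frames/menu.html"));
        // A SRC that is already absolute is left unchanged.
        System.out.println(resolveSrc(page, "http://other.example/x.html"));
    }
}
```

Under this approach, two pages with identical markup but different locations yield different resolved frame addresses whenever their SRC values are relative.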
To avoid clashes between digests coming from different sites, you can optionally set a URL
whose authority will be used to update the digest before the actual page text is added.
You can set the URL with url(URI). A good idea is to use
the host name (or even the authority).
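The effect of seeding the digest with a site identifier can be illustrated with plain JDK classes. This is a sketch under stated assumptions: the class name SiteSeedSketch, the helper digest, the UTF-8 encoding of both the authority and the text, and the use of MD5 are illustrative choices, not the behavior of this class.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SiteSeedSketch {
    /** Digests text, first mixing in the authority of uri when one is available. */
    public static byte[] digest(URI uri, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            if (uri != null && uri.getAuthority() != null) {
                // Assumption: the authority is encoded as UTF-8 before being digested.
                md.update(uri.getAuthority().getBytes(StandardCharsets.UTF_8));
            }
            md.update(text.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = digest(URI.create("http://a.example/"), "same text");
        byte[] b = digest(URI.create("http://b.example/"), "same text");
        System.out.println("digests equal: " + java.util.Arrays.equals(a, b));
    }
}
```

Identical text served from two different authorities then produces two different digests, while pages from the same authority remain comparable.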
Field Summary
Fields inherited from interface it.unimi.dsi.parser.callback.Callback: EMPTY_CALLBACK_ARRAY
Constructor Summary
Method Summary
Modifier and Type    Method    Description
boolean    cdata(Element element, char[] data, int offset, int length)
boolean    characters(char[] data, int offset, int length, boolean flowBroken)
void    configure(BulletParser parser)
byte[]    digest()    Returns the digest computed.
void    endDocument()
boolean    endElement(Element element)
void    startDocument()
boolean    startElement(Element element, Map<Attribute,MutableString> attributes)
void    url(URI uri)    Sets the URI that will be used to tune the next digest.
-
Constructor Details
Digester
Creates a new callback using the given message digest.
Parameters:
algorithm - a message digest algorithm (to be passed to MessageDigest.getInstance(java.lang.String)).
Throws:
NoSuchAlgorithmException
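Because the algorithm name is handed to MessageDigest.getInstance(java.lang.String), an unknown name surfaces as NoSuchAlgorithmException. A caller could check a name up front along these lines; the class AlgorithmCheckSketch and the helper isSupported are hypothetical and not part of this API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AlgorithmCheckSketch {
    /** Returns true if the JDK can provide a MessageDigest for the given algorithm name. */
    public static boolean isSupported(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // MD5, SHA-1 and SHA-256 are required on every Java platform implementation.
        System.out.println("SHA-256 supported: " + isSupported("SHA-256"));
        System.out.println("NOT-A-DIGEST supported: " + isSupported("NOT-A-DIGEST"));
    }
}
```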
Method Details
configure
public void configure(BulletParser parser)
digest
public byte[] digest()
Returns the digest computed.
Returns:
the digest computed.
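The returned byte array is often easier to log or compare as a hexadecimal string. The conversion below is a generic sketch using only the JDK; the class name HexSketch and the helper toHex are hypothetical.

```java
class HexSketch {
    /** Renders a digest as lowercase hexadecimal, two characters per byte. */
    public static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            int v = b & 0xff; // treat the byte as unsigned
            sb.append(Character.forDigit(v >>> 4, 16));
            sb.append(Character.forDigit(v & 0x0f, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice this would be the array returned by digest().
        byte[] example = { 0x00, 0x0f, (byte) 0xff };
        System.out.println(toHex(example));
    }
}
```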
url
public void url(URI uri)
Sets the URI that will be used to tune the next digest.
Parameters:
uri - a URI, or null for no URL.
startDocument
public void startDocument()
Specified by: startDocument in interface Callback
startElement
public boolean startElement(Element element, Map<Attribute,MutableString> attributes)
Specified by: startElement in interface Callback
endElement
public boolean endElement(Element element)
Specified by: endElement in interface Callback
characters
public boolean characters(char[] data, int offset, int length, boolean flowBroken)
Specified by: characters in interface Callback
cdata
public boolean cdata(Element element, char[] data, int offset, int length)
endDocument
public void endDocument()
Specified by: endDocument in interface Callback