All Classes
Class  Description 

AbstractFilter<T> 
An abstract implementation of a
Filter providing a method
that helps implement Object.toString() properly for atomic (i.e., class-based) filters. 
AbstractHttpResponse 
An abstract implementation of
HttpResponse providing an AbstractHttpResponse.toWarcRecord(WarcRecord) method that can
be used to populate a WARC record (in order to write it). 
ArcColouringStrategy 
A colouring on the arcs.

AveragePrecisionCorrelation 
Computes the AP (average-precision) correlation between two score vectors without ties.

BFS 
Computes the visit order with respect to a breadth-first visit.

BFS 
Computes the visit order with respect to a breadth-first visit.

BinaryParser 
A universal binary parser that just computes digests.

BoundedCountingInputStream 
A class that decorates an
InputStream to obtain a
MeasurableInputStream . 
BURL  Deprecated.
Use BUbiNG's BURL.

ByteArrayCharSequence 
An adapter exposing a byte array as an ISO-8859-1-encoded
character sequence.

CompressedIntLabel 
An integer label that uses a coder/decoder pair depending on the source node.

CompressWarc 
A tool to compress a WARC file.

ConsistentHashFunction<T extends Comparable<? super T>> 
Provides an implementation of consistent hashing.

ConsistentHashFunction.SkipStrategy<T> 
Allows skipping suitable items when searching for the closest replica.

ContentTypeStartsWith 
Accepts only fetched responses whose content type starts with a given string.

CorrelationIndex 
An abstract class providing basic infrastructure for all classes computing some correlation index
between two score big vectors, such as
KendallTau . 
CorrelationIndex 
An abstract class providing basic infrastructure for all classes computing some correlation index between two score vectors,
such as
KendallTau , WeightedTau and AveragePrecisionCorrelation . 
CosineSimilarityStrategy 
A class that computes the similarity between patterns using cosine similarity.

CRC64 
Provides static methods to compute 64-bit CRCs of strings and byte arrays.

CutWarc 
A class to extract specific records from a WARC file.

DataInput2Text 
The main method of this class converts a binary
DataOutput file containing numbers to text format. 
DataInput2Text.Type  
DenseVector 
A mutable implementation of
Vector optimized for dense vectors. 
DFS  Deprecated.
This class performs a stack-based visit, but technically not a DFS.

DigestBasedDuplicateDetection 
Determines whether an
HttpResponse is a duplicate. 
DigestEquals 
Accepts only records of given digest, specified as a hexadecimal string.

Digester 
A callback computing the digest of a page.

DominantEigenvectorParallelPowerMethod 
Computes the left dominant eigenvalue and eigenvector of a graph using a parallel implementation of the power method.

DuplicateSegmentsLessThan 
Accepts only URIs whose path does not contain too many duplicate segments.

EuclideanSimilarityStrategy 
A class that computes the similarity between patterns using the Euclidean distance.

ExchangeCounter 
Computes the number of discordances between two score big vectors using Knight's
O(n log n) MergeSort-based algorithm.

ExchangeCounter 
Computes the number of discordances between two score vectors using Knight's
O(n log n) MergeSort-based algorithm.
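
The counting idea behind this entry can be illustrated with a minimal, self-contained sketch. It is not the library's ExchangeCounter API: the class name DiscordanceSketch and its methods are hypothetical, and the sketch assumes two score vectors of equal length without ties. It sorts indices by the first vector, then counts inversions in the second vector with a merge sort; each inversion is one discordant pair, from which Kendall's τ follows.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the documented library.
final class DiscordanceSketch {

    /** Counts the pairs (i, j) that x and y order differently. Assumes no ties. */
    static long discordances(double[] x, double[] y) {
        final int n = x.length;
        Integer[] perm = new Integer[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        // Order the indices by increasing score in x.
        Arrays.sort(perm, (i, j) -> Double.compare(x[i], x[j]));
        // Read y in that order: every inversion is a discordant pair.
        double[] a = new double[n];
        for (int i = 0; i < n; i++) a[i] = y[perm[i]];
        return countInversions(a, new double[n], 0, n);
    }

    /** Merge sort on a[from..to) that returns the number of inversions. */
    private static long countInversions(double[] a, double[] tmp, int from, int to) {
        if (to - from < 2) return 0;
        int mid = (from + to) >>> 1;
        long count = countInversions(a, tmp, from, mid)
                   + countInversions(a, tmp, mid, to);
        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            if (a[i] <= a[j]) {
                tmp[k++] = a[i++];
            } else {
                // a[j] precedes all remaining elements of the left half.
                count += mid - i;
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < to) tmp[k++] = a[j++];
        System.arraycopy(tmp, from, a, from, to - from);
        return count;
    }

    /** Kendall's tau without ties: 1 - 2 * discordances / (n choose 2). */
    static double kendallTau(double[] x, double[] y) {
        long n = x.length;
        long pairs = n * (n - 1) / 2;
        return 1.0 - 2.0 * discordances(x, y) / pairs;
    }
}
```

The merge step is what keeps the cost at O(n log n): when an element of the right half is emitted first, all remaining elements of the left half are counted at once instead of being compared one by one.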

ExchangeWeigher 
Computes the weight of discordances using a generalisation of Knight's algorithm.

ExchangeWeigher 
Computes the weight of discordances using a generalisation of Knight's algorithm.

ExtractDigestUrls 
A tool to extract digests and URLs from response records of a WARC file.

ExtractLinks 
Extracts links from a WARC file.

Filter<T> 
A filter is a strategy to decide whether to accept a given
object or not.
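
As a rough illustration of the accept/reject strategy idea, the following sketch uses the standard java.util.function.Predicate instead of the library's Filter interface, whose methods are not shown on this page. The class FilterSketch and its factory methods are hypothetical; they only mimic the behaviour described for entries such as HostEndsWith and SchemeEquals.

```java
import java.net.URI;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical illustration built on java.util.function.Predicate;
// this is NOT the library's Filter<T> interface.
final class FilterSketch {

    /** Accepts only URIs whose host ends, case-insensitively, with the given suffix. */
    static Predicate<URI> hostEndsWith(String suffix) {
        final String lower = suffix.toLowerCase(Locale.ROOT);
        return uri -> uri.getHost() != null
                && uri.getHost().toLowerCase(Locale.ROOT).endsWith(lower);
    }

    /** Accepts only URIs whose scheme equals the given string. */
    static Predicate<URI> schemeEquals(String scheme) {
        return uri -> scheme.equals(uri.getScheme());
    }
}
```

Atomic strategies of this kind compose with the standard combinators, for example `FilterSketch.schemeEquals("http").and(FilterSketch.hostEndsWith(".example.com"))`.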

FilterParser<T> 
A simple parser that transforms a filter expression into a filter.

FilterParserConstants 
Token literal values and constants.

FilterParserTokenManager 
Token Manager.

Filters 
A collection of static methods to deal with
filters . 
GrepWarc 
A "grep" for WARC files.

GZWarcRecord  
GZWarcRecord.GZHeader 
A class to contain fields contained in the gzip header.

GZWarcStats 
A tool to compute some statistics about a gzipped WARC file.

HostEndsWith 
Accepts only URIs whose host ends with (case-insensitively) a certain suffix.

HostEquals 
Accepts only URIs whose host equals (case-insensitively) a certain string.

HTMLParser 
An HTML parser with additional responsibilities (such as guessing the character encoding
and resolving relative URLs).

HTMLParser.SetLinkReceiver  
HttpComponentsHttpResponse 
A concrete subclass of
AbstractHttpResponse that implements
missing methods by wrapping an Apache HTTP Components HttpResponse . 
HttpComponentsHttpResponse.HttpResponseHeaderMap 
A wrapper class exposing headers in
HttpResponse.headers()
format by delegating to an HttpResponse . 
HttpResponse 
Provides high level access to WARC records with
record-type equal to
response and content-type equal to HTTP
(or HTTPS ). 
HttpResponseFilteredIterator 
A class to iterate over WARC files getting only records corresponding to
HttpResponse that satisfy a given filter. 
ImmutableSparseVector 
An immutable implementation of
Vector optimized for sparse vectors. 
IndexWarc 
A tool to index a WARC file.

InspectableBufferedInputStream 
An input stream that wraps an underlying input stream to make it
rewindable and partially inspectable, using a bounded-capacity memory buffer and an overflow file.

InspectableBufferedInputStream.State 
The possible states of this stream, as explained above.

Int2DoubleMapVector 
A mutable implementation of
Vector for sparse vectors. 
IntegerExchangeCounter 
Computes the number of discordances between two integer score big vectors
using Knight's O(n log n)
MergeSort-based algorithm.

IsHttpResponse 
Accepts only records that are http/https responses.

IsProbablyBinary 
Accepts only http responses whose content stream appears to be binary.

KahanSummation 
Kahan's
summation algorithm encapsulated in an object.
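
For readers unfamiliar with the technique, here is a minimal sketch of Kahan's compensated summation. It is not the library's KahanSummation class: the name KahanSumSketch and the static sum method are hypothetical and chosen only for illustration. The running compensation term captures the low-order bits lost at each addition and feeds them back into the next one.

```java
// Hypothetical illustration; not the library's KahanSummation API.
final class KahanSumSketch {

    /** Sums the values while carrying a compensation term for lost low-order bits. */
    static double sum(double[] values) {
        double sum = 0.0;
        double c = 0.0; // running compensation
        for (double v : values) {
            double y = v - c;   // apply the correction from the previous step
            double t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y;  // recover (the negation of) what was lost
            sum = t;
        }
        return sum;
    }

    /** Plain left-to-right summation, for comparison. */
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum;
    }
}
```

With the input 1.0 followed by ten copies of 1e-16, the naive loop never moves away from 1.0 because each addend is below half a unit in the last place, whereas the compensated loop accumulates the small terms and returns a value greater than 1.0.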

KatzParallelGaussSeidel 
Computes Katz's index using a parallel implementation of the Gauß–Seidel method; this is the implementation of choice to be used when computing Katz's index.

KendallAssortativity 
Computes Kendall's assortativities between the list of degrees of sources and targets of
arcs of a graph.

KendallTau 
Computes Kendall's τ between two score big vectors.

KendallTau 
Computes Kendall's τ between two score vectors.

LayeredLabelPropagation 
A big implementation of the layered label propagation algorithm described by Paolo
Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered
label propagation: A multiresolution coordinate-free ordering for compressing social
networks”, Proceedings of the 20th international conference on World Wide Web, pages
587–596, ACM, 2011.

LayeredLabelPropagation 
An implementation of the layered label propagation algorithm described
by Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered label propagation:
A multiresolution coordinate-free ordering for compressing social networks”,
Proceedings of the 20th international conference on World Wide Web, pages 587–596, ACM, 2011.

LeftSingularVectorParallelPowerMethod 
Computes the left singular vector of a graph using a parallel implementation of the power method.

ListGZWarcComments 
A tool to list the GZip header comments contained in a compressed WARC file.

MeasurableSequenceInputStream 
A
MeasurableInputStream version of a SequenceInputStream . 
MetadataHttpResponse 
An abstract extension of
AbstractHttpResponse which additionally provides support
for getting and setting metadata (i.e., MetadataHttpResponse.uri() , MetadataHttpResponse.statusLine() , MetadataHttpResponse.status() and MetadataHttpResponse.headers() ). 
MetadataHttpResponse.HeaderMap 
A special map used for headers: keys are case-insensitive, and multiple puts are converted into comma-separated values.

MinimumBase 
Static methods to compute the minimum fibration base of a given graph.

MutableHttpResponse 
A mutable extension of
MetadataHttpResponse that provides
support for setting the content stream. 
NodeColouringStrategy 
A colouring on the nodes.

Norm 
An
Enum providing different ℓ norms. 
NormL1  Deprecated.
Use
Norm.L_1 . 
NormL2  Deprecated.
Use
Norm.L_2 . 
NumberDistinctLines 
The main method of this class reads a UTF-8 file containing a newline-separated
list of strings and writes a
DataOutputStream containing a
list of ints such that the i-th int is equal to the j-th
int iff the (CRC of the) i-th
string is equal to the (CRC of
the) j-th string. 
PageRank 
A big version of
PageRank . 
PageRank 
An abstract class defining methods and attributes supporting PageRank computations.

PageRankFromCoefficients 
Computes PageRank using its power series.

PageRankGaussSeidel 
Computes PageRank of a graph using the Gauß–Seidel method.

PageRankParallelGaussSeidel 
A big version of
PageRankParallelGaussSeidel . 
PageRankParallelGaussSeidel 
Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.

PageRankParallelPowerSeries 
Computes PageRank using a parallel (multicore) implementation of the power-series method, which runs
the power method starting from the preference vector, thus evaluating the truncated PageRank power series (see
PageRankPowerSeries ). 
PageRankPowerSeries 
Computes PageRank (and possibly its derivatives in the damping factor) using its power series.

PageRankPush 
Computes strongly preferential PageRank for a preference vector concentrated on a node using the push algorithm.

PageRankPush.EmptyQueueStoppingCritertion  
PageRankPush.IntHeapIndirectPriorityQueue  
PageRankPush.L1NormStoppingCritertion  
ParseException 
This exception is thrown when parse errors are encountered.

Parser 
A generic parser for
responses . 
Parser.LinkReceiver 
A class that can receive URLs discovered during parsing.

PartwiseMinimumBase 
Static methods to compute the minimum fibration base of a given graph using a partwise algorithm.

PathEndsWithOneOf 
Accepts only URIs whose path ends (caseinsensitively) with one of a given set of suffixes.

PearsonAssortativity 
Computes Pearson assortativities between the list of degrees of sources and targets of
arcs of a graph.

PowerSeries 
Computes a power series on a graph using a parallel implementation.

Precision 
A set of convenience methods to manipulate the precision of doubles.

PreProcessedMinimumBase 
Static methods to compute the minimum opfibration base of a given graph.

RemappedStringMap  
RemoveHubs 
Removes nodes from a graph following a number of strategies.

Response 
Provides high level access to WARC records with
record-type equal to
response . 
Salsa 
Computes the SALSA score using a non-iterative method.

Salsa.UnionFind  
SchemeEquals 
Accepts only URIs whose scheme equals a certain string (typically,
http ). 
SequentialHttpResponseRead  
SequentialHttpResponseWrite  
SequentialWarcRecordRead  
SequentialWarcRecordWrite  
SimilarityStrategy 
An interface specifying methods used to obtain pattern similarities.

SimpleCharStream 
An implementation of interface CharStream, where the stream is assumed to
contain only ASCII characters (without unicode processing).

SpectralRanking 
A big version of
SpectralRanking . 
SpectralRanking 
A base abstract class defining methods and attributes supporting computations
of graph spectral rankings such as the dominant eigenvector,
PageRank or Katz's index.

SpectralRanking.IterationNumberStoppingCriterion 
A stopping criterion that stops whenever the number of iterations exceeds a given bound.

SpectralRanking.IterationNumberStoppingCriterion 
A stopping criterion that stops whenever the number of iterations exceeds a given bound.

SpectralRanking.NormStoppingCriterion 
A stopping criterion that evaluates
SpectralRanking.normDelta() , and stops
if this value is smaller than a given threshold. 
SpectralRanking.NormStoppingCriterion 
A stopping criterion that evaluates
SpectralRanking.normDelta() , and stops
if this value is smaller than a given threshold. 
SpectralRanking.StoppingCriterion 
A strategy that decides when a computation should be stopped.

SpectralRanking.StoppingCriterion 
A strategy that decides when a computation should be stopped.

StatusCategory 
Accepts only fetched responses whose status category (status/100) has a certain value.

Text2DataOutput 
The main method of this class converts a text file containing numbers to binary
DataOutput format. 
Token 
Describes the input token stream.

TokenMgrError 
Token Manager Error.

URLEquals 
Accepts only a given URI.

URLMatchesRegex 
Accepts only URIs that match a certain regular expression.

URLShorterThan 
Accepts only URIs whose overall length is below a given threshold.

Util 
A static container of utility methods for all LAW software.

Util 
Static utility methods.

Vector 
A class representing a vector of
double . 
WarcFilteredIterator 
A class to iterate over WARC files getting only records that satisfy a given filter.

WarcHttpResponse 
An
AbstractHttpResponse implementation that reads the response
content from a WARC record (via the WarcHttpResponse.fromWarcRecord(WarcRecord) method). 
WarcRecord 
A class to read/write WARC/0.9 records (for format details, please see the WARC format specifications).

WarcRecord.ContentType 
Content types.

WarcRecord.FormatException 
An exception to denote parsing errors during reads.

WarcRecord.Header 
A class to contain fields contained in the warc
header . 
WarcRecord.RecordType 
Record types.

WeightedTau 
Computes the weighted τ between two big score vectors.

WeightedTau 
Computes the weighted τ between two score vectors.

WeightedTau.AbstractWeigher  
WeightedTau.AbstractWeigher 