All Classes

Class Description
AbstractFilter<T>
An abstract implementation of a Filter providing a method that helps in properly implementing Object.toString() for atomic (i.e., class-based) filters.
AbstractHttpResponse
An abstract implementation of HttpResponse providing an AbstractHttpResponse.toWarcRecord(WarcRecord) method that can be used to populate a WARC record (in order to write it).
ArcColouringStrategy
A colouring on the arcs.
AveragePrecisionCorrelation
Computes the AP (average-precision) correlation between two score vectors without ties.
BFS
Computes the visit order with respect to a breadth-first visit.
BFS
Computes the visit order with respect to a breadth-first visit.
BinaryParser
A universal binary parser that just computes digests.
BoundedCountingInputStream
A class that decorates an InputStream to obtain a MeasurableInputStream.
BURL Deprecated.
ByteArrayCharSequence
An adapter exposing a byte array as an ISO-8859-1-encoded character sequence.
CompressedIntLabel
An integer label that uses a coder/decoder pair depending on the source node.
CompressWarc
A tool to compress a WARC file.
ConsistentHashFunction<T extends Comparable<? super T>>
Provides an implementation of consistent hashing.
ConsistentHashFunction.SkipStrategy<T>
Allows skipping suitable items when searching for the closest replica.
ContentTypeStartsWith
Accepts only fetched responses whose content type starts with a given string.
CorrelationIndex
An abstract class providing basic infrastructure for all classes computing some correlation index between two score big vectors, such as KendallTau.
CorrelationIndex
An abstract class providing basic infrastructure for all classes computing some correlation index between two score vectors, such as KendallTau, WeightedTau and AveragePrecisionCorrelation.
CosineSimilarityStrategy
A class that computes the similarity between patterns using cosine similarity.
CRC64
Provides static methods to compute 64-bit CRCs of strings and byte arrays.
CutWarc
A class to extract specific records from a WARC file.
DataInput2Text
The main method of this class converts a binary DataOutput file containing numbers to text format.
DataInput2Text.Type  
DenseVector
A mutable implementation of Vector optimized for dense vectors.
DFS Deprecated.
This class performs a stack-based visit, but technically not a DFS.
DigestBasedDuplicateDetection
Allows determining whether an HttpResponse is a duplicate.
DigestEquals
Accepts only records of given digest, specified as a hexadecimal string.
Digester
A callback computing the digest of a page.
DominantEigenvectorParallelPowerMethod
Computes the left dominant eigenvalue and eigenvector of a graph using a parallel implementation of the power method.
DuplicateSegmentsLessThan
Accepts only URIs whose path does not contain too many duplicate segments.
EuclideanSimilarityStrategy
A class that computes the similarity between patterns using the Euclidean distance.
ExchangeCounter
Computes the number of discordances between two score big vectors using Knight's O(n log n) MergeSort-based algorithm.
ExchangeCounter
Computes the number of discordances between two score vectors using Knight's O(n log n) MergeSort-based algorithm.
ExchangeWeigher
Computes the weight of discordances using a generalisation of Knight's algorithm.
ExchangeWeigher
Computes the weight of discordances using a generalisation of Knight's algorithm.
ExtractDigestUrls
A tool to extract digests and URLs from response records of a WARC file.
ExtractLinks
Extracts links from a WARC file.
Filter<T>
A filter is a strategy to decide whether to accept a given object or not.
FilterParser<T>
A simple parser that transforms a filter expression into a filter.
FilterParserConstants
Token literal values and constants.
FilterParserTokenManager
Token Manager.
Filters
A collection of static methods to deal with filters.
GrepWarc
A "grep" for WARC files.
GZWarcRecord
A class to read/write WARC/0.9 records in compressed form (for format details, please see the WARC and GZip format specifications).
GZWarcRecord.GZHeader
A class holding the fields of the gzip header.
GZWarcStats
A tool to compute some statistics about a gzipped WARC file.
HostEndsWith
Accepts only URIs whose host ends with (case-insensitively) a certain suffix.
HostEquals
Accepts only URIs whose host equals (case-insensitively) a certain string.
HTMLParser
An HTML parser with additional responsibilities (such as guessing the character encoding and resolving relative URLs).
HTMLParser.SetLinkReceiver  
HttpComponentsHttpResponse
A concrete subclass of AbstractHttpResponse that implements missing methods by wrapping an Apache HTTP Components HttpResponse.
HttpComponentsHttpResponse.HttpResponseHeaderMap
A wrapper class exposing headers in HttpResponse.headers() format by delegating to an HttpResponse.
HttpResponse
Provides high level access to WARC records with record-type equal to response and content-type equal to HTTP (or HTTPS).
HttpResponseFilteredIterator
A class to iterate over WARC files getting only records corresponding to HttpResponse that satisfy a given filter.
ImmutableSparseVector
An immutable implementation of Vector optimized for sparse vectors.
IndexWarc
A tool to index a WARC file.
InspectableBufferedInputStream
An input stream that wraps an underlying input stream to make it rewindable and partially inspectable, using a bounded-capacity memory buffer and an overflow file.
InspectableBufferedInputStream.State
The possible states of this stream, as explained above.
Int2DoubleMapVector
A mutable implementation of Vector for sparse vectors.
IntegerExchangeCounter
Computes the number of discordances between two integer score big vectors using Knight's O(n log n) MergeSort-based algorithm.
IsHttpResponse
Accepts only records that are http/https responses.
IsProbablyBinary
Accepts only http responses whose content stream appears to be binary.
KahanSummation
Kahan's summation algorithm encapsulated in an object.
KatzParallelGaussSeidel
Computes Katz's index using a parallel implementation of the Gauß–Seidel method; this is the implementation of choice to be used when computing Katz's index.
KendallAssortativity
Computes Kendall's assortativities between the list of degrees of sources and targets of arcs of a graph.
KendallTau
Computes Kendall's τ between two score big vectors.
KendallTau
Computes Kendall's τ between two score vectors.
LayeredLabelPropagation
A big implementation of the layered label propagation algorithm described by Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks”, Proceedings of the 20th international conference on World Wide Web, pages 587−596, ACM, 2011.
LayeredLabelPropagation
An implementation of the layered label propagation algorithm described by Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks”, Proceedings of the 20th international conference on World Wide Web, pages 587−596, ACM, 2011.
LeftSingularVectorParallelPowerMethod
Computes the left singular vector of a graph using a parallel implementation of the power method.
ListGZWarcComments
A tool to list the GZip header comments contained in a compressed WARC file.
MeasurableSequenceInputStream
MetadataHttpResponse
An abstract extension of AbstractHttpResponse which additionally provides support for getting and setting metadata (i.e., MetadataHttpResponse.uri(), MetadataHttpResponse.statusLine(), MetadataHttpResponse.status() and MetadataHttpResponse.headers()).
MetadataHttpResponse.HeaderMap
A special map used for headers: keys are case-insensitive, and multiple puts are converted into comma-separated values.
MinimumBase
Static methods to compute the minimum fibration base of a given graph.
MutableHttpResponse
A mutable extension of MetadataHttpResponse that provides support for setting the content stream.
NodeColouringStrategy
A colouring on the nodes.
Norm
An Enum providing different ℓ norms.
NormL1 Deprecated.
NormL2 Deprecated.
NumberDistinctLines
The main method of this class reads a UTF-8 file containing a newline-separated list of strings and writes a DataOutputStream containing a list of ints such that the i-th int is equal to the j-th int iff the (CRC of the) i-th string is equal to the (CRC of the) j-th string.
PageRank
A big version of PageRank.
PageRank
An abstract class defining methods and attributes supporting PageRank computations.
PageRankFromCoefficients
Computes PageRank using its power series.
PageRankGaussSeidel
Computes PageRank of a graph using the Gauß–Seidel method.
PageRankParallelGaussSeidel
A big version of PageRankParallelGaussSeidel.
PageRankParallelGaussSeidel
Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.
PageRankParallelPowerSeries
Computes PageRank using a parallel (multicore) implementation of the power-series method, which runs the power method starting from the preference vector, thus evaluating the truncated PageRank power series (see PageRankPowerSeries).
PageRankPowerSeries
Computes PageRank (and possibly its derivatives in the damping factor) using its power series.
PageRankPush
Computes strongly preferential PageRank for a preference vector concentrated on a node using the push algorithm.
PageRankPush.EmptyQueueStoppingCritertion  
PageRankPush.IntHeapIndirectPriorityQueue  
PageRankPush.L1NormStoppingCritertion  
ParseException
This exception is thrown when parse errors are encountered.
Parser
A generic parser for responses.
Parser.LinkReceiver
A class that can receive URLs discovered during parsing.
PartwiseMinimumBase
Static methods to compute the minimum fibration base of a given graph using a partwise algorithm.
PathEndsWithOneOf
Accepts only URIs whose path ends (case-insensitively) with one of a given set of suffixes.
PearsonAssortativity
Computes Pearson assortativities between the list of degrees of sources and targets of arcs of a graph.
PowerSeries
Computes a power series on a graph using a parallel implementation.
Precision
A set of commodity methods to manipulate precision of doubles.
PreProcessedMinimumBase
Static methods to compute the minimum opfibration base of a given graph.
RemappedStringMap
A StringMap that remaps values returned by another StringMap.
RemoveHubs
Removes nodes from a graph following a number of strategies.
Response
Provides high level access to WARC records with record-type equal to response.
Salsa
Computes the SALSA score using a non-iterative method.
Salsa.UnionFind  
SchemeEquals
Accepts only URIs whose scheme equals a certain string (typically, http).
SequentialHttpResponseRead  
SequentialHttpResponseWrite  
SequentialWarcRecordRead  
SequentialWarcRecordWrite  
SimilarityStrategy
An interface specifying methods used to obtain pattern similarities.
SimpleCharStream
An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without unicode processing).
SpectralRanking
A big version of SpectralRanking.
SpectralRanking
A base abstract class defining methods and attributes supporting computations of graph spectral rankings such as the dominant eigenvector, PageRank or Katz's index.
SpectralRanking.IterationNumberStoppingCriterion
A stopping criterion that stops whenever the number of iterations exceeds a given bound.
SpectralRanking.IterationNumberStoppingCriterion
A stopping criterion that stops whenever the number of iterations exceeds a given bound.
SpectralRanking.NormStoppingCriterion
A stopping criterion that evaluates SpectralRanking.normDelta(), and stops if this value is smaller than a given threshold.
SpectralRanking.NormStoppingCriterion
A stopping criterion that evaluates SpectralRanking.normDelta(), and stops if this value is smaller than a given threshold.
SpectralRanking.StoppingCriterion
A strategy that decides when a computation should be stopped.
SpectralRanking.StoppingCriterion
A strategy that decides when a computation should be stopped.
StatusCategory
Accepts only fetched responses whose status category (status/100) has a certain value.
Text2DataOutput
The main method of this class converts a text file containing numbers to binary DataOutput format.
Token
Describes the input token stream.
TokenMgrError
Token Manager Error.
URLEquals
Accepts only a given URI.
URLMatchesRegex
Accepts only URIs that match a certain regular expression.
URLShorterThan
Accepts only URIs whose overall length is below a given threshold.
Util
A static container of utility methods for all LAW software.
Util
Static utility methods.
Vector
A class representing a vector of double.
WarcFilteredIterator
A class to iterate over WARC files getting only records that satisfy a given filter.
WarcHttpResponse
An AbstractHttpResponse implementation that reads the response content from a WARC record (via the WarcHttpResponse.fromWarcRecord(WarcRecord) method).
WarcRecord
A class to read/write WARC/0.9 records (for format details, please see the WARC format specifications).
WarcRecord.ContentType
Content types.
WarcRecord.FormatException
An exception to denote parsing errors during reads.
WarcRecord.Header
A class holding the fields of the WARC header.
WarcRecord.RecordType
Record types.
WeightedTau
Computes the weighted τ between two big score vectors.
WeightedTau
Computes the weighted τ between two score vectors.
WeightedTau.AbstractWeigher  
WeightedTau.AbstractWeigher
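
Example: compensated summation
The KahanSummation entry above describes Kahan's summation algorithm encapsulated in an object. The following is a minimal, self-contained sketch of that idea, not the listed class's actual API: the class name KahanSketch and its methods add() and value() are hypothetical names chosen for illustration. It keeps a running compensation term that recovers the low-order bits lost when a small addend is added to a large partial sum.

```java
// Hypothetical illustration; names are assumptions, not the library's API.
final class KahanSketch {
    private double sum;          // running total
    private double compensation; // accumulated rounding error of the total

    // Adds a value, correcting for the error left over from earlier additions.
    void add(final double v) {
        final double y = v - compensation; // apply the pending correction
        final double t = sum + y;          // low-order bits of y may be lost here
        compensation = (t - sum) - y;      // recover what was lost (with sign)
        sum = t;
    }

    // Returns the compensated sum accumulated so far.
    double value() {
        return sum;
    }

    public static void main(final String[] args) {
        final KahanSketch kahan = new KahanSketch();
        double naive = 1.0;
        kahan.add(1.0);
        // Each addend is below half an ulp of 1.0, so plain addition drops it.
        for (int i = 0; i < 1_000_000; i++) {
            kahan.add(1e-16);
            naive += 1e-16;
        }
        System.out.println("naive = " + naive);
        System.out.println("kahan = " + kahan.value());
    }
}
```

In this sketch the plain accumulator never moves away from 1.0, while the compensated one tracks the true total of roughly 1 + 1e-10. Wrapping the state in an object, as the entry describes, lets callers feed values incrementally without managing the compensation term themselves.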