All Classes
Class | Description |
---|---|
AbstractFilter<T> |
An abstract implementation of a
Filter providing a method
that helps in properly implementing Object.toString() for atomic (i.e., class-based) filters. |
AbstractHttpResponse |
An abstract implementation of
HttpResponse providing an AbstractHttpResponse.toWarcRecord(WarcRecord) method that can
be used to populate a WARC record (in order to write it). |
ArcColouringStrategy |
A colouring on the arcs.
|
AveragePrecisionCorrelation |
Computes the AP (average-precision) correlation between two score vectors without ties.
|
BFS |
Computes the visit order with respect to a breadth-first visit.
|
BFS |
Computes the visit order with respect to a breadth-first visit.
|
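A minimal sketch of what a breadth-first visit order is, assuming a graph given as an array of successor arrays. The class name `BfsOrder` and the policy of restarting from the lowest-numbered unvisited node are illustrative assumptions, not necessarily what either BFS class does:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: breadth-first visit order of a graph given as successor arrays. */
final class BfsOrder {
    /** Returns nodes in discovery order; restarts from the lowest unvisited node (an assumption). */
    static int[] visitOrder(int[][] successors) {
        final int n = successors.length;
        final boolean[] seen = new boolean[n];
        final int[] order = new int[n];
        int next = 0;
        final Deque<Integer> queue = new ArrayDeque<>();
        for (int start = 0; start < n; start++) {
            if (seen[start]) continue;
            seen[start] = true;
            queue.add(start);
            while (!queue.isEmpty()) {
                final int x = queue.poll();
                order[next++] = x; // a node's position in the visit is fixed when it leaves the queue
                for (final int y : successors[x]) {
                    if (!seen[y]) {
                        seen[y] = true; // mark on enqueue so each node enters the queue once
                        queue.add(y);
                    }
                }
            }
        }
        return order;
    }
}
```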
BinaryParser |
A universal binary parser that just computes digests.
|
BoundedCountingInputStream |
A class that decorates an
InputStream to obtain a
MeasurableInputStream. |
BURL | Deprecated.
Use BUbiNG's BURL.
|
ByteArrayCharSequence |
An adapter exposing a byte array as an ISO-8859-1-encoded
character sequence.
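A minimal sketch of such an adapter, with the hypothetical name `Latin1CharSequence`. It relies on the fact that ISO-8859-1 maps each byte value 0x00 to 0xFF to the Unicode code point with the same value, so decoding a byte is a single mask:

```java
/** Hypothetical sketch of an ISO-8859-1 view over a byte array (no copying). */
final class Latin1CharSequence implements CharSequence {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    Latin1CharSequence(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > bytes.length)
            throw new IndexOutOfBoundsException("bad offset/length");
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        // Masking with 0xFF undoes Java's sign extension of bytes; the result is the Latin-1 char.
        return (char) (bytes[offset + index] & 0xFF);
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException("bad range");
        return new Latin1CharSequence(bytes, offset + start, end - start); // shares the array
    }

    @Override public String toString() {
        final StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(charAt(i));
        return sb.toString();
    }
}
```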
|
CompressedIntLabel |
An integer label that uses a coder/decoder pair depending on the source node.
|
CompressWarc |
A tool to compress a WARC file.
|
ConsistentHashFunction<T extends Comparable<? super T>> |
Provides an implementation of consistent hashing.
|
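A common way to implement consistent hashing is a sorted ring of hashed bucket replicas: a key is assigned to the first replica at or after its own position. The sketch below uses a `TreeMap` and MD5-derived positions, both assumptions for illustration; the class `HashRing` is hypothetical, and `ConsistentHashFunction` may hash and place replicas differently:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of consistent hashing with virtual replicas on a ring. */
final class HashRing<T> {
    private final TreeMap<Long, T> ring = new TreeMap<>();
    private final int replicas;

    HashRing(int replicas) { this.replicas = replicas; }

    /** 64-bit ring position taken from the first 8 bytes of an MD5 digest. */
    private static long position(String key) {
        try {
            final byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 must be available on every Java platform
        }
    }

    void add(T bucket) {
        for (int r = 0; r < replicas; r++) ring.put(position(bucket + "#" + r), bucket);
    }

    void remove(T bucket) {
        // Remove only entries still owned by this bucket.
        for (int r = 0; r < replicas; r++) ring.remove(position(bucket + "#" + r), bucket);
    }

    /** Bucket owning the first replica at or after the key's position, wrapping around the ring. */
    T get(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no buckets");
        Map.Entry<Long, T> e = ring.ceilingEntry(position(key));
        if (e == null) e = ring.firstEntry();
        return e.getValue();
    }
}
```

The property that makes this scheme useful is that removing a bucket remaps only the keys that were assigned to it.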
ConsistentHashFunction.SkipStrategy<T> |
Allows skipping suitable items when searching for the closest replica.
|
ContentTypeStartsWith |
Accepts only fetched responses whose content type starts with a given string.
|
CorrelationIndex |
An abstract class providing basic infrastructure for all classes computing some correlation index
between two big score vectors, such as
KendallTau. |
CorrelationIndex |
An abstract class providing basic infrastructure for all classes computing some correlation index between two score vectors,
such as
KendallTau, WeightedTau and AveragePrecisionCorrelation. |
CosineSimilarityStrategy |
A class that computes the similarity between patterns using cosine similarity.
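For reference, the cosine similarity of two patterns a and b is a·b / (‖a‖ ‖b‖). A minimal sketch, assuming patterns are equal-length `double[]` arrays; the class `Cosine` is hypothetical and does not reproduce the `SimilarityStrategy` API:

```java
/** Hypothetical sketch: cosine similarity between two equal-length pattern vectors. */
final class Cosine {
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; // inner product
            na += a[i] * a[i];  // squared norm of a
            nb += b[i] * b[i];  // squared norm of b
        }
        if (na == 0 || nb == 0) throw new IllegalArgumentException("zero vector");
        return dot / (Math.sqrt(na) * Math.sqrt(nb)); // in [-1, 1]
    }
}
```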
|
CRC64 |
Provides static methods to compute 64-bit CRCs of strings and byte arrays.
|
CutWarc |
A class to extract specific records from a WARC file.
|
DataInput2Text |
The main method of this class converts a binary
DataOutput file containing numbers to text format. |
DataInput2Text.Type | |
DenseVector |
A mutable implementation of
Vector optimized for dense vectors. |
DFS | Deprecated.
This class performs a stack-based visit, but technically not a DFS.
|
DigestBasedDuplicateDetection |
Allows determining whether an
HttpResponse is a duplicate. |
DigestEquals |
Accepts only records of given digest, specified as a hexadecimal string.
|
Digester |
A callback computing the digest of a page.
|
DominantEigenvectorParallelPowerMethod |
Computes the left dominant eigenvalue and eigenvector of a graph using a parallel implementation of the power method.
|
DuplicateSegmentsLessThan |
Accepts only URIs whose path does not contain too many duplicate segments.
|
EuclideanSimilarityStrategy |
A class that computes the similarity between patterns using the Euclidean distance.
|
ExchangeCounter |
Computes the number of discordances between two big score vectors using Knight's
O(n log n) MergeSort-based algorithm.
|
ExchangeCounter |
Computes the number of discordances between two score vectors using Knight's
O(n log n) MergeSort-based algorithm.
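Knight's algorithm sorts the items by the first score and then counts, with a merge sort, how many exchanges are needed to sort the second score; each exchange corresponds to one discordant pair. A minimal sketch for vectors without ties, also deriving Kendall's τ = 1 − 2D / (n(n−1)/2) from the count D. The class `Discordances` is hypothetical, and tie handling, which the real classes may provide, is omitted:

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of Knight's O(n log n) discordance count; assumes no ties in x or y. */
final class Discordances {
    static long count(double[] x, double[] y) {
        final int n = x.length;
        // Order indices by x; discordances become inversions of y in that order.
        final Integer[] byX = new Integer[n];
        for (int i = 0; i < n; i++) byX[i] = i;
        Arrays.sort(byX, Comparator.comparingDouble(i -> x[i]));
        final double[] seq = new double[n];
        for (int i = 0; i < n; i++) seq[i] = y[byX[i]];
        return sortAndCount(seq, new double[n], 0, n);
    }

    /** Merge sort on a[lo, hi) returning the number of inversions (exchanges). */
    private static long sortAndCount(double[] a, double[] tmp, int lo, int hi) {
        if (hi - lo < 2) return 0;
        final int mid = (lo + hi) >>> 1;
        long inversions = sortAndCount(a, tmp, lo, mid) + sortAndCount(a, tmp, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            if (a[i] <= a[j]) tmp[k++] = a[i++];
            else {
                inversions += mid - i; // a[j] jumps ahead of every remaining left element
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
        return inversions;
    }

    /** Kendall's tau without ties: (concordant - discordant) / (n choose 2). */
    static double kendallTau(double[] x, double[] y) {
        final long n = x.length;
        final long pairs = n * (n - 1) / 2;
        return 1.0 - 2.0 * count(x, y) / pairs;
    }
}
```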
|
ExchangeWeigher |
Computes the weight of discordances using a generalisation of Knight's algorithm.
|
ExchangeWeigher |
Computes the weight of discordances using a generalisation of Knight's algorithm.
|
ExtractDigestUrls |
A tool to extract digests and URLs from response records of a WARC file.
|
ExtractLinks |
Extracts links from a WARC file.
|
Filter<T> |
A filter is a strategy to decide whether to accept a given
object or not.
|
FilterParser<T> |
A simple parser that transforms a filter expression into a filter.
|
FilterParserConstants |
Token literal values and constants.
|
FilterParserTokenManager |
Token Manager.
|
Filters |
A collection of static methods to deal with
filters. |
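A minimal sketch of the filter idea: a single `accept` method plus static combinators, with two URI filters in the spirit of `HostEndsWith` and `SchemeEquals`. All names here (`SimpleFilter`, `and`, `or`, `not`, `hostEndsWith`, `schemeEquals`) are hypothetical and do not reproduce the actual `Filter`/`Filters` API:

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical filter interface: a strategy deciding whether to accept an object. */
interface SimpleFilter<T> {
    boolean accept(T x);

    static <T> SimpleFilter<T> and(SimpleFilter<T> a, SimpleFilter<T> b) { return x -> a.accept(x) && b.accept(x); }

    static <T> SimpleFilter<T> or(SimpleFilter<T> a, SimpleFilter<T> b) { return x -> a.accept(x) || b.accept(x); }

    static <T> SimpleFilter<T> not(SimpleFilter<T> a) { return x -> !a.accept(x); }

    /** Accepts URIs whose host ends, case-insensitively, with the given suffix. */
    static SimpleFilter<URI> hostEndsWith(String suffix) {
        final String s = suffix.toLowerCase(Locale.ROOT);
        return u -> u.getHost() != null && u.getHost().toLowerCase(Locale.ROOT).endsWith(s);
    }

    /** Accepts URIs whose scheme equals the given string. */
    static SimpleFilter<URI> schemeEquals(String scheme) {
        return u -> scheme.equals(u.getScheme());
    }
}
```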
GrepWarc |
A "grep" for WARC files.
|
GZWarcRecord | |
GZWarcRecord.GZHeader |
A class to hold the fields contained in the gzip header.
|
GZWarcStats |
A tool to compute some statistics about a gzipped WARC file.
|
HostEndsWith |
Accepts only URIs whose host ends with (case-insensitively) a certain suffix.
|
HostEquals |
Accepts only URIs whose host equals (case-insensitively) a certain string.
|
HTMLParser |
An HTML parser with additional responsibilities (such as guessing the character encoding
and resolving relative URLs).
|
HTMLParser.SetLinkReceiver | |
HttpComponentsHttpResponse |
A concrete subclass of
AbstractHttpResponse that implements
missing methods by wrapping an Apache HttpComponents HttpResponse. |
HttpComponentsHttpResponse.HttpResponseHeaderMap |
A wrapper class exposing headers in
HttpResponse.headers()
format by delegating to an HttpResponse. |
HttpResponse |
Provides high-level access to WARC records with
record-type equal to
response and content-type equal to HTTP
(or HTTPS). |
HttpResponseFilteredIterator |
A class to iterate over WARC files getting only records corresponding to
HttpResponse that satisfy a given filter. |
ImmutableSparseVector |
An immutable implementation of
Vector optimized for sparse vectors. |
IndexWarc |
A tool to index a WARC file.
|
InspectableBufferedInputStream |
An input stream that wraps an underlying input stream to make it
rewindable and partially inspectable, using a bounded-capacity memory buffer and an overflow file.
|
InspectableBufferedInputStream.State |
The possible states of this stream, as explained in the class documentation.
|
Int2DoubleMapVector |
A mutable implementation of
Vector for sparse vectors. |
IntegerExchangeCounter |
Computes the number of discordances between two big integer score vectors
using Knight's O(n log n)
MergeSort-based algorithm.
|
IsHttpResponse |
Accepts only records that are http/https responses.
|
IsProbablyBinary |
Accepts only http responses whose content stream appears to be binary.
|
KahanSummation |
Kahan's
summation algorithm encapsulated in an object.
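Kahan's algorithm keeps a running compensation term holding the low-order bits lost by each floating-point addition and feeds them back into the next one. A minimal sketch (the class `Kahan` is hypothetical):

```java
/** Hypothetical sketch of Kahan's compensated summation wrapped in an object. */
final class Kahan {
    private double sum;
    private double compensation; // negated low-order part lost by earlier additions

    void add(double x) {
        final double y = x - compensation; // re-inject what was lost last time
        final double t = sum + y;          // low-order bits of y may be lost here
        compensation = (t - sum) - y;      // (t - sum) is what actually got added; minus y is the loss
        sum = t;
    }

    double value() { return sum; }
}
```

Adding 1e-16 to 1.0 a thousand times leaves a naive `double` sum at exactly 1.0, while the compensated sum ends up close to 1 + 1e-13.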
|
KatzParallelGaussSeidel |
Computes Katz's index using a parallel implementation of the Gauß–Seidel method; this is the implementation of choice to be used when computing Katz's index.
|
KendallAssortativity |
Computes Kendall's assortativities between the list of degrees of sources and targets of
arcs of a graph.
|
KendallTau |
Computes Kendall's τ between two score big vectors.
|
KendallTau |
Computes Kendall's τ between two score vectors.
|
LayeredLabelPropagation |
A big implementation of the layered label propagation algorithm described by Paolo
Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered
label propagation: A multiresolution coordinate-free ordering for compressing social
networks”, Proceedings of the 20th International Conference on World Wide Web, pages
587–596, ACM, 2011.
|
LayeredLabelPropagation |
An implementation of the layered label propagation algorithm described by
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in “Layered label propagation:
A multiresolution coordinate-free ordering for compressing social networks”,
Proceedings of the 20th International Conference on World Wide Web, pages 587–596, ACM, 2011.
|
LeftSingularVectorParallelPowerMethod |
Computes the left singular vector of a graph using a parallel implementation of the power method.
|
ListGZWarcComments |
A tool to list the GZip header comments contained in a compressed WARC file.
|
MeasurableSequenceInputStream |
A
MeasurableInputStream version of a SequenceInputStream. |
MetadataHttpResponse |
An abstract extension of
AbstractHttpResponse which additionally provides support
for getting and setting metadata (i.e., MetadataHttpResponse.uri(), MetadataHttpResponse.statusLine(), MetadataHttpResponse.status() and MetadataHttpResponse.headers()). |
MetadataHttpResponse.HeaderMap |
A special map used for headers: keys are case-insensitive, and multiple puts are converted into comma-separated values.
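A minimal sketch of the described behavior, using a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER` and `Map.merge` to join repeated values. The class name `HeaderMapSketch` and the `", "` separator are assumptions:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: case-insensitive header names, repeated puts joined with commas. */
final class HeaderMapSketch {
    // The comparator makes "Accept" and "ACCEPT" the same key; the first spelling is kept.
    private final Map<String, String> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Adds a header; an existing value for the same name (in any case) gets ", " + value appended. */
    void put(String name, String value) {
        map.merge(name, value, (oldValue, newValue) -> oldValue + ", " + newValue);
    }

    String get(String name) { return map.get(name); }

    int size() { return map.size(); }
}
```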
|
MinimumBase |
Static methods to compute the minimum fibration base of a given graph.
|
MutableHttpResponse |
A mutable extension of
MetadataHttpResponse that provides
support for setting the content stream. |
NodeColouringStrategy |
A colouring on the nodes.
|
Norm |
An
Enum providing different ℓ norms. |
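A minimal sketch of an enum whose constants compute the ℓ1 and ℓ2 norms. The constant names mirror `Norm.L_1` and `Norm.L_2` referenced in the deprecation notes, but `NormSketch` and its `compute` methods are hypothetical:

```java
/** Hypothetical sketch of an enum of norms with constant-specific implementations. */
enum NormSketch {
    L_1 {
        @Override double compute(double[] v) {
            double s = 0;
            for (final double x : v) s += Math.abs(x); // sum of absolute values
            return s;
        }
    },
    L_2 {
        @Override double compute(double[] v) {
            double s = 0;
            for (final double x : v) s += x * x; // sum of squares
            return Math.sqrt(s);
        }
    };

    abstract double compute(double[] v);

    /** Norm of the difference v - w, e.g. to measure the change between two iterates. */
    double compute(double[] v, double[] w) {
        final double[] d = new double[v.length];
        for (int i = 0; i < v.length; i++) d[i] = v[i] - w[i];
        return compute(d);
    }
}
```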
NormL1 | Deprecated.
Use
Norm.L_1. |
NormL2 | Deprecated.
Use
Norm.L_2. |
NumberDistinctLines |
The main method of this class reads a UTF-8 file containing a newline-separated
list of strings and writes a
DataOutputStream containing a
list of ints such that the i-th int is equal to the j-th
int iff the (crc of the) i-th
string is equal to the (crc of
the) j-th string. |
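A minimal sketch of the numbering step: each line is hashed, the first occurrence of a hash gets the next unused int, and later occurrences reuse it. `CRC32` from `java.util.zip` stands in for the CRC actually used (which this description does not specify), and the class `NumberLines` is hypothetical:

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical sketch: equal (CRCs of) lines receive equal ints. */
final class NumberLines {
    /** Writes one int per input line; returns the number of distinct CRCs seen. */
    static int number(Reader in, OutputStream out) throws IOException {
        final BufferedReader reader = new BufferedReader(in);
        final DataOutputStream data = new DataOutputStream(out);
        final Map<Long, Integer> ids = new HashMap<>();
        final CRC32 crc = new CRC32(); // stand-in for the unspecified CRC
        for (String line; (line = reader.readLine()) != null; ) {
            crc.reset();
            crc.update(line.getBytes(StandardCharsets.UTF_8));
            final long key = crc.getValue();
            Integer id = ids.get(key);
            if (id == null) { // first time this CRC appears: assign the next id
                id = ids.size();
                ids.put(key, id);
            }
            data.writeInt(id);
        }
        data.flush();
        return ids.size();
    }
}
```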
PageRank |
A big version of
PageRank. |
PageRank |
An abstract class defining methods and attributes supporting PageRank computations.
|
PageRankFromCoefficients |
Computes PageRank using its power series.
|
PageRankGaussSeidel |
Computes PageRank of a graph using the Gauß–Seidel method.
|
PageRankParallelGaussSeidel |
A big version of
PageRankParallelGaussSeidel. |
PageRankParallelGaussSeidel |
Computes PageRank using a parallel (multicore) implementation of the Gauß–Seidel method.
|
PageRankParallelPowerSeries |
Computes PageRank using a parallel (multicore) implementation of the power-series method, which runs
the power method starting from the preference vector, thus evaluating the truncated PageRank power series (see
PageRankPowerSeries). |
PageRankPowerSeries |
Computes PageRank (and possibly its derivatives in the damping factor) using its power series.
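PageRank with damping factor α and preference vector v can be written as the series (1 − α) Σ_{k≥0} α^k v P^k, where P is the row-normalized transition matrix. A minimal sequential sketch that sums terms until the ℓ1 norm of the next one falls below a threshold, assuming a graph without dangling nodes. The class `PageRankSeries` is hypothetical; derivatives and parallelism are omitted:

```java
/** Hypothetical sketch: PageRank as the truncated series (1 - alpha) * sum_k alpha^k v P^k. */
final class PageRankSeries {
    /**
     * @param successors adjacency lists; assumed to contain no dangling nodes (no empty list)
     * @param alpha damping factor in [0, 1)
     * @param v preference vector (non-negative, summing to 1)
     * @param epsilon positive threshold; terms whose l1 norm is below it are dropped
     */
    static double[] compute(int[][] successors, double alpha, double[] v, double epsilon) {
        final int n = successors.length;
        final double[] rank = new double[n];
        double[] term = new double[n];
        for (int i = 0; i < n; i++) term[i] = (1 - alpha) * v[i]; // k = 0 term
        while (l1(term) >= epsilon) {
            for (int i = 0; i < n; i++) rank[i] += term[i];
            // next term = alpha * term * P: each node splits its mass evenly among successors
            final double[] next = new double[n];
            for (int x = 0; x < n; x++) {
                final double share = alpha * term[x] / successors[x].length;
                for (final int y : successors[x]) next[y] += share;
            }
            term = next;
        }
        return rank;
    }

    private static double l1(double[] a) {
        double s = 0;
        for (final double t : a) s += Math.abs(t);
        return s;
    }
}
```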
|
PageRankPush |
Computes strongly preferential PageRank for a preference vector concentrated on a node using the push algorithm.
|
PageRankPush.EmptyQueueStoppingCritertion | |
PageRankPush.IntHeapIndirectPriorityQueue | |
PageRankPush.L1NormStoppingCritertion | |
ParseException |
This exception is thrown when parse errors are encountered.
|
Parser |
A generic parser for
responses. |
Parser.LinkReceiver |
A class that can receive URLs discovered during parsing.
|
PartwiseMinimumBase |
Static methods to compute the minimum fibration base of a given graph using a partwise algorithm.
|
PathEndsWithOneOf |
Accepts only URIs whose path ends (case-insensitively) with one of a given set of suffixes.
|
PearsonAssortativity |
Computes Pearson assortativities between the list of degrees of sources and targets of
arcs of a graph.
|
PowerSeries |
Computes a power series on a graph using a parallel implementation.
|
Precision |
A set of convenience methods to manipulate the precision of doubles.
|
PreProcessedMinimumBase |
Static methods to compute the minimum opfibration base of a given graph.
|
RemappedStringMap | |
RemoveHubs |
Removes nodes from a graph following a number of strategies.
|
Response |
Provides high-level access to WARC records with
record-type equal to
response. |
Salsa |
Computes the SALSA score using a non-iterative method.
|
Salsa.UnionFind | |
SchemeEquals |
Accepts only URIs whose scheme equals a certain string (typically,
http). |
SequentialHttpResponseRead | |
SequentialHttpResponseWrite | |
SequentialWarcRecordRead | |
SequentialWarcRecordWrite | |
SimilarityStrategy |
An interface specifying methods used to obtain pattern similarities.
|
SimpleCharStream |
An implementation of interface CharStream, where the stream is assumed to
contain only ASCII characters (without Unicode processing).
|
SpectralRanking |
A big version of
SpectralRanking. |
SpectralRanking |
A base abstract class defining methods and attributes supporting computations
of graph spectral rankings such as the dominant eigenvector,
PageRank or Katz's index.
|
SpectralRanking.IterationNumberStoppingCriterion |
A stopping criterion that stops whenever the number of iterations exceeds a given bound.
|
SpectralRanking.IterationNumberStoppingCriterion |
A stopping criterion that stops whenever the number of iterations exceeds a given bound.
|
SpectralRanking.NormStoppingCriterion |
A stopping criterion that evaluates
SpectralRanking.normDelta(), and stops
if this value is smaller than a given threshold. |
SpectralRanking.NormStoppingCriterion |
A stopping criterion that evaluates
SpectralRanking.normDelta(), and stops
if this value is smaller than a given threshold. |
SpectralRanking.StoppingCriterion |
A strategy that decides when a computation should be stopped.
|
SpectralRanking.StoppingCriterion |
A strategy that decides when a computation should be stopped.
|
StatusCategory |
Accepts only fetched responses whose status category (status/100) has a certain value.
|
Text2DataOutput |
The main method of this class converts a text file containing numbers to binary
DataOutput format. |
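A minimal sketch of the conversion, assuming whitespace-separated `int` values only; the real tool may support other numeric types, and `TextToData` is hypothetical. `DataOutputStream.writeInt` writes four bytes, high byte first, as the `DataOutput` contract specifies:

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;

/** Hypothetical sketch: whitespace-separated integers in text to binary DataOutput format. */
final class TextToData {
    /** Returns the number of integers written. */
    static int convert(Reader in, OutputStream out) throws IOException {
        final BufferedReader reader = new BufferedReader(in);
        final DataOutputStream data = new DataOutputStream(out);
        int count = 0;
        for (String line; (line = reader.readLine()) != null; ) {
            for (final String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) continue; // a blank line splits into one empty token
                data.writeInt(Integer.parseInt(token));
                count++;
            }
        }
        data.flush();
        return count;
    }
}
```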
Token |
Describes the input token stream.
|
TokenMgrError |
Token Manager Error.
|
URLEquals |
Accepts only a given URI.
|
URLMatchesRegex |
Accepts only URIs that match a certain regular expression.
|
URLShorterThan |
Accepts only URIs whose overall length is below a given threshold.
|
Util |
A static container of utility methods for all LAW software.
|
Util |
Static utility methods.
|
Vector |
A class representing a vector of
double. |
WarcFilteredIterator |
A class to iterate over WARC files getting only records that satisfy a given filter.
|
WarcHttpResponse |
An
AbstractHttpResponse implementation that reads the response
content from a WARC record (via the WarcHttpResponse.fromWarcRecord(WarcRecord) method). |
WarcRecord |
A class to read/write WARC/0.9 records (for format details, please see the WARC format specifications).
|
WarcRecord.ContentType |
Content types.
|
WarcRecord.FormatException |
An exception to denote parsing errors during reads.
|
WarcRecord.Header |
A class to hold the fields contained in the WARC
header. |
WarcRecord.RecordType |
Record types.
|
WeightedTau |
Computes the weighted τ between two big score vectors.
|
WeightedTau |
Computes the weighted τ between two score vectors.
|
WeightedTau.AbstractWeigher | |
WeightedTau.AbstractWeigher |