
A

abort() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
Calls FetchData.abort() on the FetchData used by this thread (and hence closes the corresponding connection).
abort() - Method in class it.unimi.di.law.bubing.util.FetchData
Invokes AbstractExecutionAwareRequest.abort() on the underlying request.
AbstractFilter<T> - Class in it.unimi.di.law.warc.filters
An abstract implementation of a Filter providing a method that helps implement Object.toString() properly for atomic (i.e., class-based) filters.
AbstractFilter() - Constructor for class it.unimi.di.law.warc.filters.AbstractFilter
 
AbstractSieve<K,V> - Class in it.unimi.di.law.bubing.sieve
A map-like structure that handles (key,value) pairs of generic type.
AbstractSieve(ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve
Creates a new sieve with the given data.
AbstractSieve.DefaultUpdateStrategy<K,V> - Class in it.unimi.di.law.bubing.sieve
 
AbstractSieve.DiskNewFlow<T> - Class in it.unimi.di.law.bubing.sieve
AbstractSieve.NewFlowReceiver<K> - Interface in it.unimi.di.law.bubing.sieve
An object that can receive a new flow of hash/key pairs and that acts as a listener for the AbstractSieve.
AbstractSieve.SieveEntry<K,V> - Class in it.unimi.di.law.bubing.sieve
A (key,value) pair.
AbstractSieve.UpdateStrategy<K,V> - Interface in it.unimi.di.law.bubing.sieve
An update strategy: it determines how a stored value should be updated in the presence of duplicate keys.
AbstractWarcReader - Class in it.unimi.di.law.warc.io
 
AbstractWarcReader() - Constructor for class it.unimi.di.law.warc.io.AbstractWarcReader
 
AbstractWarcRecord - Class in it.unimi.di.law.warc.records
An abstract implementation of a basic WarcRecord.
AbstractWarcRecord(URI, HeaderGroup) - Constructor for class it.unimi.di.law.warc.records.AbstractWarcRecord
Builds a record, optionally given the target URI and the warcHeaders.
AbstractWarcRecord(HeaderGroup) - Constructor for class it.unimi.di.law.warc.records.AbstractWarcRecord
Builds a record, optionally given the warcHeaders.
acceptAllCertificates - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
acceptAllCertificates - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Whether to accept all SSL certificates, or self-signed only.
acquire() - Method in class it.unimi.di.law.bubing.frontier.Workbench
Acquires a visit state for a scheme+authority accessible by politeness.
acquired - Variable in class it.unimi.di.law.bubing.frontier.VisitState
Whether this visit state is currently acquired.
acquired - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Whether this entry has been acquired.
adaptFilterHttpResponse2FetchData(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with HttpResponse base type to a filter with FetchData base type.
adaptFilterHttpResponse2URIResponse(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with HttpResponse base type to a filter with URIResponse base type.
adaptFilterHttpResponse2WarcRecord(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with HttpResponse base type to a filter with WarcRecord base type.
adaptFilterString2URI(Filter<String>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with String base type to a filter with URI base type.
adaptFilterURI2FetchData(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with URI base type to a filter with FetchData base type.
adaptFilterURI2HttpResponseWarcRecord(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with URI base type to a filter with HttpResponseWarcRecord base type.
adaptFilterURI2Link(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with URI base type to a filter with Link base type, applying the original filter to the target URI.
adaptFilterURI2URIResponse(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with URI base type to a filter with URIResponse base type.
adaptFilterURI2WarcRecord(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
Adapts a filter with URI base type to a filter with WarcRecord base type.
add(byte[]) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
 
add(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
 
add(double) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Adds a value to the stream.
add(long, long) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache.Stripe
 
add(VisitState) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Adds a visit state to the set, if necessary.
add(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Adds the given visit state to the visit-state queue.
add(VisitState, Workbench) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Adds the given visit state to the visit-state queue, and adds this entry to the workbench if it was empty and not WorkbenchEntry.acquired.
add(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.Workbench
Adds a nonempty, not acquired workbench entry to the workbench.
add(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Adds a workbench entry to the set, if necessary.
add(ParallelFilteredProcessorRunner.Processor<T>, ParallelFilteredProcessorRunner.Writer<? super T>, PrintStream) - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
add(ByteArrayList) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
 
add(T) - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
 
addAll(double[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Adds values to the stream.
addAll(DoubleList) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Adds values to the stream.
addBlackListedHost(String) - Method in class it.unimi.di.law.bubing.Agent
 
addBlackListedHost(String) - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
Adds a new host (or a set of new hosts) to the black list; the argument is either a host specified directly or a file (prefixed by file:).
addBlackListedIPv4(String) - Method in class it.unimi.di.law.bubing.Agent
 
addBlackListedIPv4(String) - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
Adds a new IPv4 address (or a set of new IPv4 addresses) to the black list; the argument is either an address specified directly or a file (prefixed by file:).
addEscapes(String) - Static method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
Replaces unprintable characters by their escaped (or Unicode-escaped) equivalents in the given string.
addIfNotPresent(HeaderGroup, WarcHeader.Name, String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Adds the given header, if not present (otherwise does nothing).
address2WorkbenchEntry - Variable in class it.unimi.di.law.bubing.frontier.Workbench
The set of workbench entries.
addTo(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Adds a value to the counter associated with a given key.
addTo(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
addTo(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Adds a value to the counter associated with a given key.
addTo(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
addTo(byte[], int, int, long, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
 
adjustBeginLineColumn(int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Method to adjust line and column numbers for the start of a token.
agent - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The agent that created this frontier.
Agent - Class in it.unimi.di.law.bubing
A BUbiNG agent.
Agent(String, int, RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.Agent
 
allocated - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The overall number of bytes allocated (a multiple of ByteArrayDiskQueues.logFileSize).
and() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
 
and(Filter<T>...) - Static method in class it.unimi.di.law.warc.filters.Filters
Produces the conjunction of the given filters.
AND - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
append(char) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
append(char) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
append(long, ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
 
append(long, K) - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
A new key is appended.
append(long, T) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
append(CharSequence) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
append(CharSequence) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
append(CharSequence, int, int) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
append(CharSequence, int, int) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
appendPointer - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The current pointer at which new elements can be appended.
apply(char[][], URI) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
Checks whether a specified URL passes a specified robots filter.
apply(Link) - Method in class it.unimi.di.law.warc.filters.SameHost
Apply the filter to a given link, returning true if source and target have the same host.
apply(URIResponse) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
apply(URIResponse) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
apply(WarcRecord) - Method in class it.unimi.di.law.warc.filters.DigestEquals
Apply the filter to a given WarcRecord.
apply(WarcRecord) - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
Apply the filter to a WarcRecord.
apply(URI) - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEndsWith
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEquals
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.SchemeEquals
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.URLEquals
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
Apply the filter to a given URI.
apply(URI) - Method in class it.unimi.di.law.warc.filters.URLShorterThan
Apply the filter to the given URI.
apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
Apply the filter to an HttpResponse.
apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
This method implements a simple heuristic for guessing whether a page is binary.
apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.ResponseMatches
Checks whether the response associated with this page matches (in ISO-8859-1 encoding) the regular expression provided at construction time.
apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.StatusCategory
Apply the filter to a given HttpResponse.
approximatedSize() - Method in class it.unimi.di.law.bubing.frontier.Workbench
Returns an approximation of the workbench size (in number of entries present on the workbench).
archetypes() - Method in class it.unimi.di.law.bubing.frontier.Frontier
The number of pages stored (does not include duplicates).
ARCHETYPES1XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
ARCHETYPES2XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
ARCHETYPES3XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
ARCHETYPES4XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
ARCHETYPES5XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
ARCHETYPESOTHERS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
archetypesStatus - Variable in class it.unimi.di.law.bubing.frontier.Frontier
In position i, with 0 < i < 6, the number of pages stored (does not include duplicates) having status ixx.
ARGS - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
atom() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
 
averageSpeed - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The average speeds of all visit states.
AVERAGESPEED - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 

B

backup(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Backup a number of characters.
BAD_CHAR - Static variable in class it.unimi.di.law.bubing.util.BURL
A list of bad characters.
BAD_CHAR_SUBSTITUTE - Static variable in class it.unimi.di.law.bubing.util.BURL
Substitutes for bad characters.
BasicHttpClientConnectionManagerWithAlternateDNS(DnsResolver) - Constructor for class it.unimi.di.law.bubing.frontier.FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS
 
beginColumn - Variable in class it.unimi.di.law.warc.filters.parser.Token
The column number of the first character of this Token.
beginLine - Variable in class it.unimi.di.law.warc.filters.parser.Token
The line number of the first character of this Token.
BeginToken() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Start.
BINARY_CHECK_SCAN_LENGTH - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
 
binaryParser - Variable in class it.unimi.di.law.bubing.util.FetchData
The binary parser associated with this fetched response.
BinaryParser - Class in it.unimi.di.law.bubing.parser
A universal binary parser that just computes digests.
BinaryParser(HashFunction) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
Builds a parser for digesting a page.
BinaryParser(HashFunction, boolean) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
 
BinaryParser(String) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
Builds a parser for digesting a page.
blackListedHostHashes - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
The set of hashes of hosts that should be blacklisted.
blackListedHostHashesLock - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
blackListedHosts - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A host that should be blacklisted (i.e., not crawled).
blackListedIPv4Addresses - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
blackListedIPv4Addresses - Variable in class it.unimi.di.law.bubing.StartupConfiguration
An IPv4 address that should be blacklisted (i.e., not crawled).
blackListedIPv4Lock - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
bloomFilterPrecision - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
bloomFilterPrecision - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The precision of the Bloom filter used for duplicate detection (usually, at least 1/StartupConfiguration.maxUrls).
BoundSessionInputBuffer - Class in it.unimi.di.law.warc.util
A SessionInputBuffer implementation that bounds a SessionInputBuffer (and hence its buffered stream) so that no more than a specified number of bytes will be read (from its stream), and keeps track of the number of read bytes.
BoundSessionInputBuffer(SessionInputBuffer, long) - Constructor for class it.unimi.di.law.warc.util.BoundSessionInputBuffer
Creates a new SessionInputBuffer bounded to a given maximum length.
broken - Variable in class it.unimi.di.law.bubing.frontier.Workbench
The number of entirely broken entries (i.e., entries containing only broken visit states).
brokenPathQueryCount - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
The number of path+queries living in a broken visit state.
brokenVisitStates - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The number of broken visit states.
BROKENVISITSTATES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
brokenVisitStatesOnWorkbench - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
The number of broken visit states on the workbench.
BUBING_GUESSED_CHARSET - it.unimi.di.law.warc.records.WarcHeader.Name
 
BUBING_IS_DUPLICATE - it.unimi.di.law.warc.records.WarcHeader.Name
 
BubingJob - Class in it.unimi.di.law.bubing.util
The JAI4J Job used by BUbiNG.
BubingJob(ByteArrayList) - Constructor for class it.unimi.di.law.bubing.util.BubingJob
Creates a new BUbiNG job corresponding to a given BUbiNG URL.
bufcolumn - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
buffer - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
The character buffer.
buffer - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
buffer() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Returns the current buffer of this byte-array disk queue.
BufferedHttpEntityFactory - Class in it.unimi.di.law.warc.util
An implementation of an HttpEntityFactory that returns a BufferedHttpEntity.
buffers - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
For each log-file index, the associated ByteBuffer.
bufline - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
bufpos - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Position in buffer.
BuildRepetitionSet - Class in it.unimi.di.law.bubing.tool
Builds and saves the repetition set of a crawl.
BuildRepetitionSet() - Constructor for class it.unimi.di.law.bubing.tool.BuildRepetitionSet
 
BURL - Class in it.unimi.di.law.bubing.util
Static methods to manipulate normalized, canonical URLs in BUbiNG.
BYTE_ARRAY - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
A serializer-deserializer for byte arrays that writes the array length using variable-length byte encoding, and then writes the content of the array.
BYTE_ARRAY_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
A hash function using MurmurHash3.
BYTE_ARRAY_LIST_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
A hash function using MurmurHash3.
ByteArrayCharSequence - Class in it.unimi.di.law.bubing.util
An adapter exposing a byte array as an ISO-8859-1-encoded character sequence.
ByteArrayCharSequence() - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
Creates a new empty byte-array character sequence.
ByteArrayCharSequence(byte[]) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
Creates a new byte-array character sequence using the provided byte array.
ByteArrayCharSequence(byte[], int, int) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
Creates a new byte-array character sequence using the provided byte-array fragment.
ByteArrayDiskQueue - Class in it.unimi.di.law.bubing.util
A queue of byte arrays partially stored on disk.
ByteArrayDiskQueue(ByteDiskQueue) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
 
ByteArrayDiskQueues - Class in it.unimi.di.law.bubing.util
A set of memory-mapped queues of byte arrays.
ByteArrayDiskQueues(File) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Creates a set of byte-array disk queues in the given directory using log files of size 2^26.
ByteArrayDiskQueues(File, int) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Creates a set of byte-array disk queues in the given directory using the specified file size.
ByteArrayDiskQueues.QueueData - Class in it.unimi.di.law.bubing.util
Metadata associated with a queue.
ByteArrayListByteSerializerDeserializer - Class in it.unimi.di.law.bubing.sieve
ByteArrayListByteSerializerDeserializer() - Constructor for class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
 
ByteArraySessionOutputBuffer - Class in it.unimi.di.law.warc.util
A SessionOutputBuffer implementation that uses a byte array as a backing store.
ByteArraySessionOutputBuffer() - Constructor for class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
ByteSerializerDeserializer<V> - Interface in it.unimi.di.law.bubing.sieve
A light form of serialization based on InputStream/OutputStream (and FastBufferedInputStream for fast skipping).
ByteWriter - Class in it.unimi.di.law.warc.processors
A writer that simply dumps an array of bytes to the output stream.
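
The ByteArrayCharSequence entries above describe an adapter that exposes a byte array (or a fragment of one) as an ISO-8859-1-encoded character sequence. The following is a minimal sketch of that idea, not the BUbiNG class itself: the class name Latin1ViewSketch and its constructors are hypothetical. It relies only on the fact that ISO-8859-1 maps each byte value 0..255 to the character with the same code point, so no decoding buffer is needed.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ViewSketch implements CharSequence {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    /** Wraps a fragment of a byte array without copying it. */
    Latin1ViewSketch(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || length > bytes.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    /** Wraps a whole byte array without copying it. */
    Latin1ViewSketch(byte[] bytes) {
        this(bytes, 0, bytes.length);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        // Mask to treat the byte as unsigned: 0xE9 becomes U+00E9, not a negative value.
        return (char) (bytes[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new Latin1ViewSketch(bytes, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };
        Latin1ViewSketch view = new Latin1ViewSketch(raw);
        System.out.println(view.length());            // 4
        System.out.println((int) view.charAt(3));     // 233
        System.out.println(view.subSequence(0, 3));   // caf
    }
}
```

Because the view implements CharSequence, it can be passed directly to APIs such as java.util.regex.Pattern.matcher(CharSequence) without first building a String.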

C

cache() - Method in class it.unimi.di.law.warc.io.CompressedWarcCachingReader
 
cache() - Method in interface it.unimi.di.law.warc.io.WarcCachingReader
 
cachedContent - Variable in class it.unimi.di.law.warc.util.FastByteArrayInputStreamHttpEntityFactory
 
CapriciousPrintWriter(Writer, long, long, XoRoShiRo128PlusRandom) - Constructor for class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
 
CatEFGraphs - Class in it.unimi.di.law.bubing.tool
Concatenates Elias–Fano graphs.
CatEFGraphs() - Constructor for class it.unimi.di.law.bubing.tool.CatEFGraphs
 
CHAR_BUFFER_SIZE - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
The size of the internal Jericho buffer.
CHAR_SEQUENCE_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
charAt(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
 
CharSequenceByteSerializerDeserializer - Class in it.unimi.di.law.bubing.sieve
CharSequenceByteSerializerDeserializer() - Constructor for class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
 
CHARSET_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
checkRobots(long) - Method in class it.unimi.di.law.bubing.frontier.VisitState
Checks whether the current robots information has expired and, if necessary, schedules a new robots.txt download.
CHECKSUM_THRESHOLD - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
clear() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Empties this visit state of all the URLs that it contains.
clear() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Removes all elements from this set.
clear() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Removes all elements from this set.
clear() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Clears this queue.
clear() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Clears this queue.
clear() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
clone() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
clone() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
close() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
 
close() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Closes the frontier: threads are stopped (if necessary, aborted), sieve and store and robots stream are closed.
close() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
 
close() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
Closes (forever) this sieve.
close() - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
 
close() - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
 
close() - Method in interface it.unimi.di.law.bubing.store.Store
 
close() - Method in class it.unimi.di.law.bubing.store.UnbufferedFileStore
 
close() - Method in class it.unimi.di.law.bubing.store.WarcStore
 
close() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Closes this queue.
close() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Closes all files.
close() - Method in class it.unimi.di.law.bubing.util.FetchData
close() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Closes this queue.
close() - Method in class it.unimi.di.law.warc.io.CompressedWarcWriter
 
close() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
 
close() - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
 
close() - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter.WriterPair
 
close() - Method in class it.unimi.di.law.warc.io.UncompressedWarcWriter
 
close() - Method in class it.unimi.di.law.warc.processors.ByteWriter
 
close() - Method in class it.unimi.di.law.warc.processors.ConstantPositionURLWriter
 
close() - Method in class it.unimi.di.law.warc.processors.DateURLWriter
 
close() - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
 
close() - Method in class it.unimi.di.law.warc.processors.IdentityWriter
 
close() - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
 
close() - Method in class it.unimi.di.law.warc.processors.ToStringWriter
 
close() - Method in class it.unimi.di.law.warc.processors.URLDigestFinalPositionWriter
 
close() - Method in class it.unimi.di.law.warc.processors.URLDigestStatusLengthWriter
 
close() - Method in class it.unimi.di.law.warc.processors.URLDigestWriter
 
close() - Method in class it.unimi.di.law.warc.processors.URLPositionWriter
 
close() - Method in class it.unimi.di.law.warc.processors.URLWriter
 
close() - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
 
CLOSEPAREN - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
collect(double) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Performs garbage collection until ByteArrayDiskQueues.ratio() is greater than the specified target ratio.
collectIf(double, double) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Performs a garbage collection if the space used is below a given threshold, reaching a given target ratio.
column - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
comment - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
An internal representation of the comment of the entry.
compareTo(Delayed) - Method in class it.unimi.di.law.bubing.frontier.VisitState
 
compareTo(Delayed) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
 
compressedSkipLength - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
The actual (compressed) length of the entry.
CompressedWarcCachingReader - Class in it.unimi.di.law.warc.io
 
CompressedWarcCachingReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcCachingReader
 
CompressedWarcReader - Class in it.unimi.di.law.warc.io
 
CompressedWarcReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcReader
 
CompressedWarcWriter - Class in it.unimi.di.law.warc.io
 
CompressedWarcWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcWriter
 
ConcurrentCountingMap - Class in it.unimi.di.law.bubing.util
A concurrent counting map.
ConcurrentCountingMap() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Creates a new concurrent counting map with concurrency level equal to Runtime.availableProcessors().
ConcurrentCountingMap(int) - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Creates a new concurrent counting map.
ConcurrentCountingMap.LockedMap - Class in it.unimi.di.law.bubing.util
 
ConcurrentCountingMap.Stripe - Class in it.unimi.di.law.bubing.util
 
ConcurrentSummaryStats - Class in it.unimi.di.law.bubing.util
 
ConcurrentSummaryStats() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
 
connectionTimeout - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
connectionTimeout - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The socket connection timeout in milliseconds.
ConstantPositionURLWriter - Class in it.unimi.di.law.warc.processors
 
ConstantPositionURLWriter(String) - Constructor for class it.unimi.di.law.warc.processors.ConstantPositionURLWriter
 
consume() - Method in interface it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry.LazyInflater
Consumes the (possibly) remaining entry content.
consume() - Method in class it.unimi.di.law.warc.util.BoundSessionInputBuffer
Consumes the remaining bytes (of the buffered stream).
CONTENT_LENGTH - it.unimi.di.law.warc.records.WarcHeader.Name
 
CONTENT_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
CONTENT_TYPE - it.unimi.di.law.warc.records.WarcHeader.Name
 
contentLength - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Statistics about the content length of each archetype.
contentLength() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
contentLength(long) - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
contentTypeApplication - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Number of archetypes whose indicated content type starts with application (case insensitive).
contentTypeImage - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Number of archetypes whose indicated content type starts with image (case insensitive).
contentTypeOthers - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Number of archetypes whose indicated content type does not start with text, image, or application (case insensitive).
ContentTypeStartsWith - Class in it.unimi.di.law.warc.filters
A filter accepting only fetched responses whose content type starts with a given string.
ContentTypeStartsWith(String) - Constructor for class it.unimi.di.law.warc.filters.ContentTypeStartsWith
Creates a filter that only accepts URLs whose content type starts with a given prefix.
contentTypeText - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Number of archetypes whose indicated content type starts with text (case insensitive).
cookieMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
cookieMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum overall size for the (external form of) the cookies accepted from a single host.
cookiePolicy - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
cookiePolicy - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The cookie policy to be used.
cookies - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The cookies of this visit state.
copy() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
copy() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
copy() - Method in interface it.unimi.di.law.bubing.parser.Parser
This method strengthens the return type of the method inherited from Filter.
copy() - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
copy() - Method in class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
 
copy() - Method in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
 
copy() - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
 
copy() - Method in class it.unimi.di.law.warc.filters.DigestEquals
 
copy() - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
 
copy() - Method in class it.unimi.di.law.warc.filters.HostEndsWith
 
copy() - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
 
copy() - Method in class it.unimi.di.law.warc.filters.HostEquals
 
copy() - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
 
copy() - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
 
copy() - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
 
copy() - Method in class it.unimi.di.law.warc.filters.ResponseMatches
 
copy() - Method in class it.unimi.di.law.warc.filters.SameHost
 
copy() - Method in class it.unimi.di.law.warc.filters.SchemeEquals
 
copy() - Method in class it.unimi.di.law.warc.filters.StatusCategory
 
copy() - Method in class it.unimi.di.law.warc.filters.URLEquals
 
copy() - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
 
copy() - Method in class it.unimi.di.law.warc.filters.URLShorterThan
 
copy() - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
 
copy() - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
 
copy() - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
 
copy(Filter<T>...) - Static method in class it.unimi.di.law.warc.filters.Filters
 
copyContent(long, long, long, long) - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
count - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
The number of elements in the list (always nonzero).
count(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Returns the number of path+queries associated with the given visit state.
count(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Returns the number of elements associated with the given key.
CRAWLDURATION - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
crawlIsNew - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
crawlIsNew - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Whether this is a new crawl.
crc32 - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
The CRC of the entry.
createFromFile(long, File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Creates a new disk-based queue of byte arrays using an existing file.
createFromFile(long, File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Creates a new disk-based queue of objects using an existing file.
createHierarchicalTempFile(File, int, String, String) - Static method in class it.unimi.di.law.bubing.util.Util
Creates a temporary file with a random hierarchical path.
createNew(File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Creates a new disk-based queue of byte arrays.
createNew(File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Creates a new disk-based queue of objects.
CRLF - Static variable in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
CRLFCRLF - Static variable in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
crossAuthorityDuplicates - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
If true, pages with the same content but with different authorities are considered duplicates.
curChar - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
 
CURRENTQUEUE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
currentToken - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
This is the last token that has been consumed successfully.

D

DateURLWriter - Class in it.unimi.di.law.warc.processors
 
DateURLWriter() - Constructor for class it.unimi.di.law.warc.processors.DateURLWriter
 
debugStream - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Debug output.
decodeInt() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Decodes using vByte a nonnegative integer at the current pointer.
DEFAULT - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
Lexical state.
DEFAULT_LOG2_LOG_FILE_SIZE - Static variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
By default, we use 64 MiB log files.
defaultRequestConfig - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The default configuration for a non-robots.txt request.
DefaultUpdateStrategy() - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.DefaultUpdateStrategy
 
deflater - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.WriteEntry
The deflater to be used to actually write data that has to be compressed as the entry content.
dequeue() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Removes the first path in the queue.
dequeue() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Dequeues a byte array from the queue in FIFO fashion.
dequeue() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Dequeues an object from the queue in FIFO fashion.
dequeue(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Dequeues the first element available for a given key.
dequeueKey() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
Returns the next key in the flow of new pairs remaining after the check, and discards the corresponding value.
dequeuePathQueries(VisitState, int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Dequeues at most the given number of path+queries into the given visit state.
digest - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
The last returned digest, or null if HTMLParser.DigestAppendable.init(URI) has been called but HTMLParser.DigestAppendable.digest() hasn't.
digest() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
digest() - Method in class it.unimi.di.law.bubing.util.FetchData
Get the digest
digest(byte[]) - Method in class it.unimi.di.law.bubing.util.FetchData
Set the digest with a given value
digestAlgorithm - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
digestAlgorithm - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The algorithm used for digesting pages (for duplicate filtering).
digestAppendable - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
An object embodying the digest logic, or null for no digest computation.
DigestAppendable(HashFunction) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
Create a digest appendable using a given hash function.
DigestEquals - Class in it.unimi.di.law.warc.filters
A filter accepting only records of given digest, specified as a hexadecimal string.
digests - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A Bloom filter storing page digests for duplicate detection.
DIGESTS_NAME - Static variable in class it.unimi.di.law.bubing.store.WarcStore
 
disable_tracing() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Disable tracing.
DiskNewFlow(ByteSerializerDeserializer<T>) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
dist - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A variable used for exponentially-binned distribution of visit state sizes.
distributor - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The thread constantly moving ready URLs into the Frontier.workbench.
Distributor - Class in it.unimi.di.law.bubing.frontier
A thread that distributes ready URLs (coming out of the sieve) into the Workbench queues with the help of a WorkbenchVirtualizer.
Distributor(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.Distributor
Creates a distributor for the given frontier.
DISTRIBUTORVISITSTATESONDISK - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
DISTRIBUTORWARMUP - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
dnsCacheMaxSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
dnsCacheMaxSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Maximum number of entries cached by the DNS resolver when using DnsJavaResolver.
DnsJavaResolver - Class in it.unimi.di.law.bubing.frontier.dns
A resolver based on dnsjava.
DnsJavaResolver() - Constructor for class it.unimi.di.law.bubing.frontier.dns.DnsJavaResolver
 
dnsNegativeTtl - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
dnsNegativeTtl - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Expiration time for negative DNS answers when using DnsJavaResolver.
dnsPositiveTtl - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
dnsPositiveTtl - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Expiration time for positive DNS answers when using DnsJavaResolver.
dnsResolver - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
The DNS resolver used throughout the crawler.
dnsResolverClass - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A DnsResolver.
DNSThread - Class in it.unimi.di.law.bubing.frontier
A thread that continuously dequeues a VisitState from the queue of new visit states (those that still need a DNS resolution), resolves its host and puts it on the Workbench.
DNSThread(Frontier, int) - Constructor for class it.unimi.di.law.bubing.frontier.DNSThread
A DNS thread for the given Frontier, with an index used to set the thread's name.
dnsThreads - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The threads resolving DNS for new visit states.
dnsThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
dnsThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of DNS threads (usually a few dozen, depending on the server).
dnsThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Changes the number of DNS threads.
done - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A lock-free list of visit states ready to be released; it is filled by fetching threads and emptied by the DoneThread.
done() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
Terminates the statistics, closing all the progress loggers.
Done() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reset buffer when finished.
DoneThread - Class in it.unimi.di.law.bubing.frontier
A thread that continuously dequeues a VisitState from the Frontier.done queue and releases it to the Workbench.
DoneThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.DoneThread
A DoneThread for the given Frontier.
DOTTED_ADDRESS - Static variable in class it.unimi.di.law.bubing.RuntimeConfiguration
A pattern used to identify hosts specified directly via their address in dotted notation.
duplicates - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The number of duplicate pages.
DUPLICATES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
DuplicateSegmentsLessThan - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose path does not contain too many duplicate segments.
DuplicateSegmentsLessThan(int) - Constructor for class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
Creates a filter that only accepts URIs whose path contains fewer duplicate consecutive segments than the given threshold.

E

emit() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
Emits the statistics.
EMPTY_ARRAY - Static variable in class it.unimi.di.law.warc.filters.Filters
 
EMPTY_CHARSEQUENCE_ARRAY - Static variable in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
 
EMPTY_COOKIE_ARRAY - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
A singleton empty cookie array.
EMPTY_ROBOTS_FILTER - Static variable in class it.unimi.di.law.bubing.util.URLRespectsRobots
A singleton empty robots filter.
emptyPairs - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
The queue of empty ParallelBufferedWarcWriter.WriterPair instances.
enable_tracing() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Enable tracing.
encodeInt(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Encodes using vByte a nonnegative integer at the current pointer.
endColumn - Variable in class it.unimi.di.law.warc.filters.parser.Token
The column number of the last character of this Token.
endLine - Variable in class it.unimi.di.law.warc.filters.parser.Token
The line number of the last character of this Token.
endTag(EndTag) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
endTags - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
Cached byte representations of all closing tags.
endTime - Variable in class it.unimi.di.law.bubing.util.FetchData
System.currentTimeMillis() when the GET request was completed.
enlargeBuffer(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Enlarge the buffer of this queue to a given size.
enlargeBuffer(int) - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Enlarge the buffer of this queue to a given size.
enqueue(byte[]) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Enqueues a byte array to this queue.
enqueue(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Enqueues a byte-array fragment to this queue.
enqueue(ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Enqueues a URL to the BUbiNG crawl.
enqueue(Object, byte[]) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Enqueues an element (specified as a byte array) associated with a given key.
enqueue(Object, byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Enqueues an element (specified as a byte-array fragment) associated with a given key.
enqueue(URI) - Method in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
Enqueues the given URL, provided that it passes the schedule filter and its host is not blacklisted.
enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
Add the given (key,value) pair to the store.
enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
 
enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
 
enqueue(T) - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Enqueues an object to this queue.
enqueueLocal(ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Enqueues a local URL represented by a byte array to the crawl of this agent.
enqueuePathQuery(byte[]) - Method in class it.unimi.di.law.bubing.frontier.VisitState
Enqueues a path+query in byte-array representation, possibly putting this visit state in its entry.
enqueueRobots() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Enqueues the /robots.txt path as the first element of the queue, if the queue is empty, or as the second element otherwise, possibly putting this visit state in its entry.
enqueueURL(VisitState, ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Enqueues the given URL as a path+query associated to the scheme+authority of the given visit state.
ensureCapacity(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Ensures that the set has a given capacity.
ensureCapacity(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Ensures that the set has a given capacity.
ensureNotPaused() - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
 
entrySummaryStats - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A variable accumulating statistics about the size (in visit states) of workbench entries.
EOF - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
End of File.
EOL - Static variable in exception it.unimi.di.law.warc.filters.parser.ParseException
The end of line string for this machine.
EPOCH - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
equals(Object) - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
 
equals(Object) - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
Compare this object with a given generic one
equals(Object) - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
Compare this object with a given generic one
equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEndsWith
Compare this object with a given generic one
equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
Compare this object with a given generic one
equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEquals
Compare this object with a given generic one
equals(Object) - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
Compare this with a given generic object
equals(Object) - Method in class it.unimi.di.law.warc.filters.SameHost
Compare this object with a given generic one.
equals(Object) - Method in class it.unimi.di.law.warc.filters.SchemeEquals
Compare a given object with this
equals(Object) - Method in class it.unimi.di.law.warc.filters.StatusCategory
Compare this filter with a generic object
equals(Object) - Method in class it.unimi.di.law.warc.filters.URLEquals
Compare this filter with a given object
equals(Object) - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
Compare this with a given object
equals(Object) - Method in class it.unimi.di.law.warc.filters.URLShorterThan
Compare this with a given object
estimate(T) - Method in interface it.unimi.di.law.bubing.spam.SpamDetector
Estimates the spam score associated with a given information object.
estimateLength(CharSequence[]) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
Estimates the length of the page generated by a given array of successors.
exception - Variable in class it.unimi.di.law.bubing.util.FetchData
The exception thrown in case of a failed fetch, or null.
EXCEPTION_HOST_KILLER - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
A map recording for each type of exception the number of retries.
EXCEPTION_TO_MAX_RETRIES - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
A map recording for each type of exception the number of retries.
EXCEPTION_TO_WAIT_TIME - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
A map recording for each type of exception a timeout; note that 0 means standard politeness time.
ExpandBuff(boolean) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
expectedTokenSequences - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
Each entry in this array is an array of integers.
externalOutdegree - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Statistics about the number of out-links of each archetype, without considering the links to the same corresponding host

F

FakeResolver - Class in it.unimi.di.law.bubing.frontier.dns
A fake resolver that returns a four-byte representation of the host hashcode.
FakeResolver() - Constructor for class it.unimi.di.law.bubing.frontier.dns.FakeResolver
 
FALSE - Static variable in class it.unimi.di.law.warc.filters.Filters
 
FALSE - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
FastApproximateByteArrayCache - Class in it.unimi.di.law.bubing.util
A fast, concurrent approximate cache for byte arrays.
FastApproximateByteArrayCache(long) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
Creates a new cache with specified size and concurrency level equal to Runtime.availableProcessors().
FastApproximateByteArrayCache(long, int) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
Creates a new cache with specified size.
FastApproximateByteArrayCache.Stripe - Class in it.unimi.di.law.bubing.util
A class containing a stripe of the cache.
FastByteArrayInputStreamHttpEntityFactory - Class in it.unimi.di.law.warc.util
An implementation of a HttpEntityFactory that returns an entity whose content is buffered using a FastByteArrayInputStream.
FastByteArrayInputStreamHttpEntityFactory() - Constructor for class it.unimi.di.law.warc.util.FastByteArrayInputStreamHttpEntityFactory
 
FCOMMENT - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
fetch(URI, HttpClient, RequestConfig, VisitState, boolean) - Method in class it.unimi.di.law.bubing.util.FetchData
Fetches a given URL.
FETCH_ROBOTS - Static variable in class it.unimi.di.law.bubing.RuntimeConfiguration
Whether to fetch and use robots.txt.
FetchData - Class in it.unimi.di.law.bubing.util
Response of an HTTP request.
FetchData(RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.util.FetchData
Creates a fetched response according to the given properties.
fetchDataBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
fetchDataBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Size of the buffer for InspectableFileCachedInputStream instances, in bytes.
fetchedResources - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The number of fetched resources (updated by ParsingThread instances).
FETCHEDRESOURCES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
fetchedRobots - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The number of fetched robots.txt files (updated by ParsingThread instances).
FETCHEDROBOTS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
fetchFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
fetchFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A filter that will be applied to all ready URLs to decide whether to fetch them.
FetchingThread - Class in it.unimi.di.law.bubing.frontier
A thread fetching pages that will then be analyzed by a ParsingThread.
FetchingThread(Frontier, int) - Constructor for class it.unimi.di.law.bubing.frontier.FetchingThread
Creates a new fetching thread.
FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS - Class in it.unimi.di.law.bubing.frontier
A support class that makes it possible to plug in a custom DNS resolver.
fetchingThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
fetchingThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of fetching threads (hundreds or even thousands).
fetchingThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Changes the number of fetching threads.
fetchingThreadWaitingTimeSum - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The sum of the waiting time of the waiting fetching threads: every time a fetching thread waits this sum is updated; every time the statistics are printed this value is reset.
FETCHINGTHREADWAITINGTIMESUM - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
fetchingThreadWaits - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The number of waits performed by fetching threads; every time the statistics are printed this value is reset.
FETCHINGTHREADWAITS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
FEXTRA - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
FHCRC - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
files - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
For each log-file index, the associated RandomAccessFile.
FillBuff() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
filledPairs - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
The queue of filled ParallelBufferedWarcWriter.WriterPair instances; their content will be flushed to disk by the ParallelBufferedWarcWriter.flushingThread.
Filter<T> - Interface in it.unimi.di.law.warc.filters
A filter is a strategy to decide whether to accept a given object or not.
FILTER_PACKAGE_NAME - Static variable in interface it.unimi.di.law.warc.filters.Filter
The name of the package that contains this interface as well as most filters.
FilterParser<T> - Class in it.unimi.di.law.warc.filters.parser
A simple parser that transforms a filter expression into a filter.
FilterParser(FilterParserTokenManager) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
Constructor with generated Token Manager.
FilterParser(InputStream) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
Constructor with InputStream.
FilterParser(InputStream, String) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
Constructor with InputStream and supplied encoding
FilterParser(Reader) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
Constructor.
FilterParser(Class<T>) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
 
FilterParserConstants - Interface in it.unimi.di.law.warc.filters.parser
Token literal values and constants.
FilterParserTokenManager - Class in it.unimi.di.law.warc.filters.parser
Token Manager.
FilterParserTokenManager(SimpleCharStream) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Constructor.
FilterParserTokenManager(SimpleCharStream, int) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Constructor.
Filters - Class in it.unimi.di.law.warc.filters
A collection of static methods to deal with filters.
Filters() - Constructor for class it.unimi.di.law.warc.filters.Filters
 
finishedAppending() - Method in class it.unimi.di.law.bubing.frontier.Frontier
 
finishedAppending() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
finishedAppending() - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
The new flow of keys is over.
firstPath() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Peeks at the first path in the queue.
FIX_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
flush() - Method in class it.unimi.di.law.bubing.Agent
 
flush() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
Forces the check+update of all pairs that have been enqueued.
flush() - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
 
flush() - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
 
flushingThread - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
The thread that iteratively extracts filled ParallelBufferedWarcWriter.WriterPair instances from ParallelBufferedWarcWriter.filledPairs, dumps them to ParallelBufferedWarcWriter.outputStream and enqueues them to ParallelBufferedWarcWriter.emptyPairs.
flushingThreadException - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
The exception thrown by the ParallelBufferedWarcWriter.flushingThread, if any, or null.
flushingThreadException - Variable in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
FNAME - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
followFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
followFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A filter that will be applied to all parsed resources to decide whether to follow their links.
FORBIDDEN_CHARS - Static variable in class it.unimi.di.law.bubing.util.BURL
Characters that will cause a URI spec to be rejected.
forciblyEnqueueRobotsFirst() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Forcibly enqueues the /robots.txt path as the first element of the queue.
formatDate(Calendar) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Formats the date to be written in the WarcHeader.Name.WARC_DATE header.
FormatException(String) - Constructor for exception it.unimi.di.law.warc.io.gzarc.GZIPArchive.FormatException
 
FormatException(String, Throwable) - Constructor for exception it.unimi.di.law.warc.io.gzarc.GZIPArchive.FormatException
 
formatId(UUID) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Formats the record id to be written in the WarcHeader.Name.WARC_RECORD_ID header.
forName(String) - Static method in class it.unimi.di.law.bubing.parser.BinaryParser
Return the hash function corresponding to a given message-digest algorithm given by name.
freeze() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Freezes this queue.
freeze() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Freezes this queue.
fromByteArray(byte[], int) - Method in class it.unimi.di.law.bubing.Agent
 
fromHexString(String) - Static method in class it.unimi.di.law.warc.util.Util
Returns a byte array corresponding to the given number.
fromNormalizedByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Creates a new BUbiNG URL from a normalized ASCII string represented by a byte array.
fromNormalizedSchemeAuthorityAndPathQuery(byte[], byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Creates a new BUbiNG URL from a byte-array representation of a normalized scheme and authority and a byte-array representation of a normalized ASCII path and query.
fromNormalizedSchemeAuthorityAndPathQuery(String, byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Creates a new BUbiNG URL from a normalized ASCII string representing scheme and authority and a byte-array representation of a normalized ASCII path and query.
fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.InfoWarcRecord
 
fromPayloadMethod(Header) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
Returns the factory method to be used to create a record from the payload given a header specifying the type.
fromStream(InputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
A serializer-deserializer for byte arrays that writes the array length using variable-length byte encoding, and then writes the content of the array.
fromStream(InputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
Deserializes an object starting from a given portion of a byte array.
fromStream(InputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
 
fromString(String) - Method in class it.unimi.di.law.bubing.Agent
 
FRONT_INCREASE - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
The increase of the front size used by Frontier.updateRequestedFrontSize().
frontier - Variable in class it.unimi.di.law.bubing.frontier.VisitState
A reference to the frontier.
Frontier - Class in it.unimi.di.law.bubing.frontier
The BUbiNG frontier: a class structure that encompasses most of the logic behind the way BUbiNG fetches URLs.
Frontier(RuntimeConfiguration, Store, Agent) - Constructor for class it.unimi.di.law.bubing.frontier.Frontier
Creates the frontier.
Frontier.PropertyKeys - Enum in it.unimi.di.law.bubing.frontier
Names of the scalar fields saved by Frontier.snap().
frontierDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
frontierDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A directory for storing files (mainly queues managed by ByteArrayDiskQueue) related to the frontier.
FrontierEnqueuer(Frontier, RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
Creates the enqueuer.
FTEXT - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 

G

generate(long, StringBuilder, CharSequence[], boolean) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
 
GenerateGraphMap - Class in it.unimi.di.law.bubing.tool
Builds and saves the graph map, that is, a text file containing all URLs ever crawled, and a binary file containing the corresponding nodes (duplicates are mapped to their archetype position).
GenerateGraphMap() - Constructor for class it.unimi.di.law.bubing.tool.GenerateGraphMap
 
generateParseException() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Generate ParseException.
get() - Method in interface it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry.LazyInflater
Returns the actual inflater from which the uncompressed entry content may be read.
get(byte[]) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Returns the visit state associated to a given scheme+authority, or null.
get(byte[]) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Returns the entry for a given IP address.
get(byte[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Gets the value of the counter associated with a given key.
get(byte[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
get(byte[], int, int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Returns the visit state associated to a given scheme+authority specified as a byte-array fragment, or null.
get(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Gets the value of the counter associated with a given key.
get(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
get(byte[], int, int, long) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
 
getActiveFecthingThreads() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeContentLength() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeContentTypeApplication() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeContentTypeImage() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeContentTypeOthers() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeContentTypeText() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeExternalOutdegree() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypeOutdegree() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes1xx() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes2xx() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes3xx() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes4xx() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypes5xx() - Method in class it.unimi.di.law.bubing.Agent
 
getArchetypesOther() - Method in class it.unimi.di.law.bubing.Agent
 
getBeginColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get token beginning column number.
getBeginLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get token beginning line number.
getBroken() - Method in class it.unimi.di.law.bubing.Agent
 
getBrokenVisitStates() - Method in class it.unimi.di.law.bubing.Agent
 
getBrokenVisitStatesOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
 
getBytes() - Method in class it.unimi.di.law.bubing.Agent
 
getCharsetName(byte[], int) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
Returns the charset name as indicated by a META HTTP-EQUIV element, if present, interpreting the provided byte array as a sequence of ISO-8859-1-encoded characters.
getCharsetNameFromHeader(String) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
Extracts the charset name from the header value of a content-type header using a regular expression.
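The regular-expression approach can be sketched as follows. The class CharsetFromHeader and the pattern are assumptions made for this example; the expression actually used by HTMLParser is not shown in this index and may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the HTMLParser implementation.
final class CharsetFromHeader {
    // Assumed pattern: "charset=", optionally quoted, up to whitespace, ';' or '"'.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name found in a content-type header value, or null if there is none. */
    static String charsetName(String headerValue) {
        Matcher m = CHARSET.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }
}
```

For instance, under these assumptions charsetName("text/html; charset=UTF-8") yields "UTF-8", and a value with no charset parameter yields null.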
getColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Deprecated.
getComment() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
Returns the comment of the entry.
getConnectionTimeout() - Method in class it.unimi.di.law.bubing.Agent
 
getContent() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
getContentLength() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
getCookies(URI, CookieStore, int) - Static method in class it.unimi.di.law.bubing.frontier.FetchingThread
Returns the list of cookies in a given store in the form of an array, limiting their overall size (only the maximal prefix of cookies satisfying the size limit is returned).
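The "maximal prefix" rule can be sketched as follows. The class CookiePrefix is hypothetical and represents cookies as plain strings measured in UTF-8 bytes; the real method works on a CookieStore and may measure size differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the FetchingThread implementation.
final class CookiePrefix {
    /** Returns the longest prefix of the list whose total size does not exceed maxByteSize. */
    static List<String> maximalPrefix(List<String> cookies, int maxByteSize) {
        List<String> result = new ArrayList<>();
        int size = 0;
        for (String cookie : cookies) {
            size += cookie.getBytes(StandardCharsets.UTF_8).length;
            if (size > maxByteSize) break; // stop at the first cookie that overflows the limit
            result.add(cookie);
        }
        return result;
    }
}
```

Note that this keeps a prefix rather than the largest subset: a small cookie that follows an oversized one is dropped too.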
getDelay(TimeUnit) - Method in class it.unimi.di.law.bubing.frontier.VisitState
 
getDelay(TimeUnit) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
 
getDnsThreads() - Method in class it.unimi.di.law.bubing.Agent
 
getDuplicatePercentage() - Method in class it.unimi.di.law.bubing.Agent
 
getDuplicates() - Method in class it.unimi.di.law.bubing.Agent
 
getEndColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get token end column number.
getEndLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get token end line number.
getEntity() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
getEntity() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
getEntry() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
 
getEntry(boolean) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
 
getEntry(String, String, Date) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
Returns an object that can be used to write an entry in the GZIP archive.
getEntryAverage() - Method in class it.unimi.di.law.bubing.Agent
 
getEntryMax() - Method in class it.unimi.di.law.bubing.Agent
 
getEntryMin() - Method in class it.unimi.di.law.bubing.Agent
 
getEntryVariance() - Method in class it.unimi.di.law.bubing.Agent
 
getFetchFilter() - Method in class it.unimi.di.law.bubing.Agent
 
getFetchingThreads() - Method in class it.unimi.di.law.bubing.Agent
 
getFetchingThreadTotalWaitTime() - Method in class it.unimi.di.law.bubing.Agent
 
getFetchingThreadWaits() - Method in class it.unimi.di.law.bubing.Agent
 
getFilterFromSpec(String, String, Class<T>) - Static method in class it.unimi.di.law.warc.filters.Filters
Creates a filter from a filter class name and an external form.
getFirstHeader(HeaderGroup, WarcHeader.Name) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Returns the first header of given name.
getFollowFilter() - Method in class it.unimi.di.law.bubing.Agent
 
GetImage() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get token literal value.
getInfo() - Method in class it.unimi.di.law.warc.records.InfoWarcRecord
 
getInstance() - Static method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.ByteWriter
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.IdentityProcessor
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.ToStringWriter
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.URLDigestWriter
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.URLWriter
 
getInstance() - Static method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
 
getIpDelay() - Method in class it.unimi.di.law.bubing.Agent
 
getIPOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
 
getKeepAliveTime() - Method in class it.unimi.di.law.bubing.Agent
 
getKnownCount() - Method in class it.unimi.di.law.bubing.Agent
Returns the number of agents currently known to the JAI4J RemoteJobManager.
getLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Deprecated.
getLocale() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
Deprecated.
getLocale() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
Deprecated.
getManager(String) - Method in class it.unimi.di.law.bubing.Agent
 
getMaxUrls() - Method in class it.unimi.di.law.bubing.Agent
 
getMessage() - Method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
You can also modify the body of this method to customize your error messages.
getMetrics() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
getName() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
Returns the name of the entry.
getNextToken() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Get the next Token.
getNextToken() - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Get the next Token.
getParseFilter() - Method in class it.unimi.di.law.bubing.Agent
 
getParsingThreads() - Method in class it.unimi.di.law.bubing.Agent
 
getProtocolVersion() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getProtocolVersion() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
getProtocolVersion() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
getProtocolVersion() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpRequest
 
getProtocolVersion() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
getQueueDistribution() - Method in class it.unimi.di.law.bubing.Agent
 
getReadyToParse() - Method in class it.unimi.di.law.bubing.Agent
 
getReadyURLs() - Method in class it.unimi.di.law.bubing.Agent
 
getReceivedURLs() - Method in class it.unimi.di.law.bubing.Agent
 
getRequestLine() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
getRequestLine() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpRequest
 
getRequests() - Method in class it.unimi.di.law.bubing.Agent
 
getRequiredFrontSize() - Method in class it.unimi.di.law.bubing.Agent
 
getResolvedVisitStates() - Method in class it.unimi.di.law.bubing.Agent
 
getResources() - Method in class it.unimi.di.law.bubing.Agent
 
getResponseBodyMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
 
getRobotsExpiration() - Method in class it.unimi.di.law.bubing.Agent
 
getScheduleFilter() - Method in class it.unimi.di.law.bubing.Agent
 
getSocketTimeout() - Method in class it.unimi.di.law.bubing.Agent
 
getStatsThread() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Returns the StatsThread.
getStatusLine() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
getStatusLine() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
getStoreFilter() - Method in class it.unimi.di.law.bubing.Agent
 
getStoreSize() - Method in class it.unimi.di.law.bubing.Agent
 
GetSuffix(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Get the suffix.
getTabSize() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
getToDoSize() - Method in class it.unimi.di.law.bubing.Agent
 
getToken(int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Get the specific Token.
getUnknownHosts() - Method in class it.unimi.di.law.bubing.Agent
 
getUnresolved() - Method in class it.unimi.di.law.bubing.Agent
 
getUrlCacheMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
 
getUrlDelay() - Method in class it.unimi.di.law.bubing.Agent
 
getURLsInQueues() - Method in class it.unimi.di.law.bubing.Agent
 
getURLsInQueuesPercentage() - Method in class it.unimi.di.law.bubing.Agent
 
getValue() - Method in class it.unimi.di.law.warc.filters.parser.Token
An optional attribute value of the Token.
getVisitStates() - Method in class it.unimi.di.law.bubing.Agent
 
getVisitStates() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
Returns the overall number of visit states.
getVisitStatesOnDisk() - Method in class it.unimi.di.law.bubing.Agent
 
getVisitStatesOnDisk() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
Returns the number of visit states on disk.
getVisitStatesOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
 
getWaitingVisitStates() - Method in class it.unimi.di.law.bubing.Agent
 
getWarcContentLength() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcContentLength() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC Content-Length header.
getWarcDate() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcDate() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC-Date header.
getWarcHeader(WarcHeader.Name) - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcHeader(WarcHeader.Name) - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the specified WARC header.
getWarcHeaders() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcHeaders() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC headers.
getWarcRecordId() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcRecordId() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC-Record-ID header.
getWarcTargetURI() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
Returns the WARC-Target-URI header as a URI.
getWarcTargetURI() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC-Target-URI header as a URI.
getWarcType() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
getWarcType() - Method in interface it.unimi.di.law.warc.records.WarcRecord
Returns the WARC-Type header.
getWorkbenchByteSize() - Method in class it.unimi.di.law.bubing.Agent
 
getWorkbenchEntry(byte[]) - Method in class it.unimi.di.law.bubing.frontier.Workbench
Returns a workbench entry for the given address, possibly creating one.
getWorkbenchMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
 
ground() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
 
group - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
group - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The group of this agent; all agents belonging to the same group will coordinate their crawling activity.
guessedCharset - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
The charset we guessed for the last response.
guessedCharset() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
guessedCharset() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
guessedCharset() - Method in interface it.unimi.di.law.bubing.parser.Parser
Returns a guessed charset for the document, or null if the charset could not be guessed.
GZIP_START - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
GZIPArchive - Class in it.unimi.di.law.warc.io.gzarc
A GZIP archive is an archive made of (concatenated) GZIP entries that are usual GZIP files, except for the presence of two extra fields (in the GZIP header) containing the compressed and uncompressed length of the entry itself.
GZIPArchive() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
GZIPArchive.Entry - Class in it.unimi.di.law.warc.io.gzarc
A generic GZIP archive entry; it can be instantiated only as a GZIPArchive.ReadEntry or GZIPArchive.WriteEntry.
GZIPArchive.FormatException - Exception in it.unimi.di.law.warc.io.gzarc
 
GZIPArchive.ReadEntry - Class in it.unimi.di.law.warc.io.gzarc
An entry used to read a GZIP archive entry.
GZIPArchive.ReadEntry.LazyInflater - Interface in it.unimi.di.law.warc.io.gzarc
The lazy inflater that can be used to get (part of the) uncompressed entry content.
GZIPArchive.WriteEntry - Class in it.unimi.di.law.warc.io.gzarc
An entry used to write a GZIP archive entry.
GZIPArchiveReader - Class in it.unimi.di.law.warc.io.gzarc
 
GZIPArchiveReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
 
GZIPArchiveWriter - Class in it.unimi.di.law.warc.io.gzarc
 
GZIPArchiveWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
 
GZIPIndexer - Class in it.unimi.di.law.warc.io.gzarc
 
GZIPIndexer() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
 

H

hash(byte[]) - Static method in class it.unimi.di.law.bubing.util.MurmurHash3
Hashes a byte array using MurmurHash3 with seed zero.
hash(byte[], int, int) - Static method in class it.unimi.di.law.bubing.util.MurmurHash3
Hashes a byte-array fragment using MurmurHash3 with seed zero.
hash(byte[], int, int, long) - Static method in class it.unimi.di.law.bubing.util.MurmurHash3
Hashes a byte-array fragment using MurmurHash3.
hash(ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.MurmurHash3
Hashes a ByteArrayList using MurmurHash3 with seed zero.
hash64() - Method in class it.unimi.di.law.bubing.util.BubingJob
A hash based on the host of BubingJob.url.
hashCode() - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
 
hashCode() - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
 
hashCode() - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
 
hashCode() - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
 
hashCode() - Method in class it.unimi.di.law.warc.filters.HostEndsWith
 
hashCode() - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
 
hashCode() - Method in class it.unimi.di.law.warc.filters.HostEquals
 
hashCode() - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
 
hashCode() - Method in class it.unimi.di.law.warc.filters.SameHost
 
hashCode() - Method in class it.unimi.di.law.warc.filters.SchemeEquals
 
hashCode() - Method in class it.unimi.di.law.warc.filters.StatusCategory
 
hashCode() - Method in class it.unimi.di.law.warc.filters.URLEquals
 
hashCode() - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
 
hashCode() - Method in class it.unimi.di.law.warc.filters.URLShorterThan
 
hasher - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
The hasher currently used to compute the digest.
hashFunction - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
The message digest used to compute the digest.
hashingStrategy - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
head - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
The pointer to the head of the list (the least recently enqueued, but not dequeued, element).
hits() - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
Returns the number of cache hits.
HostEndsWith - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose host ends with (case-insensitively) a certain suffix.
HostEndsWith(String) - Constructor for class it.unimi.di.law.warc.filters.HostEndsWith
Creates a filter that only accepts URLs whose host part has a given suffix.
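A case-insensitive host-suffix test of this kind can be sketched as follows. The class HostSuffixFilter and its apply(URI) method are assumptions made for this example; they do not reproduce the Filter interface of it.unimi.di.law.warc.filters.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration only; not the HostEndsWith class.
final class HostSuffixFilter {
    private final String suffix;

    HostSuffixFilter(String suffix) {
        // Normalize once so that each test only has to lowercase the host.
        this.suffix = suffix.toLowerCase(Locale.ROOT);
    }

    /** Accepts URIs that have a host ending with the suffix, ignoring case. */
    boolean apply(URI uri) {
        String host = uri.getHost();
        return host != null && host.toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}
```

In this sketch, URIs without a host component (such as mailto: URIs) are rejected rather than causing an exception.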
HostEndsWithOneOf - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose host part ends (case-insensitively) with one of a given set of suffixes.
HostEndsWithOneOf(String[]) - Constructor for class it.unimi.di.law.warc.filters.HostEndsWithOneOf
Creates a filter that only accepts URLs whose host part ends with one of a given set of suffixes.
HostEquals - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose host equals (case-insensitively) a certain string.
HostEquals(String) - Constructor for class it.unimi.di.law.warc.filters.HostEquals
Creates a filter that only accepts URLs with a given host.
hostFromSchemeAndAuthority(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Extracts the host part from a scheme+authority by removing the scheme, the user info and the port number.
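The three removals (scheme, user info, port) can be sketched as follows. The class SchemeAuthority is hypothetical and works on a String for readability, whereas BURL.hostFromSchemeAndAuthority takes a byte array; the handling of bracketed IPv6 literals is likewise an assumption.

```java
// Hypothetical illustration only; not the BURL implementation.
final class SchemeAuthority {
    /** Returns the host part of a scheme+authority such as "http://user@example.com:8080". */
    static String host(String schemeAuthority) {
        // Skip the scheme, if any.
        int start = schemeAuthority.indexOf("://");
        start = start < 0 ? 0 : start + 3;
        // Skip the user info, if any.
        int at = schemeAuthority.lastIndexOf('@');
        if (at >= start) start = at + 1;
        // Drop the port: a colon counts only if it follows any closing IPv6 bracket.
        int end = schemeAuthority.length();
        int colon = schemeAuthority.lastIndexOf(':');
        if (colon >= start && colon > schemeAuthority.lastIndexOf(']')) end = colon;
        return schemeAuthority.substring(start, end);
    }
}
```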
HTMLParser<T> - Class in it.unimi.di.law.bubing.parser
An HTML parser with additional responsibilities.
HTMLParser() - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction only (no digesting).
HTMLParser(HashFunction) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(HashFunction, boolean) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(HashFunction, Parser.TextProcessor<T>, boolean) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(HashFunction, Parser.TextProcessor<T>, boolean, boolean) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(HashFunction, Parser.TextProcessor<T>, boolean, boolean, int) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser for link extraction and, possibly, digesting a page.
HTMLParser(HashFunction, Parser.TextProcessor<T>, boolean, int) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser for link extraction and, possibly, digesting a page.
HTMLParser(String) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(String, String) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(String, String, String) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser(String, String, String, String) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser
Builds a parser with a fixed buffer of HTMLParser.CHAR_BUFFER_SIZE characters for link extraction and, possibly, digesting a page.
HTMLParser.DigestAppendable - Class in it.unimi.di.law.bubing.parser
A class computing the digest of a page.
HTMLParser.SetLinkReceiver - Class in it.unimi.di.law.bubing.parser
An implementation of a Parser.LinkReceiver that accumulates the URLs in a public set.
HTTP_EQUIV_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
HTTP_REQUEST_MSGTYPE - Static variable in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
HTTP_RESPONSE_MSGTYPE - Static variable in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
HttpEntityFactory - Interface in it.unimi.di.law.warc.util
An interface describing factories that create HttpEntity.
HttpRequest(String, String) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpRequest
 
HttpRequestWarcRecord - Class in it.unimi.di.law.warc.records
An implementation of WarcRecord corresponding to a WarcRecord.Type.REQUEST record type.
HttpRequestWarcRecord(URI, HttpRequest) - Constructor for class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
HttpResponse(int, String, byte[], int, ContentType) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
HttpResponse(int, String, byte[], ContentType) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
HttpResponse(int, String, String, ContentType) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
HttpResponse(String) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
HttpResponse(String, ContentType) - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
HttpResponseWarcRecord - Class in it.unimi.di.law.warc.records
An implementation of WarcRecord corresponding to a WarcRecord.Type.RESPONSE record type.
HttpResponseWarcRecord(URI, HttpResponse) - Constructor for class it.unimi.di.law.warc.records.HttpResponseWarcRecord
Builds the record given the response and the target URI (using an IdentityHttpEntityFactory to store the entity in the record).
HttpResponseWarcRecord(URI, HttpResponse, HttpEntityFactory) - Constructor for class it.unimi.di.law.warc.records.HttpResponseWarcRecord
Builds the record given the response, the target URI, and an HttpEntityFactory.

I

IdentityHttpEntityFactory - Class in it.unimi.di.law.warc.util
An implementation of an HttpEntityFactory that returns an entity simply wrapping the given one.
IdentityProcessor - Class in it.unimi.di.law.warc.processors
 
IdentitySieve<K,V> - Class in it.unimi.di.law.bubing.sieve
A sieve that simply (and immediately) copies enqueued keys to the new flow receiver.
IdentitySieve(AbstractSieve.NewFlowReceiver<K>, ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.IdentitySieve
 
IdentityWriter - Class in it.unimi.di.law.warc.processors
A writer that simply writes the given record.
IdentityWriter() - Constructor for class it.unimi.di.law.warc.processors.IdentityWriter
 
image - Variable in class it.unimi.di.law.warc.filters.parser.Token
The string image of the token.
ImmutableGraphNamedGraphServer - Class in it.unimi.di.law.bubing.test
A NamedGraphServer using an ImmutableGraph for the graph structure and a StringMap for the name of the nodes.
ImmutableGraphNamedGraphServer(ImmutableGraph, StringMap<? extends CharSequence>) - Constructor for class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
Builds the server.
inBuf - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
index(InputStream) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
Returns a list of pointers to the positions of the entries of a GZIP archive (including the end of file).
index(InputStream, ProgressLogger) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
Returns a list of pointers to the positions of the entries of a GZIP archive (including the end of file).
InfoWarcRecord - Class in it.unimi.di.law.warc.records
An implementation of WarcRecord corresponding to a WarcRecord.Type.WARCINFO record type.
InfoWarcRecord(Header[]) - Constructor for class it.unimi.di.law.warc.records.InfoWarcRecord
 
init(URI) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
init(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
Initializes the digest computation.
init(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
init(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
Initializes this receiver for a new page.
init(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.TextProcessor
Initializes this processor for a new page.
init(URI) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
init(URI, byte[], char[][]) - Method in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
Initializes the enqueuer for parsing a page with a specific scheme+authority and robots filter.
input_stream - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
 
inputStream - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
InspectableCachedHttpEntity - Class in it.unimi.di.law.warc.util
An implementation of an HttpEntity that is reusable and can copy its content from another entity at a controlled rate.
InspectableCachedHttpEntity(InspectableFileCachedInputStream) - Constructor for class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
INSTANCE - Static variable in class it.unimi.di.law.warc.filters.IsHttpResponse
 
INSTANCE - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
 
INSTANCE - Static variable in class it.unimi.di.law.warc.processors.IdentityProcessor
 
INSTANCE - Static variable in class it.unimi.di.law.warc.util.BufferedHttpEntityFactory
 
INSTANCE - Static variable in class it.unimi.di.law.warc.util.IdentityHttpEntityFactory
 
INT_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
INTEGER - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
A trivial serializer-deserializer for Integer.
inUse - Variable in class it.unimi.di.law.bubing.util.FetchData
If true, this instance has been enqueued to the list of results and we are waiting for the signal of the ParsingThread that is analyzing it.
INVALID_LEXICAL_STATE - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
Tried to change to an invalid lexical state.
ipAddress - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
The IP address of this workbench entry, computed by DnsResolver.resolve(String).
ipDelay - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
ipDelay - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The minimum delay between two consecutive fetches from the same IP address.
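A per-IP minimum delay of this kind can be sketched as follows. The class IpPoliteness is hypothetical, and measuring the delay from the start of the previous fetch is a simplifying assumption; the frontier's actual bookkeeping (see WorkbenchEntry and VisitState) is more elaborate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not how the BUbiNG frontier is implemented.
final class IpPoliteness {
    private final long ipDelayMillis;
    // For each IP address, the earliest time at which the next fetch is allowed.
    private final Map<String, Long> nextFetch = new HashMap<>();

    IpPoliteness(long ipDelayMillis) {
        this.ipDelayMillis = ipDelayMillis;
    }

    /** Returns true, and records the fetch, if the IP address may be contacted at time now. */
    boolean tryFetch(String ip, long now) {
        Long next = nextFetch.get(ip);
        if (next != null && now < next) return false;
        nextFetch.put(ip, now + ipDelayMillis);
        return true;
    }
}
```

Passing the current time explicitly keeps the sketch deterministic; a caller would typically supply System.currentTimeMillis().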
ipDelayFactor - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
ipDelayFactor - Variable in class it.unimi.di.law.bubing.StartupConfiguration
An attenuation factor for the multiple-agent IP delay mechanism.
isAlive(long) - Method in class it.unimi.di.law.bubing.frontier.VisitState
Returns whether this visit state is fetchable (i.e., if there is at least one URL and it is allowed by politeness to fetch it).
isDuplicate() - Method in class it.unimi.di.law.bubing.util.FetchData
Returns whether the current FetchData is a duplicate.
isDuplicate(boolean) - Method in class it.unimi.di.law.bubing.util.FetchData
Marks the current FetchData as a duplicate or not a duplicate.
isEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Returns whether this visit state is empty.
isEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Returns whether the set is empty.
isEmpty() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Returns true if the visit-state queue is empty.
isEmpty() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Returns whether the workbench is empty.
isEmpty() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
 
isEmpty() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
 
isEmpty() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
Returns whether this queue is empty.
isEntirelyBroken() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Returns true if this entry is nonempty and all its visit states are broken (i.e., VisitState.lastExceptionClass ≠ null).
IsHttpResponse - Class in it.unimi.di.law.warc.filters
A filter accepting only records that are HTTP/HTTPS responses.
IsProbablyBinary - Class in it.unimi.di.law.warc.filters
A filter accepting only HTTP responses whose content stream appears to be binary.
it.unimi.di.law.bubing - package it.unimi.di.law.bubing
The main BUbiNG class, Agent, and the companion classes StartupConfiguration/RuntimeConfiguration, which describe BUbiNG's internals that can be configured or modified at runtime.
it.unimi.di.law.bubing.frontier - package it.unimi.di.law.bubing.frontier
A set of classes orchestrating the movement of URLs to be fetched next by a BUbiNG agent.
it.unimi.di.law.bubing.frontier.dns - package it.unimi.di.law.bubing.frontier.dns
 
it.unimi.di.law.bubing.parser - package it.unimi.di.law.bubing.parser
The system of parsers used for analyzing HTTP responses.
it.unimi.di.law.bubing.sieve - package it.unimi.di.law.bubing.sieve
Several implementations of the idea of an AbstractSieve.
it.unimi.di.law.bubing.spam - package it.unimi.di.law.bubing.spam
 
it.unimi.di.law.bubing.store - package it.unimi.di.law.bubing.store
Implementations of the Store interface.
it.unimi.di.law.bubing.test - package it.unimi.di.law.bubing.test
 
it.unimi.di.law.bubing.tool - package it.unimi.di.law.bubing.tool
 
it.unimi.di.law.bubing.util - package it.unimi.di.law.bubing.util
Miscellaneous utility classes, including the BUbiNG queuing system and its fast cache for URLs.
it.unimi.di.law.warc - package it.unimi.di.law.warc
An implementation of the Web ARChive file format (WARC) specification.
it.unimi.di.law.warc.filters - package it.unimi.di.law.warc.filters
A comprehensive filtering system.
it.unimi.di.law.warc.filters.parser - package it.unimi.di.law.warc.filters.parser
 
it.unimi.di.law.warc.io - package it.unimi.di.law.warc.io
I/O of WARC formatted files.
it.unimi.di.law.warc.io.gzarc - package it.unimi.di.law.warc.io.gzarc
An implementation of a (skippable) GZIP archive.
it.unimi.di.law.warc.processors - package it.unimi.di.law.warc.processors
Processors to manipulate WARC files.
it.unimi.di.law.warc.records - package it.unimi.di.law.warc.records
WARC records.
it.unimi.di.law.warc.tool - package it.unimi.di.law.warc.tool
 
it.unimi.di.law.warc.util - package it.unimi.di.law.warc.util
Utility classes used by the it.unimi.di.law.warc package.
iterator() - Method in class it.unimi.di.law.bubing.frontier.Workbench
Returns an (unmodifiable) iterator over the entries currently on the workbench.
iterator() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 

J

JavaResolver - Class in it.unimi.di.law.bubing.frontier.dns
A resolver based on InetAddress.getAllByName(String).
JavaResolver() - Constructor for class it.unimi.di.law.bubing.frontier.dns.JavaResolver
 
JGROUPS_CONFIGURATION_PROPERTY_NAME - Static variable in class it.unimi.di.law.bubing.Agent
The name of the system property that, if set, makes it possible to choose a JGroups configuration file.
jj_nt - Variable in class it.unimi.di.law.warc.filters.parser.FilterParser
Next token.
jjFillToken() - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
 
jjstrLiteralImages - Static variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Token literal values.
JMX_REMOTE_PORT_SYSTEM_PROPERTY - Static variable in class it.unimi.di.law.bubing.Agent
The name of the standard Java system property that sets the JMX service port (it must be set for the agent to start).

K

keepAliveTime - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
keepAliveTime - Variable in class it.unimi.di.law.bubing.StartupConfiguration
If zero, connections are closed at each downloaded resource.
key - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
 
key - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
The array of keys.
key2QueueData - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
For each key, the associated ByteArrayDiskQueues.QueueData.
keySerDeser - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
kind - Variable in class it.unimi.di.law.warc.filters.parser.Token
An integer that describes the kind of this token.

L

lastAppendedWasSpace - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
True iff the last character appended was a space.
lastExceptionClass - Variable in class it.unimi.di.law.bubing.frontier.VisitState
If not null, this field contains the class of the exception that was thrown during the last attempt to access this scheme+authority.
lastHighCostStat - Variable in class it.unimi.di.law.bubing.frontier.Distributor
The last time we produced high-cost statistics.
lastPurgeCheck - Variable in class it.unimi.di.law.bubing.frontier.Distributor
The last time we checked for visit states to be purged.
lastRobotsFetch - Variable in class it.unimi.di.law.bubing.frontier.VisitState
System.currentTimeMillis() when we fetched the robots we are caching.
lazyInflater - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry
 
length() - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
 
length() - Method in class it.unimi.di.law.bubing.util.FetchData
Returns (an approximation of) the length of the response (headers and body).
lengthOfHost(byte[], int) - Static method in class it.unimi.di.law.bubing.util.BURL
Finds the length of the host part in a scheme+authority or URL.
LEXICAL_ERROR - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
Lexical error occurred.
LexicalErr(boolean, int, int, int, String, int) - Static method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error.
lexStateNames - Static variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Lexer state names.
line - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
link(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
link(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
Handles a link.
Link - Class in it.unimi.di.law.bubing.util
A class representing a link, to be used by schedule filters.
Link(URI, URI) - Constructor for class it.unimi.di.law.bubing.util.Link
Creates a new link with given source and target.
location - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
The location URL from headers of the last response, if any, or null.
location() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
Returns the BURL location header, if present; if it is not present, but the page contains a valid metalocation, the latter is returned.
location(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
location(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
Handles the location defined by headers.
lock() - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Acquires a locked copy of this map.
LockedMap(ConcurrentCountingMap) - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
LockFreeQueue<T> - Class in it.unimi.di.law.bubing.util
A thin layer around a ConcurrentLinkedQueue that exhibits a subset of the available methods, and keeps track in an AtomicLong of the size of the queue, so that LockFreeQueue.size() can return in constant time.
LockFreeQueue() - Constructor for class it.unimi.di.law.bubing.util.LockFreeQueue
 
log2LogFileSize - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The base 2 logarithm of the byte size of a log file.
logFilePositionMask - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The mask to extract the position inside a log file from a pointer.
logFileSize - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The byte size of a log file.
LOOP_DETECTED - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
Detected (and bailed out of) an infinite loop in the token manager.
LOOPBACK - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
The loopback address, cached.
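
The LockFreeQueue entry above describes a counter kept next to a ConcurrentLinkedQueue so that the size can be read in constant time (ConcurrentLinkedQueue.size() itself traverses the queue). The following is a minimal sketch of that idea, not the BUbiNG class: the name CountingQueueSketch and the add(T) method are assumptions made for illustration, and only poll() and size() are taken from this index.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a size-tracking wrapper; not the actual LockFreeQueue. */
final class CountingQueueSketch<T> {
    /** The underlying lock-free queue. */
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    /** The number of elements, maintained separately from the queue. */
    private final AtomicLong size = new AtomicLong();

    /** Enqueues an element, then increments the counter (assumed method name). */
    void add(final T element) {
        queue.add(element);
        size.incrementAndGet();
    }

    /** Dequeues the head, or returns null if the queue is empty. */
    T poll() {
        final T element = queue.poll();
        // Decrement only when an element was actually removed.
        if (element != null) size.decrementAndGet();
        return element;
    }

    /** Returns the tracked size in constant time. */
    long size() {
        return size.get();
    }
}
```

Because the queue and the counter are updated in two separate steps, a concurrent reader of size() in this sketch can observe a value that is momentarily off by the number of operations in flight.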

M

main(String[]) - Static method in class it.unimi.di.law.bubing.Agent
 
main(String[]) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
 
main(String[]) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
 
main(String[]) - Static method in class it.unimi.di.law.bubing.tool.BuildRepetitionSet
 
main(String[]) - Static method in class it.unimi.di.law.bubing.tool.CatEFGraphs
 
main(String[]) - Static method in class it.unimi.di.law.bubing.tool.GenerateGraphMap
 
main(String[]) - Static method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
 
main(String[]) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
 
main(String[]) - Static method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
 
main(String[]) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
 
main(String[]) - Static method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
main(String[]) - Static method in class it.unimi.di.law.warc.tool.WarcCompressor
 
mask - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
The mask for wrapping a position counter.
mask - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
The mask for wrapping a position counter.
mask - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
The mask for wrapping a position counter.
max() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the maximum of the values added so far.
MAX_TO_STRING_ROBOTS - Static variable in class it.unimi.di.law.bubing.util.URLRespectsRobots
The maximum number of robots entries returned by Object.toString().
maxFill - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
The maximum number of entries that can be filled before rehashing.
maxFill - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
The maximum number of entries that can be filled before rehashing.
maxFill - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
Threshold after which we rehash.
maxNextCharInd - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
maxUrls - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
maxUrls - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum number of URLs to crawl.
maxUrlsPerSchemeAuthority - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
maxUrlsPerSchemeAuthority - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum number of URLs we shall download from each scheme+authority.
mean() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the mean of the values added so far.
memoryUsageOf(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns the memory usage associated to a byte array.
MercatorSieve<K,V> - Class in it.unimi.di.law.bubing.sieve
A concrete implementation of an AbstractSieve that stores hash data on disk, much in the same way as it was suggested by Allan Heydon and Marc Najork in “Mercator: A Scalable, Extensible Web Crawler”, World Wide Web, (2)4:219−229, 1999, Springer.
MercatorSieve(boolean, File, int, int, int, AbstractSieve.NewFlowReceiver<K>, ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.MercatorSieve
Creates a new Mercator-like sieve.
messageThread - Variable in class it.unimi.di.law.bubing.Agent
 
MessageThread - Class in it.unimi.di.law.bubing.frontier
A thread that takes care of pouring the content of Frontier.receivedURLs into the Frontier itself (via the Frontier.enqueue(it.unimi.dsi.fastutil.bytes.ByteArrayList) method).
MessageThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.MessageThread
Creates the thread.
META_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
metaLocation - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
The location URL from META elements of the last response, if any, or null.
metaLocation(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
metaLocation(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
Handles the location defined by a META element.
metaRefresh(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
metaRefresh(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
Handles the refresh defined by a META element.
min() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the minimum of the values added so far.
MIN_FLUSH_INTERVAL - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
The minimum number of milliseconds between two flushes.
misses() - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
Returns the number of cache misses.
MOVING_AVERAGE_WINDOW - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
The window over which we compute the moving average.
mtime - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
The modification time of the entry.
MurmurHash3 - Class in it.unimi.di.law.bubing.util
A 64-bit implementation of MurmurHash3 for byte-array fragments.
MurmurHash3() - Constructor for class it.unimi.di.law.bubing.util.MurmurHash3
 

N

n - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
The current table size.
n - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
The current table size.
n - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
The current table size.
name - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
name - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The name of this agent; it must be unique within its group.
name - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
An internal representation of the name of the entry.
NamedGraphServer - Interface in it.unimi.di.law.bubing.test
A server that allows one to navigate through a graph whose nodes are decorated with names.
NamedGraphServerHttpProxy - Class in it.unimi.di.law.bubing.test
An HTTP proxy that uses a NamedGraphServer to generate fake HTML pages containing only links.
NamedGraphServerHttpProxy() - Constructor for class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
 
NamedGraphServerHttpProxy.CapriciousPrintWriter - Class in it.unimi.di.law.bubing.test
Like a standard print writer, but it sleeps a random amount of time before printing each string (only the methods NamedGraphServerHttpProxy.CapriciousPrintWriter.println(String), NamedGraphServerHttpProxy.CapriciousPrintWriter.print(String) and NamedGraphServerHttpProxy.CapriciousPrintWriter.println() are affected).
newEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.BufferedHttpEntityFactory
 
newEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.FastByteArrayInputStreamHttpEntityFactory
 
newEntity(HttpEntity) - Method in interface it.unimi.di.law.warc.util.HttpEntityFactory
 
newEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.IdentityHttpEntityFactory
 
newFlowReceiver - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
newToken(int) - Static method in class it.unimi.di.law.warc.filters.parser.Token
 
newToken(int, String) - Static method in class it.unimi.di.law.warc.filters.parser.Token
Returns a new Token object, by default.
newVisitStates - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The queue of new visit states; filled by the Frontier.distributor and emptied by the DNS threads.
next - Variable in class it.unimi.di.law.warc.filters.parser.Token
A reference to the next regular (non-special) token from the input stream.
nextFetch - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The minimum time at which this visit state can be accessed because of host-based politeness, zero at creation.
nextFetch - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
The minimum time at which visit states in this entry can be accessed because of IP-based politeness.
nextFetch() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Returns the minimum time at which some URL in some VisitState of the visit-state queue of this workbench entry can be accessed.
nextFlush - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The time at which the next flush can happen.
noMoreAppend() - Method in class it.unimi.di.law.bubing.frontier.Frontier
 
noMoreAppend() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
noMoreAppend() - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
There will be no more new flows (because the sieve that is calling this method was closed).
not(Filter<T>) - Static method in class it.unimi.di.law.warc.filters.Filters
Produces the negation of the given filter.
NOT - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
NULL_LINK_RECEIVER - Static variable in interface it.unimi.di.law.bubing.parser.Parser
A no-op implementation of Parser.LinkReceiver.
NUM_GZ_WARC_RECORDS - Static variable in class it.unimi.di.law.bubing.store.WarcStore
 
numberOfItems() - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
 
numberOfReceivedURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The overall number of URLs sent by other agents.
NUMBEROFRECEIVEDURLS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
numberOfWorkbenchEntries() - Method in class it.unimi.di.law.bubing.frontier.Workbench
Returns the number of existing workbench entries (in and out of the workbench).
numKeys() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Returns the number of keys.

O

ObjectDiskQueue<T> - Class in it.unimi.di.law.bubing.util
A queue of objects partially stored on disk.
ObjectDiskQueue(ByteDiskQueue) - Constructor for class it.unimi.di.law.bubing.util.ObjectDiskQueue
 
onDisk() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Returns the number of visit states on disk.
OPENPAREN - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
or() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
 
or(Filter<T>...) - Static method in class it.unimi.di.law.warc.filters.Filters
Produces the disjunction of the given filters.
OR - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
outdegree - Variable in class it.unimi.di.law.bubing.frontier.Frontier
Statistics about the number of out-links of each archetype.
outlinks - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
 
OUTPUT_STREAM_BUFFER_SIZE - Variable in class it.unimi.di.law.bubing.store.UnbufferedFileStore
 
OUTPUT_STREAM_BUFFER_SIZE - Variable in class it.unimi.di.law.bubing.store.WarcStore
 
outputStream - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
The final output stream.
OVERFLOW_FILES_RANDOM_PATH_ELEMENTS - Static variable in class it.unimi.di.law.bubing.util.FetchData
The number of path elements for the hierarchical overflow files (see Util.createHierarchicalTempFile(File, int, String, String)).

P

ParallelBufferedWarcWriter - Class in it.unimi.di.law.warc.io
A parallel Warc writer.
ParallelBufferedWarcWriter(OutputStream, boolean) - Constructor for class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
Creates a Warc parallel output stream using 2×Runtime.availableProcessors() buffers.
ParallelBufferedWarcWriter(OutputStream, boolean, int) - Constructor for class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
Creates a Warc parallel output stream.
ParallelBufferedWarcWriter.WriterPair - Class in it.unimi.di.law.warc.io
 
ParallelFilteredProcessorRunner - Class in it.unimi.di.law.warc.processors
 
ParallelFilteredProcessorRunner(InputStream) - Constructor for class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
ParallelFilteredProcessorRunner(InputStream, Filter<WarcRecord>) - Constructor for class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
ParallelFilteredProcessorRunner.Processor<T> - Interface in it.unimi.di.law.warc.processors
 
ParallelFilteredProcessorRunner.Writer<T> - Interface in it.unimi.di.law.warc.processors
 
parse(MutableString) - Static method in class it.unimi.di.law.bubing.util.BURL
Creates a new BUbiNG URL from a mutable string specification if possible, or returns null otherwise.
parse(String) - Static method in class it.unimi.di.law.bubing.util.BURL
Creates a new BUbiNG URL from a string specification if possible, or returns null otherwise.
parse(String) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
 
parse(URI, HttpResponse, Parser.LinkReceiver) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
parse(URI, HttpResponse, Parser.LinkReceiver) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
parse(URI, HttpResponse, Parser.LinkReceiver) - Method in interface it.unimi.di.law.bubing.parser.Parser
Parses a response.
parseBoolean(String) - Static method in class it.unimi.di.law.bubing.util.Util
Parses a Boolean value reliably, throwing an exception if the argument is not true or false (case insensitively).
parseDate(String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Parses the date found in a WarcHeader.Name.WARC_DATE header.
ParseException - Exception in it.unimi.di.law.warc.filters.parser
This exception is thrown when parse errors are encountered.
ParseException() - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
The following constructors are for use by you for whatever purpose you can think of.
ParseException(Token, int[][], String[]) - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
This constructor is used by the method "generateParseException" in the generated parser.
ParseException(String) - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
Constructor with message.
parseFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
parseFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A filter that will be applied to all fetched resources to decide whether to parse them.
parseId(String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
Parses the record ID found in a WarcHeader.Name.WARC_RECORD_ID header.
Parser<T> - Interface in it.unimi.di.law.bubing.parser
A generic parser for responses.
Parser.LinkReceiver - Interface in it.unimi.di.law.bubing.parser
A class that can receive URLs discovered during parsing.
Parser.TextProcessor<T> - Interface in it.unimi.di.law.bubing.parser
A class that can receive pieces of text discovered during parsing.
parseRobotsReader(Reader, String) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
Parses the argument as if it were the content of a robots.txt file, and returns a sorted array of prefixes of URLs that the agent should not follow.
parseRobotsResponse(URIResponse, String) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
Parses a robots.txt file contained in a FetchData and returns the corresponding filter as an array of sorted prefixes.
parsers - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread
The parsers used by this thread.
parsers - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
The parsers, instantiated.
parsersFromSpecs(String[]) - Static method in class it.unimi.di.law.bubing.RuntimeConfiguration
Given an array of parser specifications, it returns the corresponding list of parsers (only the correct specifications are put in the list).
parserSpec - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A Parser specification that will be parsed using an ObjectParser.
parseTime(String) - Static method in class it.unimi.di.law.bubing.StartupConfiguration
 
ParsingThread - Class in it.unimi.di.law.bubing.frontier
A thread parsing pages retrieved by a FetchingThread.
ParsingThread(Frontier, Store, int) - Constructor for class it.unimi.di.law.bubing.frontier.ParsingThread
Creates a thread.
ParsingThread.FrontierEnqueuer - Class in it.unimi.di.law.bubing.frontier
A small gadget used to insert links in the frontier.
parsingThreads - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The parsing threads.
parsingThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
parsingThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of parsing threads (usually, the number of available cores).
parsingThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Changes the number of parsing threads.
pathAndQuery(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns the concatenated raw path and raw query of a BUbiNG URL.
pathAndQueryAsByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
pathAndQueryAsByteArray(ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.BURL
Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
pathAndQueryAsByteArray(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns an ASCII byte-array representation of the raw path and raw query of a BUbiNG URL.
PathEndsWithOneOf - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose path ends (case-insensitively) with one of a given set of suffixes.
PathEndsWithOneOf(String[]) - Constructor for class it.unimi.di.law.warc.filters.PathEndsWithOneOf
Creates a filter that only accepts URLs whose path ends with one of a given set of suffixes.
pathQueriesInQueues - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The overall number of path+queries stored in VisitState queues.
PATHQUERIESINQUEUES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
pathQueryLimit() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Returns an estimate of the number of path+queries that this visit state should keep in memory.
pause() - Method in class it.unimi.di.law.bubing.Agent
 
paused - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
Whether the crawler is currently paused.
pointer() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Returns the current pointer.
pointer(long) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Sets the current pointer.
poll() - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
 
position(long) - Method in class it.unimi.di.law.warc.io.CompressedWarcCachingReader
 
position(long) - Method in class it.unimi.di.law.warc.io.CompressedWarcReader
 
position(long) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
 
position(long) - Method in class it.unimi.di.law.warc.io.UncompressedWarcReader
 
position(long) - Method in interface it.unimi.di.law.warc.io.WarcCachingReader
 
position(long) - Method in interface it.unimi.di.law.warc.io.WarcReader
 
prepareToAppend() - Method in class it.unimi.di.law.bubing.frontier.Frontier
 
prepareToAppend() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
prepareToAppend() - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
A new flow of keys is ready and will start being appended.
prevCharIsCR - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
prevCharIsLF - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
print(String) - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
 
println() - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
 
println(String) - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
 
process(Parser.LinkReceiver, URI, String) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
Pre-process a string that represents a raw link found in the page, trying to derelativize it.
process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
 
process(WarcRecord, long) - Method in interface it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner.Processor
 
process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
 
process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
 
PROTOCOL_VERSION - Static variable in interface it.unimi.di.law.warc.records.WarcRecord
The version of the supported format.
proxyHost - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
proxyHost - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The proxy host, if a proxy should be used; an empty value means that the proxy should not be set.
proxyPort - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
proxyPort - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The proxy port, meaningful only if StartupConfiguration.proxyHost is not empty.
put(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
put(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Sets the value associated with a given key.
put(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
put(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
Sets the value associated with a given key.
put(byte[], int, int, long, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
 
put(E, long) - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
Inserts an element with given timestamp, waiting for space to become available if the timestamp of the element minus the current timestamp of the queue exceeds the queue capacity.
putInEntryIfNotEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Puts this visit state in its entry, if it is not empty.
putOnWorkbenchIfNotEmpty(Workbench) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Puts this entry on the workbench, if not empty.
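
The Util.parseBoolean(String) entry above describes a strict parse that rejects anything other than true or false. The standard Boolean.parseBoolean(String), by contrast, silently maps every other input to false. The following is a minimal sketch of the strict behavior, not the BUbiNG implementation: the class name StrictBooleanSketch and the choice of IllegalArgumentException are assumptions, since the index does not say which exception is thrown.

```java
/** Hypothetical sketch of strict Boolean parsing; not the actual Util.parseBoolean. */
final class StrictBooleanSketch {
    private StrictBooleanSketch() {}

    /** Returns the parsed value, or throws if the argument is neither "true" nor "false". */
    static boolean parseBoolean(final String value) {
        // equalsIgnoreCase(null) is false, so a null argument also ends in the exception below.
        if ("true".equalsIgnoreCase(value)) return true;
        if ("false".equalsIgnoreCase(value)) return false;
        // Assumed exception type; the source does not specify it.
        throw new IllegalArgumentException("Not a Boolean value: " + value);
    }
}
```

With this sketch, a misspelled configuration value such as "yes" fails loudly instead of being read as false.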

Q

QueueData() - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
 
quickMessageThread - Variable in class it.unimi.di.law.bubing.Agent
 
QuickMessageThread - Class in it.unimi.di.law.bubing.frontier
A thread that takes care of pouring the content of Frontier.quickReceivedURLs into Frontier.receivedURLs.
QuickMessageThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.QuickMessageThread
Creates the thread.
quickReceivedURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A queue to quickly buffer URLs communicated by Frontier.receive(BubingJob).

R

RandomNamedGraphServer - Class in it.unimi.di.law.bubing.test
A NamedGraphServer exposing a random graph.
RandomNamedGraphServer(int, int, int) - Constructor for class it.unimi.di.law.bubing.test.RandomNamedGraphServer
Builds the server.
RandomNamedGraphServer(int, int, int, boolean) - Constructor for class it.unimi.di.law.bubing.test.RandomNamedGraphServer
 
ratio() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
 
rc - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The runtime configuration.
read() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Reads a byte at the current pointer.
read() - Method in class it.unimi.di.law.warc.io.CompressedWarcReader
 
read() - Method in class it.unimi.di.law.warc.io.UncompressedWarcReader
 
read() - Method in interface it.unimi.di.law.warc.io.WarcReader
 
read(boolean) - Method in class it.unimi.di.law.warc.io.AbstractWarcReader
 
read(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Reads a specified number of bytes at the current pointer.
readByteArray(ObjectInputStream) - Static method in class it.unimi.di.law.bubing.util.Util
Reads a byte array prefixed by its length encoded using vByte.
readChar() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Read a character.
ReadEntry() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry
 
readLong() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Reads a long at the current pointer.
readMetadata() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
 
readVByte(InputStream) - Static method in class it.unimi.di.law.bubing.util.Util
Decodes a natural number from an InputStream using vByte.
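As an illustration of what decoding a vByte-encoded natural number can look like, here is a minimal sketch. It assumes a common vByte layout (7 payload bits per byte, most significant group first, high bit set on every byte except the last); the layout actually used by Util.readVByte(InputStream) is not stated in this index and may differ. The class name VByteSketch and the decode(byte[]) helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the BUbiNG implementation.
final class VByteSketch {
    // Assumed layout: 7 payload bits per byte, most significant group first,
    // continuation flag (0x80) set on every byte except the last one.
    static int readVByte(InputStream in) throws IOException {
        int result = 0;
        for (;;) {
            int b = in.read();
            if (b == -1) throw new EOFException("Truncated vByte sequence");
            result = (result << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) return result;
        }
    }

    // Convenience wrapper for in-memory data (hypothetical helper).
    static int decode(byte[] bytes) {
        try {
            return readVByte(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 300 = 2 * 128 + 44, hence the two bytes 0x82 (continuation) and 0x2C.
        System.out.println(decode(new byte[] { (byte) 0x82, 0x2C })); // prints 300
    }
}
```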
READY_URLS_BUFFER_SIZE - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
The size of the buffer used for Frontier.readyURLs.
readyURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A queue to store URLs coming out of the Frontier.sieve.
READYURLSSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
receive(BubingJob) - Method in class it.unimi.di.law.bubing.frontier.Frontier
 
receivedURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A queue to buffer, in the long run, URLs communicated by Frontier.receive(BubingJob).
receivedURLsLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A global progress logger, counting the URLs received from other agents.
RECEIVEDURLSSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
refill - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A queue of visit states ready to be refilled; it is filled by fetching threads and emptied by the Distributor.
rehash(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Rehashes the state set to a new size.
rehash(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Rehashes the set to a new size.
rehash(int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
Rehashes the stripe.
ReInit(FilterParserTokenManager) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Reinitialise.
ReInit(SimpleCharStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Reinitialise parser.
ReInit(SimpleCharStream, int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Reinitialise parser.
ReInit(InputStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Reinitialise.
ReInit(InputStream) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(InputStream, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(InputStream, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(InputStream, String) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Reinitialise.
ReInit(InputStream, String) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(InputStream, String, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(InputStream, String, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(Reader) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Reinitialise.
ReInit(Reader) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(Reader, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
ReInit(Reader, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Reinitialise.
relativeStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the relative standard deviation of the values added so far.
release(VisitState) - Method in class it.unimi.di.law.bubing.frontier.Workbench
Releases a previously acquired visit state.
remaining() - Method in class it.unimi.di.law.warc.util.BoundSessionInputBuffer
Returns the number of unread bytes (in the buffered stream).
remove() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Removes the top element from the visit-state queue.
remove(byte[], int, int, long) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
 
remove(VisitState) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Removes a given visit state.
remove(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Removes all path+queries associated with the given visit state.
remove(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Removes a given workbench entry.
remove(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Removes all elements associated with a given key.
ReorderingBlockingQueue<E> - Class in it.unimi.di.law.warc.util
A blocking queue holding a fixed amount of timestamped items.
ReorderingBlockingQueue(int) - Constructor for class it.unimi.di.law.warc.util.ReorderingBlockingQueue
Creates a ReorderingBlockingQueue with the given fixed capacity.
REQUEST - it.unimi.di.law.warc.records.WarcRecord.Type
 
requestLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A global progress logger, measuring the number of completed requests.
requiredFrontSize - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The current estimation for the size of the front in IP addresses.
REQUIREDFRONTSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
resetFetchingThreadsWaitingStats() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Resets the statistics relative to the wait time of FetchingThreads.
resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.DnsJavaResolver
 
resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.FakeResolver
 
resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.JavaResolver
 
resolvedVisitStates - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
The number of resolved visit states.
resourceLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A global progress logger, measuring the number of non-duplicate resources actually stored.
response - Variable in class it.unimi.di.law.bubing.util.FetchData
The response from Apache Http Components returned during the last fetch.
response() - Method in class it.unimi.di.law.bubing.util.FetchData
 
response() - Method in interface it.unimi.di.law.warc.filters.URIResponse
Returns the response part.
response() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
RESPONSE - it.unimi.di.law.warc.records.WarcRecord.Type
 
responseBodyMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
responseBodyMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum size (in bytes) of a response body.
responseCacheDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
responseCacheDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A directory where the content overflowing the in-memory buffers of FetchData instances (of StartupConfiguration.fetchDataBufferByteSize bytes) will be stored using an InspectableFileCachedInputStream.
ResponseContentExtractor - Class in it.unimi.di.law.warc.processors
 
ResponseMatches - Class in it.unimi.di.law.warc.filters
A filter accepting only http responses whose content stream (in ISO-8859-1 encoding) matches a regular expression.
ResponseMatches(Pattern) - Constructor for class it.unimi.di.law.warc.filters.ResponseMatches
 
restore() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Restores data from the given directory.
result() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
 
result() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
 
result() - Method in interface it.unimi.di.law.bubing.parser.Parser
Returns the result of the processing.
result() - Method in interface it.unimi.di.law.bubing.parser.Parser.TextProcessor
Returns the result of the processing.
result() - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
results - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A lock-free list of FetchData to be parsed; it is filled by fetching threads and emptied by the parsing threads.
resume() - Method in class it.unimi.di.law.bubing.Agent
 
retries - Variable in class it.unimi.di.law.bubing.frontier.VisitState
If VisitState.lastExceptionClass is not null, the count of the retries for this type of exception.
robots - Variable in class it.unimi.di.law.bubing.util.FetchData
Whether we are fetching a robots file.
ROBOTS_PATH - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
A special path marking a robots.txt refresh request.
robotsExpiration - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
robotsExpiration - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The delay after which the robots.txt file is no longer considered valid.
robotsFilter - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The robots-forbidden prefixes we are caching, as returned from the URLRespectsRobots.parseRobotsReader(Reader, String) method.
robotsRequestConfig - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The default configuration for a robots.txt request.
robotsWarcParallelOutputStream - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The WARC file to which the downloaded robots.txt files are written (if so requested).
rootDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
rootDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A root directory from which the remaining ones will be stemmed, if they are relative.
run() - Method in class it.unimi.di.law.bubing.frontier.Distributor
 
run() - Method in class it.unimi.di.law.bubing.frontier.DNSThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.DoneThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.MessageThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.ParsingThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.QuickMessageThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
 
run() - Method in class it.unimi.di.law.bubing.frontier.TodoThread
 
run() - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
run(int) - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
runSequentially() - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
 
RuntimeConfiguration - Class in it.unimi.di.law.bubing
Global data shared by all threads.
RuntimeConfiguration(StartupConfiguration) - Constructor for class it.unimi.di.law.bubing.RuntimeConfiguration
 

S

SameHost - Class in it.unimi.di.law.warc.filters
A filter accepting only inter-host links.
SameHost() - Constructor for class it.unimi.di.law.warc.filters.SameHost
 
sampleRelativeStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the sample relative standard deviation of the values added so far.
sampleStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the sample standard deviation of the values added so far.
sampleVariance() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the sample variance of the values added so far.
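The statistics named in the ConcurrentSummaryStats entries follow standard definitions, which the sketch below spells out on a plain array. It is only an illustration of the formulas (population variance divides by n, sample variance by n - 1, relative standard deviation is the standard deviation divided by the mean); it makes no claim about how ConcurrentSummaryStats computes them or handles concurrency. The class name SummaryStatsSketch is hypothetical.

```java
// Hypothetical sketch of the textbook formulas; not the BUbiNG implementation.
final class SummaryStatsSketch {
    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    // Sum of squared deviations from the mean.
    private static double squaredDeviations(double[] v) {
        double m = mean(v), s = 0;
        for (double x : v) s += (x - m) * (x - m);
        return s;
    }

    // Population variance: divides by n.
    static double variance(double[] v) {
        return squaredDeviations(v) / v.length;
    }

    // Sample variance: divides by n - 1 (requires at least two values).
    static double sampleVariance(double[] v) {
        return squaredDeviations(v) / (v.length - 1);
    }

    static double standardDeviation(double[] v) {
        return Math.sqrt(variance(v));
    }

    // Standard deviation expressed as a fraction of the mean.
    static double relativeStandardDeviation(double[] v) {
        return standardDeviation(v) / mean(v);
    }

    public static void main(String[] args) {
        double[] v = { 2, 4, 4, 4, 5, 5, 7, 9 };
        System.out.println(variance(v));
        System.out.println(sampleVariance(v));
        System.out.println(relativeStandardDeviation(v));
    }
}
```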
scheduledLinks - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
 
scheduleFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
scheduleFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A filter that will be applied to all links obtained by parsing a page before scheduling them.
schedulePurge() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Disables permanently this visit state and schedules its purge by setting the count associated with its VisitState.schemeAuthority to Integer.MAX_VALUE, clearing the internal queue and setting VisitState.nextFetch to Long.MAX_VALUE.
schemeAndAuthority(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Extracts the scheme+authority of an absolute BUbiNG URL in its byte-array representation.
schemeAndAuthority(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns the concatenated URI.getScheme() and raw authority of a BUbiNG URL.
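A minimal sketch of concatenating a URI's scheme and raw authority with java.net.URI follows. The "://" separator is an assumption (this index only says the two parts are concatenated), and the class name SchemeAuthoritySketch is hypothetical.

```java
import java.net.URI;

// Hypothetical sketch; not the BUbiNG implementation.
final class SchemeAuthoritySketch {
    // Joins the scheme and the raw (still percent-encoded) authority.
    // The "://" separator is an assumption made for this example.
    static String schemeAndAuthority(URI uri) {
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com:8080/a/b?q=1");
        System.out.println(schemeAndAuthority(uri)); // prints http://example.com:8080
    }
}
```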
schemeAndAuthorityAsByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Extracts the scheme+authority of an absolute BUbiNG URL in its byte-array representation.
schemeAuthority - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The scheme and authority visited by this visit state.
schemeAuthority2Count - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A synchronized, highly concurrent map from scheme+authorities to number of stored URLs.
schemeAuthority2VisitState - Variable in class it.unimi.di.law.bubing.frontier.Distributor
An unsynchronized map from scheme+authorities to the corresponding VisitState.
schemeAuthorityDelay - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
schemeAuthorityDelay - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The minimum delay between two consecutive fetches from the same scheme+authority.
SchemeEquals - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose scheme equals a certain string (typically, http).
SchemeEquals(String) - Constructor for class it.unimi.di.law.warc.filters.SchemeEquals
Creates a filter that only accepts URIs with a given scheme.
seed - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
An iterator returning URIs that are then used as a seed; this iterator may return null (when invalid or relative URLs are specified).
seed - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A URL from which BUbiNG will start crawling.
setComment(String) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
Sets the comment of the entry.
setConnectionTimeout(int) - Method in class it.unimi.di.law.bubing.Agent
 
setDebugStream(PrintStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Set debug output.
setDnsThreads(int) - Method in class it.unimi.di.law.bubing.Agent
 
setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
 
setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setFetchFilter(String) - Method in class it.unimi.di.law.bubing.Agent
 
setFetchingThreads(int) - Method in class it.unimi.di.law.bubing.Agent
 
setFollowFilter(String) - Method in class it.unimi.di.law.bubing.Agent
 
setInput(InputStream) - Method in class it.unimi.di.law.warc.io.AbstractWarcReader
 
setIpDelay(long) - Method in class it.unimi.di.law.bubing.Agent
 
setKeepAliveTime(int) - Method in class it.unimi.di.law.bubing.Agent
 
SetLinkReceiver() - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
setLocale(Locale) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
Deprecated.
setLocale(Locale) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
Deprecated.
setMaxUrls(long) - Method in class it.unimi.di.law.bubing.Agent
 
setName(String) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
Sets the name of the entry.
setNewFlowRecevier(AbstractSieve.NewFlowReceiver<K>) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
Sets the receiver for the new flows generated by this sieve.
setParseFilter(String) - Method in class it.unimi.di.law.bubing.Agent
 
setParsingThreads(int) - Method in class it.unimi.di.law.bubing.Agent
 
setReasonPhrase(String) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setReasonPhrase(String) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setResponseBodyMaxByteSize(int) - Method in class it.unimi.di.law.bubing.Agent
 
setRobotsExpiration(long) - Method in class it.unimi.di.law.bubing.Agent
 
setScheduleFilter(String) - Method in class it.unimi.di.law.bubing.Agent
 
setSchemeAuthorityDelay(long) - Method in class it.unimi.di.law.bubing.Agent
 
setSocketTimeout(int) - Method in class it.unimi.di.law.bubing.Agent
 
setStatusCode(int) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setStatusCode(int) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setStatusLine(ProtocolVersion, int) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setStatusLine(ProtocolVersion, int) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setStatusLine(ProtocolVersion, int, String) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setStatusLine(ProtocolVersion, int, String) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setStatusLine(StatusLine) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
setStatusLine(StatusLine) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
setStoreFilter(String) - Method in class it.unimi.di.law.bubing.Agent
 
setTabSize(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
setUrlCacheMaxByteSize(long) - Method in class it.unimi.di.law.bubing.Agent
 
setWorkbenchEntry(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.VisitState
Sets the workbench entry and puts this visit state in its entry if it is nonempty.
setWorkbenchMaxByteSize(long) - Method in class it.unimi.di.law.bubing.Agent
 
shiftKeys(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Shifts left entries with the specified hash code, starting at the specified position, and empties the resulting free entry.
shiftKeys(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Shifts left entries with the specified hash code, starting at the specified position, and empties the resulting free entry.
shiftKeys(int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
Shifts left entries with the specified hash code, starting at the specified position, and empties the resulting free entry.
SHORT_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
sieve - Variable in class it.unimi.di.law.bubing.frontier.Frontier
An instance of a MercatorSieve.
sieveAuxFileIOBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
sieveAuxFileIOBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The I/O buffer used to write the auxiliary file (containing URLs) and to read it back during flushes.
sieveDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
sieveDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A directory for storing files related to the sieve.
SieveEntry(K, V) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
 
sieveSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
sieveSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of slots in the sieve.
sieveStoreIOBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
sieveStoreIOBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The size of the two buffers used to read the 64-bit hashes stored by the sieve during flushes.
SimpleCharStream - Class in it.unimi.di.law.warc.filters.parser
An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without unicode processing).
SimpleCharStream(InputStream) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(InputStream, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(InputStream, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(InputStream, String) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(InputStream, String, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(InputStream, String, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(Reader) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(Reader, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
SimpleCharStream(Reader, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Constructor.
size - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
Number of entries in the set.
size - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Number of entries in the set.
size - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The overall number of elements in the queues.
size - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
Number of entries in the stripe.
size() - Method in class it.unimi.di.law.bubing.frontier.VisitState
Computes the size (i.e., number of URLs) in this visit state.
size() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
The number of visit states.
size() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Returns the number of visit states currently in the visit-state queue.
size() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Returns the size (number of entries) in the workbench.
size() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
 
size() - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
 
size() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
 
size() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Deprecated.
size() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Deprecated.
size() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Deprecated.
size() - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
Returns the (approximate) size of this queue.
size() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Deprecated.
size() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
Returns the number of elements in this queue.
size64() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
 
size64() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Returns the overall number of elements in the queues.
size64() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the number of values added so far.
size64() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
 
skip(FastBufferedInputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
 
skip(FastBufferedInputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
Skips an object, usually without deserializing it.
skip(FastBufferedInputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
 
SKIP_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
skipEntry() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
 
snap() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Snaps fields to files in the given directory.
socketTimeout - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
socketTimeout - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The socket timeout.
source - Variable in class it.unimi.di.law.bubing.util.Link
 
spamDetectionPeriodicity - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
spamDetectionPeriodicity - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of pages per scheme+authority after which spam detection is performed again periodically.
spamDetectionThreshold - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
spamDetectionThreshold - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The number of pages per scheme+authority after which spam detection is performed.
spamDetector - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
SpamDetector<T> - Interface in it.unimi.di.law.bubing.spam
A detector for spam sites.
spamDetectorUri - Variable in class it.unimi.di.law.bubing.StartupConfiguration
An optional SpamDetector; this URI should point to a serialized instance.
spammicity - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The spammicity score, if computed; -1, otherwise.
SpamTextProcessor - Class in it.unimi.di.law.bubing.parser
An implementation of a Parser.TextProcessor that accumulates the counts of terms from a given set specified via a StringMap.
SpamTextProcessor(Object2LongFunction<MutableString>) - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
SpamTextProcessor(String) - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor
 
SpamTextProcessor.TermCount - Class in it.unimi.di.law.bubing.parser
 
specialToken - Variable in class it.unimi.di.law.warc.filters.parser.Token
This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token.
speedDist - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The logarithmically binned statistics of download speed in bits/s.
standardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the standard deviation of the values added so far.
standardFilters() - Static method in class it.unimi.di.law.warc.filters.Filters
Returns a list of the standard filter classes.
start() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
Parser.
start(long) - Method in class it.unimi.di.law.bubing.frontier.StatsThread
Starts all progress loggers.
startOfHost(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Finds the start of the host part in a URL.
startOfpathAndQuery(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns the starting position (i.e., the slash) of the path+query in the given BUbiNG URL.
startPaused - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
startPaused - Variable in class it.unimi.di.law.bubing.StartupConfiguration
Whether we should start in paused state.
startTag(StartTag) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
 
startTags - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
Cached byte representations of all opening tags.
startTime - Variable in class it.unimi.di.law.bubing.util.FetchData
System.currentTimeMillis() when the GET request was issued.
StartupConfiguration - Class in it.unimi.di.law.bubing
A class whose public fields represent the configuration of BUbiNG at startup.
StartupConfiguration(File) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
Creates a configuration starting from a given file.
StartupConfiguration(File, Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
Creates a configuration starting from a given file and possibly adding and/or overriding some properties with new values.
StartupConfiguration(String) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
Creates a configuration starting from a given file.
StartupConfiguration(String, Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
Creates a configuration starting from a given file and possibly adding and/or overriding some properties with new values.
StartupConfiguration(Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
Populates the object fields starting from the given configuration.
StartupConfiguration.DnsResolverSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for the DnsResolver class specification.
StartupConfiguration.FilterSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for filter specifications.
StartupConfiguration.ManyValuesSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for specifications that may have multiple values.
StartupConfiguration.OptionalSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for optional specifications with a default parameter.
StartupConfiguration.StoreSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for the Store class specification.
StartupConfiguration.TimeSpecification - Annotation Type in it.unimi.di.law.bubing
A marker for time specifications; such specifications are by default in milliseconds, but it is possible to use suffixes ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days).
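To make the suffix rules concrete, here is a rough sketch of a parser that converts such a time specification to milliseconds. It assumes an integer value followed by an optional suffix with no further syntax; the class and method names (TimeSpecSketch, parseMillis) are hypothetical and this is not the parsing code used by StartupConfiguration.

```java
// Hypothetical sketch; not the BUbiNG implementation.
final class TimeSpecSketch {
    // Converts "500", "250ms", "2s", "3m", "1h" or "1d" to milliseconds.
    static long parseMillis(String spec) {
        String s = spec.trim();
        long multiplier = 1; // no suffix: milliseconds by default
        int suffixLength = 0;
        // "ms" must be tested before "s" and "m", since it ends with "s".
        if (s.endsWith("ms")) {
            suffixLength = 2;
        } else if (s.endsWith("s")) {
            multiplier = 1000L;
            suffixLength = 1;
        } else if (s.endsWith("m")) {
            multiplier = 60L * 1000L;
            suffixLength = 1;
        } else if (s.endsWith("h")) {
            multiplier = 60L * 60L * 1000L;
            suffixLength = 1;
        } else if (s.endsWith("d")) {
            multiplier = 24L * 60L * 60L * 1000L;
            suffixLength = 1;
        }
        String number = s.substring(0, s.length() - suffixLength).trim();
        return Long.parseLong(number) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("500")); // prints 500
        System.out.println(parseMillis("2s"));  // prints 2000
        System.out.println(parseMillis("1h"));  // prints 3600000
    }
}
```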
STATIC_LEXER_ERROR - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
An attempt was made to create a second instance of a static token manager.
staticFlag - Static variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
Whether parser is static.
statsThread - Variable in class it.unimi.di.law.bubing.frontier.Distributor
The thread printing statistics.
StatsThread - Class in it.unimi.di.law.bubing.frontier
A class isolating a number of ProgressLogger instances keeping track of a number of quantities of interest related to the Distributor, e.g., requests, transferred bytes, etc.
StatsThread(Frontier, Distributor) - Constructor for class it.unimi.di.law.bubing.frontier.StatsThread
Creates the thread.
StatusCategory - Class in it.unimi.di.law.warc.filters
A filter accepting only fetched response whose status category (status/100) has a certain value.
StatusCategory(int) - Constructor for class it.unimi.di.law.warc.filters.StatusCategory
Creates a filter that only accepts responses of the given category.
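The status category is the HTTP status code divided by 100 using integer division, so 200 and 204 both fall in category 2. The check can be sketched as a plain predicate; the class and method names below are hypothetical, and the actual filter operates on fetched responses rather than on bare integers.

```java
// Hypothetical sketch; not the BUbiNG implementation.
final class StatusCategorySketch {
    // True when the status code falls in the given category (status / 100).
    static boolean inCategory(int statusCode, int category) {
        return statusCode / 100 == category;
    }

    public static void main(String[] args) {
        System.out.println(inCategory(204, 2)); // prints true
        System.out.println(inCategory(404, 2)); // prints false
    }
}
```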
stop - Variable in class it.unimi.di.law.bubing.frontier.DNSThread
Whether we should stop (used also to reduce the number of threads).
stop - Variable in class it.unimi.di.law.bubing.frontier.DoneThread
When set to true, this thread will complete its execution.
stop - Variable in class it.unimi.di.law.bubing.frontier.FetchingThread
Whether we should stop (used also to reduce the number of threads).
stop - Variable in class it.unimi.di.law.bubing.frontier.MessageThread
When set to true, this thread will complete its execution.
stop - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread
Whether we should stop (used also to reduce the number of threads).
stop - Variable in class it.unimi.di.law.bubing.frontier.QuickMessageThread
When set to true, this thread will complete its execution.
stop() - Method in class it.unimi.di.law.bubing.Agent
 
stopping - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
Whether the crawler is currently being stopped.
store - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The store.
store(URI, HttpResponse, boolean, byte[], String) - Method in interface it.unimi.di.law.bubing.store.Store
 
store(URI, HttpResponse, boolean, byte[], String) - Method in class it.unimi.di.law.bubing.store.UnbufferedFileStore
 
store(URI, HttpResponse, boolean, byte[], String) - Method in class it.unimi.di.law.bubing.store.WarcStore
 
Store - Interface in it.unimi.di.law.bubing.store
An interface for components that are able to store pages.
STORE_NAME - Static variable in class it.unimi.di.law.bubing.store.UnbufferedFileStore
 
STORE_NAME - Static variable in class it.unimi.di.law.bubing.store.WarcStore
 
storeClass - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
storeClass - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The class used to Store the resources.
storeDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
storeDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A directory where the retrieved content will be written.
storeFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
storeFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
A filter that will be applied to all fetched resources to decide whether to store them.
StringHttpMessages - Class in it.unimi.di.law.warc.util
Mock implementations of some AbstractHttpMessage.
StringHttpMessages() - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages
 
StringHttpMessages.HttpRequest - Class in it.unimi.di.law.warc.util
A mock implementation of HttpRequest.
StringHttpMessages.HttpResponse - Class in it.unimi.di.law.warc.util
A mock implementation of HttpResponse using strings (based on ByteArrayEntity).
Stripe() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
Creates a new stripe.
Stripe(long, float) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache.Stripe
Creates a new stripe.
SUB_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
subDir(String, String) - Static method in class it.unimi.di.law.bubing.StartupConfiguration
Returns a File object representing a child relative to a parent, or just the child, if absolute.
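The parent/child resolution described for subDir can be sketched with java.io.File alone. This is an illustration under the assumption that "absolute" means File.isAbsolute(); the class name SubDirSketch is hypothetical and the actual StartupConfiguration code may differ.

```java
import java.io.File;

final class SubDirSketch {
    /** Resolves child against parent, unless child is already an absolute path. */
    static File subDir(final String parent, final String child) {
        final File c = new File(child);
        return c.isAbsolute() ? c : new File(parent, child);
    }

    public static void main(final String[] args) {
        // A relative child is placed under the parent directory.
        System.out.println(subDir("crawl", "store"));
        // An absolute child is returned unchanged.
        final String absolute = new File("store").getAbsolutePath();
        System.out.println(subDir("crawl", absolute));
    }
}
```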
subSequence(int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
 
successors(CharSequence) - Method in class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
 
successors(CharSequence) - Method in interface it.unimi.di.law.bubing.test.NamedGraphServer
If src corresponds to the name of a node in the graph, this method returns an array with the name of its successors (in some order); otherwise, it returns null.
successors(CharSequence) - Method in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
 
sum() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the sum of the values added so far.
suspend() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Suspends this queue.
suspend() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Suspends this queue.
SwitchTo(int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
Switch to specified lex state.

T

tabSize - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
tail - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
The pointer to the tail of the list (the most recently enqueued element).
take() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
Returns the element with the next timestamp, waiting until it is available.
target - Variable in class it.unimi.di.law.bubing.util.Link
 
termCount - Variable in class it.unimi.di.law.bubing.frontier.VisitState
A map from term indices to counts for the pages of this host.
TermCount() - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor.TermCount
 
termCountUpdates - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The number of calls performed to VisitState.updateTermCount(Short2ShortMap).
textProcessor - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
A text processor, or null.
THRESHOLD - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
The number of zeroes that must appear to cause the page to be considered probably binary.
toByteArray(BubingJob) - Method in class it.unimi.di.law.bubing.Agent
 
toByteArray(String) - Static method in class it.unimi.di.law.bubing.util.Util
Returns a byte-array representation of an ASCII string.
toByteArray(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
Returns an ASCII byte-array representation of a BUbiNG URL.
toByteArrayList(String, ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.Util
Writes an ASCII string in a ByteArrayList.
toByteArrayList(URI, ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.BURL
Writes an ASCII representation of a BUbiNG URL in a ByteArrayList.
todo - Variable in class it.unimi.di.law.bubing.frontier.Frontier
A lock-free list of visit states ready to be visited; it is filled by the TodoThread and emptied by the fetching threads.
TodoThread - Class in it.unimi.di.law.bubing.frontier
A thread that continuously acquires a VisitState from the Workbench and adds it to the Frontier.todo queue.
TodoThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.TodoThread
Instantiates the thread.
toHexString(byte[]) - Static method in class it.unimi.di.law.warc.util.Util
Returns a hexadecimal string representation of a digest.
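A digest is typically rendered as two hexadecimal digits per byte. The sketch below shows one way to do that; the class name HexSketch is hypothetical, and the choice of lowercase digits is an assumption rather than documented behaviour of it.unimi.di.law.warc.util.Util.

```java
final class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Renders each byte as two hexadecimal digits (lowercase is an assumption). */
    static String toHexString(final byte[] digest) {
        final char[] out = new char[digest.length * 2];
        for (int i = 0; i < digest.length; i++) {
            final int b = digest[i] & 0xFF; // treat the byte as unsigned
            out[2 * i] = HEX[b >>> 4];
            out[2 * i + 1] = HEX[b & 0x0F];
        }
        return new String(out);
    }

    public static void main(final String[] args) {
        System.out.println(toHexString(new byte[] { 0, 15, (byte) 0xAB, (byte) 0xFF })); // 000fabff
    }
}
```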
toInputStream() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
token - Variable in class it.unimi.di.law.warc.filters.parser.FilterParser
Current token.
Token - Class in it.unimi.di.law.warc.filters.parser
Describes the input token stream.
Token() - Constructor for class it.unimi.di.law.warc.filters.parser.Token
No-argument constructor
Token(int) - Constructor for class it.unimi.di.law.warc.filters.parser.Token
Constructs a new token of the specified kind.
Token(int, String) - Constructor for class it.unimi.di.law.warc.filters.parser.Token
Constructs a new token of the specified kind and image.
token_source - Variable in class it.unimi.di.law.warc.filters.parser.FilterParser
Generated Token Manager.
tokenImage - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
Literal token values.
tokenImage - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred.
TokenMgrError - Error in it.unimi.di.law.warc.filters.parser
Token Manager Error.
TokenMgrError() - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
No arg constructor.
TokenMgrError(boolean, int, int, int, String, int, int) - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
Full Constructor.
TokenMgrError(String, int) - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
Constructor with message and reason.
TooSlowException - Exception in it.unimi.di.law.bubing.util
A marker IOException for sites that return data too slowly.
TooSlowException() - Constructor for exception it.unimi.di.law.bubing.util.TooSlowException
 
TooSlowException(String) - Constructor for exception it.unimi.di.law.bubing.util.TooSlowException
 
toOutputStream(String, OutputStream) - Static method in class it.unimi.di.law.bubing.util.Util
Writes a string to an output stream, discarding higher order bits.
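Discarding higher-order bits means that only the low 8 bits of each 16-bit char reach the stream, which is lossless for ASCII text. The sketch below illustrates that idea; the class name LowByteSketch and the lowBytes helper are hypothetical, and this is not the actual it.unimi.di.law.bubing.util.Util code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class LowByteSketch {
    /** Writes one byte per char, keeping only the low 8 bits of each char. */
    static void toOutputStream(final String s, final OutputStream os) throws IOException {
        for (int i = 0; i < s.length(); i++) os.write((byte) s.charAt(i));
    }

    /** Hypothetical helper: collects the written bytes in an array. */
    static byte[] lowBytes(final String s) {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            toOutputStream(s, bos);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(final String[] args) {
        // 'a' is 0x0061 and survives; U+0100 loses its high byte and becomes 0x00.
        final byte[] b = lowBytes("a\u0100");
        System.out.println(b[0] + " " + b[1]);
    }
}
```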
toSortedPrefixFreeCharArrays(Set<String>) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
 
toStream(ByteArrayList, OutputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
 
toStream(CharSequence, OutputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
 
toStream(V, OutputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
Serializes an object starting from a given offset of a byte array.
toString() - Method in class it.unimi.di.law.bubing.frontier.VisitState
 
toString() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
 
toString() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
 
toString() - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
 
toString() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
 
toString() - Method in class it.unimi.di.law.bubing.StartupConfiguration
 
toString() - Method in class it.unimi.di.law.bubing.util.BubingJob
A string representation of this job
toString() - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
 
toString() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
 
toString() - Method in class it.unimi.di.law.bubing.util.FetchData
 
toString() - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
A string representation of the state of this object, that is just the prefix allowed.
toString() - Method in class it.unimi.di.law.warc.filters.DigestEquals
A string representation of the state of this object, that is just the digest allowed.
toString() - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
A string representation of the state of this object, that is just the threshold used.
toString() - Method in class it.unimi.di.law.warc.filters.HostEndsWith
A string representation of the state of this object, that is just the suffix allowed.
toString() - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
A string representation of the state of this object, that is just the host suffixes allowed.
toString() - Method in class it.unimi.di.law.warc.filters.HostEquals
A string representation of the state of this object, that is just the host allowed.
toString() - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
A string representation of the state of this object.
toString() - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
A string representation of the state of this filter.
toString() - Method in class it.unimi.di.law.warc.filters.parser.Token
Returns the image.
toString() - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
A string representation of the state of this object, that is just the suffixes allowed.
toString() - Method in class it.unimi.di.law.warc.filters.ResponseMatches
A string representation of the state of this filter.
toString() - Method in class it.unimi.di.law.warc.filters.SameHost
Returns SameHost().
toString() - Method in class it.unimi.di.law.warc.filters.SchemeEquals
A string representation of this filter.
toString() - Method in class it.unimi.di.law.warc.filters.StatusCategory
A string representation of this filter.
toString() - Method in class it.unimi.di.law.warc.filters.URLEquals
Get a string representation of this filter
toString() - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
Get a string representation of this filter
toString() - Method in class it.unimi.di.law.warc.filters.URLShorterThan
Get a string representation of this filter
toString() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
 
toString() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
toString() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
toString() - Method in class it.unimi.di.law.warc.records.InfoWarcRecord
 
toString() - Method in enum it.unimi.di.law.warc.records.WarcHeader.Name
 
toString() - Method in enum it.unimi.di.law.warc.records.WarcRecord.Type
 
toString() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
 
toString(byte[]) - Static method in class it.unimi.di.law.bubing.util.Util
Returns a string representation of an ASCII byte array.
toString(byte[], int, int) - Static method in class it.unimi.di.law.bubing.util.Util
Returns a string representation of an ASCII byte-array fragment.
toString(char[][]) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
Gracefully prints a robot filter using at most 30 prefixes.
toString(int[]) - Static method in class it.unimi.di.law.bubing.frontier.StatsThread
Returns an integer array as a string, but does not print trailing zeroes.
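Omitting trailing zeroes keeps sparse statistics arrays short when printed. A minimal sketch follows; the class name TrailingZeroesSketch is hypothetical, and the bracketed, comma-separated output format is an assumption, since the index does not state the format StatsThread uses.

```java
import java.util.Arrays;

final class TrailingZeroesSketch {
    /** Formats the array up to and including its last nonzero element. */
    static String toString(final int[] a) {
        int end = a.length;
        while (end > 0 && a[end - 1] == 0) end--; // skip trailing zeroes
        return Arrays.toString(Arrays.copyOf(a, end));
    }

    public static void main(final String[] args) {
        System.out.println(toString(new int[] { 3, 0, 7, 0, 0 })); // [3, 0, 7]
    }
}
```

Zeroes in the middle of the array are preserved; only the zero suffix is dropped.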
toString(ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.Util
Returns a string representation of an ASCII byte array.
toString(Object...) - Method in class it.unimi.di.law.warc.filters.AbstractFilter
A helper method that generates a string version of this filter (mainly useful for atomic, i.e., class-based, filters).
toString(AtomicLongArray) - Static method in class it.unimi.di.law.bubing.frontier.StatsThread
Returns an AtomicLongArray as a string, but does not print trailing zeroes.
ToStringWriter - Class in it.unimi.di.law.warc.processors
 
trackLineColumn - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
TRAILER_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
transferredBytes - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The overall number of transferred bytes.
TRANSFERREDBYTES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
transferredBytesLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
A global progress logger, measuring the number of transferred bytes.
trim() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
Trims this queue.
trim() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
Trims this queue.
TRUE - Static variable in class it.unimi.di.law.warc.filters.Filters
The constantly true filter.
TRUE - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
truncated - Variable in class it.unimi.di.law.bubing.util.FetchData
True if the last fetch was truncated because of an excessively long response body.
type() - Method in annotation type it.unimi.di.law.bubing.StartupConfiguration.FilterSpecification
 

U

UnbufferedFileStore - Class in it.unimi.di.law.bubing.store
An unbuffered, directly-to-disk store, mainly for debugging purposes.
UnbufferedFileStore(RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.store.UnbufferedFileStore
 
uncompressedSkipLength - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
The length of the entry once uncompressed.
UncompressedWarcReader - Class in it.unimi.di.law.warc.io
 
UncompressedWarcReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.UncompressedWarcReader
 
UncompressedWarcWriter - Class in it.unimi.di.law.warc.io
 
UncompressedWarcWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.UncompressedWarcWriter
 
unknownHosts - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The queue of unknown hosts.
unlock() - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
 
unresolved - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
The number of path+queries living in an unresolved visit state.
update(K, V, V) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DefaultUpdateStrategy
 
update(K, V, V) - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.UpdateStrategy
Computes the new value to be put in the store when a duplicate key is found.
updateFetchingThreadsWaitingStats(long) - Method in class it.unimi.di.law.bubing.frontier.Frontier
Updates the statistics relative to the wait time of FetchingThreads.
UpdateLineColumn(char) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
 
updateRequestedFrontSize() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Updates, if necessary, the Frontier.requiredFrontSize.
updateStrategy - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
updateTermCount(Short2ShortMap) - Method in class it.unimi.di.law.bubing.frontier.VisitState
 
uri() - Method in class it.unimi.di.law.bubing.util.FetchData
 
uri() - Method in interface it.unimi.di.law.warc.filters.URIResponse
Returns the URI part.
uri() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
URIResponse - Interface in it.unimi.di.law.warc.filters
An interface implemented by all classes able to expose an HttpResponse and a URI.
url - Variable in class it.unimi.di.law.bubing.util.BubingJob
The BUbiNG URL that should be visited.
url - Variable in class it.unimi.di.law.bubing.util.FetchData
The BUbiNG URL associated with this request.
urlCache - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The URL cache.
urlCacheMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
urlCacheMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum size of the URL cache in bytes.
URLDigestFinalPositionWriter - Class in it.unimi.di.law.warc.processors
 
URLDigestFinalPositionWriter(String, String) - Constructor for class it.unimi.di.law.warc.processors.URLDigestFinalPositionWriter
 
URLDigestStatusLengthWriter - Class in it.unimi.di.law.warc.processors
 
URLDigestStatusLengthWriter() - Constructor for class it.unimi.di.law.warc.processors.URLDigestStatusLengthWriter
 
URLDigestWriter - Class in it.unimi.di.law.warc.processors
 
URLEQUAL_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
The pattern prefixing the URL in a META HTTP-EQUIV element of refresh type.
URLEquals - Class in it.unimi.di.law.warc.filters
A filter accepting only a given URI.
URLEquals(String) - Constructor for class it.unimi.di.law.warc.filters.URLEquals
Creates a filter that only accepts URIs equal to a given URI.
URLMatchesRegex - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs that match a certain regular expression.
URLMatchesRegex(String) - Constructor for class it.unimi.di.law.warc.filters.URLMatchesRegex
Creates a filter that only accepts URLs matching a given regular expression.
URLPositionWriter - Class in it.unimi.di.law.warc.processors
 
URLPositionWriter(String, String) - Constructor for class it.unimi.di.law.warc.processors.URLPositionWriter
 
URLRespectsRobots - Class in it.unimi.di.law.bubing.util
A class providing static methods to parse robots.txt into arrays of char arrays and handle robot filtering.
urls - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
The set of URLs gathered so far.
URLShorterThan - Class in it.unimi.di.law.warc.filters
A filter accepting only URIs whose overall length is below a given threshold.
URLShorterThan(int) - Constructor for class it.unimi.di.law.warc.filters.URLShorterThan
Creates a filter that only accepts URLs shorter than the given threshold.
URLWriter - Class in it.unimi.di.law.warc.processors
 
usage - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
The number of bytes used by the list.
USE_BURL_PROPERTY - Static variable in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
used - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
The overall number of bytes used by elements in the queues.
userAgent - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
userAgent - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The User Agent header used for HTTP requests.
userAgentFrom - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
userAgentFrom - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The From header used for HTTP requests.
Util - Class in it.unimi.di.law.bubing.util
Generic static utility method container.
Util - Class in it.unimi.di.law.warc.util
A class containing some utility functions.
Util() - Constructor for class it.unimi.di.law.bubing.util.Util
 
Util() - Constructor for class it.unimi.di.law.warc.util.Util
 

V

value - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
 
value - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
The array of values.
value - Variable in enum it.unimi.di.law.warc.records.WarcHeader.Name
 
value() - Method in annotation type it.unimi.di.law.bubing.StartupConfiguration.OptionalSpecification
 
valueOf() - Static method in class it.unimi.di.law.warc.filters.IsProbablyBinary
Get a new IsProbablyBinary that will accept only HTTP responses whose content stream appears to be binary.
valueOf() - Static method in class it.unimi.di.law.warc.filters.SameHost
Get a SameHost filter.
valueOf(String) - Static method in enum it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
Get a new ContentTypeStartsWith that will accept only fetched responses whose content type starts with a given string
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.DigestEquals
Get a new DigestEquals that will accept only WarcRecords whose digest is a given string
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
Get a new DuplicateSegmentsLessThan that will accept only URIs whose path does not contain too many duplicate segments.
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEndsWith
Get a new HostEndsWith that will accept only URIs whose host ends with a given suffix
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
Get a new HostEndsWithOneOf that will accept only URIs whose host part suffix is one of the given suffixes
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEquals
Get a new HostEquals that will accept only URIs whose host part is equal to spec
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.IsHttpResponse
Get a new IsHttpResponse that will accept only WarcRecords that are http/https responses.
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.IsProbablyBinary
Deprecated.
Please use IsProbablyBinary.valueOf() instead.
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
Get a new PathEndsWithOneOf that will accept only URIs whose suffix is one of the allowed suffixes
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.ResponseMatches
Get a new content matcher that will accept only responses whose content stream matches the regular expression.
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.SchemeEquals
Get a new SchemeEquals accepting only URIs whose scheme equals the given string
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.StatusCategory
Get a new StatusCategory accepting only fetched responses whose status category (status/100) has a given value
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLEquals
Get a new URLEquals accepting only a given URI
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLMatchesRegex
Get a new URLMatchesRegex accepting only URIs that match a certain regular expression
valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLShorterThan
Get a new URLShorterThan
valueOf(String) - Static method in enum it.unimi.di.law.warc.records.WarcHeader.Name
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
Returns the enum constant of this type with the specified name.
valueOf(Header) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
Determines the WARC record type given the WARC-Type header.
values() - Static method in enum it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum it.unimi.di.law.warc.records.WarcHeader.Name
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
Returns an array containing the constants of this enum type, in the order they are declared.
valueSerDeser - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
 
variance() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
Returns the variance of the values added so far.
vByteLength(int) - Static method in class it.unimi.di.law.bubing.util.Util
Returns the length of the vByte encoding of a natural number.
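A vByte (variable-byte) code stores a natural number in groups of 7 bits, one group per byte, so its length is the number of 7-bit groups needed, with a minimum of one. The sketch below computes that length; the class name VByteLengthSketch is hypothetical and the loop is an illustration, not the actual Util implementation.

```java
final class VByteLengthSketch {
    /** Returns the number of bytes needed to store value at 7 payload bits per byte. */
    static int vByteLength(final int value) {
        if (value < 0) throw new IllegalArgumentException("Not a natural number: " + value);
        int length = 1; // zero still needs one byte
        for (int v = value >>> 7; v != 0; v >>>= 7) length++;
        return length;
    }

    public static void main(final String[] args) {
        System.out.println(vByteLength(127));   // 1
        System.out.println(vByteLength(128));   // 2
        System.out.println(vByteLength(16384)); // 3
    }
}
```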
virtualizer - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The workbench virtualizer used by this frontier.
virtualizerMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
virtualizerMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum size of the virtualizer in bytes; this field is ignored if the virtualizer does not need to be sized.
VIRTUALQUEUESBIRTHTIME - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
VIRTUALQUEUESIZES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
visitState - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
The array of keys.
visitState - Variable in class it.unimi.di.law.bubing.util.FetchData
The visit state associated with this request.
VisitState - Class in it.unimi.di.law.bubing.frontier
A class maintaining the current state of the visit of a specific scheme+authority.
VisitState(Frontier, byte[]) - Constructor for class it.unimi.di.law.bubing.frontier.VisitState
Creates a visit state.
visitStates() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
Returns the array of visit states; the order is arbitrary.
visitStates() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Returns the visit states currently in the queue.
VisitStateSet - Class in it.unimi.di.law.bubing.frontier
A data structure representing the set of visit states created so far.
VisitStateSet() - Constructor for class it.unimi.di.law.bubing.frontier.VisitStateSet
Creates an empty visit state set.
VISITSTATESETSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
VOID - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
A NOP-serializer-deserializer for Void (only for values).

W

WARC_BLOCK_DIGEST - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_CONCURRENT_TO - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_DATE - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_FILENAME - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_IDENTIFIED_PAYLOAD_TYPE - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_IP_ADDRESS - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_PAYLOAD_DIGEST - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_PROFILE - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_RECORD_ID - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_REFERS_TO - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_SEGMENT_NUMBER - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_SEGMENT_ORIGIN_ID - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_SEGMENT_TOTAL_LENGTH - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_TARGET_URI - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_TRUNCATED - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_TYPE - it.unimi.di.law.warc.records.WarcHeader.Name
 
WARC_WARCINFO_ID - it.unimi.di.law.warc.records.WarcHeader.Name
 
WarcCachingReader - Interface in it.unimi.di.law.warc.io
 
WarcCompressor - Class in it.unimi.di.law.warc.tool
A tool to compress a WARC file adding the extra GZIP header fields required by the GZIPArchive format used by compressed WARC files.
WarcCompressor() - Constructor for class it.unimi.di.law.warc.tool.WarcCompressor
 
WarcFormatException - Exception in it.unimi.di.law.warc.io
 
WarcFormatException(String) - Constructor for exception it.unimi.di.law.warc.io.WarcFormatException
 
WarcFormatException(String, Throwable) - Constructor for exception it.unimi.di.law.warc.io.WarcFormatException
 
warcHeader(WarcRecord.Type) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
Creates the WARC-Type header of the given record type.
WarcHeader - Class in it.unimi.di.law.warc.records
A class used to represent WARC headers, with a set of static methods to handle them.
WarcHeader(WarcHeader.Name, String) - Constructor for class it.unimi.di.law.warc.records.WarcHeader
Creates a WARC header.
WarcHeader.Name - Enum in it.unimi.di.law.warc.records
An enumeration of WARC headers.
warcHeaders - Variable in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
WARCINFO - it.unimi.di.law.warc.records.WarcRecord.Type
 
WarcReader - Interface in it.unimi.di.law.warc.io
 
WarcRecord - Interface in it.unimi.di.law.warc.records
An interface describing a WARC record.
WarcRecord.Type - Enum in it.unimi.di.law.warc.records
An enumeration of implemented record types.
WarcStore - Class in it.unimi.di.law.bubing.store
A Store implementation using the it.unimi.di.law.warc package.
WarcStore(RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.store.WarcStore
 
WarcTargetUriExtractor - Class in it.unimi.di.law.warc.processors
 
WarcWriter - Interface in it.unimi.di.law.warc.io
 
weight - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
weight - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The weight of this agent; agents are assigned a part of the crawl that is proportional to their weight.
weightOfpathQueriesInQueues - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The overall memory (in bytes) used by path+queries stored in VisitState queues.
WEIGHTOFPATHQUERIESINQUEUES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
WORD - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
RegularExpression Id.
workbench - Variable in class it.unimi.di.law.bubing.frontier.Frontier
The workbench.
Workbench - Class in it.unimi.di.law.bubing.frontier
The workbench is a DelayQueue of WorkbenchEntry instances, each associated with an IP address.
Workbench() - Constructor for class it.unimi.di.law.bubing.frontier.Workbench
Creates the workbench.
workbenchEntries() - Method in class it.unimi.di.law.bubing.frontier.Workbench
Returns a vector containing the known workbench entries and nulls.
workbenchEntries() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Returns the array of workbench entries; the order is arbitrary.
workbenchEntry - Variable in class it.unimi.di.law.bubing.frontier.VisitState
The workbench entry this visit state belongs to.
workbenchEntry - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
The array of keys.
WorkbenchEntry - Class in it.unimi.di.law.bubing.frontier
An element of the Workbench.
WorkbenchEntry(byte[], AtomicLong) - Constructor for class it.unimi.di.law.bubing.frontier.WorkbenchEntry
Creates a workbench entry for a given IP address.
WorkbenchEntrySet - Class in it.unimi.di.law.bubing.frontier
A data structure representing the set of workbench entries created so far.
WorkbenchEntrySet() - Constructor for class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
Creates a set of workbench entries.
WORKBENCHENTRYSETSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
 
workbenchIsFull() - Method in class it.unimi.di.law.bubing.frontier.Frontier
Returns whether the workbench is full.
workbenchMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
 
workbenchMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
The maximum size of the workbench in bytes.
workbenchSizeInPathQueries - Variable in class it.unimi.di.law.bubing.frontier.Frontier
An estimation of the number of path+query objects that the workbench can store.
WorkbenchVirtualizer - Class in it.unimi.di.law.bubing.frontier
A workbench virtualizer based on a Berkeley DB database.
WorkbenchVirtualizer(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
Creates the virtualizer.
wrap(byte[]) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
Wraps a byte array into this byte-array character sequence.
wrap(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
Wraps a byte-array fragment into this byte-array character sequence.
write(byte) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Writes a byte at the current pointer.
write(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Writes a specified number of bytes at the current pointer.
write(byte[], long, PrintStream) - Method in class it.unimi.di.law.warc.processors.ByteWriter
 
write(WarcRecord) - Method in class it.unimi.di.law.warc.io.CompressedWarcWriter
 
write(WarcRecord) - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
 
write(WarcRecord) - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter.WriterPair
 
write(WarcRecord) - Method in class it.unimi.di.law.warc.io.UncompressedWarcWriter
 
write(WarcRecord) - Method in interface it.unimi.di.law.warc.io.WarcWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.ConstantPositionURLWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.DateURLWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.IdentityWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.URLDigestFinalPositionWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.URLDigestStatusLengthWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.URLDigestWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.URLPositionWriter
 
write(WarcRecord, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.URLWriter
 
write(OutputStream, ByteArraySessionOutputBuffer) - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
write(OutputStream, ByteArraySessionOutputBuffer) - Method in interface it.unimi.di.law.warc.records.WarcRecord
Writes the WARC record.
write(Object, long, PrintStream) - Method in class it.unimi.di.law.warc.processors.ToStringWriter
 
write(T, long, PrintStream) - Method in interface it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner.Writer
 
writeByteArray(byte[], ObjectOutputStream) - Static method in class it.unimi.di.law.bubing.util.Util
Writes a byte array prefixed by its length encoded using vByte.
writeEntry(GZIPArchive.Entry) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
Writes the entry to the underlying stream.
WriteEntry() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchive.WriteEntry
 
writeHeaders(HeaderGroup, OutputStream) - Static method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
writeLine(String) - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
writeLine(CharArrayBuffer) - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
 
writeLong(long) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
Writes a long at the current pointer.
writePayload(ByteArraySessionOutputBuffer) - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
 
writePayload(ByteArraySessionOutputBuffer) - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
 
writePayload(ByteArraySessionOutputBuffer) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
 
writePayload(ByteArraySessionOutputBuffer) - Method in class it.unimi.di.law.warc.records.InfoWarcRecord
 
writeVByte(int, OutputStream) - Static method in class it.unimi.di.law.bubing.util.Util
Encodes a natural number to an OutputStream using vByte.

X

XFL - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
XFL_OS - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 
XLEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
 