- abort() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
-
- abort() - Method in class it.unimi.di.law.bubing.util.FetchData
-
Invokes AbstractExecutionAwareRequest.abort()
on the underlying request.
- AbstractFilter<T> - Class in it.unimi.di.law.warc.filters
-
An abstract implementation of a Filter providing a method that helps in properly implementing Object.toString() for atomic (i.e., class-based) filters.
- AbstractFilter() - Constructor for class it.unimi.di.law.warc.filters.AbstractFilter
-
- AbstractSieve<K,V> - Class in it.unimi.di.law.bubing.sieve
-
A sort of a map, that handles (key,value) pairs of generic type.
- AbstractSieve(ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve
-
Creates a new sieve with the given data.
- AbstractSieve.DefaultUpdateStrategy<K,V> - Class in it.unimi.di.law.bubing.sieve
-
- AbstractSieve.DiskNewFlow<T> - Class in it.unimi.di.law.bubing.sieve
-
- AbstractSieve.NewFlowReceiver<K> - Interface in it.unimi.di.law.bubing.sieve
-
An object that can receive a new flow of hash/key pairs and that acts as a listener for the AbstractSieve.
- AbstractSieve.SieveEntry<K,V> - Class in it.unimi.di.law.bubing.sieve
-
A (key,value) pair.
- AbstractSieve.UpdateStrategy<K,V> - Interface in it.unimi.di.law.bubing.sieve
-
An update strategy: it determines how a stored value should be updated in the presence of duplicate keys.
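To illustrate what an update strategy decides, here is a minimal sketch. It is not BUbiNG's actual interface: `Strategy`, its `update` method, and the `put` helper are hypothetical names, and a plain `HashMap` stands in for the sieve's storage.

```java
import java.util.HashMap;
import java.util.Map;

class UpdateStrategySketch {
    /** Hypothetical stand-in for an update strategy; not the BUbiNG interface. */
    interface Strategy<K, V> {
        V update(K key, V oldValue, V newValue);
    }

    /** Stores the pair, consulting the strategy only when the key is already present. */
    static <K, V> void put(Map<K, V> store, K key, V value, Strategy<K, V> strategy) {
        if (store.containsKey(key)) {
            store.put(key, strategy.update(key, store.get(key), value));
        } else {
            store.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        // A "keep the first value" policy: duplicates never overwrite.
        Strategy<String, Integer> keepOld = (key, oldValue, newValue) -> oldValue;
        put(store, "a", 1, keepOld);
        put(store, "a", 2, keepOld);
        System.out.println(store.get("a"));
    }
}
```

Swapping the lambda for `(key, oldValue, newValue) -> newValue` would give a "last value wins" policy without touching the storage code.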
- AbstractWarcReader - Class in it.unimi.di.law.warc.io
-
- AbstractWarcReader() - Constructor for class it.unimi.di.law.warc.io.AbstractWarcReader
-
- AbstractWarcRecord - Class in it.unimi.di.law.warc.records
-
- AbstractWarcRecord(URI, HeaderGroup) - Constructor for class it.unimi.di.law.warc.records.AbstractWarcRecord
-
Builds a record, optionally given the target URI and the warcHeaders.
- AbstractWarcRecord(HeaderGroup) - Constructor for class it.unimi.di.law.warc.records.AbstractWarcRecord
-
Builds a record, optionally given the warcHeaders.
- acceptAllCertificates - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- acceptAllCertificates - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
Whether to accept all SSL certificates, or self-signed only.
- acquire() - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Acquires a visit state for a scheme+authority accessible by politeness.
- acquired - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
Whether this visit state is currently acquired.
- acquired - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
- adaptFilterHttpResponse2FetchData(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with HttpResponse base type to a filter with FetchData base type.
- adaptFilterHttpResponse2URIResponse(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with HttpResponse base type to a filter with URIResponse base type.
- adaptFilterHttpResponse2WarcRecord(Filter<HttpResponse>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with HttpResponse base type to a filter with WarcRecord base type.
- adaptFilterString2URI(Filter<String>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with String base type to a filter with URI base type.
- adaptFilterURI2FetchData(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with URI base type to a filter with FetchData base type.
- adaptFilterURI2HttpResponseWarcRecord(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
- adaptFilterURI2Link(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with URI base type to a filter with Link base type, applying the original filter to the target URI.
- adaptFilterURI2URIResponse(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with URI base type to a filter with URIResponse base type.
- adaptFilterURI2WarcRecord(Filter<URI>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Adapts a filter with URI base type to a filter with WarcRecord base type.
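The adaptation pattern shared by the `adaptFilter…` methods can be sketched with standard-library types. This is an illustration only: it uses `java.util.function.Predicate` in place of BUbiNG's `Filter`, and the `adapt` helper is a hypothetical name that does not appear in `Filters`.

```java
import java.net.URI;
import java.util.function.Function;
import java.util.function.Predicate;

class FilterAdapterSketch {
    /** Lifts a predicate on B to a predicate on A by first converting each A to a B. */
    static <A, B> Predicate<A> adapt(Predicate<B> original, Function<A, B> converter) {
        return a -> original.test(converter.apply(a));
    }

    public static void main(String[] args) {
        // A filter with String base type...
        Predicate<String> endsWithPdf = s -> s.endsWith(".pdf");
        // ...adapted to URI base type by applying it to the URI's string form.
        Predicate<URI> onUri = adapt(endsWithPdf, URI::toString);
        System.out.println(onUri.test(URI.create("http://example.com/a.pdf")));
    }
}
```

The adapted filter carries no logic of its own; it only chooses which part of the richer object the original filter sees.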
- add(byte[]) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
- add(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
- add(double) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Adds a value to the stream.
- add(long, long) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache.Stripe
-
- add(VisitState) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Adds a visit state to the set, if necessary.
- add(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Adds the given visit state to the visit-state queue.
- add(VisitState, Workbench) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Adds the given visit state to the visit-state queue, and adds this entry to the workbench if it was empty and not WorkbenchEntry.acquired.
- add(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Adds a nonempty, not acquired workbench entry to the workbench.
- add(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Adds a workbench entry to the set, if necessary.
- add(ParallelFilteredProcessorRunner.Processor<T>, ParallelFilteredProcessorRunner.Writer<? super T>, PrintStream) - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- add(ByteArrayList) - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
- add(T) - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
-
- addAll(double[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Adds values to the stream.
- addAll(DoubleList) - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Adds values to the stream.
- addBlackListedHost(String) - Method in class it.unimi.di.law.bubing.Agent
-
- addBlackListedHost(String) - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Adds a new host (or a set of hosts) to the black list; the host can be specified directly or through a file (prefixed by file:).
- addBlackListedIPv4(String) - Method in class it.unimi.di.law.bubing.Agent
-
- addBlackListedIPv4(String) - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Adds a new IPv4 address (or a set of addresses) to the black list; the address can be specified directly or through a file (prefixed by file:).
- addEscapes(String) - Static method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Replaces unprintable characters by their escaped (or unicode escaped)
equivalents in the given string
- addIfNotPresent(HeaderGroup, WarcHeader.Name, String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
Adds the given header, if not present (otherwise does nothing).
- address2WorkbenchEntry - Variable in class it.unimi.di.law.bubing.frontier.Workbench
-
- addTo(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Adds a value to the counter associated with a given key.
- addTo(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- addTo(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Adds a value to the counter associated with a given key.
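As a rough illustration of counting with byte-array keys, here is a sketch built on `ConcurrentHashMap`. Several things are assumptions: the four parameters are read as (array, offset, length, delta), the return value is taken to be the updated count, and the storage is not BUbiNG's (the `Stripe` and `LockedMap` classes listed in this index suggest a different internal design).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

class CountingMapSketch {
    // ByteBuffer compares by content, so it can serve as a map key for byte sequences.
    private final ConcurrentHashMap<ByteBuffer, Integer> counts = new ConcurrentHashMap<>();

    /** Adds delta to the counter of the key array[offset, offset + length) and returns the new count. */
    int addTo(byte[] array, int offset, int length, int delta) {
        // Copy the fragment so later changes to the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(Arrays.copyOfRange(array, offset, offset + length));
        return counts.merge(key, delta, Integer::sum);
    }

    /** Returns the current count for the given key, or 0 if the key was never seen. */
    int get(byte[] array) {
        return counts.getOrDefault(ByteBuffer.wrap(array), 0);
    }
}
```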
- addTo(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- addTo(byte[], int, int, long, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
- adjustBeginLineColumn(int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Method to adjust line and column numbers for the start of a token.
- agent - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The agent that created this frontier.
- Agent - Class in it.unimi.di.law.bubing
-
A BUbiNG agent.
- Agent(String, int, RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.Agent
-
- allocated - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
- and() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
- and(Filter<T>...) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Produces the conjunction of the given filters.
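A conjunction of filters can be sketched as follows, again using `java.util.function.Predicate` as a stand-in for BUbiNG's `Filter`. Returning true for an empty list of filters and stopping at the first failing filter are choices of this sketch, not documented behavior of `Filters.and`.

```java
import java.util.function.Predicate;

class ConjunctionSketch {
    /** Returns a predicate that accepts x only if every given predicate accepts x. */
    @SafeVarargs
    static <T> Predicate<T> and(Predicate<T>... filters) {
        return x -> {
            for (Predicate<T> f : filters) {
                if (!f.test(x)) return false; // stop at the first rejection
            }
            return true; // also the result for an empty conjunction
        };
    }

    public static void main(String[] args) {
        Predicate<String> isHttp = s -> s.startsWith("http");
        Predicate<String> isShort = s -> s.length() < 20;
        System.out.println(and(isHttp, isShort).test("http://a.b/"));
    }
}
```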
- AND - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
RegularExpression Id.
- append(char) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- append(char) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- append(long, ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- append(long, K) - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
-
A new key is appended.
- append(long, T) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
- append(CharSequence) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- append(CharSequence) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- append(CharSequence, int, int) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- append(CharSequence, int, int) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- appendPointer - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The current pointer at which new elements can be appended.
- apply(char[][], URI) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
Checks whether a specified URL passes a specified robots filter.
- apply(Link) - Method in class it.unimi.di.law.warc.filters.SameHost
-
Apply the filter to a given link, returning true if source and target
have the same host.
- apply(URIResponse) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- apply(URIResponse) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- apply(WarcRecord) - Method in class it.unimi.di.law.warc.filters.DigestEquals
-
Apply the filter to a given WarcRecord
- apply(WarcRecord) - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
-
Apply the filter to a WarcRecord
- apply(URI) - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEndsWith
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.HostEquals
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.SchemeEquals
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.URLEquals
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
-
Apply the filter to a given URI
- apply(URI) - Method in class it.unimi.di.law.warc.filters.URLShorterThan
-
Apply the filter to the given URI
- apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
Apply the filter to a HttpResponse
- apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
This method implements a simple heuristic for guessing whether a page is binary.
- apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.ResponseMatches
-
Checks whether the response associated with this page matches (in ISO-8859-1 encoding)
the regular expression provided at construction time.
- apply(HttpResponse) - Method in class it.unimi.di.law.warc.filters.StatusCategory
-
Apply the filter to a given HttpResponse
- approximatedSize() - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Returns an approximation of the workbench size (in number of entries present on the workbench).
- archetypes() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
The number of pages stored (does not include duplicates).
- ARCHETYPES1XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- ARCHETYPES2XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- ARCHETYPES3XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- ARCHETYPES4XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- ARCHETYPES5XX - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- ARCHETYPESOTHERS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- archetypesStatus - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
In position i, with 0 < i < 6, the number of pages stored (does
not include duplicates) having status ixx.
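The indexing described above amounts to dividing the status code by 100. The sketch below shows that arithmetic. Mapping codes outside 1xx to 5xx to position 0 is an assumption made here for illustration, and `bucket` is a hypothetical helper name.

```java
class StatusBucketSketch {
    /** Returns i for a status code of the form ixx with 0 < i < 6, and 0 for anything else. */
    static int bucket(int statusCode) {
        int i = statusCode / 100;
        return (i > 0 && i < 6) ? i : 0;
    }

    public static void main(String[] args) {
        long[] counts = new long[6];
        for (int status : new int[] { 200, 404, 200, 301 }) {
            counts[bucket(status)]++;
        }
        System.out.println(counts[2]); // pages with status 2xx
    }
}
```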
- ARGS - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
RegularExpression Id.
- atom() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
- averageSpeed - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The average speeds of all visit states.
- AVERAGESPEED - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- backup(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Backup a number of characters.
- BAD_CHAR - Static variable in class it.unimi.di.law.bubing.util.BURL
-
A list of bad characters.
- BAD_CHAR_SUBSTITUTE - Static variable in class it.unimi.di.law.bubing.util.BURL
-
- BasicHttpClientConnectionManagerWithAlternateDNS(DnsResolver) - Constructor for class it.unimi.di.law.bubing.frontier.FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS
-
- beginColumn - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
The column number of the first character of this Token.
- beginLine - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
The line number of the first character of this Token.
- BeginToken() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Start.
- BINARY_CHECK_SCAN_LENGTH - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
- binaryParser - Variable in class it.unimi.di.law.bubing.util.FetchData
-
The binary parser associated with this fetched response.
- BinaryParser - Class in it.unimi.di.law.bubing.parser
-
A universal binary parser that just computes digests.
- BinaryParser(HashFunction) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
-
Builds a parser for digesting a page.
- BinaryParser(HashFunction, boolean) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
-
- BinaryParser(String) - Constructor for class it.unimi.di.law.bubing.parser.BinaryParser
-
Builds a parser for digesting a page.
- blackListedHostHashes - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
The set of hashes of hosts that should be blacklisted.
- blackListedHostHashesLock - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- blackListedHosts - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A host that should be blacklisted (i.e., not crawled).
- blackListedIPv4Addresses - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- blackListedIPv4Addresses - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
An IPv4 address that should be blacklisted (i.e., not crawled).
- blackListedIPv4Lock - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- bloomFilterPrecision - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- bloomFilterPrecision - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- BoundSessionInputBuffer - Class in it.unimi.di.law.warc.util
-
A SessionInputBuffer implementation that bounds a SessionInputBuffer (and hence its buffered stream) so that no more than
a specified number of bytes will be read (from its stream), and keeps track of the number of read bytes.
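The bounding idea can be shown on a plain `java.io.InputStream`. This sketch does not use the HttpComponents `SessionInputBuffer` interface at all; the class name, `bytesRead`, and `drain` are hypothetical and exist only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BoundedInputStreamSketch extends InputStream {
    private final InputStream in;
    private final long bound;
    private long count; // number of bytes handed out so far

    BoundedInputStreamSketch(InputStream in, long bound) {
        this.in = in;
        this.bound = bound;
    }

    @Override
    public int read() throws IOException {
        if (count >= bound) return -1; // bound reached: behave as end of stream
        int b = in.read();
        if (b != -1) count++;
        return b;
    }

    /** Returns the number of bytes read through this wrapper. */
    long bytesRead() {
        return count;
    }

    /** Reads everything available through a bounded view of data and returns how many bytes came out. */
    static long drain(byte[] data, long bound) {
        try (BoundedInputStreamSketch s = new BoundedInputStreamSketch(new ByteArrayInputStream(data), bound)) {
            while (s.read() != -1) { }
            return s.bytesRead();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the inherited bulk `read(byte[], int, int)` delegates to `read()`, the bound holds for bulk reads as well, at the cost of speed.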
- BoundSessionInputBuffer(SessionInputBuffer, long) - Constructor for class it.unimi.di.law.warc.util.BoundSessionInputBuffer
-
Creates a new SessionInputBuffer
bounded to a given maximum length.
- broken - Variable in class it.unimi.di.law.bubing.frontier.Workbench
-
The number of entirely broken entries (i.e., entries containing only broken visit states).
- brokenPathQueryCount - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
The number of path+queries living in a broken visit state.
- brokenVisitStates - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- BROKENVISITSTATES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- brokenVisitStatesOnWorkbench - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
The number of broken visit states on the workbench.
- BUBING_GUESSED_CHARSET - it.unimi.di.law.warc.records.WarcHeader.Name
-
- BUBING_IS_DUPLICATE - it.unimi.di.law.warc.records.WarcHeader.Name
-
- BubingJob - Class in it.unimi.di.law.bubing.util
-
The JAI4J Job used by BUbiNG.
- BubingJob(ByteArrayList) - Constructor for class it.unimi.di.law.bubing.util.BubingJob
-
Creates a new BUbiNG job corresponding to a given
BUbiNG URL.
- bufcolumn - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- buffer - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The character buffer.
- buffer - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- buffer() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Returns the current buffer of this byte-array disk queue.
- BufferedHttpEntityFactory - Class in it.unimi.di.law.warc.util
-
- buffers - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
For each log-file index, the associated ByteBuffer.
- bufline - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- bufpos - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Position in buffer.
- BuildRepetitionSet - Class in it.unimi.di.law.bubing.tool
-
Builds and saves the repetition set of a crawl.
- BuildRepetitionSet() - Constructor for class it.unimi.di.law.bubing.tool.BuildRepetitionSet
-
- BURL - Class in it.unimi.di.law.bubing.util
-
Static methods to manipulate normalized, canonical URLs in BUbiNG.
- BYTE_ARRAY - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
A serializer-deserializer for byte arrays that writes the array length using variable-length byte encoding,
and then writes the content of the array.
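A length-prefixed layout of this kind might look like the sketch below. The index does not state which variable-length encoding BUbiNG uses, so the 7-bit-group scheme here (low group first, high bit meaning "more bytes follow") is an assumption, and `serialize`/`deserialize` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class VByteArraySketch {
    /** Writes the array length in 7-bit groups, then the array content. */
    static byte[] serialize(byte[] array) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = array.length;
        while (n >= 0x80) {
            out.write((n & 0x7F) | 0x80); // continuation bit set
            n >>>= 7;
        }
        out.write(n); // last group, continuation bit clear
        out.write(array, 0, array.length);
        return out.toByteArray();
    }

    /** Reads back the length prefix and then exactly that many content bytes. */
    static byte[] deserialize(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int length = 0, shift = 0;
        for (;;) {
            int b = in.read();
            if (b < 0) throw new IllegalArgumentException("truncated length");
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        byte[] array = new byte[length];
        if (length > 0 && in.read(array, 0, length) != length) {
            throw new IllegalArgumentException("truncated content");
        }
        return array;
    }
}
```

With this scheme, arrays shorter than 128 bytes pay a single byte of overhead for the length.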
- BYTE_ARRAY_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- BYTE_ARRAY_LIST_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- ByteArrayCharSequence - Class in it.unimi.di.law.bubing.util
-
An adapter exposing a byte array as an ISO-8859-1-encoded
character sequence.
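Since ISO-8859-1 maps each byte to the code point with the same unsigned value, such an adapter needs no decoding table. The following sketch is not the BUbiNG class; `Latin1CharSequenceSketch` is a hypothetical name used to show the idea.

```java
import java.nio.charset.StandardCharsets;

class Latin1CharSequenceSketch implements CharSequence {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    Latin1CharSequenceSketch(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        // Mask to undo sign extension: byte 0xE9 must become U+00E9, not U+FFE9.
        return (char) (bytes[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        // Shares the backing array instead of copying it.
        return new Latin1CharSequenceSketch(bytes, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```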
- ByteArrayCharSequence() - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
Creates a new empty byte-array character sequence.
- ByteArrayCharSequence(byte[]) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
Creates a new byte-array character sequence using the provided byte array.
- ByteArrayCharSequence(byte[], int, int) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
Creates a new byte-array character sequence using the provided byte-array fragment.
- ByteArrayDiskQueue - Class in it.unimi.di.law.bubing.util
-
A queue of byte arrays partially stored on disk.
- ByteArrayDiskQueue(ByteDiskQueue) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
- ByteArrayDiskQueues - Class in it.unimi.di.law.bubing.util
-
A set of memory-mapped queues of byte arrays.
- ByteArrayDiskQueues(File) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Creates a set of byte-array disk queues in the given directory using
log files of size 2^26.
- ByteArrayDiskQueues(File, int) - Constructor for class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Creates a set of byte-array disk queues in the given directory using the specified
file size.
- ByteArrayDiskQueues.QueueData - Class in it.unimi.di.law.bubing.util
-
Metadata associated with a queue.
- ByteArrayListByteSerializerDeserializer - Class in it.unimi.di.law.bubing.sieve
-
- ByteArrayListByteSerializerDeserializer() - Constructor for class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
-
- ByteArraySessionOutputBuffer - Class in it.unimi.di.law.warc.util
-
A SessionOutputBuffer
implementation that uses a byte array as a backing store.
- ByteArraySessionOutputBuffer() - Constructor for class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- ByteSerializerDeserializer<V> - Interface in it.unimi.di.law.bubing.sieve
-
- ByteWriter - Class in it.unimi.di.law.warc.processors
-
A writer that simply dumps to the output stream an array of bytes.
- cache() - Method in class it.unimi.di.law.warc.io.CompressedWarcCachingReader
-
- cache() - Method in interface it.unimi.di.law.warc.io.WarcCachingReader
-
- cachedContent - Variable in class it.unimi.di.law.warc.util.FastByteArrayInputStreamHttpEntityFactory
-
- CapriciousPrintWriter(Writer, long, long, XoRoShiRo128PlusRandom) - Constructor for class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
-
- CatEFGraphs - Class in it.unimi.di.law.bubing.tool
-
- CatEFGraphs() - Constructor for class it.unimi.di.law.bubing.tool.CatEFGraphs
-
- CHAR_BUFFER_SIZE - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The size of the internal Jericho buffer.
- CHAR_SEQUENCE_HASHING_STRATEGY - Static variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
- charAt(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
- CharSequenceByteSerializerDeserializer - Class in it.unimi.di.law.bubing.sieve
-
- CharSequenceByteSerializerDeserializer() - Constructor for class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
-
- CHARSET_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
- checkRobots(long) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Checks whether the current robots information has expired and, if necessary, schedules a new robots.txt
download.
- CHECKSUM_THRESHOLD - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- clear() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Empties this visit state of all the URLs that it contains.
- clear() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Removes all elements from this set.
- clear() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Removes all elements from this set.
- clear() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Clears this queue.
- clear() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Clears this queue.
- clear() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- clone() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- clone() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- close() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
-
- close() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Closes the frontier: threads are stopped (if necessary, aborted), sieve and store and robots
stream are closed.
- close() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
- close() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
Closes (forever) this sieve.
- close() - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
-
- close() - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
-
- close() - Method in interface it.unimi.di.law.bubing.store.Store
-
- close() - Method in class it.unimi.di.law.bubing.store.UnbufferedFileStore
-
- close() - Method in class it.unimi.di.law.bubing.store.WarcStore
-
- close() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Closes this queue.
- close() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Closes all files.
- close() - Method in class it.unimi.di.law.bubing.util.FetchData
-
- close() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Closes this queue.
- close() - Method in class it.unimi.di.law.warc.io.CompressedWarcWriter
-
- close() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
-
- close() - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
- close() - Method in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter.WriterPair
-
- close() - Method in class it.unimi.di.law.warc.io.UncompressedWarcWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.ByteWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.ConstantPositionURLWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.DateURLWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
-
- close() - Method in class it.unimi.di.law.warc.processors.IdentityWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
-
- close() - Method in class it.unimi.di.law.warc.processors.ToStringWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.URLDigestFinalPositionWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.URLDigestStatusLengthWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.URLDigestWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.URLPositionWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.URLWriter
-
- close() - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
-
- CLOSEPAREN - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
RegularExpression Id.
- collect(double) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
- collectIf(double, double) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
Performs a garbage collection if the space used is below a given threshold, reaching a given target ratio.
- column - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- comment - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
An internal representation of the comment of the entry.
- compareTo(Delayed) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
- compareTo(Delayed) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
- compressedSkipLength - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
The actual (compressed) length of the entry.
- CompressedWarcCachingReader - Class in it.unimi.di.law.warc.io
-
- CompressedWarcCachingReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcCachingReader
-
- CompressedWarcReader - Class in it.unimi.di.law.warc.io
-
- CompressedWarcReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcReader
-
- CompressedWarcWriter - Class in it.unimi.di.law.warc.io
-
- CompressedWarcWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.CompressedWarcWriter
-
- ConcurrentCountingMap - Class in it.unimi.di.law.bubing.util
-
A concurrent counting map.
- ConcurrentCountingMap() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Creates a new concurrent counting map with concurrency level equal to Runtime.availableProcessors().
- ConcurrentCountingMap(int) - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Creates a new concurrent counting map.
- ConcurrentCountingMap.LockedMap - Class in it.unimi.di.law.bubing.util
-
- ConcurrentCountingMap.Stripe - Class in it.unimi.di.law.bubing.util
-
- ConcurrentSummaryStats - Class in it.unimi.di.law.bubing.util
-
- ConcurrentSummaryStats() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
- connectionTimeout - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- connectionTimeout - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The socket connection timeout in milliseconds.
- ConstantPositionURLWriter - Class in it.unimi.di.law.warc.processors
-
- ConstantPositionURLWriter(String) - Constructor for class it.unimi.di.law.warc.processors.ConstantPositionURLWriter
-
- consume() - Method in interface it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry.LazyInflater
-
Consumes the (possibly) remaining entry content.
- consume() - Method in class it.unimi.di.law.warc.util.BoundSessionInputBuffer
-
Consumes the remaining bytes (of the buffered stream).
- CONTENT_LENGTH - it.unimi.di.law.warc.records.WarcHeader.Name
-
- CONTENT_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
- CONTENT_TYPE - it.unimi.di.law.warc.records.WarcHeader.Name
-
- contentLength - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Statistic about the content length of each archetype
- contentLength() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- contentLength(long) - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- contentTypeApplication - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Number of archetypes whose indicated content type starts with application (case insensitive)
- contentTypeImage - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Number of archetypes whose indicated content type starts with image (case insensitive)
- contentTypeOthers - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Number of archetypes whose indicated content type does not start with text, image, or
application (case insensitive)
- ContentTypeStartsWith - Class in it.unimi.di.law.warc.filters
-
A filter accepting only fetched responses whose content type starts with a given string.
- ContentTypeStartsWith(String) - Constructor for class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
Creates a filter that only accepts URLs whose content type starts with a given prefix.
- contentTypeText - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Number of archetypes whose indicated content type starts with text (case insensitive)
- cookieMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- cookieMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum overall size for (the external form of) the cookies accepted from a single host.
- cookiePolicy - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- cookiePolicy - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The cookie policy to be used.
- cookies - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
The cookies of this visit state.
- copy() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- copy() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- copy() - Method in interface it.unimi.di.law.bubing.parser.Parser
-
This method strengthens the return type of the method inherited from Filter.
- copy() - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- copy() - Method in class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
-
- copy() - Method in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
-
- copy() - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
- copy() - Method in class it.unimi.di.law.warc.filters.DigestEquals
-
- copy() - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
- copy() - Method in class it.unimi.di.law.warc.filters.HostEndsWith
-
- copy() - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
-
- copy() - Method in class it.unimi.di.law.warc.filters.HostEquals
-
- copy() - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
-
- copy() - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
- copy() - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
- copy() - Method in class it.unimi.di.law.warc.filters.ResponseMatches
-
- copy() - Method in class it.unimi.di.law.warc.filters.SameHost
-
- copy() - Method in class it.unimi.di.law.warc.filters.SchemeEquals
-
- copy() - Method in class it.unimi.di.law.warc.filters.StatusCategory
-
- copy() - Method in class it.unimi.di.law.warc.filters.URLEquals
-
- copy() - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
-
- copy() - Method in class it.unimi.di.law.warc.filters.URLShorterThan
-
- copy() - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
-
- copy() - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
-
- copy() - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
-
- copy(Filter<T>...) - Static method in class it.unimi.di.law.warc.filters.Filters
-
- copyContent(long, long, long, long) - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- count - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
-
The number of elements in the list (always nonzero).
- count(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
Returns the number of path+queries associated with the given visit state.
- count(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Returns the number of elements associated with the given key.
- CRAWLDURATION - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- crawlIsNew - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- crawlIsNew - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
Whether this is a new crawl.
- crc32 - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
The CRC of the entry.
- createFromFile(long, File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Creates a new disk-based queue of byte arrays using an existing file.
- createFromFile(long, File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Creates a new disk-based queue of objects using an existing file.
- createHierarchicalTempFile(File, int, String, String) - Static method in class it.unimi.di.law.bubing.util.Util
-
Creates a temporary file with a random hierarchical path.
- createNew(File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Creates a new disk-based queue of byte arrays.
- createNew(File, int, boolean) - Static method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Creates a new disk-based queue of objects.
- CRLF - Static variable in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- CRLFCRLF - Static variable in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- crossAuthorityDuplicates - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
If true, pages with the same content but with different authorities are considered duplicates.
- curChar - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
- CURRENTQUEUE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- currentToken - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
-
This is the last token that has been consumed successfully.
- DateURLWriter - Class in it.unimi.di.law.warc.processors
-
- DateURLWriter() - Constructor for class it.unimi.di.law.warc.processors.DateURLWriter
-
- debugStream - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Debug output.
- decodeInt() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Decodes using vByte a nonnegative integer at the current pointer.
- DEFAULT - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
Lexical state.
- DEFAULT_LOG2_LOG_FILE_SIZE - Static variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
By default, we use 64 MiB log files.
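As a rough illustration of the arithmetic behind this default: a 64 MiB file size corresponds to a base-2 logarithm of 26. The sketch below assumes (this is not confirmed by the index) that a global byte pointer is split into a log-file index and an offset within that file by shifting and masking; the class and method names are hypothetical.

```java
// Hypothetical sketch: splitting a global pointer into (log-file index, offset).
// The constant 26 is derived from "64 MiB"; the split itself is an assumption.
final class LogFilePointerSketch {
    static final int LOG2_LOG_FILE_SIZE = 26;                 // 2^26 bytes = 64 MiB
    static final long LOG_FILE_SIZE = 1L << LOG2_LOG_FILE_SIZE;
    static final long OFFSET_MASK = LOG_FILE_SIZE - 1;

    /** Index of the log file containing the given global pointer. */
    static int fileIndex(long pointer) {
        return (int) (pointer >>> LOG2_LOG_FILE_SIZE);
    }

    /** Offset of the given global pointer inside its log file. */
    static long fileOffset(long pointer) {
        return pointer & OFFSET_MASK;
    }
}
```

Using a power-of-two size keeps both operations to a single shift or mask, which is one plausible reason for storing the logarithm rather than the size.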
- defaultRequestConfig - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The default configuration for a non-robots.txt request.
- DefaultUpdateStrategy() - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.DefaultUpdateStrategy
-
- deflater - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.WriteEntry
-
The deflater to be used to actually write data that has to be compressed as the entry content.
- dequeue() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Removes the first path in the queue.
- dequeue() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Dequeues a byte array from the queue in FIFO fashion.
- dequeue() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Dequeues an object from the queue in FIFO fashion.
- dequeue(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Dequeues the first element available for a given key.
- dequeueKey() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
Returns the next key in the flow of new pairs remained after the check, and discards the corresponding value.
- dequeuePathQueries(VisitState, int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
Dequeues at most the given number of path+queries into the given visit state.
- digest - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- digest() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- digest() - Method in class it.unimi.di.law.bubing.util.FetchData
-
Get the digest
- digest(byte[]) - Method in class it.unimi.di.law.bubing.util.FetchData
-
Set the digest with a given value
- digestAlgorithm - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- digestAlgorithm - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The algorithm used for digesting pages (for duplicate filtering).
- digestAppendable - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
An object embodying the digest logic, or null for no digest computation.
- DigestAppendable(HashFunction) - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
Create a digest appendable using a given hash function.
- DigestEquals - Class in it.unimi.di.law.warc.filters
-
A filter accepting only records of given digest, specified as a hexadecimal string.
- digests - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
A Bloom filter storing page digests for duplicate detection.
- DIGESTS_NAME - Static variable in class it.unimi.di.law.bubing.store.WarcStore
-
- disable_tracing() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Disable tracing.
- DiskNewFlow(ByteSerializerDeserializer<T>) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
- dist - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A variable used for exponentially-binned distribution of visit state sizes.
- distributor - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- Distributor - Class in it.unimi.di.law.bubing.frontier
-
- Distributor(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.Distributor
-
Creates a distributor for the given frontier.
- DISTRIBUTORVISITSTATESONDISK - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- DISTRIBUTORWARMUP - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- dnsCacheMaxSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- dnsCacheMaxSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
Maximum number of entries cached by the DNS resolver when using DnsJavaResolver.
- DnsJavaResolver - Class in it.unimi.di.law.bubing.frontier.dns
-
- DnsJavaResolver() - Constructor for class it.unimi.di.law.bubing.frontier.dns.DnsJavaResolver
-
- dnsNegativeTtl - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- dnsNegativeTtl - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- dnsPositiveTtl - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- dnsPositiveTtl - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- dnsResolver - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
The DNS resolver used throughout the crawler.
- dnsResolverClass - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A DnsResolver.
- DNSThread - Class in it.unimi.di.law.bubing.frontier
-
- DNSThread(Frontier, int) - Constructor for class it.unimi.di.law.bubing.frontier.DNSThread
-
A DNS thread for the given Frontier, with an index used to set the thread's name.
- dnsThreads - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- dnsThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- dnsThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The number of DNS threads (usually a few dozen, depending on the server).
- dnsThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Changes the number of DNS threads.
- done - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- done() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Terminates the statistics, closing all the progress loggers.
- Done() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reset buffer when finished.
- DoneThread - Class in it.unimi.di.law.bubing.frontier
-
- DoneThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.DoneThread
-
- DOTTED_ADDRESS - Static variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
A pattern used to identify hosts specified directly via their address in dotted notation.
- duplicates - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The number of duplicate pages.
- DUPLICATES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- DuplicateSegmentsLessThan - Class in it.unimi.di.law.warc.filters
-
A filter accepting only URIs whose path does not contain too many duplicate segments.
- DuplicateSegmentsLessThan(int) - Constructor for class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
Creates a filter that only accepts URIs whose path contains fewer duplicate consecutive segments than the given threshold.
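To make the idea of "duplicate consecutive segments" concrete, here is a deliberately simplified sketch. It assumes that a duplicate is a path segment identical to the one immediately before it, and it accepts a URI when the longest run of such repeats is below the threshold. The real filter may use a different definition of duplication; the class and method names are hypothetical.

```java
import java.net.URI;

// Simplified, hypothetical sketch; not the library's actual algorithm.
final class DuplicateSegmentsSketch {
    /** Longest run of path segments that repeat the segment just before them. */
    static int maxConsecutiveRepeats(URI uri) {
        String path = uri.getPath();
        if (path == null) return 0;            // opaque URIs have no path
        String[] segments = path.split("/");
        int best = 0, run = 0;
        for (int i = 1; i < segments.length; i++) {
            if (!segments[i].isEmpty() && segments[i].equals(segments[i - 1])) {
                run++;
                best = Math.max(best, run);
            } else {
                run = 0;
            }
        }
        return best;
    }

    /** Accepts the URI if it has fewer repeats than the given threshold. */
    static boolean accept(URI uri, int threshold) {
        return maxConsecutiveRepeats(uri) < threshold;
    }
}
```

Filters of this kind are typically used to avoid crawler traps such as `/a/b/b/b/b/...`, where a misconfigured site generates ever-longer paths.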
- emit() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Emits the statistics.
- EMPTY_ARRAY - Static variable in class it.unimi.di.law.warc.filters.Filters
-
- EMPTY_CHARSEQUENCE_ARRAY - Static variable in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
-
- EMPTY_COOKIE_ARRAY - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
-
A singleton empty cookie array.
- EMPTY_ROBOTS_FILTER - Static variable in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
A singleton empty robots filter.
- emptyPairs - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
- enable_tracing() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Enable tracing.
- encodeInt(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Encodes using vByte a nonnegative integer at the current pointer.
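For readers unfamiliar with vByte (variable-byte) coding, the following is a generic sketch of one common variant: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last. The exact byte layout used by ByteArrayDiskQueues is not stated in this index and may differ; the class and method names here are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Generic variable-byte coding sketch; the library's own layout may differ.
final class VByteSketch {
    /** Encodes a nonnegative integer in 1 to 5 bytes. */
    static byte[] encode(int value) {
        if (value < 0) throw new IllegalArgumentException("negative value: " + value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((value & 0x7F) | 0x80);  // continuation bit set
            value >>>= 7;
        }
        out.write(value);                      // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes the integer that starts at the given offset. */
    static int decode(byte[] bytes, int offset) {
        int result = 0, shift = 0;
        for (int i = offset; ; i++) {
            int b = bytes[i] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }
}
```

Small values, which are the common case for lengths and counts, take a single byte, while the largest `int` takes five.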
- endColumn - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
The column number of the last character of this Token.
- endLine - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
The line number of the last character of this Token.
- endTag(EndTag) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- endTags - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
Cached byte representations of all closing tags.
- endTime - Variable in class it.unimi.di.law.bubing.util.FetchData
-
System.currentTimeMillis() when the GET request was completed.
- enlargeBuffer(int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Enlarge the buffer of this queue to a given size.
- enlargeBuffer(int) - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Enlarge the buffer of this queue to a given size.
- enqueue(byte[]) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Enqueues a byte array to this queue.
- enqueue(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Enqueues a byte-array fragment to this queue.
- enqueue(ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Enqueues a URL to the BUbiNG crawl.
- enqueue(Object, byte[]) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Enqueues an element (specified as a byte array) associated with a given key.
- enqueue(Object, byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Enqueues an element (specified as a byte-array fragment) associated with a given key.
- enqueue(URI) - Method in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
-
Enqueues the given URL, provided that it passes the schedule filter and its host is not blacklisted.
- enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
Add the given (key,value) pair to the store.
- enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
-
- enqueue(K, V) - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
-
- enqueue(T) - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Enqueues an object to this queue.
- enqueueLocal(ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Enqueues a local URL represented by a byte array to the crawl of this agent.
- enqueuePathQuery(byte[]) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Enqueues a path+query in byte-array representation, possibly putting this visit state in its entry.
- enqueueRobots() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Enqueues the /robots.txt path as the first element of the queue, if the queue is empty, or as the second element otherwise, possibly putting this visit state in its entry.
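The positioning rule described for enqueueRobots() can be sketched with a plain list standing in for the visit state's path queue. This is only an illustration of the "first if empty, second otherwise" rule: the real VisitState stores byte arrays and also handles workbench bookkeeping, none of which is modeled here, and the names below are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the insertion rule only.
final class RobotsEnqueueSketch {
    static final String ROBOTS_PATH = "/robots.txt";

    /** Inserts the robots path first if the queue is empty, second otherwise. */
    static void enqueueRobots(List<String> queue) {
        if (queue.isEmpty()) {
            queue.add(ROBOTS_PATH);
        } else {
            queue.add(1, ROBOTS_PATH);  // leave the current head in place
        }
    }
}
```

For a queue holding `/a` and `/b`, the result is `/a`, `/robots.txt`, `/b`.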
- enqueueURL(VisitState, ByteArrayList) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
Enqueues the given URL as a path+query associated to the scheme+authority of the given visit state.
- ensureCapacity(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Ensures that the set has a given capacity.
- ensureCapacity(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Ensures that the set has a given capacity.
- ensureNotPaused() - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- entrySummaryStats - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A variable accumulating statistics about the size (in visit states) of
workbench entries.
- EOF - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
End of File.
- EOL - Static variable in exception it.unimi.di.law.warc.filters.parser.ParseException
-
The end of line string for this machine.
- EPOCH - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- equals(Object) - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
-
- equals(Object) - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
Compare this object with a given generic one
- equals(Object) - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
Compare this object with a given generic one
- equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEndsWith
-
Compare this object with a given generic one
- equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
-
Compare this object with a given generic one
- equals(Object) - Method in class it.unimi.di.law.warc.filters.HostEquals
-
Compare this object with a given generic one
- equals(Object) - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
Compare this with a given generic object
- equals(Object) - Method in class it.unimi.di.law.warc.filters.SameHost
-
Compare this object with a given generic one.
- equals(Object) - Method in class it.unimi.di.law.warc.filters.SchemeEquals
-
Compare a given object with this
- equals(Object) - Method in class it.unimi.di.law.warc.filters.StatusCategory
-
Compare this filter with a generic object
- equals(Object) - Method in class it.unimi.di.law.warc.filters.URLEquals
-
Compare this filter with a given object
- equals(Object) - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
-
Compare this with a given object
- equals(Object) - Method in class it.unimi.di.law.warc.filters.URLShorterThan
-
Compare this with a given object
- estimate(T) - Method in interface it.unimi.di.law.bubing.spam.SpamDetector
-
Estimates the spam score associated with a given information object.
- estimateLength(CharSequence[]) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
-
Estimates the length of the page generated by a given array of successors.
- exception - Variable in class it.unimi.di.law.bubing.util.FetchData
-
The exception thrown in case of a failed fetch, or null.
- EXCEPTION_HOST_KILLER - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
-
A map recording for each type of exception the number of retries.
- EXCEPTION_TO_MAX_RETRIES - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
-
A map recording for each type of exception the number of retries.
- EXCEPTION_TO_WAIT_TIME - Static variable in class it.unimi.di.law.bubing.frontier.ParsingThread
-
A map recording for each type of exception a timeout. Note that 0 means the standard politeness time.
- ExpandBuff(boolean) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- expectedTokenSequences - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
-
Each entry in this array is an array of integers.
- externalOutdegree - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
Statistics about the number of out-links of each archetype, without considering the links to the same corresponding host.
- FakeResolver - Class in it.unimi.di.law.bubing.frontier.dns
-
A fake resolver that returns a four-byte representation of the host hashcode.
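A resolver of this kind is handy for tests because it never touches the network. The sketch below shows one way to turn a host name's hash code into a four-byte address. The big-endian byte order and the class and method names are assumptions made for illustration; the actual FakeResolver may lay out the bytes differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

// Hypothetical sketch; byte order is an assumption.
final class FakeResolverSketch {
    /** Maps a host name to a deterministic IPv4 address, with no DNS lookup. */
    static InetAddress resolve(String host) {
        // ByteBuffer writes the int in big-endian order by default.
        byte[] address = ByteBuffer.allocate(4).putInt(host.hashCode()).array();
        try {
            // getByAddress only validates the array length; it performs no lookup.
            return InetAddress.getByAddress(host, address);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e); // unreachable for a 4-byte array
        }
    }
}
```

Because `String.hashCode()` is specified by the Java language, the same host always maps to the same address across runs.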
- FakeResolver() - Constructor for class it.unimi.di.law.bubing.frontier.dns.FakeResolver
-
- FALSE - Static variable in class it.unimi.di.law.warc.filters.Filters
-
- FALSE - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
RegularExpression Id.
- FastApproximateByteArrayCache - Class in it.unimi.di.law.bubing.util
-
A fast, concurrent approximate cache for byte arrays.
- FastApproximateByteArrayCache(long) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
Creates a new cache with specified size and concurrency level equal to Runtime.availableProcessors().
- FastApproximateByteArrayCache(long, int) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
Creates a new cache with specified size.
- FastApproximateByteArrayCache.Stripe - Class in it.unimi.di.law.bubing.util
-
A class containing a stripe of the cache.
- FastByteArrayInputStreamHttpEntityFactory - Class in it.unimi.di.law.warc.util
-
- FastByteArrayInputStreamHttpEntityFactory() - Constructor for class it.unimi.di.law.warc.util.FastByteArrayInputStreamHttpEntityFactory
-
- FCOMMENT - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- fetch(URI, HttpClient, RequestConfig, VisitState, boolean) - Method in class it.unimi.di.law.bubing.util.FetchData
-
Fetches a given URL.
- FETCH_ROBOTS - Static variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Whether to fetch and use robots.txt.
- FetchData - Class in it.unimi.di.law.bubing.util
-
Response of an HTTP request.
- FetchData(RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.util.FetchData
-
Creates a fetched response according to the given properties.
- fetchDataBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- fetchDataBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- fetchedResources - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The number of fetched resources (updated by ParsingThread instances).
- FETCHEDRESOURCES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- fetchedRobots - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The number of fetched robots.txt files (updated by ParsingThread instances).
- FETCHEDROBOTS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- fetchFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- fetchFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A filter that will be applied to all ready URLs to decide whether to fetch them.
- FetchingThread - Class in it.unimi.di.law.bubing.frontier
-
A thread fetching pages that will then be analyzed by a ParsingThread.
- FetchingThread(Frontier, int) - Constructor for class it.unimi.di.law.bubing.frontier.FetchingThread
-
Creates a new fetching thread.
- FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS - Class in it.unimi.di.law.bubing.frontier
-
A support class that makes it possible to plug in a custom DNS resolver.
- fetchingThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- fetchingThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- fetchingThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Changes the number of fetching threads.
- fetchingThreadWaitingTimeSum - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The sum of the waiting time of the waiting fetching threads: every time a fetching thread
waits this sum is updated; every time the statistics are printed this value is reset.
- FETCHINGTHREADWAITINGTIMESUM - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- fetchingThreadWaits - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The number of waits performed by fetching threads; every time the statistics are printed this
value is reset.
- FETCHINGTHREADWAITS - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- FEXTRA - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- FHCRC - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- files - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
For each log-file index, the associated RandomAccessFile.
- FillBuff() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- filledPairs - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
- Filter<T> - Interface in it.unimi.di.law.warc.filters
-
A filter is a strategy to decide whether to accept a given
object or not.
- FILTER_PACKAGE_NAME - Static variable in interface it.unimi.di.law.warc.filters.Filter
-
The name of the package that contains this interface as well as
most filters.
- FilterParser<T> - Class in it.unimi.di.law.warc.filters.parser
-
A simple parser that transforms a filter expression into a filter.
- FilterParser(FilterParserTokenManager) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
-
Constructor with generated Token Manager.
- FilterParser(InputStream) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
-
Constructor with InputStream.
- FilterParser(InputStream, String) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
-
Constructor with InputStream and supplied encoding
- FilterParser(Reader) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
-
Constructor.
- FilterParser(Class<T>) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParser
-
- FilterParserConstants - Interface in it.unimi.di.law.warc.filters.parser
-
Token literal values and constants.
- FilterParserTokenManager - Class in it.unimi.di.law.warc.filters.parser
-
Token Manager.
- FilterParserTokenManager(SimpleCharStream) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Constructor.
- FilterParserTokenManager(SimpleCharStream, int) - Constructor for class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Constructor.
- Filters - Class in it.unimi.di.law.warc.filters
-
A collection of static methods to deal with filters.
- Filters() - Constructor for class it.unimi.di.law.warc.filters.Filters
-
- finishedAppending() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- finishedAppending() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
- finishedAppending() - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
-
The new flow of keys is over.
- firstPath() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Peeks at the first path in the queue.
- FIX_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- flush() - Method in class it.unimi.di.law.bubing.Agent
-
- flush() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
Forces the check+update of all pairs that have been enqueued.
- flush() - Method in class it.unimi.di.law.bubing.sieve.IdentitySieve
-
- flush() - Method in class it.unimi.di.law.bubing.sieve.MercatorSieve
-
- flushingThread - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
- flushingThreadException - Variable in class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
- flushingThreadException - Variable in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- FNAME - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- followFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- followFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A filter that will be applied to all parsed resources to decide whether to follow their links.
- FORBIDDEN_CHARS - Static variable in class it.unimi.di.law.bubing.util.BURL
-
Characters that will cause a URI spec to be rejected.
- forciblyEnqueueRobotsFirst() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Forcibly enqueues the /robots.txt path as the first element of the queue.
- formatDate(Calendar) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
- FormatException(String) - Constructor for exception it.unimi.di.law.warc.io.gzarc.GZIPArchive.FormatException
-
- FormatException(String, Throwable) - Constructor for exception it.unimi.di.law.warc.io.gzarc.GZIPArchive.FormatException
-
- formatId(UUID) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
- forName(String) - Static method in class it.unimi.di.law.bubing.parser.BinaryParser
-
Return the hash function corresponding to a given message-digest algorithm given by name.
- freeze() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Freezes this queue.
- freeze() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Freezes this queue.
- fromByteArray(byte[], int) - Method in class it.unimi.di.law.bubing.Agent
-
- fromHexString(String) - Static method in class it.unimi.di.law.warc.util.Util
-
Returns a byte array corresponding to the given number.
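As an illustration of what such a conversion involves, here is a minimal hexadecimal-to-bytes sketch. It assumes two hex digits per byte and rejects odd-length or non-hex input; how the actual Util.fromHexString(String) treats those cases is not stated in this index. The class name is hypothetical.

```java
// Hypothetical sketch; error handling choices are assumptions.
final class HexSketch {
    /** Converts a string of hex digit pairs into the bytes they denote. */
    static byte[] fromHexString(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at pair " + i);
            }
            result[i] = (byte) ((hi << 4) | lo);
        }
        return result;
    }
}
```

A conversion like this is what lets a filter such as DigestEquals take its digest as a hexadecimal string and compare it against raw digest bytes.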
- fromNormalizedByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Creates a new BUbiNG URL from a normalized ASCII string represented by a byte array.
- fromNormalizedSchemeAuthorityAndPathQuery(byte[], byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Creates a new BUbiNG URL from a byte-array representation of a normalized scheme and
authority and a byte-array representation of a normalized ASCII path and query.
- fromNormalizedSchemeAuthorityAndPathQuery(String, byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Creates a new BUbiNG URL from a normalized ASCII string representing scheme and
authority and a byte-array representation of a normalized ASCII path and query.
- fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
-
- fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- fromPayload(HeaderGroup, BoundSessionInputBuffer) - Static method in class it.unimi.di.law.warc.records.InfoWarcRecord
-
- fromPayloadMethod(Header) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
-
Returns the factory method to be used to create a record from the payload given a header specifying the type.
- fromStream(InputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
-
A serializer-deserializer for byte arrays that writes the array length using variable-length byte encoding,
and then writes the content of the array.
- fromStream(InputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
Deserializes an object starting from a given portion of a byte array.
- fromStream(InputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
-
- fromString(String) - Method in class it.unimi.di.law.bubing.Agent
-
- FRONT_INCREASE - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- frontier - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
A reference to the frontier.
- Frontier - Class in it.unimi.di.law.bubing.frontier
-
The BUbiNG frontier: a class structure that encompasses most of the logic behind the way BUbiNG
fetches URLs.
- Frontier(RuntimeConfiguration, Store, Agent) - Constructor for class it.unimi.di.law.bubing.frontier.Frontier
-
Creates the frontier.
- Frontier.PropertyKeys - Enum in it.unimi.di.law.bubing.frontier
-
- frontierDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- frontierDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- FrontierEnqueuer(Frontier, RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
-
Creates the enqueuer.
- FTEXT - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- generate(long, StringBuilder, CharSequence[], boolean) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
-
- GenerateGraphMap - Class in it.unimi.di.law.bubing.tool
-
Builds and saves the graph map, that is, a text file containing all URLs ever crawled, and a binary file containing the corresponding
nodes (duplicates are mapped to their archetype position).
- GenerateGraphMap() - Constructor for class it.unimi.di.law.bubing.tool.GenerateGraphMap
-
- generateParseException() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Generate ParseException.
- get() - Method in interface it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry.LazyInflater
-
Returns the actual inflater from which the uncompressed entry content may be read.
- get(byte[]) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Returns the visit state associated to a given scheme+authority, or null.
- get(byte[]) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Returns the entry for a given IP address.
- get(byte[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Gets the value of the counter associated with a given key.
- get(byte[]) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- get(byte[], int, int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Returns the visit state associated to a given scheme+authority specified as a byte-array fragment, or null.
- get(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Gets the value of the counter associated with a given key.
- get(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- get(byte[], int, int, long) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
- getActiveFecthingThreads() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeContentLength() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeContentTypeApplication() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeContentTypeImage() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeContentTypeOthers() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeContentTypeText() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeExternalOutdegree() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypeOutdegree() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes1xx() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes2xx() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes3xx() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes4xx() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypes5xx() - Method in class it.unimi.di.law.bubing.Agent
-
- getArchetypesOther() - Method in class it.unimi.di.law.bubing.Agent
-
- getBeginColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get token beginning column number.
- getBeginLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get token beginning line number.
- getBroken() - Method in class it.unimi.di.law.bubing.Agent
-
- getBrokenVisitStates() - Method in class it.unimi.di.law.bubing.Agent
-
- getBrokenVisitStatesOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
-
- getBytes() - Method in class it.unimi.di.law.bubing.Agent
-
- getCharsetName(byte[], int) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
-
Returns the charset name as indicated by a META HTTP-EQUIV element, if present, interpreting the provided byte array as a sequence of ISO-8859-1-encoded characters.
- getCharsetNameFromHeader(String) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
-
Extracts the charset name from the header value of a content-type header using a regular expression.
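As an illustration of regex-based charset extraction, here is a minimal sketch. The class name, method name, and pattern below are hypothetical and are not taken from HTMLParser; the actual expression used by the library may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetHeaderSketch {
    // Hypothetical pattern (an assumption, not the library's): "charset=",
    // an optional quote, then a run of characters typical of charset names.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:+-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name found in a Content-Type header value, or null if there is none. */
    static String charsetFromHeader(String headerValue) {
        if (headerValue == null) return null;
        Matcher m = CHARSET.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }
}
```

For instance, the sketch extracts UTF-8 from the value "text/html; charset=UTF-8" and returns null for a bare "text/html".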
- getColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Deprecated.
- getComment() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
Returns the comment of the entry.
- getConnectionTimeout() - Method in class it.unimi.di.law.bubing.Agent
-
- getContent() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- getContentLength() - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- getCookies(URI, CookieStore, int) - Static method in class it.unimi.di.law.bubing.frontier.FetchingThread
-
Returns the list of cookies in a given store in the form of an array, limiting their overall size
(only the maximal prefix of cookies satisfying the size limit is returned).
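The "maximal prefix" rule can be sketched as follows. This is only an illustration: it uses plain strings instead of cookie objects and measures size as string length, both of which are assumptions; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CookiePrefixSketch {
    /**
     * Returns the longest prefix of the given cookies whose overall length
     * does not exceed maxSize. Size is measured in characters here, which is
     * an assumption made for this sketch.
     */
    static List<String> maximalPrefix(List<String> cookies, int maxSize) {
        List<String> result = new ArrayList<>();
        int total = 0;
        for (String cookie : cookies) {
            total += cookie.length();
            // Stop at the first cookie that would exceed the limit: later
            // cookies are not considered, even if they would fit on their own.
            if (total > maxSize) break;
            result.add(cookie);
        }
        return result;
    }
}
```

Note that a prefix is not the same as a best fit: with cookies of length 3, 5 and 3 and a limit of 7, only the first cookie is returned, although the first and third together would satisfy the limit.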
- getDelay(TimeUnit) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
- getDelay(TimeUnit) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
- getDnsThreads() - Method in class it.unimi.di.law.bubing.Agent
-
- getDuplicatePercentage() - Method in class it.unimi.di.law.bubing.Agent
-
- getDuplicates() - Method in class it.unimi.di.law.bubing.Agent
-
- getEndColumn() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get token end column number.
- getEndLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get token end line number.
- getEntity() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- getEntity() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- getEntry() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
-
- getEntry(boolean) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
-
- getEntry(String, String, Date) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
-
Returns an object that can be used to write an entry in the GZIP archive.
- getEntryAverage() - Method in class it.unimi.di.law.bubing.Agent
-
- getEntryMax() - Method in class it.unimi.di.law.bubing.Agent
-
- getEntryMin() - Method in class it.unimi.di.law.bubing.Agent
-
- getEntryVariance() - Method in class it.unimi.di.law.bubing.Agent
-
- getFetchFilter() - Method in class it.unimi.di.law.bubing.Agent
-
- getFetchingThreads() - Method in class it.unimi.di.law.bubing.Agent
-
- getFetchingThreadTotalWaitTime() - Method in class it.unimi.di.law.bubing.Agent
-
- getFetchingThreadWaits() - Method in class it.unimi.di.law.bubing.Agent
-
- getFilterFromSpec(String, String, Class<T>) - Static method in class it.unimi.di.law.warc.filters.Filters
-
Creates a filter from a filter class name and an external form.
- getFirstHeader(HeaderGroup, WarcHeader.Name) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
Returns the first header of given name.
- getFollowFilter() - Method in class it.unimi.di.law.bubing.Agent
-
- GetImage() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get token literal value.
- getInfo() - Method in class it.unimi.di.law.warc.records.InfoWarcRecord
-
- getInstance() - Static method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.ByteWriter
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.IdentityProcessor
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.ToStringWriter
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.URLDigestWriter
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.URLWriter
-
- getInstance() - Static method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
-
- getIpDelay() - Method in class it.unimi.di.law.bubing.Agent
-
- getIPOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
-
- getKeepAliveTime() - Method in class it.unimi.di.law.bubing.Agent
-
- getKnownCount() - Method in class it.unimi.di.law.bubing.Agent
-
- getLine() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Deprecated.
- getLocale() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
Deprecated.
- getLocale() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
Deprecated.
- getManager(String) - Method in class it.unimi.di.law.bubing.Agent
-
- getMaxUrls() - Method in class it.unimi.di.law.bubing.Agent
-
- getMessage() - Method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
You can also modify the body of this method to customize your error messages.
- getMetrics() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- getName() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
Returns the name of the entry.
- getNextToken() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Get the next Token.
- getNextToken() - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Get the next Token.
- getParseFilter() - Method in class it.unimi.di.law.bubing.Agent
-
- getParsingThreads() - Method in class it.unimi.di.law.bubing.Agent
-
- getProtocolVersion() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getProtocolVersion() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
-
- getProtocolVersion() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- getProtocolVersion() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpRequest
-
- getProtocolVersion() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- getQueueDistribution() - Method in class it.unimi.di.law.bubing.Agent
-
- getReadyToParse() - Method in class it.unimi.di.law.bubing.Agent
-
- getReadyURLs() - Method in class it.unimi.di.law.bubing.Agent
-
- getReceivedURLs() - Method in class it.unimi.di.law.bubing.Agent
-
- getRequestLine() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
-
- getRequestLine() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpRequest
-
- getRequests() - Method in class it.unimi.di.law.bubing.Agent
-
- getRequiredFrontSize() - Method in class it.unimi.di.law.bubing.Agent
-
- getResolvedVisitStates() - Method in class it.unimi.di.law.bubing.Agent
-
- getResources() - Method in class it.unimi.di.law.bubing.Agent
-
- getResponseBodyMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
-
- getRobotsExpiration() - Method in class it.unimi.di.law.bubing.Agent
-
- getScheduleFilter() - Method in class it.unimi.di.law.bubing.Agent
-
- getSocketTimeout() - Method in class it.unimi.di.law.bubing.Agent
-
- getStatsThread() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- getStatusLine() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- getStatusLine() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- getStoreFilter() - Method in class it.unimi.di.law.bubing.Agent
-
- getStoreSize() - Method in class it.unimi.di.law.bubing.Agent
-
- GetSuffix(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Get the suffix.
- getTabSize() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- getToDoSize() - Method in class it.unimi.di.law.bubing.Agent
-
- getToken(int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Get the specific Token.
- getUnknownHosts() - Method in class it.unimi.di.law.bubing.Agent
-
- getUnresolved() - Method in class it.unimi.di.law.bubing.Agent
-
- getUrlCacheMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
-
- getUrlDelay() - Method in class it.unimi.di.law.bubing.Agent
-
- getURLsInQueues() - Method in class it.unimi.di.law.bubing.Agent
-
- getURLsInQueuesPercentage() - Method in class it.unimi.di.law.bubing.Agent
-
- getValue() - Method in class it.unimi.di.law.warc.filters.parser.Token
-
An optional attribute value of the Token.
- getVisitStates() - Method in class it.unimi.di.law.bubing.Agent
-
- getVisitStates() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Returns the overall number of visit states.
- getVisitStatesOnDisk() - Method in class it.unimi.di.law.bubing.Agent
-
- getVisitStatesOnDisk() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Returns the number of visit states on disk.
- getVisitStatesOnWorkbench() - Method in class it.unimi.di.law.bubing.Agent
-
- getWaitingVisitStates() - Method in class it.unimi.di.law.bubing.Agent
-
- getWarcContentLength() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcContentLength() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC Content-Length header.
- getWarcDate() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcDate() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC-Date header.
- getWarcHeader(WarcHeader.Name) - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcHeader(WarcHeader.Name) - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the specified WARC header.
- getWarcHeaders() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcHeaders() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC headers.
- getWarcRecordId() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcRecordId() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC-Record-ID header.
- getWarcTargetURI() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
Returns the WARC-Target-URI header as a URI.
- getWarcTargetURI() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC-Target-URI header as a URI.
- getWarcType() - Method in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- getWarcType() - Method in interface it.unimi.di.law.warc.records.WarcRecord
-
Returns the WARC-Type header.
- getWorkbenchByteSize() - Method in class it.unimi.di.law.bubing.Agent
-
- getWorkbenchEntry(byte[]) - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Returns a workbench entry for the given address, possibly creating one.
- getWorkbenchMaxByteSize() - Method in class it.unimi.di.law.bubing.Agent
-
- ground() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
- group - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- group - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The group of this agent; all agents belonging to the same group will coordinate their crawling activity.
- guessedCharset - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The charset we guessed for the last response.
- guessedCharset() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- guessedCharset() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- guessedCharset() - Method in interface it.unimi.di.law.bubing.parser.Parser
-
Returns a guessed charset for the document, or null if the charset could not be guessed.
- GZIP_START - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- GZIPArchive - Class in it.unimi.di.law.warc.io.gzarc
-
A GZIP archive is an archive made of (concatenated) GZIP entries that are usual GZIP files, except for the
presence of two extra fields (in the GZIP header) containing the compressed and uncompressed length of the entry itself.
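The idea of storing lengths in the GZIP header can be sketched with the standard FEXTRA mechanism of RFC 1952. The sketch below is not the GZIPArchive implementation: the subfield identifier "XL", the field layout, and the choice of counting only the deflate stream as the compressed length are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

final class GzipExtraFieldSketch {
    /** Builds one GZIP member whose extra field holds the compressed and uncompressed lengths. */
    static byte[] writeEntry(byte[] content) {
        // Compress first (raw deflate), so the compressed length is known before the header is written.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(content);
        deflater.finish();
        ByteArrayOutputStream deflated = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) deflated.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();
        byte[] body = deflated.toByteArray();

        CRC32 crc = new CRC32();
        crc.update(content);

        // 10-byte fixed header + 2-byte XLEN + 4-byte subfield header + 8 bytes of lengths + body + 8-byte trailer.
        ByteBuffer b = ByteBuffer.allocate(10 + 2 + 4 + 8 + body.length + 8).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte)0x1f).put((byte)0x8b).put((byte)8).put((byte)4); // magic, CM = deflate, FLG = FEXTRA
        b.putInt(0).put((byte)0).put((byte)0xff);                    // MTIME, XFL, OS (unknown)
        b.putShort((short)12);                                       // XLEN
        b.put((byte)'X').put((byte)'L').putShort((short)8);          // hypothetical subfield id, data length
        b.putInt(body.length).putInt(content.length);                // compressed and uncompressed length
        b.put(body);
        b.putInt((int)crc.getValue()).putInt(content.length);        // trailer: CRC32, ISIZE
        return b.array();
    }

    /** Reads back { compressed length, uncompressed length } without inflating anything. */
    static int[] readLengths(byte[] entry) {
        ByteBuffer b = ByteBuffer.wrap(entry).order(ByteOrder.LITTLE_ENDIAN);
        // Offset 16 = fixed header (10) + XLEN (2) + subfield header (4).
        return new int[] { b.getInt(16), b.getInt(20) };
    }

    /** Inflates the entry with the standard library, which skips the extra field. */
    static byte[] inflate(byte[] entry) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(entry))) {
            return in.readAllBytes(); // requires Java 9 or later
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the extra field is part of the GZIP format, an entry written this way remains readable by ordinary GZIP tools, while a reader aware of the field can skip to the next entry using the stored compressed length.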
- GZIPArchive() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- GZIPArchive.Entry - Class in it.unimi.di.law.warc.io.gzarc
-
- GZIPArchive.FormatException - Exception in it.unimi.di.law.warc.io.gzarc
-
- GZIPArchive.ReadEntry - Class in it.unimi.di.law.warc.io.gzarc
-
An entry used to read a GZIP archive entry.
- GZIPArchive.ReadEntry.LazyInflater - Interface in it.unimi.di.law.warc.io.gzarc
-
The lazy inflater that can be used to get (part of) the uncompressed entry content.
- GZIPArchive.WriteEntry - Class in it.unimi.di.law.warc.io.gzarc
-
An entry used to write a GZIP archive entry.
- GZIPArchiveReader - Class in it.unimi.di.law.warc.io.gzarc
-
- GZIPArchiveReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
-
- GZIPArchiveWriter - Class in it.unimi.di.law.warc.io.gzarc
-
- GZIPArchiveWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchiveWriter
-
- GZIPIndexer - Class in it.unimi.di.law.warc.io.gzarc
-
- GZIPIndexer() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
-
- IdentityHttpEntityFactory - Class in it.unimi.di.law.warc.util
-
An implementation of a HttpEntityFactory that returns an entity simply wrapping the given one.
- IdentityProcessor - Class in it.unimi.di.law.warc.processors
-
- IdentitySieve<K,V> - Class in it.unimi.di.law.bubing.sieve
-
- IdentitySieve(AbstractSieve.NewFlowReceiver<K>, ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.IdentitySieve
-
- IdentityWriter - Class in it.unimi.di.law.warc.processors
-
A writer that simply writes the given record.
- IdentityWriter() - Constructor for class it.unimi.di.law.warc.processors.IdentityWriter
-
- image - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
The string image of the token.
- ImmutableGraphNamedGraphServer - Class in it.unimi.di.law.bubing.test
-
- ImmutableGraphNamedGraphServer(ImmutableGraph, StringMap<? extends CharSequence>) - Constructor for class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
-
Builds the server.
- inBuf - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- index(InputStream) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
-
Returns a list of pointers to the positions of the entries of a GZIP archive (including the end of file).
- index(InputStream, ProgressLogger) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
-
Returns a list of pointers to the positions of the entries of a GZIP archive (including the end of file).
- InfoWarcRecord - Class in it.unimi.di.law.warc.records
-
- InfoWarcRecord(Header[]) - Constructor for class it.unimi.di.law.warc.records.InfoWarcRecord
-
- init(URI) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- init(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
Initializes the digest computation.
- init(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- init(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
Initializes this receiver for a new page.
- init(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.TextProcessor
-
Initializes this processor for a new page.
- init(URI) - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- init(URI, byte[], char[][]) - Method in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
-
Initializes the enqueuer for parsing a page with a specific scheme+authority and robots filter.
- input_stream - Variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
- inputStream - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- InspectableCachedHttpEntity - Class in it.unimi.di.law.warc.util
-
An implementation of a HttpEntity that is reusable and can copy its content from another entity at a controlled rate.
- InspectableCachedHttpEntity(InspectableFileCachedInputStream) - Constructor for class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- INSTANCE - Static variable in class it.unimi.di.law.warc.filters.IsHttpResponse
-
- INSTANCE - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
- INSTANCE - Static variable in class it.unimi.di.law.warc.processors.IdentityProcessor
-
- INSTANCE - Static variable in class it.unimi.di.law.warc.util.BufferedHttpEntityFactory
-
- INSTANCE - Static variable in class it.unimi.di.law.warc.util.IdentityHttpEntityFactory
-
- INT_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- INTEGER - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
A trivial serializer-deserializer for Integer.
- inUse - Variable in class it.unimi.di.law.bubing.util.FetchData
-
If true, this instance has been enqueued to the list of results and we are waiting for the signal of the ParsingThread that is analyzing it.
- INVALID_LEXICAL_STATE - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Tried to change to an invalid lexical state.
- ipAddress - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
The IP address of this workbench entry, computed by DnsResolver.resolve(String).
- ipDelay - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- ipDelay - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The minimum delay between two consecutive fetches from the same IP address.
- ipDelayFactor - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- ipDelayFactor - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
An attenuation factor for the multiple-agent IP delay mechanism.
- isAlive(long) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Returns whether this visit state is fetchable (i.e., whether there is at least one URL and politeness allows fetching it).
- isDuplicate() - Method in class it.unimi.di.law.bubing.util.FetchData
-
Returns whether the current FetchData is a duplicate.
- isDuplicate(boolean) - Method in class it.unimi.di.law.bubing.util.FetchData
-
Marks the current FetchData as a duplicate or as not a duplicate.
- isEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Returns whether this visit state is empty.
- isEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Returns whether the set is empty.
- isEmpty() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Returns true if the visit-state queue is empty.
- isEmpty() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Returns whether the workbench is empty.
- isEmpty() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
- isEmpty() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
- isEmpty() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
-
Returns whether this queue is empty.
- isEntirelyBroken() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
- IsHttpResponse - Class in it.unimi.di.law.warc.filters
-
A filter accepting only records that are http/https responses.
- IsProbablyBinary - Class in it.unimi.di.law.warc.filters
-
A filter accepting only http responses whose content stream appears to be binary.
- it.unimi.di.law.bubing - package it.unimi.di.law.bubing
-
- it.unimi.di.law.bubing.frontier - package it.unimi.di.law.bubing.frontier
-
A set of classes orchestrating the movement of URLs to be fetched next by a BUbiNG agent.
- it.unimi.di.law.bubing.frontier.dns - package it.unimi.di.law.bubing.frontier.dns
-
- it.unimi.di.law.bubing.parser - package it.unimi.di.law.bubing.parser
-
The system of parsers used for analyzing HTTP responses.
- it.unimi.di.law.bubing.sieve - package it.unimi.di.law.bubing.sieve
-
- it.unimi.di.law.bubing.spam - package it.unimi.di.law.bubing.spam
-
- it.unimi.di.law.bubing.store - package it.unimi.di.law.bubing.store
-
Implementations of the Store interface.
- it.unimi.di.law.bubing.test - package it.unimi.di.law.bubing.test
-
- it.unimi.di.law.bubing.tool - package it.unimi.di.law.bubing.tool
-
- it.unimi.di.law.bubing.util - package it.unimi.di.law.bubing.util
-
- it.unimi.di.law.warc - package it.unimi.di.law.warc
-
An implementation of the Web ARChive file format (WARC) specification.
- it.unimi.di.law.warc.filters - package it.unimi.di.law.warc.filters
-
A comprehensive filtering system.
- it.unimi.di.law.warc.filters.parser - package it.unimi.di.law.warc.filters.parser
-
- it.unimi.di.law.warc.io - package it.unimi.di.law.warc.io
-
I/O of WARC formatted files.
- it.unimi.di.law.warc.io.gzarc - package it.unimi.di.law.warc.io.gzarc
-
An implementation of a (skippable) GZIP archive.
- it.unimi.di.law.warc.processors - package it.unimi.di.law.warc.processors
-
Processors to manipulate WARC files.
- it.unimi.di.law.warc.records - package it.unimi.di.law.warc.records
-
WARC records.
- it.unimi.di.law.warc.tool - package it.unimi.di.law.warc.tool
-
- it.unimi.di.law.warc.util - package it.unimi.di.law.warc.util
-
Utility classes used by the it.unimi.di.law.warc package.
- iterator() - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Returns an (unmodifiable) iterator over the entries currently on the workbench.
- iterator() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- lastAppendedWasSpace - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
True iff the last character appended was a space.
- lastExceptionClass - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
If not null, this field contains the class of the exception that was thrown during the last attempt to access this scheme+authority.
- lastHighCostStat - Variable in class it.unimi.di.law.bubing.frontier.Distributor
-
The last time we produced high-cost statistics.
- lastPurgeCheck - Variable in class it.unimi.di.law.bubing.frontier.Distributor
-
The last time we checked for visit states to be purged.
- lastRobotsFetch - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
The value of System.currentTimeMillis() when we fetched the robots information we are caching.
- lazyInflater - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry
-
- length() - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
- length() - Method in class it.unimi.di.law.bubing.util.FetchData
-
Returns (an approximation of) the length of the response (headers and body).
- lengthOfHost(byte[], int) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Finds the length of the host part in a scheme+authority or URL.
- LEXICAL_ERROR - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Lexical error occurred.
- LexicalErr(boolean, int, int, int, String, int) - Static method in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Returns a detailed message for the Error when it is thrown by the
token manager to indicate a lexical error.
- lexStateNames - Static variable in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Lexer state names.
- line - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- link(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- link(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
Handles a link.
- Link - Class in it.unimi.di.law.bubing.util
-
- Link(URI, URI) - Constructor for class it.unimi.di.law.bubing.util.Link
-
Creates a new link with given source and target.
- location - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The location URL from headers of the last response, if any, or null.
- location() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
Returns the BURL location header, if present; if it is not present, but the page contains a valid metalocation, the latter
is returned.
- location(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- location(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
Handles the location defined by headers.
- lock() - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Acquires a locked copy of this map.
- LockedMap(ConcurrentCountingMap) - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- LockFreeQueue<T> - Class in it.unimi.di.law.bubing.util
-
A thin layer around a ConcurrentLinkedQueue that exhibits a subset of the available methods, and keeps track in an AtomicLong of the size of the queue, so that LockFreeQueue.size() can return in constant time.
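The technique described for LockFreeQueue can be sketched as follows. This is a minimal illustration, not the class's actual source: the class name and the choice of exposed methods (add, poll, size) are assumptions.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class CountingQueueSketch<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    // Tracks the number of elements, because ConcurrentLinkedQueue.size()
    // traverses the whole queue and therefore takes linear time.
    private final AtomicLong size = new AtomicLong();

    /** Appends an element and updates the counter. */
    void add(T item) {
        queue.add(item);
        size.incrementAndGet();
    }

    /** Removes and returns the head, or returns null if the queue is empty. */
    T poll() {
        T item = queue.poll();
        if (item != null) size.decrementAndGet();
        return item;
    }

    /** Returns the tracked size in constant time. */
    long size() {
        return size.get();
    }
}
```

In this sketch the queue and the counter are updated in two separate steps, so under concurrent access size() may be momentarily off by the number of operations in flight; it is exact once those operations complete.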
- LockFreeQueue() - Constructor for class it.unimi.di.law.bubing.util.LockFreeQueue
-
- log2LogFileSize - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The base 2 logarithm of the byte size of a log file.
- logFilePositionMask - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The mask to extract the position inside a log file from a pointer.
- logFileSize - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The byte size of a log file.
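The relation among the three fields above can be sketched as follows, assuming the log file size is a power of two. The concrete value of 20 and the use of the high bits of a pointer as a log-file index are assumptions made for illustration; only the use of the mask to extract the in-file position is stated by the documentation.

```java
final class LogPointerSketch {
    // Hypothetical value: log files of 2^20 bytes (1 MiB).
    static final int LOG2_LOG_FILE_SIZE = 20;
    static final long LOG_FILE_SIZE = 1L << LOG2_LOG_FILE_SIZE;
    static final long LOG_FILE_POSITION_MASK = LOG_FILE_SIZE - 1;

    /** Index of the log file a pointer refers to (assumed to be the high bits). */
    static int fileIndex(long pointer) {
        return (int)(pointer >>> LOG2_LOG_FILE_SIZE);
    }

    /** Position inside the log file, extracted with the mask. */
    static long positionInFile(long pointer) {
        return pointer & LOG_FILE_POSITION_MASK;
    }
}
```

With these values, the pointer 3 * 2^20 + 17 decomposes into file index 3 and position 17.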
- LOOP_DETECTED - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Detected (and bailed out of) an infinite loop in the token manager.
- LOOPBACK - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The loopback address, cached.
- main(String[]) - Static method in class it.unimi.di.law.bubing.Agent
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.tool.BuildRepetitionSet
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.tool.CatEFGraphs
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.tool.GenerateGraphMap
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
- main(String[]) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
- main(String[]) - Static method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
- main(String[]) - Static method in class it.unimi.di.law.warc.io.gzarc.GZIPIndexer
-
- main(String[]) - Static method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- main(String[]) - Static method in class it.unimi.di.law.warc.tool.WarcCompressor
-
- mask - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
The mask for wrapping a position counter.
- mask - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
The mask for wrapping a position counter.
- mask - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
The mask for wrapping a position counter.
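A mask of this kind is typically the table length minus one, for a table whose length is a power of two. The sketch below illustrates that general technique under this assumption; the names are hypothetical and it is not taken from the classes listed above.

```java
final class WrapMaskSketch {
    /** Returns the mask for a table whose length is a power of two. */
    static int maskFor(int tableLength) {
        if (tableLength <= 0 || Integer.bitCount(tableLength) != 1)
            throw new IllegalArgumentException("Table length must be a power of two");
        return tableLength - 1;
    }

    /** Returns the slot following pos, wrapping around to 0 at the end of the table. */
    static int next(int pos, int mask) {
        // A bitwise AND replaces the slower (pos + 1) % tableLength.
        return (pos + 1) & mask;
    }
}
```

For a table of length 8 the mask is 7, so the slot after 6 is 7 and the slot after 7 wraps to 0.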
- max() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the maximum of the values added so far.
- MAX_TO_STRING_ROBOTS - Static variable in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
The maximum number of robots entries returned by Object.toString().
- maxFill - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
The maximum number of entries that can be filled before rehashing.
- maxFill - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
The maximum number of entries that can be filled before rehashing.
- maxFill - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
Threshold after which we rehash.
- maxNextCharInd - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- maxUrls - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- maxUrls - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum number of URLs to crawl.
- maxUrlsPerSchemeAuthority - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- maxUrlsPerSchemeAuthority - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum number of URLs we shall download from each scheme+authority.
- mean() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the mean of the values added so far.
- memoryUsageOf(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns the memory usage associated to a byte array.
- MercatorSieve<K,V> - Class in it.unimi.di.law.bubing.sieve
-
- MercatorSieve(boolean, File, int, int, int, AbstractSieve.NewFlowReceiver<K>, ByteSerializerDeserializer<K>, ByteSerializerDeserializer<V>, AbstractHashFunction<K>, AbstractSieve.UpdateStrategy<K, V>) - Constructor for class it.unimi.di.law.bubing.sieve.MercatorSieve
-
Creates a new Mercator-like sieve.
- messageThread - Variable in class it.unimi.di.law.bubing.Agent
-
- MessageThread - Class in it.unimi.di.law.bubing.frontier
-
- MessageThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.MessageThread
-
Creates the thread.
- META_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
- metaLocation - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The location URL from META elements of the last response, if any, or null.
- metaLocation(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- metaLocation(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
Handles the location defined by a META element.
- metaRefresh(URI) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- metaRefresh(URI) - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
Handles the refresh defined by a META element.
- min() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the minimum of the values added so far.
- MIN_FLUSH_INTERVAL - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The minimum number of milliseconds between two flushes.
- misses() - Method in class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache
-
Returns the number of cache misses.
- MOVING_AVERAGE_WINDOW - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
-
The window over which we compute the moving average.
- mtime - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
The modification time of the entry.
- MurmurHash3 - Class in it.unimi.di.law.bubing.util
-
A 64-bit implementation of MurmurHash3 for byte-array fragments.
- MurmurHash3() - Constructor for class it.unimi.di.law.bubing.util.MurmurHash3
-
- ParallelBufferedWarcWriter - Class in it.unimi.di.law.warc.io
-
A parallel Warc writer.
- ParallelBufferedWarcWriter(OutputStream, boolean) - Constructor for class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
Creates a Warc parallel output stream using 2×Runtime.availableProcessors() buffers.
- ParallelBufferedWarcWriter(OutputStream, boolean, int) - Constructor for class it.unimi.di.law.warc.io.ParallelBufferedWarcWriter
-
Creates a Warc parallel output stream.
- ParallelBufferedWarcWriter.WriterPair - Class in it.unimi.di.law.warc.io
-
- ParallelFilteredProcessorRunner - Class in it.unimi.di.law.warc.processors
-
- ParallelFilteredProcessorRunner(InputStream) - Constructor for class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- ParallelFilteredProcessorRunner(InputStream, Filter<WarcRecord>) - Constructor for class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- ParallelFilteredProcessorRunner.Processor<T> - Interface in it.unimi.di.law.warc.processors
-
- ParallelFilteredProcessorRunner.Writer<T> - Interface in it.unimi.di.law.warc.processors
-
- parse(MutableString) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Creates a new BUbiNG URL from a mutable string specification if possible, or returns null otherwise.
- parse(String) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Creates a new BUbiNG URL from a string specification if possible, or returns null otherwise.
- parse(String) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
- parse(URI, HttpResponse, Parser.LinkReceiver) - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- parse(URI, HttpResponse, Parser.LinkReceiver) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- parse(URI, HttpResponse, Parser.LinkReceiver) - Method in interface it.unimi.di.law.bubing.parser.Parser
-
Parses a response.
- parseBoolean(String) - Static method in class it.unimi.di.law.bubing.util.Util
-
Parses a Boolean value reliably, throwing an exception if the argument is not true or false (case insensitively).
- parseDate(String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
- ParseException - Exception in it.unimi.di.law.warc.filters.parser
-
This exception is thrown when parse errors are encountered.
- ParseException() - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
-
The following constructors are for use by you for whatever
purpose you can think of.
- ParseException(Token, int[][], String[]) - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
-
This constructor is used by the method "generateParseException"
in the generated parser.
- ParseException(String) - Constructor for exception it.unimi.di.law.warc.filters.parser.ParseException
-
Constructor with message.
- parseFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- parseFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A filter that will be applied to all fetched resources to decide whether to parse them.
- parseId(String) - Static method in class it.unimi.di.law.warc.records.WarcHeader
-
- Parser<T> - Interface in it.unimi.di.law.bubing.parser
-
A generic parser for responses.
- Parser.LinkReceiver - Interface in it.unimi.di.law.bubing.parser
-
A class that can receive URLs discovered during parsing.
- Parser.TextProcessor<T> - Interface in it.unimi.di.law.bubing.parser
-
A class that can receive pieces of text discovered during parsing.
- parseRobotsReader(Reader, String) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
Parses the argument as if it were the content of a robots.txt file, and returns a sorted array of prefixes of URLs that the agent should not follow.
- parseRobotsResponse(URIResponse, String) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
Parses a robots.txt file contained in a FetchData and returns the corresponding filter as an array of sorted prefixes.
- parsers - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread
-
The parsers used by this thread.
- parsers - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
The parsers, instantiated.
- parsersFromSpecs(String[]) - Static method in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Given an array of parser specifications, returns the corresponding list of parsers (only the correct specifications are put in the list).
- parserSpec - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- parseTime(String) - Static method in class it.unimi.di.law.bubing.StartupConfiguration
-
- ParsingThread - Class in it.unimi.di.law.bubing.frontier
-
- ParsingThread(Frontier, Store, int) - Constructor for class it.unimi.di.law.bubing.frontier.ParsingThread
-
Creates a thread.
- ParsingThread.FrontierEnqueuer - Class in it.unimi.di.law.bubing.frontier
-
A small gadget used to insert links in the frontier.
- parsingThreads - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The parsing threads.
- parsingThreads - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- parsingThreads - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- parsingThreads(int) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Changes the number of parsing threads.
- pathAndQuery(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns the concatenated raw path and raw query of a BUbiNG URL.
- pathAndQueryAsByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
- pathAndQueryAsByteArray(ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
- pathAndQueryAsByteArray(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns an ASCII byte-array representation of the raw path and raw query of a BUbiNG URL.
- PathEndsWithOneOf - Class in it.unimi.di.law.warc.filters
-
A filter accepting only URIs whose path ends (case-insensitively) with one of a given set of suffixes.
- PathEndsWithOneOf(String[]) - Constructor for class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
Creates a filter that only accepts URLs whose path ends with one of a given set of suffixes.
- pathQueriesInQueues - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The overall number of path+queries stored in VisitState queues.
- PATHQUERIESINQUEUES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- pathQueryLimit() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Returns an estimate of the number of path+queries that this visit state should keep in memory.
- pause() - Method in class it.unimi.di.law.bubing.Agent
-
- paused - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Whether the crawler is currently paused.
- pointer() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Returns the current pointer.
- pointer(long) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Sets the current pointer.
- poll() - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
-
- position(long) - Method in class it.unimi.di.law.warc.io.CompressedWarcCachingReader
-
- position(long) - Method in class it.unimi.di.law.warc.io.CompressedWarcReader
-
- position(long) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
-
- position(long) - Method in class it.unimi.di.law.warc.io.UncompressedWarcReader
-
- position(long) - Method in interface it.unimi.di.law.warc.io.WarcCachingReader
-
- position(long) - Method in interface it.unimi.di.law.warc.io.WarcReader
-
- prepareToAppend() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- prepareToAppend() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
- prepareToAppend() - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.NewFlowReceiver
-
A new flow of keys is ready and will start being appended.
- prevCharIsCR - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- prevCharIsLF - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- print(String) - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
-
- println() - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
-
- println(String) - Method in class it.unimi.di.law.bubing.test.NamedGraphServerHttpProxy.CapriciousPrintWriter
-
- process(Parser.LinkReceiver, URI, String) - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
Pre-processes a string that represents a raw link found in the page, trying to derelativize it.
- process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.IdentityProcessor
-
- process(WarcRecord, long) - Method in interface it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner.Processor
-
- process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.ResponseContentExtractor
-
- process(WarcRecord, long) - Method in class it.unimi.di.law.warc.processors.WarcTargetUriExtractor
-
- PROTOCOL_VERSION - Static variable in interface it.unimi.di.law.warc.records.WarcRecord
-
The version of the supported format.
- proxyHost - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- proxyHost - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The proxy host, if a proxy should be used; an empty value means that the proxy should not be set.
- proxyPort - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- proxyPort - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- put(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- put(byte[], int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Sets the value associated with a given key.
- put(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- put(byte[], int, int, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap
-
Sets the value associated with a given key.
- put(byte[], int, int, long, int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
- put(E, long) - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
-
Inserts an element with given timestamp, waiting for space to become available
if the timestamp of the element minus the current timestamp of the queue exceeds
the queue capacity.
- putInEntryIfNotEmpty() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Puts this visit state in its entry, if it is not empty.
- putOnWorkbenchIfNotEmpty(Workbench) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Puts this entry on the workbench, if not empty.
- RandomNamedGraphServer - Class in it.unimi.di.law.bubing.test
-
- RandomNamedGraphServer(int, int, int) - Constructor for class it.unimi.di.law.bubing.test.RandomNamedGraphServer
-
Builds the server.
- RandomNamedGraphServer(int, int, int, boolean) - Constructor for class it.unimi.di.law.bubing.test.RandomNamedGraphServer
-
- ratio() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
- rc - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The runtime configuration.
- read() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Reads a byte at the current pointer.
- read() - Method in class it.unimi.di.law.warc.io.CompressedWarcReader
-
- read() - Method in class it.unimi.di.law.warc.io.UncompressedWarcReader
-
- read() - Method in interface it.unimi.di.law.warc.io.WarcReader
-
- read(boolean) - Method in class it.unimi.di.law.warc.io.AbstractWarcReader
-
- read(byte[], int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Reads a specified number of bytes at the current pointer.
- readByteArray(ObjectInputStream) - Static method in class it.unimi.di.law.bubing.util.Util
-
Reads a byte array prefixed by its length encoded using vByte.
- readChar() - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Read a character.
- ReadEntry() - Constructor for class it.unimi.di.law.warc.io.gzarc.GZIPArchive.ReadEntry
-
- readLong() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Reads a long at the current pointer.
- readMetadata() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
- readVByte(InputStream) - Static method in class it.unimi.di.law.bubing.util.Util
-
Decodes a natural number from an InputStream using vByte.
- READY_URLS_BUFFER_SIZE - Static variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- readyURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- READYURLSSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- receive(BubingJob) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- receivedURLs - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- receivedURLsLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A global progress logger, counting the URLs received from other agents.
- RECEIVEDURLSSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- refill - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- rehash(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Rehashes the state set to a new size.
- rehash(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Rehashes the set to a new size.
- rehash(int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
Rehashes the stripe.
- ReInit(FilterParserTokenManager) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Reinitialise.
- ReInit(SimpleCharStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Reinitialise parser.
- ReInit(SimpleCharStream, int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Reinitialise parser.
- ReInit(InputStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Reinitialise.
- ReInit(InputStream) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(InputStream, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(InputStream, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(InputStream, String) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Reinitialise.
- ReInit(InputStream, String) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(InputStream, String, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(InputStream, String, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(Reader) - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Reinitialise.
- ReInit(Reader) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(Reader, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- ReInit(Reader, int, int, int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Reinitialise.
- relativeStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the relative standard deviation of the values added so far.
- release(VisitState) - Method in class it.unimi.di.law.bubing.frontier.Workbench
-
Releases a previously acquired visit state.
- remaining() - Method in class it.unimi.di.law.warc.util.BoundSessionInputBuffer
-
Returns the number of unread bytes (in the buffered stream).
- remove() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Removes the top element from the visit-state queue.
- remove(byte[], int, int, long) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
- remove(VisitState) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Removes a given visit state.
- remove(VisitState) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
Removes all path+queries associated with the given visit state.
- remove(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Removes a given workbench entry.
- remove(Object) - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Removes all elements associated with a given key.
- ReorderingBlockingQueue<E> - Class in it.unimi.di.law.warc.util
-
A blocking queue holding a fixed amount of timestamped items.
- ReorderingBlockingQueue(int) - Constructor for class it.unimi.di.law.warc.util.ReorderingBlockingQueue
-
Creates a ReorderingBlockingQueue with the given fixed capacity.
- REQUEST - it.unimi.di.law.warc.records.WarcRecord.Type
-
- requestLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A global progress logger, measuring the number of completed requests.
- requiredFrontSize - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The current estimation for the size of the front in IP addresses.
- REQUIREDFRONTSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- resetFetchingThreadsWaitingStats() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.DnsJavaResolver
-
- resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.FakeResolver
-
- resolve(String) - Method in class it.unimi.di.law.bubing.frontier.dns.JavaResolver
-
- resolvedVisitStates - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
The number of resolved visit states.
- resourceLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A global progress logger, measuring the number of non-duplicate resources actually stored.
- response - Variable in class it.unimi.di.law.bubing.util.FetchData
-
The response from Apache Http Components returned during the last fetch.
- response() - Method in class it.unimi.di.law.bubing.util.FetchData
-
- response() - Method in interface it.unimi.di.law.warc.filters.URIResponse
-
Returns the response part.
- response() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- RESPONSE - it.unimi.di.law.warc.records.WarcRecord.Type
-
- responseBodyMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- responseBodyMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum size (in bytes) of a response body.
- responseCacheDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- responseCacheDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
- ResponseContentExtractor - Class in it.unimi.di.law.warc.processors
-
- ResponseMatches - Class in it.unimi.di.law.warc.filters
-
A filter accepting only HTTP responses whose content stream (in ISO-8859-1 encoding) matches a regular expression.
- ResponseMatches(Pattern) - Constructor for class it.unimi.di.law.warc.filters.ResponseMatches
-
- restore() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Restores data from the given directory.
- result() - Method in class it.unimi.di.law.bubing.parser.BinaryParser
-
- result() - Method in class it.unimi.di.law.bubing.parser.HTMLParser
-
- result() - Method in interface it.unimi.di.law.bubing.parser.Parser
-
Returns the result of the processing.
- result() - Method in interface it.unimi.di.law.bubing.parser.Parser.TextProcessor
-
Returns the result of the processing.
- result() - Method in class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- results - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- resume() - Method in class it.unimi.di.law.bubing.Agent
-
- retries - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
- robots - Variable in class it.unimi.di.law.bubing.util.FetchData
-
Whether we are fetching a robots file.
- ROBOTS_PATH - Static variable in class it.unimi.di.law.bubing.frontier.VisitState
-
A special path marking a robots.txt refresh request.
- robotsExpiration - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- robotsExpiration - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The delay after which the robots.txt file is no longer considered valid.
- robotsFilter - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
- robotsRequestConfig - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The default configuration for a robots.txt request.
- robotsWarcParallelOutputStream - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The Warc file to which the downloaded robots.txt files are written (if so requested).
- rootDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- rootDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A root directory from which the remaining ones will be stemmed, if they are relative.
- run() - Method in class it.unimi.di.law.bubing.frontier.Distributor
-
- run() - Method in class it.unimi.di.law.bubing.frontier.DNSThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.DoneThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.FetchingThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.MessageThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.ParsingThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.QuickMessageThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
- run() - Method in class it.unimi.di.law.bubing.frontier.TodoThread
-
- run() - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- run(int) - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- runSequentially() - Method in class it.unimi.di.law.warc.processors.ParallelFilteredProcessorRunner
-
- RuntimeConfiguration - Class in it.unimi.di.law.bubing
-
Global data shared by all threads.
- RuntimeConfiguration(StartupConfiguration) - Constructor for class it.unimi.di.law.bubing.RuntimeConfiguration
-
- SameHost - Class in it.unimi.di.law.warc.filters
-
A filter accepting only inter-host links.
- SameHost() - Constructor for class it.unimi.di.law.warc.filters.SameHost
-
- sampleRelativeStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the sample relative standard deviation of the values added so far.
- sampleStandardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the sample standard deviation of the values added so far.
- sampleVariance() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the sample variance of the values added so far.
- scheduledLinks - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread.FrontierEnqueuer
-
- scheduleFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- scheduleFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A filter that will be applied to all links obtained by parsing a page before scheduling them.
- schedulePurge() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
- schemeAndAuthority(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Extracts the scheme+authority of an absolute BUbiNG URL in its byte-array representation.
- schemeAndAuthority(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns the concatenated URI.getScheme() and raw authority of a BUbiNG URL.
- schemeAndAuthorityAsByteArray(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Extracts the scheme+authority of an absolute BUbiNG URL in its byte-array representation.
- schemeAuthority - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
The scheme and authority visited by this visit state.
- schemeAuthority2Count - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
A synchronized, highly concurrent map from scheme+authorities to number of stored URLs.
- schemeAuthority2VisitState - Variable in class it.unimi.di.law.bubing.frontier.Distributor
-
An unsynchronized map from scheme+authorities to the corresponding VisitState.
- schemeAuthorityDelay - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- schemeAuthorityDelay - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The minimum delay between two consecutive fetches from the same scheme+authority.
- SchemeEquals - Class in it.unimi.di.law.warc.filters
-
A filter accepting only URIs whose scheme equals a certain string (typically, http).
- SchemeEquals(String) - Constructor for class it.unimi.di.law.warc.filters.SchemeEquals
-
Creates a filter that only accepts URIs with a given scheme.
- seed - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
An iterator returning URIs that are then used as a seed; this iterator may return null (when invalid or relative URLs are specified).
- seed - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A URL from which BUbiNG will start crawling.
- setComment(String) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
Sets the comment of the entry.
- setConnectionTimeout(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setDebugStream(PrintStream) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Set debug output.
- setDnsThreads(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.InspectableCachedHttpEntity
-
- setEntity(HttpEntity) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setFetchFilter(String) - Method in class it.unimi.di.law.bubing.Agent
-
- setFetchingThreads(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setFollowFilter(String) - Method in class it.unimi.di.law.bubing.Agent
-
- setInput(InputStream) - Method in class it.unimi.di.law.warc.io.AbstractWarcReader
-
- setIpDelay(long) - Method in class it.unimi.di.law.bubing.Agent
-
- setKeepAliveTime(int) - Method in class it.unimi.di.law.bubing.Agent
-
- SetLinkReceiver() - Constructor for class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- setLocale(Locale) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
Deprecated.
- setLocale(Locale) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
Deprecated.
- setMaxUrls(long) - Method in class it.unimi.di.law.bubing.Agent
-
- setName(String) - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
Sets the name of the entry.
- setNewFlowRecevier(AbstractSieve.NewFlowReceiver<K>) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
Sets the receiver for the new flows generated by this sieve.
- setParseFilter(String) - Method in class it.unimi.di.law.bubing.Agent
-
- setParsingThreads(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setReasonPhrase(String) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setReasonPhrase(String) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setResponseBodyMaxByteSize(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setRobotsExpiration(long) - Method in class it.unimi.di.law.bubing.Agent
-
- setScheduleFilter(String) - Method in class it.unimi.di.law.bubing.Agent
-
- setSchemeAuthorityDelay(long) - Method in class it.unimi.di.law.bubing.Agent
-
- setSocketTimeout(int) - Method in class it.unimi.di.law.bubing.Agent
-
- setStatusCode(int) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setStatusCode(int) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setStatusLine(ProtocolVersion, int) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setStatusLine(ProtocolVersion, int) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setStatusLine(ProtocolVersion, int, String) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setStatusLine(ProtocolVersion, int, String) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setStatusLine(StatusLine) - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- setStatusLine(StatusLine) - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- setStoreFilter(String) - Method in class it.unimi.di.law.bubing.Agent
-
- setTabSize(int) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- setUrlCacheMaxByteSize(long) - Method in class it.unimi.di.law.bubing.Agent
-
- setWorkbenchEntry(WorkbenchEntry) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Sets the workbench entry and puts this visit state in its entry if it is nonempty.
- setWorkbenchMaxByteSize(long) - Method in class it.unimi.di.law.bubing.Agent
-
- shiftKeys(int) - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Shifts left entries with the specified hash code, starting at the specified position, and
empties the resulting free entry.
- shiftKeys(int) - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Shifts left entries with the specified hash code, starting at the specified position, and
empties the resulting free entry.
- shiftKeys(int) - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
Shifts left entries with the specified hash code, starting at the specified position, and
empties the resulting free entry.
- SHORT_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- sieve - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- sieveAuxFileIOBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- sieveAuxFileIOBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The I/O buffer used to write the auxiliary file (containing URLs) and to read it back during flushes.
- sieveDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- sieveDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A directory for storing files related to the sieve.
- SieveEntry(K, V) - Constructor for class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
-
- sieveSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- sieveSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The number of slots in the sieve.
- sieveStoreIOBufferByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- sieveStoreIOBufferByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The size of the two buffers used to read the 64-bit hashes stored by the sieve during flushes.
- SimpleCharStream - Class in it.unimi.di.law.warc.filters.parser
-
An implementation of interface CharStream, where the stream is assumed to
contain only ASCII characters (without Unicode processing).
- SimpleCharStream(InputStream) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(InputStream, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(InputStream, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(InputStream, String) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(InputStream, String, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(InputStream, String, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(Reader) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(Reader, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- SimpleCharStream(Reader, int, int, int) - Constructor for class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Constructor.
- size - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Number of entries in the set.
- size - Variable in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Number of entries in the set.
- size - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The overall number of elements in the queues.
- size - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
Number of entries in the stripe.
- size() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
Computes the size (i.e., the number of URLs) of this visit state.
- size() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
The number of visit states.
- size() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Returns the number of visit states currently in the visit-state queue.
- size() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntrySet
-
Returns the size (number of entries) of the workbench.
- size() - Method in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
- size() - Method in interface it.unimi.di.law.bubing.parser.Parser.LinkReceiver
-
- size() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DiskNewFlow
-
- size() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Deprecated.
- size() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Deprecated.
- size() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Deprecated.
- size() - Method in class it.unimi.di.law.bubing.util.LockFreeQueue
-
Returns the (approximate) size of this queue.
- size() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Deprecated.
- size() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
-
Returns the number of elements in this queue.
- size64() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
- size64() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
Returns the overall number of elements in the queues.
- size64() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the number of values added so far.
- size64() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
- skip(FastBufferedInputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
-
- skip(FastBufferedInputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
Skip an object, usually without deserializing it.
- skip(FastBufferedInputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
-
- SKIP_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- skipEntry() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchiveReader
-
- snap() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
Snaps fields to files in the given directory.
- socketTimeout - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- socketTimeout - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The socket timeout.
- source - Variable in class it.unimi.di.law.bubing.util.Link
-
- spamDetectionPeriodicity - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- spamDetectionPeriodicity - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The number of pages per scheme+authority after which spam detection is performed again periodically.
- spamDetectionThreshold - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- spamDetectionThreshold - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The number of pages per scheme+authority after which spam detection is performed.
- spamDetector - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- SpamDetector<T> - Interface in it.unimi.di.law.bubing.spam
-
A detector for spam sites.
- spamDetectorUri - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
An optional SpamDetector; this URI should point to a serialized instance.
- spammicity - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
The spammicity score, if computed; -1, otherwise.
- SpamTextProcessor - Class in it.unimi.di.law.bubing.parser
-
- SpamTextProcessor(Object2LongFunction<MutableString>) - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- SpamTextProcessor(String) - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor
-
- SpamTextProcessor.TermCount - Class in it.unimi.di.law.bubing.parser
-
- specialToken - Variable in class it.unimi.di.law.warc.filters.parser.Token
-
This field is used to access special tokens that occur prior to this
token, but after the immediately preceding regular (non-special) token.
- speedDist - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The logarithmically binned statistics of download speed in bits/s.
- standardDeviation() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the standard deviation of the values added so far.
- standardFilters() - Static method in class it.unimi.di.law.warc.filters.Filters
-
Returns a list of the standard filter classes.
- start() - Method in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Parser.
- start(long) - Method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Starts all progress loggers.
- startOfHost(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Finds the start of the host part in a URL.
- startOfpathAndQuery(byte[]) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns the starting position (i.e., the slash) of the path+query in the given BUbiNG URL.
- startPaused - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- startPaused - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
Whether we should start in paused state.
- startTag(StartTag) - Method in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
- startTags - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser.DigestAppendable
-
Cached byte representations of all opening tags.
- startTime - Variable in class it.unimi.di.law.bubing.util.FetchData
-
System.currentTimeMillis() when the GET request was issued.
- StartupConfiguration - Class in it.unimi.di.law.bubing
-
A class whose public fields represent the configuration of BUbiNG at startup.
- StartupConfiguration(File) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
-
Creates a configuration starting from a given file.
- StartupConfiguration(File, Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
-
Creates a configuration starting from a given file and possibly adding and/or overriding some
properties with new values.
- StartupConfiguration(String) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
-
Creates a configuration starting from a given file.
- StartupConfiguration(String, Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
-
Creates a configuration starting from a given file and possibly adding and/or overriding some
properties with new values.
- StartupConfiguration(Configuration) - Constructor for class it.unimi.di.law.bubing.StartupConfiguration
-
Populate the object fields starting from the given configuration.
- StartupConfiguration.DnsResolverSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for the DnsResolver class specification.
- StartupConfiguration.FilterSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for filter specifications.
- StartupConfiguration.ManyValuesSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for specifications that may have multiple values.
- StartupConfiguration.OptionalSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for optional specifications with a default parameter.
- StartupConfiguration.StoreSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for the Store class specification.
- StartupConfiguration.TimeSpecification - Annotation Type in it.unimi.di.law.bubing
-
A marker for time specifications; such specifications are by default in milliseconds, but it is possible
to use suffixes ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days).
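The suffix rule above can be illustrated with a small converter to milliseconds. This is only a sketch under stated assumptions: the class TimeSpecSketch and the method parseMillis are hypothetical names, and the tolerance for surrounding whitespace is a choice made here, not documented BUbiNG behavior.

```java
import java.util.concurrent.TimeUnit;

final class TimeSpecSketch {
    /** Converts a specification such as "250", "250ms", "5s", "2m", "1h" or "1d" to milliseconds. */
    static long parseMillis(final String spec) {
        final String s = spec.trim();
        if (s.isEmpty()) throw new IllegalArgumentException("Empty time specification");
        // Check "ms" before the one-letter suffixes, since it also ends in 's'.
        if (s.endsWith("ms")) return Long.parseLong(s.substring(0, s.length() - 2).trim());
        final char last = s.charAt(s.length() - 1);
        // No suffix: the value is already in milliseconds.
        if (Character.isDigit(last)) return Long.parseLong(s);
        final long n = Long.parseLong(s.substring(0, s.length() - 1).trim());
        switch (last) {
            case 's': return TimeUnit.SECONDS.toMillis(n);
            case 'm': return TimeUnit.MINUTES.toMillis(n);
            case 'h': return TimeUnit.HOURS.toMillis(n);
            case 'd': return TimeUnit.DAYS.toMillis(n);
            default: throw new IllegalArgumentException("Unknown time suffix in: " + spec);
        }
    }
}
```

Under these assumptions, parseMillis("5s") yields 5000 and parseMillis("250") yields 250.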
- STATIC_LEXER_ERROR - Static variable in error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
An attempt was made to create a second instance of a static token manager.
- staticFlag - Static variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
Whether parser is static.
- statsThread - Variable in class it.unimi.di.law.bubing.frontier.Distributor
-
The thread printing statistics.
- StatsThread - Class in it.unimi.di.law.bubing.frontier
-
- StatsThread(Frontier, Distributor) - Constructor for class it.unimi.di.law.bubing.frontier.StatsThread
-
Creates the thread.
- StatusCategory - Class in it.unimi.di.law.warc.filters
-
A filter accepting only fetched responses whose status category (status/100) has a certain value.
- StatusCategory(int) - Constructor for class it.unimi.di.law.warc.filters.StatusCategory
-
Creates a filter that only accepts responses of the given category.
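The status category is the integer quotient of the HTTP status code by 100, so 200 and 204 both fall in category 2 and 404 falls in category 4. A minimal sketch of that arithmetic; StatusCategorySketch and its methods are hypothetical names used only for illustration and do not reproduce the StatusCategory class:

```java
final class StatusCategorySketch {
    /** Returns the status category, i.e., the status code divided by 100 (integer division). */
    static int category(final int status) {
        return status / 100;
    }

    /** Returns true if the given status code belongs to the given category. */
    static boolean accepts(final int category, final int status) {
        return category(status) == category;
    }
}
```

For example, accepts(2, 200) holds, while accepts(2, 301) does not.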
- stop - Variable in class it.unimi.di.law.bubing.frontier.DNSThread
-
Whether we should stop (used also to reduce the number of threads).
- stop - Variable in class it.unimi.di.law.bubing.frontier.DoneThread
-
When set to true, this thread will complete its execution.
- stop - Variable in class it.unimi.di.law.bubing.frontier.FetchingThread
-
Whether we should stop (used also to reduce the number of threads).
- stop - Variable in class it.unimi.di.law.bubing.frontier.MessageThread
-
When set to true, this thread will complete its execution.
- stop - Variable in class it.unimi.di.law.bubing.frontier.ParsingThread
-
Whether we should stop (used also to reduce the number of threads).
- stop - Variable in class it.unimi.di.law.bubing.frontier.QuickMessageThread
-
When set to true, this thread will complete its execution.
- stop() - Method in class it.unimi.di.law.bubing.Agent
-
- stopping - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
Whether the crawler is currently being stopped.
- store - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The store.
- store(URI, HttpResponse, boolean, byte[], String) - Method in interface it.unimi.di.law.bubing.store.Store
-
- store(URI, HttpResponse, boolean, byte[], String) - Method in class it.unimi.di.law.bubing.store.UnbufferedFileStore
-
- store(URI, HttpResponse, boolean, byte[], String) - Method in class it.unimi.di.law.bubing.store.WarcStore
-
- Store - Interface in it.unimi.di.law.bubing.store
-
An interface for components that are able to store pages.
- STORE_NAME - Static variable in class it.unimi.di.law.bubing.store.UnbufferedFileStore
-
- STORE_NAME - Static variable in class it.unimi.di.law.bubing.store.WarcStore
-
- storeClass - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- storeClass - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The class used to Store the resources.
- storeDir - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- storeDir - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A directory where the retrieved content will be written.
- storeFilter - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- storeFilter - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
A filter that will be applied to all fetched resources to decide whether to store them.
- StringHttpMessages - Class in it.unimi.di.law.warc.util
-
Mock implementations of some AbstractHttpMessage.
- StringHttpMessages() - Constructor for class it.unimi.di.law.warc.util.StringHttpMessages
-
- StringHttpMessages.HttpRequest - Class in it.unimi.di.law.warc.util
-
A mock implementation of HttpRequest.
- StringHttpMessages.HttpResponse - Class in it.unimi.di.law.warc.util
-
A mock implementation of HttpResponse using strings (based on ByteArrayEntity).
- Stripe() - Constructor for class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
Creates a new stripe.
- Stripe(long, float) - Constructor for class it.unimi.di.law.bubing.util.FastApproximateByteArrayCache.Stripe
-
Creates a new stripe.
- SUB_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- subDir(String, String) - Static method in class it.unimi.di.law.bubing.StartupConfiguration
-
Returns a File object representing a child relative
to a parent, or just the child, if absolute.
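The resolution rule described above can be sketched with java.io.File alone. The class name SubDirSketch is hypothetical, and the code is an illustration of the described behavior, not the BUbiNG implementation:

```java
import java.io.File;

final class SubDirSketch {
    /** Resolves child against parent, unless child is already an absolute path. */
    static File subDir(final String parent, final String child) {
        final File c = new File(child);
        // An absolute child ignores the parent entirely.
        return c.isAbsolute() ? c : new File(parent, child);
    }
}
```

A relative child such as "sieve" is placed under the parent directory, whereas an absolute child is returned unchanged, which lets a configuration mix a root directory with absolute overrides.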
- subSequence(int, int) - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
- successors(CharSequence) - Method in class it.unimi.di.law.bubing.test.ImmutableGraphNamedGraphServer
-
- successors(CharSequence) - Method in interface it.unimi.di.law.bubing.test.NamedGraphServer
-
If src corresponds to the name of a node in the graph, this method returns
an array with the names of its successors (in some order); otherwise, it returns null.
- successors(CharSequence) - Method in class it.unimi.di.law.bubing.test.RandomNamedGraphServer
-
- sum() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the sum of the values added so far.
- suspend() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Suspends this queue.
- suspend() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Suspends this queue.
- SwitchTo(int) - Method in class it.unimi.di.law.warc.filters.parser.FilterParserTokenManager
-
Switch to specified lex state.
- tabSize - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- tail - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
-
The pointer to the tail of the list (the most recently enqueued element).
- take() - Method in class it.unimi.di.law.warc.util.ReorderingBlockingQueue
-
Returns the element with the next timestamp, waiting until it is available.
- target - Variable in class it.unimi.di.law.bubing.util.Link
-
- termCount - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
A map from term indices to counts for the pages of this host.
- TermCount() - Constructor for class it.unimi.di.law.bubing.parser.SpamTextProcessor.TermCount
-
- termCountUpdates - Variable in class it.unimi.di.law.bubing.frontier.VisitState
-
- textProcessor - Variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
A text processor, or null.
- THRESHOLD - Static variable in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
The number of zeroes that must appear to cause the page to be considered probably
binary.
- toByteArray(BubingJob) - Method in class it.unimi.di.law.bubing.Agent
-
- toByteArray(String) - Static method in class it.unimi.di.law.bubing.util.Util
-
Returns a byte-array representation of an ASCII string.
- toByteArray(URI) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Returns an ASCII byte-array representation of a BUbiNG URL.
- toByteArrayList(String, ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.Util
-
- toByteArrayList(URI, ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.BURL
-
Writes an ASCII representation of a BUbiNG URL in a ByteArrayList.
- todo - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
- TodoThread - Class in it.unimi.di.law.bubing.frontier
-
- TodoThread(Frontier) - Constructor for class it.unimi.di.law.bubing.frontier.TodoThread
-
Instantiates the thread.
- toHexString(byte[]) - Static method in class it.unimi.di.law.warc.util.Util
-
Returns the hexadecimal string representation of a digest.
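A digest is a byte array, and its hexadecimal form uses two digits per byte. The following sketch assumes lowercase digits and no separators; both are assumptions made here, since the index does not state the exact output format, and HexSketch is a hypothetical class name:

```java
final class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Returns two lowercase hexadecimal digits for each byte of the digest. */
    static String toHexString(final byte[] digest) {
        final char[] out = new char[digest.length * 2];
        for (int i = 0; i < digest.length; i++) {
            final int b = digest[i] & 0xFF; // treat the byte as unsigned
            out[2 * i] = DIGITS[b >>> 4];       // high nibble
            out[2 * i + 1] = DIGITS[b & 0x0F];  // low nibble
        }
        return new String(out);
    }
}
```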
- toInputStream() - Method in class it.unimi.di.law.warc.util.ByteArraySessionOutputBuffer
-
- token - Variable in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Current token.
- Token - Class in it.unimi.di.law.warc.filters.parser
-
Describes the input token stream.
- Token() - Constructor for class it.unimi.di.law.warc.filters.parser.Token
-
No-argument constructor
- Token(int) - Constructor for class it.unimi.di.law.warc.filters.parser.Token
-
Constructs a new token for the specified Image.
- Token(int, String) - Constructor for class it.unimi.di.law.warc.filters.parser.Token
-
Constructs a new token for the specified Image and Kind.
- token_source - Variable in class it.unimi.di.law.warc.filters.parser.FilterParser
-
Generated Token Manager.
- tokenImage - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
Literal token values.
- tokenImage - Variable in exception it.unimi.di.law.warc.filters.parser.ParseException
-
This is a reference to the "tokenImage" array of the generated
parser within which the parse error occurred.
- TokenMgrError - Error in it.unimi.di.law.warc.filters.parser
-
Token Manager Error.
- TokenMgrError() - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
No arg constructor.
- TokenMgrError(boolean, int, int, int, String, int, int) - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Full Constructor.
- TokenMgrError(String, int) - Constructor for error it.unimi.di.law.warc.filters.parser.TokenMgrError
-
Constructor with message and reason.
- TooSlowException - Exception in it.unimi.di.law.bubing.util
-
A marker IOException for sites that return data too slowly.
- TooSlowException() - Constructor for exception it.unimi.di.law.bubing.util.TooSlowException
-
- TooSlowException(String) - Constructor for exception it.unimi.di.law.bubing.util.TooSlowException
-
- toOutputStream(String, OutputStream) - Static method in class it.unimi.di.law.bubing.util.Util
-
Writes a string to an output stream, discarding higher order bits.
- toSortedPrefixFreeCharArrays(Set<String>) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
- toStream(ByteArrayList, OutputStream) - Method in class it.unimi.di.law.bubing.sieve.ByteArrayListByteSerializerDeserializer
-
- toStream(CharSequence, OutputStream) - Method in class it.unimi.di.law.bubing.sieve.CharSequenceByteSerializerDeserializer
-
- toStream(V, OutputStream) - Method in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
Serializes an object starting from a given offset of a byte array.
- toString() - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
- toString() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
- toString() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchVirtualizer
-
- toString() - Method in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- toString() - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
-
- toString() - Method in class it.unimi.di.law.bubing.StartupConfiguration
-
- toString() - Method in class it.unimi.di.law.bubing.util.BubingJob
-
A string representation of this job.
- toString() - Method in class it.unimi.di.law.bubing.util.ByteArrayCharSequence
-
- toString() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
- toString() - Method in class it.unimi.di.law.bubing.util.FetchData
-
- toString() - Method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
A string representation of the state of this object, that is just the prefix allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.DigestEquals
-
A string representation of the state of this object, that is just the digest allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
A string representation of the state of this object, that is just the threshold used.
- toString() - Method in class it.unimi.di.law.warc.filters.HostEndsWith
-
A string representation of the state of this object, that is just the suffix allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
-
A string representation of the state of this object, that is just the host suffixes allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.HostEquals
-
A string representation of the state of this object, that is just the host allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.IsHttpResponse
-
A string representation of the state of this object.
- toString() - Method in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
A string representation of the state of this filter.
- toString() - Method in class it.unimi.di.law.warc.filters.parser.Token
-
Returns the image.
- toString() - Method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
A string representation of the state of this object, that is just the suffixes allowed.
- toString() - Method in class it.unimi.di.law.warc.filters.ResponseMatches
-
A string representation of the state of this filter.
- toString() - Method in class it.unimi.di.law.warc.filters.SameHost
-
Returns SameHost().
- toString() - Method in class it.unimi.di.law.warc.filters.SchemeEquals
-
A string representation of this filter.
- toString() - Method in class it.unimi.di.law.warc.filters.StatusCategory
-
A string representation of this filter.
- toString() - Method in class it.unimi.di.law.warc.filters.URLEquals
-
Get a string representation of this filter
- toString() - Method in class it.unimi.di.law.warc.filters.URLMatchesRegex
-
Get a string representation of this filter.
- toString() - Method in class it.unimi.di.law.warc.filters.URLShorterThan
-
Get a string representation of this filter
- toString() - Method in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
- toString() - Method in class it.unimi.di.law.warc.records.HttpRequestWarcRecord
-
- toString() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- toString() - Method in class it.unimi.di.law.warc.records.InfoWarcRecord
-
- toString() - Method in enum it.unimi.di.law.warc.records.WarcHeader.Name
-
- toString() - Method in enum it.unimi.di.law.warc.records.WarcRecord.Type
-
- toString() - Method in class it.unimi.di.law.warc.util.StringHttpMessages.HttpResponse
-
- toString(byte[]) - Static method in class it.unimi.di.law.bubing.util.Util
-
Returns a string representation of an ASCII byte array.
- toString(byte[], int, int) - Static method in class it.unimi.di.law.bubing.util.Util
-
Returns a string representation of an ASCII byte-array fragment.
- toString(char[][]) - Static method in class it.unimi.di.law.bubing.util.URLRespectsRobots
-
Gracefully prints a robot filter using at most 30 prefixes.
- toString(int[]) - Static method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Returns an integer array as a string, but does not print trailing zeroes.
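Dropping trailing zeroes keeps sparse histograms readable in log output. A minimal sketch, assuming the bracketed, comma-separated layout of java.util.Arrays.toString; the actual format used by StatsThread is not stated in this index, and TrailingZeroesSketch is a hypothetical class name:

```java
import java.util.Arrays;

final class TrailingZeroesSketch {
    /** Formats the array up to and including its last nonzero element. */
    static String toString(final int[] a) {
        int end = a.length;
        // Skip backwards over the run of zeroes at the end of the array.
        while (end > 0 && a[end - 1] == 0) end--;
        return Arrays.toString(Arrays.copyOf(a, end));
    }
}
```

Zeroes in the middle of the array are kept; only the final run is removed, so an all-zero array is rendered as an empty list.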
- toString(ByteArrayList) - Static method in class it.unimi.di.law.bubing.util.Util
-
Returns a string representation of an ASCII byte array.
- toString(Object...) - Method in class it.unimi.di.law.warc.filters.AbstractFilter
-
A helper method that generates a string version of this filter (mainly
useful for atomic, i.e., class-based, filters).
- toString(AtomicLongArray) - Static method in class it.unimi.di.law.bubing.frontier.StatsThread
-
Returns an AtomicLongArray as a string, but does not print trailing zeroes.
- ToStringWriter - Class in it.unimi.di.law.warc.processors
-
- trackLineColumn - Variable in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- TRAILER_LEN - Static variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive
-
- transferredBytes - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The overall number of transferred bytes.
- TRANSFERREDBYTES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- transferredBytesLogger - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
A global progress logger, measuring the number of transferred bytes.
- trim() - Method in class it.unimi.di.law.bubing.util.ByteArrayDiskQueue
-
Trims this queue.
- trim() - Method in class it.unimi.di.law.bubing.util.ObjectDiskQueue
-
Trims this queue.
- TRUE - Static variable in class it.unimi.di.law.warc.filters.Filters
-
The constantly true filter.
- TRUE - Static variable in interface it.unimi.di.law.warc.filters.parser.FilterParserConstants
-
RegularExpression Id.
- truncated - Variable in class it.unimi.di.law.bubing.util.FetchData
-
True if the last fetch was truncated because of an exceedingly long response body.
- type() - Method in annotation type it.unimi.di.law.bubing.StartupConfiguration.FilterSpecification
-
- UnbufferedFileStore - Class in it.unimi.di.law.bubing.store
-
An unbuffered, directly-to-disk store, mainly for debugging purposes.
- UnbufferedFileStore(RuntimeConfiguration) - Constructor for class it.unimi.di.law.bubing.store.UnbufferedFileStore
-
- uncompressedSkipLength - Variable in class it.unimi.di.law.warc.io.gzarc.GZIPArchive.Entry
-
The length of the entry once uncompressed.
- UncompressedWarcReader - Class in it.unimi.di.law.warc.io
-
- UncompressedWarcReader(InputStream) - Constructor for class it.unimi.di.law.warc.io.UncompressedWarcReader
-
- UncompressedWarcWriter - Class in it.unimi.di.law.warc.io
-
- UncompressedWarcWriter(OutputStream) - Constructor for class it.unimi.di.law.warc.io.UncompressedWarcWriter
-
- unknownHosts - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The queue of unknown hosts.
- unlock() - Method in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.LockedMap
-
- unresolved - Variable in class it.unimi.di.law.bubing.frontier.StatsThread
-
The number of path+queries living in an unresolved visit state.
- update(K, V, V) - Method in class it.unimi.di.law.bubing.sieve.AbstractSieve.DefaultUpdateStrategy
-
- update(K, V, V) - Method in interface it.unimi.di.law.bubing.sieve.AbstractSieve.UpdateStrategy
-
Computes the new value to be put in the store when a duplicate key is found.
- updateFetchingThreadsWaitingStats(long) - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- UpdateLineColumn(char) - Method in class it.unimi.di.law.warc.filters.parser.SimpleCharStream
-
- updateRequestedFrontSize() - Method in class it.unimi.di.law.bubing.frontier.Frontier
-
- updateStrategy - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
- updateTermCount(Short2ShortMap) - Method in class it.unimi.di.law.bubing.frontier.VisitState
-
- uri() - Method in class it.unimi.di.law.bubing.util.FetchData
-
- uri() - Method in interface it.unimi.di.law.warc.filters.URIResponse
-
Returns the URI part.
- uri() - Method in class it.unimi.di.law.warc.records.HttpResponseWarcRecord
-
- URIResponse - Interface in it.unimi.di.law.warc.filters
-
An interface implemented by all classes able to expose an HttpResponse and URI, e.g.
- url - Variable in class it.unimi.di.law.bubing.util.BubingJob
-
- url - Variable in class it.unimi.di.law.bubing.util.FetchData
-
The BUbiNG URL associated with this request.
- urlCache - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The URL cache.
- urlCacheMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- urlCacheMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum size of the URL cache in bytes.
- URLDigestFinalPositionWriter - Class in it.unimi.di.law.warc.processors
-
- URLDigestFinalPositionWriter(String, String) - Constructor for class it.unimi.di.law.warc.processors.URLDigestFinalPositionWriter
-
- URLDigestStatusLengthWriter - Class in it.unimi.di.law.warc.processors
-
- URLDigestStatusLengthWriter() - Constructor for class it.unimi.di.law.warc.processors.URLDigestStatusLengthWriter
-
- URLDigestWriter - Class in it.unimi.di.law.warc.processors
-
- URLEQUAL_PATTERN - Static variable in class it.unimi.di.law.bubing.parser.HTMLParser
-
The pattern prefixing the URL in a META HTTP-EQUIV element of refresh type.
- URLEquals - Class in it.unimi.di.law.warc.filters
-
A filter accepting only a given URI.
- URLEquals(String) - Constructor for class it.unimi.di.law.warc.filters.URLEquals
-
Creates a filter that only accepts URIs equal to a given URI.
- URLMatchesRegex - Class in it.unimi.di.law.warc.filters
-
A filter accepting only URIs that match a certain regular expression.
- URLMatchesRegex(String) - Constructor for class it.unimi.di.law.warc.filters.URLMatchesRegex
-
Creates a filter that only accepts URLs matching a given regular expression.
- URLPositionWriter - Class in it.unimi.di.law.warc.processors
-
- URLPositionWriter(String, String) - Constructor for class it.unimi.di.law.warc.processors.URLPositionWriter
-
- URLRespectsRobots - Class in it.unimi.di.law.bubing.util
-
A class providing static methods to parse robots.txt into arrays of char arrays and
handle robot filtering.
- urls - Variable in class it.unimi.di.law.bubing.parser.HTMLParser.SetLinkReceiver
-
The set of URLs gathered so far.
- URLShorterThan - Class in it.unimi.di.law.warc.filters
-
A filter accepting only URIs whose overall length is below a given threshold.
- URLShorterThan(int) - Constructor for class it.unimi.di.law.warc.filters.URLShorterThan
-
Creates a filter that only accepts URLs shorter than the given threshold.
- URLWriter - Class in it.unimi.di.law.warc.processors
-
- usage - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues.QueueData
-
The number of bytes used by the list.
- USE_BURL_PROPERTY - Static variable in class it.unimi.di.law.warc.records.AbstractWarcRecord
-
- used - Variable in class it.unimi.di.law.bubing.util.ByteArrayDiskQueues
-
The overall number of bytes used by elements in the queues.
- userAgent - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- userAgent - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The User-Agent header used for HTTP requests.
- userAgentFrom - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- userAgentFrom - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The From header used for HTTP requests.
- Util - Class in it.unimi.di.law.bubing.util
-
Generic static utility method container.
- Util - Class in it.unimi.di.law.warc.util
-
A class containing some utility functions.
- Util() - Constructor for class it.unimi.di.law.bubing.util.Util
-
- Util() - Constructor for class it.unimi.di.law.warc.util.Util
-
- value - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve.SieveEntry
-
- value - Variable in class it.unimi.di.law.bubing.util.ConcurrentCountingMap.Stripe
-
The array of values.
- value - Variable in enum it.unimi.di.law.warc.records.WarcHeader.Name
-
- value() - Method in annotation type it.unimi.di.law.bubing.StartupConfiguration.OptionalSpecification
-
- valueOf() - Static method in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
Get a new IsProbablyBinary
that will accept only HTTP responses whose content stream appears to be binary.
- valueOf() - Static method in class it.unimi.di.law.warc.filters.SameHost
-
Get a SameHost
filter.
- valueOf(String) - Static method in enum it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.ContentTypeStartsWith
-
Get a new ContentTypeStartsWith
that will accept only fetched responses whose content type starts with a given string
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.DigestEquals
-
Get a new DigestEquals
that will accept only WarcRecord
whose digest is a given string
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.DuplicateSegmentsLessThan
-
Get a new DuplicateSegmentsLessThan
that will accept only URIs whose path does not contain too many duplicate segments.
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEndsWith
-
Get a new HostEndsWith
that will accept only URIs whose host ends with the given suffix
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEndsWithOneOf
-
Get a new HostEndsWithOneOf
that will accept only URIs whose host part suffix is one of the given suffixes
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.HostEquals
-
Get a new HostEquals
that will accept only URIs whose host part is equal to spec
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.IsHttpResponse
-
Get a new IsHttpResponse that will accept only WarcRecords that are HTTP/HTTPS responses.
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.IsProbablyBinary
-
Deprecated.
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.PathEndsWithOneOf
-
Get a new PathEndsWithOneOf
that will accept only URIs whose path ends with one of the allowed suffixes
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.ResponseMatches
-
Get a new content matcher that will accept only responses whose content stream matches the regular expression.
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.SchemeEquals
-
Get a new SchemeEquals accepting only URIs whose scheme equals the given string
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.StatusCategory
-
Get a new StatusCategory
accepting only fetched responses whose status category (status/100) has a given value
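The "status/100" in the entry above is integer division of the HTTP status code, so 404 falls in category 4 and 200 in category 2. A minimal sketch of that arithmetic follows; the class StatusCategorySketch and its methods are hypothetical names used only for illustration and are not part of the it.unimi.di.law.warc.filters API.

```java
// Illustrative only: StatusCategorySketch is a hypothetical helper,
// not the StatusCategory filter itself.
final class StatusCategorySketch {
    private StatusCategorySketch() {}

    // Integer division drops the last two digits: 404 -> 4, 200 -> 2.
    static int statusCategory(final int statusCode) {
        return statusCode / 100;
    }

    // True if the status code falls in the requested category.
    static boolean accepts(final int statusCode, final int category) {
        return statusCategory(statusCode) == category;
    }
}
```

Under this reading, a filter built for category 2 would presumably keep successful responses only.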
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLEquals
-
Get a new URLEquals
accepting only a given URI
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLMatchesRegex
-
Get a new URLMatchesRegex
accepting only URIs that match a certain regular expression
- valueOf(String) - Static method in class it.unimi.di.law.warc.filters.URLShorterThan
-
Get a new URLShorterThan
- valueOf(String) - Static method in enum it.unimi.di.law.warc.records.WarcHeader.Name
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(Header) - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
-
Determines the WARC record type given the WARC-Type
header.
- values() - Static method in enum it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum it.unimi.di.law.warc.records.WarcHeader.Name
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum it.unimi.di.law.warc.records.WarcRecord.Type
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- valueSerDeser - Variable in class it.unimi.di.law.bubing.sieve.AbstractSieve
-
- variance() - Method in class it.unimi.di.law.bubing.util.ConcurrentSummaryStats
-
Returns the variance of the values added so far.
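One common way to maintain a variance over "values added so far" without storing them is Welford's online update. The sketch below is single-threaded and purely illustrative: the class RunningVarianceSketch is a hypothetical name, and this index does not say which algorithm ConcurrentSummaryStats uses or whether it reports the population or the sample variance, so both are shown.

```java
// Illustrative only: a single-threaded Welford accumulator, not the
// ConcurrentSummaryStats implementation.
final class RunningVarianceSketch {
    private long count;   // number of values added so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Adds a value, updating the mean and the squared-deviation sum in O(1).
    void add(final double x) {
        count++;
        final double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Variance with divisor n; NaN if no value has been added.
    double populationVariance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    // Variance with divisor n - 1; NaN with fewer than two values.
    double sampleVariance() {
        return count < 2 ? Double.NaN : m2 / (count - 1);
    }
}
```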
- vByteLength(int) - Static method in class it.unimi.di.law.bubing.util.Util
-
Returns the length of the vByte encoding of a natural number.
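As a rough illustration of what such a length computation involves, the sketch below assumes the usual variable-byte scheme, in which each output byte carries 7 payload bits plus a continuation flag. The class VByteLengthSketch is a hypothetical name, and the actual implementation in it.unimi.di.law.bubing.util.Util may differ.

```java
// Illustrative only: assumes 7 payload bits per encoded byte.
final class VByteLengthSketch {
    private VByteLengthSketch() {}

    // Number of bytes needed to encode the natural number x.
    static int vByteLength(final int x) {
        if (x < 0) throw new IllegalArgumentException("Not a natural number: " + x);
        // Significant bits of x; this is 0 when x == 0.
        final int bits = 32 - Integer.numberOfLeadingZeros(x);
        // Round up to whole 7-bit groups; zero still takes one byte.
        return Math.max(1, (bits + 6) / 7);
    }
}
```

Under this assumption, values below 128 take one byte, values below 16384 take two, and the largest int takes five.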
- virtualizer - Variable in class it.unimi.di.law.bubing.frontier.Frontier
-
The workbench virtualizer used by this frontier.
- virtualizerMaxByteSize - Variable in class it.unimi.di.law.bubing.RuntimeConfiguration
-
- virtualizerMaxByteSize - Variable in class it.unimi.di.law.bubing.StartupConfiguration
-
The maximum size of the virtualizer in bytes; this field is ignored if the virtualizer does not need to be sized.
- VIRTUALQUEUESBIRTHTIME - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- VIRTUALQUEUESIZES - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- visitState - Variable in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
The array of keys.
- visitState - Variable in class it.unimi.di.law.bubing.util.FetchData
-
The visit state associated with this request.
- VisitState - Class in it.unimi.di.law.bubing.frontier
-
A class maintaining the current state of the visit of a specific scheme+authority.
- VisitState(Frontier, byte[]) - Constructor for class it.unimi.di.law.bubing.frontier.VisitState
-
Creates a visit state.
- visitStates() - Method in class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Returns the array of visit states; the order is arbitrary.
- visitStates() - Method in class it.unimi.di.law.bubing.frontier.WorkbenchEntry
-
Returns the visit states currently in the queue.
- VisitStateSet - Class in it.unimi.di.law.bubing.frontier
-
A data structure representing the set of
visit states created so far.
- VisitStateSet() - Constructor for class it.unimi.di.law.bubing.frontier.VisitStateSet
-
Creates an empty visit state set.
- VISITSTATESETSIZE - it.unimi.di.law.bubing.frontier.Frontier.PropertyKeys
-
- VOID - Static variable in interface it.unimi.di.law.bubing.sieve.ByteSerializerDeserializer
-
A NOP-serializer-deserializer for Void (only for values).