All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, java.lang.Runnable

public final class FetchingThread
extends java.lang.Thread
implements java.io.Closeable
A thread that fetches pages whose contents are then processed by a ParsingThread.
Instances of this class iteratively extract from Frontier.todo (using
polling and exponential backoff) a ready VisitState that has been
previously enqueued by the TodoThread and use their embedded
FetchData to fetch the first URL (or possibly the first few URLs, depending on RuntimeConfiguration.keepAliveTime)
from the VisitState queue. Once the fetch is over, the embedded
FetchData is enqueued in Frontier.results and the thread
waits for a signal on the FetchData. The ParsingThread that
retrieves the FetchData must signal back that the FetchData
instance can be reused once it has finished using its contents.
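The hand-off described above can be pictured with a minimal sketch. It is not BUbiNG's code: `Payload`, `HandoffSketch`, `markReusable` and `awaitReusable` are hypothetical names, a `ConcurrentLinkedQueue` stands in for Frontier.results, and a plain monitor (`wait`/`notify`) is assumed as the signalling mechanism.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for FetchData: a result plus a "reusable" flag guarded by its monitor. */
final class Payload {
    String content;            // filled in by the fetching side
    private boolean reusable;  // set by the parsing side when it is done

    /** Called by the parser once it no longer needs the contents. */
    synchronized void markReusable() {
        reusable = true;
        notify();
    }

    /** Called by the fetcher: blocks until the parser has signalled, then resets the flag. */
    synchronized void awaitReusable() throws InterruptedException {
        while (!reusable) wait();
        reusable = false;
    }
}

final class HandoffSketch {
    /** Runs one fetch/parse round trip and returns what the parser saw. */
    static String roundTrip(String fetched) {
        ConcurrentLinkedQueue<Payload> results = new ConcurrentLinkedQueue<>();
        Payload payload = new Payload();
        StringBuilder seen = new StringBuilder();

        // The "parsing thread": takes a payload, reads it, then signals back.
        Thread parser = new Thread(() -> {
            Payload p;
            while ((p = results.poll()) == null) Thread.onSpinWait();
            seen.append(p.content);
            p.markReusable();
        });
        parser.start();

        // The "fetching thread": publishes the payload, then waits for the signal.
        payload.content = fetched;
        results.add(payload);
        try {
            payload.awaitReusable();
            parser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("fetched page body"));
    }
}
```

The fetching side touches the payload again only after `awaitReusable()` returns, which is the point of the signal: the same object can then be refilled without racing the parser.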
The design of the interaction between instances of this class, the
TodoThread and instances of ParsingThread minimizes
contention by sandwiching all FetchingThread instances between two
wait-free queues (signalling back that a FetchData can be reused
causes, of course, no contention). Exponential backoff should happen rarely in
a full-speed crawl, as the todo queue is almost
always nonempty.
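As an illustration of polling with exponential backoff, here is a minimal sketch under stated assumptions. The delay bounds, the class name `BackoffPollSketch` and the use of `ConcurrentLinkedQueue` are all hypothetical choices made for the example; the actual delays and queue implementation used by BUbiNG are not given in this page.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class BackoffPollSketch {
    static final long MIN_DELAY_MS = 1;   // assumed initial delay
    static final long MAX_DELAY_MS = 64;  // assumed upper bound on the delay

    /** Set to true to ask pollers to give up (mirrors the idea of a volatile stop flag). */
    static volatile boolean stop;

    /** Next delay: double the current one, capped at MAX_DELAY_MS. */
    static long nextDelay(long delayMs) {
        return Math.min(delayMs * 2, MAX_DELAY_MS);
    }

    /** Polls until an element appears; returns null if stopped or interrupted. */
    static <T> T pollWithBackoff(Queue<T> queue) {
        long delay = MIN_DELAY_MS;
        while (!stop) {
            T item = queue.poll();
            if (item != null) return item;   // fast path: no sleeping when the queue is nonempty
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            delay = nextDelay(delay);        // back off further after each empty poll
        }
        return null;
    }

    public static void main(String[] args) {
        Queue<String> todo = new ConcurrentLinkedQueue<>();
        todo.add("http://example.com/");
        System.out.println(pollWithBackoff(todo));
    }
}
```

When the queue is almost always nonempty, the loop returns on the first `poll()` and the sleep is never reached, which matches the remark that backoff should be rare in a full-speed crawl.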
Instances of this class do not access any shared data structure, except for logging. It is expected that large instances of BUbiNG use thousands of fetching threads to download from a large number of sites in parallel.
This class implements Closeable: the close() method simply closes the underlying FetchData instance.
| Modifier and Type | Class | Description |
|---|---|---|
| protected static class | FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS | A support class that makes it possible to plug in a custom DNS resolver. |
| Modifier and Type | Field | Description |
|---|---|---|
| boolean | stop | Whether we should stop (used also to reduce the number of threads). |
| Constructor | Description |
|---|---|
| FetchingThread(Frontier frontier, int index) | Creates a new fetching thread. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | abort() | Causes the FetchData used by this thread to be aborted via FetchData.abort() (whence the corresponding connection is closed). |
| void | close() | |
| static org.apache.http.cookie.Cookie[] | getCookies(java.net.URI url, org.apache.http.client.CookieStore cookieStore, int cookieMaxByteSize) | Returns the list of cookies in a given store in the form of an array, limiting their overall size (only the maximal prefix of cookies satisfying the size limit is returned). |
| void | run() | |
Methods inherited from class java.lang.Object: equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from class java.lang.Thread: activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield

public volatile boolean stop

public FetchingThread(Frontier frontier, int index) throws java.security.NoSuchAlgorithmException, java.lang.IllegalArgumentException, java.io.IOException
Parameters:
frontier - a reference to the Frontier.
index - the index of this thread (only for logging purposes).
Throws:
java.security.NoSuchAlgorithmException
java.lang.IllegalArgumentException
java.io.IOException

public static org.apache.http.cookie.Cookie[] getCookies(java.net.URI url, org.apache.http.client.CookieStore cookieStore, int cookieMaxByteSize)
Parameters:
url - the URL which generated the cookies (for logging purposes).
cookieStore - a cookie store, usually generated by a response.
cookieMaxByteSize - the maximum overall size of cookies in bytes.

public void abort()
Causes the FetchData used by this thread to be aborted via FetchData.abort() (whence the corresponding connection is closed).

public void run()
Specified by: run in interface java.lang.Runnable
Overrides: run in class java.lang.Thread

public void close() throws java.io.IOException
Specified by: close in interface java.lang.AutoCloseable
Specified by: close in interface java.io.Closeable
Throws:
java.io.IOException
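
The "maximal prefix" behaviour described for getCookies can be illustrated with a small, self-contained sketch. It does not use the Apache HttpClient types: `SimpleCookie` and `maximalPrefix` are hypothetical, and measuring a cookie as the UTF-8 length of its name plus its value is an assumption, since this page does not say how BUbiNG computes cookie size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CookiePrefixSketch {
    /** Hypothetical minimal cookie: just a name and a value. */
    static final class SimpleCookie {
        final String name;
        final String value;

        SimpleCookie(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Assumed size measure: UTF-8 bytes of the name plus UTF-8 bytes of the value. */
    static int byteSize(SimpleCookie c) {
        return c.name.getBytes(StandardCharsets.UTF_8).length
             + c.value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Returns the longest prefix of the list whose total size does not exceed maxByteSize. */
    static List<SimpleCookie> maximalPrefix(List<SimpleCookie> cookies, int maxByteSize) {
        List<SimpleCookie> kept = new ArrayList<>();
        int total = 0;
        for (SimpleCookie c : cookies) {
            total += byteSize(c);
            if (total > maxByteSize) break;  // stop at the first cookie that would overflow
            kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleCookie> cookies = Arrays.asList(
            new SimpleCookie("a", "1234"),   // 5 bytes
            new SimpleCookie("b", "12"),     // 3 bytes
            new SimpleCookie("c", "1"));     // 2 bytes
        System.out.println(maximalPrefix(cookies, 7).size());
    }
}
```

Because a prefix is taken, later cookies are dropped as soon as one does not fit, even if a smaller cookie further down the list would still have fit within the limit.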