All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, java.lang.Runnable
public final class FetchingThread
extends java.lang.Thread
implements java.io.Closeable
ParsingThread
.
Instances of this class iteratively extract from Frontier.todo (using polling and exponential backoff) a ready VisitState that has been previously enqueued by the TodoThread and use their embedded FetchData to fetch the first URL (or possibly the first few URLs, depending on RuntimeConfiguration.keepAliveTime) from the VisitState queue. Once the fetch is over, the embedded FetchData is enqueued in Frontier.results and the thread waits for a signal on the FetchData. The ParsingThread that retrieves the FetchData must signal back that the FetchData instance can be reused once it has finished using its contents.
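
The hand-off and signal-back described above can be illustrated with a small sketch. It is not BUbiNG's code: `FetchDataSketch`, its `inUse` flag, and the method names are hypothetical, and a `LinkedBlockingQueue` stands in for Frontier.results only to keep the example short.

```java
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for FetchData: a payload plus a reuse flag. */
final class FetchDataSketch {
    String content;          // Filled by the fetcher, read by the parser.
    private boolean inUse;   // Guarded by this object's monitor.

    /** Called by the fetcher before handing the instance over. */
    synchronized void markInUse() { inUse = true; }

    /** Called by the parser once it no longer needs the contents. */
    synchronized void signalReusable() {
        inUse = false;
        notifyAll();
    }

    /** Called by the fetcher: blocks until the parser has signalled back. */
    synchronized void awaitReusable() throws InterruptedException {
        while (inUse) wait();
    }
}

final class HandshakeDemo {
    /** Runs one fetch/parse round trip and returns what the parser produced. */
    static String roundTrip(String payload) {
        LinkedBlockingQueue<FetchDataSketch> results = new LinkedBlockingQueue<>();
        String[] parsed = new String[1];
        Thread parser = new Thread(() -> {
            try {
                FetchDataSketch fd = results.take();
                parsed[0] = fd.content.toUpperCase(); // "Parse" the contents.
                fd.signalReusable();                  // The fetcher may now reuse fd.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parser.start();
        try {
            FetchDataSketch fd = new FetchDataSketch();
            fd.content = payload;   // "Fetch" the URL.
            fd.markInUse();
            results.put(fd);        // Enqueue the result for the parser.
            fd.awaitReusable();     // Wait for the signal before reusing fd.
            parser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return parsed[0];
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("page body"));
    }
}
```

The fetcher touches the shared object again only after the parser has signalled, which is what allows a single FetchData-like instance to be reused across fetches.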
The design of the interaction between instances of this class, the TodoThread, and instances of ParsingThread minimizes contention by sandwiching all FetchingThread instances between two wait-free queues (signalling back that a FetchData can be reused causes, of course, no contention). Exponential backoff should happen rarely in a full-speed crawl, as the todo queue is almost always nonempty.
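
As a rough illustration of polling with exponential backoff between two queues, consider the sketch below. The delay constants, class name, and method names are assumptions made for the example, and `ConcurrentLinkedQueue` (lock-free rather than strictly wait-free) merely stands in for the queues used by the Frontier.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class BackoffPollingSketch {
    static final long MIN_DELAY_MS = 1;   // Hypothetical initial backoff.
    static final long MAX_DELAY_MS = 64;  // Hypothetical backoff cap.

    /** Doubles the delay, up to the cap. */
    static long nextDelay(long current) {
        return Math.min(current * 2, MAX_DELAY_MS);
    }

    /**
     * Polls todo until an item is available, sleeping for exponentially
     * growing delays after each empty poll, then enqueues a "result".
     * Returns the number of empty polls observed.
     */
    static int fetchOne(Queue<String> todo, Queue<String> results) {
        long delay = MIN_DELAY_MS;
        int emptyPolls = 0;
        String item;
        while ((item = todo.poll()) == null) {
            emptyPolls++;
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            delay = nextDelay(delay);
        }
        results.add("fetched:" + item); // Stand-in for the actual fetch.
        return emptyPolls;
    }

    public static void main(String[] args) {
        Queue<String> todo = new ConcurrentLinkedQueue<>();
        Queue<String> results = new ConcurrentLinkedQueue<>();
        todo.add("http://example.com/");
        int emptyPolls = fetchOne(todo, results);
        System.out.println(emptyPolls + " empty polls, result: " + results.peek());
    }
}
```

When the input queue is nonempty, as the text expects during a full-speed crawl, the loop body never runs and no sleep occurs.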
Instances of this class do not access any shared data structure, except for logging. It is expected that large instances of BUbiNG use thousands of fetching threads to download from a large number of sites in parallel.
This class implements Closeable: the close() method simply closes the underlying FetchData instance.
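
The delegation pattern just described might look like the following sketch. `ClosingWorkerSketch` and its nested `Resource` class are hypothetical stand-ins for the thread and its FetchData; the fetch loop is omitted.

```java
import java.io.Closeable;
import java.io.IOException;

final class ClosingWorkerSketch extends Thread implements Closeable {
    /** Hypothetical stand-in for the thread's FetchData instance. */
    static final class Resource implements Closeable {
        boolean closed;
        @Override
        public void close() { closed = true; }
    }

    final Resource resource = new Resource();

    @Override
    public void run() {
        // Fetch loop omitted in this sketch.
    }

    /** Simply closes the underlying resource. */
    @Override
    public void close() throws IOException {
        resource.close();
    }

    /** Starts a worker, waits for it, and reports whether its resource was closed. */
    static boolean closedAfterUse() {
        ClosingWorkerSketch worker = new ClosingWorkerSketch();
        try (ClosingWorkerSketch w = worker) {
            w.start();
            w.join();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return worker.resource.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse());
    }
}
```

Implementing Closeable lets the owner manage the thread's resources with try-with-resources, as `closedAfterUse` does here.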
Modifier and Type | Class | Description |
---|---|---|
protected static class | FetchingThread.BasicHttpClientConnectionManagerWithAlternateDNS | A support class that makes it possible to plug in a custom DNS resolver. |
Modifier and Type | Field | Description |
---|---|---|
boolean | stop | Whether we should stop (used also to reduce the number of threads). |
Constructor | Description |
---|---|
FetchingThread(Frontier frontier, int index) | Creates a new fetching thread. |
Modifier and Type | Method | Description |
---|---|---|
void | abort() | Causes the FetchData used by this thread to be aborted via FetchData.abort() (and hence the corresponding connection to be closed). |
void | close() | |
static org.apache.http.cookie.Cookie[] | getCookies(java.net.URI url, org.apache.http.client.CookieStore cookieStore, int cookieMaxByteSize) | Returns the list of cookies in a given store in the form of an array, limiting their overall size (only the maximal prefix of cookies satisfying the size limit is returned). |
void | run() | |
Methods inherited from class java.lang.Object: equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from class java.lang.Thread: activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
public volatile boolean stop
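
A volatile flag of this kind is commonly checked once per loop iteration so that another thread can ask the worker to exit, for instance when reducing the number of threads. The sketch below shows that general pattern only; `StoppableWorkerSketch`, its `iterations` counter, and the 20 ms pause are assumptions for illustration, not a description of how FetchingThread.run() is written.

```java
final class StoppableWorkerSketch extends Thread {
    /** Set by another thread to ask this worker to leave its loop. */
    public volatile boolean stop;

    /** Work units completed; other threads read it only after join(). */
    long iterations;

    @Override
    public void run() {
        while (!stop) {
            iterations++;    // Stand-in for one fetch.
            Thread.yield();
        }
    }

    /** Starts a worker, requests a stop, and reports whether it terminated. */
    static boolean stopsWhenAsked() {
        StoppableWorkerSketch worker = new StoppableWorkerSketch();
        worker.start();
        try {
            Thread.sleep(20); // Let the worker run briefly.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.stop = true; // The volatile write is visible to the worker.
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsWhenAsked());
    }
}
```

Declaring the flag volatile guarantees that the worker eventually observes the write; without it, the loop could legally keep reading a stale value.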
public FetchingThread(Frontier frontier, int index) throws java.security.NoSuchAlgorithmException, java.lang.IllegalArgumentException, java.io.IOException
Parameters:
frontier - a reference to the Frontier.
index - the index of this thread (only for logging purposes).
Throws:
java.security.NoSuchAlgorithmException
java.lang.IllegalArgumentException
java.io.IOException
public static org.apache.http.cookie.Cookie[] getCookies(java.net.URI url, org.apache.http.client.CookieStore cookieStore, int cookieMaxByteSize)
Parameters:
url - the URL which generated the cookies (for logging purposes).
cookieStore - a cookie store, usually generated by a response.
cookieMaxByteSize - the maximum overall size of cookies in bytes.
public void abort()
Causes the FetchData used by this thread to be aborted via FetchData.abort() (and hence the corresponding connection to be closed).
public void run()
Specified by: run in interface java.lang.Runnable
Overrides: run in class java.lang.Thread
public void close() throws java.io.IOException
Specified by: close in interface java.lang.AutoCloseable
Specified by: close in interface java.io.Closeable
Throws:
java.io.IOException
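
The "maximal prefix" behavior documented for getCookies can be illustrated with a self-contained sketch. It does not use the Apache HttpClient types: `SimpleCookie` is a hypothetical stand-in for Cookie, and measuring a cookie as the length of its name plus its value is an assumption, since the page does not state how the size is computed.

```java
import java.util.Arrays;
import java.util.List;

final class CookiePrefixSketch {
    /** Hypothetical minimal cookie: just a name and a value. */
    static final class SimpleCookie {
        final String name;
        final String value;

        SimpleCookie(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** Assumed size measure: characters in the name plus the value. */
        int size() {
            return name.length() + value.length();
        }
    }

    /** Returns the longest prefix of cookies whose total size fits the limit. */
    static SimpleCookie[] maximalPrefix(List<SimpleCookie> cookies, int maxByteSize) {
        int total = 0;
        int count = 0;
        for (SimpleCookie c : cookies) {
            if (total + c.size() > maxByteSize) break; // Stop at the first overflow.
            total += c.size();
            count++;
        }
        return cookies.subList(0, count).toArray(new SimpleCookie[0]);
    }

    public static void main(String[] args) {
        List<SimpleCookie> cookies = Arrays.asList(
            new SimpleCookie("a", "1"),            // size 2
            new SimpleCookie("session", "abcdef"), // size 13
            new SimpleCookie("b", "2"));           // size 2
        // With a limit of 5, only the first cookie fits; the third is not
        // considered because the result must be a prefix of the list.
        System.out.println(maximalPrefix(cookies, 5).length);
    }
}
```

Because the result is a prefix, a small cookie that follows an oversized one is dropped even if it would fit on its own.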