All Implemented Interfaces: WarcWriter, java.io.Closeable, java.lang.AutoCloseable
public class ParallelBufferedWarcWriter extends java.lang.Object implements WarcWriter
Uses multiple WarcWriter instances to parallelize record compression.
Records will be written to the OutputStream provided at construction time (which we suggest buffering heavily) by a flushing thread. There is no guarantee about the order in which the records will be output.
Note that for each thread there is an associated FastByteArrayOutputStream that will grow as needed to accommodate the output of the record.
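The design described above (a pool of reusable buffers, compression in the calling threads, and a single flushing thread) can be illustrated with a minimal, self-contained sketch. This is not the library's code: `BufferPoolSketch`, its `POISON` shutdown marker, the use of `ByteArrayOutputStream` in place of FastByteArrayOutputStream, and the `roundTrip` helper are all assumptions made for illustration, and records are plain strings gzipped one member per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical, simplified stand-in for the design described above; not the library's code. */
class BufferPoolSketch implements AutoCloseable {
    /** Marker telling the flusher to stop (an assumption of this sketch). */
    private static final ByteArrayOutputStream POISON = new ByteArrayOutputStream(0);

    private final ArrayBlockingQueue<ByteArrayOutputStream> emptyBuffers;
    private final ArrayBlockingQueue<ByteArrayOutputStream> filledBuffers;
    private final OutputStream out;
    private final Thread flusher;
    private volatile IOException flusherException;

    BufferPoolSketch(OutputStream out, int numberOfBuffers) {
        this.out = out;
        emptyBuffers = new ArrayBlockingQueue<>(numberOfBuffers);
        filledBuffers = new ArrayBlockingQueue<>(numberOfBuffers + 1); // extra slot for POISON
        for (int i = 0; i < numberOfBuffers; i++) emptyBuffers.add(new ByteArrayOutputStream());
        flusher = new Thread(this::flushLoop, "flusher");
        flusher.start();
    }

    /** Takes filled buffers, dumps them to the output stream, and recycles them. */
    private void flushLoop() {
        try {
            for (;;) {
                ByteArrayOutputStream buffer = filledBuffers.take();
                if (buffer == POISON) return;
                try {
                    if (flusherException == null) buffer.writeTo(out);
                } catch (IOException e) {
                    flusherException = e; // reported by the next write() or by close()
                }
                buffer.reset();
                emptyBuffers.add(buffer); // never full: only numberOfBuffers buffers exist
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Compresses one record in the caller's thread, then hands the buffer to the flusher. */
    void write(String record) throws IOException, InterruptedException {
        if (flusherException != null) throw flusherException;
        ByteArrayOutputStream buffer = emptyBuffers.take(); // blocks while all buffers are in use
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(record.getBytes(StandardCharsets.UTF_8));
        }
        filledBuffers.put(buffer);
    }

    @Override
    public void close() throws IOException {
        try {
            filledBuffers.put(POISON); // queued after every buffer already handed over
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while closing", e);
        }
        if (flusherException != null) throw flusherException;
        out.close();
    }

    /** Demo helper: writes the records from several threads and returns the decompressed output. */
    static String roundTrip(List<String> records, int numberOfBuffers) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferPoolSketch writer = new BufferPoolSketch(sink, numberOfBuffers)) {
            records.parallelStream().forEach(r -> {
                try {
                    writer.write(r);
                } catch (IOException | InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because several threads race for buffers, the records returned by `roundTrip` may appear in any order, which mirrors the ordering caveat stated above. The bounded pool also gives back-pressure: a writer blocks in `take()` whenever the flusher falls behind.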
Modifier and Type | Class | Description |
---|---|---|
protected static class | ParallelBufferedWarcWriter.WriterPair | |
Modifier and Type | Field | Description |
---|---|---|
protected java.util.concurrent.ArrayBlockingQueue<ParallelBufferedWarcWriter.WriterPair> | emptyPairs | The queue of empty ParallelBufferedWarcWriter.WriterPair instances. |
protected java.util.concurrent.ArrayBlockingQueue<ParallelBufferedWarcWriter.WriterPair> | filledPairs | The queue of filled ParallelBufferedWarcWriter.WriterPair instances; their content will be flushed to disk by the flushingThread. |
protected it.unimi.di.law.warc.io.ParallelBufferedWarcWriter.FlushingThread | flushingThread | The thread that iteratively extracts filled ParallelBufferedWarcWriter.WriterPair instances from filledPairs, dumps them to outputStream, and enqueues them to emptyPairs. |
protected java.io.IOException | flushingThreadException | The exception thrown by the flushingThread, if any, or null. |
protected java.io.OutputStream | outputStream | The final output stream. |
Constructor | Description |
---|---|
ParallelBufferedWarcWriter(java.io.OutputStream outputStream, boolean compress) | Creates a Warc parallel output stream using 2× Runtime.availableProcessors() buffers. |
ParallelBufferedWarcWriter(java.io.OutputStream outputStream, boolean compress, int numberOfBuffers) | Creates a Warc parallel output stream. |
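The relationship between the two constructors can be sketched as a simple delegation: the shorter form supplies twice the number of available processors as the buffer count. The class name `BufferCountSketch` is hypothetical, the argument check is an assumption of this sketch rather than documented behavior, and the `outputStream` and `compress` parameters are left out for brevity.

```java
/** Hypothetical sketch of how a default buffer count could be derived; not the library's code. */
class BufferCountSketch {
    final int numberOfBuffers;

    /** Default: two buffers per available processor, as stated in the constructor summary. */
    BufferCountSketch() {
        this(2 * Runtime.getRuntime().availableProcessors());
    }

    /** Explicit buffer count; rejecting non-positive values is an assumption of this sketch. */
    BufferCountSketch(int numberOfBuffers) {
        if (numberOfBuffers <= 0) {
            throw new IllegalArgumentException("numberOfBuffers must be positive: " + numberOfBuffers);
        }
        this.numberOfBuffers = numberOfBuffers;
    }
}
```

Note that `availableProcessors()` is an instance method reached through `Runtime.getRuntime()`. Since each buffer grows to hold a whole record, an explicit, smaller count may be preferable when records are large and memory is tight.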
Modifier and Type | Method | Description |
---|---|---|
void | close() | |
void | write(WarcRecord record) | |
protected final java.util.concurrent.ArrayBlockingQueue<ParallelBufferedWarcWriter.WriterPair> emptyPairs
The queue of empty ParallelBufferedWarcWriter.WriterPair instances.

protected final java.util.concurrent.ArrayBlockingQueue<ParallelBufferedWarcWriter.WriterPair> filledPairs
The queue of filled ParallelBufferedWarcWriter.WriterPair instances; their content will be flushed to disk by the flushingThread.

protected final it.unimi.di.law.warc.io.ParallelBufferedWarcWriter.FlushingThread flushingThread
The thread that iteratively extracts filled ParallelBufferedWarcWriter.WriterPair instances from filledPairs, dumps them to outputStream, and enqueues them to emptyPairs.

protected final java.io.OutputStream outputStream
The final output stream.

protected volatile java.io.IOException flushingThreadException
The exception thrown by the flushingThread, if any, or null.

public ParallelBufferedWarcWriter(java.io.OutputStream outputStream, boolean compress)
Creates a Warc parallel output stream using 2× Runtime.availableProcessors() buffers.
Parameters:
outputStream - the final output stream.
compress - whether to write compressed records.

public ParallelBufferedWarcWriter(java.io.OutputStream outputStream, boolean compress, int numberOfBuffers)
Creates a Warc parallel output stream.
Parameters:
outputStream - the final output stream.
compress - whether to write compressed records.
numberOfBuffers - the number of buffers.

public void write(WarcRecord record) throws java.io.IOException, java.lang.InterruptedException
Specified by: write in interface WarcWriter
Throws:
java.io.IOException
java.lang.InterruptedException

public void close() throws java.io.IOException
Specified by: close in interface java.lang.AutoCloseable
Specified by: close in interface java.io.Closeable
Throws:
java.io.IOException