Class WarcRecord
Direct Known Subclasses:
GZWarcRecord
public class WarcRecord extends Object
To write a record, set header and block appropriately and then call
write(OutputStream). After such a call, the WarcRecord.Header.dataLength field
of the header will have been updated to reflect the write operation.
To perform a sequence of consecutive reads/skips, call read(FastBufferedInputStream)
or skip(FastBufferedInputStream). After a read, block can (but need not) be
read to obtain the record data. The contentType
field of the header can be used to determine how to further process the content of
block.
As an implementation note: skipping just returns the value of the data-length
field of the skipped record. Reading, on the other hand, parses the header and sets
all the header fields appropriately; it also sets block so that it refers to
the block part of the record. Observe that, since block is just a "view"
over the underlying stream, neither its content nor its position is guaranteed to remain the same after
a subsequent read/skip on the same stream.
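The "view" caveat can be illustrated with a small, self-contained sketch that does not use this library at all. BoundedView is a hypothetical stand-in for block, and the ten-byte input is a toy stream rather than real WARC data; the only point is that a view storing no bytes of its own returns different content once the underlying stream has moved on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class BlockViewSketch {

    /** Hypothetical stand-in for block: delivers at most `remaining` bytes of a shared stream and stores none. */
    static final class BoundedView {
        private final ByteArrayInputStream underlying;
        private int remaining;

        BoundedView(ByteArrayInputStream underlying, int length) {
            this.underlying = underlying;
            this.remaining = length;
        }

        /** Returns the next byte, or -1 once the view or the underlying stream is exhausted. */
        int read() {
            if (remaining <= 0) return -1;
            int b = underlying.read();
            if (b >= 0) remaining--;
            return b;
        }

        /** Drains whatever the view can still deliver, as an ASCII string. */
        String drain() {
            StringBuilder sb = new StringBuilder();
            for (int b; (b = read()) != -1; ) sb.append((char) b);
            return sb.toString();
        }
    }

    /** Toy stream: two 5-byte "blocks" back to back. */
    static ByteArrayInputStream twoToyBlocks() {
        return new ByteArrayInputStream("AAAAABBBBB".getBytes(StandardCharsets.US_ASCII));
    }

    /** The view is consumed before anything else touches the stream. */
    static String drainedImmediately() {
        BoundedView first = new BoundedView(twoToyBlocks(), 5);
        return first.drain();
    }

    /** The stream advances past the first block before the (now stale) view is consumed. */
    static String drainedAfterStreamMoved() {
        ByteArrayInputStream stream = twoToyBlocks();
        BoundedView first = new BoundedView(stream, 5);
        stream.skip(5); // plays the role of a subsequent read/skip on the same stream
        return first.drain();
    }

    public static void main(String[] args) {
        System.out.println(drainedImmediately());      // AAAAA
        System.out.println(drainedAfterStreamMoved()); // BBBBB
    }
}
```

Under these assumptions, the practical rule is to consume or copy the block content before issuing the next read/skip on the same stream.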
This object can be reused for non-consecutive writes on different streams. On the other hand,
to reuse this object for non-consecutive reads/skips, the method resetRead() must be
called whenever a read/skip does not follow a read/skip from the same stream.
This class uses internal buffering, hence it is not thread safe.
-
Nested Class Summary
Nested Classes
Modifier and Type  Class                       Description
static class       WarcRecord.ContentType      Content types.
static class       WarcRecord.FormatException  An exception to denote parsing errors during reads.
static class       WarcRecord.Header           A class to contain fields contained in the warc header.
static class       WarcRecord.RecordType       Record types.
-
Field Summary
Fields
Modifier and Type                                       Field                                Description
static boolean                                          ASSERTS
MeasurableInputStream                                   block                                The warc block.
static Object2ObjectMap<byte[],WarcRecord.ContentType>  BYTE_REPRESENTATION_TO_CONTENT_TYPE
static Object2ObjectMap<byte[],WarcRecord.RecordType>   BYTE_REPRESENTATION_TO_RECORD_TYPE
protected CRC32                                         crc                                  The class used in write(OutputStream) to compute CRC32 of the content for GZWarcRecord.
static byte[]                                           CRLF                                 Some constant strings in their byte equivalent.
static boolean                                          DEBUG
static int                                              DEFAULT_BUFFER_SIZE                  The default size of the internal buffer used for headers read/write.
WarcRecord.Header                                       header                               The warc header.
static boolean                                          USE_POSITION_INSTEAD_OF_SKIP         Tells what method to use to skip bytes in the input stream.
static byte[]                                           UUID_FIELD_NAME                      Some constant strings in their byte equivalent.
static byte[]                                           WARC_ID                              Some constant strings in their byte equivalent.
-
Constructor Summary
Constructors
Constructor                Description
WarcRecord()               Builds a warc record.
WarcRecord(byte[] buffer)  Builds a warc record.
-
Method Summary
Modifier and Type  Method                             Description
void               copy(WarcRecord record)            Copies this warc record fields from another warc record.
long               read(FastBufferedInputStream bin)  A method to read a record from an InputStream.
void               resetRead()                        A method to allow the reuse of the present object for non consecutive reads.
long               skip(FastBufferedInputStream bin)  A method to skip a record from an InputStream.
String             toString()
void               write(OutputStream out)            A method to write this record to an OutputStream.
-
Field Details
-
DEBUG
public static final boolean DEBUG
See Also:
Constant Field Values
-
ASSERTS
public static final boolean ASSERTS
See Also:
Constant Field Values
-
USE_POSITION_INSTEAD_OF_SKIP
public static final boolean USE_POSITION_INSTEAD_OF_SKIP
Tells what method to use to skip bytes in the input stream. It's here for profiling purposes.
See Also:
Constant Field Values
-
BYTE_REPRESENTATION_TO_CONTENT_TYPE
public static final Object2ObjectMap<byte[],WarcRecord.ContentType> BYTE_REPRESENTATION_TO_CONTENT_TYPE
-
BYTE_REPRESENTATION_TO_RECORD_TYPE
public static final Object2ObjectMap<byte[],WarcRecord.RecordType> BYTE_REPRESENTATION_TO_RECORD_TYPE
-
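Maps keyed by byte[] need content-based key comparison to be useful for lookups on freshly parsed bytes, because Java arrays use identity for equals and hashCode. How this library configures its two maps is not stated on this page; the sketch below only shows the general issue with standard-library classes. The enum Kind and its constants are hypothetical and are not the library's record or content types.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class ByteKeyLookupSketch {

    /** Hypothetical kinds, used only for illustration. */
    enum Kind { FIRST_KIND, SECOND_KIND }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** HashMap keyed directly by byte[]: a different array with equal content is not found. */
    static Kind lookupByIdentity(String name) {
        Map<byte[], Kind> map = new HashMap<>();
        map.put(ascii("first"), Kind.FIRST_KIND);
        return map.get(ascii(name));
    }

    /** Keys wrapped in ByteBuffer, whose equals and hashCode depend on the bytes themselves. */
    static Kind lookupByContent(String name) {
        Map<ByteBuffer, Kind> map = new HashMap<>();
        map.put(ByteBuffer.wrap(ascii("first")), Kind.FIRST_KIND);
        map.put(ByteBuffer.wrap(ascii("second")), Kind.SECOND_KIND);
        return map.get(ByteBuffer.wrap(ascii(name)));
    }

    public static void main(String[] args) {
        System.out.println(lookupByIdentity("first")); // null
        System.out.println(lookupByContent("first"));  // FIRST_KIND
    }
}
```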
DEFAULT_BUFFER_SIZE
public static final int DEFAULT_BUFFER_SIZE
The default size of the internal buffer used for headers read/write.
See Also:
Constant Field Values
-
WARC_ID
public static final byte[] WARC_ID
Some constant strings in their byte equivalent.
-
UUID_FIELD_NAME
public static final byte[] UUID_FIELD_NAME
Some constant strings in their byte equivalent.
-
CRLF
public static final byte[] CRLF
Some constant strings in their byte equivalent.
-
crc
The class used in write(OutputStream) to compute CRC32 of the content for GZWarcRecord. If null the crc will not be updated.
-
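The idea of updating an optional checksum while content is written can be sketched with java.util.zip.CRC32 alone. This is not the library's write(OutputStream) implementation: writeContent and the other names below are hypothetical, and the sketch assumes only what the description says, namely that a null crc means no checksum is maintained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class CrcOnWriteSketch {

    /** Writes content to out and, only if crc is non-null, folds the same bytes into the checksum. */
    static void writeContent(byte[] content, ByteArrayOutputStream out, CRC32 crc) {
        out.write(content, 0, content.length);
        if (crc != null) crc.update(content, 0, content.length);
    }

    /** Returns the CRC32 accumulated while writing the ASCII bytes of text. */
    static long checksumWhileWriting(String text) {
        CRC32 crc = new CRC32();
        writeContent(text.getBytes(StandardCharsets.US_ASCII), new ByteArrayOutputStream(), crc);
        return crc.getValue();
    }

    /** Writes with a null crc and returns the number of bytes that reached the stream. */
    static int bytesWrittenWithoutCrc(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeContent(text.getBytes(StandardCharsets.US_ASCII), out, null);
        return out.size();
    }

    public static void main(String[] args) {
        // 0xCBF43926 is the standard CRC-32 check value for the ASCII string "123456789".
        System.out.println(Long.toHexString(checksumWhileWriting("123456789")));
        System.out.println(bytesWrittenWithoutCrc("123456789"));
    }
}
```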
header
The warc header.
-
block
The warc block.
-
-
Constructor Details
-
WarcRecord
public WarcRecord(byte[] buffer)
Builds a warc record.
Parameters:
buffer - the buffer used for header read/write buffering.
-
WarcRecord
public WarcRecord()
Builds a warc record. It will allocate an internal buffer of size DEFAULT_BUFFER_SIZE bytes to buffer header read/writes.
-
-
Method Details
-
copy
Copies the fields of another warc record into this warc record.
Parameters:
record - the record to copy from.
-
resetRead
public void resetRead()
A method to allow the reuse of the present object for non-consecutive reads.
-
skip
A method to skip a record from an InputStream.
Parameters:
bin - the FastBufferedInputStream to read from.
Returns:
the value of the data-length, or -1 if EOF has been reached.
Throws:
IOException
WarcRecord.FormatException
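The return convention (a length, or -1 at end of stream) lends itself to a simple scanning loop. The sketch below shows that loop on a toy format, not on WARC: each toy record is one length byte followed by that many payload bytes, and skipToyRecord is a hypothetical helper standing in for a skip call.

```java
import java.io.ByteArrayInputStream;

final class SkipLoopSketch {

    /** Skips one toy record; returns its payload length, or -1 if the stream is already at its end. */
    static long skipToyRecord(ByteArrayInputStream in) {
        int length = in.read();      // toy header: a single unsigned length byte
        if (length == -1) return -1; // clean end of stream
        if (in.skip(length) != length) throw new IllegalStateException("truncated toy record");
        return length;
    }

    /** Scans all toy records and returns { number of records, total payload bytes }. */
    static long[] scan(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long records = 0, total = 0;
        for (long n; (n = skipToyRecord(in)) != -1; ) {
            records++;
            total += n;
        }
        return new long[] { records, total };
    }

    public static void main(String[] args) {
        long[] r = scan(new byte[] { 3, 'a', 'b', 'c', 0, 2, 'x', 'y' });
        System.out.println(r[0] + " records, " + r[1] + " payload bytes");
    }
}
```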
-
read
A method to read a record from an InputStream.
Parameters:
bin - the FastBufferedInputStream to read from.
Returns:
the value of the data-length, or -1 if EOF has been reached.
Throws:
IOException
WarcRecord.FormatException
-
write
A method to write this record to an OutputStream.
Parameters:
out - where to write.
Throws:
IOException
-
toString
-