Class WarcRecord
- Direct Known Subclasses:
GZWarcRecord
public class WarcRecord extends Object
To write a record, set header and block appropriately and then call write(OutputStream). After such a call, the WarcRecord.Header.dataLength field of the header will be modified to reflect the write operation.
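The effect of a write on the header can be pictured with a minimal sketch. It assumes a hypothetical one-line header layout (data-length: N followed by an empty line), which is not the real warc header; WriteSketch and its members are illustrative stand-ins that only mirror the names used on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record: NOT the real WarcRecord.
class WriteSketch {
    static final byte[] CRLF = { '\r', '\n' };

    long dataLength = -1; // unknown until write() is called
    InputStream block;    // the data to be written

    // Writes an assumed "data-length: N" header line, an empty line, then the block.
    void write(OutputStream out) throws IOException {
        byte[] body = block.readAllBytes(); // requires Java 9 or later
        dataLength = body.length;           // the header field is updated by the write
        out.write(("data-length: " + dataLength).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(CRLF);
        out.write(body);
    }

    // Writes a five-byte block to an in-memory stream and returns the resulting dataLength.
    static long demo() {
        WriteSketch record = new WriteSketch();
        record.block = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        try {
            record.write(new ByteArrayOutputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record.dataLength;
    }

    public static void main(String[] args) {
        System.out.println("dataLength after write: " + demo());
    }
}
```

The point of the sketch is only the ordering: the caller sets header and block, and the length field becomes meaningful after the write has run.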
To perform a sequence of consecutive reads/skips, call read(FastBufferedInputStream) or skip(FastBufferedInputStream). After a read, the block can (but is not required to) be read to obtain the record data. The contentType field of the header can be used to determine how to further process the content of block.
As an implementation note: skipping just returns the value of the data-length field of the skipped record. Reading, on the other hand, parses the header and sets all the header fields appropriately; it then sets block so that it refers to the block part of the record. Observe that since block is just a "view" over the underlying stream, neither its content nor its position is guaranteed to remain the same after a subsequent read/skip on the same stream.
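Why a "view" becomes unreliable can be shown with a small sketch built only on java.io. BoundedView is a hypothetical class written for this illustration, not the MeasurableInputStream used by WarcRecord: it exposes a window of the underlying stream without copying, so moving the underlying stream changes what the view returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BlockViewSketch {
    // Hypothetical bounded view: exposes at most `length` bytes of the
    // underlying stream without copying them.
    static final class BoundedView extends InputStream {
        private final InputStream in;
        private long remaining;

        BoundedView(InputStream in, long length) {
            this.in = in;
            this.remaining = length;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }
    }

    // Returns the two bytes seen through the view, as a string.
    static String demo() {
        // Two 4-byte "blocks" back to back on one stream.
        InputStream stream = new ByteArrayInputStream("AAAABBBB".getBytes(StandardCharsets.US_ASCII));
        BoundedView firstBlock = new BoundedView(stream, 4);
        StringBuilder seen = new StringBuilder();
        try {
            seen.append((char) firstBlock.read()); // a byte of the first block
            stream.skip(3);                        // stands in for the next read/skip on the stream
            seen.append((char) firstBlock.read()); // the view now yields a byte of the second block
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A caller that needs the block after moving on should therefore copy its bytes before the next read/skip on the same stream.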
This object can be reused for non-consecutive writes on different streams. On the other hand, to reuse this object for non-consecutive reads/skips, the method resetRead() must be called whenever a read/skip does not follow a read/skip from the same stream.
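One plausible reason for this rule is that a reader keeps state about the previous record between calls. The sketch below is an assumption about that mechanism, not the actual WarcRecord implementation: ResetReadSketch uses a hypothetical format (one length byte followed by that many payload bytes) and remembers how many block bytes are still pending on the stream it last read from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical reader with per-stream state: NOT the real WarcRecord.
class ResetReadSketch {
    private long pending; // bytes of the previous block assumed to be still unread

    // Reads one record header and returns its data length, or -1 at end of stream.
    long read(InputStream in) throws IOException {
        skipFully(in, pending); // first move past what is left of the previous block
        pending = 0;
        int length = in.read();
        if (length < 0) return -1;
        pending = length;
        return length;
    }

    // Forgets the state left over from the previous stream.
    void resetRead() {
        pending = 0;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) { n -= skipped; continue; }
            if (in.read() < 0) return; // end of stream
            n--;
        }
    }

    // Returns {first read, second read, read after reset, read without reset}.
    static long[] demo() {
        byte[] first = { 3, 'a', 'b', 'c', 2, 'd', 'e' };
        byte[] second = { 1, 'x' };
        try {
            ResetReadSketch good = new ResetReadSketch();
            InputStream s1 = new ByteArrayInputStream(first);
            long a = good.read(s1);
            long b = good.read(s1);
            good.resetRead(); // switching streams: clear leftover state first
            long withReset = good.read(new ByteArrayInputStream(second));

            ResetReadSketch bad = new ResetReadSketch();
            bad.read(new ByteArrayInputStream(first));
            // No reset: stale state makes the reader skip bytes of the wrong stream.
            long withoutReset = bad.read(new ByteArrayInputStream(second));
            return new long[] { a, b, withReset, withoutReset };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

In the sketch, forgetting the reset makes the reader misinterpret the new stream; with the real class the safe pattern is simply to call resetRead() before the first read/skip on a different stream.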
This class uses internal buffering, hence it is not thread safe.
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
static class WarcRecord.ContentType - Content types.
static class WarcRecord.FormatException - An exception to denote parsing errors during reads.
static class WarcRecord.Header - A class containing the fields of the warc header.
static class WarcRecord.RecordType - Record types.
-
Field Summary
Fields (Modifier and Type, Field, Description):
static boolean ASSERTS
MeasurableInputStream block - The warc block.
static Object2ObjectMap<byte[],WarcRecord.ContentType> BYTE_REPRESENTATION_TO_CONTENT_TYPE
static Object2ObjectMap<byte[],WarcRecord.RecordType> BYTE_REPRESENTATION_TO_RECORD_TYPE
protected CRC32 crc - The class used in write(OutputStream) to compute the CRC32 of the content for GZWarcRecord.
static byte[] CRLF - Some constant strings in their byte equivalent.
static boolean DEBUG
static int DEFAULT_BUFFER_SIZE - The default size of the internal buffer used for headers read/write.
WarcRecord.Header header - The warc header.
static boolean USE_POSITION_INSTEAD_OF_SKIP - Tells what method to use to skip bytes in the input stream.
static byte[] UUID_FIELD_NAME - Some constant strings in their byte equivalent.
static byte[] WARC_ID - Some constant strings in their byte equivalent.
-
Constructor Summary
Constructors (Constructor, Description):
WarcRecord() - Builds a warc record.
WarcRecord(byte[] buffer) - Builds a warc record.
-
Method Summary
Modifier and Type, Method, Description:
void copy(WarcRecord record) - Copies the fields of this warc record from another warc record.
long read(FastBufferedInputStream bin) - A method to read a record from an InputStream.
void resetRead() - A method to allow the reuse of the present object for non-consecutive reads.
long skip(FastBufferedInputStream bin) - A method to skip a record from an InputStream.
String toString()
void write(OutputStream out) - A method to write this record to an OutputStream.
-
Field Details
-
DEBUG
public static final boolean DEBUG
See Also:
- Constant Field Values
-
ASSERTS
public static final boolean ASSERTS
See Also:
- Constant Field Values
-
USE_POSITION_INSTEAD_OF_SKIP
public static final boolean USE_POSITION_INSTEAD_OF_SKIP
Tells what method to use to skip bytes in the input stream. It's here for profiling purposes.
See Also:
- Constant Field Values
-
BYTE_REPRESENTATION_TO_CONTENT_TYPE
public static final Object2ObjectMap<byte[],WarcRecord.ContentType> BYTE_REPRESENTATION_TO_CONTENT_TYPE
-
BYTE_REPRESENTATION_TO_RECORD_TYPE
public static final Object2ObjectMap<byte[],WarcRecord.RecordType> BYTE_REPRESENTATION_TO_RECORD_TYPE
-
DEFAULT_BUFFER_SIZE
public static final int DEFAULT_BUFFER_SIZE
The default size of the internal buffer used for headers read/write.
See Also:
- Constant Field Values
-
WARC_ID
public static final byte[] WARC_ID
Some constant strings in their byte equivalent.
-
UUID_FIELD_NAME
public static final byte[] UUID_FIELD_NAME
Some constant strings in their byte equivalent.
-
CRLF
public static final byte[] CRLF
Some constant strings in their byte equivalent.
-
crc
The class used in write(OutputStream) to compute the CRC32 of the content for GZWarcRecord. If null, the crc will not be updated.
-
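The optional checksum described for the crc field can be sketched with java.util.zip.CRC32 alone. This is a minimal illustration of the "update only if non-null" convention, assuming a hypothetical helper writeContent; it does not reproduce how write(OutputStream) or GZWarcRecord are actually implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CrcSketch {
    // Hypothetical helper: writes content to out and updates the checksum
    // only when crc is non-null.
    static void writeContent(byte[] content, OutputStream out, CRC32 crc) throws IOException {
        if (crc != null) crc.update(content, 0, content.length);
        out.write(content);
    }

    // Returns the CRC32 accumulated while writing a small ASCII payload.
    static long demo() {
        CRC32 crc = new CRC32();
        byte[] content = "123456789".getBytes(StandardCharsets.US_ASCII);
        try {
            writeContent(content, new ByteArrayOutputStream(), crc);
            writeContent(content, new ByteArrayOutputStream(), null); // no checksum requested
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(demo()));
    }
}
```

Passing null keeps the write path identical while skipping the checksum work, which is presumably why the field is only set when a subclass needs it.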
header
The warc header.
-
block
The warc block.
-
-
Constructor Details
-
WarcRecord
public WarcRecord(byte[] buffer)
Builds a warc record.
Parameters:
buffer - the buffer used for header read/write buffering.
-
WarcRecord
public WarcRecord()
Builds a warc record. It will allocate an internal buffer of size DEFAULT_BUFFER_SIZE bytes to buffer header read/writes.
-
-
Method Details
-
copy
Copies the fields of this warc record from another warc record.
Parameters:
record - the record to copy from.
-
resetRead
public void resetRead()
A method to allow the reuse of the present object for non-consecutive reads.
-
skip
A method to skip a record from an InputStream.
Parameters:
bin - the FastBufferedInputStream to read from.
Returns:
- the value of the data-length, or -1 if EOF has been reached.
Throws:
IOException
WarcRecord.FormatException
-
read
A method to read a record from an InputStream.
Parameters:
bin - the FastBufferedInputStream to read from.
Returns:
- the value of the data-length, or -1 if EOF has been reached.
Throws:
IOException
WarcRecord.FormatException
-
write
A method to write this record to an OutputStream.
Parameters:
out - where to write.
Throws:
IOException
-
toString
-