Class InspectableBufferedInputStream

java.lang.Object
java.io.InputStream
it.unimi.dsi.fastutil.io.MeasurableInputStream
it.unimi.dsi.law.warc.io.InspectableBufferedInputStream
All Implemented Interfaces:
MeasurableStream, Closeable, AutoCloseable

public class InspectableBufferedInputStream
extends MeasurableInputStream
An input stream that wraps an underlying input stream to make it rewindable and partially inspectable, using a bounded-capacity memory buffer and an overflow file.

Stream behaviour

In the following description, let K0 be the buffer size, K the number of bytes read so far from the underlying stream, P the index of the next byte that will be returned by this stream (indices start from 0), and L the number of bytes requested by a read. Note that P ≤ K, and equality holds if this stream was never rewound; otherwise, P may be smaller than K (in particular, it is zero just after a rewind).

When the stream is connected, up to K0 bytes are read and stored in the buffer; after that, the buffer itself becomes available for inspection. Of course, K is set to the number of bytes actually read, whereas P=0.
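
The connection step described above can be sketched in plain Java. This is only an illustration under assumptions, not the actual code of this class: the class name InitialFillSketch and the helper fillBuffer are hypothetical. The loop keeps reading until the buffer is full or the underlying stream ends, and the returned count plays the role of K (and of the inspectable field).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: models only the initial buffer fill performed at connection.
final class InitialFillSketch {

    /** Reads up to buffer.length bytes from in; returns how many were actually stored. */
    static int fillBuffer(InputStream in, byte[] buffer) throws IOException {
        int filled = 0;
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n < 0) break; // the underlying stream ended before the buffer was full
            filled += n;
        }
        return filled;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8]; // K0 = 8 in this toy example
        int k = fillBuffer(new ByteArrayInputStream(new byte[5]), buffer);
        System.out.println("K = " + k + ", P = 0");
    }
}
```

A single call to InputStream.read may return fewer bytes than requested, which is why the sketch loops instead of reading once.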

Upon reading, as long as P+L-1<K, no byte needs to be read from the underlying input stream. Otherwise, up to P+L-K bytes are read from the underlying input stream and stored in the overflow file before being returned to the user.
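
The reading rule above amounts to a small piece of arithmetic, sketched here in plain Java. The class ReadPlanSketch and the method bytesToFetch are hypothetical names introduced for illustration; they are not part of this class. The result is an upper bound, since the underlying stream may end earlier.

```java
// Hypothetical sketch: how many bytes a read of l bytes at position p may need
// to fetch from the underlying stream when k bytes have already been read.
final class ReadPlanSketch {

    static long bytesToFetch(long p, long l, long k) {
        if (p < 0 || l < 0 || p > k) {
            throw new IllegalArgumentException("expected 0 <= p <= k and l >= 0");
        }
        // If p + l - 1 < k, everything requested was already read: nothing to fetch.
        return Math.max(0L, p + l - k);
    }

    public static void main(String[] args) {
        System.out.println(bytesToFetch(0, 10, 64));  // request lies entirely within bytes already read
        System.out.println(bytesToFetch(60, 10, 64)); // request extends past the bytes already read
    }
}
```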

Connecting and disposing

Objects of this class are reusable by design. At any moment, they may be in one of three states: connected, ready, disposed:

  • connected: this stream is connected to an underlying input stream, and has an overflow file open and partially filled; notice that, since the overflow file is reused, the file itself may be larger than the number of bytes written in it;
  • ready: this stream is not connected to any underlying input stream, but it has an overflow file (not open, but ready to be used); notice that, since the overflow file is reused, the file itself may be nonempty;
  • disposed: this stream cannot be used anymore: its resources are disposed and, in particular, its overflow file was actually deleted.

At creation, this stream is ready; it can be connected using connect(InputStream). At any time, it can be made ready again by calling close(). The close() method does not truncate the overflow file; a user who wants to truncate it can do so by calling truncate(long) after closing. The dispose() method makes this stream disposed; it is also called on finalization.
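
The three states and the transitions described above can be modelled as a tiny state machine. This is a sketch under assumptions: LifecycleSketch and its State enum are hypothetical, the methods only track state (no I/O), and the use of IllegalStateException for a violated precondition is a choice made for this sketch, not documented behaviour of this class.

```java
// Hypothetical sketch: models only the documented state preconditions.
final class LifecycleSketch {

    enum State { READY, CONNECTED, DISPOSED }

    private State state = State.READY; // a new stream starts out ready

    State state() { return state; }

    void connect() { // allowed on any non-disposed stream
        requireNotDisposed();
        state = State.CONNECTED;
    }

    void close() { // no effect if already ready
        requireNotDisposed();
        state = State.READY;
    }

    void truncate() { // only allowed when ready
        if (state != State.READY) throw new IllegalStateException("not ready");
    }

    void rewind() { // only allowed when connected
        if (state != State.CONNECTED) throw new IllegalStateException("not connected");
    }

    void dispose() { state = State.DISPOSED; }

    private void requireNotDisposed() {
        if (state == State.DISPOSED) throw new IllegalStateException("disposed");
    }

    public static void main(String[] args) {
        LifecycleSketch s = new LifecycleSketch();
        s.connect();
        s.rewind();
        s.close();
        s.truncate();
        s.dispose();
        System.out.println(s.state());
    }
}
```

A typical reuse cycle is therefore connect, read (possibly with rewinds), close, and then either connect again or dispose.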

Buffering

This class provides no form of buffering except for the memory buffer described above. Users should consider providing a buffered underlying input stream, or wrapping instances of this class in a FastBufferedInputStream: the former is appropriate only when fillAndRewind() is not used; the latter makes accesses more efficient only if the size of the underlying input stream is often much larger than the buffer size.

  • Field Details

    • LOGGER

      public static final Logger LOGGER
    • DEBUG

      public static final boolean DEBUG
      See Also:
      Constant Field Values
    • OVERFLOW_FILE_RANDOM_PATH_ELEMENTS

      public static final int OVERFLOW_FILE_RANDOM_PATH_ELEMENTS
      The number of path elements for the hierarchical overflow file (see Util.createHierarchicalTempFile(File, int, String, String)).
      See Also:
      Constant Field Values
    • DEFAULT_BUFFER_SIZE

      public static final int DEFAULT_BUFFER_SIZE
      The default buffer size (64KiB).
      See Also:
      Constant Field Values
    • buffer

      public byte[] buffer
      The buffer. When connected, it is filled with the first portion of the underlying input stream (read at connection). The buffer is available for inspection, but users should not modify its content; the number of bytes actually available is inspectable.
    • inspectable

      public int inspectable
      The number of bytes read into the buffer, when connected. It is the minimum of buffer.length and the length of the underlying stream.
    • overflowFile

      public final File overflowFile
      The overflow file used by this stream: it is created at construction time, and deleted on dispose(), finalization, or exit.
  • Constructor Details

    • InspectableBufferedInputStream

      public InspectableBufferedInputStream​(int bufferSize, File overflowFileDir) throws IOException
      Creates a new ready stream.
      Parameters:
      bufferSize - the buffer size, in bytes.
      overflowFileDir - the directory where the overflow file should be created, or null for the default temporary directory.
      Throws:
      IOException - if some exception occurs during creation.
    • InspectableBufferedInputStream

      public InspectableBufferedInputStream​(int bufferSize) throws IOException
      Creates a new ready stream using the default temporary directory for the overflow file.
      Parameters:
      bufferSize - the buffer size, in bytes.
      Throws:
      IOException - if some exception occurs during creation.
    • InspectableBufferedInputStream

      public InspectableBufferedInputStream() throws IOException
      Creates a new ready stream with the default buffer size, using the default temporary directory for the overflow file.
      Throws:
      IOException - if some exception occurs during creation.
  • Method Details

    • connect

      public void connect​(InputStream underlying) throws IOException
      Connects to a given input stream, and fills the buffer accordingly. Can only be called on a non-disposed stream.
      Parameters:
      underlying - the underlying input stream to which we should connect.
      Throws:
      IOException - if some exception occurs while reading.
    • truncate

      public void truncate​(long size) throws FileNotFoundException, IOException
      Truncates the overflow file to a given size. Can only be called when this stream is ready.
      Parameters:
      size - the new size; the final size is guaranteed to be no more than this.
      Throws:
      IOException - if some exception occurs while truncating the file.
      FileNotFoundException
    • readBytes

      public long readBytes()
      The number of bytes read so far from the underlying stream.
      Returns:
      the number of bytes read so far from the underlying stream.
    • dispose

      public void dispose() throws IOException
      Disposes this stream, deleting the overflow file and nulling the buffer. After this, the stream is unusable.
      Throws:
      IOException
    • finalize

      protected void finalize() throws Throwable
      Overrides:
      finalize in class Object
      Throws:
      Throwable
    • close

      public void close() throws IOException
      Makes this stream ready. Can only be called on a non-disposed stream. If the stream is ready, it does nothing. If the stream is connected, it closes the underlying stream, making this stream ready for a new connection or to be disposed.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • rewind

      public void rewind() throws IOException
      Rewinds this stream. Can only be called on a connected stream.
      Throws:
      IOException
    • available

      public int available() throws IOException
      Overrides:
      available in class InputStream
      Throws:
      IOException
    • read

      public int read​(byte[] b, int offset, int length) throws IOException
      Overrides:
      read in class InputStream
      Throws:
      IOException
    • read

      public int read​(byte[] b) throws IOException
      Overrides:
      read in class InputStream
      Throws:
      IOException
    • skip

      public long skip​(long n) throws IOException
      Overrides:
      skip in class InputStream
      Throws:
      IOException
    • read

      public int read() throws IOException
      Specified by:
      read in class InputStream
      Throws:
      IOException
    • overflowLength

      public long overflowLength()
      Returns the current length of the overflow file.
      Returns:
      the length of the overflow file.
    • fill

      public void fill​(long limit) throws IOException
      Reads the underlying input stream up to a given limit.
      Parameters:
      limit - the maximum number of bytes to be read, except for the memory buffer. More precisely, up to Math.max(buffer.length,limit) bytes are read (because the buffer is filled at connection).
      Throws:
      IOException
    • fill

      public void fill() throws IOException
      Reads fully the underlying input stream.
      Throws:
      IOException
      See Also:
      fill(long)
    • fillAndRewind

      public void fillAndRewind() throws IOException
      Reads fully the underlying input stream and rewinds.
      Throws:
      IOException
      See Also:
      fill(), rewind()
    • length

      public long length()
      Returns the overall length of this input stream. This method calls fill(long) with argument Long.MAX_VALUE.
      Throws:
      RuntimeException - wrapping an IOException if the call to fill(long) does.
    • position

      public long position()