java.io.Closeable
, java.lang.AutoCloseable
IdentitySieve
, MercatorSieve
public abstract class AbstractSieve<K,V>
extends java.lang.Object
implements java.io.Closeable
uniq
filter (in the sense of UN*X).
Here we are completely disregarding the asynchronous logic of dequeuing, which of course impacts the behaviour.
This data structure generalizes and adapts that described in the paper IRLbot: Scaling to 6 Billion Pages and Beyond by Lee et al., where it was called DRUM (Disk Repository with Update Management). We prefer not to refer anymore to disk in our abstract version because different implementations may decide not to use the disk in a strict sense to hold the items.
The structure was originally described as follows:
This implementation of the data structure is different from the one described above in the following regards:
enqueue(Object, Object)
corresponds to the call: it should be very fast,
although some implementation may require it to be blocking, and will perform the check (and update) sometime in the future; when a flush()
is performed,
a new flow of pairs (the pairs that the check method would return) is created: those pairs are actually provided to a AbstractSieve.NewFlowReceiver
object
that can decide what to do with them (see the AbstractSieve.NewFlowReceiver
documentation).
Implementors should guarantee that all method calls are thread-safe, and as efficient
as possible. The enqueue(Object, Object)
method should usually be non-blocking,
but some bounded implementations may decide to make it wait until at least one element is
dequeued: this event should be carefully documented. A call to flush()
guarantees that all new elements
that have not yet been dequeued are made available, if any.
Modifier and Type | Class | Description |
---|---|---|
static class |
AbstractSieve.DefaultUpdateStrategy<K,V> |
|
static class |
AbstractSieve.DiskNewFlow<T> |
A basic, on-disk
AbstractSieve.NewFlowReceiver . |
static interface |
AbstractSieve.NewFlowReceiver<K> |
An object that can receive a new flow of hash/key pairs and that
acts as a listener for the
AbstractSieve . |
static class |
AbstractSieve.SieveEntry<K,V> |
A (key,value) pair.
|
static interface |
AbstractSieve.UpdateStrategy<K,V> |
An update strategy: it determines how a stored value should be updated in the presence of duplicate keys.
|
Modifier and Type | Field | Description |
---|---|---|
static it.unimi.dsi.sux4j.mph.AbstractHashFunction<java.lang.CharSequence> |
CHAR_SEQUENCE_HASHING_STRATEGY |
|
protected it.unimi.dsi.sux4j.mph.AbstractHashFunction<K> |
hashingStrategy |
|
protected ByteSerializerDeserializer<K> |
keySerDeser |
|
protected AbstractSieve.NewFlowReceiver<K> |
newFlowReceiver |
|
protected AbstractSieve.UpdateStrategy<K,V> |
updateStrategy |
|
protected ByteSerializerDeserializer<V> |
valueSerDeser |
Constructor | Description |
---|---|
AbstractSieve(ByteSerializerDeserializer<K> keySerDeser,
ByteSerializerDeserializer<V> valueSerDeser,
it.unimi.dsi.sux4j.mph.AbstractHashFunction<K> hashingStrategy,
AbstractSieve.UpdateStrategy<K,V> updateStrategy) |
Creates a new sieve with the given data.
|
Modifier and Type | Method | Description |
---|---|---|
abstract void |
close() |
Closes (forever) this sieve.
|
abstract boolean |
enqueue(K key,
V value) |
Add the given (key,value) pair to the store.
|
abstract void |
flush() |
Forces the check+update of all pairs that have been enqueued.
|
void |
setNewFlowRecevier(AbstractSieve.NewFlowReceiver<K> newFlowReceiver) |
Sets the receiver for the new flows generated by this sieve.
|
protected final ByteSerializerDeserializer<K> keySerDeser
protected final ByteSerializerDeserializer<V> valueSerDeser
protected final it.unimi.dsi.sux4j.mph.AbstractHashFunction<K> hashingStrategy
protected final AbstractSieve.UpdateStrategy<K,V> updateStrategy
protected AbstractSieve.NewFlowReceiver<K> newFlowReceiver
public static final it.unimi.dsi.sux4j.mph.AbstractHashFunction<java.lang.CharSequence> CHAR_SEQUENCE_HASHING_STRATEGY
public AbstractSieve(ByteSerializerDeserializer<K> keySerDeser, ByteSerializerDeserializer<V> valueSerDeser, it.unimi.dsi.sux4j.mph.AbstractHashFunction<K> hashingStrategy, AbstractSieve.UpdateStrategy<K,V> updateStrategy)
keySerDeser
- the serializer and deserializer to be used to store keys.valueSerDeser
- the serializer and deserializer to be used to store values.hashingStrategy
- the function to be applied to keys (CharSequence
) to obtain hash values (the store actually contains hash values, not keys).updateStrategy
- the strategy used to update the values associated to duplicate keys.public abstract boolean enqueue(K key, V value) throws java.io.IOException, java.lang.InterruptedException
java.io.IOException
java.lang.InterruptedException
public abstract void close() throws java.io.IOException
close
in interface java.lang.AutoCloseable
close
in interface java.io.Closeable
java.io.IOException
public void setNewFlowRecevier(AbstractSieve.NewFlowReceiver<K> newFlowReceiver)
newFlowReceiver
- the new flow receiver for this sieve.public abstract void flush() throws java.io.IOException, java.lang.InterruptedException
java.io.IOException
java.lang.InterruptedException