it.unimi.dsi.law.util.ConsistentHashFunction<T>

public final class ConsistentHashFunction<T extends Comparable<? super T>>
extends Object

Provides an implementation of consistent hashing. Consistent hashing has been introduced in

Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David R. Karger, Eric Lehman, Frank T. Leighton, Rina Panigrahy, Matthew S. Levine and Daniel Lewin, Proc. of the twenty-ninth annual ACM symposium on Theory of computing, El Paso, Texas, United States, 1997, pages 654−663.

This class provides some extension to the original definition: weighted buckets and skippable buckets. More precisely, keys are distributed on buckets proportionally to the weight of the buckets, and it is possible to specify a skip strategy for buckets.

Consistent Hash Function: Properties

A consistent hash function consists, at any time, of a set of objects, called buckets, each with a specified weight (a positive integer). At the beginning, there are no buckets, but you can add a new bucket, or remove an existing bucket.

The method hash(long) can be used to hash a point (a long) to a bucket. More precisely, when applied to a given point, this method will return one of the buckets, and the method will satisfy the following properties:

the bucket returned by hash(long) is one of the buckets currently present in the consistent hash function;
the fraction of points that are hashed to a specific bucket is approximately proportional to the weight of the bucket; for example, if there are only two buckets A and B, and the weight of A is 2 whereas the weight of B is 3, then about 2/5 of all longs will be hashed to A and about 3/5 will be hashed to B;
every time you add a new bucket, some of the points will of course be hashed to the new bucket, but it is impossible that a point that was hashed to an old bucket will now be hashed to some other old bucket; more formally, consider the following sequence of instructions:
```
 Object A = chf.hash(x);
 chf.add(B);
 Object C = chf.hash(x);
 
```
at the end either A==C (i.e., the hash of x has not changed after adding B), or C==B (i.e., now x is hashed to the new object).

Otherwise said, if a new bucket is added, then the number of keys that change their assignment is the minimum necessary; more importantly, it is impossible that a key changes its bucket assignment towards a bucket that already existed: when a bucket is added old buckets can only lose keys towards the new bucket.

It is easy to see that the last property stated above can be equivalently stated by saying that every point determines a (total) order among buckets; the hash(long) method only returns the first element of this order. It is also possible, using hash(long, int) to obtain an array of buckets containing the (first part of the) order. In particular, if the latter method is called with a specified length corresponding to the number of buckets, the whole order will be returned.

Implementation Details

With each bucket, we associate a number of points, called replicae, located on the unit circle (the unit circle itself is represented approximately on the whole range of longs). Then, given a point p, one can get the bucket corresponding to the replica that is closest to p.

The method that gets an array containing the buckets looks at the buckets that are closest to a point, in distance order, without repetitions. In particular, by computing an array as large as the number of buckets, you will obtain a permutation of the buckets themselves. Indeed, another viewpoint on consistent hashing is that it associates a random permutation of the buckets to each point (albeit the interpretation of weights, in that case, becomes a bit twisted).

The number of replicas associated to a bucket is fixed when the bucket is inserted in the map. The actual number depends on the weight and on the constant REPLICAE_PER_BUCKET.

This class handles overlaps (i.e., conflicts in a replica creation). In that case, a local deterministic ordering is generated using the hash code (and, in case of a tie, the lexicographic order of string representation) of the buckets.

The hashing function is deterministically based on the hash code of the buckets, which should be of good quality. This is essential to ensure deterministic replicability of the results of this function across different instances.

Nested Class Summary

Nested Classes

Modifier and Type Class Description

static interface ConsistentHashFunction.SkipStrategy<T>
Allows to skip suitable items when searching for the closest replica.

Field Summary

Fields
Modifier and Type	Field	Description
`protected Set<T>`	`buckets`	The cached key set of `sizes`.
`protected ObjectSortedSet<Long2ObjectMap.Entry<Object>>`	`entrySet`	The cached key set of `replicae`.
`protected Long2ObjectSortedMap<Object>`	`replicae`	Maps points in the unit interval to buckets.
`static int`	`REPLICAE_PER_BUCKET`	Each bucket is replicated this number of times.
`protected Object2IntMap<T>`	`sizes`	For each bucket, its size.
`protected ConsistentHashFunction.SkipStrategy<T>`	`skipStrategy`	The optional strategy to skip buckets, or `null`.

Constructor Summary

Constructors
Constructor	Description
`ConsistentHashFunction()`	Creates a new consistent hash function.
`ConsistentHashFunction(ConsistentHashFunction.SkipStrategy<T> skipStrategy)`	Creates a new consistent hash function with given skip strategy.

Method Summary

Modifier and Type	Method	Description
`boolean`	`add(T bucket, int weight)`	Adds a bucket to the map.
`Set<T>`	`buckets()`	Returns the set of buckets of this consistent hash function.
`T`	`hash(long point)`	Returns the bucket of the replica that is closest to the given point.
`Object[]`	`hash(long point, int n)`	Returns an array of buckets whose replicae are close to the given point.
`T`	`hash(Object key)`	Returns the bucket of the replica that is closest to the given key.
`Object[]`	`hash(Object key, int n)`	Returns an array of buckets whose replicae are close to the given object.
`boolean`	`remove(T bucket)`	Removes a bucket.
`String`	`toString()`

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- REPLICAE_PER_BUCKET
  
  public static final int REPLICAE_PER_BUCKET
  
  Each bucket is replicated this number of times.
  
  See Also:
  
  Constant Field Values
- replicae
  
  protected final Long2ObjectSortedMap<Object> replicae
  
  Maps points in the unit interval to buckets.
- entrySet
  
  protected final ObjectSortedSet<Long2ObjectMap.Entry<Object>> entrySet
  
  The cached key set of replicae.
- sizes
  
  protected final Object2IntMap<T extends Comparable<? super T>> sizes
  
  For each bucket, its size.
- buckets
  
  protected final Set<T extends Comparable<? super T>> buckets
  
  The cached key set of sizes.
- skipStrategy
  
  protected final ConsistentHashFunction.SkipStrategy<T extends Comparable<? super T>> skipStrategy
  
  The optional strategy to skip buckets, or null.
Constructor Details
- ConsistentHashFunction
  
  public ConsistentHashFunction()
  
  Creates a new consistent hash function.
- ConsistentHashFunction
  
  public ConsistentHashFunction(ConsistentHashFunction.SkipStrategy<T> skipStrategy)
  
  Creates a new consistent hash function with given skip strategy.
  
  Parameters:
  
  skipStrategy - a skip strategy, or null.
Method Details
- add
  
  public boolean add(T bucket, int weight)
  
  Adds a bucket to the map.
  
  Parameters:
  
  bucket - the new bucket.
  
  weight - the weight of the new bucket; buckets with a larger weight are returned proportionately more often.
  
  Returns:
  
  false if the bucket was already present.
- remove
  
  public boolean remove(T bucket)
  
  Removes a bucket.
  
  Parameters:
  
  bucket - the bucket to be removed.
  
  Returns:
  
  false if the bucket was not present.
- hash
  
  public Object[] hash(long point, int n)
  
  Returns an array of buckets whose replicae are close to the given point. The first element will be the bucket of the replica closest to the point, followed by the bucket of the next closest replica (whose bucket is not the first, of course) and so on.
  
  Parameters:
  
  point - a point on the unit circle.
  
  n - the number of closest buckets to return.
  
  Returns:
  
  an array of distinct buckets of the closest replicas; the array could be shorter than n if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty.
- hash
  
  public Object[] hash(Object key, int n)
  
  Returns an array of buckets whose replicae are close to the given object.
  
  Parameters:
  
  key - an object ot hash.
  
  n - the number of close buckets to return.
  This method just uses hashCode() << 32 as point for hash(long,int)
  
  Returns:
  
  an array of distinct buckets of the closest replicas; the array could be shorter than n if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty.
  
  See Also:
  
  hash(long, int)
- hash
  
  public T hash(long point)
  
  Returns the bucket of the replica that is closest to the given point.
  
  Parameters:
  
  point - a point on the unit circle.
  
  Returns:
  
  the bucket of the closest replica, or null if there are no buckets or all buckets must be skipped.
  
  Throws:
  
  NoSuchElementException - if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.
  
  See Also:
  
  hash(long, int)
- hash
  
  public T hash(Object key)
  
  Returns the bucket of the replica that is closest to the given key.
  
  Parameters:
  
  key - an object to hash.
  
  Returns:
  
  the bucket of the closest replica, or null if there are no buckets or all buckets must be skipped.
  
  Throws:
  
  NoSuchElementException - if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.
  
  See Also:
  
  hash(Object, int)
- buckets
  
  public Set<T> buckets()
  
  Returns the set of buckets of this consistent hash function.
  
  Returns:
  
  the set of buckets.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object

Class ConsistentHashFunction<T extends Comparable<? super T>>

Consistent Hash Function: Properties

Implementation Details

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

REPLICAE_PER_BUCKET

replicae

entrySet

sizes

buckets

skipStrategy

Constructor Details

ConsistentHashFunction

ConsistentHashFunction

Method Details

add

remove

hash

hash

hash

hash

buckets

toString