Class ConsistentHashFunction<T extends Comparable<? super T>>

java.lang.Object
it.unimi.dsi.law.util.ConsistentHashFunction<T>

public final class ConsistentHashFunction<T extends Comparable<? super T>>
extends Object
Provides an implementation of consistent hashing. Consistent hashing has been introduced in

Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David R. Karger, Eric Lehman, Frank T. Leighton, Rina Panigrahy, Matthew S. Levine and Daniel Lewin, Proc. of the twenty-ninth annual ACM symposium on Theory of computing, El Paso, Texas, United States, 1997, pages 654−663.

This class provides some extension to the original definition: weighted buckets and skippable buckets. More precisely, keys are distributed on buckets proportionally to the weight of the buckets, and it is possible to specify a skip strategy for buckets.

Consistent Hash Function: Properties

A consistent hash function consists, at any time, of a set of objects, called buckets, each with a specified weight (a positive integer). At the beginning, there are no buckets, but you can add a new bucket, or remove an existing bucket.

The method hash(long) can be used to hash a point (a long) to a bucket. More precisely, when applied to a given point, this method will return one of the buckets, and the method will satisfy the following properties:

  1. the bucket returned by hash(long) is one of the buckets currently present in the consistent hash function;
  2. the fraction of points that are hashed to a specific bucket is approximately proportional to the weight of the bucket; for example, if there are only two buckets A and B, and the weight of A is 2 whereas the weight of B is 3, then about 2/5 of all longs will be hashed to A and about 3/5 will be hashed to B;
  3. every time you add a new bucket, some of the points will of course be hashed to the new bucket, but it is impossible that a point that was hashed to an old bucket will now be hashed to some other old bucket; more formally, consider the following sequence of instructions:
     Object A = chf.hash(x);
     chf.add(B);
     Object C = chf.hash(x);
     
    at the end either A==C (i.e., the hash of x has not changed after adding B), or C==B (i.e., now x is hashed to the new object).

Otherwise said, if a new bucket is added, then the number of keys that change their assignment is the minimum necessary; more importantly, it is impossible that a key changes its bucket assignment towards a bucket that already existed: when a bucket is added old buckets can only lose keys towards the new bucket.

It is easy to see that the last property stated above can be equivalently stated by saying that every point determines a (total) order among buckets; the hash(long) method only returns the first element of this order. It is also possible, using hash(long, int) to obtain an array of buckets containing the (first part of the) order. In particular, if the latter method is called with a specified length corresponding to the number of buckets, the whole order will be returned.

Implementation Details

With each bucket, we associate a number of points, called replicae, located on the unit circle (the unit circle itself is represented approximately on the whole range of longs). Then, given a point p, one can get the bucket corresponding to the replica that is closest to p.

The method that gets an array containing the buckets looks at the buckets that are closest to a point, in distance order, without repetitions. In particular, by computing an array as large as the number of buckets, you will obtain a permutation of the buckets themselves. Indeed, another viewpoint on consistent hashing is that it associates a random permutation of the buckets to each point (albeit the interpretation of weights, in that case, becomes a bit twisted).

The number of replicas associated to a bucket is fixed when the bucket is inserted in the map. The actual number depends on the weight and on the constant REPLICAE_PER_BUCKET.

This class handles overlaps (i.e., conflicts in a replica creation). In that case, a local deterministic ordering is generated using the hash code (and, in case of a tie, the lexicographic order of string representation) of the buckets.

The hashing function is deterministically based on the hash code of the buckets, which should be of good quality. This is essential to ensure deterministic replicability of the results of this function across different instances.

  • Field Details

  • Constructor Details

    • ConsistentHashFunction

      public ConsistentHashFunction()
      Creates a new consistent hash function.
    • ConsistentHashFunction

      public ConsistentHashFunction​(ConsistentHashFunction.SkipStrategy<T> skipStrategy)
      Creates a new consistent hash function with given skip strategy.
      Parameters:
      skipStrategy - a skip strategy, or null.
  • Method Details

    • add

      public boolean add​(T bucket, int weight)
      Adds a bucket to the map.
      Parameters:
      bucket - the new bucket.
      weight - the weight of the new bucket; buckets with a larger weight are returned proportionately more often.
      Returns:
      false if the bucket was already present.
    • remove

      public boolean remove​(T bucket)
      Removes a bucket.
      Parameters:
      bucket - the bucket to be removed.
      Returns:
      false if the bucket was not present.
    • hash

      public Object[] hash​(long point, int n)
      Returns an array of buckets whose replicae are close to the given point. The first element will be the bucket of the replica closest to the point, followed by the bucket of the next closest replica (whose bucket is not the first, of course) and so on.
      Parameters:
      point - a point on the unit circle.
      n - the number of closest buckets to return.
      Returns:
      an array of distinct buckets of the closest replicas; the array could be shorter than n if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty.
    • hash

      public Object[] hash​(Object key, int n)
      Returns an array of buckets whose replicae are close to the given object.
      Parameters:
      key - an object ot hash.
      n - the number of close buckets to return.

      This method just uses hashCode() << 32 as point for hash(long,int)

      Returns:
      an array of distinct buckets of the closest replicas; the array could be shorter than n if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty.
      See Also:
      hash(long, int)
    • hash

      public T hash​(long point)
      Returns the bucket of the replica that is closest to the given point.
      Parameters:
      point - a point on the unit circle.
      Returns:
      the bucket of the closest replica, or null if there are no buckets or all buckets must be skipped.
      Throws:
      NoSuchElementException - if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.
      See Also:
      hash(long, int)
    • hash

      public T hash​(Object key)
      Returns the bucket of the replica that is closest to the given key.
      Parameters:
      key - an object to hash.
      Returns:
      the bucket of the closest replica, or null if there are no buckets or all buckets must be skipped.
      Throws:
      NoSuchElementException - if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.
      See Also:
      hash(Object, int)
    • buckets

      public Set<T> buckets()
      Returns the set of buckets of this consistent hash function.
      Returns:
      the set of buckets.
    • toString

      public String toString()
      Overrides:
      toString in class Object