Class ConsistentHashFunction<T extends Comparable<? super T>>
public final class ConsistentHashFunction<T extends Comparable<? super T>> extends Object
This class provides some extension to the original definition: weighted buckets and skippable buckets. More precisely, keys are distributed on buckets proportionally to the weight of the buckets, and it is possible to specify a skip strategy for buckets.Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David R. Karger, Eric Lehman, Frank T. Leighton, Rina Panigrahy, Matthew S. Levine and Daniel Lewin, Proc. of the twenty-ninth annual ACM symposium on Theory of computing, El Paso, Texas, United States, 1997, pages 654−663.
Consistent Hash Function: Properties
A consistent hash function consists, at any time, of a set of objects, called buckets, each with a specified weight (a positive integer). At the beginning, there are no buckets, but you can add a new bucket, or remove an existing bucket.
The method hash(long)
can be used to hash a point (a long) to a bucket.
More precisely, when applied to a given point, this method will return one of the buckets, and
the method will satisfy the following properties:
- the bucket returned by
hash(long)
is one of the buckets currently present in the consistent hash function; - the fraction of points that are hashed to a specific bucket is approximately proportional to the weight of the bucket; for example, if there are only two buckets A and B, and the weight of A is 2 whereas the weight of B is 3, then about 2/5 of all longs will be hashed to A and about 3/5 will be hashed to B;
- every time you add a new bucket, some of the points will of
course be hashed to the new bucket, but it is impossible that a point that was hashed to an old
bucket will now be hashed to some other old bucket; more formally, consider the following
sequence of instructions:
Object A = chf.hash(x); chf.add(B); Object C = chf.hash(x);
at the end eitherA==C
(i.e., the hash of x has not changed after adding B), orC==B
(i.e., now x is hashed to the new object).
Otherwise said, if a new bucket is added, then the number of keys that change their assignment is the minimum necessary; more importantly, it is impossible that a key changes its bucket assignment towards a bucket that already existed: when a bucket is added old buckets can only lose keys towards the new bucket.
It is easy to see that the last property stated above can be equivalently stated by saying
that every point determines a (total) order among buckets; the hash(long)
method only
returns the first element of this order. It is also possible, using hash(long, int)
to
obtain an array of buckets containing the (first part of the) order. In particular, if
the latter method is called with a specified length corresponding to the number of buckets, the
whole order will be returned.
Implementation Details
With each bucket, we associate a number of points, called replicae, located on the unit
circle (the unit circle itself is represented approximately on the whole range of
long
s). Then, given a point p, one can
get the bucket corresponding to the replica that is closest to p.
The method that gets an array containing the buckets looks at the buckets that are closest to a point, in distance order, without repetitions. In particular, by computing an array as large as the number of buckets, you will obtain a permutation of the buckets themselves. Indeed, another viewpoint on consistent hashing is that it associates a random permutation of the buckets to each point (albeit the interpretation of weights, in that case, becomes a bit twisted).
The number of replicas associated to a bucket is fixed when the bucket is inserted in the
map. The actual number depends on the weight and on the constant REPLICAE_PER_BUCKET
.
This class handles overlaps (i.e., conflicts in a replica creation). In that case, a local deterministic ordering is generated using the hash code (and, in case of a tie, the lexicographic order of string representation) of the buckets.
The hashing function is deterministically based on the hash code of the buckets, which should be of good quality. This is essential to ensure deterministic replicability of the results of this function across different instances.
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interface
ConsistentHashFunction.SkipStrategy<T>
Allows to skip suitable items when searching for the closest replica. -
Field Summary
Fields Modifier and Type Field Description protected Set<T>
buckets
The cached key set ofsizes
.protected ObjectSortedSet<Long2ObjectMap.Entry<Object>>
entrySet
The cached key set ofreplicae
.protected Long2ObjectSortedMap<Object>
replicae
Maps points in the unit interval to buckets.static int
REPLICAE_PER_BUCKET
Each bucket is replicated this number of times.protected Object2IntMap<T>
sizes
For each bucket, its size.protected ConsistentHashFunction.SkipStrategy<T>
skipStrategy
The optional strategy to skip buckets, ornull
. -
Constructor Summary
Constructors Constructor Description ConsistentHashFunction()
Creates a new consistent hash function.ConsistentHashFunction(ConsistentHashFunction.SkipStrategy<T> skipStrategy)
Creates a new consistent hash function with given skip strategy. -
Method Summary
Modifier and Type Method Description boolean
add(T bucket, int weight)
Adds a bucket to the map.Set<T>
buckets()
Returns the set of buckets of this consistent hash function.T
hash(long point)
Returns the bucket of the replica that is closest to the given point.Object[]
hash(long point, int n)
Returns an array of buckets whose replicae are close to the given point.T
hash(Object key)
Returns the bucket of the replica that is closest to the given key.Object[]
hash(Object key, int n)
Returns an array of buckets whose replicae are close to the given object.boolean
remove(T bucket)
Removes a bucket.String
toString()
-
Field Details
-
REPLICAE_PER_BUCKET
public static final int REPLICAE_PER_BUCKETEach bucket is replicated this number of times.- See Also:
- Constant Field Values
-
replicae
Maps points in the unit interval to buckets. -
entrySet
The cached key set ofreplicae
. -
sizes
For each bucket, its size. -
buckets
The cached key set ofsizes
. -
skipStrategy
The optional strategy to skip buckets, ornull
.
-
-
Constructor Details
-
ConsistentHashFunction
public ConsistentHashFunction()Creates a new consistent hash function. -
ConsistentHashFunction
Creates a new consistent hash function with given skip strategy.- Parameters:
skipStrategy
- a skip strategy, ornull
.
-
-
Method Details
-
add
Adds a bucket to the map.- Parameters:
bucket
- the new bucket.weight
- the weight of the new bucket; buckets with a larger weight are returned proportionately more often.- Returns:
- false if the bucket was already present.
-
remove
Removes a bucket.- Parameters:
bucket
- the bucket to be removed.- Returns:
- false if the bucket was not present.
-
hash
Returns an array of buckets whose replicae are close to the given point. The first element will be the bucket of the replica closest to the point, followed by the bucket of the next closest replica (whose bucket is not the first, of course) and so on.- Parameters:
point
- a point on the unit circle.n
- the number of closest buckets to return.- Returns:
- an array of distinct buckets of the closest replicas; the array could be shorter than
n
if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty.
-
hash
Returns an array of buckets whose replicae are close to the given object.- Parameters:
key
- an object ot hash.n
- the number of close buckets to return.This method just uses
hashCode() << 32
as point forhash(long,int)
- Returns:
- an array of distinct buckets of the closest replicas; the array could be shorter than
n
if there are not enough buckets and, in case a skip strategy has been specified, it could be empty even if the bucket set is nonempty. - See Also:
hash(long, int)
-
hash
Returns the bucket of the replica that is closest to the given point.- Parameters:
point
- a point on the unit circle.- Returns:
- the bucket of the closest replica, or
null
if there are no buckets or all buckets must be skipped. - Throws:
NoSuchElementException
- if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.- See Also:
hash(long, int)
-
hash
Returns the bucket of the replica that is closest to the given key.- Parameters:
key
- an object to hash.- Returns:
- the bucket of the closest replica, or
null
if there are no buckets or all buckets must be skipped. - Throws:
NoSuchElementException
- if there are no buckets, or if a skip strategy has been specified and it skipped all existings buckets.- See Also:
hash(Object, int)
-
buckets
Returns the set of buckets of this consistent hash function.- Returns:
- the set of buckets.
-
toString
-