A paper by John Lamping and Eric Veach1 describing Jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. The accompanying C++ code is indeed short and sweet:
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets)
{
int64_t b = 1, j = 0;
while (j < num_buckets) {
= j;
b = key * 2862933555777941757ULL + 1;
key = (b + 1) * (double(1LL << 31)/double((key >> 33) + 1));
j }
return b;
}
However, there are a lot of implicit type conversions in the above code. This
becomes apparent if you implement the algorithm in a language without implicit
conversions. The following implementation in Haskell is quite close to a 1:1
translation of the C++ code (but using recursion instead of a while
loop),
and as you can see, there are five different type conversions happening (the
conversions are named in the code):
module JumpConsistentHash (jch) where
import Data.Bits (shiftL, shiftR)
import Data.Int (Int64, Int32)
import Data.Word (Word64)
jch :: Word64 -> Int32 -> Int32
= jch' numBuckets key 1 0
jch key numBuckets
jch' :: Int32 -> Word64 -> Int64 -> Int64 -> Int32
jch' n k b j| j >= i32ToI64 n = i64ToI32 b
| otherwise =
let k' = k * 2862933555777941757 + 1
= i64ToD (1 `shiftL` 31)/ui64ToD ((k' `shiftR` 33) + 1)
z = dToI64 ((i64ToD j + 1) * z)
j' in jch' n k' j j'
i32ToI64 :: Int32 -> Int64
= fromIntegral
i32ToI64
i64ToI32 :: Int64 -> Int32
= fromIntegral
i64ToI32
i64ToD :: Int64 -> Double
= fromIntegral
i64ToD
ui64ToD :: Word64 -> Double
= fromIntegral
ui64ToD
dToI64 :: Double -> Int64
= floor dToI64
So, in the (seemingly) simple algorithm, we have the following conversions taking place:
- The
int32
toint64
conversion happens when comparingj
andb
and is done through sign extension. - The
int64
toint32
conversion is the last step of the algorithm and is done when returningb
, which was a 64-bit integer in the loop, to a 32-bit integer. This behaviour is actually implementation-defined(!)2 3, so the algorithm relies on the compiler doing the sensible thing here. In practice, this is probably not a problem. - The
int64
todouble
anduint64
todouble
conversions happen when doing floating-point calculations with integer values and are “floating-integral conversions”. Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. If the value being converted is outside the range of values that can be represented, the behavior is undefined. 4 - The
double
toint64
conversion truncates, i.e. discards the fractional part, of thedouble
. 5
This was, at least for me, an interesting case of how often “hidden” behaviour can be a crucial part of code.