A paper by John Lamping and Eric Veach1 describing Jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. The accompanying C++ code is indeed short and sweet:
However, there are a lot of implicit type conversions in the above code. This becomes apparent if you implement the algorithm in a language without implicit conversions. The following implementation in Haskell is quite close to a 1:1 translation of the C++ code (but using recursion instead of a
while loop), and as you can see, there are five different type conversions happening (the conversions are named in the code):
module JumpConsistentHash (jch) where import Data.Bits (shiftL, shiftR) import Data.Int (Int64, Int32) import Data.Word (Word64) jch :: Word64 -> Int32 -> Int32 jch key numBuckets = jch' numBuckets key 1 0 jch' :: Int32 -> Word64 -> Int64 -> Int64 -> Int32 jch' n k b j | j >= i32ToI64 n = i64ToI32 b | otherwise = let k' = k * 2862933555777941757 + 1 z = i64ToD (1 `shiftL` 31)/ui64ToD ((k' `shiftR` 33) + 1) j' = dToI64 ((i64ToD j + 1) * z) in jch' n k' j j' i32ToI64 :: Int32 -> Int64 i32ToI64 = fromIntegral i64ToI32 :: Int64 -> Int32 i64ToI32 = fromIntegral i64ToD :: Int64 -> Double i64ToD = fromIntegral ui64ToD :: Word64 -> Double ui64ToD = fromIntegral dToI64 :: Double -> Int64 dToI64 = floor
So, in the (seemingly) simple algorithm, we have the following conversions taking place:
int64conversion happens when comparing
band is done through sign extension.
int32conversion is the last step of the algorithm and is done when returning
b, which was a 64-bit integer in the loop, to a 32-bit integer. This behaviour is actually implementation-defined(!)2 3, so the algorithm relies on the compiler doing the sensible thing here. In practice, this is probably not a problem.
doubleconversions happen when doing floating-point calculations with integer values and are “floating-integral conversions”. Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating type. If the value being converted is outside the range of values that can be represented, the behavior is undefined. 4
int64conversion truncates, i.e. discards the fractional part, of the
This was, at least for me, an interesting case of how often “hidden” behaviour can be a crucial part of code.
The value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.↩
C++11 Standard §4.7.3↩
C++11 Standard §4.9.2↩
C++11 Standard §4.9.1↩