Portability | portable |
---|---|
Stability | provisional |
Maintainer | johan.tibell@gmail.com |
Safe Haskell | None |
This module defines a class, Hashable
, for types that can be
converted to a hash value. This class exists for the benefit of
hashing-based data structures. The module provides instances for
most standard types. Efficient instances for other types can be
generated automatically and effortlessly using the generics support
in GHC 7.2 and above.
The easiest way to get started is to use the hash
function. Here
is an example session with ghci
.
Prelude> import Data.Hashable Prelude> hash "foo" 60853164
- hash :: Hashable a => a -> Int
- class Hashable a where
- hashWithSalt :: Int -> a -> Int
- hashUsing :: Hashable b => (a -> b) -> Int -> a -> Int
- hashPtr :: Ptr a -> Int -> IO Int
- hashPtrWithSalt :: Ptr a -> Int -> Int -> IO Int
- hashByteArray :: ByteArray# -> Int -> Int -> Int
- hashByteArrayWithSalt :: ByteArray# -> Int -> Int -> Int -> Int
Hashing and security
Applications that use hash-based data structures to store input from untrusted users can be susceptible to "hash DoS", a class of denial-of-service attack that uses deliberately chosen colliding inputs to force an application into unexpectedly behaving with quadratic time complexity.
This library uses the SipHash algorithm to hash strings. SipHash was designed to be more robust against collision attacks than traditional hash algorithms, while retaining good performance.
To further mitigate the risk from collision attacks, this library
provides an environment variable named HASHABLE_SALT
that allows
the default salt used by the hash
function to be chosen at
application startup time.
- In the normal case, the environment variable is not set, and a
fixed salt is used that does not vary between runs. (This choice
can be made permanent by building this package with the
-ffixed-salt
flag.) - If the value is the string
random
, the system's cryptographic pseudo-random number generator will be used to supply a salt. While this may offer added security, it can also violate the assumption of some Haskell libraries that expect the results ofhash
to be stable across application runs. Choose this behaviour with care (and testing)! - When the value is an integer (prefixed with
0x
for hexadecimal), it will be used as the salt.
If HASHABLE_SALT
cannot be parsed, then the first time that a
call to hash
is made, the application will halt with an
informative error message.
(Implementation note: while SipHash is used for strings, a faster—and almost certainly less secure—algorithm is used for numeric types, on the assumption that strings are much more likely as a hash DoS attack vector.)
Computing hash values
hash :: Hashable a => a -> IntSource
Return a hash value for the argument. Defined in terms of
hashWithSalt
and a default salt.
(See the "Hashing and security" section of the Data.Hashable documentation for an important note on working safely with untrusted user input.)
The class of types that can be converted to a hash value.
hashWithSalt :: Int -> a -> IntSource
Return a hash value for the argument, using the given salt.
The general contract of hashWithSalt
is:
- If a value is hashed using the same salt during distinct runs of an application, the result must remain the same. (This is necessary to make it possible to store hashes on persistent media.)
- If two values are equal according to the
==
method, then applying thehashWithSalt
method on each of the two values must produce the same integer result if the same salt is used in each case. - It is not required that if two values are unequal
according to the
==
method, then applying thehashWithSalt
method on each of the two values must produce distinct integer results. (Every programmer will be aware that producing distinct integer results for unequal values will improve the performance of hashing-based data structures.)
This method can be used to compute different hash values for
the same input by providing a different salt in each
application of the method. This implies that any instance that
defines hashWithSalt
must make use of the salt in its
implementation.
Avalanche behavior
A good hash function has a 50% probability of flipping every bit of its result in response to a change of just one bit in its input. This property is called avalanche. To be truly general purpose, hash functions must have strong avalanche behavior.
All of the Hashable
instances provided by this module have
excellent avalanche properties.
Creating new instances
There are two ways to create new instances: by deriving instances automatically using GHC's generic programming support or by writing instances manually.
Generic instances
Beginning with GHC 7.2, the recommended way to make instances of
Hashable
for most types is to use the compiler's support for
automatically generating default instances.
{-# LANGUAGE DeriveGeneric #-} import GHC.Generics (Generic) import Data.Hashable data Foo a = Foo a String deriving (Eq, Generic) instance Hashable a => Hashable (Foo a) data Colour = Red | Green | Blue deriving Generic instance Hashable Colour
If you omit a body for the instance declaration, GHC will generate a default instance that correctly and efficiently hashes every constructor and parameter.
Understanding a compiler error
Suppose you intend to use the generic machinery to automatically
generate a Hashable
instance.
data Oops = Oops -- forgot to add "deriving Generic" here! instance Hashable Oops
And imagine that, as in the example above, you forget to add a
"deriving
" clause to your data type. At compile time,
you will get an error message from GHC that begins roughly as
follows:
Generic
No instance for (GHashable (Rep Oops))
This error can be confusing, as GHashable
is not exported (it is
an internal typeclass used by this library's generics machinery).
The correct fix is simply to add the missing "deriving
".
Generic
Writing instances by hand
To maintain high quality hashes, new Hashable
instances should be
built using existing Hashable
instances, combinators, and hash
functions.
The functions below can be used when creating new instances of
Hashable
. For many string-like types the
hashWithSalt
method can be defined in terms of either
hashPtrWithSalt
or hashByteArrayWithSalt
. Here's how you could
implement an instance for the ByteString
data type, from the
bytestring
package:
import qualified Data.ByteString as B import qualified Data.ByteString.Internal as B import qualified Data.ByteString.Unsafe as B import Data.Hashable import Foreign.Ptr (castPtr) instance Hashable B.ByteString where hashWithSalt salt bs = B.inlinePerformIO $ B.unsafeUseAsCStringLen bs $ \(p, len) -> hashPtrWithSalt p (fromIntegral len) salt
Use hashWithSalt
to compute a hash from several values, using
this recipe:
data Product a b = P a b instance (Hashable a, Hashable b) => Hashable (Product a b) where hashWithSalt s (P a b) = s `hashWithSalt` a `hashWithSalt` b
You can chain hashes together using hashWithSalt
, by following
this recipe:
combineTwo h1 h2 = h1 `hashWithSalt` h2
Transform a value into a Hashable
value, then hash the
transformed value using the given salt.
This is a useful shorthand in cases where a type can easily be
mapped to another type that is already an instance of Hashable
.
Example:
data Foo = Foo | Bar deriving (Enum) instance Hashable Foo where hashWithSalt = hashUsing fromEnum
Compute a hash value for the content of this pointer.
Compute a hash value for the content of this pointer, using an initial salt.
This function can for example be used to hash non-contiguous segments of memory as if they were one contiguous segment, by using the output of one hash as the salt for the next.
:: ByteArray# | data to hash |
-> Int | offset, in bytes |
-> Int | length, in bytes |
-> Int | hash value |
Compute a hash value for the content of this ByteArray#
,
beginning at the specified offset, using specified number of bytes.
Availability: GHC.
:: ByteArray# | data to hash |
-> Int | offset, in bytes |
-> Int | length, in bytes |
-> Int | salt |
-> Int | hash value |
Compute a hash value for the content of this ByteArray#
, using
an initial salt.
This function can for example be used to hash non-contiguous segments of memory as if they were one contiguous segment, by using the output of one hash as the salt for the next.
Availability: GHC.
Hashing types with multiple constructors
For a type with several value constructors, there are a few
possible approaches to writing a Hashable
instance.
If the type is an instance of Enum
, the easiest (and safest) path
is to convert it to an Int
, and use the existing Hashable
instance for Int
.
data Color = Red | Green | Blue deriving Enum instance Hashable Color where hashWithSalt = hashUsing fromEnum
This instance benefits from the fact that the Hashable
instance
for Int
has excellent avalanche properties.
In contrast, a very weak hash function would be:
terribleHash :: Color -> Int terribleHash salt = fromEnum
This has terrible avalanche properties, as the salt is ignored, and every input is mapped to a small integer.
If the type's constructors accept parameters, it can be important to distinguish the constructors.
data Time = Days Int | Weeks Int | Months Int
The weak hash function below guarantees a high probability of days, weeks, and months all colliding when hashed.
veryBadHash :: Time -> Int veryBadHash (Days d) = hash d veryBadHash (Weeks w) = hash w veryBadHash (Months m) = hash m
It is easy to distinguish the constructors using the hashWithSalt
function.
instance Hashable Time where hashWithSalt s (Days n) = s `hashWithSalt` (0::Int) `hashWithSalt` n hashWithSalt s (Weeks n) = s `hashWithSalt` (1::Int) `hashWithSalt` n hashWithSalt s (Months n) = s `hashWithSalt` (2::Int) `hashWithSalt` n
If a constructor accepts multiple parameters, their hashes can be chained.
data Date = Date Int Int Int instance Hashable Date where hashWithSalt s (Date yr mo dy) = s `hashWithSalt` yr `hashWithSalt` mo `hashWithSalt` dy