Safe Haskell | None |
---|---|
Language | Haskell2010 |
A basic open-addressing hash table using linear probing. Use this hash table if you...
- want the fastest possible lookups, and very fast inserts.
- don't care about wasting a little bit of memory to get it.
- don't care that a table resize might pause for a long time to rehash all of the key-value mappings.
- have a workload which is not heavy with deletes; deletes clutter the table with deleted markers and force the table to be completely rehashed fairly often.
Of the hash tables in this collection, this hash table has the best lookup performance, while maintaining competitive insert performance.
Space overhead
This table is not especially memory-efficient; firstly, the table has a maximum load factor of 0.83 and will be resized if load exceeds this value. Secondly, to improve insert and lookup performance, we store a 16-bit hash code for each key in the table.
Each hash table entry requires at least 2.25 words (on a 64-bit machine), two
for the pointers to the key and value and one quarter word for the hash code.
We don't count key and value pointers as overhead, because they have to be
there -- so the overhead for a full slot is at least one quarter word -- but
empty slots in the hash table count for a full 2.25 words of overhead. Define
m
as the number of slots in the table, n
as the number of key value
mappings, and ws
as the machine word size in bytes. If the load factor is
k=n/m
, the amount of space wasted per mapping in words is:
w(n) = (m*(2*ws + 2) - n*(2*ws)) / ws
Since m=n/k
,
w(n) = n/k * (2*ws + 2) - n*(2*ws) = (n * (2 + 2*ws*(1-k)) k) ws
Solving for k=0.83
, the maximum load factor, gives a minimum overhead of
0.71 words per mapping on a 64-bit machine, or 1.01 words per mapping on a
32-bit machine. If k=0.5
, which should be under normal usage the maximum
overhead situation, then the overhead would be 2.5 words per mapping on a
64-bit machine, or 3.0 words per mapping on a 32-bit machine.
Space overhead: experimental results
In randomized testing on a 64-bit machine (see
test/compute-overhead/ComputeOverhead.hs
in the source distribution), mean
overhead (that is, the number of words needed to store the key-value mapping
over and above the two words necessary for the key and the value pointers) is
approximately 1.24 machine words per key-value mapping with a standard
deviation of about 0.30 words, and 1.70 words per mapping at the 95th
percentile.
Expensive resizes
If enough elements are inserted into the table to make it exceed the maximum
load factor, the table is resized. A resize involves a complete rehash of all
the elements in the table, which means that any given call to insert
might
take O(n) time in the size of the table, with a large constant factor. If a
long pause waiting for the table to resize is unacceptable for your
application, you should choose the included linear hash table instead.
References:
- Knuth, Donald E. The Art of Computer Programming, vol. 3 Sorting and Searching. Addison-Wesley Publishing Company, 1973.
Synopsis
- data HashTable s k v
- new :: ST s (HashTable s k v)
- newSized :: Int -> ST s (HashTable s k v)
- delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()
- lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)
- insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
- mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a
- mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a
- mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s ()
- foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a
- computeOverhead :: HashTable s k v -> ST s Double
Documentation
An open addressing hash table using linear probing.
Instances
HashTable HashTable Source # | |
Defined in Data.HashTable.ST.Basic new :: ST s (HashTable s k v) Source # newSized :: Int -> ST s (HashTable s k v) Source # mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a Source # mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a Source # insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source # delete :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s () Source # lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source # foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source # mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s () Source # lookupIndex :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe Word) Source # nextByIndex :: HashTable s k v -> Word -> ST s (Maybe (Word, k, v)) Source # | |
Show (HashTable s k v) Source # | |
newSized :: Int -> ST s (HashTable s k v) Source #
See the documentation for this function in
newSized
.
delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s () Source #
See the documentation for this function in
delete
.
lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source #
See the documentation for this function in
lookup
.
insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source #
See the documentation for this function in
insert
.
mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a Source #
See the documentation for this function in
mutate
.
mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a Source #
See the documentation for this function in
mutateST
.
mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s () Source #
See the documentation for this function in
mapM_
.
foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source #
See the documentation for this function in
foldM
.
computeOverhead :: HashTable s k v -> ST s Double Source #
See the documentation for this function in
computeOverhead
.