Safe Haskell | None |
---|
An implementation of Search
based on a ternary search tree
(TSTSet
): https://en.wikipedia.org/wiki/Ternary_search_tree.
The searches are performed by manually generating the close word by deleting, transposing, or adding wildcards to match additional characters.
This makes this structure fast for small distances with a small number of
generated candidates, but impractical for even slightly larger distances -
in my tests BK
outpeforms this module when the
distance is greater than 2.
The data structure has no knowledge of the distance and thus it does not
need to be rebuilt if different edit distances are needed. However this
means that it cannot work with arbitrary EditDistance
instances are
functions need to be defined manually to generate the candidates. In this
case levenshtein
uses deletions
, replaces
, and insertions
to
generate the candidates; while damerauLevenshtein
also uses
transpositions
.
- data TSTSet sym
- empty :: Ord sym => TSTSet sym
- insert :: (Ord sym, ListLike full sym) => full -> TSTSet sym -> TSTSet sym
- levenshtein :: (Ord sym, ListLike full sym) => Int -> full -> TSTSet sym -> [(full, Distance Levenshtein)]
- damerauLevenshtein :: (Ord sym, ListLike full sym) => Int -> full -> TSTSet sym -> [(full, Distance DamerauLevenshtein)]
- data WildCard a
- type WildList a = [WildCard a]
- deletions :: [a] -> [[a]]
- transpositions :: [a] -> [[a]]
- replaces :: WildList a -> [WildList a]
- insertions :: WildList a -> [WildList a]
Type
Operations
levenshtein :: (Ord sym, ListLike full sym) => Int -> full -> TSTSet sym -> [(full, Distance Levenshtein)]Source
damerauLevenshtein :: (Ord sym, ListLike full sym) => Int -> full -> TSTSet sym -> [(full, Distance DamerauLevenshtein)]Source
Candidates generation
The following functions generate candidates distant 1 from the given word, and they are reexported for completeness.
deletions
and transpositions
do not need to wildcard the word, while
replaces
and insertions
do since we are adding characters.
data WildCard a
Operations
transpositions :: [a] -> [[a]]Source
insertions :: WildList a -> [WildList a]Source