Implementation-level interface for spelling suggestion.
- type SpellingWordFilter = String -> String -> Bool
- type SpellingWordCoder = String -> String
- nearbyWordFilter :: String -> String -> Bool
- anyWordFilter :: String -> String -> Bool
- editDistance :: String -> String -> Int
- soundex :: Bool -> String -> String
- phonix :: String -> String
- trivialPhoneticCode :: String -> String
- tryWord :: SpellingWordFilter -> SpellingWordCoder -> String -> [String] -> [String]
Documentation
type SpellingWordFilter = String -> String -> Bool
type SpellingWordCoder = String -> String
nearbyWordFilter :: String -> String -> Bool
Return True if the editDistance from the target word to the given word is small enough.
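As a minimal sketch of this kind of prefilter, the code below assumes the first argument is the target word and the second is the candidate. It substitutes a plain, unweighted Levenshtein distance for the module's weighted editDistance. The cutoff `maxDistance` is a hypothetical value, because the threshold this module actually uses is not stated here; `levenshtein` and `nearbyWordFilterSketch` are illustrative names.

```haskell
-- Type alias as documented above.
type SpellingWordFilter = String -> String -> Bool

-- Plain, unweighted Levenshtein distance, computed one DP row at a time.
-- This is a stand-in for the module's weighted editDistance.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl nextRow [0 .. length t] s)
  where
    -- Extend the table by one row for the next character c of s.
    nextRow [] _ = [] -- unreachable: every row is non-empty
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        -- left: the cell to the left in the new row;
        -- diag and up: the neighbouring cells in the previous row.
        step left (tc, diag, up) =
          minimum [up + 1, left + 1, diag + (if tc == c then 0 else 1)]

-- Hypothetical cutoff; the real nearbyWordFilter threshold is not documented here.
maxDistance :: Int
maxDistance = 2

-- Accept a candidate only if it is within maxDistance edits of the target.
nearbyWordFilterSketch :: SpellingWordFilter
nearbyWordFilterSketch target candidate =
  levenshtein target candidate <= maxDistance
```

For example, `nearbyWordFilterSketch "speling" "spelling"` is True (one insertion), while "cat" and "elephant" are too far apart to pass.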
editDistance :: String -> String -> Int
The weighted edit distance between a pair of strings, with weights for insertion, deletion, transposition, and substitution chosen to try to mimic spelling errors.
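The actual weights are not listed here, so the following sketch uses hypothetical costs (`insertCost`, `deleteCost`, `substCost`, `transposeCost`) with the optimal-string-alignment variant of Damerau-Levenshtein distance, in which a transposition swaps two adjacent letters. It illustrates how a weighted distance of this kind can be computed. It is not this module's exact function, and `weightedEditDistance` is an illustrative name.

```haskell
import Data.Array (array, listArray, (!))

-- Hypothetical weights, for illustration only: swapping two adjacent
-- letters is cheapest, and a substitution costs less than a deletion
-- plus an insertion.
insertCost, deleteCost, substCost, transposeCost :: Int
insertCost = 2
deleteCost = 2
substCost = 3
transposeCost = 1

-- Weighted optimal-string-alignment distance from s to t,
-- computed with a lazily filled dynamic-programming table.
weightedEditDistance :: String -> String -> Int
weightedEditDistance s t = table ! (m, n)
  where
    m = length s
    n = length t
    sa = listArray (1, m) s
    ta = listArray (1, n) t
    table = array ((0, 0), (m, n))
      [((i, j), cell i j) | i <- [0 .. m], j <- [0 .. n]]
    -- cell i j: cost of turning the first i letters of s
    -- into the first j letters of t.
    cell 0 j = j * insertCost
    cell i 0 = i * deleteCost
    cell i j = minimum (edits ++ swaps)
      where
        edits =
          [ table ! (i - 1, j) + deleteCost
          , table ! (i, j - 1) + insertCost
          , table ! (i - 1, j - 1) + (if sa ! i == ta ! j then 0 else substCost)
          ]
        -- Adjacent transposition, e.g. "teh" -> "the".
        swaps =
          [ table ! (i - 2, j - 2) + transposeCost
          | i > 1, j > 1, sa ! i == ta ! (j - 1), sa ! (i - 1) == ta ! j ]
```

With these weights, `weightedEditDistance "teh" "the"` is 1 (one transposition) and `weightedEditDistance "cat" "cut"` is 3 (one substitution). This shows how the weighting makes a typical typing slip cheaper than an arbitrary change.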
soundex :: Bool -> String -> String
Compute a full soundex code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the soundex code will be 0.
The two commonly encountered forms of soundex are Simplified and another known variously as American, Miracode, NARA, or Knuth soundex. This code will calculate either: passing True selects NARA, and False selects Simplified.
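The following sketch shows one way to compute a full code under the rules above. `soundexSketch` is a hypothetical name, and it assumes non-alphabetics are simply deleted before coding, so they neither emit a digit nor separate letters. The two variants differ only in their treatment of h and w. Under NARA they are transparent, so letters with the same code on either side of them collapse. Under Simplified they behave like vowels and separate codes.

```haskell
import Data.Char (intToDigit, isAlpha, toLower, toUpper)

-- Standard Soundex digit classes; 0 marks letters that get no digit.
soundexDigit :: Char -> Int
soundexDigit c
  | c `elem` "bfpv" = 1
  | c `elem` "cgjkqsxz" = 2
  | c `elem` "dt" = 3
  | c == 'l' = 4
  | c `elem` "mn" = 5
  | c == 'r' = 6
  | otherwise = 0 -- vowels, y, h, w

-- Full (untruncated, unpadded) Soundex. True selects the NARA rule
-- for h and w; False selects Simplified.
soundexSketch :: Bool -> String -> String
soundexSketch nara word =
  case filter isAlpha (map toLower word) of
    [] -> "0"
    (first : rest) -> toUpper first : go (soundexDigit first) rest
  where
    go _ [] = []
    go prev (c : cs)
      | nara && c `elem` "hw" = go prev cs -- NARA: h/w do not separate equal codes
      | d == 0 = go 0 cs                   -- vowels (and h/w in Simplified) separate codes
      | d == prev = go prev cs             -- adjacent letters with the same code collapse
      | otherwise = intToDigit d : go d cs
      where
        d = soundexDigit c
```

For example, `soundexSketch True "Ashcraft"` gives "A2613" and `soundexSketch False "Ashcraft"` gives "A22613". Their four-character prefixes are the conventional truncated codes A261 and A226.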
phonix :: String -> String
Compute a full phonix code; i.e., do not drop any encodable characters from the result. The leading character of the code will be folded to uppercase. Non-alphabetics are not encoded. If no alphabetics are present, the phonix code will be 0.
There appear to be many, many variants of phonix implemented on the web, and I'm too cheap and lazy to go find the original paper by Gadd (1990) that actually describes the original algorithm. Thus, I am taking some big guesses on intent here as I implement. Corrections, especially those involving getting me a copy of the article, are welcome.
Dropping the trailing sound seems to be an integral part of Gadd's technique, but I'm not sure how it is supposed to be done. I am currently compressing runs of vowels, and then dropping the trailing digit or vowel from the code.
Another area of confusion is whether to compress strings of the same code, as in Soundex, or merely strings of the same consonant. I have chosen the former.
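Phonix's letter-substitution rules are not reproduced here, so the sketch below covers only the final steps just described. It runs on a hypothetical, already-coded sequence of `Sound` values: first it compresses runs of equal sounds, comparing codes (the option chosen above), and then it drops the trailing sound. The type and function names are illustrative only.

```haskell
import qualified Data.List.NonEmpty as NE

-- Hypothetical intermediate form: after the letter-group substitutions
-- (not shown), each remaining letter is either a vowel marker or a
-- consonant code digit.
data Sound = Vowel | Code Int
  deriving (Eq, Show)

-- Collapse each run of equal sounds to one. Runs of the same code
-- (not merely the same consonant) and runs of vowels both compress.
compressRuns :: [Sound] -> [Sound]
compressRuns = map NE.head . NE.group

-- Drop the trailing sound, whether it is a digit or a vowel.
dropTrailing :: [Sound] -> [Sound]
dropTrailing [] = []
dropTrailing [_] = []
dropTrailing (x : xs) = x : dropTrailing xs

phonixTailSketch :: [Sound] -> [Sound]
phonixTailSketch = dropTrailing . compressRuns
```

Because runs are compared by code, two different consonants that map to the same code merge into one entry here. A "same consonant" rule would keep them separate.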
trivialPhoneticCode :: String -> String
Map any given word to a constant phonetic code. In other words, suppress phonetic coding.
tryWord :: SpellingWordFilter -> SpellingWordCoder -> String -> [String] -> [String]
Core algorithm for spelling suggestion. Takes a prefiltering function, a phonetic coding function, a target word, and a list of candidate words, and returns an ordered list of suggested candidates.
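One plausible shape for this core step is sketched below. The ranking is an assumption, not a description of the module's code. The sketch keeps candidates that pass the prefilter, puts those whose phonetic code matches the target's first, and orders them by a distance function. To keep the sketch self-contained, the distance is passed as an extra, hypothetical first parameter; in the module it would presumably be editDistance. `tryWordSketch` is an illustrative name.

```haskell
import Data.List (sortOn)

type SpellingWordFilter = String -> String -> Bool
type SpellingWordCoder = String -> String

tryWordSketch
  :: (String -> String -> Int) -- hypothetical: distance used for ranking
  -> SpellingWordFilter        -- prefilter, applied as keep target candidate
  -> SpellingWordCoder         -- phonetic coder
  -> String                    -- target word
  -> [String]                  -- candidate words
  -> [String]
tryWordSketch dist keep coder target candidates =
  sortOn rank (filter (keep target) candidates)
  where
    targetCode = coder target
    -- Phonetic matches sort first (False < True), then by distance;
    -- sortOn is stable, so ties keep their input order.
    rank c = (coder c /= targetCode, dist target c)
```

Applying `take k` to the result yields at most k suggestions. With a constant coder such as `const ""`, which suppresses phonetic coding as trivialPhoneticCode does, every candidate ties on the first key and the order falls back to distance alone.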