License | None |
---|---|
Maintainer | Volker Strobel (volker.strobel87@gmail.com) |
Stability | experimental |
Portability | None |
Safe Haskell | None |
Language | Haskell2010 |
This module is the main interface for Tweet classification.
- type FeatureMap = Map String Float
- parseCsv :: Text -> Either String (Vector Tweet)
- getFiles :: FilePath -> IO [FilePath]
- extractFeatures :: Tweet -> FeatureMap
- frequency :: Vector String -> FeatureMap
- countItem :: Map String Float -> String -> FeatureMap
- insertInMap :: Map Tweet FeatureMap -> Tweet -> Map Tweet FeatureMap
- getNeighbors :: (Vector Tweet, Vector Tweet) -> Vector (Tweet, PSQ Tweet Float)
- featureIntersection :: Map Tweet FeatureMap -> Tweet -> (Tweet, PSQ Tweet Float)
- mergeTweetFeatures :: (FeatureMap -> FeatureMap -> Float) -> Tweet -> Tweet -> FeatureMap -> Binding Tweet Float
- cosineDistance :: FeatureMap -> FeatureMap -> Float
- idftf :: FeatureMap -> FeatureMap -> FeatureMap
- iFrequency :: FeatureMap -> String -> Float -> Float
- compareLabels :: Int -> Vector (Tweet, PSQ Tweet Float) -> Vector Float
- compareLabelsForScheme :: [Vector (Tweet, PSQ Tweet Float)] -> Int -> [Float]
- getLabel :: Int -> PSQ Tweet Float -> String
- getAccuracy :: Vector Float -> Float
- main :: IO ()
Documentation
type FeatureMap = Map String Float Source
getFiles :: FilePath -> IO [FilePath] Source
Get directory contents of FilePath
. A better variant is at:
extractFeatures :: Tweet -> FeatureMap Source
Extract features (for the bag of words) for one Tweet.
Thereby, the Tweet will be (in order of application):
* tokenized
* converted to a Vector
* String
s will be converted to lowercase
* String
s that are not isAlpha
are removed
* String
s that are element of stopWords
are removed
* Empty String
s will be removed
frequency :: Vector String -> FeatureMap Source
countItem :: Map String Float -> String -> FeatureMap Source
Insert an item into a Map
. Default value is 1 if the item is
not existing. If the item is already existing, its frequency will
be increased by 1.
insertInMap :: Map Tweet FeatureMap -> Tweet -> Map Tweet FeatureMap Source
Take a Map
, consisting of
key: Tweet
value: FeatureMap
and one Tweet and create a new Map
with the added features
from the Tweet
featureIntersection :: Map Tweet FeatureMap -> Tweet -> (Tweet, PSQ Tweet Float) Source
mergeTweetFeatures :: (FeatureMap -> FeatureMap -> Float) -> Tweet -> Tweet -> FeatureMap -> Binding Tweet Float Source
cosineDistance :: FeatureMap -> FeatureMap -> Float Source
idftf :: FeatureMap -> FeatureMap -> FeatureMap Source
Takes a dictionary and a mini dictionary (frequency of words in one Tweet) and calculates the idftf values for all words in the mini dictionary.
iFrequency :: FeatureMap -> String -> Float -> Float Source
compareLabels :: Int -> Vector (Tweet, PSQ Tweet Float) -> Vector Float Source
Calculate the amount of tweets where the predicted label matches the actual label.
getAccuracy :: Vector Float -> Float Source
Get sum total of a vector of floats (i.e., the number of correctly classified tweets) and return the accuracy