Copyright | Christof Schramm |
---|---|
License | GPL v 3 |
Maintainer | Christof Schramm <christof.schramm@campus.lmu.de> |
Stability | Experimental |
Portability | Should work in all common Haskell implementations |
Safe Haskell | None |
Language | Haskell2010 |
A package for reading and writing data in the IDX format. This data format is used for machine-learning data sets like the MNIST database of handwritten digits (http://yann.lecun.com/exdb/mnist/).
- data IDXData
- data IDXLabels
- data IDXContentType where
- idxType :: IDXData -> IDXContentType
- idxDimensions :: IDXData -> Vector Int
- isIDXReal :: IDXData -> Bool
- isIDXIntegral :: IDXData -> Bool
- idxDoubleContent :: IDXData -> Vector Double
- idxIntContent :: IDXData -> Vector Int
- labeledIntData :: IDXLabels -> IDXData -> Maybe [(Int, Vector Int)]
- labeledDoubleData :: IDXLabels -> IDXData -> Maybe [(Int, Vector Double)]
- encodeIDXLabels :: IDXLabels -> ByteString
- decodeIDXLabels :: ByteString -> Maybe IDXLabels
- encodeIDXLabelsFile :: IDXLabels -> FilePath -> IO ()
- decodeIDXLabelsFile :: FilePath -> IO (Maybe IDXLabels)
- encodeIDX :: IDXData -> ByteString
- decodeIDX :: ByteString -> Maybe IDXData
- encodeIDXFile :: IDXData -> FilePath -> IO ()
- decodeIDXFile :: FilePath -> IO (Maybe IDXData)
Data types
Datatype for storing IDXData. Internally, data is always stored as unboxed vectors of either Int or Double. However, when binary serialization is used, the data is serialized according to the IDXContentType.
data IDXContentType where
A type to describe the content, according to IDX spec
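For orientation, the IDX specification stores the element type as a single byte inside the four-byte magic number at the start of a file: two zero bytes, then the type code, then the number of dimensions. The sketch below decodes that magic number into a stand-in type. The names `ContentType`, `typeFromCode`, `parseMagic` and all constructors are hypothetical and are not taken from this package.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for the content type. The constructor names are
-- illustrative only and may differ from those exported by this package.
data ContentType
  = CTUnsignedByte
  | CTSignedByte
  | CTShort
  | CTInt
  | CTFloat
  | CTDouble
  deriving (Show, Eq)

-- Map an IDX type code to a content type.
typeFromCode :: Word8 -> Maybe ContentType
typeFromCode 0x08 = Just CTUnsignedByte
typeFromCode 0x09 = Just CTSignedByte
typeFromCode 0x0B = Just CTShort
typeFromCode 0x0C = Just CTInt
typeFromCode 0x0D = Just CTFloat
typeFromCode 0x0E = Just CTDouble
typeFromCode _    = Nothing

-- Decode the magic number: two zero bytes, the type code, and the
-- number of dimensions. Any bytes after the first four are ignored.
parseMagic :: [Word8] -> Maybe (ContentType, Int)
parseMagic (0 : 0 : code : nDims : _) = do
  ty <- typeFromCode code
  return (ty, fromIntegral nDims)
parseMagic _ = Nothing

main :: IO ()
main = print (parseMagic [0x00, 0x00, 0x08, 0x03])
```

The bytes in `main` correspond to the magic number of the MNIST image files (unsigned bytes, three dimensions).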
Accessing data
idxType :: IDXData -> IDXContentType
Return the type in which the data is stored.
idxDimensions :: IDXData -> Vector Int
Return the dimensions as an unboxed Vector of Int.
isIDXReal :: IDXData -> Bool
Return whether the data in this IDXData value is stored as double values.
isIDXIntegral :: IDXData -> Bool
Return whether the data in this IDXData value is stored as integral values.
Raw data
idxDoubleContent :: IDXData -> Vector Double
Return the contained doubles. If no doubles are contained, convert the content to doubles by using fromIntegral. Data is stored like in a C array, i.e. the last index changes first.
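The C-array layout means that a multi-dimensional index maps to a flat position by treating the last dimension as the fastest-varying one. The following is a minimal sketch of that mapping using plain lists; `flatOffset` is a hypothetical helper and not part of this package.

```haskell
-- Flat offset of a multi-index in a row-major (C-style) array.
-- Returns Nothing if the index has the wrong rank or is out of bounds.
flatOffset :: [Int] -> [Int] -> Maybe Int
flatOffset dims idx
  | length dims /= length idx        = Nothing
  | or (zipWith outOfRange dims idx) = Nothing
  | otherwise                        = Just (foldl step 0 (zip dims idx))
  where
    outOfRange d i  = i < 0 || i >= d
    -- Horner-style accumulation: scale by the current dimension's size,
    -- then add the index within that dimension.
    step acc (d, i) = acc * d + i

main :: IO ()
main = do
  -- In a 60000 x 28 x 28 data set, the second image starts at offset 784.
  print (flatOffset [60000, 28, 28] [1, 0, 0])
  -- In a 2 x 3 array, row 1, column 2 is the last element.
  print (flatOffset [2, 3] [1, 2])
```

With dimensions such as those of the MNIST training images, each image therefore occupies a contiguous run of 28 * 28 = 784 values in the flat vector.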
idxIntContent :: IDXData -> Vector Int
Return the contained ints. If no ints are contained, convert the content to ints by using round. Data is stored like in a C array, i.e. the last index changes first.
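One consequence of converting with round is worth keeping in mind: Haskell's `round` rounds halfway cases to the nearest even integer rather than always away from zero. The sketch below shows both conversion directions on plain lists; `toInts` and `toDoubles` are hypothetical helpers that only mirror the conversions named above.

```haskell
-- Convert doubles to ints the way the documentation describes, via round.
toInts :: [Double] -> [Int]
toInts = map round

-- Convert ints to doubles via fromIntegral.
toDoubles :: [Int] -> [Double]
toDoubles = map fromIntegral

main :: IO ()
main = do
  -- Halfway cases go to the nearest even integer: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2.
  print (toInts [0.5, 1.5, 2.5, 2.4, 2.6])
  print (toDoubles [0, 128, 255])
```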
Labeled data
labeledIntData :: IDXLabels -> IDXData -> Maybe [(Int, Vector Int)]
Partition a dataset and label each subpartition, returning int values.
labeledDoubleData :: IDXLabels -> IDXData -> Maybe [(Int, Vector Double)]
Partition a dataset and label each subpartition, returning double values.
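The idea of partitioning and labeling can be sketched on plain lists: the flat content is cut into one chunk per entry along the first dimension, and each chunk is paired with its label. The helper `labelChunks` below is hypothetical, and the conditions under which it returns Nothing (label count or content length not matching the dimensions) are an assumption; the documentation above does not state when the library functions return Nothing.

```haskell
-- Split a list into consecutive chunks of the given size.
chunks :: Int -> [a] -> [[a]]
chunks k xs
  | k <= 0 || null xs = []
  | otherwise         = let (h, t) = splitAt k xs in h : chunks k t

-- Pair each label with one chunk of the flat, row-major content.
-- Assumption: the result is Nothing when the number of labels does not
-- match the first dimension or the content length does not match the
-- product of all dimensions.
labelChunks :: [Int] -> [Int] -> [a] -> Maybe [(Int, [a])]
labelChunks labels dims content = case dims of
  (n : rest)
    | n == length labels && length content == n * product rest ->
        Just (zip labels (chunks (product rest) content))
  _ -> Nothing

main :: IO ()
main =
  -- Two 2 x 2 "images" labeled 7 and 3.
  print (labelChunks [7, 3] [2, 2, 2] [1 .. 8 :: Int])
```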
IO / Serialization
IDXLabels
ByteString serialization
encodeIDXLabels :: IDXLabels -> ByteString
decodeIDXLabels :: ByteString -> Maybe IDXLabels
File IO
encodeIDXLabelsFile :: IDXLabels -> FilePath -> IO ()
decodeIDXLabelsFile :: FilePath -> IO (Maybe IDXLabels)
IDXData (e.g. images)
ByteString serialization
encodeIDX :: IDXData -> ByteString
decodeIDX :: ByteString -> Maybe IDXData
File IO
encodeIDXFile :: IDXData -> FilePath -> IO ()
decodeIDXFile :: FilePath -> IO (Maybe IDXData)
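A typical use of these functions is to load a data file and its label file and pair them up. The following is a minimal sketch using only the signatures listed in this document. It assumes that the module is importable as `Data.IDX`, that the vectors come from `Data.Vector.Unboxed`, and that the two MNIST training files are present in the working directory under the names shown; none of these are confirmed by the text above.

```haskell
import Data.IDX                      -- assumed module name for this package
import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do
  -- File names are assumptions; adjust them to the local copies.
  mImages <- decodeIDXFile "train-images-idx3-ubyte"
  mLabels <- decodeIDXLabelsFile "train-labels-idx1-ubyte"
  case (mImages, mLabels) of
    (Just images, Just labels) -> do
      putStrLn ("Dimensions: " ++ show (V.toList (idxDimensions images)))
      case labeledIntData labels images of
        Just ((label, pixels) : _) ->
          putStrLn ("First label: " ++ show label
                    ++ ", values in first item: " ++ show (V.length pixels))
        _ -> putStrLn "Could not pair labels with data"
    _ -> putStrLn "Could not decode the input files"
```

Because every decoding step returns a Maybe, the sketch handles each failure case explicitly instead of assuming the files are well formed.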