datasets: Classical data sets for statistics and machine learning

Classical machine learning and statistics datasets from the UCI Machine Learning Repository and other sources.

The datasets package defines two different kinds of datasets:

  • small data sets that are embedded in the package as pure values (directly, or indirectly via `file-embed`) and therefore need no network access or IO to load. These include Iris, Anscombe and OldFaithful.

  • other data sets which need to be fetched over the network with Numeric.Datasets.getDataset and are cached in a local temporary directory.
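
The fetch-and-cache behaviour described for the second kind can be pictured with a small sketch. This is not the package's implementation: `cachedFetch`, the cache directory name and the stand-in download action are all hypothetical, and only the `directory` and `filepath` libraries are assumed. The idea is to run the fetch action on a cache miss and to reuse the file in the temporary directory afterwards.

```haskell
import System.Directory
  ( createDirectoryIfMissing, doesFileExist, getTemporaryDirectory )
import System.FilePath ((</>))

-- | Hypothetical helper (not part of the datasets API): return the cached
-- copy of a file if one exists, otherwise run the fetch action and cache
-- its result under the system temporary directory.
cachedFetch :: FilePath -> IO String -> IO String
cachedFetch name fetch = do
  tmp <- getTemporaryDirectory
  let dir  = tmp </> "datasets-cache-sketch"  -- assumed directory name
      path = dir </> name
  createDirectoryIfMissing True dir
  hit <- doesFileExist path
  if hit
    then readFile path            -- cache hit: no fetch is performed
    else do
      contents <- fetch           -- cache miss: fetch once...
      writeFile path contents     -- ...and store the result for later calls
      pure contents

main :: IO ()
main = do
  -- Stand-in for a network download; a real fetch would issue an HTTP request.
  let fakeDownload = putStrLn "fetching..." >> pure "5.1,3.5,1.4,0.2\n"
  first  <- cachedFetch "example.csv" fakeDownload
  second <- cachedFetch "example.csv" fakeDownload
  print (first == second)
```

Passing the download as an `IO String` argument keeps the caching logic independent of any particular HTTP client, which is a choice made for this sketch only.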

The datafiles/ directory of this package includes copies of a few famous datasets, such as Titanic, Nightingale and Michelson.

Example:

import Numeric.Datasets (getDataset)
import Numeric.Datasets.Iris (iris)
import Numeric.Datasets.Abalone (abalone)

main :: IO ()
main = do
  -- The Iris data set is embedded
  print (length iris)
  print (head iris)
  -- The Abalone dataset is fetched
  abas <- getDataset abalone
  print (length abas)
  print (head abas)


Versions 0.1.0, 0.2, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.3.0, 0.4.0
Dependencies aeson (>=, attoparsec (>=0.13), base (>=4.6 && <5), bytestring (>=, cassava (>=, deepseq (>=, directory (>=, exceptions (>=0.10.0), file-embed (>=0.0.11), filepath (>=, hashable (>=, JuicyPixels (>=3.3.3), microlens (>=0.4.10), mtl (>=2.2.2), mwc-random (>=, parallel (>=, req (>=2.0.0), safe-exceptions (>=, streaming (>=, streaming-attoparsec (>=1.0.0), streaming-bytestring (>=0.1.6), streaming-cassava (>=, streaming-commons (>=, stringsearch (>=, tar (>=, text (>=, time (>=, transformers (>=, vector (>=, zlib (>=0.6.2)
Tested with ghc ==7.10.2, ghc ==7.10.3, ghc ==8.0.1, ghc ==8.4.3
License MIT
Author Tom Nielsen
Maintainer Marco Zocca <ocramz fripost org>
Category Statistics, Machine Learning, Data Mining, Data
Source repo head: git clone
Uploaded by ocramz at 2019-02-12T21:11:06Z
Reverse Dependencies 1 direct, 1 indirect
Downloads 10258 total (29 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
