tdigest: On-line accumulation of rank-based statistics

[ bsd3, library, numeric ] [ Propose Tags ] [ Report a vulnerability ]

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl for more details https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0, 0.1, 0.2, 0.2.1, 0.2.1.1, 0.3, 0.3.1
Dependencies base (>=4.7 && <4.10), base-compat (>=0.9.1 && <0.10), binary (>=0.7.1.0 && <0.9), deepseq (>=1.3.0.2 && <1.5), reducers (>=3.12.1 && <3.13), semigroups (>=0.18.2 && <0.19), vector (>=0.11 && <0.13), vector-algorithms (>=0.7.0.1 && <0.8) [details]
Tested with ghc ==7.8.4, ghc ==7.10.3, ghc ==8.0.1, ghc ==8.0.2
License BSD-3-Clause
Author Oleg Grenrus <oleg.grenrus@iki.fi>
Maintainer Oleg Grenrus <oleg.grenrus@iki.fi>
Category Numeric
Home page https://github.com/futurice/haskell-tdigest#readme
Bug tracker https://github.com/futurice/haskell-tdigest/issues
Source repo head: git clone https://github.com/futurice/haskell-tdigest
Uploaded by phadej at 2017-02-16T15:28:40Z
Distributions Arch:0.3.1, LTSHaskell:0.3.1, NixOS:0.3.1, Stackage:0.3.1
Reverse Dependencies 11 direct, 19 indirect [details]
Downloads 12274 total (151 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2017-02-16 [all 1 reports]

Readme for tdigest-0

[back to package description]

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

Example