Copyright | (c) 2020 Composewell Technologies |
---|---|
License | Apache-2.0 |
Maintainer | streamly@composewell.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Simple incremental statistical measures over a stream of data. All operations use numerically stable floating point arithmetic.
Measurements can be performed over the entire input stream or on a sliding window of fixed or variable size. Where possible, measures are computed online without buffering the input stream.
Currently there is no overflow detection.
For more advanced statistical measures see the streamly-statistics
package.
Synopsis
- lmap :: (c -> a) -> Fold m (a, Maybe a) b -> Fold m (c, Maybe c) b
- cumulative :: Fold m (a, Maybe a) b -> Fold m a b
- rollingMap :: Monad m => (Maybe a -> a -> Maybe b) -> Fold m (a, Maybe a) (Maybe b)
- rollingMapM :: Monad m => (Maybe a -> a -> m (Maybe b)) -> Fold m (a, Maybe a) (Maybe b)
- length :: (Monad m, Num b) => Fold m (a, Maybe a) b
- sum :: forall m a. (Monad m, Num a) => Fold m (a, Maybe a) a
- sumInt :: forall m a. (Monad m, Integral a) => Fold m (a, Maybe a) a
- powerSum :: (Monad m, Num a) => Int -> Fold m (a, Maybe a) a
- powerSumFrac :: (Monad m, Floating a) => a -> Fold m (a, Maybe a) a
- minimum :: (MonadIO m, Storable a, Ord a) => Int -> Fold m a (Maybe a)
- maximum :: (MonadIO m, Storable a, Ord a) => Int -> Fold m a (Maybe a)
- range :: (MonadIO m, Storable a, Ord a) => Int -> Fold m a (Maybe (a, a))
- mean :: forall m a. (Monad m, Fractional a) => Fold m (a, Maybe a) a
Incremental Folds
Folds of type Fold m (a, Maybe a) b
are incremental sliding window
folds. An input of type (a, Nothing)
indicates that the input element
a
is being inserted in the window without ejecting an old value
increasing the window size by 1. An input of type (a, Just a)
indicates that the first element is being inserted in the window and the
second element is being removed from the window, the window size remains
the same. The window size can only increase and never decrease.
You can compute the statistics over the entire stream using sliding
window folds by keeping the second element of the input tuple as
Nothing
.
lmap :: (c -> a) -> Fold m (a, Maybe a) b -> Fold m (c, Maybe c) b Source #
Map a function on the incoming as well as outgoing element of a rolling window fold.
>>>
lmap f = Fold.lmap (bimap f (f <$>))
cumulative :: Fold m (a, Maybe a) b -> Fold m a b Source #
Convert an incremental fold to a cumulative fold using the entire input stream as a single window.
>>>
cumulative f = Fold.lmap (\x -> (x, Nothing)) f
rollingMap :: Monad m => (Maybe a -> a -> Maybe b) -> Fold m (a, Maybe a) (Maybe b) Source #
Apply a pure function on the latest and the oldest element of the window.
>>>
rollingMap f = FoldW.rollingMapM (\x y -> return $ f x y)
rollingMapM :: Monad m => (Maybe a -> a -> m (Maybe b)) -> Fold m (a, Maybe a) (Maybe b) Source #
Apply an effectful function on the latest and the oldest element of the window.
Sums
length :: (Monad m, Num b) => Fold m (a, Maybe a) b Source #
The number of elements in the rolling window.
This is the \(0\)th power sum.
>>>
length = powerSum 0
sum :: forall m a. (Monad m, Num a) => Fold m (a, Maybe a) a Source #
Sum of all the elements in a rolling window:
\(S = \sum_{i=1}^n x_{i}\)
This is the first power sum.
>>>
sum = powerSum 1
Uses Kahan-Babuska-Neumaier style summation for numerical stability of floating precision arithmetic.
Space: \(\mathcal{O}(1)\)
Time: \(\mathcal{O}(n)\)
powerSum :: (Monad m, Num a) => Int -> Fold m (a, Maybe a) a Source #
Sum of the \(k\)th power of all the elements in a rolling window:
\(S_k = \sum_{i=1}^n x_{i}^k\)
>>>
powerSum k = lmap (^ k) sum
Space: \(\mathcal{O}(1)\)
Time: \(\mathcal{O}(n)\)
Location
minimum :: (MonadIO m, Storable a, Ord a) => Int -> Fold m a (Maybe a) Source #
Find the minimum element in a rolling window.
This implementation traverses the entire window buffer to compute the
minimum whenever we demand it. It performs better than the dequeue based
implementation in streamly-statistics
package when the window size is
small (< 30).
If you want to compute the minimum of the entire stream
minimum
is much faster.
Time: \(\mathcal{O}(n*w)\) where \(w\) is the window size.
range :: (MonadIO m, Storable a, Ord a) => Int -> Fold m a (Maybe (a, a)) Source #
Determine the maximum and minimum in a rolling window.
If you want to compute the range of the entire stream Fold.teeWith (,)
Fold.maximum Fold.minimum
would be much faster.
Space: \(\mathcal{O}(n)\) where n
is the window size.
Time: \(\mathcal{O}(n*w)\) where \(w\) is the window size.
mean :: forall m a. (Monad m, Fractional a) => Fold m (a, Maybe a) a Source #
Arithmetic mean of elements in a sliding window:
\(\mu = \frac{\sum_{i=1}^n x_{i}}{n}\)
This is also known as the Simple Moving Average (SMA) when used in the sliding window and Cumulative Moving Avergae (CMA) when used on the entire stream.
>>>
mean = Fold.teeWith (/) sum length
Space: \(\mathcal{O}(1)\)
Time: \(\mathcal{O}(n)\)