gambler: Composable, streaming, and efficient left folds

[ bsd3, library, streaming ] [ Propose Tags ] [ Report a vulnerability ]
Versions [RSS] 0.0.0.0, 0.0.0.1, 0.0.1.0, 0.1.0.0, 0.2.0.0, 0.3.0.0, 0.4.0.0, 0.4.1.0 (info)
Change log changelog.md
Dependencies base (>=4.16 && <4.18) [details]
License BSD-3-Clause
Copyright 2013-2016 Gabriella Gonzalez, 2023 Mission Valley Software LLC
Author Gabriella Gonzalez, Chris Martin
Maintainer Chris Martin, Julie Moronuki
Revised Revision 1 made by chris_martin at 2023-05-01T21:32:38Z
Category Streaming
Bug tracker https://github.com/typeclasses/gambler/issues
Uploaded by chris_martin at 2023-03-09T07:24:13Z
Distributions
Reverse Dependencies 2 direct, 2 indirect [details]
Downloads 444 total (8 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2023-03-09 [all 1 reports]

Readme for gambler-0.4.1.0

[back to package description]

This package defines the Fold, NonemptyFold, and EffectfulFold types and provides an assortment of ways to construct, combine, and use them.

Every gambler knows that the secret to surviving
Is knowing what to throw away and knowing what to keep

You got to know when to hold 'em, know when to fold 'em
Know when to walk away, and know when to run

The Gambler by Don Schlitz, popularized by Kenny Rogers

Fold

The foldl' function in the base package is used when we want a strictly evaluated result from traversing a list.

foldl' :: Foldable t => (b -> a -> b) -> b -> t a -> b

For example, to sum a list of numbers:

λ> import qualified Data.List as List

λ> List.foldl' (+) 0 [1..100]
5050

What if we put the first two parameters to List.foldl' into a datatype?

data Fold a b = Fold
    { initial :: b
    , step    :: b -> a -> b }

Or, better yet, we can use a trick to turn the datatype into a Functor (which will become important when we discuss the Applicative a bit later):

data Fold a b = forall x. Fold
    { initial :: x
    , step    :: x -> a -> x
    , extract :: x -> b }

We can then express the concept of numeric summation as:

sum :: Num a => Fold a a
sum = Fold{ initial = 0, step = (+), extract = id }

This Fold can be used to sum lists and other Foldable collections, but it can also be used to sum effectful streams. So even without any further mechanism, just having this datatype gives us some useful expressive power. There is no need for each streaming library to duplicate all the work of defining its own copies of sum, product, all, any, and, or, minimum, maximum, etc.; a library that provides some kind of Stream type needs only define a function to apply a fold to a stream ...

foldStream :: Fold a b -> Stream m a -> m b

... and then users can make use of any library of folds that they may find or concoct. gambler itself contains much of the functionality of the standard Data.List module, but there are more things in heaven and earth than are dreamt of in this package.

NonemptyFold

There are some kinds of folding that only work if the input it nonempty. Suppose, for example, we want the greatest of all the items. If there are no items, there is no greatest item. We express this sort of thing with a slight modification to Fold:

data NonemptyFold a b = forall x. NonemptyFold
    { initial :: a -> x
    , step    :: x -> a -> x
    , extract :: x -> b }

The only thing that's different is the type of the initial field has changed from x to a -> x; it is now parameterized on the first item.

The notion of selecting greatest item can now be expressed as:

maximum = NonemptyFold{ initial = id, step = max, extract = id }

A NonemptyFold can be converted to a Fold using Fold.Pure.nonemptyFold. The conversion changes the fold's return type from b to Maybe b to accommodate the possibility of empty input.

EffectfulFold

There is a related function in base that does the same thing as foldl' but in a monadic context:

foldM :: Foldable t => Monad m => (b -> a -> m b) -> b -> t a -> m b

This allows us to perform effects as we fold.

λ> import qualified Control.Monad as Monad

λ> Monad.foldM (\x a -> putStrLn ("* " <> show a) $> (x + a)) 0 [1..5]
* 1
* 2
* 3
* 4
* 5
15

The type we define corresponding to the arguments of Monad.foldM is:

data EffectfulFold m a b = forall x. EffectfulFold
    { initial :: m x
    , step    :: x -> a -> m x
    , extract :: x -> m b }

A regular Fold can be converted to an EffectfulFold of any monad using Fold.Effectful.fold.

Applicative instances

The Fold and EffectfulFold applicatives are great for computing multiple folds over a collection in one pass over the data. For example, suppose that you want to compute both the sum and the length of a list. The following approach works, but it uses space inefficiently:

import qualified Data.List as List

sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength xs = (List.sum xs, List.genericLength xs)

The problem is this goes over the list in two passes. If you demand the result of sum, the Haskell runtime will materialize the entire list. However, the runtime cannot garbage collect the list because the list is still required for the call to length. The space requirement of sumAndLength is therefore linear with respect to the size of the list. We can do much better.

With gambler, we can instead write:

import qualified Fold.Pure as Fold

sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength = Fold.runFold $ (,) <$> Fold.sum <*> Fold.length

This achieves the same result using constant space.

ShortcutFold

Operations like sum and length inherently require the entirety of the input. But suppose, for example, we want to know whether a list contains a particular item. Once we find the item, we can stop looking. For situations like this, we have ShortcutFold.

data ShortcutFold a b = forall x y. ShortcutFold
    { initial :: Vitality x y
    , step    :: y -> a -> Vitality x y
    , extract :: Vitality x y -> b }
data Vitality a b = Dead a | Alive Will b
data Will = Ambivalent | Tenacious

If initial or step returns a Dead result, then no further input can be fed into the fold. The Will type will be discussed below.

Example of testing for the presence of a particular element:

λ> import qualified Fold.Shortcut as Fold

λ> Fold.run (Fold.element 'a') "back"
True

λ> Fold.run (Fold.element 'a') "front"
False

By running a shortcut fold over a partially-defined input list, we can verify that evaluation does indeed halt once the element is found:

λ> Fold.run (Fold.element 'a') ("ba" ++ undefined)
True

ShortcutNonemptyFold

The non-empty variant of ShortcutFold is ShortcutNonemptyFold.

data ShortcutNonemptyFold a b = forall x y. ShortcutNonemptyFold
    { initial :: a -> Vitality x y
    , step    :: y -> a -> Vitality x y
    , extract :: Vitality x y -> b }

Like NonemptyFold, the only thing we've changed here is to add an a parameter to the initial field.

Shortcutting Applicative instances

Just like Fold, ShortcutFold can be combined in an applicative manner and the input is traversed in a single pass.

λ> import qualified Fold.Shortcut as Fold

λ> Fold.run ((,) <$> Fold.index 4 <*> Fold.index 2) "abcdef"
(Just 'e',Just 'c')

With an applicative composition, traversal continues as long as any of the constituent parts is alive and still expecting more input. In the example above, index 2 becomes dead after the third character ('c'), but the composition stays alive until the fifth character ('e') results in the completion of index 4.

A fold indicates that it is finished by returning Dead as its Vitality. But there is one more subtlety: When a fold is Alive, it can further specify whether it wants to keep receiving more items or not. This disposition is called Will. A Tenacious will means that the fold wants more input; an Ambivalent will mean the fold can accept more input, but doesn't care. The difference is that tenacious folds keep larger applicative compositions alive, whereas ambivalent folds do not. Shortcut folds are tagged in the documentation as "(ambivalent)" or "(tenacious)".

One example of an ambivalent shortcut fold is length. We see in the example below that the length fold keeps counting only as long as the other fold it is composed with still needs input.

λ> Fold.run ((,) <$> Fold.length <*> Fold.element 'e') "abcdef"
(5,True)

λ> Fold.run ((,) <$> Fold.length <*> Fold.element 'z') "abcdef"
(6,False)

Consider another example:

λ> f1 = ((,) <$> Fold.length <*> Fold.element 'c')

λ> f2 = ((,) <$> Fold.length <*> Fold.element 'e')

λ> Fold.run ((,) <$> f1 <*> f2) "abcdef"
((5,True),(5,True))

The example above composes four folds (two length folds and two element folds). Notice that both of the length folds returned 5 because they keep running until the entire fold stops, regardless of the way they're composed. We can avoid this behavior with the demotivate utility, which stops a fold from being so eager. A demotivated fold will stop receiving input when all of its components are either dead or ambivalent.

λ> Fold.run ((,) <$> Fold.demotivate f1 <*> Fold.demotivate f2) "abcdef"
((3,True),(5,True))

Quick start

To get quickly playing around with gambler, launch GHCi using cabal:

cabal repl --build-depends gambler

Import the module corresponding to one of the varieties of fold:

λ> import qualified Fold.Pure as Fold

And enjoy.

λ> Fold.runFold ((,) <$> Fold.sum <*> Fold.length) [1..1000000]
(500000500000,1000000)

Authors

This package began as mostly a copy of foldl, with some features removed to minimize its dependency set. What remained in gambler-0.0 was essentially the same as what could be found in foldl-1.4.13, subject only to reorganization, renaming, and minor modifications. The Fold, EffectfulFold, and NonemptyFold types are the work of Gabriella Gonzalez.

ShortcutFold and ShortcutNonemptyFold were added by Mission Valley Software in gambler-0.1.

Future plans

Once the Foldable1 class has been added to base, the type of Fold.Nonempty.run may be generalized to accommodate it.