Motivation
==========
Haskell is a great language for data processing.
You load some data in the IO monad, parse it,
funnel the data through various functions and
write the result back to disk or display it
via a web server.
The programmer has `let` bindings and `where` clauses at hand,
which can be used to sub-structure a single function, e.g.

    workflow x y =
        let a = f x
            b = g a y
        in  h a b

To the surrounding program, however,
the values of the intermediate steps `a` and `b`
are invisible, and a reader of the result does not know that you used
the auxiliary functions `f`, `g` and `h`,
although these might be important
when an outsider tries to check the correctness of
the result of the `workflow` function.
This is where the Provenience monad comes in.
How it works
============
The Provenience monad is an ordinary state monad transformer.
The state is a data flow
[graph](https://hackage.haskell.org/package/fgl "fgl"),
which we call the *variable store*. Nodes are
[Pandoc](https://hackage.haskell.org/package/pandoc "pandoc") renderings
of so-called *variables*. A variable is simply a pair of an ordinary
Haskell value together with its node in the graph.
A computation in the Provenience monad performs any number
of the following five actions.
* Register a new variable in the variable store
* Provide a description of a registered variable
(in the form of a Pandoc [Block](http://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html#t:Block "Block"))
* Provide a short name for a registered variable (used in hyperlinks)
* Render the value of a registered variable into
its node in the variable store (as a Pandoc `Block`).
There is a class for default rendering methods akin to the `Show` class.
* Apply a variable holding a function to a variable holding a value,
similar to the `<*>` operator of `Applicative` functors.
In the Provenience monad, we write `<%>` instead.
The fifth action is the only action that adds edges to the
data dependency graph. Suppose we have registered a variable `f`
holding a value of type `a -> b` and a variable `x` holding a
value of type `a`. The description of `f` should explain to the reader
what the function that is the value of `f` does.
The monadic action

    y <- pure f <%> x

does not register `y` as a new variable; instead `y` points to the same
node in the variable store as `f`. However, the value of `y` is the
application of the value of `f` to the value of `x` and there is now
an edge from `x` to `y` in the data flow graph labelled with the
description of `f`. If `y` is not itself a function
but the desired result, you should overwrite the node's description
(which is still the description of `f`) with a new description of
the value of `y`.
Why this design choice? Because otherwise partial
application is impossible. If `<%>` always registered new variables,
then

    f <%> a <%> b

would register both `f(a)` and `f(a)(b)` as variables, which might not be
what the user intended. But overwriting `f` also means that we cannot
re-use the same function variable in several applications. When that is
desired, use a Provenience action producing a variable instead of the
variable itself. Consider the following.

    let f = var succ
    x <- input 4
    y <- f <%> x
    z <- f <%> y

Since the Haskell identifier `f` is bound to a Provenience action
that registers a new variable holding the `succ` function, all
three of `x`, `y` and `z` are distinct variables.
The take-home message is that

    f <- var succ
    x <- input 4
    y <- pure f <%> x

is a dangerous style: after the application, the node registered for `f`
holds `y`, so the value of `f` is no longer what its node in the graph
is being used for.
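
The node-sharing behaviour described in this section can be illustrated with a toy model. The sketch below is not the package's implementation: the store is a plain pair of lists rather than an fgl graph, descriptions are strings rather than Pandoc blocks, and the names `Store`, `Var`, `Prov`, `register` and `applyVar` are hypothetical stand-ins chosen for this illustration.

```haskell
import Data.Maybe (fromMaybe)

type Node = Int

-- Toy variable store: one description per node plus labelled edges.
data Store = Store
  { descriptions :: [(Node, String)]
  , edges        :: [(Node, Node, String)]  -- (from, to, edge label)
  } deriving (Show, Eq)

-- A variable pairs an ordinary Haskell value with its node.
data Var a = Var { value :: a, node :: Node }

-- Hand-rolled state monad over the toy store (base only).
newtype Prov a = Prov { runProv :: Store -> (a, Store) }

instance Functor Prov where
  fmap f (Prov g) = Prov $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Prov where
  pure a = Prov $ \s -> (a, s)
  Prov pf <*> Prov pa = Prov $ \s ->
    let (f, s1) = pf s
        (a, s2) = pa s1
    in  (f a, s2)

instance Monad Prov where
  Prov g >>= k = Prov $ \s -> let (a, s') = g s in runProv (k a) s'

-- Register a value under a fresh node with a description.
register :: String -> a -> Prov (Var a)
register desc x = Prov $ \(Store ds es) ->
  let n = length ds
  in  (Var x n, Store (ds ++ [(n, desc)]) es)

-- Toy counterpart of <%>: the result reuses the function's node, and an
-- edge from the argument's node is added, labelled with the description
-- that the function's node carries at that moment.
applyVar :: Prov (Var (a -> b)) -> Var a -> Prov (Var b)
applyVar pf vx = do
  vf <- pf
  Prov $ \(Store ds es) ->
    let label = fromMaybe "" (lookup (node vf) ds)
    in  ( Var (value vf (value vx)) (node vf)
        , Store ds (es ++ [(node vx, node vf, label)]) )

-- Mirrors the `let f = var succ` example: the function is registered
-- afresh for each application, so x, y and z live in distinct nodes.
example :: (Int, Store)
example = runProv prog (Store [] [])
  where
    prog = do
      x <- register "input" 4
      y <- register "successor function" succ `applyVar` x
      z <- register "successor function" succ `applyVar` y
      pure (value z)
```

In this model, registering the function once and applying it twice would put both results into the same node, which is the situation the text warns against.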
Alternative representation
--------------------------
The variable store can also hold an alternative representation
of each variable in addition to the Pandoc rendering,
since you might want to provide a machine-readable data flow graph
in addition to a Pandoc document.
Similarly to the `IHaskellDisplay` class,
each type used in a variable must have a type class instance
that allows automatic conversion into the alternative representation.
If you don't need this feature, simply choose `()` as the alternative
representation type.
The graph of alternative representations can be extracted from
the variable store. We provide code to assemble the store into a
spreadsheet (of static cells). Foldable structures
of basic values become columns while doubly-nested structures
become tables.
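
The mapping from nested structures to spreadsheet cells might look roughly like the following sketch. The `Cells` type and the functions `single`, `column` and `table` are hypothetical names made up for this illustration; the package's actual spreadsheet types and class are not shown here and may differ. Treating each inner structure as one table row is likewise an assumption.

```haskell
import Data.Foldable (toList)

-- Hypothetical cell layout, for illustration only.
data Cells
  = Single String      -- a basic value occupies one cell
  | Column [String]    -- a Foldable of basic values
  | Table  [[String]]  -- a Foldable of Foldables, one row per inner structure
  deriving (Show, Eq)

-- A basic value becomes a single static cell.
single :: Show a => a -> Cells
single = Single . show

-- Any Foldable structure of basic values becomes a column.
column :: (Foldable t, Show a) => t a -> Cells
column = Column . map show . toList

-- A doubly-nested structure becomes a table.
table :: (Foldable t, Foldable s, Show a) => t (s a) -> Cells
table = Table . map (map show . toList) . toList
```

Because only `Foldable` is required, lists, `Maybe` and similar containers would all be covered by the same two functions.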
Example
=======
Continuing the example above, in the Provenience monad you would
write something like the following. Of course it is up to the programmer
to decide how fine-grained the decomposition into Provenience actions
should be.

    workflow x' y' = do
        ---------- register and render the input variables ------------------
        x <- input x' -- register and render x'
        y <- input y'
        x `named` "x" -- links to x show "x" as text
        y `named` "y"
        x renderDefault "first item of input data" -- describe x
        y renderDefault "second item of input data"
        linkx <- linkto x -- create a hyperlink, used below
        let what_f_does = Para [Str "auxiliary function f applied to ",linkx]
        ---------------------------------------------------------------------
        ------ the actual computation is three lines as in the pure code ----
        a <- func f what_f_does <%> x
        b <- func g (renderDefault "auxiliary function g") <%> a <%> y
        c <- func h (renderDefault "auxiliary function h") <%> a <%> b
        ------ only book-keeping below --------------------------------------
        ---------------------------------------------------------------------
        a `named` "a" >> b `named` "b" >> c `named` "result"
        a renderDefault "first intermediate result"
        b renderDefault "second intermediate result"
        c renderDefault "the workflow result"
        render a >> render b >> render c
        return c

Above, the action `func` registers a new variable and immediately
supplies a description, which is then used as edge label by the
`<%>` operator on the same line.
You see that instead of one line of pure Haskell you are burdened
with writing four kinds of Provenience actions:
*register*, *describe*, *alias* and *render*. But of the four actions,
three are only concerned with providing descriptions that the pure code
did not contain.
Remarks
=======
This package was inspired by the
[Javelin](https://en.wikipedia.org/wiki/Javelin_Software "wikipedia")
Software. Thanks to John R Levine, one of the authors of Javelin,
for explaining the concepts underlying Javelin.
Because rendering goes through [Pandoc](https://hackage.haskell.org/package/pandoc "pandoc"),
the user can choose among a number of output formats.
With a little CSS, the above example may be rendered as follows.
Unfortunately, Hackage does not allow raw HTML in Markdown, so
you have to convert the Markdown yourself.
(For the sake of example,
we used `f = abs`, `g = replicate` and `h = fmap concat . replicate`.)
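
To see what these concrete choices compute, here is the pure `workflow` from the motivation section with `f`, `g` and `h` filled in. The type signatures are our own addition, specialised to `Int` for the first argument as an assumption; the Provenience bookkeeping is left out entirely.

```haskell
-- Pure version of the workflow with the concrete auxiliary functions.
f :: Int -> Int
f = abs

g :: Int -> a -> [a]
g = replicate

-- fmap on functions is composition, so h n = concat . replicate n
h :: Int -> [a] -> [a]
h = fmap concat . replicate

workflow :: Int -> a -> [a]
workflow x y =
  let a = f x      -- absolute value of the first input
      b = g a y    -- a copies of the second input
  in  h a b        -- a copies of b, concatenated
```

For instance, `workflow (-2) 'x'` first computes `a = 2`, then `b = "xx"`, and finally concatenates two copies of `b`.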