srtree: A general library to work with Symbolic Regression expression trees.

A Symbolic Regression Tree data structure for working with mathematical expressions, with support for first-order derivatives and simplification.


Properties

Versions 0.1.0.0, 0.1.1.0, 0.1.2.0, 0.1.2.1, 1.0.0.0, 1.0.0.1, 1.0.0.2, 1.0.0.3, 1.0.0.4, 1.0.0.5, 2.0.0.0, 2.0.0.1, 2.0.0.2
Change log ChangeLog.md
Dependencies attoparsec (>=0.14.4 && <0.15), attoparsec-expr (>=0.1.1.2 && <0.2), base (>=4.16 && <5), bytestring (>=0.11 && <0.12), containers (>=0.6.7 && <0.8), dlist (>=1.0 && <1.1), exceptions (>=0.10.7 && <0.11), filepath (>=1.4.0.0 && <1.6), hashable (>=1.4.4.0 && <1.6), ieee754 (>=0.8.0 && <0.9), lens (>=5.2.3 && <5.4), list-shuffle (>=1.0.0.1 && <1.1), massiv (>=1.0.4.0 && <1.1), mtl (>=2.2 && <2.4), nlopt-haskell (>=0.1.3.0 && <0.2), normaldistribution (>=1.1.0.3 && <1.2), optparse-applicative (>=0.17 && <0.19), random (>=1.2 && <1.3), split (>=0.2.5 && <0.3), srtree, statistics (>=0.16.2.1 && <0.17), transformers (>=0.6.1.0 && <0.7), unordered-containers (>=0.2 && <0.3), vector (>=0.12 && <0.14), zlib (>=0.6.3 && <0.8) [details]
License BSD-3-Clause
Copyright 2023 Fabricio Olivetti de França
Author Fabricio Olivetti de França
Maintainer fabricio.olivetti@gmail.com
Category Math, Data, Data Structures
Home page https://github.com/folivetti/srtree#readme
Bug tracker https://github.com/folivetti/srtree/issues
Source repo head: git clone https://github.com/folivetti/srtree
Uploaded by olivetti at 2024-11-07T19:36:45Z


Readme for srtree-2.0.0.0

srtree: A supporting library for tree-based symbolic regression

srtree is a Haskell library that implements a tree-based structure for expressions and supporting functions to be used in the context of symbolic regression.

The expression structure is defined as the fixed point of a mixed unary and binary tree. This makes it easier to implement supporting functions that require traversing the trees. Also, since it is a parameterized structure, we can create partial trees to pattern match structures of interest. This structure may contain four types of nodes:

The SRTree structure has instances for Num, Fractional, Floating, and IsString, which allow you to write an expression as a valid Haskell expression (remember to enable the OverloadedStrings extension), such as:

expr = "x0" * 2 + sin("x1" * pi + "x0") :: Fix SRTree
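To see why a string literal can stand in for a variable, here is a minimal, self-contained sketch of the mechanism. It does not use srtree: the Expr type, its constructors, and the instances below are simplified stand-ins invented for illustration, and the library's actual instances may differ. Only Num and IsString are shown; Fractional and Floating would follow the same pattern.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Simplified stand-in for an expression tree (NOT the srtree definition).
data Expr
  = Var Int
  | Const Double
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- With OverloadedStrings, the literal "x3" is read as Var 3.
-- Any other string is rejected at runtime in this sketch.
instance IsString Expr where
  fromString ('x' : digits)
    | not (null digits), all (`elem` ['0' .. '9']) digits = Var (read digits)
  fromString s = error ("not a variable name: " ++ s)

-- Numeric literals and operators build tree nodes instead of computing values.
instance Num Expr where
  (+) = Add
  (*) = Mul
  negate e = Mul (Const (-1)) e
  fromInteger = Const . fromInteger
  abs = error "abs: not part of this sketch"
  signum = error "signum: not part of this sketch"

-- Parsed as ("x0" * 2) + "x1", giving Add (Mul (Var 0) (Const 2.0)) (Var 1).
example :: Expr
example = "x0" * 2 + "x1"
```

The type annotation is what selects the instances: without it, GHC has no way to know that the string and numeric literals should become tree nodes.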

This library comes with many quality-of-life functions for handling this data structure, such as:

Additionally, the library provides supporting functions to work with datasets, evaluate expressions, calculate derivatives, print expressions, generate random trees, simplify expressions, calculate overall statistics, optimize parameters, and compute model selection metrics.

Together with this library, we provide example applications (please refer to their corresponding README files):

Organization

The library is organized into Data, Algorithm, and Text modules. The Data modules implement functions directly tied to the data structure, the Algorithm modules implement algorithms related to symbolic regression, and the Text modules parse string expressions from different formats and apply simplification when requested.

Data modules

The Data modules are split into \(5\) submodules:

Data.SRTree

The SRTree val data structure is a sum type that can be a variable index, a parameter index, a constant value (of type Double), a univariate function, or a binary operator. The data type is implemented as a fixed point, so all the algorithms act on Fix SRTree:

t = "x0" + "t0" * sin("x1" + "t1"**2) :: Fix SRTree 

When an expression is written in this more natural notation, variables and parameters are strings made of a first letter, x for variables or t for parameters (as in theta), followed by an integer giving the index of the variable or parameter. The fixed-point notation allows us to implement recursive processing of a tree without much of the usual boilerplate:

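-- An algebra for a fold over the tree (requires the LambdaCase extension):
-- each child position already holds the node count of its subtree.
countNodes =
  \case
    Var _     -> 1
    Const _   -> 1
    Param _   -> 1
    Uni _ t   -> 1 + t
    Bin _ l r -> 1 + l + r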
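For reference, here is a self-contained sketch of how an algebra like countNodes is applied to a whole tree. The Fix, TreeF, Op, Function, and cata definitions below are local stand-ins written for this example, assuming the usual fixed-point encoding; srtree provides its own versions, whose names and constructors may differ.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE LambdaCase #-}

-- Stand-in definitions for illustration only (NOT the srtree definitions).
newtype Fix f = Fix (f (Fix f))

data Op = Add | Mul

data Function = Sin | Cos

-- The type parameter val marks the positions of the children.
data TreeF val
  = Var Int
  | Param Int
  | Const Double
  | Uni Function val
  | Bin Op val val
  deriving (Functor)

-- Catamorphism: fold every child first, then combine the results
-- at the current node with the algebra.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix node) = alg (fmap (cata alg) node)

-- By the time the algebra sees a node, t, l, and r are already Ints.
countNodes :: Fix TreeF -> Int
countNodes = cata $ \case
  Var _     -> 1
  Param _   -> 1
  Const _   -> 1
  Uni _ t   -> 1 + t
  Bin _ l r -> 1 + l + r

-- x0 + t0 * sin(x1): six nodes in total.
example :: Fix TreeF
example =
  Fix (Bin Add
        (Fix (Var 0))
        (Fix (Bin Mul (Fix (Param 0)) (Fix (Uni Sin (Fix (Var 1)))))))
```

The recursion lives entirely in cata, so each new traversal only has to say what happens at a single node.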

The children are parameterized by the val type parameter. This allows us to create convenient partial structures, such as:

-- + operator pointing to some structure
-- with index 1 and 2
Bin Add 1 2 

-- canonical representation of + operator 
Bin Add () ()
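One use of such partial structures is comparing nodes while ignoring their children. The sketch below is self-contained and does not use srtree: TreeF and Op are reduced stand-ins, and shape and sameNode are hypothetical helper names chosen for this example.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Functor (void)

-- Reduced stand-ins for illustration only (NOT the srtree definitions).
data Op = Add | Mul
  deriving (Show, Eq)

data TreeF val
  = Var Int
  | Const Double
  | Bin Op val val
  deriving (Show, Eq, Functor)

-- Replace every child with (), keeping only the node itself.
-- For example, Bin Add 1 2 becomes Bin Add () ().
shape :: TreeF a -> TreeF ()
shape = void

-- Two nodes match when they agree on everything except their children,
-- even if the children have different types.
sameNode :: TreeF a -> TreeF b -> Bool
sameNode l r = shape l == shape r
```

Because the children are just a type parameter, the same data type serves for full trees, for nodes pointing at indices, and for bare node shapes.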

The main functions of this module are:

Data.SRTree.Datasets module

This module exports only the loadDataset function, which takes a filename and returns the training and test sets together with the column labels. The filename must follow the format:

filename.ext:start_row:end_row:target:features

where each ':' field is optional. The fields are:

Example of valid names: dataset.csv, mydata.tsv, dataset.csv:20:100, dataset.tsv:20:100:price:m2,rooms,neighborhood, dataset.csv:::5:0,1,2.
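To make the format concrete, the sketch below splits such a name into its fields. It is not how loadDataset is implemented: DatasetSpec, splitOn, and parseSpec are hypothetical names for this example, the fields are kept as raw strings, and an omitted field is represented as Nothing (or an empty feature list) rather than guessing the library's defaults.

```haskell
-- Hypothetical record for illustration; not a type exported by srtree.
data DatasetSpec = DatasetSpec
  { specFile     :: FilePath
  , specStartRow :: Maybe String
  , specEndRow   :: Maybe String
  , specTarget   :: Maybe String
  , specFeatures :: [String]
  }
  deriving (Show, Eq)

-- Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s =
  case break (== sep) s of
    (field, [])       -> [field]
    (field, _ : rest) -> field : splitOn sep rest

-- Split on ':' and pad with empty fields so that missing trailing
-- fields are treated the same way as fields left empty.
parseSpec :: String -> DatasetSpec
parseSpec s =
  case splitOn ':' s ++ repeat "" of
    file : start : end : target : feats : _ ->
      DatasetSpec file (orNothing start) (orNothing end) (orNothing target)
        (if null feats then [] else splitOn ',' feats)
    -- Unreachable: the padded list is infinite.
    _ -> DatasetSpec s Nothing Nothing Nothing []
  where
    orNothing field = if null field then Nothing else Just field
```

With this sketch, parseSpec "dataset.csv:::5:0,1,2" leaves both row fields unset, keeps "5" as the target, and yields the features "0", "1", and "2".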

Data.SRTree.Derivative module

Calculates symbolic derivatives of the expression w.r.t. the variables or the parameters. The main functions of this module are:

Data.SRTree.Eval module

Evaluates an expression given a dataset. The main functions of this module are:

Data.SRTree.Print module

Support functions to convert an expression tree into a String. The main functions of this module are:

Data.SRTree.Random module

Auxiliary functions to create random trees. The main functions of this module are:

Text modules

The Text modules are split into \(2\) submodules:

Text.ParseSR module

The only important function of this module is parseSR, which parses a string expression produced by a given algorithm into the requested output. It also converts variable names to x0, x1,...

Text.ParseSR.IO module

The two main functions of this module are:

These functions handle any errors with an Either type and they can be safely pipelined together. Any invalid expression will be printed as "invalid expression ".

Algorithm modules

The Algorithm modules are split into \(5\) submodules:

Algorithm.SRTree.AD module

The main functions of this module are:

Algorithm.SRTree.Likelihood module

The main functions of this module are:

Algorithm.SRTree.Opt module

The main functions of this module are:

Algorithm.SRTree.ModelSelection module

The main functions of this module are:

Algorithm.SRTree.ConfidenceIntervals module

The main functions of this module are:

EqSat modules

The EqSat modules are split into \(4\) submodules:

Algorithm.EqSat module

The main functions of this module are:

Algorithm.EqSat.EGraph module

The main functions of this module are:

Algorithm.EqSat.EqSatDB module

The main functions of this module are:

Algorithm.EqSat.Simplify module

The main functions of this module are:

TODO: