symparsec: Type level string parser combinators

[ data, library, mit, types ] [ Propose Tags ]
Versions [RSS] 0.4.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1
Change log CHANGELOG.md
Dependencies base (>=4.18 && <5), defun-core (>=0.1 && <0.2), singleraeh (>=0.2.0 && <0.3), type-level-show (>=0.2.1 && <0.3) [details]
License MIT
Author Ben Orchard
Maintainer Ben Orchard <thefirstmuffinman@gmail.com>
Category Types, Data
Home page https://github.com/raehik/symparsec#readme
Bug tracker https://github.com/raehik/symparsec/issues
Source repo head: git clone https://github.com/raehik/symparsec
Uploaded by raehik at 2024-05-25T03:52:33Z
Distributions NixOS:1.1.1, Stackage:1.1.1
Downloads 93 total (62 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for symparsec-1.0.0

[back to package description]

Symparsec

Type level string (Symbol) parser combinators. Reify to runtime parsers with guaranteed identical behaviour.

It's a Parsec-like for Symbols; thus, Symparsec.

Previously named symbol-parser.

Requires GHC 9.6 for singling parsers.

Features

  • Define parsers compositionally, largely as you would on the term level.
  • Pretty parse errors.
  • Hopefully decent performance.
  • Reify parsers to term level with guaranteed identical behaviour via a healthy dose of singletons.

Examples

ghci> import Symparsec
ghci> :k! Run (Drop 3 :*>: Isolate 2 NatDec :<*>: (Drop 3 :*>: NatHex)) "___10___FF"
...
= Right '( '(10, 255), "")

Why?

Via GHC.Generics, we may inspect Haskell data types on the type level. Constructor names are Symbols. Ever reify these, then perform some sort of checking or parsing on the term level? Symparsec does the parsing on the type level instead. Catch bugs earlier, get faster runtime.

Design

The parser

A parser is a 4-tuple of:

  • a consuming character parser; given a character and a state, returns
    • Cont s: keep going, here's the next state s
    • Done r: parse successful with value r
    • Err E: parse error, details in the E (a structured error)
  • an end handler, which takes only a state, and can only return Done or Err
  • an initial "raw" state
  • a state initializer, which turns the initial "raw" state into the first state value (the indirection here assists singling)

Running a parser is simple:

  • initialize state
  • parse character by character until end of input, or Done/Err

Parsers may not communicate with the runner any other way. This means no backtracking, chunking etc. This is a conscious decision, made for simplicity.

Note that due to character parsers being consuming, we often need to do a bit of "internal lookahead", where we check if we expect to consume any more characters, and if not then emit a Done. It also means that non-consuming parsers such as Take 0 are invalid for non-empty strings. The state initializer should be used to catch such cases.

This is a rough overview of parser design. See the code and/or Haddock documentation for precise details.

Pitfall: Character parsers always consume

There is no backtracking or lookahead, that you do not implement yourself. This keeps the parser execution extremely simple, but breaks null parsers such as Drop 0, so these must be handled specially (unless you don't getting mind stuck type families on misuse).

For concrete examples, see the implementation of Drop and Literal.

Pitfall: Not all parsers are definable

  • No changing parser state. Thus, parsers such as Try :: Parser s r -> Parser (s, [Char]) r are not definable. Parsers may not backtrack.
    • Combinators such as <|> can emulate backtracking, but they are complex and hard to reason about (they may have bugs!).

Feature: Parsers may be reified to use at runtime, with guaranteed same behaviour

TODO

Contributing

I would gladly accept further combinators or other suggestions. Please add an issue or pull request, or contact me via email or whatever (I'm raehik everywhere).

License

Provided under the MIT license. See LICENSE for license text.