parser-regex: Regex based parsers

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

Regex based parsers. See

Regex.Text
To work with Text from the text library.
Regex.List
To work with Strings or lists.
Regex.Base
To work with other sequences.

[Skip to Readme]

Properties

Versions 0.1.0.0, 0.2.0.0, 0.2.0.1
Change log CHANGELOG.md
Dependencies base (>=4.15 && <5.0), containers (>=0.6.4 && <0.8), deepseq (>=1.4.5 && <1.6), ghc-bignum (>=1.1 && <1.4), primitive (>=0.7.3 && <0.10), text (>=2.0.1 && <2.2), transformers (>=0.5.6 && <0.7) [details]
License BSD-3-Clause
Author Soumik Sarkar
Maintainer soumiksarkar.3120@gmail.com
Category Parsing
Home page https://github.com/meooow25/parser-regex
Bug tracker https://github.com/meooow25/parser-regex/issues
Source repo head: git clone https://github.com/meooow25/parser-regex.git
Uploaded by meooow at 2024-12-24T16:13:19Z

Modules

[Index] [Quick Jump]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for parser-regex-0.2.0.1

[back to package description]

parser-regex

Hackage Haskell-CI

Regex based parsers

Features

Examples

Versus regex patterns

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

Can you guess what this matches?

This is a non-validating regex to extract parts of a URI, from RFC 3986. It can be translated as follows.

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

data URI = URI
  { scheme    :: Maybe Text
  , authority :: Maybe Text
  , path      :: Text
  , query     :: Maybe Text
  , fragment  :: Maybe Text
  } deriving Show

uriRE :: REText URI
uriRE = URI
  <$> optional (R.someTextOf (CS.not ":/?#") <* R.char ':')
  <*> optional (R.text "//" *> R.manyTextOf (CS.not "/?#"))
  <*> R.manyTextOf (CS.not "?#")
  <*> optional (R.char '?' *> R.manyTextOf (CS.not "#"))
  <*> optional (R.char '#' *> R.manyText)
>>> R.reParse uriRE "https://github.com/meooow25/parser-regex?tab=readme-ov-file#parser-regex"
Just (URI { scheme = Just "https"
          , authority = Just "github.com"
          , path = "/meooow25/parser-regex"
          , query = Just "tab=readme-ov-file"
          , fragment = Just "parser-regex" })

More parsing

Parsing is straightforward, even for tasks which may be impractical with submatch extraction typically offered by regex libraries.

import Control.Applicative ((<|>))
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

data Expr
  = Var Text
  | Expr :+ Expr
  | Expr :- Expr
  | Expr :* Expr
  deriving Show

exprRE :: REText Expr
exprRE = var `R.chainl1` mul `R.chainl1` (add <|> sub)
  where
    var = Var <$> R.someTextOf CS.asciiLower
    add = (:+) <$ R.char '+'
    sub = (:-) <$ R.char '-'
    mul = (:*) <$ R.char '*'
>>> import qualified Regex.Text as R
>>> R.reParse exprRE "a+b-c*d*e+f"
Just (((Var "a" :+ Var "b") :- ((Var "c" :* Var "d") :* Var "e")) :+ Var "f")

Find and replace

Find and replace using regexes are supported for Text and lists.

>>> import Control.Applicative ((<|>))
>>> import qualified Data.Text as T
>>> import qualified Regex.Text as R
>>>
>>> data Color = Blue | Orange deriving Show
>>> let re = Blue <$ R.text "blue" <|> Orange <$ R.text "orange"
>>> R.find re "color: orange"
Just Orange
>>>
>>> let re = T.toUpper <$> (R.text "cat" <|> R.text "dog" <|> R.text "fish")
>>> R.replaceAll re "locate selfish hotdog"
"loCATe selFISH hotDOG"

Parse any sequence

Regexes are not restricted to parsing text. For example, one may parse vectors from the vector library, because why not.

import Regex.Base (Parser)
import qualified Regex.Base as R
import qualified Data.Vector.Generic as VG

parseVector :: VG.Vector v c => Parser c a -> v c -> Maybe a
parseVector = R.parseFoldr VG.foldr
>>> import Control.Applicative (many)
>>> import qualified Data.Vector as V
>>> import qualified Regex.Base as R
>>>
>>> let p = R.compile $ many ((,) <$> R.satisfy even <*> R.satisfy odd)
>>> let v = V.fromList [0..5] :: V.Vector Int
>>> parseVector p v
Just [(0,1),(2,3),(4,5)]

Documentation

Documentation is available on Hackage: parser-regex

Already familiar with regex patterns? See the Regex pattern cheat sheet.

Alternatives

regex-applicative

regex-applicative is the primary inspiration for this library, and is similar in many ways.

parser-regex attempts to be a more efficient and featureful library built on the ideas of regex-applicative, though it does not aim to provide a superset of regex-applicative's API.

Traditional regex libraries

These libraries use regex patterns.

Consider using these if

Use parser-regex instead if

For a detailed comparison of regex libraries, see here.

Other options

If you are not restricted to regexes, there are many other parsing libraries you may use, too many to list here. See the "Parsing" category on Hackage for a start.

Contributing

Questions, bug reports, documentation improvements, code contributions welcome! Please open an issue as the first step.