-- | This module offers regexes, combinators, and operations to work with the -- list type (@[]@), and also specifically 'String's, which are lists of -- 'Char's. -- module Regex.List ( -- * @RE@s R.RE , R.token , R.satisfy , R.single , R.anySingle , L.list , L.manyList , L.someList , L.manyListMin , L.someListMin -- * @Char@ @RE@s , L.charIgnoreCase , L.oneOfChar , L.stringIgnoreCase , L.manyStringOf , L.someStringOf , L.manyStringOfMin , L.someStringOfMin -- * Numeric @Char@ @RE@s , L.naturalDec , L.integerDec , L.naturalHex , L.integerHex , L.wordRangeDec , L.intRangeDec , L.wordRangeHex , L.intRangeHex , L.wordDecN , L.wordHexN -- * Combinators , R.foldlMany , R.foldlManyMin , L.toMatch , L.withMatch , R.Many(..) , R.manyr , R.optionalMin , R.someMin , R.manyMin , R.atLeast , R.atMost , R.betweenCount , R.atLeastMin , R.atMostMin , R.betweenCountMin , R.sepBy , R.sepBy1 , R.endBy , R.endBy1 , R.sepEndBy , R.sepEndBy1 , R.chainl1 , R.chainr1 -- * Combinators in @base@ -- $combase -- * Compile and parse , L.reParse , P.Parser , P.compile , P.compileBounded , L.parse , L.parseSure -- * List operations , L.find , L.findAll , L.splitOn , L.replace , L.replaceAll -- * Additional information -- ** Recursive definitions -- $recursive-definitions -- ** Laziness -- $laziness -- ** Looping parsers -- $looping-parsers -- ** Performance -- $performance ) where import qualified Regex.Internal.Regex as R import qualified Regex.Internal.Parser as P import qualified Regex.Internal.List as L -- $combase -- -- Various combinators are available in @base@ that work with @RE@s, by virtue -- of @RE@ being @Applicative@ and @Alternative@. -- Since this package does not attempt to redefine or re-export such -- combinators, you need to import and use them. Commonly used combinators -- are: -- -- * "Control.Applicative": @liftA2@, @\<|>@, @empty@, @many@, @some@, -- @optional@ -- * "Control.Monad": @void@, @replicateM@, @replicateM_@ -- * "Data.Foldable": @traverse_@, @for_@, @sequenceA_@, @asum@ -- * "Data.Traversable": @traverse@, @for@, @sequenceA@ -- -- $recursive-definitions -- -- It is not possible to define a @RE@ recursively. If it were permitted, it -- would be capable of parsing more than -- [regular languages](https://en.wikipedia.org/wiki/Regular_language). -- Unfortunately, there is no good way\* to make it impossible to write such -- a regex in the first place. So it must be avoided by the programmer. As an -- example, avoid this: -- -- @ -- re :: RE Char [String] -- re = liftA2 (:) (list "ha") re \<|> [] \<$ list "!" -- diverges! -- @ -- -- Instead, use appropriate combinators from this module: -- -- @ -- re = many (list "ha") <* list "!" -- @ -- -- For the same reason, be cautious when using combinators from the other -- packages on @RE@s. Make sure that they do not attempt to construct a -- recursive @RE@. -- -- If you find that your regex is impossible to write without recursion, -- you are attempting to parse a non-regular language! You need a more powerful -- parser than what this library has to offer. -- -- \*[Unlifted datatypes](https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/primitives.html#unlifted-datatypes) -- can be used for this but they are too inconvenient to work with. -- -- $laziness -- -- Parsing is lazy in the result value, i.e. the @a@ in @RE c a@ or -- @Parser c a@. In fact, for the algorithm used in this library, this laziness -- is essential for good runtime complexity. However, there is little reason -- to be lazy in other aspects, such as the elements of the sequence, @c@, or -- the functions and regexes used in combinators. Functions are strict in such -- arguments. -- -- @ -- -- Lazy in the result -- reParse (pure ⊥) "" = Just ⊥ -- reParse (fmap (\\_ -> ⊥) (char \'a\')) "a" = Just ⊥ -- -- -- Strict in places like -- single ⊥ = ⊥ -- fmap ⊥ r = ⊥ -- liftA2 f r ⊥ = ⊥ -- @ -- -- $looping-parsers -- -- What should be the result of @reParse (many (pure ())) ""@? -- -- Since @many r@ parses @r@ as many times as possible, and @pure ()@ succeeds -- without consuming input, the result should arguably be the infinite list -- @repeat ()@. Similarly, @reParse (foldlMany f z (pure ())) ""@ should -- diverge. Note that this applies to not just @pure x@, but any regex that -- can succeed without consuming input, such as @many x@, @manyMin x@, etc. -- -- This library considers that such an outcome is not desirable in practice. It -- would be surprising to get an infinite structure from a parser. So, in the -- case that @many@ succeeds an infinite number of times, this library treats it -- as succeeding /zero/ times. -- -- By this rule, @reParse (many (pure ())) ""@ parses as @[]@ and -- @reParse (foldlMany f z (pure ())) ""@ parses as @z@. -- -- This behavior makes it impossible to distinguish between zero parses and -- infinite parses. To address this, an alternate combinator 'Regex.List.manyr' -- is provided. This parses into a 'Regex.List.Many', a type that clearly -- indicates if parsing succeeded without consuming input into an infinite list, -- or if it succeeded a finite number of times. -- -- $performance -- -- This section describes some performance characteristics of this library, -- without requiring a dive into the source code. -- -- Parsing with a @RE@ is done in two distinct steps. -- -- 1. A @RE@ is compiled to a @Parser@, which is a -- [nondeterministic finite automaton](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton) -- (NFA), in \(O(m)\) time. \(m\) here is the size of the @RE@, which is the -- number of nodes in its internal tree representation. The resulting @Parser@ -- has \(O(m)\) size. -- 2. The @Parser@ is run on a list in \(O(mn \log m)\) time, where \(n\) is -- the length of the list. This assumes that each @(c -> Maybe a)@ function -- used to parse individual elements takes \(O(1)\) time. -- -- /Performance tip/: Use @(\<$)@ over @(\<$>)@, and @(\<*)@\/@(*>)@ over -- @liftA2@\/@(\<*>)@ when ignoring the result of a @RE@. Knowing the result is -- ignored allows compiling to a faster parser. -- -- Memory usage for parsing is \(O(nm)\), but -- -- * If the result of a @RE@ is ignored using @(\<$)@, @(\<*)@, or @(*>)@, only -- \(O(m)\) memory is required. -- -- This applies even as subcomponents. So, any subcomponent @RE@ of a larger -- @RE@ that is only recognizing a section of the list is cheaper in terms of -- memory. --