Copyright	(c) Levent Erkok
License	BSD3
Maintainer	erkokl@gmail.com
Stability	experimental
Safe Haskell	Safe-Inferred
Language	Haskell2010

Data.SBV.RegExp

Contents

Regular expressions
Matching
Constructing regular expressions

Description

A collection of regular-expression related utilities. The recommended workflow is to import this module qualified as the names of the functions are specifically chosen to be common identifiers. Also, it is recommended you use the OverloadedStrings extension to allow literal strings to be used as symbolic-strings and regular-expressions when working with this module.

Synopsis

data RegExp
- = Literal String
- | All
- | AllChar
- | None
- | Range Char Char
- | Conc [RegExp]
- | KStar RegExp
- | KPlus RegExp
- | Opt RegExp
- | Comp RegExp
- | Diff RegExp RegExp
- | Loop Int Int RegExp
- | Power Int RegExp
- | Union [RegExp]
- | Inter RegExp RegExp
class RegExpMatchable a where
- match :: a -> RegExp -> SBool
everything :: RegExp
nothing :: RegExp
anyChar :: RegExp
exactly :: String -> RegExp
oneOf :: String -> RegExp
newline :: RegExp
whiteSpaceNoNewLine :: RegExp
whiteSpace :: RegExp
tab :: RegExp
punctuation :: RegExp
asciiLetter :: RegExp
asciiLower :: RegExp
asciiUpper :: RegExp
digit :: RegExp
octDigit :: RegExp
hexDigit :: RegExp
decimal :: RegExp
octal :: RegExp
hexadecimal :: RegExp
floating :: RegExp
identifier :: RegExp

Regular expressions

A note on Equality Regular expressions can be symbolically compared for equality. Note that the regular Haskell Eq instance and the symbolic version differ in semantics: Eq instance checks for "structural" equality, i.e., that the two regular expressions are constructed in precisely the same way. The symbolic equality, however, checks for language equality, i.e., that the regular expressions correspond to the same set of strings. This is a bit unfortunate, but hopefully should not cause much trouble in practice. Note that the only reason we support symbolic equality is to take advantage of the internal decision procedures z3 provides for this case: A similar goal can be achieved by showing there is no string accepted by one but not the other. However, this encoding doesn't perform well in z3.

>>> prove $ ("a" * KStar ("b" * "a")) .== (KStar ("a" * "b") * "a")
Q.E.D.
>>> prove $ ("a" * KStar ("b" * "a")) .== (KStar ("a" * "b") * "c")
Falsifiable
>>> prove $ ("a" * KStar ("b" * "a")) ./= (KStar ("a" * "b") * "c")
Q.E.D.

data RegExp Source #

Regular expressions. Note that regular expressions themselves are concrete, but the match function from the RegExpMatchable class can check membership against a symbolic string/character. Also, we are preferring a datatype approach here, as opposed to coming up with some string-representation; there are way too many alternatives already so inventing one isn't a priority. Please get in touch if you would like a parser for this type as it might be easier to use.

Constructors

Literal String	Precisely match the given string
All	Accept every string
AllChar	Accept every single character
None	Accept no strings
Range Char Char	Accept range of characters
Conc [RegExp]	Concatenation
KStar RegExp	Kleene Star: Zero or more
KPlus RegExp	Kleene Plus: One or more
Opt RegExp	Zero or one
Comp RegExp	Complement of regular expression
Diff RegExp RegExp	Difference of regular expressions
Loop Int Int RegExp	From `n` repetitions to `m` repetitions
Power Int RegExp	Exactly `n` repetitions, i.e., nth power
Union [RegExp]	Union of regular expressions
Inter RegExp RegExp	Intersection of regular expressions

Instances

Instances details

Data RegExp Source #
Instance details Defined in Data.SBV.Core.Symbolic Methods gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> RegExp -> c RegExp # gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c RegExp # toConstr :: RegExp -> Constr # dataTypeOf :: RegExp -> DataType # dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c RegExp) # dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c RegExp) # gmapT :: (forall b. Data b => b -> b) -> RegExp -> RegExp # gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> RegExp -> r # gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> RegExp -> r # gmapQ :: (forall d. Data d => d -> u) -> RegExp -> [u] # gmapQi :: Int -> (forall d. Data d => d -> u) -> RegExp -> u # gmapM :: Monad m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp # gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp # gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp #
IsString RegExp Source #	With overloaded strings, we can have direct literal regular expressions.
Instance details Defined in Data.SBV.Core.Symbolic Methods fromString :: String -> RegExp #
Num RegExp Source #	Regular expressions as a `Num` instance. Note that only some operations make sense and not in the most obvious way. For instance, we would typically expect `a - b` to be the same as `a + negate b`, but that equality does not hold in general. So, use the `Num` instance only as constructing syntax, not doing algebraic manipulations.
Instance details Defined in Data.SBV.Core.Symbolic Methods (+) :: RegExp -> RegExp -> RegExp # (-) :: RegExp -> RegExp -> RegExp # (*) :: RegExp -> RegExp -> RegExp # negate :: RegExp -> RegExp # abs :: RegExp -> RegExp # signum :: RegExp -> RegExp # fromInteger :: Integer -> RegExp #
Show RegExp Source #	Convert a reg-exp to a Haskell-like string
Instance details Defined in Data.SBV.Core.Symbolic Methods showsPrec :: Int -> RegExp -> ShowS # show :: RegExp -> String # showList :: [RegExp] -> ShowS #
Eq RegExp Source #
Instance details Defined in Data.SBV.Core.Symbolic Methods (==) :: RegExp -> RegExp -> Bool # (/=) :: RegExp -> RegExp -> Bool #
Ord RegExp Source #
Instance details Defined in Data.SBV.Core.Symbolic Methods compare :: RegExp -> RegExp -> Ordering # (<) :: RegExp -> RegExp -> Bool # (<=) :: RegExp -> RegExp -> Bool # (>) :: RegExp -> RegExp -> Bool # (>=) :: RegExp -> RegExp -> Bool # max :: RegExp -> RegExp -> RegExp # min :: RegExp -> RegExp -> RegExp #
EqSymbolic RegExp Source #	Regular expressions can be compared for equality. Note that we diverge here from the equality in the concrete sense; i.e., the Eq instance does not match the symbolic case. This is a bit unfortunate, but unavoidable with the current design of how we "distinguish" operators. Hopefully shouldn't be a big deal, though one should be careful.
Instance details Defined in Data.SBV.Core.Model Methods (.==) :: RegExp -> RegExp -> SBool Source # (./=) :: RegExp -> RegExp -> SBool Source # (.===) :: RegExp -> RegExp -> SBool Source # (./==) :: RegExp -> RegExp -> SBool Source # distinct :: [RegExp] -> SBool Source # distinctExcept :: [RegExp] -> [RegExp] -> SBool Source # allEqual :: [RegExp] -> SBool Source # sElem :: RegExp -> [RegExp] -> SBool Source # sNotElem :: RegExp -> [RegExp] -> SBool Source #

Matching

A symbolic string or a character (SString or SChar) can be matched against a regular-expression. Note that the regular-expression itself is not a symbolic object: It's a fully concrete representation, as captured by the RegExp class. The RegExp class is an instance of the IsString class, which makes writing literal matches easier. The RegExp type also has a (somewhat degenerate) Num instance: Concatenation corresponds to multiplication, union corresponds to addition, and 0 corresponds to the empty language.

Note that since match is a method of RegExpMatchable class, both SChar and SString can be used as an argument for matching. In practice, this means you might have to disambiguate with a type-ascription if it is not deducible from context.

>>> prove $ \s -> (s :: SString) `match` "hello" .<=> s .== "hello"
Q.E.D.
>>> prove $ \s -> s `match` Loop 2 5 "xyz" .=> length s .>= 6
Q.E.D.
>>> prove $ \s -> s `match` Loop 2 5 "xyz" .=> length s .<= 15
Q.E.D.
>>> prove $ \s -> s `match` Power 3 "xyz" .=> length s .== 9
Q.E.D.
>>> prove $ \s -> s `match`  (exactly "xyz" ^ 3) .=> length s .== 9
Q.E.D.
>>> prove $ \s -> match s (Loop 2 5 "xyz") .=> length s .>= 7
Falsifiable. Counter-example:
  s0 = "xyzxyz" :: String
>>> prove $ \s -> (s :: SString) `match` "hello" .=> s `match` ("hello" + "world")
Q.E.D.
>>> prove $ \s -> sNot $ (s::SString) `match` ("so close" * 0)
Q.E.D.
>>> prove $ \c -> (c :: SChar) `match` oneOf "abcd" .=> ord c .>= ord (literal 'a') .&& ord c .<= ord (literal 'd')
Q.E.D.

class RegExpMatchable a where Source #

Matchable class. Things we can match against a RegExp.

For instance, you can generate valid-looking phone numbers like this:

>>> :set -XOverloadedStrings
>>> let dig09 = Range '0' '9'
>>> let dig19 = Range '1' '9'
>>> let pre   = dig19 * Loop 2 2 dig09
>>> let post  = dig19 * Loop 3 3 dig09
>>> let phone = pre * "-" * post
>>> sat $ \s -> (s :: SString) `match` phone
Satisfiable. Model:
  s0 = "200-8000" :: String

Methods

match :: a -> RegExp -> SBool Source #

match s r checks whether s is in the language generated by r.

Instances

Instances details

RegExpMatchable SChar Source #	Matching a character simply means the singleton string matches the regex.
Instance details Defined in Data.SBV.RegExp Methods match :: SChar -> RegExp -> SBool Source #
RegExpMatchable SString Source #	Matching symbolic strings.
Instance details Defined in Data.SBV.RegExp Methods match :: SString -> RegExp -> SBool Source #

Constructing regular expressions

Basics

everything :: RegExp Source #

Match everything, universal acceptor.

>>> prove $ \(s :: SString) -> s `match` everything
Q.E.D.

nothing :: RegExp Source #

Match nothing, universal rejector.

>>> prove $ \(s :: SString) -> sNot (s `match` nothing)
Q.E.D.

anyChar :: RegExp Source #

Match any character, i.e., strings of length 1

>>> prove $ \(s :: SString) -> s `match` anyChar .<=> length s .== 1
Q.E.D.

Literals

exactly :: String -> RegExp Source #

A literal regular-expression, matching the given string exactly. Note that with OverloadedStrings extension, you can simply use a Haskell string to mean the same thing, so this function is rarely needed.

>>> prove $ \(s :: SString) -> s `match` exactly "LITERAL" .<=> s .== "LITERAL"
Q.E.D.

A class of characters

oneOf :: String -> RegExp Source #

Helper to define a character class.

>>> prove $ \(c :: SChar) -> c `match` oneOf "ABCD" .<=> sAny (c .==) (map literal "ABCD")
Q.E.D.

Spaces

newline :: RegExp Source #

Recognize a newline. Also includes carriage-return and form-feed.

>>> newline
(re.union (str.to.re "\n") (str.to.re "\r") (str.to.re "\f"))
>>> prove $ \c -> c `match` newline .=> isSpaceL1 c
Q.E.D.

whiteSpaceNoNewLine :: RegExp Source #

Recognize white-space, but without a new line.

>>> prove $ \c -> c `match` whiteSpaceNoNewLine .=> c `match` whiteSpace .&& c ./= literal '\n'
Q.E.D.

whiteSpace :: RegExp Source #

Recognize white space.

>>> prove $ \c -> c `match` whiteSpace .=> isSpaceL1 c
Q.E.D.

Separators

tab :: RegExp Source #

Recognize a tab.

>>> tab
(str.to.re "\t")
>>> prove $ \c -> c `match` tab .=> c .== literal '\t'
Q.E.D.

punctuation :: RegExp Source #

Recognize a punctuation character.

>>> prove $ \c -> c `match` punctuation .=> isPunctuationL1 c
Q.E.D.

Letters

asciiLetter :: RegExp Source #

Recognize an alphabet letter, i.e., A..Z, a..z.

asciiLower :: RegExp Source #

Recognize an ASCII lower case letter

>>> asciiLower
(re.range "a" "z")
>>> prove $ \c -> (c :: SChar) `match` asciiLower  .=> c `match` asciiLetter
Q.E.D.
>>> prove $ \c -> c `match` asciiLower  .=> toUpperL1 c `match` asciiUpper
Q.E.D.
>>> prove $ \c -> c `match` asciiLetter .=> toLowerL1 c `match` asciiLower
Q.E.D.

asciiUpper :: RegExp Source #

Recognize an upper case letter

>>> asciiUpper
(re.range "A" "Z")
>>> prove $ \c -> (c :: SChar) `match` asciiUpper  .=> c `match` asciiLetter
Q.E.D.
>>> prove $ \c -> c `match` asciiUpper  .=> toLowerL1 c `match` asciiLower
Q.E.D.
>>> prove $ \c -> c `match` asciiLetter .=> toUpperL1 c `match` asciiUpper
Q.E.D.

Digits

digit :: RegExp Source #

Recognize a digit. One of 0..9.

>>> digit
(re.range "0" "9")
>>> prove $ \c -> c `match` digit .<=> let v = digitToInt c in 0 .<= v .&& v .< 10
Q.E.D.
>>> prove $ \c -> sNot ((c::SChar) `match` (digit - digit))
Q.E.D.

octDigit :: RegExp Source #

Recognize an octal digit. One of 0..7.

>>> octDigit
(re.range "0" "7")
>>> prove $ \c -> c `match` octDigit .<=> let v = digitToInt c in 0 .<= v .&& v .< 8
Q.E.D.
>>> prove $ \(c :: SChar) -> c `match` octDigit .=> c `match` digit
Q.E.D.

hexDigit :: RegExp Source #

Recognize a hexadecimal digit. One of 0..9, a..f, A..F.

>>> hexDigit
(re.union (re.range "0" "9") (re.range "a" "f") (re.range "A" "F"))
>>> prove $ \c -> c `match` hexDigit .<=> let v = digitToInt c in 0 .<= v .&& v .< 16
Q.E.D.
>>> prove $ \(c :: SChar) -> c `match` digit .=> c `match` hexDigit
Q.E.D.

Numbers

decimal :: RegExp Source #

Recognize a decimal number.

>>> decimal
(re.+ (re.range "0" "9"))
>>> prove $ \s -> (s::SString) `match` decimal .=> sNot (s `match` KStar asciiLetter)
Q.E.D.

octal :: RegExp Source #

Recognize an octal number. Must have a prefix of the form 0o/0O.

>>> octal
(re.++ (re.union (str.to.re "0o") (str.to.re "0O")) (re.+ (re.range "0" "7")))
>>> prove $ \s -> s `match` octal .=> sAny (.== take 2 s) ["0o", "0O"]
Q.E.D.

hexadecimal :: RegExp Source #

Recognize a hexadecimal number. Must have a prefix of the form 0x/0X.

>>> hexadecimal
(re.++ (re.union (str.to.re "0x") (str.to.re "0X")) (re.+ (re.union (re.range "0" "9") (re.range "a" "f") (re.range "A" "F"))))
>>> prove $ \s -> s `match` hexadecimal .=> sAny (.== take 2 s) ["0x", "0X"]
Q.E.D.

floating :: RegExp Source #

Recognize a floating point number. The exponent part is optional if a fraction is present. The exponent may or may not have a sign.

>>> prove $ \s -> s `match` floating .=> length s .>= 3
Q.E.D.

Identifiers

identifier :: RegExp Source #

For the purposes of this regular expression, an identifier consists of a letter followed by zero or more letters, digits, underscores, and single quotes. The first letter must be lowercase.

>>> prove $ \s -> s `match` identifier .=> isAsciiLower (head s)
Q.E.D.
>>> prove $ \s -> s `match` identifier .=> length s .>= 1
Q.E.D.