Copyright | (c) Levent Erkok |
---|---|
License | BSD3 |
Maintainer | erkokl@gmail.com |
Stability | experimental |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
A collection of regular-expression related utilities. The recommended
workflow is to import this module qualified as the names of the functions
are specifically chosen to be common identifiers. Also, it is recommended
you use the OverloadedStrings
extension to allow literal strings to be
used as symbolic-strings and regular-expressions when working with
this module.
Synopsis
- data RegExp
- class RegExpMatchable a where
- everything :: RegExp
- nothing :: RegExp
- anyChar :: RegExp
- exactly :: String -> RegExp
- oneOf :: String -> RegExp
- newline :: RegExp
- whiteSpaceNoNewLine :: RegExp
- whiteSpace :: RegExp
- tab :: RegExp
- punctuation :: RegExp
- asciiLetter :: RegExp
- asciiLower :: RegExp
- asciiUpper :: RegExp
- digit :: RegExp
- octDigit :: RegExp
- hexDigit :: RegExp
- decimal :: RegExp
- octal :: RegExp
- hexadecimal :: RegExp
- floating :: RegExp
- identifier :: RegExp
Regular expressions
A note on Equality Regular expressions can be symbolically compared for equality. Note that the regular Haskell Eq
instance and the symbolic version differ in semantics: Eq
instance checks for "structural" equality, i.e., that the two regular expressions
are constructed in precisely the same way. The symbolic equality, however, checks for language equality, i.e., that
the regular expressions correspond to the same set of strings. This is a bit unfortunate, but hopefully should not
cause much trouble in practice. Note that the only reason we support symbolic equality is to take advantage of
the internal decision procedures z3 provides for this case: A similar goal can be achieved by showing there is
no string accepted by one but not the other. However, this encoding doesn't perform well in z3.
>>>
prove $ ("a" * KStar ("b" * "a")) .== (KStar ("a" * "b") * "a")
Q.E.D.>>>
prove $ ("a" * KStar ("b" * "a")) .== (KStar ("a" * "b") * "c")
Falsifiable>>>
prove $ ("a" * KStar ("b" * "a")) ./= (KStar ("a" * "b") * "c")
Q.E.D.
Regular expressions. Note that regular expressions themselves are
concrete, but the match
function from the RegExpMatchable
class
can check membership against a symbolic string/character. Also, we
are preferring a datatype approach here, as opposed to coming up with
some string-representation; there are way too many alternatives
already so inventing one isn't a priority. Please get in touch if you
would like a parser for this type as it might be easier to use.
Literal String | Precisely match the given string |
All | Accept every string |
AllChar | Accept every single character |
None | Accept no strings |
Range Char Char | Accept range of characters |
Conc [RegExp] | Concatenation |
KStar RegExp | Kleene Star: Zero or more |
KPlus RegExp | Kleene Plus: One or more |
Opt RegExp | Zero or one |
Comp RegExp | Complement of regular expression |
Diff RegExp RegExp | Difference of regular expressions |
Loop Int Int RegExp | From |
Power Int RegExp | Exactly |
Union [RegExp] | Union of regular expressions |
Inter RegExp RegExp | Intersection of regular expressions |
Instances
Data RegExp Source # | |
Defined in Data.SBV.Core.Symbolic gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> RegExp -> c RegExp # gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c RegExp # toConstr :: RegExp -> Constr # dataTypeOf :: RegExp -> DataType # dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c RegExp) # dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c RegExp) # gmapT :: (forall b. Data b => b -> b) -> RegExp -> RegExp # gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> RegExp -> r # gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> RegExp -> r # gmapQ :: (forall d. Data d => d -> u) -> RegExp -> [u] # gmapQi :: Int -> (forall d. Data d => d -> u) -> RegExp -> u # gmapM :: Monad m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp # gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp # gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> RegExp -> m RegExp # | |
IsString RegExp Source # | With overloaded strings, we can have direct literal regular expressions. |
Defined in Data.SBV.Core.Symbolic fromString :: String -> RegExp # | |
Num RegExp Source # | Regular expressions as a |
Show RegExp Source # | Convert a reg-exp to a Haskell-like string |
Eq RegExp Source # | |
Ord RegExp Source # | |
EqSymbolic RegExp Source # | Regular expressions can be compared for equality. Note that we diverge here from the equality in the concrete sense; i.e., the Eq instance does not match the symbolic case. This is a bit unfortunate, but unavoidable with the current design of how we "distinguish" operators. Hopefully shouldn't be a big deal, though one should be careful. |
Defined in Data.SBV.Core.Model (.==) :: RegExp -> RegExp -> SBool Source # (./=) :: RegExp -> RegExp -> SBool Source # (.===) :: RegExp -> RegExp -> SBool Source # (./==) :: RegExp -> RegExp -> SBool Source # distinct :: [RegExp] -> SBool Source # distinctExcept :: [RegExp] -> [RegExp] -> SBool Source # allEqual :: [RegExp] -> SBool Source # |
Matching
A symbolic string or a character (SString
or SChar
) can be matched against a regular-expression. Note
that the regular-expression itself is not a symbolic object: It's a fully concrete representation, as
captured by the RegExp
class. The RegExp
class is an instance of the IsString
class, which makes writing
literal matches easier. The RegExp
type also has a (somewhat degenerate) Num
instance: Concatenation
corresponds to multiplication, union corresponds to addition, and 0
corresponds to the empty language.
Note that since match
is a method of RegExpMatchable
class, both SChar
and SString
can be used as
an argument for matching. In practice, this means you might have to disambiguate with a type-ascription
if it is not deducible from context.
>>>
prove $ \s -> (s :: SString) `match` "hello" .<=> s .== "hello"
Q.E.D.>>>
prove $ \s -> s `match` Loop 2 5 "xyz" .=> length s .>= 6
Q.E.D.>>>
prove $ \s -> s `match` Loop 2 5 "xyz" .=> length s .<= 15
Q.E.D.>>>
prove $ \s -> s `match` Power 3 "xyz" .=> length s .== 9
Q.E.D.>>>
prove $ \s -> s `match` (exactly "xyz" ^ 3) .=> length s .== 9
Q.E.D.>>>
prove $ \s -> match s (Loop 2 5 "xyz") .=> length s .>= 7
Falsifiable. Counter-example: s0 = "xyzxyz" :: String>>>
prove $ \s -> (s :: SString) `match` "hello" .=> s `match` ("hello" + "world")
Q.E.D.>>>
prove $ \s -> sNot $ (s::SString) `match` ("so close" * 0)
Q.E.D.>>>
prove $ \c -> (c :: SChar) `match` oneOf "abcd" .=> ord c .>= ord (literal 'a') .&& ord c .<= ord (literal 'd')
Q.E.D.
class RegExpMatchable a where Source #
Matchable class. Things we can match against a RegExp
.
For instance, you can generate valid-looking phone numbers like this:
>>>
:set -XOverloadedStrings
>>>
let dig09 = Range '0' '9'
>>>
let dig19 = Range '1' '9'
>>>
let pre = dig19 * Loop 2 2 dig09
>>>
let post = dig19 * Loop 3 3 dig09
>>>
let phone = pre * "-" * post
>>>
sat $ \s -> (s :: SString) `match` phone
Satisfiable. Model: s0 = "200-8000" :: String
Instances
RegExpMatchable SChar Source # | Matching a character simply means the singleton string matches the regex. |
RegExpMatchable SString Source # | Matching symbolic strings. |
Constructing regular expressions
Basics
everything :: RegExp Source #
Match everything, universal acceptor.
>>>
prove $ \(s :: SString) -> s `match` everything
Q.E.D.
Match nothing, universal rejector.
>>>
prove $ \(s :: SString) -> sNot (s `match` nothing)
Q.E.D.
Match any character, i.e., strings of length 1
>>>
prove $ \(s :: SString) -> s `match` anyChar .<=> length s .== 1
Q.E.D.
Literals
exactly :: String -> RegExp Source #
A literal regular-expression, matching the given string exactly. Note that
with OverloadedStrings
extension, you can simply use a Haskell
string to mean the same thing, so this function is rarely needed.
>>>
prove $ \(s :: SString) -> s `match` exactly "LITERAL" .<=> s .== "LITERAL"
Q.E.D.
A class of characters
oneOf :: String -> RegExp Source #
Helper to define a character class.
>>>
prove $ \(c :: SChar) -> c `match` oneOf "ABCD" .<=> sAny (c .==) (map literal "ABCD")
Q.E.D.
Spaces
Recognize a newline. Also includes carriage-return and form-feed.
>>>
newline
(re.union (str.to.re "\n") (str.to.re "\r") (str.to.re "\f"))>>>
prove $ \c -> c `match` newline .=> isSpaceL1 c
Q.E.D.
whiteSpaceNoNewLine :: RegExp Source #
Recognize white-space, but without a new line.
>>>
prove $ \c -> c `match` whiteSpaceNoNewLine .=> c `match` whiteSpace .&& c ./= literal '\n'
Q.E.D.
whiteSpace :: RegExp Source #
Recognize white space.
>>>
prove $ \c -> c `match` whiteSpace .=> isSpaceL1 c
Q.E.D.
Separators
Recognize a tab.
>>>
tab
(str.to.re "\t")>>>
prove $ \c -> c `match` tab .=> c .== literal '\t'
Q.E.D.
punctuation :: RegExp Source #
Recognize a punctuation character.
>>>
prove $ \c -> c `match` punctuation .=> isPunctuationL1 c
Q.E.D.
Letters
asciiLetter :: RegExp Source #
Recognize an alphabet letter, i.e., A
..Z
, a
..z
.
asciiLower :: RegExp Source #
Recognize an ASCII lower case letter
>>>
asciiLower
(re.range "a" "z")>>>
prove $ \c -> (c :: SChar) `match` asciiLower .=> c `match` asciiLetter
Q.E.D.>>>
prove $ \c -> c `match` asciiLower .=> toUpperL1 c `match` asciiUpper
Q.E.D.>>>
prove $ \c -> c `match` asciiLetter .=> toLowerL1 c `match` asciiLower
Q.E.D.
asciiUpper :: RegExp Source #
Recognize an upper case letter
>>>
asciiUpper
(re.range "A" "Z")>>>
prove $ \c -> (c :: SChar) `match` asciiUpper .=> c `match` asciiLetter
Q.E.D.>>>
prove $ \c -> c `match` asciiUpper .=> toLowerL1 c `match` asciiLower
Q.E.D.>>>
prove $ \c -> c `match` asciiLetter .=> toUpperL1 c `match` asciiUpper
Q.E.D.
Digits
Recognize a digit. One of 0
..9
.
>>>
digit
(re.range "0" "9")>>>
prove $ \c -> c `match` digit .<=> let v = digitToInt c in 0 .<= v .&& v .< 10
Q.E.D.>>>
prove $ \c -> sNot ((c::SChar) `match` (digit - digit))
Q.E.D.
Recognize an octal digit. One of 0
..7
.
>>>
octDigit
(re.range "0" "7")>>>
prove $ \c -> c `match` octDigit .<=> let v = digitToInt c in 0 .<= v .&& v .< 8
Q.E.D.>>>
prove $ \(c :: SChar) -> c `match` octDigit .=> c `match` digit
Q.E.D.
Recognize a hexadecimal digit. One of 0
..9
, a
..f
, A
..F
.
>>>
hexDigit
(re.union (re.range "0" "9") (re.range "a" "f") (re.range "A" "F"))>>>
prove $ \c -> c `match` hexDigit .<=> let v = digitToInt c in 0 .<= v .&& v .< 16
Q.E.D.>>>
prove $ \(c :: SChar) -> c `match` digit .=> c `match` hexDigit
Q.E.D.
Numbers
Recognize a decimal number.
>>>
decimal
(re.+ (re.range "0" "9"))>>>
prove $ \s -> (s::SString) `match` decimal .=> sNot (s `match` KStar asciiLetter)
Q.E.D.
Recognize an octal number. Must have a prefix of the form 0o
/0O
.
>>>
octal
(re.++ (re.union (str.to.re "0o") (str.to.re "0O")) (re.+ (re.range "0" "7")))>>>
prove $ \s -> s `match` octal .=> sAny (.== take 2 s) ["0o", "0O"]
Q.E.D.
hexadecimal :: RegExp Source #
Recognize a hexadecimal number. Must have a prefix of the form 0x
/0X
.
>>>
hexadecimal
(re.++ (re.union (str.to.re "0x") (str.to.re "0X")) (re.+ (re.union (re.range "0" "9") (re.range "a" "f") (re.range "A" "F"))))>>>
prove $ \s -> s `match` hexadecimal .=> sAny (.== take 2 s) ["0x", "0X"]
Q.E.D.
Recognize a floating point number. The exponent part is optional if a fraction is present. The exponent may or may not have a sign.
>>>
prove $ \s -> s `match` floating .=> length s .>= 3
Q.E.D.
Identifiers
identifier :: RegExp Source #
For the purposes of this regular expression, an identifier consists of a letter followed by zero or more letters, digits, underscores, and single quotes. The first letter must be lowercase.
>>>
prove $ \s -> s `match` identifier .=> isAsciiLower (head s)
Q.E.D.>>>
prove $ \s -> s `match` identifier .=> length s .>= 1
Q.E.D.