Copyright | (c) Chris Kuklewicz 2007-2009 |
---|---|
License | BSD-3-Clause |
Maintainer | Andreas Abel |
Stability | stable |
Safe Haskell | None |
Language | Haskell2010 |
Text.Regex.TDFA
Description
The Text.Regex.TDFA module provides a backend for regular expressions. It provides instances for the classes defined and documented in Text.Regex.Base and re-exported by this module. If you import this along with other backends then you should do so with qualified imports (with renaming for convenience).
This regex-tdfa package implements, correctly, POSIX extended regular
expressions. It is highly unlikely that the regex-posix
package on
your operating system is correct, see
http://www.haskell.org/haskellwiki/Regex_Posix for examples of your
OS's bugs.
Importing and using
Declare a dependency on the regex-tdfa
library in your .cabal
file:
build-depends: regex-tdfa ^>= 1.3.2
In Haskell modules where you want to use regexes simply import
this module:
import Text.Regex.TDFA
Basics
>>>
let emailRegex = "[a-zA-Z0-9+._-]+\\@[-a-zA-Z]+\\.[a-z]+"
>>>
"my email is first-name.lastname_1974@e-mail.com" =~ emailRegex :: Bool
True
>>>
"invalid@mail@com" =~ emailRegex :: Bool
False
>>>
"invalid@mail.COM" =~ emailRegex :: Bool
False
>>>
"#@invalid.com" =~ emailRegex :: Bool
False
-- non-monadic λ> <to-match-against>=~
<regex> -- monadic, usesfail
on lack of match λ> <to-match-against>=~~
<regex>
(=~
) and (=~~
) are polymorphic in their return type. This is so that
regex-tdfa can pick the most efficient way to give you your result based on
what you need. For instance, if all you want is to check whether the regex
matched or not, there's no need to allocate a result string. If you only want
the first match, rather than all the matches, then the matching engine can stop
after finding a single hit.
This does mean, though, that you may sometimes have to explicitly specify the type you want, especially if you're trying things out at the REPL.
Common use cases
Get the first match
-- returns empty string if no match
a =~
b :: String -- or ByteString, or Text...
>>>
"alexis-de-tocqueville" =~ "[a-z]+" :: String
"alexis"
>>>
"alexis-de-tocqueville" =~ "[0-9]+" :: String
""
Check if it matched at all
a =~
b :: Bool
>>>
"alexis-de-tocqueville" =~ "[a-z]+" :: Bool
True
Get first match + text before/after
-- if no match, will just return whole -- string in the first element of the tuple a =~ b :: (String, String, String)
>>>
"alexis-de-tocqueville" =~ "de" :: (String, String, String)
("alexis-","de","-tocqueville")
>>>
"alexis-de-tocqueville" =~ "kant" :: (String, String, String)
("alexis-de-tocqueville","","")
Get first match + submatches
-- same as above, but also returns a list of just submatches.
-- submatch list is empty if regex doesn't match at all
a =~
b :: (String, String, String, [String])
>>>
"div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]" :: (String, String, String, [String])
("","div[attr=1234]","",["attr","1234"])
Get all matches
-- can also return Data.Array instead of ListgetAllTextMatches
(a=~
b) :: [String]
>>>
getAllTextMatches ("john anne yifan" =~ "[a-z]+") :: [String]
["john","anne","yifan"]
>>>
getAllTextMatches ("* - . a + z" =~ "[--z]+") :: [String]
["-",".","a","z"]
Feature support
This package does provide captured parenthesized subexpressions.
Depending on the text being searched this package supports Unicode.
The [Char]
, Text
, Text.Lazy
, and (Seq Char)
text types support Unicode. The ByteString
and ByteString.Lazy
text types only support ASCII.
As of version 1.1.1 the following GNU extensions are recognized, all anchors:
- \` at beginning of entire text
- \' at end of entire text
- \< at beginning of word
- \> at end of word
- \b at either beginning or end of word
- \B at neither beginning nor end of word
The above are controlled by the newSyntax
Bool in CompOption
.
Where the "word" boundaries means between characters that are and are not in the [:word:] character class which contains [a-zA-Z0-9_]. Note that \< and \b may match before the entire text and \> and \b may match at the end of the entire text.
There is no locale support, so collating elements like [.ch.] are simply ignored and equivalence classes like [=a=] are converted to just [a]. The character classes like [:alnum:] are supported over ASCII only, valid classes are alnum, digit, punct, alpha, graph, space, blank, lower, upper, cntrl, print, xdigit, word.
>>>
getAllTextMatches ("john anne yifan" =~ "[[:lower:]]+") :: [String]
["john","anne","yifan"]
This package does not provide "basic" regular expressions. This package does not provide back references inside regular expressions.
The package does not provide Perl style regular expressions. Please look at the regex-pcre and pcre-light packages instead.
This package does not provide find-and-replace.
Avoiding backslashes
If you find yourself writing a lot of regexes, take a look at raw-strings-qq. It'll let you write regexes without needing to escape all your backslashes.
{-# LANGUAGE QuasiQuotes #-}
import Text.RawString.QQ
import Text.Regex.TDFA
λ> "2 * (3 + 1) / 4" =~
[r|\([^)]+\)|] :: String
"(3 + 1)"
Synopsis
- getVersion_Text_Regex_TDFA :: Version
- (=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target
- (=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, MonadFail m) => source1 -> source -> m target
- data Regex
- data CompOption = CompOption {
- caseSensitive :: Bool
- multiline :: Bool
- rightAssoc :: Bool
- newSyntax :: Bool
- lastStarGreedy :: Bool
- data ExecOption = ExecOption {
- captureGroups :: Bool
- class Extract source where
- type MatchArray = Array Int (MatchOffset, MatchLength)
- class RegexLike regex source => RegexContext regex source target where
- class RegexOptions regex compOpt execOpt => RegexMaker regex compOpt execOpt source | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where
- makeRegex :: source -> regex
- makeRegexOpts :: compOpt -> execOpt -> source -> regex
- makeRegexM :: MonadFail m => source -> m regex
- makeRegexOptsM :: MonadFail m => compOpt -> execOpt -> source -> m regex
- class Extract source => RegexLike regex source where
- matchOnce :: regex -> source -> Maybe MatchArray
- matchAll :: regex -> source -> [MatchArray]
- matchCount :: regex -> source -> Int
- matchTest :: regex -> source -> Bool
- matchAllText :: regex -> source -> [MatchText source]
- matchOnceText :: regex -> source -> Maybe (source, MatchText source, source)
- class RegexOptions regex compOpt execOpt | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where
- blankCompOpt :: compOpt
- blankExecOpt :: execOpt
- defaultCompOpt :: compOpt
- defaultExecOpt :: execOpt
- setExecOpts :: execOpt -> regex -> regex
- getExecOpts :: regex -> execOpt
- type MatchOffset = Int
- type MatchLength = Int
- newtype AllMatches (f :: Type -> Type) b = AllMatches {
- getAllMatches :: f b
- newtype AllSubmatches (f :: Type -> Type) b = AllSubmatches {
- getAllSubmatches :: f b
- newtype AllTextMatches (f :: Type -> Type) b = AllTextMatches {
- getAllTextMatches :: f b
- newtype AllTextSubmatches (f :: Type -> Type) b = AllTextSubmatches {
- getAllTextSubmatches :: f b
- data MatchResult a = MR {}
- type MatchText source = Array Int (source, (MatchOffset, MatchLength))
- getVersion_Text_Regex_Base :: Version
Documentation
getVersion_Text_Regex_TDFA :: Version Source #
(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target Source #
This is the pure functional matching operator. If the target
cannot be produced then some empty result will be returned. If
there is an error in processing, then error
will be called.
(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, MonadFail m) => source1 -> source -> m target Source #
This is the monadic matching operator. If a single match fails,
then fail
will be called.
The TDFA backend specific Regex
type, used by this module's RegexOptions
and RegexMaker
.
Instances
data CompOption Source #
Control whether the pattern is multiline or case-sensitive like Text.Regex and whether to capture the subgroups (\1, \2, etc). Controls enabling extra anchor syntax.
Constructors
CompOption | |
Fields
|
Instances
data ExecOption Source #
Constructors
ExecOption | |
Fields
|
Instances
type MatchArray = Array Int (MatchOffset, MatchLength) #
class RegexLike regex source => RegexContext regex source target where #
Instances
RegexContext Regex ByteString ByteString Source # | |
RegexContext Regex ByteString ByteString Source # | |
RegexContext Regex Text Text Source # | Since: 1.3.1 |
RegexContext Regex Text Text Source # | Since: 1.3.1 |
RegexContext Regex String String Source # | |
RegexContext Regex (Seq Char) (Seq Char) Source # | |
class RegexOptions regex compOpt execOpt => RegexMaker regex compOpt execOpt source | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where #
Minimal complete definition
Nothing
Methods
makeRegex :: source -> regex #
makeRegexOpts :: compOpt -> execOpt -> source -> regex #
makeRegexM :: MonadFail m => source -> m regex #
makeRegexOptsM :: MonadFail m => compOpt -> execOpt -> source -> m regex #
Instances
class Extract source => RegexLike regex source where #
Minimal complete definition
Nothing
Methods
matchOnce :: regex -> source -> Maybe MatchArray #
matchAll :: regex -> source -> [MatchArray] #
matchCount :: regex -> source -> Int #
matchTest :: regex -> source -> Bool #
matchAllText :: regex -> source -> [MatchText source] #
matchOnceText :: regex -> source -> Maybe (source, MatchText source, source) #
Instances
RegexLike Regex ByteString Source # | |
Defined in Text.Regex.TDFA.ByteString Methods matchOnce :: Regex -> ByteString -> Maybe MatchArray # matchAll :: Regex -> ByteString -> [MatchArray] # matchCount :: Regex -> ByteString -> Int # matchTest :: Regex -> ByteString -> Bool # matchAllText :: Regex -> ByteString -> [MatchText ByteString] # matchOnceText :: Regex -> ByteString -> Maybe (ByteString, MatchText ByteString, ByteString) # | |
RegexLike Regex ByteString Source # | |
Defined in Text.Regex.TDFA.ByteString.Lazy Methods matchOnce :: Regex -> ByteString -> Maybe MatchArray # matchAll :: Regex -> ByteString -> [MatchArray] # matchCount :: Regex -> ByteString -> Int # matchTest :: Regex -> ByteString -> Bool # matchAllText :: Regex -> ByteString -> [MatchText ByteString] # matchOnceText :: Regex -> ByteString -> Maybe (ByteString, MatchText ByteString, ByteString) # | |
RegexLike Regex Text Source # | Since: 1.3.1 |
Defined in Text.Regex.TDFA.Text Methods matchOnce :: Regex -> Text -> Maybe MatchArray # matchAll :: Regex -> Text -> [MatchArray] # matchCount :: Regex -> Text -> Int # matchTest :: Regex -> Text -> Bool # matchAllText :: Regex -> Text -> [MatchText Text] # matchOnceText :: Regex -> Text -> Maybe (Text, MatchText Text, Text) # | |
RegexLike Regex Text Source # | Since: 1.3.1 |
Defined in Text.Regex.TDFA.Text.Lazy Methods matchOnce :: Regex -> Text -> Maybe MatchArray # matchAll :: Regex -> Text -> [MatchArray] # matchCount :: Regex -> Text -> Int # matchTest :: Regex -> Text -> Bool # matchAllText :: Regex -> Text -> [MatchText Text] # matchOnceText :: Regex -> Text -> Maybe (Text, MatchText Text, Text) # | |
RegexLike Regex String Source # | |
Defined in Text.Regex.TDFA.String Methods matchOnce :: Regex -> String -> Maybe MatchArray # matchAll :: Regex -> String -> [MatchArray] # matchCount :: Regex -> String -> Int # matchTest :: Regex -> String -> Bool # matchAllText :: Regex -> String -> [MatchText String] # matchOnceText :: Regex -> String -> Maybe (String, MatchText String, String) # | |
RegexLike Regex (Seq Char) Source # | |
Defined in Text.Regex.TDFA.Sequence Methods matchOnce :: Regex -> Seq Char -> Maybe MatchArray # matchAll :: Regex -> Seq Char -> [MatchArray] # matchCount :: Regex -> Seq Char -> Int # matchTest :: Regex -> Seq Char -> Bool # matchAllText :: Regex -> Seq Char -> [MatchText (Seq Char)] # matchOnceText :: Regex -> Seq Char -> Maybe (Seq Char, MatchText (Seq Char), Seq Char) # |
class RegexOptions regex compOpt execOpt | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where #
Methods
blankCompOpt :: compOpt #
blankExecOpt :: execOpt #
defaultCompOpt :: compOpt #
defaultExecOpt :: execOpt #
setExecOpts :: execOpt -> regex -> regex #
getExecOpts :: regex -> execOpt #
Instances
RegexOptions Regex CompOption ExecOption Source # | |
Defined in Text.Regex.TDFA.Common Methods defaultCompOpt :: CompOption # defaultExecOpt :: ExecOption # setExecOpts :: ExecOption -> Regex -> Regex # getExecOpts :: Regex -> ExecOption # |
type MatchOffset = Int #
type MatchLength = Int #
newtype AllMatches (f :: Type -> Type) b #
Constructors
AllMatches | |
Fields
|
newtype AllSubmatches (f :: Type -> Type) b #
Constructors
AllSubmatches | |
Fields
|
newtype AllTextMatches (f :: Type -> Type) b #
Constructors
AllTextMatches | |
Fields
|
newtype AllTextSubmatches (f :: Type -> Type) b #
Constructors
AllTextSubmatches | |
Fields
|
data MatchResult a #
type MatchText source = Array Int (source, (MatchOffset, MatchLength)) #
getVersion_Text_Regex_Base :: Version #