Safe Haskell | None |
---|---|
Language | Haskell98 |
This module uses HXT to traverse an HTML document using CSS selectors.
The most important function here is findBySelector. It takes a string
containing the HTML to search and a CSS query, and it returns a list of
the HTML fragments that matched the query.
Only a subset of the CSS spec is currently supported:
- By tag name: table td a
- By class names: .container .content
- By Id: #oneId
- By attribute: [hasIt], [exact=match], [contains*=text], [starts^=with], [ends$=with]
- Union: a, span, p
- Immediate children: div > p
- Get jiggy with it: div[data-attr=yeah] > .mon, .foo.bar div, #oneThing
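A minimal sketch of how findBySelector might be called with a few of the selector forms listed above. It assumes the module is imported as Yesod.Test.TransversingCSS and that HtmlLBS is a lazy ByteString; sampleHtml and report are hypothetical names made up for this example, not part of the module.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Text as T
-- Assumption: this is the module being documented here.
import           Yesod.Test.TransversingCSS (findBySelector)

-- Hypothetical sample document, made up for this sketch.
sampleHtml :: BL.ByteString
sampleHtml = BL.pack $ concat
  [ "<html><body>"
  , "<div id=\"main\" class=\"container\">"
  , "<p class=\"intro\">Hello</p>"
  , "<a data-kind=\"guide\">Docs</a>"
  , "</div>"
  , "</body></html>"
  ]

-- Run one query and print either the parse error or the matched fragments.
report :: String -> IO ()
report query =
  case findBySelector sampleHtml (T.pack query) of
    Left err        -> putStrLn (query ++ " => parse error: " ++ err)
    Right fragments -> do
      putStrLn (query ++ " => " ++ show (length fragments) ++ " match(es)")
      mapM_ (putStrLn . ("  " ++)) fragments

main :: IO ()
main = mapM_ report
  [ "div > p"             -- immediate children
  , ".container .intro"   -- class names
  , "#main"               -- id
  , "a[data-kind=guide]"  -- exact attribute match
  , "p, a"                -- union
  ]
```

Each query is checked independently, so a query that fails to parse only affects its own line of output.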
- findBySelector :: HtmlLBS -> Query -> Either String [String]
- type HtmlLBS = ByteString
- type Query = Text
- parseQuery :: Text -> Either String [[SelectorGroup]]
- runQuery :: Cursor -> [[SelectorGroup]] -> [Cursor]
- data Selector
- data SelectorGroup
Documentation
findBySelector :: HtmlLBS -> Query -> Either String [String]
Performs a CSS Query on Html. Returns an Either:
- Left: Query parse error.
- Right: List of matching Html fragments.
type HtmlLBS = ByteString
For HXT hackers
These functions expose some low level details that you can blissfully ignore.
parseQuery :: Text -> Either String [[SelectorGroup]]
Parses a query into an intermediate format which is easy to feed to HXT.
- The outer list has one entry for each top-level, comma-separated query.
- A SelectorGroup is a group of qualifiers. Groups are separated by spaces or >, as in these three: table.main.odd tr.even > td.big
- A SelectorGroup is a list of Selector items. In the example above, the selectors in the first group are: table, .main and .odd
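The nesting described above can be checked with a short sketch. It assumes the module is imported as Yesod.Test.TransversingCSS; groupCounts is a hypothetical helper, and it only counts list lengths so that it does not depend on the constructors of SelectorGroup or Selector, which this page does not list.

```haskell
module Main (main) where

import qualified Data.Text as T
-- Assumption: this is the module being documented here.
import           Yesod.Test.TransversingCSS (parseQuery)

-- Hypothetical helper: for each top-level (comma-separated) query,
-- count how many SelectorGroups it was split into.
groupCounts :: String -> Either String [Int]
groupCounts query = map length <$> parseQuery (T.pack query)

main :: IO ()
main =
  case groupCounts "table.main.odd tr.even > td.big, a" of
    Left err     -> putStrLn ("Query parse error: " ++ err)
    Right counts -> do
      putStrLn ("top-level queries: " ++ show (length counts))
      putStrLn ("groups per query:  " ++ show counts)
```

The first query in the sample string is the three-group example from the list above, and the second is a single tag name.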
runQuery :: Cursor -> [[SelectorGroup]] -> [Cursor]
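A hedged sketch of driving runQuery directly. It assumes that Cursor is the xml-conduit type from Text.XML.Cursor and that the document is parsed with html-conduit's Text.HTML.DOM.parseLBS; neither is stated on this page. elementLabel and matchedTags are hypothetical helpers introduced only for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Text as T
import qualified Text.HTML.DOM as HD
import           Text.XML (Element (..), Name (..), Node (..))
import           Text.XML.Cursor (Cursor, fromDocument, node)
-- Assumption: this is the module being documented here.
import           Yesod.Test.TransversingCSS (parseQuery, runQuery)

-- Hypothetical helper: local tag name of the node a cursor points at.
elementLabel :: Cursor -> String
elementLabel c =
  case node c of
    NodeElement e -> T.unpack (nameLocalName (elementName e))
    _             -> "(non-element node)"

-- Hypothetical helper: parse the query, build a cursor at the document
-- root, run the query, and return the tag names of the matched nodes.
matchedTags :: BL.ByteString -> String -> Either String [String]
matchedTags html query = do
  groups <- parseQuery (T.pack query)
  let root = fromDocument (HD.parseLBS html)
  pure (map elementLabel (runQuery root groups))

main :: IO ()
main =
  print (matchedTags (BL.pack "<div><p>one</p><span>two</span></div>") "p, span")
```

Working with cursors instead of rendered strings keeps the matched nodes navigable, which is useful when a test needs to inspect attributes or children of a match.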