tagsoup-0.13.3: Parsing and extracting information from (possibly malformed) HTML/XML documents

Safe Haskell: Safe-Inferred

Text.HTML.TagSoup.Entity

Description

This module converts between HTML/XML entities (e.g. &amp;) and the characters they represent.

Documentation

lookupEntity :: String -> Maybe String

Look up an entity, using lookupNumericEntity if the name starts with # and lookupNamedEntity otherwise.
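The dispatch rule described above can be sketched as follows. This is a minimal illustration, not tagsoup's actual source: dispatchEntity, toyNumeric, and toyNamed are hypothetical names, and the two lookups are passed in as parameters so the example needs nothing beyond the Prelude. It assumes the leading # is stripped before the numeric lookup is called, as the lookupNumericEntity documentation requires.

```haskell
-- Hypothetical sketch of the dispatch rule; not tagsoup's implementation.
dispatchEntity
    :: (String -> Maybe String)  -- stand-in for lookupNumericEntity
    -> (String -> Maybe String)  -- stand-in for lookupNamedEntity
    -> String                    -- entity text without '&' and ';'
    -> Maybe String
dispatchEntity numeric _     ('#':rest) = numeric rest  -- drop '#', numeric path
dispatchEntity _       named name       = named name    -- everything else is named

-- Toy stand-ins, for demonstration only.
toyNumeric :: String -> Maybe String
toyNumeric "65" = Just "A"
toyNumeric _    = Nothing

toyNamed :: String -> Maybe String
toyNamed name = lookup name [("amp", "&"), ("lt", "<")]
```

With these stand-ins, dispatchEntity toyNumeric toyNamed "#65" takes the numeric path, while "amp" takes the named path.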

lookupNamedEntity :: String -> Maybe String

Look up a named entity in the htmlEntities table.

 lookupNamedEntity "amp" == Just "&"
 lookupNamedEntity "haskell" == Nothing

lookupNumericEntity :: String -> Maybe String

Look up a numeric entity. The leading '#' must already have been removed.

 lookupNumericEntity "65" == Just "A"
 lookupNumericEntity "x41" == Just "A"
 lookupNumericEntity "x4E" == Just "N"
 lookupNumericEntity "x4e" == Just "N"
 lookupNumericEntity "Haskell" == Nothing
 lookupNumericEntity "" == Nothing
 lookupNumericEntity "89439085908539082" == Nothing
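One way to obtain the behaviour shown in these examples is sketched below. It is a hypothetical re-implementation using only base, not the library's code: numericEntitySketch is an invented name, and the upper bound of 0x10FFFF (the largest Unicode code point) is an assumption about why the very large input yields Nothing.

```haskell
import Data.Char (chr, digitToInt, isDigit, isHexDigit)

-- Hypothetical sketch; tagsoup's real lookupNumericEntity may differ.
numericEntitySketch :: String -> Maybe String
numericEntitySketch s = case s of
    'x':hs -> decode isHexDigit 16 hs  -- hexadecimal form, e.g. "x41"
    ds     -> decode isDigit 10 ds     -- decimal form, e.g. "65"
  where
    decode :: (Char -> Bool) -> Integer -> String -> Maybe String
    decode ok base ds
        | null ds || not (all ok ds) = Nothing  -- empty or non-digit input
        | n > 0x10FFFF               = Nothing  -- assumed bound: beyond Unicode
        | otherwise                  = Just [chr (fromInteger n)]
      where
        -- Accumulate in Integer so huge inputs cannot overflow.
        n = foldl (\acc d -> acc * base + toInteger (digitToInt d)) 0 ds
```

Accumulating in Integer rather than Int is a deliberate choice here: an input such as "89439085908539082" is compared against the bound without any risk of wrap-around.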

escapeXML :: String -> String

Escape an XML string.

 escapeXML "hello world" == "hello world"
 escapeXML "hello & world" == "hello &amp; world"
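Table-driven escaping of this kind can be sketched as below. The names xmlTableSketch and escapeSketch are hypothetical, and the four-entry table (quot, amp, lt, gt, with no apos) is an assumption about the contents of xmlEntities rather than something this page lists. The sketch replaces each character that has a single-character table entry with its entity reference and leaves every other character unchanged.

```haskell
-- Assumed table contents; the real xmlEntities may differ.
xmlTableSketch :: [(String, String)]
xmlTableSketch = [("quot", "\""), ("amp", "&"), ("lt", "<"), ("gt", ">")]

-- Hypothetical sketch of table-driven escaping; not tagsoup's implementation.
escapeSketch :: String -> String
escapeSketch = concatMap escapeChar
  where
    escapeChar c =
        -- Collect the names whose resolved string is exactly this character.
        case [name | (name, [c']) <- xmlTableSketch, c' == c] of
            (name:_) -> "&" ++ name ++ ";"
            []       -> [c]
```

A linear scan of the table per character is fine for four entries; a larger table would be better served by a map keyed on the character.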

xmlEntities :: [(String, String)]

A table mapping XML entity names to resolved strings. All strings are a single character long. The table omits apos because Internet Explorer does not recognize it.

htmlEntities :: [(String, String)]

A table mapping HTML entity names to resolved strings. Most resolved strings are a single character long, but some (e.g. ngeqq) are two characters long. The list is taken from http://www.w3.org/TR/html5/syntax.html#named-character-references.