Safe Haskell | None |
---|---|
Language | Haskell2010 |
This module uses the stream decoding functions from Michael Snoyman's new text-stream-decode package to define decoding functions and lenses.
- type Codec = forall m r. Monad m => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
- decode :: ((b -> Constant b b) -> a -> Constant b a) -> a -> b
- utf8 :: Codec
- utf8Pure :: Codec
- utf16LE :: Codec
- utf16BE :: Codec
- utf32LE :: Codec
- utf32BE :: Codec
- decodeUtf8 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- decodeUtf8Pure :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- decodeUtf16LE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- decodeUtf16BE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- decodeUtf32LE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- decodeUtf32BE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- encodeUtf8 :: Monad m => Text -> Producer ByteString m ()
- encodeUtf16LE :: Monad m => Text -> Producer ByteString m ()
- encodeUtf16BE :: Monad m => Text -> Producer ByteString m ()
- encodeUtf32LE :: Monad m => Text -> Producer ByteString m ()
- encodeUtf32BE :: Monad m => Text -> Producer ByteString m ()
- encodeAscii :: Monad m => Producer Text m r -> Producer ByteString m (Producer Text m r)
- decodeAscii :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
- encodeIso8859_1 :: Monad m => Producer Text m r -> Producer ByteString m (Producer Text m r)
- decodeIso8859_1 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
The Lens or Codec type
The Codec
type is a simple specializion of
the Lens'
type synonymn used by the standard lens libraries,
lens and
lens-family. That type,
type Lens' a b = forall f . Functor f => (b -> f b) -> (a -> f a)
is just an alias for an ordinary Prelude type. Thus you use any codec with
the view
/ (^.)
and zoom
functions from those libraries.
type Codec = forall m r. Monad m => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r)) Source
Viewing the Text in a ByteString
Each codec/lens looks into a byte stream that is supposed to contain text.
The particular 'Codec' lenses are named in accordance with the expected
encoding, utf8
, utf16LE
etc. view
/ (^.)
-- here also called decode
--
turns a Codec into a function:
view utf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r) decode utf8 Byte.stdin :: Producer Text IO (Producer ByteString IO r) Bytes.stdin ^. utf8 :: Producer Text IO (Producer ByteString IO r)
Uses of a codec with view
(^.)
decode
can always be replaced by the specialized
decoding functions exported here, e.g.
decodeUtf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r) decodeUtf8 Byte.stdin :: Producer Text IO (Producer ByteString IO r)
The stream of text a Codec
'sees' in the stream of bytes begins at its head.
At any point of decoding failure, the stream of text ends and reverts to (returns)
the original byte stream. Thus if the first bytes are already
un-decodable, the whole ByteString producer will be returned, i.e.
view utf8 bytestream
will just come to the same as
return bytestream
Where there is no decoding failure, the return value of the text stream will be
an empty byte stream followed by its own return value. In all cases you must
deal with the fact that it is a ByteString producer that is returned, even if
it can be thrown away with Control.Monad.void
void (Bytes.stdin ^. utf8) :: Producer Text IO ()
zoom
converts a Text parser into a ByteString parser:
zoom utf8 drawChar :: Monad m => StateT (Producer ByteString m r) m (Maybe Char) withNextByte :: Parser ByteString m (Maybe Char, Maybe Word8))) withNextByte = do char_ <- zoom utf8 Text.drawChar byte_ <- Bytes.peekByte return (char_, byte_)
withNextByte
will return the first valid Char in a ByteString,
and the first byte of the next character, if they exists. Because
we 'draw' one and 'peek' at the other, the parser as a whole only
advances one Char's length along the bytestring.
decode :: ((b -> Constant b b) -> a -> Constant b a) -> a -> b Source
decode
is just the ordinary view
or (^.)
of the lens libraries;
exported here for convience
decode utf8 p = decodeUtf8 p = view utf8 p = p ^. utf
Non-lens decoding functions
These are functions with the simple type:
decodeUtf8 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
Thus in general
decodeUtf8 = view utf8 decodeUtf16LE = view utf16LE
and so forth, but these forms may be more convenient (and give better type errors!) where lenses are not desired.
decodeUtf8 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
decodeUtf8Pure :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
decodeUtf16LE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
decodeUtf16BE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
decodeUtf32LE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
decodeUtf32BE :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
Re-encoding functions
These are simply defined
encodeUtf8 = yield . TE.encodeUtf8
They are intended for use with for
for Text.stdin encodeUtf8 :: Producer ByteString IO ()
which would have the effect of
Text.stdin >-> Pipes.Prelude.map (TE.encodeUtf8)
using the encoding functions from Data.Text.Encoding
encodeUtf8 :: Monad m => Text -> Producer ByteString m () Source
encodeUtf16LE :: Monad m => Text -> Producer ByteString m () Source
encodeUtf16BE :: Monad m => Text -> Producer ByteString m () Source
encodeUtf32LE :: Monad m => Text -> Producer ByteString m () Source
encodeUtf32BE :: Monad m => Text -> Producer ByteString m () Source
Functions for latin and ascii text
ascii and latin encodings only use a small number of the characters Text
recognizes; thus we cannot use the pipes Lens
style to work with them.
Rather we simply define functions each way.
encodeAscii :: Monad m => Producer Text m r -> Producer ByteString m (Producer Text m r) Source
encodeAscii
reduces as much of your stream of Text
actually is ascii to a byte stream,
returning the rest of the Text
at the first non-ascii Char
decodeAscii :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
Reduce a byte stream to a corresponding stream of ascii chars, returning the
unused ByteString
upon hitting an un-ascii byte.
encodeIso8859_1 :: Monad m => Producer Text m r -> Producer ByteString m (Producer Text m r) Source
decodeIso8859_1 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r) Source
Reduce a byte stream to a corresponding stream of ascii chars, returning the
unused ByteString
upon hitting the rare un-latinizable byte.