License | BSD-style |
---|---|
Maintainer | Vincent Hanquez <vincent@snarc.org> |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
Opaque packed String encoded in UTF8.
The type is an instance of IsString and IsList, which allows using OverloadedStrings for string literals, and fromList to convert a [Char] (Prelude String) to a packed representation.
{-# LANGUAGE OverloadedStrings #-}
s = "Hello World" :: String
s = fromList ("Hello World" :: Prelude.String) :: String
Each Unicode code point is represented by a variable-length encoding of 1 to 4 bytes.
For more information about UTF8: https://en.wikipedia.org/wiki/UTF-8
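As a rough illustration of the 1 to 4 byte encoding, the following sketch computes how many bytes UTF8 needs for a given code point. It uses only base and the standard Prelude; `utf8Length` is a hypothetical helper for this example, not part of this module.

```haskell
import Data.Char (ord)

-- Number of bytes UTF8 uses to encode a code point.
-- Hypothetical helper for illustration; surrogate code points
-- (U+D800 to U+DFFF) are not valid in UTF8 and are not handled here.
utf8Length :: Char -> Int
utf8Length c
    | n < 0x80    = 1  -- ASCII range
    | n < 0x800   = 2
    | n < 0x10000 = 3
    | otherwise   = 4  -- up to U+10FFFF
  where
    n = ord c
```

For instance, `utf8Length 'A'` is 1, while `utf8Length '\x20AC'` (the euro sign) is 3.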
- data String
- data Encoding
- = ASCII7
- | UTF8
- | UTF16
- | UTF32
- | ISO_8859_1
- fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8)
- fromBytesLenient :: UArray Word8 -> (String, UArray Word8)
- fromBytesUnsafe :: UArray Word8 -> String
- toBytes :: Encoding -> String -> UArray Word8
- data ValidationFailure
- lines :: String -> [String]
- words :: String -> [String]
Documentation
Opaque packed array of characters in the UTF8 encoding
Various String encodings that can be used to convert to and from bytes
fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8)
Convert a ByteArray to a string assuming a specific encoding.
It returns a 3-tuple of:
- The string that has been successfully converted without any error
- An optional validation error
- The remaining buffer that hasn't been processed (either as a result of an error, or because the encoded sequence is not fully available)
Consider a stream of data that is fetched chunk by chunk: a sequence may straddle a chunk boundary. When converting chunks, if the error is Nothing and the remaining buffer is not empty, then this buffer needs to be prepended to the next chunk.
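The chunk handling described above could be sketched as follows. This assumes the `foundation` package with `NoImplicitPrelude`, and that `null`, `mempty` and `(<>)` from the `Foundation` module apply to `UArray Word8` and `String`. The names `DecodeError`, `InvalidInput`, `TruncatedInput` and `decodeChunks` are hypothetical, introduced only for this example.

```haskell
{-# LANGUAGE NoImplicitPrelude #-}

import Foundation
import Foundation.String (Encoding (UTF8), ValidationFailure, fromBytes)

-- Hypothetical error type for this sketch (not part of the library).
data DecodeError
    = InvalidInput ValidationFailure  -- fromBytes reported a validation error
    | TruncatedInput (UArray Word8)   -- input ended in the middle of a sequence

-- Decode a list of UTF8 chunks, carrying unprocessed bytes over
-- to the next chunk as the documentation of fromBytes suggests.
decodeChunks :: [UArray Word8] -> Either DecodeError String
decodeChunks = go mempty mempty
  where
    -- No more chunks: any leftover bytes form an incomplete sequence.
    go acc leftover []
        | null leftover = Right acc
        | otherwise     = Left (TruncatedInput leftover)
    -- Prepend the leftover bytes to the next chunk before converting.
    go acc leftover (chunk : rest) =
        case fromBytes UTF8 (leftover <> chunk) of
            (_, Just err, _)        -> Left (InvalidInput err)
            (s, Nothing, remaining) -> go (acc <> s) remaining rest
```

Appending to the accumulator on every chunk keeps the sketch short; a real streaming consumer would more likely hand each decoded piece on as it arrives.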
fromBytesLenient :: UArray Word8 -> (String, UArray Word8)
Convert a UTF8 array of bytes to a String.
If there's any error in the stream, it will automatically insert replacement bytes to replace invalid sequences.
If a sequence falls across two chunks, the remaining buffer is meant to be prepended to the next chunk before parsing resumes.
fromBytesUnsafe :: UArray Word8 -> String
Convert a byte array representing UTF8 data directly to a string without checking for UTF8 validity.
If the input contains invalid sequences, it will trigger runtime async errors when processing data.
When in doubt, use fromBytes.
toBytes :: Encoding -> String -> UArray Word8
Convert a String to a byte array in a specific encoding.
If the encoding is UTF8, the underlying buffer is returned without extra allocation or any processing.
For any other encoding, some allocation and processing are needed to convert.
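A minimal round-trip sketch combining toBytes and fromBytes, assuming the `foundation` package with `NoImplicitPrelude` and `OverloadedStrings`. `roundTripUtf8` is a hypothetical helper name used only here.

```haskell
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}

import Foundation
import Foundation.String (Encoding (UTF8), fromBytes, toBytes)

-- Encode a String to UTF8 bytes and decode it again.
-- Returns Nothing if decoding reports an error or leaves bytes unprocessed.
roundTripUtf8 :: String -> Maybe String
roundTripUtf8 s =
    case fromBytes UTF8 (toBytes UTF8 s) of
        (decoded, Nothing, remaining)
            | null remaining -> Just decoded
        _ -> Nothing
```

Since a String is already stored as UTF8, `roundTripUtf8 "Hello World"` is expected to give back `Just "Hello World"`.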
data ValidationFailure
Possible failure related to validating bytes of UTF8 sequences.