streaming-bytestring-0.1.4.2: effectful byte steams, or: bytestring io done right.

Copyright(c) Don Stewart 2006 (c) Duncan Coutts 2006-2011 (c) Michael Thompson 2015
LicenseBSD-style
Maintainerwhat_is_it_to_do_anything@yahoo.com
Stabilityexperimental
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

Data.ByteString.Streaming

Contents

Description

A time and space-efficient implementation of effectful byte streams using a stream of packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. Streaming ByteStrings are encoded as streams of strict chunks of bytes.

A key feature of streaming ByteStrings is the means to manipulate large or unbounded streams of data without requiring the entire sequence to be resident in memory. To take advantage of this you have to write your functions in a streaming style, e.g. classic pipeline composition. The default I/O chunk size is 32k, which should be good in most circumstances.

Some operations, such as concat, append, reverse and cons, have better complexity than their Data.ByteString equivalents, due to optimisations resulting from the list spine structure. For other operations streaming, like lazy, ByteStrings are usually within a few percent of strict ones.

This module is intended to be imported qualified, to avoid name clashes with Prelude functions. eg.

import qualified Data.ByteString.Streaming as B

Original GHC implementation by Bryan O'Sullivan. Rewritten to use UArray by Simon Marlow. Rewritten to support slices and use ForeignPtr by David Roundy. Rewritten again and extended by Don Stewart and Duncan Coutts. Lazy variant by Duncan Coutts and Don Stewart. Streaming variant by Michael Thompson, following the ideas of Gabriel Gonzales' pipes-bytestring

Synopsis

The ByteString type

data ByteString m r Source

A space-efficient representation of a succession of Word8 vectors, supporting many efficient operations.

An effectful ByteString contains 8-bit bytes, or by using the operations from Data.ByteString.Streaming.Char8 it can be interpreted as containing 8-bit characters.

Introducing and eliminating ByteStrings

empty :: ByteString m () Source

O(1) The empty ByteString -- i.e. return () Note that ByteString m w is generally a monoid for monoidal values of w, like ()

singleton :: Monad m => Word8 -> ByteString m () Source

O(1) Yield a Word8 as a minimal ByteString

pack :: Monad m => Stream (Of Word8) m r -> ByteString m r Source

O(n) Convert a monadic stream of individual Word8s into a packed byte stream.

unpack :: Monad m => ByteString m r -> Stream (Of Word8) m r Source

O(n) Converts a packed byte stream into a stream of individual bytes.

fromLazy :: Monad m => ByteString -> ByteString m () Source

O(c) Transmute a pseudo-pure lazy bytestring to its representation as a monadic stream of chunks.

>>> Q.putStrLn $ Q.fromLazy "hi"
hi
>>> Q.fromLazy "hi"
Chunk "hi" (Empty (()))  -- note: a 'show' instance works in the identity monad
>>> Q.fromLazy $ BL.fromChunks ["here", "are", "some", "chunks"]
Chunk "here" (Chunk "are" (Chunk "some" (Chunk "chunks" (Empty (())))))

toLazy :: Monad m => ByteString m r -> m (Of ByteString r) Source

O(n) Convert an effectful byte stream into a single lazy ByteString with the same internal chunk structure, retaining the original return value.

This is the canonical way of breaking streaming (toStrict and the like are far more demonic). Essentially one is dividing the interleaved layers of effects and bytes into one immense layer of effects, followed by the memory of the succession of bytes.

Because one preserves the return value, toLazy is a suitable argument for mapped

  S.mapped Q.toLazy :: Stream (ByteString m) m r -> Stream (Of L.ByteString) m r
>>> Q.toLazy "hello"
"hello" :> ()
>>> S.toListM $ traverses Q.toLazy $ Q.lines "one\ntwo\nthree\nfour\nfive\n"
["one","two","three","four","five",""]  -- [L.ByteString]

toLazy_ :: Monad m => ByteString m r -> m ByteString Source

O(n) Convert an effectful byte stream into a single lazy ByteString with the same internal chunk structure. See toLazy which preserve connectedness by keeping the return value of the effectful bytestring.

fromChunks :: Monad m => Stream (Of ByteString) m r -> ByteString m r Source

O(c) Convert a monadic stream of individual strict ByteString chunks into a byte stream.

toChunks :: Monad m => ByteString m r -> Stream (Of ByteString) m r Source

O(c) Convert a byte stream into a stream of individual strict bytestrings. This of course exposes the internal chunk structure.

fromStrict :: ByteString -> ByteString m () Source

O(1) yield a strict ByteString chunk.

toStrict :: Monad m => ByteString m r -> m (Of ByteString r) Source

O(n) Convert a monadic byte stream into a single strict ByteString, retaining the return value of the original pair. This operation is for use with mapped.

mapped R.toStrict :: Monad m => Stream (ByteString m) m r -> Stream (Of ByteString) m r 

It is subject to all the objections one makes to Data.ByteString.Lazy toStrict; all of these are devastating.

toStrict_ :: Monad m => ByteString m () -> m ByteString Source

O(n) Convert a byte stream into a single strict ByteString.

Note that this is an expensive operation that forces the whole monadic ByteString into memory and then copies all the data. If possible, try to avoid converting back and forth between streaming and strict bytestrings.

effects :: Monad m => ByteString m r -> m r Source

Perform the effects contained in an effectful bytestring, ignoring the bytes.

copy :: Monad m => ByteString m r -> ByteString (ByteString m) r Source

Make the information in a bytestring available to more than one eliminating fold, e.g.

>>> Q.count 'l' $ Q.count 'o' $ Q.copy $ "hello\nworld"
3 :> (2 :> ())
>>> Q.length $ Q.count 'l' $ Q.count 'o' $ Q.copy $ Q.copy "hello\nworld"
11 :> (3 :> (2 :> ()))
>>> runResourceT $ Q.writeFile "hello2.txt" $ Q.writeFile "hello1.txt" $ Q.copy $ "hello\nworld\n"
>>> :! cat hello2.txt
hello
world
>>> :! cat hello1.txt
hello
world

This sort of manipulation could as well be acheived by combining folds - using Control.Foldl for example. But any sort of manipulation can be involved in the fold. Here are a couple of trivial complications involving splitting by lines:

>>> let doubleLines = Q.unlines . maps (<* Q.chunk "\n" ) . Q.lines
>>> let emphasize = Q.unlines . maps (<* Q.chunk "!" ) . Q.lines
>>> runResourceT $ Q.writeFile "hello2.txt" $ emphasize $ Q.writeFile "hello1.txt" $ doubleLines $ Q.copy $ "hello\nworld"
>>> :! cat hello2.txt
hello!
world!
>>> :! cat hello1.txt
hello

world

As with the parallel operations in Streaming.Prelude, we have

Q.effects . Q.copy       = id
hoist Q.effects . Q.copy = id

The duplication does not by itself involve the copying of bytestring chunks; it just makes two references to each chunk as it arises. This does, however double the number of constructors associated with each chunk.

drained :: (Monad m, MonadTrans t, Monad (t m)) => t m (ByteString m r) -> t m r Source

Perform the effects contained in the second in an effectful pair of bytestrings, ignoring the bytes. It would typically be used at the type

 ByteString m (ByteString m r) -> ByteString m r

mwrap :: m (ByteString m r) -> ByteString m r Source

Reconceive an effect that results in an effectful bytestring as an effectful bytestring. Compare Streaming.mwrap. The closes equivalent of

>>> Streaming.wrap :: f (Stream f m r) -> Stream f m r

is here consChunk. mwrap is the smart constructor for the internal Go constructor.

distribute :: (Monad m, MonadTrans t, MFunctor t, Monad (t m), Monad (t (ByteString m))) => ByteString (t m) a -> t (ByteString m) a Source

Given a byte stream on a transformed monad, make it possible to 'run' transformer.

Transforming ByteStrings

map :: Monad m => (Word8 -> Word8) -> ByteString m r -> ByteString m r Source

O(n) map f xs is the ByteString obtained by applying f to each element of xs.

intercalate :: Monad m => ByteString m () -> Stream (ByteString m) m r -> ByteString m r Source

O(n) The intercalate function takes a ByteString and a list of ByteStrings and concatenates the list after interspersing the first argument between each element of the list.

Basic interface

cons :: Monad m => Word8 -> ByteString m r -> ByteString m r Source

O(1) cons is analogous to '(:)' for lists.

cons' :: Word8 -> ByteString m r -> ByteString m r Source

O(1) Unlike cons, 'cons\'' is strict in the ByteString that we are consing onto. More precisely, it forces the head and the first chunk. It does this because, for space efficiency, it may coalesce the new byte onto the first 'chunk' rather than starting a new 'chunk'.

So that means you can't use a lazy recursive contruction like this:

let xs = cons\' c xs in xs

You can however use cons, as well as repeat and cycle, to build infinite byte streams.

snoc :: Monad m => ByteString m r -> Word8 -> ByteString m r Source

O(n/c) Append a byte to the end of a ByteString

append :: Monad m => ByteString m r -> ByteString m s -> ByteString m s Source

O(n/c) Append two

filter :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m r Source

O(n) filter, applied to a predicate and a ByteString, returns a ByteString containing those characters that satisfy the predicate.

uncons :: Monad m => ByteString m r -> m (Maybe (Word8, ByteString m r)) Source

O(1) Extract the head and tail of a ByteString, or Nothing if it is empty

nextByte :: Monad m => ByteString m r -> m (Either r (Word8, ByteString m r)) Source

O(1) Extract the head and tail of a ByteString, or its return value if it is empty. This is the 'natural' uncons for an effectful byte stream.

denull :: Monad m => Stream (ByteString m) m r -> Stream (ByteString m) m r Source

Remove empty ByteStrings from a stream of bytestrings.

Direct chunk handling

consChunk :: ByteString -> ByteString m r -> ByteString m r Source

Smart constructor for Chunk.

chunk :: ByteString -> ByteString m () Source

Yield-style smart constructor for Chunk.

foldrChunks :: Monad m => (ByteString -> a -> a) -> a -> ByteString m r -> m a Source

Consume the chunks of an effectful ByteString with a natural right fold.

foldlChunks :: Monad m => (a -> ByteString -> a) -> a -> ByteString m r -> m (Of a r) Source

Substrings

Breaking strings

break :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m (ByteString m r) Source

break p is equivalent to span (not . p).

drop :: Monad m => Int64 -> ByteString m r -> ByteString m r Source

O(n/c) drop n xs returns the suffix of xs after the first n elements, or [] if n > length xs.

>>> Q.putStrLn $ Q.drop 6 "Wisconsin"
sin
>>> Q.putStrLn $ Q.drop 16 "Wisconsin"
>>> 

group :: Monad m => ByteString m r -> Stream (ByteString m) m r Source

The group function takes a ByteString and returns a list of ByteStrings such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,

group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]

It is a special case of groupBy, which allows the programmer to supply their own equality test.

groupBy :: Monad m => (Word8 -> Word8 -> Bool) -> ByteString m r -> Stream (ByteString m) m r Source

The groupBy function is a generalized version of group.

span :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m (ByteString m r) Source

span p xs breaks the ByteString into two segments. It is equivalent to (takeWhile p xs, dropWhile p xs)

splitAt :: Monad m => Int64 -> ByteString m r -> ByteString m (ByteString m r) Source

O(n/c) splitAt n xs is equivalent to (take n xs, drop n xs).

>>> rest <- Q.putStrLn $ Q.splitAt 3 "therapist is a danger to good hyphenation, as Knuth notes"
the
>>> Q.putStrLn $ Q.splitAt 19 rest
rapist is a danger 

splitWith :: Monad m => (Word8 -> Bool) -> ByteString m r -> Stream (ByteString m) m r Source

O(n) Splits a ByteString into components delimited by separators, where the predicate returns True for a separator element. The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output. eg.

splitWith (=='a') "aabbaca" == ["","","bb","c",""]
splitWith (=='a') []        == []

take :: Monad m => Int64 -> ByteString m r -> ByteString m () Source

O(n/c) take n, applied to a ByteString xs, returns the prefix of xs of length n, or xs itself if n > length xs.

Note that in the streaming context this drops the final return value; splitAt preserves this information, and is sometimes to be preferred.

>>> Q.putStrLn $ Q.take 8 $ "Is there a God?" >> return True
Is there
>>> Q.putStrLn $ "Is there a God?" >> return True
Is there a God?
True
>>> rest <- Q.putStrLn $ Q.splitAt 8 $ "Is there a God?" >> return True
Is there
>>> Q.effects  rest
True

takeWhile :: Monad m => (Word8 -> Bool) -> ByteString m r -> ByteString m () Source

takeWhile, applied to a predicate p and a ByteString xs, returns the longest prefix (possibly empty) of xs of elements that satisfy p.

Breaking into many substrings

split :: Monad m => Word8 -> ByteString m r -> Stream (ByteString m) m r Source

O(n) Break a ByteString into pieces separated by the byte argument, consuming the delimiter. I.e.

split '\n' "a\nb\nd\ne" == ["a","b","d","e"]
split 'a'  "aXaXaXa"    == ["","X","X","X",""]
split 'x'  "x"          == ["",""]

and

intercalate [c] . split c == id
split == splitWith . (==)

As for all splitting functions in this library, this function does not copy the substrings, it just constructs new ByteStrings that are slices of the original.

Special folds

concat :: Monad m => Stream (ByteString m) m r -> ByteString m r Source

O(n) Concatenate a stream of byte streams.

Builders

toStreamingByteStringWith :: MonadIO m => AllocationStrategy -> Builder -> ByteString m () Source

Take a builder and convert it to a genuine streaming bytestring, using a specific allocation strategy.

toBuilder :: ByteString IO () -> Builder Source

A simple construction of a builder from a byte stream.

>>> let aaa = "10000 is a number\n" :: Q.ByteString IO ()
>>> hPutBuilder  IO.stdout $ toBuilder  aaa
10000 is a number

Building ByteStrings

Infinite ByteStrings

repeat :: Word8 -> ByteString m r Source

repeat x is an infinite ByteString, with x the value of every element.

>>> R.stdout $ R.take 50 $ R.repeat 60
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Q.putStrLn $ Q.take 50 $ Q.repeat 'z'
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

iterate :: (Word8 -> Word8) -> Word8 -> ByteString m r Source

iterate f x returns an infinite ByteString of repeated applications -- of f to x:

iterate f x == [x, f x, f (f x), ...]
>>> R.stdout $ R.take 50 $ R.iterate succ 39
()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXY
>>> Q.putStrLn $ Q.take 50 $ Q.iterate succ '\''
()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXY

cycle :: Monad m => ByteString m r -> ByteString m s Source

cycle ties a finite ByteString into a circular one, or equivalently, the infinite repetition of the original ByteString. For an empty bytestring (like return 17) it of course makes an unproductive loop

>>> Q.putStrLn $ Q.take 7 $ Q.cycle  "y\n"
y
y
y
y

Unfolding ByteStrings

unfoldM :: Monad m => (a -> Maybe (Word8, a)) -> a -> ByteString m () Source

O(n) The unfoldr function is analogous to the Stream 'unfoldr'. unfoldr builds a ByteString from a seed value. The function takes the element and returns Nothing if it is done producing the ByteString or returns Just (a,b), in which case, a is a prepending to the ByteString and b is used as the next element in a recursive call.

unfoldr :: (a -> Either r (Word8, a)) -> a -> ByteString m r Source

unfold is like unfoldr but stops when the co-algebra returns Left; the result is the return value of the 'ByteString m r' 'unfoldr uncons = id'

Folds, including support for Foldl

foldr :: Monad m => (Word8 -> a -> a) -> a -> ByteString m () -> m a Source

foldr, applied to a binary operator, a starting value -- (typically the right-identity of the operator), and a ByteString, -- reduces the ByteString using the binary operator, from right to left.

foldr cons = id

fold :: Monad m => (x -> Word8 -> x) -> x -> (x -> b) -> ByteString m () -> m b Source

fold, applied to a binary operator, a starting value (typically the left-identity of the operator), and a ByteString, reduces the ByteString using the binary operator, from left to right. We use the style of the foldl libarary for left folds

fold_ :: Monad m => (x -> Word8 -> x) -> x -> (x -> b) -> ByteString m r -> m (Of b r) Source

'fold\'' keeps the return value of the left-folded bytestring. Useful for simultaneous folds over a segmented bytestream

head :: Monad m => ByteString m r -> m (Of (Maybe Word8) r) Source

O(c) Extract the first element of a ByteString, which must be non-empty.

head_ :: Monad m => ByteString m r -> m Word8 Source

O(1) Extract the first element of a ByteString, which must be non-empty.

last :: Monad m => ByteString m r -> m (Of (Maybe Word8) r) Source

last_ :: Monad m => ByteString m r -> m Word8 Source

O(n/c) Extract the last element of a ByteString, which must be finite and non-empty.

length :: Monad m => ByteString m r -> m (Of Int r) Source

O(n/c) length returns the length of a byte stream as an Int together with the return value. This makes various maps possible

>>> Q.length "one\ntwo\three\nfour\nfive\n"
23 :> ()
>>> S.print $ S.take 3 $ mapped Q.length $ Q.lines "one\ntwo\three\nfour\nfive\n"
3
8
4

length_ :: Monad m => ByteString m r -> m Int Source

null :: Monad m => ByteString m r -> m (Of Bool r) Source

O(1) Test whether a ByteString is empty, collecting its return value; -- to reach the return value, this operation must check the whole length of the string.

>>> Q.null "one\ntwo\three\nfour\nfive\n"
False :> ()
>>> Q.null ""
True :> ()
>>> S.print $ mapped R.null $ Q.lines "yours,\nMeredith"
False
False

nulls :: Monad m => ByteString m r -> m (Sum (ByteString m) (ByteString m) r) Source

O1 Distinguish empty from non-empty lines, while maintaining streaming; the empty ByteStrings are on the right

>>> nulls  ::  ByteString m r -> m (Sum (ByteString m) (ByteString m) r)

There are many ways to remove null bytestrings from a Stream (ByteString m) m r (besides using denull). If we pass next to

>>> mapped nulls bs :: Stream (Sum (ByteString m) (ByteString m)) m r

then can then apply Streaming.separate to get

>>> separate (mapped nulls bs) :: Stream (ByteString m) (Stream (ByteString m) m) r

The inner monad is now made of the empty bytestrings; we act on this with hoist , considering that

>>> :t Q.effects . Q.concat
Q.effects . Q.concat
  :: Monad m => Stream (Q.ByteString m) m r -> m r

we have

>>> hoist (Q.effects . Q.concat) . separate . mapped Q.nulls
  :: Monad n =>  Stream (Q.ByteString n) n b -> Stream (Q.ByteString n) n b

null_ :: Monad m => ByteString m r -> m Bool Source

O(1) Test whether an ByteString is empty. The value is of course in the monad of the effects.

>>> Q.null "one\ntwo\three\nfour\nfive\n"
False
>>> Q.null $ Q.take 0 Q.stdin
True
>>> :t Q.null $ Q.take 0 Q.stdin
Q.null $ Q.take 0 Q.stdin :: MonadIO m => m Bool

count :: Monad m => Word8 -> ByteString m r -> m (Of Int r) Source

count_ :: Monad m => Word8 -> ByteString m r -> m Int Source

count returns the number of times its argument appears in the ByteString

count = length . elemIndices

I/O with ByteStrings

Standard input and output

getContents :: MonadIO m => ByteString m () Source

getContents. Equivalent to hGetContents stdin. Will read lazily

stdin :: MonadIO m => ByteString m () Source

Pipes-style nomenclature for getContents

stdout :: MonadIO m => ByteString m r -> m r Source

Pipes-style nomenclature for putStr

interact :: (ByteString IO () -> ByteString IO r) -> IO r Source

A synonym for hPut, for compatibility

hPutStr :: Handle -> ByteString IO r -> IO r hPutStr = hPut

  • - | Write a ByteString to stdout putStr :: ByteString IO r -> IO r putStr = hPut IO.stdout

The interact function takes a function of type ByteString -> ByteString as its argument. The entire input from the standard input device is passed to this function as its argument, and the resulting string is output on the standard output device.

interact morph = stdout (morph stdin)

Files

readFile :: MonadResource m => FilePath -> ByteString m () Source

Read an entire file into a chunked 'ByteString IO ()'. The Handle will be held open until EOF is encountered. The block governed by runResourceT will end with the closing of any handles opened.

>>> :! cat hello.txt
Hello world.
Goodbye world. 
>>> runResourceT $ Q.stdout $ Q.readFile "hello.txt"
Hello world.
Goodbye world. 

writeFile :: MonadResource m => FilePath -> ByteString m r -> m r Source

Write a ByteString to a file. Use runResourceT to ensure that the handle is closed.

>>> :set -XOverloadedStrings
>>> runResourceT $ Q.writeFile "hello.txt" "Hello world.\nGoodbye world.\n"
>>> :! cat "hello.txt"
Hello world.
Goodbye world.
>>> runResourceT $ Q.writeFile "hello2.txt" $ Q.readFile "hello.txt"
>>> :! cat hello2.txt
Hello world.
Goodbye world.

appendFile :: MonadResource m => FilePath -> ByteString m r -> m r Source

Append a ByteString to a file. Use runResourceT to ensure that the handle is closed.

>>> runResourceT $ Q.writeFile "hello.txt" "Hello world.\nGoodbye world.\n"
>>> runResourceT $ Q.stdout $ Q.readFile "hello.txt"
Hello world.
Goodbye world.
>>> runResourceT $ Q.appendFile "hello.txt" "sincerely yours,\nArthur\n"
>>> runResourceT $ Q.stdout $  Q.readFile "hello.txt"
Hello world.
Goodbye world.
sincerely yours,
Arthur

I/O with Handles

fromHandle :: MonadIO m => Handle -> ByteString m () Source

Pipes-style nomenclature for hGetContents

toHandle :: MonadIO m => Handle -> ByteString m r -> m r Source

Pipes nomenclature for hPut

hGet :: MonadIO m => Handle -> Int -> ByteString m () Source

Read n bytes into a ByteString, directly from the specified Handle.

hGetContents :: MonadIO m => Handle -> ByteString m () Source

Read entire handle contents lazily into a ByteString. Chunks are read on demand, using the default chunk size.

Once EOF is encountered, the Handle is closed.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContents to work correctly.

hGetContentsN :: MonadIO m => Int -> Handle -> ByteString m () Source

Read entire handle contents lazily into a ByteString. Chunks are read on demand, in at most k-sized chunks. It does not block waiting for a whole k-sized chunk, so if less than k bytes are available then they will be returned immediately as a smaller chunk.

The handle is closed on EOF.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContentsN to work correctly.

hGetN :: MonadIO m => Int -> Handle -> Int -> ByteString m () Source

Read n bytes into a ByteString, directly from the specified Handle, in chunks of size k.

hGetNonBlocking :: MonadIO m => Handle -> Int -> ByteString m () Source

hGetNonBlocking is similar to hGet, except that it will never block waiting for data to become available, instead it returns only whatever data is available. If there is no data available to be read, hGetNonBlocking returns empty.

Note: on Windows and with Haskell implementation other than GHC, this function does not work correctly; it behaves identically to hGet.

hGetNonBlockingN :: MonadIO m => Int -> Handle -> Int -> ByteString m () Source

hGetNonBlockingN is similar to hGetContentsN, except that it will never block waiting for data to become available, instead it returns only whatever data is available. Chunks are read on demand, in k-sized chunks.

hPut :: MonadIO m => Handle -> ByteString m r -> m r Source

Outputs a ByteString to the specified Handle.

Etc.

zipWithStream :: Monad m => (forall x. a -> ByteString m x -> ByteString m x) -> [a] -> Stream (ByteString m) m r -> Stream (ByteString m) m r Source