io-streams-1.4.0.0: Simple, composable, and easy-to-use stream I/O

Safe HaskellSafe
LanguageHaskell2010

System.IO.Streams.Tutorial

Contents

Synopsis

Introduction

The io-streams package defines two "smart handles" for stream processing:

The InputStream type implements all the core operations we expect for a read-only handle. We consume values using read, which returns a Nothing when the resource is done:

read :: InputStream c -> IO (Maybe c)

... and we can push back values using unRead:

unRead :: c -> InputStream c -> IO ()

The OutputStream type implements the write operation which feeds it output, supplying Nothing to signal resource exhaustion:

write :: Maybe c -> OutputStream c -> IO ()

These streams slightly resemble Haskell Handles, but support a wider range of sources and sinks. For example, you can convert an ordinary list to an InputStream source and interact with it using the handle-based API:

ghci> import qualified System.IO.Streams as S
ghci> listHandle <- S.fromList [1, 2]
ghci> S.read listHandle
Just 1
ghci> S.read listHandle
Just 2
ghci> S.read listHandle
Nothing

Additionally, IO streams come with a library of stream transformations that preserve their handle-like API. For example, you can map a function over an InputStream, which generates a new handle to the same stream that returns transformed values:

ghci> oldHandle <- S.fromList [1, 2, 3]
ghci> newHandle <- S.mapM (\x -> return (x * 10)) oldHandle
ghci> S.read newHandle
10
ghci> -- We can still view the stream through the old handle
ghci> S.read oldHandle
2
ghci> -- ... and switch back again
ghci> S.read newHandle
30

IO streams focus on preserving the convention of traditional handles while offering a wider library of stream-processing utilities.

Build Input Streams

The io-streams library provides a simple interface for creating your own InputStreams and OutputStreams.

You can build an InputStream from any IO action that generates output, as long as it wraps results in Just and uses Nothing to signal EOF:

makeInputStream :: IO (Maybe a) -> IO (InputStream a)

As an example, let's wrap an ordinary read-only Handle in an InputStream:

import Data.ByteString (ByteString)
import qualified Data.ByteString as S
import System.IO.Streams (InputStream)
import qualified System.IO.Streams as Streams
import System.IO (Handle, hFlush)

bUFSIZ = 32752

upgradeReadOnlyHandle :: Handle -> IO (InputStream ByteString)
upgradeReadOnlyHandle h = Streams.makeInputStream f
  where
    f = do
        x <- S.hGetSome h bUFSIZ
        return $! if S.null x then Nothing else Just x

We didn't even really need to write the upgradeReadOnlyHandle function, because System.IO.Streams.Handle already provides one that uses the exact same implementation given above:

handleToInputStream :: Handle -> IO (InputStream ByteString)

Build Output Streams

Similarly, you can build any OutputStream from an IO action that accepts input, as long as it interprets Just as more input and Nothing as EOF:

makeOutputStream :: (Maybe a -> IO ()) -> IO (OutputStream a)

A simple OutputStream might wrap putStrLn for ByteStrings:

import Data.ByteString (ByteString)
import qualified Data.ByteString as S
import System.IO.Streams (OutputStream)
import qualified System.IO.Streams as Streams

writeConsole :: IO (OutputStream ByteString)
writeConsole = Streams.makeOutputStream $ \m -> case m of
    Just bs -> S.putStrLn bs
    Nothing -> return ()

The Just wraps more incoming data, whereas Nothing indicates the data is exhausted. In principle, you can feed OutputStreams more input after writing a Nothing to them, but IO streams only guarantee a well-defined behavior up to the first Nothing. After receiving the first Nothing, an OutputStream could respond to additional input by:

  • Using the input
  • Ignoring the input
  • Throwing an exception

Ideally, you should adhere to well-defined behavior and ensure that after you write a Nothing to an OutputStream, you don't write anything else.

Connect Streams

io-streams provides two ways to connect an InputStream and OutputStream:

connect :: InputStream a -> OutputStream a -> IO ()
supply  :: InputStream a -> OutputStream a -> IO ()

connect feeds the OutputStream exclusively with the given InputStream and passes along the end-of-stream notification to the OutputStream.

supply feeds the OutputStream non-exclusively with the given InputStream and does not pass along the end-of-stream notification to the OutputStream.

You can combine both supply and connect to feed multiple InputStreams into a single OutputStream:

import qualified System.IO.Streams as Streams
import System.IO (IOMode(WriteMode))

main = do
   Streams.withFileAsOutput "out.txt" WriteMode $ \outStream ->
   Streams.withFileAsInput  "in1.txt" $ \inStream1 ->
   Streams.withFileAsInput  "in2.txt" $ \inStream2 ->
   Streams.withFileAsInput  "in3.txt" $ \inStream3 ->
   Streams.supply  inStream1 outStream
   Streams.supply  inStream2 outStream
   Streams.connect inStream3 outStream

The final connect seals the OutputStream when the final InputStream terminates.

Keep in mind that you do not need to use connect or supply at all: io-streams mainly provides them for user convenience. You can always build your own abstractions on top of the read and write operations.

Transform Streams

When we build or use IO streams we can tap into all the stream-processing features the io-streams library provides. For example, we can decompress any InputStream of ByteStrings:

import Control.Monad ((>=>))
import Data.ByteString (ByteString)
import System.IO (Handle)
import System.IO.Streams (InputStream, OutputStream)
import qualified System.IO.Streams as Streams
import qualified System.IO.Streams.File as Streams

unzipHandle :: Handle -> IO (InputStream ByteString)
unzipHandle = Streams.handleToInputStream >=> Streams.decompress

... or we can guard it against a denial-of-service attack:

protectHandle :: Handle -> IO (InputStream ByteString)
protectHandle =
    Streams.handleToInputStream >=> Streams.throwIfProducesMoreThan 1000000

io-streams provides many useful functions such as these in its standard library and you take advantage of them by defining IO streams that wrap your resources.

Resource and Exception Safety

IO streams use standard Haskell idioms for resource safety. Since all operations occur in the IO monad, you can use catch, bracket, or various "with..." functions to guard any read or write without any special considerations:

import qualified Data.ByteString as S
import System.IO
import System.IO.Streams (InputStream, OutputStream)
import qualified System.IO.Streams as Streams
import qualified System.IO.Streams.File as Streams

main =
    withFile "test.txt" ReadMode $ \handle -> do
        stream <- Streams.handleToInputStream handle
        mBytes <- Streams.read stream
        case mBytes of
            Just bytes -> S.putStrLn bytes
            Nothing    -> putStrLn "EOF"

However, you can also simplify the above example by using the convenience function withFileAsInput from System.IO.Streams.File:

withFileAsInput
 :: FilePath -> (InputStream ByteString -> IO a) -> IO a

Pushback

All InputStreams support pushback, which simplifies many types of operations. For example, we can peek at an InputStream by combining read and unRead:

peek :: InputStream c -> IO (Maybe c)
peek s = do
    x <- Streams.read s
    case x of
        Nothing -> return ()
        Just c  -> Streams.unRead c s
    return x

... although System.IO.Streams already exports the above function.

InputStreams can customize pushback behavior to support more sophisticated support for pushback. For example, if you protect a stream using throwIfProducesMoreThan and unRead input, it will subtract the unread input from the total byte count. However, these extra features will not interfere with the basic pushback contract, given by the following law:

unRead c stream >> read stream == return (Just c)

When you build an InputStream using makeInputStream, it supplies the default pushback behavior which just saves the input for the next read call. More advanced users can use System.IO.Streams.Internal to customize their own pushback routines.

Thread Safety

IO stream operations are not thread-safe by default for performance reasons. However, you can transform an existing IO stream into a thread-safe one using the provided locking functions:

lockingInputStream  :: InputStream  a -> IO (InputStream  a)
lockingOutputStream :: OutputStream a -> IO (OutputStream a)

These functions do not prevent access to the previous IO stream, so you must take care to not save the reference to the previous stream.

Examples

The following examples show how to use the standard library to implement traditional command-line utilities:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad ((>=>), join)
import qualified Data.ByteString.Char8 as S
import Data.Int (Int64)
import Data.Monoid ((<>))
import System.IO.Streams (InputStream)
import qualified System.IO.Streams as Streams
import System.IO
import Prelude hiding (head)

cat :: FilePath -> IO ()
cat file = withFile file ReadMode $ \h -> do
    is <- Streams.handleToInputStream h
    Streams.connect is Streams.stdout

grep :: S.ByteString -> FilePath -> IO ()
grep pattern file = withFile file ReadMode $ \h -> do
    is <- Streams.handleToInputStream h >>=
          Streams.lines                 >>=
          Streams.filter (S.isInfixOf pattern)
    os <- Streams.unlines Streams.stdout
    Streams.connect is os

data Option = Bytes | Words | Lines

len :: InputStream a -> IO Int64
len = Streams.fold (\n _ -> n + 1) 0

wc :: Option -> FilePath -> IO ()
wc opt file = withFile file ReadMode $
    Streams.handleToInputStream >=> count >=> print
  where
    count = case opt of
        Bytes -> \is -> do
            (is', cnt) <- Streams.countInput is
            Streams.skipToEof is'
            cnt
        Words -> Streams.words >=> len
        Lines -> Streams.lines >=> len

nl :: FilePath -> IO ()
nl file = withFile file ReadMode $ \h -> do
    nats <- Streams.fromList [1..]
    ls   <- Streams.handleToInputStream h >>= Streams.lines
    is   <- Streams.zipWith
                (\n bs -> S.pack (show n) <> " " <> bs)
                nats
                ls
    os   <- Streams.unlines Streams.stdout
    Streams.connect is os

head :: Int64 -> FilePath -> IO ()
head n file = withFile file ReadMode $ \h -> do
    is <- Streams.handleToInputStream h >>= Streams.lines >>= Streams.take n
    os <- Streams.unlines Streams.stdout
    Streams.connect is os

paste :: FilePath -> FilePath -> IO ()
paste file1 file2 =
    withFile file1 ReadMode $ \h1 ->
    withFile file2 ReadMode $ \h2 -> do
    is1 <- Streams.handleToInputStream h1 >>= Streams.lines
    is2 <- Streams.handleToInputStream h2 >>= Streams.lines
    isT <- Streams.zipWith (\l1 l2 -> l1 <> "\t" <> l2) is1 is2
    os  <- Streams.unlines Streams.stdout
    Streams.connect isT os

yes :: IO ()
yes = do
    is <- Streams.fromList (repeat "y")
    os <- Streams.unlines Streams.stdout
    Streams.connect is os