Copyright | (c) 2019 Composewell Technologies |
---|---|
License | BSD3 |
Maintainer | streamly@composewell.com |
Stability | released |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Unboxed immutable arrays with streaming interfaces.
Please refer to Streamly.Internal.Data.Array for functions that have not yet been released.
For arrays that work on boxed types, not requiring the Unbox
constraint,
please refer to Streamly.Data.Array.Generic. For arrays that can be
mutated in-place, please see Streamly.Data.MutArray.
Synopsis
- data Array a
- pin :: Array a -> IO (Array a)
- unpin :: Array a -> IO (Array a)
- isPinned :: Array a -> Bool
- createOf :: forall m a. (MonadIO m, Unbox a) => Int -> Fold m a (Array a)
- create :: forall m a. (MonadIO m, Unbox a) => Fold m a (Array a)
- writeLastN :: (Storable a, Unbox a, MonadIO m) => Int -> Fold m a (Array a)
- fromListN :: Unbox a => Int -> [a] -> Array a
- fromList :: Unbox a => [a] -> Array a
- toList :: Unbox a => Array a -> [a]
- read :: (Monad m, Unbox a) => Array a -> Stream m a
- readRev :: (Monad m, Unbox a) => Array a -> Stream m a
- reader :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a
- readerRev :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a
- cast :: forall a b. Unbox b => Array a -> Maybe (Array b)
- asBytes :: Array a -> Array Word8
- length :: Unbox a => Array a -> Int
- getIndex :: forall a. Unbox a => Int -> Array a -> Maybe a
- pinnedSerialize :: Serialize a => a -> Array Word8
- deserialize :: Serialize a => Array Word8 -> a
- class Unbox a where
- sizeOf :: Proxy a -> Int
- peekAt :: Int -> MutByteArray -> IO a
- peekByteIndex :: Int -> MutByteArray -> IO a
- pokeAt :: Int -> MutByteArray -> a -> IO ()
- pokeByteIndex :: Int -> MutByteArray -> a -> IO ()
- class Serialize a where
- addSizeTo :: Int -> a -> Int
- deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, a)
- serializeAt :: Int -> MutByteArray -> a -> IO Int
- writeN :: forall m a. (MonadIO m, Unbox a) => Int -> Fold m a (Array a)
- write :: forall m a. (MonadIO m, Unbox a) => Fold m a (Array a)
Setup
To execute the code examples provided in this module in ghci, please run the following commands first.
>>>
:m
>>>
:set -XFlexibleContexts
>>>
:set -XMagicHash
>>>
import Data.Function ((&))
>>>
import Data.Functor.Identity (Identity(..))
>>>
import System.IO.Unsafe (unsafePerformIO)
>>>
import Streamly.Data.Array (Array)
>>>
import Streamly.Data.Stream (Stream)
>>>
import qualified Streamly.Data.Array as Array
>>>
import qualified Streamly.Data.Fold as Fold
>>>
import qualified Streamly.Data.ParserK as ParserK
>>>
import qualified Streamly.Data.Stream as Stream
>>>
import qualified Streamly.Data.StreamK as StreamK
For APIs that have not been released yet.
>>>
import qualified Streamly.Internal.Data.Array as Array
>>>
import qualified Streamly.Internal.Data.Stream as Stream
Overview
This module provides APIs to create and use unboxed immutable arrays. Once
created, their contents cannot be modified. Only types that are unboxable
via the Unbox
type class can be stored in these arrays. Note that the
array memory grows automatically when creating a new array, therefore, an
array can be created from a variable length stream.
Folding Arrays
Convert array to stream, and fold the stream:
>>>
fold f arr = Array.read arr & Stream.fold f
>>>
fold Fold.sum (Array.fromList [1,2,3::Int])
6
Transforming Arrays
Convert array to stream, transform, and fold back to array:
>>>
amap f arr = Array.read arr & fmap f & Stream.fold Array.create
>>>
amap (+1) (Array.fromList [1,2,3::Int])
fromList [2,3,4]
Pinned and Unpinned Arrays
The array type can use both pinned and unpinned memory under the hood. The default array creation operations create unpinned arrays. IO operations automatically copy an array to pinned memory if the array passed to it is unpinned. Programmers can use appropriate pinned array generation APIs to reduce the copying if it happens.
Unpinned arrays have the advantage of allowing automatic defragmentation of the memory by GC. Whereas pinned arrays have the advantage of not requiring a copy by GC. Normally you would want to use unpinned arrays. However, in some cases, for example, for long lived large data storage, and for interfacing with the operating system or foreign (non-Haskell) consumers you may want to use pinned arrays.
Creating Arrays from Non-IO Streams
Array creation folds require MonadIO
otherwise the compiler may
incorrectly share the array memory thinking it is pure.
See the fromPureStream
unreleased API to generate an array from an
Identity stream safely without using MonadIO constraint.
Note that Identity
streams can be generalized to IO streams:
>>>
pure = Stream.fromList [1,2,3] :: Stream Identity Int
>>>
generally = Stream.morphInner (return . runIdentity)
>>>
Stream.fold Array.create (generally pure :: Stream IO Int)
fromList [1,2,3]
Programming Tips
This module is designed to be imported qualified:
>>>
import qualified Streamly.Data.Array as Array
The Array Type
Instances
a ~ Char => IsString (Array a) Source # | |
Defined in Streamly.Internal.Data.Array.Type fromString :: String -> Array a Source # | |
Unbox a => Monoid (Array a) Source # | |
Unbox a => Semigroup (Array a) Source # | This should not be used for combining many or N arrays as it would copy
the two arrays everytime to a new array. For coalescing multiple arrays use
|
Unbox a => IsList (Array a) Source # | |
(Unbox a, Read a, Show a) => Read (Array a) Source # | |
(Show a, Unbox a) => Show (Array a) Source # | |
Eq (Array Int16) Source # | |
Eq (Array Int32) Source # | |
Eq (Array Int64) Source # | |
Eq (Array Int8) Source # | |
Eq (Array Word16) Source # | |
Eq (Array Word32) Source # | |
Eq (Array Word64) Source # | |
Eq (Array Word8) Source # | |
Eq (Array Char) Source # | |
Eq (Array Int) Source # | |
(Unbox a, Eq a) => Eq (Array a) Source # | If the type allows a byte-by-byte comparison this instance can be
overlapped by a more specific instance that uses |
(Unbox a, Ord a) => Ord (Array a) Source # | |
Defined in Streamly.Internal.Data.Array.Type | |
Serialize (Array a) Source # | |
Defined in Streamly.Internal.Data.Serialize.Type | |
type Item (Array a) Source # | |
Defined in Streamly.Internal.Data.Array.Type |
Pinning & Unpinning
Arrays are created unpinned by default unless pinned versions of
creation APIs are used. Look for APIs with pinned
prefix in
Streamly.Internal.Data.Array for some unreleased pinned creation APIs.
If an array is to be sent to the OS without any further modification
then it should be created pinned in the first place instead of pinning
it later. Pinning an unpinned array has a copy overhead. OS interfacing
APIs create a pinned array directly or convert an unpinned array to
pinned array before sending it to the OS.
pin :: Array a -> IO (Array a) Source #
Return a copy of the Array
in pinned memory if unpinned, else return the
original array.
unpin :: Array a -> IO (Array a) Source #
Return a copy of the Array
in unpinned memory if pinned, else return the
original array.
Construction
When performance matters, the fastest way to generate an array is
createOf
. IsList
and IsString
instances can be
used to conveniently construct arrays from literal values.
OverloadedLists
extension or fromList
can be used to construct an
array from a list literal. Similarly, OverloadedStrings
extension or
fromList
can be used to construct an array from a string literal.
From Stream
createOf :: forall m a. (MonadIO m, Unbox a) => Int -> Fold m a (Array a) Source #
createOf n
folds a maximum of n
elements from the input stream to an
Array
.
create :: forall m a. (MonadIO m, Unbox a) => Fold m a (Array a) Source #
Fold the whole input to a single array.
Caution! Do not use this on infinite streams.
writeLastN :: (Storable a, Unbox a, MonadIO m) => Int -> Fold m a (Array a) Source #
writeLastN n
folds a maximum of n
elements from the end of the input
stream to an Array
.
From List
fromListN :: Unbox a => Int -> [a] -> Array a Source #
Create an Array
from the first N elements of a list. The array is
allocated to size N, if the list terminates before N elements then the
array may hold less than N elements.
fromList :: Unbox a => [a] -> Array a Source #
Create an Array
from a list. The list must be of finite size.
To List
To Stream
read :: (Monad m, Unbox a) => Array a -> Stream m a Source #
Convert an Array
into a stream.
Pre-release
readRev :: (Monad m, Unbox a) => Array a -> Stream m a Source #
Convert an Array
into a stream in reverse order.
Pre-release
Unfolds
reader :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a Source #
Unfold an array into a stream.
readerRev :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a Source #
Unfold an array into a stream in reverse order.
Casting
cast :: forall a b. Unbox b => Array a -> Maybe (Array b) Source #
Cast an array having elements of type a
into an array having elements of
type b
. The length of the array should be a multiple of the size of the
target element otherwise Nothing
is returned.
Random Access
length :: Unbox a => Array a -> Int Source #
O(1) Get the length of the array i.e. the number of elements in the array.
getIndex :: forall a. Unbox a => Int -> Array a -> Maybe a Source #
O(1) Lookup the element at the given index. Index starts from 0.
Serialization
pinnedSerialize :: Serialize a => a -> Array Word8 Source #
Serialize a Haskell type to a pinned byte array. The array is allocated using pinned memory so that it can be used directly in OS APIs for writing to file or sending over the network.
Properties:
1. Identity: deserialize . pinnedSerialize == id
2. Encoded equivalence: pinnedSerialize a == pinnedSerialize a
deserialize :: Serialize a => Array Word8 -> a Source #
Decode a Haskell type from a byte array containing its serialized representation.
Re-exports
The Unbox
type class provides operations for serialization (unboxing)
and deserialization (boxing) of fixed-length, non-recursive Haskell data
types to and from their byte stream representation.
Unbox uses fixed size encoding, therefore, size is independent of the value,
it must be determined solely by the type. This restriction makes types with
Unbox
instances suitable for storing in arrays. Note that sum types may
have multiple constructors of different sizes, the size of a sum type is
computed as the maximum required by any constructor.
The peekAt
operation reads as many bytes from the mutable byte
array as the size
of the data type and builds a Haskell data type from
these bytes. pokeAt
operation converts a Haskell data type to its
binary representation which consists of size
bytes and then stores
these bytes into the mutable byte array. These operations do not check the
bounds of the array, the user of the type class is expected to check the
bounds before peeking or poking.
IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.
Instances can be derived via Generics, Template Haskell, or written manually. Note that the data type must be non-recursive. WARNING! Generic and Template Haskell deriving, both hang for recursive data types. Deriving via Generics is more convenient but Template Haskell should be preferred over Generics for the following reasons:
- Instances derived via Template Haskell provide better and more reliable performance.
- Generic deriving allows only 256 fields or constructor tags whereas template Haskell has no limit.
Here is an example, for deriving an instance of this type class using generics:
>>>
import GHC.Generics (Generic)
>>>
:{
data Object = Object { _int0 :: Int , _int1 :: Int } deriving Generic :}
>>>
import Streamly.Data.MutByteArray (Unbox(..))
>>>
instance Unbox Object
To derive the instance via Template Haskell:
import Streamly.Data.MutByteArray (deriveUnbox) $(deriveUnbox [d|instance Unbox Object|])
See deriveUnbox
for more information on deriving
using Template Haskell.
If you want to write the instance manually:
>>>
:{
instance Unbox Object where sizeOf _ = 16 peekAt i arr = do -- Check the array bounds x0 <- peekAt i arr x1 <- peekAt (i + 8) arr return $ Object x0 x1 pokeAt i arr (Object x0 x1) = do -- Check the array bounds pokeAt i arr x0 pokeAt (i + 8) arr x1 :}
Nothing
sizeOf :: Proxy a -> Int Source #
Get the size. Size cannot be zero, should be at least 1 byte.
peekAt :: Int -> MutByteArray -> IO a Source #
peekAt byte-offset array
reads an element of type a
from the
the given the byte offset in the array.
IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.
peekByteIndex :: Int -> MutByteArray -> IO a Source #
Deprecated: Use peekAt.
pokeAt :: Int -> MutByteArray -> a -> IO () Source #
pokeAt byte-offset array
writes an element of type a
to the
the given the byte offset in the array.
IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.
pokeByteIndex :: Int -> MutByteArray -> a -> IO () Source #
Deprecated: Use pokeAt.
Instances
class Serialize a where Source #
The Serialize
type class provides operations for serialization and
deserialization of general Haskell data types to and from their byte stream
representation.
Unlike Unbox
, Serialize
uses variable length encoding, therefore, it can
serialize recursive and variable length data types like lists, or variable
length sum types where the length of the value may vary depending on a
particular constructor. For variable length data types the length is encoded
along with the data.
The deserializeAt
operation reads bytes from the mutable byte array and
builds a Haskell data type from these bytes, the number of bytes it reads
depends on the type and the encoded value it is reading. serializeAt
operation converts a Haskell data type to its binary representation which
must consist of as many bytes as added by the addSizeTo
operation for that
value and then stores these bytes into the mutable byte array. The
programmer is expected to use the addSizeTo
operation and allocate an
array of sufficient length before calling serializeAt
.
IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.
Instances can be derived via Template Haskell, or written manually.
Here is an example, for deriving an instance of this type class using template Haskell:
>>>
:{
data Object = Object { _obj1 :: [Int] , _obj2 :: Int } :}
import Streamly.Data.MutByteArray (deriveSerialize) $(deriveSerialize [d|instance Serialize Object|])
See deriveSerialize
and
deriveSerializeWith
for more information on
deriving using Template Haskell.
Here is an example of a manual instance.
>>>
import Streamly.Data.MutByteArray (Serialize(..))
>>>
:{
instance Serialize Object where addSizeTo acc obj = addSizeTo (addSizeTo acc (_obj1 obj)) (_obj2 obj) deserializeAt i arr len = do -- Check the array bounds before reading (i1, x0) <- deserializeAt i arr len (i2, x1) <- deserializeAt i1 arr len pure (i2, Object x0 x1) serializeAt i arr (Object x0 x1) = do i1 <- serializeAt i arr x0 i2 <- serializeAt i1 arr x1 pure i2 :}
addSizeTo :: Int -> a -> Int Source #
addSizeTo accum value
returns accum
incremented by the size of the
serialized representation of value
in bytes. Size cannot be zero. It
should be at least 1 byte.
deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, a) Source #
deserializeAt byte-offset array arrayLen
deserializes a value from
the given byte-offset in the array. Returns a tuple consisting of the
next byte-offset and the deserialized value.
The arrayLen passed is the entire length of the input buffer. It is to be used to check if we would overflow the input buffer when deserializing.
Throws an exception if the operation would exceed the supplied arrayLen.
serializeAt :: Int -> MutByteArray -> a -> IO Int Source #
serializeAt byte-offset array value
writes the serialized
representation of the value
in the array at the given byte-offset.
Returns the next byte-offset.
This is an unsafe operation, the programmer must ensure that the array
has enough space available to serialize the value as determined by the
addSizeTo
operation.