streamly-core-0.2.1: Streaming, parsers, arrays, serialization and more
Copyright(c) 2023 Composewell Technologies
LicenseBSD3-3-Clause
Maintainerstreamly@composewell.com
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Streamly.Data.MutByteArray

Description

A low level byte Array type MutByteArray, along with type classes Unbox and Serialize for fast binary serialization and deserialization of Haskell values. Serialization, deserialization performance is similar to, and in some cases many times better than the store package. Conceptually, the Serialize type class works in the same way as store.

Serialize instances are configurable to use constructor names (see encodeConstrNames), record field names (see encodeRecordFields) instead of binary encoded values. This is an experimental feature which allows JSON like properties with faster speed. For example, you can change the order of constructors or record fields without affecting serialized value.

Higher level unboxed array modules Streamly.Data.Array and Streamly.Data.MutArray are built on top of this module. Unboxed arrays are essentially serialized Haskell values. Array modules provide higher level serialization routines like pinnedSerialize and deserialize from the Streamly.Internal.Data.Array module.

Mutable Byte Array

MutByteArray is a primitive mutable array in the IO monad. Unbox and Serialize type classes use this primitive array to serialize data to and deserialize it from. This array is used to build higher level unboxed array types MutArray and Array.

Using Unbox

The Unbox type class is simple and used to serialize non-recursive fixed size data types. This type class is primarily used to implement unboxed arrays. Unboxed arrays are just a sequence of serialized fixed length Haskell data types. Instances of this type class can be derived using Generic or template haskell based deriving functions provided in this module.

Writing a data type to an array using the array creation routines in Streamly.Data.Array or Streamly.Data.MutArray (e.g. writeN or fromListN), serializes the type to the array. Similarly, reading the data type from the array deserializes it. You can also serialize and deserialize directly to and from a MutByteArray, using the type class methods.

Using Serialize

The Serialize type class is a superset of the Unbox type class, it can serialize variable length data types as well e.g. Haskell lists. Use deriveSerialize to derive the instances of the type class automatically and then use the type class methods to serialize and deserialize to and from a MutByteArray.

See pinnedSerialize and deserialize for Array type based serialization.

Comparing serialized values

When using the Unbox type class the same value may result in differing serialized bytes because of unused uninitialized data in case of sum types. Therefore, byte comparison of serialized values is not reliable.

However, the Serialize type class guarantees that the serialized values are always exactly the same and byte comparison of serialized is reliable.

Synopsis

Mutable Byte Array

The standard way to read from or write to a MutByteArray is by using the Unbox or Serialize type class methods.

data MutByteArray Source #

A lifted mutable byte array type wrapping MutableByteArray# RealWorld. This is a low level array used to back high level unboxed arrays and serialized data.

isPinned :: MutByteArray -> Bool Source #

Return True if the array is allocated in pinned memory.

pin :: MutByteArray -> IO MutByteArray Source #

Return a copy of the array in pinned memory if unpinned, else return the original array.

unpin :: MutByteArray -> IO MutByteArray Source #

Return a copy of the array in unpinned memory if pinned, else return the original array.

Unbox

class Unbox a where Source #

The Unbox type class provides operations for serialization (unboxing) and deserialization (boxing) of fixed-length, non-recursive Haskell data types to and from their byte stream representation.

Unbox uses fixed size encoding, therefore, size is independent of the value, it must be determined solely by the type. This restriction makes types with Unbox instances suitable for storing in arrays. Note that sum types may have multiple constructors of different sizes, the size of a sum type is computed as the maximum required by any constructor.

The peekAt operation reads as many bytes from the mutable byte array as the size of the data type and builds a Haskell data type from these bytes. pokeAt operation converts a Haskell data type to its binary representation which consists of size bytes and then stores these bytes into the mutable byte array. These operations do not check the bounds of the array, the user of the type class is expected to check the bounds before peeking or poking.

IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.

Instances can be derived via Generics, Template Haskell, or written manually. Note that the data type must be non-recursive. WARNING! Generic and Template Haskell deriving, both hang for recursive data types. Deriving via Generics is more convenient but Template Haskell should be preferred over Generics for the following reasons:

  1. Instances derived via Template Haskell provide better and more reliable performance.
  2. Generic deriving allows only 256 fields or constructor tags whereas template Haskell has no limit.

Here is an example, for deriving an instance of this type class using generics:

>>> import GHC.Generics (Generic)
>>> :{
data Object = Object
    { _int0 :: Int
    , _int1 :: Int
    } deriving Generic
:}
>>> import Streamly.Data.MutByteArray (Unbox(..))
>>> instance Unbox Object

To derive the instance via Template Haskell:

import Streamly.Data.MutByteArray (deriveUnbox)
$(deriveUnbox [d|instance Unbox Object|])

See deriveUnbox for more information on deriving using Template Haskell.

If you want to write the instance manually:

>>> :{
instance Unbox Object where
    sizeOf _ = 16
    peekAt i arr = do
       -- Check the array bounds
        x0 <- peekAt i arr
        x1 <- peekAt (i + 8) arr
        return $ Object x0 x1
    pokeAt i arr (Object x0 x1) = do
       -- Check the array bounds
        pokeAt i arr x0
        pokeAt (i + 8) arr x1
:}

Minimal complete definition

Nothing

Methods

sizeOf :: Proxy a -> Int Source #

Get the size. Size cannot be zero, should be at least 1 byte.

default sizeOf :: SizeOfRep (Rep a) => Proxy a -> Int Source #

peekAt :: Int -> MutByteArray -> IO a Source #

peekAt byte-offset array reads an element of type a from the the given the byte offset in the array.

IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.

default peekAt :: (Generic a, PeekRep (Rep a)) => Int -> MutByteArray -> IO a Source #

peekByteIndex :: Int -> MutByteArray -> IO a Source #

Deprecated: Use peekAt.

pokeAt :: Int -> MutByteArray -> a -> IO () Source #

pokeAt byte-offset array writes an element of type a to the the given the byte offset in the array.

IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.

default pokeAt :: (Generic a, PokeRep (Rep a)) => Int -> MutByteArray -> a -> IO () Source #

pokeByteIndex :: Int -> MutByteArray -> a -> IO () Source #

Deprecated: Use pokeAt.

Instances

Instances details
Unbox IntPtr Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox WordPtr Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Fingerprint Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int16 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int32 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int64 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int8 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox IoSubSystem Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word16 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word32 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word64 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word8 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox MicroSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox MilliSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox NanoSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox () Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Bool Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Char Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Double Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Float Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Complex a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Identity a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Down a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (FunPtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (Ptr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Ratio a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (StablePtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Const a b) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

deriveUnbox :: Q [Dec] -> Q [Dec] Source #

Given an Unbox instance declaration splice without the methods (e.g. [d|instance Unbox a => Unbox (Maybe a)|]), generate an instance declaration including all the type class method implementations.

Usage:

$(deriveUnbox [d|instance Unbox a => Unbox (Maybe a)|])

Serialize

class Serialize a where Source #

The Serialize type class provides operations for serialization and deserialization of general Haskell data types to and from their byte stream representation.

Unlike Unbox, Serialize uses variable length encoding, therefore, it can serialize recursive and variable length data types like lists, or variable length sum types where the length of the value may vary depending on a particular constructor. For variable length data types the length is encoded along with the data.

The deserializeAt operation reads bytes from the mutable byte array and builds a Haskell data type from these bytes, the number of bytes it reads depends on the type and the encoded value it is reading. serializeAt operation converts a Haskell data type to its binary representation which must consist of as many bytes as added by the addSizeTo operation for that value and then stores these bytes into the mutable byte array. The programmer is expected to use the addSizeTo operation and allocate an array of sufficient length before calling serializeAt.

IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.

Instances can be derived via Template Haskell, or written manually.

Here is an example, for deriving an instance of this type class using template Haskell:

>>> :{
data Object = Object
    { _obj1 :: [Int]
    , _obj2 :: Int
    }
:}
import Streamly.Data.MutByteArray (deriveSerialize)
$(deriveSerialize [d|instance Serialize Object|])

See deriveSerialize and deriveSerializeWith for more information on deriving using Template Haskell.

Here is an example of a manual instance.

>>> import Streamly.Data.MutByteArray (Serialize(..))
>>> :{
instance Serialize Object where
    addSizeTo acc obj = addSizeTo (addSizeTo acc (_obj1 obj)) (_obj2 obj)
    deserializeAt i arr len = do
         -- Check the array bounds before reading
        (i1, x0) <- deserializeAt i arr len
        (i2, x1) <- deserializeAt i1 arr len
        pure (i2, Object x0 x1)
    serializeAt i arr (Object x0 x1) = do
        i1 <- serializeAt i arr x0
        i2 <- serializeAt i1 arr x1
        pure i2
:}

Methods

addSizeTo :: Int -> a -> Int Source #

addSizeTo accum value returns accum incremented by the size of the serialized representation of value in bytes. Size cannot be zero. It should be at least 1 byte.

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, a) Source #

deserializeAt byte-offset array arrayLen deserializes a value from the given byte-offset in the array. Returns a tuple consisting of the next byte-offset and the deserialized value.

The arrayLen passed is the entire length of the input buffer. It is to be used to check if we would overflow the input buffer when deserializing.

Throws an exception if the operation would exceed the supplied arrayLen.

serializeAt :: Int -> MutByteArray -> a -> IO Int Source #

serializeAt byte-offset array value writes the serialized representation of the value in the array at the given byte-offset. Returns the next byte-offset.

This is an unsafe operation, the programmer must ensure that the array has enough space available to serialize the value as determined by the addSizeTo operation.

Instances

Instances details
Serialize Int16 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int32 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int64 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int8 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word16 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word32 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word64 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word8 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Integer Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize () Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> () -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, ()) Source #

serializeAt :: Int -> MutByteArray -> () -> IO Int Source #

Serialize Bool Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Char Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Double Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Float Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (FunPtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (Ptr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (StablePtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize a => Serialize (Maybe a) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize a => Serialize [a] Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> [a] -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, [a]) Source #

serializeAt :: Int -> MutByteArray -> [a] -> IO Int Source #

(Serialize a, Serialize b) => Serialize (Either a b) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize (Proxy a) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

(Serialize a, Serialize b) => Serialize (a, b) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> (a, b) -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, (a, b)) Source #

serializeAt :: Int -> MutByteArray -> (a, b) -> IO Int Source #

Instance Config

data SerializeConfig Source #

Configuration to control how the Serialize instance is generated. The configuration is opaque and is modified by composing config modifier functions, for example:

>>> (inlineSerializeAt (Just NoInline)) . (inlineSerializeAt (Just Inlinable))

The default configuration settings are:

The following experimental options are also available:

inlineAddSizeTo :: Maybe Inline -> SerializeConfig -> SerializeConfig Source #

How should we inline the addSizeTo function? The default is Nothing which means left to the compiler. Forcing inline on addSizeTo function actually worsens some benchmarks and improves none.

inlineSerializeAt :: Maybe Inline -> SerializeConfig -> SerializeConfig Source #

How should we inline the serialize function? The default 'Just Inline'. However, aggressive inlining can bloat the code and increase in compilation times when there are big functions and too many nesting levels so you can change it accordingly. A Nothing value leaves the decision to the compiler.

inlineDeserializeAt :: Maybe Inline -> SerializeConfig -> SerializeConfig Source #

How should we inline the deserialize function? See guidelines in inlineSerializeAt.

Instance Deriving

deriveSerialize :: Q [Dec] -> Q [Dec] Source #

Given an Serialize instance declaration splice without the methods (e.g. [d|instance Serialize a => Serialize (Maybe a)|]), generate an instance declaration including all the type class method implementations.

>>> deriveSerialize = deriveSerializeWith id

Usage:

$(deriveSerialize
      [d|instance Serialize a => Serialize (Maybe a)|])

deriveSerializeWith :: (SerializeConfig -> SerializeConfig) -> Q [Dec] -> Q [Dec] Source #

deriveSerializeWith config-modifier instance-dec generates a template Haskell splice consisting of a declaration of a Serialize instance. instance-dec is a template Haskell declaration splice consisting of a standard Haskell instance declaration without the type class methods (e.g. [d|instance Serialize a => Serialize (Maybe a)|]).

The type class methods for the given instance are generated according to the supplied config-modifier parameter. See SerializeConfig for default configuration settings.

Usage:

$(deriveSerializeWith
      ( inlineSerializeAt (Just NoInline)
      . inlineDeserializeAt (Just NoInline)
      )
      [d|instance Serialize a => Serialize (Maybe a)|])