serdoc
Unified serialization with semi-automatic documentation
Introduction
SerDoc provides:
- A unified interface for serialization formats ("codecs"), in the form of a
'Serializable' typeclass.
- A mini-EDSL (
FieldInfo
) for describing serialization formats as first-class
data structures, and a typeclass (HasInfo
) to link them to codecs and
serializable Haskell types.
- Building blocks and utility code for implementing
Codec
, Serializable
,
and HasInfo
for existing or new serialization formats.
It also includes an implementation of these typeclasses for the binary
package.
Components
SerDoc is split up into two sub-projects:
serdoc-core (this library)
, which provides the typeclasses and building blocks
serdoc-binary
, which provides instances for binary
How To Use
Serializing and deserializing values
For encoding, it's straightforward - use encode
.
For decoding, you have a few options, depending on the codec you use. The most
general form is decodeM
; apart from that, a family of similar functions is
provided, following the conventions:
- Monadic decoders have an
M
suffix; pure decoders (where the decoding Monad
is Except err
or Identity
) have no M
suffix (e.g., decode
vs.
decodeM
).
- Flavors that ignore any remaining unconsumed input have a
_
suffix (e.g.
decode_
).
- Flavors that convert decoding errors to
Either
s have an Either
suffix
(e.g. decodeMEither
); these require that the decoding monad is Except err
or ExceptT err m
.
Keep in mind that depending on how the codec works, the serialized data may be
returned / consumed via the Encoded
type, or passed by (mutable) reference
through the Context
object. The API purposefully supports both ways, because
a given codec may only support one or the other.
Implementing HasInfo
and Serializable
for an existing codec
- For newtype wrappers that use the same serialization format as their
wrapped payloads, the easiest way is to use
GeneralizedNewtypeDeriving
to
derive instances for both typeclasses.
- For newtype wrappers that should implement a different serialization
format, you may need to hand-write instances; if you do this, take special
care to ensure that the
HasInfo
instance matches the actual serialization.
- Lists and some other data structures are supported out of the box and
require no explicit instance; they are serialized using a 32-bit list length
followed by the serialized list elements, in order. If you need a different
representation, then newtype-wrapping may be necessary.
- For enumeration types, a generic wrapper,
ViaEnum
, is provided, which
you can use in combination with the DerivingVia
extension; alternatively,
you can use enumInfo
, encodeEnum
, and decodeEnum
to write the instances
yourself.
String
, due to being just a type alias for [Char]
, will only
serialize iff an instance for Char
exists. However, it is usually preferred
to convert your strings to Text
.
- For record types, consider using
deriveSerDoc
(found in
Data.SerDoc.TH
). This Template Haskell function will generate matching
instances for both typeclasses, following the convention of serializing all
fields in the order they appear in the type declaration, and labelled by
their Haskell field names. Obviously this will only work if instances for
each of the record fields exist.
Adding your own codecs
A codec is indicated using a phantom type; no values of that type ever need to
exist at runtime, we merely use it to identify the codec we want, so we can
define it as a constructorless data
type (like Void
), e.g.:
data MyFantasticCodec
We then need a Codec
instance, which is where we define:
- A type for a context passed to each invocation of
encode
and decodeM
;
this can be anything you want, depending on the needs of your codec. If the
codec does not require any context, use ()
.
- A monadic data type used for encoding,
MonadEncode
. For pure codecs, this
can be Identity
; if you serialize directly to something like a file handle
or network socket, it will typically have to be IO
, or some MonadIO
.
- A monadic data type used for decoding,
MonadDecode
. Since decoding can
fail, this will typically involve not just the required effects for the
decoding process itself, but also some form of error handling. It is
recommended to use Except err
for pure codecs, and ExceptT err IO
(or MonadIO m => ExceptT err m
) for codecs that require IO, where err
is an appropriate
error data structure for your codec.
- The default encoding to use for enum types (optional, defaults to
Word16
).
Providing instances for a reasonable set of primitive values and data
structures is highly recommended; a minimum viable set might be:
()
Bool
Int
, Int8
, Int16
, Int32
, Int64
Word
, Word8
, Word16
, Word32
, Word64
- Whatever type you picked for the default enum encoding
[a]
Maybe a
Either a b
- Tuples up to 7 elements
ByteString