Safe Haskell | None
---|---
Bio.Core.Sequence
Description
This module defines common data structures for biosequences, i.e. data that represents nucleotide or protein sequences.
Basically, anything resembling or wrapping a sequence should implement the BioSeq class (and BioSeqQual if quality information is available).
The data types are mostly wrappers around lazy bytestrings from Data.ByteString.Lazy and Data.ByteString.Lazy.Char8, but most users of this module should not need to access the underlying data types directly.
- newtype Qual = Qual {}
- newtype Offset = Offset {}
- newtype SeqData = SeqData { unSD :: ByteString }
- newtype SeqLabel = SeqLabel { unSL :: ByteString }
- newtype QualData = QualData { unQD :: ByteString }
- class BioSeq s where
- class BioSeq sq => BioSeqQual sq where
- toFasta :: BioSeq s => s -> ByteString
- toFastaQual :: BioSeqQual s => s -> ByteString
- toFastQ :: BioSeqQual s => s -> ByteString
- module Data.Stringable
Data definitions
A quality value is in the range 0..255.
An Offset is a zero-based index into a sequence.
Sequence data are lazy bytestrings of ASCII characters.
Constructors
SeqData
Fields
unSD :: ByteString
Sequence labels are lazy bytestrings of ASCII characters.
Constructors
SeqLabel
Fields
unSL :: ByteString
Quality data are lazy bytestrings of Qual values.
Constructors
QualData
Fields
unQD :: ByteString
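The definitions above could be mirrored by newtypes along the following lines. This is a minimal sketch, not the module's source. The field names unSD, unSL and unQD follow the synopsis; the field names unQual and unOff, the underlying types Word8 and Int64, and the helpers residueAt and quals are assumptions made for illustration (the range 0..255 suggests a single byte, and lazy bytestrings are indexed by Int64).

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Int (Int64)
import Data.Word (Word8)

-- Assumption: a quality value fits in one byte, matching the range 0..255.
newtype Qual = Qual { unQual :: Word8 } deriving (Show, Eq, Ord)

-- Assumption: offsets use Int64, the index type of lazy bytestrings.
newtype Offset = Offset { unOff :: Int64 } deriving (Show, Eq, Ord)

-- Field names below are taken from the synopsis.
newtype SeqData = SeqData { unSD :: LC.ByteString } deriving (Show, Eq)
newtype SeqLabel = SeqLabel { unSL :: LC.ByteString } deriving (Show, Eq)
newtype QualData = QualData { unQD :: L.ByteString } deriving (Show, Eq)

-- Hypothetical helper: look up the residue at a zero-based offset.
residueAt :: SeqData -> Offset -> Maybe Char
residueAt (SeqData s) (Offset i)
  | i < 0 || i >= LC.length s = Nothing
  | otherwise = Just (LC.index s i)

-- Hypothetical helper: unpack quality data into individual Qual values.
quals :: QualData -> [Qual]
quals = map Qual . L.unpack . unQD
```

Wrapping the bytestrings in distinct newtypes keeps sequence data, labels and quality data from being mixed up at compile time, even though all three share the same underlying representation.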
Class definitions
The BioSeq class models sequence data, and any data object that represents a biological sequence should implement it.
Methods
Type | Description
---|---
s -> SeqLabel | Sequence identifier (typically the first word of the header)
s -> SeqLabel | Sequence header (may contain whitespace); by convention the first word matches the sequence identifier
s -> SeqData | Sequence data
s -> Offset | Sequence length
s -> SeqLabel | Deprecated. Instead, use seqid or seqheader
Deprecated: Warning: 'seqlabel' is deprecated, use 'seqid' or 'seqheader' instead.
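To illustrate what implementing the class might look like, here is a minimal, self-contained sketch. It does not import this module; instead it declares local stand-ins for the types and the class so that it compiles on its own. The method names seqid and seqheader are taken from the deprecation notice; seqdata and seqlength are assumed names for the remaining two methods, and FastaEntry with its fields is a hypothetical record type.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Char (isSpace)
import Data.Int (Int64)

-- Local stand-ins for the module's types, so the sketch compiles on its own.
newtype SeqLabel = SeqLabel { unSL :: LC.ByteString } deriving (Show, Eq)
newtype SeqData = SeqData { unSD :: LC.ByteString } deriving (Show, Eq)
newtype Offset = Offset Int64 deriving (Show, Eq)  -- Int64 is an assumption

-- Stand-in class. seqid and seqheader are named in the deprecation notice;
-- seqdata and seqlength are assumed names for the other two methods.
class BioSeq s where
  seqid :: s -> SeqLabel
  seqheader :: s -> SeqLabel
  seqdata :: s -> SeqData
  seqlength :: s -> Offset

-- Hypothetical record type for a single Fasta entry.
data FastaEntry = FastaEntry
  { entryHeader :: LC.ByteString  -- full header line, without the leading '>'
  , entrySeq :: LC.ByteString     -- residues, without line breaks
  }

instance BioSeq FastaEntry where
  -- The identifier is the first whitespace-delimited word of the header.
  seqid = SeqLabel . LC.takeWhile (not . isSpace) . entryHeader
  seqheader = SeqLabel . entryHeader
  seqdata = SeqData . entrySeq
  seqlength = Offset . LC.length . entrySeq
```

Deriving the identifier from the header, rather than storing it separately, keeps the convention that the first word of the header matches the identifier true by construction.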
class BioSeq sq => BioSeqQual sq where
The BioSeqQual class extends BioSeq with quality data. Any corresponding data object should be an instance; this allows output of Fasta-formatted quality data (via toFastaQual), as well as the combined FastQ format (via toFastQ).
Helper functions
toFasta :: BioSeq s => s -> ByteString
Any BioSeq can be formatted as Fasta, with 60-character lines.
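The 60-character line layout can be sketched as follows. This is an illustration under stated assumptions, not the implementation of toFasta: the names wrapAt and fastaSketch are hypothetical, and the sketch takes the header and sequence as plain lazy bytestrings rather than a BioSeq instance.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC

-- Split a lazy bytestring into chunks of at most n characters.
-- Widths below 1 are treated as 1 so that the recursion always terminates.
wrapAt :: Int -> LC.ByteString -> [LC.ByteString]
wrapAt n s
  | LC.null s = []
  | otherwise = h : wrapAt n t
  where
    (h, t) = LC.splitAt (fromIntegral (max 1 n)) s

-- Hypothetical formatter: '>' plus the header, then the sequence
-- broken into lines of at most 60 characters.
fastaSketch :: LC.ByteString -> LC.ByteString -> LC.ByteString
fastaSketch header sq = LC.unlines (LC.cons '>' header : wrapAt 60 sq)
```

For a 130-residue sequence this yields one header line followed by lines of 60, 60 and 10 residues.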
toFastaQual :: BioSeqQual s => s -> ByteString
Output Fasta-formatted quality data (.qual files), where quality values are output as whitespace-separated integers.
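As a rough sketch of the .qual layout, each quality byte is rendered as a decimal integer and the integers are joined with spaces. The name qualSketch is hypothetical, the inputs are plain lazy bytestrings rather than a BioSeqQual instance, and putting all values on a single line is a simplification; how toFastaQual wraps its lines is not stated here.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- Hypothetical formatter: a '>' header line, followed by one line holding
-- the quality values as space-separated decimal integers.
qualSketch :: LC.ByteString -> L.ByteString -> LC.ByteString
qualSketch header qs =
  LC.unlines
    [ LC.cons '>' header
    , LC.unwords (map (LC.pack . show) (L.unpack qs))
    ]
```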
toFastQ :: BioSeqQual s => s -> ByteString
Output FastQ-formatted data. For simplicity, only the Sanger quality format is supported, and only four lines per sequence (i.e. no line breaks in sequence or quality data).
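The four-line record and the Sanger encoding can be sketched like this. In the Sanger format each quality value q is written as the ASCII character with code q + 33. The names sangerEncode and fastqSketch are hypothetical, and the sketch works on plain lazy bytestrings rather than a BioSeqQual instance, so it should be read as an illustration rather than the implementation of toFastQ.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- Sanger encoding: shift each quality byte by 33 to obtain a printable
-- ASCII character. The arithmetic is on Word8, so values above 222 would
-- wrap around; Sanger qualities are normally in the range 0..93.
sangerEncode :: L.ByteString -> LC.ByteString
sangerEncode = L.map (+ 33)

-- Hypothetical formatter producing exactly four lines per record:
-- '@' header, sequence, '+' separator, encoded qualities.
fastqSketch :: LC.ByteString -> LC.ByteString -> L.ByteString -> LC.ByteString
fastqSketch header sq qs =
  LC.unlines [LC.cons '@' header, sq, LC.pack "+", sangerEncode qs]
```

For example, quality values 0 and 40 map to the characters '!' and 'I' respectively.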
module Data.Stringable