Bio.Sequence.SeqData

bio-0.3.3.2: A bioinformatics library

Contents

Data structure
Accessor functions
Converting to and from [Char]
Nucleotide functionality
Protein functionality

Description

Data structures for manipulating (biological) sequences.

The module generally supports both nucleotide and protein sequences, but some functions, like revcompl, only make sense for nucleotides.

Synopsis

data Sequence = Seq !SeqData !SeqData !(Maybe QualData)

type Offset = Int64

type SeqData = ByteString

type Qual = Word8

type QualData = ByteString

(!) :: Sequence -> Offset -> Char

seqlength :: Sequence -> Offset

seqlabel :: Sequence -> SeqData

seqheader :: Sequence -> SeqData

seqdata :: Sequence -> SeqData

(?) :: Sequence -> Offset -> Qual

hasqual :: Sequence -> Bool

seqqual :: Sequence -> QualData

fromStr :: String -> SeqData

toStr :: SeqData -> String

compl :: Char -> Char

revcompl :: Sequence -> Sequence

data Amino

translate :: Sequence -> Offset -> [Amino]

fromIUPAC :: SeqData -> [Amino]

toIUPAC :: [Amino] -> SeqData

Data structure

A sequence is a header, the sequence data itself, and optional quality data. All items are lazy bytestrings. The Offset type can be used for indexing.

data Sequence

A sequence consists of a header, the sequence data itself, and optional quality data.

Constructors

Seq !SeqData !SeqData !(Maybe QualData)

header and actual sequence
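
To make the shape of the constructor concrete, here is a minimal sketch that re-declares the types locally instead of importing them from the bio package. The names example, headerOf and lengthOf are hypothetical helpers introduced only for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Local stand-ins mirroring the declarations documented on this page;
-- they are not imported from the bio package.
type Offset   = Int64
type SeqData  = B.ByteString
type QualData = B.ByteString

data Sequence = Seq !SeqData !SeqData !(Maybe QualData)

-- Hypothetical example value: header, sequence data, no quality data.
example :: Sequence
example = Seq (B.pack "seq1 sample read") (B.pack "ACGTTGCA") Nothing

-- Pattern matching on the constructor recovers the individual fields.
headerOf :: Sequence -> SeqData
headerOf (Seq h _ _) = h

-- The length of a lazy ByteString is an Int64, which matches Offset.
lengthOf :: Sequence -> Offset
lengthOf (Seq _ d _) = B.length d
```

Because all three fields are strict, constructing a Seq forces each field to weak head normal form, although the lazy bytestrings themselves may still be read chunk by chunk.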

type Offset = Int64

An offset, index, or length of a SeqData

type SeqData = ByteString

The basic data type used in Sequences

Quality data is normally associated with nucleotide sequences

type Qual = Word8

Basic type for quality data. Range 0..255. Typical Phred output is in the range 6..50, with 20 as the line in the sand separating good from bad.
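
For readers unfamiliar with Phred scores, the sketch below uses the standard Phred definition Q = -10 * log10(p) to turn a Qual value into an error probability. The names errorProb and isGoodQual are hypothetical and not part of this module; the threshold of 20 is simply the one mentioned above.

```haskell
import Data.Word (Word8)

-- Local stand-in for the Qual type documented above.
type Qual = Word8

-- Standard Phred relation: Q = -10 * log10 p, hence p = 10 ** (-Q / 10).
errorProb :: Qual -> Double
errorProb q = 10 ** (negate (fromIntegral q) / 10)

-- Hypothetical helper using 20 as the cut-off between good and bad.
isGoodQual :: Qual -> Bool
isGoodQual q = q >= 20
```

Under this relation a quality of 20 corresponds to an error probability of about 1 in 100, and a quality of 30 to about 1 in 1000.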

type QualData = ByteString

Quality data is a Qual vector, currently implemented as a ByteString.

Accessor functions

(!) :: Sequence -> Offset -> Char

Read the character at the specified position in the sequence.

seqlength :: Sequence -> Offset

Return sequence length.

seqlabel :: Sequence -> SeqData

Return sequence label (first word of header)
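
As an illustration of what "first word of header" means, the following sketch extracts everything up to the first whitespace character of a lazy ByteString. The name labelOf is hypothetical, and the library may implement seqlabel differently.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Char (isSpace)

-- Hypothetical helper: keep the leading run of non-whitespace characters.
labelOf :: B.ByteString -> B.ByteString
labelOf = B.takeWhile (not . isSpace)
```

For a header such as "gi|123 Homo sapiens", this sketch yields "gi|123".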

seqheader :: Sequence -> SeqData

Return full header.

seqdata :: Sequence -> SeqData

Return the sequence data.

(?) :: Sequence -> Offset -> Qual

Read the quality value at the specified position in the sequence.

hasqual :: Sequence -> Bool

Check whether the sequence has associated quality data.

seqqual :: Sequence -> QualData

Return the quality data, or raise an error if none exists. Use hasqual if in doubt.
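
Since asking for quality data that is absent is an error, callers may prefer to check first. The sketch below shows one way to do that with Maybe, using locally re-declared types. The names hasQuality, qualMaybe and meanQual are hypothetical and not part of this module.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Maybe (isJust)

-- Local stand-ins for the types documented on this page.
type SeqData  = B.ByteString
type QualData = BL.ByteString

data Sequence = Seq !SeqData !SeqData !(Maybe QualData)

-- Hypothetical example values, one with and one without quality data.
withQual, withoutQual :: Sequence
withQual    = Seq (B.pack "r1") (B.pack "ACG") (Just (BL.pack [10, 20, 30]))
withoutQual = Seq (B.pack "r2") (B.pack "ACG") Nothing

-- Mirrors the check described for hasqual.
hasQuality :: Sequence -> Bool
hasQuality (Seq _ _ mq) = isJust mq

-- Total alternative to an erroring accessor: Nothing when quality is absent.
qualMaybe :: Sequence -> Maybe QualData
qualMaybe (Seq _ _ mq) = mq

-- Mean quality, defined only when non-empty quality data is present.
meanQual :: Sequence -> Maybe Double
meanQual s = case qualMaybe s of
  Just q | not (BL.null q) ->
    let total = sum (map fromIntegral (BL.unpack q)) :: Int
    in  Just (fromIntegral total / fromIntegral (BL.length q))
  _ -> Nothing
```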

Converting to and from [Char]

fromStr :: String -> SeqData

Convert a String to SeqData

toStr :: SeqData -> String

Convert a SeqData to a String
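
Because SeqData is a lazy ByteString, conversions of this kind can be sketched with pack and unpack from Data.ByteString.Lazy.Char8. The names strToData, dataToStr and charAt are hypothetical, and whether the library is implemented exactly this way is an assumption.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Local stand-in for the SeqData type documented above.
type SeqData = B.ByteString

-- Char8 packing keeps only the low 8 bits of each character,
-- which is sufficient for ASCII sequence alphabets.
strToData :: String -> SeqData
strToData = B.pack

dataToStr :: SeqData -> String
dataToStr = B.unpack

-- B.index is zero-based and takes an Int64 position.
charAt :: SeqData -> Int64 -> Char
charAt = B.index
```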

Nucleotide functionality

Nucleotide sequences contain the alphabet [A,C,G,T]. IUPAC specifies an extended nucleotide alphabet with wildcards, but it is not supported at this point.

compl :: Char -> Char

Complement a single character, i.e. identify the nucleotide it can hybridize with. Note that for a sequence of several nucleotides, you usually want the reverse complement (see revcompl).

revcompl :: Sequence -> Sequence

Calculate the reverse complement. This is only relevant for the nucleotide alphabet, and it leaves other characters unmodified.
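
The relationship between complementing and reverse complementing can be sketched as follows on raw sequence data. This is a simplified illustration with hypothetical names (complChar, revComplData): it handles only upper-case A, C, G and T, passes every other character through unchanged, and does not deal with the header or the quality data that a full Sequence carries.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Complement one upper-case nucleotide; anything else is left as is.
complChar :: Char -> Char
complChar 'A' = 'T'
complChar 'T' = 'A'
complChar 'C' = 'G'
complChar 'G' = 'C'
complChar c   = c

-- Reverse the data, then complement each character.
revComplData :: B.ByteString -> B.ByteString
revComplData = B.map complChar . B.reverse
```

Applying revComplData twice returns the original data, since both reversal and complementation are their own inverses.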

Protein functionality

Proteins are chains of amino acids, represented by the IUPAC alphabet.

data Amino

Constructors

Ala
Arg
Asn
Asp
Cys
Gln
Glu
Gly
His
Ile
Leu
Lys
Met
Phe
Pro
Ser
Thr
Tyr
Trp
Val
STP
Asx
Glx
Xle
Xaa

translate :: Sequence -> Offset -> [Amino]

Translate a nucleotide sequence into the corresponding protein sequence. This works rather blindly, with no attempt to identify ORFs or otherwise QA the result.
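
A "blind" translation of this kind can be sketched as below. The sketch assumes that the Offset argument is the position at which translation starts, uses the standard genetic code, and returns one-letter codes as a String instead of the module's Amino type. All names (codeTable, baseIndex, codonToAA, translateFrom) are hypothetical.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)
import Data.List (elemIndex)

-- Standard genetic code as one-letter amino acid codes ('*' is stop),
-- with each codon position ordered T, C, A, G.
codeTable :: String
codeTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

-- Position of a base in the T, C, A, G ordering, if it is one of those.
baseIndex :: Char -> Maybe Int
baseIndex c = elemIndex c "TCAG"

-- Look up one codon; codons with any other character become 'X'.
codonToAA :: Char -> Char -> Char -> Char
codonToAA a b c = case (baseIndex a, baseIndex b, baseIndex c) of
  (Just i, Just j, Just k) -> codeTable !! (16 * i + 4 * j + k)
  _                        -> 'X'

-- Translate consecutive codons starting at the given offset.
-- A trailing partial codon is dropped; stop codons do not end translation.
translateFrom :: B.ByteString -> Int64 -> String
translateFrom s off = go (B.unpack (B.drop off s))
  where
    go (a:b:c:rest) = codonToAA a b c : go rest
    go _            = []
```

As with the documented function, nothing here looks for open reading frames: the input "ATGGCCTAA" at offset 0 gives "MA*", and the same input at offset 1 gives "WP".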

fromIUPAC :: SeqData -> [Amino]

Convert a sequence in IUPAC format to a list of amino acids.

toIUPAC :: [Amino] -> SeqData

Convert a list of amino acids to a sequence in IUPAC format.
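
The two conversions can be sketched with a lookup table between the standard IUPAC one-letter codes and a locally re-declared Amino type. The names iupacTable, fromCodes and toCodes are hypothetical, and mapping unrecognised characters to Xaa is an assumption of this sketch, not documented library behaviour.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Maybe (fromMaybe)
import Data.Tuple (swap)

-- Local stand-in for the Amino type documented above.
data Amino = Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile
           | Leu | Lys | Met | Phe | Pro | Ser | Thr | Tyr | Trp | Val
           | STP | Asx | Glx | Xle | Xaa
  deriving (Show, Eq)

-- IUPAC one-letter codes; '*' denotes a stop.
iupacTable :: [(Char, Amino)]
iupacTable =
  [ ('A', Ala), ('R', Arg), ('N', Asn), ('D', Asp), ('C', Cys)
  , ('Q', Gln), ('E', Glu), ('G', Gly), ('H', His), ('I', Ile)
  , ('L', Leu), ('K', Lys), ('M', Met), ('F', Phe), ('P', Pro)
  , ('S', Ser), ('T', Thr), ('Y', Tyr), ('W', Trp), ('V', Val)
  , ('*', STP), ('B', Asx), ('Z', Glx), ('J', Xle), ('X', Xaa)
  ]

-- Assumption: characters outside the table are treated as Xaa.
fromCodes :: B.ByteString -> [Amino]
fromCodes = map (\c -> fromMaybe Xaa (lookup c iupacTable)) . B.unpack

-- Every constructor appears in the table, so the 'X' default is never used.
toCodes :: [Amino] -> B.ByteString
toCodes = B.pack . map (\a -> fromMaybe 'X' (lookup a (map swap iupacTable)))
```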

Produced by Haddock version 2.4.2