|
|
|
|
|
Description
Data structures for manipulating (biological) sequences.
It generally supports both nucleotide and protein sequences, although some functions,
like revcompl, only make sense for nucleotides.
|
|
Synopsis
|
|
|
|
Data structure
|
|
A sequence is a header, sequence data itself, and optional quality data.
Sequences are type-tagged to identify them as nucleotide, amino acid,
or unknown type.
All items are lazy bytestrings. The Offset type can be used for indexing.
|
|
|
A sequence consists of a header, the sequence data itself, and optional quality data.
The type parameter is a phantom type to separate nucleotide and amino acid sequences
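The phantom-type tagging described above can be sketched as follows. This is a minimal illustration, not the module's actual source: the tag names Nuc and Unknown, the constructor name Seq, the exact aliases, and the helpers gcCount and asNuc are assumptions made for this example.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)
import Data.Word (Word8)

-- Assumed aliases: the text only says that all items are lazy bytestrings.
type SeqData  = B.ByteString
type QualData = B.ByteString
type Qual     = Word8
type Offset   = Int64

-- Type tags without constructors (names are assumptions for this sketch).
data Nuc
data Unknown

-- Header, sequence data, optional quality data. The parameter t is a
-- phantom: it never appears on the right-hand side, so it costs nothing
-- at run time and only restricts which functions accept the value.
data Sequence t = Seq SeqData SeqData (Maybe QualData)

-- Hypothetical helper that only accepts nucleotide-tagged sequences.
gcCount :: Sequence Nuc -> Offset
gcCount (Seq _ d _) = B.count 'G' d + B.count 'C' d

-- Hypothetical helper: retag a sequence once its alphabet is known.
asNuc :: Sequence Unknown -> Sequence Nuc
asNuc (Seq h d q) = Seq h d q
```

With these definitions, passing a Sequence Unknown directly to gcCount is rejected by the type checker; the value has to go through asNuc first.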
|
|
|
An offset, index, or length of a SeqData
|
|
|
The basic data type used in Sequences
|
|
Quality data is normally associated with nucleotide sequences
|
|
|
Basic type for quality data. Range 0..255. Typical Phred output is in
the range 6..50, with 20 as the line in the sand separating good from bad.
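For reference, a Phred score Q corresponds to an error probability of 10^(-Q/10), so the threshold of 20 means roughly one miscalled base in a hundred. A small sketch of that relation, where errorProb and isGood are hypothetical helper names and Qual is assumed to be a Word8:

```haskell
import Data.Word (Word8)

-- Assumed representation of a single quality value (range 0..255).
type Qual = Word8

-- Error probability implied by a Phred score: P = 10 ^ (-Q / 10).
errorProb :: Qual -> Double
errorProb q = 10 ** (negate (fromIntegral q) / 10)

-- Hypothetical helper: apply the "line in the sand" at 20.
isGood :: Qual -> Bool
isGood q = q >= 20
```

Here errorProb 20 evaluates to about 0.01, and errorProb 50 to about 0.00001.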
|
|
|
Quality data is a Qual vector, currently implemented as a ByteString.
|
|
Accessor functions
|
|
|
Read the character at the specified position in the sequence.
|
|
|
Return sequence length.
|
|
|
Return sequence label (first word of header)
|
|
|
Return full header.
|
|
|
Return the sequence data.
|
|
|
|
|
Check whether the sequence has associated quality data.
|
|
|
Return the quality data, or raise an error if none exists. Use hasqual first if in doubt.
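The accessor behaviour described in this section can be sketched on a simplified, untagged record. The names charAt, hasQuality, and quality below are stand-ins chosen for this example and are not the module's own identifiers; the point is the pattern of checking for quality data before extracting it.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)
import Data.Maybe (isJust)

-- Simplified stand-in: header, sequence data, optional quality data.
data SimpleSeq = SimpleSeq B.ByteString B.ByteString (Maybe B.ByteString)

-- Character at a zero-based position (lazy bytestrings index with Int64).
charAt :: SimpleSeq -> Int64 -> Char
charAt (SimpleSeq _ d _) i = B.index d i

-- Does the sequence carry quality data?
hasQuality :: SimpleSeq -> Bool
hasQuality (SimpleSeq _ _ q) = isJust q

-- Partial accessor: fails when there is no quality data.
quality :: SimpleSeq -> B.ByteString
quality (SimpleSeq _ _ (Just q)) = q
quality _                        = error "quality: no quality data"

-- Total alternative for callers that prefer a Maybe over a run-time error.
qualityMaybe :: SimpleSeq -> Maybe B.ByteString
qualityMaybe s = if hasQuality s then Just (quality s) else Nothing
```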
|
|
Adding information to header
|
|
|
|
|
Modify the header by appending text, or by replacing
all but the sequence label (i.e. first word).
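A sketch of the header operations on bare lazy bytestrings, assuming the label is everything up to the first space. The names label, appendText, and replaceText are hypothetical and operate on the header alone rather than on a whole sequence.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Sequence label: the first word of the header.
label :: B.ByteString -> B.ByteString
label = B.takeWhile (/= ' ')

-- Append text to the header, separated by a single space.
appendText :: B.ByteString -> String -> B.ByteString
appendText hdr txt = B.concat [hdr, B.pack (' ' : txt)]

-- Keep the label and replace everything after it.
replaceText :: B.ByteString -> String -> B.ByteString
replaceText hdr txt = B.concat [label hdr, B.pack (' ' : txt)]
```

For a header "seq1 old text", replaceText with "new" gives "seq1 new", while appendText with "new" gives "seq1 old text new".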
|
|
Converting to and from [Char]
|
|
|
Convert a String to SeqData
|
|
|
Convert a SeqData to a String
|
|
Sequence utilities
|
|
|
Returns a sequence with all internal storage freshly copied and
with sequence and quality data present as a single chunk.
By freshly copying internal storage, defragSeq allows garbage
collection of the original data source whence the sequence was
read; otherwise, use of just a short sequence name can cause an
entire sequence file buffer to be retained.
By compacting sequence data into a single chunk, defragSeq avoids
linear-time traversal of sequence chunks during random access into
sequence data.
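The copying and compaction step can be illustrated on a single lazy bytestring. This is a sketch of the idea only, using the bytestring package; defrag is a hypothetical name, and the real function applies the same treatment to the header, sequence data, and quality data of a sequence.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Concatenate all chunks into one strict buffer, copy it so that it no
-- longer shares memory with the original input buffer, and wrap the
-- result as a single-chunk lazy bytestring.
defrag :: L.ByteString -> L.ByteString
defrag = L.fromChunks . (: []) . S.copy . S.concat . L.toChunks
```

The explicit S.copy matters for input that is already a single chunk: without it, the result could still be a slice of the large file buffer that the sequence was read from.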
|
|
Nucleotide functionality
|
|
Nucleotide sequences contain the alphabet [A,C,G,T].
IUPAC specifies an extended nucleotide alphabet with wildcards, but
it is not supported at this point.
|
|
|
Complement a single character, i.e. identify the nucleotide it
can hybridize with. Note that for multiple nucleotides, you usually
want the reverse complement (see revcompl for that).
|
|
|
Calculate the reverse complement.
This is only relevant for the nucleotide alphabet,
and it leaves other characters unmodified.
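A minimal sketch of complementing and reverse-complementing sequence data, restricted to the [A,C,G,T] alphabet in both cases and leaving every other character untouched. The names complChar and revcomplData are chosen for this example, and handling of quality data is left out.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Complement one nucleotide; anything outside A, C, G, T is returned as is.
complChar :: Char -> Char
complChar c = case c of
  'A' -> 'T'
  'T' -> 'A'
  'C' -> 'G'
  'G' -> 'C'
  'a' -> 't'
  't' -> 'a'
  'c' -> 'g'
  'g' -> 'c'
  _   -> c

-- Reverse complement: reverse the data, then complement each position.
revcomplData :: B.ByteString -> B.ByteString
revcomplData = B.map complChar . B.reverse
```

For example, the reverse complement of "AACGT" is "ACGTT". Applying revcomplData twice returns the original data.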
|
|
|
Calculate the reverse complement for SeqData only.
|
|
|
For type tagging sequences (protein sequences use Amino below)
|
|
|
|
Phantom type functionality
|
|
Protein functionality
|
|
Proteins are chains of amino acids, represented by the IUPAC alphabet.
|
|
|
Constructors: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Tyr, Trp, Val, STP, Asx, Glx, Xle, Xaa
|
|
|
Translate a nucleotide sequence into the corresponding protein
sequence. This works rather blindly, with no attempt to identify ORFs
or otherwise QA the result.
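"Blind" translation can be sketched as reading the data three bases at a time from the first position and looking each codon up in the standard genetic code. The version below works on plain Strings and returns one-letter codes; the names aminoTable, codon, and translateStr are assumptions for this example, as is the choice to map unrecognised codons to 'X' and to drop a trailing incomplete codon.

```haskell
import Data.Char (toUpper)
import Data.List (elemIndex)

-- Standard genetic code as one-letter amino acids ('*' marks a stop),
-- laid out for codons whose bases are ordered T, C, A, G.
aminoTable :: String
aminoTable = "FFLLSSSSYY**CC*W" ++ "LLLLPPPPHHQQRRRR"
          ++ "IIIMTTTTNNKKSSRR" ++ "VVVVAAAADDEEGGGG"

-- Translate a single codon; codons with other characters become 'X'.
codon :: Char -> Char -> Char -> Char
codon a b c =
  case mapM (\x -> elemIndex (toUpper x) "TCAG") [a, b, c] of
    Just [i, j, k] -> aminoTable !! (16 * i + 4 * j + k)
    _              -> 'X'

-- Translate from the first base onward, with no ORF detection.
translateStr :: String -> String
translateStr (a:b:c:rest) = codon a b c : translateStr rest
translateStr _            = []
```

For example, translateStr "ATGGCCTAA" gives "MA*": the stop codon is translated like any other codon, and translation would continue past it.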
|
|
|
Convert a sequence in IUPAC format to a list of amino acids.
|
|
|
Convert a list of amino acids to a sequence in IUPAC format.
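The conversion between one-letter IUPAC codes and the amino acid constructors amounts to a table lookup in either direction. A sketch on plain Strings, where toLetters and fromLetters are hypothetical names and unrecognised letters are assumed to map to Xaa:

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

data Amino = Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile
           | Leu | Lys | Met | Phe | Pro | Ser | Thr | Tyr | Trp | Val
           | STP | Asx | Glx | Xle | Xaa
  deriving (Show, Eq, Enum, Bounded)

-- One-letter codes in constructor order; '*' stands for a stop.
iupacLetters :: [(Amino, Char)]
iupacLetters = zip [minBound .. maxBound] "ARNDCQEGHILKMFPSTYWV*BZJX"

-- Amino acids to one-letter codes.
toLetters :: [Amino] -> String
toLetters = map (\a -> fromMaybe 'X' (lookup a iupacLetters))

-- One-letter codes to amino acids; unknown letters become Xaa.
fromLetters :: String -> [Amino]
fromLetters = map (\c -> fromMaybe Xaa (lookup (toUpper c) byLetter))
  where byLetter = [(c, a) | (a, c) <- iupacLetters]
```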
|
|
|
|
Display a nicely formatted sequence.
|
|
|
A simple function to display a sequence: we generate the sequence string and
call putStrLn.
|
|
|
Returns a properly formatted and probably highlighted string
representation of a sequence. Highlighting is done using ANSI escape
sequences.
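As an illustration of ANSI-escape highlighting, the sketch below wraps each recognised nucleotide in a colour code followed by a reset. The colour assignment and the names ansiCode, colourBase, and highlight are assumptions for this example and say nothing about the scheme the module actually uses.

```haskell
-- Hypothetical colour scheme: SGR foreground codes per nucleotide.
ansiCode :: Char -> Maybe Int
ansiCode c = lookup c [('A', 32), ('C', 34), ('G', 33), ('T', 31)]

-- Wrap one character in "ESC [ <code> m" ... "ESC [ 0 m" (reset);
-- characters without an assigned colour are passed through unchanged.
colourBase :: Char -> String
colourBase c = case ansiCode c of
  Just n  -> "\ESC[" ++ show n ++ "m" ++ [c] ++ "\ESC[0m"
  Nothing -> [c]

-- Highlight a whole sequence string.
highlight :: String -> String
highlight = concatMap colourBase
```

Printing the result with putStrLn on a terminal that understands ANSI escapes shows the coloured bases; on other outputs the escape codes appear as literal characters.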
|
|
Default type for sequences
|
|
|
|
Produced by Haddock version 2.4.2