Safe Haskell	None
Language	Haskell98

Bio.Base

Description

Common data types used everywhere. This module is a collection of very basic "bioinformatics" data types that are simple, but don't make sense to define over and over.

Synopsis

Documentation

newtype Nucleotide Source

A nucleotide base. We only represent A,C,G,T.

Constructors

N
Fields unN :: Word8

newtype Nucleotides Source

A nucleotide base in an alignment. Experience says we're dealing with Ns and gaps all the type, so purity be damned, they are included as if they were real bases.

To allow Nucleotidess to be unpacked and incorparated into containers, we choose to represent them the same way as the BAM file format: as a 4 bit wide field. Gaps are encoded as 0 where they make sense, N is 15.

Constructors

Ns
Fields unNs :: Word8

newtype Qual Source

Qualities are stored in deciban, also known as the Phred scale. To represent a value p, we store -10 * log_10 p. Operations work directly on the "Phred" value, as the name suggests. The same goes for the Ord instance: greater quality means higher "Phred" score, meand lower error probability.

Constructors

Q
Fields unQ :: Word8

Instances

Bounded Qual Source
Eq Qual Source
Ord Qual Source
Show Qual Source
Storable Qual Source
Unbox Qual Source
Vector Vector Qual Source
MVector MVector Qual Source
data Vector Qual = V_Qual (Vector Word8) Source
data MVector s0 Qual = MV_Qual (MVector s Word8) Source

toQual :: (Floating a, RealFrac a) => a -> Qual Source

fromQual :: Qual -> Double Source

fromQualRaised :: Double -> Qual -> Double Source

probToQual :: Prob -> Qual Source

newtype Prob Source

A positive Double value stored in log domain. We store the natural logarithm (makes computation easier), but allow conversions to the familiar "Phred" scale used for Qual values.

Constructors

Pr
Fields unPr :: Double

Instances

Eq Prob Source
Fractional Prob Source
Num Prob Source
Ord Prob Source
Show Prob Source
Storable Prob Source
Unbox Prob Source
Vector Vector Prob Source
MVector MVector Prob Source
data Vector Prob = V_Prob (Vector Double) Source
data MVector s0 Prob = MV_Prob (MVector s Double) Source

toProb :: Double -> Prob Source

fromProb :: Prob -> Double Source

qualToProb :: Qual -> Prob Source

pow :: Prob -> Double -> Prob infixr 8 Source

data Word8 :: *

8-bit unsigned integer type

Instances

Bounded Word8
Enum Word8
Eq Word8
Integral Word8
Num Word8
Ord Word8
Read Word8
Real Word8
Show Word8
Ix Word8
Storable Word8
Bits Word8
FiniteBits Word8
Binary Word8
Hashable Word8
Prim Word8
Random Word8
Lift Word8
Unbox Word8
IArray UArray Word8
ReadableChunk ByteString Word8
ReadableChunk ByteString Word8
Vector Vector Word8
MVector MVector Word8
ReadableChunk [Word8] Word8
MArray (STUArray s) Word8 (ST s)
VecArrayRW ((:.) Word8 ())
VecArrayRW ((:.) Word8 v) => VecArrayRW ((:.) Word8 ((:.) Word8 v))
data Vector Word8 = V_Word8 (Vector Word8)
data MVector s Word8 = MV_Word8 (MVector s Word8)

nucA :: Nucleotide Source

nucC :: Nucleotide Source

nucG :: Nucleotide Source

nucT :: Nucleotide Source

nucsA :: Nucleotides Source

nucsC :: Nucleotides Source

nucsG :: Nucleotides Source

nucsT :: Nucleotides Source

nucsN :: Nucleotides Source

gap :: Nucleotides Source

toNucleotide :: Char -> Nucleotide Source

Converts a character into a Nucleotides. The usual codes for A,C,G,T and U are understood, - and . become gaps and everything else is an N.

toNucleotides :: Char -> Nucleotides Source

Converts a character into a Nucleotides. The usual codes for A,C,G,T and U are understood, - and . become gaps and everything else is an N.

nucToNucs :: Nucleotide -> Nucleotides Source

showNucleotide :: Nucleotide -> Char Source

showNucleotides :: Nucleotides -> Char Source

isGap :: Nucleotides -> Bool Source

Tests if a Nucleotides is a gap. Returns true only for the gap.

isBase :: Nucleotides -> Bool Source

Tests if a Nucleotides is a base. Returns True for everything but gaps.

isProperBase :: Nucleotides -> Bool Source

Tests if a Nucleotides is a proper base. Returns True for A,C,G,T only.

properBases :: [Nucleotides] Source

compl :: Nucleotide -> Nucleotide Source

Complements a Nucleotides.

compls :: Nucleotides -> Nucleotides Source

Complements a Nucleotides.

everything :: (Bounded a, Ix a) => [a] Source

type Seqid = ByteString Source

Sequence identifiers are ASCII strings Since we tend to store them for a while, we use strict byte strings. If you get a lazy bytestring from somewhere, use shelve to convert it for storage. Use unpackSeqid and packSeqid to avoid the import of Data.ByteString.

unpackSeqid :: Seqid -> String Source

Unpacks a Seqid into a String

packSeqid :: String -> Seqid Source

Packs a String into a Seqid. Only works for ASCII subset.

data Position Source

Coordinates in a genome. The position is zero-based, no questions about it. Think of the position as pointing to the crack between two bases: looking forward you see the next base to the right, looking in the reverse direction you see the complement of the first base to the left.

To encode the strand, we (virtually) reverse-complement any sequence and prepend it to the normal one. That way, reversed coordinates have a negative sign and automatically make sense. Position 0 could either be the beginning of the sequence or the end on the reverse strand... that ambiguity shouldn't really matter.

Constructors

Pos
Fields p_seq :: !Seqid sequence (e.g. some chromosome) p_start :: !Int offset, zero-based

Instances

Eq Position Source
Ord Position Source
Show Position Source

shiftPosition :: Int -> Position -> Position Source

Moves a Position. The position is moved forward according to the strand, negative indexes move backward accordingly.

p_is_reverse :: Position -> Bool Source

data Range Source

Ranges in genomes We combine a position with a length. In 'Range pos len', pos is always the start of a stretch of length len. Positions therefore move in the opposite direction on the reverse strand. To get the same stretch on the reverse strand, shift r_pos by r_length, then reverse direction (or call reverseRange).

Constructors

Range
Fields r_pos :: !Position r_length :: !Int

Instances

Eq Range Source
Ord Range Source
Show Range Source

shiftRange :: Int -> Range -> Range Source

Moves a Range. This is just shiftPosition lifted.

reverseRange :: Range -> Range Source

Reverses a Range to give the same Range on the opposite strand.

extendRange :: Int -> Range -> Range Source

Extends a range. The length of the range is simply increased.

insideRange :: Range -> Range -> Range Source

Expands a subrange. (range1 insideRange range2) interprets range1 as a subrange of range2 and computes its absolute coordinates. The sequence name of range1 is ignored.

wrapRange :: Int -> Range -> Range Source

Wraps a range to a region. This simply normalizes the start position to be in the interval '[0,n)', which only makes sense if the Range is to be mapped onto a circular genome. This works on both strands and the strand information is retained.

w2c :: Word8 -> Char

Conversion between Word8 and Char. Should compile to a no-op.

c2w :: Char -> Word8

Unsafe conversion between Char and Word8. This is a no-op and silently truncates to 8 bits Chars > '\255'. It is provided as convenience for ByteString construction.

findAuxFile :: FilePath -> IO FilePath Source

Finds a file by searching the environment variable BIOHAZARD like a PATH.

Bounded Nucleotide Source
Enum Nucleotide Source
Eq Nucleotide Source
Ord Nucleotide Source
Read Nucleotide Source
Show Nucleotide Source
Ix Nucleotide Source
Storable Nucleotide Source
Unbox Nucleotide Source
Vector Vector Nucleotide Source
MVector MVector Nucleotide Source
data Vector Nucleotide = V_Nucleotide (Vector Word8) Source
data MVector s0 Nucleotide = MV_Nucleotide (MVector s Word8) Source

Bounded Nucleotides Source
Enum Nucleotides Source
Eq Nucleotides Source
Ord Nucleotides Source
Read Nucleotides Source
Show Nucleotides Source
Ix Nucleotides Source
Storable Nucleotides Source
Unbox Nucleotides Source
Vector Vector Nucleotides Source
Vector Vector_Nucs_half Nucleotides
MVector MVector Nucleotides Source
Show (Vector_Nucs_half Nucleotides)
data Vector Nucleotides = V_Nucleotides (Vector Word8) Source
data MVector s0 Nucleotides = MV_Nucleotides (MVector s Word8) Source