Safe Haskell | None |
---|---|
Language | Haskell98 |
- data BamIndex a = BamIndex {
  - minshift :: !Int
  - depth :: !Int
  - unaln_off :: !Int64
  - extensions :: a
  - refseq_bins :: !(Vector Bins)
  - refseq_ckpoints :: !(Vector Ckpoints)
  }
- readBamIndex :: FilePath -> IO (BamIndex ())
- readBaiIndex :: MonadIO m => Iteratee ByteString m (BamIndex ())
- readTabix :: MonadIO m => Iteratee ByteString m TabIndex
- data Region = Region {}
- newtype Subsequence = Subsequence (IntMap Int)
- eneeBamRefseq :: Monad m => BamIndex b -> Refseq -> Enumeratee [BamRaw] [BamRaw] m a
- eneeBamSubseq :: Monad m => BamIndex b -> Refseq -> Subsequence -> Enumeratee [BamRaw] [BamRaw] m a
- eneeBamRegions :: Monad m => BamIndex b -> [Region] -> Enumeratee [BamRaw] [BamRaw] m a
- eneeBamUnaligned :: Monad m => BamIndex b -> Enumeratee [BamRaw] [BamRaw] m a
- subsampleBam :: (MonadIO m, MonadMask m) => FilePath -> Enumerator' BamMeta [BamRaw] m b
Documentation
Full index, unifying BAI and CSI style. Both formats use the binning scheme; its parameters are fixed in BAI but variable in CSI. Checkpoints are created from the linear index in BAI or from the loffset field in CSI.
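To make the role of the two binning parameters concrete, here is a minimal sketch of the bin calculation as given in the CSI specification, where BAI corresponds to the fixed values 14 and 5. The function name `regToBin` is hypothetical and not part of this module; its first two arguments are assumed to play the role of the `minshift` and `depth` fields.

```haskell
import Data.Bits (shiftL, shiftR)

-- | Smallest bin that fully contains the half-open, zero-based interval
-- [beg, end). Hypothetical helper following the CSI specification's
-- reg2bin; not an export of this module.
regToBin :: Int -> Int -> Int -> Int -> Int
regToBin minShift depth beg end = go depth minShift firstLeaf
  where
    end' = end - 1
    -- Number of the first bin on the deepest (smallest-bin) level.
    firstLeaf = ((1 `shiftL` (depth * 3)) - 1) `div` 7
    go l s t
      | l <= 0 = 0 -- nothing smaller fits: use the root bin
      | beg `shiftR` s == end' `shiftR` s = t + (beg `shiftR` s)
      | otherwise =
          let l' = l - 1 -- move one level up: bins are 8 times larger
          in go l' (s + 3) (t - (1 `shiftL` (l' * 3)))
```

With the BAI parameters, `regToBin 14 5 0 1` evaluates to `4681`, the first 16 kb bin, while an interval that straddles a 16 kb boundary, such as `regToBin 14 5 0 16385`, lands in the next coarser level at bin `585`.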
readBamIndex :: FilePath -> IO (BamIndex ())
Reads any index we can find for a file. If the file name has a .bai or .csi extension, we read that file. Otherwise we look for the index by adding such an extension, by replacing the existing extension with each of the two, and finally in the file itself. The first file that exists and can actually be parsed is used.
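The search order described above can be sketched as a pure function that lists candidate paths. The name `indexCandidates` is hypothetical, and the relative order of the "add extension" and "replace extension" candidates is an assumption read off the prose, not confirmed against the implementation.

```haskell
import System.FilePath (replaceExtension, takeExtension, (<.>))

-- | Candidate index locations for a file, in the order they would be tried.
-- Hypothetical helper; the real function also has to check that each
-- candidate exists and parses before settling on it.
indexCandidates :: FilePath -> [FilePath]
indexCandidates path
  | takeExtension path `elem` [".bai", ".csi"] = [path] -- already an index
  | otherwise =
      [ path <.> "bai"               -- reads.bam -> reads.bam.bai
      , path <.> "csi"               -- reads.bam -> reads.bam.csi
      , replaceExtension path "bai"  -- reads.bam -> reads.bai
      , replaceExtension path "csi"  -- reads.bam -> reads.csi
      , path                         -- finally, the file itself
      ]
```

For `"reads.bam"` this yields `reads.bam.bai`, `reads.bam.csi`, `reads.bai`, `reads.csi` and then `reads.bam` itself.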
readBaiIndex :: MonadIO m => Iteratee ByteString m (BamIndex ())
Reads an index in BAI or CSI format; the format is recognized automatically. Note that TBI is supposed to be compressed using bgzip; it must be decompressed before being passed to readBaiIndex.
readTabix :: MonadIO m => Iteratee ByteString m TabIndex
Reads a Tabix index. Tabix indices are compressed; readTabix takes care of the decompression.
newtype Subsequence
A mostly contiguous subset of a sequence, stored as a set of non-overlapping intervals in an IntMap from start position to end position (half-open intervals, naturally).
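As an illustration of this representation, the following sketch builds such a map from arbitrary intervals and tests membership. The type `Subseq` and the functions `fromIntervals` and `contains` are hypothetical stand-ins, not the module's own API; merging of touching intervals is a choice made for this example.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (sortOn)

-- | Stand-in for the documented type: maps each interval's start to its
-- (exclusive) end.
newtype Subseq = Subseq (IM.IntMap Int)

-- | Build from possibly overlapping half-open intervals, merging any that
-- overlap or touch so the stored intervals are disjoint.
fromIntervals :: [(Int, Int)] -> Subseq
fromIntervals =
  Subseq . IM.fromDistinctAscList . merge . sortOn fst . filter (\(s, e) -> s < e)
  where
    merge ((s1, e1) : (s2, e2) : rest)
      | s2 <= e1  = merge ((s1, max e1 e2) : rest)
      | otherwise = (s1, e1) : merge ((s2, e2) : rest)
    merge xs = xs

-- | Is the position covered? Find the last interval starting at or before
-- it and compare against that interval's exclusive end.
contains :: Int -> Subseq -> Bool
contains p (Subseq m) = case IM.lookupLE p m of
  Just (_, e) -> p < e
  Nothing     -> False
```

Because the intervals are half-open, `contains 30 (fromIntervals [(10, 20), (15, 30)])` is `False`, while position 29 is covered.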
eneeBamRefseq :: Monad m => BamIndex b -> Refseq -> Enumeratee [BamRaw] [BamRaw] m a
Seeks to a given sequence in a Bam file and enumerates only those records aligning to that reference. We use the first checkpoint available for the sequence. This requires an appropriate index, and the file must have been opened in such a way as to allow seeking. Enumerates over the BamRaw records of the correct sequence only; if the sequence isn't found, nothing is enumerated.
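The behaviour described here can be modelled on a plain list, with a list position standing in for a file offset. Everything in this sketch is hypothetical (`Rec`, `recordsFor`, and the checkpoint lookup function); it only mirrors the documented semantics and does not use the iteratee machinery.

```haskell
-- | Toy record: a reference id plus a name. Not the library's BamRaw.
data Rec = Rec { recRef :: Int, recName :: String }
  deriving (Eq, Show)

-- | Records of one reference from a coordinate-sorted list. The lookup
-- function models the index: it returns the position of the first
-- checkpoint for a reference, or Nothing if the reference is not indexed.
recordsFor :: (Int -> Maybe Int) -> Int -> [Rec] -> [Rec]
recordsFor checkpoint ref recs = case checkpoint ref of
  Nothing  -> [] -- sequence not found: enumerate nothing
  Just off ->
    takeWhile ((== ref) . recRef)     -- stop at the next reference
      . dropWhile ((/= ref) . recRef) -- assumption: checkpoint may be early
      $ drop off recs                 -- the "seek"
```

The `drop` is what seeking buys: records before the checkpoint are never inspected, which is why the real function needs both an index and a seekable input.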
eneeBamSubseq :: Monad m => BamIndex b -> Refseq -> Subsequence -> Enumeratee [BamRaw] [BamRaw] m a
eneeBamRegions :: Monad m => BamIndex b -> [Region] -> Enumeratee [BamRaw] [BamRaw] m a
eneeBamUnaligned :: Monad m => BamIndex b -> Enumeratee [BamRaw] [BamRaw] m a
Seeks to the part of a Bam file that contains unaligned reads and enumerates those. Sort of the dual to eneeBamRefseq. We use the best guess at where the unaligned reads start. If no such guess is available, we decode everything.
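The fallback can be sketched on a list model, where an optional position stands in for the index's guess. The names `Rec` and `unalignedFrom` are hypothetical, and filtering for unaligned records after the seek is an assumption about what "enumerates those" implies when the guess is early or missing.

```haskell
import Data.Maybe (fromMaybe, isNothing)

-- | Toy record: the reference it aligns to (Nothing if unaligned) and a
-- name. Not the library's BamRaw.
type Rec = (Maybe Int, String)

-- | Unaligned records, skipping ahead to the guessed start if there is
-- one. Without a guess we start at position 0 and scan the whole input.
unalignedFrom :: Maybe Int -> [Rec] -> [Rec]
unalignedFrom guess recs =
  filter (isNothing . fst) (drop (fromMaybe 0 guess) recs)
```

As long as the guess does not overshoot the first unaligned record, both paths return the same records; the guess only saves decoding the aligned part of the file.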
subsampleBam :: (MonadIO m, MonadMask m) => FilePath -> Enumerator' BamMeta [BamRaw] m b
Subsample randomly from a BAM file. If an index exists, this produces an infinite stream taken from random locations in the file.