Safe Haskell | None |
---|---|
Language | Haskell2010 |
This module contains fixups for conventions local to MPI EVAN. The code is needed regularly, but it can be harmful when applied to BAM files that follow different conventions. Most importantly, no program should call these functions by default.
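One way to honour the "never by default" rule is to apply a fixup only when the user explicitly asks for it. The following is a minimal sketch under that assumption; the `Fixup` type and `selectFixups` are hypothetical names that do not exist in this module, and the caller supplies the mapping from option to actual function.

```haskell
-- Hypothetical option type; none of these names come from the module.
data Fixup = FlagAbuse | BwaFlags | Warts
  deriving (Eq, Show)

-- Build one record transformer from the fixups that were requested.
-- With an empty request list the result is the identity function,
-- so no record is modified unless the user opted in.
selectFixups :: (Fixup -> r -> r) -> [Fixup] -> r -> r
selectFixups impl requested = foldr (.) id (map impl requested)
```

A program would typically fill the request list from command line switches and pass a function that maps `FlagAbuse` to `fixupFlagAbuse`, and so on.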
Synopsis
- fixupFlagAbuse :: BamRec -> BamRec
- fixupBwaFlags :: BamRec -> BamRec
- removeWarts :: BamRec -> BamRec
Documentation
fixupFlagAbuse :: BamRec -> BamRec
Fixes abuse of flags valued 0x800 and 0x1000. We used them for low quality and low complexity, but they have since been redefined. If set, we clear them and store them into the ZQ field. Also fixes abuse of the combination of the paired, 1st mate and 2nd mate flags used to indicate merging or trimming. These are canonicalized and stored into the FF field. This function is unsafe on BAM files of unclear origin!
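The first half of this fixup, moving the two abused bits out of the flag word, can be illustrated as follows. This is only a sketch: `Rec` is a hypothetical, simplified stand-in for `BamRec`, and storing the cleared bits as an integer under `ZQ` is an assumption, since the actual encoding of the ZQ field is not described here. The canonicalization into FF is not shown for the same reason.

```haskell
import Data.Bits ((.&.), (.|.), complement)

-- Hypothetical stand-in for BamRec: a flag word plus integer-valued
-- extension fields kept as an association list.
data Rec = Rec
  { recFlag :: Int
  , recExts :: [(String, Int)]
  } deriving (Eq, Show)

-- The two flag values named in the description.
abusedBits :: Int
abusedBits = 0x800 .|. 0x1000

-- Clear the abused bits and remember which ones were set.
-- ASSUMPTION: the bits are stored verbatim as an integer under "ZQ".
moveAbusedFlags :: Rec -> Rec
moveAbusedFlags r
  | wasSet == 0 = r
  | otherwise   = r { recFlag = recFlag r .&. complement abusedBits
                    , recExts = ("ZQ", wasSet) : filter ((/= "ZQ") . fst) (recExts r)
                    }
  where
    wasSet = recFlag r .&. abusedBits
```

Records without either bit pass through unchanged, which is why the danger lies only in files where 0x800 or 0x1000 carry a different meaning.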
fixupBwaFlags :: BamRec -> BamRec
Fixes typical inconsistencies produced by BWA: sometimes 'mate unmapped' should be set, and we can tell because the read's coordinates match the mate's coordinates. Sometimes 'properly paired' should not be set, because one mate is unmapped. This function is generally safe, but it only needs to be called on the output of affected (older?) versions of BWA.
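The two repairs can be sketched on a simplified record. `Aln` and `fixBwa` are hypothetical names, and the exact test for "mate looks unmapped" is an assumption: here a paired, mapped read whose reference and position equal those of its mate is taken to have an unmapped mate, and the additional same-strand condition is my own guard against fully overlapping mate pairs, not something the description states.

```haskell
import Data.Bits ((.&.), (.|.), complement)

-- Standard SAM flag bits used below.
flagPaired, flagProperPair, flagUnmapped, flagMateUnmapped,
  flagReversed, flagMateReversed :: Int
flagPaired       = 0x1
flagProperPair   = 0x2
flagUnmapped     = 0x4
flagMateUnmapped = 0x8
flagReversed     = 0x10
flagMateReversed = 0x20

-- Hypothetical stand-in for BamRec with only the fields needed here.
data Aln = Aln
  { alnFlag    :: Int
  , alnRef     :: Int
  , alnPos     :: Int
  , alnMateRef :: Int
  , alnMatePos :: Int
  } deriving (Eq, Show)

hasFlag :: Int -> Aln -> Bool
hasFlag bit a = alnFlag a .&. bit /= 0

fixBwa :: Aln -> Aln
fixBwa a = a { alnFlag = dropProperPair withMateUnmapped }
  where
    -- ASSUMPTION: identical coordinates and strand on a paired,
    -- mapped read indicate that the mate is actually unmapped.
    mateLooksUnmapped = and
      [ hasFlag flagPaired a
      , not (hasFlag flagUnmapped a)
      , hasFlag flagReversed a == hasFlag flagMateReversed a
      , alnRef a == alnMateRef a
      , alnPos a == alnMatePos a
      ]
    withMateUnmapped
      | mateLooksUnmapped = alnFlag a .|. flagMateUnmapped
      | otherwise         = alnFlag a
    -- 'properly paired' makes no sense if either mate is unmapped.
    dropProperPair f
      | f .&. (flagUnmapped .|. flagMateUnmapped) == 0 = f
      | otherwise = f .&. complement flagProperPair
```

Both steps only ever set 'mate unmapped' or clear 'properly paired', so a record that is already consistent is returned unchanged.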
removeWarts :: BamRec -> BamRec
Removes syntactic warts from old read names or the read names used in FastQ files. Supported conventions:
- A name suffix of /1 or /2 is turned into the first mate or second mate flag and the read is flagged as paired.
- Same for name prefixes of F_ or R_, respectively.
- A name prefix of M_ flags the sequence as unpaired and merged.
- A name prefix of T_ flags the sequence as unpaired and trimmed.
- A name prefix of C_, optionally before or after any of the other prefixes, is turned into the extra flag XP:i:-1 (result of duplicate removal with unknown duplicate count).
- A collection of tags separated from the name by an octothorpe is removed and put into the fields XI and XJ as text.