ab1Parser: Modules for parsing, generating and manipulating AB1 files.


Properties

Versions: 0.2.1.1
Change log: None available
Dependencies: ab1Parser, base (>=4.7 && <5), binary, bytestring, directory, filepath, hscolour, pretty-show, protolude, safe-exceptions, split, text
License: BSD-3-Clause
Copyright: 2018 HyraxBio
Author: HyraxBio
Maintainer: andre@hyraxbio.co.za
Category: Bioinformatics
Home page: https://github.com/hyraxbio/hyraxAbi/#readme
Source repo: head: git clone https://github.com/hyraxbio/hyraxAbi
Uploaded: by andrevdm at 2018-07-08T13:52:56Z


Readme for ab1Parser-0.2.1.1


HyraxBio AB1 parser and generator (beta 0.2)

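This project contains modules for parsing, generating and manipulating AB1 files, together with a terminal app (ab1Parser-exe) built on them.

See http://www6.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf for a high-level overview of the AB1 file format.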

Terminal app

Dump AB1

To dump an existing AB1, run:

ab1Parser-exe dump example.ab1

This outputs the structure of the AB1, like this:

Header { hName = "ABIF" , hVersion = 101 }
Directory
  { dTagName = "tdir"
  , dTagNum = 1
  , dElemTypeCode = 1023
  , dElemTypeDesc = "root"
  , dElemType = ElemRoot
  , dElemSize = 28
  , dElemNum = 13
  , dDataSize = 364
  , dDataOffset = 61980
  , dData = ""
  , dDataDebug = []
  }
[ Directory
    { dTagName = "DATA"
    , dTagNum = 9
    , dElemTypeCode = 4
    , dElemTypeDesc = "short"
    , dElemType = ElemShort
    , dElemSize = 2
    , dElemNum = 7440
    , dDataSize = 14880
    , dDataOffset = 128
    , dData = ""
    , dDataDebug = []
    }
    
.
.
.

DATA {short} tagNum=9 size=2 count=7440 offset=128  []
DATA {short} tagNum=10 size=2 count=7440 offset=15008  []
DATA {short} tagNum=11 size=2 count=7440 offset=29888  []
DATA {short} tagNum=12 size=2 count=7440 offset=44768  []
FWO_ {char} tagNum=1 size=1 count=4 offset=1195463747  ["GATC"]
LANE {short} tagNum=1 size=2 count=1 offset=65536  ["1"]
PBAS {char} tagNum=1 size=1 count=744 offset=59648  ["GGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGA"]
PDMF {pString} tagNum=1 size=1 count=23 offset=60392  ["KB_3500_POP7_BDTv3.mob"]
PDMF {pString} tagNum=2 size=1 count=23 offset=60415  ["KB_3500_POP7_BDTv3.mob"]
PLOC {short} tagNum=1 size=2 count=744 offset=60438  []
S/N% {short} tagNum=1 size=2 count=4 offset=61926  []
SMPL {pString} tagNum=1 size=1 count=10 offset=61934  ["S17-SeqF1"]
CMNT {pString} tagNum=1 size=1 count=1 offset=61944  ["Generated by HyraxBio AB1 generator"]

The data is output twice. The first section is the detail, the second is the summary.
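
To make the relationship between the two sections concrete, here is a minimal Haskell sketch that renders a summary line from a detail record. The record is reconstructed from the field names in the printed output and keeps only the fields the summary uses. The field types and the helpers summaryLine and sizeConsistent are assumptions for illustration, not the package's actual definitions.

```haskell
-- Hypothetical record, reconstructed from the field names in the dump.
-- The real type in the package may use different field types.
data Directory = Directory
  { dTagName      :: String
  , dTagNum       :: Int
  , dElemTypeDesc :: String
  , dElemSize     :: Int
  , dElemNum      :: Int
  , dDataSize     :: Int
  , dDataOffset   :: Int
  , dDataDebug    :: [String]
  } deriving (Show, Eq)

-- | Render one line in the style of the summary section.
summaryLine :: Directory -> String
summaryLine d = concat
  [ dTagName d
  , " {", dElemTypeDesc d, "}"
  , " tagNum=", show (dTagNum d)
  , " size=", show (dElemSize d)
  , " count=", show (dElemNum d)
  , " offset=", show (dDataOffset d)
  , "  ", show (dDataDebug d)
  ]

-- | In the entries shown, the data size is element size times element count.
sizeConsistent :: Directory -> Bool
sizeConsistent d = dDataSize d == dElemSize d * dElemNum d

-- | The first DATA entry from the dump above.
exampleData :: Directory
exampleData = Directory
  { dTagName = "DATA", dTagNum = 9, dElemTypeDesc = "short"
  , dElemSize = 2, dElemNum = 7440, dDataSize = 14880
  , dDataOffset = 128, dDataDebug = []
  }
```

For exampleData, summaryLine reproduces the first summary line, and sizeConsistent holds because 2 * 7440 = 14880.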

Selected data types have the debug data element (dDataDebug) populated, e.g. PBAS, which holds the called bases (the FASTA sequence).
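
Two offsets in the summary are not file positions: FWO_ reports offset=1195463747 and LANE reports offset=65536. In the ABIF format, an item whose data occupies four bytes or fewer is stored directly in the directory entry's offset field. The sketch below decodes such values using only base. The helper names (offsetBytes, inlineChars, inlineShort) are hypothetical and are not taken from the package.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr)
import Data.Int (Int16)
import Data.Word (Word32)

-- | Split a 32-bit offset field into its four bytes, most significant first
-- (ABIF stores integers in big-endian order).
offsetBytes :: Word32 -> [Int]
offsetBytes w = [ fromIntegral ((w `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0] ]

-- | Read the first n bytes of the offset field as characters (n <= 4).
inlineChars :: Int -> Word32 -> String
inlineChars n = map chr . take n . offsetBytes

-- | Read the first two bytes of the offset field as a signed 16-bit value.
inlineShort :: Word32 -> Int16
inlineShort w = fromIntegral (w `shiftR` 16)
```

Here inlineChars 4 1195463747 gives "GATC" and inlineShort 65536 gives 1, matching the debug data shown for FWO_ and LANE.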

Generate minimal AB1s from FASTAs

To generate AB1s, run:

ab1Parser-exe gen "./pathContainingFastas" "./pathForOutputAb1s"

This creates one AB1 per input FASTA.

Input FASTA format

Each input FASTA file should have the following format:

> weight
read
> weight
read
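
As an illustration of this format, the following is a minimal Haskell sketch of a parser for it. It is not the package's own parser: the names WeightedRead and parseWeightedFasta are hypothetical, and the sketch assumes each read sits on a single line directly after its "> weight" header, as in the layout above.

```haskell
import Data.Char (isSpace)

-- | A read paired with its weight (hypothetical type, for illustration).
type WeightedRead = (Double, String)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = f . f
  where
    f = reverse . dropWhile isSpace

-- | Parse alternating "> weight" and read lines, ignoring blank lines.
-- Returns Left with a message when a header or weight is malformed.
parseWeightedFasta :: String -> Either String [WeightedRead]
parseWeightedFasta = go . filter (not . null) . map trim . lines
  where
    go [] = Right []
    go (('>' : w) : r : rest)
      | take 1 r /= ">" =
          case reads (trim w) :: [(Double, String)] of
            [(weight, "")] -> ((weight, r) :) <$> go rest
            _ -> Left ("Invalid weight: " ++ trim w)
    go (l : _) = Left ("Expected '> weight' followed by a read, but got: " ++ l)
```

Returning Either keeps malformed input (a missing read, or a weight that is not a number) visible to the caller instead of silently dropping it.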

Weighted reads

For example

> 0.5
ACG
> 0.3
AAAA
> 1
__AC

This results in the following weighted nucleotides per position:

Position   Nucleotides (weight)
0          A (0.5 + 0.3 = 0.8)
1          C (0.5), A (0.3)
2          G (0.5), A (0.3 + 1 = 1)
3          A (0.3), C (1)

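Note that the reads do not need to be the same length. In this example "_" marks a position to which the read contributes nothing, and the summed weight for A at position 2 is shown as 1 rather than 1.3, which suggests that weights are capped at 1.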
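
The per-position weighting can be sketched in Haskell as follows. This is an illustration under stated assumptions, not the package's implementation: positionWeights and addWeight are hypothetical names, "_" is treated as contributing nothing, and summed weights are capped at 1 (inferred from the "0.3 + 1 = 1" entry for position 2).

```haskell
import Data.List (foldl')

-- | A read paired with its weight (hypothetical type, for illustration).
type WeightedRead = (Double, String)

-- | Add a weight to a nucleotide's running total, capping the total at 1.
-- The cap is an assumption inferred from the example.
addWeight :: Char -> Double -> [(Char, Double)] -> [(Char, Double)]
addWeight n w [] = [(n, min 1 w)]
addWeight n w ((m, v) : rest)
  | m == n    = (m, min 1 (v + w)) : rest
  | otherwise = (m, v) : addWeight n w rest

-- | For each position, the summed weight of every nucleotide seen there.
-- Reads shorter than the position, and '_' characters, contribute nothing.
positionWeights :: [WeightedRead] -> [[(Char, Double)]]
positionWeights rs =
  [ foldl' (\acc (n, w) -> addWeight n w acc) [] (column i)
  | i <- [0 .. maxLen - 1]
  ]
  where
    maxLen = maximum (0 : map (length . snd) rs)
    column i = [ (n, w) | (w, r) <- rs, n <- take 1 (drop i r), n /= '_' ]
```

Applied to the three reads above, positionWeights [(0.5, "ACG"), (0.3, "AAAA"), (1, "__AC")] yields four positions with the nucleotides A; C and A; G and A; A and C, in line with the weights listed above.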


Example FASTA - single file

eg1.fasta

> 1
ACTG

Here there is a single FASTA containing a single read with a weight of 1 (100%). The chromatogram for this AB1 shows perfect traces for the input ACTG nucleotides.


Example FASTA - two FASTA files

eg1.fasta

> 1
ACAG

eg2.fasta

> 1
ACTG

Two input FASTA files, both with a weight of 1. In the second trace the third nucleotide is a T (the trace is green). Exactly what the base-calling software (phred, recall, etc.) calls the base depends on your settings and software choices.


Example FASTA - two FASTA files with different weights

eg1.fasta

> 1
ACAG

eg2.fasta

> 0.3
ACTG

Here the second FASTA has a weight of 0.3, and its traces are 30% of the height of the first FASTA's traces.


Example FASTA - single FASTA with a mix

eg1.fasta

> 1
ACAG
> 0.3
ACTG

The single input FASTA has an A/T mix at the third nucleotide. The first read has a weight of 1 and the second a weight of 0.3.


Using the modules

For a detailed overview of the code see TODO and the haddock documentation TODO

For now, the terminal app (Main.hs) serves as an example and is the best starting point for understanding the code.