hPDB: Protein Databank file format library

[ bioinformatics, bsd3, library ]

The Protein Data Bank (PDB) file format is the most popular format for holding biological macromolecular data.

This is a very fast sequential parser:

under 7 s for the largest entry in the PDB (1HTQ, which is over 70 MB), compared with 11 s for RASMOL 2.7.5 and 2 min 15 s for BioPython on the Python 2.6 interpreter.

In its parallel incarnation it is most probably the fastest parser for PDB format.

It aims to provide not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.

hPDB - Haskell library for processing atomic biomolecular structures in Protein Data Bank format - Michal Jan Gajda. BMC Research Notes 2013, 6:483.

Modules

Bio
- Bio.PDB
  - EventParser
    - Bio.PDB.EventParser.ExperimentalMethods
    - Bio.PDB.EventParser.HelixTypes
    - Bio.PDB.EventParser.PDBEventParser
    - Bio.PDB.EventParser.PDBEventPrinter
    - Bio.PDB.EventParser.PDBEvents
    - Bio.PDB.EventParser.StrandSense
  - Bio.PDB.Fasta
  - Bio.PDB.IO
    - Bio.PDB.IO.OpenAnyFile
  - Bio.PDB.Iterable
  - Bio.PDB.Structure
    - Bio.PDB.Structure.Elements
    - Bio.PDB.Structure.List
    - Bio.PDB.Structure.Neighbours
    - Bio.PDB.Structure.Vector
  - Bio.PDB.StructureBuilder
  - Bio.PDB.StructurePrinter

Flags

Automatic Flags

Name	Description	Default
have-mmap	Use mmap to read input faster.	Enabled
have-sse2	Use -msse2 for faster code.	Enabled
have-text-format	Do not use text-format, since it may require double-conversion and thus linking of libstdc++ which may break compilation due to GHC bug #5289: http://ghc.haskell.org/trac/ghc/ticket/5289	Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Downloads

hPDB-1.5.0.0.tar.gz (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

MichalGajda

Versions	0.99, 0.999, 0.9999, 0.9999.1, 1.0, 1.1, 1.1.1, 1.1.2, 1.2.0, 1.2.0.1, 1.2.0.2, 1.2.0.3, 1.2.0.4, 1.2.0.5, 1.2.0.6, 1.2.0.7, 1.2.0.8, 1.2.0.9, 1.2.0.10, 1.3.0.0, 1.4.0.0, 1.5.0.0
Change log	changelog
Dependencies	base (>=4.0 && <4.12), bytestring, containers, deepseq, directory, ghc-prim, iterable (>=3.0), linear, mmap, mtl, Octree (>=0.6), parallel (>=3.0.0.0), QuickCheck (>=2.5.0.0), tagged (>=0.7), template-haskell, text (>=0.11.1.13), text-format (>=0.3.1.0), unordered-containers (>=0.2.5.0), vector, zlib
Tested with	ghc ==7.10.3, ghc ==8.0.1, ghc ==8.2.2
License	BSD-3-Clause
Copyright	Copyright by Michal J. Gajda 2009-2015
Author	Michal J. Gajda
Maintainer	mjgajda@googlemail.com
Category	Bioinformatics
Home page	https://github.com/BioHaskell/hPDB
Bug tracker	mailto:mjgajda@googlemail.com
Source repo	head: git clone https://github.com/BioHaskell/hPDB.git
Uploaded	by MichalGajda at 2018-07-14T20:38:39Z
Reverse Dependencies	1 direct, 0 indirect
Downloads	15305 total (14 in the last 30 days)
Rating	2.0 (votes: 1) [estimated by Bayesian average]
Status	Docs not available. All reported builds failed as of 2018-07-14 (3 reports).

Readme for hPDB-1.5.0.0

hPDB

Haskell PDB file format parser.

The Protein Data Bank (PDB) file format is the most popular format for holding biomolecule data.

This is a very fast parser:

under 7 s for the largest entry in the PDB (1HTQ, which is over 70 MB), compared with 11 s for RASMOL 2.7.5 and 2 min 15 s for BioPython on the Python 2.6 interpreter.

It aims to provide not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.
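
To make the distinction between an event-based interface and a higher-level view concrete, here is a minimal sketch that uses only `base`. It is not hPDB's API: the `Event` type and the functions `field`, `parseLine`, `events` and `coordinates` are hypothetical names chosen for illustration (hPDB's real event types live in `Bio.PDB.EventParser.PDBEvents`). The column positions are assumed from the fixed-width ATOM/HETATM record layout of the PDB format.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Text.Read (readMaybe)

-- Hypothetical event type for illustration only; not hPDB's own type.
-- Atom fields: serial, atom name, residue name, chain ID,
-- residue number, (x, y, z) coordinates.
data Event
  = Atom Int String String Char Int (Double, Double, Double)
  | Other String -- any other (or malformed) record, tagged by record name
  deriving (Show, Eq)

-- Trimmed contents of the 1-based, inclusive column range [from, to].
field :: Int -> Int -> String -> String
field from to = trim . take (to - from + 1) . drop (from - 1)
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Turn one input line into one event.
parseLine :: String -> Event
parseLine l
  | name `elem` ["ATOM", "HETATM"], Just ev <- atom = ev
  | otherwise = Other name
  where
    name  = field 1 6 l
    chain = case drop 21 l of   -- column 22 holds the chain identifier
              (c:_) -> c
              []    -> ' '
    atom  = do
      serial <- readMaybe (field 7 11 l)
      resSeq <- readMaybe (field 23 26 l)
      x <- readMaybe (field 31 38 l)
      y <- readMaybe (field 39 46 l)
      z <- readMaybe (field 47 54 l)
      pure (Atom serial (field 13 16 l) (field 18 20 l) chain resSeq (x, y, z))

-- Event-based view: a lazy stream of events, one per input line.
events :: String -> [Event]
events = map parseLine . lines

-- Higher-level view built on top of the event stream:
-- the coordinates of every atom, in file order.
coordinates :: String -> [(Double, Double, Double)]
coordinates s = [xyz | Atom _ _ _ _ _ xyz <- events s]
```

A caller that only needs a single pass (counting atoms, say) can consume `events` directly, while `coordinates` shows how a richer structure could be folded out of the same stream. A real parser would also handle alternate locations, insertion codes, occupancy and the many non-coordinate record types.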

Details of official releases are on Hackage.

This package is also a part of Stackage - a stable subset of Hackage.

Projects for the future:

Please let me know if you would be willing to push the project further.

In particular, one may consider these features:

- Implement basic spatial operations: RMS superposition (with SVD) and affine transforms on a substructure.
- Use lens to facilitate access to the data structures.
  - torsion angles within protein/RNA chain.
- Add an Octree to the default data structure (with automatic update).
- Migrate away from text-format, since it causes portability trouble and slows down printing.
- Write a combinator library for generic fast parsing.
- Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic, since PDB coordinates have a dynamic range of only about 2^20, with a smallest step of 0.001.
- Provide class-based wrappers exposing a Structure-Model-Chain-Residue-Atom interface, possibly wrapping Repa/Accelerate arrays for fast computation.
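
The list above mentions torsion angles within a protein/RNA chain. As a rough illustration of the underlying computation only, the sketch below computes the torsion (dihedral) angle defined by four consecutive atom positions with the usual `atan2` formulation. It uses only `base`; the `V3` type and all function names are hypothetical and are not part of hPDB, which has its own vector module (`Bio.PDB.Structure.Vector`).

```haskell
-- Hypothetical 3D vector type for illustration only; not hPDB's own type.
data V3 = V3 !Double !Double !Double
  deriving (Show, Eq)

sub :: V3 -> V3 -> V3
sub (V3 a b c) (V3 x y z) = V3 (a - x) (b - y) (c - z)

dot :: V3 -> V3 -> Double
dot (V3 a b c) (V3 x y z) = a * x + b * y + c * z

cross :: V3 -> V3 -> V3
cross (V3 a b c) (V3 x y z) = V3 (b * z - c * y) (c * x - a * z) (a * y - b * x)

norm :: V3 -> Double
norm v = sqrt (dot v v)

-- Torsion angle in degrees for four consecutive atoms p0-p1-p2-p3,
-- i.e. the angle between the plane (p0, p1, p2) and the plane (p1, p2, p3).
-- Degenerate input (collinear or coincident atoms) is not handled specially.
torsion :: V3 -> V3 -> V3 -> V3 -> Double
torsion p0 p1 p2 p3 = atan2 y x * 180 / pi
  where
    b1 = p1 `sub` p0          -- bond vectors along the chain
    b2 = p2 `sub` p1
    b3 = p3 `sub` p2
    n1 = b1 `cross` b2        -- normal of the first plane
    n2 = b2 `cross` b3        -- normal of the second plane
    x  = n1 `dot` n2
    y  = norm b2 * (b1 `dot` n2)
```

For a protein backbone, the phi angle of a residue would be `torsion` applied to the positions of C of the previous residue, then N, CA and C of the residue itself; psi uses N, CA, C and N of the next residue. Selecting those atoms from a parsed structure is where a lens-style accessor would help.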

Please ask me any questions on Gitter.