GeneOntology - parse and index Gene Ontology annotations. In particular, this module handles the file 'gene_association.goa_uniprot', which contains links between GO terms and UniProt accessions.
- http://www.geneontology.org/ontology/gene_ontology.obo -- Contains the hierarchy including isA relationships.
- http://www.geneontology.org/GO.format.obo-1_2.shtml -- Describes the OBO format.
- ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/ -- Contains the GOA-UniProt mapping (and a README file).
- http://www.geneontology.org/ontology/GO.defs -- Contains GO definitions (not supported here yet).
- http://www.geneontology.org/doc/GO.terms_and_ids -- GO definitions in a simpler, more schematic form.
- newtype GoTerm = GO Int
- data GoDef = GoDef !GoTerm !ByteString !GoClass
- type GoHierarchy = [(GoDef, [GoTerm])]
- readObo :: FilePath -> IO GoHierarchy
- readTerms :: FilePath -> IO [GoDef]
- data Annotation = Ann !UniProtAcc !GoTerm !EvidenceCode
- type UniProtAcc = ByteString
- data GoClass
- data EvidenceCode
- readGOA :: FilePath -> IO [Annotation]
- isCurated :: EvidenceCode -> Bool
- decomment :: ByteString -> [ByteString]
Basic data types
A GO term is a positive integer
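Since a term is stored as a bare integer, converting to and from the usual accession form ("GO:" followed by seven zero-padded digits) is a common need. The following is a minimal sketch, not part of this module: `formatGo` and `parseGo` are hypothetical helper names, and `GoTerm` is redefined locally as a stand-in so the block compiles on its own.

```haskell
import Data.List (stripPrefix)
import Text.Printf (printf)
import Text.Read (readMaybe)

-- Local stand-in mirroring the synopsis; the library's instances may differ.
newtype GoTerm = GO Int deriving (Eq, Ord, Show)

-- Render a term as "GO:" plus seven zero-padded digits (hypothetical helper).
formatGo :: GoTerm -> String
formatGo (GO n) = printf "GO:%07d" n

-- Parse an accession string; Nothing on a missing prefix,
-- a non-numeric body, or a non-positive number (hypothetical helper).
parseGo :: String -> Maybe GoTerm
parseGo s = do
  digits <- stripPrefix "GO:" s
  n <- readMaybe digits
  if n > 0 then Just (GO n) else Nothing
```

Keeping the integer representation internally and formatting only at the boundary keeps map keys small and comparisons cheap.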
Reading the OBO format
type GoHierarchy = [(GoDef, [GoTerm])]
A list of GO definitions, with pointers to parent nodes. Read from the .obo file. The user may construct the explicit hierarchy by storing these in a Map or similar.
readObo :: FilePath -> IO GoHierarchy
Read the GO hierarchy from the .obo file. Note that this is not quite a tree structure: a term may have more than one parent, so the hierarchy forms a directed acyclic graph.
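One way to build the explicit hierarchy is to index the list by term and then walk the parent pointers. The sketch below uses local stand-ins for the module's types so that it compiles on its own; the `GoClass` constructors, the helper names `parentMap` and `ancestors`, and the `toy` data are all assumptions for illustration and are not taken from this module or from the real ontology.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Local stand-ins mirroring the synopsis. The GoClass constructors are assumed.
newtype GoTerm = GO Int deriving (Eq, Ord, Show)
data GoClass = Func | Proc | Comp deriving (Eq, Show)
data GoDef = GoDef !GoTerm !B.ByteString !GoClass deriving (Eq, Show)
type GoHierarchy = [(GoDef, [GoTerm])]

-- Index each term by its direct parents.
parentMap :: GoHierarchy -> M.Map GoTerm [GoTerm]
parentMap h = M.fromList [ (t, ps) | (GoDef t _ _, ps) <- h ]

-- All transitive ancestors of a term. The visited set ensures that
-- ancestors reachable along several paths are expanded only once.
ancestors :: M.Map GoTerm [GoTerm] -> GoTerm -> S.Set GoTerm
ancestors m t = go S.empty (M.findWithDefault [] t m)
  where
    go seen [] = seen
    go seen (p:ps)
      | p `S.member` seen = go seen ps
      | otherwise         = go (S.insert p seen) (M.findWithDefault [] p m ++ ps)

-- Toy hierarchy with made-up terms: GO 4 has two parents sharing one root.
toy :: GoHierarchy
toy =
  [ (GoDef (GO 1) (B.pack "root")  Proc, [])
  , (GoDef (GO 2) (B.pack "left")  Proc, [GO 1])
  , (GoDef (GO 3) (B.pack "right") Proc, [GO 1])
  , (GoDef (GO 4) (B.pack "leaf")  Proc, [GO 2, GO 3])
  ]
```

With real data, `parentMap` would be applied to the result of `readObo`. Because a term can have several parents, the ancestors are collected into a set rather than a path.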
Reading 'terms and ids'
Reading UniProt associations
data Annotation
A GOA annotation, containing a UniProt identifier, a GoTerm and an evidence code.
type UniProtAcc = ByteString
A UniProt accession (a short string of capital letters and digits).
data EvidenceCode
Evidence codes describe the type of support for an annotation; see http://www.geneontology.org/GO.evidence.shtml
IC | Inferred by Curator |
IDA | Inferred from Direct Assay |
IEA | Inferred from Electronic Annotation |
IEP | Inferred from Expression Pattern |
IGC | Inferred from Genomic Context |
IGI | Inferred from Genetic Interaction |
IMP | Inferred from Mutant Phenotype |
IPI | Inferred from Physical Interaction |
ISS | Inferred from Sequence or Structural Similarity |
NAS | Non-traceable Author Statement |
ND | No biological Data available |
RCA | Inferred from Reviewed Computational Analysis |
TAS | Traceable Author Statement |
NR | Not Recorded |
readGOA :: FilePath -> IO [Annotation]
Read the goa_uniprot file (warning: this one is huge!)
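Because of the file size, it helps to process the input lazily, line by line. The sketch below is not this module's implementation: it assumes the tab-separated gene association layout in which the accession is column 2, the GO ID column 5 and the evidence code column 7, with comment lines starting with '!'. The `Assoc` record and the names `parseLine` and `readAssocs` are hypothetical, as is the use of lazy ByteStrings.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Maybe (mapMaybe)

-- Fields kept from one association line
-- (hypothetical record, not the library's Annotation type).
data Assoc = Assoc
  { accession :: L.ByteString
  , goId      :: L.ByteString
  , evidence  :: L.ByteString
  } deriving (Eq, Show)

-- Parse one line. Empty lines, comment lines (leading '!') and
-- lines with fewer than seven columns yield Nothing.
parseLine :: L.ByteString -> Maybe Assoc
parseLine l
  | L.null l || L.head l == '!' = Nothing
  | otherwise = case L.split '\t' l of
      (_db : acc : _sym : _qual : go : _ref : ev : _) -> Just (Assoc acc go ev)
      _ -> Nothing

-- Read the file lazily, so records are produced as the input is consumed.
readAssocs :: FilePath -> IO [Assoc]
readAssocs f = mapMaybe parseLine . L.lines <$> L.readFile f
```

Consuming the resulting list in a single pass (for example, folding it into a Map) avoids holding the whole file in memory at once.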
isCurated :: EvidenceCode -> Bool
The vast majority of GOA data is IEA (inferred from electronic annotation), while the most reliable information is manually curated. Filtering on curation status also helps keep data set sizes manageable.
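A filter of this kind can be sketched as follows. The `EvidenceCode` type here is a local stand-in built from the constructors in the table above, and `curated` is an assumed rule (everything except IEA counts as curated); the library's `isCurated` may exclude further codes. The name `keepCurated` is hypothetical.

```haskell
-- Local stand-in using the constructors listed in the table above.
data EvidenceCode
  = IC | IDA | IEA | IEP | IGC | IGI | IMP | IPI | ISS
  | NAS | ND | RCA | TAS | NR
  deriving (Eq, Show, Read)

-- Assumed rule for this sketch: anything but IEA counts as curated.
-- The library's isCurated may treat other codes as uncurated too.
curated :: EvidenceCode -> Bool
curated = (/= IEA)

-- Keep only the pairs whose evidence code passes the predicate.
keepCurated :: [(a, EvidenceCode)] -> [(a, EvidenceCode)]
keepCurated = filter (curated . snd)
```

Applying the filter directly to the lazily read annotation list means the IEA records can be discarded as they are read, rather than after loading everything.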
Utility stuff
decomment :: ByteString -> [ByteString]