The RDF model of the Gene Ontology, demystified

The Gene Ontology (GO) is a controlled vocabulary of terms describing the molecular functions, cellular components, and biological processes of genes and gene products. It facilitates the integration of biological and biomedical data and is widely used in bioinformatics.

The ontology defines terms (GO terms) and the relationships between them. The terms can be thought of as tags assigned to gene products, and a single gene can be annotated with multiple GO terms.
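The tagging model is a many-to-many relation between gene products and GO terms. Here is a minimal Python sketch of it. The gene product names and their assignments are hypothetical and made up for illustration; only the GO ID format is real. Each gene product maps to a set of term IDs, and inverting that mapping lists every gene product tagged with a given term.

```python
from collections import defaultdict

# Hypothetical annotations: gene product -> set of GO term IDs ("tags").
# The names and assignments are placeholders, not real annotation data.
annotations: dict[str, set[str]] = {
    "geneA": {"GO:0000001", "GO:0048308"},
    "geneB": {"GO:0048308"},
}


def invert(annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build a term -> gene products index from gene product -> terms."""
    index: dict[str, set[str]] = defaultdict(set)
    for product, terms in annotations.items():
        for term in terms:
            index[term].add(product)
    return dict(index)


by_term = invert(annotations)
print(sorted(by_term["GO:0048308"]))  # ['geneA', 'geneB']
```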

Gene Ontology in RDF

The GO terms and relationships between them form a graph which can be described in RDF.

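In the RDF representation of the Gene Ontology, each term is identified by an IRI in the OBO namespace, http://purl.obolibrary.org/obo/, formed from the term's ID with the colon replaced by an underscore. For example, the term "mitochondrion inheritance" (GO:0000001) is represented by the node obo:GO_0000001. Here is an excerpt of the ontology showing information about obo:GO_0000001: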

@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obo:GO_0000001 a owl:Class ;
    rdfs:subClassOf obo:GO_0048308 ,
        obo:GO_0048311 ;
    obo:IAO_0000115 "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." ;
    rdfs:label "mitochondrion inheritance" .
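Turtle's ";" and "," are abbreviations, so the excerpt amounts to five triples, and its two rdfs:subClassOf triples point only at direct parents. The Python sketch below (standard library only) spells those triples out with full IRIs. It then collects every class reachable through zero or more rdfs:subClassOf steps, which is what the SPARQL 1.1 property path rdfs:subClassOf* matches. The helper names go_iri and superclasses are hypothetical. With only this excerpt loaded, the walk can reach just the term itself and its two direct parents.

```python
# Full namespace IRIs behind the prefixes in the excerpt.
OBO = "http://purl.obolibrary.org/obo/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # Turtle's `a`
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SUBCLASS_OF = RDFS + "subClassOf"


def go_iri(go_id: str) -> str:
    """Map a GO ID such as 'GO:0000001' to its IRI (colon becomes underscore)."""
    return OBO + go_id.replace(":", "_")


TERM = go_iri("GO:0000001")

# The excerpt as (subject, predicate, object) triples: `;` repeats the
# subject, `,` repeats both the subject and the predicate.
TRIPLES = [
    (TERM, RDF_TYPE, OWL_CLASS),
    (TERM, SUBCLASS_OF, go_iri("GO:0048308")),
    (TERM, SUBCLASS_OF, go_iri("GO:0048311")),
    # IAO_0000115 is the IAO "definition" annotation property.
    (TERM, OBO + "IAO_0000115",
     "The distribution of mitochondria, including the mitochondrial genome, "
     "into daughter cells after mitosis or meiosis, mediated by interactions "
     "between mitochondria and the cytoskeleton."),
    (TERM, RDFS + "label", "mitochondrion inheritance"),
]


def superclasses(start: str, triples: list[tuple[str, str, str]]) -> set[str]:
    """Classes reachable from `start` via zero or more rdfs:subClassOf edges."""
    parents: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents.setdefault(s, []).append(o)
    seen = {start}  # zero steps: the class itself is included
    stack = [start]
    while stack:  # depth-first walk up the subclass hierarchy
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(len(TRIPLES))  # 5
print(sorted(superclasses(TERM, TRIPLES)))
```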

Running SPARQL queries against the Gene Ontology data

The Gene Ontology project provides a SPARQL query interface, powered by Blazegraph, for issuing queries against the GO RDF dataset directly in the browser. For example, this query returns mitochondrion inheritance itself together with all of its superclasses:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel
WHERE {
  obo:GO_0000001 rdfs:subClassOf* ?class .
  ?class rdfs:label ?classLabel .
}

Query result:

classLabel
"occurrent"
"process"
"biological_process"
"mitochondrion inheritance"
"mitochondrion organization"
"cellular process"
"cellular component organization"
"organelle inheritance"
"mitochondrion distribution"
"organelle organization"
"localization"
"organelle localization"
"cellular localization"
"mitochondrion localization"
"cellular component organization or biogenesis"
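Outside the browser, a SPARQL endpoint can be queried over HTTP following the SPARQL 1.1 Protocol. The protocol accepts the query in a `query` parameter of a GET request, and the Accept header can ask for results in the SPARQL 1.1 JSON results format. Below is a minimal Python sketch using only the standard library. The endpoint URL is a hypothetical placeholder for whichever endpoint you use, and `build_request` and `labels` are hypothetical helper names. The query is a version of the one above with full prefix declarations.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the URL of the SPARQL endpoint you use.
ENDPOINT = "https://example.org/sparql"

QUERY = """\
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel
WHERE {
  obo:GO_0000001 rdfs:subClassOf* ?class .
  ?class rdfs:label ?classLabel .
}"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    # Percent-encode the query (spaces become %20 rather than '+').
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def labels(results_json: str) -> list[str]:
    """Pull the ?classLabel values out of a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    return [row["classLabel"]["value"] for row in data["results"]["bindings"]]


# Usage (needs network access and a real endpoint in ENDPOINT):
# with urllib.request.urlopen(build_request(ENDPOINT, QUERY)) as response:
#     print(labels(response.read().decode("utf-8")))
```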

What does it mean?

As a Linked Data dataset, the Gene Ontology RDF graph is a step toward more efficient sharing and reuse of biological information. While GO terms provide a common language for annotating genes, RDF makes it possible to integrate those annotations with other datasets and ontologies.


Made by Anton Vasetenkov.
