The BioGateway Resource
A Semantic Systems Biology Database
BioGateway consists of a graph-based database built on Semantic Web principles, a SPARQL endpoint allowing users to query it, and a Cytoscape app which integrates the query functionality directly into your network building workflow.
What is BioGateway?
BioGateway is an initiative that enables a Semantic Systems Biology approach. It provides an entry point to a data warehouse where biological data are gathered in the form of triples (using RDF). The system can be queried using SPARQL. It can also be explored with the SPARQL browser, which displays query results visually as a network of resources.
The Cytoscape App
We have developed an app for Cytoscape to allow you to directly integrate the power of our Semantic Knowledge Base into your network building workflow. With the Query Builder tool, you can formulate the topology of what you are looking for, and it will generate the SPARQL query for you.
The query result can then be imported directly into the Cytoscape network you are building – without having to deal with result file formats, incompatible column standards or identifiers.
The BioGateway Database
BioGateway Data model
The BioGateway triple store provides a unified protein-centric view on biological networks. The data in BioGateway are modeled as directed multi-graphs, not necessarily acyclic, which is a natural choice for representing complex networks.
There are two types of graphs in BioGateway:
A – those that define entities, e.g. proteins, genes, etc.,
B – those that define relations among entities, e.g. protein-protein interaction, protein-disease interactions, etc.
There are three types of nodes in BioGateway:
- Classes: entities in the domain of discourse, e.g. proteins, diseases, etc. (URIs)
- Instances: particular interpretations/views of entities conditioned on the source (URIs, only B type graphs)
- Attributes: qualities, quantities, etc. (literals)
Nodes are connected through multiple types of edges, a.k.a. properties, semantically defined in external ontologies/taxonomies/vocabularies (URIs). Within any given graph a particular property is used within one unique semantic context.
The atomic unit of information (elementary graph) comprises a pair of nodes (subject and object) connected by a directed edge (predicate), commonly known as a triple.
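As a minimal sketch of this unit, a triple can be held in Python as a named tuple of subject, predicate and object. The `Triple` type and the `to_ntriples` helper are hypothetical names introduced for illustration. The example statement is assembled from URIs quoted elsewhere in this document and is not claimed to be a record retrieved from the store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One directed edge: subject --predicate--> object."""
    subject: str    # URI of the source node
    predicate: str  # URI of the property (edge type)
    obj: str        # URI of the target node


# Illustrative statement only; assembled from URIs quoted in this document.
example = Triple(
    subject="http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67",
    predicate="http://purl.obolibrary.org/obo/BFO_0000052",  # 'inheres in'
    obj="http://purl.bioontology.org/ontology/NCBITAXON/9606",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(example))
```

Literal objects (attributes) would need quoting rather than angle brackets, which this sketch leaves out.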
Protein entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). This graph forms the core of BioGateway. The entities are identified by their UniParc IDs conditioned on the biological species, chromosome and encoding gene, e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67’, thus the corresponding classes are homogeneous with respect to the amino acid sequences. Together with protein classes there are collections of all translation products encoded by a particular gene (essentially sets, but modelled as rdf:Bag due to RDF limitations), e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/’.
Gene entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). Semantically these entities are defined by the sets of translation products they encode and logistically by the preferred gene names (as used in ‘http://uniprot.org/uniprot/’) conditioned on the biological species and chromosome, e.g. ‘http://rdf.biogateway.eu/gene/9606/chr-17/TP53/’. The corresponding entities are not guaranteed to be homogeneous with respect to the nucleotide sequences and are modeled as collections (rdf:Bag).
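The protein and gene URIs above encode species, chromosome and gene name in their path, which suggests they can be split mechanically. The following is a sketch under that assumption: the layout is inferred only from the example URIs quoted here, and `EntityId` and `parse_entity_uri` are hypothetical names, not part of BioGateway.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class EntityId(NamedTuple):
    kind: str               # 'prot' or 'gene'
    taxon: str              # NCBI taxonomy ID, e.g. '9606'
    chromosome: str         # e.g. 'chr-17'
    gene: str               # preferred gene name, e.g. 'TP53'
    uniparc: Optional[str]  # UniParc ID, or None for a collection URI


def parse_entity_uri(uri: str) -> EntityId:
    """Split a prot/gene URI into its path components.

    Assumes the layout /<kind>/<taxon>/<chromosome>/<gene>/[<uniparc>].
    """
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if len(parts) not in (4, 5) or parts[0] not in ("prot", "gene"):
        raise ValueError(f"unexpected URI layout: {uri}")
    kind, taxon, chromosome, gene = parts[:4]
    uniparc = parts[4] if len(parts) == 5 else None
    return EntityId(kind, taxon, chromosome, gene, uniparc)


protein = parse_entity_uri(
    "http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67"
)
gene = parse_entity_uri("http://rdf.biogateway.eu/gene/9606/chr-17/TP53/")
```

A URI without the trailing UniParC segment is treated as a collection (the rdf:Bag of translation products, or a gene), so `uniparc` is `None` in that case.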
Taxonomic entities (source: ‘http://purl.bioontology.org/ontology/NCBITAXON’) identified by external URIs, e.g. ‘http://purl.bioontology.org/ontology/NCBITAXON/9606’.
Ontology term entities (source: https://bioportal.bioontology.org/ontologies/GO) identified by external URIs, e.g. ‘http://purl.obolibrary.org/obo/GO_0000122’.
Disease entities (source: ‘http://purl.bioontology.org/ontology/OMIM’) identified by external URIs, e.g. ‘http://purl.obolibrary.org/OMIM/151623’.
All entities are modeled as subclasses of rdf:Statement with instances conditioned on the source.
Interactions between proteins and biological processes, cellular components and molecular functions (source: ‘http://identifiers.org/goa’)
Defining properties:
‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (biological process),
‘http://purl.obolibrary.org/obo/BFO_0000050’ “part of” (cellular component),
‘http://purl.obolibrary.org/obo/RO_0002327’ “enables” (molecular function).
Protein-phenotype interactions (currently limited to diseases, source: ‘http://uniprot.org/uniprot/’)
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (disease).
Protein-protein interactions (source: ‘http://identifiers.org/intact/’)
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002436’ “molecularly interacts with” (protein).
Interactions between transcription factors and target genes
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002428’ “involved in regulation of” (gene).
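Because each relation graph is characterized by a defining property, a query for one kind of relation can be generated from the property URI alone. The sketch below builds such a SPARQL query as a string and encodes it as a GET request in the style of the SPARQL 1.1 Protocol. The endpoint address is a placeholder, since the real one is not given in this text, and the function names are hypothetical. Whether the subjects returned are classes or source-specific instances depends on how the graph is modelled, so treat the pattern as a starting point.

```python
from urllib.parse import urlencode

# Defining property for protein-protein interactions, as listed above.
MOLECULARLY_INTERACTS_WITH = "http://purl.obolibrary.org/obo/RO_0002436"

# Placeholder only: the real endpoint address is not given in this text.
HYPOTHETICAL_ENDPOINT = "https://example.org/sparql"


def build_relation_query(predicate_uri: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT listing subject/object pairs for one property."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?subject ?object\n"
        "WHERE {\n"
        f"  ?subject <{predicate_uri}> ?object .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query in the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_relation_query(MOLECULARLY_INTERACTS_WITH, limit=5)
url = build_request_url(HYPOTHETICAL_ENDPOINT, query)
print(query)
```

Swapping in another defining property, such as RO_0002428 for transcription factor and target gene relations, changes only the first argument.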
External parental classes
http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag ‘unordered collection’
http://www.w3.org/2000/01/rdf-schema#Class ‘entity type’
http://www.w3.org/2000/01/rdf-schema#Property ‘edge type’
Properties used in A and B graphs
http://www.w3.org/2000/01/rdf-schema#subClassOf ‘is subclass of’
http://www.w3.org/2000/01/rdf-schema#subPropertyOf ‘is subproperty of’
http://semanticscience.org/resource/SIO_000253 ‘has source’ # domain: rdf.biogateway.eu/graph/
http://semanticscience.org/resource/SIO_000772 ‘has evidence’ # range: publications
http://schema.org/evidenceOrigin ‘has evidence origin’ # range: source of metadata
http://www.w3.org/2004/02/skos/core#prefLabel ‘has name’
http://schema.org/evidenceLevel ‘has evidence level’
Properties used in A graphs
http://schema.org/memberOf ‘is member of’
http://purl.obolibrary.org/obo/BFO_0000052 ‘inheres in’ # range: biological species
http://www.w3.org/2004/02/skos/core#closeMatch ‘has close match’ # range: external URIs for genes and proteins
http://www.w3.org/2004/02/skos/core#altLabel ‘has synonym’
http://www.w3.org/2004/02/skos/core#definition ‘has definition’
Properties used in B graphs
http://www.w3.org/1999/02/22-rdf-syntax-ns#type ‘is instance of’
http://purl.obolibrary.org/obo/RO_0002331 ‘involved in’ # range: biological process, disease
http://purl.obolibrary.org/obo/BFO_0000050 ‘part of’ # range: cellular component
http://purl.obolibrary.org/obo/RO_0002327 ‘enables’ # range: molecular function
http://purl.obolibrary.org/obo/RO_0002436 ‘molecularly interacts with’ # range: protein
http://purl.obolibrary.org/obo/RO_0002428 ‘involved in regulation of’ # range: gene
http://www.w3.org/2000/01/rdf-schema#isDefinedBy ‘is defined by’ # range: method
http://www.w3.org/1999/02/22-rdf-syntax-ns#value ‘has value’ # positive/negative
http://www.w3.org/2000/01/rdf-schema#comment ‘has comment’ # amino acid change
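The property URIs listed above fall into a handful of namespaces, so they can be abbreviated to prefixed names when writing queries or labelling edges. This is a small sketch of that idea; the prefix names (`rdf`, `rdfs`, `skos`, `schema`, `sio`, `obo`) follow common convention but are an assumption here, and `compact` and `expand` are hypothetical helpers.

```python
# Prefixes for the namespaces used in the property lists above.
# The prefix names are conventional choices, not defined by this document.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "schema": "http://schema.org/",
    "sio": "http://semanticscience.org/resource/",
    "obo": "http://purl.obolibrary.org/obo/",
}


def compact(uri: str) -> str:
    """Abbreviate a URI to prefix:local form; return it unchanged if unknown."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


def expand(curie: str) -> str:
    """Inverse of compact() for the known prefixes."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie


print(compact("http://purl.obolibrary.org/obo/RO_0002436"))
print(expand("skos:prefLabel"))
```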