Using ontologies
You will need to define the classes and properties used to describe your data. The easiest way is to find them in existing models (also known as ontologies). Some properties are standard, such as rdf:type and rdfs:label, but for more specific concepts it is best to find an existing data model that matches yours.
Reuse existing ontologies ♻️
A number of ontologies have already been defined for different use cases and domains. Reusing existing ontologies is faster, as you don't need to build the ontology yourself, and it improves the interoperability of your data.
Ontology repositories
Search for relevant existing models in ontology repositories:
- Linked Open Vocabulary (LOV)
- BioPortal for biomedical concepts by the NCBO.
- OntologyLookupService by the EBI
- AgroPortal for agronomy by LIRMM and INRAE.
- EcoPortal for ecology by LifeWatch Italy.
The BioPortal Recommender and Search services are an efficient way to look for concepts across most existing biomedical ontologies.
Popular ontologies
- Semanticscience Integrated Ontology (SIO), a simple, integrated ontology of types and relations for rich description of objects, processes and their attributes.
- BioLink Model, a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.) and their associations.
- Schema.org, a collaborative project to define schemes for structured data on the Internet, on web pages, in email messages, and beyond.
  - Various classes are described, such as schema:Person, schema:MedicalGuideline, schema:Review, schema:ScholarlyArticle, schema:MedicalScholarlyArticle, schema:Dataset, etc.
  - Extensions are available, such as BioSchemas for biological data.
  - Alternatively, you can look into Google Data Types, which are mainly built from schema.org and let you describe and index your website using RDF (JSON-LD).
- DublinCore (dc, dct, dctypes), one of the most generic vocabularies (includes properties such as dc:identifier, dct:description, dct:creator, dct:license, dct:rights...).
- PAV: Provenance, Authoring and Versioning ontology.
- PROV: The Provenance Ontology, another ontology to describe provenance with more details.
- DCAT: Data Catalog Vocabulary, to describe datasets.
- NCIT: National Cancer Institute Thesaurus, a vocabulary for clinical care, translational and basic research, and public information and administrative activities.
Define the schema
When you reuse existing ontologies, it is best to define the schema your data will follow using SHACL shapes or ShEx expressions. This lets you validate the generated data, and helps other users quickly understand your data.
Here are a few examples of tools and methods to generate SHACL or ShEx shapes:
- SHACLGEN - Python library to generate SHACL shapes: https://pypi.org/project/shaclgen/
- RDFShape - A Web app and library to generate SHACL/ShEx: http://rdfshape.weso.es
- SheXer: A library to perform automatic extraction of SHACL/ShEx schemata in RDF graphs: http://shexer.weso.es
- "Shape Designer for ShEx and SHACL constraints" by Boneva et al., presented at ISWC 2019: https://gitlab.inria.fr/jdusart/shexjapp
- Astrea: Automatic generation of SHACL shapes from ontologies: https://astrea.linkeddata.es
- TopBraid Composer: https://www.topquadrant.com/products/topbraid-composer/ & https://www.topquadrant.com/from-owl-to-shacl-in-an-automated-way/
- "RDF shape induction using knowledge base profiling" to generate shapes, by Mihindukulasooriya et al., presented at the Annual ACM Symposium on Applied Computing in 2018.
- "Towards improving the quality of knowledge graphs with data-driven ontology patterns and SHACL" by Spahiu et al., presented as a workshop paper at ISWC in 2018.
Ontology design 🎨
If you don't find an ontology that fits, or if you need to edit an ontology, you can use one of the following tools:
Protégé
You can use the Protégé ontology editor to build your ontology, using a tree view.
- Install Protégé on your computer for better performance than the web hosted service.
- Or use WebProtégé for its collaborative features.
VocBench
VocBench is a web-based, multilingual, collaborative development platform for managing OWL ontologies, SKOS(/XL) thesauri, and generic RDF datasets.
Gra.fo
Gra.fo is a commercial product, but you can use it for free to build simple RDFS/OWL ontologies with a diagram view and collaboration features.
Chowlk
Chowlk is a web service that automatically generates OWL code from an ontology diagram made with diagrams.net. You will need to follow its instructions so that the diagram blocks use the expected format.
OwlReady2
OwlReady2 is a Python library to work with OWL ontologies. It helps you build OWL ontologies with Python code and Jupyter notebooks.
TopBraid Composer
A free edition is now available: https://www.topquadrant.com/products/topbraid-composer/
Stardog
The Stardog triplestore includes an ontology editor, but it requires a license.
Resolve prefixes
http://prefix.cc is a handy service to resolve prefixes.
E.g. http://prefix.cc/bl
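Once a prefix is resolved to its namespace, expanding a prefixed name such as dct:license into a full URI is a simple string operation. The sketch below uses only the Python standard library. The `expand_curie` and `lookup_prefix` helpers are hypothetical names, and the `.file.json` URL pattern used to query prefix.cc is, to my knowledge, the one the service exposes; treat it as an assumption and check the prefix.cc site before relying on it.

```python
import json
from urllib.request import urlopen


def lookup_prefix(prefix: str) -> str:
    """Ask prefix.cc for the namespace of a prefix (requires network access)."""
    # Assumed URL pattern, e.g. http://prefix.cc/dct.file.json
    with urlopen(f"http://prefix.cc/{prefix}.file.json", timeout=10) as response:
        return json.load(response)[prefix]


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'dct:license' into a full URI."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"Unknown prefix: {prefix}")
    return prefixes[prefix] + local_name


# Local prefix map, so the example runs without network access
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dct": "http://purl.org/dc/terms/",
}

print(expand_curie("dct:license", PREFIXES))
```

Keeping a local prefix map avoids a network call for every expansion; `lookup_prefix` is only needed when a prefix is missing from the map.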
Publish the ontology 📰
The easiest place to publish your ontology is in a GitHub repository.
Publish documentation
Two options are available:
- Widoco: generate ontology documentation following the W3C style
- Ontospy: provide multiple choices for ontology documentation (more user-friendly for larger ontologies)
See this example workflow implementing Widoco and Ontospy: https://github.com/vemonet/semanticscience/blob/master/.github/workflows/generate-docs.yml
It automatically generates and publishes documentation for your ontology using GitHub Actions and GitHub Pages:
- The ontology is published in a GitHub repository, in our case in ontology/sio.owl
- The GitHub Actions workflow is triggered when the ontology file changes.
- The workflow runs Ontospy or Widoco (your choice) on the latest committed ontology file (ontology/sio.owl in this example), and generates the HTML documentation in the gh-pages branch, in a different folder for each documentation type.
- The gh-pages branch is published as a GitHub Page.
- In this example, a simple index.html file lets users choose the documentation type they want to access.
Feel free to adapt this GitHub Actions workflow.
Use persistent identifier
We recommend using the w3id.org system, as it allows any GitHub user to define and reserve a persistent namespace for free in a few minutes:
- Fork the w3id.org repository: https://github.com/perma-id/w3id.org
- Create a folder with your namespace name (e.g. my-onto)
- Add a .htaccess file with the redirection to your ontology (and a README.md file briefly explaining the purpose of this namespace)
- Send a pull request to the https://github.com/perma-id/w3id.org repository. It usually takes between a few hours and a few days to be accepted.
Examples:
- See this example for a .htaccess passing the original w3id URI queries
- Or this example to redirect to different websites depending on the path.
The persistent identifiers can be easily modified later if necessary; you just need to send a new pull request with the changes.
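To give an idea of what the namespace folder contains, here is a sketch that generates the two files in a temporary copy of the repository layout. It uses only the Python standard library. The `write_w3id_folder` helper, the `my-onto` namespace and the target URL are hypothetical, and the rewrite rule is a common Apache mod_rewrite pattern for a temporary redirect, not necessarily the exact rule used in the examples linked above.

```python
import tempfile
from pathlib import Path


def write_w3id_folder(repo_dir: Path, namespace: str, target_url: str) -> Path:
    """Create <repo_dir>/<namespace> with a .htaccess and a README.md file."""
    folder = repo_dir / namespace
    folder.mkdir(parents=True, exist_ok=True)

    # Redirect everything under the namespace to the target, keeping the path
    htaccess = (
        "RewriteEngine on\n"
        f"RewriteRule ^(.*)$ {target_url}$1 [R=302,L]\n"
    )
    (folder / ".htaccess").write_text(htaccess)

    # Short explanation of the namespace, as requested for the pull request
    readme = f"# {namespace}\n\nPersistent namespace redirecting to {target_url}\n"
    (folder / "README.md").write_text(readme)
    return folder


# Demonstration in a temporary folder standing in for your fork of w3id.org
with tempfile.TemporaryDirectory() as tmp:
    folder = write_w3id_folder(
        Path(tmp), "my-onto", "https://example.github.io/my-onto/"
    )
    htaccess_text = (folder / ".htaccess").read_text()
    created_files = sorted(p.name for p in folder.iterdir())

print(htaccess_text)
```

In practice you would run this inside your fork, commit the new folder, and open the pull request.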
Add it to an ontology repository
The right repository depends on the ontology domain (see the ontology repositories listed above).