# Publish data

## Publish to our public GraphDB triplestore

Create a new repository on our GraphDB triplestore at https://graphdb.dumontierlab.com/

Ask for permissions

After creating an account, ask us for the permissions to create new repositories.

### Create the GraphDB repository

👩‍💻 Go to Setup > Repositories > Create Repository

👨‍💻 Choose the settings for your repository (leave the defaults for anything not mentioned here); a scripted alternative is sketched after this list:

- Ruleset: use RDFS-Plus (Optimized) by default, or an OWL ruleset if you are performing reasoning using OWL ontologies
- Supports SHACL validation: enable this if you plan on using SHACL shapes to validate the RDF loaded in the repository
- Use context index: enable this to index the contexts (a.k.a. named graphs)
- For large datasets:
  - Entity index size: increase this to 999999999
  - Entity ID bit-size: increase this to 40
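
If you prefer to script this step, GraphDB also provides a REST API to create repositories from a configuration file. Below is a minimal sketch using Python and the `requests` library; the `repo-config.ttl` file and the credentials are placeholders (you can download a configuration template from the Create Repository page):

```python
import requests

# Create a repository from a repository configuration file
# ("repo-config.ttl" and the credentials are placeholders)
GRAPHDB_URL = "https://graphdb.dumontierlab.com"

with open("repo-config.ttl", "rb") as config_file:
    response = requests.post(
        f"{GRAPHDB_URL}/rest/repositories",
        files={"config": config_file},
        auth=("my_user", "my_password"),
    )
response.raise_for_status()
print("Repository created")
```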

To access your repository, use its SPARQL endpoint: `https://graphdb.dumontierlab.com/repositories/my-repository-id` to query the data, and the same URL with `/statements` appended to update it (as in the workflow examples below).
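
For instance, here is a minimal sketch querying a repository endpoint with the SPARQLWrapper Python library; the repository ID `my-repository-id` is a placeholder, and the repository needs public read access (see the next section):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query the SPARQL endpoint of the repository
# ("my-repository-id" is a placeholder for your repository ID)
sparql = SPARQLWrapper("https://graphdb.dumontierlab.com/repositories/my-repository-id")
sparql.setQuery("SELECT * WHERE { ?s ?p ?o . } LIMIT 10")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```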

### Edit your repository access

By default, your repository will not be publicly accessible.

👩‍💻 Go to Users and Access

- Change the Free Access Settings (top right of the page) to enable public read access to the SPARQL endpoint of your repository:
  - Find your repository and enable Read access (checkbox on the left)
- You can also give Write access to other users:
  - We usually give Write access to the import_user, which is used in automated workflows to automatically upload new data to the repository

### Optional: enable GraphDB search index

You can easily enable the GraphDB Lucene search index to quickly search strings in your triplestore.

Here is an example creating a search index for the `rdfs:label` and `dct:description` properties.

👨‍💻 Running this in your GraphDB repository SPARQL editor will insert the configuration triples and create the search index (it may take a moment):

```sparql
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
INSERT DATA {
  # luc:moleculeSize luc:setParam "1" .
  luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#label http://purl.org/dc/terms/description" .
  luc:useRDFRank luc:setParam "yes" .
  luc:searchIndex luc:createIndex "true" .
}
```

Query the GraphDB search index:

```sparql
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?foundUri ?foundLabel {
  ?foundLabel luc:searchIndex 'TEXT_TO_SEARCH*' ;
    luc:score ?score .
  ?foundUri ?p ?foundLabel .
} ORDER BY ?score LIMIT 200
```
Wildcard

We are using a * wildcard at the end to match all strings starting with TEXT_TO_SEARCH.
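
Once the index is built you can run this query from any SPARQL client. Here is a minimal Python sketch using `requests`, assuming the FoodHealthClaimsKG repository used later on this page and a made-up search string `vitamin`:

```python
import requests

# Search the Lucene index through the repository SPARQL endpoint
# (the repository name and the "vitamin" search string are examples)
endpoint = "https://graphdb.dumontierlab.com/repositories/FoodHealthClaimsKG"
query = """PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?foundUri ?foundLabel {
  ?foundLabel luc:searchIndex 'vitamin*' ;
    luc:score ?score .
  ?foundUri ?p ?foundLabel .
} ORDER BY ?score LIMIT 200"""

response = requests.get(
    endpoint,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["foundUri"]["value"], "->", row["foundLabel"]["value"])
```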

## Automate data processing and loading

RDF data can be automatically generated and loaded using GitHub Actions workflows.

See this workflow, which generates RDF data using a simple convert_to_rdf.py script and loads it into the triplestore:

1. Download the input file from Google Docs and run a Python script to generate the RDF:

   ```yaml
   - name: Install Python dependencies
     run: |
       python -m pip install -r requirements.txt
   - name: Download CSV files from Google docs
     run: |
       mkdir -p data/output
       wget -O data/food-claims-kg.xlsx "https://docs.google.com/spreadsheets/d/1RWZ6AlGB8m7PO5kjsbbbeI4ETLwvKLOvkrzOpl8zAM8/export?format=xlsx&id=1RWZ6AlGB8m7PO5kjsbbbeI4ETLwvKLOvkrzOpl8zAM8"
   - name: Run Python script to generate RDF
     run: |
       python src/convert_to_rdf.py
   ```
2. Optional: clear the existing graph:

   ```yaml
   - name: Clear existing graph
     uses: vemonet/sparql-operations-action@v1
     with:
       query: "CLEAR GRAPH <https://w3id.org/foodkg/graph>"
       endpoint: https://graphdb.dumontierlab.com/repositories/FoodHealthClaimsKG/statements
       user: ${{ secrets.GRAPHDB_USER }}
       password: ${{ secrets.GRAPHDB_PASSWORD }}
   ```
3. Load the RDF file previously generated by the workflow (data/output/food_health_kg.ttl in this example):

   ```yaml
   - name: Import RDF files in the triplestore
     uses: MaastrichtU-IDS/RdfUpload@master
     with:
       file: data/output/food_health_kg.ttl
       endpoint: https://graphdb.dumontierlab.com/repositories/FoodHealthClaimsKG/statements
       user: ${{ secrets.GRAPHDB_USER }}
       password: ${{ secrets.GRAPHDB_PASSWORD }}
       graph: "https://w3id.org/foodkg/graph"
   ```
Secrets

You will need to define these 2 secrets in your GitHub repository's Actions secrets: GRAPHDB_USER and GRAPHDB_PASSWORD.
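
You can also perform the upload step yourself with a short script against the RDF4J-compatible HTTP API exposed by GraphDB. A minimal sketch, assuming the same repository, file, and graph as in the workflow above (credentials are placeholders):

```python
import requests

# Upload a Turtle file to a named graph of the repository
# (POST to the /statements endpoint of the RDF4J HTTP API;
# the credentials are placeholders)
endpoint = "https://graphdb.dumontierlab.com/repositories/FoodHealthClaimsKG/statements"

with open("data/output/food_health_kg.ttl", "rb") as rdf_file:
    response = requests.post(
        endpoint,
        params={"context": "<https://w3id.org/foodkg/graph>"},
        data=rdf_file,
        headers={"Content-Type": "text/turtle"},
        auth=("my_user", "my_password"),
    )
response.raise_for_status()
print("RDF file loaded")
```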

## Deploy a serverless API for the triplestore

Deploying an API to access and explore your SPARQL endpoint is really easy with grlc.io: you just need to define a few SPARQL queries in a GitHub repository, and grlc.io handles everything else to expose a Swagger API (a.k.a. OpenAPI) to access your knowledge graph.

Enable easy data exploration

🧭 This API will be the entrypoint for people who want to discover your data: they can quickly explore and understand the structure of your knowledge graph through the queries you expose.

To make this easier to reproduce, we will use the existing grlc.io API deployment defined for the food-claims-kg as an example.

1. 👩‍💻 Provide the URL of the SPARQL endpoint to query in the endpoint.txt file

2. 👨‍💻 Define the SPARQL queries in .rq files at the root of the git repository.

    Example

    See this example of a .rq file defining a SPARQL query with a parameter (used to filter with regex()).

  3. That's it 🤯 you can go to your API Swagger UI automatically generated and hosted by grlc.io based on the GitHub repository URL: http://grlc.io/api-git/MaastrichtU-IDS/food-claims-kg

Bonus: combine grlc.io with the GraphDB search index query, and you have a Search API for your knowledge graph! 🔎
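
Each .rq file becomes one HTTP operation of the generated API. As a hypothetical sketch (the `search` operation and its `text` parameter are made up for illustration; check the Swagger UI for the real operation names, which are derived from the .rq filenames), calling it with Python `requests` could look like:

```python
import requests

# Call one operation of the grlc.io API generated from the .rq files
# ("search" and its "text" parameter are hypothetical examples)
response = requests.get(
    "http://grlc.io/api-git/MaastrichtU-IDS/food-claims-kg/search",
    params={"text": "vitamin"},
    headers={"Accept": "application/json"},
)
response.raise_for_status()
print(response.json())
```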

An active project

The grlc project is still actively maintained and responsive; feel free to open an issue if you face any problem.
