# Publish data

## Publish to our public GraphDB triplestore

Create a new repository on our GraphDB triplestore at https://graphdb.dumontierlab.com/
**Ask for permissions**

After creating an account, ask us for the permissions to create new repositories.
## Create the GraphDB repository

👩‍💻 Go to **Setup > Repositories > Create Repository**, or click here: https://graphdb.dumontierlab.com/repository/create

👨‍💻 Choose the settings of your repository (leave the defaults if not mentioned here):
- **Ruleset**: use RDFS-Plus (Optimized) by default, or an OWL ruleset if you are performing reasoning using OWL ontologies
- **Supports SHACL validation**: enable this if you plan to use SHACL shapes to validate the RDF loaded in the repository
  - Visit https://maastrichtu-ids.github.io/shapes-of-you to find SHACL shapes
  - Add new shapes to the IDS shapes repository: https://github.com/MaastrichtU-IDS/shacl-shapes
- **Use context index**: enable this to index the contexts (aka. graphs)
- For large datasets:
  - **Entity index size**: increase this to 999999999
  - **Entity ID bit-size**: increase this to 40
To access your repository:

- SPARQL endpoint: https://graphdb.dumontierlab.com/repositories/my-repository
- SPARQL endpoint to run update queries (e.g. `INSERT`): https://graphdb.dumontierlab.com/repositories/my-repository/statements
- GraphDB admin web UI: https://graphdb.dumontierlab.com (change the repository using the button at the top right of the screen)
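As a quick sanity check, you can query the SPARQL endpoint over plain HTTP. The sketch below uses only the Python standard library; the repository name `my-repository` is a placeholder:

```python
# Minimal sketch: query a GraphDB repository's SPARQL endpoint over HTTP.
# "my-repository" is a placeholder; replace it with your repository name.
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


endpoint = "https://graphdb.dumontierlab.com/repositories/my-repository"
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
req = build_sparql_request(endpoint, query)

# Uncomment to actually send the query once the repository exists:
# with urlopen(req) as resp:
#     print(resp.read().decode())
```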
## Edit your repository access

By default your repository will not be available publicly.
👩‍💻 Go to **Users and Access**

- Change the **Free Access Settings** (top right of the page) to enable public read access to the SPARQL endpoint of your repository
- Find your repository and enable **Read** access (checkbox on the left)
- You can also give **Write** access to other users
  - We usually give Write access to the `import_user`, to be used in automated workflows (to automatically upload new data to the repository)
## Optional: enable GraphDB search index

You can easily enable the GraphDB Lucene search index to quickly search strings in your triplestore.
Here is an example that creates a search index for the `rdfs:label` and `dct:description` properties.

👨‍💻 Run this in your GraphDB repository's SPARQL editor: it will insert the triples, and the search index will be created (just wait a bit).
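A sketch of such an index-creation update, based on GraphDB's Lucene FTS plugin (`luc:` namespace). The index name `luc:myTextIndex` is a placeholder, and parameter details may vary across GraphDB versions, so double-check against the GraphDB documentation:

```sparql
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
INSERT DATA {
  # Index literal values only
  luc:index luc:setParam "literals" .
  # Restrict the index to these two predicates
  luc:includePredicates luc:setParam
      "http://www.w3.org/2000/01/rdf-schema#label http://purl.org/dc/terms/description" .
  # Build the index, here named luc:myTextIndex
  luc:myTextIndex luc:createIndex "true" .
}
```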
Query the GraphDB search index:

**Wildcard**

We use a `*` wildcard at the end to match all strings starting with `TEXT_TO_SEARCH`.
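Assuming an index named `luc:myTextIndex` exists (the name is a placeholder), such a search query could look like this:

```sparql
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?s WHERE {
  # Matches all entities indexed with a literal starting with TEXT_TO_SEARCH
  ?s luc:myTextIndex "TEXT_TO_SEARCH*" .
}
```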
## Automate data processing and loading

RDF data can be automatically generated and loaded using GitHub Actions workflows. See this workflow, which generates data using a simple `convert_to_rdf.py` file and loads it into the triplestore:
- Download the input file from Google Docs and run the Python script to generate RDF
- Optional: clear the existing graph
- Load the RDF file previously generated by the workflow (`data/output/food_health_kg.ttl` in this example)
**Secrets**

You will need to define these 2 secrets in your GitHub repository workflow secrets: `GRAPHDB_USER` and `GRAPHDB_PASSWORD`.
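The steps above could be sketched as a workflow like the following. This is an illustrative outline, not the actual workflow file: the repository name, graph URI, and file paths are assumptions.

```yaml
name: Generate and load RDF
on:
  push:
    branches: [main]
jobs:
  generate-and-load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Generate the RDF output file from the input data
      - name: Generate RDF
        run: python convert_to_rdf.py
      # Optional: clear the existing named graph before reloading
      - name: Clear existing graph
        run: |
          curl -X DELETE -u ${{ secrets.GRAPHDB_USER }}:${{ secrets.GRAPHDB_PASSWORD }} \
            "https://graphdb.dumontierlab.com/repositories/my-repository/statements?context=%3Chttps%3A%2F%2Fw3id.org%2Fmy-graph%3E"
      # Upload the generated Turtle file to the repository
      - name: Load RDF file
        run: |
          curl -X POST -u ${{ secrets.GRAPHDB_USER }}:${{ secrets.GRAPHDB_PASSWORD }} \
            -H "Content-Type: text/turtle" \
            --data-binary "@data/output/food_health_kg.ttl" \
            "https://graphdb.dumontierlab.com/repositories/my-repository/statements"
```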
## Deploy a serverless API for your triplestore

Deploying an API to access and explore your SPARQL endpoint is really easy to do with grlc.io: you just define a few SPARQL queries in a GitHub repository, and grlc.io handles everything else to expose a Swagger API (aka. OpenAPI) to access your knowledge graph.
**Enable easy data exploration**

🧭 This API will be the entrypoint for people who want to discover your data: they can quickly explore and understand the structure of your knowledge graph through the queries you expose.
To make this easier to reproduce, we will use the existing grlc.io API deployment defined for the food-claims-kg as an example.
👩‍💻 Provide the URL of the SPARQL endpoint to query in the `endpoint.txt` file.

👨‍💻 Define the SPARQL queries in `.rq` files at the base of the git repo.

**Example**

See this example of a `.rq` file defining a SPARQL query with a parameter (used to filter using `regex()`).

That's it 🤯 You can go to your API's Swagger UI, automatically generated and hosted by grlc.io based on the GitHub repository URL: http://grlc.io/api-git/MaastrichtU-IDS/food-claims-kg
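A minimal `.rq` file along these lines (the query and the `?_text` parameter name are illustrative, not taken from the food-claims-kg repository). grlc turns any variable starting with `?_` into an API parameter:

```sparql
#+ summary: Search entities whose label matches a text parameter

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label WHERE {
  ?entity rdfs:label ?label .
  # grlc exposes ?_text as a parameter of the generated API endpoint
  FILTER regex(?label, ?_text, "i")
}
```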
Bonus: combine grlc.io with the GraphDB search index query, and you have a Search API for your knowledge graph! 🔎
**An active project**

The grlc project is still active and responsive; feel free to post an issue if you face any problem.