Introduction

The Data Science Research Infrastructure (DSRI) is an OKD 4.6 cluster using Red Hat Ceph Storage. OKD is the open source version of OpenShift.

The DSRI provides a graphical user interface on top of Kubernetes container orchestration to easily deploy and manage services.

New DSRI version!

This documentation covers the new DSRI version using OKD 4.6, available at https://console-openshift-console.apps.dsri2.unimaas.nl

You can find the documentation for the legacy DSRI version using OKD 3.11 here.

Which DSRI version should you use?#

New OKD 4.6 cluster#

You need to start applications on CPU

Storage of applications deployed in the new cluster is automated.

Legacy OKD 3.11 cluster#

You need to run applications on GPU (TensorFlow, PyTorch...)

Storage of applications deployed in the legacy cluster needs to be manually configured.

If you need to run applications on GPU, visit the documentation for the legacy cluster.

Getting started#

What can be done on the DSRI ✔️#

Run data science applications in Docker containers 🐳 on the UM network, such as:

  • Multiple flavors of JupyterLab (scipy, tensorflow, all-spark, and more)
  • JupyterHub with GitHub authentication
  • RStudio, with a complementary Shiny server
  • Visual Studio Code server
  • TensorFlow or PyTorch on Nvidia GPU (with JupyterLab or Visual Studio Code)
  • Apache Flink cluster for streaming applications
  • Or any program installed in a Docker image!
Data storage

DSRI is a computing infrastructure, built and used to run data science workloads. DSRI stores data persistently, but the workloads you run can alter any data stored on it, and we cannot guarantee its immutability.

Always keep a safe copy of your data outside the DSRI, and don't rely on the DSRI for long-term storage.
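Because workloads can alter stored data, it can help to record checksums of your safe copy and compare them against the files on the DSRI later. The following is a minimal sketch using only the Python standard library; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical helpers for illustration, not part of any DSRI tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(reference, current):
    """Return relative paths that were added, removed, or modified."""
    return sorted(
        name
        for name in reference.keys() | current.keys()
        if reference.get(name) != current.get(name)
    )
```

A possible workflow, under these assumptions: build a manifest of the safe copy once, build another from the working copy when needed, and pass both to `changed_files` to list what differs.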

What cannot be done ❌#

  • The DSRI can only be accessed from the physical UM network or through the UM VPN, so deployed services are not available on the public Internet 🔒
  • All activities must be legal. You must closely examine and abide by the terms and conditions of any data, software, or web service that you use as part of your work 📜
Request an account

If you are working at Maastricht University, see this page to request an account, and run your services on the DSRI.

The DSRI in a nutshell#

Here is a diagram providing a simplified explanation of how the DSRI works, using popular data science applications as examples (JupyterLab, RStudio, VSCode server).

DSRI in a nutshell

The DSRI specifications#

Software#

Hardware#

  • 16 CPU nodes

                    RAM         CPU                       Storage
  Node capacity     512 GB      64 cores (128 threads)    120 TB
  Total capacity    8 192 GB    1 024 cores               1 920 TB

  • 1 GPU node: Nvidia DGX1 8x Tesla V100 - 32GB GPU

                       GPUs    RAM       CPU
  GPU node capacity    8       528 GB    40 cores

DSRI infrastructure
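
The total capacity row of the CPU node table follows from multiplying the per-node figures by the 16 CPU nodes. A minimal sketch of that arithmetic, using only the values from the table above (the function and variable names are illustrative):

```python
# Per-node figures taken from the CPU node hardware table.
CPU_NODES = 16
NODE_RAM_GB = 512
NODE_CORES = 64
NODE_STORAGE_TB = 120


def total_capacity(nodes, ram_gb, cores, storage_tb):
    """Multiply each per-node capacity by the number of nodes."""
    return {
        "ram_gb": nodes * ram_gb,
        "cores": nodes * cores,
        "storage_tb": nodes * storage_tb,
    }


totals = total_capacity(CPU_NODES, NODE_RAM_GB, NODE_CORES, NODE_STORAGE_TB)
print(totals)  # {'ram_gb': 8192, 'cores': 1024, 'storage_tb': 1920}
```

The result matches the table: 8 192 GB of RAM, 1 024 cores, and 1 920 TB of storage across the CPU nodes.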

Learn more about DSRI#

See the following presentation about the Data Science Research Infrastructure:

DSRI April 2021 Community Event Presentation
Last updated on by Vincent Emonet