A workshop to get started with the Data Science Research Infrastructure (DSRI) in an hour (hopefully)!
During this workshop, you will:
- Access the Data Science Research Infrastructure web UI
- Create a new application from a template in the catalog (RStudio, JupyterLab or VSCode)
- Access the application
- Add source code and data in the application
- Optionally install the oc command line interface
Prerequisites:
- A web browser (Chrome preferably, as some other web browsers have issues with the VSCode terminal)
- An account on the DSRI with your UM email
- Access to the UM VPN, or direct connection to UMnet or eduroam at Maastricht University
- Students can use the Athena Student Desktop at athenadesktop.maastrichtuniversity.nl to access the DSRI web UI
Access the DSRI
The DSRI documentation can be found at https://maastrichtu-ids.github.io/dsri-documentation
- Connect to the UM VPN.
  - Students can use the Athena Student Desktop at athenadesktop.maastrichtuniversity.nl to access the DSRI web UI
  - On Linux you can use openconnect:
    sudo openconnect --passwd-on-stdin -u YOUR.UM.USER --authgroup 01-Employees vpn-rw1.maastrichtuniversity.nl
- Access the DSRI OpenShift web UI
  - See the complete documentation to access the DSRI
- Go to the workspace-workshop project in the OpenShift web UI
Start an application
Start a JupyterLab/RStudio/VSCode application from the DSRI catalog in ids-projects
See how to deploy JupyterLab, RStudio, VSCode and lots more.
- Use your name to generate a unique Application name, e.g. rstudio-vemonet
- Persistent storage will be created automatically.
  - It can be found at https://console-openshift-console.apps.dsri2.unimaas.nl/k8s/cluster/persistentvolumes
  - Warning: when copy/pasting the storage name, a space is sometimes added at the end. Trim all spaces at the start and end of the storage name before starting the application, otherwise it will fail.
- Access the application you just started
  - You can find the URL of your application in the OpenShift web UI workshops overview.
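The trimming advice above can also be applied in a terminal if you keep the storage name in a shell variable before pasting it into the form. This is a minimal sketch: the `trim` helper and the `my-storage-name` value are illustrative placeholders, not part of the DSRI tooling.

```shell
# Hypothetical helper: strip leading and trailing whitespace from a pasted value.
trim() {
  local value="$1"
  # Remove leading whitespace
  value="${value#"${value%%[![:space:]]*}"}"
  # Remove trailing whitespace
  value="${value%"${value##*[![:space:]]}"}"
  printf '%s' "$value"
}

pasted="  my-storage-name "   # placeholder value with stray spaces
storage_name="$(trim "$pasted")"
# Print the value between brackets so stray spaces would be visible
printf '[%s]\n' "$storage_name"
```

Printing the value between brackets makes any remaining space easy to spot before you paste it.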
Upload files
For small and medium-sized files, you can simply drag and drop files and folders into the application web UI, or use the Upload files button in RStudio.
This works for files up to a few hundred MB (the exact limit depends on the application; use it until it fails!).
Upload your code
We recommend using git with GitHub or GitLab. You can use it directly from the terminal in all applications, or use the web UI integration each application provides.
See the documentation for each application:
- RStudio: https://maastrichtu-ids.github.io/dsri-documentation/docs/deploy-rstudio#use-git-in-rstudio
- VSCode: https://maastrichtu-ids.github.io/dsri-documentation/docs/deploy-vscode#use-git-in-vscode
- JupyterLab (with the jupyterlab-git extension installed): https://maastrichtu-ids.github.io/dsri-documentation/docs/deploy-jupyter#use-git-in-jupyterlab
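From the terminal of any of these applications, getting your code in place could look like the sketch below. The `clone_or_update` function name, the repository URL and the folder name are placeholders chosen for illustration; only standard git commands are used.

```shell
# Clone a repository into a folder, or update it if it was already cloned.
# Both arguments are placeholders supplied by the caller.
clone_or_update() {
  local repo_url="$1" target_dir="$2"
  if [ -d "$target_dir/.git" ]; then
    # Folder already contains a clone: fetch and fast-forward only
    git -C "$target_dir" pull --ff-only
  else
    git clone "$repo_url" "$target_dir"
  fi
}

# Example (hypothetical repository URL):
# clone_or_update https://github.com/YOUR_USER/YOUR_REPO.git my-project
```

Using `--ff-only` means the update stops instead of creating a merge commit if your local copy has diverged from the remote.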
Upload large data files
For large data files you will need to install the oc command line interface.
If you have the time, it can be quickly installed on macOS or Linux (including WSL):
- On Linux
  wget https://github.com/openshift/origin/releases/download/v3.11.0/openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz
  tar xvf openshift-origin-client-tools*.tar.gz
  cd openshift-origin-client*/
  sudo mv oc kubectl /usr/local/bin/
- On Mac
  brew install openshift-cli
- On Windows
See the complete documentation to upload large data files.
Tip: you will have a better connection for uploading large data files when directly connected to the UMnet network (or eduroam at UM), ideally over a wired Ethernet connection.
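Once oc is installed and you are logged in, copying a local folder into your running application could be sketched as follows with `oc rsync`. The `upload_dir` helper, the `OC_BIN` variable, the pod name and the destination path are all assumptions made for illustration; check the complete documentation for the method recommended on the DSRI.

```shell
# Hypothetical helper: copy a local folder into a running pod with `oc rsync`.
# OC_BIN lets you swap the binary (for example OC_BIN=echo for a dry run).
upload_dir() {
  local src_dir="$1" pod_name="$2" dest_dir="$3"
  "${OC_BIN:-oc}" rsync "$src_dir" "$pod_name:$dest_dir"
}

# Example with placeholder names; find the real pod name with `oc get pods`:
# upload_dir ./my-data/ my-application-1-abcde /path/in/container
```

Setting `OC_BIN=echo` before calling the function prints the command instead of running it, which is a cheap way to check the arguments first.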
Stop and delete your application
Stop your application from the OpenShift web UI Topology page.
You can use the Filter by name search box to quickly find your application based on the name you gave it.
Note: creating more than one pod ("Scale up") is useless for most data science applications, such as RStudio, VSCode or JupyterLab. It is only relevant for applications running as a cluster, like Apache Flink or Apache Spark, or for web applications with a lot of traffic (OpenShift redirects the traffic depending on pod availability and starts new pods if required, also known as horizontal scaling).
Delete your application:
- If you installed the oc command line interface, it is easier to use it to delete all the objects related to your application:
  oc delete all,secret,configmaps,serviceaccount,rolebinding --selector app=my-application
  Replace my-application with the Application name you defined.
- Otherwise, you will need to manually delete a few objects related to your application in the OpenShift web UI. This can be done easily from the Overview page:
  - Delete the Route
  - Delete the Service
  - Delete the Deployment Config
See the complete documentation to delete an application.
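Before deleting with oc, it can help to list what the selector matches. The sketch below reuses the resource types from the delete command in this workshop with `oc get`; the `list_app_objects` helper and the `OC_BIN` variable are illustrative names, and it assumes your objects carry the `app=<Application name>` label.

```shell
# Resource types taken from the delete command in this workshop.
RESOURCE_TYPES="all,secret,configmaps,serviceaccount,rolebinding"

# List the objects labelled with the application name, without deleting anything.
list_app_objects() {
  local app_name="$1"
  "${OC_BIN:-oc}" get "$RESOURCE_TYPES" --selector "app=$app_name"
}

# Example with a placeholder application name:
# list_app_objects my-application
```

If the list contains only objects you expect, the same selector can then be passed to the delete command.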
See you soon!
Fill this form to help us create a project for you on the Data Science Research Infrastructure for a longer term!