Creating a project

🍪 Start a new prediction project

We recommend bootstrapping your project from our template: it generates all the files required to get started, including an example training and prediction workflow.

Run these commands to install cookiecutter and generate your project from the template:

pip install cookiecutter
cookiecutter https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api

ℹ️ Once your project has been generated, check out the generated README.md for instructions on how to run it in development.

🚪 Enter the development environment

Use hatch shell to enter a development virtual environment with all required dependencies installed. Once inside, you can use commands such as dvc.

hatch shell

🔎 Set up data version control

openpredict uses dvc for data version control. dvc helps you store the datasets used by your machine learning workflows that are too big for git, and keeps track of their changes in a way similar to git. As with git, you need to choose a platform to publish your data to, such as DagsHub or HuggingFace.
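To illustrate the idea, the Python sketch below shows content-addressed tracking in a simplified form: git only stores a small pointer (hash, size, path), while the file content itself is stored under a key derived from its hash. This is an illustration under stated assumptions, not dvc's implementation; make_pointer and cache_location are hypothetical names, and the real .dvc file format and cache layout depend on your dvc version.

```python
import hashlib
from pathlib import Path


def make_pointer(content: bytes, path: str) -> dict:
    """Build a small, git-friendly pointer describing a (possibly large) file.

    Simplified illustration only: not the exact .dvc file format.
    """
    return {
        "md5": hashlib.md5(content).hexdigest(),  # the hash identifies the content
        "size": len(content),
        "path": path,
    }


def cache_location(cache_dir: Path, md5: str) -> Path:
    # Content-addressed storage: the first two hex characters of the hash
    # become a sub-directory, the remaining characters the file name.
    return cache_dir / md5[:2] / md5[2:]


if __name__ == "__main__":
    pointer = make_pointer(b"hello", "data/example.txt")
    print(pointer)
    print(cache_location(Path(".cache"), pointer["md5"]))
```

Because the pointer only changes when the content changes, git can track data versions cheaply while the heavy files live in the cache and on the remote.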

Here we document how to publish the data of an ML experiment to DagsHub, but you can choose a different platform for your project if you prefer.

⚠️ Open source projects on DagsHub using the free plan have a 10 GB storage limit.

Go to dagshub.com and log in with GitHub or Google.
Create a new DagsHub project connected to the GitHub repository containing the code for the experiment (this repository).
Set your DagsHub credentials in your local terminal (add these commands to your ~/.bashrc or ~/.zshrc to set them automatically in every new shell):

export DAGSHUB_USER="your-org-or-username"
export DAGSHUB_TOKEN="TOKEN"

Link your local repository to the DagsHub project you created (replace openpredict-model with the name of your DagsHub repository if it differs):

dvc remote add origin https://dagshub.com/$DAGSHUB_USER/openpredict-model.dvc
dvc remote default origin
dvc remote modify origin --local auth basic
dvc remote modify origin --local user $DAGSHUB_USER
dvc remote modify origin --local password $DAGSHUB_TOKEN
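If you prefer to script this step, the following Python sketch runs the same five dvc commands with subprocess, after checking that the credentials are exported. remote_setup_commands is a hypothetical helper, and the repository name openpredict-model is assumed to match your DagsHub project. Note that dvc remote add fails if a remote named origin already exists.

```python
import os
import subprocess


def remote_setup_commands(user: str, token: str, repo: str = "openpredict-model") -> list[list[str]]:
    """Return the dvc commands linking the local repository to DagsHub."""
    url = f"https://dagshub.com/{user}/{repo}.dvc"  # assumed repository name
    return [
        ["dvc", "remote", "add", "origin", url],
        ["dvc", "remote", "default", "origin"],
        ["dvc", "remote", "modify", "origin", "--local", "auth", "basic"],
        ["dvc", "remote", "modify", "origin", "--local", "user", user],
        ["dvc", "remote", "modify", "origin", "--local", "password", token],
    ]


def main() -> None:
    # Fail early with a clear message if the credentials are not exported
    user = os.environ.get("DAGSHUB_USER")
    token = os.environ.get("DAGSHUB_TOKEN")
    if not user or not token:
        raise SystemExit("Set DAGSHUB_USER and DAGSHUB_TOKEN first")
    for cmd in remote_setup_commands(user, token):
        subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    main()
```

Passing --local writes the user and password to .dvc/config.local, which dvc keeps out of git, so your token is not committed.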

Push data

⚠️ Put all the data files required to train the model, as well as the files generated by the training, in the data/ folder, and publish this data to the remote repository.

You can check the dvc status of the current repository with:

dvc status

First add the changes made to the data/ folder:

dvc add data

Then push the added data:

dvc push

Alternatively, you can use this shortcut to add the changes and push them in a single command:

hatch run push-data
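The exact push-data command is defined in the template's pyproject.toml. As a hedged sketch of what an add-then-push shortcut amounts to, here it is in Python; push_data is a hypothetical function, and its run parameter is injectable so the logic can be exercised without calling dvc.

```python
import subprocess
from typing import Callable


def push_data(run: Callable[..., object] = subprocess.run, data_dir: str = "data") -> None:
    """Track the data folder with dvc, then upload it to the default remote."""
    # check=True makes subprocess.run raise if `dvc add` fails,
    # so nothing is pushed when tracking did not succeed.
    run(["dvc", "add", data_dir], check=True)
    run(["dvc", "push"], check=True)


if __name__ == "__main__":
    push_data()
```

dvc add data also writes a small data.dvc file and adds data to .gitignore: commit data.dvc to git so that each version of your code records which version of the data it uses.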

Pull data

To retrieve all data from the remote repository:

dvc pull

🪄 Run in development

You are free to set up your development workflow as you wish; consider these instructions as recommendations that work out of the box with the code generated by the template.

Note that the scripts executed with hatch run are defined in the pyproject.toml file; feel free to check them out and change them as needed.

Deploy API and train

Deploy your prediction function as a Translator Reasoner API (TRAPI) on http://localhost:8808:

hatch -v run api
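Once the API is running, you can query it from another terminal. This Python sketch assumes the standard TRAPI POST /query endpoint and a one-hop query graph asking which diseases a drug may treat; the node categories, the biolink:treats predicate and the DRUGBANK:DB00394 identifier are illustrative assumptions, so adapt them to what your prediction function supports.

```python
import json
import urllib.request

API_URL = "http://localhost:8808/query"  # assumed TRAPI query endpoint of the local API


def build_trapi_query(drug_id: str) -> dict:
    """One-hop TRAPI query graph: which diseases might this drug treat?"""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [drug_id], "categories": ["biolink:Drug"]},
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": ["biolink:treats"]},
                },
            }
        }
    }


def send_query(payload: dict, url: str = API_URL) -> dict:
    """POST the query as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical drug identifier: use one your model was trained on
    result = send_query(build_trapi_query("DRUGBANK:DB00394"))
    print(len(result["message"].get("results") or []), "results")
```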

Run the script to train the model:

hatch run train

Test

Run the tests defined in the tests/ folder locally:

hatch run test -s

Add dependencies

Add dependencies directly in pyproject.toml. Keep the main dependencies minimal, with only what is needed to run the prediction functions, and add all the dependencies required for training to the train optional dependencies.

Hatch will automatically update the virtual environment the next time you use it to run a script.

If you face issues with the dependencies (e.g. they are not updated properly), you can reset the environments (hatch recreates them the next time you use it):

hatch env prune

🐳 Run with docker

You can also run the training script in Docker (check docker-compose.yml if you need to change the command that executes the script):

docker-compose run training

Or start the TRAPI API:

docker-compose up api

Or start a JupyterLab/VSCode workspace on http://localhost:8888:

docker-compose up workspace