# Creating a project

## 🪄 Start a new prediction project

We recommend bootstrapping your project with our template: it generates all the files required to get started, including an example training and prediction workflow.
Run these commands to install cookiecutter and generate your project from the template:

```bash
pip install cookiecutter
cookiecutter https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api
```
ℹ️ Once your project has been generated, check out the generated `README.md` for instructions on how to run your project in development.
## 🪄 Enter the development environment

Use `hatch shell` to enter a virtual environment for development with all required dependencies installed. Once in the environment you will be able to use commands such as `dvc`.
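For example, once Hatch is installed, entering the environment and checking that `dvc` is available looks like:

```shell
# enter the Hatch-managed development environment
hatch shell
# the project's tools are now on the PATH
dvc --version
```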
## 📊 Setup data version control

`openpredict` uses `dvc` for data version control: it lets you store the datasets used by your machine learning workflows that are too big for git, and keeps track of changes in a way similar to git. As with git, `dvc` requires you to choose a platform to publish your data, such as DagsHub or HuggingFace.

Here we document the process using DagsHub to publish data related to a ML experiment, but you can choose a different platform for your project if you wish.

⚠️ Open source projects on DagsHub using the free plan have a 10G storage limit.
- Go to [dagshub.com](https://dagshub.com), and login with GitHub or Google
- Create a new project in DagsHub by connecting it to the GitHub repository with the code for the experiment (this repository)
- Set your DagsHub credentials in your local terminal (add these commands to your `~/.bashrc` or `~/.zshrc` so they are set automatically in every new shell session):
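The original commands are not shown on this page; a minimal sketch, assuming the `DAGSHUB_USER` and `DAGSHUB_TOKEN` variable names used by the `dvc remote` commands below:

```shell
# replace the placeholder values with your own DagsHub username and access token
export DAGSHUB_USER="your-dagshub-username"
export DAGSHUB_TOKEN="your-dagshub-token"
```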
- Link your local repository to the created DagsHub project:
```bash
dvc remote add origin https://dagshub.com/$DAGSHUB_USER/openpredict-model.dvc
dvc remote default origin
dvc remote modify origin --local auth basic
dvc remote modify origin --local user $DAGSHUB_USER
dvc remote modify origin --local password $DAGSHUB_TOKEN
```
### Push data

⚠️ Put all data files required to train the model, and the files generated by the training, in the `data/` folder, then publish this data to the remote repository.
You can check the `dvc` status in the current repository with:
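The command itself is missing from this page; with the standard `dvc` CLI it is:

```shell
# list tracked data that has changed since the last dvc add
dvc status
```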
First add the changes made to the `data/` folder:
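With the standard `dvc` CLI, that is:

```shell
# track the current contents of the data/ folder with dvc
dvc add data/
```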
Then push the added data:
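Using the standard `dvc` CLI:

```shell
# upload the tracked data to the configured remote (DagsHub here)
dvc push
```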
Alternatively you can use this shortcut to add changes and push in one command:
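The template's shortcut is not shown on this page; as a plain-shell equivalent you can chain the two steps:

```shell
# add changes in data/ and push them in one go
dvc add data/ && dvc push
```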
### Pull data
To retrieve all data from the remote repository:
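With the standard `dvc` CLI:

```shell
# download all tracked data from the configured remote
dvc pull
```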
## 🪄 Run in development

You are free to set up your development workflow as you wish; consider these instructions as recommendations which work out-of-the-box with the code generated by the template.

Note that scripts executed with `hatch run` are defined in the `pyproject.toml` file; feel free to check it out and change them as needed.
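For illustration, a Hatch scripts table in `pyproject.toml` could look like this (the script names and commands are hypothetical examples, not taken from the template):

```toml
[tool.hatch.envs.default.scripts]
# each entry becomes available as `hatch run <name>`
train = "python src/train.py"
test = "pytest tests/"
```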
### Deploy API and train

Deploy your prediction function as a Translator Reasoner API on http://localhost:8808:
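The exact command depends on the scripts defined in the generated `pyproject.toml`; a hypothetical example, assuming an `api` script is defined there:

```shell
# start the TRAPI server (script name is an assumption, check pyproject.toml)
hatch run api
```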
Run the script to train the model:
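Again assuming a `train` script in `pyproject.toml` (hypothetical name):

```shell
# run the training script defined in pyproject.toml
hatch run train
</imports>
```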
### Test

Run the tests defined in the `tests/` folder locally:
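For example, assuming `pytest` is available in the project's environment:

```shell
# run the test suite inside the Hatch environment
hatch run pytest tests/
```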
### Add dependencies

Add dependencies directly in the `pyproject.toml` file. Try to keep the main dependencies minimal: just what is needed to run the prediction functions. Put all dependencies required for training in the `train` optional dependencies.
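For illustration, the split could look like this (the package names are hypothetical examples):

```toml
[project]
dependencies = [
    # minimal runtime dependencies, just enough to serve predictions
    "fastapi",
]

[project.optional-dependencies]
train = [
    # heavier dependencies only needed to train the model
    "scikit-learn",
]
```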
Hatch will automatically update the virtual environment the next time you use it to run a script.
If you are facing issues with the dependencies (e.g. they are not updated properly), you can reset the environment with:
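With Hatch's standard environment commands:

```shell
# remove all of the project's Hatch environments; they are recreated on next use
hatch env prune
```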
## 🐳 Run with docker

You can also run the training script in docker; see the `docker-compose.yml` if you need to change the command used to execute the script:
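The service name below is a hypothetical example; check the generated `docker-compose.yml` for the actual one:

```shell
# run the training service once and remove the container when it exits
docker compose run --rm train
```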
Or start the TRAPI API:
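Assuming an `api` service is defined in the compose file (hypothetical name):

```shell
docker compose up api
```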
Or start a JupyterLab/VSCode workspace on http://localhost:8888:
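Assuming a `workspace` service is defined in the compose file (hypothetical name):

```shell
docker compose up workspace
```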