Translator API to compute and serve predictions of biomedical concept associations
OpenPredict is a Python library and API to train models and serve predicted associations between biomedical entities (e.g. a disease treated by a drug).
Metadata about runs, model evaluations, and features is stored using the ML Schema ontology in an RDF triplestore (such as Ontotext GraphDB or Virtuoso).
Access the Translator OpenPredict API at https://openpredict.semanticscience.org
You can use this API to retrieve predictions for drug/disease, or add new embeddings to improve the model.
Requirements: Python 3.6+ and pip installed
You can install the openpredict Python package with pip to run the OpenPredict API on your machine, to test new embeddings, or to improve the library.
We currently recommend installing from the source code master branch to get the latest version of OpenPredict, but we also regularly publish the openpredict package to PyPI: https://pypi.org/project/openpredict
Clone the repository:
git clone https://github.com/MaastrichtU-IDS/translator-openpredict.git
cd translator-openpredict
Install openpredict from the source code; the package will be automatically updated when the files change locally :arrows_counterclockwise:
pip3 install -e .
If you face conflicts with already installed packages, then you might want to use a Virtual Environment to isolate the installation in the current folder before installing OpenPredict:
# Create the virtual environment folder in your workspace
python3 -m venv .venv
# Activate it using a script in the created folder
source .venv/bin/activate
On Windows you might also need to install Visual Studio C++ 14 Build Tools (required for numpy).
Start the OpenPredict API locally on http://localhost:8808:
openpredict start-api
By default, all data is stored in the data/ folder of the directory where you ran the openpredict command (RDF metadata, features, and models of each run).
Contributions are welcome! If you wish to help improve OpenPredict, see the instructions to contribute :woman_technologist:
You can easily reset the data of your local OpenPredict deployment by deleting the data/ folder and restarting the OpenPredict API:
rm -rf data/
If you are working on improving OpenPredict, you can explore additional documentation to deploy the OpenPredict API locally or with Docker.
See the TESTING.md file for more details on testing the API.
The user provides a drug or a disease identifier as a CURIE (e.g. DRUGBANK:DB00394 or OMIM:246300) and chooses a prediction model (only the Predict OMIM-DrugBank classifier is currently implemented).
The API will return predicted targets for the given drug or disease:
Feel free to try the API at openpredict.semanticscience.org
We provide Jupyter Notebooks with examples to use the OpenPredict API:
The default baseline model is openpredict-baseline-omim-drugbank. You can choose the base model when you post new embeddings using the /embeddings call. Then the OpenPredict API will:
7621843c-1f5f-11eb-85ae-48a472db7414)
Once the embeddings have been added, you can find the previously generated models (including openpredict-baseline-omim-drugbank) and use them as the base model when you ask for predictions or add new embeddings.
Use this operation if you just want to easily retrieve predictions for a given entity. The /predict operation takes 4 parameters (1 required):
drug_id to get predicted diseases it could treat (e.g. DRUGBANK:DB00394)
disease_id to get predicted drugs it could be treated with (e.g. OMIM:246300)
Predict OMIM-DrugBank)
The API will return the list of predicted targets for the given entity; the labels are resolved using the Translator Name Resolver API:
{
  "count": 300,
  "hits": [
    {
      "score": 0.8361061489249737,
      "id": "OMIM:246300",
      "label": "leprosy, susceptibility to, 3",
      "type": "disease"
    }
  ]
}
Try it at https://openpredict.semanticscience.org/predict?drug_id=DRUGBANK:DB00394
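The same call can be scripted. The following is a minimal sketch using only the Python standard library, not an official client: the helper names (build_predict_url, top_hits, fetch_predictions) are hypothetical, and it assumes exactly one of drug_id or disease_id is passed and that the response has the count/hits shape shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://openpredict.semanticscience.org"


def build_predict_url(drug_id=None, disease_id=None, base_url=BASE_URL):
    """Build a /predict URL (assumption: exactly one identifier is given)."""
    if (drug_id is None) == (disease_id is None):
        raise ValueError("Provide exactly one of drug_id or disease_id")
    if drug_id is not None:
        params = {"drug_id": drug_id}
    else:
        params = {"disease_id": disease_id}
    # safe=":" keeps the CURIE colon unescaped, matching the URL shown above
    return f"{base_url}/predict?{urlencode(params, safe=':')}"


def top_hits(response, n=5):
    """Return the n highest-scoring (id, label, score) tuples from a response."""
    hits = sorted(response["hits"], key=lambda h: h["score"], reverse=True)
    return [(h["id"], h["label"], h["score"]) for h in hits[:n]]


def fetch_predictions(drug_id=None, disease_id=None):
    """Call the live API (requires network access) and decode the JSON body."""
    with urlopen(build_predict_url(drug_id, disease_id), timeout=30) as resp:
        return json.load(resp)
```

For example, top_hits(fetch_predictions(drug_id="DRUGBANK:DB00394")) would list the five highest-scoring predicted diseases, provided the public API is reachable.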
The /query operation will return the same predictions as the /predict operation, using the ReasonerAPI format used within the Translator project.
The user sends a ReasonerAPI query asking for the predicted targets given a source and the relation to predict. The query is a graph with nodes and edges defined in JSON, and uses classes from the BioLink model.
See this ReasonerAPI query example:
{
  "message": {
    "query_graph": {
      "edges": [
        {
          "id": "e00",
          "source_id": "n00",
          "target_id": "n01",
          "type": "treated_by"
        }
      ],
      "nodes": [
        {
          "curie": "DRUGBANK:DB00394",
          "id": "n00",
          "type": "drug"
        },
        {
          "id": "n01",
          "type": "disease"
        }
      ]
    }
  }
}
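A query of this shape can also be assembled programmatically. The sketch below mirrors the example above; build_query and post_query are hypothetical helper names, and sending the query as a JSON body via HTTP POST to /query on the public deployment is an assumption rather than something this section states.

```python
import json
from urllib.request import Request, urlopen


def build_query(source_curie, source_type="drug", target_type="disease",
                relation="treated_by"):
    """Build a one-edge ReasonerAPI query graph like the example above."""
    return {
        "message": {
            "query_graph": {
                "edges": [
                    {
                        "id": "e00",
                        "source_id": "n00",
                        "target_id": "n01",
                        "type": relation,
                    }
                ],
                "nodes": [
                    # n00 is the known entity, n01 is the entity to predict
                    {"curie": source_curie, "id": "n00", "type": source_type},
                    {"id": "n01", "type": target_type},
                ],
            }
        }
    }


def post_query(query, url="https://openpredict.semanticscience.org/query"):
    """Send the query (assumption: JSON body via POST; needs network access)."""
    request = Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Calling build_query("DRUGBANK:DB00394") reproduces the example query; the result can then be passed to post_query.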
The results provide the following attributes for the knowledge_graph edges:
"e0": {
  "attributes": [
    {
      "name": "model_id",
      "source": "OpenPredict",
      "type": "EDAM:data_1048",
      "value": "openpredict-baseline-omim-drugbank"
    },
    {
      "name": "score",
      "source": "OpenPredict",
      "type": "EDAM:data_1772",
      "value": "0.8267106697312154"
    }
  ],
  "object": "DRUGBANK:DB00394",
  "predicate": "biolink:treated_by",
  "relation": "RO:0002434",
  "subject": "OMIM:246300"
},
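To work with these edges in Python, the attributes list can be flattened into a dictionary. This is a small sketch with hypothetical helper names (edge_attributes, edge_summary), assuming each edge has the shape shown above; note that the score arrives as a string and is converted to a float here.

```python
def edge_attributes(edge):
    """Map attribute names to their values for one knowledge_graph edge."""
    return {attr["name"]: attr["value"] for attr in edge.get("attributes", [])}


def edge_summary(edge):
    """Flatten an edge into subject, object, predicate, model_id and score."""
    attrs = edge_attributes(edge)
    return {
        "subject": edge["subject"],
        "object": edge["object"],
        "predicate": edge["predicate"],
        "model_id": attrs.get("model_id"),
        # The score value is serialized as a string in the response
        "score": float(attrs["score"]),
    }


# The edge from the example above
example_edge = {
    "attributes": [
        {"name": "model_id", "source": "OpenPredict",
         "type": "EDAM:data_1048",
         "value": "openpredict-baseline-omim-drugbank"},
        {"name": "score", "source": "OpenPredict",
         "type": "EDAM:data_1772", "value": "0.8267106697312154"},
    ],
    "object": "DRUGBANK:DB00394",
    "predicate": "biolink:treated_by",
    "relation": "RO:0002434",
    "subject": "OMIM:246300",
}

summary = edge_summary(example_edge)
```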
The /predicates operation will return the entities and relations provided by this API in a JSON object (following the ReasonerAPI specifications).
Try it at https://openpredict.semanticscience.org/predicates
Diagram of the data model used for OpenPredict, based on the ML Schema ontology (mls):