Preparing Your Application¶
As described in the Introduction, an application must provide an MLproject file. This file is essential for Mantik: it is read together with the Compute Backend Config to execute an application on a remote system such as an HPC cluster. Generally, there are at least two ways of managing your application so that it can be executed on an external system:
1. Managing your code in Python scripts in the directory of the MLproject file (see Managing Code in Python Scripts).
2. Maintaining your code in a Python library (or package) and invoking the library from small Python scripts in the MLproject directory, or via a CLI, either of which can be called in the MLproject entry points (see Managing Code in a Python Library).
If you’re working with option 2 and also use Apptainer to containerize your application, the section Using a Containerized Python Library describes how to make the latest state of the library available inside the container on the remote system at runtime.
The MLproject format¶
The MLproject format is used to package your data science code in a format that can be used with Mantik’s CLI commands. In addition, Mantik extends the MLproject structure with the Compute Backend Config.
This tutorial will guide you through the creation of an MLproject as it is also used with MLflow. For more information, see the MLflow documentation.
Note
The term “MLproject” can refer to three slightly different things:
1. The MLproject file in which the project is configured.
2. The standard name of the project directory (mlproject).
3. The overall project.
Usually, the meaning results from context. In the following, it will always refer to the MLproject file of an MLflow project.
Writing an MLproject file¶
The MLproject file is required to configure your ML application to run with Mantik.
Note that this file must be called MLproject and placed in a sub-directory of your Git repository.
(This sub-directory’s content is deployed to the run directory on the remote system by Mantik at run submission.)
The file must contain the following sections:
- name: Project name
- entry_points: Entry points for the project (each represents e.g. a Python script or a CLI command)
Let’s say you want to name your project my-project and configure one entry point called main (which is the default).
That entry point should execute a Python script main.py, which allows passing one argument lr of type float.
Here, lr might for example be the learning rate used for training a neural network.
The corresponding MLproject file then looks as follows:
name: my-project

entry_points:
  main:
    parameters:
      learning_rate: float
    command: "python main.py --learning_rate {learning_rate}"
Here, the name of the MLflow parameter does not have to match the argument name of the Python script. Hence,
entry_points:
  main:
    parameters:
      lr: float
    command: "python main.py --learning_rate {lr}"
would also be possible.
Moreover, MLflow entry points can define default values that will be used for parameters if they’re not provided:
entry_points:
  main:
    parameters:
      lr:
        type: float
        default: 0.1
    command: "python main.py --learning_rate {lr}"
Managing Code in Python Scripts¶
One way to maintain applications that Mantik can deploy to external compute systems is to keep your code in Python scripts located in the same directory as the MLproject file, since this folder is deployed to the run directory on the external system.
In our example, the main Python script is main.py, as defined in the main entry point of the MLproject file.
In the file in which the training is configured, make sure to use MLflow for tracking.
The example below reads the argument --learning_rate from the CLI in order to access the parameter as defined in MLproject, and uses mlflow to log the parameter to the tracking server.
import argparse

import mlflow


def parse_arguments():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    # Must match the argument used in the MLproject entry point command.
    parser.add_argument("--learning_rate", type=float)
    return parser.parse_args()


if __name__ == "__main__":
    arguments = parse_arguments()
    with mlflow.start_run():
        mlflow.log_param("learning_rate", arguments.learning_rate)
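The same pattern extends to metrics recorded during training, which also appear on the tracking server; a minimal sketch (the loss values are placeholders):

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    # Log one value per epoch; "step" orders the values on the tracking server.
    for epoch, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=epoch)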
Containers and Virtual Environments¶
To make your application executable on the remote system, i.e. to provide all necessary software requirements, you might use Apptainer containers or Python virtual environments.
If you want your project to run in a

- container: an Apptainer (formerly Singularity) image must be present either in the MLproject directory or on the remote system. For more details on containerized applications, see Containerizing Applications.
- Python virtual environment: a Python venv must be present in the MLproject directory or on the remote system.
You must provide the path to your container image or Python venv in the Compute Backend Config.
The usage of containers or virtual environments via Mantik is optional, though.
Compute Backend Configuration¶
The Compute Backend Config is one of Mantik’s extensions to the standard MLproject directory structure as proposed by MLflow. Here, all configuration options for UNICORE or FirecREST as well as the resources that should be allocated can be provided (see Supported HPC Sites and Compute Backend Config).
The Compute Backend Config for the above example might look as follows when using UNICORE:
UnicoreApiUrl: https://zam2125.zam.kfa-juelich.de:9112/JUWELS/rest/core

Environment:
  Apptainer:
    Path: my-apptainer-image.sif
    Type: local
  Variables:
    TEST_ENV_VAR: variable value

Resources:
  Queue: batch
  Nodes: 2

Exclude:
  - "**/*.py"
  - another-apptainer-image.sif
Or, when using FirecREST:

Firecrest:
  ApiUrl: https://firecrest.cscs.ch
  TokenUrl: https://auth.cscs.ch/auth/realms/firecrest-clients/protocol/openid-connect/token
  Machine: daint

Environment:
  Apptainer:
    Path: my-apptainer-image.sif
    Type: local
  Variables:
    TEST_ENV_VAR: variable value

Resources:
  Queue: batch
  Nodes: 2

Exclude:
  - "**/*.py"
  - another-apptainer-image.sif
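To catch indentation mistakes in the config early, you can parse it and inspect a nested key before submitting; a quick sketch (uses PyYAML; the file name matches the directory layout shown in the next section):

import yaml

# Parse the config and read a nested key to verify the YAML structure.
with open("compute-backend-config.yaml") as f:
    config = yaml.safe_load(f)

print(config["Resources"]["Nodes"])  # -> 2 for the examples above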
Managing Code in a Python Library¶
If your main run script main.py becomes lengthy or complex, you might consider outsourcing code to separate files that are part of a Python package (or library) managed outside your MLproject directory.
Since only the content of the MLproject directory is sent to the remote system when submitting a run, your main run script would not have access to the library by default.
To make the library accessible, specific commands need to be given in the PreRunCommandOnLoginNode section of the Compute Backend Config. Let’s go through an example of how this works:
To execute a main run script that uses such a library, begin by cloning your code repository into a directory on the remote system (e.g. your home directory):
git clone https://github.com/<user-name>/my_application
Set up a virtual environment and install all required dependencies in editable mode:
cd <application directory>
python -m venv venv          # create a virtual environment
source venv/bin/activate     # activate it
pip install -e .             # install the package in editable mode
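To verify that the editable install resolves to the cloned repository (so that later git pulls take effect without reinstalling), a quick check (assuming the package is named my_application as in the layout below):

# The printed path should point into the cloned repository.
import my_application

print(my_application.__file__)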
Assume that the code repository my_application has the following directory structure:
└── my_application
    ├── my_application
    │   ├── preprocessing.py
    │   └── ...
    ├── tests
    └── mlproject
        ├── MLproject
        ├── compute-backend-config.yaml
        └── main.py
The code in preprocessing.py:

def execute_preprocessing():
    ...
The code in main.py:

import my_application.preprocessing as preprocessing

if __name__ == "__main__":
    preprocessing.execute_preprocessing()
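If the entry point also defines parameters, main.py can combine argument parsing, MLflow tracking, and the library call in one place; a minimal sketch (the parameter mirrors the earlier example and is hypothetical for this library):

import argparse

import mlflow

import my_application.preprocessing as preprocessing


def parse_arguments():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.1)
    return parser.parse_args()


if __name__ == "__main__":
    arguments = parse_arguments()
    with mlflow.start_run():
        mlflow.log_param("learning_rate", arguments.learning_rate)
        preprocessing.execute_preprocessing()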
Then specify the PreRunCommandOnLoginNode in the Compute Backend Config as follows:
Environment:
  PreRunCommandOnLoginNode:
    - cd <path to>/<my-application>
    - git fetch
    - git checkout $MANTIK_GIT_REF
    - git status
    - git pull
    - cd $MANTIK_WORKING_DIRECTORY
MANTIK_GIT_REF and MANTIK_WORKING_DIRECTORY are environment variables set internally by Mantik when submitting your application.
Using a Containerized Python Library¶
We generally encourage containerizing applications because of the flexibility it provides. For details on containerization, see Containerizing Applications.
However, if you run your application in a container and maintain your code in a Python library, you don’t want to re-deploy the image to the remote system for every small change. To avoid this, you can use the same strategy as described in the previous section to always have the most recent code available in your container.
To achieve this, you have to copy your local library into the container and install the library in editable mode inside the container at build time:
Bootstrap: docker
From: python:3.7

%files
    . /opt/my-application

%post
    cd /opt/my-application
    pip install -e .
Then, at runtime, you have to mount the Git repository located on the remote system into the location of your library inside the container, using the -B/--bind Apptainer option in the Compute Backend Config:
...
Environment:
  Apptainer:
    Path: image.sif
    Type: local
    Options:
      - --nv
      - -B <path to my-application dir>:/opt/my-application
  PreRunCommandOnLoginNode:
    - cd <path to>/<my-application>
    - git fetch
    - git checkout $MANTIK_GIT_REF
    - git status
    - git pull
    - cd $MANTIK_WORKING_DIRECTORY
This updates your library on the remote system to the latest changes and mounts it into the container at runtime, so the latest changes are also available inside the container.
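As a quick sanity check that the bind-mounted checkout is actually in place inside the container, you can inspect the mounted repository from Python; a minimal sketch (assumes the bind target /opt/my-application from the example above):

from pathlib import Path

# With the repository bind-mounted at /opt/my-application, its .git/HEAD
# reflects the ref checked out by the PreRunCommandOnLoginNode commands.
print(Path("/opt/my-application/.git/HEAD").read_text().strip())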