Preparing Your Application

As described in the Introduction, an application must provide an MLproject file. This file is essential for Mantik: it is read together with the Compute Backend Config in order to execute an application on a remote system such as an HPC cluster. Generally, there are at least two ways of managing your application so that it can be executed on an external system:

  1. Managing code in Python scripts in the directory of the MLproject file (see Managing Code in Python Scripts).

  2. Maintaining your code in a Python library (or package) and invoking it via small Python scripts in the MLproject directory or via a CLI, either of which can be referenced in the MLproject entry points (see Managing Code in a Python Library).

If you’re working with option 2 and also use Apptainer to containerize your application, the section Using a Containerized Python Library describes how to make the latest state of the library available inside the container on the remote system at runtime.

The MLproject format

The MLproject format is used to package your data science code in a standardized format so that it can be executed with Mantik’s CLI commands. In addition, Mantik extends the MLproject structure with the Compute Backend Config.

This tutorial will guide you through the creation of an MLproject as it is also used with MLflow. For more information, see the MLflow documentation.

Note

The term “MLproject” can refer to three slightly different things:

  • The MLproject file in which the project is configured.

  • The standard name of the project directory (mlproject).

  • The overall project.

Usually, the meaning is clear from context. In the following, the term always refers to the MLproject file of an MLflow project.

Writing an MLproject file

The file MLproject is required to configure your ML application to run with Mantik. Note that this file must be named MLproject and placed in a sub-directory of your Git repository. (The content of this sub-directory is deployed to the run directory on the remote system by Mantik at run submission.)

It must contain the following sections:

  • name: Project name

  • entry_points: Entry points of the project (each represents e.g. a Python script or a CLI command)

Let’s say you want to name your project my-project and configure one entry point called main (which is the default). That entry point should execute a Python script main.py, which accepts one argument learning_rate of type float. Here, learning_rate might for example be the learning rate used for training a neural network.

The corresponding MLproject file then looks as follows:

name: my-project

entry_points:
  main:
    parameters:
      learning_rate: float
    command: "python main.py --learning_rate {learning_rate}"

Here, the name of the MLflow parameter does not have to be identical to the argument name of the Python script. Hence,

entry_points:
  main:
    parameters:
      lr: float
    command: "python main.py --learning_rate {lr}"

would also be possible.

Moreover, MLflow entry points can define default values that will be used for parameters if they’re not provided:

entry_points:
  main:
    parameters:
      lr:
        type: float
        default: 0.1
    command: "python main.py --learning_rate {lr}"

Managing Code in Python Scripts

One option for managing applications that Mantik deploys to external compute systems is to keep your code in Python scripts inside the directory that contains the MLproject file. (This folder is deployed to the run directory on the external system.)

In our example, the main Python script is main.py, as defined in the main entry point of the MLproject file. In the file in which the training is configured, make sure to use MLflow for tracking.

The example below reads the argument learning_rate from the CLI in order to access the parameter as defined in MLproject. mlflow is used to log the parameter to the tracking server.

import argparse

import mlflow

def parse_arguments():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float)
    return parser.parse_args()

if __name__ == "__main__":
    arguments = parse_arguments()
    with mlflow.start_run():
        mlflow.log_param("learning_rate", arguments.learning_rate)
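
Beyond parameters, you will typically also want to log metrics from within the same run. The following sketch extends the script above with a hypothetical train() generator, which is only a placeholder for your actual training loop:

import argparse

import mlflow

def train(learning_rate):
    """Hypothetical training loop yielding one loss value per epoch."""
    for epoch in range(10):
        yield learning_rate / (epoch + 1)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float)
    arguments = parser.parse_args()
    with mlflow.start_run():
        mlflow.log_param("learning_rate", arguments.learning_rate)
        for epoch, loss in enumerate(train(arguments.learning_rate)):
            # Metrics, unlike parameters, can be logged repeatedly over time.
            mlflow.log_metric("loss", loss, step=epoch)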

Containers and Virtual Environments

To make your application executable on the remote system, i.e. to provide all necessary software requirements, you might use Apptainer containers or Python virtual environments.

If you want your project to run in a

  • container, an Apptainer (formerly Singularity) image must be present either in the MLproject directory or on the remote system. For more details on containerized applications see Containerizing Applications.

  • Python virtual environment, a Python venv must be present either in the MLproject directory or on the remote system.

You must provide the path under which your container image or Python venv is located in the Compute Backend Config.

The usage of containers or virtual environments via Mantik is optional, though.

Compute Backend Configuration

The Compute Backend Config is one of the extensions to the standard MLproject directory structure as proposed by MLflow. Here, all configuration options for UNICORE or FirecREST as well as the resources that should be allocated can be provided (see Supported HPC Sites and Compute Backend Config).

The Compute Backend Config for the above example might look as follows.

For UNICORE:

UnicoreApiUrl: https://zam2125.zam.kfa-juelich.de:9112/JUWELS/rest/core
Environment:
  Apptainer:
    Path: my-apptainer-image.sif
    Type: local
  Variables:
    TEST_ENV_VAR: variable value
Resources:
  Queue: batch
  Nodes: 2
Exclude:
  - "**/*.py"
  - another-apptainer-image.sif

For FirecREST:

Firecrest:
  ApiUrl: https://firecrest.cscs.ch
  TokenUrl: https://auth.cscs.ch/auth/realms/firecrest-clients/protocol/openid-connect/token
  Machine: daint
Environment:
  Apptainer:
    Path: my-apptainer-image.sif
    Type: local
  Variables:
    TEST_ENV_VAR: variable value
Resources:
  Queue: batch
  Nodes: 2
Exclude:
  - "**/*.py"
  - another-apptainer-image.sif
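
Since the Compute Backend Config is plain YAML, a quick local syntax check can save a failed submission. A minimal sketch, assuming PyYAML is installed and the file is named compute-backend-config.yaml as in the example below:

import yaml

with open("compute-backend-config.yaml") as f:
    config = yaml.safe_load(f)

# Inspect what was parsed, e.g. the requested resources.
print(config["Resources"])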

Managing Code in a Python Library

If your main run script main.py becomes lengthy or complex, you might consider creating a library by outsourcing code to separate files that are part of a Python package (or library) managed outside your MLproject directory. Since only the content of the MLproject directory is sent to the remote system when submitting a run, your main run script would not have access to the library by default. To make it accessible, specific commands need to be specified under PreRunCommandOnLoginNode in the Compute Backend Config. Let’s go through an example of how this works.

To execute a main run script that utilizes such a library, begin by cloning your code repository into a directory on the remote system (e.g. your home directory):

git clone https://github.com/<user-name>/my_application

Then set up a virtual environment and install your package in editable mode, which also installs all required dependencies:

cd <application directory>
python -m venv venv
source venv/bin/activate
pip install -e .
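
To verify that the library is importable from that environment, a quick check (assuming the package is named my_application as in the example below):

import my_application

# An editable install resolves to the cloned repository.
print(my_application.__file__)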

Assume that the code repository my_application has the following directory structure:

└── my_application
    ├── my_application
    │   ├── preprocessing.py
    │   ├── ...
    │
    ├── tests
    └── mlproject
        ├── MLproject
        ├── compute-backend-config.yaml
        └── main.py

The code in preprocessing.py:

def execute_preprocessing():
    ...

The code in main.py:

import my_application.preprocessing as preprocessing

if __name__ == "__main__":
    preprocessing.execute_preprocessing()

Then specify the PreRunCommandOnLoginNode in the Compute Backend Config as follows:

Environment:
  PreRunCommandOnLoginNode:
    - cd <path to>/<my-application>
    - git fetch
    - git checkout $MANTIK_GIT_REF
    - git status
    - git pull
    - cd $MANTIK_WORKING_DIRECTORY

MANTIK_GIT_REF and MANTIK_WORKING_DIRECTORY are environment variables set internally by Mantik when submitting your application.

Using a Containerized Python Library

We generally encourage containerizing applications because of the flexibility it provides. For details on containerization, see Containerizing Applications.

However, if you run your application in a container and maintain your code in a Python library, you don’t want to re-deploy the image to the remote system for every small change. To avoid this, you can use the same strategy as described in the previous section to always have the most recent code available in your container.

To achieve this, you have to copy your local library into the container and install the library in editable mode inside the container at build time:

Bootstrap: docker
From: python:3.7

%files
  . /opt/my-application

%post
  # Install the library in editable mode so that the source under
  # /opt/my-application can later be replaced via a bind mount.
  cd /opt/my-application
  pip install -e .

Then, at runtime, you have to mount the Git repository located on the remote system into the location of your library inside the container, using the -B/--bind Apptainer option in the Compute Backend Config:

...
Environment:
  Apptainer:
    Path: image.sif
    Type: local
    Options:
      - --nv
      - -B <path to my-application dir>:/opt/my-application
  PreRunCommandOnLoginNode:
    - cd <path to>/<my-application>
    - git fetch
    - git checkout $MANTIK_GIT_REF
    - git status
    - git pull
    - cd $MANTIK_WORKING_DIRECTORY

This updates your library on the remote system to the latest changes and mounts it into the container at runtime, so that the latest changes are also available inside the container.