Building and Serving a Model for Inference

The Mantik platform can convert trained MLflow models into containers. This makes it possible to run the models for inference on any machine without extra setup.

Containerization has to be triggered through the Mantik API, and the resulting container is downloaded with the Mantik Client. Registering the results of a training run as a trained model is also done using the Mantik API.

See the tutorial for more information on this.

Build and Retrieve a Model Container

Call the API endpoint to start dockerizing a model. Get the PROJECT ID and MODEL ID of the model you want to dockerize and insert them into the command below.

curl -X 'POST' \
  'https://api.cloud.mantik.ai/projects/<PROJECT ID>/models/trained/<MODEL ID>/docker/build' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer ${MLFLOW_TRACKING_TOKEN}" \
  -d ''

It takes approximately 5-10 minutes to build the container image.

Once built, you can retrieve it using the Mantik Client:

mantik models download --project-id="<PROJECT ID>" --model-id="<MODEL ID>" --load

The options for mantik models download are listed below; an example that combines them follows the listing.

Usage: mantik models download [OPTIONS]

  Download the containerized version of the model.

Options:
  --project-id UUID               [required]
  --model-id UUID                 [required]
  --target-dir TEXT               Path to directory where the zipped tarball
                                  image will be downloaded.  [default: ./]
  --image-type [docker|apptainer]
                                  Type of the image to fetch from the mantik
                                  platform.  [default: docker]
  --load / --no-load              Load the tarball image into docker.
                                  [default: no-load]
  --help                          Show this message and exit.
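
For example, to fetch the Apptainer variant of the image and place it in a separate directory instead of loading a Docker tarball (the directory path is only an illustration):

mantik models download \
  --project-id="<PROJECT ID>" \
  --model-id="<MODEL ID>" \
  --image-type=apptainer \
  --target-dir=./model-images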

Serving the Model Container for Inference

Once the model image is loaded into your local Docker images, serve it with

docker run -p 8080:8080 <MODEL ID>-docker

This starts a REST API locally and exposes a POST endpoint at http://127.0.0.1:8080/invocations, ready for inference.
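
As a minimal sketch, an inference request could be sent with curl as shown below. The exact payload format depends on your model's input signature; MLflow scoring servers typically accept JSON keys such as inputs or dataframe_split, and the values here are placeholders:

curl -X POST http://127.0.0.1:8080/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1.0, 2.0, 3.0]]}'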

See the tutorial for a complete example, including an inference request.