Using MLflow for Tracking¶
This tutorial helps you get started with MLflow for experiment tracking by giving a short overview of its main features. For detailed information, see the MLflow documentation.
Authentication at the Mantik Platform¶
You can use all MLflow tracking functions with the Mantik platform. However, in order to initialize tracking with Mantik, a token has to be retrieved from the Mantik API to allow authenticating at the MLflow Tracking Server.
The command
import mantik
mantik.init_tracking()
reads environment variables (see Credentials and Environment Variables) and acquires an API token for secure communication with the MLflow Tracking Server.
The required environment variables for tracking are:
MANTIK_USERNAME
MANTIK_PASSWORD
MLFLOW_TRACKING_URI
(default: https://api.cloud.mantik.ai/tracking/)
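Typically you would export these variables in your shell or batch script; purely as an illustrative sketch, they can also be set from Python before initializing tracking (the credential values below are placeholders):

import os

import mantik

# Placeholder credentials; in practice, export these in your shell instead.
os.environ["MANTIK_USERNAME"] = "<your-username>"
os.environ["MANTIK_PASSWORD"] = "<your-password>"
os.environ["MLFLOW_TRACKING_URI"] = "https://api.cloud.mantik.ai/tracking/"

# Reads the environment variables and acquires an API token
# for secure communication with the MLflow Tracking Server.
mantik.init_tracking()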
Attention
In some cases, MLflow sends API requests before executing any code. Thus, it might be required to use the CLI command mantik init before running any Python code.
Being Independent of API Call Success¶
Invoking a Mantik or MLflow method may fail for a variety of reasons. To avoid crashing your application because of a single failing API call, we provide a set of convenience functions in mantik.mlflow that wrap the Mantik and MLflow functions and log a warning if an invocation of an MLflow method has failed. The functions cover mantik.mlflow.init_tracking() and the most important MLflow methods and can be used as follows:
import mantik.mlflow
with mantik.mlflow.start_run():
    ...
    mantik.mlflow.log_param("<name>", "<value>")
If a method is not provided, there is a generic mantik.mlflow.call_method function that allows calling any MLflow method and fails silently with a log message if the call does not succeed:
import mantik.mlflow
with mantik.mlflow.start_run():
    mantik.mlflow.call_method("log_param", "<name>", "<value>")
In the following, we will mostly make use of the convenience functions that come with Mantik. Each of these functions, however, simply attempts to invoke the respective MLflow method.
Creating an MLflow Run¶
Once you’ve authenticated yourself at the Mantik platform and created an experiment, you can make use of all MLflow tracking commands in your scripts to configure what is tracked when you e.g. train your ML models.
However, MLflow only allows tracking within the context of an active run. An MLflow run can be created using a context manager, which is the recommended way:
import mlflow
with mlflow.start_run():
    ...
or via two separate method calls:
import mlflow
mlflow.start_run()
...
mlflow.end_run()
After mlflow.start_run() and before mlflow.end_run() (or before the context exits), any MLflow method can be used to track to the respective run.
Danger
An MLflow run is not necessarily identical to a Mantik run. While a Mantik run via the Compute Backend always automatically creates an MLflow run that can be used for tracking, an MLflow run might also be created manually to track e.g. the training of an ML model.
The Mantik Compute Backend starts an MLflow run and sets the environment variable MLFLOW_RUN_ID to the ID of that run. Therefore, use mlflow.start_run() without passing a run_id, otherwise MLflow will create an additional run. This still allows creating nested runs, as sketched below.
Note that mlflow.active_run() will only detect Mantik’s run once mlflow.start_run() or any other of the tracking functions has been called.
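To illustrate the note above, here is a minimal sketch of attaching to the run created by the Compute Backend and creating a nested child run (the parameter names are purely illustrative):

import mlflow

# Attaches to the existing run because MLFLOW_RUN_ID is set by the Compute Backend.
with mlflow.start_run():
    mlflow.log_param("outer_param", "<value>")  # illustrative parameter

    # Explicitly creates a child run nested under the Compute Backend's run.
    with mlflow.start_run(nested=True):
        mlflow.log_param("inner_param", "<value>")  # illustrative parameter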
Remote Tracking¶
By default, MLflow tracks experiments to your local filesystem. In order to use a remote tracking server, the environment variable MLFLOW_TRACKING_URI must be set.
When submitting an application (see Submitting an Application), the tracking URI (MLFLOW_TRACKING_URI), the API token (MLFLOW_TRACKING_TOKEN), and the MLflow run ID (MLFLOW_RUN_ID) are provided to the run environment automatically by setting the respective environment variables. Hence, you do not need to call mantik.init_tracking() inside your MLproject code.
Experiment Specification¶
Experiments can be referenced by ID or name. We recommend creating experiments on the platform (see Experiments) and then using IDs to refer to them.
With standard MLflow usage, you can set the MLFLOW_EXPERIMENT_ID environment variable to track current runs to the specified experiment, as sketched below. Experiment IDs can be found either in the detail view of an experiment on the Mantik platform or in the MLflow GUI, where the ID is shown above the description section of an experiment.
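As a minimal sketch (the experiment ID is a placeholder, and you would usually export the variable in your shell rather than set it in Python):

import os

import mlflow

# Placeholder: use the ID shown on the Mantik platform or in the MLflow GUI.
os.environ["MLFLOW_EXPERIMENT_ID"] = "<experiment_id>"

with mlflow.start_run():
    ...  # this run is tracked to the specified experiment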
When using the Compute Backend, the experiment_id is a required argument of the mantik.ComputeBackendClient.submit_run method.
Tracking Functions¶
MLflow offers a multitude of tracking functions, all of which are integrated with the Mantik platform. Below, the most important methods are explained briefly. For detailed reference, refer to the MLflow documentation.
Autologging¶
MLflow supports many Python ML frameworks out of the box. Autologging enables automatic logging of model configurations from these frameworks, see the autolog documentation. You can enable it with:
import mlflow
mlflow.autolog()
Parameters¶
Parameters can be logged explicitly with the log_params method. This is especially useful if you have custom code and parameters that are not tracked with autologging:
import mlflow
with mlflow.start_run():
    mlflow.log_params({"parameter1_name": value1, "parameter2_name": value2})
or alternatively
import mlflow
with mlflow.start_run():
    mlflow.log_param("parameter_name", value)
to log individual parameters.
We recommend log_params over log_param due to much better performance (at least 3X faster, up to 20X faster for a large number of parameters).
Metrics¶
Metrics, mostly regarding the performance of ML models, can be tracked explicitly with the log_metrics method:
import mlflow
with mlflow.start_run():
    ...
    mlflow.log_metrics(metrics={"mse": 2500.00, "rmse": 50.00})
Analogously to parameters, individual metrics can also be logged:
import mlflow
with mlflow.start_run():
    ...
    mlflow.log_metric("metric_name", value)
We recommend log_metrics over log_metric due to much better performance (at least 3X faster, up to 20X faster for a large number of metrics).
Real-time Monitoring of Training Processes¶
mlflow.log_metrics/mlflow.log_metric also accept an optional step parameter that allows tracking metrics per step (e.g. per epoch). Metrics logged using step are available in real time in the MLflow GUI, which allows monitoring the training process via, for example, the training loss.
You may either use mlflow.log_metrics() inside your training loop (e.g. for PyTorch), or write a callback (e.g. for TensorFlow) that is called at the end of each epoch to automatically track the desired metrics:
import mlflow
with mlflow.start_run():
    ...
    for epoch in range(10):
        loss = ...
        mlflow.log_metrics(metrics={"loss": loss}, step=epoch)
import keras.callbacks
import mlflow
class Metrics(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metrics({"loss": logs["loss"]}, step=epoch)
Artifacts¶
Artifacts are any kind of file that you might want to keep from your experiments, e.g. images, diagrams, or generated text files. Custom artifacts can be logged with the log_artifact method, which takes the local path of the file to upload:
import mlflow
with mlflow.start_run():
    mlflow.log_artifact("<local/path/to/file>")
For example:
import mlflow
import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot([0.2, 0.4], [0.2, 0.2])
fig.savefig("confusion_matrix.pdf")

with mlflow.start_run():
    mlflow.log_artifact("confusion_matrix.pdf")
Alternatively, you can use
import mlflow
with mlflow.start_run():
    mlflow.log_artifacts("/local/path/to/upload")
We recommend log_artifact over log_artifacts due to much better performance (up to 2X faster). Note that this is the opposite of the recommendation for log_params vs. log_param and log_metrics vs. log_metric.
Models¶
Trained models can be stored for later reuse. Autologging also logs models from supported frameworks. Since every framework might have a custom way to save models, model storage is handled by the corresponding mlflow submodules. E.g. if you want to explicitly log an sklearn model, you can use the mlflow.sklearn.log_model method:
import mlflow.sklearn
mlflow.sklearn.log_model(model, "<model_name>")
See Saving and Loading Models with MLflow below for more details on how to save and load models.
Saving and Loading Models with MLflow¶
MLflow introduces a standard format for packaging ML models known as
MLflow Model.
The format defines a convention that lets you save and load a model in
multiple flavors (pytorch, keras, sklearn) that can be interpreted by different
downstream platforms. For example, mlflow.sklearn contains save_model, log_model, and load_model functions for saving and loading scikit-learn models.
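As a rough sketch of how these three functions relate (the toy model, directory name, and model name below are placeholders and not part of the Mantik documentation):

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])  # toy model

# save_model writes the model to a local directory ...
mlflow.sklearn.save_model(model, "local_model_dir")

# ... while log_model stores it as an artifact of the active run.
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "<model_name>")

# load_model reads a saved model back, here from the local directory.
loaded_model = mlflow.sklearn.load_model("local_model_dir")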
Saving Models¶
Models can be saved and loaded in different ways. Many popular ML libraries (e.g., TensorFlow, PyTorch, Scikit-Learn) are integrated into MLflow, which makes logging models especially easy. For example, a PyTorch model can be saved to the MLflow tracking server just by calling mlflow.pytorch.log_model(model, "<model_name>"). Currently supported built-in flavors can be found here.
MLflow can be extended to support additional flavors as explained here.
Loading Models¶
You can load a previously logged model for inference in any script or notebook.
To load the model, use the built-in utilities of the same ML framework that you used for saving it (e.g., mlflow.pytorch.load_model). There are two ways of referencing the model: via an MLflow run or via MLflow’s model registry.
When referencing a model from a run, the following path should be provided to the load_model function: "runs:/<mlflow_run_id>/<path_to_model>", where mlflow_run_id is the run ID under which your model was saved and path_to_model is the relative path to the model within the run’s artifacts directory. You can look at the respective run and see the logged model through the Mantik MLflow UI, where you can also find the run ID.
import mlflow.pytorch

logged_model = "runs:/<mlflow_run_id>/<path_to_model>"
loaded_model = mlflow.pytorch.load_model(logged_model)
The model can also be loaded through the MLflow model registry. The model registry is MLflow’s central hub for managing models, providing model versioning and utilities for model deployment.
For this, the model needs to be registered. You can register your saved model from a completed run in the MLflow UI: when viewing the run’s artifacts, select the saved model artifact and press the “Register Model” button that appears in the upper left corner. After this step, you can load the model by also using mlflow.pytorch.load_model with a slightly different reference path. To load a model from the model registry, the path should be "models:/<model_name>/<version>", where model_name is the name of your model in the model registry and version is the version of the model in the model registry.
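A minimal sketch of loading from the registry (the name and version are placeholders):

import mlflow.pytorch

# Placeholders: the registered model's name and version from the model registry.
registered_model = "models:/<model_name>/<version>"
loaded_model = mlflow.pytorch.load_model(registered_model)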
Prediction¶
After loading the model, you can simply make predictions on new input data.
prediction = loaded_model(input_data)
The function call differs based on the framework used to create the model. For more information on the various function calls, refer to the MLflow documentation.
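For example, a model loaded via the framework-agnostic mlflow.pyfunc flavor is not called directly but exposes a predict method; a minimal sketch (the model URI and input data are placeholders):

import mlflow.pyfunc

# Placeholder URI: a "runs:/..." or "models:/..." reference as described above.
loaded_model = mlflow.pyfunc.load_model("models:/<model_name>/<version>")
prediction = loaded_model.predict(input_data)  # input_data: e.g. a pandas DataFrame or numpy array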