FAQ and Common Issues

Q: Why am I getting an "Invalid token" response even though my credentials are correct?

A: Tokens are stored locally in a JSON file at ~/.mantik/tokens.json. If you experience authentication issues, delete that file and retry.
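
A minimal sketch for clearing the cached tokens (assuming Python 3.8+ for missing_ok):

```python
from pathlib import Path

# Delete the locally cached tokens; the file should be re-created
# the next time you authenticate.
(Path.home() / ".mantik" / "tokens.json").unlink(missing_ok=True)
```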

Q: Why is an experiment created via the MLflow GUI not recognized by Mantik?

A: When an experiment is created directly in the MLflow UI, it is not recorded in Mantik, so the state of Mantik and MLflow can become mismatched. For experiments to be recognized by both MLflow and Mantik, you have to create them using the Mantik platform.

The same applies to experiments created programmatically via mlflow.create_experiment: these will not be recognized by Mantik either, as in the sketch below.
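
A minimal sketch of the anti-pattern (the experiment name is hypothetical):

```python
import mlflow

# Anti-pattern: this registers the experiment with MLflow only;
# Mantik will not recognize it.
experiment_id = mlflow.create_experiment("my-experiment")
```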

Q: Why am I getting multiple runs when submitting my application via Mantik?

A: Counter-question: do you pass a run_id or run_name to mlflow.start_run(), or explicitly set the MLFLOW_RUN_ID environment variable? If yes: don’t do it! If no, please submit a bug report.

Managing runs explicitly from scripts poses some challenges:

  • There is a concept called ActiveRun, which represents the currently active MLflow run. When a run is submitted remotely, an MLflow run is created automatically, and that run’s ID is set as MLFLOW_RUN_ID in the run’s environment. When using mlflow.start_run(), MLflow will pick up this run and use it for logging.

  • The MLflow docs state that an ActiveRun can be picked up in a script using mlflow.active_run(). This method, though, might not behave as expected: it only picks up a run that was created or resumed earlier in the current process (or script) using mlflow.start_run(). Unlike mlflow.start_run(), it does not consume the MLFLOW_RUN_ID environment variable.

If you really want to manage runs directly from scripts, mlflow.end_run() needs to be called to end the automatically created run before a new MLflow run can be created, as sketched below.
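
A minimal sketch of this behavior, assuming the script runs inside a remotely submitted Mantik run where MLFLOW_RUN_ID is already set (parameter and metric names are illustrative):

```python
import mlflow

# start_run() picks up (and consumes) MLFLOW_RUN_ID and resumes the
# run that was created automatically for the remote submission.
mlflow.start_run()
mlflow.log_param("alpha", 0.1)  # logged to the submitted run

# active_run() works here because start_run() was called in this process:
assert mlflow.active_run() is not None

# The active run must be ended before another one can be started ...
mlflow.end_run()

# ... and since MLFLOW_RUN_ID was already consumed, this creates a
# brand-new MLflow run -- one way the extra runs described above appear.
mlflow.start_run()
mlflow.log_metric("loss", 0.42)
mlflow.end_run()
```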

Q: How can I improve performance when tracking with MLflow?

A: Our tests have shown that the following MLflow methods are preferable when tracking (see the sketch after this list):

  • log_params: at least 3X faster than log_param, up to 20X faster for a large number of parameters

  • log_metrics: at least 3X faster than log_metric, up to 20X faster for a large number of metrics

  • log_artifact: up to 2X faster compared to log_artifacts (note that this is the opposite of the pattern for log_params vs. log_param and log_metrics vs. log_metric)
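
A minimal sketch of the preferred batched calls (parameter and metric values, and the artifact path, are illustrative):

```python
import mlflow

params = {"alpha": 0.1, "batch_size": 32, "epochs": 10}
metrics = {"loss": 0.42, "accuracy": 0.91}

with mlflow.start_run():
    # Preferred: one batched call each, instead of looping over
    # log_param / log_metric with one request per entry.
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)

    # For artifacts, the single-file call is the faster option:
    mlflow.log_artifact("model.pkl")  # hypothetical file path
```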

Q: How can I schedule a run without triggering a run immediately?

A: Unfortunately, this is not possible at the moment.