Data Management with Mantik

The aim of this tutorial is to provide a step-by-step guide to the Mantik workflow for managing data on remote file systems.

You will learn how to use the Mantik remote file service to:

  • upload data,

  • download data,

  • remove data.

For this tutorial, we assume that you know how to set up a Mantik project; see Set Up Mantik Account for an introduction.

Warning

In order to follow this tutorial, you need credentials for one of the compute facilities (JUWELS, JURECA, JUSUF) of the Jülich Supercomputing Centre or for an AWS S3 bucket.

Bookkeeping with Mantik is possible with essentially any kind of data. For the following, we use images and zipped files organized in the my-images directory and its subdirectories as shown below. For the rest of the tutorial, we assume that the user's current working directory is /my-images.

my-images
├── imageDir
│   ├── image4.jpeg
│   └── index5.jpeg
├── imageDir1
│   ├── image6.jpeg
│   └── index7.jpeg
├── image1.jpeg
├── image2.jpeg
├── image3.jpeg
├── imageArchiv1.zip
├── imageArchiv2.zip
└── .env
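
If you would like to follow along with exactly this layout, the following shell sketch recreates it with empty placeholder files (the .env file is filled with content in the next section); this is purely illustrative and not required if you already have your own data:

# Recreate the sample directory layout with empty placeholder files
mkdir -p my-images/imageDir my-images/imageDir1
cd my-images
touch image1.jpeg image2.jpeg image3.jpeg imageArchiv1.zip imageArchiv2.zip .env
touch imageDir/image4.jpeg imageDir/index5.jpeg
touch imageDir1/image6.jpeg imageDir1/index7.jpeg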

To use the remote file service, a couple of environment variables have to be set. For details, have a look at the documentation for the Remote File Service. We set the variables in a .env file and sourced it, i.e., source .env (see the next section for a sample file).

Example .env file

# .env

# Required environment variables for tracking to the Mantik MLflow server
export MANTIK_USERNAME=<user>
export MANTIK_PASSWORD='<password>'
# Optional environment variables
export MLFLOW_TRACKING_URI=https://cloud.mantik.ai/
export MANTIK_API_URL=https://api.cloud.mantik.ai/

# Required environment variables for remote execution and data management on HPC via UNICORE.
export MANTIK_COMPUTE_BUDGET_ACCOUNT=<compute project>
export MANTIK_UNICORE_AUTH_SERVER_URL=https://uftp.fz-juelich.de:9112/UFTP_Auth/rest/auth/JUDAC

# Required environment variables for S3 file service
export AWS_ACCESS_KEY_ID=<access key ID>
export AWS_SECRET_ACCESS_KEY=<secret access key>
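
After filling in the placeholders, load the variables into your current shell session and, if you like, run a quick sanity check with standard shell tools (avoiding the secret values):

# Load the variables into the current shell session
source .env
# Confirm that the non-secret variables are set
printenv MANTIK_USERNAME MANTIK_API_URL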

Upload data to remote data storage

Examples for JUWELS

In order to upload data and bookkeep it through Mantik, we use the Mantik client. For a detailed description of how to set up and use the client, have a look at Remote File Service. Here, we first upload an image file and then upload zipped images to JUWELS via

mantik unicore-file-service copy-file --connection-id=<connection ID> ./image1.jpeg remote:/p/home/jusers/<JUWELS user account>/juwels/image1.jpeg

and

mantik unicore-file-service copy-file --connection-id=<connection ID> ./imageArchiv1.zip remote:/p/home/jusers/<JUWELS user account>/juwels/imageArchiv1.zip

Furthermore, the Mantik remote file service allows the upload of directories and their contents. Uploading them to JUWELS is done by calling

mantik unicore-file-service copy-directory --connection-id=<connection ID> ./imageDir1 remote:/p/home/jusers/<JUWELS user account>/juwels/imageDir1/
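
If several files are to be uploaded in one go, the copy-file command can also be wrapped in an ordinary shell loop. The following is only a sketch built from standard shell features and the command shown above; the connection ID and the remote home path are the same placeholders as before:

# Upload every top-level JPEG file in the current directory to the JUWELS home directory
for f in ./*.jpeg; do
    mantik unicore-file-service copy-file --connection-id=<connection ID> \
        "$f" remote:/p/home/jusers/<JUWELS user account>/juwels/"$(basename "$f")"
done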

Examples for AWS S3 bucket

The same can be done with access to an AWS S3 bucket. Here, we have credentials for an S3 bucket named data-tutorial-bucket. In the case of S3, the steps detailed in the previous section for the JUWELS cluster become

mantik s3-file-service copy-file ./image2.jpeg s3://data-tutorial-bucket/image2.jpeg

mantik s3-file-service copy-file ./image3.jpeg s3://data-tutorial-bucket/image3.jpeg

mantik s3-file-service copy-directory ./imageDir s3://data-tutorial-bucket/imageDir/

mantik s3-file-service copy-file ./imageArchiv2.zip s3://data-tutorial-bucket/imageArchiv2.zip

For the S3 bucket, we can verify that the transfer of the data has been successful by checking the bucket's contents on the AWS web console.
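
Alternatively, if the AWS CLI is installed and configured with the same credentials, the uploaded objects can be listed directly from the terminal; this is an optional check and not part of the Mantik client itself:

# List all objects in the tutorial bucket, including those under imageDir/
aws s3 ls s3://data-tutorial-bucket/ --recursive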

Download from remote data storage

Similar to uploading files to the remote storage, downloading them from there is equally simple. For this, the source and destination paths of the commands used for uploading (see the section above) have to be swapped. As an example, we download the zipped file imageArchiv1.zip from the remote storage on JUWELS to the local subdirectory download. We use

mantik unicore-file-service copy-file --connection-id=<connection ID> remote:/p/home/jusers/<JUWELS user account>/juwels/imageArchiv1.zip ./download/imageArchiv1.zip

Looking into our image folder, we now find the new subdirectory download/ containing the zip file:

my-images
├── download
│   └── imageArchiv1.zip
├── imageDir
│   ├── image4.jpeg
│   └── index5.jpeg
.
.
.
└── .env
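
As a quick sanity check that the archive arrived intact, its contents can be listed with standard tools, for example:

# Show the files contained in the downloaded archive without extracting it
unzip -l ./download/imageArchiv1.zip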

Delete from remote storage

Finally, it is also possible to delete files and directories in remote storage systems. In the following, we will delete data from the storage on JUWELS and the S3 bucket. First, we delete the zipped archive on JUWELS that we have downloaded as described in the section above.

mantik unicore-file-service remove-file --connection-id=<connection ID> --name=remote:/p/home/jusers/<JUWELS user account>/juwels/imageArchiv1.zip remote:/p/home/jusers/<JUWELS user account>/juwels/imageArchiv1.zip

Lastly, it is also possible to remove a directory. The following command shows how to do this in an S3 bucket.

mantik s3-file-service remove-directory --name=s3://data-tutorial-bucket/imageDir/ s3://data-tutorial-bucket/imageDir/
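
If the AWS CLI is set up as in the upload section, you can confirm that the directory is gone by listing the bucket again; this is again an optional check outside of Mantik, and the imageDir/ prefix should no longer appear:

# imageDir/ should no longer show up in the listing
aws s3 ls s3://data-tutorial-bucket/ --recursive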