Data Management with Mantik
The aim of this tutorial is to provide a step-by-step guide to the Mantik workflow for managing data on remote file systems.
You will learn how to use the Mantik remote file service to:
upload data,
download data,
remove data.
For this tutorial, we assume that you know how to set up a Mantik project; see Set Up Mantik Account for an introduction.
To follow this tutorial, you need credentials for one of the compute facilities (JUWELS, JURECA, JUSUF) at the Jülich Supercomputing Centre or for an AWS S3 bucket.
Bookkeeping with Mantik works with virtually any kind of data. In the following, we use images and zipped files that have been organized in subdirectories of the my-images
directory, as shown below. For the rest of this tutorial, we assume that the user's current working directory is /my-images
├── imageDir
│   ├── image4.jpeg
│   └── index5.jpeg
├── imageDir1
│   ├── image6.jpeg
│   └── index7.jpeg
├── image1.jpeg
├── image2.jpeg
├── image3.jpeg
└── .env
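To try the commands without real data, the image part of the layout above can be recreated with empty placeholder files. The following is a minimal sketch; `make_tutorial_layout` is a hypothetical helper, not part of Mantik, and it creates neither the zipped archive (its name is not given in this tutorial) nor the .env file described next.

```shell
# Hypothetical helper (not part of Mantik): recreate the example layout
# with empty placeholder files so the commands can be tried locally.
make_tutorial_layout() {
  root="${1:-my-images}"
  mkdir -p "$root/imageDir" "$root/imageDir1" || return 1
  for f in image1.jpeg image2.jpeg image3.jpeg \
           imageDir/image4.jpeg imageDir/index5.jpeg \
           imageDir1/image6.jpeg imageDir1/index7.jpeg; do
    touch "$root/$f" || return 1
  done
}
```

Calling `make_tutorial_layout && cd my-images` creates the directories in the current location and switches into them.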
To use the remote file service, several environment variables have to be set. For details, have a look at the documentation for the Remote File Service. We defined the variables in .env
and sourced it with source .env
(see the next section for a sample file).
Example .env
# .env
# Required environment variables for tracking to the Mantik MLflow server
export MANTIK_USERNAME=<user>
export MANTIK_PASSWORD='<password>'
# Optional environment variables
# Required environment variables for remote execution and data management on HPC via UNICORE.
export MANTIK_COMPUTE_BUDGET_ACCOUNT=<compute project>
# Required environment variables for S3 file service
export AWS_ACCESS_KEY_ID=<access key ID>
export AWS_SECRET_ACCESS_KEY=<secret access key>
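Because a missing variable typically only shows up once a transfer fails, it can help to check the variables right after sourcing .env. The function below is a hypothetical sketch, not a Mantik feature; it only reports which of the listed names are unset or empty, and which names you pass depends on the services you use.

```shell
# Hypothetical helper: report every listed variable that is unset or empty.
# Returns 0 if all are set, 1 otherwise. Pass variable names only.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that also works in plain POSIX sh (no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `source .env && check_required_vars MANTIK_USERNAME MANTIK_PASSWORD AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY` prints one line per missing variable.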
Upload data to remote data storage
Examples for JUWELS
To upload data and bookkeep it through Mantik, we use the Mantik client. For a detailed description of how to set up and use the client, have a look at Remote File Service. Here, we first upload an image file and then zipped images to JUWELS with
mantik unicore-file-service copy-file --connection-id=<connection ID> ./image1.jpeg remote:/p/home/jusers/<JUWELS user account>/juwels/image1.jpeg
mantik unicore-file-service copy-file --connection-id=<connection ID> ./ remote:/p/home/jusers/<JUWELS user account>/juwels/
Furthermore, the Mantik remote file service allows uploading directories together with their content. Uploading a directory to JUWELS is done by calling
mantik unicore-file-service copy-directory --connection-id=<connection ID> ./imageDir1 remote:/p/home/jusers/<JUWELS user account>/juwels/imageDir1/
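When several individual files have to be uploaded, the copy-file call shown above can be repeated in a loop. This is a minimal sketch under stated assumptions: `upload_jpegs_to_juwels`, `CONNECTION_ID`, `JUWELS_USER` and the `MANTIK_CMD` override are hypothetical names introduced here, not Mantik settings; the loop only reuses the copy-file invocation from this tutorial.

```shell
# Hypothetical helper: upload every *.jpeg in the current directory to the
# JUWELS home directory with one `copy-file` call per file.
# CONNECTION_ID and JUWELS_USER are placeholders you set yourself.
# MANTIK_CMD is a hypothetical override (not a Mantik setting):
# MANTIK_CMD=echo prints the commands instead of running them.
upload_jpegs_to_juwels() {
  for file in ./*.jpeg; do
    [ -e "$file" ] || continue  # glob matched nothing
    name="$(basename "$file")"
    "${MANTIK_CMD:-mantik}" unicore-file-service copy-file \
      --connection-id="$CONNECTION_ID" \
      "$file" "remote:/p/home/jusers/$JUWELS_USER/juwels/$name" || return 1
  done
}
```

Running it once with `MANTIK_CMD=echo` is a cheap way to check the generated remote paths before transferring anything. For whole directories, copy-directory as shown above remains the simpler choice.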
Examples for AWS S3 bucket
The same can be done with an AWS S3 bucket. Here, we have the credentials for an S3 bucket named "data-tutorial-bucket". For S3, the steps described in the previous section for the JUWELS cluster become
mantik s3-file-service copy-file ./image2.jpeg s3://data-tutorial-bucket/image2.jpeg
mantik s3-file-service copy-file ./image3.jpeg s3://data-tutorial-bucket/image3.jpeg
mantik s3-file-service copy-directory ./imageDir s3://data-tutorial-bucket/imageDir/
mantik s3-file-service copy-file ./ s3://data-tutorial-bucket/
For the S3 bucket, we can verify that the data has been transferred successfully by checking the bucket contents in the AWS web console.
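As an alternative to the web console, and assuming the AWS CLI is installed, the bucket contents can be listed from the terminal. The AWS CLI reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment, so sourcing .env is sufficient. The function name and the `AWS_CMD` override are hypothetical.

```shell
# Assumes the AWS CLI is installed; it picks up AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY from the environment sourced from .env.
# AWS_CMD is a hypothetical override: AWS_CMD=echo prints the command.
list_bucket_contents() {
  bucket="${1:-data-tutorial-bucket}"
  "${AWS_CMD:-aws}" s3 ls "s3://$bucket/" --recursive
}
```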
Download from remote data storage
Downloading files from the remote storage is just as simple as uploading them. For this, the order of the source and target paths in the upload commands (see the section above) has to be reversed. As an example, we download the zipped file
from the remote storage on JUWELS into the local subdirectory download/
using
mantik unicore-file-service copy-file --connection-id=<connection ID> remote:/p/home/jusers/<JUWELS user account>/juwels/ ./download/
Looking in our image folder again, we now find the new subdirectory download/
containing the zip file:
├── download
│   └──
├── imageDir
│   ├── image4.jpeg
│   └── index5.jpeg
└── .env
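If the download step is scripted, a small wrapper can make sure the local target directory exists before copy-file runs, since this tutorial does not say whether copy-file creates it. This is a hypothetical sketch: `download_from_juwels`, `CONNECTION_ID` and `MANTIK_CMD` are illustrative names, and the remote path must be filled in with your own account and file name.

```shell
# Hypothetical wrapper around the copy-file download command shown above.
# It creates the local target directory first. CONNECTION_ID is a
# placeholder; MANTIK_CMD=echo prints the command instead of running it.
download_from_juwels() {
  remote_path="$1"               # e.g. /p/home/jusers/<user>/juwels/<file>
  local_dir="${2:-./download}"
  mkdir -p "$local_dir" || return 1
  "${MANTIK_CMD:-mantik}" unicore-file-service copy-file \
    --connection-id="$CONNECTION_ID" \
    "remote:$remote_path" "$local_dir/"
}
```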
Delete from remote storage
Finally, it is also possible to delete files and directories in remote storage systems. In the following, we will delete data from the storage on JUWELS and the S3 bucket. First, we delete the zipped archive on JUWELS that we have downloaded as described in the section above.
mantik unicore-file-service remove-file --connection-id=<connection ID> --name=remote:/p/home/jusers/<JUWELS user account>/juwels/ remote:/p/home/jusers/<JUWELS user account>/juwels/
Lastly, it is also possible to remove a directory. The following command shows how to do this in an S3 bucket.
mantik s3-file-service remove-directory --name=s3://data-tutorial-bucket/imageDir/ s3://data-tutorial-bucket/imageDir/
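Since deleting a remote directory cannot be undone from the local side, scripted deletions can ask for confirmation first. The function below is a hypothetical safeguard, not a Mantik feature; it reuses the remove-directory invocation exactly as shown above, and `MANTIK_CMD` is again an illustrative override.

```shell
# Hypothetical safeguard: ask for confirmation before deleting a directory
# in the S3 bucket, reusing the remove-directory call from this tutorial.
# MANTIK_CMD=echo prints the command instead of running it.
remove_s3_directory_confirmed() {
  target="$1"
  printf 'Really delete %s? [y/N] ' "$target" >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) ;;
    *) echo "Aborted." >&2; return 1 ;;
  esac
  "${MANTIK_CMD:-mantik}" s3-file-service remove-directory \
    --name="$target" "$target"
}
```

Usage: `remove_s3_directory_confirmed s3://data-tutorial-bucket/imageDir/`; any answer other than y or yes aborts without contacting the bucket.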