How to Get the Current Run ID Using MLflow in Azure Databricks

In this blog post, you will learn how easy it is to get the run ID in your current Azure Databricks notebook. This is especially helpful if you want to use that run ID for deployment within the same notebook you're working in.

FTF (First Things First)

Install MLflow

What is MLflow?

“MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components: MLflow Tracking, MLflow Projects, MLflow Models & Model Registry” (read more at MLflow.org)

In this post, we will only cover basic MLflow usage: logging a model, saving it, and tracking the run.

Setup

You can install MLflow as a workspace library or with a Python command. In this example, we're going to install it from a Python notebook.

dbutils.library.installPyPI("mlflow")
dbutils.library.restartPython()

And we import it.

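# Import mlflow
import mlflow
import mlflow.sklearn

# Imports used by the training function below
import json
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# import for unique folder naming
import uuid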

Basic Usage

# create unique folder name
unique_folder = str(uuid.uuid1())

def train_and_log_model(data):
    # Split dataset into training set and test set
    x_train, x_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=109
    )

    # Start an MLflow run; the "with" keyword ensures we'll close the run even if this cell crashes
    with mlflow.start_run():
        model = GaussianNB()

        # train the model
        model.fit(x_train, y_train)

        # Predict on the test set (not used further in this example)
        preds = model.predict(x_test)

        # Log the model to the run, then save a copy under a unique DBFS folder
        mlflow.sklearn.log_model(model, "model")
        modelpath = "/dbfs/mlflow/" + unique_folder + "/model"
        mlflow.sklearn.save_model(model, modelpath)

        # Fetch the active run and read the run ID from the log-model history tag
        client = mlflow.tracking.MlflowClient()
        active_run = client.get_run(mlflow.active_run().info.run_id).data
        log_model_history_json = json.loads(active_run.tags['mlflow.log-model.history'])
        log_model_history_data = log_model_history_json[0]
        return log_model_history_data['run_id']

Use it

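# Assumption: load_wine is only an illustrative dataset; any object with .data and .target works
from sklearn.datasets import load_wine
data = load_wine()

run_id = train_and_log_model(data)  # call to run
print(run_id)
# example result: 1ccc2a132124468e91e4ea636b6f4023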
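
The function above reads the run ID out of the `mlflow.log-model.history` tag, which holds a JSON list with one entry per logged model. Here is a minimal sketch of just that parsing step, using only the standard library. The helper name `run_id_from_log_model_history` and the sample tag value are hypothetical and only illustrate the shape of the data; the tag may be missing depending on your MLflow version, so the helper returns `None` in that case instead of raising.

```python
import json


def run_id_from_log_model_history(tag_value):
    """Return the run_id of the first logged model, or None if unavailable."""
    if not tag_value:
        # Tag missing or empty: nothing to parse
        return None
    history = json.loads(tag_value)
    if not history:
        # Tag present but no model has been logged
        return None
    return history[0].get("run_id")


# Hypothetical tag value, shaped like the one read in train_and_log_model
sample_tag = json.dumps(
    [{"run_id": "1ccc2a132124468e91e4ea636b6f4023", "artifact_path": "model"}]
)
print(run_id_from_log_model_history(sample_tag))
```

Note that the function above already passes `mlflow.active_run().info.run_id` to `get_run`, so inside the `with` block that value should be the same run ID without any tag parsing.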

You can view the Run tab to see the logged model.


In the code above, after saving the model, we use the mlflow.tracking.MlflowClient class. What is mlflow.tracking? It is a module that provides a Python CRUD interface to MLflow experiments and runs. MlflowClient has many methods, but the one we need is get_run(run_id): it fetches the run from the backend store, and the returned run contains the run metadata as well as the run's parameters, tags, and metrics.
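
To make the shape of that returned run concrete, here is a small sketch that flattens it into a plain dictionary. The helper `summarize_run` is hypothetical. It relies on the `info` (metadata) and `data` (params, metrics, tags) attributes of the run object returned by `get_run`. The stand-in client and its values are made up so the sketch runs without a tracking server.

```python
from types import SimpleNamespace


def summarize_run(client, run_id):
    """Collect the metadata and logged data of a run into a plain dict."""
    run = client.get_run(run_id)
    return {
        "run_id": run.info.run_id,      # run metadata lives under .info
        "status": run.info.status,
        "params": dict(run.data.params),    # logged data lives under .data
        "metrics": dict(run.data.metrics),
        "tags": dict(run.data.tags),
    }


# Stand-in client with made-up values, used only to exercise the helper
fake_run = SimpleNamespace(
    info=SimpleNamespace(run_id="abc123", status="FINISHED"),
    data=SimpleNamespace(
        params={"test_size": "0.3"}, metrics={"accuracy": 0.9}, tags={}
    ),
)
fake_client = SimpleNamespace(get_run=lambda run_id: fake_run)

summary = summarize_run(fake_client, "abc123")
print(summary)
```

In a notebook, you would presumably pass the real client instead, for example `summarize_run(mlflow.tracking.MlflowClient(), run_id)`.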

I would also advise logging more tracking information with MLflow, such as parameters, metrics, tags, and even notes, when saving your model.
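
As a rough sketch of what that extra logging could look like, the hypothetical helper below groups the calls in one place. It takes the tracking object as an argument: in a notebook that would be the `mlflow` module itself, whose `log_params`, `log_metrics`, `set_tags`, and `set_tag` functions it calls. The note is stored under the `mlflow.note.content` tag. The `RecordingTracker` class and all values are made up so the sketch runs on its own.

```python
def log_run_details(tracker, params, metrics, tags, note=None):
    """Log parameters, metrics, tags, and an optional note to the active run."""
    tracker.log_params(params)
    tracker.log_metrics(metrics)
    tracker.set_tags(tags)
    if note is not None:
        # The run note (description) is stored as a tag
        tracker.set_tag("mlflow.note.content", note)


class RecordingTracker:
    """Stand-in for the mlflow module that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def log_params(self, params):
        self.calls.append(("log_params", params))

    def log_metrics(self, metrics):
        self.calls.append(("log_metrics", metrics))

    def set_tags(self, tags):
        self.calls.append(("set_tags", tags))

    def set_tag(self, key, value):
        self.calls.append(("set_tag", key, value))


tracker = RecordingTracker()
log_run_details(
    tracker,
    params={"test_size": 0.3, "random_state": 109},
    metrics={"accuracy": 0.9},  # made-up value for illustration
    tags={"model_type": "GaussianNB"},
    note="Baseline Naive Bayes model",
)
print(tracker.calls)
```

Inside the training function, the equivalent call would be `log_run_details(mlflow, ...)` within the `with mlflow.start_run():` block, so everything is attached to the same run as the model.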

Learn more from the MLflow documentation.

If you have any questions or comments, please drop them below 👇 :)
