Deploying and Monitoring Deep Learning Models on Cloud Pak for Data

Carolyn Saplicki
IBM Data Science in Practice
7 min read · Mar 21, 2023


By Courtney Branson, Advisory Data Scientist, and Carolyn Saplicki, Senior Data Scientist

Today, many businesses create AI models to aid in prediction. However, not all businesses have the tools or the understanding of Machine Learning Operations (MLOps) needed to productionize an AI model. In fact, VentureBeat found that 87% of data science projects never make it into production. Research teams often do not consider the operationalization of models during creation. In this blog, we go over a use case of an AI-assisted image recognition process that consisted of many deep learning models. The use case discussed here focuses on health care; however, it could apply to any deep learning image recognition model in industries such as bio-pharmaceuticals, oil and gas, manufacturing, agriculture, and more. We will highlight an architecture structure that allows for simple deployments and flexible orchestration in Cloud Pak for Data 4.5.2 (CPD) using Watson Studio, Watson Machine Learning (WML), Watson Machine Learning Accelerator (WMLA), and Watson OpenScale.

Project Description

Our client created an AI-assisted workflow for a problem in health care. In this problem, AI models aid the health care persona in deciding the diagnosis (class) of a certain condition. The steps are as follows:

Pre-Processing:

An image would be taken of the patient’s condition and then pre-processed into multiple different resolutions. Each resolution consisted of sections that together make up the pre-processed image. The client utilized complex packages and created their own extensions for pre-processing, so to pre-process images on CPD in Watson Studio, a custom image was created and employed so that we could use these packages.
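For illustration, here is a minimal sketch of such a tiling step, assuming PIL and non-overlapping square sections. The resolutions, tile size, and library are stand-ins, not the client’s actual pre-processing packages:

from PIL import Image

def tile_image(path, resolutions=(512, 1024), tile_size=128):
    # Resize the source image to each working resolution, then cut it into
    # non-overlapping tile_size x tile_size sections.
    sections = {}
    img = Image.open(path)
    for res in resolutions:
        resized = img.resize((res, res))
        tiles = [resized.crop((left, top, left + tile_size, top + tile_size))
                 for top in range(0, res, tile_size)
                 for left in range(0, res, tile_size)]
        sections[res] = tiles  # e.g. a 1024x1024 image yields 64 tiles of 128x128
    return sections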

Neural Network Models:

After an image is pre-processed, it is used in a Neural Network model. Each Neural Network loops over each section in the specific pre-processed image and outputs an array of prediction probabilities for each class, resulting in a matrix. For instance, if an image was pre-processed into 100 sections, the Neural Network corresponding to that resolution would output a matrix of 100 rows x n classes.
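Conceptually, the per-resolution inference looks like the sketch below, where model.predict() is a stand-in for the real network call:

import numpy as np

def predict_sections(model, sections):
    # One probability row per section; stacking gives (num_sections, n_classes).
    rows = [model.predict(section) for section in sections]
    return np.vstack(rows)  # e.g. 100 sections -> a 100 x n_classes matrix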

Each resolution level had multiple Neural Network models with different numbers of classes for prediction. The models had varying numbers of classes because they were built for different purposes, such as additional research, testing, or diagnosis. Only one set of models with the same number of classes would be used to generate a final prediction; a parameter in the payload controlled which set of models was called.

Aggregation Methods:

A single deep learning model produces a matrix of prediction probabilities of each class for each section of the original image. This matrix is aggregated across all sections by any number of unique functions, resulting in a final array of features. For instance, if the final matrix was 100 sections x n classes and we used 3 aggregation functions, this would reduce the matrix to an array of 3n x 1.
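A minimal sketch of this reduction, using three common NumPy reductions as stand-ins for the client’s actual aggregation functions:

import numpy as np

AGGREGATIONS = [np.mean, np.max, np.std]  # illustrative; the client's functions differ

def aggregate(matrix):
    # Each function collapses the section axis into n_classes values, so k
    # functions on a (num_sections, n_classes) matrix yield a k*n feature array.
    return np.concatenate([f(matrix, axis=0) for f in AGGREGATIONS])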

Random Forest Model:

For a specific image, the final aggregated arrays from each Neural Network are concatenated. This concatenated array is the input to the final Random Forest model, whose output is the final predicted class and the associated prediction probabilities for each class.
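A sketch of this final step, assuming a scikit-learn Random Forest and the aggregated arrays from the previous step:

import numpy as np

def final_prediction(rf_model, aggregated_arrays):
    # Concatenate the per-network feature arrays into a single row vector,
    # then score it with the Random Forest.
    features = np.concatenate(aggregated_arrays).reshape(1, -1)
    return rf_model.predict(features)[0], rf_model.predict_proba(features)[0]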

Note: only one Random Forest model is utilized for a final prediction. Currently, the production version of this system focuses on a specific number of classes, and the Random Forest model has the same number of classes as one set of the Neural Network models. When a prediction shows low confidence, the Neural Networks with more classes can be invoked to understand the prediction in more detail. This will be useful in future work, specifically the creation of a final model with more classes.

Architecture Structure

WMLA Model Deployments

Watson Machine Learning Accelerator (WMLA) is a deep learning service on Cloud Pak for Data. WMLA provides large model support that increases the amount of memory available for deep learning models (up to 16 GB or 32 GB per network layer), enabling more complex models and data inputs. WMLA also dynamically allocates both CPUs and GPUs.

We created a deployment in WMLA for each image resolution. Each deployment had multiple Neural Network models supporting the different model uses: research, testing, and/or diagnosis. Furthermore, each deployment also included the aggregation functions and the ability to save the matrix before aggregation. The deployment’s payload request specifies the image to be predicted upon, the path to the image, which model to utilize, and (optionally) whether to save the intermediate output.
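A hypothetical payload request might look like the following; the field names and endpoint URL are illustrative, not the deployment’s actual schema:

import requests

inference_url = "https://<wmla-host>/<inference-endpoint-path>"  # placeholder endpoint
payload = {
    "image_name": "patient_scan_001.png",        # illustrative image name
    "image_path": "/mymount/path1/images/patient_scan_001.png",
    "model": "diagnosis",        # which set of models to use
    "save_intermediate": True,   # optionally persist the pre-aggregation matrix
}
response = requests.post(inference_url, json=payload, verify=False)
print(response.json())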

Creating a WMLA Inference Service:

WMLA uses a collection of deployment files to initialize and run an inference service. For our deployment, these files were kernel.py, which contains all the code we need to run within the deployment; model.json, which contains all the configuration parameters; and a README.

Multiple Models:

Multiple models are loaded in the on_kernel_start() method within our kernel.py. This method is called only once, when the kernel is initialized, so loading the models there allows for quicker inference when the endpoint is called. Within our on_task_invoke() method, we are then able to direct each inference request to the model specified in the payload request.
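The overall shape of such a kernel.py is sketched below, based on our understanding of the WMLA inference kernel interface. The load_model() and run_inference() calls are hypothetical helpers, and the model names and paths are illustrative:

import json
from redhareapi import Kernel  # kernel base class provided by the WMLA inference runtime

class MultiModelKernel(Kernel):
    def on_kernel_start(self, kernel_context):
        # Called once at kernel startup: load every model up front so
        # requests are served from memory. load_model() is a stand-in.
        self.models = {name: load_model(f"/mymount/path1/{name}.pt")
                       for name in ("research", "testing", "diagnosis")}

    def on_task_invoke(self, task_context):
        # Route each request to the model named in the payload.
        payload = json.loads(task_context.get_input_data())
        model = self.models[payload["model"]]
        result = run_inference(model, payload["image_path"])  # stand-in inference call
        task_context.set_output_data(json.dumps(result))

if __name__ == "__main__":
    MultiModelKernel().run()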

Storage Volume:

Both the original models and pre-processed images are pulled from an external storage volume. This storage volume can easily be connected to WMLA through a model.json with attributes as seen below:

{
  "name": "example",
  "tag": "example",
  "weight_path": "N/A",
  "runtime": "python",
  "kernel_path": "kernel.py",
  "attributes": [
    { "key": "cpd-volumes-cpd::volume1", "value": "/mymount/path1" },
    { "key": "cpd-volumes-cpd::volume2", "value": "/mymount/path2" }
  ],
  "schema_version": "1"
}

A storage volume allows for controllable read/write access, persistent storage, and sharing across projects. This means that the data in this volume can be mounted to JupyterLab environments as well as to our WMLA deployment space. This, in addition to the volume APIs, made storage volumes easy to use across all phases of our MLOps pipeline.
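For example, files can be pushed to a volume over REST. The sketch below assumes the CPD volumes files API, with host, token, and paths as placeholders; check your CPD version’s API reference for the exact endpoint and form field names:

import requests

host = "https://<cpd-host>"                      # placeholder CPD host
headers = {"Authorization": "Bearer <token>"}    # placeholder bearer token

# Hypothetical upload of an image into the volume's images/ directory.
with open("image_001.png", "rb") as f:
    resp = requests.put(
        f"{host}/zen-volumes/volume1/v1/volumes/files/images%2Fimage_001.png",
        headers=headers,
        files={"upFile": f},
        verify=False,
    )
print(resp.status_code)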

As of WMLA 2.4, you are not able to connect to multiple storage volumes. However, plans to allow for this functionality are on the roadmap for future versions of WMLA.

Optional Save:

After discussion with the lead scientist, one important feature we added to the architecture structure is the ability to save the matrix produced by the Neural Network models. This matrix contains the probabilities of each class for every section within the pre-processed image, and it will be useful in future work and in understanding predictions.

Optional Asset Save to Project Data Assets

The matrix is saved to the project’s data assets, and whether to save it is an optional input to the deployments.
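A minimal sketch of such a save from a notebook, assuming ibm-watson-studio-lib (available in CPD 4.5 runtimes) and an illustrative asset name:

import io
import numpy as np
from ibm_watson_studio_lib import access_project_or_space

wslib = access_project_or_space()
matrix = np.zeros((100, 5))  # stand-in for the real (num_sections, n_classes) matrix
buffer = io.BytesIO()
np.save(buffer, matrix)
# Store the serialized matrix as a project data asset.
wslib.save_data("image_001_matrix.npy", buffer.getvalue(), overwrite=True)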

WML Deployment

Once the deep learning models were deployed within WMLA, we could focus on the WML deployment of the Random Forest model. This was a standard WML deployment: we trained the model within JupyterLab in Watson Studio and deployed it from the same notebook using the WML Python SDK. The deployment’s payload request takes the concatenated output of the deep learning models deployed in WMLA, and the deployment returns the final prediction.
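The flow looked roughly like the sketch below. Credentials, the space ID, and the software spec name are placeholders, and rf_model and concatenated_features are assumed to come from earlier steps in the notebook:

from ibm_watson_machine_learning import APIClient

client = APIClient({"url": "https://<cpd-host>", "token": "<token>",
                    "instance_id": "openshift", "version": "4.5"})
client.set.default_space("<space-id>")

# Store the trained scikit-learn Random Forest in the WML repository.
sw_spec_uid = client.software_specifications.get_uid_by_name("runtime-22.1-py3.9")
model_details = client.repository.store_model(
    model=rf_model,
    meta_props={
        client.repository.ModelMetaNames.NAME: "random-forest-final",
        client.repository.ModelMetaNames.TYPE: "scikit-learn_1.0",
        client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid,
    })

# Create an online deployment and score it with the concatenated features.
deployment = client.deployments.create(
    client.repository.get_model_uid(model_details),
    meta_props={client.deployments.ConfigurationMetaNames.NAME: "rf-online",
                client.deployments.ConfigurationMetaNames.ONLINE: {}})
payload = {"input_data": [{"values": [concatenated_features.tolist()]}]}
print(client.deployments.score(client.deployments.get_uid(deployment), payload))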

Watson OpenScale

Once the Random Forest model was successfully deployed to WML, we configured Watson OpenScale to monitor the deployment, specifically with the quality, drift, and explainability monitors. An example of how to configure these monitors can be found here.
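For orientation, here is a minimal sketch of creating one such monitor with the Watson OpenScale Python SDK; the host, credentials, and IDs are placeholders, and the subscription for the WML deployment is assumed to already exist:

from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import Target
from ibm_watson_openscale.supporting_classes.enums import TargetTypes

authenticator = CloudPakForDataAuthenticator(
    url="https://<cpd-host>", username="<user>", password="<password>",
    disable_ssl_verification=True)
wos_client = APIClient(service_url="https://<cpd-host>", authenticator=authenticator)

# Attach a quality monitor to the existing subscription for the deployment.
wos_client.monitor_instances.create(
    data_mart_id="<data-mart-id>",
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=Target(target_type=TargetTypes.SUBSCRIPTION, target_id="<subscription-id>"),
    parameters={"min_feedback_data_size": 50})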

In this project, OpenScale did not directly monitor the deep learning models deployed within WMLA. However, the feature columns of the Random Forest model were the aggregated probabilities of each class from the deep learning models, which allowed the client to glean insights into what was happening within the complex deep learning models without setting up direct monitoring.

Our health care client appreciated the explainability aspect of Watson OpenScale, as their models are complex. The ability to explain predictions using an algorithm largely based on LIME allowed them to consider future research areas. Furthermore, contrastive explanations helped them quantitatively understand which areas were the most impactful and by how much. This allowed them to better understand the nuance in the models’ decision-making process and to see trends between diagnoses for their use case.

LIME Analysis
What-If Analysis and Contrastive Explanations
