How To Improve AI Model Robustness in the Last Mile

ODSC - Open Data Science
Apr 20, 2023 · 6 min read

Artificial intelligence (AI) and machine learning (ML) have rapidly become key drivers of business transformation. They can improve operations, reduce costs, enhance customer experience, and drive revenue growth.

However, the latest innovations in ML and AI have made one thing clear: AI does not work in isolation.

As AI builders, we must recognize that new ML models impact people, processes, and other technology, especially in the last mile, or final stages, of AI design. Companies that fail to do so are likely to fall behind competitors that do.

Despite their incredible capabilities, ML models are not infallible, and companies building AI must consider ways to effectively detect performance changes and deviations in the final stages of development.

This is especially important in this new wave of generative AI, including large language models. We must understand what these models can and can’t do, and what risks they pose, so that we can develop meaningful ways to measure performance. By doing so, companies can maximize the potential of AI and ML to drive success in their organizations.

What Is the Last Mile in AI?

In general, machine learning engineers and data scientists use the term “last mile” to describe the process of preparing an AI solution for broad use: completing the steps necessary to make the solution ready for adoption by a wide range of users.

The last mile of AI includes:

  • Training and educating team members on using the model to make predictions or decisions.
  • Building trust in the model among everyone affected by it.
  • Detecting and resolving errors.
  • Measuring the model’s performance in real-world applications.
  • Fine-tuning the model as the environment shifts.
  • Building scalable infrastructure to support the model.
  • Taking appropriate privacy and security measures to keep the model secure and robust.

How the Last Mile Fits Into the Design & Development Process

Realistically, designing ML models is an iterative development and testing effort. However, this conventional wisdom can trap businesses in a never-ending cycle of designing models (the left side of the diagram below) that seldom make it to development (the right side of the diagram).

[Diagram: the iterative loop between AI design (left) and AI development (right). Source: Pandata]

In other instances, organizations proceed to pilot their models and run into unexpected challenges in the last mile of development. While organizations face unique challenges at every stage of AI design and development, they often encounter the most friction in the last mile. This is especially true in high-risk industries like energy, healthcare, and finance, where the cost of errors is higher and the need for human oversight is greater.

Model Robustness Challenges in the Last Mile

Model robustness refers to the ability of a machine learning model to maintain its performance and accuracy even when faced with unexpected or adversarial data inputs. Achieving model robustness is important for ensuring the reliability and effectiveness of machine learning models in real-world applications, where they may encounter a variety of unexpected scenarios and challenges.

In the last mile of AI, model robustness becomes increasingly critical. A model that performed well in the development and testing phases may not necessarily remain robust in the face of new data or conditions.

There are several common challenges that companies may encounter when striving to achieve model robustness in the last mile of AI. Here are just a few:

  • Data quality. In production, machine learning models may encounter data that differs from the training data, such as missing values, noise, or outliers. Ensuring data quality and consistency is critical to maintaining model robustness.
  • Concept drift. The distribution of data may change over time, causing a model to become less effective as it struggles to adapt to new input patterns. Continuously monitoring the model’s performance and updating the training data can help address concept drift (see the drift-check sketch after this list).
  • Model complexity and generalization. Because we have limited tools to assess how representative the data is for complex (think: large language) or multimodal (think: text and image) models, over- or underfitting may occur without being noticed.
  • Interpretability. A lack of interpretability can make it difficult to diagnose and address issues that arise in the last mile. Ensuring model interpretability makes problems easier to identify.
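
As a concrete illustration of drift monitoring, here is a minimal sketch that compares a production feature’s distribution against its training baseline using a two-sample Kolmogorov–Smirnov test from SciPy. The feature values and the alpha threshold are hypothetical placeholders, not a prescription.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly from training."""
    _, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

# Hypothetical check: compare recent production values to the training baseline.
rng = np.random.default_rng(42)
train_values = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature
live_values = rng.normal(loc=0.4, scale=1.2, size=1_000)   # shifted production feature

if feature_drifted(train_values, live_values):
    print("Possible drift detected: review the model and refresh the training data.")
```

In practice, a check like this would run per feature on a schedule, with results feeding the monitoring and alerting systems described below.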

By anticipating these challenges and implementing strategies to mitigate them, companies can improve the robustness of their machine learning models in the last mile of AI.

How To Improve Model Robustness

Machine learning engineers must take steps to ensure that a model remains robust in production. This may involve monitoring the model’s performance and accuracy in real time, incorporating feedback from end users, and continuously refining the model through techniques such as retraining or adjusting hyperparameters. Here are a few specific ways to improve model robustness in the last mile.

Set Up Model Management and Alerts

To ensure that machine learning models remain effective in production, companies can set up model management and alerts to detect shifts in trends or a drop in performance.

For example, we worked with a client to build a topic model to categorize all channels of customer feedback. Categorization models typically carry some error rate; in this case, accuracy was in the high 90s, which is fine for statistical reporting.

However, when a comment belonging to a certain topic gets forwarded for review, managers will quickly lose trust in a system that gets 1 in 10 wrong. So we applied a higher confidence threshold to the same model. This meant some items were not ‘scored’ at all, but it guaranteed that when a manager had an item forwarded, it was highly likely to be relevant, ultimately increasing trust in the AI-based solution.
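
A minimal sketch of this selective-scoring pattern, assuming a scikit-learn-style classifier exposing predict_proba; the 0.9 threshold and the -1 “unscored” sentinel are illustrative choices, not values from the actual project.

```python
import numpy as np

def score_with_threshold(model, items, threshold: float = 0.9):
    """Assign a topic only when the model is confident; mark low-confidence items unscored."""
    probs = model.predict_proba(items)      # shape: (n_items, n_topics)
    best_topic = probs.argmax(axis=1)       # most likely topic per item
    confidence = probs.max(axis=1)          # probability of that topic
    labels = np.where(confidence >= threshold, best_topic, -1)  # -1 = not scored
    return labels, confidence
```

Unscored items can still feed aggregate statistical reporting, while only high-confidence items are forwarded to managers.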

Because there are so many ways that models can break or drift, it’s important that teams set up the right systems of rules to know when to intervene. These management systems should be driven by the rate of change in the behavior being modeled, the value to the business, and the volume of data being consumed.
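
One simple form such a rule can take is a rolling-window accuracy check that fires an alert when performance on recently labeled items drops below an agreed floor. The window size and floor below are placeholder values to be tuned per use case.

```python
from collections import deque

class RollingAccuracyAlert:
    """Alert when rolling accuracy over recent labeled items falls below a floor."""

    def __init__(self, window: int = 500, floor: float = 0.92):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.floor = floor

    def record(self, correct: bool) -> bool:
        """Record one outcome; return True when the full window's accuracy < floor."""
        self.outcomes.append(int(correct))
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and sum(self.outcomes) / len(self.outcomes) < self.floor
```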

Create Secondary Models

Creating secondary models that use the output of an original model can help build advanced workflow logic.

For instance, one can build a model that predicts time to respond or customer churn based on the outputs of a topic model, and then create logic to flag a human decision maker when certain thresholds are crossed.
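
A hedged sketch of that pattern: a hypothetical secondary model consumes the topic model’s probability outputs plus a response-time feature, predicts churn risk, and routes high-risk cases to a human. All feature names, data, and the 0.7 cutoff are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training set: each row is the topic model's probability outputs
# plus hours-to-respond; labels mark whether the customer later churned.
topic_probs = rng.dirichlet(np.ones(5), size=1_000)            # first model's outputs
hours_to_respond = rng.exponential(scale=24.0, size=(1_000, 1))
X = np.hstack([topic_probs, hours_to_respond])
y = (rng.random(1_000) < 0.2).astype(int)                      # placeholder churn labels

churn_model = LogisticRegression(max_iter=1_000).fit(X, y)

def route(features: np.ndarray, cutoff: float = 0.7) -> str:
    """Escalate to a human decision maker when predicted churn risk crosses the cutoff."""
    risk = churn_model.predict_proba(features.reshape(1, -1))[0, 1]
    return "escalate_to_human" if risk >= cutoff else "auto_handle"
```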

We recently worked on a project to generate alternative text for visually impaired individuals. The original computer vision approach alone was not reliable, so we combined it with a large language model and background provenance of the images. This raised the challenge of determining how reliable the generated text was, so we had to create an additional model to measure the reliability of the newly generated text.

Generate Synthetic Data

Synthetic data can be used to test the robustness of machine learning models by creating counter-examples, i.e., examples that are intentionally designed to challenge the model’s performance.

By adding synthetic manipulations to the original data, such as noise, rotations, or occlusions, synthetic data can simulate real-world scenarios where the model is likely to fail or make errors. The model can then be trained on both the original and synthetic data to evaluate its performance and identify potential weaknesses.
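
As a minimal sketch of such manipulations on image-like arrays, the snippet below applies Gaussian noise, a small rotation via SciPy, and a square occlusion patch. The noise scale, rotation range, and patch size are arbitrary examples.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(7)

def perturb(image: np.ndarray) -> np.ndarray:
    """Build a synthetic counter-example: noise, a small rotation, and an occlusion."""
    noisy = image + rng.normal(scale=0.05, size=image.shape)
    rotated = rotate(noisy, angle=rng.uniform(-15, 15), reshape=False, mode="nearest")
    occluded = rotated.copy()
    x, y = rng.integers(0, image.shape[0] - 8, size=2)  # assumes images of at least 8x8
    occluded[x:x + 8, y:y + 8] = 0.0                    # black out an 8x8 patch
    return np.clip(occluded, 0.0, 1.0)

# Hypothetical usage on a batch of normalized grayscale images.
batch = rng.random((32, 28, 28))
augmented = np.stack([perturb(img) for img in batch])
```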

This approach can help improve the model’s generalization and reduce overfitting, making it more robust to new and unseen data. Additionally, synthetic data can be used to balance class distributions, generate rare events, or simulate data from different domains or distributions, which can be particularly useful when training data is limited or biased.
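
For the class-balancing use mentioned above, here is a naive oversampling sketch (a simplified stand-in for established methods such as SMOTE) that duplicates minority-class rows with small Gaussian jitter; the jitter scale is an illustrative assumption.

```python
import numpy as np

def oversample_minority(X: np.ndarray, y: np.ndarray, minority_label: int,
                        jitter: float = 0.01, seed: int = 0):
    """Naively balance classes by duplicating minority rows with small jitter."""
    rng = np.random.default_rng(seed)
    minority = X[y == minority_label]
    deficit = int((y != minority_label).sum()) - minority.shape[0]
    if deficit <= 0:
        return X, y                                      # already balanced
    picks = minority[rng.integers(0, minority.shape[0], size=deficit)]
    synthetic = picks + rng.normal(scale=jitter, size=picks.shape)
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(deficit, minority_label)])
```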

By prioritizing model robustness in the last mile, businesses can ensure that their machine learning models are reliable and effective in real-world scenarios, leading to better outcomes and increased adoption of AI solutions.

About the Author: Cal Al-Dhubaib is a globally recognized data scientist and AI strategist in trustworthy artificial intelligence, as well as the founder and CEO of Pandata, a Cleveland-based AI consultancy, design, and development firm.

Featured Image: Canva

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.
