How To Get Started With Building AI in High-Risk Industries

ODSC - Open Data Science
6 min read · Jan 16, 2024

Ever found yourself in a conversation about AI in high-risk industries like healthcare, finance, or education, only to wonder, “Where do I even begin?” Trust me, you’re not alone. Leaders in these fields often approach me with the same question: “How do I get started with AI?”

Between the misconceptions surrounding AI (some folks think they’re talking about generative AI when they’re actually talking about automation) and the ever-growing hype, it can be hard to sort through the noise and take the next steps that make sense for your organization.

My goal with this article is to make getting started with AI easy and painless — especially for those of you who navigate risk as a routine challenge. No jargon, no fluff — just a straightforward guide to help your organization tap into AI’s potential without the headache. Ready to dive in?

What is AI? Some Quick Definitions

In short, AI is software that recognizes patterns.

These patterns can live in simple spreadsheets, complex medical images, clinical notes, or all of the above. The goal of a model can be simple, like measuring how much of something is present, or more complex, like predicting the onset of disease or reliably summarizing clinical notes.

Careful Selection of Examples

To recognize and react to these patterns, AI must be trained on carefully selected examples. As humans, we know all of the pictures below are squirrels. However, given only these examples, a model may conclude that any image with a background of grass is a squirrel.

Such a model would not detect the next image as a squirrel (no matter how cutely drawn).

This is all to say that AI and ML models sometimes learn unexpected patterns from the data they are given. When left unchecked, models can learn the wrong things, such as treating the presence of grass as proof of a squirrel, or they can stumble on look-alikes, such as distinguishing muffins from chihuahuas (as it turns out, it’s surprisingly difficult to tell the difference between the two).

Now imagine dealing with complex, visually similar medical conditions. In this MIT study, researchers wanted to see whether biased datasets could adversely affect the efficacy of diagnosis even if typical indicators of heritage, like skin tone and facial features, weren’t included in the training dataset.

The study showed that the models were sensitive to self-reported race and gender, even though radiologists couldn’t identify those attributes themselves. The implications are huge when medical datasets are composed largely of white males.

When selecting examples of what is “correct” for your ML model, be sure they are:

  • Representative of real-world scenarios that your model may encounter
  • Diverse, covering a range of situations and conditions
  • Free of errors, outliers, and inconsistencies
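One way to put this checklist into practice is a quick audit of your examples before any training starts. The sketch below is only an illustration: the function name `audit_examples`, the `background` and `label` fields, and the 10% threshold are all assumptions made up for this example, not recommended values.

```python
from collections import Counter

def audit_examples(examples, group_key, label_key, min_share=0.10):
    """Flag under-represented groups and rows with missing labels.

    `examples` is a list of dicts. The key names and the 10% threshold
    are illustrative assumptions, not recommended values.
    """
    total = len(examples)
    if total == 0:
        raise ValueError("no examples to audit")
    group_counts = Counter(row.get(group_key, "unknown") for row in examples)
    # Groups whose share of the dataset falls below the chosen threshold.
    underrepresented = sorted(
        group for group, count in group_counts.items()
        if count / total < min_share
    )
    # Row positions where the "correct answer" is missing entirely.
    missing_labels = [
        i for i, row in enumerate(examples) if row.get(label_key) is None
    ]
    return {
        "group_shares": {g: c / total for g, c in group_counts.items()},
        "underrepresented": underrepresented,
        "missing_labels": missing_labels,
    }

# Hypothetical toy dataset: mostly squirrels photographed on grass.
examples = (
    [{"background": "grass", "label": "squirrel"}] * 16
    + [{"background": "tree", "label": "squirrel"}] * 3
    + [{"background": "drawing", "label": None}]
)
report = audit_examples(examples, "background", "label")
print(report["underrepresented"])  # ['drawing']
print(report["missing_labels"])    # [19]
```

A check like this won’t tell you whether your data is good enough, but it makes gaps visible early, when they are still cheap to fix.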

Remember that machine learning is a lazy process: it looks for the easiest possible way to differentiate between two groups or reproduce a pattern. In the squirrel example, the easiest way to distinguish between the two groups is the presence of grass. What’s missing from your examples is just as important as what’s in them.
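To make the grass shortcut concrete, here is a deliberately tiny sketch. It assumes each image has already been reduced to two hand-labeled yes/no features (`has_grass` and `has_bushy_tail`, both invented for this illustration), and the "model" simply picks whichever single feature best matches the labels. Real models learn from pixels, not from tidy features like these, but the failure mode is the same.

```python
def best_single_feature(rows, feature_names, label_key="is_squirrel"):
    """Return the feature whose value alone best matches the label on `rows`."""
    def accuracy(feature):
        return sum(row[feature] == row[label_key] for row in rows) / len(rows)
    return max(feature_names, key=accuracy)

# Hypothetical training set: every squirrel happens to be on grass.
train = [
    {"has_grass": 1, "has_bushy_tail": 1, "is_squirrel": 1},
    {"has_grass": 1, "has_bushy_tail": 1, "is_squirrel": 1},
    {"has_grass": 1, "has_bushy_tail": 0, "is_squirrel": 1},  # tail hidden
    {"has_grass": 0, "has_bushy_tail": 0, "is_squirrel": 0},
    {"has_grass": 0, "has_bushy_tail": 1, "is_squirrel": 0},  # another bushy-tailed animal
    {"has_grass": 0, "has_bushy_tail": 0, "is_squirrel": 0},
]

# New images the training set never covered.
new_images = [
    {"has_grass": 0, "has_bushy_tail": 1, "is_squirrel": 1},  # a drawn squirrel
    {"has_grass": 1, "has_bushy_tail": 0, "is_squirrel": 0},  # an empty lawn
]

shortcut = best_single_feature(train, ["has_grass", "has_bushy_tail"])
# Predict with the learned shortcut: "squirrel" whenever the feature is present.
new_accuracy = sum(
    row[shortcut] == row["is_squirrel"] for row in new_images
) / len(new_images)

print(shortcut)      # has_grass
print(new_accuracy)  # 0.0
```

The shortcut is perfect on the training examples and wrong on both new images, which is exactly why the examples you leave out matter.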

When Does Building AI Make Sense?

Nearly any time someone asks me if they can use AI for [insert task or situation], I like to respond with, “What are you doing today to solve that task or situation without AI?” If there isn’t already a human-led process or knowledge of a process in place, I’m skeptical that jumping into AI will solve your problem.

AI is not a magic solution for every task or problem your organization faces. And it honestly only makes sense to automate or augment something if it meets the following criteria:

  • Humans are making decisions at a high volume and regularly.
  • These decisions or tasks are high in toil and repetitive in nature.
  • You have access to correct examples.
  • You can measure the impact of a solution in terms of an existing business KPI.

If your task or decision doesn’t check these boxes, it’s likely not a good fit for AI (yet).
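The four criteria above can be written down as a simple screening step for candidate use cases. This is a rough sketch, not a formal method: the dictionary keys, the function name `unmet_criteria`, and the two sample use cases are hypothetical and exist only to show the idea.

```python
# The four criteria from the list above; the keys are made-up identifiers.
CRITERIA = {
    "high_volume_decisions": "Humans make this decision at high volume and regularly",
    "repetitive_high_toil": "The task is high in toil and repetitive",
    "has_correct_examples": "You have access to correct examples",
    "tied_to_existing_kpi": "Impact can be measured with an existing business KPI",
}

def unmet_criteria(answers):
    """Return descriptions of the criteria a use case fails to meet.

    `answers` maps each key in CRITERIA to True or False. A missing key
    counts as unmet, so unanswered questions are never treated as a pass.
    """
    return [
        description
        for key, description in CRITERIA.items()
        if not answers.get(key, False)
    ]

# Hypothetical use cases for illustration only.
claims_triage = {
    "high_volume_decisions": True,
    "repetitive_high_toil": True,
    "has_correct_examples": True,
    "tied_to_existing_kpi": True,
}
brand_new_process = {
    "high_volume_decisions": False,
    "repetitive_high_toil": True,
    "has_correct_examples": False,
    "tied_to_existing_kpi": True,
}

print(unmet_criteria(claims_triage))           # []
print(len(unmet_criteria(brand_new_process)))  # 2
```

An empty list means the use case clears the screen. Anything else is a prompt for a conversation about what’s missing, not an automatic rejection.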

How Complexity Drives Cost and Risk

The complexity of a model impacts both project feasibility and cost. The figure below depicts types of data, including spreadsheets, documents, photos, audio, and video, and common AI goals, including measure, predict, recommend, and create. As you move along either axis, you’ll notice an increase in complexity.

Applications that fall within the ‘metaphorical area’ under the orange circle generally experience fewer unintended consequences, though the risk is never eliminated. For applications beyond this area, stress testing and ongoing surveillance become critical as part of deployment.

As the data becomes more complex, it also becomes harder to query, and the likelihood of risk increases. As the goal becomes more complex, it becomes harder to determine what “correct” looks like, as there could be many variations of correct.

The complexity of your data and your goal loosely correlates with how much time and effort goes into curating data, validating and stress testing models, and building the model itself. Because of this, the timeline and cost of AI projects vary greatly from one to another, which forces most organizations to prioritize projects.

How To Prioritize Projects for AI

To help clients identify and prioritize AI use cases, we usually break down problems into their simplest forms, and then plot them on a matrix like the one below.

In my experience, we’re really good at staving off low-value, high-complexity projects. From the outset, the value is questionable, and the process to procure the tech, or the general feasibility, is unclear.

High-value, low-complexity projects are the obvious ones, like Zoom transcript summarization. Chances are, somebody has already built solutions for these problems. If they make sense for your organization, you could start here as a low-risk (and low-cost) way to start incorporating AI into your processes.

Anything sitting in the top right (high value, high complexity) is usually the Holy Grail of AI use cases. But, to get to these high-value, high-complexity projects, we often have to start with low-value, low-complexity projects.

Our goal is to identify these lower-value smaller projects as building blocks that will lead us to achieve some of those longer-term, high-value projects.
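The value and complexity matrix described above can be sketched in a few lines. The 1-to-5 scores, the threshold of 3, the quadrant names, and the sample scores are all assumptions made for illustration. In practice, the scoring comes out of conversations with stakeholders, not a formula.

```python
def quadrant(value, complexity, threshold=3):
    """Place a project on a value/complexity matrix.

    Scores run from 1 to 5. The threshold and quadrant names are
    illustrative assumptions, not a standard.
    """
    high_value = value >= threshold
    high_complexity = complexity >= threshold
    if high_value and high_complexity:
        return "long-term goal"
    if high_value:
        return "quick win"
    if high_complexity:
        return "avoid"
    return "possible building block"

# Hypothetical scores as (value, complexity), for illustration only.
projects = {
    "Meeting transcript summarization": (4, 1),
    "Visual pain scoring": (5, 5),
    "Device framing check": (2, 2),
    "Unclear idea needing new tech": (1, 5),
}

for name, (value, complexity) in projects.items():
    print(f"{name}: {quadrant(value, complexity)}")
```

The interesting work is in the bottom-left quadrant: a low-value, low-complexity project earns its place only if it is a step toward something in the top right.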

In a recent computer vision project, the ultimate goal was the visual detection of pain cues, which has all sorts of implications for passive medical monitoring. We had to start with some basics: can we use computer vision to recognize when the device is even positioned properly or out of frame?

In order to reliably detect pain visually, you kind of need to make sure that you have consistency in framing. In the end, this easier low-value project enabled us to eventually achieve our larger goal of scoring pain levels.
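To give a feel for what a framing check might look like as a building block, here is a minimal sketch. It is not the approach used in the project described above. It assumes some upstream detector (not shown) returns a face bounding box in pixels, and the function name, box format, and 5% margin are all hypothetical.

```python
def is_properly_framed(face_box, frame_width, frame_height, margin=0.05):
    """Return True if the face box sits fully inside the frame, away from the edges.

    `face_box` is (left, top, width, height) in pixels, or None when the
    assumed upstream detector found nothing. `margin` is the fraction of
    the frame kept clear on every side (an illustrative value).
    """
    if face_box is None:
        return False
    left, top, width, height = face_box
    pad_x = margin * frame_width
    pad_y = margin * frame_height
    return (
        left >= pad_x
        and top >= pad_y
        and left + width <= frame_width - pad_x
        and top + height <= frame_height - pad_y
    )

# Hypothetical 640x480 frames.
print(is_properly_framed((200, 100, 240, 280), 640, 480))  # True
print(is_properly_framed((500, 100, 240, 280), 640, 480))  # False: runs off the right edge
print(is_properly_framed(None, 640, 480))                  # False: nothing detected
```

A gate like this would run before any harder model, so frames that fail the check never reach the pain-scoring step at all.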

When thinking about building AI, start by answering why AI is a good tool for the problem at hand. What decisions are you influencing? What examples (data) would you use to train and develop this model? If your task or problem is a good fit for AI, then consider how difficult it will be to build and how valuable it will be to your organization and stakeholders.

About the Author: Cal Al-Dhubaib is a globally recognized data scientist and AI strategist in trustworthy artificial intelligence, as well as the founder and CEO of Pandata, a Cleveland-based AI consultancy, design, and development firm.

Originally posted on OpenDataScience.com

