Anomaly Detection for CRM Data: A Step-By-Step Guide

ODSC - Open Data Science
5 min read · Oct 17, 2023

Editor’s note: Geeta Shankar and Tuli Nivas are speakers for ODSC West 2023 this October 30th to November 2nd. Be sure to check out their talk, “Anomaly Detection for CRM Production Data,” there!

Anomaly detection for CRM (Customer Relationship Management) data is gaining increasing importance. It is crucial to monitor any unusual behavior in production data and proactively identify the root causes. Regression models can be employed to detect anomalies when two data measures exhibit a high correlation (R² value). For metrics that do not correlate well with any other variables, we can instead characterize their behavior over time using forecasting algorithms. One state-of-the-art forecasting algorithm is Prophet, developed by Meta; forecasting is particularly useful for metrics related to response times, such as runtime or compile time. In this blog, we aim to provide a comprehensive guide for building your first anomaly detection models based on production data metrics such as runtime, app CPU time, and database time.
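As a quick sanity check before choosing an approach, you can fit a simple linear model between two metrics and inspect its R² value. The snippet below is only a minimal sketch; the file name and column names are illustrative placeholders, not part of the dataset described later in this post.

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('crm_metrics.csv')            # hypothetical metrics export
X = df[['count']].to_numpy()                   # number of transactions
y = df['app_cpu_time'].to_numpy()              # candidate correlated metric
r2 = LinearRegression().fit(X, y).score(X, y)  # R^2 of the linear fit
print(f'R^2 = {r2:.3f}')                       # high R^2 -> regression; otherwise try forecasting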

Understanding Anomaly Detection

What are anomalies in CRM data? Anomalies can manifest as outliers in customer behavior or unusual patterns in production systems. It’s imperative to promptly determine whether an anomaly is a result of internal factors, such as capacity constraints or excessive resource utilization, or whether it is simply atypical customer behavior. Occasionally, we encounter seasonal or cyclical anomalies, which necessitate analyzing historical data to discern recurring patterns, like spikes in sales during the holidays or drops in customer engagement during specific months. Therefore, constructing anomaly detection models for critical metrics, such as runtime, app CPU time, and database time, is pivotal for gaining insights into CRM production data.

Getting Started with Your Anomaly Detection Model

The anomaly detection models demonstrated here are implemented in Python, utilizing scikit-learn (sklearn) for regression modeling and Prophet for forecasting. To begin, you need to set up a working environment with Python and these libraries installed. We recommend Jupyter Notebook as an excellent environment for experimentation, allowing you to run code snippets and test functionality. Below are the installation commands and the initial Jupyter Notebook cell with all the necessary imports:

# Install the required libraries (run once; in a notebook, prefix with %pip)
pip install scikit-learn
pip install prophet

# Imports used throughout this guide
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from prophet import Prophet
import plotly.express as px
import plotly.graph_objects as go

Preparing Your Data

First, you’ll need to access and load your CRM production data. We’ve created synthetic data that closely resembles the metrics collected from a production pod with some of our customers. You can download our synthetic data from here.
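Once the CSV files are downloaded, they can be loaded with pandas. The snippet below is only a sketch; the file names and the metric column are placeholders to adjust to the files you download. It also sets up the prev_release, current_release, and metric variables used in the regression example that follows.

# Hypothetical file names; adjust to match the downloaded synthetic data
prev_release = pd.read_csv('prev_release_metrics.csv')
current_release = pd.read_csv('current_release_metrics.csv')
metric = 'app_cpu_time'   # e.g. cumulative app CPU time or cumulative database time
print(prev_release.head())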

Regression Modeling: Detecting Anomalies in Resource Utilization

For detecting anomalies in resource utilization, we will use regression modeling techniques. This example establishes a linear regression model that relates the number of transactions (count) in the previous release data to a highly correlated metric from that same release, i.e. cumulative app CPU time or cumulative database time. The fitted model is then applied to the number of transactions processed in the current release data to yield predictions of the metric of interest for the current release. Finally, z-scores are calculated by taking the residuals of the current release data (actual metric minus predicted metric) and dividing them by the standard deviation of the residuals from the previous release data.

# Fit a linear model: transaction count (previous release) -> metric of interest
lm = LinearRegression()
lm.fit(np.array(prev_release['count']).reshape(-1,1), np.array(prev_release[metric]))
# Predict the metric for both releases from their transaction counts
predicted_current_release_metric = lm.predict(np.array(current_release['count']).reshape(-1,1))
predicted_previous_release_metric = lm.predict(np.array(prev_release['count']).reshape(-1,1))
# Z-scores: current-release residuals scaled by the previous-release residual spread
zscores = (current_release[metric]-predicted_current_release_metric)/np.std(prev_release[metric]-predicted_previous_release_metric)
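To turn these z-scores into concrete anomaly flags, one common convention (our assumption here, not something prescribed above) is to flag any data point whose absolute z-score exceeds a threshold such as 3:

z_threshold = 3   # assumed cutoff; tune for your data
regression_anomalies = current_release[np.abs(zscores) > z_threshold]
print(f'{len(regression_anomalies)} anomalous data points detected')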

Forecast Modeling: Detecting Anomalies in Response Time

To identify anomalies in response time, we will employ the Prophet model. This example creates a Prophet model with a 99% confidence interval and yearly and weekly seasonality enabled. We then fit it to our response-time data, provided as a CSV of P95 runtime at hourly granularity. A forecast is created by using the Prophet model to predict on that data, which yields a confidence interval for each data point. If the actual data point falls outside the predicted confidence interval, it is anomalous. The code below is a sample that colors data points green if they are within the confidence interval and red if they are outside it, indicating that they are anomalous.

# Prophet model with a 99% confidence interval and yearly/weekly seasonality
model = Prophet(interval_width=0.99, yearly_seasonality=True, weekly_seasonality=True)
model.fit(data)                 # 'data' must have 'ds' (timestamp) and 'y' (P95 runtime) columns
forecast = model.predict(data)  # forecast on the same timestamps to get per-point bounds
# Join actuals with the predicted value and confidence bounds
performance = pd.merge(data, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']], on='ds')
# Flag points that fall outside the confidence interval as anomalies
performance['anomaly'] = performance.apply(lambda row: 1 if ((row.y < row.yhat_lower) | (row.y > row.yhat_upper)) else 0, axis=1)
anomalies = performance[performance['anomaly'] == 1].sort_values(by='ds')
# Color anomalies red and in-range points green for plotting
performance['color'] = np.where(performance['anomaly'] == 1, 'red', 'green')
performance['name'] = np.where(performance['anomaly'] == 1, 'Anomaly', 'Within Confidence Interval')
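The color and name columns can then feed a chart built with the Plotly imports from earlier. The figure below is one possible sketch of such a visualization; the trace names and layout are our own choices rather than the original workshop code.

# Sketch of a plot: actual P95 runtime colored by anomaly status, with the forecast bounds
fig = go.Figure()
fig.add_trace(go.Scatter(x=performance['ds'], y=performance['y'], mode='markers',
                         marker=dict(color=performance['color']),
                         text=performance['name'], name='P95 runtime'))
fig.add_trace(go.Scatter(x=performance['ds'], y=performance['yhat_upper'], mode='lines', name='Upper bound'))
fig.add_trace(go.Scatter(x=performance['ds'], y=performance['yhat_lower'], mode='lines', name='Lower bound'))
fig.show()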

Want to Learn More?

If you are interested in putting all of these anomaly detection ideas together, please attend our ODSC West 2023 workshop, where you will learn more about how to transform your data to fit these anomaly detection models, as well as how to create impactful visualizations for them.

About the Authors/ODSC West 2023 Speakers

Geeta Shankar is a software engineer who specializes in leveraging data for business success. With expertise in computer science, data science, machine learning, and artificial intelligence, she stays updated with the latest data-driven innovations. Her Indian classical music background has taught her the value of sharp thinking, spontaneity, and connecting with diverse individuals. Geeta uses these skills to translate complex data into meaningful insights that enhance performance and customer experiences.

Tuli Nivas is a Software Engineering Architect at Salesforce with extensive experience in design and implementation of test automation and monitoring frameworks. Her interests lie in software testing, cloud computing, big data analytics, systems engineering, and architecture. Tuli holds a PhD in computer science with a focus on building processes to set up robust and fault-tolerant performance engineering systems. Her recent area of expertise has been around machine learning and building data analytics for better and faster troubleshooting of performance problems and anomaly detection in production.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.
