May 17, 2024

Retail Sales Forecasting with Snowflake Cortex ML & Snowpark

By Elliot Johnson

Forecasting your company’s sales for the next quarter can be an extremely difficult task. Some forecasts are too optimistic, others too pessimistic. Some executives have wild concepts about what affects a forecast, but some of those ideas are actually correct. 

The truth remains that being able to compile forecasts objectively can be the difference between over-promising or under-promising to potential investors. The brand-new Forecasting tool created on Snowflake Data Cloud Cortex ML allows you to do just that. 

In this blog, we will load our historical sales data, create a projection based on objective stats, and visualize it in a simple line chart—all in under 15 minutes.

What is Cortex ML, and Why Does it Matter?

Cortex ML is Snowflake’s newest feature, added to enhance the ease of use and low-code functionality of your business’s machine learning needs. The newest ML functions are Forecasting, Anomaly Detection, and Contribution Explorer. Today, we will dive deep into the Forecasting feature and build a forecast using market data. 

This low-code solution lets you use your existing Snowflake data and easily create a visualization to predict the future of your sales, taking into account unlimited data points. The forecasting algorithm uses gradient boosting to model data and the rolling average of historical data to help predict trends. Any number of features can be included in this model, and you can use a variable to set the predictive interval, allowing for a higher degree of customization.

How to Build a Predictive Model for Store Sales Using Cortex ML

Preparing the Data and Environment

The first step in creating any ML model is to prepare the data and choose the features you will use to determine the outcome. For this demo, we will use the Walmart Demand Forecasting data, which is available for download at Kaggle here.

First, let’s grant ourselves permission to create and manage a forecast using the analyst role.

				
					USE ROLE admin;
GRANT USAGE ON DATABASE admin_db TO ROLE analyst;
GRANT USAGE ON SCHEMA admin_schema TO ROLE analyst;
GRANT CREATE SNOWFLAKE.ML.FORECAST ON SCHEMA admin_db.admin_schema TO ROLE analyst;

				
			

Next, let’s create the table to house the historical data in our Walmart Demand Forecasting. 

				
					CREATE OR REPLACE TABLE DEMOS.AI_MARKETING.WALMART_SALES_DEMO (
	STORE NUMBER(38,0),
	DATE VARCHAR(16777216),
	WEEKLY_SALES NUMBER(38,2),
	HOLIDAY_FLAG NUMBER(38,0),
	TEMPERATURE NUMBER(38,2),
	FUEL_PRICE NUMBER(38,3),
	CPI NUMBER(38,7),
	UNEMPLOYMENT NUMBER(38,3)
);
				
			

Next, we will load the CSV file downloaded from Kaggle to our table. We’ll show you a quick and easy way to load data for this demo. We will go to the Snowflake Web GUI, find the table in the schema, and click the load data button in the upper right. Creating a staging and a pipeline is the best practice in a production environment to load data, as it keeps the data refreshed as needed. But for the purpose of our demo, this way works.

The second table we create will be our future table.  We will give this table to the model with only the exogenous variables, and it will predict the sales we require. The data is just mocked up, but in your example, it could be a predictor like “Are we running a half-off sale” or a “Special mail-in rebate week.” Something maybe only the people inside your company know will be happening. The data must match the historical data table, excluding the column you wish to predict.

				
					
INSERT INTO FUTURE_WALMART_SALES_DEMO VALUES
  (1, TO_DATE('11-02-2012','MM-DD-YYYY'),0,40.11,2.572,216.0963582,8.106),
  (1, TO_DATE('11-09-2012','MM-DD-YYYY'),0,41.01,2.672,216.0963582,7.106),
  (1, TO_DATE('11-16-2012','MM-DD-YYYY'),0,32.01,2.772,214.0963582,7.106),
  (1, TO_DATE('11-23-2012','MM-DD-YYYY'),1,32.39,2.672,265.0963582,6.506),
  (1, TO_DATE('11-30-2012','MM-DD-YYYY'),0,22.31,2.972,256.0963582,6.506),
  (1, TO_DATE('12-07-2012','MM-DD-YYYY'),0,12.31,3.172,246.0963582,5.906),
  (1, TO_DATE('12-14-2012','MM-DD-YYYY'),0,22.31,3.572,268.0963582,6.106),
  (1, TO_DATE('12-21-2012','MM-DD-YYYY'),0,12.31,3.472,278.0963582,7.106),
  (1, TO_DATE('12-28-2012','MM-DD-YYYY'),1,2.31,3.672,289.0963582,6.906),
  (1, TO_DATE('01-04-2013','MM-DD-YYYY'),1,1.31,3.672,250.0963582,7.806),
  (1, TO_DATE('01-11-2013','MM-DD-YYYY'),0,2.31,3.772,268.0963582,8.906),
  (1, TO_DATE('01-18-2013','MM-DD-YYYY'),0,2.31,3.772,200.0963582,9.906);
				
			

Lastly, we need to create 2 views. Both are for the data we just loaded: historical and future. Notice we updated the date column by casting it to an actual date. We will need that date for the forecasting model.

				
					CREATE OR REPLACE VIEW DEMOS.AI_MARKETING.V1(
	STORE,
	DATE,
	WEEKLY_SALES,
	HOLIDAY_FLAG,
	TEMPERATURE,
	FUEL_PRICE,
	CPI,
	UNEMPLOYMENT
) AS 
    SELECT STORE, TO_DATE(DATE, 'YYYY-MM-DD') AS DATE,WEEKLY_SALES,HOLIDAY_FLAG,TEMPERATURE,FUEL_PRICE,CPI,UNEMPLOYMENT FROM WALMART_SALES_DEMO;
				
			

The second view

				
					CREATE OR REPLACE VIEW DEMOS.AI_MARKETING.FUTURE_WALMART_SALES(
	STORE,
	DATE,
	HOLIDAY_FLAG,
	TEMPERATURE,
	FUEL_PRICE,
	CPI,
	UNEMPLOYMENT
) AS
    SELECT STORE, TO_DATE(DATE, 'YYYY-MM-DD') AS DATE,HOLIDAY_FLAG,TEMPERATURE,FUEL_PRICE,CPI,UNEMPLOYMENT FROM FUTURE_WALMART_SALES_DEMO;
				
			

We have laid the groundwork; now, we can move on to building our model using the incredible new low-code Cortex ML solution.

Building the Model

Now that we have set up all the underlying tables and views, we can use the new Cortex ML functions to build our forecast. Head on over to the AI & ML section of your Snowflake UI. Then, find the Forecasting section and click Create.

First, let’s name our model. We’ll call this walmart_sale_forecast. Second, we will need to select a warehouse to process this data. TASTY_DE_WH, an XS warehouse, will be fine for the Walmart data I’ve selected.

After that, it’s simple to follow the prompts to get the correct information for your model. Remember, we will use the views we created for all the training and prediction. When you complete the seven steps in the form, Cortex ML will automatically generate the code for you. This will appear as a worksheet, like the picture below.

Now, let’s select a warehouse to run the code. Snowflake also makes this part easy. Here is a link to a great table that will help you determine the size of the warehouse you will need to run your predictive analytics.

Once our model is complete, we can view the results of the prediction. In the next section, we will show you how to visualize and evaluate the model.

Visualizing and Evaluating the Model

Two important new entities are created after we create our model. The first is obviously a table with the predictions we requested. The second is the evaluation and feature weight metrics. Let’s take a look at both before we wrap up.

The table provides 95% accuracy of our sales forecast’s upper and lower bounds and the most probable prediction. Using Snowflake’s Chart functions, we can select all data from the table and easily prepare a presentation-ready chart.

Lastly, we can pull two extremely important data pieces from the model we created for the real data geeks: model loss and feature weights.

				
					-- Inspect the accuracy metrics of your model. 
CALL future_walmart_sales!SHOW_EVALUATION_METRICS();

-- Inspect the relative importance of your features, including auto-generated features. 
CALL future_walmart_sales!EXPLAIN_FEATURE_IMPORTANCE();
				
			

These metrics can show us so much about our model and how it is performing. They can also provide, in this case, insights into what gets our customers excited to shop at Walmart. We could add more features to this model and find out if our customers prefer BOGO sales or Rebates, Sales on Clothes vs. Sales on Electronics, etc. These insights are valuable to a business looking to gain an edge on sales and help grow customer loyalty.

Closing

In this blog, we have reviewed how to prepare data that helps us build a sales forecast using Snowflake Cortex ML. The newest feature to the Snowflake arsenal adds an easy-to-use ML pipeline that can help forecast your sales. The other two new features are  Anomaly Detection and Contribution Explorer, which may be explored in a future blog. 

If you need assistance building and deploying your Forecast Models or managing and migrating your old data stack to Snowflake, phData’s industry-leading experts are here to help you with all the steps of the process.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit