The 2021 Executive Guide To Data Science and AI

Case-studies from real-life business scenarios and advice you can act on.

Eleni Nisioti
7 min read · Aug 2, 2021

This post is a bitesize walk-through of the 2021 Executive Guide to Data Science and AI — a white paper packed with up-to-date advice for any CIO or CDO looking to deliver real value through data.

Download the free, unabridged version here.

We have divided the guide into 8 thematic sections, with new content released regularly:

Roadmap

👫 1. Team

How to determine the optimal team structure

🤖 2. Machine learning

The 6 key trends you need to know in 2021

🎨 3. Visualisation

How to tell the right story with the 4 different kinds of dashboard

🔧 4. Tools

Finding the right tool for the problem

♻️ 5. Automation

Automating data pipelines and models

➡️ 6. Deployment

How to build sustainable, scalable live systems

💵 7. Proving Value

How to calculate ROI and prioritise projects

💡 8. Big Ideas

What to look out for in 2022

1. Team

Building the right data science team is complex. With a range of role types available, how do you find the perfect balance of Data Scientists, Data Engineers and Data Analysts to include in your team?

First, let’s explore the key attributes of each role:

The Data Scientist

Data scientists have a wealth of practical experience building AI systems for a range of applications. They bring deep expertise in machine learning, clustering, natural language processing, time series modelling, optimisation, hypothesis testing and deep learning to the team. The most common data science languages are Python and R. SQL is also a must-have skill for acquiring and manipulating data.

The Data Engineer

Not everyone working on a data science project is a data scientist. Data engineers are the glue that binds the products of data scientists into a coherent and robust data pipeline. They build production-ready systems using best-practice containerisation technologies, ETL tools and APIs. They are skilled at deploying to any cloud or on-premises infrastructure.

The Data Analyst

Asking the right questions is often the toughest part of a data science project. This is where the inquisitiveness and communication skills of Data Analysts come into play. They ask the questions that ensure the output from your data science team is actionable, practical and functional. They build interactive dashboards and reports that stakeholders use to consume model outputs and insights.

Data Scientists, Data Engineers and Data Analysts are all crucial roles within a Data Science team

Next, it’s important to understand how to combine these three roles into the perfect team. In the full guide we outline the four different types of data science team and how to determine which is right for your company.

2. Machine Learning

In this section, we look beyond ‘standard’ ML practices and explore the 6 ML trends that will set you apart from the pack in 2021. Below we outline three of our favourites:

From XGBoost to NGBoost

NGBoost is a machine learning algorithm that goes beyond the already powerful XGBoost by predicting a full probability distribution, from which a prediction interval can be read, instead of a single point estimate. This allows for a much richer interpretation of predictions, without sacrificing the algorithm’s power. Give this technique a try to take your team’s ML modelling to the next level.

NGBoost price prediction intervals
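To make the idea of a prediction interval concrete, here is a minimal sketch using only the Python standard library. It assumes the model has already predicted a Normal distribution (a mean and a standard deviation) for one item; the price figures are hypothetical, and this is not NGBoost's own API.

```python
from statistics import NormalDist


def prediction_interval(mean, std, coverage=0.95):
    """Return the central interval holding `coverage` of a Normal(mean, std)."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    dist = NormalDist(mu=mean, sigma=std)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical price prediction: mean 250,000 with standard deviation 20,000.
low, high = prediction_interval(250_000, 20_000, coverage=0.95)
print(f"95% interval: {low:,.0f} to {high:,.0f}")
```

A wider interval signals a less certain prediction, which is exactly the information a single point estimate hides.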

Human-in-the-loop AI

Human-in-the-loop AI systems siphon off portions of validation data for human review, especially where prediction confidence is low or prediction error is high. During development, the AI system can receive targeted feedback (additional labelled data) on which to continue training, and in a live environment it can defer marginal predictions to a human for manual consideration.
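The live-environment half of this idea can be sketched in a few lines. The example below assumes each prediction comes with a confidence score between 0 and 1; the function name, the 0.8 threshold and the sample data are all hypothetical choices for illustration.

```python
def route_predictions(predictions, threshold=0.8):
    """Split predictions into automated ones and ones deferred to a human.

    `predictions` is a list of (item_id, label, confidence) tuples.
    """
    automated, deferred = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            automated.append((item_id, label))
        else:
            deferred.append((item_id, label, confidence))
    # Put the least confident predictions at the front of the review queue.
    deferred.sort(key=lambda row: row[2])
    return automated, deferred


# Hypothetical model outputs.
scored = [
    ("a1", "fraud", 0.97),
    ("a2", "ok", 0.55),
    ("a3", "ok", 0.91),
    ("a4", "fraud", 0.62),
]
automated, deferred = route_predictions(scored, threshold=0.8)
```

The labels that reviewers assign to the deferred items can then be fed back as additional training data.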

Explainable ML

When modelling business processes, the why is often more important than the what. SHAP values unlock the why behind individual predictions by giving your ML models interpretability. They directly measure the impact of each feature on a given prediction: some features pull up the predicted score and some push it down. You can use SHAP values to build human-interpretable explanations of every prediction.

SHAP values can tell you which pixel values are most responsible for recognising the content of a picture. Source: https://github.com/slundberg/shap
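For intuition, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical scoring model. It is not the shap library's API (which relies on much faster approximations); features left out of a coalition are simply set to an assumed baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Cost grows exponentially with the number of features, so this is
    only suitable for tiny examples.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {i}) - value(set(subset))
                total += weight * gain
        contributions.append(total)
    return contributions


# Hypothetical scoring model with an interaction between income and years.
def score(features):
    income, debt, years = features
    return 0.5 * income - 0.8 * debt + 0.1 * income * years


x = [4.0, 1.5, 2.0]
baseline = [2.0, 1.0, 1.0]
phi = shapley_values(score, x, baseline)
print(dict(zip(["income", "debt", "years"], phi)))
```

The contributions always add up to the gap between the prediction and the baseline score, which is what makes them usable as a per-prediction explanation.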

Download the full guide to get hold of the complete list of ML trends for 2021.

3. Visualisation

The secret to building exceptional dashboards is simple:

Make sure your team understands the four different kinds of dashboard and ensure that each published dashboard is tailored to one specific type.

The four kinds of dashboard are Operational, Analytical, Strategic and Self-service. These can be defined along two primary dimensions — target audience and interactivity:

The four different types of dashboard

The Operational Dashboard

Operational dashboards display real-time metrics to monitor business processes and alert technical teams to anomalies. Think of your car dashboard: fast, simple and immediately actionable.

The Analytical Dashboard

Analytical dashboards are used by data science teams to interactively mine datasets for insight. They are often shared within the team but not deployed to the wider business.

The Strategic Dashboard

Strategic dashboards present key performance metrics to senior management and aim to tell the ‘big-picture’ story behind the data. These dashboards highlight important areas of success or concern.

The Self-service Dashboard

Self-service dashboards allow users from across the business to quickly find the data they need through highly interactive, easy-to-use dashboards. For example, a sales executive might use a self-service dashboard to find out how much a client spent with the company last year and on what products.

In the full 2021 Executive Guide To Data Science and AI we give specific examples of each type of dashboard and also present the 6 questions you need to ask before choosing a data visualisation software platform.

4. Tools

We’ve mined the Kaggle 2020 State of Data Science and Machine Learning survey to produce the ultimate guide to the tools and techniques used by ML experts — here’s a sample:

Download the full guide for the low-down on this and other themes such as ML frameworks, ML algorithms, computer vision methods, NLP methods, cloud computing platforms, cloud ML products, AutoML tools and databases.

In addition, a modern business cannot afford to ignore two crucial tools that form part of the toolbox for any modern data science team — Git and Docker.

Git & Docker

Coding is messy: it is rarely a linear process from idea to production-ready system. Git, the most commonly used version control system in the world, solves this problem by keeping track of changes to source code over time. It is an essential and highly useful tool to introduce to your data science team.

Another frequent frustration when developing data science solutions is ensuring portability. Team members and servers will rarely have the exact same computer setup, meaning the solution may not work when deployed to production or when sharing code between machines.

Docker enables your team to package codebases so that they run consistently on any machine with Docker installed. This is made possible by Docker containers: lightweight, isolated environments that encapsulate everything necessary to run your code.

5. Automation

We recommend using Apache Airflow to automate each step of a machine learning pipeline — from validating data pipelines, through model training and selection to prediction validation.

Underlying the automation should be a robust alerting system that notifies users when execution or validation steps fail.

Automation can take place at each step of the ML training and prediction pipelines

In the full guide, we explain how to automate and set up alerts for each step in the above diagram:

  • Data validation
  • Model training
  • Model validation
  • Model selection
  • Making predictions
  • Prediction validation
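The pattern behind these steps (run each in order, stop at the first failure and notify someone) can be sketched in plain Python. This is not an Airflow DAG; the step functions, their checks and the `send_alert` callback are hypothetical placeholders for illustration.

```python
def run_pipeline(steps, send_alert):
    """Run named steps in order; alert and stop at the first failure.

    `steps` is a list of (name, callable) pairs. Each callable receives a
    shared `context` dict and raises an exception when its check fails.
    """
    context = {}
    completed = []
    for name, step in steps:
        try:
            step(context)
        except Exception as error:
            send_alert(f"Step '{name}' failed: {error}")
            return completed, False
        completed.append(name)
    return completed, True


# Hypothetical steps with made-up checks.
def validate_data(context):
    rows = [{"price": 10.0}, {"price": 12.5}]
    if not rows:
        raise ValueError("no rows received")
    context["rows"] = rows


def train_model(context):
    prices = [row["price"] for row in context["rows"]]
    context["model_mean"] = sum(prices) / len(prices)


def validate_model(context):
    if context["model_mean"] > 11:
        raise ValueError("model output outside expected range")


alerts = []  # stands in for email, chat or paging notifications
completed, ok = run_pipeline(
    [
        ("Data validation", validate_data),
        ("Model training", train_model),
        ("Model validation", validate_model),
    ],
    send_alert=alerts.append,
)
```

In a scheduler such as Airflow, each step would become its own task and the alert would be wired to the task's failure handling, but the control flow is the same.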

For example, during the data validation stage, here are three key points to check as part of the validation scripts.

Download the free whitepaper for the complete guide to setting up automation across each step of your data science project pipelines.

Stay tuned for more bitesize summaries of the Deployment, Proving Value and Big Ideas sections of the 2021 Executive Guide To Data Science and AI.

Download the free, unabridged version here.

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch with us at hello@adsp.ai

Eleni Nisioti

PhD student in AI. Deep learning is not just for machines. I like my coffee like I like my code. Without bugs.