In the world of machine learning, evaluating the performance of a model is just as important as building the model itself. In this blog, we will explore the concept of a confusion matrix using a spam email example. We highlight the four key metrics you must understand when working with a confusion matrix.
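As a minimal sketch (not from the post itself), here is roughly how such a spam-versus-ham confusion matrix could be computed with scikit-learn; the labels below are made up for illustration.

```python
# Minimal sketch: confusion matrix for a spam classifier with scikit-learn.
# The true/predicted labels below are illustrative only.
from sklearn.metrics import confusion_matrix

y_true = ["spam", "ham", "spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham",  "spam", "ham", "spam", "spam", "ham"]

# Rows are actual classes, columns are predicted classes (order given by `labels`).
cm = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
tp, fn, fp, tn = cm.ravel()  # treating "spam" as the positive class
print(cm)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```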
In the domain of machine learning, evaluating the performance and results of a classification model is a mandatory step. There are numerous metrics available to get this done. The ones discussed in this blog are the AUC (Area Under the Curve) and ROC (Receiver Operating Characteristic). What is ROC?
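To make the idea concrete, here is a hedged, minimal sketch (the scores are illustrative, not from the post) of computing ROC curve points and the AUC with scikit-learn.

```python
# Minimal sketch: ROC curve points and AUC from predicted probabilities.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                     # ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # false/true positive rate per threshold
print("AUC =", roc_auc_score(y_true, y_score))
```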
The blog explains the limitations of using accuracy alone. It introduces alternative metrics like precision, recall, F1-score, confusion matrices, ROC curves, and Hamming metrics to evaluate models more comprehensively. Key Takeaways: Accuracy in Machine Learning is a widely used metric.
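For quick reference (a standard addition, not part of the original snippet), these metrics are defined from the confusion-matrix counts TP, FP, FN, and TN as:

\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]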
ML governance starts when you want to solve a business use case or problem with ML and is part of every step of your ML lifecycle, from use case inception through model building, training, evaluation, deployment, and monitoring of your production ML system. Prepare the data to build your model training pipeline.
One of the most fundamental tasks in Machine Learning is classification, which involves categorizing data into predefined classes. Classification is a subset of supervised learning, where labelled data guides the algorithm to make predictions. Think of it as sorting mail into different bins: letters, packages, and junk mail.
However, keeping track of numerous experiments, their parameters, metrics, and results can be difficult, especially when working on complex projects simultaneously. For your reference, this blog post demonstrates a solution to create a VPC with no internet connection using an AWS CloudFormation template.
The generative AI playground is a UI provided to tenants where they can run their one-time experiments, chat with several FMs, and manually test capabilities such as guardrails or model evaluation for exploration purposes. They include features such as guardrails, red teaming, and model evaluation. The component groups are as follows.
This includes: Risk assessment: Identifying and evaluating potential risks associated with AI systems. Monitoring and evaluation: Continuously monitoring and evaluating AI systems to help ensure compliance with regulations and ethical standards. Mitigation strategies: Implementing measures to minimize or eliminate risks.
GraphStorm and SageMaker Pipelines allow you to do that by creating a model pipeline you can run locally to retrieve model metrics, and when you're ready, run your pipeline on the full data on SageMaker and produce models, predictions, and graph embeddings to use in downstream tasks.
Given these challenges faced by RAG systems, monitoring and evaluating generative artificial intelligence (AI) applications powered by RAG is essential. In this post, we show you how to evaluate the performance, trustworthiness, and potential biases of your RAG pipelines and applications on Amazon Bedrock.
However, another important point is that, conversely, you don't need to increase the training data or the parameters of a model once it achieves an ideal score on your metrics. In this case, model performance needs to be evaluated with ROC curves, namely the relationship between true positive and false positive rates.
One example is an online retailer who deploys a large number of inference endpoints for text summarization, product catalog classification, and product feedback sentiment classification. The performance of the architecture is typically measured using metrics such as validation loss.
One such task is image classification, where images are accepted as input and the model attempts to classify the image as a whole with object label outputs. In this post, you will see how the TensorFlow image classification algorithm of Amazon SageMaker JumpStart can simplify the implementations required to address these questions.
This includes gathering, exploring, and understanding the business and technical aspects of the data, along with evaluation of any manipulations that may be needed for the model building process. Most of this process is the same for any binary classification except for the feature engineering step.
Starting today, the SageMaker LightGBM algorithm offers distributed training using the Dask framework for both tabular classification and regression tasks. Because each model is trained with one fixed set of hyperparameter values, the evaluation metric numbers on the hold-out test data can be further improved with hyperparameter optimization.
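For orientation, here is a minimal sketch of Dask-based distributed LightGBM training using the open-source lightgbm package; this is an illustrative stand-in rather than the SageMaker built-in algorithm, and the local cluster and random data are assumptions.

```python
# Minimal sketch: distributed LightGBM training with Dask (open-source lightgbm,
# not the SageMaker built-in algorithm). Cluster size and data are illustrative.
import dask.array as da
from dask.distributed import Client, LocalCluster
from lightgbm import DaskLGBMClassifier

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=2))  # hypothetical local cluster

    X = da.random.random((100_000, 20), chunks=(10_000, 20))
    y = (da.random.random(100_000, chunks=10_000) > 0.5).astype(int)

    clf = DaskLGBMClassifier(n_estimators=100, learning_rate=0.1, client=client)
    clf.fit(X, y)
    print(clf.predict(X[:5]).compute())
```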
TL;DR Evaluation of a RAG pipeline is challenging because it has many components. Each stage, from retrieval to generation and post-processing, requires targeted metrics. RAG evaluation should be approached across three dimensions: performance, cost, and latency. We'll cover: Dimensions for evaluating a RAG pipeline.
Evaluate and compare the model performances on the holdout test data. Because the target attribute is binary, our model performs binary prediction, also known as binary classification. The test set is used as the holdout set for model performance evaluation.
It also enables you to evaluate the models using advanced metrics as if you were a data scientist. In this post, we show how a business analyst can evaluate and understand a classification churn model created with SageMaker Canvas using the Advanced metrics tab. This prediction is called inference.
With SageMaker Processing jobs, you can use a simplified, managed experience to run data preprocessing or postprocessing and model evaluation workloads on the SageMaker platform. The train, test, and validation datasets and evaluation report that are generated in this pipeline are sent to an S3 bucket.
Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. Finally, we assess the accuracy of our results and compare this to our ground truth classification.
Summary: This comprehensive guide covers the basics of classification algorithms, key techniques like Logistic Regression and SVM, and advanced topics such as handling imbalanced datasets. It also includes practical implementation steps and discusses the future of classification in Machine Learning. What is Classification?
In this blog, we will discuss: What is Text Splitting, and what is its importance in Vector Embedding? VECTOR_COSINE_SIMILARITY – evaluates the cosine of the angle between vectors, focusing on how closely aligned they are in the direction. Iterate on splitting strategy based on performance metrics.
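As an illustration of the cosine measure described above (the vectors here are made-up stand-ins for real embeddings):

```python
# Illustrative sketch: cosine similarity between two embedding vectors,
# i.e. the cosine of the angle between them (1.0 means same direction).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.2, 0.7, 0.1])    # hypothetical query embedding
chunk_vec = np.array([0.25, 0.6, 0.2])   # hypothetical text-chunk embedding
print(cosine_similarity(query_vec, chunk_vec))
```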
You can then iterate on preprocessing, training, and evaluation scripts, as well as configuration choices. The model object is later used in a SageMaker batch transform job for evaluating model performance on a test set. In the context of model training, this is used to calculate the performance metric of a trained model on test data.
Summary: The confusion matrix in Machine Learning is a powerful tool for evaluating classification models. Derived metrics like accuracy and precision guide model improvement, making the confusion matrix essential in Machine Learning. What is a Confusion Matrix in Machine Learning?
In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC. Next, we present the key metrics used for evaluating the model performance along with the evaluation of our final models. The goal of the model is to do a binary classification between the ROI and background images.
Summary: The ROC Curve and AUC are essential for evaluating binary classifiers in Machine Learning. Both metrics help assess model effectiveness, especially in imbalanced datasets. Introduction: Evaluating Machine Learning models is crucial to ensure their effectiveness and reliability.
Guide to evaluation metrics for classification in machine learning. In machine learning, data scientists use evaluation metrics to assess the model's performance in terms of the ability of the various machine learning models to classify the data points into their respective classes accurately.
Although you can learn more about the comprehensive evaluation results in the paper , the following sample captured from the BloombergGPT paper can give you a glimpse of the benefit of training LLMs using financial domain-specific data. FiQA SA – An aspect-based sentiment classification task based on financial news and headlines.
You can use XGBoost for regression, classification (binary and multiclass), and ranking problems. Benchmarks: We benchmarked evaluation metrics to ensure that the model quality didn't deteriorate with the multi-GPU training path compared to single-GPU training. You can use GPUs to accelerate training on large datasets.
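As a hedged, minimal sketch of the kind of classification task mentioned here (the synthetic data and hyperparameters are illustrative, and this runs on CPU rather than multi-GPU):

```python
# Minimal sketch: binary classification with XGBoost's scikit-learn API.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# Evaluate on the hold-out test data.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```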
Evaluating Siamese Network Accuracy (F1-Score, Precision, and Recall) with Keras and TensorFlow: this post builds on the face recognition application with Siamese networks, covering model evaluation in face recognition, Siamese networks in facial recognition systems, and Siamese networks for face verification.
To demonstrate the effect of scaling out training on model convergence, we run two simple experiments: Train an image classification model using a fully connected DNN with ReLU activation functions using the MXNet and Gluon frameworks. Train a binary classification model using the SageMaker built-in XGBoost algorithm.
For example, image classification, image search engines (also known as content-based image retrieval, or CBIR), simultaneous localization and mapping (SLAM), and image segmentation, to name a few, have all been changed since the latest resurgence in neural networks and deep learning.
These are my major steps in this tutorial: Set up Db2 tables, explore the ML dataset, preprocess the dataset, train a decision tree model, generate predictions using the model, and evaluate the model. I implemented these steps in a Db2 Warehouse on-prem database. I use these counts to compute a few evaluation metrics for the model.
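Outside the database, the same train/predict/evaluate steps look roughly like the sketch below with scikit-learn; this is an illustrative stand-in (the post itself uses Db2's in-database ML, not scikit-learn), and the dataset is a placeholder for the Db2 table data.

```python
# Illustrative sketch of the train / predict / evaluate steps with scikit-learn
# (the tutorial itself runs these steps inside Db2 Warehouse).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the Db2 table data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

preds = tree.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
print(confusion_matrix(y_test, preds))  # counts used to derive further metrics
```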
Now, quality engineers and others on the shop floor can build and evaluate these models using no-code ML services, which can accelerate exploration and adoption of these models more broadly in manufacturing operations. This level of accuracy is encouraging, so we can continue the evaluation.
The result of these events can be evaluated afterwards so that they make better decisions in the future. The retrained model might not give a more accurate forecasting result than the existing one, so we can’t simply replace the model with the new one without any evaluation. However, this approach is reactive.
The Amazon ML Solutions Lab and L&W team embarked on an end-to-end journey from formulating the ML problem and defining the evaluation metrics, to delivering a high-quality solution. Model performance results: In this section, we present the model performance evaluation metrics and results.
The second is a model training interface managed through SageMaker, which allows us to train, tune, and evaluate our model before it is deployed to a production endpoint. We formulate the ML problem as a binary classification task with a goal of predicting equipment faults in the next 60 days. Figure 5.2: True negative case.
Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. We train an XGBoost model for a classification task on a credit card fraud dataset. We demonstrate how to set up Inference Recommender jobs for a credit card fraud detection use case.
Classification is one of the most widely applied areas in Machine Learning. As Data Scientists, we all have worked on an ML classification model. Do you remember the maximum number of classes in any classification problem you have solved? Maybe 100 or 200? The product catalogue might have close to a million unique products.
Today, I'll walk you through how to implement an end-to-end image classification project with the Lightning, Comet ML, and Gradio libraries. After that, we'll track hyperparameters, monitor metrics, and save the model with Comet ML. Please keep in mind that you can find the notebook we're going to use in this blog here.
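For context, a bare-bones PyTorch Lightning classifier skeleton looks roughly like the sketch below; this is an assumption-laden illustration (the MLP architecture, learning rate, and `my_dataloader` are hypothetical), and the actual post layers Comet ML logging and a Gradio demo on top of such a module.

```python
# Minimal PyTorch Lightning image classifier skeleton (illustrative only).
import torch
from torch import nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # A tiny MLP for 28x28 grayscale images (a stand-in architecture).
        self.model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                                   nn.ReLU(), nn.Linear(128, num_classes))
        self.loss_fn = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self.model(x), y)
        self.log("train_loss", loss)  # metrics logged here flow to the attached logger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(LitClassifier(), train_dataloaders=my_dataloader)  # my_dataloader: hypothetical
```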
You will learn how to load and preprocess the dataset, use the transformers Trainer class to train models, and evaluate your trained segmentation model with the Evaluate library. Project Structure: For this tutorial, we will use a Colab Notebook. To learn more about the different segmentation subtasks, check out a previous blog post.
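The Trainer pattern referred to above looks roughly like this minimal sketch; note it uses a small text-classification setup for brevity (the post itself applies the same Trainer workflow to image segmentation), and the model name and toy dataset are illustrative assumptions.

```python
# Minimal, self-contained Hugging Face Trainer sketch (text classification used
# for brevity; the post applies the same pattern to segmentation models).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A toy two-example dataset, tokenized with fixed-length padding.
raw = Dataset.from_dict({"text": ["great product", "terrible service"], "label": [1, 0]})
ds = raw.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                  padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
)
trainer.train()
```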
But you must be aware that save is a single action and gives only a model binary file, so you still need code to make your ML application production-ready. To begin with, let's create a simple classification model using the most famous Iris dataset. Usually, all ML and DL models provide some kind of method (e.g.
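As an illustrative sketch of that point (the model choice and file name are assumptions), here is a simple Iris classifier plus the kind of single "save" call the text refers to, using joblib:

```python
# Illustrative: a simple Iris classifier and a single-action "save" with joblib.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

joblib.dump(model, "iris_model.joblib")    # the save itself: just a binary file on disk
loaded = joblib.load("iris_model.joblib")  # serving, monitoring, etc. still need extra code
print(loaded.predict(X[:3]))
```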
This is a guest blog post co-written with Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers. By training on large amounts of unlabeled image data, self-supervised models learn image representations that can be transferred to downstream tasks, such as image classification or segmentation.
The first pipeline includes the steps needed to prepare data, train the model, and evaluate the performance of the model. If the model performs acceptably according to the evaluation criteria, the pipeline continues with a step to baseline the data using a built-in SageMaker Pipelines step. Evaluate the model. Train the model.