Accelerate data preparation for ML in Amazon SageMaker Canvas

AWS Machine Learning Blog

Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler.

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then deploy them directly into a production-ready hosted environment. In this walkthrough, the training data is added to Snowflake as a new table.
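
As a rough illustration of that staging step, the sketch below loads a pandas DataFrame into Snowflake as a new table using the snowflake-connector-python package; the file name, connection parameters, and table name are placeholders, not values from the post.

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Hypothetical local dataset to stage in Snowflake.
df = pd.read_csv("training_data.csv")

# Connection parameters are placeholders; substitute your own account details.
conn = snowflake.connector.connect(
    user="ML_USER",
    password="********",
    account="my_account",
    warehouse="ML_WH",
    database="ML_DB",
    schema="PUBLIC",
)

# Create the table from the DataFrame and load the rows in one call
# (auto_create_table requires a recent snowflake-connector-python release).
write_pandas(conn, df, table_name="TRAINING_DATA", auto_create_table=True)
conn.close()
```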

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

Amazon Redshift is the most popular cloud data warehouse, used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or more feature groups.
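
A minimal sketch of that step with the SageMaker Python SDK is shown below; the feature group name, query, and S3 output location are illustrative placeholders rather than values from the post.

```python
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Hypothetical feature group assumed to already exist (e.g., ingested from Redshift).
customers_fg = FeatureGroup(name="customers-feature-group", sagemaker_session=session)

# Query the offline store (backed by Athena) to assemble a training dataset.
query = customers_fg.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://my-bucket/feature-store-queries/",  # placeholder bucket
)
query.wait()
train_df = query.as_dataframe()  # features ready for model training
```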

Cohere Embed multimodal embeddings model is now available on Amazon SageMaker JumpStart

AWS Machine Learning Blog

You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch. Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that revolutionizes the entire ML workflow.

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.
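
For context on what the model customization step looks like programmatically, here is a hedged sketch using the boto3 Bedrock client; the job name, role ARN, S3 URIs, and hyperparameter values are illustrative placeholders, not the settings recommended in the post.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

# Placeholder identifiers; substitute your own resources.
response = bedrock.create_model_customization_job(
    jobName="haiku-finetune-demo",
    customModelName="claude-3-haiku-custom",
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuningRole",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={  # example values only; tune per the post's guidance
        "epochCount": "2",
        "batchSize": "8",
        "learningRateMultiplier": "1.0",
    },
)
print(response["jobArn"])
```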

Best practices for Meta Llama 3.2 multimodal fine-tuning on Amazon Bedrock

AWS Machine Learning Blog

Best practices for data preparation: the quality and structure of your training data fundamentally determine the success of fine-tuning. Our experiments revealed several critical insights for preparing effective multimodal datasets. On data structure, you should use a single image per example rather than multiple images.
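
As a loose illustration of the single-image-per-example guidance, the sketch below writes a JSONL training file in a Bedrock-style conversation layout; the schema fields and S3 URI are assumptions to be checked against the Bedrock fine-tuning documentation, not a verbatim spec from the post.

```python
import json

# One training record per line, each with exactly one image (assumed schema).
examples = [
    {
        "schemaVersion": "bedrock-conversation-2024",  # assumed schema identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": "Describe the product shown in this image."},
                    {"image": {"format": "jpeg",
                               "source": {"s3Location": {"uri": "s3://my-bucket/images/item-001.jpg"}}}},
                ],
            },
            {"role": "assistant",
             "content": [{"text": "A stainless-steel water bottle with a flip lid."}]},
        ],
    },
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```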

Amazon Bedrock Model Distillation: Boost function calling accuracy while reducing cost and latency

AWS Machine Learning Blog

Preparing your data: effective data preparation is crucial for successful distillation of agent function calling capabilities. Amazon Bedrock provides two primary methods for preparing your training data: uploading JSONL files to Amazon S3 or using historical invocation logs.
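
A minimal sketch of the first method (JSONL files uploaded to S3) follows, assuming a simple prompt-only record layout and placeholder bucket names; the exact record schema should be taken from the Bedrock Model Distillation documentation.

```python
import json
import boto3

# Assumed prompt-only records in a Bedrock-style conversation format.
prompts = [
    {"schemaVersion": "bedrock-conversation-2024",
     "messages": [{"role": "user",
                   "content": [{"text": "What is the weather in Seattle today?"}]}]},
]

with open("distillation_input.jsonl", "w") as f:
    for record in prompts:
        f.write(json.dumps(record) + "\n")

# Upload the training file to S3 for the distillation job (placeholder bucket/key).
boto3.client("s3").upload_file(
    "distillation_input.jsonl", "my-bucket", "distillation/input/distillation_input.jsonl"
)
```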
