
Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium

AWS Machine Learning Blog

In this post, we walk through how to fine-tune Llama 2 on AWS Trainium, a purpose-built accelerator for LLM training, to reduce training times and costs. We review the fine-tuning scripts provided by the AWS Neuron SDK (using NeMo Megatron-LM), the various configurations we used, and the throughput results we saw.
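
The post itself walks through the Neuron SDK's NeMo Megatron fine-tuning scripts rather than a hand-written training loop. Purely as a rough sketch of how plain PyTorch code targets a NeuronCore on Trainium through torch-neuronx and torch_xla, assuming a toy model, dummy data, and made-up hyperparameters (none of this is the post's actual setup):

```python
# Minimal sketch: a PyTorch training step on an XLA device. On a trn1 instance
# with torch-neuronx installed, xm.xla_device() resolves to a NeuronCore; the
# tiny linear model and random batches below are illustrative stand-ins.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(512, 2).to(device)          # stand-in for a real LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(10):
    inputs = torch.randn(8, 512).to(device)         # dummy batch
    labels = torch.randint(0, 2, (8,)).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    # Reduces gradients, applies the update, and flushes the XLA graph.
    xm.optimizer_step(optimizer, barrier=True)
    print(step, loss.item())
```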


Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

Flipboard

For reference, GPT-3, an earlier-generation LLM, has 175 billion parameters and requires months of non-stop training on a cluster of thousands of accelerated processors. The Carbontracker study estimates that training GPT-3 from scratch may emit up to 85 metric tons of CO2 equivalent, using clusters of specialized hardware accelerators.


Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpStart

AWS Machine Learning Blog

One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with language in a numeric form.
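
As a minimal sketch of one way such drift could be surfaced, assuming a centroid-based cosine metric and an arbitrary 0.1 threshold (both are illustrative choices, not the method from the post):

```python
# Minimal sketch: flag embedding drift by comparing the centroid of a baseline
# embedding sample against the centroid of recent embeddings. The metric and
# threshold are illustrative assumptions.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_score(baseline: np.ndarray, recent: np.ndarray) -> float:
    """baseline, recent: (n_docs, dim) arrays of document embeddings."""
    return cosine_distance(baseline.mean(axis=0), recent.mean(axis=0))

baseline = np.random.rand(1000, 768)   # embeddings captured at deployment time
recent = np.random.rand(200, 768)      # embeddings computed from live traffic
if drift_score(baseline, recent) > 0.1:
    print("Embedding drift detected - consider re-indexing or retraining")
```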


What Is Retrieval-Augmented Generation?

Hacker News

To understand the latest advance in generative AI, imagine a courtroom. Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. So, what is retrieval-augmented generation?
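
A bare-bones sketch of the retrieval step that gives the technique its name, assuming a placeholder embed() function and a toy in-memory corpus (both are stand-ins, not part of the article):

```python
# Minimal sketch of RAG retrieval: embed the query, rank documents by cosine
# similarity, and prepend the top matches to the prompt the LLM will see.
# embed() is a placeholder for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))   # deterministic stub
    return rng.random(384)

corpus = ["Doc about court procedure.", "Doc about LLMs.", "Doc about AWS."]
doc_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query: str, k: int = 2) -> list:
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

question = "What is retrieval-augmented generation?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + f"\n\nQuestion: {question}"
print(prompt)   # the augmented prompt is what actually goes to the LLM
```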


Training Sessions Coming to ODSC APAC 2023

ODSC - Open Data Science

Advancements in data science and AI are coming at a lightning-fast pace. Full-Stack Machine Learning for Data Scientists: Hugo Bowne-Anderson, PhD, Head of Data Science Evangelism and Marketing at Outerbounds. This session will address how to make the life cycle of a machine learning project a repeatable process.


Develop and train large models cost-efficiently with Metaflow and AWS Trainium

AWS Machine Learning Blog

To build a production-grade AI system today (for example, to do multilingual sentiment analysis of customer support conversations), what are the primary technical challenges? Historically, natural language processing (NLP) would be a primary research and development expense.
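
As a minimal sketch of how such a workflow might be laid out with Metaflow, assuming hypothetical step names, a placeholder dataset path, and a dummy metric (the post covers the real Metaflow-plus-Trainium integration):

```python
# Minimal sketch: structuring a sentiment-model training workflow as a Metaflow
# flow. The dataset path and the "training" done here are placeholders; in the
# post's setup the heavy training step runs on Trainium-backed compute.
from metaflow import FlowSpec, step

class SentimentTrainFlow(FlowSpec):

    @step
    def start(self):
        self.dataset_path = "s3://example-bucket/conversations/"  # placeholder
        self.next(self.train)

    @step
    def train(self):
        # A real implementation would fine-tune a multilingual model here.
        self.eval_accuracy = 0.0   # dummy metric standing in for real results
        self.next(self.end)

    @step
    def end(self):
        print(f"Finished; eval accuracy: {self.eval_accuracy}")

if __name__ == "__main__":
    SentimentTrainFlow()
```

Running the file with `python flow_file.py run` (any filename works) executes the steps locally; Metaflow's decorators can then push individual steps onto remote, accelerator-backed compute.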


Top NLP Skills, Frameworks, Platforms, and Languages for 2023

ODSC - Open Data Science

Natural language processing (NLP) has been growing in awareness over the last few years, and with the popularity of ChatGPT and GPT-3 in 2022, NLP is now at the top of people's minds when it comes to AI. Companies are finding NLP to be one of the best applications of AI, regardless of industry.