AI, AWS and Clustering - Data Science Current

AWS at NVIDIA GTC 2024: Accelerate innovation with generative AI on AWS

AWS Machine Learning Blog

APRIL 11, 2024

AWS was delighted to present to and connect with over 18,000 in-person and 267,000 virtual attendees at NVIDIA GTC, a global artificial intelligence (AI) conference that took place in March 2024 in San Jose, California, returning to a hybrid, in-person experience for the first time since 2019.

AWS

AWS AI AI Clustering

Integrate HyperPod clusters with Active Directory for seamless multi-user login

AWS Machine Learning Blog

APRIL 22, 2024

Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB.

Clustering

Clustering AWS ML ML

Revolutionizing large language model training with Arcee and AWS Trainium

AWS Machine Learning Blog

APRIL 29, 2024

Close collaboration with AWS Trainium has also played a major role in making the Arcee platform extremely performant, not only accelerating model training but also reducing overall costs and enforcing compliance and data integrity in the secure AWS environment. Our cluster consisted of 16 nodes, each equipped with a trn1n.32xlarge

AWS

AWS Clustering ML ML

Deploy generative AI models from Amazon SageMaker JumpStart using the AWS CDK

AWS Machine Learning Blog

MAY 23, 2023

Just recently, generative AI applications have captured everyone’s attention and imagination. We are truly at an exciting inflection point in the widespread adoption of ML, and we believe every customer experience and application will be reinvented with generative AI.

AWS

AWS AI AI ML

Boost your forecast accuracy with time series clustering

AWS Machine Learning Blog

APRIL 4, 2023

AWS provides various low-code/no-code services for time series data, which both machine learning (ML) and non-ML practitioners can use to build ML solutions. In this post, we seek to separate a time series dataset into individual clusters that exhibit a higher degree of similarity between their data points and reduce noise.

Clustering

Clustering ML ML AWS
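
To make the idea in the time series clustering teaser above concrete, here is a minimal sketch of grouping series by shape. It is an illustration only, not the method used in the linked post: the z-normalization step, the plain k-means loop, the farthest-point initialization, and the names `z_normalize`, `kmeans_series`, and `demand` are all assumptions chosen for this example.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance so shape, not scale, drives similarity."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_series(series_list, k, iterations=20):
    """Plain k-means over equal-length, z-normalized series; returns one label per series."""
    data = [z_normalize(s) for s in series_list]
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [data[0]]
    while len(centroids) < k:
        farthest = max(data, key=lambda s: min(euclidean(s, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assign each series to its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(s, centroids[j])) for s in data]
        # Recompute each centroid as the pointwise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two rising and two falling series at different scales.
demand = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [5, 4, 3, 2, 1],
    [50, 40, 30, 20, 10],
]
labels = kmeans_series(demand, k=2)
```

In this toy data the rising series share one label and the falling series share the other, regardless of scale. A forecasting model could then be trained per cluster; real workloads often use a shape-aware distance such as dynamic time warping instead of the Euclidean distance assumed here.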

Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium

AWS Machine Learning Blog

OCTOBER 5, 2023

In this post, we walk through how to fine-tune Llama 2 on AWS Trainium, a purpose-built accelerator for LLM training, to reduce training times and costs. We review the fine-tuning scripts provided by the AWS Neuron SDK (using NeMo Megatron-LM), the various configurations we used, and the throughput results we saw.

AWS

AWS Machine Learning Machine Learning Deep Learning

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

AWS Machine Learning Blog

DECEMBER 12, 2023

In this post, we’ll summarize the training procedure of GPT NeoX on AWS Trainium, a purpose-built machine learning (ML) accelerator optimized for deep learning training. We’ll outline how we cost-effectively (3.2 M tokens/$) trained such models with AWS Trainium without losing any model quality.

AWS

AWS Deep Learning Deep Learning Machine Learning

Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

Flipboard

JUNE 20, 2023

For reference, GPT-3, an earlier-generation LLM, has 175 billion parameters and requires months of non-stop training on a cluster of thousands of accelerated processors. The Carbontracker study estimates that training GPT-3 from scratch may emit up to 85 metric tons of CO2 equivalent, using clusters of specialized hardware accelerators.

AWS

AWS Machine Learning Machine Learning Deep Learning

Exploring architectural choices: Options for running IBM TRIRIGA Application Suite on AWS with Red Hat OpenShift

IBM Journey to AI blog

APRIL 3, 2024

Data and AI are increasingly critical tools in how organizations are evolving their facilities management. Real-time insights infused with AI support dynamic space planning. In this blog post, we walk through the recommended options for running IBM TAS on Amazon Web Services (AWS).

AWS

AWS Clustering AI AI

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

AWS (Amazon Web Services), the comprehensive and evolving cloud computing platform provided by Amazon, comprises infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS). With its wide array of tools and convenience, AWS has already become a popular choice for many SaaS companies.

AWS

AWS Cloud Computing Data Lakes Database

Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpStart

AWS Machine Learning Blog

FEBRUARY 2, 2024

One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). Because embeddings are an important source of data for NLP models in general and generative AI solutions in particular, we need a way to measure whether our embeddings are changing over time (drifting).

AWS

AWS Clustering ETL Database
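
One simple way to picture the embedding drift described in the teaser above is to compare a baseline batch of embeddings with a newer batch. The sketch below is an assumption made for illustration and is not taken from the linked post: it measures the cosine distance between the two batch centroids, and the names `centroid`, `cosine_distance`, `centroid_drift`, and the toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Pointwise mean of a batch of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means same direction, larger means more different."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid_drift(baseline, current):
    """Drift score: cosine distance between the centroids of two embedding batches."""
    return cosine_distance(centroid(baseline), centroid(current))


# Hypothetical 2-dimensional embeddings; real embeddings have hundreds of dimensions.
baseline = [[1.0, 0.0], [0.9, 0.1]]
unchanged = [[1.0, 0.0], [0.9, 0.1]]
shifted = [[0.0, 1.0], [0.1, 0.9]]

no_drift = centroid_drift(baseline, unchanged)
large_drift = centroid_drift(baseline, shifted)
```

A score near zero suggests the new batch points in the same overall direction as the baseline, while a larger score flags a shift worth investigating. A single centroid hides changes in spread or in individual clusters, so this is a coarse first signal rather than a complete monitor.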

How Amazon Search M5 saved 30% for LLM training cost by using AWS Trainium

AWS Machine Learning Blog

NOVEMBER 22, 2023

When AWS launched purpose-built accelerators with the first release of AWS Inferentia in 2020, the M5 team quickly began to utilize them to more efficiently deploy production workloads, reducing both cost and latency. We use AWS Batch automated retries to retry jobs that encounter a transient failure during model training.

AWS

AWS ML ML Deep Learning

Implement smart document search index with Amazon Textract and Amazon OpenSearch

AWS Machine Learning Blog

SEPTEMBER 8, 2023

We’ll cover how technologies such as Amazon Textract, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon OpenSearch Service can be integrated into a workflow that seamlessly processes documents. The main concepts used are the AWS Cloud Development Kit (CDK) constructs, the actual CDK stacks and AWS Step Functions.

AWS

AWS Clustering ML ML

How BigBasket improved AI-enabled checkout at their physical stores using Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 13, 2024

Self-checkout process: BigBasket introduced an AI-powered checkout system in their physical stores that uses cameras to distinguish items uniquely. The BigBasket team was running open source, in-house ML algorithms for computer vision object recognition to power AI-enabled checkout at their Fresho (physical) stores.

AWS

AWS AI AI ML

EclipseStore enables high performance and saves 96% data storage costs with WebSphere Liberty InstantOn

IBM Journey to AI blog

MARCH 27, 2024

As AI technology advances, the need for high-performance, cost-effective and easily deployable solutions has reached unprecedented levels. Exciting new innovations such as advanced robotics, real-world gaming, neuronal interface technology and AI require three fundamental elements: High-performance solutions.

Clustering

Clustering Database SQL AWS

Philips accelerates development of AI-enabled healthcare solutions with an MLOps platform built on Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 16, 2023

This is a joint blog with AWS and Philips. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care.

AWS

AWS ML ML AI

Host the Spark UI on Amazon SageMaker Studio

AWS Machine Learning Blog

AUGUST 8, 2023

You can run Spark applications interactively from Amazon SageMaker Studio by connecting SageMaker Studio notebooks and AWS Glue Interactive Sessions to run Spark jobs with a serverless cluster. With interactive sessions, you can choose Apache Spark or Ray to easily process large datasets, without worrying about cluster management.

AWS

AWS Clustering Machine Learning Machine Learning

Bundesliga Match Fact Ball Recovery Time: Quantifying teams’ success in pressing opponents on AWS

AWS Machine Learning Blog

MARCH 30, 2023

Metadata of the match is processed within the AWS Lambda function MetaDataIngestion, while positional data is ingested using the AWS Fargate container called MatchLink. Additionally, the ball recovery times are sent to a specific topic in the MSK cluster, where they can be accessed by other Bundesliga Match Facts.

AWS

AWS Machine Learning Machine Learning Apache Kafka

Amazon SageMaker model parallel library now accelerates PyTorch FSDP workloads by up to 20%

AWS Machine Learning Blog

DECEMBER 22, 2023

As a result, machine learning practitioners must spend weeks of preparation to scale their LLM workloads to large clusters of GPUs. Integrating tensor parallelism to enable training on massive clusters: This release of SMP also expands PyTorch FSDP’s capabilities to include tensor parallelism techniques.

Clustering

Clustering AWS Deep Learning Deep Learning

Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock

AWS Machine Learning Blog

APRIL 18, 2024

It’s straightforward to deploy in your AWS account. Prerequisites: You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. Everything you need is provided as open source in our GitHub repo.

AWS

AWS Analytics Analytics AI

Develop and train large models cost-efficiently with Metaflow and AWS Trainium

AWS Machine Learning Blog

APRIL 29, 2024

To build a production-grade AI system today (for example, to do multilingual sentiment analysis of customer support conversations), what are the primary technical challenges? For AWS and Outerbounds customers, the goal is to build a differentiated machine learning and artificial intelligence (ML/AI) system and reliably improve it over time.

AWS

AWS ML ML Python

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. Pass the results of the SageMaker endpoint to Amazon Augmented AI (Amazon A2I).

AWS

AWS Data Pipeline ML ML

Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

AWS Machine Learning Blog

JULY 6, 2023

A number of AWS independent software vendor (ISV) partners have already built integrations for users of their software as a service (SaaS) platforms to utilize SageMaker and its various features, including training, deployment, and the model registry. In some cases, an ISV may deploy their software in the customer AWS account.

ML

ML ML AWS Data Scientist

How to achieve Kubernetes observability: Principles and best practices

IBM Journey to AI blog

FEBRUARY 15, 2024

Autoscaling: When traffic spikes, Kubernetes can automatically spin up new clusters to handle the additional workload. However, unlike VMs, Kubernetes orchestrates container interactions that transcend apps and clusters. This includes data in CI/CD pipelines (which feed into K8s clusters) and GitOps workflows (which power K8s clusters).

Clustering

Clustering Azure Data Visualization AWS

Build a powerful question answering bot with Amazon SageMaker, Amazon OpenSearch Service, Streamlit, and LangChain

AWS Machine Learning Blog

MAY 25, 2023

One of the most common applications of generative AI and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Amazon Lex provides the framework for building AI based chatbots. AWS Identity and Access Management roles and policies for access management.

AWS

AWS Clustering Python ML

How we matured our ML-on-Kubernetes capabilities and saved on cloud costs

Snorkel AI

SEPTEMBER 5, 2023

If you’re new to Kubernetes, read our Introduction to Kubernetes post to learn more about the basics, and our Machine learning on Kubernetes: wisdom learned at Snorkel AI post to learn more about our journey with Kubernetes thus far. the orchestrator for our Ray cluster). Let’s dive in!

ML

ML ML Clustering Machine Learning

Get Started with Serving Watson NLP Models

IBM Data Science in Practice

DECEMBER 7, 2022

The same image can also be deployed on a cloud container service like AWS ECS or IBM Code Engine; or on a Kubernetes or OpenShift cluster. cp/ai/watson-nlp-runtime:1.0.18" ARG SENTIMENT_MODEL="cp.icr.io/cp/ai/watson-nlp_sentiment_aggregated-cnn-workflow_lang_en_stock:1.0.6" ARG WATSON_RUNTIME_BASE="cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18"

Clustering

Clustering AI AI AWS

Bundesliga Match Facts Shot Speed – Who fires the hardest shots in the Bundesliga?

AWS Machine Learning Blog

NOVEMBER 3, 2023

To process match metadata, we use an AWS Lambda function called MetaDataIngestion, while positional data is brought in using an AWS Fargate container known as MatchLink. Simultaneously, the shot speed data finds its way to a designated topic within our MSK cluster. Luc Eluère is a Data Scientist within Sportec Solutions AG.

AWS

AWS Apache Kafka Data Scientist Algorithm

How Vericast optimized feature engineering using Amazon SageMaker Processing

AWS Machine Learning Blog

MAY 3, 2023

AWS customer Vericast is a marketing solutions company that makes data-driven decisions to boost marketing ROIs for its clients. Dynamic scaling of feature engineering jobs – A combination of various AWS services is used for this, but most notably SageMaker Processing.

AWS

AWS Machine Learning Machine Learning ML

How we increased our ML on Kubernetes capabilities and saved cloud costs

Snorkel AI

SEPTEMBER 5, 2023

If you’re new to Kubernetes, read our Introduction to Kubernetes post to learn more about the basics, and our Machine learning on Kubernetes: wisdom learned at Snorkel AI post to learn more about our journey with Kubernetes thus far. the orchestrator for our Ray cluster). Let’s dive in!

ML

ML ML Clustering Machine Learning

First ODSC Europe 2023 Sessions Announced

ODSC - Open Data Science

MARCH 27, 2023

Botnets Detection at Scale — Lesson Learned from Clustering Billions of Web Attacks into Botnets. Scaling AI/ML Workloads with Ray Kai Fricke | Senior Software Engineer | Anyscale Inc. You will use the same example to explore both approaches utilizing TensorFlow in a Colab notebook.

Machine Learning

Machine Learning Machine Learning ML ML

Build enterprise-ready generative AI solutions with Cohere foundation models in Amazon Bedrock and Weaviate vector database on AWS Marketplace

AWS Machine Learning Blog

JANUARY 24, 2024

Generative AI solutions have the potential to transform businesses by boosting productivity and improving customer experiences, and using large language models (LLMs) with these solutions has become increasingly popular. Despite their wealth of general knowledge, state-of-the-art LLMs only have access to the information they were trained on.

AWS

AWS Database AI AI

Time for a data center refresh? Get ahead of the growing digital landscape with a modern data center strategy

IBM Journey to AI blog

JANUARY 24, 2024

With the seismic shift wrought by generative AI, the pressure is on IT to modernize and optimize to meet the demand. With the help of AI-based insights, Turbonomic can help guide you through the consideration. Cloud service platforms abound promising greater elasticity and savings.

Clustering

Clustering Azure AWS AI

Retain original PDF formatting to view translated documents with Amazon Textract, Amazon Translate, and PDFBox

AWS Machine Learning Blog

JULY 3, 2023

PDF Translate – An open-source library written in Java and published on AWS Samples in GitHub. The source and translated PDF documents can also be found in the AWS Samples GitHub repo. Prerequisites: Before you get started, set up your AWS account and the AWS Command Line Interface (AWS CLI).

AWS

AWS ML ML Clustering

What Is Retrieval-Augmented Generation?

Hacker News

NOVEMBER 15, 2023

To understand the latest advance in generative AI, imagine a courtroom. The court clerk of AI is a process called retrieval-augmented generation, or RAG for short. Retrieval-augmented generation is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

Database

Database AI AI Natural Language Processing

Simple guide to training Llama 2 with AWS Trainium on Amazon SageMaker

AWS Machine Learning Blog

MAY 1, 2024

Large language models (LLMs) are making a significant impact in the realm of artificial intelligence (AI). Llama 2 by Meta is an example of an LLM offered by AWS. To learn more about Llama 2 on AWS, refer to Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart.

AWS

AWS ML ML Clustering

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

AWS Machine Learning Blog

MARCH 15, 2024

Therefore, ML creates challenges for AWS customers who need to ensure privacy and security across distributed entities without compromising patient outcomes. Solution overview: We deploy FedML into multiple EKS clusters integrated with SageMaker for experiment tracking. As always, AWS welcomes your feedback.

AWS

AWS ML ML Machine Learning

Schedule your notebooks from any JupyterLab environment using the Amazon SageMaker JupyterLab extension

AWS Machine Learning Blog

MAY 10, 2023

The SageMaker extension expects the JupyterLab environment to have valid AWS credentials and permissions to schedule notebook jobs. We discuss the steps for setting up credentials and AWS Identity and Access Management (IAM) permissions later in this post. See Installing or updating the latest version of the AWS CLI for instructions.

AWS

AWS Data Scientist ML ML

Introducing Amazon SageMaker HyperPod to train foundation models at scale

AWS Machine Learning Blog

NOVEMBER 30, 2023

Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Customers such as Stability AI use SageMaker HyperPod to train their foundation models, including Stable Diffusion.

Clustering

Clustering AWS Machine Learning Machine Learning

Access private repos using the @remote decorator for Amazon SageMaker training workloads

AWS Machine Learning Blog

JULY 11, 2023

You can set up private package repositories on AWS in multiple ways: using AWS CodeArtifact; using Amazon Simple Storage Service (Amazon S3); or hosting a repository on Amazon Elastic Compute Cloud (Amazon EC2). In this post, we focus on the first option: using CodeArtifact. For more information, refer to Configure the AWS CLI.

AWS

AWS Python ML ML

Remembering the 2023 Data Engineering Summit in Videos

ODSC - Open Data Science

FEBRUARY 21, 2024

Co-located with the leading Data Science and AI Training Conference, ODSC East, this summit will gather the leading minds in Data Engineering in Boston on April 23rd and 24th. NET, and AWS. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event?

Data Engineering

Data Engineering Data Engineering Data Engineering Data Engineer

Scale your machine learning workloads on Amazon ECS powered by AWS Trainium instances

AWS Machine Learning Blog

MAY 31, 2023

With containers, scaling on a cluster becomes much easier. In late 2022, AWS announced the general availability of Amazon EC2 Trn1 instances powered by AWS Trainium accelerators, which are purpose built for high-performance deep learning training. What you get is an ML development environment that is consistent and portable.

AWS

AWS Machine Learning Machine Learning Clustering

How VMware built an MLOps pipeline from scratch using GitLab, Amazon MWAA, and Amazon SageMaker

Flipboard

MARCH 13, 2023

Therefore, VMware Carbon Black and AWS chose to build a custom MLOps pipeline using Amazon SageMaker for its ease of use, versatility, and fully managed infrastructure. In this post, VMware Carbon Black and AWS architects discuss how we built and managed custom ML workflows using GitLab, Amazon MWAA, and SageMaker.

ML

ML ML AWS Data Scientist

Training Sessions Coming to ODSC APAC 2023

ODSC - Open Data Science

AUGUST 15, 2023

Advancements in data science and AI are coming at a lightning-fast pace. Troubleshooting Search and Retrieval with LLMs Xander Song | Machine Learning Engineer and Developer Advocate | Arize AI Some of the major challenges in deploying LLM applications are the accuracy of results and hallucinations. Check out a few of them below.

Machine Learning

Machine Learning Machine Learning Data Science Data Scientist
