article thumbnail

Integrate HyperPod clusters with Active Directory for seamless multi-user login

AWS Machine Learning Blog

Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB.

article thumbnail

Revolutionizing large language model training with Arcee and AWS Trainium

AWS Machine Learning Blog

Close collaboration with AWS Trainium has also played a major role in making the Arcee platform extremely performant, not only accelerating model training but also reducing overall costs and enforcing compliance and data integrity in the secure AWS environment. Our cluster consisted of 16 nodes, each equipped with a trn1n.32xlarge

AWS 104
professionals

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

Flipboard

Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks.

AWS 94
article thumbnail

Boost your forecast accuracy with time series clustering

AWS Machine Learning Blog

AWS provides various services catered to time series data that are low code/no code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. We use the Time Series Clustering using TSFresh + KMeans notebook, which is available on our GitHub repo.

article thumbnail

Scaling distributed training with AWS Trainium and Amazon EKS

AWS Machine Learning Blog

In late 2022, AWS announced the general availability of Amazon EC2 Trn1 instances powered by AWS Trainium —a purpose-built machine learning (ML) accelerator optimized to provide a high-performance, cost-effective, and massively scalable platform for training deep learning models in the cloud. 32xlarge instances.

AWS 91
article thumbnail

Deploy generative AI models from Amazon SageMaker JumpStart using the AWS CDK

AWS Machine Learning Blog

The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of virtually infinite compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are rapidly adopting and using ML technologies to transform their businesses.

AWS 93
article thumbnail

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

AWS Machine Learning Blog

In this post, we’ll summarize training procedure of GPT NeoX on AWS Trainium , a purpose-built machine learning (ML) accelerator optimized for deep learning training. M tokens/$) trained such models with AWS Trainium without losing any model quality. We’ll outline how we cost-effectively (3.2 billion in Pythia.

AWS 100