Scale your machine learning workloads on Amazon ECS powered by AWS Trainium instances

AWS Machine Learning Blog

Running machine learning (ML) workloads with containers is becoming a common practice. Containers can fully encapsulate not just your training code, but the entire dependency stack down to the hardware libraries and drivers. With containers, scaling on a cluster becomes much easier. Run the ML task on Amazon ECS.

Enable pod-based GPU metrics in Amazon CloudWatch

AWS Machine Learning Blog

In February 2022, Amazon Web Services added support for NVIDIA GPU metrics in Amazon CloudWatch, making it possible to push metrics from the Amazon CloudWatch Agent to Amazon CloudWatch and monitor your code for optimal GPU utilization. In order to clone the container project from GitHub, you will need git.