
Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers

AWS Machine Learning Blog

In January 2024, Amazon SageMaker launched a new version (0.26.0) of its Large Model Inference (LMI) Deep Learning Containers (DLCs). LMI lets you apply tensor parallelism; the latest efficient attention, batching, quantization, and memory management techniques; token streaming; and much more, requiring only the model ID and optional model parameters.
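As a rough illustration of the "model ID plus optional parameters" workflow, here is a minimal sketch that renders a `serving.properties` file of the kind LMI containers read at startup. The helper `render_serving_properties` is hypothetical. The option keys (`option.model_id`, `option.tensor_parallel_degree`, `option.rolling_batch`, `option.max_rolling_batch_size`) and the `engine=MPI` default are assumptions based on common LMI configuration conventions. Check them against the LMI documentation for version 0.26.0 before relying on them.

```python
from typing import Mapping, Optional


def render_serving_properties(
    model_id: str,
    options: Optional[Mapping[str, object]] = None,
    engine: str = "MPI",  # assumed default engine for LMI backends; verify in the docs
) -> str:
    """Build serving.properties text from a model ID plus optional LMI options.

    Hypothetical helper: each extra option is emitted as `option.<key>=<value>`,
    and booleans are lower-cased ("true"/"false") as properties files expect.
    """
    lines = [f"engine={engine}", f"option.model_id={model_id}"]
    for key, value in (options or {}).items():
        # Check bool first so True renders as "true" rather than "True".
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"option.{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: shard Mixtral across 8 GPUs with continuous (rolling) batching.
    print(
        render_serving_properties(
            "mistralai/Mixtral-8x7B-v0.1",
            {
                "tensor_parallel_degree": 8,
                "rolling_batch": "lmi-dist",
                "max_rolling_batch_size": 32,
            },
        ),
        end="",
    )
```

Only the model ID is mandatory here; every other knob (tensor parallelism, batching backend, batch size) is optional, which mirrors the low-code interface the post describes.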
