Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers
AWS Machine Learning Blog
APRIL 8, 2024
In January 2024, Amazon SageMaker launched a new version (0.26.0) of its Large Model Inference (LMI) containers. LMI lets you apply tensor parallelism; the latest efficient attention, batching, quantization, and memory management techniques; token streaming; and much more, while requiring only the model ID and optional model parameters.
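To make the "model ID plus optional parameters" idea concrete, here is a minimal sketch of how such a configuration might be assembled as container environment variables. The helper `build_lmi_env` is hypothetical and not part of any AWS SDK. The variable names (`HF_MODEL_ID` and the `OPTION_` prefix) and the example values are assumptions based on the LMI container documentation, so check them against the container version you deploy.

```python
from typing import Dict, Optional


def build_lmi_env(
    model_id: str, options: Optional[Dict[str, object]] = None
) -> Dict[str, str]:
    """Hypothetical helper: map a model ID and optional settings to env vars."""
    if not model_id:
        raise ValueError("model_id is required")

    # The model ID is the only required setting.
    env = {"HF_MODEL_ID": model_id}

    # Assumption: optional settings are exposed as OPTION_<NAME> variables.
    for key, value in (options or {}).items():
        env["OPTION_" + key.upper()] = str(value)
    return env


# Example values are illustrative, not recommendations.
env = build_lmi_env(
    "mistralai/Mixtral-8x7B-v0.1",
    {"tensor_parallel_degree": 8, "rolling_batch": "auto"},
)
print(env)
```

A dictionary like this would typically be passed as the container environment when creating the SageMaker model; the optional entries can be left out entirely to rely on the container defaults.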