
Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

AWS Machine Learning Blog

Next, we evaluate the loss trajectory of model training on AWS Trainium and compare it with the corresponding run on a P4d (NVIDIA A100 GPU) cluster. NeoX 20B is trained on 4 nodes with a small wiki dataset on both GPU and Trainium, using the same training hyperparameters (global batch size = 256).
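One way to make such a comparison concrete is to summarize the per-step gap between the two loss curves. The sketch below is only an illustration under stated assumptions: the function name `compare_loss_trajectories` is hypothetical, both runs are assumed to log one loss value per step at the same steps, and the numbers are made up rather than taken from the runs described here.

```python
from statistics import mean


def compare_loss_trajectories(loss_a, loss_b):
    """Summarize how closely two per-step loss curves track each other.

    Hypothetical helper: assumes both runs logged a loss at the same steps.
    """
    if len(loss_a) != len(loss_b):
        raise ValueError("both runs must log the same number of steps")
    if not loss_a:
        raise ValueError("loss trajectories must not be empty")

    # Absolute gap between the two runs at each logged step.
    abs_diffs = [abs(a - b) for a, b in zip(loss_a, loss_b)]
    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": mean(abs_diffs),
        "final_abs_diff": abs_diffs[-1],
    }


# Made-up illustrative values, not measurements from the article.
gpu_loss = [10.0, 8.0, 6.5, 5.0]
trn_loss = [10.0, 8.5, 6.5, 5.25]
summary = compare_loss_trajectories(gpu_loss, trn_loss)
print(summary)
```

A small maximum and final gap would suggest the two runs follow a similar trajectory; what counts as "small" depends on the model and dataset and is not specified here.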
