Scaling distributed training with AWS Trainium and Amazon EKS

AWS Machine Learning Blog

TorchX has two important dependencies: the Volcano batch scheduler and the etcd server. Volcano handles the scheduling and queuing of training jobs, while the etcd server is a key-value store used by TorchElastic for synchronization and peer discovery during job startup.
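To make the idea of key-value-based peer discovery concrete, here is a toy sketch in Python. It is not the etcd API and not TorchElastic's actual rendezvous implementation: `ToyKeyValueStore`, `join_job`, `run_demo`, the key layout, and the worker addresses are all hypothetical names chosen for illustration, and an in-memory dictionary stands in for the etcd server. Each worker registers itself under a shared job prefix, waits until the expected number of peers has appeared, and then derives its rank from the sorted peer list.

```python
import threading


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical, not the etcd API)."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def put(self, key, value):
        # Store the value and wake up any workers waiting for new peers.
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def wait_for_prefix(self, prefix, count, timeout=5.0):
        # Block until at least `count` keys start with `prefix`, then return them.
        def ready():
            return sum(k.startswith(prefix) for k in self._data) >= count

        with self._cond:
            if not self._cond.wait_for(ready, timeout=timeout):
                raise TimeoutError(f"fewer than {count} peers under {prefix}")
            return {k: v for k, v in self._data.items() if k.startswith(prefix)}


def join_job(store, job_id, address, world_size):
    """Register this worker, wait for all peers, and return (rank, peer list)."""
    prefix = f"/{job_id}/workers/"  # assumed key layout, for illustration only
    store.put(prefix + address, address)
    peers = store.wait_for_prefix(prefix, world_size)
    ordered = sorted(peers.values())
    # Every worker sorts the same set of addresses, so ranks are consistent.
    return ordered.index(address), ordered


def run_demo(world_size=3):
    """Start `world_size` worker threads that discover each other via the store."""
    store = ToyKeyValueStore()
    results = {}

    def worker(i):
        address = f"10.0.0.{i + 1}:29500"  # made-up worker address
        results[address] = join_job(store, "demo-job", address, world_size)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for address, (rank, peers) in sorted(run_demo().items()):
        print(address, "rank", rank, "of", len(peers))
```

In this sketch no worker proceeds until every peer has registered, which is the synchronization role the excerpt attributes to the etcd server during job startup. A real deployment would talk to etcd over the network and would also have to handle timeouts, restarts, and changes in membership.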
