LeaderWorkerSet (LWS) is an API for deploying a group of pods as a unit of replication.
It addresses common deployment patterns for AI/ML inference workloads, especially multi-host inference, where an LLM is sharded and run across multiple devices on multiple nodes.
Use LWS to orchestrate distributed AI/ML inference workloads with out-of-the-box support for rolling updates, topology-aware placement, and all-or-nothing restart for failure handling.
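To make "a group of pods as a unit of replication" and "all-or-nothing restart" concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not the LWS controller or its API: the `Pod`, `Group`, and `deploy` names and the pod naming scheme are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    restarts: int = 0


@dataclass
class Group:
    """One replica: a leader pod plus its workers, managed as a single unit."""
    index: int
    size: int  # total pods in the group, leader included
    pods: list = field(init=False)

    def __post_init__(self):
        # Hypothetical naming scheme, for illustration only.
        self.pods = [Pod(f"group-{self.index}-pod-{i}") for i in range(self.size)]

    def handle_failure(self, failed_pod_name: str) -> None:
        """All-or-nothing restart: one failed pod restarts every pod in the group."""
        if not any(p.name == failed_pod_name for p in self.pods):
            raise ValueError(f"{failed_pod_name} is not in group {self.index}")
        for pod in self.pods:
            pod.restarts += 1


def deploy(replicas: int, size: int) -> list:
    """Replication happens per group, so scaling adds or removes whole groups."""
    return [Group(index=i, size=size) for i in range(replicas)]


if __name__ == "__main__":
    groups = deploy(replicas=2, size=4)
    groups[0].handle_failure("group-0-pod-2")
    for g in groups:
        print(g.index, [(p.name, p.restarts) for p in g.pods])
```

In this sketch, a failure in one pod of group 0 restarts all four pods of that group and leaves group 1 untouched. That is the behavior a sharded model needs, since a single lost shard makes the rest of its group unusable.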
Contributions welcome!
We accept contributions through a pull request workflow on GitHub. New users are always welcome!