DisaggregatedSet

Understanding DisaggregatedSet — purpose, relationship to LeaderWorkerSet, and when to use it.

DisaggregatedSet is a Kubernetes controller and CRD (Custom Resource Definition) that extends LeaderWorkerSet (LWS) to support disaggregated inference workloads: use cases where different phases of inference (e.g., prefill, decode, encode) need to run on separate, independently scaled groups of pods.

This is especially useful for large language model (LLM) inference services where:

The prefill phase (generating the initial KV cache from the input prompt) is compute-bound and benefits from larger pod groups.
The decode phase (token-by-token autoregressive generation) is memory-bandwidth-bound and can run on smaller groups.
The encode phase (optional context encoding) may have different resource requirements from either.
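
The compute-bound versus memory-bandwidth-bound distinction can be illustrated with a rough arithmetic-intensity estimate for a single linear layer. This is a back-of-the-envelope model, not a measurement: the layer width, the 2-byte value size, and the function name are illustrative assumptions, and attention and KV cache traffic are ignored.

```python
def arithmetic_intensity(tokens: int, d_model: int = 4096, bytes_per_value: int = 2) -> float:
    """Approximate FLOPs per byte moved for one d_model x d_model linear layer."""
    # One multiply and one add per weight, per token.
    flops = 2 * tokens * d_model * d_model
    # The weights are read once regardless of how many tokens are processed.
    weight_bytes = d_model * d_model * bytes_per_value
    # Each token reads an input vector and writes an output vector.
    activation_bytes = tokens * 2 * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


# Prefill processes the whole prompt at once; decode handles one new token
# per sequence per step (a single sequence is assumed here).
prefill = arithmetic_intensity(tokens=2048)
decode = arithmetic_intensity(tokens=1)
print(f"prefill: {prefill:.1f} FLOPs/byte, decode: {decode:.2f} FLOPs/byte")
```

Under these assumptions prefill performs far more arithmetic per byte of weights read than decode does, which is why the two phases tend to favor differently sized pod groups.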

DisaggregatedSet was introduced in KEP-766 to address these multi-phase, multi-resource serving patterns with a single, declarative Kubernetes resource.

Relationship to LeaderWorkerSet

DisaggregatedSet does not replace LeaderWorkerSet — it orchestrates multiple LeaderWorkerSets.

Each role defined in a DisaggregatedSet spec maps to an independent LeaderWorkerSet, deployed in the same namespace. The two resources compare as follows:

| Feature | LeaderWorkerSet (LWS) | DisaggregatedSet |
| --- | --- | --- |
| Unit | A single group of homogeneous pods | Multiple groups, each with a distinct role |
| Use case | Uniform inference or training | Disaggregated prefill/decode/encode serving |
| CRD version | `leaderworkerset.x-k8s.io/v1` | `disaggregatedset.x-k8s.io/v1` |
| Controller namespace | `lws-system` | `lws-system` |
| Dependency | None | No separate install (bundled in LWS) |

DisaggregatedSet
├── roles[0]: prefill  →  creates LeaderWorkerSet "disaggdeployment-xxx-prefill"
├── roles[1]: decode   →  creates LeaderWorkerSet "disaggdeployment-xxx-decode"
└── roles[2]: encode   →  creates LeaderWorkerSet "disaggdeployment-xxx-encode"

Each child LWS inherits all standard LWS capabilities: rolling updates, subgroup policies, exclusive placement, volume claim templates, and health monitoring.
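
The one-role-to-one-LeaderWorkerSet mapping above can be sketched as a simple fan-out. This is an illustration of the relationship only, not the controller's code: `ChildLWS`, `child_name`, and `fan_out` are hypothetical names, and the `<parent>-<role>` naming is a simplification that does not reproduce the full generated name shown in the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildLWS:
    """Hypothetical stand-in for a child LeaderWorkerSet object."""
    name: str
    namespace: str
    role: str


def child_name(parent: str, role: str) -> str:
    # Assumption: simplified naming; the real controller's generated
    # name may include additional segments.
    return f"{parent}-{role}"


def fan_out(parent: str, namespace: str, roles: list[str]) -> list[ChildLWS]:
    """Return one child LWS descriptor per role, all in the parent's namespace."""
    return [ChildLWS(child_name(parent, role), namespace, role) for role in roles]


children = fan_out("llm-serving", "inference", ["prefill", "decode", "encode"])
for child in children:
    print(f"{child.role}: {child.namespace}/{child.name}")
```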

Roles in DisaggregatedSet

A DisaggregatedSet spec contains a roles list. Each role defines:

| Field | Description |
| --- | --- |
| `name` | Unique name for this role (e.g., `prefill`, `decode`) |
| `replicas` | Number of LWS replicas (pod groups) for this role |
| `rolloutStrategy` | Independent rolling update config per role |
| `leaderWorkerTemplate` | Pod template defining leader + worker containers |

DisaggregatedSet coordinates lifecycle and rollouts across roles. Each role’s replica count, rollout strategy, and pod template can be configured independently, while the controller manages them as a single cohesive unit.
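
As a rough illustration of how the four role fields fit together, the sketch below builds a two-role manifest as a Python dictionary and prints it as JSON. Only `name`, `replicas`, `rolloutStrategy`, and `leaderWorkerTemplate` come from the table above. The `spec.roles` path, the nested keys (`type`, `rollingUpdateConfiguration`, `size`, `workerTemplate`) borrowed from the LeaderWorkerSet API, the helper names, and the image are assumptions, so check the CRD schema before relying on them.

```python
import json


def make_role(name: str, replicas: int, group_size: int, image: str) -> dict:
    """Build one role entry. Nested keys are assumed to mirror the LWS API."""
    return {
        "name": name,
        "replicas": replicas,
        "rolloutStrategy": {
            "type": "RollingUpdate",
            "rollingUpdateConfiguration": {"maxUnavailable": 1, "maxSurge": 0},
        },
        "leaderWorkerTemplate": {
            "size": group_size,  # pods per group (leader + workers)
            "workerTemplate": {
                "spec": {"containers": [{"name": name, "image": image}]}
            },
        },
    }


def validate_roles(roles: list[dict]) -> None:
    """Reject duplicate role names, since each name must be unique."""
    names = [role["name"] for role in roles]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate role names: {duplicates}")


IMAGE = "example.com/inference:latest"  # placeholder image
manifest = {
    "apiVersion": "disaggregatedset.x-k8s.io/v1",
    "kind": "DisaggregatedSet",
    "metadata": {"name": "llm-serving", "namespace": "inference"},
    "spec": {
        "roles": [
            make_role("prefill", replicas=2, group_size=4, image=IMAGE),
            make_role("decode", replicas=4, group_size=2, image=IMAGE),
        ]
    },
}

validate_roles(manifest["spec"]["roles"])
print(json.dumps(manifest, indent=2))
```

The prefill role uses larger groups and the decode role uses more, smaller groups, following the sizing guidance earlier on this page.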

When to Use DisaggregatedSet vs Plain LWS

Use plain LWS when:

All inference pods are homogeneous (same model, same resources).
You do not need to separate prefill from decode.
You are running training jobs or batch workloads without disaggregation.

Use DisaggregatedSet when:

You are running disaggregated LLM inference (e.g., vLLM with P/D disaggregation, SGLang).
Different inference phases require different GPU types or different pod group sizes.
You want to scale prefill and decode replicas independently based on different traffic patterns.
You are evaluating disaggregated serving architectures and need first-class Kubernetes support.
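
Independent scaling amounts to changing one role's `replicas` value while leaving every other role untouched. The sketch below shows that idea on a plain dictionary. The `spec.roles` layout and the `scale_role` helper are assumptions for illustration, and applying the result to a cluster is out of scope here.

```python
import copy


def scale_role(manifest: dict, role: str, replicas: int) -> dict:
    """Return a copy of the manifest with only the named role's replicas changed."""
    updated = copy.deepcopy(manifest)  # leave the caller's manifest unmodified
    for entry in updated["spec"]["roles"]:
        if entry["name"] == role:
            entry["replicas"] = replicas
            return updated
    raise KeyError(f"role {role!r} not found")


manifest = {
    "spec": {
        "roles": [
            {"name": "prefill", "replicas": 2},
            {"name": "decode", "replicas": 4},
        ]
    }
}

# Scale decode for a burst of long generations; prefill stays as it was.
scaled = scale_role(manifest, "decode", 8)
print({r["name"]: r["replicas"] for r in scaled["spec"]["roles"]})
```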

Key Design Principles

LWS-native: DisaggregatedSet is built on top of LWS, not alongside it, so all LWS features (failure handling, rollout strategy, subgroup topology) are available per role.
Role isolation: Each role's lifecycle, scaling, and rollout are handled through its own LeaderWorkerSet. A failed decode role does not impact the prefill role.
Declarative: The entire multi-role inference topology is expressed in a single YAML manifest, making it easy to version-control and apply via GitOps.
API version: DisaggregatedSet is at v1, matching the LeaderWorkerSet API version.
