DisaggregatedSet Examples
- Before You Begin
- Example 1 — Simple 2-Role (Prefill + Decode) Nginx
- Example 2 — 3-Role LLM Inference Pattern
- Understanding Child LWS Names
- Checking Status
- Cleanup
Before You Begin
Make sure the LWS controller manager is installed and running:
kubectl wait deploy/lws-controller-manager \
  -n lws-system --for=condition=available --timeout=5m
See the installation guide for setup instructions.
Example 1 — Simple 2-Role (Prefill + Decode) Nginx
This example uses nginx containers to demonstrate a prefill + decode disaggregated topology without requiring a real LLM. It closely mirrors the pattern used in production disaggregated inference.
apiVersion: disaggregatedset.x-k8s.io/v1
kind: DisaggregatedSet
metadata:
  name: disagg-nginx-demo
  namespace: default
spec:
  roles:
    # Prefill role: larger pool, higher parallelism
    - name: prefill
      spec:
        replicas: 2
        rolloutStrategy:
          rollingUpdateConfiguration:
            maxSurge: 0
            maxUnavailable: 1
        leaderWorkerTemplate:
          size: 2  # 1 leader + 1 worker per group
          workerTemplate:
            metadata:
              labels:
                role: prefill
                component: disaggregation
            spec:
              containers:
                - name: nginx
                  image: nginx:1.29.3
                  ports:
                    - containerPort: 80
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "64Mi"
                    limits:
                      cpu: "100m"
                      memory: "64Mi"
                  readinessProbe:
                    httpGet:
                      path: /
                      port: 80
                    initialDelaySeconds: 5
                    periodSeconds: 2
    # Decode role: smaller pool, lower latency
    - name: decode
      spec:
        replicas: 1
        rolloutStrategy:
          rollingUpdateConfiguration:
            maxSurge: 1
            maxUnavailable: 0
        leaderWorkerTemplate:
          size: 1  # 1 leader only per group
          workerTemplate:
            metadata:
              labels:
                role: decode
                component: disaggregation
            spec:
              containers:
                - name: nginx
                  image: nginx:1.29.3
                  ports:
                    - containerPort: 80
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "64Mi"
                    limits:
                      cpu: "100m"
                      memory: "64Mi"
                  readinessProbe:
                    httpGet:
                      path: /
                      port: 80
                    initialDelaySeconds: 5
                    periodSeconds: 2
Apply and Verify
# Apply the manifest
kubectl apply -f disagg-nginx-demo.yaml
# Confirm both child LeaderWorkerSets were created using label selectors
kubectl get leaderworkerset -n default -l disaggregatedset.x-k8s.io/name=disagg-nginx-demo
# Expected output (revision hash in name is generated dynamically):
# NAME                                   REPLICAS   READY   AGE
# disagg-nginx-demo-58f79fdb78-prefill   2          2       30s
# disagg-nginx-demo-58f79fdb78-decode    1          1       30s
# Check all pods are running
kubectl get pods -n default -l component=disaggregation
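As a rough sanity check on the pod listing: each LeaderWorkerSet replica is a group of `size` pods (one leader plus `size - 1` workers), so a role should settle at `replicas × size` pods once its rollout has finished. The sketch below does that arithmetic in shell. `expected_pods` is a hypothetical helper, not part of kubectl or LWS, and it takes each role as a `REPLICASxSIZE` pair.

```shell
# Hypothetical helper: sum replicas * size over roles given as "REPLICASxSIZE".
expected_pods() {
  local total=0 pair replicas size
  for pair in "$@"; do
    replicas="${pair%%x*}"   # text before the "x"
    size="${pair##*x}"       # text after the "x"
    total=$(( total + replicas * size ))
  done
  echo "$total"
}

# Example 1: prefill is 2 groups of 2 pods, decode is 1 group of 1 pod.
expected_pods 2x2 1x1   # prints 5
```

While a rolling update is in progress the live count may briefly differ from this figure, since `maxSurge` and `maxUnavailable` permit temporary extra or missing groups.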
Example 2 — 3-Role LLM Inference Pattern
This example models a 3-phase disaggregated serving topology: prefill (KV cache generation),
decode (token generation), and encode (context encoding). It uses placeholder containers
(registry.k8s.io/pause:3.9) to demonstrate the scheduling topology without requiring GPU resources.
apiVersion: disaggregatedset.x-k8s.io/v1
kind: DisaggregatedSet
metadata:
  name: disagg-3role-demo
  namespace: default
spec:
  roles:
    # Prefill: generates KV cache from input tokens (CPU/GPU intensive)
    - name: prefill
      spec:
        replicas: 4
        rolloutStrategy:
          rollingUpdateConfiguration:
            maxSurge: 1
            maxUnavailable: 0
        leaderWorkerTemplate:
          size: 2
          workerTemplate:
            metadata:
              labels:
                role: prefill
            spec:
              containers:
                - name: model
                  image: registry.k8s.io/pause:3.9
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "64Mi"
                    limits:
                      cpu: "100m"
                      memory: "64Mi"
    # Decode: generates tokens autoregressively (memory-bandwidth intensive)
    - name: decode
      spec:
        replicas: 2
        rolloutStrategy:
          rollingUpdateConfiguration:
            maxSurge: 1
            maxUnavailable: 0
        leaderWorkerTemplate:
          size: 1
          workerTemplate:
            metadata:
              labels:
                role: decode
            spec:
              containers:
                - name: model
                  image: registry.k8s.io/pause:3.9
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "64Mi"
                    limits:
                      cpu: "100m"
                      memory: "64Mi"
    # Encode: context encoding (optional, scales separately)
    - name: encode
      spec:
        replicas: 2
        rolloutStrategy:
          rollingUpdateConfiguration:
            maxSurge: 1
            maxUnavailable: 0
        leaderWorkerTemplate:
          size: 1
          workerTemplate:
            metadata:
              labels:
                role: encode
            spec:
              containers:
                - name: model
                  image: registry.k8s.io/pause:3.9
                  resources:
                    requests:
                      cpu: "100m"
                      memory: "64Mi"
                    limits:
                      cpu: "100m"
                      memory: "64Mi"
Apply and Verify
kubectl apply -f disagg-3role-demo.yaml
# All three child LWS resources should appear (revision hash generated dynamically)
kubectl get leaderworkerset -n default -l disaggregatedset.x-k8s.io/name=disagg-3role-demo
# NAME                                   REPLICAS   READY   AGE
# disagg-3role-demo-58f79fdb78-prefill   4          4       30s
# disagg-3role-demo-58f79fdb78-decode    2          2       30s
# disagg-3role-demo-58f79fdb78-encode    2          2       30s
Understanding Child LWS Names
The DisaggregatedSet controller names each child LeaderWorkerSet using a revision hash to track
rollouts. The naming format is:
<DisaggregatedSet-name>-<revision-hash>-<role-name>
For example, a DisaggregatedSet named my-inference with roles prefill and decode creates:
my-inference-58f79fdb78-prefill
my-inference-58f79fdb78-decode
Note: The revision hash is dynamic and changes on each rollout. Never rely on hardcoded child LWS names — always use label selectors to query them.
You can list all child LWS resources for a given DisaggregatedSet with:
kubectl get leaderworkerset -l disaggregatedset.x-k8s.io/name=my-inference
To filter by role:
kubectl get leaderworkerset -l disaggregatedset.x-k8s.io/name=my-inference,disaggregatedset.x-k8s.io/role=prefill
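For scripts, the two selectors above can be assembled by a small helper so that the label keys are written in one place. This is only a sketch: `lws_selector` is a hypothetical function name, and the label keys are the ones used in the commands above.

```shell
# Hypothetical helper: build the label selector for a DisaggregatedSet's child
# LeaderWorkerSets. The second argument (role) is optional.
lws_selector() {
  local name="$1" role="${2:-}"
  local sel="disaggregatedset.x-k8s.io/name=${name}"
  if [ -n "$role" ]; then
    # Comma-separated requirements must all match.
    sel="${sel},disaggregatedset.x-k8s.io/role=${role}"
  fi
  printf '%s\n' "$sel"
}

lws_selector my-inference           # selector for all children
lws_selector my-inference prefill   # selector for the prefill role only
```

A call such as `kubectl get leaderworkerset -l "$(lws_selector my-inference prefill)" -o name` then returns the child's current name, including whatever revision hash is live, without hardcoding it.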
Checking Status
# Check the DisaggregatedSet overall status
kubectl describe disaggregatedset disagg-nginx-demo
# Check the prefill role's child LWS; join both labels with a comma in a single -l selector
kubectl get leaderworkerset \
  -l disaggregatedset.x-k8s.io/name=disagg-nginx-demo,disaggregatedset.x-k8s.io/role=prefill
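To spot roles that are still rolling out, the READY column can be compared with REPLICAS. The sketch below assumes the `kubectl get leaderworkerset` table has the NAME, REPLICAS, READY and AGE columns shown in the expected output earlier on this page. `lws_not_ready` is a hypothetical helper, not a kubectl or LWS command.

```shell
# Hypothetical helper: read a `kubectl get leaderworkerset` table on stdin and
# print the NAME of every row whose READY count differs from REPLICAS.
lws_not_ready() {
  awk 'NR > 1 && $2 != $3 { print $1 }'   # NR > 1 skips the header row
}

# Sample input in the same layout as the expected output shown earlier,
# with one prefill group still not ready.
printf '%s\n' \
  'NAME REPLICAS READY AGE' \
  'disagg-nginx-demo-58f79fdb78-prefill 2 1 30s' \
  'disagg-nginx-demo-58f79fdb78-decode 1 1 30s' | lws_not_ready
# prints disagg-nginx-demo-58f79fdb78-prefill
```

Against a live cluster, pipe the real listing through it: `kubectl get leaderworkerset -l disaggregatedset.x-k8s.io/name=disagg-nginx-demo | lws_not_ready`. Empty output means every listed role reports all replicas ready.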
Cleanup
# Deleting the DisaggregatedSet also deletes all child LeaderWorkerSets
kubectl delete disaggregatedset disagg-nginx-demo disagg-3role-demo