Rollout Strategy

Rolling updates are vital for online services that require zero downtime. They are particularly important for LLM inference services, where they help mitigate stockout. LWS supports two rollout settings, maxUnavailable and maxSurge:

  • MaxUnavailable: Indicates how many replicas are allowed to be unavailable during the update. The number is calculated against spec.replicas. Defaults to 1. Only values >= 1 are supported.
  • MaxSurge: Indicates how many extra replicas can be deployed during the update. Defaults to 0.
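As a rough illustration of the defaults and the lower bound described above, here is a minimal Python sketch. The function names are hypothetical and not part of LWS. The sketch also assumes that percentage strings such as "50%" are accepted and resolved against spec.replicas with the rounding used by Kubernetes Deployments (maxUnavailable rounds down, maxSurge rounds up); treat that as an assumption rather than confirmed LWS behavior.

```python
def _resolve(value, replicas, round_up):
    """Resolve an integer or a percentage string such as "25%" against replicas."""
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected an integer or a percentage, got {value!r}")
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return value


def resolve_rollout_config(replicas, max_unavailable=1, max_surge=0):
    """Return (maxUnavailable, maxSurge) as absolute replica counts.

    Defaults mirror the documented ones: maxUnavailable=1, maxSurge=0.
    Rounding directions are an assumption borrowed from Deployments.
    """
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    surge = _resolve(max_surge, replicas, round_up=True)
    if unavailable < 1:
        # Only values >= 1 are supported for maxUnavailable.
        raise ValueError("maxUnavailable must resolve to at least 1")
    if surge < 0:
        raise ValueError("maxSurge must not be negative")
    return unavailable, surge


print(resolve_rollout_config(4))        # (1, 0)
print(resolve_rollout_config(4, 2, 2))  # (2, 2)
```

Rejecting a maxUnavailable that resolves to 0 is a choice made for this sketch to reflect the ">= 1" rule; how LWS itself reports such a value is not covered here.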

Here is a LeaderWorkerSet configured with a rollout strategy:

spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdateConfiguration:
      maxUnavailable: 2
      maxSurge: 2
  replicas: 4

The following walkthrough shows how a rolling update proceeds for a LeaderWorkerSet with four replicas. The rolling step equals maxUnavailable (2) + maxSurge (2) = 4. Three replica statuses are used:

  • ✅ Replica has been updated
  • ❎ Replica hasn’t been updated
  • ⏳ Replica is in rolling update
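
| Stage   | Partition | Replicas | R-0 | R-1 | R-2 | R-3 | R-4 | R-5 | Note                                                           |
|---------|-----------|----------|-----|-----|-----|-----|-----|-----|----------------------------------------------------------------|
| Stage 1 | 0         | 4        |     |     |     |     |     |     | Before rolling update                                          |
| Stage 2 | 4         | 6        |     |     |     |     |     |     | Rolling update started                                         |
| Stage 3 | 2         | 6        |     |     |     |     |     |     | Partition changes from 4 to 2                                  |
| Stage 4 | 2         | 6        |     |     |     |     |     |     | Since the last Replica is not ready, Partition will not change |
| Stage 5 | 0         | 6        |     |     |     |     |     |     | Partition changes from 2 to 0                                  |
| Stage 6 | 0         | 6        |     |     |     |     |     |     | R-2 and R-3 become ready                                       |
| Stage 7 | 0         | 4        |     |     |     |     |     |     | Scale down to 4 immediately, reclaiming both surge replicas    |
| Stage 8 | 0         | 4        |     |     |     |     |     |     | R-1 becomes ready                                              |
| Stage 9 | 0         | 4        |     |     |     |     |     |     | Rolling update completed                                       |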
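
The numbers used in this walkthrough can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the controller's logic: the function name is hypothetical, and the peak replica count (spec.replicas + maxSurge) is read off the Replicas column rather than taken from the LWS source.

```python
def rollout_numbers(replicas, max_unavailable, max_surge):
    """Return the rolling step, the peak replica count, and the surge replica names."""
    rolling_step = max_unavailable + max_surge
    peak_replicas = replicas + max_surge
    # Surge replicas take the indices just past the original ones (zero-based).
    surge_names = [f"R-{i}" for i in range(replicas, peak_replicas)]
    return rolling_step, peak_replicas, surge_names


step, peak, surge = rollout_numbers(replicas=4, max_unavailable=2, max_surge=2)
print(step, peak, surge)  # 4 6 ['R-4', 'R-5']
```

With maxSurge set to 0 the surge list is empty and the replica count stays at spec.replicas for the whole update.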

MaxUnavailable Feature

MaxUnavailable graduated to Beta in Kubernetes 1.35, which means it is enabled by default from that release onward.
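
To check whether a given cluster version has the feature on by default, a version comparison along these lines may help. The helper name is hypothetical, and the sketch only compares version numbers; it cannot tell whether a cluster operator has explicitly turned the feature off or on.

```python
import re

# First Kubernetes release in which the feature is Beta, per the text above.
BETA_VERSION = (1, 35)


def max_unavailable_on_by_default(version):
    """Return True if the Kubernetes version string is 1.35 or newer."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= BETA_VERSION


print(max_unavailable_on_by_default("v1.35.0"))  # True
print(max_unavailable_on_by_default("v1.34.2"))  # False
```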
