
Kubernetes Pod Disruption Budgets: Zero-Downtime Drain, Cluster Upgrades & SLO Protection

Audience: DevOps engineers, SREs, and platform teams managing production Kubernetes clusters and reliability SLOs.

Series: DevOps Reliability Engineering Series


The Node Upgrade That Took Down Payments

It was a routine Tuesday morning maintenance window. The platform team was upgrading the Kubernetes node group from version 1.28 to 1.29. The upgrade script was straightforward: cordon each node, drain it, terminate the old EC2 instance, and let the node group launch a fresh replacement. Automated, tested in staging, nothing to worry about.

At 09:47 UTC, the on-call phone lit up. The payments service was returning 503s. The SLO dashboard turned red. Investigation revealed that three of the four payment service pods had been evicted simultaneously during the drain of two adjacent nodes. The fourth pod—the one survivor—was overwhelmed and crashing under the load. The payment service had a 99.95% availability SLO. In eight minutes, it burned through roughly a third of the month's error budget.
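To put the burn rate in numbers, assuming a 30-day SLO window (a common choice; the incident account does not state the window), the budget math is:

```shell
# Monthly error budget for a 99.95% availability SLO over a 30-day window.
awk 'BEGIN {
  slo = 0.9995
  window_min = 30 * 24 * 60              # 43200 minutes in 30 days
  budget = (1 - slo) * window_min        # allowed downtime in minutes
  printf "Monthly error budget: %.1f minutes\n", budget
}'
# Prints: Monthly error budget: 21.6 minutes
```

An eight-minute near-total outage therefore consumes over a third of the entire month's allowance.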

The team had Kubernetes. They had multiple pod replicas. They had a node upgrade process. What they didn't have was a PodDisruptionBudget—the Kubernetes API object specifically designed to prevent exactly this scenario.

What Is a PodDisruptionBudget?

A PodDisruptionBudget (PDB) is a Kubernetes policy resource that limits the number of pods of a replicated application that can be voluntarily disrupted at any given time. "Voluntary disruption" includes:

  - kubectl drain during node maintenance and upgrades
  - Cluster Autoscaler scale-down evicting pods from underutilized nodes
  - Managed node pool upgrades (EKS, GKE, AKS) that drain nodes on your behalf
  - Any other request made through the Eviction API

PDBs do not protect against involuntary disruptions: node hardware failure, kernel panics, OOM kills, or pods being killed by the kubelet due to resource limits. Those are handled by replication, affinity rules, and resource quotas.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: production
spec:
  minAvailable: 3          # At least 3 pods must always be available
  selector:
    matchLabels:
      app: payment-service

With this PDB in place, when kubectl drain attempts to evict a payment service pod, it first consults the PDB. If evicting that pod would drop available pods below 3, the eviction is blocked and the drain retries until enough replacement pods are Running and Ready.

minAvailable vs maxUnavailable: Choosing Your Strategy

PDBs offer two mutually exclusive constraints; the API server rejects a PDB that sets both. Use exactly one in a given PDB.

minAvailable

Guarantees that at least N pods (or N% of desired) remain available at all times. Best for critical services where you have a hard minimum capacity floor.

# Absolute count
spec:
  minAvailable: 3

# Percentage — rounded up (e.g., 75% of 5 pods = 4)
spec:
  minAvailable: "75%"

Use minAvailable when: you know the minimum number of pods needed to handle production load, and you want to be explicit about that floor.
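Kubernetes resolves a percentage minAvailable against the expected pod count and rounds up. A quick local sketch of that arithmetic (plain awk, no cluster needed):

```shell
# Resolve minAvailable: "75%" against 5 expected pods, rounding up as the
# disruption controller does: ceil(0.75 * 5) = ceil(3.75) = 4.
awk 'BEGIN {
  pct = 75; pods = 5
  raw = pct / 100 * pods
  min_available = int(raw) + (raw > int(raw) ? 1 : 0)   # ceil without a builtin
  printf "minAvailable resolves to %d of %d pods\n", min_available, pods
}'
# Prints: minAvailable resolves to 4 of 5 pods
```

Rounding up is the conservative direction for an availability floor: a fractional result always resolves to the next whole pod that must stay up.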

maxUnavailable

Allows at most N pods (or N%) to be disrupted simultaneously. Best when you think in terms of "how many can I take out at once" rather than "what's my minimum."

# Allow only 1 pod to be disrupted at a time
spec:
  maxUnavailable: 1

# Allow up to 25% to be disrupted at once
spec:
  maxUnavailable: "25%"

Use maxUnavailable when: your service can tolerate some unavailability and you want to control the pace of disruption rather than guarantee an absolute floor.

Which to use?

Scenario                               Recommended
Critical payment/auth service          minAvailable: N-1
Worker pool with many replicas         maxUnavailable: "20%"
Single-replica service (bad!)          Scale to 2+ replicas first
StatefulSet with quorum requirements   minAvailable: majority

Voluntary vs Involuntary Disruptions

Understanding this distinction is critical for setting realistic expectations about what PDBs can and cannot do.

Voluntary disruptions (PDB-protected)

  - kubectl drain during node maintenance and cluster upgrades
  - Cluster Autoscaler removing underutilized nodes
  - Managed node pool upgrades that drain nodes on your behalf
  - Direct requests to the Eviction API

Involuntary disruptions (PDB does NOT apply)

  - Node hardware failure or VM termination (including spot/preemptible reclaims)
  - Kernel panics and node crashes
  - Out-of-memory kills
  - Network partitions that isolate a node

For involuntary disruptions, your protection comes from multi-zone pod topology spread constraints, anti-affinity rules, and sufficient replica counts to absorb a zone failure.
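As an illustrative sketch (labels and values are assumptions, not from the incident above), a Deployment pod-spec fragment that spreads replicas across zones and keeps two replicas off the same node could look like:

```yaml
# Illustrative pod-spec fragment. PDBs do not help with zone or node
# failures; these scheduling constraints are what limit the blast radius.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: payment-service
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: payment-service
```

With four replicas spread over three zones this way, losing an entire zone still leaves at least two replicas serving.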

PDB + HPA Interaction: The Scaling Floor Problem

A frequently overlooked interaction: if your HPA (Horizontal Pod Autoscaler) scales your deployment down to minReplicas and your PDB specifies minAvailable: N where N equals or exceeds minReplicas, no pod of that deployment can be voluntarily evicted for as long as it sits at minReplicas: the Cluster Autoscaler and node drains are both blocked until the deployment scales back up.

# Dangerous configuration:
# HPA minReplicas: 2
# PDB minAvailable: 2
# Result: 0 pods can ever be disrupted voluntarily!

# Correct configuration:
# HPA minReplicas: 3
# PDB minAvailable: 2
# Result: 1 pod can be disrupted; node drain can proceed

The rule of thumb: minAvailable should always be strictly less than HPA's minReplicas. If you set them equal, you deadlock voluntary disruptions during scale-down periods.
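Put concretely, a compliant pairing for the payment service might look like this (the HPA resource is a sketch; the metric target is an assumption):

```yaml
# HPA floor of 3 with a PDB floor of 2 leaves one pod's worth of headroom
# for voluntary disruptions even when the service is scaled all the way down.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-service
```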

# Check current PDB status to detect blocked evictions:
kubectl get pdb -n production

# Output:
# NAME                  MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# payment-service-pdb   3               N/A               0                     5d
#
# ALLOWED DISRUPTIONS = 0 means the PDB is blocking all evictions right now

Node Drain Workflow with PDB Enforcement

Here is what happens step by step when you run kubectl drain node-1 against a node hosting payment service pods:

  1. Cordon: The node is marked unschedulable. No new pods will be scheduled on it.
  2. List pods: kubectl identifies all evictable pods on the node (excludes DaemonSet pods by default).
  3. For each pod, call the Eviction API: POST /api/v1/namespaces/{ns}/pods/{pod}/eviction
  4. API server checks PDB: If evicting this pod would violate the PDB, the API returns HTTP 429 (Too Many Requests).
  5. kubectl drain retries: The drain command retries the eviction periodically (default retry period) until the PDB budget opens up (i.e., a replacement pod becomes Running and Ready).
  6. Graceful termination: Once the eviction is approved, the pod receives SIGTERM and the terminationGracePeriodSeconds window begins.
# Safe node drain with timeout (fail if stuck for more than 10 minutes)
kubectl drain node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=600s \
  --pod-selector='app notin (critical-singleton)'

# Monitor drain progress in another terminal:
watch "kubectl get pods -n production -o wide | grep node-1"

If the drain times out (because PDB keeps blocking), do not force it. Investigate why replacement pods are not becoming Ready—this is often a resource constraint, image pull failure, or liveness probe issue on the new node.

Cluster Upgrade Strategies

Rolling node upgrade (recommended for most cases)

Upgrade nodes one at a time (or in small batches). With PDBs in place, each node drain waits for pod replacements before proceeding. This is the default for managed Kubernetes offerings (EKS managed node groups, GKE node pool upgrades).

# EKS managed node group rolling upgrade (respects PDBs automatically):
aws eks update-nodegroup-version \
  --cluster-name prod-cluster \
  --nodegroup-name payment-nodes \
  --kubernetes-version 1.29

# Check upgrade progress:
aws eks describe-nodegroup \
  --cluster-name prod-cluster \
  --nodegroup-name payment-nodes \
  --query 'nodegroup.status'

Blue/green node pool upgrade

Create a new node pool with the target Kubernetes version alongside the old one. Gradually cordon and drain old nodes while new pods schedule onto new nodes. This is the cleanest upgrade path: replacement capacity already exists, so drains rarely block on a PDB. The cost is running double capacity during the transition.

# Step 1: Create new node pool (Kubernetes 1.29)
# Step 2: Cordon all old nodes (prevents new scheduling)
for node in $(kubectl get nodes -l pool=old -o name); do
  kubectl cordon $node
done

# Step 3: Drain old nodes one by one (PDB-aware)
for node in $(kubectl get nodes -l pool=old -o name); do
  kubectl drain $node --ignore-daemonsets --delete-emptydir-data --timeout=300s
  echo "Drained: $node"
  sleep 30  # allow system to stabilize
done

PDB with StatefulSets

StatefulSets require special consideration. Unlike Deployments, StatefulSet pods have stable identity and often store state. For quorum-based systems (Kafka, Zookeeper, etcd, Elasticsearch), the PDB must preserve the quorum majority:

# Kafka with 3 brokers — quorum requires at least 2
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: kafka

---
# Elasticsearch with 3 master nodes — never disrupt more than 1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-master-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master

For StatefulSets that use podManagementPolicy: Parallel, be extra careful—the StatefulSet controller can scale multiple pods simultaneously, and without a PDB that constrains this, node drains can still evict multiple pods at once. The lifecycle discipline here parallels what you'd enforce with Java Structured Concurrency scopes, where you define explicit boundaries for concurrent operations.
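A sketch of that combination, reusing the Kafka example above (the image tag and service name are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless
  replicas: 3
  podManagementPolicy: Parallel    # controller may act on several pods at once
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: apache/kafka:3.7.0    # illustrative tag
---
# The PDB is what serializes voluntary evictions despite the Parallel policy.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: kafka
```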

Common PDB Misconfigurations

Misconfiguration 1: PDB selector matches no pods

If the PDB's matchLabels selector doesn't match any running pods, the PDB has no effect. Always verify after applying:

kubectl describe pdb payment-service-pdb -n production
# Check "Disruptions Allowed" — should be > 0 if pods are running
# Check "Current Healthy" — should match your running replica count

Misconfiguration 2: minAvailable equals replica count

Setting minAvailable: 4 when you have exactly 4 replicas means zero disruptions are ever allowed. The PDB will permanently block all node drains. Set minAvailable to at most replicas - 1.
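The controller's arithmetic here is plain subtraction: disruptionsAllowed is currentHealthy minus the resolved minAvailable, floored at zero. A local sketch:

```shell
# disruptionsAllowed = currentHealthy - resolved minAvailable, floored at 0.
awk 'BEGIN {
  replicas = 4
  for (min_available = 3; min_available <= 4; min_available++) {
    allowed = replicas - min_available
    if (allowed < 0) allowed = 0
    printf "minAvailable=%d, %d healthy -> %d allowed disruptions\n", min_available, replicas, allowed
  }
}'
# minAvailable=3, 4 healthy -> 1 allowed disruptions
# minAvailable=4, 4 healthy -> 0 allowed disruptions
```

The second line is the permanently blocked state: any unhealthy pod, or minAvailable equal to the replica count, pins the allowance at zero.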

Misconfiguration 3: No PDB on single-replica services

A single-replica service is disrupted entirely by a node drain. If you cannot tolerate any downtime, you must have at least 2 replicas plus a PDB. There is no way to protect a single-replica service from voluntary disruptions.
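Assuming the application tolerates two concurrent copies (names and image are illustrative), the minimal fix is:

```yaml
# Step 1: give the Deployment a second replica...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: license-server
  template:
    metadata:
      labels:
        app: license-server
    spec:
      containers:
      - name: license-server
        image: registry.example.com/license-server:1.4.2   # illustrative
---
# ...then a PDB that keeps one copy up during drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: license-server-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: license-server
```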

Misconfiguration 4: PDB without pod readiness probe

The drain waits for replacement pods to become "Available" according to the PDB—which counts pods that pass their readiness probe. If your pods don't have a readiness probe, Kubernetes considers them immediately ready, even if they are still warming up. This can cause traffic to be routed to cold pods during a drain.

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

Misconfiguration 5: Forgetting terminationGracePeriodSeconds

When a pod is evicted, it gets a SIGTERM. If your application takes longer than terminationGracePeriodSeconds (default 30s) to shut down gracefully, it is killed with SIGKILL. For Java services with JVM warmup and connection draining, 60–120 seconds is common. Set this explicitly:

spec:
  terminationGracePeriodSeconds: 90
  containers:
  - name: payment-service
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]  # wait for load balancer to deregister

Monitoring Disruption Budget Status

PDB status should be part of your cluster health dashboard. Key metrics to track:

Kubernetes metrics

# Current disruption allowance (Prometheus query via kube-state-metrics):
kube_poddisruptionbudget_status_pod_disruptions_allowed{namespace="production"}

# Expected vs observed healthy pods:
kube_poddisruptionbudget_status_current_healthy{namespace="production"}
kube_poddisruptionbudget_status_desired_healthy{namespace="production"}

Alert: PDB blocking for too long

# Alert when PDB has 0 allowed disruptions for more than 30 minutes
# This indicates a stuck drain or unhealthy pod preventing progress
- alert: PodDisruptionBudgetBlocking
  expr: |
    kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "PDB {{ $labels.poddisruptionbudget }} blocking disruptions in {{ $labels.namespace }}"

Event-based monitoring

# Watch for disruption-related events:
kubectl get events -n production --field-selector reason=Evicting
kubectl get events -n production --field-selector reason=EvictionBlocked

PDB + Cluster Autoscaler Scale-Down

The Cluster Autoscaler respects PDBs when deciding whether to evict pods from underutilized nodes. If a node's pods cannot all be evicted without violating PDBs, the Cluster Autoscaler will not scale down that node. This is the correct behavior—but it can cause cost inefficiencies if PDBs are too restrictive.

# Annotation to prevent Cluster Autoscaler from evicting specific pods:
# (use only when truly needed—prefer PDBs for service-wide policies)
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

A healthy practice is to review Cluster Autoscaler logs weekly and look for pod_not_safe_to_evict_reason=pdb entries on nodes that have been un-evictable for more than 24 hours. These often indicate misconfigured PDBs or stuck pods that are preventing cost optimization.

Key Takeaways

  - Every multi-replica production service needs a PDB; without one, a routine node drain can evict most of its pods at once.
  - Use exactly one of minAvailable or maxUnavailable, and keep minAvailable strictly below both the replica count and the HPA's minReplicas.
  - PDBs govern only voluntary disruptions; involuntary failures are absorbed by replicas, topology spread, and anti-affinity.
  - A drain blocked by a PDB is a signal: investigate why replacement pods are not becoming Ready instead of forcing the eviction.
  - Watch ALLOWED DISRUPTIONS in kubectl get pdb and alert when it stays at 0; that state blocks both drains and Cluster Autoscaler scale-down.


Last updated: March 2026 — Written by Md Sanwar Hossain