Kubernetes Microservices Deployment: Helm, HPA, Health Probes & Production Readiness 2026
Running microservices on Kubernetes is not just about writing a Deployment YAML. Production-ready K8s deployments demand Helm-based packaging, precise resource tuning, probe configuration that survives JVM warm-up, HPA backed by real application metrics, and a hardened security posture before the first real traffic hits. This guide gives you every pattern, configuration, and decision — battle-tested in 2026 production environments.
TL;DR — Production K8s in One Paragraph
"Package every microservice as a Helm chart with environment-specific values.yaml overrides. Set resource requests = resource limits (Guaranteed QoS class) for JVM services. Configure a startup probe to survive slow JVM boot, then separate liveness from readiness probes. Wire HPA to custom Prometheus metrics or Kafka consumer-lag — not just CPU. Enforce PodDisruptionBudgets, non-root security contexts, and network policies before calling a deployment production-ready."
Table of Contents
- Why Kubernetes for Microservices?
- Dockerizing Spring Boot Microservices
- Helm Chart Structure for Microservices
- Resource Limits & Requests
- Health Probes: Liveness, Readiness & Startup
- Horizontal Pod Autoscaler with Custom Metrics
- ConfigMap & Secret Management
- Rolling Updates & Rollback Strategies
- Pod Disruption Budgets & Zero-Downtime Upgrades
- Production Readiness Checklist
- Conclusion & Quick Reference
1. Why Kubernetes for Microservices?
Microservices architectures decompose a monolith into many independently deployable services — each with its own release cadence, scaling requirements, and failure domain. This creates operational complexity that manual processes cannot sustain at scale. Kubernetes was purpose-built to solve exactly this class of problems.
Container Orchestration at Scale
A typical microservices platform in 2026 runs 30–200+ services across multiple environments. Kubernetes provides a declarative control plane: you describe the desired state (replicas, resource constraints, health requirements) and the control loop reconciles reality to match. This is fundamentally different from scripted deployment — infrastructure self-heals without human intervention.
- Auto-healing: Pods that fail liveness checks are automatically restarted. Nodes that become unhealthy evict and reschedule their workloads.
- Bin-packing: The scheduler fits pods onto nodes based on resource requests, maximising cluster utilisation while respecting affinity/anti-affinity rules.
- Service discovery: Built-in DNS for every Service object. No external service registry (Eureka, Consul) needed for intra-cluster communication.
- Rolling deployments: Native support for zero-downtime updates with configurable surge and unavailability thresholds.
- Namespace isolation: Logical tenancy boundaries for teams, environments (dev/staging/prod), or compliance zones — each with its own RBAC and network policy.
The Kubernetes Microservices Value Stack
Beyond orchestration, the K8s ecosystem provides the complete operational stack that microservices need: Helm for packaging, Prometheus + Grafana for observability, Istio or Linkerd for mTLS service mesh, ArgoCD for GitOps delivery, KEDA for event-driven autoscaling, and Cert-Manager for automatic TLS. This ecosystem is why 78% of organisations running microservices use Kubernetes as their runtime platform (CNCF Survey 2025).
2. Dockerizing Spring Boot Microservices
The container image is the unit of deployment in Kubernetes. A badly constructed image wastes bandwidth, slows builds, creates security exposure, and causes GC pressure at runtime. Spring Boot's layered jar support and multi-stage Docker builds eliminate all of these problems.
Multi-Stage Build with Layered Jars
Spring Boot 2.3+ produces a layered jar where dependencies, the Spring Boot loader, snapshot dependencies, and application classes live in separate layers. The Docker build exploits this so only changed layers are pushed — typically just the application layer (a few KB), not the 80–150 MB dependency layer.
# Stage 1: Extract layered jar
FROM eclipse-temurin:21-jre-alpine AS builder
WORKDIR /application
COPY target/*.jar application.jar
RUN java -Djarmode=layertools -jar application.jar extract
# Stage 2: Production image
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /application
# Copy layers in cache-optimal order (least to most volatile)
COPY --from=builder /application/dependencies/ ./
COPY --from=builder /application/spring-boot-loader/ ./
COPY --from=builder /application/snapshot-dependencies/ ./
COPY --from=builder /application/application/ ./
# JVM flags tuned for container environments
ENV JAVA_OPTS="\
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-XX:InitialRAMPercentage=50.0 \
-XX:+ExitOnOutOfMemoryError \
-Djava.security.egd=file:/dev/./urandom \
-Dspring.backgroundpreinitializer.ignore=true"
USER appuser
EXPOSE 8080
# Boot 3.2+ loader path; older versions use org.springframework.boot.loader.JarLauncher
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS org.springframework.boot.loader.launch.JarLauncher"]
Critical JVM Container Flags Explained
- -XX:+UseContainerSupport (default on Java 11+): Reads cgroup memory/CPU limits instead of host resources. Without this, a JVM in a 512 MiB container on a 64 GB host will size its heap based on 64 GB — causing OOM kills.
- -XX:MaxRAMPercentage=75.0: Allocates 75% of the cgroup memory limit to heap. Reserve 25% for off-heap (metaspace, thread stacks, NIO buffers, JIT compiled code cache).
- -XX:+ExitOnOutOfMemoryError: Immediately terminates the JVM on OOM instead of limping along in a broken state. Kubernetes restarts the pod cleanly.
- -Djava.security.egd=file:/dev/./urandom: Prevents SecureRandom from blocking on /dev/random entropy starvation in containers.
Spring Boot Buildpacks Alternative
For teams that prefer zero-Dockerfile workflows, Spring Boot's spring-boot:build-image Maven/Gradle goal uses Cloud Native Buildpacks (Paketo) to produce a production-grade OCI image automatically — including memory calculator, security hardening, and Java runtime selection.
# Maven: build OCI image via Buildpacks (no Dockerfile required)
./mvnw spring-boot:build-image \
-Dspring-boot.build-image.imageName=registry.example.com/order-service:1.5.2 \
-Dspring-boot.build-image.publish=true
# Gradle equivalent
./gradlew bootBuildImage \
--imageName=registry.example.com/order-service:1.5.2 \
--publishImage
3. Helm Chart Structure for Microservices
Helm is the de facto package manager for Kubernetes. It renders parameterised Go templates into Kubernetes manifests, manages release history, and enables one-command rollbacks. A well-structured Helm chart is the difference between a reproducible, auditable deployment and a pile of hand-edited YAML files.
Canonical Chart Layout
order-service/
├── Chart.yaml            # Chart metadata, version, appVersion
├── values.yaml           # Default values (safe for dev)
├── values-staging.yaml   # Staging overrides
├── values-prod.yaml      # Production overrides (more replicas, strict probes)
├── charts/               # Subchart dependencies (redis, postgresql)
└── templates/
    ├── _helpers.tpl      # Named template helpers (labels, selectors)
    ├── deployment.yaml
    ├── service.yaml
    ├── hpa.yaml
    ├── pdb.yaml
    ├── configmap.yaml
    ├── serviceaccount.yaml
    ├── networkpolicy.yaml
    └── NOTES.txt         # Post-install usage notes
Production values.yaml
# values.yaml — defaults (safe for development)
replicaCount: 2
image:
  repository: registry.example.com/order-service
  pullPolicy: IfNotPresent
  tag: ""  # Overridden by CI with Chart.appVersion
serviceAccount:
  create: true
  annotations: {}
service:
  type: ClusterIP
  port: 8080
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"  # Equals request; full Guaranteed QoS also needs cpu request == limit
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
  targetMemoryUtilizationPercentage: 80
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 0  # Startup probe handles the delay
  periodSeconds: 15
  failureThreshold: 3
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 30  # 30 × 5s = 150s maximum startup time
  timeoutSeconds: 5
podDisruptionBudget:
  enabled: true
  minAvailable: 1
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
env:
  SPRING_PROFILES_ACTIVE: "production"
  SERVER_PORT: "8080"
Helm Chart Versioning Strategy
Separate Chart.yaml's version (chart schema version, bumped when templates change) from appVersion (application image tag). CI pipelines should inject the image digest or Git SHA into appVersion at release time. Lock subchart dependency versions using Chart.lock committed to Git — never use floating version ranges for production dependencies.
# Chart.yaml
apiVersion: v2
name: order-service
description: Order management microservice
type: application
version: 3.1.0       # Chart version — bump when templates change
appVersion: "1.5.2"  # Application version — injected by CI
dependencies:
  - name: redis
    version: "19.6.2"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
4. Resource Limits & Requests
Resource configuration is the most common source of production Kubernetes incidents. Under-provisioned limits cause OOM kills; over-provisioned requests waste cluster capacity and prevent scheduling. Getting this right for JVM services requires understanding both Kubernetes QoS classes and JVM memory architecture.
Kubernetes QoS Classes
| QoS Class | Condition | Eviction Priority | Recommended For |
|---|---|---|---|
| Guaranteed | requests == limits (all containers) | Last evicted | JVM microservices, stateful workloads |
| Burstable | requests < limits | Medium priority | Batch jobs, dev/staging workloads |
| BestEffort | No requests or limits set | First evicted | Never in production |
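The assignment rules in the table can be sketched as a small function. This is illustrative Python only — a simplification of the kubelet's actual logic, reduced to a single container with cpu/memory:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for one container (cpu/memory only)."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed requires cpu and memory limits, with requests either
    # unset (Kubernetes defaults them to the limits) or equal to limits.
    if all(r in limits for r in ("cpu", "memory")) and all(
        requests.get(k, v) == v for k, v in limits.items()
    ):
        return "Guaranteed"
    return "Burstable"

print(qos_class({"cpu": "500m", "memory": "1Gi"},
                {"cpu": "500m", "memory": "1Gi"}))    # Guaranteed
print(qos_class({"cpu": "250m", "memory": "512Mi"},
                {"cpu": "500m", "memory": "512Mi"}))  # Burstable
print(qos_class({}, {}))                              # BestEffort
```

Note the second case: a memory limit equal to the request is not enough — a cpu request below the cpu limit already demotes the pod to Burstable.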
JVM Heap vs Container Memory Budget
JVM memory is not just the heap. A container limit of 1 GiB must cover: heap, metaspace, thread stacks (~512 KB/thread × thread count), JIT code cache (~240 MB), off-heap NIO buffers, and GC overhead. A common production formula for Spring Boot services:
# Memory sizing formula for Spring Boot on K8s
# Container limit = Heap + Off-heap overhead
# Off-heap overhead ≈ 256–400 MiB for a typical Spring Boot service
# Example: 512 MiB container limit
# Heap: 75% of 512 MiB = ~384 MiB (via MaxRAMPercentage=75.0)
# Off-heap: ~128 MiB (metaspace + threads + buffers)
# Example: 1 GiB container limit (preferred for production)
# Heap: 75% of 1024 MiB = ~768 MiB
# Off-heap: ~256 MiB
# Kubernetes resource block for a 1 GiB service (Guaranteed QoS)
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "500m"   # Equal to request = no CPU throttling surprise
    memory: "1Gi" # Equal to request = Guaranteed QoS, no OOM eviction
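The sizing formula above is easy to check with a quick back-of-envelope calculation. Purely illustrative — real off-heap usage depends on thread count, metaspace, and code cache:

```python
def jvm_memory_budget(limit_mib: int, max_ram_percentage: float = 75.0):
    """Split a container memory limit into heap and off-heap headroom."""
    heap_mib = int(limit_mib * max_ram_percentage / 100)
    off_heap_mib = limit_mib - heap_mib
    return heap_mib, off_heap_mib

heap, off_heap = jvm_memory_budget(1024)
print(f"1 GiB limit -> {heap} MiB heap, {off_heap} MiB off-heap headroom")
# 1 GiB limit -> 768 MiB heap, 256 MiB off-heap headroom
```

For the 512 MiB example the same split gives 384 MiB heap and 128 MiB headroom, which is why 1 GiB is the safer production floor for a typical Spring Boot service.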
Vertical Pod Autoscaler (VPA) for Right-Sizing
Rather than guessing initial resource values, deploy VPA in Off mode during staging to collect recommendations from real traffic patterns, then promote the recommendations to production manifests. Never run VPA in Auto mode alongside HPA on the same deployment — they conflict on the memory dimension. Use KEDA for event-driven horizontal scaling and VPA for vertical right-sizing in separate workload pools.
5. Health Probes: Liveness, Readiness & Startup
Kubernetes probes are the mechanism by which the platform decides whether to send traffic to a pod (readiness) or restart it (liveness). Misconfigured probes are responsible for a significant proportion of production incidents: premature restarts triggered by aggressive liveness probes during GC pauses, and traffic sent to pods that haven't finished warming up their connection pools.
Spring Boot Actuator Health Groups
Spring Boot 2.3+ ships dedicated /actuator/health/liveness and /actuator/health/readiness endpoints. Configure them explicitly to include the right health indicators:
# application.yaml — Spring Boot Actuator probe configuration
management:
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
      group:
        liveness:
          include: livenessState            # Only JVM/application state
        readiness:
          include: readinessState,db,redis  # Include dependencies
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
The Three-Probe Pattern
The critical insight is that liveness and readiness must be independent. The liveness probe should only fail if the application is in an unrecoverable state (deadlock, corruption). The readiness probe should fail when the application cannot serve requests (DB connection lost, circuit breaker open). The startup probe buys time during slow JVM boot without triggering premature liveness restarts.
# deployment.yaml — Three-probe pattern for Spring Boot
containers:
  - name: order-service
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    ports:
      - containerPort: 8080
        protocol: TCP
    # STARTUP PROBE: Runs first. Prevents liveness from firing during slow start.
    # Max startup time = failureThreshold × periodSeconds = 30 × 5 = 150s
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 10  # Give JVM 10s to begin starting
      periodSeconds: 5
      failureThreshold: 30     # 150s total budget for startup
      timeoutSeconds: 5
    # LIVENESS PROBE: Only for detecting deadlocks / unrecoverable states.
    # Does NOT check dependencies — that's readiness's job.
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      periodSeconds: 15
      failureThreshold: 3  # 3 failures → restart pod
      timeoutSeconds: 5
    # READINESS PROBE: Controls traffic routing. Checks DB, Redis, etc.
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 10
      failureThreshold: 3  # 3 failures → remove from Service endpoints
      successThreshold: 1
      timeoutSeconds: 5
Probe Timing Gotchas
- GC pause liveness failures: A full GC pause can take 2–10 seconds on heap-heavy JVMs. Set timeoutSeconds to at least 5s and periodSeconds to at least 15s to prevent false positives.
- Database connection pool startup: HikariCP validates the pool during startup. If the DB is slow, readiness will correctly fail — but liveness must not also fail or you get a restart loop.
- JVM class loading: Spring Boot JVMs often take 30–90 seconds to be fully ready in production (especially with many beans and slow classpath scanning). The startup probe is mandatory for these services.
- Sidecar containers: If your pod uses Istio or other sidecars, add initialDelaySeconds to account for sidecar startup before any probe fires.
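The timing arithmetic behind these gotchas is worth making explicit. A tiny sketch (illustrative Python) computes the two numbers that matter: the hard ceiling on startup time, and the worst-case window before a hung pod is restarted:

```python
def startup_budget_s(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Maximum time a pod has to pass its startup probe before restart."""
    return initial_delay + period * failure_threshold

def liveness_restart_window_s(period: int, failure_threshold: int) -> int:
    """Worst-case time a deadlocked pod runs before liveness restarts it."""
    return period * failure_threshold

# startupProbe from the manifest: 10s delay + 30 × 5s = 160s hard ceiling
print(startup_budget_s(10, 5, 30))       # 160
# livenessProbe: up to 3 × 15s = 45s before a deadlocked pod is restarted
print(liveness_restart_window_s(15, 3))  # 45
```

If your service's p99 cold start exceeds the startup budget, raise failureThreshold rather than periodSeconds, so healthy pods still pass quickly.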
6. Horizontal Pod Autoscaler with Custom Metrics
CPU-based HPA is often a poor fit for microservices: a Spring Boot service under heavy database I/O load may be saturated while using only 20% CPU. Custom metrics — request rate, queue depth, Kafka consumer lag — are far more meaningful autoscaling signals for most production workloads.
CPU-Based HPA (Baseline)
# hpa.yaml — CPU-based autoscaling (minimum viable)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # Scale up when avg CPU > 60%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30  # React quickly to load spikes
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
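Under the hood the HPA controller applies a simple proportional formula (desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), per the Kubernetes docs). A quick sketch helps predict how the manifest above behaves; min/max values here mirror that manifest:

```python
import math

def desired_replicas(current: int, current_util: float, target_util: float,
                     min_r: int, max_r: int) -> int:
    """HPA proportional scaling, clamped to the configured replica bounds."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

# 4 pods averaging 90% CPU against the 60% target -> scale to 6
print(desired_replicas(4, 90, 60, min_r=2, max_r=20))  # 6
# Load drops to 20% average -> formula wants 2, the minReplicas floor anyway
print(desired_replicas(6, 20, 60, min_r=2, max_r=20))  # 2
```

The behavior.scaleUp/scaleDown policies then rate-limit how fast the controller is allowed to move toward that desired count.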
Custom Metrics HPA via Prometheus Adapter
The Prometheus Adapter bridges Prometheus metrics into Kubernetes' Custom Metrics API, making application-level metrics available to HPA. The following example scales the order service based on HTTP request rate per pod:
# prometheus-adapter ConfigMap rule
rules:
  - seriesQuery: 'http_server_requests_seconds_count{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_count$"
      as: "http_requests_per_second"
    metricsQuery: |
      sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
---
# hpa.yaml — Custom metric: requests per second per pod
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa-rps
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # Scale up if avg > 100 RPS per pod
Kafka Consumer Lag HPA with KEDA
For event-driven microservices consuming from Kafka, KEDA (Kubernetes Event-Driven Autoscaling) provides first-class Kafka consumer lag scaling without needing the Prometheus adapter. This is the 2026 standard for async microservices:
# KEDA ScaledObject — Kafka consumer lag autoscaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor-scaler
namespace: production
spec:
scaleTargetRef:
name: order-processor
minReplicaCount: 2
maxReplicaCount: 50
cooldownPeriod: 300 # Seconds before scaling down after lag clears
pollingInterval: 15 # Check lag every 15 seconds
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-broker:9092
consumerGroup: order-processor-group
topic: orders
lagThreshold: "100" # 1 pod per 100 messages of lag
offsetResetPolicy: latest
authenticationRef:
name: kafka-trigger-auth
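The effect of lagThreshold is easiest to see with a rough model: KEDA feeds total consumer lag to an HPA targeting lagThreshold messages per pod, clamped to the replica bounds. Illustrative only — the real scaler also caps replicas at the topic's partition count, since extra consumers in a group would sit idle:

```python
import math

def keda_kafka_replicas(total_lag: int, lag_threshold: int,
                        min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count KEDA's Kafka trigger will request."""
    wanted = math.ceil(total_lag / lag_threshold)
    return max(min_replicas, min(max_replicas, wanted))

print(keda_kafka_replicas(1250, 100, 2, 50))  # 13 pods to burn down the lag
print(keda_kafka_replicas(0, 100, 2, 50))     # 2 — floor at minReplicaCount
```

The cooldownPeriod then keeps those extra pods around for 5 minutes after the lag clears, absorbing bursty topics without flapping.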
7. ConfigMap & Secret Management for Microservices
Every microservice has environment-specific configuration: database URLs, feature flags, timeouts, external service endpoints. Never bake configuration into the image. Kubernetes ConfigMap handles non-sensitive config; Secret handles credentials — but raw Kubernetes Secrets are only base64-encoded, not encrypted. Production systems need an external secrets provider.
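The base64 point is easy to demonstrate: anyone with read access to the Secret object (or to etcd) can reverse the encoding with no key at all. The password below is a made-up example value:

```python
import base64

# A raw Kubernetes Secret stores exactly this: base64 text, not ciphertext.
encoded = base64.b64encode(b"s3cr3t-db-password").decode()
decoded = base64.b64decode(encoded).decode()

print(encoded)  # readable with any base64 tool
print(decoded)  # original value recovered — no key required
```

This is why RBAC on Secrets, encryption at rest for etcd, and an external secrets provider are all part of the production baseline.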
ConfigMap for Application Properties
# configmap.yaml — mounted as Spring Boot application.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config
  namespace: production
data:
  application.yaml: |
    spring:
      datasource:
        url: jdbc:postgresql://postgres-svc:5432/orders
        hikari:
          maximum-pool-size: 20
          minimum-idle: 5
          connection-timeout: 30000
          idle-timeout: 600000
      kafka:
        bootstrap-servers: kafka-svc:9092
        consumer:
          group-id: order-processor-group
          auto-offset-reset: earliest
    management:
      server:
        port: 8081  # Separate management port from application port
    logging:
      level:
        com.example: INFO
        org.springframework: WARN
---
# Mount in Deployment
volumes:
  - name: config-volume
    configMap:
      name: order-service-config
volumeMounts:
  - name: config-volume
    mountPath: /config
    readOnly: true
env:
  - name: SPRING_CONFIG_ADDITIONAL_LOCATION
    value: "file:/config/"
External Secrets Operator (ESO) for Production
The External Secrets Operator syncs secrets from AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager into Kubernetes Secrets, automatically rotating them when the upstream value changes. This is the 2026 standard — no secrets in Git, no manual kubectl create secret commands:
# ExternalSecret — syncs from AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: order-service-db-credentials
  namespace: production
spec:
  refreshInterval: 1h  # Re-sync every hour for rotation
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: order-service-db-secret  # Creates this K8s Secret
    creationPolicy: Owner
  data:
    - secretKey: db-password
      remoteRef:
        key: production/order-service/db
        property: password
    - secretKey: db-username
      remoteRef:
        key: production/order-service/db
        property: username
8. Rolling Updates & Rollback Strategies
Kubernetes RollingUpdate strategy replaces old pods incrementally, maintaining service availability throughout the deployment. However, the default settings are often too aggressive for JVM-heavy microservices. Careful tuning prevents traffic from being routed to pods that are not yet warm.
Rolling Update Configuration
# deployment.yaml — Rolling update strategy for JVM microservices
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Create 1 extra pod during update (5 total)
      maxUnavailable: 0  # Never take a pod down before the new one is Ready
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      # Graceful shutdown: allow in-flight requests to complete
      terminationGracePeriodSeconds: 60
      containers:
        - name: order-service
          # ... (image, resources, probes as defined earlier)
          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM so Service endpoint removal propagates to
                # kube-proxy and ingress before the app stops accepting traffic.
                command: ["sh", "-c", "sleep 5"]
Graceful Shutdown in Spring Boot
# application.yaml — Graceful shutdown configuration
server:
  shutdown: graceful  # Available since Spring Boot 2.3
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # Wait up to 30s for active requests to finish
# The shutdown sequence:
# 1. K8s removes pod from Service endpoints (stops new traffic)
# 2. preStop hook runs (5s sleep ensures propagation)
# 3. SIGTERM sent to JVM
# 4. Spring context starts graceful shutdown
# 5. Active HTTP requests drain (up to 30s)
# 6. Kafka consumers commit offsets and disconnect
# 7. DB connections returned to pool
# 8. JVM exits cleanly
# 9. SIGKILL sent after terminationGracePeriodSeconds (60s) as backstop
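One arithmetic check is worth doing whenever these values change: the preStop hook runs inside terminationGracePeriodSeconds, so the sleep plus the graceful-shutdown phase must fit under the grace period with headroom. A sketch using the numbers from this section:

```python
# Budget check for the shutdown sequence above.
pre_stop_s = 5       # preStop hook sleep
drain_phase_s = 30   # spring.lifecycle.timeout-per-shutdown-phase
grace_period_s = 60  # terminationGracePeriodSeconds

needed = pre_stop_s + drain_phase_s
print(f"worst case {needed}s of a {grace_period_s}s grace period")
assert needed < grace_period_s  # headroom for Kafka/DB cleanup before SIGKILL
```

If the assertion would fail, raise terminationGracePeriodSeconds rather than shortening the drain phase, or requests that exceed the shortened window will be cut off mid-flight.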
Helm Rollback
Helm maintains a complete release history. Rolling back is a single command that re-applies the previous rendered manifests, triggering another rolling update in reverse. Always verify your rollback target before executing in production:
# List Helm release history
helm history order-service -n production
# Roll back to the previous release
helm rollback order-service -n production
# Roll back to a specific revision
helm rollback order-service 7 -n production
# Verify rollback status
helm status order-service -n production
kubectl rollout status deployment/order-service -n production
9. Pod Disruption Budgets & Zero-Downtime Upgrades
A PodDisruptionBudget (PDB) limits the number of pods that can be simultaneously unavailable due to voluntary disruptions — node drains during cluster upgrades, cluster autoscaler scale-downs, or admin-initiated evictions. Without PDBs, a cluster upgrade can evict all replicas of a service at once, causing a complete service outage.
PodDisruptionBudget Configuration
# pdb.yaml — Ensure at least 1 replica always available
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-service-pdb
  namespace: production
spec:
  # minAvailable: 1 means at most (replicas - 1) pods can be disrupted
  # For a 3-replica service: at most 2 can be disrupted → always 1 running
  minAvailable: 1
  selector:
    matchLabels:
      app: order-service
# Alternative: use maxUnavailable for percentage-based budgets
#   maxUnavailable: "25%"  # At most 25% of pods can be disrupted at once
# For critical services with 4+ replicas, prefer:
#   minAvailable: 2  # Always keep at least 2 pods running
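The eviction API's view of this budget can be sketched in one line — simplified, ignoring the maxUnavailable and percentage forms: with minAvailable it permits healthy pods minus minAvailable concurrent voluntary disruptions:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary disruptions the eviction API will permit right now."""
    return max(0, healthy_pods - min_available)

print(allowed_disruptions(3, 1))  # 2 — matches the manifest comment above
print(allowed_disruptions(1, 1))  # 0 — node drains block until another pod is Ready
```

The second case is the classic footgun: a single-replica Deployment with minAvailable: 1 permanently blocks node drains, which is why the PDB must always be paired with replicas ≥ 2.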
Pod Anti-Affinity for Multi-Zone Availability
PDBs protect against voluntary disruptions, but you also need to ensure replicas are spread across availability zones so a single zone failure does not take down all pods. Use topologySpreadConstraints (preferred over hard anti-affinity in 2026):
# deployment.yaml — Spread pods across zones and nodes
spec:
  template:
    spec:
      topologySpreadConstraints:
        # Spread evenly across availability zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: order-service
        # Also spread across nodes within each zone
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: order-service
10. Production Readiness Checklist
Kubernetes production readiness goes beyond deploying workloads. Security contexts, network policies, RBAC, image scanning, and resource quotas are all mandatory before a service can be called production-grade. This is the checklist used at high-traffic platforms in 2026.
Security Context: Non-Root, Read-Only Filesystem
# deployment.yaml — Hardened security context
spec:
  template:
    spec:
      # Pod-level security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault  # Enable seccomp filtering
      containers:
        - name: order-service
          # Container-level security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true  # Prevents filesystem tampering
            capabilities:
              drop: ["ALL"]  # Drop all Linux capabilities
            seccompProfile:
              type: RuntimeDefault
          # Read-only root FS requires writable volumes for temp dirs
          volumeMounts:
            - name: tmp-volume
              mountPath: /tmp
            - name: logs-volume
              mountPath: /app/logs
      volumes:
        - name: tmp-volume
          emptyDir: {}  # Ephemeral scratch space, deleted with the pod
        - name: logs-volume
          emptyDir: {}
Network Policy: Zero-Trust Microservice Communication
# networkpolicy.yaml — Default-deny + explicit ingress/egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow traffic only from API gateway and other internal services
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
        - podSelector:
            matchLabels:
              app: payment-service
      ports:
        - protocol: TCP
          port: 8080
    # Allow Prometheus scraping on management port
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8081
  egress:
    # Allow DNS resolution
    - ports:
        - protocol: UDP
          port: 53
    # Allow outbound to database
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Allow outbound to Kafka
    - to:
        - podSelector:
            matchLabels:
              app: kafka
      ports:
        - protocol: TCP
          port: 9092
Production Readiness: Full Checklist
Security
- ☐ runAsNonRoot: true and a specific runAsUser set on all containers
- ☐ readOnlyRootFilesystem: true with explicit writable emptyDir mounts
- ☐ allowPrivilegeEscalation: false and all capabilities dropped
- ☐ NetworkPolicy with default-deny — explicit allow rules only
- ☐ Service account with minimal RBAC permissions (no cluster-admin)
- ☐ Container images scanned with Trivy or Grype (no HIGH/CRITICAL CVEs in base)
- ☐ Secrets managed via External Secrets Operator, not raw K8s Secrets
- ☐ imagePullPolicy: Always in production (prevent stale cached images)
Reliability
- ☐ Startup probe configured for JVM startup budget (failureThreshold × periodSeconds)
- ☐ Liveness probe checks only internal state, not external dependencies
- ☐ Readiness probe includes DB, Redis, and other required dependencies
- ☐ PodDisruptionBudget with minAvailable ≥ 1 for all critical services
- ☐ topologySpreadConstraints for multi-zone pod distribution
- ☐ terminationGracePeriodSeconds > longest expected request duration + drain buffer
- ☐ Spring Boot graceful shutdown enabled (server.shutdown: graceful)
- ☐ HPA configured with meaningful metrics (not just CPU for I/O-bound services)
Observability
- ☐ Prometheus metrics exposed on /actuator/prometheus
- ☐ ServiceMonitor (or PodMonitor) created for Prometheus scraping
- ☐ Structured JSON logging (Logback/Log4j2 JSON encoder)
- ☐ Distributed tracing headers propagated (OpenTelemetry B3 or W3C Trace Context)
- ☐ Resource metrics (CPU, memory, GC time) in Grafana dashboard
- ☐ Alerting rules for error rate, P99 latency, pod restart count
Resource Management
- ☐ CPU and memory requests set (pod gets scheduled) and limits set (Guaranteed QoS for JVM)
- ☐ -XX:+UseContainerSupport and -XX:MaxRAMPercentage configured
- ☐ ResourceQuota on namespace to prevent resource exhaustion by runaway scaling
- ☐ LimitRange on namespace for default limits on pods without explicit settings
- ☐ VPA in Recommendation mode for ongoing right-sizing guidance
11. Conclusion & Quick Reference
Deploying microservices on Kubernetes is a multi-layered discipline. The cluster is not the hard part — the patterns you apply on top of it determine whether you get a resilient, self-healing platform or a complex failure domain. The principles that separate production-ready from "it runs in staging" are:
- Package with Helm from day one — no hand-edited YAML files that drift between environments.
- Set resources accurately — Guaranteed QoS for JVM services, measured with VPA recommendations from real traffic.
- Three distinct probes — startup for boot budget, liveness for deadlock detection, readiness for traffic eligibility.
- HPA on application metrics — CPU alone is a lagging indicator. Request rate, queue depth, and Kafka lag scale faster and more accurately.
- PDBs + topologySpread — protect availability during voluntary disruptions and zone failures simultaneously.
- Security context by default — non-root, read-only filesystem, all capabilities dropped, NetworkPolicy default-deny.
- Graceful shutdown — coordinate terminationGracePeriodSeconds, preStop hooks, and Spring Boot's graceful shutdown to drain in-flight requests without 502 errors.
Go-Live Decision Gate
- ☐ All containers run as non-root with read-only filesystem
- ☐ Three probes configured — startup / liveness / readiness are distinct
- ☐ Resources set: requests == limits for JVM services (Guaranteed QoS)
- ☐ HPA configured with meaningful scaling metric, tested under load
- ☐ PodDisruptionBudget deployed — tested via kubectl drain simulation
- ☐ Pods spread across ≥ 2 availability zones via topologySpreadConstraints
- ☐ NetworkPolicy deployed — inbound and outbound locked to known services
- ☐ Secrets managed via ESO — no plaintext credentials in ConfigMaps or env vars
- ☐ Graceful shutdown tested: send SIGTERM, verify in-flight requests complete
- ☐ Rollback tested: helm rollback works and triggers a clean rolling update
- ☐ Prometheus metrics scraped, Grafana dashboard exists with error-rate alert
Kubernetes rewards teams that invest in the operational discipline upfront. Each item on this checklist eliminates a class of incident. The platforms running millions of requests per day on Kubernetes are not doing anything exotic — they are doing all of the above, consistently, for every service in the fleet. Start with one service, get every item green, and use it as the golden template for every subsequent service you deploy.