Kubernetes Operator Pattern: Building Custom Controllers for Stateful Applications
Kubernetes Deployments and StatefulSets handle stateless and basic stateful workloads well. But running databases, message brokers, and distributed datastores at production scale requires operational knowledge that can't be encoded in YAML — backup orchestration, primary election, rolling schema upgrades, disaster recovery. The Operator Pattern encodes this human operational knowledge as code.
Table of Contents
- The Problem with YAML-Driven StatefulSet Management
- The Operator Pattern: Controller + CRD
- The Reconciliation Loop
- Designing the Custom Resource (CRD)
- Implementing the Controller with Kubebuilder
- Real Stateful Application Scenarios
- Production Failure Scenarios
- Testing Operators: Unit, Integration, E2E
- Trade-offs and When NOT to Write an Operator
- Key Takeaways
1. The Problem with YAML-Driven StatefulSet Management
Consider deploying a Kafka cluster on Kubernetes. A StatefulSet can manage the pods, but it can't:
- Automatically rebalance partitions when a broker is added or removed
- Perform rolling upgrades while ensuring at least one in-sync replica (ISR) per partition
- Automatically trigger backup before a destructive operation
- Detect and remediate a broker that's joined with mismatched configuration
- Expose cluster-level metrics as Kubernetes status conditions
These operations require deep application-specific knowledge. Before Operators, teams encoded this knowledge in runbooks, manual scripts, and tribal knowledge. The Operator Pattern replaces the runbook with code — specifically, a Kubernetes controller that watches a Custom Resource and continuously reconciles the cluster state toward the desired state.
2. The Operator Pattern: Controller + CRD
An Operator consists of two components:
- Custom Resource Definition (CRD): Extends the Kubernetes API with a new resource type (e.g., KafkaCluster). Users create instances of this resource to express desired state.
- Controller: A program (typically written in Go) running as a Deployment inside the cluster. It watches Custom Resource instances and reconciles the cluster's actual state to match the desired state expressed in the resource.
User applies a KafkaCluster manifest (kubectl apply)
↓
Kubernetes API Server stores it in etcd
↓
Controller watches for KafkaCluster events (informer)
↓
Reconcile loop runs: compare desired vs. actual
↓
Controller creates/updates/deletes: StatefulSets, Services,
ConfigMaps, PVCs, RBAC, NetworkPolicies
↓
Controller updates KafkaCluster.Status (conditions, observedState)
3. The Reconciliation Loop
The reconciliation loop is the heart of every controller. It must be:
- Idempotent: Running reconcile 100 times on an already-converged cluster should produce no changes. Use CreateOrUpdate semantics, not just Create.
- Level-triggered, not edge-triggered: The controller doesn't act on "what changed" but on "what is the current desired vs. actual state." This makes it resilient to missed and duplicated events.
- Return RequeueAfter on partial progress: If the cluster is in a transitional state (e.g., a node is still starting up), return ctrl.Result{RequeueAfter: 30 * time.Second} to check again later.
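The three properties above can be sketched with a toy reconcile function. This is plain Go with no controller-runtime dependency: Result and the returned action list are simplified stand-ins for ctrl.Result and real API writes, and Cluster is a hypothetical model of observed state.

```go
package main

import "fmt"

// Result is a simplified stand-in for ctrl.Result.
type Result struct{ RequeueAfterSeconds int }

// Cluster models the actual state the controller observes (hypothetical).
type Cluster struct {
	Replicas      int
	ReadyReplicas int
}

// reconcile compares desired vs. actual state (level-triggered) and is
// idempotent: on a converged cluster it performs no writes.
func reconcile(desiredReplicas int, c *Cluster) (Result, []string) {
	var actions []string
	if c.Replicas != desiredReplicas {
		c.Replicas = desiredReplicas // CreateOrUpdate semantics: write only on drift
		actions = append(actions, fmt.Sprintf("scale to %d", desiredReplicas))
	}
	if c.ReadyReplicas < c.Replicas {
		// Partial progress: don't block, ask to be requeued.
		return Result{RequeueAfterSeconds: 30}, actions
	}
	return Result{}, actions
}

func main() {
	c := &Cluster{Replicas: 1, ReadyReplicas: 1}
	r1, a1 := reconcile(3, c) // first pass: acts and requeues
	c.ReadyReplicas = 3       // brokers come up between passes
	r2, a2 := reconcile(3, c) // converged pass: no-op
	fmt.Println(a1, r1.RequeueAfterSeconds, a2, r2.RequeueAfterSeconds)
}
```

Running reconcile a second time on the converged cluster produces no actions and no requeue, which is exactly the idempotence property the bullet list demands.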
4. Designing the Custom Resource (CRD)
apiVersion: kafka.myorg.io/v1alpha1
kind: KafkaCluster
metadata:
name: payments-kafka
namespace: production
spec:
version: "3.7.0"
replicas: 3
resources:
requests:
cpu: "2"
memory: "8Gi"
limits:
cpu: "4"
memory: "16Gi"
storage:
class: premium-ssd
size: 500Gi
config:
defaultReplicationFactor: 3
minInsyncReplicas: 2
logRetentionHours: 168
backup:
enabled: true
schedule: "0 3 * * *" # Daily at 3 AM
destination: s3://backups/kafka
monitoring:
prometheusEnabled: true
status:
phase: Running # Pending | Initializing | Running | Degraded
readyBrokers: 3
conditions:
- type: Available
status: "True"
lastTransitionTime: "2026-03-19T05:00:00Z"
- type: BrokersDegraded
status: "False"
CRD design principles: (1) Spec expresses desired state (user intent); Status expresses observed state. Never let users write to Status — it's owned by the controller (enable the /status subresource so spec and status updates are separate API calls). (2) Version your API from day one (v1alpha1 → v1beta1 → v1). Kubernetes CRD versioning with conversion webhooks handles schema evolution. (3) Validate early: use CEL (Common Expression Language) validation rules in the CRD schema to reject invalid specs before they ever reach the controller.
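A hedged sketch of the Go types that could back this schema, with kubebuilder validation markers. Field and type names are illustrative and would need to match the actual CRD; the CEL rule shown enforces that minInsyncReplicas never exceeds defaultReplicationFactor, which is the kind of cross-field constraint plain OpenAPI validation can't express.

```go
package main

import "fmt"

// KafkaClusterSpec sketches Go types behind the CRD schema above.
// The kubebuilder markers generate OpenAPI + CEL validation.
type KafkaClusterSpec struct {
	// +kubebuilder:validation:Pattern=`^\d+\.\d+\.\d+$`
	Version string `json:"version"`

	// +kubebuilder:validation:Minimum=1
	Replicas int32 `json:"replicas"`

	Config KafkaConfig `json:"config"`
}

// +kubebuilder:validation:XValidation:rule="self.minInsyncReplicas <= self.defaultReplicationFactor",message="min ISR cannot exceed replication factor"
type KafkaConfig struct {
	DefaultReplicationFactor int32 `json:"defaultReplicationFactor"`
	MinInsyncReplicas        int32 `json:"minInsyncReplicas"`
	LogRetentionHours        int32 `json:"logRetentionHours"`
}

func main() {
	spec := KafkaClusterSpec{
		Version:  "3.7.0",
		Replicas: 3,
		Config: KafkaConfig{
			DefaultReplicationFactor: 3,
			MinInsyncReplicas:        2,
			LogRetentionHours:        168,
		},
	}
	fmt.Println(spec.Version, spec.Replicas)
}
```

The markers are comments, so this compiles as plain Go; controller-gen reads them at build time to emit the CRD manifest.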
5. Implementing the Controller with Kubebuilder
// Kubebuilder controller skeleton. Imports shown for context (the module
// path for the generated API package is illustrative).
package controller

import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    kafkav1alpha1 "myorg.io/kafka-operator/api/v1alpha1"
)

//+kubebuilder:rbac:groups=kafka.myorg.io,resources=kafkaclusters,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete

func (r *KafkaClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. Fetch the KafkaCluster instance
    cluster := &kafkav1alpha1.KafkaCluster{}
    if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Handle deletion via finalizer
    if !cluster.DeletionTimestamp.IsZero() {
        return r.handleDeletion(ctx, cluster)
    }

    // 3. Reconcile StatefulSet
    if result, err := r.reconcileStatefulSet(ctx, cluster); err != nil || result.Requeue {
        return result, err
    }

    // 4. Reconcile Services
    if result, err := r.reconcileServices(ctx, cluster); err != nil || result.Requeue {
        return result, err
    }

    // 5. Wait for brokers to be ready
    ready, err := r.checkBrokerReadiness(ctx, cluster)
    if err != nil {
        return ctrl.Result{}, err
    }
    if !ready {
        log.Info("Brokers not ready yet, requeueing")
        return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
    }

    // 6. Update status
    return r.updateStatus(ctx, cluster)
}
6. Real Stateful Application Scenarios
Scenario: Safe Rolling Upgrade
When the user updates spec.version from 3.6 to 3.7, the operator can't just do a rolling restart like a Deployment would. It must: (1) Verify that all partitions have sufficient ISR replicas before touching any broker, (2) Upgrade one broker at a time, (3) Wait for the upgraded broker to re-join and ISR to stabilize before upgrading the next, (4) Roll back all upgraded brokers if any step fails. This multi-step coordinated upgrade is impossible with native StatefulSet rolling update logic.
Scenario: Automatic Backup Before Destructive Operation
When a user decreases spec.replicas from 5 to 3, the operator recognizes this as a destructive scale-down. Before proceeding, it triggers an immediate backup (creating a KafkaBackup CR which the operator also manages), waits for the backup to complete successfully, then proceeds with the scale-down. If the backup fails, it blocks the scale-down and sets a status condition explaining why.
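The gating decision can be modeled as a small state machine over the backup CR's phase. The KafkaBackup phases and the action names here are illustrative; a real controller would create the backup CR via the API and translate these actions into requeues and status conditions.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupPhase mirrors a hypothetical KafkaBackup CR status phase.
type BackupPhase string

const (
	BackupNone     BackupPhase = ""
	BackupRunning  BackupPhase = "Running"
	BackupComplete BackupPhase = "Complete"
	BackupFailed   BackupPhase = "Failed"
)

// gateScaleDown decides whether a destructive scale-down may proceed,
// returning the action the controller should take this reconcile pass.
func gateScaleDown(desired, actual int32, backup BackupPhase) (string, error) {
	if desired >= actual {
		return "proceed", nil // not destructive: no backup gate
	}
	switch backup {
	case BackupNone:
		return "create-backup", nil // create a KafkaBackup CR, then requeue
	case BackupRunning:
		return "wait", nil // requeue until the backup finishes
	case BackupComplete:
		return "proceed", nil
	default:
		return "block", errors.New("backup failed; refusing scale-down")
	}
}

func main() {
	action, _ := gateScaleDown(3, 5, BackupNone)
	fmt.Println(action)
}
```

Because reconciliation is level-triggered, the same function runs on every pass: it first returns "create-backup", then "wait" while the backup runs, and only "proceed" once the backup has completed.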
7. Production Failure Scenarios
Failure: Controller Restart During Multi-Step Operation
If the controller pod restarts mid-upgrade (step 2 of 5), the reconciliation loop restarts from scratch. Idempotent reconciliation means it re-checks the current state and determines which brokers have been upgraded vs. which haven't — and continues from the correct point. Stateful multi-step operations must be encoded in the Custom Resource's Status (current step, checkpoints) so restarts can resume rather than restart from zero.
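A minimal sketch of resuming from such a checkpoint. The UpgradeStatus fields are hypothetical illustrations of what KafkaCluster.Status could carry; the point is that the next step is re-derived from persisted state, not from in-memory progress that dies with the pod.

```go
package main

import "fmt"

// UpgradeStatus is the checkpoint persisted in KafkaCluster.Status
// (field names are illustrative).
type UpgradeStatus struct {
	TargetVersion   string
	UpgradedBrokers []int // brokers already running TargetVersion
}

// nextBrokerToUpgrade resumes a rolling upgrade from the checkpoint:
// after a controller restart it derives the next step from Status
// instead of starting over at broker 0.
func nextBrokerToUpgrade(total int, st UpgradeStatus) (int, bool) {
	done := map[int]bool{}
	for _, b := range st.UpgradedBrokers {
		done[b] = true
	}
	for i := 0; i < total; i++ {
		if !done[i] {
			return i, true
		}
	}
	return -1, false // upgrade complete
}

func main() {
	// Controller restarted mid-upgrade: brokers 0 and 1 already done.
	st := UpgradeStatus{TargetVersion: "3.7.0", UpgradedBrokers: []int{0, 1}}
	next, more := nextBrokerToUpgrade(3, st)
	fmt.Println(next, more)
}
```

In practice the controller would also re-verify each "upgraded" broker's actual version before trusting the checkpoint, since Status can lag reality.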
Failure: Status Desync After etcd Compaction
etcd compaction can interrupt watches: if a controller's watch falls behind the compacted revision, the informer must relist, and there is a window during which Status may not reflect actual state. Implement a periodic status health check: every 5 minutes, regardless of events, verify that Status.ReadyBrokers matches the actual ready pod count, and correct any divergence immediately.
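The drift check itself is simple; the value is in running it on a timer independent of events (controller-runtime managers can also be configured with a periodic cache resync interval). A sketch, with the two counts passed in as plain integers rather than read from the API:

```go
package main

import "fmt"

// checkStatusDrift compares the Status the controller last wrote against
// the actual ready pod count and reports whether a corrective status
// write is needed, returning the value to write.
func checkStatusDrift(statusReadyBrokers, actualReadyPods int32) (drift bool, corrected int32) {
	if statusReadyBrokers != actualReadyPods {
		return true, actualReadyPods // Status lied; overwrite with reality
	}
	return false, statusReadyBrokers
}

func main() {
	// Status claims 3 ready brokers, but only 2 pods are actually ready.
	drift, corrected := checkStatusDrift(3, 2)
	fmt.Println(drift, corrected)
}
```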
8. Testing Operators: Unit, Integration, E2E
- Unit tests: Test the reconcile function with a fake Kubernetes client (controller-runtime's fake client). Verify that the correct K8s objects are created/updated for a given CRD spec without a real cluster.
- Integration tests with envtest: Kubebuilder's envtest runs a real etcd and API server in-process. Test the full reconciliation loop including watches, status updates, and webhook validation.
- E2E tests with kind: Spin up a kind (Kubernetes in Docker) cluster in CI, install the operator, apply test CRs, and verify cluster state with assertions. This catches RBAC issues and real Pod scheduling behavior.
- Chaos testing: Kill the controller pod mid-reconciliation; kill a managed pod; simulate PVC binding failures. Verify the operator correctly detects, reports, and recovers from each scenario.
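The fake-client pattern from the first bullet can be shown with a dependency-free toy: swap the real client for an in-memory store and assert on the objects the reconcile logic writes. The fakeClient here is a stand-in, not controller-runtime's actual fake package, and the name mapping (cluster name + "-broker") is illustrative.

```go
package main

import "fmt"

// fakeClient is a minimal stand-in for controller-runtime's fake client:
// an in-memory object store mapping StatefulSet name -> replica count.
type fakeClient struct {
	objects map[string]int32
}

func (c *fakeClient) createOrUpdate(name string, replicas int32) {
	c.objects[name] = replicas
}

// reconcileStatefulSet is the unit under test: given a cluster spec,
// ensure the backing "StatefulSet" exists with the right replica count.
func reconcileStatefulSet(c *fakeClient, cluster string, replicas int32) {
	c.createOrUpdate(cluster+"-broker", replicas)
}

func main() {
	// Unit-test style assertion against the fake client, no real cluster.
	c := &fakeClient{objects: map[string]int32{}}
	reconcileStatefulSet(c, "payments-kafka", 3)
	fmt.Println(c.objects["payments-kafka-broker"])
}
```

Real tests follow the same shape: seed the fake client with a KafkaCluster, call Reconcile, then Get the expected StatefulSet and assert on its spec.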
9. Trade-offs and When NOT to Write an Operator
- High engineering investment: A production-quality operator requires 3–6 months of engineering. Assess whether using an existing community operator (Strimzi for Kafka, Zalando for PostgreSQL) is sufficient before building custom.
- Operator lifecycle responsibility: You own the operator. When Kubernetes upgrades APIs or your application releases a new version, you must update the operator. Budget ongoing maintenance.
- Don't use for stateless apps: Deployments + Helm cover stateless application lifecycle perfectly. Operators add complexity where none is needed.
- Use when: You're running stateful applications (databases, message brokers) at scale, the operational runbook is longer than 20 pages, you have multiple teams that need to self-service cluster provisioning, or you need automated recovery from application-level failures (not just pod failures).
10. Key Takeaways
- The Operator Pattern encodes operational runbook knowledge (backup, upgrade, failover) as code in a Kubernetes controller.
- Reconciliation loops must be idempotent and level-triggered — they compare desired vs. actual state, not deltas between events.
- CRD Spec = user-desired state (never written by controller); CRD Status = observed state (owned by controller).
- Multi-step operations must checkpoint progress in Status so controller restarts resume correctly.
- Test with a fake client for unit tests and envtest for integration tests; kind-based E2E tests for full lifecycle validation; chaos testing for resilience.
- Before writing a custom operator, evaluate existing community operators (Strimzi, Zalando, etc.) — they encode years of production expertise.
Conclusion
Kubernetes Operators represent the evolution from "infrastructure managed by YAML" to "infrastructure managed by code." For teams running stateful systems at production scale, the investment in a well-designed operator pays for itself within a few months in reduced operational incidents and faster onboarding of new cluster instances.
Start by understanding the reconciliation model deeply — it's the conceptual foundation for everything else. Then choose a scaffolding framework; Kubebuilder is a solid default for Go projects (Operator SDK's Go workflow builds on it). Write your first operator for the smallest, simplest use case to build intuition before tackling complex stateful workloads.