Md Sanwar Hossain - Software Engineer

Prometheus + Grafana for Spring Boot: Custom Metrics, Dashboards & Alerting

In production microservices, "it's slow" is not an incident description — it's the beginning of a question. Prometheus and Grafana, powered by Micrometer's elegant instrumentation facade, transform that question into precise, actionable answers: which endpoint, which service instance, which JVM generation, which database query pool. This guide walks through the complete observability stack from dependency setup to Kubernetes-native deployment.

Table of Contents

  1. Why Prometheus + Grafana is the Gold Standard for Spring Boot
  2. Setting Up Prometheus with Spring Boot Actuator
  3. Custom Business Metrics with Micrometer
  4. Building Production Grafana Dashboards
  5. AlertManager Rules for Spring Boot
  6. Recording Rules for Performance
  7. Production Deployment: Prometheus Operator on Kubernetes

Why Prometheus + Grafana is the Gold Standard for Spring Boot

Prometheus Grafana Spring Boot Monitoring Architecture — mdsanwarhossain.me

The observability ecosystem for JVM workloads has converged on a clear winner: Prometheus for time-series metric collection paired with Grafana for visualization and alerting. This combination is the de facto standard in cloud-native environments, and its dominance reflects more than popularity: it is a fundamentally better architectural fit for the dynamic, ephemeral nature of Kubernetes-deployed Spring Boot services.

The pull-based model is the key architectural insight. Unlike push-based systems (StatsD, Graphite, InfluxDB with Telegraf), Prometheus initiates metric collection by scraping HTTP endpoints on each target. This inverts the data flow: services do not need to know where their metrics go, do not need network access to a central collector, and cannot accidentally overwhelm a central aggregator. When a pod dies, Prometheus simply stops receiving data from it — the dead pod cannot send erroneous final metrics that corrupt your time series. When pods scale from 3 to 30 during a traffic spike, Prometheus automatically discovers and scrapes all 30 instances without any configuration change (when using Kubernetes service discovery).

Micrometer is the bridge that makes this elegant. Rather than coupling your Spring Boot code directly to Prometheus client libraries, Micrometer provides a vendor-neutral instrumentation facade. Your application code calls Metrics.counter("orders.created") and Micrometer handles the translation to Prometheus's counter format, OpenTelemetry's metric format, Datadog's StatsD format, or any of 30+ other backends — without changing a single line of application code. This portability means your metric instrumentation is not a liability that ties you to a particular vendor.
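The facade idea can be sketched in plain Java. This is a toy illustration with invented names, not Micrometer's actual API: the application increments one counter through a single interface, and each backend renders the same value in its own wire format.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy vendor-neutral metrics facade (invented names, not Micrometer's API)
interface MetricsFacade {
    void increment(String name);
    String render(String name); // backend-specific wire format
}

class PrometheusStyleBackend implements MetricsFacade {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();
    public void increment(String name) { counters.merge(name, 1L, Long::sum); }
    public String render(String name) {
        // Prometheus exposition style: dots become underscores, counters get _total
        return name.replace('.', '_') + "_total " + counters.getOrDefault(name, 0L);
    }
}

class StatsdStyleBackend implements MetricsFacade {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();
    public void increment(String name) { counters.merge(name, 1L, Long::sum); }
    public String render(String name) {
        // StatsD line protocol: <name>:<value>|c
        return name + ":" + counters.getOrDefault(name, 0L) + "|c";
    }
}

public class FacadeDemo {
    public static void main(String[] args) {
        MetricsFacade prom = new PrometheusStyleBackend();
        MetricsFacade statsd = new StatsdStyleBackend();
        // Identical application code runs against both backends
        for (MetricsFacade m : new MetricsFacade[]{prom, statsd}) {
            m.increment("orders.created");
            m.increment("orders.created");
        }
        System.out.println(prom.render("orders.created"));   // orders_created_total 2
        System.out.println(statsd.render("orders.created")); // orders.created:2|c
    }
}
```

The same inversion is what lets Micrometer-instrumented code target Prometheus today and a different backend tomorrow without touching the call sites.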


The Prometheus data model uses labeled time series: every metric is identified by a metric name plus a set of key-value label pairs. http_server_requests_seconds_count{method="POST",status="201",uri="/api/orders"} is a distinct time series from http_server_requests_seconds_count{method="GET",status="200",uri="/api/orders/{id}"}. This labeling system enables PromQL (Prometheus Query Language) to slice and aggregate metrics across any dimension: per-endpoint error rate, per-instance latency, per-region request volume — all from a single set of instrumented metrics.
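The series-identity model can be sketched in a few lines of Java (a simplification for illustration, not Prometheus internals): a series key is the metric name plus its label pairs in canonical order, so the same name with different labels yields a different series, while label ordering does not matter.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a Prometheus time series is identified by metric name + label set
public class SeriesIdentity {
    static String seriesKey(String name, Map<String, String> labels) {
        // TreeMap sorts label names, giving one canonical key per label set
        return name + new TreeMap<>(labels);
    }

    public static void main(String[] args) {
        String post = seriesKey("http_server_requests_seconds_count",
            Map.of("method", "POST", "status", "201", "uri", "/api/orders"));
        String get = seriesKey("http_server_requests_seconds_count",
            Map.of("method", "GET", "status", "200", "uri", "/api/orders/{id}"));
        // Same metric name, different labels: two distinct time series
        System.out.println(post.equals(get)); // false
    }
}
```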

Setting Up Prometheus with Spring Boot Actuator

The setup requires three coordinated pieces: Spring Boot dependencies that expose the scrape endpoint, application configuration that controls what is exposed and how, and Prometheus configuration that tells the server where to scrape. Let's walk through each in production-ready detail.

Maven dependencies:

<dependencies>
    <!-- Spring Boot Actuator: exposes /actuator/prometheus endpoint -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- Micrometer Prometheus registry: translates Micrometer metrics to Prometheus format -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>

For Gradle users:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'io.micrometer:micrometer-registry-prometheus'
}

Application configuration (application.yml):

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized
    prometheus:
      enabled: true
  metrics:
    tags:
      # Common labels applied to ALL metrics — critical for multi-instance correlation
      application: ${spring.application.name}
      environment: ${APP_ENV:development}
      region: ${AWS_REGION:us-east-1}
    distribution:
      # Enable histogram buckets for percentile computation in Prometheus
      percentiles-histogram:
        http.server.requests: true
        spring.data.repository.invocations: true
      # SLO boundaries for SLI tracking (adds _bucket with these boundaries)
      slo:
        http.server.requests: 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s
      # Precomputed percentiles (client-side, no histogram_quantile needed)
      percentiles:
        http.server.requests: 0.50, 0.90, 0.95, 0.99
    export:
      prometheus:
        # Spring Boot 2.x property path; Boot 3.x moved this to
        # management.prometheus.metrics.export.enabled
        enabled: true

spring:
  application:
    name: order-service

Three configuration choices here deserve explanation. First, the common tags (application, environment, region) are injected into every single metric emitted by this service. When you have 15 Spring Boot microservices all sending metrics to the same Prometheus instance, these tags are what let you filter http_server_requests_seconds_count{application="order-service"} vs {application="payment-service"} in a single PromQL query without separate scrapers. Second, percentiles-histogram: true enables Prometheus histogram buckets, which allows histogram_quantile() to compute accurate percentiles server-side — this is the correct approach for multi-instance deployments because you can aggregate across instances. Third, the SLO boundaries create precise bucket boundaries aligned to your service level objectives, so you can directly query what fraction of requests exceeded your 200ms SLO.
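To make the bucket arithmetic concrete, here is a plain-Java sketch with made-up counts (not real Actuator output): because histogram buckets are cumulative, the fraction of requests meeting the 200ms SLO is simply the le="0.2" bucket count divided by the total count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: cumulative histogram buckets as exported at /actuator/prometheus.
// All counts below are made up for illustration.
public class SloFraction {
    // Fraction of requests at or under the SLO boundary: bucket(le=slo) / total
    static double fractionWithinSlo(Map<Double, Long> cumulativeBuckets,
                                    double sloSeconds, long totalCount) {
        return cumulativeBuckets.get(sloSeconds) / (double) totalCount;
    }

    public static void main(String[] args) {
        Map<Double, Long> buckets = new LinkedHashMap<>();
        buckets.put(0.05, 120L); // completed within 50ms
        buckets.put(0.1, 300L);  // within 100ms (cumulative, includes the 120)
        buckets.put(0.2, 450L);  // within 200ms, the SLO boundary
        buckets.put(0.5, 490L);
        long total = 500L;       // http_server_requests_seconds_count

        System.out.println(fractionWithinSlo(buckets, 0.2, total)); // 0.9
    }
}
```

This is exactly what a PromQL query over `_bucket{le="0.2"}` divided by `_count` computes server-side, which is why histogram buckets are the right choice for multi-instance aggregation.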

Prometheus scrape configuration (prometheus.yml) for local development:

global:
  scrape_interval: 15s       # How often to scrape targets
  evaluation_interval: 15s   # How often to evaluate alert rules
  scrape_timeout: 10s        # Timeout per scrape request

# AlertManager integration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Load alert rules from separate files
rule_files:
  - "rules/spring-boot-alerts.yml"
  - "rules/jvm-alerts.yml"
  - "rules/recording-rules.yml"

scrape_configs:
  # Spring Boot application scrape job
  - job_name: 'spring-boot-order-service'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s
    static_configs:
      - targets: ['order-service:8080']
        labels:
          service: 'order-service'
          team: 'platform'

  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node exporter for host metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

After starting your application, verify the endpoint is working:

# Verify Prometheus endpoint is accessible
curl http://localhost:8080/actuator/prometheus | head -40

# Expected output format:
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
# jvm_memory_used_bytes{application="order-service",area="heap",id="G1 Eden Space",...} 4.2991616E7
# HELP http_server_requests_seconds Duration of HTTP server request handling
# TYPE http_server_requests_seconds histogram
# http_server_requests_seconds_bucket{application="order-service",exception="None",method="GET",...,le="0.05"} 142

Custom Business Metrics with Micrometer

Micrometer Metrics Pipeline — mdsanwarhossain.me

Spring Boot auto-configures dozens of JVM and framework metrics automatically: heap usage, GC pause times, thread pool utilization, HTTP request duration, database connection pool size, and more. But the metrics that catch production incidents before they become outages are business metrics: orders created per minute, payment processing latency, checkout funnel drop-off rate, inventory cache hit ratio. These require intentional instrumentation.

Micrometer provides four core metric types, each suited to different measurement semantics.

Counter — monotonically increasing values:

@Service
public class OrderService {

    private final MeterRegistry registry;
    private final Counter ordersCreated;
    private final Counter ordersFailedCounter;

    public OrderService(MeterRegistry registry) {
        this.registry = registry;
        // Counter with tags for dimensional slicing
        this.ordersCreated = Counter.builder("orders.created")
            .description("Total number of orders successfully created")
            .tag("region", "us-east-1")
            .register(registry);
        this.ordersFailedCounter = Counter.builder("orders.failed")
            .description("Total number of order creation failures")
            .register(registry);
    }

    public Order createOrder(CreateOrderRequest request) {
        try {
            Order order = processOrder(request);
            // Increment with dynamic tags per call
            registry.counter("orders.created",
                "payment_method", request.getPaymentMethod(),
                "customer_tier", request.getCustomerTier()
            ).increment();
            return order;
        } catch (PaymentDeclinedException e) {
            registry.counter("orders.failed",
                "reason", "payment_declined",
                "payment_method", request.getPaymentMethod()
            ).increment();
            throw e;
        } catch (InventoryException e) {
            registry.counter("orders.failed",
                "reason", "out_of_stock"
            ).increment();
            throw e;
        }
    }
}

Gauge — point-in-time values that go up and down:

@Service
public class CartService {

    private final AtomicInteger activeCheckouts = new AtomicInteger(0);
    private final Map<String, Cart> activeCarts = new ConcurrentHashMap<>();
    private final Queue<Order> orderProcessingQueue = new ConcurrentLinkedQueue<>();

    public CartService(MeterRegistry registry) {
        // Gauge tracking active checkout sessions
        Gauge.builder("checkout.active.sessions", activeCheckouts, AtomicInteger::get)
            .description("Number of checkout sessions currently in progress")
            .register(registry);

        // Gauge tracking total carts in memory
        Gauge.builder("cart.in.memory.count", activeCarts, Map::size)
            .description("Number of active shopping carts held in memory")
            .tag("store", "us-east")
            .register(registry);

        // JVM-style: report queue depth of a background processor
        registry.gauge("order.processing.queue.depth", orderProcessingQueue,
            queue -> (double) queue.size());
    }

    public CheckoutSession startCheckout(String cartId) {
        activeCheckouts.incrementAndGet();
        try {
            return initiateCheckoutFlow(cartId);
        } catch (RuntimeException e) {
            // Roll back the gauge if the checkout flow failed to start
            activeCheckouts.decrementAndGet();
            throw e;
        }
    }

    public void completeCheckout(String sessionId) {
        activeCheckouts.decrementAndGet();
        // Grafana shows real-time fluctuation, great for capacity planning
    }
}

Timer — latency and throughput for operations:

@Service
public class PaymentService {

    private final MeterRegistry registry;
    private final Timer paymentTimer;
    private final Timer externalGatewayTimer;

    public PaymentService(MeterRegistry registry) {
        this.registry = registry;  // retained for Timer.start() in the async flow
        this.paymentTimer = Timer.builder("payment.processing.duration")
            .description("Time taken to process a payment end-to-end")
            .publishPercentileHistogram()          // Enable histogram_quantile in Prometheus
            .minimumExpectedValue(Duration.ofMillis(50))
            .maximumExpectedValue(Duration.ofSeconds(10))
            .serviceLevelObjectives(Duration.ofMillis(200), Duration.ofMillis(500), Duration.ofSeconds(1))
            .register(registry);

        this.externalGatewayTimer = Timer.builder("payment.gateway.duration")
            .description("Time taken for the external payment gateway to respond")
            .publishPercentileHistogram()
            .register(registry);
    }

    public PaymentResult processPayment(PaymentRequest request) {
        return paymentTimer.record(() -> {
            // Timer records duration of this supplier automatically
            String gatewayRef = externalGatewayTimer.record(() ->
                stripeClient.charge(request.getAmount(), request.getToken())
            );
            return new PaymentResult(gatewayRef, PaymentStatus.SUCCESS);
        });
    }

    // Alternative: explicit start/stop for async flows
    public CompletableFuture<PaymentResult> processPaymentAsync(PaymentRequest request) {
        Timer.Sample sample = Timer.start(registry);
        return paymentGatewayAsync.charge(request)
            .whenComplete((result, ex) -> {
                sample.stop(Timer.builder("payment.async.duration")
                    .tag("status", ex == null ? "success" : "error")
                    .register(registry));
            });
    }
}

DistributionSummary — non-time value distributions:

@Service
public class ApiGatewayMetrics {

    private final DistributionSummary requestSizeSummary;
    private final DistributionSummary responseSizeSummary;
    private final DistributionSummary orderValueSummary;

    public ApiGatewayMetrics(MeterRegistry registry) {
        this.requestSizeSummary = DistributionSummary.builder("http.request.size.bytes")
            .description("HTTP request payload size in bytes")
            .baseUnit("bytes")
            .publishPercentileHistogram()
            .register(registry);

        this.responseSizeSummary = DistributionSummary.builder("http.response.size.bytes")
            .description("HTTP response payload size in bytes")
            .baseUnit("bytes")
            .publishPercentileHistogram()
            .register(registry);

        this.orderValueSummary = DistributionSummary.builder("order.value.cents")
            .description("Monetary value of placed orders in USD cents")
            .baseUnit("cents")
            .publishPercentileHistogram()
            .serviceLevelObjectives(1000, 5000, 10000, 50000) // $10, $50, $100, $500 boundaries
            .register(registry);
    }

    public void recordRequest(int requestBytes, int responseBytes) {
        requestSizeSummary.record(requestBytes);
        responseSizeSummary.record(responseBytes);
    }

    public void recordOrderPlaced(BigDecimal orderValueUsd) {
        // Record GMV — histogram_quantile gives P50/P95 order value
        orderValueSummary.record(orderValueUsd.multiply(BigDecimal.valueOf(100)).longValue());
    }
}

Naming conventions and tagging strategy: Prometheus naming conventions use lowercase with underscores. Micrometer uses dots (e.g., orders.created) and automatically converts them to underscores when exporting to Prometheus, appending a _total suffix to counters (orders_created_total). Keep tag cardinality controlled — a tag with unbounded values (like user ID or request ID) creates a cardinality explosion that will OOM your Prometheus server. Use tags only for bounded categorical dimensions: status, method, region, payment_method. Never use unbounded values as tags.
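The conversion can be sketched as a simplified transform (Micrometer's real Prometheus naming convention handles more cases, such as appending base units and sanitizing leading digits):

```java
// Simplified sketch of Micrometer's Prometheus naming convention:
// non-alphanumeric characters become underscores, counters get a _total suffix.
public class PrometheusNaming {
    static String toPrometheusCounterName(String micrometerName) {
        // Replace any character outside [a-zA-Z0-9_] with '_'
        String sanitized = micrometerName.replaceAll("[^a-zA-Z0-9_]", "_");
        return sanitized.endsWith("_total") ? sanitized : sanitized + "_total";
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusCounterName("orders.created")); // orders_created_total
        System.out.println(toPrometheusCounterName("orders.failed"));  // orders_failed_total
    }
}
```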

Building Production Grafana Dashboards

A production Grafana dashboard for Spring Boot typically consists of three tiers: the "golden signals" overview row (request rate, error rate, latency), the JVM internals row (heap, GC, threads), and the business metrics row (orders/sec, GMV, funnel metrics). Start by importing the community JVM dashboard and layer your custom panels on top.

Import the standard JVM Micrometer dashboard: In Grafana, navigate to Dashboards → Import and enter ID 4701 (JVM Micrometer) or 12900 (Spring Boot 2.1 Statistics). Select your Prometheus datasource. This immediately gives you heap usage per generation, GC pause time and frequency, thread state breakdown, CPU usage, and class loading — without writing a single PromQL query.

Custom PromQL queries for production panels:

# ── REQUEST RATE (successful requests per second, excluding 5xx; 5-minute rate) ──
rate(http_server_requests_seconds_count{
  application="order-service",
  status!~"5.."
}[5m])

# ── ERROR RATE (percentage of 5xx responses) ──
(
  rate(http_server_requests_seconds_count{
    application="order-service",
    status=~"5.."
  }[5m])
  /
  rate(http_server_requests_seconds_count{
    application="order-service"
  }[5m])
) * 100

# ── LATENCY P95 (95th percentile request duration) ──
histogram_quantile(0.95,
  sum(rate(http_server_requests_seconds_bucket{
    application="order-service"
  }[5m])) by (le, uri)
)

# ── JVM HEAP USAGE (used vs committed) ──
jvm_memory_used_bytes{application="order-service", area="heap"}
/
jvm_memory_committed_bytes{application="order-service", area="heap"}

# ── GC PAUSE TIME RATE (seconds of GC pause per second) ──
rate(jvm_gc_pause_seconds_sum{application="order-service"}[5m])

# ── HikariCP CONNECTION POOL UTILIZATION ──
hikaricp_connections_active{application="order-service"}
/
hikaricp_connections_max{application="order-service"}

# ── BUSINESS METRIC: Orders per Minute ──
rate(orders_created_total{application="order-service"}[1m]) * 60

# ── BUSINESS METRIC: Payment Success Rate ──
(
  rate(payment_processing_duration_seconds_count{
    application="order-service",
    status="success"
  }[5m])
  /
  rate(payment_processing_duration_seconds_count{
    application="order-service"
  }[5m])
) * 100

# ── APDEX SCORE (Application Performance Index) ──
# Satisfied: < 200ms | Tolerating: 200ms-1s | Frustrated: > 1s
# Buckets are cumulative, so halving the sum of the two buckets
# yields (satisfied + tolerating/2) / total
(
  sum(rate(http_server_requests_seconds_bucket{
    application="order-service", le="0.2"
  }[5m]))
  +
  sum(rate(http_server_requests_seconds_bucket{
    application="order-service", le="1.0"
  }[5m]))
) / 2 / sum(rate(http_server_requests_seconds_count{
    application="order-service"
}[5m]))
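The Apdex arithmetic is worth checking by hand. Because buckets are cumulative, the le="1.0" count already includes the satisfied requests, so halving the sum of the two buckets gives satisfied + tolerating/2. A plain-Java sketch with made-up counts:

```java
// Apdex from cumulative histogram buckets (all counts made up for illustration).
// satisfied = requests <= 0.2s; tolerating = requests in (0.2s, 1.0s].
public class ApdexCheck {
    static double apdex(long bucketLe02, long bucketLe10, long total) {
        // Cumulative buckets: (satisfied + (satisfied + tolerating)) / 2 / total
        //                   = (satisfied + tolerating/2) / total
        return (bucketLe02 + bucketLe10) / 2.0 / total;
    }

    public static void main(String[] args) {
        System.out.println(apdex(100, 100, 100)); // all satisfied   -> 1.0
        System.out.println(apdex(0, 100, 100));   // all tolerating  -> 0.5
        System.out.println(apdex(80, 95, 100));   // mixed           -> 0.875
    }
}
```

Dividing the whole bucket sum by two, rather than only the second term, is what keeps the score bounded at 1.0 when every request is satisfied.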

Dashboard provisioning as code — store dashboards in version control with Grafana's provisioning configuration (grafana/provisioning/dashboards/order-service.json) so they are automatically loaded on startup. This enables infrastructure-as-code workflows where dashboard changes go through code review before reaching production.

Configure the datasource via grafana/provisioning/datasources/prometheus.yml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      timeInterval: "15s"
      queryTimeout: "60s"
      httpMethod: POST
    editable: false

AlertManager Rules for Spring Boot

Alerts defined in Prometheus fire when a PromQL expression evaluates to a non-empty result. AlertManager then routes, deduplicates, groups, and delivers those alerts to the correct receiver (Slack, PagerDuty, email). Defining alerts is a two-file exercise: the Prometheus alert rules file and the AlertManager routing configuration.

Prometheus alert rules file (rules/spring-boot-alerts.yml):

groups:
  - name: spring-boot-slo-alerts
    interval: 30s
    rules:
      # SLO Alert: Error rate exceeds 1% over 5 minutes
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_server_requests_seconds_count{
              application="order-service",status=~"5.."
            }[5m]))
            /
            sum(rate(http_server_requests_seconds_count{
              application="order-service"
            }[5m]))
          ) > 0.01
        for: 5m
        labels:
          severity: critical
          team: platform
          service: order-service
        annotations:
          summary: "High HTTP error rate on {{ $labels.application }}"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes. SLO threshold: 1%."
          runbook_url: "https://wiki.internal/runbooks/high-error-rate"
          dashboard_url: "https://grafana.internal/d/spring-boot-slo"

      # SLO Alert: P95 latency exceeds 500ms
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_server_requests_seconds_bucket{
              application="order-service"
            }[5m])) by (le)
          ) > 0.5
        for: 3m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "P95 latency SLO breach on order-service"
          description: "P95 request latency is {{ $value | humanizeDuration }}. Threshold: 500ms."

      # Alert: Pod is down (no scrape data)
      - alert: SpringBootInstanceDown
        expr: up{job="spring-boot-order-service"} == 0
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Spring Boot instance {{ $labels.instance }} is down"
          description: "Prometheus cannot scrape {{ $labels.instance }}. Pod may be unhealthy."

  - name: jvm-alerts
    rules:
      # JVM heap usage consistently above 90%
      - alert: JvmHeapUsageCritical
        expr: |
          (
            jvm_memory_used_bytes{area="heap"}
            /
            jvm_memory_max_bytes{area="heap"}
          ) > 0.90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "JVM heap usage critical on {{ $labels.application }}"
          description: "Heap usage is {{ $value | humanizePercentage }}. Risk of OOM. Check for memory leaks."

      # GC is spending too much time collecting
      - alert: JvmGcOverhead
        expr: |
          rate(jvm_gc_pause_seconds_sum{application="order-service"}[5m]) > 0.10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GC overhead > 10% on {{ $labels.application }}"
          description: "GC is consuming {{ $value | humanizePercentage }} of CPU time. Possible GC thrashing."

      # HikariCP connection pool exhausted
      - alert: HikariCPPoolExhausted
        expr: |
          hikaricp_connections_pending{application="order-service"} > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "HikariCP connection pool under pressure"
          description: "{{ $value }} threads waiting for DB connection. Increase pool size or optimize queries."

  - name: business-alerts
    rules:
      # Sudden drop in order creation rate (possible outage)
      - alert: OrderRateDrop
        expr: |
          (
            rate(orders_created_total{application="order-service"}[5m])
            /
            rate(orders_created_total{application="order-service"}[30m] offset 5m)
          ) < 0.5
        for: 3m
        labels:
          severity: critical
          team: business
        annotations:
          summary: "Order creation rate dropped below 50% of baseline"
          description: "Current rate is {{ $value | humanize }}x the baseline. Possible checkout funnel failure."

AlertManager configuration (alertmanager.yml):

global:
  resolve_timeout: 5m
  slack_api_url: '${SLACK_WEBHOOK_URL}'
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

# Notification templates
templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  # Default receiver for unmatched alerts
  receiver: 'slack-notifications'
  # Group alerts by these labels to reduce notification noise
  group_by: ['alertname', 'service', 'severity']
  # Wait this long before sending a group (collects related alerts)
  group_wait: 30s
  # Wait before resending a group with new alerts
  group_interval: 5m
  # Wait before re-notifying about an already-firing alert
  repeat_interval: 4h

  routes:
    # Critical alerts page the on-call engineer immediately
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      group_wait: 10s
      repeat_interval: 1h
      continue: true  # Also send to Slack

    # Business alerts go to the business team Slack channel
    - match:
        team: business
      receiver: 'slack-business-team'

    # Platform team gets all alerts in #platform-alerts
    - match_re:
        team: platform|infrastructure
      receiver: 'slack-platform-team'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts-all'
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        send_resolved: true
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'

  - name: 'slack-platform-team'
    slack_configs:
      - channel: '#platform-alerts'
        title: '[{{ .Status | toUpper }}] {{ .CommonAnnotations.summary }}'
        text: |
          *Description:* {{ .CommonAnnotations.description }}
          *Runbook:* {{ .CommonAnnotations.runbook_url }}
          *Dashboard:* {{ .CommonAnnotations.dashboard_url }}
        send_resolved: true

  - name: 'slack-business-team'
    slack_configs:
      - channel: '#business-alerts'
        title: '🚨 Business Alert: {{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: '${PAGERDUTY_INTEGRATION_KEY}'
        description: '{{ .CommonAnnotations.summary }}'
        details:
          description: '{{ .CommonAnnotations.description }}'
          runbook: '{{ .CommonAnnotations.runbook_url }}'
        severity: critical

inhibit_rules:
  # If the instance is down, suppress all other alerts from it
  - source_match:
      alertname: SpringBootInstanceDown
    target_match_re:
      alertname: '.+'
    equal: ['instance']

Recording Rules for Performance

As your Prometheus instance scales to hundreds of Spring Boot pods and your dashboards grow to dozens of panels, you will encounter a critical performance problem: expensive PromQL queries executed by every dashboard load and every alert evaluation create significant CPU load on Prometheus. Recording rules precompute these expensive queries and store the results as new time series, converting repeated expensive computations into cheap label lookups.

Recording rules file (rules/recording-rules.yml):

groups:
  - name: spring-boot-recording-rules
    interval: 30s
    rules:
      # Precompute 5-minute request rate per service, method, and status
      - record: job:http_server_requests:rate5m
        expr: |
          sum(rate(http_server_requests_seconds_count[5m]))
          by (application, method, status, uri)

      # Precompute error rate ratio (avoids division in every dashboard panel)
      - record: job:http_server_requests:error_rate5m
        expr: |
          sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          by (application)
          /
          sum(rate(http_server_requests_seconds_count[5m]))
          by (application)

      # Precompute P95 latency per service (expensive histogram_quantile)
      - record: job:http_server_requests:p95_duration5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_server_requests_seconds_bucket[5m]))
            by (le, application)
          )

      # Precompute P99 latency per service
      - record: job:http_server_requests:p99_duration5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_server_requests_seconds_bucket[5m]))
            by (le, application)
          )

      # JVM heap utilization ratio
      - record: job:jvm_heap:utilization
        expr: |
          sum(jvm_memory_used_bytes{area="heap"}) by (application, instance)
          /
          sum(jvm_memory_max_bytes{area="heap"}) by (application, instance)

      # HikariCP pool utilization ratio
      - record: job:hikaricp_pool:utilization
        expr: |
          hikaricp_connections_active
          /
          hikaricp_connections_max

After recording rules are defined, dashboards reference the precomputed series: job:http_server_requests:p95_duration5m{application="order-service"} instead of the expensive raw histogram query. Query time drops from seconds to milliseconds for complex aggregations across hundreds of instances.

Understanding rate() vs irate(): rate(counter[5m]) computes the per-second average rate over the last 5 minutes — smooth, representative of sustained load, suitable for dashboards and alerts. irate(counter[5m]) computes the instantaneous rate using only the last two data points — highly responsive to spikes, but noisy. Use rate() for alert rules (to avoid false positives from short spikes) and irate() sparingly in dashboards when you need to see fast-changing behavior. In recording rules, always use rate().
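The difference can be demonstrated with a plain-Java simplification (ignoring Prometheus's window-edge extrapolation and counter-reset handling): rate() averages over the full window, while irate() uses only the last two samples.

```java
// Simplified rate() vs irate() over counter samples of (timestampSeconds, value).
// Real Prometheus also extrapolates to the window edges and handles counter resets.
public class RateVsIrate {
    static double rate(double[][] samples) {
        double[] first = samples[0], last = samples[samples.length - 1];
        return (last[1] - first[1]) / (last[0] - first[0]);
    }

    static double irate(double[][] samples) {
        double[] prev = samples[samples.length - 2], last = samples[samples.length - 1];
        return (last[1] - prev[1]) / (last[0] - prev[0]);
    }

    public static void main(String[] args) {
        // Steady 10 req/s across the first intervals, then a spike to 100 req/s
        double[][] samples = {
            {0, 0}, {15, 150}, {30, 300}, {45, 450}, {60, 1950}
        };
        System.out.println(rate(samples));  // (1950-0)/60   = 32.5  (smoothed)
        System.out.println(irate(samples)); // (1950-450)/15 = 100.0 (instantaneous)
    }
}
```

The smoothed 32.5 is what you want driving an alert threshold; the spiky 100.0 is what you want when eyeballing a fast-moving graph.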

Avoiding cardinality explosions: The most common Prometheus performance disaster is unbounded label cardinality. If you accidentally instrument orders_created_total{user_id="..."} across 1 million users, you create 1 million distinct time series from a single metric. Rules of thumb: never use user IDs, request IDs, order IDs, or any UUID as a label value. Use bounded categorical values only (10s to 100s of distinct values, not millions). Monitor your cardinality with prometheus_tsdb_head_series and alert when it grows unexpectedly.
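A quick sanity check before adding a tag: the worst-case series count for one metric is the product of the distinct values of each label. A plain-Java sketch (label names and counts are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Worst-case series count for one metric = product of distinct values per label.
public class CardinalityEstimate {
    static long worstCaseSeries(Map<String, Integer> labelCardinalities) {
        long product = 1;
        for (int distinct : labelCardinalities.values()) product *= distinct;
        return product;
    }

    public static void main(String[] args) {
        Map<String, Integer> labels = new LinkedHashMap<>();
        labels.put("method", 5);
        labels.put("status", 10);
        labels.put("uri", 40);
        System.out.println(worstCaseSeries(labels)); // 2000 series: fine

        labels.put("user_id", 1_000_000);            // unbounded label added
        System.out.println(worstCaseSeries(labels)); // 2000000000 series: disaster
    }
}
```

One unbounded label multiplies everything else, which is why a single user_id tag can take a healthy metric into the billions of series.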

Production Deployment: Prometheus Operator on Kubernetes

Running Prometheus as a static deployment in Kubernetes means manually updating scrape configurations every time a new Spring Boot service is deployed. The Prometheus Operator solves this with Kubernetes-native CRDs: you define a ServiceMonitor resource alongside your Spring Boot deployment, and the Operator automatically configures Prometheus to scrape it. No central Prometheus config file edits, no restarts — entirely declarative and GitOps-compatible.

Install kube-prometheus-stack via Helm:

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the complete monitoring stack
helm upgrade --install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values prometheus-values.yml \
  --wait

Production Helm values (prometheus-values.yml):

prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: "50GB"
    resources:
      requests:
        memory: "2Gi"
        cpu: "500m"
      limits:
        memory: "4Gi"
        cpu: "2000m"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 100Gi
    # Auto-discover ServiceMonitors from any namespace
    serviceMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    serviceMonitorNamespaceSelector: {}
    # Remote write to Thanos for long-term storage
    remoteWrite:
      - url: http://thanos-receive:10908/api/v1/receive

grafana:
  enabled: true
  adminPassword: "${GRAFANA_ADMIN_PASSWORD}"
  persistence:
    enabled: true
    storageClassName: gp3
    size: 20Gi
  grafana.ini:
    server:
      domain: grafana.yourdomain.com
      root_url: https://grafana.yourdomain.com
    auth.generic_oauth:
      enabled: true
      client_id: "${OAUTH_CLIENT_ID}"
      client_secret: "${OAUTH_CLIENT_SECRET}"
  sidecar:
    dashboards:
      enabled: true          # Auto-load ConfigMap dashboards
      searchNamespace: ALL
    datasources:
      enabled: true

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 10Gi
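A quick sanity check on `retention` versus `retentionSize`: Prometheus deletes old blocks when either limit is exceeded, whichever comes first. Using the common rule of thumb of roughly 1-2 bytes per compressed sample, a back-of-the-envelope estimate can tell you which limit will govern. All numbers below (series count, scrape interval) are hypothetical:

```python
# Rough Prometheus disk estimate. Rule of thumb: ~1-2 bytes per sample
# after TSDB compression; we use 2 bytes to be conservative.
def estimate_disk_gib(active_series, scrape_interval_s, retention_days,
                      bytes_per_sample=2.0):
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return samples_per_second * retention_seconds * bytes_per_sample / 2**30

# Hypothetical fleet: 500k active series scraped every 15s, kept 15 days.
print(f"~{estimate_disk_gib(500_000, 15, 15):.0f} GiB")  # ~80 GiB
```

With these assumed numbers the 15-day estimate (~80 GiB) already exceeds the 50GB `retentionSize`, so size-based deletion would kick in well before the 15-day window elapses; plug in your real series count and adjust one limit or the other so they agree.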

ServiceMonitor CRD for your Spring Boot deployment:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-service-monitor
  namespace: production
  labels:
    # Must match the Prometheus operator's serviceMonitorSelector
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: order-service   # Matches your Service labels
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http            # Named port in the Service spec
      path: /actuator/prometheus
      interval: 15s
      scrapeTimeout: 10s
      honorLabels: false
      relabelings:
        # Add pod name as an instance label for fine-grained filtering
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
      metricRelabelings:
        # Drop high-cardinality Spring Boot endpoint metrics we don't need
        - sourceLabels: [uri]
          regex: "/actuator/.*"
          action: drop
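The `metricRelabelings` above follow standard Prometheus relabeling semantics: the `sourceLabels` values are joined with a separator, the regex is fully anchored (it must match the entire joined string, not a substring), and only then does the `drop` action discard the sample. A small Python sketch of that behavior, with hypothetical sample labels:

```python
import re

# Minimal sketch of Prometheus `drop` relabeling semantics: the regex is
# fully anchored, so it must match the ENTIRE joined source-label value.
def keep_sample(labels, source_labels, regex, separator=";"):
    value = separator.join(labels.get(l, "") for l in source_labels)
    return re.fullmatch(regex, value) is None  # drop only on a full match

samples = [
    {"uri": "/actuator/prometheus"},  # dropped: matches /actuator/.*
    {"uri": "/api/orders"},           # kept: no full match
]
kept = [s for s in samples if keep_sample(s, ["uri"], r"/actuator/.*")]
print(kept)  # only /api/orders survives
```

The anchoring matters: a `uri` like `/api/actuator-ish` would survive this rule, because `/actuator/.*` does not match it end to end.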

Spring Boot Deployment with correct labels and port naming:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: "2.1.0"
      annotations:
        # Prometheus annotations for basic scraping (fallback without Operator)
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      containers:
        - name: order-service
          image: your-registry/order-service:2.1.0
          ports:
            - name: http       # Named port — referenced by ServiceMonitor
              containerPort: 8080
          env:
            - name: APP_ENV
              value: production
            - name: AWS_REGION
              # Node labels such as topology.kubernetes.io/region are NOT
              # exposed to pods through the Downward API (fieldRef only reads
              # the pod's own metadata), so set the region explicitly or
              # inject it from a ConfigMap. "us-east-1" is a placeholder.
              value: "us-east-1"
              # Node labels such as topology.kubernetes.io/region are NOT
              # exposed to pods through the Downward API (fieldRef only reads
              # the pod's own metadata), so set the region explicitly or
              # inject it from a ConfigMap. "us-east-1" is a placeholder.
              value: "us-east-1"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app: order-service     # Must match ServiceMonitor selector
spec:
  selector:
    app: order-service
  ports:
    - name: http           # Named port — must match ServiceMonitor endpoint port
      port: 8080
      targetPort: 8080
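Note that the ServiceMonitor's `spec.selector` selects Services, not Pods, and the match is a subset test: the Service may carry extra labels beyond those in `matchLabels`. The following sketch illustrates the matching rule (the `team: payments` label is hypothetical):

```python
# Sketch of how the Operator matches a ServiceMonitor to Services: the
# selector's matchLabels must be a subset of the Service's labels.
def selector_matches(match_labels, service_labels):
    return all(service_labels.get(k) == v for k, v in match_labels.items())

service_labels = {"app": "order-service", "team": "payments"}
assert selector_matches({"app": "order-service"}, service_labels)  # extra labels are fine
assert not selector_matches({"app": "order-service"}, {"app": "user-service"})
```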

PrometheusRule CRD for alert rules. With the Prometheus Operator, alert rules are also managed as Kubernetes resources rather than static config files:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: order-service-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: order-service
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: |
            (
              sum(rate(http_server_requests_seconds_count{
                namespace="production",
                pod=~"order-service-.*",
                status=~"5.."
              }[5m]))
              /
              sum(rate(http_server_requests_seconds_count{
                namespace="production",
                pod=~"order-service-.*"
              }[5m]))
            ) > 0.01
          for: 5m
          labels:
            severity: critical
            service: order-service
          annotations:
            summary: "order-service error rate exceeds SLO"
            description: "Error rate: {{ $value | humanizePercentage }}"
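To see why the `> 0.01` threshold corresponds to a 1% error-rate SLO, here is the same arithmetic the expression performs, with hypothetical per-pod request rates (the `rate(...[5m])` results for each of three pods):

```python
# Worked example of the HighErrorRate expression. Numbers are invented:
# per-pod req/s over the 5m rate window, summed across pods before dividing.
rate_5xx_per_pod   = [0.5, 0.4, 0.6]     # 5xx req/s for each of 3 pods
rate_total_per_pod = [40.0, 35.0, 45.0]  # total req/s for each pod

error_ratio = sum(rate_5xx_per_pod) / sum(rate_total_per_pod)
assert error_ratio > 0.01  # above the 1% threshold; fires if sustained for 5m
print(round(error_ratio, 4))  # 0.0125
```

Summing before dividing is deliberate: averaging per-pod ratios would let a low-traffic pod with a high error ratio skew the result, while the summed form weights each pod by its actual traffic.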

With this setup, adding observability to a new Spring Boot service requires only two Kubernetes YAML files: a ServiceMonitor pointing at the service's /actuator/prometheus endpoint and a PrometheusRule with the service's alert definitions. The Operator picks them up within 30 seconds and Prometheus begins scraping automatically — no central configuration file to edit, no Prometheus restarts required.

Production Checklist

  • Add application, environment, and region common tags in management.metrics.tags
  • Enable percentiles-histogram: true for all latency-sensitive endpoints
  • Define SLO boundaries with management.metrics.distribution.slo
  • Use ServiceMonitor CRD instead of static scrape configs in Kubernetes
  • Implement recording rules for all expensive PromQL queries used in dashboards
  • Configure inhibit_rules in AlertManager to suppress cascading alert storms
  • Set Prometheus retention + remote write to Thanos for long-term metric storage
  • Monitor Prometheus itself: alert on prometheus_tsdb_head_series > 5000000
  • Store Grafana dashboards as JSON in version control and provision via ConfigMaps
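The cardinality items in the checklist come down to simple multiplication: for one metric name, the number of head series is the product of the distinct values per label. A sketch with hypothetical counts shows why a `uri` label with raw request paths (instead of Micrometer's templated form like `/orders/{id}`) blows past the 5,000,000-series alert threshold:

```python
from math import prod

# Head-series count for one metric name = product of distinct values per
# label. Hypothetical counts for http_server_requests_seconds_count:
bounded   = prod([40, 4, 8, 30])         # 40 templated routes x method x status x pod
unbounded = prod([1_000_000, 4, 8, 30])  # raw paths (e.g. /orders/12345) leaking IDs
print(bounded, unbounded)  # 38400 960000000
```

This is why the earlier `metricRelabelings` drop unneeded high-cardinality series at scrape time, and why URI templating on the application side is non-negotiable.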

The Prometheus + Grafana + Micrometer stack gives Spring Boot teams the precision instrumentation needed to move from reactive firefighting to proactive reliability engineering. With SLO-based alerts, business metrics dashboards, and GitOps-managed observability configurations, you gain the visibility to confidently deploy to production and catch issues before users report them — the hallmark of a mature engineering organization.


Last updated: April 5, 2026