
Node.js Production Best Practices: Scalability, Security & Performance in 2026

Node.js powers a significant slice of the internet's backend — from Netflix and LinkedIn to Shopify and NASA. Yet the gap between a working Node.js app and a production-grade Node.js system is enormous. This comprehensive guide covers everything you need to run Node.js at scale in 2026: event loop internals, security hardening, performance tuning, clustering, memory management, and Kubernetes deployment.

Md Sanwar Hossain · April 8, 2026 · 24 min read

TL;DR — The Node.js Production Survival Kit

"Use Fastify over Express for new APIs (3× throughput). Enable clustering to saturate all CPU cores. Set --max-old-space-size explicitly and monitor heap with v8.getHeapStatistics(). Handle every unhandledRejection and uncaughtException. Add Helmet + rate limiting + Zod validation as non-negotiable middleware. Always implement graceful shutdown with a 30-second drain window before Kubernetes sends SIGTERM."

Table of Contents

  1. Node.js Architecture: Event Loop, Libuv & Worker Threads
  2. Project Structure: Layered Architecture for Node.js
  3. Error Handling Best Practices
  4. Express vs Fastify in 2026: Benchmarks & Migration
  5. Security Hardening: Helmet, Rate Limiting & Input Validation
  6. Clustering & Worker Threads for CPU-Bound Tasks
  7. Memory Management & Leak Detection
  8. Logging Best Practices: Pino, Winston & Correlation IDs
  9. Database Connection Pooling
  10. Health Checks, Graceful Shutdown & Zero-Downtime Deploys
  11. Kubernetes Deployment: Resource Limits, HPA & Probes
  12. Monitoring & Observability: OpenTelemetry & Prometheus
  13. Production Checklist

1. Node.js Architecture: Event Loop, Libuv & Worker Threads

Understanding Node.js internals is not academic trivia — it directly dictates how you architect services and diagnose production latency spikes. Node.js is single-threaded by design, but it is not single-process. The event loop runs on a single thread; I/O operations are delegated to libuv's thread pool and OS-level async APIs.

The Event Loop Phases

The event loop cycles through six phases on every tick. Understanding each phase prevents subtle ordering bugs in production:

  1. Timers — expired setTimeout() and setInterval() callbacks
  2. Pending callbacks — deferred I/O callbacks (e.g. some TCP error callbacks)
  3. Idle / prepare — libuv internals
  4. Poll — retrieve new I/O events and run their callbacks; the loop may block here waiting for I/O
  5. Check — setImmediate() callbacks
  6. Close callbacks — e.g. socket.on('close', ...)

Microtask queues (Promise.resolve(), queueMicrotask()) are drained between individual callbacks (since Node 11), not just between phases. process.nextTick() has its own queue that runs before any I/O event, even before resolved Promises — overuse causes starvation.

Libuv Thread Pool

Libuv maintains a thread pool (default size: 4) for operations that cannot be made non-blocking at the OS level: dns.lookup() calls, most fs.* operations, CPU-heavy crypto functions (pbkdf2, scrypt, randomBytes), and zlib compression. In production, set UV_THREADPOOL_SIZE to match your I/O concurrency — typically 8–16 for API servers:

# In your Kubernetes Deployment or .env
UV_THREADPOOL_SIZE=16
NODE_OPTIONS="--max-old-space-size=2048"

Worker Threads for CPU-Bound Work

Node.js 12+ ships worker_threads as stable. Use them for CPU-intensive operations that would otherwise block the event loop — image resizing, PDF generation, JSON parsing of large payloads, and cryptographic operations:

// worker-pool.js — reusable worker thread pool
const { Worker, isMainThread, parentPort } = require('worker_threads');
const os = require('os');

// Main thread: create a pool of workers
if (isMainThread) {
  const POOL_SIZE = Math.max(1, os.cpus().length - 1); // Leave one core for the event loop
  const pool = [];

  function createWorker() {
    const worker = new Worker(__filename);
    worker.on('error', console.error);
    worker.on('exit', () => { /* restart if needed */ });
    return worker;
  }

  for (let i = 0; i < POOL_SIZE; i++) pool.push(createWorker());

  module.exports.runTask = (data) => new Promise((resolve, reject) => {
    const worker = pool.shift();
    if (!worker) {
      // All workers busy — a real pool would queue the task here instead of rejecting
      return reject(new Error('Worker pool exhausted'));
    }
    // Pair the listeners so whichever fires first removes the other —
    // otherwise stale once() listeners accumulate across tasks
    const onMessage = (result) => { worker.removeListener('error', onError); pool.push(worker); resolve(result); };
    const onError = (err) => { worker.removeListener('message', onMessage); reject(err); /* do not reuse a failed worker */ };
    worker.once('message', onMessage);
    worker.once('error', onError);
    worker.postMessage(data);
  });
} else {
  // Worker thread: process CPU-intensive work
  parentPort.on('message', (data) => {
    const result = heavyCpuWork(data); // Replace with your CPU-bound function
    parentPort.postMessage(result);
  });
}
[Figure] Node.js Production Architecture — event loop internals, clustering, worker threads, and Kubernetes deployment topology. Source: mdsanwarhossain.me

2. Project Structure: Layered Architecture for Node.js

The biggest architectural mistake in Node.js projects is placing all business logic inside route handlers. A layered architecture (routes → controllers → services → repositories) is testable, maintainable, and scales to large teams.

Recommended Directory Structure

src/
├── app.js              # Express/Fastify app setup (no listen())
├── server.js           # Entry point: listen(), clustering, graceful shutdown
├── config/
│   ├── index.js        # Validated env config (dotenv + Joi/Zod)
│   └── database.js     # DB connection pool setup
├── routes/
│   └── user.routes.js  # Route definitions only — no logic
├── controllers/
│   └── user.controller.js  # HTTP layer: parse req, call service, send res
├── services/
│   └── user.service.js     # Business logic — framework-agnostic
├── repositories/
│   └── user.repository.js  # DB queries — swap pg/prisma/mongoose here
├── middleware/
│   ├── auth.middleware.js
│   ├── error.middleware.js  # Central error handler
│   └── validate.middleware.js
├── utils/
│   ├── logger.js
│   └── AppError.js         # Custom error class
└── __tests__/

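A minimal sketch of how the layers stay separated (the factory names and fields below are hypothetical, not modules from this repo): the controller knows HTTP, the service knows business rules, the repository knows SQL — each is testable in isolation by injecting the layer below it.

```javascript
// Hypothetical slice of the layering: repository → service → controller.

// repositories/user.repository.js — DB access only
const makeUserRepository = (db) => ({
  findByEmail: (email) =>
    db.query('SELECT * FROM users WHERE email = $1', [email]).then((r) => r.rows[0]),
});

// services/user.service.js — business logic, framework-agnostic
const makeUserService = (repo) => ({
  async getProfile(email) {
    const user = await repo.findByEmail(email);
    if (!user) throw new Error('User not found'); // AppError in a real codebase
    return { name: user.name, email: user.email };
  },
});

// controllers/user.controller.js — HTTP concerns only
const makeUserController = (service) => ({
  getProfile: async (req, res) => {
    const profile = await service.getProfile(req.params.email);
    res.json(profile);
  },
});

module.exports = { makeUserRepository, makeUserService, makeUserController };
```

Because each layer receives its dependency as an argument, unit tests can pass a fake db or a fake repo without spinning up Express or PostgreSQL.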
Config Validation at Startup

Never let a Node.js process start with missing environment variables. Validate all config at startup using Zod or Joi so failures are immediate and explicit:

// config/index.js
const { z } = require('zod');

const envSchema = z.object({
  NODE_ENV:       z.enum(['development', 'test', 'production']),
  PORT:           z.string().transform(Number).default('3000'),
  DATABASE_URL:   z.string().url(),
  JWT_SECRET:     z.string().min(32),
  REDIS_URL:      z.string().url().optional(),
  LOG_LEVEL:      z.enum(['trace','debug','info','warn','error']).default('info'),
});

const parsed = envSchema.safeParse(process.env);
if (!parsed.success) {
  console.error('❌ Invalid environment configuration:', parsed.error.format());
  process.exit(1);
}

module.exports = parsed.data;

3. Error Handling Best Practices

Poor error handling is the #1 cause of unexpected Node.js process crashes in production. The Node.js error handling model distinguishes between two fundamentally different error categories:

Operational vs Programmer Errors

Operational errors are expected failure modes of a correctly written program — a failed DB connection, invalid user input, a 404. Handle them, return a meaningful response, and keep the process alive. Programmer errors are bugs — reading a property of undefined, passing the wrong type. After one, the process state is unknown, so the only safe response is to log, exit, and let the orchestrator start a clean process.

Custom AppError Class

// utils/AppError.js
class AppError extends Error {
  constructor(message, statusCode, code) {
    super(message);
    this.statusCode = statusCode;
    this.code = code;
    this.isOperational = true; // Mark as safe to expose
    Error.captureStackTrace(this, this.constructor);
  }
}

module.exports = AppError;

// Usage in service layer:
if (!user) throw new AppError('User not found', 404, 'USER_NOT_FOUND');

Central Error Handler Middleware

// middleware/error.middleware.js
const AppError = require('../utils/AppError');
const logger = require('../utils/logger');

module.exports = (err, req, res, next) => {
  const statusCode = err.statusCode || 500;
  const isOperational = err.isOperational === true;

  logger[isOperational ? 'warn' : 'error']({
    err,
    requestId: req.id,
    method: req.method,
    url: req.url,
  });

  if (!isOperational && process.env.NODE_ENV === 'production') {
    // Programmer error in production — do not expose internals
    return res.status(500).json({ error: 'Internal server error', requestId: req.id });
  }

  res.status(statusCode).json({
    error: err.message,
    code:  err.code || 'INTERNAL_ERROR',
    requestId: req.id,
    ...(process.env.NODE_ENV !== 'production' && { stack: err.stack }),
  });
};

Global Safety Nets

// server.js — attach before starting the server
process.on('unhandledRejection', (reason, promise) => {
  logger.error({ reason }, 'Unhandled Promise Rejection');
  // Give logger time to flush, then exit so K8s restarts the pod
  setTimeout(() => process.exit(1), 500);
});

process.on('uncaughtException', (err) => {
  logger.fatal({ err }, 'Uncaught Exception — shutting down');
  setTimeout(() => process.exit(1), 500);
});

4. Express vs Fastify in 2026: Performance Benchmarks & Migration

Express has been the default Node.js framework since 2010. In 2026, Fastify is the clear performance winner for new projects. The decision is nuanced for existing codebases.

Performance Comparison

Metric                 | Express 4.x          | Fastify 4.x                  | Hono 4.x
Req/sec (simple JSON)  | ~18,000              | ~54,000                      | ~62,000
JSON serialization     | JSON.stringify       | fast-json-stringify (schema) | native / custom
Schema validation      | Manual / middleware  | Built-in (JSON Schema)       | Zod / Valibot
TypeScript support     | @types/express       | First-class                  | First-class
Ecosystem / plugins    | Massive (15+ years)  | Growing rapidly              | Edge-first, smaller

Fastify Hello World with Schema Validation

// app.js — Fastify with JSON schema validation & serialization
const fastify = require('fastify')({ logger: true });

const createUserSchema = {
  body: {
    type: 'object',
    required: ['name', 'email'],
    properties: {
      name:  { type: 'string', minLength: 2, maxLength: 100 },
      email: { type: 'string', format: 'email' },
    },
    additionalProperties: false,
  },
  response: {
    201: {
      type: 'object',
      properties: {
        id:    { type: 'string' },
        name:  { type: 'string' },
        email: { type: 'string' },
      },
    },
  },
};

fastify.post('/users', { schema: createUserSchema }, async (req, reply) => {
  const user = await userService.create(req.body);
  return reply.code(201).send(user);
});

// fast-json-stringify serializes the response 2-4× faster than JSON.stringify

When to Stick With Express

Stick with Express when a large existing codebase depends on its middleware ecosystem, when team familiarity outweighs the throughput gain, or when the service is not throughput-bound (internal tools, admin APIs). A migration rarely pays for itself unless the framework is the measured bottleneck.

5. Security Hardening: Helmet, Rate Limiting & Input Validation

Node.js security in production requires defense-in-depth: HTTP security headers, rate limiting, input validation, and SQL injection prevention. None of these are optional for internet-facing services.

Helmet — HTTP Security Headers

const helmet = require('helmet');

app.use(helmet({
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      scriptSrc:  ["'self'", "'strict-dynamic'"],
      styleSrc:   ["'self'", "'unsafe-inline'"],
      imgSrc:     ["'self'", "data:", "https:"],
    },
  },
  hsts: {
    maxAge: 31536000,     // 1 year
    includeSubDomains: true,
    preload: true,
  },
  referrerPolicy: { policy: 'same-origin' },
  crossOriginEmbedderPolicy: false, // Adjust for CDN assets
}));

Rate Limiting with express-rate-limit & Redis

const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const redis = require('./config/redis');

// General API rate limit
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 200,
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }),
  keyGenerator: (req) => req.ip + ':' + (req.user?.id || 'anon'),
  handler: (req, res) => res.status(429).json({ error: 'Too many requests', retryAfter: req.rateLimit.resetTime }),
});

// Stricter limit on auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10,
  store: new RedisStore({ sendCommand: (...args) => redis.sendCommand(args) }), // Same Redis-backed store
});

app.use('/api/', apiLimiter);
app.use('/auth/', authLimiter);

Input Validation with Zod

const { z } = require('zod');

const CreateUserSchema = z.object({
  name:     z.string().min(2).max(100).trim(),
  email:    z.string().email().toLowerCase(),
  age:      z.number().int().min(13).max(120).optional(),
  role:     z.enum(['admin', 'user', 'moderator']).default('user'),
  website:  z.string().url().optional(),
});

// Reusable validation middleware
const validate = (schema) => (req, res, next) => {
  const result = schema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({
      error: 'Validation failed',
      details: result.error.flatten().fieldErrors,
    });
  }
  req.body = result.data; // Replace with sanitized/coerced data
  next();
};

router.post('/users', validate(CreateUserSchema), userController.create);

SQL Injection Prevention
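Parameterized queries are the whole story here: user input travels in a values array and is never concatenated into SQL text. A minimal sketch, assuming a pg-style client exposing query(text, values):

```javascript
// BAD — attacker-controlled input can rewrite the query:
//   db.query(`SELECT * FROM users WHERE email = '${email}'`)

// GOOD — $1 placeholder; the driver sends values separately from the SQL text,
// so input can never change the query's structure.
function findUserByEmail(db, email) {
  return db
    .query('SELECT id, name, email FROM users WHERE email = $1', [email])
    .then(({ rows }) => rows[0] ?? null);
}

module.exports = { findUserByEmail };
```

The same principle applies to every driver and ORM: Prisma and Mongoose parameterize by default, but any hand-built query string reopens the hole.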

6. Clustering & Worker Threads for CPU-Bound Tasks

A single Node.js process runs on a single CPU core. On a modern 8-core server, a single-process app wastes 87.5% of available compute. Clustering creates one process per core, multiplying throughput by up to 8× for I/O-bound workloads.

Production Cluster Setup

// server.js — production cluster setup
const cluster = require('cluster');
const os = require('os');
const logger = require('./utils/logger');

const NUM_WORKERS = process.env.WEB_CONCURRENCY
  ? parseInt(process.env.WEB_CONCURRENCY, 10)
  : Math.max(1, os.cpus().length - 1); // Leave 1 core for OS

if (cluster.isPrimary) {
  logger.info({ workers: NUM_WORKERS }, 'Master process started');

  for (let i = 0; i < NUM_WORKERS; i++) cluster.fork();

  cluster.on('exit', (worker, code, signal) => {
    logger.warn({ pid: worker.process.pid, code, signal }, 'Worker died — restarting');
    cluster.fork(); // Auto-restart crashed workers
  });

  // Zero-downtime restart: send SIGUSR2 to master
  process.on('SIGUSR2', () => {
    const workers = Object.values(cluster.workers);
    const restartNext = (i) => {
      if (i >= workers.length) return;
      workers[i].once('exit', () => { cluster.fork(); restartNext(i + 1); });
      workers[i].kill('SIGTERM');
    };
    restartNext(0);
  });

} else {
  require('./app').listen(process.env.PORT || 3000, () => {
    logger.info({ pid: process.pid }, 'Worker started');
  });
}

Cluster vs Worker Threads — When to Use Which

Approach        | Best For                                    | Memory                      | Communication
Cluster         | Serving I/O-bound HTTP requests             | Separate heap per worker    | IPC (JSON messages)
Worker Threads  | CPU-bound tasks (image processing, crypto)  | Shared ArrayBuffer possible | postMessage + SharedArrayBuffer
Both together   | High traffic + CPU-heavy features           | Higher total                | Both channels

7. Memory Management & Leak Detection

Memory leaks in Node.js are insidious — they manifest as gradual OOMKilled pods in Kubernetes, not as immediate crashes. Proactive monitoring and correct heap configuration are essential.

Setting the Heap Size Correctly

On 64-bit systems V8 derives its default heap limit from the memory it detects — and inside a container it sees the host's memory, not the cgroup limit, so the default is almost always wrong. Always set --max-old-space-size explicitly to ~75% of your container memory limit, leaving room for the OS, Buffers, and other native allocations:

# Dockerfile — set before node command
ENV NODE_OPTIONS="--max-old-space-size=1536"
# Container limit: 2Gi → heap limit: 1536 MB (75%)

# Or in Kubernetes Deployment:
env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=1536"

Exposing Heap Metrics with prom-client

const v8 = require('v8');
const client = require('prom-client');

const heapGauge = new client.Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used in bytes',
  collect() { this.set(v8.getHeapStatistics().used_heap_size); },
});

const externalGauge = new client.Gauge({
  name: 'nodejs_external_memory_bytes',
  help: 'Node.js external memory (Buffers, native addons)',
  collect() { this.set(process.memoryUsage().external); },
});

Common Memory Leak Patterns & Fixes

  • Event listeners added per request and never removed — use once() or remove listeners in cleanup
  • Unbounded in-process caches (plain Map/object) — cap entries with an LRU and add TTLs
  • Closures in long-lived callbacks retaining large request payloads — drop references when done
  • Timers (setInterval) never cleared — call clearInterval() on shutdown and error paths
  • Module-level arrays used for "temporary" accumulation — scope buffers to the request
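The unbounded-cache pattern is the most common in practice. A capped LRU bounds it — a minimal sketch; in production reach for a maintained library such as lru-cache:

```javascript
// An unbounded Map cache leaks; LRU eviction keeps the heap bounded.
class BoundedCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // Map preserves insertion order — first key is oldest
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value); // Re-insert to mark as most recently used
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // Evict least recently used
    }
    this.map.set(key, value);
  }
}

module.exports = BoundedCache;
```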

Heap Snapshot Analysis

// Trigger a heap snapshot on demand via a protected endpoint
const v8 = require('v8');
const fs = require('fs');
const path = require('path');

app.get('/debug/heap-snapshot', requireInternalAuth, (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  const snapshotPath = path.join('/tmp', filename);
  const stream = v8.writeHeapSnapshot(snapshotPath);
  res.json({ message: 'Heap snapshot written', path: stream });
});

// Load the .heapsnapshot file in Chrome DevTools → Memory tab
// Look for objects with high retained size that shouldn't be growing

8. Logging Best Practices: Pino, Winston & Correlation IDs

In production, logs are your primary debugging tool. Structured JSON logs with correlation IDs enable log aggregation and distributed tracing across microservices.

Pino — The Fastest Node.js Logger

Pino is 5–10× faster than Winston and 2–3× faster than Bunyan because it does minimal work on the hot path: no string formatting, newline-delimited JSON output, asynchronous writes via sonic-boom, and transports (pretty-printing, log shipping) offloaded to worker threads. Use Pino for any performance-sensitive service:

// utils/logger.js
const pino = require('pino');

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
    bindings: (bindings) => ({ pid: bindings.pid, host: bindings.hostname }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: {
    paths: ['req.headers.authorization', 'req.body.password', 'req.body.token'],
    censor: '[REDACTED]',
  },
}, pino.destination({ sync: false })); // async destination for performance

module.exports = logger;

Request Correlation IDs with AsyncLocalStorage

const { AsyncLocalStorage } = require('async_hooks');
const { v4: uuidv4 } = require('uuid');

const requestContext = new AsyncLocalStorage();

// Middleware: attach requestId to every log in this request's async chain
app.use((req, res, next) => {
  const requestId = req.headers['x-request-id'] || uuidv4();
  res.setHeader('X-Request-ID', requestId);
  requestContext.run({ requestId }, next);
});

// Child logger that automatically includes requestId
function getLogger() {
  const ctx = requestContext.getStore();
  return ctx ? logger.child({ requestId: ctx.requestId }) : logger;
}

// Usage in any service/repository — no need to pass logger down:
getLogger().info({ userId }, 'User created successfully');

Log Levels in Production

Run production at info. Reserve debug/trace for short-lived troubleshooting (toggled via LOG_LEVEL), log expected failures at warn, and keep error/fatal strictly for actionable events — alert fatigue starts the moment error stops meaning "someone should look at this".

9. Database Connection Pooling

Each database connection consumes memory on both the server and the database. In a clustered Node.js app, connection count scales as workers × pool_size. Getting pooling wrong causes connection exhaustion under load.

pg-pool (PostgreSQL) Configuration

// config/database.js — pg-pool production setup
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: process.env.NODE_ENV === 'production' ? { rejectUnauthorized: true } : false,
  max: parseInt(process.env.DB_POOL_MAX || '10', 10),   // Per worker process
  min: parseInt(process.env.DB_POOL_MIN || '2',  10),
  idleTimeoutMillis:    30_000, // Close idle connections after 30s
  connectionTimeoutMillis: 3_000, // Fail fast if pool is exhausted
  statement_timeout:   10_000, // Kill long-running queries after 10s
  application_name:    `myapp-worker-${process.pid}`,
});

pool.on('error', (err, client) => {
  logger.error({ err }, 'Unexpected error on idle client');
});

// Formula: max_connections_per_db = workers × DB_POOL_MAX
// 4 workers × 10 pool = 40 connections. Keep well below PostgreSQL max_connections.

module.exports = pool;

Prisma Best Practices

// Use a singleton PrismaClient to prevent connection pool exhaustion
// config/prisma.js
const { PrismaClient } = require('@prisma/client');

const globalForPrisma = global;
const prisma = globalForPrisma.prisma ?? new PrismaClient({
  log: process.env.NODE_ENV === 'development'
    ? ['query', 'warn', 'error']
    : ['warn', 'error'],
  datasources: {
    db: { url: process.env.DATABASE_URL + '?connection_limit=10&pool_timeout=10' },
  },
});

if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma;

module.exports = prisma;

// Always disconnect in graceful shutdown:
process.on('beforeExit', async () => { await prisma.$disconnect(); });

Mongoose (MongoDB) Best Practices
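The same pooling principles carry over to Mongoose. A minimal sketch of the connection options that matter in production — the option names are standard Mongoose/MongoDB-driver settings, the values are assumptions to tune:

```javascript
// config/mongoose.js — sketch of production connection options
const mongooseOptions = {
  maxPoolSize: 10,                 // Per worker — same workers × pool_size math as pg
  minPoolSize: 2,
  serverSelectionTimeoutMS: 5000,  // Fail fast when the replica set is unreachable
  socketTimeoutMS: 45000,          // Kill sockets that go idle mid-operation
  autoIndex: false,                // Build indexes via migrations, not at boot
};

module.exports = mongooseOptions;
// Usage: await mongoose.connect(process.env.MONGODB_URL, mongooseOptions);
```

Disable autoIndex in production: index builds on a large collection at boot can block the readiness probe long enough to fail deployments.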

10. Health Checks, Graceful Shutdown & Zero-Downtime Deploys

Kubernetes routes traffic based on probe responses and sends SIGTERM before killing a pod. Failing to implement these correctly leads to dropped requests during deployments — a silent killer of user experience.

Health Check Endpoints

// routes/health.routes.js
let isShuttingDown = false;

// Liveness: is the process alive? Never check external dependencies here.
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'ok', pid: process.pid, uptime: process.uptime() });
});

// Readiness: is the process ready to serve traffic?
// Return 503 during startup AND during graceful shutdown.
app.get('/health/ready', async (req, res) => {
  if (isShuttingDown) return res.status(503).json({ status: 'shutting_down' });

  try {
    await pool.query('SELECT 1');              // DB check
    await redis.ping();                        // Cache check
    res.status(200).json({ status: 'ready', db: 'ok', cache: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'not_ready', error: err.message });
  }
});

// Startup probe: takes longer, used once during init
app.get('/health/startup', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    res.status(200).json({ status: 'started' });
  } catch {
    res.status(503).json({ status: 'starting' });
  }
});

Graceful Shutdown

// server.js — graceful shutdown handler
const SHUTDOWN_TIMEOUT_MS = 30_000; // Must be < terminationGracePeriodSeconds in K8s

async function gracefulShutdown(signal) {
  logger.info({ signal }, 'Received shutdown signal');
  isShuttingDown = true; // Fail /health/ready immediately

  // Stop accepting new connections
  server.close(async () => {
    logger.info('HTTP server closed');
    try {
      await pool.end();          // Drain DB pool
      await redis.quit();        // Close Redis connection
      await prisma.$disconnect(); // If using Prisma
      logger.info('All connections drained — exiting cleanly');
      process.exit(0);
    } catch (err) {
      logger.error({ err }, 'Error during shutdown');
      process.exit(1);
    }
  });

  // Hard kill if graceful shutdown takes too long
  setTimeout(() => {
    logger.error('Graceful shutdown timed out — force exiting');
    process.exit(1);
  }, SHUTDOWN_TIMEOUT_MS);
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT',  () => gracefulShutdown('SIGINT'));

11. Kubernetes Deployment: Resource Limits, HPA & Probes

A production Kubernetes Deployment for Node.js requires careful resource sizing, horizontal pod autoscaling, and correctly configured liveness/readiness probes to achieve zero-downtime deploys.

Complete Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nodejs-app
  labels:
    app: my-nodejs-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0      # Zero-downtime rolling update
  selector:
    matchLabels:
      app: my-nodejs-app
  template:
    metadata:
      labels:
        app: my-nodejs-app
    spec:
      terminationGracePeriodSeconds: 60  # Must exceed SHUTDOWN_TIMEOUT_MS
      containers:
      - name: api
        image: myregistry/my-nodejs-app:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          value: production
        - name: NODE_OPTIONS
          value: "--max-old-space-size=1536"
        - name: UV_THREADPOOL_SIZE
          value: "16"
        resources:
          requests:
            cpu:    "250m"
            memory: "512Mi"
          limits:
            cpu:    "1000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health/live
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 5
        startupProbe:
          httpGet:
            path: /health/startup
            port: 3000
          failureThreshold: 30
          periodSeconds: 5   # 30 × 5s = 150s max startup time

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-nodejs-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nodejs-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # Scale up at 60% CPU (not 80% — gives headroom)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down

Node.js Resource Sizing Guidelines

Service Type CPU Request Memory Request Memory Limit --max-old-space-size
Lightweight API 100m 256Mi 512Mi 384
Standard API 250m 512Mi 2Gi 1536
CPU-intensive service 1000m 1Gi 4Gi 3072

12. Monitoring & Observability: OpenTelemetry & Prometheus

Production observability requires three pillars: metrics (Prometheus + Grafana), traces (OpenTelemetry + Jaeger/Tempo), and logs (Pino + Loki). All three are needed to diagnose latency regressions, error spikes, and capacity issues.

Prometheus Metrics with prom-client

const client = require('prom-client');

// Collect default Node.js metrics (heap, GC, event loop lag, etc.)
client.collectDefaultMetrics({ prefix: 'myapp_' });

// Custom business metrics
const httpRequestDuration = new client.Histogram({
  name: 'myapp_http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

const activeConnections = new client.Gauge({
  name: 'myapp_active_http_connections',
  help: 'Number of active HTTP connections',
});

// Middleware to record request duration
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    end({
      method:      req.method,
      route:       req.route?.path || req.path,
      status_code: res.statusCode,
    });
  });
  next();
});

// Prometheus scrape endpoint — protect in production!
app.get('/metrics', requireInternalAuth, async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});

OpenTelemetry Auto-Instrumentation

// tracing.js — load BEFORE any other require()
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: process.env.SERVICE_NAME || 'my-nodejs-app',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis-4': { enabled: true },
    }),
  ],
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());

# Start with: node -r ./tracing.js server.js

Key Grafana Dashboards to Build

  • RED per route: request rate, error rate, and p50/p95/p99 duration from the request histogram
  • Runtime health: heap used vs --max-old-space-size, GC pause time, event loop lag
  • Saturation: DB pool in-use vs max, Redis latency, active HTTP connections
  • Deploy overlay: annotate releases so latency and error regressions map to specific deploys

13. Production Checklist

Use this checklist before shipping any Node.js service to production. Missing items in this list have caused real production incidents.

Security

  • ☐ Helmet middleware configured with strict CSP
  • ☐ Rate limiting on all public endpoints (Redis-backed for multi-instance)
  • ☐ Input validated & sanitized with Zod/Joi on every route
  • ☐ All DB queries parameterized — no string interpolation
  • ☐ JWT secrets are 256-bit random, rotated, stored in Vault/K8s Secrets
  • ☐ npm audit passes with zero critical/high vulnerabilities
  • ☐ .env files never committed — use Kubernetes Secrets or AWS Secrets Manager
  • ☐ CORS configured with explicit allowed origins (not *)
  • ☐ Dependencies pinned to exact versions in package-lock.json

Performance & Scalability

  • ☐ Clustering enabled — one worker per CPU core (or via WEB_CONCURRENCY)
  • ☐ --max-old-space-size set to 75% of container memory limit
  • ☐ UV_THREADPOOL_SIZE set appropriately (8–16 for I/O-heavy services)
  • ☐ Database connection pool sized correctly (workers × pool_max < DB max_connections)
  • ☐ CPU-bound tasks offloaded to Worker Threads
  • ☐ Response compression enabled (Brotli preferred over gzip)
  • ☐ Caching layer in place for expensive queries (Redis, in-process LRU)
  • ☐ Fastify considered for new services (3× throughput over Express)

Error Handling & Reliability

  • ☐ process.on('unhandledRejection') and uncaughtException handlers crash with exit code 1
  • ☐ Central Express error handler (4-argument middleware) registered as last middleware
  • ☐ All async route handlers wrapped in try/catch or use express-async-errors
  • ☐ Custom AppError class with isOperational flag
  • ☐ Circuit breakers on external service calls (opossum)
  • ☐ Retry with exponential backoff for transient failures

Kubernetes & Deployment

  • ☐ Liveness, readiness, and startup probes configured
  • ☐ Readiness probe returns 503 while shutting down
  • ☐ terminationGracePeriodSeconds > your shutdown timeout
  • ☐ HPA configured with CPU target of 60% (not 80%)
  • ☐ maxUnavailable: 0 in rolling update strategy
  • ☐ PodDisruptionBudget set to minAvailable: 2
  • ☐ Resource requests AND limits set on all containers
  • ☐ Non-root container user (USER node in Dockerfile)

Observability

  • ☐ Structured JSON logging (Pino) with correlation IDs via AsyncLocalStorage
  • ☐ Sensitive fields redacted (authorization, password, PII)
  • ☐ Prometheus metrics endpoint with RED metrics + heap + event loop lag
  • ☐ OpenTelemetry auto-instrumentation for distributed tracing
  • ☐ Alerts on: error rate >1%, p99 latency >1s, heap >80%, event loop lag >100ms
  • ☐ Log aggregation to centralized store (Loki, CloudWatch, Datadog)

Every item in this checklist has been hardened by real production incidents. The teams that skip "optional" items like PodDisruptionBudgets and startup probes are the ones paging at 3am. Production Node.js is not just about writing JavaScript — it's about understanding the entire stack from V8 internals to Kubernetes scheduling.

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · AI/LLM Systems

Last updated: April 8, 2026