Should I use MongoDB or PostgreSQL?

Use MongoDB when your data is hierarchical/nested (products with variants, blog posts with comments), your schema evolves frequently, or you need very high write throughput with flexible querying. Use PostgreSQL when you need strong ACID transactions across multiple entities, complex joins between normalized tables, or when your team has stronger SQL expertise. Many production systems use both: PostgreSQL for transactional core data, MongoDB for catalog/content/log data.

How do MongoDB transactions work?

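MongoDB 4.0+ supports multi-document ACID transactions across multiple collections on replica sets (4.2+ extends them to sharded clusters), so a standalone server is not enough. In Spring Boot, register a MongoTransactionManager bean (Spring Boot does not auto-configure one) and annotate service methods with @Transactional. Transactions read from a consistent snapshot (snapshot isolation) provided by the WiredTiger storage engine. They cost more than single-document operations, which are already atomic, so design documents to minimize the need for multi-document transactions.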

What is the MongoDB aggregation pipeline?

The aggregation pipeline is a sequence of stages, each transforming the documents it receives: $match (filter), $group (compute aggregates), $sort, $limit, $project (reshape), $lookup (join another collection), and $facet (run several sub-pipelines over the same input). Put $match as early as possible, ideally first, so the filter can use an index and later, more expensive stages see fewer documents. Use $facet to compute several aggregations over one set of input documents in a single stage (e.g., category counts plus price ranges for e-commerce faceted search).
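To make the faceted-search idea concrete, the plain-Java sketch below computes in memory what a $facet stage with two sub-pipelines (for example a $sortByCount on category and a $bucket on price) would return. It is a model of the output shape, not Spring Data or driver code; the Item record and the bucket boundaries are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {
    // Hypothetical catalog item: only the fields the two facets need.
    public record Item(String category, double price) {}

    // Shaped like a $facet result: one entry per sub-pipeline.
    public record FacetResult(Map<String, Long> byCategory, Map<String, Long> byPriceBucket) {}

    // Illustrative boundaries, similar to $bucket with boundaries [0, 50, 200) plus a default "200+".
    public static String bucketFor(double price) {
        if (price < 50) return "0-50";
        if (price < 200) return "50-200";
        return "200+";
    }

    // One loop over the (already $match-filtered) items feeds both facets.
    public static FacetResult facet(List<Item> items) {
        Map<String, Long> byCategory = new TreeMap<>();
        Map<String, Long> byPrice = new TreeMap<>();
        for (Item item : items) {
            byCategory.merge(item.category(), 1L, Long::sum);
            byPrice.merge(bucketFor(item.price()), 1L, Long::sum);
        }
        return new FacetResult(byCategory, byPrice);
    }

    public static void main(String[] args) {
        FacetResult r = facet(List.of(
            new Item("laptops", 999.0), new Item("laptops", 149.0), new Item("cables", 9.5)));
        System.out.println(r.byCategory());     // {cables=1, laptops=2}
        System.out.println(r.byPriceBucket());  // {0-50=1, 200+=1, 50-200=1}
    }
}
```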

What is the unbounded array anti-pattern in MongoDB?

The unbounded array anti-pattern occurs when you store an ever-growing array inside a document — e.g., pushing all comment IDs or follower IDs into a single document field. MongoDB documents have a 16MB size limit and large array operations are slow. Instead, use a separate collection for the child items (1:many reference pattern) and query with the parent ID.

How do I choose a shard key for MongoDB?

A good shard key has high cardinality (many distinct values), distributes writes evenly, and matches common query patterns. Avoid a monotonically increasing key such as a timestamp as the sole shard key: every new value lands in the chunk with the highest key range, creating a hot spot on one shard. A compound shard key like {tenantId, _id} distributes writes across tenants while still allowing efficient tenant-scoped queries. Avoid low-cardinality keys like status or boolean fields.

Core Java April 11, 2026 · 23 min read

MongoDB with Spring Boot: Production Data Modeling, Aggregations & Performance Guide (2026)

A complete production guide covering MongoDB document modeling patterns, embedded vs referenced design, compound indexing with the ESR rule, aggregation pipeline in Java, multi-document transactions, change streams, sharding, and Atlas operations.

Md Sanwar Hossain
Senior Java & Backend Engineer


TL;DR: MongoDB shines for hierarchical data, variable schemas, and high write throughput. The key to performance is modeling your documents around your queries — not normalizing like a relational database. One query should retrieve everything you need.

1. When to Choose MongoDB

Feature	PostgreSQL	MongoDB	Cassandra
Schema flexibility	Rigid	✅ Dynamic	Moderate
Hierarchical data	Joins required	✅ Native nesting	Manual denorm
ACID transactions	✅ Full ACID	✅ Multi-doc (4.0+)	❌ LWT only
Write throughput	Moderate	✅ High	✅ Very high
Aggregation	✅ SQL GROUP BY	✅ Pipeline	❌ Limited

Use MongoDB for: product catalogs with varying attributes, content management, user activity logs, IoT time-series, mobile app backends with evolving schemas, and any use case where you query by a primary key and need the full nested document in one request.

2. Spring Boot Setup

# pom.xml + application.yml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>

# application.yml
spring:
  data:
    mongodb:
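      uri: mongodb+srv://user:password@cluster.mongodb.net/mydb?retryWrites=true&w=majority
      # Keep the default readPreference (primary) here: reads inside multi-document transactions must use primary.
      # Route reporting/analytics reads to secondaries per query or per template instead; secondary reads may be stale.
      # In real deployments, inject credentials from the environment, e.g. uri: ${MONGODB_URI}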

3. Document Modeling: Embedded vs Referenced

Rule of thumb: Embed data that is always accessed together and has a bounded/small cardinality. Reference data that has an independent lifecycle, high cardinality, or is accessed separately.

// ❌ BAD: Unbounded array anti-pattern

// DON'T: push follower IDs into user document — unbounded growth, 16MB limit
{
  "_id": "user123",
  "name": "Alice",
  "followerIds": ["u1","u2","u3",...,"u99999"]  // grows to millions!
}
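A rough worked estimate shows how quickly the 16MB limit arrives. In BSON, each array element is stored as a type byte, its index as a null-terminated decimal string key, and its value, so an ObjectId element at index 12345 takes 1 + 6 + 12 = 19 bytes. The sketch below sums that per-element cost to find how many ObjectId follower IDs fit in 16,777,216 bytes. It assumes ObjectId values and ignores the rest of the document and the array's own header, so the real ceiling is slightly lower.

```java
public class ArrayLimitEstimate {
    public static final long MAX_BSON_BYTES = 16L * 1024 * 1024; // 16,777,216 bytes

    // Bytes used by array element `index` holding an ObjectId:
    // 1 type byte + decimal index key + 1 null terminator + 12-byte ObjectId.
    public static long elementBytes(long index) {
        return 1 + String.valueOf(index).length() + 1 + 12;
    }

    // Largest n such that n ObjectId elements fit under the limit (headers ignored).
    public static long maxObjectIdElements() {
        long used = 0;
        long n = 0;
        while (used + elementBytes(n) <= MAX_BSON_BYTES) {
            used += elementBytes(n);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(maxObjectIdElements()); // prints 844416, far short of "millions"
    }
}
```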

// ✅ GOOD: @Document with embedded variants + referenced reviews

@Document(collection = "products")
public class Product {
    @Id
    private String id;         // MongoDB ObjectId (time-sortable, unique)

    @Indexed(unique = true)
    private String sku;

    private String name;
    private String description;
    private String category;   // @Indexed for facets

    // EMBED: variants are few (<20), always fetched with product
    private List<ProductVariant> variants;   // {color, size, price, stock}

    // REFERENCE: reviews are many (unbounded), so the product does not store them, not even
    // as a @DBRef list (a growing list of references is itself an unbounded array).
    // Each Review stores productId; load reviews separately with paging.

    // Metadata (filled in only when auditing is enabled with @EnableMongoAuditing)
    @CreatedDate
    private Instant createdAt;
    @LastModifiedDate
    private Instant updatedAt;
}

// Separate reviews collection — queried independently, paginated
@Document(collection = "reviews")
public class Review {
    @Id private String id;
    @Indexed private String productId;   // references Product.id by convention (MongoDB enforces no foreign keys)
    private String userId;
    private int rating;
    private String comment;
    @CreatedDate private Instant createdAt;
}

4. Indexing: ESR Rule, TTL, Partial, Sparse

ESR Rule: For compound indexes, order fields as Equality → Sort → Range. This maximizes index usage and minimizes in-memory sorts.
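The ESR ordering can be captured in a small helper: given how a query uses each field (equality match, sort, or range filter), it emits the compound-index definition in Equality, Sort, Range order. This is a hypothetical plain-Java sketch for reasoning about index order, not part of Spring Data; it assumes ascending keys and one role per field.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class EsrIndex {
    // Role of a field in the query; declaration order is the ESR order.
    public enum Role { EQUALITY, SORT, RANGE }

    public record FieldUse(String field, Role role) {}

    // Builds a @CompoundIndex-style definition, e.g. "{'category': 1, 'price': 1, 'stock': 1}".
    // Stream sorting is stable, so fields sharing a role keep the order they were given in.
    public static String definition(List<FieldUse> uses) {
        return uses.stream()
            .sorted(Comparator.comparing(FieldUse::role))
            .map(u -> "'" + u.field() + "': 1")
            .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        // Query: category == "electronics", stock > 0, sorted by price
        System.out.println(definition(List.of(
            new FieldUse("stock", Role.RANGE),
            new FieldUse("category", Role.EQUALITY),
            new FieldUse("price", Role.SORT))));
        // prints {'category': 1, 'price': 1, 'stock': 1}
    }
}
```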

// Index types in Spring Data MongoDB

// Compound index following ESR rule for: category=electronics, sort by price, range on stock
@CompoundIndex(def = "{'category': 1, 'price': 1, 'stock': 1}", name = "category_price_stock")
@Document(collection = "products")
public class Product { ... }

// TTL index: auto-delete OTP documents after 5 minutes
@Document(collection = "otps")
public class Otp {
    @Id private String id;
    @Indexed(expireAfterSeconds = 300)  // TTL index; the background TTL monitor runs every 60 s, so deletion can lag
    private Date createdAt;
    private String code;
    private String userId;
}

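// Partial index: only index active products (smaller index = faster)
// Created programmatically with MongoTemplate. Do not add .sparse(): MongoDB rejects
// combining the sparse option with a partial filter expression.
mongoTemplate.indexOps(Product.class).ensureIndex(
    new Index("sku", Sort.Direction.ASC)
        .named("active_sku_idx")
        .partial(PartialIndexFilter.of(Criteria.where("status").is("ACTIVE")))
);
// A query must include status: "ACTIVE" in its filter for the planner to use this index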

5. Aggregation Pipeline in Java

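// ✅ GOOD: MongoTemplate aggregation: $match first, then $group, $lookup, $sort/$limit, $project

import java.time.LocalDate;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.TypedAggregation;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.stereotype.Service;

@Service
public class ProductAggregationService {
    @Autowired private MongoTemplate mongoTemplate;

    // Returns one row per product; ProductSalesSummary is an illustrative DTO
    // (productId from _id, productName, orderCount, totalRevenue, avgOrderValue)
    public List<ProductSalesSummary> getTopProducts(String category, LocalDate from, LocalDate to) {
        TypedAggregation<Order> agg = Aggregation.newAggregation(Order.class,
            // Stage 1: $match FIRST: filter before any computation (and let it use an index)
            Aggregation.match(
                Criteria.where("category").is(category)   // assumes orders store the product category
                    .and("createdAt").gte(from.atStartOfDay()).lt(to.plusDays(1).atStartOfDay())  // includes all of "to"
                    .and("status").is("COMPLETED")),
            // Stage 2: $group: sum revenue and count orders per product
            Aggregation.group("productId")
                .sum("amount").as("totalRevenue")
                .count().as("orderCount")
                .avg("amount").as("avgOrderValue"),
            // Stage 3: $lookup: join with products; productId must have the same BSON type as
            // products._id (a String value never matches an ObjectId)
            Aggregation.lookup("products", "_id", "_id", "product"),
            Aggregation.unwind("product"),
            // Stage 4: $sort: top-selling products first, keep the top 50
            Aggregation.sort(Sort.by(Sort.Direction.DESC, "totalRevenue")),
            Aggregation.limit(50),
            // Stage 5: $project: shape output
            Aggregation.project("orderCount", "totalRevenue", "avgOrderValue")
                .and("product.name").as("productName")
        );

        // The pipeline yields up to 50 rows; return all of them rather than only the first
        return mongoTemplate.aggregate(agg, ProductSalesSummary.class).getMappedResults();
    }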
}
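When unit-testing a report like this, an in-memory oracle that computes the same $match, $group, $sort and $limit result with Java streams can be compared against the aggregation output on a small fixture. The sketch below is such an oracle under stated assumptions: a hypothetical OrderRow record holding the fields the pipeline reads, and a half-open date window [from, to + 1 day). It mirrors only the grouping logic, not the $lookup.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SalesReportOracle {
    public record OrderRow(String productId, String category, String status,
                           LocalDateTime createdAt, double amount) {}

    public record ProductSales(String productId, long orderCount, double totalRevenue, double avgOrderValue) {}

    public static List<ProductSales> topProducts(List<OrderRow> orders, String category,
                                                 LocalDate from, LocalDate to, int limit) {
        LocalDateTime start = from.atStartOfDay();
        LocalDateTime endExclusive = to.plusDays(1).atStartOfDay();
        // $match: category, status and date window
        Map<String, List<OrderRow>> byProduct = orders.stream()
            .filter(o -> o.category().equals(category) && o.status().equals("COMPLETED"))
            .filter(o -> !o.createdAt().isBefore(start) && o.createdAt().isBefore(endExclusive))
            // $group by productId
            .collect(Collectors.groupingBy(OrderRow::productId));
        return byProduct.entrySet().stream()
            .map(e -> {
                double total = e.getValue().stream().mapToDouble(OrderRow::amount).sum();
                long count = e.getValue().size();
                return new ProductSales(e.getKey(), count, total, total / count);
            })
            // $sort by totalRevenue descending, then $limit
            .sorted(Comparator.comparingDouble(ProductSales::totalRevenue).reversed())
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```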

6. Spring Data MongoDB: Repository & MongoTemplate

// Repository + Criteria API for complex queries

// MongoRepository for simple CRUD
public interface ProductRepository extends MongoRepository<Product, String> {
    // Derived queries
    List<Product> findByCategoryAndPriceLessThan(String category, double maxPrice);

    @Query("{ 'category': ?0, 'variants.stock': { $gt: 0 } }")
    Page<Product> findAvailableByCategory(String category, Pageable pageable);
}

// MongoTemplate for complex Criteria
@Service
public class ProductSearchService {
    @Autowired private MongoTemplate mongoTemplate;

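    public Page<Product> search(ProductFilter filter, Pageable pageable) {
        Criteria criteria = new Criteria();
        if (filter.getCategory() != null) {
            criteria.and("category").is(filter.getCategory());
        }
        // Put both price bounds on ONE criteria: calling and("price") twice throws
        // InvalidMongoDbApiUsageException, because a filter document cannot repeat a key
        if (filter.getMinPrice() != null || filter.getMaxPrice() != null) {
            Criteria price = criteria.and("price");
            if (filter.getMinPrice() != null) price.gte(filter.getMinPrice());
            if (filter.getMaxPrice() != null) price.lte(filter.getMaxPrice());
        }
        if (filter.getKeyword() != null) {
            // Escape regex metacharacters in user input; "i" = case-insensitive.
            // An unanchored case-insensitive regex cannot use an index efficiently (consider a text index).
            criteria.and("name").regex(java.util.regex.Pattern.quote(filter.getKeyword()), "i");
        }

        Query query = Query.query(criteria)
            .with(pageable)
            .with(Sort.by("createdAt").descending());

        List<Product> results = mongoTemplate.find(query, Product.class);
        long count = mongoTemplate.count(Query.query(criteria), Product.class);
        return new PageImpl<>(results, pageable, count);
    }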
}
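Both price bounds belong under a single key because a MongoDB filter is a document, and a document holds each field name once: a range has to be written as { price: { $gte: min, $lte: max } }. The plain-Java sketch below builds that shape with nested maps (standing in for org.bson.Document, which itself implements Map) so the structure can be checked without a database; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PriceFilterSketch {
    // Builds {category: ..., price: {$gte: min, $lte: max}}, omitting null inputs.
    public static Map<String, Object> filter(String category, Double minPrice, Double maxPrice) {
        Map<String, Object> filter = new LinkedHashMap<>();
        if (category != null) filter.put("category", category);
        if (minPrice != null || maxPrice != null) {
            // Both operators live in ONE sub-document under the single "price" key
            Map<String, Object> range = new LinkedHashMap<>();
            if (minPrice != null) range.put("$gte", minPrice);
            if (maxPrice != null) range.put("$lte", maxPrice);
            filter.put("price", range);
        }
        return filter;
    }

    public static void main(String[] args) {
        System.out.println(filter("laptops", 500.0, 1500.0));
        // {category=laptops, price={$gte=500.0, $lte=1500.0}}
    }
}
```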

7. Multi-Document Transactions

// ✅ GOOD: @Transactional with MongoTransactionManager (requires replica set)

@Configuration
public class MongoConfig {
    @Bean
    public MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
        return new MongoTransactionManager(dbFactory);
    }
}

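@Service
public class OrderService {
    private final ProductRepository productRepository;
    private final OrderRepository orderRepository;

    public OrderService(ProductRepository productRepository, OrderRepository orderRepository) {
        this.productRepository = productRepository;
        this.orderRepository = orderRepository;
    }

    @Transactional  // org.springframework.transaction.annotation.Transactional; ACID across both collections
    public Order placeOrder(OrderRequest request) {
        // 1. Check and deduct inventory (assumes stock and price fields on Product for brevity).
        //    Under snapshot isolation, a concurrent transaction writing the same product makes one of
        //    them abort with a write conflict (TransientTransactionError); @Transactional does not retry it.
        Product product = productRepository.findById(request.getProductId())
            .orElseThrow(() -> new ProductNotFoundException(request.getProductId()));
        if (product.getStock() < request.getQuantity()) {
            throw new InsufficientStockException();
        }
        product.setStock(product.getStock() - request.getQuantity());
        productRepository.save(product);

        // 2. Create order record
        Order order = Order.builder()
            .productId(request.getProductId())
            .userId(request.getUserId())
            .quantity(request.getQuantity())
            .amount(product.getPrice() * request.getQuantity())
            .status("PENDING")
            .build();
        return orderRepository.save(order);
        // Any RuntimeException rolls back both writes (checked exceptions do not, by default)
    }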
}
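Because @Transactional does not retry a transaction that aborted with a write conflict, a caller-side retry loop is one option. The MongoDB Java driver's ClientSession.withTransaction(...) already retries on the TransientTransactionError label; the plain-Java sketch below only illustrates the same idea around a Spring-managed call. The isTransient predicate is an assumption; in practice it could check MongoException.hasErrorLabel("TransientTransactionError").

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class TransientRetry {
    // Runs `action`, retrying while failures are classified as transient, up to maxAttempts in total.
    // Each attempt must start a NEW transaction, e.g. by calling orderService.placeOrder(...) through
    // its Spring proxy, never from inside the @Transactional method itself.
    public static <T> T withRetry(Supplier<T> action, Predicate<RuntimeException> isTransient,
                                  int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Business errors (e.g. insufficient stock) and the final failure are rethrown as-is
                if (!isTransient.test(e) || attempt >= maxAttempts) throw e;
                sleep(backoffMillis * attempt);  // linear backoff before the next attempt
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

A usage sketch: `TransientRetry.withRetry(() -> orderService.placeOrder(request), e -> e instanceof MongoException m && m.hasErrorLabel("TransientTransactionError"), 3, 50)`. Retrying only transient errors keeps out-of-stock and validation failures immediate.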

8. Change Streams: Real-Time Events

// ✅ GOOD: Change stream listener with resume token for fault tolerance

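import com.mongodb.client.model.changestream.FullDocument;
import com.mongodb.client.model.changestream.OperationType;
import jakarta.annotation.PostConstruct;
import java.time.Duration;
import org.bson.BsonValue;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.ChangeStreamEvent;
import org.springframework.data.mongodb.core.ChangeStreamOptions;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;

@Component
public class OrderChangeStreamListener {
    // Flux-based change streams come from the reactive template, not MongoTemplate
    @Autowired private ReactiveMongoTemplate mongoTemplate;
    @Autowired private EventPublisher eventPublisher;
    @Autowired private ResumeTokenStore tokenStore;  // illustrative: load()/save(BsonValue) backed by a small collection

    @PostConstruct
    public void startListening() {
        // defer() rebuilds the options on every (re)subscription, so a retry resumes from the latest saved token
        Flux.defer(this::openStream)
            .retryWhen(Retry.backoff(Long.MAX_VALUE, Duration.ofSeconds(2)))  // e.g. after a primary election
            .subscribe(event -> {
                if (event.getOperationType() == OperationType.INSERT) {
                    eventPublisher.publish(new OrderCreatedEvent(event.getBody()));
                }
                tokenStore.save(event.getResumeToken());  // persist AFTER processing: at-least-once delivery
            });
    }

    private Flux<ChangeStreamEvent<Order>> openStream() {
        ChangeStreamOptions.ChangeStreamOptionsBuilder options = ChangeStreamOptions.builder()
            .filter(Aggregation.newAggregation(
                Aggregation.match(Criteria.where("operationType").in("insert", "update"))))
            .fullDocumentLookup(FullDocument.UPDATE_LOOKUP);  // otherwise getBody() is null for updates
        BsonValue token = tokenStore.load();
        if (token != null) {
            options.resumeAfter(token);  // resume after restart (resumeAt(...) expects a timestamp, not a token)
        }
        return mongoTemplate.changeStream("orders", options.build(), Order.class);
    }
}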

9. Sharding & Replication

Replica set minimum: 1 primary + 2 secondaries in production. Transactions require a replica set (or a sharded cluster); a single-node replica set is enough for local development.
Shard key selection: High cardinality + even distribution. Compound {tenantId, _id} distributes writes evenly while allowing efficient tenant-scoped queries.
❌ BAD shard key: createdAt alone. It increases monotonically, so every new write lands in the chunk with the highest key range, on one shard (a hot spot).
Hashed sharding: sh.shardCollection("db.products", {_id: "hashed"}) gives even write distribution, but range queries on _id can no longer be routed to a single shard and are broadcast to all shards.
Read preferences: Use secondaryPreferred for read-heavy workloads (reporting, analytics); primary for reads requiring latest data (post-write reads).
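A small simulation illustrates the hot-spot argument. With ranged sharding, chunks cover contiguous key ranges, so ever-increasing createdAt values all fall into the chunk with the highest range; hashing the key first scatters consecutive values. The sketch assumes four shards with made-up range boundaries and uses a SplitMix64-style bit mixer as a stand-in hash (MongoDB's hashed indexes use their own internal hash function, not this one).

```java
import java.util.Arrays;

public class ShardKeySimulation {
    // Hypothetical boundaries: shard 0 owns keys < 1000, shard 1 < 2000, shard 2 < 3000, shard 3 the rest.
    static final long[] UPPER_BOUNDS = {1_000, 2_000, 3_000};

    public static int rangedShard(long key) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (key < UPPER_BOUNDS[i]) return i;
        }
        return UPPER_BOUNDS.length; // last shard owns [3000, +inf)
    }

    // SplitMix64 finalizer: stand-in hash that scatters consecutive inputs.
    public static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static int hashedShard(long key, int shards) {
        return (int) Long.remainderUnsigned(mix(key), shards);
    }

    // Counts how many of `count` increasing keys starting at `firstKey` land on each of 4 shards.
    public static int[] distribute(long firstKey, int count, boolean hashed) {
        int[] perShard = new int[4];
        for (long k = firstKey; k < firstKey + count; k++) {
            perShard[hashed ? hashedShard(k, 4) : rangedShard(k)]++;
        }
        return perShard;
    }

    public static void main(String[] args) {
        // 1,000 new timestamps, all beyond every existing range boundary
        System.out.println(Arrays.toString(distribute(10_000, 1_000, false))); // [0, 0, 0, 1000]
        System.out.println(Arrays.toString(distribute(10_000, 1_000, true)));  // roughly even
    }
}
```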

10. Production Operations & Atlas

Area	Action	Tool
Slow queries	Enable profiler level 1 (records only operations slower than slowms=100; level 2 records every operation)	`db.setProfilingLevel(1, {slowms: 100})`
Query explain	Check executionStats for index usage	`cursor.explain("executionStats")`
Index advisor	Atlas Performance Advisor auto-suggests indexes	Atlas UI
Backups	Atlas continuous backup with point-in-time restore	Atlas / mongodump

11. Interview Questions & Checklist

Q: How would you model a blog with posts and comments in MongoDB?

A: Embed the first ~5 comments in the post document for fast display (no second query). Store all comments in a separate comments collection with a postId index for paginated loading of full comment threads. This pattern — "subset pattern" — balances read performance for the common case (show post + preview comments) and scalability for the edge case (post with 10k comments).
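The write side of the subset pattern is keeping only the N most recent comments embedded while the full thread lives in the comments collection. MongoDB can do this server-side with $push combined with the $each, $sort and $slice modifiers; the plain-Java sketch below models the same rule on an in-memory list, assuming a hypothetical Comment record and N = 5.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SubsetPattern {
    public record Comment(String id, String text, Instant createdAt) {}

    static final int MAX_EMBEDDED = 5;  // assumed subset size

    // New embedded subset after a comment is added: newest first, capped at MAX_EMBEDDED.
    // Mirrors $push with $each + $sort: { createdAt: -1 } + $slice: 5 on the post document;
    // the full comment is also inserted into the separate comments collection.
    public static List<Comment> withNewComment(List<Comment> embedded, Comment added) {
        List<Comment> next = new ArrayList<>(embedded);
        next.add(added);
        next.sort(Comparator.comparing(Comment::createdAt).reversed());
        return List.copyOf(next.subList(0, Math.min(MAX_EMBEDDED, next.size())));
    }
}
```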

✅ MongoDB Production Checklist

Always use replica set (required for transactions)
Explicit schema validation via $jsonSchema
Avoid unbounded arrays
Follow ESR rule for compound indexes
Use TTL index for session/OTP documents
$match first in every aggregation pipeline
Test explain("executionStats") for all queries
Set maxTimeMS on all queries
Use change streams for event-driven sync
Atlas Performance Advisor in production

12. At BRAC IT: MongoDB for Audit Logs and Analytics

At BRAC IT we use MongoDB for two purposes: audit logging (every loan state transition) and analytics pre-aggregation (daily portfolio risk summaries). We chose MongoDB for these use cases over PostgreSQL for three reasons: our audit event schema evolves frequently as we add new event types and regulatory requirements, individual audit documents are self-contained (no joins needed), and write throughput is significantly higher than PostgreSQL for our insert-heavy workload.

Our audit collection stores one document per event. Each document is timestamped, tagged with the user, service, correlation ID, and contains the full before/after state of the entity:

{
  "_id": ObjectId("..."),
  "eventType": "LOAN_STATUS_CHANGED",
  "correlationId": "a3f4b2c1-...",
  "timestamp": ISODate("2026-04-28T10:32:00Z"),
  "actorId": "officer-uuid-123",
  "actorType": "LOAN_OFFICER",
  "entityId": "loan-uuid-456",
  "before": { "status": "PENDING", "assignedTo": null },
  "after":  { "status": "APPROVED", "assignedTo": "officer-uuid-123" },
  "metadata": {
    "serviceVersion": "2.4.1",
    "hostId": "payment-service-pod-7d8f",
    "ipAddress": "10.0.1.45"
  }
}

This schema has evolved 14 times in three years. Adding a new field to "metadata" in MongoDB requires zero migration: new documents have the field, old documents do not. In Spring Data, the @Document class simply gains a new nullable field, which starts appearing in new documents immediately and reads as null from old ones. In PostgreSQL, each change would mean an ALTER TABLE migration, plus a backfill wherever existing rows need values, on a table with 50 million rows.

For analytics, we use MongoDB's aggregation pipeline to pre-compute daily portfolio summaries and store them in a separate "snapshots" collection. Dashboards query the snapshot collection (millisecond response) rather than aggregating 50 million audit events on the fly (minutes). A scheduled job runs nightly:

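// Needs @EnableScheduling; static imports: Aggregation.*, Criteria.where
@Scheduled(cron = "0 30 2 * * *")  // 2:30 AM daily, server time zone
public void computeDailySnapshot() {
    LocalDate today = LocalDate.now();
    LocalDate yesterday = today.minusDays(1);

    List<AggregationOperation> pipeline = List.of(
        // Half-open window [yesterday 00:00, today 00:00): events written after midnight are excluded
        match(where("eventType").is("LOAN_DISBURSED")
            .and("timestamp").gte(yesterday.atStartOfDay()).lt(today.atStartOfDay())),
        group("loanPurpose")   // assumes LOAN_DISBURSED events carry loanPurpose and amount fields
            .count().as("totalLoans")
            .sum("amount").as("totalAmount")
            .avg("amount").as("avgAmount"),
        project("totalLoans", "totalAmount", "avgAmount")
            .and("_id").as("loanPurpose")
            .andExclude("_id")  // otherwise _id = loanPurpose and each run overwrites the previous day's rows
    );

    List<DailySnapshot> snapshots =
        mongoTemplate.aggregate(newAggregation(pipeline),
            "audit_events", DailySnapshot.class).getMappedResults();

    // DailySnapshot is assumed to have a snapshotDate field so rows from different days stay distinct
    snapshots.forEach(s -> s.setSnapshotDate(yesterday));
    snapshotRepository.saveAll(snapshots);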
}
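The nightly job needs a half-open window [start of yesterday, start of today) so that events written between midnight and the 2:30 AM run are not counted, and the day boundary is safest when computed in an explicit time zone rather than the server default. A minimal helper under those assumptions (Asia/Dhaka is used only as an example zone):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class DayWindow {
    public record Window(Instant startInclusive, Instant endExclusive) {}

    // Calendar day before `today` in the given zone: [yesterday 00:00, today 00:00).
    public static Window previousDay(LocalDate today, ZoneId zone) {
        Instant start = today.minusDays(1).atStartOfDay(zone).toInstant();
        Instant end = today.atStartOfDay(zone).toInstant();
        return new Window(start, end);
    }

    public static void main(String[] args) {
        Window w = previousDay(LocalDate.of(2026, 4, 29), ZoneId.of("Asia/Dhaka"));
        System.out.println(w); // Window[startInclusive=2026-04-27T18:00:00Z, endExclusive=2026-04-28T18:00:00Z]
    }
}
```

With Spring Data, the bounds could then be applied as `.gte(Date.from(w.startInclusive())).lt(Date.from(w.endExclusive()))`, which avoids relying on the JVM's default zone when LocalDateTime values are converted.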

13. Schema Design Anti-Patterns We Learned the Hard Way

Three MongoDB schema mistakes we made in production and how we fixed them:

Anti-pattern 1: Unbounded arrays. We initially embedded all loan transactions inside the loan document. After 18 months, some loan documents were approaching 16 MB, MongoDB's maximum document size, and updates that would have pushed them past the limit started failing with document-size errors in the middle of the night. The fix: extract transactions into a separate "loan_transactions" collection and reference them by loan ID. Any array that can grow without bound belongs in a separate collection, not embedded.

Anti-pattern 2: Over-normalisation. Coming from a relational background, our first schema had loan documents referencing borrower documents by ID, with separate collections for addresses, guarantors, and collateral. Every query required $lookup (MongoDB's join equivalent). Performance was poor: $lookup queries the joined collection once per input document, so even with an index on the foreign field its cost grows with the number of documents being joined. The fix: embed small, stable, frequently-read sub-documents (borrower name, ID number, primary phone) directly in the loan document. Only use references for large or frequently-updated data.

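Anti-pattern 3: Missing sparse indexes. We added a "referredBy" field to loan documents, and only 15% of loans have referrals. We created an index on referredBy expecting it to speed up referral reporting. A regular index has an entry for every document (documents without the field are indexed as null), so 85% of its entries were nulls that took up space without helping query performance. The fix: use sparse: true on indexes for fields that only a minority of documents contain. A sparse index skips documents where the indexed field is missing; documents that store an explicit null are still indexed. MongoDB's documentation now recommends partial indexes (e.g. a partialFilterExpression of { referredBy: { $exists: true } }) as the more flexible alternative.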

// Sparse index for rarely-present field
db.loans.createIndex(
  { "referredBy": 1 },
  { sparse: true, name: "idx_referred_by_sparse" }
)

// In Spring Data MongoDB:
@Document(collection = "loans")
public class Loan {
    @Indexed(sparse = true)
    private String referredBy;  // usually null; Spring Data does not write null properties by default, so the field is absent and the sparse index skips it
}
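The difference between a regular and a sparse index is which documents get an entry: a regular single-field index has one entry per document (a missing field is indexed as null), while a sparse index only has entries for documents that contain the field, even when the value is an explicit null. A plain-Java model of that rule, with documents represented as maps (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SparseIndexModel {
    // Regular single-field index: one entry per document; a missing field is indexed as null.
    public static long regularEntries(List<Map<String, Object>> docs) {
        return docs.size();
    }

    // Sparse index: only documents that contain the field, even if its value is an explicit null.
    public static long sparseEntries(List<Map<String, Object>> docs, String field) {
        return docs.stream().filter(d -> d.containsKey(field)).count();
    }

    public static void main(String[] args) {
        Map<String, Object> explicitNull = new HashMap<>();
        explicitNull.put("_id", "loan-3");
        explicitNull.put("referredBy", null);                    // stored as { referredBy: null }
        List<Map<String, Object>> loans = List.of(
            Map.of("_id", "loan-1"),                              // field absent
            Map.of("_id", "loan-2"),                              // field absent
            explicitNull,
            Map.of("_id", "loan-4", "referredBy", "agent-7"));
        System.out.println(regularEntries(loans) + " vs " + sparseEntries(loans, "referredBy")); // 4 vs 2
    }
}
```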

14. Change Streams in Production: Three Lessons Learned

We use MongoDB change streams to sync audit events to Elasticsearch in real time for full-text search. The architecture: a dedicated Spring Boot "sync service" subscribes to the audit_events collection change stream and indexes each event into Elasticsearch as it is inserted. The sync service has been running in production for 14 months with three hard-learned lessons:

Lesson 1: Persist your resume token. A change stream resumes from a position encoded in the resume token. If your sync service crashes or restarts and you have not persisted the resume token, you lose your position and will either miss events (if you resume from "now") or reprocess everything (if you restart from the beginning). We store the resume token in a MongoDB collection after processing each batch of events. On startup, we load the last saved token and resume from there.
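Saving the token only after an event is processed is what makes a restart safe. The simulation below replays a hypothetical list of events, checkpoints after each processed event, crashes partway through, and resumes after the last saved checkpoint: nothing is lost, and the event processed just before the crash (but not yet checkpointed) is delivered again. Integer positions stand in for real resume tokens, which are opaque BSON values.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointSimulation {
    // Processes events after position `resumeAfter` (-1 = from the beginning) into `delivered`.
    // If crashAfterProcessing >= 0, stops right after processing that position, BEFORE saving its checkpoint.
    // Returns the last saved checkpoint.
    public static int run(List<String> events, int resumeAfter, int crashAfterProcessing, List<String> delivered) {
        int checkpoint = resumeAfter;
        for (int pos = resumeAfter + 1; pos < events.size(); pos++) {
            delivered.add(events.get(pos));                      // 1. process (e.g. index into Elasticsearch)
            if (pos == crashAfterProcessing) return checkpoint;  // crash: token for `pos` never saved
            checkpoint = pos;                                    // 2. only then persist the resume token
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        List<String> events = List.of("e0", "e1", "e2", "e3");
        List<String> delivered = new ArrayList<>();
        int saved = run(events, -1, 2, delivered);   // crashes after processing e2
        run(events, saved, -1, delivered);           // restart resumes after e1
        System.out.println(delivered);               // [e0, e1, e2, e2, e3]
    }
}
```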

Lesson 2: Handle primary elections gracefully. When MongoDB performs a primary election (during rolling upgrades, failovers, or routine maintenance), the open change stream cursor is lost. The driver automatically retries a resumable error once, but a longer election still surfaces as an exception in the subscription. Your listener must catch it, discard the old cursor, wait a few seconds, and re-open the change stream from the last saved resume token.

Lesson 3: Make processing idempotent. Due to lesson 2 (cursor invalidation + resume), you may process the same event twice on reconnect. Your downstream system must handle duplicate events safely. Our Elasticsearch indexing is idempotent: we use the MongoDB document _id as the Elasticsearch document ID. Indexing the same document twice simply overwrites it — no duplicate records, no errors.
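Idempotency here comes from keying each write by the MongoDB _id: a replayed event overwrites the same target document instead of adding a new one. A minimal in-memory stand-in for the Elasticsearch index (a map from document ID to body; the names are hypothetical) shows why a redelivered event is harmless:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdempotentSink {
    public record AuditEvent(String id, String eventType) {}

    // Stand-in for the search index: document ID -> document body.
    private final Map<String, AuditEvent> documentsById = new LinkedHashMap<>();

    // Keyed by the MongoDB _id: inserts a new document or overwrites the existing one, never appends.
    public void index(AuditEvent event) {
        documentsById.put(event.id(), event);
    }

    public int size() {
        return documentsById.size();
    }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        List<AuditEvent> deliveries = List.of(
            new AuditEvent("evt-1", "LOAN_STATUS_CHANGED"),
            new AuditEvent("evt-2", "LOAN_DISBURSED"),
            new AuditEvent("evt-2", "LOAN_DISBURSED"));  // redelivered after a reconnect
        deliveries.forEach(sink::index);
        System.out.println(sink.size()); // 2
    }
}
```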

15. MongoDB Performance Checklist

Before going to production with a MongoDB-backed service, validate each item in this checklist:

Every query uses an index — run db.collection.explain("executionStats") on your slowest queries; any COLLSCAN result is a missing index
All arrays have an upper bound — if an array can grow indefinitely, move it to a separate collection
Compound indexes follow ESR rule — Equality fields first, Sort fields second, Range fields last
maxTimeMS set on all read queries — prevents runaway queries from consuming all server resources
Connection pool sized correctly — default is 100; for high-throughput services, increase to 200–500 and monitor pool utilisation
Write concern set appropriately — use w: "majority" for data you cannot afford to lose on primary failover
Replica set configured — never run a single MongoDB node in production; minimum 3-node replica set
Atlas Performance Advisor reviewed — or check slow query log daily during the first two weeks of production traffic
Change stream resume tokens persisted — if you use change streams, store the resume token durably
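Index high-cardinality fields; avoid standalone indexes on low-cardinality fields (e.g., status with 3 values), which usually hurt write performance more than they help reads. Such fields still work well as the leading equality field of a compound index or as a partial-index filter.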
