MongoDB with Spring Boot: Production Data Modeling, Aggregations & Performance Guide (2026)
A complete production guide covering MongoDB document modeling patterns, embedded vs referenced design, compound indexing with the ESR rule, aggregation pipeline in Java, multi-document transactions, change streams, sharding, and Atlas operations.
1. When to Choose MongoDB
| Feature | PostgreSQL | MongoDB | Cassandra |
|---|---|---|---|
| Schema flexibility | Rigid | ✅ Dynamic | Moderate |
| Hierarchical data | Joins required | ✅ Native nesting | Manual denorm |
| ACID transactions | ✅ Full ACID | ✅ Multi-doc (4.0+) | ❌ LWT only |
| Write throughput | Moderate | ✅ High | ✅ Very high |
| Aggregation | ✅ SQL GROUP BY | ✅ Pipeline | ❌ Limited |
Use MongoDB for: product catalogs with varying attributes, content management, user activity logs, IoT time-series, mobile app backends with evolving schemas, and any use case where you query by a primary key and need the full nested document in one request.
2. Spring Boot Setup
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
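# application.yml
spring:
  data:
    mongodb:
      uri: mongodb+srv://user:password@cluster.mongodb.net/mydb?retryWrites=true&w=majority
      # Reads default to the primary. Adding readPreference=secondaryPreferred to the URI offloads
      # reads to secondaries for throughput, but they can return stale data, and reads inside a
      # multi-document transaction (section 7) must use the primary. Prefer setting it per query.
      auto-index-creation: true  # off by default since Spring Data MongoDB 3.0; needed for @Indexed / @CompoundIndex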
3. Document Modeling: Embedded vs Referenced
Rule of thumb: Embed data that is always accessed together and has a bounded/small cardinality. Reference data that has an independent lifecycle, high cardinality, or is accessed separately.
// DON'T: push follower IDs into user document — unbounded growth, 16MB limit
{
"_id": "user123",
"name": "Alice",
"followerIds": ["u1","u2","u3",...,"u99999"] // grows to millions!
}
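To put a number on the 16MB limit, the following is a rough sketch of how many string IDs fit in a single BSON array. It follows the BSON layout (one type byte, the array index as a null-terminated key, then a length-prefixed string) and assumes 24-character single-byte IDs. `FollowerArraySize` and its methods are hypothetical helpers for illustration, not part of any MongoDB driver.

```java
/** Rough BSON size estimate for an array of string IDs (hypothetical helper, not a driver API). */
final class FollowerArraySize {
    static final long MAX_BSON_BYTES = 16L * 1024 * 1024; // MongoDB's 16 MiB document limit

    /** Bytes added by the array element at position `index` holding a string of `idLength` bytes. */
    static long elementBytes(int index, int idLength) {
        int keyBytes = Integer.toString(index).length() + 1; // array index stored as a cstring key
        int valueBytes = 4 + idLength + 1;                   // int32 length + bytes + 0x00
        return 1 + keyBytes + valueBytes;                    // 1 = element type byte
    }

    /** Bytes used by a BSON array holding `count` strings of `idLength` single-byte characters. */
    static long arrayBytes(int count, int idLength) {
        long total = 4 + 1; // int32 length prefix + trailing 0x00 of the array document
        for (int i = 0; i < count; i++) {
            total += elementBytes(i, idLength);
        }
        return total;
    }

    /** Largest number of IDs whose array alone still fits in one document. */
    static int maxIds(int idLength) {
        int count = 0;
        long total = 4 + 1;
        while (total + elementBytes(count, idLength) <= MAX_BSON_BYTES) {
            total += elementBytes(count, idLength);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("24-char IDs that fit in one array: " + maxIds(24));
    }
}
```

Under these assumptions the array alone hits the limit at a few hundred thousand IDs, well before "millions" of followers, and every other field in the user document reduces that further.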
@Document(collection = "products")
public class Product {
@Id
private String id; // MongoDB ObjectId (time-sortable, unique)
@Indexed(unique = true)
private String sku;
private String name;
private String description;
private String category; // @Indexed for facets
// EMBED: variants are few (<20), always fetched with product
private List<ProductVariant> variants; // {color, size, price, stock}
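// REFERENCE: reviews are many (unbounded) and accessed separately, so they live in
// their own collection and point back to the product via Review.productId.
// Do not keep a @DBRef List<Review> here: that array would grow without bound,
// exactly like followerIds in the DON'T example above.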
// Metadata
@CreatedDate
private Instant createdAt;
@LastModifiedDate
private Instant updatedAt;
}
// Separate reviews collection — queried independently, paginated
@Document(collection = "reviews")
public class Review {
@Id private String id;
@Indexed private String productId; // FK by convention
private String userId;
private int rating;
private String comment;
@CreatedDate private Instant createdAt;
}
4. Indexing: ESR Rule, TTL, Partial, Sparse
ESR Rule: For compound indexes, order fields as Equality → Sort → Range. This maximizes index usage and minimizes in-memory sorts.
// Compound index following ESR rule for: category=electronics, sort by price, range on stock
@CompoundIndex(def = "{'category': 1, 'price': 1, 'stock': 1}", name = "category_price_stock")
@Document(collection = "products")
public class Product { ... }
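The ordering step of the ESR rule can be sketched in plain Java: tag each queried field with its role, then sort by role. This is only an illustration of the rule of thumb. `EsrIndexOrder`, `Role`, and `Field` are hypothetical names, and real index design should still be confirmed with `explain`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Orders query fields as Equality -> Sort -> Range (illustrative helper, not a Spring API). */
final class EsrIndexOrder {
    enum Role { EQUALITY, SORT, RANGE } // declaration order doubles as the ESR order

    record Field(String name, Role role) {}

    /** Returns field names in ESR order; fields with the same role keep their input order. */
    static List<String> order(List<Field> fields) {
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparing(Field::role)); // List.sort is stable
        List<String> names = new ArrayList<>();
        for (Field f : sorted) {
            names.add(f.name());
        }
        return names;
    }

    /** Renders an ascending index definition in the string style used by @CompoundIndex. */
    static String toIndexDef(List<String> names) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append("'").append(names.get(i)).append("': 1");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<Field> query = List.of(
            new Field("stock", Role.RANGE),
            new Field("category", Role.EQUALITY),
            new Field("price", Role.SORT));
        System.out.println(toIndexDef(order(query)));
    }
}
```

For the query described above (equality on category, sort by price, range on stock) this reproduces the key order used in the `category_price_stock` index.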
// TTL index: auto-delete OTP documents about 5 minutes after createdAt
// (the TTL monitor runs roughly once a minute, so deletion is not instantaneous)
@Document(collection = "otps")
public class Otp {
@Id private String id;
@Indexed(expireAfterSeconds = 300) // TTL index
private Date createdAt;
private String code;
private String userId;
}
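Because the TTL monitor runs periodically (about every 60 seconds), an expired OTP document can remain readable for a short while after its 5 minutes are up. A minimal sketch of an application-side validity check, assuming the same 300-second lifetime as the index above; `OtpExpiry` is a hypothetical helper:

```java
import java.time.Duration;
import java.time.Instant;

/** Application-side expiry check that does not depend on when MongoDB deletes the document. */
final class OtpExpiry {
    static final Duration TTL = Duration.ofSeconds(300); // mirrors expireAfterSeconds = 300

    /** True once the OTP is past its lifetime, whether or not the TTL monitor has removed it. */
    static boolean isExpired(Instant createdAt, Instant now) {
        return !now.isBefore(createdAt.plus(TTL));
    }

    public static void main(String[] args) {
        Instant created = Instant.now().minusSeconds(310);
        System.out.println("expired: " + isExpired(created, Instant.now()));
    }
}
```

Treating the TTL index as cleanup and this check as the actual validity rule keeps OTP verification correct during the gap between expiry and deletion.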
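// Partial index: only index active products (smaller index = faster)
// Shown programmatically with MongoTemplate. MongoDB rejects an index that sets both
// sparse and a partial filter, so sparse() is omitted.
// Needs org.springframework.data.mongodb.core.index.PartialIndexFilter
mongoTemplate.indexOps(Product.class).ensureIndex(
    new Index("sku", Sort.Direction.ASC)
        .named("active_sku_idx")
        .partial(PartialIndexFilter.of(Criteria.where("status").is("ACTIVE")))
);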
5. Aggregation Pipeline in Java
@Service
public class ProductAggregationService {
    @Autowired private MongoTemplate mongoTemplate;

    // Returns one row per product, best sellers first (the pipeline yields up to 50 rows)
    public List<CategorySalesReport> getSalesReport(String category, LocalDate from, LocalDate to) {
        // Assumption: createdAt is stored as a UTC instant and `to` is meant to be inclusive
        Instant start = from.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant endExclusive = to.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        TypedAggregation<Order> agg = Aggregation.newAggregation(Order.class,
            // Stage 1: $match FIRST, so later stages only see the filtered documents
            Aggregation.match(
                Criteria.where("category").is(category)
                    .and("createdAt").gte(start).lt(endExclusive)
                    .and("status").is("COMPLETED")),
            // Stage 2: $group: sum revenue and count orders per product
            Aggregation.group("productId")
                .sum("amount").as("totalRevenue")
                .count().as("orderCount")
                .avg("amount").as("avgOrderValue"),
            // Stage 3: $lookup: join with products (_id here is the grouped productId,
            // which must have the same BSON type as products._id)
            Aggregation.lookup("products", "_id", "_id", "product"),
            Aggregation.unwind("product"),
            // Stage 4: $sort: top selling products first
            Aggregation.sort(Sort.by(Sort.Direction.DESC, "totalRevenue")),
            Aggregation.limit(50),
            // Stage 5: $project: shape output
            Aggregation.project("orderCount", "totalRevenue", "avgOrderValue")
                .and("product.name").as("productName")
        );
        return mongoTemplate.aggregate(agg, CategorySalesReport.class).getMappedResults();
    }
}
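To make the stage semantics concrete, here is an in-memory analogue of the match, group, sort, and limit stages using Java streams. It is a teaching sketch only: MongoDB executes the pipeline server-side, the date filter and `$lookup` are left out for brevity, and `SalesReportSketch`, `Order`, and `Row` are hypothetical types.

```java
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** In-memory analogue of $match -> $group -> $sort -> $limit (illustration only). */
final class SalesReportSketch {
    record Order(String productId, String category, String status, double amount) {}
    record Row(String productId, double totalRevenue, long orderCount, double avgOrderValue) {}

    static List<Row> report(List<Order> orders, String category, int limit) {
        // $match: keep completed orders in the category; $group: aggregate per productId
        Map<String, DoubleSummaryStatistics> byProduct = orders.stream()
            .filter(o -> o.category().equals(category) && o.status().equals("COMPLETED"))
            .collect(Collectors.groupingBy(Order::productId,
                Collectors.summarizingDouble(Order::amount)));
        // $sort by totalRevenue descending, then $limit
        return byProduct.entrySet().stream()
            .map(e -> new Row(e.getKey(), e.getValue().getSum(),
                e.getValue().getCount(), e.getValue().getAverage()))
            .sorted(Comparator.comparingDouble(Row::totalRevenue).reversed())
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("p1", "electronics", "COMPLETED", 100.0),
            new Order("p1", "electronics", "COMPLETED", 50.0),
            new Order("p2", "electronics", "COMPLETED", 200.0),
            new Order("p3", "electronics", "CANCELLED", 999.0));
        report(orders, "electronics", 50).forEach(System.out::println);
    }
}
```

Filtering before grouping is what keeps the grouped map small here, which is the same reason `$match` goes first in the real pipeline.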
6. Spring Data MongoDB: Repository & MongoTemplate
// MongoRepository for simple CRUD
public interface ProductRepository extends MongoRepository<Product, String> {
// Derived queries
List<Product> findByCategoryAndPriceLessThan(String category, double maxPrice);
@Query("{ 'category': ?0, 'variants.stock': { $gt: 0 } }")
Page<Product> findAvailableByCategory(String category, Pageable pageable);
}
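// MongoTemplate for complex Criteria
// Additional imports: java.util.ArrayList, java.util.List, java.util.regex.Pattern
@Service
public class ProductSearchService {
    @Autowired private MongoTemplate mongoTemplate;

    public Page<Product> search(ProductFilter filter, Pageable pageable) {
        List<Criteria> parts = new ArrayList<>();
        if (filter.getCategory() != null)
            parts.add(Criteria.where("category").is(filter.getCategory()));
        // Build the price range on ONE Criteria: calling and("price") twice on the same
        // chain throws InvalidMongoDbApiUsageException
        if (filter.getMinPrice() != null || filter.getMaxPrice() != null) {
            Criteria price = Criteria.where("price");
            if (filter.getMinPrice() != null) price.gte(filter.getMinPrice());
            if (filter.getMaxPrice() != null) price.lte(filter.getMaxPrice());
            parts.add(price);
        }
        if (filter.getKeyword() != null) // quote user input so it is matched literally; "i" = case-insensitive
            parts.add(Criteria.where("name").regex(Pattern.quote(filter.getKeyword()), "i"));
        // An empty $and is rejected by MongoDB, so fall back to an empty Criteria
        Criteria criteria = parts.isEmpty()
            ? new Criteria()
            : new Criteria().andOperator(parts.toArray(new Criteria[0]));
        Query query = Query.query(criteria)
            .with(pageable)
            .with(Sort.by("createdAt").descending());
        List<Product> results = mongoTemplate.find(query, Product.class);
        long count = mongoTemplate.count(Query.query(criteria), Product.class); // no skip/limit
        return new PageImpl<>(results, pageable, count);
    }
}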
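A keyword passed straight into a regex is interpreted as a pattern, so input such as `a.b` also matches `axb`. The sketch below shows the difference using `java.util.regex` only. It assumes the server-side regex engine honors the same `\Q...\E` quoting that `Pattern.quote` produces, which is worth verifying against your MongoDB version. `KeywordRegex` is a hypothetical helper.

```java
import java.util.regex.Pattern;

/** Builds a case-insensitive "contains" pattern that treats the keyword as literal text. */
final class KeywordRegex {
    static Pattern literalContains(String keyword) {
        // Pattern.quote wraps the keyword so regex metacharacters lose their special meaning
        return Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(literalContains("a.b").matcher("axb").find());
        System.out.println(Pattern.compile("a.b").matcher("axb").find());
    }
}
```

Note that an unanchored, case-insensitive regex generally cannot use an index efficiently, so a text index or Atlas Search may be a better fit for keyword search on large collections.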
7. Multi-Document Transactions
@Configuration
public class MongoConfig {
@Bean
public MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
return new MongoTransactionManager(dbFactory);
}
}
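@Service
public class OrderService {
@Autowired private ProductRepository productRepository;
@Autowired private OrderRepository orderRepository; // assumed to be a MongoRepository<Order, String>
@Transactional // ACID across multiple collections; requires a replica set or sharded cluster
public Order placeOrder(OrderRequest request) {
// 1. Deduct inventory inside the transaction (a concurrent write to the same product aborts it)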
Product product = productRepository.findById(request.getProductId())
.orElseThrow(() -> new ProductNotFoundException(request.getProductId()));
if (product.getStock() < request.getQuantity()) {
throw new InsufficientStockException();
}
product.setStock(product.getStock() - request.getQuantity());
productRepository.save(product);
// 2. Create order record
Order order = Order.builder()
.productId(request.getProductId())
.userId(request.getUserId())
.quantity(request.getQuantity())
.amount(product.getPrice() * request.getQuantity())
.status("PENDING")
.build();
// If any exception is thrown before commit, both writes are rolled back
return orderRepository.save(order);
}
}
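When two requests modify the same product concurrently, one transaction can be aborted with a transient error, and a plain `@Transactional` method is not re-run automatically. A minimal retry sketch follows. `TransactionRetry` and `TransientTxException` are hypothetical names, with the exception standing in for whatever error your driver or Spring version surfaces for MongoDB's `TransientTransactionError` label.

```java
import java.util.function.Supplier;

/** Retries a unit of work when it fails with a transient transaction error (illustrative). */
final class TransactionRetry {
    /** Hypothetical stand-in for an error carrying MongoDB's TransientTransactionError label. */
    static class TransientTxException extends RuntimeException {
        TransientTxException(String message) { super(message); }
    }

    /** Runs `work`, retrying up to maxAttempts times on TransientTxException. */
    static <T> T runWithRetry(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        TransientTxException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get(); // in a service: a call INTO the @Transactional method
            } catch (TransientTxException e) {
                last = e; // the aborted transaction was rolled back, so re-running is safe
            }
        }
        throw last; // non-null here: the loop ran at least once and every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) throw new TransientTxException("write conflict");
            return "order placed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The retry loop has to sit outside the transactional method (for example in a caller bean) so that each attempt starts a fresh transaction. Adding a short backoff between attempts is a reasonable refinement.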
8. Change Streams: Real-Time Events
@Component
public class OrderChangeStreamListener {
    // changeStream(...) returning a Flux is offered by ReactiveMongoTemplate
    // (spring-boot-starter-data-mongodb-reactive), not by the blocking MongoTemplate
    @Autowired private ReactiveMongoTemplate reactiveMongoTemplate;
    @Autowired private EventPublisher eventPublisher;

    @PostConstruct
    public void startListening() {
        ChangeStreamOptions.ChangeStreamOptionsBuilder options = ChangeStreamOptions.builder()
            .filter(Aggregation.newAggregation(
                Aggregation.match(Criteria.where("operationType").in("insert", "update"))));
        BsonDocument lastResumeToken = loadLastResumeToken(); // null on the very first start
        if (lastResumeToken != null) {
            options.resumeAfter(lastResumeToken); // resume after restart (resumeAt expects a timestamp)
        }
        Flux<ChangeStreamEvent<Order>> stream = reactiveMongoTemplate.changeStream(
            "orders", options.build(), Order.class);
        stream.subscribe(event -> {
            Order order = event.getBody(); // may be null for updates unless full-document lookup is enabled
            if ("insert".equals(event.getOperationType().getValue())) {
                eventPublisher.publish(new OrderCreatedEvent(order));
            }
            // Persist the token only after handling, so a crash replays the event instead of skipping it
            saveResumeToken(event.getRaw().getResumeToken());
        });
    }
}
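Resuming from a stored token gives at-least-once delivery: after a crash, the events handled since the last persisted token are delivered again. A minimal sketch of an idempotent handler that tolerates such replays, using an in-memory set where production code would need durable storage; `IdempotentOrderEvents` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Publishes each "order created" event once, even if the change stream replays it. */
final class IdempotentOrderEvents {
    private final Set<String> seenOrderIds = new HashSet<>(); // durable store in production
    private final List<String> published = new ArrayList<>();

    /** Returns true if the event was published, false if it was a replay and was skipped. */
    boolean handleInsert(String orderId) {
        if (!seenOrderIds.add(orderId)) {
            return false; // already handled before the restart
        }
        published.add("OrderCreated:" + orderId);
        return true;
    }

    List<String> published() {
        return published;
    }

    public static void main(String[] args) {
        IdempotentOrderEvents handler = new IdempotentOrderEvents();
        handler.handleInsert("o1");
        handler.handleInsert("o2");
        handler.handleInsert("o2"); // replay after a simulated restart
        System.out.println(handler.published());
    }
}
```

Deduplicating on the order ID (or on the change event's own ID) is what makes it safe to persist the resume token after processing rather than before.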
10. Production Operations & Atlas
| Area | Action | Tool |
|---|---|---|
| Slow queries | Enable profiler level 1 with slowms=100 (level 2 records every operation) | db.setProfilingLevel(1, { slowms: 100 }) |
| Query explain | Check executionStats for index usage | cursor.explain("executionStats") |
| Index advisor | Atlas Performance Advisor auto-suggests indexes | Atlas UI |
| Backups | Atlas continuous backup with point-in-time restore | Atlas / mongodump |
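When reading `executionStats`, a common heuristic is to compare `totalDocsExamined` with `nReturned`: a ratio near 1 suggests a selective index, while a large ratio points to a scan or a poorly matched index. A small sketch of that check, where `ExplainCheck` and the threshold are illustrative assumptions rather than official guidance:

```java
/** Heuristic check on explain("executionStats") counters (illustrative helper). */
final class ExplainCheck {
    /** Documents examined per document returned; values near 1.0 suggest a selective index. */
    static double docsExaminedPerReturned(long totalDocsExamined, long nReturned) {
        if (nReturned == 0) {
            return totalDocsExamined == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return (double) totalDocsExamined / nReturned;
    }

    /** True when the query reads far more documents than it returns. */
    static boolean looksLikeScan(long totalDocsExamined, long nReturned, double threshold) {
        return docsExaminedPerReturned(totalDocsExamined, nReturned) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeScan(100_000, 50, 10.0));
    }
}
```

The two inputs correspond to the `totalDocsExamined` and `nReturned` fields in the explain output; the threshold of 10 used in `main` is arbitrary and should be tuned per workload.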
11. Interview Questions & Checklist
A: Embed the first ~5 comments in the post document for fast display (no second query). Store all comments in a separate comments collection with a postId index for paginated loading of full comment threads. This "subset pattern" balances read performance for the common case (show the post plus a comment preview) with scalability for the edge case (a post with 10k comments).
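The bookkeeping behind the subset pattern can be sketched in memory: every comment goes to the full collection, and the post keeps only a capped preview. The sketch assumes the preview holds the five most recent comments (the answer above does not specify which five), and `PostWithPreview` is a hypothetical class. In MongoDB the cap is typically applied with `$push` combined with `$slice`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory model of the subset pattern: capped preview in the post, full list elsewhere. */
final class PostWithPreview {
    static final int PREVIEW_SIZE = 5;
    private final Deque<String> previewComments = new ArrayDeque<>(); // embedded in the post document
    private final List<String> allComments = new ArrayList<>();       // separate comments collection

    void addComment(String comment) {
        allComments.add(comment);
        previewComments.addLast(comment);
        if (previewComments.size() > PREVIEW_SIZE) {
            previewComments.removeFirst(); // drop the oldest so the embedded array stays bounded
        }
    }

    List<String> preview() {
        return new ArrayList<>(previewComments);
    }

    int totalComments() {
        return allComments.size();
    }

    public static void main(String[] args) {
        PostWithPreview post = new PostWithPreview();
        for (int i = 1; i <= 7; i++) {
            post.addComment("c" + i);
        }
        System.out.println(post.preview() + " of " + post.totalComments());
    }
}
```

The embedded array never exceeds five entries no matter how many comments the post receives, which is what keeps the post document small and the common read a single query.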
- Always use a replica set (required for transactions and change streams)
- Explicit schema validation via $jsonSchema
- Avoid unbounded arrays
- Follow ESR rule for compound indexes
- Use TTL index for session/OTP documents
- $match first in every aggregation pipeline
- Test explain("executionStats") for all queries
- Set maxTimeMS on all queries
- Use change streams for event-driven sync
- Atlas Performance Advisor in production