Advanced Database Indexing Strategies: Composite Indexes, Partial Indexes & Query Optimizer Tricks
Indexes are the difference between a query that takes 3 milliseconds and one that takes 3 minutes. But indexing isn't just "add an index on the columns in your WHERE clause." Column ordering in composite indexes, partial indexes for selective queries, covering indexes that eliminate table lookups, and understanding when the optimizer ignores your carefully crafted indexes — these are the skills that separate database amateurs from production veterans.
Table of Contents
- Why Indexes Are Both Your Best Friend and Worst Enemy
- B-Tree Index Internals: What the Optimizer Actually Does
- Composite Index Column Ordering: The Left-Prefix Rule
- Partial Indexes for Selective High-Cardinality Queries
- Covering Indexes: Eliminating Table Lookups
- Index Bloat, Maintenance, and VACUUM
- When the Optimizer Ignores Your Index
- Failure Scenarios: Wrong Index, Slow Queries, Locking
- Key Takeaways
- Conclusion
1. Why Indexes Are Both Your Best Friend and Worst Enemy
A well-designed index transforms an unusable database into a sub-millisecond response machine. A poorly designed index does the opposite: it consumes disk space, slows down writes, and tricks you into thinking performance is optimized when the optimizer is actually ignoring it entirely.
Consider a production incident: an e-commerce platform added indexes on every column in their orders table — "more indexes means faster queries, right?" The result was catastrophic. Order insertions slowed from 5ms to 200ms because every insert had to update 12 different index structures. Black Friday traffic caused write amplification that saturated the database IOPS, leading to a 45-minute outage.
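Over-indexing like this is easy to catch early with PostgreSQL's statistics views. A minimal sketch (PostgreSQL assumed; `idx_scan` counts scans since the last statistics reset, so check it only after a representative traffic window):

```sql
-- Sketch: find indexes that are never used (candidates for removal)
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Every index listed here still pays its cost on each INSERT/UPDATE/DELETE
```

Dropping indexes that never serve reads is the cheapest write-performance win available.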
2. B-Tree Index Internals: What the Optimizer Actually Does
B-tree indexes are the default in PostgreSQL, MySQL, and most relational databases. Understanding their structure explains why certain queries use indexes and others don't:
-- B-tree index on (customer_id) for the orders table
--
-- Structure (simplified):
--                       [50]
--                      /    \
--              [25,30]        [75,80]
--             /   |   \      /   |   \
--      [10-24][25-29][30-49][50-74][75-79][80-99]
--         ↓      ↓      ↓      ↓      ↓      ↓
--          (row pointers to actual table data)
--
-- Query: SELECT * FROM orders WHERE customer_id = 42
-- Optimizer path: root [50] → left child [25,30] → leaf [30-49] → row lookup
-- Cost: O(log n) traversal + 1 random I/O for the row fetch
The B-tree is sorted, which enables three query patterns: equality (customer_id = 42), range (customer_id BETWEEN 30 AND 50), and prefix matching (customer_name LIKE 'John%'). It does NOT efficiently support suffix matching (LIKE '%Smith'), range predicates on more than one column at once, or functions applied to the indexed column (WHERE YEAR(created_at) = 2026).
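A quick way to see these rules in action is EXPLAIN. The sketch below assumes a PostgreSQL orders table with the columns shown; table and index names are illustrative, and `text_pattern_ops` is needed for LIKE-prefix matching under non-C collations:

```sql
-- Hypothetical indexes; PostgreSQL syntax
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_name ON orders (customer_name text_pattern_ops);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;              -- equality: index scan
EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 30 AND 50; -- range: index scan
EXPLAIN SELECT * FROM orders WHERE customer_name LIKE 'John%';    -- prefix: index scan
EXPLAIN SELECT * FROM orders WHERE customer_name LIKE '%Smith';   -- suffix: seq scan
```

On a tiny test table the planner may pick a sequential scan for all four; the comments describe the behavior at production table sizes.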
3. Composite Index Column Ordering: The Left-Prefix Rule
Composite indexes (indexes on multiple columns) follow the left-prefix rule: the index can only be used if the query filters on a left prefix of the indexed columns. This is the most misunderstood aspect of indexing:
-- Composite index
CREATE INDEX idx_orders_customer_status_date
ON orders(customer_id, status, created_at);
-- ✅ Uses index (filters on left prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending'
AND created_at > '2026-01-01';
-- ❌ Cannot use index efficiently (skips leftmost column)
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE created_at > '2026-01-01';
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
-- ⚠️ Uses only the leading customer_id column to narrow the scan;
-- because status is skipped, created_at is applied as a row-by-row
-- filter on the matching index entries rather than bounding the scan
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
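The left-prefix rule also drives which column order to choose in the first place. A sketch, under the assumption that the hot query filters status by equality and created_at by range (index names hypothetical):

```sql
-- Good: equality column first, range column last
CREATE INDEX idx_orders_status_date ON orders (status, created_at);
-- Served fully: descend to status = 'pending', then range-scan created_at
SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2026-01-01';

-- Poor: range column first
CREATE INDEX idx_orders_date_status ON orders (created_at, status);
-- The entire created_at range must be scanned; status only filters entries
```

This is why the rule of thumb is "equality columns first, range columns last": an equality predicate pins the scan to one subtree, while a range predicate ends the index's ability to narrow further.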
4. Partial Indexes for Selective High-Cardinality Queries
Partial indexes (PostgreSQL) or filtered indexes (SQL Server) index only a subset of rows. They're powerful for queries that always filter on a specific condition:
-- Full index: indexes ALL 10 million orders
CREATE INDEX idx_orders_status ON orders(status);
-- Size: ~250MB, and 95% of it is completed orders you rarely query
-- Partial index: indexes only pending orders (5% of table)
CREATE INDEX idx_orders_pending ON orders(customer_id, created_at)
WHERE status = 'pending';
-- Size: ~12MB, perfectly targeted for the queries you actually run
-- Query that benefits
SELECT * FROM orders
WHERE status = 'pending' AND customer_id = 123
ORDER BY created_at;
-- Uses idx_orders_pending, much smaller index to scan
Partial indexes are ideal for: soft-deleted rows (WHERE deleted_at IS NULL), active/inactive flags, status columns where you only query certain values, and time-based partitioning (WHERE created_at > NOW() - INTERVAL '30 days').
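For instance, the soft-delete case might look like this (hypothetical users table, PostgreSQL syntax):

```sql
-- Index only the live rows; deleted rows never enter the index
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;

-- Matches the index predicate, so the small partial index is usable
SELECT * FROM users
WHERE email = 'jane@example.com' AND deleted_at IS NULL;

-- Does NOT match the predicate, so the partial index cannot be used
SELECT * FROM users
WHERE email = 'jane@example.com';
```

The catch shown in the second query is the most common partial-index mistake: the query must repeat the index's WHERE condition (or imply it) for the planner to consider the index at all.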
5. Covering Indexes: Eliminating Table Lookups
A covering index contains all columns needed by a query, eliminating the need to fetch data from the table itself. This is a massive performance win for read-heavy workloads:
-- Query we want to optimize
SELECT order_id, total_amount, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
-- Non-covering index: requires table lookup for total_amount
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status, created_at);
-- Execution: Index scan → row pointer → table fetch → return
-- Covering index: all needed columns in the index
CREATE INDEX idx_orders_covering ON orders(customer_id, status, created_at)
INCLUDE (order_id, total_amount); -- PostgreSQL 11+ INCLUDE syntax
-- Execution: Index-only scan → return (no table access!)
-- MySQL equivalent
CREATE INDEX idx_orders_covering
ON orders(customer_id, status, created_at, order_id, total_amount);
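Whether the covering index actually delivers an index-only scan is worth verifying. In PostgreSQL the plan node and the heap-fetch count tell you (sketch, assuming the idx_orders_covering index above exists):

```sql
-- Confirm the plan uses an Index Only Scan with no heap access
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, total_amount, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
-- Look for "Index Only Scan using idx_orders_covering"
-- and a low "Heap Fetches" count (ideally 0)
```

Heap fetches stay near zero only on well-vacuumed tables, because index-only scans depend on the visibility map; a freshly bulk-loaded or heavily updated table will show high heap fetches until VACUUM runs.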
6. Index Bloat, Maintenance, and VACUUM
Indexes accumulate dead tuples over time. In PostgreSQL, UPDATE and DELETE operations leave dead entries in indexes. Without proper maintenance, indexes bloat — growing larger while becoming less efficient:
-- Check index sizes and usage in PostgreSQL
-- (large indexes with few scans are bloat suspects)
SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild a bloated index without long write locks (PostgreSQL 12+);
-- plain REINDEX takes an exclusive lock and blocks writes for the duration
REINDEX INDEX CONCURRENTLY idx_orders_customer;
-- Autovacuum settings for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum at 5% dead tuples
    autovacuum_analyze_scale_factor = 0.02  -- analyze at 2% changed rows
);
MySQL InnoDB handles this differently with its MVCC implementation, but still benefits from OPTIMIZE TABLE periodically. Monitor index size relative to table size — if an index is larger than the table, something is wrong.
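A rough way to apply that size heuristic in PostgreSQL (sketch; in pg_stat_user_indexes, `relid` is each index's parent table):

```sql
-- Compare each index's size to its table's size; an index rivaling
-- or exceeding its table deserves a bloat investigation
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

Note that a wide covering index can legitimately approach its table's size; the red flag is an index that used to be small and has grown past the table it serves.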
7. When the Optimizer Ignores Your Index
The query optimizer makes cost-based decisions. Sometimes it correctly ignores your index because a sequential scan is actually faster:
-- Optimizer ignores index because:
-- 1. Low selectivity (index would return most of the table)
SELECT * FROM orders WHERE status = 'completed'; -- 95% of rows
-- A sequential scan beats ~9.5 million random index lookups
-- 2. Function on indexed column (can't use index)
SELECT * FROM orders WHERE LOWER(customer_email) = 'john@example.com';
-- Fix: CREATE INDEX idx_email_lower ON orders(LOWER(customer_email));
-- 3. Type mismatch that forces a cast on the indexed column
--    (hypothetical VARCHAR order_number column)
SELECT * FROM orders WHERE order_number = 12345;
-- The column is cast to a number for every row, so its index is unusable
-- Fix: match the column's type: WHERE order_number = '12345'
-- 4. OR conditions that can't use single index
SELECT * FROM orders WHERE customer_id = 123 OR status = 'pending';
-- Fix: Use UNION or create separate indexes
-- Force index usage (debug only, not for production)
-- PostgreSQL: SET enable_seqscan = off;
-- MySQL: SELECT * FROM orders FORCE INDEX (idx_customer) WHERE ...
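For case 4 above, the UNION rewrite lets each branch use its own index (sketch; assumes separate indexes on customer_id and status exist):

```sql
-- Each SELECT can use a single-column index; UNION removes duplicates
SELECT * FROM orders WHERE customer_id = 123
UNION
SELECT * FROM orders WHERE status = 'pending';
```

PostgreSQL can often combine the two indexes itself with a BitmapOr plan, so check EXPLAIN before rewriting by hand; the UNION form matters most on engines or queries where no such plan is chosen.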
The optimizer is smarter than you think. If it's not using your index, ask why before forcing it. Nine times out of ten, the optimizer is making the right choice based on statistics you haven't considered.
8. Failure Scenarios: Wrong Index, Slow Queries, Locking
Wrong index chosen after data distribution change: Statistics become stale. Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) after bulk data changes.
Index creation blocks writes: Standard CREATE INDEX locks the table. Use CREATE INDEX CONCURRENTLY (PostgreSQL) or ALTER TABLE ... ALGORITHM=INPLACE (MySQL 5.6+) for online index creation.
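The PostgreSQL side of that scenario, sketched (index name hypothetical):

```sql
-- Build the index without blocking concurrent writes; slower than a
-- plain CREATE INDEX and cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

-- If the concurrent build fails, it leaves behind an INVALID index
-- that must be dropped before retrying:
-- DROP INDEX CONCURRENTLY idx_orders_customer;
```

The INVALID leftover is the operational gotcha: it still slows writes while serving no reads, so monitor for it after any failed build.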
Index corruption after crash: Rare but possible. Detect with pg_amcheck (PostgreSQL 14+) or CHECK TABLE (MySQL). Rebuild with REINDEX.
Diagnosing any of the above: use EXPLAIN ANALYZE (not just EXPLAIN) to see actual execution times and row counts. A large gap between estimated and actual row counts often reveals the root cause of a slow query.
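A concrete sketch of the stale-statistics diagnosis (PostgreSQL; the plan excerpt and numbers are hypothetical):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Hypothetical excerpt after a bulk load changed the data distribution:
--   Seq Scan on orders  (cost=0.00..25000.00 rows=120 width=98)
--                       (actual time=0.03..950.1 rows=480000 loops=1)
-- estimated rows=120 vs actual rows=480000 → the planner is working
-- from stale statistics
ANALYZE orders;  -- refresh statistics, then re-run EXPLAIN ANALYZE
```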
9. Key Takeaways
- Column ordering matters in composite indexes — put equality columns first, then range columns.
- Partial indexes reduce size and improve performance — use them for selective queries.
- Covering indexes eliminate table lookups — use INCLUDE for additional columns.
- Monitor and maintain indexes — bloat degrades performance over time.
- The optimizer is usually right — investigate before forcing index usage.
- Every index slows writes — only create indexes that benefit real query patterns.
10. Conclusion
Database indexing is a craft that rewards deep understanding. The difference between a senior engineer and a junior one often comes down to knowing not just what indexes to create, but why — understanding B-tree structure, the left-prefix rule, partial index selectivity, and covering index trade-offs.
Start with query analysis: run EXPLAIN ANALYZE on your slowest queries, understand what the optimizer is doing, and design indexes that match your actual access patterns. Then monitor: index size, bloat, and usage statistics tell you when indexes need maintenance or removal. The goal isn't the most indexes — it's the right indexes, maintained properly, serving the queries that matter.
Last updated: March 2026 — Written by Md Sanwar Hossain