Advanced Database Indexing Strategies: Composite Indexes, Partial Indexes & Query Optimizer Tricks
Indexes are the difference between a query that takes 3 milliseconds and one that takes 3 minutes. But indexing isn't just "add an index on the columns in your WHERE clause." Column ordering in composite indexes, partial indexes for selective queries, covering indexes that eliminate table lookups, and understanding when the optimizer ignores your carefully crafted indexes — these are the skills that separate database amateurs from production veterans.
Table of Contents
- Why Indexes Are Both Your Best Friend and Worst Enemy
- B-Tree Index Internals: What the Optimizer Actually Does
- Composite Index Column Ordering: The Left-Prefix Rule
- Partial Indexes for Selective High-Cardinality Queries
- Covering Indexes: Eliminating Table Lookups
- Index Bloat, Maintenance, and VACUUM
- When the Optimizer Ignores Your Index
- Failure Scenarios: Wrong Index, Slow Queries, Locking
- Key Takeaways
- Conclusion
1. Why Indexes Are Both Your Best Friend and Worst Enemy
A well-designed index transforms an unusable database into a sub-millisecond response machine. A poorly designed index does the opposite: it consumes disk space, slows down writes, and tricks you into thinking performance is optimized when the optimizer is actually ignoring it entirely.
Consider a production incident: an e-commerce platform added indexes on every column in their orders table — "more indexes means faster queries, right?" The result was catastrophic. Order insertions slowed from 5ms to 200ms because every insert had to update 12 different index structures. Black Friday traffic caused write amplification that saturated the database IOPS, leading to a 45-minute outage.
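Over-indexing like this is easy to catch early with PostgreSQL's statistics views. A minimal sketch (PostgreSQL assumed; `idx_scan` counts scans since the last statistics reset, so check it only after a representative traffic window):

```sql
-- Sketch: find indexes that are never used (candidates for removal)
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Every index listed here still pays its cost on each INSERT/UPDATE/DELETE
```

Dropping indexes that never serve reads is the cheapest write-performance win available.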
2. B-Tree Index Internals: What the Optimizer Actually Does
B-tree indexes are the default in PostgreSQL, MySQL, and most relational databases. Understanding their structure explains why certain queries use indexes and others don't:
-- B-tree index on (customer_id) for the orders table
--
-- Structure (simplified):
--                       [50]
--                      /    \
--              [25,30]        [75,80]
--             /   |   \      /   |   \
--      [10-24][25-29][30-49][50-74][75-79][80-99]
--         ↓      ↓      ↓      ↓      ↓      ↓
--          (row pointers to actual table data)
--
-- Query: SELECT * FROM orders WHERE customer_id = 42
-- Optimizer path: root [50] → left child [25,30] → leaf [30-49] → row lookup
-- Cost: O(log n) traversal + 1 random I/O for the row fetch
The B-tree is sorted, which enables three query patterns: equality (customer_id = 42), range (customer_id BETWEEN 30 AND 50), and prefix matching (customer_name LIKE 'John%'). It does NOT efficiently support suffix matching (LIKE '%Smith'), range predicates on more than one column at once, or functions applied to the indexed column (WHERE YEAR(created_at) = 2026).
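A quick way to see these rules in action is EXPLAIN. The sketch below assumes a PostgreSQL orders table with the columns shown; table and index names are illustrative, and `text_pattern_ops` is needed for LIKE-prefix matching under non-C collations:

```sql
-- Hypothetical indexes; PostgreSQL syntax
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_name ON orders (customer_name text_pattern_ops);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;              -- equality: index scan
EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 30 AND 50; -- range: index scan
EXPLAIN SELECT * FROM orders WHERE customer_name LIKE 'John%';    -- prefix: index scan
EXPLAIN SELECT * FROM orders WHERE customer_name LIKE '%Smith';   -- suffix: seq scan
```

On a tiny test table the planner may pick a sequential scan for all four; the comments describe the behavior at production table sizes.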
3. Composite Index Column Ordering: The Left-Prefix Rule
Composite indexes (indexes on multiple columns) follow the left-prefix rule: the index can only be used if the query filters on a left prefix of the indexed columns. This is the most misunderstood aspect of indexing:
-- Composite index
CREATE INDEX idx_orders_customer_status_date
ON orders(customer_id, status, created_at);
-- ✅ Uses index (filters on left prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending'
AND created_at > '2026-01-01';
-- ❌ Cannot use index efficiently (skips leftmost column)
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE created_at > '2026-01-01';
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
-- ⚠️ Uses only the leading customer_id column to narrow the scan;
-- because status is skipped, created_at is applied as a row-by-row
-- filter on the matching index entries rather than bounding the scan
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
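The left-prefix rule also drives which column order to choose in the first place. A sketch, under the assumption that the hot query filters status by equality and created_at by range (index names hypothetical):

```sql
-- Good: equality column first, range column last
CREATE INDEX idx_orders_status_date ON orders (status, created_at);
-- Served fully: descend to status = 'pending', then range-scan created_at
SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2026-01-01';

-- Poor: range column first
CREATE INDEX idx_orders_date_status ON orders (created_at, status);
-- The entire created_at range must be scanned; status only filters entries
```

This is why the rule of thumb is "equality columns first, range columns last": an equality predicate pins the scan to one subtree, while a range predicate ends the index's ability to narrow further.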
4. Partial Indexes for Selective High-Cardinality Queries
Partial indexes (PostgreSQL) or filtered indexes (SQL Server) index only a subset of rows. They're powerful for queries that always filter on a specific condition:
-- Full index: indexes ALL 10 million orders
CREATE INDEX idx_orders_status ON orders(status);
-- Size: ~250MB, and 95% of it is completed orders you rarely query
-- Partial index: indexes only pending orders (5% of table)
CREATE INDEX idx_orders_pending ON orders(customer_id, created_at)
WHERE status = 'pending';
-- Size: ~12MB, perfectly targeted for the queries you actually run
-- Query that benefits
SELECT * FROM orders
WHERE status = 'pending' AND customer_id = 123
ORDER BY created_at;
-- Uses idx_orders_pending, much smaller index to scan
Partial indexes are ideal for: soft-deleted rows (WHERE deleted_at IS NULL), active/inactive flags, status columns where you only query certain values, and time-based partitioning (WHERE created_at > NOW() - INTERVAL '30 days').
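For instance, the soft-delete case might look like this (hypothetical users table, PostgreSQL syntax):

```sql
-- Index only the live rows; deleted rows never enter the index
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;

-- Matches the index predicate, so the small partial index is usable
SELECT * FROM users
WHERE email = 'jane@example.com' AND deleted_at IS NULL;

-- Does NOT match the predicate, so the partial index cannot be used
SELECT * FROM users
WHERE email = 'jane@example.com';
```

The catch shown in the second query is the most common partial-index mistake: the query must repeat the index's WHERE condition (or imply it) for the planner to consider the index at all.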
5. Covering Indexes: Eliminating Table Lookups
A covering index contains all columns needed by a query, eliminating the need to fetch data from the table itself. This is a massive performance win for read-heavy workloads:
-- Query we want to optimize
SELECT order_id, total_amount, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
-- Non-covering index: requires table lookup for total_amount
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status, created_at);
-- Execution: Index scan → row pointer → table fetch → return
-- Covering index: all needed columns in the index
CREATE INDEX idx_orders_covering ON orders(customer_id, status, created_at)
INCLUDE (order_id, total_amount); -- PostgreSQL 11+ INCLUDE syntax
-- Execution: Index-only scan → return (no table access!)
-- MySQL equivalent
CREATE INDEX idx_orders_covering
ON orders(customer_id, status, created_at, order_id, total_amount);
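Whether the covering index actually delivers an index-only scan is worth verifying. In PostgreSQL the plan node and the heap-fetch count tell you (sketch, assuming the idx_orders_covering index above exists):

```sql
-- Confirm the plan uses an Index Only Scan with no heap access
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, total_amount, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
-- Look for "Index Only Scan using idx_orders_covering"
-- and a low "Heap Fetches" count (ideally 0)
```

Heap fetches stay near zero only on well-vacuumed tables, because index-only scans depend on the visibility map; a freshly bulk-loaded or heavily updated table will show high heap fetches until VACUUM runs.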
6. Index Bloat, Maintenance, and VACUUM
Indexes accumulate dead tuples over time. In PostgreSQL, UPDATE and DELETE operations leave dead entries in indexes. Without proper maintenance, indexes bloat — growing larger while becoming less efficient:
-- Check index sizes and usage in PostgreSQL
-- (large indexes with few scans are bloat suspects)
SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild a bloated index without long write locks (PostgreSQL 12+);
-- plain REINDEX takes an exclusive lock and blocks writes for the duration
REINDEX INDEX CONCURRENTLY idx_orders_customer;
-- Autovacuum settings for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum at 5% dead tuples
    autovacuum_analyze_scale_factor = 0.02  -- analyze at 2% changed rows
);
MySQL InnoDB handles this differently with its MVCC implementation, but still benefits from OPTIMIZE TABLE periodically. Monitor index size relative to table size — if an index is larger than the table, something is wrong.
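A rough way to apply that size heuristic in PostgreSQL (sketch; in pg_stat_user_indexes, `relid` is each index's parent table):

```sql
-- Compare each index's size to its table's size; an index rivaling
-- or exceeding its table deserves a bloat investigation
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

Note that a wide covering index can legitimately approach its table's size; the red flag is an index that used to be small and has grown past the table it serves.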
7. When the Optimizer Ignores Your Index
The query optimizer makes cost-based decisions. Sometimes it correctly ignores your index because a sequential scan is actually faster:
-- Optimizer ignores index because:
-- 1. Low selectivity (index would return most of the table)
SELECT * FROM orders WHERE status = 'completed'; -- 95% of rows
-- A sequential scan beats ~9.5 million random index lookups
-- 2. Function on indexed column (can't use index)
SELECT * FROM orders WHERE LOWER(customer_email) = 'john@example.com';
-- Fix: CREATE INDEX idx_email_lower ON orders(LOWER(customer_email));
-- 3. Type mismatch that forces a cast on the indexed column
--    (hypothetical VARCHAR order_number column)
SELECT * FROM orders WHERE order_number = 12345;
-- The column is cast to a number for every row, so its index is unusable
-- Fix: match the column's type: WHERE order_number = '12345'
-- 4. OR conditions that can't use single index
SELECT * FROM orders WHERE customer_id = 123 OR status = 'pending';
-- Fix: Use UNION or create separate indexes
-- Force index usage (debug only, not for production)
-- PostgreSQL: SET enable_seqscan = off;
-- MySQL: SELECT * FROM orders FORCE INDEX (idx_customer) WHERE ...
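For case 4 above, the UNION rewrite lets each branch use its own index (sketch; assumes separate indexes on customer_id and status exist):

```sql
-- Each SELECT can use a single-column index; UNION removes duplicates
SELECT * FROM orders WHERE customer_id = 123
UNION
SELECT * FROM orders WHERE status = 'pending';
```

PostgreSQL can often combine the two indexes itself with a BitmapOr plan, so check EXPLAIN before rewriting by hand; the UNION form matters most on engines or queries where no such plan is chosen.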
The optimizer is smarter than you think. If it's not using your index, ask why before forcing it. Nine times out of ten, the optimizer is making the right choice based on statistics you haven't considered.
8. Failure Scenarios: Wrong Index, Slow Queries, Locking
Wrong index chosen after data distribution change: Statistics become stale. Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) after bulk data changes.
Index creation blocks writes: Standard CREATE INDEX locks the table. Use CREATE INDEX CONCURRENTLY (PostgreSQL) or ALTER TABLE ... ALGORITHM=INPLACE (MySQL 5.6+) for online index creation.
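The PostgreSQL side of that scenario, sketched (index name hypothetical):

```sql
-- Build the index without blocking concurrent writes; slower than a
-- plain CREATE INDEX and cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

-- If the concurrent build fails, it leaves behind an INVALID index
-- that must be dropped before retrying:
-- DROP INDEX CONCURRENTLY idx_orders_customer;
```

The INVALID leftover is the operational gotcha: it still slows writes while serving no reads, so monitor for it after any failed build.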
Index corruption after crash: Rare but possible. Detect with pg_amcheck (PostgreSQL 14+) or CHECK TABLE (MySQL). Rebuild with REINDEX.
Diagnosing any of the above: use EXPLAIN ANALYZE (not just EXPLAIN) to see actual execution times and row counts. A large gap between estimated and actual row counts often reveals the root cause of a slow query.
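A concrete sketch of the stale-statistics diagnosis (PostgreSQL; the plan excerpt and numbers are hypothetical):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Hypothetical excerpt after a bulk load changed the data distribution:
--   Seq Scan on orders  (cost=0.00..25000.00 rows=120 width=98)
--                       (actual time=0.03..950.1 rows=480000 loops=1)
-- estimated rows=120 vs actual rows=480000 → the planner is working
-- from stale statistics
ANALYZE orders;  -- refresh statistics, then re-run EXPLAIN ANALYZE
```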
9. Key Takeaways
- Column ordering matters in composite indexes — put equality columns first, then range columns.
- Partial indexes reduce size and improve performance — use them for selective queries.
- Covering indexes eliminate table lookups — use INCLUDE for additional columns.
- Monitor and maintain indexes — bloat degrades performance over time.
- The optimizer is usually right — investigate before forcing index usage.
- Every index slows writes — only create indexes that benefit real query patterns.
10. Conclusion
Database indexing is a craft that rewards deep understanding. The difference between a senior engineer and a junior one often comes down to knowing not just what indexes to create, but why — understanding B-tree structure, the left-prefix rule, partial index selectivity, and covering index trade-offs.
Start with query analysis: run EXPLAIN ANALYZE on your slowest queries, understand what the optimizer is doing, and design indexes that match your actual access patterns. Then monitor: index size, bloat, and usage statistics tell you when indexes need maintenance or removal. The goal isn't the most indexes — it's the right indexes, maintained properly, serving the queries that matter.
Last updated: March 2026 — Written by Md Sanwar Hossain