Md Sanwar Hossain
Senior Software Engineer · MySQL & Database Internals Series
April 4, 2026 · 24 min read

MySQL InnoDB Internals: Buffer Pool, Redo Log & MVCC Deep Dive for Java Developers

Understanding InnoDB's internals is the difference between a Java developer who configures Spring Boot's spring.datasource.* properties and one who can explain exactly why a query slowed down under load, why an OOM occurs despite a large heap, and how to tune InnoDB to match the access patterns of a Spring Data JPA application. This deep dive covers the Buffer Pool LRU mechanics, Write-Ahead Logging (WAL) via the Redo Log, MVCC via Undo Logs, B+Tree clustered index layout, and the locking model including Gap Locks that prevent phantom reads.

Table of Contents

  1. InnoDB Architecture: Memory vs Disk Components
  2. Buffer Pool: Pages, LRU, and the Young/Old Trick
  3. Redo Log (WAL): How InnoDB Survives Crashes
  4. Undo Log: The Foundation of ROLLBACK and MVCC
  5. MVCC Deep Dive: ReadView, trx_id, and Consistent Reads
  6. B+Tree Clustered Index: Why InnoDB Hates UUIDs
  7. Locking: Record Locks, Gap Locks, and Next-Key Locks
  8. Java Developer Tuning Guide: Key InnoDB Variables
  9. Key Takeaways

1. InnoDB Architecture: Memory vs Disk Components

MySQL InnoDB Engine Architecture — mdsanwarhossain.me

InnoDB separates its operation into in-memory structures (the Buffer Pool, Log Buffer, and Change Buffer) and on-disk structures (tablespace .ibd files, redo log files ib_logfile0/1, and undo tablespaces). Every read and write operation flows through the Buffer Pool — InnoDB never reads from disk if the page is already in memory. The fundamental design goal is to maximize buffer pool hit rate while ensuring crash recovery via the redo log.

InnoDB pages are 16KB by default (innodb_page_size). Every table row, index entry, and overflow data lives in these fixed-size pages. When a read request comes in for a row, InnoDB reads the entire 16KB page containing that row into the buffer pool, then returns the row. Subsequent reads of other rows on the same page hit memory directly — this spatial locality is why InnoDB performs well for range scans.
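To make the arithmetic concrete, here is a back-of-the-envelope sketch in Java; the 100-byte average row size and the ~1KB per-page overhead are illustrative assumptions, not InnoDB constants:

```java
// Rough page economics: how many rows share one 16KB page, and how many
// page reads a range scan needs. ROW_SIZE and PAGE_OVERHEAD are assumed
// values for illustration only.
public class PageMath {
    static final int PAGE_SIZE = 16 * 1024;  // innodb_page_size default
    static final int PAGE_OVERHEAD = 1024;   // headers, page directory (approx.)
    static final int ROW_SIZE = 100;         // assumed average row

    static int rowsPerPage() {
        return (PAGE_SIZE - PAGE_OVERHEAD) / ROW_SIZE;
    }

    static int pagesForRangeScan(int rowCount) {
        // ceiling division: every page read brings in all its rows at once
        return (rowCount + rowsPerPage() - 1) / rowsPerPage();
    }

    public static void main(String[] args) {
        System.out.println("rows/page: " + rowsPerPage());                        // 153
        System.out.println("pages for 10K-row scan: " + pagesForRangeScan(10_000)); // 66
    }
}
```

Under these assumptions a 10,000-row range scan touches only 66 pages, which is the spatial locality the paragraph above describes.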

2. Buffer Pool: Pages, LRU, and the Young/Old Trick

The Buffer Pool is InnoDB's cache. Its size (innodb_buffer_pool_size) should be 70–80% of available RAM on a dedicated database server. The standard advice is well-known; the why is less so: InnoDB uses a midpoint insertion LRU algorithm, not a standard LRU. This matters enormously for Java applications that run full-table scans (reports, batch jobs, admin exports).

-- Check buffer pool utilization
SELECT
    pool_id,
    pool_size,
    free_buffers,
    database_pages,
    old_database_pages,
    modified_database_pages,
    hit_rate,
    young_makes_per_thousand_gets,
    not_young_makes_per_thousand_gets
FROM information_schema.INNODB_BUFFER_POOL_STATS;

-- Buffer pool hit rate — aim for > 99% in production
-- hit_rate = 1000 means 100% hit rate (integer per mille)
-- young_makes high = pages being promoted from old to young sublist
-- If hit_rate is low, increase innodb_buffer_pool_size

The LRU list is split into a "young" (hot, 5/8 of the list by default) and "old" (cold, 3/8) sublist. New pages enter at the midpoint (top of the old sublist). If accessed again within innodb_old_blocks_time milliseconds (1000ms default), they stay in the old sublist; if accessed later, they are promoted to the young sublist. This prevents a full-table scan from evicting your working set: the scan pages enter the old sublist and are evicted quickly, preserving your hot OLTP pages.
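The midpoint-insertion behaviour can be sketched as a toy model (the capacity, page ids, and the boolean promotion flag are invented for illustration; real InnoDB tracks per-page access timestamps against innodb_old_blocks_time):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of midpoint-insertion LRU. New pages enter at the head of the
// "old" sublist; a page re-touched after the old-blocks time window
// (modelled here as a flag) is promoted into the "young" sublist.
// Evictions always come from the cold tail, so a one-pass scan cannot
// push hot pages out.
public class MidpointLru {
    final int capacity;
    final Deque<Integer> young = new ArrayDeque<>(); // hot sublist, MRU at head
    final Deque<Integer> old = new ArrayDeque<>();   // cold sublist

    MidpointLru(int capacity) { this.capacity = capacity; }

    void access(int page, boolean touchedAfterWindow) {
        if (young.remove(page)) { young.addFirst(page); return; }   // already hot
        if (old.remove(page)) {
            if (touchedAfterWindow) young.addFirst(page);           // promote
            else old.addFirst(page);                                // still on probation
            return;
        }
        old.addFirst(page);                                         // midpoint insert
        if (young.size() + old.size() > capacity) old.removeLast(); // evict cold tail
    }

    public static void main(String[] args) {
        MidpointLru lru = new MidpointLru(10);
        // Hot working set: 3 pages, each re-touched after the window
        for (int p = 1; p <= 3; p++) { lru.access(p, false); lru.access(p, true); }
        // Full-table scan: 100 pages touched exactly once
        for (int p = 100; p < 200; p++) lru.access(p, false);
        System.out.println("young after scan: " + lru.young); // hot pages survive
    }
}
```

After the simulated scan, the three hot pages are still in the young sublist; the scan churned only through the old sublist.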

# my.cnf — buffer pool tuning for an 8GB RAM dedicated MySQL server
innodb_buffer_pool_size = 6G         # 75% of RAM
innodb_buffer_pool_instances = 8     # Reduce mutex contention (aim for ~1GB per instance)
innodb_old_blocks_time = 1000        # ms before old block can be promoted
innodb_old_blocks_pct = 37           # 37% of buffer pool for old sublist

# Multiple buffer pool instances significantly reduce contention
# under high-concurrency Spring Boot applications

3. Redo Log (WAL): How InnoDB Survives Crashes

InnoDB uses Write-Ahead Logging (WAL). Every modification to a data page is first written to the redo log (a fixed-size circular file, ib_logfile0 and ib_logfile1) before the modified page is flushed to the tablespace. On crash recovery, InnoDB replays redo log records to bring data files back to a consistent state. This is how the Durability guarantee of ACID is implemented.

# The most important InnoDB durability setting
innodb_flush_log_at_trx_commit = 1   # Default: flush log on EVERY commit (full ACID)
# = 0: flush once per second (risk: up to 1s of data loss on crash)
# = 2: write to OS cache on commit, flush once per second (risk: OS crash loses data)

# Production: always use 1 for financial/critical data
# Use 2 only for read replicas or non-critical data with accepted risk

# Redo log file size — MySQL 8.0.30+ auto-manages; older versions:
innodb_log_file_size = 256M          # Larger = faster writes, longer recovery
innodb_log_buffer_size = 64M         # Buffer before flush; larger = fewer I/Os for big txns

A common Spring Boot performance issue: the developer keeps innodb_flush_log_at_trx_commit=1 (correct for ACID) but commits every insert in its own transaction. Each commit forces an fsync of the redo log, causing an fsync storm under load. The fix is to batch commits — insert 1000 rows per transaction instead of one row per transaction — reducing fsync calls by three orders of magnitude while maintaining the same durability guarantee.

// Spring Batch: tune chunk size to batch commits
@Bean
public Step importProductsStep(JobRepository jobRepository,
        PlatformTransactionManager txManager,
        ItemReader<ProductCsv> reader,
        ItemWriter<Product> writer) {
    return new StepBuilder("importProducts", jobRepository)
        .<ProductCsv, Product>chunk(500, txManager)  // 500 rows per txn = 500 rows per fsync
        .reader(reader)
        .writer(writer)
        .build();
    // At chunk=1, 100K rows = 100K fsyncs ≈ minutes
    // At chunk=500, 100K rows = 200 fsyncs ≈ seconds
}

4. Undo Log: The Foundation of ROLLBACK and MVCC

When InnoDB modifies a row, it writes the before image (the old version) to the undo log. This serves two purposes: enabling ROLLBACK (replay the undo records in reverse to restore the old state) and MVCC (other transactions can read old versions from the undo log instead of being blocked by the writer).

The undo log is stored in the system tablespace (ibdata1) or dedicated undo tablespaces (undo_001, undo_002 — the MySQL 8.0 default). A critical production pitfall: a long-running transaction that holds a consistent read snapshot prevents InnoDB from purging old undo log records, causing undo log bloat. The History List Length (HLL) metric measures this:

-- Monitor undo log health
SHOW ENGINE INNODB STATUS\G
-- Look for: "History list length" in the TRANSACTIONS section
-- > 1000 is a warning; > 10000 is a problem

-- Find long-running transactions causing HLL growth
SELECT trx_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS seconds_running,
       trx_query, trx_state
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 30
ORDER BY trx_started;

Spring Boot Pitfall: A @Transactional(readOnly = true) Spring service method that holds a connection open while doing HTTP calls or slow business logic creates a long-running read transaction, inflating the HLL and slowing down the purge thread. Keep transaction boundaries tight — fetch your data, close the transaction, then do business logic.

5. MVCC Deep Dive: ReadView, trx_id, and Consistent Reads

MySQL InnoDB MVCC - ReadView and Undo Log Version Chain | mdsanwarhossain.me
MySQL InnoDB MVCC and Locking — mdsanwarhossain.me

MVCC (Multi-Version Concurrency Control) is why InnoDB readers don't block writers and writers don't block readers. Each row in InnoDB has two hidden columns: trx_id (the transaction ID that last modified this row) and roll_ptr (a pointer to the previous version in the undo log chain). When a transaction reads a row, InnoDB follows this algorithm:

  1. Take a ReadView snapshot at transaction start (for REPEATABLE_READ) or at each statement (for READ_COMMITTED). The ReadView records the set of active transaction IDs at that moment.
  2. For each row version encountered, check if its trx_id is visible (committed before the ReadView was taken and not in the active set).
  3. If not visible, follow the roll_ptr to the previous version in the undo log and repeat.
  4. Return the first visible version found.
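The four steps above can be sketched in Java; the field names are simplified analogs of InnoDB's internal ReadView fields (upLimit for the smallest active trx_id at snapshot time, lowLimit for the next id to be assigned), not the actual identifiers:

```java
import java.util.Set;

// Simplified model of the ReadView visibility check and the undo-chain walk.
public class ReadViewDemo {
    // A row version carries the trx_id that wrote it and a pointer (roll_ptr)
    // to the previous version in the undo log chain.
    record RowVersion(long trxId, int value, RowVersion rollPtr) {}

    record ReadView(long upLimit, long lowLimit, Set<Long> activeIds, long creatorId) {
        boolean visible(long trxId) {
            if (trxId == creatorId) return true;   // our own changes
            if (trxId < upLimit) return true;      // committed before the snapshot
            if (trxId >= lowLimit) return false;   // started after the snapshot
            return !activeIds.contains(trxId);     // active at snapshot => invisible
        }

        int read(RowVersion v) {                   // walk the undo chain
            while (v != null && !visible(v.trxId())) v = v.rollPtr();
            if (v == null) throw new IllegalStateException("no visible version");
            return v.value();
        }
    }

    public static void main(String[] args) {
        // Chain: trx 90 wrote 1000 (committed), then trx 105 (still active) wrote 500
        RowVersion v1 = new RowVersion(90, 1000, null);
        RowVersion v2 = new RowVersion(105, 500, v1);
        // Reader trx 101 took its snapshot while trx 105 was active
        ReadView rv = new ReadView(100, 110, Set.of(105L), 101);
        System.out.println(rv.read(v2)); // 1000 — skips uncommitted trx 105
    }
}
```

The reader skips the uncommitted version from trx 105 and returns the older committed value, without taking any lock, which is exactly the non-blocking read the SQL demo below shows.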

-- Demonstrate MVCC in action
-- Session 1: REPEATABLE_READ (MySQL default)
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- Returns: 1000

-- Session 2 (concurrent): commits a write
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;

-- Session 1: still sees 1000 — ReadView taken at START TRANSACTION
SELECT balance FROM accounts WHERE id = 1;   -- Still returns: 1000

-- Session 1: only sees 500 after it starts a NEW transaction
COMMIT;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- Now returns: 500

-- READ_COMMITTED: each statement gets a fresh ReadView
-- Session 1 would have seen 500 immediately after Session 2 committed

6. B+Tree Clustered Index: Why InnoDB Hates UUIDs

InnoDB stores all table data in a B+Tree clustered index, ordered by the primary key. This is fundamentally different from PostgreSQL's heap storage. The implication: the primary key determines physical row order. Sequential primary keys (AUTO_INCREMENT, BIGINT) are fast to insert because new rows append to the right side of the B+Tree. Random UUIDs (v4) cause page splits on every insert — the new UUID goes to a random position in the tree, requiring InnoDB to split an existing full page, causing write amplification and buffer pool fragmentation:
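A toy leaf-page model illustrates the effect; the page capacity and split policy are simplifications (a rightmost append leaves the page full and starts a fresh one, mirroring InnoDB's sequential-insert optimization, while any other overflow splits 50/50):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy B+Tree leaf model: sequential keys fill pages completely, random keys
// leave pages roughly half-full after repeated mid-page splits.
public class PageSplitDemo {
    static final int CAP = 100; // keys per leaf page (illustrative)

    static double avgFill(List<Long> keys) {
        List<List<Long>> pages = new ArrayList<>();
        pages.add(new ArrayList<>());
        for (long k : keys) {
            int i = findPage(pages, k);
            List<Long> page = pages.get(i);
            int pos = insertSorted(page, k);
            if (page.size() > CAP) {
                if (pos == page.size() - 1 && i == pages.size() - 1) {
                    // rightmost append: leave the page full, open a new page
                    List<Long> next = new ArrayList<>();
                    next.add(page.remove(page.size() - 1));
                    pages.add(next);
                } else {
                    // mid-page insert: split 50/50, both halves half-empty
                    List<Long> right = new ArrayList<>(page.subList(CAP / 2, page.size()));
                    page.subList(CAP / 2, page.size()).clear();
                    pages.add(i + 1, right);
                }
            }
        }
        return keys.size() / (double) (pages.size() * CAP);
    }

    static int findPage(List<List<Long>> pages, long k) {
        for (int i = pages.size() - 1; i > 0; i--)
            if (!pages.get(i).isEmpty() && pages.get(i).get(0) <= k) return i;
        return 0;
    }

    static int insertSorted(List<Long> page, long k) {
        int pos = 0;
        while (pos < page.size() && page.get(pos) < k) pos++;
        page.add(pos, k);
        return pos;
    }

    public static void main(String[] args) {
        List<Long> seq = new ArrayList<>();
        for (long i = 0; i < 10_000; i++) seq.add(i);          // AUTO_INCREMENT-like
        List<Long> rnd = new ArrayList<>();
        Random r = new Random(42);
        for (int i = 0; i < 10_000; i++) rnd.add(r.nextLong()); // UUIDv4-like
        System.out.printf("sequential fill: %.0f%%%n", 100 * avgFill(seq));
        System.out.printf("random fill:     %.0f%%%n", 100 * avgFill(rnd));
    }
}
```

Random keys end up needing noticeably more pages for the same row count: lower fill factor means more pages to cache, which is the buffer pool fragmentation and write amplification described above.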

-- BAD: UUID v4 primary key (random, causes page splits)
CREATE TABLE orders_bad (
    id CHAR(36) PRIMARY KEY DEFAULT (UUID()),  -- Random UUID causes fragmentation
    user_id BIGINT,
    total DECIMAL(10,2)
);

-- GOOD 1: AUTO_INCREMENT BIGINT (sequential, optimal InnoDB insert performance)
CREATE TABLE orders_good (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT,
    total DECIMAL(10,2)
);

-- GOOD 2: UUID v7 (time-ordered, compatible with UUID format, no page splits)
-- MySQL 8.0 doesn't have UUID_v7 natively — use application-generated UUIDv7
-- or UUID_TO_BIN(UUID(), true) with swap_flag=true for time-ordering
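If you generate UUIDv7 on the application side, a minimal sketch following the RFC 9562 layout (48-bit Unix-millisecond timestamp, then version and variant bits, remainder random) looks like this:

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

// Minimal application-side UUIDv7 generator. The time-ordered prefix keeps
// inserts near the right edge of the B+Tree, avoiding random page splits.
public class UuidV7 {
    private static final SecureRandom RNG = new SecureRandom();

    public static UUID generate() {
        long millis = Instant.now().toEpochMilli();
        long msb = (millis << 16)              // 48-bit millisecond timestamp
                 | 0x7000L                     // version 7 in bits 12-15
                 | (RNG.nextLong() & 0x0FFFL); // 12 random bits
        long lsb = (RNG.nextLong() & 0x3FFFFFFFFFFFFFFFL)
                 | 0x8000000000000000L;        // RFC 4122 variant bits "10"
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID a = generate();
        System.out.println(a + " version=" + a.version()); // version=7
    }
}
```

UUIDs generated in later milliseconds compare greater than earlier ones, so inserts stay append-mostly like AUTO_INCREMENT while keeping the standard UUID format.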

// Spring Boot JPA entity: prefer @GeneratedValue(strategy = IDENTITY) with a BIGINT key
@Entity
@Table(name = "orders")
public class Order {

    // AUTO_INCREMENT — optimal for InnoDB insert performance
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // If UUID is required for public API (security through obscurity):
    @Column(unique = true, nullable = false)
    private UUID externalId = UUID.randomUUID(); // Only for public exposure

    // Keep primary key as BIGINT for DB-level operations
}

7. Locking: Record Locks, Gap Locks, and Next-Key Locks

InnoDB's locking is more nuanced than most developers expect. Understanding it prevents deadlocks and explains why queries that "should be fast" sometimes wait:

Record Lock: a single index row. Applied by UPDATE/DELETE on a specific row and by SELECT ... FOR UPDATE.
Gap Lock: the space between index entries. Applied to range queries under REPEATABLE_READ; prevents phantom inserts.
Next-Key Lock: a record lock plus the gap before it. The default under REPEATABLE_READ; prevents phantom reads.
Insert Intention Lock: the gap at the insert position. Taken before inserting; compatible with other insert intention locks.

-- Deadlock example: concurrent insert to the same gap
-- Both sessions try to insert into a gap protected by a gap lock
-- Session 1: holds gap lock (10, 20), waits for Session 2's gap lock
-- Session 2: holds gap lock (10, 20), waits for Session 1's gap lock
-- → InnoDB detects cycle, kills the transaction with fewer undo log records

-- Prevention: reduce isolation level to READ_COMMITTED for non-financial operations
-- (disables gap locking, only record locks remain)
SET TRANSACTION ISOLATION LEVEL READ_COMMITTED;

-- Or: use SELECT ... FOR UPDATE to acquire locks upfront in a consistent order
-- Consistent lock ordering prevents deadlocks by eliminating circular waits
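The same consistent-ordering idea applies to application-level locks in the JVM; here is a minimal sketch with ReentrantLock (the account ids, balances map, and transfer API are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Deadlock-free transfers via consistent lock ordering: every caller acquires
// the lower account id first, so no two threads can ever hold the two locks
// in opposite order — the circular wait is impossible by construction.
public class LockOrdering {
    static final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
    static final Map<Long, Long> balances = new ConcurrentHashMap<>();

    static ReentrantLock lockFor(long id) {
        return locks.computeIfAbsent(id, k -> new ReentrantLock());
    }

    static void transfer(long from, long to, long amount) {
        long first = Math.min(from, to), second = Math.max(from, to);
        lockFor(first).lock();               // same global order for every caller
        try {
            lockFor(second).lock();
            try {
                balances.merge(from, -amount, Long::sum);
                balances.merge(to, amount, Long::sum);
            } finally { lockFor(second).unlock(); }
        } finally { lockFor(first).unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        balances.put(1L, 1000L); balances.put(2L, 1000L);
        // Opposite-direction transfers: would deadlock with naive lock order
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(1, 2, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(2, 1, 1); });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(balances); // both accounts net back to 1000, no deadlock
    }
}
```

With SQL, the equivalent is always locking rows in primary-key order (e.g. SELECT ... WHERE id IN (...) ORDER BY id FOR UPDATE) before updating them.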

8. Java Developer Tuning Guide: Key InnoDB Variables

innodb_buffer_pool_size: default 128M; set to 70-80% of RAM. Governs hit rate and read performance.
innodb_buffer_pool_instances: default 8 (1 if the pool is under 1GB); aim for roughly one instance per GB. Reduces mutex contention under concurrency.
innodb_flush_log_at_trx_commit: default 1; keep 1 for ACID, use 2 only on replicas. Trades durability against write throughput.
innodb_io_capacity: default 200; set 2000-20000 on SSDs. Governs the flush rate of dirty pages.
innodb_read_io_threads: default 4; set 8-16. Enables parallel read I/O for large datasets.
max_connections: default 151; size to match HikariCP pool × number of service instances. Prevents connection exhaustion.

HikariCP + InnoDB: Set spring.datasource.hikari.maximum-pool-size to number_of_CPU_cores × 2 for OLTP workloads, not an arbitrarily large number. More connections than InnoDB can service concurrently causes thread scheduling overhead that hurts performance.
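As a sketch of that sizing rule (the property key is the standard Spring Boot HikariCP name; the multiplier is the rule of thumb above, not a hard law):

```java
// Derive the HikariCP pool size from CPU cores rather than picking an
// arbitrary large number. In containers, availableProcessors() reflects
// the CPU limit, so set limits deliberately.
public class PoolSizing {
    static int recommendedPoolSize(int cpuCores) {
        return cpuCores * 2; // OLTP rule of thumb from the text
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("spring.datasource.hikari.maximum-pool-size="
                + recommendedPoolSize(cores));
    }
}
```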

9. Key Takeaways

  1. Size innodb_buffer_pool_size to 70-80% of RAM on a dedicated server and watch the hit rate; the midpoint LRU keeps full-table scans from evicting your working set.
  2. Keep innodb_flush_log_at_trx_commit = 1 for critical data, and batch commits (e.g. Spring Batch chunks) to avoid redo log fsync storms.
  3. Keep transactions short: a long-running snapshot inflates the History List Length and stalls undo purge.
  4. MVCC serves consistent reads from undo log version chains via the ReadView, so readers and writers do not block each other.
  5. Prefer sequential primary keys (AUTO_INCREMENT or time-ordered UUIDv7) over random UUIDv4 to avoid page splits and fragmentation.
  6. Under REPEATABLE_READ, gap and next-key locks prevent phantoms but can deadlock; acquire locks in a consistent order.

Questions about InnoDB internals, buffer pool tuning, or MVCC behaviour in your Spring Boot application? Share below.