Skip to main content

Command Palette

Search for a command to run...

How PostgreSQL VACUUM Works: Dead Tuples, FSM/VM, and Locking

Updated
5 min read
K

Software Engineer - Backend

PostgreSQL uses MVCC (Multi-Version Concurrency Control).
Each UPDATE inserts a new row version (xmin = current XID) and marks the old one dead (xmax = current XID).
Each DELETE marks the row dead by setting xmax.

Dead tuples remain on disk, still referenced by their line pointers.
If nothing cleans them up:

  • Tables and indexes bloat with dead tuples.

  • Queries waste CPU checking and skipping invisible rows.

  • The 32-bit transaction ID counter eventually wraps around, corrupting visibility rules.

VACUUM is the cleanup process. It removes dead tuples, reclaims space inside pages, and protects against XID wraparound.


What VACUUM does

When you run normal VACUUM (or autovacuum runs), it performs three tasks.

1. Heap pages and line pointers

  • Each heap page has a line pointer array pointing to tuples.

  • Normal DML never deletes these pointers. UPDATE/DELETE only set xmin/xmax.

  • VACUUM checks each tuple:

    • If dead (not visible to any transaction), VACUUM marks the line pointer unused so it can be reused for new tuples.

Effect: queries skip dead tuples without extra visibility checks.
Normal VACUUM does not move live tuples around within pages. Only VACUUM FULL rewrites and compacts the table.

2. Free Space Map (FSM)

  • FSM is a per-table file (relfilenode_fsm).

  • It tracks free space per page.

  • VACUUM updates the FSM after reclaiming space.

  • Future inserts/updates can quickly find pages with enough space.

3. Visibility Map (VM)

  • VM is another per-table file (relfilenode_vm).

  • It tracks whether each page is:

    • all-visible → every tuple on the page is visible to all current and future transactions.
      (Tuples still have their original xmin, but since the creating transactions are long committed and no deletes/updates are pending, the system knows they are visible in every snapshot. This flag lets index-only scans skip heap lookups for visibility checks, since visibility is guaranteed at the page level.)

    • all-frozen → all tuples have been “frozen” (their xmin replaced with a special FrozenTransactionId).
      (This makes them permanently visible without needing any XID checks, and it protects against transaction ID wraparound.
      An all-frozen page is also all-visible by definition, but not all all-visible pages are frozen — they may still carry unfrozen XIDs that will eventually need freezing.)

Why this matters

  • Index-only scans: Normally, an index lookup must still visit the heap to confirm tuple visibility (xmin/xmax vs snapshot). If a page is marked all-visible, PostgreSQL skips the heap lookup and trusts the VM, making index-only scans efficient.

  • Autovacuum scheduling: Pages marked all-visible don’t need re-scanning for dead tuples, and pages marked all-frozen don’t need re-scanning for XID wraparound. This allows autovacuum to focus only on pages that require cleanup.


Normal VACUUM vs VACUUM FULL

Normal VACUUM

  • Marks dead tuples’ line pointers unused.

  • Updates FSM and VM.

  • Does not shrink the file on disk. Empty space stays inside existing pages.

  • Sequential scans still read the same number of pages (I/O unchanged).
    But they become cheaper in CPU terms, because VACUUM marks dead tuples as unused.
    This lets scans skip those line pointers entirely instead of performing visibility checks on each dead tuple.

  • Lock: SHARE UPDATE EXCLUSIVE → allows SELECT, INSERT, UPDATE, DELETE. Blocks only schema changes that need stronger locks.

VACUUM FULL

  • Rewrites the entire table:

    • Creates a new file.

    • Copies only live tuples, tightly packed.

    • Replaces the old file.

  • Shrinks file size; disk space is returned to the OS.

  • Sequential scans speed up because fewer pages exist.

  • Lock: ACCESS EXCLUSIVE → blocks all reads, writes, and schema changes until it finishes.

  • Needs enough disk space to hold both old and new copies during rewrite.

VACUUM ANALYZE

VACUUM ANALYZE runs a normal VACUUM followed by ANALYZE.

  • ANALYZE → scans the table to collect statistics about column values (e.g., distribution, distinct counts, histograms).
    These statistics are stored in the system catalogs.
    The query planner uses them to choose efficient execution plans (index vs sequential scan, join methods, etc.).

  • Use VACUUM ANALYZE after bulk INSERT/UPDATE/DELETE or large data loads, so the table is cleaned and has fresh planner statistics right away.

  • Autovacuum will eventually perform both VACUUM and ANALYZE automatically, but running VACUUM ANALYZE ensures immediate cleanup and updated stats without waiting for autovacuum thresholds.


Autovacuum

Autovacuum is a background daemon that automates both VACUUM and ANALYZE.
It wakes up periodically (every autovacuum_naptime, default 1 minute) and checks tables against thresholds.

  • When it runs VACUUM
    Triggered when estimated dead tuples exceed:

      autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
    

    Defaults: threshold = 50 + 0.2 × number of live rows.

  • When it runs ANALYZE
    Triggered when row changes (INSERT/UPDATE/DELETE) exceed:

      autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
    

    Defaults: threshold = 50 + 0.1 × number of live rows.

  • XID wraparound protection
    Even if thresholds aren’t met, autovacuum will force a VACUUM to prevent transaction ID wraparound.

  • Limitations
    Autovacuum never runs VACUUM FULL.
    It reclaims space for reuse inside tables and updates planner stats, but does not shrink files on disk.

Disabling autovacuum is unsafe in production — tables will bloat and risk XID wraparound failure.


When to use VACUUM FULL

Use VACUUM FULL only when:

  • A massive DELETE or UPDATE leaves most of a table empty.

  • You must return disk space to the OS.

  • Sequential scans are slowed by bloat (many mostly-empty pages).

If the table will refill with new data soon, VACUUM FULL is not needed.
Normal VACUUM is enough, since free space will be reused.

Alternatives:

  • TRUNCATE (if all rows can be dropped).

  • CLUSTER (rewrite + reorder).

  • pg_repack (online rewrite without blocking).


Locking modes

  • Normal VACUUM → SHARE UPDATE EXCLUSIVE

    • Allows SELECT, INSERT, UPDATE, DELETE.

    • Blocks stronger operations like VACUUM FULL, CLUSTER, or structural ALTER TABLE.

  • VACUUM FULL → ACCESS EXCLUSIVE

    • Strongest lock.

    • Blocks all reads, writes, and schema changes until completion.


Plain summary

PostgreSQL doesn’t overwrite rows — it leaves behind dead tuples.

  • Normal VACUUM

    • Cleans dead tuples’ line pointers.

    • Updates FSM/VM.

    • Frees internal space but file size stays the same.

    • Makes scans cheaper (less CPU) but not smaller (same I/O).

  • VACUUM FULL

    • Rewrites the table, removes bloat, shrinks file size.

    • Sequential scans are faster (fewer pages).

    • Blocks all access while running.

    • Only necessary to shrink disk usage or fix severe bloat.

  • Autovacuum

    • Background daemon that runs normal VACUUM automatically.

👉 Together, VACUUM + FSM + VM + line pointer cleanup keep PostgreSQL tables healthy, performant, and safe from transaction ID wraparound.

More from this blog

Backend Software Engineering with Krishna Kumar Mahto

21 posts

Backend Software Engineer, focused on Java-based backend applications, PostgreSQL in databases, Kafka/ActiveMQ/RabbitMQ in messaging.