[Proposal] pg_stat_wal_records – per-record-type WAL generation statistics - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
Date
Msg-id CAHg+QDcE2zrcQqA2aE2SNxVnPNHG5WkOeDve8G2+gLT20_G3JA@mail.gmail.com
Responses Re: [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
List pgsql-hackers
Hi Hackers,

I'd like to propose a new system view, pg_stat_wal_records, that exposes per-resource-manager, per-record-type WAL generation counts.

Sample Output:
postgres=# SELECT * FROM pg_stat_wal_records ORDER BY count DESC LIMIT 10;
 resource_manager |  record_type   | count  |          stats_reset
------------------+----------------+--------+-------------------------------
 Heap             | INSERT         | 500000 | 2026-03-26 22:15:00.12345+00
 Transaction      | COMMIT         | 500000 | 2026-03-26 22:15:00.12345+00
 Btree            | INSERT_LEAF    |  53821 | 2026-03-26 22:15:00.12345+00
 Heap             | HOT_UPDATE     |  12744 | 2026-03-26 22:15:00.12345+00
 XLOG             | FPI            |   8923 | 2026-03-26 22:15:00.12345+00

The Gap:

PostgreSQL already has pg_stat_wal for aggregate WAL volume (bytes, full-page images, buffers), and pg_walinspect (superuser access required) for post-hoc forensic analysis of individual WAL records. But there is no lightweight observability tool that answers, in real time, which record types are responsible for the WAL volume. pg_walinspect also reads on-disk WAL files, which is expensive; the proposed view is cheap enough for monitoring systems to poll continuously.

Use cases:
WAL volume investigation: see which record types dominate WAL generation in real time without touching disk.
Monitoring integration: Prometheus/Grafana can poll the view to track WAL composition over time and alert on anomalies.
Replication tuning: identify whether WAL volume is dominated by data changes, index maintenance, FPIs, or vacuum activity to guide tuning.
Extension debugging: custom WAL resource managers get visibility automatically.

Key design decisions
Counting mechanism:
The counting mechanism is a single backend-local array increment in XLogInsert():
pgstat_pending_wal_records[rmid][(info >> 4) & 0x0F]++;

This indexes into a uint64[256][16] array (32 KB per backend) using the rmgr ID and the 4-bit record-type subfield of the WAL record's info byte. Counters are flushed to shared memory via the standard pgstat infrastructure.

Per-backend pending array instead of direct shared-memory writes:
Each increment touches only backend-local memory, and the existing pgstat flush cycle aggregates the counters into shared memory, so I don't expect any contention in the hot path (please see the perf results below).

Fixed 256×16 matrix:
All 256 possible rmgr IDs × 16 possible record types. This accommodates the core resource managers and any custom WAL resource managers from extensions without configuration, at a modest 32 KB per-backend cost.

rm_identify() for human-readable names:
The SRF calls each resource manager's rm_identify callback to translate the info byte into a readable record-type name (for example INSERT, COMMIT, VACUUM, HOT_UPDATE).

Reset support:
Added via pg_stat_reset_shared('wal_records'), consistent with the existing pattern for wal, bgwriter, archiver, etc.

The view skips zero-count entries, keeping output clean.

Performance overhead
Benchmarked with pgbench (scale 50, 16 clients, 16 threads, 30s, synchronous_commit=off) on a 64-vCPU machine with data and WAL on NVMe:

 Configuration  | Avg TPS
----------------+---------
 With patch     |  42,266
 Without patch  |  42,053
The overhead is within measurement noise (~0.5%). The increment hits a backend-local, L1-hot array and is dwarfed by XLogInsert's existing CRC, locking, and memcpy work.

Attached is a draft patch; please share your thoughts.


Thanks,
Satya
