Hi Hackers,
I'd like to propose a new system view, pg_stat_wal_records, that exposes per-resource-manager, per-record-type WAL generation counts.
Sample Output:
postgres=# SELECT * FROM pg_stat_wal_records ORDER BY count DESC LIMIT 10;
 resource_manager |  record_type  | count  |         stats_reset
------------------+---------------+--------+-------------------------------
 Heap             | INSERT        | 500000 | 2026-03-26 22:15:00.12345+00
 Transaction      | COMMIT        | 500000 |
 Btree             | INSERT_LEAF   |  53821 |
 Heap             | HOT_UPDATE    |  12744 |
 XLOG             | FPI           |   8923 |
The Gap:
PostgreSQL already has pg_stat_wal for aggregate WAL volume (bytes, full-page images, buffers), and pg_walinspect (superuser access required) for post-hoc forensic analysis of individual WAL records. But there is no lightweight observability tool that answers, in real time, which record types are responsible for the WAL volume. Additionally, pg_walinspect reads on-disk WAL files, which is expensive, whereas this view can be polled cheaply by monitoring systems.
Use cases:
WAL volume investigation: see which record types dominate WAL generation in real time without touching disk.
Monitoring integration: Prometheus/Grafana can poll the view to track WAL composition over time and alert on anomalies.
Replication tuning: identify whether WAL volume is dominated by data changes, index maintenance, FPIs, or vacuum activity to guide tuning.
Extension debugging: custom WAL resource managers get visibility automatically.
Key design decisions
Counting mechanism:
The counting mechanism is a single backend-local array increment in XLogInsert():
pgstat_pending_wal_records[rmid][(info >> 4) & 0x0F]++;
This indexes into a uint64[256][16] array (32 KB per backend) using the rmgr ID and the 4-bit record-type subfield of the WAL info byte. Counters are flushed to shared memory via the standard pgstat infrastructure.
I am using a per-backend pending array instead of direct shared-memory writes: the counter is incremented in backend-local memory and flushed to shared memory by the existing pgstat flush cycle, so there is no contention in the hot path (please see the perf results below).
Fixed 256×16 matrix:
All 256 possible rmgr IDs × 16 possible record types. This accommodates core resource managers and any custom WAL resource managers from extensions without configuration. The 32 KB per-backend cost is modest.
Human-readable names via rm_identify():
The SRF calls each resource manager's rm_identify callback to translate the info byte into a readable record-type name (for example INSERT, COMMIT, VACUUM, HOT_UPDATE).
Reset support:
Added via pg_stat_reset_shared('wal_records'), consistent with the existing pattern for wal, bgwriter, archiver, etc.
View skips zero-count entries, keeping output clean.
Performance overhead
Benchmarked with pgbench (scale 50, 16 clients, 16 threads, 30s, synchronous_commit=off) on 64 vCPU machine with data and WAL on NVMe:
 Configuration  | Avg TPS
----------------+---------
 With patch     |  42,266
 Without patch  |  42,053
The overhead is within measurement noise (~0.5%). The increment hits a backend-local, L1-hot array and is dwarfed by XLogInsert's existing CRC, locking, and memcpy work.
Attached is a draft patch; please share your thoughts.
Thanks,
Satya