[Proposal] pg_stat_wal_records – per-record-type WAL generation statistics - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
Date
Msg-id CAHg+QDcE2zrcQqA2aE2SNxVnPNHG5WkOeDve8G2+gLT20_G3JA@mail.gmail.com
Responses Re: [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
List pgsql-hackers
Hi Hackers,

I'd like to propose a new system view, pg_stat_wal_records, that exposes per-resource-manager, per-record-type WAL generation counts.

Sample Output:
postgres=# SELECT * FROM pg_stat_wal_records ORDER BY count DESC LIMIT 10;
 resource_manager |  record_type   | count  |          stats_reset
------------------+----------------+--------+-------------------------------
 Heap             | INSERT         | 500000 | 2026-03-26 22:15:00.12345+00
 Transaction      | COMMIT         | 500000 | 2026-03-26 22:15:00.12345+00
 Btree            | INSERT_LEAF    |  53821 | 2026-03-26 22:15:00.12345+00
 Heap             | HOT_UPDATE     |  12744 | 2026-03-26 22:15:00.12345+00
 XLOG             | FPI            |   8923 | 2026-03-26 22:15:00.12345+00

The Gap:

PostgreSQL already has pg_stat_wal for aggregate WAL volume (bytes, full-page images, buffers), and pg_walinspect (superuser access required) for post-hoc forensic analysis of individual WAL records. But there is no lightweight observability tool that answers, in real time, which record types are responsible for the WAL volume. pg_walinspect also reads on-disk WAL files, which is expensive; the proposed view is cheap enough for monitoring systems to poll continuously.

Use cases:
WAL volume investigation: see which record types dominate WAL generation in real time without touching disk.
Monitoring integration: Prometheus/Grafana can poll the view to track WAL composition over time and alert on anomalies.
Replication tuning: identify whether WAL volume is dominated by data changes, index maintenance, FPIs, or vacuum activity to guide tuning.
Extension debugging: custom WAL resource managers get visibility automatically.

Key design decisions
Counting mechanism:
The counting mechanism is a single backend-local array increment in XLogInsert():
pgstat_pending_wal_records[rmid][(info >> 4) & 0x0F]++;

This indexes into a uint64[256][16] array (32 KB per backend) using the rmgr ID and the 4-bit record-type subfield of the WAL record's info byte. Counters are flushed to shared memory via the standard pgstat infrastructure.

Per-backend pending array instead of direct shared-memory writes:
Each increment touches only backend-local memory, and the existing pgstat flush cycle aggregates the counters into shared memory, so I don't expect any contention in the hot path (please see the perf results below).

Fixed 256×16 matrix:
All 256 possible rmgr IDs × 16 possible record types. This accommodates the core resource managers and any custom WAL resource managers from extensions without configuration, at a modest 32 KB per-backend cost.

rm_identify() for human-readable names:
The SRF calls each resource manager's rm_identify callback to translate the info byte into a readable record-type name (for example INSERT, COMMIT, VACUUM, HOT_UPDATE).

Reset support:
Added via pg_stat_reset_shared('wal_records'), consistent with the existing pattern for wal, bgwriter, archiver, etc.

The view skips zero-count entries, keeping output clean.

Performance overhead
Benchmarked with pgbench (scale 50, 16 clients, 16 threads, 30s, synchronous_commit=off) on a 64-vCPU machine with data and WAL on NVMe:

 Configuration  | Avg TPS
----------------+---------
 With patch     |  42,266
 Without patch  |  42,053
The overhead is within measurement noise (~0.5%). The increment hits a backend-local, L1-hot array and is dwarfed by XLogInsert's existing CRC, locking, and memcpy work.

Attached is a draft patch; please share your thoughts.


Thanks,
Satya
