Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction - Mailing list pgsql-bugs

From Xuneng Zhou
Subject Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
Date
Msg-id CABPTF7Wb6VOac1EFopGzhAtiHOU8_XRkFvhC6BnSj=jACBB+Gw@mail.gmail.com
Whole thread
In response to Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
List pgsql-bugs
On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi klemen,
>
> Thanks for the report.
>
> On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      19439
> > Logged by:          klemen kobau
> > Email address:      klemen.kobau@gmail.com
> > PostgreSQL version: 18.0
> > Operating system:   Linux (EndeavorOS)
> > Description:
> >
> > I am running postgres:18.0 in a docker container, the configuration is as
> > follows:
> >
> >   postgres:
> >     image: postgres:18.0
> >     command: [
> >       "postgres",
> >       "-N", "200",
> >       "-c", "max_prepared_transactions=100",
> >       "-c", "wal_level=logical",
> >       "-c", "max_wal_senders=10",
> >       "-c", "max_replication_slots=20",
> >       "-c", "wal_keep_size=10",
> >       "-c", "max_slot_wal_keep_size=1024"
> >     ]
> >     environment:
> >       POSTGRES_USER: postgres
> >       POSTGRES_PASSWORD: postgres
> >       POSTGRES_DB: postgres
> >       TZ: UTC
> >       PGTZ: UTC
> >     ports:
> >       - 5432:5432
> >     volumes:
> >       - postgres_data:/var/lib/postgresql
> >
> > I use psql version 18.3.
> >
> > I run the following:
> >
> > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > -- Transaction 1: insert 1 row, check stats, commit
> > BEGIN;
> > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> > INSERT INTO temp.xact_test (val) VALUES ('a');
> > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > 'xact_test';
> > COMMIT;
> >
> > -- Transaction 2: insert 1 row, check stats
> > BEGIN;
> > INSERT INTO temp.xact_test (val) VALUES ('b');
> > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > 'xact_test';
> > ROLLBACK;
> >
> > -- Cleanup
> > DROP TABLE temp.xact_test;
> > "
> >
> > And the output is
> >
> > BEGIN
> > CREATE TABLE
> > INSERT 0 1
> >   relname  | n_tup_ins
> > -----------+-----------
> >  xact_test |         1
> > (1 row)
> >
> > COMMIT
> > BEGIN
> > INSERT 0 1
> >   relname  | n_tup_ins
> > -----------+-----------
> >  xact_test |         2
> > (1 row)
> >
> > ROLLBACK
> > DROP TABLE
>
> I can also reproduce this behavior on HEAD.
>
> > This is not what I would expect from reading
> > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> > where it states
> >
> > > A transaction can also see its own statistics (not yet flushed out to the
> > shared memory statistics) in the views pg_stat_xact_all_tables,
> > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > pg_stat_xact_user_functions. These numbers do not act as stated above;
> > instead they update continuously throughout the transaction.
> >
> > based on this, I would expect that the numbers are updated each time an
> > insert happens and that they are isolated per transaction.
> >
>
> This snippet of the doc feels somewhat ambiguous. The current behavior
> does not seem fully consistent with wording such as “throughout the
> transaction.” It also seems more appropriate for these table views to
> reflect stats within their own scopes, rather than cumulative values
> spanning txn boundaries.
>

I’ve looked into this issue and 'd like to propose a patch to address
it. Feedback is very welcome.

--- Root cause

Since the stats subsystem was rewritten (commit 5891c7a8), each
backend keeps a hash of pending stats entries that persist across
transaction boundaries. Entries that flush successfully are deleted
from the backend-local pending list; entries not flushed remain
pending with their accumulated counters. Flushing is driven by
pgstat_report_stat(), called from the ReadyForQuery path, and subject
to a rate limiter.

The pg_stat_xact_* views read these pending entries directly via
find_tabstat_entry() / find_funcstat_entry(). Both functions return
the raw accumulated counters without per-transaction scoping.

The bug is deterministic when multiple top-level transactions are
processed within a single simple-query message, because there is no
ReadyForQuery boundary between the transactions and therefore no
opportunity to flush and remove the pending entry.

--- Fix

The patch introduces a per-entry "transaction baseline" — a lazy
snapshot of the counters taken the first time an entry is touched in
each new top-level transaction. The accessor functions
(find_tabstat_entry, find_funcstat_entry) then subtract the baseline
from the current counters, yielding only the current transaction's
delta.

The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
pgstat_ensure_xact_baseline(), is called at every nontransactional
counter-increment site (the 7 event-counter macros in pgstat.h and the
4 non-inline counter functions in pgstat_relation.c). After the first
call per entry per transaction, the check reduces to a single integer
comparison with a predictably not-taken branch.

For function stats, a new PgStat_FunctionPending wrapper struct embeds
PgStat_FunctionCounts at offset 0 with the baseline fields appended,
so the flush callback requires only a trivial cast change.

--- Testing

The regression test is a TAP test under src/bin/psql/t/ that uses psql
-c to send multi-statement strings as single simple-query messages.
src/bin/psql/t look like the right existing harness for the primary
regression because psql -c sends the whole multi-statement string as
one simple-query message via simple_action_list dispatch. By contrast,
ordinary pg_regress SQL files are executed by psql in file/stdin mode,
which runs through MainLoop() and dispatches top-level statements one
at a time via SendQuery(), creating ReadyForQuery boundaries that
could mask the bug.

The test covers three scenarios: table n_tup_ins and seq_scan
counters, and function calls — each verifying isolation across
consecutive top-level transactions within a single message.


--
Best,
Xuneng

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #19438: segfault with temp_file_limit inside cursor
Next
From: Thomas Munro
Date:
Subject: Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free