Re: Add sub-transaction overflow status in pg_stat_activity - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Add sub-transaction overflow status in pg_stat_activity
Date
Msg-id 20221123205654.oi2mve7akpycnj6m@awork3.anarazel.de
Whole thread Raw
In response to Re: Add sub-transaction overflow status in pg_stat_activity  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add sub-transaction overflow status in pg_stat_activity
Re: Add sub-transaction overflow status in pg_stat_activity
List pgsql-hackers
Hi,

On 2022-11-23 15:25:39 -0500, Robert Haas wrote:
> One thing that I'd really like to see better documented is exactly
> what it is that causes a problem. But first we'd have to understand it
> ourselves. It's not as simple as "if you have more than 64 subxacts in
> any top-level xact, kiss performance good-bye!" because for there to
> be a problem, at least one backend (and probably many) have to take
> snapshots that include that see that overflowed subxact cache and thus
> get marked suboverflowed. Then after that, those snapshots have to be
> used often enough that the additional visibility-checking cost becomes
> a problem. But it's also not good enough to just use those snapshots
> against any old tuples, because tuples that are older than the
> snapshot's xmin aren't going to cause additional lookups, nor are
> tuples newer than the snapshot's xmax.

Indeed. This is why I was thinking that just alerting for overflowed xact
isn't particularly helpful. You really want to see how much they overflow and
how often.

But even that might not be that helpful. Perhaps what we actually need is an
aggregate measure showing the time spent doing subxact lookups due to
overflowed snapshots? Seeing a substantial amount of time spent doing subxact
lookups would be much more accurate call to action than seeing a that some
sessions have a lot of subxacts.


> Intuitively, I feel like this should be pretty rare, and largely
> avoidable if you just don't use long-running transactions, which is a
> good thing to avoid for other reasons anyway.

I think they're just not always avoidable, even in a very well operated
system.


I wonder if we could lower the impact of suboverflowed snapshots by improving
the representation in PGPROC and SnapshotData. What if we

a) Recorded the min and max assigned subxid in PGPROC

b) Instead of giving up in GetSnapshotData() once we see a suboverflowed
   PGPROC, store the min/max subxid of the proc in SnapshotData. We could
   reliably "steal" space for that from ->subxip, as we won't need to store
   subxids for that proc.

c) When determining visibility with a suboverflowed snapshot we use the
   ranges from b) to check whether we need to do a subtrans lookup. I think
   that'll often prevent subtrans lookups.

d) If we encounter a subxid whose parent is in progress and not in ->subxid,
   and subxcnt isn't the max, add that subxid to subxip. That's not free
   because we'd basically need to do an insertion sort, but likely still a lot
   cheaper than doing repeated subtrans lookups.

I think we'd just need a one or two additional fields in SnapshotData.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Add LZ4 compression in pg_dump
Next
From: "David G. Johnston"
Date:
Subject: Re: fixing CREATEROLE