Re: Add sub-transaction overflow status in pg_stat_activity - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Add sub-transaction overflow status in pg_stat_activity |
Date | |
Msg-id | 20221123205654.oi2mve7akpycnj6m@awork3.anarazel.de Whole thread Raw |
In response to | Re: Add sub-transaction overflow status in pg_stat_activity (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Add sub-transaction overflow status in pg_stat_activity
Re: Add sub-transaction overflow status in pg_stat_activity |
List | pgsql-hackers |
Hi, On 2022-11-23 15:25:39 -0500, Robert Haas wrote: > One thing that I'd really like to see better documented is exactly > what it is that causes a problem. But first we'd have to understand it > ourselves. It's not as simple as "if you have more than 64 subxacts in > any top-level xact, kiss performance good-bye!" because for there to > be a problem, at least one backend (and probably many) have to take > snapshots that include that see that overflowed subxact cache and thus > get marked suboverflowed. Then after that, those snapshots have to be > used often enough that the additional visibility-checking cost becomes > a problem. But it's also not good enough to just use those snapshots > against any old tuples, because tuples that are older than the > snapshot's xmin aren't going to cause additional lookups, nor are > tuples newer than the snapshot's xmax. Indeed. This is why I was thinking that just alerting for overflowed xact isn't particularly helpful. You really want to see how much they overflow and how often. But even that might not be that helpful. Perhaps what we actually need is an aggregate measure showing the time spent doing subxact lookups due to overflowed snapshots? Seeing a substantial amount of time spent doing subxact lookups would be much more accurate call to action than seeing a that some sessions have a lot of subxacts. > Intuitively, I feel like this should be pretty rare, and largely > avoidable if you just don't use long-running transactions, which is a > good thing to avoid for other reasons anyway. I think they're just not always avoidable, even in a very well operated system. I wonder if we could lower the impact of suboverflowed snapshots by improving the representation in PGPROC and SnapshotData. What if we a) Recorded the min and max assigned subxid in PGPROC b) Instead of giving up in GetSnapshotData() once we see a suboverflowed PGPROC, store the min/max subxid of the proc in SnapshotData. We could reliably "steal" space for that from ->subxip, as we won't need to store subxids for that proc. c) When determining visibility with a suboverflowed snapshot we use the ranges from b) to check whether we need to do a subtrans lookup. I think that'll often prevent subtrans lookups. d) If we encounter a subxid whose parent is in progress and not in ->subxid, and subxcnt isn't the max, add that subxid to subxip. That's not free because we'd basically need to do an insertion sort, but likely still a lot cheaper than doing repeated subtrans lookups. I think we'd just need a one or two additional fields in SnapshotData. Greetings, Andres Freund
pgsql-hackers by date: