Re: Add sub-transaction overflow status in pg_stat_activity - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Add sub-transaction overflow status in pg_stat_activity |
Date | |
Msg-id | CA+Tgmoa1NpWkLzAsiyyZUnicpvd9Eq-FLYHZqvgUjei2tiAycw@mail.gmail.com Whole thread Raw |
In response to | Re: Add sub-transaction overflow status in pg_stat_activity (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Add sub-transaction overflow status in pg_stat_activity
|
List | pgsql-hackers |
On Wed, Nov 23, 2022 at 2:01 PM Bruce Momjian <bruce@momjian.us> wrote: > I originally thought having this value in pg_stat_activity was overkill, > but seeing the other internal/warning columns in that view, I think it > makes sense. Oddly, is our 64 snapshot performance limit even > documented anywhere? I know it is in Simon's patch I am working on. If it is, I'm not aware of it. We often don't document things that are as internal as that. One thing that I'd really like to see better documented is exactly what it is that causes a problem. But first we'd have to understand it ourselves. It's not as simple as "if you have more than 64 subxacts in any top-level xact, kiss performance good-bye!" because for there to be a problem, at least one backend (and probably many) have to take snapshots that include that see that overflowed subxact cache and thus get marked suboverflowed. Then after that, those snapshots have to be used often enough that the additional visibility-checking cost becomes a problem. But it's also not good enough to just use those snapshots against any old tuples, because tuples that are older than the snapshot's xmin aren't going to cause additional lookups, nor are tuples newer than the snapshot's xmax. So it feels a bit complicated to me to think through the workload where this really hurts. What I'm imagining is that you need a relatively long-running transaction that overflows its subxact limitation but then doesn't commit, so that lots of other backends get overflowed snapshots, and also so that the xmin and xmax of the snapshots being taken get further apart. Or maybe you can have a series short-running transactions that each overflow their subxact cache briefly, but they overlap, so that there's usually at least 1 around in that state, but in that case I think you need a separate long-running transaction to push xmin and xmax further apart. Either way, the backends that get the overflowed snapshots then need to go look at some table data that's been recently modified, so that there are xmin and xmax values newer than the snapshot's xmin. Intuitively, I feel like this should be pretty rare, and largely avoidable if you just don't use long-running transactions, which is a good thing to avoid for other reasons anyway. But there may be more to it than I'm realizing, because I've seen customers hit this issue multiple times. I wonder whether there's some subtlety to the triggering conditions that I'm not fully understanding. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: