On Mon, Nov 14, 2022 at 12:47 PM Andres Freund <andres@anarazel.de> wrote: > I'd go the other way. It's pretty unimportant whether it overflowed, it's > important how many subtxns there are. The cases where overflowing causes real > problems are when there's many thousand subtxns - which one can't judge just > from suboverflowed alone. Nor can monitoring a boolean tell you whether you're > creeping closer to the danger zone.
This is the opposite of what I believe to be true. I thought the problem is that once a single backend overflows the subxid array, all snapshots have to be created suboverflowed, and this makes visibility checking more expensive. It's my impression that for some users this creates and extremely steep performance cliff: the difference between no backends overflowing and 1 backend overflowing is large, but whether you are close to the limit makes no difference as long as you don't reach it, and once you've passed it it makes little difference how far past it you go.
Assuming getting an actual count value to print is fairly cheap, or even a sunk cost if you are going to report overflow, I don't see why we wouldn't want to provide the more detailed data.
My concern, through ignorance, with reporting a number is that it would have no context in the query result itself. If I have two rows with numbers, one with 10 and one with 1,000, is the two orders of magnitude of the second number important or does overflow happen at, say, 65,000 and so both numbers are exceedingly small and thus not worth worrying about? That can be handled by documentation just fine, so long as the reference number in question isn't a per-session variable. Otherwise, showing some kind of "percent of max" computation seems warranted. In which case maybe the two presentation outputs would be: