Hi,
On 2022-02-19 20:57:57 -0800, Noah Misch wrote:
> On Wed, Feb 16, 2022 at 03:43:12PM +0700, John Naylor wrote:
> > On Wed, Feb 16, 2022 at 6:17 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Tue, Feb 15, 2022 at 9:28 AM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > > > I did notice from my own testing of the failsafe (by artificially
> > > > inducing wraparound failure using an XID burning C function) that
> > > > autovacuum seemed to totally correct the problem, even when the system
> > > > had already crossed xidStopLimit - it came back on its own. I wasn't
> > > > completely sure of how robust this effect was, though.
> >
> > I'll put some effort in finding any way that it might not be robust.
>
> A VACUUM may create a not-trivially-bounded number of multixacts via
> FreezeMultiXactId(). In a cluster at multiStopLimit, completing VACUUM
> without error needs preparation something like:
>
> 1. Kill each XID that might appear in a multixact.
> 2. Resolve each prepared transaction that might appear in a multixact.
> 3. Run VACUUM. At this point, multiStopLimit is blocking new multixacts from
> other commands, and the lack of running multixact members removes the need
> for FreezeMultiXactId() to create multixacts.
>
> Adding to the badness of single-user mode so well described upthread, one can
> enter it without doing (2) and then wrap the nextMXact counter.
If we collected the information along the lines of I proposed in the second half of
https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
we should be able to handle such cases more intelligently, I think?
We could e.g. add an error if FreezeMultiXactId() needs to create a new
multixact for a far-in-the-past xid. That's not great, of course, but if we
include the precise cause (pid of backend / prepared xact name / slot name /
...) necessitating creating a new multi, it'd still be a significant
improvement over the status quo.
Greetings,
Andres Freund