Re: improving wraparound behavior - Mailing list pgsql-hackers

From Andres Freund
Subject Re: improving wraparound behavior
Date
Msg-id 20190504024742.y2cvkf6qohazlxk2@alap3.anarazel.de
In response to Re: improving wraparound behavior  (Stephen Frost <sfrost@snowman.net>)
Responses Re: improving wraparound behavior  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi,

On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> I suppose it is a pretty big change in the base autovacuum launcher to
> be something that's run per database instead and then deal with the
> coordination between the two...  but I can't help but feel like it
> wouldn't be that much *work*.  I'm not against doing something smaller
> but was something smaller actually proposed for this specific issue..?

I think it'd be a fairly significant amount of work. And I think we
should redo it from scratch if we go there, because what we have isn't
worth using as a basis.


> > I'm thinking that we'd do something roughly like (in actual code) for
> > GetNewTransactionId():
> > 
> >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> >     TransactionId walsender_limit;
> >     TransactionId prepared_xact_limit;
> >     TransactionId backend_limit;
> > 
> >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> > 
> >     if (IsOldest(dat_limit))
> >        ereport(elevel,
> >                errmsg("close to xid wraparound, held back by database %s"),
> >                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> >                errhint("..."));
> >     else if (IsOldest(slot_limit))
> >       ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
> >               ...);
> > 
> > where IsOldest wouldn't actually compare plainly numerically, but would
> > actually prefer showing the slot, backend, walsender, prepared_xact, as
> > long as they are pretty close to the dat_limit - as in those cases
> > vacuuming wouldn't actually solve the issue, unless the other problems
> > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > newer than any of those).
> 
> Where the errhint() above includes a recommendation to run the SRF
> described below, I take it?

Not necessarily. I feel conciseness is important too, and this would be
the most important thing to tackle.


> Also, should this really be an 'else if', or should it be just a set of
> 'if()'s, thereby giving users more info right up-front?

Possibly? But it'd also make it even harder to read the log, and harder
for the system to keep up with logging, because we already log *so* much
when close to wraparound.

If we didn't order it, it'd be hard for users to figure out which to
address first. If we ordered it, people would have to look further up in
the log to figure out which is the most urgent one (unless we reverse
the order, which is odd too).
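
To make the "report only the single most urgent holder" idea a bit more
concrete, here is a minimal standalone sketch of the IsOldest()-style
preference described in the quoted text above. It is not PostgreSQL code:
Xid, Horizon, HorizonKind, CLOSE_ENOUGH_XIDS, xid_precedes() and
pick_culprit() are all hypothetical names, the "pretty close" threshold is
an arbitrary assumption, and the comparison is a simplified modulo-2^32
check that ignores the special permanent xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* stand-in for TransactionId */

/* Assumption: how far past dat_limit still counts as "pretty close". */
#define CLOSE_ENOUGH_XIDS 1000000u

typedef enum HorizonKind
{
    HORIZON_DATABASE,
    HORIZON_SLOT,
    HORIZON_WALSENDER,
    HORIZON_PREPARED_XACT,
    HORIZON_BACKEND
} HorizonKind;

typedef struct Horizon
{
    HorizonKind kind;           /* what is holding the horizon back */
    Xid         xid;            /* the xid it is holding back to */
    bool        valid;          /* false if no such holder exists */
} Horizon;

/* Simplified wraparound-aware comparison: is a logically older than b? */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Pick the one thing to complain about.  Any non-database holder that is
 * within CLOSE_ENOUGH_XIDS of dat_limit is preferred over the database
 * itself, because vacuuming cannot advance the horizon past it.  Among
 * those, the oldest one wins.
 */
static HorizonKind
pick_culprit(Xid dat_limit, const Horizon *others, size_t n)
{
    const Horizon *best = NULL;

    assert(others != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!others[i].valid)
            continue;
        /* Skip holders that are too far ahead of dat_limit to matter. */
        if (xid_precedes(dat_limit + CLOSE_ENOUGH_XIDS, others[i].xid))
            continue;
        if (best == NULL || xid_precedes(others[i].xid, best->xid))
            best = &others[i];
    }

    return best != NULL ? best->kind : HORIZON_DATABASE;
}
```

With something like this, the caller emits exactly one message for the
returned kind, which sidesteps the question of how to order several
messages in the log.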


Greetings,

Andres Freund


