Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss" - Mailing list pgsql-hackers

From Jeff Janes
Subject Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Date
Msg-id CAMkU=1yWky3fFnJ8AYAdOCctQWrEF0RWhU8v9GOtFFpxkF3Myw@mail.gmail.com
In response to Cluster "stuck" in "not accepting commands to avoid wraparound data loss"  (Andres Freund <andres@anarazel.de>)
Responses Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
List pgsql-hackers
Sorry, I accidentally failed to include the list originally; here it is
for the list:

On Dec 16, 2015 9:52 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>
> On Fri, Dec 11, 2015 at 1:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Since changes to datfrozenxid are WAL logged at the time they occur,
> > but the supposedly-synchronous change to ShmemVariableCache is not WAL
> > logged until the next checkpoint, a well-timed crash can leave you in
> > the state where the system is in a tizzy about wraparound but each
> > database says "Nope, not me".
>
> ShmemVariableCache is an in-memory data structure, so it's going to
> get blown away and rebuilt on a crash.  But I guess it gets rebuilt
> from the contents of the most recent checkpoint record, so that
> doesn't actually help.  However, I wonder if it would be safe for
> the autovacuum launcher to calculate an updated value and call
> SetTransactionIdLimit() to update ShmemVariableCache.

I was wondering if that should happen either at the end of crash
recovery (but I suppose you can't poll pg_database yet at that
point?), or immediately before throwing the "database is not accepting
commands to avoid wraparound data loss" error.

At which point would it make sense for the launcher to do it?  I guess
just after it was started up under PMSIGNAL_START_AUTOVAC_LAUNCHER
conditions?

> But I'm somewhat confused what this has to do with Andres's report.

Doesn't it explain the exact situation he is in, where the oldest
database's age is 200 million, but the cluster as a whole is at 2 billion?
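
To make that concrete, here is a rough, self-contained C sketch of the
arithmetic, not the actual backend code.  The helper names
(xid_precedes, stop_limit_for, refuses_commands) are made up for
illustration, and the one-million-XID safety margin is my assumption
about what SetTransactionIdLimit() uses; check varsup.c for the real
value.  It shows how a stop limit derived from a stale oldest-XID value
keeps refusing commands even though a limit derived from the current
minimum datfrozenxid would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FIRST_NORMAL_XID ((TransactionId) 3)
#define MAX_XID          ((TransactionId) 0xFFFFFFFF)
#define STOP_MARGIN      ((TransactionId) 1000000)  /* assumed margin */

/* Circular comparison of two normal XIDs: true if a logically precedes b. */
bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Derive the XID at which commands are refused from the oldest frozen XID.
 * The wrap point is half the XID space ahead of oldest_frozen; the stop
 * point sits STOP_MARGIN before it.  Special XIDs below FIRST_NORMAL_XID
 * are skipped when the arithmetic wraps around.
 */
TransactionId stop_limit_for(TransactionId oldest_frozen)
{
    TransactionId wrap = oldest_frozen + (MAX_XID >> 1);
    if (wrap < FIRST_NORMAL_XID)
        wrap += FIRST_NORMAL_XID;

    TransactionId stop = wrap - STOP_MARGIN;
    if (stop < FIRST_NORMAL_XID)
        stop -= FIRST_NORMAL_XID;
    return stop;
}

/* True if assigning next_xid would hit or pass the stop limit. */
bool refuses_commands(TransactionId next_xid, TransactionId oldest_frozen)
{
    return !xid_precedes(next_xid, stop_limit_for(oldest_frozen));
}
```

With hypothetical numbers, refuses_commands(2250000000u, 100000000u) is
true (stale value restored from the checkpoint, apparent age over 2
billion), while refuses_commands(2250000000u, 2050000000u) is false
(current minimum datfrozenxid, age 200 million).  That is the "Nope, not
me" state: recomputing the limit from pg_database would clear the error.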

Cheers,

Jeff


