Re: visibility map corruption - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: visibility map corruption
Date
Msg-id 20210724000852.GD8025@momjian.us
Whole thread Raw
In response to Re: visibility map corruption  (Bruce Momjian <bruce@momjian.us>)
Responses Re: visibility map corruption  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Jul  8, 2021 at 09:51:47AM -0400, Bruce Momjian wrote:
> On Thu, Jul  8, 2021 at 08:11:14AM -0500, Justin Pryzby wrote:
> > Also, the pg_upgrade status message still seems to be misplaced:
> > 
> > In 20210706190612.GM22043@telsasoft.com, Justin Pryzby wrote:
> > > I re-arranged the pg_upgrade output of that patch: it was in the middle of the
> > > two halves: "Setting next transaction ID and epoch for new cluster"
> > 
> > +++ b/src/bin/pg_upgrade/pg_upgrade.c
> > @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
> >                           "\"%s/pg_resetwal\" -f -x %u \"%s\"",
> >                           new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
> >                           new_cluster.pgdata);
> > +       check_ok();
> > +       prep_status("Setting oldest XID for new cluster");
> > +       exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> > +                         "\"%s/pg_resetwal\" -f -u %u \"%s\"",
> > +                         new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
> > +                         new_cluster.pgdata);
> >         exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> >                           "\"%s/pg_resetwal\" -f -e %u \"%s\"",
> >                           new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
> 
> Wow, you are 100% correct.  Updated patch attached.

OK, I have the patch ready to apply to all supported Postgres versions,
and it passes all my cross-version pg_upgrade tests.

However, I am now stuck on the commit message text, and I think this is
the point Peter Geoghegan was trying to make earlier --- while we know
that preserving the oldest xid in pg_control is the right thing to do,
and that setting it to the current xid - 2 billion (the old behavior)
causes vacuum freeze to run on all tables, but what else does this patch
affect?

As far as I know, seeing a very low oldest xid causes autovacuum to
check all objects and make sure their relfrozenxid is less then
autovacuum_freeze_max_age, but isn't that just a check?  Would that
cause any table scans?  I would think not.  And would this cause
incorrect truncation of pg_xact or fsm or vm files?  I would think not
too.

Even if the old and new cluster had mismatched autovacuum_freeze_max_age
values, I don't see how that would cause any corruption either.

I could perhaps see corruption happening if pg_control's oldest xid
value was closer to the current xid value than it should be, but I can't
see how having it 2-billion away could cause harm, unless perhaps
pg_upgrade itself used enough xids to cause the counter to wrap more
than 2^31 away from the oldest xid recorded in pg_control.

What I am basically asking is how to document this and what it fixes.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Followup Timestamp to timestamp with TZ conversion
Next
From: Peter Geoghegan
Date:
Subject: Re: visibility map corruption