Re: using an end-of-recovery record in all cases - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: using an end-of-recovery record in all cases
Date
Msg-id 20220419203759.GA2575963@nathanxps13
Whole thread Raw
In response to Re: using an end-of-recovery record in all cases  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: using an end-of-recovery record in all cases
Re: using an end-of-recovery record in all cases
List pgsql-hackers
On Mon, Apr 18, 2022 at 04:44:03PM -0400, Robert Haas wrote:
> Here is a new version of the patch which, unlike v1, I think is
> something we could seriously consider applying (not before v16, of
> course). It now removes CHECKPOINT_END_OF_RECOVERY completely, and I
> attach a second patch as well which nukes checkPoint.PrevTimeLineID as
> well.

I'd like to add a big +1 for this change.  IIUC this should help with some
of the problems I've noted elsewhere [0].

> I mentioned two problems with $SUBJECT in the first email with this
> subject line. One was a bug, which Noah has since fixed (thanks,
> Noah). The other problem is that LogStandbySnapshot() and a bunch of
> its friends expect latestCompletedXid to always be a normal XID, which
> is a problem because (1) we currently set nextXid to 3 and (2) at
> startup, we compute latestCompletedXid = nextXid - 1. The current code
> dodges this kind of accidentally: the checkpoint that happens at
> startup is a "shutdown checkpoint" and thus skips logging a standby
> snapshot, since a shutdown checkpoint is a sure indicator that there
> are no running transactions. With the changes, the checkpoint at
> startup happens after we've started allowing write transactions, and
> thus a standby snapshot needs to be logged also. In the cases where
> the end-of-recovery record was already being used, the problem could
> have happened already, except for the fact that those cases involve a
> standby promotion, which doesn't happen during initdb. I explored a
> few possible ways of solving this problem.

Shouldn't latestCompletedXid be set to MaxTransactionId in this case?  Or
is this related to the logic in FullTransactionIdRetreat() that avoids
skipping over the "actual" special transaction IDs?

> So ... I decided that the best approach here seems to be to just skip
> FirstNormalTransactionId and use FirstNormalTransactionId + 1 for the
> first write transaction that the cluster ever processes. That's very
> simple and doesn't seem likely to break anything else. On the downside
> it seems a bit grotty, but I don't see anything better, and on the
> whole, I think with this approach we come out substantially ahead.
> 0001 removes 3 times as many lines as it inserts, and 0002 saves a few
> more lines of code.

This doesn't seem all that bad to me.  It's a little hacky, but it's very
easy to understand and only happens once per initdb.  I don't think it's
worth any extra complexity.

> Now, I still don't really know that there isn't some theoretical
> difficulty here that makes this whole approach a non-starter, but I
> also can't think of what it might be. If the promotion case, which has
> used the end-of-recovery record for many years, is basically safe,
> despite the fact that it switches TLIs, then it seems to me that the
> crash recovery case, which doesn't have that complication, ought to be
> safe too. But I might well be missing something, so if you see a
> problem, please speak up!

Your reasoning seems sound to me.

[0] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Postgres perl module namespace
Next
From: Tom Lane
Date:
Subject: Re: Bad estimate with partial index