Re: BUG #7710: Xid epoch is not updated properly during checkpoint - Mailing list pgsql-bugs

From Simon Riggs
Subject Re: BUG #7710: Xid epoch is not updated properly during checkpoint
Msg-id CA+U5nM+qw3b74FrkY7z2bK73Q6svtCHLiGb3ZrHwXwLRp_nuDQ@mail.gmail.com
In response to Re: BUG #7710: Xid epoch is not updated properly during checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #7710: Xid epoch is not updated properly during checkpoint
List pgsql-bugs
On 1 December 2012 22:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> tarvip@gmail.com writes:
>> [ txid_current can show a bogus value near XID wraparound ]
>> This happens only if wal_level=hot_standby.
>
> I believe what is happening here is
>
> (1) CreateCheckPoint sets up checkPoint.nextXid and
> checkPoint.nextXidEpoch, near xlog.c line 7070 in HEAD.  At this point,
> nextXid is still a bit less than the wrap point.
>
> (2) After performing the checkpoint, at line 7113, CreateCheckPoint
> calls LogStandbySnapshot() which "helpfully" updates checkPoint.nextXid
> to the latest value.  Which by now has wrapped around.  But it doesn't
> fix checkPoint.nextXidEpoch, so the checkpoint that gets written out has
> effectively lost the epoch bump that should have happened.
>
> While we could add some more logic to try to correct the epoch value
> in this scenario, I think it's a much better idea to just stop having
> LogStandbySnapshot update the nextXid.  That seems to me to be useless
> complication.  I also quite dislike the fact that we're effectively
> redefining the checkpoint nextXid from being taken before the main
> body of the checkpoint to being taken afterwards, but *only* in
> XLogStandbyInfoActive mode.  If that inconsistency isn't already causing
> bugs (besides this one) today, it'll probably cause them in the future.

I agree that the coding looks weird, and that it shouldn't be there.
The meaning of the checkpoint values should not depend on the
wal_level setting.
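
To make steps (1) and (2) above concrete, here is a toy model of the sequence, not PostgreSQL code. ToyCluster, create_checkpoint, as_64bit and the overwrite_next_xid flag are hypothetical names invented for this sketch, and tracking the epoch as a live counter is a simplifying assumption.

```python
# Toy model of the sequence in steps (1) and (2); not PostgreSQL code.
XID_MODULUS = 2**32       # transaction ids are 32 bits wide
FIRST_NORMAL_XID = 3      # xids 0-2 are reserved, so the counter restarts at 3


class ToyCluster:
    """Hypothetical stand-in for shared state: an xid counter plus an epoch."""

    def __init__(self, next_xid, epoch):
        self.next_xid = next_xid
        self.epoch = epoch

    def consume_xids(self, count):
        """Advance the counter, bumping the epoch each time it wraps."""
        for _ in range(count):
            self.next_xid += 1
            if self.next_xid >= XID_MODULUS:
                self.next_xid = FIRST_NORMAL_XID
                self.epoch += 1


def create_checkpoint(cluster, xids_during_checkpoint, overwrite_next_xid):
    # Step (1): capture both fields together, before the checkpoint body.
    record = {"nextXid": cluster.next_xid, "nextXidEpoch": cluster.epoch}
    # Transactions keep running while the checkpoint does its work.
    cluster.consume_xids(xids_during_checkpoint)
    # Step (2): the behaviour under discussion refreshes nextXid only,
    # leaving nextXidEpoch at its pre-wraparound value.
    if overwrite_next_xid:
        record["nextXid"] = cluster.next_xid
    return record


def as_64bit(record):
    """Combine epoch (high 32 bits) and xid (low 32 bits) into one value."""
    return (record["nextXidEpoch"] << 32) | record["nextXid"]


if __name__ == "__main__":
    start = XID_MODULUS - 10
    for overwrite in (True, False):
        cluster = ToyCluster(next_xid=start, epoch=0)
        record = create_checkpoint(cluster, 20, overwrite_next_xid=overwrite)
        print(overwrite, record, as_64bit(record))
```

Starting 10 xids short of the wrap point and consuming 20 during the checkpoint, the overwriting variant writes a record pairing epoch 0 with a small post-wrap nextXid, so its 64-bit value is lower than the value at the start of the checkpoint. The non-overwriting variant writes the two fields as they were captured together.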

> So barring objections, I'm going to remove LogStandbySnapshot's behavior
> of returning the updated nextXid.

Removing it may cause other bugs, but if so, those other bugs need to
be solved in the right way, not by having a too-far-forwards nextXid
on the checkpoint record. Having said that, I can't see any bugs that
would be caused by this.
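
As a rough illustration of why a too-far-forwards nextXid on the checkpoint record is harmful, the sketch below shows one way a reader might turn the checkpoint's (nextXid, epoch) pair plus the live xid counter into a 64-bit id. The functions current_epoch and txid64 are hypothetical, and the comparison rule is an assumption about the general approach, not a copy of the PostgreSQL source.

```python
def current_epoch(ckpt_next_xid, ckpt_epoch, live_next_xid):
    """Infer the live epoch from the checkpoint's (nextXid, epoch) pair.

    Assumption: fewer than 2**32 xids were consumed since the checkpoint,
    so a live counter below the checkpoint's nextXid means one wraparound.
    """
    if live_next_xid < ckpt_next_xid:
        return ckpt_epoch + 1
    return ckpt_epoch


def txid64(ckpt_next_xid, ckpt_epoch, live_next_xid):
    """Form a 64-bit id: epoch in the high 32 bits, xid in the low 32 bits."""
    epoch = current_epoch(ckpt_next_xid, ckpt_epoch, live_next_xid)
    return (epoch << 32) | live_next_xid


if __name__ == "__main__":
    live = 20  # the counter has wrapped and is small again

    # Consistent pair: nextXid and epoch both taken before the wrap.
    print(txid64(2**32 - 10, 0, live))

    # Inconsistent pair: nextXid taken after the wrap, epoch from before it.
    print(txid64(13, 0, live))
```

With the consistent pair the wrap is detected and the result lands in epoch 1. With the inconsistent pair the wrap is invisible to the reader, so the result stays in epoch 0 and is smaller than ids handed out before the wraparound.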

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
