Re: prevent immature WAL streaming - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: prevent immature WAL streaming
Date
Msg-id 8687C486-BC80-4150-8519-13FE395485BF@amazon.com
Whole thread Raw
In response to Re: prevent immature WAL streaming  ("alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org>)
Responses Re: prevent immature WAL streaming  ("alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On 8/24/21, 4:03 PM, "alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org> wrote:
> On 2021-Aug-24, Bossart, Nathan wrote:
>
>> I wonder if we need to move the call to RegisterSegmentBoundary() to
>> somewhere before WALInsertLockRelease() for this to work as intended.
>> Right now, boundary registration could take place after the flush
>> pointer has been advanced, which means GetSafeFlushRecPtr() could
>> still return an unsafe position.
>
> Yeah, you're right -- that's a definite risk.  I didn't try to reproduce
> a problem with that, but it is seems pretty obvious that it can happen.
>
> I didn't have a lot of luck with a reliable reproducer script.  I was
> able to reproduce the problem starting with Ryo Matsumura's script and
> attaching a replica; most of the time the replica would recover by
> restarting from a streaming position earlier than where the problem
> occurred; but a few times it would just get stuck with a WAL segment
> containing a bogus record.  Then, after patch, the problem no longer
> occurs.

If moving RegisterSegmentBoundary() is sufficient to prevent the flush
pointer from advancing before we register the boundary, I bet we could
also remove the WAL writer nudge.

Another interesting thing I see is that the boundary stored in
earliestSegBoundary is not necessarily the earliest one.  It's just
the first one that has been registered.  I did this for simplicity for
the .ready file fix, but I can see it causing problems here.  I think
we can move earliestSegBoundary backwards as long as it is greater
than lastNotifiedSeg + 1.  However, it's still not necessarily the
earliest one if we copied latestSegBoundary to earliestSegBoundary in
NotifySegmentsReadyForArchive().  To handle this, we could track
several boundaries in an array, but then we'd have to hold the safe
LSN back whenever the array overflowed and we started forgetting
boundaries.

Perhaps there's a simpler solution.  I'll keep thinking...

Nathan


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Proposal: More structured logging
Next
From: Michael Paquier
Date:
Subject: Re: Tab completion for "create unlogged" a bit too lax?