Re: Assertion failure in WaitForWALToBecomeAvailable state machine - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date
Msg-id CALj2ACUCt1+BmgP=8Th=y3RVWb3fO9AyXp=NzkDyGhp8Uv6_ZQ@mail.gmail.com
In response to Re: Assertion failure in WaitForWALToBecomeAvailable state machine  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Assertion failure in WaitForWALToBecomeAvailable state machine
List pgsql-hackers
On Mon, Aug 15, 2022 at 11:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Aug 11, 2022 at 10:06 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Today I encountered the assertion failure [2] twice while working on
> > another patch [1]. The pattern seems to be that the walreceiver got
> > killed or crashed and set its state to WALRCV_STOPPING or
> > WALRCV_STOPPED by the time the WAL state machine moves to archive, and
> > hence the following XLogShutdownWalRcv() code will not get hit:
> >
> >                     /*
> >                      * Before we leave XLOG_FROM_STREAM state, make sure that
> >                      * walreceiver is not active, so that it won't overwrite
> >                      * WAL that we restore from archive.
> >                      */
> >                     if (WalRcvStreaming())
> >                         XLogShutdownWalRcv();
> >
> > I agree with Kyotaro-san that InstallXLogFileSegmentActive should be
> > reset before entering archive mode. Hence I tweaked the code introduced
> > by the following commit a bit; the resulting v1 patch is attached
> > herewith. Please review it.
>
> I added it to the current commitfest to not lose track of it:
> https://commitfest.postgresql.org/39/3814/.

Today I spent some more time on this issue and modified the v1 patch
posted upthread a bit: it now resets InstallXLogFileSegmentActive only
when the WAL source switches to archive, not every time we are in
archive mode.
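
To illustrate the idea only (this is not the patch), here is a minimal
standalone sketch. The names WalSource, RecoveryState, start_streaming(),
switch_source() and restore_from_archive() are hypothetical stand-ins,
not PostgreSQL identifiers; install_active merely models
InstallXLogFileSegmentActive, and the resets counter exists only to make
the "once per switch" behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the recovery WAL sources. */
typedef enum { SRC_ARCHIVE, SRC_STREAM } WalSource;

typedef struct
{
    WalSource source;         /* source currently being read from */
    bool      install_active; /* models InstallXLogFileSegmentActive */
    int       resets;         /* how many times the flag was cleared */
} RecoveryState;

/* Assumption: starting streaming is what turns the flag on. */
static void
start_streaming(RecoveryState *st)
{
    st->source = SRC_STREAM;
    st->install_active = true;
}

/*
 * Move to new_source.  The flag is cleared only on an actual transition
 * into the archive source, regardless of whether the walreceiver is still
 * running, and not on every later pass while already in archive.
 */
static void
switch_source(RecoveryState *st, WalSource new_source)
{
    if (new_source == SRC_ARCHIVE && st->source != SRC_ARCHIVE)
    {
        st->install_active = false;
        st->resets++;
    }
    st->source = new_source;
}

/* Models the failing assertion: archive restore requires the flag off. */
static void
restore_from_archive(const RecoveryState *st)
{
    assert(st->source == SRC_ARCHIVE);
    assert(!st->install_active);
}
```

In this model, calling start_streaming(), then switch_source() with
SRC_ARCHIVE, then restore_from_archive() passes the assertion even if
nothing else shut the walreceiver down, and a second switch_source() to
SRC_ARCHIVE does not reset the flag again.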

I'm attaching the v2 patch herewith; please review it further.

Just for the record, there's another report of the assertion failure
at [1]; many thanks to Kyotaro-san for providing steps that reproduce
it consistently.

[1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
