Re: Assertion failure in WaitForWALToBecomeAvailable state machine - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date
Msg-id Yx7ZDSvjmmPuF5Sd@paquier.xyz
In response to Re: Assertion failure in WaitForWALToBecomeAvailable state machine  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Assertion failure in WaitForWALToBecomeAvailable state machine
List pgsql-hackers
On Sat, Sep 10, 2022 at 07:52:01AM +0530, Bharath Rupireddy wrote:
> Today, I spent some more time on this issue and modified the v1 patch
> posted upthread a bit: it now resets InstallXLogFileSegmentActive
> only when the WAL source switches to the archive, not every time in
> archive mode.
>
> I'm attaching the v2 patch herewith; please review it further.
>
> Just for the record, there's another report of the assertion failure
> at [1]; many thanks to Kyotaro-san for providing consistently
> reproducible steps.
>
> [1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com

Well, as far as I can see, the fact that commit cc2c7d6 is involved
here makes this thread an open item for PG15, assigned to Noah (now
added in CC).

While reading your last patch, I found it rather confusing that we
only reset InstallXLogFileSegmentActive when the current source is the
archive and it does not match the old source.  This code is already
complicated, and I don't think that adding more assumptions to its
internals is a good thing for its long-term maintenance.
In short, HEAD is rather conservative about setting
InstallXLogFileSegmentActive, doing so only when we request streaming
with RequestXLogStreaming(), but too aggressive about resetting it,
and we want a middle ground.  FWIW, I find the approach taken by
Horiguchi-san in [1] better: reset the state before we attempt to
read WAL from the archive *or* pg_wal, once we know that the last
source has failed and that we are not streaming yet (but recovery may
be requested soon).
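To make the proposed rule concrete, here is a toy model of it in C, assuming a simplified recovery loop where a new WAL source is picked after the previous one failed.  It is not the patch from [1] and does not reproduce WaitForWALToBecomeAvailable's real control flow or locking; WalSource, ToyRecovery, toy_switch_source() and toy_replay() are hypothetical names, with install_segment_active standing in for InstallXLogFileSegmentActive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source names, loosely modeled on PostgreSQL's XLogSource. */
typedef enum
{
    SRC_ARCHIVE,
    SRC_PG_WAL,
    SRC_STREAM
} WalSource;

typedef struct
{
    WalSource   source;                 /* source currently being read */
    bool        install_segment_active; /* stand-in for InstallXLogFileSegmentActive */
} ToyRecovery;

/*
 * Toy rule: called once the previous source has failed and 'next' has been
 * picked.  Reading from the archive or pg_wal resets the flag first, since
 * no streaming happens at that point; only a streaming request sets it.
 */
static void
toy_switch_source(ToyRecovery *r, WalSource next)
{
    assert(r != NULL);
    switch (next)
    {
        case SRC_ARCHIVE:
        case SRC_PG_WAL:
            r->install_segment_active = false;
            break;
        case SRC_STREAM:
            r->install_segment_active = true;   /* just before requesting streaming */
            break;
    }
    r->source = next;
}

/* Replays source switches; returns true if the flag was set exactly while streaming. */
static bool
toy_replay(ToyRecovery *r, const WalSource *seq, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        toy_switch_source(r, seq[i]);
        if (r->install_segment_active != (r->source == SRC_STREAM))
            return false;
    }
    return true;
}
```

Because the reset is tied to "we are about to read locally" rather than to a particular old/new source pair, the model needs no extra condition on which source came before.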

Side note: I don't see much point in having two routines named
SetInstallXLogFileSegmentActive and ResetInstallXLogFileSegmentActive
that do opposite things.  We could just have one.

[1]: https://www.postgresql.org/message-id/20220214.171428.735280610520122187.horikyota.ntt@gmail.com
--
Michael

Attachment
