Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Msg-id YgW+Gl+VC+QGFZF4@paquier.xyz
In response to Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Thu, Feb 10, 2022 at 04:12:40PM -0800, Andres Freund wrote:
> I'm pretty sure the problem is on the primary. Looking through
> ReindexRelationConcurrently() I think I found at least two problems:
>
> 1) We don't WAL log snapshot conflicts, afaict (i.e. the
> WaitForOlderSnapshots() in phase 3). Which I think means that the new index
> can end up being used "too soon" in pre-existing snapshots on the standby.
>
> I don't think this is the problem this thread is about, but it's definitely a
> problem.
>
> 2) WaitForLockersMultiple() in phase 5 / 6 isn't WAL logged. Without waiting
> for sessions to see the results of Phase 4, 5, we can mark the index as dead
> (phase 5) and drop it (phase 6), while there are ongoing accesses.
>
> I think this is the likely cause of the reported bug.
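To make point 2 concrete, here is a toy model in Python. It is not PostgreSQL code: the record kinds `mark_dead`, `wait_for_lockers` and `drop_index`, and the class `ToyStandby`, are hypothetical. It assumes, as described above, that the standby applies records as they arrive and that nothing in the WAL stream tells it to wait for sessions still using the old index.

```python
"""Toy model of the phase 5/6 race on a standby (hypothetical, not PostgreSQL code)."""

from dataclasses import dataclass, field


@dataclass
class ToyStandby:
    live_indexes: set = field(default_factory=lambda: {"idx_old"})
    active_scans: dict = field(default_factory=dict)  # session -> index in use
    broken_scans: list = field(default_factory=list)

    def start_scan(self, session: str, index: str) -> None:
        self.active_scans[session] = index

    def replay(self, record: tuple) -> None:
        kind, index = record
        if kind == "wait_for_lockers":
            # Hypothetical record: the standby would wait for (or cancel)
            # sessions using `index` before continuing. Here we simply end
            # those scans as a stand-in for that.
            for session in [s for s, i in self.active_scans.items() if i == index]:
                del self.active_scans[session]
        elif kind == "drop_index":
            # Any scan still using the index at this point loses it mid-flight.
            for session, used in self.active_scans.items():
                if used == index:
                    self.broken_scans.append(session)
            self.live_indexes.discard(index)
        # Other kinds (e.g. "mark_dead") change nothing in this toy model.


def run(stream):
    standby = ToyStandby()
    standby.start_scan("s1", "idx_old")  # query started before the drop
    for record in stream:
        standby.replay(record)
    return standby.broken_scans


# The primary waits for lockers, but nothing about that wait reaches the WAL.
print(run([("mark_dead", "idx_old"), ("drop_index", "idx_old")]))
# If the wait were WAL-logged, replay could honor it before the drop.
print(run([("mark_dead", "idx_old"), ("wait_for_lockers", "idx_old"),
           ("drop_index", "idx_old")]))
```

The first stream leaves session `s1` scanning an index that replay has already dropped; the second does not, because the (hypothetical) wait record is honored first.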

Yep, I was planning to look at this problem starting next week, FWIW;
I just lacked the time/energy to do so until now.  And the release was
shipping anyway, so there is plenty of time.

My impression is that, with the APIs we already have, we don't really
need to change the WAL format.  In the worst case, it should be
possible to make things backward-compatible enough that this would not
be a worry, as long as the standbys are updated before the primaries.
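If a new record type did turn out to be needed, the "standbys first" ordering follows from one assumption: a standby cannot replay a record kind it does not understand, so replay stops. A hypothetical sketch of that argument (the record kinds and the names `replay_all` and `can_replay` are invented for illustration):

```python
"""Toy model of WAL-format compatibility across an upgrade (hypothetical)."""

OLD_KINDS = frozenset({"mark_dead", "drop_index"})
NEW_KINDS = OLD_KINDS | {"wait_for_lockers"}  # hypothetical new record kind


class UnknownRecord(Exception):
    pass


def replay_all(known_kinds, stream):
    """Replay `stream` on a standby that understands `known_kinds`.

    Assumes, for this sketch, that an unknown record kind cannot be skipped,
    so replay stops with an error.
    """
    for kind, _index in stream:
        if kind not in known_kinds:
            raise UnknownRecord(kind)
    return True


def can_replay(known_kinds, stream):
    try:
        return replay_all(known_kinds, stream)
    except UnknownRecord:
        return False


old_primary_wal = [("mark_dead", "idx"), ("drop_index", "idx")]
new_primary_wal = [("mark_dead", "idx"), ("wait_for_lockers", "idx"),
                   ("drop_index", "idx")]

# Standby upgraded first: it still replays WAL from the old primary.
print(can_replay(NEW_KINDS, old_primary_wal))
# Primary upgraded first: the old standby meets a record it cannot replay.
print(can_replay(OLD_KINDS, new_primary_wal))
```

An upgraded standby understands both the old and the new record kinds, while an old standby fed WAL from an upgraded primary does not; hence standbys go first.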
--
Michael

