Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date
Msg-id 20220421161710.znq23x3zkgocbrhq@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
List pgsql-bugs
Hi,

On 2022-04-21 12:54:00 +0900, Michael Paquier wrote:
> Based on the analysis of upthread, I have stuck with logging standby
> snapshots once we are done in WaitForOlderSnapshots() so as we make sure
> that a standby does not attempt to look at the data of any running
> transactions it should wait for and the log of AELs at the end of
> WaitForLockersMultiple() to avoid the access of the lockers too early.

Why is this necessary? The comments in WaitForOlderSnapshots() don't really
explain it. This is pretty darn expensive.


> amcheck could be made more robust here, by calling
> RelationGetIndexList() after opening the Relation of the parent to
> check if it is still listed in the returned list as it would look at
> the relcache and discard any invalid indexes on the way.

That seems like a weird approach. Why can't the check just be done on the
relcache entry of the index itself? If that doesn't work, something is still
broken in cache invalidation.


> diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
> index cd30f15eba..704d8f3744 100644
> --- a/src/backend/commands/indexcmds.c
> +++ b/src/backend/commands/indexcmds.c
> @@ -475,6 +475,16 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
>          if (progress)
>              pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, i + 1);
>      }
> +
> +    /*
> +     * Take a snapshot of running transactions and write this to WAL.  With
> +     * this information, a standby is able to know that it should not access
> +     * too soon any information we expect to wait for with a concurrent build.
> +     *
> +     * Skip the log of this information if disabled.
> +     */
> +    if (XLogStandbyInfoActive())
> +        LogStandbySnapshot();
>  }

"should not access too soon" - no idea what that means / guarantees.

The "Skip the log" bit is redundant with the code.


Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Next
From: PG Bug reporting form
Date:
Subject: BUG #17467: Perf degradation after switching to latest jdbc drivers