Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date
Msg-id 20220211021812.wgv324etfaz4v4yu@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
Hi,

On 2022-02-11 10:38:34 +0900, Michael Paquier wrote:
> My impression is that we don't really need to change the WAL format
> based on the existing APIs we already have, or that in the worst case
> it would be possible to make things backward-compatible enough that it
> would not be a worry as long as the standbys are updated before the
> primaries.

I think it's a question of how "concurrently" we want concurrently to be on
hot standby nodes.

If we are OK with "not at all", we can just lock an AEL at the end of
WaitForLockersMultiple(). It's a bit harder to cause the necessary snapshot
conflict, but we could e.g. extend xl_running_xacts and use the record length
to keep both older primary -> newer replica, and the reverse working.

If we want to actually make concurrently work concurrently on the replica,
we'd have to work harder. That doesn't strike me as feasible for backpatching.

The most important bit bit would be to make WaitForOlderSnapshots() wait for
the xmin of hot_standby_feedback walsenders and/or physical slot xmin
horizons. For phase 5 & 6 we would probably have to do something like waiting
for snapshot conflicts vs walsenders/slots in addition to the
WaitForLockersMultiple() for conflicts on the primary.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0
Next
From: Ryan Kelly
Date:
Subject: ERROR: XX000: variable not found in subplan target list