Re: Potential data loss due to race condition during logical replication slot creation - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: Potential data loss due to race condition during logical replication slot creation
Msg-id ZosrOu7RFnrrsuHL@paquier.xyz
In response to Re: Potential data loss due to race condition during logical replication slot creation  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Potential data loss due to race condition during logical replication slot creation
List pgsql-bugs
On Fri, Jul 05, 2024 at 11:52:52PM +0900, Masahiko Sawada wrote:
> I've attached updated patches for HEAD and pg17 for now (I will create
> the patch for other backbranches).
>
> In the patches, I used a different approach in between HEAD and
> backbranches. On HEAD, we store a flag indicating whether or not we
> should skip snapshot restores into the SnapBuild struct and set it
> only while finding the start point. Therefore we have to bump
> SNAPBUILD_VERSION. On backbranches, I used the approach we discussed
> above; store the flag in LogicalDecodingContext and set it when
> creating the LogicalDecodingContext for a new logical slot. A possible
> downside of the approach taken for backbranches is that we implicitly
> require for users to use the same LogicalDecodingContext for both
> initializing the context for a new slot and finding its start point.
> IIUC it was not strictly required. This restriction would not be a
> problem at least in the core, but I'm not sure if there are no
> external extensions that create a logical slot in that way. This is
> the main reason why I used a different approach on HEAD and
> backbranches. Therefore, if it does not matter, I think we can use the
> same approach on all branches, which is better in terms of
> maintainability.

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -189,6 +189,9 @@ struct SnapBuild
     /* Indicates if we are building full snapshot or just catalog one. */
     bool        building_full_snapshot;

+    /* Indicates if we are finding the start point to extract changes */
+    bool        finding_start_point;
+

FYI, I think that it is still OK to bump SNAPBUILD_VERSION on
REL_17_STABLE.  That would shorten by one year the window during which
we have to maintain the tweak specific to the back-branches.  So I'd
suggest using the approach taken by the v17 version of the patch for
v16 and older, and the snapshot format change for v17 and newer.
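
To make the trade-off concrete, here is a minimal, self-contained sketch of
why adding a field to a serialized struct goes together with a version bump,
and how a flag can be used to skip restoring a snapshot.  This is only a toy
model under my own assumptions: every Toy*/toy_* name, the version number and
the return codes are hypothetical and are not taken from snapbuild.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; not the real SnapBuild code. */
#define TOY_SNAPBUILD_VERSION 2	/* bumped because the layout below changed */

typedef struct ToySnapBuild
{
	/* Indicates if we are building full snapshot or just catalog one. */
	bool		building_full_snapshot;

	/* New field: being part of the serialized state, it changes the format. */
	bool		finding_start_point;
} ToySnapBuild;

typedef struct ToySnapOnDisk
{
	uint32_t	version;		/* format version written with the snapshot */
	ToySnapBuild builder;		/* serialized builder state */
} ToySnapOnDisk;

typedef enum ToyRestoreResult
{
	TOY_RESTORED,				/* state copied from the on-disk snapshot */
	TOY_SKIPPED,				/* restore skipped while finding start point */
	TOY_BAD_VERSION				/* on-disk format does not match */
} ToyRestoreResult;

/*
 * Try to restore builder state from a serialized snapshot.  The restore is
 * skipped while the start point is being searched, and snapshots written
 * with a different format version are rejected.
 */
ToyRestoreResult
toy_restore(ToySnapBuild *builder, const ToySnapOnDisk *ondisk)
{
	assert(builder != NULL && ondisk != NULL);

	if (builder->finding_start_point)
		return TOY_SKIPPED;

	if (ondisk->version != TOY_SNAPBUILD_VERSION)
		return TOY_BAD_VERSION;

	builder->building_full_snapshot = ondisk->builder.building_full_snapshot;
	return TOY_RESTORED;
}
```

In this toy, a builder with finding_start_point set never picks up the
serialized state, and a file written with the previous version is rejected
rather than misread.  Keeping the flag outside the serialized struct, as the
back-branch approach does, avoids the version bump entirely.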
--
Michael

