Re: speed up a logical replica setup - Mailing list pgsql-hackers

From Noah Misch
Subject Re: speed up a logical replica setup
Date
Msg-id 20240626192116.a8.nmisch@google.com
Whole thread Raw
In response to Re: speed up a logical replica setup  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Tue, Jun 25, 2024 at 09:50:59PM -0300, Euler Taveira wrote:
> On Tue, Jun 25, 2024, at 3:24 AM, Amit Kapila wrote:
> > On Tue, Jun 25, 2024 at 3:38 AM Noah Misch <noah@leadboat.com> wrote:
> > > On Mon, Jun 24, 2024 at 05:20:21PM +0530, Amit Kapila wrote:
> > > > On Sun, Jun 23, 2024 at 11:52 AM Noah Misch <noah@leadboat.com> wrote:
> > > > > > +static void
> > > > > > +create_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
> > > > > > +{
> > > > >
> > > > > > +     appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES",
> > > > > > +                                       ipubname_esc);
> > > > >
> > > > > This tool's documentation says it "guarantees that no transaction will be
> > > > > lost."  I tried to determine whether achieving that will require something
> > > > > like the fix from
> > > > > https://postgr.es/m/flat/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com.
> > > > > (Not exactly the fix from that thread, since that thread has not discussed the
> > > > > FOR ALL TABLES version of its race condition.)  I don't know.  On the one
> > > > > hand, pg_createsubscriber benefits from creating a logical slot after creating
> > > > > the publication.  That snapbuild.c process will wait for running XIDs.  On the
> > > > > other hand, an INSERT/UPDATE/DELETE acquires its RowExclusiveLock and builds
> > > > > its relcache entry before assigning an XID, so perhaps the snapbuild.c process
> > >
> > > Correction: it doesn't matter how the original INSERT/UPDATE/DELETE builds its
> > > relcache entry, just how pgoutput of the change builds the relcache entry from
> > > the historic snapshot.
> > >
> > > > > isn't enough to prevent that thread's race condition.  What do you think?
> > > >
> > > > I am not able to imagine how the race condition discussed in the
> > > > thread you quoted can impact this patch. The problem discussed is
> > > > mainly the interaction when we are processing the changes in logical
> > > > decoding w.r.t concurrent DDL (Alter Publication ... Add Table). The
> > > > problem happens because we use the old cache state.
> > >
> > > Right.  Taking the example from
> > > http://postgr.es/m/20231119021830.d6t6aaxtrkpn743y@awork3.anarazel.de, LSNs
> > > between what that mail calls 4) and 5) are not safely usable as start points.
> > > pg_createsubscriber evades that thread's problem if the consistent_lsn it
> > > passes to pg_replication_origin_advance() can't be in a bad-start-point LSN
> > > span.  I cautiously bet the snapbuild.c process achieves that:
> > >
> > > > I am missing your
> > > > point about the race condition mentioned in the thread you quoted with
> > > > snapbuild.c. Can you please elaborate a bit more?
> > >
> > > When pg_createsubscriber calls pg_create_logical_replication_slot(), the key
> > > part starts at:
> > >
> > >         /*
> > >          * If caller needs us to determine the decoding start point, do so now.
> > >          * This might take a while.
> > >          */
> > >         if (find_startpoint)
> > >                 DecodingContextFindStartpoint(ctx);
> > >
> > > Two factors protect pg_createsubscriber.  First, (a) CREATE PUBLICATION
> > > committed before pg_create_logical_replication_slot() started.  Second, (b)
> > > DecodingContextFindStartpoint() waits for running XIDs to complete, via the
> > > process described at the snapbuild.c "starting up in several stages" diagram.
> > > Hence, the consistent_lsn is not in a bad-start-point LSN span.  It's fine
> > > even if the original INSERT populated all caches before CREATE PUBLICATION
> > > started and managed to assign an XID only after consistent_lsn.  From the
> > > pgoutput perspective, that's indistinguishable from the transaction starting
> > > at its first WAL record, after consistent_lsn.  The linked "long-standing data
> > > loss bug in initial sync of logical replication" thread doesn't have (a),
> > > hence its bug.  How close is that to accurate?
> > 
> > Yeah, this theory sounds right to me. The key point is that no DML
> > (processing of WAL corresponding to DML) before CREATE PUBLICATION ...
> > command would have reached pgoutput level because we would have waited
> > for it during snapbuild.c. Can we conclude that the race condition
> > discussed in the other thread won't impact this patch?
> 
> As Noah said the key point is the CREATE PUBLICATION *before* creating the
> replication slots -- that wait transactions to complete.

Let's consider the transaction loss topic concluded.  Thanks for contemplating
it.  I've added an open item for the "dbname containing a space" topic.



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: Proposal: Document ABI Compatibility
Next
From: "David E. Wheeler"
Date:
Subject: Re: Proposal: Document ABI Compatibility