Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAD21AoAzibaj_vyogYk5c9wCkP5T1n8s-iqPMbrTdxQ3JZ54Rg@mail.gmail.com
Whole thread Raw
In response to RE: Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
>
> > From: Amit Kapila <amit.kapila16@gmail.com>
> > > I think we won't be able to use same snapshot because the transaction will
> > > be committed.
> > > In CreateSubscription() we can use the transaction snapshot from
> > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure
> > > about this part maybe walrcv_disconnect() calls the commits internally ?).
> > > So somehow we need to keep this snapshot alive, even after transaction
> > > is committed(or delay committing the transaction , but we can have
> > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart
> > > before tableSync is able to use the same snapshot.)
> > >
> >
> > Can we think of getting the table data as well along with schema via
> > pg_dump? Won't then both schema and initial data will correspond to the
> > same snapshot?
>
> Right , that will work, Thanks!

While it works, we cannot get the initial data in parallel, no?

>
> > > I think we can have same issues as you mentioned New table t1 is added
> > > to the publication , User does a refresh publication.
> > > pg_dump / pg_restore restores the table definition. But before
> > > tableSync can start,  steps from 2 to 5 happen on the publisher.
> > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100
> > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120
> > > > 5. Insert t1 (3, 3, 3); --LSN 130
> > > And table sync errors out
> > > There can be one more issue , since we took the pg_dump without
> > snapshot (wrt to replication slot).
> > >
> >
> > To avoid both the problems mentioned for Refresh Publication, we can do
> > one of the following: (a) create a new slot along with a snapshot for this
> > operation and drop it afterward; or (b) using the existing slot, establish a
> > new snapshot using a technique proposed in email [1].
> >
>
> Thanks, I think option (b) will be perfect, since we don’t have to create a new slot.

Regarding (b), does it mean that apply worker stops streaming,
requests to create a snapshot, and then resumes the streaming?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Amcheck verification of GiST and GIN
Next
From: Richard Guo
Date:
Subject: Re: About the constant-TRUE clause in reconsider_outer_join_clauses