Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAD21AoCOuJjOieMikipx-pfRW0m1j2q7KMZQFk2JdSL1WwTHOw@mail.gmail.com
In response to Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
List pgsql-hackers
On Tue, Mar 28, 2023 at 6:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
> > >
> > > > From: Amit Kapila <amit.kapila16@gmail.com>
> > > > > I think we won't be able to use the same snapshot because the
> > > > > transaction will be committed.
> > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > walrcv_create_slot() till walrcv_disconnect() is called. (I am not sure
> > > > > about this part; maybe walrcv_disconnect() commits internally?)
> > > > > So somehow we need to keep this snapshot alive even after the transaction
> > > > > is committed (or delay committing the transaction; but we can have
> > > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so a restart can happen
> > > > > before tableSync is able to use the same snapshot).
> > > > >
> > > >
> > > > Can we think of getting the table data as well, along with the schema, via
> > > > pg_dump? Won't then both the schema and the initial data correspond to the
> > > > same snapshot?
> > >
> > > Right, that will work, thanks!
> >
> > While it works, we cannot get the initial data in parallel, no?
> >
>
> Another possibility is that we dump/restore the schema of each table
> along with its data. One thing we can explore is whether the parallel
> option of dump can be useful here. Do you have any other ideas?

A downside of the idea of dumping both the table schema and the table
data would be that we need to temporarily store twice the size of the
table (the dump file plus the table itself) during the load. One might
think that we can redirect the pg_dump output into the backend so that
it can load it via SPI, but that doesn't work, since "COPY tbl FROM
stdin;" cannot be executed via SPI. The --inserts option of pg_dump
could work around this, but it makes restoration very slow.
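To make the COPY-vs-INSERT point concrete, here is roughly what the two pg_dump output styles look like for a small two-column table (an illustrative fragment under assumed table names, not actual pg_dump output):

```sql
-- Default pg_dump output: schema followed by a COPY block.
-- "COPY ... FROM stdin" cannot be replayed through SPI.
CREATE TABLE public.t1 (c1 integer, c2 integer);

COPY public.t1 (c1, c2) FROM stdin;
1	1
2	2
\.

-- With --inserts, the data section becomes row-by-row INSERTs,
-- which SPI could execute but which restore much more slowly:
INSERT INTO public.t1 (c1, c2) VALUES (1, 1);
INSERT INTO public.t1 (c1, c2) VALUES (2, 2);
```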

>
> One related idea is that currently, we fetch the table list
> corresponding to publications in subscription and create the entries
> for those in pg_subscription_rel during Create Subscription, can we
> think of postponing that work till after the initial schema sync? We
> seem to be already storing publications list in pg_subscription, so it
> appears possible if we somehow remember the value of copy_data. If
> this is feasible then I think that may give us the flexibility to
> perform the initial sync at a later point by the background worker.

It sounds possible. With this idea, we will be able to have the apply
worker restore the table schemas (and create pg_subscription_rel
entries) as the first thing it does. Another point we might need to
consider is that the initial schema sync (i.e. creating tables) and
creating the pg_subscription_rel entries need to be done in the same
transaction. Otherwise, we could end up committing only one of the two
changes. I think it depends on how we restore the schema data.
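The same-transaction requirement could be pictured like this on the subscriber side (a sketch only: pg_subscription_rel is a system catalog maintained internally by the subscription code, shown here as commented-out plain SQL with a placeholder subscription OID just to illustrate the atomicity argument; PostgreSQL's transactional DDL is what makes the pairing possible):

```sql
BEGIN;

-- 1. Restore the table schema received from the publisher.
CREATE TABLE public.t1 (c1 integer, c2 integer);

-- 2. Record the relation in the "init" state for the sync worker
--    (in reality done via catalog functions, not a SQL INSERT):
-- INSERT INTO pg_subscription_rel (srsubid, srrelid, srsubstate, srsublsn)
--      VALUES (<subid>, 'public.t1'::regclass, 'i', NULL);

COMMIT;  -- both changes become visible together, or neither does
```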

>
> > >
> > > > > I think we can have the same issues as you mentioned: new table t1 is
> > > > > added to the publication, and the user does a refresh publication.
> > > > > pg_dump / pg_restore restores the table definition, but before
> > > > > tableSync can start, steps 2 to 5 happen on the publisher.
> > > > > > 1. Create Table t1(c1, c2); --LSN: 90
> > > > > > 2. Insert t1 (1, 1); --LSN 100
> > > > > > 3. Insert t1 (2, 2); --LSN 110
> > > > > > 4. Alter t1 Add Column c3; --LSN 120
> > > > > > 5. Insert t1 (3, 3, 3); --LSN 130
> > > > > And table sync errors out.
> > > > > There can be one more issue, since we took the pg_dump without a
> > > > > snapshot (wrt the replication slot).
> > > > >
> > > >
> > > > To avoid both of the problems mentioned for Refresh Publication, we can do
> > > > one of the following: (a) create a new slot along with a snapshot for this
> > > > operation and drop it afterward; or (b) use the existing slot and establish
> > > > a new snapshot using the technique proposed in email [1].
> > > >
> > >
> > > Thanks, I think option (b) will be perfect, since we don’t have to create a new slot.
> >
> > Regarding (b), does it mean that apply worker stops streaming,
> > requests to create a snapshot, and then resumes the streaming?
> >
>
> Shouldn't this be done by the backend performing a REFRESH publication?

Hmm, I might be missing something, but idea (b) uses the existing
slot to establish a new snapshot, right? Which existing replication
slot do we use for that? I thought it was the one used by the apply
worker.
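For comparison, option (a) maps onto existing facilities: a transient slot created over a replication connection can export a snapshot that another session then imports (a sketch of the standard mechanism, with an assumed slot name and snapshot identifier; option (b), re-establishing a snapshot on an existing slot, would need new infrastructure):

```sql
-- On a replication connection (replication=database):
CREATE_REPLICATION_SLOT "tmp_refresh_slot" LOGICAL pgoutput (SNAPSHOT 'export');
-- ... the reply includes an exported snapshot name, e.g. '00000003-00000002-1'

-- In a regular session, read the tables as of that snapshot:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-00000002-1';
-- ... copy the newly published tables here ...
COMMIT;

-- Drop the transient slot afterward:
DROP_REPLICATION_SLOT "tmp_refresh_slot";
```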

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


