Home > mailing lists

RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From	Kumar, Sachin
Subject	RE: Initial Schema Sync for Logical Replication
Date	March 23, 2023 18:54:49
Msg-id	b8c228aee5564248ae9c1c6d48811d70@amazon.com Whole thread Raw
In response to	Re: Initial Schema Sync for Logical Replication (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Initial Schema Sync for Logical Replication
List	pgsql-hackers

Tree view

> From: Amit Kapila <amit.kapila16@gmail.com>
> IIUC, this is possible only if tablesync process uses a snapshot different than the
> snapshot we have used to perform the initial schema sync, otherwise, this
> shouldn't be a problem. Let me try to explain my understanding with an example
> (the LSNs used are just explain the
> problem):
> 
> 1. Create Table t1(c1, c2); --LSN: 90
> 2. Insert t1 (1, 1); --LSN 100
> 3. Insert t1 (2, 2); --LSN 110
> 4. Alter t1 Add Column c3; --LSN 120
> 5. Insert t1 (3, 3, 3); --LSN 130
> 
> Now, say before starting tablesync worker, apply process performs initial
> schema sync and uses a snapshot corresponding to LSN 100. Then it starts
> tablesync process to allow the initial copy of data in t1.
> Here, if the table sync process tries to establish a new snapshot, it may get data
> till LSN 130 and when it will try to copy the same in subscriber it will fail. Is my
> understanding correct about the problem you described?
Right
> If so, can't we allow
> tablesync process to use the same exported snapshot as we used for the initial
> schema sync and won't that solve the problem you described?
I think we won't be able to use same snapshot because the transaction will be committed.
In CreateSubscription() we can use the transaction snapshot from walrcv_create_slot()
till walrcv_disconnect() is called.(I am not sure about this part maybe walrcv_disconnect() calls
the commits internally ?). 
So somehow we need to keep this snapshot alive, even after transaction is committed(or delay committing
the transaction , but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart before 
tableSync is able to use the same snapshot.) 
> > Refresh publication :-
> >
> > In refresh publication, subscriber does create a new replication slot
Typo-> subscriber does not
> > hence , we can’t run
> >
> > pg_dump with a snapshot which starts from origin(maybe this is not an
> > issue at all). In this case
> >
> > it makes more sense for tableSync worker to do schema sync.
> >
> 
> Can you please explain this problem with some examples?
I think we can have same issues as you mentioned
New table t1 is added to the publication , User does a refresh publication.
pg_dump / pg_restore restores the table definition. But before tableSync
can start,  steps from 2 to 5 happen on the publisher.
> 1. Create Table t1(c1, c2); --LSN: 90
> 2. Insert t1 (1, 1); --LSN 100
> 3. Insert t1 (2, 2); --LSN 110
> 4. Alter t1 Add Column c3; --LSN 120
> 5. Insert t1 (3, 3, 3); --LSN 130
And table sync errors out
There can be one more issue , since we took the pg_dump without snapshot (wrt to replication slot).
(I am not 100 percent sure about this).
Lets imagine applier process is lagging behind publisher. 
Events on publisher
1. alter t1 drop column c; LSN 100 <-- applier process tries to execute this DDL
2. alter t1 drop column d; LSN 110
3. insert into t1 values(..); LSN 120 <-- (Refresh publication called )pg_dump/restore restores this version
Applier process executing 1 will fail because t1 does not have column c. 
Regards
Sachin

pgsql-hackers by date:

From: Robert Haas
Date: 23 March 2023, 18:52:18
Subject: Re: Non-superuser subscription owners

From: Robert Haas
Date: 23 March 2023, 18:55:47
Subject: Re: Transparent column encryption

RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

Previous

Next