Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAA4eK1KonkKD9Nw35bTG+M9j6ibRK2M0H_W35_TNT9kto8tSsA@mail.gmail.com
Whole thread Raw
In response to Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
Responses RE: Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
List pgsql-hackers
On Wed, Mar 15, 2023 at 11:12 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
>
>
> Concurrent DDL :-
>
> User can execute a DDL command to table t1 at the same time when subscriber is trying to sync
>
> it. pictorial representation https://imgur.com/a/ivrIEv8 [1]
>
>
>
> In tablesync process, it makes a connection to the publisher and it sees the
>
> table state which can be in future wrt to the publisher, which can introduce conflicts.
>
> For example:-
>
>
>
> CASE 1:- { Publisher removed the column b from the table t1 when subscriber was doing pg_restore
>
> (or any point in concurrent DDL window described in picture [1] ), when tableSync
>
> process will start transaction on the publisher it will see request data of table t1
>
> including column b, which does not exist on the publisher.} So that is why tableSync process
>
> asks for the latest definition.
>
> If we say that we will delay tableSync worker till all the DDL related to table t1 is
>
> applied by the applier process , we can still have a window when publisher issues a DDL
>
> command just before tableSync starts its transaction, and therefore making tableSync and
>
> publisher table definition incompatible (Thanks to Masahiko for pointing out this race
>
> condition).
>

IIUC, this is possible only if tablesync process uses a snapshot
different than the snapshot we have used to perform the initial schema
sync, otherwise, this shouldn't be a problem. Let me try to explain my
understanding with an example (the LSNs used are just explain the
problem):

1. Create Table t1(c1, c2); --LSN: 90
2. Insert t1 (1, 1); --LSN 100
3. Insert t1 (2, 2); --LSN 110
4. Alter t1 Add Column c3; --LSN 120
5. Insert t1 (3, 3, 3); --LSN 130

Now, say before starting tablesync worker, apply process performs
initial schema sync and uses a snapshot corresponding to LSN 100. Then
it starts tablesync process to allow the initial copy of data in t1.
Here, if the table sync process tries to establish a new snapshot, it
may get data till LSN 130 and when it will try to copy the same in
subscriber it will fail. Is my understanding correct about the problem
you described? If so, can't we allow tablesync process to use the same
exported snapshot as we used for the initial schema sync and won't
that solve the problem you described?

>
>
> Applier process will skip all DDL/DMLs related to the table t1 and tableSync will apply those
>
> in Catchup phase.
>
> Although there is one issue what will happen to views/ or functions which depend on the table
>
> . I think they should wait till table_state is > SUBREL_STATE_CREATE (means we have the latest
>
> schema definition from the publisher).
>
> There might be corner cases to this approach or maybe a better way to handle concurrent DDL
>
> One simple solution might be to disallow DDLs on the publisher till all the schema is
>
> synced and all tables have state >= SUBREL_STATE_DATASYNC (We can have CASE 1: issue ,
>
> even with DDL replication, so we have to wait till all the tables have table_state
>
> > SUBREL_STATE_DATASYNC). Which might be a big window for big databases.
>
>
>
>
>
> Refresh publication :-
>
> In refresh publication, subscriber does create a new replication slot hence , we can’t run
>
> pg_dump with a snapshot which starts from origin(maybe this is not an issue at all). In this case
>
> it makes more sense for tableSync worker to do schema sync.
>

Can you please explain this problem with some examples?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Brar Piening
Date:
Subject: Re: doc: add missing "id" attributes to extension packaging page
Next
From: Amit Kapila
Date:
Subject: Re: Initial Schema Sync for Logical Replication