Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Kumar, Sachin
Subject Initial Schema Sync for Logical Replication
Date
Msg-id db02e6773adb4dbcb5b9bb3803ebe340@amazon.com
Whole thread Raw
Responses Re: Initial Schema Sync for Logical Replication  (Peter Smith <smithpb2250@gmail.com>)
Re: Initial Schema Sync for Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

Hi Everyone,

 

I am working on the initial schema sync for Logical replication. Currently, user have to

manually create a schema on subscriber side. Aim of this feature is to add an option in

create subscription, so that schema sync can be automatic. I am sharing Design Doc below,

but there are some corner cases where the design does not work. Please share your opinion

if design can be improved and we can get rid of corner cases. This design is loosely based

on Pglogical.

DDL replication is required for this feature.

(https://www.postgresql.org/message-id/flat/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com)

 

SQL Changes:-

CREATE SUBSCRIPTION subscription_name

CONNECTION 'conninfo'

PUBLICATION publication_name [, ...]

[ WITH ( subscription_parameter [= value] [, ... ] ) ]

sync_initial_schema (enum) will be added to subscription_parameter.

It can have 3 values:-

TABLES, ALL , NONE (Default)

In ALL everything will be synced including global objects too.

 

Restrictions :- sync_initial_schema=ALL can only be used for publication with FOR ALL TABLES

 

Design:-

 

Publisher :-

Publisher have to implement `SHOW CREATE TABLE_NAME`, this table definition will be used by

subscriber to create exact schema of a table on the subscriber. One alternative to this can

be doing it on the subscriber side itself, we can create a function similar to

describeOneTableDetails and call it on the subscriber. We also need maintain same ownership

as of publisher.

 

It should also have turned on publication of DDL commands.

 

Subscriber :-

 

1. In  CreateSubscription()  when we create replication slot(walrcv_create_slot()), should

use CRS_EXPORT_SNAPSHOT, So that we can use this snapshot later in the pg_dump.

 

2.  Now we can call pg_dump with above snapshot from CreateSubscription. This is inside

opts.connect && opts.create_slot if statement. If we fail in this step we have to drop

the replication slot and create a new one again. Because we need snapshot and creating a

replication slot is a way to get snapshot. The reason for running pg_dump with above

snapshot is that we don't want execute DDLs in wal_logs to 2 times. With above snapshot we

get a state of database which is before the replication slot origin and any changes after

the snapshot will be in wal_logs.

 

We will save the pg_dump into a file (custom archive format). So pg_dump will be similar to

pg_dump --connection_string --schema_only --snapshot=xyz -Fc --file initSchema

 

If  sync_initial_schema=TABLES we dont have to call pg_dump/restore at all. TableSync process

will take care of it.

 

3. If we have to sync global objects we need to call pg_dumpall --globals-only also. But pg_dumpall

does not support --snapshot option, So if user creates a new global object between creation

of replication slot and running pg_dumpall, that above global object will be created 2

times on subscriber , which will error out the Applier process.

 

4. walrcv_disconnect should be called after pg_dump is finished, otherwise snapshot will

not be valid.

 

5. Users will replication role cant not call pg_dump , So the replication user have to

superuser. This is a a major problem.

postgres=# create role s4 WITH LOGIN Replication;

CREATE ROLE

─sachin@DUB-1800550165 ~

─$ pg_dump postgres -s -U s4                                                                               1

pg_dump: error: query failed: ERROR:  permission denied for table t1

pg_dump: detail: Query was: LOCK TABLE public.t1, public.t2 IN ACCESS SHARE MODE

 

6. pg_subscription_rel table column srsubstate will have one more state

SUBREL_STATE_CREATE 'c'. if sync_initial_schema is enabled we will set table_state to 'c'.

Above 6 steps will be done even if subscription is not enabled, but connect is true.

 

7. Leader Applier process should check if initSync file exist , if true  then it should

call pg_restore. We are not using —pre-data and —post-data segment as it is used in

Pglogical, Because post_data works on table having data , but we will fill the data into

table on later stages.  pg_restore can be called like this

 

pg_restore --connection_string -1 file_name

-1 option will execute every command inside of one transaction. If there is any error

everything will be rollbacked.

pg_restore should be called quite early in the Applier process code, before any tablesync

process can be created.

Instead of checking if file exist maybe pg_subscription table can be extended with column

SyncInitialSchema and applier process will check SyncInitialSchema == SYNC_PENDING

 

8. TableSync process should check the state of table , if it is SUBREL_STATE_CREATE it should

get the latest definition from the publisher and recreate the table. (We have to recreate

the table even if there are no changes). Then it should go into copy table mode as usual.

 

It might seem that TableSync is doing duplicate work already done by pg_restore. We are doing

it in this way because of concurrent DDLs and refresh publication command.

 

Concurrent DDL :-

User can execute a DDL command to table t1 at the same time when subscriber is trying to sync

it. pictorial representation https://imgur.com/a/ivrIEv8 [1]

 

In tablesync process, it makes a connection to the publisher and it sees the

table state which can be in future wrt to the publisher, which can introduce conflicts.

For example:-

 

CASE 1:- { Publisher removed the column b from the table t1 when subscriber was doing pg_restore

(or any point in concurrent DDL window described in picture [1] ), when tableSync

process will start transaction on the publisher it will see request data of table t1

including column b, which does not exist on the publisher.} So that is why tableSync process

asks for the latest definition.

If we say that we will delay tableSync worker till all the DDL related to table t1 is

applied by the applier process , we can still have a window when publisher issues a DDL

command just before tableSync starts its transaction, and therefore making tableSync and

publisher table definition incompatible (Thanks to Masahiko for pointing out this race

condition).

 

Applier process will skip all DDL/DMLs related to the table t1 and tableSync will apply those

in Catchup phase.

Although there is one issue what will happen to views/ or functions which depend on the table

. I think they should wait till table_state is > SUBREL_STATE_CREATE (means we have the latest

schema definition from the publisher).

There might be corner cases to this approach or maybe a better way to handle concurrent DDL

One simple solution might be to disallow DDLs on the publisher till all the schema is

synced and all tables have state >= SUBREL_STATE_DATASYNC (We can have CASE 1: issue ,

even with DDL replication, so we have to wait till all the tables have table_state

> SUBREL_STATE_DATASYNC). Which might be a big window for big databases.

 

 

Refresh publication :-

In refresh publication, subscriber does create a new replication slot hence , we can’t run

pg_dump with a snapshot which starts from origin(maybe this is not an issue at all). In this case

it makes more sense for tableSync worker to do schema sync.

 

 

If community is happy with above design, I can start working on prototype.

 

Credits :- This design is inspired by Pglogical. Also thanks to Zane, Masahiko, Amit for reviewing earlier designs

 

Regards

Sachin Kumar

Amazon Web Services

pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Add support for DEFAULT specification in COPY FROM
Next
From: "Gregory Stark (as CFM)"
Date:
Subject: Re: On login trigger: take three