Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id 78149fa6-4c77-4128-8518-197a631c29c3@app.fastmail.com
Whole thread Raw
In response to Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
Now, how do we avoid these problems even if we have our own version of
functionality similar to pg_dump for selected objects? I guess we will
face similar problems. If so, we may need to deny schema sync in any
such case.
There are 2 approaches for initial DDL synchronization:

1) generate the DDL command on the publisher, stream it and apply it as-is on
the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream
it, transform it into a DDL command on subscriber and apply it.

The option (1) is simpler and faster than option (2) because it does not
require an additional step (transformation). However, option (2) is more
flexible than option (1) because it allow you to create a DDL command even if a
feature was removed from the subscriber and the publisher version is less than
the subscriber version or a feature was added to the publisher and the
publisher version is greater than the subscriber version. Of course there are
exceptions and it should forbid the transformation (in this case, it can be
controlled by the protocol version -- LOGICALREP_PROTO_FOOBAR_VERSION_NUM). A
decision must be made: simple/restrict vs complex/flexible.

One of the main use cases for logical replication is migration (X -> Y where X
< Y). Postgres generally does not remove features but it might happen (such as
WITH OIDS syntax) and it would break the DDL replication (option 1). In the
downgrade case (X -> Y where X > Y), it might break the DDL replication if a
new syntax is introduced in X. Having said that, IMO option (1) is fragile if
we want to support DDL replication between different Postgres versions. It
might eventually work but there is no guarantee.

Per discussion [1], I think if we agree that the Alvaro's DDL deparse patch is
the way to go with DDL replication, it seems wise that it should be used for
initial DDL synchronization as well.





--
Euler Taveira

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: HOT chain validation in verify_heapam()
Next
From: "Imseih (AWS), Sami"
Date:
Subject: Re: [BUG] pg_stat_statements and extended query protocol