RE: Support logical replication of DDLs - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Support logical replication of DDLs
Date
Msg-id OS0PR01MB5716DBA31AC57163D2CF465E94B09@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Zheng Li <zhengli10@gmail.com>)
Responses RE: Support logical replication of DDLs
Re: Support logical replication of DDLs
List pgsql-hackers
On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10@gmail.com> wrote:
> On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10@gmail.com> wrote:
> > >
> > >
> > > While I agree that the deparser is needed to handle the potential
> > > syntax differences between the pub/sub, I think it's only relevant
> > > for the use cases where only a subset of tables in the database are
> > > replicated. For other use cases where all tables, functions and
> > > other objects need to be replicated, (for example, creating a
> > > logical replica for major version upgrade) there won't be any syntax
> > > difference to handle and the schemas are supposed to match exactly
> > > between the pub/sub. In other words the user seeks to create an
> > > identical replica of the source database and the DDLs should be
> > > replicated as is in this case.
> > >
> >
> > I think even for database-level replication we can't assume that
> > source and target will always have the same data in which case "Create
> > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > as it is because that can lead to different results.
> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetreee before replay:
> 
> /*
> * Force skipping data population to avoid data inconsistency.
> * Data should be replicated from the publisher instead.
> */
> castmt->into->skipData = true;
> 
> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> I've also adopted this approach in my latest patch
> 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> 
> > The other point
> > is tomorrow we can extend the database level option/syntax to exclude
> > a few objects (something like [1]) as well in which case we again need
> > to filter at the publisher level
> 
> I think for such cases it's not full database replication and we could treat it as
> table level DDL replication, i.e. use the the deparser format.

Hi,

Here are some points in my mind about the two approaches discussed here.

1) search_patch vs schema qualify

Again, I still think it will bring more flexibility and security by schema qualify the
objects in DDL command as mentioned before[1].

Besides, a schema qualified DDL is also more appropriate for other use
cases(e.g. a table-level replication). As it's possible the schema is different
between pub/sub and it's easy to cause unexpected and undetectable failure if
we just log the search_path. 

It makes more sense to me to have the same style WAL log(schema qualified) for
both database level or table level replication as it will bring more
flexibility.


> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetreee before replay:

2) About the handling of CREATE TABLE AS:

I think it's not a appropriate approach to set the skipdata flag on subscriber
as it cannot handle EXECUTE command in CTAS.

CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');

The Prepared statement is a temporary object which we don't replicate. So if
you directly execute the original SQL on subscriber, even if you set skipdata
it will fail.

I think it difficult to make this work as you need handle the create/drop of
this prepared statement. And even if we extended subscriber's code to make it
work, it doesn't seems like a standard and elegant approach.


> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this solution.

3) About the handling of ALTER TABLE rewrite.

The approach I proposed before is based on the event trigger + deparser
approach. We were able to improve that approach as we don't need to replicate
the rewrite in many cases. For example: we don't need to replicate rewrite dml
if there is no volatile/mutable function. We should check and filter these case
at publisher (e.g. via deparser) instead of checking that at subscriber.

Besides, as discussed, we need to give warning or error for the cases when DDL
contains volatile function which would be executed[2]. We should check this at
publisher as well(via deparser).


> I think for such cases it's not full database replication and we could treat it as
> table level DDL replication, i.e. use the the deparser format.

4) I think the point could be that we should make the WAL log format extendable
so that we can extend it to support more useful feature(table filter/schema
maps/DDL filter). If we just WAL log the original SQL, it seems it's difficult
to extend it in the future ?

[1] https://www.postgresql.org/message-id/202204081134.6tcmf5cxl3sz%40alvherre.pgsql
[2] https://www.postgresql.org/message-id/CAA4eK1JVynFsj%2BmcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ%40mail.gmail.com

Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Handle infinite recursion in logical replication setup