Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Support logical replication of DDLs
Date
Msg-id CAA4eK1JKK9LACPovjogS-LThQBscwkrxBy9RuA6aHFP=vTGjtg@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Zheng Li <zhengli10@gmail.com>)
Responses Re: Support logical replication of DDLs
Re: Support logical replication of DDLs
List pgsql-hackers
On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10@gmail.com> wrote:
>
>
> While I agree that the deparser is needed to handle the potential
> syntax differences between
> the pub/sub, I think it's only relevant for the use cases where only a
> subset of tables in the database
> are replicated. For other use cases where all tables, functions and
> other objects need to be replicated,
> (for example, creating a logical replica for major version upgrade)
> there won't be any syntax difference to
> handle and the schemas are supposed to match exactly between the
> pub/sub. In other words the user seeks to create an identical replica
> of the source database and the DDLs should be replicated
> as is in this case.
>

I think even for database-level replication we can't assume that
source and target will always have the same data in which case "Create
Table As ..", "Alter Table .. " kind of statements can't be replicated
as it is because that can lead to different results. The other point
is tomorrow we can extend the database level option/syntax to exclude
a few objects (something like [1]) as well in which case we again need
to filter at the publisher level.

>
 So I think it's an overkill to use deparser for
> such use cases. It also costs more space and
> time using deparsing. For example, the following simple ALTER TABLE
> command incurs 11 times more space
> in the WAL record if we were to use the format from the deparser,
> there will also be time and CPU overhead from the deparser.
>
...
>
> So I think it's better to define DDL replication levels [1] to tailor
> for the two different use cases. We can use different logging format
> based on the DDL replication level. For example,
> we can simply log the DDL query string and the search_path for
> database level DDL replication. But for table level DDL replication we
> need to use the deparser format in order to
> handle the potential syntax differences and schema mapping requests.
>

I think having different logging formats is worth considering but I am
not sure we can distinguish it for database and table level
replication because of the reasons mentioned above. One thing which
may need a different format is the replication of global objects like
roles, tablespace, etc. but we haven't analyzed them in detail about
those. I feel we may also need a different syntax altogether to
replicate such objects. Also, I think we may want to optimize the
current format in some cases so that the WAL amount could be reduced.

I feel if we think that deparsing is required for this project then
probably at this stage it would be a good idea to explore ways to have
independent ways to test it. One way is to do testing via the logical
replication of DDL (aka via this patch) and the other is to write an
independent test suite as Sawada-San seems to be speculating above
[2]. I am not sure if there is any progress yet on the independent
test suite front yet.

[1] - https://commitfest.postgresql.org/38/3646/
[2] - https://www.postgresql.org/message-id/CAD21AoAX_xiO03hXSY2QfbcKT0XiUvtnzTjy%2BNRJ_EcgBa5B3A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Remove trailing newlines from pg_upgrade's messages
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Remove trailing newlines from pg_upgrade's messages