On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10@gmail.com> wrote:
> >
> >
> > While I agree that the deparser is needed to handle the potential
> > syntax differences between
> > the pub/sub, I think it's only relevant for the use cases where only a
> > subset of tables in the database
> > are replicated. For other use cases where all tables, functions and
> > other objects need to be replicated,
> > (for example, creating a logical replica for major version upgrade)
> > there won't be any syntax difference to
> > handle and the schemas are supposed to match exactly between the
> > pub/sub. In other words the user seeks to create an identical replica
> > of the source database and the DDLs should be replicated
> > as is in this case.
> >
>
> I think even for database-level replication we can't assume that
> source and target will always have the same data in which case "Create
> Table As ..", "Alter Table .. " kind of statements can't be replicated
> as it is because that can lead to different results.
"Create Table As .." is already handled by setting the skipData flag of
the statement parsetreee before replay:
/*
* Force skipping data population to avoid data inconsistency.
* Data should be replicated from the publisher instead.
*/
castmt->into->skipData = true;
"Alter Table .. " that rewrites with volatile expressions can also be handled
without any syntax change, by enabling the table rewrite replication and
converting the rewrite inserts to updates. ZJ's patch introduced this solution.
I've also adopted this approach in my latest patch
0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> The other point
> is tomorrow we can extend the database level option/syntax to exclude
> a few objects (something like [1]) as well in which case we again need
> to filter at the publisher level
I think for such cases it's not full database replication and we could treat it
as table level DDL replication, i.e. use the the deparser format.
> > So I think it's better to define DDL replication levels [1] to tailor
> > for the two different use cases. We can use different logging format
> > based on the DDL replication level. For example,
> > we can simply log the DDL query string and the search_path for
> > database level DDL replication. But for table level DDL replication we
> > need to use the deparser format in order to
> > handle the potential syntax differences and schema mapping requests.
> >
>
> I think having different logging formats is worth considering but I am
> not sure we can distinguish it for database and table level
> replication because of the reasons mentioned above. One thing which
> may need a different format is the replication of global objects like
> roles, tablespace, etc. but we haven't analyzed them in detail about
> those. I feel we may also need a different syntax altogether to
> replicate such objects.
Yes, global objects are not schema qualified so we probably don't need to
use the deparser format for these. We plan to do some evaluation on replication
of global objects.
> Also, I think we may want to optimize the
> current format in some cases so that the WAL amount could be reduced.
>
> I feel if we think that deparsing is required for this project then
> probably at this stage it would be a good idea to explore ways to have
> independent ways to test it. One way is to do testing via the logical
> replication of DDL (aka via this patch) and the other is to write an
> independent test suite as Sawada-San seems to be speculating above
> [2]. I am not sure if there is any progress yet on the independent
> test suite front yet.
Testing DDL deparsing support has been discussed before in [1], we
will also take a close look on it.
[1] https://www.postgresql.org/message-id/5477FD78.1060306%402ndquadrant.com
Regards,
Zheng