RE: Support logical replication of DDLs - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Support logical replication of DDLs
Date
Msg-id OS3PR01MB5718E75922D39AF51064AA4594659@OS3PR01MB5718.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wednesday, April 26, 2023 2:32 PM Masahiko Sawada <sawada.mshk@gmail.com>
> Subject: Re: Support logical replication of DDLs
> 
> On Tue, Mar 28, 2023 at 3:22 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tuesday, March 28, 2023 1:41 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Mon, Mar 27, 2023 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com>
> > > wrote:
> > > >
> > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila
> <amit.kapila16@gmail.com>
> > > wrote:
> > > > >
> > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane <tgl@sss.pgh.pa.us>
> wrote:
> > > > > >
> > > > >
> > > > > > I suggest taking a couple of steps back from the minutiae of the
> > > > > > patch, and spending some hard effort thinking about how the thing
> > > > > > would be controlled in a useful fashion (that is, a real design
> > > > > > for the filtering that was mentioned at the very outset), and
> > > > > > about the security issues, and about how we could get to a
> committable
> > > patch.
> > > > > >
> > > > >
> > > > > Agreed. I'll try to summarize the discussion we have till now on
> > > > > this and share my thoughts on the same in a separate email.
> > > > >
> > > >
> > > > The idea to control what could be replicated is to introduce a new
> > > > publication option 'ddl' along with current options 'publish' and
> > > > 'publish_via_partition_root'. The values of this new option could be
> > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of
> > > > all supported DDL commands. Example usage for this would be:
> > > > Example:
> > > > Create a new publication with all ddl replication enabled:
> > > >   CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all');
> > > >
> > > > Enable table ddl replication for an existing Publication:
> > > >   ALTER PUBLICATION pub2 SET (ddl = 'table');
> > > >
> > > > This is what seems to have been discussed but I think we can even
> > > > extend it to support based on operations/commands, say one would like
> > > > to publish only 'create' and 'drop' of tables. Then we can extend the
> > > > existing publish option to have values like 'create', 'alter', and
> > > > 'drop'.
> > > >
> > >
> > > The other idea could be to that for the new option ddl, we input command
> tags
> > > such that the replication will happen for those commands.
> > > For example, ALTER PUBLICATION pub2 SET (ddl = 'Create Table, Alter
> > > Table, ..'); This will obviate the need to have additional values like 'create',
> 'alter',
> > > and 'drop' for publish option.
> > >
> > > The other thought related to filtering is that one might want to filter DDLs
> and
> > > or DMLs performed by specific roles in the future. So, we then need to
> > > introduce another option ddl_role, or something like that.
> > >
> > > Can we think of some other kind of filter for DDL replication?
> >
> > I am thinking another generic syntax for ddl replication like:
> >
> > --
> > CREATE PUBLICATION pubname FOR object_type object_name with (publish
> = 'ddl_type');
> > --
> >
> > To replicate DDLs that happened on a table, we don't need to add new syntax
> or
> > option, we can extend the value for the 'publish' option like:
> >
> > To support more non-table objects replication, we can follow the same style
> and write it like:
> > --
> > CRAETE PUBLICATION FOR FUNCTION f1 with (publish = 'alter'); -- function
> > CRAETE PUBLICATION FOR ALL OPERATORS IN SCHEMA op_schema with
> (publish = 'drop'); -- operators
> > CRAETE PUBLICATION FOR ALL OBJECTS with (publish = 'alter, create, drop');
> -- everything
> > --
> >
> > In this approach, we extend the publication grammar and users can
> > filter the object schema, object name, object type and ddltype. We can also
> add
> > more options to filter role or other infos in the future.
> 
> In this approach, does the subscriber need to track what objects have
> been subscribed similar to tables? For example, suppose that we
> created a publication for function func1 and created a subscription
> for the publication. What if we add function func2 to the publication?
> If we follow the current behavior, DDLs for func2 will be replicated
> to the subscriber but the subscriber won't apply it unless we refresh
> the publication of the subscription. So it seems to me that the
> subscriber needs to have a list of subscribed functions, and we will
> end up having lists of all types of objects.

Yes, If we follow the current behavior, we need to have lists of all types of
objects on subscriber.

But I think even if we only support replicating all DDLs,
when doing initial sync for FUNCTIONS, we still need to skip
the ALTER FUNCTION for the functions that is being synced(creating).

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Next
From: David Rowley
Date:
Subject: Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns