RE: Add support for specifying tables in pg_createsubscriber. - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Add support for specifying tables in pg_createsubscriber.
Msg-id TY4PR01MB169075A23F8D5EED4154A4AF3943EA@TY4PR01MB16907.jpnprd01.prod.outlook.com
In response to Re: Add support for specifying tables in pg_createsubscriber.  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Friday, August 22, 2025 11:26 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Aug 22, 2025, at 6:57 AM, Zhijie Hou (Fujitsu) wrote:
> > The documentation appears incorrect and needs revision. The latest
> > version no longer depends on the option order; instead, it requires
> > users to provide database-qualified table names, such as -t
> > "db1.sch1.tb1". This adjustment allows the command to internally
> > categorize tables by their target database.
> >
>
> I don't like this design. There is no tool that uses 3 elements. It is also confusing
> and redundant to have the database in the --database option and also in the
> --table option.
>
> I'm wondering if we allow using a specified publication is a better UI. If you
> specify --publication and it exists on primary, use it. The current behavior is a
> failure if the publication exists. It changes the current behavior but I don't
> expect someone relying on this failure to abort the execution. Moreover, the
> error message was added to allow only FOR ALL TABLES; the proposal is to
> relax this restriction.

I think allowing the use of an existing publication is a good idea. If we do not
want to modify the behavior of the current --publication option, introducing a
new option like --existing-publication might be prudent. With this change, we
would also need checks to ensure that the existing publication does not use
column lists or row filters, as previously discussed. What do you think?
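
To make the check concrete, here is a minimal sketch, not taken from any posted
patch. It assumes the tool would run a catalog query against the publisher and
report every table whose publication entry has a row filter (prqual) or a
column list (prattrs) in pg_publication_rel. The names CHECK_QUERY and
describe_problems are hypothetical, and the query text is only shown here, not
executed.

```python
# Hypothetical sketch: illustrates the kind of check discussed above.
# pg_publication_rel.prqual holds the row filter and prattrs the column
# list (both NULL when unused); $1 is the publication name parameter.
CHECK_QUERY = """
SELECT pr.prrelid::regclass::text AS table_name,
       pr.prqual IS NOT NULL AS has_row_filter,
       pr.prattrs IS NOT NULL AS has_column_list
FROM pg_catalog.pg_publication_rel pr
JOIN pg_catalog.pg_publication p ON p.oid = pr.prpubid
WHERE p.pubname = $1
  AND (pr.prqual IS NOT NULL OR pr.prattrs IS NOT NULL)
"""


def describe_problems(rows):
    """Turn result rows (table_name, has_row_filter, has_column_list)
    into human-readable messages; an empty list means the publication
    passes the check."""
    problems = []
    for table_name, has_row_filter, has_column_list in rows:
        if has_row_filter:
            problems.append(f'table "{table_name}" uses a row filter')
        if has_column_list:
            problems.append(f'table "{table_name}" uses a column list')
    return problems
```

If the query returns no rows, the publication could be used as is; otherwise
the messages could be reported before aborting.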

>
> > I think we can explore extending the existing --clean option in a
> > separate patch to support table cleanup. This option is implemented in
> > a way that allows adding further cleanup objects later, so it should be
> > easy to extend it for table.
> > Prior to this extension, it should be noted in the documentation that
> > users are required to clean up the tables themselves.
> >
>
> I would say that these cleanup feature (starting with the cleanup databases) is
> equally important as the feature that selects specific objects.
>
> > I agree that supporting row filter and column list is not
> > straightforward, and we can consider it separately and do not implement
> > that in the first version.
> >
>
> The proposal above would allow it with no additional lines of code.
>
> >>
> >> It seems this proposal doesn't serve a general purpose. It is copying
> >> a *whole* cluster to use only a subset of tables. Your task with
> >> pg_createsubscriber is more expensive than doing a manual logical
> >> replication setup. If you have 500 tables and want to replicate only
> >> 400 tables, it doesn't seem productive to specify 400 -t options.
> >
> > Specifying multiple -t options should not be problematic, as users has
> > already done similar things for "FOR TABLE" publication DDLs. I think
> > it's not hard for user to convert FOR TABLE list to -t option list.
> >
>
> Of course it is. Shell limits the number of arguments.

I initially thought that other commands had a similar limit, so I did not worry
about it. But given that publications may need to list a large number of
tables, I agree that using multiple -t options runs into this limitation.
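
As a rough illustration of the limit, the sketch below estimates how many bytes
a list of -t options would occupy and compares it with ARG_MAX, the POSIX limit
on the combined size of arguments and environment that the operating system
enforces when a command is executed. The helper names and the table names are
made up for the example, and the estimate ignores the environment and
per-pointer overhead, so treat it as an approximation only.

```python
import os


def table_options(tables):
    """Build a flat ['-t', name, '-t', name, ...] argument list."""
    opts = []
    for name in tables:
        opts.extend(["-t", name])
    return opts


def command_line_bytes(args):
    """Approximate size of the arguments: each string plus its NUL terminator."""
    return sum(len(arg.encode()) + 1 for arg in args)


def fits(args, limit):
    """Return True if the approximate argument size is within the limit."""
    return command_line_bytes(args) <= limit


if __name__ == "__main__":
    # SC_ARG_MAX is available on POSIX systems only.
    limit = os.sysconf("SC_ARG_MAX")
    # Hypothetical table names, one -t option per table.
    tables = [f"db1.sch1.table_{i}" for i in range(100000)]
    args = table_options(tables)
    print(f"ARG_MAX = {limit}, arguments need about {command_line_bytes(args)} bytes")
    print("fits" if fits(args, limit) else "does not fit")
```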

> >> There are some cases like a small set of big tables that this feature
> >> makes sense. However, I'm wondering if a post script should be used
> >> to adjust your setup.
> >
> > I think it's not very convenient for users to perform this conversion manually.
> > I've learned in PGConf.dev this year that some users avoid using
> > pg_createsubscriber because they are unsure of the standard steps
> > required to convert it into subset table replication. Automating this
> > process would be beneficial, enabling more users to use
> > pg_createsubscriber and take advantage of the rapid initial table
> > synchronization.
> >
>
> You missed my point. I'm not talking about manually converting a physical
> replica into a logical replica. I'm talking about the plain logical replication setup
> (CREATE PUBLICATION, CREATE SUBSCRIPTION). IME this tool is beneficial
> for large clusters that we want to replicate (almost) all tables.

I understand, but the initial synchronization in plain logical replication can
be slow in many cases, which has been a major complaint I have received
recently. Using pg_createsubscriber can significantly improve performance in
those cases, even when only a subset of tables is published, particularly if the
tables are large or if the number of tables is huge. Of course, there are cases
where plain logical replication outperforms pg_createsubscriber. However, we
could also provide documentation with guidelines to help users decide when to
use this new option in pg_createsubscriber.

Best Regards,
Hou zj



