On Fri, Apr 28, 2023 at 4:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Yes, in this approach, we need to dump/restore objects while
> specifying with fine granularity. Ideally, the table sync worker dumps
> and restores the table schema, does copy the initial data, and then
> creates indexes, and triggers and table-related objects are created
> after that. So if we go with the pg_dump approach to copy the schema
> of individual tables, we need to change pg_dump (or libpgdump needs to
> be able to do) to support it.

We have been discussing how to sync the schema, but I'd like to step
back a bit and discuss the use cases and requirements of this feature.

Suppose that a table belongs to a publication. Which objects related
to the table do we want to sync with the initial schema sync feature?
IOW, do we want to sync the table's ACLs, tablespace settings,
triggers, and security labels too?

If we want to replicate the whole database, e.g. when using logical
replication for a major version upgrade, it would be convenient if the
feature synchronized all table-related objects. However, if this is
the only option, the feature could be useless in some cases. For
example, if users have different database users on the subscriber
than on the publisher, they might want to sync only the CREATE TABLE
definitions and set ACLs etc. by themselves. In this case, it would
not be necessary to sync ACLs and security labels.

Which use cases do we want to support with this feature? I think the
implementation could vary depending on how we select which objects to
sync.

One possible idea is to select the objects to sync depending on how
DDL replication is set up on the publisher. It's straightforward, but
I'm not sure the design of the DDL replication syntax has been
decided. Also, even if we create a publication with the ddl = 'table'
option, it's not clear to me whether we want the initial sync feature
to sync table-dependent triggers, indexes, and rules too.

A second idea is to make it configurable so that users can specify
which objects to sync. But that would make the feature complex, and
I'm not sure users could use it properly.

A third idea is that, since the use case of synchronizing the whole
database can already be achieved with pg_dump(all), we support
synchronizing only tables (+ indexes) in the initial sync feature,
which cannot be achieved with pg_dump.
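
To make the options a bit more concrete, here is a rough Python sketch
of how the set of object kinds to sync could be chosen. Everything in
it is hypothetical: the ObjectKind enum, the policy names, and
kinds_to_sync() are made up for illustration and do not correspond to
any existing or proposed PostgreSQL syntax or API. The first idea
(following the publication's DDL replication setting) is left out on
purpose, because the mapping from that setting to object kinds is
exactly what is undecided.

```python
from enum import Enum, auto


class ObjectKind(Enum):
    """Hypothetical categories of table-related objects."""
    TABLE = auto()
    INDEX = auto()
    TRIGGER = auto()
    RULE = auto()
    ACL = auto()
    TABLESPACE = auto()
    SECURITY_LABEL = auto()


# Whole-database use case (e.g. major version upgrade): sync everything.
ALL_KINDS = frozenset(ObjectKind)

# Third idea: only tables (+ indexes).
TABLES_AND_INDEXES = frozenset({ObjectKind.TABLE, ObjectKind.INDEX})


def kinds_to_sync(policy, user_choice=None):
    """Return the object kinds the initial sync would copy.

    policy is one of the hypothetical names "everything",
    "user_configured" (second idea), or "tables_and_indexes"
    (third idea).
    """
    if policy == "everything":
        return ALL_KINDS
    if policy == "user_configured":
        # The table itself is always needed; the rest is up to the user.
        return frozenset(user_choice or ()) | {ObjectKind.TABLE}
    if policy == "tables_and_indexes":
        return TABLES_AND_INDEXES
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # A user with different roles on the subscriber skips ACLs and labels.
    chosen = kinds_to_sync("user_configured", [ObjectKind.INDEX])
    print(sorted(kind.name for kind in chosen))
```

Even in this toy form, the "user_configured" branch shows where the
complexity of the second idea would come from: every object kind
becomes a knob that users have to understand.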

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com