Re: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Initial Schema Sync for Logical Replication
Date
Msg-id CAA4eK1JxL=T25uDDeVefGZQ4VbVZO2GfNau-DA3fE31U+VkSyg@mail.gmail.com
Whole thread Raw
In response to Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Mar 30, 2023 at 10:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > How can we postpone creating the pg_subscription_rel entries until the
> > > tablesync worker starts and does the schema sync? I think that since
> > > pg_subscription_rel entry needs the table OID, we need either to do
> > > the schema sync before creating the entry (i.e, during CREATE
> > > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The
> > > apply worker needs the information of tables to sync in order to
> > > launch the tablesync workers, but it needs to create the table schema
> > > to get that information.
> >
> > For the above reason, I think that step 6 of the initial proposal won't work.
> >
> > If we can have the tablesync worker create an entry of
> > pg_subscription_rel after creating the table, it may give us the
> > flexibility to perform the initial sync. One idea is that we add a
> > relname field to pg_subscription_rel so that we can create entries
> > with relname instead of OID if the table is not created yet. Once the
> > table is created, we clear the relname field and set the OID of the
> > table instead. It's not an ideal solution but we might make it simpler
> > later.
>
> While writing a PoC patch, I found some difficulties in this idea.
> First, I tried to add schemaname+relname to pg_subscription_rel but I
> could not define the primary key of pg_subscription_rel. The primary
> key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> also doesn't work.
>

Can we think of having a separate catalog table say
pg_subscription_remote_rel for this? You can have srsubid,
remote_schema_name, remote_rel_name, etc. We may need some other state
to be maintained during the initial schema sync where this table can
be used. Basically, this can be used to maintain the state till the
initial schema sync is complete because we can create a relation entry
in pg_subscritption_rel only after the initial schema sync is
complete.

> So I tried another idea: that we generate a new OID
> for srrelid and the tablesync worker will replace it with the new
> table's OID once it creates the table. However, since we use srrelid
> in replication slot names, changing srrelid during the initial
> schema+data sync is not straightforward (please note that the slot is
> created by the tablesync worker but is removed by the apply worker).
> Using relname in slot name instead of srrelid is not a good idea since
> it requires all pg_subscription_rel entries have relname, and slot
> names could be duplicated, for example, when the relname is very long
> and we cut it.
>
> I'm trying to consider the idea from another angle: the apply worker
> fetches the table list and passes the relname to the tablesync worker.
> But a problem of this approach is that the table list is not
> persisted. If the apply worker restarts during the initial table sync,
> it could not get the same list as before.
>

Agreed, this has some drawbacks. We can try to explore this if the
above idea of the new catalog table doesn't solve this problem.


--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Fujii.Yuki@df.MitsubishiElectric.co.jp"
Date:
Subject: RE: Partial aggregates pushdown
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node