Re: Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet
Msg-id CAA4eK1KJg=4T72irutiQx3RCtzwRno3JZDLiCRhRX7QGaeCCcw@mail.gmail.com
In response to Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: Apply worker fails if a relation is missing on subscriber even if refresh publication has not been refreshed yet  (Melih Mutlu <m.melihmutlu@gmail.com>)
List pgsql-hackers
On Thu, Dec 22, 2022 at 7:16 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> Hi hackers,
>
> I noticed a behaviour of logical replication that seems unexpected to me, but I'm not totally sure.
>
> Let's say a new table is created and added to a publication, but it has not been created on the subscriber yet. Also,
> "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" has not been called yet.
>
> What I would expect in that case is that logical replication continues to work as it did before the new table
> was created. The new table does not get replicated until "REFRESH PUBLICATION", as stated here [1].
>
> This is indeed how it seems to work, until we insert a row into the new table.
>
> After a row is inserted into the new table, the apply worker receives this change and tries to apply it. As expected,
> it fails since the table does not exist on the subscriber yet. The worker then keeps crashing and can't apply any
> changes for any table.
>
> The obvious way to resolve this is to create the table on the subscriber as well. After that, the apply worker is
> back to work: it skips changes for the new table and moves on to other changes.
>
> Since REFRESH PUBLICATION has not been called yet, no change for the new table will be replicated.
>
> If replication of the new table will not start anyway (until REFRESH PUBLICATION), do we really need to have that
> table on the subscriber for the apply worker to work?
>
> AFAIU, any change to the publication would not affect the logical replication setup until the publication is
> refreshed on the subscriber.
>

I also have the same understanding, but I think that if we skip
replicating some table because the corresponding publication has not
been refreshed, then it is better to LOG that information instead of
silently skipping it. Along similar lines, personally, I don't see a
very strong reason not to throw the ERROR in the case you mentioned.
Do you have any use case in mind where the user has added a table to
the publication even though she doesn't want it to be replicated? One
case that comes to mind is that, for some reason, after a table is
added to the publication there is a delay in creating the table on the
subscriber and refreshing the publication, and during that time the
user expects replication to proceed smoothly. But in that case, isn't
it better for the user to complete the setup on the subscriber before
performing operations on such a table? Say there is some error in the
subscriber-side setup that the user misses; then it would be a
surprise for the user not to see the table data. In such a case, an
ERROR or LOG message could be helpful for users.

-- 
With Regards,
Amit Kapila.


