On Sun, Apr 20, 2025 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you for the report and sharing the reproducers.
>
> On Fri, Apr 18, 2025 at 2:22 PM Shawn McCoy <shawn.the.mccoy@gmail.com> wrote:
> >
> > Hello,
> >
> > We have discovered a recent regression in the origin handling of logical
> > replication apply workers. We have found that the cause of the issue is the
> > worker resetting its local origin session information while processing an
> > error that is silently handled, allowing the worker to continue. We suspect
> > this is caused by the recent change made in the following thread:
> > https://www.postgresql.org/message-id/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
> >
> > The logical replication apply worker initially sets up the origin
> > correctly. However, the first insert calls into the trigger, which raises
> > an exception. This exception executes the error callback, which resets the
> > origin session state. The exception is then silently handled, returning
> > execution to the apply worker. In the second reproduction, a function-based
> > index is used, with the same result.
> >
> > At this point, the apply worker can continue to commit these changes, but
> > it has cleared all local origin session state. As a result, we will not
> > update our remote-to-local LSN mapping of the origin, which allows
> > duplicate data to be applied.
>
> I agree with your analysis. When the subscriber restarts the logical
> replication, changes that happened since the last acknowledged LSN
> would be replicated again.
>
> With commit 3f28b2fcac33f, we reset the replication origin in
> apply_error_callback() but I guess moving it to the PG_CATCH() block
> in start_apply() might work.
>
Right. We have wrongly assumed in that commit that the apply worker
will exit after an ERROR, but as shown by this case, the ERROR could
be silently handled. So, +1, for moving replication origin reset to
PG_CATCH in start_apply.
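
To illustrate why the placement of the reset matters, here is a toy model in
plain C. It is only a sketch and not PostgreSQL code: setjmp/longjmp stand in
for PG_TRY/PG_CATCH, and every name in it (raise_error,
trigger_with_handled_error, run_worker, origin_session_set) is hypothetical.
It assumes, as in the report, that the error callback runs when the error is
raised, before any handler gets control.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are PostgreSQL's real definitions. */
static jmp_buf *error_jmp;            /* innermost active error handler */
static bool origin_session_set;       /* models the origin session state */
static bool reset_in_error_callback;  /* true: reset in callback; false: reset in outer catch */

/* Models raising an ERROR: the callback runs first, then control jumps. */
static void raise_error(void)
{
    if (reset_in_error_callback)
        origin_session_set = false;   /* callback fires even if the error is caught later */
    longjmp(*error_jmp, 1);
}

/* Models a trigger that raises an exception and swallows it itself. */
static void trigger_with_handled_error(void)
{
    jmp_buf local;
    jmp_buf *saved = error_jmp;

    error_jmp = &local;
    if (setjmp(local) == 0)
        raise_error();
    error_jmp = saved;                /* error handled; execution continues */
}

/*
 * Models the apply loop wrapped in an outer try/catch.
 * Returns true if the origin state is still intact at commit time.
 */
static bool run_worker(bool reset_in_callback, bool unhandled_error)
{
    jmp_buf outer;

    reset_in_error_callback = reset_in_callback;
    origin_session_set = true;        /* worker sets up the origin */
    error_jmp = &outer;

    if (setjmp(outer) == 0)
    {
        trigger_with_handled_error(); /* first insert hits the trigger */
        if (unhandled_error)
            raise_error();            /* an error nobody below us handles */
        return origin_session_set;    /* commit: is the origin still set? */
    }

    /* Outer catch block: the worker is really failing, so reset here. */
    origin_session_set = false;
    return false;
}
```

In this model, run_worker(true, false) returns false: the silently handled
error has already cleared the origin state by commit time, which matches the
reported symptom. run_worker(false, false) returns true, because the reset
only happens in the outer catch block, and that block is reached only when
the error actually propagates (run_worker(false, true)).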
--
With Regards,
Amit Kapila.