Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoDF1N98AC91KVQXecLcTQCdUtrHR+YU-xNUStULpFqm1A@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Fri, Apr 1, 2022 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:
> >
> > On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:
> > > On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote:
> > > > On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
> > > > I have fixed all the above comments as per your suggestion in the
> > > > attached. Do let me know if something is missed?
> > > >
> > > > Looks good to me.
> > >
> > > This patch is committed
> > > (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).
> >
> > src/test/subscription/t/029_on_error.pl has been failing reliably on the five
> > AIX buildfarm members:
> >
> > # poll_query_until timed out executing this query:
> > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > timed out waiting for match: (?^:LOG:  done skipping logical replication transaction finished at 0/1D30788) at
t/029_on_error.plline 50. 
> >
> > I've posted five sets of logs (2.7 MiB compressed) here:
> > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing
>
> Thank you for the report. I'm investigating this issue.

Looking at the subscriber logs, it successfully fetched the correct
error-LSN from the server logs and set it to ALTER SUBSCRIPTION …
SKIP:

2022-03-30 09:48:36.617 UTC [17039636:4] CONTEXT:  processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788
2022-03-30 09:48:36.617 UTC [17039636:5] LOG:  logical replication
subscription "sub" has been disabled due to an error
:
2022-03-30 09:48:36.670 UTC [17039640:1] [unknown] LOG:  connection
received: host=[local]
2022-03-30 09:48:36.672 UTC [17039640:2] [unknown] LOG:  connection
authorized: user=nm database=postgres application_name=029_on_error.pl
2022-03-30 09:48:36.675 UTC [17039640:3] 029_on_error.pl LOG:
statement: ALTER SUBSCRIPTION sub SKIP (lsn = '0/1D30788')
2022-03-30 09:48:36.676 UTC [17039640:4] 029_on_error.pl LOG:
disconnection: session time: 0:00:00.006 user=nm database=postgres
host=[local]
:
2022-03-30 09:48:36.762 UTC [28246036:2] ERROR:  duplicate key value
violates unique constraint "tbl_pkey"
2022-03-30 09:48:36.762 UTC [28246036:3] DETAIL:  Key (i)=(1) already exists.
2022-03-30 09:48:36.762 UTC [28246036:4] CONTEXT:  processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788

However, the worker could not start skipping changes of the error
transaction for some reason. Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub’” didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Is monotonous xid8 is a right way to do?
Next
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Is monotonous xid8 is a right way to do?