Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAD21AoDF1N98AC91KVQXecLcTQCdUtrHR+YU-xNUStULpFqm1A@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Skipping logical replication transactions on subscriber side
|
List | pgsql-hackers |
On Fri, Apr 1, 2022 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote: > > > > On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote: > > > On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote: > > > > On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote: > > > > I have fixed all the above comments as per your suggestion in the > > > > attached. Do let me know if something is missed? > > > > > > > > Looks good to me. > > > > > > This patch is committed > > > (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8). > > > > src/test/subscription/t/029_on_error.pl has been failing reliably on the five > > AIX buildfarm members: > > > > # poll_query_until timed out executing this query: > > # SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub' > > # expecting this output: > > # t > > # last actual query output: > > # f > > # with stderr: > > timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.plline 50. > > > > I've posted five sets of logs (2.7 MiB compressed) here: > > https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing > > Thank you for the report. I'm investigating this issue. Looking at the subscriber logs, it successfully fetched the correct error-LSN from the server logs and set it to ALTER SUBSCRIPTION … SKIP: 2022-03-30 09:48:36.617 UTC [17039636:4] CONTEXT: processing remote data for replication origin "pg_16391" during "INSERT" for replication target relation "public.tbl" in transaction 725 finished at 0/1D30788 2022-03-30 09:48:36.617 UTC [17039636:5] LOG: logical replication subscription "sub" has been disabled due to an error : 2022-03-30 09:48:36.670 UTC [17039640:1] [unknown] LOG: connection received: host=[local] 2022-03-30 09:48:36.672 UTC [17039640:2] [unknown] LOG: connection authorized: user=nm database=postgres application_name=029_on_error.pl 2022-03-30 09:48:36.675 UTC [17039640:3] 029_on_error.pl LOG: statement: ALTER SUBSCRIPTION sub SKIP (lsn = '0/1D30788') 2022-03-30 09:48:36.676 UTC [17039640:4] 029_on_error.pl LOG: disconnection: session time: 0:00:00.006 user=nm database=postgres host=[local] : 2022-03-30 09:48:36.762 UTC [28246036:2] ERROR: duplicate key value violates unique constraint "tbl_pkey" 2022-03-30 09:48:36.762 UTC [28246036:3] DETAIL: Key (i)=(1) already exists. 2022-03-30 09:48:36.762 UTC [28246036:4] CONTEXT: processing remote data for replication origin "pg_16391" during "INSERT" for replication target relation "public.tbl" in transaction 725 finished at 0/1D30788 However, the worker could not start skipping changes of the error transaction for some reason. Given that "SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub’” didn't return true, some value was set to subskiplsn even after the unique key error. So I'm guessing that the apply worker could not get the updated value of the subskiplsn or its MySubscription->skiplsn could not match with the transaction's finish LSN. Also, given that the test is failing on all AIX buildfarm members, there might be something specific to AIX. Noah, to investigate this issue further, is it possible for you to apply the attached patch and run the 029_on_error.pl test? The patch adds some logs to get additional information. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachment
pgsql-hackers by date: