Re: pgsql: Add a view to show the stats of subscription workers. - Mailing list pgsql-committers
From | Amit Kapila |
---|---|
Subject | Re: pgsql: Add a view to show the stats of subscription workers. |
Date | |
Msg-id | CAA4eK1LNHm+KrQi_JM4nuSGbMwOMf5=sHqgrso0PT1eY64SbTA@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Add a view to show the stats of subscription workers. (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-committers |
On Tue, Nov 30, 2021 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Nov 30, 2021 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Nov 30, 2021 at 9:05 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > > > Add a view to show the stats of subscription workers. > > > > > > > sidewinder is showing failure in the test added by this patch: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25 > > > > I'll investigate and share my findings. > > I've looked at the issue. Here is my analysis: > > Looking at the result the test actually got, we had two error entries > for test_tab1 instead of one: > > # Failed test 'check the error reported by the apply worker' > # at t/026_worker_stats.pl line 33. > # got: 'tap_sub|INSERT|test_tab1|t > # tap_sub||test_tab1|t' > # expected: 'tap_sub|INSERT|test_tab1|t' > > The possible scenarios are: > > The table sync worker for test_tab1 failed due to an error unrelated > to apply changes: > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > OID 2 is already active for PID 23706 > > At this time, the view had one error entry for the table sync worker. > After retrying table sync, it succeeded: > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > synchronization worker for subscription "tap_sub", table "test_tab1" > has finished > > Then after inserting a row on the publisher, the apply worker inserted > the row but failed due to violating a unique key violation, which is > expected: > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > violates unique constraint "test_tab1_pkey" > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > during "INSERT" for replication target relation "public.test_tab1" in > transaction 721 at 2021-11-30 06:24:04.305096+01 > > As a result, we had two error entries for test_tab1: the table sync > worker error and the apply worker error. I didn't expect that the > table sync worker for test_tab1 failed due to "replication origin with > OID 2 is already active for PID 23706” error. > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > two checks; in the first check, we wait for the view to show the error > entry for the given relation name and xid. This check was passed since > we had the second error (i.g., apply worker error). In the second > check, we get error entries from pg_stat_subscription_workers by > specifying only the relation name. Therefore, we ended up getting two > entries and failed the tests. > > To fix this issue, I think that in the second check, we can get the > error from pg_stat_subscription_workers by specifying the relation > name *and* xid like the first check does. I've attached the patch. > What do you think? > I have responded to this on the original -hackers thread [1] as I think the test case might need some more changes. [1] - https://www.postgresql.org/message-id/CAA4eK1JheT_MTmQiGfWZJeHSEJGzd-mgPPwBwA%3DYWjhcTfykPA%40mail.gmail.com -- With Regards, Amit Kapila.
pgsql-committers by date: