Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CAA4eK1JcQRQw0G-U4A+vaGaBWSvggYMMDJH4eDtJ0Yf2eUYXyA@mail.gmail.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Column Filtering in Logical Replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Column Filtering in Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Sun, Mar 20, 2022 at 8:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 18, 2022 at 10:42 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>
> > So the question is why those two sync workers never complete - I guess
> > there's some sort of lock wait (deadlock?) or infinite loop.
> >
>
> It would be a bit tricky to reproduce this even if the above theory is
> correct but I'll try it today or tomorrow.
>

I am able to reproduce it with the help of a debugger. Firstly, I have
added the LOG message and some While (true) loops to debug sync and
apply workers. Test setup

Node-1:
create table t1(c1);
create table t2(c1);
insert into t1 values(1);
create publication pub1 for table t1;
create publication pu2;

Node-2:
change max_sync_workers_per_subscription to 1 in potgresql.conf
create table t1(c1);
create table t2(c1);
create subscription sub1 connection 'dbname = postgres' publication pub1;

Till this point, just allow debuggers in both workers just continue.

Node-1:
alter publication pub1 add table t2;
insert into t1 values(2);

Here, we have to debug the apply worker such that when it tries to
apply the insert, stop the debugger in function apply_handle_insert()
after doing begin_replication_step().

Node-2:
alter subscription sub1 set pub1, pub2;

Now, continue the debugger of apply worker, it should first start the
sync worker and then exit because of parameter change. All of these
debugging steps are to just ensure the point that it should first
start the sync worker and then exit. After this point, table sync
worker never finishes and log is filled with messages: "reached
max_sync_workers_per_subscription limit" (a newly added message by me
in the attached debug patch).

Now, it is not completely clear to me how exactly '013_partition.pl'
leads to this situation but there is a possibility based on the LOGs
it shows.

-- 
With Regards,
Amit Kapila.

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors