Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Fabrice Chapuis
Subject Re: Logical replication timeout problem
Msg-id CAA5-nLA9rSPWEfewyqAFU4-oXJBFhMYWnq15gWGPdGP5J8n3Qg@mail.gmail.com
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
I made a mistake in the configuration of my test script; in fact, I cannot reproduce the problem at the moment.
Yes, the original environment uses physical replication, which is why I configured 2 nodes with physical replication in the lab.
I'll run new tests next week.
Regards

On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> Hello,
> Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch. Tang, I used your script to configure logical replication between 2 databases and to generate 10 million entries in an unreplicated foo table. On a standalone instance, no error message appears in the log.
> I then activated physical replication between 2 nodes and got the following error:
>
> 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive message
> 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION 0/3000000 TIMELINE 1
> 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication worker due to administrator command
> 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
> 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive message
>
> This message looks strange because no admin command was executed during the data load.
> I did not find any error related to the timeout.
> The message added by the patch, "attempt to send keep alive message", appears all the time, but "sent keep alive message" never does.
>
> Why does the logical replication worker exit when physical replication is configured?
>

I am also not sure why that happened; it may be due to
max_worker_processes reaching its limit. This can happen because it
seems you configured both publisher and subscriber in the same
cluster. Tang, did you also see the same problem?
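
The max_worker_processes hypothesis can be checked with some slot arithmetic. The following is a minimal Python sketch, not a description of the actual setup: it assumes the PostgreSQL 10 defaults (max_worker_processes = 8, max_logical_replication_workers = 4, max_sync_workers_per_subscription = 2) and that apply and table-sync workers are taken from the max_worker_processes pool. The function names and the example numbers are hypothetical.

```python
def logical_worker_demand(subscriptions,
                          sync_workers_per_subscription=2,
                          max_logical_replication_workers=4):
    """Upper bound on logical replication workers a cluster may start."""
    # One apply worker per enabled subscription, plus its table-sync workers.
    wanted = subscriptions * (1 + sync_workers_per_subscription)
    # The total is capped by max_logical_replication_workers.
    return min(wanted, max_logical_replication_workers)


def pool_is_exhausted(subscriptions, other_background_workers,
                      max_worker_processes=8):
    """True if the requested workers do not fit into max_worker_processes.

    other_background_workers is a hypothetical count of everything else
    drawn from the same pool (for example parallel query workers).
    """
    demand = logical_worker_demand(subscriptions)
    return demand + other_background_workers > max_worker_processes


if __name__ == "__main__":
    # Illustrative numbers only: one subscription, four other workers.
    print(pool_is_exhausted(subscriptions=1, other_background_workers=4))
```

If the check returns True for the values actually in use, raising max_worker_processes on the cluster hosting the subscriber would be the first thing to try.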

BTW, why are you bringing physical standby configuration into the
test? Were physical standbys present in your original setup, where
you observed the problem?

--
With Regards,
Amit Kapila.
