Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Fabrice Chapuis
Subject Re: Logical replication timeout problem
Msg-id CAA5-nLABf97QKAR8K8NiQs2s6_323dvd7kpAdJ3GZ+p2iR5K7A@mail.gmail.com
In response to Re: Logical replication timeout problem  (Fabrice Chapuis <fabrice636861@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
Hello,
Our lab is now ready. Amit, I compiled Postgres 10.18 with your patch. Tang, I used your script to configure logical replication between 2 databases and to generate 10 million entries in an unreplicated foo table. On a standalone instance, no error message appears in the log.
When I activated physical replication between the 2 nodes, I got the following error:

2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive message
2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION 0/3000000 TIMELINE 1
2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication worker due to administrator command
2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive message

This message looks strange because no administrator command was executed during the data load.
I did not find any error related to the timeout.
The log message added by the patch, "attempt to send keep alive message", appears over and over, but there is never a "sent keep alive message" line.
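A simplified model of the keepalive decision may help show why repeated "attempt" lines need not be followed by a "sent" line. The sketch below is a Python approximation that assumes the check behaves like WalSndKeepaliveIfNecessary() in the PostgreSQL 10 walsender: it returns early if timeouts are disabled, if no reply has been received yet, or if a ping is already outstanding, and otherwise sends a keepalive only once half of wal_sender_timeout has elapsed since the last reply. The class and method names are hypothetical, and the patch itself is not shown here. If the patch logs "attempt" on entry and "sent" only when a keepalive actually goes out (also an assumption), then a stream of "attempt" lines alone would be consistent with one of these early returns.

```python
from dataclasses import dataclass


@dataclass
class WalSenderState:
    """Hypothetical, simplified model of walsender keepalive state (times in ms)."""
    wal_sender_timeout: int = 60_000        # PostgreSQL default: 60 s
    last_reply_timestamp: int = 0           # when the receiver last replied
    waiting_for_ping_response: bool = False

    def keepalive_if_necessary(self, now: int) -> bool:
        """Return True if a keepalive requesting an immediate reply would be sent at `now`."""
        # Timeouts disabled, or no reply received yet: never ping.
        if self.wal_sender_timeout <= 0 or self.last_reply_timestamp <= 0:
            return False
        # A ping is already outstanding: wait for its reply instead of sending another.
        if self.waiting_for_ping_response:
            return False
        # Ping once half of wal_sender_timeout has passed without a reply.
        ping_time = self.last_reply_timestamp + self.wal_sender_timeout // 2
        if now >= ping_time:
            self.waiting_for_ping_response = True
            return True
        return False

    def on_reply(self, now: int) -> None:
        """A reply from the receiver resets the timer and clears the pending ping."""
        self.last_reply_timestamp = now
        self.waiting_for_ping_response = False


if __name__ == "__main__":
    ws = WalSenderState(last_reply_timestamp=1_000)
    print(ws.keepalive_if_necessary(10_000))  # False: only 9 s since the last reply
    print(ws.keepalive_if_necessary(31_000))  # True: half the timeout (30 s) has elapsed
    print(ws.keepalive_if_necessary(40_000))  # False: still waiting for the ping response
```

In this model, once a ping is outstanding every further check is an "attempt" that sends nothing until on_reply() runs, so a receiver that stops replying would produce exactly this pattern.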

Why does the logical replication worker exit when physical replication is configured?

Thanks for your help

Fabrice



On Fri, Oct 8, 2021 at 9:33 AM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Thanks, Tang, for your script.
Our debugging environment will be ready soon. I will test your script, and we will try to reproduce the problem by integrating the patch provided by Amit. As soon as I have results, I will let you know.

Regards

Fabrice

On Thu, Sep 30, 2021 at 3:15 AM Tang, Haiying/唐 海英 <tanghy.fnst@fujitsu.com> wrote:

On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
> Thanks for your patch; we are going to set up a lab in order to debug the function.

Hi

I tried to reproduce this timeout problem on version 10.18 but failed.
In my trial, I inserted large amounts of data at the publisher, which took more than 1 minute to replicate.
With the patch provided by Amit, I saw that WalSndKeepaliveIfNecessary was invoked more frequently after I inserted the data.

The test script is attached. Maybe you can try it on your machine and check whether this problem occurs.
If I missed something in the script, please let me know.
Of course, it would be better if you could provide your own script to reproduce the problem.

Regards

Tang
