Hello,
Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch. Tang, I used your script to configure logical replication between 2 databases and to generate 10 million entries in an unreplicated foo table. On a standalone instance, no error message appears in the log.
I then activated physical replication between 2 nodes and got the following error:
2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive message
2021-11-10 10:49:12.297 CET [12126] STATEMENT: START_REPLICATION 0/3000000 TIMELINE 1
2021-11-10 10:49:15.127 CET [12064] FATAL: terminating logical replication worker due to administrator command
2021-11-10 10:49:15.127 CET [12036] LOG: worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
2021-11-10 10:49:15.155 CET [12126] LOG: attempt to send keep alive message
This message looks strange because no administrator command was executed during the data load.
I did not find any error related to the timeout.
The message added by the patch, "attempt to send keep alive message", appears repeatedly in the log, but there is never a "sent keep alive message".
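To check this over a whole log file rather than by eye, a small script along the following lines could count both messages. This is only a sketch: it assumes the log line prefix seen in the excerpt above (timestamp, time zone, PID in brackets, then the level) and the exact message texts quoted here. The function name `count_keepalive_messages` and the file name in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above, e.g.:
# 2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def count_keepalive_messages(lines):
    """Count 'attempt' and 'sent' keepalive messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip continuation lines or lines with another prefix
        msg = match.group("msg")
        if msg.startswith("attempt to send keep alive message"):
            counts["attempt"] += 1
        elif msg.startswith("sent keep alive message"):
            counts["sent"] += 1
    return counts
```

It could be run as `count_keepalive_messages(open("postgresql.log"))`, where the path is a placeholder for the actual server log. A result with many attempts and zero sent would confirm what is described above.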
Why does the logical replication worker exit when physical replication is configured?
Thanks for your help
Fabrice
Thanks Tang for your script.
Our debugging environment will be ready soon. I will test your script and we will try to reproduce the problem by integrating the patch provided by Amit. As soon as I have results I will let you know.
Regards
Fabrice
On Friday, September 24, 2021 12:04 AM, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Thanks for your patch, we are going to set up a lab in order to debug the function.
Hi
I tried to reproduce this timeout problem on version 10.18 but failed.
In my trial, I inserted a large amount of data on the publisher, which took more than 1 minute to replicate.
And with the patch provided by Amit, I saw that the WalSndKeepaliveIfNecessary
function was invoked more frequently after I inserted the data.
The test script is attached. Maybe you can try it on your machine and check if this problem could happen.
If I missed something in the script, please let me know.
Of course, it would be better if you could provide your own script to reproduce the problem.
Regards
Tang