I made a mistake in the configuration of my test script; in fact, I cannot reproduce the problem at the moment. Yes, the original environment uses physical replication, which is why I configured 2 nodes with physical replication for the lab.
On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hello,
> Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch. Tang, I used your script to configure logical replication between 2 databases and to generate 10 million entries in an unreplicated foo table. On a standalone instance no error message appears in the log.
> I activated physical replication between 2 nodes, and I got the following error:
>
> 2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive message
> 2021-11-10 10:49:12.297 CET [12126] STATEMENT: START_REPLICATION 0/3000000 TIMELINE 1
> 2021-11-10 10:49:15.127 CET [12064] FATAL: terminating logical replication worker due to administrator command
> 2021-11-10 10:49:15.127 CET [12036] LOG: worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
> 2021-11-10 10:49:15.155 CET [12126] LOG: attempt to send keep alive message
>
> This message looks strange because no admin command was executed during the data load.
> I did not find any error related to the timeout.
> The message added by the patch keeps coming back: attempt to send keep alive message. But there is no "sent keep alive message".
>
> Why does the logical replication worker exit when physical replication is configured?
>
I am also not sure why that happened; it may be due to max_worker_processes reaching its limit. This can happen because it seems you configured both the publisher and the subscriber in the same cluster. Tang, did you also see the same problem?
BTW, why are you bringing physical standby configuration into the test? Were physical standbys present in the original setup where you observed the problem?