Re: Logical replication failed with SSL SYSCALL error - Mailing list pgsql-hackers

From shaurya jain
Subject Re: Logical replication failed with SSL SYSCALL error
Date
Msg-id CAHHJ3NQCcdsURQXXn_+0Vuv296zUjTRYnx2Q5XysOCnJj9CevA@mail.gmail.com
Responses Re: Logical replication failed with SSL SYSCALL error
List pgsql-hackers
Hi Team,

Could you please help me with this? It's urgent for the production environment.

On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shaurya@gmail.com> wrote:
Hi Team,

Could you please help? It's urgent for the production environment.

On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shaurya@gmail.com> wrote:
Hi Team,

Postgres Version:- 13.8
Issue:- Logical replication failing with SSL SYSCALL error
Priority:- High

We are migrating our database through logical replication, and all of a sudden the errors below started appearing in the source and target logs, and they give us no lead on the cause.

Logs from Source:-
LOG:  could not send data to client: Connection reset by peer
STATEMENT:  COPY public.test TO STDOUT
FATAL:  connection to client lost
STATEMENT:  COPY public.test TO STDOUT


Logs from Target:-
2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription " sub_tables_2_180", table "test" has started
2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is no transaction in progress
2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started
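To see which worker exits follow which WAL-stream errors in the target log above, the excerpt can be grouped by PID. This is a minimal Python sketch, assuming the log_line_prefix visible in these lines (timestamp, `client:user@db`, then `[PID]:`); `correlate_worker_failures` is a hypothetical helper name, not a PostgreSQL tool.

```python
import re
from collections import defaultdict

# Matches the "[PID]:LEVEL: message" part of a line. Assumes a prefix like
# "%t:%r:%u@%d:[%p]:", which is what the excerpt appears to use.
LINE_RE = re.compile(r"\[(?P<pid>\d+)\]:(?P<level>[A-Z]+):\s*(?P<msg>.*)$")

# Matches the postmaster's report that a background worker exited.
EXIT_RE = re.compile(
    r'background worker "(?P<kind>[^"]+)" \(PID (?P<pid>\d+)\) '
    r"exited with exit code (?P<code>\d+)"
)


def correlate_worker_failures(lines):
    """Hypothetical helper: map each exited worker PID to
    (exit_code, [ERROR messages that PID logged])."""
    errors = defaultdict(list)
    exits = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        if m.group("level") == "ERROR":
            errors[m.group("pid")].append(m.group("msg"))
        # Exit reports are logged by the postmaster (PID 505 above),
        # so key them by the worker PID named inside the message.
        e = EXIT_RE.search(m.group("msg"))
        if e:
            exits[e.group("pid")] = int(e.group("code"))
    return {pid: (code, errors.get(pid, [])) for pid, code in exits.items()}
```

Applied to the excerpt above, it reports PIDs 1250, 2112, 1089 and 2556, each with exit code 1 and the same `SSL SYSCALL error: Connection timed out` message.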


Just after this error, all the other replication slots become inactive for some time and then come back online, and the COPY command reappears in pg_stat_activity with a new PID.
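Whether a COPY that shows up under a new PID is still the initial table sync can be checked on the subscriber through `pg_subscription_rel.srsubstate`, which records a per-table state (`i`, `d`, `s`, `r` in PostgreSQL 13). This is a minimal Python sketch, assuming the rows were already fetched with the query in the docstring; `tables_still_copying` and `COPY_PENDING` are hypothetical names.

```python
# srsubstate codes from pg_subscription_rel. PostgreSQL 13 uses i, d, s, r;
# "f" (finished table copy) only exists in PostgreSQL 14 and later.
SRSUBSTATE = {
    "i": "initialize",
    "d": "data is being copied",
    "f": "finished table copy",
    "s": "synchronized",
    "r": "ready (normal replication)",
}

# Hypothetical: states in which the initial table copy has not completed.
COPY_PENDING = {"i", "d"}


def tables_still_copying(rows):
    """Hypothetical helper: return tables whose initial COPY is not done.

    `rows` is an iterable of (table_name, srsubstate) pairs, e.g. from
    SELECT srrelid::regclass::text, srsubstate FROM pg_subscription_rel;
    run on the subscriber.
    """
    pending = []
    for table, state in rows:
        if state not in SRSUBSTATE:
            raise ValueError(f"unknown srsubstate {state!r} for {table}")
        if state in COPY_PENDING:
            pending.append(table)
    return pending
```

A table such as `public.test` that is still reported in state `d` has not finished its initial copy yet.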

I have a few questions about this:
  1. What is the exact reason for the disconnection? (Some articles point to memory, others to the network.)
  2. Will it lead to data inconsistency?
  3. Does the COPY command running under the new PID migrate the whole test table again from the start?
Please help; we are stuck here.
--
Thanks and Regards,
Shaurya Jain
Mobile:- +91-8802809405