Re: Logical replication failed with SSL SYSCALL error - Mailing list pgsql-hackers

From vignesh C
Subject Re: Logical replication failed with SSL SYSCALL error
Msg-id CALDaNm3Yabfvm1=Wef1u8cHO517uRdzMr3eKD9SJShQvpftsJg@mail.gmail.com
In response to Re: Logical replication failed with SSL SYSCALL error  (shaurya jain <12345shaurya@gmail.com>)
Responses Re: Logical replication failed with SSL SYSCALL error  (shaurya jain <12345shaurya@gmail.com>)
List pgsql-hackers
On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shaurya@gmail.com> wrote:
>
> Hi Team,
>
> Could you please help me with this, It's urgent for the production environment.
>
> On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shaurya@gmail.com> wrote:
>>
>> Hi Team,
>>
>> Could you please help, It's urgent for the production env?
>>
>> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shaurya@gmail.com> wrote:
>>>
>>> Hi Team,
>>>
>>> Postgres Version:- 13.8
>>> Issue:- Logical replication failing with SSL SYSCALL error
>>> Priority:-High
>>>
>>> We are migrating our database through logical replication, and all of a sudden the error below pops up in the source and
>>> target logs, which leads us nowhere.
>>>
>>> Logs from Source:-
>>> LOG:  could not send data to client: Connection reset by peer
>>> STATEMENT:  COPY public.test TO STDOUT
>>> FATAL:  connection to client lost
>>> STATEMENT:  COPY public.test TO STDOUT
>>>
>>> Logs from Target:-
>>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
>>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
>>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription "sub_tables_2_180", table "test" has started
>>> 2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is no transaction in progress
>>> 2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
>>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
>>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
>>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
>>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started
>>>
>>> Just after this error, all other replication slots get disabled for some time and come back online along with COPY
>>> command with the new PID in pg_stat_activity.
>>>
>>> I have a few queries regarding this:-
>>>
>>> The exact reason for disconnection (Few articles claim memory and few network)
This might be caused by a network failure. Did you notice any network
instability? Could you also check the TCP settings, in particular
tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count?
With these settings, a keepalive probe is sent once the connection has
been idle for tcp_keepalives_idle seconds; if the peer does not respond
within tcp_keepalives_interval seconds the probe is sent again, and the
connection is considered dead after tcp_keepalives_count unanswered
probes.
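
To get a feel for how long a dead connection can go unnoticed with a
given set of keepalive values, the timing described above can be
approximated as idle + interval * count. The sketch below is only an
illustration: the function names and the sample values are made up for
this example, and the formula is the usual approximation, not an exact
kernel guarantee. It also assumes that the subscriber side would be
tuned through the libpq connection parameters keepalives,
keepalives_idle, keepalives_interval and keepalives_count in the
subscription's connection string, since the tcp_keepalives_* server
settings only affect sockets on the server where they are set.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds until an unresponsive peer is declared dead.

    idle     -> tcp_keepalives_idle     (idle seconds before the first probe)
    interval -> tcp_keepalives_interval (seconds between unanswered probes)
    count    -> tcp_keepalives_count    (unanswered probes tolerated)
    """
    if idle <= 0 or interval <= 0 or count <= 0:
        # In PostgreSQL a value of 0 means "use the operating system
        # default", so the effective values have to be passed in here.
        raise ValueError("pass the effective, non-zero keepalive values")
    return idle + interval * count


def subscriber_keepalive_options(idle: int, interval: int, count: int) -> str:
    """Build the libpq keepalive part of a connection string (hypothetical helper)."""
    return (
        f"keepalives=1 keepalives_idle={idle} "
        f"keepalives_interval={interval} keepalives_count={count}"
    )


# Hypothetical tuning values, chosen only for illustration.
print(keepalive_detection_seconds(60, 10, 5))    # 110
# Common Linux kernel defaults: 7200 s idle, 75 s interval, 9 probes.
print(keepalive_detection_seconds(7200, 75, 9))  # 7875
print(subscriber_keepalive_options(60, 10, 5))
```

With the common Linux defaults this comes to more than two hours, so
if the settings are left at 0 a broken connection may only be noticed
when a read or write finally fails.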

>>> Will it lead to data inconsistency?
It will not lead to inconsistency. In case of failure the failed
transaction will be rolled back.

>>> Does this new PID COPY command again migrate the whole data of the test table once again?
Yes. After a failure the table synchronization worker starts the COPY
over, so the whole table data is migrated again.

Regards,
Vignesh


